Machine learning. Part 1.

  1. Introduction.
Problem setting. Objects and features. Binary, categorical, ordinal and real-valued features. Problem types: classification, regression, ranking, clustering, forecasting. Case examples.
Basic concepts: model, learning algorithm, loss function and metric. Empirical risk minimization. Generalization. Overfitting. Cross-validation. Training and test sets.
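The idea behind cross validation can be sketched in a few lines: split the data into k folds, train on k-1 of them and evaluate on the held-out fold. A minimal pure-Python index generator (the contiguous fold layout is one common convention, used here for illustration):

```python
# Minimal k-fold cross-validation index generator (pure Python sketch).
def kfold_indices(n, k):
    """Yield (train_idx, test_idx) pairs for k-fold cross validation."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

folds = list(kfold_indices(10, 5))
# Every point appears in exactly one held-out fold.
assert sorted(i for _, test in folds for i in test) == list(range(10))
```

Averaging the held-out metric over the folds gives a less optimistic estimate of generalization than the training error alone.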
  2. Metric algorithms
KNN and its generalizations. Selecting k. The problem of metric choice. Kernel methods.
Data structures for fast nearest-neighbour search.
Curse of dimensionality.
Parametric and non-parametric algorithms for regression.
Outlier detection.
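A minimal KNN classifier makes the core idea of this section concrete: predict by majority vote among the k closest training points. A pure-Python sketch with the Euclidean metric (data and labels are illustrative):

```python
import math
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Majority vote among the k nearest training points (Euclidean)."""
    dists = sorted(
        (math.dist(p, x), label) for p, label in zip(X_train, y_train)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(X, y, (0.5, 0.5), k=3))  # → a
```

The brute-force sort is O(n log n) per query; the data structures mentioned above (e.g. KD-trees) exist precisely to avoid this scan.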
  3. Decision Trees and Random Forests
Logical patterns. Concepts of information: heuristic, statistical approaches, entropy. Pareto efficiency. Connection to classification problem. Feature binarization.
Decision lists and trees. Constructing a decision tree. Greedy strategy. Split criteria: Gini impurity, information gain. Handling missing values. How to counter overfitting: pruning, oblivious decision trees. Random Forest.
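The greedy strategy above can be illustrated by one split search: compute Gini impurity and exhaustively try thresholds on a single numeric feature, keeping the one with the lowest weighted impurity. An illustrative sketch (real libraries add many refinements):

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a label multiset."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(xs, ys):
    """Threshold minimizing the weighted Gini impurity of the two halves."""
    best_t, best_score = None, float("inf")
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

xs = [1, 2, 3, 10, 11, 12]
ys = [0, 0, 0, 1, 1, 1]
print(best_split(xs, ys))  # → (3, 0.0)
```

Growing a tree is just this search applied recursively to each resulting subset, which is why it is a greedy (locally optimal) strategy.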
  4. Linear methods
Linear classification. Continuous approximations to threshold loss function. Margins.
Stochastic gradient descent (SGD). Strategies of weight initialization. Using mini-batches. Handling imbalanced classification problems. Variants of SGD: Momentum, Nesterov momentum, SAG, AdaGrad, AdaDelta, Adam. Second-order methods.
Regularization and its probabilistic interpretation.
Two-class and multiclass logistic regression.
Predicting probabilities: Platt scaling.
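Plain SGD for two-class logistic regression ties the topics above together: shuffle the data, then take one gradient step of the log-loss per sample. A pure-Python sketch on a toy 1-D problem (learning rate and epoch count are arbitrary illustrative values):

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sgd_logreg(xs, ys, lr=0.1, epochs=200, seed=0):
    """Plain SGD on the log-loss for 1-D two-class logistic regression."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    data = list(zip(xs, ys))
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            p = sigmoid(w * x + b)
            # gradient of the log-loss for a single sample
            w -= lr * (p - y) * x
            b -= lr * (p - y)
    return w, b

xs = [-2.0, -1.5, -1.0, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 1, 1, 1]
w, b = sgd_logreg(xs, ys)
assert sigmoid(w * 2.0 + b) > 0.9 and sigmoid(w * -2.0 + b) < 0.1
```

Momentum, AdaGrad, Adam and the other variants listed above change only how the per-sample gradient is turned into a parameter update.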
  5. SVM
Support vector machines. Dual problem setting. Kernel SVM. How to choose kernels. Learning regression with SVM. Regularization. Feature selection with LASSO SVM. Support Features Machine (SFM) and Relevance Features Machine (RFM). Relevance Vector Machine (RVM).
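The primal linear SVM can be trained by stochastic subgradient descent on the regularized hinge loss. A Pegasos-style sketch without an intercept (the regularization strength and step count are illustrative choices, not prescribed values):

```python
import random

def score(w, p):
    return sum(wj * xj for wj, xj in zip(w, p))

def svm_train(X, y, lam=0.01, steps=2000, seed=0):
    """Pegasos-style subgradient descent on the hinge loss (no bias term)."""
    rng = random.Random(seed)
    w = [0.0, 0.0]
    for t in range(1, steps + 1):
        i = rng.randrange(len(X))
        eta = 1.0 / (lam * t)  # decaying step size
        margin = y[i] * score(w, X[i])
        # shrink step from the L2 regularizer
        w = [(1 - eta * lam) * wj for wj in w]
        if margin < 1:  # hinge-loss subgradient is nonzero
            w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

X = [(-2, -1), (-1, -2), (-2, -2), (2, 1), (1, 2), (2, 2)]
y = [-1, -1, -1, 1, 1, 1]
w = svm_train(X, y)
assert score(w, (2, 2)) > 0 > score(w, (-2, -2))
```

Kernel SVMs come from solving the dual problem instead, where the data enters only through inner products that a kernel can replace.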
  6. Linear Regression and PCA
Least Squares. Linear Regression. Probabilistic and geometric interpretations.
Singular Value Decomposition (SVD).
How to counter overfitting. Regularization. Effective dimension, Principal Component Analysis (PCA).
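For a single feature, the least-squares normal equations reduce to closed-form slope and intercept formulas. A pure-Python sketch:

```python
def ols_fit(xs, ys):
    """Closed-form ordinary least squares for one-feature linear regression."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    slope = cov / var
    intercept = my - slope * mx
    return slope, intercept

xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]  # exactly y = 2x + 1
print(ols_fit(xs, ys))  # → (2.0, 1.0)
```

In the multivariate case the same solution is usually computed through the SVD rather than by inverting the normal equations, for numerical stability.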
  7. Nonlinear Regression and non-standard loss functions
Non-linear regression model. Computational aspects. Newton-Raphson and Gauss-Newton algorithms. Iteratively Reweighted Least Squares (IRLS). Generalized Additive Model. Backfitting. Generalized Linear Model (GLM). Logistic regression as a particular case of GLM. Quantile regression. Robust regression models.
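Quantile regression replaces the squared error with the pinball loss; minimizing it over a constant predictor recovers the empirical tau-quantile, which is also why it is robust to outliers. An illustrative sketch:

```python
def pinball_loss(y_true, y_pred, tau):
    """Mean pinball (quantile) loss at level tau."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        total += tau * diff if diff >= 0 else (tau - 1) * diff
    return total / len(y_true)

ys = [1, 2, 3, 4, 100]  # one large outlier
# The tau = 0.5 minimizer is the median, barely affected by the outlier.
best = min(range(0, 101), key=lambda c: pinball_loss(ys, [c] * 5, 0.5))
print(best)  # → 3
```

With tau = 0.9 the same search would instead track the upper tail of the data, which is the point of fitting several quantile levels.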
  8. Model evaluation and feature selection
Classification metrics: accuracy, precision, recall, F1-score, AUC, log-loss. ROC curve and precision-recall curve.
Model selection. External and internal metrics.
Feature selection: greedy algorithm, breadth-first search, genetic algorithms.
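The pointwise metrics above follow directly from the confusion-matrix counts. A pure-Python sketch of the standard definitions:

```python
def prf1(y_true, y_pred):
    """Precision, recall and F1 for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
p, r, f1 = prf1(y_true, y_pred)  # each equals 2/3 here
```

Sweeping the decision threshold and recording these counts at each point is exactly how the ROC and precision-recall curves are built.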
  9. Time Series prediction
Problem setting and case examples. Trend and seasonality. Autocorrelation (ACF) and partial autocorrelation (PACF). Time series models: exponential smoothing, ARMA and ARIMA, SARIMA, other models.
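Simple exponential smoothing is the smallest of the models listed above: each new level is a weighted average of the latest observation and the previous level. A sketch (the smoothing constant alpha is an illustrative choice):

```python
def exp_smooth(series, alpha):
    """Simple exponential smoothing; returns the sequence of levels."""
    level = series[0]
    out = [level]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
        out.append(level)
    return out

print(exp_smooth([10.0, 10.0, 10.0, 20.0], alpha=0.5))
# → [10.0, 10.0, 10.0, 15.0]
```

ARMA-family models generalize this by regressing on several past values and past forecast errors instead of a single smoothed level.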
  10. Bayesian classification
Problem setting. Optimal Bayesian classifier. Naive Bayes. The case of the multidimensional normal distribution.
Density estimation. Parametric and non-parametric techniques. Mixture models.
Expectation-Maximization (EM) algorithm. Multidimensional Gaussian mixtures.
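The E and M steps can be shown on the smallest interesting case: a two-component 1-D Gaussian mixture with the variances fixed at 1 for brevity (an illustrative sketch; a full implementation would also re-estimate the variances):

```python
import math

def em_gmm(xs, mu1, mu2, iters=50):
    """EM for a two-component 1-D Gaussian mixture with unit variances."""
    pi = 0.5  # mixing weight of component 1
    for _ in range(iters):
        # E-step: posterior responsibility of component 1 for each point
        r = []
        for x in xs:
            p1 = pi * math.exp(-0.5 * (x - mu1) ** 2)
            p2 = (1 - pi) * math.exp(-0.5 * (x - mu2) ** 2)
            r.append(p1 / (p1 + p2))
        # M-step: re-estimate the weight and the means
        n1 = sum(r)
        pi = n1 / len(xs)
        mu1 = sum(ri * x for ri, x in zip(r, xs)) / n1
        mu2 = sum((1 - ri) * x for ri, x in zip(r, xs)) / (len(xs) - n1)
    return mu1, mu2, pi

xs = [0.0, 0.1, -0.1, 5.0, 5.1, 4.9]
mu1, mu2, pi = em_gmm(xs, mu1=-1.0, mu2=6.0)
assert abs(mu1) < 0.1 and abs(mu2 - 5.0) < 0.1
```

Each iteration cannot decrease the data log-likelihood, which is the key convergence property of EM.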
  11. Association rules
Problem setting. What an association rule is. Mining association rules: the Apriori and FP-growth algorithms.
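One level of the Apriori idea fits in a few lines: count singleton supports, keep the frequent items, and build candidate pairs only from them, since a pair can only be frequent if both of its items are. An illustrative sketch with a toy basket dataset:

```python
from itertools import combinations

def frequent_pairs(transactions, min_support):
    """Frequent item pairs via one level of Apriori candidate pruning."""
    counts = {}
    for t in transactions:
        for item in t:
            counts[item] = counts.get(item, 0) + 1
    frequent = {i for i, c in counts.items() if c >= min_support}
    pair_counts = {}
    for t in transactions:
        for pair in combinations(sorted(set(t) & frequent), 2):
            pair_counts[pair] = pair_counts.get(pair, 0) + 1
    return {p: c for p, c in pair_counts.items() if c >= min_support}

baskets = [{"bread", "milk"}, {"bread", "milk", "eggs"},
           {"milk", "eggs"}, {"bread", "milk"}]
print(frequent_pairs(baskets, min_support=3))  # → {('bread', 'milk'): 3}
```

FP-growth reaches the same frequent itemsets without generating candidates at all, by compressing the transactions into a prefix tree.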
  12. Recap