**Introduction.**

Problem setting. Objects and features. Binary, categorical, ordinal and real-valued features. Problem types: classification, regression, ranking, clustering, forecasting. Case examples.

Basic concepts: model, learning algorithm, loss function and metric. Empirical risk minimization. Generalizability. Overfitting. Cross validation. Training and test.
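
The train/test and cross-validation ideas above can be sketched as a small index-splitting helper (hypothetical function names, stdlib Python only):

```python
# A minimal k-fold cross-validation sketch: split n sample indices into k folds,
# then yield (train, test) index pairs, one per held-out fold.

def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validation_splits(n, k):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    folds = k_fold_indices(n, k)
    for i, test in enumerate(folds):
        train = [idx for j, f in enumerate(folds) if j != i for idx in f]
        yield train, test
```

In practice one would shuffle (or stratify) the indices first; the contiguous split here is only for readability.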

**Metric algorithms**

KNN and its generalizations. Selecting k. The problem of metric choice. Kernel methods.
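
A minimal KNN classifier sketch (stdlib Python, Euclidean metric, majority vote; the function name is illustrative):

```python
# k-nearest-neighbours classification by majority vote among the k closest
# training points under the Euclidean metric.
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    dists = sorted(
        (math.dist(p, x), label) for p, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]
```

Swapping `math.dist` for another distance function is exactly the "metric choice" problem; weighting the votes by a kernel of the distance gives the kernel generalizations.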

Data structures for fast nearest-neighbour search.

Curse of dimensionality.

(Non-para)metric algorithms for regression.

Outlier detection.

**Decision Trees and Random Forests**

Logical patterns. Concepts of information: heuristic, statistical approaches, entropy. Pareto efficiency. Connection to classification problem. Feature binarization.

Decision lists and trees. Constructing a decision tree. Greedy strategy. Metrics: gini impurity, information gain. Handling missing values. How to counter overfitting: pruning, oblivious decision trees. Random Forest.
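
The split-quality metrics named above can be sketched directly from class counts (stdlib Python; `information_gain` here assumes a split into child nodes with known counts):

```python
# Gini impurity and entropy of a node given its per-class counts, and the
# information gain of a split: parent entropy minus size-weighted child entropy.
import math

def gini(counts):
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def entropy(counts):
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c)

def information_gain(parent, children):
    """parent: class counts at the node; children: class counts per child."""
    n = sum(parent)
    return entropy(parent) - sum(sum(c) / n * entropy(c) for c in children)
```

The greedy tree-growing strategy evaluates these scores for every candidate split and keeps the best one.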

**Linear methods**

Linear classification. Continuous approximations to threshold loss function. Margins.

Stochastic gradient descent (SGD). Strategies of weight initialization. Using mini-batches. Handling imbalanced classification problems. Variants of SGD: Momentum, Nesterov momentum, SAG, AdaGrad, AdaDelta, Adam. Second-order methods.
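
The classical momentum variant can be illustrated on a one-dimensional quadratic (a toy sketch, not a full SGD loop over mini-batches):

```python
# Gradient descent with classical momentum on f(w) = (w - 3)^2.
# The velocity v accumulates a decaying sum of past gradients, which damps
# oscillation and speeds up progress along consistent directions.
def sgd_momentum(grad, w0, lr=0.1, beta=0.9, steps=200):
    w, v = w0, 0.0
    for _ in range(steps):
        v = beta * v - lr * grad(w)
        w = w + v
    return w

# Minimize (w - 3)^2, whose gradient is 2 * (w - 3).
w_star = sgd_momentum(lambda w: 2 * (w - 3), w0=0.0)
```

Nesterov momentum differs only in evaluating the gradient at the look-ahead point `w + beta * v`; AdaGrad, AdaDelta and Adam replace the fixed `lr` with per-coordinate adaptive step sizes.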

Regularization and its probabilistic interpretation.

Two-class and multiclass logistic regression.
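
A minimal two-class logistic regression fit in one dimension (batch gradient descent on the log-loss; stdlib Python, illustrative parameter values). The same one-dimensional sigmoid fit is what Platt scaling applies to raw classifier scores:

```python
# Two-class logistic regression in 1-D: model P(y=1|x) = sigmoid(w*x + b),
# fit by batch gradient descent on the log-loss with labels in {0, 1}.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logreg(xs, ys, lr=0.1, steps=2000):
    """Return (w, b) minimizing the average log-loss on (xs, ys)."""
    w = b = 0.0
    n = len(xs)
    for _ in range(steps):
        gw = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        gb = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / n
        w -= lr * gw
        b -= lr * gb
    return w, b
```

The multiclass version replaces the sigmoid with a softmax over one weight vector per class.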

Predicting probabilities: Platt scaling.

**SVM**

Support vector machines. The dual problem. Kernel SVM. How to choose kernels. Regression with SVM. Regularization. Feature selection with LASSO SVM. Support Features Machine (SFM) and Relevance Features Machine (RFM). Relevance Vector Machine (RVM).
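
A stripped-down linear SVM can be trained in the primal by subgradient descent on the regularized hinge loss (a simplified Pegasos-style sketch; 2-D inputs, labels in {-1, +1}, no bias term, illustrative hyperparameters):

```python
# Linear SVM via subgradient descent on  lam/2 * ||w||^2 + mean(hinge loss).
# The hinge term contributes a subgradient only for points with margin < 1.
def train_linear_svm(X, y, lam=0.01, lr=0.01, steps=2000):
    w = [0.0, 0.0]
    for _ in range(steps):
        for x, label in zip(X, y):
            margin = label * (w[0] * x[0] + w[1] * x[1])
            g0, g1 = lam * w[0], lam * w[1]   # regularizer, always active
            if margin < 1:                    # hinge active: push w toward label*x
                g0 -= label * x[0]
                g1 -= label * x[1]
            w[0] -= lr * g0
            w[1] -= lr * g1
    return w
```

The dual formulation covered in lecture is what makes the kernel trick possible; this primal sketch only handles the linear case.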

**Linear Regression and PCA**

Least Squares. Linear Regression. Probabilistic and geometric interpretations.
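
For a single feature, the least-squares solution has the familiar closed form (a stdlib sketch; `fit_line` is an illustrative name):

```python
# Simple (one-feature) least squares: slope = cov(x, y) / var(x),
# intercept chosen so the fitted line passes through the mean point.
def fit_line(xs, ys):
    """Return (slope, intercept) minimizing the sum of squared residuals."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx
```

In the multivariate case the same idea becomes the normal equations, usually solved through a matrix factorization such as the SVD below.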

Singular Value Decomposition (SVD).

How to counter overfitting. Regularization. Effective dimension, Principal Component Analysis (PCA).
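
The first principal component can be found without a full SVD by power iteration on the covariance matrix (a 2-D stdlib sketch; assumes the top eigenvalue is strictly dominant):

```python
# First principal component of 2-D data: center the data, form the 2x2
# covariance matrix, and run power iteration to its dominant eigenvector.
import math

def first_pc(data, iters=100):
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    centered = [(x - mx, y - my) for x, y in data]
    cxx = sum(a * a for a, _ in centered) / n
    cyy = sum(b * b for _, b in centered) / n
    cxy = sum(a * b for a, b in centered) / n
    v = (1.0, 0.0)
    for _ in range(iters):
        w = (cxx * v[0] + cxy * v[1], cxy * v[0] + cyy * v[1])
        norm = math.hypot(*w)
        v = (w[0] / norm, w[1] / norm)
    return v
```

Projecting onto the top few such directions is the dimensionality reduction that motivates the notion of effective dimension.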

**Nonlinear Regression and non-standard loss functions**

Non-linear regression model. Computational aspects. Newton-Raphson and Gauss-Newton algorithms. Iteratively Reweighted Least Squares (IRLS). Generalized Additive Model. Backfitting. Generalized Linear Model (GLM). Logistic regression as a particular case of GLM. Quantile regression. Robust regression models.
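
The Newton-Raphson update at the heart of these methods, shown on the simplest possible case, root-finding in one dimension (IRLS applies the same update to the weighted least-squares form of the GLM likelihood):

```python
# Newton-Raphson iteration x <- x - g(x) / g'(x), here solving x^2 - 2 = 0.
def newton_raphson(g, dg, x0, steps=10):
    x = x0
    for _ in range(steps):
        x -= g(x) / dg(x)
    return x

root = newton_raphson(lambda x: x * x - 2, lambda x: 2 * x, x0=1.0)
```

Convergence is quadratic near the root, which is why a handful of IRLS iterations usually suffice in practice.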

**Model evaluations and feature selection**

Classification metrics: accuracy, precision, recall, f1-score, AUC, log-loss. ROC-curve and precision-recall curve.
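
Precision, recall and F1 follow directly from the binary confusion-matrix counts (a stdlib sketch with illustrative names; `tp`, `fp`, `fn` are the true-positive, false-positive and false-negative counts):

```python
# Precision = tp / (tp + fp), recall = tp / (tp + fn),
# F1 = harmonic mean of precision and recall.
def prf1(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Sweeping the decision threshold and plotting (recall, precision) or (false-positive rate, true-positive rate) pairs yields the precision-recall and ROC curves; AUC is the area under the latter.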

Model selection. External and internal metrics.

Feature selection: greedy algorithm, breadth-first search, genetic algorithms.

**Time Series prediction**

Problem setting and case examples. Trend and seasonality. Autocorrelation (ACF) and partial autocorrelation (PACF). Time series models: exponential smoothing, ARMA and ARIMA, SARIMA, other models.
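
Simple exponential smoothing, the most basic of the models above, can be written in a few lines (stdlib sketch; the one-step-ahead forecast is the final smoothed level):

```python
# Simple exponential smoothing: the level is an exponentially weighted
# average of past observations, controlled by the smoothing factor alpha.
def exp_smooth(series, alpha=0.5):
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level
```

Adding smoothed trend and seasonal components gives the Holt and Holt-Winters extensions; ARMA-family models instead regress on past values and past errors directly.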

**Bayesian classification**

Problem setting. Optimal Bayesian classifier. Naive Bayes. Case of multidimensional normal distribution.
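
A naive Bayes sketch for binary categorical features with Laplace (add-one) smoothing (stdlib Python; function names and the fixed `n_values=2` per feature are simplifying assumptions):

```python
# Categorical naive Bayes: class priors times a product of per-feature
# conditional probabilities, with add-one smoothing; argmax in log space.
import math
from collections import Counter, defaultdict

def fit_nb(rows, labels):
    """rows: list of feature tuples. Returns class priors and value counts."""
    priors = Counter(labels)
    counts = defaultdict(Counter)  # (feature index, class) -> value counts
    for row, c in zip(rows, labels):
        for i, v in enumerate(row):
            counts[(i, c)][v] += 1
    return priors, counts

def predict_nb(priors, counts, row, n_values=2):
    best, best_score = None, float("-inf")
    total = sum(priors.values())
    for c, pc in priors.items():
        score = math.log(pc / total)
        for i, v in enumerate(row):
            score += math.log((counts[(i, c)][v] + 1) / (pc + n_values))
        if score > best_score:
            best, best_score = c, score
    return best
```

The "naive" step is the product over features, i.e. the assumption that features are conditionally independent given the class.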

Density estimation. Parametric and non-parametric techniques. Mixture models.

Expectation-Maximization (EM) algorithm. Multidimensional Gaussian mixtures.
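
A deliberately stripped-down EM sketch for a two-component 1-D Gaussian mixture, estimating only the two means (unit variances and equal mixing weights are fixed, which is an assumption made purely for brevity):

```python
# EM for a 1-D mixture of two unit-variance Gaussians with equal priors.
# E-step: responsibilities of component 1 for each point.
# M-step: responsibility-weighted means.
import math

def em_two_means(xs, m1, m2, iters=50):
    for _ in range(iters):
        r = []
        for x in xs:
            p1 = math.exp(-0.5 * (x - m1) ** 2)
            p2 = math.exp(-0.5 * (x - m2) ** 2)
            r.append(p1 / (p1 + p2))
        m1 = sum(ri * x for ri, x in zip(r, xs)) / sum(r)
        m2 = sum((1 - ri) * x for ri, x in zip(r, xs)) / sum(1 - ri for ri in r)
    return m1, m2
```

The full multidimensional version additionally re-estimates covariance matrices and mixing weights in the M-step.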

**Association rules**

Problem setting. What an association rule is. Mining association rules: the Apriori and FP-growth algorithms.
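
The support-counting core of Apriori, restricted to 1- and 2-itemsets for brevity (stdlib sketch; the key point is that candidate pairs are generated only from frequent single items, by anti-monotonicity of support):

```python
# Frequent 1- and 2-itemsets: an item set is frequent if the fraction of
# transactions containing it reaches min_support. Candidate pairs are built
# only from frequent singles, which is the Apriori pruning step.
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    n = len(transactions)
    sets = [frozenset(t) for t in transactions]
    items = sorted({i for t in sets for i in t})
    f1 = [i for i in items if sum(i in t for t in sets) / n >= min_support]
    f2 = [
        pair for pair in combinations(f1, 2)
        if sum(set(pair) <= t for t in sets) / n >= min_support
    ]
    return f1, f2
```

The full algorithm iterates this candidate-generation-and-pruning step to larger itemsets, then derives rules from the frequent sets; FP-growth avoids candidate generation entirely via a prefix-tree representation.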

**Recap**