# Annotation

This course is devoted to the formalism of graphical models, which has become the state-of-the-art approach for working with data that contain internal dependencies. Examples of such data include images, temporal signals, video, text, web data, and social networks. Graphical models generalize several important machine learning algorithms and extend their applicability.

Along with graphical models themselves, the course touches on elements of combinatorial and continuous optimization, Bayesian statistical methods, structured support vector machines, and Monte Carlo methods. On the application side, it deals with computer vision, probabilistic logic, natural language processing, and signal processing.

Students who complete the course successfully will be able to build models that take interdependencies in the data into account, learn the parameters of such models with and without training data, and make predictions when some data are missing.

# Course program

## Theory, algorithms, models:

- Graphical models. Factor graphs. Directed models (Bayesian networks) and undirected models (Markov random fields). Observed and hidden variables.
- Inference problem. Inference criteria, loss functions, empirical risk minimization. Supervised and unsupervised learning problems.
- Inference as optimization. Coordinate descent (Iterated Conditional Modes). Inference via convex optimization. Gaussian models.
- Inference via message passing:
    - Inference in a chain model via the shortest path in a graph.
    - Message passing in chain- and tree-structured graphs.
    - Computing marginals. Message passing in semirings.
    - Message passing in graphs with cycles.
    - Message passing in factor graphs.

- Inference via graph cuts:
    - Exact inference in binary submodular problems.
    - Alpha-expansion.
    - Nonsubmodular models (QPBO), fusion moves.

- Dual decomposition. Subgradient descent methods.
- Supervised learning, log-linear model, feature representation. Learning via minimization:
    - Structured perceptron.
    - Learning via empirical risk minimization and structured support vector machines.
    - Optimization via subgradient descent and quadratic programming, delayed constraint generation.

- Maximum likelihood supervised learning. Pseudo-likelihood, piecewise learning, deep supervised learning.
- Bayesian networks. Probabilistic interpretation of message passing. Notion of dependency between variables. The explaining away effect. Usage examples.
- Linear dynamic systems (LDS).
    - Kalman filter.
    - Maximum likelihood training of LDS.
    - Extended Kalman filter.

- EM algorithm. Examples.
    - Unsupervised training of HMM.
    - Unsupervised training of LDS.

- Markov chain Monte Carlo methods (Metropolis-Hastings and Gibbs sampling). Examples.
    - Particle filter.
    - Ising model.
    - Simulated annealing.

- Variational inference. Examples.
    - Variational linear regression.
    - Variational Ising model.

- Bayesian regularization in machine learning. Examples.
    - Relevance Vector Machine (RVM).
    - Bayesian Principal Component Analysis (PCA).
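
As a rough illustration of the chain-model topics in the list above (inference via the shortest path, message passing in chains), the following is a minimal sketch of min-sum dynamic programming on a chain. It is not taken from the course materials: the function name `chain_map`, the cost-based (energy) formulation, and the single pairwise cost matrix shared by all edges are assumptions made for this example.

```python
def chain_map(unary, pairwise):
    """Minimize sum_i unary[i][x_i] + sum_i pairwise[x_i][x_{i+1}] over labelings x.

    unary: list of per-node cost lists (n nodes, k labels each).
    pairwise: k x k cost matrix shared by all edges (an assumption of this sketch).
    Returns (minimum energy, optimal labeling).
    """
    n, k = len(unary), len(unary[0])
    # cost[j] = cheapest energy of a prefix that ends with label j at the current node
    cost = list(unary[0])
    back = []  # back[i][j] = best label of node i given label j at node i + 1
    for i in range(1, n):
        new_cost, pointers = [], []
        for j in range(k):
            # Forward message: pick the best predecessor label for label j
            best = min(range(k), key=lambda p: cost[p] + pairwise[p][j])
            pointers.append(best)
            new_cost.append(cost[best] + pairwise[best][j] + unary[i][j])
        cost = new_cost
        back.append(pointers)
    # Backtrack from the cheapest label of the last node
    label = min(range(k), key=lambda j: cost[j])
    energy = cost[label]
    labels = [label]
    for pointers in reversed(back):
        label = pointers[label]
        labels.append(label)
    labels.reverse()
    return energy, labels
```

Each forward step corresponds to relaxing the edges between two consecutive layers of a layered graph, so the result can be read as a shortest path from the first layer to the last. With a strong smoothness cost, for instance, `chain_map([[0, 2], [2, 0], [0, 2]], [[0, 3], [3, 0]])` keeps all three nodes at label 0.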

## Applications:

- Computer vision / image processing:
    - Stereo.
    - Image segmentation.
    - Human pose estimation.
    - Image restoration.
    - 3D reconstruction.
    - Superresolution.
    - Image stitching.

- Natural language processing:
    - Part-of-speech tagging.
    - Named entity recognition.

- Clustering, facility location problem.
- Placing labels on a map.
- Planning, route selection.
- Web-page classification.
- Tracking.
- Relevant feature selection.
- Choosing the number of principal components and components in a mixture.
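
To give a flavor of how the theory connects to one of the applications listed above, here is a minimal sketch of binary image restoration with Iterated Conditional Modes (coordinate descent over pixel labels). It is only an illustration under stated assumptions: the function name `icm_denoise`, the Ising-style energy, and the parameters `data_weight`, `smooth_weight`, and `max_sweeps` are hypothetical choices for this example, not part of the course materials.

```python
def icm_denoise(noisy, data_weight=1.0, smooth_weight=1.0, max_sweeps=10):
    """Restore a binary image (nested lists of 0/1) by Iterated Conditional Modes.

    Assumed energy: data_weight * [x_p != noisy_p] summed over pixels, plus
    smooth_weight * [x_p != x_q] summed over 4-connected neighbor pairs.
    """
    h, w = len(noisy), len(noisy[0])
    x = [row[:] for row in noisy]  # start from the observation, leave the input intact
    for _ in range(max_sweeps):
        changed = False
        for r in range(h):
            for c in range(w):
                neighbors = [x[rr][cc]
                             for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                             if 0 <= rr < h and 0 <= cc < w]

                # Local energy of a candidate label with all other pixels held fixed
                def local(label):
                    return (data_weight * (label != noisy[r][c])
                            + smooth_weight * sum(label != n for n in neighbors))

                best = min((0, 1), key=local)
                # Accept only strict improvements so the total energy never increases
                if local(best) < local(x[r][c]):
                    x[r][c] = best
                    changed = True
        if not changed:
            break  # no pixel changed: a local minimum has been reached
    return x
```

ICM only finds a local minimum of the energy. For binary submodular energies like this one, graph cuts (also covered in the program) can find the global minimum.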