Optimization in Machine Learning

1. Basic unconstrained smooth optimization.
1.1. Introduction. Optimality conditions for smooth optimization. Optimization oracles. Convergence rates of optimization methods.
1.2. Matrix calculus, matrix differentiation. Solving linear systems.
1.3. One-dimensional optimization.
1.4. Gradient descent. Linear convergence rate. Adaptive procedures for choosing step lengths (a sketch follows this section).
1.5. Newton's method. Superlinear convergence rate. Hessian corrections for non-convex optimization.
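A minimal sketch of topic 1.4: gradient descent with a backtracking (Armijo) rule for the step length, run on a small quadratic. The objective, the constants c and beta, and the name gradient_descent are illustrative assumptions rather than part of the course material.

```python
import numpy as np

def gradient_descent(f, grad, x0, max_iter=100, tol=1e-8):
    """Gradient descent with a backtracking (Armijo) line search."""
    x = x0.copy()
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        # Backtracking: shrink t until the sufficient-decrease condition
        # f(x - t*g) <= f(x) - c * t * ||g||^2 holds.
        t, c, beta = 1.0, 1e-4, 0.5
        while f(x - t * g) > f(x) - c * t * g.dot(g):
            t *= beta
        x = x - t * g
    return x

# Illustrative strongly convex quadratic: f(x) = 0.5 x^T A x - b^T x.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b
print(gradient_descent(f, grad, np.zeros(2)), np.linalg.solve(A, b))
```

On smooth strongly convex problems such as this one the iterates converge at a linear rate; backtracking is one example of the adaptive step-length procedures mentioned in 1.4.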
2. Advanced unconstrained smooth optimization.
2.1. Conjugate gradients for linear systems (a sketch follows this section). Preconditioning.
2.2. Conjugate gradients for general functions.
2.3. Inexact Hessian-free Newton methods.
2.4. Automatic differentiation.
2.5. Quasi-Newton methods. L-BFGS.
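A compact sketch of topic 2.1, the (unpreconditioned) linear conjugate gradient method for A x = b with A symmetric positive definite. The random test system and the function name conjugate_gradient are illustrative assumptions.

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10, max_iter=None):
    """Linear CG for A x = b, with A symmetric positive definite."""
    n = b.shape[0]
    x = np.zeros(n) if x0 is None else x0.copy()
    r = b - A @ x            # residual
    p = r.copy()             # search direction
    rs_old = r @ r
    for _ in range(max_iter or n):
        Ap = A @ p
        alpha = rs_old / (p @ Ap)        # exact minimizer along p
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p    # next A-conjugate direction
        rs_old = rs_new
    return x

# Illustrative symmetric positive definite system.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5.0 * np.eye(5)
b = rng.standard_normal(5)
print(np.allclose(conjugate_gradient(A, b), np.linalg.solve(A, b)))
```

In exact arithmetic CG terminates in at most n steps; preconditioning (also in 2.1) replaces A by a better-conditioned operator, and the Hessian-free Newton methods of 2.3 apply this same solver to the Newton system using only Hessian-vector products.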
3. Constrained optimization.
3.1. Constrained optimization. KKT conditions.
3.2. Duality. Equivalent problem reformulations. Projected gradient method (a sketch follows this section).
3.3. Primal Newton methods for constrained optimization.
3.4. Primal-dual Newton methods for constrained optimization.
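A minimal sketch of the projected gradient method from 3.2, specialized to box constraints, where the Euclidean projection is a coordinate-wise clip. The toy problem, the fixed step size, and the function names are illustrative assumptions.

```python
import numpy as np

def projected_gradient(grad, project, x0, step=0.1, max_iter=500, tol=1e-8):
    """Projected gradient: take a gradient step, then project back onto the feasible set."""
    x = project(x0)
    for _ in range(max_iter):
        x_new = project(x - step * grad(x))
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# Illustrative problem: minimize ||x - c||^2 subject to 0 <= x <= 1.
c = np.array([1.5, -0.3, 0.4])
grad = lambda x: 2.0 * (x - c)
project = lambda x: np.clip(x, 0.0, 1.0)    # Euclidean projection onto the box
print(projected_gradient(grad, project, np.zeros(3)))   # expect [1.0, 0.0, 0.4]
```

The method is attractive when the projection onto the feasible set is cheap, as it is for boxes, balls, and the simplex.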
4. Special-purpose optimization.
4.1. Non-smooth optimization. Subgradient method and its convergence rate.
4.2. Sparse linear models. Proximal gradient method (a sketch follows this section).
4.3. Subgradient calculus, proximal operator calculus.
4.4. Stochastic optimization.
4.5. Accelerated optimization methods.
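A short sketch connecting 4.2 and 4.3: the proximal gradient method (ISTA) for the Lasso, using the fact that the proximal operator of the l1 norm is coordinate-wise soft-thresholding. The synthetic data, the regularization level lam, and the function names are illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (coordinate-wise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista(A, b, lam, max_iter=1000, tol=1e-8):
    """Proximal gradient (ISTA) for min_x 0.5*||Ax - b||^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the smooth part's gradient
    x = np.zeros(A.shape[1])
    for _ in range(max_iter):
        grad = A.T @ (A @ x - b)           # gradient of the smooth term
        x_new = soft_threshold(x - grad / L, lam / L)
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# Illustrative sparse regression problem.
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 20))
x_true = np.zeros(20)
x_true[:3] = [2.0, -1.0, 0.5]
b = A @ x_true + 0.01 * rng.standard_normal(50)
print(np.round(ista(A, b, lam=0.5), 2))   # most coordinates come out exactly zero
```

Replacing the full gradient by a mini-batch estimate gives the stochastic variant of 4.4, and adding an extrapolation (momentum) step turns ISTA into the accelerated FISTA method of 4.5.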