## Distributed Coordinate Descent for Regularized Logistic Regression

**Ilya Trofimov**

*Yandex Data Factory, Russia*

Logistic regression with regularization is the method of choice for classification and class-probability estimation problems in text classification, clickstream data analysis, and web data mining. Although logistic regression can build only linear separating surfaces, its testing accuracy, with proper regularization, is often good for high-dimensional input spaces. For several problems its testing accuracy has been shown to be close to that of nonlinear classifiers such as kernel methods, while training and testing of linear classifiers is much faster. This makes logistic regression a good choice for large-scale problems. Choosing the right regularizer is problem dependent. L2-regularization is known to shrink coefficients towards zero while keeping correlated ones in the model. L1-regularization leads to a sparse solution and typically selects only one coefficient from a group of correlated ones. The elastic net regularizer is a linear combination of L1 and L2 and allows selecting a trade-off between them. Other regularizers, such as the group lasso and the non-convex SCAD penalty, are used less often.
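To make the regularizers concrete, the following is a minimal sketch (not the paper's implementation) of the regularized objective being minimized, for labels in {-1, +1}; the function names and the `lam1`/`lam2` parameters are illustrative choices, with `lam2 = 0` recovering pure L1 and `lam1 = 0` pure L2:

```python
import numpy as np

def logistic_loss(w, X, y):
    """Average logistic loss for labels y in {-1, +1}."""
    margins = y * (X @ w)
    return np.mean(np.log1p(np.exp(-margins)))

def elastic_net_objective(w, X, y, lam1, lam2):
    """Loss plus elastic net penalty: lam1*||w||_1 + (lam2/2)*||w||_2^2."""
    return logistic_loss(w, X, y) + lam1 * np.abs(w).sum() + 0.5 * lam2 * (w @ w)
```

The L1 term is non-smooth at zero, which is what drives coefficients exactly to zero and produces sparse models; the L2 term only shrinks them.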


Nowadays we see a growing number of problems where both the number of examples and the number of features are very large. Many problems grow beyond the capabilities of a single machine and must be handled by distributed systems. Distributed machine learning is now an area of active research.

We propose a new architecture for fitting logistic regression with regularizers in a distributed setting. Within this architecture we implement a new parallel coordinate descent algorithm for L1- and L2-regularized logistic regression and prove its convergence. We show how our algorithm can be modified to mitigate the slow-node problem, which is common in distributed machine learning. We empirically demonstrate the effectiveness of our algorithm and its implementation in comparison with several state-of-the-art methods.
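The details of the parallel algorithm are developed later; as background, a serial cyclic coordinate descent step for the L1-regularized case can be sketched as follows. This is a generic textbook-style sketch, not the paper's method: it uses the standard bound of 1/4 on the second derivative of the logistic loss to get a per-coordinate step size, and a soft-thresholding update to handle the L1 penalty:

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t*|.|: shrink z towards zero by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def cd_l1_logreg(X, y, lam, n_iter=50):
    """Cyclic coordinate descent for L1-regularized logistic regression.

    Labels y are in {-1, +1}. Illustrative sketch: each coordinate is
    updated by a prox-gradient step with step size 1/L_j, where L_j is
    an upper bound on the j-th diagonal of the Hessian.
    """
    n, d = X.shape
    w = np.zeros(d)
    margins = y * (X @ w)                 # cached, updated incrementally
    L = 0.25 * (X ** 2).sum(axis=0) / n   # sigma''(.) <= 1/4
    for _ in range(n_iter):
        for j in range(d):
            p = 1.0 / (1.0 + np.exp(margins))   # sigmoid(-margin)
            g = -np.mean(y * X[:, j] * p)       # partial derivative of loss
            w_new = soft_threshold(w[j] - g / L[j], lam / L[j])
            delta = w_new - w[j]
            if delta != 0.0:
                margins += y * X[:, j] * delta  # keep the cache consistent
                w[j] = w_new
    return w
```

The incremental update of the cached margins is what makes each coordinate step cheap; parallelizing such updates safely across machines is exactly the difficulty the proposed architecture addresses.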