Efficient Elastic Net Regularization for Sparse Linear Models in the Multilabel Setting

Zachary Chase Lipton
Machine Learning Scientist, Amazon; University of California, San Diego, USA
Absent regularization, linear models are easily and quickly trained on sparse data, since each stochastic gradient update need touch only the non-zero features of the current example. However, working with large sparse datasets requires both regularization and efficiency. Elastic net, which generalizes L1 and squared L2 regularization, is a popular choice of regularizer because it simultaneously confers the model sparsity of L1 regularization and the generally superior predictive performance of squared L2 regularization. In this paper, we present an algorithm for efficient training of sparse linear models with elastic net regularization. Our algorithm applies stochastic gradient updates to non-zero features only, lazily bringing weights current as needed with closed-form updates. Although closed-form delayed updates have been described previously for the L1, L-infinity, and rarely used (unsquared) L2 regularizers, the corresponding updates for elastic net, both with and without a decayed learning rate, are novel.
The new elastic net method handles both fixed and varying learning rates, and both standard stochastic gradient descent (SGD) and forward backward splitting (FoBoS). Our regularization calculations use dynamic programming to perform each delayed update in constant time, even when the learning rate varies. Our methods are especially advantageous in the multilabel setting, because the computational overhead of the delayed updates is shared across labels. We confirm the correctness of our algorithm with experiments
on rcv1 and bookmarks, the two large, sparse, publicly available multilabel datasets. On rcv1, we train logistic regression with elastic net regularization 489.7 times faster than a naive implementation, achieving the benefits of sparsity with a negligible penalty for the additional computation.
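
To make the delayed-update idea concrete before the formal development, the following Python sketch brings a single weight current after k consecutive steps in which its feature was zero, for the simplest case of standard SGD with a constant learning rate. The function name, variable names, and truncate-at-zero convention are illustrative assumptions rather than the algorithm as derived below; in particular, the dynamic programming needed for decayed learning rates and the FoBoS variant are omitted.

    def lazy_elastic_net_update(w, k, eta, lam1, lam2):
        """Sketch: bring one weight current after k skipped regularization steps.

        Assumes plain SGD with a constant learning rate eta and the elastic net
        penalty lam1 * |w| + (lam2 / 2) * w**2.  Each skipped step would apply
            w <- w * (1 - eta * lam2) - eta * lam1 * sign(w),
        truncating at zero if the weight crosses it.  Unrolling this recursion
        gives a geometric series, so all k steps collapse into one update.
        """
        if w == 0.0 or k == 0:
            return w
        s = 1.0 if w > 0.0 else -1.0               # sign of the stale weight
        decay = (1.0 - eta * lam2) ** k            # accumulated squared-L2 shrinkage
        if lam2 > 0.0:
            shrink = (lam1 / lam2) * (1.0 - decay)  # closed-form geometric sum
        else:
            shrink = k * eta * lam1                 # pure L1 limit
        w_new = decay * w - s * shrink
        # A weight that would have crossed zero stays at zero thereafter, so
        # truncating the final value reproduces the step-by-step result.
        return w_new if w_new * s > 0.0 else 0.0

In a full implementation, one such call per non-zero feature of the current example replaces the k regularization steps that a naive implementation applies explicitly; roughly speaking, the varying-learning-rate case replaces the geometric series above with cached running sums and products, which is where the dynamic programming enters.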