MatrixNet Applications at Yandex

Michael Levin
Russia, Yandex Data Factory, Yandex
MatrixNet is a proprietary machine learning tool developed by Yandex and used widely throughout the company. It supports several learning modes, including ranking, regression, and classification. The algorithm is based on gradient boosting over decision trees, but its implementation includes many heuristics that provide strong protection against overfitting. The heuristics and their parameters were optimized on an array of datasets of different kinds, both internal and publicly accessible, so that models built with the default parameters work well on all of those datasets. MatrixNet was optimized for speed, supported parallelized learning early on, often outperformed other algorithms even with default parameters, and did not require much parameter tuning, so it spread quickly through the company after its introduction in 2009.
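For readers unfamiliar with the base technique, the following is a minimal sketch of plain gradient boosting over decision trees for squared-error regression. It is not MatrixNet's implementation and contains none of its heuristics; the single-split trees (stumps) and the names `fit_stump`, `fit_boosting`, `predict`, `n_trees` and `learning_rate` are assumptions chosen for illustration only.

```python
from typing import List, Tuple

# A stump is (feature index, threshold, value if x <= threshold, value otherwise).
Stump = Tuple[int, float, float, float]
Model = Tuple[float, float, List[Stump]]  # (base prediction, learning rate, stumps)


def fit_stump(X: List[List[float]], residuals: List[float]) -> Stump:
    """Find the single-split tree with the lowest squared error on the residuals."""
    mean_all = sum(residuals) / len(residuals)
    # Fallback: a "split" at +inf that predicts the mean for every row.
    best: Stump = (0, float("inf"), mean_all, mean_all)
    best_err = sum((r - mean_all) ** 2 for r in residuals)
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            if not left or not right:
                continue
            lv = sum(left) / len(left)
            rv = sum(right) / len(right)
            err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if err < best_err:
                best, best_err = (j, threshold, lv, rv), err
    return best


def predict_stump(stump: Stump, row: List[float]) -> float:
    j, threshold, lv, rv = stump
    return lv if row[j] <= threshold else rv


def fit_boosting(X: List[List[float]], y: List[float],
                 n_trees: int = 100, learning_rate: float = 0.1) -> Model:
    """Fit an additive model: each new stump is trained on the current residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_trees):
        # For squared loss, the negative gradient is proportional to the residual.
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        # Shrink each stump's contribution by the learning rate.
        preds = [p + learning_rate * predict_stump(stump, row)
                 for p, row in zip(preds, X)]
    return base, learning_rate, stumps


def predict(model: Model, row: List[float]) -> float:
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)


# Toy data: the target is a step function of the single feature.
X_toy = [[0.0], [1.0], [2.0], [3.0]]
y_toy = [0.0, 0.0, 1.0, 1.0]
model = fit_boosting(X_toy, y_toy)
```

The shrinkage factor `learning_rate` is one standard guard against overfitting in generic boosting; the abstract does not say which heuristics MatrixNet itself uses, so none are assumed here.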
We start with some details of the algorithm itself, then discuss its applications in web search, ad click prediction, user segmentation, music recommendations, ad relevance, bot detection, and other areas. We note some limitations of the algorithm and how they affect the feature engineering process, then describe the feature selection process based on feature evaluation, and finish with conclusions drawn from six years of operational experience.