Combinatorial Theory of Overfitting
Overfitting is one of the most challenging problems in statistical learning theory. Classical approaches recommend restricting the complexity of the search space of classifiers. Recent approaches benefit from a more refined analysis of a localized part of the search space. The combinatorial theory of overfitting is a new, developing approach that gives tight data-dependent bounds on the probability of overfitting. It requires a binary loss function and uses a detailed representation of the search space in the form of a directed acyclic graph. The size of the graph is usually enormous, but the bounds can be estimated efficiently by walking through a small localized part of the graph that contains the best classifiers. We derive exact combinatorial bounds for several nontrivial model sets of classifiers. We also apply combinatorial bounds to real data sets to build voting ensembles of low-dimensional linear classifiers and conjunction rules.