Modified Naive Bayes with Hurst Exponent as Quantitative Measure of Data Mutual Dependence

Gleb Turkanov
Russia, Russian Academy of Sciences, Moscow Institute of Physics and Technology
The so-called naive Bayes classifier rests on the assumption that the characteristics (features) in question are independent. That is, the estimated probability of an event with a given set of features is, according to Bayes, built from the product of the conditional probabilities of the event given each of those features.
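In symbols, one way to write this estimate, assuming the features are conditionally independent given the event, is shown here. The notation $E$ for the event and $f_1, \dots, f_n$ for its features is introduced for illustration and is not taken from the original text.

$$P(E \mid f_1, \dots, f_n) \;\propto\; P(E) \prod_{i=1}^{n} P(f_i \mid E) \;\propto\; P(E)^{1-n} \prod_{i=1}^{n} P(E \mid f_i)$$

The second form follows from Bayes' rule, $P(f_i \mid E) = P(E \mid f_i)\,P(f_i) / P(E)$, with the factors that do not depend on $E$ absorbed into the proportionality sign.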
The practical usefulness of the Bayes estimate is determined mostly not by its precision but by its covariance with the observed probability: the higher the observed event probability, the higher the estimate.
An essential condition for the Bayes estimate to have this covariance property is that the number of features defining it stays constant.
For example, consider the important task of predicting the CTR (click-through rate) of an online ad banner from the known CTR statistics for banners of a given company, a given type of goods, etc. Almost all conditional probabilities in the Bayes product are low (below 1%). Therefore, a banner with many characteristics will receive an all the more conservative estimate, simply because its product contains more factors, and the estimate loses its covariance.
The simplest remedy when the number of characteristics varies is to replace the product with the geometric mean of the conditional probabilities.
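A minimal Python sketch of this effect follows. The function names and the per-feature CTR values are hypothetical, invented only to illustrate how the plain product shrinks as features are added while the geometric mean stays on the scale of a single probability.

```python
import math


def bayes_product(probs):
    """Plain product of per-feature conditional probabilities."""
    return math.prod(probs)


def geometric_mean(probs):
    """Geometric mean, computed in log space for numerical stability."""
    if not probs:
        raise ValueError("need at least one probability")
    return math.exp(sum(math.log(p) for p in probs) / len(probs))


# Hypothetical per-feature CTRs (all below 1%), invented for illustration.
few_features = [0.008, 0.006]
many_features = [0.008, 0.006, 0.007, 0.009, 0.005]

product_few = bayes_product(few_features)
product_many = bayes_product(many_features)
gmean_few = geometric_mean(few_features)
gmean_many = geometric_mean(many_features)

print(f"product: {product_few:.3e} (2 features) vs {product_many:.3e} (5 features)")
print(f"geometric mean: {gmean_few:.5f} (2 features) vs {gmean_many:.5f} (5 features)")
```

With these numbers the product for the five-feature banner is several orders of magnitude smaller than for the two-feature banner, whereas the two geometric means remain comparable, so banners with different numbers of features can be ranked on one scale.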
To further improve the covariance of the classifier when the number of characteristics varies, it is necessary to analyze how the Bayesian product depends on the number of factors.
If the total number of characteristics is large (thousands) while only a small part of them (tens) is used to classify each event, it is natural to expect the logarithm of the Bayesian product to grow asymptotically linearly with the number of factors.
This work presents a modification of the Bayes classifier based on the next term of the asymptotic expansion of the logarithm of the Bayes product, from which the Hurst exponent is obtained.
The Hurst exponent arises from the self-similarity of the data and is a quantitative measure of their mutual dependence.
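As an illustration only, the expansion described above could take a form such as the following, where $p_1, \dots, p_n$ are the conditional probabilities entering the product. The coefficients $a$ and $b$ and the power-law shape of the correction are assumptions introduced here, not the formula of the original work.

$$\ln \prod_{i=1}^{n} p_i \;\approx\; a\,n + b\,n^{H}, \qquad 0 < H < 1$$

Here the linear term $a\,n$ is the leading growth and $H$ plays the role of the Hurst exponent. For a sum of independent terms with finite variance, the fluctuations around the linear trend scale as $n^{1/2}$, so a value of $H$ different from $1/2$ would signal dependence among the features.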
The experimental results confirm the assumption underlying the research: the additional information, in the form of fractal dimension, makes a positive contribution to prediction.