Evgeny Burnaev, Alexey Zaytsev, Vladimir Spokoiny

Critical Sample Size for Bayesian Inference in Gaussian Process Regression
Gaussian processes (GPs) provide a principled, practical, probabilistic approach to learning in kernel machines. GPs have received increased attention in the machine-learning community over the past decade, and regression based on Gaussian processes (GP regression) is widely used in various applications.
However, theoretical results about its properties are currently known only for very special cases. At the same time, a theoretical analysis of GP regression for high-dimensional data and finite sample size, as well as an analysis of its behaviour under possible model misspecification, is of vital importance. Such an analysis provides grounds for justifying marginal likelihood maximisation, the de facto standard procedure in the machine learning industry for estimating GP parameters, in practically important cases.
In the framework of GP regression it is assumed that an unknown function is generated by a Gaussian stochastic process. Therefore, it is very natural to analyze this regression method theoretically using the Bayesian approach. The central result of Bayesian statistics is the celebrated Bernstein-von Mises Theorem (BvM), which states that the posterior distribution of the unknown parameter vector defining the GP regression model is close to the corresponding normal distribution.
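To make the setting concrete, the following is a minimal sketch of GP regression with a squared-exponential covariance function. The kernel choice, the parameter names (`length_scale`, `signal_var`, `noise_var`) and the synthetic data are assumptions made for illustration only; they are not taken from the talk. The function `log_marginal_likelihood` is the quantity whose maximisation over the covariance parameters is discussed above.

```python
import numpy as np

def sq_exp_kernel(X1, X2, length_scale, signal_var):
    """Squared-exponential covariance between the rows of X1 and X2."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return signal_var * np.exp(-0.5 * d2 / length_scale**2)

def log_marginal_likelihood(X, y, length_scale, signal_var, noise_var):
    """Gaussian log-likelihood of y under a zero-mean GP prior plus i.i.d. noise."""
    n = X.shape[0]
    K = sq_exp_kernel(X, X, length_scale, signal_var) + noise_var * np.eye(n)
    L = np.linalg.cholesky(K)                      # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # alpha = K^{-1} y
    # log det K = 2 * sum(log diag L)
    return -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * n * np.log(2 * np.pi)

def predict_mean(X, y, X_new, length_scale, signal_var, noise_var):
    """Posterior predictive mean at the rows of X_new."""
    n = X.shape[0]
    K = sq_exp_kernel(X, X, length_scale, signal_var) + noise_var * np.eye(n)
    K_new = sq_exp_kernel(X_new, X, length_scale, signal_var)
    return K_new @ np.linalg.solve(K, y)

# Hypothetical toy data: 30 points in two dimensions.
rng = np.random.default_rng(0)
X = rng.uniform(size=(30, 2))
y = np.sin(3.0 * X[:, 0]) + 0.1 * rng.standard_normal(30)

ll = log_marginal_likelihood(X, y, length_scale=0.5, signal_var=1.0, noise_var=0.01)
pred = predict_mean(X, y, X[:5], length_scale=0.5, signal_var=1.0, noise_var=0.01)
```

In this sketch the covariance parameters enter the likelihood only through the matrix `K`, both via its inverse and via its determinant, which is the kind of dependence that makes the analysis of the posterior over these parameters involved.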
Classical asymptotic methods of statistics are not suited to analyzing the posterior distribution when the parameter dimension grows and the sample size is finite. Therefore, new statistical approaches, based on an advanced theory of empirical processes, are necessary to perform the analysis.
In this work we describe finite-sample properties of the posterior in potentially growing dimension for a rather complicated parametric model. In particular:
  1. Non-asymptotic bounds on the approximation accuracy in our theorem are exact and do not involve any small asymptotic terms.
  1. One can track the impact of important quantities such as the sample size n and the dimension p on the accuracy of the Gaussian approximation of the posterior.
     The bound relating p^4 to n is especially informative and useful. An explicit expression for the error bound on the posterior approximation can also be obtained from the proofs.
  1. The study allows the parametric model to be misspecified.
  1. Parameters enter the considered GP model in a rather complicated manner, via the covariance matrix of the data, which makes the analysis very involved.
The results obtained give constructive answers to all the questions posed, including the dependence on the sample size and on the dimension of the parameter space. These issues are not covered by the existing literature and are practically relevant for machine learning applications.
Plan of the talk:
  1. Gaussian Processes Regression:
– Main assumptions
– Covariance function modeling
– Predictive distribution construction
– Estimation of covariance function parameters by maximum quasi-likelihood
  1. BvM Theorem
– Asymptotic BvM Theorem for Gaussian Processes Regression: review of known results
– New challenges: growing dimension of the parameter space, finite sample size and possibly misspecified parametric assumption
  1. Non-asymptotic BvM Theorem for Gaussian Processes Regression
– Used assumptions and their discussion
– Main Results
  1. Computational experiments
– Proximity of the posterior distribution to the corresponding normal distribution. Dependence on the sample size and the parameter space dimension.
– Critical sample size: when does the BvM Theorem stop working?
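
As a rough illustration of the kind of computational experiment listed in the plan, the sketch below compares a posterior over a single covariance parameter with its normal approximation. It is not the authors' experiment: the one-dimensional parameter (a length-scale), the flat prior on a grid, the squared-exponential kernel, the fixed noise variance and the use of total variation distance are all assumptions of this sketch, and the function name `tv_posterior_vs_normal` is hypothetical. Because the parameter dimension is one, the sketch only shows dependence on the sample size, not on the dimension.

```python
import numpy as np

def kernel(x1, x2, length_scale):
    """Squared-exponential covariance for 1-D inputs (unit signal variance)."""
    return np.exp(-0.5 * (x1[:, None] - x2[None, :]) ** 2 / length_scale**2)

def log_lik(x, y, length_scale, noise_var=0.01):
    """Gaussian log-likelihood of y for a given length-scale."""
    n = len(x)
    K = kernel(x, x, length_scale) + noise_var * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * n * np.log(2 * np.pi)

def tv_posterior_vs_normal(n, true_scale=0.3, seed=0):
    """Total variation distance between a grid posterior and a normal approximation."""
    rng = np.random.default_rng(seed)
    x = np.sort(rng.uniform(0.0, 1.0, size=n))
    # Simulate one GP sample path observed with noise.
    K_true = kernel(x, x, true_scale) + 0.01 * np.eye(n)
    y = np.linalg.cholesky(K_true) @ rng.standard_normal(n)

    grid = np.linspace(0.05, 2.0, 400)
    step = grid[1] - grid[0]
    ll = np.array([log_lik(x, y, s) for s in grid])

    # Posterior under a flat prior on the grid interval (assumption of this sketch).
    post = np.exp(ll - ll.max())
    post /= post.sum() * step

    # Normal approximation centred at the grid maximiser, with variance taken
    # from the finite-difference curvature of the log-likelihood there.
    i = int(np.clip(np.argmax(ll), 1, len(grid) - 2))
    curvature = (ll[i + 1] - 2.0 * ll[i] + ll[i - 1]) / step**2
    var = 1.0 / max(-curvature, 1e-12)  # guard against non-negative curvature
    normal = np.exp(-0.5 * (grid - grid[i]) ** 2 / var)
    normal /= normal.sum() * step       # normalised on the same grid

    return 0.5 * np.abs(post - normal).sum() * step

tv_small = tv_posterior_vs_normal(n=20)
tv_large = tv_posterior_vs_normal(n=200)
```

Running the function over a range of values of `n` and several seeds gives an empirical picture of how the distance to the normal approximation changes with the sample size for this toy model; no particular outcome is claimed here.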