
Sparse Multiscale Gaussian Process Regression

Most existing sparse Gaussian process (g.p.) models seek computational advantages by basing their computations on a set of m basis functions that are the covariance function of the g.p. with one of its two inputs fixed. We generalise this for the case of the Gaussian covariance function, by basing our computations on m Gaussian basis functions with arbitrary diagonal covariance matrices (or length scales). For a fixed number of basis functions and any given criterion, this additional flexibility permits approximations no worse than, and typically better than, what was previously possible. We perform gradient-based optimisation of the marginal likelihood, which costs O(m²n) time, where n is the number of data points, and compare the method to various other sparse g.p. methods. Although we focus on g.p. regression, the central idea is applicable to all kernel-based algorithms, and we also provide some results for the support vector machine (s.v.m.) and kernel ridge regression (k.r.r.). Our approach outperforms the other methods, particularly in the case of very few basis functions, i.e. a very high sparsity ratio.
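
As a rough illustration of the multiscale-basis idea (not the authors' actual sparse g.p. formulation), the sketch below contrasts the usual choice of basis functions, which all share the g.p. kernel's length scales, with Gaussian basis functions that each carry their own diagonal length scales. A simple regularised least-squares fit stands in for the full sparse g.p. machinery; all names and the toy data are illustrative assumptions.

```python
import numpy as np

def gaussian_basis(X, centres, lengthscales):
    """Phi[i, j] = exp(-0.5 * sum_d ((X[i,d] - centres[j,d]) / lengthscales[j,d])**2).

    Standard sparse g.p. bases correspond to every row of `lengthscales`
    being identical (the kernel's own length scales); the multiscale idea
    is to give each basis function j its own diagonal length scales.
    """
    diff = X[:, None, :] - centres[None, :, :]                      # shape (n, m, d)
    return np.exp(-0.5 * np.sum((diff / lengthscales[None, :, :]) ** 2, axis=2))

def fit_ridge(Phi, y, noise=1e-2):
    """Regularised least squares in the chosen basis (a stand-in, not the paper's posterior)."""
    m = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + noise * np.eye(m), Phi.T @ y)

# Toy 1-D example: m = 3 basis functions for n = 50 points.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))
y = np.sin(2 * X[:, 0]) + 0.1 * rng.standard_normal(50)

centres = np.array([[-2.0], [0.0], [2.0]])
shared = np.full((3, 1), 1.0)                 # one length scale shared by all bases
multiscale = np.array([[0.5], [1.0], [2.0]])  # a separate length scale per basis

for name, ls in [("shared", shared), ("multiscale", multiscale)]:
    Phi = gaussian_basis(X, centres, ls)
    w = fit_ridge(Phi, y)
    print(name, "training MSE:", np.mean((Phi @ w - y) ** 2))
```

In the paper, the per-basis length scales (and centres) are instead treated as hyperparameters and tuned by gradient-based optimisation of the g.p. marginal likelihood; the sketch only shows why the extra flexibility can help for a fixed, small number of basis functions.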

Author(s): Walder, C. and Kim, KI. and Schölkopf, B.
Book Title: Proceedings of the 25th International Conference on Machine Learning
Pages: 1112-1119
Year: 2008
Month: July
Editors: W. W. Cohen, A. McCallum and S. Roweis
Publisher: ACM Press
Bibtex Type: Conference Paper (inproceedings)
Address: New York, NY, USA
DOI: 10.1145/1390156.1390296
Event Name: ICML 2008
Event Place: Helsinki, Finland
Language: en
Organization: Max-Planck-Gesellschaft
School: Biologische Kybernetik

BibTeX

@inproceedings{5121,
  title = {Sparse Multiscale Gaussian Process Regression},
  booktitle = {Proceedings of the 25th International Conference on Machine Learning},
  abstract = {Most existing sparse Gaussian process (g.p.) models seek computational advantages by basing their computations on a set of m basis functions that are the covariance function of the g.p. with one of its two inputs fixed. We generalise this for the case of the Gaussian covariance function, by basing our computations on m Gaussian basis functions with arbitrary diagonal covariance matrices (or length scales). For a fixed number of basis functions and any given criterion, this additional flexibility permits approximations no worse than, and typically better than, what was previously possible. We perform gradient-based optimisation of the marginal likelihood, which costs O(m^2 n) time, where n is the number of data points, and compare the method to various other sparse g.p. methods. Although we focus on g.p. regression, the central idea is applicable to all kernel-based algorithms, and we also provide some results for the support vector machine (s.v.m.) and kernel ridge regression (k.r.r.). Our approach outperforms the other methods, particularly in the case of very few basis functions, i.e. a very high sparsity ratio.},
  pages = {1112--1119},
  editors = {WW Cohen and A McCallum and S Roweis},
  publisher = {ACM Press},
  organization = {Max-Planck-Gesellschaft},
  school = {Biologische Kybernetik},
  address = {New York, NY, USA},
  month = jul,
  year = {2008},
  slug = {5121},
  author = {Walder, C. and Kim, KI. and Sch{\"o}lkopf, B.},
  month_numeric = {7}
}