Empirical Inference Technical Report 2007

Sparse Multiscale Gaussian Process Regression

Most existing sparse Gaussian process (g.p.) models seek computational advantages by basing their computations on a set of m basis functions that are the covariance function of the g.p. with one of its two inputs fixed. We generalise this for the case of the Gaussian covariance function by basing our computations on m Gaussian basis functions with arbitrary diagonal covariance matrices (or length scales). For a fixed number of basis functions and any given criterion, this additional flexibility permits approximations no worse and typically better than was previously possible. Although we focus on g.p. regression, the central idea is applicable to all kernel-based algorithms, such as the support vector machine. We perform gradient-based optimisation of the marginal likelihood, which costs O(m²n) time, where n is the number of data points, and compare the method to various other sparse g.p. methods. Our approach outperforms the other methods, particularly for the case of very few basis functions, i.e. a very high sparsity ratio.
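To make the idea concrete, below is a minimal NumPy sketch of regression with such multiscale Gaussian basis functions. It is not the authors' exact formulation (the paper works in the reproducing kernel Hilbert space of the Gaussian covariance); here the m-by-m regulariser is built from L2 inner products of the basis functions, which is merely one symmetric positive-semidefinite choice, and all names (gauss_basis, sparse_predict) are illustrative. What the sketch does show is the key structural point: each basis function carries its own length scales, and the dominant cost is forming the m-by-m system from the n-by-m design matrix, i.e. the O(m²n) scaling mentioned above.

    import numpy as np

    def gauss_basis(X, C, S):
        """(n, m) design matrix of Gaussian basis functions.

        X : (n, d) inputs; C : (m, d) centres; S : (m, d) length scales.
        Row j of S holds basis function j's own per-dimension length
        scales -- the "multiscale" generalisation. A standard sparse
        g.p. would tie every row of S to the single length scale of
        the Gaussian covariance function.
        """
        d2 = (((X[:, None, :] - C[None, :, :]) / S[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2)

    def sparse_predict(Xtr, y, Xte, C, S, noise=0.1):
        """Regularised-regression predictive mean over the m basis
        functions. Forming Phi.T @ Phi dominates at O(m^2 n); the
        solve is O(m^3), with m << n."""
        Phi = gauss_basis(Xtr, C, S)                    # n x m
        # Symmetric PSD regulariser from L2 inner products of the
        # Gaussian bumps (an assumption for this sketch, not the
        # paper's RKHS inner product), in closed form via the
        # standard Gaussian product integral.
        S2 = S[:, None, :] ** 2 + S[None, :, :] ** 2    # m x m x d
        D2 = (C[:, None, :] - C[None, :, :]) ** 2       # m x m x d
        norm = np.prod(np.sqrt(2.0 * np.pi) * S[:, None, :] * S[None, :, :]
                       / np.sqrt(S2), axis=-1)
        G = norm * np.exp(-0.5 * (D2 / S2).sum(-1))     # m x m
        A = Phi.T @ Phi + noise ** 2 * G + 1e-10 * np.eye(len(C))
        w = np.linalg.solve(A, Phi.T @ y)
        return gauss_basis(Xte, C, S) @ w

    # Toy usage: n = 500 points, m = 10 basis functions.
    rng = np.random.default_rng(0)
    Xtr = rng.uniform(-3.0, 3.0, size=(500, 1))
    y = np.sin(Xtr[:, 0]) + 0.1 * rng.standard_normal(500)
    C = np.linspace(-3.0, 3.0, 10)[:, None]   # centres
    S = np.full_like(C, 0.8)                  # per-basis length scales
    print(sparse_predict(Xtr, y, np.array([[0.5]]), C, S))

In the paper, the centres and per-basis length scales (C and S above) are the quantities tuned by gradient-based optimisation of the marginal likelihood; the sketch simply fixes them by hand.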

Author(s): Walder, C. and Kim, KI. and Schölkopf, B.
Number (issue): 162
Year: 2007
Month: August
Bibtex Type: Technical Report (techreport)
Institution: Max Planck Institute for Biological Cybernetics, Tübingen, Germany
Language: en
Organization: Max-Planck-Gesellschaft
School: Biologische Kybernetik

BibTeX

@techreport{5102,
  title = {Sparse Multiscale Gaussian Process Regression},
  abstract = {Most existing sparse Gaussian process (g.p.) models seek computational advantages by basing their
  computations on a set of m basis functions that are the covariance function of the g.p. with one of its two inputs
  fixed. We generalise this for the case of the Gaussian covariance function by basing our computations on m
  Gaussian basis functions with arbitrary diagonal covariance matrices (or length scales). For a fixed number of
  basis functions and any given criterion, this additional flexibility permits approximations no worse and typically
  better than was previously possible. Although we focus on g.p. regression, the central idea is applicable to all
  kernel-based algorithms, such as the support vector machine. We perform gradient-based optimisation of the
  marginal likelihood, which costs $O(m^2n)$ time, where n is the number of data points, and compare the method to
  various other sparse g.p. methods. Our approach outperforms the other methods, particularly for the case of very
  few basis functions, i.e. a very high sparsity ratio.},
  number = {162},
  organization = {Max-Planck-Gesellschaft},
  institution = {Max Planck Institute for Biological Cybernetics, Tübingen, Germany},
  school = {Biologische Kybernetik},
  month = aug,
  year = {2007},
  slug = {5102},
  author = {Walder, C. and Kim, KI. and Sch{\"o}lkopf, B.},
  month_numeric = {8}
}