Empirische Inferenz Technical Report 2005

Measuring Statistical Dependence with Hilbert-Schmidt Norms

We propose an independence criterion based on the eigenspectrum of covariance operators in reproducing kernel Hilbert spaces (RKHSs), consisting of an empirical estimate of the Hilbert-Schmidt norm of the cross-covariance operator (we term this a Hilbert-Schmidt Independence Criterion, or HSIC). This approach has several advantages, compared with previous kernel-based independence criteria. First, the empirical estimate is simpler than any other kernel dependence test, and requires no user-defined regularisation. Second, there is a clearly defined population quantity which the empirical estimate approaches in the large sample limit, with exponential convergence guaranteed between the two: this ensures that independence tests based on HSIC do not suffer from slow learning rates. Finally, we show in the context of independent component analysis (ICA) that the performance of HSIC is competitive with that of previously published kernel-based criteria, and of other recently published ICA methods.
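To make the abstract's "empirical estimate" concrete, the following is a minimal sketch of the biased HSIC estimator, trace(KHLH) / (m - 1)^2, where K and L are the Gram matrices of the two samples and H = I - (1/m) 11^T is the centering matrix. The Gaussian kernel, the bandwidth sigma = 1.0, the restriction to scalar samples, and all function names are assumptions made for illustration; they are not prescribed by this record. It uses only the Python standard library, so it is slower than an equivalent matrix-library version.

```python
import math
import random


def gaussian_gram(xs, sigma=1.0):
    """Gram matrix K[i][j] = exp(-(x_i - x_j)^2 / (2 sigma^2)) for scalar samples."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2)) for b in xs] for a in xs]


def center(K):
    """Return H K H with H = I - (1/m) 11^T, computed from row and column means."""
    m = len(K)
    row_means = [sum(row) / m for row in K]
    col_means = [sum(K[i][j] for i in range(m)) / m for j in range(m)]
    total_mean = sum(row_means) / m
    return [
        [K[i][j] - row_means[i] - col_means[j] + total_mean for j in range(m)]
        for i in range(m)
    ]


def hsic(xs, ys, sigma=1.0):
    """Biased empirical HSIC: trace(K H L H) / (m - 1)^2 (assumed Gaussian kernels)."""
    m = len(xs)
    if m != len(ys) or m < 2:
        raise ValueError("xs and ys must have the same length, at least 2")
    Kc = center(gaussian_gram(xs, sigma))
    L = gaussian_gram(ys, sigma)
    # trace(K H L H) = trace((H K H) L) = sum_ij Kc[i][j] * L[j][i]; L is symmetric.
    trace = sum(Kc[i][j] * L[i][j] for i in range(m) for j in range(m))
    return trace / (m - 1) ** 2


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.gauss(0.0, 1.0) for _ in range(200)]
    # Dependent but linearly uncorrelated pair: y is a noisy function of x^2.
    y_dep = [x * x + 0.1 * rng.gauss(0.0, 1.0) for x in xs]
    # Independent pair: y is drawn without reference to x.
    y_ind = [rng.gauss(0.0, 1.0) for _ in range(200)]
    print("HSIC, dependent pair:  ", hsic(xs, y_dep))
    print("HSIC, independent pair:", hsic(xs, y_ind))
```

In this sketch the statistic should be noticeably larger for the dependent pair than for the independent one. Note that no regularisation parameter appears anywhere, which matches the abstract's first claim; the kernel bandwidth is the only free choice.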

Author(s): Gretton, A. and Bousquet, O. and Smola, AJ. and Schölkopf, B.
Number (issue): 140
Year: 2005
Month: June
Bibtex Type: Technical Report (techreport)
Institution: Max Planck Institute for Biological Cybernetics, Tübingen, Germany
Language: en
Organization: Max-Planck-Gesellschaft
School: Biologische Kybernetik

BibTeX

@techreport{3437,
  title = {Measuring Statistical Dependence with Hilbert-Schmidt Norms},
  abstract = {We propose an independence criterion based on the eigenspectrum of covariance operators in reproducing kernel Hilbert spaces (RKHSs), consisting of an empirical estimate of the Hilbert-Schmidt norm of the cross-covariance operator (we term this a Hilbert-Schmidt Independence Criterion, or HSIC). This approach has several advantages, compared with previous kernel-based independence criteria. First, the empirical estimate is simpler than any other kernel dependence test, and requires no user-defined regularisation. Second, there is a clearly defined population quantity which the empirical estimate approaches in the large sample limit, with exponential convergence guaranteed between the two: this ensures that independence tests based on HSIC do not suffer from slow learning rates. Finally, we show in the context of independent component analysis (ICA) that the performance of HSIC is competitive with that of previously published kernel-based criteria, and of other recently published ICA methods.},
  number = {140},
  organization = {Max-Planck-Gesellschaft},
  institution = {Max Planck Institute for Biological Cybernetics, Tübingen, Germany},
  school = {Biologische Kybernetik},
  month = jun,
  year = {2005},
  slug = {3437},
  author = {Gretton, A. and Bousquet, O. and Smola, AJ. and Sch{\"o}lkopf, B.},
  month_numeric = {6}
}