Empirische Inferenz Conference Paper 2008

A Kernel Statistical Test of Independence

Whereas kernel measures of independence have been widely applied in machine learning (notably in kernel ICA), there is as yet no method to determine whether they have detected statistically significant dependence. We provide a novel test of the independence hypothesis for one particular kernel independence measure, the Hilbert-Schmidt independence criterion (HSIC). The resulting test costs O(m^2), where m is the sample size. We demonstrate that this test outperforms established contingency table-based tests. Finally, we show the HSIC test also applies to text (and to structured data more generally), for which no other independence test presently exists.
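
As an illustration of the O(m^2) cost mentioned in the abstract, here is a minimal sketch of an HSIC-based independence test for scalar samples. It computes the biased empirical HSIC, (1/m^2) trace(K H L H) with H = I - (1/m) 1 1^T, directly from centred Gram matrices. Several choices are assumptions of this sketch rather than details stated in the abstract: the Gaussian kernel with a fixed bandwidth `sigma`, the function names, and in particular the permutation-based null distribution, which is a generic stand-in and not necessarily the procedure the paper uses to set the test threshold.

```python
import math
import random


def gaussian_gram(values, sigma=1.0):
    """Gram matrix K[i][j] = exp(-(v_i - v_j)^2 / (2 sigma^2)) for scalar data."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2)) for b in values]
            for a in values]


def center(gram):
    """Return H K H with H = I - (1/m) 1 1^T, computed in O(m^2)."""
    m = len(gram)
    row_means = [sum(row) / m for row in gram]
    total_mean = sum(row_means) / m
    # The Gram matrix is symmetric, so column means equal row means.
    return [[gram[i][j] - row_means[i] - row_means[j] + total_mean
             for j in range(m)] for i in range(m)]


def hsic_biased(centered_k, gram_l, index=None):
    """Biased HSIC (1/m^2) trace(K H L H); `index` optionally permutes the y sample."""
    m = len(gram_l)
    idx = index if index is not None else list(range(m))
    total = 0.0
    for i in range(m):
        row_k, row_l = centered_k[i], gram_l[idx[i]]
        for j in range(m):
            total += row_k[j] * row_l[idx[j]]
    return total / (m * m)


def hsic_permutation_test(x, y, n_perm=200, sigma=1.0, seed=0):
    """Return (statistic, p_value); the permutation null is an assumption of this sketch."""
    rng = random.Random(seed)
    kc = center(gaussian_gram(x, sigma))
    gl = gaussian_gram(y, sigma)
    observed = hsic_biased(kc, gl)
    idx = list(range(len(x)))
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(idx)  # break the pairing between x and y
        if hsic_biased(kc, gl, idx) >= observed:
            exceed += 1
    # The +1 terms keep the p-value strictly positive.
    return observed, (1 + exceed) / (1 + n_perm)
```

Calling `hsic_permutation_test(x, y)` on two equal-length lists of floats returns the statistic and an estimated p-value. Each evaluation of the statistic touches every pair of samples once, which is where the quadratic cost in the sample size m comes from; the permutation loop multiplies that cost by `n_perm`. Because the test only needs Gram matrices, the same sketch would apply to text or other structured data once a suitable kernel replaces `gaussian_gram`.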

Author(s): Gretton, A. and Fukumizu, K. and Teo, CH. and Song, L. and Schölkopf, B. and Smola, AJ.
Book Title: Advances in neural information processing systems 20
Journal: Advances in Neural Information Processing Systems 20: 21st Annual Conference on Neural Information Processing Systems 2007
Pages: 585-592
Year: 2008
Month: September
Editors: JC Platt and D Koller and Y Singer and S Roweis
Publisher: Curran
Bibtex Type: Conference Paper (inproceedings)
Address: Red Hook, NY, USA
Event Name: 21st Annual Conference on Neural Information Processing Systems (NIPS 2007)
Event Place: Vancouver, BC, Canada
Digital: 0
Electronic Archiving: grant_archive
ISBN: 978-1-605-60352-0
Language: en
Organization: Max-Planck-Gesellschaft
School: Biologische Kybernetik

BibTeX

@inproceedings{4928,
  title = {A Kernel Statistical Test of Independence},
  journal = {Advances in Neural Information Processing Systems 20: 21st Annual Conference on Neural Information Processing Systems 2007},
  booktitle = {Advances in neural information processing systems 20},
  abstract = {Whereas kernel measures of independence have been widely applied in machine learning (notably in kernel ICA), there is as yet no method to determine whether they have detected statistically significant dependence. We provide a novel test of the independence hypothesis for one particular kernel independence measure, the Hilbert-Schmidt independence criterion (HSIC). The resulting test costs O(m^2), where m is the sample size. We demonstrate that this test outperforms established contingency table-based tests. Finally, we show the HSIC test also applies to text (and to structured data more generally), for which no other independence test presently exists.},
  pages = {585--592},
  editor = {JC Platt and D Koller and Y Singer and S Roweis},
  publisher = {Curran},
  organization = {Max-Planck-Gesellschaft},
  school = {Biologische Kybernetik},
  address = {Red Hook, NY, USA},
  month = sep,
  year = {2008},
  slug = {4928},
  author = {Gretton, A. and Fukumizu, K. and Teo, CH. and Song, L. and Sch{\"o}lkopf, B. and Smola, AJ.},
  month_numeric = {9}
}