Empirical Inference Technical Report 2004

Hilbertian Metrics and Positive Definite Kernels on Probability Measures

We investigate the problem of defining Hilbertian metrics and positive definite kernels on probability measures, continuing previous work. This type of kernel has shown very good results in text classification and has a wide range of possible applications. In this paper we extend the two-parameter family of Hilbertian metrics of Topsoe so that it now includes all commonly used Hilbertian metrics on probability measures. This allows us to do model selection among these metrics in an elegant and unified way. Second, we further investigate our approach to incorporating similarity information of the probability space into the kernel. The analysis provides a better understanding of these kernels and in some cases yields a more efficient way to compute them. Finally, we compare all proposed kernels on two text classification problems and one image classification problem.

Author(s): Hein, M. and Bousquet, O.
Number (issue): 126
Year: 2004
Month: July
Bibtex Type: Technical Report (techreport)
Electronic Archiving: grant_archive
Institution: Max Planck Institute for Biological Cybernetics, Tübingen, Germany
Organization: Max-Planck-Gesellschaft
School: Biologische Kybernetik

BibTeX

@techreport{2815,
  title = {Hilbertian Metrics and Positive Definite Kernels on Probability Measures},
  abstract = {We investigate the problem of defining Hilbertian metrics and
  positive definite kernels on probability measures, continuing previous work. This type of kernel has shown very good
  results in text classification and has a wide range of possible
  applications. In this paper we extend the two-parameter family of
  Hilbertian metrics of Topsoe so that it now includes all
  commonly used Hilbertian metrics on probability measures. This
  allows us to do model selection among these metrics in an elegant
  and unified way. Second, we further investigate our approach to
  incorporating similarity information of the probability space into
  the kernel. The analysis provides a better understanding of these
  kernels and in some cases yields a more efficient way to compute
  them. Finally, we compare all proposed kernels on two text
  classification problems and one image classification problem.},
  number = {126},
  organization = {Max-Planck-Gesellschaft},
  institution = {Max Planck Institute for Biological Cybernetics, T{\"u}bingen, Germany},
  school = {Biologische Kybernetik},
  month = jul,
  year = {2004},
  slug = {2815},
  author = {Hein, M. and Bousquet, O.},
  month_numeric = {7}
}