Empirical Inference Article 2004

A Compression Approach to Support Vector Model Selection

In this paper we investigate connections between statistical learning theory and data compression in the context of support vector machine (SVM) model selection. Inspired by several generalization bounds, we construct "compression coefficients" for SVMs that measure the amount by which the training labels can be compressed by a code built from the separating hyperplane. The main idea is to relate the coding precision to geometrical concepts such as the width of the margin or the shape of the data in the feature space. The resulting compression coefficients combine well-known quantities such as the radius-margin term R^2/rho^2, the eigenvalues of the kernel matrix, and the number of support vectors. To test whether they are useful in practice, we ran model selection experiments on benchmark data sets and found that compression coefficients can fairly accurately predict the parameters for which the test error is minimized.

Author(s): von Luxburg, U. and Bousquet, O. and Schölkopf, B.
Journal: Journal of Machine Learning Research
Volume: 5
Pages: 293-323
Year: 2004
Month: April
Bibtex Type: Article (article)
Language: en
Organization: Max-Planck-Gesellschaft
School: Biologische Kybernetik

BibTex

@article{2666,
  title = {A Compression Approach to Support Vector Model Selection},
  journal = {Journal of Machine Learning Research},
  abstract = {In this paper we investigate connections between statistical learning
  theory and data compression in the context of support vector machine (SVM)
  model selection. Inspired by several generalization bounds, we construct
  "compression coefficients" for SVMs that measure the amount by which the
  training labels can be compressed by a code built from the separating
  hyperplane. The main idea is to relate the coding precision to geometrical
  concepts such as the width of the margin or the shape of the data in the
  feature space. The resulting compression coefficients combine well-known
  quantities such as the radius-margin term R^2/rho^2, the eigenvalues of the
  kernel matrix, and the number of support vectors. To test whether they are
  useful in practice, we ran model selection experiments on benchmark data
  sets and found that compression coefficients can fairly accurately predict
  the parameters for which the test error is minimized.},
  volume = {5},
  pages = {293--323},
  organization = {Max-Planck-Gesellschaft},
  school = {Biologische Kybernetik},
  month = apr,
  year = {2004},
  slug = {2666},
  author = {von Luxburg, U. and Bousquet, O. and Sch{\"o}lkopf, B.},
  month_numeric = {4}
}