Empirical Inference Article 2008

Cross-validation Optimization for Large Scale Structured Classification Kernel Methods

We propose a highly efficient framework for penalized likelihood kernel methods applied to multi-class models with a large, structured set of classes. Unlike many previous approaches, which try to decompose the fitting problem into many smaller ones, we focus on a Newton optimization of the complete model, making use of model structure and linear conjugate gradients in order to approximate Newton search directions. Crucially, our learning method is based entirely on matrix-vector multiplication primitives with the kernel matrices and their derivatives, allowing straightforward specialization to new kernels and focusing code optimization efforts on these primitives only. Kernel parameters are learned automatically by maximizing the cross-validation log likelihood in a gradient-based way, and predictive probabilities are estimated. We demonstrate our approach on large scale text classification tasks with hierarchical structure on thousands of classes, achieving state-of-the-art results in an order of magnitude less time than previous work.
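The core computational idea in the abstract — approximating Newton search directions with linear conjugate gradients, touching the (regularized) kernel matrix only through matrix-vector products — can be illustrated with a minimal sketch. This is not the paper's implementation; the RBF kernel, regularizer, and gradient vector below are placeholder assumptions chosen only to show the technique.

```python
import numpy as np

def cg_solve(matvec, b, tol=1e-6, max_iter=200):
    """Solve A x = b by linear conjugate gradients, accessing the
    symmetric positive definite matrix A only via matvec(v) = A v."""
    x = np.zeros_like(b)
    r = b - matvec(x)          # initial residual
    p = r.copy()               # initial search direction
    rs = r @ r
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = rs / (p @ Ap)  # step length along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p  # conjugate update of direction
        rs = rs_new
    return x

# Toy setup (hypothetical): an RBF Gram matrix plus a ridge term,
# standing in for the Newton system's curvature matrix.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
K = np.exp(-0.5 * ((X[:, None] - X[None]) ** 2).sum(-1))
A_mv = lambda v: K @ v + 0.1 * v   # (K + 0.1 I) v, never formed explicitly
g = rng.standard_normal(50)        # stand-in for a likelihood gradient
direction = cg_solve(A_mv, -g)     # approximate Newton step: solve A d = -g
```

Because the solver sees the kernel only through `A_mv`, swapping in a different kernel (or a fast structured multiply) requires changing that one closure, which is the specialization property the abstract highlights.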

Author(s): Seeger, M.
Journal: Journal of Machine Learning Research
Volume: 9
Pages: 1147-1178
Year: 2008
Month: June
Bibtex Type: Article (article)
Language: en
Organization: Max-Planck-Gesellschaft
School: Biologische Kybernetik

BibTex

@article{5242,
  title = {Cross-validation Optimization for Large Scale Structured Classification Kernel Methods},
  journal = {Journal of Machine Learning Research},
  abstract = {We propose a highly efficient framework for penalized likelihood kernel methods applied
  to multi-class models with a large, structured set of classes. As opposed to many previous
  approaches which try to decompose the fitting problem into many smaller ones, we focus
  on a Newton optimization of the complete model, making use of model structure and
  linear conjugate gradients in order to approximate Newton search directions. Crucially,
  our learning method is based entirely on matrix-vector multiplication primitives with the
  kernel matrices and their derivatives, allowing straightforward specialization to new kernels,
  and focusing code optimization efforts to these primitives only.
  Kernel parameters are learned automatically, by maximizing the cross-validation log
  likelihood in a gradient-based way, and predictive probabilities are estimated. We demonstrate
  our approach on large scale text classification tasks with hierarchical structure on
  thousands of classes, achieving state-of-the-art results in an order of magnitude less time
  than previous work.},
  volume = {9},
  pages = {1147--1178},
  organization = {Max-Planck-Gesellschaft},
  school = {Biologische Kybernetik},
  month = jun,
  year = {2008},
  slug = {5242},
  author = {Seeger, M.},
  month_numeric = {6}
}