Empirical Inference Conference Paper 2007

An Efficient Method for Gradient-Based Adaptation of Hyperparameters in SVM Models

We consider the task of tuning hyperparameters in SVM models by minimizing a smooth validation performance function, e.g., smoothed k-fold cross-validation error, using non-linear optimization techniques. The key computation in this approach is the gradient of the validation function with respect to the hyperparameters. We show that for large-scale problems involving a wide choice of kernel-based models and validation functions, this computation can be done very efficiently, often within just a fraction of the training time. Empirical results show that a near-optimal set of hyperparameters can be identified by our approach with very few training rounds and gradient computations.
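The idea in the abstract, treating hyperparameter selection as smooth optimization of a validation objective using its gradient, can be illustrated with a toy sketch. This is not the paper's efficient analytic-gradient method: it substitutes kernel ridge regression for the SVM (so the validation loss is smooth in closed form) and a finite-difference gradient in log-hyperparameter space, purely for illustration. All function names (`rbf_kernel`, `val_loss`, `tune`) are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """RBF kernel matrix between row sets A (n,d) and B (m,d)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def val_loss(log_gamma, Xtr, ytr, Xva, yva, lam=1e-2):
    """Train kernel ridge regression, return smooth validation MSE."""
    gamma = np.exp(log_gamma)
    K = rbf_kernel(Xtr, Xtr, gamma)
    alpha = np.linalg.solve(K + lam * np.eye(len(Xtr)), ytr)  # "training"
    pred = rbf_kernel(Xva, Xtr, gamma) @ alpha                # validation predictions
    return np.mean((pred - yva) ** 2)

def tune(Xtr, ytr, Xva, yva, log_gamma=0.0, lr=0.1, steps=30, eps=1e-4):
    """Gradient descent on the validation loss over log(gamma).

    Uses a central finite-difference gradient as a stand-in for the
    paper's exact, much cheaper analytic gradient. Returns the best
    hyperparameter value seen during the descent.
    """
    best_loss = val_loss(log_gamma, Xtr, ytr, Xva, yva)
    best = log_gamma
    for _ in range(steps):
        g = (val_loss(log_gamma + eps, Xtr, ytr, Xva, yva)
             - val_loss(log_gamma - eps, Xtr, ytr, Xva, yva)) / (2 * eps)
        log_gamma -= lr * g  # descend in log-space (keeps gamma positive)
        loss = val_loss(log_gamma, Xtr, ytr, Xva, yva)
        if loss < best_loss:
            best_loss, best = loss, log_gamma
    return best
```

Optimizing in log-space is a common choice for scale hyperparameters like the RBF width; the paper's contribution is making the per-step gradient nearly free relative to training, which the finite-difference version above deliberately does not capture.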

Author(s): Keerthi, S. S. and Sindhwani, V. and Chapelle, O.
Book Title: Advances in Neural Information Processing Systems 19
Journal: Advances in Neural Information Processing Systems 19: Proceedings of the 2006 Conference
Pages: 673-680
Year: 2007
Month: September
Editors: Schölkopf, B., Platt, J., and Hofmann, T.
Publisher: MIT Press
Bibtex Type: Conference Paper (inproceedings)
Address: Cambridge, MA, USA
Event Name: Twentieth Annual Conference on Neural Information Processing Systems (NIPS 2006)
Event Place: Vancouver, BC, Canada
ISBN: 0-262-19568-2
Language: en
Organization: Max-Planck-Gesellschaft
School: Biologische Kybernetik
BibTeX

@inproceedings{5371,
  title = {An Efficient Method for Gradient-Based Adaptation of Hyperparameters in SVM Models},
  journal = {Advances in Neural Information Processing Systems 19: Proceedings of the 2006 Conference},
  booktitle = {Advances in Neural Information Processing Systems 19},
  abstract = {We consider the task of tuning hyperparameters in SVM models by minimizing a smooth validation performance function, e.g., smoothed k-fold cross-validation error, using non-linear optimization techniques. The key computation in this approach is the gradient of the validation function with respect to the hyperparameters. We show that for large-scale problems involving a wide choice of kernel-based models and validation functions, this computation can be done very efficiently, often within just a fraction of the training time. Empirical results show that a near-optimal set of hyperparameters can be identified by our approach with very few training rounds and gradient computations.},
  pages = {673-680},
  editors = {Sch{\"o}lkopf, B. and Platt, J. and Hofmann, T.},
  publisher = {MIT Press},
  organization = {Max-Planck-Gesellschaft},
  school = {Biologische Kybernetik},
  address = {Cambridge, MA, USA},
  month = sep,
  year = {2007},
  slug = {5371},
  author = {Keerthi, S. S. and Sindhwani, V. and Chapelle, O.},
  month_numeric = {9}
}