Mismatch String Kernels for SVM Protein Classification

Empirical Inference Conference Paper 2003

We introduce a class of string kernels, called mismatch kernels, for use with support vector machines (SVMs) in a discriminative approach to the protein classification problem. These kernels measure sequence similarity based on shared occurrences of k-length subsequences, counted with up to m mismatches, and do not rely on any generative model for the positive training sequences. We compute the kernels efficiently using a mismatch tree data structure and report experiments on a benchmark SCOP dataset, where we show that the mismatch kernel used with an SVM classifier performs as well as the Fisher kernel, the most successful method for remote homology detection, while achieving considerable computational savings.
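The abstract only outlines the kernel and the mismatch tree. Below is a minimal Python sketch, assuming the standard (k, m)-mismatch feature map: each k-length substring of a sequence adds one count to every k-mer that differs from it in at most m positions, and the kernel is the inner product of those count vectors. It is an illustrative reconstruction, not the authors' implementation. The function names `mismatch_kernel_matrix` and `normalize` are hypothetical, and the cosine-style normalization is shown as a common option, not as the paper's exact setup. The sketch walks the k-mer trie depth first and keeps only the k-mer instances still within m mismatches of the current prefix. It prunes a branch as soon as no instance survives, so most of the |alphabet|^k leaves are never visited.

```python
import math
from collections import Counter


def mismatch_kernel_matrix(seqs, k, m, alphabet):
    """(k, m)-mismatch kernel matrix via a depth-first walk of the k-mer trie.

    K[i][j] = sum over all k-mers beta of c_i(beta) * c_j(beta), where c_i(beta)
    is the number of k-length substrings of seqs[i] that differ from beta in at
    most m positions.
    """
    n = len(seqs)
    K = [[0] * n for _ in range(n)]

    def dfs(depth, live):
        # live: (sequence index, k-mer start position, mismatches so far) for
        # every k-mer instance within m mismatches of the current prefix.
        if not live:
            return  # prune: no instance can reach a leaf below this node
        if depth == k:
            # A leaf is one full k-mer beta; counts[i] equals c_i(beta).
            counts = Counter(i for i, _, _ in live)
            for i, ci in counts.items():
                for j, cj in counts.items():
                    K[i][j] += ci * cj
            return
        for a in alphabet:
            child = []
            for i, p, mm in live:
                # Compare the instance's letter at this depth with the branch letter.
                mm2 = mm + (seqs[i][p + depth] != a)
                if mm2 <= m:
                    child.append((i, p, mm2))
            dfs(depth + 1, child)

    # At the root, every k-mer instance of every sequence is alive with 0 mismatches.
    dfs(0, [(i, p, 0) for i, s in enumerate(seqs) for p in range(len(s) - k + 1)])
    return K


def normalize(K):
    """Scale to K(x, y) / sqrt(K(x, x) K(y, y)); entries touching a zero-norm sequence become 0.0."""
    n = len(K)
    return [[K[i][j] / math.sqrt(K[i][i] * K[j][j]) if K[i][i] and K[j][j] else 0.0
             for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    K = mismatch_kernel_matrix(["AAAB", "AAB"], k=2, m=1, alphabet="AB")
    print(K)  # [[23, 15], [15, 10]]
```

With `m=0` the walk reduces to exact k-mer matching. The resulting matrix, optionally normalized, can be passed to an SVM implementation that accepts a precomputed kernel, such as scikit-learn's `SVC(kernel="precomputed")`.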

Author(s):	Leslie, C. and Eskin, E. and Weston, J. and Noble, W. S.
Book Title:	Advances in Neural Information Processing Systems 15
Journal:	Advances in Neural Information Processing Systems
Pages:	1417-1424
Year:	2003
Month:	October
Editors:	Becker, S. and Thrun, S. and Obermayer, K.
Publisher:	MIT Press

Bibtex Type:	Conference Paper (inproceedings)

Address:	Cambridge, MA, USA
Event Name:	Sixteenth Annual Conference on Neural Information Processing Systems (NIPS 2002)
Event Place:	Vancouver, BC, Canada

ISBN:	0-262-02550-7
Organization:	Max-Planck-Gesellschaft
School:	Biologische Kybernetik

BibTex

@inproceedings{2055,
  title = {Mismatch String Kernels for SVM Protein Classification},
  journal = {Advances in Neural Information Processing Systems},
  booktitle = {Advances in Neural Information Processing Systems 15},
  abstract = {We introduce a class of string kernels, called mismatch kernels, for use with support vector machines (SVMs) in a discriminative approach to the protein classification problem. These kernels measure sequence similarity
  based on shared occurrences of k-length subsequences, counted with up to m mismatches, and do not rely on any generative model for the positive training sequences. We compute the kernels efficiently using a mismatch tree data structure and report experiments on a benchmark SCOP dataset, where we show that the mismatch kernel used with an SVM classifier performs as well as the Fisher kernel, the most successful method for remote homology detection, while achieving considerable computational savings.},
  pages = {1417-1424},
  editor = {Becker, S. and Thrun, S. and Obermayer, K.},
  publisher = {MIT Press},
  organization = {Max-Planck-Gesellschaft},
  school = {Biologische Kybernetik},
  address = {Cambridge, MA, USA},
  month = oct,
  year = {2003},
  slug = {2055},
  author = {Leslie, C. and Eskin, E. and Weston, J. and Noble, W. S.},
  month_numeric = {10}
}