Empirical Inference Talk 2007

Positional Oligomer Importance Matrices

At the heart of many important bioinformatics problems, such as gene finding and function prediction, is the classification of biological sequences, above all DNA and proteins. In many cases, the most accurate classifiers are obtained by training SVMs with complex sequence kernels, for instance for transcription starts or splice sites. However, an often-criticized downside of SVMs with complex kernels is that it is very hard for humans to understand the learned decision rules and to derive biological insights from them. To close this gap, we introduce the concept of positional oligomer importance matrices (POIMs) and develop an efficient algorithm for their computation. We demonstrate how they overcome the limitations of sequence logos, and how they can be used to find relevant motifs for different biological phenomena in a straightforward way. Note that the concept of POIMs is not limited to interpreting SVMs, but is applicable to general k-mer-based scoring systems.

Author(s): Sonnenburg, S. and Zien, A. and Philips, P. and Rätsch, G.
Year: 2007
Month: December
Bibtex Type: Talk (talk)
Event Name: NIPS 2007 Workshop on Machine Learning in Computational Biology
Event Place: Whistler, BC, Canada
Language: en
Organization: Max-Planck-Gesellschaft
School: Biologische Kybernetik

BibTeX

@talk{5033,
  title = {Positional Oligomer Importance Matrices},
  abstract = {At the heart of many important bioinformatics problems, such as gene finding and function prediction, is the classification of biological sequences, above all DNA and proteins. In many cases, the most accurate classifiers are obtained by training SVMs with complex sequence kernels, for instance for transcription starts or splice sites. However, an often-criticized downside of SVMs with complex kernels is that it is very hard for humans to understand the learned decision rules and to derive biological insights from them. To close this gap, we introduce the concept of positional oligomer importance matrices (POIMs) and develop an efficient algorithm for their computation. We demonstrate how they overcome the limitations of sequence logos, and how they can be used to find relevant motifs for different biological phenomena in a straightforward way. Note that the concept of POIMs is not limited to interpreting SVMs, but is applicable to general k-mer-based scoring systems.},
  organization = {Max-Planck-Gesellschaft},
  school = {Biologische Kybernetik},
  month = dec,
  year = {2007},
  slug = {5033},
  author = {Sonnenburg, S. and Zien, A. and Philips, P. and R{\"a}tsch, G.},
  month_numeric = {12}
}