Empirical Inference Talk 2007

An Automated Combination of Kernels for Predicting Protein Subcellular Localization

Protein subcellular localization is a crucial ingredient of many important inferences about cellular processes, including the prediction of protein function and protein interactions. We propose a new class of protein sequence kernels that considers all motifs, including motifs with gaps. This class of kernels allows pairwise amino acid distances to be included in the kernel computation. We use an extension of the multiclass support vector machine (SVM) method that solves protein subcellular localization directly, without resorting to the common approach of splitting the problem into several binary classification problems. To automatically search over families of possible amino acid motifs, we optimize over multiple kernels simultaneously. We compare our automated approach with four other predictors on three different datasets and show that it performs better than the current state of the art. Furthermore, our method provides some insight into which features are most useful for determining subcellular localization, and these agree with biological reasoning.
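To make the kernel ideas above more concrete, here is a minimal Python sketch built on simplifying assumptions that do not come from the talk. Each base kernel counts pairs of residues separated by one fixed distance (one kernel per distance, a simple stand-in for gapped motifs with pairwise amino acid distances), and the base kernels are combined as a convex combination with non-negative weights that sum to one. The abstract describes optimizing over multiple kernels; here the weights are simply fixed by hand, and no multiclass SVM is trained. All function names (`gapped_pair_features`, `pair_kernel`, `combined_kernel`, `gram_matrix`) are hypothetical.

```python
from collections import Counter


# Hypothetical helpers: a simplified stand-in for motif kernels,
# not the kernels presented in the talk.

def gapped_pair_features(seq: str, gap: int) -> Counter:
    """Count residue pairs (seq[i], seq[i + gap]) that are `gap` positions apart."""
    return Counter((seq[i], seq[i + gap]) for i in range(len(seq) - gap))


def pair_kernel(x: str, y: str, gap: int) -> int:
    """Linear kernel on gapped-pair counts: the inner product of the two count vectors."""
    fx = gapped_pair_features(x, gap)
    fy = gapped_pair_features(y, gap)
    # A Counter returns 0 for missing keys, so pairs absent from y contribute nothing.
    return sum(count * fy[pair] for pair, count in fx.items())


def combined_kernel(x: str, y: str, weights: list[float]) -> float:
    """Convex combination sum_m beta_m * K_m(x, y), where base kernel m uses gap m."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * pair_kernel(x, y, gap) for gap, w in enumerate(weights, start=1))


def gram_matrix(seqs: list[str], weights: list[float]) -> list[list[float]]:
    """Pairwise combined-kernel values for a list of sequences."""
    return [[combined_kernel(a, b, weights) for b in seqs] for a in seqs]


if __name__ == "__main__":
    # Toy sequences; the weights are chosen by hand, not learned.
    seqs = ["MKTAYIAK", "MKLAYIAK", "GGSGGSGG"]
    for row in gram_matrix(seqs, [0.5, 0.3, 0.2]):
        print(row)
```

Each base kernel is an inner product of count vectors, and a non-negative combination of such kernels is still positive semidefinite. A Gram matrix built this way could therefore be passed to an SVM solver that accepts precomputed kernels. Learning the weights jointly with the classifier, as the abstract describes, is outside the scope of this sketch.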

Author(s): Zien, A. and Ong, CS.
Year: 2007
Month: December
Bibtex Type: Talk (talk)
Electronic Archiving: grant_archive
Event Name: NIPS 2007 Workshop on Machine Learning in Computational Biology
Event Place: Whistler, BC, Canada
Language: en
Organization: Max-Planck-Gesellschaft
School: Biologische Kybernetik

BibTeX

@talk{5032,
  title = {An Automated Combination of Kernels for Predicting Protein Subcellular Localization},
  abstract = {Protein subcellular localization is a crucial ingredient of many important inferences about cellular processes, including the prediction of protein function and protein interactions. We propose a new class of protein sequence kernels that considers all motifs, including motifs with gaps. This class of kernels allows pairwise amino acid distances to be included in the kernel computation. We use an extension of the multiclass support vector machine (SVM) method that solves protein subcellular localization directly, without resorting to the common approach of splitting the problem into several binary classification problems. To automatically search over families of possible amino acid motifs, we optimize over multiple kernels simultaneously. We compare our automated approach with four other predictors on three different datasets and show that it performs better than the current state of the art. Furthermore, our method provides some insight into which features are most useful for determining subcellular localization, and these agree with biological reasoning.},
  organization = {Max-Planck-Gesellschaft},
  school = {Biologische Kybernetik},
  month = dec,
  year = {2007},
  slug = {5032},
  author = {Zien, A. and Ong, CS.},
  month_numeric = {12}
}