Empirical Inference Technical Report 2006

An Automated Combination of Sequence Motif Kernels for Predicting Protein Subcellular Localization

Protein subcellular localization is a crucial ingredient in many important inferences about cellular processes, including prediction of protein function and protein interactions. While many predictive computational tools have been proposed, they tend to have complicated architectures and require many design decisions from the developer. We propose an elegant and fully automated approach to building a prediction system for protein subcellular localization. We introduce a new class of protein sequence kernels which considers all motifs, including motifs with gaps. This class of kernels allows pairwise amino acid distances to be included in their computation. We further propose a multiclass support vector machine method which directly solves protein subcellular localization without resorting to the common approach of splitting the problem into several binary classification problems. To automatically search over families of possible amino acid motifs, we generalize our method to optimize over multiple kernels at the same time. We compare our automated approach to four other predictors on three different datasets.
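As a rough illustration of what a gap-aware sequence kernel and a combination of kernels can look like, the Python sketch below counts pairs of residues that occur a fixed distance apart (a simple "gapped pair" feature map chosen here for illustration; it is not claimed to be the motif kernel defined in the report) and takes the inner product of the resulting count vectors. The names `gapped_pair_features`, `gapped_pair_kernel`, `gram` and `combine` are hypothetical. The combination step uses fixed, hand-picked weights; in a multiple kernel learning setting those weights would instead be optimized jointly with the classifier, which is not shown here.

```python
from collections import Counter
from functools import partial


def gapped_pair_features(seq: str, max_gap: int) -> Counter:
    """Count (a, b, d) triples: residue a followed by residue b exactly d positions later.

    Hypothetical illustration of a feature map that encodes pairwise
    amino acid distances, for distances d = 1 .. max_gap.
    """
    feats = Counter()
    for d in range(1, max_gap + 1):
        for i in range(len(seq) - d):
            feats[(seq[i], seq[i + d], d)] += 1
    return feats


def gapped_pair_kernel(s: str, t: str, max_gap: int) -> int:
    """Inner product of the two sparse count vectors (a valid kernel by construction)."""
    fs = gapped_pair_features(s, max_gap)
    ft = gapped_pair_features(t, max_gap)
    # Counter returns 0 for missing keys, so absent motifs contribute nothing.
    return sum(count * ft[key] for key, count in fs.items())


def gram(seqs, kernel):
    """Gram matrix K[i][j] = kernel(seqs[i], seqs[j])."""
    return [[kernel(a, b) for b in seqs] for a in seqs]


def combine(grams, weights):
    """Convex combination sum_m w_m * K_m of Gram matrices with fixed weights."""
    assert all(w >= 0 for w in weights) and abs(sum(weights) - 1.0) < 1e-9
    n = len(grams[0])
    return [
        [sum(w * g[i][j] for w, g in zip(weights, grams)) for j in range(n)]
        for i in range(n)
    ]


# Usage: two kernels that differ only in the largest gap they consider.
seqs = ["ACDA", "ACD", "CDA"]
grams = [gram(seqs, partial(gapped_pair_kernel, max_gap=g)) for g in (1, 2)]
K = combine(grams, [0.5, 0.5])
```

A nonnegative weighted sum of valid kernels is again a valid kernel, which is why searching over the weights of several motif kernels, as the abstract describes, stays within the standard kernel framework.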

Author(s): Zien, A. and Ong, C. S.
Number (issue): 146
Year: 2006
Month: April
BibTeX Type: Technical Report (techreport)
Institution: Max Planck Institute for Biological Cybernetics, Tübingen
Language: en
Organization: Max-Planck-Gesellschaft
School: Biologische Kybernetik

BibTeX

@techreport{3943,
  title = {An Automated Combination of Sequence Motif Kernels for Predicting Protein Subcellular Localization},
  abstract = {Protein subcellular localization is a crucial ingredient in many
  important inferences about cellular processes, including prediction of
  protein function and protein interactions. While many predictive
  computational tools have been proposed, they tend to have complicated
  architectures and require many design decisions from the developer.
  We propose an elegant and fully automated approach to building a
  prediction system for protein subcellular localization. We introduce a
  new class of protein sequence kernels which considers all motifs,
  including motifs with gaps. This class of kernels allows
  pairwise amino acid distances to be included in their
  computation. We further propose a multiclass support vector machine method
  which directly solves protein subcellular localization without
  resorting to the common approach of splitting the problem into several
  binary classification problems. To automatically search over families
  of possible amino acid motifs, we generalize our method to optimize over
  multiple kernels at the same time. We compare our automated approach
  to four other predictors on three different datasets.},
  number = {146},
  organization = {Max-Planck-Gesellschaft},
  institution = {Max Planck Institute for Biological Cybernetics, Tübingen},
  school = {Biologische Kybernetik},
  month = apr,
  year = {2006},
  slug = {3943},
  author = {Zien, A. and Ong, C. S.},
  month_numeric = {4}
}