Empirical Inference Conference Paper 2006

PALMA: Perfect Alignments using Large Margin Algorithms

Despite many years of research on how to properly align sequences in the presence of sequencing errors, alternative splicing and micro-exons, the correct alignment of mRNA sequences to genomic DNA is still a challenging task. We present a novel approach based on large margin learning that combines kernel-based splice site predictions with common sequence alignment techniques. By solving a convex optimization problem, our algorithm, called PALMA, tunes the parameters of the model such that the true alignment scores higher than all other alignments. In an experimental study on the alignments of mRNAs containing artificially generated micro-exons, we show that our algorithm drastically outperforms all other methods: it perfectly aligns all 4358 sequences on a hold-out set, while the best other method misaligns at least 90 of them. Moreover, our algorithm is very robust against noise in the query sequence: when deleting, inserting, or mutating up to 50% of the query sequence, it still aligns 95% of all sequences correctly, while other methods achieve less than 36% accuracy. For datasets, additional results and a stand-alone alignment tool, see http://www.fml.mpg.de/raetsch/projects/palma.

Author(s): Rätsch, G. and Hepp, B. and Schulze, U. and Ong, CS.
Book Title: GCB 2006
Journal: Proceedings of the German Conference on Bioinformatics 2006 (GCB 2006)
Pages: 104-113
Year: 2006
Month: September
Editors: Huson, D., O. Kohlbacher, A. Lupas, K. Nieselt, A. Zell
Publisher: Gesellschaft für Informatik
Bibtex Type: Conference Paper (inproceedings)
Address: Bonn, Germany
Event Name: German Conference on Bioinformatics 2006
Event Place: Tübingen, Germany
Language: en
Organization: Max-Planck-Gesellschaft
School: Biologische Kybernetik

BibTeX

@inproceedings{4157,
  title = {PALMA: Perfect Alignments using Large Margin Algorithms},
  journal = {Proceedings of the German Conference on Bioinformatics 2006 (GCB 2006)},
  booktitle = {GCB 2006},
  abstract = {Despite many years of research on how to properly align sequences in
  the presence of sequencing errors, alternative splicing and
  micro-exons, the correct alignment of mRNA sequences to genomic DNA is
  still a challenging task.  We present a novel approach based on large
  margin learning that combines kernel-based splice site predictions
  with common sequence alignment techniques. By solving a convex
  optimization problem, our algorithm -- called PALMA -- tunes the
  parameters of the model such that the true alignment scores higher
  than all other alignments. In an experimental study on the alignments
  of mRNAs containing artificially generated micro-exons, we show that
  our algorithm drastically outperforms all other methods: it perfectly
  aligns all 4358 sequences on a hold-out set, while the best other
  method misaligns at least 90 of them. Moreover, our algorithm is very
  robust against noise in the query sequence: when deleting, inserting,
  or mutating up to 50\% of the query sequence, it still aligns 95\% of
  all sequences correctly, while other methods achieve less than 36\%
  accuracy.  For datasets, additional results and a stand-alone
  alignment tool, see
  http://www.fml.mpg.de/raetsch/projects/palma.},
  pages = {104-113},
  editor = {Huson, D. and Kohlbacher, O. and Lupas, A. and Nieselt, K. and Zell, A.},
  publisher = {Gesellschaft f{\"u}r Informatik},
  organization = {Max-Planck-Gesellschaft},
  school = {Biologische Kybernetik},
  address = {Bonn, Germany},
  month = sep,
  year = {2006},
  slug = {4157},
  author = {R{\"a}tsch, G. and Hepp, B. and Schulze, U. and Ong, CS.},
  month_numeric = {9}
}