Despite many years of research on how to properly align sequences in the presence of sequencing errors, alternative splicing and micro-exons, the correct alignment of mRNA sequences to genomic DNA is still a challenging task. We present a novel approach based on large-margin learning that combines kernel-based splice site predictions with common sequence alignment techniques. By solving a convex optimization problem, our algorithm -- called PALMA -- tunes the parameters of the model such that the true alignment scores higher than all other alignments. In an experimental study on the alignments of mRNAs containing artificially generated micro-exons, we show that our algorithm drastically outperforms all other methods: it perfectly aligns all 4358 sequences on a hold-out set, while the best other method misaligns at least 90 of them. Moreover, our algorithm is very robust against noise in the query sequence: when deleting, inserting, or mutating up to 50% of the query sequence, it still aligns 95% of all sequences correctly, while other methods achieve less than 36% accuracy. For datasets, additional results and a stand-alone alignment tool, see http://www.fml.mpg.de/raetsch/projects/palma.
Author(s): | Rätsch, G. and Hepp, B. and Schulze, U. and Ong, CS. |
Book Title: | GCB 2006 |
Journal: | Proceedings of the German Conference on Bioinformatics 2006 (GCB 2006) |
Pages: | 104-113 |
Year: | 2006 |
Month: | September |
Editors: | Huson, D. and Kohlbacher, O. and Lupas, A. and Nieselt, K. and Zell, A. |
Publisher: | Gesellschaft für Informatik |
Bibtex Type: | Conference Paper (inproceedings) |
Address: | Bonn, Germany |
Event Name: | German Conference on Bioinformatics 2006 |
Event Place: | Tübingen, Germany |
Digital: | 0 |
Electronic Archiving: | grant_archive |
Language: | en |
Organization: | Max-Planck-Gesellschaft |
School: | Biologische Kybernetik |
Links: |
BibTeX
@inproceedings{4157,
  title = {PALMA: Perfect Alignments using Large Margin Algorithms},
  journal = {Proceedings of the German Conference on Bioinformatics 2006 (GCB 2006)},
  booktitle = {GCB 2006},
  abstract = {Despite many years of research on how to properly align sequences in the presence of sequencing errors, alternative splicing and micro-exons, the correct alignment of mRNA sequences to genomic DNA is still a challenging task. We present a novel approach based on large-margin learning that combines kernel-based splice site predictions with common sequence alignment techniques. By solving a convex optimization problem, our algorithm -- called PALMA -- tunes the parameters of the model such that the true alignment scores higher than all other alignments. In an experimental study on the alignments of mRNAs containing artificially generated micro-exons, we show that our algorithm drastically outperforms all other methods: it perfectly aligns all 4358 sequences on a hold-out set, while the best other method misaligns at least 90 of them. Moreover, our algorithm is very robust against noise in the query sequence: when deleting, inserting, or mutating up to 50% of the query sequence, it still aligns 95% of all sequences correctly, while other methods achieve less than 36% accuracy. For datasets, additional results and a stand-alone alignment tool, see http://www.fml.mpg.de/raetsch/projects/palma.},
  pages = {104--113},
  editor = {Huson, D. and Kohlbacher, O. and Lupas, A. and Nieselt, K. and Zell, A.},
  publisher = {Gesellschaft f{\"u}r Informatik},
  organization = {Max-Planck-Gesellschaft},
  school = {Biologische Kybernetik},
  address = {Bonn, Germany},
  month = sep,
  year = {2006},
  slug = {4157},
  author = {R{\"a}tsch, G. and Hepp, B. and Schulze, U. and Ong, CS.},
  month_numeric = {9}
}