Empirische Inferenz Technical Report 2003

Machine Learning approaches to protein ranking: discriminative, semi-supervised, scalable algorithms

A key tool in protein function discovery is the ability to rank databases of proteins given a query amino acid sequence. The most successful method so far is a web-based tool called PSI-BLAST, which uses heuristic alignment of a profile built using a large unlabeled database. It has been shown that such use of global information via unlabeled data improves over a local measure derived from a basic pairwise alignment, such as that performed by PSI-BLAST's predecessor, BLAST. In this article we look at ways of leveraging techniques from the field of machine learning for the problem of ranking. We show how clustering and semi-supervised learning techniques, which aim to capture global structure in data, can significantly improve over PSI-BLAST.

Author(s): Weston, J. and Leslie, C. and Elisseeff, A. and Noble, WS.
Number (issue): 111
Year: 2003
Month: June
Bibtex Type: Technical Report (techreport)
Electronic Archiving: grant_archive
Institution: Max Planck Institute for Biological Cybernetics, Tübingen, Germany

BibTeX

@techreport{2300,
  title = {Machine Learning approaches to protein ranking: discriminative, semi-supervised, scalable algorithms},
  abstract = {A key tool in protein function discovery is the ability to rank databases of proteins given a query amino acid sequence. The most successful method so far is a web-based tool called PSI-BLAST, which uses heuristic alignment of a profile built using a large unlabeled database. It has been shown that such use of global information via unlabeled data improves over a local measure derived from a basic pairwise alignment, such as that performed by PSI-BLAST's predecessor, BLAST. In this article we look at ways of leveraging techniques from the field of machine learning for the problem of ranking. We show how clustering and semi-supervised learning techniques, which aim to capture global structure in data, can significantly improve over PSI-BLAST.},
  number = {111},
  institution = {Max Planck Institute for Biological Cybernetics, T{\"u}bingen, Germany},
  month = jun,
  year = {2003},
  slug = {2300},
  author = {Weston, J. and Leslie, C. and Elisseeff, A. and Noble, WS.},
  month_numeric = {6}
}