Machine Learning approaches to protein ranking: discriminative, semi-supervised, scalable algorithms
A key tool in protein function discovery is the ability to rank databases of proteins given a query amino acid sequence. The most successful method so far is a web-based tool called PSI-BLAST, which uses heuristic alignment of a profile built using a large unlabeled database. It has been shown that such use of global information via unlabeled data improves over a local measure derived from a basic pairwise alignment, such as that performed by PSI-BLAST's predecessor, BLAST. In this article we look at ways of leveraging techniques from the field of machine learning for the problem of ranking. We show how clustering and semi-supervised learning techniques, which aim to capture global structure in data, can significantly improve over PSI-BLAST.
@techreport{2300,
  title = {Machine Learning approaches to protein ranking: discriminative, semi-supervised, scalable algorithms},
  abstract = {A key tool in protein function discovery is the ability to rank databases of proteins given a query amino acid sequence. The most successful method so far is a web-based tool called PSI-BLAST, which uses heuristic alignment of a profile built using a large unlabeled database. It has been shown that such use of global information via unlabeled data improves over a local measure derived from a basic pairwise alignment, such as that performed by PSI-BLAST's predecessor, BLAST. In this article we look at ways of leveraging techniques from the field of machine learning for the problem of ranking. We show how clustering and semi-supervised learning techniques, which aim to capture global structure in data, can significantly improve over PSI-BLAST.},
  number = {111},
  institution = {Max Planck Institute for Biological Cybernetics, T{\"u}bingen, Germany},
  month = jun,
  year = {2003},
  slug = {2300},
  author = {Weston, J. and Leslie, C. and Elisseeff, A. and Noble, WS.},
  month_numeric = {6}
}