Ranking Methods for the Prediction of Frequent Top Scoring Peptides from Proteomics Data
*Corresponding Author:
Carsten Henneges,
Eberhard Karls Universität
Tübingen, Sand 1
72076 Tübingen, Germany
Tel: 07071 / 29 - 77 175
Fax: 07071 / 29 - 50 91
E-mail: [email protected]
Received Date: April 08, 2009; Accepted Date: May 20, 2009; Published Date: May 20, 2009
Citation: Henneges C, Hinselmann G, Jung S, Madlung J, Schütz W, et al. (2009) Ranking Methods for the Prediction of Frequent Top Scoring Peptides from Proteomics Data. J Proteomics Bioinform 2: 226-235. doi: 10.4172/jpb.1000081
Copyright: © 2009 Henneges C, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Proteomics facilities accumulate large amounts of proteomics data that are archived for documentation purposes. Proteomics search engines such as Mascot or Sequest sequence peptides and return peptide hits ranked by a score, so we apply ranking algorithms to combine archived search results into predictive models. In this way, peptide sequences that frequently achieve high scores can be identified. With our approach, such peptides can be predicted directly from their molecular structure and then used to support protein identification or to perform experiments that require reliable peptide identification. For training, we prepared all peptide sequences and Mascot scores from a four-year period of proteomics experiments on Homo sapiens at the Proteome Center Tuebingen. To encode the peptides, MacroModel and DragonX were used to compute molecular descriptors. All features were ranked by ranking-specific feature selection with the Greedy Search Algorithm, which significantly improved the performance of RankNet and FRank. Model evaluation on hold-out test data yielded a Mean Average Precision of up to 0.59 and a Normalized Discounted Cumulative Gain of up to 0.81. We thus demonstrate that ranking algorithms can be used to analyze long-term proteomics data and to identify frequently top-scoring peptides.
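To make the two evaluation measures named above concrete, the following is a minimal sketch of how Mean Average Precision (MAP) and Normalized Discounted Cumulative Gain (NDCG) are commonly computed for ranked lists. It is not the evaluation code used in this study. The function names are hypothetical, and the sketch assumes binary relevance labels for MAP and the widely used gain 2^rel - 1 with a log2 rank discount for NDCG. The exact variants and cutoffs used in the experiments are not stated in this section.

```python
import math


def average_precision(relevance):
    """Average precision of one ranked list.

    relevance[i] is 1 if the item at rank i+1 is relevant, else 0
    (binary relevance is an assumption of this sketch).
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(ranked_lists):
    """Mean of the average precision over several ranked lists."""
    return sum(average_precision(r) for r in ranked_lists) / len(ranked_lists)


def dcg(gains, k=None):
    """Discounted cumulative gain with gain 2^rel - 1 and log2 discount."""
    if k is not None:
        gains = gains[:k]
    return sum(
        (2 ** g - 1) / math.log2(rank + 1)
        for rank, g in enumerate(gains, start=1)
    )


def ndcg(gains, k=None):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(gains, reverse=True), k)
    return dcg(gains, k) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Toy relevance labels, not data from the study.
    print(mean_average_precision([[1, 0, 1, 0], [0, 1]]))
    print(ndcg([3, 2, 1]), ndcg([0, 1]))
```

Both measures lie between 0 and 1 and reward rankings that place relevant items near the top. MAP treats relevance as binary, whereas NDCG can use graded relevance, which is one reason the two values reported for the same model can differ.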