Author(s): Gay S, Binz PA, Hochstrasser DF, Appel RD
Abstract
Matrix-assisted laser desorption/ionization-time of flight mass spectrometry has become a valuable tool in proteomics. With the increasing acquisition rate of mass spectrometers, one of the major issues is the development of accurate, efficient and automatic peptide mass fingerprinting (PMF) identification tools. Current tools are mostly based on counting the number of experimental peptide masses that match theoretical masses. Almost all of them use additional criteria such as isoelectric point, molecular weight, PTMs, taxonomy or enzymatic cleavage rules to enhance prediction performance. However, these identification tools seldom use peak intensities as a parameter, because there is currently no model that predicts intensities from the physicochemical properties of peptides. In this work, we used standard data mining methods, namely classification and regression, to find correlations between peak intensities and the properties of the peptides composing a PMF spectrum. These methods were applied to a dataset comprising a series of PMF experiments involving 157 proteins. We found that the C4.5 method gave the most informative results for the classification task (prediction of the presence or absence of a peptide in a spectrum) and M5' for the regression task (prediction of the normalized intensity of a peptide peak). C4.5 correctly classified 88% of the theoretical peaks, whereas the peak intensities predicted by M5' had a correlation coefficient of 0.6743 with the experimental peak intensities. These methods enabled us to obtain decision and model trees that can be directly used for prediction and identification of PMF results. This work lays the foundations of a method to analyze the factors influencing the peak intensity of PMF spectra. A simple extension of this analysis, using a larger dataset, could improve the accuracy of the results. Additional peptide characteristics or even PMF experimental parameters can also be taken into account in the data mining process to analyze their influence on the peak intensity. Furthermore, this data mining approach can certainly be extended to the tandem mass spectrometry domain or other mass spectrometry derived methods.
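The two figures of merit quoted in the abstract (the fraction of correctly classified theoretical peaks and the correlation coefficient between predicted and experimental intensities) can be illustrated with a minimal sketch. It assumes that the reported correlation coefficient is the Pearson coefficient, which the abstract does not state. The function names and all values below are hypothetical toy data, not the authors' code or results.

```python
from math import sqrt


def accuracy(predicted, observed):
    """Fraction of theoretical peaks whose presence/absence label is predicted correctly."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(p == o for p, o in zip(predicted, observed))
    return hits / len(observed)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Toy classification task: is each theoretical peptide peak present in the spectrum?
predicted_present = [True, True, False, False, True, False, True, False]
observed_present = [True, False, False, False, True, False, True, True]

# Toy regression task: normalized peak intensities (made-up values).
predicted_intensity = [0.10, 0.40, 0.35, 0.80]
observed_intensity = [0.20, 0.30, 0.50, 0.90]

print(f"accuracy = {accuracy(predicted_present, observed_present):.2f}")
print(f"pearson r = {pearson(predicted_intensity, observed_intensity):.4f}")
```

In a real evaluation the predicted labels and intensities would come from the trained decision tree and model tree, and the metrics would normally be computed on held-out data.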
This article was published in Proteomics and referenced in Journal of Proteomics & Bioinformatics