ABOid: A Software for Automated Identification and Phyloproteomics Classification of Tandem Mass Spectrometric Data
*Corresponding Author:
Dr. Samir V. Deshpande
Science and Technology Corporation
500 Edgewood Road, Ste 205, Edgewood
MD 21040, USA
Tel: 410-436-4348
E-mail: [email protected]
Received Date: June 03, 2011; Accepted Date: July 20, 2011; Published Date: July 22, 2011
Citation: Deshpande SV, Jabbour RE, Snyder PA, Stanford M, Wick CH, et al. (2011) ABOid: A Software for Automated Identification and Phyloproteomics Classification of Tandem Mass Spectrometric Data. J Chromatograph Separat Techniq S5:001. doi:10.4172/2157-7064.S5-001
Copyright: © 2011 Deshpande SV, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
We have developed a suite of bioinformatics algorithms for the automated identification and classification of microbes based on comparative analysis of protein sequences. The application uses sequence information of microbial proteins revealed by mass spectrometry-based proteomics for identification and phyloproteomics classification. The algorithms transform the results of searching product ion spectra of peptide ions against a protein database, performed by commercially available software (e.g., SEQUEST), into a taxonomically meaningful and easy-to-interpret output. To achieve this goal, we constructed a custom protein database in FASTA format composed of theoretical proteomes derived from all fully sequenced bacterial genomes (1204 microorganisms as of August 25, 2010). Each protein sequence in the database is supplemented with information on its source organism, and the chromosomal position of each protein-coding open reading frame (ORF) is embedded in the protein sequence header. In addition, this information is linked to the taxonomic position of each database bacterium. ABOid analyzes SEQUEST search result files to estimate the probability that each peptide sequence assignment to a product ion mass spectrum (MS/MS) is correct, and uses the accepted spectrum-to-sequence matches to generate a sequence-to-organism (STO) matrix of assignments. Because peptide sequences are differentially present or absent in the various strains being compared, this matrix allows bacterial species to be classified in a high-throughput manner. For this purpose, the STO matrices of assignments, viewed as assignment bitmaps, are next analyzed by an ABOid module that uses phylogenetic relationships between bacterial species as part of a decision tree process and applies multivariate statistical techniques (principal component and cluster analysis) to reveal the relationship of the analyzed unknown sample to the database microorganisms.
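The sequence-to-organism (STO) matrix described above can be pictured as a binary table with one row per accepted peptide and one column per database organism. The following Python fragment is only a minimal sketch of that idea, not ABOid's implementation: the function names `build_sto_matrix` and `unique_peptide_counts`, the input format (a list of peptide/organism pairs), and the peptide and organism labels are all assumptions made for illustration.

```python
def build_sto_matrix(accepted_matches):
    """Build a binary sequence-to-organism (STO) matrix.

    accepted_matches: iterable of (peptide_sequence, organism) pairs that
    already passed the probability filter (hypothetical input format).
    Returns (peptides, organisms, matrix), where matrix[i][j] is 1 when
    peptide i occurs in the proteome of organism j, and 0 otherwise.
    """
    matches = list(accepted_matches)
    peptides = sorted({pep for pep, _ in matches})
    organisms = sorted({org for _, org in matches})
    row = {pep: i for i, pep in enumerate(peptides)}
    col = {org: j for j, org in enumerate(organisms)}
    matrix = [[0] * len(organisms) for _ in peptides]
    for pep, org in matches:
        matrix[row[pep]][col[org]] = 1
    return peptides, organisms, matrix


def unique_peptide_counts(organisms, matrix):
    """Count, per organism, the peptides found in that organism only."""
    counts = {org: 0 for org in organisms}
    for bits in matrix:
        if sum(bits) == 1:  # peptide present in exactly one organism
            counts[organisms[bits.index(1)]] += 1
    return counts


# Illustrative data only: labels are placeholders, not real search results.
demo_matches = [
    ("PEPTIDEA", "org_A"),
    ("PEPTIDEA", "org_B"),
    ("PEPTIDEB", "org_A"),
    ("PEPTIDEC", "org_C"),
]
demo_peptides, demo_organisms, demo_matrix = build_sto_matrix(demo_matches)
demo_unique = unique_peptide_counts(demo_organisms, demo_matrix)
```

Rows whose bits are set for a single organism are the discriminating ones. Peptides shared by several organisms carry information only at the taxonomic level those organisms have in common.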
Our bacterial classification and identification algorithm assigns an analyzed organism to taxonomic groups following an organized scheme that begins at the phylum level and proceeds through class, order, family, and genus down to the strain level.
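One way to picture such a top-down scheme is as a walk through the taxonomic ranks that keeps, at each rank, the group supported by the most accepted peptides. The Python sketch below assumes a simple rule that the text does not specify: a group is accepted only if it is the sole best-supported group and explains at least a fraction `min_fraction` of the peptides still under consideration. The function name `classify_top_down`, the data layout, and all taxon labels are hypothetical.

```python
from collections import defaultdict

RANKS = ["phylum", "class", "order", "family", "genus", "strain"]


def classify_top_down(lineages, peptide_hits, min_fraction=0.5):
    """Walk the ranks from phylum to strain, narrowing the candidates.

    lineages: organism -> {rank: taxon name} (hypothetical layout).
    peptide_hits: organism -> set of accepted peptides matching it.
    Returns the list of (rank, taxon) decisions made before the walk
    stopped, either at strain level or at the first ambiguous rank.
    """
    candidates = set(lineages)
    path = []
    for rank in RANKS:
        # Pool the peptides of all candidate organisms sharing a taxon.
        groups = defaultdict(set)
        for org in candidates:
            groups[lineages[org][rank]] |= peptide_hits.get(org, set())
        all_peptides = set().union(*groups.values())
        if not all_peptides:
            break
        best_size = max(len(peps) for peps in groups.values())
        best = [taxon for taxon, peps in groups.items() if len(peps) == best_size]
        # Stop on a tie or on weak support (assumed rule, see lead-in).
        if len(best) > 1 or best_size / len(all_peptides) < min_fraction:
            break
        path.append((rank, best[0]))
        candidates = {o for o in candidates if lineages[o][rank] == best[0]}
    return path


def _lineage(phylum, cls, order, family, genus, strain):
    return dict(zip(RANKS, (phylum, cls, order, family, genus, strain)))


# Placeholder taxa and peptides, for illustration only.
demo_lineages = {
    "strain_a1": _lineage("P1", "C1", "O1", "F1", "G1", "strain_a1"),
    "strain_a2": _lineage("P1", "C1", "O1", "F1", "G1", "strain_a2"),
    "strain_b1": _lineage("P2", "C2", "O2", "F2", "G2", "strain_b1"),
}
demo_hits = {
    "strain_a1": {"p1", "p2", "p3"},
    "strain_a2": {"p1", "p2"},
    "strain_b1": {"p4"},
}
demo_path = classify_top_down(demo_lineages, demo_hits)
```

Stopping at the first ambiguous rank means a sample whose peptides cannot separate two strains is reported at genus level instead of being forced into a strain assignment.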