Author(s): Saleh MT, Fillon M, Brennan PJ, Belisle JT
Abstract The increasing number of bacterial genomes being sequenced fuels an equal demand for methods to rapidly analyze the proteomes of these organisms. One group of proteins of pressing importance is the exported/secreted proteins, given their dominant immunogenicity and role in pathogenesis. With this in mind, a weight matrix algorithm and two artificial neural networks, one based on amino acid position within the N-terminus and the other on amino acid frequency, were developed for identification of such proteins. The neural networks and a hybrid method, combining the weight matrix algorithm and the amino acid frequency neural network, were tested independently against a standard data set of secreted and cytoplasmic proteins to determine their accuracy in predicting secreted prokaryotic proteins. The results of these analyses demonstrated that the amino acid position neural network provided the highest accuracy (Matthews correlation coefficient of 0.93) in predicting secreted proteins of Gram-negative bacteria, whereas the hybrid method was best (Matthews correlation coefficient of 0.97) for prediction of Gram-positive secreted proteins. These two methods were integrated into a single program (ExProt) designed to analyze whole proteomes. In addition to protein localization, ExProt also contains a neural network trained to identify the most probable signal peptidase I cleavage site of secreted proteins. When tested against the standard protein data set, ExProt correctly predicted 73.5% and 84.5% of the cleavage sites in Gram-positive and Gram-negative secreted proteins, respectively. Comparative analysis of Gram-negative, Gram-positive, Mycobacterium tuberculosis, and Archaea proteomes with ExProt revealed that the fraction of putative exported/secreted proteins encoded by bacterial genomes ranged from 8% for Methanococcus jannaschii to 37% for Mycoplasma pneumoniae.
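The accuracy figures above are reported as Matthews correlation coefficients. As a reminder of what that metric measures, the following is a minimal sketch of the standard formula applied to a binary secreted/cytoplasmic confusion matrix. The function name and the example counts are hypothetical illustrations; they are not taken from the paper or from ExProt.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Standard Matthews correlation coefficient for a binary classifier.

    tp: secreted proteins predicted as secreted
    tn: cytoplasmic proteins predicted as cytoplasmic
    fp: cytoplasmic proteins predicted as secreted
    fn: secreted proteins predicted as cytoplasmic
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal total is zero.
    if denominator == 0:
        return 0.0
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the calculation.
    print(round(matthews_corrcoef(tp=90, tn=95, fp=5, fn=10), 3))
```

The coefficient ranges from -1 (every prediction wrong) through 0 (no better than chance) to +1 (every prediction correct), so the reported values of 0.93 and 0.97 indicate near-perfect agreement with the standard data set.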
This article was published in Gene
and referenced in Journal of Proteomics & Bioinformatics