Author(s): Feng ZP
Abstract Share this page
Abstract The present paper overviews the issue on predicting the subcellular location of a protein. Five measures of extracting information from the global sequence based on the Bayes discriminant algorithm are reviewed. 1) The auto-correlation functions of amino acid indices along the sequence; 2) The quasi-sequence-order approach; 3) the pseudo-amino acid composition; 4) the unified attribute vector in Hilbert space, 5) Zp parameters extracted from the Zp curve. The actual performance of the predictive accuracy is closely related to the degree of similarity between the training and testing sets or to the average degree of pairwise similarity in dataset in a cross-validated study. Many scholars considered that the current higher predictive accuracy still cannot ensure that some available algorithms are effective in practice prediction for the higher pairwise sequence identity of the datasets, but some of them declared that construction of the dataset used for developing software should base on the reality determined by the Mother Nature that some subcellular locations really contain only a minor number of proteins of which some even have a high percentage of sequence similarity. Owing to the complexity of the problem itself, some very sophisticated and special programs are needed for both constructing dataset and improving the prediction. Anyhow finding the target information in mature protein sequence and properly cooperating it with sorting signals in prediction may further improve the overall predictive accuracy and make the prediction into practice.
This article was published in In Silico Biol
and referenced in Journal of Health & Medical Informatics