Author(s): Ken NISHIKAWA, Yasushi KUBOTA, Tatsuo OOI
Correlations of the amino acid composition of a protein to its location in an organism, biological function, folding type, and disulfide bond(s) were examined for 356 proteins. In the present data set, 325 proteins of known location and biological character were divided into 122 intracellular enzymes (BI), 73 intracellular nonenzymes (BII), 45 extracellular enzymes (BIII), and 85 extracellular nonenzymes (BIV). The compositions of these proteins were expressed as points in a composition space of 18 orthogonal axes, each representing the content of an amino acid. The distributions of points of BI and BIII were narrow and approximately spherical, but those of BII and BIV were rather broad. The groups are separated from each other in the space. We divided the space into four regions (A1 to A4) corresponding to the groups BI to BIV. A protein could be assigned to one of the four regions (A1 to A4) from its amino acid composition: the proteins correctly assigned amounted to 177 out of 195 intracellular proteins, and 94 out of 130 extracellular proteins. The correspondence was about 80% for classification into intracellular and extracellular proteins and 66% for that into the four groups. The folding type also had a significant correlation to the above groups, i.e., intracellular enzymes are rich in α/β proteins, intracellular nonenzymes in α, extracellular enzymes in β and α+β, and extracellular nonenzymes in β. The differences in average composition between intra- and extracellular proteins, and between enzymes and nonenzymes, were related to structural characters, i.e., intracellular proteins contain more amino acids favoring α-helix than extracellular ones, and enzymes contain more hydrophobic amino acids than nonenzymes. The statistics on 213 Cys-containing proteins showed that disulfide bond(s) are found mostly (90%) in the extracellular proteins.
The results indicate that amino acid composition is well correlated to location in an organism, biological function, folding type, and disulfide bonding. The implications of the new findings are discussed from the protein-taxonomical point of view, and the validity of the present method is assessed.
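
The idea of treating a composition as a point in an 18-axis space and assigning it to a group region can be illustrated with a minimal Python sketch. It rests on assumptions that are not stated in the abstract: the 18 axes are taken here to come from pooling Asp/Asn and Glu/Gln (codes B and Z), and the regions A1 to A4 are approximated by a nearest-centroid rule with Euclidean distance. The function names `composition`, `centroid`, and `assign`, and the toy sequences, are hypothetical and serve only as illustration; this is not the procedure used in the study.

```python
from collections import Counter
import math

# ASSUMPTION: the abstract does not list the 18 axes. Here Asp/Asn are pooled
# as "B" and Glu/Gln as "Z", which reduces 20 residue types to 18.
AXES = ["A", "R", "B", "C", "Z", "G", "H", "I", "L",
        "K", "M", "F", "P", "S", "T", "W", "Y", "V"]
POOL = {"D": "B", "N": "B", "E": "Z", "Q": "Z"}


def composition(seq):
    """Return the fractional content along each of the 18 axes."""
    counts = Counter(POOL.get(res, res) for res in seq.upper())
    total = sum(counts[a] for a in AXES)  # residues outside AXES are ignored
    if total == 0:
        raise ValueError("sequence contains no recognised residues")
    return [counts[a] / total for a in AXES]


def centroid(vectors):
    """Average composition of a group of proteins."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(AXES))]


def assign(vec, centroids):
    """ASSUMPTION: nearest centroid stands in for the regions A1 to A4."""
    return min(centroids, key=lambda g: math.dist(vec, centroids[g]))


if __name__ == "__main__":
    # Toy, made-up sequences; not data from the study.
    groups = {
        "BI": ["AAALLKEE", "ALLKKEEA"],
        "BIV": ["GGSSCCTT", "GSCTGSCT"],
    }
    cents = {g: centroid([composition(s) for s in seqs])
             for g, seqs in groups.items()}
    print(assign(composition("AALKEEGL"), cents))
```

In such a scheme, the fraction of proteins that fall into the region of their own group plays the role of the correspondence rates quoted above.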