Author(s): Green H, Wang N
Abstract
Sequence data banks have been searched for proteins possessing uninterrupted reiterations of any amino acid. Hydrophilic amino acids, and particularly glutamine, account for a large proportion of the longer reiterants. In the genes for these proteins, the most common reiterants are those that contain poly(CAG), even out-of-frame or, to a lesser degree, those that contain repeated doublets of CA, AG, or GC. The preferential generation of such reiterants requires that DNA strand-specific signals predispose to reiteration and thus to the extension of coding regions.
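The kind of search the abstract describes can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function names `find_reiterants` and `find_cag_tracts`, the one-letter amino acid input, and the length thresholds are all assumptions made for illustration. The first function reports uninterrupted runs of a single residue in a protein sequence; the second reports poly(CAG) tracts in a DNA strand at any offset, so out-of-frame tracts are included.

```python
import re
from typing import List, Tuple


def find_reiterants(protein: str, min_len: int = 5) -> List[Tuple[str, int, int]]:
    """Return (residue, start, length) for each uninterrupted run of a
    single amino acid that is at least min_len residues long.

    min_len is a hypothetical threshold, not a value taken from the paper.
    """
    if min_len < 2:
        raise ValueError("min_len must be at least 2")
    # One letter followed by min_len - 1 or more copies of the same letter.
    pattern = re.compile(r"([A-Z])\1{%d,}" % (min_len - 1))
    return [
        (m.group(1), m.start(), m.end() - m.start())
        for m in pattern.finditer(protein.upper())
    ]


def find_cag_tracts(dna: str, min_repeats: int = 3) -> List[Tuple[int, int]]:
    """Return (start, repeat_count) for each run of consecutive CAG triplets
    on the given strand. The scan is not tied to a reading frame, so tracts
    that are out-of-frame relative to the coding sequence are also found.

    min_repeats is a hypothetical threshold chosen for illustration.
    """
    if min_repeats < 1:
        raise ValueError("min_repeats must be at least 1")
    pattern = re.compile(r"(?:CAG){%d,}" % min_repeats)
    return [
        (m.start(), (m.end() - m.start()) // 3)
        for m in pattern.finditer(dna.upper())
    ]


if __name__ == "__main__":
    # Toy sequences invented for this example, not real database entries.
    print(find_reiterants("MKQQQQQQAPLSSSSSG"))
    print(find_cag_tracts("ATGCAGCAGCAGCAGTAA"))
```

Only the strand passed in is scanned; to examine the complementary strand, the reverse complement would have to be searched separately, which matters because the abstract's argument concerns strand-specific signals.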
This article was published in Proc Natl Acad Sci U S A and referenced in Journal of Data Mining in Genomics & Proteomics.