Author(s): Shepherd JC
Abstract The periodic variations obtained by correlating the relative positions of purines and pyrimidines (and of the four bases thymine, cytosine, adenine, and guanine) in a wide variety of genomes of wholly or partly known sequence suggest that there may be enough of an earlier comma-free coding system (i.e., only readable in one frame) still present to permit determination of the reading frame and approximate extent of the present protein coding stretches. The characteristics of these variations support the hypothesis that these primitive messages were formed of coding triplets having the form RNY (R = purine; Y = pyrimidine; and N = purine or pyrimidine). The base sequences and reading frames that have a minimal deviation from such a message are still good predictors of actual coding regions and reading frames in spite of the many mutations that have occurred since such a genetic code was last in use. In fact, the right frame for almost all the proteins in a number of viruses and various prokaryotes and eukaryotes is deduced purely from purine/pyrimidine information and not by using the normal start and stop signals.
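The frame-selection idea described in the abstract can be illustrated with a minimal sketch. This is not the author's published procedure: it only assumes that each candidate reading frame is scored by how many codon positions deviate from the RNY pattern (first base a purine, third base a pyrimidine) and that the frame with the fewest deviations is picked. The function names `rny_mismatches` and `best_frame` and the scoring rule are illustrative assumptions.

```python
# Hypothetical sketch: score reading frames by deviation from RNY codons.
# R = purine (A, G); Y = pyrimidine (C, T, or U in RNA); N = any base.
PURINES = set("AG")
PYRIMIDINES = set("CTU")


def rny_mismatches(seq, frame):
    """Return (mismatches, codons) for the given frame (0, 1, or 2).

    A codon contributes one mismatch if its first base is not a purine
    and one more if its third base is not a pyrimidine.
    """
    seq = seq.upper()
    mismatches = 0
    codons = 0
    # Step through complete triplets only.
    for i in range(frame, len(seq) - 2, 3):
        codons += 1
        if seq[i] not in PURINES:
            mismatches += 1
        if seq[i + 2] not in PYRIMIDINES:
            mismatches += 1
    return mismatches, codons


def best_frame(seq):
    """Return (frame, scores), where scores maps each frame to the
    fraction of checked positions that deviate from RNY."""
    scores = {}
    for frame in range(3):
        mismatches, codons = rny_mismatches(seq, frame)
        # Two positions are checked per codon; no codons means no evidence.
        scores[frame] = mismatches / (2 * codons) if codons else float("inf")
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    # A perfect RNY message (GCT repeated) shifted by one leading base.
    frame, scores = best_frame("C" + "GCT" * 5)
    print(frame, scores)
```

On real sequences the scores for the three frames would be much closer together than in this toy input, so a practical version would likely compare them over sliding windows rather than over the whole sequence at once.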
This article was published in Proc Natl Acad Sci U S A and referenced in Journal of Computer Science & Systems Biology.