Moscow State Medical University, Moscow, Russia
Received date: September 28, 2016; Accepted date: January 24, 2017; Published date: January 30, 2017
Citation: Altstein AD (2017) Origin of Biological Nucleic Acids and the First Genetic System (The Progene Hypothesis). Transcriptomics 5:139. doi:10.4172/2329-8936.1000139
Copyright: © 2017 Altstein AD. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
The most widely accepted concept of the origin of nucleic acids and life is the RNA world hypothesis, which is supported by many scientists. There are very strong objections to this hypothesis: the selection of compounds under prebiotic conditions, the processivity of polynucleotide synthesis without protein polymerases, and the origin of the genetic code and translation. To overcome these obstacles, and to explain how the first biological nucleic acid (the first gene) could arise simultaneously with a specific protein (a processive polymerase) and form a bimolecular genetic system, I have proposed an alternative hypothesis (the progene hypothesis). According to this hypothesis, the bimolecular genetic system emerges not from mononucleotides and individual amino acids, but from progenes, namely trinucleotides aminoacylated at the 3’-end by a non-random amino acid (NpNpNp~pX~Aa, where N is a deoxyribo- or ribonucleoside, p is phosphate, X is a bifunctional agent, for example ribose, Aa is an amino acid, and ~ is a macroergic (high-energy) bond). Progenes are the only substrates for the interconnected synthesis of a polynucleotide and a polypeptide. The growth of the “polynucleotide-polypeptide” system is controlled by the enzymatic properties of the growing polypeptide, and the bimolecular genetic system emerges as an extremely rare event. The progene-forming mechanism (NpNp + Np~pX~Aa) makes it possible to explain the emergence of the prebiotic physicochemical group genetic code, as well as the selection of organic compounds for the future genetic system from a racemic, heterogeneous environment. The bimolecular genetic system is reproduced on a progene basis via replication-transcription-translation (the first molecular genetic process), which is similar to its modern counterparts. Nothing is required for the emergence and reproduction of the bimolecular genetic system except progenes and the conditions for their formation, including lipid vesicles and short oligonucleotides (2-6 bases).
Nucleic acids; RNA world hypothesis; Progenes; Progene hypothesis; Selection of compounds; Origin of genetic code; Origin of translation; Origin of life
Abbreviations: Aa: Amino acid; AAN: Aminoacyl nucleotide; DN: Dinucleotide; D, L: D- and L-stereoisomers of nucleotides and amino acids; A*: Diaminopurine nucleotide; T, U, C, A, G: Standard nucleotides
Biological nucleic acids perform several well-known fundamental functions: 1) template; 2) coding of polypeptides; 3) transport of specific activated amino acids; 4) enzymatic (ribozymes); 5) regulatory (small RNAs); 6) structural (ribosomes, chromosomes, virions). It is suggested that the first polynucleotides were synthesized spontaneously from mononucleotides on the prebiotic Earth. According to the RNA world hypothesis, the first living beings (protoorganisms) consisted of RNA with both template and enzymatic functions and no proteins; translation and the genetic code appeared later in evolution.
Biosynthesis of nucleic acids is impossible without proteins, and biosynthesis of polypeptides is impossible without nucleic acids. Which biopolymers appeared first in evolution, nucleic acids or proteins? This is an important part of the mysterious problem of the origin of life, a “chicken-and-egg” enigma. I would like to discuss two different theoretical approaches to resolving it.
What is the first, most primitive genetic system? It is a bimolecular system: a gene (a polynucleotide) and its product, a processive polymerase (Figure 1). The gene encodes both itself and the polymerase; the polymerase catalyzes the synthesis of both the gene and itself. The important question is: what is the nature of the first processive polymerase?
Two hypotheses are possible. 1) The first processive polymerase is a ribozyme: this is the well-known RNA world hypothesis, the most widely accepted idea, supported by many outstanding scientists and included in biology textbooks [1-4]. 2) The first processive polymerase is a protein (the nucleoprotein world hypothesis) [5-10].
There are three general objections against the RNA world hypothesis.
1) Synthesis of long polynucleotides with template function from a racemic mixture of different prebiotic mononucleotides is impossible without stereospecific catalysts. Gerald Joyce called this situation “replicative chaos”. Hence there is a very difficult unsolved problem: the selection of appropriate nucleotides for the synthesis of long polynucleotides with template function under prebiotic conditions.
2) The emergence of the first genetic system is impossible without a processive polymerase (one that moves along a template). Can processive polymerases be made of nucleotides? This problem has been studied for more than 30 years, and no such natural polymerases are known. Natural processive complexes that contain RNA (telomerases, ribosomes) also contain proteins. Some distributive (but not processive) polymerases and ligases made of nucleotides have been discovered or artificially synthesized [13-18]. I suggest that processive polymerases made of nucleotides cannot exist in principle, owing to the complementary interaction between the polymerase and its template.
3) The RNA world hypothesis offers no clear explanation of how the genetic code and translation arose.
This is enough to call the RNA world hypothesis seriously into question.
I have proposed another hypothesis, within the framework of the nucleoprotein world hypothesis, to explain the origin of the first nucleoprotein genetic system. According to this progene hypothesis [9,10], long prebiotic polynucleotides and polypeptides are not synthesized from mononucleotides and individual amino acids. Instead, nucleotides and amino acids are first united into special structures called progenes, and a polynucleotide and a polypeptide are then synthesized from the progenes. The hypothesis tries to explain:
1. Selection of nucleotides and amino acids for the first nucleoprotein genetic system.
2. Emergence of the primitive prebiotic genetic code and analogues of transport RNAs.
3. Substrates for simultaneous synthesis of a polynucleotide with template and coding functions and a polypeptide with processive polymerase properties.
What are progenes? They are trinucleotides aminoacylated with a non-random amino acid on a special 3’-end: NpNpNp~pX~Aa (N is a nucleoside, p is phosphate, X is a bifunctional agent, for example ribose, Aa is an amino acid, and ~ denotes the macroergic (high-energy) bonds necessary for the synthesis of a polynucleotide and a polypeptide).
The central postulate of the hypothesis is the mechanism of progene formation: progenes are formed by combining dinucleotides (NpNp) with aminoacyl nucleotides (Np~pX~Aa) according to the template principle (Figures 2A-2C). As shown in Figure 2A, a dinucleotide and an aminoacyl nucleotide interact through stacking and through a specific interaction between the amino acid and the dinucleotide, forming an unstable “triplet”. The specific amino acid-dinucleotide interaction and the stacking between the three nucleotides are critical for the half-life of this complex. If the complex is stable enough (~10⁻⁹ s), two complementary unstable “triplets” can overlap by Watson-Crick pairing (Figure 2B), and the stability of the whole complex increases drastically (10⁻⁵-10⁻³ s). This complementary interaction favors formation of the phosphodiester bond between the dinucleotide and the aminoacyl nucleotide. Activation of the 3’-hydroxyl substantially increases the probability of phosphodiester bond formation compared with activation of the 5’-hydroxyl [19,20]. The important role of a template in phosphodiester bond formation was shown by Leslie Orgel and his coworkers [21-24]. The result is a progene: a triplet with a non-random amino acid and well-selected nucleotides (Figure 2C). This postulate shows how the interactions between amino acids and dinucleotides, as well as between nucleotides, create conditions for joining a specific amino acid to a trinucleotide. The idea that the amino acid helps nucleotides to join by holding them together was presented earlier [8,25]. The possibility of specific interactions between amino acids and nucleotides has been discussed by many researchers, but it is still considered unproven. The nature of the catalysts for phosphodiester bond formation between the DN and the AAN is unclear; metal ions (Zn²⁺, Fe²⁺ or others) bound to the α-phosphate of the AAN may participate in this process.
Figure 2: The mechanism of progene formation. 1-dinucleotide; 2-aminoacyl nucleotide; 3-amino acid; 4-3' “tail” (p~pX~Aa); 5-complementary H-bonds between “triplets”; 6-phosphodiester bond; 7-stacking between nucleotides; 8-progene. A. Formation of an unstable “triplet” from a dinucleotide (DN) and an aminoacyl nucleotide through stacking and specific interaction between the amino acid (Aa) and the DN. B. Complementary interaction between two unstable “triplets”, which creates the conditions for template-directed phosphodiester bond formation between the 2nd and 3rd (aminoacyl) nucleotides. C. The progene, which arises at stage B after phosphodiester bond formation between the DN and the aminoacyl nucleotide; it contains the nucleotide triplet, an Aa specific for the DN and two macroergic bonds (NpNpNp~pX~Aa).
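The template principle of Figures 2A-2C can be illustrated with a toy sketch. It is an assumption-laden illustration, not a chemical model: nucleotides are letters (T standing for T/U), the amino acid is an arbitrary label supplied by the caller, and the specific amino acid-dinucleotide interaction of stage A is not modelled. The hypothetical function `form_progene` returns a progene only when a complementary (antiparallel Watson-Crick) partner triplet is present to stabilize the complex.

```python
from dataclasses import dataclass
from typing import Optional

# Watson-Crick complements; T stands for T/U
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Sequence that pairs antiparallel with `seq` (both written 5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


@dataclass(frozen=True)
class AminoacylNucleotide:
    base: str        # nucleotide carrying the 3' tail p~pX~Aa
    amino_acid: str  # amino acid on the tail (chemistry not modelled)


@dataclass(frozen=True)
class Progene:
    triplet: str
    amino_acid: str


def form_progene(dinucleotide: str, aan: AminoacylNucleotide,
                 partner_triplet: str) -> Optional[Progene]:
    # Stage A: DN and AAN stack into an unstable "triplet"
    triplet = dinucleotide + aan.base
    # Stage B: the triplet is stabilized only if a complementary triplet pairs with it
    if partner_triplet != reverse_complement(triplet):
        return None  # no pairing: the complex falls apart before joining
    # Stage C: the template-directed phosphodiester bond joins DN and AAN
    return Progene(triplet, aan.amino_acid)


print(form_progene("GC", AminoacylNucleotide("G", "Ala"), "CGC"))
print(form_progene("GC", AminoacylNucleotide("G", "Ala"), "CGA"))
```

The first call yields a GCG progene carrying Ala; the second returns `None` because CGA cannot pair with GCG.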
What can the mechanism of progene formation explain? In general, it addresses three very important obstacles to understanding the origin of the first genetic system: 1) the origin of the primitive prebiotic genetic code and of primitive tRNA analogues; 2) the selection of appropriate compounds for a future genetic system; 3) the substrate for the simultaneous synthesis of a polynucleotide and a polypeptide, and the possibilities for their further replication and translation.
Emergence of the First Bimolecular Genetic System
How does the first genetic system arise from progenes (Figures 3 and 4)? Two progenes meet and form a complex held together by stacking, by the interaction of the first amino acid with the second progene, and by the overlapping of both progenes with a prebiotic oligonucleotide. The N-end of the second amino acid then approaches the activated C-end of the first amino acid, and a dipeptide can form. The first amino acid can now reach the region between the two progenes. If it is a dicarboxylic amino acid, it can catalyze formation of a phosphodiester bond between the progenes (basic catalysis), giving an oligonucleotide of six nucleotides connected to the dipeptide. The process repeats, giving an oligonucleotide of nine nucleotides with a tripeptide. In this way a connection between the nucleotide sequence (the genotype) and the amino acid sequence (the phenotype) is formed; the mechanism of this connection is known to be one of the main problems in the origin of life. The system continues to grow under the control of the growing peptide: if the peptide gets “better”, the system grows; if it gets “worse”, growth stops. As an extremely rare event, an appropriate enzyme (a processive polymerase, or progene ligase) forms around its substrate, and a bimolecular genetic system arises.
The first bimolecular genetic system, consisting of a polynucleotide gene and a polypeptide enzyme (a processive polymerase), is the result of a random combination of progenes selected during growth of the system.
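The coupling of genotype and phenotype described above can be sketched as a loop in which every joined progene adds three nucleotides and one amino acid. This is a minimal sketch under strong assumptions: progenes are reduced to (triplet, amino acid) pairs, and the "better/worse" judgement is a hypothetical caller-supplied predicate (`still_good`); the example criterion `acidic_start_and_short` is invented for illustration only.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Tuple

# A progene reduced to (trinucleotide, amino acid); energetics are ignored
Progene = Tuple[str, str]


@dataclass
class NucleoproteinSystem:
    polynucleotide: str = ""                              # the genotype
    polypeptide: List[str] = field(default_factory=list)  # the phenotype

    def join(self, progene: Progene) -> None:
        triplet, amino_acid = progene
        self.polynucleotide += triplet        # phosphodiester bond: +3 nucleotides
        self.polypeptide.append(amino_acid)   # peptide bond: +1 residue


def grow(supply: Iterable[Progene],
         still_good: Callable[[List[str]], bool]) -> NucleoproteinSystem:
    """Join progenes one by one while the growing peptide stays 'good'."""
    system = NucleoproteinSystem()
    for progene in supply:
        if not still_good(system.polypeptide + [progene[1]]):
            break  # the peptide would get "worse": growth stops
        system.join(progene)
    return system


def acidic_start_and_short(peptide: List[str]) -> bool:
    # Hypothetical criterion: a dicarboxylic first residue (for basic catalysis)
    # and, purely for the demo, at most three residues
    return peptide[0] in {"Asp", "Glu"} and len(peptide) <= 3


supply = [("CAC", "Asp"), ("GCG", "Ala"), ("GTG", "Val"), ("CGC", "Ser")]
result = grow(supply, acidic_start_and_short)
print(result.polynucleotide, result.polypeptide)
```

Whatever predicate is used, the polynucleotide length stays exactly three times the peptide length, which is the genotype-phenotype link the text describes.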
How does the first genetic system self-replicate on the basis of progenes? Two progenes associate with the 3’-end of the template according to the complementarity principle (Figure 5). The polymerase catalyzes formation of the peptide bond, then of the phosphodiester bond, and moves along the template. The next progene arrives and the process repeats. Because the first template is a minus strand, an “incorrect” protein and a plus strand are synthesized. The polymerase then moves to the 3’-end of the plus strand and travels toward its 5’-end, synthesizing a new molecule of the polymerase and a new minus strand. Two polymerase molecules then move along the minus strands, two plus strands are synthesized, and so on. The system is thus reproduced on principles very similar to replication, transcription and translation in modern genetic systems, but the three main modern template processes are united in a single process, “replication-transcription-translation” (RTT). It is the first molecular genetic process, and it uses only one enzyme.
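One RTT round can be sketched as reading a template and producing both a complementary strand and the peptide carried by the progenes that built it. Assumptions, all for illustration: strands are written 5'->3' with length a multiple of 3, T stands for T/U, each progene's amino acid is reduced to the Table 1 group of its middle nucleotide, and the function name `rtt` is hypothetical.

```python
from typing import List, Tuple

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
# Middle nucleotide -> amino acid group (the four groups of Table 1)
GROUP_BY_MIDDLE = {"T": "hydrophobic", "C": "small", "A": "dicarboxylic", "G": "hydroxy"}


def reverse_complement(strand: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(strand))


def rtt(template: str) -> Tuple[str, List[str]]:
    """Pair progenes with the template triplet by triplet; the joined progenes
    give the new strand, and their amino acids give the peptide."""
    if len(template) % 3:
        raise ValueError("template length must be a multiple of 3")
    new_strand = reverse_complement(template)
    triplets = [new_strand[i:i + 3] for i in range(0, len(new_strand), 3)]
    peptide = [GROUP_BY_MIDDLE[t[1]] for t in triplets]
    return new_strand, peptide


minus = "GCGCAC"
plus, incorrect_protein = rtt(minus)   # first round on the minus strand
new_minus, polymerase = rtt(plus)      # second round on the plus strand
print(plus, incorrect_protein)
print(new_minus, polymerase)
```

Copying the minus strand gives a plus strand plus an "incorrect" peptide; copying the plus strand regenerates the minus strand together with the peptide encoded by it.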
Nothing is required for the emergence and replication of the first polynucleotide-polypeptide genetic system except progenes and the conditions for their synthesis.
Such a system can clearly evolve according to Darwin’s principle of “heredity, variability (mutations, gene duplication), natural selection”.
What is necessary under prebiotic conditions for progene formation and joining? The following components must be available locally in the prebiotic world: 1) nucleotides, amino acids, sugars and lipids; 2) special aminoacyl nucleotides (Np~pX~Aa); 3) activation of nucleotides and amino acids; 4) oligonucleotides of 2-6 bases; 5) lipid vesicle formation and microconcentration of organic substances; and, naturally, 6) water and mineral salts. There is an extensive literature on the synthesis of organic compounds under conditions imitating prebiotic ones [26-30]. Of course, many problems in this area remain unsolved.
A few words now on the selection of compounds for the genetic system. Only substances (nucleotides and amino acids) compatible with the progene formation principle are selected. Trinucleotides are very weak templates, which is why progene formation selects only the most suitable nucleotides. Two requirements have to be met: 1) a maximal number of hydrogen bonds between two complementary progenes (no fewer than 8); 2) optimal stacking. The G-C complementary pair (3 hydrogen bonds) is necessary (the main pair), and these strong nucleotides have to prevail in progenes (no fewer than 2 per progene). The A-T/U pair (2 hydrogen bonds) is additional, no more than 1 per progene. Optimal stacking in two complementary triplets requires alternation of purine and pyrimidine nucleotides. If these rules are respected, only 8 of the 64 possible progene variants could prevail under prebiotic conditions: GTG; GCG, GCA, ACG; CAC; CGC, CGT/U, T/UGC.
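The two selection rules can be applied mechanically to all 64 triplets. In this sketch T stands for T/U, and a triplet's score is the number of hydrogen bonds it would make with its complementary partner (3 per G-C pair, 2 per A-T/U pair). A score of at least 8 is equivalent to "at least two G-C pairs and at most one A-T/U pair"; alternation means the triplet is purine-pyrimidine-purine or pyrimidine-purine-pyrimidine. The function names are illustrative.

```python
from itertools import product
from typing import List, Sequence

PURINES = {"A", "G"}
# H-bonds of the Watson-Crick pair a base forms: G-C = 3, A-T/U = 2
HBONDS = {"G": 3, "C": 3, "A": 2, "T": 2}


def alternates(triplet: Sequence[str]) -> bool:
    """True for purine-pyrimidine alternation (RYR or YRY)."""
    is_purine = [base in PURINES for base in triplet]
    return is_purine[0] != is_purine[1] and is_purine[1] != is_purine[2]


def pair_hbonds(triplet: Sequence[str]) -> int:
    """Hydrogen bonds between the triplet and its complementary triplet."""
    return sum(HBONDS[base] for base in triplet)


def favoured_progenes(min_hbonds: int = 8) -> List[str]:
    return ["".join(t) for t in product("ACGT", repeat=3)
            if alternates(t) and pair_hbonds(t) >= min_hbonds]


print(favoured_progenes())
# ['ACG', 'CAC', 'CGC', 'CGT', 'GCA', 'GCG', 'GTG', 'TGC']
```

The result matches the eight variants listed above. It is also closed under complementarity (ACG/CGT, CAC/GTG, CGC/GCG, GCA/TGC), so every favoured progene has a favoured partner to pair with.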
There is an analog of the adenyl nucleotide, the 2-aminoadenyl (2,6-diaminopurine) nucleotide (A*), which can form three hydrogen bonds with T or U. Diaminopurine nucleotides may have been present in the prebiotic medium together with adenine nucleotides [31,32]. Diaminopurine nucleotides would have had an important advantage over adenine nucleotides for progene formation. In this case the rule of strong and weak nucleotides is cancelled, because all pairs (G-C and A*-T/U) are strong, but the rule of purine-pyrimidine alternation is kept. Hence 16 progene variants would prevail. Diaminopurine nucleotides may have been present in the first nucleic acid instead of (or together with) adenyl ones. The more available A replaced A* after new appropriate enzymes arose in early evolution. This replacement was easy because diaminopurine and adenine have identical coding properties (see below).
As mentioned above, progenes are activated at the 3’-end. Why not at the 5’-end, as in modern biochemistry? Activation of the 3’-end is more efficient for phosphodiester bond formation, but it implies that progenes are built mainly from 3’-phosphodeoxynucleotides rather than ribonucleotides, because in ribonucleotides the activated 3’-phosphate would form a cyclic phosphate with the 2’-hydroxyl.
Now about the origin of the chiral purity of nucleic acids. Chiral purity of a nucleic acid is known to be an absolute requirement for its template function. It was shown in Leslie Orgel’s laboratory that L-nucleotides can be incorporated into a D-chain on a D-template in the enzyme-free polymerization reaction. Incorporation of an “incorrect” nucleotide inhibits further elongation of the chain (the stereoinhibition phenomenon). The progene hypothesis proposes three levels of protection against incorporation of “incorrect” nucleotides: 1) most progenes are chirally pure (D or L); 2) progenes of different chirality cannot be joined during formation of the first genetic system (owing to stereoinhibition); 3) the emergence of the polymerase (progene ligase) helps to select progenes with the correct chirality. L-amino acids may correspond better to D- than to L-nucleotides, and vice versa; this question requires further investigation. The modern chirality (D-sugars and L-amino acids) is the result of the chance fixation of one of the two situations during formation of the first genetic system.
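Level 2 (stereoinhibition) can be illustrated with a toy model. Assumptions, labelled as such: units are described only by their chirality ("D" or "L"); an incorrect unit is incorporated but blocks further growth; and, for the expectation, each addition is independently correct with probability `p_correct`. Under these assumptions the number of correct units added before termination is geometric, with mean p/(1-p).

```python
from typing import Iterable, List


def elongate(chain_chirality: str, incoming: Iterable[str]) -> List[str]:
    """Add units ('D' or 'L') to a chain until the first unit of the opposite
    chirality; that unit is incorporated but blocks further growth."""
    added: List[str] = []
    for unit in incoming:
        added.append(unit)
        if unit != chain_chirality:
            break  # stereoinhibition: elongation stops here
    return added


def expected_correct_units(p_correct: float) -> float:
    """Mean number of correct units added before the first incorrect one,
    assuming each addition is independently correct with probability p_correct."""
    if not 0.0 <= p_correct < 1.0:
        raise ValueError("p_correct must be in [0, 1)")
    return p_correct / (1.0 - p_correct)


print(elongate("D", ["D", "D", "L", "D"]))  # ['D', 'D', 'L']
print(expected_correct_units(0.5))          # 1.0
```

In a racemic pool (p = 0.5) the toy chain gains on average only one correct unit before it is blocked, which illustrates why chirally pure building blocks (level 1) matter so much.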
The progene hypothesis offers its own approach to the mechanisms by which the first prebiotic code arose. It postulates a specific interaction between the amino acid of an aminoacyl nucleotide and a dinucleotide during progene formation. A simple stereochemical analysis with atomic models was carried out. Trinucleotides were taken in the B-form, with an amino acid at the 3’-end (NpNpNpppAa). The positively charged N-end of every amino acid was found to be able to make a standard interaction with the negatively charged oxygen of the phosphate group of the first nucleotide of the dinucleotide. This standard interaction is the same for all amino acids, but their side groups are directed toward the second base of the dinucleotide (the central base of the triplet) and interact with it differently. Weak interactions between Aa side groups and dinucleotides (DN) can be observed: hydrophobic interactions, van der Waals contacts and hydrogen bonds. Dehydration of hydrogen-bond donors and acceptors has to be taken into account, because it is an energetic loss. The main results of the stereochemical analysis are: 1. Interaction between Aa and DN takes place in the major groove. 2. All atomic groups participating in the interaction can be identified. 3. The second (central) nucleotide is the most important for interaction specificity. 4. DNs can be found that are optimal for interaction with defined groups of Aa.
The stereochemical analysis showed good correspondence with the contemporary genetic code for all ten amino acids that are abundant or moderately abundant under prebiotic conditions: Gly, Ala, Val, Asp, Ser, Leu, Ile, Pro, Glu, Thr. For the ten minor Aa, the results differed: 1) Arg, Met, Phe: 3/3, good correspondence; 2) Tyr, Trp, His, Lys: 0/4, complete discrepancy; 3) Cys, Asn, Gln: indeterminate results. Apparently the minor amino acids were absent from the first proteins and were not represented in the prebiotic genetic code.
On the basis of the progene hypothesis and the results of the stereochemical analysis, a table of a prebiotic group physicochemical 8-codon genetic code can be obtained. Four groups can be identified (Table 1): 1) the middle nucleotide of the codon is T, which codes for hydrophobic amino acids; T is important for hydrophobic interaction because of its 5-CH3 group and is preferred over U; there are no conditions for coding hydroxy and dicarboxylic amino acids; 2) the middle nucleotide is C (weak specificity), which codes for small amino acids (polar and nonpolar); 3) the middle nucleotide is A, which codes for dicarboxylic amino acids through a hydrogen bond with the 6-NH2 group, without dehydration of N7; hydroxy and hydrophobic amino acids cannot be coded because of dehydration of these groups; 4) the middle nucleotide is G, which codes for hydroxy amino acids through a hydrogen bond with N7 of guanine; there is no good opportunity for interaction with hydrophobic and dicarboxylic amino acids.
|Middle nucleotide||T||C||A||G|
|Codons||GTG||GCG, GCA, ACG||CAC||CGC, CGT/U, T/UGC|
|Amino acids||Hydrophobic Aa: Val, n-Val, Leu, n-Leu, Ile, a-But||Small polar and nonpolar Aa: Ala, Gly (Asp, Ser)||Dicarboxylic Aa: Asp, Glu||Hydroxy Aa: Ser, Thr|
Table 1: First prebiotic genetic code (8 codons, physicochemical groups).
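The grouping behind Table 1 can be reproduced by taking the eight favoured progenes and sorting them by their middle nucleotide. This sketch hardcodes those eight triplets (T standing for T/U) and the group labels from the text; the function name `group_code` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# The eight favoured progenes named in the text (T stands for T/U)
PROGENES = ["GTG", "GCG", "GCA", "ACG", "CAC", "CGC", "CGT", "TGC"]

# Middle nucleotide -> amino acid group, following the four groups of Table 1
GROUP_BY_MIDDLE = {
    "T": "hydrophobic",
    "C": "small polar and nonpolar",
    "A": "dicarboxylic",
    "G": "hydroxy",
}


def group_code(progenes: List[str]) -> Dict[str, List[str]]:
    """Assign each codon to the group coded by its middle nucleotide."""
    table: Dict[str, List[str]] = defaultdict(list)
    for codon in progenes:
        table[GROUP_BY_MIDDLE[codon[1]]].append(codon)
    return dict(table)


for group, codons in group_code(PROGENES).items():
    print(f"{group}: {', '.join(codons)}")
# hydrophobic: GTG
# small polar and nonpolar: GCG, GCA, ACG
# dicarboxylic: CAC
# hydroxy: CGC, CGT, TGC
```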
If the diaminopurine nucleotide (A*) was used for progene formation instead of the adenine nucleotide (A), there would be four codons in each column (Table 2). It is quite possible that the first bimolecular genetic system was based on A* nucleotides instead of adenine nucleotides. Diaminopurine forms three hydrogen bonds with T and U (which is important for progene formation), and its coding properties are the same, because all the atomic groups of A and A* responsible for interaction with amino acids (6-N, N7, CH8) are the same. It would have been easy to replace A* with A in early evolution, once progene formation was based not only on physicochemical mechanisms but also on a future coded polypeptide enzyme (a progene synthetase).
|Middle nucleotide||T||C||A*||G|
|Amino acids||Hydrophobic Aa: Val, n-Val, Leu, n-Leu, Ile, a-But||Small polar and nonpolar Aa: Ala, Gly (Asp, Ser)||Dicarboxylic Aa: Asp, Glu||Hydroxy Aa: Ser, Thr|
Table 2: The first prebiotic genetic code with 2,6-diaminopurine nucleotide (A*) instead of A.
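The "four codons per column" claim follows from the same rules when A* fully replaces A. In this sketch "A*" is treated as a single token, every pair (G-C and A*-T/U) is assumed to contribute 3 hydrogen bonds, so the hydrogen-bond rule is always satisfied and only purine-pyrimidine alternation filters the triplets. The function name is hypothetical, and the codon lists are derived from the rules rather than taken from the original table.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, List, Sequence

# "A*" is 2,6-diaminopurine; in this sketch it replaces A entirely (T stands for T/U)
BASES = ("A*", "C", "G", "T")
PURINES = {"A*", "G"}
HBONDS = {"A*": 3, "T": 3, "G": 3, "C": 3}  # A*-T/U and G-C both have 3 H-bonds


def alternates(triplet: Sequence[str]) -> bool:
    is_purine = [base in PURINES for base in triplet]
    return is_purine[0] != is_purine[1] and is_purine[1] != is_purine[2]


def favoured_progenes_with_diaminopurine(min_hbonds: int = 8) -> Dict[str, List[str]]:
    """Favoured triplets grouped by their middle nucleotide."""
    by_middle: Dict[str, List[str]] = defaultdict(list)
    for t in product(BASES, repeat=3):
        if alternates(t) and sum(HBONDS[b] for b in t) >= min_hbonds:
            by_middle[t[1]].append("".join(t))
    return dict(by_middle)


table2 = favoured_progenes_with_diaminopurine()
for middle in ("T", "C", "A*", "G"):
    print(middle, table2[middle])
```

The sketch yields 16 triplets, four for each middle nucleotide, consistent with the count given in the text.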
The progene hypothesis can explain many peculiarities of the modern genetic code, which evolved from the first primitive prebiotic code:
• Tripletness (the progenes are triplets)
• Degeneracy (the third nucleotide of the progenes has low specificity)
• Higher specificity of the first two nucleotides (doublet) of codons (the stereochemical analysis)
• The greatest importance of the second nucleotide (the stereochemical analysis)
• Absence of commas (the mechanism of genetic system formation from the progenes)
• T encodes hydrophobic Aa (the stereochemical analysis)
• C encodes small polar and nonpolar Aa (the stereochemical analysis)
• A encodes dicarboxylic Aa (the stereochemical analysis)
• G encodes Ser, Arg (the stereochemical analysis)
• T and C preferentially code nonpolar Aa (7/9), A and G polar Aa (11/12) (the stereochemical analysis)
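The 7/9 and 11/12 counts in the last item can be checked against the modern standard genetic code (NCBI translation table 1) by collecting the amino acids whose codons have a given middle nucleotide. The polarity split is an assumption: Ala, Val, Leu, Ile, Met, Phe, Trp and Pro are counted as nonpolar, and Gly and Cys as polar. Textbooks differ on Gly, Cys and Trp, so other conventions give different fractions.

```python
# Standard genetic code (NCBI translation table 1), one-letter amino acids,
# codons ordered TTT, TTC, TTA, TTG, TCT, ... with bases in the order T, C, A, G
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Assumed polarity convention (see lead-in): only these count as nonpolar
NONPOLAR = set("AVLIMFWP")


def amino_acids_with_middle(middles: str) -> set:
    """Distinct amino acids (stop codons excluded) whose codons have one of
    the given middle nucleotides."""
    return {aa for codon, aa in CODE.items() if codon[1] in middles and aa != "*"}


tc = amino_acids_with_middle("TC")
ag = amino_acids_with_middle("AG")
print(f"T/C middle: {len(tc & NONPOLAR)}/{len(tc)} nonpolar")  # 7/9
print(f"A/G middle: {len(ag - NONPOLAR)}/{len(ag)} polar")     # 11/12
```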
The progene hypothesis proposes a framework for a plausible explanation of the origin of the first biological nucleic acids, the first evolvable genetic system (and hence, the origin of life).
The postulated mechanism of the progene formation and its consequences (the selection of compounds and origin of the primitive genetic code) can be tested in chemical experiments and stereochemical studies.