Mitochondrial Genomes and Frameshift Mutations: Hidden Stop Codons, their Functional Consequences and Disease Associations

Introduction
Mitochondria are the powerhouses of the cell. They are present in virtually every cell in the body and play a central role in metabolism, apoptosis, disease and aging. They are the site of oxidative phosphorylation, which is essential for the production of ATP, and of other biochemical functions. Mitochondria have a genome separate from the nuclear genome, referred to as mitochondrial DNA (mtDNA). mtDNA is one of the most frequently used markers in molecular systematics because of widely shared characteristics such as consistent gene order, maternal inheritance, rapid rate of evolution and haploid nature. In recent years, mtDNA has been studied extensively for evidence of selection. Several interesting functional and evolutionary facts have been revealed by analyses of complete mt genomic sequences over the last two decades [1-7]. With thousands of complete mt genomes now available, comparative mt genomics promises to reveal distinct patterns and processes in the functional genomics and molecular evolution of mitochondria and associated biological entities. Mt genomic studies may give significant insight into many aspects of genome evolution, such as the evolution of gene rearrangements, gene regulation, patterns of gene expression, and replication mechanisms [2,3,8-10].
It is well established that codons are translated at different rates [11]. The first report of non-uniform translation rates was the observation that pauses occur during polypeptide elongation and that these pauses coincide with short strings of rarely used codons [12,13]. Once a significant number of genes and genomes had been sequenced, it became accepted that biased codon usage could regulate the expression levels of individual genes by modulating the rate of polypeptide formation [14-16]. Codon reassignments, premature stop codons, and read-through stop codons occur in protein-coding sequences at various taxonomic levels [17-20]. This ambiguous role of stop codons, together with codon reassignments and their implications, could provide useful insights into the functional association of mt genomic entities with various diseases.

Frameshift Events
Standard translational rules can be altered locally to reprogram mRNA translation; this is termed recoding. Recoding events are site specific and occur in competition with the standard readout of the transcript. The three classes of recoding are (1) frameshifting, (2) bypassing (hopping) and (3) codon redefinition [21-24].
(1) Frameshifting at a particular site can yield two protein products from one coding sequence, or one protein product from two overlapping open reading frames (ORFs). The known cases of frameshifting in which the product is utilized involve shifts of one base, either +1 or -1.
(2) Bypassing (hopping) occurs when a block of nucleotides within a coding sequence is not translated. Translation is temporarily suspended while the ribosome traverses the coding gap, after which protein synthesis resumes, yielding a single protein.
(3) Codon redefinition involves a site-specific alteration of codon meaning, such as the redefinition of an initiation codon or a stop codon to specify an amino acid.
Frameshifts are defined here as protein translations that start not at the first nucleotide of the codon but at the second (+1 frameshift) or the third (-1 frameshift) (Figure 1). Most frameshifts would presumably yield nonfunctional proteins, so frameshifts waste energy, resources and the activity of the biosynthetic machinery. In addition, some peptides synthesized after frameshifts are most likely cytotoxic [25,26]. It will be interesting to evaluate these frameshift products and their consequences for the biological machinery.
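The frame definitions above can be made concrete with a short sketch. It is a minimal illustration, not the procedure used in the analyses discussed here. It assumes the vertebrate mitochondrial genetic code, in which TAA, TAG, AGA and AGG act as stop codons, and it follows the convention stated above that the +1 frame starts at the second nucleotide and the -1 frame at the third. The function name `count_hidden_stops` and the example sequence are hypothetical.

```python
# Stop codons of the vertebrate mitochondrial genetic code (assumed here).
VERT_MT_STOPS = frozenset({"TAA", "TAG", "AGA", "AGG"})


def count_hidden_stops(cds, stops=VERT_MT_STOPS):
    """Count off-frame (hidden) stop codons in a coding sequence.

    The +1 frame is read from the second nucleotide and the -1 frame
    from the third, following the convention used in the text.
    """
    cds = cds.upper()
    counts = {}
    for frame, offset in (("+1", 1), ("-1", 2)):
        n = 0
        # Step through complete triplets of the shifted frame only.
        for i in range(offset, len(cds) - 2, 3):
            if cds[i:i + 3] in stops:
                n += 1
        counts[frame] = n
    return counts


# Hypothetical 15-nt fragment; in frame it reads ATG ATA GCC TTA AGC.
example = "ATGATAGCCTTAAGC"
print(count_hidden_stops(example))  # {'+1': 2, '-1': 0}
```

In this toy fragment the +1 frame contains TAG and TAA, while the -1 frame contains no stop codon. Note that TGA is not counted, because it is not a stop codon under the assumed genetic code.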

Mitochondrial genomes and hidden stop codons
Recent advances in sequencing techniques have made a great deal of whole-genome data available, and complete mt genome sequences now exist for thousands of organisms. Mitochondria encode a small set of highly conserved genes that are critical for respiration. Codon reassignments are pervasive in mitochondrial genomes and have been attributed to directional mutational changes affecting the GC content of genomes [27], to a reduction in the number of mitochondrial tRNAs that minimizes genome size, and to the existence of ambiguous translational mechanisms [28,29]. Knight et al. [30] suggested that codon disappearance and coding ambiguity can act alone but usually act in concert. Most mRNA coding genes exhibit different patterns of nonrandom codon usage. Various studies have suggested that no single, overriding selection process is responsible for the preference in codon usage [31-33]. Codon bias observed in an mRNA primary sequence may be a function of selective preferences for mRNA processing and transport, translational efficiency, and mRNA secondary structure stability [33].
Selection pressure in mt genomes may differ from that in nuclear genomes because mt genomes are much smaller, encode comparatively few proteins, and are extremely biased in nucleotide composition, being AT-rich with few exceptions [30]. Since AT pressure acts on all mt genomes, codon disappearance and reassignment must also have followed some kind of pattern that can be correlated with the GC or AT content of the genomes. Variations in the genetic code are also found in mitochondria. This variation results from reassignments of codons, especially stop codons. A reassignment takes place through the disappearance of a codon from coding sequences, followed by its reappearance in a new role [27].
To evaluate the functional and evolutionary aspects mentioned above, namely directional mutational pressure towards AT-richness in mt genomes, mRNA processing and transport, translational efficiency, and mRNA secondary structure stability, together with their correlation with putative functional and physiological events and their association with hidden stop codons, a detailed analysis of several vertebrate mt genomes was performed. It is hoped that this analysis will provide useful insights into frameshift mutations and their significance in mt genomics.

Off-frame stop codon density, base composition, and ribosomal secondary structure stability
The GC contents of the first, second, and third positions of codons, of tRNAs, of rRNAs and of spacer elements are highly correlated. In fact, GC content accounts for 98% of the variance in coding sequences and 84% of the variance in GC content in rRNA genes [34]. Therefore the effect of GC content on codon disappearance and codon reassignment can be evaluated [30]. Since AT pressure acts on all mt genomes and all mt genomes are AT-rich, the effect of nucleotide composition on hidden stop codons can also be evaluated. I assessed this effect in the complete vertebrate mt genomes studied by plotting the correlation between the potential contribution to hidden stops and the AT and GC contents of the 13 coding sequences of 820 vertebrate mt genomes (Figures 2a and 2b). It is possible that the AT-richness of mt genomes has been supported in part by positive selection for hidden stops, as 66% of the nucleotide positions in the stop codons of the vertebrate mt code are A or T.
A taxonomic comparison of stop codon assignments shows that the +1 frameshift has more common combinations of stop codon assignments than the -1 frameshift (Table 1 in [26]). This suggests that the +1 frameshift might have been optimized through natural selection. This hypothesis is supported by the correlation between codon usage frequencies and the potential contribution of codons to hidden stops in the coding sequences of vertebrate mt genomes. Among the 820 vertebrate mt genomes studied, only one genome, that of Salvelinus alpinus, showed a greater contribution to the -1 frameshift; the remaining genomes showed more hidden stops in the +1 frameshift, supporting the view that hidden stops occur naturally more often in the +1 frameshift than in the -1 frameshift (data not shown).
Another important finding of this analysis concerns base composition pressure and its distribution. In the AT and GC content analysis, the coding sequences of 25 of the 820 vertebrate mt genomes were found to be GC-rich, and 24 of these belong to bony fishes. According to Hickman and Roberts [35], "the bony fishes had developed several key adaptations that contributed to [...]"
According to the ambush hypothesis [25], early termination of off-frame translation should increase the efficiency of expression of a gene, because less time and fewer resources are invested in unproductive off-frame contexts. If the reading frame of the ribosome is not zero, the earlier a stop codon terminates translation, the earlier the mRNA and the ribosome become available for productive interaction. If the ambush hypothesis is correct, hidden stops should be more frequent in large and frequently expressed genes, since the costs of off-frame translation are likely to increase with gene size and expression level [26,36].
The ambush hypothesis implies that the need for hidden stops increases with the probability of frameshifts. It seems plausible that less stable ribosomes are more likely to frameshift and vice versa. To test this assumption, I analyzed the correlation between ribosomal stability, measured as the predicted stability (ΔG) of the rRNA secondary structure [37,38], and the number of hidden stops in mammalian mt genomes. A significant positive correlation (P<0.05) for 12S rRNA supports the assumption that low rRNA stabilities (high ΔG) are associated with high counts of hidden stop codons (Figure 3). For 16S rRNA, on the other hand, the correlation is negative and non-significant. The same kind of analysis was also performed on 100 mammalian mt genomes, where the results were again favorable for 12S rRNA (Figure 4). The calculated correlation coefficients (r) and probabilities (P) are shown in the respective figures.
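The two quantities behind this paragraph, base composition and its correlation with hidden stop counts, can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis behind Figures 2a and 2b: the stop codon set is that of the vertebrate mitochondrial code (TAA, TAG, AGA, AGG), Pearson's r is used as one common correlation measure, and the helper names `at_content`, `stop_codon_at_fraction` and `pearson_r` are hypothetical.

```python
import math

# Stop codons of the vertebrate mitochondrial genetic code (assumed here).
VERT_MT_STOPS = ("TAA", "TAG", "AGA", "AGG")


def at_content(seq):
    """Fraction of A and T among the unambiguous bases of a sequence."""
    seq = seq.upper()
    total = sum(seq.count(base) for base in "ACGT")
    if total == 0:
        return 0.0
    return (seq.count("A") + seq.count("T")) / total


def stop_codon_at_fraction(stops=VERT_MT_STOPS):
    """Fraction of stop-codon nucleotide positions occupied by A or T."""
    positions = "".join(stops)
    return sum(base in "AT" for base in positions) / len(positions)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# 8 of the 12 stop-codon positions are A or T, i.e. about 0.67.
print(round(stop_codon_at_fraction(), 2))
```

Given one AT content value and one hidden stop count per genome, `pearson_r` applied to the two lists would return the correlation coefficient. Assessing significance (a P value) would need an additional statistical test that is not shown here.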

Hidden stops density and gene expression
There is considerable evidence that codon usage bias correlates positively with gene expression level, for example in E. coli [14,39,40], in Saccharomyces cerevisiae [33], and in the nitrogen-fixing endosymbiont Bradyrhizobium japonicum [41]. A positive correlation between codon usage bias and gene size has also been shown in Drosophila [42]; in E. coli, Arabidopsis, Halobacterium and Homo [43]; and in yeast [44]. The diverse rate of protein evolution is a vital problem in molecular evolution, and the best predictor of evolutionary rate is expression level [45]. The cost of off-frame translation is likely to increase for large and highly expressed genes. Several hypotheses, such as translational efficiency, functional loss, and translational robustness, entail that selection can act on the nucleotide sequence, to increase translational accuracy by optimizing codon usage, and on the amino acid sequence, to increase the number of proteins that fold properly [46].
If the ambush hypothesis is correct, then hidden stops should be more frequent in large and frequently expressed genes. To evaluate this assumption, the correlation between the ΔG of mRNA secondary structure and the number of hidden stops was analyzed for the 13 mt coding genes in primates and two near-primate outgroups. There is a negative correlation (P<0.05) between the ΔG of secondary structure of the coding genes and the number of hidden stops in the respective sequences (Table 2 and Figure 5). More stable structures have higher counts of hidden stops and vice versa, which indicates optimization of codon usage for more stable mRNA structures or less evolved proteins. Further, the correlation between the codon adaptation index (CAI) and hidden stops was calculated for these genomes. The results (Table 2) do not support the previous view and indicate a conflict about this event. This kind of analysis on a larger and more diverse data set could provide more insight into the process.
Ozbudak et al. [47] measured rates of transcription and translation of a single fluorescent reporter gene in Bacillus subtilis under different independent conditions to explain the variation in gene expression levels. They introduced artificial variation in the sequence between the promoter region and the initiation codon. They suggested that "increase in translational efficiency will strongly increase the variation in the expression of any naturally occurring gene". Low translation rates will therefore lead to reduced fluctuations in protein concentration. A similar experiment was performed by Gheysen et al. [48] on the expression of the cloned SV40 small-t antigen gene in bacteria. They altered the nucleotide sequence preceding the translational initiation codon.
Noise in gene expression is harmful, as it mangles cell signals, corrupts circadian clocks [49], and disrupts the fine-tuned process of development. Various mechanisms, such as cell signaling pathways [50], developmental switches [51], and autoregulation [52], have evolved to minimize the disruptive effect of such fluctuations. The ambush hypothesis implies that, if the reading frame of the ribosome is not zero, the earlier a stop codon terminates translation, the earlier the mRNA and ribosomes are available to interact correctly; early termination should also reduce off-frame noise. Variation in gene expression levels, or noise strength, shows a strong positive correlation with translational efficiency [47,48]. There is experimental evidence for the independent regulation of transcriptional units in E. coli [53], and for the influence of ribosome- [...] [54]. All of these experiments involve manipulations towards the 5' end of coding sequences.
[Figure caption fragment: (Table 2). Graphs were plotted separately for each species and merged in Origin-lab.]

ΔG of secondary structure
The 5' end of coding sequences can be evaluated computationally by counting hidden stops in the mt coding genes of primates and vertebrates. The analysis was performed by dividing each coding gene into two parts. If the ambush hypothesis is correct, more hidden stops should occur in the 5' portion of the gene, as the costs of off-frame translation are presumably higher when frameshifts occur near the 5' end of coding genes. In all primates (columns 17 and 18, Table 1), hidden stops are more frequent in the first part, supporting the hypothesis. A similar analysis of vertebrate mt coding sequences found more hidden stops in the first half of the coding genes in 820 vertebrates (data not shown), which supports selection for hidden stops near the 5' portion of gene sequences in mt genomes.
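The two-part comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure behind Table 1: the gene is split at its nucleotide midpoint, a hidden stop is assigned to the half in which its first nucleotide lies, counts from the +1 and -1 frames are pooled, and the vertebrate mitochondrial stop codons (TAA, TAG, AGA, AGG) are used. The function name `hidden_stops_by_half` and the example sequence are hypothetical.

```python
# Stop codons of the vertebrate mitochondrial genetic code (assumed here).
VERT_MT_STOPS = frozenset({"TAA", "TAG", "AGA", "AGG"})


def hidden_stops_by_half(cds, stops=VERT_MT_STOPS):
    """Return (first_half, second_half) counts of off-frame stop codons.

    Both shifted frames are scanned: offset 1 (+1 frame) and
    offset 2 (-1 frame), following the convention used in the text.
    """
    cds = cds.upper()
    midpoint = len(cds) // 2  # assumed split point: nucleotide midpoint
    first_half = 0
    second_half = 0
    for offset in (1, 2):
        for i in range(offset, len(cds) - 2, 3):
            if cds[i:i + 3] in stops:
                if i < midpoint:
                    first_half += 1
                else:
                    second_half += 1
    return first_half, second_half


# Hypothetical 18-nt fragment; in frame it reads CTA GCT AAG CCG CCG CCG.
example = "CTAGCTAAGCCGCCGCCG"
print(hidden_stops_by_half(example))  # (2, 0)
```

In this toy fragment both hidden stops (TAG in the +1 frame and TAA in the -1 frame) fall in the 5' half. Other ways of splitting a gene, for example at a codon boundary, would be equally defensible and could change counts near the midpoint.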

Frameshift mutations and their associations with disease and other biological aberrations
Several disease-based studies illustrate the significance of frameshift mutations and their involvement in various diseases and other biological processes. A recent study in E. coli on chromosomal context and mutation sites showed that, in a mismatch repair deficient background, a condition in which the mutation rate reflects the fidelity of the DNA polymerization process, the frameshift mutation rate can vary up to four-fold among different chromosomal contexts. The results of that work show that, even though frameshift mutations can be efficiently generated and/or repaired anywhere in the genome, these processes can be modulated by the chromosomal context surrounding the mutation site [55]. Several other mutational studies on the human DNA repair system could be of interest to the scientific community [56]. Frameshift mutations in release factor 2 (RF2) have been reported in several bacterial species and were found to be associated with the biosynthesis of RF2 [57]. There are also several cases of frameshift mutations in viruses. In a study of bean yellow mosaic virus, frameshift mutations were found to be associated with an overlapping gene of a viral protein, resulting in a polyprotein [58].
There are several examples in which frameshift mutations have been associated with human disorders. In a study of Diamond-Blackfan anemia, the authors reported a frameshift mutation in the p53 regulator RPL26 and its association with multiple physical abnormalities. They also found a specific preribosomal RNA processing defect associated with this putative event [59]. Thirteen patients from 8 families with a retinal dystrophy were studied to screen for mutational effects. The authors reported that mutations in RLBP1 may lead to FAP with cone dystrophy, and concluded that a homozygous frameshift mutation in LRAT causes Retinitis Punctata Albescens [60].
Pendred syndrome (PS) is an autosomal recessive disorder characterized by congenital bilateral sensorineural hearing loss, goiter, and incomplete iodide organification. A genetic investigation in a Korean population revealed compound heterozygous mutations in the SLC26A4 gene: p.R677AfsX11, a novel frameshift mutation, and p.H723R. These findings provide detailed information on the distribution of mutant alleles for PS and could be helpful in future research [61]. Kim et al. [62] found an association between a frameshift mutation of the gene SMARCC2 and gastric and colorectal cancers with microsatellite instability.
There is further evidence that frameshift mutations, or their direct products or by-products, are associated with several biological processes and diseases [63-65]. Studying these mutations and their functional and evolutionary consequences could provide insights into the management of the biochemical processes associated with these diseases. A large amount of biochemical energy could be saved by applying biotechnological techniques to direct these mutations in a positive way, and appropriate manipulation of frameshift mutations could help in the prevention of many diseases.

Conclusion
Analysis of mt genomic data revealed that the frameshift mutation mechanism is associated with many biochemical processes in a genomic context. Strong correlations were found between rRNA stability and hidden stops in mt genomic data. The AT-richness of vertebrate mitochondrial genomes is also consistent with this mechanism. The study of this mechanism will therefore help molecular and evolutionary biologists and biotechnologists to verify various aspects of this evolutionary event and will provide new directions for research in this area. Knowledge of this mechanism will also provide opportunities to examine other evolutionary events and to relate them to it. It is hoped that studies of this kind will serve as a useful complement for analyzing hidden stop codons in all lineages through their respective genetic code systems. They may also help in manipulating biological sequences through their natural biological phenomena, by applying various physiological and biochemical parameters to analyze their impact on natural sequences and to make predictions about them. In summary, hidden stop codons play an important role in evolution, and manipulating them in a biochemical or bioenergetic way could help to increase the efficiency of the biosynthetic machinery.