ISSN: 2329-9002
Journal of Phylogenetics & Evolutionary Biology

The GC Content of Bacterial Genomes

Luciano Brocchieri*
Department of Molecular Genetics & Microbiology and Genetics Institute, University of Florida, Gainesville FL, USA
Corresponding Author : Luciano Brocchieri
Department of Molecular Genetics & Microbiology and Genetics Institute
University of Florida, Gainesville FL, USA
Tel: +1 352 273 8131
E-mail: [email protected]
Received March 30, 2014; Accepted March 31, 2014; Published April 10, 2014
Citation: Brocchieri L (2014) The GC Content of Bacterial Genomes. J Phylogen Evolution Biol 2:e108. doi:10.4172/2329-9002.1000e108
Copyright: © 2014 Brocchieri L. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Bacterial genomes exhibit a wide range of compositional diversity, most conspicuously in genome GC content, which ranges across organisms from as low as 17% to as high as 75%. The nature of the biological processes underlying these differences has long been debated, and two polarizing interpretations have been advanced: one proposing that GC content is driven by genome-specific mutational biases (the mutational hypothesis), the other that it reflects different selective processes in different organisms (the selectionist hypothesis). The hypothesis that differences in GC content are mostly driven by species-specific mutational biases [1] implies that the smallest variation in GC content across genomes should be seen at positions that are most constrained by any form of purifying selection, and conversely that the greatest variation should be observed at positions that are functionally neutral. Differences in GC content among prokaryotic genomes largely reflect, or are driven by, the GC content of protein-coding sequences, which usually occupy the majority of the genome. When the GC content at the three codon positions of genes (GC1, GC2, GC3) is considered separately, typical patterns are observed (Figure 1-A). The GC content of all positions varies roughly linearly with the overall GC content of the genes, but variation in the first two codon positions, and especially in the second codon position, is much reduced compared to the variability observed in third codon positions, where the GC content spans across species almost all possible values, from close to GC3 = 0.0 to almost GC3 = 1.0.
These differences in variability are consistent with the constraints expected from the relation between codons and amino acids [2]: first and second codon positions mostly determine the amino acid type (with the second codon position mostly determining the physico-chemical properties of the amino acid), whereas changes at the third codon position are mostly either synonymous or replace an amino acid with one of similar properties. Interestingly, the GC content of intergenic regions closely correlates with the GC content of the coding sequences (Figure 1-B), and it varies across genomes to approximately the same extent as it does in coding regions, and thus much less than in third codon positions.
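As an illustration of the quantities GC1, GC2 and GC3 discussed above, the following minimal Python sketch computes the GC content at each codon position of a single in-frame coding sequence. The function name `codon_position_gc` and the choices to skip ambiguous bases and to ignore a trailing partial codon are assumptions made for this example; this is not the procedure used to produce Figure 1.

```python
def codon_position_gc(cds):
    """Return (GC1, GC2, GC3) for an in-frame coding sequence.

    Assumptions of this sketch: the sequence starts at codon position 1,
    a trailing partial codon is ignored, and bases other than A, C, G, T
    are skipped.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # drop an incomplete final codon
    gc_counts = [0, 0, 0]
    base_counts = [0, 0, 0]
    for k in range(usable):
        base = seq[k]
        if base not in "ACGT":
            continue  # skip ambiguous symbols such as N
        pos = k % 3  # 0, 1, 2 correspond to codon positions 1, 2, 3
        base_counts[pos] += 1
        if base in "GC":
            gc_counts[pos] += 1
    # A position with no usable bases yields NaN rather than an error.
    return tuple(
        g / n if n else float("nan")
        for g, n in zip(gc_counts, base_counts)
    )


if __name__ == "__main__":
    # Toy sequence (codons ATG GCC AAA TAA), not real data.
    print(codon_position_gc("ATGGCCAAATAA"))  # (0.25, 0.25, 0.5)
```

For a genome-wide value, the counts would be pooled over all annotated genes before dividing, rather than averaging per-gene fractions; which of the two the article used is not stated.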
A simple toy model relating mutational bias to codon compositional substitutability can be advocated to explain the overall contrasts and variability in GC content observed between codon positions of different genomes. In this model, coding regions are represented as sequences formed from a two-letter alphabet {S, W} in which bases are identified either as Strong (S = G or C) or as Weak (W = A or T). Each sequence position is assumed either to evolve freely by substitution between S and W states, or to be constrained by purifying selection either in state S or in state W. Each of the three codon-base-positions i (i = 1, 2, or 3) is then characterized by a codon-position-specific fraction $s_i$ of sites constrained to be of type S, a fraction $w_i$ of sites constrained to be of type W, and a fraction $v_i$ of freely variable sites, with $s_i + w_i + v_i = 1$. The idea that a position is either variable or constrained independently of the state of the neighboring positions is an obvious simplification, but we assume that the approximation is sufficient to capture the compositional properties of codons we are interested in. Another assumption is that the fractions of constrained and variable sites do not depend on the genome, i.e., all genomes have identical frequencies of constrained and variable sites. This assumption may be violated more significantly, for example, in genomes that deviate more strongly from the linear relations of GC contents (Figure 1-A), such as AT-rich small genomes of very reduced gene content. We finally assume that, within a genome, sequences evolve under the pressure of a homogeneous mutational process characteristic of each genome, defined by two substitution rates, $\mu_{W \to S}$ for substitutions W → S and $\mu_{S \to W}$ for substitutions S → W. The equilibrium frequencies corresponding to this substitution model are $\pi_S = \mu_{W \to S} / (\mu_{W \to S} + \mu_{S \to W})$ for nucleotide type S, and $\pi_W = 1 - \pi_S = \mu_{S \to W} / (\mu_{W \to S} + \mu_{S \to W})$ for nucleotide type W.
Since we assume that any mutation occurring at a constrained site is removed by purifying selection, the mutational process results in substitutions only at variable positions, thus affecting only the fraction $v_i$ of codon-base-positions i. At equilibrium, the GC content at codon-base-position i will be:
$$S_i = s_i + v_i \, \pi_S$$
and the total GC content of the coding sequence will be the average of the GC content at each codon-base-position:
$$S = \frac{1}{3} \sum_{i=1}^{3} S_i = s + v \, \pi_S, \qquad s = \frac{1}{3} \sum_{i=1}^{3} s_i, \qquad v = \frac{1}{3} \sum_{i=1}^{3} v_i$$
From these relations, the GC content at each of the three codon-base-positions can be expressed as a linear function of the total GC content S:
$$S_i = \left( s_i - \frac{v_i}{v} \, s \right) + \frac{v_i}{v} \, S$$
where $s$ and $v$ are the fractions of S-constrained and variable sites in coding regions, respectively. From the observed relations between $S_i$ and $S$, we can infer the fractions of variable and constrained sites in the three codon-base positions, the equilibrium frequencies at variable sites, and the rates of substitution in different genomes. From the distribution among genomes of GC content in third codon position, spanning almost all possible values between 0.0 and 1.0, we deduce that the fraction of variable sites in third codon position is close to $v_3 = 1$. The fractions of S, W, and V sites at all three codon positions can be estimated (Table 1) from the equations above and from the coefficients of the linear regressions obtained from the data (Figure 1-A). A similar model can also be applied to intergenic regions, suggesting that these regions harbor about 4-5% more W-constrained positions than coding regions, including, e.g., AT-rich promoters, and a fraction of variable sites similar to the overall fraction estimated for coding regions (Table 1), and thus much smaller than in third codon positions. The model also predicts that the highest possible GC content of genomic coding regions is 75.7% ($S = s + v$, reached when $\pi_S = 1$), consistent with observations, and that the lowest is 20.0% ($S = s$, reached when $\pi_S = 0$). The existence of genomes with coding regions of GC content lower than 20% can be explained by assuming that these genomes have evolved different fractions of variable and constrained regions. This is not an unrealistic assumption, since genomes with the lowest GC content are also very reduced in size and in number of genes [3].
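The linear relations of the toy model can be sketched in a few lines of Python. Here `s` and `v` denote the overall fractions of S-constrained and freely variable sites in coding regions, and `s_i`, `v_i` the corresponding fractions at one codon position; these symbol names, and the function names, are assumptions of this sketch. The two aggregate values are inferred from the GC bounds quoted in the text (lowest coding GC 20.0%, highest 75.7%), not read from Table 1, whose per-position estimates are not reproduced here.

```python
# Aggregate fractions implied by the bounds quoted in the text, under the
# model's reading that the lowest coding GC equals s and the highest s + v.
GC_MIN = 0.200                  # s: S-constrained fraction (assumed reading)
GC_MAX = 0.757                  # s + v
S_FRACTION = GC_MIN
V_FRACTION = GC_MAX - GC_MIN    # v, about 0.557


def equilibrium_gc_frequency(total_gc, s=S_FRACTION, v=V_FRACTION):
    """Equilibrium frequency of S (G or C) at freely variable sites,
    obtained by inverting total_gc = s + v * pi_S."""
    return (total_gc - s) / v


def position_gc(total_gc, s_i, v_i, s=S_FRACTION, v=V_FRACTION):
    """GC content at one codon position: s_i + v_i * pi_S."""
    return s_i + v_i * equilibrium_gc_frequency(total_gc, s, v)


def regression_coefficients(s_i, v_i, s=S_FRACTION, v=V_FRACTION):
    """(intercept, slope) of the line relating position GC to total GC."""
    slope = v_i / v
    return s_i - slope * s, slope


if __name__ == "__main__":
    # A fully variable third position (s_3 = 0, v_3 = 1) as the limiting case.
    for gc in (0.30, 0.50, 0.70):
        print(gc, round(position_gc(gc, s_i=0.0, v_i=1.0), 3))
```

With a fully variable third position, the sketch gives GC3 = 0 at a total coding GC of 20.0% and GC3 = 1 at 75.7%, matching the statement that GC3 spans nearly the whole range across genomes. Going the other way, an observed regression slope and intercept for a codon position can be converted back to `v_i = slope * v` and `s_i = intercept + slope * s`.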
The ratio R of mutational rates, $R = \mu_{S \to W} / \mu_{W \to S}$ (the rate of S → W substitutions over the rate of W → S substitutions), in coding regions of different GC content, S, can be derived as:
$$R = \frac{s + v - S}{S - s}$$
with $s$ and $v$ the fractions of S-constrained and variable sites in coding regions. This relation between mutation-rate ratio and gene GC content (Figure 2) suggests that in genes of the lowest GC content the mutational rate towards AT is orders of magnitude higher than the rate towards GC. The very biased rate of mutation towards AT predicted for AT-rich coding regions is consistent with experimental analyses of mutational rates in repair-deficient constructs of Salmonella typhimurium [3] and with the deficiency of repair enzymes observed in AT-rich intracellular parasites and endosymbionts of reduced genome size. Conversely, the model predicts higher mutational rates towards GC bases in coding regions of the highest GC content (GC = 0.757), in which only W → S mutations are predicted to occur and $R = 0$. However, evidence that this is not the case has recently been provided by the works of Hershberg and Petrov [4] and of Hildebrand and co-workers [5]. Hershberg and Petrov [4] analyzed mutations in five clonal pathogens spanning a wide range of GC content and with no evidence of deficiencies in repair systems, and found that mutations were biased towards AT even in bacteria of high GC content. They concluded that mutation is universally biased towards AT independently of GC content, and that a high GC content must be maintained by selection (or by selection-like processes). Similarly, Hildebrand and co-workers [5] examined mutations at 4-fold degenerate codon positions in a dataset of 149 phylogenetically diverse species, and also found a large excess of synonymous GC → AT mutations over AT → GC mutations in all but the most AT-rich bacteria. These data strongly suggest that variations of GC content across prokaryotic genomes are determined by selection or selection-like processes, with the weakest constraints against the prevailing mutational bias observed in parasitic bacteria evolving under relaxed-selection conditions [6,7]. Since compositional biases extend to intergenic regions, they seem not to be related to codon usage.
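The dependence of the mutation-rate ratio on coding GC content can be explored numerically with the sketch below. It assumes that R is oriented as the rate towards AT (S → W) divided by the rate towards GC (W → S), and that the model's lower and upper GC bounds are the 20.0% and 75.7% quoted in the text; the function name `rate_ratio` and its parameters are hypothetical. Under these assumptions R = (GC_max - S) / (S - GC_min).

```python
import math


def rate_ratio(total_gc, gc_min=0.200, gc_max=0.757):
    """Ratio of the S->W (towards AT) rate to the W->S (towards GC) rate
    implied by a total coding GC content under the toy model.

    Assumed orientation: R > 1 means mutation is biased towards AT.
    """
    if not gc_min <= total_gc <= gc_max:
        raise ValueError("GC content outside the range allowed by the model")
    if total_gc == gc_min:
        return math.inf  # no W->S substitutions at the lower bound
    return (gc_max - total_gc) / (total_gc - gc_min)


if __name__ == "__main__":
    for gc in (0.21, 0.25, 0.35, 0.50, 0.65, 0.75):
        print(f"GC = {gc:.2f}  R = {rate_ratio(gc):.3f}")
```

In this parametrization the ratio equals 1 at the midpoint of the allowed range (a coding GC content of about 47.9%), grows without bound as GC content approaches 20.0%, and falls to 0 at 75.7%. That last value is the model prediction that the mutation data of references [4] and [5] contradict.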
Furthermore, Hildebrand and co-workers (2010) observe that “optimal” codons, as used in genes that are highly expressed and hence supposedly under more intense selection, are generally more AT-rich than the average gene within the same genome, and thus selection on codon usage cannot explain the bias in GC content observed at synonymous codon positions. Rocha and Feil [8] review several theories on environmental factors selecting for an optimal genome-wide GC content in prokaryotes, all of them frustrated by the mediocre-at-best correlation of GC content with environmental variables [9-15]. Nevertheless, it is intriguing that GC content at third codon positions seems to vary so as to balance the constraints acting on the first two codon positions, in such a way that the overall GC content of coding regions closely reflects the GC content of intergenic regions. To further investigate the possible balancing role of GC3, we identified within individual genes sequence segments with significant compositional contrasts between codon positions, and compared the GC content at these positions with the GC content of regions with non-significant contrasts. Not surprisingly, we found that within the same genome, non-contrasted regions have a reduced GC bias at third codon positions compared to contrasted regions (Figure 3-A). However, we also found that the two types of region maintained a very similar overall GC content (Figure 3-B), suggesting that GC3 usage does play a role in stabilizing the GC content of coding regions with variable constraints on GC usage at non-synonymous positions.
Acknowledgements
This work is supported by NIH Grant 5R01GM87485-2.
References
 
