Received Date: December 22, 2016; Accepted Date: January 09, 2017; Published Date: January 13, 2017
Citation: Mukherjee D, Diehl WJ (2017) Speciation Genomics of Protein-Coding Genes Common to Mycoplasmatales. J Phylogenetics Evol Biol 5:175. doi: 10.4172/2329-9002.1000175
Copyright: © 2017 Mukherjee D, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Identifying regions of a genome that evolve by natural selection, particularly as species diverge, has been a matter of considerable interest. The genomes of 12 species in the eubacterial order Mycoplasmatales were compared to test the hypothesis that natural selection targets genes by function and/or at given moments in the phylogenetic history of the species. These species possess some of the smallest genomes known, and analyses of the set of protein-coding genes common to all species in the study will shed light on the evolution of some of the most critical genes to living organisms. Genes that control cellular processes showed greater evidence of natural selection than genes of unknown function or genes associated with information processing and storage or metabolism. Moreover, evidence of natural selection was only detected in the deepest branches of the Mycoplasmatales phylogeny, including one node where a host shift from land plants to insects likely occurred and another node where a host shift from land plants/insects to land vertebrates likely occurred. Many of the genes that showed the strongest evidence of natural selection (e.g. secA, secY, ftsH, ftsY, yidC, lepA, dnaK) encode proteins that are components of the Sec-dependent secretory pathway, which regulates the extracellular translocation of proteins. The Sec-dependent secretory pathway is proposed to play a role in speciation of Mycoplasmatales by altering the type and amount of secreted proteins, thereby affecting virulence of Mycoplasma sp. in response to infection of novel hosts.
Minimal genome; Mycoplasma; Speciation; Natural selection
The proliferation of sequenced genomes has permitted the evaluation of the role of natural selection at that level of organization. Studies have shown that some species, such as Drosophila melanogaster, show positive Darwinian selection in a relatively large number of genes [1,2], whereas other species, such as Arabidopsis thaliana, show very low levels of positive selection compared to purifying or negative selection [3,4]. By comparison, Homo sapiens shows intermediate levels of positive selection [5,6]. Such studies have in common an evaluation of selection within a single lineage but generally do not address selection that may be occurring as new species arise, that is, at splits in their respective phylogenetic trees. A few studies have targeted selection as species diverge [7,8], but most of them are restricted to evaluating SNPs scattered throughout the genome [9,10] and not whole sequences of genes. Using this approach, one may infer whether selection is common or rare during speciation but not necessarily whether selection is associated with particular functional groups of genes, which in turn may inform hypotheses on the genetics of speciation.
The genus Mycoplasma (Order Mycoplasmatales, Domain Eubacteria) is a polyphyletic group that comprises single-celled, gram-positive-like, obligate parasites that may cause respiratory, urogenital and other diseases in vertebrates including humans [12,13]. They lack cell walls due to their inability to synthesize peptidoglycan, and they are considered to be the simplest form of self-replicating biological systems but are entirely dependent on the host cells for essential nutrients. Moreover, they possess extremely small genomes (range: 524-1053 genes), which makes them ideal for studying natural selection at the genome level. As such, the set of protein-coding genes common to all Mycoplasmatales species should comprise a set of genes that is among the most functionally critical to living organisms and that has potential for the greatest consequence if selection acts commonly and consistently thereon.
The objectives of this study were to test the hypotheses (1) that natural selection has acted on the same sets of genes at different times in the Mycoplasmatales phylogeny, (2) that mutation saturation has not affected the pattern of selection acting on the genes in the Order Mycoplasmatales, and (3) that the likelihood of natural selection targeting a particular set of genes has depended on the function of the gene products. This approach has the potential to identify genes (or combinations of genes, gene complexes, or pathways) that may be involved directly or indirectly in speciation.
A Bayesian phylogenetic tree (Figure 1a) was constructed from the 16S ribosomal DNA (rDNA) sequences from 12 Mycoplasma species (Table 1), whose genomes had been sequenced and sufficiently annotated as of April 15, 2008. These sequences were obtained from the National Center for Biotechnology Information (NCBI) database (http://www.ncbi.nlm.nih.gov/). The 16S rDNA sequences from Lactobacillus acidophilus and Escherichia coli were used as the outgroups. The sequences were aligned with ClustalX [15,16], and MrBayes version 3.1 was used to construct the tree. A General Time Reversible (GTR) model with Gamma-distributed rates (GTR+γ) was determined to be the best-fit model for the run by Modeltest version 3.7, and a likelihood ratio test was used for model selection. A total of 100,000 generations of Markov Chain Monte Carlo (MCMC) simulations were run with the first 2500 generations discarded as burn-in, and the consensus tree was selected for further analyses. The run was judged to have converged, as indicated by a final standard deviation of split frequencies of 0.006444. Four nodes (A, B, C and D, Figure 1a) were chosen for selection analyses, since these nodes united two clades each with multiple species/variants. A more detailed phylogenetic tree showing the divergence of the Mycoplasma/Ureaplasma group from the Spiroplasma and Mesoplasma/Entomoplasma groups is presented in Figure 1b, with the divergence dates of the latter two groups indicated (extrapolated from Maniloff).
|Genome||Accession Number||Reference|
|M. agalactiae PG2||CU179680||Sirand-Pugnet et al. |
|M. capricolum subsp capricolum ATCC 27343||CP000123||Craig Venter Institute |
|M. gallisepticum str R (low)||AE015450||Papazisi et al. |
|M. genitalium G37||L43967||Fraser et al. |
|M. hyopneumoniae 232||AE017332||Minion et al. |
|M. hyopneumoniae 7448||AE017244||Vasconcelos et al. |
|M. hyopneumoniae J||AE017243||Vasconcelos et al. |
|M. mobile 163K||AE017308||Jaffe et al. |
|M. penetrans HF-2||BA000026||Sasaki et al. |
|M. mycoides subsp mycoides PG1||BX293980||Westberg et al. |
|M. pneumoniae M129||U00089||Himmelreich et al. |
|M. pulmonis UAB CTIP||AL445566||Chambaud et al. |
|M. synoviae 53||AE017245||Vasconcelos et al. |
|U. parvum serovar 3 str. ATCC 700970||AF222894||Glass et al. |
Table 1: List of the complete genomes used in the study, obtained from the National Center for Biotechnology Information (NCBI) database (http://www.ncbi.nlm.nih.gov/).
Figure 1a: Bayesian phylogeny from 16S ribosomal DNA (rDNA) sequences for 12 Mycoplasmatales species, using Lactobacillus acidophilus (L. acidophilus) and Escherichia coli (E. coli) as outgroups. The clade credibility values for each branch are given. Approximate divergence dates (extrapolated from Maniloff) are indicated for each node. Nodes A, B, C, & D were chosen for tests of natural selection. The specific hosts for each species are indicated in parentheses [12,19-23].
Figure 1b: Mollicutes phylogenetic tree showing the relationships among Spiroplasma, Mesoplasma/Entomoplasma and Mycoplasma/Ureaplasma groups [extrapolated from Maniloff and incorporating phylogeny from the current study].
A database was constructed that contained the 221 protein-coding gene sequences common to all of the 12 species of Mycoplasma used in the study (Table 1). Sequences and functions were obtained from the National Center for Biotechnology Information (NCBI) database (http://www.ncbi.nlm.nih.gov/). The Kyoto Encyclopedia of Genes and Genomes (KEGG) database (http://www.genome.jp/kegg/) was used to detect orthologs for a particular gene. Clusters of Orthologous Groups (COG) were used to segregate these genes into functional groups, namely information storage and processing genes, cellular processes genes and metabolism genes plus a group of poorly characterized genes (Table 2).
|Functional Category||Number of Genes|
|Information Processing and Storage Genes||123|
|Cellular Processes Genes||26|
|Poorly Characterized Genes||21|
Table 2: Number of genes belonging to the different functional categories.
The software program package Data Analysis in Molecular Biology and Evolution (DAMBE) was used to align the sequences. First, the nucleotide sequences were translated to amino acid sequences. Next, multiple sequence alignments were conducted using ClustalW [16,39] with gap open and extension penalties of 10 and 0.1, respectively, and using BLOSUM as the amino acid substitution matrix. Finally, the original nucleotide sequences were realigned against the aligned amino acid sequences. This codon-based approach to sequence alignment keeps gaps in the middle of sequences in multiples of three nucleotides, so the reading frame is preserved.
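The final realignment step can be illustrated with a minimal sketch. This is not DAMBE's implementation; the function name `thread_codons` and the toy sequences are hypothetical, and the sketch assumes the coding sequence has had its stop codon removed so that its length is exactly three times the number of residues.

```python
def thread_codons(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Map an unaligned coding sequence onto its aligned protein sequence.

    Each residue in the protein alignment is replaced by its codon and
    each gap by three gap characters, so gaps never split a codon.
    """
    n_residues = sum(1 for aa in aligned_protein if aa != gap)
    if len(cds) != 3 * n_residues:
        raise ValueError("CDS length must equal 3 x number of residues")
    codons = []
    pos = 0
    for aa in aligned_protein:
        if aa == gap:
            codons.append(gap * 3)
        else:
            codons.append(cds[pos:pos + 3])
            pos += 3
    return "".join(codons)


# Hypothetical example: a four-column protein alignment with one gap.
print(thread_codons("M-KL", "ATGAAACTG"))  # ATG---AAACTG
```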
Two separate statistical procedures were followed for the analysis of selection patterns: a codon substitution models test and the McDonald-Kreitman test. The ratio of non-synonymous (dN) to synonymous (dS) nucleotide substitution rates (dN/dS, designated by the letter ω) forms the basis of both tests. A ratio of 1 indicates neutrality; in other words, the rate of fixation of a neutral amino acid mutation will be equivalent to that of a synonymous substitution. A ratio less than 1 indicates purifying selection, under which the amino acid substitution is eventually eliminated from the population. If the ratio is greater than 1, then positive Darwinian selection will retain the amino acid mutation in the population. Codon substitution models tests can be used to compare dN/dS ratios among branches on a phylogenetic tree, where a significant difference implies selection. The McDonald-Kreitman test effectively partitions dN/dS ratios into fixed differences between clades vs. polymorphism across clades, where a significant difference implies selection by either adaptive fixation or polymorphism excess. Because each test uses dN/dS ratios differently, the results are effectively independent.
Codon substitution models tests
The Codeml program of the software program package Phylogenetic Analysis by Maximum Likelihood (PAML) version 4.0 was used to conduct these analyses. First, each gene in the original dataset was analyzed by the free vs. one ratio models test. The free ratio model assumes different dN/dS ratios for the different branches of the phylogenetic tree, whereas the one ratio model assumes a single dN/dS value for the entire tree. This test effectively determines whether selection at a gene may be occurring somewhere in the phylogeny, but it cannot determine where. Log-likelihood values for each of these two models were computed, and twice the log-likelihood differences for each of the genes were compared to a χ2 distribution with α=0.000057 (the same α value that was used for the McDonald-Kreitman test after applying a Bonferroni correction, described later) and 24 degrees of freedom (25 branches analyzed under the free ratio model plus one under the one ratio model minus two). The result of Fisher’s exact test (P=0.0035) [45,46] showed that the ratio of the genes showing evidence of selection to the ones showing neutrality differed significantly among functional groups.
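The likelihood ratio comparison described above can be sketched in a few lines. This is an illustration only, not PAML output or the authors' code: the function names are hypothetical, the log-likelihood values in the example are invented, and the χ2 upper-tail probability is computed from its closed form for integer degrees of freedom so that no external library is needed.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variate with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-h) * sum_{i=0}^{df/2-1} h^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= h / i
            total += term
        return math.exp(-h) * total
    # Odd df: erfc(sqrt(h)) + exp(-h) * sum_i h^(i+1/2) / Gamma(i+3/2)
    total = 0.0
    for i in range((df - 1) // 2):
        total += h ** (i + 0.5) / math.gamma(i + 1.5)
    return math.erfc(math.sqrt(h)) + math.exp(-h) * total


def likelihood_ratio_test(lnl_null: float, lnl_alt: float, df: int):
    """Return (2 * delta lnL, p-value) for nested models."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf(stat, df)


# Hypothetical log-likelihoods for one gene (one-ratio vs. free-ratio model).
stat, p = likelihood_ratio_test(lnl_null=-5230.4, lnl_alt=-5180.1, df=24)
print(stat, p < 0.000057)
```

With the degrees of freedom and α given in the text, a gene would be counted as showing preliminary evidence of selection when the returned p-value falls below 0.000057.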
To determine whether the dN/dS ratios of the different clades in the phylogeny were different from each other as well as from the background, two branch models tests (three vs. two branch ratio models test) were used. The three branch ratio model assumes different dN/dS values for the two clades as well as the background. The two branch ratio model, on the other hand, assumes an equal dN/dS ratio for the two clades that are being compared, while the dN/dS value for the rest of the tree (the ‘background’) is assumed to be free. These two tests were applied only to the 153 genes that gave evidence of selection in the free vs. one ratio models tests. Four clades of the phylogenetic tree with sufficient species/variants to show sequence variation were analyzed this way. Log-likelihood values for these models were computed, and twice the log-likelihood differences were compared to a χ2 distribution with α=0.05 with 3 degrees of freedom (three clades tested under the three branch ratio model plus two under the two branch ratio model minus 2). A three-way contingency table (4 functional categories × 4 clades × 2 possible selection results, namely selection or neutrality, Table 3) was constructed to summarize the results from the analyses using the various codon substitution models. Log-linear analysis (α=0.05) was conducted on the data to determine whether selection pattern depended on gene function and/or on nodes of the phylogenetic tree (Table 4).
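One building block of a log-linear analysis of a three-way table is the likelihood-ratio statistic G² for the model in which all three factors are mutually independent. The sketch below is a generic illustration under that assumption, not a reproduction of the VassarStats procedure or of the partial-association tests in Table 4; the function name and the toy counts are hypothetical.

```python
import math


def g2_mutual_independence(table):
    """G^2 and degrees of freedom for mutual independence in an I x J x K table.

    `table[i][j][k]` holds the observed count for level i of factor A,
    level j of factor B and level k of factor C.
    """
    I, J, K = len(table), len(table[0]), len(table[0][0])
    cells = [(i, j, k) for i in range(I) for j in range(J) for k in range(K)]
    n = sum(table[i][j][k] for i, j, k in cells)
    a = [sum(table[i][j][k] for j in range(J) for k in range(K)) for i in range(I)]
    b = [sum(table[i][j][k] for i in range(I) for k in range(K)) for j in range(J)]
    c = [sum(table[i][j][k] for i in range(I) for j in range(J)) for k in range(K)]
    g2 = 0.0
    for i, j, k in cells:
        observed = table[i][j][k]
        if observed > 0:  # empty cells contribute nothing to G^2
            expected = a[i] * b[j] * c[k] / n ** 2
            g2 += 2.0 * observed * math.log(observed / expected)
    df = I * J * K - I - J - K + 2
    return g2, df


# Hypothetical 2 x 2 x 2 counts (function x node x selection state).
toy = [[[12, 3], [4, 11]], [[5, 10], [6, 9]]]
print(g2_mutual_independence(toy))
```

For the 4 × 4 × 2 layout described in the text, the same formula gives 24 degrees of freedom for the mutual-independence model.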
|A||Codon Substitution Models Tests (free- vs. one-ratio model)|
|Node||Information Processing and Storage Genes||Cellular Processes Genes||Metabolism Genes||Poorly Characterized Genes||Total|
|B||Codon Substitution Models Tests (three- vs. two-ratio model)|
|Node||Information Processing and Storage Genes||Cellular Processes Genes||Metabolism Genes||Poorly Characterized Genes||Total|
|Node||Information Processing and Storage Genes||Cellular Processes Genes||Metabolism Genes||Poorly Characterized Genes||Total|
Table 3: Contingency tables of genes showing natural selection (S) or neutrality (N) partitioned among functional groups and nodes in the Mycoplasmatales phylogeny. All genes showing selection were significant in respective tests at α=0.000057 after applying Bonferroni correction. Cells in which the ratio of genes showing selection to genes showing neutrality exceeds the respective total S/N ratio by an arbitrary threshold of 2.5× are enclosed in boxes; cells with S/N ratios that are 5× greater than the total S/N ratio are enclosed in double boxes. (A) Distribution of genes among functional groups for the entire phylogeny from the free vs. one ratio model of codon substitution models tests. (B) Distribution of genes among functional groups and nodes from the three vs. two ratio model of codon substitution models tests. (C) Distribution of genes among functional groups and nodes from McDonald-Kreitman tests.
|A. Three- vs. Two-ratio Codon Substitution Models Test|
|Selection State × Gene Function × Node (Phylogeny)||<0.0001***|
|Selection State × Gene Function (Effect of Node removed)||<0.0177*|
|Selection State × Node (Effect of Gene Function removed)||<0.0001***|
|B. McDonald-Kreitman Test|
|Selection State × Gene Function × Node (Phylogeny)||<0.0001***|
|Selection State × Gene Function (Effect of Node removed)||<0.0012**|
|Selection State × Node (Effect of Gene Function removed)||<0.0001***|
Table 4: Log-linear analysis of association among selection state, gene function, and node in the Mycoplasmatales phylogeny from 3-way contingency tables (Tables 3B and 3C) for (A) codon substitution models tests and (B) McDonald-Kreitman tests.
The software program package DNA Sequence Polymorphism (DnaSP) version 4.20.2 was used to compute the neutrality indices (NIs) for each gene for each pair of clades tested in the phylogeny. Two-tailed Fisher’s exact tests were conducted to assess the significance of the computed NIs, with an α value of 0.000057 after applying a Bonferroni correction (α = 0.05/[221 genes studied × 4 clades analyzed]). These tests were conducted using the software program package DnaSP. Finally, a three-way contingency table (4 functional categories × 4 clades × 2 McDonald-Kreitman test results, namely adaptive fixation or neutrality, since none of the genes exhibited polymorphism excess, Table 3) was constructed to summarize the outcomes from the McDonald-Kreitman tests. The results were tested by log-linear analysis (α=0.05) to determine whether selection pattern varies depending on functional categories of the gene and/or clades of the phylogenetic tree (Table 4). For both the codon substitution models and the McDonald-Kreitman tests, the VassarStats Web Site for Statistical Computation (http://dogsbody.psych.mun.ca/VassarStats/abc.html) was used for conducting the log-linear analyses.
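The two quantities in this paragraph are simple to compute by hand. The sketch below is illustrative and is not DnaSP code: the function names are hypothetical, the polymorphism and divergence counts are invented, and the neutrality index is taken in its usual form, NI = (Pn/Ps)/(Dn/Ds), where values below 1 point to an excess of fixed non-synonymous differences and values above 1 to an excess of non-synonymous polymorphism.

```python
def bonferroni_alpha(alpha: float, n_genes: int, n_clades: int) -> float:
    """Per-test significance level after Bonferroni correction."""
    return alpha / (n_genes * n_clades)


def neutrality_index(pn: int, ps: int, dn: int, ds: int) -> float:
    """NI = (Pn/Ps) / (Dn/Ds) from McDonald-Kreitman counts.

    pn, ps: non-synonymous and synonymous polymorphisms within clades.
    dn, ds: non-synonymous and synonymous fixed differences between clades.
    """
    if ps == 0 or dn == 0 or ds == 0:
        raise ValueError("NI is undefined when Ps, Dn or Ds is zero")
    return (pn / ps) / (dn / ds)


# Threshold used in the text: 0.05 / (221 genes x 4 clades).
print(round(bonferroni_alpha(0.05, 221, 4), 6))  # 5.7e-05

# Hypothetical counts for one gene at one node.
print(neutrality_index(pn=2, ps=10, dn=20, ds=10))
```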
Detection of mutation saturation
Mutation saturation [51,52] can be detected by analyzing the frequency of the complex codons in a gene. Complex codons are a group of highly variable codons for which the pattern of non-synonymous and synonymous substitutions for fixed differences and polymorphisms among species sets cannot be inferred, as defined by Rozas et al., and that consequently cannot be analyzed by the package DnaSP. We assumed that the greater the degree to which a gene shows mutation saturation, the greater the number of complex codons that gene would possess. Thus mutation saturation can be analyzed by plotting the number of complex codons as a function of the total number of codons in a gene for all genes in the genome. Genes at each node were analyzed separately. Complete mutation saturation would be indicated if all codons were complex. Complete absence of mutation saturation would be indicated if no codons in a gene were complex.
The network of the protein-protein interactions of the cellular processes gene products from M. pneumoniae M129, based on their involvement in interconnected biochemical pathways, was generated in the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) database of known and predicted protein-protein interactions. The software program Cytoscape version 2.8.2 was used to visualize the network.
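The regression behind this analysis can be sketched as an ordinary least-squares fit of complex codons on total codons. The code is a generic illustration with a hypothetical function name and invented per-gene counts, not the authors' analysis; under the reading given in the text, a slope near 1 would correspond to complete saturation and a slope near 0 to none.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit of y on x; returns (slope, intercept, r2)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = sxy ** 2 / (sxx * syy)
    return slope, intercept, r2


# Hypothetical per-gene counts at one node.
total_codons = [150, 320, 410, 600, 880]
complex_codons = [60, 150, 170, 310, 400]
print(linear_fit(total_codons, complex_codons))
```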
An initial analysis of natural selection among all species in the study (free vs. one ratio codon substitution models test) indicated that 153 of 221 genes showed preliminary evidence of natural selection somewhere in the Mycoplasmatales phylogeny (α=0.000057 after applying a Bonferroni correction). There was a significant difference in the ratio of genes showing selection to genes showing neutrality among functional groups (Fisher exact test, P=0.0035 [45,46]) with 92% of cellular processes genes showing evidence of selection compared to 66% of genes showing selection for other groups combined (Table 3A). The significant results here justified subsequent node by node analyses. Genes that showed neutrality in this initial evaluation were assumed to be neutral in all subsequent tests.
Nodes A-D (Figure 1a) were chosen for subsequent analyses because the respective sister clades nested within them each contained multiple species or variants with sequenced genomes, hence the potential for genetic variation at every gene. Of the 153 genes showing evidence of selection in the initial test, 135 genes showed evidence of selection in subsequent three vs. two ratio codon substitution models tests conducted node by node (Table 3B), and 90 genes showed evidence of adaptive fixation (similar to divergent selection) in McDonald-Kreitman tests conducted node by node (Table 3C). In the latter tests, no genes showed a significant excess of polymorphisms.
Only 62 genes showed evidence of selection by both codon substitution models tests and McDonald-Kreitman tests. When selection state was partitioned among functional groups and phylogenetic nodes, log-linear analyses  showed a significant three-way interaction (P<0.0001) regardless of the selection test used (Table 4), indicating that the pattern of selection depended on both gene function and phylogeny. Moreover, partial interactions of selection state and functional group (P<0.017) and selection state and phylogenetic node (P<0.0001) were also significant (Table 4). Cellular processes genes showed greater evidence of selection (24-30%) than genes in all other functional categories combined (17-22%). Nodes A and B had a greater proportion of genes showing evidence of selection (29-38%) than nodes C and D (6-9%). Only 45 genes showed evidence of selection by both tests at either node A or node B, and 29% of these genes were in the cellular processes category despite accounting for only 12% of genes overall. Regardless of selection test, the greatest proportion of genes showing evidence of selection occurred in the cellular processes category at either node A or B (Table 3).
Figures 2a and 2b show the relationships between the total number of codons in a gene and the number of complex codons, indicating mutation saturation. As one would expect, the more codons that a gene possesses, the greater the number of complex codons that occur as well (Table 5). It is evident from Figures 2a and 2b that in the course of evolution, the genomes of the Mycoplasmatales species have acquired some degree of mutation saturation, especially at the deeper nodes A and B, although in all cases the level is less than required to show complete saturation. But because nodes A and B showed the greatest evidence of selection, there is a possibility that this pattern could have been caused by mutation saturation. If that were the case, one would expect that the majority of genes showing selection would occur above or below the regression line, which did not occur at any node. That is, at all nodes, the genes showing natural selection showed no greater tendency toward mutation saturation than genes showing neutrality (Figure 2a). Further, if evolutionary rate varies with gene function, then mutation saturation could theoretically, albeit not necessarily, lead to an apparent excess of natural selection in functional groups with greater evolutionary rates. If this were the case, the slopes of the relationship between the number of complex codons and the total number of codons in a gene would be greater for functional groups showing excess natural selection. Not only is this not the case at any node, but also the slope of the aforementioned relationship for cellular processes genes is less than that for all functional groups combined at each node (Table 5).
Figure 2a: Relationship between the total number of codons and number of complex codons  (solid lines from linear regressions) for genes showing selection under the McDonald and Kreitman tests  (closed circles) and genes showing neutrality (open circles) at nodes A (a), B (b), C (c), and D (d) in the phylogeny of the Mycoplasmatales. Complete saturation would be indicated by the dashed lines.
Figure 2b: Relationship between the total number of codons and number of complex codons  (solid lines from linear regressions) for genes showing selection under the codon substitution models tests, Yang  (closed circles) and genes showing neutrality (open circles) at nodes A (a), B (b), C (c), and D (d) in the phylogeny of the Mycoplasmatales. Complete saturation would be indicated by the dashed lines.
|A||Information Processing and storage||0.5428||0.8605||<0.000*|
|All Functional Categories combined||0.4968||0.7836||<0.000*|
|B||Information Processing and storage||0.4781||0.8417||<0.000*|
|All Functional Categories combined||0.4357||0.767||<0.000*|
|C||Information Processing and storage||0.1489||0.7273||<0.000*|
|All Functional Categories combined||0.1281||0.5912||<0.000*|
|D||Information Processing and storage||0.1675||0.7319||<0.000*|
|All Functional Categories combined||0.158||0.6517||<0.000*|
Table 5: The r2 and p-values for the relationship between the total number of codons and the number of complex codons in a gene for all genes at each node. The p-values that were significant at α=0.0125 [after applying Bonferroni correction (0.05/4, the number of nodes tested)] are indicated with an asterisk (*).
Using M. pneumoniae M129 as a reference species, pathway analysis based on the protein-protein interaction patterns of the total set of proteins encoded by cellular processes genes revealed some evidence of clustering within two subdivisions, namely within post-translational modification, protein turnover and chaperone proteins, and within inorganic ion transport and metabolism proteins (Figure 3). Proteins with the greatest number of interactions (secA, ffh, secY, dnaK and lepA) are distributed among three functional subdivisions but are all components or putative components of the Mycoplasmatales Sec-dependent secretory pathway. Moreover, a significantly greater proportion of genes encoding these proteins (7 of 8) showed natural selection at both nodes A and B compared to that for genes encoding proteins outside the Sec-dependent secretory pathway (6 of 18; Fisher Exact Test, P<0.05).
Figure 3: Interaction patterns among the proteins based on involvement in interconnected biochemical pathways encoded by cellular processes genes from M. pneumoniae M129. Proteins showing evidence of natural selection at both Nodes A and B are indicated with yellow inner circles; otherwise inner circles are green. Proteins in the Sec-dependent secretory pathway are bounded by the red polygon. Proteins showing 7 or more interactions are indicated with asterisks. Post translational modification, protein turnover and chaperone proteins (blue outer ring): molecular chaperone DnaK (dnaK), ATP dependent heat shock protease Lon (lon), cell division protein FtsH (ftsH), heat shock protein GrpE (grpE), SSRA-binding protein (smpB), trigger factor (tig), o-sialoglycoprotein endopeptidase (gcp), molecular chaperone/heat shock protein DnaJ (dnaJ), thioredoxin reductase (trxB), glycoprotease family protein (MPN_291). Intracellular trafficking and secretion proteins (red outer ring): cell division protein FtsY (ftsY), preprotein translocase subunit SecA (secA), preprotein translocase subunit SecY (secY), inner membrane protein translocase YidC (oxaA), signal recognition particle protein/GTPase (ffh). Cell wall/membrane biogenesis proteins (purple outer ring): prolipoprotein diacylglyceryl transferase (lgt), GTP-binding protein LepA (lepA), glucose-inhibited division protein B/methyltransferase GidB (rsmG), signal peptidase II (lspA), s-adenosyl-methyltransferase (mraW). Inorganic ion transport and metabolism proteins (pink outer ring): cobalt transporter ATP binding subunit (cbi01), cobalt transporter ATP binding subunit (cbi02), cobalt ABC transporter permease protein (MPN_195). Cell cycle control, mitosis and meiosis proteins (green outer ring): chromosomal segregation protein SMC (p115), tRNA-uracil-5-carboxymethylaminomethyl modification enzyme (mnmG). Defense mechanisms protein (brown outer ring): ABC transporter ATP-binding and permease protein (MPN_571).
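The comparison of 7 of 8 pathway genes against 6 of 18 non-pathway genes can be checked with a two-tailed Fisher's exact test on the corresponding 2 × 2 table. The sketch below is a plain-Python illustration with a hypothetical function name; it assumes the common two-tailed convention of summing every table that is no more probable than the observed one, which may differ from the convention used by the authors' software.

```python
from math import comb


def fisher_exact_two_tailed(a: int, b: int, c: int, d: int) -> float:
    """Two-tailed Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    lo = max(0, col1 - row2)
    hi = min(row1, col1)

    def weight(k: int) -> int:
        # Unnormalised hypergeometric probability, kept as an exact integer.
        return comb(row1, k) * comb(row2, col1 - k)

    observed = weight(a)
    tail = sum(weight(k) for k in range(lo, hi + 1) if weight(k) <= observed)
    return tail / comb(n, col1)


# Rows: Sec pathway genes, other cellular processes genes.
# Columns: selection at both nodes A and B, otherwise.
p = fisher_exact_two_tailed(7, 1, 6, 12)
print(round(p, 4))  # 0.0302
```

Under this convention the p-value is about 0.03, which agrees with the reported P<0.05.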
The first objective of this study was to test the hypothesis that natural selection has acted on the same sets of genes across different branches in the Mycoplasmatales phylogeny, with a greater number of genes exhibiting selection at the deeper nodes of the phylogeny. The excess of cellular processes genes showing evidence of selection at both nodes A and B indicates the possibility of an evolutionary process that was influenced by selection acting on a common set of genes as species, represented by contemporary clades, diverged early in Mycoplasmatales evolution. Failure to detect sufficient evidence of natural selection by both statistical tests in the more recent nodes C and D may indicate (1) that selection was acting on a suite of genes that is unique to a particular clade or species and that therefore was not evaluated here, (2) that selection was acting on regulatory rather than structural regions of the genome, which the statistical methods used cannot assess because regulatory sequences do not encode proteins, (3) that evolutionary divergence was dominated by neutral processes, an unlikely scenario given evidence from nodes A & B, or (4) that there has not been enough time for non-synonymous substitutions to accumulate sufficiently to reveal evidence of natural selection, another unlikely scenario since divergence at nodes C-D was preceded by a period of rapid genomic change in Mycoplasma groups about 190 Myr ago. Regardless, in spite of the inability to detect selection in recent nodes, the study proposes a common pattern of selection acting at the deeper, more ancient nodes.
The genes showing evidence of natural selection were detected mostly in the deepest nodes of the Mycoplasmatales phylogeny. It can be speculated by evaluation of this region of the Mycoplasmatales phylogeny (Figure 1a) that early species divergence coincided with the origin of insects (396-407 Myr BP) during the early Devonian (node A) and with the origin of land vertebrates (Amphibians) (368 Myr BP) during the late Devonian (node B). This is consistent with the fact that the M. mycoides/capricolum clade belongs to the Entomoplasmatales group, within which M. mycoides and M. capricolum are derived species that have likely infected vertebrate hosts independently of other Mycoplasmatales and are thus taxonomically grouped in the genus Mycoplasma by convergence. The Entomoplasmatales includes the basal group Spiroplasma, whose hosts are plants and insects. The data suggest that early in the evolution of Mycoplasmatales natural selection promoted speciation in response to novel environments associated with host shifts as plants, insects, and vertebrates successively colonized land and were infected by species of Mycoplasma. Speciation under such circumstances is not unexpected. For example, it has been reported that repeat-rich parts of the genome in particular, in different lineages of the Irish potato famine pathogen Phytophthora infestans, evolve through host jumps.
The second objective was to test the hypothesis that mutation saturation [51,52] has not caused a false-positive pattern of selection acting on the genes in the order Mycoplasmatales. Mutation saturation is a neutral genetic phenomenon that can potentially cause a significant but spurious pattern of selection. The process occurs when a particular base mutates to a different one and then mutates back to the original state (for example, an A mutating to a T, then back to A), or when multiple mutations occur at a given site (an A mutating to G, then to T). In such a case, it is difficult to determine if a particular base underwent two successive mutations in its evolutionary history or none, confounding one’s ability to assess the number of non-synonymous and synonymous changes. If mutation saturation is more likely to occur or persist in one type of change than the other, then selection may be inappropriately implicated. The members of the order Mycoplasmatales have high rates of mutation, which is one of the primary driving forces behind their evolution, and thus mutation saturation is likely. Indeed, our analyses showed that the Mycoplasma genomes have acquired some mutation saturation, especially at the deeper nodes. However, at each node, the distributions of genes showing selection and genes showing neutrality are essentially the same with regard to mutation saturation, indicating that mutation saturation is not responsible for the patterns of selection observed. Further, it is not at all likely that mutation saturation could produce a pattern of selection that varies with gene function as is seen in this study, since excess mutation saturation, where it existed, tended to occur in functional groups not showing high levels of natural selection.
The final objective was to test the hypothesis that the likelihood of natural selection targeting a particular set of genes in the order Mycoplasmatales has depended on the function of the gene products, which in turn has influenced speciation. In this study, natural selection has been shown to target cellular processes genes in general and Sec-dependent secretory pathway genes in particular. The Sec-dependent secretory pathway is ubiquitous in living organisms and in the Mycoplasmatales functions in extracellular protein transport, including the export of proteins affecting virulence, a scenario that was hypothesized to play a role in the evolution of the Phytoplasmas, the plant pathogenic group within the class Mollicutes. It has been indicated that genes for protein export should be part of the predicted minimal genome in bacteria. We hypothesize that natural selection, acting on genes encoding proteins of the Sec-dependent secretory pathway, has caused speciation by altering the type and amount of secreted proteins, thereby affecting virulence of the Mycoplasmatales in response to infection of novel hosts.
The current approach to identifying genes that may play a role in a previous speciation event is very conservative because the final set of genes has passed through 6 filters (3 natural selection tests, 2 selection-by-function interaction tests, and 1 association by specific common function analysis). It provides a mechanism for identifying the signature of an original selection event that may have been buried in subsequent accumulated genetic variation and sweeps that fixed differences in many other genes.