Received date: November 29, 2016; Accepted date: December 22, 2016; Published date: December 30, 2016
Citation: Xu J, Li H, Zhou GY, Liu JA (2016) When Do We Call Genetically Distinct Strains Different Species? - A Cautionary Case Study of the Colletotrichum gloeosporioides Species Complex. Fungal Genom Biol 6:146. doi:10.4172/2165-8056.1000146
Copyright: © 2016 Xu J, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Visit for more related articles at Fungal Genomics & Biology
Different criteria have been used to define fungal species. With the increasing availability of DNA sequence information, the phylogenetic species concept (PSC) has become popular among fungal taxonomists. However, controversies remain about the criteria used to define “phylogenetic distinctiveness” between closely related taxa and about what other criteria are needed to delimit species more robustly. In a recent study, we reported DNA sequences at four gene loci for each of 199 strains of the fungal pathogen Colletotrichum gloeosporioides species complex (CGSC) causing leaf anthracnose on the tea-oil tree Camellia oleifera in southern China. Our combined sequence analyses clustered 194 of the 199 isolates into four of the 22 previously described species within CGSC: Colletotrichum fructicola (167 isolates), Colletotrichum siamense (19 isolates), Colletotrichum gloeosporioides (6 isolates) and Colletotrichum camelliae (2 isolates). The remaining five isolates clustered distinctly from all 22 known species. Here we use these data to further investigate the extent of reproductive isolation among these species in our samples. Our analyses revealed extensive allele sharing among three of the “phylogenetic species” and between them and the five unassigned isolates. Furthermore, there was extensive evidence of recombination among three of the four species. Our results suggest that caution should be taken in naming new species, and they highlight the importance of large-scale population genetic studies in defining reproductive isolation and species boundaries.
Keywords: Species concepts; Gene genealogy; Phylogenetic incompatibility; Recombination.
It is generally agreed that a species should represent a distinct evolutionary entity. However, how to define such entities remains controversial. For fungi, early studies relied on microscopic and macroscopic morphological features to define and identify species. However, subsequent research showed that many genetically divergent species could not be distinguished by morphological features alone. As a result, polyphasic systems were introduced to incorporate many different types of traits into classification [1,2]. These traits include carbon and nitrogen source utilization profiles, the metabolites the fungi produce (chemotyping), and DNA-DNA hybridization/melting curve analyses, which complement structural and morphological features. In addition, for fungi capable of mating and sexual reproduction, mating tests against standard testers of known mating types are commonly used. However, many fungi cannot mate under laboratory conditions, or their mating is difficult to observe in nature. Thus, while the biological species concept is the dominant species concept for animals and plants, it has not been used broadly to define fungal species. Instead, the evolutionary or phylogenetic species concept (PSC) based on DNA sequences has attracted increasing attention among fungal biologists [4,5].
Broadly speaking, the PSC states that a species should represent a distinct evolutionary entity carrying its own unique phylogenetic signal(s) not shared with other closely related entities. In practice, DNA sequences from multiple genetic loci are typically analyzed to identify such phylogenetic species. Groups of strains that always cluster together at the analyzed loci but differ from other groups of strains would constitute distinct phylogenetic species. However, it has been very difficult to define and generalize the number of phylogenetic signals (e.g., unique nucleotides at specific base positions in a gene) required to separate strains into different species. Indeed, different groups of fungal species have been defined based on different degrees of sequence divergence, creating considerable confusion and ambiguity among fungal biologists.
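One concrete reading of a “phylogenetic signal” is a fixed nucleotide difference: an alignment column where each group of strains is monomorphic but the two groups carry different bases. The Python sketch below counts such columns for two groups of pre-aligned sequences. The function name, the toy sequences and the rule of ignoring gaps and ambiguous bases are illustrative assumptions, not a criterion taken from the studies discussed here.

```python
from typing import List

def fixed_differences(group_a: List[str], group_b: List[str]) -> List[int]:
    """Return 0-based alignment columns where both groups are monomorphic
    but carry different nucleotides. Gaps and ambiguous bases are ignored."""
    length = len(group_a[0])
    if any(len(seq) != length for seq in group_a + group_b):
        raise ValueError("all sequences must be aligned to the same length")
    sites = []
    for i in range(length):
        bases_a = {seq[i] for seq in group_a if seq[i] in "ACGT"}
        bases_b = {seq[i] for seq in group_b if seq[i] in "ACGT"}
        # A fixed difference: one state in each group, and the states differ.
        if len(bases_a) == 1 and len(bases_b) == 1 and bases_a != bases_b:
            sites.append(i)
    return sites

# Hypothetical toy alignments for two groups of strains.
group_a = ["ACGTTA", "ACGTTA", "ACGTCA"]
group_b = ["ACCTTG", "ACCTCG"]
print(fixed_differences(group_a, group_b))  # [2, 5]
```

Column 4 is not counted because group_a is polymorphic there. How many such sites should be required before two groups are called distinct species is exactly the open question raised above; the function only counts them.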
To reduce ambiguity and avoid confusion, one broadly agreed criterion for separating closely related species is to examine whether genealogies from multiple genes consistently separate them from each other, as originally proposed and later introduced for fungi. In this approach, if the different gene trees share the same branching node for a group of isolates, all the isolates descending from this node constitute a phylogenetic species. Conflict among gene trees is interpreted as evidence of genetic exchange and recombination among individuals within a species. The transition from genealogical concordance to conflict thus marks the boundaries between species.
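The concordance criterion can be sketched by reducing each gene tree to its set of clades (the isolates descending from each internal node) and keeping only the clades present in every tree. In this minimal Python sketch the isolate names and tree contents are hypothetical, and tree reconstruction, branch support and the exact rule for locating the concordance-to-conflict transition are deliberately left out.

```python
from functools import reduce
from typing import FrozenSet, List, Set

Clade = FrozenSet[str]

def clade(*names: str) -> Clade:
    """A clade is the set of isolates descending from one branching node."""
    return frozenset(names)

def concordant_clades(gene_trees: List[Set[Clade]]) -> Set[Clade]:
    """Clades present in every gene tree."""
    return reduce(lambda a, b: a & b, gene_trees)

def conflicts(c1: Clade, c2: Clade) -> bool:
    """Clades from two trees conflict if they overlap but neither nests in the other."""
    return bool(c1 & c2) and not (c1 <= c2 or c2 <= c1)

# Hypothetical isolates s1..s6; each gene tree is reduced to its set of clades.
tree_gene1 = {clade("s1", "s2", "s3"), clade("s1", "s2"), clade("s4", "s5", "s6")}
tree_gene2 = {clade("s1", "s2", "s3"), clade("s2", "s3"), clade("s4", "s5", "s6")}
tree_gene3 = {clade("s1", "s2", "s3"), clade("s1", "s2"),
              clade("s4", "s5", "s6"), clade("s5", "s6")}

shared = concordant_clades([tree_gene1, tree_gene2, tree_gene3])
print(sorted(sorted(c) for c in shared))  # [['s1', 's2', 's3'], ['s4', 's5', 's6']]
print(conflicts(clade("s1", "s2"), clade("s2", "s3")))  # True
```

Here {s1, s2, s3} and {s4, s5, s6} are supported by all three genes and would be candidate phylogenetic species, while the conflicting subclades inside {s1, s2, s3} are the kind of pattern read as recombination within a species.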
Practical issues with phylogenetic species recognition
While the overall principle of the phylogenetic species concept is straightforward, its application has been problematic. For several reasons, identifying the true transition from phylogenetic concordance to conflict may be complicated when dealing with natural samples. First, the lack of evidence for phylogenetic conflict (i.e., apparent phylogenetic concordance) only means that there was no statistical evidence of conflict in the given dataset involving the specific strains and set of genes. Including other genes and/or other samples in the analyses may reveal phylogenetic conflict between the analyzed “species”. In many fungi, asexual propagation is the main mode of reproduction and sexual recombination may be very rare [7,8]. As a result, evidence for sexual recombination (i.e., phylogenetic conflict) may not be apparent when only a small sample is analyzed using a limited number of gene sequences. Second, even when recombination does generate genealogical conflicts, the null hypothesis of concordant genealogies may be difficult to reject with phylogenetic methods if the conflicting groups of strains are separated by very short branches. Indeed, most fungal systematic studies have used conserved protein-coding genes, in which purifying selection is prevalent and closely related species may be separated by few or no mutations. In such cases, phylogenetic approaches will be ineffective at rejecting the null hypothesis of congruent genealogies. Third, most taxonomic studies have traditionally used relatively few isolates from diverse geographic and ecological niches as representatives [10,11]. Because of both the small sample size and the potential inability of these strains to mate with each other in nature, signals of recombination and of the phylogenetic transition from genealogical conflict to concordance may not be found in such samples.
Finally, when genetic isolation has not yet led to the fixation of formerly polymorphic loci in the diverging species, incomplete lineage sorting may cause genealogical conflicts and lead to underestimation of the number of species.
Below we describe how different samples and/or larger sample sizes can affect the interpretation of species boundaries and species recognition among several species within the fungal phytopathogen Colletotrichum gloeosporioides species complex (CGSC).
Our case study organisms and issues with current taxonomy
The ascomycete genus Colletotrichum has had a complicated taxonomic history and is currently recognized as containing over 60 species [11-15]. The genus is distributed throughout tropical, subtropical and temperate regions and causes some of the most devastating plant diseases, affecting major groups of agricultural crops such as cereals, vegetables and fruits at both pre- and postharvest stages [11,12,16,17]. Of the species in this genus, 22 belong to CGSC. They can infect most plant organs, including stems, leaves, flowers and fruits, and cause diseases in many vegetables and fruits, including peppers, cocoa, oranges, apples, bananas, mangos, ramie, mulberry, pistachio, persimmon, strawberries and tea-oil trees [17-26]. The most common disease is anthracnose, which can cause crop losses of 30-50% [16-27]. Thus, a clear understanding of species boundaries would significantly improve the diagnosis of these disease agents and the control and prevention of the diseases they cause.
Until recently, CGSC was regarded as a single morphological species, C. gloeosporioides [12-15]. Recent molecular phylogenetic analyses based on sequences from five loci suggested that the C. gloeosporioides species complex contains at least 22 phylogenetic species. The five loci are the actin gene (ACT), the internal transcribed spacers (ITS) of the nuclear ribosomal RNA gene cluster, and the genes encoding calmodulin (CL), glutamine synthetase (GS) and glyceraldehyde-3-phosphate dehydrogenase (GD). In that study, a total of 156 isolates, representing a broad range of genetic, geographic and host plant diversity, were sequenced at these loci. The number of isolates analyzed per species varied widely, from 1 to 22, with a mean of about 7.
However, as with the issues described above for identifying phylogenetic species of fungi in general, there were several issues with the current species delineations proposed for CGSC. First, most of these species could be distinguished only by the combined sequences of the five loci, making species identification difficult for practitioners to follow. Indeed, allele sharing was common among many of the phylogenetic species at one or more loci. Second, although the species phylogeny based on combined sequences had high bootstrap support, it was not uncommon for individual gene trees to differ from each other and from the species tree constructed from the concatenated sequences of the five genes. Third, although the sample size for each species was similar to or slightly larger than in most fungal taxonomy/systematics studies, most strains within the same species came from diverse geographic and/or ecological niches. While this is a common and recommended practice in taxonomic and systematic studies, such an approach can miss opportunities to detect genetic exchange and recombination that might have occurred within individual geographic and/or ecological populations. Because of the importance of these pathogens in agriculture and forestry, correctly defining and efficiently diagnosing species and strains of CGSC could have significant practical implications for disease management. Such information is also important for better understanding the pathogenesis and evolution of these pathogens [7,8,28].
Implications of our recent population genetic data on CGSC taxonomy
In our recent population genetic study of the C. gloeosporioides species complex (CGSC) from the tea-oil tree Camellia oleifera, we analyzed 199 strains from 15 geographic populations in southern China, using sequences of four of the DNA fragments analyzed by Weir et al. (ITS, CL, GS and GD; ACT was not included). Although that study focused on patterns of population genetic variation within the dominant species C. fructicola (167 of the 199 isolates), the availability of information on several other species, with all samples from the same host plant species in a defined geographic region, enabled us to further examine species boundaries within CGSC.
The endemic native tea-oil tree C. oleifera is an important economic plant species broadly distributed in southern China, contributing significantly not only to the economic welfare of people in this region but also to environmental protection by limiting soil erosion [30,31]. Camellia oleifera is one of the major host plants of CGSC in southern China, and several recent estimates suggested that up to 40% of C. oleifera yield was lost in some plantations due to CGSC anthracnose [18,27]. Strains of CGSC can infect the fruits, buds and leaves of C. oleifera plants, resulting in premature rot and drop of leaves and fruits as well as wilting of leaves and buds. Anthracnose can occur from early April to late October, peaking in August [18,27,29].
Using the same species recognition approach and the reference data from the Weir et al. study, our recent analyses based on concatenated sequences unambiguously clustered 194 of the 199 isolates from C. oleifera in southern China into four of the 22 previously described species within CGSC: Colletotrichum fructicola (167 isolates), Colletotrichum siamense (19 isolates), Colletotrichum gloeosporioides (6 isolates) and Colletotrichum camelliae (2 isolates). The remaining five isolates clustered distinctly from all 22 known species. Interestingly, these five isolates shared alleles at one or more loci with isolates belonging to three of the four known species [29-31]. Here we further examined the allelic relationships among the 199 strains at the four loci to determine the extent of genetic exchange and recombination in this species complex from tea-oil tree plantations in southern China.
Two complementary methods were used to determine the allelic relationships among loci in our samples: the index of association (IA), which tests for multilocus linkage disequilibrium, and phylogenetic incompatibility. The two tests have different null hypotheses. The null model for IA is random recombination, whereas that for phylogenetic incompatibility is strict asexual reproduction and clonality. Linkage equilibrium (failure to reject the IA null hypothesis) and the presence of phylogenetically incompatible locus pairs both indicate recombination in the sample. Recombination among strains in a population is a hallmark of their belonging to the same biological species. The underlying principles, methods of calculation and interpretation of both tests are described in the Multilocus program manual.
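As a rough illustration of the first test, the index of association is commonly computed as IA = VD/VE - 1, where VD is the observed variance, over all pairs of isolates, of the number of loci at which a pair differs, and VE is the variance expected if loci are unlinked (the sum of the per-locus variances). The Python sketch below implements that formulation for haploid multilocus genotypes coded as allele numbers, with a permutation test that shuffles alleles independently among isolates at each locus. The function names, the permutation count and the seed are assumptions; the Multilocus program's own estimator and options may differ in details such as missing-data handling.

```python
import random
from itertools import combinations
from typing import Hashable, List, Sequence, Tuple

Genotype = Sequence[Hashable]  # one allele per locus (haploid)

def _variance(values: Sequence[int]) -> float:
    """Population variance (divide by n) of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def index_of_association(genotypes: Sequence[Genotype]) -> float:
    """I_A = V_D / V_E - 1 computed over all pairs of isolates."""
    n_loci = len(genotypes[0])
    pairs = list(combinations(range(len(genotypes)), 2))
    # per_locus[j][k] is 1 if the k-th pair of isolates differs at locus j.
    per_locus = [
        [int(genotypes[x][j] != genotypes[y][j]) for x, y in pairs]
        for j in range(n_loci)
    ]
    # Number of loci at which each pair of isolates differs.
    distances = [sum(col) for col in zip(*per_locus)]
    v_d = _variance(distances)
    v_e = sum(_variance(col) for col in per_locus)
    if v_e == 0:
        raise ValueError("no allelic variation: I_A is undefined")
    return v_d / v_e - 1

def ia_permutation_test(genotypes: Sequence[Genotype], n_perm: int = 999,
                        seed: int = 1) -> Tuple[float, float]:
    """Observed I_A and a one-sided p-value from shuffling alleles
    independently among isolates at each locus (mimics free recombination)."""
    rng = random.Random(seed)
    observed = index_of_association(genotypes)
    columns: List[List[Hashable]] = [list(col) for col in zip(*genotypes)]
    hits = 0
    for _ in range(n_perm):
        for col in columns:
            rng.shuffle(col)
        permuted = list(zip(*columns))
        # Small tolerance so floating-point ties count as ">=".
        if index_of_association(permuted) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical toy data: two loci, alleles coded as integers.
clonal = [(1, 1), (1, 1), (2, 2), (2, 2)]      # loci perfectly associated
recombined = [(1, 1), (1, 2), (2, 1), (2, 2)]  # all allele combinations present
print(round(index_of_association(clonal), 3))      # 1.0
print(round(index_of_association(recombined), 3))  # -0.5
```

An IA near zero (or slightly negative in tiny samples, as in the toy `recombined` set) is consistent with random recombination; a significantly positive value indicates linkage disequilibrium, as expected under clonality.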
To identify evidence of recombination within and among species of CGSC from southern China, we focused on clone-corrected samples, from which redundant genotypes were eliminated. Because of sample size limitations, within-species analyses were performed only for two “phylogenetic species” (C. fructicola and C. siamense). For C. fructicola, both the total clone-corrected sample and the individual geographic subpopulations failed to reject the null hypothesis of random mating (Table 1). Similarly, phylogenetic incompatibility tests indicated that variable proportions of locus pairs were phylogenetically incompatible (Table 1). These results are consistent with different levels of recent or ongoing sexual recombination within geographic populations of C. fructicola. For C. siamense, although the null hypothesis of random recombination was rejected, phylogenetic incompatibility tests provided robust evidence of recombination in the total sample as well as in the Hunan sample and the Hunan Tianjiling plantation sample (Table 1). Other samples with small population sizes were not tested for allelic associations.
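The section does not spell out the clone-correction rule; a common convention, assumed in this Python sketch, keeps one isolate per multilocus genotype, either within each geographic population or across the pooled sample. The function name, isolate IDs, population labels and allele numbers are hypothetical.

```python
from typing import Hashable, List, Sequence, Tuple

# (isolate ID, population, multilocus genotype)
Isolate = Tuple[str, str, Tuple[Hashable, ...]]

def clone_correct(isolates: Sequence[Isolate], by_population: bool = True) -> List[Isolate]:
    """Keep the first isolate of each multilocus genotype, either within each
    population (by_population=True) or across the pooled sample."""
    seen = set()
    kept: List[Isolate] = []
    for isolate_id, population, genotype in isolates:
        key = (population if by_population else None, tuple(genotype))
        if key not in seen:
            seen.add(key)
            kept.append((isolate_id, population, genotype))
    return kept

# Hypothetical isolates; genotypes are allele numbers at (ITS, CL, GS, GD).
isolates = [
    ("iso1", "popA", (10, 3, 14, 2)),
    ("iso2", "popA", (10, 3, 14, 2)),  # same genotype as iso1, same population
    ("iso3", "popA", (4, 6, 20, 2)),
    ("iso4", "popB", (10, 3, 14, 2)),  # same genotype as iso1, other population
]
print([i for i, _, _ in clone_correct(isolates)])                       # ['iso1', 'iso3', 'iso4']
print([i for i, _, _ in clone_correct(isolates, by_population=False)])  # ['iso1', 'iso3']
```

Which level to clone-correct at matters: a genotype repeated across plantations counts once per population in subpopulation analyses but only once in a pooled analysis.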
Between “phylogenetic species”, genotypic comparisons among isolates revealed several lines of evidence for recombination/hybridization among three of the four phylogenetic species in our CGSC samples (Tables 1-3). First, we found allele sharing between C. fructicola and C. siamense, the two most frequently isolated species in our samples, at three of the four loci (Table 2). Second, our allelic association tests revealed additional phylogenetic incompatibilities when samples of C. fructicola and C. siamense were analyzed together (Tables 1 and 3). Indeed, when these two species were analyzed together, all pairs of loci were phylogenetically incompatible (Table 1). The additional phylogenetic incompatibilities were found in both the total sample and the Hunan sample of the two species (Table 1). Specifically, the two locus pairs (ITS vs. GS and CL vs. GS) that were phylogenetically compatible (i.e., showed no evidence of recombination) within each species were phylogenetically incompatible in the combined dataset (Tables 1 and 3). Third, upon further genotype/allele comparisons, the genotypes of the five genetically novel and phylogenetically uncertain isolates (GXNN1, CQXS4, JXGS-B11, JXGS-B12 and JXGS-A28) identified in the recent study could almost entirely be explained by recombination/hybridization among three of the four known phylogenetic species isolated from the tea-oil tree plantations (Table 2). Indeed, only one locus in each of two isolates (locus GS in isolate GXNN1 and locus GD in isolate JXGS-B12) carried an allele not found in the other “species”. Taken together, among the four known phylogenetic species isolated and analyzed here, only C. gloeosporioides sensu stricto showed no evidence of allele sharing with the other three phylogenetic species, consistent with C. gloeosporioides sensu stricto being a species distinct from the others isolated here.
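The allele-sharing comparisons summarized in Table 2 amount to set operations on the alleles observed at each locus. The Python sketch below, with hypothetical group names and allele numbers, reports the alleles two groups share at each locus and, for a query isolate, which reference groups carry its allele at each locus; an empty list flags a unique allele, like locus GS in GXNN1 or locus GD in JXGS-B12. It only checks whether each allele occurs somewhere in a reference group and is not a formal test of recombinant origin.

```python
from typing import Dict, Hashable, List, Sequence, Set

Genotype = Sequence[Hashable]

def shared_alleles(group_a: Sequence[Genotype], group_b: Sequence[Genotype],
                   loci: Sequence[str]) -> Dict[str, Set[Hashable]]:
    """For each locus, the alleles observed in both groups."""
    return {
        locus: {g[j] for g in group_a} & {g[j] for g in group_b}
        for j, locus in enumerate(loci)
    }

def allele_sources(query: Genotype, groups: Dict[str, Sequence[Genotype]],
                   loci: Sequence[str]) -> Dict[str, List[str]]:
    """For each locus, the reference groups carrying the query's allele.
    An empty list marks an allele not seen in any reference group."""
    return {
        locus: sorted(name for name, members in groups.items()
                      if any(g[j] == query[j] for g in members))
        for j, locus in enumerate(loci)
    }

loci = ("ITS", "CL", "GS", "GD")
# Hypothetical reference groups and allele numbers.
groups = {
    "species_A": [(10, 3, 14, 2), (10, 3, 20, 2)],
    "species_B": [(4, 6, 20, 5), (7, 6, 20, 5)],
}
print(shared_alleles(groups["species_A"], groups["species_B"], loci))
# {'ITS': set(), 'CL': set(), 'GS': {20}, 'GD': set()}
print(allele_sources((10, 6, 20, 9), groups, loci))
# {'ITS': ['species_A'], 'CL': ['species_B'], 'GS': ['species_A', 'species_B'], 'GD': []}
```

In this toy case the query isolate looks like a mosaic of species_A (ITS) and species_B (CL), with one allele (GD) not seen in either reference group.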
| Population | C. fructicola: sample size | C. fructicola: PrPC | C. fructicola: IA | C. siamense: sample size | C. siamense: PrPC | C. siamense: IA | Total: sample size | Total: PrPC | Total: IA |
|---|---|---|---|---|---|---|---|---|---|
| JXGS | 25 | 0.83 | 0.147 | 1 | - | - | 30 | 0.50 | 0.512** |
| JXGSA (subset of JXGS) | 19 | 0.83 | 0.136 | 1 | - | - | 21 | 0.83 | 0.126 |
| JXCB | 24 | 0.83 | 0.067 | 1 | - | - | 26 | 0.83 | 0.418** |
| HNTJL | 12 | 1.0 | 0.030 | 8 | 1.0 | 0.185 | 20 | 0.50 | 0.877** |
| HNMJH | 11 | 0.83 | 0.225 | 4 | - | - | 15 | 0.83 | 0.668** |
| HNLY | 27 | 0.83 | 0.040 | 0 | - | - | 27 | 0.83 | 0.040 |
| HNHH | 15 | 0.83 | 0.006 | 2 | - | - | 18 | 0.83 | 0.972** |
| HNCD | 4 | - | - | 1 | - | - | 10 | 1.00 | 0.645** |

Table 1: Allelic associations within and among samples of the C. gloeosporioides species complex infecting leaves of Camellia oleifera trees in southern China (modified from Li et al.). PrPC, proportion of phylogenetically compatible pairs of loci; IA, index of association; -, not tested.
| | C. fructicola | C. siamense | C. camelliae | C. gloeosporioides |
|---|---|---|---|---|
| C. siamense | Allele sharing at ITS, GS and GD loci | | | |
| C. camelliae | Allele sharing for one strain at ITS | No allele sharing | | |
| C. gloeosporioides | No allele sharing | No allele sharing | No allele sharing | |
| Strain GXNN1 | Allele sharing at ITS and CL loci | Allele sharing at GD locus | No allele sharing | No allele sharing |
| Strain CQXS4 | Allele sharing at ITS and GD loci | Allele sharing at ITS and GD loci | Allele sharing at CL and GS loci | No allele sharing |
| Strain JXGS-B11 | Allele sharing at ITS, GS and GD loci | Allele sharing at ITS, GS and GD loci | Allele sharing at CL locus | No allele sharing |
| Strain JXGS-B12 | No allele sharing | No allele sharing | Allele sharing at ITS, CL and GS loci | No allele sharing |
| Strain JXGS-A28 | Allele sharing at GS and GD loci | Allele sharing at GD locus | Allele sharing at ITS and CL loci | No allele sharing |
Table 2: Patterns of allele sharing among the phylogenetic species of the C. gloeosporioides species complex from tea-oil tree leaves in southern China. Five strains with uncertain phylogenetic placements are included to illustrate their allelic relationships with genotypes of known species.
(a) ITS vs. GS

| | ITS (allele 4) | ITS (allele 7) | ITS (allele 10) |
|---|---|---|---|
| GS (allele 14) | 1 | 1 | 68 |
| GS (allele 20) | 2 | 1 | 5 |

(b) CL vs. GS

| | CL (allele 3) | CL (allele 6) | CL (allele 7) |
|---|---|---|---|
| GS (allele 13) | 26 | 1 | 0 |
| GS (allele 17) | 1 | 1 | 1 |
| GS (allele 20) | 2 | 0 | 5 |
Table 3: Allelic combinations showing evidence of recombination/hybridization between two phylogenetic species, Colletotrichum fructicola and Colletotrichum siamense, obtained from tea-oil plantations in southern China. (a) All six possible allelic combinations were found between alleles 4, 7 and 10 of ITS and alleles 14 and 20 of GS; before samples of the two phylogenetic species were combined, these two loci were phylogenetically compatible. (b) Similarly, the locus pair CL and GS was phylogenetically compatible within each of the two species but became phylogenetically incompatible after the two species were combined in the analyses. The allelic designations correspond to those in Supplementary Table 2 in Li et al. The number of strains with each allelic combination is given in the corresponding cell.
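The pairwise compatibility test can be sketched as a graph problem: treat the alleles at two loci as nodes and each observed allele combination as an edge; the loci are compatible if this graph has no cycle, which for two biallelic loci reduces to the four-gamete test. This is one standard criterion for unordered multistate characters, assumed here; the text does not establish that it matches the Multilocus implementation exactly. The Python sketch below applies it to the counts in Table 3 and also computes PrPC, the proportion of compatible locus pairs reported in Table 1. The function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Hashable, Iterable, List, Sequence, Tuple

Node = Tuple[int, Hashable]  # (locus 1 or 2, allele)

def loci_compatible(pairs: Iterable[Tuple[Hashable, Hashable]]) -> bool:
    """True if the graph linking co-occurring alleles of two loci has no cycle.

    Each observed allele combination is an edge. A cycle means the combinations
    cannot be placed on one tree without recombination or homoplasy.
    """
    parent: Dict[Node, Node] = {}

    def find(node: Node) -> Node:
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for allele_1, allele_2 in set(pairs):
        root_1, root_2 = find((1, allele_1)), find((2, allele_2))
        if root_1 == root_2:  # this edge would close a cycle
            return False
        parent[root_1] = root_2
    return True

def prpc(genotypes: Sequence[Sequence[Hashable]]) -> float:
    """Proportion of locus pairs that are phylogenetically compatible."""
    locus_pairs = list(combinations(range(len(genotypes[0])), 2))
    n_ok = sum(loci_compatible((g[i], g[j]) for g in genotypes) for i, j in locus_pairs)
    return n_ok / len(locus_pairs)

def pairs_from_counts(counts: Dict[Tuple[Hashable, Hashable], int]) -> List[Tuple[Hashable, Hashable]]:
    """Allele combinations observed in at least one strain."""
    return [combo for combo, n in counts.items() if n > 0]

# Strain counts from Table 3, keyed by (ITS or CL allele, GS allele).
table3a = {(4, 14): 1, (7, 14): 1, (10, 14): 68, (4, 20): 2, (7, 20): 1, (10, 20): 5}
table3b = {(3, 13): 26, (6, 13): 1, (7, 13): 0, (3, 17): 1, (6, 17): 1,
           (7, 17): 1, (3, 20): 2, (6, 20): 0, (7, 20): 5}
print(loci_compatible(pairs_from_counts(table3a)))  # False
print(loci_compatible(pairs_from_counts(table3b)))  # False
```

With four loci there are six locus pairs, so PrPC takes values such as 0.50 (3 of 6 pairs compatible) or 0.83 (5 of 6), which matches the granularity of the values in Table 1.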
Two hypotheses could explain the observed phylogenetic incompatibilities between species. Under the first, the isolates grouped into three of the previously identified phylogenetic species (C. fructicola, C. siamense and C. camelliae), together with the five “unclassified isolates”, in fact all belong to one biological species that is currently recombining in nature. Under the second, the observed phylogenetic incompatibilities among the species are due to incomplete lineage sorting during their divergence from a common ancestor. However, if the second hypothesis were correct and speciation among C. fructicola, C. siamense and C. camelliae were complete (i.e., there were complete reproductive isolation among them in nature), sexual reproduction within each species should have generated or maintained phylogenetic incompatibility for all locus pairs within each “species”. In addition, ancestral phylogenetic incompatibilities predating speciation should have been lost through mutation and the generation of new alleles. The fact that some locus pairs were phylogenetically compatible within each phylogenetic species but incompatible when samples from multiple phylogenetic species were analyzed together strongly suggests that the second hypothesis is not supported.
Using samples from infected tea-oil tree leaves in southern China, this study revealed incomplete reproductive isolation in nature among several phylogenetic species recently identified within CGSC. Of the four phylogenetic species identified among our 199 isolates, only C. gloeosporioides (6 isolates) neither shared alleles with the other three species nor created new phylogenetic incompatibilities with them at the four analyzed loci. In contrast, allele sharing and evidence of recombination were found among samples of the other three phylogenetic species, C. fructicola, C. siamense and C. camelliae. Treating our strains as two biological species, C. gloeosporioides and a single recombining species comprising C. fructicola, C. siamense and C. camelliae, also resolves the uncertain taxonomic status of five strains (GXNN1, CQXS4, JXGS-B11, JXGS-B12 and JXGS-A28), allowing us to assign all 199 strains to the two biological species. As shown above, the most significant evidence for this two-species hypothesis was the finding of additional phylogenetic incompatibilities when our samples from two or all three “phylogenetic species” (C. fructicola, C. siamense and C. camelliae) were analyzed together (Tables 1 and 2). Indeed, evidence of recombination was found for all pairs of loci in this new biological species, and the five strains that did not cluster with known species could be explained as recombinants of other genotypes observed among the three “species” in our sample (Table 1).
Previous studies have noted that allele sharing is common among many of the proposed phylogenetic species and that concatenated sequences at multiple loci are often needed to separate the phylogenetic species within CGSC. For example, no single locus was found capable of discriminating all 22 phylogenetic species within CGSC, owing to allele sharing or limited sequence divergence. These results are also consistent with incomplete reproductive isolation in nature among many of the proposed phylogenetic species. However, because of the sampling problems mentioned above, phylogenetic incompatibilities were difficult to detect. Our results highlight the importance of using allelic association tests on large samples to critically evaluate these and other proposed phylogenetic species within CGSC. More broadly, our results suggest that considerable caution should be taken when describing closely related genotypes as distinct phylogenetic species. We suggest that stringent population genetic tests be performed on data from large population samples at multiple loci in order to determine species boundaries. The allelic association approach shown here can be broadly applied to delineate species in other fungal species complexes.
The research in our labs was supported by grants from the National Science Foundation of China (grant number 31570641), the Natural Sciences and Engineering Research Council (NSERC) of Canada, and the Lotus Scholar Visiting Professorship program of Hunan Province. We thank Liao Yu for helping with sample collections.