Phylogenetic Model Choice: Justifying a Species Tree or Concatenation Analysis

There are two paradigms for the phylogenetic analysis of multi-locus sequence data: one which forces all genes to share the same underlying history, and another that allows genes to follow idiosyncratic patterns of descent from ancestral alleles. The first of these approaches (concatenation) is clearly a simplified model of the actual process of genome evolution while the second (species-tree methods) may be overly complex for histories characterized by long divergence times between cladogenesis. Rather than making an a priori determination concerning which of these phylogenetic models to apply to our data, we seek to provide a framework for choosing between concatenation and species-tree methods that treat genes as independently evolving lineages. We demonstrate that parametric bootstrapping can be used to assess the extent to which genealogical incongruence across loci can be attributed to phylogenetic estimation error, and demonstrate the application of our approach using an empirical dataset from 10 species of the Natricine snake sub-family. Since our data exhibit incongruence across loci that are clearly caused by a mixture of coalescent stochasticity and phyogenetic estimation error, we also develop an approach for choosing among species tree estimation methods that take gene trees as input and those that simultaneously estimate gene trees and species trees. *Corresponding author: Bryan C. Carstens, Evolution, Ecology and Organismal Biology, The Ohio State Univeristy, 318 W. 12th Ave., Columbus, OH 43210, USA, Tel: 614-292-6587; E-mail: carstens.12@osu.edu Received May 02, 2013; Accepted July 24, 2013; Published July 29, 2013 Citation: McVay JD, Carstens BC (2013) Phylogenetic Model Choice: Justifying a Species Tree or Concatenation Analysis. J Phylogen Evolution Biol 1: 114. doi:10.4172/2329-9002.1000114 Copyright: © 2013 McVay JD, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Introduction There are two primary paradigms for estimating phylogeny from multi-locus sequence data [1]. The conventional method, which developed from arguments in favor of total evidence [2], estimates phylogeny by concatenating data across multiple genes collected from exemplar samples. In this approach, the data are treated as a single locus, and essentially the estimate of genealogy from each locus is averaged across genes. Underlying this method is the intuition that phylogenetic accuracy improves with an increase in the number of variable sites [3]. While this assumption certainly holds within a particular locus, applying this method across multiple loci requires the assumption that the gene trees across loci share a similar topology. When this is demonstrably not the case, incongruence across loci is attributable to phylogenetic estimation error rather than to coalescent processes (e.g., the independent sorting of alleles across loci). Recently, the primacy of concatenation has been challenged on several fronts [4-8], and methods that estimate phylogeny while allowing for incongruence across loci due to coalescent processes have been proposed. These coalescent-based approaches to phylogeny inference estimate species tree either given gene trees [9,10], or estimate gene trees and species tree topologies simultaneously [11,12]. Either approach accounts for population-level processes, such as the incomplete sorting of ancestral polymorphism that can cause gene tree discordance. Given the growing criticism of concatenation, empiricists are faced with a vexing decision regarding the choice of phylogenetic method to apply to their system. Coalescent-based approaches are often favored a priori in phylogeographic investigations, where the incomplete sorting of ancestral polymorphism can be dramatically evident across loci [4,1317], while concatenation continues to be favored among those working at deeper taxonomic levels [18-20]. However, it is clear that population level processes such as the sorting of ancestral polymorphism have occurred throughout the history of life; further, one of the central theses of the modern synthesis is the expectation that evolutionary processes within populations ultimately produce phylogenetic patterns [21]. This led Edwards [1] to argue that species tree approaches are preferable on first principles. Philosophical implications aside, the question of phylogenetic method choice is also of dramatic practical importance because the ideal sampling schemes for concatenation and coalescent-based approaches are quite different. Since the former assumes that population-level processes do not have an effect on phylogeny estimation, systematists who concatenate their data benefit from sampling as many genes as possible and fewer individuals per species. Alternatively, coalescent-based approaches appear to be most accurate with intermediate numbers of loci and multiple individuals sampled within species [22-24]. This places an empiricist in a difficult position; optimally they need to recognize which of these approaches appears to be appropriate given their data before all of it is collected in order to employ the optimal sampling scheme. It is also the position we found ourselves in some months ago, and in this study we propose an approach to answering this question using a preliminary data set of 7 genes from 1-2 individuals for each of 10 species of thamnophiine snakes. Given our data, how should we determine which of the competing phylogenetic paradigms to employ? Perhaps the most important evidence available to empiricists who seek to objectively determine whether to concatenate their data or use species-tree methods is the degree of incongruence among loci. If the gene trees are mostly congruent, this is evidence that the branch lengths of the species tree are sufficiently long to have allowed lineage sorting to reach completion, and thus concatenation may be justified. Alternatively, incongruence among gene trees may be caused by coalescent processes and would suggest that coalescent-based methods are required. One approach would simply be to measure the incongruence across gene trees using a metric for tree comparison such as the RobinsonFoulds distance [25]. Distributions of the pairwise R-F distances can be substantial at shallow phylogenetic depths; this incongruence can J o u r n a l o f P hy log enet ics & Evoionary B i o l o g y ISSN: 2329-9002 Journal of Phylogenetics & Evolutionary Biology


Introduction
There are two primary paradigms for estimating phylogeny from multi-locus sequence data [1]. The conventional method, which developed from arguments in favor of total evidence [2], estimates phylogeny by concatenating data across multiple genes collected from exemplar samples. In this approach, the data are treated as a single locus, and essentially the estimate of genealogy from each locus is averaged across genes. Underlying this method is the intuition that phylogenetic accuracy improves with an increase in the number of variable sites [3]. While this assumption certainly holds within a particular locus, applying this method across multiple loci requires the assumption that the gene trees across loci share a similar topology. When this is demonstrably not the case, incongruence across loci is attributable to phylogenetic estimation error rather than to coalescent processes (e.g., the independent sorting of alleles across loci). Recently, the primacy of concatenation has been challenged on several fronts [4][5][6][7][8], and methods that estimate phylogeny while allowing for incongruence across loci due to coalescent processes have been proposed. These coalescent-based approaches to phylogeny inference estimate species tree either given gene trees [9,10], or estimate gene trees and species tree topologies simultaneously [11,12]. Either approach accounts for population-level processes, such as the incomplete sorting of ancestral polymorphism that can cause gene tree discordance.
Given the growing criticism of concatenation, empiricists are faced with a vexing decision regarding the choice of phylogenetic method to apply to their system. Coalescent-based approaches are often favored a priori in phylogeographic investigations, where the incomplete sorting of ancestral polymorphism can be dramatically evident across loci [4,[13][14][15][16][17], while concatenation continues to be favored among those working at deeper taxonomic levels [18][19][20]. However, it is clear that population level processes such as the sorting of ancestral polymorphism have occurred throughout the history of life; further, one of the central theses of the modern synthesis is the expectation that evolutionary processes within populations ultimately produce phylogenetic patterns [21]. This led Edwards [1] to argue that species tree approaches are preferable on first principles. Philosophical implications aside, the question of phylogenetic method choice is also of dramatic practical importance because the ideal sampling schemes for concatenation and coalescent-based approaches are quite different. Since the former assumes that population-level processes do not have an effect on phylogeny estimation, systematists who concatenate their data benefit from sampling as many genes as possible and fewer individuals per species. Alternatively, coalescent-based approaches appear to be most accurate with intermediate numbers of loci and multiple individuals sampled within species [22][23][24]. This places an empiricist in a difficult position; optimally they need to recognize which of these approaches appears to be appropriate given their data before all of it is collected in order to employ the optimal sampling scheme. It is also the position we found ourselves in some months ago, and in this study we propose an approach to answering this question using a preliminary data set of 7 genes from 1-2 individuals for each of 10 species of thamnophiine snakes. Given our data, how should we determine which of the competing phylogenetic paradigms to employ?
Perhaps the most important evidence available to empiricists who seek to objectively determine whether to concatenate their data or use species-tree methods is the degree of incongruence among loci. If the gene trees are mostly congruent, this is evidence that the branch lengths of the species tree are sufficiently long to have allowed lineage sorting to reach completion, and thus concatenation may be justified. Alternatively, incongruence among gene trees may be caused by coalescent processes and would suggest that coalescent-based methods are required. One approach would simply be to measure the incongruence across gene trees using a metric for tree comparison such as the Robinson-Foulds distance [25]. Distributions of the pairwise R-F distances can be substantial at shallow phylogenetic depths; this incongruence can also persist at deeper levels ( Figure 1). However, observed discordance among gene tree estimates can arise from other neutral sources such as mutational stochasticity, as well as phylogenetic estimation error, and thus a major challenge for empiricists is determining if the observed incongruence across gene trees can be attributed to phylogenetic estimation error alone. It is reasonable to conclude that concatenation is appropriate when the level of discord is of a magnitude that can be attributed to phylogenetic estimation error, here the substitutions across loci will provide valuable information regarding ancestral nodes. Conversely, gene tree estimates that are incongruent to a greater extent than would be expected due to phylogenetic estimation error alone is an indication that coalescent uncertainty has caused the discord, and therefore must be accounted for through the use of species tree estimation approaches. Here we use parametric bootstrapping to conduct a series of pair wise tests to ascertain whether the incongruence across genealogies estimated from our empirical data can be attributed to phylogenetic estimation error alone.
We explore these questions using data collected from the North American colubrid snake tribe Thamnophiini, which consists of ~65 species representing nine genera. Previous studies [26,27] have estimated partial phylogenies of this group with differing results, yet to date no complete phylogeny has been estimated. Most recently, Alfaro [28] published a Bayesian estimate of phylogeny in which the Nerodia, the water snakes, were not monophyletic. Specifically, two species of the genus Regina and the monotypic Tropidoclonion were nested within Nerodia. While this result was not strongly supported by the data, the relationships within the Thamnophiini remain unsatisfactorily resolved. As empiricists, our ultimate research goal is to resolve the phylogeny of this group; however, our proximate goal, and the primary goals of this study is to determine which phylogenetic method is appropriate so that we can identify the optimal sampling scheme.

Empirical
Data collection: Molecular sequence data from five nuclear and two mitochondrial gene fragments were collected from seven species of the snake genus Nerodia, two North American relatives (Regina grahamii and Tropodoclonion lineatum), and an old world relative (Natrix natrix). The additional North American species were included to include an individual that has been previously suggested to be within Nerodia (R. grahamii) and to include one from a putatively deeper split (Tropidoclonion; Alfaro, 2003). DNA was extracted from tissue or blood using DNeasy kit (Qiagen, Hilden, Germany) following manufacturer's protocols. Each fragment was amplified via polymerase chain reaction using standard protocols: 25-50 ng template, 5 pmoles each primer (

Colubridae -Family
Proportional RF

Serpentes -Suborder
Squamata -Order Figure 1: Histograms of pairwise symmetric difference distance among gene trees from four multi-locus empirical datasets. To assess the degree to which individual gene trees share the same topology for multigene datasets representing four depths of phylogeny (Table 1), we used the symmetric difference distance (RF distances; Robinson and Foulds, 1981) to compare all trees in a pairwise fashion using PAUP*. Observations of zero indicate no topological incongruence between two trees. *Number of loci used in this study; some loci data sets were incompleteand thus not used.  Antarctic phosphatase following Glenn and Schable [29]. Fragments were sequenced following manufacturer's protocol, and sequences were analyzed on an ABI 3130 Sequence Analyzer (Applied Biosystems, Foster City, CA). When heterozygotes were detected, we first attempted to determine phase based on sample parameters using Phase [30,31]. For those whose estimated phase had a posterior probability less than 0.95, amplicons were cloned using a Qiagen cloning kit, and sequenced multiple clones per heterozygous individual to determine the exact phase.

Study
Gene tree estimation: For each dataset, we generated a maximum likelihood estimate of genealogy for each nuclear gene and the concatenated mitochondrial data. After checking alignment by eye, DT-ModSel [32] was used to select the model of evolution that best fit each fragment, and a heuristic search was performed in PAUP* [33] to estimate the ML tree. Support for each gene tree was assessed by performing 1000 heuristic search bootstrap replicates. Using Bayes Factors [34] based on stepping-stone (ss) estimates of marginal likelihood [35], we tested three models of evolutionary rate: 1) under a strict clock, 2) under an uncorrelated relaxed clock (independent gamma rates), and 3) under a non-clock model, in MrBayes. Two stepping stone MCMC chains (2x10 6 generations) were run for each model for each gene fragment to ensure convergence; significance of marginal likelihood disparities between competing models were assessed following Kass and Raftery [34].

Concatenated phylogenetic analyses:
Phylogeny was estimated for Nerodia were using both a likelihood and Bayesian approach. A maximum likelihood phylogeny was estimated using PAUP*. The best model for the concatenated dataset was chosen using DT-ModSel [32], subsequently a heuristic search was performed using estimated model parameters. Statistical support was assessed with 1000 heuristic search bootstrap replicates. The Bayesian phylogeny estimate was produced using BEAST 1.7 [36]. The dataset was partitioned into genes and the each gene was allowed to evolve under its own estimated substitution model (Table 3) and under an uncorrelated lognormal relaxed molecular clock model. Two indentical runs of 10 9 generations, sampling every 10 4 generations were performed, and proper mixing of the MCMC was assessed using Tracer 1.5 [37] Species tree estimation: The methods that currently exist for estimating species trees can be placed into two categories: those which estimate a species given estimated gene trees as input (eg., MDC [10,38] and STEM [9]) and those which simultaneously estimate the gene trees and species tree (BEST [8], MrBayes [39],*BEAST [12]). The former class relies on simple algorithms to estimate the species tree, whereas the latter uses Markov chains to approximate the posterior probabilities of trees and parameters. These Bayesian methods are often computationally intensive, thus the "gene tree input" approaches (i.e., MDC, STEM) may be preferred when no a priori reason for method choice exists. However, these methods rely on the assumption that the gene trees are well-estimated, which may not be the case in many empirical datasets, and inclusion of poor estimates of gene trees into studies may decrease the accuracy of species tree estimates. Empiricists are in a difficult position, as there is no simple measure of accuracy for gene trees estimated from empirical data because the actual genealogy is unknowable. We proceed here by estimating species trees using both approaches (i.e., species tree from gene trees and simultaneous estimation of species and gene trees) and conducting several simulation studies to enhance our understanding of how accurate we can expect various methods to be given our data.
Species tree from gene trees estimation: Mesquite [40] was used to estimate the species tree by minimizing the number of deep coalescences [23]. Since Mesquite produces an estimate of the topology but not the branch lengths, STEM [9] was used to identify the ML estimate of the species tree (with branch lengths) given the gene trees. For both analyses, maximum likelihood estimates of the gene trees generated in PAUP were used.
Bayesian species tree estimation: Two methods for estimating species trees in a Bayesian framework are currently available, both of which simultaneously approximate the posterior distribution of the gene trees and the species tree, given multi-locus datasets and distributions of parameter priors. For BEST [11,41], we conducted two runs of seven chains (one for each gene tree, species tree), for 10 8 generations, sampling every 10 4 generations. We used an inverse gamma distribution with shape parameters α=3, β=0.003 (Θ = 0.0015 ) for the theta prior and a uniform genemu prior with bounds 0 and 5, with the upper bound corresponding to k-1 independent loci (D. Rabosky, pers. comm.). Convergence of chains was assessed using the program Tracer [37]. We also used *BEAST [12], which is implemented in the BEAST 1.7 software package. *BEAST allows use of a single or multiple MCMC chains to estimate both the species tree and the gene trees; here a single chain was allowed to run for 10 9 generations, sampling every 10 5 generations. The first 2000 samples were discarded as burn-in, and each parameter was checked for autocorrelation using the program Tracer provided in the BEAST package. A maximum clade credibility tree was created using Tree Annotater, also provided with the BEAST package.

Simulations
Quantifying the lingering effects of coalescent variance: To better understand how the coalescent processes that acted on the ancestral nodes of phylogenetic trees can influence phylogeny estimation, we conducted a series of analyses using data simulated in Mesquite 2.7.2 [40]. For each of ten ten-species topologies simulated under a birth/ death process, we simulated 20 coalescent gene trees (20 alleles) contained within each species for four depths : 1N, 10N, 100N, 1000N. For each of these topologies, we made pairwise comparisons of topology (RF distances) using PAUP*. Then, for each gene tree at all depths, DNA sequence data was simulated using the average fragment lengths and models of evolution from the empirical datasets that most closely resemble the each species tree depth (Table 1). For these simulations, effective population size was set to N e =10,000 and a generation time of 2.5 years was used [43]; estimated node ages based on fossil data [44][45][46] were converted to N generations. We estimated a ML tree in PAUP for each simulated dataset under the model of evolution used to simulate the data, and once again compared the topologies using RF distances. To measure how much phylogenetic estimation error affects the topology, comparisons of distributions of RF distances of the simulated gene trees and their respective estimated gene trees were performed. Finally, in order to discern the effect of gene tree

Data collection and gene tree estimation
A total of 3857 bp of phased DNA sequence data was collected for 13 individuals representing 10 species. Gene tree estimates for each gene are shown in Figure S1. Average nodal support across all gene trees as 47.6. This number is proportional to the number of segregating sites in each gene (data not shown). Descriptive statistics for each gene, model of evolution and molecular clock model selected can be seen in Table 3; both clock-like models were greatly preferred to a non-clock model (data not shown)

Quantifying the lingering effects of coalescent variance
For simulated species trees, coalescent gene trees showed some level of discordance at all depths ( Figure S2 and Figure 2), in 9/10 and 1/10 topologies at 100N and 1000N respectively, while there was some discordance among estimated gene trees in all cases. Comparisons of RF distributions of actual and estimated gene trees indicate that in most cases, the primary source of topological incongruence is phylogenetic error. In one of ten 100N and three of ten 1000N trees, concatenated ML estimates of the species tree differed from their respective simulated topologies.

Identifying the cause of gene tree incongruence
Discordance between topologies was significant (p < 0.002) in 18 of 25 pairwise tests (Table 4; 23/25 were "significant" prior to correction discordance on phylogenetic inference, a concatenated estimate was produced for each species tree and compared to the simulated topology using RF distances and the metric employed by Kuhner and Felsenstein [47], implemented in Ktreedist [48], which calculates relative Kuhner-Felsenstein (KF) distances for trees of differing total length. Because all fragments were simulated under the same model parameters, we did not partition the data for analysis.
Identifying the cause of gene tree incongruence: For any two gene trees estimated from independent loci, some combination of phylogenetic estimation error and coalescent uncertainty can account for observed topological discordance. Determining the relative contributions of these processes is a vital step towards determining which of the competing paradigms to use to estimate the species tree. To test whether incongruence among gene trees could be attributed solely to phylogenetic estimation error, we use the parametric bootstraps [49], an approach that utilizes simulation to construct a null distribution of the amount of phylogenetic error expected under a null model of no difference in topology across genes. We conducted pairwise test for all loci; in each we constrained the ML tree search of gene "A" to trees that matched the topology of gene "B", then measured the deterioration of the likelihood score between the topologically unconstrained and constrained trees (-lnL uncon --lnL con = δlnL) using PAUP. We then simulated 1000 datasets under the model and parameters of gene "A" on the topology of gene "B" using Seq-Gen [50] and built a null distribution of δlnL to examine our test statistic. Since parametric bootstrapping depends on an adequate fit of the model of sequence evolution to the data [51], we conducted an absolute goodness-of-fit tests on each gene and corresponding model with a modified method of Sullivan et al. [52], using 1000 simulated datasets. For both tests, significance was assessed using a Bonferroni corrected α (0.05/n comparisons).
Gene tree support and species tree accuracy: The quality of a maximum likelihood phylogenetic estimate is typically assessed by calculating non-parametric bootstrap for each node of the phylogeny [53]. To test if our set of gene trees estimates (with assessed nodal support) were sufficiently accurate for STEM to recover the correct species tree, a series of simulations were conducted. Starting with the topology of the species tree estimated from the empirical data using *BEAST, we simulated 1000 coalescent gene trees, consisting of two alleles per species with an N e of 10,000 (based on estimates of maximum likelihood estimates of N e [54] from Nerodia erythrogaster; data not shown) and a total tree depth of 50N (a depth loosely based on observed mitochondrial mutation rate and overall tree length), using Mesquite. For each gene tree, sequence data was simulated using the program Seq-Gen [50], under the HKY model of sequence evolution, with nucleotide frequencies and length (551 bp) of each fragment taken from a mean of the empirical data. Each dataset was simulated on a tree with a length drawn from an exponential distribution with mean 0.05 substitutions/site (i.e., the mean tree length of the nuclear gene tree estimates). Maximum likelihood gene trees were then estimated under the same model under which they were simulated, and 100 "fastsearch" bootstrap replicates were performed for each tree. Gene tree quality was assessed in two ways. First, a measure of average nodal support (ANS) was calculated for each tree as the sum of all nodal support values above 50 divided by the total number of potentially supported nodes [18].
We used STEM to estimate a species tree from 1000 subsets of 6 randomly chosen ML gene trees; then the Kuhner-Felsenstein and Robinson-Foulds metric was calculated between the estimated and actual species tree to assess accuracy. This number was the compared with linear regression to both the mean and variance of ANS. Perl scripts were written to automate these simulations and are available by request from the author (JDM). for multiple comparisons). Comparisons testing fit of the FSHR gene to other topologies were not conducted, as the model was a poor (p < 0.001) fit to the data.

Gene tree support and species tree accuracy
Results of this simulation exercise ( Figure S3 and Figure 3) are consistent with the prediction that accuracy of species tree estimation is directly correlated with quality of gene tree estimation. Average nodal support for the empirical dataset was 47.6, which was below the lowest simulated ANS for which the gene tree subset yielded the correct topology ( Figure 3). Based on these results, results of gene tree-based species tree estimators are not presented.

Species tree estimation
The maximum clade credibility tree obtained from *BEAST can be seen in figure 4. After 10 9 generations, effective sample sizes (ESS) of all parameters were greater than 200 (the minimum suggested by the authors for publication). BEST results are not shown. After 10 8 generations, standard deviation of split frequencies for all gene tree chains were above 0.07; stationarity is assumed when these values are below 0.01. Convergence was not assessed as stationarity had not been reached.

Discussion
We provide a simple framework which will allow researchers to make an a priori decisions about which model of phylogeny is best to use given their data: a simpler model in which all genes share a topologically identical history, or more complex models which allow genealogies to vary due to coalescent processes. In the first step, we compare the topologies of gene trees using a parametric approach. If topologies are not significantly different we could safely estimate our species tree using a concatenation approach, and additional loci can be gathered at the expense of within-species sampling. However, if gene trees exhibit an amount of incongruence that can not be attributed to phylogenetic estimation error alone, then a coalescent-based approach is preferred. For our data this is clearly the case as some 18/25 of our comparisons were able to reject the null hypothesis (i.e., that there is no difference in the topology of gene A and gene B) even using the conservative Bonferroni correction. Faced with these results, we attempted to determine if our gene trees were estimated sufficiently well to produce accurate results using STEM, since this program produces accurate estimates of species phylogenies when the gene trees are estimated without error [9,24]. We proposed a procedure based on the calculation of the average nodal support; trees that are estimated with little error will tend to have highly supported nodes as measured by non-parametric bootstrapping. Our results indicate that the ANS is low for our system, suggesting to us that we can not be sure of the accuracy of the species tree estimate from STEM. Therefore, we simultaneously

Relationships among Nerodia
Results of the *BEAST analysis recovers Nerodia as a monophyletic clade, with reasonable support at most nodes ( Figure 4). Disagreement between our estimation and that of Alfaro [28] may be due to several factors (e.g., taxon sampling in terms of the species representing the ingroup, the total number of individuals per species, and the total number of taxa included in the analysis). Results of our analysis will likely change as taxa and individuals are added, as our ultimate goal is to estimate phylogeny of Thamnophiini. Given our findings from these preliminary data; we are presently collecting data from multiple individuals in all (~60) of the Thamnophiini species and will present a more densely-sampled and rigorous phylogeny at a later date. However, our results illustrate several striking patterns that have clear implications for empirical systematists.

Quantifying the lingering effects of coalescent variance and phylogenetic estimation error
Our first set of simulations suggest that anomalous lineage sorting due to coalescent stochasticity can result in gene tree discordance even in phylogenies with large total tree depths. This is perhaps not surprising, since discordance should be expected within any species trees that possess internode less than ~6N generations in length [55] somewhere within the tree. However, we determined that, at deeper phylogenetic levels, phylogenetic estimation error was a more common source of estimation error than coalescent uncertainty. While these would seem to imply that concatenation would perform well, it is generally the case that both of these processes contribute to decreased accuracy in phylogeny estimation. We advocate parametric bootstrapping as a method for determining whether the observed incongruence across multiple loci can be attributed to phylogenetic estimation error alone.

Gene tree support and species tree accuracy
One of the approaches used by empiricists to measure the quality of their phylogeny estimates is the bootstrap support of particular nodes in the phylogeny. We extend this convention and measure the overall quality of our gene tree estimates by averaging the nodal support, and then used regression to demonstrate that the accuracy of species trees estimated using STEM are correlated to the ANS. Based on results of the third set of simulations, we consider the information contained within our gene trees inadequate to estimate gene genealogies sufficiently for use in gene tree-based species tree estimators. Species tree estimates using STEM and Mesquite differed in topology from both the concatenated and *BEAST results, with STEM not recovering Nerodia as a monophyletic group (Figure 5a and Figure 5b). It is possible that these estimates will improve when numbers of alleles per species are increased [56] or when more substitution-rich loci are incorporated in the analysis.

Causes of discordance
While we have evoked the explanation for discordance among our gene trees as coalescent stochasticity, there are other phenomena, such as hybridization and gene duplication/extinction, which may cause similar patterns in the observed data. While a theoretical and practical framework are currently being developed that incorporate hybridization into coalescent-based analyses of phylogenetic estimation [9,28,57], this long-standing problem remains a difficult one. A key to useful incorporation of hybrid mechanisms may be a better understanding of the system-specific patterns of introgression. Within Thamnophiini, hybridization has been shown to occur between both sister and non-sister pairs of species [58,59]; consequently we cannot ignore hybridization as a possible mechanism influencing the patterns we observe.

Rate heterogeneity and species tree estimation
An advantage to using species tree estimators that require gene trees as input is that each gene tree contributes equally to the likelihood of the species tree, thus no single gene tree topology can disproportionally influence the species tree estimation. This is not the case with concatenation. Disconcertingly, concatenated multilocus phylogenetic estimations often include one or more mitochondrial loci, and the sheer bias in the number of variable sites is likely to result in an estimation in which the signal from the nuclear data is treated a noise, overwhelmed by the information in the plasmid loci [60]. Even if the mitochondrial genealogy is concordant with other gene trees or the species phylogeny, the concatenated estimate will suffer from a bias in branch length estimates [61], which can result in incorrectly estimated node ages, or bias in ancestral character state reconstruction.

Conclusions
We demonstrate a useful and direct approach to choosing among the two dominant phylogenetic models; concatenation and species tree estimation. Central to our description of the issues related to choosing among these models is the assumption that it is important to have an a priori expectation of model performance in order to avoid a post hoc evaluation of the phylogenies. We also contend that the optimal sampling design differs for these competing models; for our data we show clearly that coalescent processes are likely to produce incongruence across loci and therefore future efforts will be focused on increasing the number of individuals included in the analysis.