Hye Won Lee and Luciano Brocchieri*
Department of Molecular Genetics & Microbiology and Genetics Institute, University of Florida, Gainesville FL, USA
Corresponding Author: Luciano Brocchieri, Department of Molecular Genetics & Microbiology and Genetics Institute, University of Florida, Gainesville FL, USA. Tel: +1 352 273 8131. E-mail: [email protected]
Received December 24, 2012; Accepted December 28, 2012; Published December 31, 2012
Citation: Lee HW, Brocchieri L (2013) The Evolution of Fuzzy Proteins. J Phylogen Evolution Biol 1:e102. doi: 10.4172/2329-9002.1000e102
Copyright: © 2013 Lee HW, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Advances in sequencing technology favor the accumulation of molecular data and the development of phylogenetic methods that use nucleotide or amino acid sequences to study the evolution of gene and protein families, and the phylogenetic relations of species. Phylogenetic tree reconstructions are based on a choice of algorithms, and rely on the accuracy of nucleotide or amino acid substitution models in describing the process of molecular evolution. Here, we describe recent approaches to modeling protein evolution and their biological interpretation based on the concept of “fuzzy protein”.
Probabilistic Approaches to Phylogenetic Tree Inference
Probabilistic approaches, including Maximum-Likelihood (ML) and Bayesian methods, are widely used and considered the most accurate in phylogenetic inference [1,2]. In probabilistic phylogenetic methods, protein evolution is modeled as a continuous-time Markov process, described by a matrix $Q=\{q_{ij}\}$ of amino acid transition rates from amino acid type i to amino acid type j [2]. Q is derived by combining a symmetric substitutability matrix $R=\{r_{ij}\}$ and a vector of amino acid equilibrium frequencies π, to obtain transition rates $q_{ij} = C\,r_{ij}\,\pi_j$ ($i \neq j$). The diagonal terms are $q_{ii} = -\sum_{j \neq i} q_{ij}$, so that each row of Q sums to zero.
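As a minimal sketch of this construction, the Python function below assembles Q from R and π for a toy three-state alphabet. The function name `build_rate_matrix`, the toy values, and the choice of C (set here so that the expected substitution rate at equilibrium equals 1, a common convention) are illustrative assumptions, not details given in the text.

```python
def build_rate_matrix(R, pi):
    """Return a rate matrix with q_ij = C * r_ij * pi_j for i != j.

    R  : symmetric matrix of substitutabilities (list of lists)
    pi : equilibrium frequencies (must sum to 1)
    C is chosen so that the average rate, -sum_i pi_i * q_ii, equals 1
    (an assumed normalization convention).
    """
    n = len(pi)
    # Off-diagonal terms before normalization: r_ij * pi_j
    Q = [[R[i][j] * pi[j] if i != j else 0.0 for j in range(n)] for i in range(n)]
    # Diagonal terms make every row sum to zero
    for i in range(n):
        Q[i][i] = -sum(Q[i])
    # Average substitution rate at equilibrium; C is its reciprocal
    rate = -sum(pi[i] * Q[i][i] for i in range(n))
    return [[q / rate for q in row] for row in Q]


# Hypothetical toy values for a three-state alphabet
R = [[0.0, 1.0, 2.0],
     [1.0, 0.0, 4.0],
     [2.0, 4.0, 0.0]]
pi = [0.5, 0.3, 0.2]
Q = build_rate_matrix(R, pi)
```

Because R is symmetric, the resulting process is time-reversible: π_i q_ij = π_j q_ji for every pair of states, and π is the stationary distribution of Q.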
The observation from sequence alignments of related proteins that different protein sites show different propensities to diverge, however, suggests incorporating into evolutionary models devices that capture site heterogeneity of the evolutionary process. A traditional way to do so is to assign different rates of evolution to different sites. This is generally accomplished by rescaling Q by a coefficient $v_k$ specific to each site k, so that $Q^{(k)} = v_k Q$, with the coefficients $v_k$ averaging 1.0 across sites. The value of $v_k$ is most commonly drawn from a discretized gamma distribution Γ(α,α), whose shape is optimized by the choice of α [3-6]. A second commonly used device to fit site-dependence of evolutionary rates is to allow for a fraction I of invariable sites [7-9]. In a model including both invariable and gamma-distributed sites (I+Γ), a rate coefficient v=0 is assigned to a fraction I of sites, and gamma-distributed positive rates are assigned to the remaining fraction (1 − I) of sites. Evolutionary rates that vary substantially across sites have a significant effect on the relation between evolutionary distance and sequence similarity (Figure 1), as substitutions that would otherwise spread uniformly across all sites tend instead to accumulate at fewer, fast-evolving sites.
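The effect of rate heterogeneity on the relation between distance and similarity can be illustrated with a simplified calculation. The sketch below does not use the discretized gamma of the text; it swaps in a continuous Γ(α,α) distribution and an equal-rates (Poisson) model over 20 amino acids, for which the expected sequence identity has a closed form. The function names, and the rescaling of variable sites by 1/(1 − I) to keep the mean rate at 1, are assumptions made for illustration.

```python
import math


def identity_uniform(d, n_states=20):
    """Expected fraction of identical sites at distance d (substitutions
    per site) when all sites evolve at the same rate, under an
    equal-rates model with n_states states."""
    c = n_states / (n_states - 1)
    return 1 / n_states + (1 - 1 / n_states) * math.exp(-c * d)


def identity_gamma(d, alpha, invariant=0.0, n_states=20):
    """Same quantity when site rates follow a continuous Gamma(alpha, alpha)
    distribution and a fraction `invariant` of sites never changes."""
    c = n_states / (n_states - 1)
    # Assumption: variable sites are sped up so the mean rate over all sites is 1
    scale = 1.0 / (1.0 - invariant)
    # E[exp(-s * v)] for v ~ Gamma(shape=alpha, rate=alpha) is (1 + s/alpha)^(-alpha)
    decay = (1.0 + c * d * scale / alpha) ** (-alpha)
    variable = 1 / n_states + (1 - 1 / n_states) * decay
    return invariant + (1.0 - invariant) * variable


# Compare identity at the same evolutionary distance under three models
d = 1.0
same_rate = identity_uniform(d)
gamma_only = identity_gamma(d, alpha=0.5)
gamma_plus_inv = identity_gamma(d, alpha=0.5, invariant=0.2)
```

At the same distance, the heterogeneous models retain more identical sites than the uniform-rate model, because substitutions pile up at the fast sites while slow or invariable sites remain unchanged.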
While site-specific rates affect the speed of evolution, they do not affect the evolutionary pattern of each position. Different evolutionary patterns can instead be fitted to individual sites by deriving site-specific Q matrices. Recalling how Q is constructed, this can be accomplished by making R, π, or both site-specific. The first choice, implemented in the QMM model [10], is computationally quite challenging, requiring the optimization of 189 parameters per site-class. A relatively simpler approach is to allow for site-specific stationary frequencies $\pi^{(k)}$. This approach also appears consistent with the observation from multiple sequence alignments that different subsets of amino acid types are typically seen at different sites. Site-specificity of equilibrium frequencies has many interesting repercussions on the features of the evolutionary process, on phylogenetic tree reconstruction, and on the relation between sequence conservation and mutational saturation.
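One of these repercussions can be shown with a short calculation. At full saturation, the residues observed at a site in two sequences behave as independent draws from the site's stationary distribution, so the probability that they coincide is the sum of the squared frequencies. The sketch below uses hypothetical profiles and a hypothetical function name to compare a site open to all 20 amino acids with more restricted sites.

```python
def saturation_identity(profile):
    """Probability that two fully diverged sequences show the same residue
    at a site whose stationary frequencies are given by `profile`."""
    return sum(p * p for p in profile)


# Hypothetical profiles: all 20 amino acids equally likely,
# two amino acids equally likely, and a single allowed amino acid
uniform_site = [1 / 20] * 20
two_residue_site = [0.5, 0.5] + [0.0] * 18
fixed_site = [1.0] + [0.0] * 19

identities = [saturation_identity(p)
              for p in (uniform_site, two_residue_site, fixed_site)]
```

Under these assumptions a site restricted to two residues still appears 50% conserved after saturation, against 5% for an unrestricted site, so observed conservation alone does not rule out saturation.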
Position Specific Profiles of Amino Acid Usage
Possibly the most successful implementation of the idea of site-specific amino acid stationary distributions is the CAT mixture model of Lartillot and collaborators [11-15]. In the CAT (category) model, amino acid equilibrium frequencies $\pi^{(k)}$ were empirically identified using a Bayesian approach [11]. To speed up computation, sets of preassembled profiles of amino acid frequencies are provided in ML and Bayesian phylogenetic reconstruction implementations [16,17]. Profiles $\pi^{(k)}$ specific to each site k are used in combination with a general substitutability matrix R to construct site-specific normalized transition-rate matrices $Q^{(k)}$, with $q^{(k)}_{ij} = C^{(k)}\,r_{ij}\,\pi^{(k)}_j$ ($i \neq j$).
Profiles, Fuzzy Proteins, and Neutral Constrained Amino Acid Replacements
Position-specific equilibrium frequency profiles are justified by the idea that the functionality and structural stability of a protein require certain residue types at certain positions, with different degrees of stringency depending on functional constraints. For example, a position in an active site may correspond to a profile with a single amino acid type, whereas different hydrophilic amino acid types may be allowed to substitute for one another in loops exposed at the protein surface. This suggests an interpretation of profiles based on a model of neutral constrained evolution [19]. According to this interpretation, the profile associated with a particular position defines a subset of amino acid types that can be substituted at that position without affecting the fitness of the protein (i.e., its functionality). This model asserts that a protein, as a functional unit, can be described as a possibly large set of alternative sequences, each functionally equivalent to the others. Thus, from a functional perspective, a protein would be described not by a sequence of amino acids but by a sequence of amino acid subsets, whose sizes describe different degrees of “fuzziness” of different positions. A “fuzzy protein” can evolve within the limits imposed by the sequence of amino acid subsets that describe it, with no effect on functionality. With this interpretation, position-specific profiles can explain not only the evolutionary pattern, but also the speed of evolution. The reasoning is that random mutations will be retained only if they result in substitutions allowed by the profile. Thus, if the profile is stringent, most substitutions will be rejected, slowing the evolutionary process; if the profile is permissive, most substitutions will be accepted, resulting in fast evolution.
To model the effect of purifying selection on evolutionary rates, we first consider a general, normalized substitution-rate matrix, whose coefficients are derived from nucleotide and codon substitution matrices. At each position k, the substitution matrix is filtered by a position-specific “occupancy vector” that defines the subset of amino acid types allowed at that position, so that equilibrium frequencies and transition rates towards amino acid types not represented in the occupancy vector are set to zero. The result is a reduced $Q^{(k)}$ matrix with a slower average transition rate.
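The filtering step can be sketched in a few lines of Python. The example below is an illustration under stated assumptions, not the authors' implementation: it replaces the codon-derived matrix with a simple equal-rates matrix over the 20 amino acids, and the names `filter_by_occupancy` and `average_rate` are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def equal_rates_matrix(n):
    """Stand-in general matrix (assumption): every off-diagonal rate is
    1/(n-1), so the average rate under uniform frequencies is 1."""
    off = 1.0 / (n - 1)
    return [[-1.0 if i == j else off for j in range(n)] for i in range(n)]


def filter_by_occupancy(Q, pi, allowed):
    """Zero the frequencies of, and the rates towards, amino acids outside
    `allowed`, then recompute the diagonal. Rows of disallowed states are
    also zeroed, since those states are never occupied."""
    n = len(Q)
    keep = [aa in allowed for aa in AMINO_ACIDS]
    total = sum(p for p, k in zip(pi, keep) if k)
    pi_k = [p / total if k else 0.0 for p, k in zip(pi, keep)]
    Q_k = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if not keep[i]:
            continue
        for j in range(n):
            if i != j and keep[j]:
                Q_k[i][j] = Q[i][j]
        Q_k[i][i] = -sum(Q_k[i])  # diagonal is still 0.0 when summed
    return Q_k, pi_k


def average_rate(Q, pi):
    """Expected substitution rate at equilibrium: -sum_i pi_i * q_ii."""
    return -sum(p * Q[i][i] for i, p in enumerate(pi))


Q = equal_rates_matrix(20)
pi = [1 / 20] * 20
# Occupancy sets: a single residue, two acidic residues, all residues
rates = {allowed: average_rate(*filter_by_occupancy(Q, pi, allowed))
         for allowed in ("H", "DE", AMINO_ACIDS)}
```

In this toy setting a single-residue position does not evolve at all, a two-residue position evolves at 1/19 of the unconstrained rate, and a fully permissive position evolves at rate 1, matching the intuition that stringent profiles slow evolution.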
Acknowledgements
This work is supported by NIH Grant 5R01GM87485-2.
References
Figure 1 | Figure 2 | Figure 3