The Evolution of Fuzzy Proteins | OMICS International
ISSN: 2329-9002
Journal of Phylogenetics & Evolutionary Biology

The Evolution of Fuzzy Proteins

Hye Won Lee and Luciano Brocchieri*
Department of Molecular Genetics & Microbiology and Genetics Institute, University of Florida, Gainesville FL, USA
Corresponding Author : Luciano Brocchieri
Department of Molecular Genetics &
Microbiology and Genetics Institute
University of Florida, Gainesville FL, USA
Tel: +1 352 273 8131
E-mail: [email protected]
Received December 24, 2012; Accepted December 28, 2012; Published December 31, 2012
Citation: Lee HW, Brocchieri L (2013) The Evolution of Fuzzy Proteins. J Phylogen Evolution Biol 1:e102. doi: 10.4172/2329-9002.1000e102
Copyright: © 2013 Lee HW, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Advances in sequencing technology favor the accumulation of molecular data and the development of phylogenetic methods that use nucleotide or amino acid sequences to study the evolution of gene and protein families, and the phylogenetic relations of species. Phylogenetic tree reconstructions are based on a choice of algorithms, and rely on the accuracy of nucleotide or amino acid substitution models in describing the process of molecular evolution. Here, we describe recent approaches to modeling protein evolution and their biological interpretation based on the concept of “fuzzy protein”.
Probabilistic Approaches to Phylogenetic Tree Inference
Probabilistic approaches, including Maximum-Likelihood (ML) and Bayesian methods, are widely used and considered the most accurate in phylogenetic inference [1,2]. In probabilistic phylogenetic methods, protein evolution is modeled as a continuous-time Markov process, described by a matrix $Q=\{q_{ij}\}$ of amino acid transition rates from amino acid type $i$ to amino acid type $j$ [2]. $Q$ is derived by combining a symmetric substitutability matrix $R=\{r_{ij}\}$ and a vector of amino acid equilibrium frequencies $\pi$, to obtain transition rates $q_{ij}=C\,r_{ij}\,\pi_j$ ($i \neq j$). The diagonal terms are $q_{ii}=-\sum_{j\neq i} q_{ij}$, and $Q$ is normalized by choosing $C$ so that $-\sum_i \pi_i q_{ii}=1$; that is, rates are scaled so that one unit of relative evolutionary time corresponds on average to one substitution per site. $Q$ establishes a relation between evolutionary distance and expected sequence similarity, by which the evolutionary distance between two sequences can be inferred from their sequence identity. Furthermore, $Q$ can be used to calculate the likelihood of a phylogenetic tree in ML methods, or the ratio of the posterior probabilities of two phylogenetic trees in Bayesian approaches, to identify the optimal tree(s).
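The construction of Q from R and π, and the resulting relation between evolutionary distance and expected identity, can be sketched numerically. The example below is only an illustration under stated assumptions: the exchangeabilities and frequencies are random toy values (not an empirical matrix such as LG), and the function names are hypothetical. It relies on the fact that a matrix of the form q_ij = C r_ij π_j is time-reversible, so exp(Qt) can be computed from a symmetric eigendecomposition.

```python
import numpy as np

def build_rate_matrix(R, pi):
    """Return Q with q_ij = C * r_ij * pi_j (i != j), normalized to a mean rate of 1."""
    R = np.asarray(R, dtype=float)
    pi = np.asarray(pi, dtype=float)
    Q = R * pi[np.newaxis, :]              # off-diagonal terms r_ij * pi_j
    np.fill_diagonal(Q, 0.0)
    np.fill_diagonal(Q, -Q.sum(axis=1))    # diagonal: each row sums to zero
    mean_rate = -np.dot(pi, np.diag(Q))    # expected substitutions per unit time
    return Q / mean_rate                   # equivalent to choosing the constant C

def transition_probabilities(Q, pi, t):
    """P(t) = exp(Q t), computed through the symmetrized form of a reversible Q."""
    s = np.sqrt(pi)
    S = Q * s[:, None] / s[None, :]        # S_ij = sqrt(pi_i) * q_ij / sqrt(pi_j)
    S = (S + S.T) / 2.0                    # remove rounding asymmetry
    w, V = np.linalg.eigh(S)
    exp_S = (V * np.exp(w * t)) @ V.T
    return exp_S * s[None, :] / s[:, None]

def expected_identity(Q, pi, t):
    """Expected fraction of identical sites at evolutionary distance t."""
    P = transition_probabilities(Q, pi, t)
    return float(np.dot(pi, np.diag(P)))

# Toy input (assumption): random symmetric exchangeabilities and frequencies.
rng = np.random.default_rng(0)
R = 0.5 + rng.random((20, 20))
R = (R + R.T) / 2.0
pi = rng.dirichlet(np.ones(20))

Q = build_rate_matrix(R, pi)
for t in (0.1, 0.5, 1.0, 2.0):
    print(t, round(expected_identity(Q, pi, t), 4))
```

Inverting the printed relation (identity as a function of t) is what allows a distance to be estimated from an observed identity; at very large t the identity approaches the sum of squared equilibrium frequencies.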
 
Sequence alignments of related proteins show that different protein sites have different propensities to diverge, which suggests incorporating into evolutionary models devices that capture the site heterogeneity of the evolutionary process. A traditional way to do so is to assign different rates of evolution to different sites. This is generally accomplished by rescaling $Q$ by a coefficient $v_k$ specific to each site $k$, so that $Q^{(k)}=v_k Q$, with the coefficients averaging to one across the $N$ sites, $\frac{1}{N}\sum_k v_k = 1$. The value of $v_k$ is most commonly drawn from a discretized gamma distribution $\Gamma(\alpha,\alpha)$, which has mean one and whose shape is optimized by the choice of $\alpha$ [3-6]. A second, commonly used device to fit site-dependence of evolutionary rates is to allow for a fraction $I$ of invariable sites [7-9]. In a model including both invariable and gamma-distributed sites (I+Γ), a rate coefficient $v=0$ is assigned to a fraction $I$ of sites, and gamma-distributed positive rates are assigned to the remaining fraction $(1-I)$ of sites. Evolutionary rates that vary substantially across sites have a significant effect on the relation between evolutionary distance and sequence similarity (Figure 1), because substitutions that would otherwise spread uniformly across all sites tend instead to accumulate at fewer, fast-evolving sites.
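The effect of rate heterogeneity on the distance-similarity relation can be illustrated with a small numerical sketch. Several simplifications are assumed here and are not taken from the article: category rates are approximated by sampling a Γ(α,α) distribution and averaging within equal-probability bins (a Monte Carlo stand-in for the usual quantile-based discretization), the underlying substitution process is an equal-rates 20-state model rather than an empirical matrix, and in the I+Γ case the variable-site rates are divided by (1−I) so that the mean rate over all sites stays at one. All function names are hypothetical.

```python
import numpy as np

def discrete_gamma_rates(alpha, n_cat=4, n_draws=200_000, seed=0):
    """Approximate mean rates of n_cat equal-probability classes of Gamma(alpha, alpha)."""
    rng = np.random.default_rng(seed)
    draws = np.sort(rng.gamma(shape=alpha, scale=1.0 / alpha, size=n_draws))
    rates = np.array([chunk.mean() for chunk in np.array_split(draws, n_cat)])
    return rates / rates.mean()            # enforce an average rate of exactly 1

def equal_rates_identity(d, n_states=20):
    """Expected identity at distance d under an equal-rates model (assumption)."""
    k = float(n_states)
    return 1.0 / k + (k - 1.0) / k * np.exp(-k * d / (k - 1.0))

def expected_identity(d, alpha=None, p_inv=0.0, n_cat=4):
    """Expected identity at distance d with optional gamma rates and invariable sites."""
    rates = np.ones(1) if alpha is None else discrete_gamma_rates(alpha, n_cat)
    rates = rates / (1.0 - p_inv)          # keep the mean rate over all sites at 1
    variable = equal_rates_identity(d * rates).mean()
    return p_inv + (1.0 - p_inv) * variable

for d in (0.5, 1.0, 2.0, 4.0):
    print(d,
          round(expected_identity(d), 3),
          round(expected_identity(d, alpha=0.5), 3),
          round(expected_identity(d, alpha=0.5, p_inv=0.2), 3))
```

At any given distance the heterogeneous models retain more identical sites than the homogeneous one, which is the qualitative behavior the text attributes to substitutions concentrating at fast-evolving sites.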
While site-specific rates affect the speed of evolution, they do not affect the evolutionary pattern of each position. Different evolutionary patterns can instead be fitted to individual sites by deriving site-specific $Q$ matrices. Recalling how $Q$ is constructed, this can be accomplished by making $R$, $\pi$, or both site-specific. The first choice, implemented in the QMM model [10], is computationally quite challenging, requiring the optimization of 189 parameters per site class. A relatively simpler approach is to allow for site-specific stationary frequencies $\pi^{(k)}$. This approach also appears consistent with the observation from multiple sequence alignments that different subsets of amino acid types are typically seen at different sites. Site-specificity of equilibrium frequencies has many interesting repercussions on the features of the evolutionary process, on phylogenetic tree reconstruction, and on the relation between sequence conservation and mutational saturation.
Position Specific Profiles of Amino Acid Usage
Possibly the most successful implementation of the idea of site-specificity of amino acid stationary distributions is the CAT mixture model of Lartillot and collaborators [11-15]. In the CAT (category) model, amino acid equilibrium frequencies $\pi^{(k)}$ were empirically identified using a Bayesian approach [11]. To speed up computation, sets of preassembled profiles of amino acid frequencies are provided in ML and Bayesian phylogenetic reconstruction implementations [16,17]. Profiles $\pi^{(k)}$ specific to each site $k$ are used in combination with a general substitutability matrix $R$ to construct site-specific normalized transition-rate matrices $Q^{(k)}$, with $q_{ij}^{(k)}=C^{(k)}\,r_{ij}\,\pi_j^{(k)}$ ($i \neq j$), and $C^{(k)}$ such that $-\sum_i \pi_i^{(k)} q_{ii}^{(k)}=1$. In comparison to the global vector of stationary frequencies, as implemented, for example, in the LG model [18], profiles of the CAT model tend to favor different subsets of amino acid types with similar physico-chemical properties (Figure 2). As a consequence, while under a generalized $Q$ matrix amino acid substitutions tend to wander over time across all 20 types, within a profile divergence is confined to a few amino acid types, no matter how much evolution occurs, which increases the probability of homoplasy. Furthermore, the reduced effective size of the amino acid alphabet at each site produces higher expected similarity between sequences even at high evolutionary distance. For example, the generalized LG model [18] predicts that over time sequence similarity decays to an asymptotic value of 5.996%. Profiles in the C20 set implemented in the Phylobayes [16] and PhyML [17] methods predict instead that similarity decays, on average, to 18.37%, with a range for individual profiles from 7.54% to 33.56%. Thus, the CAT model indicates that generalized models underestimate the evolutionary distances of sequences of low similarity (Figure 1), providing an explanation for the phenomenon of long branch attraction [13].
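The link between a profile and its asymptotic similarity can be made concrete. At very large distances two sequences are effectively independent draws from the equilibrium distribution, so the probability that a site is identical tends to the sum of squared equilibrium frequencies. The sketch below applies this to two toy profiles; both are illustrative assumptions, not the LG frequencies or any profile from the C20 set, and the function names are hypothetical.

```python
import numpy as np

def asymptotic_identity(pi):
    """Limit of the expected identity at a site as the distance grows: sum_i pi_i^2."""
    pi = np.asarray(pi, dtype=float)
    pi = pi / pi.sum()                     # tolerate unnormalized input
    return float(np.sum(pi ** 2))

def effective_alphabet_size(pi):
    """Number of equally frequent types that would give the same asymptotic identity."""
    return 1.0 / asymptotic_identity(pi)

# Toy profiles (assumptions): a flat distribution over 20 types, and a
# stringent profile that allows only three types.
flat_profile = np.full(20, 1.0 / 20)
stringent_profile = np.zeros(20)
stringent_profile[:3] = [0.5, 0.3, 0.2]

for name, profile in (("flat", flat_profile), ("stringent", stringent_profile)):
    print(name,
          round(asymptotic_identity(profile), 4),
          round(effective_alphabet_size(profile), 2))
```

A profile concentrated on a few types has a small effective alphabet and therefore a high similarity floor, which is why saturated sequences evolving under such profiles look more similar than a single global frequency vector would predict.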
Profiles, Fuzzy Proteins, and Neutral Constrained Amino Acid Replacements
Position-specific equilibrium frequency profiles are justified by the idea that the functionality and structural stability of a protein require certain residue types at certain positions, with different degrees of stringency, depending on functional constraints. For example, a position in an active site may correspond to a profile with a single amino acid type, whereas different hydrophilic amino acid types may be allowed to substitute for one another in loops exposed at the protein surface. This suggests an interpretation of profiles based on a model of neutral constrained evolution [19]. According to this interpretation, the profile associated with a particular position defines a subset of amino acid types that can be substituted at that position without affecting the fitness of the protein (i.e., its functionality). This model asserts that a protein, as a functional unit, can be described as a possibly large set of alternative sequences, each functionally equivalent to the others. Thus, from a functional perspective, a protein would be described not by a sequence of amino acids but by a sequence of amino acid subsets, whose sizes describe different degrees of “fuzziness” of different positions. A “fuzzy protein” can evolve within the limits imposed by the sequence of amino acid subsets that describe it, with no effect on functionality. With this interpretation, position-specific profiles can explain not only the evolutionary pattern, but also the speed of evolution. The reasoning is that random substitutions will be retained only if they result in amino acid types allowed by the profile. Thus, if the profile is stringent, most substitutions will be rejected, slowing the evolutionary process; if the profile is permissive, most substitutions will be accepted, resulting in fast evolution.
To model the effect of purifying selection on evolutionary rates, we first consider a general, normalized substitution-rate matrix, whose coefficients are derived from nucleotide and codon substitution matrices. At each position $k$, the substitution matrix is filtered by a position-specific “occupancy vector” that defines the subset of amino acid types allowed at that position, so that equilibrium frequencies and transition rates towards amino acid types not represented in the occupancy vector are set to zero. The result is a reduced $Q^{(k)}$ matrix, with a slower average transition rate, $-\sum_i \pi_i^{(k)} q_{ii}^{(k)}$. All matrices are finally renormalized to an average of one substitution per site. With this model, selection against disallowed substitutions generates site-specific profiles of amino acid equilibrium frequencies, and a distribution across sites of different site-specific evolutionary rates, approximately proportional to the size of the profiles (Figure 3). The profiles that define a fuzzy protein are not likely to correspond to those identified by the CAT model, which combine substitutions within a profile with substitutions between profiles, taking into account events of profile evolution. An interesting question is how each process contributes to protein evolution.
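The filtering step can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the authors' implementation: the general matrix is an equal-rates 20-state matrix rather than one derived from nucleotide and codon substitution matrices, occupancy vectors are simply the first m amino acid types, and the function names are hypothetical. For a reversible general matrix, restricting the process to the allowed types leaves the original frequencies, renormalized over those types, as the equilibrium distribution.

```python
import numpy as np

N_STATES = 20

def equal_rates_matrix(n=N_STATES):
    """Toy general matrix (assumption): equal rates, normalized to a mean rate of 1."""
    Q = np.full((n, n), 1.0 / (n - 1))
    np.fill_diagonal(Q, -1.0)
    return Q, np.full(n, 1.0 / n)

def filter_by_occupancy(Q, pi, occupancy):
    """Zero out rates and frequencies of disallowed types; return Q_k, pi_k and the mean rate."""
    occ = np.asarray(occupancy, dtype=bool)
    Qk = np.where(occ[:, None] & occ[None, :], Q, 0.0)   # keep allowed-to-allowed rates only
    np.fill_diagonal(Qk, 0.0)
    np.fill_diagonal(Qk, -Qk.sum(axis=1))                # rebuild the diagonal
    pik = np.where(occ, pi, 0.0)
    pik = pik / pik.sum()                                # profile of equilibrium frequencies
    rate = -np.dot(pik, np.diag(Qk))                     # average rate before renormalization
    return Qk, pik, float(rate)

Q, pi = equal_rates_matrix()
profile_sizes = [1, 2, 5, 10, 20]
site_rates = []
for m in profile_sizes:
    occupancy = np.arange(N_STATES) < m                  # allow the first m types
    _, _, rate = filter_by_occupancy(Q, pi, occupancy)
    site_rates.append(rate)

site_rates = np.array(site_rates)
relative_rates = site_rates / site_rates.mean()          # average of one substitution per site
for m, r, rel in zip(profile_sizes, site_rates, relative_rates):
    print(m, round(r, 4), round(rel, 4))
```

With this toy matrix a site that allows a single type never changes, a site that allows all 20 types evolves at the unfiltered rate, and intermediate profiles fall in between, in line with rates growing with profile size.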
Acknowledgements
This work is supported by NIH Grant 5R01GM87485-2.
References


















