Phylogenetic Analyses of the Loops in Elongation Factors EF1A: Stronger Support for the Grouping of Animals and Fungi

Discerning how the major groups of organisms are related to each other and tracing their evolution from a common ancestor remains controversial and unsolved. In recent years, much new information based on a large number of gene and protein sequences has become available. Phylogenetic analysis can be carried out on either nucleic acid or protein sequences. However, it has become evident that both approaches have many serious limitations and pitfalls. Our findings follow from an analysis of loops in the elongation factors EF1A using a novel informative characteristic, which we call the “loops” method. The method is based on the ability of amino acid sequences to form loops in the protein structure. The specificity of this criterion for grouping organisms is evident from the analysis of loops in EF1A from the three kingdoms of life. Each kingdom displays variations in the number of loops and their locations within the three EF1A domains, which can be considered an imprint of molecular evolution. We found stronger support for animals and fungi being sibling kingdoms.


Introduction
The phylogenetic relationship among the kingdoms Animalia, Plantae, and Fungi remains uncertain despite extensive attempts to resolve it. The proposed phylogenetic relationships differ from one another depending on the type of molecule and the method used. There are three competing hypotheses. The first states that Animalia is more closely related to Plantae (Gouy and Li, 1989; Philip et al., 2005; Yang et al., 2005). The second supports a Plantae and Fungi grouping (Loytynoja and Milinkovitch, 2001), and the third an Animalia and Fungi grouping (Wainright et al., 1993; Baldauf and Palmer, 1993; Nikoh et al., 1994). Moreover, it was proposed that the eukaryotic supergroup including Animalia and Fungi should be expanded to include a collection of primitive unicellular eukaryotes (Protists), since the elongation factors EF1A of this supergroup share an indel of ~12 amino acids (Cherkasov et al., 2006). This united group of organisms was designated Opisthokonta (Steenkamp et al., 2006). These data show that, despite the potential power of sequence-based phylogenies of both proteins and rRNAs, we need to focus on suitable evidence from independent sources to elucidate evolutionary relationships among eukaryotes.
Here we show that the ability of amino acid sequences to form internal flexible regions, which appear as loops in their 3D structures, can serve as a new phylogenetic criterion. To demonstrate the specificity of the new criterion, we selected the elongation factors EF1A, for several reasons. First, they are present in all cells and are involved in protein biosynthesis as catalysts of codon-dependent binding of aminoacyl-tRNA to the ribosome (Abel and Jurnak, 1996). Second, eukaryotic factors, in addition to their canonical function (binding tRNA), form complexes with various ligands ranging from actin to viral RNA (Budkevich et al., 2002; Serdyuk, 2006). Third, elongation factors were the first proteins used for protein phylogenetic analyses (Iwabe et al., 1989; Lechner and Böck, 1987). Lastly, the 3D structures of three prokaryotic, one archaeal, and one eukaryotic factor are known from X-ray crystallography.
Analysis of loops in elongation factors helps to resolve the molecular phylogeny of the kingdoms Animalia, Plantae, and Fungi. According to this analysis, close evolutionary relatedness exists between the kingdoms Animalia and Fungi, and the Protists group can be considered their precursor.

Structure of Elongation Factors EF1A
An elongation factor consists of three domains connected by two interdomain peptides (Kjeldgard and Nyborg, 1992; Kjeldgard et al., 1993). Domain I (the nucleotide-binding domain) is linked to domain II by a rather long peptide (about 16 Å in E. coli). The length of this peptide depends on the source organism, and the peptide is enriched in proline, which gives it a certain rigidity. Domains III and I interact non-covalently. The three domains enclose a small inner cavity whose size depends on the bound ligand. When GDP is replaced by GTP, the domains rearrange: two parts of the molecule, one containing domain I and the other containing the rest, move towards each other until they meet, and a site enriched in polar amino acids emerges onto the surface. Aminoacyl-tRNA binds to this site (Kjeldgard and Nyborg, 1992; Kjeldgard et al., 1993).

Amino Acid Sequences of Elongation Factors EF1A
The amino acid sequences of all elongation factors considered here were extracted from the SWISS-PROT protein sequence database (http://us.expasy.org/sprot/). We list below the complete names of the organisms whose elongation factors were analyzed and the SWISS-PROT accession numbers of the sequences.

Program FoldUnfold
To predict loops we use FoldUnfold, a dedicated program (Galzitskaya et al., 2006a; Galzitskaya et al., 2006b). We call the predicted unstructured regions 'loops', following the terminology used for flexible regions in 3D protein structures. Such loops reflect internal flexibility in proteins. This program has been shown to predict with good accuracy (80%), directly from the amino acid sequence, whether residues are protected from hydrogen exchange (Dovidchenko and Galzitskaya, 2008). It also gives the best results for predicting loop regions in the G protein family compared with other programs that search for internal flexibility (Deryusheva et al., 2008). Moreover, the reliability of FoldUnfold predictions of loops in elongation factors was proved by direct comparison with X-ray data for three prokaryotic EFs (Serdyuk and […] found within domain I of thermophilic elongation factors (T. aquaticus, T. thermophilus); these distinguish thermophilic bacteria from others. The interconnecting peptide between domains I and II was detected using a five-residue window; it was not always detected with the standard 11-residue window. In addition, the linker predicted for thermophilic elongation factors does not always coincide with the experimentally determined region. The linker connecting domains II and III is detected at the standard window size of 11 residues. The role of window width in loop prediction is analyzed in our recent publication (Deryusheva et al., 2008).
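The loop-prediction step can be sketched as a sliding-window profile. As we understand the published method, FoldUnfold averages a per-residue scale (the expected number of close contacts for each residue type) over a window and reports stretches whose average falls below a boundary value as unstructured. The sketch below is a minimal illustration under stated assumptions, not the FoldUnfold implementation: the function names, the two-valued toy scale `TOY_SCALE`, and the threshold used in the example are all hypothetical. Only the 5- and 11-residue window widths come from the text.

```python
from typing import Dict, List, Tuple

# Hypothetical toy scale, NOT the FoldUnfold contact scale: a crude split in
# which residues often found in flexible regions score low and all others high.
FLEXIBLE = set("GPSDEKNQR")
TOY_SCALE: Dict[str, float] = {
    aa: (18.0 if aa in FLEXIBLE else 24.0) for aa in "ACDEFGHIKLMNPQRSTVWY"
}


def window_profile(sequence: str, scale: Dict[str, float], window: int) -> List[float]:
    """Average `scale` over a centred window; the window is truncated at the termini."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    values = [scale[aa] for aa in sequence]
    profile = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        profile.append(sum(values[lo:hi]) / (hi - lo))
    return profile


def predict_loops(
    sequence: str, scale: Dict[str, float], threshold: float, window: int = 11
) -> List[Tuple[int, int]]:
    """Return 1-based, inclusive (start, end) runs whose averaged score is below `threshold`."""
    profile = window_profile(sequence, scale, window)
    loops: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(profile):
        if value < threshold and start is None:
            start = i  # a low-score run begins at 0-based index i
        elif value >= threshold and start is not None:
            loops.append((start + 1, i))  # run covered 0-based start..i-1
            start = None
    if start is not None:  # run extends to the C-terminus
        loops.append((start + 1, len(profile)))
    return loops


if __name__ == "__main__":
    seq = "L" * 10 + "G" * 3 + "L" * 10
    print(predict_loops(seq, TOY_SCALE, threshold=21.0, window=5))
    print(predict_loops(seq, TOY_SCALE, threshold=21.0, window=11))
```

With the toy scale, the three-residue flexible stretch in this test sequence is reported as loop (11, 13) with a 5-residue window but disappears with an 11-residue window, a miniature version of why a short interdomain peptide can be missed at the standard window width.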

Predictions of Loop Regions in EF1A for Three Kingdoms of Life: Positions and Motifs
As demonstrated in Eukaryotes and Archaebacteria, apart from inter-domain interconnecting peptides and the effec- […] 166, and 182-195. The position of Loop D is close to a.a. 290-310 within the second domain, and that of Loop E is about a.a. 364-380 within the third domain. Loop F, if identified at all, is always found at the C-terminus of the polypeptide chain. Since the full chain length varies considerably from factor to factor (e.g., 394 residues for E. coli and 463 residues for C. elegans), the loop positions vary accordingly, as tabulated for each subkingdom.
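Because loop positions shift with chain length, a predicted loop has to be matched to a named loop approximately. One simple way to do this, shown here as a hypothetical sketch rather than the procedure behind the tabulated positions, is to assign each predicted interval to the reference loop it overlaps most. Only the approximate Loop D (a.a. 290-310) and Loop E (a.a. 364-380) positions are taken from the text; reference positions for Loops A-C are not reproduced, and `label_loop` and the largest-overlap rule are assumptions.

```python
from typing import Dict, Optional, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive residue range

# Approximate positions quoted in the text; real positions shift with chain length.
REFERENCE_LOOPS: Dict[str, Interval] = {"D": (290, 310), "E": (364, 380)}


def overlap(a: Interval, b: Interval) -> int:
    """Number of residues shared by two inclusive intervals (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def label_loop(
    predicted: Interval, reference: Dict[str, Interval] = REFERENCE_LOOPS
) -> Optional[str]:
    """Name of the reference loop with the largest overlap, or None if none overlap."""
    best_label: Optional[str] = None
    best_overlap = 0
    for label, ref in reference.items():
        shared = overlap(predicted, ref)
        if shared > best_overlap:
            best_label, best_overlap = label, shared
    return best_label


if __name__ == "__main__":
    for interval in [(296, 305), (360, 370), (100, 120)]:
        print(interval, label_loop(interval))
```

A predicted loop that touches no reference interval is left unlabeled rather than forced onto the nearest loop, so a genuinely new loop is not silently renamed.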
As seen from Fig. 1, the number of extra loops (not counting the effector loop) in prokaryotic factors is minimal: zero for Proteobacteria, one for thermophilic bacteria (Loop C), and two for Cyanobacteria (Loops C and E). Loop C of Cyanobacteria is always found at the position of the thermophilic loop.
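The loop counts above can be treated as presence/absence characters. The sketch below, a hypothetical illustration rather than the authors' procedure, encodes the extra loops of each bacterial group listed in the text as a 0/1 vector over Loops A-F and compares groups by Hamming distance (the number of loops present in one factor but not in the other). The encoding, the names `encode` and `hamming`, and the choice of Hamming distance are assumptions; the loop sets themselves are those stated for Fig. 1, with the effector loop excluded as in the text.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

LOOP_NAMES = "ABCDEF"

# Extra loops per group as stated in the text (effector loop not counted).
BACTERIAL_EXTRA_LOOPS: Dict[str, Set[str]] = {
    "Proteobacteria": set(),
    "thermophilic bacteria": {"C"},
    "Cyanobacteria": {"C", "E"},
}


def encode(loops: Set[str]) -> Tuple[int, ...]:
    """0/1 presence vector over Loops A-F."""
    unknown = set(loops) - set(LOOP_NAMES)
    if unknown:
        raise ValueError(f"unknown loop names: {sorted(unknown)}")
    return tuple(int(name in loops) for name in LOOP_NAMES)


def hamming(u: Tuple[int, ...], v: Tuple[int, ...]) -> int:
    """Number of loop characters in which two vectors differ."""
    return sum(a != b for a, b in zip(u, v))


if __name__ == "__main__":
    vectors = {name: encode(loops) for name, loops in BACTERIAL_EXTRA_LOOPS.items()}
    for x, y in combinations(vectors, 2):
        print(f"{x} vs {y}: {hamming(vectors[x], vectors[y])}")
```

For the three bacterial groups this prints distances of 1 (Proteobacteria vs thermophilic bacteria), 2 (Proteobacteria vs Cyanobacteria), and 1 (thermophilic bacteria vs Cyanobacteria). Presence alone ignores the loop sequence, which the text treats as an equally important part of the criterion.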
For Eukarya, the number of predicted extra loops ranges from two (Protists) to four (Animalia and Fungi). All eukaryotic factors except those of Protists have a fully disordered region of about 20 amino acids (Loop F) at their C-termini (Table 1). The length of Loop F is independent of the factor origin (Serdyuk and Galzitskaya 2007). Interestingly, in a complex of yeast EF1A with a fragment of the nucleotide-exchange subunit eEF1B, the factor terminus (residues 442-452) is also partially unfolded (Andersen et al., 2000).
Extra loops within each domain, a large disordered region at the C-terminus, and high inter-domain mobility (Kjeldgard and Nyborg 1992; Kjeldgard et al., 1993) may explain why none of the isolated eukaryotic factors has been crystallized so far. Its successful crystalliza- […]
Factors from the kingdom Archaebacteria appear to be specialized protein structures in which the distribution of loops (usually three) over the domains depends strongly on the source organism. For example, an exclusive feature of sulfur bacteria is the presence of Loop A (Fig. 1). Archaebacterial factors also contain two more predicted loops (D and E), which, in the case of S. solfataricus, agree well with X-ray data (Vitagliano et al., 2001). The variability of extra loops with specific sequences in eukaryotic elongation factors is very likely related to their multifunctional activities. It is well known that in addition to their main function, i.e., interaction with aminoacyl-tRNA, eu- […] (Budkevich et al., 2002). The diverse nature of these ligands obviously requires different functional sites, whose roles may be played by the loops reported in the current study. The discovered correlation between the length of a polypeptide chain and the proportion of lysine in elongation factors illustrates the considerable role played by this amino acid in RNA-protein interactions (Lechner and Böck, 1987). In particular, the C-terminal part of elongation factors (Loop F) is responsible for interactions with actin (Gross and Kinzy, 2005).

Evolutionary Relationships Among the Main Groups of Modern Organisms (Animalia, Plantae, and Fungi)
As seen from Table 2, Protists contain Loop A and Loop E, which are typical features of Animalia and Fungi. Hence, Opisthokonta can be regarded as an important taxon that includes both multicellular organisms and their extant unicellular relatives. However, the situation is not so straightforward: the long Loop F, typical of all other eukaryotic elongation factors, is absent in Protists. This allows us to consider the Protists group as a precursor of Animalia and Fungi, which agrees with some contemporary theories of early opisthokont evolution. Most of these theories suggest that colonial naked choanoflagellate-like protists gave rise to the first animals, while chitinous thecate choanoflagellate-like protists gave rise to the first fungi (Cavalier-Smith, 1987; Buck, 1990). Loop F may have been acquired in the course of evolution.
For our analysis, not only the existence of loops is important but also the specificity of the sequence within each loop. As seen from Table 2, the sequence motif of Loop A is highly conserved in all Protists, Animalia, and Fungi (GEFEAGISKN(D)GQTRE), although the chain length varies considerably from factor to factor (e.g., 385 residues for Ministeria vibrans and 463 residues for Xenopus tropicalis). At the same time, the Loop E sequence motif falls into one of two exclusive clades: one with primitive unicellular animals (Amoebidium, Corallochytrium, Chytriomyces, and Monosiga ovata) and one with primitive unicellular fungi (Nuclearia simplex). Ministeria is much closer to the primitive animals. Since the sequence motif of Loop E in the Amoebidium, Corallochytrium, Chytriomyces, and Monosiga ovata factors is very close to that of multicellular Animalia such as Bos taurus or Xenopus tropicalis, we consider these Protists precursors of the Animalia lineage. The sequence motif of Loop E in Nuclearia simplex is close to that of some multicellular Fungi such as Puccinia graminis or Podospora anserina, which suggests that Nuclearia simplex may be a precursor of the Fungi lineage. However, this suggestion is not unequivocal: other fungi such as Saccharomyces cerevisiae and Rhizomucor racemosus may also have choanoflagellate ancestors, which once again demonstrates the similarity between Fungi and Animalia. Surprisingly, indels provide no significant information on this matter.
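Checking whether a factor carries the conserved Loop A motif can be sketched with a regular expression. We read the notation GEFEAGISKN(D)GQTRE as allowing either N or D at the position just before the parentheses; that reading, the helper names `motif_to_regex` and `find_motif`, and the test sequences are assumptions for illustration, not sequences from Table 2.

```python
import re
from typing import List

LOOP_A_MOTIF = "GEFEAGISKN(D)GQTRE"  # motif as written in the text


def motif_to_regex(motif: str) -> str:
    """Turn 'KN(D)G'-style notation into the regex 'K[ND]G'.

    Assumes a parenthesised group lists alternatives for the residue
    immediately before it.
    """
    parts: List[str] = []
    i = 0
    while i < len(motif):
        if motif[i] == "(":
            close = motif.index(")", i)  # ValueError if the group is unclosed
            if not parts:
                raise ValueError("alternative group has no preceding residue")
            parts[-1] = "[" + parts[-1] + motif[i + 1:close] + "]"
            i = close + 1
        else:
            parts.append(motif[i])
            i += 1
    return "".join(parts)


def find_motif(sequence: str, motif: str = LOOP_A_MOTIF) -> List[int]:
    """1-based start positions of non-overlapping motif matches in `sequence`."""
    pattern = re.compile(motif_to_regex(motif))
    return [m.start() + 1 for m in pattern.finditer(sequence)]


if __name__ == "__main__":
    print(motif_to_regex(LOOP_A_MOTIF))
    print(find_motif("MKGEFEAGISKDGQTREL"))
```

An exact match like this only tests for the conserved Loop A motif; grouping by the more variable Loop E motifs would need a similarity score rather than a yes/no pattern.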

Discussion
The characteristic features of EF1A were identified for each of the three kingdoms of life using several dozen typical representatives. These features include variations in the number of loops and their locations within the EF domains. It is of fundamental importance that our analysis takes into account not only the presence of a particular loop but also the specificity of its amino acid sequence. Note also that the total number of amino acid residues in EF1A reflects the division of the living world into three kingdoms: the lengths of EF1A fall into the range of 393-406 residues in Eubacteria, 422-444 residues in Archaea, and 458-464 residues in Eukarya.
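The length ranges above can be written as a simple lookup. This is a hypothetical sketch (the name `domain_by_length` is ours) that uses only the ranges quoted in the text. It also makes a caveat visible: the Ministeria vibrans factor, at 385 residues, falls outside all three ranges, so the ranges describe the representatives examined and should not be used as a standalone classifier.

```python
from typing import Dict, Optional, Tuple

# EF1A length ranges (residues) reported in the text for the representatives examined.
LENGTH_RANGES: Dict[str, Tuple[int, int]] = {
    "Eubacteria": (393, 406),
    "Archaea": (422, 444),
    "Eukarya": (458, 464),
}


def domain_by_length(length: int) -> Optional[str]:
    """Return the kingdom whose reported range contains `length`, or None."""
    for kingdom, (low, high) in LENGTH_RANGES.items():
        if low <= length <= high:
            return kingdom
    return None


if __name__ == "__main__":
    # E. coli (394), C. elegans (463), and Ministeria vibrans (385), as cited in the text.
    for length in (394, 463, 385):
        print(length, domain_by_length(length))
```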
The specificity of the new criterion for grouping organisms is evident from the data shown in Fig. 1 and Table 1. First, the disordered region F of about 20 residues at the C-terminus is characteristic only of eukaryotic EF1A; it is absent in the kingdoms Eubacteria and Archaea and also in Protists. Second, the exclusive characteristic feature of the Crenarchaeota (sulfur bacteria) among Archaea is the presence of Loop A, which is undetectable in the Euryarchaeota. Third, the exclusive characteristic of thermophilic bacteria is the thermophilic insert (Loop C) of about 10 residues, which distinguishes them from all other bacteria.
These data lead us to believe that a criterion based on the prediction of flexible loops (up to six) should have higher resolution than one based on indels, whose number is the same in EF1A from different sources. We suppose that the new criterion might complement protein phylogenies based on alignments of particular sequences.

Possible Role of Loops in Evolution of Elongation Factors EF1A
Since the total number of loops predicted in elongation factors increases with the complexity of organisms, we propose the following role for these loops in evolution: holding to the principle of "thrifty inventiveness", Nature operates with various universal inserts (loops), adapting their number and location among the factor domains, as well as their amino acid compositions, so that the protein can perform special functions: one in lower and several in higher organisms. This principle resembles the well-known principle of a finite number of folding motifs in globular proteins despite the tremendous number of such proteins in nature.
We suppose that the introduction of a new structural criterion for phylogenetic analysis is interesting in itself because, despite the apparent abundance of molecular characteristics, megasystematics and macrophylogeny lack an informative criterion that could help elucidate the evolutionary relationships among the main groups of modern organisms (Animalia, Plantae, and Fungi).