Ancient Origin of Chaperonin Gene Paralogs Involved in Ciliopathies

The Bardet-Biedl Syndrome (BBS) is a human developmental disorder that has been associated with fourteen BBS genes affecting the development of cilia. Three BBS genes are distant relatives of chaperonin proteins, a family of chaperones well known for the protein-folding role of their double-ringed complexes. Chaperonin-like BBS genes were originally thought to be vertebrate-specific, but related genes from different metazoan species have since been identified as chaperonin-like BBS genes based on sequence similarity. Our phylogenetic analyses confirmed the classification of these genes in the chaperonin-like BBS gene family, and set the origin of the gene family earlier than the time of separation of Bilateria, Cnidaria, and Placozoa. By extensive searches for chaperonin-like genes in complete genomes representing several eukaryotic lineages, we also discovered chaperonin-like BBS genes in the genomes of Phytophthora and Pythium, which belong to the Oomycetes. This finding suggests that the chaperonin-like BBS gene family had already evolved before the origin of Metazoa, as early in eukaryote evolution as before the separation of the lineages of Unikonts and Chromalveolates. The analysis of coding sequences indicated that chaperonin-like BBS proteins have evolved in all lineages under constraining selection. Furthermore, analysis of the predicted structural features suggested that, despite their high rate of divergence, chaperonin-like BBS proteins mostly conserve a typical chaperonin-like three-dimensional structure, but called into question their ability to assemble and function as chaperonin-like double-ringed complexes.
*Corresponding author: Luciano Brocchieri, Department of Molecular Genetics and Microbiology and Genetics Institute, University of Florida, Gainesville, FL 32610-3610, USA, Tel: 352-273-8131; Email: lucianob@ufl.edu Received January 31, 2013; Accepted April 15, 2013; Published April 23, 2013 Citation: Mukherjee K, Brocchieri L (2013) Ancient Origin of Chaperonin Gene Paralogs Involved in Ciliopathies. J Phylogen Evolution Biol 1: 107. doi:10.4172/2329-9002.1000107 Copyright: © 2013 Mukherjee K, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Figure 1: Taxonomic groups surveyed, with the number of genomes analyzed in parentheses: Chlorophyta (1), Prasinophyta (1), Streptophyta (4), Rhodophyta (1), Oomycetes (5), Bacillariophyta (1), Ciliata (1), Apicomplexa (1), Kinetoplastida (1), Heterolobosea (1), Fornicata (1), Parabasilia (1), Mycetozoa (1), Archamoebae (1), Fungi (20), Choanoflagellata (1), Placozoa (1), Cnidaria (1), Mollusca (1), Annelida (2), Nematoda (2), Arthropoda (4), Echinodermata (1), Cephalochordata (1), Chordata.


Introduction
The Bardet-Biedl Syndrome (BBS) is a human developmental disorder affecting a variety of tissues, which has been linked to a group of fourteen genes (named BBS1 to BBS14) essential for the correct development of cilia [1][2][3]. The recent identification of BBS genes has stimulated new interest in the genetic control of the development and functionality of the eukaryotic cilium. The products of seven of these genes, namely BBS1, BBS2, BBS4, BBS5, BBS7, BBS8 and BBS9, assemble in a newly-discovered protein complex called BBSome, which localizes to the basal body and to the axoneme of cilia [3]. Three other BBS proteins, BBS6 (also known as MKKS), BBS10 and BBS12, are related to class 2 eukaryotic chaperonin proteins, including T-complex protein 1 (TCP1), also named chaperonin containing TCP1 subunit 1 (CCT1), and CCT2 to CCT8, best known for assembling in a double hetero-8-meric ringed complex (TRiC/CCT) essential for folding nascent actin, tubulin, and other proteins (see, e.g., [4][5][6] for reviews). The three chaperonin-like BBS proteins (CL-BBS) are mostly found associated with the centrosome and basal body of the cilium [7,8], where they associate with selected CCT chaperonin monomers and with BBS7 to form the "BBS/CCT complex", required for BBSome assembly [8,9].
Chaperonin-like BBS (CL-BBS) genes originated from a duplication of a progenitor of the CCT8 gene [10]. While they were originally described as vertebrate-specific [8,11-13], sequences with highest similarity to CL-BBS genes have also been reported in non-vertebrate Metazoa, including the Urochordate Ciona intestinalis [7], Lophotrochozoa, Cnidaria and Placozoa [14], suggesting that the CL-BBS gene family originated early in Metazoan evolution. In this work we used phylogenetic analyses to support the phenetic classification of these chaperonin-like genes in the CL-BBS gene family, confirming the orthology of vertebrate and non-vertebrate genes. Furthermore, we performed extensive searches and phylogenetic analyses of chaperonin-like genes found in completely sequenced genomes of several species belonging to a variety of anciently diverged eukaryotic lineages, and newly identified CL-BBS genes in the genomes of water molds (Oomycetes). All alternative evolutionary scenarios interpreting our findings and phylogenetic reconstructions imply that the CL-BBS gene family originated and twice duplicated at earlier stages of eukaryote evolution than previously thought.

Chaperonin-like sequences in eukaryotic genomes
Five to seven distinct major clades of eukaryotic organisms are recognized in recent phylogenetic analyses [15], including Opisthokonts and Amoebozoa (clustered by some authors into the group of Unikonts), Trichozoa and Discicristates (sometimes clustered as Excavata), Rhizaria, Chromalveolates, and Plantae (Figure 1). With the exception of the clade of Rhizaria, multiple complete genome sequences from species representative of each clade have become available. We analyzed complete genome sequences from thirty-seven non-vertebrate species representative of the major eukaryotic clades, and subsequently augmented our set with two additional Oomycete genomes and 18 additional fungal genomes (Table 1, Supplementary table S1, and Figure 1). To identify as many chaperonin-like sequences as possible, we first searched the genomes with human and other CL-BBS protein sequences as queries (see Material and Methods). Applying reciprocal BLAST analysis [16], we identified twenty-five sequences that were reciprocal nearest-neighbors of CL-BBS sequences: fourteen using BBS6 queries, six using BBS10 queries, and five using BBS12 queries (Figure 1 and Supplementary tables S2-S4). Twenty sequences were found in Metazoan species, including Branchiostoma, Ciona and sea urchin (deuterostomes), annelid worms and gastropods (Lophotrochozoa), and the anciently diverged metazoan groups of Cnidaria and Placozoa, confirming previous reports of sequences similar to CL-BBS proteins encoded in non-vertebrate animal genomes [7,14]. However, we also identified five closely related chaperonin-like sequences in genomes of the water mold genera Phytophthora and Pythium, belonging to the Oomycetes (Chromalveolates, Heterokonts), a group separate from Metazoa (Unikonts, Opisthokonts) (Figure 1 and Table 1). No reciprocal nearest-neighbors of chaperonin-like BBS sequences were identified in genomes of insects or of nematodes (Ecdysozoa), nor in genomes of non-metazoan Opisthokonts (including 20 Fungi genomes), Amoebozoa, Plants, Alveolates, Bacillariophyta, or Kinetoplastida (Figure 1).
Table 1 footnotes: * See Supplementary Table S1 for an expanded set of 20 genomes from Fungi included based on results. ** Newly sequenced genomes included based on maximum-likelihood and Bayesian tree results.

Evolutionary analysis
The evolutionary relations of the newly identified sequences with CL-BBS proteins were reconstructed in phylogenetic trees obtained with Maximum-likelihood (ML), Bayesian, or distance methods, based on the multiple protein alignment of the 26 newly-identified sequences with human CCT proteins, vertebrate CL-BBS proteins, and one representative of archaeal chaperonin class 2 proteins as out-group (see Methods for details, and Supplementary Material for the alignment). All methods resulted in tree topologies (Figure 2 and Supplementary figures S1 and S2) implying that all chaperonin-like genes identified by our searches originated monophyletically within the CL-BBS gene family by duplication of the CCT8 chaperonin gene. They also identified the newly found sequences from Oomycetes as belonging to the BBS6 subfamily. The substantial concordance between the ML and Bayesian trees emphasized the robustness of the results to the choice of substitution model and to possible long-branch attraction effects [17,18] (see Methods). We also tested the robustness of the clusters over a wide range of shapes of gamma-distributed position-specific substitution rates, used to estimate pairwise evolutionary distances for neighbor-joining distance-based tree reconstructions. Association of the twenty-six newly-identified sequences within one of the three CL-BBS clusters was robustly reproduced over values of the parameter a of the gamma distribution in the interval 1.0 to 3.0, with a value a=2.212 estimated by the ML procedure (Supplementary figure S2). All relevant clusters, including the sequences identified in Oomycetes, were supported by very high values of aLRT (approximate likelihood-ratio test, for the ML tree), posterior probability (for the Bayesian tree), or bootstrap (for the neighbor-joining trees).
An exception was the order in which the three BBS6, BBS10 and BBS12 families were clustered, with the sister group (BBS10, BBS12) identified by the ML tree, and the sister group (BBS6, BBS12) resulting from Bayesian and neighbor-joining trees. The branching position of the Oomycete sequences outside of the cluster of metazoan BBS6 sequences reflected the phylogenetic relations of the respective species, thus suggesting that the Oomycete and Metazoan genes originated from a common ancestor predating the radiation of Metazoan groups, including the anciently diverged lineages of Cnidaria and Placozoa. Thus, the topology of the tree excluded the possibility that the sequences from Oomycetes were laterally transferred from any of the Metazoan lineages represented in the trees. Furthermore, the association of the Oomycete sequences with the BBS6 subfamily indicated that when the BBS6 common ancestor separated into the Oomycete and Metazoan lineages, the gene family had already duplicated into three paralogous subfamilies. Finally, the tree topologies confirmed that the CL-BBS gene family originated from a duplication of a CCT8 gene precursor before the separation of Unikonts and Chromalveolates.
Since the CL-BBS genes evolved at a much higher rate than the canonical CCT sequences, as indicated by the respective branch lengths, their clustering in the phylogenetic trees could be an artifact of long-branch attraction [19]. Although some of the models used in this analysis (the CAT-GTR model in the Bayesian analysis) are expected to reduce or eliminate this effect [17,18], long-branch attraction remains a potential alternative explanation for the clustering of Oomycete and animal CL-BBS sequences. The long-branch attraction hypothesis, however, is contradicted by the reciprocal closest similarity of the fast-evolving vertebrate and Oomycete CL-BBS sequences, which also cluster together in a phenogram (Supplementary figure S3). That non-vertebrate metazoan and Oomycete chaperonin-like sequences are most similar to vertebrate CL-BBS sequences, despite the high divergence rate of these sequences, provides further support for their monophyletic origin.

Signature BBS6 sequence in Oomycetes
To further support the classification of Oomycete sequences we looked in the alignment of CL-BBS and CCT protein sequences for signature motifs unique to BBS6 proteins. We identified BBS6 sequence signatures within two regions conserved across CL-BBS and CCT proteins (Supplementary figure S4). One signature sequence, QK[IV][IV]x(16)[DE]R[LIVA], where x(16) denotes sixteen unspecified residues, was found within a conserved region of the predicted chaperonin structural Apical Domain, corresponding to the C-terminal ends of two adjacent parallel beta-strands [12,13]. Two other signature positions, not structurally connected, corresponded, respectively, to a Leu and a His amino acid residue uniquely conserved in BBS6 sequences, within a conserved region including parts of the chaperonin C-terminal Intermediate and Equatorial structural domains (Supplementary figure S4).
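As an illustration of how a signature of this kind can be scanned for in unaligned protein sequences, the following minimal Python sketch translates the apical-domain signature into a regular expression. It is not the procedure used in this work, where signatures were read from the multiple alignment; it assumes that the sixteen-residue spacer has a fixed length in every sequence, and the toy sequence is hypothetical rather than a real BBS6 protein.

```python
import re

# Signature from the text written as a regular expression:
# Q-K-[IV]-[IV]-(any 16 residues)-[DE]-R-[LIVA].
# Assumption: the spacer is exactly 16 residues in the unaligned sequence.
SIGNATURE = r"QK[IV][IV].{16}[DE]R[LIVA]"

def find_signature(protein_seq):
    """Return (0-based start, matched text) for every signature occurrence."""
    seq = protein_seq.upper()
    # The lookahead lets overlapping occurrences be reported as well.
    pattern = re.compile("(?=(" + SIGNATURE + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq)]

# Hypothetical toy sequence, not a real BBS6 protein.
toy = "MSA" + "QKIV" + "A" * 16 + "DRL" + "GGS"
hits = find_signature(toy)
```

A sequence in which the spacer is shorter or longer than sixteen residues is not reported, so candidates missed by this scan would still need to be checked against the alignment.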

Functionality of non-vertebrate chaperonin-like BBS sequences
The high rate of divergence of CL-BBS proteins suggests that their evolution was either driven by positive selection, as in functional differentiation, or by neutral differentiation, as could be expected in case of loss of functionality. To establish functionality of the newly identified CL-BBS sequences we evaluated (i) presence of codon-position-specific compositional contrasts in the predicted coding regions, (ii) the ratio between non-synonymous and synonymous evolutionary rates (Ka/Ks ratio), and (iii) presence of corresponding gene transcripts.
We tested all coding sequences newly predicted in Oomycetes for the presence of significant association of nucleotide usages with codon position typical of coding regions (see Supplementary Methods). Surprisingly and despite their sequence similarity, we identified great heterogeneity of codon-position-specific nucleotide usages among Oomycete CL-BBS coding sequences, which often conformed to expectations only within non-significant sequence stretches (Supplementary figure S5). However, we also found non-significant codon-position association of nucleotide usages in human BBS6, in sharp contrast to the high significance of the associations observed instead for the canonical chaperonin gene CCT8 (Supplementary figure  S5).
Using the PAML4 [20] software we estimated the overall ratio of non-synonymous and synonymous substitution rates (Ka/Ks ratio) during the evolution of lineage-specific CL-BBS genes within the BBS6, BBS10 and BBS12 gene families, based on the complete tree (including the root-branch) and on sub-trees of the same groups of sequences (excluding the root-branch) (Figure 3). All analyses resulted in highly significant (p<<0.001) reduction of non-synonymous compared to synonymous substitution rates (Ka/Ks<<1.0), indicating that the evolution of CL-BBS proteins within the corresponding lineages was characterized by strong constraining selection, despite their overall fast evolutionary rate.
We identified in public databases ESTs corresponding to many of the non-vertebrate CL-BBS genes here described (see Supplementary Material).
Figure 3 (caption, partial): Ka/Ks ratios (ω) estimated with PAML4 [20]. Black circles identify rooted clusters of "foreground" branches for which ω was estimated in comparison to all other branches ("background") of the complete tree, using PAML4 branch model 2 and one-rate model M0. Red circles identify unrooted subtrees for which ω was independently determined with model M0.

Structural features of non-vertebrate chaperonin-like BBS proteins
Homology modeling of CL-BBS structures based on the available chaperonin structures is biased by the implicit assumption that chaperonin and CL-BBS proteins share similar core structures. The considerable sequence divergence of CL-BBS proteins from canonical chaperonin proteins makes the assumption of homology modeling problematic. We chose instead to assess the structural features of CL-BBS proteins by predicting their secondary structure elements, reasoning that conservation of secondary structure elements typical of chaperonin structures would strongly indicate that CL-BBS proteins also conserve the tertiary structure of chaperonin proteins. Having previously shown that current prediction methods can successfully identify the secondary structure elements of chaperonin proteins [10], we independently predicted secondary structure elements from CL-BBS sequences of different taxonomic groups, excluding information from known structures and from alignments with chaperonin or with CL-BBS proteins from other groups. We found that, with the exception of the BBS12 sequence from Lottia gigantea, predicted secondary structure elements of non-vertebrate (as well as vertebrate) CL-BBS sequences corresponded in many instances to those of chaperonin proteins (Figure 4). However, we also identified significant differences. In the case of BBS6, most structural elements appeared to be conserved in correspondence to the N-terminal part of the chaperonin Equatorial and Intermediate domains, and in correspondence to the Apical domain. The C-terminal part of the Intermediate and Equatorial domain regions was instead either not recognizable (Leech, Capitella) or perturbed by a variety of insertions and deletions (indels) specific to the different taxonomic groups. A similar pattern was observed in BBS10 proteins, with the addition of non-conserved indels also in the N-terminal Equatorial or Intermediate domains, and of deletions in the Apical domain of the sequence from Ciona. BBS12 sequences were characterized by the greatest occurrence of non-conserved indels affecting the Intermediate and Equatorial domains, and, as previously mentioned, by failed recognition of typical chaperonin structural elements in the sequence from Lottia gigantea. Thus, with few exceptions, structural elements of the chaperonin Apical domain appeared to be remarkably conserved across most CL-BBS sequences, whereas other structural domains, particularly the C-terminal part of the Equatorial domain, showed the greatest amount of perturbations, in the form of missing sequences and of indels of different size, generally not conserved across phyla.
Chaperonin proteins are ATPases with well-characterized ADP/ATP-binding and ATP-hydrolysis motifs, well conserved across eukaryotic CCT protein sequences. In contrast, the ADP/ATP-binding and, in particular, the ATP-hydrolysis motifs are not as conserved among vertebrate CL-BBS proteins [7,10,12,13]. We compared profiles of amino acid usage (logos) in the ATP binding and hydrolysis motifs of a large collection of CCT proteins to those of non-vertebrate CL-BBS proteins (Supplementary figure S6). In BBS6 and BBS10 proteins we observed within the ADP/ATP-binding motif, [LYFMI]GPx[GAS]xxK[ILM], substantial conservation of the GP dipeptide, which is shown in chaperonin structures to be in direct contact with ATP and to entail an unusual conformation of the protein backbone (phi/psi angles). In BBS12, the ADP/ATP-binding motif was less conserved, including substantial variability of the crucial GP dipeptide. The ATP-hydrolysis motif, GDGT[TN][TSG], was less conserved than the ADP/ATP-binding motif. In BBS6 proteins aspartate (D) was conserved, but not glycine (G), which in canonical chaperonin structures also corresponds to unusual protein backbone conformation. In BBS10 proteins, although the ATP-hydrolysis motif was conserved as a consensus, in individual sequences substantial variability was observed at most positions. The ATP-hydrolysis motif was not conserved in BBS12 proteins.
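The kind of per-position comparison summarized by sequence logos can be sketched as a simple column profile over aligned motif segments. The Python example below is only a minimal illustration, not the tool used to generate the logos; the function name and the four nine-residue segments are hypothetical and were made up to fit the ADP/ATP-binding motif pattern.

```python
from collections import Counter

def column_profile(aligned_segments):
    """For each alignment column, return (most common residue, its frequency)."""
    if not aligned_segments:
        return []
    length = len(aligned_segments[0])
    if any(len(seg) != length for seg in aligned_segments):
        raise ValueError("all segments must have the same aligned length")
    profile = []
    for column in zip(*aligned_segments):
        residue, count = Counter(column).most_common(1)[0]
        profile.append((residue, count / len(column)))
    return profile

# Hypothetical aligned segments spanning the nine motif positions.
segments = ["LGPAGTAKI", "YGPEGSLKL", "FGPSGQMKM", "LAPDGRVKI"]
profile = column_profile(segments)
```

In this toy set the proline of the GP dipeptide is invariant, while the preceding glycine is replaced in one of the four segments, which is the type of variability described for BBS12.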

Discussion
The identification of fourteen genes associated with the multisystemic developmental disorder Bardet-Biedl Syndrome (BBS) highlights the broad role of ciliary and other microtubule-based processes in cellular homeostasis and in organism development [21]. The identification of these genes prompted the discovery of an essential ciliary complex, the BBSome [3], and of chaperonin-gene paralogs mostly localized to the basal body and to the centrosome [7,12,13]. Phylogenetic studies indicated that the chaperonin-like BBS (CL-BBS) gene family originated from duplication of a progenitor of the CCT8 chaperonin gene [10], and its identification among vertebrates [7,8,12,13] and other metazoan species [7,14] suggested a metazoan origin. Our discovery of chaperonin-like BBS6 sequences also in Oomycetes and their position within the phylogenetic tree suggests instead that chaperonin-like BBS genes originated and triplicated before separation of the lineages of Opisthokonts (or Unikonts) and Chromalveolates (>2300 Ma ago [22]), hence much earlier than the time of origin of vertebrates (~500 Ma ago [21]) or of Metazoa (~1450 Ma ago [22]). The possibility that the BBS6 gene has been acquired by Oomycetes by Lateral Gene Transfer (LGT) from a different organism cannot however be excluded.
LGT events between eukaryotic species are not common and most often involve either transfer to a phagotrophic protist recipient species, or transfer between plant species [23]. However, events of LGT have been recognized to play a significant role in the evolution of plant-parasitism in Oomycetes. These involved transfer of genetic material from fungi (Ascomycetes) [24,25] and through an ancestral photosynthetic plastid derived from an endosymbiont red alga [25,26]. The phylogenetic tree of BBS6 genes (Figure 2) would be consistent with both the hypothesis that the Oomycete BBS6 genes originated from a lineage of fungi, and the hypothesis that they originated from a red alga (Rhodophyta). These hypotheses could be tested by identifying the BBS6 donor gene from fungi or red algae and demonstrating that the Oomycete genes cluster with one or the other in the phylogenetic tree. Cilia were present in the common progenitor of Archaeplastida (plants) and are commonly found in lower plants, but they have been secondarily lost in red algae and in many land plants, where genes for proteins with an ancestral ciliary function are still found [27]. Cilia must also have been present in the common ancestor of Fungi, and they are still found in Chytridiomycota, the only group of true fungi known to develop flagellated zoospores. However, although we searched for CL-BBS genes in the available genome of the red alga Cyanidioschyzon merolae as well as in the genomes of land plants and green algae, including the flagellated unicellular green alga Chlamydomonas reinhardtii (Table 1), we could not identify any CL-BBS gene in these genomes. We also searched for CL-BBS genes in twenty available genomes from Fungi (mostly Ascomycota) (Supplementary table S1), including the genome of Batrachochytrium dendrobatidis, a chytridiomycete with flagellated zoospores, and again we could not identify any CL-BBS gene in any of these genomes.
Although the lack of a corresponding gene prevents positive verification of the LGT hypothesis, it cannot be excluded that a BBS6 gene was present in the ancestral red alga endosymbiont or in an ancestral fungus, and that it was transferred to the water mold genome before being secondarily lost in the donor lineage. The scenario of LGT from a red alga would still imply an ancient origin and very early triplication of the CL-BBS gene family, which would have occurred before the separation of the lineages of Archaeplastida (leading to red algae) and Opisthokonts (leading to Metazoa). The hypothesis of LGT from an ancestral fungal species would set a somewhat later origin and triplication of the CL-BBS genes, but still pre-dating the separation of the lineages of Fungi and Holozoa (including Metazoa), possibly in the Opisthokont lineage. Thus, each of the three hypotheses explaining the origin of the BBS6 gene in Oomycetes (vertical descent, LGT from red algae, or LGT from Fungi) implies that the origin and triplication of the CL-BBS gene family predated the origin of Metazoa, and depicts scenarios of gene losses that are consistent with the more recent history of the gene, including loss of all three CL-BBS paralogs in Ecdysozoa, and independent losses of BBS12 in Echinodermata and in Urochordata (Figure 1).
Conservation of secondary structure elements (Figure 4) indicated that, despite their sequence divergence, CL-BBS proteins from different phylogenetic groups conserve a typical chaperonin "Apical Domain". The isolated apical domain is sufficient in canonical chaperonin proteins for retaining substrate-binding properties [28], and in BBS proteins for conferring centrosomal localization [7]. The conservation of a chaperonin-like structural apical domain in CL-BBS proteins suggests that CL-BBS proteins bind to their substrates in a way similar to canonical chaperonin proteins. In contrast, the putative ATP-binding "Equatorial Domain" of non-vertebrate CL-BBS proteins is disrupted by proliferation of non-conserved deletions and insertions, and by divergence of the ADP/ATP binding site of BBS12 and of the ATP-hydrolysis sites of BBS6, BBS10 and BBS12, as also previously noted for their vertebrate orthologs [7,10,12,13]. Since most intra-ring and all inter-ring interactions in the canonical chaperonin complex involve the Equatorial Domain [29] and the ATP binding and hydrolysis sites are necessary for the folding activity of the chaperonin complex [30][31][32], divergence of the Equatorial domain and of the ATP binding/hydrolysis sites suggests that CL-BBS proteins do not assemble in a functional chaperonin-like complex. This conclusion is supported by early reports that CL-BBS proteins are not found associated in a complex [7]. However, more recently it has been reported that CL-BBS proteins associate with selected CCT monomers and with the BBSome component BBS7 in a "BBS/CCT complex" [8,9].
To reconcile the strong experimental evidence of formation of a protein complex with the apparent loss of sequence and structure integrity of the Equatorial domain of CL-BBS proteins, we suggest that CL-BBS and CCT proteins may aggregate in a non-chaperonin complex through their interaction with BBS7 by means of their relatively conserved substrate-binding apical domains, rather than in a hybrid BBS/CCT chaperonin-like conformation. This hypothesis would also be consistent with the observation that CL-BBS and CCT proteins aggregate only in the presence of BBS7 [8,9], suggesting that they are unable to assemble into a multimeric complex stabilized by monomer-monomer interactions as in chaperonins.
CL-BBS proteins are required for BBSome assembly [8,9] and localize to various tubulin-dense structures, including, besides the pericentriolar material of centrosomes and basal bodies [7,33], also the intercellular bridge at mitosis [7] and dendrites of mature neurons [34]. Intriguingly, it has been observed that CCT proteins, besides being essential for the folding of several proteins in their TRiC/CCT complex conformation [6], also bind as individual monomers to microtubule filaments [35] or to the growing ends of polymerizing actin filaments [36]. These observations suggest that CCT monomers and chaperonin-like BBS proteins are also capable of association with microtubules and other filamentous structures in a yet-to-be-characterized manner.
The species in which we identified CL-BBS genes develop cilia or flagella at some stage of their development. For example, Phytophthora and Pythium develop motile flagellated zoospores from sporangia. However, CL-BBS genes are not found in all organisms that develop cilia or flagella. For example, they are not found in species from Ciliates or Choanoflagellates, in the flagellated green alga Chlamydomonas, or in the flagellated fungus Batrachochytrium. In the case of Ciliates, it is known that specific chaperonin monomers are essential for cilium development [37], suggesting that in this group certain CCT monomers may be the functional equivalent of CL-BBS proteins. If chaperonin-like BBS genes emerged early in eukaryote evolution from a pre-adapted CCT gene, the poor correlation of their distribution with the distribution of ciliary structures in different lineages might reflect some functional overlap with CCT monomers in affecting cilium development and functionality.

Material and Methods
We searched for chaperonin-like BBS gene orthologs in 37 completely sequenced eukaryotic genomes. To these we added at a later stage of the analyses two Oomycete genomes that became available (for a total of five Oomycete genomes, Table 1), and 18 genomes from Fungi, based on results (for a total of 20 Fungus genomes, Supplementary table S1). Query targets were identified using TBLASTN [38] with the method of reciprocal best hit [16], according to the following procedure. Human chaperonin-like BBS (CL-BBS) proteins were used as queries and BLAST hits were collected with a liberal cut-off value (E-value<1.0). Whenever candidate CL-BBS gene homologs were not identified using human CL-BBS proteins as queries, we mined the genomes with CL-BBS proteins from other vertebrate species or, when available, with CL-BBS proteins identified in previous searches of the non-vertebrate genomes most closely related to the target genome. An extended region around each hit (up to ± 5000 bp) was excised from the genome and the corresponding query protein was used to guide the prediction of the complete structure of the newly-identified gene, based on homology and on intron-exon junction signals, using the gene-prediction software FGENESH+ [39] at the Softberry web-site (linux1.softberry.com). Reverse BLAST analyses were performed using the extended predicted protein sequences as queries against the NCBI non-redundant (nr) database.
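The filtering logic of this procedure can be summarized in a short Python sketch. It is a simplified illustration rather than the pipeline actually run: it assumes that BLAST results have already been parsed into tuples, it skips the gene-prediction step, and it treats a candidate as confirmed when its best reverse hit belongs to the query gene family. All identifiers, E-values and function names are hypothetical.

```python
def forward_candidates(forward_hits, max_evalue=1.0):
    """Collect genomic regions hit by a query below the liberal E-value cut-off.

    forward_hits: iterable of (query, genome_region, evalue) tuples.
    """
    return sorted({region for _, region, e in forward_hits if e < max_evalue})

def passes_reverse_check(reverse_hits, family):
    """True if the best reverse hit (lowest E-value) is a member of the family.

    reverse_hits: list of (database_protein, evalue) tuples for one candidate.
    """
    if not reverse_hits:
        return False
    best_protein, _ = min(reverse_hits, key=lambda hit: hit[1])
    return best_protein in family

# Hypothetical parsed results for a BBS6 query against one genome.
forward = [
    ("BBS6_human", "scaffold_1", 1e-20),
    ("BBS6_human", "scaffold_7", 0.5),
    ("BBS6_human", "scaffold_9", 3.0),   # above the cut-off, discarded
]
reverse = {
    "scaffold_1": [("MKKS_mouse", 1e-30), ("CCT8_human", 1e-10)],
    "scaffold_7": [("CCT8_human", 1e-40), ("BBS6_human", 1e-5)],
}
bbs6_family = {"BBS6_human", "MKKS_mouse"}

candidates = forward_candidates(forward)
confirmed = [c for c in candidates if passes_reverse_check(reverse[c], bbs6_family)]
```

In this toy case scaffold_7 passes the liberal forward cut-off but is rejected by the reverse search, because its closest database match is a canonical CCT8 protein rather than a BBS6 family member.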
Multiple sequence alignments were obtained using MUSCLE [40]. Pairwise similarity of CL-BBS and CCT proteins was calculated from the alignment and the corresponding pairwise dissimilarity (1.0 − similarity) matrix was used to produce a phenogram using the UPGMA method [41].
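The dissimilarity matrix used as input for the phenogram can be sketched as follows. The similarity measure is not specified in detail above, so this minimal Python example assumes, for illustration only, that similarity is the fraction of identical residues over alignment columns in which neither sequence has a gap. The function names and the three short aligned sequences are hypothetical.

```python
def pairwise_identity(seq_a, seq_b, gap="-"):
    """Fraction of identical residues over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for x, y in zip(seq_a, seq_b):
        if x == gap or y == gap:
            continue  # assumption: gapped columns are ignored
        compared += 1
        identical += x == y
    return identical / compared if compared else 0.0

def dissimilarity_matrix(alignment):
    """Return {(name_i, name_j): 1.0 - similarity} for all ordered pairs."""
    names = list(alignment)
    return {
        (p, q): 1.0 - pairwise_identity(alignment[p], alignment[q])
        for p in names
        for q in names
    }

# Hypothetical toy alignment.
toy_alignment = {"A": "MKT-LV", "B": "MKTALV", "C": "MRT-IV"}
matrix = dissimilarity_matrix(toy_alignment)
```

A matrix of this form can then be passed to any UPGMA (average-linkage) clustering routine to obtain the phenogram.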
Phylogenetic trees were obtained using Maximum-likelihood (ML) and Bayesian probabilistic methods, and by the neighbor-joining distance method [42]. Maximum-likelihood evolutionary trees were produced with PHYML 3.0 [43] with the LG substitution matrix [44], simultaneously estimating tree topology and branch lengths, amino acid equilibrium frequencies, fraction of invariable sites and discrete-gamma distributed substitution rates (8 states). Support for tree branches of the ML tree was obtained with the approximate Likelihood-Ratio Test (aLRT) [45]. The Bayesian tree was generated using PHYLOBAYES 3.2 [46] based on the CAT-GTR model, inferring from sequence data amino acid substitutability matrix coefficients (GTR model) and position-specific equilibrium frequencies of amino acids (CAT model). Support values for the Bayesian tree topology were obtained as branch marginal posterior probabilities calculated from the distribution sampled from two converged MCMC chains of 20,000 cycles sampled every 10 steps after a burn-in of 4,000 cycles. Thus, while for the ML method we used a model with generalized amino acid equilibrium frequencies, the Bayesian method was instead based on a highly-parameterized profile mixture-model of position-specific amino acid equilibrium frequencies, expected to be more resistant to long-branch attraction effects [17,18]. Neighbor-joining trees were obtained using MEGA5.1 [47] with a distance matrix based on the JTT substitution model and gamma distributed rates with parameter a=0.5, 1.0, 1.5, 2.0, 2.212 (the maximum likelihood estimate), 2.5, or 3.0, with bootstrap branch supports from 1000 sampling replicates.
Ratios of non-synonymous and synonymous substitution rates (ω=Ka/Ks) were estimated using the program CODEML from the PAML 4.0 package [20,48]. Significance of the estimates was tested with the Likelihood Ratio Test (LRT) [49] comparing the one ratio model M0 (Ka/Ks=x) with the null model Ka/Ks=1.0. Ka/Ks ratios were calculated testing the evolutionary tree of each group of interest independently, and using a branch-specific model where "foreground" branches in turn represented each group within the complete tree.
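The Likelihood Ratio Test between the one-ratio model (ω estimated) and the null model (ω fixed at 1.0) can be illustrated with a few lines of Python. This is a minimal sketch that assumes the usual asymptotic approximation, in which twice the log-likelihood difference follows a chi-square distribution with one degree of freedom because the models differ by one free parameter. The function name and the log-likelihood values are hypothetical and are not results from this study.

```python
import math

def lrt_one_df(lnl_null, lnl_alt):
    """Likelihood-ratio test for nested models differing by one free parameter.

    Returns (test statistic, p-value), with the p-value taken from the
    chi-square distribution with 1 degree of freedom, whose survival
    function is erfc(sqrt(x / 2)).
    """
    statistic = max(2.0 * (lnl_alt - lnl_null), 0.0)
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value

# Hypothetical log-likelihoods: null model (Ka/Ks fixed at 1.0) versus
# one-ratio model M0 (Ka/Ks estimated from the data).
statistic, p_value = lrt_one_df(lnl_null=-10250.0, lnl_alt=-10100.0)
```

With a log-likelihood gain of this size the p-value is far below 0.001, whereas a gain of about 1.92 log-likelihood units corresponds to the conventional 0.05 threshold.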
Consensus secondary structure predictions were independently obtained for each of the sequences identified in different taxonomic groups with the secondary structure prediction tool JPRED3 [50] excluding any supporting information from other homologous sequences, i.e., excluding aligned sequences not belonging to the group of interest, and excluding BLAST database searches. Predictions were compared with the secondary structures described for the crystal structure of the Thermoplasma acidophilum thermosome (PDB code 1a6d, chain A), a class 2 archaeal chaperonin.