Inference on Coat Protein Evolution of Lily Symptomless Carlavirus in India and Abroad Based on Motifs Study and Phylogenetic Analysis

1Department of Biotechnology, Dr Y. S. Parmar University of Horticulture and Forestry, Nauni, Solan (Himachal Pradesh) 173230, India
2Department of Entomology, Dr Y. S. Parmar University of Horticulture and Forestry, Nauni, Solan (Himachal Pradesh) 173230, India
3Department of Plant Pathology, Dr Y. S. Parmar University of Horticulture and Forestry, Nauni, Solan (Himachal Pradesh) 173230, India
#Current address: Department of Molecular Biology, Central Potato Research Institute, Shimla (Himachal Pradesh) 171001, India
##Current address: Institute of Himalayan Bioresource Technology, HATS, Palampur (Himachal Pradesh) 176061, India
###Current address: Department of Biology, University of Antwerp, Universiteitsplein 1, B 2610 Wilrijk, Belgium


Introduction
Lilium crops have been reported to be susceptible to around twenty viruses under natural and glasshouse conditions (Lee, 1992). The three viruses associated with most lily viral diseases are the aphid-transmissible Lily symptomless virus (LSV), Tulip breaking virus (TBV) and Cucumber mosaic virus (CMV) (Allen, 1975), but the most common virus diseases in Lilium are caused by LSV alone or in mixed infections with Cucumber mosaic cucumovirus, Lily mottle potyvirus and Tulip breaking potyvirus (Allen, 1972; Brunt et al., 2000; Derks and Asjes, 1975; Derks, 1995). LSV is a member of the genus Carlavirus, which includes more than fifty viruses. LSV infection results in unmarketable flowers and a severe reduction in bulb size, leading to a drastic reduction in economic returns (Asjes, 2000). LSV is an aphid-transmissible virus (Brierley and Smith, 1944a; Brierley and Smith, 1944b; Brierley and Smith, 1945) that infects lilies naturally. Various lilies grown in Himachal Pradesh, India, namely Lilium longiflorum, Lilium tigrinum, Asiatic hybrid lily and Oriental hybrid lily, among others, have been found to exhibit viral symptoms such as yellowing, chlorotic striping, vein clearing and deformed flowers. It is therefore important to understand the evolution of this virus in a way that could help stop it from spreading.
Carlaviruses form a large genus of plant viruses. The genome is single-stranded RNA, [...] kb in size (Cavileer et al., 1994; Fugi et al., 2002; Zavriev et al., 1991), and comprises six ORFs encoding, in order, the replication-related proteins, the putative movement proteins (MP), i.e. the triple gene block (TGB), the coat protein (CP) and a putative nucleic acid binding regulatory protein (NABP). CP subunits are of one type and 31-36 kDa in size (Adams et al., 2004). Carlaviral genomes have a poly(A) tract at their 3´-terminus and a cap structure or a monophosphate at their 5´-terminus (Zavriev et al., 1991). The genus comprises more than 50 viruses. LSV was first reported in Lilium species from Oregon, USA, by Brierley and Smith in 1944. It has since been reported from the USA and from countries in Europe, Asia and Australia (Asjes, 1998). LSV is transmitted by the aphids Myzus persicae, Macrosiphum euphorbiae, Aulacorthum solani, Aphis gossypii and Aphis fabae, or by whiteflies. This aphid-borne carlavirus is distinctive within its genus in that infected plants show no symptoms at the initial stages of development, which makes early detection difficult. We describe here a comparative analysis of this virus, because such knowledge is essential for designing strategies to control viruses of this type.
This study, based on CP gene sequences of carlaviruses reported from around the world, is the first to present a detailed analysis of the coat protein, which functions in viral assembly and behavior, for this important group of plant viruses. Comparisons of the carlaviral CP genes led us to hypothesize that China or India may be a source of diversity and evolutionary change for LSV. The investigations also indicated a higher level of variation in the test LSV isolate (Accession no. AJ748277) at the nucleotide level than at the amino acid level.

Methods
Sequence selection for comparative analysis of coat protein gene of lily symptomless carlavirus with other carlaviruses
The NCBI (National Center for Biotechnology Information) database (http://www.ncbi.nlm.nih.gov/) was searched for all available carlaviral CP gene nucleotide sequences (Table 1). The CP gene nucleotide sequence of a regional LSV isolate was selected from the NCBI database as the test sequence, as it shared 100% identity with one of the LSV isolates sequenced in our lab. The selected sequence, from LSV isolate LSV-Oh (Accession no. AJ748277), was 882 base pairs long (Singh et al., 2005). All sequences were available in GenBank format at NCBI and were converted into FASTA format (Pearson, 2000) for further analysis.
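The GenBank-to-FASTA conversion mentioned above can be illustrated with a short Python sketch. It is not the tool used in this study; the function name `genbank_to_fasta` and the toy record (accession `TOY0001`, with a made-up 12-base sequence) are assumptions introduced only for illustration, and the parser handles only the ACCESSION, DEFINITION and ORIGIN fields of a flat file.

```python
import re


def genbank_to_fasta(genbank_text, width=60):
    """Convert GenBank flat-file text (one or more records) to FASTA text.

    Simplified sketch: only the first DEFINITION line is kept as description.
    """
    records = []
    accession, definition, chunks, in_origin = None, "", [], False
    for line in genbank_text.splitlines():
        if line.strip() == "//":  # "//" closes one GenBank record
            if accession and chunks:
                seq = "".join(chunks).upper()
                body = "\n".join(seq[i:i + width] for i in range(0, len(seq), width))
                records.append(f">{accession} {definition}".rstrip() + "\n" + body)
            accession, definition, chunks, in_origin = None, "", [], False
        elif in_origin:
            # Sequence lines carry position numbers and spaces; keep letters only.
            chunks.append(re.sub(r"[^A-Za-z]", "", line))
        elif line.startswith("ACCESSION"):
            fields = line.split()
            accession = fields[1] if len(fields) > 1 else None
        elif line.startswith("DEFINITION"):
            definition = line[len("DEFINITION"):].strip()
        elif line.startswith("ORIGIN"):
            in_origin = True  # sequence lines follow until "//"
    return "\n".join(records) + ("\n" if records else "")


# Hypothetical toy record; the accession and sequence are made up.
TOY_GENBANK = """LOCUS       TOY0001                 12 bp    DNA     linear   VRL
DEFINITION  Toy coat protein fragment (made-up example).
ACCESSION   TOY0001
ORIGIN
        1 atggcgaata aa
//
"""

print(genbank_to_fasta(TOY_GENBANK), end="")
```

For real downloads, an established parser would be a safer choice than this hand-written one, since GenBank records can contain multi-line fields that the sketch ignores.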

Multiple sequence alignment and motifs search
Comparative studies of the CP gene nucleotide sequence of LSV (Accession no. AJ748277) with those of other carlaviruses were carried out separately for each country, except for Brazil and the USA.
As only one CP gene nucleotide sequence from Brazil was available in the NCBI database, it was grouped with the two CP gene sequences from the USA so that a multiple sequence alignment could be generated. The ClustalW program available at the Network Protein Sequence Analysis (NPS@) web server, PBIL (Pole Bio-Informatique Lyonnais), Lyon, France (http://pbil.univ-lyon1.fr/), was used to identify conserved regions among the nucleotide and amino acid sequences of the CP gene of carlaviruses from different countries, and percentage similarity scores for the alignments were obtained using the EBI ClustalW tool (http://www.ebi.ac.uk/Tools/clustalw/index.html). Thus, six multiple sequence alignments were generated for the CP nucleotide sequences, one each for Brazil and the USA; Canada; China; India; Japan; and South Korea, and another six were generated for the amino acid sequences. One known nucleotide motif, AATAAA (the polyadenylation signal motif), was searched for manually in the multiple sequence alignments of the carlaviral CP nucleotide sequences, as no software containing a database of carlaviral motifs was available to us.
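The manual motif search described above could also be scripted. The following Python sketch shows one possible way to do so, under the assumption that a motif counts as "conserved" when every aligned sequence carries it in the same alignment columns. The function names and the three-sequence toy alignment are hypothetical and do not come from the study data.

```python
def find_motif_positions(seq, motif="AATAAA"):
    """Return 0-based start positions of the motif in a sequence.

    Alignment gaps ("-") are removed first, so positions refer to the
    ungapped sequence. Overlapping occurrences are reported.
    """
    ungapped = seq.replace("-", "").upper()
    motif = motif.upper()
    positions = []
    start = ungapped.find(motif)
    while start != -1:
        positions.append(start)
        start = ungapped.find(motif, start + 1)
    return positions


def conserved_motif_columns(alignment, motif="AATAAA"):
    """Return alignment columns at which every sequence shows the motif.

    `alignment` maps sequence names to aligned (gapped) sequences.
    """
    motif = motif.upper()
    rows = [s.upper() for s in alignment.values()]
    length = min(len(r) for r in rows)
    return [
        col
        for col in range(length - len(motif) + 1)
        if all(r[col:col + len(motif)] == motif for r in rows)
    ]


# Hypothetical toy alignment (made-up sequences, not study data).
TOY_ALIGNMENT = {
    "iso1": "GGAATAAACC",
    "iso2": "TTAATAAAGG",
    "iso3": "-CAATAAATT",
}

print(conserved_motif_columns(TOY_ALIGNMENT))
for name, seq in TOY_ALIGNMENT.items():
    print(name, find_motif_positions(seq))
```

The per-sequence search answers whether an isolate contains the motif at all, while the column-wise check answers whether it is conserved across a country-level alignment; the two questions are kept separate on purpose.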

Phylogenetic analysis of ORF5
EXOME-HORIZON™ software (Mascon Global Ltd., New Delhi) was used to construct Maximum Likelihood (ML) trees (using the Dnaml program of the PHYLIP package) for the CP nucleotide sequences from each country separately. In addition, the Neighbor-joining (NJ) method of the EBI ClustalW tool was used to construct a single combined phylogenetic tree for all the carlaviral CP nucleotide sequences and another for the amino acid sequences. The Neighbor-joining method was employed because the version of EXOME-HORIZON™ available to us did not accept a large number of sequence entries, so the NJ method of ClustalW was a freely available and reliable alternative.
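For readers unfamiliar with the Neighbor-joining method, the following Python sketch outlines the generic textbook procedure on a small distance matrix. It is not the ClustalW implementation used here, and the taxon labels and distances are made-up illustrative values. The function name `neighbor_joining` and the Newick output format with three decimal places are assumptions of this sketch.

```python
def neighbor_joining(names, dist):
    """Build an unrooted neighbor-joining tree and return it in Newick format.

    names: list of at least two unique taxon labels.
    dist:  symmetric distance matrix (list of lists) in the same order.
    """
    # Distances are kept in a dict of dicts keyed by node label; an internal
    # node is labelled by the Newick text of the subtree it represents.
    d = {a: {b: float(dist[i][j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    nodes = list(names)
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of each node from all others.
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * d[a][b] - r[a] - r[b]  # NJ selection criterion
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from the joined pair to their new parent node.
        len_a = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        len_b = d[a][b] - len_a
        parent = f"({a}:{len_a:.3f},{b}:{len_b:.3f})"
        d[parent] = {parent: 0.0}
        for c in nodes:
            if c not in (a, b):
                d[parent][c] = d[c][parent] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [parent]
    a, b = nodes
    return f"({a}:{d[a][b]:.3f},{b});"


# Hypothetical five-taxon distance matrix (illustrative values only).
TOY_NAMES = ["a", "b", "c", "d", "e"]
TOY_DIST = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]

print(neighbor_joining(TOY_NAMES, TOY_DIST))
```

In practice the distance matrix would be derived from the multiple sequence alignment; how ClustalW computes and corrects those distances is not reproduced in this sketch.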

Results and Discussion
A total of 79 complete carlaviral CP coding sequences were selected from the NCBI database, and one of them (LSV isolate LSV-Oh, Accession no. AJ748277) was used as the test sequence in our studies. Multiple sequence alignment of the CP gene nucleotide sequence of Lily symptomless virus isolate LSV-Oh (Accession no. AJ748277) with those of other carlaviruses from different countries, using the ClustalW program at the NPS@ web server, showed that carlaviruses from Canada had the highest identity of residues with the test sequence (47.77%), while sequences from Brazil and the USA had the lowest (10.25%) (Table 2).
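To make the identity figures above easier to interpret, the Python sketch below shows one simple way such a percentage could be computed from a multiple alignment. It assumes that identity is the share of alignment columns in which all sequences carry the same residue and no gap; the exact definition applied by the NPS@ server may differ. The function name and the toy alignment are hypothetical.

```python
def alignment_identity(rows):
    """Percentage of alignment columns in which all sequences are identical.

    rows: list of aligned (gapped) sequences of equal length.
    Columns containing a gap ("-") are counted as not identical.
    """
    if not rows:
        raise ValueError("alignment is empty")
    length = len(rows[0])
    if length == 0 or any(len(r) != length for r in rows):
        raise ValueError("aligned sequences must share one non-zero length")
    upper = [r.upper() for r in rows]
    identical = sum(
        1 for col in zip(*upper)
        if "-" not in col and len(set(col)) == 1
    )
    return 100.0 * identical / length


# Hypothetical toy alignment (made-up sequences, not study data).
TOY_ROWS = ["ATGCA", "ATGGA", "ATG-A"]
print(f"{alignment_identity(TOY_ROWS):.2f}%")
```

Under this definition a single divergent sequence lowers the score for the whole group, which is one way a mixed group of viruses could produce a very low identity value.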
The amino acid sequences of the CP gene products of all 79 carlaviruses were deduced from the nucleotide sequences using the ExPASy Translate tool (http://us.expasy.org/tools/dna.html). Multiple alignment of the CP amino acid sequences, performed separately for each country using the ClustalW program of the NPS@ web server, again showed that sequences from Brazil and the USA shared the lowest identity of amino acid residues with the test sequence (1.06%) and those from Canada the highest (43.75%) (Table 3). The EBI ClustalW tool showed a wide range of percentage similarities, from 1% to 98%, between the translated amino acid sequence of the test LSV CP and the other carlaviral CP sequences from around the world.
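The translation step above can be sketched in a few lines of Python. This is an illustration of translating a coding sequence with the standard genetic code, not a reproduction of the ExPASy tool; the function name `translate` and its `to_stop` parameter are assumptions of the sketch, and the input sequences are made-up examples.

```python
from itertools import product

# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds, to_stop=True):
    """Translate a coding sequence in reading frame 1.

    Codons with ambiguous bases become "X"; a trailing partial codon is
    ignored. With to_stop=True, translation ends at the first stop codon.
    """
    seq = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


# Made-up nine-base example: start codon, one alanine codon, stop codon.
print(translate("ATGGCGTAA"))
```

Only the first forward reading frame is translated here, which is sufficient for complete coding sequences that begin at the start codon.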
In earlier studies, the genome organization of Hop latent virus (HpLV) was found to be more similar to that of Potato virus M (PVM) than to those of other carlaviruses (Hataya et al., 2000). Our results agree with this: the test LSV isolate showed low similarity (41-42% and 45% at the nucleotide and amino acid levels, respectively) to the 7 HpLV isolates from China (Xinjiang province), and only 37-38% and 46-47% identity to PVM isolates at the nucleotide and amino acid levels, respectively. This is consistent with HpLV and PVM sequences being closer to each other than to LSV. In an earlier report, the CP gene of Daphne virus S (DVS) shared 45.5% and 49.5% identity with LSV at the amino acid and nucleotide levels, respectively (Lee et al., 2003). We obtained similar findings: the 10 CP sequences of DVS isolates from South Korea included in this study were 39-44% and 46% identical to the test LSV CP sequence at the nucleotide and amino acid levels, respectively. These results were in line with those obtained earlier (Singh et al., 2005), which revealed that LSV-T (an LSV isolate obtained from Lilium tigrinum; AJ781318) had sequence homology of 78-84% with the Indian isolates and showed maximum relatedness (85%) to LSV-C (AJ564640) from China when compared with LSV isolates characterized from other regions of the world.
The present study also shows that almost all the Chrysanthemum virus B (CVB) isolates shared similar levels of homology with the test LSV sequence, i.e. 34-46% and 37-45% at the nucleotide and amino acid levels, respectively, which points to a high level of identity among the coat protein sequences of the Indian CVB isolates. Earlier studies likewise found high levels of identity among the CP gene sequences of Indian CVB isolates, ranging from 74-98% and 74-99% at the nucleotide and amino acid levels, respectively (Singh et al., 2007).
The AATAAA motif (polyadenylation signal) was searched for in the six multiple nucleotide sequence alignments of carlaviral CP gene sequences, but it was not conserved in any of the LSV isolates under study. The motif was found to be completely conserved in two Cowpea mild mottle virus (CPMMV) isolates from the USA (CPMMV-H, AF024628, and CPMMV-M, AF024629; Figure 1), in 7 HpLV isolates (EF202598, EF202599, EF202600, EF394781, EF394782, EF394783 and EF394784) from China (Figure 2) and in 3 DVS isolates (AJ971469, AJ971470 and AJ971471) from South Korea (Figure 3). This motif is also found in other viruses, such as Rice tungro bacilliform virus (RTBV) of the family Caulimoviridae, where it forms an essential part of the poly(A) signal (Rothnie et al., 2001), and in the 3´ untranslated region of ORF5 of Banana mild mosaic virus (genus not assigned) (Gambley and Thomas, 2001). The AATAAA motif was absent from the multiple sequence alignments for Canada and India.
Six phylogenetic trees of carlaviral CP nucleotide sequences, prepared by the ML method for the different countries, placed the test LSV closest to Garlic common latent virus (GarCLV) isolates from Brazil (Figure 4A and Figure 4B) [...] This may be attributed to changes in the LSV sequences of either India or China at the translational level, driven by the microenvironment of the virus, which depends on climatic factors and virus-host interactions in the respective countries of their evolution.

Conclusions
On the basis of multiple sequence alignment, motif studies and phylogenetic analysis, it can be inferred that variation in Lily symptomless virus is accumulating faster at the nucleotide level than at the amino acid level.
Although most of the function of the coat protein gene does not appear to have been affected so far, continued accumulation of these variations may at some point lead to the evolution of new virus strains capable of widespread dispersal and damage. The test isolate also shared its most recent common ancestry with native LSV isolates from India and with LSV isolates from China, which probably indicates an origin in one of these two countries. However, extensive experimentation is required to study the real consequences of these nucleotide-level changes in the coat protein of LSV, especially in its conserved motif sequences.