G. H. Patel Post Graduate Department of Computer Science and Technology, Sardar Patel University, Vallabh Vidyanagar, Gujarat-388120, India
Received date: March 20, 2014; Accepted date: April 26, 2014; Published date: April 29, 2014
Citation: Patel SS, Vaidya MB, Shah DB (2014) Homology Modelling of Conserved rbcL Amino Acid Sequences in Leguminosae Family. J Data Mining Genomics Proteomics 5:154. doi:10.4172/2153-0602.1000154
Copyright: © 2014 Patel SS, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
This study focuses on homology modelling of rbcL proteins from selected Leguminosae species found in Gujarat state, India. The family Leguminosae comprises three subfamilies: Fabaceae (Papilionaceae), Caesalpiniaceae and Mimosaceae. Multiple sequence alignment was carried out on rbcL protein sequences from several species in each subfamily, and the conserved amino acid sequences were selected for homology modelling. Evolutionarily related proteins have similar sequences, and naturally occurring homologous proteins have similar structures. Three-dimensional protein structure has been shown to be evolutionarily more conserved than would be expected from sequence conservation alone. We found that a few amino acid stretches, encoded by the same base pairs, are common within each subfamily even though the species belong to different genera. No protein structure for these conserved sequences was available in the PDB, so we built homology models of three rbcL protein sequences (one from each subfamily) that were conserved in the multiple sequence alignment. The models were validated with Ramachandran plots, the CASTp server was used to locate active sites in the predicted structures, and the function of each predicted protein is reported.
Homology modelling; Bioinformatics; Leguminosae family; rbcL
The family Leguminosae contains herbs, shrubs and trees. Legumes are used as crops, forages and green manures; they also synthesize a wide range of natural products such as flavours, drugs, poisons and dyes. Legumes are able to convert atmospheric nitrogen into nitrogenous compounds useful to plants. This is achieved through root nodules containing bacteria of the genus Rhizobium. These bacteria have a symbiotic relationship with legumes, fixing free nitrogen for the plants; in return, legumes supply the bacteria with a source of fixed carbon produced by photosynthesis. This enables many legumes to survive and compete effectively in nitrogen-poor conditions [2,3]. The family Leguminosae is further classified into three subfamilies: 1. Fabaceae (Papilionaceae), 2. Caesalpiniaceae and 3. Mimosaceae.
The gene most commonly used for plant phylogenetic analyses is the plastid-encoded rbcL gene. This single-copy gene is approximately 1430 base pairs in length and is free from length mutations except at the far 3’ end. It has a fairly conservative rate of evolution. The rbcL gene codes for the large subunit of ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO or RuBPCase).
Recent genome sequencing projects have generated massive amounts of data; however, many of these genomes are still not fully annotated and contain genes and proteins of unknown function and structure. This is due to several limitations, such as the cost and time required for experimental approaches. An alternative to laboratory-based methods is a bioinformatics approach that uses algorithms and databases to estimate protein function. Because these algorithms and databases are built on experimental results, they can be an effective means of functional and structural annotation of hypothetical proteins. Structures are evolutionarily more conserved than sequences; therefore, analysis of three-dimensional (3D) structures holds great potential. The present study describes 3D models of three rbcL protein sequences that were found to be conserved in multiple sequence alignment; their structures were predicted through homology modelling. In addition, sequence analysis, structural analysis and functional annotation were carried out.
In the current research, we considered around 266 species found in Gujarat state, India [7,8]. We searched for each species in the NCBI database and found information such as DNA and protein sequences for around 149 species of the family Leguminosae. Only rbcL protein sequences were considered for analysis. Physicochemical properties were calculated with ProtParam; the parameters computed included the molecular weight, theoretical pI, amino acid composition, atomic composition, extinction coefficient, estimated half-life, instability index, aliphatic index, and grand average of hydropathicity (GRAVY). Secondary structure (helices, sheets and coils) was predicted with PSIPRED.
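As a rough illustration of two of the ProtParam quantities listed above, the following Python sketch computes GRAVY (the mean Kyte-Doolittle hydropathy over all residues) and the aliphatic index (from the mole percentages of Ala, Val, Ile and Leu). It is not the ProtParam implementation; the function names and the example peptide are hypothetical, and the other parameters (half-life, instability index and so on) are not covered.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathicity: mean hydropathy per residue."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(sequence: str) -> float:
    """Aliphatic index: X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)),
    where X is the mole percentage of each residue."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")

    def mole_percent(aa: str) -> float:
        return 100.0 * seq.count(aa) / len(seq)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


if __name__ == "__main__":
    peptide = "AVILKR"  # arbitrary example peptide, not from the study
    print(f"GRAVY: {gravy(peptide):.3f}")
    print(f"Aliphatic index: {aliphatic_index(peptide):.2f}")
```

A negative GRAVY value, as reported for all three conserved sequences in this study, indicates a sequence that is hydrophilic on average.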
A homology modelling approach was used to determine the 3D structures of the three conserved rbcL protein sequences. A BLASTP search (Altschul et al.) with default parameters was performed against the Brookhaven Protein Data Bank (PDB) to find suitable templates for homology modelling. The best templates were 1RLD for Fabaceae (Papilionaceae), 1WDD for Caesalpiniaceae and 1EJ7 for Mimosaceae. Swiss-PdbViewer (SPDBV) was then used for homology model construction.
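Choosing among candidate templates usually involves comparing the query with each hit. The sketch below shows one common way to score a pairwise alignment: percent identity over the columns in which neither sequence has a gap. This is an illustrative assumption, since the text does not state which criterion was used to rank the BLASTP hits; the function name and the aligned strings are made up.

```python
def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent of identical residues over columns without gaps.

    Both inputs must come from the same pairwise alignment, so they
    have equal length, with '-' marking gaps.
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")

    compared = 0   # columns where both sequences have a residue
    identical = 0  # compared columns where the residues match
    for q, t in zip(aligned_query.upper(), aligned_template.upper()):
        if q == "-" or t == "-":
            continue
        compared += 1
        if q == t:
            identical += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Hypothetical aligned fragments, for illustration only.
    query = "MSPQ-TE"
    template = "MSAQTTE"
    print(f"{percent_identity(query, template):.1f}% identity")
```

Other definitions divide by the full alignment length or by the shorter sequence length, so identity values from different tools are not always directly comparable.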
Protein structure validation
Active site prediction
The constructed PDB file was then used to find cavities in the protein with the Computed Atlas of Surface Topography of proteins (CASTp) server. CASTp provides an online resource for locating, delineating and measuring concave surface regions on three-dimensional structures of proteins.
The pipeline of the methodology followed is shown in Chart 1.
The present study focused on sequence and structural analysis of rbcL protein sequences that are conserved within the subfamilies of Leguminosae. Conserved sequences were found in 38% of the species in Fabaceae, 60% in Caesalpiniaceae and 54% in Mimosaceae, as shown in Table 1. ProtParam was then used to compute physicochemical properties from the amino acid sequences; these are listed in Table 2.
|Sub family||rbcL protein sequences|
Table 1: Information of conserved rbcL protein sequences considered for Homology modelling.
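The idea of extracting conserved stretches from a multiple sequence alignment can be sketched as follows. This is a minimal illustration, assuming that "conserved" means every aligned sequence carries the same residue in a column and that only runs of a minimum length are kept; the text does not specify the exact criterion, and the function name, the `min_len` parameter and the toy alignment are hypothetical.

```python
from typing import List, Tuple


def conserved_segments(aligned: List[str], min_len: int = 3) -> List[Tuple[int, str]]:
    """Return (start_column, segment) for each run of fully conserved,
    gap-free columns that is at least min_len columns long."""
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have equal length")

    reference = aligned[0]
    segments = []
    start = None  # first column of the current conserved run, if any
    # Iterate one column past the end so a run touching the end is closed.
    for col in range(length + 1):
        is_conserved = (
            col < length
            and reference[col] != "-"
            and all(seq[col] == reference[col] for seq in aligned)
        )
        if is_conserved:
            if start is None:
                start = col
        elif start is not None:
            if col - start >= min_len:
                segments.append((start, reference[start:col]))
            start = None
    return segments


if __name__ == "__main__":
    # Toy alignment, for illustration only (not data from the study).
    toy_alignment = [
        "MSPQTETKAS",
        "MSPQTETRAS",
        "M-PQTETKAG",
    ]
    for start, segment in conserved_segments(toy_alignment):
        print(start, segment)
```

Column indices are zero-based positions in the alignment, not residue numbers in any individual sequence.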
|Property||Fabaceae||Caesalpiniaceae||Mimosaceae|
|Molecular weight||1776.0 Daltons||1298.5 Daltons||2326.5 Daltons|
|Estimated half-life (mammalian reticulocytes, in vitro)||4.4 hours||1.9 hours||1 hour|
|Estimated half-life (yeast, in vivo)||>20 hours||>20 hours||2 min|
|Estimated half-life (Escherichia coli, in vivo)||>10 hours||>10 hours||2 min|
|Grand average of hydropathicity (GRAVY)||-0.131||-0.425||-1.290|
|Classification of Protein||Unstable||Stable||Stable|
Table 2: Physicochemical properties calculated with the ProtParam tool.
The ProtParam results show that the conserved protein of the Fabaceae subfamily is classified as unstable, whereas those of the other two subfamilies are stable. The estimated half-life for Mimosaceae was much shorter than for the other two subfamilies, as shown in Table 2.
Secondary structure analysis was performed using PSIPRED, and the three rbcL proteins were predicted to contain several helices and coils along with beta sheets, as shown in Figures 1a-1c.
Homology modeling and protein structure validation
Homology or comparative modelling is one of the most common structure prediction methods in structural genomics and proteomics. Numerous online servers and tools for homology or comparative modelling of proteins have become available in recent years. Although the tools differ in detail, one initial step common to all of them is to find the best matching template by performing a sequence homology search with BLASTP. Templates are experimentally determined 3D structures of proteins that share sequence similarity with the query sequence. The template sequence and the protein sequence whose structure is to be determined are aligned using multiple sequence alignment algorithms. A well-defined alignment is very important for the prediction of a reliable 3D structure. A BLASTP search was performed for each protein sequence against the PDB to identify templates for homology modelling. The query sequence and template ID were then given as input for homology modelling with SPDBV, which generated the three predicted protein models shown in Figures 2a-2c. The selected models and their Ramachandran plots are shown in Figures 3a-3c, respectively [16-24]. The final model was selected by checking several parameters, which are shown in Table 3: the percentage of amino acids in the core, allowed and disallowed regions, and the number of bad contacts.
|Subfamily||Core region (in %)||Allowed region (in %)||Disallowed region (in %)||Bad Contacts|
Table 3: Information of Ramachandran Plot.
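A Ramachandran plot is built from the backbone dihedral angles phi and psi of each residue. The sketch below shows how these angles can be computed from atomic coordinates using the standard four-point dihedral formula. It is an illustration only, not the validation software used in the study; the helper names are hypothetical, and the boundaries of the core, allowed and disallowed regions are tool-specific and are not reproduced here.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Sequence[float], b: Sequence[float]) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Sequence[float], b: Sequence[float]) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3) -> float:
    """Dihedral angle in degrees, in (-180, 180], about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    if norm == 0.0:
        raise ValueError("p1 and p2 coincide; the central bond is undefined")
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    k0, k2 = _dot(b0, b1), _dot(b2, b1)
    v = _sub(b0, (k0 * b1[0], k0 * b1[1], k0 * b1[2]))
    w = _sub(b2, (k2 * b1[0], k2 * b1[1], k2 * b1[2]))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi_psi(c_prev, n, ca, c, n_next) -> Tuple[float, float]:
    """phi = C(i-1)-N-CA-C and psi = N-CA-C-N(i+1) for residue i."""
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)


if __name__ == "__main__":
    # Synthetic points, for illustration only: a 90 degree twist.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

The percentages reported in a table such as Table 3 are obtained by assigning each residue's (phi, psi) pair to a region of the plot and counting the residues per region.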
Active site prediction
The active site is the functional region of a protein. Active site prediction with CASTp identified a few pockets in the predicted structures of the Caesalpiniaceae and Mimosaceae subfamilies, but no pocket in the predicted structure of the Fabaceae subfamily. Some of the predicted pockets are shown in Figures 4a-4c.
We used a homology modelling approach to propose 3D structures and possible functions for the conserved rbcL protein sequences found in the family Leguminosae. The function of a protein can be better understood from its structure, and the structure of the rbcL protein is already known; the functions of these conserved sequence fragments were therefore confirmed against the following templates for homology modelling: 1RLD for the Fabaceae (Papilionaceae) subfamily, 1WDD for Caesalpiniaceae and 1EJ7 for Mimosaceae. SPDBV was then used for model construction. From these findings, each conserved protein appears to be involved in an important function. In the Caesalpiniaceae subfamily, the predicted structure has a site that is a heterodimer interface [polypeptide binding], and a disulfide bond was found within the structure. In the Mimosaceae subfamily, the predicted structure has sites that are a heterodimer interface [polypeptide binding], an active catalytic residue site and a metal binding site [ion binding]. No active site was found in the predicted structure for the Fabaceae subfamily. These predicted structures therefore have several important features, which are common to the selected species within each subfamily. Because the sequences are conserved within each subfamily, they can be used for classification of Leguminosae species: if a protein sequence contains one of the conserved sequences described in this study, the species may fall within the corresponding subfamily.
We are heartily thankful to Prof. (Dr.) P.V. Virparia, Director, GDCST, Sardar Patel University, Vallabh Vidyanagar, for providing us facilities for the research work. We are also thankful to DST-PURSE program and Center for Interdisciplinary Studies in Science and Technology (CISST), Sardar Patel University, Vallabh Vidyanagar, Gujarat (India) for providing financial assistance in the form of fellowship.