ISSN: 2167-0412
Medicinal & Aromatic Plants

In-silico Characterization, Structural Modelling, Docking Studies and Phylogenetic Analysis of 5-Enolpyruvylshikimate-3-Phosphate Synthase Gene of Oryza sativa L.

Ubaid Yaqoob1*, Tanushri Kaul2, Saurabh Pandey2 and Irshad Ahmad Nawchoo1

1Plant Reproductive Biology, Genetic Diversity and Phytochemistry Research Laboratory, Department of Botany, University of Kashmir, Srinagar, Jammu and Kashmir, India

2Plant Molecular Biology Lab, International Centre for Genetic Engineering and Biotechnology, New Delhi, India




*Corresponding Author:


Ubaid Yaqoob
Plant Molecular Biology Lab, International Centre for Genetic Engineering and Biotechnology
Aruna Asaf Ali Marg, New Delhi-110 067, India
Tel: +919796186479
E-mail: [email protected]

Received Date: November 02, 2016; Accepted Date: November 09, 2016; Published Date: November 16, 2016

Citation: Yaqoob U, Kaul T, Pandey S, Nawchoo IA (2016) In-silico Characterization, Structural Modelling, Docking Studies and Phylogenetic Analysis of 5-Enolpyruvylshikimate-3-Phosphate Synthase Gene of Oryza sativa L. Med Aromat Plants (Los Angel) 5:274. doi: 10.4172/2167-0412.1000274

Copyright: © 2016 Yaqoob U, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.



Abstract

5-Enolpyruvylshikimate-3-phosphate synthase (EPSPS) is one of the vital enzymes of the shikimate pathway, which is involved in the biosynthesis of secondary metabolites and several amino acids. Multiple sequence alignment of EPSPS protein sequences from different plants showed conserved regions at different stretches, with maximum homology in amino acid residues. We built a homology model of the Oryza sativa EPSPS (OsEPSPS) protein using the structure of E. coli EPSPS as the template. The resulting model was evaluated with PROCHECK, the RAMPAGE server, ProSA and Verify3D, which indicated that the model structure is reliable. Ramachandran plot analysis showed that 94.3% of amino acid residues adopt conformations within the most favoured regions. Motif analysis revealed that a conserved EPSPS domain is found in all EPSPS proteins irrespective of plant species, suggesting a possible role in cellular and metabolic functions. The phylogenetic tree revealed distinct EPSPS clusters for bacteria, monocots and dicots. The predicted interacting partners of the gene show the importance of this gene family in regulating developmental and metabolic functions. The two conserved motifs LP(G/S)KSLSNRILLLAAL and LFLGNAGTAMRPL, present in almost all plant EPSPS sequences examined, may function as the catalytic domains of EPSPS enzymes and are thought to contribute to the glyphosate binding site.


Keywords

EPSP synthase; Glyphosate; Herbicide; Oryza sativa; Shikimate pathway


Introduction

5-Enolpyruvylshikimate-3-phosphate synthase (EPSPS), one of the key enzymes of the shikimate pathway, is involved in the biosynthesis of the aromatic amino acids phenylalanine (Phe), tyrosine (Tyr) and tryptophan (Trp) and of other secondary products (auxin, salicylate, folic acid, phytoalexins, flavonoids, alkaloids etc.) essential for plant survival [1]. It has also been verified as the specific target of the broad-spectrum herbicide glyphosate (N-phosphonomethyl glycine) [2]. EPSPS (aroA) catalyses the transfer of the enolpyruvyl moiety from phosphoenolpyruvate (PEP) to shikimate-3-phosphate (S3P), forming EPSP and inorganic phosphate [3]. The reaction is chemically unusual because it proceeds via C–O bond cleavage of phosphoenolpyruvate rather than via P–O bond cleavage [4]. Glyphosate (GPJ) inhibits EPSPS in a slowly reversible reaction that is competitive with respect to PEP and uncompetitive with respect to S3P [5,6]. In most crops and weeds, glyphosate can starve the plant of aromatic amino acids by competitively inhibiting the binding of PEP to EPSPS. Mutagenesis of EPSPS has been carried out in various species to obtain glyphosate-tolerant enzymes, for example proline-106 to serine in E. indica [7], proline-106 to leucine in N. tabacum [1], glycine-100 to alanine in Agrobacterium sp. strain CP4 [8] and proline-101 to serine in N. tabacum [9]. The occurrence of the shikimate pathway in algae, bacteria, fungi and plants makes EPSPS a principal target for developing herbicide-resistant genetically modified crops [10]. Understanding how it regulates metabolic and developmental processes in diverse plant species would therefore be a major step towards engineering new herbicides, developing glyphosate-resistant crops and designing new antibiotic and anti-parasitic drugs.


Materials and Methods

Comparative modelling and structural analysis

The reference sequence of EPSPS from Oryza sativa was retrieved from the NCBI database. Comparative modelling was performed by searching the PDB for known protein structures with the target sequence as the query [11]. The target sequence was searched for similar sequences using BLAST (Basic Local Alignment Search Tool) [12] against the Protein Data Bank (PDB). The best template for the query sequence was identified on the basis of e-value, percentage sequence identity and percentage sequence coverage. The BLAST search yielded the X-ray structure of EPSPS from E. coli, with 53% sequence identity to our target protein (OsEPSPS). All EPSPS sequences were aligned using ClustalW [13] to assess the similarity among them. 2D and 3D structure alignment was carried out using ClustalW [14] and MATRAS 1.2 [15], respectively. The EPSPS sequences were further analysed for the presence of specific EPSPS domains and motifs through MotifScan (myhits.isb-sib.ch/cgi-bin/motif_scan) and ScanProsite (Prosite.expasy.nlm.nih.gov). Conserved motifs were analysed with MEME version 3.5.7 [16], using minimum and maximum motif widths of 20 and 50 residues, respectively, and a maximum of 7 motifs, with all other parameters kept at their defaults. The theoretical structure of OsEPSPS was generated by comparative modelling using Modeller 9.12.

The secondary structural features of the template and target EPSPS sequences were calculated using SOPMA. The physico-chemical properties of the EPSPS sequences, namely molecular weight, theoretical isoelectric point (pI), number of amino acids, total number of positive and negative residues, aliphatic index [17], grand average of hydropathy (GRAVY) [18], extinction coefficient [19] and instability index [20], were evaluated using Expasy's ProtParam server (http://us.expasy.org/tools/protparam.html) [21]. Sub-cellular localizations were predicted using CELLO v.2.5 [22]. The N-glycosylation sites of the EPSPS proteins were predicted using the NetNGlyc 1.0 server. The interacting partners of EPSPS and its co-expressed genes were predicted using the STRING software [23].
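Two of the ProtParam quantities listed above have simple closed forms, which the following minimal sketch reproduces: GRAVY as the mean Kyte-Doolittle hydropathy per residue, and the aliphatic index as X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)), where X is the mole percentage of each residue. The function names are illustrative and not part of ProtParam; the example input is the conserved motif LFLGNAGTAMRPL reported in this study, not the full-length OsEPSPS sequence, so the printed values will not match Table 3.

```python
# Kyte-Doolittle hydropathy scale (the scale GRAVY is defined on).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """Aliphatic index from the mole percentages of Ala, Val, Ile and Leu."""
    n = len(seq)

    def mole_percent(aa):
        return 100.0 * seq.count(aa) / n

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


# Illustrative input only: one conserved motif from this study.
motif = "LFLGNAGTAMRPL"
print("GRAVY:", round(gravy(motif), 3))
print("Aliphatic index:", round(aliphatic_index(motif), 2))
```

Running the same functions on the full protein sequences would be the way to compare against the ProtParam output.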

Model validation of OsEPSPS

On the basis of geometrical and stereo-chemical constraints, the model was evaluated using the RAMPAGE server (http://mordred.bioc., PROCHECK [24], Verify3D [25] and ProSA-Web [26]. The model with the fewest residues in the disallowed region was selected for further studies. The RMSD value between the template and target was calculated using MOE [27]. The best model structure was then compared with the template protein by superimposition using SuperPose Version 1.0 [28].

Active site prediction and molecular docking

Active sites of the model and template proteins were identified using different binding site prediction servers, namely Q-site finder, CASTp (http://sts-fw.bioengr.uic.edu/castp/) and the PINUP server ( PINUP/) [29-31]. The refined protein model (OsEPSPS) was used to study its ligand binding mechanism. Docking analysis was performed with the Sybyl 8.0 molecular modelling tool to identify sites on the protein structure where favourable protein-ligand interactions can occur [32]. The ligand molecules (S3P and GPJ) were docked inside the cavity of the OsEPSPS protein.

Phylogenetic analysis

Phylogenetic analysis of the sequences was carried out with the UPGMA method using Molecular Evolutionary Genetics Analysis (MEGA) software version 4.1 [33]. Each node was tested using the bootstrap approach with 5,000 replicates.
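The clustering step of UPGMA can be sketched as follows. This is a minimal illustration of the algorithm and not MEGA's implementation: the taxon labels and distances are hypothetical toy values, and bootstrap resampling is not covered.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster taxa by UPGMA.

    names: list of taxon labels.
    dist:  dict mapping frozenset({a, b}) to the pairwise distance.
    Returns (tree, heights): a nested-tuple tree and the node heights
    in the order in which clusters were merged.
    """
    clusters = {name: 1 for name in names}  # cluster -> number of leaves
    d = dict(dist)
    heights = []
    while len(clusters) > 1:
        # Join the two closest current clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        heights.append(d[frozenset((a, b))] / 2.0)
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        merged = (a, b)
        # Distance to the new cluster is the size-weighted average.
        for c in clusters:
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    (tree,) = clusters
    return tree, heights


# Hypothetical toy distances between four taxa (not data from this study).
toy = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 6.0,
    frozenset(("B", "C")): 6.0,
    frozenset(("A", "D")): 10.0,
    frozenset(("B", "D")): 10.0,
    frozenset(("C", "D")): 10.0,
}
tree, heights = upgma(["A", "B", "C", "D"], toy)
print(tree, heights)
```

In a bootstrap analysis, the alignment columns would be resampled with replacement and the tree rebuilt for each replicate; node support is then the fraction of replicates that recover the same grouping.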

Results and Discussion

Comparative modelling and structural analysis

The Oryza sativa EPSPS (OsEPSPS) protein sequence comprises 515 amino acid residues. Sequences that showed maximum identity with a high score and a low e-value were aligned. In the BLAST search against the PDB [34], three reference proteins (PDB ID: 3NVS, 1G6S, 3FJX) showed a high level of sequence identity: 54%, 53% and 53%, respectively. The E. coli template (PDB ID: 1G6S), with an e-value of 2e-149 and a query coverage of 84%, was selected for homology modelling. Structurally conserved regions (SCRs) between the OsEPSPS model (target) and the homologous proteins (PDB: 1G6S, 3NVS, 3FJX) were determined by multiple sequence alignment (Figure 1). Multiple sequence alignment of the EPSPS sequences highlighted the conservation of amino acid residues among different species (Supplementary File 1). SCRs between the OsEPSPS model and the template (PDB: 1G6S) were also determined (Figure 2). An extensive search for motifs and their positions with MEME identified several conserved motifs in the EPSPS protein sequences (Figure 3). Multilevel consensus sequences for the MEME-defined motifs, along with their functions, are shown in Table 1. The LP(G/S)KSLSNRILLLAAL and LFLGNAGTAMRPL motifs were present in almost all selected species.


Figure 1: Comparative sequence-structure alignment of OsEPSPS with other homologues.


Figure 2: Comparative sequence alignment of OsEPSPS (target) and E. coli EPSPS (template) using SuperPose.


Figure 3: Block diagram of multilevel consensus sequences for the MEME defined motifs of EPSPS proteins: Seven motifs were obtained by MEME software. Different motifs are indicated by different filled boxes with numbers 1 to 7.

Motif | Multilevel consensus sequence | Function
1 | ITPPEKLNVTEIDTYDDHRMAMCFSLAACADVPVTIKDPGCTRKTFPDYF | Protein kinase C phosphorylation site, casein kinase II phosphorylation site and N-glycosylation site
5 | VLQPIKEISGTIKLPGSKSLSNRILLLAALSEGTTVVDNLLNSDDIHYML | Casein kinase II phosphorylation site, protein kinase C phosphorylation site, Pumilio RNA-binding repeat profile

Table 1: Multilevel consensus sequences for the MEME defined motifs and their predicted functions.

The initial model of OsEPSPS was built by homology modelling using Modeller 9.12 [35]. Modeller constructed five model structures for OsEPSPS, and the model with the lowest Discrete Optimized Protein Energy (DOPE) score was visualized with Accelrys Discovery Studio version 4.1. This model was used for the identification of active sites and for docking of the substrate with EPSPS. Both rice and E. coli EPSPS harbour the two EPSPS domains, which probably indicates a mode of action in rice similar to that in microbes. In this study, the predicted 3D structure of OsEPSPS was generated and the N-terminal and C-terminal domains were identified (Figure 4). In E. coli, EPSPS consists of six aligned parallel alpha-helices in each of two similar EPSPS I domains [36]. Similar domain structures were detected by Gong et al. [37], Garg et al. [38] and Filiz and Koc [39]. Bacterial EPSPSs are reported to fold into two globular domains and an inside-out α-β barrel domain, with PEP and S3P binding in the interdomain cleft region [40].

The secondary structural features of the EPSPS sequences of 1G6S and OsEPSPS were calculated using SOPMA [41] with default parameters (Table 2). In rice, the EPSPS protein is composed of 42.52% α-helices, 17.86% extended strands and 10.10% beta turns. In E. coli, the EPSPS protein is composed of 38.88% α-helices, 20.61% extended strands and 11.48% beta turns. Thus α-helices and extended strands together cover a comparatively large portion of the rice and E. coli EPSPS enzymes. Similar results have been observed by Gong et al. [37], Garg et al. [38] and Filiz and Koc [39] in several plant species. The ScanProsite server identified the two signature sequences LFLGNAGTAMRPLTA (166-180) and RVKETERMVAIRTELTKLG (427-445) in both target and template.

Several physico-chemical properties of the EPSPS sequences were calculated using Expasy's ProtParam server [21]; the results are shown in Table 3. The computed isoelectric point (pI) will be useful when developing a buffer system for protein purification by isoelectric focusing. The very high aliphatic index of the EPSPS sequences indicates that these enzymes may be stable over a wide temperature range. The higher extinction coefficient of the rice enzyme indicates the presence of more Cys, Trp and Tyr residues. The instability index values of the EPSPS proteins ranged from 28.78 to 33.83, indicating the stable nature of the proteins (values below 40 are generally taken to indicate a stable protein). Using the NetNGlyc 1.0 server, the N-glycosylation sites of the OsEPSPS protein (188 NATY and 464 NITA) were predicted; these may play a role in post-translational modifications required for enzymatic function. N-glycosylation is an essential process for the post-translational modification of proteins [42].
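The sequence annotations reported above can be cross-checked with a few lines of code. The sketch below confirms that each ScanProsite signature has the length implied by its residue range, locates the conserved motif LFLGNAGTAMRPL inside the first signature, and tests whether the predicted N-glycosylation sites begin with the canonical N-X-S/T sequon (X being any residue except proline). All sequences and positions are taken from the text; the variable and function names are illustrative.

```python
import re

# ScanProsite signatures and their reported residue ranges (1-based, inclusive).
signatures = {
    "LFLGNAGTAMRPLTA": (166, 180),
    "RVKETERMVAIRTELTKLG": (427, 445),
}

for seq, (start, end) in signatures.items():
    # An inclusive range start..end spans end - start + 1 residues.
    assert len(seq) == end - start + 1, seq

# Locate the conserved motif within the first signature.
motif = "LFLGNAGTAMRPL"
signature = "LFLGNAGTAMRPLTA"
offset = signature.find(motif)
motif_span = (166 + offset, 166 + offset + len(motif) - 1)
print("Motif span:", motif_span)

# Predicted N-glycosylation sites reported for OsEPSPS (position, tetrapeptide).
nglyc_sites = {188: "NATY", 464: "NITA"}


def has_sequon(tetrapeptide):
    """True if the first three residues match N-X-S/T with X != P."""
    return re.fullmatch(r"N[^P][ST]", tetrapeptide[:3]) is not None


sequon_ok = {pos: has_sequon(site) for pos, site in nglyc_sites.items()}
print("Sequon check:", sequon_ok)
```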


Figure 4: Cartoon structure of OsEPSPS showing its N- and C- termini in blue and red respectively.

Secondary structure element | OsEPSPS | 1G6S
Alpha helix | 42.52% | 38.88%
3₁₀ helix | 0.00% | 0.00%
Pi helix | 0.00% | 0.00%
Beta bridge | 0.00% | 0.00%
Extended strand | 17.86% | 20.61%
Beta turn | 10.10% | 11.48%
Bend region | 0.00% | 0.00%
Random coil | 29.51% | 29.04%
Ambiguous states | 0.00% | 0.00%
Other states | 0.00% | 0.00%

Table 2: Details of the calculated secondary structure elements by SOPMA.
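As a consistency check, the SOPMA percentages can be converted back into residue counts using the sequence lengths given in Table 3 (515 residues for OsEPSPS, 427 for 1G6S). The short sketch below does this arithmetic; the rounding to whole residues is an assumption about how the percentages were derived.

```python
# Non-zero SOPMA states from Table 2 and sequence lengths from Table 3.
sopma = {
    "OsEPSPS": (515, {"alpha helix": 42.52, "extended strand": 17.86,
                      "beta turn": 10.10, "random coil": 29.51}),
    "1G6S": (427, {"alpha helix": 38.88, "extended strand": 20.61,
                   "beta turn": 11.48, "random coil": 29.04}),
}


def residue_counts(length, percentages):
    """Convert percentage composition into whole-residue counts."""
    return {state: round(length * pct / 100.0)
            for state, pct in percentages.items()}


counts = {name: residue_counts(length, pct)
          for name, (length, pct) in sopma.items()}
for name, per_state in counts.items():
    print(name, per_state, "total:", sum(per_state.values()))
```

For both proteins the rounded counts add up to the full sequence length, so the four listed states account for every residue.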

Property | OsEPSPS | 1G6S
Molecular weight | 54345.7 | 46095.7
Theoretical pI | 8.04 | 5.37
Number of amino acids | 515 | 427
Negatively charged residues (-R) | 55 | 48
Positively charged residues (+R) | 57 | 38
Aliphatic index | 93.42 | 94.66
Grand average of hydropathicity (GRAVY) | 0.101 | -0.005
Extinction coefficient (M⁻¹ cm⁻¹) | 34755 | 30745
Instability index | 33.83 | 28.78
CELLO predicted location | Combined | Combined
Predicted N-glycosylation sites | 188 NATY, 464 NITA | -

Table 3: Physicochemical, structural and sequence properties, sub-cellular localizations and N-glycosylation sites of the EPSPS protein sequences.
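The charged-residue counts in Table 3 are qualitatively consistent with the reported pI values, as the following sketch illustrates. It uses only the numbers in the table; treating the difference between positive and negative residue counts as a rough indicator of whether the pI lies above or below 7 is a simplifying assumption that ignores histidine and the termini.

```python
# Charged-residue counts and theoretical pI values from Table 3.
proteins = {
    "OsEPSPS": {"positive": 57, "negative": 55, "pI": 8.04},
    "1G6S": {"positive": 38, "negative": 48, "pI": 5.37},
}


def charge_balance(entry):
    """Excess of positively over negatively charged residues."""
    return entry["positive"] - entry["negative"]


balance = {name: charge_balance(entry) for name, entry in proteins.items()}
for name, entry in proteins.items():
    side = "basic" if entry["pI"] > 7 else "acidic"
    print(f"{name}: balance {balance[name]:+d}, pI {entry['pI']} ({side})")
```

The rice protein has a small excess of positive residues and a basic pI, whereas the E. coli template has an excess of negative residues and an acidic pI.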

Using the STRING software, the interacting partners of EPSPS and its co-expressed genes were predicted in both rice and E. coli (Figure 5). Proteins such as 3-dehydroquinate synthase, 3-dehydroquinate dehydratase, shikimate kinase, chorismate synthase and shikimate-5-dehydrogenase were found to be common interacting partners of EPSPS in both rice and E. coli. In the second step of the shikimate pathway, 3-dehydroquinate synthase converts 3-deoxy-D-arabino-heptulosonate-7-phosphate to 3-dehydroquinate and is essential for the basic cellular metabolic machinery. In the fifth step, shikimate kinase, an ATP-dependent enzyme, catalyses the phosphorylation of shikimate to shikimate-3-phosphate. The seventh step of the shikimate pathway for the biosynthesis of aromatic amino acids is catalysed by chorismate synthase, which is conserved in prokaryotes, fungi and plants [43].


Figure 5: EPSPS interacting partners as well as its coexpression genes predicted by STRING. (A) Rice (B) E. coli (C) The key to the putative interacting partners for OsEPSPS gene is listed. (D) The key to the putative interacting partners of E. coli EPSPS gene is listed.

Validation of OsEPSPS structure

Analysis of the model with the RAMPAGE server and PROCHECK revealed that 94.3% of residues fall in the most favoured region, 4.1% in the allowed region and 1.6% in the outlier region of the Ramachandran plot (Figure 6). ProSA-Web analysis gave a Z-score of -8.01 for the target model OsEPSPS, which lies within the range of scores of proteins determined by NMR and X-ray crystallography. This Z-score is comparable to that of the template 1G6S (-11.83), which suggests that the obtained model is reliable and close to experimentally determined structures (Figure 7a). Verify3D showed a score greater than 0.2 for 76% of the residues, indicating that the quality of the OsEPSPS model is acceptable and reliable. The RMSD value indicates the degree to which two three-dimensional structures are similar: the lower the value, the more similar the structures. The Cα RMSD and backbone RMSD between the OsEPSPS model and the E. coli template (1G6S) crystal structure were 1.58 Å and 1.56 Å, respectively, and the overall RMSD was 1.72 Å. Thus, the OsEPSPS model generated by Modeller 9.12 was judged to be reliable and accurate. The superimposition of the template and the model structure is shown in Figure 7b. The helix and sheet regions of the template and model superimpose well, and large deviations are observed mainly in the loop regions. Loop regions are reported to be the main regions where a model protein structure deviates from its template [44]. The ribbon diagram in Figure 7c shows the docking of glyphosate (white balls) and S3P (brown balls) into the structure of OsEPSPS (target).
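The RMSD used above is the square root of the mean squared distance between equivalent atoms in two structures. A minimal sketch of that calculation is given below; it assumes the two coordinate sets are already superimposed and listed in the same atom order, and it does not perform the optimal superposition that tools such as MOE or SuperPose carry out first. The coordinates are made-up illustrative values, not taken from the OsEPSPS model.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of
    (x, y, z) coordinates that are already superimposed."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical coordinates for two atoms in each structure (in Å).
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
template = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
print(round(rmsd(model, template), 3))
```

Restricting the input to Cα atoms or to backbone atoms gives the Cα and backbone RMSD values, respectively.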


Figure 6: Ramachandran plot for OsEPSPS generated by RAMPAGE.


Figure 7: (A) Validation of OsEPSPS by the ProSA tool. The Z-scores of OsEPSPS (target) and E. coli EPSPS (template) are plotted against those of proteins determined by NMR (dark blue) and X-ray crystallography (light blue). The two black dots represent the Z-scores of the target and the template. (B) Superposition of OsEPSPS (target) and the E. coli EPSPS template (PDB ID: 1G6S), shown in blue and green, respectively. (C) Ribbon diagram showing docking of glyphosate (white balls) and S3P (brown balls).

Prediction of active sites and docking studies

After the final model was built, the possible binding sites of OsEPSPS were searched using binding site prediction servers such as Q-site finder, CASTp and PINUP [29-31]. These studies showed that residues K, Q and D are highly conserved in the active site of both the model and the template protein, and hence their biological function can be predicted to be identical. These conserved residues may form part of the catalytic domains of EPSPS enzymes and could lie in the glyphosate binding site, as seen in bacterial EPSPS [7]. The mutation of a single amino acid (particularly lysine or arginine) can alter the binding site of glyphosate [37]. Molecular docking was performed with the Surflex-Dock method in Sybyl 8.0 (Tripos Inc., USA). We docked S3P and GPJ inside the cavity of the OsEPSPS protein (Figure 7c). Shikimate-3-phosphate (S3P) binds at positions 94, 95, 99, 173, 249, 250, 251, 277, 280, 402 and 429, where the binding residues are K, S, R, T, S, S, Q, S, Y, D and K, respectively. Glyphosate (GPJ) binds at positions 94, 170, 172, 202, 251, 402, 430, 433, 474, 475 and 500, where the binding residues are K, N, G, R, Q, D, E, R, H, R and K, respectively. GPJ and S3P share the residues K, Q and D at positions 94, 251 and 402, respectively (Table 4). The glyphosate binding site is dominated by basic residues (Arg and Lys) [45], indicating their role in glyphosate-EPSPS binding.

Ligand name | Binding residues
S3P | 94K 95S 99R 173T 249S 250S 251Q 277S 280Y 402D 429K
GPJ | 94K 170N 172G 202R 251Q 402D 430E 433R 474H 475R 500K
PO4 | 168L 169G 170N 171A 196V 199M

Table 4: Binding residues of different ligands of the OsEPSPS protein.
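The overlap between the ligand binding sites in Table 4 can be recovered with a simple set intersection, as in the sketch below. The residue lists are copied from the table; the helper names are illustrative.

```python
# Binding residues from Table 4, written as <position><one-letter residue>.
binding = {
    "S3P": "94K 95S 99R 173T 249S 250S 251Q 277S 280Y 402D 429K".split(),
    "GPJ": "94K 170N 172G 202R 251Q 402D 430E 433R 474H 475R 500K".split(),
    "PO4": "168L 169G 170N 171A 196V 199M".split(),
}


def position(residue):
    """Numeric position of a residue label such as '251Q'."""
    return int(residue[:-1])


def shared_residues(ligand_a, ligand_b):
    """Residues contacted by both ligands, ordered by sequence position."""
    common = set(binding[ligand_a]) & set(binding[ligand_b])
    return sorted(common, key=position)


shared = shared_residues("S3P", "GPJ")
# Lys and Arg residues among the glyphosate contacts.
basic_gpj = [r for r in binding["GPJ"] if r[-1] in "KR"]

print("Shared by S3P and GPJ:", shared)
print("Lys/Arg contacts of GPJ:", basic_gpj)
```

The intersection reproduces the three shared residues named in the text, and five of the eleven glyphosate contacts are lysine or arginine.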

Phylogenetic analysis

The phylogenetic analysis of EPSPS across the selected organisms showed a clear delineation of EPSPS into four clusters. The phylogenetic tree outlines the evolution of EPSPS in Arabidopsis thaliana, Amborella trichopoda, Brassica rapa, Brachypodium distachyon, Cucumis melo, Fragaria vesca, Glycine max, Malus domestica, Oryza sativa, Populus trichocarpa, Phoenix dactylifera, Setaria italica, Sorghum bicolor, Solanum lycopersicum, Vitis vinifera, Zea mays, E. coli and V. cholerae. Many of these sequences exhibited orthologous and paralogous relationships with each other (Figure 8). B. distachyon showed the highest sequence similarity to OsEPSPS. Amborella trichopoda is believed to be the most basal lineage in the angiosperm clade. The results indicate that the EPSPS gene family is strictly conserved and has evolved from bacteria.


Figure 8: Phylogenetic tree constructed by minimum evolution method of MEGA version 4.1 showing similarity of OsEPSPS with monocots, dicots and bacteria.


Acknowledgements

The first author is grateful to the Council of Scientific and Industrial Research (CSIR) for providing financial assistance.

Conflict of Interest

We declare that we have no conflict of interest.

