Plasmodium vivax 1-deoxy-D-xylulose-5-phosphate synthase: Homology Modeling, Domain Swapping, and Virtual Screening

Structure-based computational approaches are needed to model proteins in the absence of any crystal structures and identify protein-ligand interactions. Biochemical pathways that exist in microorganisms but absent in humans serve as excellent targets for antimicrobial drug design. The Non-Mevalonate Pathway (NMP) is one such pathway that is present in all intra-erythrocytic stages of Plasmodium and could serve as a target for anti-malarial drug design and development. The first enzyme of the pathway, DXS (1-deoxy-D-xylulose-5-phosphate synthase) is the rate limiting enzyme and is also important for the biosynthesis of pyridoxal and thiamine. In the absence of available crystal structures, our aim was to develop homology models for Plasmodium DXS, which could provide insight into the structural features of this enzyme and its likely binding to ligands. Initial models were built using the PRIME module of Schrödinger Suite 2010 and then refined using MacroModel energy minimization. Analyses were also carried out using bioinformatics tools to predict domain swapping in Plasmodium DXS. This study should prove useful in the design and development of novel anti-malarial therapeutics. Journal of Data Mining in Genomics & Proteomics J o u r n a l of D ata Mi ning in Gmics & rot e o m i c s ISSN: 2153-0602 Ramamoorthy et al., J Data Mining Genomics Proteomics 2014, S1 DOI: 10.4172/2153-0602.S1-003 Citation: Ramamoorthy D, Handa S, Merkler DJ, Guida WC (2014) Plasmodium vivax 1-deoxy-D-xylulose-5-phosphate synthase: Homology Modeling, Domain Swapping, and Virtual Screening. J Data Mining Genomics Proteomics S1:003. doi:10.4172/2153-0602.S1-003


Introduction
Biochemical processes that exist in bacteria, but not in humans, provide a useful way to develop novel inhibitors to combat diseases. One such pathway is the Non-Mevalonate Pathway (NMP) exclusively present in lower animals and bacteria, which is utilized to synthesize isoprenoids for terpene biosynthesis [1]. Isopentenyl diphosphate (IPP) and Di methyl Allyl phosphate (DMAPP) are the end products of this pathway and serve as precursors for the biosynthesis of various terpenoids including cholesterol and vitamin E [2]. Since higher animals utilize the mevalonate pathway for isoprenoid biosynthesis, the NMP is an attract ive biochemical pathway, the enzymes of which provide several targets for the discovery of anti-infectious therapeutic agents.
Among the infectious diseases, malaria ranks highest, affecting about 2.4 billion people worldwide [3]. Although drugs are available to treat malaria, recurring antimicrobial resistance to the currently available drugs poses a significant threat to successful treatment. Discovering new molecular targets and new class of inhibitors are essential to overcome the issue of microbial resistance to drugs. The apicoplast, a non-photosynthetic plastid found in Plasmodium, is nuclear encoded [1] and is necessary to the survival of the malaria parasite [4]; it harbors many pathways including the NMP [5]. Since the non-mevalonate pathway is present in all intra-erythrocytic stages of Plasmodium, it represents an extremely attractive target for antimalarial drug discovery and development.
Major causative agents of malaria include Plasmodium vivax, Plasmodium falciparum, Plasmodium berghei, Plasmodium ovale, Plasmodium malariae and Plasmodium knowlesi, out of which Plasmodium falciparum and Plasmodium vivax are most important for pathogenesis of the disease [6]. Plasmodium falciparum is deadly while Plasmodium vivax causes less severe complications but is more widespread, especially in temperate zones. Moreover, Plasmodium vivax can hibernate in the liver for months or years and can resurface, causing disease [6].
The first step involved in the NMP is formation of 1-deoxy-Dxylulose-5-phosphate (DXP) by condensation of glyceraldehyde-3-phosphate (G3P) and pyruvate catalyzed by 1-deoxy-D-xylulose-5phosphate synthase (DXS) [7]. The DXS enzyme has a thiaminebinding motif and requires TPP (thiamine pyrophosphate) and either a Mn 2+ or Mg 2+ divalent cation to manifest its activity [7]. DXS catalyzes the first step of the NMP, which is also the rate limiting step. Formation of DXP is not a committed step of the non mevalonate pathway as DXS is also essential for thiamine and pyridoxal biosynthesis [8], thus making it an alluring target to develop drugs for anti-infectious diseases. By targeting enzymes present in the NMP for inhibition, the DXS enzyme in either P. falciparum or could potentially lead to the development of potent curative agents for malaria and other infectious diseases.
To better understand the structural components of DXS, a crystal structure would be valuable. The only crystal structures available for this enzyme to date are from E. coli and D. radiodurans. Here, we present homology models for P. vivax and other Plasmodium species (P. falciparum, P. berghei) of DXS in order to comprehend its structural features and for use in virtual screening of compound libraries readily available from the National Cancer Institute and other sources.

Methods and Results
Comparative modeling generates a 3-D structure of an enzyme based on previously determined X-ray crystal structures. The sequence of PvDXS and other Plasmodium DXS were obtained from PUBMED and were aligned with the sequences of E. coli and D. radiodurans, the available crystal structures for DXS using ClustalW [9]. Two crystal structures of DXS (from E. coli and D. radiodurans) have been reported to date [10]. The aligned structures were manually inspected for any anomalies and corrected to reflect the structurally conserved regions. Important regions from the alignment file are shown in ( Figure 1). The regions shown in ( Figure 1) represent the segment that connects the transit peptide to domain I, domain I -domain II linker region, and the domain II -domain III linker region. Alignment scores between the different species of DXS are shown in (Table 1).

Homology model of PvDXS
The sequence of DXS P. vivax was obtained from NCBI protein sequence database, (http://www.ncbi.nlm.nih.gov), and it consists of 1111 residues. The sequence of PvDXS was then aligned with the sequences of previously resolved crystal structures for DXS, E. coli and D. radiodurans, and the alignment scores shown in (Table 1).

Aligning the sequences and building the homology model:
Alignment of sequences is a crucial step in building a homology model to create a connection between the reference and the target proteins. Sequences should be properly aligned to ensure high quality of the model. Hence it is important to thoroughly inspect the sequences, especially in the conserved regions. Multiple sequence alignment of PvDXS with sequences of E. coli (PDB ID: 2O1S) [10], D. radiodurans (PDB ID: 2O1X) [10] and P. falciparum obtained from ClustalW 2.0.12 [9] are shown in (Figure 1). The aligned structures were manually inspected and edited to reflect the structurally conserved regions in the available crystal structures. The tertiary structure of the three proteins can be considered similar, since they share a significant degree of similarity, with a sequence identity of 30% and 29% with E. coli and D. radiodurans respectively. Relatively reliable homology models can be expected when the sequence identity between the template and the query sequence is 30% or greater [11]. In PvDXS, since the C-and N-termini are not likely to contribute significantly toward ligand binding [12], some residues in these regions were truncated while building the model ( Figure 2). The model generated for PvDXS has residues starting from 375 through 1011. Since templates were not available to model the starting sequence (residues ranging from 1-374), the model starts at residue 375. If the templates had gaps of more than 20 amino acid residues, the gaps were retained in the model. Thiamine pyrophosphate (TPP) and the divalent metal ion (Mg 2+ ) were retained from the X-ray crystal structure of D. radiodurans, while generating the model. The initial model was generated using the PRIME software, available from Schrödinger, Inc. [13] and then subjected to energy minimization to produce a refined model for docking studies. MacroModel, also available from Schrödinger, Inc. [14][15][16], was used for geometry optimization and it relied upon the limited memory Broyden-Fletcher-Goldfarb-Shanno algorithm [13] for energy minimization with the OPLS 2005 Force Field. MacroModel minimization was performed for 500 iterations or until energy gradient convergence threshold of 0.1 kJ/mol/Å was obtained.
Homology model of P. knowlesi, P. falciparum and P. berghei DXS: The sequences of P. knowlesi, P. falciparum and P. berghei DXS were obtained from the NCBI. The sequence of P. knolwesi consists of 1122 residues, P. falciparum, 1205 residues, and P. berghei, 843 residues respectively. They were aligned with the sequences of E. coli and D. radiodurans as described earlier. The sequence of P. knowlesi has high homology to P. vivax than P. falciparum or P. berghei and P. falciparum has high homology to P. berghei. The models of P. knowlesi, P. falciparum a n d P. berghei were built using the aligned templates of D. radiodurans and E. coli; coordinates of TPP and the divalent metal cation, Mg 2+ were retained from the D. radiodurans crystal structure.

Protein structure validation
The energy minimized structures were assessed for overall quality using PDBSum/ProCheck [17,18]. ProCheck is utilized to interrogate the stereochemical quality of a given protein structure by verifying the accuracy of parameters like bond angles, bond lengths, torsion angles and correctness of amino acid chirality. Using these criteria, the PvDXS model compares well with the available X-ray crystal structure of DrDXS and EcDXS.
An important indicator of the stereo chemical integrity of the model is the distribution of the main chain torsion angles: phi and psi, which can be examined using Ramachandran plots [17]. (Table 2) shows the Ramachandran plots and statistics of our homology models of Plasmodium DXS. The plots ( Figure 3a) clearly show that the vast majority of the residues are in a phi-psi distribution consistent known secondary structure. The remaining residues that fall into the random conformation are small segments and are primarily present in the loop regions of the protein. For comparative purposes, Ramachandran plot statistics of E. coli and D. radiodurans DXS from the X-ray crystal structures are included in (Figure 3b). It is noteworthy that the homology model of PvDXS had 80.5% of total residues present in the favorable regions, while the disallowed regions contain about 1.4% of the residues.

Active-site identity
Active-sites of the homology models of DXS generated were compared with the active-sites in the crystal structures of D. radiodurans DXS and E. coli DXS. It was found that the active-sites of the models were quite similar to the active sites relative to the templates used. A comparison of the residues in the active sites is tabulated in (Table 3). It is evident from ( Table 3) that the active sites share a high degree of sequence identity/similarity. We also analyzed the structure for PKCphosphorylation sites and found that residues 27  is located above domain II and domain III of the same monomer such that the active site is located within the same monomer, in the region between domain I and domain II. However, both of these crystal structures have missing residues in domain I. In E. coli DXS, the missing residues are found in not only in domain I (referred to as 'segment I' here) but also in the linker that connects domain I to domain II (referred to as 'segment II' here); the two missing segments were apparently the result of using a fungal protease during crystallization [10]. Residues in segment I of E. coli are present close to the active site. The crystal structure of D. radiodurans DXS was crystallized without the use of a fungal protease; it also has missing residues (199-243) in domain I in the region that connects strand 4 and 5 (segment I); however, residues present near the active site are ordered in this segment. The missing residues in segment I of D. radiodurans DXS (in spite of not using the fungal protease) might indicate that this region in domain I might be domain swapped such that residue 198 of one monomer is connected to residue 244 of the other monomer.

Structure of E. coli Transketolase
One of the closest structural homologues of DXS is Transketolase (TK), which catalyzes a similar biomolecular transformation. The crystal structure of TK from E. coli is also a homo dimer with three distinct domains. However, the arrangement of domains in TK differs substantially from DXS; domain I of one monomer is located above domain II and domain III of the other monomer such that the active site lies in the dimer interface region. The region that links domain I to domain II is larger in TK consisting of 95 residues. On the other hand, the linker region between domain I and domain II in DXS is comprised of only 20 residues [10]. The difference in domain arrangement between TK and DXS and the missing residues in DXS in the domain

Domain swapping
Domain swapping is a phenomenon by which a bond is created between two or more protein molecules as their identical domains are exchanged to form an intertwined dimer or oligomer (Figure 4) [19]. Factors such as salt bridges and pH reveal important information regarding domain swapping [19]. The ability of a protein to domain swap can also be determined form the sequence by analyzing the sequence derived structure entropies of the residues. Usually domain swapping occurs in a hinge region of the protein, and often proline residues are present in the hinge regions that might increase the rigidity of the loop associated with domain swapping [20]. Domain swapping favors the entangling of polypeptide chains, which might change the native-state properties of the protein, like proteinase-resistance and toxicity [21]. However, domain swapping is often observed at unusually low pH(~4) and not generally at physiological pH [22].
In order to derive information about possible domain swapping in PvDXS, we analyzed the sequence for hinge regions, where swapping could potentially occur. We studied the sequence derived structure  Table 3: Comparison of residues in the active-sites of DXS homology models and templates.

Closed Monomers
Hinge region C-interface values for the protein sequences were calculated using tri-peptides and tetra-peptides from different non-redundant databases as the standard using the Molecular Bioinformatics Center server [23]. The hinge regions were detected using DOMPRED [24] and DisEMBL [25], which indicated disorder in the regions: residues 1-250, residues 500-550, 605-650 and around residue 727. The disordered regions based on loops/coils in the secondary structure [26] were checked on the DisEMBL server [25]. Our results indicate that residues 1-250 are highly disordered, which suggest that this part of the protein could be the signal/transit peptide [12]. Likewise, the sequence was checked for the presence of a signal peptide using the signalP server [27], which indicates that residues 1-25 form the signal peptide. Based on the above studies, PvDXS is predicted to have three domains, Domain I consisting of residues 250-582, domain linker region between residues 583-609, domain II consisting of residues 609-724 and domain III consisting of residues 725-1111. We predict that domain swapping could occur in PvDXS between residues 499-528 of domain I, which form the linker between strand 4 and 5 of domain I ( Figure 5).
We used the SDSE tool [23] from the Molecular Bioinformatics Center to generate the entropy based on sequence. This was performed to identify the plausible domain swap region. To confirm the region of domain swapping, the proline, histidine and isoleucine residues that lie in the disordered region of domain I and in the domain I -domain II linker region were mutated to alanine and the sequence derived structure entropy change was calculated; the results of which are shown in (Table 3). There was a huge gain in entropy for the Histidine residue His499 upon mutation to alanine, in the predicted swap region. As expected, the proline residues present in the hinge region had a loss of entropy when mutated to alanine, which suggests that it provide the rigidity associated with domain swap. Based on the results, it is evident that if domain swapping occurs, it most likely occurs in the region 499-594. Similarly, for P. knowlesi, the domain swap is predicted to occur between residues 586-612. For P. falciparum, the domain swap is predicted to occur between residues 653-678 and for P. berghei, between residues 327-353 respectively ( Figure 6).
Residues His499 and Pro500 are specific to P. knowlesi and P. vivax but absent in other Plasmodium species. Residue His510 is specific to PvDXS and is absent in all other Plasmodium species, indicating that this residue might play an important role as a gatekeeper in domain swapping. Salt bridges reveal important information regarding domain swapping [19]. Salt bridges were analyzed for the dimeric PvDXS structure using VMD [28]. Analysis revealed three salt bridges occurring in segment I of the dimer, connecting chain A and chain B; Asn486-His527, Asn518-His572 and Asn547-Lys706 respectively. The first salt bridge formed between Asn486 of Chain A and His527 of Chain B is interesting since this segment lies in the predicted domain swapped region. This salt bridge could be retained in the domain swapped dimer such that Asn486 of Chain A is connected to His 527 of the same chain after domain I is swapped between the residues 499-528, with Pro528 providing the necessary rigidity associated with domain swap. Similar observations are noted for P. knowlesi, P. berghei and P. falciparum ( Table 4).

Structure of domain swapped PvDXS
To determine the structure of domain swapped PvDXS, comparative modeling was performed using E. coli Transketolase (PDB: 2R8O) as an additional template (along with E. coli DXS and D. radiodurans DXS); it had only 9% sequence identity with the sequence of PvDXS. The model was corrected for bond orders, bond angles, protonation states and then energy minimized using Schrodinger' s IMPACT module in the Protein Preparation Wizard. Later the prepared protein model was energy minimized employing MacroModel, using the LBFGS algorithm and the OPLS 2005 force field. The refined structure was validated using ProCheck. The refined model after energy minimization is shown in    ( Figure 7a). The refined structure had a score of 77.8% for the residues in the most favorable regions, 19.2% for residues in the additionally allowed regions, 1.7% for residues in generously allowed regions and 1.2% for residues in the disallowed regions (Figure 7c). In comparison to the unrefined structure, the residues in the most favorable regions have changed from 81.6% (Figure 7b) to 77.8% in the refined model (Figure 7c), but the residues in disallowed regions have changed from 2.2% (13 residues) to 1.2% (7 residues). Moreover, the residues in the disallowed regions were manually inspected for their vicinity to the active-site. It was found that these residues lie in the regions remote from the active-site. The above results suggest that the refined model may be a better one compared to the previous models to perform docking studies. As seen in (Figure 7b,7c) the phenyl ring of TPP is seen to occupy the region close to His510 and Pro511. Note that the side chains of the residues were slightly shifted using the transform tool of Maestro 8.5.2 prior to energy minimization, such that the residues no longer overlap with TPP.
Active site identity: For the domain swapped PvDXS structure, the active site residues were compared with E. coli TK, D. radiodurans DXS and E. coli DXS. It was found that most of the residues are identical or similar with the active site of DXS as opposed to Transketolase (TK), as shown in (Table 5).

Docking studies
Docking of TPP and the NCI Diversity Set compounds was performed using Schrodinger' s GLIDE software in order to validate the homology model(s) and the docked compounds were compared to the estimated absolute binding free energies of TPP and the NCI diversity set compounds docked to D. radiodurans DXS (reported here as GLIDE docking scores; only the lowest energy pose is reported).

Conclusion
In conclusion, we have generated homology models for P. vivax DXS and other Plasmodium species. We have predicted the probable regions of domain swapping in P. vivax DXS and have attempted to build the homology model based on the template of Transketolase for the putative domain swapped P. vivax structure. We have used energy minimization to refine our homology models. To validate our homology model, we have docked compounds from the NCI Diversity Set