Structural Investigation and In-silico Characterization of Plasmepsins from Plasmodium falciparum

Malaria is one of the most important parasitic diseases of humans; it affects approximately one hundred countries and threatens half of the world's population. The Plasmodium aspartic proteases called plasmepsins play a vital role in providing nutrients to the malaria parasite, which makes these proteins excellent drug targets. In this study, we carried out comparative protein modeling, active site analysis and structural analysis of all ten plasmepsins from Plasmodium falciparum. We report in-silico structure modeling, characterize the plasmepsin structures and propose functional information for them. The phylogenetic analysis and disulfide linkages indicate that plasmepsins I to IV and HAP have similar structural and functional properties, whereas plasmepsins IX and X and plasmepsins VI to VIII belong to separate clusters. The integral membrane protein plasmepsin V has a distinct functional characterization compared with the other aspartic proteases from Plasmodium falciparum. Overall, the study highlights the need for good models to understand structure-function relationships and to design potent small-molecule inhibitors targeting all ten plasmepsins, in particular plasmepsin V as an important target.

Citation: Nair DN, Singh V, Angira D, Thiruvenkatam V (2016) Structural Investigation and In-silico Characterization of Plasmepsins from Plasmodium falciparum. J Proteomics Bioinform 9: 181-195. doi:10.4172/jpb.1000405. Volume 9(7) 181-195 (2016). J Proteomics Bioinform, ISSN: 0974-276X. JPB, an open access journal.

Materials and Methodology

Sequence retrieval and alignment
The sequence of plasmepsin from P. falciparum was obtained from the protein sequence database of NCBI (GenBank Id: AAB41811). The genome ID and genome localization of each plasmepsin were retrieved from PlasmoDB [26].
PlasmoDB is a functional genomics database for malaria parasites. Multiple sequence alignments were carried out using CLUSTAL X [27]. Identical and similar amino acids are shaded or colored.

In-silico physico-chemical characterization
For physico-chemical characterization, the theoretical isoelectric point (pI), molecular weight, extinction coefficient and instability index [31] were computed using the ExPASy ProtParam server [32].
Table 1: Genome localization, gene ID and physicochemical features of each plasmepsin.

| Name of protein | Gene ID | Chromosome number | Genomic localization | Protein accession number | Amino acid length | Molecular weight | Isoelectric point | Ext. coefficient | Instability index |
|---|---|---|---|---|---|---|---|---|---|
| Plasmepsin I | PF3D7_1407900 | 14 | Pf3D7_14_v3: 288,297 - 289,655 (+) | P39898.2 | 452 | 51260.9 | 6.72 | 55030 | 28.47 |
| Plasmepsin II | PF3D7_1408000 | 14 | Pf3D7_14_v3: 293,471 - 294,832 (+) | P46925.1 | 453 | 51489.7 | 5.42 | 55155 | 37.4 |
| HAP | PF3D7_1408100 | 14 | Pf3D7_14_v3: 297,468 - 298,823 (+) | CAB40630.1 | 451 | 51693.2 | 8.04 | 55030 | 37.62 |
| Plasmepsin IV | PF3D7_1407800 | 14 | Pf3D7_14_v3: 283,086 - 284,435 (+) | AAW71463.1 | 448 | 50933.2 | 5.3 | 54000 | 30.06 |
| Plasmepsin V | PF3D7_1323500 | 13 | Pf3D7_13_v3: 975,403 - 977,175 (+) | AAW71468.1 | 590 | 68480.3 | 7.7 | 88225 | 37.05 |
| Plasmepsin VI | PF3D7_0311700 | 3 | Pf3D7_03_v3: 502,698 - 505,848 (+) | XP_001351190.1 | 432 | 49432.9 | 7.56 | 38320 | 43.68 |
| Plasmepsin VII | PF3D7_1033800 | 10 | Pf3D7_10_v3: 1,351,197 - 1,353,284 (+) | AAN35526.1 | 450 | 52328.1 | 8.26 | 67980 | 33.75 |
| Plasmepsin VIII | PF3D7_1465700 | 14 | Pf3D7_14_v3: 2,658,255 - 2,660,911 (-) | AAN37238.2 | 385 | 44254.2 | 9.1 | 48165 | 37.65 |
| Plasmepsin IX | PF3D7_1430200 | 14 | Pf3D7_14_v3: 1,188,349 - 1,191,466 (+) | AAN36894.1 | 627 | 74183 | 9.25 | 69860 | 41.52 |
| Plasmepsin X | PF3D7_0808200 | 8 | Pf3D7_08_v3: 416,344 - 418,065 (-) | XP_001349441.1 | 573 | 65114 | 5.35 | | |
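The extinction coefficient column can be approximated directly from amino acid composition. The sketch below is illustrative only: it assumes the commonly used 280 nm values of 1490 (Tyr), 5500 (Trp) and 125 (per cystine) M^-1 cm^-1, which are not stated in this paper, and it assumes by default that all cysteines are paired. The function name and the toy sequence are hypothetical.

```python
# Molar extinction coefficients at 280 nm in water (M^-1 cm^-1).
# These are assumed standard values, not figures taken from this paper.
EXT_TYR = 1490
EXT_TRP = 5500
EXT_CYSTINE = 125


def extinction_coefficient(sequence: str, cysteines_paired: bool = True) -> int:
    """Estimate the 280 nm extinction coefficient from residue counts."""
    seq = sequence.upper()
    n_tyr = seq.count("Y")
    n_trp = seq.count("W")
    # Each disulfide (cystine) is formed from two cysteine residues.
    n_cystine = seq.count("C") // 2 if cysteines_paired else 0
    return n_tyr * EXT_TYR + n_trp * EXT_TRP + n_cystine * EXT_CYSTINE


if __name__ == "__main__":
    toy = "MYWCCAY"  # hypothetical peptide, not a plasmepsin sequence
    print(extinction_coefficient(toy))
    print(extinction_coefficient(toy, cysteines_paired=False))
```

Because Trp contributes far more per residue than Tyr or cystine, a sequence rich in Trp tends to dominate this estimate.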


Introduction
Malaria is a life-threatening disease caused by Plasmodium parasites, which are transmitted to humans through the bites of infected Anopheles mosquitoes. Transmission occurs mainly in tropical and subtropical regions, and the mosquitoes are most active between dusk and dawn [1]. About 20 different Anopheles species are of global importance [2]. Transmission is more intense in places where the mosquito lifespan is longer, because the parasite has time to complete its development inside the mosquito, and where the mosquito prefers to bite humans rather than other animals [3,4]. Resistance to antimalarial medicines is a recurring problem. In recent years, parasite resistance to artemisinins has been detected in five countries. If resistance to artemisinins develops and spreads to other large geographical areas, the public health consequences could be dire [5,6]. The World Health Organization (WHO) recommends the routine monitoring of antimalarial drug resistance and supports countries in strengthening their efforts in this important area of work [7]. Effective drugs are a critical component of malaria control. The selection of Plasmodium sp. parasites resistant to multiple drugs calls for accelerated efforts to develop new anti-malarial drugs targeting novel essential parasite pathways [8]. In humans, the disease is the result of infection by Plasmodium falciparum (Pf), Plasmodium malariae, Plasmodium ovale or Plasmodium vivax. Plasmodium knowlesi is widely regarded as the fifth human malaria parasite. Of these species, Plasmodium falciparum is the most lethal and is considered an important target for drug intervention [9][10][11][12][13].
During the intra-erythrocytic stage of infection, the malaria parasite Plasmodium falciparum digests most of the host cell hemoglobin. Hemoglobin (Hb) degradation is essential for the growth of the malarial parasites. The degradation process, which occurs inside an acidic digestive vacuole, is thought to involve the action of aspartic proteases of Plasmodium, termed plasmepsins (PMs) [14][15][16]. The plasmepsins perform a crucial role in the provision of nutrients for the red cell stages of the malaria parasite and thus make excellent drug targets [17]. Inhibition of aspartic proteases helps to kill parasites in human red blood cells in culture. Proteases are known to play an important role in numerous pathways and represent potent drug targets for several chronic infectious diseases. Hence, aspartic proteases are considered important drug targets.
The P. falciparum genome encodes a group of 10 aspartyl proteases called plasmepsins, which were initially discovered through study of the hemoglobin digestion pathway in malaria patients and have been strongly considered as potential anti-malarial drug targets. Plasmepsins I, II, IV and HAP are expressed in the erythrocytic stages of the life cycle of P. falciparum and are localized in the food vacuole [18]. Plasmepsin V is an integral membrane protein present in the endoplasmic reticulum of the parasite, suggesting a role in protein processing within the parasite [19][20][21]. Plasmepsins V, IX and X are expressed concurrently with plasmepsins I to IV but are not transported to the digestive food vacuole. The remaining plasmepsins VI-VIII are expressed during the exo-erythrocytic cycle and their functions are unknown [22,23]. For several years, the structure-based drug design of antimalarial compounds targeting P. falciparum and the plasmepsin inhibitors have received much attention due to their potential therapeutic use [24,25]. Our current study presents a wide range of in-silico modeling and structure analysis of all ten plasmepsins. This may provide a good foundation for designing potential anti-malarial drugs targeting plasmepsins.
Also, to make the strongest and most effective drug for the therapy of malaria infection, it will be necessary to optimize the binding of compounds to the most critical enzyme in the parasite. Based on our study, we have shown that plasmepsin V is a key enzyme in the parasite, and its inhibitors will provide a basic foundation for the further development of malarial drugs based on plasmepsin V.

Fold-recognition and domain analysis
The domain composition of the plasmepsins was analyzed using SMART, the Simple Modular Architecture Research Tool [28], in combination with the Pfam database [29]. The plasmepsins were classified using the PRED-CLASS server [30].

Functional characterization
The eukaryotic and viral aspartyl protease active site was predicted using PROSITE [33,34]. The SOSUI server was employed to identify the nature and function of the protein. The transmembrane regions of the plasmepsins were predicted using HMM-TM (Prediction of Transmembrane Alpha-Helical Proteins). N-glycosylation sites were predicted with the NetNGlyc 1.0 server, which uses artificial neural networks [35,36]. Intrinsically disordered regions were predicted by FoldIndex© [37] and GLOBPLOT 2 [38]. FoldIndex is a graphic web server that discriminates between folded and intrinsically unfolded proteins. It defines the mean net charge, |<R>|, as the absolute value of the difference between the numbers of positively and negatively charged residues at pH 7.0, divided by the total number of residues, and the mean hydrophobicity, <H>, as the sum of all residue hydrophobicities on the Kyte/Doolittle scale, rescaled to a range of 0-1, divided by the total number of residues. Subcellular localization of the proteins, based on amino acid composition, was predicted with the Phobius predictor [39]. Secondary structure elements were predicted using the PSIPRED protein sequence analysis workbench [40]. The presence of disulfide bridges was analyzed using the DiANNA web server [41] and DISULFIND [42].
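The two FoldIndex quantities defined above are simple to compute by hand. The following sketch is a rough illustration, not the server's implementation: it treats Lys and Arg as positive and Asp and Glu as negative at pH 7.0, and it combines the two means with the boundary 2.785<H> - |<R>| - 1.151, which is taken here as an assumption from the published FoldIndex description rather than from this paper. Function names and test sequences are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydrophobicity(seq: str) -> float:
    """<H>: Kyte-Doolittle values rescaled from [-4.5, 4.5] to [0, 1], averaged."""
    scaled = [(KYTE_DOOLITTLE[aa] + 4.5) / 9.0 for aa in seq.upper()]
    return sum(scaled) / len(scaled)


def mean_net_charge(seq: str) -> float:
    """|<R>|: absolute difference of positive and negative residues per residue."""
    seq = seq.upper()
    positive = seq.count("K") + seq.count("R")  # assumed charged at pH 7.0
    negative = seq.count("D") + seq.count("E")
    return abs(positive - negative) / len(seq)


def fold_index(seq: str) -> float:
    """Assumed FoldIndex boundary: positive suggests folded, negative unfolded."""
    return 2.785 * mean_hydrophobicity(seq) - mean_net_charge(seq) - 1.151


if __name__ == "__main__":
    for toy in ("AAAA", "KKKK", "AKDELVIG"):  # hypothetical toy peptides
        print(toy, round(fold_index(toy), 3))
```

The real server applies this kind of score over a sliding window along the sequence; the whole-sequence version here only shows how the two averages enter the score.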

Phylogenetic analysis
The sequences were aligned with Clustal X version 2.0 using default options, and a phylogenetic tree was constructed by the bootstrap neighbor-joining method [43] using Molecular Evolutionary Genetics Analysis (MEGA) software version 4.1 [44]. The stability of internal nodes was assessed by bootstrap analysis with 10,000 replicates.
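The bootstrap step can be pictured as resampling alignment columns with replacement and recomputing pairwise distances for each replicate. The sketch below covers only that resampling step with a simple p-distance; it does not build a neighbor-joining tree and is not how MEGA is implemented. The function names and the toy alignment are hypothetical.

```python
import random


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, skipping columns with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


def bootstrap_alignment(alignment: dict, rng: random.Random) -> dict:
    """Resample alignment columns with replacement, keeping the length unchanged."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


def bootstrap_distances(alignment: dict, a: str, b: str,
                        replicates: int, seed: int = 0) -> list:
    """Distance between sequences a and b in each bootstrap replicate."""
    rng = random.Random(seed)
    distances = []
    for _ in range(replicates):
        rep = bootstrap_alignment(alignment, rng)
        distances.append(p_distance(rep[a], rep[b]))
    return distances


if __name__ == "__main__":
    toy = {"seqA": "MKLF-DTGSS", "seqB": "MKLFADSGSS", "seqC": "MRIYAHTGSA"}
    dists = bootstrap_distances(toy, "seqA", "seqB", replicates=100)
    print(min(dists), max(dists))
```

In a full analysis a tree would be built from every replicate, and the bootstrap value of a node would be the fraction of replicate trees that contain that grouping.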

Homology modeling, refinement and validation
The three-dimensional (3D) structure of plasmepsin V was built by homology modeling using the PHYRE2 Protein Fold Recognition Server [45]. The Swiss PDB viewer and PyMOL were used to visualize and refine the models. The quality of the obtained model was validated using PROCHECK [46], ERRAT [47] and PROVE [48,49] from "SAVES: Meta Server Structure Analysis" under the NIH MBI Laboratory Server. The model was also analyzed with SuperPose [50] and the DALI server [51]. The obtained 3D model was stereo-chemically evaluated on the RAMPAGE server [52], which provides a score based on proline and glycine preferential positions according to a Ramachandran plot. The ProSA-web server was used to predict the Z-score of the modeled structure [53].

Docking studies using AutoDock/Vina
Molecular docking protocols are widely used for predicting the binding affinities of ligands. Protein and ligand preparation, including the PDBQT files, and grid box generation were done using the graphical user interface program AutoDock Tools (ADT). ADT assigned polar hydrogens, united atom Kollman charges, solvation parameters and fragmental volumes to the protein. The prepared files were saved in PDBQT format. AutoGrid was used to prepare the grid map using a grid box; the grid size was set to 28 × 32 × 24 xyz points and the grid center was designated at dimensions (x, y, and z): 4.055, 45.931.554 and 18.066. A scoring grid is calculated from the ligand structure to minimize the computation time. AutoDock/Vina was employed for docking, using the protein and ligand information along with the grid box properties in the configuration file; in docking, both the protein and the ligands are considered rigid. Results differing by less than 1.0 Å in positional root-mean-square deviation (RMSD) were clustered together and represented by the result with the most favorable free energy of binding. The pose with the lowest energy of binding, or binding affinity, was extracted and aligned with the receptor structure for further analysis.
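The clustering rule described above (poses within 1.0 Å RMSD grouped together, each group represented by its most favorable energy) can be sketched as a greedy procedure. This is a minimal illustration under stated assumptions, not the AutoDock implementation: poses are lists of (x, y, z) tuples with identical atom ordering, no superposition or symmetry handling is done, and all names and coordinates are hypothetical.

```python
import math


def rmsd(pose_a, pose_b) -> float:
    """Positional RMSD between two poses with the same atom ordering."""
    total = sum((a - b) ** 2
                for atom_a, atom_b in zip(pose_a, pose_b)
                for a, b in zip(atom_a, atom_b))
    return math.sqrt(total / len(pose_a))


def cluster_poses(poses, energies, cutoff: float = 1.0):
    """Greedy clustering: visit poses from lowest to highest energy.

    Each pose joins the first cluster whose representative lies within
    the cutoff; otherwise it starts a new cluster. The first index of
    every cluster is therefore its lowest-energy representative.
    """
    order = sorted(range(len(poses)), key=lambda i: energies[i])
    clusters = []
    for i in order:
        for members in clusters:
            if rmsd(poses[i], poses[members[0]]) < cutoff:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


if __name__ == "__main__":
    # Hypothetical two-atom poses and energies (kcal/mol).
    poses = [
        [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
        [(0.1, 0.0, 0.0), (1.1, 0.0, 0.0)],
        [(5.0, 0.0, 0.0), (6.0, 0.0, 0.0)],
    ]
    energies = [-7.0, -8.0, -6.0]
    print(cluster_poses(poses, energies))
```

Sorting by energy first means the representative of each cluster is chosen before any of its neighbors are assigned, which matches the idea of reporting the most favorable member.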

Results and Discussion
The ten plasmepsin sequences from P. falciparum were retrieved in FASTA format from the NCBI database and analyzed using bioinformatics tools. The genome localization, gene ID and physicochemical features of each plasmepsin are presented in Table 1. In P. falciparum 3D7, the plasmepsin I-IV, VIII and IX genes are located on chromosome 14, whereas plasmepsins V, VI, VII and X are located on chromosomes 13, 3, 10 and 8 respectively [54]. The plasmepsin protein sequences vary in length from about 380 aa to 630 aa, with correspondingly variable molecular weights. The isoelectric point or isoionic point (pI) is the pH at which the net charge on a protein is zero, so that the protein shows no mobility in an electric field. The computed pI values of plasmepsins I, II, IV and X (pI<7) indicate an acidic nature, whereas the other plasmepsins (V, VI, VII, VIII, IX and HAP), with pI>7, have a basic nature. The pI is an important property to know for experimental molecular biology, especially for 2D gel electrophoresis and isoelectric focusing. The high extinction coefficient of plasmepsin V indicates a high content of Cys, Trp and Tyr in its sequence. The computed extinction coefficients help in the quantitative study of protein-protein and protein-ligand interactions in solution. The instability index provides an estimate of the stability of a protein; a protein whose instability index is smaller than 40 is predicted to be stable [55,56]. The instability index of plasmepsins VI, IX and X was computed by the server to be above 40, which predicts that these proteins may be unstable.
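The two thresholds used in this paragraph (pI below or above 7, instability index below or above 40) can be applied mechanically to the Table 1 values. The sketch below copies the pI and instability index columns from Table 1; plasmepsin X is left out because its instability index is missing from the table as reproduced here. The function and variable names are hypothetical.

```python
# (isoelectric point, instability index) copied from Table 1.
# Plasmepsin X is omitted: its instability index is not listed in the table.
TABLE_1 = {
    "Plasmepsin I": (6.72, 28.47),
    "Plasmepsin II": (5.42, 37.4),
    "HAP": (8.04, 37.62),
    "Plasmepsin IV": (5.3, 30.06),
    "Plasmepsin V": (7.7, 37.05),
    "Plasmepsin VI": (7.56, 43.68),
    "Plasmepsin VII": (8.26, 33.75),
    "Plasmepsin VIII": (9.1, 37.65),
    "Plasmepsin IX": (9.25, 41.52),
}


def classify(pi: float, instability: float):
    """Apply the thresholds from the text: pI < 7 is acidic, index < 40 is stable."""
    nature = "acidic" if pi < 7 else "basic"
    stability = "stable" if instability < 40 else "unstable"
    return nature, stability


if __name__ == "__main__":
    for name, (pi, index) in TABLE_1.items():
        nature, stability = classify(pi, index)
        print(f"{name}: {nature}, predicted {stability}")
```

Run on these values, the classification reproduces the grouping stated in the text for the nine proteins with complete rows.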
A protein may have one or more functional regions called domains, which perform specific biochemical functions [57]. In our study, we performed a conserved domain analysis using the Pfam database and observed that all the plasmepsins contain the eukaryotic aspartic protease (ASP) domain. In addition to the ASP domain, plasmepsin V also contains the xylanase inhibitor N-terminal (Taxi_N) domain. Xylanase inhibitor N-terminal domains are mostly present in plants, where their major function is to create the catalytic pocket necessary for cleaving xylanase [58,59]. Plant xylanase inhibitor proteins (XIPs) inhibit fungal xylanase activity during rice blast fungal attack and are believed to act as a defensive barrier against fungal pathogens [60,61]. The presence of the Taxi_N domain along with the ASP domain in plasmepsin V is an interesting characteristic that the other plasmepsins do not have, and it marks plasmepsin V as an important macromolecule for further structure and function analysis. The domain organization, bit score and E-value of each domain search are shown in Table 2.
In the multiple sequence alignment (MSA), plasmepsins I, II, IV and HAP share more than 60% sequence similarity, whereas the other plasmepsins are more diverse in sequence. In the MSA analysis, the active site region residues, Asp-Thr/Ser-Gly-Ser, were found to be conserved in all the plasmepsins except HAP (Figure 1). In HAP, a histidine residue is present in the active site instead of aspartic acid. The SCOPE database shows that plasmepsins are pepsin-like proteins that fall in the pepsin_retropepsin superfamily. The multiple sequence alignments show two conserved motifs, and some short conserved regions were also found. The active site of aspartic protease was searched against the plasmepsins for motif annotation using PROSITE. These motifs or important conserved regions might help in inferring protein structure and sequence evolution history. Based on our analysis, all the plasmepsin sequences contain two distinct active site motifs of 10-12 amino acids in length, as shown in Table 3 and Figure 1. The HAP protein contains only one motif, which is present in the C-terminal region of the amino acid sequence. The malaria parasite Plasmodium uses plasmepsins to degrade hemoglobin in the red blood cells. It has been shown experimentally that plasmepsins I to V and HAP are capable of cleaving native hemoglobin as well as denatured globins at an optimized pH, but the activity of the other plasmepsins has not yet been demonstrated. These results indicate that all the plasmepsins have a similar active site for aspartic protease activity.
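A scan for the Asp-Thr/Ser-Gly-Ser signature described above can be written as a short regular expression search. This is a simplified sketch: the pattern D[TS]GS is only the four-residue core named in the text, not the full PROSITE aspartic protease pattern, and the two sequences are hypothetical toy peptides rather than plasmepsin sequences.

```python
import re

# Asp-Thr/Ser-Gly-Ser as described in the text; this is a simplification
# and not the full PROSITE active site pattern.
ACTIVE_SITE = re.compile(r"D[TS]GS")


def find_active_site_motifs(sequence: str):
    """Return (1-based start position, matched residues) for every hit."""
    return [(m.start() + 1, m.group())
            for m in ACTIVE_SITE.finditer(sequence.upper())]


if __name__ == "__main__":
    # Hypothetical toy sequences: the second replaces the first Asp with His,
    # mimicking the substitution reported for HAP.
    canonical = "MKLFDTGSSNLWVAAIDSGSTLI"
    hap_like = "MKLFHTGSSNLWVAAIDSGSTLI"
    print(find_active_site_motifs(canonical))
    print(find_active_site_motifs(hap_like))
```

In the toy HAP-like sequence only the second, C-terminal hit survives, which mirrors the observation that HAP retains a single motif.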
The predictions of subcellular localization, disulfide linkage and glycosylation are shown in Figure 2. The subcellular localizations predicted by the online server Phobius revealed that plasmepsins I to V, IX and X are all transmembrane proteins. Plasmepsin V is an integral membrane protein located in the endoplasmic reticulum of the parasite. Plasmepsins VI, VII and VIII contain no transmembrane region but have a signal peptide region at their N-terminus. This indicates that the plasmepsins expressed in the erythrocytic cycle have a transmembrane domain, while those expressed in the exo-erythrocytic cycle have a signal peptide instead. Only plasmepsin V is predicted to have two transmembrane-spanning regions, which sets this protein apart from the other plasmepsins.
Disulfide bridge formation may play a major role in the thermostability, functionality and structural stability of proteins. We identified the cysteine residues and the disulfide bridges using various online tools such as the DiANNA server, DISULFIND and CYS_REC. The possible pairings and patterns, with their probabilities, indicated that plasmepsins I-IV and VI-VIII have two disulfide bridges. Plasmepsins IX and X have four disulfide bridges, whereas plasmepsin V has five disulfide bridges. The disulfide architecture of plasmepsin IX is closely similar to that of plasmepsin V. The analysis also demonstrated that these proteins are glycosylated; most glycosylated proteins are known to be involved in various cellular biological functions such as protein folding, cell-cell interactions, cell recognition and host-pathogen interactions.
Predictors of protein disorder and unfoldability have proven useful in advancing our understanding of disordered regions, with the potential to improve the success rate of structural genomics efforts. The formation of protein crystals can be obstructed by the presence of highly flexible and disordered regions. Understanding disordered regions through structural bioinformatics allowed us to extract and analyze patterns associated with these regions, which can further help to overcome several potential bottlenecks in structural genomics and X-ray crystallography. The unfoldability and disorder prediction of the plasmepsins shows that plasmepsin IX has a highly unfolded structure with six disordered regions [62]. The predicted secondary structure composition of the plasmepsins was determined using the NPS@ server and the GOR IV method, which gives the percentage of alpha helix, beta sheet and TM helix present. The results revealed that beta sheets dominate the secondary structure at 40%, followed by alpha helices at 15-20%. The disordered regions and the percentage of secondary structure of each plasmepsin are shown in Table 4.
To better understand the evolutionary relationships among the plasmepsins from P. falciparum, a phylogenetic analysis was performed using MEGA 5.1 (Figure 3). The evolutionary history was evaluated using the Neighbor-Joining method with the bootstrap technique. This technique allows the strength of the branching pattern of the tree to be evaluated. The bootstrap value given at the node of each branch indicates the probability that the relationship is correct, such that a value above 50% represents higher confidence in the relationship and vice versa. The phylogenetic tree separates into four distinct clusters with varied bootstrap values. Plasmepsins I, II, IV and HAP are present in a single cluster. This cluster I contains the plasmepsins that are located in the acidic food vacuole and are active during the intra-erythrocytic phase of the life cycle. These proteins share more than 60% identity in sequence, structure and function and fall together in a single cluster. The second cluster contains plasmepsins IX and X; these two proteins are known to be expressed concurrently with plasmepsins I-IV, but their sequence identity is less than 30%. The third cluster contains plasmepsins VI, VII and VIII, which are expressed within the sporogonic cycle in the mosquito and are functionally different from the other clusters of plasmepsins. Plasmepsin V forms a separate cluster, indicating that this protein is functionally different from the other plasmepsins. It is an integral membrane protein present in the endoplasmic reticulum of the parasite and helps to export hundreds of proteins from the parasite to the host cell. These studies thus indicate that each cluster is separated based upon its occurrence and localization.
For several years, the structure-based drug design of anti-malarial compounds targeting plasmepsins and their inhibitors has received much attention due to their potential biomedical use. The X-ray crystallographic structures of plasmepsins I, II, IV and HAP are well known, but the other plasmepsin structures are not known. The structures of plasmepsins I-IV and HAP were downloaded from the PDB database. These four structures were superposed, which highlights that they have a similar secondary structure fold when compared to that of eukaryotic aspartic proteases. The single amino acid chain of each of these plasmepsins folds in a topologically similar way, consisting of two major beta-hairpins in the domain region along with two catalytic aspartic acid residues present in the N- and C-terminal parts of the beta-hairpin structure. The PDB structure of plasmepsin II (1SME) with its disulfide linkage and active site residues is shown in Figure 4a.
The unknown structures of the other plasmepsins (plasmepsins V, VI, VII, VIII, IX and X) were predicted using the SWISS-MODEL and PHYRE structure prediction servers. Plasmepsin V was modeled on the X-ray crystallographic structure of plasmepsin V from Plasmodium vivax, which served as the template (4ZL4) [63]. The regions missing in the crystal structure of plasmepsin V of Plasmodium vivax were also modeled in the built structure of plasmepsin V from Plasmodium falciparum (Figure 4d). The other plasmepsins VI-X were modeled against the PDB structure of cathepsin D (4OBZ), the most closely related human aspartic protease. All the single-chain modeled structures except plasmepsin V were further superimposed over the crystal structure of cathepsin D from Homo sapiens [64] (Figure 4c). Plasmepsin V was superposed against its homolog from Plasmodium vivax (Figure 4d), and we also superposed the structures of all the known plasmepsins, plasmepsins I-IV, together (Figure 4b). All the superposed models showed a similar topological fold, which indicates that the ten plasmepsins share the fold of the pepsin-like superfamily.
Figure 4: b) The superposed structures of all the known plasmepsins, where green, yellow, magenta and cyan represent plasmepsin I, II, IV and HAP respectively. The red circle represents the cavity of the proteins. c) The superposed structure of HAP (green) against the modeled structures of plasmepsin VI (cyan), VII (yellow), VIII (magenta), IX (brown) and X (gray). d) The superposed structure of plasmepsin V (green) over the structure from Plasmodium vivax (red); the blue circle indicates the extra regions that are not present in the structure of plasmepsin V from Plasmodium vivax.

Table 5: The checking and validation of protein structures during and after the model refinement of plasmepsins V-X using various programs available in SAVES. The RMSD value of each model is predicted from the DALI server and the Z-score is predicted using the PROSA web server. The total number of outlier protein atoms is predicted from PROVE, and the averaged 3D-1D score is calculated from VERIFY 3D using the modeled structures. The number of residues in the favored, allowed and outlier regions is checked using the Ramachandran plot.

The quality and reliability of the modeled structures were checked by several structural assessment methods such as Z-score, RMSD and Ramachandran plot. The Z-score and the RMSD value of each modeled 3D structure were predicted using the PROSA web server and the Dali server, and the results are tabulated in Table 5. The Dali server returns a list of pre-computed structural neighbors sorted by Z-score; Z-scores lower than 2 are considered spurious. The RMSD has often been used to measure how well a computational method reproduces a known (i.e., crystallographic) binding pose; the RMSD value of each model was less than 1, which indicates good models. The Ramachandran plot provides an easy way to view the distribution of torsion angles of a protein structure [65]. It also provides an overview of allowed and disallowed regions of torsion angle values, serving as an important indicator of the quality of protein 3D structures. Based on the Ramachandran plot, almost all the plasmepsin models have 90% of their residues in the allowed region, whereas plasmepsin V has 93% of its residues in the allowed region. This indicates the degree of correctness of the modeling of the plasmepsin proteins. The models were further checked by VERIFY3D; the analysis revealed the number of amino acids that scored ≥ 0.2 in the 3D/1D profile, and the results are shown in Table 5. All the structure validation results indicate that plasmepsin V has a better modeled structure than the other plasmepsin models. This summary may be useful for biologists seeking a good crystallographic structure and aiming to exploit the pre-modeled structures for docking analysis.
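The torsion angles plotted in a Ramachandran plot are dihedral angles defined by four consecutive backbone atoms (phi: C of the previous residue, N, CA, C; psi: N, CA, C, N of the next residue). The sketch below shows one common way to compute such an angle from coordinates; it is an illustration only, not the code used by RAMPAGE or PROCHECK, and the helper names and test coordinates are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3) -> float:
    """Dihedral angle in degrees defined by four points, in (-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Components of b0 and b2 perpendicular to the central bond b1.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Hypothetical coordinates: a cis, a perpendicular and a trans arrangement.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))
```

Computing phi and psi for every residue and counting how many pairs fall inside the favored and allowed regions gives the percentages quoted for each model.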
All the plasmepsins of P. falciparum are aspartic proteases, which constitute one of the major protease subclasses. Each subclass is further distinguished on the basis of structural homology. Most of the aspartic proteases, including the plasmepsins and cathepsin D, are members of the pepsin family. Most of the known aspartic proteases have well-defined subsite pockets that bind the inhibitor pepstatin. These subsites are located on both sides of the catalytic site and are found only in eukaryotes. On the basis of X-ray crystallography, it is well known that pepstatin binds to the enzyme binding subsites of plasmepsin I-IV in an extended beta-strand conformation. The structures also contain a beta-hairpin turn, which covers the binding cavity and helps the enzyme to interact with substrates and inhibitors. The substrate binding cavity and its interacting residues are similar in plasmepsin I-IV and HAP. The co-crystal structure of plasmepsin II (PDB ID: 1W6I) comprises 329 amino acid residues, and its catalytic binding site contains 1 non-polar residue (Val 78), 5 polar residues (Gly 36, Asn 76, Ser 79, Tyr 192 and Ser 218) and 2 negatively charged residues (Asp 34 and Asp 214) (Figure 5a). The ligand pepstatin formed five H-bonds with the catalytic residues (Figure 5b). The other amino acid residues Ser 37, Tyr 77, Gly 216, Thr 217, Phe 241, Leu 242 and Ile 290 were also involved in hydrophobic interactions with pepstatin.
The modeled structures of all the plasmepsins were superposed over plasmepsin II and were then used for docking studies. The binding sites of each model were predicted based on the alignment with the binding residues of plasmepsin II. Molecular docking studies were performed with an automated docking tool, Autodock Vina, which works by the Lamarckian Genetic Algorithm [66]. The docking score between ligand and model predicts the strength and the binding activity of a binding complex. The docking score of pepstatin with each plasmepsin model was calculated (Table 6). The docking scores of pepstatin with plasmepsin I-IV and plasmepsin VII-X were below -7, whereas HAP and plasmepsin VI had binding scores of -6.5 and -6.6, respectively. The weakest binding score, -5.1, was observed with plasmepsin V, indicating that plasmepsin V has a low binding affinity towards pepstatin. The weaker binding of pepstatin to plasmepsin V as compared to the other plasmepsins might be due to the low volume of the cavity caused by the loop located at the cleft of the binding pocket. We further designed some small molecules similar to pepstatin and used them as ligands for docking studies. The shape of the binding pocket of plasmepsin V (Figure 6a) was observed to complement the shape or pose of the ligand, and the grid was made accordingly for the docking studies (Figure 6b). Compared to the peptide molecules, the designed small molecules showed more binding affinity towards plasmepsin V (Table 7). All the molecules except M2 produced a docking score of less than -8 kcal/mol, and M2 showed an affinity of -7.7 kcal/mol. The molecule M9 showed the lowest binding score, with two hydrogen bonds involving Gly 367 and Asp 11. The molecules M5 and M7 each formed a hydrogen bond with Gly 367 of the model. The possible interacting residues of the plasmepsin V model with each ligand are shown in Figure 7 and Table 7.
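To give a rough sense of scale for these scores, a docking affinity expressed in kcal/mol can be converted into an apparent dissociation constant through the relation ΔG = RT ln Kd. The sketch below is illustrative only: it assumes that a docking score can be read as a binding free energy at 298.15 K, which is at best a crude approximation, and the function name `estimated_kd` is hypothetical. The four scores are the ones quoted in the text above.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def estimated_kd(delta_g_kcal, temperature_k=298.15):
    """Apparent dissociation constant (mol/L) from a binding free energy,
    using delta_G = R * T * ln(Kd). Assumption: the docking score is
    treated as delta_G, which docking programs only approximate."""
    return math.exp(delta_g_kcal / (R_KCAL * temperature_k))

# Docking scores (kcal/mol) quoted in the text
scores = {
    "plasmepsin V / pepstatin": -5.1,
    "HAP / pepstatin": -6.5,
    "plasmepsin VI / pepstatin": -6.6,
    "plasmepsin V / M2": -7.7,
}

# List complexes from strongest (most negative) to weakest predicted binding
for label, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{label}: {score:.1f} kcal/mol, apparent Kd ~ {estimated_kd(score):.2e} M")
```

Because the relation is exponential, a difference of about 1.4 kcal/mol corresponds to roughly a tenfold change in apparent Kd at room temperature, so the gap between -5.1 and -7.7 kcal/mol is substantial if the scores are taken at face value.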
These results might be helpful for structure-based inhibitor design for plasmepsin V. This study opens a wide area of research focusing on the synthesis of these molecules for in vivo studies, which may also lead to the design of new inhibitors for plasmepsin V.
The overall study of plasmepsins and the experimentally known results indicate that these proteins play an extensive role in the survival of the parasite inside the host cell. Plasmepsin I-IV are situated in the acidic food vacuole and are active only during the intra-erythrocytic phase of the life cycle. Since they are involved in the degradation of host hemoglobin, their inhibition is considered to be a good anti-malarial strategy. Plasmepsin VI-VIII are expressed within the sporogonic cycle in the mosquito, but not much research has been done on their function. Hence, these plasmepsins, which are not expressed in the intra-erythrocytic stage, may not be suitable for designing drugs against malaria at this point of time. Plasmepsin IX and X, in contrast, are expressed concurrently with plasmepsin I-IV, but are not transported to the food vacuole. Their large molecular weight, disordered regions and failure to yield a properly folded structure are major obstructions for designing good inhibitors against them. Plasmepsin V, present in the endoplasmic reticulum of the parasite, helps to export hundreds of proteins into the host cell to remodel the erythrocyte surface. The preliminary docking results in this article help in better understanding of plasmepsins and also open a wide area of research focusing on structural characterization and on the design and synthesis of small molecules that are potent candidates for anti-malarial therapy with respect to plasmepsin inhibition.

Conclusion
The present study is focused on the in-silico characterization and structural investigation of plasmepsins from Plasmodium falciparum. The domain prediction revealed that all the plasmepsins contain a eukaryotic aspartic protease domain and have a conserved active site pattern in their N- and C-terminal regions. A physicochemical characterization was performed by computing the theoretical isoelectric point (pI), molecular weight, extinction coefficient and instability index, together with the prediction of disulfide linkages, motif profiles, sub-cellular localization and disordered regions using various servers. The fold of the modeled structures was compared with the known three-dimensional structures of plasmepsins, and the models were validated using the Ramachandran plot, PROCHECK and WHAT IF. The docking studies of the plasmepsins were done with their known inhibitor pepstatin. We also designed some small molecule inhibitors for plasmepsin V, which may serve as a good foundation for designing new inhibitors for plasmepsins (Table 8).