O-Vanillin and Some of its Novel Schiff Bases: A Cheminformatic Approach to Identify their Biological Functions

Joseph VA1,2#, Georrge JJ3#, Pandya JH4 and Jadeja RN1* 1Department of Chemistry, Faculty of Science, The MS University of Baroda, Vadodara, Gujarat, India 2Department of Chemistry, Christ College, Rajkot, Gujarat, India 3Department of Bioinformatics, Christ College, Rajkot, Gujarat, India 4Department of Chemistry, Shree DKV Arts and Science College, Jamnagar, Gujarat, India #These authors are considered as Joint first authors


Introduction
A huge amount of protein sequence and chemical structure data has been generated by sequencing centres and synthetic chemistry laboratories as a result of the rapid development of Bioinformatics and Cheminformatics. This interdisciplinary approach saves both time and money in identifying novel drug molecules for known protein and nucleic acid targets through high-throughput and virtual screening of chemical compounds. The problem of having a huge number of chemical molecules with unknown biological functions was addressed after 1990, when many chemical-diversity-related approaches such as structural descriptor computation, structural similarity algorithms, diverse compound selection, classification algorithms and library enumeration were developed. To complement these, many filtering techniques to discriminate toxic from non-toxic compounds were also developed along the Cheminformatics timeline [1][2][3][4].
Current medicinal chemistry research draws heavily on polypharmacology, a multi-target approach in which a single chemical molecule interacts with multiple targets instead of a single one. For example, many of the most effective drugs currently in use, such as Gleevec (an anti-cancer drug), serotonin reuptake inhibitors (psychiatric drugs) and aspirin (an anti-inflammatory drug), act upon multiple targets rather than through a one-to-one mechanism [5][6][7][8].
Computational target fishing is a novel approach that predicts the biological function of small molecules by identifying their interacting proteins. It involves various chemoinformatics tools, databases, and machine learning algorithms [9][10][11]. Many previous studies have demonstrated the prediction of multi-target small molecules [12] through the target fishing approach [13][14][15]. The current study focuses on predicting biological targets, through the target fishing approach, for o-vanillin and some of its novel Schiff bases, which were synthesized by our group [16][17][18]. The obtained results were confirmed via docking studies.

Materials and Methods
The seven synthesised O-vanillin molecules were used to carry out this Cheminformatics study. The molecules and their two-dimensional structures are shown in Figure 1.

Ligand preparation
Ligand preparation involves the addition of hydrogens, 2D to 3D conversion, correction of bond lengths and bond angles, and energy minimisation with correct chiralities, ionization states, tautomers, stereochemistries and ring conformations. All molecules were prepared using the LigPrep module of Schrodinger [19].

Drug likeliness property prediction
The QikProp module of Schrodinger was used to predict the drug likeliness of the prepared molecules. It predicts a wide variety of relevant properties such as LogP, MDCK, HERG, Lipinski's rule of 5 and many others [20]. The molecules that passed QikProp screening were exported and taken for further analysis.
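As an illustration of one of the listed filters, Lipinski's rule of 5 can be sketched as a count of threshold violations. This is a minimal sketch and not QikProp's implementation: the `Descriptors` class and the function name are hypothetical, the descriptors are assumed to be precomputed elsewhere, and the logP value used for o-vanillin is a placeholder chosen only for the example.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed molecular descriptors (hypothetical container)."""
    mol_weight: float   # molecular weight in g/mol
    log_p: float        # octanol/water partition coefficient
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-5 thresholds are exceeded."""
    checks = [
        d.mol_weight > 500,
        d.log_p > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ]
    return sum(checks)


# o-Vanillin, C8H8O3: one phenolic OH donor, three oxygen acceptors.
# The log_p value is an assumed placeholder, not a measured or QikProp value.
o_vanillin = Descriptors(mol_weight=152.15, log_p=1.4, h_donors=1, h_acceptors=3)
violations = lipinski_violations(o_vanillin)
```

A molecule is commonly treated as drug-like under this rule when it has at most one violation.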

Toxicity prediction
The ToxPredict tool was used to predict the toxicity level of the molecules obtained from the previous stage. Toxicity is the adverse effect of a chemical compound on biological functions. ToxPredict reports on several toxicity parameters, including carcinogenicity, mutagenicity, human toxicological hazards and 20 other parameters [21].

Biological function prediction
A diverse range of publicly available Bioinformatics and Cheminformatics tools and databases was used to identify the protein targets and biological activity of the molecules screened in the previous steps. Because each tool has limited accuracy, several tools were employed to obtain more reliable results.

Abstract
Ortho-Vanillin (2-Hydroxy-3-methoxybenzaldehyde) is an organic solid present in the extracts and essential oils of many plants. Its functional groups include aldehyde, ether and phenol. In recent years, ortho-vanillin has mostly been used in the study of mutagenesis and as a synthetic precursor for pharmaceuticals. The current study focuses on predicting biological targets, through the target fishing approach, for o-vanillin and some of its novel Schiff bases, which were synthesized by our group. Various tools and databases were employed to identify the biological function of the synthesized O-vanillin derivatives, and the obtained results were confirmed through docking and molecular dynamics simulation studies.
The Therapeutic Target Database (TTD) was used with a Tanimoto coefficient cut-off of 0.85 to obtain drugs similar to the query compounds. TTD is a constantly updated database that provides information on potential targets and the corresponding approved, clinical trial and analytical drugs. It contains more than 2300 targets, including 388 successful and 461 clinical trial targets; 20600 drugs, including 2003 approved and 3147 clinical trial drugs; 20,000 multitarget agents against almost 400 target-pairs; and the activity data of 1400 agents against 300 cell lines [22].
SuperTarget was also used to explore drug-target relationships, with a similarity cut-off of 0.9 [23]. Another resource used was STITCH, which has been developed to provide comprehensive protein-chemical interaction information from various metabolic pathways, crystal structures and binding experiments. The predicted functional interaction partners were obtained from STITCH [24].
The DRAR-CPI server was used to identify the interacting proteins. It is a server for identifying drug repositioning potential and adverse drug reactions via the chemical-protein interactome. On submission of a scaffold, the interactions of the molecule across the targetable proteins, together with their PDB ID, function, and docking score, are displayed [25]. The next resource is ChemMapper, a free web server for computational drug discovery based on the concept that compounds sharing high 3D similarity may have relatively similar target association profiles. It integrates more than 305000 chemical structures with pharmacology annotations from commercial and public chemical catalogues, including BindingDB, ChEMBL, DrugBank, Kyoto Encyclopedia of Genes and Genomes (KEGG) and Protein Data Bank (PDB). ChemMapper performs 3D similarity searching, ranking, and superposition. The query molecule is aligned with each target compound in the database, the 3D similarity scores are calculated, and the most similar structures are returned. Based on this result, a chemical-protein network is constructed and a random walk algorithm is used to compute the probabilities of interaction between the query structure and the proteins associated with the hit compounds. These potential protein targets are ranked by the standard score of the probabilities. ChemMapper can be useful in a variety of polypharmacology, drug repurposing, chemical-target association, virtual screening, and scaffold hopping studies [26]. From ChemMapper, the protein targets hit with a score value of one were considered.
All of the resources above perform a similarity search based on the input structure of the ligand molecules. Pharmacophore-based similarity search is a new method to find similar molecules and their targets with higher accuracy. A pharmacophore is defined as a 3D structural feature that illustrates the interaction of a ligand molecule with a target receptor in a specific binding site [27]. For this purpose, PharmMapper was used. It is an online target identification tool based on pharmacophore mapping. It has a large in-house pharmacophore database extracted from all the targets in TargetBank, DrugBank, BindingDB and the Potential Drug Target Database (PDTD). Over 7,000 receptor-based pharmacophore models are stored and accessed by PharmMapper. PharmMapper finds the best mapping poses of the query molecules against all the targets in PharmTargetDB and returns the top potential drug targets, together with the respective aligned poses of the molecule, ranked by z-score. Targets with a z-score greater than three were retained, since larger z-scores are more significant [28].
Extensive manual analysis was carried out to obtain accurate, consensus final protein targets for each ligand molecule across all tools.
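The Tanimoto coefficient behind such a cut-off is T = c / (a + b - c), where a and b are the numbers of fingerprint features in the two molecules and c is the number they share. The following is a minimal sketch under stated assumptions: fingerprints are represented as sets of integer feature indices, and the library entries (`drug_A`, `drug_B`, `drug_C`) are invented toy data, not TTD records. TTD's own fingerprint scheme is not described in the text.

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto coefficient between two fingerprints given as feature sets."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


CUTOFF = 0.85  # similarity threshold quoted in the text

# Toy fingerprints (hypothetical feature indices)
query = {1, 4, 7, 9, 12, 15, 18}
library = {
    "drug_A": {1, 4, 7, 9, 12, 15, 18},      # identical feature set
    "drug_B": {1, 4, 7, 9, 12, 15, 21},      # 6 shared of 8 distinct features
    "drug_C": {1, 4, 7, 9, 12, 15, 18, 20},  # 7 shared of 8 distinct features
}

# Keep only library entries at or above the cut-off
hits = {
    name: tanimoto(query, fp)
    for name, fp in library.items()
    if tanimoto(query, fp) >= CUTOFF
}
```

With these toy sets, `drug_B` scores 0.75 and is rejected, while `drug_A` (1.0) and `drug_C` (0.875) are retained.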
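The idea of scoring targets by a random walk over a chemical-protein network can be sketched with a generic random walk with restart. The text does not specify ChemMapper's network construction, edge weights or walk parameters, so everything below is an assumption: the node names, the similarity weights, the restart probability of 0.3 and the function names are all hypothetical.

```python
def build_graph(edges):
    """Build an undirected weighted adjacency map from (a, b, weight) tuples."""
    graph = {}
    for a, b, w in edges:
        graph.setdefault(a, {})[b] = w
        graph.setdefault(b, {})[a] = w
    return graph


def random_walk_with_restart(graph, start, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W p + r * p0 until the L1 change falls below tol.

    Every node must have at least one neighbour so that probability is conserved.
    """
    prob = {n: 1.0 if n == start else 0.0 for n in graph}
    for _ in range(max_iter):
        nxt = {n: (restart if n == start else 0.0) for n in graph}
        for node, p in prob.items():
            total_w = sum(graph[node].values())
            for nb, w in graph[node].items():
                nxt[nb] += (1.0 - restart) * p * w / total_w
        change = sum(abs(nxt[n] - prob[n]) for n in graph)
        prob = nxt
        if change < tol:
            break
    return prob


# Toy network: query -> similar hit compounds -> annotated targets
edges = [
    ("query", "hit1", 0.9),    # assumed 3D similarity scores
    ("query", "hit2", 0.6),
    ("hit1", "targetA", 1.0),  # known compound-target annotations
    ("hit2", "targetA", 1.0),
    ("hit2", "targetB", 1.0),
]
scores = random_walk_with_restart(build_graph(edges), start="query")
ranked_targets = sorted(
    (n for n in scores if n.startswith("target")),
    key=scores.get,
    reverse=True,
)
```

In this toy network `targetA` is reachable through both hit compounds and therefore ranks above `targetB`, which is reachable through only one.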
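The z-score cut-off can be illustrated with the standard definition z = (x - mean) / standard deviation. This is only a sketch: how PharmMapper derives its z-scores is not described here, and the fit scores and target names below are invented toy values.

```python
from statistics import mean, pstdev


def z_scores(fit_scores):
    """Standardize raw scores: z = (x - mean) / population standard deviation."""
    mu = mean(fit_scores.values())
    sigma = pstdev(fit_scores.values())
    if sigma == 0:
        return {name: 0.0 for name in fit_scores}
    return {name: (value - mu) / sigma for name, value in fit_scores.items()}


Z_CUTOFF = 3.0  # threshold quoted in the text

# Toy data: 20 background targets with similar fit scores and one clear outlier
fit_scores = {f"background_{i}": 2.0 + 0.1 * (i % 5 - 2) for i in range(20)}
fit_scores["candidate_hit"] = 6.0

z = z_scores(fit_scores)
selected = sorted(name for name, value in z.items() if value > Z_CUTOFF)
```

Only `candidate_hit` exceeds the cut-off in this toy set; the background targets all have z-scores close to zero.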
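The consensus step was done manually in this study. One way such a comparison could be automated is a simple vote count across tools, sketched below. The tool names come from the text, but the target labels are placeholders, the per-tool predictions are toy data, and the two-tool agreement threshold is an assumption.

```python
from collections import Counter


def consensus_targets(predictions, min_tools=2):
    """Return targets predicted by at least `min_tools` different tools."""
    votes = Counter(
        target
        for targets in predictions.values()
        for target in set(targets)  # each tool votes once per target
    )
    return sorted(t for t, n in votes.items() if n >= min_tools)


# Toy predictions (placeholder target names, not results from this study)
predictions = {
    "TTD": ["target_1", "target_2"],
    "SuperTarget": ["target_2"],
    "STITCH": ["target_2", "target_3"],
    "DRAR-CPI": ["target_1", "target_4"],
    "ChemMapper": ["target_4", "target_4"],
    "PharmMapper": ["target_2", "target_5"],
}
consensus = consensus_targets(predictions)
```

Duplicate hits from a single tool count once, so `target_4` is retained here because two tools report it, not because ChemMapper lists it twice.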

Protein structure retrieval
The three-dimensional structures of the finalized proteins, which probably bind the selected ligand molecules, were obtained from the Protein Data Bank (PDB) along with their co-crystallized inhibitors (reference ligands) [29].

Approved drugs and inhibitor retrieval
The available approved drugs and inhibitors of each target were used for comparative analysis in the docking procedure. The approved/experimental drugs for the proteins finalized in the previous steps were retrieved from DrugBank. DrugBank is a comprehensive, high-quality, freely accessible online database containing information on drugs and drug targets [30]. The inhibitors for each target were obtained from ChEMBL. ChEMBL is a database of bioactive drug-like small molecules; it contains small molecule structures, calculated properties (e.g., logP, molecular weight, Lipinski parameters, etc.) and abstracted bioactivities (e.g., binding constants, pharmacology and ADMET data) [31].

Protein-Ligand docking
The protein-ligand interactions were further confirmed by protein-ligand docking of the proteins obtained in the above steps with the respective chemical molecules. The molecular docking studies were carried out for each protein separately with its respective synthesised molecule, approved/experimental drugs, inhibitors obtained from ChEMBL, and the reference ligand co-crystallized in the PDB structure of the protein, using Molegro Virtual Docker (MVD). It has two docking search algorithms: MolDock Optimizer and MolDock Simplex Evolution (SE). MolDock Optimizer is the default search algorithm in MVD. To dock the receptor and ligand, the receptor was prepared using the "prepare molecule" option provided. Then, for grid searching, cavities were generated using the "detect cavity" option. Finally, the ligands and the targets obtained from the previous steps were provided in sdf file format for docking using the docking wizard. During docking, the following parameters were fixed: number of runs 10, population size 50, crossover rate 0.9, scaling factor 0.5, maximum iteration 2,000, and grid resolution 0.30 [32]. The obtained results were analysed in comparison with already reported inhibitors.
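The parameter names listed above (population size, crossover rate, scaling factor, maximum iterations, number of runs) are those of a differential evolution search. The sketch below shows how they interact in a generic DE/rand/1/bin loop. It is not MVD's MolDock Optimizer: the objective `toy_energy` is a simple quadratic stand-in for a docking score, the three "pose" variables and their bounds are invented, and the iteration and run counts are reduced for the example.

```python
import random


def differential_evolution(objective, bounds, pop_size=50, crossover=0.9,
                           scale=0.5, max_iter=2000, rng=None):
    """Generic DE/rand/1/bin minimiser; defaults mirror the quoted settings."""
    rng = rng or random.Random()
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(x) for x in pop]
    for _ in range(max_iter):
        for i in range(pop_size):
            # Pick three distinct individuals other than the current one
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one mutated coordinate
            trial = []
            for k in range(dim):
                if k == forced or rng.random() < crossover:
                    value = pop[a][k] + scale * (pop[b][k] - pop[c][k])
                    lo, hi = bounds[k]
                    value = min(max(value, lo), hi)  # clamp to search box
                else:
                    value = pop[i][k]
                trial.append(value)
            trial_score = objective(trial)
            if trial_score <= scores[i]:  # greedy selection
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def toy_energy(pose):
    """Stand-in objective with its minimum (0.0) at pose = (1, 1, 1)."""
    return sum((v - 1.0) ** 2 for v in pose)


bounds = [(-5.0, 5.0)] * 3
# Several independent runs, keeping the best result (3 runs x 200 iterations here)
runs = [
    differential_evolution(toy_energy, bounds, max_iter=200, rng=random.Random(seed))
    for seed in range(3)
]
best_pose, best_score = min(runs, key=lambda r: r[1])
```

The crossover rate controls how many coordinates of each trial solution come from the mutant vector, and the scaling factor sets the size of the difference step; repeating the search from different random seeds reduces the chance of reporting a poor local optimum.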

Visualization of results
PyMOL and Molegro Virtual Docker (MVD) were used to visualize the docked results. PyMOL is a powerful and comprehensive molecular visualization product for rendering and animating 3D molecular structures. Molegro Virtual Docker is an integrated platform for predicting protein-ligand interactions. It handles all aspects of the docking process, from preparation of the molecules to determination of the potential binding sites of the target protein and prediction of the binding modes of the ligands [32].

Molecular dynamics simulation studies
Molecular dynamics (MD) simulations are important tools for understanding the physical basis of the structure and function of biological macromolecules. The early view of proteins as relatively rigid structures has been replaced by a dynamic model in which the internal motions and resulting conformational changes play an essential role in their function. The atoms and molecules are allowed to interact for a fixed period of time, giving a view of the dynamical evolution of the system. In the most common version, the trajectories of atoms and molecules are determined by numerically solving Newton's equations of motion for a system of interacting particles, where the forces between the particles and their potential energies are calculated using interatomic potentials or molecular mechanics force fields [33]. The MD simulations of the best docked complexes were performed using the Desmond Molecular Dynamics module of Schrodinger, with the Optimized Potentials for Liquid Simulations (OPLS) all-atom force field 2005 [34,35]. The complexes were prepared before simulation with the protein preparation wizard. The prepared protein-ligand complexes were then solvated with an SPC water model in a triclinic periodic boundary box. To prevent interaction of the protein complex with its own periodic image, the distance between the complex and the box wall was kept at 10 Å. The energy of the prepared systems was minimized for 5000 steps using the steepest descent method or until a gradient threshold of 25 kcal/mol/Å was achieved. This was followed by L-BFGS (limited-memory Broyden-Fletcher-Goldfarb-Shanno quasi-Newton minimiser) until a convergence threshold of 1 kcal/mol/Å was met. For system equilibration, the default parameters in Desmond were applied. The equilibrated systems were then used for simulations at a temperature of 300 K and a constant pressure of 1 atm, with a time step of 2 fs.
Long-range electrostatic interactions were handled with the Smooth Particle Mesh Ewald method, whereas the Cutoff method was selected for the short-range electrostatic interactions, with a cut-off radius of 9 Å. Each system was run for 20 nanoseconds (ns) to observe the behaviour of the protein-ligand complex.
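To make "numerically solving Newton's equations of motion" concrete, the sketch below integrates a single particle on a harmonic spring with the velocity Verlet scheme, a commonly used MD integrator. It is an illustration only: the integrator Desmond actually uses is not stated in the text, the system is one-dimensional and in reduced units, and all names are hypothetical. The last lines simply work out how many 2 fs steps a 20 ns run contains.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate m * d2x/dt2 = force(x) and return the (x, v) trajectory."""
    f = force(x)
    trajectory = [(x, v)]
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # complete the velocity update
        trajectory.append((x, v))
    return trajectory


# Toy system in reduced units: unit mass on a spring with force constant 1
SPRING_K = 1.0
MASS = 1.0


def spring_force(x):
    return -SPRING_K * x


def total_energy(x, v):
    return 0.5 * MASS * v * v + 0.5 * SPRING_K * x * x


trajectory = velocity_verlet(x=1.0, v=0.0, force=spring_force,
                             mass=MASS, dt=0.01, n_steps=1000)
energy_start = total_energy(*trajectory[0])
energy_end = total_energy(*trajectory[-1])

# Number of integration steps for the run length and time step quoted in the text
run_length_fs = 20 * 1_000_000     # 20 ns expressed in femtoseconds
time_step_fs = 2
total_steps = run_length_fs // time_step_fs
```

The total energy stays nearly constant over the toy trajectory, which is the usual sanity check for an integrator and time step; a 20 ns run at 2 fs per step corresponds to ten million steps.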

Ligand preparation, drug likeliness and toxicity prediction
The drug likeliness of the prepared molecules was predicted with the QikProp module of Schrodinger. The ToxPredict tool was used to predict the toxicity level of all the molecules. The results revealed that all seven O-Vanillin molecules pass the drug likeliness and toxicity predictions.

Biological function prediction
Biological function prediction was carried out through similarity searches against drugs, drug-like compounds and small molecules, and through target similarity searches. The results of all the databases and tools were compared with each other to find accurate, consensus targets. The obtained targets are tabulated in Table 1 with their detailed information, which includes the PDB identification number and the disease involved. Four biological targets were predicted for o-vanillin.
Orotidine 5'-phosphate decarboxylase (ODCase) of Methanobacterium thermoautotrophicum (Archaea) was predicted as the target of vanillin V-4BR. It catalyzes the last step in the de novo pyrimidine biosynthesis pathway, converting OMP to UMP, which in turn serves as the source of all cellular pyrimidine nucleotides. ODCase continues to elicit keen interest not only because of its obvious importance in DNA and RNA synthesis, but also because of its role in cell growth and proliferation. In Methanobacterium thermoautotrophicum, controlling this enzyme is a rate-limiting step for methane production [36]. It does not have any inhibitors in the available chemical databases.
Prothrombin (coagulation factor II) was predicted as the target of vanillin MMM and V-2,4ME. Prothrombin is proteolytically cleaved to form thrombin in the coagulation cascade, which ultimately results in the reduction of blood loss [37,38]. Inhibition of prothrombin prevents blood coagulation, which is essential for many biochemical experiments and surgical procedures. Six approved drugs and many hundreds of inhibitors are available to deactivate prothrombin.
The genome polyprotein (RNA polymerase) of Hepatitis C Virus (HCV) was also predicted as a target of vanillin V-4BR. It is the RNA-dependent RNA polymerase, which replicates the viral RNA of HCV using the viral positive RNA strand as its template and catalyzes the polymerization of ribonucleoside triphosphates (rNTP) during RNA replication [39,40]. Hindering the activity of this RNA polymerase terminates the proliferation of Hepatitis C virus. Many inhibitors are available for this protein, but suitable drug molecules are still under clinical trial [41,42].
The human androgen receptor (AR), also known as NR3C4, was predicted as the target of vanillin MMM. It is activated by binding of either of the androgenic hormones testosterone or dihydrotestosterone [43]. Although AR is involved in many physiological functions, it is a critical mediator of prostate cancer promotion [44,45]. More than 21 approved and 23 experimental drugs target the androgen receptor [30].

Protein structure retrieval, Protein-Ligand docking and visualization
The three-dimensional structures of all proteins listed in Table 1 were retrieved from the Protein Data Bank (PDB). The protein-ligand interactions were further confirmed by protein-ligand docking using MVD. The docking results with the energy values of O-Vanillin are shown in Table 2, and the active site amino acid residues, along with the number of hydrogen bonds and their energy (kcal/mol), are shown in Table 3. Screenshots of the docking results of all three o-vanillin molecules are shown in Figures 2, 3 and 4.
O-vanillin V-4BR binds Orotidine 5'-phosphate decarboxylase (PDB ID: 1LP6) with a docking score of -90.52 kcal/mol, which is higher in energy than that of the reference (co-crystal) ligand Cytidine-5'-monophosphate (-96.77 kcal/mol) [36]. A total of 12 hydrogen bonds with a binding energy of -13.6 kcal/mol are involved in the binding of the reference ligand to the protein, but only 4 hydrogen bonds with -7.  [37,38] and the collected inhibitors from ChEMBL provide better docking score than the MMM and V-2,4ME, they failed in the drug likeliness property.

Molecular dynamics simulation studies
To analyse the stability and overall conformational changes of o-vanillin V-2,4ME with Prothrombin (PDB ID: 3C1K), the Root mean square deviation (RMSD), Root mean square fluctuations (RMSF) and the Protein-ligand contacts were studied through molecular dynamics simulation.

[Table 1 column headings: S No; Name of predicted target; PDB ID; Mechanism/Disease involved]

The RMSD of Prothrombin was steady throughout the simulation except between 10 and 14 ns, where slightly higher flexibility was observed. Overall, the RMSD of the protein ranged between 0.8 Å and 1.7 Å (Figure 5). As the values are within the acceptable range of 1-3 Å [34], the protein does not undergo large conformational changes. The ligand RMSD, in contrast, fluctuates strongly up to 4 ns and stabilizes thereafter (Figure 5). Because the ligand RMSD values at later times are similar to those of the protein, the ligand is not likely to diffuse away from its binding site.
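For reference, the RMSD of a frame relative to a reference structure is the square root of the mean squared atomic displacement. The sketch below assumes the frame has already been superimposed on the reference (the alignment step performed by MD analysis tools is omitted), and the coordinates are toy values in Å rather than data from this simulation.

```python
import math


def rmsd(coords, reference):
    """RMSD between two pre-aligned coordinate lists of (x, y, z) tuples."""
    if len(coords) != len(reference):
        raise ValueError("coordinate sets must contain the same number of atoms")
    squared = sum(
        (a - b) ** 2
        for p, q in zip(coords, reference)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(reference))


# Toy example: every atom displaced by exactly 1.0 Å along x
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
frame = [(x + 1.0, y, z) for x, y, z in reference]
frame_rmsd = rmsd(frame, reference)
```

Evaluating this quantity for every trajectory frame against the starting structure gives the kind of RMSD-versus-time curve discussed above.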
The Root Mean Square Fluctuation (RMSF) is useful for characterizing local changes along the protein chain and in the ligand atom positions. Only six amino acids show high flexibility (Figure 6a), with a maximum value of 1.6 Å. Throughout the simulation, the ligand V-2,4ME shows flexibility in the range of 0.9-2.5 Å (Figure 6b). The -O-CH3 group has the highest flexibility, at 2.5 Å.
Protein interactions with the ligand were monitored throughout the simulation. Most of the active site amino acids formed hydrophobic interactions and water bridges rather than hydrogen bonds and ionic interactions (Figure 7).
A timeline representation of the protein-ligand interactions and contacts is summarized in Figure 8. The top panel shows the total number of specific contacts the protein makes with the ligand over the course of the trajectory. The bottom panel shows which residues interact with the ligand in each trajectory frame. Some residues make more than one specific contact with the ligand, which is represented by a darker shade of orange, according to the scale to the right of the plot. The active site amino acids interact with the ligand for an average of 3.4 ns (17% of 20 ns). Most of them contact the ligand discontinuously. A schematic of the detailed ligand atom interactions with the protein residues is shown in Figure 9. Interactions that occur for more than 10.0% of the simulation time in the selected trajectory (0.00 through 20.00 ns) are shown. The -OH group of the ligand molecule contacts the amino acid Glu 217 through one water molecule and Gly 216 through two water molecules.
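The RMSF of an atom is the square root of its mean squared deviation from its own time-averaged position. A minimal sketch, assuming the frames are already aligned to a common reference and using toy coordinates in Å (not data from this simulation):

```python
import math


def rmsf(frames):
    """Per-atom RMSF over pre-aligned frames; each frame is a list of (x, y, z)."""
    n_frames = len(frames)
    n_atoms = len(frames[0])
    result = []
    for i in range(n_atoms):
        # Time-averaged position of atom i
        mean_pos = [sum(f[i][k] for f in frames) / n_frames for k in range(3)]
        # Mean squared deviation from that average
        msd = sum(
            sum((f[i][k] - mean_pos[k]) ** 2 for k in range(3))
            for f in frames
        ) / n_frames
        result.append(math.sqrt(msd))
    return result


# Toy trajectory: atom 0 is rigid, atom 1 oscillates by +/- 1.0 Å along x
frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]
fluctuations = rmsf(frames)
```

Whereas RMSD gives one value per frame, RMSF gives one value per atom or residue, which is why it highlights locally flexible regions such as the few residues and the ligand group noted above.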
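The occupancy figures quoted above can be understood as the fraction of trajectory frames in which a residue contacts the ligand. The sketch below is illustrative only: the per-frame contact sets are toy data, `Res_X` is a placeholder residue, and the way the simulation software defines a contact is not reproduced here.

```python
from collections import Counter


def contact_occupancy(frames):
    """Fraction of frames in which each residue contacts the ligand."""
    counts = Counter(residue for frame in frames for residue in frame)
    return {residue: n / len(frames) for residue, n in counts.items()}


OCCUPANCY_CUTOFF = 0.10  # interactions present for more than 10% of the time

# Toy trajectory of 10 frames; each frame lists the residues in contact
frames = (
    [{"Glu217", "Gly216"}] * 3
    + [{"Glu217"}] * 3
    + [{"Res_X"}]
    + [set()] * 3
)
occupancy = contact_occupancy(frames)
persistent = sorted(r for r, frac in occupancy.items() if frac > OCCUPANCY_CUTOFF)

# Converting an occupancy into contact time for a 20 ns run
average_contact_ns = 0.17 * 20
```

In the toy data `Res_X` appears in exactly 10% of the frames and is therefore excluded by the strict "more than 10%" criterion, while an occupancy of 17% over 20 ns corresponds to the 3.4 ns average contact time quoted in the text.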

Conclusion
The current study adopted a novel ligand-based approach for cases in which the ligand information is available but the biological activity is unknown. Various tools and databases were employed to identify the biological function of the synthesized O-vanillin derivatives. Finally, the binding modes of the synthesized molecules with their predicted targets were studied by docking. Biological functions were predicted and confirmed for the O-vanillin derivatives V-4BR, MMM and V-2,4ME, which may help control methane production in archaea and prevent blood coagulation by inhibiting prothrombin. The docking studies show that the ligand molecules have better binding capacity with the respective proteins through hydrogen bond interaction. But, in the molecular dynamic studies reveals that, to understand the hydrogen