Identification of Novel Hepatitis C Virus NS3-4A Protease Inhibitors by Virtual Screening Approach

Hepatitis C Virus (HCV) is a leading cause of cirrhosis and liver cancer worldwide. The replication and viral polyprotein maturation of HCV crucially depends on the cleavage of the polyprotein precursor into 10 viral proteins. The NS3-4A serine protease cleaves the nonstructural region of the polyprotein at four out of five junctions, thus is a promising target for the development of antiviral inhibitors. There are many inhibitors of HCV NS3/4A protease in the clinical trial and improvement indicating significant reduction in the viral infection rate. However, most PIs develop resistance associated variants while treatment and are restricted to one or two HCV genotypes. The second generation PI, MK-5172, is the only exception, which potently inhibits most variants associated with resistance to first generation PIs and is pan-genotypic. In this study, we investigated the potent lead compound(s) based on similarity search using the most potent proved protease inhibitor, MK-5172. We have performed virtual screening techniques using PubChem database available in NCBI to identify lead like molecules. The database has yielded 32 hits for 95% similarity search and the pharmacokinetic analysis (ADME) was performed for screened compounds. This structure based drug design identified three lead compounds that can work better against NS3/4A protease. Identification of Novel Hepatitis C Virus NS3-4A Protease Inhibitors by Virtual Screening Approach


Introduction
Hepatitis C virus (HCV) infects an estimated 200 million people worldwide. About 3% of the world's population is chronically infected with HCV and 3-4 million people are newly infected each year [1], often leading to cirrhosis, hepatic failure, and hepatocellular carcinoma. Hepatitis C virus (HCV) is an enveloped RNA virus which belongs to the Flaviviridae family and is the unique member of the Hepaci virus genus with at least 6 genotypes and numerous subtypes [2,3]. Efficient vaccine against HCV does not exist and the only available standard therapy is a combination of pegylated interferon-α (IFN-Peg) and ribavirin, efficient in only 50% of patients chronically infected [4]. Though, IFN α/Ribavirin helps in controlling the viral outbreak inside hosts, these are indirect antivirals and do not target a specific HCV protein or RNA element. Moreover, patients undergoing interferonbased therapies experience significant adverse effects, including flu-like symptoms, anemia, and depression. It is therefore of major importance to develop anti-virals acting directly on the viral protein. Most importantly, direct-acting antiviral agents (DAA) have the potential to improve Sustained virologic response (SVR) rates and minimize treatment duration.
HCV RNA genome encodes a polyprotein precursor which contains structural proteins (C, E1, E2 and p7) and non-structural proteins (NS1, NS2, NS3, NS4A, NS4B, NS5A and NS5B). Nonstructural proteins are involved in the replication of HCV genome and the building up of virions. Among them, NS2 and NS3 protease are essential enzymes for polyprotein processing. Therefore, they are the potential targets for screening anti HCV compounds [5]. NS3 protease activated by cofactor NS4A causes the cleavage of polyprotein at four junctions, producing the nonstructural proteins 4A, 4B, 5A and 5B and is therefore complementary for the replication of virus [6,7]. Hence, the direct role of NS3/4A protease in viral replicate machinery is proved an important therapeutic target for the cure of hepatitis C [8]. The active site configuration of NS3 protease comprises the residues His-57, Asp-81, and Ser-139 [9]. Many inhibitors of HCV NS3/4A protease are in the clinical trials and improvement; indicating significant reduction in the viral load of patients [10]. However, the active site for protease inhibitors is a long shallow groove and even a single-point mutation is sufficient to hinder the binding of inhibitors. Therefore, pipelines of compounds certainly essential to meet the drug resistance situations. In silico screening has proven to be useful to meet the challenges of antiviral drug discovery. Large virtual compound libraries are filtered by different computational screening methods such as docking, ligandbased similarity searches or pharmacophore based screening. This approach is helpful in reducing the number of candidate molecules to a smaller set of promising candidates that are then tested biologically. Since years, computational techniques like virtual screening have proven to be of great use to make the drug development process faster and less expensive [11].
In the past decades, a lot of work in the field of drug discovery and optimization for antiviral has been done using computational techniques [12][13][14]. Further development in computational techniques (protease inhibitors) which should have unique mechanisms of action, higher potency, pharmacokinetics, novel binding properties, improved mutant potency profile and pan-genotype activity (effective against all HCV genotypes (1 to 7)); with the help of virtual screening techniques considering the existing most efficient drug in the pipeline as the reference compound.

Methodology
Preparation of data set HCV NS3/4A protease structures were obtained from Protein Data Bank [15]. Currently, RCSB Protein Data Bank (PDB) is presented with about 108 structures of NS3-4A protease of HCV both in complex and free forms which provide a valuable source for the development of novel and potential drug against HCV. The native and mutant (A156T) structures of NS3-4A protease were taken from RCSB protein data bank (PDB) [16]. The corresponding PDB codes are 2P59 and 3SUG respectively. MK-5172 was used as the protease inhibitor for the ligand-based virtual screening of lead compounds in our investigation. The SMILES (Simplified Molecular Input Line Entry System) strings of the drug and lead compounds such as MK-5172, CID 58428446, CID 71276250 and CID 71276290 were collected from PubChem database and submitted to CORINA in order to generate 3D structures [17,18]. The three-dimensional structure of target proteins (2P59 and 3SUG) and lead compounds were energy-minimized using GROMACS package 4.5.3 [19] adopting the GROMOS43a1 force field parameters before performing the computational analysis.

Virtual Screening
Virtual Screening is an important tool in computer-assisted drug discovery and requires prior biological information to predict active compounds. It is the computational cognate of biological screening and is popularly used for lead identification in pharmaceutical research [20]. VP reduces the massive virtual chemical space of small organic molecules, to screen against a specific target protein, to a manageable set of promising compounds that exhibit the highest chance to be a lead compound [21]. The PubChem database (http://www.ncbi.nlm. nih.gov/pccompound) was used for searching similarity based lead compounds by employing the MK-5172 as query [22]. MK-5172 is a P2-P4 macrocyclic competitive inhibitor of NS3/4A protease with a broad HCV genotypic activity high mutant potency profile. It exhibits excellent selectivity over other serine proteases and shows improved inhibitory potency. Its molecular formula is C29H38N4O7 with a molecular weight of 554.63 and a density of 1.33. Structural likeness increases the chance to share a common bioactive profile so ligandbased similarity methods are preferred. Selecting compounds similar to known drugs increases the possibility of choosing a potential lead [21]. Based on a structural similarity search among small molecules, it is possible to retrieve compounds containing identical substructures that share affinity to the same receptors [23].The number of molecules found in the database after 95% similarity search is around 32 compounds. The candidate compounds were further screened using molecular docking studies, bioavailability and ADME analysis.

ADME and Toxicity
Molecular properties viz, membrane permeability, polar surface area and bioavailability are always identified with some basic molecular descriptors such as logP (partition coefficient), molecular weight (MW), TPSA or counts of hydrogen bond acceptors and donors in a molecule [24]. These molecular properties were used in devising ''rule of five'' [25]. The rule states that most molecules with drug likeliness and good membrane permeability have MW ≤ 500, calculated Octanol-water partition coefficient, log P ≤ 5 and hydrogen bond donors ≤ 5, acceptors ≤ 10 [26]. Therefore, Lipinski's Rule of Five was used to test the bioavailability characteristics such as adsorption, distribution, metabolism, elimination (ADME) of the lead compounds. In this study, these molecular properties and bioactivity for all the lead compounds were estimated using MOLINSPIRATION program (http://www.molinspiration.com/cgi-bin/properties) [27]. Successful drug discovery requires high-quality lead structures which may need to be more drug-like than commonly accepted [28]. Toxicity and poor pharmacokinetics should be eliminated in the early stages of drug discovery. Hence, the hits were further screened using drug-likeliness, toxicity characteristics, and drug score. These physico-chemical properties were thus calculated for the filtered set of hits using the program OSIRIS Property Explorer (http://www.organic-chemistry. org/prog/peo/) [29].
The OSIRIS program calculates the drug-likeliness based on a list of about 5,300 distinct substructure fragments created by 3,300 traded drugs as well as 15,000 commercially available chemicals yielding a complete list of all available fragments with associated drug-likeliness. The drug score combines drug-likeliness, cLogP, logS, MW, and toxicity risks as a total value which may be used to compute the compound's drug-score and its overall potential to qualify for a drug [29].

Molecular docking
The process of docking is involved with specification of ligand binding site in a receptor molecule and then docking the candidate ligands into the specified site. The lead compounds obtained from the ligand-based VS analysis were used in docking calculation and binding energy estimation. The SMILES strings were used for constructing the 3D-structures of all lead compounds using CORINA software. After docking, the compounds were ranked based on the geometric matching score with target proteins. The geometric matching score of MK-5172 with the target proteins (native and mutant structures) were used as reference for filtering the lead compounds. The energy minimized structures of NS3-4A protein were used as a template molecule to dock the screened inhibitors. In this study, rigid docking analysis was performed by means of Patchdock program (http://bioinfo3d.cs.tau. ac.il/PatchDock/) [30].
It is geometry-based molecular docking algorithm defined on shape-complementary principles. The PatchDock algorithm divides the Connolly dot surface representation [31] of the molecules into concave, convex, and flat patches. Then, complementary patches are matched to generate candidate transformations. Each candidate transformation is further evaluated by a scoring function that considers both geometric fit and atomic desolvation energy [32].
Finally, root mean square deviation (RMSD) clustering is applied to the candidate solutions to discard redundant solutions. The input parameters for the docking are the PDB coordinate file of the protein and ligand molecule. This algorithm has three major stages: (i) molecular shape representation, (ii) surface patch matching (iii) filtering and scoring. The refinement and rescoring of the docking solutions from Patch Dock was performed using Fire Dock. It consists of two main steps: (i) rearrangement of the interface side-chains and (ii) adjustment of the relative orientation of the molecules. The method accounts for the observation that most interface residues that are important in recognition and binding do not change their conformation significantly after complex process. It restricts sidechain movements and thus manages to reduce the false-positive rate noticeably. It is worth to mention that Fire Dock prediction results are comparable to current state-of-the-art refinement methods [33]. The service is available at http://bioinfo3d.cs.tau.ac.il/FireDock/.

Molecular dynamics simulation
GROMACS package 4.5.3 was used to perform simulations for the native and mutant NS3-4A protease-MK-5172 as well for CID 71276250 molecule [34]. Gromos 43a1 force field was implemented with GROMACS package 4.5.3. With the help of periodic boundary conditions and the SPC water model, the protein was solvated in cubic 0.9 nm [35]. PRODRG server was used to generate topology of the ligand. This server uses GROMOS force field for generating topology file and assigning atom types [36]. Seven Chlorine (7 CLions) counter ions were added to neutralize the total charge of the system. 1000 steps of steepest descent energy minimization were carried out for the proteins. After energy minimization step, the system was equilibrated at constant temperature and pressure. The system was equilibrated at constant temperature of 300 K and at the constant pressure of 1 atm. Using an atom-based cutoff of 8 Å, the non-bonded list was generated. The long range electrostatic interactions were handled by particlemesh Ewald algorithm [37] whereas constrains bond lengths at their equilibrium values were handled by SHAKE algorithm handled long range electrostatic interactions and constrain bond lengths at their equilibrium values respectively [38]. The total simulation time was set to 1000 ps with integration time step of 2 fs. Trajectories were stored in traj.trr file and structural analysis was done at every picoseconds. Root mean square deviation (RMSD) was analyzed through Gromacs utilities g_rms.

Results and Discussion
We obtained the crystal structure of NS3-4A protease (2P59) and its mutant (3SUG) from PDB database. It is believed that similar structure compounds may have similar function.
Most importantly, the importance of virtual screening is highlighted number of times in the recent literatures [39][40][41][42]. Therefore, 95% similarity was applied to filter the candidate compounds. In silico ligand-based Virtual Screening result indicates that 32 compounds were identified for NS3-4A protease inhibition, similar to the currently active drug molecule, MK-5172. Many drug candidates fail in the clinical trials; reasons are unrelated in the potency against the intended drug target. Bioavailability, pharmacokinetics and toxicity issues are blamed for more than half of all failure in the clinical trials of HCV drugs. Therefore, it is essential to evaluate the oral bioavailability, pharmacokinetics and toxicity of small molecules.

Bioavailability analysis
The molecular properties and bioactivity for the candidate compounds was anticipated using Molinspiration program. The LogKow program [43] estimates the log octanol/water partition coefficient (logP) of organic chemicals and drugs by an atom/ fragment contribution method [44]. Molecular polar surface area (TPSA) is calculated based on the methodology as a sum of fragment contributions. PSA has been shown to be a very good descriptor characterizing drug absorption, including intestinal absorption, blood-brain barrier penetration and bioavailability. The two important predictors of oral bioavailability of drug molecules are Log P value and PSA values [45]. Thus, we calculated log P and PSA values along with other physiochemical properties such as molecular mass, the number of hydrogen bond acceptors, and the number of hydrogen bond donors for the all the 32 candidate compounds obtained from ligand-based VS. It is believed that lesser the nviolations (number of violations of Lipinski's Rule of Five) better is the drug molecule. The results showed that only 3 molecules (CID 58428446, CID 71276250 and CID71276290) had minimum 1violations, though MK-5172 shows 2 violations; which confirms that these molecules act as better drug molecules against NS3-4A protease (Table1).

Toxicity and physicochemical properties
The toxicity and drug properties of molecules were predicted using OSIRIS Property Explorer. It analyzes the drug likeness and thus the pharmacokinetic property by considering parameters like toxicity risks, PSA, clogP and solubility of the candidate compounds. clogP is a well-established measure of the compound's hydrophilic and therefore low logP/high hydrophilic values may indicate good absorption or permeation properties. It has been shown for compounds exhibiting reasonable probability of being well absorbed; their logP value must not be greater than 5.0. Drug solubility is an important factor that affects the movement of a drug from the site of administration into the blood. It is known that insufficient solubility of drug can lead to poor absorption [25]. The estimated log S value is a unit stripped logarithm (base ten) of a compound's solubility measured in mol/liter. Table 2 shows properties of all the candidate compounds. It is clear from the table that three compounds such as (CID 58428446, CID 71276250 and CID71276290) may fulfill the pharmacokinetics and could be considered as lead molecules for HCV NS3 protease inhibition. The toxicity risk predictor locates fragments within a molecule, which indicate a potential toxicity risk. Toxicity risk alerts are an indication that the drawn structure may be harmful concerning the risk category specified. From the data evaluated in Table 2, indicates that all the compounds including the standard MK-5172 possess some irritant property excluding only the compound 71276250.

Drug-Likeliness and Drug-Score
Drug-likeness is an important parameter because drug-like molecules exhibit favorable absorption, distribution, metabolism, excretion, toxicological (ADMET) parameters. Currently, there are many approaches to assess a compound drug-likeness based on topological descriptors, fingerprints of molecular drug-likeness structure keys or other properties such as clog P and MW. In this study, Osiris program [29] was used for calculating the fragment based drug-likeness of the lead compounds and comparing them with MK-5172. The drug score combines drug-likeliness, miLogP, log S, MW, and toxicity risks in one convenient value than may be used to judge the compound's overall potential to qualify for a drug. It is interesting to note that CID 71276250 have significantly higher drug-score value (0.27), compared to MK-5172 (0.13) and other lead compounds. Furthermore, almost all the compounds including MK-5172 shows irritant property with the only exception of this compound showing no toxicity risks at all. This result indicates that CID 71276250 has distinct property in comparison with other lead molecules considered in our study. The toxicity, drug-likeness, and drug-score results for the compound CID 71276250 are illustrated in Figure 1.

Molecular docking analysis
Protein-ligand docking was performed to gain an insight into the binding affinity of lead compounds to the target protein. In the docking study, we employed Patchdock algorithm along with Fire Dock refinement. It is believed that this combined approach of docking analysis is certainly helpful for the elimination of false-positive in our prediction. The docking result is shown in Figure 2. It is clear from the

Molecular dynamics simulation
RMSD details were analyzed using GROMACS package 4.5.3. RMSD is the measure of the deviation of the mutant structure from their native structure. RMSD analysis can give an idea of how much the three-dimensional structure has deviated over time. Figure 7 shows the stable binding of CID 71276250 with respect to MK-5172. MK-5172 has deviated more than of CID 71276250 molecule. Native MK-5172 acquired ~0. 20

Conclusion
Although there is no specific vaccine for HCV till now and the available standard therapy of Peg-IFN/Ribavirin is associated with many side effects (including depression, headache, fever, flu like symptoms, hemolytic anemia), there has been substantial progress in the research work and clinical improvement of novel antiviral drugs. NS3-4A protease is recognized as an important target because of its central role in the viral replication and survival inside patients and in confounding the innate immune response to viral infection. Developing inhibitors of the protease that can be orally available, have reduced toxicity and side effects, improved SVR, high mutant potency, pangenotypic and available at low cost is the most challenging aspect. Study shows that currently, MK-5172 is the drug that satisfies the requisites to most extent and also has reduced cross-resistance evidences. Our research based on structure based drug design and similarity search, identified three lead compounds. Thus, we conclude that the inhibitors computationally studied here may be potent drug candidates and their potency may be increased against HCV NS3/4A protease with relatively simple structural changes. Further investigation of this molecule using experimental approaches would be an interesting future direction.