QSAR Models for Prediction of Binding and Inhibitory Properties of [(E)-2-R-vinyl]benzene Derivatives with Therapeutic Effects against Helicobacter pylori

Helicobacter pylori is a Gram-negative bacterium that infects the luminal surface of the human gastric epithelium. About half of the world's population is thought to be infected by this bacterium, which can cause diseases such as peptic ulcer and gastric cancer. Eradication of Helicobacter pylori is becoming increasingly difficult because of resistance to common antibiotics. In previous work we showed that an essential protein, flavodoxin, is a target for the development of novel, specific antibiotics against infection by this microorganism, and we described compounds sharing the [(E)-2-R-vinyl]benzene scaffold that exhibit bactericidal activity against Helicobacter pylori cultures. Based on the affinity and activity of 24 such compounds, we have now developed QSAR models for affinity and minimal inhibitory concentration that will guide the improvement of antibacterial compounds based on the [(E)-2-R-vinyl]benzene scaffold. The two models show high statistical correlation and predictive capacity. Discovering novel chemicals with specific antimicrobial properties against Helicobacter pylori, and presumably unaffected by existing resistance, will be of great help for the treatment of the diseases associated with this bacterium.

*Corresponding author: Javier Sancho, Departamento de Bioquímica y Biología Molecular y Celular, Facultad de Ciencias, Universidad de Zaragoza, Pedro Cerbuna 12, 50009, Zaragoza, Spain, Tel: (+34) 976 761286; Fax: (+34) 976 762123; E-mail: jsancho@unizar.es

Received November 08, 2013; Accepted December 11, 2013; Published December 14, 2013

Citation: Galano JJ, Sancho J (2013) QSAR Models for Prediction of Binding and Inhibitory Properties of [(E)-2-R-vinyl]benzene Derivatives with Therapeutic Effects against Helicobacter pylori. Med chem 4: 306-312. doi:10.4172/21610444.1000157

Copyright: © 2013 Galano JJ, et al.
This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.


Introduction
Classical Quantitative Structure-Activity Relationship (QSAR) analysis, as we know it today, emerged in the first half of the 1960s with the early work of Hansch and Fujita [1] and the Free-Wilson method [2]. These were followed by the modification introduced by Ban [3] and by the later approaches proposed by Kubinyi [4-10] and Topliss [11]. QSAR has been supported by a wealth of synthetic organic techniques, by computing technologies and by the discovery of many potential therapeutic targets. It has since become a central tool for chemists, biologists and biochemists seeking to optimize active compounds for use as drugs against a variety of diseases.
Helicobacter pylori (Hp) is a Gram-negative bacterium that establishes lifelong infections in the gastric mucosa of infected people [12,13]. About 50% of the global population is currently believed to be infected by Hp. Prevalence is lower in Europe and the United States and much higher in developing countries [14]. In infected individuals, the presence of Hp in the gastric mucosa is associated with the development of diseases such as type B gastritis, chronic peptic ulcer and gastric neoplasia.
The standard treatment used worldwide against Hp is known as triple therapy: a proton pump inhibitor together with two broad-spectrum antibiotics, clarithromycin plus either amoxicillin or metronidazole [15]. Unfortunately, it has lost efficacy because of increasing antibiotic resistance. No new drugs to treat Hp infection have been developed in recent years, but some selective targets have been identified [16], one of them being the protein flavodoxin (Hp-Fld). Hp-Fld is an electron carrier essential for Hp viability [17,18]. In previous work [19,20] it was used as a target for the discovery of novel compounds with potential as drugs against Hp. Four different biological responses (BR), or biological activities, were assayed for those compounds, most of which derive from the common substructure shown in Figure 1. From those data we have compiled a database of 24 congeneric compounds and their biological activity values. We describe here two QSAR models, one for the binding affinity and one for the therapeutic potency of bactericidal compounds based on the [(E)-2-R-vinyl]benzene scaffold (Figure 1). These models will guide the future improvement of Hp-specific bactericides.

Data set for QSAR analysis
Four Hp flavodoxin inhibitors were identified in previous work [19] and subsequently optimized [20]. Three of them, originally named C1, C2 and C4, were shown to display bactericidal activity against Hp. Compounds C1 and C2 are structurally related, as both contain the [(E)-2-R-vinyl]benzene substructure (Figure 1). In this work they are referred to as I and II to retain their initial numbering. Twenty-two additional compounds were selected from [20]: 7 analogues of I and 15 analogues of II. The numbering assigned to them in that work is retained here for clarity (Table 1). Compounds I and II, plus these 22 analogues, constitute the database used to develop the QSAR models.
Cluster analysis (CA) served two purposes. First, it was used to assess the structural diversity of these compounds by checking that a given number of clusters yielded a relatively balanced distribution of compounds. Second, it was used to design the training (80%) and test (20%) series. Agglomerative hierarchical clustering started with each compound as a singleton cluster and repeatedly merged the two closest clusters until a single all-encompassing cluster remained. The centroid method was used to merge objects into clusters, with the distance between two clusters defined as the squared Euclidean distance between their centroids [21]. Clustering was performed with the STATGRAPHICS 5.1 program [22].
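The centroid-linkage procedure described above can be sketched in plain Python as follows. This is an illustrative reimplementation, not the STATGRAPHICS code. The paper does not state how compounds were drawn from each cluster into the 20% test series, so the `stratified_split` helper is hypothetical: it simply takes about one compound in five from every cluster.

```python
from itertools import combinations


def sq_euclidean(a, b):
    """Squared Euclidean distance between two descriptor vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid_clustering(points, n_clusters):
    """Agglomerative hierarchical clustering with the centroid method.

    Starts from singleton clusters and repeatedly merges the two clusters
    whose centroids are closest (squared Euclidean distance) until
    n_clusters remain. Returns clusters as sorted lists of point indices.
    """
    dim = len(points[0])
    clusters = {i: [i] for i in range(len(points))}
    centroids = {i: list(p) for i, p in enumerate(points)}
    next_id = len(points)
    while len(clusters) > n_clusters:
        # Pick the pair of clusters whose centroids are closest.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: sq_euclidean(centroids[pair[0]], centroids[pair[1]]))
        members = clusters.pop(a) + clusters.pop(b)
        del centroids[a], centroids[b]
        # The merged centroid is the mean of all member points.
        centroids[next_id] = [sum(points[m][d] for m in members) / len(members)
                              for d in range(dim)]
        clusters[next_id] = sorted(members)
        next_id += 1
    return sorted(clusters.values())


def stratified_split(clusters, test_fraction=0.2):
    """Hypothetical helper: send ~test_fraction of every cluster to the test set."""
    train, test = [], []
    for members in clusters:
        n_test = int(test_fraction * len(members) + 0.5)
        step = len(members) / n_test if n_test else 0
        # Test members are spread evenly through each cluster.
        picks = {members[int(k * step)] for k in range(n_test)}
        test.extend(m for m in members if m in picks)
        train.extend(m for m in members if m not in picks)
    return sorted(train), sorted(test)
```

Drawing test compounds from every cluster keeps the test series structurally representative of the training series. Cluster sizes returned for several values of `n_clusters` can also be inspected to judge how balanced the partition is.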
As described elsewhere [20], four biological response variables (BR) were measured for each compound:

- the affinity of the compound/Hp-Fld complex, expressed as the dissociation constant (Kd);
- the minimal inhibitory concentration (MIC);
- the minimal cytotoxic concentration (MCC);
- the therapeutic index (TI = MCC/MIC).

Normality of each BR (i.e., its fit to a Normal distribution) was checked. Where it was not met, some compounds were removed to obtain a normally distributed dataset. For QSAR analyses each BR was expressed on a logarithmic scale (LogBR).
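The following sketch shows how TI and the log-transformed responses might be derived, together with a normality screen based on standardized skewness and kurtosis. The standard errors sqrt(6/n) and sqrt(24/n) are large-sample approximations assumed here, and so is the |z| <= 2 cut-off. STATGRAPHICS may use exact or bias-corrected formulas.

```python
import math


def therapeutic_index(mcc, mic):
    """TI = MCC / MIC (both in the same concentration units)."""
    return mcc / mic


def log10_all(values):
    """Express a biological response on a log10 scale."""
    return [math.log10(v) for v in values]


def standardized_skew_kurtosis(values):
    """Return (skewness / SE, excess kurtosis / SE).

    Assumes the large-sample approximations SE_skew ~ sqrt(6/n) and
    SE_kurt ~ sqrt(24/n).
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0
    return skew / math.sqrt(6.0 / n), excess_kurt / math.sqrt(24.0 / n)


def is_roughly_normal(values, limit=2.0):
    """Rule of thumb (assumed): both standardized statistics within +/- limit."""
    z_skew, z_kurt = standardized_skew_kurtosis(values)
    return abs(z_skew) <= limit and abs(z_kurt) <= limit
```

A dataset failing `is_roughly_normal` would be the trigger for removing compounds, as described above. For MIC and MCC the text works with Log(1/BR), which only flips the sign of the log values and leaves |z| unchanged.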

Molecular descriptors
A large number of molecular descriptors (MD) can be used in QSAR studies [23]; for example, the latest version of the DRAGON program [24] calculates over 4800 MDs. The specific biological action of drugs can be described by a combination of hydrophobic, electronic and steric properties. A total of 35 MDs, mostly related to those properties and easily interpretable, were selected for the current work.

- Hydrophobic descriptor: the logarithm of the octanol-water partition coefficient (Log P), calculated with the Ghose-Crippen method [25,26] as implemented in DRAGON 6.
- Electronic descriptors: total dipole moment (µ), total energy (ET), HOMO eigenvalue and LUMO eigenvalue. These were calculated by quantum mechanical procedures with the MOPAC 6 software [27], using the Austin Model 1 (AM1) semi-empirical Hamiltonian [28] after full geometry optimization of each molecule.
- Derived descriptors: electrophilicity index (ω) [29], chemical hardness (η) and softness (s) [30], calculated from the electronic descriptors above.
- Polarizability (Pol) [31]: calculated with the partition method implemented in HYPERCHEM 7.51 [32].
- Steric and structural parameters: obtained with DRAGON 6.
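References [29,30] define the derived indices, but the exact expressions are not reproduced here. The sketch below assumes the common Koopmans-type convention: ionization energy I = -E_HOMO and electron affinity A = -E_LUMO. Hardness is η = (I - A)/2, softness is s = 1/(2η), the electronic chemical potential is μc = -(I + A)/2, and electrophilicity is ω = μc²/(2η). Some authors omit the factor 1/2 in η, which rescales the results. The orbital energies used below are hypothetical.

```python
def reactivity_descriptors(e_homo, e_lumo):
    """Koopmans-type global reactivity indices from frontier orbital energies (eV).

    Assumed convention: eta = (I - A)/2, s = 1/(2*eta), omega = mu_c**2/(2*eta).
    """
    if e_lumo <= e_homo:
        raise ValueError("LUMO energy must lie above HOMO energy")
    ionization = -e_homo          # I ~ -E_HOMO
    affinity = -e_lumo            # A ~ -E_LUMO
    eta = (ionization - affinity) / 2.0     # chemical hardness
    mu_c = -(ionization + affinity) / 2.0   # electronic chemical potential (not the dipole moment)
    softness = 1.0 / (2.0 * eta)
    omega = mu_c ** 2 / (2.0 * eta)         # electrophilicity index
    return {"eta": eta, "s": softness, "omega": omega}


# Hypothetical AM1 orbital energies for one molecule (eV).
print(reactivity_descriptors(-9.0, -1.0))
```

Under this convention a larger HOMO-LUMO gap raises η and lowers both s and ω. This is the sense in which η is read later as inversely related to reactivity.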
MDs with values above 25 were scaled (divided by their standard deviations). The QSAR models developed here are based on essentially orthogonal MDs with very low inter-correlation coefficients, as verified by an inter-correlation study of the MDs. Descriptions, classification and other statistical parameters of the MDs used are summarized in Table 2. The MDs included in the best QSAR models were selected using genetic algorithms. The models retained were those with the best values of statistical parameters such as the correlation coefficient, standard deviation and Fisher statistic.
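The scaling rule and the inter-correlation screen might look like the sketch below. Two details are assumptions: "values above 25" is read here as any absolute value exceeding 25 in a descriptor column, and the |r| > 0.5 flagging cut-off is arbitrary. Dividing a column by a constant does not change its Pearson correlations, so the order of the two steps does not matter.

```python
import math
import statistics
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient between two descriptor columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def scale_large_descriptors(table, threshold=25.0):
    """Divide a column by its sample SD if any |value| exceeds threshold."""
    scaled = {}
    for name, column in table.items():
        if max(abs(v) for v in column) > threshold:
            sd = statistics.stdev(column)
            scaled[name] = [v / sd for v in column]
        else:
            scaled[name] = list(column)
    return scaled


def correlated_pairs(table, max_abs_r=0.5):
    """List descriptor pairs whose |r| exceeds max_abs_r (cut-off is an assumption)."""
    flagged = []
    for a, b in combinations(sorted(table), 2):
        r = pearson_r(table[a], table[b])
        if abs(r) > max_abs_r:
            flagged.append((a, b, round(r, 3)))
    return flagged
```

Only descriptor sets with no flagged pairs would then be passed to the variable-selection step.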

QSAR validations
The ability of the QSAR models to reproduce the BR values of the training set was measured by the coefficient of determination (R²) and by the cross-validation coefficient (q²), here determined by leave-one-out (LOO) cross-validation. Y-scrambling [33] with thirty iterations was used to rule out the possibility that the correlation found between the selected MDs and each dependent variable was due to chance. Models with R² > 0.75 and q² > 0.5 are commonly required [34]. For one of the QSAR models, the ability to predict the BR values of an external series (not used for training) was measured by the external coefficient of determination (R²ext), which approaches 1 for a perfectly predictive model.
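The validation statistics named above can be sketched for an ordinary multiple linear regression as follows. The formulas are as assumed here: R² = 1 - SSres/SStot, and LOO q² = 1 - PRESS/SStot. Y-scrambling refits the model on randomly permuted responses. R²ext is given in one common variant (deviations measured from the training-set mean); several definitions exist, and this is not necessarily the one the authors used.

```python
import random


def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_mlr(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bk]."""
    rows = [[1.0] + list(x) for x in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve_linear(xtx, xty)


def predict(coef, x):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))


def _ss_about(values, center):
    return sum((v - center) ** 2 for v in values)


def r_squared(X, y):
    coef = fit_mlr(X, y)
    ss_res = sum((yi - predict(coef, xi)) ** 2 for xi, yi in zip(X, y))
    return 1.0 - ss_res / _ss_about(y, sum(y) / len(y))


def q2_loo(X, y):
    """Leave-one-out q2 = 1 - PRESS / total sum of squares."""
    press = 0.0
    for i in range(len(y)):
        coef = fit_mlr(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(coef, X[i])) ** 2
    return 1.0 - press / _ss_about(y, sum(y) / len(y))


def y_scrambling(X, y, n_iter=30, seed=0):
    """Refit on randomly permuted responses; returns (R2, q2) for each run."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_iter):
        y_perm = list(y)
        rng.shuffle(y_perm)
        out.append((r_squared(X, y_perm), q2_loo(X, y_perm)))
    return out


def r2_ext(coef, X_test, y_test, y_train_mean):
    """One common external R2 variant (assumption; other definitions exist)."""
    ss_res = sum((yi - predict(coef, xi)) ** 2 for xi, yi in zip(X_test, y_test))
    return 1.0 - ss_res / _ss_about(y_test, y_train_mean)


def meets_thresholds(r2, q2):
    """Acceptance rule quoted in the text: R2 > 0.75 and q2 > 0.5."""
    return r2 > 0.75 and q2 > 0.5
```

A genuine model should score clearly better than the scrambled runs. If the scrambled R² and q² values approach those of the real model, the correlation is likely due to chance. Note that the external R² in this form can become negative for poor predictions.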

Inhibitor candidates
The sources of the 24 flavodoxin inhibitors selected here for the QSAR analysis, their purities, and the chemical characterization of those custom-made by Maybridge or synthesized in-house are reported in detail in Galano et al. [20].
Biological activity assays
Table 1 shows the biological responses determined for each compound in Galano et al. [20], where the assays used to determine them are also detailed.

QSAR study for variants of inhibitors I and II
The improvement in therapeutic index (TI) observed for some of the derivatives purposely designed in [20] prompted us to apply a classical QSAR approach to facilitate further optimization. Twenty-four structurally congeneric compounds, including I and II (Table 1), constituted the starting point. The values of the molecular descriptors finally selected for the QSAR models are summarized in Table S1. Hierarchical CA revealed several clusters (Figure S1), evidencing a molecular diversity that is also reflected in the value ranges of both MDs and BRs shown in Table 2.

QSAR model for LogKd:
This BR passed the normality test. Its histogram (Figure 2) shows the typical bell-shaped curve of a Normal distribution, and the BR values fall on a straight line in the Normal probability plot (Figure 2). Standardized kurtosis and skewness values also confirmed normality (not shown). Six outliers (compounds 6, 35, 38, 40 and 45) were detected and removed using standard statistical tests: residuals, standardized residuals, studentized residuals and Cook's distances. The training set, defined by CA, comprised 15 [...] Figure S2). Table 3 compares the Kd values determined experimentally by isothermal titration calorimetry (ITC) with the predicted values. Compound 1 is predicted to be the highest-affinity compound in this series and compound 37 the lowest-affinity one, in full agreement with the experimental results.
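The residual-based outlier diagnostics listed above can be illustrated for the simplest case, a one-descriptor regression y = a + b·x. This is a deliberate simplification: multi-descriptor models need the full hat matrix to compute leverages. The flagging rules |standardized residual| > 2 and Cook's distance > 4/n are common rules of thumb assumed here, not thresholds reported by the authors, and the data in the usage lines are hypothetical.

```python
import math


def simple_regression_diagnostics(x, y):
    """Leverage, standardized residual and Cook's distance for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((v - mx) ** 2 for v in x)
    b = sum((u - mx) * (v - my) for u, v in zip(x, y)) / sxx
    a = my - b * mx
    resid = [v - (a + b * u) for u, v in zip(x, y)]
    p = 2  # fitted parameters: intercept and slope
    mse = sum(e * e for e in resid) / (n - p)
    out = []
    for u, e in zip(x, resid):
        h = 1.0 / n + (u - mx) ** 2 / sxx           # leverage (hat value)
        r_std = e / math.sqrt(mse * (1.0 - h))       # internally studentized residual
        cook = r_std ** 2 * h / (p * (1.0 - h))      # Cook's distance
        out.append({"leverage": h, "std_resid": r_std, "cook": cook})
    return out


def flag_outliers(diags, resid_limit=2.0, cook_limit=None):
    """Indices whose |standardized residual| or Cook's distance is too large."""
    if cook_limit is None:
        cook_limit = 4.0 / len(diags)
    return [i for i, d in enumerate(diags)
            if abs(d["std_resid"]) > resid_limit or d["cook"] > cook_limit]


# Hypothetical data: the fourth point lies far above the trend.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.0, 20.0, 9.9, 12.0, 14.1, 16.0]
print(flag_outliers(simple_regression_diagnostics(x, y)))
```

In practice each flagged compound would be inspected before removal rather than discarded automatically.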

QSAR model for Log(1/MIC):
This BR did not initially meet the normality requirement (Figure 2c and 2d), as confirmed by the standardized kurtosis and skewness values (not shown). Normality was achieved by removing compounds 7, 36 and 39. Compounds 37, 40-42 and I were then eliminated as outliers. Because as many as 8 compounds had to be discarded, we preferred to use all 16 remaining compounds as the training set, so no test set was extracted for external validation. The model combines three MDs, nCb-, Hy and ω, with low pairwise correlations: 0.074 for nCb- and Hy, 0.004 for nCb- and ω, and 0.22 for Hy and ω. A high correlation was also obtained (R² = 0.801), and the q² value (0.682) indicates good internal predictivity. The Y-scrambling analysis rules out both chance correlation and chance prediction (Figure S2). Table 4 compares the observed and predicted Log(1/MIC) values. The model correctly predicted the most potent compounds against H. pylori, 43 and 44. It predicted the least potent one, 33, to be the second least potent.
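As an additional illustration, not a step reported by the authors, the three pairwise correlations quoted above can be combined into variance inflation factors (VIF). A VIF is the corresponding diagonal element of the inverse correlation matrix. The sketch assumes the quoted values are Pearson r coefficients. VIF values near 1 indicate that the three descriptors are nearly orthogonal.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [u - f * v for u, v in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vifs_from_correlations(r):
    """VIF_j = j-th diagonal element of the inverse correlation matrix."""
    inv = invert(r)
    return [inv[i][i] for i in range(len(r))]


# Pairwise correlations quoted in the text (order: nCb-, Hy, omega), assumed to be Pearson r.
R = [[1.0, 0.074, 0.004],
     [0.074, 1.0, 0.22],
     [0.004, 0.22, 1.0]]
print([round(v, 3) for v in vifs_from_correlations(R)])
```

All three VIFs come out only slightly above 1, well below the values (often 5 or 10) usually taken as a sign of problematic collinearity.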

QSAR models for affinity and inhibitory properties of [(E)-2-R-vinyl]benzene derivatives
QSAR studies rest on the assumption that similar molecules have similar activities. They summarize the relationships between chemical structure and biological activity within a dataset of structurally similar chemicals. QSARs therefore allow the activities of new compounds to be predicted using regression models that relate a set of predictor variables to a biological response variable. Predictors may be physicochemical properties or theoretical or structural parameters of the compounds. The response variables are generally biological activities such as binding affinity or potency. We have performed classical QSAR analysis on compounds I and II and their analogues, all of which share the [(E)-2-R-vinyl]benzene substructure. Several outliers were identified and excluded from the database. Reported sources of outliers in QSAR studies include flexible binding sites, inappropriate [...] Four different BRs were modeled. For two of them, we found either no model with good statistical parameters (Log(TI)) or a model with poor correlation and poor external predictability (Log(1/MCC)). In contrast, the QSAR models for LogKd and Log(1/MIC) appear promising. As shown in eq. 1, the LogKd model describes the affinity of ligands for Hp-Fld by means of electronic and constitutional terms. Equation 1 suggests that decreasing %X (percentage of halogen atoms) decreases LogKd, thus increasing the affinity of the compounds for Hp-Fld. Although this term is statistically significant, its contribution to the value of the BR is small. Similarly, decreasing TE (total energy) or η (chemical hardness) lowers LogKd, thus increasing the affinity of the complex. TE is related to the number and type of bonds present in a compound.
In our context, the contribution of TE to affinity suggests that the affinity of the compounds would increase by reducing the number of heteroatoms, by moving from fluorine towards iodine in the halogen series, or both. On the other hand, η is inversely related to the reactivity of compounds, and its contribution to LogKd is larger than that of TE. According to eq. 1, the lower the η value (i.e., the higher the reactivity), the lower the LogKd value and hence the higher the compound/Hp-Fld affinity. The model correctly identifies the highest- and lowest-affinity compounds, and its external predictability is high.
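Eq. 1 itself is not reproduced in this text, so the sketch below only shows how a linear model of that form would be used to predict and rank new compounds. All coefficients and descriptor values are hypothetical. Only the signs follow the discussion: positive coefficients for %X, TE and η, so decreasing any of them lowers the predicted LogKd. TE is assumed to be the scaled descriptor. Lower LogKd means tighter binding.

```python
# HYPOTHETICAL coefficients: signs follow the text, magnitudes are invented.
HYPOTHETICAL_EQ1 = {"intercept": -2.0, "pct_X": 0.01, "E_T": 0.05, "eta": 0.8}
TERMS = ("pct_X", "E_T", "eta")


def predict_log_kd(descriptors, coef=HYPOTHETICAL_EQ1):
    """Linear QSAR prediction: LogKd = c0 + sum(c_i * descriptor_i)."""
    return coef["intercept"] + sum(coef[k] * descriptors[k] for k in TERMS)


def kd_from_log(log_kd):
    """Back-transform a predicted LogKd to Kd (same units as the training data)."""
    return 10.0 ** log_kd


def rank_by_affinity(compounds, coef=HYPOTHETICAL_EQ1):
    """Return compound names ordered from highest to lowest predicted affinity."""
    return sorted(compounds, key=lambda name: predict_log_kd(compounds[name], coef))


# Two hypothetical candidates that differ only in chemical hardness (eta, eV).
candidates = {
    "A": {"pct_X": 10.0, "E_T": -5.0, "eta": 4.0},
    "B": {"pct_X": 10.0, "E_T": -5.0, "eta": 3.0},
}
print(rank_by_affinity(candidates))
```

With these invented numbers the softer candidate B (lower η) ranks first, which mirrors the qualitative trend described for eq. 1.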
The QSAR model developed for Log(1/MIC) is given in eq. 2. The model suggests that decreasing nCb- (number of substituted benzene carbons) will increase the inhibitory activity. This agrees with the conclusion drawn above from the LogKd model: decreasing the number of bonds (here, substitutions on benzene carbons) could increase the affinity of the compounds and consequently their inhibitory potency. It could also reflect steric effects related to binding to flavodoxin. The Hy term (hydrophilic factor) also contributes negatively to this BR, indicating that more hydrophobic substituents lead to a greater inhibitory effect. Finally, more electrophilic compounds (higher electrophilicity index, ω) also show a greater inhibitory effect, which might indicate that the Hp-Fld binding site is electron-rich.

Conclusions
Two QSAR models for the target affinity and inhibitory activity of [(E)-2-R-vinyl]benzene derivatives have been developed that show high statistical correlation and predictive capacity. They provide insight into the type of physicochemical interaction between these inhibitors and their likely biological receptor (the flavodoxin protein). They also offer simple structural guidance on how to optimize such compounds, helping to design variants with improved affinity and improved inhibitory effect on Hp cells. Novel chemicals with such specific antimicrobial properties against Hp are urgently needed for the treatment of the diseases associated with this bacterium.