Received date: November 08, 2013; Accepted date: December 11, 2013; Published date: December 14, 2013
Citation: Galano JJ, Sancho J (2013) QSAR Models for Prediction of Binding and Inhibitory Properties of [(E)-2-R-vinyl]benzene Derivatives with Therapeutic Effects against Helicobacter pylori. Med chem 4:306-312. doi: 10.4172/2161-0444.1000157
Copyright: © 2013 Galano JJ, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Helicobacter pylori is a gram-negative bacterium that infects the luminal surface of the human gastric epithelium. Around one half of the world’s population is thought to be infected by this bacterium, which can cause diseases such as peptic ulcer or gastric cancer. Eradication of Helicobacter pylori is becoming increasingly difficult due to resistance to common antibiotics. In previous work we have shown that an essential protein, flavodoxin, constitutes a target for the development of novel, specific antibiotics against infection caused by this microorganism, and we have described compounds sharing the [(E)-2-R-vinyl]benzene scaffold that exhibit bactericidal properties against Helicobacter pylori cultures. Based on the affinity and activity of 24 such compounds we have now developed QSAR models for affinity and minimal inhibitory concentration that will guide the improvement of antibacterial compounds based on the [(E)-2-R-vinyl]benzene scaffold. The two models show high statistical correlation and predictive capacity. Discovering novel chemicals with specific antimicrobial properties against Helicobacter pylori, and presumably not affected by existing resistances, will be of great help for the treatment of the diseases associated with this bacterium.
Helicobacter pylori; Flavodoxin; Antibiotic; Antimicrobial; Inhibitor; QSAR
Classical Quantitative Structure-Activity Relationship (QSAR) analysis, as we know it today, emerged in the first half of the 1960s with the early works of Hansch and Fujita, the Free-Wilson method, the subsequent modification made by Ban and the later approaches proposed by Kubinyi [4-10] and Topliss. QSAR, assisted by a wealth of organic synthetic techniques, computer technologies and the discovery of many potential therapeutic targets, has since become a central tool for chemists, biologists and biochemists seeking to improve active compounds for use as drugs in the treatment of a variety of diseases.
Helicobacter pylori (Hp) is a gram-negative bacterium that establishes life-long infections in the gastric mucosa of infected people [12,13]. About 50% of the global population is believed to be infected by Hp, with a lower prevalence in Europe and the United States and a much higher one in developing countries. The presence of Hp in the human gastric mucosa is related to the development of diseases such as type B gastritis, chronic peptic ulcers and gastric neoplasias in infected individuals.
The traditional treatment used worldwide to combat Hp, known as the triple therapy (one proton pump inhibitor together with two wide-spectrum antibiotics: clarithromycin and a choice of amoxicillin or metronidazole), has unfortunately lost efficacy due to increased resistance. No new drugs have been developed in recent years to treat Hp infection, but some selective targets (one of them being the protein flavodoxin, Hp-Fld) have been identified. Hp-Fld is an electron carrier essential for Hp viability [17,18] and it has been used in previous works [19,20] as a target for the discovery of novel compounds that could be potential drugs against Hp. Four different biological responses (BR) or biological activities were assayed for those novel compounds, which are mainly derived from a common substructure represented in Figure 1. From those data we have gathered a database of 24 congeneric compounds and their respective biological activity values. We describe here two QSAR models for the binding affinity and the therapeutic potency of bactericidal compounds based on the [(E)-2-R-vinyl]benzene scaffold (Figure 1), which will guide the future improvement of Hp-specific bactericides.
Data set for QSAR analysis
Four Hp flavodoxin inhibitors were identified in previous work  and subsequently optimized . Three of them, originally named C1, C2 and C4, were shown to display bactericidal activity against Hp. Compounds C1 and C2 are structurally related as they contain the common [(E)-2-R-vinyl]benzene substructure (Figure 1). In this work, those two compounds are referred to as I and II to retain their initial numbering. Twenty two additional compounds: 7 and 15 analogues of I and II respectively, were selected from . The numbering assigned to them in that work  is retained here for clarity (Table 1). Compounds I, II, plus the additional 22 compounds constitute the database that we have used for the development of the QSAR models.
aThe same number of compound given in 
bValues calculated as MCC/MIC
c,dDerivatives of inhibitor I and II respectively
Table 1: Observed values for tested BRs in selected compounds.
Cluster analysis (CA) was used to assess the structural diversity of such compounds by checking that relatively balanced distributions of compounds for a given cluster number were obtained, and to design the training (80%) and test (20%) series. Agglomerative hierarchical clustering was performed starting with each point as a singleton cluster and then repeatedly merging the two closest clusters until a single all-encompassing cluster remained. The “centroid method” was used to merge objects into clusters, and distance between two clusters was defined as the squared Euclidean distance between their respective centroids . The clustering was performed using the STATGRAPHICS 5.1 program .
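The clustering and the 80/20 split described above can be sketched as follows. This is a minimal illustration using SciPy, not the STATGRAPHICS procedure used in this work; the function name `cluster_split`, the number of clusters, the random seed and the synthetic descriptor matrix are all assumptions made for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_split(X, n_clusters=4, test_fraction=0.2, seed=0):
    """Centroid-linkage clustering followed by a cluster-stratified split."""
    X = np.asarray(X, dtype=float)
    # SciPy's centroid method merges on the Euclidean distance between
    # centroids; the closest pair is the same as with the squared distance.
    Z = linkage(X, method="centroid")
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    rng = np.random.default_rng(seed)
    test_idx = []
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        # Take roughly test_fraction of every cluster for the test series.
        n_test = int(round(test_fraction * len(members)))
        test_idx.extend(rng.choice(members, size=n_test, replace=False))
    test_idx = np.array(sorted(test_idx), dtype=int)
    train_idx = np.setdiff1d(np.arange(len(X)), test_idx)
    return labels, train_idx, test_idx

# Synthetic descriptor matrix (24 hypothetical compounds x 3 descriptors).
demo_rng = np.random.default_rng(1)
X_demo = np.vstack([demo_rng.normal(loc=m, scale=0.3, size=(6, 3))
                    for m in (0, 5, 10, 15)])
labels, train_idx, test_idx = cluster_split(X_demo)
```

Drawing the test compounds from every cluster keeps both series representative of the structural diversity of the whole set.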
As described elsewhere , the biological response variables (BR) tested for each compound were: affinity (Kd) of the compound/Hp-Fld complex, minimal inhibitory concentration (MIC), minimal cytotoxic concentration (MCC), and therapeutic index (TI=MCC/MIC). Normality for each BR (i.e., the fit to a Normal statistical distribution) was checked and, in cases where it was not met, some compounds were removed to obtain a dataset with Normal distribution. For QSAR analyses each BR was expressed as LogBR.
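As an illustration of the transformations and the normality check mentioned above, the sketch below computes TI, a logarithmic response, and standardized skewness and kurtosis. The MIC and MCC values are hypothetical, and the acceptance range of -2 to +2 is a common rule of thumb assumed here, not a criterion taken from this work.

```python
import numpy as np

def standardized_shape(values):
    """Return (standardized skewness, standardized kurtosis) of a sample."""
    x = np.asarray(values, dtype=float)
    n = x.size
    z = (x - x.mean()) / x.std(ddof=0)
    skew = np.mean(z ** 3)
    excess_kurt = np.mean(z ** 4) - 3.0
    # Divide by the large-sample standard errors sqrt(6/n) and sqrt(24/n).
    return skew / np.sqrt(6.0 / n), excess_kurt / np.sqrt(24.0 / n)

# Hypothetical MIC and MCC values in µM, for illustration only.
mic = np.array([0.5, 1.2, 3.0, 5.5, 8.0, 12.0, 20.0, 45.0])
mcc = np.array([2.0, 6.0, 9.0, 30.0, 16.0, 60.0, 50.0, 90.0])

ti = mcc / mic                    # therapeutic index, TI = MCC/MIC
log_inv_mic = np.log10(1.0 / mic)  # logarithmic form used for modeling

z_skew, z_kurt = standardized_shape(log_inv_mic)
# Assumed rule of thumb: both statistics within -2..+2 suggests normality.
looks_normal = bool(abs(z_skew) < 2 and abs(z_kurt) < 2)
```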
A large number of molecular descriptors (MD) can be used in QSAR studies (e.g., the latest version of the DRAGON program calculates over 4800 MDs). The specific biological action of drugs may be described by a combination of hydrophobic, electronic and steric properties. A total of 35 MDs, mostly related to those properties and easily interpretable, have been selected for the current work. As hydrophobic descriptor we have considered the logarithm of the octanol-water partition coefficient (Log P), which has been calculated through the Ghose-Crippen methodology [25,26] implemented in DRAGON 6. The four electronic descriptors used (total dipole moment (μ), total energy (TE), HOMO eigenvalue and LUMO eigenvalue) were calculated by quantum mechanical procedures using the MOPAC 6 software with the Austin Model 1 (AM1) semi-empirical Hamiltonian after full geometrical optimization of each molecule. Three additional descriptors were calculated based on the previous ones: electrophilicity index (ω), chemical hardness (η) and softness (s). The polarizability (Pol) was calculated from the partition method implemented in the HYPERCHEM 7.51 software. Steric and structural parameters were obtained from the DRAGON 6 software. MDs exhibiting values above 25 were scaled (divided by their standard deviations). The QSAR models developed here are based on orthogonal MDs exhibiting very low inter-correlation coefficients. This was checked by subjecting the MDs to an inter-correlation study. Description, classification and other statistical parameters of the MDs used are summarized in Table 2. The final MDs included in the best QSAR models derived were selected using genetic algorithms. The models selected were those displaying the best values of statistical parameters such as correlation coefficient, standard deviation and Fisher statistics.
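The text does not give the expressions used for η, ω and s. A minimal sketch, assuming the usual frontier-orbital (Koopmans-type) approximations, with η = (E_LUMO - E_HOMO)/2, chemical potential μ_chem = (E_HOMO + E_LUMO)/2, ω = μ_chem²/(2η) and s = 1/(2η), could look like this. The function name is hypothetical, and other conventions (for example s = 1/η) exist.

```python
def reactivity_indices(e_homo, e_lumo):
    """Hardness, electrophilicity and softness from HOMO/LUMO energies (eV).

    Assumed definitions (not stated in the source text):
      eta   = (E_LUMO - E_HOMO) / 2
      mu    = (E_HOMO + E_LUMO) / 2   (chemical potential, not the dipole)
      omega = mu**2 / (2 * eta)
      s     = 1 / (2 * eta)
    """
    eta = (e_lumo - e_homo) / 2.0
    mu_chem = (e_homo + e_lumo) / 2.0
    omega = mu_chem ** 2 / (2.0 * eta)
    softness = 1.0 / (2.0 * eta)
    return {"eta": eta, "omega": omega, "softness": softness}

# Mean HOMO and LUMO energies from Table 2, used only as example input.
indices = reactivity_indices(e_homo=-9.9, e_lumo=-1.7)
```

Under these definitions a smaller HOMO-LUMO gap gives a softer, more reactive molecule, which is the sense in which η is discussed later in the text.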
|MD or BR||Mean||SD||Min||Max||Description|
|RBN||2.3||0.6||2.0||4.0||Number of Rotatable Bonds|
|RBF||0.1||0.02||0.05||0.1||Rotatable Bond Fraction|
|nAB||8.5||3.4||6.0||17.0||Number of Aromatic Bonds|
|nHM||0.9||0.9||0.0||3.0||Number of Heavy Atoms|
|nHet||6.0||1.5||4.0||9.0||Number of Heteroatoms|
|nX||2.6||1.8||0.0||6.0||Number of Halogen Atoms|
|X%||11.2||8.0||0.0||27.8||Percentage of Halogen Atoms|
|nCIC||1.8||1.1||1.0||4.0||Number of Rings (cyclomatic number)|
|nCIR||2.3||1.7||1.0||7.0||Number of Circuits|
|MCD||0.5||0.2||0.3||0.8||Molecular Cyclized Degree|
|RCI||1.1||0.1||1.0||1.2||Ring Complexity Index|
|nCar||8.4||3.3||6.0||16.0||Number of Aromatic Carbon (sp2)|
|nCbH||4.6||2.2||0.0||10.0||Number of Unsubstituted Benzene Carbon (sp2)|
|nCb||3.8||1.6||2.0||6.0||Number of Substituted Benzene Carbon (sp2)|
|nHDon||0.04||0.2||0.0||1.0||Number of Donor Atoms for H-bonds (N, O)|
|nHAcc||4.3||1.9||2.0||8.0||Number of Acceptor Atoms for H-bonds (N,O,F)|
|Qmean||0.12||0.05||0.0||0.2||Mean absolute charge (charge polarization)|
|AMR||63.9||17.7||42.3||93.9||Ghose-Crippen molar refractivity|
|ALOGP||3.6||1.0||2.0||5.4||Ghose-Crippen oct.-wat. partition coeff. (LogP)|
|ALOGP2||13.8||7.6||4.1||28.6||Squared Ghose-Crippen oct.-wat. partition coeff.|
|SAtot||290.6||61.8||206.1||409||Total surface area from P_VSA-like descriptors|
|TE||3711||941||5199||2298||Total Energy (eV)|
|HOMO||-9.9||0.55||-10.83||-8.3||Orbital HOMO Energy|
|LUMO||-1.7||0.24||-2.3||-1.2||Orbital LUMO Energy|
|μ||4.4||1.4||1.2||7.1||Total Dipole Moment|
|Kd (µM)||7.7||9.3||0.4||40.0||Dissociation Constant|
|MIC (µM)||19.4||43.4||0.5||150.0||Minimal Inhibitory Concentration|
|MCC (μM)||18.7||29.3||0.01||100.0||Minimal Cytotoxic Concentration|
|TI||5.9||8.7||0.0||37.7||Therapeutic Index (MCC/MIC)|
Table 2: Statistical parameters, value ranges and description of the MDs and BRs analyzed.
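The scaling and inter-correlation check applied to the descriptors can be sketched as below. This is an illustration under assumptions: "values above 25" is read as the largest absolute value in a column, the correlation cutoff `r_max` is a hypothetical choice, and the small descriptor matrix is invented for the example.

```python
import numpy as np

def scale_large_descriptors(X, threshold=25.0):
    """Divide by its standard deviation every column whose values exceed threshold."""
    X = np.asarray(X, dtype=float).copy()
    large = np.abs(X).max(axis=0) > threshold
    X[:, large] /= X[:, large].std(axis=0, ddof=1)
    return X, large

def correlated_pairs(X, names, r_max=0.5):
    """List descriptor pairs whose absolute Pearson correlation exceeds r_max."""
    r = np.corrcoef(X, rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(r[i, j]) > r_max:
                pairs.append((names[i], names[j], float(r[i, j])))
    return pairs

# Hypothetical descriptor values for five compounds.
names = ["ALOGP", "ALOGP2", "TE"]
alogp = np.array([2.0, 2.8, 3.6, 4.4, 5.4])
te = np.array([-2298.0, -5199.0, -3100.0, -4500.0, -3711.0])
X_raw = np.column_stack([alogp, alogp ** 2, te])

X_scaled, was_scaled = scale_large_descriptors(X_raw)
flagged = correlated_pairs(X_scaled, names)
```

Here ALOGP and ALOGP2 are flagged as strongly inter-correlated, so only one of them would be allowed into a model built from orthogonal descriptors.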
The capability of the QSAR models to predict correctly the BR values within a training set was measured by the coefficient of determination (R2) and the cross-validation coefficient (q2), here determined by leave-one-out (LOO) cross-validation. The possibility that the correlation found between the selected MDs and each dependent variable was merely due to statistical chance was ruled out using Y-scrambling (thirty iterations were performed). Commonly, models with R2 > 0.75 and q2 > 0.5 are required. The ability of one of the QSAR models to predict correctly the BR values of an external series (not used as training set) was measured through the external coefficient of determination (0<R2Ext<1).
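The validation statistics described above can be illustrated with a short NumPy sketch for a multiple linear regression. The helper names and the synthetic data are assumptions for the example; they do not reproduce the software or the descriptor values used in this work.

```python
import numpy as np

def fit_mlr(X, y):
    """Least-squares coefficients; the intercept is the last element."""
    A = np.column_stack([X, np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    return np.column_stack([X, np.ones(len(X))]) @ coef

def r_squared(X, y):
    resid = y - predict_mlr(fit_mlr(X, y), X)
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

def q_squared_loo(X, y):
    """q2 = 1 - PRESS/SST, refitting the model with each compound left out."""
    press = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        coef = fit_mlr(X[keep], y[keep])
        press += (y[i] - predict_mlr(coef, X[i:i + 1])[0]) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

def y_scrambling(X, y, n_iter=30, seed=0):
    """R2 values obtained after randomly permuting the response."""
    rng = np.random.default_rng(seed)
    return np.array([r_squared(X, rng.permutation(y)) for _ in range(n_iter)])

# Synthetic training set: 15 compounds, 3 descriptors.
rng = np.random.default_rng(42)
X = rng.normal(size=(15, 3))
y = X @ np.array([0.5, -0.3, 1.0]) - 2.0 + rng.normal(scale=0.1, size=15)

r2 = r_squared(X, y)
q2 = q_squared_loo(X, y)
scrambled_r2 = y_scrambling(X, y)
```

A model that captures a real relationship keeps q2 close to R2, while the scrambled responses give clearly lower R2 values.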
The sources of the 24 flavodoxin inhibitors selected here for the QSAR analysis, their purities, and the chemical characterizations of those made to order by Maybridge or synthesized in-house, are reported in detail in Galano et al.
Biological activity assays
Table 1 shows the results of the biological responses determined per compound in Galano et al. . The biological assays performed for determining such biological activities are also detailed there.
QSAR study for variants of inhibitors I and II
The observed improvement in the therapeutic index (TI) exhibited by some of the derivatives purposely designed in  prompted us to apply a classical QSAR approach to facilitate further optimizations. Twenty-four structurally congeneric compounds, including I and II (Table 1), constituted the starting point. The values of the molecular descriptors finally selected for the QSAR models are summarized in Table S1. Hierarchical CA revealed several clusters (Figure S1), evidencing molecular diversity, which is also reflected in the value ranges of both MDs and BRs shown in Table 2.
QSAR model for LogKd: This BR passed the normality test: the histogram in Figure 2 displays the typical Gauss bell curve of a Normal probability distribution, and the BR values fitted a straight line in the Normal probability plot (Figure 2). Typified kurtosis and asymmetry parameters also demonstrated compliance with normality (not shown). Five outliers (compounds 6, 35, 38, 40 and 45) were detected using standard statistical tests (residuals, standardized residuals, studentized residuals and Cook distances) and were removed. The training set, defined by CA, comprised 15 compounds: II, 1-3, 5, 7, 31-33, 36, 37, 39, 41, 42 and 44. The remaining compounds (I, 4, 34 and 43) constituted the test set.
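The outlier diagnostics named above can be computed directly from the hat matrix of the regression. The sketch below is illustrative only: the function name and the synthetic data, which contain one planted outlier, are assumptions, and statistical packages differ in how they name the residual variants.

```python
import numpy as np

def regression_diagnostics(X, y):
    """Standardized residuals, studentized residuals and Cook's distances."""
    A = np.column_stack([np.ones(len(y)), X])
    n, p = A.shape
    H = A @ np.linalg.pinv(A)           # hat (projection) matrix
    h = np.diag(H)                      # leverages
    resid = y - H @ y
    s2 = resid @ resid / (n - p)        # residual variance
    standardized = resid / np.sqrt(s2)
    # Internally studentized residuals account for each point's leverage.
    studentized = resid / np.sqrt(s2 * (1.0 - h))
    cooks = studentized ** 2 * h / (p * (1.0 - h))
    return standardized, studentized, cooks

# Synthetic data: 20 compounds, 3 descriptors, one planted outlier.
rng = np.random.default_rng(3)
X = rng.normal(size=(20, 3))
y = X @ np.array([0.4, -0.6, 0.9]) + 1.0 + rng.normal(scale=0.1, size=20)
y[7] += 3.0

standardized, studentized, cooks = regression_diagnostics(X, y)
suspect = int(np.argmax(np.abs(studentized)))
```

In this example the planted point has the largest studentized residual, which is the kind of signal used to flag a compound for removal.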
The best model found for LogKd was:
Log Kd=0.02(±0.01)%X + 0.29(±0.12)TE + 1.04(±0.5)η - 2.77(±2.08) (1)
(s=0.167; F=25.41; SPRESS=0.237)
The model combines three MDs: %X, TE and η, which exhibit low pair correlations (0.265 for %X and TE; 0.359 for %X and η; and 0.002 for TE and η). The R2 (=0.874) indicates that there is a good correlation between the observed LogKd values and the MDs selected. On the other hand, q2 (=0.747) and R2ext (=0.924) values are indicative of good internal and external predictability of this model. Besides, Y-scrambling suggests there is neither chance correlation nor chance prediction in the model (Figure S2). The observed Kd values determined by ITC and the predicted values are compared in Table 3. Compound 1 is predicted as the most affine one in this series whereas compound 37 is predicted as the least affine one, which fully coincides with the experimental results.
|Compound||Y Obs.||Y Pred.||Residual|
Table 3: Observed and predicted LogKd values.
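The reported statistics for eq. 1 can be cross-checked against each other. The sketch below assumes the usual relation F = (R2/k)/((1-R2)/(n-k-1)) for a model with k descriptors fitted on n compounds, and assumes that s and SPRESS are both computed with n-k-1 degrees of freedom, so that (SPRESS/s)² should approximate (1-q2)/(1-R2). Neither definition is stated explicitly in the text.

```python
def f_statistic(r2, n, k):
    """Fisher statistic implied by R2 for k descriptors and n compounds."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))

def press_to_rss_ratio(r2, q2):
    """PRESS/RSS implied by q2 = 1 - PRESS/SST and R2 = 1 - RSS/SST."""
    return (1.0 - q2) / (1.0 - r2)

# Values reported for the LogKd model (eq. 1): 15 compounds, 3 descriptors.
f_eq1 = f_statistic(r2=0.874, n=15, k=3)
ratio_from_r2_q2 = press_to_rss_ratio(r2=0.874, q2=0.747)
ratio_from_s = (0.237 / 0.167) ** 2
```

Under these assumptions the implied F is about 25.4, in line with the reported F=25.41, and both estimates of PRESS/RSS are close to 2.0.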
QSAR model for Log(1/MIC): This BR did not initially meet the normality requirement (Figure 2c and 2d), which was confirmed by both typified kurtosis and asymmetry values (not shown). Normality was obtained by removing compounds 7, 36 and 39. After this, compounds 37, 40-42 and I were eliminated as outliers. Because as many as 8 compounds had to be discarded, we preferred to use the remaining 16 compounds as training set. Thus, no test set was extracted for external validation. The training set, constituted by compounds II, 1-6, 31-35, 38 and 43-45, led to the following model:
Log(1/MIC) = -0.25(±0.08)nCb- - 8.48(±3.19)Hy + 1.38(±0.61)ω - 10.5(±4.21) (2)
(s=0.156; F=16.175; SPRESS=0.198)
The model combines three MDs (nCb-, Hy and ω) exhibiting low pair correlations (0.074 for nCb- and Hy; 0.004 for nCb- and ω; and 0.22 for Hy and ω). A high correlation was also obtained (R2=0.801), the q2 value (0.682) indicates good internal predictability, and the Y-scrambling analysis rules out both chance correlation and chance prediction (Figure S2). The observed Log(1/MIC) values and the predicted ones are compared in Table 4. The most potent compounds against H. pylori, 43 and 44, were correctly predicted by the model, whereas the least potent one, 33, was predicted as the second least potent.
|Compound||Y Obs.||Y Pred.||Residual|
a Predicted as the highest inhibitory activity compound.
b Predicted as the lowest inhibitory activity compound.
Table 4: Observed and predicted Log(1/MIC) values.
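To show how eq. 2 would be applied to a candidate compound, the sketch below evaluates the model and converts the result back to a MIC. The descriptor values are hypothetical and chosen only for illustration (the real ones are in Table S1), and the result is assumed to be in µM, as in Table 2. Whether a descriptor was scaled before fitting also has to be taken into account when using the published coefficients.

```python
def predict_log_inv_mic(n_cb, hy, omega):
    """Evaluate eq. 2 with the published coefficients."""
    return -0.25 * n_cb - 8.48 * hy + 1.38 * omega - 10.5

# Hypothetical descriptor values for a candidate compound.
log_inv_mic = predict_log_inv_mic(n_cb=3, hy=-0.8, omega=3.5)
predicted_mic = 10.0 ** (-log_inv_mic)  # assumed to be in µM
```

Because nCb- and Hy enter with negative coefficients and ω with a positive one, lowering the first two or raising ω increases the predicted Log(1/MIC), that is, lowers the predicted MIC.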
QSAR model for Log(1/MCC): Compound 33 was removed to ensure the normality condition (Figure 2e and 2f). Then, compounds 5, 34 and 45 were removed as outliers. Compounds I, II, 2-4, 6, 31, 32, 36-41, 43 and 44 formed the training set, and compounds 1, 7, 35 and 42 were chosen as test set. The orthogonality between the selected MDs was checked and all pairwise correlations were ≤0.3 (not shown). The developed QSAR model is shown in equation 3:
Log(1/MCC) = -0.33(±0.07)nCb- - 3.71(±0.70)Hy - 0.31(±0.11)TE - 3.07(±0.62) (3)
(s=0.302; F=11.583; SPRESS=0.409)
R2 (=0.743) and q2 (=0.53) values are lower than in the previous QSAR models and R2ext (=0.512) is quite low, indicating poorer correlation and external predictability for this model.
QSAR model for Log(TI): We also tried to develop a model for Log(TI) but no model with good statistical parameters was found (R2<0.75, q2<0.5 and R2ext<0.75).
QSAR models for affinity and inhibitory properties of [(E)-2-R-vinyl]benzene derivatives
QSAR studies are based on the assumption that similar molecules have similar activities, and summarize existing relationships between chemical structures and biological activity within a dataset of structurally similar chemicals. Therefore, QSARs allow the activities of new compounds to be predicted using regression models relating a set of predictor variables to a biological response variable. Predictors may be physico-chemical properties or theoretical or structural parameters of compounds, while the response variables generally consist of biological activities such as binding affinity, potency, etc. We have performed classic QSAR analysis on compounds I and II, and analogues thereof, all sharing a [(E)-2-R-vinyl]benzene substructure. Several outliers were identified and excluded from the database. Reported sources for outliers in QSAR studies include flexible binding sites, inappropriate calculation of selected MDs, different binding modes, etc. [35,36]. The specific reasons why those compounds are outliers are at present not known.
Four different BRs were modeled. For two of them, we found either no model with good statistical parameters (QSAR model for Log(TI)) or a model that showed a poor correlation and poor external predictability (QSAR model for Log(1/MCC)). In contrast, the QSAR models for LogKd and for Log(1/MIC) appear promising. As shown in eq. 1, the developed model for LogKd describes the affinity of ligands for Hp-Fld by means of electronic and constitutional terms. Equation 1 suggests that decreasing %X (percentage of halogen atoms) decreases LogKd, thus increasing the affinity of the compounds for Hp-Fld. It should be noted that although this term is statistically significant, its contribution to the value of the BR is small. Similarly, decreasing TE (total energy) or η (chemical hardness) lowers the value of LogKd, thus increasing the affinity of the complex. The TE is related to the number and type of bonds present in a given compound. In our context, its contribution to affinity suggests that reducing the number of heteroatoms and/or shifting from fluorine to iodine along the halogen series would increase the affinities of the compounds. On the other hand, η, whose contribution to LogKd is greater than that of TE, is inversely related to the reactivity of compounds. According to eq. 1, the lower the η value (i.e., the higher the reactivity), the lower the LogKd value and hence the higher the compound/Hp-Fld affinity. The model correctly identifies the most and least affine compounds and its external predictability is high.
The QSAR model developed for Log(1/MIC) is given in eq. 2. The model suggests that decreasing nCb- (number of substituted benzene carbons) will increase the inhibitory activity. This agrees with the conclusion drawn above from the LogKd model, namely that decreasing the number of bonds (benzene carbon substitutions in this case) could increase the affinity of the compounds and consequently their inhibitory potency. It could also reflect steric effects related to binding to flavodoxin. Meanwhile, the term Hy (hydrophilic factor) also has a negative contribution to this BR, which indicates that more hydrophobic substituents contribute to a greater inhibitory effect. Finally, more electrophilic compounds (greater electrophilicity index, ω) also contribute to a greater inhibitory effect, which might indicate that the Hp-Fld binding site is an electron-rich site.
Two QSAR models for target affinity and inhibitory activities of [(E)-2-R-vinyl]benzene derivatives have been developed that show high statistical correlation and predictive capacity. They provide insights about the type of physical-chemical interaction of these inhibitors with their possible biological receptor (the flavodoxin protein) as well as simple structural information about how to optimize such compounds in order to guide the design of variants with improved affinity and improved inhibitory effect towards Hp cells. The discovery of novel chemicals with such specific antimicrobial properties against Hp is urgently needed at present for the treatment of the diseases associated with these bacteria.
Galano, J.J. is a recipient of a Ph. D. studies fellowship awarded by Banco Santander and University of Zaragoza (Spain). This work was supported by grants BFU2010-16297 and BFU2010-19451 (Ministerio de Ciencia e Innovación: MICINN, Spain) and Grupo Protein Targets B89 (Diputación General de Aragón, Spain).