Medical, Pharma, Engineering, Science, Technology and Business

**Fei Li ^{*}, Lulu Cao, Huifeng Wu and Jianmin Zhao**

Key Laboratory of Coastal Zone Environmental Processes, Yantai Institute of Coastal Zone Research (YIC), Chinese Academy of Sciences (CAS); Shandong Provincial Key Laboratory of Coastal Zone Environmental Processes, YICCAS, Yantai Shandong 264003, PR China

- *Corresponding Author:
- Fei Li

Key Laboratory of Coastal Zone Environmental Processes

Yantai Institute of Coastal Zone Research, China

**Tel:**+86-535-210902

**E-mail:**[email protected]

**Received Date:** May 14, 2014; **Accepted Date:** June 28, 2014; **Published Date:** July 05, 2014

**Citation:** Fei Li, Lulu Cao, Huifeng Wu, Jianmin Zhao (2014) Combined SVM-PLS Method for Predicting Estrogenic Activities of Organic Chemicals in the Coastal Water. J Coast Dev 17:388. doi:10.4172/1410-5217.1000388

**Copyright:** © 2014 Li F, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

**Visit for more related articles at** Journal of Coastal Zone Management

A data set of 517 natural, synthetic and environmental chemicals belonging to a broad range of structural classes have been tested for estrogenic activities (expressed as logREC10) to the estrogen receptor (ER) using a yeast twohybrid assay. In this study, quantitative structure- activity relationships (QSARs) were determined using two methods, partial least square (PLS) and support vector machine (SVM). The Q^{2}_{cum} of the PLS model is 0.678, indicating high robustness and good predictive ability. The correlation coefficient (R) between the observed and the predicted values is 0.870, indicating the predicted values by the final QSAR models were in good agreement with the corresponding experimental values. Eight DRAGON descriptors were included in the PLS model, including Mor_{03p}, L_{3e}, R_{8p}, R_{Tv}^{+}, R_{8e}, R_{1p}^{+}, R_{7p}^{+} and HATSv, which implies that chemical estrogenic activities are related to atomic properties (atomic Sanderson electronegativities, polarizabilities and van der Waals volumes). Comparison of the results obtained from two models showed that the SVM method exhibited the best overall performances, with a RMS error of 0.145 logREC10 units for the whole set. Moreover, three linear QSAR models were constructed for some specific families based on their chemical structures. These predictive models should be useful to rapidly identify potential estrogenic endocrine disrupting chemicals.

Endocrine disrupting chemicals (EDCs); Estrogen receptor (ER); Quantitative structure activity relationship (QSAR); PLS; SVM

Scientific and public concern heightens over the potential health effects of exposure to environmental pollutants with endocrine disruption potential [1]. Diminished or excessive production of estrogens is a major problem related to prostate cancer, spinal bular musular atrophy, and female pattern baldness. Consequently, there is a need to develop screening and testing procedures for endocrine disrupting chemicals (EDCs).

Considering the high number of potential EDCs, this remains a labor intensive and time-costing operation. It is crucial to develop efficient and economical alternative modeling approaches for the purpose of predicting the estrogenic activities of potential EDCs. Quantitative structure activity relationship (QSAR) methods are the most promising and successful tools to provide rapid and useful meanings for predicting the biological activity and chemical toxicity. They are considered as an important part of the priority setting process by the endocrine disruptor screening and testing advisory committee (EDSTAC) [2]. QSAR are widely applied for the understanding of the mechanism of chemicals’ binding for the estrogen receptors (ER) [3-6], for androgen receptor (AR) and for several other members of the nuclear receptor family [7]. These include electrostatic models, comparative molecular field analysis (CoMFA) which considers the overall steric and electrostatic properties of the compound of interest, computer graphic and energy (electrostatic and van der Waals) based models for fit into DNA and common reactivity patterns (COREPA) which reflect the stereoelectronic features.

In this paper, a data set consisted of experimental values which were determined by Nishihara et al. [8], including 517 natural, synthetic, and environmental chemicals from a broad range of structural classes. The data set was used to construct global QSAR models for the whole data set and local models for specific well-known families. Some information descriptors were selected using Forward stepwise (FS) regression from the original 709 DRAGON calculated descriptors and were applied to construct an optimal model based on SVM. Another classical method, partial least square (PLS) [9] was utilized to establish QSAR model to compare the results with those obtained by SVM. In addition, some careful models for specific well-known families were examined in conjunction with knowledge of the recently reported ligand-ER crystal structures.

**Experiment and data set**

Because it is expected that the major key target of EDCs is the nuclear hormone receptor, which binds specifically to the steroid hormone and regulates its gene expression, the yeast two-hybrid assay has been developed. Unlike yeast-based assay (YES) [10], another reporter gene assay using a yeast two-hybrid cells, the method contains the coactivator, so that the system more closely resembles the mammalian hormonal system.A detailed description of the experimental methods is provided in Nishihara et al. [8].

The overall data set consisted of more than 500 organic chemicals, including natural substances, medicine, pesticides, and industrial
chemicals. **Table 1** shows a summary of the test compounds with the names of 55 positive compounds. Tested chemicals consisted of natural
substances (metabolites, oxidation products, etc.), medicines, food additives, pesticides, and industrial chemicals (PCBs, PCDFs, PAHs, phenols,
benzenes, phthalates and adipates, and others). The estrogenic activities to the ER, expressed as log unit of 10% relative effective concentration
(logREC_{10}), are listed in **Table 1**.

**Descriptor generation and selection**

Structures of chemicals were drawn with the Chem Draw computer program. These were then geometry-optimized with the PM3 Hamiltonian using the software package Chemoffice 6.0 program, and exported into a file format suitable for MOPAC analysis. The resulting geometry was transferred into the DRAGON software that was used to calculate molecular structural descriptors. Molecular descriptor meanings and their calculation procedure are summarized in the DRAGON software, and explained in detail, with related literature references, in the Handbook of Molecular Descriptors by Todeschini and Consonni [11].

Generally, more descriptors should be considered in QSAR study so as to better characterize molecular structures. However, if no significant relevant or irrelevant descriptors are included, the quality of prediction and robustness of the developed QSAR model may decrease, and its interpretation becomes more difficult. Hence, descriptors selection is necessary for QSAR study.

In this work, the FS regression was employed to select the optimal subset from an original set of 709 calculated descriptors, as also did and
described in other studies [12]. As a result, 13 descriptors were obtained, which are listed in **Table 2** with their physical-chemical meanings.

**PLS method**

PLS regression was adopted here to develop QSAR model, for this method can analyze data with strongly collinear, noisy and numerous predictor variables [9,13]. PLS regression was carried out using the Simca-S package (Umetrics AB, Sweden). The conditions for computation were as follows: cross validation rounds=7, maximum iteration=200, missing data tolerance=50% and significance level (p) limit=0.05. Within Simca, the number of significant PLS components (model dimensionality) is determined by cross-validation. Cross-validation simulates how well a model predicts new data, and gives a statistic Q2cum (cumulative Q2, Q2 means the fraction of the total variation of the dependent variables that can be predicted by a component) for the final PLS model [14]. Q2cum is a good measure of the predictive power and robustness of the model. When Q2cum of a model is larger than 0.5, the model is believed to have a good predictive ability [15]. Besides Q2cum, model adequacy mainly was measured as the number of PLS components (A), the correlation coefficient between observed and predicted values (R), and the significance level (p). In addition, a general standard error (SE) was adopted to compare the prediction precision of different models. SE was defined like that in multiple regression analysis, i.e,

(1)

where n stands for the number of observations in the training set.

In PLS-regression modeling, a predictor variable may be important for the modeling of Y. Such variables are identified by large PLSregression coefficients. However, a variable also may be important for modeling of X, which is identified by large loadings. A summary of the importance of an X-variable for both Y and X is given by a parameter, variable importance for the projection (VIP), which is a weighted sum of squares of the PLS-weights, with the weights calculated from the amount of Y-variance of each PLS component [9]. Therefore, terms with large values of VIP are the most relevant for explaining the dependent variable.

All the predictor variables are not necessarily to be included in a PLS model. Inclusion of redundant variables may lead to PLS models with low statistical significance [5]. Accordingly, the following PLS analysis process was followed to obtain an optimal model. First, a PLS model with all the predictor variables was calculated. Then each variable was eliminated and new PLS analysis was performed, leading to a series of new PLS models. The one with the largest Q2cum was selected. If there were several models with the same Q2cum, the model was selected that eliminated the variable for which the VIP was the lowest in the previous model. This procedure was repeated until two predictor variables were left. Finally, the model with the largest Q2cum was selected as the optimal PLS model.

**SVM method**

SVM is a new and very promising classification and regression method developed by Vapnik et al. [16]. A detailed description of the theory of SVM can be referred in several excellent books and tutorials [17,18]. SVMs are originally developed for classification problems; they can also be extended to solve nonlinear regression problems by the introduction of Vapnik’s ε-insensitive loss function. The SVM method has a number of interesting properties, including an effective avoidance of overfitting, which improves its ability to build models using large numbers of molecular proterty descriptors with relatively few experimental results in the training set. The application of SVM in regression can be expressed in the following way [14,19,20]:

Suppose the training data,

(2)

where xm is the independent variables assembly of No. m sample (n-dimensional); ym is the independent variables assembly of No. m sample which is a measured value; k is the total number of training set. The kernel idea of SVM algorithms is to make a regression hyper plane, which can do the best to fit samples in space. The linear function is formulated in the high dimensional feature space, with the form of function:

(3)

where Φ(x) is the high dimensional feature space, which is nonlinearly mapped from the input space x, w is the weight vector to be identified in the function, and b is the threshold.

The optimal regression function is given by the minimum of the functional,

(4)

where C is a pre-specified value, and are slack variables representing upper and lower constraints on the outputs of the system.

(5)

In this work, anε -insensitive loss function was used which can be presented in **Figure 1**. Based on the ε-insensitive loss function and Lagrange function, the original fitting problems can be transformed as the corresponding dual
Lagrangian form, which can be given by,

(6)

Or alternatively,

(7)

With constrains,

(8)

Solving Equation.(6) with constraints Equation.(8) determines the Lagrange multipliers: , and the regression function is given by Equation.(3), where

(9)

Finally, considering kernel function* K*(*x*, *x _{i}*), the space transformation of inner product operation can be realized. By introducing Lagrange multipliers and exploitingthe optimality constraints, decision function can take thefollowing form:

(10)

Where and α_{i}are the introduced Lagrange multipliers.

According to Karush–Kuhn–Tucker (KKT)conditions, only a number of coefficients among and willbe nonzero, and the data points corresponding tothem could bedefinedas support vectors, which can determine the hyper plane[20]. In this equation, *K*(*x*, *x _{i}*)refers to kernelfunction, including linear, polynomial, radial basisfunction (RBF), and sigmoid function.

The regression performance of SVM depends on the combinationof several parameters[19]. They are penalty factor *C*, *ε* of the*ε*-insensitive loss function, the kernel type,and its corresponding parameters. The penalty factor *C* is a regularizationparameter that controls the tradeoff between maximizing themargin and minimizing the training error. If *C* is too small,hen insufficient stress will be placed on fitting the trainingdata. If *C* is too large, then the algorithm will over fit thetraining data.The optimal value for*ε*depends on the type of noisepresent in the data, which is usually unknown. Even ifenough knowledge of the noise is available to select anoptimal value for*ε*, there is thepractical consideration ofthe number of resulting support vectors. “*ε*-insensitivity”prevents the total training set meeting boundary conditionsand so allows for the possibility of sparsely in the dualformulation’s solution. So, choosing the appropriate valueofεis critical from theory.The kernel type is another important one. In our study, the Gaussian radial basisfunction is selected, because it has only one kernel parameter and has been commonly used in regression, shown as below:

(11)

The kernel parameter *σ*controls the amplitudeof the Gaussian function andcontrols the generalizationability of SVM. We have to optimize *σ* and findthe optimal one. So we should take effective and reliable measures to set the three parameters in RBF-SVM. In this study, Random Search Technique is proposed.

The overall performance of SVM is evaluated in terms of *R* and a root-mean-squared error (RMSE) according to the equation below

(12)

Where y_{k} is the desired output, is the actual output of the SVM model, and *n* is the number of compounds in analyzed set.

The SVM model in our present study was implemented using the software LibSVM that is efficient software for classification and regression developed by Chih-Chang and Chih-Jen Lin [21]. All the algorithms used in this study were written inMatlab 7.0 and run on a personal computer (Intel Celeron-420 processor /1.66GHz 512MB RAM).

**PLS model**

After the FS regression, thirteen molecular structural descriptors are obtained and are listed in **Table 2** with their physical-chemical meanings.

The PLS regression was used to perform regression analysis, with log*REC*10 _{as} a dependent variable and thirteen selected 3D descriptors as independent variables. According to the variable selection procedure mentioned above, 8 descriptors (*Mor*_{03p}, *L*_{3e}, *R*_{8p}, *R*_{Tv}^{+}, *R*_{8e}, *R*_{1p}^{+}, *R*_{7p}^{+} and *HATS*_{v}) were obtained. The predicted log*REC*_{10} values by the PLS method are given in **Table 1**, and the statistical values of *Q*^{2}_{cum}, SE and correlation coefficient *R* are shown in **Table 3**. The linear function [3] was built as follows, with parameters defined in **Table 2**:

log*REC*_{10} = 8.253 + 0.282*Mor*_{03p} - 1.820*L*_{3e} - 0.988*R*_{8p} +

3.284*R*_{Tv}^{+} - 2.774*R*_{8e} + 0.567*R*_{1p}^{+} - 11.915*R*_{7p}^{+} - 0.149*HATS*_{v} (3)

*n*=55, *A=*2, *R*^{2}_{X(adj)(cum)}=0.587, *R*^{2}_{Y(adj)(cum)}=0.757

Where *A* is the number of PLS components,* R*^{2}_{X(adj)(cum)} and* R*^{2}_{Y(adj)(cum)} stand for cumulative variance of all the predictor variables and dependent variable, respectively, explained by all extracted components.So it can be concluded that two PLS components were selected in the QSAR model (Equation. 3), and the two PLS components explained 58.7% of the variance of the independent variables, and 75.7% of the variance of the dependent variable.* R*^{2}_{Y(adj)(cum)} should act as a criterion for optimal variable selection since they describe the performance of models.

*Q*^{2}_{cum} value of our universal QSAR model is as high as 0.678, indicating that the model has good predictive ability and robustness. **Figure 1** shows these predicted values of log*REC*_{10} versus experimental values. Concerning the goodness of fit of the model, the correlation coefficient(*R*) between the observed and the predicted log*REC*_{10} with multiple correlation coefficientsis 0.870. All the absolute residuals are less than 3×SE and it indicated that there were not outliers (**Figure2**). Therefore it can be concluded that the fitting results are satisfactory.

All the predictor variables are listed in **Table 4**. The *VIP* values indicate the significance of the variable in explaining the variance of the dependent variable.

**SVM model**

c=64.0, g=0.0078125, p=0.0625

R^{2}=0.888, MSE=0.021, SE=0.527

**Model validation**

Model validation is one of the most important processes of QSAR development [22]. Any model needs to be validated before it is used for “understanding” or for predicting new events such as the biological activities of new compounds or the yield and impurities at other process conditions. Many researchers apply the leave-one-out (LOO) or leave-some-out (LSO) cross-validation procedures. Another widely used approach to estimate model robustness is the so called y-randomization [23]. However, some researches suggested that the only way to estimate the true predictive power of a QSAR model is to compare the predicted and observed activities of an external test set of compounds that were not used in the model development [23-25].

To validate the developed QSAR model, approximately 60% of the compounds under study were selected randomly and used to develop a new
PLS model using the same descriptors, then the new model was used to predicate the log RP values of the remaining 40% compounds. The
procedure was repeated 10 times, and the final results are shown in **Table 3**. From the results, it indicates that the developed QSAR models have
good robustness and predictive ability.

**Comparison of the results**

**Figure 4** gives the comparison between the results obtained by PLS and SVM based on the SE. As shown in **Table 3**, the SVM model gives the
highest correlation coefficient R. It indicates that the SVM performed better than the PLS method. It also showed the better generalization ability.
The reason may be that the SVM method embodies the structural risk minimization principle which minimizes an upper bound of the
generalization error rather than minimizes the training error. This eventually leads to better generalization than neural networks, which
implement the empirical risk minimization principle and may not converge to global solutions.

**Mechanism interpretation**

From a practical point of view, interpreting the descriptors used in the models could provide some insight into factors that are likely to govern the ER binding of EDCs and help us to understand which interactions may play an important role in the binding process.

Model (3) extracts two PLS components that are relevant to 8 predictor variables. The factors governing logREC_{10} can be interpreted by PLS
weights of the variables included in model (3). The respective weights of the 8 calculated descriptors retained in the PLS model are shown in **Table 4**. From the PLS weights, one could see how much one descriptor contributes to the interpretation of the variance of estrogenic activity and
how they relate to each other.

The first PLS component is loaded primarily on the five descriptors, L_{3e}, Mor_{03p}, R_{7p}^{+}, R_{8e} and HATS_{v} (Mor_{03p} on the positive side, L_{3e},
R_{7p}^{+}, R_{8e} and HATS_{v} on the negative). 3D-MoRSE descriptors appearing in the model are important because they take into account the 3D
arrangement of the atoms without ambiguities (in contrast with those coming from chemical graphs), and also because they do not depend on
the molecules with great structural variance and being a characteristic common to all of them. This type of indices are based on the idea of
obtaining information from the 3D atomic coordinates by the transform used in electron diffraction studies for preparing theoretical scattering
curves [26]. In order to take into account the specific contributions of the atoms to the property being studied, different atomic properties can be
employed as weighting schemes. Mor_{03p} corresponds to signal 03 and is weighted by atomic polarizabilities. On the other hand, for L_{3e}, R_{7p}^{+},
R_{8e} and HATS_{v}, W*[1] and the corresponding coefficient in model (3) are both negative. L_{3e} is a WHIM descriptor weighted by atomic
Sanderson electronegativities, and it remarkably governs logREC_{10}, as indicated by its VIP, the largest among the predictor variables. R_{7p}^{+} and
R8e are R-GETAWAY descriptors, which are derived by combines the information provided by the molecular influence matrix with geometric
interatomic distances in the molecule [27]. The negative PLS weights W*[1] and coefficient of R_{7p}^{+} and R_{8e} in the model (3) indicate the
negative correlation relationship between them and logREC_{10}. This type of elaborated 3D descriptors is able to determine the shape and size of
the inhibitor. The descriptor R_{7p}^{+} is R maximal autocorrelation of lag 7 weighted by atomic polarizabilities and R_{8e} is autocorrelation of lag 8
weighted by atomic Sanderson electronegativities. HATS_{v} is an H-GETAWAY descriptor, which encodes both the geometrical information
given by the molecular influence matrix H and the topological information given by the molecular graph, weighted by selected atomic weights.
The selected descriptor HATS_{v} is a leverage-weighted total index weighted by atomic van der Waals volumes.

The second PLS component that also extract five descriptors, L_{3e}, R_{8e}, HATS_{v}, R_{8p} and R_{1p}^{+}. The negative PLS weights W* [2] and
coefficient of L_{3e}, R_{8e} and R_{8p} in the model (3) also indicate the negative correlation relationship between them and logREC_{10}. R_{1p}^{+} is also an
R-GETAWAY descriptor weighted by atomic polarizabilities.

In conclusion, the molecular descriptors most frequently selected by FS regression can be used to predict the logREC_{10} value of organic
chemicals. The estrogenic activity is related to distributed atomic Sanderson electronegativities, atomic polarizabilities and atomic van der Waals
volumes.

**Local QSAR models**

As shown in **Table 1**, the chemicals were divided into six “families” based on their structural characters. They were: (1) natural products and
related compounds; (2) medicines, food additives, and related compounds; (3) PCBs, PCDFs, PAHs, and related compounds; (4) Phenols; (5)
Benzenes and heterocyclics; (6) Phthalates and adipates. To increase our understanding of the structural requirements for a chemical’s binding to
ER, three linear QSAR models for three of the families were developed based on the PLS regression method, which are listed in the **Table 5**.

(1) natural products and related compounds

logREC_{10}=29.602 + 24.545E_{1e} - 29.320H_{1v} - 30.144 R_{4e} (5)

n=12, Q_{2}^{cum}=0.877, R=0.957, SE=0.549, p < 0.0001

(2) medicines, food additives, and related compounds:

logREC_{10}=7.913 + 1.910 RDF_{020m} - 6.706L_{3p} (6)

n=7, Q_{2}^{cum}=0.934, R=0.989, SE=0.420, p < 0.0001

(3) phenols:

logREC_{10}=16.474 + 0.040MWC_{01} - 1.846GATS_{4v} - 1.710GATS_{1e} -1.423Mor_{02m} + 0.807Mor_{21e} + 0.695E_{2e} - 8.037A_{s}+ 0.009G_{m} - 0.668V_{m} - 0.878HATS6_{p} (7)

n=25, Q_{2}^{cum}=0.915, R=0.983, SE=0.157, p < 0.0001

Inspecting the knowledge obtained above, it is possible to gain some information about what factors are likely to govern the ER binding ability for a specific family. This is beneficial for developing a credible model for prediction.

The two methods, PLS and SVM, were used to develop linear and nonlinear QSARs to predict estrogenic activities of 55 structurally diverse organic chemicals. Eight descriptors, which represent the features of the compounds responsible for the binding ability to estrogen receptor, were selected to develop global QSAR models. Inspection of the PLS model indicates that atomic Sanderson electronegativities, polarizabilities and van der Waals volumes may be most relevant factor controlling the binding behavior, affecting the space-matching between the ER protein and the ligand. The two resultant QSAR models were further compared with respect to statistical measures from a leave-some-out process, with SVM yielding the best model performance in terms of self-consistency and ability to predict the activity of the test chemicals. Additionally, 55 chemicals were divided into six well-known “families” according to their chemical structures. Three local QSAR models were developed, which gave us insight into the factors that govern the binding behavior for these specific chemicals.

- Zacharewski T (1997) In vitro bioassays for assessing estrogenic substances. Environmental Science & Technology 31: 613-623.
- Fang H, Tong WD, Branham WS, Moland CL, Dial SL, et al. (2003) Study of 202 natural, synthetic, and environmental chemicals for binding to the androgen receptor. Chem. Res. Toxicol 16: 1338-1358.
- Waller CL, Oprea TI, Chae K, Park HK, Korach KS,et al. (1996) Ligand-based identification of environmental estrogens. Chem Res Toxicol 9: 1240-1248.
- Tong WD, Perkins R, Xing L, Welsh WJ, Sheehan DM (1997) QSAR models for binding of estrogenic compounds to estrogen receptor alpha and beta subtypes. Endocrinology 138: 4022-4025.
- Tong W, Lowis DR, Perkins R, ChenY, Welsh WJ, et al. (1998) Evaluation of quantitative structure-activity relationship methods for large-scale prediction of chemicals binding to the estrogen receptor. J Chem Inf Comput Sci 38: 669-677.
- Wiese TE, Polin LA, Palomino E, Brooks SC (1997) Induction of the estrogen specific mitogenic response of MCF-7 cells by selected analogues of estradiol-17 beta: A 3D QSAR study. J Med Chem 40: 3659-3669.
- Costantino G, Macchiarulo A, Camaioni E, Pellicciari R (2001) Modeling of poly(ADP-ribose)polymerase (PARP) inhibitors. Docking of ligands and quantitative structure-activity relationship analysis. J Med Chem 44: 3786-3794.
- Nishihara T, Nishikawa J, Kanayama T, Dakeyama F, Saito K,et al. (2000) Estrogenic activities of 517 chemicals by yeast two-hybrid assay. Journal of Health Science 46: 282-298.
- Wold S, Sjostrom M, Eriksson L (2001) PLS-regression: a basic tool of chemometrics. Chemometr Intell Lab 58: 109-130.
- Coldham NG, Dave M, Sivapathasundaram S, McDonnell DP, Connor C, et al. (1997) Evaluation of a recombinant yeast cell estrogen screening assay. Environ Health Perspect 105: 734-742.
- Todeschini R (2000) Handbook of molecular descriptors. Wiley-VCH, Weinheim, Germany, 2000.
- Morales AH, Duchowicz PR, Perez MAC, Castro EA, Cordeiro MNDS, et al. (2006) Application of the replacement method as a novel variable selection strategy in QSAR. 1. Carcinogenic potential. Chemometr Intell Lab 81: 180-187.
- Ding GH, Chen JW,Qiao XL, Huang LP, Lin J et al. (2006) Quantitative relationships between molecular structures, environmental temperatures and solid vapor pressures of PCDD/Fs. Chemosphere 62, 1057-1063.
- Yao XJ, Panaye A, Doucet JP, Zhang RS, Chen HF, et al. (2004) Comparative study of QSAR/QSPR correlations using support vector machines, radial basis function neural networks, and multiple linear regression. J Chem Inf Comput Sci 44: 1257-1266.
- Golbraikh A, Tropsha A (2002) Beware of q(2)! J Mol Graph Model 20: 269-276.
- Vapnik (1998) Statistical Learning Theory. John Wiley & Sons: New York.
- Cristianini NS (2000) An Introduction to Support Vector Machines. Cambridge University Press: Cambridge, UK.
- Schölkopf BS, Alexander JS (2002) Learning with Kernels. MIT Press: Cambridge, MA.
- Liu HX, Xue CX, Zhang RS, Yao XJ, Liu MC,et al. (2004) Quantitative prediction of logk of peptides in high-performance liquid chromatography based on molecular descriptors by using the heuristic method and support vector machine. J Chem Inf Comput Sci 44: 1979-1986.
- Xue CX, Zhang RS, Liu HX, Yao XJ, Liu MC,et al. (2004) QSAR models for the prediction of binding affinities to human serum albumin using the heuristic method and a support vector machine. J Chem Inf Comput Sci 44: 1693-1700.
- Chang CCL, CJ LIBSV- A Library for Support Vector Machines.
- Golbraikh A, Shen M, Xiao Z, Xiao YD, Lee KH, (2003) Rational selection of training and test sets for the development of validated QSAR models. J Comput Aided Mol Des 17: 241-253.
- Golbraikh A, Tropsha A (2002) Predictive QSAR modeling based on diversity sampling of experimental datasets for the training and test set selection. J Comput Aided Mol Des 16: 357-369.
- Norinder U (1996) Single and domain mode variable selection in 3D QSAR applications. Journal of Chemometrics 10: 95-105.
- 25) Zefirov NS, Palyulin VA (2001) QSAR for boiling points of "small" sulfides. Are the "high-quality structure-property-activity regressions" the real high quality QSAR models? J Chem Inf Comput Sci 41: 1022-1027.
- Duchowicz PR, Fernández M, Caballero J, Castro EA, Fernández FM (2006) QSAR for non-nucleoside inhibitors of HIV-1 reverse transcriptase. Bioorganic & Medicinal Chemistry 14: 5876-5889.
- Consonni V, Todeschini R, Pavan M (2002) Structure/response correlations and similarity/diversity analysis by GETAWAY descriptors. J Chem Inf Comput Sci 42, 682-692.

Select your language of interest to view the total content in your interested language

- Atmosphere
- Atmospheric Chemistry
- Atmospheric inversions
- Beach Evolution
- Beach Nourishment
- Beach Placers
- Biosphere
- Chemical Oceanography
- Coastal Anthropogenic Pressure
- Coastal Aquifer
- Coastal Bar/ Coastal Spit
- Coastal Deposition
- Coastal Development
- Coastal Dunes
- Coastal Engineering
- Coastal Erosion
- Coastal Geomorphology
- Coastal Irrigation
- Coastal Lagoon
- Coastal Mining
- Coastal Plain
- Coastal Ports
- Coastal Region/ Coastal Area
- Coastal Upwelling
- Coastal Waters
- Coastal Wetlands
- Crystallography
- Disaster Science
- Dynamical Oceanography
- Earth Science
- Ecology
- Environmental Oceanography
- Experimental Oceanography
- Fisheries Oceanography
- Fishery
- Gemology
- Geochemistry
- Geochronology
- Geomicrobiology
- Geomorphology
- Geosciences
- Geostatistics
- Glaciology
- Hydrologic
- Integrated Coastal Zone Management
- Limnology and Oceanography
- Marine Biotechnology
- Marine Chemistry
- Marine Ecosystem
- Marine Isotope Stage
- Marine Sediment
- Marine Spatial Planning
- Marine and Coastal Biodiversity
- Marine-Engineering
- Meridional Overturning Circulation
- Microbial Oceanography
- Mineralogy
- Oceanography and Marine Biology
- Operational Oceanography
- Physical Oceanography
- Sea Surface Salinity
- Shallow Water Equation
- Stratified Flow

- Total views:
**12020** - [From(publication date):

June-2014 - Sep 19, 2018] - Breakdown by view type
- HTML page views :
**8248** - PDF downloads :
**3772**

Peer Reviewed Journals

International Conferences 2018-19