A Metabonomic Study on Samples of Cutaneous Leishmaniasis and its Correlation with the Selenium Level in the Blood Serum



Introduction
¹H NMR spectroscopy of biofluids (urine, serum/plasma) and tissue generates comprehensive biochemical profiles of low-molecular-weight endogenous metabolites [1]. NMR spectroscopy has emerged as a key tool for understanding metabolic processes in living systems [2]. Metabonomics is formally defined as "the quantitative measurement of the multi-parametric metabolic response of living systems to pathophysiological stimuli or genetic modification" [3,4]. This approach is complementary to proteomics and genomics and is applicable to a wide range of problems in diverse biomedical research areas. Spectroscopic techniques for metabonomics are often used in a so-called "hyphenated" mode (e.g., LC-NMR-MS); however, NMR-based metabonomics has proven particularly well suited to the rapid analysis of complex biological samples [5-8]. Detailed inspection of NMR spectra and integration of individual peaks can give valuable information on dominant biochemical changes. Moreover, pattern-recognition methods can map the NMR spectra into a lower-dimensional space (lower than that implied by the number of points in the digital representation of each spectrum), making it easier to identify similarities in biochemical profiles among samples and to determine the biochemical basis for these similarities [5].
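As a rough illustration of mapping spectra into a lower-dimensional space, the sketch below projects a matrix of binned spectra (one row per sample) onto its first principal components using NumPy's singular value decomposition. It is a minimal PCA sketch, assuming the spectra have already been binned and normalized; `pca_scores` is a hypothetical helper, not part of the authors' software, and the data in the usage lines are random stand-ins.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Project rows of X (samples x spectral bins) onto the first principal components.

    Hypothetical helper for illustration; assumes X is already binned and normalised.
    """
    Xc = X - X.mean(axis=0)                              # mean-centre each spectral bin
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]      # sample coordinates (t)
    loadings = Vt[:n_components].T                       # bin contributions (p)
    explained = s[:n_components] ** 2 / np.sum(s ** 2)   # fraction of variance per PC
    return scores, loadings, explained


# Usage with random stand-in data (58 samples x 240 bins), not real spectra
rng = np.random.default_rng(0)
X = rng.random((58, 240))
scores, loadings, explained = pca_scores(X)
print(scores.shape, loadings.shape)  # (58, 2) (240, 2)
```

Samples that lie far from the main cluster in such a score plot are candidates for outliers; the loadings show which spectral bins drive each component.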
Leishmaniasis is a parasitic disease found in 88 countries throughout Africa, Asia, Europe, and North and South America [9]. There are an estimated 12 million cases worldwide, with 1.5 to 2 million new cases each year [9]. The disease is caused by Leishmania parasites, which are spread by the bite of infected sand flies. There are several forms of leishmaniasis in humans; the most common is cutaneous leishmaniasis (CL), which causes skin sores.
Endemic CL is still considered an important health problem in many parts of the world, especially the Mediterranean region and almost all countries of the Middle East, including Iran [10]. In most infectious diseases, increased formation of reactive oxygen species is secondary to the primary disease process. Some microorganisms are highly susceptible to exogenous reactive oxygen species such as hydrogen peroxide [11]. Glutathione peroxidase (GSH-Px) removes H₂O₂, and selenium is required for its activity. Therefore, to assess the status of GSH-Px and selenium in CL, the serum selenium concentration of 20 patients with CL and 44 healthy controls was determined (Table 1). Selenium concentrations were lower in the patient group than in the controls (p < 0.0001) [11]. Selenium has an important role in the pathophysiological processes of CL, and decreased selenium levels may be a host defense strategy against CL infection. The ¹H NMR spectral profiles of serum, however, are complex and are more easily interpreted with automated data reduction and chemometric analysis [4], such as partial least squares (PLS), artificial neural networks (ANN) and multiple linear regression (MLR). PLS is used here as a supervised method that separates the dataset into two groups: healthy and patient samples. Stepwise MLR, which combines forward selection and backward deletion, is a common method for variable selection. ANN is a non-linear method for building a robust model between the response and the selected variables.

Sample collection
Blood was drawn from 20 leishmaniasis-positive patients attending the Pasteur Institute of Iran in Tehran. Samples were allowed to clot in anticoagulant-free Vacutainer tubes for 2 hours at room temperature (RT), and the serum was separated by centrifugation for 10 minutes at 2500 rpm at RT. Serum aliquots were stored at -80 °C until assayed.
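A speed in rpm does not by itself fix the force applied to the sample, so the relative centrifugal force (RCF) is often reported instead, using the standard conversion RCF = 1.118 × 10⁻⁵ × r × N² (r = rotor radius in cm, N = speed in rpm). The rotor radius is not stated in the text, so the 10 cm below is a purely hypothetical value used only to show the arithmetic.

```python
def rcf_from_rpm(rpm, radius_cm):
    """Relative centrifugal force (multiples of g) from rotor speed (rpm) and radius (cm)."""
    return 1.118e-5 * radius_cm * rpm ** 2


# Hypothetical 10 cm rotor radius; the study reports only the speed (2500 rpm)
print(round(rcf_from_rpm(2500, 10.0), 2))  # 698.75
```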

¹H NMR spectroscopy
Prior to NMR analysis, serum samples (600 µL) were diluted with 60 µL of 52% D₂O (deuterium oxide, 99.9% D, Aldrich Chemicals Company, Wisconsin, USA) and placed in high-quality 5 mm NMR tubes (Sigma Aldrich, RSA). ¹H NMR spectra of each serum sample were acquired at 500 MHz and 300 K on a Bruker DRX spectrometer using the Carr-Purcell-Meiboom-Gill (CPMG) pulse sequence [10]. For each spectrum, 200 scans were collected into 32K data points, with a spectral width of 8389 Hz and a relaxation delay of 1.5 s. Metabolites were assigned on the basis of their chemical shifts and signal multiplicities.
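The acquisition parameters above are related by simple arithmetic. The sketch below converts the spectral width to ppm and estimates the acquisition time and digital resolution, assuming the Bruker convention that the "32K" data points count real and imaginary points together (so the FID holds 16K complex points). The total-time figure is only a lower bound, because it ignores dummy scans and the duration of the CPMG echo train.

```python
# Acquisition parameters reported in the text
spectrometer_mhz = 500.0   # nominal 1H observe frequency
sw_hz = 8389.0             # spectral width
td = 32 * 1024             # "32K" data points
d1_s = 1.5                 # relaxation delay
n_scans = 200

sw_ppm = sw_hz / spectrometer_mhz   # spectral width in ppm

# Assumption: td counts real + imaginary points (Bruker convention),
# giving td / 2 complex points spaced 1 / sw_hz apart
aq_s = td / (2 * sw_hz)             # acquisition time per scan
fid_res_hz = sw_hz / (td / 2)       # digital resolution before zero filling

# Lower bound on experiment time: ignores dummy scans and the CPMG echo train
total_min = n_scans * (aq_s + d1_s) / 60

print(f"SW = {sw_ppm:.2f} ppm, AQ = {aq_s:.3f} s, "
      f"resolution = {fid_res_hz:.3f} Hz, time >= {total_min:.1f} min")
# prints: SW = 16.78 ppm, AQ = 1.953 s, resolution = 0.512 Hz, time >= 11.5 min
```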

Atomic absorption spectrophotometry
The selenium level in each serum sample was determined with a Spectra AA 220 Plus Zeeman atomic absorption spectrophotometer equipped with a GTA-110 graphite furnace (Varian, Australia) and deuterium background correction. Varian hollow cathode lamps were operated at a wavelength of 196 nm with a 1.0 nm bandpass. Pyrolytically coated graphite tubes with pyrolytic graphite platforms (Varian) were used. Selenium concentration was determined by an internal standard addition method as previously described [12]. The serum was diluted (1:4) with 0.05% Triton X-100 in 0.125% nitric acid. Using an autosampler, 10 µL of the diluted solution was dispensed onto the atomizer platform together with 10 µL of 1 mg/mL palladium chloride and 2 mL of 2% (w/v) ascorbic acid solution. All determinations were run in duplicate and the individual values were averaged.
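A standard-addition calibration is usually evaluated by fitting a straight line to absorbance versus added analyte and extrapolating to zero absorbance. The sketch below assumes that textbook linear form; the spike levels and absorbances are invented for illustration, and the dilution factor depends on how "1:4" is read (5 if one part serum is mixed with four parts diluent, 4 if one part is made up to four parts in total).

```python
import numpy as np


def standard_addition(added, absorbance, dilution_factor=1.0):
    """Analyte concentration by the method of standard additions.

    added           -- Se concentration spiked into each aliquot of the measured solution
    absorbance      -- background-corrected absorbance of each aliquot
    dilution_factor -- factor relating the measured solution to the original serum
    """
    slope, intercept = np.polyfit(added, absorbance, 1)
    # The calibration line crosses zero absorbance at x = -intercept / slope;
    # its magnitude is the concentration already present in the measured solution.
    c_measured = intercept / slope
    return c_measured * dilution_factor


# Invented readings: measured concentration 20 ug/L, sensitivity 0.005 per ug/L
added = np.array([0.0, 10.0, 20.0, 30.0])        # ug/L added
absorbance = 0.005 * (20.0 + added)
print(standard_addition(added, absorbance, dilution_factor=5.0))  # approximately 100 ug/L
```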

NMR data reduction
All serum ¹H NMR spectra were manually phased and baseline-corrected using the Chenomx NMR Suite (version 6.0). The 0.0-10.0 ppm spectral region was divided into 250 integral segments of equal width (0.04 ppm), and the integral of each of the 250 regions was measured (Figure 1). The region containing the water resonance (4.6-5.0 ppm) was removed from each segmented spectrum, eliminating spectral variation caused by differences in water suppression from one spectrum to the next [5,13,14]. The 0.04 ppm segment width was chosen on the basis of previous studies, which found that regions of this width accommodate small pH-related shifts in signals and variations in shimming quality [5]. The integral values of each spectral region were then normalized to a constant total sum of all integrals in the spectrum to compensate for differences in overall sample concentration [13,14].
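The bucketing steps above (equal-width segments, removal of the water region, normalization to a constant sum) can be sketched in NumPy. This is not the Chenomx implementation; it is a minimal illustration assuming uniformly sampled points, and `bin_spectrum` is a hypothetical helper name.

```python
import numpy as np


def bin_spectrum(ppm, intensity, start=0.0, stop=10.0, width=0.04,
                 water=(4.6, 5.0)):
    """Equal-width bucketing, water-region removal and total-sum normalisation.

    ppm, intensity -- 1D arrays for one phased, baseline-corrected spectrum
    Returns bin centres and normalised bin integrals (summing to 1).
    """
    n_bins = int(round((stop - start) / width))          # 250 bins for 0-10 ppm
    edges = np.linspace(start, stop, n_bins + 1)
    idx = np.digitize(ppm, edges) - 1                    # bin index of each point
    inside = (idx >= 0) & (idx < n_bins)
    # Summing points approximates the integral (the constant point spacing
    # cancels after normalisation)
    integrals = np.bincount(idx[inside], weights=intensity[inside],
                            minlength=n_bins)
    centres = (edges[:-1] + edges[1:]) / 2
    keep = ~((centres > water[0]) & (centres < water[1]))  # drop 4.6-5.0 ppm
    integrals = integrals[keep]
    return centres[keep], integrals / integrals.sum()


# Usage with a flat synthetic spectrum
ppm = np.linspace(10.0, 0.0, 32768, endpoint=False)   # descending, as usually plotted
centres, values = bin_spectrum(ppm, np.ones_like(ppm))
print(len(centres))  # 240
```

Removing the ten 0.04 ppm segments between 4.6 and 5.0 ppm leaves 240 of the original 250 regions.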

Statistical analysis
Orthogonal signal correction (OSC): OSC is a preprocessing step for NMR spectral analysis that removes variation in the spectral data that is orthogonal to (uncorrelated with) the class of interest. Because it can filter out confounding factors that obscure interesting biological variation, it has been used increasingly in metabonomics studies [15,16]. In the OSC procedure, the identity of the sample classes is included in the calculation and designated by a vector, Y. The first step calculates the first principal component score vector, t, an optimal linear summary of the spectral data (the direction of maximum variance).
Next, the part of t that is orthogonal to Y is calculated. This vector, t*, describes the greatest source of variation that is not related to class yet still provides a good summary of the spectral data [17]. It is obtained as:
$$ t^{*} = \left(I - Y\,(Y^{T}Y)^{-1}Y^{T}\right) t $$
The loading vector, p*, corresponding to this orthogonal score vector is then calculated, and the product of the orthogonal score and loading vectors is subtracted from the spectral data matrix X:
$$ p^{*} = \frac{X^{T}t^{*}}{t^{*T}t^{*}}, \qquad E = X - t^{*}p^{*T} $$
Examination of the OSC scores t* and loadings p* can help determine the source of the removed variation. The residual matrix E represents the filtered spectral data and is used for the PLS calculation.
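A single OSC component, following the steps described above (first principal component score t, orthogonalization against Y, loading p*, subtraction), might be sketched as below. This is a simplified single-pass version: the published OSC algorithm also computes weight vectors (so that new samples can be corrected) and iterates until t* converges, both of which this sketch omits. `osc_one_component` is a hypothetical name and the data are random stand-ins.

```python
import numpy as np


def osc_one_component(X, y):
    """Remove one orthogonal component from X (samples x variables).

    y -- class vector (e.g. 0 = healthy, 1 = patient)
    Returns the filtered matrix E, the orthogonal score t* and loading p*.
    """
    Xc = X - X.mean(axis=0)
    Y = (y - y.mean()).reshape(-1, 1)
    # Step 1: first principal component score of X (direction of maximum variance)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    t = U[:, :1] * s[0]
    # Step 2: t* = (I - Y (Y^T Y)^-1 Y^T) t, i.e. the part of t orthogonal to Y
    t_star = t - Y @ np.linalg.solve(Y.T @ Y, Y.T @ t)
    # Step 3: loading for t*, then subtract the orthogonal component
    p_star = Xc.T @ t_star / (t_star.T @ t_star)
    E = Xc - t_star @ p_star.T
    return E, t_star, p_star


rng = np.random.default_rng(1)
X = rng.normal(size=(20, 30))
y = np.array([0] * 10 + [1] * 10)
E, t_star, p_star = osc_one_component(X, y)
```

By construction t* carries no information about class, and the filtered matrix E no longer contains any variation along t*.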

PLS:
The PLS model was calculated and plotted using MATLAB. PLS uses both an X matrix and a Y matrix; here, Y is a vector describing the selenium concentration of each sample. The X matrix is decomposed into a set of orthogonal components, but instead of describing the maximum variance in X (as in PCA), these components describe the maximum covariance between X and Y. The scores, T, relate X and Y to each other. For each PLS component, Y is described as a linear combination of all the X variables, and the weights, w, indicate how important each variable is for describing the response.
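The authors computed PLS in MATLAB; as an illustration, the sketch below implements single-response PLS (PLS1) with the NIPALS algorithm in NumPy. It shows how each weight vector w points in the direction of maximum covariance between X and y, and how the scores T and regression coefficients follow. It is a minimal sketch assuming a single numeric response (for class discrimination, y can be coded 0/1 as in PLS-DA); the function name is hypothetical.

```python
import numpy as np


def pls1_nipals(X, y, n_components=2):
    """Single-response PLS via NIPALS on mean-centred data.

    Returns scores T, weights W and coefficients B such that
    y - mean(y) is approximated by (X - mean(X)) @ B.
    """
    X = X - X.mean(axis=0)
    y = y - y.mean()
    n, m = X.shape
    T = np.zeros((n, n_components))
    W = np.zeros((m, n_components))
    P = np.zeros((m, n_components))
    q = np.zeros(n_components)
    for a in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)          # weight: direction of max covariance with y
        t = X @ w                       # score
        tt = t @ t
        p = X.T @ t / tt                # X loading
        q[a] = y @ t / tt               # y loading
        X = X - np.outer(t, p)          # deflate X
        y = y - q[a] * t                # deflate y
        T[:, a], W[:, a], P[:, a] = t, w, p
    B = W @ np.linalg.solve(P.T @ W, q)
    return T, W, B
```

With as many components as X has independent columns, PLS1 reproduces ordinary least squares; with fewer components (two latent variables in this study) it acts as a regularized, covariance-driven projection.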

Multiple linear regression:
The general purpose of multiple linear regression is to quantify the relationship between several independent (predictor) variables and a dependent variable. In this study, a set of coefficients defined the single linear combination of independent variables (descriptors, here integrated spectral regions) that best described selenium levels. The selenium concentration of each serum sample was calculated as the sum of the descriptors weighted by their respective coefficients. Stepwise MLR was first applied to select the best descriptors among the 250 variables, and the selected variables were then used to build a multilinear model of the form:
$$ y = b_{0} + b_{1}x_{1} + b_{2}x_{2} + \cdots + b_{k}x_{k} $$
where k is the number of independent variables, x_i are the descriptors and y is the dependent variable. The regression coefficients b_i represent the independent contributions of each descriptor. In matrix notation, the MLR model and its estimator are:
$$ \mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}, \qquad \hat{\mathbf{b}} = (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{y} $$
where e is the error vector and b̂ is the least-squares estimator of the regression coefficients [18]. The MLR model was built using a training set and validated with an external prediction set.

Artificial neural network model:
The theory behind artificial neural networks has been described in detail elsewhere [19]. A three-layer back-propagation network with a sigmoidal transfer function was written in MATLAB in our laboratory. The five descriptors appearing in the MLR model were used as input parameters for the network, and the signal from the output layer represented the selenium concentration. Such a network is denoted, for example, 5-3-1 to indicate the number of nodes in the input, hidden and output layers, respectively. A neural network has several empirically determined parameters: when to stop training (i.e. the number of iterations or the convergence criterion), the number of hidden nodes, and the learning rate and momentum terms. These parameters were optimized using the procedure reported in our previous works [20-22]. The initial weights were chosen randomly, with the randomization depending on the number of input, hidden and output nodes. The standard error of calibration (SEC) and standard error of prediction (SEP) were used to evaluate ANN performance [23]. The number of hidden neurons giving the minimum SEC was selected as optimal, and the learning rate and momentum were optimized in the same way. The validation set was used to examine the validity of the ANN model.
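The back-propagation scheme described above (three layers, sigmoidal hidden units, learning rate and momentum) can be sketched in NumPy. This is not the authors' MATLAB program: the linear output node, full-batch gradient descent, fan-in-scaled random initial weights and the default values of `lr`, `momentum` and `epochs` are all assumptions, and inputs should be scaled (e.g. z-scored) before training. The usage data are synthetic.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_ann(X, y, n_hidden=3, lr=0.1, momentum=0.5, epochs=2000, seed=0):
    """Three-layer back-propagation network (n_inputs-n_hidden-1).

    Sigmoidal hidden layer, linear output node (an assumption), full-batch
    gradient descent on half the mean squared error with a momentum term.
    """
    rng = np.random.default_rng(seed)
    n, n_in = X.shape
    # Random initial weights scaled by the number of incoming connections
    W1 = rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_hidden, 1))
    b2 = np.zeros(1)
    params = [W1, b1, W2, b2]
    velocity = [np.zeros_like(p) for p in params]
    y = y.reshape(-1, 1)
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)                 # hidden-layer activations
        err = (H @ W2 + b2) - y                  # output error
        dH = (err @ W2.T) * H * (1.0 - H)        # error back-propagated to hidden layer
        grads = [X.T @ dH / n, dH.mean(axis=0), H.T @ err / n, err.mean(axis=0)]
        for p, v, g in zip(params, velocity, grads):
            v *= momentum
            v -= lr * g                          # momentum update
            p += v                               # in-place weight update
    return params


def predict(params, X):
    W1, b1, W2, b2 = params
    return (sigmoid(X @ W1 + b1) @ W2 + b2).ravel()


rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))          # stand-in for five scaled descriptors
y = 0.5 + 0.1 * X.sum(axis=1)         # synthetic response, not real selenium data
params = train_ann(X, y)              # 5-3-1 network
mse = np.mean((predict(params, X) - y) ** 2)
```

In a full workflow, `n_hidden`, `lr` and `momentum` would be scanned and the combination with the lowest calibration error kept, while a separate test set decides when to stop training.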

Results and Discussion
PCA was first used to identify outliers [24]; after their removal, 58 samples remained (16 patients and 42 healthy individuals). PLS was then employed as a supervised method to classify the patient and healthy groups based on selenium concentration. Figure 1 shows the classification in a two-dimensional score plot with two latent variables. In addition, the PLS plots in Figure 2 and Figure 3 depict the importance of the variables selected by the stepwise MLR method. As shown in Table 3, the loading plot confirmed that threonine, lipids, lactate and alanine were the most important variables, in agreement with previous reports [25-30]. The next step was to model selenium levels on the basis of the five best descriptors. Table 2 shows the MLR-calculated selenium concentrations for all serum samples, and Table 4 presents the regression results for the selected MLR model. The R² values for the training, test and validation sets were 0.98, 0.97 and 0.94, respectively. Figure 4 shows the correlation between the MLR-calculated and experimental selenium levels for the test and validation sets; the correlation (R² = 0.95) indicates reasonable agreement between these values.
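The descriptor-selection and least-squares steps can be illustrated with a simplified sketch: greedy forward selection on training-set R², followed by an ordinary least-squares fit with intercept. It is a deliberate simplification of stepwise MLR (no backward deletion and no F-to-enter or F-to-remove thresholds). It uses `numpy.linalg.lstsq` rather than forming (XᵀX)⁻¹ explicitly, which is numerically safer and gives the same solution when XᵀX is invertible. It also computes the "mean effect" of each descriptor as its mean multiplied by its regression coefficient, as defined in the note on the MLR table. All helper names and the data are hypothetical.

```python
import numpy as np


def ols(X, y):
    """Least-squares coefficients [b0, b1, ..., bk] for y = b0 + X @ b."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    return coef[0] + X @ coef[1:]


def r2(y, y_hat):
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2)


def forward_select(X, y, n_select=5):
    """Greedily add the column that most increases training R^2."""
    chosen = []
    for _ in range(n_select):
        candidates = [j for j in range(X.shape[1]) if j not in chosen]
        scores = [r2(y, predict(ols(X[:, chosen + [j]], y), X[:, chosen + [j]]))
                  for j in candidates]
        chosen.append(candidates[int(np.argmax(scores))])
    return chosen


# Synthetic example: 50 samples x 20 bins; y depends on bins 3, 7 and 11 only
rng = np.random.default_rng(3)
X = rng.normal(size=(50, 20))
y = 3.0 + 2.0 * X[:, 3] - 1.5 * X[:, 7] + 0.5 * X[:, 11]
cols = forward_select(X, y, n_select=3)
coef = ols(X[:, cols], y)
mean_effect = X[:, cols].mean(axis=0) * coef[1:]   # mean x coefficient per descriptor
```

In practice R² should be reported separately for the training, test and validation sets, since training-set R² alone always increases as descriptors are added.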
Developing an ANN and comparing it with the MLR model provided an opportunity to investigate non-linear characteristics of the dependence of selenium concentration on the metabolites. For a meaningful comparison, the variables used in the linear and non-linear treatments should be the same; therefore, the five descriptors appearing in the MLR model were used as inputs for the networks. After optimization of the network parameters, a 5-3-1 architecture was obtained. A test set of 20 serum samples was used to optimize the number of training iterations and to avoid overtraining. To evaluate the network, the selenium concentrations of the validation set were predicted (Table 2). Figure 5 plots the ANN-predicted versus experimental selenium concentrations; for the validation set, the ANN predictions agree better with the experimental values than those obtained by MLR. Figure 6 shows the ANN residuals for different selenium concentrations, together with the experimental values. The spread of residuals on both the positive and negative sides of the zero line indicates that there was no systematic error in the development of the neural network. Table 4 gives more detailed data on the predictive power of the MLR and ANN models. The data suggest that both the linear and non-linear models agree well with the experimental results. The ANN predictions also showed much lower standard errors and a higher F-statistic than the MLR model. To assess the validity and stability of the models, leave-one-out cross-validation was performed; Table 4 summarizes the results.
[Figure: peak assignments: (1) lipids, (2) valine/isoleucine, (3) lipids, (4) threonine, (5) lactate, (6) alanine, (7) arginine/lysine, (8) proline, (9) glutamine, (10) glucose/amino acids, (11) lactate.]
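Leave-one-out cross-validation, as used above to check model stability, refits the model n times and each time predicts the one left-out sample. The sketch below does this for an ordinary least-squares model and reports Q² (1 − PRESS / total sum of squares) together with the overall regression F-statistic. It assumes a linear model with intercept, is not the authors' code, and uses hypothetical function names and synthetic data. For least squares, the leave-one-out residuals can also be obtained without refitting as eᵢ/(1 − hᵢᵢ), where hᵢᵢ are the diagonal elements of the hat matrix, which gives a convenient check.

```python
import numpy as np


def loo_predictions(X, y):
    """Leave-one-out predictions for a least-squares model with intercept."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                      # drop sample i
        coef, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        preds[i] = A[i] @ coef                        # predict the left-out sample
    return preds


def q2(y, preds):
    """Cross-validated R^2: 1 - PRESS / total sum of squares."""
    press = np.sum((y - preds) ** 2)
    return 1.0 - press / np.sum((y - np.mean(y)) ** 2)


def f_statistic(r2, n, k):
    """Overall regression F for k descriptors and n samples."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


rng = np.random.default_rng(4)
X = rng.normal(size=(30, 5))                          # e.g. five descriptors
y = X @ np.array([1.0, -0.5, 0.8, 0.0, 0.3]) + rng.normal(scale=0.2, size=30)
preds = loo_predictions(X, y)
print(round(q2(y, preds), 3))
```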

Conclusion
Results from both ANN and MLR clearly show that serum selenium concentrations below and above 60 µg/L correspond to the patient and healthy groups, respectively. By identifying the five most important descriptors, this study has demonstrated a significant relationship between selenium concentration and descriptors such as threonine, lipids, lactate and alanine. Moreover, PLS clearly classifies the data into two groups: patients and healthy individuals. These five descriptors play a considerable role in modeling the serum selenium concentration and in dividing the samples into two distinct groups.

Mean effect of a descriptor is the product of its mean and regression coefficient in the MLR model.
Table 3: Specifications of the selected MLR model.