The Use of Partial Least Square (PLS) to Explore the Importance of Sperm Characteristics in the Prediction of Bull Fertility



Introduction
It has been established that individual bulls differ in their ability to fertilize oocytes and/or to sustain development to embryo stages following in vitro fertilization [1][2][3][4]. In addition, a marked variability in field fertility among bulls has been reported [5][6][7][8], the so-called "bull effect" [9].
In attempts to predict bull fertility, many classical and modern laboratory tests have been developed for the assessment of semen quality [10][11][12][13][14]. However, the results of such in vitro assays do not always correlate with the real fertility of a semen sample [4,15].
Even though a practical in vitro semen assay for determining bull fertility would be of great benefit to reproductive programs [6], it is unlikely that the evaluation of a single sperm characteristic can reflect the real fertilization capacity of a semen sample. Hence, finding a set of sperm characteristics considered important for predicting conception rate would be a useful method of monitoring and/or predicting sire fertility [16].
According to Sudano et al. [4], the competence of a specific laboratory test in predicting bull fertility can be directly related to the statistical analysis performed. In a previous work (Oliveira et al. [16]), we used Partial Least Squares (PLS) analysis to assess in vitro sperm characteristics and their importance in the prediction of conception rate in a bovine timed-AI program. This analysis was chosen because it makes it possible not only to explain the relationship between the response (Y) and predictor (X) variables but also, simultaneously, to explore the relationships among the predictor variables (Esposito Vinzi and Russolillo [17]). Furthermore, PLS can be used with a relatively small data set, a situation that is frequently observed in biological experiments. For instance, PLS methods can be used to analyze data from multiple modalities collected on the same observations (Krishnan et al. [18]). Hence, with PLS regression analysis, it was possible to investigate the relationships between laboratory sperm characteristics and field fertility even with few repetitions for each bull and a reduced number of bulls (Oliveira et al. [16]). PLS methods have been applied in a diverse range of fields such as chemistry (Wold et al. [19]), genomics [20], neurobiology [21] and marketing.
In this article, we aim to provide further information that may help researchers use PLS regression analysis to select a set of laboratory assays that is better correlated with field fertility, by explaining how PLS was used to explore the importance of in vitro sperm characteristics in the prediction of bull fertility.

Assessment of field fertility
For this experiment, data were used from a timed-AI program of suckled multiparous Nelore cows (n=97) located on a commercial beef farm in the state of Mato Grosso, Brazil. Cows had a body condition score (BCS) between 1.75 and 3.25 on a 1-5 scale (1=emaciated, 5=obese).
All cows received the same timed-AI protocol beginning 30 to 40 d postpartum for first service. Cows were inseminated by two AI technicians. Frozen semen doses from three batches from each of three Angus bulls were used. The semen thawing and handling protocols were performed according to the routine procedures of the farm where the experiment was conducted.
Semen from each of the 3 bulls was equally distributed across the breeding groups and AI technicians in order to guarantee a randomized experimental design and a balanced number of animals per field variable. All cows were examined for pregnancy by transrectal ultrasonography 40 d after AI.

Laboratory assessment of semen quality (laboratory experiment)
Frozen semen samples from each bull and batch (n=9) utilized in the field trial were brought to the laboratory. Each semen batch was thawed in the same thermostatically controlled thawing bath used in the field experiment, at a temperature of 36°C, for 30 sec. After thawing, the in vitro semen analyses described below were performed.
For the assessment of sperm motility by computer-assisted sperm analysis (CASA), 6 µL of the semen sample was placed in a standard count analysis chamber (Makler counting chamber, SEFI Medical Instruments LTD, Haifa, Israel). Six fields were randomly selected for each analysis. The following variables were analyzed by CASA: total motility (TM), progressive motility (PM), average path velocity (VAP), straight-line velocity (VSL), curvilinear velocity (VCL), amplitude of lateral head displacement (ALH), beat cross frequency (BCF), straightness (STR), linearity (LIN), and the percentage of rapidly moving cells (RAP).

Sperm thermal resistance test (TRT):
The thermal resistance test (TRT) was carried out in order to verify the post-thaw longevity of the sperm in the semen samples. Immediately after removal from the thawing bath, a 250 µL aliquot of frozen-thawed semen was placed in a warmed microcentrifuge tube, which was incubated at 37°C. After 120 min of incubation (TRT 2 h), the same procedure described above for CASA was applied and the following motility parameters were assessed for TRT_2h: TM_2h, PM_2h, VAP_2h, VSL_2h, VCL_2h, ALH_2h, BCF_2h, STR_2h, LIN_2h, RAP_2h.

Hypoosmotic swelling test (HOST):
The hypoosmotic swelling test was performed by incubating 20 µL of semen with 1 mL of a 100 mOsm hypoosmotic solution at 37°C for 60 min. After incubation, 20 µL of the solution was coverslipped and evaluated by phase-contrast microscopy. Two hundred sperm were evaluated at 1000× magnification. Sperm with swollen or coiled tails were considered viable. The percentage of viable sperm in the hypoosmotic swelling test (HOST+ cells) was calculated according to Revell and Mrode [10].
Samples for staining and flow cytometry analysis were diluted in a modified Tyrode's medium (TALPm; pH 7.4) with 114 mM NaCl, 3.2 mM KCl, 0.5 mM MgCl₂·6H₂O, 0.4 mM NaH₂PO₄·H₂O, 5 mM glucose, 10 mM sodium lactate, 0.1 mM sodium pyruvate and 10,000 IU/100 mL sodium penicillin. After the addition of the dyes for each analysis, the semen samples incubated in TALPm were analyzed in the flow cytometer (BD FACSDiva 6.0 software; Becton-Dickinson). The cells were simultaneously excited by an argon laser at 488 nm and by a near-UV laser at 405 nm, as described by de Andrade et al. [22]. The samples were processed through the instrument at an acquisition rate of approximately 600 to 1,000 events/s, acquiring 10,000 cells per analysis [23].

Simultaneous assessment of plasma and acrosomal membranes:
An aliquot was taken from the semen samples and added to the TALPm. The resulting samples had a concentration of 5×10⁶ spermatozoa/mL in a volume of 148 µL. Then, 2 µL of Hoechst 33342 (H342; 40 µg/mL) was added. After 10 min of incubation at 37°C, 3 µL of propidium iodide (PI, 0.5 mg/mL; 28.707-5; Sigma-Aldrich, St Louis, MO, USA) and 10 µL of Pisum sativum agglutinin conjugated to fluorescein isothiocyanate (FITC-PSA, 100 µg/mL; L-0770; Sigma-Aldrich) were added to the samples [14]. After 10 min of incubation at 37°C, the samples were diluted with 150 µL of TALPm to a concentration of 2.5×10⁶ spermatozoa/mL and analyzed by flow cytometry [22]. Two-dimensional dot-plots of FITC-PSA (Filter E) vs. PI fluorescence (Filter C) from a total of 10,000 events were generated. Each quadrant represented one of the following sperm subpopulations: 1) IPIA: sperm with intact plasma and acrosomal membranes; 2) IPDA: sperm with an intact plasma membrane and a damaged acrosomal membrane; 3) DPIA: sperm with a damaged plasma membrane and an intact acrosomal membrane; and 4) DPDA: sperm with damaged plasma and acrosomal membranes [23]. The percentage of sperm cells with intact plasma membranes (IPM: IPIA+IPDA) and the percentage of sperm cells with an intact acrosome (IA: IPIA+DPIA), as well as the IPIA subpopulation, were considered for the regression analysis.
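
The derived percentages follow directly from the quadrant counts. A minimal sketch, assuming hypothetical event counts (the function name and the numbers are illustrative only and are not data from this experiment):

```python
def membrane_summary(ipia, ipda, dpia, dpda):
    """Percentages used in the regression, from the four quadrant counts."""
    total = ipia + ipda + dpia + dpda

    def pct(n):
        return 100.0 * n / total

    return {
        "IPIA": pct(ipia),         # intact plasma and acrosomal membranes
        "IPM": pct(ipia + ipda),   # intact plasma membrane (IPIA + IPDA)
        "IA": pct(ipia + dpia),    # intact acrosome (IPIA + DPIA)
    }


# Hypothetical counts for one sample of 10,000 events
print(membrane_summary(ipia=4200, ipda=300, dpia=1500, dpda=4000))
```

Because IPM and IA both contain IPIA, these three variables are correlated by construction, which is one reason a method that tolerates collinear predictors is needed.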

Assessment of sperm plasma membrane stability:
Samples were incubated in the TALPm solution at a concentration of 5×10⁶ spermatozoa/mL in 147 µL with 2 µL of H342 (40 µg/mL). Subsequently, 0.5 µL of the fluorescent probe Yo-Pro-1 (7.5 µM; Y3603, Molecular Probes Inc., Eugene, OR, USA) was added to the sample, resulting in a final concentration of 25 nM. The sample was then incubated for 20 min, after which the fluorescent probe merocyanine 540 (M540; 810 µM; M 24571, Molecular Probes Inc., Eugene, OR, USA) was added to obtain a concentration of 2.7 µM in 150 µL, followed by incubation for 70 seconds. Samples were then diluted in 150 µL of TALPm and analysed by flow cytometry [24].
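
The probe concentrations above can be checked with the usual dilution relation C1·V1 = C2·V2. The sketch below is only an arithmetic check; it assumes a final volume of 150 µL for both probes, and the function names are hypothetical:

```python
def final_concentration(stock_conc, stock_volume_ul, final_volume_ul):
    """C2 = C1 * V1 / V2 (result in the same unit as stock_conc)."""
    return stock_conc * stock_volume_ul / final_volume_ul


def required_stock_volume_ul(stock_conc, target_conc, final_volume_ul):
    """V1 = C2 * V2 / C1 (concentrations in the same unit)."""
    return target_conc * final_volume_ul / stock_conc


# Yo-Pro-1: 0.5 µL of a 7.5 µM (7500 nM) stock in about 150 µL
yopro_final_nM = final_concentration(7500.0, 0.5, 150.0)

# M540: volume of 810 µM stock needed for 2.7 µM in 150 µL
m540_volume_ul = required_stock_volume_ul(810.0, 2.7, 150.0)

print(yopro_final_nM)            # 25.0 nM, matching the stated value
print(round(m540_volume_ul, 3))  # about 0.5 µL
```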
The M540 staining showed two distinct populations of viable cells: one characterised by low fluorescence emission (LBD) and the other by high fluorescence emission (HBD), captured with the long-pass 595 nm and band-pass 610/20 nm filters. The Yo-Pro-positive cells (cells with a damaged plasma membrane; YoPro+ cells) and the LBD and HBD sperm populations were considered for the relationship analysis.

Assessment of lipid peroxidation:
An aliquot was taken from the samples and TALPm was added to obtain samples with a concentration of 5×10⁶ spermatozoa/mL and a final volume of 499.5 µL. Then, C11-BODIPY 581/591 (1 mg/mL, D-3861, Molecular Probes Inc., Eugene, OR, USA) was added. The sample was incubated for 30 min at 37°C. After this incubation period, 145 µL of this solution was transferred to another microtube and 2 µL of H342 (40 µg/mL) was added. This sample was incubated for 10 min at 37°C. After the incubation with H342, 3 µL of PI (0.5 mg/mL) was added and the sample was incubated for a further 5 min at 37°C. Subsequently, the sample was diluted with 150 µL of TALPm to a concentration of 2.5×10⁶ spermatozoa/mL and analyzed by flow cytometry. Two-dimensional dot-plots of C11-BODIPY 581/591 (Filter E) vs. PI fluorescence (Filter C) from a total of 10,000 events were generated.

Assessment of sperm morphology:
An aliquot of frozen-thawed semen was diluted and fixed in pre-warmed (37°C) formaldehyde-PBS. Sperm cells (n=200) were counted under differential interference contrast microscopy (model 80i; Nikon, Tokyo, Japan) at a magnification of 1000×. Morphological characteristics of sperm were classified as major defects (Maj_Def), minor defects (Min_Def) and total defects (Tot_Def) according to Blom [25].

Assessment of sperm morphometry and chromatin structure:
For each sample, two smears were prepared for subsequent assessment of sperm chromatin structure and morphometry. The sperm smears were fixed with ethanol-acetic acid (3:1, v/v) for 1 min and 70% ethanol for 3 min. Then, the smears were hydrolyzed for 25 min in 4 M HCl, washed in distilled water and air-dried. One droplet of 0.025% toluidine blue in McIlvaine buffer (sodium citrate-phosphate), pH 4.0, was placed over each smear, which was then coverslipped.
Fifty gray-level digital images of each slide were obtained randomly using a Leica DM500 microscope (Leica Microsystems Inc., Buffalo Grove, IL, USA) with a 100× immersion objective lens coupled to a Leica ICC50 camera (Leica Microsystems Inc.) connected to a PC. Using threshold-based image segmentation [26], at least 100 sperm heads were isolated for each smear. For each head, the average value of its pixels was obtained. The six heads with the smallest average pixel values were selected automatically and defined as standard heads. Subsequently, for each image, the difference between the standard value of the smear and the average value of each head analyzed was determined. This difference was expressed as a percentage (Dif %) of the average pixel value of the standard heads, which indicates sperm chromatin decondensation (DIF). The coefficient of variation (CV) of the gray-level intensity for each head, which indicates sperm chromatin heterogeneity, was also calculated [13,14].
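
The two chromatin measures can be illustrated as follows. This is a minimal sketch and not the software used in the study: the function name, the sign convention of the difference (head minus standard) and the use of the population standard deviation for the CV are assumptions made here for illustration.

```python
from statistics import mean, pstdev


def chromatin_metrics(heads, n_standard=6):
    """heads: one list of gray-level pixel values per segmented sperm head.

    Returns (dif_pct, cv_pct), one value of each per head.
    """
    head_means = [mean(h) for h in heads]
    # Standard value of the smear: mean of the heads with the smallest averages
    standard = mean(sorted(head_means)[:n_standard])
    # Decondensation: difference from the standard, as a percentage of the standard
    dif_pct = [100.0 * (m - standard) / standard for m in head_means]
    # Heterogeneity: coefficient of variation of the gray levels within each head
    cv_pct = [100.0 * pstdev(h) / mean(h) for h in heads]
    return dif_pct, cv_pct


# Hypothetical toy data: six uniform "standard" heads and one lighter head
toy_heads = [[100, 100]] * 6 + [[105, 115]]
dif, cv = chromatin_metrics(toy_heads)
print(dif[-1], round(cv[-1], 2))
```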
Area, Perimeter, Width, Length, Width:length ratio (WLR), Ellipticity, and Shape Factor (SF) of all sperm heads were determined using another algorithm developed in the Scilab environment [13,26]. Fourier descriptors containing harmonic amplitudes from 0 to 2 (Fourier zero: Fourier_Z, Fourier one: Fourier_1 and Fourier two: Fourier_2) were also considered. Another estimated feature was the sperm head symmetry. The Side Symmetry (SS) is a measurement that identifies asymmetries along the principal (major) sperm axis. The Anterior-posterior Symmetry (SAP) is a measurement that identifies asymmetries along the horizontal (minor) sperm axis. These symmetries were calculated using the procedure described by Beletti and Costa [26] and Kanayama and Beletti [27], which involves flipping the object along its major (or minor) axis and then identifying the area of overlap between the original and flipped areas [13].
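
The flip-and-overlap idea can be sketched on a binary mask of a sperm head. This is not the Scilab algorithm of [26,27]; it assumes the head has already been rotated so that its major axis is vertical and centered in the grid, and the function name and the toy mask are hypothetical.

```python
def overlap_symmetry(mask, axis):
    """Fraction of the head area that overlaps its own mirror image.

    mask: list of rows of 0/1 values, with the major axis vertical.
    axis: "major" mirrors left-right (side symmetry);
          "minor" mirrors top-bottom (anterior-posterior symmetry).
    """
    if axis == "major":
        flipped = [row[::-1] for row in mask]
    else:
        flipped = mask[::-1]
    area = sum(map(sum, mask))
    overlap = sum(a & b
                  for row, frow in zip(mask, flipped)
                  for a, b in zip(row, frow))
    return overlap / area


# Toy head: symmetric left-right, but wider toward the top than the bottom
toy_mask = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]
print(overlap_symmetry(toy_mask, "major"))  # 1.0
print(overlap_symmetry(toy_mask, "minor"))  # 12/14, about 0.857
```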

Assessment of sperm concentration:
An aliquot of 20 µL of frozen-thawed semen was diluted in 980 µL of formaldehyde-PBS. The sperm cells were counted in a Neubauer chamber under optical microscopy at a magnification of 400×, and the sperm concentration was calculated for each semen sample.
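
As an illustration of the calculation, the sketch below converts a chamber count into a concentration. The text does not state which chamber area was counted, so the number of large squares and the cell count are hypothetical; the only fixed facts used are the 1:50 dilution (20 µL in 1,000 µL) and the standard Neubauer geometry, in which one 1 mm × 1 mm large square under a 0.1 mm depth holds 10⁻⁴ mL.

```python
def sperm_concentration_per_ml(cells_counted, large_squares, dilution_factor):
    """Concentration of the undiluted sample, in spermatozoa/mL."""
    cells_per_square = cells_counted / large_squares
    # One large square holds 1e-4 mL, so multiply by 1e4 to get cells/mL
    return cells_per_square * dilution_factor * 1e4


dilution_factor = (20 + 980) / 20  # 20 µL of semen in 1,000 µL total = 50
conc = sperm_concentration_per_ml(cells_counted=120, large_squares=1,
                                  dilution_factor=dilution_factor)
print(conc)  # 60000000.0, i.e. 60 x 10^6 spermatozoa/mL for this hypothetical count
```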

Statistical analysis
In this study, Partial Least Squares (PLS) regression was the statistical procedure used to explore the importance of the in vitro sperm variables in the prediction of bull fertility. PLS regression, which was developed from an initial proposal by Wold [28], is a multivariate data analysis technique used to relate one or more response variables (Y) (conception rate, in this case) to several predictor variables (X) (in vitro sperm characteristics) based on the extraction of factors (Figure 1). These factors, also called components or latent variables, simultaneously decompose the two sets of variables, maximizing the explanation of the variability of Y, of X, or of both.
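
In matrix notation, the factor extraction described above is commonly written as follows for a single response. This is the usual textbook formulation, given here as a sketch; the symbols T, P, q, w_a, E and f are notation introduced for illustration and do not appear in the original analysis.

```latex
\begin{aligned}
X &= T P^{\top} + E, \qquad y = T q + f,\\
w_a &= \arg\max_{\lVert w \rVert = 1} \operatorname{cov}\!\left(X_{a-1} w,\; y_{a-1}\right),
\qquad t_a = X_{a-1} w_a .
\end{aligned}
```

Here X (n × p) and y (n × 1) are centered and scaled, the columns of T are the factor scores t_a, P and q are the loadings of the predictors and of the response, E and f are residuals, and X_{a-1} and y_{a-1} are what remains of X and y after the first a-1 factors have been removed (X_0 = X, y_0 = y).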
The PLS analysis was performed using PROC PLS implemented in SAS v.9.3 (SAS Inst. Inc., Cary, NC, USA). The results obtained in the laboratory tests (laboratory experiment; 47 variables) were compared with the result of the field experiment (conception rate; one response variable). A scatter-plot matrix was used to explore the patterns among variables and to detect non-linear relationships.
In each run, a regression coefficient and a variable importance value were obtained for each predictor (sperm variable). The regression coefficients represent the importance that each predictor has in the prediction of the response. The variable importance plot, on the other hand, represents the contribution of each predictor in fitting the PLS model for both predictors and response. It is based on the Variable Importance for Projection (VIP) statistic of Wold [28], which summarizes the contribution of a variable to the model. If a predictor has a relatively small coefficient (in absolute value) and a small VIP value, then it is a prime candidate for deletion. According to Wold (in Umetrics, as cited by SAS [29]), a VIP value of less than 0.8 is considered "small".
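
For reference, the VIP statistic is usually defined as below for a model with p predictors and A factors. This is the common textbook definition, stated here as a sketch; the exact computation inside PROC PLS is not described in this article, and the symbols w_{ja}, q_a and t_a are notation introduced for illustration.

```latex
\mathrm{VIP}_j \;=\; \sqrt{\, p \,
\frac{\sum_{a=1}^{A} SS_a \,\bigl(w_{ja} / \lVert w_a \rVert\bigr)^{2}}
     {\sum_{a=1}^{A} SS_a}\,},
\qquad SS_a = q_a^{2}\, t_a^{\top} t_a ,
```

where w_{ja} is the weight of predictor j in factor a, t_a is the score vector of factor a, and q_a is the loading of the response on that factor, so that SS_a is the part of the response sum of squares explained by factor a. Under this definition the squared VIP values average to one across predictors, which is why a value clearly below one, such as 0.8, marks a predictor that contributes less than average.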
In the present study, in order to select the laboratory sperm variables that would be important predictors of field fertility, we propose performing the PLS regression consecutively until no remaining variable presents a VIP lower than 0.8. Hence, the following procedure was performed: after running the PLS analysis for the first time, the predictors with VIP values lower than 0.8 were excluded. Then, the PLS was performed a second time without the deleted variables. Again, the predictors with VIP values lower than 0.8 were excluded. Then, the PLS was performed a third time without the deleted variables. When no other sperm variable presented VIP<0.8, the remaining variables were selected as the group of good predictors of conception rate. In each step, residual analysis, correlation loading plots, and X and Y score plots for the first four factors were used to detect outliers, non-linear patterns or grouped observations.
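
The consecutive procedure can be sketched in Python as follows. This is not the SAS PROC PLS code used in the study: it is a minimal single-response PLS (NIPALS) written with NumPy, the function names are hypothetical, the data are random numbers generated only to make the example run, and it assumes that no predictor column is constant.

```python
import numpy as np


def pls1_vip(X, y, n_factors):
    """Fit a single-response PLS model by NIPALS and return the VIP of each column."""
    # Center and scale predictors and response (assumes no constant column)
    E = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    f = (y - y.mean()) / y.std(ddof=1)
    n_pred = E.shape[1]
    W = np.zeros((n_pred, n_factors))  # unit-norm weight vectors, one per factor
    ss = np.zeros(n_factors)           # response sum of squares explained per factor
    for a in range(n_factors):
        w = E.T @ f
        norm = np.linalg.norm(w)
        if norm < 1e-12:               # nothing left to explain
            break
        w = w / norm
        t = E @ w                      # factor scores
        tt = t @ t
        p_load = E.T @ t / tt          # predictor loadings
        q = (f @ t) / tt               # response loading
        E = E - np.outer(t, p_load)    # deflate predictors
        f = f - q * t                  # deflate response
        W[:, a] = w
        ss[a] = q ** 2 * tt
    return np.sqrt(n_pred * (W ** 2 @ ss) / ss.sum())


def select_by_vip(X, y, names, n_factors=4, threshold=0.8):
    """Refit PLS, dropping predictors with VIP < threshold, until none remain below it."""
    keep = list(range(X.shape[1]))
    while True:
        k = min(n_factors, len(keep), X.shape[0] - 1)
        vip = pls1_vip(X[:, keep], y, k)
        weak = vip < threshold
        if not weak.any():
            return [names[j] for j in keep], vip
        keep = [j for j, drop in zip(keep, weak) if not drop]


# Synthetic illustration: 9 observations (as in 3 bulls x 3 batches), 12 predictors
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(9, 12))
y_demo = X_demo[:, 0] - X_demo[:, 1] + 0.1 * rng.normal(size=9)
names_demo = [f"var{j + 1}" for j in range(12)]
selected, final_vip = select_by_vip(X_demo, y_demo, names_demo)
print(selected)
print(np.round(final_vip, 2))
```

The loop always terminates with at least one predictor, because the squared VIP values of any fitted model average to one and therefore at least one VIP is never below the threshold. Removing variables in this way is a screening step; as in the procedure described above, residuals and score plots still need to be inspected at each run.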

Results
The PLS factors were assessed for each in vitro sperm characteristic and the variables were selected according to Wold's criterion (SAS) [29].
After the first PLS procedure, the following variables presented VIP<0.8 and were excluded: VAP, VSL, ALH, STR, LIN, ALH_2h, STR_2h, LIN_2h, HBD, minor defects, Area, Perimeter, Width, Fourier_1, Side Symmetry and DIF. Then, a second run of the PLS procedure was performed with the remaining variables.
After the second PLS procedure, the following variables presented VIP<0.8 and were excluded: VCL, VSL_2h, VCL_2h, IA, LBD, Length and Shape Factor. In addition, ellipticity was excluded because it represents the same sperm parameter as WLR, and YoPro+ was excluded because it represents the inverse of the IPM parameter. Then, a third run of the PLS procedure was performed with the remaining variables.
After the third PLS procedure, no other sperm variable presented VIP<0.8. Hence, the following variables were considered important for the prediction of conception rate: TM, PM, BCF, RAP, TM_2h, PM_2h, VAP_2h, BCF_2h, RAP_2h, HOST, IPIA, IPM, IPNP, IPP, Maj_Def, Tot_Def, WLR, Fourier_Z, Fourier_2, anterior-posterior symmetry (SAP), CV and sperm concentration. Table 1 shows the descriptive data for conception rate and for the sperm variables selected after the third PLS procedure.
No important curved patterns that would suggest the addition of transformed variables were detected in our analyses [19,29]. Table 2 shows the Variable Importance for Projection (VIP) values, the parameter estimates for centered and scaled data, and the parameter estimates for data in the original scale obtained after the last run of the PLS procedure performed for sperm variables in the prediction of conception rate.
Additionally, Figure 3 shows the predictor loading profiles for the first factor extracted from the final model. Interpreting the contribution of each variable to the factors extracted by PLS is also of interest because it helps to study the structure of the relationships among the predictors and to relate it to the prediction of the response.
As stated above, PLS analysis extracts a number of factors (components); in the present study, it was decided to extract four factors. Each factor is a linear combination of all the original variables. The first factor is constructed with the loadings presented in Figure 3. The loadings show how the first PLS factor was constructed, i.e., which variables participate in that factor and in which direction (positive or negative) (Figure 3).

Discussion and Conclusion
In the present study, sperm characteristics were assessed by classical and modern laboratory tests. Then, in order to identify a group of important variables in the prediction of conception rate, PLS analysis was performed. An ordinary multiple regression approach cannot be applied in this context because of the large number of variables compared with the number of observations and the lack of independence among the predictor variables (Table 3). A principal component regression approach can be fitted, but it only maximizes the explanation of the predictor variables and does not guarantee a relevant explanation of the response. By contrast, PLS regression can find a set of linear combinations of predictors (components) that best predict the response, even with a relatively small sample size. Additionally, the components can be used to interpret the structure of the predictors and their relationship with the response. Hence, with PLS analysis, it was possible to investigate the relationships between laboratory sperm characteristics and field fertility even with few repetitions for each bull and a reduced number of bulls, as described by Oliveira et al. [16].
As stated above, the PLS procedures selected the important in vitro variables in the prediction of field fertility. Our cutoff was based on Wold's criterion (SAS) [29][30][31][32], which considers a value less than 0.8 to be "small" for VIP in the PLS procedure. Therefore, the variables presenting VIP values lower than 0.8 were excluded after each of the consecutive PLS procedures, until no remaining variable presented a VIP<0.8.
Consequently, according to our results, twenty-two (out of forty-seven) in vitro sperm variables were considered to be important predictors of conception rate: total motility (TM), progressive motility (PM), beat cross frequency (BCF), rapidly moving cells (RAP), TM after 2 h of thermal incubation (TM_2h), PM after 2 h of thermal incubation (PM_2h), average path velocity after 2 h of thermal incubation (VAP_2h), BCF after 2 h of thermal incubation (BCF_2h), RAP after 2 h of thermal incubation (RAP_2h), intact plasma membrane evaluated by HOST, intact plasma and acrosomal membranes evaluated by flow cytometry (IPIA), total intact plasma membrane evaluated by flow cytometry (IPM), intact plasma membrane suffering lipid peroxidation (IPP), intact plasma membrane with no lipid peroxidation (IPNP), major defects, total defects, morphometric width/length ratio (WLR), the mathematical parameters Fourier 0 and Fourier 2, anterior-posterior symmetry (SAP), chromatin heterogeneity (CV) and sperm concentration.
Table 3 shows the R-squared values obtained in the last run of the PLS procedure performed for sperm variables in the prediction of conception rate. Note that 93% of the variation in conception rate is already explained by two factors, but only 51% of the predictor variation is explained. A model using only the first two factors can be useful for prediction purposes; however, we decided to include four factors in order to increase the explanation of the predictors. Additionally, with the PLS regression procedure of the present experiment, it was possible to show which of the selected sperm characteristics had higher importance in the prediction of conception rate. As shown in Figure 2, the most important predictors are the variables with the largest absolute centered and scaled parameter estimates and with high VIP values. This information can be compared with the contribution of each predictor to the construction of the factors (loadings).
Another interesting aspect of PLS regression is the possibility of checking the relationships among the predictors, and between predictors and response, by interpreting the factor loadings. For instance, Figure 3 shows that TM, PM, RAP, TM_2h, PM_2h, VAP_2h, BCF_2h, RAP_2h, HOST, IPIA, IPM, IPNP, Fourier 0, SAP and sperm concentration have positive values in the prediction of conception rate (CR), while BCF, IPP, major defects, total defects, WLR, Fourier 2 and chromatin heterogeneity have negative values.
The observation of factor 1 from the final model (Figure 3) enriches the interpretation of the results, since some sperm variables may be positively (positive values) or negatively (negative values) important in the prediction of conception rate. In this sense, it is interesting to note that some of the negative variables in Figure 3 have indeed been reported in the scientific literature to be negatively correlated with conception rate, and most of the positive variables are the sperm characteristics reported to be positively correlated with semen fertility. Farrell et al. [33] demonstrated that multiple combinations of CASA variables had high correlations with bull fertility. For instance, the authors observed that the combination of progressive motility, ALH, BCF, and VAP presented a high correlation value [33]. In our study, plasma membrane functionality (HOST+), plasma membrane integrity (IPM) and sperm cells with intact acrosomal and plasma membranes (IPIA cells) presented positive loadings in the prediction of conception rate. Similarly, Januskauskas et al. [34] detected significant correlations between field fertility and plasma membrane integrity. Furthermore, Tartaglione and Ritta [35] demonstrated that HOST presented a high correlation coefficient with in vitro fertility. When the sperm plasma and acrosomal membrane integrity results were included in the regression model, a higher correlation coefficient was obtained [33,34]. It has also been reported that bull fertility was positively correlated with plasma membrane integrity and with total progressive motility. Sperm lipid peroxidation and bull fertility were also shown to present a high negative correlation [34]; moreover, negative correlations between lipid peroxidation and IVF outcomes have already been reported in humans [35]. In the present study, sperm cells presenting lipid peroxidation (IPP) presented negative loadings in the prediction of conception rate.
Sperm major and total defects and the width/length ratio (WLR, which also represents sperm ellipticity) also presented negative loadings in the prediction of conception rate. In this sense, it has been reported that low-fertility bulls generally had a high seminal content of morphologically abnormal cells (Saacke [36]). Chromatin heterogeneity (CV) also seems to be an important predictor and presented negative loadings in the prediction of conception rate. Acevedo et al. [37] reported that the vulnerability of sperm DNA to acid denaturation was positively associated with potential incompetence for fertilization. Kasimanickam et al. [38] reported that sires with a high sperm DNA fragmentation index presented lower sperm fertilization potential, whereas sires with a lower DNA fragmentation index had a higher chance of siring calves. However, substantial variations are commonly observed among experiments, and low correlations are usually detected when these sperm characteristics are separately compared with field fertility [4,15].

[Figure: Profiles of Centered and Scaled Parameter Estimates (axis label: Coefficients)]

Finally, it is noteworthy that the group of components that is important for predicting conception rate is a combination of positive and negative predictor effects. This seems to be in accordance with several researchers who have demonstrated that semen fertility can be better estimated from semen quality when a combination of several in vitro sperm analyses is performed [4,31,33]. Another way to use the information provided by PLS regression is to use the regression coefficients (and the intercept value) to build the equation for the prediction of the response variable. In our case, using the results of the present experiment, it would be possible to present a formula to calculate the predicted conception rate for semen samples. However, for such an important finding, it would be more appropriate to use a larger number of bulls and cows in order to present a more accurate formula.
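
To make the idea of a prediction equation concrete, the sketch below converts coefficients estimated on centered and scaled data into original-scale coefficients and an intercept, and then evaluates the resulting linear equation. All numbers are hypothetical and are not the estimates of Table 2; the function names are introduced only for illustration.

```python
def to_original_scale(beta_scaled, x_means, x_sds, y_mean, y_sd):
    """Convert centered-and-scaled coefficients to original-scale ones.

    b_j = beta_j * sd(y) / sd(x_j);  b_0 = mean(y) - sum_j b_j * mean(x_j)
    """
    coefs = [b * y_sd / s for b, s in zip(beta_scaled, x_sds)]
    intercept = y_mean - sum(c * m for c, m in zip(coefs, x_means))
    return intercept, coefs


def predict(intercept, coefs, x):
    """Predicted response for one sample x (values in original units)."""
    return intercept + sum(c * v for c, v in zip(coefs, x))


# Hypothetical two-predictor example (e.g. a motility and a defect percentage)
intercept, coefs = to_original_scale(
    beta_scaled=[0.5, -0.25],
    x_means=[60.0, 10.0], x_sds=[10.0, 5.0],
    y_mean=50.0, y_sd=20.0,
)
print(intercept, coefs)                        # 0.0 [1.0, -1.0]
print(predict(intercept, coefs, [70.0, 10.0])) # 60.0
```

A sample with every predictor at its mean is predicted to have the mean response, which is a quick check that the back-transformation is consistent.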
It was concluded that PLS regression is an interesting statistical method for identifying a group of important predictor variables in the prediction of a response in in vivo and/or in vitro bull fertility studies. More observations should also be included in the analysis to provide more information about the adequacy of the model using cross-validation tests. Further multivariate statistical methods, e.g., Multiple Factor Analysis [39,40] and Generalized Procrustes Analysis [41,42], can enrich the analysis of sperm characteristics, especially if the objective is to compare experimental conditions such as batch effects, AI technician, reproductive protocols, or even experimental location.