A New Robust Method for Nonlinear Regression

Background: When outliers are present, the least squares method of nonlinear regression performs poorly. The main purpose of this paper is to provide a robust alternative to the Ordinary Least Squares nonlinear regression method. The new method provides accurate parameter estimates when outliers and/or influential observations are present. Method: Real and simulated data for drug concentration and tumor size-metastasis are used to assess the performance of the new estimator. Monte Carlo simulations are performed to evaluate the robustness of the new method in comparison with the Ordinary Least Squares method. Results: In simulated data with outliers, the new estimator of the regression parameters outperforms Ordinary Least Squares with respect to bias, mean squared error, and mean estimated parameters. Two algorithms are proposed. For computational ease and illustration, a Mathematica program is provided in the Appendix. Conclusion: The accuracy of our robust technique is superior to that of Ordinary Least Squares. Its robustness and computational simplicity make the new technique a more appropriate and useful tool for the analysis of nonlinear regressions. Journal of Biometrics & Biostatistics


Background
Nonlinear regression is one of the most popular and widely used tools for analyzing the effect of explanatory variables on a response variable, and it has many applications in biomedical research. In the presence of outliers or influential observations, the ordinary least squares method can yield misleading parameter estimates for the nonlinear regression, and hypothesis tests and predictions may no longer be reliable. The main purpose of robust nonlinear regression is to fit a model that gives resilient results in the presence of influential observations, leverage points and/or outliers. Rousseeuw and Leroy [1] defined vertical outliers as data points with outlying values in the direction of the response variable, while leverage points are outliers in the direction of the covariates. An observation is considered influential if its removal would significantly alter the parameter estimates. Edgeworth [2] proposed Least Absolute Deviation as a robust method. Huber [3] introduced the method of M-estimation. Rousseeuw [4] introduced the Least Trimmed Squares estimates. The S-estimator was introduced by Rousseeuw and Yohai [5]. Yohai and Zamar [6] introduced the τ-estimator of linear regression coefficients, which has high efficiency and a high breakdown point. Tabatabai and Argyros [7] extended the τ-estimates to nonlinear regression models. Stromberg [8] introduced algorithms for Yohai's MM-estimator of nonlinear regression and Rousseeuw's least median of squares estimator of nonlinear regression. Tabatabai et al. [9] introduced the TELBS robust linear regression method.
In medical, biological and pharmaceutical research and development, nonlinear regression analysis has been a major tool for investigating the effect of multiple explanatory variables on a response variable when the data follow a nonlinear pattern. When outliers and influential observations are present, nonlinear least squares performs poorly. In this paper we introduce a new robust nonlinear regression method capable of handling such cases. Minn et al. [10] showed that lung tumor size can lead to metastasis and that aggressive tumor growth is a marker for cells destined to metastasize. They validated this finding by analyzing the lung metastasis gene-expression signature using a nonlinear model. The breast cancer study of Arisio et al. [11] confirmed that tumor size is an important predictor of axillary lymph node metastases. Ramaswamy et al. [12] found that the gene-expression signature is a significant factor associated with metastasis in solid tumors carrying such gene expressions. Maffuz et al. [13] showed that pure ductal carcinoma in situ is not associated with lymphatic metastasis, independently of tumor size. Hense et al. [14] found that the occurrence of primary metastases in Ewing tumors is related to tumor size, pelvic site and malignant peripheral neuroectodermal tumors. Umbreit et al. [15] studied a group of patients who had undergone surgical resection for a unilateral, sporadic renal tumor and concluded that tumor size is significantly associated with metastasis in patients with renal masses. Wu et al. [16] retrospectively analyzed 666 patients with nasopharyngeal carcinoma and concluded that tumor volume was correlated with cervical lymph node metastasis as well as distant metastasis after radiation therapy. In computer vision, robust regression methods have been used extensively to estimate surface model parameters in small image regions and the imaging geometry of multiple cameras. Coras et al. [17] used nonlinear regression and showed that micromolar doses of peroxisome proliferator-activated receptor γ agonists reduce glioma cell proliferation.
Roth [18] applied nonlinear sigmoidal curves to monitor, by fluorescence, the accumulation of polymerase chain reaction products at the end of each cycle. In human blood samples, Kropf et al. [19] found a nonlinear binding association between transforming growth factor beta 1 (TGF-β1) and α2-macroglobulin, as well as between TGF-β1 and latency-associated peptide (LAP). Yang and Richmond [20] used nonlinear least squares to estimate the effective concentration of unlabeled human interferon-inducible protein 10 that yields 50% maximal binding of iodinated protein 10 to the chemokine receptor CXCR3. Hao et al. [21] examined the significance of the Nav1.5 protein in cellular processes by applying a nonlinear regression relating the gene expression of Nav1.5 to TGF-β1 and to vimentin. TGF-β families are important factors in the regulation of tumor initiation, progression, and metastatic activity (Bierie et al. [22]). Coras et al. [17] applied nonlinear regression models to show that troglitazone concentration tends to inhibit TGF-β1 release in glioma cell culture.
This paper introduces a new robust nonlinear regression estimator. The new method has bounded influence, a high breakdown point and high asymptotic efficiency under the normal distribution, and it yields parameter estimates close to those that would have been obtained in the absence of outliers. In addition, the method is computationally simple enough to be used by practitioners.

Methods and Models
We begin with the introduction of our new robust nonlinear regression model. The introduction of the model is followed by two algorithms describing its implementation. We then apply this new model to a real data set with an outlier present. In addition, we will analyze a problem involving tumor size and metastases with and without outliers. Monte Carlo simulations are also performed to evaluate the robustness of our method, in comparison with the ordinary least squares method.

Robust nonlinear regression model
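Consider the general nonlinear model of the form
$$y_i = f(x_i; \theta) + \varepsilon_i, \quad i = 1, \dots, n.$$
The estimator is defined through the objective function (1) [equation missing in source], which involves the term $(1 - h_{ii})(y_i - f(x_i; \theta))$, where $\sigma$ is the error standard deviation and the $h_{ii}$ are the diagonal elements of the matrix $H$ of the form [equation missing in source]. For $j = 1, 2, \dots, k$, we define [equation missing in source].
If $\sigma$ is unknown, one may use one of the following two estimators of $\sigma$, which were proposed by Rousseeuw and Croux [23]: [equations (3) and (4) missing in source]. These estimators of $\sigma$ have high breakdown points. Under the normality assumption for the error terms, the estimators given in (3) and (4) have higher efficiency than the median absolute deviation (MAD). In this paper all of our computations are performed using formula (3).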
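The two Rousseeuw and Croux scale estimators are usually written as $S_n$ and $Q_n$. The sketch below is a naive $O(n^2)$ version of both, offered as an illustration rather than as the paper's formulas (3) and (4), which are not legible here. It assumes ordinary medians, omits the finite-sample correction factors, and uses the consistency constants 1.1926 and 2.2219 quoted for these estimators. The function names `sn_scale` and `qn_scale` are hypothetical.

```python
from itertools import combinations
from math import comb
from statistics import median


def sn_scale(values):
    """Naive S_n: 1.1926 * med_i( med_j |x_i - x_j| ), with ordinary medians."""
    inner = [median(abs(xi - xj) for xj in values) for xi in values]
    return 1.1926 * median(inner)


def qn_scale(values):
    """Naive Q_n: 2.2219 * k-th smallest pairwise distance, k = C(h, 2), h = n // 2 + 1."""
    n = len(values)
    h = n // 2 + 1
    k = comb(h, 2)
    dists = sorted(abs(a - b) for a, b in combinations(values, 2))
    return 2.2219 * dists[k - 1]


# Made-up residuals with one gross outlier; both scales stay small.
residuals = [1.0, 2.0, 3.0, 4.0, 500.0]
print(sn_scale(residuals), qn_scale(residuals))
```

Either function could be applied to the residuals $y_i - f(x_i; \hat{\theta})$ to obtain a scale estimate that a single extreme residual barely moves.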
The function $\rho_\omega : \mathbb{R} \to \mathbb{R}$ is a differentiable function satisfying the following properties: [list missing in source]. Taking the partial derivatives of (1) with respect to the parameters and setting them equal to zero results in the following system of equations: [equation missing in source], where $\psi_\omega$ is the derivative of $\rho_\omega$, which is equal to [equation missing in source]. Define the weights $w_i$ as [equation missing in source]. Then, for $j = 1, 2, \dots, p$, equation (5) can be written as [equation missing in source].
The matrix of weights, $W$, is a diagonal matrix whose main-diagonal elements are $w_1, w_2, \dots, w_n$, and the estimator of the parameter vector $\theta$ is given by [equation missing in source]. If $f$ is a linear function of the parameters, then the above model is identical to the TELBS robust linear regression model. Asymptotically, $\hat{\theta}$ has a normal distribution with mean $\theta$ and variance-covariance matrix of the form [equation missing in source]. Under the assumption of normality for the underlying distribution, the asymptotic efficiency, Aeff, is defined as [equation missing in source]. The tuning constant $\omega$ can be calculated by solving equation (7) for $\omega$.
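Writing the estimating equations in weighted form suggests an iteratively reweighted scheme. The sketch below shows that idea for a two-parameter Michaelis-Menten curve. It is not the authors' algorithm, and it rests on several assumptions not stated in the text: the form $\rho_\omega(t) = 1 - \operatorname{sech}(\omega t)$, so that $\psi_\omega(t) = \omega \operatorname{sech}(\omega t) \tanh(\omega t)$, is assumed purely for illustration; the leverage terms $h_{ii}$ are dropped; a MAD-type scale stands in for formula (3); and $\omega = 0.4$ is an arbitrary value, not one derived from equation (7). The names `mm`, `weight` and `irls_fit` are hypothetical.

```python
import math
import random
from statistics import median


def mm(x, theta):
    """Michaelis-Menten mean function theta1 * x / (theta2 + x)."""
    return theta[0] * x / (theta[1] + x)


def weight(t, omega):
    """w = psi(t) / t with psi(t) = omega * sech(omega t) * tanh(omega t) (assumed form)."""
    z = omega * t
    if abs(z) < 1e-8:
        return omega ** 2      # limit of psi(t) / t as t -> 0
    if abs(z) > 50.0:
        return 0.0             # sech is negligible here; also avoids cosh overflow
    return omega * math.tanh(z) / (math.cosh(z) * t)


def irls_fit(x, y, theta0, omega=0.4, n_iter=50, tol=1e-10):
    """Iteratively reweighted Gauss-Newton for the two-parameter model."""
    theta = list(theta0)
    for _ in range(n_iter):
        resid = [yi - mm(xi, theta) for xi, yi in zip(x, y)]
        sigma = median(abs(r) for r in resid) / 0.6745  # MAD-type scale (stand-in)
        if sigma == 0.0:
            break
        w = [weight(r / sigma, omega) for r in resid]
        # Weighted normal equations (J' W J) delta = J' W r.
        saa = sab = sbb = sar = sbr = 0.0
        for xi, ri, wi in zip(x, resid, w):
            d = theta[1] + xi
            a = xi / d                      # d f / d theta1
            b = -theta[0] * xi / (d * d)    # d f / d theta2
            saa += wi * a * a
            sab += wi * a * b
            sbb += wi * b * b
            sar += wi * a * ri
            sbr += wi * b * ri
        det = saa * sbb - sab * sab
        if abs(det) < 1e-14:
            break
        d1 = (sar * sbb - sab * sbr) / det
        d2 = (saa * sbr - sab * sar) / det
        theta = [theta[0] + d1, theta[1] + d2]
        if abs(d1) + abs(d2) < tol:
            break
    return theta


# Synthetic data: true parameters (5, 1), small noise, one planted response outlier.
rng = random.Random(0)
x = list(range(1, 21))
y = [mm(xi, [5.0, 1.0]) + rng.gauss(0.0, 0.1) for xi in x]
y[4] *= 100.0
print(irls_fit(x, y, [4.5, 1.5]))
```

Because the assumed $\psi_\omega$ redescends to zero, the planted outlier receives a weight of essentially zero and the fit is driven by the remaining points. A redescending weight also makes the result depend on the starting value, so a reasonable initial guess matters.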
An estimate for the variance-covariance matrix is derived and given as follows: [equation missing in source]. The robust deviance is defined as [equation missing in source]. The deviance plays a major role in model fitting; a smaller value of the deviance is preferred over larger values. Following the Akaike criterion [24] and Ronchetti [25], the robust equivalent of AIC is denoted by AICR and is given by [equation missing in source], and the robust Schwarz information criterion, BICR, is given by [equation missing in source]. For more details, see Rousseeuw and Leroy [1].
There are numerous variable selection techniques available in the literature. One may use a stepwise procedure involving forward selection or backward elimination. For each set $S \subseteq \{x_1, x_2, \dots, x_p\}$ of explanatory variables, the robust final prediction error of Maronna et al. [26] is denoted by RFPE(S) and is defined as [equation missing in source], where #(S) is the number of elements in the set $S$. In forward selection or backward elimination, choose the variable whose inclusion or deletion results in the smallest value of RFPE.
To perform hypothesis testing, we let $\Omega \subseteq \mathbb{R}^p$ be the parameter space and [text missing in source]. For more information, the reader is referred to Hampel et al. [27]. Asymptotically, under the null hypothesis, [equation missing in source].
Either of the following two robust algorithms can be used to estimate the parameter vector $\theta$ and the standard deviation $\sigma$ of a nonlinear [text missing in source].
Table 1 shows their actual and predicted concentrations as well as our results for the fitted hyperbolastic model of type III (H3). For this example, the new robust technique is an effective regression tool for estimating model parameters in the presence of outliers. Figure 1 shows the fitted curve using the hyperbolastic model of type III (H3). Figure 2 uses formula (8) and the least squares fitted curve for the concentration data.

Tumor metastasis
The data in Table 2 consist of 12 observations. The response variable is the fraction of breast cancer patients with metastases and the predictor variable is the tumor size. Table 2 is from Michaelson et al. [29]. These data were originally collected by Tabar et al. [30][31][32] and Tubiana et al. [33,34]. To assess the robustness of our new method for a special class of nonlinear growth models, we use these tumor metastasis data, which are free of outliers. We first fit a model to the data using both the robust method and least squares when no outlier is present. We then plant outliers in the X direction, the Y direction, and both the X and Y directions. In the X direction, we change the X value of observation 12 from 90 to 2. In the Y direction, we change the Y value of observation 6 from 0.55 to 3. In both directions, we change the X value of observation 12 from 90 to 2 and the Y value of observation 7 from 0.56 to 3.
For illustrative purposes, we fitted the hyperbolastic model of type II, the Gompertz model and the logistic model. These models have previously been used to monitor cancer progression and regression. Each model has three parameters, $\theta_1$, $\theta_2$ and $\theta_3$, with $\theta_1$ and $\theta_2$ positive, and the $\varepsilon_i$ are random errors. The response $y_i$ is the fraction metastasized and $x_i$ is the tumor size for individual $i$. The left-hand graphs in Figures 3-5 show the curves fitted with our proposed robust nonlinear regression technique, and the right-hand graphs show the curves fitted by nonlinear least squares, with outliers planted in the X direction, the Y direction, and both the X and Y directions. When there is no outlier in the data, all models perform well under both the robust method and least squares. When outliers are planted in the X, Y, or both X and Y directions, the least squares fits become unacceptable, whereas the robust method performs well for all models.
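The hyperbolastic model of type II, or simply H2, has the form [equation missing in source]. The Gompertz model is of the form [equation missing in source]. The logistic model is of the form [equation missing in source].
[preceding algorithm steps missing in source]
ii. Use ( 1) . If convergence occurs, stop. Otherwise go to step 2 and continue the process.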

Drug concentration data
Kenakin [28] used a set of responses to the concentration of an agonist in a functional assay. They fit the following model to their data: [equation missing in source]. In these data, observation 5 is an outlier in the response direction.
[text missing in source] The Michaelis-Menten model relates the initial velocity $V_0$ to the substrate concentration $x$ through
$$V_0 = \frac{\alpha x}{\beta + x},$$
where the parameter $\alpha$ denotes the maximum reaction velocity and $\beta$ is the substrate concentration at which the initial velocity $V_0$ is 50% of the maximum reaction velocity. The larger the parameter $\beta$, the lower the efficiency of the interaction between substrate and enzyme. This model has also been used in many biological systems, such as gene regulatory systems. To investigate the robustness of our new method relative to the method of least squares, we considered the nonlinear Michaelis-Menten equation of the form
$$y_i = \frac{\theta_1 x_i}{\theta_2 + x_i} + \varepsilon_i,$$
where $y_i$ is the response variable and $x_i$ is fixed. In our simulations we set $x_i = i$ and draw $\varepsilon_i$ from the standard normal distribution (mean 0, standard deviation 1). We performed 1000 repetitions for each of two sample sizes, $n = 20$ and $n = 50$. The parameter values are $\theta_1 = 5$ and $\theta_2 = 1$, and the simulations were run in Mathematica. We used contamination levels of 0%, 10%, 20%, 30%, and 40%, with outliers chosen at random in the direction of X, Y, and both X and Y. To evaluate the robustness of the estimators, we randomly select 10%, 20%, 30% or 40% of the simulated observations and contaminate them by magnifying their size by a factor of 100 in the direction of the explanatory variable X, the response variable Y, or both. Finally, we estimate both the bias and the mean squared error using the following equations: [equations missing in source]. Tables 3-5 summarize the simulation outcomes for both small and large sample sizes. The asymptotic efficiency in our simulation studies was set to 95%.
The simulation tables show that, in the absence of contamination, both least squares and the proposed robust method perform well with respect to bias, mean squared error and mean estimated parameter values. However, when contamination enters the simulated data in the direction of the explanatory variable X, the response variable Y, or both, the new method outperforms the least squares method for both small and large samples. We also observe that the robust estimates of $\theta_1$ and $\theta_2$ are close to the true parameter values. The simulation results clearly indicate the robustness of our new nonlinear regression technique relative to the least squares method when outliers or influential observations are present.
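The simulation design described above can be sketched for the least squares side as follows. This is a minimal illustration, not the Mathematica program used in the paper. It assumes the usual Monte Carlo definitions, bias as the mean estimate minus the true value and mean squared error as the mean squared deviation from the true value, because the paper's own equations are not legible here. It also assumes a Gauss-Newton fit with step halving started at the true parameters. The names `fit_ols` and `simulate` are hypothetical.

```python
import random


def mm(x, theta):
    """Michaelis-Menten mean function theta1 * x / (theta2 + x)."""
    return theta[0] * x / (theta[1] + x)


def sse(x, y, theta):
    return sum((yi - mm(xi, theta)) ** 2 for xi, yi in zip(x, y))


def fit_ols(x, y, theta0, n_iter=50):
    """Least squares fit by Gauss-Newton with step halving."""
    theta = list(theta0)
    lower = -min(x)  # theta2 must stay above this so every theta2 + x_i is positive
    for _ in range(n_iter):
        saa = sab = sbb = sar = sbr = 0.0
        for xi, yi in zip(x, y):
            d = theta[1] + xi
            a = xi / d                      # d f / d theta1
            b = -theta[0] * xi / (d * d)    # d f / d theta2
            r = yi - theta[0] * a
            saa += a * a
            sab += a * b
            sbb += b * b
            sar += a * r
            sbr += b * r
        det = saa * sbb - sab * sab
        if abs(det) < 1e-14:
            break
        d1 = (sar * sbb - sab * sbr) / det
        d2 = (saa * sbr - sab * sar) / det
        base, step, improved = sse(x, y, theta), 1.0, False
        while step > 1e-8:
            trial = [theta[0] + step * d1, theta[1] + step * d2]
            if trial[1] > lower and sse(x, y, trial) < base:
                theta, improved = trial, True
                break
            step /= 2.0
        if not improved:
            break
    return theta


def simulate(n, level, direction, reps=1000, theta=(5.0, 1.0), seed=0):
    """Monte Carlo bias and MSE of the least squares fit under contamination."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        x = [float(i) for i in range(1, n + 1)]
        y = [mm(xi, theta) + rng.gauss(0.0, 1.0) for xi in x]
        # Magnify a random subset by a factor of 100 in the chosen direction(s).
        for i in rng.sample(range(n), round(level * n)):
            if "x" in direction:
                x[i] *= 100.0
            if "y" in direction:
                y[i] *= 100.0
        estimates.append(fit_ols(x, y, theta))
    bias = [sum(e[j] for e in estimates) / reps - theta[j] for j in range(2)]
    mse = [sum((e[j] - theta[j]) ** 2 for e in estimates) / reps for j in range(2)]
    return {"bias": bias, "mse": mse}


for level in (0.0, 0.1):
    print(level, simulate(n=20, level=level, direction="y", reps=100))
```

Passing `direction="x"`, `"y"` or `"xy"` selects the contamination direction. Replacing `fit_ols` with a robust fitting routine would give the comparison reported in Tables 3-5.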

Conclusion
In this paper we introduced a new robust estimator of nonlinear regression parameters. In addition, robust hypothesis testing for the model parameters was introduced, and two algorithms were developed to perform the robust nonlinear estimation of the model parameters. The computer simulations demonstrated the robustness of our new estimator. This robust method provides a powerful alternative to the least squares method. Its influence functions are bounded in both the response and the explanatory variable directions, and it has a high asymptotic breakdown point and high efficiency. A Mathematica program is also provided to ease computation. This program performs the calculations needed for the robust nonlinear regression analysis of the drug concentration example given in this paper.