Weighted Nonlinear Least Squares Technique for Parameters Estimation of the NHPP Gompertz Model
Received Date: Mar 31, 2018 / Accepted Date: Apr 15, 2018 / Published Date: Apr 25, 2018
When the data are heteroscedastic, a more precise alternative to the nonlinear least squares (NLS) technique is needed. The weighted nonlinear least squares estimation (WNLSE) technique is one such alternative: it may increase the accuracy of parameter estimation by assigning suitable weights to the time-between-failures data. In the present study, the traditional maximum likelihood (ML), nonlinear least squares (NLS), and weighted nonlinear least squares (WNLS) techniques are formulated to estimate the three parameters of the NHPP Gompertz model. An empirical weighting method is investigated in the NHPP Gompertz model prediction process. Three real software failure data sets are used to analyze the performance of the three estimation methods considered. The results of this numerical study favour the WNLSE method with respect to the NHPP Gompertz model's performance; they also show that the weighting factors that give the optimum solution differ according to the nature of the software failure data.
Keywords: NHPP Gompertz model; Heteroscedasticity; Maximum likelihood estimation; Nonlinear least squares estimation; Weighted nonlinear least squares estimation
Modelling software failure phenomena and estimating the model parameters are essential tasks in the software engineering field. One of the most popular approaches to describing the software development process is the class of nonhomogeneous Poisson process (NHPP) software reliability growth models (SRGMs); some of the earliest NHPP models are [1-3], and many others have been proposed since. Two commonly used methods for parameter estimation in NHPP SRGMs are maximum likelihood estimation (MLE) and least squares estimation (LSE) [4-6]. The MLE method is used very intensively in the literature because of its statistical properties: the likelihood function is maximized to determine the parameter values that are most likely to have produced the observed data. It is usually recommended for large samples, is applicable to most models and types of data, and produces accurate estimates with small estimated variance. The LSE technique has attracted researchers' attention because of its computational simplicity: least squares estimates are obtained by fitting a parameterized function to the data points by minimizing the sum of squared deviations, which is simple to calculate and program. However, it does not use the information in the entire data set, which affects its accuracy, and it treats every data point as equally important by assuming that the variance of the error term is constant. A more realistic assumption for reliability data is to assign appropriate weights to the data points through weighted least squares analysis, in an attempt to improve the accuracy of parameter estimation. Nonlinear least squares estimation (NLSE) and weighted nonlinear least squares estimation (WNLSE) arise when the parameterized function is not linear in the parameters [8-11].
The NHPP Gompertz model is one of the simplest S-shaped software reliability models; it treats the numbers of faults detected per unit of time as independent Poisson random variables. Sakata was the first to employ the Gompertz curve model; later, the ability of its curve to predict the cumulative number of detected faults well encouraged researchers to develop an NHPP formulation of this model, which has since been widely used in the software engineering field. In the literature, the maximum likelihood (ML), nonlinear least squares (NLS), and ordinary least squares (OLS) estimation methods have been used to estimate the parameters of the NHPP Gompertz model [13-15]. The strong nonlinearity of the NHPP Gompertz model may lead to inaccurate predictions when the MLE method is used, while the LSE method requires the error variance to be constant across the time range, which is not the case for this model. This paper focuses on the weighted nonlinear least squares estimation (WNLSE) method as an alternative approach to overcome the unequal (heteroscedastic) variance of the NHPP Gompertz model; this approach can often increase the effectiveness of parameter estimation. It does so by giving each data point an appropriate amount of influence over the parameter estimates through weighting factors. The rest of this paper is organized as follows: Section 2 provides a brief historical review and some characteristic formulas of the NHPP Gompertz model. Section 3 introduces the MLE, NLSE, and WNLSE methods for the NHPP Gompertz model in the case of time data. To show the performance of the WNLSE method, three real data examples are discussed in Section 4. Finally, the last section summarizes the conclusions of the study.
NHPP Gompertz Model
Goel and Okumoto were the first to propose the NHPP Gompertz model, which was later adopted by many computer scientists because this S-shaped growth model closely approximates the cumulative number of software faults observed in the testing phase. S-shaped models are used to analyze software reliability when the plot of the mean value function against time is S-shaped. This model can be expressed either by its mean value function, which represents the expected cumulative number of errors observed in the time interval (0, ti], or by its intensity function, which gives the instantaneous rate of failure occurrence; their formulas are, respectively, as follows:
where k > 0 and 0 < b, c < 1; c is a constant, b is the shape parameter, and k is the expected initial error content of the software product. The NHPP Gompertz model is flexible, as demonstrated by the different shapes of its mean value and intensity functions shown in Figure 1 for different parameter values. The cumulative failure curve in Figure 1a is S-shaped: the growth rate of cumulative failures is small at the beginning, then increases to a maximum, and finally slows down, producing the S-shaped form. The intensity function in Figure 1b is either decreasing or unimodal, rising initially before it starts to decline. The remaining characteristics of this model are as follows. The remaining-errors function based on the mean value function in Equation (1) can be expressed by:
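For concreteness, assume the common Gompertz form m(t) = k c^{b^t}, which satisfies the constraints k > 0 and 0 < b, c < 1 stated above. Under that assumption (an illustration, not necessarily the exact notation of Equations (1) to (3)), the mean value, intensity, and remaining-errors functions, together with the derivative of the intensity, are:

```latex
\begin{aligned}
m(t) &= k\,c^{b^{t}},\\
\lambda(t) &= \frac{dm(t)}{dt} = k\,c^{b^{t}}\,b^{t}\ln b\,\ln c,\\
r(t) &= k - m(t) = k\left(1 - c^{b^{t}}\right),\\
\lambda'(t) &= \lambda(t)\,\ln b\left(1 + b^{t}\ln c\right).
\end{aligned}
```

Because ln b < 0 and ln c < 0, λ(t) > 0, and m(t) rises from kc at t = 0 towards k as t grows. Under this form the last line explains the two intensity shapes: if c < e^{-1}, λ(t) first increases and peaks at t* = ln(−1/ln c)/ln b; if c ≥ e^{-1}, it decreases for all t ≥ 0.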
The corresponding error detection rate function, obtained from Equations (2) and (3), is given by:
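Under the same illustrative assumption m(t) = k c^{b^t}, so that λ(t) = k c^{b^t} b^t ln b ln c and r(t) = k(1 − c^{b^t}), and reading the error detection rate as the intensity per remaining error, d(t) = λ(t)/r(t) works out as follows. The parameter k cancels, so under this form the rate depends only on b and c:

```latex
d(t) = \frac{\lambda(t)}{r(t)}
     = \frac{k\,c^{b^{t}}\,b^{t}\ln b\,\ln c}{k\left(1 - c^{b^{t}}\right)}
     = \frac{b^{t}\ln b\,\ln c\;c^{b^{t}}}{1 - c^{b^{t}}}
```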
Additionally, the conditional reliability and mean time between software failures functions can be respectively written as:
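The characteristic functions above can be sketched in Python. The parametrization m(t) = k c^{b^t} is an assumption chosen to satisfy k > 0 and 0 < b, c < 1; the conditional reliability uses the standard NHPP result R(x | t) = exp{−[m(t + x) − m(t)]}, and the mean-time-between-failures function is not included. All function names and the parameter values in the demo are illustrative.

```python
import math


# Assumed (illustrative) parametrization: m(t) = k * c**(b**t), k > 0, 0 < b, c < 1.
def mean_value(t: float, k: float, b: float, c: float) -> float:
    """Expected cumulative number of errors detected by time t."""
    return k * c ** (b ** t)


def intensity(t: float, k: float, b: float, c: float) -> float:
    """lambda(t) = dm/dt = m(t) * b**t * ln(b) * ln(c); positive since ln(b), ln(c) < 0."""
    return mean_value(t, k, b, c) * (b ** t) * math.log(b) * math.log(c)


def remaining_errors(t: float, k: float, b: float, c: float) -> float:
    """r(t) = k - m(t): expected number of errors not yet detected."""
    return k - mean_value(t, k, b, c)


def detection_rate(t: float, k: float, b: float, c: float) -> float:
    """d(t) = lambda(t) / r(t); k cancels, so the result depends only on b and c."""
    return intensity(t, k, b, c) / remaining_errors(t, k, b, c)


def conditional_reliability(x: float, t: float, k: float, b: float, c: float) -> float:
    """R(x | t) = exp(-[m(t + x) - m(t)]): probability of no failure in (t, t + x]."""
    return math.exp(-(mean_value(t + x, k, b, c) - mean_value(t, k, b, c)))


if __name__ == "__main__":
    k, b, c = 40.0, 0.9, 0.05  # hypothetical parameter values
    for t in (0.0, 5.0, 20.0):
        print(t, mean_value(t, k, b, c), intensity(t, k, b, c), detection_rate(t, k, b, c))
    print(conditional_reliability(1.0, 5.0, k, b, c))
```

The demo prints m(t), λ(t), and d(t) at a few times; with these values m(0) = kc = 2.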
Estimation Procedures for NHPP Gompertz Model
This section presents the formulas for the maximum likelihood (ML), nonlinear least squares (NLS), and weighted nonlinear least squares (WNLS) estimates of the NHPP Gompertz model's parameters.
Maximum likelihood estimation method
The maximum likelihood (ML) estimates of the NHPP Gompertz model parameters are obtained as follows. Given the mean value and intensity functions in Equations (1) and (2), respectively, the joint density, or likelihood function, of the failure times t1, t2, …, tn of the NHPP Gompertz growth model can be obtained as:
Hence the log-likelihood function is:
Differentiating Equation (8) with respect to k, c, and b, respectively, and equating the derivatives to zero yields:
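As a worked sketch, assume m(t) = k c^{b^t} (so λ(t) = k c^{b^t} b^t ln b ln c), cumulative failure times 0 < t1 < … < tn, and an observation window (0, tn] whose expected count is m(tn) − m(0) = k(c^{b^{tn}} − c), since m(0) = kc is not zero for this form. The log-likelihood and the three score equations are then as below; this illustrates the procedure, and its notation may differ from Equations (8) to (11).

```latex
\begin{aligned}
\ln L(k,b,c) &= n\ln k + \sum_{i=1}^{n}\Big[b^{t_i}\ln c + t_i\ln b + \ln(-\ln b) + \ln(-\ln c)\Big] - k\left(c^{b^{t_n}} - c\right),\\
\frac{\partial \ln L}{\partial k} &= \frac{n}{k} - \left(c^{b^{t_n}} - c\right) = 0
\;\Rightarrow\; \hat{k} = \frac{n}{c^{b^{t_n}} - c},\\
\frac{\partial \ln L}{\partial c} &= \frac{1}{c}\sum_{i=1}^{n} b^{t_i} + \frac{n}{c\ln c} - k\left(b^{t_n}c^{b^{t_n}-1} - 1\right) = 0,\\
\frac{\partial \ln L}{\partial b} &= \ln c\sum_{i=1}^{n} t_i b^{t_i-1} + \frac{1}{b}\sum_{i=1}^{n} t_i + \frac{n}{b\ln b} - k\,t_n b^{t_n-1}c^{b^{t_n}}\ln c = 0.
\end{aligned}
```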
Equations (10) and (11) can be solved numerically to obtain ĉ and b̂; substituting them into Equation (9) then gives k̂.
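A minimal numerical sketch of this procedure, assuming m(t) = k c^{b^t}, sorted cumulative failure times on (0, tn], and m(tn) − m(0) as the expected count. The parameter k is profiled out through its closed-form score equation k̂ = n/(c^{b^{tn}} − c), and the remaining two-dimensional maximization over (b, c) is done by a coarse grid search instead of solving Equations (10) and (11) with a Newton-type method. The grid search is a deliberately simple stand-in; the function names and demo data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def profile_loglik(times: Sequence[float], b: float, c: float) -> Tuple[float, float]:
    """Log-likelihood with k replaced by its closed-form MLE; returns (logL, k_hat).

    Assumes m(t) = k * c**(b**t), sorted cumulative failure times t_1 < ... < t_n,
    and expected count m(t_n) - m(0) = k * (c**(b**t_n) - c) on (0, t_n].
    """
    n, t_n = len(times), times[-1]
    per_k = c ** (b ** t_n) - c  # (m(t_n) - m(0)) / k
    if per_k <= 0.0:
        return -math.inf, math.nan
    k_hat = n / per_k  # solves n/k - (c**(b**t_n) - c) = 0
    log_b, log_c = math.log(b), math.log(c)
    const = math.log(k_hat) + math.log(-log_b) + math.log(-log_c)
    # ln lambda(t_i) = ln k + b**t_i * ln c + t_i * ln b + ln(-ln b) + ln(-ln c)
    sum_log_lambda = sum(const + (b ** t) * log_c + t * log_b for t in times)
    return sum_log_lambda - k_hat * per_k, k_hat  # k_hat * per_k equals n


def fit_gompertz_ml(times: Sequence[float], steps: int = 99) -> Tuple[float, float, float]:
    """Coarse grid search over (b, c) in (0, 1)^2; returns (k_hat, b_hat, c_hat)."""
    grid = [i / (steps + 1) for i in range(1, steps + 1)]
    best_ll, best = -math.inf, (math.nan, math.nan, math.nan)
    for b in grid:
        for c in grid:
            ll, k_hat = profile_loglik(times, b, c)
            if ll > best_ll:
                best_ll, best = ll, (k_hat, b, c)
    return best


if __name__ == "__main__":
    # Hypothetical failure times (not from Tables 1-3): t_i solves m(t_i) - m(0) = i - 0.5
    # for k = 40, b = 0.9, c = 0.05, using the inverse of m(t).
    k0, b0, c0 = 40.0, 0.9, 0.05
    times = [math.log(math.log((k0 * c0 + i - 0.5) / k0) / math.log(c0)) / math.log(b0)
             for i in range(1, 31)]
    print(fit_gompertz_ml(times))
```

For time-between-failures data such as Tables 1-3, the cumulative times ti are the running sums of the inter-failure times (for example via `itertools.accumulate`). A finer grid, or a local optimizer started from the grid optimum, would be needed for more than two decimal places in b̂ and ĉ.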
Nonlinear least squares and weighted nonlinear least squares estimation methods
LSE is a popular technique, widely used in many fields for function fitting and parameter estimation. Although simple, the LSE method is very useful for estimating model parameters: it finds the parameter values that minimize the sum of the squared differences between the fitting function and the experimental data. Suppose that the following n data points are given: (t1, y1), (t2, y2), …, (tn, yn). The model to be fitted to these data is defined by:
where i = 1, …, n, θ is the parameter vector, and εi is the error term. In statistical theory the εi are assumed to be independent N(0, σ²) random variables, where σ² is the common error variance. Mathematically, the traditional nonlinear least squares estimation method determines the values of the unknown parameters that minimize the following quantity:
If εi ∼ N(0, σi²) and σi (1 ≤ i ≤ n) is not constant, the phenomenon is called heteroscedasticity. In this case WNLSE is recommended because it increases the accuracy of the estimators; the unknown parameters are estimated by minimizing the following objective function:
where wi > 0 is the weighting function. Selecting an appropriate weighting function plays an important role in improving the prediction results. One approach to calculating these factors is the optimal one, which uses the inverse of the error variance (wi = 1/σi²); another is to specify them empirically.
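In standard notation (which may differ in detail from the paper's own equation numbering), the regression model and the two objective functions described above are:

```latex
\begin{aligned}
y_i &= f(t_i;\theta) + \varepsilon_i, \qquad i = 1,\dots,n,\\
S(\theta) &= \sum_{i=1}^{n}\big[y_i - f(t_i;\theta)\big]^2,\\
S_w(\theta) &= \sum_{i=1}^{n} w_i\big[y_i - f(t_i;\theta)\big]^2, \qquad w_i > 0.
\end{aligned}
```

With wi = 1/σi², each squared residual is divided by its own error variance, so S_w(θ) is the sum of squared standardized residuals; under εi ∼ N(0, σi²) with known σi, minimizing S_w(θ) is equivalent to maximizing the likelihood, which is why this choice is called optimal.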
The NLS estimates of the NHPP Gompertz model's parameters k, c, and b minimize the following objective function:
Differentiating Equation (15) with respect to k, c, and b and setting the partial derivatives equal to 0 yields the following three equations:
The solution of Equations (16), (17), and (18) gives the NLS estimates of the NHPP Gompertz model parameters.
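As an illustration, assume the response yi is the cumulative number of failures observed by the cumulative time ti (so yi = i for time-between-failures data) and write the mean value function as m(t) = k c^{b^t}, a parametrization consistent with k > 0 and 0 < b, c < 1. With gi = c^{b^{ti}}, the objective and the three normal equations (after dropping nonzero constant factors) take the form below. Because the model is linear in k, the first equation gives k in closed form for any fixed (b, c), which reduces the numerical search to two dimensions:

```latex
\begin{aligned}
S(k,b,c) &= \sum_{i=1}^{n}\left(y_i - k\,g_i\right)^2, \qquad g_i = c^{b^{t_i}},\\
\frac{\partial S}{\partial k} = 0 &\;\Rightarrow\; \hat{k}(b,c) = \frac{\sum_{i=1}^{n} y_i g_i}{\sum_{i=1}^{n} g_i^{2}},\\
\frac{\partial S}{\partial c} = 0 &\;\Leftrightarrow\; \sum_{i=1}^{n}\left(y_i - k g_i\right) b^{t_i} c^{b^{t_i}-1} = 0,\\
\frac{\partial S}{\partial b} = 0 &\;\Leftrightarrow\; \sum_{i=1}^{n}\left(y_i - k g_i\right) g_i\, t_i b^{t_i-1} = 0.
\end{aligned}
```

In the weighted case each term of the sums is multiplied by wi, so that, for example, k̂(b, c) = Σ wi yi gi / Σ wi gi².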
Similarly, the WNLS estimates of the NHPP Gompertz model's parameters k, c, and b minimize the following objective function:
Differentiating Equation (19) with respect to k, c, and b, setting the partial derivatives equal to 0, and solving for k, c, and b yields:
The solution of Equations (20), (21), and (22) gives the WNLS estimates k̂, ĉ, and b̂.
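A minimal sketch of the WNLS fit, assuming m(t) = k c^{b^t} and the cumulative failure count yi as the response at the cumulative failure time ti. For fixed (b, c) the objective is linear in k, so k̂ = Σ wi yi gi / Σ wi gi² with gi = c^{b^{ti}}; the outer search over (b, c) is a coarse grid, a simplification chosen for transparency rather than the solver used in the paper. Passing no weights gives the ordinary NLS fit. The weights themselves (for example the empirical functions of Table 4) are supplied by the caller; function names and demo data are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def profile_k(times: Sequence[float], y: Sequence[float], w: Sequence[float],
              b: float, c: float) -> Tuple[float, float]:
    """For fixed (b, c), return (k_hat, weighted SSE) with k_hat in closed form."""
    g = [c ** (b ** t) for t in times]  # g_i = c**(b**t_i) > 0
    k_hat = (sum(wi * yi * gi for wi, yi, gi in zip(w, y, g))
             / sum(wi * gi * gi for wi, gi in zip(w, g)))
    sse = sum(wi * (yi - k_hat * gi) ** 2 for wi, yi, gi in zip(w, y, g))
    return k_hat, sse


def fit_gompertz_wnls(times: Sequence[float], y: Sequence[float],
                      weights: Optional[Sequence[float]] = None,
                      steps: int = 99) -> Tuple[float, float, float, float]:
    """Grid search over (b, c) in (0, 1)^2; returns (k_hat, b_hat, c_hat, weighted SSE).

    weights=None gives ordinary NLS (all w_i = 1).
    """
    w: List[float] = [1.0] * len(times) if weights is None else list(weights)
    if len(w) != len(times) or any(wi <= 0 for wi in w):
        raise ValueError("weights must be positive and match the data length")
    grid = [i / (steps + 1) for i in range(1, steps + 1)]
    best = (math.nan, math.nan, math.nan, math.inf)
    for b in grid:
        for c in grid:
            k_hat, sse = profile_k(times, y, w, b, c)
            if sse < best[3]:
                best = (k_hat, b, c, sse)
    return best


if __name__ == "__main__":
    # Hypothetical noise-free data generated from k = 40, b = 0.9, c = 0.05 (not from the paper).
    times = [float(t) for t in range(1, 31)]
    y = [40.0 * 0.05 ** (0.9 ** t) for t in times]
    print(fit_gompertz_wnls(times, y))                          # NLS (equal weights)
    print(fit_gompertz_wnls(times, y, [1.0 / yi for yi in y]))  # hypothetical weights
```

Because only relative weights matter, multiplying every weight by the same positive constant leaves the selected (b̂, ĉ) and k̂ unchanged; only the reported weighted SSE is rescaled.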
Application of the Estimation Techniques
Description of datasets
To evaluate the predictive capability of the MLE, NLSE, and WNLSE techniques, the following three data sets are used: Data 1 contains 30 execution times between successive failures, in tens of seconds; Data 2, presented by Kim and Park, consists of 41 times between failures, in hours; and Data 3 was collected at a Philips development centre and consists of 246 inter-failure times, in minutes. The three data sets are listed in Tables 1-3, respectively (read from left to right), and are represented graphically in Figure 2.
|Times between failures in tens of seconds|
Table 1: Data 1, times between failures in tens of seconds, number of failures = 30.
|Times between failures in hours|
Table 2: Data 2, times between failures in hours, number of failures = 41.
|Philips failure data|
Table 3: Philips failure data (Data 3), execution times between successive failures in minutes, number of failures = 246.
Selection of weight functions
In the WNLSE method, choosing suitable empirical weight functions is critical and plays an important role in improving the prediction results. In theory, the optimal weight function is wi = 1/σi², where i = 1, …, n; another type of weighting is the empirical weight function. When the WNLSE method is used to find the parameter estimates, the weights determine the impact of each data item on the final parameter estimates. In fact, the weight of each data item matters only relative to the weights of the other data items; hence, different sets of absolute weights can have the same influence. Thus, wi is taken to be a sequence of positive weights, which is treated as a random variable. The empirical weight functions suggested in this application are summarized in Table 4; they are constructed from the software testing data together with some suggested constant parameters that restrict the parameter estimates so as to improve the prediction results.
|Empirical weight functions Name||Empirical weight functions Formula (wi)|
where and β=0.5
Table 4: Empirical weight functions.
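The claim that only relative weights matter can be checked on the simplest weighted least squares problem, a one-parameter model yi = k xi + εi; in a Gompertz fit this is exactly the sub-problem for k once b and c are fixed, with xi = c^{b^{ti}}. The sketch below also builds the "optimal" inverse-variance weights wi = 1/σi². The data, the σi values, and the function names are hypothetical, and the empirical weight functions of Table 4 are not reproduced.

```python
from typing import List, Sequence


def weighted_slope(x: Sequence[float], y: Sequence[float], w: Sequence[float]) -> float:
    """WLS estimate of k in y_i = k * x_i + e_i: minimizes sum_i w_i * (y_i - k * x_i)**2."""
    num = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    den = sum(wi * xi * xi for wi, xi in zip(w, x))
    return num / den


def inverse_variance_weights(sigma: Sequence[float]) -> List[float]:
    """'Optimal' weights w_i = 1 / sigma_i**2; usable only when each sigma_i > 0 is known."""
    return [1.0 / (s * s) for s in sigma]


if __name__ == "__main__":
    # Hypothetical heteroscedastic data: the noise level sigma_i grows with x_i.
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    y = [2.1, 3.9, 6.3, 7.5, 10.9]
    w = inverse_variance_weights([0.1, 0.2, 0.3, 0.4, 0.5])
    print(weighted_slope(x, y, w))                         # inverse-variance weights
    print(weighted_slope(x, y, [100.0 * wi for wi in w]))  # same estimate after rescaling
    print(weighted_slope(x, y, [1.0] * len(x)))            # ordinary least squares
```

A common factor in all wi appears in both the numerator and the denominator of the estimator and cancels, which is why different sets of absolute weights can have the same influence.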
Goodness-of-fit evaluation criteria
The following four criteria will be used to evaluate the performance of the NHPP Gompertz model based on the three different methods of estimation and three real data sets:
1 Mean square error (MSE)
2 Mean absolute error (MAE, reported as AME in Table 5)
3 Variance
4 Root mean square prediction error (RMSPE)
These measures are widely used in model evaluation to quantify the deviation between predicted and actual values; they are defined mathematically as follows:
μTTF(ti; θ): actual mean time between software failures
μTTF(ti; θ̂): predicted mean time between software failures
n: the number of observations
p: the number of parameters to be estimated.
The bias, i.e., the average difference between the estimator and the truth, is obtained as follows:
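A minimal sketch of the five measures, assuming definitions of the form commonly used in the SRGM literature: MSE and MAE divide by n − p, Bias is the mean prediction error, "Variance" is the sample standard deviation of the prediction errors (divisor n − 1), and RMSPE = sqrt(Variance² + Bias²). The exact normalisations behind Table 5 may differ, and the function name and demo values are hypothetical.

```python
import math
from typing import Dict, Sequence


def fit_criteria(actual: Sequence[float], predicted: Sequence[float], p: int) -> Dict[str, float]:
    """Goodness-of-fit measures under commonly used SRGM definitions (assumed here).

    actual, predicted: observed and fitted values (e.g. mean times between failures)
    p: number of estimated parameters (p = 3 for the NHPP Gompertz model)
    """
    n = len(actual)
    if n != len(predicted) or n <= max(p, 1):
        raise ValueError("need equal-length inputs with n > max(p, 1)")
    errors = [yhat - y for y, yhat in zip(actual, predicted)]  # prediction errors
    bias = sum(errors) / n
    variance = math.sqrt(sum((e - bias) ** 2 for e in errors) / (n - 1))
    return {
        "MSE": sum(e * e for e in errors) / (n - p),
        "MAE": sum(abs(e) for e in errors) / (n - p),
        "Bias": bias,
        "Variance": variance,  # a standard deviation, despite the conventional name
        "RMSPE": math.sqrt(variance ** 2 + bias ** 2),
    }


if __name__ == "__main__":
    # Hypothetical values, not taken from Tables 1-3 or Table 5.
    print(fit_criteria([1.0, 2.0, 3.0, 4.0, 5.0], [1.5, 2.0, 2.5, 4.0, 5.5], p=3))
```

For the demo input the prediction errors are 0.5, 0, −0.5, 0, 0.5, giving MSE = 0.75/2 = 0.375, MAE = 1.5/2 = 0.75, and Bias = 0.1. Under these definitions RMSPE can never be smaller than the Variance measure.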
Data analysis of results
This section evaluates the performance of the maximum likelihood, nonlinear least squares, and weighted nonlinear least squares estimation methods for the NHPP Gompertz model using the MSE, AME, Variance, and RMSPE criteria. Three real data sets are used, together with 10 different empirical weight functions for the WNLSE method. The results are summarized in Table 5, and the following comparisons are drawn from them:
|Method of estimation|Data 1 (n = 30): MSE, AME, Variance, RMSPE|Data 2 (n = 41): MSE, AME, Variance, RMSPE|Data 3 (n = 246): MSE, AME, Variance, RMSPE|
Table 5: Goodness-of-fit statistics for different methods of estimation of NHPP Gompertz model.
Comparison between the MLE and NLSE methods: for the first and second data sets (Data 1 and Data 2, with n = 30 and 41, respectively), the NLSE method performs better than the MLE method on all four selected criteria. For Data 3, the largest data set (n = 246), the MLE method performs better than the NLSE method according to the MSE and AME criteria, while the other two criteria show that the best prediction results are obtained with the NLSE. In more detail, the MSE is 21.6463 for the MLE method and 22.1516 for the NLSE method, and the AME is 2.5233 for the MLE method and 2.8201 for the NLSE method. The Variance and RMSPE are 4.8800 and 4.9591, respectively, for the MLE method, but 4.7593 and 4.7441, respectively, for the NLSE method. Overall, our results show that the NLSE is superior to the MLE for data obtained from projects with a short testing phase, while a longer testing period improves the performance of the MLE method.
Comparison between the WNLSE and MLE methods: the WNLS estimates based on the 10 suggested empirical weight functions in Table 4 are compared with the ML estimates on the four evaluation criteria, which gives 40 comparison cases (10 weights × 4 criteria) for each data set:
a. For Data 1: only Ewf5 and Ewf8 give larger Variance and RMSPE for the WNLS estimates than for the ML estimates: VarianceML = 0.4728 and RMSPEML = 0.4813, whereas VarianceWNLS = 0.4958 and RMSPEWNLS = 0.5134 for Ewf5, and VarianceWNLS = 0.5426 and RMSPEWNLS = 0.5719 for Ewf8. Therefore, 36 of the 40 cases show the superiority of the WNLSE method over the MLE.
b. For Data 2: all 40 cases show that the WNLSE method performs better than the MLE method.
c. For Data 3: according to the MSE criterion, five of the weight functions show that the WNLSE method gives the best-fitting model; with respect to the AME criterion, the MLE method gives better prediction results than all 10 weight functions; and for the Variance and RMSPE criteria, all 10 weight functions show the superiority of the WNLSE method over the MLE method. Overall, for this data set, 21 cases show that the best predictive capability comes from the WNLSE method and 19 cases show that the best-performing model is obtained with the MLE method.
Comparison between the WNLSE and NLSE methods: the majority of the considered cases also favour the WNLSE method over the NLSE method; in more detail:
a. For Data 1: 36 out of 40 cases show that the WNLSE method has lower evaluation criteria than the NLSE method.
b. For Data 2: 25 out of 40 cases show that the WNLSE method performs better than the NLSE method.
c. For Data 3: 24 out of 40 cases show that the WNLSE method has better prediction performance than the NLSE.
Comparison between the empirical weight functions:
a. For Data 1: objectively, based on the four selected criteria, Ewf1 gives the best-fitting model.
b. For Data 2: the MSE and AME criteria show that Ewf8 gives the best-performing model, while the Variance and RMSPE criteria show that Ewf10 gives the most accurate model.
c. For Data 3: the MSE and AME criteria show that Ewf5 gives the best-performing model, while the Variance and RMSPE criteria show that Ewf4 gives the most accurate model.
On the other hand, according to our subjective judgement based on the graphical representations in Figures 3 and 4, Ewf7, Ewf2, and Ewf3 give the best-fitting models for Data 1, 2, and 3, respectively. Figure 3 presents the best-fitting models selected objectively and subjectively, while Figures 4 and 5 show the empirical weight functions that give the best-fitting models according to the graphical criteria.
This paper has provided a detailed comparative study of the MLE, NLSE, and WNLSE approaches based on the NHPP Gompertz model, using three different software failure data sets. In reliability engineering, not all failures contribute equally to overall system reliability; assigning different weights according to the significance of each failure helps to make the data more representative of the software failure behaviour. Our main research objective is to investigate the performance of the WNLSE method in the presence of the unequal variance of the NHPP Gompertz model. Empirical weight functions based on software testing data are one approach that can be used in the WNLSE method. The application results show that the best empirical weight function depends on various factors and differs according to the nature of the failure data set. They also show that, in most cases, the WNLSE approach yields lower values of the chosen evaluation criteria, and consequently better predictive ability, than the MLE and NLSE methods.
Given the problem of unequal variances in reliability data, the WNLSE approach is recommended for investigation with other types of growth models. Another interesting direction is to obtain a mathematical formula for the variance of the NHPP Gompertz model, in order to assess the optimal weighting approach, which may offer better predictive accuracy than the empirical approach.
Reaching a sound conclusion and choosing the best-fitting model for each project's failure data is crucial. To make a reasonable decision, both subjective and objective methods were used in our examples. The results indicate that different evaluation criteria give different judgements about the accuracy of the selected estimation method, and that the objective and subjective methods select different best-fitting models for each real failure data set. In this regard, a super-model approach that may help to obtain a single, more accurate model is worth investigating.
Lutfiah Ismail Al turk is currently working as Associate Professor of Mathematical Statistics in the Statistics Department, Faculty of Sciences, King AbdulAziz University, Jeddah, Kingdom of Saudi Arabia. She obtained her B.Sc degree in Statistics and Computer Science from the Faculty of Sciences, King AbdulAziz University, in 1993 and her M.Sc (Mathematical Statistics) degree from the Statistics Department, Faculty of Sciences, King AbdulAziz University, in 1999. She received her Ph.D in Mathematical Statistics from the University of Surrey, UK, in 2007. Her current research interests include software reliability modelling and statistical machine learning.
- Goel AL, Okumoto K (1979) Time-dependent error-detection rate model for software reliability and other performance measures. IEEE Trans Reliability 28: 206-211.
- Yamada S, Ohba M, Osaki S (1983) S-Shaped reliability growth modeling for software error detection. IEEE Trans Reliability 32: 475-484.
- Goel AL (1985) Software reliability models: Assumptions, limitations and applicability. IEEE Transactions on Software Engineering 11: 1411-1423.
- Zhao M, Xie M (1996) On maximum likelihood estimation for a general non-homogeneous Poisson process. Scandinavian Journal of Statistics 23: 597-607.
- Chang YP (2001) Estimation of parameters for nonhomogeneous Poisson process: software reliability with change-point model. Commun Stat Simul Comput 30: 623-635.
- Rana R, Staron M, Berger C, Hansson J, Nilsson M, et al. (2013) Comparing between Maximum Likelihood Estimator and Non-linear Regression Estimation Procedures for NHPP Software Reliability Growth Modelling. Joint Conference of the 23rd International Workshop on Software Measurement and the 8th International Conference on Software Process and Product Measurement Ankara. pp: 213-218.
- Zeephongsekul P, Jayasinghe CL, Fiondella L, Nagaraju V (2016) Maximum-Likelihood Estimation of Parameters of NHPP Software Reliability Models Using Expectation Conditional Maximization Algorithm. IEEE Transactions on Reliability 65: 1571-1583.
- Ross GJS (1990) Nonlinear Estimation.
- Seber GAF, Wild CJ (1989) Nonlinear Regression. Wiley, New York.
- Wu S, Shao J (1999) Reliability analysis using the least squares methods in nonlinear mixed-effect degradation models. Statistica Sinica 9: 855-877.
- Madsen K, Nielsen HB, Tingleff O (1999) Methods for Non-Linear Least Squares Problems. Informatics and Mathematical Modelling, Technical University of Denmark.
- Sakata K (1974) Formulation for predictive methods in software production control: Static prediction and failure rate transition model. IEICE Trans Inf Syst J57: 277-283.
- Satoh D (2000) A discrete Gompertz equation and a software reliability growth model. IEICE Trans Inf & Syst E83-D: 1508-1513.
- Ohishi K, Okamura H, Dohi T (2009) Gompertz software reliability model: estimation algorithm and empirical validation. Journal of Systems and Software 82: 535-543.
- Lutfiah Ismail A (2014) Comparing between maximum likelihood and least square estimators for Gompertz software reliability model. International Journal of Software Engineering & Applications 5: 51-61.
- Hsu CJ, Huang CY, Chang JR (2011) Enhancing software reliability modelling and prediction through the introduction of time variable fault reduction factor. Applied Mathematical Modelling 35: 506-521.
- Littlewood B (1991) Forecasting software reliability, Bayesian methods in reliability. Kluwer Academic Publishers, USA.
- Kim HC, Park HK (2010) The comparative study of software optimal release time based on Burr distribution. International Journal of Advancements in Computing Technology 2: 119-128.
- Montgomery DC, Peck EA, Vining GG (2012) Introduction to linear regression analysis. Wiley, New York. pp: 1-67.
- Hwang S, Pham H (2009) Quasi-renewal time-delay fault-removal consideration in software reliability modelling. IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans 39: 200-209.
- Huang CY, Kuo SY (2002) Analysis of incorporating logistic testing-effort function into software reliability modeling. IEEE Transactions on Reliability 51: 261-270.
- Chiu KC, Huang YS, Lee TZ (2008) A study of software reliability growth from the perspective of learning effects. Reliability Engineering and System Safety 93: 1410-1421.
Citation: Al turk LI (2018) Weighted Nonlinear Least Squares Technique for Parameters Estimation of the NHPP Gompertz Model. J Inform Tech Softw Eng 8: 231. DOI: 10.4172/2175-7866.1000231
Copyright: © 2018 Al turk LI. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.