Abdus Sattar^{1*}, Sanjoy K. Sinha^{2} and Nathan J. Morris^{1}  
^{1}Department of Epidemiology & Biostatistics, Case Western Reserve University, Cleveland, OH, USA  
^{2}School of Mathematics and Statistics, Carleton University, Ottawa, Ontario K1S 5B6, Canada  
Corresponding Author :  Abdus Sattar Department of Epidemiology and Biostatistics School of Medicine Case Western Reserve University 10900 Euclid Avenue, BRB, G19 Cleveland, OH 441064945, USA Tel: 1.216.368.1501 Fax: 1.216.368.1969 Email: [email protected] 
Received July 05, 2012; Accepted August 20, 2012; Published August 25, 2012  
Citation: Sattar A, Sinha SK, Morris NJ (2012) A Parametric Survival Model When a Covariate is Subject to LeftCensoring. J Biomet Biostat S3:002. doi:10.4172/21556180.S3002  
Copyright: © 2012 Sattar A, et al. This is an openaccess article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. 
Visit for more related articles at Journal of Biometrics & Biostatistics
Problem statement: Modeling survival data with a set of covariates usually assumes that the values of the covariates are fully observed. However, in a variety of applications, some values of a covariate may be leftcensored due to inadequate instrument sensitivity to quantify the biospecimen. When data are leftcensored, the true values are missing but are known to be smaller than the detection limit. The most commonly used adhoc method to deal with nondetect values is to substitute the nondetect values by the detection limit. Such adhoc analysis of survival data with an explanatory variable subject to leftcensoring may provide biased and inefficient estimators of hazard ratios and survivor functions.
Method: We consider a parametric proportional hazards model to analyze timetoevent data. We propose a likelihood method for the estimation and inference of model parameters. In this likelihood approach, instead of replacing the nondetect values by the detection limit, we adopt a numerical integration technique to evaluate the observed data likelihood in the presence of a leftcensored covariate. Monte Carlo simulations were used to demonstrate various properties of the proposed regression estimators including the consistency and efficiency.
Results: The simulation study shows that the proposed likelihood approach provides approximately unbiased estimators of the model parameters. The proposed method also provides estimators that are more efficient than those obtained under the adhoc method. Also, unlike the adhoc estimators, the coverage probabilities of the proposed estimators are at their nominal level. Analysis of a large cohort study, genetic and inflammatory marker of sepsis study, shows discernibly different results based on the proposed method.
Conclusion: Naive use of detection limit in a parametric survival model may provide biased and inefficient estimators of hazard ratios and survivor functions. The proposed likelihood approach provides approximately unbiased and efficient estimators of hazard ratios and survivor functions.
Keywords 
Leftcensored covariate; Maximum likelihood method; Numerical integration; Survival model 
Introduction 
Survival models are commonly used to assess the relationship between a covariate of interest and timetoevent data. In these models it is typically assumed that the covariate is fully observed, but there are many situations when the underlying covariate is not fully observed. Incomplete measurements of a variable can occur in environmental, epidemiological, biological and biomedical studies [13]. For example, when conducting a bioassay to quantify the biomarker some measurements are not fully observed because of inadequate instrument sensitivity. Similar incomplete measurements can also occur when measuring air quality, water quality, soils, contaminants in biota, etc. The measurement above the detection limit (LOD) is reported, and in those that are undetectable, LOD is reported. Several authors [2] reported that the use of the LOD or LOD/2 provide biased regression parameter estimate. When studying an association between a biomarker subject to LOD and timetoevent, it is necessary to adjust the impact of LOD in survival analysis. In this article we intend to study the association between a right censored survival outcome and a leftcensored covariate based on the direct maximization of a likelihood function where the impact of leftcensoring in the covariate of interest will be integrate out by a numerical integration method. 
As a running example, we use the Genetic and Inflammatory Marker of Sepsis (GenIMS) study. This was a large cohort study of patients with communityacquired pneumonia and sepsis [4]. The goal of this study was to understand the role of inflammatory cytokine response in a hospitalized cohort of patients. After enrollment in the study, blood was drawn for a cytokine assay immediately following the enrollment, daily for the first week and weekly thereafter while subjects remained in the hospital. There are several cytokine measured in this study and one of them is Interleukin 10 (IL10). About onethird of the IL10 measurements fall below the detection limit and LOD is reported. In this case IL10 is a risk factor or a covariate of interest which has leftcensoring. Our goal is to find the association between 90 day mortality and IL10 given that a large percentage of IL10 measurements are leftcensored. More details about the GenIMS study can be found in the result section. 
During the past several years new methods have been developed for improved statistical inference when there is a censored covariate in the regression model, and these methods have been compared with naive methods. Naive methods include removing observations falling below the detection limit. Removing observations may provide unbiased regression parameter estimates but results in reduced sample size and hence decreasing efficiency of the parameter estimates. Another commonly practiced approach is the adhoc substitution method. In this approach observations that fall below the detection limit are recognized by LOD, LOD/2, LOD/, or zero. Helsel [5] and Sattar et al. [6] studied these adhoc methods and showed that these adhoc methods provide biased estimates and the degree of bias increases with the increase in percent of LOD observations in the covariate. Helsel argued that there is no theoretical basis for the use of these substitution methods. Two articles on censored covariate in the generalized linear model appeared almost at the same time in the literature, one used a maximum likelihood method with the Monte Carlo EM algorithm (May et al. [7]), and the other used an optimal estimating equations approach (Tsimikas et al. [8]). Nie et al. [9] studied leftcensoring of an explanatory variable in the linear regression model setup. These authors demonstrated that the commonly used substitution methods of replacing leftcensored values with LOD, LOD/, LOD/2 provide biased parameter estimates with low Coverage Probabilities (CP). They proposed parameter estimation by the maximum likelihood method based on parametric distributional assumptions. The proposed method has been compared with a method of replacing LOD by E(XX<LOD). The authors concluded that these two methods are competitive and are promising alternatives to the multiple imputation method [10]. 
There are several approaches to model the hazard of an event. A common approach is the parametric survival model. In this type of modeling, a probability distribution is assumed for the underlying survival time. If the distributional assumption is satisfied then this modeling approach is more efficient than its counterpart nonparametric and semiparametric hazard models. Langor et al. [11] studied doubly censored survival data with an intervalcensored covariate in a parametric survival model framework. They have considered a censored discrete covariate. In their estimation approach, the likelihood function is maximized as a nonlinear constant maximization problem, and they used a sequential quadratic programming algorithm. This approach guarantees a local maximum likelihood estimate. Cox regression models with covariate subject to detection limit has also been studied. Lee et al. [12] propose to estimate the relative risk function based on the uncensored covariate data and used this risk function to derive a partial likelihood function. D’Angelo et al. [13] analyzed survival data in a Cox model framework with a covariate subject to leftcensoring. These authors have used an index approach which is conceptually similar to the EM algorithm. In this approach the censored value is expressed as a function of all of the observed values of the covariate. 
In this article, we propose a method for estimating survival regression parameter associated with a continuous covariate of interest which is subject to limit of detection. The covariate of interest is leftcensored because of the limit of detection in the bioassay. We maximized the likelihood function and integrate out the leftcensoring via Simpson’s numerical integration method. Monte Carlo simulations study show that the proposed method provides approximately unbiased estimates of the model parameters. The parameter estimates are also efficient and its Coverage Probabilities (CP) is at the nominal level. The method has been implemented in standard statistical software. To our knowledge, no one has addressed the detection limit problem in a parametric survival model using a numerical integration method. 
The article is organized as follows, in section “Materials and Methods”, we have developed the general framework of our proposed method. In sections “Simulation Study” and “Illustrative Example”, we have presented the simulation and GenIMS study results, respectively. We have offered a discussion and conclusion in the final section. 
Materials and Methods 
Suppose in an experiment with n subjects, T_{i} denotes the survival time of subject i, i=1,…,n. Assume that some of the “true” values t_{1},t_{2},…,t_{n} of the random variables T_{1},…,T_{n} are rightcensored. We further assume that the censoring is noninformative. The rightcensored observed survival data can be written as pairs (y_{i},δ_{i}), where δ_{i} is the event indicator: δ_{i}=1 if y_{i} is a true event time, that is, if t_{i}= y_{i}, and δ_{i}=0 if t_{i} is rightcensored, that is, if t_{i} > y_{i}. Let X_{i} denote a p×1 vector of covariates associated with the i^{th} subject. Suppose the hazard rate h_{i}(t) for subject i at time t is related to the values x_{i} of the covariates by the proportional hazards model 
where h_{0}(t) is a baseline hazard function depending on unknown parameters β_{0} and β_{1} is a p×1 vector of unknown regression coefficients. Assuming that the survival times are independent, the likelihood function of β = (β_{0}, β_{1}) for given data (y_{i}, δ_{i}, x_{i}) can be defined as 
where S_{i} (t) = P (T_{i} > t  x_{i}, β) is the survivor function for subject i at time t. Let denote the density function for the survival time T_{i} at time t. Then the above likelihood function can be expressed as 
(1) 
When the values of a covariate are censored due to the limit of detection, and the censored values are replaced by the LOD, then likelihood function (1) provides biased and inefficient regression parameter estimates [13,14]. To obtain consistent and efficient regression parameter estimates from a survival regression model with a covariate subject to leftcensoring we are proposing the following method. This method is based on Simpson’s numerical integration technique and easy to implement in standard statistical software. The likelihood function can be constructed for the censored and observed values with a fair amount of effort. For now we consider that X_{i} has only one continuous covariate and its value x_{i} is leftcensored if x_{i} < c for a given value of c. Let denote the density of the random variable X_{i}, which is assumed to be known. Define a binary random variable R_{i} which is 1 if X_{i} is observed and 0 if X_{i} is not detected, that is, 
We assume that the binary random variable R_{i} follows the Bernoulli distribution 
for r = 0,1, where π_{i} = P(X_{i} ≥ c) is the probability that the value of X_{i} is observed. To estimate the model parameters β, we propose to maximize the observed data likelihood function 
(2) 
In the absence of leftcensored covariates, the above likelihood function L(β) becomes the ordinary likelihood L_{0}(β), as defined in (1). From (2), the loglikelihood function is obtained as 
(3) 
Note that the above loglikelihood function (3) cannot be written in a closed form, and numerical methods may be used to evaluate the integral with respect to the covariate x_{i}. Here we consider evaluating this integral using Simpson’s 1/3 rule of numerical integration. The Simpson’s method produces a numerical value for the integration of a function over a set. Suppose that an interval [a,b] is divided into k subintervals, with k an even number. Then the composite Simpson’s rule is defined by [15] 
where z_{j} = a + jh for j = 0,1,…,n, with h = (ba)/n. The error term associated with the composite Simpson’s rule is bounded (in absolute value) by. Differentiating l(β) with respect to β, gives the score equations U(β) = (∂ / ∂β ) l(β) = 0. The maximum likelihood estimators of the model parameters β can be obtained by solving these score equations numerically using an iterative method or by directly maximizing the loglikelihood function (3) using some numerical optimization technique, which is discussed further in the next section. 
Standard maximum likelihood theory suggests that E{U(β)} = 0. The observed Fisher information I(β) is the negative of the p×p Hessian matrix of the loglikelihood, so that For the exponential family, the expected Fisher information matrix,. Under appropriate regularity conditions, the maximum likelihood estimators follow an approximate normal distribution for a large sample size n: 
Simulation study 
To examine the performance of the proposed method, we conducted a simulation study. In this study, we compared our proposed method based on the loglikelihood function (3) with the naïve method, which estimates the model parameters by replacing the leftcensored covariates with the LOD under a number of different scenarios. We refer to these two methods of analysis as the “corrected” and the “naïve” approach, respectively. In each scenario, we consider a Weibull proportional hazard model. Under this proportional hazards model, the hazard of death at time t for the i^{th} subject is [16] 
(4) 
where λ and γ are the scale and shape parameters of the Weibull distribution, respectively. The survivor function corresponding to the hazard function (4) is. For simplicity, we set the shape parameter γ =1. In this setting, the hazard function (4) can be written in the form , where and β = [β_{0}, β_{1}]′ with β_{0} = log(λ). The values of the covariate X were generated from the normal distribution with mean 5.0 and standard deviation which differed for some of the scenarios. True values of the regression parameters intercept (β_{0}) and slope (β_{1}) were set to 2.0 and 0.2, respectively. The rightcensored survival times were generated from the Weibull distribution by setting λ = exp(intercept + 50). If the observed time is less than the rightcensoring time, then the event is observed. Otherwise, the survival time is rightcensored. The values that differed for each scenario were the sample size (N ∈ {500,1000}), the standard deviation of the covariate (SD(X_{1})∈{0.5,1.0,2.0}) and the percentages of covariates which were censored (1π ∈{10%,20%,50%}).To generate various percentages of leftcensored covariate values, we set LOD = 5+SD(X)Ф^{1}(1π), where Ф is the normal cumulative density function. If the generated values of the covariate X are less than the LOD, then LOD is recorded. The statistical software R [13] was used for the computation. In particular, to maximize the likelihood function derived under the above Weibull proportional hazard model, we used the method LBFGSB [14] available through the R function optim. This method uses function values and gradients to build up a picture of the surface to be optimized. For the naive approach we used the survival package in R. 
The simulation results are presented in Table 1. As expected with no LOD (i.e. 1π = 0%), the naïve approach and corrected approach are identical. As the proportion of censored values increased, the bias in the estimates from the naïve approach also increased. Also, the bias in the estimates from the naïve approach was significantly higher when the standard deviation of X was higher. When the standard deviation of the covariate was 2.0 with a sample size 1000 and 50% observations were leftcensored, the estimated 95% coverage rate for both the slope and intercept was less than 22% for the naïve approach. In contrast, the corrected approach produced results with very small bias, smaller mean square error, and approximately correct coverage for most scenarios. When the standard deviation of the covariate was 2.0, the corrected approach had a slightly low coverage rate for 500 sample size, but significantly improved coverage compared to the naïve approach. Thus the proposed approach is approximately unbiased and achieves good coverage rates in most of the scenarios. 
Illustrative example 
Severe sepsis is the systemic inflammatory response to infection with complication of organ dysfunction. CommunityAcquired Pneumonia (CAP) is the leading cause of severe sepsis. The Genetic and Inflammatory Markers of Sepsis (GenIMS) study  a large, multicenter, cohort study of patients with CAP was conducted to understand the pattern of systemic cytokine response to infection and to determine if there were specific patterns associated with severe sepsis and death [17]. A total of 2320 patients with CAP presenting to the emergency departments of 28 hospitals in Pennsylvania, Connecticut, Michigan, and Tennessee enrolled in the study during December 2001 and November 2003. GenIMS included patients with age ≥ 18 years old with a clinical and radiologic diagnosis of pneumonia. After enrollment detailed baseline and clinical information were gathered, and blood was drawn for cytokine assays immediately following enrollment and daily throughout the first seven days of hospitalization. The primary outcome variable in the GenIMS study was severe sepsis and 90day mortality. The markers of greatest interest in the GenIMS study were the proinflammatory marker Interleukin6 (IL6) and antiinflammatory marker Interleukin10 (IL10). More information regarding the study population, outcomes, treatment, and covariates can be found in the Kellum et al. [17]. 
In this illustration, we consider the association between 90day mortality and the IL10 biomarker baseline (Day 1) data. Blood was drawn for a cytokine assay from 1429 subjects. If the patients presented to the emergency department after 11 pm or on the weekends or holidays, then the blood was not drawn for logistic reasons. Note that there are some intermittent missing biomarker data due to administrative reasons and we are assuming that this intermittently missing data are missing completely at random. A detailed decomposition of the study subjects can be found in the above mentioned reference. In this analysis, we have a total 867 subjects with IL10 measurements at baseline. However, the measurements of IL10 were leftcensored (47.87 percent) because of the inadequate sensitivity of the cytokine assay resulting in a leftcensoring of the measure at the lower limit of detection. 
Table 2 reports the descriptive statistics of the covariates that we consider in this analysis. The presented result is based on the baseline (Day 1) characteristics of demographic and clinical variables. From this table we can say that the patients who have died during the first 90 days after the hospitalization for CAP were mostly male and older patients. Higher proportions of these patients had been treated with steroids, and their Ddimer and IL10 levels were higher. 
Table 3 summarizes the results from the GenIMS data analyses. To examine the impact of leftcensoring in a real study, We have fitted the corrected and naive models described in the simulation section. The naïve survival model is a parametric Weibull survival model where nondetect values are replaced by the LOD. The corrected survival model is our proposed model where we have fitted the survival model with an implementation of the Simpson’s numerical integration technique for the leftcensoring for IL10. The model considered includes the antiinflammatory biomarker IL10, age, gender, steroid use, and coagulation marker Ddimer. We have performed the logarithmic transformation on the continuous skewed data (IL10 and Ddimer), and rescale the age variable (age ÷ 10) so that the estimates become stable and have improved the interpretation. The estimates from the two models are different. The corrected model Hazard Ratio (HR) estimate for the covariate IL10 is smaller than its counterpart naive model HR estimate. The proposed model HR estimate for IL10 is also more efficient than the other model. The 95% CI of the HR estimate for IL10 obtained from the naive and corrected models are [1.108, 1.539] and [1.111, 1.432] respectively. These results suggest that the naive use of the detection limit as a substitution for an undetected value can lead to a different estimate and interpretation of the risk factors. Our simulation results have shown that there are situations where the difference between the two approaches is even larger than in our real data example. 
Discussion 
A censored covariate is a challenge for statistical analysis. We consider leftcensoring of a covariate and examined the impact of leftcensoring in a parametric survival model. There are several adhoc methods to deal with the limit of detection problem of a covariate in a regression model framework. These methods provide biased and inefficient parameter estimates. In this paper we proposed a method for correcting bias and making an efficient parametric survival inference when there is a leftcensored covariate. Our propose likelihood method is based on Simpson’s numerical integration technique. Because the data involves both a rightcensored timetoevent outcome and a leftcensored covariate, the likelihood function becomes a complicated one. From this complicated likelihood function, we have integrated out the impact of leftcensoring. The Monte Carlo simulation study shows that the proposed model’s performance is comparable to the standard survival model’s performance where there is no leftcensoring. We have also applied the proposed method to a large cohort data set. From this exercise we have found that the proposed method results are different from the adhoc method results. 
In the situation when a covariate is subject to leftcensoring, this paper compares a new method for analyzing survival data to a commonly used naive method that replace the censored values by the limit of detection. We have demonstrated that the naive method provide biased, efficient regression parameter estimates with low coverage probabilities. On the other hand our proposed likelihood method based on a numerical integration technique provides approximately unbiased and efficient parameter estimates, and achieves good coverage probabilities in most of the scenarios. The proposed method is relatively simple to understand and easy to implement in a standard statistical software. 
We have implemented our proposed method by considering only one covariate with limit of detection. We expect that this method can be extended with some computational burden for more than one covariate with the limit of detection. A limitation of this study is that we assumed a normal distribution for the censored covariate and derive the likelihood function accordingly, and we did not investigate the robustness to the misspecification of the normality assumption in the simulation. We also did not examine the impact of changing the shape parameter value for the Weibull distribution in our simulation. We are working on another manuscript where we are intending to relax the assumption of normality, and perform sensitivity analysis. In summary, in the presence of limit of detection in a covariate of a parametric survival model, the estimates are biased and inefficient. Our proposed likelihoodbased method using a numerical integration provides unbiased and efficient parameter estimates. Therefore, the proposed method is an encouraging one to use when a covariate is subject to a limit of detection. The statistical analysis was performed using R software version 2.15.0. The R script can be obtained upon request to the corresponding author. 
Acknowledgements 
We thank Dr. Derek Angus and the CRISMA laboratory for access to the GenIMS data. We are indebted to the nurses, respiratory therapists, phlebotomists, physicians, and other healthcare professionals, as well as the patients and their families who supported this trial. A complete list of GenIMS investigators is available at www.ccm.upmc.edu/genims investigators. The GenIMS study was funded via grant R01 GM61992 by the National Institute of General Medical Sciences. Sanjoy Sinha is grateful for the support provided by a grant from the Natural Sciences and Engineering Research Council of Canada. 
References 

Table 1  Table 2 