ISSN: 2155-6180
Journal of Biometrics & Biostatistics


A Parametric Survival Model When a Covariate is Subject to Left-Censoring

Abdus Sattar1*, Sanjoy K. Sinha2 and Nathan J. Morris1
1Department of Epidemiology & Biostatistics, Case Western Reserve University, Cleveland, OH, USA
2School of Mathematics and Statistics, Carleton University, Ottawa, Ontario K1S 5B6, Canada
Corresponding Author : Abdus Sattar
Department of Epidemiology and Biostatistics
School of Medicine
Case Western Reserve University
10900 Euclid Avenue, BRB, G-19
Cleveland, OH 44106-4945, USA
Tel: 1.216.368.1501
Fax: 1.216.368.1969
E-mail: [email protected]
Received July 05, 2012; Accepted August 20, 2012; Published August 25, 2012
Citation: Sattar A, Sinha SK, Morris NJ (2012) A Parametric Survival Model When a Covariate is Subject to Left-Censoring. J Biomet Biostat S3:002. doi:10.4172/2155-6180.S3-002
Copyright: © 2012 Sattar A, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.



Problem statement: Modeling survival data with a set of covariates usually assumes that the values of the covariates are fully observed. However, in a variety of applications, some values of a covariate may be left-censored due to inadequate instrument sensitivity to quantify the biospecimen. When data are left-censored, the true values are missing but are known to be smaller than the detection limit. The most commonly used ad-hoc method to deal with nondetect values is to substitute the nondetect values by the detection limit. Such ad-hoc analysis of survival data with an explanatory variable subject to left-censoring may provide biased and inefficient estimators of hazard ratios and survivor functions.

Method: We consider a parametric proportional hazards model to analyze time-to-event data. We propose a likelihood method for the estimation and inference of model parameters. In this likelihood approach, instead of replacing the nondetect values by the detection limit, we adopt a numerical integration technique to evaluate the observed data likelihood in the presence of a left-censored covariate. Monte Carlo simulations were used to demonstrate various properties of the proposed regression estimators including the consistency and efficiency.

Results: The simulation study shows that the proposed likelihood approach provides approximately unbiased estimators of the model parameters. The proposed method also provides estimators that are more efficient than those obtained under the ad-hoc method. Also, unlike the ad-hoc estimators, the coverage probabilities of the proposed estimators are at their nominal level. Analysis of a large cohort study, the Genetic and Inflammatory Markers of Sepsis (GenIMS) study, gives discernibly different results under the proposed method.

Conclusion: Naive use of detection limit in a parametric survival model may provide biased and inefficient estimators of hazard ratios and survivor functions. The proposed likelihood approach provides approximately unbiased and efficient estimators of hazard ratios and survivor functions.

Keywords: Left-censored covariate; Maximum likelihood method; Numerical integration; Survival model
Introduction
Survival models are commonly used to assess the relationship between a covariate of interest and time-to-event data. These models typically assume that the covariate is fully observed, but in many situations it is not. Incomplete measurements of a variable can occur in environmental, epidemiological, biological and biomedical studies [1-3]. For example, when a bioassay is used to quantify a biomarker, some measurements are not fully observed because of inadequate instrument sensitivity. Similar incomplete measurements can also occur when measuring air quality, water quality, soils, contaminants in biota, etc. Measurements above the limit of detection (LOD) are reported as observed, whereas for undetectable measurements the LOD itself is reported. Several authors [2] reported that substituting the LOD or LOD/2 provides biased regression parameter estimates. When studying an association between a biomarker subject to an LOD and a time-to-event outcome, it is necessary to adjust for the impact of the LOD in the survival analysis. In this article we study the association between a right-censored survival outcome and a left-censored covariate by directly maximizing a likelihood function in which the left-censoring of the covariate of interest is integrated out by a numerical integration method.
As a running example, we use the Genetic and Inflammatory Markers of Sepsis (GenIMS) study, a large cohort study of patients with community-acquired pneumonia and sepsis [4]. The goal of this study was to understand the role of the inflammatory cytokine response in a hospitalized cohort of patients. Blood was drawn for a cytokine assay immediately following enrollment, daily for the first week, and weekly thereafter while subjects remained in the hospital. Several cytokines were measured in this study, one of which is Interleukin 10 (IL10). About one-third of the IL10 measurements fall below the detection limit, and for these the LOD is reported. In this case IL10 is a risk factor, or covariate of interest, that is subject to left-censoring. Our goal is to assess the association between 90-day mortality and IL10 given that a large percentage of IL10 measurements are left-censored. More details about the GenIMS study can be found in the section “Illustrative Example”.
During the past several years new methods have been developed for improved statistical inference when there is a censored covariate in the regression model, and these methods have been compared with naive methods. One naive method is to remove the observations that fall below the detection limit. Removing observations may provide unbiased regression parameter estimates, but it reduces the sample size and hence the efficiency of the parameter estimates. Another commonly practiced approach is the ad-hoc substitution method, in which observations that fall below the detection limit are replaced by LOD, LOD/2, LOD/√2, or zero. Helsel [5] and Sattar et al. [6] studied these ad-hoc methods and showed that they provide biased estimates and that the degree of bias increases with the percentage of observations below the LOD. Helsel argued that there is no theoretical basis for the use of these substitution methods. Two articles on a censored covariate in the generalized linear model appeared in the literature at almost the same time: one used a maximum likelihood method with the Monte Carlo EM algorithm (May et al. [7]), and the other used an optimal estimating equations approach (Tsimikas et al. [8]). Nie et al. [9] studied left-censoring of an explanatory variable in the linear regression model set-up. These authors demonstrated that the commonly used substitution methods of replacing left-censored values with LOD, LOD/√2, or LOD/2 provide biased parameter estimates with low Coverage Probabilities (CP). They proposed parameter estimation by the maximum likelihood method based on parametric distributional assumptions, and compared it with a method that replaces the LOD by E(X|X<LOD). The authors concluded that these two methods are competitive and are promising alternatives to the multiple imputation method [10].
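The substitution rules discussed above are simple to state in code. The following is a minimal sketch, not taken from the cited articles: the function name `substitute_nondetects`, the rule labels, and the toy readings are all hypothetical, and LOD/√2 is included as a commonly used variant of the rule.

```python
import math

def substitute_nondetects(values, lod, rule="lod"):
    """Replace every value below the detection limit by a fixed substitute."""
    substitutes = {
        "lod": lod,
        "lod/2": lod / 2.0,
        "lod/sqrt2": lod / math.sqrt(2.0),
        "zero": 0.0,
    }
    fill = substitutes[rule]
    return [v if v >= lod else fill for v in values]

# Hypothetical biomarker readings. The values below 1.0 would not be
# observable in practice; they are listed only to show the distortion.
true_values = [0.2, 0.6, 0.9, 1.4, 2.5, 3.1]
lod = 1.0
print("true mean", sum(true_values) / len(true_values))
for rule in ("lod", "lod/2", "lod/sqrt2", "zero"):
    filled = substitute_nondetects(true_values, lod, rule)
    print(rule, sum(filled) / len(filled))
```

Every rule assigns the same constant to all nondetects, so the variation of the covariate below the detection limit is lost regardless of which constant is chosen.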
There are several approaches to modeling the hazard of an event. A common approach is the parametric survival model, in which a probability distribution is assumed for the underlying survival time. If the distributional assumption is satisfied, this approach is more efficient than its nonparametric and semi-parametric counterparts. Langor et al. [11] studied doubly censored survival data with an interval-censored covariate in a parametric survival model framework. They considered a censored discrete covariate. In their estimation approach, the likelihood function is maximized as a non-linear constrained maximization problem using a sequential quadratic programming algorithm. This approach guarantees a local maximum likelihood estimate. Cox regression models with a covariate subject to a detection limit have also been studied. Lee et al. [12] proposed to estimate the relative risk function based on the uncensored covariate data and used this risk function to derive a partial likelihood function. D’Angelo et al. [13] analyzed survival data in a Cox model framework with a covariate subject to left-censoring. These authors used an index approach that is conceptually similar to the EM algorithm, in which the censored value is expressed as a function of all of the observed values of the covariate.
In this article, we propose a method for estimating the survival regression parameter associated with a continuous covariate of interest that is subject to a limit of detection. The covariate of interest is left-censored because of the limit of detection in the bioassay. We maximize the likelihood function and integrate out the left-censoring via Simpson’s numerical integration method. A Monte Carlo simulation study shows that the proposed method provides approximately unbiased estimates of the model parameters. The parameter estimates are also efficient, and their Coverage Probabilities (CP) are at the nominal level. The method has been implemented in standard statistical software. To our knowledge, no one has addressed the detection limit problem in a parametric survival model using a numerical integration method.
The article is organized as follows. In the section “Materials and Methods”, we develop the general framework of the proposed method. In the sections “Simulation Study” and “Illustrative Example”, we present the simulation and GenIMS study results, respectively. We offer a discussion and conclusion in the final section.
Materials and Methods
Suppose in an experiment with n subjects, Ti denotes the survival time of subject i, i=1,…,n. Assume that some of the “true” values t1,t2,…,tn of the random variables T1,…,Tn are right-censored. We further assume that the censoring is non-informative. The right-censored observed survival data can be written as pairs (yi, δi), where δi is the event indicator: δi=1 if yi is a true event time, that is, if ti = yi, and δi=0 if ti is right-censored, that is, if ti > yi. Let Xi denote a p×1 vector of covariates associated with the ith subject. Suppose the hazard rate hi(t) for subject i at time t is related to the values xi of the covariates by the proportional hazards model
$$h_i(t) = h_0(t)\exp(x_i'\beta_1),$$
where h0(t) is a baseline hazard function depending on unknown parameters β0, and β1 is a p×1 vector of unknown regression coefficients. Assuming that the survival times are independent, the likelihood function of β = (β0, β1) for given data (yi, δi, xi) can be defined as
$$L_0(\beta) = \prod_{i=1}^{n} \{h_i(y_i)\}^{\delta_i} S_i(y_i),$$
where Si(t) = P(Ti > t | xi, β) is the survivor function for subject i at time t. Let fi(t) = hi(t)Si(t) denote the density function for the survival time Ti at time t. Then the above likelihood function can be expressed as
$$L_0(\beta) = \prod_{i=1}^{n} \{f_i(y_i)\}^{\delta_i}\{S_i(y_i)\}^{1-\delta_i}. \qquad (1)$$
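As an illustration of how a likelihood of this form is evaluated, here is a minimal sketch of the log of likelihood (1). It assumes, purely for illustration, a constant baseline hazard h0(t) = exp(β0) and a single covariate; the function name and the toy data are hypothetical and are not part of the article.

```python
import math

def log_lik_right_censored(beta0, beta1, y, delta, x):
    """Log-likelihood for right-censored data when h_i(t) = exp(beta0 + beta1 * x_i).

    With a constant hazard h_i, log f_i(y) = log h_i - h_i * y and
    log S_i(y) = -h_i * y, so subject i contributes
    delta_i * log h_i - h_i * y_i.
    """
    total = 0.0
    for yi, di, xi in zip(y, delta, x):
        log_h = beta0 + beta1 * xi
        total += di * log_h - math.exp(log_h) * yi
    return total

# Hypothetical toy data: follow-up times, event indicators, covariate values.
y = [2.0, 5.0, 1.5, 7.0]
delta = [1, 0, 1, 0]
x = [4.8, 5.3, 4.1, 6.0]
print(log_lik_right_censored(-2.0, -0.2, y, delta, x))
```

Censored subjects (δi = 0) contribute only through the survivor function, which is why the event indicator multiplies the log-hazard term alone.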
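When the values of a covariate are censored due to the limit of detection and the censored values are replaced by the LOD, likelihood function (1) provides biased and inefficient regression parameter estimates [13,14]. To obtain consistent and efficient regression parameter estimates from a survival regression model with a covariate subject to left-censoring, we propose the following method. It is based on Simpson’s numerical integration technique and is easy to implement in standard statistical software. With a fair amount of effort, the likelihood function can be constructed for both the censored and the observed covariate values. For now we consider that Xi consists of a single continuous covariate whose value xi is left-censored if xi < c for a given value of c. Let g(x) denote the density of the random variable Xi, which is assumed to be known. Define a binary random variable Ri which is 1 if Xi is observed and 0 if Xi is not detected, that is,
$$R_i = \begin{cases} 1, & X_i \ge c, \\ 0, & X_i < c. \end{cases}$$
We assume that the binary random variable Ri follows the Bernoulli distribution
$$P(R_i = r) = \pi_i^{r}(1-\pi_i)^{1-r}$$
for r = 0,1, where πi = P(Xi ≥ c) is the probability that the value of Xi is observed. To estimate the model parameters β, we propose to maximize the observed data likelihood function
$$L(\beta) = \prod_{i=1}^{n} \left[\{f_i(y_i \mid x_i)\}^{\delta_i}\{S_i(y_i \mid x_i)\}^{1-\delta_i} g(x_i)\right]^{r_i} \left[\int_{-\infty}^{c} \{f_i(y_i \mid x)\}^{\delta_i}\{S_i(y_i \mid x)\}^{1-\delta_i} g(x)\,dx\right]^{1-r_i}, \qquad (2)$$
where fi(t | x) and Si(t | x) denote the density and survivor function of Ti when the covariate takes the value x. In the absence of left-censored covariates, the above likelihood function L(β) reduces to the ordinary likelihood L0(β), as defined in (1), up to the factor ∏ g(xi), which does not involve β. From (2), the log-likelihood function is obtained as
$$l(\beta) = \sum_{i=1}^{n} r_i\left[\delta_i \log f_i(y_i \mid x_i) + (1-\delta_i)\log S_i(y_i \mid x_i) + \log g(x_i)\right] + \sum_{i=1}^{n} (1-r_i)\log \int_{-\infty}^{c} \{f_i(y_i \mid x)\}^{\delta_i}\{S_i(y_i \mid x)\}^{1-\delta_i} g(x)\,dx. \qquad (3)$$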
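The detection indicator Ri and the detection probability πi = P(Xi ≥ c) used in this construction can be illustrated with a short sketch. It assumes a normally distributed covariate, which is the distribution the article adopts for the censored covariate; the function names and numerical values are hypothetical.

```python
import math

def normal_cdf(z, mean=0.0, sd=1.0):
    """Normal cumulative distribution function via the error function."""
    return 0.5 * (1.0 + math.erf((z - mean) / (sd * math.sqrt(2.0))))

def detection_probability(c, mean, sd):
    """pi = P(X >= c) for a normally distributed covariate X."""
    return 1.0 - normal_cdf(c, mean, sd)

def detection_indicator(x, c):
    """R = 1 if the covariate value is observed (x >= c), 0 for a nondetect."""
    return 1 if x >= c else 0

# Hypothetical setting: X ~ N(5, 1) with detection limit c = 5, so half of
# the covariate values are expected to fall below the limit.
print(detection_probability(5.0, 5.0, 1.0))
print([detection_indicator(v, 5.0) for v in (4.2, 5.0, 6.3)])
```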
Note that the log-likelihood function (3) cannot be written in closed form, so numerical methods are needed to evaluate the integral with respect to the covariate. Here we evaluate this integral using Simpson’s 1/3 rule of numerical integration, which produces a numerical value for the integral of a function over an interval. Suppose that an interval [a,b] is divided into k subintervals, with k an even number. Then the composite Simpson’s rule for a function q(z) is defined by [15]
$$\int_a^b q(z)\,dz \approx \frac{h}{3}\left[q(z_0) + 4\sum_{j=1}^{k/2} q(z_{2j-1}) + 2\sum_{j=1}^{k/2-1} q(z_{2j}) + q(z_k)\right],$$
where zj = a + jh for j = 0,1,…,k, with h = (b-a)/k. The error term associated with the composite Simpson’s rule is bounded (in absolute value) by
$$\frac{h^4}{180}(b-a)\max_{\xi\in[a,b]}\left|q^{(4)}(\xi)\right|.$$
Differentiating l(β) with respect to β gives the score equations U(β) = (∂ / ∂β) l(β) = 0. The maximum likelihood estimators of the model parameters β can be obtained by solving these score equations numerically using an iterative method, or by directly maximizing the log-likelihood function (3) using a numerical optimization technique, which is discussed further in the next section.
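The composite Simpson's rule described above can be sketched in a few lines. This is a generic illustration rather than the authors' implementation; the function name `composite_simpson` is hypothetical.

```python
import math

def composite_simpson(func, a, b, k):
    """Approximate the integral of func over [a, b] using k (even) subintervals."""
    if k <= 0 or k % 2 != 0:
        raise ValueError("k must be a positive even integer")
    h = (b - a) / k
    total = func(a) + func(b)          # end points carry weight 1
    for j in range(1, k):
        weight = 4.0 if j % 2 == 1 else 2.0   # odd nodes 4, even interior nodes 2
        total += weight * func(a + j * h)
    return total * h / 3.0

# The integral of sin over [0, pi] equals 2 exactly.
approx = composite_simpson(math.sin, 0.0, math.pi, 20)
print(approx)
```

Because the error bound shrinks with the fourth power of h, doubling k reduces the bound by a factor of 16, so a moderate number of subintervals is usually enough for a smooth integrand.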
Standard maximum likelihood theory gives E{U(β)} = 0. The observed Fisher information I(β) is the negative of the p×p Hessian matrix of the log-likelihood, so that
$$I(\beta) = -\frac{\partial^2 l(\beta)}{\partial\beta\,\partial\beta'}.$$
For the exponential family, the expected Fisher information matrix is $\mathcal{I}(\beta) = E\{I(\beta)\}$. Under appropriate regularity conditions, the maximum likelihood estimators follow an approximate normal distribution for a large sample size n:
$$\hat{\beta} \;\overset{\cdot}{\sim}\; N\{\beta,\, I^{-1}(\hat{\beta})\}.$$
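The normal approximation above is what yields standard errors and confidence intervals in practice. The sketch below illustrates the idea for a one-parameter log-likelihood, approximating the observed information by a central finite difference. The example model (a constant hazard exp(θ) with d events over total follow-up T), the function names, and the step size are assumptions made for illustration only.

```python
import math

def observed_information(loglik, theta, eps=1e-4):
    """Negative second derivative of a one-parameter log-likelihood,
    approximated by a central finite difference."""
    second = (loglik(theta + eps) - 2.0 * loglik(theta) + loglik(theta - eps)) / eps ** 2
    return -second

def wald_interval(theta_hat, info, z=1.96):
    """Approximate 95% interval: theta_hat +/- z / sqrt(information)."""
    se = 1.0 / math.sqrt(info)
    return theta_hat - z * se, theta_hat + z * se

# Hypothetical data summary: d events over total follow-up time T under a
# constant hazard exp(theta), so that l(theta) = d * theta - T * exp(theta).
d, T = 40, 200.0
loglik = lambda theta: d * theta - T * math.exp(theta)
theta_hat = math.log(d / T)            # closed-form maximizer of l(theta)
info = observed_information(loglik, theta_hat)
print(theta_hat, info, wald_interval(theta_hat, info))
```

In this toy model the observed information at the maximizer equals d analytically, which gives a simple check on the finite-difference approximation.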
Simulation study
To examine the performance of the proposed method, we conducted a simulation study. In this study, we compared our proposed method, based on the log-likelihood function (3), with the naïve method, which estimates the model parameters after replacing the left-censored covariate values with the LOD, under a number of different scenarios. We refer to these two methods of analysis as the “corrected” and the “naïve” approach, respectively. In each scenario, we consider a Weibull proportional hazards model. Under this model, the hazard of death at time t for the ith subject is [16]
$$h_i(t) = \lambda\gamma t^{\gamma-1}\exp(\beta_1 x_i), \qquad (4)$$
where λ and γ are the scale and shape parameters of the Weibull distribution, respectively. The survivor function corresponding to the hazard function (4) is
$$S_i(t) = \exp\{-\lambda t^{\gamma}\exp(\beta_1 x_i)\}.$$
For simplicity, we set the shape parameter γ = 1. In this setting, the hazard function (4) can be written in the form $h_i(t) = \exp(\tilde{x}_i'\beta)$, where $\tilde{x}_i = [1, x_i]'$ and β = [β0, β1]′ with β0 = log(λ). The values of the covariate X were generated from the normal distribution with mean 5.0 and a standard deviation that varied across scenarios. The true values of the intercept (β0) and slope (β1) were set to -2.0 and -0.2, respectively. The right-censored survival times were generated from the Weibull distribution by setting λ = exp(intercept + 50). If the survival time is less than the right-censoring time, then the event is observed. Otherwise, the survival time is right-censored. The values that differed for each scenario were the sample size (N ∈ {500,1000}), the standard deviation of the covariate (SD(X) ∈ {0.5,1.0,2.0}) and the percentage of covariate values that were left-censored (1-π ∈ {10%,20%,50%}). To generate the various percentages of left-censored covariate values, we set LOD = 5 + SD(X)Ф⁻¹(1-π), where Ф is the standard normal cumulative distribution function. If a generated value of the covariate X is less than the LOD, then the LOD is recorded. The statistical software R [13] was used for the computation. In particular, to maximize the likelihood function derived under the above Weibull proportional hazards model, we used the L-BFGS-B method [14], available through the R function optim. This method uses function values and gradients to build up a picture of the surface to be optimized. For the naïve approach we used the survival package in R.
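The simulation setting can be sketched as follows. This is not the authors' R script: it is a Python illustration under stated assumptions. The censoring times are drawn from an exponential distribution with a hypothetical rate `cens_rate`, the lower integration limit is truncated at eight standard deviations below the covariate mean as a finite stand-in for minus infinity, and no optimizer is included, so the two log-likelihoods are only evaluated at the true parameter values.

```python
import math
import random
from statistics import NormalDist

MEAN_X = 5.0  # covariate mean used in the simulation study

def simpson(func, a, b, k=200):
    """Composite Simpson's rule on [a, b] with k (even) subintervals."""
    h = (b - a) / k
    total = func(a) + func(b)
    for j in range(1, k):
        total += (4.0 if j % 2 else 2.0) * func(a + j * h)
    return total * h / 3.0

def simulate(n, beta0, beta1, sd_x, cens_frac, cens_rate, rng):
    """Return records (y, delta, recorded_x, r) and the LOD for one data set."""
    lod = MEAN_X + sd_x * NormalDist().inv_cdf(cens_frac)  # cens_frac = 1 - pi
    data = []
    for _ in range(n):
        x = rng.gauss(MEAN_X, sd_x)
        t = rng.expovariate(math.exp(beta0 + beta1 * x))   # event time, shape = 1
        c_time = rng.expovariate(cens_rate)                # assumed censoring law
        y, delta = (t, 1) if t < c_time else (c_time, 0)
        r = 1 if x >= lod else 0
        data.append((y, delta, x if r else lod, r))
    return data, lod

def log_surv_term(beta0, beta1, y, delta, x):
    """log of f(y|x)^delta * S(y|x)^(1-delta) when h(t) = exp(beta0 + beta1*x)."""
    log_h = beta0 + beta1 * x
    return delta * log_h - math.exp(log_h) * y

def naive_loglik(beta0, beta1, data):
    """Treats the recorded LOD as if it were the true covariate value."""
    return sum(log_surv_term(beta0, beta1, y, d, x) for y, d, x, _ in data)

def corrected_loglik(beta0, beta1, data, lod, sd_x):
    """Observed-data log-likelihood with the nondetects integrated out."""
    g = NormalDist(MEAN_X, sd_x)
    lower = MEAN_X - 8.0 * sd_x  # assumed finite stand-in for minus infinity
    total = 0.0
    for y, d, x, r in data:
        if r:
            total += log_surv_term(beta0, beta1, y, d, x) + math.log(g.pdf(x))
        else:
            integrand = lambda u: math.exp(log_surv_term(beta0, beta1, y, d, u)) * g.pdf(u)
            total += math.log(simpson(integrand, lower, lod))
    return total

rng = random.Random(2012)
data, lod = simulate(500, -2.0, -0.2, 1.0, 0.20, 0.05, rng)
print(lod, naive_loglik(-2.0, -0.2, data), corrected_loglik(-2.0, -0.2, data, lod, 1.0))
```

To obtain estimates, either function would be passed to a numerical optimizer. The two log-likelihoods are not on the same scale, because the corrected version also includes the covariate density, so only their maximizers are comparable, not their values.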
The simulation results are presented in Table 1. As expected, with no LOD (i.e. 1-π = 0%) the naïve approach and the corrected approach are identical. As the proportion of censored values increased, the bias in the estimates from the naïve approach also increased. The bias in the estimates from the naïve approach was also considerably higher when the standard deviation of X was higher. When the standard deviation of the covariate was 2.0, with a sample size of 1000 and 50% of observations left-censored, the estimated coverage rate of the 95% confidence intervals for both the slope and the intercept was less than 22% for the naïve approach. In contrast, the corrected approach produced results with very small bias, smaller mean square error, and approximately correct coverage for most scenarios. When the standard deviation of the covariate was 2.0, the corrected approach had a slightly low coverage rate for a sample size of 500, but still much better coverage than the naïve approach. Thus the proposed approach is approximately unbiased and achieves good coverage rates in most of the scenarios.
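The performance measures reported for each scenario (bias, mean square error, and coverage) can be computed from the replicate estimates along the following lines. This is a minimal sketch: the function name `summarize` and the four replicate values are hypothetical, and Wald-type 95% intervals are assumed.

```python
def summarize(estimates, std_errors, truth, z=1.96):
    """Bias, mean squared error, and Wald-interval coverage across replicates."""
    m = len(estimates)
    bias = sum(estimates) / m - truth
    mse = sum((e - truth) ** 2 for e in estimates) / m
    covered = sum(1 for e, s in zip(estimates, std_errors)
                  if e - z * s <= truth <= e + z * s)
    return {"bias": bias, "mse": mse, "coverage": covered / m}

# Hypothetical slope estimates and standard errors from four replicates,
# evaluated against the true slope of -0.2 used in the simulation study.
print(summarize([-0.21, -0.18, -0.25, -0.20], [0.03, 0.03, 0.02, 0.03], truth=-0.2))
```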
Illustrative example
Severe sepsis is the systemic inflammatory response to infection complicated by organ dysfunction. Community-Acquired Pneumonia (CAP) is the leading cause of severe sepsis. The Genetic and Inflammatory Markers of Sepsis (GenIMS) study, a large, multicenter cohort study of patients with CAP, was conducted to understand the pattern of systemic cytokine response to infection and to determine whether specific patterns were associated with severe sepsis and death [17]. A total of 2320 patients with CAP presenting to the emergency departments of 28 hospitals in Pennsylvania, Connecticut, Michigan, and Tennessee were enrolled in the study between December 2001 and November 2003. GenIMS included patients aged ≥ 18 years with a clinical and radiologic diagnosis of pneumonia. After enrollment, detailed baseline and clinical information was gathered, and blood was drawn for cytokine assays immediately following enrollment and daily throughout the first seven days of hospitalization. The primary outcome variables in the GenIMS study were severe sepsis and 90-day mortality. The markers of greatest interest in the GenIMS study were the pro-inflammatory marker Interleukin-6 (IL6) and the anti-inflammatory marker Interleukin-10 (IL10). More information regarding the study population, outcomes, treatment, and covariates can be found in Kellum et al. [17].
In this illustration, we consider the association between 90-day mortality and the baseline (Day 1) IL10 biomarker data. Blood was drawn for a cytokine assay from 1429 subjects. For logistic reasons, blood was not drawn from patients who presented to the emergency department after 11 pm or on weekends or holidays. Note that some biomarker data are intermittently missing for administrative reasons, and we assume that these data are missing completely at random. A detailed breakdown of the study subjects can be found in the above-mentioned reference. In this analysis, we have a total of 867 subjects with IL10 measurements at baseline. However, 47.87 percent of these IL10 measurements were left-censored at the lower limit of detection because of the inadequate sensitivity of the cytokine assay.
Table 2 reports the descriptive statistics of the covariates that we consider in this analysis. The results are based on the baseline (Day 1) demographic and clinical characteristics. Table 2 indicates that the patients who died during the first 90 days after hospitalization for CAP were mostly male and older. Higher proportions of these patients had been treated with steroids, and their D-dimer and IL10 levels were higher.
Table 3 summarizes the results from the GenIMS data analyses. To examine the impact of left-censoring in a real study, we fitted the corrected and naïve models described in the simulation section. The naïve survival model is a parametric Weibull survival model in which nondetect values are replaced by the LOD. The corrected survival model is our proposed model, fitted using Simpson’s numerical integration technique to handle the left-censoring of IL10. The model includes the anti-inflammatory biomarker IL10, age, gender, steroid use, and the coagulation marker D-dimer. We applied a logarithmic transformation to the skewed continuous variables (IL10 and D-dimer) and rescaled the age variable (age ÷ 10) so that the estimates are more stable and easier to interpret. The estimates from the two models are different. The corrected model Hazard Ratio (HR) estimate for the covariate IL10 is smaller than the corresponding naïve model HR estimate. The proposed model HR estimate for IL10 is also more efficient than that of the naïve model. The 95% CIs of the HR estimate for IL10 obtained from the naïve and corrected models are [1.108, 1.539] and [1.111, 1.432], respectively. These results suggest that the naïve use of the detection limit as a substitute for an undetected value can lead to a different estimate and interpretation of the risk factors. Our simulation results have shown that there are situations where the difference between the two approaches is even larger than in our real data example.
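The two reported intervals can be compared on the log-hazard scale with a little arithmetic. The sketch below assumes that both are Wald-type 95% intervals that are symmetric on the log-HR scale; the article does not state this, so the implied point estimates and standard errors are approximations rather than reported results. The function name is hypothetical.

```python
import math

def log_scale_summary(lower, upper, z=1.96):
    """Point estimate and standard error implied by a Wald interval for a hazard ratio."""
    log_lo, log_hi = math.log(lower), math.log(upper)
    log_hr = (log_lo + log_hi) / 2.0        # midpoint on the log scale
    se = (log_hi - log_lo) / (2.0 * z)      # half-width divided by z
    return math.exp(log_hr), se

naive_hr, naive_se = log_scale_summary(1.108, 1.539)
corrected_hr, corrected_se = log_scale_summary(1.111, 1.432)
print("naive:", naive_hr, naive_se)
print("corrected:", corrected_hr, corrected_se)
```

Under that assumption, the corrected interval implies both a smaller hazard ratio and a smaller standard error than the naïve interval, which matches the comparison described in the text.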
Discussion
A censored covariate is a challenge for statistical analysis. We considered left-censoring of a covariate and examined its impact in a parametric survival model. There are several ad-hoc methods for dealing with the limit of detection problem of a covariate in a regression model framework. These methods provide biased and inefficient parameter estimates. In this paper we proposed a method for correcting the bias and making efficient parametric survival inference when there is a left-censored covariate. Our proposed likelihood method is based on Simpson’s numerical integration technique. Because the data involve both a right-censored time-to-event outcome and a left-censored covariate, the likelihood function becomes complicated. From this likelihood function, we have integrated out the impact of left-censoring. The Monte Carlo simulation study shows that the performance of the proposed model is comparable to that of the standard survival model when there is no left-censoring. We have also applied the proposed method to a large cohort data set, and found that its results differ from those of the ad-hoc method.
For the situation in which a covariate is subject to left-censoring, this paper compares a new method for analyzing survival data with a commonly used naïve method that replaces the censored values by the limit of detection. We have demonstrated that the naïve method provides biased and inefficient regression parameter estimates with low coverage probabilities. In contrast, our proposed likelihood method based on a numerical integration technique provides approximately unbiased and efficient parameter estimates, and achieves good coverage probabilities in most of the scenarios. The proposed method is relatively simple to understand and easy to implement in standard statistical software.
We have implemented our proposed method for only one covariate with a limit of detection. We expect that the method can be extended, with some additional computational burden, to more than one covariate with a limit of detection. A limitation of this study is that we assumed a normal distribution for the censored covariate and derived the likelihood function accordingly, and we did not investigate robustness to misspecification of the normality assumption in the simulation. We also did not examine the impact of changing the shape parameter value of the Weibull distribution in our simulation. We are working on another manuscript in which we intend to relax the assumption of normality and perform a sensitivity analysis. In summary, when a covariate of a parametric survival model is subject to a limit of detection, the naïve estimates are biased and inefficient. Our proposed likelihood-based method using numerical integration provides approximately unbiased and efficient parameter estimates. Therefore, the proposed method is an encouraging one to use when a covariate is subject to a limit of detection. The statistical analysis was performed using R software version 2.15.0. The R script can be obtained upon request from the corresponding author.
Acknowledgements
We thank Dr. Derek Angus and the CRISMA laboratory for access to the GenIMS data. We are indebted to the nurses, respiratory therapists, phlebotomists, physicians, and other health-care professionals, as well as the patients and their families who supported this trial. A complete list of GenIMS investigators is available at investigators. The GenIMS study was funded via grant R01 GM61992 by the National Institute of General Medical Sciences. Sanjoy Sinha is grateful for the support provided by a grant from the Natural Sciences and Engineering Research Council of Canada.
