^{1}Department of Statistics, University of Peshawar, Pakistan

^{2}Department of Medical Statistics, Informatics and Health Economics, Innsbruck Medical University, Austria

- *Corresponding Author:
- Qamruz Zaman

Department of Statistics

University of Peshawar, Pakistan

**E-mail:**[email protected]

**Received date:** June 09, 2011; **Accepted date:** August 09, 2011; **Published date:** September 25, 2011

**Citation:** Zaman Q, Strasak AM, Pfeiffer KP (2011) Exact Waiting Time Survival Function. J Biomet Biostat 2:117. doi: 10.4172/2155-6180.1000117

**Copyright:** © 2011 Zaman Q, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

**Visit for more related articles at** Journal of Biometrics & Biostatistics

Although the Kaplan-Meier survival function is the most commonly used technique in survival analysis, it has a drawback: it may give the same survival probabilities for two groups with the same number of events and censored observations, even though the time spans between consecutive events (i.e. waiting times) may vary considerably. The severity of a disease, in terms of survival times, therefore plays no role in the conventional Kaplan-Meier approach. To overcome this problem, we propose an exact waiting time survival function that explicitly considers the waiting times between events. We also present a new variance estimator, which reduces to the binomial variance when the data are free from censoring and the time differences between consecutive events equal 1. To compare the performance of the new estimator with the conventional Kaplan-Meier estimator for small to large sample sizes and for light to heavy censoring, we conducted a simulation study. On average, the Pitman Closeness Criterion favours the new estimator, and its confidence intervals have higher coverage rates than those obtained with the Kaplan-Meier estimator, especially for the lower confidence limits. Furthermore, its confidence intervals are narrower than those based on Kaplan-Meier and the Greenwood standard error. The proposed procedures are applied to a lung cancer data set.

Kaplan-Meier survival function; Exact waiting time; Pitman closeness criteria; Confidence intervals; Greenwood variance estimator

One of the basic purposes of survival analysis is the estimation of survival curves from censored survival data. In the absence of covariates, the non-parametric maximum likelihood estimator, i.e. the Kaplan-Meier (KM) survival function [1], is commonly used to estimate survival probabilities. However, the Kaplan-Meier survival function gives the same survival probabilities for two groups with the same number of events and censored observations, because it ignores the time spans (i.e. waiting times) between consecutive events.

To illustrate this, consider two groups X and Y, each with n observations, of which m are events and n-m are censored. Suppose the sequence of events and censored observations is the same in both groups but the discrete observed times differ: one data set covers the time range t_{1}, t_{2}, …, t_{p} and the other the time range t_{1}, t_{2}, …, t_{q}, where q > p. Applying the Kaplan-Meier survival function in this setting, we obtain the same results for both groups, regardless of the severity of the disease. The following dummy example makes this idea more concrete.

Suppose we have data sets from two groups (G_{I}, G_{II}) with different diseases. Both groups have the same sequence of events and censored observations; only the times of occurrence differ. We apply the Kaplan-Meier survival function to both data sets (**Table 1**). Although the observed event and censoring times clearly differ between the groups, the probability columns are identical, which reveals a serious flaw of the Kaplan-Meier survival function.
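The flaw can be reproduced with a short sketch. Since the Table 1 values are not reproduced in this text, the two data sets below are hypothetical: they share the same sequence of events and censorings, but group II has far longer gaps between observations. Because Kaplan-Meier uses the observed times only through their ordering (via n_i and r_i), both groups receive identical probabilities. `kaplan_meier` is a hypothetical helper name.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate at each distinct event time.

    times  : observed times T_i
    events : 1 for an event, 0 for a censored observation
    Returns a list of (t, S(t)).
    """
    data = list(zip(times, events))
    s, out = 1.0, []
    for t in sorted({u for u, e in data if e}):
        n_i = sum(1 for u, _ in data if u >= t)        # at risk just before t
        r_i = sum(1 for u, e in data if u == t and e)  # events at t
        s *= 1 - r_i / n_i
        out.append((t, s))
    return out


# Hypothetical data: identical order of events (1) and censorings (0),
# but group II has much longer gaps between observations.
status = [1, 1, 0, 1, 1, 0]
group_1 = kaplan_meier([2, 3, 4, 5, 6, 7], status)
group_2 = kaplan_meier([2, 8, 11, 19, 25, 40], status)

print([round(s, 3) for _, s in group_1])  # [0.833, 0.667, 0.444, 0.222]
print([round(s, 3) for _, s in group_2])  # [0.833, 0.667, 0.444, 0.222]
```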

To overcome this problem, in this paper we present an exact waiting time survival function (EKM), based on discrete survival times, together with a variance estimator based on exact waiting times. Although the concept of discrete survival times is not entirely new [2-5], until now no attention has been given to the differing times of occurrence of events, i.e. the waiting times between two consecutive events. We conducted three simulation studies to compare (1) the performance of the Kaplan-Meier estimator and the exact waiting time estimator using the Pitman Closeness Criterion [6-8], (2) the coverage rates of their lower confidence limits and (3) the widths of their confidence intervals. Finally, we applied these methods to a lung cancer data set.

**Kaplan-Meier estimator**

Let X_{1}, X_{2}, …, X_{n} be the true survival times of n individuals with distribution function F(x) = P(X ≤ x), and let Y_{1}, Y_{2}, …, Y_{n} be the censoring times with distribution function G(y) = P(Y ≤ y). Further, assume that the survival times X_{i} and censoring times Y_{i} are independent. Let T_{i} = min{X_{i}, Y_{i}} be the observed survival time and δ_{i} = I(X_{i} ≤ Y_{i}) the indicator of whether the observed time is an event (δ_{i} = 1) or censored (δ_{i} = 0). The Kaplan-Meier product-limit estimator is defined by

$$\hat{S}_{\mathrm{KM}}(t) = \prod_{T_i \le t} \left(1 - \frac{r_i}{n_i}\right) \qquad (1)$$

where n_{i} is the number of individuals alive just before time T_{i}, including those who are about to die at this time, and r_{i} denotes the number dying at this time. Kaplan and Meier used the Greenwood [9] variance estimator for their survival function

$$\widehat{\operatorname{Var}}\left[\hat{S}_{\mathrm{KM}}(t)\right] = \hat{S}_{\mathrm{KM}}(t)^{2} \sum_{T_i \le t} \frac{r_i}{n_i\,(n_i - r_i)} \qquad (2)$$

Like the Kaplan-Meier survival function, the Greenwood variance estimator does not consider the waiting time between events.
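Equations (1) and (2) translate directly into code. The following is a minimal Python sketch, not a reference implementation: `km_greenwood` is a hypothetical helper name, times are assumed to be discrete, and the Greenwood sum is reported as NaN once everyone at risk has died, where the variance is undefined.

```python
import math


def km_greenwood(times, events):
    """Kaplan-Meier (1) and Greenwood variance (2) at each distinct event time.

    times  : observed times T_i = min(X_i, Y_i)
    events : delta_i, 1 for an event and 0 for a censored time
    Returns a list of (t, S(t), Var[S(t)]).
    """
    data = list(zip(times, events))
    s, acc, out = 1.0, 0.0, []
    for t in sorted({u for u, e in data if e}):
        n_i = sum(1 for u, _ in data if u >= t)          # at risk just before t
        r_i = sum(1 for u, e in data if u == t and e)    # deaths at t
        s *= 1 - r_i / n_i
        # Greenwood sum of r_i / (n_i (n_i - r_i)); NaN once all at risk have died
        acc = acc + r_i / (n_i * (n_i - r_i)) if n_i > r_i else math.nan
        out.append((t, s, s * s * acc))
    return out


times = [3, 5, 5, 8, 10, 12, 15, 18]
events = [1, 1, 0, 1, 0, 1, 1, 0]
for t, s, v in km_greenwood(times, events):
    print(t, round(s, 3), round(math.sqrt(v), 3))   # time, S(t), Greenwood SE
```

For uncensored data the Greenwood variance coincides with the binomial variance S(1 - S)/n, which gives a quick sanity check of such an implementation.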

**Exact waiting time survival function**

Because the waiting times between two consecutive events receive no attention, the diseases are implicitly assumed to be equally severe. If the waiting time between two events of disease A is always shorter than that between two events of disease B, disease A should be considered more severe than B, and its survival probability should therefore be lower than that of disease B. However, the conventional Kaplan-Meier survival function erroneously gives the same survival probabilities for both diseases. To overcome this problem, we developed an exact waiting time survival function, assuming the following sequence of observed survival and censored times:

Starting time → Event → Event → End of the study

Let Δ_{i} denote the waiting time between two consecutive events, with i = 2, 3, …, n and Δ_{1} = 1. Thus Δ_{2} is the waiting time between the first and second events, Δ_{3} is the time difference between the second and third events, and so on, with Δ_{n} denoting the time difference between the (n-1)^{th} and n^{th} events if the data are free from censoring. If the data contain censored observations, the number of differences is smaller than the number of observed times. Δ depends on the occurrence of events, and the Kaplan-Meier survival function gives the probabilities of these events from the number of events and the number of persons at risk. Since events are central to survival analysis and Δ depends on them, Δ is also an essential quantity in survival analysis. Moreover, Δ also affects survival probabilities, since both the number of persons at risk and Δ (which is not always 1) change over time. As Δ_{i} and n_{i} act in the same direction (a large Δ_{i} (> 1) and a large n_{i} both result in a higher survival probability), their product n_{i}Δ_{i} leads to corresponding results in terms of survival probabilities, but carries much more information about the nature and severity of a disease. So, instead of n_{i}, we use n_{i}Δ_{i} in our exact waiting time survival function. The time difference between two events is therefore directly related to the survival probabilities as well as to the nature of the disease: a larger Δ results in a higher survival probability (indicating a less severe disease), and vice versa. This concept is not incorporated in the conventional Kaplan-Meier survival function. However, if the difference between two consecutive events is 1, n_{i}Δ_{i} reduces to n_{i} and gives the same survival probability at time i as the Kaplan-Meier survival function.
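The waiting times can be computed from the sorted distinct event times. This is a minimal sketch, assuming (our reading, not stated explicitly in the text) that censored times do not split a waiting time, so each Δ_i is the gap back to the previous event time, with Δ_1 = 1 by definition; `waiting_times` is a hypothetical helper name.

```python
def waiting_times(times, events):
    """Waiting times between consecutive distinct event times.

    Assumption: censored times are skipped, so each gap is measured back to
    the previous *event* time; the first waiting time is 1 by definition.
    Returns a list of (event time, waiting time).
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    deltas = [1] + [b - a for a, b in zip(event_times, event_times[1:])]
    return list(zip(event_times, deltas))


times = [1, 3, 4, 6, 9]
events = [1, 1, 0, 1, 1]   # time 4 is censored
print(waiting_times(times, events))   # [(1, 1), (3, 2), (6, 3), (9, 3)]
```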

The basic layout of the procedure introduced above, for discrete survival times, is shown in (**Table 2**).

Using the product-limit procedure, the exact waiting time survival function is given by

$$\hat{S}_{\mathrm{EKM}}(t) = \prod_{T_i \le t} \left(1 - \frac{r_i}{n_i\,\Delta_i}\right) \qquad (3)$$

Like the Kaplan-Meier survival function, our exact waiting time survival function starts at 1 at time zero, and it approaches the lower limit 0 if the last observed time is an event and the last observed difference equals Δ = 1; otherwise, it remains greater than 0. Unlike the conventional range of a survival function, the range of our proposed estimator is defined by

If the n^{th} difference Δ_{n} of our proposed estimator is 1, the range reduces to that of the Kaplan-Meier survival function. However, if this difference is greater than 1, the lower limit of our proposed estimator is always greater than 0. In this case, our estimator takes the form of a shrunken estimator. Shrinkage has been found beneficial for improving confidence interval coverage rates, and the concept is not new in survival analysis: for example, Parzen [10] proposed a 'shifted piecewise linear' empirical quantile function, and Borkowf [11] proposed a shrunken survival estimator with the range [(2n)^{-1}, 1-(2n)^{-1}]. Because of this behaviour, the lower confidence limits of our new estimator have higher coverage rates than those of Kaplan-Meier.
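Equation (3) differs from (1) only in replacing n_i by n_iΔ_i, so both estimators can be computed in one pass. The sketch below (hypothetical helper `km_and_ekm`) assumes each Δ_i is the gap to the previous event time, with Δ_1 = 1, which is our reading of how censored times are handled. It illustrates the two range properties described above: with unit gaps EKM coincides with KM and reaches 0, whereas a final gap larger than 1 keeps EKM above 0.

```python
def km_and_ekm(times, events):
    """Kaplan-Meier (1) and exact waiting time (3) estimates side by side.

    Returns a list of (t, delta, S_KM(t), S_EKM(t)) at each distinct event time.
    The EKM factor replaces n_i by n_i * delta_i, with delta_1 = 1.
    """
    data = list(zip(times, events))
    km = ekm = 1.0
    prev, out = None, []
    for t in sorted({u for u, e in data if e}):
        delta = 1 if prev is None else t - prev        # gap to previous event
        n_i = sum(1 for u, _ in data if u >= t)        # at risk just before t
        r_i = sum(1 for u, e in data if u == t and e)  # events at t
        km *= 1 - r_i / n_i
        ekm *= 1 - r_i / (n_i * delta)
        out.append((t, delta, km, ekm))
        prev = t
    return out


# All gaps equal to 1: EKM coincides with KM and both reach 0.
unit_gaps = km_and_ekm([1, 2, 3, 4, 5], [1] * 5)
# Last gap of 5 time units: KM still reaches 0, EKM stays above 0.
long_gap = km_and_ekm([1, 2, 3, 4, 9], [1] * 5)
print(unit_gaps[-1])
print(long_gap[-1])
```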

**Variance of exact waiting time survival function**

The variance estimator for our new procedure is obtained by using the delta method. By definition

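$$\hat{S}_{\mathrm{EKM}}(t) = \prod_{T_i \le t} \hat{p}_i, \qquad \hat{p}_i = 1 - \frac{r_i}{n_i\,\Delta_i} \qquad (4)$$

Taking logs on both sides,

$$\log \hat{S}_{\mathrm{EKM}}(t) = \sum_{T_i \le t} \log \hat{p}_i \qquad (5)$$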

By applying the delta method, we derive

and so

(6)

Now, applying the delta method a second time, to the logarithm of the estimator, we get

Simplifying, we obtain our new variance estimator

(7)

Unlike Greenwood's and Peto's [12] variance estimators, our new variance estimator uses the exact waiting time between two consecutive events. However, if the data are free from censoring and all time differences between consecutive events equal 1, the new variance reduces to the binomial variance.
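The stated reduction to the binomial variance can be checked numerically. Because equation (7) itself is not reproduced in this text, the sketch below assumes it has the Greenwood form obtained from the two delta-method steps, with the effective risk set m_i = n_iΔ_i in place of n_i. This is one form consistent with the derivation and with the binomial property, not necessarily the authors' exact formula; `ekm_variance` is a hypothetical name.

```python
import math


def ekm_variance(times, events):
    """EKM (3) with an ASSUMED Greenwood-type delta-method variance.

    Assumed form, with m_i = n_i * delta_i:
        Var[S_EKM(t)] ~= S_EKM(t)^2 * sum r_i / (m_i (m_i - r_i))
    Returns (t, S_EKM(t), Var) at each distinct event time; Var is NaN
    once m_i == r_i (the estimate has reached 0).
    """
    data = list(zip(times, events))
    s, acc, prev, out = 1.0, 0.0, None, []
    for t in sorted({u for u, e in data if e}):
        delta = 1 if prev is None else t - prev        # delta_1 = 1
        n_i = sum(1 for u, _ in data if u >= t)
        r_i = sum(1 for u, e in data if u == t and e)
        m_i = n_i * delta                              # effective risk set
        s *= 1 - r_i / m_i
        acc = acc + r_i / (m_i * (m_i - r_i)) if m_i > r_i else math.nan
        out.append((t, s, s * s * acc))
        prev = t
    return out


# No censoring and unit gaps: the variance should equal S(1 - S) / n.
n_obs = 10
for t, s, v in ekm_variance(list(range(1, n_obs + 1)), [1] * n_obs):
    if s > 0:
        print(t, round(s, 3), round(v, 5), round(s * (1 - s) / n_obs, 5))
```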

**Simulation study**

To compare the performance of our exact waiting time survival function with that of the Kaplan-Meier survival function, we designed three simulation studies. First, we compared the two estimators by the Pitman Closeness Criterion; second, we compared their confidence interval coverage rates; and third, we compared the widths of their confidence intervals.

**Comparison of the exact waiting time survival function and the Kaplan-Meier survival function by the Pitman Closeness Criterion**

The Pitman Closeness Criterion [6] for the two estimators at a selected point t is defined as

$$\mathrm{PCC} = P\left\{\left|\hat{S}_{\mathrm{EKM}}(t) - d_t\right| \le \left|\hat{S}_{\mathrm{KM}}(t) - d_t\right|\right\}$$

where the d_{t}'s denote the deciles of the distribution at t. If PCC > 0.5, the absolute error of the exact waiting time survival function tends to be smaller than that of the Kaplan-Meier survival function. We used the uniform distribution U(0, b) to generate censoring times. As survival distributions, we chose (1) the exponential, (2) the Weibull (with shape parameters 1.5 and 0.5, i.e. monotone increasing and decreasing hazard rates) and (3) the log-logistic (with different shape parameters giving monotone increasing, constant and decreasing hazard rates) distributions.

For the comparison, we chose points corresponding to fixed deciles d_{j} (d_{1} = 0.1, d_{3} = 0.3, d_{5} = 0.5, d_{7} = 0.7, d_{9} = 0.9) and generated 5000 data sets of survival times for each of several sample sizes (n = 15, 50, 100). For each data set of survival times, the censoring times were generated from the uniform distribution, and the observed times were obtained as the minimum of the survival and censoring times. To obtain discrete times, we retained zero decimal places. The Pitman Closeness Criterion was then calculated for each simulation setting.
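A single cell of this simulation can be sketched in plain Python. The sketch makes several assumptions that are not taken from the text: exponential survival with rate 0.1, censoring from U(0, 40), evaluation at the point t_d where the true survival S(t_d) equals the decile d (our reading of d_t), and integer times obtained with Python's `round`. Because KM and EKM coincide whenever all gaps up to t are 1, ties are frequent, so the function returns the criterion both with ≤ (as reconstructed above) and with strict <. `km_ekm_at` and `pitman_closeness` are hypothetical names.

```python
import math
import random


def km_ekm_at(times, events, t_eval):
    """KM (1) and EKM (3) step functions evaluated at time t_eval."""
    data = list(zip(times, events))
    km = ekm = 1.0
    prev = None
    for t in sorted({u for u, e in data if e}):
        if t > t_eval:
            break
        delta = 1 if prev is None else t - prev       # delta_1 = 1
        n_i = sum(1 for u, _ in data if u >= t)
        r_i = sum(1 for u, e in data if u == t and e)
        km *= 1 - r_i / n_i
        ekm *= 1 - r_i / (n_i * delta)
        prev = t
    return km, ekm


def pitman_closeness(n=50, d=0.5, rate=0.1, b=40.0, reps=5000, seed=1):
    """Monte Carlo PCC with exponential survival and U(0, b) censoring.

    The estimators are evaluated at t_d, where the true S(t_d) = d.
    Returns (share with |EKM - d| <= |KM - d|, share with strict <).
    """
    rng = random.Random(seed)
    t_d = -math.log(d) / rate
    le = lt = 0
    for _ in range(reps):
        x = [rng.expovariate(rate) for _ in range(n)]         # survival times
        y = [rng.uniform(0, b) for _ in range(n)]             # censoring times
        times = [round(min(xi, yi)) for xi, yi in zip(x, y)]  # integer T_i
        events = [int(xi <= yi) for xi, yi in zip(x, y)]      # delta_i
        km, ekm = km_ekm_at(times, events, t_d)
        le += abs(ekm - d) <= abs(km - d)
        lt += abs(ekm - d) < abs(km - d)
    return le / reps, lt / reps


for size in (15, 50, 100):
    print(size, pitman_closeness(n=size, d=0.5, reps=1000))
```

Changing `rate`, `b` and `d` moves through the censoring percentages and deciles; the other survival distributions would replace `expovariate` (for example with `random.weibullvariate`).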

**Table 3 and Figure 1** illustrate the results of our simulation study. At zero percent censoring, the criterion favours the exact waiting time survival function for all five deciles and sample sizes, and we observe similar results as the censoring percentage increases. Only at very few points does the exact waiting time survival function have a larger error under the criterion than the traditional Kaplan-Meier method. The same pattern can be seen in (**Figure 1**) for selected sample sizes. These results hold for small as well as large sample sizes and for light to heavy censoring.

**Lower confidence limit coverage rate**

Considering the same censoring and survival distributions, we inspected the coverage properties of asymptotic normal confidence intervals. Using the estimates of the survival function and their standard errors, we constructed asymptotic normal (1-α)100 percent confidence intervals as

$$\hat{S}(t) \pm z_{1-\alpha/2}\,\widehat{\mathrm{SE}}\left[\hat{S}(t)\right]$$

We construct confidence intervals using the Kaplan-Meier survival function with the Greenwood standard error and, similarly, confidence intervals centred on the exact waiting time survival function with our proposed standard error. Since the exact waiting time survival function is bounded above by 1 while its lower limit need not approach the conventional value 0, and since the upper coverage rates are the same as those obtained with the Kaplan-Meier survival function and Greenwood standard error, we are mainly interested in comparing the following two coverage rates.

(8)

and

(9)

We used the same censoring and survival distributions, but with different percentages of censoring. **Table 4** shows the simulated 95% coverage rates at the lower three deciles, where the lower limits are most likely to fall at or below zero. At zero censoring, both methods give the same coverage rate, probably because the waiting times between events are then close to 1. However, as the percentage of censoring increases, the new method gives better coverage. Only in very few cases, for small censoring percentages and at some points, are its coverage rates smaller than those obtained by the traditional methods.
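A lower-limit coverage check can be sketched as follows. Because equations (8) and (9) are not reproduced here, the sketch assumes that lower-limit coverage means the share of simulated data sets whose lower limit Ŝ(t) - z·SE lies at or below the true S(t); for a two-sided 95% interval the nominal value of this share is 0.975. The EKM variance uses an assumed Greenwood-type form with n_iΔ_i in place of n_i, and the simulation settings (exponential rate 0.1, U(0, 40) censoring, true S(t_d) = d) are illustrative choices, not the paper's. `estimates_at` and `lower_coverage` are hypothetical names.

```python
import math
import random
from statistics import NormalDist


def estimates_at(times, events, t_eval):
    """(KM, Greenwood var, EKM, ASSUMED EKM var) evaluated at t_eval."""
    data = list(zip(times, events))
    km = ekm = 1.0
    acc_km = acc_ekm = 0.0
    prev = None
    for t in sorted({u for u, e in data if e}):
        if t > t_eval:
            break
        delta = 1 if prev is None else t - prev
        n_i = sum(1 for u, _ in data if u >= t)
        r_i = sum(1 for u, e in data if u == t and e)
        m_i = n_i * delta                       # assumed effective risk set
        km *= 1 - r_i / n_i
        ekm *= 1 - r_i / m_i
        acc_km = acc_km + r_i / (n_i * (n_i - r_i)) if n_i > r_i else math.nan
        acc_ekm = acc_ekm + r_i / (m_i * (m_i - r_i)) if m_i > r_i else math.nan
        prev = t
    return km, km * km * acc_km, ekm, ekm * ekm * acc_ekm


def lower_coverage(n=50, d=0.2, rate=0.1, b=40.0, reps=2000, alpha=0.05, seed=2):
    """Share of data sets whose lower limit S - z*SE is at or below the true S."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    t_d = -math.log(d) / rate                   # true S(t_d) = d
    hits_km = hits_ekm = valid = 0
    for _ in range(reps):
        x = [rng.expovariate(rate) for _ in range(n)]
        y = [rng.uniform(0, b) for _ in range(n)]
        times = [round(min(xi, yi)) for xi, yi in zip(x, y)]
        events = [int(xi <= yi) for xi, yi in zip(x, y)]
        km, v_km, ekm, v_ekm = estimates_at(times, events, t_d)
        if math.isnan(v_km) or math.isnan(v_ekm):
            continue                            # variance undefined: skip
        valid += 1
        hits_km += km - z * math.sqrt(v_km) <= d
        hits_ekm += ekm - z * math.sqrt(v_ekm) <= d
    return (hits_km / valid, hits_ekm / valid) if valid else (math.nan, math.nan)


print(lower_coverage(n=50, d=0.2, reps=1000))
```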

**Width of confidence limits**

A further simulation study examined the width of the intervals for the same censoring and survival distributions. For both methods, we calculated the width as upper limit minus lower limit in each simulation and summed it over all simulations; a smaller width indicates a better estimator. We considered the lower three deciles (d_{1}, d_{2}, d_{3}) and the upper three deciles (d_{7}, d_{8}, d_{9}): under very heavy censoring, the lower deciles and their standard errors are in most cases undefined, so we also included the upper deciles to reach a general decision on the overall performance of both methods. The results of our simulation are summarized in (**Table 5**). For censoring-free data, both methods gave the same results, but, once again, as censoring increases, the new estimators give much narrower intervals than Kaplan-Meier with the Greenwood standard error.
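The width comparison can be sketched in the same way. Under the symmetric normal intervals used above, upper minus lower limit equals 2·z·SE, so the sketch sums 2·z·SE over simulated data sets. It assumes untruncated intervals, the same illustrative settings as a single simulation cell (exponential rate 0.1, U(0, 40) censoring, true S(t_d) = d) and an assumed Greenwood-type EKM variance with n_iΔ_i in place of n_i; `ses_at` and `summed_widths` are hypothetical names.

```python
import math
import random
from statistics import NormalDist


def ses_at(times, events, t_eval):
    """(SE_KM, SE_EKM) at t_eval; the EKM form with m_i = n_i * delta_i is ASSUMED.

    Either value is NaN when the corresponding variance is undefined.
    """
    data = list(zip(times, events))
    km = ekm = 1.0
    acc_km = acc_ekm = 0.0
    prev = None
    for t in sorted({u for u, e in data if e}):
        if t > t_eval:
            break
        delta = 1 if prev is None else t - prev
        n_i = sum(1 for u, _ in data if u >= t)
        r_i = sum(1 for u, e in data if u == t and e)
        m_i = n_i * delta
        km *= 1 - r_i / n_i
        ekm *= 1 - r_i / m_i
        acc_km = acc_km + r_i / (n_i * (n_i - r_i)) if n_i > r_i else math.nan
        acc_ekm = acc_ekm + r_i / (m_i * (m_i - r_i)) if m_i > r_i else math.nan
        prev = t
    return km * math.sqrt(acc_km), ekm * math.sqrt(acc_ekm)


def summed_widths(n=50, d=0.2, rate=0.1, b=40.0, reps=2000, alpha=0.05, seed=3):
    """Sum over data sets of (upper - lower) = 2 * z * SE for KM and EKM."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    t_d = -math.log(d) / rate                   # true S(t_d) = d
    total_km = total_ekm = 0.0
    used = 0
    for _ in range(reps):
        x = [rng.expovariate(rate) for _ in range(n)]
        y = [rng.uniform(0, b) for _ in range(n)]
        times = [round(min(xi, yi)) for xi, yi in zip(x, y)]
        events = [int(xi <= yi) for xi, yi in zip(x, y)]
        se_km, se_ekm = ses_at(times, events, t_d)
        if math.isnan(se_km) or math.isnan(se_ekm):
            continue                            # undefined SE: data set skipped
        used += 1
        total_km += 2 * z * se_km
        total_ekm += 2 * z * se_ekm
    return total_km, total_ekm, used


print(summed_widths(n=50, d=0.2, reps=1000))
```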

**Application to lung cancer data**

We compared the Kaplan-Meier and exact waiting time survival functions on a data set from the Veterans Administration lung cancer trial (presented by Prentice [13] and also used by Gupta [14]), in which chemotherapy was given to males with advanced inoperable lung cancer. For convenience, we considered the same part of the data set that Gupta used in his paper, consisting of 97 patients, of whom 91 had events and 6 were censored. The survival times, in days, were as follows:

72 | 228 | 10 | 110 | 314 | 100* | 42 | 144 | 30 | 384 | 4 | 13 | 123* |

97* | 59 | 117 | 151 | 22 | 18 | 139 | 20 | 31 | 52 | 18 | 51 | 122 |

27 | 54 | 7 | 63 | 392 | 92 | 35 | 117 | 132 | 162 | 3 | 95 | 162 |

216 | 553 | 278 | 260 | 156 | 182* | 143 | 105 | 103 | 112 | 87* | 242 | 111 |

587 | 389 | 33 | 25 | 357 | 467 | 1 | 30 | 283 | 25 | 21 | 13 | 87 |

7 | 24 | 99 | 8 | 99 | 61 | 25 | 95 | 80 | 29 | 24 | 83* | 31 |

51 | 52 | 73 | 8 | 36 | 48 | 7 | 140 | 186 | 19 | 45 | 80 | 52 |

53 | 15 | 133 | 111 | 378 | 49 |

An asterisk (*) denotes a censored observation.

**Table 6** (included as supplementary data) summarizes the data and methods in 11 columns: column 1 shows the time in days, column 2 the number of persons at risk (n_{i}) and column 3 the number of events (r_{i}). Column 4 shows the number censored (c_{i}) at each time. Column 5, headed Δ, gives the waiting time between two consecutive events. The first Δ is 1; the waiting time between the first and second events is 3-1 = 2 days, and the remaining waiting times are calculated in the same way. The next two columns give the Kaplan-Meier and exact waiting time survival functions, which have the same value at the first event, i.e. the first observed time. The concept of waiting times starts to matter at the next event time, from which point the two methods give different survival probabilities, and this difference persists to the end of the analysis. Since the last observed time, 587 days, is an event, the Kaplan-Meier survival function gives the value zero, while the exact waiting time survival function yields a survival probability of 0.312 at that time. This is because the waiting time between the last two observed events is greater than 1, i.e. 587-553 = 34 days.

Lower confidence limits of the Kaplan-Meier and exact waiting time survival functions are shown in columns 8 and 9. Here we do not truncate negative limits at zero, as we want to examine the full behaviour of the two methods. The lower limits constructed from Kaplan-Meier include 16 negative values, whereas all corresponding limits constructed from the exact waiting time survival function are greater than zero. This indicates that the exact waiting time survival function behaves like a left-shrunken Kaplan-Meier survival function and gives better results at this end. The next two columns give the upper limits of both methods, again without truncation. For the Kaplan-Meier survival function, 3 values exceed 1, while the exact waiting time survival function gives 4 such values, owing to its higher survival probability at that point. Apart from this, the remaining limits behave similarly. The last two columns give the widths of the intervals for the two methods: at each time, the confidence interval constructed from the exact waiting time survival function is narrower than that obtained from the Kaplan-Meier survival function. These findings are also shown in (**Figures 2 and 3**).
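The listed lung cancer data can be run through both estimators with a short script. This is a sketch under stated assumptions, not a reproduction of Table 6: Δ_i is taken as the gap to the previous event time (censored times skipped), the KM lower limit uses Greenwood (2), the EKM lower limit uses an assumed Greenwood-type variance with n_iΔ_i in place of n_i, z = 1.96 is assumed for 95% limits, and limits are not truncated at zero. The printed values can be compared with the final EKM value of 0.312 and the 16 negative KM lower limits reported in the text; because of the assumptions, the numbers need not agree exactly. `km_ekm_table` is a hypothetical name.

```python
import math

# Lung cancer subset listed above (days); starred (censored) values kept separately.
EVENT_DAYS = [
    72, 228, 10, 110, 314, 42, 144, 30, 384, 4, 13,
    59, 117, 151, 22, 18, 139, 20, 31, 52, 18, 51, 122,
    27, 54, 7, 63, 392, 92, 35, 117, 132, 162, 3, 95, 162,
    216, 553, 278, 260, 156, 143, 105, 103, 112, 242, 111,
    587, 389, 33, 25, 357, 467, 1, 30, 283, 25, 21, 13, 87,
    7, 24, 99, 8, 99, 61, 25, 95, 80, 29, 24, 31,
    51, 52, 73, 8, 36, 48, 7, 140, 186, 19, 45, 80, 52,
    53, 15, 133, 111, 378, 49,
]
CENSORED_DAYS = [100, 123, 97, 182, 87, 83]


def km_ekm_table(times, events, z=1.96):
    """Rows (t, n_i, r_i, delta_i, KM, EKM, KM lower, EKM lower) per event time.

    KM lower limit: Greenwood (2). EKM lower limit: ASSUMED Greenwood-type
    variance with n_i replaced by n_i * delta_i. Limits are not truncated.
    """
    data = list(zip(times, events))
    km = ekm = 1.0
    acc_km = acc_ekm = 0.0
    prev, rows = None, []
    for t in sorted({u for u, e in data if e}):
        delta = 1 if prev is None else t - prev   # gap to previous event time
        n_i = sum(1 for u, _ in data if u >= t)
        r_i = sum(1 for u, e in data if u == t and e)
        m_i = n_i * delta
        km *= 1 - r_i / n_i
        ekm *= 1 - r_i / m_i
        acc_km = acc_km + r_i / (n_i * (n_i - r_i)) if n_i > r_i else math.nan
        acc_ekm = acc_ekm + r_i / (m_i * (m_i - r_i)) if m_i > r_i else math.nan
        rows.append((t, n_i, r_i, delta, km, ekm,
                     km - z * km * math.sqrt(acc_km),
                     ekm - z * ekm * math.sqrt(acc_ekm)))
        prev = t
    return rows


times = EVENT_DAYS + CENSORED_DAYS
events = [1] * len(EVENT_DAYS) + [0] * len(CENSORED_DAYS)
rows = km_ekm_table(times, events)

print(rows[0][:4], rows[-1][:4])   # (t, n_i, r_i, delta_i) at first and last event
print(f"final KM = {rows[-1][4]:.3f}, final EKM = {rows[-1][5]:.3f}")
print("negative KM lower limits:", sum(r[6] < 0 for r in rows))
print("negative EKM lower limits:", sum(r[7] < 0 for r in rows))
```

The first row is (1, 97, 1, 1) and the last is (587, 1, 1, 34), matching the first waiting time of 1, the second of 3-1 = 2 days and the final gap of 587-553 = 34 days described in the text.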

The Kaplan-Meier survival function, a non-parametric technique of survival analysis, remains a reliable and frequently used method in medical research, as it is easy to understand, calculate and interpret. For the construction of confidence intervals, the Greenwood variance estimator is commonly used. However, both share the drawback of giving the same results for two diseases of different nature, because they ignore the waiting times between consecutive events. In light of this deficiency, in this paper we proposed an exact waiting time survival function and a modified variance estimator. Unlike the traditional methods, these new estimators explicitly consider the waiting time between two events. The new methods perform equally well for small and large sample sizes, and their performance improves as censoring increases. The new estimators give better coverage rates for the lower confidence limits and considerably narrower confidence intervals. Finally, the simplicity of these estimators makes them attractive for use in different fields of medical research.

- Kaplan EL, Meier P (1958) Nonparametric estimation from incomplete observations. Journal of the American Statistical Association 53: 457-481.
- Hogan JW, Laird NM (1998) Increasing efficiency from censored survival data by using random effects to model longitudinal covariates. Statistical Methods in Medical Research 7: 28-48.
- Brookmeyer R, Curriero FC (2002) Survival curve estimation with partial nonrandom exposure information. Statistics in Medicine 21: 2671-2683.
- Lam FK, David IP (2003) REML and ML estimation for clustered grouped survival data. Statistics in Medicine 22: 2025-2034.
- Adebayo SB, Fahrmeir L (2005) Analysing child mortality in Nigeria with geoadditive discrete-time survival models. Statistics in Medicine 24: 709-728.
- Keating JP, Mason RL, Sen PK (1993) Pitman's Measure of Closeness: A Comparison of Statistical Estimators. SIAM, Philadelphia.
- Rossa A, Zieliński R (1999) Locally Weibull-smoothed Kaplan-Meier estimator. Institute of Mathematics, Polish Academy of Sciences.
- Rossa A, Zieliński R (2002) A simple improvement of the Kaplan-Meier estimator. Communications in Statistics: Theory and Methods 13: 147-158.
- Greenwood M (1926) A report on the natural duration of cancer. Reports on Public Health and Medical Subjects.
- Parzen E (1979) Nonparametric statistical data modelling. Journal of the American Statistical Association 74: 105-121.
- Borkowf CB (2005) A simple hybrid estimator for the Kaplan-Meier survival function. Statistics in Medicine 24: 827-851.
- Peto R, Pike MC, Armitage P, Breslow NE, Cox DR, et al. (1977) Design and analysis of randomized clinical trials requiring prolonged observation of each patient. II. Analysis and examples. British Journal of Cancer 35: 1-39.
- Prentice RL (1973) Exponential survivals with censoring and explanatory variables. Biometrika 60: 279-288.
- Gupta RC (1999) A study of log-logistic model in survival analysis. Biometrical Journal 41: 431-443.
