Confidence Intervals Estimation for Survival Function in Log-Logistic Distribution and Proportional Odds Regression Based on Censored Survival Time Data

The log-logistic and Weibull distributions both have the accelerated survival time property. The log-logistic distribution also has the proportional odds property, and its hazard curve can be unimodal, changing direction over time. Link [6,7] presented a confidence interval estimate of the survival function using Cox's proportional hazard model with covariates. Her idea was more recently extended by [1] to the exponential distribution and by [2] to the exponential proportional hazard model. The same idea was extended to the Weibull proportional hazard regression model by [3]. In this study, a confidence interval is formed for the log-logistic survival function at any value of time, provided that the survival times follow a log-logistic distribution. The same results are also extended to proportional odds regression. A real data example and a simulated data example are considered to illustrate the discussed confidence interval.

Citation: ALAKUŞ K, ERİLLİ NA (2011) Confidence Intervals Estimation for Survival Function in Log-Logistic Distribution and Proportional Odds Regression Based on Censored Survival Time Data. J Biomet Biostat 2:116. doi:10.4172/2155-6180.1000116. Journal of Biometrics & Biostatistics, ISSN: 2155-6180, Volume 2, Issue 3, 1000116, an open access journal.


Introduction
There are two types of estimation for any quantity: point estimation and confidence interval estimation. In the survival analysis literature, confidence interval estimation for the survival function is not new. In particular, confidence interval estimation for the baseline survival function has been studied extensively by many authors. For example, confidence interval estimates for the Kaplan-Meier survival function based on Greenwood's formula were studied by [4,5] and many others. For Cox's proportional hazard model, [6,7] formed a log-transformed confidence interval for the survival function with covariates. Her idea was more recently extended by [1] to the exponential distribution and by [2] to the exponential proportional hazard model. The Weibull proportional hazard function was also investigated by [3]. Interval estimates of the survival function are generally useful in the analysis of survival or lifetime data. In this study, a symmetric, proportional odds transformed confidence interval approach is developed for the log-logistic survival function without covariates and for the proportional odds regression survival function with covariates.
The plan of this study is as follows. The next section presents the log-logistic distribution, the proportional odds regression model, and their important functions. Section 3 forms confidence interval estimates for the survival function of the log-logistic distribution and of the proportional odds regression model. Section 4 gives a real data example and, as an extension of the real data, a simulation study illustrating the proposed method. The study concludes with a discussion section.

Log-logistic distribution and proportional odds regression model
Log-logistic distribution: The log-logistic distribution has the proportional odds property and is the natural distribution to use in conjunction with the proportional odds model. Cox and Oakes [8] demonstrated that the log-logistic distribution is the only one that shares both the accelerated survival time property and the proportional odds property. Situations can arise in which the hazard function changes direction. For example, after a heart transplant a patient faces an increasing hazard of death over the first few days, while the body adapts to the new organ. The hazard then decreases with time as the patient recovers. In situations such as this, a unimodal hazard function may be appropriate.
A particular form of unimodal hazard is the function

$$h(t) = \frac{\alpha\lambda^{\alpha} t^{\alpha-1}}{1 + (\lambda t)^{\alpha}}, \qquad t > 0,\ \alpha > 0,\ \lambda > 0. \tag{1}$$

This hazard function decreases monotonically if α ≤ 1. If α > 1, then the hazard has a single mode. The survival function corresponding to the hazard function in equation (1) is

$$S(t) = \left[1 + (\lambda t)^{\alpha}\right]^{-1}. \tag{2}$$

The probability density function is

$$f(t) = \frac{\alpha\lambda^{\alpha} t^{\alpha-1}}{\left[1 + (\lambda t)^{\alpha}\right]^{2}}. \tag{3}$$

This is the density function of a random variable T that has a log-logistic distribution with parameters α and λ. The distribution is so called because the variable log T has a logistic distribution, a symmetric distribution whose probability density function is very similar to that of the normal distribution [9].
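The three functions of the log-logistic distribution can be sketched in a few lines of Python. This is only an illustration under the parametrization used in this section, where the odds of survival are (λt)^(-α); the function names and the numerical values are hypothetical and do not come from the paper. The helper `hazard_mode` uses the fact that, for α > 1, the hazard peaks where (λt)^α = α − 1.

```python
import math

def loglogistic_hazard(t, alpha, lam):
    """Hazard: alpha * lam^alpha * t^(alpha-1) / (1 + (lam*t)^alpha)."""
    return alpha * lam**alpha * t**(alpha - 1) / (1.0 + (lam * t)**alpha)

def loglogistic_survival(t, alpha, lam):
    """Survival: 1 / (1 + (lam*t)^alpha)."""
    return 1.0 / (1.0 + (lam * t)**alpha)

def loglogistic_density(t, alpha, lam):
    """Density: hazard times survival."""
    return alpha * lam**alpha * t**(alpha - 1) / (1.0 + (lam * t)**alpha)**2

def hazard_mode(alpha, lam):
    """Time of the hazard peak; only defined for alpha > 1."""
    if alpha <= 1:
        raise ValueError("the hazard is monotonically decreasing for alpha <= 1")
    return (alpha - 1.0)**(1.0 / alpha) / lam

if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    alpha, lam = 2.0, 0.5
    for t in (0.5, 1.0, hazard_mode(alpha, lam), 4.0, 8.0):
        print(t, loglogistic_hazard(t, alpha, lam), loglogistic_survival(t, alpha, lam))
```

Printing the hazard on a grid of times like this makes the rise and fall around the mode visible when α > 1, and the monotone decrease visible when α ≤ 1.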
The suitability of the log-logistic distribution for the analysis of a data set can be checked empirically using a linear relationship derived from the expressions for S(t) and F(t) = 1 − S(t). The odds of surviving beyond time t are S(t)/F(t) = (λt)^(−α), and consequently the log odds of survival beyond t can be expressed as

$$\log\left[\frac{S(t)}{F(t)}\right] = -\alpha\log\lambda - \alpha\log t$$

or equivalently

$$\log\left[\frac{F(t)}{S(t)}\right] = \alpha\log\lambda + \alpha\log t. \tag{4}$$

That is, the log-logistic distribution corresponds to a linear model for the log odds of failure over the logarithm of time, with slope α [10,11]. The Kaplan-Meier estimate Ŝ_KM(t) can be used to calculate the log odds; if the log-logistic model is suitable, a plot of these against log t should follow approximately a straight line.

*Corresponding author: Kamil ALAKUŞ, Ondokuz Mayıs University, Faculty of Science and Arts, Department of Statistics, 55139 Turkey, E-mail: kamilal@omu.edu.tr
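The graphical check can also be done numerically. The sketch below computes the Kaplan-Meier estimate and fits a least-squares line to the log odds of failure against log t. If the log-logistic model holds, the slope approximates α and the intercept approximates α log λ. The function names are hypothetical, and ordinary least squares is an assumption made here for illustration; the paper describes only the plot.

```python
import math

def kaplan_meier(times, events):
    """Kaplan-Meier estimate at each distinct death time.

    events: 1 = death observed, 0 = censored.
    Returns a list of (time, survival) pairs.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    out = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all observations tied at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            out.append((t, surv))
        n_at_risk -= removed
    return out

def log_odds_line(km):
    """Least-squares fit of log(F/S) on log t; returns (intercept, slope)."""
    # Points with S equal to 0 or 1 have undefined log odds and are skipped.
    pts = [(math.log(t), math.log((1.0 - s) / s)) for t, s in km if 0.0 < s < 1.0]
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    slope = sxy / sxx
    return my - slope * mx, slope
```

A clearly curved pattern in the fitted points, or a slope that changes markedly between early and late times, would argue against the log-logistic model.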

Proportional odds regression model
The application of accelerated survival time and proportional odds models to the analysis of reliability data has recently been described by [12]. The general proportional odds model for survival data was introduced by [13], and [14] describes the proportional odds model.
The log-logistic distribution is commonly extended to include a vector of covariates x by reformulating the survival function as

$$S(t; x) = \left[1 + (\lambda t)^{\alpha}\exp\left(\alpha\theta^{T}x\right)\right]^{-1}. \tag{5}$$

Under the accelerated failure time model, the hazard of death at time t is

$$h(t; x) = \frac{\alpha\lambda^{\alpha} t^{\alpha-1}\exp\left(\alpha\theta^{T}x\right)}{1 + (\lambda t)^{\alpha}\exp\left(\alpha\theta^{T}x\right)}. \tag{6}$$

Thus the survival time for the i-th observation also has a log-logistic distribution, and therefore the model has both the accelerated failure time property and the proportional odds property.
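The proportional odds property can be checked numerically. The sketch below assumes the survival function S(t; x) = [1 + (λt)^α exp(αθᵀx)]^(-1) and shows that the ratio of failure odds for two covariate vectors does not depend on t. The function names and all parameter values are hypothetical.

```python
import math

def po_survival(t, x, alpha, lam, theta):
    """Survival function of the log-logistic proportional odds model."""
    lin = sum(th * xi for th, xi in zip(theta, x))
    return 1.0 / (1.0 + (lam * t) ** alpha * math.exp(alpha * lin))

def failure_odds(t, x, alpha, lam, theta):
    """Odds of failure by time t: F(t; x) / S(t; x)."""
    s = po_survival(t, x, alpha, lam, theta)
    return (1.0 - s) / s

def odds_ratio(t, x1, x2, alpha, lam, theta):
    """Ratio of failure odds for covariate vectors x1 and x2 at time t."""
    return failure_odds(t, x1, alpha, lam, theta) / failure_odds(t, x2, alpha, lam, theta)

if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    alpha, lam, theta = 1.7, 0.3, [0.5, -0.2]
    x1, x2 = [1.0, 2.0], [0.0, 1.0]
    for t in (0.5, 1.0, 5.0, 10.0):
        # The printed ratio is the same at every t.
        print(t, odds_ratio(t, x1, x2, alpha, lam, theta))
```

Under this model the constant ratio equals exp(αθᵀ(x1 − x2)), which is why the covariate effects are read as log odds ratios.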

Confidence intervals for survival function
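Log-logistic distribution: First, multiply both the numerator and the denominator of equation (2) by α. The value of the expression does not change, so equation (2) can be rewritten as

$$S(t_i) = \alpha\left[\alpha + \alpha\lambda^{\alpha} t_i^{\alpha}\right]^{-1}. \tag{7}$$

Taking the natural logarithm of αλ^α gives log α + α log λ. Now let R_i = log α + α log λ be the score (risk score) function. Denoting α log λ by β_0 and log α by β_1, the score function can be written as R_i = β_0 + β_1. Therefore the survival function given in (7) can be rewritten as

$$S(t_i) = \alpha\left[\alpha + e^{R_i} t_i^{\alpha}\right]^{-1}. \tag{8}$$

The estimated survival function is

$$\hat{S}(t_i) = \hat{\alpha}\left[\hat{\alpha} + e^{\hat{R}_i} t_i^{\hat{\alpha}}\right]^{-1}. \tag{9}$$

We can now construct confidence intervals for the survival function of the log-logistic distribution. To do this, we first form a confidence interval for R_i and then extend it to the survival function. For the score function R_i, a 100(1−α)% confidence interval is given by

$$\Pr\left\{\hat{R}_i - z_{\alpha/2}\,se(\hat{R}_i) \le R_i \le \hat{R}_i + z_{\alpha/2}\,se(\hat{R}_i)\right\} = 1 - \alpha \tag{10}$$

or

$$\Pr\left(\hat{R}_{low(i)} \le R_i \le \hat{R}_{upp(i)}\right) = 1 - \alpha. \tag{11}$$

Here z_{α/2} denotes the upper α/2 quantile of the standard normal distribution, where α in the confidence level is the significance level rather than the shape parameter, and se(R̂_i) denotes the standard error of the estimated score function. The estimated standard error of the score function is calculated as

$$se(\hat{R}_i) = \left\{x_i^{T}\,Var(\hat{\mathbf{a}})\,x_i\right\}^{1/2} \tag{12}$$

where x_i^T = [1 1] is a vector of ones in this simple model and Var(â) is the variance-covariance matrix of the estimated parameters â = (β̂_0, β̂_1)^T.
A 100(1−α)% confidence interval for the survival function follows directly from the confidence interval for the score function. The confidence limits for the survival function of a log-logistic distribution are given by

$$\hat{S}_{low}(t_i) = \hat{\alpha}\left[\hat{\alpha} + e^{\hat{R}_{upp(i)}} t_i^{\hat{\alpha}}\right]^{-1}$$

for the lower limit and

$$\hat{S}_{upp}(t_i) = \hat{\alpha}\left[\hat{\alpha} + e^{\hat{R}_{low(i)}} t_i^{\hat{\alpha}}\right]^{-1}$$

for the upper limit, respectively. So the 100(1−α)% confidence interval for the survival function of the log-logistic distribution can be given as

$$\Pr\left\{\hat{S}_{low}(t_i) \le S(t_i) \le \hat{S}_{upp}(t_i)\right\} = 1 - \alpha. \tag{13}$$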
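The lower limit of the survival function is built from the upper limit of the score, and the upper limit from the lower limit of the score, because the survival function is decreasing in R_i. A brief check, treating the shape parameter and t_i as fixed:

$$\frac{\partial}{\partial R_i}\,\alpha\left[\alpha + e^{R_i} t_i^{\alpha}\right]^{-1} = -\frac{\alpha\, e^{R_i} t_i^{\alpha}}{\left(\alpha + e^{R_i} t_i^{\alpha}\right)^{2}} < 0 \quad \text{for } t_i > 0,$$

so that

$$\hat{R}_{low(i)} \le R_i \le \hat{R}_{upp(i)} \;\Longrightarrow\; \alpha\left[\alpha + e^{\hat{R}_{upp(i)}} t_i^{\alpha}\right]^{-1} \le S(t_i) \le \alpha\left[\alpha + e^{\hat{R}_{low(i)}} t_i^{\alpha}\right]^{-1}.$$

The transformation is monotone, so the coverage probability of the interval for R_i carries over to the interval for S(t_i), to the extent that the shape parameter can be treated as fixed at its estimate.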

Proportional odds regression model
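Let R_i = log α + α log λ + αθ^T x_i be the score function for the i-th observation. Denoting α log λ by β_0 and log α by β_1, the score function for the i-th observation can be written as R_i = β_0 + β_1 + β^T x_i. Here β is a (p×1) column vector equal to αθ.
Let a = (β_0, β_1, β^T)^T denote a (k×1) column vector, with k = p + 2, and let y_i be a column vector of the same size. Then the score function for the i-th observation can be written as R_i = a^T y_i. The survival function given in (2) can then be written as

$$S(t_i; y_i) = \alpha\left[\alpha + e^{R_i} t_i^{\alpha}\right]^{-1}. \tag{14}$$

The estimated survival function is

$$\hat{S}(t_i; y_i) = \hat{\alpha}\left[\hat{\alpha} + e^{\hat{R}_i} t_i^{\hat{\alpha}}\right]^{-1}. \tag{15}$$

We can now construct confidence intervals for the survival function of the proportional odds regression. As for the log-logistic distribution, we first form a confidence interval for R_i and then extend it to the survival function. For the score function R_i, a 100(1−α)% confidence interval is given by equation (10) or equation (11). The estimated standard error of the score function for the i-th observation is calculated as

$$se(\hat{R}_i) = \left\{y_i^{T}\,Var(\hat{\mathbf{a}})\,y_i\right\}^{1/2} \tag{16}$$

where y_i^T = [1 1 x_{1i} … x_{pi}] in this model and Var(â) is the variance-covariance matrix of the estimated parameters.

A 100(1−α)% confidence interval for the survival function uses the confidence interval for the score function. Namely, the confidence limits for the survival function of a proportional odds model are given by

$$\hat{S}_{low}(t_i; y_i) = \hat{\alpha}\left[\hat{\alpha} + e^{\hat{R}_{upp(i)}} t_i^{\hat{\alpha}}\right]^{-1}$$

for the lower limit and

$$\hat{S}_{upp}(t_i; y_i) = \hat{\alpha}\left[\hat{\alpha} + e^{\hat{R}_{low(i)}} t_i^{\hat{\alpha}}\right]^{-1}$$

for the upper limit, respectively. So the 100(1−α)% confidence interval for the survival function of the proportional odds regression can be shown as

$$\Pr\left\{\hat{S}_{low}(t_i; y_i) \le S(t_i; y_i) \le \hat{S}_{upp}(t_i; y_i)\right\} = 1 - \alpha. \tag{18}$$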
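The interval construction of this section can be sketched as a single function: compute the estimated score, its standard error from the variance-covariance matrix, the normal-theory limits for the score, and then map them to the survival scale with the limits swapped. The function name `survival_ci` and every number in the usage lines are hypothetical. As in the formulas of this section, the shape estimate is treated as fixed when the score limits are transformed. With y = [1, 1] the same function covers the case without covariates.

```python
import math
from statistics import NormalDist

def survival_ci(t, alpha_hat, coef_hat, y, cov, level=0.95):
    """Return (lower, estimate, upper) for S(t; y).

    coef_hat: estimated (beta0, beta1, beta_1..beta_p), beta1 = log(alpha_hat)
    y:        (1, 1, x_1, ..., x_p) for the observation of interest
    cov:      variance-covariance matrix of coef_hat (list of lists)
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    k = len(y)
    r_hat = sum(c * yi for c, yi in zip(coef_hat, y))
    # Quadratic form y' Var y gives the variance of the estimated score.
    var_r = sum(y[j] * cov[j][m] * y[m] for j in range(k) for m in range(k))
    se = math.sqrt(var_r)

    def surv(r):
        return alpha_hat / (alpha_hat + math.exp(r) * t ** alpha_hat)

    # Survival decreases in the score, so the limits swap.
    return surv(r_hat + z * se), surv(r_hat), surv(r_hat - z * se)

if __name__ == "__main__":
    # Hypothetical estimates, for illustration only (one covariate).
    alpha_hat = 2.0
    coef_hat = [2.0 * math.log(0.002), math.log(2.0), 0.05]
    cov = [[0.040, 0.001, -0.0005],
           [0.001, 0.020, 0.0000],
           [-0.0005, 0.0000, 0.0001]]
    print(survival_ci(365.0, alpha_hat, coef_hat, [1.0, 1.0, 60.0], cov))
```

Working on the score scale keeps both limits inside (0, 1) automatically, which a symmetric interval built directly on the survival scale would not guarantee.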

Model selection criterion
In this study we assessed the performance of the proposed model with the log-likelihood value and with the AIC and BIC model selection criteria. The Akaike information criterion (AIC) is a measure of the relative goodness of fit of a statistical model. It was developed by [15]. The Bayesian information criterion (BIC) is a criterion for model selection among a class of parametric models with different numbers of parameters. It was introduced by [16]. The AIC and BIC are given, respectively, by

$$AIC = -2\log L + 2p, \qquad BIC = -2\log L + p\log n.$$

When the sample size is small, the corrected AIC can be used:

$$AIC_c = AIC + \frac{2p(p+1)}{n - p - 1}$$

where L is the maximized likelihood, n is the sample size and p is the number of free parameters.
Given a set of candidate models for the data, the preferred model is the one with the smallest AIC and BIC and, correspondingly, the smallest value of −2 log L (that is, the largest log-likelihood).
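The criteria are simple to compute once a model has been fitted. The sketch below uses the standard definitions; taking n as the number of observations is an assumption made here, and the log-likelihood values in the usage lines are hypothetical, not results from the paper.

```python
import math

def aic(loglik, p):
    """Akaike information criterion: -2 log L + 2p."""
    return -2.0 * loglik + 2.0 * p

def aicc(loglik, p, n):
    """Small-sample corrected AIC; requires n > p + 1."""
    return aic(loglik, p) + 2.0 * p * (p + 1) / (n - p - 1)

def bic(loglik, p, n):
    """Bayesian information criterion: -2 log L + p log n."""
    return -2.0 * loglik + p * math.log(n)

if __name__ == "__main__":
    # Hypothetical fitted log-likelihoods for two candidate models, n = 26.
    candidates = {"model A": (-97.4, 3), "model B": (-98.9, 2)}
    for name, (ll, p) in candidates.items():
        print(name, aic(ll, p), aicc(ll, p, 26), bic(ll, p, 26))
```

With only 26 observations the correction term in AICc is not negligible, which is the reason for reporting it alongside AIC.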

Illustrative example
In this section, a real data illustration is given for the confidence intervals for the survival function discussed in the earlier sections. We first give some information about the data in the next subsection. We then use the data to illustrate confidence interval estimation for the survival function of the proportional odds regression model.

Real time data: Ovarian cancer study
The data are from [17] on ovarian cancer, as given in [18]. The ovarian cancer data frame includes the survival times (in days), an indicator variable (status) of death or censoring, and the following four additional variables for each patient: the patient's age (age), an indicator of the extent of residual disease (residual.dz), the treatment given (rx), and a measure of performance score or functional status on the Eastern Cooperative Oncology Group's scale (ecog.ps). The survival analysis chapter in the S-Plus documentation describes these data further and illustrates survival analysis methods with them. There were 26 patients in the study. The total censoring ratio is 53.85%.

Confidence intervals for survival function in Proportional odds regression model
First, we assessed the goodness of fit of the log-logistic distribution to the survival times. A common and useful technique for checking the validity of a parametric model is to embed it in a larger parametric model and use, e.g., the likelihood ratio test to check whether the reduction to the actual model is valid; for applications in survival analysis see [8,19].
Secondly, we tested whether the survival times come from a log-logistic distribution using a Kolmogorov-Smirnov type test. The test statistic is D_26 = 0.1251. This further indicates that the log-logistic distribution is a reasonable one.
Thirdly, one use of the Nelson-Aalen or Kaplan-Meier estimators for survival data is to check graphically whether the survival times appear to follow a certain parametric distribution; in fact, this was the rationale behind the estimator in the original paper [20]. For the log-logistic distribution the log odds function is

$$\log\left[\frac{F(t)}{S(t)}\right] = \alpha\log\lambda + \alpha\log t$$

so that the log odds plotted against log t should yield an approximately straight line. This plot is given in (Figure 1a) for the ovarian cancer data. The curve is roughly linear, suggesting that the model may be appropriate. In the same figure, the corresponding log odds estimates (straight lines) based on the log-logistic distribution are added to the graph, and they approximate the Kaplan-Meier estimates quite well. In (Figure 1b), the QQ plot supports the results in (Figure 1a). In (Figure 1c), the baseline hazard function shows a unimodal shape. In (Figure 1d), the proportional odds regression hazard is an increasing function of the risk score function of age.
In summary, (Figure 1d) shows that the hazard function increases as age and survival time increase.
We also fitted the Exponential Hazard Regression (HER), Proportional odds Hazard Regression (POR), Log-Normal Hazard Regression (LNR) and Weibull Hazard Regression (WHR) models to the ovarian cancer data. The log-likelihood, AIC, AICc and BIC values for each model are given in (Table 1). From the table, we can see that the proportional odds regression model provides the best fit to the data.
As (Table 1) shows, the proportional odds regression values are the smallest for all criteria. Thus the proportional odds regression model is preferred over the other models.
We therefore fit the proportional odds regression model; the results are given in (Table 2). From (Table 2), we can see that all three parameters are highly significant.
The confidence intervals of the survival function are calculated from the estimated variance-covariance matrix of the parameter estimates. Some calculations of confidence intervals for the survival function in the log odds regression model are given in (Table 3). Survival probability estimates and 95% confidence intervals are also given in (Figure 2).
Approximate 95% confidence limits are obtained using the risk score function and the log odds transformation approach. Figure 2 shows the log odds regression survival function estimate for the ovarian cancer data with approximate 95% confidence intervals obtained using (18). Because the sample is small, the confidence intervals are somewhat wider than expected. Nevertheless, the result is quite good.

Simulation study
Because the ovarian cancer data set is small, a simulation study is carried out in this subsection to see the behavior of proportional odds regression on a larger sample. The simulated data set has 100 observations and was obtained as follows:
Step 2: 100 values of the age variable were simulated from a normal distribution, with Age = zσ + μ where z ~ N(0,1).
Step 5: 100 values of a Bernoulli random variable with mean (success probability) p = 0.80 were simulated. These variables were used to identify the 20% of survival times that are censored.
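A data set of this kind can be generated along the following lines. This is only a sketch: the steps that are not stated above are filled in with assumptions. In particular, survival times are drawn here by inverting S(t; x) = [1 + (λt)^α exp(αθx)]^(-1) at a uniform random number, and all parameter values (`alpha`, `lam`, `theta`, `mu`, `sigma`) are hypothetical and not the values used in the study.

```python
import math
import random

def invert_survival(u, x, alpha, lam, theta):
    """Solve S(t; x) = u for t in the log-logistic proportional odds model."""
    return ((1.0 - u) / u * math.exp(-alpha * theta * x)) ** (1.0 / alpha) / lam

def simulate(n=100, alpha=2.0, lam=0.001, theta=0.02,
             mu=56.0, sigma=10.0, p_event=0.80, seed=1):
    """Return a list of (time, status, age) tuples; status 1 = death, 0 = censored."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # Age = z * sigma + mu with z ~ N(0, 1).
        age = rng.gauss(0.0, 1.0) * sigma + mu
        # Uniform draw kept strictly inside (0, 1) to avoid division by zero.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        time = invert_survival(u, age, alpha, lam, theta)
        # Bernoulli indicator with success probability p_event marks observed deaths.
        status = 1 if rng.random() < p_event else 0
        rows.append((time, status, age))
    return rows

if __name__ == "__main__":
    data = simulate()
    censored = sum(1 for _, status, _ in data if status == 0)
    print(len(data), "observations,", censored, "censored")
```

Fixing the seed makes the simulated data reproducible, so the fitted models and intervals can be compared across runs.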
The results for the simulated data are shown graphically in (Figure 3). In (Figure 3a), the curve is quite linear. In the same figure, the corresponding log odds estimates (straight lines) based on the log-logistic distribution are added to the graph, and they approximate the Kaplan-Meier estimates very well. In (Figure 3b), the QQ plot supports the results in (Figure 3a). In (Figure 3c), the baseline hazard function shows a unimodal shape. In (Figure 3d), the proportional odds regression hazard is an increasing function of the risk score function of age. The results in Table 4 show that the proportional odds regression model has smaller criterion values than the other models, so it is the best of the models considered. Table 5 gives the parameter estimates, all of which are highly significant.
The confidence intervals of the survival function are again calculated from the estimated variance-covariance matrix. In (Figure 4), the intervals for the simulated data are quite narrow. The Cox-Snell residual intervals are more robust than those for the real data. Thus, as the sample size increases, the intervals for survival data of this kind can be expected to become narrower.

Discussion
Many statistical investigations involve both estimation and hypothesis testing. Estimation can be of two types: point estimation and interval estimation. Both point and interval estimates are obtained from an estimator. Interval estimation is generally called confidence interval estimation, and the corresponding estimators are called confidence interval estimators.
The survival function may be the most important function in survival analysis or reliability analysis. The probability of living longer than time t is an important issue for doctors, patients and patients' relatives. Identifying the factors that affect this probability is also important for determining which risk factors influence survival times. In survival analysis it is also necessary to examine the behavior of the hazard function. Both point estimation and confidence interval estimation of the survival function may be achieved by fitting parametric distributions. The semiparametric proportional hazard model is known as the Cox regression model. In the Cox regression model, confidence interval estimation of the survival function was studied by Link [6,7]. Her idea was more recently extended by Alakus et al. [1] to the exponential distribution and by Alakus et al. [2] to the exponential proportional hazard model. The Weibull proportional hazard function was also investigated by Alakuş [3].
For this reason, in this study we offer a new symmetric confidence interval based on the transformed log odds in the proportional odds regression model. The proposed approach was examined with real data and simulated data, and the results were quite good. As the sample size grew, the confidence intervals became tighter. The model selection criteria (log-likelihood, AIC and BIC) gave the best results for the proposed model. Its confidence intervals are narrower than those of the other models studied. Because the real data set is small, a larger simulated data set was also studied, and its results are more robust than those from the small sample. By the law of large numbers, Ŝ(t;y) converges in probability to S(t;y). In other words, when the sample size is large enough, the distribution of the risk score function is approximately normal and Ŝ(t;y) comes closer to S(t;y). This is the behavior we set out to investigate.
The confidence intervals investigated here may be extended to the lognormal hazard model. This problem will be investigated in forthcoming studies.