Group Sequential Survival Trial Designs Against Historical Controls under the Weibull Model

In this paper, two parametric sequential tests are proposed for historical control trial designs under the Weibull model. The proposed tests are asymptotically normal with properties of Brownian motion. The sample size formulas and information times are derived for both tests. A multi-stage sequential procedure based on sequential conditional probability ratio test methodology is proposed for monitoring clinical trials against historical controls.
Journal of Biometrics & Biostatistics


Introduction
Randomized clinical trials are the gold standard for comparing a new therapy to a standard treatment. However, when randomization is not feasible because of ethical concerns, patient preference, or regulatory acceptability, comparing data from patients receiving a new therapy to those from patients previously treated by standard treatment (historical control) is an alternative. If patients enrolled in the current trial are similar to those in the historical study, clinical trials with a historical control improve the reliability of testing results of single-arm phase II trials by including the variation of the null parameter, which is usually estimated from historical data. Compared with randomized phase III trials, clinical trials with a historical control require a much smaller sample size, and are therefore easier to conduct and save time and patient resources [1].
Despite the practical and statistical issues associated with historical control trials [2][3][4][5][6], they have been appropriately used in many clinical practices. Sample size calculations to design such trials have been discussed by Makuch and Simon [7] for binary endpoints and by Dixon and Simon [8] and Emrich [9] for exponential survival endpoints. These methods have been widely used in oncology trial designs. However, Korn and Freidlin [10] reported that these popular methods do not preserve the power and type I error when considering the uncertainty in the historical control outcome data. Recently, several studies have discussed sample size calculations for historical control trials by taking into account the uncertainty in historical control outcome data [11][12][13].
Clinical trials with historical controls are often monitored by preplanned interim analyses to stop accrual if patients in the current trial have poorer outcomes than those in the historical control. The monitoring of clinical trials with historical controls poses a statistical problem of comparing two outcomes in a situation wherein data from the current study are sequentially collected and compared with all data from historical controls at each interim analysis. Few studies have discussed the monitoring of clinical trials against historical controls. For example, Chang et al. [11] proposed a two-stage design for binary outcome and Xiong et al. [12] developed a multistage group sequential procedure for monitoring historical control trials with binary, continuous, and survival endpoints.
In this study, we propose a multistage group sequential procedure to design survival trials against historical controls under the Weibull model. In Section 2, two sequential parametric tests are proposed for the trial design under the Weibull model. In Section 3, formulas for the number of events required for the current study are derived. In Section 4, a multistage group sequential procedure based on the sequential conditional probability ratio test (SCPRT) of Xiong [1] is proposed. In Section 5, simulation studies to calculate the empirical power and type I error of the proposed tests are described. In Section 6, an example is given to illustrate the proposed methods. The discussion and concluding remarks are given in Section 7.

Sequential Test Statistics
Two parametric sequential test statistics are discussed in this section to provide a group sequential design of survival trials against historical controls under the Weibull model. Assume that the failure time variable $T_j$ of a subject from the $j$th group follows the Weibull distribution with a common shape parameter $\kappa$ and a scale parameter $\rho_j$, where $j=1$ for the historical control group and $j=2$ for the current study group. That is, $T_j$ has survival distribution function

$$S_j(t)=\exp\{-\lambda_j t^{\kappa}\},\qquad \lambda_j=\rho_j^{\kappa}.$$

The first test statistic, $Z(t)$, is computed from the historical control data and the current study data available at calendar time $t$, and has approximately a standard normal distribution under the null hypothesis. To derive the group sequential design, consider the corresponding unstandardized statistic. Under the alternative of $\delta=\lambda_1/\lambda_2>1$, this statistic is approximately normal with mean $\log(\delta)V(t)$ and variance $V(t)$ and has an independent increment structure. These results can be derived from Tsiatis et al. [15], who reported similar results for general parametric survival models. Because $V(t)\approx D(t)=D_1D_2(t)/\{D_1+D_2(t)\}$, where $D_1$ is the total number of events in the historical control and $D_2(t)$ is the number of events in the current study up to time $t$, the statistic rescaled by $V^{1/2}(\tau)$ is approximately a Brownian motion on the information time $t^*=V(t)/V(\tau)$ with drift parameter $\theta=\log(\delta)V^{1/2}(\tau)$.

The second test statistic, $S(t)$, is based on a transformed parameter $\phi$. Sprott [16] showed that the distribution of the estimator $\hat{\phi}(t)$ is approximated more closely by a normal distribution, with variance $\phi^2/\{9d(t)\}$ [17]. Therefore, the test statistic $S(t)$ also has an approximately standard normal distribution under the null hypothesis. Under the alternative, the corresponding unstandardized statistic $U(t)$ is approximately normal and has an independent increment structure.
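As a rough illustration of how a statistic of the $Z(t)$ type could be computed, the sketch below treats the shape parameter $\kappa$ as known, estimates each $\lambda_j$ by maximum likelihood, and forms a Wald-type statistic for $\log(\lambda_1/\lambda_2)$ with variance $1/D_1+1/D_2(t)$. This is an assumption made for illustration and is not necessarily the exact definition used in the paper; the function names and the toy data are hypothetical.

```python
import math

def weibull_rate_mle(times, events, kappa):
    """MLE of lambda in S(t) = exp(-lambda * t**kappa), with kappa treated as known.

    times  : observed follow-up times (event or censoring time)
    events : 1 if the time is an event, 0 if it is censored
    Returns the estimate and the number of events.
    """
    d = sum(events)
    exposure = sum(t ** kappa for t in times)
    return d / exposure, d

def log_hazard_ratio_z(hist_times, hist_events, cur_times, cur_events, kappa):
    """Wald-type statistic for log(lambda_1 / lambda_2) (assumed form, see lead-in).

    Large positive values indicate a lower hazard in the current study.
    """
    lam1, d1 = weibull_rate_mle(hist_times, hist_events, kappa)
    lam2, d2 = weibull_rate_mle(cur_times, cur_events, kappa)
    return (math.log(lam1) - math.log(lam2)) / math.sqrt(1 / d1 + 1 / d2)

# Toy data, made up purely to show the call.
z = log_hazard_ratio_z([1, 2, 3, 4], [1, 1, 0, 1],
                       [2, 4, 6, 8], [1, 0, 1, 1], kappa=1.0)
print(round(z, 3))
```

At an interim analysis, the current-study inputs would be the data visible at calendar time $t$, while the historical inputs stay fixed.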

Sample Size for Fixed Sample Test
Because historical control data are obtained from previous trials, the sample size $n_1$ and the total number of events $D_1$ for the historical control group are known. Therefore, we only need to calculate the sample size for the current study for a fixed sample test at the end of the study. Under the null hypothesis, the test statistic $Z(t)$ at $t=\tau$ has an approximately standard normal distribution. Under the alternative $\delta=\lambda_1/\lambda_2\ (>1)$, $Z(\tau)$ is approximately normal with mean $\log(\delta)D^{1/2}(\tau)$ and unit variance, where $D(\tau)=D_1D_2(\tau)/\{D_1+D_2(\tau)\}$. Therefore, given a significance level $\alpha$, the power $(1-\beta)$ of the $Z(\tau)$ test under the alternative is given by

$$1-\beta=\Phi\{\log(\delta)D^{1/2}(\tau)-z_{1-\alpha}\},$$

where $\Phi(\cdot)$ is the standard normal distribution function and $z_{1-\alpha}=\Phi^{-1}(1-\alpha)$. Thus, the number of events required for the current study based on the $Z(\tau)$ test can be calculated by

$$D_2(\tau)=\left[\frac{\log^2(\delta)}{(z_{1-\alpha}+z_{1-\beta})^2}-\frac{1}{D_1}\right]^{-1},$$

where $\delta=(m_2/m_1)^{\kappa}$ and $D_1$ is the total number of events observed in the historical control data. Therefore, the sample size for the current group is given by

$$n_2=\frac{D_2(\tau)}{p_2(\tau)}, \tag{5}$$

where $p_2(\tau)$ is the probability of a subject from the current group having an event during the study. Similarly, the number of events required for the current study based on the $S(\tau)$ test can be calculated, and the sample size is again given by dividing that number of events by $p_2(\tau)$.

To calculate the number of subjects required for the study, we need to calculate $p_2(\tau)$, the probability of a subject in the current group having an event during the study. Typically, we assume that subjects are accrued over an accrual period of length $t_a$ with an additional follow-up period of length $t_f$, so that $\tau=t_a+t_f$. A subject enters the study at time $u$, the entry time is uniformly distributed on $[0,t_a]$, and no subject is lost to follow-up during the study. Then the probability of a subject having an event during the study under the Weibull model can be calculated by [18]

$$p_2(\tau)=1-\frac{1}{t_a}\int_{t_f}^{t_a+t_f}S_2(x)\,dx, \tag{7}$$

where $S_2(x)=\exp\{-\log(2)(x/m_2)^{\kappa}\}$ is the survival function of the current group. Therefore, given the design parameters $\delta$ (or $\kappa$), $m_1$, $m_2$, $\alpha$, $\beta$, $t_f$ and $t_a$, the number of subjects $n_2$ required for the current study can be calculated by $n_2=D_2(\tau)/p_2(\tau)$.

In designing an actual trial, calculating the sample size for a given accrual time $t_a$ is often impractical because it may not be possible to enroll the planned total number of subjects in the given accrual duration. It is more practical to design the study starting with the accrual rate $r$ and then calculate the required accrual time $t_a$. This can be accomplished under the Weibull model assumption. First, the integration in the probability formula (7) can be simplified by approximation, using the Simpson rule

$$p_2(\tau)\approx 1-\frac{1}{6}\{S_2(t_f)+4S_2(t_f+0.5t_a)+S_2(t_f+t_a)\}. \tag{8}$$

Then, combining the sample size formula based on equations (5) or (6) with equation (8), we can define a root function of the accrual time $t_a$

$$\mathrm{root}(t_a)=r\,t_a\left[1-\frac{1}{6}\{S_2(t_f)+4S_2(t_f+0.5t_a)+S_2(t_f+t_a)\}\right]-D_2(\tau).$$

Now the accrual time $t_a$ can be obtained by solving the root equation $\mathrm{root}(t_a)=0$ numerically in S-PLUS using the uniroot function. The total sample size required for the current study is approximately $n_2=[rt_a]^+$, where $[x]^+$ denotes the smallest integer greater than $x$.

Once the number of events or sample size is calculated for the fixed sample test, we can calculate the information time at the planned calendar time $t$ for the interim analysis by

$$t^*=\frac{(1+R)I}{1+RI}, \tag{10}$$

where $I=D_2(t)/D_2(\tau)$ is the information time for the current study and $R=D_2(\tau)/D_1$ is the ratio of the number of events of the current study to that of the historical control for the $Z(t)$ test, with the analogous quantities for the $S(t)$ test. This is called the transformed information time [12]. Because $D_1$ is known from the historical control data, under the Weibull model the information time $t^*$ can be obtained by calculating $D_2(t)=n_2p_2(t)$, which is the expected number of events in the current study up to time $t$, where $p_2(t)=P(\Delta_{12}(t)=1)$, the probability that a subject in the current study has had an event by calendar time $t$, can be calculated as

$$p_2(t)=\frac{1}{t_a}\int_0^{\min(t,\,t_a)}\{1-S_2(t-u)\}\,du. \tag{11}$$

When $t=\tau$, equation (11) is identical to equation (7).

For a maximum information trial, where the trial continues until a pre-specified number of events $D_2(\tau)$ is observed for the current study, the information time at the $k$th look, planned at $D_{2k}$ events for the current study, can be calculated from the corresponding transformed information time formula with $I_k=D_{2k}/D_2(\tau)$, for $Z(t)$ and $S(t)$, respectively.

Group Sequential Procedure
In this section, we apply an SCPRT procedure [1] to the test statistics $Z(t)$ and $S(t)$. The SCPRT has two unique features: (1) the maximum sample size of the sequential test is not greater than the size of the reference fixed sample test; and (2) the probability of discordance, that is, the probability that the conclusion of the sequential test would be reversed if the experiment were not stopped according to the stopping rule but continued to the planned end, can be controlled to an arbitrarily small level [12]. Furthermore, the power function of the SCPRT is virtually the same as that of the fixed sample test [1]. The SCPRT boundaries derived in our study have analytical solutions. All these features make the SCPRT attractive and simple to use.

Now we apply the SCPRT to the test statistics $Z(t)$ and $S(t)$. Let $B(t^*)$ denote the test statistic process on the transformed information time $t^*\in(0,1]$, which behaves approximately as a Brownian motion. Therefore, the conditional density $f(B(t^*)\mid B(1))$ is normal with mean $t^*B(1)$ and variance $t^*(1-t^*)$, and it does not depend on the drift. Let $z_{1-\alpha}$ be the critical value of $B(1)$ to reject the null for the fixed sample test. Then the conditional maximum likelihood ratio for the stochastic process on information time $t^*$ (see [1,19]) is

$$L(t^*,B(t^*)\mid z_{1-\alpha})=\frac{\max_{s>z_{1-\alpha}}f(B(t^*)\mid B(1)=s)}{\max_{s\le z_{1-\alpha}}f(B(t^*)\mid B(1)=s)}.$$

Taking the logarithm, the log-likelihood ratio can be simplified as

$$\log L(t^*,B(t^*)\mid z_{1-\alpha})=\pm\frac{\{B(t^*)-z_{1-\alpha}t^*\}^2}{2t^*(1-t^*)},$$

with the positive sign when $B(t^*)>z_{1-\alpha}t^*$ and the negative sign otherwise. Stopping when the absolute value of the log-likelihood ratio reaches $a$ gives the lower and upper boundaries for $B(t_k^*)$ at the $k$th look,

$$a_k=z_{1-\alpha}t_k^*-\{2a\,t_k^*(1-t_k^*)\}^{1/2},\qquad b_k=z_{1-\alpha}t_k^*+\{2a\,t_k^*(1-t_k^*)\}^{1/2}, \tag{12}$$

where $t_k^*$ is the information time at the $k$th look at calendar time $t_k$. The $a$ in (12) is the boundary coefficient, and it is crucial to choose an appropriate $a$ for the design such that the probability of the conclusion of the sequential test being reversed by the test at the planned end is small but not unnecessarily small. The larger the $a$, the smaller the discordance probability and the wider apart the upper and lower boundaries, which makes it harder for the sample path to reach the boundaries and stop early and results in larger expected sample sizes. Thus, an appropriate $a$ can be determined by choosing an appropriate discordance probability [1,19]. The nominal critical p-values for testing $H_0$ are

$$P_{a_k}=1-\Phi\left(a_k/\sqrt{t_k^*}\right),\qquad P_{b_k}=1-\Phi\left(b_k/\sqrt{t_k^*}\right).$$

The observed p-value at the $k$th look is

$$P_{B_k}=1-\Phi\left(B(t_k^*)/\sqrt{t_k^*}\right).$$

The stopping rule for monitoring the trial can be executed by stopping the trial when, for the first time, $P_{B_k}\ge P_{a_k}$ or $P_{B_k}\le P_{b_k}$.
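The boundary calculation can be sketched in a few lines. The code below assumes the usual SCPRT form for equation (12), namely lower and upper boundaries $z_{1-\alpha}t_k^*\mp\{2a\,t_k^*(1-t_k^*)\}^{1/2}$ for a Brownian motion on information time, together with nominal critical p-values $1-\Phi(\text{boundary}/\sqrt{t_k^*})$. The function names, the significance level, and the boundary coefficient `a = 3.0` are hypothetical choices for illustration; in practice $a$ would be chosen from the target discordance probability [1,19].

```python
from math import sqrt
from statistics import NormalDist

def scprt_boundaries(info_times, alpha, a):
    """Lower/upper boundaries and nominal critical p-values at each look.

    info_times : transformed information times t*_k in (0, 1]
    alpha      : one-sided significance level of the fixed sample test
    a          : boundary coefficient (assumed given)
    """
    norm = NormalDist()
    z = norm.inv_cdf(1 - alpha)          # critical value of the fixed sample test
    looks = []
    for t in info_times:
        half_width = sqrt(2 * a * t * (1 - t))
        lower = z * t - half_width
        upper = z * t + half_width
        looks.append({
            "t": t,
            "lower": lower,
            "upper": upper,
            "p_lower": 1 - norm.cdf(lower / sqrt(t)),  # stop for H0 if observed p >= this
            "p_upper": 1 - norm.cdf(upper / sqrt(t)),  # stop against H0 if observed p <= this
        })
    return looks

def decision(observed_p, look):
    """Apply the p-value form of the stopping rule at a single look."""
    if observed_p >= look["p_lower"]:
        return "stop: do not reject H0"
    if observed_p <= look["p_upper"]:
        return "stop: reject H0"
    return "continue"

# Hypothetical inputs: three looks, one-sided alpha = 0.05, a = 3.0.
for look in scprt_boundaries([0.436, 0.773, 1.0], alpha=0.05, a=3.0):
    print({k: round(v, 4) for k, v in look.items()})
```

At the final look ($t^*=1$) the two boundaries coincide at the fixed sample critical value, so the rule always terminates there with the same decision as the fixed sample test.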

Simulation Studies
In this section, we describe simulation studies conducted to compare the power and type I error of the proposed parametric test statistics $Z(t)$ and $S(t)$ under various scenarios. In the simulations, the survival distribution of the $j$th group was taken as

$$S_j(t)=\exp\{-\log(2)(t/m_j)^{\kappa}\},$$

which is the Weibull distribution with shape parameter $\kappa$ and median survival time $m_j$, $j=1,2$, where $j=1$ and $j=2$ represent the historical control and the current study, respectively. The shape parameter $\kappa$ was taken as 0.5, 1, and 2.0 to reflect cases of decreasing, constant, and increasing hazard functions, respectively. We assumed a median time-to-event $m_1=3.4657$ and a sample size $n_1=140$ for the historical control. The null hypothesis was set to $H_0: m_1=m_2$, and the hazard ratio $\delta=(m_2/m_1)^{\kappa}$ under the alternative ranged from 1.5 to 2.0. Furthermore, we assumed that subjects of the current study were recruited uniformly over the accrual period $t_a=4$ (years) and followed for $t_f=1$ (year), and that no subject was lost to follow-up during the study period $\tau=t_a+t_f=5$. Therefore, a subject was censored at calendar time $t$ if his or her event time was longer than $t-u$, where $u$ is the time when the subject entered the current study.
In Table 1, the sample sizes required for the current study were calculated by equations (5) and (6) for the tests $Z(t)$ and $S(t)$, respectively. In each design parameter configuration, 100,000 samples of censored event times were generated from the Weibull distribution to calculate the test statistics under the null or alternative hypothesis. The nominal significance level and power were set to 0.05 and 80%, respectively. Two simulation studies were done. The first studied the empirical type I error and power of the fixed sample tests. The second studied the empirical type I error and power of a two-stage SCPRT design with looks at calendar times $t_1=3$ and $t_2=5$. The simulated empirical type I errors and powers in various scenarios for the fixed sample tests and the two-stage SCPRT tests are summarized in Tables 1 and 2, respectively. The results for the fixed sample tests showed that the $S(\tau)$ test needs a slightly larger sample size for a small $\delta$ and a smaller sample size for a large $\delta$ compared with the $Z(\tau)$ test. The simulated empirical type I errors and powers were close to the nominal levels for the $S(\tau)$ test, whereas the $Z(\tau)$ test was somewhat overpowered for a large $\delta$. For the two-stage design, $S(t)$ had adequate empirical power and type I error, whereas the $Z(t)$ test was conservative and underpowered for a large $\delta$ in the first stage. Overall, the test statistic $S(t)$ performed better than $Z(t)$ and is recommended for use in the trial design. In addition, to check whether the sample size formula (5) and information time (10) developed for the $Z(t)$ test also work for the nonparametric log-rank test $L(t)$, the empirical type I errors and powers were simulated for the log-rank test as well (Tables 1 and 2). The results showed that both the sample size formula (5) and the transformed information time (10) worked well for the log-rank test. A rigorous derivation of these results for the log-rank test is left for future research.
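The data-generating step of such a simulation can be sketched as follows. This is a minimal sketch, not the authors' simulation code: it draws Weibull event times with a given median by inverting the survival function, assigns uniform entry times on $[0,t_a]$, and applies administrative censoring at a calendar time $t$. The function names, the seed, and the current-study sample size of 100 are hypothetical; the design values $m_1=3.4657$, $t_a=4$, and the looks at $t=3$ and $t=5$ come from the text.

```python
import math
import random

def weibull_time(median, kappa, rng):
    """Draw from S(t) = exp(-log(2) * (t / median)**kappa) by inversion."""
    u = 1.0 - rng.random()                      # uniform on (0, 1]
    return median * (math.log(1.0 / u) / math.log(2)) ** (1 / kappa)

def generate_subjects(n, median, kappa, t_a, rng):
    """One (entry time, event time) pair per subject, with uniform accrual on [0, t_a]."""
    return [(rng.uniform(0.0, t_a), weibull_time(median, kappa, rng))
            for _ in range(n)]

def observe(subjects, t):
    """Data visible at calendar time t: (observed time, event indicator) pairs.

    Subjects who have not yet enrolled are excluded; the rest are censored
    at t - entry if their event has not occurred by then.
    """
    data = []
    for entry, event_time in subjects:
        if entry >= t:
            continue
        follow_up = t - entry
        data.append((min(event_time, follow_up), int(event_time <= follow_up)))
    return data

rng = random.Random(2014)                       # hypothetical seed
kappa, m1, delta = 1.0, 3.4657, 1.5
m2 = m1 * delta ** (1 / kappa)                  # from delta = (m2 / m1)**kappa
subjects = generate_subjects(100, m2, kappa, t_a=4.0, rng=rng)  # n = 100 is illustrative
interim, final = observe(subjects, 3.0), observe(subjects, 5.0)
print(len(interim), sum(e for _, e in interim), len(final), sum(e for _, e in final))
```

Generating each subject once and then cutting the data at each look keeps the interim and final analyses nested, which a two-stage sequential test requires.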

An Example
Between January 1974 and May 1984, the Mayo Clinic conducted a double-blind randomized trial in primary biliary cirrhosis (PBC), comparing the drug D-penicillamine (DPCA) with a placebo (Fleming and Harrington, 1991). PBC is a rare but fatal chronic liver disease of unknown cause, with a prevalence of about 50 cases per million population. The primary pathologic event appears to be the destruction of interlobular bile ducts, which may be mediated by immunologic mechanisms. Of the 158 patients treated with DPCA, 65 died. The median survival time was 9 years. Suppose a new treatment is now available and investigators want to design a new trial using the Mayo Clinic patients treated with DPCA as the historical control group. The survival distribution of the DPCA data was estimated by the Kaplan-Meier method and the Weibull model. The Weibull distribution fitted the survival distribution well, with shape parameter $\kappa=1.22$ and scale parameter $\rho=11.8^{-1}$. Thus, to design the study, we can assume that the failure time of a patient on the current study follows the Weibull distribution with shape parameter $\kappa=1.22$ and median survival time $m_2$. Let $\delta=(m_2/m_1)^{\kappa}$ be the hazard ratio, where $m_1$ is the median survival time of the historical control. Our aim is to test the null hypothesis $\delta=1$ against the alternative $\delta>1$. The number of events required for the current study is 54 under the $Z(\tau)$ test, and the $S(\tau)$ test requires 54 events too.

Assume that the lengths of accrual and follow-up for the current study are $t_a=5$ and $t_f=3$ years, respectively, so that the study duration is $\tau=8$ years. Then the probability of having an event during the study for a subject on the current study can be calculated by numerical integration of equation (7).

Suppose that the test statistic $S(t)$ will be used to monitor the trial, and 3 looks are planned at calendar times $t_1=4$, $t_2=6$ and $t_3=8$ years. Then the transformed information times calculated by equations (14) and (15) are $t^*=(0.436, 0.773, 1)$, and the lower and upper boundaries are calculated by equation (12). The operating characteristics of the sequential test $S(t)$ for this example are given in Table 3.
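The design steps of this example can be sketched in code. The sketch assumes the $Z(\tau)$-type relation $1/D_2(\tau)=\log^2(\delta)/(z_{1-\alpha}+z_{1-\beta})^2-1/D_1$ for the number of events, uniform accrual on $[0,t_a]$, and the transformed information time $t^*=(1+R)I/(1+RI)$. The historical inputs ($D_1=65$, $\kappa=1.22$, $m_1=9$, $t_a=5$, $t_f=3$) come from the text; the alternative median `m2 = 14`, the significance level, and the power are hypothetical, since the design values used in the source are not recoverable here. All function names are illustrative, and the output is not claimed to reproduce the information times quoted above.

```python
import math
from statistics import NormalDist

def surv(t, median, kappa):
    """Weibull survival function with the given median and shape."""
    return math.exp(-math.log(2) * (t / median) ** kappa)

def events_required(delta, d1, alpha, power):
    """Current-study events for a Z-type test against d1 historical events (assumed formula)."""
    z = NormalDist().inv_cdf
    v = ((z(1 - alpha) + z(power)) / math.log(delta)) ** 2  # required D1*D2/(D1+D2)
    inv_d2 = 1 / v - 1 / d1
    if inv_d2 <= 0:
        raise ValueError("historical control has too few events for this effect size")
    return math.ceil(1 / inv_d2)

def event_prob(t, t_a, median, kappa, steps=200):
    """P(event by calendar time t) with uniform entry on [0, t_a].

    Composite Simpson rule on (1/t_a) * integral_0^min(t, t_a) {1 - S(t - u)} du;
    steps must be even.
    """
    upper = min(t, t_a)
    h = upper / steps
    f = lambda u: 1 - surv(t - u, median, kappa)
    total = f(0.0) + f(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3 / t_a

def transformed_information_time(d2_t, d2_tau, d1):
    """t* = (1 + R) I / (1 + R I) with I = D2(t)/D2(tau) and R = D2(tau)/D1."""
    i, r = d2_t / d2_tau, d2_tau / d1
    return (1 + r) * i / (1 + r * i)

kappa, d1, t_a, t_f = 1.22, 65, 5.0, 3.0   # from the example
m1, m2 = 9.0, 14.0                          # m2 is a hypothetical alternative
alpha, power = 0.05, 0.90                   # hypothetical design values
delta = (m2 / m1) ** kappa
d2 = events_required(delta, d1, alpha, power)
p_tau = event_prob(t_a + t_f, t_a, m2, kappa)
n2 = math.ceil(d2 / p_tau)
t_star = [transformed_information_time(n2 * event_prob(t, t_a, m2, kappa), n2 * p_tau, d1)
          for t in (4.0, 6.0, 8.0)]
print(d2, n2, [round(x, 3) for x in t_star])
```

Starting from an accrual rate instead of an accrual time would replace the last few lines with a one-dimensional root search over $t_a$, as described for the root function in the sample size section.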

Conclusion
We proposed two parametric sequential tests for group sequential trial design against historical controls. Simulation results showed that the empirical power and type I error of the $S(t)$ test are close to the nominal levels and that it outperforms the $Z(t)$ test. Hence, we recommend using the $S(t)$ test for historical control trial designs under the Weibull model. We derived the transformed information times $t^*=(1+R)I/(1+RI)$ for both test statistics $Z(t)$ and $S(t)$. It is simple and convenient to use the transformed information time $t^*$ to derive the sequential monitoring rule for the historical control trial design based on the SCPRT procedure. With this monitoring procedure, data from the current study are sequentially collected and compared with data from the historical control. This allows investigators to monitor the trial at any calendar time of enrollment or at a pre-specified number of events for an interim look. The number of events required for the current study can be calculated by a simple formula. Therefore, the study design is much simpler than that of the method for survival data proposed by Xiong et al. [12], in which the information times of the sequential test statistic are random and depend on the data instead of being predetermined. The maximum sample size of the sequential test is the same as that for the fixed sample test, and the group sequential boundaries have analytical solutions. Therefore, the proposed group sequential procedure is effective and simple to use. For study design purposes, we need only the number of events from the historical control data. However, for trial monitoring and final data analyses, we need the full failure time data from the historical control study to calculate the sequential test statistic $Z(t_k)$ or $S(t_k)$. In practice, the historical control data are often available from previous trials done by the same institution or the same sponsor. If no such historical control data are available from the same institution, the relevant data need to be extracted from the published literature. Recently, Guyot et al. [20] proposed a method for reconstructing survival data from published Kaplan-Meier survival curves. Thus, designing survival trials with historical controls is feasible using control data from the published literature.
Finally, even though the sample size formula (5) and the transformed information time (10) were derived for the $Z(t)$ test under the Weibull model, our simulation results showed that they also work well for the nonparametric log-rank test under the proportional hazards model. A rigorous derivation of these results for the log-rank test is left for future research.