Department of Animal Science, Laboratory of Animal Biotechnology, Federal University of São Carlos, Campus Lagoa do Sino, Brazil
Received Date: March 18, 2016; Accepted Date: March 21, 2016; Published Date: March 25, 2016
Citation: Chiba Y (2016) Causal Inference in Randomized Trials: A Shift from the Sharp Causal Null Hypothesis to the Weak Causal Null Hypothesis. J Biom Biostat 7: 288. doi:10.4172/2155-6180.1000288
Copyright: © 2016 Chiba Y. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
In randomized trials, statistical inference about the average causal effect (ACE) of a treatment compared with a control treatment is usually desired for a particular outcome. Often, in addition to estimating the ACE, a confidence interval (CI) is calculated and a hypothesis test is performed. Nevertheless, many tests currently used in randomized trials do not allow any statistical inference about the ACE unless strict assumptions are satisfied. For example, the permutation test, which corresponds to Fisher’s exact test in the case of a binary outcome, is a test of the sharp causal null hypothesis (i.e., the treatment has no effect on any subject), but not of the weak causal null hypothesis (i.e., the risk of the outcome would be the same if all subjects received the treatment as if all received the control). In this article, I discuss causal inference in randomized trials with a binary outcome. First, I show that a hypothesis test of the sharp causal null hypothesis is generally different from a test of the weak causal null hypothesis, as I showed in a recent publication. Next, using hypothetical data, I demonstrate that previously proposed CIs linked to exact tests are not informative about the ACE. Finally, I discuss future prospects for causal inference in randomized trials.
For demonstration purposes, I use the hypothetical small-sample data shown in Table 1.
Table 1: Hypothetical data with a small sample size.
Here, $X$ denotes the treatment and $Y$ the binary outcome. Let $Y(x)$ denote the potential outcome of each subject under $X = x$; let $n_{st}$ denote the number of subjects with $(Y(1), Y(0)) = (s, t)$, where $s, t \in \{0, 1\}$; and let $n$ be the total number of subjects. Then every subject has $(Y(1), Y(0)) = (1, 1)$, $(1, 0)$, $(0, 1)$, or $(0, 0)$, and $\sum_{s,t} n_{st} = n$. In randomized trials, one generally seeks inferences about the ACE, that is, a comparison of $\Pr(Y(1) = 1)$ and $\Pr(Y(0) = 1)$. The null hypothesis of interest is therefore the following weak causal null hypothesis:
$$H_0\colon \Pr(Y(1) = 1) = \Pr(Y(0) = 1).$$
Using $n_{st}$, $\Pr(Y(1) = 1)$ and $\Pr(Y(0) = 1)$ can be expressed as $(n_{11} + n_{10})/n$ and $(n_{11} + n_{01})/n$, respectively, so the weak causal null hypothesis corresponds to
$$H_0\colon n_{10} = n_{01}.$$
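A minimal Python sketch of the mapping from the counts $n_{st}$ to the two potential risks, using exact fractions; the function names `potential_risks` and `weak_null_holds` are hypothetical. It makes explicit that the causal RD equals $(n_{10} - n_{01})/n$, so the weak causal null depends only on the two discordant counts.

```python
from fractions import Fraction

def potential_risks(n11, n10, n01, n00):
    """Return (Pr(Y(1) = 1), Pr(Y(0) = 1)) for a finite population in which
    n_st subjects have potential outcomes (Y(1), Y(0)) = (s, t)."""
    n = n11 + n10 + n01 + n00
    risk1 = Fraction(n11 + n10, n)  # subjects with Y(1) = 1
    risk0 = Fraction(n11 + n01, n)  # subjects with Y(0) = 1
    return risk1, risk0

def weak_null_holds(n11, n10, n01, n00):
    """Weak causal null Pr(Y(1) = 1) = Pr(Y(0) = 1), equivalent to n10 == n01."""
    risk1, risk0 = potential_risks(n11, n10, n01, n00)
    return risk1 == risk0

print(potential_risks(4, 1, 1, 4))  # (Fraction(1, 2), Fraction(1, 2))
print(weak_null_holds(4, 1, 1, 4))  # True
print(weak_null_holds(4, 2, 1, 3))  # False: risks are 6/10 and 5/10
```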
Although the null hypothesis of interest is the weak causal null hypothesis, the commonly used hypothesis tests do not test it. For instance, Fisher’s exact test, the exact test most commonly used to evaluate $2 \times 2$ contingency tables, is a test of the following sharp causal null hypothesis [1,2]:
$$H_0\colon Y(1) = Y(0) \text{ for all subjects.}$$
Under this null hypothesis, every subject has $(Y(1), Y(0)) = (1, 1)$ or $(0, 0)$. Because no subject with $(Y(1), Y(0)) = (1, 0)$ or $(0, 1)$ exists, the sharp causal null hypothesis corresponds to
$$H_0\colon n_{10} = n_{01} = 0.$$
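Because the sharp causal null fixes every subject's outcome, its exact test can be computed by enumerating all possible assignments. Below is a minimal Python sketch, assuming that exactly 5 of the 10 subjects are assigned to each group and that Table 1 has 1 event among 5 subjects with $X = 1$ and 3 events among 5 subjects with $X = 0$ (counts reconstructed from the assignment described in the text; they reproduce the bounds reported later). The function name `sharp_null_permutation_test` is hypothetical, and the two-sided rule (summing the probabilities of all values of the statistic no more likely than the observed one) is one common convention, not necessarily the one used in the article.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

def sharp_null_permutation_test(y_treated, y_control):
    """Exact randomization test of the sharp causal null Y(1) = Y(0) for all
    subjects, assuming the group sizes are fixed by design.

    Under the sharp null each subject's outcome does not depend on assignment,
    so the pooled outcomes are fixed and only the assignment is random."""
    outcomes = list(y_treated) + list(y_control)
    n, m = len(outcomes), len(y_treated)
    observed = sum(y_treated)  # statistic: number of events in group X = 1
    # Enumerate every equally likely way of assigning m subjects to X = 1.
    dist = Counter(sum(outcomes[i] for i in idx)
                   for idx in combinations(range(n), m))
    total = sum(dist.values())
    prob = {k: Fraction(c, total) for k, c in dist.items()}
    # Two-sided p-value: total probability of values no more likely than observed.
    return sum(p for p in prob.values() if p <= prob[observed])

p = sharp_null_permutation_test([1, 0, 0, 0, 0], [1, 1, 1, 0, 0])
print(p, float(p))  # 11/21 0.5238095238095238
```

The enumeration is exact only because, under the sharp null, the pooled outcomes do not change with the assignment; under the weak null alone the discordant subjects' outcomes do change, so this construction does not carry over.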
Clearly, the sharp causal null hypothesis is a special case of the weak causal null hypothesis, so the proposition “the weak causal null hypothesis holds if the sharp causal null hypothesis holds” is true. However, its inverse, “the weak causal null hypothesis does not hold if the sharp causal null hypothesis does not hold,” is not true. This can be illustrated using the hypothetical data in Table 1. Assume that the 10 subjects in Table 1 comprise $(n_{11}, n_{10}, n_{01}, n_{00}) = (4, 1, 1, 4)$, and that Table 1 arose when 1, 0, 1, and 3 subjects of these four types, respectively, were randomly assigned to the group $X = 1$. Because $n_{10} = n_{01} = 1$, the sharp causal null hypothesis does not hold, but the weak causal null hypothesis does. This shows that the sharp causal null hypothesis can be rejected even when $\Pr(Y(1) = 1) - \Pr(Y(0) = 1) = 0$. In other words, rejection of the sharp causal null hypothesis does not imply that $\Pr(Y(1) = 1) - \Pr(Y(0) = 1) \neq 0$.
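The construction of Table 1 can be replayed in a few lines of Python: each subject with $X = 1$ reveals $Y(1)$ and each subject with $X = 0$ reveals $Y(0)$. The dictionary layout and the function `observed_table` are illustrative choices, not from the article; the resulting counts are derived from the composition and assignment stated above.

```python
from fractions import Fraction

# Counts of each (Y(1), Y(0)) type: (n11, n10, n01, n00) = (4, 1, 1, 4).
composition = {(1, 1): 4, (1, 0): 1, (0, 1): 1, (0, 0): 4}
# Numbers of each type randomly assigned to X = 1: (1, 0, 1, 3).
assigned_to_treatment = {(1, 1): 1, (1, 0): 0, (0, 1): 1, (0, 0): 3}

def observed_table(composition, assigned):
    """Return {x: (events, non-events)} for the observed 2 x 2 table."""
    table = {1: [0, 0], 0: [0, 0]}
    for (y1, y0), total in composition.items():
        k = assigned[(y1, y0)]
        # Index 0 counts Y = 1, index 1 counts Y = 0.
        table[1][1 - y1] += k          # treated subjects of this type show Y(1)
        table[0][1 - y0] += total - k  # control subjects of this type show Y(0)
    return {x: tuple(v) for x, v in table.items()}

n11, n10, n01, n00 = (composition[t] for t in [(1, 1), (1, 0), (0, 1), (0, 0)])
n = n11 + n10 + n01 + n00
print(observed_table(composition, assigned_to_treatment))  # {1: (1, 4), 0: (3, 2)}
print(Fraction(n11 + n10, n) == Fraction(n11 + n01, n))    # True: weak null holds
print(n10 == 0 and n01 == 0)                               # False: sharp null fails
```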
Nevertheless, the inverse is true under the following monotonicity assumption:
Monotonicity assumption: $Y(1) \le Y(0)$ for all subjects.
This assumption implies that no subject has $(Y(1), Y(0)) = (1, 0)$, i.e., $n_{10} = 0$. The sharp causal null hypothesis then fails exactly when $n_{01} > n_{10} = 0$, and in that case the weak causal null hypothesis $n_{10} = n_{01}$ also fails. Consequently, the sharp causal null hypothesis is equivalent to the weak causal null hypothesis under the monotonicity assumption.
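The equivalence under monotonicity can be checked by brute force over every composition of a population of 10 subjects. This Python sketch (the helper names `compositions`, `weak`, and `sharp` are hypothetical) confirms that once $n_{10} = 0$ is imposed the two nulls agree for every composition, whereas without the restriction compositions such as $(4, 1, 1, 4)$ satisfy the weak null but not the sharp null.

```python
from itertools import product

def compositions(n):
    """All (n11, n10, n01, n00) with non-negative entries summing to n."""
    for n11, n10, n01 in product(range(n + 1), repeat=3):
        n00 = n - n11 - n10 - n01
        if n00 >= 0:
            yield n11, n10, n01, n00

def weak(c):   # Pr(Y(1) = 1) = Pr(Y(0) = 1)  <=>  n10 == n01
    return c[1] == c[2]

def sharp(c):  # Y(1) = Y(0) for all subjects  <=>  n10 == n01 == 0
    return c[1] == 0 and c[2] == 0

n = 10
monotone = [c for c in compositions(n) if c[1] == 0]  # Y(1) <= Y(0): n10 = 0
print(all(weak(c) == sharp(c) for c in monotone))              # True
print(any(weak(c) and not sharp(c) for c in compositions(n)))  # True
```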
Next, I discuss previously proposed CIs linked to exact tests. The limits of a CI for the ACE cannot lie outside the nonparametric bounds, which are given by
$$\Pr(Y = 1, X = x) \le \Pr(Y(x) = 1) \le 1 - \Pr(Y = 0, X = x).$$
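These bounds follow from filling in the unobserved potential outcomes with all zeros (lower bound) or all ones (upper bound). A minimal Python sketch, assuming Table 1 has 1 event among 5 subjects with $X = 1$ and 3 events among 5 subjects with $X = 0$ (reconstructed from the assignment described earlier); the function name `potential_risk_bounds` is hypothetical.

```python
from fractions import Fraction

def potential_risk_bounds(events, size, n):
    """Bounds on Pr(Y(x) = 1) in a finite population of n subjects, given that
    `events` of the `size` subjects with X = x had Y = 1.
    Lower bound: every unobserved Y(x) is 0; upper bound: every one is 1."""
    p_y1 = Fraction(events, n)         # Pr(Y = 1, X = x)
    p_y0 = Fraction(size - events, n)  # Pr(Y = 0, X = x)
    return p_y1, 1 - p_y0

n = 10
print(potential_risk_bounds(1, 5, n))  # (Fraction(1, 10), Fraction(3, 5)) for X = 1
print(potential_risk_bounds(3, 5, n))  # (Fraction(3, 10), Fraction(4, 5)) for X = 0
```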
This is because the nonparametric bounds delimit the range within which the ACE must lie. For the data in Table 1, the nonparametric bounds for the causal odds ratio (OR) are $0.028 \le \text{causal OR} \le 3.500$. However, the usual and matching exact CIs based on the hypergeometric distribution, which are linked to Fisher’s exact test, yield 95% CIs of (0.003, 4.586) and (0.005, 3.172), respectively; the conditional maximum likelihood estimate of the OR is 0.203. Both lower limits are smaller than the lower bound of 0.028 (and the upper limit of the usual CI exceeds the upper bound of 3.500), so these 95% CIs include values that the causal OR cannot take. Consequently, these exact CIs for the OR are not exact CIs for the causal OR.
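Combining the bounds on the two potential risks reproduces the causal-OR bounds quoted above. A Python sketch, assuming the potential-risk bounds for Table 1 are [0.1, 0.6] under $X = 1$ and [0.3, 0.8] under $X = 0$ (reconstructed from the counts implied by the text); the CI limits are copied from the paragraph above, and the helper `odds` is hypothetical.

```python
from fractions import Fraction

def odds(p):
    return p / (1 - p)

# Assumed bounds on Pr(Y(1) = 1) and Pr(Y(0) = 1) for the Table 1 counts.
lo1, hi1 = Fraction(1, 10), Fraction(6, 10)
lo0, hi0 = Fraction(3, 10), Fraction(8, 10)

# The two risks depend on different unobserved outcomes, so they vary independently;
# the causal OR rises with Pr(Y(1) = 1) and falls with Pr(Y(0) = 1), so its
# extremes sit at opposite corners of the box of bounds.
or_lo = odds(lo1) / odds(hi0)
or_hi = odds(hi1) / odds(lo0)
print(round(float(or_lo), 3), round(float(or_hi), 3))  # 0.028 3.5

reported = {"usual": (0.003, 4.586), "matching": (0.005, 3.172)}
for name, (lower, upper) in reported.items():
    # Does each limit fall outside the range the causal OR can take?
    print(name, lower < or_lo, upper > or_hi)
# usual True True
# matching True False
```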
On the risk difference (RD) scale, the Santner-Snell CI, an exact CI linked to Barnard’s exact test, yields a 95% CI of (-0.867, 0.305), whereas the bounds for the causal RD are $-0.700 \le \text{causal RD} \le 0.300$. Because both limits fall outside these bounds, the Santner-Snell CI is not an exact CI for the ACE either.
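The causal-RD bounds come from the same potential-risk bounds, taking the smallest treated risk minus the largest control risk and vice versa. A Python sketch, again assuming potential-risk bounds of [0.1, 0.6] under $X = 1$ and [0.3, 0.8] under $X = 0$ for Table 1; the Santner-Snell limits are copied from the text.

```python
from fractions import Fraction

# Assumed bounds on Pr(Y(1) = 1) and Pr(Y(0) = 1) for the Table 1 counts.
lo1, hi1 = Fraction(1, 10), Fraction(6, 10)
lo0, hi0 = Fraction(3, 10), Fraction(8, 10)

# Causal RD = Pr(Y(1) = 1) - Pr(Y(0) = 1); extremes at opposite corners.
rd_lo, rd_hi = lo1 - hi0, hi1 - lo0
print(float(rd_lo), float(rd_hi))  # -0.7 0.3

santner_snell = (-0.867, 0.305)  # 95% CI reported in the text
print(santner_snell[0] < rd_lo, santner_snell[1] > rd_hi)  # True True
```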
Finally, I discuss future prospects for causal inference in randomized trials. As mentioned above, inference about the ACE is generally of greatest interest. Nevertheless, the existing hypothesis tests and the CIs linked to them are, in general, not appropriate for evaluating the ACE. Fisher’s exact test, the exact test commonly used to evaluate $2 \times 2$ contingency tables, is a test of the sharp causal null hypothesis, and, unfortunately, rejection of the sharp causal null hypothesis does not in general imply that $\Pr(Y(1) = 1) - \Pr(Y(0) = 1) \neq 0$.
To improve the quality of statistical inference about the ACE in randomized trials, hypothesis tests for the weak causal null hypothesis and the CIs linked to them require further attention. Although a few methods for binary outcomes have been developed recently [1,8-10], no well-established method exists yet. New methods of sample size calculation are also required, as are algorithms that allow such newly developed methods to be computed efficiently.
Furthermore, the methods need to be tailored to the trial design. They must be applicable not only to superiority trials but also to non-inferiority trials, and they will differ according to the type of randomization applied. If simple (or, equivalently, complete) randomization is employed, the method must not require the number of subjects assigned to each group to be fixed. For trials that use restricted randomization, a stratified analysis is preferable to a crude analysis.
It is also important, in some settings, to consider whether an assumption actually holds. For instance, the monotonicity assumption is likely to be reasonable in many vaccine trials, in which no subject would become infected if vaccinated but remain uninfected if not vaccinated.
For the hypothetical data in Table 2, Chiba’s conditional exact test yields an RD of -0.100 (95% CI: -0.200, -0.014; two-sided p-value = 0.034) under the monotonicity assumption, but an RD of -0.100 (95% CI: -0.207, 0.007; two-sided p-value = 0.074) without it. The latter result, whose CI includes zero, may give a less favorable impression of the treatment than the former. Thus, the conclusions drawn from a randomized trial may be wrong if monotonicity is assumed when it does not hold, or is not assumed when it does. When considering assumptions, cooperation between clinicians and biostatisticians is essential.
Table 2: Hypothetical data used in Chiba.
Hypothesis tests and CIs in randomized trials are not a new problem; indeed, the problem arose several decades ago. However, the ACE-related concerns described above are new, and further work is needed.
This work was supported in part by a Grant-in-Aid for Scientific Research (No. 15K00057) from the Japan Society for the Promotion of Science.