Department of Medicine, Imperial College, London SW10 9NH, UK
Received Date: May 22, 2017; Accepted Date: May 30, 2017; Published Date: June 12, 2017
Citation: Longford NT (2017) A Decision-Theoretical Perspective on Bioequivalence. J Bioequiv Availab 9:437-438. doi: 10.4172/jbb.1000339
Copyright: © 2017 Longford NT. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Bioequivalence is a term used for the property of two treatments, formulations, or medical products (henceforth treatments) that their effects in a specified population are identical, or that the treatments can be interchanged without any differential therapeutic impact. In most contexts, ‘identical’ is qualified by ‘for all intents and purposes’, is acknowledged to mean ‘similar’, or is meant to be interpreted as such. In established approaches, providing evidence of bioequivalence amounts to rejecting the hypothesis that the difference of the average effects of the two treatments, Δ, is distant from zero, that is, that the two average treatment effects are dissimilar. A study to provide such evidence should start by defining the borderline between similar and dissimilar. It comprises a positive and a negative value, δ+ > 0 and δ− < 0, or the interval they delimit, which contains zero. If we knew that the treatment effect Δ is within this interval we would declare bioequivalence (B), and would declare dissimilarity (D) otherwise. Setting both δ+ and δ− to zero corresponds to a dichotomy that is false because with such a degenerate borderline it would be safe to conclude that bioequivalence is absent. After all, there are uncountably many alternatives to the exact zero, and uncountably many of them are arbitrarily close to zero.
In the presence of uncertainty, ubiquitous in studies with finite samples and inherent variation of their subjects, the conclusion of a study is a verdict, bioequivalence (b) or dissimilarity (d). Although we use the same terms for the possible reality and for the available verdicts, it is essential to distinguish between B and b on the one hand, and D and d on the other. The pairs (B, b) and (D, d) correspond to appropriate or correct verdicts, and (B, d) and (D, b) to inappropriate or erroneous verdicts; (B, d) is known as a false negative and (D, b) as a false positive.
Sponsors of bioequivalence studies who have integrity hope for B and b; those with integrity undermined by short-term goals reduce their focus to b. The regulator may be more impartial. Both parties contemplate two courses of action (options): approval (α), permitting the proposed treatment to be introduced in the market, and dismissal (δ), rejecting the sponsor’s proposal, causing a failure of the sponsor’s project brought on (usually) by the regulator’s attempt to maintain the integrity of the market involved in health care. The two kinds of error, (B, d) and (D, b), are committed with nontrivial conditional probabilities given B and D, respectively.
The core of the argument presented in this note is that, as a means of controlling or managing these errors, the hypothesis test is deficient because it is oblivious to their consequences (ramifications). Choosing between a pair of mutually exclusive and complementary options, such as α and δ, or b and d, when we are uncertain as to which would result in a superior future (outcome), is a problem commonly encountered in all activities in our scientific, business and private lives. Unlike in the perspective committed to hypothesis testing, we look beyond the conditional (or hypothetical) probabilities of making the correct choice; we contemplate the consequences in earnest. We have our private, institutional or corporate currency for error, which reflects our value judgements, priorities and remits. We treat this currency like monetary funds, and do our utmost to be frugal with it. Codifying and quantifying these priorities and value judgements is a complex and open-ended process, often confounded by changes in our perspectives and experiences.
The failure of the hypothesis test to incorporate these consequences disqualifies it from rational statistical practice, especially from applications in which the stakes are not trivial. By hypothesis testing, we subscribe to an arbitrary 5% rule that lacks any profundity and can hardly be justified by the depth to which this convention is ingrained. Without caring about the consequences, or by subscribing to the default setting implied by the convention, we render the analysis inconsequential.
Alternative approaches that address this deficiency are presented by Lindley (1998) and Longford (2016), who recast the problem of providing evidence about bioequivalence and other issues in pharmaceutical statistics in the framework of decision theory (Lindley, 1985). These approaches require two key inputs additional to the data collected from the subjects: the borderline between bioequivalence (similarity) and dissimilarity, (δ−, δ+), and the losses associated with the two kinds of error that can be committed: (D, b) and (B, d). In most settings, we have symmetry, δ+ = −δ− > 0, so that a single value δ = δ+ specifies the borderline.
The losses, LDB and LBD, quantify the consequences of the erroneous verdicts (D, b) and (B, d), respectively, in our currency. The losses may be functions of the treatment effect Δ, but a milestone is reached even by considering constants for them. Instead of losses, gains can be specified. They are zero for the incorrect verdicts and positive, but usually not equal, for the two correct verdicts. The scale used for the losses is the currency for error, and the scale for gains is a currency for kudos, profit, or the like. Motivated by common accounting practices, we may specify a cost for each decision (choice between two verdicts). It would introduce a counterweight to frivolous decision making.
The borderline δ and the relative loss R = LDB/LBD are elicited from experts who have a direct stake in the outcome of the study. Negotiation between the regulator and the sponsor may be necessary, or the regulator may set them, like a ‘gamekeeper’ representing the (future) constituency of consumers (patients to be treated). The inevitably contentious nature of setting δ and R can be ameliorated by settling on plausible ranges for them. Specifying these two parameters, or their ranges, is a burden additional to the requirements in more established approaches, but it is hard to argue that this information is irrelevant, and that the analysis might satisfactorily proceed without it. When the values of these parameters are set prior to conducting the study, they enhance the transparency of the regulatory process and, indeed, inform the sample size calculation.
For those who expect complex and obscure calculations, the solution comes as an anti-climax. Simply, we issue the verdict, b or d, that has the smaller expected loss. In the Bayesian paradigm, the expected losses are evaluated using the posterior distribution of the average treatment effect. In the frequentist paradigm, the fiducial distribution is used.
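The rule can be illustrated with a minimal sketch. It assumes, purely for illustration, that the posterior (or fiducial) distribution of Δ is normal and that the two losses are constants; the function names, the argument names and the numerical values are hypothetical and do not come from the cited papers. With constant losses, the expected loss of verdict b is LDB × P(D | data) and the expected loss of verdict d is LBD × P(B | data), so verdict b is issued exactly when P(B | data) > R/(1 + R).

```python
from statistics import NormalDist


def expected_losses(post_mean, post_sd, delta_minus, delta_plus, loss_Db, loss_Bd):
    """Return (expected loss of verdict b, expected loss of verdict d).

    Assumption: the posterior (or fiducial) distribution of the average
    treatment effect Delta is normal with the given mean and standard
    deviation, and both losses are constants.
    """
    posterior = NormalDist(mu=post_mean, sigma=post_sd)
    # Probability of bioequivalence B: delta_minus < Delta < delta_plus
    p_B = posterior.cdf(delta_plus) - posterior.cdf(delta_minus)
    p_D = 1.0 - p_B
    # Verdict b is erroneous only when D holds (false positive);
    # verdict d is erroneous only when B holds (false negative).
    return loss_Db * p_D, loss_Bd * p_B


def verdict(post_mean, post_sd, delta_minus, delta_plus, loss_Db, loss_Bd):
    """Issue the verdict, 'b' or 'd', with the smaller expected loss."""
    loss_b, loss_d = expected_losses(
        post_mean, post_sd, delta_minus, delta_plus, loss_Db, loss_Bd
    )
    return "b" if loss_b < loss_d else "d"


if __name__ == "__main__":
    # Hypothetical inputs: posterior N(0.02, 0.05^2), borderline (-0.1, 0.1)
    for relative_loss in (10, 20):
        print(relative_loss, verdict(0.02, 0.05, -0.1, 0.1, relative_loss, 1))
```

In this hypothetical setting the same data lead to different verdicts for R = 10 and R = 20, which shows why the relative loss cannot be left out of the analysis.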
One might argue that the consequences can be incorporated in the analysis after the ‘formal’ and objective statistical analysis. However, the evaluations that incorporate the losses, or gains, involve a calculus (integration) that assigns it firmly to the remit of statistics. The borderline and the losses (or loss functions) have a role in the protocol for a study because they can and should be established prior to the conduct of the study, and the sample size calculation should be informed by them.
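When only plausible ranges for δ and R can be agreed, one simple check is whether the verdict is the same throughout the ranges. The sketch below is illustrative only: it assumes a symmetric borderline (−δ, δ), constant losses and a normal posterior (or fiducial) distribution of Δ, and every name and number in it is hypothetical.

```python
from itertools import product
from statistics import NormalDist


def verdict(post_mean, post_sd, delta, R):
    """Verdict for symmetric borderline (-delta, delta) and relative loss R.

    Assumption: normal posterior for Delta and constant losses, with the
    loss for the false negative (B, d) scaled to 1.
    """
    posterior = NormalDist(mu=post_mean, sigma=post_sd)
    p_B = posterior.cdf(delta) - posterior.cdf(-delta)
    # Expected loss of b is R * P(D); expected loss of d is P(B)
    return "b" if R * (1.0 - p_B) < p_B else "d"


def verdicts_over_ranges(post_mean, post_sd, deltas, ratios):
    """Evaluate the verdict for every pair of plausible (delta, R) values."""
    return {
        (delta, R): verdict(post_mean, post_sd, delta, R)
        for delta, R in product(deltas, ratios)
    }


def unanimous(verdicts):
    """True when all plausible (delta, R) pairs lead to the same verdict."""
    return len(set(verdicts.values())) == 1


if __name__ == "__main__":
    deltas = [0.08, 0.10, 0.12]   # hypothetical plausible borderlines
    ratios = [5, 10, 20]          # hypothetical plausible relative losses
    results = verdicts_over_ranges(0.02, 0.05, deltas, ratios)
    for key, value in sorted(results.items()):
        print(key, value)
    print("unanimous:", unanimous(results))
```

If the verdict is not unanimous, the ranges are too wide for the data at hand, and either a larger study or a narrower agreement on δ and R would be needed before a verdict can be issued without controversy.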