Fernando Pires Hartwig^{*}  
Postgraduate Program in Epidemiology, Department of Social Medicine, Faculty of Medicine, Federal University of Pelotas, Brazil  
Corresponding Author :  Fernando Pires Hartwig Postgraduate Program in Epidemiology Department of Social Medicine, Faculty of Medicine Federal University of Pelotas, Brazil Tel: (5553)81347172 Email: [email protected] 
Received July 19, 2013; Accepted September 27, 2013; Published September 30, 2013  
Citation: Hartwig FP (2013) TwoTailed PValues Calculation in Permutation Based Tests: A Warning Against “Asymptotic Bias” in Randomized Clinical Trials. J Clin Trials 3:145. doi:10.4172/21670870.1000145  
Copyright: © 2013 Hartwig FP, et al. This is an openaccess article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. 
Visit for more related articles at Journal of Clinical Trials
The rapid developments in computer science worldwide are enabling the routine use of computationintensive methods for statistical applications. Among such methods, the permutation method is of particular interest since it allows robust calculation (regarding test assumptions) of the Pvalue based on an empirical null distribution. Moreover, this approach fits well with the general design and rationale of randomized clinical trials, indicating the potential of such method for studies with this design. In this commentary, a discussion to clarify the inadequacy of applying asymptotic reasoning for calculating twosided Pvalues for permutationbased tests is considered, since such mistake can be observed in modern teaching literature and is of great concern for cases when the empirical null distribution is asymmetric and/or the Pvalue is close to the predefined α level. Moreover, the suitability of permutationbased tests to analyze results from randomized clinical trials indicates that such mistake has to be stressed to the clinical research community in order to avoid incorrect analyses and misinterpretations of such studies.
Keywords 
Permutation; Randomized clinical trial; Twosided Pvalue; Asymptotic reasoning 
Introduction 
Parallels between permutation and randomized clinical trials 
Statistics has been applied in medical research for several years at an increasingly rate, and its importance to this field is hardly arguable. The core of statistical thinking has been developed in times when powerful computational tools were not available, and its admirable how brilliant the early statisticians were in developing theoretical (asymptotic) probability distributions to which approximations could be done, thus allowing the estimation of the probabilities that the observed results occurred due to mere sampling variation (i.e., under the null hypothesis). Such measure, usually called the Pvalue, although sometimes overrated as a threshold of “statistical significance” (with its meaning being more interpretable when associated with respective confidence intervals), has a central role in hypothesis testing [1]. 
The rapid development of computer science and access of computational resources worldwide provided additional tools for statisticians to better calculate central concepts such as the Pvalue. Among such tools there are the socalled resampling methods, which allow the calculation of empirical distributions (through computationintensive methods) to obtain, for example, confidence intervals for a given statistic (based on an empirical sampling distribution obtained by bootstrapping) or calculate Pvalues (based on an empirical null distribution) [2]. Perhaps the main advantage of this approach is the calculation of the “true null distribution” for the data at hand, which results in better estimates regarding a given exposureoutcome association. Focusing on the calculation of Pvalues, the rationale underlying the randomized clinical trial design is particularly illustrative: under the null, the values of a quantitative variable (i.e., outcome) that indicates the efficacy of the treatment are independent on whether or not that given experimental unit (in this case, the patient) received the treatment or the placebo. Then, the values of such quantitative variable can be shuffled, and the relevant statistic (any statistic, even ones not previously described in the literature  provided its adequacy) calculated. By repeating this process several times, an empirical distribution of the statistic is obtained and can be used to calculate the Pvalue of the observed statistic, calculated using the original (i.e., unshuffled) data. 
The possibility of calculating null distributions that better fit the data is a strong motivation for the use of permutation (normally based on an approximate permutation process, such as Monte Carlo) tests. Moreover, since the permutation rationale fits well with the general design of randomized clinical trials, it is important to consider this statistical method as an option to be applied for robust (regarding departures of the data from assumptions of a given statistical test) inference in such important studies, where it is critical to prevent (given their relevance for causal inference) errors as much as possible. 
Twosided Pvalues for permutationbased tests: avoiding “asymptotic bias” 
It is interesting to note that several concepts from asymptotic statistics are applicable to resamplingbased statistics. In fact, some fundamental concepts, such as confidence intervals calculation for a given statistic, were originated from hypothesizing a theoretical resampling process from the underlying population [3]. However, such applicability is not necessarily true for the calculation of twosided (or twotailed) Pvalues, which is normally the case for the majority of statistical analysis (excluding hypothesis testing based on statistics such as F or chisquared). This probability is obtained by calculating the area under the probability distribution curve referent to values equally or more “extreme” (that is, more distant from the null) than the calculated statistic for the observed data at both ends or tails of the distribution (since the twosided Pvalue is open to more than one – normally two – alternative hypotheses). For asymptotic distributions that are symmetrical (e.g., standard normal or T distributions), this calculation can be simplified by calculating a onesided Pvalue and multiplying this value by two. This works due to the symmetry of asymptotic distributions, but this might very well not be the case for empirical distributions calculated by a permutationbased method, especially when the dependent variable, in the case of a continuous outcome, does not approximate the normal distribution (and, ironically, such cases are when using permutation is more justifiable, since normality is one of the main assumptions among asymptotic statistical tests) [4]. 
Although the “multiplying a onesided Pvalue by two” approach (typical of asymptotic statistics) not being necessarily applicable for hypothesis testing under the permutation framework, recommendations to use such approach in this context can be seen in modern and qualified teaching materials. Two recent books [5,6] are examples of that (it is important to note that the quality of the entire books is not being assessed). In both of them, the authors correctly explain the logic of onesided tests in the context of permutation, but seem to be somehow “influenced” or “biased” towards the asymptotic reasoning regarding the calculation of twosided Pvalues. This (in a particular perspective) suggests that the problem is not lack of statistical knowledge, but rather a bias caused by transposing the reasoning used in asymptotic hypothesis testing for calculating twosided Pvalues into the permutation context, where such reasoning might not apply. The mentioned “bias of asymptotic reasoning” might be such that the authors discuss even the calculation of a twotailed Pvalue (using the “asymptotic approach”) based on a skewed empirical null distribution, which is clearly incorrect since the symmetry of the null distribution is what makes the use of the “multiplying a onesided Pvalue by two” approach correct. What should be done in place is to calculate the Pvalue for each side of the distribution (i.e., calculate the two onesided Pvalues individually) and sum them. An alternative (and perhaps simpler) approach would be to calculate a onesided Pvalue (relative to the null statistics equal to or greater than the observed statistic) using a null distribution of absolute values of the statistic of interest and the absolute value of the observed statistic, since this would put both positive and negative values of a given statistic in the same side (>0) of the distribution. 
Discussion and Remarks 
It is difficult to verify whether or not the pointed mistake in teaching literature has influenced the analyses and/or interpretation of published studies, since it is usually not described how the “twotailing” has been done. As an example, an interventional study published more than 10 years ago in the prestigious The New England Journal of Medicine is considered; in the statistical analyses, the use of a “twosided permutation ttest” is mentioned [7]. This manuscript is a particularly illustrative example since it indicates that permutationbased methods have been employed in highquality studies for some time (which can be verified in the archives of this and other medical journals). Although the study itself (including the adequacy of the statistical analyses) is not being assessed here, the description of the permutation test as it is in the manuscript is not sufficient to clarify which Pvalue calculation approach (asymptoticbased or actually twosided) was used. This consideration is even more important for studies that reported Pvalues in the borderline of the a priori defined significance threshold (i.e., the α level, normally defined as 5%), where the use of one approach or the other may interfere with the interpretations (that is, the Pvalue is < α if one approach is used, but ≥ α for the other) and/or conclusions of the results by the authors and by the scientific community (since a great deal of attention – arguably more than it should in some cases – is normally putted on the dichotomy significant/not significant). 
The issue regarding the statistical concept underlying the calculation of twotailed Pvalues in permutationbased tests is relevant to medical research (and to the scientific community in general, as well as other users of such information) for three main reasons: the first is the relevance that is generally given to the Pvalue, which is, indeed, an important concept for statistical inference; the second is the increasing popularity that permutationbased approaches have been receiving due to the almost universal access to powerful computational tools and to the appropriateness of such approach to the general design of randomized clinical trials,, which is commonly regarded as the goldstandard for causal inference in medical research; the third is the origin of the pointed equivoque: it is (or seems to be) the use of the wrong rationale rather than lack of knowledge of the topic, indicating that the misuse of asymptotic thinking for calculating twosided Pvalues in the context of permutation can be easily avoided. It has be to stressed to the medical researcher that the bias (possibly) related to thinking under the asymptotic statistical framework has to be avoided when working with (possible) asymmetric empirical null distributions originated from the permutation process by thinking what a twosided Pvalue actually means and, then, proceed on calculating it. 
References 
