Application of Information Theory to Bio-Equivalence Problem

When a drug is administered to a human subject, it generally passes through an absorption phase, a distribution phase, a metabolism phase, and finally an elimination phase within the body. The blood or plasma concentration-time curve C(t) is often used to study the absorption and elimination of the drug. Some of the indexes that can be obtained from concentration curves are AUC, Cmax, tmax, and PSR, which denote, respectively, the area under C(t), the maximum value of C(t), the time at which the concentration curve reaches its maximum value, and the probability similarity region (i.e. the common area under the two corresponding concentration curves of the test and reference formulations).


Introduction
Bioavailability is the rate and extent to which the active drug ingredient is absorbed from a drug product and becomes available at the site of drug action. AUC and Cmax are used to evaluate the extent and the rate of bioavailability, respectively.
The aim of a bio-equivalence study is to show the therapeutic equivalence of two or more different formulations (treatments) of the same drug.
The design of bio-equivalence studies and the decision rules based on such studies are governed by clinical regulations. These regulations are stated by the FDA and reviewed in Chapter 16 of Chow and Liu [1]. For example, bio-equivalence is concluded if the average bioavailability of the test formulation is within ±20% of that of the reference formulation with a certain assurance. Some regulations require the ratio of means, based on log-transformed data, to lie between 80% and 125% with probability 90% in order to accept bio-equivalence. In this work, we are interested in the case of two treatments: a new treatment under development (called the test, T) and an existing treatment (called the reference, R) for the same disease, used as a standard active competitor.
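The 80% to 125% limits quoted above are symmetric on the log scale, which is one reason they are paired with log-transformed data. The following minimal sketch illustrates this and checks whether an interval estimate for the ratio of means lies inside the limits. The function name and the example intervals are hypothetical and are not taken from any regulation or from the study analysed later.

```python
import math

LOWER, UPPER = 0.80, 1.25  # ratio limits quoted in the text


def within_ratio_limits(ci_low, ci_high, lower=LOWER, upper=UPPER):
    """Return True if the whole interval (ci_low, ci_high) for the
    ratio of means lies strictly inside (lower, upper)."""
    return lower < ci_low and ci_high < upper


# On the log scale the limits are symmetric about zero:
# log(0.80) = -log(1.25), because 0.80 * 1.25 = 1.
log_lower, log_upper = math.log(LOWER), math.log(UPPER)

# Hypothetical interval estimates for the ratio of means.
accept = within_ratio_limits(0.90, 1.10)   # inside the limits
reject = within_ratio_limits(0.75, 1.10)   # lower end falls below 0.80
```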

Short Review of Statistical Procedures
Several procedures have been used in the literature to solve the problem of bio-equivalence. In this section we report some of these methods.
Westlake [2] and others pointed out that the classical approach of testing the equality of the means of two independent normal populations scarcely makes sense from a medical point of view. Schuirmann [3,4] suggested the two one-sided tests procedure. This procedure depends on splitting the problem of testing $H_0: \mu_T - \mu_R \le A$ or $\mu_T - \mu_R \ge B$ against $H_1: A < \mu_T - \mu_R < B$ into two one-sided problems, $H_{01}: \mu_T - \mu_R \le A$ versus $H_{11}: \mu_T - \mu_R > A$, and $H_{02}: \mu_T - \mu_R \ge B$ versus $H_{12}: \mu_T - \mu_R < B$, where A and B are given tolerance constants. If both null hypotheses are rejected, one concludes bio-equivalence. This procedure was modified by Liu and Weng [5] and by Berger and Hsu [6]. A power-test procedure was also discussed by Schuirmann [3]. This test applies the so-called 80/20 rule, which states that if T is not statistically different from R and if there is at least 80% power for detecting a difference of 20% of R, then bio-equivalence is concluded. He compared this test with the two one-sided tests. Anderson and Hauck [7] and Hauck and Anderson [8] suggested a test statistic whose distribution is non-central t with a random non-centrality parameter. They also approximated the distribution of the test statistic by a normal distribution and by a t-distribution. Other parametric methods were used by Locke [9], Dannenberg et al. [10], and Wassmer [11].
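The decision rule of the two one-sided tests procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the estimate of the difference, its standard error, and the t critical value are supplied by the caller, and all names and numbers are hypothetical.

```python
def tost_equivalent(diff, se, lower, upper, t_crit):
    """Two one-sided tests for lower < mu_T - mu_R < upper.

    diff   : estimate of mu_T - mu_R
    se     : standard error of that estimate (must be positive)
    t_crit : upper-alpha critical value of the t distribution for the
             error degrees of freedom of the design (caller-supplied)
    """
    if se <= 0:
        raise ValueError("se must be positive")
    t_lower = (diff - lower) / se   # tests H01: mu_T - mu_R <= lower
    t_upper = (diff - upper) / se   # tests H02: mu_T - mu_R >= upper
    reject_lower = t_lower > t_crit
    reject_upper = t_upper < -t_crit
    # Bio-equivalence is concluded only if BOTH null hypotheses are rejected.
    return reject_lower and reject_upper


# Hypothetical numbers: 1.717 is the upper 5% point of t with 22 d.f.
concluded = tost_equivalent(diff=-2.0, se=3.0, lower=-16.0, upper=16.0,
                            t_crit=1.717)
not_concluded = tost_equivalent(diff=14.0, se=3.0, lower=-16.0, upper=16.0,
                                t_crit=1.717)
```

In the second call the estimate lies within the tolerance limits, but too close to the upper limit for the second null hypothesis to be rejected, so equivalence is not concluded.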
Bayesian methods were used by Rodda and Davis [19], Mandallaz and Mau [20], and Grieve [21]. Nonparametric methods were used by Hauschke et al. (1990). Moment-based criteria were used by Holder and Hsuan [22]. Bootstrap methods were used by Chow [23]. Shape analysis methods were used by Steinijans et al. [24] and Chinchilli and Elswick [25]. The Kullback-Leibler directed divergence (KLD) was used by Dragalin et al. [26] and Pereira [27]. It is shown that KLD has several good properties, namely, it (i) possesses the natural hierarchical property that IBE ⇒ PBE ⇒ ABE, (ii) is invariant to monotonic transformations of the data, (iii) is applicable over a wide range of distributions of the response variable (i.e. there is no need to assume normality), and (iv) generalizes easily to the multivariate case where equivalence on more than one parameter (for example, AUC, Cmax and tmax) is required.

Entropy test for bio-equivalence
Let X be a random variable with probability density function f(x). Shannon [28] suggested an entropy measure of X, denoted by H(X) or H(f) and defined as the expected value of -log f(X). If X is continuous,
$$H(X) = -\int f(x) \log f(x)\, dx.$$
On the other hand, if X is discrete with probability vector $P = (p_1, \ldots, p_n)$,
$$H(X) = -\sum_{i=1}^{n} p_i \log p_i.$$
It is interesting to note that Shannon entropy is a measure of uncertainty (missing information). This is due to the fact that H(X) is maximum for the uniform distribution, which is non-informative.
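For the discrete case, the definition and the maximizing property of the uniform distribution can be checked numerically. The sketch below uses natural logarithms and two hypothetical probability vectors on four points. The function name is illustrative only.

```python
import math


def shannon_entropy(p):
    """H(P) = -sum p_i log p_i (natural log), with 0 log 0 taken as 0."""
    if abs(sum(p) - 1.0) > 1e-9 or any(pi < 0 for pi in p):
        raise ValueError("p must be a probability vector")
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


uniform = [0.25, 0.25, 0.25, 0.25]   # non-informative distribution
skewed = [0.70, 0.10, 0.10, 0.10]    # more informative distribution

h_uniform = shannon_entropy(uniform)  # equals log(4), the maximum for n = 4
h_skewed = shannon_entropy(skewed)    # strictly smaller than log(4)
```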
There are several methods to estimate the entropy of a random variable. The two simplest methods are based on the relative frequencies of the values (or classes of values) of the random variable and on kernel density estimates. More methods are given in Beirlant et al. [29].
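As an illustration of the relative-frequency approach, the following sketch groups the observations into classes of equal width and plugs the class relative frequencies into the discrete entropy formula. The class width is a tuning choice that the text does not specify, so it is left as a parameter. The function name and the sample values are hypothetical.

```python
import math
from collections import Counter


def entropy_from_classes(sample, bin_width):
    """Plug-in entropy estimate: group values into classes of equal
    width, use the class relative frequencies as probabilities, and
    return -sum p_k log p_k (natural log)."""
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    n = len(sample)
    if n == 0:
        raise ValueError("sample must not be empty")
    # Class index of x is floor(x / bin_width).
    counts = Counter(math.floor(x / bin_width) for x in sample)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


# Four values falling into four different classes of width 1:
h_spread = entropy_from_classes([0.5, 1.5, 2.5, 3.5], bin_width=1.0)
# Four values falling into a single class:
h_single = entropy_from_classes([1.1, 1.2, 1.3, 1.4], bin_width=1.0)
```

The estimate depends on the chosen class width, so the same width would normally be used for both formulations when their entropies are compared.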
Based on this entropy, we suggest the Shannon bio-equivalence index, defined as the ratio of the Shannon entropy of the test T to the Shannon entropy of the reference R, i.e. SI = H(T) / H(R). Two formulations are bio-equivalent if this index belongs to a suitable interval, e.g. 0.80 < SI < 1.20. Moreover, we introduce the concept of at least (1-β)100% Shannon equivalent distributions as follows.
Definition: Two random variables X and Y are said to be at least (1-β)100% Shannon equivalent if ∆(X,Y) ≤ β, where ∆(X,Y) measures the discrepancy between the Shannon entropies of X and Y. This definition may be used to test bio-equivalence as follows: given observed values of T and R, we say that T and R are bio-equivalent if ∆(T,R) ≤ β. One may take 0.80 < 1 - β < 1.20.
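The index and the equivalence check can be sketched as follows. The explicit formula for ∆ is not available in the text above, so the sketch assumes ∆ = |1 - SI|. This is only one reading that is compatible with the "(1-β)100%" wording and should be treated as an assumption. The entropy values in the example are hypothetical.

```python
def shannon_index(h_test, h_ref):
    """SI = H(T) / H(R); the reference entropy must be non-zero."""
    if h_ref == 0:
        raise ZeroDivisionError("entropy of the reference is zero")
    return h_test / h_ref


def delta(h_test, h_ref):
    """ASSUMED discrepancy measure |1 - SI|; not confirmed by the source."""
    return abs(1.0 - shannon_index(h_test, h_ref))


def shannon_equivalent(h_test, h_ref, beta=0.20):
    """True if the two entropies are at least (1 - beta)100% equivalent
    under the assumed definition of delta."""
    return delta(h_test, h_ref) <= beta


# Hypothetical entropy estimates for the test and the reference.
si = shannon_index(2.10, 2.00)          # ratio of the two entropies
close = shannon_equivalent(2.10, 2.00)  # small discrepancy
far = shannon_equivalent(3.00, 2.00)    # discrepancy of one half
```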
Efron [30] introduced the bootstrap concept. Diaconis and Efron [31] introduced computer intensive methods with some applications (see e.g. Noreen [32], who reviewed these methods for testing hypotheses). Using the above definition and computer intensive methods, we suggest a procedure for testing bio-equivalence. This procedure bootstraps each of the given samples of T and R a large number of times, e.g. 10000 times. For each pair of bootstrap samples, ∆(T,R) is calculated. The 10000 values of ∆(T,R) obtained are used to construct a one-sided upper 95% confidence interval for the true value of ∆(T,R). If the obtained interval is contained in (0,β), we conclude that the two formulations are at least (1-β)100% Shannon bio-equivalent at the 95% confidence level.
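The bootstrap procedure described above can be sketched as follows. Several choices here are assumptions and not necessarily those of the package used in this work: the entropy is estimated by a Gaussian kernel resubstitution estimate with the normal-reference bandwidth 1.06 s n^(-1/5), ∆ is taken to be |1 - H(T)/H(R)|, and the data are synthetic values generated only for illustration.

```python
import math
import random


def kernel_entropy(sample):
    """Resubstitution estimate -(1/n) sum log f_hat(x_i), where f_hat is
    a Gaussian kernel density estimate (assumed bandwidth rule)."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    if sd == 0:
        raise ValueError("sample has zero variance")
    h = 1.06 * sd * n ** (-0.2)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))
    total = 0.0
    for xi in sample:
        dens = norm * sum(math.exp(-0.5 * ((xi - xj) / h) ** 2)
                          for xj in sample)
        total += math.log(dens)
    return -total / n


def delta(t, r):
    """ASSUMED discrepancy |1 - H(T)/H(R)|; not confirmed by the source."""
    return abs(1.0 - kernel_entropy(t) / kernel_entropy(r))


def bootstrap_upper_bound(t, r, n_boot=10000, level=0.95, seed=0):
    """Upper end of the one-sided bootstrap interval (0, q) for delta."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_boot):
        t_star = rng.choices(t, k=len(t))   # resample T with replacement
        r_star = rng.choices(r, k=len(r))   # resample R with replacement
        values.append(delta(t_star, r_star))
    values.sort()
    idx = min(n_boot - 1, math.ceil(level * n_boot) - 1)
    return values[idx]


# Synthetic data for illustration only (NOT the study data).
gen = random.Random(1)
t_sample = [gen.gauss(80.0, 20.0) for _ in range(24)]
r_sample = [gen.gauss(82.0, 20.0) for _ in range(24)]

beta = 0.20
upper = bootstrap_upper_bound(t_sample, r_sample, n_boot=500)
equivalent = upper <= beta   # interval (0, upper) contained in (0, beta)
```

A smaller number of replications is used in the example only to keep the run time short. The text suggests a large number such as 10000.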

Illustrative example
Consider the 2x2 crossover study for the comparison of bioavailability between two formulations of a drug product stated in Chow and Liu [33]. The study was conducted on 24 healthy volunteers (subjects). During each dosing period, each subject was administered either five 50 mg tablets (test formulation T) or 5 ml of an oral suspension (50 mg/ml) (reference formulation R). Blood samples were obtained, and the AUC values from 0 to 32 hours are given below. Let R1 denote AUC in sequence 1, period 1; T1 denote AUC in sequence 2, period 1; T2 denote AUC in sequence 1, period 2; and R2 denote AUC in sequence 2, period 2. It is usually assumed that the data follow normal distributions and that there is no carry-over effect or period effect. Two formulations are said to be bio-equivalent if $A < \mu_T - \mu_R < B$, or if $a < \mu_T / \mu_R < b$, where A, B, a, and b are constants.
The data matrix is given below, where the rows represent R1, T1, T2, and R2 respectively. To analyze these data, we have produced a Mathematica 8 package that applies some of the procedures suggested in the literature for testing bio-equivalence, together with tests of the underlying assumptions. The output of this package is reported in three tables. Table 1 gives the bootstrapped quantiles of the test statistic ∆(T,R). Table 2 gives the mean, standard deviation and 95% confidence intervals of the Shannon entropy of each of T, R, T-R and T/R based on 10000 bootstrapped samples from the given data. For the purpose of comparison, Table 3 gives the results of applying some bio-equivalence procedures from the literature to both raw and log-transformed data, together with the entropy index.

Discussion and Conclusion
First, we tested the underlying assumptions that are imposed in the literature. The Shapiro-Wilk test of normality gave p-values of 0.999609 and 0.765868 for R and T respectively. Hence the normality assumption is not rejected. Second, at the 5% level of significance, the ANOVA table showed no significant carry-over effect, direct drug effect, or period effect.
Concerning the Shannon entropy procedure for testing bio-equivalence of R and T, kernel estimates of the probability density functions of T and R yielded a Shannon index SI = 1.02169 ∈ (0.80, 1.20). This means that T and R are Shannon bio-equivalent. Moreover, ∆(T,R) = 0.021, which means that the two formulations are at least 97.9% Shannon bio-equivalent. On the other hand, a study with 10000 bootstrap samples gave an average bootstrapped value of ∆(T,R) equal to 0.0705, which means that T and R are 93% Shannon bio-equivalent. In addition, to construct a probability content interval (confidence interval) for ∆(T,R), one needs some of the most commonly used quantiles of the distribution of ∆(T,R). For example, using Table 1, (0, 0.19400) is a one-sided upper 95% confidence interval for ∆(T,R). This means that we are 95% confident that the two formulations are at least 80.6% Shannon equivalent, and so the two formulations are bio-equivalent. The same bootstrap study yielded a bootstrapped estimate of the Shannon index SI = 1.84 / 1.83 = 1.00456, which leads to the same conclusion as the kernel estimator applied to the given data.
It is clear from Table 3 that all tests under consideration, including the suggested Shannon entropy test, yielded the same conclusion. This supports the applicability of the suggested test. Moreover, both raw data and log-transformed data gave the same results, which indicates that transforming these data is not necessary.
KLD is used in bio-equivalence studies. Under some regularity conditions, the distribution of KLD is usually approximated by a chi-square distribution (see e.g. Kullback [34]). It is interesting to note that Awad et al. [35] gave an example where this approximation is not valid. Even if this approximation holds, it needs a large sample size, which may not be available in bio-equivalence studies. So, the distributions of the KLD and Shannon entropy indexes need to be simulated whenever their exact forms are unknown. Hence either of them seems to be a reasonable index when the normality assumption is not satisfied, and transformation of the data is not required in such cases.
It is worth mentioning that the Mathematica 8 package produced for this work is capable of analyzing data based on all concentration points. Following the comments of two referees, an extensive comparison of several divergence and entropy measures, based on the concentration points of some real data, will be the subject of another article to be submitted for publication in the near future.