Statistical Considerations in Biosimilar Assessment Using Biosimilarity Index

When an innovative biologic product goes off patent, biopharmaceutical or biotechnological companies may file an application for regulatory approval of biosimilar products. Unlike small-molecule drug products, biosimilars are not exact copies of their brand-name counterparts; they are usually very sensitive to changes in environmental factors and have greater variability owing to their complexity and their sensitivity to variation in manufacturing processes. To address these challenges, a biosimilarity index based on reproducibility probability has been proposed to assess biosimilarity. In this article, we demonstrate how to assess biosimilarity between the test and reference products relative to a reference standard established in a study where the reference product is compared with itself. The biosimilarity index approach is robust with respect to the biosimilarity criteria and has the advantage of allowing assessment of the degree of similarity.


Introduction
According to the Biologics Price Competition and Innovation (BPCI) Act passed by the United States (US) Congress in 2009, a biosimilar product is defined as a biological drug product that is highly similar to the reference product notwithstanding minor differences in clinically inactive components, and for which there are no clinically meaningful differences in terms of safety, purity, and potency.
Biological drug products contain active ingredients that are derived from or made by living cells or organisms. Unlike generic chemical drugs, biosimilar products are expected to have active ingredients that are similar, but not identical, to those of the innovative brand-name product. Furthermore, biological products are very sensitive to small changes at various stages of the manufacturing process and to environmental factors such as light and temperature. Therefore, biosimilars have greater inherent variability than chemical drugs. Current regulations for the assessment of bioequivalence (BE) for generic chemical drugs are concerned only with average bioequivalence. The main criticism of the criteria for BE studies is that they do not take the variability of the drug products into consideration. Given the greater variability of biological drugs, it is recognized that current regulations and/or criteria for the assessment of BE may not be applicable to the assessment of biosimilarity between biologic products.
The BPCI Act, as part of the Affordable Care Act, was signed into law in March 2010 and gave the US Food and Drug Administration (FDA) the authority to approve similar biological drug products. However, the FDA has not yet put forward clear standards for biosimilar approval [1], owing to the complexity of biological drug products. To address these challenges, a new biosimilarity index approach was proposed by Chow et al. [2] to assess biosimilarity. This approach has the advantage of being robust to the choice of study endpoints, criteria and study designs. Thus, a universal approach for biosimilarity assessment could be implemented even without criteria that are well accepted by the regulatory agencies. The BPCI Act also introduced the term "highly similar" in the definition of biosimilarity, but there is little or no discussion of how similar is considered highly similar. The proposed biosimilarity index approach can also answer the question of "how similar is highly similar" because the index quantifies the degree of similarity.
The purpose of this paper is to illustrate how to operationalize the biosimilarity index approach, and to evaluate the performance of the biosimilarity index under a crossover design using the average biosimilarity criterion.
In the next section, the biosimilarity index based on reproducibility probability is briefly introduced. In Section 3, the statistical properties of the biosimilarity index are discussed through simulation studies. In Section 4, an example is given to further illustrate the impact of variability on the conclusion of biosimilarity. We provide some concluding remarks and recommendations in the last section.

Biosimilarity Index
In order to reflect the characteristics and the impact of variability on the therapeutic effect of biologic products, Chow et al. in 2011 proposed the development of an index based on the concept of the reproducibility probability to evaluate the degree of similarity between two drug products [2]. Reproducibility probability was first considered by Shao and Chow [3] to address the question of whether the observed significant result from a clinical trial is reproducible.
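The reproducibility probability is the estimated power of the testing procedure when the alternative hypothesis is true, obtained by replacing the parameter with its estimate based on the observed data. The hypotheses of similarity testing are often expressed as two sets of one-sided hypotheses:
$$H_{01}: \theta \le \theta_L \ \text{versus} \ H_{a1}: \theta > \theta_L \quad \text{and} \quad H_{02}: \theta \ge \theta_U \ \text{versus} \ H_{a2}: \theta < \theta_U, \tag{1}$$
where $\theta$ is the study parameter chosen to assess biosimilarity, and $\theta_L$ and $\theta_U$ are the biosimilarity limits, i.e., the accepted lower and upper bounds for declaring biosimilarity.
The evaluation of the biosimilarity index depends on the form of the test statistics, which in turn depends on the study design and the criterion chosen. For the 2×2 crossover design, we consider the following statistical model:
$$Y_{ijk} = \mu + S_{ik} + P_j + T_{(j,k)} + \varepsilon_{ijk}, \tag{2}$$
where $Y_{ijk}$ is the response for subject $i$ in the $k$th sequence at the $j$th period, with $i = 1, \ldots, n_k$ indicating subject, $j = 1, 2$ indicating period and $k = 1, 2$ indicating sequence; $\mu$ represents the overall mean; $S_{ik}$ represents the random effect of the $i$th subject in the $k$th sequence, assumed independently and identically distributed (i.i.d.) as $N(0, \sigma_S^2)$; $P_j$ is the fixed period effect; $T_{(j,k)}$ represents the fixed effect of the treatment in the $k$th sequence administered at the $j$th period; and $\varepsilon_{ijk}$ is the within-subject random error, assumed i.i.d. as $N(0, \sigma_e^2)$. Finally, the $S_{ik}$'s and $\varepsilon_{ijk}$'s are assumed to be mutually independent.
When we choose the average biosimilarity criterion, i.e., $\theta = \mu_T - \mu_R$, the test statistics for the hypotheses in Equation (1) take the standard form
$$T_L = \frac{\bar{Y}_T - \bar{Y}_R - \theta_L}{\hat{\sigma}_d \sqrt{1/n_1 + 1/n_2}}, \qquad T_U = \frac{\bar{Y}_T - \bar{Y}_R - \theta_U}{\hat{\sigma}_d \sqrt{1/n_1 + 1/n_2}}, \tag{3}$$
where $\bar{Y}_T$ and $\bar{Y}_R$ are the least squares means for the test and reference products, and $\hat{\sigma}_d^2$ is the pooled sample variance of the half period differences $d_{ik} = (Y_{i1k} - Y_{i2k})/2$. The least squares means can be obtained from the sequence-by-period means:
$$\bar{Y}_T = \tfrac{1}{2}\left(\bar{Y}_{\cdot 11} + \bar{Y}_{\cdot 22}\right), \qquad \bar{Y}_R = \tfrac{1}{2}\left(\bar{Y}_{\cdot 21} + \bar{Y}_{\cdot 12}\right).$$
By the estimated power approach, the biosimilarity index $\hat{P}_{BI}$ for the 2×2 crossover study using the average biosimilarity criterion can be obtained as
$$\hat{P}_{BI} = P\left(T_L > t_{\alpha,\, n_1+n_2-2} \ \text{and} \ T_U < -t_{\alpha,\, n_1+n_2-2} \mid \hat{\delta}_L, \hat{\delta}_U\right), \tag{4}$$
where $T_L$ and $T_U$ are the test statistics given in Equation (3) and $t_{\alpha,\, n_1+n_2-2}$ is the upper $\alpha$th quantile of the central t-distribution with $n_1+n_2-2$ degrees of freedom. Both $T_L$ and $T_U$ follow a non-central t-distribution with $n_1+n_2-2$ degrees of freedom and non-centrality parameters $\delta_L$ and $\delta_U$, respectively; $\delta_L$ and $\delta_U$ depend on the population means, the variances and the biosimilarity limits. Their estimates $\hat{\delta}_L$ and $\hat{\delta}_U$ can be obtained from the data using
$$\hat{\delta}_L = \frac{\bar{Y}_T - \bar{Y}_R - \theta_L}{\hat{\sigma}_d \sqrt{1/n_1 + 1/n_2}}, \qquad \hat{\delta}_U = \frac{\bar{Y}_T - \bar{Y}_R - \theta_U}{\hat{\sigma}_d \sqrt{1/n_1 + 1/n_2}}.$$
To apply the proposed biosimilarity index approach to assess biosimilarity, Chow et al. [2] proposed the following steps [4]:
Step 1: Assess the average biosimilarity based on a given biosimilarity criterion. The criterion could be based on the mean, the ratio or the variability.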
Step 2: Once the product passes the test for biosimilarity in Step 1, calculate the biosimilarity index in Equation (4) based on the observed mean difference and standard deviation. The calculated biosimilarity index thus takes the variability and the sensitivity to heterogeneity in variances into consideration.
Step 3: Claim that the test product is highly biosimilar if the calculated 95% confidence lower bound of the biosimilarity index is larger than $p_0$, a pre-specified limit for declaring high biosimilarity.
To establish $p_0$, we recommend that it be based on $p_{RR}$, the biosimilarity index obtained in an R-R study in which the reference product is compared with itself. Basing $p_0$ on $p_{RR}$ allows the biosimilarity index approach to assess the degree of similarity relative to the reference product [5].
From the definition of the biosimilarity index and the testing steps outlined above, we can see that this approach has several advantages. First, it is robust with respect to the selected study endpoint, biosimilarity criterion and study design [4], because the biosimilarity index used in the second stage of testing for "highly similar" is calculated using the same study endpoint, biosimilarity criterion and study design. Second, it takes variability into consideration in the calculation of the index, and is sensitive to the variance of the test product. Third, it allows the assessment of the degree of similarity. In other words, it provides an answer to the question "how similar is considered similar?".
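The estimated-power calculation in Step 2 can be sketched numerically. The Python sketch below is illustrative only and is not the authors' implementation. It assumes the usual 2×2 crossover form in which the two test statistics share one standard-error estimate, so that each can be written as (Z + δ)/sqrt(V/ν) with Z standard normal and V chi-square with ν degrees of freedom. The function names `noncentrality` and `biosimilarity_index`, the limits ±0.223 and the example inputs are all hypothetical.

```python
import numpy as np
from scipy import integrate, stats


def noncentrality(mean_diff, sigma_d, n1, n2, theta_L, theta_U):
    """Plug-in estimates of (delta_L, delta_U) from observed data.

    Assumes the standard error of the mean difference is
    sigma_d * sqrt(1/n1 + 1/n2), as in a standard 2x2 crossover analysis.
    """
    se = sigma_d * np.sqrt(1.0 / n1 + 1.0 / n2)
    return (mean_diff - theta_L) / se, (mean_diff - theta_U) / se


def biosimilarity_index(delta_L, delta_U, df, alpha=0.05):
    """P(T_L > t_crit and T_U < -t_crit) at the given noncentrality values."""
    t_crit = stats.t.ppf(1.0 - alpha, df)

    def integrand(v):
        s = np.sqrt(v / df)             # ratio of estimated to true SE
        lower = t_crit * s - delta_L    # T_L > t_crit   <=>  Z > lower
        upper = -t_crit * s - delta_U   # T_U < -t_crit  <=>  Z < upper
        prob = stats.norm.cdf(upper) - stats.norm.cdf(lower)
        return max(prob, 0.0) * stats.chi2.pdf(v, df)

    # Integrate over essentially the whole chi-square distribution.
    lo, hi = stats.chi2.ppf([1e-10, 1.0 - 1e-10], df)
    value, _ = integrate.quad(integrand, lo, hi, limit=200)
    return value


# Hypothetical inputs: observed difference 0.02, sigma_d 0.15,
# 12 subjects per sequence, biosimilarity limits +/- 0.223.
d_L, d_U = noncentrality(0.02, 0.15, 12, 12, -0.223, 0.223)
index = biosimilarity_index(d_L, d_U, df=22)
print(f"estimated biosimilarity index: {index:.3f}")
```

Numerical integration over the variance estimate is used here instead of a product of two non-central t probabilities because the two statistics are dependent; a Monte Carlo estimate would be a reasonable alternative.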

Numerical Results
The proposed biosimilarity index approach outlined in Section 2 is implemented in simulation studies to demonstrate the statistical properties of the index. A standard 2×2 crossover study design and the average biosimilarity criterion are used. The biosimilarity index is calculated as in Equation (4).

Simulation design
The study parameter $\theta$ is the mean difference between the test and reference products, i.e., $\theta = \mu_T - \mu_R$. The effect of sample size with equal allocation is also investigated. The parameter settings for the simulation studies are summarized in Table 1.

Results
A total of 1,000 random trials are generated for each combination of the parameter specifications. Table 2 records the percentage of trials that passed the Step 1 biosimilarity test, i.e., the probability of claiming biosimilarity on the basis of the average biosimilarity criterion. As the mean difference between the test and reference products increases, the probability of claiming biosimilarity decreases. When the variance of the test product increases, the probability decreases as well. Increasing the sample size increases the probability of claiming biosimilarity. In typical BE studies, the sample sizes range from 18 to 24. To assess biosimilarity, the sample sizes for the comparative nonclinical and clinical studies are expected to be larger than those chosen in BE studies, but the studies are still conducted in a limited number of patients compared with the pivotal trials for the innovative drugs.
For those trials that passed the Step 1 test, Table 3 reports the average of the p-values obtained from Schuirmann's two one-sided tests (TOST) procedure. As the mean difference between the test and reference products increases, the p-value increases. When the variance of the test product increases, the p-value increases as well. In other words, as the mean difference and/or variance increases, the evidence against the null hypotheses weakens even among those trials in which we are able to declare biosimilarity.
The biosimilarity index, i.e., the reproducibility probability in Equation (4), is calculated following the steps outlined in Section 2. Table 4 further shows the value of $\hat{p}_{TR}$ for those trials that passed the Step 1 test. As expected, the results show that $\hat{p}_{TR}$ decreases as the mean difference or variance increases, and it increases as the sample size increases.
Next we calculate the percentage of trials that pass the "highly similar" test based on the biosimilarity index $\hat{p}_{TR}$. For $p_0$ in Step 3 of the testing procedure proposed in Section 2, we choose $p_0 = 0.8\,p_{RR}$, where $p_{RR}$ is assumed known and constant, set at 80%. As the mean difference between the test and reference products increases or the variance of the test product increases, the percentage of trials passing the "highly similar" test decreases. The percentage passing increases as the sample size increases.
Note that when the mean difference between the test and reference products is large, for example 0.15, the test drug could not pass the "highly similar" test even if biosimilarity was declared in Step 1. This is because the null hypothesis in Step 1 was rejected on weak evidence. In other words, when we want to make claims about the degree of similarity, additional information is used to quantify the similarity in comparison with the reference product.
Finally, when the test product has a larger variance than the reference product, the results show that it becomes harder to conclude the same level of similarity. This demonstrates that the proposed biosimilarity index approach is sensitive to heterogeneity in variances and can reflect the impact of the variability of biological products.
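The Step 1 pass rate can be estimated with a short Monte Carlo sketch such as the one below. It is a simplified illustration under stated assumptions, not the simulation code behind Table 2: the biosimilarity limits (±0.223), the reference and subject standard deviations (0.2), the per-sequence sample size (12) and the reduced number of trials (500) are all hypothetical values chosen for the example, and period effects are set to zero.

```python
import numpy as np
from scipy import stats


def simulate_trial(n, mu_T, mu_R, sd_T, sd_R, sd_subj, rng):
    """One 2x2 crossover trial; returns (period 1, period 2) arrays per sequence.

    Sequence 1 receives T then R; sequence 2 receives R then T.
    """
    def arm(mu_first, sd_first, mu_second, sd_second):
        subj = rng.normal(0.0, sd_subj, n)  # subject random effect
        p1 = mu_first + subj + rng.normal(0.0, sd_first, n)
        p2 = mu_second + subj + rng.normal(0.0, sd_second, n)
        return p1, p2

    return arm(mu_T, sd_T, mu_R, sd_R), arm(mu_R, sd_R, mu_T, sd_T)


def tost_pvalue(seq1, seq2, theta_L, theta_U):
    """Schuirmann TOST p-value: the larger of the two one-sided p-values."""
    d1 = (seq1[0] - seq1[1]) / 2.0  # (T - R) / 2 within sequence 1
    d2 = (seq2[0] - seq2[1]) / 2.0  # (R - T) / 2 within sequence 2
    n1, n2 = len(d1), len(d2)
    df = n1 + n2 - 2
    diff = d1.mean() - d2.mean()    # estimate of mu_T - mu_R
    pooled = ((n1 - 1) * d1.var(ddof=1) + (n2 - 1) * d2.var(ddof=1)) / df
    se = np.sqrt(pooled * (1.0 / n1 + 1.0 / n2))
    p_lower = stats.t.sf((diff - theta_L) / se, df)   # H0: theta <= theta_L
    p_upper = stats.t.cdf((diff - theta_U) / se, df)  # H0: theta >= theta_U
    return max(p_lower, p_upper)


def pass_rate(mean_diff, sd_T, n=12, n_trials=500, alpha=0.05, seed=1):
    """Fraction of simulated trials that declare biosimilarity in Step 1."""
    rng = np.random.default_rng(seed)
    passed = 0
    for _ in range(n_trials):
        s1, s2 = simulate_trial(n, mean_diff, 0.0, sd_T, 0.2, 0.2, rng)
        passed += int(tost_pvalue(s1, s2, -0.223, 0.223) < alpha)
    return passed / n_trials


for diff in (0.0, 0.05, 0.10, 0.15):
    print(f"mean difference {diff:.2f}: pass rate {pass_rate(diff, sd_T=0.2):.3f}")
```

Working with within-subject half differences removes the subject effect and any period effect from the comparison, which is why only the within-subject standard deviations influence the pass rate in this sketch.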

Example
As shown in the simulation studies, as the coefficient of variation (CV) increases, it becomes less likely that we are able to declare similarity even when there is no true mean difference. In this section, we use an example data set to further illustrate the impact of high variability on the conclusion of biosimilarity and how the biosimilarity index assesses the degree of similarity (Table 5).
In the simulation studies above, we considered the scenario where $p_{RR}$ is constant. This constant $p_{RR}$ could be obtained from a separate R-R study [5]. In the following example, we set out to obtain $p_{RR}$ concurrently as we assess the test product [6], so that $p_{RR}$ is considered random. To obtain $p_{RR}$ concurrently, a slight variation of the 2×2 crossover design is used. Namely, for the first sequence, subjects are treated with T in the first dosing period and cross over to receive R in the second dosing period. For the R treatment, however, subjects are again randomly split into two groups and treated with either $R_1$ or $R_2$.
Similarly, for the second sequence, subjects are split into two groups treated with either $R_1$ or $R_2$, and both groups then cross over to receive T in the second dosing period. In this case, the design essentially becomes a 4×2 crossover design (Table 6). We generate sample data with a total sample size of 160, i.e., 40 subjects per sequence. The means and variances of the two reference products are assumed to be the same, i.e., $\mu_{R_1} = \mu_{R_2}$ and $\sigma_{R_1}^2 = \sigma_{R_2}^2$. We further assume that there is no true mean difference between the test and reference products. A CV of 30% is chosen for all reference and test products.
When R is compared with itself, biosimilarity cannot always be declared; for a fixed sample size, the probability of declaring biosimilarity depends on the CV. An example in which similarity between different batches of the R product could not be established is obtained, and the sample means are given in Table 7. The observed mean difference [...] When similarity cannot be demonstrated between reference products, it is impossible to assess whether or not the test product is similar to this reference product. Careful studies should be conducted to avoid such a situation.
Under the same parameter settings, another set of data is generated. From this set of data, we are able to declare biosimilarity between $R_1$ and $R_2$. The sample means are given in Table 8. [...] = 0.430, and the 90% CI of the mean difference is (-0.108, 0.117); that is, similarity between T and R is also declared. From the observed mean differences and variabilities, $\hat{p}_{RR}$ and $\hat{p}_{TR}$ as evaluated from Equation (4) are 0.884 and 0.894, respectively, which could be used to further assess the degree of similarity.
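The allocation for the 4×2 design can be sketched as follows. This is a minimal illustration, assuming equal random allocation of the 160 subjects to four sequences; the sequence numbering and the names `SEQUENCES` and `allocate` are hypothetical and need not match the layout of Table 6.

```python
import numpy as np

# (period 1, period 2) treatments for each sequence; numbering is illustrative.
SEQUENCES = {1: ("T", "R1"), 2: ("T", "R2"), 3: ("R1", "T"), 4: ("R2", "T")}


def allocate(n_total, seed=0):
    """Randomly assign subjects to the four sequences in equal numbers."""
    if n_total % len(SEQUENCES) != 0:
        raise ValueError("n_total must be divisible by the number of sequences")
    rng = np.random.default_rng(seed)
    labels = np.repeat(list(SEQUENCES), n_total // len(SEQUENCES))
    return rng.permutation(labels)  # entry i is the sequence of subject i


assignment = allocate(160)
counts = {s: int((assignment == s).sum()) for s in SEQUENCES}
print(counts)  # {1: 40, 2: 40, 3: 40, 4: 40}
```

In this layout no subject receives both $R_1$ and $R_2$, so any comparison of the two reference groups is a between-subject comparison, whereas T versus R remains a within-subject comparison.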

Conclusion
The numerical results in Section 3 have shown that as the variance increases, both the probability of declaring biosimilarity in the Step 1 test and the biosimilarity index decrease. The biosimilarity index is calculated for the trials that have passed the Step 1 test, and thus reflects the characteristics of biological products that have already been declared biosimilar based on the average biosimilarity criterion.
For the assessment of biosimilars, we should be especially aware of the higher variability of biological drugs and its impact on the conclusion of biosimilarity testing. The results from the numerical studies have demonstrated that the biosimilarity index approach is sensitive to the variance of the products. Other methods have been proposed in the literature to assess variability in addition to the assessment of average biosimilarity. Chow et al. [7] considered an approach based on probability-based criteria for evaluating average biosimilarity, and demonstrated that the probability-based method is more sensitive to changes in variability than the moment-based method. Hsieh et al. [8] developed a statistical methodology for comparing variabilities in the assessment of biosimilarity and examined its performance under combinations of essential parameters. The advantage of the biosimilarity index approach is that it can be applied with whatever criterion is chosen: the criterion used in Step 1 testing is used again in Step 3 to quantify the level of similarity.
If we define $d = p_0 / p_{RR}$, then $d$ can be used to address the degree of similarity and the question of "how similar is highly similar?". In this article, we have set d = 0.8 and regard a product that passes at this level as highly similar. If we instead set d = 0.7 and regard passing at this level as moderately similar, the percentage of trials passing the Step 3 test would be greater. Thus the factor $d$ allows us to quantify the level of similarity relative to the reference product. The regulatory agency would be able to consider the class of drugs and the impact of variability on clinical performance, and decide what difference is tolerable and what level of similarity relative to the reference drug should be required of biosimilar products.
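As a small worked illustration of the factor d, the sketch below applies it to the index values reported in the example of Section 4 (0.884 for R versus R and 0.894 for T versus R). It is a simplification: it compares point estimates with the limit, whereas Step 3 uses the 95% confidence lower bound of the index, which is not computed here. The function names are hypothetical.

```python
def similarity_ratio(p_TR, p_RR):
    """Ratio of the T-R index to the R-R index (degree of similarity)."""
    return p_TR / p_RR


def passes(p_TR, p_RR, d):
    """Check p_TR against the limit p0 = d * p_RR (point estimates only)."""
    p0 = d * p_RR
    return p_TR > p0


ratio = similarity_ratio(0.894, 0.884)
print(round(ratio, 3))            # 1.011
print(passes(0.894, 0.884, 0.8))  # True, since p0 = 0.8 * 0.884 = 0.7072
```

Lowering d from 0.8 to 0.7 lowers the limit for the same R-R index, which is why a larger share of trials would pass at the less stringent level.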