
**Krishna K Saha ^{*}**

Department of Mathematical Sciences, Central Connecticut State University, 1615 Stanley Street, New Britain, CT 06050, USA

**\*Corresponding Author:** Krishna K Saha, Department of Mathematical Sciences, Central Connecticut State University, 1615 Stanley Street, New Britain, CT 06050, USA

**Tel:**+1 860 832 2840

**E-mail:**[email protected]

**Received Date:** March 09, 2013; **Accepted Date:** April 08, 2013; **Published Date:** April 13, 2013

**Citation:** Saha KK (2013) Estimating the Treatment Effect in the Analysis of Extra-Dispersed Count Response Data from Clinical Trials. J Biomet Biostat 4:165. doi:10.4172/2155-6180.1000165

**Copyright:** © 2013 Saha KK. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.


Responses in the form of counts arise in many clinical trials and epidemiological studies, and are usually extra-dispersed. When one wishes to estimate the treatment effect in comparison with a placebo in clinical trials, confidence intervals are frequently used. In many clinical trials and epidemiological studies, it is of common interest to obtain a confidence interval for one of two quantities, the mean difference or the mean ratio. The preference for one measure over the other depends on the design of the study. In many situations, the mean ratio is more relevant than the difference of means. Confidence interval procedures for the mean difference between treatment and control groups in the analysis of such extra-dispersed counts have been studied recently, but no attention has been paid to the problem of confidence interval construction for the mean ratio. In this article, we develop several asymptotic confidence interval procedures for the mean ratio, using the delta method to extend the variance of a single mean estimate to the variance of the mean ratio estimate. The simulation studies indicate that all procedures perform reasonably well in terms of coverage. However, the interval based on the generalized estimating equation approach, using the logarithmic transformation, performs best in terms of coverage, expected width and location, and is preferable to the other intervals in most of the situations considered here. Finally, three real-life examples from clinical trials are analyzed to illustrate the proposed confidence interval procedures.

Count data; Delta method; Extra-dispersion; Generalized estimating equations; Mean ratio

Extra-dispersed count responses are frequent in many clinical trials and epidemiological studies. In many applications, responses in the form of counts, for example, the magnetic resonance imaging (MRI) lesion counts in multiple sclerosis patients [1], the number of adverse events occurring during a follow up period in a randomized clinical trial [2], the number of seizures in epileptics in a randomized clinical trial of the anti-epileptic drug [3], the number of new skin cancers in a randomized, double-blind, placebo-controlled clinical trial [4], the number of side effects in patients receiving a pharmacotherapy or a vaccine [5,6], and the number of times a patient used medical services in the previous year [7], are usually extra-dispersed; that is, the variance of such count responses is either greater or smaller than its mean (for example, **tables 2** and **4**). These data are often described by the appropriate parametric or semiparametric models [8-12], by taking into account the extra-dispersion. In addition, these models have also been applied to estimation and hypothesis testing, to assess the treatment effect [13-16]. An inadequate model assumption for the underlying data distribution may lead to making falsely significant inferences, and one must be careful when applying these distributions.

It is of common interest in such studies to obtain the confidence interval for one of the two quantities, mean difference (MD), and mean ratio (MR). However, little work has been done to investigate the confidence interval procedure to evaluate the efficacy and safety of treatment, in comparison with a placebo in the analysis of the extra-dispersed count data. In a recent study, Saha [17] developed several confidence interval procedures for the difference between two treatment means, in the analysis of extra-dispersed count data based on the generalized estimating equations (GEE) of Zeger and Liang [18]; the usual survey estimator studied by Rao and Scott [19]; and the procedures studied by Newcombe [20] and Beal [21]. He concluded that the confidence interval based on GEE performed the best in terms of coverage, expected width, and location.

The preference for the MR versus the MD in drawing inferences depends on the design of the study. In addition, in some situations, especially when the means are small, interval estimation of the MR is often preferable [22,23]. For instance, Francois et al. [24] analyzed the lesion count in multiple sclerosis, to assess the effect of the fingolimod treatment in the FREEDOMS trial. These data refer to the number of Gd-enhanced lesions counted on brain magnetic resonance imaging scans at baseline and months 6, 12, and 24. From **table 1** of Francois et al. [24], we see that the means for fingolimod and placebo at baseline and months 6, 12, and 24 are small (between 0.22 and 1.74). These means are very small (between 0.19 and 0.30) for the two different doses of the fingolimod treatment at months 6, 12, and 24. In order to assess the treatment effect in the analysis of these lesion counts, the choice of the confidence interval would be in terms of the ratio of the means for fingolimod and placebo. For further explanation concerning when the MR is more relevant than the difference of means, see Cox and Lewis [25].

φ_{1}=0.2494 and φ_{2}=0.1848

| μ_{2} | MR | GEE: CP (L, R) | GEE: W | Ratio: CP (L, R) | Ratio: W | NB: CP (L, R) | NB: W | GEE^{*}: CP (L, R) | GEE^{*}: W | Ratio^{*}: CP (L, R) | Ratio^{*}: W | NB^{*}: CP (L, R) | NB^{*}: W |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.5 | 1.0 | 94.7 (0.4, 4.9) | 0.885 | 94.7 (0.4, 4.9) | 0.890 | 94.7 (0.4, 4.9) | 0.887 | 95.4 (2.1, 2.5) | 0.830 | 95.5 (2.1, 2.4) | 0.835 | 95.5 (2.1, 2.5) | 0.832 |
| | 1.2 | 94.7 (0.4, 4.9) | 0.846 | 94.8 (0.4, 4.8) | 0.850 | 94.7 (0.4, 4.9) | 0.848 | 95.1 (2.3, 2.6) | 0.798 | 95.2 (2.2, 2.6) | 0.802 | 95.2 (2.3, 2.5) | 0.800 |
| | 1.4 | 94.5 (0.4, 5.1) | 0.818 | 94.7 (0.4, 5.0) | 0.822 | 94.7 (0.4, 5.0) | 0.820 | 95.3 (2.0, 2.6) | 0.774 | 95.5 (2.0, 2.6) | 0.778 | 95.4 (2.0, 2.6) | 0.776 |
| | 1.6 | 94.5 (0.4, 5.1) | 0.797 | 94.6 (0.4, 5.0) | 0.801 | 94.6 (0.4, 5.1) | 0.799 | 95.2 (2.0, 2.8) | 0.756 | 95.4 (1.9, 2.7) | 0.760 | 95.3 (2.0, 2.8) | 0.759 |
| | 1.8 | 94.8 (0.4, 4.9) | 0.780 | 94.8 (0.3, 4.8) | 0.785 | 94.8 (0.4, 4.9) | 0.783 | 95.3 (1.9, 2.7) | 0.742 | 95.4 (1.9, 2.7) | 0.746 | 95.4 (1.9, 2.7) | 0.744 |
| | 2.0 | 94.7 (0.3, 5.0) | 0.767 | 94.8 (0.3, 4.9) | 0.771 | 94.7 (0.4, 4.9) | 0.770 | 95.2 (2.1, 2.8) | 0.731 | 95.3 (2.0, 2.7) | 0.735 | 95.2 (2.0, 2.7) | 0.733 |
| 1.5 | 1.0 | 94.9 (1.1, 4.0) | 0.531 | 95.0 (1.1, 3.9) | 0.534 | 94.9 (1.1, 4.0) | 0.532 | 95.2 (2.4, 2.5) | 0.519 | 95.3 (2.3, 2.4) | 0.521 | 95.2 (2.3, 2.5) | 0.520 |
| | 1.2 | 95.0 (1.1, 3.9) | 0.513 | 95.1 (1.1, 3.8) | 0.516 | 95.1 (1.1, 3.9) | 0.515 | 95.3 (2.3, 2.5) | 0.502 | 95.4 (2.2, 2.4) | 0.505 | 95.4 (2.2, 2.4) | 0.503 |
| | 1.4 | 94.9 (1.2, 3.9) | 0.500 | 95.0 (1.1, 3.9) | 0.503 | 95.0 (1.1, 3.9) | 0.501 | 95.2 (2.4, 2.4) | 0.490 | 95.3 (2.3, 2.4) | 0.492 | 95.2 (2.3, 2.5) | 0.491 |
| | 1.6 | 95.0 (1.1, 4.0) | 0.490 | 95.1 (1.0, 3.9) | 0.493 | 95.0 (1.0, 3.9) | 0.491 | 95.3 (2.3, 2.5) | 0.481 | 95.4 (2.2, 2.4) | 0.483 | 95.3 (2.3, 2.4) | 0.482 |
| | 1.8 | 94.9 (1.1, 4.0) | 0.483 | 95.0 (1.0, 4.0) | 0.485 | 95.0 (1.0, 4.0) | 0.484 | 95.1 (2.3, 2.6) | 0.473 | 95.2 (2.2, 2.6) | 0.476 | 95.2 (2.2, 2.6) | 0.474 |
| | 2.0 | 95.0 (1.1, 4.0) | 0.476 | 95.0 (1.0, 3.9) | 0.479 | 94.9 (1.1, 4.0) | 0.477 | 95.3 (2.2, 2.5) | 0.467 | 95.4 (2.1, 2.5) | 0.470 | 95.4 (2.2, 2.5) | 0.468 |

φ_{1}=0.25 and φ_{2}=0.25

| μ_{2} | MR | GEE: CP (L, R) | GEE: W | Ratio: CP (L, R) | Ratio: W | NB: CP (L, R) | NB: W | GEE^{*}: CP (L, R) | GEE^{*}: W | Ratio^{*}: CP (L, R) | Ratio^{*}: W | NB^{*}: CP (L, R) | NB^{*}: W |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.5 | 1.0 | 94.5 (0.4, 5.1) | 0.892 | 94.7 (0.4, 5.0) | 0.897 | 94.6 (0.4, 5.0) | 0.894 | 95.5 (2.1, 2.4) | 0.836 | 95.6 (2.1, 2.4) | 0.840 | 95.5 (2.1, 2.4) | 0.838 |
| | 1.2 | 94.4 (0.4, 5.1) | 0.853 | 94.6 (0.4, 5.0) | 0.858 | 94.6 (0.4, 5.1) | 0.855 | 95.2 (2.2, 2.6) | 0.804 | 95.2 (2.2, 2.6) | 0.808 | 95.3 (2.2, 2.6) | 0.806 |
| | 1.4 | 94.5 (0.4, 5.2) | 0.825 | 94.6 (0.3, 5.1) | 0.830 | 94.5 (0.4, 5.1) | 0.828 | 95.4 (2.0, 2.6) | 0.781 | 95.5 (2.0, 2.6) | 0.785 | 95.4 (2.0, 2.6) | 0.783 |
| | 1.6 | 94.5 (0.5, 5.0) | 0.805 | 94.6 (0.4, 4.9) | 0.809 | 94.6 (0.4, 5.0) | 0.807 | 95.2 (2.1, 2.8) | 0.763 | 95.3 (2.0, 2.7) | 0.767 | 95.3 (2.0, 2.7) | 0.765 |
| | 1.8 | 94.5 (0.4, 5.1) | 0.788 | 94.7 (0.3, 5.0) | 0.793 | 94.6 (0.3, 5.1) | 0.791 | 95.5 (1.8, 2.7) | 0.749 | 95.5 (1.8, 2.7) | 0.753 | 95.5 (1.8, 2.7) | 0.752 |
| | 2.0 | 94.6 (0.3, 5.1) | 0.775 | 94.7 (0.3, 5.0) | 0.780 | 94.7 (0.3, 5.0) | 0.778 | 95.3 (2.0, 2.7) | 0.738 | 95.4 (2.0, 2.6) | 0.742 | 95.4 (2.0, 2.6) | 0.740 |
| 1.5 | 1.0 | 94.8 (1.1, 4.1) | 0.541 | 94.9 (1.1, 4.0) | 0.544 | 94.8 (1.1, 4.1) | 0.543 | 95.2 (2.3, 2.6) | 0.528 | 95.3 (2.3, 2.5) | 0.531 | 95.3 (2.3, 2.5) | 0.530 |
| | 1.2 | 94.9 (1.1, 4.0) | 0.524 | 95.0 (1.0, 4.0) | 0.527 | 94.9 (1.0, 4.1) | 0.525 | 95.2 (2.3, 2.5) | 0.512 | 95.3 (2.2, 2.5) | 0.515 | 95.3 (2.2, 2.5) | 0.513 |
| | 1.4 | 95.0 (1.0, 3.9) | 0.511 | 95.1 (1.0, 3.9) | 0.514 | 95.1 (1.0, 3.9) | 0.512 | 95.1 (2.3, 2.6) | 0.500 | 95.2 (2.3, 2.5) | 0.503 | 95.2 (2.3, 2.5) | 0.501 |
| | 1.6 | 94.9 (1.1, 4.0) | 0.501 | 95.0 (1.0, 3.9) | 0.504 | 95.0 (1.1, 4.0) | 0.503 | 95.2 (2.3, 2.5) | 0.491 | 95.3 (2.3, 2.5) | 0.493 | 95.2 (2.3, 2.5) | 0.492 |
| | 1.8 | 94.9 (1.1, 4.0) | 0.493 | 95.0 (1.1, 3.9) | 0.496 | 94.9 (1.1, 4.0) | 0.495 | 95.1 (2.3, 2.6) | 0.484 | 95.2 (2.3, 2.5) | 0.486 | 95.1 (2.3, 2.6) | 0.485 |
| | 2.0 | 94.9 (1.1, 3.9) | 0.487 | 95.1 (1.1, 3.9) | 0.490 | 95.0 (1.1, 4.0) | 0.489 | 95.3 (2.2, 2.5) | 0.478 | 95.4 (2.2, 2.5) | 0.480 | 95.3 (2.2, 2.5) | 0.479 |

**Table 1:** The coverage percentage (CP), left non-coverage percentage (L), right non-coverage percentage (R), and the average interval width (W) for the 95% nominal CIs, based on various methods using 10,000 replications with sample sizes of 100.

In this paper, we consider asymptotic confidence interval construction for the MR between two independent treatment groups in the analysis of extra-dispersed count data. Using large sample theory, we first develop three asymptotic interval estimators of the MR, which are direct generalizations of the confidence intervals for a single mean parameter based on the delta method. From the simulation studies given in a later section, we can see that these methods maintain the coverage well, but suffer from poor interval location, because the sampling distribution of the MR estimate can be highly skewed when sample sizes are not large enough. To overcome this issue, we also develop confidence interval procedures for the MR using the logarithmic transformation suggested by Katz et al. [26]. In Section 3, we conduct a simulation study to investigate the performance of the various confidence interval procedures with respect to their coverage probabilities, expected confidence widths and interval locations, based on the approach suggested by Newcombe [27]. In Section 4, we include three examples from clinical trials to illustrate the use of the proposed methods. A brief discussion is given in Section 5.

**CI Based on NB model**

Suppose that there are two comparison groups, the experimental treatment group (i=1) and the standard treatment group (i=2). For the i^{th} treatment group, let Y_{ij} (j=1, ..., m_{i}) be the count for the j^{th} individual. Given the unobserved variable η_{ij}, suppose that Y_{ij} follows a Poisson distribution with mean η_{ij}μ_{i}. We further assume that the η_{ij} independently follow a gamma distribution with mean 1 and variance ϕ_{i}. Then the marginal distribution of Y_{ij} is a negative binomial (NB) distribution, with mean E(Y_{ij}) = μ_{i} and variance var(Y_{ij}) = μ_{i}(1 + ϕ_{i}μ_{i}). Note that the parameter ϕ_{i} measures the extra variability compared to the Poisson distribution, and is usually called the extra-dispersion parameter. Several parametric forms of the NB distribution exist, and we use the form found in Saha and Paul [28]. It can be shown that the NB distribution tends to a simple Poisson distribution with mean μ_{i} as the parameter ϕ_{i} approaches zero.
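The stated marginal mean and variance can be checked with the laws of total expectation and total variance. The short derivation below is a standard argument added for illustration, and uses only the assumptions of the preceding paragraph (Y_{ij} given η_{ij} is Poisson with mean η_{ij}μ_{i}, and η_{ij} has mean 1 and variance ϕ_{i}):

$$
\begin{aligned}
E(Y_{ij}) &= E\{E(Y_{ij}\mid \eta_{ij})\} = \mu_{i}\,E(\eta_{ij}) = \mu_{i},\\
\mathrm{var}(Y_{ij}) &= E\{\mathrm{var}(Y_{ij}\mid \eta_{ij})\} + \mathrm{var}\{E(Y_{ij}\mid \eta_{ij})\}
 = \mu_{i}\,E(\eta_{ij}) + \mu_{i}^{2}\,\mathrm{var}(\eta_{ij})
 = \mu_{i}(1+\phi_{i}\mu_{i}).
\end{aligned}
$$

Setting ϕ_{i}=0 removes the second term and recovers the Poisson variance μ_{i}.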

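The unbiased estimate of μ_{i} is $\hat{\mu}_{i} = \bar{y}_{i} = \sum_{j=1}^{m_{i}} y_{ij}/m_{i}$, and $\mathrm{var}(\hat{\mu}_{i}) = \mu_{i}(1+\phi_{i}\mu_{i})/m_{i}$. As $\hat{\mu}_{i}$ (i=1,2) is an unbiased and consistent estimate of μ_{i}, it is natural to use $\widehat{MR} = \hat{\mu}_{1}/\hat{\mu}_{2}$ as an estimate of MR = μ_{1}/μ_{2}, and a confidence interval for MR can be constructed from the sampling distribution of $\widehat{MR}$. Using the delta method, the asymptotic variance of $\widehat{MR}$ is

$$\mathrm{var}(\widehat{MR}) = \left(\frac{\mu_{1}}{\mu_{2}}\right)^{2}\left[\frac{\mathrm{var}(\hat{\mu}_{1})}{\mu_{1}^{2}} + \frac{\mathrm{var}(\hat{\mu}_{2})}{\mu_{2}^{2}}\right]. \qquad (1)$$

It can be proved that $\widehat{MR}$ is asymptotically normally distributed with mean MR and variance $\mathrm{var}(\widehat{MR})$ as $m_{1} \to \infty$ and $m_{2} \to \infty$.

Then, an asymptotic (1-α)100% confidence interval (called the NB method) for the MR is given by

$$\widehat{MR} \pm z_{\alpha/2}\sqrt{\widehat{\mathrm{var}}(\widehat{MR})}, \qquad (2)$$

where $z_{\alpha/2}$ is the upper (α/2)th percentile of the standard normal distribution, and $\widehat{\mathrm{var}}(\widehat{MR})$ is given by

$$\widehat{\mathrm{var}}(\widehat{MR}) = \widehat{MR}^{2}\left[\frac{1+\hat{\phi}_{1}\hat{\mu}_{1}}{m_{1}\hat{\mu}_{1}} + \frac{1+\hat{\phi}_{2}\hat{\mu}_{2}}{m_{2}\hat{\mu}_{2}}\right], \qquad (3)$$

where $\hat{\mu}_{i}$ (i=1,2) is the unbiased estimate of μ_{i} and $\hat{\phi}_{i}$ (i=1,2) is the maximum likelihood (ML) estimate of ϕ_{i}. The ML estimate of ϕ_{i} can be obtained by maximizing the log-likelihood of the NB model, or by solving the estimating equations discussed by Saha and Paul [28].

Note that the sampling distribution of $\widehat{MR} = \hat{\mu}_{1}/\hat{\mu}_{2}$ can be highly skewed, especially when sample sizes are not large enough. In such a case, the interval estimator of MR obtained in (2) may not perform well; in particular, this interval may not have satisfactory interval location. That is, the interval in (2) may be too distally located. Following Katz et al. [26], we use the logarithmic transformation to improve the normal approximation of this sampling distribution. Again using the delta method, the asymptotic variance of $\log(\widehat{MR})$ is given by

$$\mathrm{var}\{\log(\widehat{MR})\} = \frac{\mathrm{var}(\hat{\mu}_{1})}{\mu_{1}^{2}} + \frac{\mathrm{var}(\hat{\mu}_{2})}{\mu_{2}^{2}}. \qquad (4)$$

Hence, we obtain an asymptotic (1-α)100% confidence interval (called the NB^{*} method) for the MR, given by

$$\exp\left\{\log(\widehat{MR}) \pm z_{\alpha/2}\sqrt{\widehat{\mathrm{var}}\{\log(\widehat{MR})\}}\right\}, \qquad (5)$$

where

$$\widehat{\mathrm{var}}\{\log(\widehat{MR})\} = \frac{1+\hat{\phi}_{1}\hat{\mu}_{1}}{m_{1}\hat{\mu}_{1}} + \frac{1+\hat{\phi}_{2}\hat{\mu}_{2}}{m_{2}\hat{\mu}_{2}}. \qquad (6)$$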
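As a rough numerical check of the NB and NB^{*} intervals, the following Python sketch evaluates the delta-method forms under the assumption that var(μ̂_{i}) = μ_{i}(1 + ϕ_{i}μ_{i})/m_{i} and that the variance of log(MR̂) is the sum of var(μ̂_{i})/μ_{i}² over the two groups. The function names are illustrative and not taken from the paper. The inputs are the Year 1 summaries for s.c. IFN beta-1a and control reported in table 2; because those summaries are rounded to four decimals, the limits can be expected to agree with table 3 only to about three decimals.

```python
import math
from statistics import NormalDist


def nb_var_of_mean(mu, phi, m):
    """Model-based variance of a sample mean under the NB model: mu(1 + phi*mu)/m."""
    return mu * (1.0 + phi * mu) / m


def mr_intervals(mu1, v1, mu2, v2, alpha=0.05):
    """Return (plain, log_scale) intervals for MR = mu1/mu2.

    v1 and v2 are estimated variances of the two mean estimates.
    """
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)      # upper alpha/2 normal quantile
    mr = mu1 / mu2
    se_log = math.sqrt(v1 / mu1 ** 2 + v2 / mu2 ** 2)  # s.e. of log(MR-hat)
    se_mr = mr * se_log                                # delta-method s.e. of MR-hat
    plain = (mr - z * se_mr, mr + z * se_mr)
    log_scale = (mr * math.exp(-z * se_log), mr * math.exp(z * se_log))
    return plain, log_scale


# Year 1, s.c. IFN beta-1a (group 1) versus control (group 2), from table 2.
v1 = nb_var_of_mean(mu=0.3696, phi=1.1538, m=46)
v2 = nb_var_of_mean(mu=1.5294, phi=-0.1344, m=51)
nb, nb_star = mr_intervals(0.3696, v1, 1.5294, v2)

# Table 3 reports (0.0964, 0.3869) for NB and (0.1325, 0.4408) for NB*.
print("NB :", tuple(round(x, 4) for x in nb))
print("NB*:", tuple(round(x, 4) for x in nb_star))
```

The plain interval is symmetric around MR̂ on the original scale, whereas the log-scale interval is symmetric around log(MR̂), which is why its lower limit can never fall below zero.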

**CI Based on sandwich variance estimator**

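The robust estimator, known as the sandwich estimator, of the variance of a regression estimator can be obtained by using the generalized estimating equation (GEE) approach introduced by Zeger and Liang [18]. Saha [17] applied this approach to extra-dispersed count data to obtain an estimate of the mean parameter and a sandwich estimate of its variance. From Saha [17], we obtain an estimate of μ_{i} (i=1,2) as $\hat{\mu}_{i} = \bar{y}_{i} = \sum_{j=1}^{m_{i}} y_{ij}/m_{i}$, and a sandwich estimator of the variance of $\hat{\mu}_{i}$ (i=1,2) given by

$$\widehat{\mathrm{var}}_{S}(\hat{\mu}_{i}) = \frac{1}{m_{i}^{2}}\sum_{j=1}^{m_{i}}(y_{ij}-\bar{y}_{i})^{2}.$$

Note that this variance formula does not involve the extra-dispersion parameters ϕ_{i} (i=1,2). Now, using this sandwich estimator as an estimate of $\mathrm{var}(\hat{\mu}_{i})$ in (1), we also obtain an asymptotic (1-α)100% confidence interval (called the GEE method) for the MR based on equation (2), where

$$\widehat{\mathrm{var}}(\widehat{MR}) = \widehat{MR}^{2}\left[\frac{\widehat{\mathrm{var}}_{S}(\hat{\mu}_{1})}{\hat{\mu}_{1}^{2}} + \frac{\widehat{\mathrm{var}}_{S}(\hat{\mu}_{2})}{\hat{\mu}_{2}^{2}}\right].$$

Similarly, using $\widehat{\mathrm{var}}_{S}(\hat{\mu}_{i})$ as an estimate of $\mathrm{var}(\hat{\mu}_{i})$ in (4), an asymptotic (1-α)100% confidence interval (called the GEE^{*} method) for the MR can be obtained based on equation (5), where

$$\widehat{\mathrm{var}}\{\log(\widehat{MR})\} = \frac{\widehat{\mathrm{var}}_{S}(\hat{\mu}_{1})}{\hat{\mu}_{1}^{2}} + \frac{\widehat{\mathrm{var}}_{S}(\hat{\mu}_{2})}{\hat{\mu}_{2}^{2}}.$$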
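A minimal sketch of the GEE and GEE^{*} intervals computed from raw counts follows. It assumes the sandwich variance of a sample mean takes the form Σ_j (y_{ij} − ȳ_{i})²/m_{i}², which is the form that reproduces the GEE rows of table 3 from the summaries in table 2. The two count vectors are hypothetical and serve only to exercise the code, and the function names are illustrative.

```python
import math
from statistics import NormalDist


def sandwich_var(y):
    """Sandwich variance of the sample mean: sum((y - ybar)^2) / m^2.

    Note the divisor m^2, not m(m - 1); no dispersion parameter is needed.
    """
    m = len(y)
    ybar = sum(y) / m
    return sum((v - ybar) ** 2 for v in y) / m ** 2


def gee_intervals(y1, y2, alpha=0.05):
    """Return (MR-hat, GEE interval, GEE* interval) for MR = mu1/mu2."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    mu1 = sum(y1) / len(y1)
    mu2 = sum(y2) / len(y2)
    mr = mu1 / mu2
    # Variance of log(MR-hat): sum of squared coefficients of variation.
    var_log = sandwich_var(y1) / mu1 ** 2 + sandwich_var(y2) / mu2 ** 2
    half = z * math.sqrt(var_log)
    gee = (mr * (1.0 - half), mr * (1.0 + half))            # original scale
    gee_star = (mr * math.exp(-half), mr * math.exp(half))  # log scale, back-transformed
    return mr, gee, gee_star


# Hypothetical lesion counts, for illustration only.
treated = [0, 1, 0, 2, 0, 0, 1, 3, 0, 1]
control = [2, 1, 3, 0, 4, 2, 1, 5, 2, 2]
mr_hat, gee, gee_star = gee_intervals(treated, control)
print(round(mr_hat, 4), gee, gee_star)
```

With samples this small the original-scale interval can extend below zero, which is one practical reason to prefer the log-scale version.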

**CI Based on variance of a ratio estimator**

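The variance of an estimate of the mean parameter μ_{i} (i=1,2) can also be obtained using the results of Cochran [29]. Saha [17] computed this variance by expressing the estimate of the mean parameter μ_{i} as the ratio of two means, $\hat{\mu}_{i} = \bar{y}_{i}/\bar{t}_{i}$, where $\bar{y}_{i} = \sum_{j=1}^{m_{i}} y_{ij}/m_{i}$ and $\bar{t}_{i} = \sum_{j=1}^{m_{i}} t_{ij}/m_{i}$. Following Saha [17], an estimator of $\mathrm{var}(\hat{\mu}_{i})$ is given by

$$\widehat{\mathrm{var}}_{R}(\hat{\mu}_{i}) = \frac{m_{i}}{(m_{i}-1)\,t_{i\cdot}^{2}}\sum_{j=1}^{m_{i}}(y_{ij}-\hat{\mu}_{i}t_{ij})^{2}, \qquad t_{i\cdot} = \sum_{j=1}^{m_{i}} t_{ij},$$

where t_{ij} is the surface area, or volume, or any other appropriate measure of size. However, in some situations, information on t_{ij} may not be available. In such cases, one can set t_{ij} equal to a constant, and without loss of generality assume t_{ij}=1 for all i and j. It follows that $\bar{t}_{i} = 1$ and $\hat{\mu}_{i} = \bar{y}_{i}$, so that

$$\widehat{\mathrm{var}}_{R}(\hat{\mu}_{i}) = \frac{1}{m_{i}(m_{i}-1)}\sum_{j=1}^{m_{i}}(y_{ij}-\bar{y}_{i})^{2}.$$

Like the sandwich variance, $\widehat{\mathrm{var}}_{R}(\hat{\mu}_{i})$ does not involve the extra-dispersion parameters ϕ_{i} (i=1,2). Now, using $\widehat{\mathrm{var}}_{R}(\hat{\mu}_{i})$ as an estimate of $\mathrm{var}(\hat{\mu}_{i})$ in (1), an asymptotic (1-α)100% confidence interval (called the Ratio method) for the MR can also be obtained based on equation (2), where

$$\widehat{\mathrm{var}}(\widehat{MR}) = \widehat{MR}^{2}\left[\frac{\widehat{\mathrm{var}}_{R}(\hat{\mu}_{1})}{\hat{\mu}_{1}^{2}} + \frac{\widehat{\mathrm{var}}_{R}(\hat{\mu}_{2})}{\hat{\mu}_{2}^{2}}\right].$$

Similarly, using $\widehat{\mathrm{var}}_{R}(\hat{\mu}_{i})$ as an estimate of $\mathrm{var}(\hat{\mu}_{i})$ in (4), we obtain an asymptotic (1-α)100% confidence interval (called the Ratio^{*} method) for the MR based on equation (5), where

$$\widehat{\mathrm{var}}\{\log(\widehat{MR})\} = \frac{\widehat{\mathrm{var}}_{R}(\hat{\mu}_{1})}{\hat{\mu}_{1}^{2}} + \frac{\widehat{\mathrm{var}}_{R}(\hat{\mu}_{2})}{\hat{\mu}_{2}^{2}}.$$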
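The ratio-estimator variance can be sketched as below. The general form with size measures t_{ij} is an assumption here: it is the usual survey-sampling variance of a ratio estimator without a finite population correction, m/((m−1)T²) Σ_j (y_j − μ̂ t_j)² with T = Σ_j t_j, and it may differ in detail from the expression used in the paper. With all t_{ij} = 1 it reduces to the sample variance divided by m, which is the form that reproduces the Ratio rows of table 3. The function name and the toy data are hypothetical.

```python
from statistics import variance


def ratio_var(y, t=None):
    """Variance of the ratio estimator mu-hat = sum(y) / sum(t).

    Assumed form: m / ((m - 1) * T^2) * sum((y_j - mu_hat * t_j)^2), T = sum(t).
    If t is omitted, every size measure is taken to be 1.
    """
    m = len(y)
    if t is None:
        t = [1.0] * m
    total_t = sum(t)
    mu_hat = sum(y) / total_t
    resid_ss = sum((yj - mu_hat * tj) ** 2 for yj, tj in zip(y, t))
    return m * resid_ss / ((m - 1) * total_t ** 2)


counts = [0, 1, 2, 3]  # toy data, for illustration only

# With unit sizes the estimator equals s^2 / m.
unit = ratio_var(counts)
print(unit, variance(counts) / len(counts))

# Doubling every size measure halves the rate and quarters its variance.
doubled = ratio_var(counts, t=[2.0] * len(counts))
print(doubled)
```

Compared with the sandwich form, the only difference under unit sizes is the divisor m(m − 1) in place of m², so the two methods converge as the sample size grows.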

The performance of the six proposed confidence interval (CI) methods for the MR (NB, NB*, GEE, GEE*, Ratio and Ratio*) was assessed through simulations in terms of the coverage probabilities, expected confidence widths, and the distal and mesial non-coverage probabilities. In this study, we considered the following sample sizes: (m_{1},m_{2})={(30,30), (50,50), (100,100)} for the balanced designs, and (m_{1},m_{2})={(30,50), (50,80)} for the unbalanced designs. For the mean parameters, we considered μ_{2}=0.5, 1.5, 3.0 (these are very similar to the real-life applications in **tables 2** and **4**) and μ_{1}=MR×μ_{2}, where MR=1, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0. The common extra-dispersion parameters (ϕ_{1},ϕ_{2})=(0.25,0.25) were considered for both the treatment and control groups, and the unequal extra-dispersion parameters ϕ_{1}=0.2494 for the treatment group and ϕ_{2}=0.1848 for the control group were considered based on the ML estimates of ϕ_{1} and ϕ_{2} from **table 6**. For each combination of (m,μ,ϕ), data for both groups were generated from an NB distribution using the IMSL subroutine RNNBN.
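The study itself used the IMSL routine RNNBN. As a stand-in for readers without IMSL, the Python sketch below draws NB counts through the gamma-Poisson mixture that defines the model: a gamma variable with mean 1 and variance ϕ multiplies the Poisson mean. This is a swapped-in technique, not the author's generator; the helper names, the seed, and the Poisson sampler (Knuth's multiplication method, adequate for the small means considered here) are all choices made for illustration.

```python
import math
import random


def rpois(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    limit = math.exp(-lam)
    k = 0
    p = rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def rnegbin(mu, phi, m, rng):
    """Draw m counts with mean mu and variance mu * (1 + phi * mu)."""
    sample = []
    for _ in range(m):
        # gammavariate(shape, scale): shape 1/phi and scale phi give mean 1, variance phi.
        eta = rng.gammavariate(1.0 / phi, phi) if phi > 0 else 1.0
        sample.append(rpois(eta * mu, rng))
    return sample


rng = random.Random(2013)             # arbitrary seed, for reproducibility only
mr, mu2, phi = 1.4, 1.5, 0.25         # one parameter combination from the design
y1 = rnegbin(mr * mu2, phi, 100, rng)  # treatment group, mean MR * mu2
y2 = rnegbin(mu2, phi, 100, rng)       # control group, mean mu2
print(sum(y1) / len(y1), sum(y2) / len(y2))
```

Repeating the last three lines 10,000 times per parameter combination, and computing an interval from each pair of samples, gives the raw material for the coverage summaries described later in this section.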

| Follow-Up Period | Treatment Arms | Size | Mean | Variance | ML estimate of μ | ML estimate of φ |
|---|---|---|---|---|---|---|
| Year 1 | Control | 51 | 1.5294 | 1.3341 | 1.5294 | -0.1344 |
| | s.c. IFN beta-1a | 46 | 0.3696 | 0.5048 | 0.3696 | 1.1538 |
| | i.m. IFN beta-1a | 46 | 1.1522 | 1.3319 | 1.1522 | 0.1420 |
| | GA | 48 | 0.7917 | 0.7642 | 0.7917 | -0.0853 |
| Year 2 | Control | 51 | 2.9608 | 4.5584 | 2.9608 | 0.2559 |
| | s.c. IFN beta-1a | 46 | 0.7174 | 0.7850 | 0.7174 | 0.0818 |
| | i.m. IFN beta-1a | 47 | 1.6596 | 2.2294 | 1.6596 | 0.2326 |
| | GA | 48 | 1.2917 | 1.4876 | 1.2917 | 0.1225 |

**Table 2:** Summary statistics and the maximum likelihood estimates of the model parameters for MRI cortical lesions data of example 1.

| Follow-Up Period | Comparison Groups | Method | Lower CI | Upper CI | Width |
|---|---|---|---|---|---|
| Year 1 | s.c. IFN beta-1a VS control | NB | 0.0964 | 0.3869 | 1.3901 |
| | | NB^{*} | 0.1325 | 0.4408 | 1.2024 |
| | | GEE | 0.0999 | 0.3834 | 1.3449 |
| | | GEE^{*} | 0.1344 | 0.4344 | 1.1732 |
| | | Ratio | 0.0983 | 0.3849 | 1.3645 |
| | | Ratio^{*} | 0.1335 | 0.4372 | 1.1860 |
| | i.m. IFN beta-1a VS control | NB | 0.4886 | 1.0180 | 0.7340 |
| | | NB^{*} | 0.5301 | 1.0705 | 0.7027 |
| | | GEE | 0.4880 | 1.0187 | 0.7360 |
| | | GEE^{*} | 0.5297 | 1.0715 | 0.7045 |
| | | Ratio | 0.4851 | 1.0215 | 0.7446 |
| | | Ratio^{*} | 0.5277 | 1.0755 | 0.7120 |
| | GA VS control | NB | 0.3286 | 0.7067 | 0.7658 |
| | | NB^{*} | 0.3593 | 0.7458 | 0.7304 |
| | | GEE | 0.3256 | 0.7097 | 0.7792 |
| | | GEE^{*} | 0.3572 | 0.7502 | 0.7421 |
| | | Ratio | 0.3236 | 0.7117 | 0.7882 |
| | | Ratio^{*} | 0.3558 | 0.7531 | 0.7498 |
| Year 2 | s.c. IFN beta-1a VS control | NB | 0.1430 | 0.3416 | 0.8708 |
| | | NB^{*} | 0.1608 | 0.3650 | 0.8196 |
| | | GEE | 0.1445 | 0.3401 | 0.8562 |
| | | GEE^{*} | 0.1618 | 0.3628 | 0.8075 |
| | | Ratio | 0.1434 | 0.3412 | 0.8667 |
| | | Ratio^{*} | 0.1611 | 0.3644 | 0.8162 |
| | i.m. IFN beta-1a VS control | NB | 0.3721 | 0.7489 | 0.6994 |
| | | NB^{*} | 0.4005 | 0.7845 | 0.6722 |
| | | GEE | 0.3805 | 0.7405 | 0.6659 |
| | | GEE^{*} | 0.4065 | 0.7728 | 0.6423 |
| | | Ratio | 0.3786 | 0.7424 | 0.6734 |
| | | Ratio^{*} | 0.4052 | 0.7754 | 0.6491 |
| | GA VS control | NB | 0.2874 | 0.5851 | 0.7111 |
| | | NB^{*} | 0.3101 | 0.6137 | 0.6826 |
| | | GEE | 0.2927 | 0.5798 | 0.6835 |
| | | GEE^{*} | 0.3139 | 0.6062 | 0.6581 |
| | | Ratio | 0.2912 | 0.5813 | 0.6912 |
| | | Ratio^{*} | 0.3129 | 0.6083 | 0.6649 |

**Table 3:** 95% confidence intervals of the MR=*μ*_{1} / *μ*_{2}, with the confidence widths, by all six methods for the MRI cortical lesions data of example 1.

| Follow-Up Period | Treatment Arms | Size | Mean | Variance | ML estimate of μ | ML estimate of φ |
|---|---|---|---|---|---|---|
| Year 1 | Placebo | 827 | 0.2709 | 0.7619 | 0.2709 | 4.2018 |
| | Beta-carotene | 856 | 0.2979 | 0.6468 | 0.2979 | 4.2009 |
| Year 2 | Placebo | 803 | 0.2403 | 0.4771 | 0.2403 | 3.5114 |
| | Beta-carotene | 827 | 0.2612 | 0.4571 | 0.2612 | 2.9188 |
| Year 3 | Placebo | 776 | 0.2474 | 0.6071 | 0.2474 | 4.6972 |
| | Beta-carotene | 794 | 0.3154 | 1.2643 | 0.2859 | 6.0556 |
| Year 4 | Placebo | 699 | 0.2332 | 0.6117 | 0.2332 | 5.6482 |
| | Beta-carotene | 688 | 0.3154 | 1.2643 | 0.3154 | 4.8770 |
| Year 5 | Placebo | 419 | 0.2721 | 0.7153 | 0.2721 | 4.2656 |
| | Beta-carotene | 392 | 0.2985 | 0.8033 | 0.2985 | 3.5507 |

**Table 4:** Summary statistics and the maximum likelihood estimates of the model parameters for skin cancer data of example 2.

Ten thousand data sets were produced to compute the coverage probability (CP), the expected confidence width (ECW), the distal non-coverage probability (DNCP), estimated by the proportion of intervals that missed the true parameter value, MR, from the left, and the mesial non-coverage probability (MNCP), estimated by the proportion of intervals that missed MR from the right. For each given combination, the coverage probability is estimated by the proportion of intervals that included the true value of the parameter of interest, MR. Note that confidence intervals (CIs) based on ML estimates of the extra-dispersion parameters did not exist for some samples [28], and these samples were discarded. Further note that a confidence interval is good if its CP is close to the nominal coverage level. Given that the CPs are well controlled, one prefers those CIs which yield shorter ECWs on average. For the ratio measure, confidence interval width is best measured on a log scale. As a result, the ECWs are computed as the average difference between the logs of the upper and lower limits of the 10,000 confidence intervals. In addition to CPs and ECWs, Newcombe [27] also suggested assessing interval location, that is, checking whether a CI that misses the true value of the parameter of interest lies completely above or below it. This can be measured by the index MNCP/(DNCP+MNCP), which takes values in the interval [0,1].
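The performance measures described above can be tallied from a list of simulated intervals as in the following sketch. Two points are assumptions rather than statements from the paper: a "left" miss is read as the true MR falling below the lower limit and a "right" miss as the true MR falling above the upper limit, and intervals with a non-positive lower limit are left out of the log-scale width average because their log width is undefined. The function name and the four example intervals are hypothetical.

```python
import math


def summarize(intervals, true_mr):
    """Coverage, left/right non-coverage, log-scale width and location index."""
    n = len(intervals)
    left = sum(1 for lo, hi in intervals if true_mr < lo)    # assumed DNCP event
    right = sum(1 for lo, hi in intervals if true_mr > hi)   # assumed MNCP event
    # Width on the log scale; only defined when the lower limit is positive.
    widths = [math.log(hi) - math.log(lo) for lo, hi in intervals if lo > 0]
    misses = left + right
    return {
        "CP": (n - misses) / n,
        "L": left / n,
        "R": right / n,
        "ECW": sum(widths) / len(widths) if widths else float("nan"),
        # MNCP / (DNCP + MNCP); undefined when every interval covers.
        "index": right / misses if misses else None,
    }


# Four made-up intervals around a true MR of 1.0.
demo = [(0.5, 1.5), (1.2, 2.0), (0.2, 0.8), (0.9, 1.1)]
result = summarize(demo, true_mr=1.0)
print(result)
```

In a full study the list would hold the 10,000 intervals from one method at one parameter combination, and the function would be called once per method.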

All six confidence intervals evaluated here are two-sided 95% intervals for the MR corresponding to the different sets of parameter combinations discussed above. For the results corresponding to (m_{1},m_{2})={(30, 30), (50, 80)} and μ_{2}=3, we do not observe any substantial difference, so these are omitted. Therefore, we present the simulation results only for two cases of balanced designs, one case of unbalanced designs, and two values of the mean parameter μ_{2}. However, a complete list of simulation results can be obtained from the author's website.

The simulation results for the CPs, ECWs and symmetry of coverage for all six methods for both balanced and unbalanced designs are reported in **figures 1**-**3**. Note here that each box plot was constructed for various MR values between 1 and 2, with an increment of 0.1. The horizontal line for plots (a)-(d) indicates the coverage probability of 0.95. Furthermore, the horizontal lines for plots (i)-(l) indicate the proportions of 0.40 and 0.60, respectively. For the location property of the interval procedure, following Newcombe [27], we classify the index measure MNCP/(DNCP+MNCP) as satisfactory if it is between 0.4 and 0.6, the interval is too mesially located if it is below 0.4, and too distally located if it is above 0.6. Based on the simulation results in **figures 1**-**3**, we observe the following:

**Coverage probability**

In general, all six interval methods perform satisfactorily in terms of coverage, in the sense that the coverage probabilities for all methods are between 93% and 97% in almost all situations. As expected, as sample sizes increase, the coverage probabilities become closer to 0.95, the nominal level. Irrespective of equal or unequal dispersions, the coverage probabilities for the interval methods using logarithmic transformation (GEE*, Ratio*, and NB*) are slightly better than those for the interval methods without this transformation (GEE, Ratio, and NB). Although there is no overall winner in terms of coverage, the GEE* confidence interval provides slightly better coverage, in the sense that it keeps the coverage probabilities close to the nominal level in most situations.

**Width of the CI**

As expected, the expected confidence widths for all methods become smaller as sample sizes increase (for example, **Figures 1e** and **3e**), and as the mean parameters become larger (for example, **Figures 1e** and **1f**). The confidence intervals using logarithmic transformation (GEE*, Ratio*, and NB*) provide considerably shorter widths than the other interval methods, specifically for smaller mean parameters (for example, **Figure 1e**). For larger mean parameters, the ECWs for all methods become very similar; however, the intervals using logarithmic transformation have slightly shorter ECWs than the others. Overall, the GEE* method has some edge and generally yields shorter confidence widths.

**Symmetry of coverage**

Irrespective of the sample sizes and the other parameter combinations, the interval methods without logarithmic transformation (GEE, Ratio, and NB) show strong evidence of asymmetry; that is, the right non-coverage probabilities are much larger than the left non-coverage probabilities. More specifically, the index MNCP/(DNCP+MNCP) for these intervals is very close to 1, indicating that these intervals are too distally located. However, the asymmetric behavior of these confidence intervals improves slightly for larger sample sizes (for example, **Figures 1i** and **3i**). The values of this index for the interval methods using logarithmic transformation (GEE*, Ratio*, and NB*) are generally between 0.4 and 0.6, indicating that these intervals have satisfactory interval locations in almost all situations. That is, the left and right non-coverage probabilities for these three intervals are very similar.

In addition to **figures 1**-**3**, the simulation results for selected parameter combinations are also presented in **table 1**, but only for the balanced design with sample sizes of 100. From **table 1**, it can be seen clearly that all methods show nearly identical empirical coverage probabilities, and maintain the nominal coverage level of 95% reasonably well. As expected, ECWs for all methods decrease as the mean parameters increase. Irrespective of parameter combinations, the GEE*, Ratio*, and NB* methods provide shorter confidence widths than the other methods in almost all situations. In terms of ECWs, the GEE*, Ratio*, and NB* intervals are quite similar; however, the GEE* method has somewhat shorter widths. The left non-coverage probabilities for the GEE, Ratio and NB methods are about 1% or less, whereas the right non-coverage probabilities for these are between 4% and 5%, which is evidence of asymmetry. However, the left and right non-coverage probabilities for the GEE*, Ratio*, and NB* intervals are almost identical, which is evidence of symmetry.
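To make the location criterion concrete, the sketch below computes the index MNCP/(DNCP+MNCP) from a pair of left and right non-coverage percentages and applies the 0.4 and 0.6 cut-offs attributed to Newcombe [27] earlier in this section. The inputs are the GEE and GEE^{*} entries from the first row of table 1 (unequal dispersions, μ_{2}=0.5, MR=1.0); the function names are illustrative only.

```python
def location_index(left_pct, right_pct):
    """MNCP / (DNCP + MNCP), with left = DNCP and right = MNCP."""
    return right_pct / (left_pct + right_pct)


def classify(index):
    """Apply the 0.4 / 0.6 cut-offs used in the text."""
    if index < 0.4:
        return "too mesial"
    if index > 0.6:
        return "too distal"
    return "satisfactory"


# (L, R) percentages from the first row of table 1.
gee_index = location_index(0.4, 4.9)        # GEE:  L = 0.4, R = 4.9
gee_star_index = location_index(2.1, 2.5)   # GEE*: L = 2.1, R = 2.5

print(round(gee_index, 3), classify(gee_index))
print(round(gee_star_index, 3), classify(gee_star_index))
```

For these inputs the untransformed GEE interval has an index above 0.9, while the log-transformed version sits close to 0.54, in line with the asymmetry and symmetry noted above.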

This section illustrates the analysis of three real-life data sets obtained from clinical trials. The first example is from the multiple sclerosis longitudinal studies reported in Sormani et al. [30]. Second, we consider the example from the skin cancer prevention study of Greenberg et al. [4], and then we revisit the Type II clinical data example given in Saha [17].

**Example 1: MRI cortical lesions data**

Multiple sclerosis (MS) is a chronic inflammatory disease involving the central nervous system (CNS). To diagnose and monitor disease activity in clinical trials and practice, MRI is widely used to detect white matter, gray matter, and cortical lesions in specific MRI sequences. The main goal of these studies is to lessen the degree of inflammation within the CNS, which can lessen the number of lesions, indicating ultimately progress in disability. Sormani et al. [30] studied new cortical lesions developed by MS patients over the follow up period. This clinical study was conducted on a group of 191 relapsing remitting (RR) MS patients who were randomized into four different groups. Fifty patients did not receive any treatment, 46 were given subcutaneous (s.c.) interferon (IFN) beta-1a (44 mcg three times weekly), 47 received intramuscular (i.m.) IFN beta-1a (30 mcg weekly), and the remaining patients received glatiramer acetate (GA) (20 mg daily). All 191 subjects were evaluated by MRI at baseline, 12 and 24 months, and the number of new cortical lesions was counted on the 12- and 24-month scans, as compared to the baseline. The descriptive statistics of the number of new cortical lesions over 1 and 2 years, as well as the maximum likelihood estimates of the model parameters for all four treatment arms, are reported in **table 2**. The hypothesis tested was whether the new treatment has an effect in reducing the mean number of lesions. We used these data to compute the confidence intervals for the MR between treatment and control groups by all six proposed methods, and the results are summarized in **table 3**, which shows that the intervals lie entirely below one except for the i.m. IFN beta-1a treatment after 1 year. All six methods therefore lead to the same conclusion: all treatments have significant effects in reducing the mean number of new cortical lesions over the follow up period, except for the i.m. IFN beta-1a treatment after 1 year. Similar conclusions were obtained by Sormani et al. [30]. Note that the intervals based on logarithmic transformation using GEE have the shortest widths compared to the others in almost all cases.

**Example 2: Skin cancer data**

Greenberg et al. [4] conducted the Skin Cancer Prevention Study. This was a randomized, double-blind, placebo-controlled clinical trial of beta-carotene to prevent basal-cell and squamous-cell cancers of the skin in high risk people. A group of 1805 patients were randomized to either a placebo or 50 mg of beta-carotene per day, over the follow up period of 5 years. Patients were examined once a year and biopsied if a tumor was suspected, to determine the number of new cancerous lesions occurring since the last exam. The data from this study consist of counts of the number of new skin cancers per year, over the follow up period of 5 years. The complete dataset on 1683 patients, comprising a total of 7081 measurements, is given in Fitzmaurice et al. [31]. The summary of the data for each year, as well as the maximum likelihood estimates of the model parameters for the placebo and beta-carotene treatments, are presented in **table 4**, which shows that the mean number of new skin cancers for each data set is very small (that is, between 0.23 and 0.32). This table also shows that the variance for each data set is much larger than its mean, indicating extra-dispersion. Because the mean for each group per year is very small, it is preferable to use the confidence interval procedures for the MR to assess the effect of the treatment. Therefore, we computed six types of 95% confidence intervals for this ratio, and the results are given in **table 5**. The intervals include the value of 1, which indicates that the beta-carotene treatment was not effective on the high risk patients for any of the five follow up years. The CI based on the NB model using logarithmic transformation compares well with the other intervals, since it provides the shortest confidence width for all data sets.

95% Confidence Interval for MR = μ_{1} / μ_{2}

| Follow-Up Period | NB | NB^{*} | GEE | GEE^{*} | Ratio | Ratio^{*} |
|---|---|---|---|---|---|---|
| Year 1 | (0.8076, 1.3920) | (0.8432, 1.4345) | (0.7871, 1.4126) | (0.8276, 1.4616) | (0.7869, 1.4127) | (0.8275, 1.4618) |
| Year 2 | (0.8032, 1.3702) | (0.8372, 1.4106) | (0.7981, 1.3753) | (0.8332, 1.4173) | (0.7979, 1.3755) | (0.8331, 1.4175) |
| Year 3 | (0.8098, 1.5011) | (0.8567, 1.5584) | (0.7635, 1.5475) | (0.8231, 1.6222) | (0.7632, 1.5477) | (0.8229, 1.6225) |
| Year 4 | (0.9258, 1.7793) | (0.9866, 1.8543) | (0.8601, 1.8451) | (0.9398, 1.9467) | (0.8597, 1.8454) | (0.9395, 1.9472) |
| Year 5 | (0.6859, 1.5081) | (0.7542, 1.5957) | (0.6361, 1.5579) | (0.7207, 1.6698) | (0.6355, 1.5585) | (0.7203, 1.6707) |

Confidence Width

| Follow-Up Period | NB | NB^{*} | GEE | GEE^{*} | Ratio | Ratio^{*} |
|---|---|---|---|---|---|---|
| Year 1 | 0.5444 | 0.5314 | 0.5848 | 0.5687 | 0.5852 | 0.5690 |
| Year 2 | 0.5340 | 0.5217 | 0.5443 | 0.5312 | 0.5446 | 0.5315 |
| Year 3 | 0.6171 | 0.5983 | 0.7065 | 0.6785 | 0.7070 | 0.6789 |
| Year 4 | 0.6533 | 0.6310 | 0.7633 | 0.7283 | 0.7639 | 0.7288 |
| Year 5 | 0.7879 | 0.7495 | 0.8957 | 0.8403 | 0.8970 | 0.8413 |

**Table 5:** 95% confidence intervals for the MR = *μ*_{1} / *μ*_{2}, with the confidence widths, for all six methods applied to the skin cancer data of example 2.
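As a reading aid for Table 5, the short Python sketch below checks two things for the Year 1 row: whether each interval contains 1, and how the reported widths relate to the interval limits. The reported widths are not simply upper minus lower; numerically they agree with log(upper) − log(lower), i.e. a width measured on the log scale. This is an observation from the tabulated numbers, not a definition quoted from this section, and the helper name `log_width` is purely illustrative.

```python
import math

# Year 1 row of Table 5: method -> (lower limit, upper limit, reported width)
year1 = {
    "NB":     (0.8076, 1.3920, 0.5444),
    "NB*":    (0.8432, 1.4345, 0.5314),
    "GEE":    (0.7871, 1.4126, 0.5848),
    "GEE*":   (0.8276, 1.4616, 0.5687),
    "Ratio":  (0.7869, 1.4127, 0.5852),
    "Ratio*": (0.8275, 1.4618, 0.5690),
}

def log_width(lower, upper):
    """Width of an interval for a ratio, measured on the log scale (assumed convention)."""
    return math.log(upper) - math.log(lower)

for method, (lo, hi, reported) in year1.items():
    contains_one = lo < 1.0 < hi  # an interval covering 1 gives no evidence of an effect
    print(f"{method:7s} log-width={log_width(lo, hi):.4f} "
          f"reported={reported:.4f} contains 1: {contains_one}")
```

Under this reading, the ranking of the methods by width is a ranking on the log scale, which is the natural scale for comparing intervals for a ratio.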

**Example 3: MRI vascular lesions data**

We now revisit the example of Type II clinical trial data from Saha [17]. The study was originally conducted by the National Heart, Lung, and Blood Institute (NHLBI) to study Type II coronary intervention, and was analyzed by Brensike et al. [32]. In this study, patients with Type II hyperlipoproteinemia and coronary heart disease were randomly allocated, after a standardizing period, to a daily dosage of 24 g of cholestyramine and diet (treatment group) or to placebo and diet (control group). After five years, the number of vascular lesions was counted on each patient's angiogram in both the treatment and placebo groups. The summary of these data and the maximum likelihood estimates of the model parameters are presented in **table 6**, which shows that the variances are greater than the corresponding mean responses, indicating that the lesion count data are extra-dispersed. The main purpose of this study was to determine whether the 24 g of cholestyramine and diet treatment reduces the mean number of vascular lesions for patients with Type II hyperlipoproteinemia and coronary heart disease. Here, we compute the 95% confidence interval for the MR between the treatment and control groups based on the methods discussed earlier. The results are also given in **table 6**, from which we see that the intervals based on all methods include the value 1, which leads to the same conclusion as that drawn by Saha [17]. Note that the confidence interval based on GEE using the logarithmic transformation has the shortest width.

Summary of the Data Sets

| Treatment Arms | Size | Statistic | Value | Parameter | ML Estimate |
|---|---|---|---|---|---|
| Treatment | 59 | Mean | 4.9322 | μ | 4.9322 |
| | | Variance | 9.6850 | φ | 0.2494 |
| Control | 57 | Mean | 5.5088 | μ | 5.5088 |
| | | Variance | 10.4330 | φ | 0.1848 |

| Method | Lower CI | Upper CI | Width |
|---|---|---|---|
| NB | 0.6870 | 1.1036 | 0.4740 |
| NB^{*} | 0.7095 | 1.1299 | 0.4653 |
| GEE | 0.6987 | 1.0920 | 0.4466 |
| GEE^{*} | 0.7188 | 1.1153 | 0.4393 |
| Ratio | 0.6987 | 1.0920 | 0.4466 |
| Ratio^{*} | 0.7188 | 1.1153 | 0.4393 |

**Table 6:** Summary statistics, the maximum likelihood estimates of the model parameters, and 95% confidence intervals for the MR = *μ*_{1} / *μ*_{2}, with the confidence widths, for all six methods applied to the MRI vascular lesions data of example 3.
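To make the calculation behind Table 6 concrete, here is a minimal Python sketch of a moment-based, delta-method interval for the mean ratio, computed from the summary statistics alone. It rests on assumptions that are not stated in this section: the tabulated variances are taken to be sample variances with divisor n − 1, the variance of each group mean is estimated with divisor n, and the two groups are independent. The function name `mean_ratio_intervals` is hypothetical. Under these assumptions the output agrees with the GEE and GEE^{*} rows of Table 6 to about three decimal places, but the sketch should not be read as the exact procedure defined earlier in the paper.

```python
import math

def mean_ratio_intervals(n1, mean1, var1, n2, mean2, var2, z=1.959964):
    """Delta-method intervals for MR = mean1 / mean2 (illustrative sketch).

    Assumes var1 and var2 are sample variances with divisor n - 1; they are
    rescaled to divisor n before forming the variance of each group mean.
    """
    # Squared relative standard error of each group mean
    rel_var1 = var1 * (n1 - 1) / n1 / n1 / mean1**2
    rel_var2 = var2 * (n2 - 1) / n2 / n2 / mean2**2
    se_log = math.sqrt(rel_var1 + rel_var2)  # approximate SE of log(MR)
    mr = mean1 / mean2
    # Untransformed interval: MR +/- z * SE(MR), with SE(MR) ~ MR * se_log
    plain = (mr * (1 - z * se_log), mr * (1 + z * se_log))
    # Log-transformed interval: exp(log(MR) +/- z * se_log)
    logged = (mr * math.exp(-z * se_log), mr * math.exp(z * se_log))
    return mr, plain, logged

# Summary statistics from Table 6 (treatment group first, then control)
mr, plain, logged = mean_ratio_intervals(59, 4.9322, 9.6850, 57, 5.5088, 10.4330)
print(f"MR estimate       : {mr:.4f}")
print(f"untransformed CI  : ({plain[0]:.4f}, {plain[1]:.4f})")
print(f"log-transformed CI: ({logged[0]:.4f}, {logged[1]:.4f})")
```

Both intervals contain 1, in line with the conclusion stated above. The NB-based rows of the table would additionally require the fitted dispersion parameters and are not reproduced by this sketch.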

In this paper, we developed six different asymptotic confidence interval methods for the ratio of two treatment means in the analysis of extra-dispersed count response data from clinical trials. Three methods were based on large sample theory for the MR estimate, using three different variances obtained from the NB model, the generalized estimating equations, and the ratio estimator. These three variances were obtained as direct generalizations of the variances of a single mean estimate, using the delta method. The simulation results show that these three methods maintained the coverage reasonably well, but showed evidence of asymmetric confidence intervals when the sample sizes were not large enough. Following the suggestion of Katz et al. [26], we also developed three other confidence interval methods, based on the point estimate of the logarithm of the MR and its variances. The simulation results show that the logarithmic versions not only improved the coverage probabilities, but also showed strong evidence of symmetry and had shorter widths than the other three methods. That is, the mesial and distal differences of these intervals are very close to zero, which indicates that the intervals are not directionally biased. This holds regardless of the sample sizes and parameter combinations. Because the three logarithmic intervals maintain coverage, have shorter widths, and are close to symmetric, they outperform the non-logarithmic versions. Among them, we recommend the GEE-based logarithmic interval because it is also very simple to use, does not require iteratively obtained estimates of the dispersion parameters, and provides a somewhat shorter width than the other methods considered here.
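The delta-method step summarized above can be written out in generic notation. This is a sketch under stated assumptions: the two group mean estimates $\hat\mu_1$ and $\hat\mu_2$ are independent, and $\widehat{\mathrm{Var}}(\hat\mu_i)$ stands for whichever single-mean variance estimate (NB, GEE, or ratio estimator) is plugged in. The symbols are chosen here for illustration and may differ from the notation used elsewhere in the paper.

$$
\widehat{MR} = \frac{\hat\mu_1}{\hat\mu_2}, \qquad
\widehat{\mathrm{Var}}\bigl(\log \widehat{MR}\bigr) \approx
\frac{\widehat{\mathrm{Var}}(\hat\mu_1)}{\hat\mu_1^{2}} +
\frac{\widehat{\mathrm{Var}}(\hat\mu_2)}{\hat\mu_2^{2}} = s^{2}, \qquad
\widehat{\mathrm{Var}}\bigl(\widehat{MR}\bigr) \approx \widehat{MR}^{2}\, s^{2}.
$$

$$
\text{untransformed: } \widehat{MR}\,\bigl(1 \pm z_{\alpha/2}\, s\bigr), \qquad
\text{logarithmic: } \widehat{MR}\,\exp\bigl(\pm z_{\alpha/2}\, s\bigr).
$$

When the same $s$ is used in both and widths are compared on the log scale, the logarithmic interval has width $2 z_{\alpha/2} s$, while the untransformed interval has width $\log\frac{1 + z_{\alpha/2} s}{1 - z_{\alpha/2} s} = 2\,\mathrm{artanh}(z_{\alpha/2} s) \ge 2 z_{\alpha/2} s$ for $0 < z_{\alpha/2} s < 1$. The logarithmic interval is also symmetric about $\log \widehat{MR}$ by construction, which is one way to see why it is less prone to directional bias.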

This research was partially supported by the CSU-AAUP University Research Grant. The author is grateful to Maria Pia Sormani (Genoa, Italy) for providing the MRI cortical lesions data. The author would like to thank the referee for his/her valuable suggestions, which improved the presentation.

- Van den Elskamp I, Knol DL, Uitdehaag B, Barkhof F (2009) The distribution of new enhancing lesion counts in multiple sclerosis: further explorations. Mult Scler 15: 42-49.
- Siddiqui O (2009) Statistical methods to analyze adverse events data of randomized clinical trials. J Biopharm Stat 19: 889-899.
- Thall PF, Vail SC (1990) Some covariance models for longitudinal count data with overdispersion. Biometrics 46: 657-671.
- Greenberg ER, Baron JA, Stukel TA, Stevens MM, Mandel JS, et al. (1990) A clinical trial of beta carotene to prevent basal-cell and squamous-cell cancers of the skin. N Engl J Med 323: 789-795.
- Min Y, Agresti A (2005) Random effect models for repeated measures of zero-inflated count data. Stat Modelling 5: 1-19.
- Rose CE, Martin SW, Wannemuehler KA, Plikaytis BD (2006) On the use of zero-inflated and hurdle models for modeling vaccine adverse event count data. J Biopharm Stat 16: 463-481.
- Deb P, Trivedi PK (1997) Demand for medical care by the elderly: a finite mixture approach. J Appl Econ 12: 313-336.
- Hilbe JM (2011) Negative binomial regression. (2nd Edn), Cambridge University Press, Cambridge, UK.
- Margolin BH, Kim BS, Risko KJ (1989) The Ames *Salmonella*/microsome mutagenicity assay: Issues of inference and validation. J Am Stat Assoc 84: 651-661.
- Breslow NE (1984) Extra-Poisson variation in log-linear models. Appl Stat 33: 38-44.
- Breslow NE (1990) Tests of hypotheses in over dispersed Poisson regression and other quasi-likelihood models. J Am Stat Assoc 85: 565-571.
- Paul S, Saha KK (2007) The generalized linear model and extensions: A review and some biological and environmental applications. Environmetrics 18: 421-443.
- Clark SJ, Perry JN (1989) Estimation of the negative binomial parameter *k* by maximum quasi-likelihood. Biometrics 45: 309-316.
- Dean CB (1992) Testing for overdispersion in Poisson and binomial models. J Am Stat Assoc 87: 451-457.
- Paul SR, Banerjee T (1998) Analysis of two-way layout of count data involving multiple counts in each cell. J Am Stat Assoc 93: 1419-1429.
- Piegorsch WW (1990) Maximum likelihood estimation for the negative binomial dispersion parameter. Biometrics 46: 863-867.
- Saha KK (2013) Interval estimation of the mean difference in the analysis of over-dispersed count data. Biom J 55: 114-133.
- Zeger SL, Liang KY (1986) Longitudinal data analysis for discrete and continuous outcomes. Biometrics 42: 121-130.
- Rao JNK, Scott AJ (1999) A simple method for analysing overdispersion in clustered Poisson data. Stat Med 18: 1373-1385.
- Newcombe RG (1998) Interval estimation for the difference between independent proportions: comparison of eleven methods. Stat Med 17: 873-890.
- Beal SL (1987) Asymptotic confidence intervals for the difference between two binomial parameters for use with small samples. Biometrics 43: 941-950.
- Agresti A (2007) An introduction to categorical data analysis. (2nd Edn), Wiley, New Jersey, USA.
- Newcombe RG (2012) Confidence intervals for proportions and related measures of effect size. Chapman and Hall/CRC Biostatistics Series, Boca Raton, USA.
- Francois M, Peter C, Gordon F (2012) Dealing with excess of zeros in the statistical analysis of magnetic resonance imaging lesion count in multiple sclerosis. Pharm Stat 11: 417-424.
- Cox DR, Lewis PAW (1966) The statistical analysis of series of events. Chapman and Hall, London, UK.
- Katz D, Baptista J, Azen SP, Pike MC (1978) Obtaining confidence intervals for the risk ratio in cohort studies. Biometrics 34: 469-479.
- Newcombe RG (2011) Measures of location for confidence intervals for proportions. Commun Stat Theory Methods 40: 1743-1767.
- Saha KK, Paul S (2005) Bias-corrected maximum likelihood estimator of the negative binomial dispersion parameter. Biometrics 61: 179-185.
- Cochran WG (1977) Sampling techniques. (3rd Edn), Wiley, New York, USA.
- Sormani MP, Calabrese M, Signori A, Giorgio A, Gallo P, et al. (2011) Modeling the distribution of new MRI cortical lesions in multiple sclerosis longitudinal studies. PLoS One 6: e26712.
- Fitzmaurice GM, Laird NM, Ware JH (2004) Applied longitudinal analysis. (2nd Edn), Wiley, New Jersey, USA.
- Brensike JF, Kelsey SF, Passamani ER, Fisher MR, Richardson JM, et al. (1982) National Heart, Lung, and Blood Institute Type II coronary intervention study: Design, methods, and baseline characteristics. Control Clin Trials 3: 91-111.
