Estimating a Proportion Based on Group Testing for Correlated Binary Response

When the sampling scheme is in clusters and the pools (of size k) within a cluster cannot be assumed independent, the Dorfman model for estimating the proportion under the binomial model is incorrect. The purpose of this paper is to propose a method for analyzing correlated binary data under the group testing framework. First, assuming that the probability that an individual is positive varies according to a beta distribution, we derived an analytic expression for the probability of a positive pool and for the correlation between two pools in each cluster. Second, we derived the exact probability mass function of the number of positive pools in each cluster, which should be used to obtain the maximum likelihood estimate (MLE) of the proportion of individuals with a positive outcome. However, this MLE is computationally expensive. For this reason, we proposed another estimator based on the beta-binomial model for obtaining an approximate MLE of the proportion of interest. In a simulation study, the approximate estimator produced results very close to the exact MLE of the proportion of interest, with the advantage of being computationally more efficient.

Liu et al. [6] provide confidence interval procedures for proportions estimated by group testing with groups of unequal size, adjusted for overdispersion (extra-binomial variation). They used a quasi-likelihood approach to correct for the presence of overdispersion. However, in this case, heterogeneity in pool responses is induced by using different pool sizes (k) and may be due to the number of pools per cluster used in the group testing method. In their study, Liu et al. [6] introduced heterogeneity by assuming three clusters (m=3) and using a different pool size (k1, k2, k3) in each cluster, with the following number of pools per cluster: N1=5, N2=10 and N3=15. For example, when k1=20, k2=10 and k3=5, they observed that if Y1=5, Y2=7 and Y3=4, then $\hat{\sigma}^2 = 1.0028$, where Yi denotes the number of positive pools observed, i=1,2,3, and $\hat{\sigma}^2$ denotes the estimated dispersion parameter. However, if Y1=1, Y2=7 and Y3=3, then $\hat{\sigma}^2 = 4.0733$, which indicates that the results of group testing vary widely for specific combinations; this also implies the presence of overdispersion. Here it is important to point out that the outcomes of the units in each cluster are assumed to be independent, identically distributed (i.i.d.) binomial with parameters N and p, and that testing was conducted with no errors.

Journal of Biometrics & Biostatistics


Introduction
The group testing model of Dorfman [1] is effective for reducing the number of diagnostic tests because, instead of performing n individual diagnostic tests, it requires only g = n/k tests when retesting is not done (where k is the pool size). However, caution needs to be exercised when choosing the pool size (k), because if k is too large, the diagnostic test may be sensitive to dilution effects [2,3]. Assuming perfect testing, a pool is declared positive if at least one of the k individuals is positive, and declared free of the disease if the test is negative.
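As a minimal sketch of the two quantities just described, the following Python snippet computes the number of pooled tests g = n/k and the probability that a pool is positive under perfect testing. It assumes independent individuals with a common prevalence p, which is the standard binomial setting and not the correlated model developed later; the function names are illustrative only.

```python
import math


def pools_needed(n: int, k: int) -> int:
    """Number of pooled tests g = n/k (rounded up) when no retesting is done."""
    return math.ceil(n / k)


def prob_pool_positive(p: float, k: int) -> float:
    """Probability that a pool of k independent individuals is positive.

    Under perfect testing the pool is negative only if all k individuals
    are negative, which happens with probability (1 - p)**k.
    """
    return 1.0 - (1.0 - p) ** k


if __name__ == "__main__":
    # 9000 individuals in pools of 50 require 180 tests instead of 9000.
    print(pools_needed(9000, 50))
    # With 1% prevalence, a pool of 50 is positive far more often than 1%.
    print(prob_pool_positive(0.01, 50))
```

The second function makes the trade-off visible: larger pools save tests, but the pool-level probability saturates towards 1, so each test carries less information about p.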
The assumption of a homogeneous distribution of transgenic maize (Zea mays L.) in a population, though easy to use in practice, is unrealistic [4] and therefore may affect the quality of the estimated proportion of interest. Since plant samples are taken at different locations throughout a geographical region, or seed samples are taken from seed lots obtained from different regions, individual plants or seed lots are inherently clustered by design and share common characteristics [5]. This clustering results in correlated samples. Therefore, it is important to develop methods for analyzing pooled data that allow individuals to be correlated and do not require the assumption of a homogeneous plant distribution, as in a binomial distribution.
When there is overdispersion (extra-binomial variation), binary data show greater variability than predicted by the binomial model [6]. Overdispersion is said to be the norm in practice, and nominal dispersion the exception. Hung and Swallow [7] studied the robustness of group testing in estimation problems when the underlying assumption of independent individuals is violated. They found that when defectives are clustered, as in a serial correlation model with positive serial correlation, even using a small group size offers little robustness. Group testing to estimate the proportion of defectives in a serially correlated population should therefore be done cautiously. The recommendation is not to form groups directly from the ordered population, but to randomly assign the individuals to groups and thereby destroy the correlation. However, group testing for classification purposes only (whether defective or non-defective) benefits from having the defectives clustered, and the clustering should be preserved and exploited [7]. The assumption that the outcomes in each cluster are i.i.d. binomial is not appropriate when the sampling process is hierarchical and the plants in each cluster are correlated due to genetic factors or because the plants are spatially adjacent [6].
Regression models for pooled data have been proposed that incorporate covariates to identify which factors influence prevalence [8][9][10], while assuming that individual statuses (positive or negative) are independent random variables. Group testing regression models with fixed and random effects have also been developed to handle within-cluster correlation among individual latent binary responses [5], where the correlation is incorporated into the model by using the clusters as random effects; with the help of covariates, it is possible to vary the prevalence between units. However, when we do not have access to covariates, it is not possible to know the unit-specific prevalences that control the correlation between units induced by this variability in the prevalence between units. Also, with these models, it is not possible to get a closed form of the likelihood function or of the correlation between pools (or individuals) induced by the random effect. For this reason, it would be useful to develop an alternative method for analyzing pooled correlated data that takes into account the correlation between individuals when estimating the proportion of interest. Such a method would provide us with an analytical expression for the likelihood function that we could use for calculating the probability of a positive pool and the correlation between two pools. Furthermore, ignoring the correlation among individuals with cluster data under group testing produces a biased estimate of the proportion of interest; it also makes the confidence intervals too narrow and the p-values of hypothesis tests too small. Data analysis methods are available for data with correlated responses in a non-group testing context, with the correlation incorporating extra-binomial variation. One way of including extra-binomial variation is by introducing an unobserved continuous variable $P_i$, which is independently distributed on the interval (0,1) with $E(P_i)=p_i$ and $\mathrm{Var}(P_i)=\phi\, p_i(1-p_i)$, where ϕ is the parameter of overdispersion, and by assuming that, conditional on $P_i$, the response $R_i$ is binomial with parameters $m_i$ and $P_i$. Unconditionally, $E(R_i)=m_i p_i$ and $\mathrm{Var}(R_i)=m_i p_i(1-p_i)[1+(m_i-1)\delta]$, where δ (equal to ϕ under this model) is the intraclass correlation. However, note that when $m_i=1$, the variance does not change, $\mathrm{Var}(R_i)=p_i(1-p_i)$, but we are still introducing a correlation between the individual binary responses [11,12].
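The variance inflation described above can be checked numerically. The sketch below assumes the beta special case (a beta-distributed success probability with mean p and intraclass correlation δ, so that the two beta parameters sum to (1−δ)/δ) and compares the variance computed from the beta-binomial pmf with the formula m·p(1−p)[1+(m−1)δ]. The function names are hypothetical and the parameterization is one common choice, not necessarily the one used by the cited authors.

```python
import math


def beta_binomial_pmf(r: int, m: int, p: float, delta: float) -> float:
    """P(R = r) for a beta-binomial with mean proportion p and
    intraclass correlation delta (0 < delta < 1)."""
    s = (1.0 - delta) / delta          # a + b, since delta = 1 / (a + b + 1)
    a, b = p * s, (1.0 - p) * s
    log_pmf = (
        math.log(math.comb(m, r))
        + math.lgamma(r + a) + math.lgamma(m - r + b) - math.lgamma(m + a + b)
        - (math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))
    )
    return math.exp(log_pmf)


def overdispersed_variance(m: int, p: float, delta: float) -> float:
    """Variance of R under extra-binomial variation."""
    return m * p * (1.0 - p) * (1.0 + (m - 1) * delta)


if __name__ == "__main__":
    m, p, delta = 10, 0.2, 0.1
    probs = [beta_binomial_pmf(r, m, p, delta) for r in range(m + 1)]
    mean = sum(r * q for r, q in enumerate(probs))
    var = sum((r - mean) ** 2 * q for r, q in enumerate(probs))
    # The two variances should agree up to floating-point error.
    print(var, overdispersed_variance(m, p, delta))
```

Setting m = 1 in `overdispersed_variance` returns p(1−p), which matches the remark that a single binary response shows no change in variance even though the responses are correlated.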
A special case of this model for extra-binomial variation, described by Williams [11], assumes that $P_i$ has a beta distribution, which results in $R_i$ having a beta-binomial distribution. Another distribution with the same relationship between $E(R_i)$ and $\mathrm{Var}(R_i)$ is the correlated-binomial model [13,14], in which ϕ plays the role of a correlation between the binary components of a population. Turechek and Madden [15] used the beta-binomial distribution to estimate the proportion (p) when there is heterogeneity. The key element of their approach was to approximate the probability of a positive pool of size k in the presence of heterogeneity with the probability of a positive pool under the binomial model (assuming homogeneity, that is, assuming that p is constant across clusters) and to adjust this binomial probability with the design effect (deff). In this case, deff was defined as the ratio of the variance of the beta-binomial model to the variance of the proportion under the binomial distribution. Turechek and Madden [15] then defined an effective pool size based on deff. It is important to point out that this approach works well if the correlations between pools are negligible; however, most of the time this assumption is violated in the context of plants collected from the same cluster that share genetic and environmental background.
Recent work by Lendle et al. [16] proposed group testing procedures for case identification with correlated responses, in order to study the efficiency of a group testing procedure when units within clusters are correlated. Here efficiency is understood as the expected number of diagnostic tests per unit required to classify all units as either positive or negative. In the work of Lendle et al. [16], clusters were assumed to be of equal size with the same distribution, to contain exchangeable units and to have a particular type of distribution. They used three models to examine how the efficiencies of group testing procedures are affected by correlated responses: a beta-binomial model where π has a beta distribution with mean p and variance σp(1−p); the model of Madsen [17], which is useful for modeling exchangeable binary data, letting π=p with probability 1−σ, π=0 with probability σ(1−p), and π=1 with probability σp; and the model of Morel and Neerchal [18], in which π follows a two-point distribution (one of the two values occurring with probability 1−p). However, it is important to point out that the focus of the Lendle et al. [16] paper was classification, not estimation. In fact, they derived a closed-form expression for the expected number of tests per unit (i.e., efficiency) of hierarchical and matrix-based group testing procedures used for classification when units within clusters are correlated under a class of models for exchangeable binary random variables. Considering the above three models of exchangeable binary random variables in their study, they found that if units from the same cluster are tested together, the efficiency of a particular procedure can be improved, sometimes substantially, relative to random arrangements, which ignore information about cluster membership [16].
The main objective of this research is to propose a method for estimating the proportion of a binary response using the Dorfman group testing model without retesting when the data are collected in clusters and the individuals within each cluster are positively and equally correlated. Negative correlations are not discussed here. To account for this correlation in the analysis, we proceed as in the standard group testing binomial model, but let the parameter p vary according to a beta distribution, which yields a closed form of the probability mass function (pmf) of the number of positive pools in each cluster. This also allows deriving a closed form for the probability of a positive pool. This pmf is used to estimate the proportion of interest (π) and the correlation between two individuals (δ) [16].
It is essential to point out that with this method we get closed-form expressions for the probability of a positive pool and for the correlation between two pools that are not available in conventional approaches for pooled correlated data. However, with the proposed model the maximum likelihood estimates (MLEs) are difficult to compute, so we approximated them by using the beta-binomial distribution, which was applied directly to the pooled correlated data to obtain estimates of the probability of a positive pool, $\pi_p^{k}$, and of the correlation between two pools, $\delta_p^{k}$. Equating these two estimates with the closed-form expressions derived for $\pi_p^{k}$ and $\delta_p^{k}$, we get the approximate MLEs for π and δ by solving a system of nonlinear equations. These approximate MLEs based on the beta-binomial distribution are close to the exact MLEs derived using the proposed pmf, at low computational cost.

Sampling Process using the Dorfman Model
Suppose that our population is composed of I clusters, and that N independent clusters are drawn from the I clusters in the population. Further, within the l-th cluster, we form $n_l$ pools of size $k_l$ individuals, where we use the Dorfman model without retesting, with random allocation of individuals to the pools. Let $Y_{ijl}$ denote a binary random variable that indicates whether the i-th individual within the j-th pool of cluster l is positive ($Y_{ijl}=1$) or negative ($Y_{ijl}=0$), and let $Z_{jl}$ be the indicator variable of whether the j-th pool inside cluster l is positive ($Z_{jl}=1$) or negative ($Z_{jl}=0$).
The correlation between any two pools in the same cluster is $\delta_p^{k} = \frac{(1-\pi_p^{2k}) - (1-\pi_p^{k})^2}{\pi_p^{k}(1-\pi_p^{k})}$ (Eq. 2), where $\pi_p^{2k}$ is the probability that a pool of 2k individuals is positive. Although we are using only pools of size k, $\pi_p^{2k}$ is a simplified notation and will be used in the proposed graphical estimator method. In this way, we can see that both the probability that a pool (of size k) is positive, $\pi_p^{k}$, and the correlation between any two pools, $\delta_p^{k}$, are functions of the probability that an individual is positive, π, and of the correlation between any two individuals in the same cluster, δ.
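To make the dependence on π and δ concrete, here is a hedged Python sketch of both pool-level quantities. It assumes the beta model used in this paper (individual probability p beta-distributed with parameters π/θ and (1−π)/θ, so that δ = θ/(1+θ)); under that assumption the probability that m individuals are all negative is the expectation of (1−p)^m, which has the product form coded in `g`. The product form and the function names are a derivation made for this illustration, not a transcription of the paper's Eq. 1.

```python
def g(pi: float, delta: float, m: int) -> float:
    """P(all m individuals in a cluster are negative), 0 <= delta < 1.

    Equals E[(1 - p)**m] for p ~ Beta(pi/theta, (1 - pi)/theta)
    with theta = delta / (1 - delta).
    """
    prob = 1.0
    for j in range(m):
        prob *= ((1.0 - pi) * (1.0 - delta) + j * delta) / (1.0 - delta + j * delta)
    return prob


def pool_positive_prob(pi: float, delta: float, k: int) -> float:
    """Probability that a pool of size k is positive."""
    return 1.0 - g(pi, delta, k)


def pool_correlation(pi: float, delta: float, k: int) -> float:
    """Correlation between two pools of size k from the same cluster.

    Two pools are both negative exactly when all 2k individuals are negative.
    """
    neg_k = g(pi, delta, k)
    neg_2k = g(pi, delta, 2 * k)
    return (neg_2k - neg_k ** 2) / (neg_k * (1.0 - neg_k))


if __name__ == "__main__":
    print(pool_positive_prob(0.05, 0.05, 25))
    print(pool_correlation(0.05, 0.05, 25))
```

Two sanity checks follow directly from the formulas: with k = 1 the pool quantities reduce to π and δ, and with δ = 0 the pool probability reduces to the binomial value 1 − (1 − π)^k with zero correlation between pools.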
From Appendix C we have that .

Maximum likelihood estimation
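Let $z=(z_1,\ldots,z_N)$ be the vector that contains the number of positive pools in each of the N clusters analyzed. Then, since the clusters are independent, the log-likelihood is given by $\ell(\pi,\theta;z)=\sum_{l=1}^{N}\log f(z_l;\pi,\theta)$, where $f$ is the pmf of $Z_l$ (Eq. 3). Thus the ML estimators $\hat{\pi}$ and $\hat{\theta}$ ($\hat{\delta}$) are obtained by solving the equations $\partial\ell/\partial\pi=0$ and $\partial\ell/\partial\theta=0$; the required derivatives are given in Appendix D. This system of equations can be solved iteratively using the Newton-Raphson method.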
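The log-likelihood above can be sketched in Python as follows. The pmf used here is derived for this illustration from the beta assumption by inclusion-exclusion (conditional on p, the number of positive pools is binomial), so it should be read as one possible form of the exact pmf rather than a transcription of Eq. 3. For simplicity the sketch assumes equal n and k in all clusters and replaces Newton-Raphson with a coarse grid search; all names are hypothetical.

```python
import math


def g(pi: float, delta: float, m: int) -> float:
    """P(all m individuals are negative) under the beta model (0 <= delta < 1)."""
    prob = 1.0
    for j in range(m):
        prob *= ((1.0 - pi) * (1.0 - delta) + j * delta) / (1.0 - delta + j * delta)
    return prob


def exact_pmf(z: int, n: int, k: int, pi: float, delta: float) -> float:
    """P(Z = z) positive pools out of n pools of size k (alternating sum)."""
    total = 0.0
    for i in range(z + 1):
        total += (-1) ** i * math.comb(z, i) * g(pi, delta, k * (n - z + i))
    return math.comb(n, z) * total


def log_likelihood(zs, n: int, k: int, pi: float, delta: float) -> float:
    """Sum of log pmf values over independent clusters."""
    # The floor guards against tiny negative values caused by cancellation.
    return sum(math.log(max(exact_pmf(z, n, k, pi, delta), 1e-300)) for z in zs)


def grid_mle(zs, n: int, k: int, grid):
    """Crude maximizer over a grid of (pi, delta) pairs; a stand-in for
    Newton-Raphson that can also supply starting values."""
    candidates = ((pi, d) for pi in grid for d in grid)
    return max(candidates, key=lambda t: log_likelihood(zs, n, k, t[0], t[1]))


if __name__ == "__main__":
    counts = [0, 1, 3, 0, 2]          # made-up numbers of positive pools
    print(grid_mle(counts, n=10, k=25, grid=[0.005, 0.01, 0.02, 0.05, 0.1]))
```

The alternating signs in `exact_pmf` are the reason the exact computation can become numerically delicate when the number of pools per cluster is large.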

Moment estimation
We first obtain moment estimates for $\pi_p^{k}$ and $\delta_p^{k}$; from these we obtain estimates of the parameters of interest (π and δ) by solving a system of nonlinear equations. We define the first and second empirical moments based on the number of positive pools contained in the N clusters sampled; equating them to their expectations, which depend on $\pi_p^{k}$ and $\delta_p^{k}$, we obtain the moment estimators.

Details of how the pmf in Eq. 3 was derived are given in Appendix B. It is interesting to point out that, as δ → 0, the pmf of $Z_l$ reduces to the binomial pmf, that is, to the case in which there is no correlation between individuals.
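A minimal sketch of the moment step, assuming every cluster has the same number of pools n and that pools within a cluster are exchangeable with common positive probability and pairwise correlation. Under those assumptions E(Z) = n·π_p and Var(Z) = n·π_p(1−π_p)[1+(n−1)δ_p], and the estimators below invert these two relations. The exact estimators used in the paper may differ in detail; the function name is hypothetical.

```python
def moment_estimates(zs, n: int):
    """Moment estimates of the pool-level probability and correlation.

    zs : numbers of positive pools, one entry per cluster
    n  : number of pools per cluster (assumed equal across clusters, n > 1)
    """
    num_clusters = len(zs)
    m1 = sum(zs) / num_clusters                   # first empirical moment
    m2 = sum(z * z for z in zs) / num_clusters    # second empirical moment
    pi_p = m1 / n
    var_z = m2 - m1 ** 2
    binomial_var = n * pi_p * (1.0 - pi_p)        # requires 0 < pi_p < 1
    delta_p = (var_z / binomial_var - 1.0) / (n - 1)
    return pi_p, delta_p


if __name__ == "__main__":
    # Extreme illustration: clusters are either all negative or all positive,
    # which corresponds to perfectly correlated pools.
    print(moment_estimates([0, 10], n=10))
```

Note that the correlation estimate can be negative when the observed variance is below the binomial variance; in that case the data give no evidence of positive within-cluster correlation.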
Let us assume that all clusters are independent and that for each cluster, conditional on p, all individuals have a Bernoulli distribution with parameter p, and that p varies according to a beta distribution with parameters α=π/θ and β=(1−π)/θ, where π, θ>0. It is not difficult to show that for each individual, the unconditional mean and variance are π and π(1−π), respectively, while the correlation between any two individuals within the same cluster l, $Y_{ijl}$ and $Y_{i'jl}$ (i≠i'), is δ=θ/(1+θ) (see Appendix A and Kupper and Haseman [14] for details). In this context, from Appendix A we derived that the probability that a pool of size k is positive is given by $\pi_p^{k}=1-g(\pi,\delta,k)$ (Eq. 1), where $g(\pi,\delta,k)$ is the probability that all k individuals in the pool are negative.

Now, since $\pi_p^{k}$ and $\delta_p^{k}$ are functions of π and δ (Eq. 1 and Eq. 2), estimates of the parameters of interest can be obtained by solving the system of nonlinear equations $\pi_p^{k}(\pi,\delta)=\tilde{\pi}_p$ and $\delta_p^{k}(\pi,\delta)=\tilde{\delta}_p$, where $\tilde{\pi}_p$ and $\tilde{\delta}_p$ are the moment estimates. By replacing Eq. 5 in Eq. 6, this system of equations is reduced to $g(\pi,\delta,k)=1-\tilde{\pi}_p$ and $g(\pi,\delta,2k)=(1-\tilde{\pi}_p)[1-\tilde{\pi}_p(1-\tilde{\delta}_p)]$. The system of nonlinear equations given by Eq. 6 and Eq. 7 can be solved iteratively by the Newton-Raphson method; alternatively, given that the right-hand sides of Equations 6 and 7 are quantities in the interval (0,1), and the parameters are between 0 and 1, the solution can be approximated by graphing the contours of g(π,δ,k) and g(π,δ,2k) at levels $1-\tilde{\pi}_p$ and $(1-\tilde{\pi}_p)[1-\tilde{\pi}_p(1-\tilde{\delta}_p)]$, respectively, and then observing where the two contours intersect. This can be done with the R contour command. We will denote this solution $(\tilde{\pi},\tilde{\delta})$; it can be used as the initial value when computing the true maximum likelihood estimates ($\hat{\pi}$ and $\hat{\delta}$).
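As a numerical alternative to the contour plot, the reduced system can be solved with two nested bisections. The sketch below assumes the product form of g that follows from the beta model (a derivation made for this illustration), and it relies on g being decreasing in π for fixed δ and on the 2k-level equation having a single crossing in δ along the first contour; the latter is an assumption of this sketch, not a result stated in the paper. Function names are hypothetical.

```python
def g(pi: float, delta: float, m: int) -> float:
    """P(all m individuals are negative) under the beta model (0 <= delta < 1)."""
    prob = 1.0
    for j in range(m):
        prob *= ((1.0 - pi) * (1.0 - delta) + j * delta) / (1.0 - delta + j * delta)
    return prob


def solve_pi_given_delta(delta: float, k: int, target_neg: float) -> float:
    """Find pi such that g(pi, delta, k) equals target_neg (bisection)."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if g(mid, delta, k) > target_neg:   # pools negative too often: raise pi
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def solve_system(pi_p: float, delta_p: float, k: int):
    """Solve g(pi, delta, k) = 1 - pi_p and
    g(pi, delta, 2k) = (1 - pi_p) * (1 - pi_p * (1 - delta_p)) for (pi, delta)."""
    level_k = 1.0 - pi_p
    level_2k = (1.0 - pi_p) * (1.0 - pi_p * (1.0 - delta_p))
    lo, hi = 1e-9, 1.0 - 1e-9
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        pi = solve_pi_given_delta(mid, k, level_k)
        if g(pi, mid, 2 * k) < level_2k:    # too little pool correlation: raise delta
            lo = mid
        else:
            hi = mid
    delta = 0.5 * (lo + hi)
    return solve_pi_given_delta(delta, k, level_k), delta


if __name__ == "__main__":
    # Pool-level values (made up for illustration) mapped back to (pi, delta).
    print(solve_system(pi_p=0.6, delta_p=0.3, k=25))
```

The result of such a solver plays the same role as the contour intersection: it gives a starting point for the exact maximum likelihood iteration.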
However, these MLEs are difficult to compute with the proposed model, so we approximated them using the beta-binomial distribution. We applied it directly to the pooled correlated data to obtain estimates of $\pi_p^{k}$ and $\delta_p^{k}$; equating these estimates with the closed-form expressions derived for $\pi_p^{k}$ and $\delta_p^{k}$, we get approximate MLEs for π and δ by solving a system of nonlinear equations. These approximate MLEs based on the beta-binomial distribution are close to the exact MLEs derived using the proposed pmf, at low computational cost.

An Alternative Approach based on Beta-binomial Distribution
Calculating the MLEs of the parameters of interest (π and δ) using the model described in the last section is difficult due to the complexity of the derived pmf. Therefore, in this section, we propose an alternative approach for estimating the required parameters with the beta-binomial model.
As shown in the previous section, the total number of positive pools in every cluster does not have a beta-binomial distribution; however, within each cluster the pool responses are binary with probability of success $\pi_p^{k}$. This alternative approach based on the beta-binomial distribution has computational advantages over the exact solution, since for large $n_l$ (e.g., >20), the exact solution (Eq. 3) is unstable due to the alternating sums involved in the pmf.
The corresponding log-likelihood using the beta-binomial model is given by $\ell_{BB}(\pi,\delta;z)=\sum_{l=1}^{N}\log f_{BB}(z_l;n_l,\pi_p,\theta_p)$, where $f_{BB}$ is the probability function of the beta-binomial with parameters $n_l$, $\pi_p$ and $\theta_p$ evaluated at $z_l$, and where $\pi_p$ and $\theta_p$ ($\delta_p$) are given by Equations 1 and 2 ignoring the superscripts.
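The approximate log-likelihood can be sketched as below. The sketch assumes the beta-binomial is parameterized by its mean proportion and pairwise correlation (so that the two beta parameters sum to (1−δ_p)/δ_p), and it computes the pool-level parameters from π and δ through the product form of g implied by the beta model. Both choices are assumptions of this illustration, and all names are hypothetical.

```python
import math


def g(pi: float, delta: float, m: int) -> float:
    """P(all m individuals are negative) under the beta model (0 <= delta < 1)."""
    prob = 1.0
    for j in range(m):
        prob *= ((1.0 - pi) * (1.0 - delta) + j * delta) / (1.0 - delta + j * delta)
    return prob


def pool_parameters(pi: float, delta: float, k: int):
    """Pool-level positive probability and correlation for pools of size k."""
    neg_k, neg_2k = g(pi, delta, k), g(pi, delta, 2 * k)
    return 1.0 - neg_k, (neg_2k - neg_k ** 2) / (neg_k * (1.0 - neg_k))


def beta_binomial_logpmf(z: int, n: int, pi_p: float, delta_p: float) -> float:
    """log P(Z = z) for a beta-binomial with mean pi_p and correlation delta_p > 0."""
    s = (1.0 - delta_p) / delta_p
    a, b = pi_p * s, (1.0 - pi_p) * s
    return (
        math.log(math.comb(n, z))
        + math.lgamma(z + a) + math.lgamma(n - z + b) - math.lgamma(n + a + b)
        - (math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))
    )


def approx_loglik(zs, ns, k: int, pi: float, delta: float) -> float:
    """Beta-binomial approximation to the log-likelihood as a function of (pi, delta)."""
    pi_p, delta_p = pool_parameters(pi, delta, k)
    return sum(beta_binomial_logpmf(z, n, pi_p, delta_p) for z, n in zip(zs, ns))


if __name__ == "__main__":
    counts = [0, 1, 3, 0, 2]              # made-up numbers of positive pools
    pools = [10] * len(counts)
    print(approx_loglik(counts, pools, k=25, pi=0.01, delta=0.05))
```

Because the beta-binomial pmf involves only log-gamma evaluations and no alternating sum, it stays numerically stable when the number of pools per cluster grows.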

Simulation Study
We present the results of a simulation study conducted to evaluate the performance of the approximate estimators (using the binomial or beta-binomial distribution) instead of the exact distribution (Eq. 3). The simulation study was performed using four values of π (0.025, 0.05, 0.075 and 0.1), four values of δ (0.025, 0.05, 0.075 and 0.1) and five values of N (10, 30, 50, 100, 200) with k=25 and $n_l$=10, l=1,…,N. For each combination of these parameters, we obtained 2000 random samples generated using the model given in Eq. 3. To estimate the relative bias (RB) and the relative mean squared error (RMSE), for each of these samples we calculated the corresponding MLEs of the parameters using the true model, the binomial model and the beta-binomial model.
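One way to generate data of this kind is to simulate the hierarchy directly rather than sampling from the pmf. The sketch below assumes the beta model of this paper: a cluster-level probability p is drawn from a beta distribution with parameters π/θ and (1−π)/θ, where θ = δ/(1−δ), and each of the n pools is then positive with probability 1 − (1 − p)^k. This is an illustrative generator with hypothetical names, not the authors' simulation code.

```python
import random


def simulate_cluster(n: int, k: int, pi: float, delta: float, rng: random.Random) -> int:
    """Number of positive pools in one cluster of n pools of size k."""
    theta = delta / (1.0 - delta)
    p = rng.betavariate(pi / theta, (1.0 - pi) / theta)   # cluster-level prevalence
    pool_positive = 1.0 - (1.0 - p) ** k                  # given p, pools are independent
    return sum(rng.random() < pool_positive for _ in range(n))


def simulate_sample(num_clusters: int, n: int, k: int, pi: float, delta: float, seed: int = 0):
    """Numbers of positive pools for num_clusters independent clusters."""
    rng = random.Random(seed)
    return [simulate_cluster(n, k, pi, delta, rng) for _ in range(num_clusters)]


if __name__ == "__main__":
    # One simulated data set for pi = 0.05, delta = 0.05, N = 30, k = 25, n = 10.
    print(simulate_sample(30, 10, 25, 0.05, 0.05, seed=1))
```

Repeating the call with different seeds and re-estimating π on each sample gives the replicate estimates from which bias and mean squared error can be summarized.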
We also evaluated the results of the simulation based on the use of the beta-binomial model in order to approximate the correct distribution given in Eq. 3. This approach has an attractive computational advantage over the exact distribution. To evaluate the quality of the approximate estimators, we calculated the relative bias (RB) and the relative mean squared error (RMSE) from three quantities: $\hat{\pi}$, the MLE of π using the true model; $\hat{\pi}_i$, the usual MLE of π using the binomial or beta-binomial model; and $\pi_0$, the parameter value for which the data were generated using the model given in Eq. 3.
Figure 1 shows the RMSE plots assuming a binomial model. All the plots show that misspecification of the true model (Eq. 3) yields an RMSE below 1 when the number of clusters is 10, and above 1 when the number of clusters is 30, 50, 100 or 200. This means that when the number of clusters is equal to 10, the RMSE using the binomial model is smaller. However, when the number of clusters is 30, 50, 100 or 200, the RMSE with the binomial model is considerably larger and increases linearly with the number of clusters; the performance of the binomial model is poorer for larger values of δ (Table 1). The binomial model has worse RB (Figure 2) than the beta-binomial model and the true model (Eq. 3), and underestimates the true values; this behavior is less severe as the sample size increases. Also, it is clear that increasing the correlation between individuals (δ) significantly increases the RB (Figure 2). Figure 3 depicts the RMSE plots for the same parameters as in Figures 1 and 2. All of these plots show that the beta-binomial approximation to Eq. 3 consistently performs well in RMSE. However, when δ increases, RMSE performance decreases somewhat, but is still reasonable for the larger values of δ. In addition, it is important to point out that for N ≥ 30, performance is good and similar in all cases studied (Table 2). For the same parameters studied, Figures 3 and 4 show that the beta-binomial model performs well in RB, except when the number of clusters is less than 30 (N<30), where it is nevertheless comparable with the exact model; additionally, when the correlation between individuals (δ) decreases or N increases, this performance improves substantially in a similar way for both the beta-binomial approach and the exact model. Furthermore, Figure 4 shows that in all cases the approach using the beta-binomial model has a positive bias, but it gradually converges to 0 as N increases, although with different patterns in each combination. The parameter that influences RB the most (in the exact and the beta-binomial approximation) is δ. For larger values of δ, RB convergence to the desired value is slower; for example, for δ=0.025, convergence is reached approximately at N>50, while for δ=0.075, it is reached approximately at N>100. Furthermore, smaller values of π are more affected by δ because they have larger RB values, but again this is observed in both estimators of π (the beta-binomial and the exact model). Therefore, the performance in RMSE and RB of the approach based on the beta-binomial model is good, and the approach has the advantage of being computationally more efficient than the exact distribution (Eq. 3).

Application
In this section, we give two examples to illustrate the methodology.

Example: Transgenic maize estimation
In 2009, a study was conducted to estimate the proportion of genetically modified maize plants in farmers' fields in the Sierra Juárez region of Oaxaca, Mexico (Table 3) [20]. Of an estimated total of 50 fields in the Santa María Jaltianguis locality, 30 fields were sampled; 300 leaves were collected from plants randomly chosen throughout each field. During leaf collection in each field, 4-mm leaf sections were bulked per field, totaling 300 sections per bulk sampled. The remaining leaves were labelled and stored separately (a total of 9000 leaf samples were stored). The bulk samples comprising 4-mm sections of 300 leaves each were subdivided into six pools of 50 leaves each. DNA was extracted, and the presence of 35S and NOSt sequences was determined by polymerase chain reaction (PCR) (Table 3) [20].
Each 300-leaf bulk was disaggregated into 50-leaf bulks (6 per field) for DNA extraction, and PCR amplification of HSP101, 35S and NOSt sequences was performed. Data on HSP101 and NOSt amplification are not shown. Results presented in Table 3 correspond to bulks that were confirmed as positive in at least two independent PCRs [19]. Fields 6, 8, 11, 15, 25 and 27 had exactly one positive pool, field 17 had exactly 2 positive pools and field 30 had 3 positive pools.
The traditional binomial approach resulted in an estimated prevalence of transgenic plants of 0.001260; taking into account the correlation, the exact MLEs were $\hat{\pi}=0.001324$ and $\hat{\delta}=0.002104$; the beta-binomial approach gave estimates of $\hat{\pi}_{BB}=0.001324$ and $\hat{\delta}_{BB}=0.002104$. Since the estimated correlation is low ($\hat{\delta}=0.002104$), this data set is not appropriate for illustrating the proposed methodology. For the purpose of illustration, we assumed that 0.001324 is the true prevalence (π) and that δ=0.045 is the true correlation between individuals, and we maintained the same number of clusters and individuals per cluster (the frequencies obtained are in row $N_{x.s}$ of Table 3). Now the exact MLEs were $\hat{\pi}=0.001486$ and $\hat{\delta}=0.019491$. Again, we see that the approximate MLEs based on the beta-binomial model are very close to the exact MLEs. However, assuming there is no correlation between individuals and pools, the estimated prevalence is equal to 0.001142515. The 95% Wald and profile confidence intervals for π using the exact approach were (-0.000662, 0.003635) and (0.000367, 0.012265), respectively; using the beta-binomial approach, they were (-0.000701, 0.003711) and (0.000369, 0.012063), respectively; and using the binomial model, they were (0.000435, 0.001850) and (0.000573, 0.002003), respectively [19]. The similarity between the results of the exact and beta-binomial models can be observed in more detail in the profile likelihood shown in Figure 5.

Example: Seed health assay
We used the data set given in Liu et al. [6] for detecting seed transmission of the cucumber green mottle mosaic virus (CGMMV). They selected seed lot (1877T-2B) of bottle gourds (Lagenaria siceraria L.) cv. "S-1" for testing. Test seeds of the working samples were soaked in pure water overnight; the suspensions were then used as coating antigens to initiate the indirect enzyme-linked immunosorbent assay (ELISA) for detecting the presence of CGMMV. Fifteen sub-samples were randomly taken from the seed lot. Working samples were prepared using pool sizes (k) of 1, 2, 5, 10, and 100 seeds from each sub-sample (cluster). When k=1, 2, 5, or 10, 10 replicates ($n_l$) of each were used in the experiment. However, when k=100, only five replicates were used. The aim of the experiment was to estimate the proportion of infected seeds and its CI with group testing (Table 4).
The MLEs based on Eq. 3 were $\hat{\pi}=0.020221$ and $\hat{\delta}=0.057920$. The approximate MLE based on the beta-binomial approach is almost identical to the exact MLE, whereas the binomial estimate is different. Also, the estimated correlation using the beta-binomial model is very close to that given by the exact MLE. Furthermore, the 95% confidence interval based on the profile likelihood of π using the exact MLE approach is (0.006897, 0.070214) and that of the beta-binomial approach is (0.006928, 0.068484), which indicates the similarity of the results of the two approaches. Indeed, the profile likelihood of the beta-binomial approach overlies that of the exact approach almost exactly, as shown in Figure 6.
Using the traditional binomial model, (0.002724, 0.009334) and (0.003181, 0.010005) are the 95% Wald and profile confidence intervals, respectively. The similarity of these confidence intervals is due to the assumption of independence among individuals (and also among pools) in each cluster and a large sample of 135 pools. Note that these confidence intervals have a narrow width because they ignore extra-binomial variation.
The 95% Wald confidence interval for π is (-0.002029, 0.042472) with the exact MLE approach, while with the approximate beta-binomial approach it is (-0.001769, 0.042303). As before, the exact MLE and the approximation based on the beta-binomial approach produced similar results. It is important to point out that our confidence intervals are wider than the intervals adjusted for overdispersion that Liu et al. [6] reported. This can be explained by the fact that Liu et al. [6] used a quasi-likelihood approach to model the number of positive pools by cluster with the assumption that the individuals within each cluster are independent binary variables having the same prevalence. In contrast, our approach is based on the assumption that the responses of all individuals within each cluster are equally correlated binary variables and, as a result, we take into account the induced correlation between individuals and pools.

Figure 4 :
Relative bias (RB) for the beta-binomial model (in black) and for the true model (in red) with various combinations of π, δ and N.

Figure 5 :
Profile likelihood of π for the transgenic maize example.

Figure 6 :
Profile likelihood of π for the seed health assay data.
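For the transgenic maize example, the traditional binomial estimate can be reproduced with a few lines of Python. The sketch assumes equal pool sizes, perfect testing and independent individuals, and it assumes that the 22 fields not listed in the text had no positive pools; under those assumptions the usual group testing MLE is 1 − (1 − y/g)^(1/k), with y positive pools out of g pools of size k. The function name is illustrative.

```python
def binomial_group_mle(positive_pools: int, total_pools: int, k: int) -> float:
    """MLE of the individual prevalence from pooled tests with equal pool size k."""
    return 1.0 - (1.0 - positive_pools / total_pools) ** (1.0 / k)


# Positive pools per field as described in the text (field number -> count);
# all other fields are assumed to have zero positive pools.
positives_per_field = {6: 1, 8: 1, 11: 1, 15: 1, 25: 1, 27: 1, 17: 2, 30: 3}

total_positive = sum(positives_per_field.values())
total_pools = 30 * 6                      # 30 fields, 6 pools of 50 leaves each
p_hat = binomial_group_mle(total_positive, total_pools, k=50)

if __name__ == "__main__":
    print(total_positive, total_pools, p_hat)
```

With 11 positive pools out of 180, the estimate is about 0.00126, in line with the binomial prevalence of 0.001260 reported for these data.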

Conclusions
We considered a sample of N independent clusters drawn from a finite population of clusters, in which individuals sampled within each selected cluster are randomly allocated to $n_l$ pools of size $k_l$ for the detection or estimation of a particular disease (positive). To produce correct estimates in this case, it is important to take into account the correlation between units and pools. For the purpose of estimation, it is important to use the probability mass function (pmf) of the number of positive pools in a cluster derived in this study to correctly estimate the proportion of interest, because it takes into account the fact that the pools formed in each cluster are correlated. Also, we showed that if we use the binomial distribution to estimate the proportion of interest, the results will present a large bias and very inflated mean squared errors when N ≥ 30. This result agrees with the paper of Hung and Swallow [7], who concluded that "for clustered and correlated individuals in each cluster even using a small pool size offers little robustness." Since our methods (exact and approximate) induce correlations between individuals with a beta distribution, they are valid for hierarchical sampling because they take into account the correlation between individuals and pools in each cluster. This is an advantage over the approach proposed by Liu et al. [6], which is not appropriate for a hierarchical sampling process because they assumed that the individuals in each cluster are i.i.d. binomial and used a quasi-likelihood approach to correct for the presence of overdispersion.
For this reason, it is important to use the pmf given in Eq. 3 to obtain correct estimates of the proportion in a group testing context when the responses are correlated. However, using Eq. 3 becomes inefficient as the sample size increases, due to the sum that it contains. For this reason, we studied an approach based on the beta-binomial model which, according to the simulation study performed, produces results that are very close to those obtained using the exact distribution (Eq. 3), with the great advantage of being computationally more efficient, although we still need to use Eq. 1 and Eq. 2 to estimate the corresponding parameters required for the beta-binomial model. In addition, we control the induced correlation because we get a closed form of the probability of a positive pool and of the correlation between any two pools.
Turechek and Madden [15] interpreted the effective pool size as the reduction in information obtained in the pool size due to the effects of overdispersion. Then, to correct for overdispersion when calculating the probability of a positive pool, they replaced k with the effective pool size in the binomial model to approximate the probability of a positive pool under the beta-binomial model. However, this effective pool size did not predict the probability of a positive pool in the presence of heterogeneity very well, so they suggested a modified version.

Let $Z_l$ denote the number of positive pools in cluster l, and let n be the total number of pools analyzed.
Eq. 3 gives the probability mass function (pmf) of $Z_l$. Equating the two moment estimates with the closed-form expressions derived for $\pi_p^{k}$ and $\delta_p^{k}$, which are functions of π and δ as shown in Eq. 1 and Eq. 2, yields estimates of π and δ. Alternative estimates of the parameters (π and δ) can be developed if we assume that the total number of positive pools in each cluster has a beta-binomial distribution with parameters $n_l$, $\pi_p^{k}$ and $\delta_p^{k}$: we obtain the MLEs of $\pi_p$ and $\delta_p$ under this approach and then solve Equations 6 and 7 for π and δ, or we estimate π and δ by directly maximizing the likelihood. The MLEs can be obtained with the R library VGAM and its betabinomial function [19].
Since, conditionally on p, $Y_i$ and $Y_j$ are independent Bernoulli variables with parameter p, the unconditional correlation of $Y_i$ and $Y_j$ is given by $\mathrm{Var}(p)/[\pi(1-\pi)]=\theta/(1+\theta)$. Moreover, since all the individuals within a cluster are independent conditional on p, any two pools are conditionally independent as well.

Table 1 :
Relative bias (RB) and relative mean squared error (RMSE) for the binomial model using various combinations of π, δ and N with k=25 and $n_l$=10, l=1,2,…,N.

Table 2 :
Relative bias (RB) and relative mean squared error (RMSE) for the beta-binomial model and for the exact distribution (Eq. 3), using various combinations of π, δ and N with k=25 and $n_l$=10, l=1,2,…,N.

Table 3 :
Number of pools of leaf samples from Oaxaca, Mexico (2009), with a positive 35S PCR band, based on 30 fields and 300 maize leaves per field. x indicates the number of positive pools, $N_x$ is the observed frequency of each category at this location and $N_{x.s}$ is the simulated frequency of each category.

Table 4 :
Data for detecting CGMMV in seed. $k_l$ is the pool size in cluster l, $n_l$ is the number of pools in cluster l, and $z_l$ is the number of positive pools in cluster l.