Bayesian Corrections of a Selection Bias in Genetics

When a disease is rare in a population, it is inefficient to take a random sample to estimate a parameter. Instead, one takes a random sample of nuclear families with the disease, ascertaining at least one affected sibling (a proband) in each family. In these studies, an estimate of the proportion of siblings with the disease will be inflated. Sometimes the situation is even worse: the investigator takes all the families that appear. Thus, there is a selection bias [1].


Introduction
When a disease is rare in a population, it is inefficient to take a random sample to estimate a parameter. Instead, one takes a random sample of nuclear families with the disease, ascertaining at least one affected sibling (a proband) in each family. In these studies, an estimate of the proportion of siblings with the disease will be inflated. Sometimes the situation is even worse: the investigator takes all the families that appear. Thus, there is a selection bias [1].
Fisher [2] illustrated the importance of adjusting for the selection bias in genetics; see also [3] for a discussion of ascertainment bias in the analysis of family data. For example, the question of whether a rare disease shows an autosomal recessive (or dominant) pattern of inheritance, where the Mendelian segregation ratios are of interest, has been investigated for several decades. The Mendelian segregation ratio is p = .50 for an autosomal dominant disease and p = .25 for an autosomal recessive disease; these values follow from Mendel's first law. For a rare disease, one would like to know whether the disease is autosomal dominant or recessive, that is, whether p = .50 or p = .25. But because the disease is rare, the investigator will select all those nuclear families that appear. Then there is a selection bias; specifically, the estimates will be inflated. See also chapter 2 of [4] and chapter 2 of [5] for very clear pedagogy on this problem. How do we correct for this ascertainment bias? Non-Bayesian methods are available; specifically, see [6] for a review and a discussion of difficulties associated with maximum likelihood estimation for the ascertainment bias problem.
Here, we develop a Bayesian analysis to estimate the segregation ratio in nuclear families when there is an ascertainment bias. To our knowledge this is the first Bayesian approach to the ascertainment bias problem in genetics. More importantly, we investigate the effects of familial correlation among siblings within the same family. Whether one sibling is affected is expected to be related to the status of the other siblings because they are in the same nuclear family, sharing the same genes. In addition, we investigate the effects of heterogeneous familial correlations and proband probabilities. Again, these analyses are new within the Bayesian paradigm, and there has not been any frequentist analysis with heterogeneity. The Bayesian analysis is useful because we can obtain exact distributions under the specified model, and we can input important prior information (e.g., about the genetic features of cystic fibrosis).
Cystic fibrosis is a hereditary disease that affects the mucus glands of the lungs, liver, pancreas, and intestines, causing progressive disability due to multisystem failure. Mutations in the CFTR gene, found on Chromosome 7, cause cystic fibrosis; the mutations result in proteins that are too short because production ends prematurely. We have been analyzing data on cystic fibrosis for the School of Medicine, Medical College of Georgia, and because of confidentiality issues we cannot present these data in this paper. Although these data are very sparse, with only a few individuals reported with cystic fibrosis in southern Georgia, our data set has the same structure as one that has been used repeatedly in the literature. Table 1 gives a set of data on cystic fibrosis, which was presented by Crow [3] to illustrate the need to take account of the method of ascertainment in segregation analysis. One can count the total number of offspring to be 269, the total number of affected offspring to be 124, and the total number of probands to be 90. Thus, one might estimate the segregation ratio to be 124/269 = .4610, and the ascertainment probability to be 90/124 = .7258. Again, these simple estimates are inflated. Note that 46.1% is far in excess of the 25% expected under simple recessive inheritance (cystic fibrosis is autosomal recessive). One reason for the excess is the ascertainment bias: the exclusion of families where the parents are heterozygous but fail to have a homozygous recessive child. These families would add to the number of normal children and thereby reduce the proportion affected. This data set was also used in [4] for illustration.
When all families with affected offspring are ascertained, we say that there is complete ascertainment; otherwise there is incomplete ascertainment, and in this case (unknown to the investigator) there are families with affected siblings who are not probands. When there is complete ascertainment, the proband probability is one; otherwise it is distinctly less than one. Fisher [2] first analyzed the data using complete ascertainment; his analysis used a truncated binomial distribution. However, Fisher [2] also described a simpler method for the more appropriate incomplete ascertainment for these data. This discussion was further developed by Bailey [7] and Morton [8]. In this paper, we will focus on incomplete ascertainment, as is evident in Table 1. Crow [3] pointed out the need to adjust for ascertainment bias and incomplete ascertainment for the cystic fibrosis data.
The key idea for the correction of ascertainment bias is to find the correct sampling distribution under the ascertainment bias. Let x represent the quantity being measured, and let A denote the ascertainment event. Without the ascertainment bias, f(x | θ) is the sampling distribution for a random sample; this is an example of an ignorable selection model. However, when there is an ascertainment bias, we need f(x | θ, A). That is, we condition on the ascertainment event, A. Here f(x | θ, A) provides a nonignorable selection model. In general, the two sampling distributions f(x | θ, A) and f(x | θ) are different; f(x | θ, A) is the more appropriate sampling distribution. Correcting for ascertainment bias means that we need to construct the sampling distribution f(x | θ, A). A simple example, introduced in [2] for complete ascertainment, concerns the number r of siblings affected in a family of size s in a binomial model with r > 0. Then,

p(r | q, A) = C(s, r) q^r (1 − q)^(s−r) / {1 − (1 − q)^s}, r = 1, ..., s.

Here, A is the event that r > 0, leading to the binomial distribution truncated at 0. More importantly, the binomial probabilities are being re-weighted so that all the mass is on the points 1, ..., s. That is, assuming that each sibling is affected independently,

P(r > 0 | q) = 1 − P(none of the s siblings is affected) = 1 − (1 − q)^s.
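Fisher's complete-ascertainment correction (a binomial truncated at zero) can be sketched directly in Python. This is our own illustrative code, not the paper's; the function name and the values of s and q are assumptions for illustration:

```python
import math

def truncated_binomial_pmf(r, s, q):
    """Fisher's complete-ascertainment correction: Binomial(s, q)
    conditioned on r > 0 (at least one affected sibling)."""
    if not 1 <= r <= s:
        return 0.0
    ascertain_prob = 1.0 - (1.0 - q) ** s        # P(r > 0 | q)
    return math.comb(s, r) * q**r * (1.0 - q)**(s - r) / ascertain_prob

# With s = 4 siblings and q = .25, the truncated mean exceeds s*q = 1,
# showing the inflation the correction accounts for.
s, q = 4, 0.25
mean_r = sum(r * truncated_binomial_pmf(r, s, q) for r in range(1, s + 1))
```

The truncated mean here equals sq / {1 − (1 − q)^s}, which is always larger than the unconditional mean sq.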
The problem of ascertainment bias is not new to survey samplers. For finite population sampling, Sverchkov and Pfeffermann [9] defined the sample and sample-complement distributions as two separate weighted distributions (see [1]) for developing design-consistent predictors of the finite population total; see also the more recent presentation [10]. Malec et al. [11] used a hierarchical Bayesian method to estimate a finite population mean when there are binary data. These works are not directly applicable to our situation, but the ideas they portray are important for the issues associated with ascertainment bias. For probability proportional to size (PPS) sampling, Nandram [12] used surrogate sampling techniques to provide simulated random samples by using a model which reverses the selection bias. Under PPS sampling, Nandram et al. [13] used a method, developed by [14], to perform Bayesian predictive inference when a transformation is needed.
We distinguish between two ascertainment bias problems in population genetics. One occurs in the study of rare Mendelian disorders, and the other in single nucleotide polymorphism discovery.
We describe the first ascertainment bias problem. When a disease is rare in the entire population, it is almost always inherited from parents who carry the recessive gene. The number of at-risk parents is usually small (i.e., the number of parents capable of producing affected siblings is very small relative to the number not capable of producing affected siblings). So if a sample is taken at random from the entire population, there could be no at-risk families. At-risk families are divided into two groups, those with at least one affected sibling and those with no affected siblings. A sample is then drawn from the families with at least one affected sibling, thereby introducing an ascertainment bias. Thus, a direct estimate of the proportion of affected siblings will be too large; one needs to adjust for the ascertainment bias. Our example on cystic fibrosis falls in this first category of ascertainment bias problems.
We describe the second ascertainment bias problem. The human genome has a very low density of polymorphisms, and single nucleotide polymorphism (SNP) discovery has an ascertainment bias. The strategy of using a small sample (panel) followed by genotyping of a large sample in SNP discovery saves time and money. In SNP discovery a small sample of people is taken from the population, and these individuals are genotyped for a large number (≈ 10^6) of nucleotides. However, because of the low density of polymorphisms, many of the nucleotides of the panel are not polymorphic, and they are eliminated from the panel (i.e., they are not variable in the panel). The discovery goes on to genotyping a larger sample for the variable nucleotides (i.e., the remaining nucleotides). But if the panel sample were larger, some of the discarded nucleotides could have been polymorphic in the population. Thus, there is an ascertainment bias. Kuhner et al. [15] show that representing panel SNPs as sample SNPs leads to large errors in estimating population parameters. Their recommendation to collect and preserve information about the method of ascertainment is very sensible. Clark et al. [16] point out that ascertainment bias will likely erode the power of tests of association between SNPs and complex disorders. Nielsen and Signorovitch [17] review some of the current methods of SNP discovery, and derive sample distributions of single SNPs and pairs of SNPs for some common SNP discovery schemes. They also show that the ascertainment bias in SNP discovery has a large effect on the estimation of linkage disequilibrium and recombination rates, and they describe some methods of correcting for ascertainment bias when estimating recombination rates from SNP data.
In this paper we provide a Bayesian analysis of the ascertainment bias problem in which we assume incomplete ascertainment for a rare recessive disease, not the SNP problem. The plan of the rest of the paper is as follows. In the Basic models, theory and estimation section, we present the basic models, theory, estimation, and a Bayesian analogue of the existing method. In the Bayesian analysis with familial correlation section, we discuss the issue of incorporating a familial correlation in the ascertainment model, and we provide a simulation study to assess the effect of the ascertainment bias and the familial correlation. In the Heterogeneous probabilities and correlations section, we investigate the effect of heterogeneous proband probabilities and familial correlations using the cystic fibrosis data. In the Concluding remarks section, we provide concluding remarks, and we discuss ascertainment bias in SNP discovery. Thompson [18] discussed many ascertainment models; in this paper, we discuss the simplest ascertainment model [4,5]. Essentially, Lange [4] shows how to adjust for the ascertainment bias using the EM algorithm [19]; Sham [5] uses Fisher's scoring. In the Basic selection models section, we describe the basic selection models, ignorable and nonignorable; in the Properties of the joint probability mass function section, we describe some properties of the joint probability mass function for the nonignorable selection model; and in the Bayesian method section, we present a simple Bayesian method for the ascertainment bias problem.

Basic selection models
Suppose there are n families selected through ascertainment sampling. Letting the k-th ascertained family have s_k siblings, we assume that there are r_k affected and a_k ascertained. In Crow's data the s_k vary from 1 to 10. The simplest ascertainment model specifies that

r_k | p ~ Binomial(s_k, p), a_k | r_k, π ~ Binomial(r_k, π), k = 1, ..., n.

This is the basic ignorable selection model. The a_k are really covariates, and this leads to improved precision. Thus, the joint probability mass function of (a_k, r_k) is

p(a_k, r_k | p, π) = C(s_k, r_k) p^{r_k} (1 − p)^{s_k − r_k} C(r_k, a_k) π^{a_k} (1 − π)^{r_k − a_k},  (1)

a_k = 0, ..., r_k, r_k = 0, ..., s_k, k = 1, ..., n. Note that (1) provides the likelihood for any family without conditioning on whether it is ascertained or not.
To adjust for ascertainment bias, we need to restrict (1) to the support 1 ≤ a_k ≤ r_k ≤ s_k, k = 1, ..., n. This adjustment of the basic ignorable selection model gives the basic nonignorable selection model.
The probability that a family with s_k siblings is ascertained is 1 − (1 − πp)^{s_k}. This is the probability that at least one affected sibling is a proband (i.e., at least one proband is identified). This leads to the truncated probability mass function for the basic nonignorable selection model

p(a_k, r_k | p, π) = C(s_k, r_k) p^{r_k} (1 − p)^{s_k − r_k} C(r_k, a_k) π^{a_k} (1 − π)^{r_k − a_k} / {1 − (1 − πp)^{s_k}},  (2)

a_k = 1, ..., r_k, r_k = a_k, ..., s_k. Note that (2) provides the likelihood for a family that has been ascertained. Thus, in the terminology of missing data, while (1) is the complete data likelihood, (2) is the incomplete data likelihood. Note that the denominator 1 − (1 − πp)^{s_k} in (2) is simply the probability that 1 ≤ a_k ≤ r_k ≤ s_k, k = 1, ..., n. Thus, p(a_k, r_k | π, p) actually includes the ascertainment event in the condition; henceforth, it is convenient to omit this conditioning. Now a reasonable assumption is that the families are sampled independently. Then the likelihood function for all ascertained families is

p(a, r | p, π) = ∏_{k=1}^{n} p(a_k, r_k | p, π).  (3)

The logarithm of the likelihood function of (π, p) in (3) can be maximized, and one can use a normal approximation for the joint distribution of the maximum likelihood estimators. Sham [5] used the method of scoring, and Lange [4] used the expectation-maximization (EM) algorithm. Nandram et al. [6] described three other algorithms: Newton's method, the Nelder-Mead algorithm and a new simple iterative algorithm. For example, for Crow's data, the EM algorithm gives p̂ = .268 and π̂ = .359; the standard errors are respectively .0347 and .0814 with a small correlation of .248. These are consistent with the estimates given by Lange [4] and the algorithms of [6]; Lange [4] did not present the standard errors. As pointed out by [4], these estimates are consistent with the theoretical value of .25 for an autosomal recessive disease such as cystic fibrosis.
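The log-likelihood of the nonignorable selection model in (3) is straightforward to code. The following is a hedged sketch in Python; the function name and the toy data set are our own inventions (it is NOT Crow's table):

```python
import math

def log_lik(p, pi, data):
    """Log-likelihood (3) of the nonignorable selection model; data is a list
    of (s_k, r_k, a_k) for ascertained families with 1 <= a_k <= r_k <= s_k."""
    ll = 0.0
    for s, r, a in data:
        ll += (math.log(math.comb(s, r)) + r * math.log(p) + (s - r) * math.log(1 - p)
               + math.log(math.comb(r, a)) + a * math.log(pi) + (r - a) * math.log(1 - pi)
               - math.log(1.0 - (1.0 - pi * p) ** s))   # ascertainment correction
    return ll

# hypothetical toy data set of (s_k, r_k, a_k) triples
toy = [(4, 2, 1), (3, 1, 1), (5, 2, 2), (2, 1, 1)]
ll = log_lik(0.25, 0.4, toy)   # maximize over (p, pi) in (0,1)^2 for the MLE
```

Any of the maximization methods mentioned above (EM, scoring, Newton, Nelder-Mead) could then be applied to this function.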

Properties of the joint probability mass function
We describe some useful properties and interpretations of the joint probability mass function in (2).
Using (2), the marginal probability mass function of r_k is

p(r_k | p, π) = C(s_k, r_k) p^{r_k} (1 − p)^{s_k − r_k} {1 − (1 − π)^{r_k}} / {1 − (1 − πp)^{s_k}}, r_k = 1, ..., s_k.

All other points have zero probability. (This is obtained by simply summing over a_k.) By using the probability mass function p(r_k | p, π), one can show that

E(r_k | p, π) = s_k p {1 − (1 − π)(1 − πp)^{s_k − 1}} / {1 − (1 − πp)^{s_k}}.

Thus, E(r_k | p, π) is bigger than s_k p, with the discrepancy related to p, π and s_k. With some cumbersome algebraic manipulation, it can be shown that Var(r_k | p, π) differs from s_k p(1 − p) by an adjustment factor Q_k, so that if Q_k ≥ 0, then Var(r_k | p, π) ≤ s_k p(1 − p), the variance in the situation in which r_k | p ~ Binomial(s_k, p). For example, if s_k = 1, then Q_k = {1 − πp − 2π(1 − π)}/{πp(1 − πp)}. If, in addition, p and π satisfy a mild condition (reasonable for an autosomal recessive disease), then 0 ≤ Q_k ≤ 1 and Var(r_k | p, π) ≤ p(1 − p).
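The marginal pmf of r_k and the claim that E(r_k | p, π) exceeds s_k p can be checked numerically. This sketch is our own, with illustrative values of s, p and π assumed:

```python
import math

def marginal_r_pmf(r, s, p, pi):
    """Marginal pmf of the number affected r for an ascertained family:
    Binomial(s, p) reweighted by P(at least one proband | r) = 1 - (1 - pi)^r
    and normalized by P(ascertainment) = 1 - (1 - pi*p)^s."""
    if not 1 <= r <= s:
        return 0.0
    return (math.comb(s, r) * p**r * (1 - p)**(s - r)
            * (1.0 - (1.0 - pi) ** r) / (1.0 - (1.0 - pi * p) ** s))

s, p, pi = 6, 0.25, 0.37           # illustrative values only
mean_r = sum(r * marginal_r_pmf(r, s, p, pi) for r in range(1, s + 1))
# mean_r exceeds the unconditional mean s * p = 1.5
```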
Also, a_k = 0 for a family that has not been ascertained, and for an ascertained family it is easy to show that

p(a_k | p, π) = C(s_k, a_k) (πp)^{a_k} (1 − πp)^{s_k − a_k} / {1 − (1 − πp)^{s_k}}, a_k = 1, ..., s_k.

All other points have zero probability. It is easy to show that E(a_k | p, π) = s_k πp / {1 − (1 − πp)^{s_k}}. Thus, as expected, E(a_k | p, π) increases from s_k πp, and Var(a_k | p, π) decreases from s_k πp(1 − πp).
We can also show that the correlation between a_k and r_k is nonnegative. The covariance Cov(a_k, r_k | p, π) involves a factor that is a nonnegative decreasing function of s_k, starting at s_k = 1 with the value 1; hence the correlation must be nonnegative.
The conditional probability mass function of r_k given a_k is also interesting. It is easy to show that

r_k − a_k | a_k, π, p ~ Binomial{s_k − a_k, (1 − π)p/(1 − πp)}.

Then

E(r_k | a_k, p, π) = a_k + (s_k − a_k)(1 − π)p/(1 − πp),
Var(r_k | a_k, p, π) = (s_k − a_k){(1 − π)p/(1 − πp)}{1 − (1 − π)p/(1 − πp)}.

Thus, in the conditional probability mass function, the expectation increases with a_k and the variance decreases with a_k. That is, knowledge of a_k is informative, consistent with [5]. Sham [5] used data from [2] to illustrate this issue, but here we have obtained an analytical argument.
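The stated conditional distribution, r_k − a_k | a_k ~ Binomial{s_k − a_k, (1 − π)p/(1 − πp)}, can be verified against the joint pmf (2). A small sketch of this check, with illustrative parameter values of our own choosing:

```python
import math

def joint_pmf(a, r, s, p, pi):
    """Nonignorable joint pmf (2) for an ascertained family of size s."""
    return (math.comb(s, r) * p**r * (1 - p)**(s - r)
            * math.comb(r, a) * pi**a * (1 - pi)**(r - a)
            / (1.0 - (1.0 - pi * p) ** s))

s, p, pi = 7, 0.25, 0.37           # illustrative values, not estimates from the paper
a = 2                              # condition on a_k = 2 probands
q = (1 - pi) * p / (1 - pi * p)    # success probability of the shifted binomial
norm = sum(joint_pmf(a, r, s, p, pi) for r in range(a, s + 1))
for r in range(a, s + 1):
    direct = joint_pmf(a, r, s, p, pi) / norm
    shifted = math.comb(s - a, r - a) * q**(r - a) * (1 - q)**(s - r)
    assert abs(direct - shifted) < 1e-12   # conditional pmf matches the binomial
```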

Bayesian method
We consider Bayesian inference about p and π in which (3) is the likelihood function. This is accomplished by using the noninformative proper priors

p, π iid ~ Uniform(0, 1).

Then, using Bayes' theorem, the joint posterior density of (π, p) is

π(p, π | a, r) ∝ ∏_{k=1}^{n} C(s_k, r_k) p^{r_k} (1 − p)^{s_k − r_k} C(r_k, a_k) π^{a_k} (1 − π)^{r_k − a_k} / {1 − (1 − πp)^{s_k}}.  (6)

Note that the uniform prior is updated using the likelihood (3) to get the joint posterior density in (6). Also, note that it is the term ∏_{k=1}^{n} {1 − (1 − πp)^{s_k}} in the denominator of (6) which primarily contributes to the complexity of the two-dimensional posterior density.
To make posterior inference about (p, π), one can use standard numerical integration. However, it is simpler and more convenient to draw a random sample from the joint posterior density. Of course, one can use a Metropolis sampler to fit (draw a sample) from (6). This requires monitoring of convergence and it provides dependent samples. It is much simpler and more elegant to draw a sample from (6) using a grid method because the posterior density lies in the unit square, and it is easy to calculate. Thus, in this case we do not need to use Markov chain Monte Carlo methods.
To draw a bivariate sample from the posterior density of (p, π), we use a grid method on the unit square (0, 1) × (0, 1), the full domain of the joint posterior density of (p, π) in (6). Our method allows us to construct a discrete bivariate approximation to the joint posterior density. We divide the interval (0, 1) into 100 subintervals, so there are 10,000 little squares in the original unit square. We obtain the heights of the posterior density (without the normalization constant) at the center of each of the 10,000 squares. Because these little squares have the same area, the heights of the bivariate density are proportional to the posterior probabilities that (p, π) falls in each of these squares. Thus, we have constructed a joint posterior mass function of (p, π) on a very fine grid. It is easy to draw a sample from this discrete bivariate probability mass function by using the cumulative distribution method. Each draw selects one of the 10,000 little squares with probability proportional to its height, and then within the selected square we choose a point at random by drawing two uniform random variables (i.e., uniform random jittering). This is a very accurate random draw from the joint posterior density in (6). We draw M = 10,000 samples from this approximation for posterior inference in a standard Monte Carlo procedure with independent samples, not a Markov chain. Because of the random jittering the draws are distinct with probability one, and the whole procedure runs in the blink of an eye. For example, letting (p^(h), π^(h)), h = 1, ..., M, denote the sample of size M from the bivariate distribution, for any function H(p, π) we can obtain the posterior mean as

E{H(p, π) | a, r} ≈ (1/M) ∑_{h=1}^{M} H(p^(h), π^(h)).

Although our procedure is similar to the grid method of [24], there is one important difference.
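The grid-with-jittering sampler just described can be sketched as follows. This is our own minimal Python version; the stand-in posterior (a product of beta kernels) is an assumption for illustration, not the posterior (6):

```python
import math
import random
from itertools import accumulate

def grid_sample(log_post, M=2000, G=100, rng=random):
    """Grid method on the unit square: evaluate the unnormalized log posterior
    at the centers of G x G cells, draw cells with probability proportional to
    their heights, then jitter uniformly within each chosen cell."""
    centers = [((i + 0.5) / G, (j + 0.5) / G) for i in range(G) for j in range(G)]
    heights = [math.exp(log_post(u, v)) for u, v in centers]
    cum = list(accumulate(heights))                 # cumulative distribution method
    draws = []
    for _ in range(M):
        u, v = rng.choices(centers, cum_weights=cum, k=1)[0]
        draws.append((u + (rng.random() - 0.5) / G,  # uniform jitter within the cell
                      v + (rng.random() - 0.5) / G))
    return draws

# Stand-in posterior: independent Beta(4, 3) x Beta(3, 5) kernels (illustration only)
random.seed(1)
samples = grid_sample(lambda u, v: 3 * math.log(u) + 2 * math.log(1 - u)
                      + 2 * math.log(v) + 4 * math.log(1 - v))
mean_u = sum(u for u, _ in samples) / len(samples)   # close to the Beta(4,3) mean 4/7
```

Because the draws are independent, posterior means are plain Monte Carlo averages, with no burn-in or convergence monitoring needed.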
We know that the domain of the joint posterior density is (0, 1)^2, the unit square, and for all practical purposes (p, π) are not on the boundary of the parameter space. Also, we can explore the entire domain using small grid squares of dimension .01 × .01. Thus, unlike [24], we do not need to search for the 'modal region' of the posterior density. Moreover, the posterior density (without the normalization constant) is easy to calculate. In fact, our procedure is an improvement over the grid method described in [24].

Bayesian Analysis with Familial Correlation
We investigate the effect of familial correlation among siblings within the same family. We start by adding an intra-class correlation to the model with a single proband probability. One can expect an intra-class correlation because siblings of the same nuclear family are genetically similar to some degree. For example, one sibling having cystic fibrosis will be related to another being affected because they share some common genes. Our new model contains a nonnegative intra-class correlation θ similar to [20]; see also [21] for developments in two-way categorical tables and the effects of intra-class correlation on the chi-square test. We will also describe a model that does not incorporate any information about the ascertainment bias; this is the ignorable selection model. The model that incorporates the selection bias will be called the nonignorable selection model.
It is useful to note the probability mass function p(r_k | p, θ) for r_k = 1, ..., s_k under the intra-class correlation model. In Appendix A, we show how to obtain the joint probability mass function of (a_k, r_k) for ascertained families, 1 ≤ a_k ≤ r_k ≤ s_k; the form for r_k = 1, ..., s_k − 1 and the form for r_k = s_k are given there. This is the nonignorable selection model (i.e., the model that accommodates the ascertainment bias). Note that when s_k = 1, under ascertainment bias a_k = r_k = 1 with probability one; so all families with exactly one sibling are excluded from the analysis.
For comparison, we briefly describe the ignorable selection model. Essentially this is the model without the normalization constant in p(a_k, r_k | p, π, θ) (i.e., with support 0 ≤ a_k ≤ r_k ≤ s_k). It is useful to separate the probability mass function of (a_k, r_k) into four parts: one part for 0 ≤ a_k ≤ r_k ≤ 1, and one part for r_k = s_k with a_k = 0, ..., r_k, among them.

Posterior inference
We use the same assumption as in the original model that the (a_k, r_k) are independent over families (k = 1, ..., n), and we assume that

p, π, θ iid ~ Uniform(0, 1).

Then, using Bayes' theorem, the joint posterior density of (p, π, θ) is given in (7); for the ignorable selection model, the joint posterior density is given in (8). Note that in (8) there is no term with a_k = r_k = 0 because such families are simply not in the data of ascertained families.
To make posterior inference about (p, π, θ), we use a grid method in three dimensions in a manner similar to the one discussed earlier for (p, π). With 100 intervals in each variable, we have to evaluate the joint posterior density at 10^6 values of (p, π, θ), though this is not too time-consuming. It is unnecessarily complex to run a Gibbs sampler here. Because each of p, π and θ lies in (0, 1), the grid procedure is still attractive. Note that for the ignorable selection model, a posteriori (p, θ) are jointly independent of π. Thus, we use a grid to draw (p, θ), and we draw π independently. In either case, we have used 10,000 iterations, perhaps too many! In Table 2 we compare the ignorable and the nonignorable selection models for Crow's data when inference is made for p, π and θ. The correlation is almost zero under both the ignorable and the nonignorable selection models, but the difference between these models for inference about p and π is enormous, with much larger estimates from the ignorable selection model. Under the nonignorable selection model, the posterior mean, posterior standard deviation and 95% credible interval for p are .257, .033 and (.190, .320). This small correlation seems to have some effect: without the familial correlation, the posterior mean, posterior standard deviation and 95% credible interval are .271, .035 and (.206, .340).
It is worth noting that we have repeated the computations with 1,000 iterations instead of 10,000. The posterior means, standard deviations and 95% credible intervals are approximately the same to three decimal places. Of course, the numerical standard errors increase by a factor of √10, but they are still small. Thus, we can do the computations with 1,000 iterations, and perhaps fewer. This is important for the simulations we do next.

Simulation study
The purpose of the simulation study is to investigate the effects of the familial correlation and the disparity between the ignorable and the nonignorable selection models. We have generated data from the nonignorable selection model, and we have fit both the ignorable and the nonignorable selection models. Here we use a single π and a single θ. We have taken p = .257, π = .371 and n = 100 to obtain data similar to Crow's data. To study the effect of the familial correlation, we have taken θ = .02, a small value and θ = .20, a larger value.
We have generated 1000 data sets from the nonignorable selection model. From Crow's data, we have obtained the distribution of the ten family sizes 1, 2, ..., 10; the frequencies of the family sizes are 9, 24, 16, 13, 9, 2, 4, 1, 1, 1. Thus, using the table method, we draw 100 family sizes for each of the 1000 simulated data sets. Now, noting that

p(a_k, r_k | p, π, θ) = p(a_k | r_k, π) p(r_k | p, π, θ),

we use the composition method: we draw r_k from p(r_k | p, π, θ), and with this value of r_k, we draw a_k from p(a_k | r_k, π), where p(r_k | p, π, θ) is given in (A.2) of Appendix A and p(a_k | r_k, π) is a Binomial(r_k, π) distribution truncated at 0. It is easy to draw a_k using a rejection method: draw a_k ~ Binomial(r_k, π), and accept a_k whenever it is not 0. We repeat this process for all 100 families.
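The data-generation step can be sketched for the special case θ = 0 (independent siblings; we do not reproduce the Appendix A pmf for θ > 0). Drawing a family repeatedly until at least one proband appears is equivalent to drawing from the nonignorable model in that case; this is our own illustrative code:

```python
import random

def draw_family(s, p, pi, rng=random):
    """Draw (r, a) for one ascertained family of size s under the nonignorable
    model with theta = 0: keep simulating families until at least one proband
    appears (rejection on the ascertainment event a >= 1)."""
    while True:
        r = sum(rng.random() < p for _ in range(s))   # affected siblings
        a = sum(rng.random() < pi for _ in range(r))  # probands among the affected
        if a >= 1:
            return r, a

random.seed(7)
# family-size frequencies patterned on Crow's data (sizes 1..10)
freqs = [9, 24, 16, 13, 9, 2, 4, 1, 1, 1]
sizes = [k for k, f in enumerate(freqs, start=1) for _ in range(f)]
data = [(s, *draw_family(s, 0.257, 0.371)) for s in sizes]
```

Every generated triple automatically satisfies 1 ≤ a_k ≤ r_k ≤ s_k, the support of the nonignorable model.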
We have used 1000 iterates to fit each model to the 1000 data sets. For each data set we have computed (a) the posterior mean, posterior standard deviation and the width of the 95% credible interval of each parameter; (b) the probability content of each interval by calculating the proportion of intervals containing the true value of each of the three parameters; and (c) the bias and the mean squared error. In (c) we calculated Abias, which is the average over the 1000 simulations of the absolute deviations of the posterior mean from the true value, and APMSE, which is the average over the 1000 simulations of the square of the deviations of the posterior mean from the true value plus posterior variance. We have also presented standard errors of the quantities in (a), (b) and (c).
In Table 3 we present the results of the simulation study. We consider each measure in turn. The posterior means are in order under the nonignorable selection model, but not under the ignorable selection model; the estimates for p and π are too large (relative to the true values), as the two examples show. The posterior standard deviations are smaller under the ignorable selection model, less than half as large in some cases. This also makes the 95% credible intervals much shorter under the ignorable selection model. The probability contents of the 95% credible intervals are not much smaller than the nominal value under the nonignorable selection model; under the ignorable selection model they are virtually 0, except at θ = .02, where the content is really too large. The Abias and APMSE are much smaller under the nonignorable selection model. Therefore, the ignorable selection model gives badly inaccurate estimates with artificially high precision. Under the nonignorable selection model the point and interval estimates are acceptable, but those for the ignorable selection model are not. In fact, Abias and APMSE favor the nonignorable selection model. There is some effect of the intra-class correlation.

Heterogeneous Probabilities and Correlations
We generalize the discussion in this paper by considering heterogeneous proband probabilities and familial correlations. Specifically, in heterogeneous proband probabilities section, we consider the case in which there are different proband probabilities, and in heterogeneous familial correlations section, we consider the case in which there are different familial correlations.

Heterogeneous proband probabilities
Here we allow the proband probabilities to vary with the number of affected siblings within each family. Crow's data have four values (1, 2, 3, 4) for the number affected, so for Crow's data there are four different parameters (π_1, ..., π_4). Generally, let π_{r_k} denote the proband probability and let d be the number of distinct proband probabilities (π_1, ..., π_d). Then, with this simple adjustment, the likelihood function for the n families is obtained from (3) by replacing π with π_{r_k}.

Table 2: Comparison of ignorable (IG) and nonignorable (NIG) selection models by data set and parameters using the posterior mean (PM), posterior standard deviation (PSD), numerical standard error (NSE) and 95% credible interval for Crow's data.
Using a grid, we draw a random variate of p from (11), and with this value of p we draw the d remaining parameters independently from (12). Actually, we started with p = .35, and we drew from (12) first and (11) second; this is useful because we only need to specify one starting value of p. Again, we use 100 grid intervals for each conditional. Conservatively, we "burn in" 1000 iterates, and we use the next 10,000 values to make posterior inference about (p, π_1, ..., π_d). The griddy Gibbs sampler settles down very quickly, and there are virtually no autocorrelations in the iterates. We use these iterates to do inference as in the standard Monte Carlo procedure. For Crow's data, the posterior mean, posterior standard deviation and 95% credible interval for p are .294, .036 and (.229, .369); the numerical standard error is .00035. Here, the hypothesis of an autosomal recessive is not in dispute, but we note that the 95% credible interval moves over a little to the right. Compare the posterior mean of p of .268 with a single proband probability versus .294 with four proband probabilities. In Table 4 we present posterior inference about the proband probabilities. We can see that the parameters are different, and for all practical purposes there are only two distinct values of π (i.e., when variability is taken into consideration, the last three proband parameters may be taken to be equal). Thus, we repeat the computations with just two distinct values of π. Now, the posterior mean, posterior standard deviation and 95% credible interval for p are .293, .037 and (.221, .361); the numerical standard error is .00039. The 95% credible intervals for the two values of π are (.732, 1.000) and (.181, .457). Again, posterior inference about p does not seem to be sensitive to the number of π's used, when more than one proband probability is used.
In Figure 1 we present the empirical posterior densities of p under the three scenarios. The empirical posterior density of p under a single proband probability is different from the posterior densities of p corresponding to four proband probabilities and to the four proband probabilities collapsed into just two distinct values; these latter two empirical posterior densities are similar.
Heterogeneous familial correlations

We now allow the intra-class correlation to vary with family size s_k [the joint posterior density of (p, π, θ_2, ..., θ_10) can be easily written down]. We "burn in" 1000 iterates, and we use the next 10,000 to make posterior inference. The autocorrelations were negligible for all parameters, and convergence was fast, as is evident in the quick settling down of the trace plots.
In Table 5 we present results corresponding to different intra-class correlations. With nine intra-class correlations, the posterior mean, posterior standard deviation and the 95% credible interval of p are .259, .033 and (.200, .329). The credible interval moves over a little to the left. The nine intra-class correlations are all small, but partitioning according to the intra-class correlations, one can see two groups, one with sibship sizes 2, 8, 10 and the other with sibship sizes 3, 4, 5, 6, 7, 9. So we collapsed the nine different correlations to two distinct ones. As expected, there are some changes in the standard errors and intervals, but these are small.

Concluding remarks
When one wants to find out the proportion of people with a rare disease, one cannot take a random sample from the population. It is convenient to take a random sample of the cases that appear. Clearly, this sample is biased (i.e., there is a selection bias). An important example in genetics occurs when one is interested in the segregation ratio for a rare recessive disease. This problem has existed for over a century, and there are many solutions depending on the sampling scheme. The Bayesian solutions, though, have some merit.
We have considered the problem of estimating the segregation parameters and the proband probabilities when there is an autosomal recessive disease. We make three useful contributions: (a) we provide a full Bayesian analogue of the available non-Bayesian solutions; (b) we extend the methodology to reflect an intra-class correlation within families; (c) we discuss the cases in which there are heterogeneous proband probabilities and familial correlations. The computation in (a) and (b) is easy because we can use Monte Carlo methods with only random samples. However, in (c) we used the griddy Gibbs sampler.
Table 3: Simulation study. The nonignorable (NIG) selection model holds, and the ignorable (IG) selection model is fit. PM, PSD and W are the posterior mean, posterior standard deviation and width of the 95% credible interval averaged over the 1000 simulations; C is the probability content of the 95% credible interval; Abias is the average over the 1000 simulations of the absolute deviations of the estimate from the true value; APMSE is the average over the 1000 simulations of the square of the deviations of the posterior mean from the true value plus the posterior variance. Here the notation a_b means that a is the average and b is the standard error. True p = .257, true π = .371 and true θ = .02, .20.

In this paper we have not reported on the ascertainment bias that occurs in single nucleotide polymorphism (SNP) discovery. This is an enormously important problem with implications for the study of many genetic disorders. Our work on rare autosomal recessive disorders is a preamble to the study of ascertainment bias in SNP discovery. However, we give a brief description.

Ascertainment bias in SNP discovery
One can measure the polymorphism at a nucleotide throughout the population by using the allele frequency. Other measures that are potentially more useful are heterozygosity (H) and the polymorphism information content (PIC) [23]. For s individuals, let d_i denote either H or PIC at the i-th nucleotide, let c_i be the number of ones among the 2s zeros and ones at the i-th nucleotide, and let I_i = 1 if the i-th nucleotide is selected and I_i = 0 if it is not. Assumption (14) on the c_i is reasonable, and a reasonable companion assumption is (15) on the selection indicators I_i. Both assumptions (14) and (15) are the basis of a model for SNP discovery under ascertainment bias. All structures and quantities of interest can be added as needed. Different correlation structures among the nucleotides can be specified. The important disease-causing genes can be assessed, and more accurate results from case-control studies, used in SNP discovery, can be obtained.

Note: In (b) there are four distinct parameters, π_k, k = 1, ..., 4, and in (c) there are two distinct parameters, with π_2, π_3 and π_4 collapsed into a single parameter, π_2.

Note: Crow's data set has sibship sizes 1-10, and there are nine distinct correlation parameters, θ_k, k = 2, ..., 10; θ_1 ≡ 0. In (c) the parameters θ_2, θ_8, θ_10 are collapsed into θ_2, and the parameters θ_3, ..., θ_7, θ_9 are collapsed into θ_3.