
**J. Fellman ^{1,2*}**

^{1}Folkhälsan Institute of Genetics, Department of Genetic Epidemiology, Helsinki, Finland

^{2}Hanken School of Economics, Helsinki, Finland

^{*}**Corresponding Author:** J. Fellman, Folkhälsan Institute of Genetics, Department of Genetic Epidemiology, POB 211, FIN-00251 Helsinki, Finland

**E-mail:**[email protected]

**Received date:** May 24, 2012; **Accepted date:** June 19, 2012; **Published date:** June 20, 2012

**Citation:** Fellman J (2012) Analysis of Sex-Linked Recessive Traits: Optimal Designs for Parameter Estimation and Model Tests. J Biomet Biostat 3:146. doi: 10.4172/2155-6180.1000146

**Copyright:** © 2012 Fellman J. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.


The estimation of the gene frequency of sex-linked recessive traits is reconsidered. The estimation formulae for mixed, male, and female samples are presented and compared. Optimal designs for efficient estimation are studied. Male samples are optimal for gene frequencies below 1/3 and female samples for frequencies above 1/3. Mixed samples are never optimal. The model testing problem is discussed. Mixed samples are necessary for model testing. We analyse the loss in efficiency when both estimation and testing must be performed. In general, our results indicate that mixed samples should contain an excess of males. The results obtained are applied to empirical data found in the literature.

Maximum likelihood estimation; Model testing; Efficiency; Colour vision; Xg blood groups

In the literature, abundant studies exist concerning probabilistic models in genetics. These have mainly investigated model building and the statistical estimation of gene frequencies. However, in our opinion, experimental design problems have not been examined sufficiently. Against this background, this study is performed. We evaluate the estimation of the gene frequencies of sex-linked recessive traits, and our basic assumption is that the trait is monogenic and recessive. Examples are the genes for colour-blindness and for blood group Xg, which are sex-linked, being located on the X chromosome. Such a trait has markedly different phenotype frequencies in the male and female segments of the population: if the trait is recessive and has a gene frequency p in the total population, then the frequency of affected individuals is p among males and p^{2} among females. Consequently, direct comparisons of phenotype frequencies between males and females are of no value.

We discuss and compare the maximum likelihood estimators of the gene frequency for mixed, male, and female samples. Among geneticists there is consensus that colour-blindness is not a monogenic trait. Kalmus (1965, p. 63) discussed whether the genes responsible for protan or deutan defects represent one common series of alleles on the X chromosome or two separate series. He stated that the two-loci hypothesis seems better supported. The possibility of testing the genetic model is therefore crucial, and we give alternative methods for model testing. We analyse the loss in efficiency when both estimation and testing must be performed. The results obtained are applied to empirical data found in the literature [3,4].

**Maximum likelihood estimation**

**The model:** We consider a monogenic sex-linked recessive trait. We assume that we have a sample of size N consisting of M males and F females and that there are m_{1} males with a recessive phenotype, m_{2} males with a dominant phenotype, f_{1} females with a recessive phenotype and f_{2} females with a dominant phenotype. If the gene frequency of the recessive trait is p among both males and females [5,6], then the genetic model is given in **Table 1**.

 | Males: Number | Males: Affected | Males: Not affected | Females: Number | Females: Affected | Females: Not affected
---|---|---|---|---|---|---
Observed | M | m_{1} | m_{2} | F | f_{1} | f_{2}
Theoretical | M | Mp | M(1-p) | F | Fp^{2} | F(1-p^{2})

**Table 1:** Observed and expected numbers of subjects for a monogenic recessive sex-linked trait. Affected individuals have the recessive phenotype and unaffected individuals the dominant phenotype.

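**A mixed sample:** If we ignore a proportionality factor independent of p, we obtain from **Table 1** the likelihood function L(p), with the restriction 0 < p < 1. The function L(p) can be written

$$L(p) = p^{m_1}(1-p)^{m_2}\,(p^2)^{f_1}(1-p^2)^{f_2} = p^{m_1+2f_1}(1-p)^{m_2+f_2}(1+p)^{f_2} \tag{1}$$

The log-likelihood function is

$$l(p) = (m_1+2f_1)\log p + (m_2+f_2)\log(1-p) + f_2\log(1+p) \tag{2}$$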

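If the log-likelihood function is written

$$l(p) = \left(m_1\log p + m_2\log(1-p)\right) + \left(2f_1\log p + f_2\log(1-p^2)\right) \tag{3}$$

the first parentheses, l_{M}(p), contain the contribution of the male data and the second parentheses, l_{F}(p), the contribution of the female data. When we maximize l(p) in (2), we obtain

$$\frac{dl}{dp} = \frac{m_1+2f_1}{p} - \frac{m_2+f_2}{1-p} + \frac{f_2}{1+p} \tag{4}$$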

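The condition dl/dp = 0 yields an algebraic equation of second degree

$$(M+2F)\,p^2 + m_2\,p - (m_1+2f_1) = 0 \tag{5}$$

This equation has two roots, one negative outside the admissible region (0,1) and one positive. The positive root is

$$\hat{p} = \frac{-m_2 + \sqrt{m_2^2 + 4(M+2F)(m_1+2f_1)}}{2(M+2F)} \tag{6}$$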

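The upper limit of $\hat{p}$ follows from the inequality $m_1 + 2f_1 \le M + 2F - m_2$, which gives

$$\hat{p} \le \frac{-m_2 + \left(2(M+2F) - m_2\right)}{2(M+2F)} = 1 - \frac{m_2}{M+2F}$$

Consequently, $\hat{p} < 1$ whenever the sample contains at least one unaffected subject, and $\hat{p}$ belongs to the admissible interval (0,1). This estimation result was given by Haldane (1963). One obtains

$$\frac{d^2 l}{dp^2} = -\frac{m_1+2f_1}{p^2} - \frac{m_2+f_2}{(1-p)^2} - \frac{f_2}{(1+p)^2} \tag{7}$$

and $d^2l/dp^2 < 0$. Consequently, the unique solution $\hat{p}$ maximizes l(p) (and L(p)). If we accept the model, we can estimate the estimator variance. We have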

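$$E(m_1) = Mp,\quad E(m_2) = M(1-p),\quad E(f_1) = Fp^2,\quad E(f_2) = F(1-p^2) \tag{8}$$

From (7) and (8), we get the information

$$I(p) = -E\left(\frac{d^2 l}{dp^2}\right) = \frac{M}{p(1-p)} + \frac{4F}{1-p^2} \tag{9}$$

If we introduce x = M/N and 1 − x = F/N, we obtain

$$I(x,p) = N\left[\frac{x}{p(1-p)} + \frac{4(1-x)}{1-p^2}\right] \tag{10}$$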

We note that for high values of x (a majority of males) the information is high for low values of p and that for low values of x (a majority of females) the information is high for high values of p. Later, we will discuss this observation in more detail.

The inverse of I(x,p) yields the variance

$$V(x,p) = \mathrm{Var}(\hat{p}) = \frac{p(1-p^2)}{N\left[x(1+p) + 4(1-x)p\right]} \tag{11}$$

The estimator $\hat{p}$ is asymptotically normal, and the variance can be estimated by using $\hat{p}$ instead of p in (11). Haldane (1963, formula (5)) gives a slightly different estimate of $\mathrm{Var}(\hat{p})$. His formula contains the observed frequencies and is, in our opinion, not altogether satisfactory. In fact, he estimates p with $\hat{p}_M$, given below in (13), in the “male part” of the formula and with $\hat{p}_F$, given in (16), in the “female part” of the variance formula (cf. formula (11)).
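As a numerical illustration, the following minimal Python sketch computes the mixed-sample estimate as the positive root of the second-degree likelihood equation and its standard deviation from the inverse of the information. The function name `mixed_sample_mle` is hypothetical, and the formulas are written out in the comments as they follow from the model in Table 1. The counts are the Xg data of Mann et al. (1962) from Table 2.

```python
import math

def mixed_sample_mle(m1, m2, f1, f2):
    """Return (p_hat, sd) for a mixed sample.

    m1, m2: affected / unaffected males; f1, f2: affected / unaffected females.
    Assumes at least one affected and one unaffected subject, so 0 < p_hat < 1.
    """
    M, F = m1 + m2, f1 + f2
    a = M + 2 * F      # coefficient of p^2 in (M + 2F) p^2 + m2 p - (m1 + 2 f1) = 0
    c = m1 + 2 * f1
    p_hat = (-m2 + math.sqrt(m2 ** 2 + 4 * a * c)) / (2 * a)  # positive root
    # Information: male part M / (p (1 - p)) plus female part 4 F / (1 - p^2),
    # evaluated at p_hat to estimate the variance.
    info = M / (p_hat * (1 - p_hat)) + 4 * F / (1 - p_hat ** 2)
    return p_hat, math.sqrt(1 / info)

# Xg data of Mann et al. (1962) from Table 2
p_hat, sd = mixed_sample_mle(m1=59, m2=95, f1=21, f2=167)
print(f"p_hat = {p_hat:.6f}, SD = {sd:.6f}")
```

The printed values should agree with the maximum likelihood estimate and SD reported for this study in Table 2.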

**A male sample:** If we consider a male sample and ignore the proportionality factor, which is independent of p, we obtain from (3) the log-likelihood function

$$l_M(p) = m_1\log p + m_2\log(1-p) \tag{12}$$

When we maximize l_{M}(p), we get the “male” estimator

$$\hat{p}_M = \frac{m_1}{M} \tag{13}$$

with the information $I_M(p) = M/(p(1-p))$ and the well-known variance

$$\mathrm{Var}(\hat{p}_M) = \frac{p(1-p)}{M} \tag{14}$$

The estimator $\hat{p}_M$ is asymptotically normal, and the variance in (14) can be estimated by using $\hat{p}_M$ instead of p.

**A female sample:** If we consider only the female part of the sample and ignore the proportionality factor, which is independent of p, we obtain from (3) the log-likelihood function [7,8]

$$l_F(p) = 2f_1\log p + f_2\log(1-p^2) \tag{15}$$

If we maximize the log-likelihood function, we obtain the “female” estimator

$$\hat{p}_F = \sqrt{\frac{f_1}{F}} \tag{16}$$

with the information $I_F(p) = 4F/(1-p^2)$ and the variance

$$\mathrm{Var}(\hat{p}_F) = \frac{1-p^2}{4F} \tag{17}$$

The estimator $\hat{p}_F$ is consistent, efficient and asymptotically normal, and the variance in (17) can be estimated by using $\hat{p}_F$ instead of p. According to Huether and Murphy (1980), it is not clear how rapidly these asymptotic properties are approached with increasing sample size. The log-likelihood (15) yields an unbiased estimate of p^{2}, namely f_{1}/F, but $\hat{p}_F$ in (16) is biased with a negative bias. Haldane (1956) proposed an improved estimate [9,10]

(18)

In order to improve the ML estimates, Huether and Murphy (1980) proposed a jackknife procedure. Their estimate is, in our notation,

(19)

How these improvements influence our gene frequency estimates will be discussed in the Discussion section. Eq. (9) indicates that the information obtained for the whole data set is $I(p) = I_M(p) + I_F(p)$, the sum of the male and female informations. This is a consequence of the male and female data sets being independent.
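To make the bias correction of the female estimate concrete, the sketch below applies the textbook delete-one jackknife to the estimator sqrt(f1/F). That this coincides exactly with the formula of Huether and Murphy (1980) is an assumption made here, not something taken from their paper; the function names are hypothetical. The counts are the female Xg data of Mann et al. (1962).

```python
import math

def female_mle(f1, F):
    """ML estimate of p from a female sample: sqrt(f1 / F)."""
    return math.sqrt(f1 / F)

def female_jackknife(f1, F):
    """Delete-one jackknife of sqrt(f1 / F); requires 1 <= f1 and F >= 2."""
    full = female_mle(f1, F)
    # Leaving out an affected woman gives sqrt((f1 - 1) / (F - 1)); there are f1 such cases.
    # Leaving out an unaffected woman gives sqrt(f1 / (F - 1)); there are F - f1 such cases.
    loo_mean = (f1 * math.sqrt((f1 - 1) / (F - 1))
                + (F - f1) * math.sqrt(f1 / (F - 1))) / F
    # Standard jackknife bias-corrected estimate.
    return F * full - (F - 1) * loo_mean

ml = female_mle(21, 188)
jk = female_jackknife(21, 188)
print(f"ML = {ml:.5f}, jackknife = {jk:.5f}")
```

The jackknife value lies slightly above the ML estimate, in line with the negative bias of the ML estimate, and for these data it should agree closely with the Huether & Murphy column of Table 4.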

**Model testing**

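**A mixed sample:** In the mixed data set, there are two degrees of freedom because the row sums for males and females are fixed. After the estimation of p, one degree of freedom remains. According to **Table 1**, the model can be tested by the quantity [11]

$$\chi^2 = \frac{(m_1 - M\hat{p})^2}{M\hat{p}} + \frac{(m_2 - M(1-\hat{p}))^2}{M(1-\hat{p})} + \frac{(f_1 - F\hat{p}^2)^2}{F\hat{p}^2} + \frac{(f_2 - F(1-\hat{p}^2))^2}{F(1-\hat{p}^2)} \tag{20}$$

where $\hat{p}$ is the maximum likelihood estimate given in (6).

Under the null hypothesis that the model holds, this quantity is approximately χ^{2} distributed with one degree of freedom.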

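The model can also be tested by the likelihood ratio test (LRT). Consider

$$\Lambda = \frac{\max_{p} L(p)}{\max_{p_M,\,p_F} L(p_M, p_F)}$$

where

$$L(p_M, p_F) = p_M^{m_1}(1-p_M)^{m_2}\; p_F^{2f_1}(1-p_F^2)^{f_2}$$

The maximizations give

$$\Lambda = \frac{\hat{p}^{\,m_1+2f_1}(1-\hat{p})^{m_2}(1-\hat{p}^2)^{f_2}}{\hat{p}_M^{\,m_1}(1-\hat{p}_M)^{m_2}\;\hat{p}_F^{\,2f_1}(1-\hat{p}_F^2)^{f_2}} \tag{21}$$

where $\hat{p}$, $\hat{p}_M$ and $\hat{p}_F$ are given in (6), (13), and (16), respectively. Under the null hypothesis, −2 log Λ is approximately χ^{2} distributed with one degree of freedom. In situations not far from the null hypothesis, the χ^{2} tests based on (20) and (21) give similar results. In the applications, formula (20) is used.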
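The two tests can be compared numerically. The sketch below is a minimal illustration, assuming the expected counts of Table 1 evaluated at the mixed-sample ML estimate for the Pearson statistic, and the unrestricted maximum at the separate male and female estimates for the likelihood ratio statistic. All function names are hypothetical. The p-value uses the fact that a χ^{2} variable with one degree of freedom is the square of a standard normal variable.

```python
import math

def mle(m1, m2, f1, f2):
    """Positive root of (M + 2F) p^2 + m2 p - (m1 + 2 f1) = 0."""
    a = (m1 + m2) + 2 * (f1 + f2)
    return (-m2 + math.sqrt(m2 ** 2 + 4 * a * (m1 + 2 * f1))) / (2 * a)

def pearson_chi2(m1, m2, f1, f2):
    """Pearson statistic with expected counts from Table 1 at the ML estimate."""
    M, F = m1 + m2, f1 + f2
    p = mle(m1, m2, f1, f2)
    observed = [m1, m2, f1, f2]
    expected = [M * p, M * (1 - p), F * p ** 2, F * (1 - p ** 2)]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def lrt_statistic(m1, m2, f1, f2):
    """-2 log(Lambda); requires all four counts to be positive."""
    M, F = m1 + m2, f1 + f2

    def loglik(p_m, p_f):
        return (m1 * math.log(p_m) + m2 * math.log(1 - p_m)
                + 2 * f1 * math.log(p_f) + f2 * math.log(1 - p_f ** 2))

    p = mle(m1, m2, f1, f2)
    return 2 * (loglik(m1 / M, math.sqrt(f1 / F)) - loglik(p, p))

def p_value_1df(stat):
    """Upper tail of chi-square with 1 df: P(Z^2 > stat) = erfc(sqrt(stat / 2))."""
    return math.erfc(math.sqrt(stat / 2))

# Xg data of Mann et al. (1962) from Table 2
chi2 = pearson_chi2(59, 95, 21, 167)
lrt = lrt_statistic(59, 95, 21, 167)
print(f"chi2 = {chi2:.2f}, LRT = {lrt:.2f}, p = {p_value_1df(chi2):.3f}")
```

For these data the two statistics are close to each other, as expected when the data are not far from the null hypothesis.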

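**Separate male and female samples:** If we estimate p separately for the male and female series, there is no degree of freedom left in either series. Consequently, if we test the hypothesis p_{M} = p_{F}, we must consider the difference $\hat{p}_M - \hat{p}_F$ with the variance

$$\mathrm{Var}(\hat{p}_M - \hat{p}_F) = \frac{p(1-p)}{M} + \frac{1-p^2}{4F} \tag{22}$$

Under the null hypothesis, the ratio $z = (\hat{p}_M - \hat{p}_F)/\sqrt{\mathrm{Var}(\hat{p}_M - \hat{p}_F)}$ is approximately standard normal.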

If we accept the null hypothesis p_{M} = p_{F} = p, then we can obtain a weighted estimate of the common gene frequency p. To minimize the variance of the weighted estimate, the weights should be the inverses of the variances in (14) and (17). The weighted estimate is

$$\hat{p}_W = \frac{I_M\,\hat{p}_M + I_F\,\hat{p}_F}{I_M + I_F}, \qquad I_M = \frac{M}{p(1-p)}, \quad I_F = \frac{4F}{1-p^2} \tag{23}$$

and its theoretical variance is $1/(I_M + I_F)$, which is identical to (11). The estimator $\hat{p}$ maximizes L(p), but the weighted estimator $\hat{p}_W$ and the Haldane estimator $\hat{p}$ have asymptotically the same efficiency. Consequently, both estimators are best asymptotically normal (BAN).
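The separate-sample analysis can be sketched as follows. This is a minimal illustration with hypothetical function names. It assumes that each variance is estimated by plugging the corresponding single-sex estimate into the binomial variance p(1 − p)/M for males and into (1 − p^2)/(4F) for females; the text leaves open which plug-in is used for the test statistic, so the z value is illustrative only.

```python
import math

def separate_estimates(m1, m2, f1, f2):
    """Return ((p_M, var_M), (p_F, var_F)) with plug-in variance estimates."""
    M, F = m1 + m2, f1 + f2
    p_m = m1 / M                       # male estimate: m1 / M
    p_f = math.sqrt(f1 / F)            # female estimate: sqrt(f1 / F)
    var_m = p_m * (1 - p_m) / M        # p (1 - p) / M at p_m
    var_f = (1 - p_f ** 2) / (4 * F)   # (1 - p^2) / (4 F) at p_f
    return (p_m, var_m), (p_f, var_f)

def z_statistic(male, female):
    """Standardized difference between the male and female estimates."""
    (p_m, var_m), (p_f, var_f) = male, female
    return (p_m - p_f) / math.sqrt(var_m + var_f)

def weighted_estimate(male, female):
    """Inverse-variance weighted estimate and its standard deviation."""
    (p_m, var_m), (p_f, var_f) = male, female
    w_m, w_f = 1 / var_m, 1 / var_f
    p_w = (w_m * p_m + w_f * p_f) / (w_m + w_f)
    return p_w, math.sqrt(1 / (w_m + w_f))

# Xg data of Mann et al. (1962) from Table 2
male, female = separate_estimates(59, 95, 21, 167)
z = z_statistic(male, female)
p_w, sd_w = weighted_estimate(male, female)
print(f"z = {z:.3f}, weighted estimate = {p_w:.6f}, SD = {sd_w:.6f}")
```

With these plug-in variances, the weighted estimate and its SD should agree with the values reported for this study in Table 2.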

**Design of experiments**

In connection with another type of genetic problem, Brown (1975) considers efficient experimental designs for the estimation of genetic parameters. We start from the same basic idea, but use different methods. In his book on colour-blindness, Kalmus (1965, p. 85) states, without further comment, that the population frequency for rare sex-linked recessive traits must be based on male samples. We now study this problem in more detail, applying experimental design theory to the inference results of the preceding sections [12].

**Designs for parameter estimation:** Let us assume that we intend to investigate N (fixed) individuals and that the gene frequency is p. Our problem is now in what proportion M : F we shall include males and females in the sample in order to minimize the variance given in (11) or, equivalently, to maximize the information measure (9). We study the information I(x,p) and the variance V(x,p) as functions of p and of the proportion of males x = M/N. From (9) we get

$$I(x,p) = \frac{N\left[x(1-3p) + 4p\right]}{p(1-p^2)} \tag{24}$$

The function (24) is a linear function of x. For p < 1/3, I(x,p) is an increasing function of x and the maximum is obtained for x = 1, i.e. for a male sample. For p > 1/3, I(x,p) is a decreasing function of x and the maximum is obtained for x = 0, i.e. for a female sample. For p = 1/3, I(x,p) is constant and all samples are equally good. Our optimal experimental design for parameter estimation is hence

(i) Use a male sample if p < 1/3

(ii) Use an arbitrary sample if p = 1/3

(iii) Use a female sample if p > 1/3

We observe that the optimal design of the experiment depends on the true parameter value. This is common in non-linear situations, but in this case the dependence is very simple. In different populations, the frequency of colour blindness is about 0.08, so rule (i) is in good agreement with the statement of Kalmus (1965).
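The design rule can be checked numerically with a short sketch. The function names are hypothetical, and the information per subject is taken as the sum of a male part x/(p(1 − p)) and a female part 4(1 − x)/(1 − p^2), where x is the proportion of males.

```python
def information_per_subject(x, p):
    """I(x, p) / N for a sample with a proportion x of males, 0 < p < 1."""
    return x / (p * (1 - p)) + 4 * (1 - x) / (1 - p ** 2)

def best_single_sex_design(p, rel_tol=1e-9):
    """Compare an all-male (x = 1) with an all-female (x = 0) sample."""
    male = information_per_subject(1.0, p)
    female = information_per_subject(0.0, p)
    if abs(male - female) <= rel_tol * max(male, female):
        return "either"
    return "male" if male > female else "female"

for p in (0.08, 1 / 3, 0.5):
    print(f"p = {p:.4f}: {best_single_sex_design(p)}")
```

Because the information is linear in x, comparing the two end points x = 0 and x = 1 is sufficient to find the optimum for a given p.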

In practice, the problem is not so simple. Often when we start an investigation, we do not know the gene frequency. If we have prior information (from earlier studies) that the gene frequency is far in a known direction from one-third, we can with confidence use a male or a female sample. If, however, we have no prior information or if the gene frequency is known to be in the neighbourhood of 1/3, then it is difficult to decide whether to use a male or female sample. We can see in **Table 2** that for the Xg blood group p is close to 1/3, and this is a good example of this problem.

Sex | N | Recessive | Dominant | $\hat{p}$ ^{a)} | χ^{2} | $\hat{p}_M$, $\hat{p}_F$ ^{b)} | SD | Weighted ^{c)} | Reference
---|---|---|---|---|---|---|---|---|---
Males | 154 | 59 | 95 | 0.356021 | 0.88 | 0.383117 | 0.039175 | 0.355486 | Mann et al., 1962
Females | 188 | 21 | 167 | 0.025542 | | 0.334219 | 0.034369 | 0.025836 |
Males | 1751 | 620 | 1131 | 0.341226 | 2.62 | 0.354083 | 0.019206 | 0.334715 | Noades et al., 1966
Females | 1667 | 179 | 1488 | 0.008075 | | 0.327687 | 0.011570 | 0.009911 |
Males | 3513 | 1209 | 2304 | 0.340577 | 0.41 | 0.344150 | 0.013664 | 0.338743 | Sanger et al., 1971
Females | 3271 | 371 | 2900 | 0.005731 | | 0.336780 | 0.008232 | 0.007051 |

a) Maximum likelihood estimate $\hat{p}$ on the upper line and its SD on the lower line

b) Male estimate $\hat{p}_M$ on the upper line and female estimate $\hat{p}_F$ on the lower line

c) Weighted estimate of $\hat{p}_M$ and $\hat{p}_F$ on the upper line and its SD on the lower line

**Table 2:** Xg blood group data in different studies.

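Let us now analyse the efficiency of a mixed sample in more detail. Assume that the true gene frequency is p. Now, we have to compare I(x,p) with I(1,p) if p < 1/3 and with I(0,p) if p > 1/3, and we obtain the relative efficiencies for the mixed sample

$$E(x,p) = \begin{cases} \dfrac{I(x,p)}{I(1,p)} = \dfrac{x(1-3p)+4p}{1+p}, & p \le 1/3 \\[2ex] \dfrac{I(x,p)}{I(0,p)} = \dfrac{x(1-3p)+4p}{4p}, & p \ge 1/3 \end{cases} \tag{25}$$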

If we must test the model, it is necessary to include in the sample both males and females. If this is done, there is a loss of efficiency relative to the best (but unknown) design. In general, if we compare a male sample, a female sample, and a mixed sample of the same size, then the efficiency of the mixed sample is always between the efficiencies of the single-sex samples.

In **Figure 1**, we see how the efficiencies depend on the gene frequency for the single-sex samples (x = 0 and x = 1) and for some mixed samples (x = 0.3333, 0.5155, and 0.6667). The choice of x = 0.5155 and x = 0.6667 will be explained later. We observe that for small values of p the efficiency strongly depends on the true value of p. For p < 1/3, the male sample is most efficient. For p > 1/3, the female sample is most efficient, but the efficiency of a female sample for p < 1/3 is not as good as the efficiency of the male sample for p > 1/3. Therefore, **Figure 1** supports the conclusion that, independently of the true value of p, if we want to play safe a mixed sample should contain an excess of males.

This result can also be obtained in the following way. We consider the efficiency E(x,p) for a mixed sample as a function of p for a given x. For p < 1/3, we have $E(x,p) = [x(1-3p)+4p]/(1+p)$ and $\partial E/\partial p = 4(1-x)/(1+p)^2 \ge 0$, with equality for x = 1, i.e. the sample contains only male subjects. Hence, E(x,p) is an increasing function of p and $E(x,p) \ge E(x,0) = x$ for $0 \le p \le 1/3$.

Similarly, we obtain for p > 1/3 $E(x,p) = [x(1-3p)+4p]/(4p)$ and $\partial E/\partial p = -x/(4p^2) \le 0$. Now, E(x,p) is a decreasing function of p and $E(x,p) \ge E(x,1) = 1 - x/2$ for $1/3 \le p \le 1$. From these results, it follows that $\min_p E(x,p) = \min(x,\, 1 - x/2)$. Hence, $\max_x \min_p E(x,p) = 2/3$, and this value is obtained for x = 2/3 and p = 0 or 1.

Speaking in terms of game theory, the strategy of nature is the choice of p and our strategy is the choice of x, and E(x,p) is the payoff of the game. The solution indicates that we are playing safe. We expect the worst, i.e. that nature has chosen one extreme p value, and consequently, we prepare for it and choose the strategy that maximizes our gain (the efficiency). From this point of view, we should use a sample with 2/3 males and 1/3 females. This mixed sample guarantees at least the efficiency 2/3 (cf. **Figure 1**).
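The maximin argument can be illustrated by a brute-force search. The sketch below is only a numerical check under stated assumptions: the efficiency of a sample with a proportion x of males is taken as its information relative to the better of the two single-sex samples, and the grids for x and p are arbitrary choices. The function names are hypothetical.

```python
def efficiency(x, p):
    """Information of a mixed sample relative to the best single-sex sample."""
    info_mixed = x / (p * (1 - p)) + 4 * (1 - x) / (1 - p ** 2)
    info_male = 1 / (p * (1 - p))      # x = 1
    info_female = 4 / (1 - p ** 2)     # x = 0
    return info_mixed / max(info_male, info_female)

def worst_case_efficiency(x, grid=400):
    """Minimum efficiency over a grid of p values strictly inside (0, 1)."""
    return min(efficiency(x, (i + 0.5) / grid) for i in range(grid))

# Search the proportion of males on a grid that contains x = 2/3.
candidates = [i / 300 for i in range(301)]
best_x = max(candidates, key=worst_case_efficiency)
print(f"maximin x = {best_x:.4f}, "
      f"guaranteed efficiency = {worst_case_efficiency(best_x):.4f}")
```

The search should land close to two-thirds males, with a guaranteed efficiency of about two-thirds, whereas an equal split guarantees only about one-half.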

**Designs for model testing:** A sample consisting of both males and females is necessary if we have doubts about the model. The doubts may concern the simple recessive inheritance (cf. colour blindness), absence of selection (the same gene frequency in males and females), exact typing independent of the sex, or the non-existence of border cases that are difficult to type. If we have a mixed sample, we can then test the model as we have noted above. This is not possible with a male-only or a female-only sample. This problem is a good example of the common situation that an experimental strategy that is optimal for parameter estimation is too restricted to be of any value for model testing.

If we want to test the model and to use the test statistic based on (22) most efficiently under the null hypothesis, then we have to consider the variance

$$W(p,x) = \mathrm{Var}(\hat{p}_M - \hat{p}_F) = \frac{p(1-p)}{Nx} + \frac{1-p^2}{4N(1-x)} \tag{26}$$

and to pursue $\min_x \max_p W(p,x)$. This solution indicates that we are again playing safe. We expect the worst situation, i.e. that nature has chosen a p value that maximizes the variance, and consequently, we prepare for it and choose the strategy (x) that minimizes our loss (the variance). In other words, we want to answer the question: Which sample mixture x : (1 − x) minimizes the maximal variance $\max_p W(p,x)$? For a given x, we obtain

$$\frac{\partial W}{\partial p} = \frac{1-2p}{Nx} - \frac{p}{2N(1-x)}$$

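The corresponding W value is a maximum for $p = 2(1-x)/(4-3x)$. This maximum W_{max}(x), which depends on x, is

$$W_{max}(x) = \frac{(2-x)^2}{4N\,x(1-x)(4-3x)}$$

Now, we minimize W_{max}(x) by using the derivative

$$\frac{dW_{max}}{dx} = \frac{\partial W}{\partial x} + \frac{\partial W}{\partial p}\,\frac{dp}{dx}$$

If we use the condition that $\partial W/\partial p = 0$, the derivative reduces to $\partial W/\partial x = -p(1-p)/(Nx^2) + (1-p^2)/(4N(1-x)^2)$. Now, we solve the equation $\partial W/\partial x = 0$ under the restriction $p = 2(1-x)/(4-3x)$. The equation simplifies to

$$3x^3 - 18x^2 + 24x - 8 = 0 \tag{27}$$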

This equation of third degree, written f(x) = 0 with $f(x) = 3x^3 - 18x^2 + 24x - 8$, satisfies the conditions f(0) = −8 < 0 and f(1) = 1 > 0. Consequently, the equation has one root or three roots within the interval (0,1). The case of three roots within this interval is impossible because the product of the roots has to be 8/3 > 1. Thus, there is only one root within the interval (0,1). In **Figure 2**, we present the function f(x) in order to locate the roots. An iterative calculation yields the numerical root $x_0 \approx 0.5155$, and the corresponding p value is $p_0 \approx 0.395$. Finally, we obtain

$$\min_x \max_p N\,W(p,x) = N\,W(p_0, x_0) \approx 0.8991 \tag{28}$$

The solution $x_0$ is our best testing strategy in order to meet nature’s worst alternative $p_0$. This minimax solution of the testing problem does not coincide with the maximin solution of the efficiency problem. **Figure 3** shows how NW(p,x) depends on p for different values of x. The minimax property of $x_0$ is easily seen.
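The minimax design can be reproduced numerically. The sketch below is a hedged illustration: it assumes that the scaled variance of the difference between the male and female estimates is p(1 − p)/x + (1 − p^2)/(4(1 − x)), from which the worst-case p for a given x and a cubic equation for the optimal x follow by differentiation. The function names and the bisection routine are hypothetical helpers, not part of the original analysis.

```python
def scaled_variance(p, x):
    """N * Var(p_M_hat - p_F_hat) for a proportion x of males, 0 < x < 1."""
    return p * (1 - p) / x + (1 - p ** 2) / (4 * (1 - x))

def worst_p(x):
    """The p that maximizes the scaled variance for a given x."""
    return 2 * (1 - x) / (4 - 3 * x)

def cubic(x):
    """Stationarity condition for the worst-case variance as a function of x."""
    return 3 * x ** 3 - 18 * x ** 2 + 24 * x - 8

def bisect(f, lo, hi, tol=1e-12):
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

x_star = bisect(cubic, 0.0, 1.0)
p_star = worst_p(x_star)
print(f"x = {x_star:.4f}, p = {p_star:.4f}, "
      f"N*W = {scaled_variance(p_star, x_star):.4f}")
print(f"N*W at x = 2/3: {scaled_variance(worst_p(2 / 3), 2 / 3):.4f}")
```

The root should lie close to the value x = 0.5155 quoted in the text, with a worst-case scaled variance of about 0.8991, compared with 1 for the design with two-thirds males.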

We apply our theoretical results to empirical data. We consider both colour vision and Xg blood group data. In **Table 2**, we present the results of the analyses of the blood group data, and in **Table 3** the results for the colour vision data. The estimates obtained from the mixed sample and the weighted estimates obtained by combining the separate male and female estimates are fairly similar.

Sex | N | Recessive | Dominant | $\hat{p}$ ^{a)} | χ^{2} | $\hat{p}_M$, $\hat{p}_F$ ^{b)} | SD | Weighted ^{c)} | Reference
---|---|---|---|---|---|---|---|---|---
Males | 9049 | 725 | 8324 | 0.077226 | 4.76 | 0.080119 | 0.002854 | 0.076979 | Waaler, 1927
Females | 9072 | 40 | 9032 | 0.00247 | | 0.066402 | 0.005238 | 0.002506 |
Males | 6863 | 532 | 6331 | 0.074505 | 4.89 | 0.077517 | 0.003228 | 0.074141 | Schmidt, 1936
Females | 5604 | 20 | 5584 | 0.002862 | | 0.059740 | 0.006667 | 0.002905 |
Males | 21231 | 1687 | 19544 | 0.078034 | 5.62 | 0.079459 | 0.001856 | 0.077898 | Koliopoulos et al., 1976
Females | 8754 | 37 | 8717 | 0.001740 | | 0.065013 | 0.005333 | 0.001753 |

a) Maximum likelihood estimate $\hat{p}$ on the upper line and its SD on the lower line

b) Male estimate $\hat{p}_M$ on the upper line and female estimate $\hat{p}_F$ on the lower line

c) Weighted estimate of $\hat{p}_M$ and $\hat{p}_F$ on the upper line and its SD on the lower line

**Table 3:** Colour blindness in different studies.

The reduction of the biases in the female estimates in **Tables 2 and 3** is presented in **Table 4**. The comparison between the maximum likelihood estimates and the improved estimates indicates that the MLE has a negative bias, but with these sample sizes the errors are negligible. The improvements proposed by Haldane (1956) and Huether & Murphy (1980) yield almost identical estimates.

Study | ML estimate | Haldane, 1956 | Huether & Murphy, 1980
---|---|---|---
**Xg** | | |
Mann et al., 1962 | 0.33422 | 0.33598 | 0.33603
Noades et al., 1966 | 0.32769 | 0.32789 | 0.32789
Sanger et al., 1971 | 0.33678 | 0.33688 | 0.33688
**Colour vision** | | |
Waaler, 1927 | 0.06640 | 0.06661 | 0.06661
Schmidt, 1936 | 0.05974 | 0.06011 | 0.06012
Koliopoulos et al., 1976 | 0.06501 | 0.06523 | 0.06523

**Table 4:** Comparison between the maximum likelihood estimates of the female samples and the improved estimates proposed by Haldane (1956) and Huether & Murphy (1980).

If our minimax design is used for an estimation problem, then the minimum efficiency is 0.5155, which is obtained for p = 0. If we compare this value with the efficiency 0.6667 guaranteed by the maximin solution x = 0.6667 for the estimation problem, we observe how much we have to “pay” for the hypothesis testing. On the other hand, if our primary goal is estimation and we choose the design x = 0.6667, then the corresponding maximal variance is W = 1/N, which is obtained for p = 0.3333. This can be compared with the earlier obtained minimax value W = 0.8991/N. Hence, if our target is parameter estimation, then the efficiency of the model test is reduced in the proportion 0.8991 : 1.

The common opinion of today is that colour blindness is not a one-locus trait. The data of Waaler, Schmidt, and Koliopoulos et al. show statistically significant differences from the one-locus model. The common finding in this study is that the female estimate $\hat{p}_F$ is less than the male estimate $\hat{p}_M$, and this result supports the two-loci hypothesis. However, the other colour vision data, especially the female data, are very limited. NZHTA Report 7 (1998) presents colour vision data collected from different sources, and the value of that study is this collection. In addition, that study presents tests of the sex differences in the distribution between subjects with colour deficiency and normal sight. The tests indicate strong sex differences, but they ignore the sex-linked nature of colour blindness, and consequently, these results are of minor interest.

The author is grateful to an anonymous referee for very constructive suggestions concerning the biasedness among the “female estimates” and has tried to consider these remarks in the text. This work was supported in part by a grant from the Magnus Ehrnrooth Foundation.


- Brown AH (1975) Efficient experimental designs for the estimation of genetic parameters in plant populations. Biometrics 31: 145-160.
- Haldane JB (1956) Almost unbiased estimates of functions of frequencies. Sankhya 17: 201-208.
- Haldane JB (1963) Tests for sex-linked inheritance of population samples. Ann Hum Genet 27: 107-111.
- Huether CA, Murphy EA (1980) Reduction of bias in estimating the frequency of recessive genes. Am J Hum Genet 32: 212-222.
- Kalmus H (1965) Diagnosis and genetics of defective colour vision. Pergamon Press.
- Koliopoulos J, Iordanides P, Palmeris G, Chimonidou E (1976) Data concerning colour vision deficiencies amongst 29,985 young Greeks. Mod Probl Ophthalmol 17: 161-164.
- Mann JD, Cahan A, Gelb AG, Fisher N, Hamper J (1962) A sex-linked blood group. Lancet 1: 8-10.
- Noades J, Gavin J, Tippett P, Sanger R, Race RR (1966) The X-linked blood group system Xg tests on British, Northern American, and northern European unrelated people and families. J Med Genet 3: 162-168.
- NZHTA Report 7 (1998) Colour Vision Screening. A critical appraisal of the literature. New Zealand Health Technology Assessment.
- Sanger R, Tippett P, Gavin J (1971) The X-linked blood group system Xg. Tests on unrelated people and families of Northern European ancestry. J Med Genet 8: 427-433.
- Schmidt I (1936) Result of a mass examination of color sense with anomaloscope. Z Bahnärtztex 44-53.
- Waaler GH (1927) Color blindness on the Erblichkeitsverhältnisse of different types of congenital. Ztschr F Induct Lineage-u Vererbungsl 45: 279-333.
