Medical, Pharma, Engineering, Science, Technology and Business

**Esther Herberich ^{1} and Ludwig A Hothorn^{2*}**

^{1}Institute of Statistics, Ludwig-Maximilians-University Munich, Germany

^{2}Institute of Biostatistics, Leibniz University Hannover, Germany

- *Corresponding Author:
- Ludwig A Hothorn

Institute of Biostatistics

Leibniz University Hannover, Germany

**Tel:**49-511-762-5566

**E-mail:**[email protected]

**Received Date:** October 10, 2013; **Accepted Date:** November 18, 2013; **Published Date:** November 23, 2013

**Citation:** Herberich E, Hothorn LA (2013) A Maximum-Type Association Test for Censored Time-to-Event Data. J Biomet Biostat 4: 178. doi: 10.4172/2155-6180.1000178

**Copyright:** © 2013 Herberich E, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

**Visit for more related articles at** Journal of Biometrics & Biostatistics

**Background: **Testing the association between a diallelic marker and a censored time-to-event trait is a specific problem in population-based association studies. For a certain gene, the mode of inheritance may be of particular interest. Therefore, the principle of maximum-type tests (or minimum p procedure) is modified for continuous traits, especially for censored time-to-event data.

**Results:** We propose a Marcus-type multiple contrast test for a single censored time-to-event trait in a populationbased study assuming a Cox proportional hazard model. Using simulations we worked out the limitation of this asymptotic approach: sufficient sample sizes and non-rare alleles are required. A user-friendly implementation of this method is available in the survival and multcomp packages of the statistical software R.

**Conclusions: **The proposed approach can be used for the analysis of individual SNPs when censored time-to-event data in population-based association studies are of interest. The approach allows both a global claim of association and determination of the particular underlying mode of inheritance. The mode-specific hazard ratios and their lower simultaneous confidence limits provide information about statistical significance and genetic relevance.

Max-test; Censored time-to-event trait; Populationbased association study

Particular SNPs in population-based association studies can be analyzed using a maximum test, such as the MAX-3 test. This approach is widely used for proportions in the case control design and for continuous traits. Here, an extension is proposed for censored time-to-event traits using a Marcus-type multiple contrast test under the assumptions of the Cox proportional hazard model. Simulations revealed serious limitation of this asymptotic approach as sufficient sample sizes and non-rare alleles are required. Both a global claim of association and the particular underlying mode of inheritance can be identified. The mode-specific hazard ratios and their lower simultaneous confidence limits provide information about statistical significance and genetic relevance. A user-friendly implementation of this method is available in the survival and multcomp packages of the statistical software R.

Genome-wide association studies involving large population-based samples are used to identify common variants that affect a particular trait. Most of these studies compare the allele frequencies of di-allelic markers in cases and controls using the Cochran-Armitage trend test [1]. Because the mode of inheritance at a given locus is often unknown, a maximum-test (minimum p approach, respectively) based on three mode-specific standardized Cochran Armitage trend tests have been proposed [2]. Alternatively, continuous endpoints (i.e., quantitative traits), such as gene expression, are commonly analyzed using a linear regression model of genotype scores *x*=(0, 1, 2) adjusted for covariates [3]. A special case of quantitative traits is time-to-event data with right censoring. For example, in a study of the survival of 116 female mice with the three genotypes aa, Aa and AA at the marker DM13D147 in chromosome 13 after an infection with *Listeria monocytogenes* [4], the raw data consist of the following three items (available in the R package qtl [5]): survival time (pheno), genotype group (geno) and censoring status (cens) (**Table 1**). The related Kaplan-Meier estimators reveal substantial differences in survival between the three genotype groups aa, Aa and AA, in which A is assumed to be the high risk allele:

pheno | geno | cens | |
---|---|---|---|

1 | 118.32 | AA | TRUE |

2 | 264 | Aa | FALSE |

3 | 194.92 | Aa | TRUE |

4 | 264 | Aa | FALSE |

5 | 145.42 | Aa | TRUE |

6 | 177.23 | Aa | TRUE |

7 | 264 | aa | FALSE |

8 | 76.67 | AA | TRUE |

9 | 90.75 | AA | TRUE |

10 | 76.17 | Aa | TRUE |

. | . | . | . |

. | . | . | . |

. | . | . | . |

115 | 76.48 | AA | TRUE |

116 | 116.47 | Aa | TRUE |

117 | 116.52 | Aa | TRUE |

118 | 139.55 | Aa | TRUE |

119 | 264 | Aa | FALSE |

120 | 116.2 | Aa | TRUE |

**Table 1: **Raw data.

**Figure 1** shows that the survival function of the heterozygous genotype, Aa, is not symmetrical to the functions of the two homozygous genotypes, aa and AA, which would be an indicator of an additive mode of inheritance. Instead, the heterozygous genotype is close to the non-risk homozygous, aa, indicating a recessive mode of inheritance. The idea of the maximum test is sensitivity to each of the basic modes of inheritance (i.e., additive, recessive, and dominant). Although reporting tiny p-values for the list of top-k SNPs is common nowadays, the ‘Strengthening the Reporting of Genetic Association studies’ report [6] recommended reporting appropriate effect size estimators and their confidence intervals. To compare survival functions, the hazard ratio is the appropriate effect size and simultaneous confidence intervals for a maximum-test for the three basic genetic models (i.e. additive, recessive, and dominant mode), are estimated. We propose a testing procedure that not only is sensitive to these three alternatives, but also able to determine which of the alternatives is likely using the diagnostic characteristics of simultaneous confidence intervals. We use the multiple contrast test approach [7], extended to the Cox proportional hazard model. This test is not likely applicable to genome-wide studies, merely because of the long computation time (about 0.03 sec on i7-4600 CPU per phenotype and SNP), but it can be used for specific analysis of the top-k SNPs or a priori genes of interest.

Here, we describe an asymptotic multiple contrast test for a censored time-to-event trait assuming the Cox proportional hazards model based on the simultaneous inference approach in general parametric models [8]. This is an extension of a related scores test for the generalized linear model [9] for censored time-to-event traits.

**Marcus-type association test for censored time-to-event data**

We consider three genotype groups i ∈ {aa, Aa, AA} with ni subjects carrying genotype i. A denotes the high risk allele and a denotes any other allele. To describe the effect of genotype i and, when applicable, the effects of other covariates on the hazard of death we use a Cox proportional hazard model:

The vector x_{j} contains the covariates of the j^{th} individual including the genotype i; the vector with restriction includes the genotype effects and the effects of further covariates; λ_{0}(t) denotes the baseline hazard rate at time t and is assumed to be identical for all individuals.

Let be a vector of contrast coefficients fulfilling the constraint For the sake of simplicity, we assume a model with the genotype as single covariate in the following, i.e., C=(c_{aa}, c_{Aa}, c_{AA}) and β=(β_{aa}, β_{Aa}, β_{AA}). If the elements of vector c fulfill and the linear combination can be interpreted as a difference of weighted averages of genotype effects.

We consider three genetic contrasts

each corresponding to one of the three genetic models, i.e. for each genetic model an individual statistic is computed. These contrasts are formulated by the so-called Marcus-type contrast matrix [10]

(1)

whose elements cmi are the contrast coefficients. The product of each row vector of C and the vector of genotype effects β=(β_{aa}, β_{Aa}, β_{AA}) corresponds to one linear combination Lm, m ∈ {dom, add, rec} associated with a specific genetic model.

In case of a dominant mode of inheritance, the effects β_{Aa} and β_{AA} on the hazard rate are identical. Therefore a genetic contrast for this mode can be expressed by

which denotes the difference between the pooled effects of genotypes Aa and AA. Analogously, in case of a recessive mode of inheritance, the effects β_{aa} and β_{Aa} on the hazard rate are identical. A recessive genetic contrast can be specified by

The genetic contrast for an additive model can be expressed by

Thus, the case of no global genetic effect is characterized by β_{aa}=β_{A}=β_{AA} or, equivalently We can test for a genetic model by performing a one-sided union-intersection test on the three linear combinations L_{m} with control of the FWER over all three contrasts. That is, we test the intersection of the elementary null hypotheses

versus the union of the elementary alternative hypotheses

This multiple contrast test was already described for normally distributed variables [10]. Therefore, we denote this as the Marcus-type multiple contrast test.

Instead of multiple tests, lower simultaneous confidence intervals for the linear combinations L_{m} can be used. Exponentiating the confidence limits leads to confidence intervals for exp (L_{m}), which can be interpreted as a hazard ratio of the weighted average of the genotype effects. The presence of an association between genotype and trait is indicated if at least one of the three confidence intervals for the hazard ratios exp (L_{m}) excludes the value 1.

In other words, the above procedure tests the null hypothesis that no genetic effect exists against the three alternatives that the mode of inheritance is dominant, additive, or recessive.

An adjustment is needed to ensure that the overall hypothesis (no global genetic effect) is tested at level α. The three local hypotheses are positively correlated, and this correlation is included in the test procedure in order to prevent the overall test from being too conservative. By testing three different local hypotheses, the procedure is sensitive to three different genetic models and has greater power to detect an association when the mode of inheritance is not additive.

**Approximate lower confidence limits for one contrast of genotype effects**

In the Cox proportional hazards model, parameter estimates are obtained by maximization of the partial likelihood [11]. The maximum partial likelihood estimates are asymptotically normally distributed [12]. The point estimator for a single linear combination L is and the lower (1-α) Wald confidence limit for L is

where z_{1-α} denotes the (1−α) quantile of the standard normal distribution and the element in the ith row and jth column of the matrix . is the inverse of the observed Cox information matrix and used as an estimation of the covariance of .

A lower (1−α) Wald confidence limit for the hazard ratio exp (L) is given by

In case of non-proportional hazards the accelerated failure time model can be used. In addition, the frailty Cox model can be used to model clustered survival data, such as when considering multiple studies in a meta-analysis. Both approaches are available for multiple contrast tests [13].

**Approximate simultaneous lower confidence limits for multiple of genotype effects**

According to Hothorn et al. [8], limits of approximate lower simultaneous confidence intervals for several linear combinations of model parameters Lm can be constructed by

where z_{3,R,1−α} is the upper equicoordinate (1−α) quantile of the multivariate normal distribution with expectation 0 and correlation matrix R and Φ3 (q; 0, R) the associated cumulative density function. The quantile z_{3,R,1−α} is chosen such that

where Z_{m} is the m^{th} element of a trivariate normal random vector Z ~ N (0, R). The probability that atleast one of the simultaneous confidence intervals does not include the true value of the associated contrast L_{m} is α with n→ ∞. Control of the FWER is achieved using quantiles that take the number of estimated contrasts and correlation between them into account.

Again, exponentiating the lower limit leads to simultaneous confidence intervals for multiple hazard ratios:

**Simulations**

To evaluate the performance of the proposed method we estimated the type I error rate and power using simulations in the open-source software R [14]. Each simulation step was repeated 10,000 times.

The trait genotypes for N=500, 1000, 2000 subjects were randomly drawn from a multinomial distribution assuming Hardy-Weinberg equilibrium. Allele frequencies were chosen p=0.5 at trait locus, pm=0.05, 0.1, 0.3, 0.5 at trait marker, and linkage disequilibrium (LD) was chosen δ=0.025, 0.05, 0.1, 0.2. Phenotypic time-to-event data were generated according to Bender et al. [15] using a Weibull distribution with baseline hazard rate Censoring times were generated from a uniform distribution on the interval [0,τ] with the τ chosen such that the censoring rate was approximately 20%.

The desired confidence level was 1−α=0.95.

For estimation of the probability of type I error, data were generated under the null hypothesis β_{aa}=β_{Aa}=β_{AA}=0, i.e. corresponding to a hazard ratio of 1. In one setting the model was investigated without additional covariates besides the genotype. In another setting, the model was investigated with two covariates: x_{1} uniformly distributed on [2,4] without an effect on the hazard rate, i.e. β_{1}=0, and x_{2} uniformly distributed on [0,4] with effect β_{2}=0.5. The family wise error rate (FWER), that is the probability of falsely detecting any mode of inheritance, was used as measure of the type I error and estimated by the proportion of datasets in which at least one simultaneous confidence interval for Marcus-type hazard ratios did not include the value 1.

For estimation of the power phenotypic values were simulated using genotype effects β_{aa}=β_{Aa}=0 and β_{AA} ∈ [0,2] for a recessive mode of inheritance, β_{aa}=0, β_{Aa} [0,1] and β_{AA}=2. βAa for an additive mode of inheritance, and β_{aa}=0 and β_{Aa}=β_{AA} ∈ [0,2] for a dominant mode of inheritance. These genotype-specific effects correspond to mode of inheritance-specific hazard ratios HR ∈ [1,4,7]. Each value of power was estimated by the proportion of datasets in which the correct mode of inheritance was detected by the simultaneous lower confidence intervals.

The estimated type I error (FWER) is shown in **Figure 2**. For fixed sample size, the procedures get more liberal with increasing disbalancy of allele frequencies and/or lower LD. For settings with rather balanced allele frequencies, i.e. pm=0.3, 0.5, a sample size of N=500 is sufficient to ensures FWER control even when LD is low. For settings with unbalanced allele frequencies, i.e. pm=0.05, 0.1 larger samples are required. A sample size of N=2000 for pm=0.1, and a sample size of N=5000 for pm=0.05 provides FWER control (results for N=5000 not shown). In the setting with covariates, the FWER is slightly higher than in the model with the genotype as single covariate.

The power of the procedure to identify the correct genetic model is given in **Figure 3** for sample sizes of N=1000. The power increases with higher linkage disequilibrium in all models. The power is considerably higher for the dominant and additive mode of inheritance compared to the recessive mode of inheritance, with the latter showing very poor power except when allele frequency was pm=0.5 at trait marker. In the dominant model, the power decreases with increasing frequencies of the rare allele, whereas the power increases with increasing frequency of the rare allele in the recessive model. The power in the additive model is similar for all allele frequencies. With increasing sample size (N=500 vs. N=1000 vs. N=2000), no general increase in the power to detect the correct mode of inheritance can be found (Power curves for N=500 and N=2000 not shown.). In some settings the mode-specific power increases slightly, whereas in some settings the mode-specific power increases. The power to detect any association increases with increasing sample size, but more often incorrect mode of inheritance is chosen. When the high-risk allele is rare and thus genotype frequencies are unbalanced, an additive mode of inheritance is more often stated to be dominant, and a recessive mode of inheritance is more often stated to be additive (**Figure 3**).

**Figure 4** shows the power of the procedure to detect any association between genotype and trait. The extent to which the procedure’s power to detect the correct mode of inheritance and the power to detect any association differ depends on the underlying mode of inheritance and the disbalancy of allele frequencies. A ’symmetrical’ relation of the modes of inheritance and the differences between overall and modespecific power exists, which is caused by the fact that the three modes of inheritance are ’symmetrical’ for the alleles. When high-risk alleles are rare (pm=0.05, 0.1), the difference in overall power and mode-specific power is negotiable for the dominant, considerable for the recessive, and intermediate for the additive mode of inheritance. When highrisk alleles are frequent (pm>0.75) the difference in overall power and mode-specific power is negotiable for the recessive, considerable for the dominant, and again intermediate for the additive mode of inheritance. When alleles frequencies are balanced, the difference in overall power and mode-specific power is negotiable for the additive model, and intermediate for the dominant and recessive modes of inheritance (**Figure 4**).

**Evaluation of the example**

The Listeria example described above was analyzed using the new Marcus-type association test for censored time-to-event data. Simultaneous lower 95% confidence intervals for contrasts of genotype effects corresponding to the three genetic models were computed using the R [14] packages survival [16] and multcomp [17]. The estimated hazard ratios and simultaneous lower confidence limits for the three basic modes of inheritance are given in **Table 2**.

Inheritance-specific contrast | Hazard ratio | Lower confidence limit |
---|---|---|

Cdom | 1.56 | 0.82 |

Cadd | 3.16 | 1.60 |

Crec | 3.50 | 2.23 |

**Table 2: **Estimated inheritance-specific hazard ratios and their simultaneous lower confidence limit for marker DM13D147 at chromosome 13.

Clearly, the largest hazard ratio with the most distant lower confidence limit was determined for the recessive mode of inheritance (abbreviated with C_{rec}). Mice homozygous for the high risk allele had a greater chance of a short survival time compared to animals carrying either one or two non-risk alleles (genotypes Aa and aa). With a probability of 95% animals carrying two high risk alleles had at least a 2.23-fold greater hazard.

**Software**

The simultaneous confidence intervals for hazard ratios obtained from a Cox proportional hazard model can be computed using the coxph function in the package survival [16]. The lower simultaneous confidence intervals for Marcus-type hazard ratios can be computed by the function glht in the package multcomp [17,18]. Intervals for the hazard ratios were obtained by exponentiating the estimated confidence limits. Alternatively, adjusted p-values can be calculated for the corresponding multiple tests. The package multcomp employs the algorithms used for computing the multivariate normal quantiles [19] implemented in the package mvtnorm [20].

Evaluation of selected SNPs by the proposed Marcus-type multiple contrast test for censored time-to-event data in population-based association studies is useful in several aspects. First, the most likely underlying mode of inheritance, not just a global association, can be concluded. The outcomes, i.e., the mode-specific hazard ratios and their lower simultaneous confidence limits, allow interpretation in terms of both statistical significance and medical relevance. This asymptotic approach is limited to study designs with sufficient sample sizes, such as 1000 or more, and is limited to non-rare alleles. Real data can be analyzed easily using the R packages survival and multcomp. Straight forward extensions for data with non-proportional hazards and/or multiple studies (i.e., meta-analysis) are possible.

This work was supported in part by HO 1687/9-1 for the second author (LAH).

- Sasieni PD (1997) From genotypes to genes: doubling the sample size. Biometrics 53: 1253-1261.
- Freidlin B, Zheng G, Li Z, Gastwirth JL (2002) Trend tests for case-control studies of genetic markers: power, sample size and robustness. Hum Hered 53: 146-152.
- Hong KW, Moon S, Kim YJ, Kim YK, Kim DJ, et al. (2012) Association between the ABO locus and hematological traits in Korean. BMC Genet 13: 78.
- Broman KW (2003) Mapping quantitative trait loci in the case of a spike in the phenotype distribution. Genetics 163: 1169-1175.
- Broman KW, Wu H, Sen S, Churchill GA (2003) R/qtl: QTL mapping in experimental crosses. Bioinformatics 19: 889-890.
- Little J, Higgins JPT, Ioannidis JPA, Moher D, Gagnon F, et al. (2009) Strengthening the reporting of genetic association studies (STREGA): an extension of the STROBE Statement. Human Genetics 125: 131-151.
- Konietschke F, Libiger O, Hothorn LA (2012) Nonparametric evaluation of quantitative traits in population-based association studies when the genetic model is unknown. PLoS One 7: e31242.
- Hothorn T, Bretz F, Westfall P (2008) Simultaneous inference in general parametric models. Biom J 50: 346-363.
- So HC, Sham PC (2011) Robust association tests under different genetic models, allowing for binary or quantitative traits and covariates. Behav Genet 41: 768-775.
- Marcus R (1982) Some Results On Simultaneous Confidence-Intervals For Monotone Contrasts In One- Way Anova Model. Commun Statisti Theor Meth 11: 615-622.
- Cox DR (1972) Regression Models and Life-Tables. Journal of the Royal Statistical Society Series B Statistical Methodology 34: 187-220.
- Andersen PK, Gill RD (1982) Cox's Regression Model for Counting Processes: A Large Sample Study. The Annals of Statistics 10: 1100-1120.
- Herberich E, Hothorn T (2012) Dunnett-type inference in the frailty Cox model with covariates. Stat Med 31: 45-55.
- Dean CB, Nielsen JD (2007) Generalized linear mixed models: a review and some extensions. Lifetime Data Anal 13: 497-512.
- Bender R, Augustin T, Blettner M (2005) Generating survival times to simulate Cox proportional hazards models. Stat Med 24: 1713-1723.
- http://CRAN.R-project.org/package=survival
- http: //CRAN.R-project.org/package=multcomp
- Bretz F, Hothorn T, Westfall P (2002) On multiple comparisons in R. R News 2: 14-17.
- Genz A, Bretz F (2009) Computation of Multivariate Normal and t Probabilities. Lecture Notes in Statistics, Heidelberg: Springer-Verlag 2009.
- Hothorn T, Bretz F, Genz A (2001) On Multivariate t and Gau Probabilities in R. R News 1: 27-29.

Select your language of interest to view the total content in your interested language

- Adomian Decomposition Method
- Algebra
- Algebraic Geometry
- Algorithm
- Analytical Geometry
- Applied Mathematics
- Artificial Intelligence Studies
- Axioms
- Balance Law
- Behaviometrics
- Big Data Analytics
- Big data
- Binary and Non-normal Continuous Data
- Binomial Regression
- Bioinformatics Modeling
- Biometrics
- Biostatistics methods
- Biostatistics: Current Trends
- Clinical Trail
- Cloud Computation
- Combinatorics
- Complex Analysis
- Computational Model
- Computational Sciences
- Computer Science
- Computer-aided design (CAD)
- Convection Diffusion Equations
- Cross-Covariance and Cross-Correlation
- Data Mining Current Research
- Deformations Theory
- Differential Equations
- Differential Transform Method
- Findings on Machine Learning
- Fourier Analysis
- Fuzzy Boundary Value
- Fuzzy Environments
- Fuzzy Quasi-Metric Space
- Genetic Linkage
- Geometry
- Hamilton Mechanics
- Harmonic Analysis
- Homological Algebra
- Homotopical Algebra
- Hypothesis Testing
- Integrated Analysis
- Integration
- Large-scale Survey Data
- Latin Squares
- Lie Algebra
- Lie Superalgebra
- Lie Theory
- Lie Triple Systems
- Loop Algebra
- Mathematical Modeling
- Matrix
- Microarray Studies
- Mixed Initial-boundary Value
- Molecular Modelling
- Multivariate-Normal Model
- Neural Network
- Noether's theorem
- Non rigid Image Registration
- Nonlinear Differential Equations
- Number Theory
- Numerical Solutions
- Operad Theory
- Physical Mathematics
- Quantum Group
- Quantum Mechanics
- Quantum electrodynamics
- Quasi-Group
- Quasilinear Hyperbolic Systems
- Regressions
- Relativity
- Representation theory
- Riemannian Geometry
- Robotics Research
- Robust Method
- Semi Analytical-Solution
- Sensitivity Analysis
- Smooth Complexities
- Soft Computing
- Soft biometrics
- Spatial Gaussian Markov Random Fields
- Statistical Methods
- Studies on Computational Biology
- Super Algebras
- Symmetric Spaces
- Systems Biology
- Theoretical Physics
- Theory of Mathematical Modeling
- Three Dimensional Steady State
- Topologies
- Topology
- mirror symmetry
- vector bundle

- 6th International Conference on
**Biostatistics**and**Bioinformatics**

November 13-14, 2017, Atlanta, USA

- Total views:
**11669** - [From(publication date):

December-2013 - Oct 17, 2017] - Breakdown by view type
- HTML page views :
**7862** - PDF downloads :
**3807**

Peer Reviewed Journals

International Conferences 2017-18