ISSN: 2161-0436
Human Genetics & Embryology

Literature Reviews on Methods for Rare Variant Association Studies

Shurong Fang1*, Shuanglin Zhang2 and Qiuying Sha2

1Department of Mathematics, Fairfield University, Fairfield, CT 06824, USA

2Department of Mathematical Sciences, Michigan Technological University, Houghton, MI 49931, USA

*Corresponding Author:
Shurong Fang
Department of Mathematics, Fairfield University
Fairfield, CT 06824, USA
Tel: (203) 254-4184
E-mail: [email protected]

Received Date: February 01, 2016; Accepted Date: February 16, 2016; Published Date: February 18, 2016

Citation: Fang S, Zhang S, Sha Q (2016) Literature Reviews on Methods for Rare Variant Association Studies. Human Genet Embryol 6:133. doi:10.4172/2161-0436.1000133

Copyright: © 2016 Fang S, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Abstract

The widespread availability of genome sequencing data has given rise to a variety of rare variant association methods for population-based and family-based designs. However, it is challenging to know which method is appropriate in practice. The purpose of this paper is to provide a general review of the literature on rare variant association studies and to suggest future research directions. We discuss recent methods in three categories. The first two categories cover population-based designs, divided by whether or not a method accounts for the direction of the effects of causal rare variants. The third category summarizes methods for family-based designs.

Keywords

Family-based design; Population-based design; Quantitative traits; Qualitative traits; Rare variants

Introduction

Genome-wide association studies (GWAS) have successfully identified a large number of common variants underlying various complex diseases [1,2]. However, current studies suggest that the common variants identified by GWAS account for only a small fraction of disease heritability [2,3]. Rare variants are widely considered to be responsible for the missing heritability [2-4]. Next-generation DNA sequencing technologies allow sequencing of parts of the genome for a large number of individuals and of the whole genome for a set of individuals, and thus make direct testing of rare variants feasible [3,5-9]. Existing single-marker tests used to detect common variants associated with complex diseases may not be suitable for detecting rare variants because of allelic heterogeneity and the low frequency of rare variants [10]. Several methods that collapse a group of rare variants in a gene or a pathway have been proposed recently. These include the cohort allelic sums test (CAST) [11], the combined multivariate and collapsing (CMC) method [10], the weighted sum (WS) method [12], the variable minor allele frequency threshold (VT) method [13], and the cumulative minor-allele test (CMAT) [14], among others. All of these methods assume that every causal variant increases disease risk, although some causal variants may be protective [15,16]. More recently, several methods that account for the direction of the effects have been proposed, including the C-alpha test [17], the sequence kernel association test (SKAT) [18], the adaptive sum (aSum) test [19], and the step-up method [16]. All of the aforementioned methods are applicable only to population-based designs with unrelated samples, whereas family-based designs have been shown to improve power to detect causal rare variants [4,20].
In addition, population-based tests can be seriously confounded by population stratification in rare variant association studies, while family-based tests are robust to population stratification. So far, only a few family-based rare variant association methods have been developed, including the sib-pair and odds ratio weighted sum statistics (SPWSS, ORWSS) [20], two adaptive weighting methods (AW-FBAT, AW-Joint) [21], the FBAT-based test (FBAT-T) [22], the test for an optimally weighted combination of variants (TOW) for family-based designs (TOW-F) [23], and TOW for sib-pair designs (TOW-sib) [24], among others. In this article, we provide a general review of the literature on rare variant association studies. We summarize and discuss recent methods in three categories. The first two categories cover methods for population-based designs, divided by whether or not they account for the direction of the effects of causal rare variants. The third category summarizes methods for family-based designs.

Materials and Methods

Category 1: Methods not robust to the direction of effects of causal variants in population-based designs

Recently, the strategy of collapsing a group of rare variants in a gene or a pathway has been proposed. Morgenthaler and Thilly [11] developed the cohort allelic sums test (CAST), which collapses rare variants and then compares collapsed allele frequencies in cases and controls. CAST was a milestone in rare variant association studies and was followed by a series of collapsing methods. Li and Leal [10] extended CAST into the combined multivariate and collapsing (CMC) method, in which rare variants are collapsed within different subgroups and the information from both the collapsed rare variants and the common variants is used in the association test. Both CAST and CMC require a fixed minor allele frequency (MAF) threshold to separate common from rare variants. Madsen and Browning [12] proposed the weighted sum (WS) method, in which both common and rare variants can be included but the variants are weighted according to their allele frequencies. Thus, common variants are given small weights while rare variants are given large weights. Price et al. [13] proposed the variable minor allele frequency threshold (VT) method, which tests the association using the 'optimal' MAF threshold. Zawistowski et al. [14] developed the cumulative minor-allele test (CMAT), which is based on the summation of minor allele counts across all sites for cases and controls.

All these methods are burden tests. Let $x_{ik}$ denote the genotype (number of minor alleles) of the $i$th individual at the $k$th variant, and let $w_k$ denote the weight for the $k$th variant. The methods essentially test the effect of a weighted combination of variants, $X_i = \sum_k w_k x_{ik}$, or a function of it, with different definitions of $w_k$ [25]. More specifically, CAST, CMC, VT, and CMAT set $w_k = 1$. CMAT tests the effect of $X_i$, while CAST, CMC, and VT test the effect of $I\{X_i \ge 1\}$, where $I\{\cdot\}$ is an indicator function. WS tests the effect of $X_i$ with $w_k$ set to the inverse square root of the expected variance based on allele frequencies in the controls.

These collapsing methods are more powerful than single-variant tests. However, they assume that all causal variants increase disease risk, whereas some causal variants may be protective [15,16]. When both risk and protective variants are present, the above methods are underpowered because the opposite association effects counteract each other [26].
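The weighted combination described above (for each individual, the sum over variants of weight times minor-allele count) can be illustrated with a minimal Python sketch. The function names, the toy genotype matrix, and the exact form of the WS-style weight (a pseudo-count allele frequency estimated from controls) are assumptions made for illustration; this is not the published implementation of any of the cited methods.

```python
import math

def burden_scores(genotypes, weights=None, collapse=False):
    """Weighted sum of minor-allele counts per individual.

    genotypes: list of rows, one per individual, each a list of counts (0, 1, 2).
    weights:   one weight per variant; defaults to 1 for every variant.
    collapse:  if True, return the indicator of (score >= 1) instead of the score.
    """
    n_variants = len(genotypes[0])
    w = [1.0] * n_variants if weights is None else list(weights)
    scores = [sum(wk * xk for wk, xk in zip(w, row)) for row in genotypes]
    return [int(x >= 1) for x in scores] if collapse else scores

def ws_style_weights(genotypes, is_control):
    """Inverse square root of an expected variance based on control frequencies.

    Assumed form: q_k = (minor alleles in controls + 1) / (2 * n_controls + 2),
    weight_k = 1 / sqrt(n_total * q_k * (1 - q_k)).
    """
    controls = [row for row, c in zip(genotypes, is_control) if c]
    n_total, n_controls = len(genotypes), len(controls)
    weights = []
    for k in range(len(genotypes[0])):
        minor_count = sum(row[k] for row in controls)
        q = (minor_count + 1.0) / (2.0 * n_controls + 2.0)
        weights.append(1.0 / math.sqrt(n_total * q * (1.0 - q)))
    return weights

# Toy data: 4 individuals, 3 variants; the last two individuals are controls.
G = [[0, 0, 1],
     [1, 0, 0],
     [0, 0, 0],
     [2, 1, 0]]
is_control = [False, False, True, True]

sum_scores = burden_scores(G)                       # unit weights, raw sum
indicator_scores = burden_scores(G, collapse=True)  # unit weights, carrier indicator
ws_scores = burden_scores(G, weights=ws_style_weights(G, is_control))
```

The resulting per-individual scores would then be compared between cases and controls, or regressed on the trait. Note how the frequency-based weights grow as a variant becomes rarer in controls, which is the behavior the WS description calls for.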

Category 2: Methods robust to the direction of the effects of causal variants for population-based designs

More recently, several methods that are robust to the direction of the effects of causal variants have been proposed. Neale et al. [17] developed the C-alpha test, which compares the expected variance to the actual variance of the distribution of rare variants in cases versus controls. Wu et al. [18] introduced the sequence kernel association test (SKAT), a variance-component score test for association between the variants in a region (both common and rare) and a trait while adjusting for covariates. Both the C-alpha test and SKAT essentially test the variance of the effects rather than the mean. Han and Pan [19] proposed the data-adaptive sum (aSum) test, which incorporates the signs of the observed effects of causal variants into a burden test. It sets the sign of $w_k$ according to $\hat{\beta}_k$, where $\hat{\beta}_k$ is an estimated coefficient of the $k$th variant based on the marginal logistic linear model for qualitative traits. Hoffmann et al. [16] developed the step-up method, in which the weights can incorporate the MAF, the direction of the effects, and the threshold in a single analysis. It sets $w_k = a_k s_k v_k$, where $a_k$ is a continuous weight (e.g., to incorporate allele frequencies), $s_k$ determines the direction of the variant effect, and $v_k$ is an indicator variable determining whether the $k$th variant should be in the model at all. Zhang et al. [27] proposed two grouping strategies (GS) based on WS that use the data to decide the direction of the effects of causal variants. Also based on WS, Ionita-Laza et al. [28] proposed the replication-based (R) method, in which two one-sided replication-based statistics are applied to risk variants and protective variants, respectively. Sha et al. [29] proposed an adaptive clustering method and an adaptive weighting method (AC/AW) to detect rare variant association in the presence of neutral and/or protective variants. Both methods are applicable to quantitative and qualitative traits, and have clear advantages in power and computational efficiency compared with existing collapsing methods and data-driven methods that allow neutral and protective variants.

The above methods in Category 2 also essentially test the effect of a weighted combination of variants, so the choice of weights is critical to their performance. Lin and Tang [30] derived theoretically optimal procedures for combining rare mutations and applied a general score-based test (GS) to population-based samples based on regression models. The proposed test statistic is optimal if the weights $w_k$ are proportional to the regression coefficients. Sha et al. [25] proposed a Test for testing the effect of an Optimally Weighted combination of rare variants (TOW). The optimal weights are derived analytically and calculated from genotypes and traits in a population-based design. Furthermore, TOW can be extended to a Variable Weight TOW (VW-TOW) [25] to include both rare and common variants.

Liu et al. [31] applied meta-analysis to single-variant association tests, burden tests, and variable threshold tests and developed RAREMETAL, which can include covariates for both quantitative and qualitative traits. Zeng et al. [32] proposed the likelihood ratio test (LRT) and the restricted likelihood ratio test (ReLRT) to test the association of rare variants based on the linear mixed effects model. Like SKAT, LRT and ReLRT examine the variance component in the mixed model. However, LRT and ReLRT estimate both the null and alternative models, and provide an indirect estimate of the heritability explained by rare variants. Their disadvantage is that they are computationally time-consuming.

Ladouceur et al. [33] suggested that the power of currently proposed statistical methods depends strongly on the underlying assumptions about how traits relate to the proportion of causal variants and/or the direction of the associations. The methods in the first category are more powerful than most of the methods in the second category when all or almost all of the rare variants in a region are causal and have the same direction of association, while methods in the second category outperform those in the first category when there are both risk and protective variants and, more generally, when a substantial portion of the variants is neutral. This raises the question of how to select a method when biological knowledge is limited. It was recommended that tests from both the first and second categories be used in settings where prior biological knowledge is limited [34]. Lee et al. [35] combined a burden test and SKAT into the optimal sequence kernel association test (SKAT-O). Specifically, SKAT-O automatically behaves like a burden test when the burden test is more powerful than SKAT, and like SKAT when SKAT is more powerful. Derkach et al. [36] used Fisher's method to combine p-values from two or more complementary tests (Fisher-CT). When most causal variants have the same direction of association, Fisher-CT consistently outperforms SKAT-O, and is often considerably better than the burden tests in the first category and the non-burden tests in the second category. Sha and Zhang [37] proposed an optimal combination of single-variant tests (OCST) that combines information from tests for three classes: only risk variants, both risk and protective variants, and only protective variants. Under some scenarios, OCST is consistently more powerful than Fisher-CT, and Fisher-CT is consistently more powerful than SKAT-O.

Besides studies of qualitative and quantitative traits, Lin and Tang [30] introduced a score test for potentially censored age-at-onset traits. Wang et al. [38] conducted longitudinal data analysis based on TOW and proposed L-TOW to detect rare variant association in population-based designs, since incorporating traits at multiple time points may increase statistical power by providing more information than the trait at a single time point.
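Two ideas from this category can be made concrete with a short Python sketch: why a statistic built on squared per-variant scores is robust to mixed effect directions while a burden statistic is not, and how Fisher's method combines p-values from complementary tests. The function names and toy data are hypothetical, there is no covariate adjustment, and the variance-component statistic is only SKAT-like (weighted linear kernel, unscaled), so this is a simplified illustration rather than the SKAT or Fisher-CT software. The closed-form chi-square tail in `fisher_combine` assumes the combined p-values are independent.

```python
import math

def _variant_scores(y, genotypes):
    """Per-variant score: sum over individuals of (y_i - mean(y)) * x_ik."""
    mean_y = sum(y) / len(y)
    n_variants = len(genotypes[0])
    return [sum((yi - mean_y) * row[k] for yi, row in zip(y, genotypes))
            for k in range(n_variants)]

def burden_stat(y, genotypes, weights):
    """Square of the weighted SUM of scores: opposite signs can cancel."""
    s = _variant_scores(y, genotypes)
    return sum(w * sk for w, sk in zip(weights, s)) ** 2

def variance_component_stat(y, genotypes, weights):
    """Sum of SQUARED weighted scores: the sign of each effect is irrelevant."""
    s = _variant_scores(y, genotypes)
    return sum((w * sk) ** 2 for w, sk in zip(weights, s))

def fisher_combine(p_values):
    """Fisher's method: -2 * sum(log p) referred to chi-square with 2k df.

    For even degrees of freedom the upper tail has the closed form
    exp(-s) * sum_{j<k} s**j / j!, with s = -sum(log p).
    """
    k = len(p_values)
    s = -sum(math.log(p) for p in p_values)
    return math.exp(-s) * sum(s ** j / math.factorial(j) for j in range(k))

# Toy data: 4 cases then 4 controls; variant 0 is seen only in cases (risk),
# variant 1 only in controls (protective).
y = [1, 1, 1, 1, 0, 0, 0, 0]
G = [[1, 0], [1, 0], [0, 0], [0, 0],
     [0, 1], [0, 1], [0, 0], [0, 0]]
w = [1.0, 1.0]

b = burden_stat(y, G, w)               # effects cancel exactly in this toy case
q = variance_component_stat(y, G, w)   # both variants contribute
combined = fisher_combine([0.05, 0.05])
```

In this toy example the risk and protective variants contribute scores of equal size and opposite sign, so the burden statistic is zero while the squared-score statistic is positive. In practice, p-values for either statistic would come from an asymptotic null distribution or from permutation.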

Category 3: Methods for family-based designs

For any study design, statistical power improves when rare variants are enriched in the sample. If one parent carries a copy of a rare allele, half of the offspring are expected to carry it; thus, variants that are rare in the general population can be common in certain families [39]. Therefore, family-based methods may improve power to detect causal rare variants [20,40]. Moreover, for rare variant associations, population-based association tests can be seriously confounded by population stratification, while family-based association tests are robust to population stratification. However, only a few methods for family-based designs are available so far.
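As a rough worked example of this enrichment, assume Hardy-Weinberg proportions, a minor allele frequency $q = 0.005$ (a value chosen only for illustration), and a family in which one parent is a heterozygous carrier while the other parent is drawn at random from the population. The probability that a random individual carries at least one copy, compared with the probability for a child in such a family, is:

$$
P(\text{carrier} \mid \text{population}) = 1 - (1 - q)^2 = 2q - q^2 \approx 0.00998,
$$

$$
P(\text{carrier} \mid \text{one heterozygous parent}) = 1 - \tfrac{1}{2}(1 - q) = \tfrac{1 + q}{2} \approx 0.5025 .
$$

Under these assumptions the variant is roughly fifty times more frequent among the offspring than in the general population.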

Qualitative traits: Zhu et al. [4] proposed a two-stage haplotype-based (HB) method to identify causal rare haplotypes for qualitative traits. First, a set of risk haplotypes is detected using a small proportion of the sample. Then association with that set of identified haplotypes is tested in a larger case-control sample. Feng et al. [20] introduced a sib-pair weighted sum statistic (SPWSS) to detect both rare and common causal variants in a gene or a genomic region. SPWSS uses either affected or discordant sib-pairs in sequencing or genome-wide association data, is not affected by the direction of the effects of causal variants, and does not require choosing a MAF threshold. Zhu and Xiong [40] transformed a population-based test into a family-based test by calculating the covariance matrix of the functional principal-component scores, and developed the family-based functional principal-component analysis (FPCA) for qualitative traits. De et al. [22] proposed a method for family-based designs (FBAT-T) by extending the traditional single-SNP Family-Based Association Test (FBAT). Sha et al. [26] proposed a test for Testing the Optimally Weighted combination of variants based on data of Parents and Affected Children (TOW-PAC). TOW-PAC tests the combined effect of rare and common variants in a genomic region by using optimal data-driven weights. Later, He et al. [41] incorporated rare-variant association analysis into the transmission disequilibrium test (TDT) framework [42] to analyze trio sequence data and proposed Rare Variant Extensions of the Transmission Disequilibrium Test (RV-TDT). Choi et al. [43] proposed a FAmily-based Rare Variant Association Test (FARVAT) for extended families, based on the quasi-likelihood of whole families. Epstein et al. [44] proposed a framework for rare variants in affected sib-ships (SIB), based on the logic that rare susceptibility variants should be found more often in regions shared identical by descent by affected siblings. They derived both burden and variance-component tests under the SIB framework. SIB does not require variant information from unaffected relatives. Sha and Zhang [24] tested association of an optimally weighted combination of variants for affected sib-pairs (TOW-sib), based on either unrelated individuals or affected sib-pairs.
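To make the TDT framework mentioned above more concrete, the sketch below computes the classic TDT statistic, (b - c)^2 / (b + c), after pooling transmission counts over the rare variants in a region, where b and c are the numbers of times heterozygous parents did and did not transmit the minor allele to an affected child. The pooling step, the function name, and the counts are assumptions for illustration; this is a simplified collapsing variant and not the RV-TDT method or software. The p-value uses a chi-square approximation with one degree of freedom, which may be poor when counts are small or when variants are transmitted together on the same haplotype.

```python
import math

def collapsed_tdt(transmitted, untransmitted):
    """TDT statistic on transmission counts pooled across variants.

    transmitted[k]:   times a heterozygous parent transmitted the minor
                      allele of variant k to an affected child.
    untransmitted[k]: times the minor allele of variant k was not transmitted.
    Returns (statistic, p_value), using a chi-square(1) reference distribution.
    """
    b = sum(transmitted)
    c = sum(untransmitted)
    if b + c == 0:
        raise ValueError("no informative transmissions")
    stat = (b - c) ** 2 / (b + c)
    # Upper tail of chi-square with 1 df: P(Z^2 > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical counts for three rare variants in one gene.
stat, p_value = collapsed_tdt(transmitted=[5, 3, 4], untransmitted=[1, 1, 2])
```

Because only transmissions from heterozygous parents enter the statistic, the comparison is made within families, which is the source of the robustness to population stratification discussed in this section.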

Quantitative traits: Lin and Tang [30] used generalized linear mixed models to capture the dependence of traits within families and derived a score test statistic for family-based designs (GS-F). GS-F is applicable to both quantitative and qualitative traits and allows for covariates. Liu and Leal [45] proposed a unified framework for modeling extreme trait genetic associations (MEGA) for direct quantitative trait loci (QTL) mapping. The framework is based on a mixed effects model, which generalizes Fisher's biometrical model [46], coupled with a likelihood method. The QTL effects are modeled as fixed effects to facilitate joint analysis of multiple rare variants. Using MEGA and appropriate permutation algorithms, many rare variant tests for unrelated individuals can be extended to family data. Fang et al. [21] proposed two adaptive weighting methods, the Adaptive Weighting Family-Based Association Test (AW-FBAT) and the Adaptive Weighting Joint Test (AW-Joint). AW-FBAT uses between-family information to calculate adaptive weights and within-family information to test for association, while AW-Joint uses the joint information of the between-family and within-family components both to calculate the adaptive weights and to test for association. Fang et al. [23] extended TOW for unrelated individuals to TOW-F, TOW for Family-based data. TOW-F is robust to population stratification across a wide range of population structures. Feng et al. [47] provided a meta-analysis of rare variants in families (META-F), which applies to both single-variant and gene-level association tests.
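The split into between-family and within-family information can be illustrated with a generic decomposition: each individual's genotype score is written as the family mean (between-family component) plus the deviation from that mean (within-family component). The sketch below is a minimal illustration of that idea only; the function name and toy data are hypothetical, the family mean is an assumed choice of between-family component, and the cited methods may define the components differently (for example, from parental genotypes).

```python
def split_between_within(family_ids, scores):
    """Split each score into a family mean and a deviation from that mean.

    family_ids: one family label per individual.
    scores:     one genotype score per individual (e.g., a minor-allele count
                or a weighted sum over variants).
    Returns (between, within) with between[i] + within[i] == scores[i].
    """
    totals, counts = {}, {}
    for fam, x in zip(family_ids, scores):
        totals[fam] = totals.get(fam, 0.0) + x
        counts[fam] = counts.get(fam, 0) + 1
    between = [totals[fam] / counts[fam] for fam in family_ids]
    within = [x - b for x, b in zip(scores, between)]
    return between, within

# Toy data: two families with 2 and 3 genotyped members.
family_ids = ["F1", "F1", "F2", "F2", "F2"]
scores = [2.0, 0.0, 1.0, 1.0, 4.0]
between, within = split_between_within(family_ids, scores)
```

The within-family deviations sum to zero in every family, so differences in allele frequency between families (and hence between subpopulations) do not enter a test built only on them.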

Discussion and Future Directions

As the previous section shows, many statistical methods for testing rare variant associations in population-based designs have been developed. The currently popular methods are summarized in Table 1. Based on the objective of a project and the available prior biological knowledge, researchers can narrow their choices and select appropriate methods. For rare variant associations, population-based association tests can be seriously confounded by population stratification, since the spectrum of rare variation can be very different in diverse populations. Current studies show that family-based association tests can be robust to population stratification. In family-based data, association information can be partitioned into between-family and within-family information [21]. Within-family information is robust to population stratification, while between-family information can be confounded by it. In addition, from the point of view of statistical power, the power under any study design can be improved when rare variants are enriched in the sample [20,40]. Several popular statistical methods for family-based designs are summarized in Table 2 for comparison.

With the rapid advance of biotechnology, new biological knowledge will become available, and next-generation DNA sequencing technologies will allow sequencing of the whole genome. It is important to incorporate this new information to improve the statistical power to detect rare variants associated with complex diseases. Continued development of novel statistical methods for identifying rare disease susceptibility variants is needed for population-based designs, and especially for family-based designs. We hope this paper can help researchers facing practical problems involving rare variants.

Most of the methods summarized in this paper consider a single trait. However, a gene often affects multiple traits. When the same variants affect multiple traits, the trait values for an individual tend to be correlated, so analyzing multiple traits simultaneously can increase the power to detect rare variant associations. Only a few multiple-trait methods have been proposed, and these are for common variant association studies [48,49]. This field is still developing, remains challenging, and needs special attention. Meta-analysis has facilitated many discoveries in common variant association studies. It is essential for detecting associations with rare variants too, because increasing the sample size is especially important in rare variant association studies. To better explore the relationship between rare variants and complex diseases, it is urgent and essential to develop efficient multiple-trait methods as well as meta-analysis methods for rare variant studies.
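The sample-size benefit of meta-analysis can be sketched with a simple score-based combination: if each study reports a score statistic and its variance for the same variant (or the same burden score), the scores and variances can be summed across studies and tested jointly. The function name and numbers below are hypothetical. The sketch assumes independent studies, a common allele coding, and a normal approximation, and it is not the RAREMETAL or META-F implementation.

```python
import math

def meta_score_test(study_stats):
    """Combine per-study (score, variance) pairs into one z statistic.

    study_stats: list of (score, variance) tuples, one per study, all for the
                 same variant or the same burden score.
    Returns (z, two_sided_p) under a standard normal approximation.
    """
    u = sum(score for score, _ in study_stats)
    v = sum(var for _, var in study_stats)
    z = u / math.sqrt(v)
    # Two-sided normal tail: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

# Two hypothetical studies reporting a score and its variance.
z, p_value = meta_score_test([(2.0, 4.0), (1.0, 5.0)])
```

Sharing only summary statistics of this kind lets studies pool evidence without exchanging individual-level genotypes.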

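| Method | Direction | Quantitative traits | Qualitative traits | Include common variants | Proportion of neutral variants | Covariates | Population stratification |
|---|---|---|---|---|---|---|---|
| CAST | N | N | Y | N | Y | N | N |
| CMC | N | N | Y | Y | N | N | N |
| WS | N | N | Y | Y | N | N | N |
| VT | N | Y | Y | N | N | N | N |
| CMAT | N | N | Y | N | Y | Y | Y |
| C-alpha | Y | N | Y | N | Y | N | N |
| SKAT | Y | Y | Y | Y | N | Y | Y |
| aSum | Y | N | Y | Y | N | N | N |
| step-up | Y | Y | Y | Y | N | N | N |
| AC/AW | Y | Y | Y | Y | Y | N | Y |
| TOW | Y | Y | Y | N | Y | Y | Y |
| VW-TOW | Y | Y | Y | Y | Y | Y | Y |
| R | Y | N | Y | N | Y | N | N |
| GS | Y | Y | Y | Y | N | Y | Y |
| SKAT-O | Y | Y | Y | N | Y | Y | Y |
| Fisher-CT | Y | Y | Y | N | Y | Y | N |
| OCST | Y | Y | Y | N | Y | Y | N |
| LRT | Y | Y | N | N | N | Y | Y |
| ReLRT | Y | Y | N | N | N | Y | Y |
| RAREMETAL | Y | Y | Y | Y | N | Y | Y |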

Table 1: Summary of statistical methods for population-based designs.

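| Method | Direction | Quantitative traits | Qualitative traits | Include common variants | Proportion of neutral variants | Covariates | Population stratification |
|---|---|---|---|---|---|---|---|
| AW-FBAT; AW-Joint | Y | Y | N | N | Y | N | Y |
| FBAT-T | N | N | Y | Y | N | N | Y |
| TOW-F | Y | Y | N | N | Y | N | Y |
| MEGA | N | Y | N | N | N | N | Y |
| SPWSS | Y | Y | Y | Y | Y | Y | Y |
| FPCA | Y | N | Y | Y | N | N | Y |
| GS-F | Y | Y | Y | Y | N | Y | Y |
| HB | N | N | Y | N | N | N | N |
| TOW-PAC | Y | N | Y | Y | Y | N | Y |
| RV-TDT | Y | N | Y | N | N | N | Y |
| FARVAT | Y | N | Y | N | N | N | Y |
| META-F | Y | Y | N | N | Y | Y | Y |
| SIB | Y | N | Y | N | Y | N | Y |
| TOW-sib | Y | N | Y | Y | Y | N | N |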

Table 2: Summary of statistical methods for family-based designs.

Acknowledgement

There are no sources of financial support.
