ISSN: 2155-6180
Journal of Biometrics & Biostatistics

Integrating P-values for Genetic and Genomic Data Analysis

Hongying Dai (1,*), Richard Charnigo (2), Tarak Srivastava (3), Zohreh Talebizadeh (4) and Shui Qing Ye (4,5)

(1) Research Development and Clinical Investigation, Children’s Mercy Hospital, 2401 Gillham Road, Kansas City, MO, 64108, USA

(2) Department of Statistics and Biostatistics, University of Kentucky, 725 Rose Street, Lexington, KY, 40536, USA

(3) Section of Nephrology, Children’s Mercy Hospital, 2401 Gillham Road, Kansas City, MO, 64108, USA

(4) Division of Genetics Research, Department of Pediatrics, Children’s Mercy Hospital, 2401 Gillham Road, Kansas City, MO, 64108, USA

(5) Department of Biomedical and Health Informatics, University of Missouri-Kansas City School of Medicine, 2464 Charlotte Street, Kansas City, MO 64108, USA

*Corresponding Author:
Hongying Dai
Research Development and Clinical Investigation
Children’s Mercy Hospital
2401 Gillham Road, Kansas City
MO, 64108, USA
E-mail: [email protected]

Received date: November 02, 2012; Accepted date: November 05, 2012; Published date: November 12, 2012

Citation: Dai H, Charnigo R, Srivastava T, Talebizadeh Z, Ye SQ (2012) Integrating P-values for Genetic and Genomic Data Analysis. J Biom Biostat 3:e117. doi:10.4172/2155-6180.1000e117

Copyright: ©2012 Dai H, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.


Rapid developments in molecular technology have driven the evolution of biostatistics and bioinformatics methods for identifying genetic variations associated with complex traits. Genome-wide association studies (GWAS), gene expression arrays, whole-genome sequencing, and other technologies now give investigators access to large amounts of information.

The growing number of variants requires more statistical tests per analysis, which poses a “curse of dimensionality” for multiple testing correction. For instance, false discovery rate (FDR) methods and their extensions are commonly used to adjust for multiple individual tests; rather than the family-wise Type I error rate, they control the expected proportion of false positives among the rejected hypotheses [1,2]. Unfortunately, in large-scale hypothesis testing, these methods tend to have low power to detect risk factors.
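The FDR adjustment mentioned above is most often carried out with the Benjamini-Hochberg step-up procedure. The following plain-Python sketch (standard library only) assumes independent tests; the function name and the toy inputs are illustrative and are not taken from the cited references. The second input is a constructed case with 50 moderately small p-values hidden among 950 evenly spread null-like ones, in which the procedure rejects nothing even though small p-values are clearly over-represented.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up procedure at FDR level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by increasing p-value
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: remember the largest rank k with p_(k) <= k * q / m.
        if pvalues[idx] <= rank * q / m:
            k_max = rank
    return sorted(order[:k_max])  # reject the k_max smallest p-values


# Small example: only the two smallest p-values are rejected.
small = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(benjamini_hochberg(small))  # [0, 1]

# Many weak signals: 50 tests at p = 0.004 among 950 evenly spread null p-values.
weak = [0.004] * 50 + [(i + 0.5) / 950 for i in range(950)]
print(len(benjamini_hochberg(weak)))  # 0
```

The weak-signal case is exactly where pooling evidence across tests, rather than testing one at a time, can help.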

Global testing (also called omnibus testing) of the p-values from numerous individual tests can combine evidence across tests and turn dimensionality from a curse into a source of rich information. From a systems biology perspective, genes, cells, tissues and organs function as a system through metabolic and cell signaling networks. In non-Mendelian inheritance, as in complex disorders, a subset of variants may jointly confer moderate effects in mediating molecular activities. As a result, individual signals may not reach significance in single-marker, single-trait analysis, yet many such p-values from related genes, taken together, may provide valuable information on gene function and regulation.

A global test evaluates the pattern (distribution) of p-values instead of selecting p-values below an arbitrary threshold, so it has the potential to identify multiple genes with small effects. If all individual tests are independent and involve only genes with no effect, the p-values (from continuous test statistics) are independently and identically distributed as Uniform(0,1). Taking this as the null hypothesis for the pattern of p-values, the global test assesses whether the observed p-values, especially the small ones, could plausibly have arisen by chance. The global test is flexible: it can be applied to p-values from a t-test, an ANOVA, a linear mixed model, and so forth. Multiple simulation studies and case studies have shown that the approach often has sufficient power to detect signals of genetic association from a group of genes.
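As one illustration of testing the pattern of p-values against Uniform(0,1), the sketch below computes the positive-side Kolmogorov-Smirnov statistic, one of the tests listed under Methods. It is a minimal sketch under the independence assumption stated above; the function name is hypothetical, and the reported p-value uses Smirnov's large-sample approximation rather than the exact finite-sample distribution.

```python
import math


def positive_ks(pvalues):
    """Positive-side Kolmogorov-Smirnov test of p-values against Uniform(0,1).

    D+ = max_i (i/m - p_(i)) measures how far the empirical CDF of the p-values
    rises above the uniform CDF, i.e. an excess of small p-values. The p-value
    uses the large-sample approximation P(D+ >= d) ~ exp(-2 m d^2), which is
    only approximate for small m.
    """
    p_sorted = sorted(pvalues)
    m = len(p_sorted)
    d_plus = max(0.0, max((i + 1) / m - p for i, p in enumerate(p_sorted)))
    approx_p = min(1.0, math.exp(-2.0 * m * d_plus ** 2))
    return d_plus, approx_p


print(positive_ks([0.001, 0.002, 0.003, 0.5]))            # excess of small p-values
print(positive_ks([(i + 0.5) / 100 for i in range(100)]))  # evenly spread, null-like
```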

Methods

Combining p-values into a sum or product has long been used by evolutionary biologists in meta-analysis [3]. Many methods can be written in the form $T = \sum_{i=1}^{m} H(p_i)$, where $p_1, \ldots, p_m$ are the individual p-values and $H$ is a transformation applied to each one. Early researchers explored the raw sum of p-values as well as sums under various transformations, including the log transformation, the inverse normal transformation, the inverse gamma transformation, the logit transformation, and the count of p-values below a threshold. Classic methods include Fisher’s method [4], the Z-test [5], and Lancaster’s procedures [6]. Extensive Monte Carlo comparisons have been conducted for independent [7] and correlated [8] p-values. Under the global null hypothesis, when the p-values are independent and identically distributed Uniform(0,1), the classic methods have simple null distributions; for example, Fisher’s statistic $-2\sum_{i=1}^{m} \log p_i$ follows a chi-square distribution with $2m$ degrees of freedom. One can also combine p-values with the product method [9,10]. Taking logarithms, $\log \prod_{i=1}^{m} p_i = \sum_{i=1}^{m} \log p_i$, so the product method is a special case of the sum of log-transformed p-values.
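The following is a minimal sketch of two classic sum-type combinations. It assumes independent, one-sided p-values strictly between 0 and 1. Fisher’s method uses $H(p) = -2\log p$, and the Z-test uses the inverse normal transformation $H(p) = \Phi^{-1}(1-p)$, scaled by $1/\sqrt{m}$. The function names are hypothetical. Fisher’s statistic has an even number of degrees of freedom, so its chi-square tail probability has a closed form, and only the Python standard library is needed.

```python
import math
from statistics import NormalDist


def fisher_combined(pvalues):
    """Fisher's method: X = -2 * sum(log p_i) ~ chi-square(2m) under the global null."""
    m = len(pvalues)
    h = -sum(math.log(p) for p in pvalues)  # h = X / 2
    if h == 0.0:
        return 1.0  # every p-value equals 1
    # For even df 2m: P(X > x) = exp(-h) * sum_{k=0}^{m-1} h**k / k!, summed in log space.
    log_terms = [k * math.log(h) - math.lgamma(k + 1) for k in range(m)]
    top = max(log_terms)
    log_sum = top + math.log(sum(math.exp(t - top) for t in log_terms))
    return min(1.0, math.exp(log_sum - h))


def stouffer_combined(pvalues):
    """Unweighted Z-test: Z = sum(Phi^-1(1 - p_i)) / sqrt(m) ~ N(0, 1) under the global null.

    Each p-value must lie strictly between 0 and 1 (inv_cdf rejects 0 and 1).
    """
    z = sum(NormalDist().inv_cdf(1.0 - p) for p in pvalues) / math.sqrt(len(pvalues))
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # upper normal tail, accurate for large z


weak_but_consistent = [0.04, 0.06, 0.08, 0.1]
print(fisher_combined(weak_but_consistent))
print(stouffer_combined(weak_but_consistent))
```

Here four individually unremarkable p-values combine into a Fisher p-value below 0.05. This pooling of modest evidence is what a global test is meant to capture.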

Order-based approaches are another category of global tests for p-values. Tippett’s procedure uses the minimum p-value; under the global null, $P(\min_i p_i \le x) = 1 - (1-x)^m$. Simulation studies show that this approach controls Type I error well for both independent and correlated data, but it has reduced power to identify multiple genes with small effects [8]. Wilkinson extended Tippett’s procedure to the $k$ smallest p-values. Expanding $(\alpha + (1-\alpha))^m$, where $m$ is the total number of individual tests, gives the binomial probabilities for the number of p-values below $\alpha$, and tables of the incomplete beta function can then be used to obtain the probability that at least $k$ of the $m$ p-values fall below $\alpha$ [11]. In addition, the empirical distribution of the p-values can be computed and compared with the uniform distribution. Such tests include the positive-side Kolmogorov-Smirnov test, the positive-side Cramér-von Mises test, a recently developed order-based approach that accounts for the ordering of p-values under the alternative hypothesis [12], and the higher criticism method for detecting sparse signals [13].
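The two order-based tests above can be sketched directly from their null distributions, assuming independent p-values. In Wilkinson’s test the observed $k$-th smallest p-value plays the role of $\alpha$. The binomial tail sum below equals the regularized incomplete beta function $I_\alpha(k, m-k+1)$. The function names and the example input are hypothetical.

```python
import math


def tippett(pvalues):
    """Tippett's minimum-p test: P(min p <= p_min) = 1 - (1 - p_min)^m under the global null."""
    m = len(pvalues)
    p_min = min(pvalues)
    return -math.expm1(m * math.log1p(-p_min))  # numerically stable 1 - (1 - p_min)^m


def wilkinson(pvalues, k):
    """Wilkinson's test based on the k-th smallest p-value.

    Under the global null, p_(k) <= alpha exactly when at least k of the m
    independent Uniform(0,1) p-values fall below alpha, so the tail probability
    is a binomial sum evaluated at the observed alpha = p_(k).
    """
    m = len(pvalues)
    alpha = sorted(pvalues)[k - 1]
    return sum(math.comb(m, j) * alpha ** j * (1 - alpha) ** (m - j) for j in range(k, m + 1))


several = [0.02, 0.03, 0.04] + [0.5] * 7
print(tippett(several), wilkinson(several, 3))
```

Here three moderately small p-values appear among ten. The minimum alone is unremarkable under Tippett’s test, while the third-smallest p-value is not. This is the small-effects situation in which Tippett’s procedure loses power.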

Current Trends

Recent developments have focused on introducing weight functions and truncation to increase power, as well as on developing global tests for genetic analysis. For instance, a rank truncated product method, which combines the $k$ smallest p-values, and a truncated product method, which combines the p-values smaller than a specified threshold, have recently been developed and applied in large-scale genomics experiments [14]. Later, an adaptive rank truncated product method was proposed and applied in GWAS [15]; Yu et al. [15] used permutation testing to determine the optimal number $k$ of smallest p-values to include in the product. Hess and Iyer [16] extended Fisher’s method to Affymetrix gene expression arrays and showed it to be a suitable diagnostic tool for exploratory analysis of microarray data. In analyses of validated microarray data, the combined p-value method compared favorably with competing methods.
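The two truncation schemes can be sketched as follows. This minimal Monte Carlo version simulates independent Uniform(0,1) null p-values rather than permuting real data, so it assumes independent tests. The threshold `tau`, the rank `k`, the function names, and the simulation settings are illustrative choices, not values from [14] or [15].

```python
import math
import random


def log_tpm(pvalues, tau):
    """Log of the truncated product: product of all p-values <= tau (log W = 0 if none)."""
    return sum(math.log(p) for p in pvalues if p <= tau)


def log_rtp(pvalues, k):
    """Log of the rank truncated product: product of the k smallest p-values."""
    return sum(math.log(p) for p in sorted(pvalues)[:k])


def monte_carlo_pvalue(stat_fn, pvalues, n_sim=10000, seed=1):
    """Estimate P(stat <= observed) when the m p-values are i.i.d. Uniform(0,1).

    Small products are evidence against the global null, hence the "<=" direction.
    This assumes independent tests; with correlated tests one would instead
    permute phenotype labels and recompute all p-values.
    """
    rng = random.Random(seed)
    m = len(pvalues)
    observed = stat_fn(pvalues)
    hits = 0
    for _ in range(n_sim):
        null_p = [1.0 - rng.random() for _ in range(m)]  # values in (0, 1], so log() is safe
        if stat_fn(null_p) <= observed:
            hits += 1
    return (hits + 1) / (n_sim + 1)  # add-one estimate never returns exactly 0


mixed = [0.01, 0.02, 0.03, 0.04, 0.3, 0.5, 0.7, 0.9]
print(monte_carlo_pvalue(lambda ps: log_tpm(ps, 0.05), mixed, n_sim=2000))
print(monte_carlo_pvalue(lambda ps: log_rtp(ps, 2), mixed, n_sim=2000))
```

An adaptive version in the spirit of [15] would evaluate several values of `k` and calibrate the best one by permutation. That extra step is not shown here.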

Efforts have also been made to cope with complex correlations among p-values. In expression quantitative trait loci (eQTL) analysis, which identifies associations between genotypes and gene expression levels [17], researchers have observed strong correlations among multiple tests due to linkage disequilibrium and functional interactions among single nucleotide polymorphisms (SNPs). To address this issue, Fisher’s method was modified to incorporate correlations among p-values, and a Satterthwaite approximation was then used to derive the limiting distribution of the test statistic under the global null hypothesis. Similarly, the weighted Z-test has been modified to account for correlations and has been applied to shared-control designs in GWAS [18].
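The following sketch shows one common way to adjust Fisher’s method for correlation: a Brown-style moment-matching (Satterthwaite-type) approximation. We do not know the exact modification used in [17], so treat this as an illustration of the general idea. The sketch assumes the user supplies the null covariance matrix of the terms $-2\log p_i$ (for example, estimated by permutation). The function names are hypothetical. The statistic $X = -2\sum_i \log p_i$ is approximated by $c\,\chi^2_f$ with $c = \mathrm{Var}(X)/(2E[X])$ and $f = 2E[X]^2/\mathrm{Var}(X)$. Because $f$ need not be an integer, the chi-square tail is computed here from the regularized incomplete gamma function.

```python
import math


def upper_gamma_q(a, x):
    """Regularized upper incomplete gamma Q(a, x): series for x < a + 1, else continued fraction."""
    if x <= 0.0:
        return 1.0
    log_prefactor = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1.0:
        # Series for the lower function P(a, x); return Q = 1 - P.
        ap, term = a, 1.0 / a
        total = term
        for _ in range(10000):
            ap += 1.0
            term *= x / ap
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Modified Lentz evaluation of the continued fraction for Q(a, x).
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return math.exp(log_prefactor) * h


def brown_fisher(pvalues, cov):
    """Fisher's statistic with a moment-matched scaled chi-square null distribution.

    cov[i][j] is the null covariance of -2*log(p_i) and -2*log(p_j); the diagonal
    should be 4 (the variance of a chi-square with 2 df). X is approximated by
    c * chi2_f, matching E[X] = 2m and Var[X] = sum of all entries of cov.
    """
    m = len(pvalues)
    x = -2.0 * sum(math.log(p) for p in pvalues)
    mean = 2.0 * m
    var = sum(sum(row) for row in cov)
    c = var / (2.0 * mean)
    f = 2.0 * mean ** 2 / var
    # P(c * chi2_f > x) = Q(f / 2, x / (2c)).
    return upper_gamma_q(f / 2.0, x / (2.0 * c))


print(brown_fisher([0.05, 0.05], [[4.0, 4.0], [4.0, 4.0]]))  # perfectly correlated pair
print(brown_fisher([0.05, 0.05], [[4.0, 0.0], [0.0, 4.0]]))  # independent pair: plain Fisher
```

Consider two perfectly correlated tests that both give p = 0.05. The adjusted p-value stays at 0.05. Treating the same two tests as independent gives roughly 0.0175, which overstates the evidence.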

Modeling p-values with analytic distributions has also begun to show promise. A beta mixture model has been proposed for p-values that may arise from a combination of null and alternative hypotheses for individual genes, and a modified likelihood ratio test and a D-test have been proposed to test homogeneity in the mixture model [19]. Dudbridge and Koeleman [20] showed that extreme-value distributions for combining a fixed number of pieces of evidence, and a beta distribution for the most significant piece of evidence, are accurate and efficient for large exploratory studies. Analytic modeling may provide deeper insight into the properties of p-values. For instance, a mixture model of p-values may not only indicate whether an overall signal exists, but also estimate the proportion of variants associated with a phenotype and the strength of their association.
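The following is a minimal sketch of the beta-uniform mixture idea. It uses the density $f(p) = \pi + (1-\pi)\,a\,p^{a-1}$ with $0 \le \pi \le 1$ and $0 < a < 1$, fitted by a coarse grid search over the log-likelihood. This parameterization and the grid search are our assumptions for illustration. They are not necessarily the exact model of [19], and the sketch does not implement the modified likelihood ratio test or the D-test. The estimate $1-\hat\pi$ corresponds to the proportion of associated variants, and a smaller $\hat a$ indicates stronger signals.

```python
import math


def fit_beta_uniform_mixture(pvalues, n_grid=50):
    """Grid-search MLE for f(p) = pi + (1 - pi) * a * p**(a - 1), 0 <= pi <= 1, 0 < a < 1.

    Returns (pi_hat, a_hat, max_loglik). pi_hat estimates the null (uniform)
    proportion; the log-likelihood of the pure uniform model (pi = 1) is 0.
    """
    log_p = [math.log(p) for p in pvalues]                   # requires 0 < p <= 1
    pi_grid = [j / n_grid for j in range(n_grid + 1)]        # includes pi = 1
    a_grid = [(k + 0.5) / n_grid for k in range(n_grid)]     # strictly inside (0, 1)
    best = (1.0, a_grid[0], 0.0)                             # start from the uniform model
    for pi in pi_grid:
        for a in a_grid:
            loglik = sum(math.log(pi + (1 - pi) * a * math.exp((a - 1) * lp)) for lp in log_p)
            if loglik > best[2]:
                best = (pi, a, loglik)
    return best


# p = u**4 concentrates mass near 0 (a Beta(0.25, 1) pattern, i.e. pi = 0, a = 0.25).
skewed = [((i + 0.5) / 100) ** 4 for i in range(100)]
print(fit_beta_uniform_mixture(skewed))
```

Twice the fitted log-likelihood is a likelihood ratio statistic against the uniform model ($\pi = 1$). However, $\pi = 1$ lies on the boundary of the parameter space, and $a$ is not identifiable there. The statistic therefore does not follow the usual chi-square distribution, which is why specialized tests such as those in [19] are needed.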

Below we describe two major trends in the application of the combined evidence approach to complex genetic data analysis.

Filtration of Variants with No Association

Global tests can filter out genes with no association and direct researchers to a smaller part of the genome [19]. Filtration is a critical step in current genetic data analyses for removing noise, irrelevant variants, and weak signals. Removing genes with arbitrary cutoff values (such as fold change > 1.5 or p-value < 0.05) may increase bias in gene selection. We advocate incorporating global tests into the gene filtration process. Genes are first grouped into gene sets based on biological information such as pathways or functional networks. Global tests of p-values are then performed on each gene set to detect whether an overall signal exists. Gene sets with no overall signal are removed, which can greatly reduce dimensionality.
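The filtration step can be sketched as follows. The sketch assumes gene-level p-values are already available and uses Fisher’s method as the global test, with a Bonferroni threshold across gene sets. Both choices, the function names, and the toy data are illustrative assumptions. Genes in the same pathway are often correlated, and in that case a correlation-adjusted global test would be more appropriate.

```python
import math


def fisher_combined(pvalues):
    """Fisher's combined p-value, assuming independent p-values (chi-square with 2m df)."""
    m = len(pvalues)
    h = -sum(math.log(p) for p in pvalues)
    if h == 0.0:
        return 1.0
    log_terms = [k * math.log(h) - math.lgamma(k + 1) for k in range(m)]
    top = max(log_terms)
    return min(1.0, math.exp(top + math.log(sum(math.exp(t - top) for t in log_terms)) - h))


def filter_gene_sets(gene_sets, gene_pvalues, alpha=0.05):
    """Keep gene sets whose global test is significant after Bonferroni over sets.

    gene_sets: {set_name: [gene, ...]}; gene_pvalues: {gene: p}. Genes without a
    p-value are skipped; sets with no tested genes are dropped.
    Returns {set_name: combined p-value} for the retained sets.
    """
    threshold = alpha / len(gene_sets)
    kept = {}
    for name, genes in gene_sets.items():
        pvals = [gene_pvalues[g] for g in genes if g in gene_pvalues]
        if pvals:
            combined = fisher_combined(pvals)
            if combined <= threshold:
                kept[name] = combined
    return kept


toy_sets = {"A": ["g1", "g2", "g3"], "B": ["g4", "g5", "g6"]}
toy_p = {"g1": 0.001, "g2": 0.01, "g3": 0.02, "g4": 0.5, "g5": 0.7, "g6": 0.9}
print(filter_gene_sets(toy_sets, toy_p))  # only set "A" is retained
```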

A global test of p-values can also be used to select the optimal number of genes for a final analysis. For instance, if an auxiliary measure can be used to rank the genetic variants and this measure is independent of the global test, then the global test can be used to find a cutoff for the auxiliary measure and thereby select the optimal number of genetic variants for the final analysis. In multifactor dimensionality reduction (MDR) analysis, a method for detecting gene-gene interactions, several filtration algorithms (such as SURF [21], TuRF [22], and ReliefF [23]) have been developed to rank SNPs based on efficiency and redundancy. Global tests can then be used to determine the optimal cutoff points for these measures and select the optimal number of genes. This combined global test and ReliefF filtration approach has been applied to a candidate gene study of drug response in juvenile idiopathic arthritis and identified gene-gene interactions in the folate pathway [8].
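The following sketch uses a global test to choose how many top-ranked variants to keep. It assumes a precomputed auxiliary score (for instance, a ReliefF-type score) that is independent of the p-values. The score dictionary, the candidate cutoffs, the use of Fisher’s method, and the function names are hypothetical.

```python
import math


def fisher_combined(pvalues):
    """Fisher's combined p-value, assuming independent p-values (chi-square with 2m df)."""
    m = len(pvalues)
    h = -sum(math.log(p) for p in pvalues)
    if h == 0.0:
        return 1.0
    log_terms = [k * math.log(h) - math.lgamma(k + 1) for k in range(m)]
    top = max(log_terms)
    return min(1.0, math.exp(top + math.log(sum(math.exp(t - top) for t in log_terms)) - h))


def choose_top_k(aux_scores, gene_pvalues, candidate_ks):
    """Rank genes by an auxiliary score (higher = more promising) and return the
    candidate k whose top-k genes give the smallest Fisher combined p-value,
    together with the combined p-value for every candidate k.
    """
    ranked = sorted(aux_scores, key=aux_scores.get, reverse=True)
    results = {k: fisher_combined([gene_pvalues[g] for g in ranked[:k]]) for k in candidate_ks}
    best_k = min(results, key=results.get)
    return best_k, results


scores = {"g1": 0.9, "g2": 0.8, "g3": 0.7, "g4": 0.2, "g5": 0.1}
pvals = {"g1": 0.001, "g2": 0.002, "g3": 0.004, "g4": 0.8, "g5": 0.9}
best_k, table = choose_top_k(scores, pvals, [1, 2, 3, 4, 5])
print(best_k, table)
```

Taking the minimum combined p-value over several cutoffs is itself a form of multiple testing. The p-value at the selected cutoff should therefore not be reported at face value without calibration, for example by permutation as in the adaptive rank truncated product method [15].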

Two-stage Reversed Pathway Analysis

Pathway analysis aims to detect the wide range of molecular entities that regulate specific cell functions, metabolic processes, and biosynthesis. In traditional pathway analysis (TPA), adjusted cutoffs on fold changes or p-values are first used to select significant individual genes (step 1). Next, each pathway is tested for over-representation of these significant genes (step 2). However, bias and random error in selecting individual genes can severely affect the subsequent steps of TPA. We suggest incorporating global testing into pathway analysis and reversing these two steps: first detect significant pathways, then detect significant genes within the significant pathways, as illustrated in Figure 1. With this omnibus test based pathway analysis (OPA), the number of multiple tests is dramatically reduced, from about $10^5$ to about $10^2$.


Figure 1: Comparisons between traditional pathway analysis (TPA) and omnibus test based pathway analysis (OPA).
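The two reversed stages of OPA can be sketched as below. The sketch uses Fisher’s method in stage 1 with a Bonferroni threshold across pathways, and Benjamini-Hochberg within each significant pathway in stage 2. These choices, the function names, and the toy pathways are illustrative assumptions, not a prescribed implementation of Figure 1.

```python
import math


def fisher_combined(pvalues):
    """Fisher's combined p-value, assuming independent p-values (chi-square with 2m df)."""
    m = len(pvalues)
    h = -sum(math.log(p) for p in pvalues)
    if h == 0.0:
        return 1.0
    log_terms = [k * math.log(h) - math.lgamma(k + 1) for k in range(m)]
    top = max(log_terms)
    return min(1.0, math.exp(top + math.log(sum(math.exp(t - top) for t in log_terms)) - h))


def benjamini_hochberg(pvalues, q=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up procedure at FDR level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            k_max = rank
    return sorted(order[:k_max])


def two_stage_opa(pathways, alpha=0.05, q=0.05):
    """Stage 1: global (Fisher) test per pathway, Bonferroni across pathways.
    Stage 2: Benjamini-Hochberg on the genes of each significant pathway.

    pathways: {pathway_name: {gene: p}}; returns {pathway_name: [significant genes]}.
    """
    threshold = alpha / len(pathways)
    selected = {}
    for name, gene_p in pathways.items():
        genes = list(gene_p)
        pvals = [gene_p[g] for g in genes]
        if fisher_combined(pvals) <= threshold:      # stage 1: does the pathway carry signal?
            hits = benjamini_hochberg(pvals, q)      # stage 2: which genes within it?
            selected[name] = [genes[i] for i in hits]
    return selected


toy = {
    "P1": {"g1": 0.001, "g2": 0.004, "g3": 0.01, "g4": 0.3},
    "P2": {"g5": 0.4, "g6": 0.6, "g7": 0.8, "g8": 0.95},
}
print(two_stage_opa(toy))  # {'P1': ['g1', 'g2', 'g3']}
```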

Conclusions

Fisher’s method has been shown to be asymptotically Bahadur optimal and efficient when the p-values are independent. However, there is no uniformly most powerful method for combining p-values. Moreover, accounting for correlations among p-values is a major challenge when applying global methods that were originally designed under independence assumptions. Using methods designed for correlated data can prevent the inflation of Type I error caused by complex correlation structures. More ground-breaking theoretical work is needed to develop global tests of p-values that account for such correlation structures.

References
