Generalized Measure of Dependency for Analysis of Omics Data | OMICS International
ISSN: 2153-0602
Journal of Data Mining in Genomics & Proteomics

Generalized Measure of Dependency for Analysis of Omics Data

Qihua Tan1,2*, Martin Tepel3, Lars M. Rasmussen4 and Jacob von Bornemann Hjelmborg2

1Unit of Human Genetics, Department of Clinical Research, University of Southern Denmark, Odense, Denmark

2Epidemiology, Biostatistics, and Biodemography, Department of Public Health, University of Southern Denmark, Odense, Denmark

3Department of Nephrology, Odense University Hospital, and University of Southern Denmark

4Department of Clinical Biochemistry and Pharmacology, Odense University Hospital, and University of Southern Denmark

*Corresponding Author:
Qihua Tan, MD, PhD
Professor, Epidemiology, Biostatistics
and Biodemography, Dept of Public Health
University of Southern Denmark
J. B. Winslows Vej 9B, DK-5000, Odense C Denmark
Tel: 0045 65503536
E-mail: [email protected]

Received date: October 20, 2015; Accepted date: November 10, 2015; Published date: November 17, 2015

Citation: Tan Q, Tepel M, Rasmussen LM, Hjelmborg JB (2015) Generalized Measure of Dependency for Analysis of Omics Data. J Data Mining Genomics Proteomics 7:183. doi:10.4172/2153-0602.1000183

Copyright: © 2015 Tan Q, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.


As a popular measure of association, Pearson’s correlation coefficient has been frequently used in omics data analysis, e.g. in the feature selection process during prediction model building using high-dimensional gene expression data [1] and proteomics data [2]. However, Pearson’s correlation coefficient captures only linear relationships, which greatly limits its use in situations of nonlinear association. Statistical modeling of nonlinear patterns can be complicated [3] and requires intensive computation for high-dimensional data such as microarray data or genome sequence data. In the analysis of omics data, high dimension means that there can be diverse patterns of dependence not limited to linearity. In this situation, generalized measures of association that are more adequate than Pearson’s correlation and capable of capturing both linear and nonlinear correlations are needed. Recently, generalized correlation coefficients have been frequently discussed [4], and their application to large-scale genomic data has been illustrated through microarray gene expression time-course analysis [5].
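The limitation of Pearson’s coefficient can be illustrated with a minimal sketch. It is not part of the original analysis: the toy data and the helper name `pearson` are assumptions made for illustration. A variable that is fully determined by another through a symmetric quadratic relationship gives a coefficient of zero, although the dependence is perfect.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data: both y variables depend on x exactly.
x = [-3, -2, -1, 0, 1, 2, 3]
y_linear = [2 * v + 1 for v in x]     # linear dependence
y_quadratic = [v ** 2 for v in x]     # nonlinear (symmetric) dependence

r_linear = pearson(x, y_linear)
r_quadratic = pearson(x, y_quadratic)

print(f"linear:    r = {r_linear:.3f}")
print(f"quadratic: r = {r_quadratic:.3f}")
```

In this constructed case the linear relationship gives a coefficient of one, while the quadratic one gives zero because the positive and negative contributions to the covariance cancel.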

Currently, generalized measures of dependency mainly refer to rank correlation and information-theory-based measures. Rank-based correlation is well represented by Hoeffding’s D [6], which measures the difference between the joint ranks of two random variables (X, Y) and the product of their marginal ranks. The information-theory-based approaches include mutual information (MI) [7] and the maximal information coefficient (MIC) [5,8]. By quantifying the amount of information one variable reveals about another, MI measures the dependency between two variables of any type. In the middle of the last century, Linfoot [9] proposed the information coefficient of correlation, a monotone increasing function of mutual information with attractive properties for measuring dependency. Using binning as a means of applying MI to continuous random variables, the MIC [5] can be seen as a continuous-variable counterpart to MI. MIC searches over various possible grids through binning to achieve the maximal mutual information between two variables. A general overview of the main methods used to identify dependency between random variables has been provided, with applications illustrated using microarray gene expression data [4].
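The information-theoretic measures can be sketched in a few lines. The example below is only an illustration under stated assumptions. It estimates MI (in nats) on one fixed equal-width grid, whereas MIC searches over many grids. It then applies Linfoot’s transformation r = sqrt(1 - exp(-2 I)). The function names, the number of bins, and the toy data are hypothetical choices, not taken from the cited work.

```python
import math
from collections import Counter


def bin_index(value, lo, hi, n_bins):
    """Map a value to one of n_bins equal-width bins on [lo, hi]."""
    if hi == lo:
        return 0
    idx = int((value - lo) / (hi - lo) * n_bins)
    return min(idx, n_bins - 1)  # the maximum falls into the last bin


def mutual_information(x, y, n_bins=4):
    """Plug-in MI estimate (nats) on a fixed equal-width grid."""
    n = len(x)
    bx = [bin_index(v, min(x), max(x), n_bins) for v in x]
    by = [bin_index(v, min(y), max(y), n_bins) for v in y]
    joint = Counter(zip(bx, by))
    count_x = Counter(bx)
    count_y = Counter(by)
    mi = 0.0
    for (i, j), c in joint.items():
        # p(i,j) * log( p(i,j) / (p(i) * p(j)) ), written with counts
        mi += (c / n) * math.log(c * n / (count_x[i] * count_y[j]))
    return mi


def linfoot_coefficient(mi):
    """Linfoot's information coefficient of correlation."""
    return math.sqrt(max(0.0, 1.0 - math.exp(-2.0 * mi)))


# Hypothetical toy data.
x = list(range(8))
mi_dependent = mutual_information(x, x, n_bins=4)
mi_independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1], n_bins=2)

print(mi_dependent, linfoot_coefficient(mi_dependent))
print(mi_independent, linfoot_coefficient(mi_independent))
```

The binned estimate depends strongly on the grid, which is the reason MIC maximizes over grids. A fixed grid is used here only to keep the sketch short.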

In a very recent paper published in Scientific Reports [2], we reported a signature of 82 plasma proteins that predicted the increase of the inflammation marker C-reactive protein from the index day to the next day, using proteome analysis in 91 incident kidney transplant recipients. C-reactive protein is an acute-phase reactant and an early nonspecific indicator of infectious or inflammatory conditions. Although the marker is important, current methods cannot determine its day-to-day development at the time of its measurement in plasma. The paper showed that it is possible to define a plasma protein signature that predicts the increase of next-day C-reactive protein. The predictive proteins were selected from 359 quantified plasma proteins by correlating the plasma concentration of each protein with the change in next-day C-reactive protein using Pearson’s correlation coefficient. Feature selection was done by recursively shrinking correlations smaller than a predefined threshold to zero and using the remaining subset of proteins to build a prediction model with support vector machines. Leave-one-out cross-validation estimated a sensitivity of 81%, a specificity of 69%, and an overall accuracy of 77%.
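The correlation filter and the reported performance measures can be sketched as follows. This is a simplified illustration, not the published pipeline. It applies the threshold in a single pass rather than recursively, omits the support vector machine step, and uses hypothetical protein names, a hypothetical threshold, and made-up toy values.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def filter_by_correlation(features, outcome, threshold):
    """Keep features whose absolute correlation with the outcome reaches
    the threshold; weaker correlations are treated as zero and dropped."""
    selected = {}
    for name, values in features.items():
        r = pearson(values, outcome)
        if abs(r) >= threshold:
            selected[name] = r
    return selected


def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy for binary labels (1 = case)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


# Hypothetical toy data, not taken from the study.
crp_change = [1, 2, 3, 4, 5, 6]
proteins = {
    "protein_a": [2, 4, 6, 8, 10, 12],   # strongly correlated
    "protein_b": [1, -1, 1, -1, 1, -1],  # weakly correlated
}
selected = filter_by_correlation(proteins, crp_change, threshold=0.5)
metrics = classification_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])

print(selected)
print(metrics)
```

In a leave-one-out setting, `y_pred` would hold the prediction made for each sample by a model trained on all the other samples.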

Taking the same dataset, we explored prognostic protein signature selection using Hoeffding’s measure of dependence, a nonparametric measure of association that detects more general departures from independence [6]. Following the same procedure as in Tepel et al. [2] but replacing Pearson’s correlation in the feature selection step with Hoeffding’s D measure, a 62-protein signature was selected for prediction model building. The new list of proteins performed about as well as the 82-protein signature, with a sensitivity of 79%, a specificity of 70%, and a mean accuracy of 76%. Notably, of the 62 proteins selected, 48 overlapped with the published 82-plex signature and 14 were new. This novel application of a generalized association measure to feature selection in the prediction analysis of high-dimensional data shows that, by relaxing the assumption of a linear relationship, a non-traditional measure of association can help to select features more efficiently while maintaining high prediction accuracy.
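For readers who want to see how Hoeffding’s D is computed from ranks, the following sketch uses the rank-based formula commonly documented in statistical software. It is an assumption that this matches the exact variant used in the analysis above. The scaling by 30 is a convention under which perfectly monotone data without ties give D = 1. The function names and toy data are hypothetical.

```python
def mid_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    return [
        1 + sum(v < u for v in values)
        + 0.5 * (sum(v == u for v in values) - 1)
        for u in values
    ]


def bivariate_ranks(x, y):
    """Q_i = 1 + number of points with both x and y below point i.
    A point tied on one coordinate counts 1/2, tied on both counts 1/4."""
    q = []
    for i, (xi, yi) in enumerate(zip(x, y)):
        total = 1.0
        for j, (xj, yj) in enumerate(zip(x, y)):
            if i == j:
                continue
            if xj < xi and yj < yi:
                total += 1.0
            elif (xj == xi and yj < yi) or (xj < xi and yj == yi):
                total += 0.5
            elif xj == xi and yj == yi:
                total += 0.25
        q.append(total)
    return q


def hoeffding_d(x, y):
    """Hoeffding's D statistic, scaled by 30; needs at least 5 points."""
    n = len(x)
    if n != len(y) or n < 5:
        raise ValueError("need two sequences of equal length >= 5")
    r = mid_ranks(x)
    s = mid_ranks(y)
    q = bivariate_ranks(x, y)
    d1 = sum((qi - 1) * (qi - 2) for qi in q)
    d2 = sum((ri - 1) * (ri - 2) * (si - 1) * (si - 2)
             for ri, si in zip(r, s))
    d3 = sum((ri - 2) * (si - 2) * (qi - 1)
             for ri, si, qi in zip(r, s, q))
    numerator = (n - 2) * (n - 3) * d1 + d2 - 2 * (n - 2) * d3
    denominator = n * (n - 1) * (n - 2) * (n - 3) * (n - 4)
    return 30.0 * numerator / denominator


# Hypothetical toy data: a monotone and a V-shaped relationship.
d_monotone = hoeffding_d([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
d_v_shape = hoeffding_d([1, 2, 3, 4, 5, 6, 7, 8, 9],
                        [8, 6, 4, 2, 0, 1, 3, 5, 7])

print(d_monotone, d_v_shape)
```

In a feature selection step of the kind described above, `hoeffding_d` would take the place of the Pearson coefficient when each protein is scored against the outcome. The double loop is quadratic in the sample size, which is acceptable for a sample of 91 recipients but not for very large samples.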

The capability of handling both linear and nonlinear associations promotes the use of generalized correlation measures in analysing massive and complex omics data, with the ultimate aim of disentangling and interpreting the complex patterns of relationships between omics data concepts in an integrative manner. Taking the relationship between gene expression and DNA methylation as an example, multiple studies have analysed their correlation using Spearman’s correlation coefficient and reported predominantly low or even poor correlation patterns [10,11]. Here, we think that generalized correlation methods should help to characterize the biological relationship more adequately and precisely. Moreover, generalized correlation can also be a useful tool for investigating the functional dependency between sets of attributes in omics data.

Recently, De Siqueira Santos et al. [4] reviewed and evaluated the main methods for identifying dependency between random variables and provided a suggested list of methods for use with different types of datasets. The main methods can be easily implemented using free R packages such as matie, FNN, minerva, and Hmisc.

