ISSN: 2155-6180
Journal of Biometrics & Biostatistics

Gene-environment Interaction Studies with Measurement Error Application in the Complex Diseases in the Newfoundland Population: Environment and Genetics Study

Taraneh Abarin*

Department of Mathematics and Statistics, Memorial University, Canada

*Corresponding Author:
Taraneh Abarin
Department of Mathematics and Statistics
Memorial University
St. John's, NL, Canada
Tel: (709) 864-8733
Fax:(709) 864-3010
E-mail: [email protected]

Received Date: July 29, 2013; Accepted Date: August 26, 2013; Published Date: August 30, 2013

Citation: Abarin T (2013) Gene-environment Interaction Studies with Measurement Error Application in the Complex Diseases in the Newfoundland Population: Environment and Genetics Study. J Biomet Biostat 4:173. doi:10.4172/2155-6180.1000173

Copyright: © 2013 Abarin T. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.


Abstract

Newfoundland and Labrador (NL) has had the highest percentage of overweight/obese residents in Canada since 2007. This complex trait is determined by multiple genetic and environmental factors that interact with one another. Existing studies examine such factors under the assumption that they are measured accurately. However, error-prone environmental and genetic factors are unavoidable. The impact of ignoring these errors ranges from bias to false results in detecting associations. Motivated by the CODING study, we present methodologies to estimate model parameters while accounting for measurement error and misclassification. We applied bias-corrected methods to three separate studies: a candidate-gene association study and two gene-environment interaction models, where both environmental and genetic factors are subject to error. Our results based on simulation studies show that the proposed methodologies perform satisfactorily.

Keywords

Bias-corrected; Gene-environment interaction; Measurement error; Genotyping error; Misclassification

Abbreviations

CODING: Complex Diseases in the Newfoundland Population: Environment and Genetics; BC: Bias-corrected; ME: Measurement Error; GEI: Gene-environment Interaction; PA: Physical Activity; PTF: Percentage of Trunk Fat; FTO: Fat Mass and Obesity-associated

Introduction

Obesity is a major health issue in Canada. Newfoundland and Labrador (NL) has had the highest percentage of overweight/obese residents in Canada since 2007, and the rate had risen by nearly 7%, to 69.3%, in 2011 (Statistics Canada). Obesity is determined by multiple genetic and environmental factors that interact with one another in complicated ways. Existing studies examine such factors under the assumption that they are measured accurately [1-4]. However, unobserved or error-prone environmental factors and/or misclassification in genotyping are unavoidable. In reality, both genetic and environmental factors are likely measured with error. It is now well known that measurement and/or classification errors can influence the results of a study. The impact of ignoring these errors ranges from bias and large variability in estimators to low power or even false-negative results in detecting genetic associations [5-7]. In fact, in the presence of measurement error and misclassification, detecting the interaction terms is more challenging than detecting either the genetic or the environmental factors [8]. Motivated by an ongoing, large-scale nutrigenomics (CODING) study of the adult Newfoundland population, we present methodologies to estimate model parameters while accounting for measurement error and misclassification. We applied bias-corrected methods to three separate studies: a candidate-gene association study and two gene-environment interaction models, where both environmental and genetic factors are subject to error. This paper is organized as follows. In Section 2, we introduce the three models and present bias-corrected estimators. In Section 3, we use simulation studies to investigate the finite-sample performance of the proposed estimators in comparison with the naive estimators. The estimation approaches are also illustrated in that section with the analysis of the CODING data.

Materials and Methods

Model I: Candidate gene association study

Motivated by the CODING study of the Newfoundland population, we present methodologies to estimate the model parameters for three separate studies: a candidate-gene association study and two different GEI models. In all three models, we assume that the response is measured accurately.

In this section, we consider a simple linear regression model for typical candidate-gene association studies. The model can be written as

$$Y_i = \beta_0 + \beta_1 G_i + \varepsilon_i, \quad i = 1, \dots, n, \qquad (1)$$

where $Y_i$ is the response for the ith individual, $\beta_0$ and $\beta_1$ are unknown parameters, and G is a binary variable, coded for a candidate gene with dominant effect. One can write model (1) in matrix format as

$$Y = X\beta + \varepsilon, \qquad (2)$$

where $Y$ is the vector of responses, $\varepsilon$ is the vector of model error terms with mean zero and variance $\sigma^2$, $\beta = (\beta_0, \beta_1)'$ is the vector of parameters, and $X$ is the n × p design matrix.

Moreover, the binary variable G, with probability of success π, is not observable; instead, a binary variable g is observed with classification error. We denote the sensitivity, or probability of correctly classifying a success in G, by $\theta_{11} = P(g = 1 \mid G = 1)$. Therefore, $1 - \theta_{11}$ is the probability of a false negative. Similarly, the specificity, or probability of correctly classifying a failure in G, is defined as $\theta_{00} = P(g = 0 \mid G = 0)$. Therefore, $1 - \theta_{00}$ is the probability of a false positive. The probability of success for g is determined as follows.

$$P(g = 1) = \theta_{11}\pi + (1 - \theta_{00})(1 - \pi)$$

In order to obtain an unbiased estimate for π based on the observed variable, one must correct the bias in g. In fact, with simple algebra, an unbiased estimate for π based on g is

$$\hat{\pi} = \frac{\bar{g} - (1 - \theta_{00})}{\theta_{11} + \theta_{00} - 1},$$

where $\bar{g}$ is the sample proportion of successes in g.
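The relationship between the true and the observed probability of success can be checked numerically. The following is a minimal sketch, assuming known sensitivity and specificity; the function names and the example values (a true prevalence of 0.30 with sensitivity and specificity of 0.95) are hypothetical and chosen only for illustration.

```python
def observed_prevalence(pi, sens, spec):
    """P(g = 1): true successes kept plus true failures misread as successes."""
    return sens * pi + (1.0 - spec) * (1.0 - pi)


def corrected_prevalence(p_obs, sens, spec):
    """Invert observed_prevalence to recover pi from the observed proportion."""
    denom = sens + spec - 1.0
    if denom <= 0:
        raise ValueError("correction requires sens + spec > 1")
    return (p_obs - (1.0 - spec)) / denom


# Hypothetical values, for illustration only.
p_obs = observed_prevalence(0.30, sens=0.95, spec=0.95)
pi_back = corrected_prevalence(p_obs, sens=0.95, spec=0.95)
print(round(p_obs, 4), round(pi_back, 4))
```

In a finite sample the corrected value can fall outside [0, 1] when the observed proportion is close to the false-positive rate, so a practical implementation might need to handle that case explicitly.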

The naive least squares estimator of β in model (1), which ignores the misclassification in g, is $\hat{\beta}_{naive} = (X'X)^{-1}X'Y$, where X = [1, g].

Rewriting model (1) in terms of the observed variable g as follows, it is easy to see that $\hat{\beta}_{naive}$ is not an unbiased estimator.

$$E(Y \mid g) = E[E(Y \mid G, g) \mid g] = E[E(Y \mid G) \mid g] = \beta_0 + \beta_1 P(G = 1 \mid g)$$

The second equality uses $E(Y \mid G, g) = E(Y \mid G)$; that is, g is assumed to be a surrogate, which means that it does not provide any extra information about the distribution of Y beyond what is already provided by G.

From the above equations, it can be seen that $\hat{\beta}_{naive}$ is biased. It is known that this bias is attenuated with large sample size [5,6]. Furthermore, when π is not very small, the naive estimator is sensitive to the sensitivity, in the sense that the smaller θ11 is, the worse the naive estimator performs [8].

Modifying the methodology suggested by Buonaccorsi [9] for the linear model with an intercept, the matrix of classification probabilities is defined as

Equation

Using the same notation as [9], we have the mean responses for both genotyping groups as µ1 = β0 + β1 and µ2 = β0.

Buonaccorsi [9] proposed a bias-corrected estimator for Equation as Equation where Equation Equation and nw is defined as

Equation

with nw1 the number of successes in the sample and nw2 the number of failures in the sample. Returning to the estimates of β, we have

$$\hat{\beta}_0 = \hat{\mu}_2, \qquad \hat{\beta}_1 = \hat{\mu}_1 - \hat{\mu}_2.$$

This method can be easily extended to any candidate gene with an additive effect. We should mention here that genotyping error is usually estimated in one of two ways: either two different genotyping methods are compared, or genotyping with one system is repeated more than once. The latter is less expensive.
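To make the idea of correcting group means concrete, here is a minimal sketch of one moment-based correction for the two-group model. It is an illustration under stated assumptions, not necessarily the exact estimator of [9]: sensitivity and specificity are taken as known, g is treated as a surrogate, and the function name `bc_model1` and all simulation settings are hypothetical. The observed group means are mixtures of µ1 and µ2 weighted by the reclassification probabilities P(G = 1 | g), so the means can be recovered by solving a 2 × 2 linear system.

```python
import numpy as np


def bc_model1(y, g, sens, spec):
    """Return (beta0, beta1) after undoing the mixing caused by misclassification."""
    y = np.asarray(y, dtype=float)
    g = np.asarray(g, dtype=int)
    p_obs = g.mean()
    pi_hat = (p_obs - (1 - spec)) / (sens + spec - 1)   # corrected prevalence
    p1 = sens * pi_hat / p_obs                          # P(G = 1 | g = 1)
    p0 = (1 - sens) * pi_hat / (1 - p_obs)              # P(G = 1 | g = 0)
    mix = np.array([[p1, 1 - p1], [p0, 1 - p0]])
    ybar = np.array([y[g == 1].mean(), y[g == 0].mean()])
    mu1, mu2 = np.linalg.solve(mix, ybar)               # E(ybar) = mix @ (mu1, mu2)
    return np.array([mu2, mu1 - mu2])


# Hypothetical simulated data: beta = (2, 1), pi = 0.4, unit error variance.
rng = np.random.default_rng(0)
n, sens, spec = 200_000, 0.95, 0.95
G = (rng.random(n) < 0.4).astype(int)
u = rng.random(n)
g = np.where(G == 1, u < sens, u < 1 - spec).astype(int)
y = 2.0 + 1.0 * G + rng.normal(size=n)

naive_slope = y[g == 1].mean() - y[g == 0].mean()
beta_bc = bc_model1(y, g, sens, spec)
print("naive slope:", round(naive_slope, 3), "BC:", beta_bc.round(3))
```

With a binary regressor the least squares slope equals the difference of the two group means, which is why the naive slope is computed that way here.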

Model II: Gene-environment interaction I

Now, we consider the first GEI model as

$$Y = \beta_0 + \beta_1 G + \beta_2 W + \beta_3 GW + \beta_4 A + \varepsilon. \qquad (3)$$

In this model, the environmental factor W is unobservable. Instead, one observes Z, which is subject to measurement error. The classical measurement error model may be expressed as

$$Z = W + U, \qquad (4)$$

where U is an unobservable measurement error variable, independent of W, with mean zero and variance, say, $\sigma_u^2$. We also observe g (instead of G) with error. In model (3), there is another environmental factor (A), which is assumed to be measured without error. The interaction term $GW$ in the model is between two error-prone variables. We are interested in estimating $\beta = (\beta_0, \beta_1, \beta_2, \beta_3, \beta_4)'$.

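Defining X to be the design matrix based on the observed variables [1, g, Z, gZ, A], the naive estimator that ignores both ME and misclassification in the variables can be expressed as

$$\hat{\beta}_{naive} = (X'X)^{-1}X'Y$$

$$= \begin{pmatrix} n & \sum g & \sum Z & \sum gZ & \sum A \\ \sum g & \sum g^2 & \sum gZ & \sum g^2 Z & \sum gA \\ \sum Z & \sum gZ & \sum Z^2 & \sum gZ^2 & \sum ZA \\ \sum gZ & \sum g^2 Z & \sum gZ^2 & \sum g^2 Z^2 & \sum gZA \\ \sum A & \sum gA & \sum ZA & \sum gZA & \sum A^2 \end{pmatrix}^{-1} \begin{pmatrix} \sum Y \\ \sum gY \\ \sum ZY \\ \sum gZY \\ \sum AY \end{pmatrix}, \qquad (5)$$

where the sums in the matrices are over the number of observations.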

The methodology suggested by Buonaccorsi [9] to correct the bias caused by misclassification, cannot be applied to this model. Since both sensitivity and specificity are large, the bias caused by this error is small [8]. However, the bias caused by U cannot be ignored. In fact, the larger the variability of U, the worse the naive estimator.

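Since

$$E(Z^2) = E(W^2) + \sigma_u^2,$$

the terms $\sum Z^2$, $\sum gZ^2$, and $\sum g^2 Z^2$ in equation (5) need to be corrected for bias. However, bias-correcting these terms requires $\sigma_u^2$ to be estimated. Generally, estimating $\sigma_u^2$ requires extra information, such as internal or external validation data [5,6]. The BC estimator of β, therefore, can be expressed as follows.

$$\hat{\beta}_{BC} = (X'X - \hat{\sigma}_u^2 C)^{-1} X'Y, \qquad (6)$$

where $C$ is the 5 × 5 matrix whose only nonzero entries are $C_{33} = n$, $C_{34} = C_{43} = \sum g$, and $C_{44} = \sum g^2$. Since g is binary, $g^2 = g$.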

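Moreover, since U has mean zero and is independent of the other variables, the terms that are linear in Z are unaffected on average, and there is no need to correct the other terms in the naive estimator.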
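The correction for this model can be sketched in a few lines of NumPy. This is an illustrative sketch, not the author's code: it assumes that the measurement error variance is known, subtracts its expected contribution from the sums involving Z squared, and leaves the misclassification in g uncorrected, as described above. The function names and all simulation settings other than the coefficient vector (2, 0.1, 0.5, 0.3, 0.2)' are assumptions.

```python
import numpy as np


def design_model2(g, z, a):
    """Design matrix [1, g, Z, gZ, A] built from the supplied variables."""
    return np.column_stack([np.ones(len(z)), g, z, g * z, a])


def naive_model2(g, z, a, y):
    X = design_model2(g, z, a)
    return np.linalg.solve(X.T @ X, X.T @ y)


def bc_model2(g, z, a, y, sigma_u2):
    """Remove the expected contribution of U from the Z^2 entries of X'X."""
    X = design_model2(g, z, a)
    C = np.zeros((5, 5))
    C[2, 2] = len(y)                        # corrects sum(Z^2)
    C[2, 3] = C[3, 2] = C[3, 3] = g.sum()   # corrects sum(g Z^2); g^2 = g
    return np.linalg.solve(X.T @ X - sigma_u2 * C, X.T @ y)


# Hypothetical simulated data (pi, sigma_u2 and error variance are assumptions).
rng = np.random.default_rng(1)
n, sigma_u2 = 50_000, 0.5
beta = np.array([2.0, 0.1, 0.5, 0.3, 0.2])
G = (rng.random(n) < 0.5).astype(float)
flip = rng.random(n) < 0.05                 # sensitivity = specificity = 0.95
g = np.where(flip, 1.0 - G, G)
W, A = rng.normal(size=n), rng.normal(size=n)
Z = W + rng.normal(scale=np.sqrt(sigma_u2), size=n)
y = design_model2(G, W, A) @ beta + rng.normal(size=n)

naive = naive_model2(g, Z, A, y)
bc = bc_model2(g, Z, A, y, sigma_u2)
print("naive:", naive.round(3))
print("BC:   ", bc.round(3))
```

In this setup the naive coefficient of Z is pulled toward zero, while the corrected one stays close to 0.5 apart from a small residual effect of the uncorrected misclassification.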

Model III: Gene-environment interaction II

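Now, we consider the second GEI model as

$$Y = \beta_0 + \beta_1 G + \beta_2 A + \beta_3 GA + \beta_4 W + \varepsilon. \qquad (7)$$

In this model, again, both W and G are unobservable. Here, however, the interaction is between the misclassified variable and the accurately measured environmental factor. We are interested in estimating $\beta = (\beta_0, \beta_1, \beta_2, \beta_3, \beta_4)'$.

Defining X to be the design matrix based on the observed variables [1, g, A, gA, Z], the naive estimator can be expressed as

$$\hat{\beta}_{naive} = (X'X)^{-1}X'Y$$

$$= \begin{pmatrix} n & \sum g & \sum A & \sum gA & \sum Z \\ \sum g & \sum g^2 & \sum gA & \sum g^2 A & \sum gZ \\ \sum A & \sum gA & \sum A^2 & \sum gA^2 & \sum AZ \\ \sum gA & \sum g^2 A & \sum gA^2 & \sum g^2 A^2 & \sum gAZ \\ \sum Z & \sum gZ & \sum AZ & \sum gAZ & \sum Z^2 \end{pmatrix}^{-1} \begin{pmatrix} \sum Y \\ \sum gY \\ \sum AY \\ \sum gAY \\ \sum ZY \end{pmatrix}. \qquad (8)$$

Here, only $\sum Z^2$ needs to be corrected for bias. The BC estimator of β, therefore, can be expressed as follows.

$$\hat{\beta}_{BC} = (X'X - \hat{\sigma}_u^2 C)^{-1} X'Y,$$

where $C$ is the 5 × 5 matrix whose only nonzero entry is $C_{55} = n$.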
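For this model only the sum of Z squared is affected, so the correction touches a single entry of X'X. The sketch below illustrates this under the assumption that the measurement error variance is known; the function name `fit_model3` and the simulation settings (other than the coefficient vector used in the simulation study) are hypothetical.

```python
import numpy as np


def fit_model3(g, a, z, y, sigma_u2):
    """Return (naive, bias-corrected) estimates for the design [1, g, A, gA, Z]."""
    X = np.column_stack([np.ones(len(z)), g, a, g * a, z])
    xtx, xty = X.T @ X, X.T @ y
    C = np.zeros((5, 5))
    C[4, 4] = len(y)                        # only sum(Z^2) carries the ME bias
    naive = np.linalg.solve(xtx, xty)
    bc = np.linalg.solve(xtx - sigma_u2 * C, xty)
    return naive, bc


# Hypothetical simulated data (pi, sigma_u2 and error variance are assumptions).
rng = np.random.default_rng(2)
n, sigma_u2 = 100_000, 0.5
G = (rng.random(n) < 0.5).astype(float)
flip = rng.random(n) < 0.05                 # sensitivity = specificity = 0.95
g = np.where(flip, 1.0 - G, G)
W, A = rng.normal(size=n), rng.normal(size=n)
Z = W + rng.normal(scale=np.sqrt(sigma_u2), size=n)
y = 2.0 + 0.1 * G + 0.5 * A + 0.3 * G * A + 0.2 * W + rng.normal(size=n)

naive3, bc3 = fit_model3(g, A, Z, y, sigma_u2)
print("naive beta4:", round(naive3[4], 3), "BC beta4:", round(bc3[4], 3))
```

Because the interaction here involves the accurately measured covariate, the coefficients of g, A, and gA are left essentially untouched by the correction; only the coefficient of the error-prone variable moves appreciably.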

Covariance matrices

Since the naive estimator does not consider Z or g as random variables, its covariance matrix can be easily written as

$$\mathrm{Cov}(\hat{\beta}_{naive}) = \sigma^2 (X'X)^{-1}.$$
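As a small illustration of the naive covariance computation, the following sketch evaluates the usual least squares covariance estimate. The residual-based variance estimate with divisor n − p is an assumption (the text does not state which divisor is used), and the function name and the four-point toy dataset are hypothetical.

```python
import numpy as np


def naive_covariance(X, y):
    """Least squares fit plus the plug-in covariance sigma2_hat * (X'X)^{-1}."""
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    sigma2_hat = resid @ resid / (n - p)    # assumed divisor: n - p
    return beta, sigma2_hat * xtx_inv


# Tiny hypothetical dataset, small enough to check by hand.
x = np.array([0.0, 1.0, 2.0, 3.0])
y_toy = np.array([1.0, 3.0, 2.0, 5.0])
X_toy = np.column_stack([np.ones(4), x])
beta_toy, cov_toy = naive_covariance(X_toy, y_toy)
print(beta_toy.round(3), np.sqrt(np.diag(cov_toy)).round(3))
```

The square roots of the diagonal entries are the standard errors that would accompany the naive estimates.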

The covariance matrix of the bias-corrected estimator, however, is conditional on both g and Z, as follows.

Equation      (9)

Results

Simulation studies

To examine the finite-sample performance of the bias-corrected approaches for estimating the regression parameters, we carried out simulation studies. For each model, we present the simulation setup and the results separately.

Model I: Candidate gene association study: For this model, we considered n=500 observations. The regression coefficients were Equation, and the model error variance was set to be Equation . The response was generated 1,000 times, using model (1). Both sensitivity and specificity were 0.95. We compared three estimation approaches: True (based on G), Naive (based on g), and BC.

Figure 1 exhibits the magnitude of the biases produced by the three approaches. From the figure we can clearly see that, among the three estimators, the True and BC estimators perform well. Note also that, since both sensitivity and specificity were relatively large, the impact of misclassification on the estimators is relatively small.


Figure 1: Box plots of True, Naive and BC estimators for Y = β0 + β1G + ε with misclassified G.
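A simulation of this kind can be sketched as follows. The design values n = 500, 1,000 replications, and sensitivity and specificity of 0.95 follow the text; the coefficients (2, 1), the prevalence of 0.5, and the unit error variance are assumptions made only so the example runs. The sketch compares the True and Naive slope estimates; with a binary regressor the least squares slope is the difference of the two group means.

```python
import numpy as np


def simulate_model1(n=500, reps=1000, beta=(2.0, 1.0), pi=0.5,
                    sens=0.95, spec=0.95, sigma=1.0, seed=0):
    """Monte Carlo slopes based on the true G and on the misclassified g."""
    rng = np.random.default_rng(seed)
    b0, b1 = beta
    true_slopes = np.empty(reps)
    naive_slopes = np.empty(reps)
    for r in range(reps):
        G = rng.random(n) < pi                      # true genotype indicator
        u = rng.random(n)
        g = np.where(G, u < sens, u < 1 - spec)     # observed, with error
        y = b0 + b1 * G + rng.normal(0.0, sigma, n)
        true_slopes[r] = y[G].mean() - y[~G].mean()
        naive_slopes[r] = y[g].mean() - y[~g].mean()
    return true_slopes, naive_slopes


true_slopes, naive_slopes = simulate_model1()
print("mean True slope: ", round(true_slopes.mean(), 3))
print("mean Naive slope:", round(naive_slopes.mean(), 3))
```

Under these assumed settings the naive slope is attenuated by the factor P(G = 1 | g = 1) − P(G = 1 | g = 0), which equals 0.9 when the prevalence is 0.5 and both classification rates are 0.95.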

Model II: Gene-environment interaction I: For this model, we considered n=500 observations. The regression coefficients were β = (2,0.1,0.5,0.3,0.2)', and the model error variance was set to be Equation The response was generated 1,000 times, using model (3). Both sensitivity and specificity were 0.95. The environmental factors W and A, for simplicity, were generated from a standard normal distribution. The error-prone variable Z was generated from the model Z=W+U, where U is independent of W and has a normal distribution with mean zero and variance Equation Here again, we compared the three approaches: True (based on G and W), Naive (based on g and Z), and BC.

Figure 2 shows the magnitude of the biases produced by the three approaches. From the figure we can see again that the True and BC estimators perform well. The naive use of Z in place of W causes remarkable biases in the estimators of β2 and of the interaction coefficient β3. Note also that, since the misclassification rates are low, the impact of misclassification on the estimators is negligible. Moreover, since the naive estimate of β0 is unbiased, the box plot for this parameter is omitted.


Figure 2: Box plots of True, Naive, and BC estimators for Y = β0 + β1G + β2W + β3G*W + β4A + ε with error-prone W and G.

Model III: Gene-environment interaction II: For this model, we again considered n=500 observations. The regression coefficients were β = (2,0.1,0.5,0.3,0.2)', and the model error variance was set to be Equation The response was generated 1,000 times, using model (7). Both sensitivity and specificity were 0.95. The environmental factors W and A were generated from a standard normal distribution. The error-prone variable Z was generated from the model Z=W+U, where U is independent of W and has a normal distribution with mean zero and variance Equation Here again, we compared the True, Naive and BC estimation approaches.

Figure 3 shows the magnitude of the biases produced by the three approaches. From the figure we can see again that the True and BC estimators perform well. The naive use of Z in place of W causes remarkable bias in the estimator of β4, the coefficient of W. Note also that, since the misclassification rates were low, the impact of misclassification on the estimators was negligible. Moreover, since the naive estimate of β0 is unbiased, the box plot for this parameter is omitted.


Figure 3: Box plots of True, Naive and BC estimators for Y = β0 + β1G + β2A + β3G*A + β4W + ε with error-prone W and G.

Application: CODING study

Complex Diseases in the Newfoundland Population: Environment and Genetics (CODING) is an ongoing, large-scale nutrigenomics study of the Newfoundland population, in which 2256 individuals were recruited. The variables considered were PTF, measured by dual X-ray absorptiometry, as the response, and, as covariates, the rs9939609 single-nucleotide polymorphism of the FTO gene, genotyped using the high-throughput MassARRAY® platform (Sequenom Inc, San Diego, CA, USA), and PA, measured by the Atherosclerosis Risk in Communities (ARIC) Baecke questionnaire [10]. Subjects were stratified by gender for analysis. Candidate-gene association, gene-physical activity interaction, and gene-age interaction were studied. PTF was assumed to be measured with no error. Age was also assumed to be measured accurately. To avoid collinearity between the variables, age was centred around its mean.

The combination of the Sports and Leisure Time Indices was selected for the analysis of PA, which was assumed to be measured with error. FTO genotypes were coded as G=1 or G=0 for the "A" allele with dominant effect. Genotyping error was estimated to be 5%. The purpose of our study was to estimate the coefficients of the three models, accounting for measurement error and genotyping error.

Since there was no extra information available to estimate $\sigma_u^2$, we performed a sensitivity analysis. It should be mentioned here that for the ME model (4), $\sigma_Z^2$ is always larger than $\sigma_u^2$. In the CODING data, the sample variance of the observed PA was 1.3. Therefore, two arbitrary values, 0.1 and 0.5, were chosen as representatives of relatively small and relatively large values of $\sigma_u^2$. Evidently, the larger the value of $\sigma_u^2$, the worse the naive estimates of the parameters. Naive (based on observed genotyping and PA) and BC (bias-corrected for errors) estimates were calculated for each model with their corresponding standard errors. $\hat{\sigma}^2$ was calculated using the naive least squares estimators of each model.
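One way to get a feel for the two candidate values is to compute the implied reliability ratio Var(W)/Var(Z) under the classical model Z = W + U. This ratio is a standard summary in the measurement error literature and is introduced here only as an illustration; it is not reported in the paper, and the function name is hypothetical. The inputs 1.3, 0.1, and 0.5 are the values quoted above.

```python
def reliability_ratio(var_z, sigma_u2):
    """Share of the observed variance attributable to the true covariate."""
    if not 0 <= sigma_u2 < var_z:
        raise ValueError("need 0 <= sigma_u2 < var_z under Z = W + U")
    return (var_z - sigma_u2) / var_z


var_pa = 1.3  # sample variance of observed PA quoted in the text
ratios = {s: reliability_ratio(var_pa, s) for s in (0.1, 0.5)}
for s, lam in ratios.items():
    print(f"sigma_u^2 = {s}: reliability = {lam:.3f}")
```

A ratio close to 1 means the observed PA score is dominated by the true value, while a ratio near 0.6 means that a substantial part of its variance would be measurement error.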

Tables 1 and 2 show the results for males and females, separately. As the tables show, when the impact of ME is very small ($\sigma_u^2 = 0.1$), the Naive and BC estimates for the three models are very similar. However, this is not the case when the impact of ME is relatively large ($\sigma_u^2 = 0.5$). In Model II, the Naive estimates of the coefficients of G, PA and their interaction are affected by the large ME. Although there was no correction for misclassification of G in the BC approach, there is a significant difference between the two estimators of β1. The reason is the interaction between G and the error-prone variable PA. In Model III, however, as expected, the only coefficient whose Naive estimate is highly affected by the ME is that of PA.

As stated in the introduction, the impact of ignoring the ME generally ranges from bias in the naive estimators to false-positive (or false-negative) results in detecting associations. As there is no estimate available for $\sigma_u^2$ in these data, it is not possible to determine the actual impact. However, some interpretations can be made. The large-sample Wald test for all the parameters in Model II in Table 2 indicates that the Naive and BC approaches provide similar significance results. The different signs of the estimates of β1 and β3 under large variability in ME, however, lead to different interpretations of these values. The Naive estimates of these parameters imply that, for the low-risk genotype, every additional score in PA produces a 4.7% reduction in PTF. For females with the high-risk genotype, the same increase in PA yields only a 2.8% reduction in PTF. The BC approach, on the other hand, starts with a higher average PTF for females. It also implies that, for females with the low-risk allele, every additional score in PA produces a 6.2% improvement in PTF, whereas for the high-risk genotype this amount is 6.8%.

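| Model | Parameter | Naive | SE | BC ($\sigma_u^2 = 0.1$) | SE | BC ($\sigma_u^2 = 0.5$) | SE |
|---|---|---|---|---|---|---|---|
| Model I | β0 | 28.18 | 0.68 | 27.92 | 0.50 | 27.92 | 0.50 |
| Model I | β1 | 2.19 | 0.83 | 2.50 | 0.75 | 2.50 | 0.75 |
| Model II | β0 | 33.85 | 3.06 | 34.70 | 3.20 | 39.27 | 3.93 |
| Model II | β1 | 0.70 | 3.38 | 0.75 | 3.55 | 1.25 | 4.46 |
| Model II | β2 | -2.17 | 0.41 | -2.28 | 0.43 | -2.90 | 0.54 |
| Model II | β3 | 0.15 | 0.52 | 0.14 | 0.54 | 0.05 | 0.69 |
| Model II | β4 | 0.21 | 0.03 | 0.21 | 0.02 | 0.19 | 0.03 |
| Model III | β0 | 33.35 | 2.67 | 34.31 | 2.76 | 39.59 | 3.29 |
| Model III | β1 | 1.49 | 2.19 | 1.39 | 2.18 | 0.89 | 2.19 |
| Model III | β2 | 0.21 | 0.04 | 0.20 | 0.04 | 0.18 | 0.04 |
| Model III | β3 | 0.004 | 0.05 | 0.006 | 0.05 | 0.01 | 0.05 |
| Model III | β4 | -2.08 | 0.26 | -2.20 | 0.27 | -2.88 | 0.36 |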

Table 1: Estimates of model coefficients and the standard errors of naive and BC approach for CODING study–Males.

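| Model | Parameter | Naive | SE | BC ($\sigma_u^2 = 0.1$) | SE | BC ($\sigma_u^2 = 0.5$) | SE |
|---|---|---|---|---|---|---|---|
| Model I | β0 | 38.62 | 0.33 | 38.61 | 0.28 | 38.61 | 0.28 |
| Model I | β1 | 0.14 | 0.42 | 0.16 | 0.36 | 0.16 | 0.36 |
| Model II | β0 | 44.35 | 1.76 | 45.39 | 1.87 | 51.77 | 2.54 |
| Model II | β1 | 0.99 | 1.91 | 0.86 | 2.03 | -0.43 | 2.77 |
| Model II | β2 | -2.09 | 0.25 | -2.25 | 0.27 | -3.21 | 0.38 |
| Model II | β3 | -0.11 | 0.31 | -0.09 | 0.32 | 0.13 | 0.45 |
| Model II | β4 | 0.15 | 0.01 | 0.16 | 0.0 | 0.14 | 0.021 |
| Model III | β0 | 45.38 | 1.54 | 46.34 | 1.58 | 51.85 | 1.85 |
| Model III | β1 | -0.61 | 1.48 | -0.62 | 1.48 | -0.65 | 1.48 |
| Model III | β2 | 0.14 | 0.02 | 0.14 | 0.02 | 0.12 | 0.03 |
| Model III | β3 | 0.02 | 0.03 | 0.02 | 0.03 | 0.02 | 0.03 |
| Model III | β4 | -2.17 | 0.15 | -2.31 | 0.16 | -3.12 | 0.21 |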

Table 2: Estimates of model coefficients and the standard errors of naive and BC approach for CODING study–Females.

Conclusion

It is now well known that studies of gene-environment interactions can improve the accuracy and precision of the assessment of both genetic and environmental influences. Existing GEI studies of obesity-related traits examine both genetic and environmental factors under the assumption that they are measured accurately. However, in reality, both genetic and environmental factors are likely measured with error. The impact of ignoring errors in variables ranges from bias and large variability in estimators to low power or even false-negative (or false-positive) results in detecting genetic associations. In order to obtain more accurate results, the bias caused by the errors needs to be corrected.

In this paper, we studied gene-environment interaction and candidate-gene association models in which covariates are subject to misclassification and measurement error. In particular, we proposed bias-corrected methods to account for these errors. The proposed methods are easy to apply and, unlike some other bias-corrected methodologies [11], do not require distributional assumptions on the ME and/or the error-prone covariates. Our results based on simulation studies show that the proposed methodologies perform satisfactorily. We also analyzed the CODING data, showing that when the ME is relatively large, the bias it causes can dramatically affect the parameter estimates and, therefore, the interpretation of the corresponding values.

Other authors have suggested methodologies to deal with ME in linear and nonlinear models. Some studied regression calibration and simulation extrapolation [12-14]. These two methods are only "approximately" consistent, which means that even for large sample sizes, they still require small ME to perform well. Likelihood-based methods have also been investigated (for example, [9] and [12]). Generally, likelihood approaches suffer from restrictive distributional assumptions on the ME, the covariates with ME, and the model error term. Since error-prone covariates and ME are unobservable, likelihood-based approaches might not be realistic. The approaches proposed in this paper do not require parametric assumptions for the distributions of the unobserved covariates and of the measurement errors, which are difficult to check in practice. They also perform well, no matter how large the ME is. Moreover, the same methodologies may be applied to any interaction model between categorical and continuous variables. However, in those models, both sensitivity and specificity need to be estimated.

ME models, in general, require extra information such as replicate data, internal or external validation data, or instrumental variables, in order to be identifiable. For example, Abarin and Wang [15] proposed a semi-parametric method for estimating parameters of generalized linear regression models with the classical ME model using instrumental variables. In the case that no extra information is available, sensitivity analysis is performed.

The methodology proposed in this paper can be generalized to longitudinal models. Fan et al. [16] proposed a bias-corrected quasi-likelihood approach for longitudinal models, where continuous covariates are subject to error. Generalization of the methodology to longitudinal models with both misclassification and ME, and the interaction between them, is yet to be studied. More studies are also required to extend the proposed methodology to GEI models where the classified variable has more than two categories.

Overall, the results of this paper contribute to enhancing the discovery of genetic and environmental factors in GEI studies. We developed modern yet flexible measurement error techniques that will improve the identification of genetic variants, environmental factors, and their interactions associated with any complex trait.

Acknowledgements

Research was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) and Research and Development Corporation Newfoundland and Labrador (RDC). The author is grateful to Dr. Guang Sun, Faculty of Medicine at Memorial University for providing the data for the analysis. The author is also grateful to the reviewers for their very helpful comments that improved the paper.

References
