Epigenomic Indicators of Age in African Americans

Age is a well-established risk factor for chronic diseases. However, the cellular and molecular changes associated with aging processes that are related to chronic disease initiation and progression are not well understood. Thus, there is an increased need to identify new markers of cellular and molecular changes that occur during aging processes. In this study, we use genome-wide DNA methylation from 26,428 CpG sites in 13,877 genes to investigate the relationship between age and epigenetic variation in the peripheral blood cells of 972 African American adults from the Genetic Epidemiology Network of Arteriopathy (GENOA) study (mean age=66.3 years, range=39–95). Age was significantly associated with 7,601 (28.8%) CpG sites after Bonferroni correction for α=0.05 (p<1.89×10⁻⁶). Due to the extraordinarily strong associations between age and many of the CpG sites (>7,000 sites with p-values ranging from 10⁻⁶ to 10⁻⁴³), we investigated how well the DNA methylation markers predict age. We found that 2,095 (7.9%) CpG sites were significant predictors of age after Bonferroni correction. The top five principal components of the 2,095 age-associated CpG sites accounted for 69.3% of the variability in these CpG sites, and they explained 26.8% of the variation in age. The associations between methylation markers and adult age are so ubiquitous and strong that we hypothesize that DNA methylation patterns may be an important measure of cellular aging processes. Given the highly correlated nature of the age-associated epigenome (as evidenced by the principal components analysis), whole pathways may be regulated as a consequence of aging.


Introduction
Age is a well-established risk factor for chronic diseases [1,2]. However, the cellular and molecular changes associated with aging processes that are related to chronic disease initiation and progression are not well-understood. As the United States transitions into an unprecedented increase in the number of aging adults over the next few decades [3], there is an increased need to identify new markers of cellular and molecular changes that occur during aging processes. These new markers may lead to earlier identification and more effective treatments for chronic disease.
Genetic biomarkers of age include telomere length, gene expression, and DNA methylation patterns. Telomere length decreases with age, and a recent review of 124 cross-sectional studies estimated a mean telomere loss of 24.7 base pairs per year in leukocytes [4]. Some [5,6], but not all [7], cross-sectional and longitudinal studies of telomere length in leukocytes have shown that African Americans have longer telomere lengths than European Americans, after adjusting for age. Telomere loss has also been shown to happen faster in African Americans [5,7]. Since telomere length has also been shown to be associated with chronic disease status, particularly cardiovascular disease [8] and mortality [9], it may serve as an important biomarker for human aging. Gene expression patterns have also recently shown promise as a physical marker of aging in humans. A study by Harries, et al. found that approximately 2% of transcripts genome-wide are robustly associated with age, and that six gene expression probes could be used to build an efficient model to distinguish between younger (<65 years) and older (≥75 years) subjects [10]. To date, little work has been conducted on gene expression patterns and their association with age in African American populations.
Recently, differential DNA methylation patterns that affect gene expression have been shown to be associated with aging [11]. More specifically, age has been found to be associated with methylation status in pathways related to liver development and metabolism [5,12], inflammation, endothelial function, oxidation [13,14], and tumor suppression [15,16]. Since DNA methylation and other epigenetic mechanisms provide a potentially modifiable link between a gene's expression and a resulting phenotype [17][18][19][20], unraveling the relationship between epigenetic mechanisms and cellular aging processes is crucial to understanding the origins of chronic diseases and target organ damage that accompanies aging.
Many prior preliminary studies that have investigated the relationship between DNA methylation and aging processes have either focused on specific genomic regions, such as genes in a single biological pathway [13,14], or have investigated average whole-genome DNA methylation [11,21]. Studies of whole-genome methylation have consistently shown an overall decrease in methylation with increased age. Methylome-wide studies conducted in a variety of tissue types and across a wide range of age groups are now emerging [22][23][24][25][26][27].
These studies have shown significant age-associated changes in DNA methylation at many loci throughout the genome in pediatric (N=398 [23]; N=15 [24]) and adult populations (N=68 [22]; N=63 [23]; N=93 [26]). A few of the age-associated methylation sites have been shown to have a significant overlap between pediatric and adult populations [23,24]; however, the rate of change of DNA methylation with age is estimated to be three- to fourfold faster in pediatric populations [23]. In accordance with the whole-genome methylation studies, the comparison of a newborn and a centenarian genome showed more hypomethylated DNA in the centenarian genome across promoters, exonic, intronic, and intergenic regions, though a greater level of methylation was observed in CpG island promoter regions [27]. Methylome-wide and gene-specific studies have also focused on developing predictive models for age [22,28]. For example, Bocklandt et al. showed that the methylation of three CpG sites is linear with age in adults 18 to 70 years of age, and can predict age with high accuracy (average accuracy of 5.2 years) [22].
Despite the benefits of preliminary studies discussed above, the majority of methylome-wide studies have been conducted in European American samples and/or have consisted of relatively small sample sizes (N<400). In this study, we use genome-wide DNA methylation information from 26,428 individual CpG sites in 13,877 genes to investigate the relationship between age and epigenetic variation in the peripheral blood cells of 972 African American adults from the Genetic Epidemiology Network of Arteriopathy (GENOA) study. We also compare our findings across 4 studies that used the same method for measuring DNA methylation (Illumina Infinium HumanMethylation27 BeadChip) to identify those sites that replicate across studies. Building off of other studies, this work can help to begin identifying the chromosomal regions and pathways involved in the epigenetics of aging.

Sample
The Genetic Epidemiology Network of Arteriopathy (GENOA) study is a community-based study of hypertensive sibships that was designed to investigate the genetics of hypertension and target organ damage in African Americans from Jackson, MS [29]. In the initial phase of the GENOA study (Phase I: 1996-2001), all members of sibships containing ≥2 individuals with essential hypertension clinically diagnosed before age 60 were invited to participate, including both hypertensive and normotensive siblings (N=1,854). In the second phase of the GENOA study (Phase II: 2000-2004), 1,482 participants were successfully re-recruited for a second examination. DNA methylation was measured on 1,008 African American participants using stored blood samples collected during the second (Phase II) examination. The Phase I and II examinations included questionnaires to assess health status, health behaviors, and medical history; physical examination for blood pressure, height, and weight; and fasting blood samples for creatinine, cholesterol, glucose, insulin, and other biochemical measures [30].

Measurement of DNA methylation
Sample preparation and methylation assay: DNA was isolated from peripheral blood leukocytes obtained from stored blood samples, and was bisulfite-converted with the EZ DNA Methylation Gold Kit (Zymo Research, Orange, CA). Bisulfite-converted DNA samples were whole-genome amplified, enzymatically fragmented, and purified, then hybridized to Illumina Infinium HumanMethylation27 BeadChips, which contain locus-specific DNA oligomers and a set of 56 control probes. The array was then fluorescently stained, scanned using the Illumina BeadXpress reader, and assessed for fluorescence intensities across the methylated and unmethylated bead types at 27,578 CpG sites [31][32][33]. This work was performed at the Genotyping Core in the Mayo Clinic Advanced Genomics Technology Center (Rochester, MN).
Data processing and methylation quantification: At each CpG site, fluorescent signals were measured from the site-specific M (methylated) and U (unmethylated) bead types. The raw fluorescence data were processed using Illumina BeadStudio. To reduce batch and chip effects, the correlation structure among all 56 control probes was evaluated within channel to identify the most parsimonious subset of probes that explained the maximum amount of batch and chip variation across samples (5 probes in the red channel and 8 probes in the green channel). We adjusted for batch and chip effects by linearly regressing the 13 selected probes onto the intensity signals from the methylated and unmethylated bead types separately across each CpG site.
Before statistical analysis, samples were checked for data quality. Seven samples were excluded from analysis due to poor bisulfite conversion efficiency (intensity <4,000), and an additional 29 samples were excluded due to extreme control probe values (i.e., at least one control probe greater than four standard deviations from its mean value). This resulted in a total sample size of 972.
In this study, we analyzed only autosomal CpG sites. Since our modeling strategy assumes that the error terms for the regression on CpG sites are normally distributed [34], we removed 58 CpG sites from the analysis because they were found to be multimodal based on the Dip Test of unimodality proposed by Hartigan and Hartigan [35] using a cut-off of p<0.001 on the signal intensities of the methylated and/or unmethylated bead types. This resulted in 26,428 CpG sites included in our analysis. We next identified the 2,984 CpG sites with non-specific binding probes and 908 CpG sites with polymorphic probes that overlap with single nucleotide polymorphisms (SNPs) reported by Chen et al. [36]. Although these sites were not removed from the analysis, we have interpreted the results from these sites with caution. That is, we acknowledge that the relationship between DNA methylation and age at these sites may be in part influenced by probe characteristics.
Finally, an M-value for each individual $i$ at a single CpG site, $k$, was calculated as: $\text{M-value}_{ik} = \log_2\left[\frac{\max(M_{ik},0)+1}{\max(U_{ik},0)+1}\right]$ [37]. Relatively unmethylated M-values were considered to be <−2, methylated M-values were >2, and semi-methylated M-values were between −2 and 2. These M-value cut-offs correspond to β values of 0.2 and 0.8 [37], where β is the ratio of the signal from the methylated probe to the sum of the methylated and unmethylated probes, as follows: $\beta_{ik} = \frac{\max(M_{ik},0)}{\max(M_{ik},0)+\max(U_{ik},0)+100}$. M-values greater than four standard deviations from the mean of each CpG site were removed because these values are discontinuous with the distribution and extend beyond the point where 99.9% of the values are predicted to lie, according to the Empirical Rule [38]. A total of 28,278 outliers were removed from the 26,428 CpG sites included in the analysis. The number of outliers removed ranged from 0 to 34 across all sites (mean=1.07, sd=1.74).
Smith et al. Hereditary Genet. Author manuscript; available in PMC 2016 January 21.
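The M-value and β definitions above, together with the ±2 cut-offs and the four-standard-deviation outlier rule, can be illustrated with a minimal Python sketch. This is not the authors' pipeline (which used Illumina BeadStudio and R); the function names, the toy intensities, and the choice of the sample standard deviation are assumptions made purely for illustration.

```python
import math
import statistics


def m_value(m_signal, u_signal):
    """M-value: log2 ratio of methylated to unmethylated intensity (offset of 1)."""
    return math.log2((max(m_signal, 0) + 1) / (max(u_signal, 0) + 1))


def beta_value(m_signal, u_signal):
    """Beta value: methylated share of total intensity (offset of 100)."""
    return max(m_signal, 0) / (max(m_signal, 0) + max(u_signal, 0) + 100)


def beta_to_m(beta):
    """Idealized conversion that ignores the offsets: M = log2(beta / (1 - beta))."""
    return math.log2(beta / (1 - beta))


def methylation_class(m):
    """Apply the cut-offs from the text: < -2, between -2 and 2, > 2."""
    if m < -2:
        return "unmethylated"
    if m > 2:
        return "methylated"
    return "semi-methylated"


def drop_outliers(values, n_sd=4.0):
    """Keep values within n_sd sample standard deviations of the site mean.

    Assumption: the mean and SD are computed once, including the outliers.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - mean) <= n_sd * sd]


# Hypothetical intensities for one individual at one CpG site.
example_m = m_value(3000, 1000)
example_class = methylation_class(example_m)
```

Ignoring the offsets, a β value of 0.2 maps to log2(0.2/0.8) = −2 and a β value of 0.8 maps to log2(0.8/0.2) = +2, which is why the two sets of cut-offs correspond.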

Statistical analyses
Linear mixed effects modeling: We used a linear mixed effects modeling approach to evaluate the cross-sectional associations between DNA methylation and age while accounting for the familial relationships among study participants using the nlme package in R [39]. In order to examine the effects of age on DNA methylation, we considered each of the 26,428 individual CpG sites separately as outcomes, with participant age as a covariate in the following model:

$$\text{M-value}_{ijk} = \beta_{0k} + \beta_{1k}\,\text{Age}_{ij} + W_{jk} + \varepsilon_{ijk},$$

where $i$ indexes the participant, $j$ the sibship, and $k$ the CpG site, and $W_{jk}$ is the random effect for each sibship. Thus, in each model, sibship was modeled as a random intercept, and the rest of the effects were modeled as fixed effects. In performing this modeling, four CpG sites exhibited convergence issues and were subsequently removed from the analysis. The Bonferroni method was used to assess experiment-wise statistical significance of the p-values (Bonferroni-corrected p-value=1.89×10⁻⁶ for a significance level of α=0.05).
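The Bonferroni threshold quoted above follows from dividing the significance level by the number of tests. A minimal Python sketch of that arithmetic is shown here; the function names are hypothetical, and the number of tests is taken to be the 26,428 CpG sites named in the text.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that controls the experiment-wise error rate."""
    return alpha / n_tests


def is_significant(p_value, alpha=0.05, n_tests=26428):
    """True if a single CpG site passes the Bonferroni-corrected threshold."""
    return p_value < bonferroni_threshold(alpha, n_tests)


threshold = bonferroni_threshold(0.05, 26428)
print(f"{threshold:.2e}")  # 1.89e-06
```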
Due to the extraordinarily strong associations between age and many of the CpG sites (>7,000 sites with p-values ranging from 10⁻⁶ to 10⁻⁴³), we wanted to assess the joint effects of CpG sites with age. We first used a set of models to evaluate how well each of the DNA methylation markers predicted age. In these models, age was the outcome and each of the 26,428 CpG sites was a predictor, individually, in a linear mixed model:

$$\text{Age}_{ij} = \beta_{0k} + \beta_{1k}\,\text{M-value}_{ijk} + W_{jk} + \varepsilon_{ijk}.$$

We again used the Bonferroni method to assess experiment-wise statistical significance (Bonferroni-corrected p-value=1.89×10⁻⁶).
In order to better understand the joint effects and correlation structure of the large number of CpG sites associated with age, we performed principal component (PC) analysis. We calculated PCs using all 2,095 CpG sites that were significantly associated with age at 1.89×10⁻⁶. From the scree plot of the PCs, we identified elbow points at 1 PC, 5 PCs, and 10 PCs. Next, we evaluated the bivariate association between age and each of the top five PCs in separate mixed models such that $\text{Age}_{ij} = \beta_0 + \beta_1\,\text{PC}_{ij} + W_j + \varepsilon_{ij}$. Finally, we evaluated the association between age and the top five PCs combined in a multivariable mixed model such that $\text{Age}_{ij} = \beta_0 + \sum_{m=1}^{5} \beta_m\,\text{PC}_{m,ij} + W_j + \varepsilon_{ij}$. We also constructed a multivariable mixed model using the top 10 PCs. $R^2$ values based on likelihood ratio models ($R^2_{LR}$) were calculated for each model using the R package lmmfit [40].
[...] (BMI) of 31.2 kg/m². Additional descriptive statistics are presented in Table 1. The mean M-value for each of the 26,428 CpG sites ranged from −5.37 to 5.07 with an average mean M-value across all sites of −1.58 (Figure 1). The majority of the sites (15,221 sites, 57.6%) were unmethylated, with a mean M-value of <−2.

Associations between age and CpG sites
In modeling age as a predictor of M-value, age was significantly associated with 7,601 (28.8%) CpG sites after Bonferroni correction for α=0.05. Of the sites with statistically significant associations, 671 (8.8%) contained nonspecific binding probes, 159 (2.1%) contained polymorphic probes, and nine sites (0.12%) had both non-specific binding and polymorphic probes as defined by Chen et al. [36]. Adding sex as a covariate into the model did not substantially change the associations between age and the CpG sites (7,410 of the 7,601 associations were still significant after accounting for sex). Table 2 shows the 30 CpG sites that were most strongly predicted by age. A striking finding of this analysis is that age had an inverse association with all but two of the top 30 CpG sites, indicating that increased age is strongly associated with decreased methylation at the majority of the most strongly associated sites.
The tendency for age to be inversely associated with CpG site methylation was also observed in the 7,601 CpG site M-values that were significantly predicted by age. Figure 2 shows the relationship between the mean M-value at each of the 26,428 sites and the t-statistic corresponding to the regression coefficient for age. The t-statistic on the y-axis provides two types of information: a) the magnitude of the association with age, and b) the direction of the association with age. For example, a t-statistic of −5.0 corresponds to a p-value of approximately 5×10⁻⁷ and indicates that increasing age is associated with decreasing methylation. Of the 7,601 sites statistically significantly associated with age, 7,292 (95.9%) had negative t-statistics, while only 309 (4.1%) had positive t-statistics. Of the 7,292 CpG sites with negative t-statistics, 5,589 sites (76.6%) were unmethylated, 1,675 (23.0%) were semi-methylated, and 28 (0.4%) were methylated. The increased density of negative t-statistics for unmethylated markers (M-values <−2) indicates that they are increasingly less methylated with older age. In contrast, of the 309 sites with positive t-statistics, 34 (11.0%) were unmethylated, 106 (34.3%) were semi-methylated, and 169 (54.7%) were methylated. The increased density of positive t-statistics for methylated markers (M-values >+2) indicates that these methylated markers are increasingly more methylated with older age. A final feature of the genome-wide results displayed in Figure 2 is that the vast majority of the most significant associations with age (p<10⁻¹⁰) appear to be in markers that are semi-methylated (M-values between −2 and +2).
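The correspondence between a t-statistic and its p-value and direction, as read from Figure 2, can be sketched in Python. This sketch assumes a standard normal approximation to the t-distribution, which is reasonable only because the sample is large (N=972); the authors' exact degrees of freedom are not stated here, and the function names are hypothetical.

```python
import math


def two_sided_p_from_t(t_stat):
    """Two-sided p-value under a standard normal approximation (large-sample assumption)."""
    return math.erfc(abs(t_stat) / math.sqrt(2))


def direction_with_age(t_stat):
    """Sign of the t-statistic gives the direction of the association with age."""
    if t_stat < 0:
        return "less methylated with increasing age"
    if t_stat > 0:
        return "more methylated with increasing age"
    return "no association"


p_example = two_sided_p_from_t(-5.0)   # on the order of 5e-7
label_example = direction_with_age(-5.0)
```

Under this approximation a t-statistic of −5.0 yields a p-value just under 6×10⁻⁷, in line with the order of magnitude quoted in the text and below the Bonferroni threshold of 1.89×10⁻⁶.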
Given the very large number of highly significant age associations with DNA methylation at CpG sites, we investigated how well the DNA methylation markers could predict age. We examined linear mixed models of CpG site M-values as predictors of age and found 2,095 (7.9%) sites that were significant predictors of age after Bonferroni correction with experiment-wise α=0.05. Supplemental Table 1 shows the 30 CpG sites that had the strongest association with age. Nearly all (2,086, 99.6%) of these sites were also significant in the previously evaluated regression of M-values on age, and had the same direction of effect.
Principal components of the 2,095 age-associated CpG sites were estimated in order to examine the features of the multivariable distribution of significant epigenetic predictors of age (Table 3). The top five principal components accounted for 69.3% of the variability in the 2,095 CpG sites, and the next five principal components accounted for an additional 4.7% (i.e., a total of 74.0%). When each of the top five PCs was used individually as a predictor of age, each of the first four PCs was significantly associated with age. In a multivariable model, the top five PCs combined explained 26.8% of the variation in age. The linear mixed model containing the top 10 PCs together explained an additional 9.22% (i.e., a total of 36.5%) of the variation in age.
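To illustrate what "proportion of variability accounted for by the top PCs" means, the following Python sketch computes it on a toy data set with three hypothetical sites, two of which are perfectly correlated. It uses power iteration with deflation on the covariance matrix, chosen only to keep the example dependency-free; it is not the authors' implementation, and the data values are invented for illustration.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix of the columns (rows = individuals, columns = sites)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
            for a in range(p)]


def top_eigenpair(matrix, n_iter=500):
    """Leading eigenvalue and unit eigenvector of a symmetric matrix by power iteration."""
    p = len(matrix)
    vec = [1.0 + 0.1 * i for i in range(p)]  # deterministic starting vector
    for _ in range(n_iter):
        nxt = [sum(matrix[a][b] * vec[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0:
            return 0.0, vec
        vec = [x / norm for x in nxt]
    mv = [sum(matrix[a][b] * vec[b] for b in range(p)) for a in range(p)]
    return sum(v * m for v, m in zip(vec, mv)), vec  # Rayleigh quotient


def variance_explained(rows, n_components):
    """Fraction of total variance captured by each of the first n_components PCs."""
    cov = covariance_matrix(rows)
    p = len(cov)
    total = sum(cov[i][i] for i in range(p))
    fractions = []
    for _ in range(n_components):
        val, vec = top_eigenpair(cov)
        fractions.append(val / total)
        # Deflation: remove the component just found before searching for the next.
        cov = [[cov[a][b] - val * vec[a] * vec[b] for b in range(p)] for a in range(p)]
    return fractions


# Toy M-values: site 2 is exactly twice site 1, site 3 varies independently.
toy = [[1, 2, 0.5], [2, 4, -0.5], [3, 6, 0.5],
       [4, 8, -0.5], [5, 10, 0.5], [6, 12, -0.5]]
fracs = variance_explained(toy, 2)
```

Because two of the three toy sites move together, the first PC captures nearly all of the variance, which mirrors how a small number of PCs can summarize a highly correlated set of age-associated CpG sites.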

Discussion
Our findings in GENOA African Americans suggest that age and DNA methylation are very strongly associated at many CpG sites across the genome (28.8% of the CpG sites that we examined). In this study, the associations between the methylation markers and adult age are so ubiquitous and strong that we hypothesize that DNA methylation patterns may be an important measure of cellular aging processes in this population. Given the highly correlated nature of the age-associated epigenome (as evidenced by the principal components analysis), whole pathways may be regulated as a consequence of aging.
Consistent with previous studies in humans and other vertebrates [41][42][43], we found that the majority of CpG sites (95.9%) tended to be less methylated with increased age (Figure 2). These changes in methylation may contribute to chronic diseases through a variety of mechanisms. For example, it has been found that loss of methylation in CpG dinucleotides over time may transcriptionally activate silenced retrotransposons and lead to genomic instability [44,45]. We also detected a minority of sites (4.1%) that were more methylated with increased age. Increases in methylation at CpG dinucleotides may prevent the binding of transcription factors and potentially suppress gene expression [46]. More investigation of the pathways implicated in these sets of sites may lead to important insights into aging and disease processes. However, replication of these sites would be an important prerequisite to detailed pathway analysis.
Previous research has indicated that DNA methylation is a molecular representation of the cellular memory of environmental experiences. We found that the joint effects of 2,095 CpG sites, represented in the top 10 principal components, were able to explain ~36% of the variation in age among our GENOA African American adults (mean age=66.3 years; SD=7.6). This indicates that epigenetic markers may be an important link to understanding the genetic and environmental components that contribute to inter-individual differences in the aging process.
Several other studies conducted in a variety of populations have examined the association between age and DNA methylation across the genome using the same Illumina Infinium HumanMethylation27 microarray platform that was used in this study [22][23][24][25]. We were able to replicate many of the associations between CpG sites and age that were detected in other studies; however, the extent of replication in GENOA African Americans varied according to the age distribution of the other study population, as well as the tissue type used to measure methylation. Table 4 summarizes the findings from studies that have examined the association between age and DNA methylation and the extent of replication of these findings in GENOA African American adults.
Briefly, we replicated 84.4% of the age-associated CpG sites from a study of saliva samples from 34 monozygotic twin pairs aged 21-55 years conducted by Bocklandt et al. [22] (p-value <0.05 in GENOA and the same direction of effect). In a study of whole blood methylation from 398 healthy males aged 3 through 17 years conducted by Alisch et al., we replicated 72.5% of the age-associated CpG sites [23]. Of the sites that we replicated from the Alisch et al. study, the majority (84.6%) were less methylated with increasing age. In order to assess methylation patterns throughout different phases of development, Numata et al. examined methylation in the dorsolateral frontal cortex of the brain in study groups of varying ages (fetal (N=30), childhood, ages 0-10 years (N=15), and beyond childhood, age >10 years (N=63)) [24]. Despite using a biologically available tissue, we replicated 13%, 49%, and 63% of the frontal cortex age-associated CpG sites in these study groups, respectively. Finally, we replicated 86% of the age-associated CpG sites from a study conducted by Teschendorff et al., which examined the association between age and DNA methylation from whole blood samples of postmenopausal women (N=113 ovarian cancer cases and N=148 controls) [25]. Of the sites replicated in GENOA, the majority (69.3%) were less methylated with increasing age.
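The replication criterion stated above (p-value <0.05 in GENOA and the same direction of effect) can be expressed as a short Python sketch. The site identifiers, effect sizes, and p-values below are hypothetical placeholders rather than values from any of the cited studies, and the data layout is an assumption.

```python
def replication_rate(previous_sites, genoa_results, alpha=0.05):
    """Fraction of previously reported sites that replicate.

    previous_sites: dict mapping site ID -> reported direction (+1 or -1).
    genoa_results:  dict mapping site ID -> (age effect estimate, p-value).
    A site replicates if p < alpha and the effect has the same sign.
    Sites absent from genoa_results are skipped (an assumption of this sketch).
    """
    evaluated = 0
    replicated = 0
    for site, prior_sign in previous_sites.items():
        if site not in genoa_results:
            continue
        evaluated += 1
        effect, p_value = genoa_results[site]
        same_direction = (effect > 0) == (prior_sign > 0)
        if p_value < alpha and same_direction:
            replicated += 1
    return replicated / evaluated if evaluated else float("nan")


# Hypothetical example: four sites reported elsewhere, tested in the new cohort.
previous = {"site_A": -1, "site_B": -1, "site_C": +1, "site_D": -1}
genoa = {
    "site_A": (-0.02, 1e-8),   # significant, same direction: replicates
    "site_B": (0.01, 0.001),   # significant, opposite direction
    "site_C": (0.03, 0.20),    # same direction, not significant
    "site_D": (-0.01, 0.04),   # significant, same direction: replicates
}
rate = replication_rate(previous, genoa)
```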
A variety of factors may have contributed to the differences in findings between the present study and previous studies. Different tissue types display differences in methylation patterns, and there is also a substantial difference between the methylation patterns observed between tissue samples and blood samples [47]. It is not surprising that we replicated a much higher percentage of the age-associated sites from studies that measured methylation in peripheral blood than studies that used tissue samples. Population demographics of the studies may also have contributed to differences in findings. The GENOA population is African American, has an older average age than other populations studied, and is primarily hypertensive. The higher prevalence of hypertension, diabetes, and obesity in this population and/or the higher prevalence of risk factors for these chronic diseases (such as diet, stress, and physical activity) may have led to specific DNA methylation signatures. Since we assessed a cell population of peripheral blood leukocytes that consists largely of neutrophils (40-75%) and lymphocytes (16-48%) [48], we recognize that we may be exploring the aging processes of these cell types, which are involved in promoting chronic inflammation, a frequent correlate of common chronic diseases. Differences in statistical techniques and sample sizes may also have led to differences in the significance levels of age-associated sites, and hence the comparability across studies. However, despite these important differences between studies, we can conclude that there are many CpG sites that are associated with age across a variety of studies, and that our study contributes to a growing body of knowledge that indicates groupings of CpG sites that are important indicators of age and developmental stage across a variety of populations.
Our study does have several limitations. First, as discussed above, the study population is African American, of older age, and primarily hypertensive. Thus, findings may not be entirely generalizable to populations of other ethnic backgrounds, ages, or disease history profiles. However, the GENOA study is a community-based sample that is composed of both hypertensive and normotensive individuals in sibships that have demographics that are similar to other families in the community (age range=39-95 years) [29]. A second limitation is that we do not know the extent to which genetic variation influences epigenetic variation. If there is a substantial influence, then admixture in the African American community from Jackson, MS may affect the results of this study. A third limitation of this study is that we only have cross-sectional measures of methylation and age. Since we do not have longitudinal measures of methylation, we cannot assess how methylation changes with age in individual participants.
This study shows that in this population of GENOA African Americans, many CpG sites are strongly associated with age and predict a substantial amount of variation in age. Future research should include a closer examination of the highly significant markers to determine their molecular physiological role in the aging process. Another avenue of research would be to identify individuals whose methylation profiles differ markedly from what would be expected at their chronological age in order to understand how these markers translate into physiological differences. From a clinical and public health perspective, differences between chronological age and cellular age could be used to identify individuals at greater risk of premature aging and age-related chronic diseases.

Supplementary Material
Refer to Web version on PubMed Central for supplementary material.
Table 2. Top 30 methylation sites most strongly predicted by age. Model: $E_{ij} = \beta_0 + \beta_1\,\text{Age}_{ij} + W_j$. Probes are designated as polymorphic and/or non-specific binding according to Chen et al. [36].
CpG sites listed within this table were not among those with non-specific binding probes.
Table 4. Comparison of age-associated methylation sites between GENOA and previous studies.