Received Date: January 30, 2013; Accepted Date: April 08, 2013; Published Date: April 13, 2013
Citation: Ryu D, Xu H, George V, Su S, Wang X, et al. (2013) Quantifying and Normalizing Methylation Levels in Illumina Arrays. J Biomet Biostat 4:164. doi:10.4172/2155-6180.1000164
Copyright: © 2013 Ryu D, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Visit for more related articles at Journal of Biometrics & Biostatistics
Measure of methylation level; Methylated signal variability; Test for differential methylation; N-value
Interest in the role of genome-wide patterns of methylation in human disease has increased in recent years . The epigenome, in general, and the methylome, more specifically, have the potential for large effects in disease etiology. DNA methylation has already been associated with many cancers , and different cell types are known to differ in their methylation patterns. Illumina has been developing genome-wide methylation arrays that enable epigenome-wide association studies of human disease . These arrays are based on BeadChip technology, and the most recent ones contain probes for over 480K CpG sites. These sites cover 99% of RefSeq genes with multiple probes per gene, and 96% of CpG islands from the UCSC database.
This new array utilizes some of the site-specific probes from the previous generation chip that contains probes for 27K CpG sites . Probes for the sites from the 27K array use the Infinium I assay, while the newer probes use the Infinium II assay. The Infinium I assay is based on separate beads for methylated and unmethylated DNA, and the Infinium II assay relies on a single bead for both methylated and umethylated DNA . Both assays result in red- and green-channel intensities, which are normalized using a proprietary method in BeadStudio .
Regardless of the probe design, tests for differential methylation involve comparing the relative methylated/unmethylated signal among experimental conditions . The standard output from BeadStudio provides estimates of the percent signal that is methylated, usually denoted by β. This value is a natural way to summarize the relative methylated/unmethylated signal. Although β is a convenient way to summarize the extent of methylation for any given CpG site, statistical analyses aimed at detecting differential expression may be optimized using other measures. Recently, Du et al.  showed that using a logistic transformation of β, named M-value, may yield better results. However, neither β nor M take into account, the variability in both the methylated and unmethylated signal. We investigated statistical approaches that utilize a weighted logistic transformation of the methylated and unmethylated signals, with the goal of improving the analyses aimed at detecting differential methylation. The method that we develop below takes into account the probe specific variances in summarizing methylation levels.
Another aspect to ensuring accurate results is the need for data normalization, prior to statistical testing for differential methylation. The Illumina software does normalize probe signals in calculating the methylated and unmethylated signals. While much work has focused on normalizing data obtained using BeadArrays for gene expression, less research has focused on normalization of the methylation arrays . While not all studies normalize data beyond that supplied through BeadStudio, some studies have used various normalization strategies, including quantile [7,8], and mean normalization [9,10]. We observed that some arrays in our data had very different distributions of signals after normalization based on the Illumina software, and we were therefore, concerned about between array normalization. Sun et al.  has examined the use of various normalization methods for adjusting for batch effects. Teschendorff et al.  examined several approaches to normalization, examining several factors that could affect the analyses, including batch, DNA input and bisulfite conversion efficiency, as measured using control probes. The optimal normalization for their data was a linear regression that included batch, DNA input and bisulfite conversion efficiency. Bell et al.  normalized β to follow the standard normal distribution, and then used this normalized β in all analyses. We demonstrate below that our proposed method for summarizing methylation levels also has the advantage, in that it appropriately normalizes relative methylation levels.
Definitions of β-value, M-value and N-value
The Illumina software provides several measures for summarizing methylation levels: the average signal for both methylated and unmethylated “probes” (Infinium I utilizes separate probes, while Infinium II utilizes a single probe to generate both a methylated and unmethylated signal), the standard deviation of both methylated and unmethylated signals, and an estimate of the percent of chromosomes that are methylated, known as β-value. The β-value for the ith CpG site on the jth array, where i =1,.. .,I and j =1,.. .,J, is defined as the ratio of the average signal of the methylated probe (methylated signal) divided by the total average signal of both the methylated and unmethylated probes (methylated and unmethylated signal),
Where is the methylated signal, is the unmethylated signal, and α is a constant offset (α=100 by default). Du et al.  suggested the M-value,
Which is equivalent to a log2 logistic transformation of β with a constant offset α.
To define the N-value incorporating the standard deviations of signals, we assume that the logarithm of the average signal is proportional to the logarithm of the standard deviation of signal. Let denote the standard deviation of the methylated signal, and let uijSdenote the standard deviation of the unmethylated signal for the ith site on the jth array. The linear relationship between average and standard deviation of signal in log-scale is then described using two regression models,
Where ,, and are regression coefficients, and and are normal random errors with zero means and constant variances, respectively for the methylated and unmethylated signals.
We define the N-value based on a scaled β. First, we obtain scaled estimates of average methylated signal, and unmethylated signal,by utilizing estimated parameters and in the above regression models. In fact, the exponential residuals from the two regression models, and are scaled average signals, with respect to standard deviations of signals:
Next, the scaled average signals, and are used to define a scaled β,
Which, in turn, is used to define the N-value,
The N-value is based on adjusting both the methylated and unmethylated signals relative to the expected signals, given the observed standard deviations for signals. While these normalized signals have little meaning in themselves, the induced β measures the relative strength of the methylated and unmethylated signals in relation to the expected signals, given the standard deviations. The important measure in this case is the normalized β. Comparing equations (1) and (2) shows that the N-value can be viewed as a version of the M-value rescaled by the standard deviation. We will further show that this rescaling results in a normalization of the signals, adjusting the distribution of intensities that vary among samples with different standard deviations. The analysis of differential methylation using all three quantification methods (β, M, and N), is next considered using data from a pilot experiment.
We utilize data collected from 7 obese samples (case samples) and 7 age-matched lean control samples, using the 27K Illumina array (27,578 CpG sites ; NCBI’s Gene Expression Omnibus accession number GSE25301). These 14 subjects were identified from the participants (n=534) in the Lifestyle, Adiposity, and Cardiovascular Health in Youth (LACHY) study, using the following inclusion criteria: (1) African American ancestry; (2) male; (3) having leukocyte DNA available; (4) obese cases having a body mass index (BMI) ≥ 99th percentile for age and sex, and lean controls having BMI ≤ 10th percentile for age and sex. The LACHY study consisted of roughly equal numbers of African American and European American adolescents, aged 14 to 18 years, of both sexes recruited from high schools in the Augusta, Georgia area .
We conducted a simulation study to compare the performance of testing for differential methylation, using a t-test and each of the three measures of methylation level. In this simulation study, the percent of CpG sites designated as being differentially methylated was 5%, 10%, 15%, or 20%, with the specific sites being chosen randomly to be differentially methylated. Data for both cases and controls were simulated using the same distributions for all sites designated as not differentially methylated. Data for cases and controls were simulated from different distributions for all sites designated as being differentially methylated. Some differentially methylated sites were simulated with the case samples having a different distribution from the null distribution, and some sites were simulated with the control samples having a different distribution from the null distribution.
We simulated the average signals and standard deviations of signal, which were then used to calculate the β-, M, and N-value as above. The methylated and unmethylated signals were simulated separately, since these two signals had slightly different distributions in the obesity data. We simulated standard deviations of methylated signal, and unmethylated signal, for the CpG site i, and the sample j, from lognormal distributions such that
The means and standard deviations used in the simulation were estimated using the obesity data. Intensities were simulated next, by using simple linear regression models of log intensity on log standard deviation for the intensities of the obesity data such that and where and and are regression coefficients.
The least square estimators of these regression coefficients, and follow normal distributions such that and where and are means and variances of the estimated regression coefficients. We estimated the variances of regression errors through the best quadratic unbiased estimators denoted by and . Similarly, we have estimated means and variances of regression coefficients, and for methylated signals, and and for unmethylated signals based on the regressions fit separately for each sample.
Random samples of regression coefficients were then generated by for methylated probes and for unmethylated probes. These simulated regression coefficients were then used to simulate the intensities using the regression equations
The distribution of signal intensity is heterogenous across samples in the obesity data. The simulation procedure described above does not result in the heterogeneity observed in the data. We, therefore, also simulated data with increased heterogeneity, in which the data from one subject, subject o, came from modified distributions. For the methylated probes, the standard deviations were generated by To maintain the linear relationship between the standard deviations and the intensities, regression coefficients were sampled from These regression coefficients were then used to generate the sample intensities as above.
The unmethylated probes were simulated using larger standard deviations than the methylated probes, such that The linear relationship between standard deviation and intensity was generated using
Differences between cases and controls were generated by shifting the standard deviations in the case samples. In one set of simulations, these standard deviations were shifted by 1, and in another set of simulations these standard deviations were shifted by -1.
We examined 1,000 simulations for all scenarios. Two sample sizes for cases and controls were considered: 10 and 20, under two well known significance levels α (Type I error): 0.05 and 0.01.
Signal intensity mean and standard deviation
Two measurements are made for each CpG site, one methylated and the other unmethylated. Any measure of relative methylation levels will ultimately depend on these signal intensities. As such, we first examined the distribution of the mean signal intensity, as well as the standard deviation of the signal intensity for each CpG site. Comparing the distribution of the mean signal intensity among samples showed that the samples are not homogenous, with the signal standard deviation showing a similar pattern (Figure 3). Of note was case sample #5, which has a wider range in mean and standard deviation of the signal. Further, the signal distributions showed greater variability among cases than among controls. A linear relationship between the mean and the standard deviation of the signals appeared to underlie these differences (Figure 4). We utilized this linear relationship to obtain our proposed N-value (Equation 2).
Distributions of measures of methylation level
Although the distributions of the original signals showed clear differences among samples, these differences need not translate into differences in the three summary statistics, β, M, and N. As noted above, β ranged from 0 to 1, with 0 indicating that none of the DNA molecules in that sample were methylated, and 1 indicating that all of the DNA molecules in that sample were methylated. Although β provided an easily interpretable measure of methylation level, its distribution for each sample was highly skewed and slightly bimodal (Figure 1). General concerns about analyzing proportions suggest that a logistic transformation might be appropriate (e.g. M-value, Equation 1). The distribution of the M-value for each sample was bimodal, with the two peaks being more prominent than that for β. While the distribution of β and M for case sample #5 did not appear to be as drastically different from the other samples, the variability in the distribution of both β and M was greater among the cases than among the controls. This heterogeneity in distributions suggested the need for normalization.
Figure 1: Distributions of β, M, and N for obesity data. The first column shows dot plots of the respective summary statistic of methylation levels for each sample, with controls and cases plotted separately. The middle and the right columns show kernel density estimates of each sample separately for cases and controls. Samples are color coded as follows: — sample #1, — sample #2, — sample #3, — sample #4, — sample #5, — sample #6, and — sample #7.
Figure 2: Relationship between means and standard deviations of summary measures of methylation levels among samples. β-value (top, left), M-value (top, right) and N-value (bottom). Light blue blocked lines distinguish upper, middle, and lower regions of values, which have been used in Du et al. . The red line represents the Lowess fit.
Given the linear relationship observed between mean signal intensity and the standard deviation of the signal, the sample specific signal for any given CpG site could be considered as the deviation from the expected relationship between mean and standard deviation (Figure 4). Using these adjusted signal intensities, we defined the N-value (Equation 2). The distribution of the N-value was symmetric, and more importantly, showed less variability among samples than that observed for β and M (Figure 1).
Figure 3: Distribution of signal intensities and standard deviations. (a) Each plot is a dot plot showing the means and standard deviation of the signal for each target site, with control samples on the left and case samples on the right. Red vertical lines indicate the data for either control sample #5 or case sample #5 (b) Kernel density estimates are shown that correspond to the dot plots in (a) Samples are color coded as follows:— sample #1, — sample #2, — sample #3, — sample #4, — sample #5, — sample #6, and — sample #7.
Being a proportion, the variance in β among samples was likely to be associated with the mean of β, which was what we observed (Figure 2). This relationship for β occurred mostly when 0.2 ≤ β ≤ 0.8. The M-value also showed a relationship between the standard deviation and mean among the samples (Figure 2). Our proposed N-value was quite different, with no obvious relationship between the mean N-value and the standard deviation of the N-value (Figure 2). This property should result in the N-value being more appropriate for testing for differences in methylation levels based on a test of mean differences such as a t-test.
Comparison of differential methylation analyses
We compared the results of our analyses to detect differential methylation, using the three summary measures, β, M, and N. We used t-tests to test for differential methylation on a per site basis, using all three summary measures. Using all samples, the analysis using β resulted in 2,091 CpG sites at the 0.05 significance level (Table 1). Analysis using M identified 2,115 sites, and analyses of N identified 1,508 sites. The number of sites identified when case #5 is excluded from the analyses, changed with β identifying 1,901 sites, M 2,137 sites, and N 1,549 sites. Although both β and M identified a larger number of candidate CpG sites under both analyses, these analyses showed the lowest consistency, as determined by the percent of the CpG sites that have p ≤ 0.05 in both analyses (Table 1). This improved consistency suggested that the N-value results are more stable, mainly due to the normalization that is inherent in N compared to both β and M.
|Measure||Site with p ≤ 0.05||Total|
|with case #5||regardless #5||without case #5|
|β-value||809 (29.85%)||1,282 (47.31%)||619 (22.84%)||2,710|
|M-value||592 (21.69%)||1,523 (55.81%)||614 (22.50%)||2,729|
|N-value||361 (18.90%)||1,147 (60.05%)||402 (21.05%)||1,910|
The differentially methylated CpG sites have been identified by two-sample t-test, with p ≤ 0.05 for 7 control samples and 7 case samples over 27,578 CpG sites. By including an outlying sample, case #5, the identified number of CpG sites is changed. Columns (a), (b) and (c) summarize the number of CpG sites identified to be differentially methylated under t-tests, with or without case #5: (a) t-tests identify corresponding sites only when case #5 is included; (b) t-test identify the sites regardless of inclusion of case #5; and (c) t-test identify the sites only when case #5 is excluded. The total column presents the number of identified CpG sites, with or without case #5 by three summary measures.
Table 1: Number of identified CpG sites as being differentially methylated in obesity data.
Our simulations were based on simulating the mean signal and signal standard deviation based on our observations in the obesity study. We simulated data, with and without the sample heterogeneity that was observed in our data. Our proposed N-value had the highest right decision makings, say true positive rates, based on using t-tests, where the significance level 0.1 brought a higher true positive rate than the significance level 0.05 (Table 2). All three summary measures, β, M and N, were affected by the introduction of heterogeneity into the data. Interestingly, the true positive rate for β increased when sample heterogeneity was added to the simulations, whereas M and N both exhibited decreased true positive rates. Overall, N tended to have higher true positive rates than β and M, although there was no significant difference between β and N for sample sizes of 20, when sample heterogeneity was present under significance level 0.05 (Table 2).
|n||R||Without Heterogeneity (α=0.05)||Without Heterogeneity (α=0.01)|
|10||5||38.77 (1.07)||42.65 (0.95)||50.32 (0.69)||63.77 (2.04)||69.02 (1.77)||80.26 (1.02)|
|10||55.44 (0.92)||59.42 (0.81)||66.97 (0.63)||77.55 (1.37)||81.39 (1.13)||89.11 (0.61)|
|15||67.13 (0.76)||70.58 (0.67)||76.96 (0.52)||84.94 (0.97)||87.71 (0.81)||93.04 (0.41)|
|20||73.40 (0.70)||76.60 (0.59)||82.07 (0.42)||88.27 (0.80)||90.57 (0.63)||94.74 (0.33)|
|20||5||48.81 (0.82)||50.71 (0.71)||52.43 (0.69)||79.26 (1.23)||81.52 (1.02)||84.56 (0.78)|
|10||65.37 (0.71)||67.09 (0.65)||68.72 (0.58)||88.22 (0.72)||89.68 (0.60)||91.57 (0.46)|
|15||75.67 (0.56)||77.10 (0.51)||78.43 (0.47)||92.47 (0.47)||93.44 (0.39)||94.72 (0.32)|
|20||80.89 (0.47)||82.23 (0.44)||83.43 (0.40)||94.26 (0.40)||95.11 (0.32)||96.14 (0.25)|
|n||R||With Heterogeneity (α=0.05)||With Heterogeneity (α=0.01)|
|10||5||40.51 (1.21)||32.88 (1.27)||46.41 (0.83)||65.63 (2.38)||62.37 (2.99)||73.59 (1.40)|
|10||57.64 (1.09)||49.11 (1.26)||63.18 (0.71)||79.35 (1.59)||76.47 (2.12)||84.57 (0.87)|
|15||68.59 (0.91)||61.25 (1.09)||74.11 (0.59)||86.00 (1.11)||83.98 (1.59)||90.26 (0.57)|
|20||74.89 (0.76)||67.72 (0.98)||79.42 (0.49)||89.14 (0.90)||87.22 (1.27)||92.43 (0.48)|
|20||5||51.73 (0.90)||48.49 (1.05)||52.31 (0.68)||81.26 (1.34)||80.26 (1.52)||84.23 (0.80)|
|10||68.04 (0.74)||65.23 (0.86)||68.59 (0.59)||89.59 (0.78)||88.94 (0.85)||91.38 (0.48)|
|15||77.50 (0.61)||75.41 (0.65)||78.33 (0.48)||93.25 (0.50)||92.89 (0.56)||94.61 (0.33)|
|20||82.55 (0.49)||80.54 (0.58)||83.36 (0.41)||94.92 (0.40)||94.52 (0.47)||96.03 (0.26)|
The average (standard deviation) of the true positive percent in 1,000 simulations is shown. The significance level (Type I error) of the test is α, the per group sample size is n, and the percent CpG sites that are differentially methylated is denoted by R. Results shown are for the scenario in which the case standard deviation was shifted by 1
Table 2: True positive rates for all CpG sites in simulation study without and with sample heterogeneity.
In addition to having better true positive rates, we also examined the consistency between results, with and without sample heterogeneity. All three summary measures showed increasing robustness with increasing sample size (Table 3). Regardless of sample size, our proposed N-value always showed greater consistency than β and M, while β always showed greater consistency than M.
|10||5||47.94 (1.06)||34.95 (0.88)||68.05 (0.92)||45.12 (1.76)||23.23 (1.36)||62.06 (1.55)|
|10||53.40 (0.93)||35.48 (0.78)||72.50 (0.79)||49.78 (1.43)||23.59 (1.02)||63.03 (1.19)|
|15||56.29 (0.80)||35.81 (0.68)||76.31 (0.66)||50.66 (1.33)||23.55 (0.88)||65.68 (1.00)|
|20||58.54 (0.73)||35.26 (0.62)||76.91 (0.59)||51.64 (1.11)||22.96 (0.79)||64.49 (0.92)|
|20||5||64.06 (0.90)||55.51 (1.10)||82.00 (0.68)||67.72 (1.43)||49.97 (1.57)||90.46 (0.71)|
|10||71.30 (0.74)||61.32 (0.99)||87.49 (0.48)||71.77 (1.02)||51.61 (1.24)||93.51 (0.48)|
|15||75.14 (0.58)||64.53 (0.81)||90.98 (0.36)||72.16 (0.82)||51.85 (0.97)||94.97 (0.35)|
|20||77.83 (0.57)||65.32 (0.81)||92.69 (0.31)||73.10 (0.73)||50.63 (0.89)||95.12 (0.32)|
The average (standard deviation) of the percent of sites with p ≤ 0.05, in both a simulation with and a simulation without sample heterogeneity is shown, summarized for 1,000 simulations. Tests have been performed under the given significance level α. The per group sample size is n, and the percent CpG sites that are differentially methylated is denoted by R. Results shown are for the scenario in which the case standard deviation was shifted by 1
Table 3: Consistency of test results between analyses with and without heterogeneity.
Interestingly, while the true positive rate was affected by the proportion of sites differentially methy-lated (Table 2), the consistency between results for simulations with and without sample heterogeneity was slightly affected by the proportion of sites differentially methylated (Table 3).
We proposed the N-value as a weighted logistic transformation of β, with the advantage that N also normalizes data across arrays. Using our existing obesity data to compare N with both β and M, we showed that the use of N yielded more stable results. Although results were more stable with N, it produced fewer discoveries. The effect sizes in an experiment comparing obese subjects with matched controls are expected to be small, and the number of sites showing differential methylation is likely to be small. The p-value in this case is more likely to be close to uniformly distributed. As such, the smaller number of discoveries with N in the obesity data may be more reflective of true signals. The potential increased accuracy of N relative to β and M was confirmed in our simulations. Further, the simulations demonstrated that N produces results that are more robust to the type of heterogeneity that we observed. Of note is that analyses of M result in increased accuracy, relative to β only when no sample heterogeneity was present, while β was more robust to sample heterogeneity. Our results suggest that N should be preferred to both β and M, with β being preferred to M.
Our results were entirely based on data obtained using the Infinium I assay. Most of the probes on the 450K array are based on the Infinium II assay, which may suggest that our results are not applicable here. The distribution of β is known to differ between Infinium I and Infinium II assays, and the distribution of the signal values will also differ between Infinium I and Infinium II. However, our proposed N-value only depends on separate methylated and unmethylated signals, in which there is a relationship between the signal standard deviation and the signal mean. Importantly, our method accounts for the uncertainty in mean signal, and appropriately adjusts for the variability in signal. Therefore, the N-value should show improved performance with the Infinium II assay as well.
Although we used linear regression to define the N-value here, any sort of regression relationship could be used, since the normalized signals were defined as residuals from the regression of the mean signal on signal standard deviation. The assumption about linear relationship between mean signal and signal standard deviation is one that can easily be examined with each data set, and the regression function adjusted appropriately. We have examined this relationship in preliminary data obtained from the 450K chip, and the relationship between mean signal and signal standard deviation was linear. Such results do suggest that the N-value is a potentially important summary statistic to be used in testing for differential methylation in methylome-wide association studies. Ultimately, the efficacy of different methods will need to be evaluated based on confirmatory biological findings based on analyses using the different summary measures.
This work was supported by intramural funds from Georgia Regents University. The authors appreciate many conversations with Dr. Huidong Shi and for his feedback about our work, and are thankful to the editor and reviewers for comments to enhance the manuscript.