The Role of Copy Number Variation in African Americans with Type 2 Diabetes-Associated End Stage Renal Disease

This study investigated the association of copy number variants (CNVs) in type 2 diabetes (T2D) and T2D-associated end-stage renal disease (ESRD) in African Americans. Using the Affymetrix 6.0 array, >900,000 CNV probes spanning the genome were interrogated in 965 African Americans with T2D-ESRD and 1029 non-diabetic African American controls. Previously identified and novel CNVs were separately analyzed and were evaluated for insertion/deletion status and then used as predictors in a logistic regression model to test for association. One common CNV insertion on chromosome 1 was significantly associated with T2D-ESRD (p=6.17×10⁻⁵, OR=1.63) after multiple comparison correction. This CNV region encompasses the genes AMY2A and AMY2B, which encode amylase isoenzymes produced by the pancreas. Additional common and novel CNVs approaching significance with disease were also detected. These exploratory results require further replication but suggest the involvement of the AMY2A/AMY2B CNV in T2D and/or T2D-ESRD, and indicate that CNVs may contribute to susceptibility for these diseases.


Introduction
Diabetes is a complex and heterogeneous disease with a staggering global impact: the most recent estimates indicate that 346 million people worldwide suffer from this disease [1]. Type 2 diabetes (T2D) is the most common form of diabetes, accounting for more than 90% of cases, and occurs when peripheral tissue insulin resistance accompanies insufficient β-cell insulin production. While more than 80% of diabetes deaths occur in low- and middle-income countries [1], there remains an enormous diabetes burden in the United States, where it is the seventh leading cause of death [2]. One of the most detrimental and expensive comorbidities of diabetes is kidney disease (nephropathy), which develops in ~40% of patients [3] and accounts for 10-20% of all-cause mortality [1]. The final stage of diabetes-associated nephropathy is end-stage renal disease (ESRD), a complication which occurs more frequently in African Americans than other American populations [4]. T2D and T2D-associated ESRD (T2D-ESRD) are likely influenced by complex interactions between environmental surroundings and lifestyle choices, highlighted by the significant increase in T2D over the last several decades [5]. In addition to environmental factors, numerous genetic loci have been identified as potential mediators of T2D and T2D-ESRD [6][7][8]. Though these have aided the understanding of the pathogenesis of these diseases, the combined impact of identified genetic contributors is low relative to the heritability of these diseases. While common genetic variants, specifically single nucleotide polymorphisms (SNPs), have been identified and explored in detail as possible mediators, copy number variation (CNV) is a relatively novel source of genetic variation that has only recently been reported as having a potential role in T2D in European-derived and Asian populations [9][10][11]. Additionally, prior studies of CNVs associated with T2D have not included African Americans [9][10][11].
Important for this study, recent reports have identified CNVs associated with related metabolic traits and disorders including BMI [12] and obesity [13][14][15]. CNVs associated with kidney disease (specifically glomerulonephritis) have also been reported [16]. Complex diseases with disparate incidence rates in different race groups are particularly interesting areas to focus attention, as gene dosage may impact disease manifestation [17]. Based on these observations we performed a CNV analysis of T2D and T2D-ESRD in an African American sample.

Case and control subjects were unrelated, self-described African Americans born in North Carolina, South Carolina, Georgia, Virginia or Tennessee. Subjects with T2D-ESRD were recruited from dialysis facilities (case subjects). T2D was diagnosed as developing diabetes at ≥25 years of age without prior diabetic ketoacidosis. Additionally, T2D-ESRD cases met at least one of the following criteria for inclusion: a) T2D diagnosed ≥5 years before initiating renal replacement therapy, b) background or greater diabetic retinopathy and/or c) ≥100 mg/dl proteinuria on urinalysis without other causes of nephropathy. Control subjects without a current diagnosis of diabetes or renal disease were recruited from the community and internal medicine clinics. DNA extraction was performed using the PureGene system (Gentra systems; Minneapolis, MN).

Genotyping and quality control
Genotyping was performed at the Center for Inherited Disease Research (CIDR) using 1 μg of genomic DNA (diluted in 1X TE buffer and at 50 ng/μl) on the Affymetrix Genome-wide Human SNP array 6.0 (Affy 6.0). Genotyping and quality control have been previously described in depth [18,19].

Analysis
Principal components analysis (PCA) was performed utilizing all high-quality SNP data from the Affy 6.0; regions of high linkage disequilibrium and inversions were excluded [18,19]. Direct comparison of the PCA with FRAPPE [20] analysis of 70 ancestry informative markers (AIMs) resulted in a high correlation between PC1 and the AIMs (r²=0.87). The mean (± SD) African ancestry proportions in 965 T2D-ESRD cases and 1,029 controls were 0.80 ± 0.11 and 0.78 ± 0.11, respectively, as estimated by FRAPPE analysis [20]. PCA was also computed on intensity data to detect batch effects; intensity data were normalized using the software Partek [21] prior to CNV calling.
CNV data were analyzed using Affymetrix Genotyping Console 4.0 software with version 30 library and annotation files. This software implements the Birdsuite program tools to identify common, novel, and rare CNVs [22]. Copy number (i.e., insertion or deletion) was estimated at all individual probes (novel CNVs) and for an a priori set of CNVs previously identified and described by McCarroll et al. [23] and implemented in Birdsuite [22]. Following software recommendations, common CNVs were called using the CANARY algorithm and novel CNVs were called using the hidden Markov models implemented in Birdseye [22]. For each method, probes or regions were categorized by copy number (CN) status, with insertions defined as CN>2 and deletions defined as CN<2; probes with CN=0 were considered homozygous deletions. Insertions and deletions present in <5 individuals were discarded from further analyses. Following these criteria, among the 1,130 previously described CNV regions, 328 insertions and 941 deletions were deemed informative. For individual probes, 113,191 insertions and 134,300 deletions were deemed informative. CN status was coded as a binary variable indicating presence or absence of an insertion/deletion in each sample and used as the predictor in logistic regression models, with age, gender, and principal component 1 (PC1; adjustment for admixture) as covariates, to test for association with disease. Statistical significance, determined separately for insertions and deletions, was defined by Bonferroni correction. Genomic control inflation factors, based on a single probe when the detected CNV was estimated to extend over a region in which ≥1 probe identified the same CNV, and P-P plots were evaluated for evidence of systematic bias.
Table 1 summarizes the clinical characteristics of the 1,994 study samples. T2D-ESRD cases were older at enrollment than the non-diabetic controls; however, controls were older at enrollment than cases were at T2D diagnosis. The T2D-ESRD cases included a higher proportion of females; additionally, on average, all groups were overweight or obese at enrollment.

Analysis
In the analysis of previously identified CNVs, 14,570 probes corresponding to 1,130 known CNV regions [23], including 328 insertions and 941 deletions, met QC parameters and were informative. Bonferroni-corrected thresholds of significance were P≤1.52×10⁻⁴ and P≤5.31×10⁻⁵ for previously identified insertions and deletions, respectively. Inflation factors (the mean of the χ² statistic) were computed based on informative regions; these were 1.13 and 0.98 for insertions and deletions, respectively. P-P plots for these suggest there was little evidence of a systematic bias (Figure 1A-B).
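The thresholds and inflation factors above follow from simple arithmetic, sketched here for illustration. The sketch assumes a family-wise alpha of 0.05, which the text does not state explicitly but which reproduces the reported thresholds of 1.52×10⁻⁴ and 5.31×10⁻⁵. The function names are hypothetical, and the χ² values in the last example are invented toy numbers rather than study data.

```python
# Sketch only: alpha = 0.05 is an assumption (not stated in the text),
# and the function names are hypothetical.

def bonferroni_threshold(n_tests, alpha=0.05):
    """Return the per-test p-value threshold for n_tests comparisons."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def mean_chi2_inflation(chi2_stats):
    """Inflation factor computed as the mean of 1-df chi-square statistics.

    Under the null hypothesis a 1-df chi-square has expected value 1, so a
    mean near 1 indicates little systematic inflation or deflation.
    """
    stats = list(chi2_stats)
    if not stats:
        raise ValueError("need at least one statistic")
    return sum(stats) / len(stats)


# Informative known-CNV counts from the text: 328 insertions, 941 deletions.
insertion_threshold = bonferroni_threshold(328)
deletion_threshold = bonferroni_threshold(941)
print(f"insertions: P <= {insertion_threshold:.2e}")  # 1.52e-04
print(f"deletions:  P <= {deletion_threshold:.2e}")   # 5.31e-05

# Toy chi-square statistics (made up for illustration, not study data).
toy_stats = [0.5, 1.0, 1.5]
print(mean_chi2_inflation(toy_stats))  # 1.0
```

Because insertions and deletions were tested as separate families, each threshold divides alpha by its own count of informative events rather than by the combined total.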
A by-probe analysis was performed to detect novel CNVs; 227,287 individual probes passed QC metrics and were informative. These included 113,191 insertions and 134,300 deletions, corresponding to Bonferroni-corrected significance thresholds of P≤4.42×10⁻⁷ and P≤3.7×10⁻⁷ for insertions and deletions, respectively. Inflation factors were 0.77 and 0.88 for insertions and deletions, respectively. P-P plots are shown in Figure 1C-D. No result from this analysis met strict Bonferroni-corrected significance thresholds; top results are shown in Table 2. The most significant insertion spanned 149.27 kb on chromosome 17; the best p-value among all probes in this region was 5.91×10⁻⁴, frequencies were 0.091 and 0.147 in cases and controls, respectively, and OR=0.54 (95% CI 0.38-0.77). Genes in this region include KIAA1267, LOC644246, LRRC37A (leucine rich repeat containing 37A) and ARL17B (ADP-ribosylation factor-like 17B). The most significant deletion spanned 247.16 kb [...]; the best p-value among all probes in this region was 4.03×10⁻⁵, frequencies were 0.003 and 0.026 in cases and controls, respectively, and OR=0.07 (95% CI 0.02-0.24). Genes nearest this region include DAD1 (defender against cell death 1) and ABHD4 (abhydrolase domain containing 4).
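To show how the reported frequencies relate to an odds ratio, the following sketch computes a crude (unadjusted) odds ratio from case and control frequencies. It assumes the reported frequencies are the proportions of individuals carrying the event, which the text does not state explicitly, and the function name is hypothetical. The odds ratios reported above come from logistic regression adjusted for age, gender, and PC1, so a crude value is expected to differ from them.

```python
# Sketch only: assumes the reported frequencies are carrier proportions;
# the function name is hypothetical.

def crude_odds_ratio(freq_cases, freq_controls):
    """Unadjusted odds ratio from carrier frequencies in cases and controls."""
    for freq in (freq_cases, freq_controls):
        if not 0.0 < freq < 1.0:
            raise ValueError("frequencies must lie strictly between 0 and 1")
    odds_cases = freq_cases / (1.0 - freq_cases)
    odds_controls = freq_controls / (1.0 - freq_controls)
    return odds_cases / odds_controls


# Chromosome 17 insertion: frequency 0.091 in cases, 0.147 in controls.
chr17_insertion = crude_odds_ratio(0.091, 0.147)
print(round(chr17_insertion, 2))  # 0.58 (the adjusted OR in the text is 0.54)
```

A value below 1 indicates the event is less frequent among cases, consistent in direction with the adjusted estimate for this region.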
Further scrutiny of the probe-level data revealed four regions encompassed by three or more probes spanning >1 kb and where overlap of insertions and deletions occurred (Figure 2). Results trended towards association with disease susceptibility/protection. A rare overlapping region on chromosome 4 (Figure 2A) spanning 15.87 kb had an insertion p-value of 2.25×10⁻², OR=0.74 and frequency=0.16 in cases and 0.19 in controls; the deletion p-value was 1.84×10⁻³, OR=11.25, frequency of 0.04 in cases and 0.02 in controls. The gene nearest this CNV was DEFB131, defensin beta 131, which belongs to the beta-defensin family and is highly expressed in the testis and moderately expressed in the prostate and small intestine [24]. Another overlapping region on chromosome 7 (Figure 2B) spanning 13.28 kb had an insertion p-value of 3.81×10⁻⁴, OR=3.11, case frequency=0.06 and control frequency=0.03. This CNV encompasses TRY6, which is thought to be a transcribed pseudogene that encodes a protein similar to trypsinogen [24]. Another overlapping region on chromosome 7 (Figure 2C) spanning 124.62 kb had an insertion p-value of 3.45×10⁻³, OR=0.54, case frequency=0.07 and control frequency=0.10; the deletion p-value was 6.92×10⁻⁴, OR=2.15, case frequency=0.12 and control frequency=0.09. This CNV encompasses several genes, including five olfactory receptors. The fourth and more common overlapping region of interest occurred on chromosome 8 (Figure 2D); this region spans 1.70 kb and had an insertion p-value of 2.58×10⁻², OR=0.60, case frequency=0.12, and control frequency=0.13; the deletion p-value was 1.56×10⁻², OR=1.48, case frequency=0.16, control frequency=0.12.

Discussion
We performed a GWAS of common and novel CNV data obtained from the Affy 6.0 GWAS array in African Americans with T2D-ESRD. This array was designed to interrogate samples using 946K copy number probes, including 800,000 probes for uniform coverage across the genome and 140,000 additional probes for detection in regions of previously reported CNVs [23], and likely has better coverage of CNVs than other arrays utilized for CNV analyses in T2D. This allowed for two comprehensive analyses. First we examined previously identified common CNVs, a method that is useful for determining if known CNVs are associated with disease and if they exist in the population of interest. Secondly, we performed an unbiased genome-wide analysis based on individual probes, which theoretically enables the detection of undocumented CNVs. Both methods are particularly informative, as African Americans have not been included in prior studies that investigated the potential role of CNVs in T2D or T2D-ESRD.
A single CNV in a 50 kb region encompassing the AMY2A and AMY2B genes on chromosome 1 met Bonferroni-corrected thresholds of significance. AMY2A and AMY2B are pancreatic alpha amylase genes encoding secreted proteins that hydrolyze 1,4-alpha-glucoside bonds in oligosaccharides and polysaccharides, catalyzing the initial step in digestion of dietary starch and glycogen. Both genes are expressed in the pancreas and AMY2B is also expressed in the liver. These characteristics suggest that the region is more likely associated with T2D risk than with nephropathy. The region encompassed by this CNV contains more than 20 SNPs including several synonymous and nonsynonymous polymorphisms, intronic enhancers, and splice sites. Additionally, this CNV is within 50 kb of a previously identified area of common CNV in the salivary amylase gene AMY1 [25], although these have not been reported in disease studies. Our results suggest a potential role for disruption or duplication of these genes in the development of T2D and/or T2D-ESRD.
We additionally detected known and novel CNVs that occur at appreciable frequencies and approach statistically significant association with T2D/T2D-ESRD in this sample. These included four overlapping regions covering >1 kb where insertions and deletions trended towards association in opposite directions of effect. These regions likely represent true copy number events with a potential role in T2D and/or T2D-ESRD and warrant further investigation and replication.
A limitation of this report is that we compared cases with T2D-ESRD to non-nephropathy, non-diabetic controls. Therefore, the observed associations may reflect association with either T2D or nephropathy. Unfortunately, appreciable numbers of African Americans with longstanding T2D lacking nephropathy or microalbuminuria are difficult to recruit due to the high prevalence of nephropathy in this population. Such individuals would comprise an ideal comparison group, useful for clarifying whether associations resulted from the presence of T2D or nephropathy. Additional limitations include the lack of replication in this study, which is required for the incorporation of this information into the current working model of the genetics of T2D/T2D-ESRD.
A previous study identified 1,362 CNVs in African Americans using the Affymetrix 500K array and detected two regions with significant frequency differences between African Americans and European Americans [26]. Using the Affy 6.0 array, we also found that CNV events were common in this population. Another GWAS employing the Affy 6.0 did not detect genome-wide significant associations between CNVs and anthropometric traits in African Americans [27]. Furthermore, a study that included African American children detected association between CNVs and childhood obesity [13]. While these represent efforts towards incorporating African Americans in studies of alternative genetic contributors to widely studied traits and diseases, there are currently no published reports investigating the role of CNVs in T2D in African Americans.

Conclusions
In summary, we observed a previously identified insertion on chromosome 1 associated with disease in a large African American sample with T2D-ESRD, which suggests a potential role for disruption or duplication of the encompassed genes in T2D and/or T2D-ESRD. Additionally, we found that other common and novel CNVs were present at an appreciable frequency and may contribute to risk of T2D and/or T2D-ESRD in this population. These data are exploratory, as CNVs represent a novel yet poorly understood form of genetic variation; validation is crucial for elucidation of additional mechanisms contributing to the etiology of these diseases.