Haplotypes of Polymorphic Antigen Processing Genes for Low Molecular Mass Polypeptides (LMP2 and LMP7) are Strongly Associated with Type 1 Diabetes in North India



Introduction
Type 1 diabetes (T1D) is an incurable, multifactorial and complex autoimmune disorder. In T1D, most of the insulin-producing beta cells of the pancreas are lost before the disease manifests as abnormal glucose metabolism. Uncontrolled hyperglycemia may result in complications such as ketoacidosis, retinopathy, nephropathy and even cardiovascular disease and premature death [1].
Worldwide, the disease affects 1 in 300-400 children [2]. The prevalence in India is 10.20/100,000, with a higher prevalence of 26.6/100,000 in urban areas compared with 4.27/100,000 in rural areas [3]. While several genetic [4][5][6] and environmental factors have been implicated in the autoimmune destruction of insulin-producing pancreatic beta cells, the association with Major Histocompatibility Complex (MHC) class-II alleles has been shown to be the strongest [5][6][7]. The MHC molecule presents antigenic peptides to T cells so that an immune response can take place. For peptides to be presented, however, antigenic proteins must first be processed into small peptides and loaded onto the peptide-binding groove of the MHC molecule. Cytosolic or viral proteins are processed in the cytoplasm by the proteasome complex, which includes the interferon-gamma (IFN-γ) inducible low molecular mass polypeptides 2 and 7 (LMP2 and LMP7), also known as proteasome subunit beta type-9 (PSMB9) and proteasome subunit beta type-8 (PSMB8), respectively [8][9][10][11][12].
LMP2 and LMP7 appear to have a peptide-editing function, since they select the peptides to be presented on MHC class-I molecules and thus modulate the immune response against self or non-self antigens. Proteasomes have also been shown to mediate the processing and […] that were part of our earlier studies [5,14] and 752 normal healthy controls (199 females and 553 males; mean age 31.86 ± 20.03 years) from the same ethnic background were studied for LMP2 and LMP7 SNPs after obtaining informed written consent and approval from the Institutional Human Ethics Committees of both the All India Institute of Medical Sciences (AIIMS) and the National Institute of Immunology (NII), New Delhi. All subjects, patients and controls alike, were based in Delhi and originated from three states of North India: Uttar Pradesh, Haryana and Punjab. The controls were randomly selected healthy individuals with no disease, no symptoms of disease and no family history of any autoimmune or infectious disease; they comprised students, scholars and employees of NII and AIIMS who gave informed consent.

PCR amplification and genotyping
The third exon of LMP2 and the second exon and sixth intron of LMP7 were amplified by polymerase chain reaction (PCR) under standard conditions, using the primers described by Casp et al. [11] and listed in Table 1. SNP genotypes were determined by restriction fragment length polymorphism (RFLP) analysis of the PCR products as described [11]. The digested fragments were resolved by 3% agarose gel electrophoresis in TBE buffer. The single SNP studied in LMP2 was the G/A substitution at codon 60 in exon 3, typed using the restriction endonuclease HhaI, which cleaves the G allele but not the A allele (Figure 1a). The LMP7 exon 2 A/C SNP at codon 49 was typed using the restriction enzyme PstI, which cleaves the C allele and leaves the A allele uncut (Figure 1b). The LMP7 intron 6 SNP was typed using HhaI, which cleaves the G allele but not the T allele (Figure 1c). Figure 1 shows the interpretation of the different genotypes based on PCR-RFLP patterns.
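The band-pattern logic described above can be sketched as a small lookup. This is a minimal illustration, not the authors' scoring procedure: fragment sizes are not given in the text, so the hypothetical function `call_genotype` only takes whether cleaved fragments and the uncut amplicon are visible on the gel. The enzyme/allele pairs are the ones stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RflpAssay:
    """One PCR-RFLP assay: the enzyme and the allele it does or does not cleave."""
    name: str
    enzyme: str
    cut_allele: str
    uncut_allele: str


# Enzyme/allele pairs as stated in the Methods text.
ASSAYS = {
    "LMP2_exon3_codon60": RflpAssay("LMP2 exon 3 codon 60", "HhaI", "G", "A"),
    "LMP7_exon2_codon49": RflpAssay("LMP7 exon 2 codon 49", "PstI", "C", "A"),
    "LMP7_intron6": RflpAssay("LMP7 intron 6", "HhaI", "G", "T"),
}


def call_genotype(assay: RflpAssay, cut_bands: bool, uncut_band: bool) -> str:
    """Translate a gel band pattern into a genotype (hypothetical helper).

    cut_bands:  the smaller fragments produced by cleavage are visible
    uncut_band: the full-length, undigested amplicon is visible
    """
    if cut_bands and uncut_band:
        return assay.cut_allele + assay.uncut_allele  # heterozygote
    if cut_bands:
        return assay.cut_allele * 2  # homozygous for the cleaved allele
    if uncut_band:
        return assay.uncut_allele * 2  # homozygous for the uncleaved allele
    raise ValueError(f"{assay.name}: no bands, repeat PCR/digestion")


# Example: HhaI digest of LMP2 exon 3 showing only cleaved fragments
print(call_genotype(ASSAYS["LMP2_exon3_codon60"], cut_bands=True, uncut_band=False))  # GG
```

Partial digestion can leave an uncut band in a true homozygote for the cleaved allele, so in practice a heterozygous call may need confirmation with a digestion control; that check is outside this sketch.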

HLA-DRB1 polymorphism
Alleles of the HLA-DRB1 locus were typed for 199 T1D patients and 350 controls for whom LMP2 and LMP7 data were available, using either ³²P-labeled sequence-specific oligonucleotide probes (SSOP) or Luminex-based HLA typing with the Labtype SSO kit from One Lambda (Canoga Park, CA, USA) according to the manufacturer's instructions, as described earlier [5,15].

Statistical analysis
The significance of differences in allelic and genotypic frequencies between T1D patients and controls was determined by standard χ² tests; odds ratios and 95% confidence intervals were calculated using Stata 9.2 software. However, whenever the numbers in any group (i.e. in cases or controls) were less than 5 for any allele, Fisher's exact test was used. In such cases, odds ratios were calculated using Woolf's method [16] with Haldane's [17] modification as described earlier [18]. p values were corrected for multiple comparisons using Bonferroni's correction. Linkage disequilibrium between HLA alleles and LMP haplotypes was calculated as described earlier [5]. Haplotype analysis of the LMP2-LMP7 SNPs, and haplotype association with disease, gender and age at onset, were done using the online SHEsis software [19,20] (http://202.120.31.177/myAnalysis.php).

Genotype, allele and haplotype frequencies of LMP2 and LMP7 SNPs in T1D patients as compared to controls
Genotype, allele and haplotype frequencies of LMP2 exon 3 G/A, LMP7 exon 2 A/C and LMP7 intron 6 G/T are shown in Table 2. All genotype frequencies in patients as well as controls were in Hardy-Weinberg equilibrium. The G to A substitution in LMP2 exon 3 changes the amino acid at codon 60 from arginine (R) to histidine (H) (CGC to CAC). A significant increase in the frequency of the G (R) allele (p<0.009) and the homozygous GG (RR) genotype (p<0.01) was observed in T1D patients compared with healthy controls. These differences remained significant after Bonferroni's correction.
In LMP7 exon 2, the C to A substitution changes the amino acid at codon 49 from glutamine (Q) to lysine (K) (CAG to AAG). Allele C (Q) (p<0.0098) and the homozygous CC (QQ) genotype (p<0.03) were significantly increased, and allele A (K) (p<0.0098) and genotype AA (KK) (p<0.03) significantly reduced, in T1D patients compared with healthy controls. While the differences in allele frequencies remained significant after correction for multiple comparisons, the differences in genotype frequencies did not.
In LMP7 intron 6, the G allele was significantly increased (p<0.01), and the T allele (p<0.01) and homozygous TT genotype (p<0.03) significantly reduced, in T1D patients compared with controls. The differences in allele frequencies remained significant after correction for multiple comparisons; however, the difference in homozygous TT genotype frequency between patients and controls did not. Since the G allele of LMP2 exon 3, the C allele of LMP7 exon 2 and the G allele of LMP7 intron 6 were significantly increased in T1D patients even after Bonferroni's correction, we examined whether these alleles were in linkage disequilibrium (LD). For this purpose, haplotypes were constructed using the online SHEsis software [19,20] for the 206 T1D patients and 738 healthy controls typed at all three loci. As expected, these SNPs were in LD (Figure 2), and GCG (the G allele of LMP2 exon 3, C allele of LMP7 exon 2 and G allele of LMP7 intron 6) was the most frequent haplotype, with a frequency of 62.37% in patients compared with 39.56% in controls; this difference was highly significant (p = 5.9 × 10⁻¹³). Haplotype ACT had a frequency of 14.1% in patients compared with 5.42% in controls, also a significant difference (p = 1.9 × 10⁻⁸). However, haplotypes GCT and ACG were significantly reduced in T1D patients, with a total absence of haplotype ACG. Differences in haplotype frequencies remained significant after Bonferroni's correction.
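The Hardy-Weinberg equilibrium check reported for the genotype frequencies can be illustrated with the standard Pearson χ² goodness-of-fit test at one biallelic SNP. The text does not say which HWE test was used, so this is a sketch under that assumption; the function name `hwe_chi_square` is hypothetical and the counts are illustrative, not the study's data.

```python
import math


def hwe_chi_square(n_11: int, n_12: int, n_22: int) -> tuple[float, float]:
    """Pearson chi-square test of Hardy-Weinberg equilibrium at one biallelic SNP.

    n_11: homozygotes for allele 1
    n_12: heterozygotes
    n_22: homozygotes for allele 2
    Returns (chi2, p) with 1 degree of freedom.
    """
    n = n_11 + n_12 + n_22
    p = (2 * n_11 + n_12) / (2 * n)  # frequency of allele 1
    q = 1 - p
    observed = (n_11, n_12, n_22)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Illustrative counts only (not the study's data)
chi2, p = hwe_chi_square(25, 50, 25)
print(round(chi2, 6), round(p, 6))  # 0.0 1.0
```

Using `math.erfc` avoids a SciPy dependency for the 1-df tail probability; when a genotype class is very small, an exact HWE test is generally preferred over the χ² approximation.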
Gender-wise distribution of genotype, allele and haplotype frequencies of LMP2 and LMP7 SNPs in gender-matched T1D patients and controls
To check for any gender bias in the LMP2 and LMP7 alleles associated with T1D, the genotypes and haplotypes of 101 female and 134 male T1D patients were compared with those of 199 healthy females and 552 healthy males, respectively (Tables 3 and 4). For LMP2 codon 60, allele G (R) (p<0.015) and genotype GG (RR) (p<0.016) were significantly increased, and allele A (H) significantly reduced (p<0.015), in female patients compared with female controls (Table 3); these differences remained significant after Bonferroni's correction. While the difference in allele frequencies of these SNPs was not statistically significant between male patients and male controls (Table 4), haplotype GCG was significantly increased in both female (65.29%) and male (60.83%) patients compared with female (42.31%) and male (39.3%) controls, respectively. Male patients showed a significant increase in the frequency of haplotype ACT compared with male controls; this difference was not observed in female patients. Haplotypes GCT and ACG were significantly reduced in both male and female patients compared with their control counterparts.
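The odds ratios behind these comparisons follow the approach named in the statistical analysis: Woolf's log-based confidence interval, Haldane's modification (adding 0.5 to every cell) for sparse tables, and Bonferroni's correction across tests. A minimal sketch, assuming the Haldane trigger is "any cell below 5" as the text describes, exposed as a parameter so it can be overridden; `odds_ratio_woolf` and `bonferroni` are hypothetical helper names, not Stata commands.

```python
import math


def odds_ratio_woolf(a, b, c, d, haldane=None, z=1.96):
    """Odds ratio with Woolf's (log-based) 95% confidence interval.

    2x2 table:            allele/haplotype present   absent
               patients            a                   b
               controls            c                   d

    haldane=None applies Haldane's modification (+0.5 to every cell)
    automatically when any cell is below 5; pass True/False to override.
    Without the modification, a zero cell raises ZeroDivisionError.
    """
    if haldane is None:
        haldane = min(a, b, c, d) < 5
    if haldane:
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se_log_or), math.exp(log_or + z * se_log_or)


def bonferroni(p_values):
    """Bonferroni correction: multiply each p by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Illustrative counts only (not the study's data)
print(odds_ratio_woolf(10, 20, 5, 40))  # first element is the OR, 4.0
print(bonferroni([0.01, 0.02, 0.5]))
```

For the exact p value in sparse tables, SciPy's `scipy.stats.fisher_exact` computes Fisher's exact test on a 2×2 table.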

Association of LMP2 and LMP7 SNPs with age at onset
We further analyzed the data to check whether age at onset of the disease was associated with any particular LMP2 and LMP7 SNPs (Table 5). Patients were divided into two groups: those aged 14 years or younger at onset (taking 14 years as the onset of adolescence) and those older than 14 years at onset of T1D. Each group was compared with all healthy controls. The data revealed a significant increase in the frequencies of the G (R) allele (p<0.029) and homozygous GG (RR) genotype (p<0.03) for LMP2 codon 60, the C (Q) allele for LMP7 codon 49 (p<0.045), and the G allele (p<0.007) and GG genotype (p<0.016) for LMP7 intron 6 in T1D patients with early age at onset compared with controls. However, there were no significant differences in the frequencies of these SNPs between patients with age at onset above 14 years and healthy controls.
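Each allele comparison of this kind (for example, early-onset patients versus all controls) reduces to a 2×2 table of allele counts. The sketch below converts genotype counts into allele counts and applies an uncorrected Pearson χ² test with 1 degree of freedom. It assumes alleles can be treated as independent draws, which is reasonable when genotypes are in Hardy-Weinberg equilibrium, as reported here; the helper names are hypothetical and the counts are illustrative.

```python
import math


def allele_counts(n_11, n_12, n_22):
    """Genotype counts -> (allele 1 count, allele 2 count); each person carries two alleles."""
    return 2 * n_11 + n_12, 2 * n_22 + n_12


def allelic_chi_square(case_genotypes, control_genotypes):
    """Uncorrected Pearson chi-square on the 2x2 allele-count table (1 df).

    Each argument is a (n_11, n_12, n_22) tuple of genotype counts.
    Returns (chi2, p).
    """
    a, b = allele_counts(*case_genotypes)
    c, d = allele_counts(*control_genotypes)
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of chi-square with 1 df
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Illustrative genotype counts only (not the study's data)
print(allelic_chi_square((10, 10, 0), (5, 10, 25)))
```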
Interestingly, while individual SNPs showed significant differences only in patients with early age at onset, haplotypes GCG and ACT were significantly increased in patients with both early (p<4.5 × 10⁻¹¹ and p<0.003, respectively) and late age at onset (p<2 × 10⁻⁵ and p<5.14 × 10⁻⁸, respectively). Similarly, haplotypes GCT and ACG were significantly reduced in patients with both early and late age at onset […] [4,5,7]. To check whether the association of LMP2 and LMP7 SNPs in the present study was due to their being in linkage disequilibrium (LD) with the predisposing MHC allele, we calculated the LD coefficient (D') and the correlation coefficient (r) between predisposing and protective HLA alleles and LMP2-LMP7 haplotypes (Table 6). Haplotypes GCG and GCT were in weak LD with predisposing HLA-DRB1*03:01 in both patients (D'=0.2273, r=0.1385 and D'=0.5337, r=0.1297, respectively) and controls (D'=0.06337, r=0.1349 and D'=0.07436, r=0.0911, respectively). However, haplotype ACT was in stronger LD with the protective DRB1*07:01 allele (D'=0.1594, r=0.2134). Haplotypes GCG and ACT were significantly increased, and GCT significantly reduced, in the patients. The weak LD of both the predisposing GCG and the protective GCT haplotypes with predisposing DRB1*03:01, and of predisposing ACT with protective DRB1*07:01, suggests that the association of LMP2-LMP7 haplotypes with T1D is independent of HLA alleles and not due to LD with the predisposing HLA alleles. When we studied the simultaneous presence of predisposing and protective LMP2-LMP7 haplotypes with predisposing and protective HLA alleles (Table 7), all three haplotypes GCG, ACT and GCT together with DRB1*03:01 were significantly increased in T1D patients compared with controls, whereas GCG and GCT together with DRB1*07:01 were significantly reduced.
These results indicate the dominant effect of the predisposing HLA allele DRB1*03:01, which was present in more than 70% of the patients compared with only 15.7% of controls and was therefore found together with both predisposing and protective LMP2-LMP7 haplotypes; this supports an independent role for the predisposing LMP2-LMP7 haplotypes.
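The D' and r values above are described only as calculated "as described earlier [5]". As an assumption about the underlying quantities, the sketch below computes Lewontin's D' and the correlation coefficient r for two biallelic markers from a joint frequency and two marginal frequencies, treating an LMP2-LMP7 haplotype (e.g. GCG) and an HLA allele (e.g. DRB1*03:01) each as a present/absent marker. `ld_stats` is a hypothetical name and the example frequencies are illustrative.

```python
import math


def ld_stats(p_ab, p_a, p_b):
    """D, Lewontin's D' and r for two biallelic markers.

    p_ab: frequency of chromosomes carrying marker a together with marker b
          (e.g. LMP haplotype GCG together with DRB1*03:01)
    p_a, p_b: marginal frequencies of marker a and marker b
    """
    d = p_ab - p_a * p_b
    # D_max depends on the sign of D (Lewontin's normalisation)
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    r = d / math.sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r


# Illustrative frequencies only (not the study's data)
print(ld_stats(p_ab=0.15, p_a=0.40, p_b=0.20))
```

D' is scaled to the largest D the allele frequencies allow, whereas r also reflects how similar those frequencies are; this is why a pair of markers can show a moderate D' alongside a small r.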

Discussion
We show here that haplotypes of the antigen processing genes LMP2 and LMP7 may have a role in the aberrant presentation of self-antigens in T1D. LMP2 and LMP7 act as peptide editors, selecting the appropriate peptides for presentation on MHC molecules, since they generate peptides that bind better to MHC class-I molecules [21]; polymorphism in these genes may therefore adversely affect the loading of peptides onto MHC class-I molecules. Reports on the functional role of the LMP2 exon 3 SNP at codon 60, where a single nucleotide polymorphism changes arginine (R) to histidine (H) (CGC to CAC), are conflicting. While there was no difference in LMP2 mRNA expression between the R and H alleles, chymotrypsin-like and trypsin-like activities were higher in RR subjects than in heterozygous RH subjects [22]. However, Park et al. [23] did not find any effect of the codon 60 R/H polymorphism on either the expression or the catalytic activity of LMP2 in some cancer cell lines. Since the cancer cell lines themselves showed considerable variability in LMP2 protein expression, the situation may be different in normal non-cancerous cells. The LMP2 codon 60 R/H (G/A) polymorphism seems to be conserved, since it is observed in different strains of mice [24], including non-obese diabetic (NOD) mice, the animal model for human type 1 diabetes, which also carry the R allele at codon 60. Results for LMP2 polymorphisms in T1D vary between populations. While we found LMP2 GG […] [25,26]. However, several other studies did not find the LMP2 R/H polymorphism to be associated with T1D [27,30]. A meta-analysis performed to resolve these inconsistent results suggested that the LMP2 RH genotype appeared to be associated with T1D [31], the opposite of our results, in which the RR genotype was disease-conferring and heterozygous RH was reduced in patients compared with controls.
These differences could be due to the different ethnicity of the individuals studied in the present report.
In LMP7 exon 2, the glutamine (Q, CAG) [5] to lysine (K, AAG) substitution at codon 49 has been implicated in transcriptional regulation of the gene. On IFN-gamma stimulation, cell lines with the homozygous KK (AA) genotype showed lower expression and reduced transcript stability compared with cell lines with the LMP7 QQ (CC) genotype, and heterozygous K/Q cell lines showed intermediate expression of LMP7 [32], suggesting that the K allele may reduce immunoproteasome formation and thus peptide processing, resulting in reduced peptide-HLA presentation [32]. In the present study, we observed the QQ (CC) genotype to be significantly increased in patients; this genotype may be associated with higher expression of the immunoproteasome and may have a role in presentation of self-antigens in T1D, since upregulation of LMP2 and LMP7 can markedly improve antigen presentation [33]. This may greatly enhance the efficiency of intracellular T cell epitope production, establishing the cytotoxic T cell repertoire and shaping cytotoxic immune responses [34][35][36]. While there is a dearth of studies on the LMP7 exon 2 Q/K polymorphism in T1D, QQ homozygosity has been shown to be associated with another autoimmune disorder, juvenile rheumatoid arthritis (JRA) [37]. The LMP7 exon 2 49Q allele is the most frequent allele in Mexicans [38], Japanese [39], the Brazilian Guarani population [39] and the North Indians in the present study, and the 49K allele has been reported at lower frequency in most studies except in Caucasians from the USA [11]. However, the study by Casp et al. [11] appears to contain either an interpretation or a typographical error for the C (Q) and A (K) alleles, since the frequency of the Q allele in a random population from the USA has been reported to be 88.1% in another study [40], compared with 88.1% for the A (K) allele in the study by Casp et al. [11].
Our results are not concordant with earlier studies of LMP7 intron 6, in which the homozygous TT genotype at the G/T site at position 37360 was increased and GG was reduced in T1D [25,31,41]; in contrast, we observed a significant decrease in the frequency of the TT genotype and T allele, and an increase in G allele frequency, in T1D patients from North India. This non-concordance with earlier published reports could be due to the different ethnicity of the subjects studied here and to the larger numbers of patients and controls compared with most earlier reports.
LMP2 and LMP7 are both immunoproteasome subunits involved in antigen processing; they act in concert and thus may have integrated and synergistic roles in generating peptides that fit MHC class-I molecules. We therefore examined, for the first time, whether the frequencies of haplotypes of LMP2 exon 3 A/G, LMP7 exon 2 A/C and LMP7 intron 6 G/T differed significantly between T1D patients and controls. Haplotypes GCG and ACT were significantly increased and GCT and ACG significantly reduced in the patients, irrespective of gender or age at onset of diabetes. We further checked whether this effect could be due to LMP2 and LMP7 being in LD with the predisposing MHC class-II allele DRB1*03:01 [4,5]. LMP haplotype GCG was in weak linkage disequilibrium with predisposing HLA-DRB1*03:01, and not with protective HLA-DRB1*07:01, in both patients and controls; however, 37.19% of T1D patients carried GCG-DRB1*03:01 compared with 7.28% of controls, a highly significant difference with an odds ratio of 9.37. Similarly, LMP haplotypes ACT and GCT together with DRB1*03:01 were significantly increased in the patients, and the same haplotypes together with DRB1*07:01 were significantly reduced. Although LMP haplotype GCT by itself was significantly reduced and haplotype GCG significantly increased in the patients, the significant increase of both haplotypes in the presence of predisposing HLA DRB1*03:01, and their significant decrease in the presence of HLA DRB1*07:01, show that while the predisposing MHC alleles have the dominant effect, the association of LMP2/LMP7 haplotypes is independent of the HLA alleles.
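Because genotypes at the three SNPs are unphased, haplotype frequencies have to be estimated statistically; the study used SHEsis for this. The sketch below is a generic expectation-maximization (EM) estimator for phase-unknown biallelic genotypes, not SHEsis's own algorithm, and its function names are hypothetical. Genotypes are coded as copies (0, 1 or 2) of the second-listed allele at each SNP, so with SNP order LMP2 exon 3 (G/A), LMP7 exon 2 (C/A), LMP7 intron 6 (G/T), haplotype (0, 0, 0) is GCG and (1, 0, 1) is ACT.

```python
from itertools import product


def compatible_pairs(genotype):
    """Unordered haplotype pairs consistent with one unphased genotype.

    genotype: tuple with 0, 1 or 2 copies of the second-listed allele per SNP.
    Haplotypes are tuples of 0/1 (0 = first-listed allele).
    """
    het = [i for i, g in enumerate(genotype) if g == 1]
    base = [1 if g == 2 else 0 for g in genotype]
    if not het:
        return [(tuple(base), tuple(base))]
    pairs = []
    # Fix the first heterozygous SNP to 0 on haplotype 1 so each pair is listed once.
    for bits in product((0, 1), repeat=len(het) - 1):
        h1, h2 = list(base), list(base)
        for idx, bit in zip(het, (0,) + bits):
            h1[idx], h2[idx] = bit, 1 - bit
        pairs.append((tuple(h1), tuple(h2)))
    return pairs


def em_haplotypes(genotypes, max_iter=500, tol=1e-10):
    """Generic EM estimate of haplotype frequencies (assumes Hardy-Weinberg proportions)."""
    n_snps = len(genotypes[0])
    haps = list(product((0, 1), repeat=n_snps))
    freq = {h: 1 / len(haps) for h in haps}  # uniform starting point
    candidates = [compatible_pairs(g) for g in genotypes]
    for _ in range(max_iter):
        counts = dict.fromkeys(haps, 0.0)
        # E-step: split each person across the phasings their genotype allows
        for pairs in candidates:
            weights = [freq[h1] * freq[h2] * (1 if h1 == h2 else 2) for h1, h2 in pairs]
            total = sum(weights)
            if total == 0:
                continue
            for (h1, h2), w in zip(pairs, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: expected haplotype counts divided by the number of chromosomes
        new_freq = {h: c / (2 * len(genotypes)) for h, c in counts.items()}
        converged = max(abs(new_freq[h] - freq[h]) for h in haps) < tol
        freq = new_freq
        if converged:
            break
    return freq


# Illustrative genotypes only (not the study's data)
genotypes = [(0, 0, 0)] * 10 + [(1, 0, 1)] * 3 + [(1, 0, 0)] * 2
for hap, f in sorted(em_haplotypes(genotypes).items(), key=lambda kv: -kv[1]):
    if f > 0.001:
        print(hap, round(f, 3))
```

Enumerating all 2³ = 8 haplotypes is cheap for three SNPs; with many loci, dedicated tools use more scalable strategies than this brute-force sketch.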
In conclusion, our results demonstrate that the significant increase in the frequencies of haplotypes GCG and ACT and the decrease in the frequencies of haplotypes GCT and ACG are independent of gender, age at onset and the predisposing HLA alleles; these haplotypes may have a significant role in the manifestation of T1D through increased presentation of self-antigens, activation of early T cell responses and differentiation of T cells into effector cells through polarizing cytokines [33,36]. While the association with MHC class-II allele DRB1*03:01 [7] may be involved in generating Th1-type responses, predisposing MHC class-I alleles [5] and LMP2-LMP7 haplotypes may be involved in generating self-reactive cytotoxic T cells. Since all three LMP2-LMP7 SNPs are very closely linked and inherited en bloc as a haplotype, and the two immunoproteasome subunits may function in an integrated manner, future studies may find it more relevant to examine haplotypes rather than individual SNPs.