Genotype Distribution of Dihydrofolatereductase Variants and their Role in Disease Susceptibility to Acute Lymphoblastic Leukemia in Indian Population:

Objectives: To establish the allele and genotype frequencies of dihydrofolate reductase (DHFR) -317A>G and -680C>A variants in Indian population, to find the association of these variants with the risk of acute lymphoblastic leukemia (ALL) and to analyze the effects of non-synonymous SNPs (nsSNPs) and 3’untranslated region (3’UTR) variants of DHFR gene on its structure and function using in-silico tools. Methods: A total of 235 unrelated healthy volunteers (controls) and 127 ALL patients (cases) were recruited for the study. DNA was extracted from peripheral leucocytes. Genotyping of DHFR polymorphisms was done by realtime PCR. We investigated the deleterious effect of nsSNPs and variants in 3’UTR of DHFR gene through computational platforms. Results: In the present study, the frequency of DHFR -317G and -680 A alleles was found to be 33.3% and 59.8% respectively. The studied DHFR variants (rs408626 and rs442767) did not confer a significant risk for ALL. Insilico analysis revealed that three nsSNPs potentially affect the structure, function and activity of the DHFR protein. Four microRNA binding sites were found to be highly affected due to 3’UTR SNPs. Further, docking simulation suggested that the order of binding affinity of methotrexate towards native and all three mutant forms of DHFR is D153V (rs121913223)>native>G18R (rs61736208)>D187Y (rs200904105). Conclusion: This is the first study to report the normative genotype distribution of DHFR variants in Indian population and also to report that DHFR (-317A>G and -680C>A) variants do not confer a potential risk for development of ALL. Inter-ethnic differences exist in the distribution of DHFR variant genotypes, and this can lead to variability in therapeutic response to DHFR substrates. Protein sequence analysis revealed rs200904105 influences the phosphorylation process (post-translational modification) of DHFR and docking simulation suggested methotrexate to have a higher affinity towards rs121913223 mutant form. Therefore, studies are warranted to explore the clinical impact of these variants in the Indian population.


Introduction
Acute lymphoblastic leukemia (ALL), is a hematological malignancy characterized by an uncontrolled proliferation of lymphoblasts. Although it affects all age groups, it is the most frequent form of childhood cancers [1,2]. Estimated numbers of new cases of ALL in the United States in 2016 is 6,590 and out of which predicted deaths are 1,430 [3]. In India, the lymphoid leukemias are expected to be 18,449 by the year 2020 [4]. Though the causes of ALL are unknown, adverse gene-environment interactions are likely to be involved in the risk of developing ALL [5,6]. Leukemia commonly arises as a result of DNA translocations, different types of mutations in genes regulating blood cell development or homeostasis [7] and folate deficiency [8,9].
Folate pathway has two components such as methylation reactions and nucleotide synthesis. Polymorphisms in genes involved in methylation pathway were not found to influence the risk of ALL in Indian population [10][11][12][13][14]. Thus, exploring the variants in DNA synthesis pathway enzymes such as dihydrofolate reductase (DHFR) and thymidylate synthase (TYMS) may provide insights in to the susceptibility to ALL.
Dihydrofolate reductase (DHFR) is a crucial enzyme in the folate pathway which converts dihydrofolicacid (DHF) to tetrahydrofolic acid (THF). THF is essential for the synthesis of amino acids and nucleic acids, required for cell growth and proliferation. Impairment of folate pathway results, in an uncontrolled proliferation of cells leading to various cancers [5,15]. DHFR gene is located on chromosome 5 and has seven transcripts. The major transcript of DHFR gene was found to have various mutations such as three stop gained variants, twenty missense variants, seven splice region variants, eleven synonymous variants, two coding sequence variants, ninety-one 5' prime UTR variants, seventy three 3' prime UTR variants, 1211 intron variants, 232 upstream gene variants and 210 downstream gene variants. Among these, non-synonymous SNPs and SNP in the regulatory region, play a significant role in gene function and are often reported to play a role in the development of diseases in humans [16][17][18][19].
Functional polymorphisms in the DHFR gene can influence its expression or activity, thereby affecting the risk of folate-dependent diseases [19][20][21][22]. Polymorphisms in DHFR gene are also relevant in the context of sensitivity and resistance to antifolates [19,23,24] and antibacterial drugs, as they also affect the prognosis of such patients treated with these drugs. High activity of DHFR enzyme counteracts the action of anti-folate drugs which results in the development of resistance to chemotherapy. On the other hand, reduction of DHFR enzymatic activity decreases the THF levels that in turn can affect the DNA synthesis and methylation reactions in the body. The risk of cancer varies according to the geographical area, probably due to differences in the genetic makeup, diet and the environmental factors.
The admixture of various ethnic groups in India resulted in the heterogeneous population [25,26]. As the difference in the distribution of alleles could confound genotype-phenotype association studies and are the primary reasons for the conflicting results, it is important to explore the influence of DHFR variants in each population.
Thus, establishing the normal genotype frequency would provide the reference values for the distribution of these SNPs in a population, so that it can be compared with those of people with the risk of cancer. Sorting of deleterious SNPs is essential for narrowing down the number of mutations to be screened in genetic association studies and for a better understanding of the functional and structural aspects of protein before conducting an experimental study. To the best of our knowledge, the distribution of DHFR variants and their role in the risk of ALL has not been explored in Indian population.
Therefore, our objectives were, to establish the allele and genotype frequencies of DHFR -317A>G and -680 C>A variants in Indian population, to study the association of DHFR variants with the risk of ALL, and the study was also extended to perform computational analysis of nsSNPs and SNPs in the 3'-UTR of DHFR gene to identify possible deleterious mutations.

Materials and Methods
Our study was approved by the Institute ethics committee of JIPMER, Puducherry, India. Written informed consent was obtained from all study participants, and in the case of children, consent was obtained from their legally accepted representatives.

Study population
A case-control study consisting of 235 unrelated healthy volunteers and 127 patients diagnosed with ALL of either gender were recruited for the study. All the participants reported three consecutive generations residing in South India and spoke one of the Dravidian languages (Tamil, Telugu, Malayalam and Kannada) as their mother tongue. The mean age (±SD) of the cases and controls were 13.9 (±13.37) and 24.5 (±5.6) years respectively. There were 83 males and 44 females in the patient group and 156 males and 79 females in the control group. We could not recruit age-matched healthy children due to ethical issues that resulted in difference in the means of age between cases and controls. The details of sample collection and DNA extraction were described previously [27]. Taqman assays for DHFR-317A>G (rs408626) and -680C>A (rs442767) genotyping were procured from Applied Biosystems (ABI, Foster City,CA,USA). Genotyping was done according to manufacturer instructions using Real Time-PCR (Applied Biosystems-7300). For quality control, genotyping was done in duplicates in 30 % of the randomly selected samples and found to be in 100% concordance.

Statistical analysis
The observed genotype frequencies were tested for Hardy-Weinberg equilibrium (HWE) using the Chi-square test. Fisher's exact test was used to test the differences in genotypes between patients with ALL and healthy volunteers (controls). Comparison between genotype and allele frequencies of SIs with 1000 genome project data was done using Fisher's exact test. Statistical analysis of data was performed using GraphPad Instat 3 (GraphPad Software Inc., San Diego, CA, USA) and SPSS software (version 16). The level of statistical significance was set at p<0.05.

In-silico analysis
Flow chart of methodology used for In-silco analysis is given in Figure 1. Retrieval of SNP data and Prediction of functional consequences of missense mutations SNP data of DHFR gene were obtained from several databases namely the NCBI dbSNP [28], 1000 genomes [29] and the Ensembl genome browser [30]. We retrieved the protein sequence of Human DHFR from the UniProt database (P00374 (DYR_HUMAN) [31]. Minor allele frequencies of DHFR variants in other populations were obtained from the 1000 genome browser [29].
Functional effects of nsSNP were predicted using various bioinformatic tools ( Figure 2). For simplicity, the nsSNPs that were found to cause disease due to protein function damage are labeled as deleterious while neutral variants are depicted as tolerated. The nsSNPs that were found to be deleterious by at least five programs were considered for further protein sequence analysis and molecular docking simulation. In addition to nsSNPs, variants affecting the 3'UTR function of DHFR were also identified using miRNASNP tool.

Protein sequence analysis
The protein sequence analysis was performed to explore the impact of considered deleterious substitutions at the sequence level for both native and three mutated forms of DHFR by several proteomics tools which are available at Expasy (http://www.expasy.org/tools) suite.

Docking simulation
The three-dimensional structure of human DHFR was retrieved from Protein Data Bank (PDB, www.rcsb.org) and its accession code 4M6J. COOT [32] was used to generate the models of the three mutants based on the structure of native DHFR. The three mutants were subjected to energy minimization step using Modrefiner [33] and validated by Rampage webservers. Subsequently, the anti-cancer drug methotrexate was retrieved from the PubChem database (accession number CID 126941). The docking calculation begins with the preparation of the protein models and ligand molecules by adding polar hydrogens, assigning of charges, merging non-polar hydrogens and the files were saved in PDBQT format using MGLTools. Once the proteins and ligand files were prepared, the blind docking simulation was performed between proteins and drug molecule to explore all possible interacting sites of the target protein. For this, a grid parameter file (grid box size: 105 Å × 124 Å × 105 Å, grid spacing: 0.375 Å and grid center: 6.390 Å × 7.300 Å × -18.160 Å) and docking parameter file (100 docking trials, 150 population size, 2500000 maximum number of energy evaluations, 27000 maximum number of generations, 0.02 mutation rate and 0.8 cross-over rate) were taken into account using Autogrid and Autodock programs [34]. Once the docking simulation was over, the best protein -drug complexes were selected based on estimated free energy of binding (ΔH) and inhibition constant (Ki) and finally exported into PDB format. The LigPlus [35] and PyMOL (https://www.pymol.org) programs were used for analyzing and comparing the docking results of native and three mutant forms of DHFR. PDB-protein data bank; nsSNPs-non-synonymous single nucleotide polymorphisms

Association of DHFR -317 A>G and -680C>A variants with the risk of ALL
The genotypic and allelic distribution of DHFR -317 A>G variant in cases and controls is shown in Table 1. The genotype frequencies of 317A>G polymorphism was concordant with HWE (p>0.05) in both cases and controls whereas -680C>A polymorphism was discordant.
There was no significant difference in the genotype and allele frequencies of DHFR polymorphisms between patients and controls (

Comparison of DHFR variants in the study population with that of 1000 genome project data
The allele frequencies of DHFR variants (-317A>G and -680C>A) in the study population were found to be significantly different from other populations except SAS ( Table 2).

In-silico analysis
A total of 20 nsSNPs were found to be non-synonymous in DHFR transcript-1. All SNPs were analyzed using eight different prediction tools (Figure 2). SIFT analysis predicted that 2nsSNPs were deleterious to DHFR function, whereas 9 SNPs were tolerated. According to Polyphen-2 results, 3SNPs were probably damaging, 1 was possibly damaging, and 7 SNPs were benign. I-mutant predicted 6 SNPs cause disease and 5 SNPs were neutral. Two SNPs were found to cause disease and 9 were neutral by PHD-SNP. SNP analyzer predicted that 1 SNP causes disease, 6 SNPs were neutral, and the effect of 4 SNPs was unknown. According to SNP and GO program results, 4 SNPs were identified to cause disease and 7 were identified as neutral SNPs.   Seven SNPs were observed to be deleterious, and 4 were found to be neutral by PROVEAN server. All eight tools suggested rs200232379 to be neutral. Three nsSNPsrs61736208 (G18R), rs121913223 (D153V), and rs200904105 (D187Y) were found to be deleterious or disease causing by five programs. Prediction of the influence of SNPs in 3'UTR identified that three polymorphisms rs76026972, rs11550953, rs56039509 led to the loss of miRNA binding sites such as hsa-miR-28-3p, hsa-miR-552, hsa-miR-3673 and hsa-miR-3911 respectively in DHFR gene (Figure 2).

Protein sequence analysis and docking simulation
Protein sequence analysis of native and mutant forms of DHFR revealed, D187Y (rs200904105) substitution in DHFR gene can be phosphorylated whereas others couldn't be phosphorylated. In comparison with native and other mutant forms, methotrexate is showing a higher binding affinity towards D153V (rs121913223) of DHFR with the ΔH of -8.88 kcal/mol and Ki of 307.33 nM, respectively. Detailed docking results of methotrexate with all three mutant forms along with native is given in Table 3 and Figure 3.

Discussion
To the best of our knowledge, this is the first study from India to provide insight into the allele and genotype frequency distribution of DHFR (-317A>G and -680C>A) polymorphisms. Our study is also the first study to report that the presence of studied DHFR variant alleles does not confer a potential risk for development of ALL. We have also performed a computational analysis and found that rs200904105 influences the phosphorylation process (post-translational modification) of DHFR and methotrexate has ahigher binding affinity towards a rs121913223 mutant form of DHFR enzyme.
DHFR-317 G allele frequency in patients with ALL and healthy volunteers were found to be 35.4 % and 33.3% respectively in our study. The percentage of DHFR-680A allele frequency in patients and healthy volunteers was found to be 61.7 and 59.8 respectively. DHFR -680CC genotype was not observed or absent in SIs ( Table 2). The DHFR-317G and -680 A alleles were not found to be associated with disease susceptibility to ALL. Other studies have examined the role of DHFR variants in patients with ALL and reported conflicting results. A study by Yang et al. [21] in Chinese population reported that DHFR317A>G mutant is not associated with the risk of ALL. In contrast to our study and Yang et al. [21], Gomez-Gomez et al. [19] reported -317 homozygous mutant genotypes (GG) to be a risk factor for the development of ALL in the Mexican population. (OR=2.89, 95%CI 1.03 to 4.88). The contradictory findings could be due to differences in population genetics or due to environmental factors.
The -317 ' A' allele was found to be the major allele in SI, Mexican, EUR and SAS, but G allele was the major allele in other population ( Table 2). The percentage of G and A allele were highest in EAS (69.2%) and GIH (73.3%) populations respectively. We found that G allele of -317A>G in SIs was significantly lower compared to AFR, AMR, EAS and EUR populations. Further, the frequency of G allele was similar to SAS (Table 2). This observation suggests that for studies on -317A>G loci, all SAS populations can be grouped together because they may not present significant differences amongst their genetic loci.
The variant allele of rs442767 SNP namely A allele has also demonstrated significant variation in its distribution. The percentage of A allele was highest in CHS subpopulation (63.8%) of EAS (Supplementary Table 1). Allele A was absent in Luhya in Webuye, Kenya subpopulation of AFR. DHFR-680CC genotype (rs442767) was not found in SIs, and AA genotype is absent in most of the AFR subpopulations (Supplementary Table 1). The absence of an individual genotype in different populations resulted in discordant HWE. Population stratification as a cause of discordance is unlikely in the present study subjects because of the high prevalence of endogamy [25,43]. DHFR-680A allele frequency was significantly greater than AFR, AMR, EUR and SAS sub-populations. The allele frequencies of -680C>A were similar when compared to EAS sub-populations and Peruvian in Lima, Peru subpopulation of AMR (Supplementary Table  1).
These findings suggest inter-ethnic variability in the distribution of DHFR polymorphisms. Thus, susceptibility to the diseases and the treatment outcome may vary between populations and confound genotype-phenotype association studies. Hence, normative frequencies must be established in genetically distinct populations before exploring the influence of genotypic distributions on the phenotype.
In our study, three SNPs (rs61736208, rs121913223, and rs200904105) were found to be deleterious by most of the prediction tools. None of these SNPs are validated by 1000 genome project and Hap-Map population. Hence, there is a need to confirm the following SNPs the rs61736208, rs121913223, and rs200904105in different populations. A Genome-wide study by Cario et al. [44] reported rs121913223 ASP153Val (A458T) to be pathogenic, causing megaloblastic anemia and cerebral folate deficiency. They also reported no difference in DHFR mRNA levels between wild-type and mutated cells, whereas protein expression is reduced in cells with the DHFR mutation. DHFR deficiency can lead to hematological and neurological diseases. Therefore, the role of rs121913223 alone and in combination with rs61736208, rs200904105 need to be evaluated in ALL.
Protein sequence analysis of the native and three mutant forms of DHFR concluded that the mutant D187Y can influence the phosphorylation process (post-translational modification) of DHFR.
Molecular docking studies were performed to understand the role of single amino-acid substitution on protein-drug interactions. The order of binding affinity of methotrexate towards all the three mutant forms along with native DHFR is D153V (rs121913223)>native>G18R (rs61736208)>D187Y (rs200904105). Though the substitution (G18R, D153V, and D187Y) is not directly involved in the drug binding, it may increase the drug binding ability in the case of D153V whereas it is decreased in other instances. Further experimental studies are necessary to validate the in-silico hypothesis. Prediction of the influence of SNPs in 3'UTR identified that three polymorphisms rs76026972, rs11550953, rs56039509, and rs56039509 led to the loss of miRNA binding sites such as hsa-miR-28-3p, hsa-miR-552, hsa-miR-3673 and hsa-miR-3911 respectively.
DHFR plays a crucial role in folate metabolism, and its variations may play a dual role. Polymorphisms associated with increased expression of DHFR may protect against cancer, due to the higher levels of 5,10-methylene-THF needed for thymidylate synthesis,whereas decreased expression may affect methylation reactions by changing 5-methyl-THF pool and, consequently, increase cancer risk. Both genomic DNA hypomethylation and gene-specific promoter CpG island hypermethylation are important epigenetic mechanisms of carcinogenesis [45][46][47]. Therefore, the functional role of studied DHFR (-317A>G and -680C>A) variants needs to be explored. Several enzymes are involved in folate pathway to regulate THF pools and DNA methylation. Influence of one enzyme variant may be abrogated by another variant [48]. Hence, studies that consider the role of gene-gene and folate-gene interactions are warranted.

Clinical relevance of studied DHFR variants
DHFR is an important target of MTX and mutations in the DHFR gene were found to influence drug response. To the best of our knowledge, only four studies have been reported regarding the clinical significance of DHFR variants worldwide. Interestingly, the results of all these studies were contradictory to each other. A study on patients with ALL of the European ethnicity reported that -317AA genotype carriers had lower event-free survival (EFS), and both the studied DHFR variants (-317A>G and -680C>A) were not associated with MTX toxicity [23]. In Mexican and SI patients with ALL, -317 GG genotype was found to be involved in the relapse and lower overall survival [19,27]. DHFR variants were found to have no influence on the response to methotrexate in the Spanish population, but the DHFR-680C>A variant was found to be associated with neutropenia [49]. Discordant results of DHFR variants could be majorly due to the difference in the distribution of DHFR variant alleles. As the difference in the distribution of alleles could confound genotype-phenotype association studies, it is important to explore the distribution of DHFR variants in each population to establish its role in predicting the therapeutic efficacy of DHFR substrates.
Our study provides normative frequency data of DHFR variants in South Indian population. Studied DHFR variants were not found to have an influence on susceptibility to ALL. Inter-ethnic differences exist in the distribution of DHFR variant genotypes, and this can lead to variability in therapeutic response to DHFR substrates.
Computational analysis revealed three nSNPs (rs61736208, rs121913223, and rs200904105) to be deleterious to DHFR function and three 3'UTR SNPs affect miRNA binding sites of DHFR. Protein sequence analysis revealed rs200904105 influences the phosphorylation process (post-translational modification) of DHFR. Docking simulation suggested that methotrexate has highest and lowest binding affinity towards rs121913223 and rs200904105 mutant forms of DHFR respectively. Therefore, studies are warranted to explore the clinical impact of these variants on the Indian population.

Compliance with Ethical Standards
All procedures performed in the present study involving human participants were by the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. Informed consent was obtained from all individual participants included in the study.

Conflicts of Interest
Authors declare no conflicts of interest.