The Contribution of Next Generation Sequencing Technologies to Epigenome Research of Stem Cell and Tumorigenesis

The epigenome contains another layer of genetic information that is less stable than the genome. Because it is dynamic, the epigenome can serve as an interface for explaining the role of environmental factors. Stem cell development and tumorigenesis are reported to be closely associated with epigenome modifications. Next generation sequencing (NGS) technologies have directly led to recent advances in epigenome research on stem cells and cancer. DNA methylation and histone modification are two major epigenetic modifications. Four NGS-based approaches have been developed to identify them: whole genome bisulfite sequencing (WGBS), methylated DNA immunoprecipitation sequencing (MeDIP-Seq), reduced representation bisulfite sequencing (RRBS) and chromatin immunoprecipitation sequencing (ChIP-Seq). This paper reviews recent advances in the use of WGBS, MeDIP-Seq and RRBS for DNA methylation and of ChIP-Seq for histone modification in the stem cell field. The potential contribution of epigenetic modifications to tumorigenesis is also described. At present, epigenome research is still limited by the shortcomings of current sampling strategies and by unknown patterns of network regulation. In future, worldwide collaboration and the application of the latest sequencing technologies are expected to solve these problems and offer new insight into epigenome research.


Introduction
Since its emergence, genome sequencing has had a strongly positive effect on human disease research. It enables researchers to explore and understand the mechanisms of disease development at the nucleic acid level. This effect has been clearly demonstrated by several international collaborative projects [1,4]. The Human Genome Project, begun in 1990 and completed in 2003, constructed the first map of the human genome, which is widely used as the reference sequence in subsequent human genome research [1]. The International HapMap Project, officially started in 2002 and first published in 2005, was the first to describe the haplotype map of the human genome, revealing common patterns of human genetic variation. The single nucleotide polymorphism (SNP) information from the HapMap Project is fundamental for exploring common genetic variants that affect human health and disease [2,3]. Furthermore, the 1000 Genomes Project, launched in 2008, is expected to find more genetic variant information with larger samples and resources and to build the most comprehensive catalogue of human genetic variation [4]. The project is designed to sequence the genomes of 2,500 individuals from 27 populations and to identify the full range of genetic variants contributing to genetic diversity in human populations, such as structural variants (SV) and copy number variants (CNV). The pilot study, finished in 2010, revealed an unprecedented number and variety of genetic variants [4]. The achievements of these large projects have switched on the "big science" mode of human disease research through worldwide scientific collaboration. They are regarded as milestones that set clear goals and references for the numerous subsequent human disease studies based on genome sequencing.
However, as more genome sequencing studies emerged, it became clear that genetic variants at the genome level were not sufficient to fully explain human disease mechanisms. It was speculated that another layer of information, beyond the genome sequence, helps determine the state of human health and disease, for two reasons. First, as a multicellular organism, the human body produces a variety of cells with distinct functions. Since all human cells share the same DNA sequence, information other than the DNA sequence must control how a cell develops into a particular type and functions in a particular tissue [5]. Second, the expression of genes encoded in the DNA sequence is regulated by environmentally induced changes arising from nutrients, toxins, drugs, infection, behavior and stress [6,7]. Genome sequencing can only clarify the diversity among individuals, populations and ethnic groups through detected genetic variants. The results of genome-level research cannot explain how external factors regulate this diversity, especially when similar genomes give rise to different phenotypes. For example, monozygotic twins are born with identical genome sequences but may develop different diseases as they grow up. The study of this additional layer is therefore expected to resolve the mystery.
This further layer of information, which regulates differential gene expression, was first described as 'epigenetic control' by Nanney in 1958 [8]. Although there is some debate over the precise description of epigenetics, the fundamental definition refers to heritable changes in cell- or tissue-specific gene expression that involve no alteration of the DNA sequence [6]. These heritable changes, passed from cell to cell and from generation to generation, are mostly established during cellular differentiation and are stably maintained through multiple cycles of cell division [9]. The heritable regulation mechanisms mainly include DNA methylation, histone modifications, nucleosome positioning, chromatin remodeling, genomic imprinting and ncRNA regulation. Together, these multilevel epigenetic mechanisms constitute the system that regulates gene expression in cells. Through cell-specific regulation, they are crucial for cellular development, including embryogenesis and cell differentiation [10]. Accordingly, aberrations in the epigenetic regulation system are reported to be associated with a wide range of diseases [11,14].
Like the genome, the epigenome contains another layer of genetic information, representing the overall epigenetic state of a cell. However, the epigenome is not as stable as the genome and varies under the influence of internal and external factors. Through such alterations, various epigenomes can originate from one genome. Since most human diseases are well recognized to be jointly affected by genetic and environmental factors, the epigenome can serve as a vital bridge for gene-environment interactions. The epigenome has been shown to play an important role in the development and function of cells, especially in early embryo development [15,17]. Understanding the epigenome is clearly beneficial to human disease research, and the growth of epigenome research over the past decade has laid a good basis for this understanding (Figure 1). Here, we review recent advances in epigenomic research on human disease. The review focuses on the application of next generation sequencing (NGS) technologies to demonstrate the contribution of the epigenome to stem cells and tumorigenesis. [18,19]. NGS platforms (Roche 454 GS FLX, Illumina GA and HiSeq, and Life Technologies SOLiD) can sequence a very large number of reads in parallel. Owing to this high-throughput data output, NGS has significantly accelerated scientific discovery in epigenome research (Figure 1). Massively parallel sequencing also allows researchers, for the first time, to map the epigenome comprehensively in different states. Compared with previous techniques, NGS-based genome-wide epigenome mapping can reach unprecedented resolution. Several effective approaches based on NGS technologies are well developed and widely used [20,22,29]. These advantages have allowed epigenomic research on stem cells to blossom, although this review cannot cover all of it.
In this section, we mainly focus on the application of four NGS-based approaches, including WGBS, MeDIP-Seq, RRBS and ChIP-Seq, to two primary forms of epigenetic marks, DNA methylation and histone modification (Table 1).

The NGS epigenome and stem cell research
DNA methylation: DNA methylation is the best-studied epigenetic mechanism. In the human genome, it refers to the addition of a methyl group at the carbon 5 position of cytosine by DNA methyltransferase (DNMT) enzymes. De novo methylation of cytosine is catalysed by the DNMT3A and DNMT3B enzymes. Cytosine methylation is associated with gene silencing, particularly when promoters containing CpG islands are hypermethylated. Most CpG sites in the genome are methylated, but the CpG islands in the promoter regions of most human genes are not [23]. DNA methylation is involved in a number of important processes, such as maintaining genome stability, transcriptional silencing and genomic imprinting. Because it is a stable and heritable epigenetic mark, correct patterns of DNA methylation are crucial for normal development and lineage commitment [24,25]. NGS-based approaches for revealing the methylome are therefore crucial for human disease research. Three NGS techniques are widely used in DNA methylation research: WGBS, MeDIP-Seq and RRBS.
• WGBS: Whole genome bisulfite sequencing (WGBS) is the gold standard method for detecting and quantifying DNA methylation levels. NGS technologies enable WGBS to study DNA methylation at single base resolution [26,28]. Treatment of DNA with sodium bisulfite converts unmethylated cytosine into uracil, which is read as thymine after amplification, while leaving methylated cytosine unchanged [27,28]. In the first genome-wide map of methylated cytosines in a mammalian genome, Lister et al. [27] compared human embryonic stem cells (hESCs) with fetal fibroblasts. The proportion of non-CG methylation was much higher than expected: nearly one-quarter of all methylation identified in embryonic stem cells was in a non-CG context. Non-CG methylation was enriched in gene bodies and depleted in protein binding sites and enhancers. Furthermore, non-CG methylation disappeared upon induced differentiation of the embryonic stem cells and was restored in induced pluripotent stem cells. These results strongly suggest that embryonic stem cells may rely on a high level of non-CG methylation, acting through distinct regulatory patterns, to control gene expression and maintain pluripotency. They also imply that epigenomic regulation mechanisms change during the stages of cell differentiation. Laurent et al. [29] likewise used WGBS to report dynamic changes in the human methylome during differentiation. Three cultured cell types were selected: hESCs, a fibroblastic differentiated derivative of the hESCs, and neonatal fibroblasts. Mature peripheral blood mononuclear cells (monocytes) were used as a reference because they represent a fully differentiated adult cell type. Developmental stage was reflected in both the level of global methylation and the extent of non-CpG methylation.
Across these progressive stages of differentiation, hESCs, representing the earliest stage, had the highest level of methylation; monocytes, representing the final stage, had the lowest; and fibroblasts, in the middle stage, had an intermediate level. Epigenetic marks thus dynamically regulate the development of different cell types at different stages so that each functions correctly.
In addition to hESCs, WGBS can also be used to study induced pluripotent stem cells (iPSCs). iPSCs are derived from somatic cells that have been epigenetically reprogrammed to lose tissue-specific features and gain pluripotency. Like hESCs, they can theoretically differentiate into any cell type [30]. However, because iPSCs arise through a reprogramming mechanism that ESCs do not undergo, distinguishing the epigenomes and genomes of iPSCs and ESCs is an active area of research. Lister et al. [31] reported the first genome-wide DNA methylation profiles of iPSCs at single-base resolution. Comparison of the methylomes of human ES cells, somatic cells, and differentiated iPSCs and ES cells revealed differences in DNA methylation status between iPSCs and ESCs. Human iPSCs exhibited extensive aberrant epigenomic reprogramming, including somatic memory and aberrant reprogramming of DNA methylation. Moreover, analysis of iPSCs differentiated into trophoblast cells revealed that errors in reprogramming CG methylation were transmitted at a high frequency. This result showed that an iPSC reprogramming signature is maintained after differentiation. As an important regulatory mechanism in development, epigenetic reprogramming of DNA methylation occurs frequently during differentiation. The extent of differentiation of iPSCs is intermediate between that of embryonic stem cells and somatic cells. Research on epigenetic reprogramming can be expected to use WGBS increasingly to study iPSCs and reveal the precise mechanisms involved.
WGBS can be used to study not only the stem cell types mentioned above but also adult somatic cells [28]. Wang et al. [32] studied the methylome of human peripheral blood mononuclear cells (PBMCs) by WGBS and produced the first Asian epigenome map, from the same Asian individual whose genome was decoded in the YH project. In contrast to the result of Lister et al. [27] above, the proportion of non-CG methylation in this study was minor: fewer than 0.2% of non-CG sites were methylated. By combining the methylome with the previously generated whole genome sequencing data, the study also revealed allele-specific methylation between the two haploid methylomes. Taken together, the results of these two methylome studies on different human cell types indicate that epigenomic status is not fixed but varies with the level of differentiation across cell types. This conclusion encourages further exploration of the contribution of non-CG methylation to maintaining and inducing cellular development, and it shows that non-CG methylation is not confined to embryonic stem cells. With its single base resolution, WGBS is expected to become a powerful tool for exploring methylome differences among cells at various differentiation stages and in various tissue types.
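The bisulfite principle and the CG versus non-CG comparison discussed above can be illustrated with a minimal Python sketch. It is a toy model under stated assumptions, not the pipeline used in the cited studies: reads are assumed to be full-length, forward-strand and already aligned to the reference, conversion is assumed to be complete, and the function names (`cytosine_context`, `bisulfite_convert`, `methylation_by_context`) and the example sequence are hypothetical.

```python
from collections import defaultdict

def cytosine_context(ref, i):
    """Classify the cytosine at ref[i] as CG, CHG or CHH (H = A, C or T)."""
    if i + 1 >= len(ref):
        return None                      # context undetermined at sequence end
    if ref[i + 1] == "G":
        return "CG"
    if i + 2 >= len(ref):
        return None
    return "CHG" if ref[i + 2] == "G" else "CHH"

def bisulfite_convert(ref, methylated_positions):
    """Simulate a forward-strand read after conversion and amplification:
    unmethylated C is read as T, methylated C stays C (assumes full conversion)."""
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(ref)
    )

def methylation_by_context(ref, reads):
    """Methylation level per context = C calls / (C + T calls) at reference Cs."""
    counts = defaultdict(lambda: [0, 0])  # context -> [methylated calls, all calls]
    for read in reads:
        for i, base in enumerate(ref):
            if base != "C" or read[i] not in "CT":
                continue
            ctx = cytosine_context(ref, i)
            if ctx is None:
                continue
            counts[ctx][0] += int(read[i] == "C")
            counts[ctx][1] += 1
    return {ctx: meth / total for ctx, (meth, total) in counts.items()}

# Toy reference with cytosines at positions 1 (CG), 4 (CHG), 8 (CHH), 12 (CG).
reference = "ACGTCAGTCTTACGA"
reads = [
    bisulfite_convert(reference, {1, 4, 12}),  # read from a heavily methylated molecule
    bisulfite_convert(reference, {1}),         # read from a sparsely methylated molecule
]
levels = methylation_by_context(reference, reads)
print(levels)
```

In practice, reads from the reverse strand appear as G-to-A changes, conversion is incomplete, and alignment of converted reads needs dedicated tools, so this sketch only shows why a retained C at a reference cytosine is interpreted as methylation.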
• MeDIP-Seq: Like WGBS, methylated DNA immunoprecipitation sequencing (MeDIP-Seq) is a genome-wide method for detecting DNA methylation. However, unlike WGBS, which relies on sodium bisulfite treatment, MeDIP-Seq is based on enrichment of methylated DNA sequences. An antibody specifically recognizes methylated cytosines across the genome, and the purified fraction of methylated DNA can then be used as input for high-throughput DNA detection methods such as NGS [33]. The method is therefore sensitive to highly methylated regions with high CG density. Although its resolution and accuracy are lower than those of WGBS, it saves time and cost, which makes it suitable for disease research comparing cells and tissues across large sample sizes. For example, the world's largest epigenetics project to date, EpiTwin, was launched in 2010 as a collaboration between Beijing Genomics Institute (BGI) and King's College London (TwinsUK). The EpiTwin project aims to capture the subtle epigenetic differences among 5,000 twins through MeDIP-Seq and to explain why many identical twins do not develop the same diseases. Monozygotic twins are highly concordant in DNA sequence and are consequently well suited to investigating the influence of epigenetic modifications on human diseases [34], such as autoimmune diseases [35,37].
Besides its intensive use in DNA methylation research, MeDIP-Seq can be applied to other areas, such as demethylation and 5-hydroxymethylcytosine (5hmC). Demethylation is also crucial for understanding the epigenetic mechanisms of human diseases. Only by considering both DNA methylation and demethylation can we fully understand how patterns of 5-methylcytosine (5mC) are established and maintained. DNA demethylation is not as dynamic as methylation, since active DNA demethylation has been observed only during specific stages of development [38]. Genome-wide DNA demethylation has been reported in germ cells and early embryos [39]. Although the mechanisms of demethylation remain to be elucidated, a few researchers have already begun to use MeDIP-Seq to study it. Chavez et al. [40] used MeDIP-Seq to analyze changes in DNA methylation during the differentiation of hESCs to definitive endoderm. After analyzing the interplay between DNA methylation, histone modifications and transcription factor binding, they found that demethylation was mainly associated with regions of low CpG density, in contrast to de novo methylation. Although there are still few reports of NGS applications in DNA demethylation research, the importance of DNA demethylation is expected to be recognized gradually, as that of DNA methylation has been.
5-hydroxymethylcytosine (5hmC) is a modified cytosine base, carrying a hydroxymethyl group at the 5 position, that is present at low levels in various mammalian cell types [41]. The formation of 5hmC is catalysed by enzymes of the TET family [42,45]. Following the same principle as 5mC antibody enrichment in DNA methylation studies, MeDIP-Seq and similar NGS-based techniques can be applied with 5hmC-specific antibodies to investigate the distribution and role of 5hmC in the genome. As an important and novel epigenetic mechanism, 5hmC was found in 2009 to exist in embryonic stem cells, as well as in human and mouse brains [42,45]. Pastor et al. [41] then used NGS-based approaches to produce a genome-wide map of 5hmC in mouse embryonic stem cells (ESCs). 5hmC was strongly enriched in exons and near transcriptional start sites. This result suggested that 5hmC might regulate transcription in ESCs, but that its regulatory role differs from that of 5mC. Ficz et al. [46] used MeDIP-Seq to confirm the existence of 5hmC in mouse ESCs and its role during differentiation, and demonstrated the relationship between 5mC and 5hmC. 5hmC was mainly associated with euchromatin, while 5mC was enriched at gene promoters and CpG islands. 5hmC did not occur independently; it mostly depended on the presence of 5mC in the genome. These findings indicated that 5hmC helps enhance transcription, in contrast to the role of methylation in inhibiting gene expression. During differentiation, as TET levels decreased, the hydroxymethylation level at ESC-specific gene promoters declined while the methylation level rose, with consequent gene silencing. However, the balance between 5mC and 5hmC was not simple and differed among genomic regions. The study proposed that the balance between pluripotency and differentiation is associated with the balance between 5mC and 5hmC.
Studies have reported the distribution of 5hmC in many tissue types, and its importance in ESCs is gradually being recognized, as described above. However, interest in 5hmC as an epigenetic mark is recent, and the information available is still limited and needs further investigation. The biological roles of 5mC and 5hmC in ESCs and human diseases will become clearer once more powerful methods have been developed to distinguish the two marks from each other.
• RRBS: Reduced representation bisulfite sequencing (RRBS) is a fast and cost-effective method, developed in recent years, for producing high-quality DNA methylation data [47,49]. The first step is digestion with the enzyme MspI, which cuts specifically at CCGG sites; this is followed by bisulfite treatment, as in WGBS. RRBS therefore covers only CpG-rich regions, such as promoters and other regulatory elements, rather than the whole genome as WGBS does. Within those regions it still reaches single base resolution, as WGBS does [48,50]. It is thus suitable for investigating differentially methylated regions among samples in a broad range of research areas, such as medicine and biomarker discovery [49,51].
Because RRBS is a recently developed NGS technique, few studies using it have been published. Nevertheless, some researchers have begun to apply it to biology and disease research [51,52]. For example, Wang et al. [51] applied RRBS to the human PBMCs of the Asian individual from the YH project, whose genome and epigenome have been systematically deciphered [28,32]. More than half of the CpG islands and promoter regions were covered at good depth. Furthermore, the proportion of CpG sites covered reached 80-90% across biological replicates, demonstrating good reproducibility [28]. RRBS is therefore a good choice for exploring DNA methylation differences in defined CpG-rich regions across large numbers of samples. RRBS can also be used to investigate human disease. Gertz et al. [52] used RRBS to study somatic DNA from six members of a three-generation family. The results demonstrated a close relationship between genotype and DNA methylation: more than 92% of differential methylation between homologous chromosomes occurred on a particular haplotype, and 80% of DNA methylation differences could be explained by genotype. In addition, the study used transcriptional analysis to measure genes exhibiting genotype-dependent DNA methylation, 22% of which showed allele-specific differences in gene expression. Overall, this study highlighted the contribution of genotype to the pattern of the DNA methylome. As RRBS gains recognition through further publications, it is expected to become a valuable tool for DNA methylation research in many fields.
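The reason MspI digestion restricts RRBS to CpG-rich regions can be sketched in a few lines of Python. This is an illustrative in-silico digest only: the cut position (after the first C of CCGG) is the known MspI cleavage pattern, but the function names, the toy sequence and the size-selection window are assumptions chosen for the example and do not come from the cited protocols.

```python
import re

def mspi_fragments(seq):
    """Cut a sequence at every C^CGG site (MspI cleaves after the first C)."""
    cuts = [m.start() + 1 for m in re.finditer("CCGG", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]

def size_select(fragments, min_len, max_len):
    """Keep fragments inside a length window, mimicking gel size selection."""
    return [f for f in fragments if min_len <= len(f) <= max_len]

# Toy sequence: two CCGG sites close together (CpG-rich stretch),
# then a long CpG-poor stretch before the next site.
sequence = "AT" + "CCGG" + "ACGTACGT" + "CCGG" + "T" * 20 + "CCGG" + "AC"

fragments = mspi_fragments(sequence)
# Window of 10-20 bp is an arbitrary toy value, scaled to this short sequence.
selected = size_select(fragments, min_len=10, max_len=20)

for frag in selected:
    print(frag, "CpG count:", frag.count("CG"))
```

Every internal fragment starts with CGG and ends with C, so each one carries CpG-derived ends, and short fragments arise where CCGG sites are dense. Size selection therefore enriches CpG-rich sequence before bisulfite treatment, which is consistent with the coverage of CpG islands and promoters described above.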
Histone modification: In addition to DNA methylation, histone modification is another type of epigenetic regulation mechanism, acting through changes in chromatin. DNA in eukaryotic chromatin is wrapped around histone octamers, which consist of four highly conserved core histones: H2A, H2B, H3 and H4. Histones are subject to various posttranslational modifications, including but not limited to lysine and arginine methylation, serine and threonine phosphorylation, lysine acetylation, ubiquitination, sumoylation and ADP ribosylation. These modifications occur mainly within the histone amino-terminal tails [53]. The state of the histone tails can alter chromatin structure and thereby determine how accessible DNA is to the transcription machinery and other regulatory factors. Modifications of the histone tails are thus important for regulating the level of chromatin condensation and gene expression [54]. Among the various types of histone modification, acetylation and methylation of specific lysine residues on N-terminal histone tails play a fundamental role in the formation of chromatin domains [53]. Acetylation is established by histone acetyltransferases and removed by histone deacetylases. Likewise, methylation is regulated by the histone methyltransferase and demethylase families. The enzymes responsible for methylation and acetylation act specifically on particular histone proteins [55]. As an on-off switch for gene expression, acetylation of lysine residues on histones is associated with gene activation, whereas methylation of lysine residues can result in either activation or silencing of gene expression [56]. As an epigenetic mechanism, posttranslational modification of histones is involved in the regulation of both normal and disease-associated development. Owing to technical restrictions, most of these posttranslational modifications remain poorly understood.
However, clear advances have been made in recent years through the application of NGS in ChIP-Seq approaches, which associate histone modifications with specific DNA sequences [57]. In a ChIP experiment, chromatin is first fragmented by sonication or MNase digestion [58] and then enriched with a specific antibody. After immunoprecipitation, NGS technologies can detect the binding sites of the specific protein. Compared with ChIP-chip, ChIP-Seq offers higher resolution and greater coverage, and it can detect more and narrower peaks with a better signal-to-noise ratio [57]. Its capability for high-resolution, genome-wide identification of histone modifications makes it well suited to human biology research [59]. For example, Terrenoire et al. [59] used ChIP-Seq to study the histone modifications H3K9ac, H3K27ac and H3K4me3 in the human metaphase epigenome. Compared with histone modification levels across the interphase genome, H3K4me3 and H3K27ac showed a close correspondence. In contrast, H3K27me3, an epigenetic mark associated with gene silencing, exhibited large differences. The study provided evidence for extensive epigenome remodeling at mitosis.
ChIP-Seq is also used in stem cell research because of its power to characterize histone modifications genome-wide. Larson et al. [60] used ChIP-Seq to study five histone modification marks (H3K4me2, H3K4me3, H3K27me3, H3K9me3 and H3K36me3) in mouse embryonic stem cells (ESCs). Using a hidden Markov model (HMM), they assigned these marks to active, non-active and null domains. Each type of domain corresponded to distinct biological functions and chromatin structural changes during early cell differentiation. The study offered new insights into the role of epigenetics in long-range gene regulation. From the examples above, we can conclude that ChIP-Seq is an efficient and powerful way to reveal the contribution of genome-wide histone modifications to epigenetic regulation mechanisms. ChIP-Seq is expected to provide further insights that deepen our understanding of the potential role of histone modifications in stem cells and human diseases.
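The idea that immunoprecipitated regions appear as peaks of read density can be sketched with a very simple bin-based enrichment calculation in Python. This is a deliberately simplified illustration and not the method of any study cited above: it does not implement an HMM or a statistical peak caller, and the function names, bin size, pseudocount, fold threshold and read positions are all hypothetical.

```python
from collections import Counter

def bin_counts(read_starts, bin_size):
    """Count reads per fixed-width genomic bin, keyed by bin index."""
    return Counter(pos // bin_size for pos in read_starts)

def enriched_bins(chip, control, bin_size=200, min_fold=2.0, pseudo=1.0):
    """Return (start, end, fold) for bins where depth-scaled ChIP signal
    exceeds the control by at least min_fold."""
    if not chip or not control:
        raise ValueError("both read lists must be non-empty")
    chip_bins = bin_counts(chip, bin_size)
    ctrl_bins = bin_counts(control, bin_size)
    scale = len(control) / len(chip)      # normalise ChIP depth to control depth
    calls = []
    for b in sorted(chip_bins):
        # The pseudocount avoids division by zero in bins without control reads.
        fold = (chip_bins[b] * scale + pseudo) / (ctrl_bins[b] + pseudo)
        if fold >= min_fold:
            calls.append((b * bin_size, (b + 1) * bin_size, fold))
    return calls

# Toy read start positions on one chromosome (hypothetical values).
chip_reads = [10, 50, 120, 130, 150, 160, 170, 190, 450, 900]
control_reads = [20, 300, 450, 700, 900]

peaks = enriched_bins(chip_reads, control_reads)
print(peaks)
```

Only the first bin, where ChIP reads pile up relative to the control, passes the threshold. Real analyses model background statistically and handle fragment length, duplicates and multiple testing, which this sketch omits.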

Cancer epigenomics
In the 1980s, scientists first showed that epigenetic changes could involve both oncogenes and tumour suppressors, which laid the cornerstone for the present recognition of epigenetic markers as diagnostic and therapeutic biomarkers for cancer [61,62]. In 1983, Andrew Feinberg and Bert Vogelstein examined DNA purified from several human primary tumour tissues using methylation-sensitive restriction enzymes and found lower DNA methylation of specific genes than in DNA from adjacent normal tissues. At that time, the predominant theory of tumorigenesis was the activation of oncogenes, and Feinberg and Vogelstein's findings implied that alterations in DNA methylation could lead to oncogene activation [61]. Later in the 1980s, tumour suppressor genes became widely recognized, so the discovery of relevant epigenetic changes in those genes was encouraging. For example, Greger et al. [62] demonstrated that an unmethylated CpG island at the 5' end of the retinoblastoma gene became hypermethylated in tumour tissues from retinoblastoma patients, and they reasonably speculated that methylation could directly silence tumour suppressor genes. Later studies correlated the methylation of tumour suppressor genes with their actual silencing in cancer and showed that tumour suppressor genes could be reactivated by inhibiting DNA methylation [63].
Epigenetic modifications: DNA methylation, although the best-studied mechanism in cancer epigenomics, is only one of many aspects of the role of epigenetic alterations in tumorigenesis. Cancer epigenomics covers research on all kinds of epigenetic alterations in cancer DNA (Figure 2). Below, we summarize current advances in the most active areas of cancer epigenomic research, namely DNA methylation, histone modification and chromatin remodeling, to demonstrate the contribution of epigenomics to tumorigenesis.
• DNA methylation: Human disease is closely associated with abnormalities in DNA methylation patterns. DNA methylation generally inhibits gene expression. For example, global hypomethylation of the cancer genome usually results in genomic instability, and silencing of tumour suppressor genes is caused by hypermethylation of CpG islands in the promoter region [14]. Methylated promoter regions may directly prevent transcription factors such as AP-2, c-Myc, E2F and NF-κB from binding to promoters, leading to gene silencing or low gene expression. At the same time, methylated regulatory elements at the 5' end of genes may specifically bind methyl-CpG binding protein (MBP), indirectly inhibiting formation of the transcriptional complex. In addition, DNA methylation can alter the conformation of chromatin and inactivate it. Conversely, the absence of methylation usually correlates with gene activation, and demethylation is thought to be related to the reactivation of silenced genes [64]. Aberrant regulation of DNA methylation can thus lead to tumorigenesis. DNA methylation changes in cancer cells include the loss of methylation at normally methylated sequences (hypomethylation) and the gain of methylation at sites that are usually unmethylated (hypermethylation) [65].
As two opposite forms of DNA methylation change, hypermethylation and hypomethylation play distinct roles in tumorigenesis. Hypermethylation of promoter CpG islands at the 5' end of cancer-related genes has been reported in human tumour cell lines, for example in a tumour suppressor gene (p16) [66], a metastasis suppressor gene (Nm23) [67], a DNA repair gene (MLH1) [68] and an angiogenesis suppressor gene [69]. Some genes, such as p16, are hypermethylated in many types of cancer [66]. Others are associated with a specific cancer; for example, GSTP1 has been reported to be hypermethylated only in prostate cancer [70]. Hypomethylation, in contrast, has been reported in almost every human malignancy and preferentially affects repetitive sequences, transposable elements and proto-oncogenes. Some studies indicate that hypomethylation in cells can increase the expression of certain genes, such as RAS and c-myc. The overall decrease in the level of 5-methylcytosine can become more pronounced as the tumour becomes more malignant [71].
In recent studies, increasing evidence has pointed to the important role of DNA methylation in tumorigenesis. For example, Ummanni et al. [72], who had previously reported significant downregulation of ubiquitin carboxyl-terminal hydrolase 1 (UCHL1) in prostate cancer (PCa), showed that the underlying mechanism of UCHL1 downregulation in PCa was linked to promoter hypermethylation. It was further suggested that UCHL1 downregulation via promoter hypermethylation plays an important role in various molecular aspects of PCa biology, such as morphological diversification and regulation of proliferation. Other experimental results demonstrated that the methylation status of DNMT1 could influence the activities of several important tumour suppressor genes in cervical tumorigenesis and may have potential as an effective target for the treatment of cervical cancer [73]. Similar results have been found in hematological malignancies as well as solid tumours. Deneberg et al. [74] observed a negative impact of DNA methylation on transcription in acute myeloid leukemia (AML). Genes targeted by Polycomb group (PcG) proteins and genes associated with bivalent histone marks in stem cells showed increased aberrant methylation in AML (p<0.0001). Furthermore, high methylation levels of PcG target genes were independently associated with better progression-free survival (OR 0.47, p=0.01) and overall survival (OR 0.36, p=0.001). Methylation-related factors in tumorigenesis are expected to remain an active focus of cancer epigenome research.
• Histone modification: Histones are subject to posttranslational modifications by enzymes primarily on their N-terminal tails, but also in their globular domains. Such posttranslational modifications include methylation, citrullination, acetylation, phosphorylation, sumoylation, ubiquitination, and ADP-ribosylation. Here, we will mainly focus on relatively widespread methylation and acetylation.
Histone acetylation is one of the most important modifications in cancer and regulates gene expression reversibly. Histone acetyltransferases (HATs) acetylate conserved lysine residues on histones to promote gene transcription (or the binding of transcription factors to regulatory elements). In contrast, histone deacetylases (HDACs) remove acetyl groups from ε-N-acetyl lysine residues on histones to inhibit gene transcription. HDACs, a major target for epigenetic therapy, are found to be overexpressed in different types of cancer. Histone acetylation is essential for maintaining protein function and gene transcription. An imbalance of acetylation in cancer cells can change chromosome structure and the level of gene expression, directly influencing the cell cycle, differentiation, apoptosis and tumorigenesis.
Recent advances in NGS enable genome-wide profiling of chromatin changes during tumorigenesis. Fraga et al. [75] revealed a global loss of acetylated H4-lysine 16 (H4K16ac) and H4-lysine 20 trimethylation (H4K20me3), which was suggested to lead to gene repression. Further, Wang et al. [76] used ChIP-seq and found that the fusion protein AML1-ETO, generated by the t(8;21) translocation, is acetylated by the transcriptional coactivator p300 in leukemia cells isolated from t(8;21) AML patients. Together with follow-up animal experiments, this indicated that lysine acetyltransferases represent a potential therapeutic target in AML. More recently, to investigate the epigenetic inactivation of the SFRP1 gene in esophageal squamous cell carcinoma (ESCC), Meng et al. [77] applied methylation-specific polymerase chain reaction (PCR), bisulfite sequencing, reverse-transcription (RT) PCR, immunohistochemistry and chromatin immunoprecipitation (ChIP) assays to detect SFRP1 promoter methylation, SFRP1 gene expression and histone modification in the SFRP1 promoter region. The SFRP1 promoter was found to be highly methylated in 95% (19/20) of the ESCC tissues and in nine ESCC cell lines. Furthermore, complete methylation of the SFRP1 gene promoter was correlated with a greatly reduced expression level.
In cancer cells, promoter CpG island hypermethylation is also associated with a particular combination of histone marks: deacetylation of histones H3 and H4, loss of histone H3 lysine 4 (H3K4) trimethylation, and gain of H3K9 methylation and H3K27 trimethylation [78,80]. H3K9 methylation and H3K27 trimethylation are also associated with aberrant gene silencing in various types of cancer. Using ChIP, Ballestar et al. [79] found that gene-specific profiles of methyl-CpG binding proteins (MBDs) exist for hypermethylated promoters in breast cancer cells, which nevertheless share a common pattern of histone modifications. Interestingly, Fujisawa et al. [81] found that CpG sites in the IL-13Rα2 promoter region were not methylated in any of the pancreatic cancer cell lines studied, including IL-13Rα2-positive and IL-13Rα2-negative cell lines, or in normal cells. On the other hand, histones at the IL-13Rα2 promoter region were highly acetylated in IL-13Rα2-positive pancreatic cancer cell lines but much less so in receptor-negative lines. When cells were treated with HDAC inhibitors, both histone acetylation and IL-13Rα2 expression were dramatically enhanced in receptor-negative pancreatic cancer cells, which makes HDAC inhibitors a new opportunity for targeted therapy.
In addition to methylation and acetylation, histones carry other kinds of modifications that are less widely distributed than those mentioned above. These histone modifications do not act in isolation but are mutually linked in cancer cells, and they act together on the histones. Consequently, aberrant changes in histone modifications can result in tumorigenesis.
• Chromatin remodeling: Chromatin remodeling is the enzyme-driven movement of nucleosomes, performed by chromatin remodeling complexes such as SWI/SNF in humans. This movement enables proteins such as transcription factors to bind to DNA wrapped around nucleosome cores. Genetic alterations of genes involved in the chromatin remodeling process have recently been reported in many types of tumors [82,86]. In one study, the protein-coding exome was sequenced in a series of primary clear cell renal carcinomas (ccRCC). The SWI/SNF chromatin remodelling complex gene PBRM1 [4] was identified as a second major ccRCC cancer gene, with truncating mutations in 41% (92/227) of cases. These data showed the marked contribution of aberrant chromatin biology [87]. In another study, the exomes of nine individuals with transitional cell carcinoma (TCC) were sequenced. The study identified genetic aberrations of the chromatin remodeling genes (UTX, MLL-MLL3, CREBBP-EP300, NCOR1, ARID1A and CHD6) in 59% of the 97 subjects with TCC [82]. Dynamic chromatin remodeling is the basis of diverse biological processes, such as gene transcription, DNA replication and repair, chromosome segregation and apoptosis. Taken together, these results suggest that aberrations of chromatin regulation might be a hallmark of cancer.
Aberrant chromatin remodeling may directly lead to the dysregulation of multiple downstream effector genes, consequently promoting tumorigenesis [82]. For example, Nakazawa et al. [87] examined the histone H3 status in benign and malignant colorectal tumors by immunohistochemistry and western blotting. Their results suggested that aberration of the global H3K9me2 level is an important epigenetic event in colorectal tumorigenesis and carcinogenesis, involved in gene regulation in neoplastic cells through chromatin remodeling. In addition, different causes of chromatin remodeling may lead to different types of cancer. Much more research is needed to determine the exact causes and consequences.
Epigenetic marks as therapeutic targets: Epigenetic modifications are reversible, which makes them attractive therapeutic targets for cancer. In theory, cancer could be treated by reversing the causal epigenetic aberrations. Following this principle, many epigenetic drugs targeting various epigenetic marks have been developed in recent decades. DNA methylation and histone acetylation, the most intensively studied epigenetic marks, have been successfully used as therapeutic targets.
First, hypermethylation of CpG islands is commonly found in many types of tumours. DNA methylation inhibitors were the first epigenetic drugs proposed for cancer therapeutics. A remarkable discovery was that treatment with the cytotoxic agents 5-azacytidine (5-aza-CR) and 5-aza-2'-deoxycytidine (5-aza-CdR) inhibits DNA methylation, which induces gene expression and causes differentiation in cultured cells [88]. 5-Aza-CR (azacitidine) and 5-aza-CdR (decitabine) have been approved by the FDA for the treatment of myelodysplastic syndromes, and promising results have also emerged from the treatment of hematological malignancies [89] and solid tumors [90]. Other possible DNA methylation inhibitors include zebularine, which is orally administered and currently under investigation in many types of cancer. However, demethylating drugs have serious toxic side effects, which raises the problem of finding suitable agents that act synergistically with them. Fortunately, clinical studies by Silverman et al. [91], Issa et al. [92] and other researchers generated a notable paradigm in oncology: therapeutic efficacy can be achieved at low drug doses. Such reduced doses were adopted in a large trial in patients with myelodysplastic syndrome (MDS), a condition that can progress to leukaemia. The trial revealed that the time to conversion from MDS to frank leukaemia increased, as did overall survival [93]. Two inhibitors, azacitidine (Vidaza; Celgene) and decitabine (Dacogen; Eisai), have now been approved by the FDA for MDS, and this has promoted the use of low-dose regimens not only for leukaemia but also for solid tumours [94].
Second, restoring normal histone acetylation patterns through treatment with HDAC inhibitors has been shown to have antitumorigenic effects, including growth arrest, apoptosis and the induction of differentiation [95]. The antiproliferative effects of HDAC inhibitors are mediated by their ability to reactivate silenced tumor suppressor genes [96]. The HDAC inhibitor suberoylanilide hydroxamic acid (SAHA) has been approved by the FDA as vorinostat (Zolinza; Merck) for the treatment of cutaneous T cell lymphoma [97]. Romidepsin (Istodax; Celgene), which shows similarly remarkable efficacy in cutaneous T cell lymphoma, has also been approved by the FDA [98]. Although HDAC inhibitors are generally well tolerated, they have some side effects, including constitutional and gastrointestinal toxicity, cardiac problems and myelosuppression. However, the molecular mechanisms underlying drug response in these patients have not yet been determined. Several other HDAC inhibitors, such as depsipeptide and phenylbutyrate, are also in clinical trials [99].

Challenge and future of epigenome research
Major challenges: Benefiting from the advent of NGS technologies, epigenome research has expanded rapidly in recent years, with the advances described above. However, two major challenges remain in epigenome research: sampling and the integrated analysis of various epigenetic modifications [10]. Both are discussed in detail below.
Epigenome research is expected to interpret the effect of epigenetic modifications caused by environmental factors. Most epigenetic modifications are somatic and specific to tissue or developmental stage. Because the epigenome is dynamic, sampling is the first and critical step of epigenome research. First, selecting the wrong sample tissue will, to a large extent, lead to failed studies or incorrect conclusions. Among human diseases, cancer has been studied more intensively than any other in epigenome research, which is attributed to the easier accessibility of cancer tissues after biopsy or surgery. However, tissue heterogeneity, an obvious characteristic of cancer, remains a problem in sampling for epigenome research. Many complex diseases, such as hypertension, do not exhibit tissue-specific pathogenesis, and DNA samples from different tissues do not show significant differences. Thus, given our currently unclear understanding of pathogenesis, it is difficult to conduct epigenome research well for such diseases. Second, since the epigenome research of human disease is at an early stage, the study model is not yet robust and the appropriate sample size is unknown. Third, because of tissue specificity, many types of tissue need to be collected to give a complete picture of the epigenome. In general, the challenge of sampling arises from specific tissue selection, exact sample size and multiple tissue collection.
There are various types of epigenetic modifications, not limited to those described in this review. First, it is necessary to explore every type of epigenetic modification in the human genome, and it is possible that most of them remain to be discovered. Second, even if all epigenetic modifications had already been identified, researchers would still have a long way to go, because epigenetic regulation operates as a network. Individual epigenetic modifications do not work separately but act together to regulate gene expression across the whole genome. Clearly understanding this subtle system of integrated regulation by epigenetic modifications is a large-scale undertaking.

Future direction:
A decade ago, the human genome project (HGP) was accomplished through the collaboration of scientists worldwide. The resulting human genome map is a milestone in the history of genome research and provides a strong foundation for the countless sequencing studies that followed. Similarly, a human epigenome map needs to be constructed to advance the field of epigenome research. A scientific project on this scale can only be achieved in the way the HGP was: scientists worldwide must collaborate within a global organization to reach this goal. Fortunately, many consortia have been founded in recent years (Table 2), and the human epigenome map is expected to be constructed in the near future. However, both genome and epigenome research aim to explain the mechanisms of complex life activities at the DNA level. Although recent achievements can explain many phenomena that were previously inexplicable, many unsolved problems remain to be explored. According to the central dogma, life is a systematic network with multidimensional activities, and activities at the DNA level interact with those at the RNA and protein levels. Thus, research at the DNA level alone is clearly not enough. With various types of NGS technologies, it is possible to apply NGS at the DNA, RNA and protein levels. The information at these levels is expected to be explored by NGS and integrated by bioinformatics, which together should lead to more discoveries in biology and human disease.
The rapid progress of sequencing technology has also contributed to the development of epigenome research. Third generation sequencing (TGS) technologies are expected to become commercially available within the next several years. Compared to NGS, TGS offers many technical breakthroughs, such as smaller sample requirements, faster turnaround and single cell sequencing. These characteristics make TGS suitable for revealing unknown epigenetic mechanisms and for speeding up epigenome research. In particular, single cell sequencing can largely overcome the obstacle of tissue specificity in epigenome research. With large-scale collaborations and the latest sequencing technologies combined, epigenome research is expected to help explain one aspect of the complexity of nature and to improve human health.
• Transform health research in Canada by applying next-generation sequencing to more research on targeted priority and under-developed areas such as population health and health services research
• Transform research results into policies, practices, procedures, products and services