An Exploration of Mutation Status of Cancer Genes in Breast Cancers

Breast cancer is the most common cancer in women in the US and has the second highest mortality rate, accounting for about 25% of all cancer deaths. Genetic biomarkers for cancer are recognized as useful for estimating the risk of cancer recurrence and for guiding targeted treatment. Because breast cancers carry a wide spectrum of gene mutations in their genomes, identifying these mutations is a promising way to improve the diagnosis and treatment of breast cancers. Rapid advances in next-generation sequencing (NGS) technology have generated a large amount of NGS data on breast cancer genomes, making the detection and application of mutation-based biomarkers for breast cancer a reality. This study performed a broad survey of the mutation status of cancer genes in breast cancers based on The Cancer Genome Atlas (TCGA) breast cancer data, and detected and analyzed several frequently mutated genes.


Introduction
Early detection and targeted therapy are recognized to greatly improve the survival of patients with breast cancer, which is the second most lethal cancer in women after lung cancer. Several factors, such as hormone receptor status, breast cancer biomarkers and the expression of gene signatures, have been used to estimate recurrence risk and guide targeted therapy. For example, gene expression-based tests such as the Agendia Symphony Breast Cancer Decision Suite (TargetPrint, MammaPrint, BluePrint, and TheraPrint) have been used to assess the prognosis of breast cancer patients and to help oncologists choose an appropriate treatment program [1]. However, one disadvantage of using gene expression profiling to identify cancer biomarkers is that gene expression levels are highly variable, so a single measurement can easily be misinterpreted. In contrast, genetic mutations in DNA can be detected stably. Because all cancers carry mutations in their genomes and mutational heterogeneity is widespread in cancer genomes [2], mutation-based cancer biomarkers could be more useful than expression-based biomarkers. Meanwhile, rapid advances in next-generation sequencing (NGS) technology have made it possible to sequence a large number of DNA samples in parallel at reasonable cost. As a result, a large amount of NGS data on cancer genomes has emerged, making the detection and application of mutation-based cancer biomarkers a reality.
The Cancer Genome Atlas (TCGA) project [3] is generating comprehensive cancer genomic data from high-quality tumor tissue samples. The tumor samples collected by TCGA are primary, untreated tumor tissues, and each is paired with a germline sample from the same patient that serves as a normal comparison. In addition, the samples are frozen soon after surgery to avoid degradation of RNA and DNA. FFPE (formalin-fixed, paraffin-embedded) tumor samples are not collected by TCGA because of their lower quality compared with fresh tumor tissue. Using NGS technology (whole-exome sequencing), TCGA has sequenced hundreds of breast cancer DNA samples and identified somatic mutations in tens of thousands of genes. In this study, we performed a broad survey of the mutation status of cancer genes in breast cancers based on the TCGA breast cancer data.

Materials and Methods
We downloaded the TCGA breast invasive carcinoma (BRCA) somatic mutation data from the TCGA website https://tcgadata.nci.nih.gov/tcga/ (data as of November 20, 2013). TCGA identified a total of 47243 somatic mutations, 93% (43912) of which were novel SNPs (single nucleotide polymorphisms). These mutations involved 776 tumor samples and 15819 genes.
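The summary counts above (total mutations, fraction of novel variants, number of tumor samples and genes) can be tallied directly from a mutation file. The following is a minimal sketch, assuming the data are a tab-delimited MAF (Mutation Annotation Format) file with `Hugo_Symbol`, `Tumor_Sample_Barcode` and `dbSNP_RS` columns, and assuming that novel variants carry the value `novel` in `dbSNP_RS`. The function name `summarize_maf` and the toy input are hypothetical and are not part of any TCGA tool.

```python
import csv
import io
from typing import TextIO


def summarize_maf(handle: TextIO) -> dict:
    """Count mutations, novel variants, tumor samples and genes in a tab-delimited MAF."""
    # MAF files can start with '#' comment lines (e.g. '#version 2.4'); skip them.
    data_lines = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(data_lines, delimiter="\t")
    n_mutations = 0
    n_novel = 0
    samples = set()
    genes = set()
    for row in reader:
        n_mutations += 1
        # Assumption: variants not present in dbSNP carry the value "novel" in dbSNP_RS.
        if (row.get("dbSNP_RS") or "").strip().lower() == "novel":
            n_novel += 1
        samples.add(row["Tumor_Sample_Barcode"])
        genes.add(row["Hugo_Symbol"])
    return {
        "mutations": n_mutations,
        "novel": n_novel,
        "novel_fraction": n_novel / n_mutations if n_mutations else 0.0,
        "samples": len(samples),
        "genes": len(genes),
    }


# Toy MAF for illustration; the sample IDs and the dbSNP ID are placeholders.
toy_maf = io.StringIO(
    "#version 2.4\n"
    "Hugo_Symbol\tTumor_Sample_Barcode\tdbSNP_RS\n"
    "PIK3CA\tSAMPLE_1\tnovel\n"
    "TP53\tSAMPLE_1\trs0000000\n"
    "PIK3CA\tSAMPLE_2\tnovel\n"
)
summary = summarize_maf(toy_maf)
print(summary["mutations"], summary["novel"], summary["samples"], summary["genes"])
# prints: 3 2 2 2
```

On the full file, `novel_fraction` would correspond to the 43912/47243 (about 93%) reported above, provided the novelty convention assumed here holds.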

Mutation status of the Foundation One cancer genes in breast cancers
FoundationOne™ is a targeted NGS assay that simultaneously sequences the entire coding regions of 236 cancer-related genes (3769 exons), plus 47 introns from 19 genes frequently rearranged or altered in cancer, at an average depth of coverage greater than 250X [4]. It detects all classes of genomic alterations, including base substitutions, insertions and deletions, copy number alterations and rearrangements, from a small routine tumor sample (FFPE or needle biopsy). The genes tested are known to be somatically altered in human solid cancers according to recent scientific and clinical literature. We obtained the list of 236 genes from the FoundationOne website [5] and examined their mutation status in the 776 TCGA breast cancer samples. Table 2 lists the genes that are mutated in at least 10 tumor samples; the mutation statistics for all 236 genes are summarized in supplementary Table S1. Table 2 shows that the two most frequently mutated genes are PIK3CA and TP53, each of which is mutated in more than one third of the TCGA breast cancer samples. PIK3CA is an oncogene that is mutated in a range of human cancers [6]. Its mutations are clustered, occurring mainly in the helical (exon 9) and kinase (exon 20) domains of the protein [7]. Previous studies have shown that PIK3CA has the highest frequency of gain-of-function mutations in breast cancer [8,9], consistent with the present results. Given its high mutation frequency in cancer, the kinase-encoding gene PIK3CA could be a promising drug target for cancer treatment. Another frequently mutated gene in the TCGA breast cancer samples is the tumor suppressor gene TP53, whose mutations have been found in more than half of all human cancer cases [10]. Therefore, anticancer drugs targeting mutated TP53 could potentially benefit a large number of cancer patients.
Although mutated TP53 itself is difficult to target directly, its synthetic lethal partners may include promising treatment targets [11].
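The selection behind Table 2 can be sketched as counting, for each panel gene, the number of distinct tumor samples carrying at least one somatic mutation, then keeping genes mutated in at least 10 samples. This is a minimal sketch under the assumption that a gene's mutation rate is its number of mutated samples divided by the total number of samples; the function name `panel_mutation_table` and the toy records are hypothetical and do not reproduce the actual TCGA counts.

```python
from collections import defaultdict
from typing import Iterable, List, Set, Tuple


def panel_mutation_table(
    records: Iterable[Tuple[str, str]],
    panel: Set[str],
    n_samples: int,
    min_samples: int = 10,
) -> List[Tuple[str, int, float]]:
    """Return (gene, mutated_samples, rate_percent) for panel genes mutated in >= min_samples samples.

    records holds one (gene, sample_id) pair per somatic mutation; a sample with several
    mutations in the same gene is counted once for that gene.
    """
    samples_by_gene = defaultdict(set)
    for gene, sample in records:
        if gene in panel:
            samples_by_gene[gene].add(sample)
    rows = [
        (gene, len(samples), 100.0 * len(samples) / n_samples)
        for gene, samples in samples_by_gene.items()
        if len(samples) >= min_samples
    ]
    # Most frequently mutated genes first; ties broken alphabetically.
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows


# Toy data: GENE_A is mutated in 3 distinct samples (SAMPLE_1 twice), GENE_B in 1,
# and GENE_C is outside the panel. All names are placeholders.
toy_records = [
    ("GENE_A", "SAMPLE_1"), ("GENE_A", "SAMPLE_1"), ("GENE_A", "SAMPLE_2"),
    ("GENE_A", "SAMPLE_3"), ("GENE_B", "SAMPLE_1"), ("GENE_C", "SAMPLE_4"),
]
print(panel_mutation_table(toy_records, {"GENE_A", "GENE_B"}, n_samples=10, min_samples=2))
# prints: [('GENE_A', 3, 30.0)]
```

For the analysis described here, `panel` would be the 236 FoundationOne genes, `n_samples` would be 776 and `min_samples` would be 10. Counting distinct samples rather than raw mutations keeps a hypermutated tumor from inflating a gene's rate.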
In addition to TP53, the tumor suppressor genes BRCA1, BRCA2, GATA3, PTEN, RB1, CDH1, KDM6A, MAP2K4, NF1, PIK3R1, SETD2 and ATM were found to be frequently mutated in the TCGA breast cancer samples (Table 2). BRCA1 and BRCA2 are normally expressed in the cells of breast and other tissues, where they help maintain genome stability, specifically through the homologous recombination pathway for double-strand DNA repair. Mutations in BRCA1 and BRCA2 confer an increased lifetime risk of breast or ovarian cancer and account for about 20 to 25 percent of hereditary breast cancers [12] and about 5 to 10 percent of all breast cancers [13]. In addition, mutations in BRCA1 and BRCA2 account for around 15 percent of ovarian cancers [14]. More than 1000 mutations in BRCA1 and BRCA2 have been identified, many of which are associated with an increased risk of cancer (particularly breast cancer in women). The identification of BRCA1 and BRCA2 mutations in a subset of the TCGA breast cancer samples supports the hypothesis that 10% of sporadic breast cancers may have a strong germline contribution [15].
GATA3 encodes a member of the GATA family of transcription factors and regulates luminal epithelial cell differentiation in the mammary gland [16]. Table 2 shows that GATA3 is the third most frequently mutated gene in the TCGA breast cancer samples, with mutations in more than 10% of samples. Loss of GATA3 expression has been linked to poor prognosis in breast cancer [17].
The tumor suppressor gene PTEN is frequently mutated in many cancers, including prostate cancer, endometrial cancer, glioblastoma, and melanoma [18]. Its mutation frequency is also relatively high in the TCGA breast cancer samples (it is the 7th most frequently mutated gene). Loss of PTEN expression is frequently linked to advanced breast cancer [19]. Besides retinoblastoma, somatic mutations in the RB1 (retinoblastoma 1) gene are associated with many other types of cancer, including bladder cancer, lung cancer, breast cancer, osteosarcoma, melanoma, and leukemia [20].
Table 2: FoundationOne genes frequently mutated in TCGA breast cancer samples
CDH1 encodes a calcium-dependent cell-cell adhesion glycoprotein composed of five extracellular cadherin repeats, a transmembrane region, and a highly conserved cytoplasmic tail. Mutations in this gene are correlated with many types of cancer, including gastric, breast, colorectal, thyroid, and ovarian cancers. Its loss of function contributes to increased proliferation, invasion, and/or metastasis in cancer [21]. Table 2 shows that CDH1 is the 5th most frequently mutated gene in the TCGA breast cancer samples. Indeed, its loss of function or inactivation has been associated with breast carcinomas [22-24].
KDM6A belongs to a family of genes that encode chromatin-modifying enzymes. The protein encoded by KDM6A, lysine-specific demethylase 6A, is a histone demethylase that modifies histone proteins important for development. It acts as a tumor suppressor by preventing uncontrolled cell proliferation. Somatic KDM6A mutations have been identified in several types of malignant tumors, including breast, esophageal, colon, kidney, and brain cancers, myeloid leukemia and myeloma [25]. Most of these mutations produce an abnormally short, nonfunctional enzyme that has lost its tumor suppressor role, which could contribute to the development of cancer.
The tumor suppressor gene MAP2K4 encodes a member of the mitogen-activated protein (MAP) kinase signaling family. Genetic evidence supports its role in a variety of cancer types [26-28]. Table 2 shows that this gene has a mutation rate of 4.12% in the TCGA breast cancer samples, close to a previous report of a MAP2K4 mutation rate of nearly 5% in breast carcinoma [29].
The tumor suppressor gene NF1 (neurofibromin 1), known for causing the autosomal dominant genetic disorder neurofibromatosis type 1, is the third most frequently mutated gene in glioblastoma multiforme (GBM) [30], the fourth most frequently mutated gene in ovarian carcinoma [31], and one of the most significantly mutated genes in lung adenocarcinoma [32]. A recent study showed that NF1 mutation is a driver of breast cancer in a mouse model [33]. Table 2 shows that NF1 has a mutation rate of 2.45% in the TCGA breast cancer samples.
PIK3R1 was reported to have a mutation rate of 2.2% in breast cancer in a recent study [34], in line with its 2.71% mutation rate in the TCGA breast cancer samples. This gene has been suggested as an independent prognostic marker for breast cancer [34]. SETD2 encodes a histone methyltransferase specific for lysine 36 of histone H3; methylation of this residue is associated with active chromatin. Its tumor suppressor role in breast cancer has been reported [35]. ATM encodes a member of the PI3K-related protein kinase family. ATM has multiple complex functions, including a central role in the repair of DNA double-strand breaks, and ATM mutations have been shown to confer susceptibility to breast cancer [36,37]. The mutation rate of this gene in the TCGA breast cancer samples is 2.19% (Table 2).
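The text does not state how a mutation rate is computed. Assuming it is the fraction of the N = 776 TCGA samples in which gene g carries at least one somatic mutation, each reported percentage can be converted back to an approximate sample count n_g. The counts below are our own back-calculation under that assumption, not values read from Table 2.

```latex
\begin{align*}
r_g &= \frac{n_g}{N}, \qquad N = 776 \\
n_{\mathrm{MAP2K4}} &\approx 0.0412 \times 776 \approx 32, \qquad \tfrac{32}{776} \approx 4.12\% \\
n_{\mathrm{NF1}} &\approx 0.0245 \times 776 \approx 19, \qquad \tfrac{19}{776} \approx 2.45\% \\
n_{\mathrm{PIK3R1}} &\approx 0.0271 \times 776 \approx 21, \qquad \tfrac{21}{776} \approx 2.71\% \\
n_{\mathrm{ATM}} &\approx 0.0219 \times 776 \approx 17, \qquad \tfrac{17}{776} \approx 2.19\%
\end{align*}
```

That each reported rate maps onto a whole number of samples out of 776 is consistent with, but does not prove, this definition.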
In addition to the tumor suppressor genes above, five oncogenes, ERBB2, AKT1, CBFB, RUNX1 and PIK3CA, are frequently mutated in the TCGA breast cancer samples (Table 2). ERBB2, also known as HER2/neu, is a well-characterized oncogene that contributes to the development and progression of breast cancer. Overexpression of ERBB2 has been associated with poor prognosis in breast cancer [38]. This gene has a mutation rate of 1.68% in the TCGA breast cancer samples. AKT1 encodes a serine/threonine kinase implicated in the control of cellular metabolism, survival and growth. Hyperactivation of AKT1 is significantly associated with breast cancer progression [39]. CBFB encodes the beta subunit of the PEBP2/CBF transcription factor, which partners with RUNX proteins such as RUNX1 and RUNX2 to regulate a large set of target genes. RUNX1 encodes a transcription factor of the Runt-related transcription factor (RUNX) family; it forms a heterodimeric complex with CBFB, which enhances DNA binding and stabilizes the complex. Mutations in CBFB and RUNX1 have been implicated in breast cancer [40].
Among the frequently mutated genes in Table 2, nine are kinase-encoding genes: PIK3CA, AKT1, ATM, ERBB2, ERBB3, MAP2K4, MAP3K1, MTOR, and PRKDC. Because many kinase-encoding genes are mutated or upregulated in cancers, developing anticancer drugs that inhibit protein kinases has become an active research area in cancer biology [11]. These frequently mutated kinase genes could be potential targets for treating breast cancer. In fact, ERBB2, AKT, MTOR and PIK3CA (a PI 3-kinase) have been suggested by the National Cancer Institute (NCI) as targets of breast cancer therapy.

Mutation Status of the Agendia Symphony Genes in Breast Cancers
The Agendia Symphony Breast Cancer Decision Suite (MammaPrint, BluePrint, TargetPrint and TheraPrint) is used to assess the prognosis of breast cancer patients and to help oncologists choose an appropriate treatment program by measuring the expression of gene signatures [1]. MammaPrint is a diagnostic tool that predicts the risk of breast cancer metastasis from the expression of 70 genes [41]. BluePrint is an 80-gene expression profile that classifies breast cancers into basal, luminal and ERBB2 (HER2) molecular subtypes; this molecular subtyping assists in therapeutic decision-making. TargetPrint is a microarray-based assessment of ER (estrogen receptor), PR (progesterone receptor) and HER2 (human epidermal growth factor receptor 2) expression for breast cancer treatment management. TheraPrint identifies potential additional therapies that might be more effective in treating a tumor, based on the expression profiles of 55 genes. Table 3 lists the genes that are mutated in at least five tumor samples; the mutation statistics for all the Agendia Symphony genes are summarized in supplementary Table S2.
The gene signatures in the Agendia Symphony Breast Cancer Decision Suite were identified from gene expression profiles, yet some of their genes are frequently mutated in breast cancers (Table 3). For example, the BluePrint signature genes GATA3, FOXA1, ERBB2, MYB, and PREX1 are mutated in at least 10 tumor samples. Two of the three TargetPrint genes are mutated in at least five tumor samples. The TheraPrint signature genes PIK3CA, CDH1, PIK3R1, AKT1, BRCA2, BRCA1 and ERBB3 are mutated in at least 10 tumor samples.
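Table 3 amounts to intersecting each Agendia signature with per-gene counts of mutated samples and applying a threshold. The sketch below assumes those counts have already been computed, for example by counting the distinct tumor samples with at least one mutation in each gene. The function name `mutated_signature_genes`, the signature names and the counts in the example are placeholders, not the real BluePrint, TargetPrint or TheraPrint gene lists or TCGA counts.

```python
from typing import Dict, List, Set


def mutated_signature_genes(
    signatures: Dict[str, Set[str]],
    mutated_samples: Dict[str, int],
    min_samples: int = 5,
) -> Dict[str, List[str]]:
    """For each signature, list its genes mutated in at least min_samples tumor samples."""
    result = {}
    for name, genes in signatures.items():
        # Genes absent from mutated_samples are treated as never mutated.
        hits = [g for g in genes if mutated_samples.get(g, 0) >= min_samples]
        # Sort by descending count, then gene name, so the output order is deterministic.
        result[name] = sorted(hits, key=lambda g: (-mutated_samples[g], g))
    return result


# Placeholder inputs for illustration only.
signatures = {"SigA": {"G1", "G2", "G3"}, "SigB": {"G2", "G4"}}
counts = {"G1": 12, "G2": 5, "G4": 3}
print(mutated_signature_genes(signatures, counts))
# prints: {'SigA': ['G1', 'G2'], 'SigB': ['G2']}
```

Because a gene such as ERBB2 can belong to more than one signature, the result is keyed by signature rather than merged into a single gene list.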

Mutation Status of the Ion AmpliSeq Cancer Genes in Breast Cancers
The Ion Torrent AmpliSeq Cancer Hotspot Panel v2 (CHPv2) identifies somatic mutations in hotspot regions of 50 oncogenes and tumor suppressor genes, with wide coverage of the KRAS, BRAF, and EGFR genes [42]. The Ion AmpliSeq™ Comprehensive Cancer Panel targets the exons of 409 frequently mutated tumor suppressor genes and oncogenes. This test interrogates coding DNA sequences and splice variants across multiple cancer driver genes and drug targets involved in apoptosis, DNA repair, transcriptional regulation, inflammatory response, and cell growth. Table 4 lists the CHPv2 genes that have mutations in at least five tumor samples, and

Conclusions
Breast cancer is one of the few cancer types for which targeted therapies have been successfully designed based on molecular classification [43]. However, the high variability of gene expression limits the clinical application of expression-based classification. Because genetic mutations in DNA can be detected stably and mutational heterogeneity is widespread in breast cancer genomes, mutation profiles could be more advantageous than gene expression profiles for developing breast cancer biomarkers. In this study, we broadly examined the mutation status of oncogenes and tumor suppressor genes in the high-quality TCGA breast cancer samples. We found that several oncogenes, such as PIK3CA, ERBB2, AKT1, CBFB and RUNX1, and several tumor suppressor genes, such as TP53, BRCA1, BRCA2, GATA3, PTEN, RB1, CDH1, KDM6A, MAP2K4, NF1, PIK3R1, SETD2 and ATM, are frequently mutated in breast cancers. Several kinase-encoding genes, such as PIK3CA, AKT1, ATM, ERBB2, ERBB3, MAP2K4, MAP3K1, MTOR, and PRKDC, were also frequently mutated. These kinase-encoding genes could be good targets for breast cancer therapy, because effective drugs targeting protein kinases have already been developed [44].
We examined the mutation status of the genes in the FoundationOne, Ion Torrent AmpliSeq Cancer Hotspot Panel v2, and Ion AmpliSeq™ Comprehensive Cancer Panel gene panels in the TCGA breast cancer samples. These panel genes are highly cancer-related, with mutations detected across many cancer types. Some of them, such as PIK3CA, TP53, GATA3, MAP3K1 and CDH1, were found to be frequently mutated in breast cancers, suggesting that common mutational mechanisms underlie different types of cancer. We also examined the mutation status of the gene signatures in the Agendia Symphony Breast Cancer Decision Suite and found that most of these genes are infrequently mutated in breast cancers, with the exception of PIK3CA, GATA3 and CDH1. This finding suggests that there is no direct correlation between gene expression and gene mutation, and that gene expression profiles and gene mutation profiles could be complementary in the molecular characterization of cancer.