Received date: December 09, 2015; Accepted date: March 09, 2016; Published date: March 14, 2016
Citation: Ashktorab H, Azimi H, Nickerson M, Bass S, Varma S, et al. (2016) Targeted Exome Sequencing Outcome Variations of Colorectal Tumors within and across Two Sequencing Platforms. Next Generat Sequenc & Applic 3:123. doi:10.4172/2469-9853.1000123
Copyright: © 2016 Ashktorab H, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Visit for more related articles at Journal of Next Generation Sequencing & Applications
Background and Aim: Next generation sequencing (NGS) has quickly become the tool of choice for genome and exome data generation. The multitude of sequencing platforms, as well as the variability within each platform, needs to be assessed. In this paper, we used two platforms (Ion Torrent and Illumina) to assess single nucleotide variants (SNVs) in colorectal cancer (CRC) specimens.
Methods: CRC specimens (n = 13) collected from 6 CRC patients (cancer and matched normal tissue) were used to establish mutational profiles on the Ion Torrent and Illumina sequencing platforms. We analyzed a set of formalin-fixed paraffin-embedded (FFPE) and fresh frozen (FF) samples on both platforms to assess the effect of sample type (FFPE vs. FF) on sequencing outcome and to evaluate the similarities and differences in SNVs across the two platforms. In addition, duplicates of fresh frozen samples were sequenced on each platform to assess within-platform variability.
Results: Fresh frozen replicates compared with each other showed a concordance of 77% (± 15.3%) on Ion Torrent and 70% (± 3.7%) on Illumina. FFPE vs. fresh frozen replicates showed a concordance of 40% (± 32%) on Ion Torrent and 49% (± 19%) on Illumina. Cross-platform concordance averaged 75% (± 9.8%) for FFPE samples and 67% (± 32%) for fresh frozen samples, with an overall average of 70% (± 26.8%).
Conclusion: Our data show considerable variability within and across platforms. The number of detected variants also depends on the nature of the specimen (fresh frozen vs. FFPE). Validation of NGS-discovered mutations is essential to rule out false-positive calls. This validation can be performed either with a second NGS platform or by Sanger sequencing.
Keywords: Sequencing technology; Mutations; Nucleotides; Tumors
Next generation sequencing (NGS) technology will enable physicians to better direct the care of their patients based on their mutational profiles, especially in diseases such as cancer, where multiple genes and mutations are involved. Detection of driver and passenger mutations in tumor specimens will aid in the selection of targeted therapies. This technology is advancing the knowledge needed to tailor treatment to individual patients.
NGS has evolved exponentially over the past decade, and many technologies and chemistries are now available. Detection of clinically relevant driver and passenger mutations in diagnostic tumor specimens aids in the selection of targeted therapeutics. NGS is proving more effective than traditional approaches at revealing the genetic landscape associated with tumor development, with applications in the prevention, diagnosis, and management of diseases such as colorectal cancer, the focus of this study [2-4].
Single nucleotide variants (SNVs) play a crucial role in colorectal cancer predisposition, initiation, and development [5-7]. The whole genome may not need to be sequenced to identify genetic alterations in most colorectal cancer-associated genes and pathways: more than 85% of pathogenic mutations are found within the protein-coding regions of the genome. Exome or even targeted exome NGS therefore offers a cheaper and faster alternative to whole genome sequencing, provided that mutations are detected accurately.
NGS is based on the principle of massively parallel sequencing: millions of DNA fragments are sequenced at the same time. First, DNA is fragmented into short segments to create a shotgun library. Adaptors, short DNA sequences containing primer binding sites for subsequent amplification, are ligated to the ends of each fragment. The shotgun library can then be enriched for the sequences of interest using diverse approaches [9,10]. Illumina, a leader in the NGS field, adopted a sequencing-by-synthesis approach that uses fluorescently labeled reversible-terminator nucleotides on clonally amplified DNA templates immobilized on an acrylamide coating on the surface of a glass flow cell. Illumina instruments, including the HiSeq and MiSeq, have set the standard both for high-throughput massively parallel sequencing (HiSeq) and for lower-throughput, fast-turnaround sequencing (MiSeq). Illumina sequencing instruments and reagents support massively parallel sequencing using a proprietary method that detects single bases as they are incorporated into growing DNA strands. A fluorescently labeled reversible terminator is imaged as each dNTP is added and is then cleaved to allow incorporation of the next base. Because all four reversible terminator-bound dNTPs are present during each sequencing cycle, natural competition minimizes incorporation bias. The result is true base-by-base sequencing that enables accurate data generation [12,14-17]. The method is designed to reduce the errors and missed calls associated with strings of repeated nucleotides (homopolymers).
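The one-base-per-cycle behavior described above can be illustrated with a toy model. This is a minimal Python sketch under idealized assumptions (no phasing, no dye cross-talk, no errors); the function name `reversible_terminator_read` is hypothetical, and the code does not represent Illumina's software or base caller.

```python
# Toy model of cyclic reversible-terminator sequencing (idealized, hypothetical).

# Watson-Crick pairing: the base incorporated opposite each template base.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reversible_terminator_read(template: str) -> list[str]:
    """Return the base called in each cycle for a template strand.

    Assumes ideal chemistry: in every cycle all four labeled, 3'-blocked
    dNTPs compete, exactly one is incorporated and imaged, and the dye and
    blocking group are then cleaved so the next cycle can proceed.
    """
    calls = []
    for template_base in template:
        # The terminator allows only one incorporation per cycle,
        # even inside a homopolymer run.
        incorporated = COMPLEMENT[template_base]
        calls.append(incorporated)  # the fluorescent label is imaged here
    return calls


if __name__ == "__main__":
    calls = reversible_terminator_read("TTTTAGC")
    print(len(calls), "cycles:", "".join(calls))  # 7 cycles: AAAATCG
```

The four-base run TTTT takes four separate cycles, so its length is counted directly rather than inferred from signal intensity.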
Another NGS platform in use is Ion Torrent semiconductor sequencing, which detects the hydrogen ions released during DNA polymerization. This is also a sequencing-by-synthesis method [14-16]. A microwell containing a template DNA strand to be sequenced is flooded with a single species of deoxyribonucleotide triphosphate (dNTP). If the introduced dNTP is complementary to the next template nucleotide, it is incorporated into the growing complementary strand. Each incorporation releases a hydrogen ion that is detected by an ion-sensitive field-effect transistor (ISFET) sensor, indicating that a reaction has occurred [14-16]. If homopolymer repeats are present in the template sequence, multiple dNTP molecules are incorporated in a single cycle, releasing a corresponding number of hydrogen ions and producing a proportionally higher electronic signal [14-16].
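By contrast, a flow-based read is a series of signal intensities, one per nucleotide flow. The following minimal Python sketch assumes noise-free integer signals and a fixed, repeating flow order `TACG`; both the flow order and the function names are illustrative assumptions rather than the Torrent Suite implementation.

```python
# Toy model of flow-based (Ion Torrent-style) signals (idealized, hypothetical).

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
FLOW_ORDER = "TACG"  # assumed repeating flow order, for illustration only


def flowgram(template: str, n_flows: int) -> list[tuple[str, int]]:
    """Return (flowed dNTP, ideal signal) pairs for the first n_flows flows.

    The ideal signal equals the number of bases incorporated in that flow,
    i.e. the length of the homopolymer run matched by the flowed dNTP.
    """
    # The strand being synthesized is the complement of the template.
    target = "".join(COMPLEMENT[b] for b in template)
    pos = 0
    signals = []
    for i in range(n_flows):
        nucleotide = FLOW_ORDER[i % len(FLOW_ORDER)]
        run = 0
        # Keep incorporating while the next needed base matches this dNTP.
        while pos < len(target) and target[pos] == nucleotide:
            run += 1
            pos += 1
        signals.append((nucleotide, run))  # H+ released ~ bases incorporated
    return signals


def call_bases(signals: list[tuple[str, int]]) -> str:
    """Rebuild the synthesized strand from ideal integer signals."""
    return "".join(nucleotide * count for nucleotide, count in signals)


if __name__ == "__main__":
    signals = flowgram("TTTTAGC", 8)
    print(signals)
    print(call_bases(signals))  # AAAATCG
```

With template TTTTAGC, the A flow yields a signal of 4 for the complementary AAAA run. On a real instrument the signal is noisy, and distinguishing a signal of 4 from 5 is harder than distinguishing 1 from 2, which is why homopolymer length errors (insertions and deletions) are characteristic of this chemistry.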
In this study, we compared CRC mutational profiles generated on two platforms, Ion Torrent and Illumina, to examine data reproducibility within and across sequencing platforms. This has major implications for patients: because disease management and therapy design depend on the mutational profile of the tumor, that profile must be determined as accurately as possible.
The 13 samples used in this study consisted of CRC tumors and adjacent normal tissue from 6 CRC patients (Table 1). These samples were either fresh frozen or FFPE. Subjects with familial adenomatous polyposis, hereditary nonpolyposis colorectal cancer, or a family history of CRC were excluded. The study was approved by the Institutional Review Board of Howard University, and written informed consent was obtained from all patients. In addition, replicates of fresh frozen (n = 4) and FFPE (n = 5) samples were sequenced on both platforms to gauge reproducibility between sequencing runs.
(a) Total variants

| Platform | Sample | Total | Per-gene counts (15 common genes) | | | | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Ion Torrent | CC1053-AA-Normal-fresh frozen | 39 | 2 | 3 | 1 | 3 | 2 | 3 | 4 | 6 | 1 | 8 | 0 | 1 | 1 | 1 | 3 |
| Ion Torrent | CC1054-AA-CRC-fresh frozen | 86 | 3 | 15 | 7 | 7 | 3 | 6 | 14 | 12 | 1 | 9 | 2 | 1 | 1 | 1 | 4 |
| Ion Torrent | CC1057-AA-Normal-fresh frozen | 55 | 1 | 12 | 0 | 4 | 2 | 3 | 5 | 10 | 1 | 7 | 2 | 0 | 2 | 3 | 3 |
| Ion Torrent | CC1057-AA-Normal-fresh frozen | 57 | 1 | 12 | 1 | 2 | 6 | 2 | 3 | 15 | 0 | 5 | 2 | 0 | 2 | 3 | 3 |
| Ion Torrent | CC1057-AA-CRC-fresh frozen | 52 | 1 | 10 | 0 | 5 | 2 | 3 | 5 | 9 | 1 | 7 | 2 | 0 | 2 | 3 | 2 |
| Ion Torrent | CC1057-AA-CRC-fresh frozen | 58 | 1 | 12 | 1 | 5 | 2 | 3 | 5 | 10 | 1 | 7 | 2 | 0 | 2 | 4 | 3 |
| Ion Torrent | CC1059-AA-CRC-fresh frozen | 49 | 2 | 4 | 3 | 4 | 1 | 3 | 14 | 11 | 1 | 2 | 0 | 1 | 0 | 1 | 2 |
| Ion Torrent | CC1059-AA-CRC-fresh frozen | 49 | 3 | 4 | 3 | 4 | 1 | 3 | 13 | 11 | 1 | 2 | 0 | 1 | 0 | 2 | 1 |
| Ion Torrent | CC1059-AA-Normal-fresh frozen | 48 | 2 | 4 | 3 | 4 | 1 | 3 | 16 | 10 | 1 | 2 | 0 | 1 | 0 | 1 | 0 |
| Ion Torrent | CC1059-AA-Normal-fresh frozen | 43 | 2 | 4 | 1 | 4 | 1 | 3 | 12 | 11 | 1 | 2 | 0 | 1 | 0 | 1 | 0 |

(b) Nonsynonymous mutations

| Platform | Sample | Total | Per-gene counts (15 common genes) | | | | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Ion Torrent | CC1053-AA-Normal-fresh frozen | 33 | 3 | 1 | 0 | 2 | 2 | 2 | 4 | 5 | 1 | 8 | 0 | 0 | 1 | 1 | 3 |
| Ion Torrent | CC1054-AA-CRC-fresh frozen | 67 | 8 | 2 | 3 | 6 | 3 | 4 | 14 | 9 | 1 | 9 | 2 | 1 | 1 | 0 | 4 |
| Ion Torrent | CC1057-AA-Normal-fresh frozen | 41 | 7 | 0 | 0 | 3 | 2 | 2 | 5 | 6 | 1 | 7 | 2 | 0 | 2 | 1 | 3 |
| Ion Torrent | CC1057-AA-Normal-fresh frozen | 41 | 7 | 0 | 1 | 1 | 5 | 1 | 3 | 10 | 0 | 5 | 2 | 0 | 2 | 1 | 3 |
| Ion Torrent | CC1057-AA-CRC-fresh frozen | 38 | 5 | 0 | 0 | 4 | 2 | 2 | 5 | 5 | 1 | 7 | 2 | 0 | 2 | 1 | 2 |
| Ion Torrent | CC1057-AA-CRC-fresh frozen | 44 | 7 | 0 | 1 | 4 | 2 | 2 | 5 | 6 | 1 | 7 | 2 | 0 | 2 | 2 | 3 |
| Ion Torrent | CC1059-AA-CRC-fresh frozen | 39 | 3 | 1 | 2 | 3 | 1 | 2 | 14 | 8 | 1 | 2 | 0 | 0 | 0 | 0 | 2 |
| Ion Torrent | CC1059-AA-CRC-fresh frozen | 39 | 3 | 2 | 2 | 3 | 1 | 2 | 13 | 8 | 1 | 2 | 0 | 0 | 0 | 1 | 1 |
| Ion Torrent | CC1059-AA-Normal-fresh frozen | 38 | 3 | 1 | 2 | 3 | 1 | 2 | 16 | 7 | 1 | 2 | 0 | 0 | 0 | 0 | 0 |
| Ion Torrent | CC1059-AA-Normal-fresh frozen | 33 | 3 | 1 | 0 | 3 | 1 | 2 | 12 | 8 | 1 | 2 | 0 | 0 | 0 | 0 | 0 |
Table 1: Distribution of total variants (a) and nonsynonymous mutations (b) in analyzed samples.
Targeted exome sequencing
The 13 samples underwent 29 sequencing runs: 16 on the Illumina platform and 13 on the Ion Torrent platform (see the supplemental Excel file for details). DNA quantification and quality assessment, Illumina DNA library preparation, SNV calling, comparison with public genome data, sequencing validation, SNV description, mutation frequencies, and copy number alterations have been described previously [6,18,19]. We used a panel of 20 genes on the Illumina platform (ACVR2A, AMER1, APC, ARID1A, BRAF, FBXW7, KRAS, MSH2, MSH3, MSH6, NRAS, PIK3CA, POLE, PTEN, SMAD2, SMAD4, SOX9, TCF7L2, TGFBR2 and TP53).
Targeted sequencing by Ion Torrent
Targeted sequencing (TS) was performed at the Cancer Genomics Research Laboratory at the National Cancer Institute (NCI). A targeted, multiplex PCR primer panel was designed using the custom Ion AmpliSeq Designer v1.2 (Thermo Fisher Scientific, Grand Island, NY). The primer panel covered 56.9 kb, including the coding regions of 20 genes, with an average coverage of 96.9%. The panel was designed using FFPE settings, with an average amplicon size of 150 bp. Sample DNA (20 ng per primer pool) was amplified using this custom AmpliSeq primer panel, and libraries were prepared following the manufacturer’s Ion AmpliSeq Library Preparation protocol (Thermo Fisher Scientific). Individual samples were barcoded, pooled, templated, and sequenced on the Ion Torrent Proton sequencer using the Ion PI Template OT2 200 v3 and Ion PI Sequencing 200 v2 kits according to the manufacturer’s instructions.
Analysis methods for Ion Torrent targeted sequencing
Raw sequencing reads generated by the Ion Torrent sequencer were quality- and adaptor-trimmed with Torrent Suite 4.0.4 and then aligned to the hg19 reference sequence with TMAP using default parameters. The resulting BAM files were processed through an in-house quality control (QC) and coverage analysis pipeline, which generated coverage summary plots. Indels in the aligned BAM files were left-aligned using the GATK LeftAlignIndels tool. Amplicon primers were trimmed from aligned reads by Torrent Suite. Variants were called and filtered with Torrent Variant Caller 4.0. We used a panel of 15 genes on the Ion Torrent platform (AMER1, APC, ARID1A, BRAF, FBXW7, KRAS, MSH3, MSH6, NRAS, PIK3CA, SMAD4, SOX9, TCF7L2, TGFBR2 and TP53).
Analysis methods for Illumina targeted sequencing
Illumina sequencing data generation, read assembly, and annotation were performed as previously described by Ashktorab et al.
Genomic DNA from each patient’s tissue sample was fragmented and hybridized to commercially available capture arrays for enrichment. Ion Torrent sequencing was used for the discovery set, and a HiSeq platform (Illumina, San Diego, CA) for the validation set. We used R software (version 3.1.0, http://www.r-project.org/) to compare the variants in the normal and tumor samples with those in the 1000 Genomes database, which represents a nominally noncancerous population. All samples displayed roughly equal numbers of SNVs in their tumors and their matched normal samples. We compared our results with those of The Cancer Genome Atlas (TCGA) network. Somatic mutations were annotated with ANNOVAR.
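The comparison against matched normal samples and a population database can be sketched as set subtraction. The authors used R for this step; the Python below is only an illustration, assuming that variants are matched on exact chromosome, position, reference, and alternate allele. The `Variant` type, the `candidate_somatic` function, and the coordinates in the example are hypothetical.

```python
# Sketch: candidate somatic SNVs by subtraction (hypothetical helper names).
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int  # position on the reference (hg19 in this study)
    ref: str
    alt: str


def candidate_somatic(tumor: set[Variant], normal: set[Variant],
                      population: set[Variant]) -> set[Variant]:
    """Tumor variants absent from both the matched normal and a population set.

    `population` stands in for variants observed in a noncancerous reference
    population such as 1000 Genomes; loading that resource is out of scope.
    """
    return tumor - normal - population


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    germline = Variant("chr5", 100, "A", "G")  # also present in the normal
    common = Variant("chr17", 200, "C", "T")   # present in the population set
    novel = Variant("chr12", 300, "G", "A")    # tumor-only
    print(candidate_somatic({germline, common, novel}, {germline}, {common}))
```

Real pipelines also weigh read-level evidence in the normal sample; an exact-match subtraction like this one is only a first-pass filter.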
Concordance of variants is more consistent on Illumina
We analyzed a set of tumor and normal samples from formalin-fixed paraffin-embedded (FFPE) and fresh frozen sources on both the Ion Torrent and Illumina platforms. In addition, replicates of fresh frozen samples were sequenced on both platforms to gauge reproducibility between sequencing runs. Table 1 gives the number of FFPE and fresh frozen (FF) replicates run on each platform.
The concordance between two replicates was calculated as the number of variants shared by both replicates, expressed as a percentage of the total number of variants with at least one non-reference allele in either replicate. Within each platform, we kept only variants located in the targeted regions.
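The concordance defined above is the size of the intersection of two call sets divided by the size of their union, times 100, after both sets are restricted to the targeted regions. A minimal Python sketch follows; it assumes that two calls are "the same" when chromosome, position, reference, and alternate allele all match (genotype is ignored), and the `Variant`, `in_targets`, and `concordance` names and the example intervals are hypothetical.

```python
# Sketch of replicate concordance: 100 * |A & B| / |A | B| over on-target calls.
import math
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


# Target interval: (chromosome, start, end), 1-based and inclusive (assumed).
Interval = tuple[str, int, int]


def in_targets(v: Variant, targets: list[Interval]) -> bool:
    """True if the variant falls inside any targeted interval."""
    return any(v.chrom == c and start <= v.pos <= end for c, start, end in targets)


def concordance(rep1: set[Variant], rep2: set[Variant],
                targets: list[Interval]) -> float:
    """Percent of on-target variants called in both replicates.

    Denominator: variants with a non-reference call in either replicate
    (the union). Numerator: variants called in both (the intersection).
    """
    a = {v for v in rep1 if in_targets(v, targets)}
    b = {v for v in rep2 if in_targets(v, targets)}
    union = a | b
    if not union:
        return math.nan  # undefined when neither replicate has on-target calls
    return 100.0 * len(a & b) / len(union)


if __name__ == "__main__":
    targets = [("chr1", 100, 200)]
    rep1 = {Variant("chr1", 110, "A", "G"), Variant("chr1", 150, "C", "T"),
            Variant("chr1", 500, "G", "A")}  # position 500 is off-target
    rep2 = {Variant("chr1", 110, "A", "G"), Variant("chr1", 180, "T", "C")}
    print(round(concordance(rep1, rep2, targets), 1))  # 33.3
```

Filtering to targets before taking the union matters: an off-target call in one replicate would otherwise inflate the denominator and lower the concordance.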
For cross-platform comparisons, we restricted the variants to those in the 15 genes common to both platforms (AMER1, APC, ARID1A, BRAF, FBXW7, KRAS, MSH3, MSH6, NRAS, PIK3CA, SMAD4, SOX9, TCF7L2, TGFBR2 and TP53). Fresh frozen replicates compared with each other gave the highest concordance on either platform (average concordance 77% (± 15.3%) on Ion Torrent and 70% (± 3.7%) on Illumina). FFPE vs. fresh frozen replicates gave much lower and more variable concordance (40% (± 32%) on Ion Torrent and 49% (± 19%) on Illumina). Cross-platform concordance for the same sample sequenced on the two platforms was reasonably high, with lower variability for FFPE than for fresh frozen samples (average of 75% (± 9.8%) for FFPE samples, 67% (± 32%) for fresh frozen samples, and 70% (± 26.8%) overall).
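The gene restriction and the summary statistics in this paragraph can be reproduced in a few lines. The panels below are copied from the gene lists given in the Methods; `restrict_to_common` and `summarize` are hypothetical helpers, and treating the ± values as sample standard deviations is an assumption, since the text does not state which dispersion measure was used.

```python
# Sketch: restrict calls to genes on both panels and summarize concordances.
from statistics import mean, stdev

ILLUMINA_PANEL = {
    "ACVR2A", "AMER1", "APC", "ARID1A", "BRAF", "FBXW7", "KRAS", "MSH2",
    "MSH3", "MSH6", "NRAS", "PIK3CA", "POLE", "PTEN", "SMAD2", "SMAD4",
    "SOX9", "TCF7L2", "TGFBR2", "TP53",
}
ION_TORRENT_PANEL = {
    "AMER1", "APC", "ARID1A", "BRAF", "FBXW7", "KRAS", "MSH3", "MSH6",
    "NRAS", "PIK3CA", "SMAD4", "SOX9", "TCF7L2", "TGFBR2", "TP53",
}

# Cross-platform comparisons use only genes targeted by both panels.
COMMON_GENES = ILLUMINA_PANEL & ION_TORRENT_PANEL


def restrict_to_common(calls: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Keep (gene, variant_id) records whose gene is on both panels."""
    return [(gene, vid) for gene, vid in calls if gene in COMMON_GENES]


def summarize(percentages: list[float]) -> tuple[float, float]:
    """Mean and sample standard deviation of per-pair concordances (needs >= 2)."""
    return mean(percentages), stdev(percentages)


if __name__ == "__main__":
    print(len(COMMON_GENES))                           # 15
    print(sorted(ILLUMINA_PANEL - ION_TORRENT_PANEL))  # Illumina-only genes
```

The Illumina-only genes (ACVR2A, MSH2, POLE, PTEN, SMAD2) are dropped before any cross-platform comparison, so variants in them cannot count as platform discordance.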
Detailed outcomes for each sample are presented in the supplemental table (Excel file). We tabulated the total number of variants and the number of nonsynonymous variants in each sample, both overall and per targeted gene. These findings show major variations in both the number (e.g., CC1054) and the type (e.g., CC1029) of detected mutations. The nature of the sample also plays a major role: sample CC1057, for example, has 12 mutations in its FF sample and only 1 in its FFPE version. Eight of these 12 detected mutations were nonsynonymous (Table 1).
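Per-sample tabulation of this kind amounts to counting calls per gene, once for all variants and once for nonsynonymous ones; in Table 1, the first count in each row equals the sum of the 15 per-gene counts that follow it. The sketch below assumes each call has already been annotated with a consequence label; the label set and the `tabulate` helper are hypothetical simplifications, not ANNOVAR's actual vocabulary or output.

```python
# Sketch: per-gene counts of all variants and of nonsynonymous variants.
from collections import Counter

# Hypothetical consequence labels treated as nonsynonymous (amino-acid changing).
NONSYNONYMOUS = {"missense", "nonsense"}


def tabulate(calls: list[tuple[str, str]]) -> tuple[Counter, Counter]:
    """Return (all-variant counts, nonsynonymous counts) keyed by gene."""
    total = Counter(gene for gene, _ in calls)
    nonsyn = Counter(gene for gene, csq in calls if csq in NONSYNONYMOUS)
    return total, nonsyn


if __name__ == "__main__":
    calls = [("APC", "missense"), ("APC", "synonymous"),
             ("TP53", "nonsense"), ("KRAS", "synonymous")]  # made-up calls
    total, nonsyn = tabulate(calls)
    # Row totals are the sums of the per-gene counts.
    print(sum(total.values()), sum(nonsyn.values()))  # 4 2
```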
NGS has the potential to allow the discovery of new target genes for prevention, treatment, and diagnostic purposes. However, many platforms are available on the market, and they differ in library construction protocols, sequencing chemistries, and bioinformatics analysis pipelines. In this study, we used the two most common platforms (Ion Torrent and Illumina), and we discuss the effectiveness, strengths, and limitations associated with NGS mutational profiling. All platforms have library preparation protocols that involve fragmenting genomic DNA and attaching specific adapter sequences; this typically takes 4 to 8 hours per sample. In addition, Ion Torrent template preparation includes a two-hour emulsion PCR and a template bead enrichment step.
The generation of precise and reproducible sequencing results is multifactorial and depends on correct laboratory practice and on the computational pipeline used to analyze the NGS data. A MiSeq sequencing run takes 27 hours, compared with 2 hours for Ion Torrent. The reported accuracy of MiSeq is mostly > Q30, with an observed error rate of 0.80%, compared with a reported accuracy of Q20 and an observed raw error rate of 1.71% for Ion Torrent. The sequence yield per run is 1.5-2 Gb for MiSeq, compared with 20-50 Mb (314 chip), 100-200 Mb (316 chip), and 1 Gb (318 chip) for Ion Torrent. These differences in chemistry and instrument operation, as well as the specifics of the bioinformatics pipeline associated with each instrument, certainly account for some of the variation in SNV outcomes for the same samples shown in Table 1 and the supplemental table. They cannot, however, explain the major discrepancies in variant outcomes observed for samples of the same nature (fresh frozen) run in duplicate on the same platform.
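The Q values quoted here are Phred-scaled: an error probability P corresponds to Q = -10 log10 P, so Q30 means 1 error in 1,000 bases (0.1%) and Q20 means 1 in 100 (1%). The short Python sketch below converts between the two scales; the helper names are hypothetical. Applied to the observed error rates above, 0.80% corresponds to about Q21 and 1.71% to about Q17.7.

```python
# Sketch: converting between Phred quality scores and error probabilities.
import math


def phred_to_error(q: float) -> float:
    """Error probability implied by a Phred score: P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Phred score for an error probability p, with 0 < p <= 1."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    for q in (20, 30):
        print(f"Q{q}: error rate {phred_to_error(q):.2%}")
    for label, rate in (("MiSeq observed", 0.0080), ("Ion Torrent observed", 0.0171)):
        print(f"{label}: Q{error_to_phred(rate):.1f}")
```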
It is noteworthy that the data presented here correspond to a targeted exome sequencing gene panel (20 genes for the Illumina platform and 15 for Ion Torrent; the 15 common genes are shown in Table 1 and the supplemental table). Targeted sequencing achieves high coverage and therefore a lower error rate and better mutation calling. Had we performed the same comparison at whole-exome or whole-genome scale, the variability in outcomes would likely have increased substantially, giving an inaccurate picture of the genomic/exomic landscape of the analyzed specimens.
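One way to see why deeper coverage helps mutation calling: if a variant is reported whenever its allele fraction reaches a fixed threshold, random sequencing errors are far less likely to reach that threshold at high depth. The Python sketch below models this with a binomial distribution, assuming independent errors that each produce one specific wrong base with probability error_rate / 3; the function name and parameters are hypothetical, and real variant callers such as Torrent Variant Caller use more elaborate models. Systematic errors, such as homopolymer errors or damage in FFPE DNA, are not independent and are not removed by depth in this way.

```python
# Sketch: chance that random errors alone pass a variant-allele-fraction cutoff.
from math import ceil, comb


def p_false_call(depth: int, error_rate: float, min_vaf: float) -> float:
    """P(enough reads carry the same wrong base to pass the cutoff).

    Assumes independent per-read errors, each yielding one specific wrong
    base with probability error_rate / 3, and a caller that reports a
    variant when the alternate fraction is >= min_vaf (at least one read).
    """
    p = error_rate / 3
    k_min = max(1, ceil(min_vaf * depth))
    # Upper tail of Binomial(depth, p).
    return sum(comb(depth, k) * p**k * (1 - p) ** (depth - k)
               for k in range(k_min, depth + 1))


if __name__ == "__main__":
    for depth in (20, 100, 500):
        print(depth, p_false_call(depth, error_rate=0.01, min_vaf=0.05))
```

With a 1% error rate and a 5% cutoff, a single erroneous read is enough at 20x (probability roughly 0.065), whereas at 500x at least 25 reads must share the same error, which this model makes vanishingly unlikely.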
Had we considered only the Ion Torrent or only the Illumina targeted exome sequencing data for some of our CRC samples, we would in effect have been describing two different tumors, as the generated mutational profiles were starkly different. The growing volume of exomic and genomic data submitted to public databases will likely introduce substantial artefactual mutation noise, making it hard for researchers to distinguish real from artificial mutations. Such data now need to be curated so that only validated mutations are accepted, each with a full description of how it was generated (sequencing platform, protocols, and bioinformatics processing). Only variants validated through a second NGS platform or through Sanger sequencing should be reported.
For our CRC samples, we retain only mutations that were reproducible on the same platform or detected by the second sequencing platform. This process allows these tumors to be better characterized in light of their clinical and pathological features, so that care management fits the mutational profile. In conclusion, as a general rule, we suggest that any detected mutation be validated before it is functionally analyzed. The weight of detected SNVs must be established in light of other processes, such as copy number variations and epigenetic alterations, through integrative analyses [18,19].
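The retention rule described here can be written as set operations over call sets. This is a minimal Python sketch under several assumptions: variants are represented by matching identifiers (for example, strings such as "chr1:100:A>G"), "reproducible" means called in every replicate when at least two replicates exist, and the confirming call set may come from the second NGS platform or from Sanger sequencing. The function name `retained_variants` is hypothetical.

```python
# Sketch: keep variants reproducible within a platform or confirmed by another.


def retained_variants(replicates: list[set[str]], confirming: set[str]) -> set[str]:
    """Apply the retention rule to one platform's calls for a sample.

    replicates: call sets from repeated runs on the same platform.
    confirming: calls from an orthogonal method (second platform or Sanger).
    """
    if not replicates:
        return set()
    all_calls = set.union(*replicates)
    # Reproducibility needs at least two runs to be meaningful.
    reproducible = set.intersection(*replicates) if len(replicates) >= 2 else set()
    confirmed = all_calls & confirming
    return reproducible | confirmed


if __name__ == "__main__":
    runs = [{"v1", "v2", "v3"}, {"v1", "v2"}]  # hypothetical variant IDs
    print(sorted(retained_variants(runs, confirming={"v3", "v9"})))
```

Here v1 and v2 are kept because both runs called them, and v3 is kept because the second method confirmed it; a call seen only once and never confirmed would be dropped.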