Targeted Exome Sequencing Outcome Variations of Colorectal Tumors within and across Two Sequencing Platforms

Next generation sequencing (NGS) technology will enable physicians to better direct the care of their patients based on their mutational profile, especially in diseases such as cancer where multiple genes and mutations are involved. Detection of driver and passenger mutations in tumor specimens will aid in the selection of targeted therapies. This technology is advancing the knowledge needed to match patients with targeted treatments.


Introduction
NGS has evolved rapidly over the past decade, and many technologies and chemistries are now available [1]. Indeed, detection of clinically relevant driver and passenger mutations in diagnostic tumor specimens aids in the selection of targeted therapeutics. NGS is proving more effective than traditional approaches at providing the general genetic landscape associated with tumor development, for the prevention, diagnosis, and management of diseases such as colorectal cancer, which we consider in this study [2][3][4].
Single nucleotide variants (SNVs) play a crucial role in colorectal cancer predisposition, initiation, and development [5][6][7]. The whole genome may not need to be sequenced to identify genetic alterations in most human colorectal cancer-associated genes and pathways. More than 85% of pathogenic mutations are found within the protein-coding regions of the genome [8]. Therefore, exome or even targeted exome NGS offers a cheaper and faster alternative to whole genome sequencing, provided that the mutations are accurately detected.
NGS is based on the principle of massively parallel sequencing: millions of DNA fragments are sequenced at the same time. First, DNA is fragmented into short segments to produce a shotgun library. Adaptors, short DNA sequences that carry primer binding sites for subsequent amplification, are ligated to the ends of each fragment. The shotgun library can then be enriched for the sequences of interest using diverse approaches [9,10]. Illumina, a leader in the NGS field, adopted a sequencing-by-synthesis approach that uses fluorescently labeled reversible-terminator nucleotides on clonally amplified DNA templates immobilized on an acrylamide coating on the surface of a glass flow cell [11]. The Illumina Genome Analyzer instruments, including MiSeq and HiSeq, have set the standard for both high-throughput massively parallel sequencing (HiSeq) and a lower-throughput, fast-turnaround instrument (MiSeq) [12]. Illumina sequencing instruments and reagents support massively parallel sequencing using a proprietary method that detects single bases as they are incorporated into growing DNA strands [13]. A fluorescently labeled reversible terminator is imaged as each dNTP is added and then cleaved to allow incorporation of the next base. Since all 4 reversible terminator-bound dNTPs are present during each sequencing cycle, natural competition minimizes incorporation bias. The end result is true base-by-base sequencing that enables accurate data generation [12][14][15][16][17]. The method attempts to eliminate errors and missed calls associated with strings of repeated nucleotides.
Another NGS platform in use is Ion Torrent semiconductor sequencing. It relies on the detection of hydrogen ions, which are released during the polymerization of DNA. This is also a sequencing-by-synthesis method [14][15][16]. A microwell containing a template DNA strand to be sequenced is flooded with a single species of deoxyribonucleotide triphosphate (dNTP). If the introduced dNTP is complementary to the leading template nucleotide, it is incorporated into the growing complementary strand [15]. This releases a hydrogen ion that triggers an ion-sensitive field-effect transistor sensor, which indicates that a reaction has occurred [14][15][16]. If homopolymer repeats are present in the template sequence, multiple dNTP molecules are incorporated in a single cycle, which leads to a corresponding number of released hydrogen ions and a proportionally higher electronic signal [14][15][16].
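The flow-based readout described above can be illustrated with a toy simulation. This is a minimal sketch and not the Ion Torrent base caller: the function names `simulate_flows` and `call_bases`, the fixed flow order `TACG`, and the idealized noise-free signal (one unit per incorporated base) are all assumptions made for illustration.

```python
# Toy model of flow-based sequencing by synthesis (illustrative assumptions only).
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def simulate_flows(template, flow_order="TACG", max_flows=400):
    """Return (nucleotide, signal) pairs for an idealized, noise-free run.

    `template` is given in the order the polymerase traverses it.
    `signal` is the number of bases incorporated during that flow, which
    stands in for the number of hydrogen ions released.
    """
    flows = []
    pos = 0
    for i in range(max_flows):
        if pos >= len(template):
            break
        nuc = flow_order[i % len(flow_order)]
        signal = 0
        # Keep incorporating while the flowed dNTP pairs with the next
        # template base; a homopolymer run is consumed within one flow.
        while pos < len(template) and COMPLEMENT[template[pos]] == nuc:
            signal += 1
            pos += 1
        flows.append((nuc, signal))
    return flows


def call_bases(flows):
    """Rebuild the synthesized strand from the per-flow signals."""
    return "".join(nuc * signal for nuc, signal in flows)
```

For the template `AAAGC`, the first `T` flow gives a signal of 3 because the whole `AAA` homopolymer is extended in one step. In a real instrument the signal is noisy, so distinguishing a run of 7 from a run of 8 is harder than distinguishing 1 from 2, which is one reason homopolymers are a known difficulty for this chemistry.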
In this study, we compared CRC mutational profiles generated with two platforms, Ion Torrent and Illumina, in order to better examine data reproducibility within and across sequencing platforms. This has major implications for patients, because disease management and therapy design depend on the mutational profile of the tumor; it is therefore necessary to obtain the most accurate mutational profile possible.

Patients
The 13 samples used in this study consisted of CRC tumors and adjacent normal tissue from 6 CRC patients (Table 1). These samples were either fresh frozen (FF) or formalin-fixed paraffin-embedded (FFPE). Subjects with familial adenomatous polyposis, hereditary nonpolyposis colorectal cancer, or a family history of CRC were excluded. The study was approved by the Institutional Review Board of Howard University, and written informed consent was obtained from all patients. In addition, replicates of FF (n = 4) and FFPE (n = 5) samples were run on both platforms to gauge the extent of replicability between different sequencing runs.

Targeted sequencing by Ion Torrent
Targeted sequencing (TS) was performed at the Cancer Genomics Research Laboratory at the National Cancer Institute (NCI). A targeted, multiplex PCR primer panel was designed using the custom Ion Ampliseq Designer v1.2 (Thermo Fisher Scientific, Grand Island, NY). The primer panel covered 56.9 kb and included the coding regions of 20 genes, with an average coverage of 96.9%. The panel was designed using FFPE settings with an average amplicon size of 150 bp. Sample DNA (20 ng/primer pool) was amplified using this custom Ampliseq primer panel, and libraries were prepared following the manufacturer's Ion Ampliseq Library Preparation protocol (Thermo Fisher Scientific). Individual samples were barcoded, pooled, templated, and sequenced on the Ion Torrent Proton Sequencer using the Ion PI Template OT2 200 v3 and Ion PI Sequencing 200 v2 kits per the manufacturer's instructions.

Analysis methods for Ion Torrent targeted sequencing
Raw sequencing reads generated by the Ion Torrent sequencer were quality- and adaptor-trimmed by Ion Torrent Suite 4.0.4 and then aligned to the hg19 reference sequence by TMAP using default parameters. The resulting BAM files were processed through an in-house quality control (QC) and coverage analysis pipeline, which generated coverage summary plots. Aligned BAM files were left-aligned using the GATK LeftAlignIndels module. Amplicon primers were

Analysis and methods for Illumina targeted sequencing
Illumina sequencing data generation, read assembly, and annotation were performed as previously described by Ashktorab et al. [6].

Bioinformatics
Genomic DNA from patients' tissue samples was fragmented and hybridized to commercially available capture arrays for enrichment. For the discovery set, we performed Ion Torrent sequencing. For the validation set, we used a HiSeq platform (Illumina, San Diego, CA). We used R software (version 3.1.0, http://www.r-project.org/) to compare the variants in the normal and tumor samples with those in the 1000 Genomes database, which represents a nominally non-cancerous population. All samples displayed roughly equal numbers of SNVs in their tumors and in their matched normal samples. We compared our results with The Cancer Genome Atlas (TCGA). These somatic mutations were annotated with ANNOVAR.

Concordance of variants is more consistent in Illumina
We analyzed a set of samples (tumor and normal) from FFPE and FF sources on both the Ion Torrent and Illumina platforms. In addition, replicates of FF samples were run on both platforms to gauge the extent of replicability between different sequencing runs. Table 1 gives the number of FFPE and FF replicates run on each platform.
The concordance between two replicates was calculated as the number of variants that are the same in both replicates, expressed as a percentage of the total number of variants that have at least one non-reference allele in either replicate. Within each platform, we filtered the calls and kept only those variants located in the targeted regions.
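The concordance definition above amounts to the size of the intersection of the two call sets divided by the size of their union. The following is a minimal sketch of that calculation, not the pipeline used in the study: the function names, the representation of a variant as a `(chrom, pos, ref, alt)` tuple, and the representation of targeted regions as `(chrom, start, end)` tuples with inclusive coordinates are assumptions, and genotype (zygosity) is ignored when deciding whether two calls are "the same".

```python
def in_target(variant, regions):
    """True if a (chrom, pos, ref, alt) variant lies inside any target region."""
    chrom, pos, _ref, _alt = variant
    return any(c == chrom and start <= pos <= end for c, start, end in regions)


def concordance(rep_a, rep_b, regions=None):
    """Percent concordance between two replicates.

    Numerator: variants called identically in both replicates.
    Denominator: variants with a non-reference call in either replicate.
    If `regions` is given, calls outside the targeted regions are dropped
    first. Returns NaN when neither replicate has any call.
    """
    a, b = set(rep_a), set(rep_b)
    if regions is not None:
        a = {v for v in a if in_target(v, regions)}
        b = {v for v in b if in_target(v, regions)}
    union = a | b
    if not union:
        return float("nan")
    return 100.0 * len(a & b) / len(union)
```

Because the denominator is the union of calls, a variant seen in only one replicate lowers the concordance regardless of which replicate it came from, so the measure is symmetric in its two arguments.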

Discussion
NGS has the potential to allow the discovery of new target genes for prevention, treatment, and diagnostic purposes. There are, however, many platforms available on the market. These platforms differ in library construction protocols, in sequencing chemistries, and in informatics analysis pipelines. In this study, we used the two most common platforms (Ion Torrent and Illumina), and we discuss the effectiveness, strengths, and limitations associated with NGS mutational profiling. All platforms have library preparation protocols that involve fragmenting genomic DNA and attaching specific adapter sequences. Typically, this takes between 4 and 8 hours for one sample. In addition, the Ion Torrent template preparation has a two-hour emulsion PCR and a template bead enrichment step [12].
The generation of precise and reproducible sequencing results is multifactorial and depends on correct laboratory practice and on the computational pipeline used to analyze the NGS data. A sequencing run takes 27 hours on MiSeq, compared to 2 hours on Ion Torrent. The reported accuracy of MiSeq is mostly > Q30 with an observed error rate of 0.80%, compared to a reported accuracy of Q20 and an observed raw error rate of 1.71% for Ion Torrent. Sequence yield per run is 1.5-2 Gb for MiSeq, compared to 20-50 Mb (314 chip), 100-200 Mb (316 chip), and 1 Gb (318 chip) for Ion Torrent. These differences in chemistries and machine operation, as well as the specifics of the bioinformatics pipelines associated with each machine, certainly account for some of the variation in SNV outcomes for the same samples, as displayed in Table 1 and the supplemental table. They cannot, however, explain the results for samples of the same nature (FF) that were run in duplicate on each platform and still displayed major discrepancies in variant outcomes.
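The Q values quoted above are Phred quality scores, which relate to the per-base error probability p by the standard definition Q = -10 log10(p). The short sketch below converts in both directions so that the reported accuracies and the observed error rates can be put on the same scale; the function names are illustrative and not taken from any sequencing toolkit.

```python
import math


def phred_to_error(q):
    """Per-base error probability implied by a Phred quality score."""
    return 10 ** (-q / 10)


def error_to_phred(p):
    """Phred quality score implied by a per-base error probability (0 < p <= 1)."""
    return -10 * math.log10(p)
```

On this scale, Q30 and Q20 correspond to error probabilities of 0.1% and 1%, while the observed raw error rates of 0.80% and 1.71% correspond to roughly Q21 and Q17.7, respectively.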
It is noteworthy that the data presented here correspond to a targeted exome sequencing gene panel (20 genes for the Illumina platform and 15 for Ion Torrent; the 15 common genes are depicted in Table 1 and the Supplemental Table). Targeted sequencing has high coverage and thus a low error rate and better mutation calling. Had we made the same comparison on a whole exome or whole genome scale, however, the variability in outcome would increase exponentially, giving an inaccurate picture of the genomic/exomic landscape of the analyzed specimens.
Had we considered only one of the two targeted exome sequencing datasets (Ion Torrent or Illumina) for some of our CRC samples, we would be describing almost two different tumors, as the generated mutational profiles were starkly different. Increasing generation of exomic and genomic data and its submission to public databases will likely produce substantial artefactual mutation noise, making it hard for researchers to distinguish real mutations from artificial ones. It is necessary at this stage to begin curating such data so that only validated mutations are accepted, each with a full description of how it was generated, including the sequencing platform, protocols, and bioinformatics processing. Only variants validated through a second NGS platform or through Sanger sequencing should be reported.
In the context of our CRC samples, we retained only mutations that were reproducible on the same platform or detected on the second sequencing platform. Such a process allows better characterization of these tumors in light of their clinical and pathological features, in order to provide care management that fits the mutational profile. In conclusion, and as a general rule, we suggest that any detected mutation that is to be functionally analyzed first needs to be validated. The weight of the detected SNVs has to be established in light of other processes, such as copy number variations and epigenetic alterations, through integrative analyses [18,19].
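The retention rule stated above can be written down as a small set operation. This is one possible reading and not the authors' actual filter: it assumes that "reproducible" means called in at least two runs on the same platform, that "detected on the second platform" means called in at least one run on each of two platforms, and that variants are hashable identifiers. The function name `retained_variants` and the input layout are hypothetical.

```python
from collections import Counter


def retained_variants(runs_by_platform):
    """Keep variants supported by replication or by a second platform.

    `runs_by_platform` maps a platform name to a list of call sets,
    one set of variants per sequencing run of the same sample.
    """
    keep = set()
    seen_on_platform = {}
    for platform, runs in runs_by_platform.items():
        # Count in how many runs of this platform each variant was called.
        run_counts = Counter(v for run in runs for v in set(run))
        # Within-platform reproducibility: called in two or more runs.
        keep |= {v for v, n in run_counts.items() if n >= 2}
        seen_on_platform[platform] = set(run_counts)
    # Cross-platform support: called on two or more platforms.
    platform_counts = Counter(v for calls in seen_on_platform.values() for v in calls)
    keep |= {v for v, n in platform_counts.items() if n >= 2}
    return keep
```

A variant called in a single run on a single platform is dropped under this rule, which matches the recommendation that unvalidated calls should not be carried forward to functional analysis.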