iSAVE: A Novel Validation Strategy for Next Generation Sequencing Mutation Profiling in FFPE Tissues

Ken CN Chang*, Gladys Arreaza, John Kang, Maureen Maguire, Ping Qiu and Matthew J Marton
Translational Biomarkers, Merck and Co., Inc., Rahway, NJ 07065, USA

Abstract
We previously developed a "concordance calculator" to quantify the reproducibility of multi-variant calls among next generation sequencing (NGS) samples and replicates. This tool and a novel replicate approach have also been used to eliminate many different technical artifacts, including post tissue collection modifications (PTCM) such as deamination and oxidation artifacts. Here we apply this approach to study the impact of tumor heterogeneity among consecutive FFPE tissue sections across entire tumor blocks. Using the NGS AmpliSeq Cancer Panel, we could not detect any impact of heterogeneity on mutation profiles among different sections/regions of tumors, even though the tumor was visibly heterogeneous according to the H&E images and pathological review. In contrast, RNA expression profiling using a NanoString Cancer Panel found significantly different expression patterns among different sections/regions. Additional studies in a different tissue type also found no detectable discrepancies among different tissue sections in terms of their mutation profiles. If confirmed by further studies, these results using FFPE tissue sections would suggest that DNA mutation signatures, as novel biomarkers for cancer diagnosis and prognosis, might be less sensitive to tumor heterogeneity than RNA-based expression signatures, at least given the performance and sensitivity of current DNA/RNA profiling technologies. Using the concordance calculator to quantify the reproducibility of multi-variant calls among NGS replicates and to eliminate technical artifacts, including PTCM, also allowed us to develop an unconventional validation strategy, which we call "in situ analytical validation and evaluation (iSAVE)." As a proof of concept, we evaluated the RainDance ThunderBolts Cancer Panel and demonstrated analytical validation directly on each and every clinical sample. This strategy also uses a set of normal FFPE tissue samples in the validation process to eliminate platform-, panel-, amplicon-, library-preparation-, and mutation-calling-pipeline-specific artifacts.

Introduction
Ever since the publication of "Intratumor Heterogeneity Revealed by Multiregion Sequencing" by Gerlinger et al. in NEJM in 2012, questions such as "How can we expect precision medicine to be effective if tumors are so heterogeneous?" have been raised at almost every biomarker or precision medicine conference [1,2]. Despite this concern, new biomarkers, genetic tests and gene signatures are discovered on a routine basis for both diagnostic and prognostic applications [3][4][5][6], and several of these have become FDA cleared and/or Medicare reimbursed. Clearly, there is a disconnect between those who are concerned about the impact of tumor heterogeneity and those who are not. Many publications focus on a very specific example and use it to speculate on implications and potential future applications in the discussion section. It is often up to readers to interpret the results and decide how to take advantage of this information, even when the authors have already set the boundaries and limitations of how such results might be interpreted. After trying to interpret the results of Gerlinger's publication, we initiated a study to investigate the impact of tumor heterogeneity on both RNA expression and DNA mutation profiling in different sections and regions of a tumor.
Through this effort, an Excel tool called the concordance calculator was developed to quantify multi-variant call reproducibility among triplicates; its details have been described in a recent publication [1]. In this tool, every variant call that passes a basic quality filter is evaluated for reproducibility. Reproducibility is defined as the number of reproducible calls divided by the number of all variant calls above a pre-specified variant frequency (VF) threshold, such as 5%, among three or more replicates. In that study, the calculator was used to evaluate different NGS methods, and the conclusion was that the Ion Torrent AmpliSeq cancer panel showed very good reproducibility from library prep to library prep (~95%), while the Illumina TruSeq cancer panel did not. On the other hand, the Illumina MiSeq instrument provided better run-to-run reproducibility than the Ion Torrent PGM (Personal Genome Machine). Since technology is changing rapidly and both manufacturers have upgraded their instruments, reagents and library preparation protocols, the above conclusions apply only to that particular study.
Interestingly, we found that the Illumina TruSeq cancer panel generated many more C to T variant calls than the Ion Torrent platform. C to T was clearly the dominant variant call in the Illumina TruSeq Cancer Panel data, whereas data generated on the Ion Torrent AmpliSeq Cancer Panel with the same gDNA did not show this pattern. It is clear that most of these C to T calls were artifacts, or what we call post tissue collection modifications (PTCM, such as deamination or oxidation, depending on the tissue type and fragmentation procedure used) [1]. In addition, the Illumina TruSeq library preparation protocol increased the chance of bringing these PTCMs above background level. One way to eliminate PTCM artifacts is to run triplicate library preparations and use the concordance calculator to exclude them, since the overwhelming majority of these artifacts are not reproducible: they occur randomly, like noise [1]. The tool can also be used to evaluate the effectiveness of bioinformatics filters, since a good filter keeps more concordant calls and eliminates more discordant calls.
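As an illustration of how a C to T excess could be quantified from a list of calls, the sketch below tallies a substitution spectrum. It assumes each call is given as a (ref, alt) pair of single bases; the function names are hypothetical. G>A calls are folded into C>T because they represent the same base change read from the opposite strand.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_spectrum(calls):
    """Count single-base substitutions, pooling each with its reverse complement.

    calls: iterable of (ref, alt) single-base strings, e.g. ("C", "T").
    Keys use the pyrimidine-reference form, e.g. "C>T" also counts G>A.
    """
    spectrum = Counter()
    for ref, alt in calls:
        ref, alt = ref.upper(), alt.upper()
        if ref in ("A", "G"):
            # Report purine-reference changes on the opposite strand.
            ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
        spectrum[f"{ref}>{alt}"] += 1
    return spectrum


def ct_fraction(calls):
    """Fraction of substitutions that are C>T (including G>A); 0.0 if none."""
    spectrum = substitution_spectrum(calls)
    total = sum(spectrum.values())
    return spectrum["C>T"] / total if total else 0.0
```

Comparing `ct_fraction` between replicate libraries or platforms gives a single number for the pattern described above; it does not by itself separate artifacts from true mutations.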
It turned out that the library preparation protocol is the key to the reproducibility of variant calls with low VF. We call the Ion Torrent AmpliSeq protocol the "Amplify First" protocol and the Illumina TruSeq protocol the "Amplify Later" protocol, because the Ion Torrent AmpliSeq protocol starts with 20 cycles of PCR amplification, whereas the amplification step in the Illumina TruSeq protocol comes toward the very end of library preparation [7][8][9][10]. Together with the longer amplicon size, this "Amplify Later" protocol is responsible for the large variation in variants with low VF. Another plausible explanation is that, because the TruSeq primers are designed to hybridize to only one strand, the method would double the probability of C to T artifacts compared with protocols that copy both strands, such as PCR amplification [1].
Based on these findings, we concluded that, to obtain reproducible results for mutations with VF between 5% and 25% (the range in which somatic mutations are most likely to be found in FFPE tissue), gene panels with short amplicons (preferably <120 bp) and an "Amplify First" protocol like the Ion Torrent AmpliSeq method are recommended. Furthermore, one should consider running triplicate library preparations for every sample in order to optimize the NGS bioinformatics pipeline and generate reproducible clinical mutation profiling data, since we also found that the same mutation in different samples, or different mutations in the same sample, can differ in detection sensitivity [1,11-13].
These previous results, together with our interest in studying the impact of tumor heterogeneity among FFPE tissue sections, led us to design an unconventional analytical validation approach for an NGS targeted cancer gene panel to support the exploratory objectives of an ongoing clinical trial. Many laboratories view validation as a one-time event summarized in a validation report. However, since it is impossible to validate every possible mutation, there is a need for ongoing performance verification or validation. In this manuscript we provide an example of how this in situ analytical validation and evaluation (iSAVE) strategy works. Each clinical sample is run in triplicate; a single MiSeq run might typically hold 7-10 samples plus normal controls and standards. Our data show that by starting with basic QC filters and including only reproducible calls, and then removing reproducible variant calls that appear across all samples (including normal tissue samples), we are able to eliminate most artifact mutation calls from clinical samples, resulting in low false positive and false negative rates.
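The two filtering steps described above can be sketched as follows. This is a minimal illustration, assuming each replicate has already passed basic QC and is represented as a set of variant IDs. For simplicity, "reproducible" here means called in every replicate, whereas the concordance calculator also checks that the VFs agree. All names are hypothetical.

```python
def reproducible_calls(replicates):
    """Variant IDs called in every replicate of one sample (presence only)."""
    sets = [set(r) for r in replicates]
    return set.intersection(*sets) if sets else set()


def isave_filter(samples, normal_ids):
    """Keep reproducible calls, then drop calls reproducible in every sample.

    samples: dict sample_id -> list of replicate call sets (after basic QC)
    normal_ids: sample_ids of the non-tumor control tissues
    Returns dict tumor sample_id -> retained variant IDs.
    """
    reproducible = {sid: reproducible_calls(reps) for sid, reps in samples.items()}
    # Calls reproducible in all samples, normals included, are treated as
    # panel-, amplicon-, or pipeline-specific artifacts.
    ubiquitous = set.intersection(*reproducible.values())
    return {
        sid: calls - ubiquitous
        for sid, calls in reproducible.items()
        if sid not in normal_ids
    }
```

For example, a call present in all three replicates of every tumor and of the normal control is removed, while a call reproducible in only one tumor is retained for that tumor.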

FFPE tissue source and sample preparation
Colorectal cancer (CRC) FFPE tissue blocks that had been profiled with an NGS-based cancer panel and their corresponding 5 µm sectioned slides were purchased from BioChain Institute, Inc. (Newark, CA). Genomic DNA (gDNA) was isolated using Qiagen DNA FFPE Tissue Extraction Kit (Qiagen, Germantown, MD). Qubit (Life Technologies, Carlsbad, CA) and Nanodrop (Thermo Fisher, Waltham, MA) quantification as well as Bioanalyzer (Agilent Technologies, Santa Clara, CA) DNA quality analyses were done according to the standard protocols provided by the manufacturers.

Ion Torrent AmpliSeq library preparation procedures
All standard library preparation protocols for FFPE tissue samples were followed according to the manufacturer's instructions. A standard Qubit-quantified gDNA input of 10 ng was used for the Ion Torrent Cancer Panel (Life Technologies, Carlsbad, CA) in all experiments unless otherwise specified.

RainDance ThunderBolts cancer panel library preparation procedures
The standard RainDance ThunderBolts (RainDance Technologies, Inc, Billerica, MA) cancer panel library preparation protocol was followed with a few modifications. Briefly, 20 ng of purified gDNA from renal cell carcinoma (RCC) FFPE tissue samples was used for the first PCR reaction with ThunderBolts cancer panel set 1 and set 2. AMPure beads (Beckman Coulter, Beverly, MA) were used for purification, followed by a second PCR reaction to add Illumina adapters and indexes. After purification with AMPure beads, amplified libraries were quantified on an Agilent Bioanalyzer and normalized to 2 nM prior to sequencing.

Illumina MiSeq and Ion Torrent Ion Personal Genome Machine (PGM) sequencing procedures
MiSeq Reagent Kit v2 and Ion 318 Chip Kit were used for all the NGS runs with MiSeq and Ion PGM, respectively. Standard protocols were followed for all experiments and instrument runs [14,15], and the reagents were freshly prepared for each run. Seven samples (6 target samples plus a control sample) were multiplexed per chip or flow cell using unique index/barcodes.

Data analysis process and sequence analysis
The data analyses presented were performed using MiSeq Reporter (Illumina, San Diego, CA), Torrent Suite (Ion Torrent, Life Technologies), or OmicSoft (OmicSoft Corporation, Cary, NC; used for both platforms), as indicated in each results subsection. The default Q score cutoff in OmicSoft was Q13 unless otherwise specified.

Definition of concordant calls among replicates (The concordance calculator Excel tool)
The concordance calculator (a Microsoft Excel-based tool) was designed and developed to identify the range of variant frequency (VF) over which variant calls are concordant between replicates for a particular data set. The details of this procedure were recently published [1]; its key features are summarized here. The tool can be applied to any data set containing replicate data by 1) starting with a list of minimally filtered variant calls from one replicate, 2) calculating the acceptable VF range as VF ± (VF × acceptable %CV + % background variation), 3) searching the other replicate data sets for the same variant ID to determine whether the call is present, and, if so, 4) determining whether the corresponding VF falls within the calculated acceptable range. If both conditions are met, the variant call is considered repeatable (reproducible). The tool allows % repeatability to be reported in the context of specific filters, which should then be included in the validation report or clinical sample testing report to specify the limitations of the data.
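The four steps above can be approximated in a few lines. The following is a minimal sketch, not the published Excel tool. It assumes each replicate is a dict of variant ID to VF (in percent), uses the first replicate as the reference list, and expresses CV as a fraction. The default CV (0.2) and background (1.0 percentage point) are placeholders, not values from the study, and the function names are hypothetical.

```python
def acceptable_range(vf, cv, background):
    """Acceptable VF window: VF ± (VF × CV + background).

    vf and background are in percent; cv is a fraction (0.2 means 20% CV).
    """
    half_width = vf * cv + background
    return vf - half_width, vf + half_width


def is_reproducible(variant_id, vf, other_replicates, cv, background):
    """True if the call is present in every other replicate with VF in the window."""
    low, high = acceptable_range(vf, cv, background)
    return all(
        variant_id in rep and low <= rep[variant_id] <= high
        for rep in other_replicates
    )


def percent_reproducible(replicates, min_vf=5.0, max_vf=25.0, cv=0.2, background=1.0):
    """Percent of reference-replicate calls in [min_vf, max_vf] that are reproducible.

    replicates: list of dicts mapping variant ID -> VF (%); the first is the reference.
    """
    reference, others = replicates[0], replicates[1:]
    window = {vid: vf for vid, vf in reference.items() if min_vf <= vf <= max_vf}
    if not window:
        raise ValueError("no calls in the VF window")
    n_ok = sum(is_reproducible(vid, vf, others, cv, background)
               for vid, vf in window.items())
    return 100.0 * n_ok / len(window)
```

With these placeholder settings, a call at 10% VF is accepted in another replicate anywhere between 7% and 13%. A call missing from any replicate, or falling outside the window, counts as non-reproducible.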

RainDance digital PCR procedure and primer design
Primer designs and sequences for dPCR confirmation are summarized in the dPCR Excel sheets (Supplementary Table 1). TaqMan Genotyping Master Mix (Life Technologies) was used for dPCR reaction setup. A RainDance RainDrop digital PCR system was used for sample preparation (RainDrop Source) and analysis (RainDrop Sense). RainDance standard dPCR protocols were used for all of these studies, and RainDrop Analyst Software (RainDance) was used for data analysis.

Tumor heterogeneity experimental design
To study the impact of tumor heterogeneity on different FFPE tissue sections from the same tissue block, a single CRC tissue block was divided into 6 consecutive regions (20 µm each). Each of the 6 regions/sections was evaluated independently, from gDNA isolation through library preparation, and the libraries were analyzed in a single multiplexed run on the same chip. The NanoString GX Human Cancer Reference Kit was purchased directly from NanoString, and 50 ng of RNA was used for all samples according to the standard protocol. Hierarchical clustering with a heat map was used to compare RNA profiles among the different regions/sections of the FFPE tissue. A diagram of the experimental design is shown in Figure 1.
The analytical validation of ThunderBolts Cancer Panel variant calls used an unconventional design. Two independent runs of RCC samples were carried out with the ThunderBolts Cancer Panel on the MiSeq instrument. The first run served as the training dataset for tuning the pipeline parameters, and the second run served as the testing dataset for validation.

Data analysis procedure and NGS variant-calling pipeline
According to results from previous studies [1], the vast majority of false positive mutation calls are not reproducible and can therefore be identified and eliminated using replicate experiments. In addition, the default settings of many existing NGS software packages are not optimized for targeted somatic mutation detection: their default filters are very stringent in order to reduce the false positive rate, which increases the risk of false negatives.
Taking these findings into consideration, and following the general procedures recommended in the Broad Institute's "DNA-Seq Best Practices" (https://www.broadinstitute.org/gatk/guide/bestpractices?bpm=DNAseq), we developed a customized quality-based variant caller that integrates basic quality filters with the reproducibility of mutations across triplicate experiments when making mutation calls.

Identification of quality control parameters
From the NGS mutation calling pipeline outlined above (Figure 2), the following quality control (QC) parameters were obtained: coverage and number of mutant reads, strand bias, homopolymer length, the median mapping quality score of all reads at a chromosome position (median MQ), and the median base quality score of all reads at a chromosome position (median BQ).
The strand bias score was calculated as $\frac{\max(V_p C_m,\ V_m C_p)}{V_p C_m + V_m C_p}$, which is the formula used by Ion Torrent (Torrent Variant Detection Algorithms 3.4). Here $V_p$ is the number of alternate-allele reads on the plus strand, $C_p$ the total number of reads on the plus strand, $V_m$ the number of alternate-allele reads on the minus strand, and $C_m$ the total number of reads on the minus strand. When there is no strand bias, the alternate-allele fraction is the same on both strands ($V_p/C_p = V_m/C_m$, i.e., $V_p C_m = V_m C_p$), so the score equals 0.5; when the alternate allele is concentrated on one strand, one product dominates and the score approaches 1.
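The strand bias score defined above, max(Vp·Cm, Vm·Cp) / (Vp·Cm + Vm·Cp), can be computed directly from the four read counts. This is a minimal sketch. The function name is hypothetical, and raising an error when no alternate reads are present is an assumption, since the text does not say how that case is handled. No filtering cutoff is hard-coded because none is given in the text.

```python
def strand_bias(vp, cp, vm, cm):
    """Strand bias score: max(Vp*Cm, Vm*Cp) / (Vp*Cm + Vm*Cp).

    vp, vm: alternate-allele reads on the plus / minus strand
    cp, cm: total reads on the plus / minus strand
    Returns 0.5 for no bias, approaching 1.0 as the bias becomes complete.
    """
    plus_term, minus_term = vp * cm, vm * cp
    denominator = plus_term + minus_term
    if denominator == 0:
        raise ValueError("score undefined: no alternate-allele reads")
    return max(plus_term, minus_term) / denominator
```

With 100 reads on each strand, 10 alternate reads on both strands give 0.5. Splitting 20 alternate reads 15/5 gives 0.75, and putting all 20 on one strand gives 1.0.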
Median BQ and median MQ indicate the quality of a chromosome position: if most reads at a position have low base or mapping quality scores, a mutation call at that position merits less confidence. Homopolymers are stretches of a single repeated nucleotide in the reference genome; very long homopolymers are difficult to sequence accurately. Variants in homopolymer runs longer than eight bases were filtered out (the R8 filter as implemented in the Illumina VCF QC filter).
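A per-position QC check combining the homopolymer rule with median BQ and median MQ might look like the sketch below. This is a simplified illustration, not the Illumina R8 implementation. It measures only the run that covers the variant position itself and ignores variants adjacent to a run. The median BQ threshold of 13 mirrors the OmicSoft Q13 default mentioned in the Methods, but applying it to the median is an assumption. The MQ threshold of 20 is arbitrary, and the function names are hypothetical.

```python
from statistics import median


def homopolymer_length(reference, pos):
    """Length of the run of identical bases in `reference` covering 0-based `pos`."""
    base = reference[pos]
    start, end = pos, pos
    while start > 0 and reference[start - 1] == base:
        start -= 1
    while end + 1 < len(reference) and reference[end + 1] == base:
        end += 1
    return end - start + 1


def passes_position_qc(reference, pos, base_quals, map_quals,
                       max_run=8, min_median_bq=13, min_median_mq=20):
    """R8-style homopolymer filter plus median BQ/MQ checks at one position.

    min_median_bq and min_median_mq are illustrative, not values from the study.
    """
    if homopolymer_length(reference, pos) > max_run:
        return False  # runs longer than eight bases are rejected
    return (median(base_quals) >= min_median_bq
            and median(map_quals) >= min_median_mq)
```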
Additional data analysis methods, including identification of reproducible calls from the triplicate data for each sample followed by visual inspection of difficult sequencing regions, are described in the Supplemental Methods.

Experimental design to evaluate the impact of tumor heterogeneity on RNA and DNA profiles from FFPE tissue consecutive regions
A colorectal cancer (CRC) FFPE tissue block was randomly selected from the previous platform evaluation study [1] and evenly divided into six regions (20 µm each). H&E-stained slides from each region/section of the FFPE tissue (FFPET) block were reviewed by two pathologists (see Supplementary Figures S1-S6 for the H&E stains). Although there were no dramatic differences among sections in gross histology, some degree of heterogeneity from area to area was observed. Each of the six sections was subjected to DNA and RNA preparation. A commercial NanoString GX Human Cancer Reference Kit (230 cancer genes) was used for RNA expression profiling, and an Ion Torrent AmpliSeq Cancer Panel (which includes most well-documented oncogenes and tumor suppressor genes) was used to profile DNA mutations. As in the previous study, intra-run and inter-run replicates were performed [1]. Samples from all 6 regions were processed at the same time throughout the entire profiling process.

Comparison of RNA expression profiles from different regions/sections of a FFPE tissue block using NanoString GX Human cancer panel
Unsupervised two-dimensional hierarchical clustering of NanoString RNA expression profiling data (average of two replicates) revealed some significant differences among regions/sections (Figure 3). To confirm that these differences were not due to variation in the NanoString cancer panel RNA expression platform, the clustering analysis was repeated using individual replicates. All replicates remained in the same cluster (Supplementary Figures S7 and S8). This shows that replicate variability was much lower than variability among regions/sections, and that the statistically significant differences between the two clusters were not due to experimental variation.
Overall, approximately 36 of the 230 genes (Table 1) showed expression changes greater than 2-fold among the six tissue regions/sections. Although these changes are not large, they are statistically significant (p < 0.05 between the two main clusters) and are sufficient to illustrate that intratumor heterogeneity may have some impact on RNA expression profiles among different sections.
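The ">2-fold among regions" criterion can be screened with a short function. The text does not state exactly how fold change was computed; one reading is the ratio of the highest to the lowest normalized value across the six regions, and this sketch assumes that reading and positive normalized counts. It does not reproduce the cluster-level significance test, and the function name is hypothetical.

```python
def genes_above_fold_change(expression, threshold=2.0):
    """Genes whose max/min expression ratio across regions exceeds `threshold`.

    expression: dict gene -> list of normalized expression values, one per region.
    Genes with a non-positive minimum are skipped because the ratio is undefined.
    """
    hits = []
    for gene, values in expression.items():
        low, high = min(values), max(values)
        if low <= 0:
            continue
        if high / low > threshold:
            hits.append(gene)
    return sorted(hits)
```

Comparing cluster means instead of region extremes would usually flag fewer genes, so the choice of definition should be stated alongside any reported count.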

Comparison of DNA mutation profiles on different regions/ sections of a FFPE tissue block on Ion Torrent AmpliSeq cancer panel
Using the custom-designed concordance calculator, we calculated the reproducibility of variant calls among different regions/sections profiled on the Ion Torrent AmpliSeq Cancer Panel and found it to be greater than 96% for variants with VF between 5% and 25%. This is at least as high as the reproducibility among six replicate library preparations from the same gDNA [1]. The concordance calculator applies very stringent outlier rejection and concordance acceptance criteria based on the degree of VF difference. If variant calls were scored only as detected or not detected, the reproducibility for both sets of 3-region comparisons was 100% (the current version of the concordance calculator only allows analysis of a set of 3 replicates). Therefore, although common cancer gene RNA expression profiles differed significantly from region to region (likely due to tumor heterogeneity across FFPET slides), there was no detectable impact on DNA mutation profiles among frequently encountered cancer genes, based on data generated using the AmpliSeq Cancer Panel.
In addition, other randomly selected tissue samples were subjected to a similar evaluation (e.g., different sections from the same renal cell carcinoma FFPE tissue block; unpublished internal study), and we found no significant differences among regions of these FFPE tissue blocks in the DNA variant calls that the calculator classified as reproducible.
Randomly selected variant calls from sections 1-3 and sections 3-5 of the concordance calculator reproducibility evaluation are shown in Tables 2 and 3. Table 2 shows that variant calls with VF of 5% to 25% for sections 1-3 have an overall reproducibility of 96.8%. Variant calls with VF of 5% to 25% for sections 3-5 also have a reproducibility greater than 96% (Table 3; … Table 2 and chromosome 18 position 48584629 in Table 3). The grey boxes show the final tally of reproducible calls divided by all variant calls (including non-reproducible calls), giving the percent reproducibility. Furthermore, all variant calls with VF higher than 30% among these sections were 100% reproducible according to the concordance calculator (data not shown).

Analytical performance and validation strategy of RainDance ThunderBolts cancer panel mutation profiling using FFPE tissue samples
The results above, obtained with a commercial NGS cancer panel, suggest that the impact of intratumor heterogeneity on mutation profiling might be minimal across different sections of an individual FFPE tissue block. Because of this, and because the cost-effectiveness of running replicates is important, we decided to evaluate and validate the RainDance ThunderBolts Cancer Panel to support the exploratory objectives of a renal cell carcinoma (RCC) clinical trial. An example of variant calls from replicate library preparations using the ThunderBolts cancer panel is shown in Table 4. All consistent variant calls (based on the concordance calculator) with VF greater than 5% among replicates from the same run were reproducible run-to-run (Supplemental data), and even at 3%, most consistent variant calls were also reproducible between runs (Supplemental data). We also included triplicate normal (non-tumor) FFPE tissue in the analytical validation strategy to eliminate panel-, amplicon-, platform-, library preparation- and pipeline-specific artifacts in addition to PTCM artifacts (data not shown). Notably, in our initial analysis of variant call reproducibility among all replicate samples, more than 100 of approximately 350 variant calls were present across all samples, including the non-tumor control FFPE tissue samples (Supplemental information and Training run Omicsoft Pass 2). This indicates that a large portion of reproducible variant calls are likely false positives, which further supports the use of non-tumor control FFPE tissue samples in these analyses. A triplicate strategy is a prerequisite for using the commercial ThunderBolts Cancer Panel in this case, since without it run-to-run variant calls are not as reproducible.
Table 4 shows that random artifacts with high quality scores, consisting mainly of C to T or G to A calls and most likely PTCM, can be effectively removed using the triplicate strategy. These artifacts have an Illumina quality score of 100 and very high coverage (shown in parentheses after the variant frequency). One could set a specific LOD (limit of detection; 5%, or as low as 3%), use triplicate FFPET library preparations to identify a set of reproducible variant calls, and then lock down the optimized pipeline after confirming a subset of variant calls with an orthogonal method. The confirmation rate can then be reported and used for clinical sample testing in triplicate. In this scheme, analytical validation is not complete until a set of reproducible calls has been identified on the spot (or "in situ") in real clinical samples. We name this the "in situ analytical validation and evaluation (iSAVE)" strategy.

Analysis of inter- and intra-day variability
This analytical validation strategy takes an unconventional approach to determining the reproducibility of all variant calls. Since only reproducible calls were included in the final data analysis, within-run variant call reproducibility is 100% by construction. Based on the two-run results (optimization run and validation/confirmation run), we determined that run-to-run (inter-day) reproducibility is approximately 95% without manual inspection and filtering, and 100% with manual inspection and elimination of apparent false positives.
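Run-to-run reproducibility can be expressed as the share of calls retained in one run that are also retained in the other. The text does not define the denominator, so this sketch assumes the first (optimization) run is the reference. It also assumes calls are compared by variant ID after the within-run triplicate filter. The function name is hypothetical.

```python
def run_to_run_reproducibility(run1_calls, run2_calls):
    """Percent of calls retained in run 1 that are also retained in run 2.

    Using run 1 as the denominator is an assumption; a symmetric measure
    (e.g. shared calls over the union of both runs) would give lower values.
    """
    run1, run2 = set(run1_calls), set(run2_calls)
    if not run1:
        raise ValueError("run 1 has no retained calls")
    return 100.0 * len(run1 & run2) / len(run1)
```

As a purely illustrative example (not study data), if 19 of 20 calls from the first run reappear in the second, the function returns 95.0.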

Sensitivity
In this validation process, sensitivity is defined as the lowest variant frequency (using the triplicate strategy) at which variant calls are reproducible run-to-run. Although a larger gDNA input might reduce PTCM artifacts, we do not expect it to further reduce false positive or false negative rates, because only reproducible variant calls are used for data analysis. Based on the data, most within-run reproducible variant calls with VF greater than 3%, and all of those with VF greater than 5%, were reproducible run-to-run. To test whether calls were accurate, we analyzed samples of different quality and then confirmed the calls by digital PCR (dPCR) analysis (Supplemental Table S1). Thus, the current input of 20 ng Qubit-quantified gDNA is expected to accommodate a range of FFPE tissue sample quality, since true signals should remain reproducible while artifacts such as PTCM (deamination or oxidation) and noise should remain non-reproducible [16,17].
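The sensitivity definition (lowest VF at which calls are reproducible run-to-run) can be turned into a threshold search. This is a minimal sketch, assuming each within-run reproducible call is summarized as a (VF in percent, reproducible-between-runs flag) pair. The `required_fraction` parameter is a hypothetical knob: 1.0 corresponds to "all calls above the cutoff reproduce", and a lower value to "most calls reproduce". The data in the usage example are made up.

```python
def lowest_reproducible_vf(calls, thresholds, required_fraction=1.0):
    """Lowest VF cutoff at which enough calls at or above it reproduce run-to-run.

    calls: list of (vf_percent, reproducible_run_to_run) pairs.
    thresholds: candidate VF cutoffs in percent.
    Returns the first qualifying cutoff in ascending order, or None.
    """
    for cutoff in sorted(thresholds):
        above = [ok for vf, ok in calls if vf >= cutoff]
        if above and sum(above) / len(above) >= required_fraction:
            return cutoff
    return None


# Hypothetical calls, not study data.
example = [(2.5, False), (2.8, False), (3.2, True), (3.5, True), (3.8, True),
           (4.5, False), (6.0, True), (12.0, True), (30.0, True)]
strict = lowest_reproducible_vf(example, [2, 3, 4, 5])          # all must reproduce
lenient = lowest_reproducible_vf(example, [2, 3, 4, 5], 0.8)    # most must reproduce
```

In this made-up example, the strict rule selects 5% and the lenient rule selects 3%, mirroring the "most at 3%, all at 5%" pattern described above.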

Accuracy
All variant calls selected from the optimization run from four different RCC FFPE tissue samples were confirmed by dPCR analysis, giving an accuracy of 100%. Several variant calls that dPCR could not clearly resolve because they lay in difficult sequence regions were considered inconclusive and excluded from the calculation (Supplemental Table S1). Table 5 shows an example of how this in situ analytical validation strategy works when applied to clinical sample testing. Each set of triplicates (three consecutive columns of data) mimics a clinical sample analyzed in triplicate. One MiSeq run can accommodate 7-10 samples, including normal controls and standards, profiled in triplicate on the ThunderBolts Cancer Panel (at a cost of about $100-150 per replicate). We started with basic QC filters and included only reproducible calls, then removed reproducible variant calls observed across all samples, including normal tissue samples (these likely represent panel- or amplicon-specific artifacts). To approximate the false positive rate (FPR) and false negative rate (FNR), we performed RainDrop dPCR confirmation on all variant calls for which PCR primers could be successfully designed (Supplementary Table S1). When each sample was then analyzed in triplicate in a further validation run using very low-stringency QC filters to minimize the FNR, we obtained an identical list of mutations. This approach allowed us to eliminate most artifact mutation calls from these clinical samples with very low FPR and FNR.
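The accuracy and error-rate approximations above can be summarized from dPCR outcomes as in the sketch below. It assumes dPCR results are recorded per variant as True (detected), False (not detected) or None (inconclusive), and inconclusive assays are excluded, as in the text. "percent_unconfirmed" is strictly a false discovery proportion rather than a classical FPR. "percent_missed" is only meaningful if dPCR was also run on variants that the NGS pipeline did not report. The function and key names are hypothetical.

```python
def confirmation_summary(ngs_calls, dpcr):
    """Summarize orthogonal dPCR confirmation of NGS variant calls.

    ngs_calls: set of variant IDs reported by the NGS pipeline.
    dpcr: dict variant ID -> True (detected), False (not detected), or None (inconclusive).
    """
    tested = {v: r for v, r in dpcr.items() if r is not None}  # drop inconclusive
    tp = sum(1 for v, r in tested.items() if r and v in ngs_calls)
    fp = sum(1 for v, r in tested.items() if not r and v in ngs_calls)
    fn = sum(1 for v, r in tested.items() if r and v not in ngs_calls)
    nan = float("nan")
    return {
        "true_positive": tp,
        "false_positive": fp,
        "false_negative": fn,
        "percent_confirmed": 100.0 * tp / (tp + fp) if tp + fp else nan,
        "percent_unconfirmed": 100.0 * fp / (tp + fp) if tp + fp else nan,
        "percent_missed": 100.0 * fn / (tp + fn) if tp + fn else nan,
    }
```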

Discussion
As mentioned in the Introduction, the discussion and debate caused by an influential publication often stem more from the authors' interpretation than from the data itself. This is exemplified by the following direct quotes from the 2012 New England Journal of Medicine (NEJM) paper by Gerlinger et al. First, referring to their NGS analysis of nine biopsies of a tumor, the authors stated, "Such spatially separated somatic mutations altering pathway activity suggest that multiregional analyses may be required to predict the therapeutic outcome." In other words, the authors raised a concern without proposing a practical solution. How do we know which region will give the correct diagnosis or yield data predictive of treatment response if biopsies from nine different regions are expected to give nine different answers? The other direct quote from the same paper is "Identification of common mutations located in the trunk of the phylogenetic tree may contribute to more robust biomarkers and therapeutic approaches". The first statement is unrealistic (at least in current clinical practice), whereas the second reflects reality, so the authors can be commended for presenting both views. However, although the authors showed that different regions have different private or shared mutations, their conclusion is based on a single experiment without replicates. Suppose we used one DNA stock solution to generate different library preps, focused only on the highest-quality variant calls (such as those with an Illumina quality score of 100), and treated each of our independent library preps like one of their nine biopsies. We too would then see many private and shared mutations in each library prep. In other words, only by running replicates can one distinguish artifacts from genuine private and shared mutations.
The non-tumor control identifies artifactual shared mutations; the replicates identify artifactual private mutations. This is the primary reason we studied the impact of tumor heterogeneity among different FFPE tissue sections/regions. We were planning to perform mutation profiling studies using FFPE tissue sections of patient tumor samples and wanted to address the concern: "If tumors are so heterogeneous, how can one or a few consecutive FFPE tissue sections be used to make clinical decisions or predict treatment response?"

The impact of tumor heterogeneity on RNA expression and DNA mutation profiling
In our study (Figure 1), although RNA expression levels appeared to be significantly impacted by tumor heterogeneity, we were unable to detect any impact of tumor heterogeneity on DNA mutation profiles among different regions for the genes and targeted sequence regions included in the cancer panel. In other words, no so-called "private mutations" were identified among different regions or sections from the same FFPE tumor block. We have also evaluated sections from RCC (renal cell carcinoma) FFPE tissue blocks and, likewise, identified no reproducible "private mutations" (unpublished results). We will continue to evaluate different FFPE tissue samples to see whether reproducible private mutations can be identified; if we eventually find one, it may represent the exception rather than the norm. Whether the example presented in the NEJM publication is an exception is unclear. Because of PTCM and related effects, what does seem clear is that using FFPE tissue sections (the clinical practice for decades) is likely to enrich common ("shared") mutations and dilute private mutations, if private mutations exist. To claim private-mutation status, we need to confirm that these so-called private mutations are reproducible from library preparation to library preparation. If whole exome sequencing (WES) is performed, the chance of finding private mutations in different regions of any given tumor or FFPE tissue block is likely to be higher; however, this has yet to be directly demonstrated and confirmed using orthogonal methods [18]. We are currently applying these methodologies to evaluate the reproducibility of mutations identified through WES using three independent library preparations.

Key points related to the novel NGS gene panel analytical validation strategy
Based on everything we had learned, we crafted an unconventional analytical validation strategy. The key points we considered, based on our recent results, were as follows. We found that the same mutation in different samples (such as samples with different degrees of fragmentation), or different mutations in a given sample, could all have different detection sensitivities. Likewise, the same mutation could have different detection sensitivities on different platforms, on different panels, or even in FFPE cell lines versus FFPE tissues. Furthermore, hot spot mutations, rare mutations, and mutations in different variant frequency ranges could all perform differently in terms of detection sensitivity and reproducibility. Analytical validation is therefore incomplete until the reproducibility of each individual mutation has been evaluated, and if there is a way to ensure analytical performance at that level, we ought to explore it. We therefore chose the ThunderBolts Cancer Panel from RainDance to execute this novel analytical validation strategy, because it combines an "Amplify First" library preparation protocol (similar to the one used in Ion Torrent AmpliSeq, the more reproducible protocol) with the MiSeq instrument (the platform with the higher run-to-run reproducibility). The only further optimization we would recommend is a shorter amplicon design [19]: the current commercially available design resulted in a high number of PTCMs. This suggests that, unless replicates are run, reproducible results may not be attainable on the ThunderBolts platform either.
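Evaluating reproducibility mutation by mutation, rather than as a single assay-level figure, amounts to computing a detection rate for each variant across replicates or runs. The following is a minimal sketch under stated assumptions: each replicate is represented as a set of variant identifiers, and the helper names `per_variant_detection_rate` and `flag_unreproducible`, as well as the default threshold, are hypothetical.

```python
# Hypothetical sketch: each replicate (or run) is a set of variant IDs.


def per_variant_detection_rate(replicate_calls):
    """Return the fraction of replicates in which each variant was called."""
    call_sets = [set(calls) for calls in replicate_calls]
    if not call_sets:
        return {}
    n = len(call_sets)
    all_variants = set().union(*call_sets)
    return {v: sum(v in calls for calls in call_sets) / n for v in all_variants}


def flag_unreproducible(replicate_calls, min_rate=1.0):
    """Return variants whose detection rate is below min_rate.

    The default of 1.0 (called in every replicate) is an assumed threshold.
    """
    rates = per_variant_detection_rate(replicate_calls)
    return {v for v, rate in rates.items() if rate < min_rate}
```

With four replicates {A, B}, {A}, {A, B}, {A, C}, variant A has a detection rate of 1.0, B of 0.5 and C of 0.25, so B and C would be flagged for follow-up. Keeping the rate per variant makes visible exactly the mutation-to-mutation differences in sensitivity described above.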

Practical use of "in situ analytical validation and evaluation" strategy
As presented in the Results section (Table 4), all consistent variant calls among replicates from the same run with VF greater than 5% were reproducible run-to-run. We were therefore able to use a triplicate strategy, and to include normal FFPE tissue in the analytical validation run, to eliminate panel-, amplicon-, platform-, and even library preparation- and pipeline-specific artifacts in addition to PTCM artifacts. Using dPCR as an orthogonal method, we confirmed all of the variant calls derived from the training run for which it was possible to design PCR primers. The validation run yielded an identical list of mutations using very low-stringency QC filters. This in situ analytical validation strategy can also be applied to WES assay validation, as our unpublished WES studies suggest that similar artifacts occur.
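The triplicate-plus-normal-control logic can be sketched as set operations on variant calls. This is a hypothetical illustration, not the authors' pipeline: it assumes each replicate is a mapping from variant identifier to variant frequency (VF, as a fraction), applies the 5% VF cutoff to every tumor replicate, and treats any variant called in every normal-tissue replicate as an artifact. How a lab defines a normal-control artifact (for example, any call versus a consistent call) is an assumption that should be set locally.

```python
# Hypothetical sketch: each replicate maps a variant ID to its VF (0-1).


def consistent_calls(replicates, min_vf=0.0):
    """Return variants called in every replicate with VF above min_vf in each."""
    if not replicates:
        return set()
    common = set.intersection(*(set(rep) for rep in replicates))
    return {v for v in common if all(rep[v] > min_vf for rep in replicates)}


def triplicate_consensus(tumor_reps, normal_reps, min_vf=0.05):
    """Consistent tumor calls above min_vf, minus calls consistent in the normal control."""
    tumor = consistent_calls(tumor_reps, min_vf)
    # Assumption: an artifact is any variant called in every normal replicate, at any VF.
    normal_artifacts = consistent_calls(normal_reps)
    return tumor - normal_artifacts
```

In a toy run where tumor replicates consistently call A (VF about 30%), B (VF 4 to 6%) and C (VF about 11%), and the normal replicates consistently call C, only A survives: B fails the VF cutoff in one replicate and C is removed as a shared artifact. Any call seen in only one tumor replicate, such as a sporadic PTCM, is dropped by the intersection step.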

Conclusion
The current advantage of NGS is its ability to detect unknown or rare mutations; for known mutations, cheaper and more effective options with faster turnaround times are readily available. However, the strategy for eliminating false positives in diagnostic NGS panels, or even in WES data analysis, is often to apply aggressive QC filters [18]. This approach could increase the false negative rate of variant detection, and many of the false negatives are likely to be precisely those important rare or unknown somatic mutations. Interestingly, strategies and data analysis tools developed for low false positive rate diagnostic NGS panels have also been adopted by scientists in the field for developing prognostic NGS mutation profiling panels, even though in WES low false positive and low false negative rates are equally important. One could argue that in this case a low false negative rate should be prioritized over a low false positive rate, since false positives will not withstand the confirmation steps of a hypothesis-driven biomarker discovery process. Rarely can any QC filter reduce the false positive rate without affecting the false negative rate. Hence, a set of relatively low-stringency QC filters should be used at the beginning of the data analysis process. A triplicate strategy is a very effective way to eliminate false positives or technical artifacts without depending on high-stringency QC filters. The authors recognize the challenge posed by the high cost of running NGS samples in triplicate. However, taking whole exome sequencing as an example, if a mutation assay cannot deliver a highly reproducible list of somatic mutations, we may need to ask ourselves whether the technology is ready for prime time.
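Why a triplicate strategy suppresses random artifacts without a stringent filter can be illustrated with a simple probability model. The sketch below rests on a loud assumption: each replicate calls a given variant independently with a fixed probability. That holds, at best approximately, for sporadic artifacts such as PTCMs; it does not hold for systematic artifacts that recur in every replicate, which is why a normal control is still needed. The per-replicate probabilities used are illustrative, not measured values.

```python
from math import comb

# Assumption: each replicate calls a given variant independently with probability p.


def prob_retained(p, k, n=3):
    """Probability that a variant is called in at least k of n independent replicates."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


# Illustrative (not measured) per-replicate call probabilities.
random_artifact = 0.10  # a sporadic artifact appearing by chance in one replicate
true_variant = 0.95     # a real variant somewhat above the detection limit

print("artifact, 3 of 3:", prob_retained(random_artifact, k=3))
print("true variant, 3 of 3:", prob_retained(true_variant, k=3))
print("artifact, 2 of 3:", prob_retained(random_artifact, k=2))
print("true variant, 2 of 3:", prob_retained(true_variant, k=2))
```

Under these assumptions, requiring all three replicates keeps a sporadic artifact only about 0.1% of the time while retaining a true variant about 86% of the time; a 2-of-3 rule raises those figures to roughly 2.8% and 99.3%. The model also makes the trade-off explicit: variants whose per-replicate sensitivity is low lose the most under a strict consensus rule.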
Hence, we propose the following concept: each clinical lab should include at least one unique, randomly selected clinical sample from a previous run in each and every new batch of an NGS clinical assay/test, starting from nucleic acid [13]. Over time, and across the entire sample testing process, this would generate a measure of real-world intra-laboratory reproducibility on clinical samples, in terms of both the total number and the identity of the mutations reported. If the results show that the reported mutation calls are highly reproducible, we can at least be confident that the decisions made for patients are made in a responsible manner. Even if only reproducible calls are reported, there is no guarantee that all of them are true positives (especially if no normal control samples are run in replicate to remove the artifactual "shared mutations"; see Results), but at least we know that if the test is repeated, the same results should be obtained. Finally, if the results show that a significant number of variant calls are not reproducible, such tests should be re-evaluated and their analytical validation redesigned to address this shortfall.
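The proposed batch-level check reduces to comparing the mutations originally reported for the re-run sample with those obtained in the new batch, in terms of both number and identity. The following is a minimal sketch, assuming calls are available as sets of variant identifiers; the function name `rerun_concordance` and the choice of intersection over union as the agreement metric are hypothetical, and a lab may prefer other agreement statistics.

```python
# Hypothetical sketch: calls are sets of variant IDs from the original run and the re-run.


def rerun_concordance(original_calls, rerun_calls):
    """Summarize agreement between the original and re-run mutation calls."""
    original, rerun = set(original_calls), set(rerun_calls)
    union = original | rerun
    return {
        "n_original": len(original),
        "n_rerun": len(rerun),
        "concordant": len(original & rerun),
        "only_original": len(original - rerun),  # possible false negatives on re-run
        "only_rerun": len(rerun - original),     # possible new artifacts on re-run
        # Intersection over union; defined as 1.0 when neither run reports a call.
        "agreement": len(original & rerun) / len(union) if union else 1.0,
    }
```

If a sample originally reported A, B, C and D and the re-run reports A, B, C and E, the summary shows three concordant calls, one call lost, one call gained and an agreement of 0.6. Logging this summary for every batch would build up the running record of intra-laboratory reproducibility described above.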