Figure 3: Signal-noise separation in SNV analysis versus quasispecies analysis. Red dots denote sequencing errors, which occur at a relatively high rate in PacBio long reads. Blue and green marks represent two true mutations.
A. The PacBio reads contain a mixture of signals from the true mutations (blue and green) and many errors (red). Note that the errors are randomly distributed.
B. Individual SNV analysis measures the frequency of the variant (relative to the wild-type reference sequence) at each position, treating every position independently. The per-position frequencies can then be combined into an SNV plot. The background noise (red dots) is at a level of ~1% for both MiSeq and PacBio. If an individual true mutation (blue or green mark) falls within this noise range, it is difficult to distinguish from the background noise.
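A minimal Python sketch of the per-position frequency calculation described in panel B, assuming reads are already aligned to the reference with no indels (so base i of a read corresponds to base i of the reference). The function names and the `noise_level` default of 0.01 (taken from the ~1% background above) are hypothetical and not part of the original pipeline.

```python
from typing import List


def snv_frequencies(reads: List[str], reference: str) -> List[float]:
    """Return, for each reference position, the fraction of reads that differ from it.

    Assumes every read is aligned to the reference with no indels, so read[i]
    corresponds to reference[i]. Each position is scored independently of all
    others, as in individual SNV analysis.
    """
    n_reads = len(reads)
    freqs = []
    for i, ref_base in enumerate(reference):
        mismatches = sum(1 for read in reads if read[i] != ref_base)
        freqs.append(mismatches / n_reads)
    return freqs


def call_snvs(freqs: List[float], noise_level: float = 0.01) -> List[int]:
    """Hypothetical caller: report positions whose variant frequency exceeds the noise level."""
    return [i for i, f in enumerate(freqs) if f > noise_level]


# A true mutation carried by 1 of 100 reads sits at the noise level and is not called.
reads = ["AAAA"] * 99 + ["ACAA"]
freqs = snv_frequencies(reads, "AAAA")
print(freqs[1], call_snvs(freqs))  # 0.01 []
```

Because each position is judged on its frequency alone, a genuine low-frequency variant and a random error at the same frequency are indistinguishable to this caller.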
C. Because PacBio errors are randomly distributed, two errors are unlikely to co-occur on the same read by chance; for example, if each error arises independently at ~1% per position, a specific pair of errors would appear together on only about 0.01 × 0.01 = 0.01% of reads. In contrast, two true mutations (blue and green) that co-occur at a relatively high frequency can be readily distinguished from noise. This co-occurrence pattern within quasispecies is the key foundation of tag-based quasispecies analysis.
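To illustrate the co-occurrence principle in panel C, here is a minimal Python sketch that counts how many reads carry each pair of variants together and compares that with the count expected if variants occurred independently, as random errors would. It is a simplified stand-in written for illustration, not the tag-based quasispecies method itself; it again assumes indel-free, pre-aligned reads, and all names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Variant = Tuple[int, str]  # (position, alternate base)


def read_variants(read: str, reference: str) -> FrozenSet[Variant]:
    """Variants carried by one read, assuming it is aligned to the reference without indels."""
    return frozenset((i, b) for i, (b, r) in enumerate(zip(read, reference)) if b != r)


def pair_cooccurrence(
    reads: List[str], reference: str
) -> Dict[Tuple[Variant, Variant], Tuple[int, float]]:
    """For every pair of variants seen together on a read, return
    (reads observed carrying both, reads expected to carry both by chance).

    The chance expectation assumes variants occur independently across reads,
    which is how randomly distributed sequencing errors would behave.
    """
    per_read = [read_variants(r, reference) for r in reads]
    n = len(reads)
    singles = Counter(v for vs in per_read for v in vs)
    pairs = Counter(p for vs in per_read for p in combinations(sorted(vs), 2))
    # Under independence, expected count = n * (c_a / n) * (c_b / n) = c_a * c_b / n.
    return {(a, b): (obs, singles[a] * singles[b] / n) for (a, b), obs in pairs.items()}


reference = "AAAAAA"
reads = (
    ["ACAAGA"] * 3              # haplotype carrying two linked true mutations
    + ["AAAAAA"] * 5            # wild type
    + ["TAAAAA", "AAATAA"]      # reads with single random errors
)
print(pair_cooccurrence(reads, reference))
# {((1, 'C'), (4, 'G')): (3, 0.9)}
```

The two error reads contribute no pairs at all, while the linked mutations appear together on three reads against 0.9 expected by chance. On real data, one would typically add a significance test on these counts before calling a linked pair.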