Author(s): Soergel DA, Dey N, Knight R, Brenner SE
Abstract Microbial community profiling using 16S rRNA gene sequences requires accurate taxonomy assignments. 'Universal' primers target conserved sequences and amplify sequences from many taxa, but they provide variable coverage of different environments, and regions of the rRNA gene differ in taxonomic informativeness, especially when high-throughput short-read sequencing technologies (for example, 454 and Illumina) are used. We introduce a new evaluation procedure that provides an improved measure of expected taxonomic precision when classifying environmental sequence reads from a given primer. Applying this measure to thousands of combinations of primers and read lengths, simulating single-ended and paired-end sequencing, reveals that these choices greatly affect taxonomic informativeness. The most informative sequence region may differ by environment, partly due to variable coverage of different environments in reference databases. Using our Rtax method of classifying paired-end reads, we found that paired-end sequencing provides substantial benefit in some environments including human gut, but not in others. Optimal primer choice for short reads totaling 96 nt provides 82-100% of the confident genus classifications available from longer reads.
This article was published in ISME J and referenced in Journal of Aquaculture Research & Development.