Scalable SNP Analyses of 100+ Bacterial or Viral Genomes
Shea N. Gardner* and Tom Slezak
Computations/Global Security, Lawrence Livermore National Laboratory, PO Box 808, L-174, Livermore, CA 94551
*Corresponding Author:
Shea N. Gardner
Lawrence Livermore National Laboratory
Livermore CA 94551
E-mail: [email protected]
Received date: November 25, 2010; Accepted date: December 27, 2010; Published date: December 30, 2010
Citation: Gardner SN, Slezak T (2010) Scalable SNP Analyses of 100+ Bacterial or Viral Genomes. J Forensic Res 1:107. doi: 10.4172/2157-7145.1000107
Copyright: © 2010 Gardner SN, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
With the flood of finished and draft whole-genome microbial sequences, analysts need faster, more scalable bioinformatics tools for sequence comparison. An algorithm, kSNP, is described that finds single nucleotide polymorphisms (SNPs) in whole-genome data. It is based on k-mer analysis using suffix arrays and requires no multiple sequence alignment. It scales to hundreds of bacterial or viral genomes, accepts finished genomes, draft genomes available as unassembled contigs, or both, and can handle hundreds of megabases of sequence as input in a single run. The method is fast: finding SNPs and building a SNP phylogeny takes seconds to hours. Applied to all publicly available Filoviridae, Poxviridae, foot-and-mouth disease virus, Bacillus, and Escherichia coli genomes and plasmids, it identified thousands of putative SNPs, and the SNP-based trees it generated were consistent with known taxonomy and with trees determined in other studies.
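To make the idea of alignment-free, k-mer based SNP detection concrete, the following is a minimal Python sketch and not the kSNP implementation. It rests on assumptions that the abstract does not spell out: k is odd, a candidate SNP locus is a pair of flanking sequences shared by two or more genomes whose central base differs, and loci that occur with more than one central base inside a single genome are discarded as ambiguous. It also uses an in-memory dictionary in place of suffix arrays and ignores reverse complements. The function name `find_kmer_snps` and its parameters are hypothetical.

```python
from collections import defaultdict


def find_kmer_snps(genomes, k=5):
    """Illustrative only: map (left_flank, right_flank) -> {genome: central base}
    for loci whose flanks are shared but whose central base varies."""
    if k % 2 == 0:
        raise ValueError("k must be odd so that each k-mer has a central base")
    half = k // 2

    # locus (pair of flanks) -> genome name -> set of central bases observed
    loci = defaultdict(lambda: defaultdict(set))
    for name, seq in genomes.items():
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) - set("ACGT"):
                continue  # skip k-mers containing ambiguous bases such as N
            flanks = (kmer[:half], kmer[half + 1:])
            loci[flanks][name].add(kmer[half])

    snps = {}
    for flanks, per_genome in loci.items():
        # Assumption: a locus seen with several central bases in one genome
        # is ambiguous (for example, a repeat) and is dropped.
        if any(len(bases) > 1 for bases in per_genome.values()):
            continue
        alleles = {name: next(iter(bases)) for name, bases in per_genome.items()}
        # Keep the locus only if at least two genomes disagree at the centre.
        if len(set(alleles.values())) > 1:
            snps[flanks] = alleles
    return snps
```

For two toy sequences that differ at a single position, `find_kmer_snps({"g1": "AACCGTTGGA", "g2": "AACCTTTGGA"}, k=5)` reports one locus, with flanks `CC` and `TT`, at which `g1` carries `G` and `g2` carries `T`. Because no alignment is computed, the work grows with the total amount of input sequence rather than with the cost of aligning every genome to every other, which is the property the abstract emphasizes.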