Management of High-Throughput DNA Sequencing Projects: Alpheus
Neil A. Miller1, Stephen F. Kingsmore1, Andrew D. Farmer1, Raymond J. Langley1, Joann Mudge1, John A. Crow1, Alvaro J. Gonzalez1,3, Faye D. Schilkey1, Ryan J. Kim1, Jennifer van Velkinburgh1, Gregory D. May1, C. Forrest Black1, M. Kathy Myers1, John P. Utsey1, Nicholas S. Frost1, Selene M. Virk1, David J. Sugarbaker2, Raphael Bueno2, Stephen R. Gullans2, Susan M. Baxter1,4, Steve W. Day1, and Ernest F. Retzel1*
*Corresponding Author:
Dr. Ernest F. Retzel, Ph.D., Program Leader
National Center for Genome Resources, 2935 Rodeo Park Drive East, Santa Fe,
Email: [email protected]
Received Date: December 18, 2008; Accepted Date: December 22, 2008; Published Date: December 26, 2008
Citation: Miller NA, Kingsmore SF, Farmer AD, Langley RJ, Mudge J, et al. (2008) Management of High-Throughput DNA Sequencing Projects: Alpheus. J Comput Sci Syst Biol 1:132-148. doi: 10.4172/jcsb.1000013
Copyright: © 2008 Neil A. Miller, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
High-throughput DNA sequencing has enabled systems biology to begin to address areas in health, agricultural and basic biological research. Concomitant with these opportunities is the need to manage large volumes of high-dimensional, inter-related data and analyses. Alpheus is an analysis pipeline, database and visualization software for use with massively parallel DNA sequencing technologies that feature multigigabase throughput characterized by relatively short reads, such as Illumina-Solexa (sequencing-by-synthesis), Roche-454 (pyrosequencing) and Applied Biosystems' SOLiD (sequencing-by-ligation). Alpheus enables alignment to reference sequence(s), detection of variants and enumeration of sequence abundance, including expression levels in transcriptome sequence. Alpheus is able to detect several types of variants, including non-synonymous and synonymous single nucleotide polymorphisms (SNPs), insertions/deletions (indels), premature stop codons, and splice isoforms. Variant detection is aided by the ability to filter variant calls dynamically based on consistency, expected allele frequency, sequence quality, coverage, and variant type, in order to minimize false positives while maximizing the identification of true positives. Alpheus also enables comparisons of genes with variants between cases and controls or between bulk segregant pools. Sequence-based differential expression comparisons can be developed, with data export to SAS JMP Genomics for statistical analysis.
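The dynamic filtering of variant calls described above can be illustrated with a minimal Python sketch. It is not the Alpheus implementation: the `VariantCall` record, its field names, the `filter_calls` function and all default thresholds are hypothetical, chosen only to show how supporting-read count (used here as a stand-in for consistency), allele frequency, sequence quality, coverage and variant type might be combined into one adjustable filter.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class VariantCall:
    """Hypothetical record for one candidate variant (not the Alpheus schema)."""
    gene: str
    variant_type: str    # e.g. "nsSNP", "sSNP", "indel"
    variant_reads: int   # reads supporting the variant allele
    coverage: int        # total reads aligned at the position
    mean_quality: float  # mean base quality of the supporting reads

    @property
    def allele_frequency(self) -> float:
        # Fraction of aligned reads that carry the variant allele.
        return self.variant_reads / self.coverage if self.coverage else 0.0


def filter_calls(
    calls: Iterable[VariantCall],
    min_reads: int = 2,          # assumed proxy for "consistency"
    min_coverage: int = 4,
    min_quality: float = 20.0,
    min_freq: float = 0.15,
    max_freq: float = 1.0,
    types: Optional[Set[str]] = None,
) -> List[VariantCall]:
    """Keep only the calls that pass every threshold (all defaults are illustrative)."""
    kept = []
    for call in calls:
        if types is not None and call.variant_type not in types:
            continue
        if call.variant_reads < min_reads or call.coverage < min_coverage:
            continue
        if call.mean_quality < min_quality:
            continue
        if not (min_freq <= call.allele_frequency <= max_freq):
            continue
        kept.append(call)
    return kept


# Made-up example calls, for illustration only.
calls = [
    VariantCall("geneA", "nsSNP", variant_reads=6, coverage=12, mean_quality=30.0),
    VariantCall("geneB", "indel", variant_reads=1, coverage=40, mean_quality=28.0),
    VariantCall("geneC", "sSNP", variant_reads=9, coverage=10, mean_quality=12.0),
]

strict = filter_calls(calls)                     # default thresholds
relaxed = filter_calls(calls, min_quality=10.0)  # loosen the quality cutoff
```

Because every threshold is a parameter, the same set of calls can be re-filtered repeatedly with stricter or looser settings, which is the sense in which such a filter is "dynamic": the trade-off between false positives and missed true positives is adjusted after calling rather than fixed in advance.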