Management of High-Throughput DNA Sequencing Projects: Alpheus
Neil A. Miller1, Stephen F. Kingsmore1, Andrew D. Farmer1,
Raymond J. Langley1, Joann Mudge1, John A. Crow1, Alvaro J. Gonzalez1,3,
Faye D. Schilkey1, Ryan J. Kim1, Jennifer van Velkinburgh1, Gregory D. May1,
C. Forrest Black2, M. Kathy Myers1, John P. Utsey1, Nicholas S. Frost1, Selene M. Virk2,
David J. Sugarbaker2, Raphael Bueno2, Stephen R. Gullans2,
Susan M. Baxter1,4, Steve W. Day1, and Ernest F. Retzel1*
1National Center for Genome Resources, 2935 Rodeo Park Drive East, Santa Fe, NM 87505, USA
2International Mesothelioma Program, Division of Thoracic Surgery, Department of Pathology, Brigham and Women’s Hospital and Harvard Medical School, 75 Francis Street, Boston, MA 02115, USA
3Current address: Computer and Information Sciences Department, 101 Smith Hall, University of Delaware, Newark, DE 19716
4Current address: San Diego State University, 5500 Campanile Dr., San Diego, CA
*Corresponding author:
Dr. Ernest F. Retzel, Ph.D., Program Leader, National Center for Genome Resources, 2935 Rodeo Park Drive East, Santa Fe, NM 87505,
E-mail: efr@ncgr.org
Received December 18, 2008; Accepted December 22, 2008; Published December 26, 2008
Citation: Miller NA, Kingsmore SF, Farmer AD, Langley RJ, Mudge J, et al. (2008) Management of High-Throughput DNA Sequencing Projects: Alpheus. J Comput Sci Syst Biol 1: 132-148. doi:10.4172/jcsb.1000013
Copyright: ©2008 Neil A. Miller et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Abstract
High-throughput DNA sequencing has enabled systems biology to begin to address questions in health, agricultural
and basic biological research. These opportunities bring an absolute necessity to manage
large volumes of high-dimensional, inter-related data and analyses. Alpheus is an analysis pipeline, database,
and visualization software package for use with massively parallel DNA sequencing technologies that feature multigigabase
throughput and relatively short reads, such as Illumina-Solexa (sequencing-by-synthesis),
Roche-454 (pyrosequencing) and Applied Biosystems' SOLiD (sequencing-by-ligation). Alpheus enables alignment
to reference sequence(s), detection of variants and enumeration of sequence abundance, including expression
levels in transcriptome sequence. Alpheus detects several types of variants, including non-synonymous
and synonymous single nucleotide polymorphisms (SNPs), insertions/deletions (indels), premature stop
codons, and splice isoforms. Variant detection is aided by the ability to filter variant calls dynamically based on consistency,
expected allele frequency, sequence quality, coverage, and variant type, in order to minimize false positives while
maximizing the identification of true positives. Alpheus also enables comparisons of genes with variants between
cases and controls or bulk segregant pools, as well as sequence-based differential expression comparisons,
with data export to SAS JMP Genomics for statistical analysis.
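Dynamic variant-call filtering of the kind described in the abstract can be pictured as a set of adjustable thresholds re-applied to every candidate call. The Python sketch below is a hypothetical illustration, not the Alpheus implementation: the `VariantCall` and `VariantFilter` names, the field definitions (in particular treating "consistency" as a precomputed score between 0 and 1), the default thresholds, and the diploid expected allele frequencies of 0.5 (heterozygous) and 1.0 (homozygous) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence


@dataclass(frozen=True)
class VariantCall:
    """One candidate variant at a reference position (hypothetical fields)."""
    position: int
    variant_type: str     # e.g. "SNP" or "indel"
    coverage: int         # total reads overlapping the position
    alt_reads: int        # reads supporting the variant allele
    mean_quality: float   # mean base quality of the supporting reads
    consistency: float    # assumed precomputed score in [0, 1]

    @property
    def allele_frequency(self) -> float:
        # Fraction of overlapping reads that carry the variant allele.
        return self.alt_reads / self.coverage if self.coverage else 0.0


@dataclass(frozen=True)
class VariantFilter:
    """Hypothetical user-adjustable thresholds; defaults are illustrative only."""
    min_coverage: int = 10
    min_quality: float = 20.0
    min_consistency: float = 0.8
    expected_frequencies: Sequence[float] = (0.5, 1.0)  # assumed diploid het / hom
    frequency_tolerance: float = 0.2
    allowed_types: frozenset = frozenset({"SNP", "indel"})

    def passes(self, call: VariantCall) -> bool:
        # Each criterion named in the abstract becomes one threshold test.
        if call.variant_type not in self.allowed_types:
            return False
        if call.coverage < self.min_coverage:
            return False
        if call.mean_quality < self.min_quality:
            return False
        if call.consistency < self.min_consistency:
            return False
        # Keep the call only if its observed allele frequency lies near an expected one.
        af = call.allele_frequency
        return any(abs(af - f) <= self.frequency_tolerance
                   for f in self.expected_frequencies)

    def filter(self, calls: Iterable[VariantCall]) -> List[VariantCall]:
        return [c for c in calls if self.passes(c)]


# Usage with made-up calls: tightening or relaxing a threshold re-filters
# the same calls without re-aligning any reads.
calls = [
    VariantCall(1001, "SNP", coverage=40, alt_reads=19, mean_quality=32.0, consistency=0.95),
    VariantCall(2050, "SNP", coverage=40, alt_reads=4, mean_quality=35.0, consistency=0.90),
    VariantCall(3300, "indel", coverage=6, alt_reads=6, mean_quality=30.0, consistency=1.0),
]
print([c.position for c in VariantFilter().filter(calls)])                # [1001]
print([c.position for c in VariantFilter(min_coverage=5).filter(calls)])  # [1001, 3300]
```

In this sketch the call at 2050 is rejected because its allele frequency (0.1) is far from both expected values, and the indel at 3300 passes only once the coverage threshold is relaxed. Because the filter reads only per-call summaries, adjusting a threshold is cheap, which is one plausible way to make such filtering interactive.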