Node-Oriented Workflow (NOW): A Command Template Workflow Management Tool for High Throughput Data Analysis Pipelines
Eric B. Lipsky1, Brian R. King2, Gerard Tromp1*
1Sigfried and Janet Weis Center for Research, Geisinger Health System, 100 North Academy Ave., Danville, PA 17822, USA
2Dept. of Computer Science, Bucknell University, 1 Dent Drive, Lewisburg, PA 17837, USA
*Corresponding Author:
Gerard Tromp
Sigfried and Janet Weis Center for Research
Geisinger Health System, 100 North Academy Ave.
Danville, PA 17822, USA
E-mail: [email protected]
Received date: June 06, 2014; Accepted date: June 28, 2014; Published date: June 30, 2014
Citation: Lipsky EB, King BR, Tromp G (2014) Node-Oriented Workflow (NOW): A Command Template Workflow Management Tool for High Throughput Data Analysis Pipelines. J Data Mining Genomics Proteomics 5:159. doi:10.4172/2153-0602.1000159
Copyright: © 2014 Lipsky EB et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Next generation sequencing (NGS) systems produce vast quantities of data that require substantial computational resources for typical analysis tasks. In addition, data generated by different NGS systems are not homogeneous. Moreover, there is an overwhelming number of tools available for performing typical tasks. Managing NGS workflows involves writing custom scripts that quickly grow in complexity, often resulting in unwieldy workflows that underutilize typical high performance compute resources and increase the demands on the staff managing them. We present Node-Oriented Workflow (NOW), a dynamic command template workflow engine for high performance distributed computing (HPC) systems. Our system provides a simple-to-use, browser-based front end for designing and managing complex workflows. Workflows are configured through this interface and managed by the integrated job engine, which initializes nodes, monitors node status, and processes the results of individual jobs across nodes in an HPC configuration. We reduce excessive messaging across nodes by placing the burden on the nodes themselves to start tasks in a workflow when their dependencies are met, i.e., a node-oriented workflow. Our system was designed for NGS processing in the clinical research setting, emphasizing user simplicity, tool scalability, and minimization of redundancy in workflows, while maximizing throughput in an HPC environment. Furthermore, NOW is not restricted to NGS pipeline management, but can be used to manage any computational pipeline.
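
As a rough illustration of the two ideas named above, command templates and node-side dependency checking, the following Python sketch may help. It is not NOW's implementation: the workflow dictionary, the tool names (`aligner`, `sorter`, `caller`, `qc_tool`), the `sample` parameter, and the functions `render`, `ready_tasks`, and `run_node` are all hypothetical, and the shared status store is reduced to two in-memory sets. Commands are rendered but not executed.

```python
from string import Template

# Hypothetical workflow: each task has a command template and a list of
# prerequisite task names. Tool and file names are illustrative only.
WORKFLOW = {
    "align": {"template": "aligner --in ${sample}.fastq --out ${sample}.bam",
              "needs": []},
    "sort": {"template": "sorter ${sample}.bam -o ${sample}.sorted.bam",
             "needs": ["align"]},
    "call": {"template": "caller ${sample}.sorted.bam > ${sample}.vcf",
             "needs": ["sort"]},
    "qc": {"template": "qc_tool ${sample}.fastq", "needs": []},
}


def render(task_name, params):
    """Fill a task's command template with concrete parameter values."""
    return Template(WORKFLOW[task_name]["template"]).substitute(params)


def ready_tasks(done, started):
    """Return tasks not yet started whose prerequisites have all completed."""
    return [name for name, task in WORKFLOW.items()
            if name not in started
            and all(dep in done for dep in task["needs"])]


def run_node(params):
    """Simulate one node working through the workflow for one sample.

    The node inspects the task status itself and starts whatever is ready,
    instead of waiting for a central scheduler to send a start message.
    """
    done, started, log = set(), set(), []
    while len(done) < len(WORKFLOW):
        batch = ready_tasks(done, started)
        if not batch:
            raise RuntimeError("unsatisfiable dependencies")
        for name in batch:
            started.add(name)
            log.append(render(name, params))  # a real node would run this
            done.add(name)
    return log


if __name__ == "__main__":
    for command in run_node({"sample": "S1"}):
        print(command)
```

In this sketch the same templates can be reused for any sample by changing the parameter dictionary, and a task only starts once every task it depends on has finished.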