MoDa-A Data Warehouse for Multi-“Omics” Data
Sudeshna Guha Neogi1, Pauls Vasilis2, Maria Krestyaninova3,4, Misha Kapushesky4, Ibrahim Emam4, Alvis Brazma4, Ugis Sarkans4*
*Corresponding Author:
Ugis Sarkans
European Bioinformatics Institute, EMBL-EBI
Hinxton, Cambridge CB10 1SD, U.K.
Tel: +44 (0)1223 494 603
Received date: July 24, 2013; Accepted date: October 24, 2013; Published date: October 28, 2013
Citation: Neogi SG, Krestyaninova M, Kapushesky M, Emam I, Brazma A, et al. (2013) MoDa-A Data Warehouse for Multi-“Omics” Data. J Data Mining Genomics Proteomics 4:145. doi: 10.4172/2153-0602.1000145
Copyright: © 2013 Neogi SG, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
The range of “omics” technologies for high-throughput measurement of the properties of biomolecular entities (e.g. transcripts, proteins, metabolites) in biological samples continues to grow, and information systems that enable integrative exploration of the results of such experiments are needed. We have developed MoDa (Molecular Data warehouse), a system that provides a unified framework for finding and visualizing the results of various experimental techniques in molecular biology.
The warehouse architecture is optimized for filtering and querying along several dimensions: annotations of samples, experimental results, and properties of genes and other molecular entities. The implementation is based on BioMart technology, with enhanced means for manipulating multidimensional data. The user interface is a web-based application.
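To make the idea of filtering along several dimensions concrete, the following is a minimal sketch in plain Python, not MoDa's or BioMart's actual schema or API. All table names, field names, identifiers, and values (`SAMPLES`, `GENES`, `MEASUREMENTS`, `query`, `GENE_A`, and so on) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    sample_id: str
    gene_id: str
    assay: str    # hypothetical assay label, e.g. "transcriptomics"
    value: float


# Hypothetical annotation tables: one per dimension of the warehouse.
SAMPLES = {
    "S1": {"tissue": "liver", "sex": "F"},
    "S2": {"tissue": "liver", "sex": "M"},
    "S3": {"tissue": "muscle", "sex": "F"},
}
GENES = {
    "GENE_A": {"biotype": "protein_coding"},
    "GENE_B": {"biotype": "protein_coding"},
    "GENE_C": {"biotype": "pseudogene"},
}
# Hypothetical experimental results, keyed by sample and gene.
MEASUREMENTS = [
    Measurement("S1", "GENE_A", "transcriptomics", 7.2),
    Measurement("S1", "GENE_C", "transcriptomics", 3.1),
    Measurement("S2", "GENE_A", "proteomics", 0.8),
    Measurement("S3", "GENE_B", "transcriptomics", 5.5),
    Measurement("S2", "GENE_B", "transcriptomics", 6.0),
]


def matches(annotations, wanted):
    """True if every requested key/value pair is present in the annotations."""
    return all(annotations.get(key) == val for key, val in wanted.items())


def query(sample_filter, gene_filter, assay=None):
    """Return measurements whose sample and gene annotations pass both filters."""
    return [
        m for m in MEASUREMENTS
        if matches(SAMPLES[m.sample_id], sample_filter)
        and matches(GENES[m.gene_id], gene_filter)
        and (assay is None or m.assay == assay)
    ]


# Transcriptomics results for protein-coding genes in liver samples.
hits = query({"tissue": "liver"}, {"biotype": "protein_coding"},
             assay="transcriptomics")
```

Each filter constrains one dimension (samples, genes, assay type), and the result is the intersection; a real warehouse would presumably push these constraints into database queries rather than scanning a list in memory.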
An important consideration for every data warehousing project is data acquisition and cleaning. To ensure that the data uploaded into the warehouse is consistent and sufficiently well-annotated for further statistical analyses, we implemented a repository for sample and research subject data, experimental metadata, and experimental results. A gene re-annotation pipeline was used to provide a uniform reference system for the collected data along the bioentity (“gene”) dimension.
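As a rough illustration of what a uniform reference system along the bioentity dimension involves, the sketch below maps platform-specific identifiers to a single reference gene identifier and sets aside records that cannot be mapped unambiguously. It is not the authors' pipeline; the mapping table, identifiers, and the `reannotate` function are hypothetical.

```python
# Hypothetical mapping from platform-specific identifiers (array probes,
# protein accessions, ...) to reference gene identifiers.
ID_MAP = {
    "probe_001": ["GENE_A"],
    "probe_002": ["GENE_B"],
    "prot_X9": ["GENE_A"],
    "probe_003": ["GENE_B", "GENE_C"],  # ambiguous: maps to two genes
}


def reannotate(records, id_map):
    """Attach a reference gene ID to each (platform_id, value) record.

    Records with no mapping or with more than one candidate gene are
    returned separately so they can be reviewed instead of silently kept.
    """
    mapped, rejected = [], []
    for platform_id, value in records:
        targets = id_map.get(platform_id, [])
        if len(targets) == 1:
            mapped.append((targets[0], platform_id, value))
        else:
            rejected.append(platform_id)
    return mapped, rejected


records = [
    ("probe_001", 7.2),
    ("prot_X9", 0.8),
    ("probe_003", 4.4),   # ambiguous mapping
    ("probe_999", 1.0),   # unknown identifier
]
mapped, rejected = reannotate(records, ID_MAP)
```

After this step, results from different platforms that refer to the same gene share one key (`GENE_A` here), which is what allows them to be queried together.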
We expect the data warehousing infrastructure developed here to be useful for collaborative projects that employ high-throughput molecular biology technologies.