MoDa-A Data Warehouse for Multi-“Omics” Data

Sudeshna Guha Neogi; Maria Krestyaninova; Misha Kapushesky; Ibrahim Emam; Alvis Brazma; Ugis Sarkans

doi:10.4172/2153-0602.1000145

Journal of Data Mining in Genomics & Proteomics received 1039 citations as per Google Scholar report

Abstract

MoDa-A Data Warehouse for Multi-“Omics” Data

Sudeshna Guha Neogi, Pauls Vasilis, Maria Krestyaninova, Misha Kapushesky, Ibrahim Emam, Alvis Brazma and Ugis Sarkans

The range of “omics” technologies for high-throughput measurement of biomolecular entities (e.g. transcripts, proteins, metabolites) in biological samples continues to grow, and information systems that enable integrative exploration of the results of such experiments are needed. We have developed a system, MoDa (Molecular Data warehouse), that provides a unified framework for finding and visualizing results obtained with different experimental techniques of molecular biology. The warehouse architecture is optimized for filtering and querying annotations of samples, experimental results, and properties of genes and other molecular entities. The implementation is based on BioMart technology, with enhanced means for manipulating multidimensional data. The user interface is a web-based application. An important consideration for every data warehousing project is data acquisition and cleaning. To ensure that the data uploaded into the warehouse are consistent and sufficiently well annotated for further statistical analyses, we implemented a repository for sample and research subject data, experimental metadata, and experimental results. A gene re-annotation pipeline was used to provide a uniform reference system for the collected data along the bioentity (“gene”) dimension. We expect that the data warehousing infrastructure we have developed will be useful for collaborative projects employing high-throughput molecular biology technologies.
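
The kind of query such a warehouse is meant to answer, filtering experimental results by sample annotation and by gene property at the same time, can be illustrated with a small sketch. It assumes a simple star schema (one fact table of measurements joined to a sample dimension and a gene dimension) held in an in-memory SQLite database. The table names, column names, and values are hypothetical placeholders chosen for illustration; they are not MoDa's actual schema, and the code does not use the BioMart API.

```python
import sqlite3


def build_demo_warehouse():
    """Create an in-memory star schema: two dimension tables and one fact table.

    All table names, column names, and rows are illustrative assumptions.
    """
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE sample (sample_id INTEGER PRIMARY KEY, tissue TEXT, treatment TEXT);
        CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, symbol TEXT, chromosome TEXT);
        CREATE TABLE measurement (
            sample_id INTEGER REFERENCES sample(sample_id),
            gene_id INTEGER REFERENCES gene(gene_id),
            technology TEXT,
            value REAL
        );
    """)
    # Sample dimension: annotations describing each biological sample.
    con.executemany("INSERT INTO sample VALUES (?, ?, ?)", [
        (1, "liver", "control"),
        (2, "liver", "treated"),
        (3, "muscle", "treated"),
    ])
    # Gene ("bioentity") dimension: a uniform reference for all technologies.
    con.executemany("INSERT INTO gene VALUES (?, ?, ?)", [
        (10, "GENE_A", "1"),
        (11, "GENE_B", "2"),
    ])
    # Fact table: one row per (sample, gene, technology) measurement.
    con.executemany("INSERT INTO measurement VALUES (?, ?, ?, ?)", [
        (1, 10, "transcriptomics", 5.0),
        (2, 10, "transcriptomics", 7.5),
        (2, 10, "proteomics", 1.2),
        (3, 10, "transcriptomics", 6.1),
        (2, 11, "transcriptomics", 3.3),
    ])
    return con


def query_measurements(con, tissue, symbol):
    """Return measurements filtered along both the sample and gene dimensions."""
    sql = """
        SELECT s.sample_id, s.treatment, m.technology, m.value
        FROM measurement m
        JOIN sample s ON s.sample_id = m.sample_id
        JOIN gene g ON g.gene_id = m.gene_id
        WHERE s.tissue = ? AND g.symbol = ?
        ORDER BY s.sample_id, m.technology
    """
    return con.execute(sql, (tissue, symbol)).fetchall()


if __name__ == "__main__":
    warehouse = build_demo_warehouse()
    for row in query_measurements(warehouse, "liver", "GENE_A"):
        print(row)
```

Keeping the gene dimension separate from the measurements is what lets results from different technologies (here, transcript and protein values for the same hypothetical gene) be retrieved by a single query.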