Comparison of Programmatic Approaches for Efficient Accessing to mzML Files

Miroslaw J. Gilski; Rovshan G. Sadygov

doi:10.4172/2153-0602.1000109

Awards Nomination 20+ Million Readerbase

PMC/PubMed Indexed Articles

Causal Inference in the Age of Decision Medicine

Mining Next Generation Sequencing Data: How to Avoid â€œTreasure in, Error Outâ€

Google Scholar citation report

Citations : 1039

Journal of Data Mining in Genomics & Proteomics received 1039 citations as per Google Scholar report

Journal of Data Mining in Genomics & Proteomics peer review process verified at publons

25+ Million Website Visitors

Indexed In

Academic Journals Database
Open J Gate
Genamics JournalSeek
JournalTOCs
ResearchBible
Ulrich's Periodicals Directory
Electronic Journals Library
RefSeek
Hamdard University
EBSCO A-Z
OCLC- WorldCat
Scholarsteer
SWB online catalog
Virtual Library of Biology (vifabio)
Publons
MIAR
Geneva Foundation for Medical Education and Research
Euro Pub
Google Scholar

Useful Links

Share This Page

Journal Flyer

Tweets by JohnMat36980096

Open Access Journals

Abstract

Comparison of Programmatic Approaches for Efficient Accessing to mzML Files

Miroslaw J. Gilski and Rovshan G. Sadygov

The Human Proteome Organization (HUPO) Proteomics Standard Initiative has been tasked with developing file formats for storing raw data (mzML) and the results of spectral processing (protein identification and quantification) from proteomics experiments (mzIndentML). In order to fully characterize complex experiments, special data types have been designed. Standardized file formats will promote visualization, validation and dissemination of data independent of the vendor-specific binary data storage files. Innovative programmatic solutions for robust and efficient data access to standardized file formats will contribute to more rapid wide-scale acceptance of these file formats by the proteomics community. In this work, we compare algorithms for accessing spectral data in the mzML file format. As an XML file, mzML files allow efficient parsing of data structures when using XML-specific class types. These classes provide only sequential access to files. However, random access to spectral data is needed in many algorithmic applications for processing proteomics datasets. Here, we demonstrate implementation of memory streams to convert a sequential access into random access. Our application preserves the elegant XML parsing capabilities. Benchmarking file access times in sequential and random access modes show that while for small number of spectra the random access is more time efficient, when retrieving large number of spectra sequential access becomes more efficient. We also provide comparisons to other file accessing methods from academia and industry.