Comparison of Programmatic Approaches for Efficient Accessing to mzML Files
Miroslaw J. Gilski1,2,3 and Rovshan G. Sadygov1,2*
*Corresponding Author:
Dr. Rovshan G. Sadygov
Department of Biochemistry and Molecular Biology
The University of Texas Medical Branch
301 University Blvd., Galveston, TX, 77555, USA
E-mail: [email protected]
Received date: February 14, 2011; Accepted date: March 29, 2011; Published date: March 31, 2011
Citation: Gilski MJ, Sadygov RG (2011) Comparison of Programmatic Approaches for Efficient Accessing to mzML Files. J Data Mining in Genom Proteomics 2:109. doi: 10.4172/2153-0602.1000109
Copyright: © 2011 Gilski MJ, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
The Human Proteome Organization (HUPO) Proteomics Standards Initiative has been tasked with developing file formats for storing raw data from proteomics experiments (mzML) and the results of spectral processing, that is, protein identification and quantification (mzIdentML). Special data types have been designed to fully characterize complex experiments. Standardized file formats will promote visualization, validation and dissemination of data independently of vendor-specific binary storage files. Innovative programmatic solutions for robust and efficient access to standardized file formats will contribute to more rapid, wide-scale acceptance of these formats by the proteomics community. In this work, we compare algorithms for accessing spectral data in the mzML file format. Because mzML is an XML-based format, its data structures can be parsed efficiently with XML-specific class types. These classes, however, provide only sequential access to files, whereas many algorithms for processing proteomics datasets need random access to spectral data. Here, we demonstrate the use of memory streams to convert sequential access into random access while preserving the elegant XML parsing capabilities. Benchmarking of file access times in sequential and random access modes shows that random access is more time-efficient when a small number of spectra is retrieved, whereas sequential access becomes more efficient when a large number of spectra is retrieved. We also provide comparisons to other file access methods from academia and industry.
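The memory-stream idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the helper names (`build_offset_index`, `read_spectrum`, `iter_spectra`), the tiny sample document, and the regex-based offset scan are all assumptions made for illustration. The sketch records the byte range of each `<spectrum>` element once, then serves a random-access request by seeking to that range, copying the bytes into an in-memory stream, and handing that stream to an ordinary XML parser.

```python
import io
import re
import xml.etree.ElementTree as ET

# Hypothetical three-spectrum document; real mzML files carry far more metadata.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<mzML><run id="run1"><spectrumList count="3">
<spectrum index="0" id="scan=1"><cvParam name="ms level" value="1"/></spectrum>
<spectrum index="1" id="scan=2"><cvParam name="ms level" value="2"/></spectrum>
<spectrum index="2" id="scan=3"><cvParam name="ms level" value="2"/></spectrum>
</spectrumList></run></mzML>"""

# Matches an opening <spectrum ...> tag (not <spectrumList>) and captures its id.
SPECTRUM_OPEN = re.compile(rb'<spectrum\s[^>]*\bid="([^"]*)"')
SPECTRUM_CLOSE = b"</spectrum>"


def build_offset_index(stream):
    """One pass over the file: map spectrum id -> (start, end) byte offsets.

    Assumption: the whole file is read at once for simplicity; a large file
    would be scanned in chunks instead.
    """
    stream.seek(0)
    data = stream.read()
    index = {}
    for match in SPECTRUM_OPEN.finditer(data):
        end = data.index(SPECTRUM_CLOSE, match.start()) + len(SPECTRUM_CLOSE)
        index[match.group(1).decode()] = (match.start(), end)
    return index


def read_spectrum(stream, index, spectrum_id):
    """Random access: seek, copy one element into a memory stream, parse it."""
    start, end = index[spectrum_id]
    stream.seek(start)
    memory_stream = io.BytesIO(stream.read(end - start))
    return ET.parse(memory_stream).getroot()


def iter_spectra(stream):
    """Sequential access: stream through the file one spectrum at a time."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag.rsplit("}", 1)[-1] == "spectrum":
            yield elem
            elem.clear()  # release children once the caller is done with them


if __name__ == "__main__":
    mzml = io.BytesIO(SAMPLE)  # stands in for open("file.mzML", "rb")
    offsets = build_offset_index(mzml)
    print(read_spectrum(mzml, offsets, "scan=2").attrib)
    print([s.get("id") for s in iter_spectra(io.BytesIO(SAMPLE))])
```

In this sketch each random request pays for a seek, a copy, and a fresh parser, while the sequential iterator pays one parser setup for the whole file. That difference is one plausible reason for the crossover described above, although the sketch itself measures nothing.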