Managing Proteomics Data: From Generation and
Data Warehousing to Central Data Repository
Herbert Thiele¹, Jörg Glandorf¹, Peter Hufnagel¹,
Gerhard Körting², Martin Blüggel²
¹Bruker Daltonik GmbH, Bremen, Germany
²Protagen AG, Dortmund, Germany
Corresponding author:
Prof. Dr. Herbert Thiele,
Bruker Daltonik GmbH,
Fahrenheitstrasse 4, D 28359 Bremen,
Phone: 0049 421 2205 187;
Fax: 0049 421 2205 108;
E-mail: ht@bdal.de

Received August 09, 2008; Accepted October 25, 2008; Published December 05, 2008
Citation: Thiele H, Jörg G, Peter H, Gerhard K, Martin B (2008) Managing Proteomics Data: From Generation and Data
Warehousing to Central Data Repository. J Proteomics Bioinform 1: 485-507. doi:10.4172/jpb.1000056
Copyright: © 2008 Thiele H, et al. This is an open-access article distributed under the terms of the Creative Commons
Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author
and source are credited.
Abstract
With the wide variety of Proteomics workflows, instruments and data-analysis software available, researchers
today face major challenges in validating and comparing their Proteomics data. Standardization initiatives
related to the Human Proteome Organisation (HUPO), with their standardized data formats and their efforts
toward standardized processing and validation, are expected to lead to field-generated data of
greater accuracy, reproducibility and comparability.
Here we present a new generation of the ProteinScape™ bioinformatics platform, which enables researchers to
manage Proteomics data from generation through data warehousing to a central data repository, with a strong
focus on the improved accuracy, reproducibility and comparability demanded by many researchers in the field. It
addresses scientists' current needs in proteomics identification, quantification, validation and biomarker discovery.
Offering comprehensive solutions for qualitative and quantitative LC-MS/MS and gel-based protein
analysis, this proteomics data warehousing and project management software supports various discovery workflows
through a flexible analyte hierarchy and a combination of different database search engines, scoring algorithms and
quantification methods. It streamlines the discovery process through Decoy validation and the ProteinExtractor™
algorithm, which produces non-redundant protein result lists across entire Proteomics projects. The implemented
processing pipeline for protein identification adopts the HUPO Brain Proteome Project (HUPO BPP) processing
guidelines (forum.hbpp.org) and facilitates the direct submission of Proteomics project data in accordance with
HUPO/PSI publishing guidelines.
As a specific example of the HUPO-based data processing strategy, the analysis of a large proteomics data set
is described. It includes the automatic search with four search engines to generate peptide results, the use of
Decoy databases to measure the false positive rate (FPR), the combination of peptide results by the
ProteinExtractor algorithm into non-redundant protein lists with known FPR, the automatic evaluation and cutoff
of protein lists at a defined FPR, and the merging of the protein lists from the four search engines into one list
(ProteinExtractor) with automatic result validation based on the defined FPR threshold value.
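
The FPR-based cutoff step mentioned above can be illustrated with a minimal Python sketch. It is not the ProteinScape or ProteinExtractor implementation: the function name `cutoff_at_fpr`, the tuple layout of a hit, and the estimate FPR = decoy hits / target hits are assumptions made only for illustration, and other decoy-based estimators exist.

```python
from typing import List, Tuple

# Hypothetical hit layout (an assumption): (protein id, score, is_decoy)
Hit = Tuple[str, float, bool]


def cutoff_at_fpr(hits: List[Hit], max_fpr: float) -> List[str]:
    """Return the target proteins in the longest score-ranked prefix
    whose estimated FPR does not exceed max_fpr.

    Assumption: FPR is estimated as (decoy hits) / (target hits)
    among all hits at or above the current rank.
    """
    # Rank hits from best to worst score.
    ranked = sorted(hits, key=lambda h: h[1], reverse=True)

    targets = 0
    decoys = 0
    best_len = 0
    for i, (_, _, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # With no target hits yet, treat the estimate as worst case.
        fpr = decoys / targets if targets else 1.0
        if fpr <= max_fpr:
            best_len = i  # longest prefix so far that meets the threshold

    # Report only target proteins; decoys serve purely as a yardstick.
    return [pid for pid, _, is_decoy in ranked[:best_len] if not is_decoy]


if __name__ == "__main__":
    example = [
        ("P1", 90.0, False), ("P2", 80.0, False), ("D1", 70.0, True),
        ("P3", 60.0, False), ("P4", 50.0, False), ("D2", 40.0, True),
    ]
    print(cutoff_at_fpr(example, max_fpr=0.3))
```

In this sketch the cutoff is placed at the longest ranked prefix that still satisfies the threshold, so a single early decoy hit does not truncate the list if the estimate later drops back below the limit.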