Background: High-throughput technologies such as mass spectrometry applied to clinical proteomics offer new insights into clinical research. These promising technologies generate high-dimensional datasets containing a large amount of biological information. Analyzing such datasets poses challenges for statistical methods, and current statistical analyses still have weaknesses that must be overcome before “omics” studies can be interpreted accurately. The central question is how to reliably identify new prognostic and diagnostic biomarkers. Although the mechanisms at work in the identification and validation of new markers have been observed in previous studies, they have been inadequately explained and are often treated separately.
Results: The aim of our study was therefore to show how candidate markers are sometimes selected in identification studies because the estimates of their effects are biased. To this end, we simulated high-dimensional survival studies. We showed how the selection mechanism at work in identification studies induces regression to the mean. This in turn leads to biased estimates of effect size and thus to over-optimistic expectations for validation studies.
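The selection mechanism described above can be illustrated with a minimal sketch. It is not the authors' simulation: it assumes, for simplicity, that each marker's estimated effect (for example a log hazard ratio) equals its true effect plus independent Gaussian sampling noise, instead of fitting a survival model. The function name, parameter names, and all numeric values are hypothetical and chosen only for illustration.

```python
import random
import statistics


def simulate_selection_bias(n_markers=2000, n_selected=20,
                            true_sd=0.1, noise_sd=0.3, seed=0):
    """Toy illustration of regression to the mean after marker selection.

    Assumption (not from the source): estimated effect = true effect
    + Gaussian noise, with independent noise in each study.
    """
    rng = random.Random(seed)

    # True effects: most markers have an effect close to zero.
    true_effects = [rng.gauss(0.0, true_sd) for _ in range(n_markers)]

    # Identification study: noisy estimate of every marker's effect.
    identification = [t + rng.gauss(0.0, noise_sd) for t in true_effects]

    # Validation study: same true effects, independent noise.
    validation = [t + rng.gauss(0.0, noise_sd) for t in true_effects]

    # Selection step: keep the markers with the largest estimated
    # effects in the identification study.
    ranked = sorted(range(n_markers),
                    key=lambda i: identification[i], reverse=True)
    selected = ranked[:n_selected]

    # Average effect of the selected markers under each view.
    return {
        "identification": statistics.mean(identification[i] for i in selected),
        "validation": statistics.mean(validation[i] for i in selected),
        "true": statistics.mean(true_effects[i] for i in selected),
    }


result = simulate_selection_bias()
for name, value in result.items():
    print(f"mean {name} effect of selected markers: {value:.3f}")
```

Under these assumptions, the selected markers' mean effect in the identification study exceeds both their mean true effect and their mean effect in the independent validation study, because selecting on the largest estimates also selects on favorable noise. Lowering `noise_sd`, which loosely mimics a larger identification sample, narrows the gap.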
Conclusions: This study demonstrated why new robust markers can only be discovered through well-designed studies with adequate sample sizes at the identification step. Because of the identification and validation mechanisms described above, pertinent candidate biomarkers in high-dimensional clinical studies require unbiased effect estimates from the identification step onward. Only then can such studies produce consistent results and ultimately benefit health care.
Citation: Truntzer C, Maucort-Boulch D, Roy P (2013) Impact of the Selection Mechanism in the Identification and Validation of New “Omic” Biomarkers. J Proteomics Bioinform 6:164-170. doi: 10.4172/jpb.1000276