Clustering Mass Spectral Peaks Increases Recognition Accuracy and Stability of SVM-based Feature Selection
Mikhail Pyatnitskiy, Maria Karpova*, Sergei Moshkovskii, Andrey Lisitsa, Alexander Archakov
Institute of Biomedical Chemistry, 119121, Pogodinskaya str., 10, Moscow, Russia
*Corresponding Author:
Dr. Maria Karpova
Institute of Biomedical Chemistry
119121, Pogodinskaya str., 10, Moscow, Russia
Tel: +7-499-2461641
E-mail: [email protected]
Received Date: January 13, 2010; Accepted Date: February 12, 2010; Published Date: February 12, 2010
Citation: Pyatnitskiy M, Karpova M, Moshkovskii S, Lisitsa A, Archakov A (2010) Clustering Mass Spectral Peaks Increases Recognition Accuracy and Stability of SVM-based Feature Selection. J Proteomics Bioinform 3: 048-054. doi: 10.4172/jpb.1000120
Copyright: © 2010 Pyatnitskiy M, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Mass spectral profiling of serum or plasma is widely used to build experimental diagnostic systems for different cancer types. In this approach, a set of discriminatory peaks serves as a multiplex cancer biomarker. Hence, adequate selection of peaks is a crucial stage in the development of a diagnostic rule. In the present paper we propose sequential filter and wrapper feature selection within a complete cross-validation scheme, in which feature selection is performed separately at each run of cross-validation. Hierarchical cluster analysis serves as the filter feature selection method; recursive feature elimination coupled with a support vector machine serves as the wrapper. The performance of the method is demonstrated on a previously obtained dataset of ovarian cancer and non-cancer sera. Application of our approach led to a slight but statistically significant increase in accuracy. Peak clustering yielded more stable feature selection results and lent biological meaning to the selected m/z values. We recommend clustering of peaks as a filter dimensionality reduction step in further mass spectral studies.
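The scheme described in the abstract can be illustrated with a minimal sketch, assuming scikit-learn is available. This is not the authors' implementation. The synthetic data, the function names make_synthetic_spectra and build_pipeline, the number of clusters, the number of retained features, and the choice to pool each cluster by its mean intensity (the FeatureAgglomeration default) are all assumptions made for illustration. The point it shows is that clustering and recursive feature elimination sit inside the pipeline, so both are refit on the training part of every cross-validation fold rather than once on the full dataset.

```python
import numpy as np
from sklearn.cluster import FeatureAgglomeration
from sklearn.feature_selection import RFE
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def make_synthetic_spectra(n_samples=80, n_peaks=60, seed=0):
    """Hypothetical stand-in for a peak-intensity matrix (samples x peaks)."""
    rng = np.random.default_rng(seed)
    y = np.repeat([0, 1], n_samples // 2)
    X = rng.normal(size=(n_samples, n_peaks))
    # Assumption: the first six peaks are correlated and class-dependent,
    # mimicking several m/z values that arise from one underlying species.
    signal = rng.normal(size=n_samples) + 2.5 * y
    X[:, :6] = signal[:, None] + 0.3 * rng.normal(size=(n_samples, 6))
    return X, y


def build_pipeline(n_clusters=20, n_selected=5):
    """Filter step (peak clustering), wrapper step (SVM-RFE), final SVM."""
    return Pipeline([
        ("scale", StandardScaler()),
        # Filter: hierarchical clustering of peaks; each cluster is pooled
        # into one feature (mean intensity, an illustrative choice).
        ("cluster", FeatureAgglomeration(n_clusters=n_clusters)),
        # Wrapper: recursive feature elimination driven by linear SVM weights.
        ("rfe", RFE(SVC(kernel="linear"), n_features_to_select=n_selected)),
        ("svm", SVC(kernel="linear")),
    ])


X, y = make_synthetic_spectra()
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
# cross_val_score refits the whole pipeline on each training split, so
# feature selection never sees the held-out samples of that fold.
scores = cross_val_score(build_pipeline(), X, y, cv=cv)
print("fold accuracies:", np.round(scores, 2))
```

Running feature selection once on all samples and cross-validating only the final classifier would leak information from the held-out samples into the selected peaks and tend to inflate the accuracy estimate. Wrapping every step in a single pipeline is one simple way to avoid that.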