k-Means Walk: Unveiling Operational Mechanism of a Popular Clustering Approach for Microarray Data
Victor Chukwudi Osamor*, Ezekiel Femi Adebiyi and Ebere Hezekiah Enekwa
Department of Computer and Information Sciences (Bioinformatics Unit), College of Science and Technology, Covenant University, Ota, Ogun State, Nigeria
- *Corresponding Author:
- Victor Chukwudi Osamor
Department of Computer and Information Sciences (Bioinformatics Unit)
College of Science and Technology
Covenant University, Ota, Ogun State, Nigeria
Received date: December 16, 2012; Accepted date: December 26, 2012; Published date: December 28, 2012
Citation: Osamor VC, Adebiyi EF, Enekwa EH (2013) k-Means Walk: Unveiling Operational Mechanism of a Popular Clustering Approach for Microarray Data. J Comput Sci Syst Biol 6:035-042. doi:10.4172/jcsb.1000098
Copyright: © 2013 Osamor VC, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Because data analysis with a computational model strongly influences how the final results are interpreted, target users of such computational tools need a basic understanding of the underlying model in order to design experiments well. Despite the wide variety of techniques associated with clustering, cluster analysis has become a generic name in bioinformatics for methods that discover the natural grouping(s) of a set of patterns, points or sequences. The aim of this paper is to analyze k-means through a step-by-step "k-means walk", using graphically guided analysis to give a clear understanding of the operational mechanism of the k-means algorithm. A scatter graph was created from theoretical microarray gene expression data, which is a simplified view of the data produced by a typical microarray experiment. We designated the first three data points as the initial centroids and applied the Euclidean distance metric in the k-means algorithm, so that each of these three points served as the reference point for forming one cluster. Before each new iteration, a test was conducted to determine whether the centroids had shifted. After convergence, we were able to trace the data points that belong to the same cluster. We observed that as both the dimension of the data and the length of the gene list increase in the hybridization matrix of microarray data, the computational implementation of the k-means algorithm becomes more demanding. Furthermore, an understanding of this approach should stimulate new ideas for further development and improvement of the k-means clustering algorithm, especially within the biology of diseases and beyond. The major advantage, however, will be improved cluster output for interpreting microarray experimental results, together with a better understanding that lets bioinformaticians and algorithm experts tweak the k-means algorithm for improved clustering run-time.
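The procedure summarized above can be sketched in Python. The steps are: seed the first three points as centroids, assign each point by Euclidean distance d(x, c) = sqrt(sum_j (x_j - c_j)^2), recompute the centroids, and test for a centroid shift before the next iteration. This is a minimal illustration under stated assumptions and is not the authors' implementation. The function names `euclidean`, `mean_point` and `k_means_walk`, the shift tolerance `tol`, the iteration cap `max_iter`, and the eight two-condition expression values in `toy_profiles` are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance d(x, c) = sqrt(sum_j (x_j - c_j)^2)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_point(members: List[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(members) for col in zip(*members))


def k_means_walk(points: List[Point], k: int = 3,
                 max_iter: int = 100, tol: float = 1e-9):
    """Return (labels, centroids, iterations) for a basic k-means run.

    Hypothetical helper: seeding with the first k points follows the
    abstract; max_iter and tol are illustrative choices.
    """
    # Step 1: designate the first k data points as the initial centroids.
    centroids: List[Point] = [tuple(p) for p in points[:k]]
    labels: List[int] = []
    for iteration in range(1, max_iter + 1):
        # Step 2: assign each point to its nearest centroid (Euclidean).
        labels = [min(range(k), key=lambda c: euclidean(p, centroids[c]))
                  for p in points]
        # Step 3: recompute each centroid as the mean of its members;
        # an empty cluster keeps its previous centroid.
        new_centroids: List[Point] = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centroids.append(mean_point(members) if members else centroids[c])
        # Step 4: test for a centroid shift before the next iteration;
        # stop once no centroid moves by more than tol.
        shift = max(euclidean(old, new)
                    for old, new in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            return labels, centroids, iteration
    return labels, centroids, max_iter


# Hypothetical expression values for eight genes under two conditions.
toy_profiles: List[Point] = [
    (1.0, 1.0), (5.0, 5.0), (9.0, 1.0), (1.5, 2.0),
    (5.5, 4.5), (8.5, 1.5), (1.0, 1.5), (6.0, 5.0),
]

if __name__ == "__main__":
    labels, centroids, n_iter = k_means_walk(toy_profiles, k=3)
    print("labels:", labels)
    print("centroids:", [tuple(round(v, 3) for v in c) for c in centroids])
    print("iterations:", n_iter)
```

On the toy data the script reports labels [0, 1, 2, 0, 1, 2, 0, 1] after two iterations. The first pass moves each centroid from its seed point to the mean of its members. The second pass finds no shift and stops. Each pass computes n × k distances over d coordinates, so the work per iteration grows as O(n·k·d). This is consistent with the observation that cost rises as the gene list and the data dimension grow.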