Special Issue Article
A Relevant Clustering Algorithm for High-Dimensional Data
Bini Tofflin R¹, Kamala Malar M², Sivasakthi S¹
Clustering is a widely used data mining technique that partitions data points into a set of groups, each of which is called a cluster. With the rapid growth of computational biology and e-commerce applications, high-dimensional data has become very common, so mining such data is increasingly important. Mining high-dimensional data poses several challenges, such as the curse of dimensionality and, more crucially, the loss of meaningfulness of similarity measures in high-dimensional space. Feature subset selection addresses these challenges: its goal is to select a subset of useful features, and when the relevant features are selected correctly, the resulting subset gives accurate results. Feature selection methods are mainly used together with learning algorithms; by reducing the dimensionality of the data, they allow those algorithms to operate more efficiently and effectively. The proposed Relevant Clustering Algorithm finds a subset of features both efficiently and effectively, in three steps. First, irrelevant features are eliminated from the dataset: a feature is kept as relevant only if its relevance value is greater than a predefined threshold. Second, the selected relevant features are used to build a graph, the features are partitioned with a graph-theoretic method, and clusters are formed using a Minimum Spanning Tree (MST). Third, the subset of features most strongly related to the target class is selected from these clusters. The Relevant Clustering Algorithm is more efficient than the existing feature subset selection algorithms RELIEF, FCBF, CFS, FOCUS-2 and INTERACT.
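The three steps are described only at a high level above, so the following is a minimal Python sketch of one possible reading, not the authors' implementation. It assumes discrete-valued features and uses symmetric uncertainty (SU) as both the feature-class relevance and the feature-feature correlation measure. It builds the MST with edge weight 1 - SU, so strongly correlated features are linked first, and it cuts an MST edge when the two features are less correlated with each other than each is with the class. These measure and partitioning choices, the function names (`symmetric_uncertainty`, `relevant_clustering`), the threshold value and the toy data are all hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    probs = (count / n for count in Counter(values).values())
    return -sum(p * math.log2(p) for p in probs)


def symmetric_uncertainty(x, y):
    """Assumed measure: SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def relevant_clustering(features, target, threshold):
    """features: dict of name -> list of discrete values; target: class labels."""
    # Step 1: keep only features whose relevance to the class exceeds the threshold.
    relevance = {}
    for name, column in features.items():
        su = symmetric_uncertainty(column, target)
        if su > threshold:
            relevance[name] = su
    names = list(relevance)
    if not names:
        return [], []

    # Step 2a: complete graph over the relevant features, weighted by pairwise SU.
    corr = {(u, v): symmetric_uncertainty(features[u], features[v])
            for u in names for v in names if u != v}

    # Step 2b: naive Prim's algorithm; the MST minimises the distance 1 - SU
    # (assumed weighting), so the most strongly correlated features are joined.
    in_tree, mst = {names[0]}, []
    while len(in_tree) < len(names):
        u, v = min(((a, b) for a in in_tree for b in names if b not in in_tree),
                   key=lambda edge: 1.0 - corr[edge])
        in_tree.add(v)
        mst.append((u, v))

    # Step 2c: assumed partitioning rule; drop an MST edge when the two features
    # are less correlated with each other than each one is with the class.
    kept = [(u, v) for u, v in mst
            if not (corr[(u, v)] < relevance[u] and corr[(u, v)] < relevance[v])]

    # The connected components of the remaining forest are the feature clusters.
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for u, v in kept:
        parent[find(u)] = find(v)
    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    clusters = list(groups.values())

    # Step 3: from each cluster, keep the feature most related to the class.
    selected = sorted(max(c, key=relevance.get) for c in clusters)
    return selected, clusters


# Hypothetical toy data: the class is determined by two independent bits a and b.
a = [0, 0, 1, 1, 0, 0, 1, 1]
a2 = [0, 0, 1, 1, 0, 0, 1, 0]      # noisy copy of a (redundant feature)
b = [0, 0, 0, 0, 1, 1, 1, 1]
noise = [0, 1, 0, 1, 0, 1, 0, 1]   # independent of the class (irrelevant)
target = [x + 2 * y for x, y in zip(a, b)]
selected, clusters = relevant_clustering(
    {"a": a, "a2": a2, "b": b, "noise": noise}, target, threshold=0.1)
print(selected)  # ['a', 'b']
```

On this toy data, `noise` is removed in step 1, the redundant `a2` ends up in the same cluster as `a`, and step 3 keeps one representative per cluster (`a` and `b`). The naive Prim loop is cubic in the number of relevant features, which is fine for illustration but would need a priority queue or a precomputed distance matrix for thousands of features.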