Comparative Analysis of Advanced Algorithms for Feature Selection
Radhika Senapathi 1, Kanakeswari D 2, Ravi Bhushan Yadlapalli 3
Feature selection is a preprocessing step that identifies a relevant subset of features from high-dimensional data. Algorithms such as ReliefF and Parzen-ReliefF attempt to directly maximize classification accuracy and naturally reflect the Bayes error in their objective. The proposed algorithmic framework selects a subset of features by minimizing the Bayes error rate estimated by a nonparametric estimator. A set of existing algorithms, as well as new ones, can be derived naturally from this framework. As an example, we show that the Relief algorithm greedily attempts to minimize the Bayes error estimated by the k-Nearest-Neighbor (kNN) method. This interpretation reveals the principle behind the family of margin-based feature selection algorithms and offers a principled way to derive new alternatives for performance improvement. In particular, by exploiting the proposed framework, we establish the Parzen-Relief (PRelief) algorithm based on the Parzen window estimator. RELIEF is a popular approach to feature weight estimation, and many extensions of it have been developed. However, because the instances used to compute the feature weight vector in RELIEF are chosen randomly, the results fluctuate with the chosen instances, which leads to poor evaluation accuracy. To address this problem, a Parzen-ReliefF-based feature selection algorithm is proposed. It takes both the mean and the variance of the discrimination among instances into account as the criterion for feature weight estimation, which makes the result more stable and accurate. To assess the performance of both algorithms, we evaluate the quality of the generated outputs with two clustering algorithms: the leader–subleader algorithm and the Davies–Bouldin index (DBI), which are used to measure cluster quality and cluster similarity.
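The randomness the abstract attributes to RELIEF can be seen in the classic weight update: for a sampled instance, each feature's weight grows with its distance to the nearest instance of the other class (the "miss") and shrinks with its distance to the nearest instance of the same class (the "hit"). The sketch below is a minimal version of that basic Relief update for binary labels with Euclidean nearest-neighbor search; it is an illustration of the general technique, not the authors' PRelief formulation, and the function name and defaults are assumptions.

```python
import numpy as np

def relief_weights(X, y, n_iters=None, rng=None):
    """Simplified Relief weight estimation for binary classes.

    For each sampled instance x, update every feature f as
        w[f] += |x - nearest_miss|_f - |x - nearest_hit|_f
    so features that separate the classes accumulate large weights.
    Note the weights depend on which instances get sampled -- the
    instability that Parzen-based variants aim to reduce.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    n, d = X.shape
    n_iters = n if n_iters is None else n_iters
    w = np.zeros(d)
    for _ in range(n_iters):
        i = rng.integers(n)
        xi, yi = X[i], y[i]
        same = (y == yi)
        same[i] = False            # exclude the instance itself
        diff = (y != yi)
        # nearest hit (same class) and nearest miss (other class)
        hit = X[same][np.argmin(np.linalg.norm(X[same] - xi, axis=1))]
        miss = X[diff][np.argmin(np.linalg.norm(X[diff] - xi, axis=1))]
        w += np.abs(xi - miss) - np.abs(xi - hit)
    return w / n_iters
```

On toy data where only the first feature separates the two classes, the first weight dominates, which is the behavior the margin-based interpretation above predicts.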
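The Davies–Bouldin index used for evaluation can be computed directly from cluster assignments: for each cluster, take its worst-case ratio of within-cluster scatter to between-centroid separation, then average over clusters (lower is better). This is a sketch of the standard DBI definition, not tied to the paper's experimental setup.

```python
import numpy as np

def davies_bouldin(X, labels):
    """Davies-Bouldin index: lower values indicate compact,
    well-separated clusters."""
    ks = np.unique(labels)
    centroids = np.array([X[labels == k].mean(axis=0) for k in ks])
    # S_k: mean distance of cluster members to their centroid
    s = np.array([np.mean(np.linalg.norm(X[labels == k] - c, axis=1))
                  for k, c in zip(ks, centroids)])
    db = 0.0
    for i in range(len(ks)):
        # worst similarity ratio of cluster i against every other cluster
        ratios = [(s[i] + s[j]) / np.linalg.norm(centroids[i] - centroids[j])
                  for j in range(len(ks)) if j != i]
        db += max(ratios)
    return db / len(ks)
```

Two tight clusters placed far apart score much lower than the same points squeezed together, which is how the index is used here to compare the cluster quality produced by the two feature selection algorithms.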