Two-Stage Feature Selection Algorithm Based on Supervised Classification Approach for Automated Epilepsy Diagnosis | OMICS International
ISSN: 2155-9538
Journal of Bioengineering & Biomedical Science


Two-Stage Feature Selection Algorithm Based on Supervised Classification Approach for Automated Epilepsy Diagnosis

Mechmeche S¹*, Salah RB² and Ellouze N¹

¹National Engineering School of Tunis, University of El Manar, UR SITI, Tunis, Tunisia

²Prince Sattam Bin Abdulaziz University, Biomedical Technology Department, Riyadh, Saudi Arabia

*Corresponding Author:
Mechmeche S
National Engineering School of Tunis
University of El Manar
UR SITI, Tunis, Tunisia
Tel: 21671872253
E-mail: [email protected]

Received Date: March 31, 2016; Accepted Date: April 13, 2016; Published Date: April 21, 2016

Citation: Mechmeche S, Salah RB, Ellouze N (2016) Two-Stage Feature Selection Algorithm Based on Supervised Classification Approach for Automated Epilepsy Diagnosis. J Bioengineer & Biomedical Sci 6:183. doi:10.4172/2155-9538.1000183

Copyright: © 2016 Mechmeche S, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.


Abstract

Epilepsy is generally diagnosed by visually scanning EEG recordings for Interictal Epileptiform Discharges (IEDs). The main objective of this research is to select the smallest relevant feature subset from the original dataset in order to reduce diagnosis time and increase classification accuracy by removing irrelevant and redundant features. For this purpose, we propose a two-stage feature selection algorithm based on a supervised classification approach that successively applies a wrapper feature selection method and a wrapper feature subset selection method. MATLAB simulation results comparing the two classifiers show that the high dimensionality is reduced to a single relevant feature, which achieves classification metrics of 100%. The epilepsy diagnosis is successfully tested in the discriminant Fisher space using this single best relevant feature.

Keywords

Cross-validation; Classification metrics; EEG; Feature selection; IEDs; LDA; Mahalanobis distance classifier; QDA; Supervised classification

Introduction

Epilepsy is a neurological disorder marked by sudden recurrent episodes of sensory disturbance, loss of consciousness and convulsions, associated with abnormal electrical activity in the brain. Confirmation of epilepsy is based on the visual detection of isolated Interictal Epileptiform Discharges (IEDs), such as spikes or spike-wave complexes, in EEG (electroencephalogram) recordings from certain brain areas; for example, absence epilepsy is confirmed by the presence of rhythmic spike-wave discharges at 3 Hz [1-3]. This visual inspection is imprecise, tedious and very time-consuming. The aim of our research is to establish an automated diagnosis of epilepsy using a supervised classification approach (Figure 1).


Figure 1: Block diagram of automatic diagnosis process.

To create a training set, we need to build a knowledge database composed of normal and epileptic EEG samples. Feature extraction is an essential preprocessing step in pattern recognition and machine learning problems. To build the training set, the signal pattern may be described in three analysis domains: the time domain [4-11], the frequency domain [11-13] and the time-frequency domain [4,7,11,14-16]. In this article, the EEG signal pattern is described by a high-dimensional feature vector drawn from all three domains. To reduce the dimensionality to a Smallest Relevant Feature Subset (SRFS), we propose a two-stage feature selection algorithm using a wrapper-based method in supervised classification [17]: the first stage uses the Individual Feature Evaluation (IFE) method and the second stage uses the Sequential Backward Selection (SBS) method.

A Mahalanobis Distance-based Classifier (MDC) is proposed to classify an unknown EEG signal into the “Normal” or “Epileptic” class. To visualize both classes optimally, the samples are projected into the linear Fisher space [18,19] using Fisher Linear Discriminant Analysis (FDA), which seeks the projection directions that best discriminate between the classes [20,21].
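For two classes, the Fisher direction that maximizes between-class separation relative to within-class scatter is w ∝ S_W^{-1}(m_1 − m_2), where m_1 and m_2 are the class means and S_W is the summed within-class scatter matrix. The plain-Python sketch below computes w and projects samples onto it. The function names and toy data are illustrative assumptions, not the authors' MATLAB code. It requires S_W to be invertible, which does not hold for 48 features and only 20 samples, so it is meant for a reduced feature subset.

```python
def mean_vector(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(r[k] for r in rows) / n for k in range(len(rows[0]))]

def scatter(rows, m):
    """Scatter matrix: sum over samples of (x - m)(x - m)^T."""
    d = len(m)
    S = [[0.0] * d for _ in range(d)]
    for r in rows:
        diff = [r[k] - m[k] for k in range(d)]
        for a in range(d):
            for b in range(d):
                S[a][b] += diff[a] * diff[b]
    return S

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        if abs(M[col][col]) < 1e-12:
            raise ValueError("within-class scatter matrix is singular")
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fisher_direction(class_a, class_b):
    """Two-class Fisher direction w = S_W^{-1} (m_a - m_b)."""
    ma, mb = mean_vector(class_a), mean_vector(class_b)
    Sa, Sb = scatter(class_a, ma), scatter(class_b, mb)
    d = len(ma)
    Sw = [[Sa[i][j] + Sb[i][j] for j in range(d)] for i in range(d)]
    return solve(Sw, [ma[k] - mb[k] for k in range(d)])

def project(rows, w):
    """1-D Fisher-space coordinate of each sample."""
    return [sum(x * wk for x, wk in zip(r, w)) for r in rows]
```

Projected onto w, the samples of the two classes fall on either side of a threshold when they are linearly separable, which is what makes the Fisher space convenient for visualization.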

Methods

Knowledge database

The selected population consists of 20 labeled single-EEG signals (obtained from the Neurology Department of the University Hospital of Sousse, Tunisia), sampled at F = 200 Hz, segmented into 1-second epochs and filtered to remove artifacts. The signals are divided into two groups: 10 normal signals in the first group and 10 epileptic signals in the second group. These signals are modeled by a set of features to form the training set used in the feature selection process.
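At F = 200 Hz, a 1-second epoch holds 200 samples. A minimal segmentation sketch, assuming non-overlapping epochs and discarding any incomplete trailing epoch (neither choice is specified in the text; the function name is hypothetical):

```python
FS_HZ = 200   # sampling frequency F stated in the text
EPOCH_S = 1   # epoch length in seconds

def segment_epochs(signal, fs=FS_HZ, epoch_s=EPOCH_S):
    """Split a 1-D signal into consecutive, non-overlapping epochs.

    Trailing samples that do not fill a complete epoch are dropped
    (an assumption; the text does not say how partial epochs are handled).
    """
    size = int(fs * epoch_s)          # 200 samples per 1-second epoch
    n_epochs = len(signal) // size
    return [signal[i * size:(i + 1) * size] for i in range(n_epochs)]
```

For example, 5.5 seconds of signal (1100 samples) yields five 200-sample epochs.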

Feature extraction

For feature extraction, we adopted a statistical analysis approach applied to each single-EEG signal. The feature vector consists of 48 features extracted from the time, frequency and time-frequency domains (Table 1):

Analysis domain    Method               Number of features
Time               Min-Max              2
                   Hjorth parameters    3
                   LPC                  4
Frequency          DFTC                 3
                   CC                   4
                   DHTC                 8
Time-Frequency     WC                   16
                   STFTC                8
Total                                   48

Table 1: Analysis domains for feature extraction.

LPC: Linear Predictive Coefficients

DFTC: Discrete Fourier Transform Coefficients

CC: Cepstral Coefficients

DHTC: Discrete Hilbert Transform Coefficients

WC: Wavelet Coefficients

STFTC: Short-Time Fourier Transform Coefficients
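Of the 48 features in Table 1, the Min-Max values and the three Hjorth parameters have simple closed forms: activity is the signal variance, mobility is sqrt(var(x′)/var(x)), and complexity is the mobility of x′ divided by the mobility of x. The sketch below approximates derivatives by first differences and uses the population variance; both are assumptions, as the text does not state how these features were computed, and the function names are hypothetical.

```python
import math

def variance(x):
    """Population variance of a sequence."""
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x) / len(x)

def diff(x):
    """First difference, used here as a discrete derivative."""
    return [b - a for a, b in zip(x, x[1:])]

def min_max(x):
    """The two Min-Max features of an epoch."""
    return [min(x), max(x)]

def hjorth(x):
    """Hjorth activity, mobility and complexity of an epoch."""
    dx = diff(x)
    ddx = diff(dx)
    activity = variance(x)
    mobility = math.sqrt(variance(dx) / activity)
    complexity = math.sqrt(variance(ddx) / variance(dx)) / mobility
    return [activity, mobility, complexity]
```

For a pure sinusoid, complexity is close to 1 and mobility approaches the angular frequency per sample; irregular, spiky activity pushes complexity higher.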

Training dataset

The training dataset is represented as an (n × d) data matrix, defined as:

$$X_{TR} = \left[ x_{i,k} \right], \quad 1 \le i \le n, \; 1 \le k \le d \qquad (1)$$

n: total number of samples; d: dataset dimensionality

x_{i,k}: general term of the training dataset

The signals are manually labeled and ordered into two groups, normal and epileptic, by an expert neurologist.

The “normal” group is defined by the following dataset:

(2)

: Number of samples in the first group

The “Epileptic” group is defined by the following dataset:

(3)

: Number of samples in the second group

Feature selection algorithm

In a classification problem, wrapper feature selection consists of selecting the features that maximize classifier performance, that is, the features capable of discriminating samples that belong to different classes. In this research, classifier performance is evaluated from the confusion matrix, from which the key metrics Accuracy, Sensitivity and Specificity are derived. The feature selection algorithm consists of the following two stages (Figure 2):

-IFE (Individual Feature Evaluation) stage,

-SBS (Sequential Backward Selection) stage.


Figure 2: Block diagram of the relevant feature selection process.
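The three metrics used throughout follow directly from the confusion matrix counts: Accuracy = (TP + TN) / total, Sensitivity = TP / (TP + FN) and Specificity = TN / (TN + FP). A minimal sketch, assuming “Epileptic” is the positive class so that sensitivity is the fraction of epileptic signals detected and specificity the fraction of normal signals correctly recognized (the function names are hypothetical):

```python
def confusion_counts(y_true, y_pred, positive="Epileptic"):
    """Return (TP, FN, TN, FP) for a two-class problem."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, tn, fp

def metrics(y_true, y_pred, positive="Epileptic"):
    """Accuracy, sensitivity and specificity derived from the confusion matrix."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return accuracy, sensitivity, specificity
```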

Individual feature evaluation stage: In the first stage of the algorithm, a wrapper feature selection method is applied using the Individual Feature Evaluation technique: each feature is evaluated on its own, and the features achieving the highest metrics are selected. Two classifiers are evaluated in this process, the LDC (Linear Discriminant Classifier) and the QDC (Quadratic Discriminant Classifier), which yield the following two Relevant Feature Subsets (RFS):

F_RFS-LDC: Relevant Feature Subset corresponding to the highest-ranked LDC metrics

F_RFS-QDC: Relevant Feature Subset corresponding to the highest-ranked QDC metrics

At the output of the first stage, the algorithm compares the highest-ranked LDC metrics with the highest-ranked QDC metrics in order to select the Smallest Best Relevant Feature Subset F_SBRFS.
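Because each feature is evaluated on its own, the LDC and QDC reduce to univariate Gaussian discriminants in the IFE stage: the QDC fits one variance per class, the LDC a single pooled variance. The sketch below scores every feature with 5-fold cross-validation and keeps those with the best (Accuracy, Sensitivity, Specificity) tuple. The stratified fold assignment, the maximum-likelihood variance estimates, the lexicographic ordering of the metrics and all function names are assumptions made for illustration.

```python
import math

def gaussian_params(values):
    """Maximum-likelihood mean and variance."""
    m = sum(values) / len(values)
    return m, sum((x - m) ** 2 for x in values) / len(values)

def fit_1d(xs, ys, quadratic):
    """Per-class [mean, variance, prior]; variances pooled unless quadratic (QDC)."""
    classes = sorted(set(ys))
    params = {}
    for c in classes:
        vals = [x for x, y in zip(xs, ys) if y == c]
        params[c] = [*gaussian_params(vals), len(vals) / len(xs)]
    if not quadratic:  # LDC: one within-class variance shared by all classes
        pooled = sum(params[c][1] * ys.count(c) for c in classes) / len(xs)
        for c in classes:
            params[c][1] = pooled
    return params

def predict_1d(params, x, eps=1e-12):
    """Class with the largest Gaussian log-posterior (up to a constant)."""
    def score(c):
        m, v, prior = params[c]
        v = max(v, eps)  # guard against zero variance
        return -0.5 * math.log(v) - (x - m) ** 2 / (2 * v) + math.log(prior)
    return max(params, key=score)

def stratified_folds(ys, k=5):
    """Fold index per sample, cycling through 0..k-1 within each class."""
    seen, fold_of = {}, []
    for y in ys:
        fold_of.append(seen.get(y, 0) % k)
        seen[y] = seen.get(y, 0) + 1
    return fold_of

def cv_metrics(xs, ys, quadratic, k=5, positive="Epileptic"):
    """Cross-validated (accuracy, sensitivity, specificity) of one feature."""
    fold_of = stratified_folds(ys, k)
    y_pred = [None] * len(ys)
    for f in range(k):
        train = [i for i in range(len(ys)) if fold_of[i] != f]
        params = fit_1d([xs[i] for i in train], [ys[i] for i in train], quadratic)
        for i in range(len(ys)):
            if fold_of[i] == f:
                y_pred[i] = predict_1d(params, xs[i])
    tp = sum(t == positive and p == positive for t, p in zip(ys, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(ys, y_pred))
    n_pos = ys.count(positive)
    return (tp + tn) / len(ys), tp / n_pos, tn / (len(ys) - n_pos)

def individual_feature_evaluation(X, ys, quadratic):
    """1-based indices of the features whose metrics are the best, plus all scores."""
    scores = {k + 1: cv_metrics([row[k] for row in X], ys, quadratic)
              for k in range(len(X[0]))}
    best = max(scores.values())
    return [k for k, s in scores.items() if s == best], scores
```

Calling `individual_feature_evaluation(X, ys, quadratic=False)` and then `quadratic=True` yields the two candidate subsets corresponding to F_RFS-LDC and F_RFS-QDC.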

Sequential backward selection stage: In the second stage, a wrapper feature subset selection method, Sequential Backward Selection, is used to reduce the dimensionality of F_SBRFS. It sequentially removes features from the F_SBRFS set until removing any further feature would decrease the classification metrics. The feature subsets yielding the highest metrics are selected, providing two Smallest Best Relevant Feature Subsets (SBRFS):

F_SBRFS-LDC: Smallest Best Relevant Feature Subset using the LDC classifier

F_SBRFS-QDC: Smallest Best Relevant Feature Subset using the QDC classifier

The second stage outputs the Smallest Relevant Feature Subset F_SRFS, obtained by comparing the metrics and sizes of F_SBRFS-LDC and F_SBRFS-QDC and keeping the smallest subset with the best metrics.
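The SBS stage can be sketched as a greedy loop that tries every single-feature removal and keeps the best one as long as the score does not drop. In this sketch, `score` is a hypothetical callable standing in for the cross-validated LDC or QDC metrics (any value that compares correctly, such as an (accuracy, sensitivity, specificity) tuple); ties are broken by the order in which features are tried.

```python
def sequential_backward_selection(features, score):
    """Greedy SBS: remove one feature at a time while the score does not decrease.

    Stops when every single-feature removal would lower the score,
    or when only one feature is left.
    """
    current = tuple(features)
    best_score = score(current)
    while len(current) > 1:
        candidates = []
        for f in current:
            subset = tuple(g for g in current if g != f)
            candidates.append((score(subset), subset))
        # compare on the score only, so subsets themselves are never compared
        cand_score, cand_subset = max(candidates, key=lambda c: c[0])
        if cand_score < best_score:
            break
        current, best_score = cand_subset, cand_score
    return current, best_score
```

For instance, with a hypothetical score that equals 1.0 whenever feature 16 is kept and 0.5 otherwise, `sequential_backward_selection([16, 19, 20, 21], score)` returns `((16,), 1.0)`.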

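Mahalanobis distance classifier (MDC): The Mahalanobis Distance Classifier computes the distance d(x_unk, m_k) between the unknown EEG feature vector and the mean of each of the two classes, “Normal” and “Epileptic”, and assigns the signal to the class with the smaller distance. The distance is computed as follows: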

$$d(x_{unk}, m_k) = \sqrt{(x_{unk} - m_k)^{\top} \, T^{-1} \, (x_{unk} - m_k)} \qquad (4)$$

x_unk: unknown feature vector;

m_k: mean of the kth class;

T: covariance matrix of the learning dataset X_TR
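A minimal sketch of the MDC, assuming the standard Mahalanobis form of Equation (4) and taking T as the sample covariance (normalized by n − 1) of the whole training set X_TR, as the definition of T suggests. The unknown vector is assigned to the class whose mean is nearest; all function names are hypothetical.

```python
import math

def mean_vector(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(r[k] for r in rows) / n for k in range(len(rows[0]))]

def covariance(rows):
    """Sample covariance matrix (normalized by n - 1)."""
    m = mean_vector(rows)
    n, d = len(rows), len(m)
    return [[sum((r[a] - m[a]) * (r[b] - m[b]) for r in rows) / (n - 1)
             for b in range(d)] for a in range(d)]

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        if abs(M[col][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z

def mahalanobis(x, m, T):
    """sqrt((x - m)^T T^{-1} (x - m)), computed without forming T^{-1}."""
    diff = [xi - mi for xi, mi in zip(x, m)]
    z = solve(T, diff)
    return math.sqrt(sum(di * zi for di, zi in zip(diff, z)))

def classify(x_unk, class_rows):
    """Assign x_unk to the class whose mean is nearest in Mahalanobis distance."""
    all_rows = [r for rows in class_rows.values() for r in rows]
    T = covariance(all_rows)  # covariance of the whole training set X_TR
    dists = {label: mahalanobis(x_unk, mean_vector(rows), T)
             for label, rows in class_rows.items()}
    return min(dists, key=dists.get), dists
```

With a single selected feature, T is a 1 × 1 matrix and the distance reduces to |x_unk − m_k| divided by the training-set standard deviation.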

Results and Discussion

First-stage experimental results

In the first part of the Individual Feature Evaluation (IFE) stage, a 5-fold cross-validation procedure is used with the LDC classifier to estimate the metrics (Accuracy, Sensitivity and Specificity) of each feature (Figure 3). The algorithm retains only the features with the highest metrics (Table 2 and Figure 3).

Top 4 feature indices 16 19 20 21
Accuracy 100% 100% 100% 100%
Sensitivity 100% 100% 100% 100%
Specificity 100% 100% 100% 100%

Table 2: Top 4 feature indices using LDC-classifier.


Figure 3: Metrics of individual features using LDC classifier.

The feature subset deduced from the first stage using the LDC classifier is therefore:

F_RFS-LDC = {f16, f19, f20, f21}

In the second part of the IFE stage, a 5-fold cross-validation procedure is applied with the QDC classifier to estimate the metrics of each feature (Figure 4), and the algorithm selects only the features with the highest metrics (Table 3).

Top 6 feature indices 16 19 20 21 28 42
Accuracy 100% 100% 100% 100% 100% 100%
Sensitivity 100% 100% 100% 100% 100% 100%
Specificity 100% 100% 100% 100% 100% 100%

Table 3: Top 6 feature indices using QDC-classifier.


Figure 4: Metrics of individual features using QDC classifier.

The feature subset deduced from the first stage using the QDC classifier is therefore:

F_RFS-QDC = {f16, f19, f20, f21, f28, f42}

At the end of the first stage, the Smallest Best Relevant Feature Subset (SBRFS) is selected by comparing the metrics and sizes of the F_RFS-LDC and F_RFS-QDC subsets. Since both subsets reach 100% on all three metrics and F_RFS-LDC is the smaller one, the SBRFS is:

F_SBRFS = F_RFS-LDC = {f16, f19, f20, f21}
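The selection rule used at the end of each stage (best metrics first, then smallest size) can be written as a small helper. The candidate names and the lexicographic comparison of the (Accuracy, Sensitivity, Specificity) tuples are assumptions; the example subsets are those of Tables 2 and 3.

```python
def smallest_best(candidates):
    """Pick the subset with the best metrics, breaking ties by smallest size.

    candidates: dict mapping a name to (metrics_tuple, feature_subset).
    """
    best_metrics = max(m for m, _ in candidates.values())
    tied = {name: subset for name, (m, subset) in candidates.items()
            if m == best_metrics}
    name = min(tied, key=lambda k: len(tied[k]))
    return name, tied[name]

# Subsets from Tables 2 and 3 (all metrics 100%)
stage_one = {
    "LDC": ((1.0, 1.0, 1.0), [16, 19, 20, 21]),
    "QDC": ((1.0, 1.0, 1.0), [16, 19, 20, 21, 28, 42]),
}
```

Here `smallest_best(stage_one)` returns `("LDC", [16, 19, 20, 21])`, since both subsets tie on the metrics and the LDC subset is smaller.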

Second-stage experimental results

To reduce the dimensionality of F_SBRFS, we used the SBS (Sequential Backward Selection) search method, which starts with all features and removes a single feature at each step until the desired dimension with the highest metrics is reached. At each step, 5-fold cross-validation is applied to evaluate the candidate feature subsets. In the first part of the second stage, the experimental results using the LDC classifier show that the SBRFS (Smallest Best Relevant Feature Subset) consists of the 16th feature:

F_SBRFS-LDC = {f16}

In the second part of the second stage, the experimental results using the QDC classifier show that the SBRFS (Smallest Best Relevant Feature Subset) also consists of the 16th feature:

F_SBRFS-QDC = {f16}

The output of the second stage is the smallest relevant feature subset, selected by comparing both the metrics and the size of F_SBRFS-LDC and F_SBRFS-QDC. Since both subsets contain only the 16th feature, the final SRFS (Smallest Relevant Feature Subset) is:

F_SRFS = {f16}

The final experimental result of the two-stage feature selection algorithm is summarized in Figure 5.


Figure 5: Experimental results of relevant feature selection process.

Combining these two techniques (IFE and SBS) reduces the dimensionality of the original feature set to a single best relevant feature, which is then used for epilepsy diagnosis.

The diagnosis was successfully tested (Figure 6) on EEG signals containing spikes and spike-waves. Figure 6 shows an example of the automatic assignment, using the Mahalanobis distance classifier, of an EEG signal containing two spike-waves to the “Epileptic” class. This diagnosis uses only one feature, the 16th feature selected from the dataset. Index 16 corresponds to the maximum DHTC magnitude of the EEG signal, defined as max(|DHT(S(n))|).
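Feature 16 is max(|DHT(S(n))|). The sketch below assumes that DHT(S(n)) is the discrete Hilbert transform obtained as the imaginary part of the analytic signal computed through the DFT, which is the same construction used by SciPy's `scipy.signal.hilbert` (that function returns the analytic signal). If the authors instead took the magnitude of the analytic signal (the envelope), `abs(z)` would replace `z.imag`. A naive O(N²) DFT keeps the block dependency-free; for a 200-sample epoch this is fast enough. The function names are hypothetical.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Naive inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def discrete_hilbert(x):
    """Hilbert transform as the imaginary part of the analytic signal."""
    n = len(x)
    h = [0.0] * n          # spectral weights: keep DC, double positive freqs
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0    # Nyquist bin kept once
        for k in range(1, n // 2):
            h[k] = 2.0
    else:
        for k in range(1, (n + 1) // 2):
            h[k] = 2.0
    analytic = idft([Xk * hk for Xk, hk in zip(dft(x), h)])
    return [z.imag for z in analytic]

def feature_16(epoch):
    """max(|DHT(S(n))|) over one epoch (assumed definition)."""
    return max(abs(v) for v in discrete_hilbert(epoch))
```

As a sanity check, the Hilbert transform of a cosine sampled over whole periods is the corresponding sine, so a unit-amplitude 5 Hz cosine epoch at 200 Hz gives a feature value of 1.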


Figure 6: Automated diagnosis of the single-EEG signal.

Literature Review

Table 4 presents a comparative study of IED classification metrics reported in recent years, regardless of the number of features used. Our feature selection algorithm improves the classification metrics for both the LDC and QDC classifiers using the single best relevant feature selected (Table 4).

Classifier          Accuracy   Sensitivity   Specificity
AdaBoost            93.9%      95.5%         92.4%
NN                  99%        n/a           n/a
LDC                 n/a        82%           90%
QDC                 n/a        87%           92%
LDC (this work)     100%       100%          100%
QDC (this work)     100%       100%          100%

Table 4: Literature review of some classification metrics.

Conclusion

A two-stage feature selection algorithm has been proposed in this article in order to remove redundancy and reduce the dimensionality of the dataset to a relevant feature subset. The mRMR (Minimum-Redundancy Maximum-Relevance) approach was successfully confirmed and tested in the first stage of the algorithm using the IFE method, and the dimensionality of the selected relevant feature subset was successfully reduced in the second stage using the SBS method to a single best relevant feature, which may considerably reduce the processing time of the diagnosis. The results could be improved by using other robust dataset features and other classifier types for validation, such as ANN (Artificial Neural Network), SVM (Support Vector Machine) and GA (Genetic Algorithm) methods. With automated IED diagnosis, the physician will no longer need to visually scan EEG signal leads.

References
