Received date: September 28, 2013; Accepted date: November 18, 2013; Published date: December 26, 2013
Citation: Xu B, Fu Y, Shi G, Yin X, Wang Z, et al. (2014) Improving Classification by Feature Discretization and Optimization for fNIRS-based BCI. J Biomim Biomater Tissue Eng 19:119. doi:10.4172/1662-100X.1000119
Copyright: © 2013 Xu B, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
In this paper, we present a signal discretization and feature selection method that improves classification accuracy for fNIRS-based brain computer interface (BCI) systems. The method classifies right hand clench force motor imagery and clench speed motor imagery at an accuracy of 69%-81% through 5-fold cross validation in 6 subjects. The difference between oxyhemoglobin and deoxyhemoglobin (abbreviated as HbD) is proposed as a new feature type and shows outstanding performance in some subjects. Linear kernel support vector machine (SVM) classification between clench force motor imagery and clench speed motor imagery is implemented using four concentration feature types (oxyhemoglobin, deoxyhemoglobin, total hemoglobin, and HbD). Our results demonstrate that feature discretization using the Chi2 algorithm and feature optimization using the ‘MIFS’ (Mutual Information Feature Selection) criterion can improve the classification accuracy by more than 35%. Except for total hemoglobin, each of the other three feature types is the optimum feature for at least one subject. The results of this paper can also be used in online BCI applications.
Brain computer interface; BCI; fNIRS; Feature discretization; Feature selection
Brain computer interface (BCI) is a technology that provides device control using brain signals directly, without the participation of the peripheral nervous system and muscles. It was first funded by the Defense Advanced Research Projects Agency (DARPA) in the 1970s and has developed rapidly since the 1990s [2-4]. Many signals can be used to build a BCI system, including electroencephalography (EEG), electrocorticography (ECoG), magnetoencephalography (MEG), functional magnetic resonance imaging (fMRI) and functional near infrared spectroscopy (fNIRS). At present, EEG is the most widely used non-invasive signal because of its high temporal resolution, portability and low cost. fNIRS is another promising modality for BCI purposes. This technology provides metabolic information during cognitive processes, and can be used simultaneously with EEG to achieve higher classification accuracy for BCI systems.
fNIRS has been used in BCI research since 2004, when it was introduced by Coyle, and BCI researchers have focused on this modality increasingly in the following years [8-10]. fNIRS obtains concentration changes of deoxyhemoglobin (Hb) and oxyhemoglobin (HbO), calculated by the modified Beer-Lambert law from the detected optical signals. Both concentration signals and optical signals can be used as features in BCI research. Our previous research shows that concentration signals give better results than optical signals. Early studies often use the amplitude of concentration signals as features directly [7,13]. Cui et al. compare the classification accuracy of different feature spaces, including signal amplitude, signal history, history gradient and spatial pattern. Their research indicates that combining multi-channel signal history gives the best results. The slope of the concentration signal can also be used as the feature for classification with desirable results. Classification accuracy can be further improved by using both ‘filter’ and ‘wrapper’ feature selection methods. Herff et al. take the mean hemodynamic concentration difference between task period and rest period as features and use Mutual Information based Best Individual Feature (MIBIF) to reduce the feature space, obtaining a classification accuracy of 61% between three speaking modes using support vector machines (SVM). Power et al. take the slope of Hb and HbO concentration as features and use a sequential feed-forward feature selection algorithm with the Fisher criterion to choose the optimal feature subset, obtaining a classification accuracy of 71.1% between the Mental Arithmetic (MA) task and the No Control (NC) state using Linear Discriminant Analysis (LDA).
In this paper, we apply a motor imagery paradigm in the experiment. A new signal type, the difference between HbO and Hb (HbD), is proposed as a feature. The continuous feature is discretized first, then Mutual Information Feature Selection (MIFS) is used to obtain the optimal feature subset, and a support vector machine (SVM) with a linear kernel is used to distinguish clench speed motor imagery from clench force motor imagery of the same hand. The comparison of classification accuracy between the signal types Hb, HbO, HbT (total hemoglobin) and HbD shows that the best signal type depends on the individual. For some subjects HbD gives results comparable with HbO, and for some subjects HbD gives better results than all the other three feature types.
In our experiment, motor imagery of clench force and clench speed of the right hand is used as the paradigm. We measure each subject's Maximum clench Force (MF) using hand dynamometers, and tell the subjects to practice real clench force at the levels of 20%, 50% and 80% of MF. The clench speed practice requires the subjects to clench their right hand at frequencies of 0.5 Hz, 1 Hz and 2 Hz according to a metronome. We divide both tasks into three sub-types because we want to average out the effect of different task parameters. During the experiment, the subjects are told to recall the feeling of performing the real movement task. We adopt this motor parameter imagery paradigm based on two considerations. First, motor imagery is more natural for device control than paradigms using mental arithmetic and singing. Second, this paradigm can provide more direct control commands than paradigms that do not distinguish between hand motor imagery types, and such discrimination is especially important for improving the information transfer rate of BCI applications if both hands are used.
Six healthy subjects (three males and three females, average age 26.8 years) participate in the experiment. All the subjects are right handed. Three of them are trained three times before the experiment (subjects 1, 2, and 6), while the other three subjects are not trained (subjects 3, 4, and 5). All of them give written informed consent to participate in the experiment, and the experiment is approved by the Ethical Committee of the Shenyang Institute of Automation (SIA), Chinese Academy of Sciences (CAS).
A single trial of the experiment consists of four parts: the baseline period, the cue period, the task period and the rest period, as shown in Figure 1. The baseline period is used to produce a baseline for the task period. The cue period informs the subject of the motor imagery type to perform in the following task period, which lasts 10 seconds. The rest period allows the hemodynamic level to return to a normal level. The duration of a single trial in an fNIRS experiment is much longer than in an EEG experiment due to the intrinsic time lag of hemodynamic signals. Each subject takes part in 3 sessions, and each session contains 60 trials (30 trials of clench force motor imagery and 30 trials of clench speed motor imagery).
We acquire the hemodynamic signals using an ETG-4000 system produced by Hitachi Co., Ltd. This device is a continuous-wave instrument with two types of optode, illuminators and detectors, as shown in Figure 2. The midpoint between an illuminator and a detector is the measurement channel. Each illuminator emits near-infrared light at wavelengths of 695 nm and 830 nm simultaneously. The detector receives the light that has been modulated by the oxyhemoglobin and deoxyhemoglobin concentration changes and transmits it to an avalanche photodiode (APD), which converts optical changes to voltage changes. The voltage changes are digitized by an analog-to-digital converter (ADC) and then used to calculate the concentration changes by the modified Beer-Lambert law as shown in Eq. (1), where ε is the extinction coefficient of Hb/HbO at the corresponding wavelength, I is the light intensity, and DPF (differential pathlength factor) is the ratio of the photons' actual optical path length to the illuminator-detector distance. To cover the sensorimotor area of both hemispheres, we choose two 3×3 optode layout helmets, which contain 24 channels altogether. Channel 7 and channel 21 are positioned directly above the C4 and C3 positions of the standard international 10-20 system used for EEG recordings. The sampling frequency is 10 Hz, which is sufficient for an fNIRS-based BCI system.
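For orientation, a commonly used textbook form of the modified Beer-Lambert law is sketched below. It is not necessarily the exact expression the authors give as Eq. (1). The symbols ΔOD (change in optical density), I_0 (reference light intensity) and d (illuminator-detector distance) are assumptions introduced here for illustration.

```latex
% Textbook form of the modified Beer-Lambert law (illustrative, not the authors' Eq. (1))
\Delta OD(\lambda) = -\ln\frac{I(\lambda)}{I_0(\lambda)}
  = \left[\varepsilon_{HbO}(\lambda)\,\Delta C_{HbO}
        + \varepsilon_{Hb}(\lambda)\,\Delta C_{Hb}\right] d \cdot \mathrm{DPF}(\lambda),
  \qquad \lambda \in \{695\ \mathrm{nm},\ 830\ \mathrm{nm}\}
```

Written at both wavelengths, this gives two linear equations in the two unknowns ΔC_HbO and ΔC_Hb, which is why measurements at two wavelengths are enough to recover both concentration changes.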
In our experiment, the EEG and fNIRS data are acquired simultaneously because we want to study multimodal features for BCI applications. However, in this paper we focus only on the fNIRS signal, and the combined features will be studied in the future.
The original concentration data is first linearly detrended to remove low frequency drift due to optode movements, and then filtered by a Chebyshev II low pass IIR filter with a cutoff frequency of 0.1 Hz to remove high frequency artifacts such as heart rate and breathing rate. This is reasonable because our previous research shows that the frequency component between 0.02 Hz and 0.08 Hz plays an important role in data classification. The time course average of the four different preprocessed fNIRS signals is shown in Figure 3. The data is then downsampled to 1 Hz to reduce the feature space. According to Shannon's sampling theorem, the downsampling loses essentially no information in the low-passed data, because the 0.5 Hz Nyquist frequency of the 1 Hz rate remains well above the 0.1 Hz cutoff.
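The detrending and downsampling steps can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, the detrend is an ordinary least-squares line fit, and the Chebyshev II low pass filter is deliberately omitted because it would normally come from a signal-processing library.

```python
def linear_detrend(signal):
    """Subtract the least-squares straight line from a 1-D sequence (needs >= 2 samples)."""
    n = len(signal)
    t_mean = (n - 1) / 2
    y_mean = sum(signal) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(signal))
    slope = sxy / sxx
    return [y - (y_mean + slope * (t - t_mean)) for t, y in enumerate(signal)]


def downsample(signal, factor):
    """Keep every `factor`-th sample; only valid after low-pass filtering."""
    return signal[::factor]


# Hypothetical 10 s recording at 10 Hz containing only a linear drift.
drift = [0.5 * t + 2.0 for t in range(100)]
detrended = linear_detrend(drift)
one_hz = downsample(detrended, 10)  # 10 Hz -> 1 Hz
```

A pure drift is reduced to approximately zero by the detrend, and the factor of 10 corresponds to going from the 10 Hz acquisition rate to 1 Hz.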
Discretization is the process of partitioning continuous variables into categories. It helps many machine learning (ML) algorithms achieve better classification results. Based on whether the class information is taken into account, discretization methods can be classified into supervised and unsupervised methods. Generally, the supervised methods outperform the unsupervised methods. Equal width and equal frequency discretization are two representative unsupervised algorithms, but their performance is relatively poor. In this paper, we choose the Chi2 algorithm to discretize the continuous concentration signal. In the Chi2 algorithm, the significance of the relationship between the values of a feature and the class type is tested for each pair of adjacent intervals using the χ2 statistic as shown in Eq. (2), $\chi^2 = \sum_{i=1}^{2}\sum_{j=1}^{k} (A_{ij} - E_{ij})^2 / E_{ij}$, where k is the number of classes, $A_{ij}$ is the number of patterns in the ith interval and jth class, $R_i$ is the number of patterns in the ith interval, $C_j$ is the number of patterns in the jth class, N is the total number of patterns, and $E_{ij} = R_i C_j / N$ is the expected frequency of $A_{ij}$.
Currently, three types of concentration signal are used in fNIRS-based BCI applications: Hb, HbO and HbT. In this paper, we propose a new concentration signal, HbD, which is the difference between the HbO and Hb concentrations. HbO and Hb are typically strongly negatively correlated with each other under normal circumstances, and we propose that HbD may amplify the concentration changes caused by cognitive tasks.
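A small sketch of why the difference signal may amplify task-related changes when HbO and Hb move in opposite directions. The sample values are invented purely for illustration and the function names are hypothetical; HbT is taken as the sum of HbO and Hb.

```python
def hbd(hbo, hb):
    """HbD: sample-wise difference between HbO and Hb."""
    return [o - d for o, d in zip(hbo, hb)]


def hbt(hbo, hb):
    """HbT: sample-wise sum of HbO and Hb (total hemoglobin)."""
    return [o + d for o, d in zip(hbo, hb)]


# Invented, negatively correlated concentration changes (arbitrary units).
hbo_change = [0.0, 0.2, 0.4]
hb_change = [0.0, -0.1, -0.2]

hbd_change = hbd(hbo_change, hb_change)  # opposite-signed changes add up
hbt_change = hbt(hbo_change, hb_change)  # opposite-signed changes partly cancel
```

With these invented values the final HbD change is about 0.6 while the final HbT change is about 0.2, which illustrates how anticorrelation enlarges the difference signal and shrinks the sum.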
The original feature space consists of the downsampled amplitude data of 24 channels from the task start point to 2 s after the task end point, resulting in a feature set of 312 values (13 samples per channel). The 2 s extension accounts for the intrinsic time lag of the hemodynamic response. The four signal types (Hb, HbO, HbT, and HbD) are studied separately.
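The construction of the 312-value feature vector can be sketched as follows. The names are hypothetical, and the window is assumed to include both endpoints, which is the reading that yields 13 samples per channel at 1 Hz (24 × 13 = 312).

```python
N_CHANNELS = 24
TASK_SECONDS = 10
LAG_SECONDS = 2
SAMPLE_RATE_HZ = 1


def build_feature_vector(trial, task_start):
    """Concatenate the task window of every channel into one feature vector.

    trial[ch] is the 1 Hz series of channel ch; task_start is the sample
    index of the task onset (assumed layout, for illustration only).
    """
    # Both window endpoints are kept: (10 + 2) * 1 + 1 = 13 samples.
    n_samples = (TASK_SECONDS + LAG_SECONDS) * SAMPLE_RATE_HZ + 1
    features = []
    for channel in trial:
        features.extend(channel[task_start:task_start + n_samples])
    return features


# Synthetic trial: 24 channels, 30 samples each, values encode channel and time.
trial = [[float(ch * 100 + t) for t in range(30)] for ch in range(N_CHANNELS)]
feature_vector = build_feature_vector(trial, task_start=5)
```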
Feature selection techniques are broadly grouped into ‘wrapper’ methods and ‘filter’ methods. The former take the training accuracy of a particular classifier as the criterion when searching for the best feature subsets. This approach can give better results, but the subsets are overly fitted to the classifier used, and the computational cost is considerably higher. To make the optimized subset suitable for more classification methods, we apply the ‘filter’ method in our research.
Eight types of information-based criteria, including ‘MIFS’, ‘mRMR’, ‘CMIM’, ‘JMI’, ‘DISR’, ‘CIFE’, ‘ICAP’, and ‘CondRed’, are tested using one subject’s data to choose the best criterion for our research. The ‘MIFS’ (Mutual Information Feature Selection) criterion is chosen for the following research because its performance is the best among all the tested criteria. The details of the feature selection method using the ‘MIFS’ criterion can be found in .
Support vector machine (SVM) is a widely used and efficient method in machine learning research, especially for classification with small data sets [26,27]. Unlike other classification methods that use all the sample data to train the model, SVM relies only on the samples (the support vectors) that produce the largest margin between two classes. In addition, SVM maps the input vectors into a high dimensional feature space using a kernel function, so that problems that are not linearly separable in the low dimensional space can become separable in the high dimensional feature space. SVM is trained by solving the optimization problem in Equation (3).
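A greedy selection loop under the ‘MIFS’ criterion can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: the criterion is taken in its usual form, relevance I(f; class) minus β times the summed mutual information with already selected features, with β = 1 assumed, and the function names and toy data are hypothetical. Features must already be discrete, which is where the preceding discretization step fits in.

```python
from collections import Counter
from math import log


def mutual_information(x, y):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    return sum((c / n) * log(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())


def mifs_select(features, labels, n_select, beta=1.0):
    """Greedy MIFS over a list of discrete feature columns; returns indices."""
    relevance = [mutual_information(f, labels) for f in features]
    selected = []
    remaining = list(range(len(features)))
    while remaining and len(selected) < n_select:
        best, best_score = None, None
        for idx in remaining:
            # Redundancy with the features chosen so far.
            redundancy = sum(mutual_information(features[idx], features[s])
                             for s in selected)
            score = relevance[idx] - beta * redundancy
            if best_score is None or score > best_score:
                best, best_score = idx, score
        selected.append(best)
        remaining.remove(best)
    return selected


labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = [
    [0, 0, 0, 1, 1, 1, 1, 1],  # informative feature
    [0, 0, 0, 1, 1, 1, 1, 1],  # exact duplicate of the first column
    [0, 1, 0, 0, 1, 1, 0, 1],  # weaker, but much less redundant
]
chosen = mifs_select(features, labels, 2)
```

In this toy example the duplicate column is penalized for its redundancy with the first pick, so the weaker but less redundant third column is chosen second.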
Here xi are the training vectors, ϕ() is the feature mapping induced by the kernel function, w is the weight vector, b is the bias, ξ are the slack variables of the soft margin and C is a penalty constant. The linear kernel is used in our research because it requires fewer parameters to optimize, and can achieve much higher classification accuracy than other kernels when their parameters have not been optimized. 5-fold cross validation is used to make the results more credible.
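For reference, the standard soft-margin SVM primal problem that matches the symbols described above is sketched below. It is the usual textbook form and may differ in notation from the authors' Equation (3). The class labels y_i ∈ {−1, +1} and the number of training samples l are assumptions introduced here.

```latex
% Standard soft-margin SVM primal (illustrative form of Equation (3))
\min_{w,\,b,\,\xi}\quad \frac{1}{2}\, w^{T} w + C \sum_{i=1}^{l} \xi_i
\qquad \text{subject to}\quad
y_i\left(w^{T}\phi(x_i) + b\right) \ge 1 - \xi_i,\quad \xi_i \ge 0,\quad i = 1,\dots,l
```

For the linear kernel, ϕ(x) = x, so C is the only hyperparameter left to tune.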
To demonstrate the effectiveness of feature discretization and feature selection, we first compare the classification accuracy of four different feature sets. The first feature set consists of the continuous data from the channel above C3. The second feature set consists of the discrete data from the C3 channel. The third feature set consists of the discrete data from all 24 channels, and the fourth feature set is the optimum subset chosen from the third feature set using the ‘MIFS’ criterion. The comparison of these four feature sets is shown in Figure 4 (left). The error rate of the continuous C3 feature set decreases by 9%, 5%, 2%, and 6% when the data is discretized for the signal types Hb, HbO, HbT and HbD respectively, which implies that feature discretization can improve the classification accuracy. The error rate of the feature set containing discrete data from all 24 channels reaches nearly 60%, and decreases to 25% when the optimum subset is chosen. The relationship between the error rate and the number of features chosen by the ‘MIFS’ feature selection algorithm is shown in Figure 4 (right). The optimum feature number for most of the subjects is between 20 and 50.
The classification error rate using the optimized feature subset of the four types of concentration signal is shown in Table 1 and Figure 5 (left). As with EEG, the characteristics of the fNIRS signal vary between subjects. For subject 1, the mean error rate of the Hb feature type over three sessions is lower than that of the other three feature types, but the difference is not significant. A significant difference is observed in only 3 subjects: for subject 2, the HbD feature type has a lower error rate than the Hb and HbT feature types at significance levels of 0.01 and 0.05, respectively; for subject 3, the Hb feature type has a lower error rate than the HbO feature type at a significance level of 0.05; for subject 6, the HbO feature type has a lower error rate than the HbT feature type at a significance level of 0.05. The HbT feature type shows no significantly lower error rate than the other three feature types in any subject.
|Signal type|Subject 1|Subject 2|Subject 3|Subject 4|Subject 5|Subject 6|
Table 1: The classification error rate using the optimized feature subset of four types of concentration signal.
The trained subjects have a lower mean error rate than the untrained subjects, as shown in Figure 5 (right). Although the difference for the Hb and HbT feature types is not significant, the error rate of the trained subjects is lower than that of the untrained subjects at a significance level of 0.01 for both the HbO and HbD feature types.
In this paper, we present a signal discretization and feature selection method that improves classification accuracy for fNIRS-based BCI systems and classifies right hand clench force motor imagery and clench speed motor imagery at an accuracy of 69%-81% through 5-fold cross validation. HbD (the difference between HbO and Hb) is proposed as a new feature type and shows outstanding performance in some subjects. Although feature discretization has been used in many machine learning areas, to our knowledge it has not been used in BCI applications before. Our research shows that discretization using the Chi2 algorithm can improve classification accuracy by 2%-9% for different feature types using the signal from the channel above C3. Furthermore, the optimum feature subset chosen by the ‘MIFS’ criterion can improve the classification accuracy by 35% compared with the feature set containing all the discrete channels (error rate reduced from nearly 60% to 25%).
Because fNIRS signals are subject specific, it is hard to conclude which concentration signal type is best for all subjects. Our research shows that the best signal type depends on the subject. In our study, Hb, HbO and HbD each give the optimum results for different subjects, but HbT is not the best-performing type for any subject. This may be explained by the strong negative correlation between Hb and HbO. Our study also shows that trained subjects perform better with the HbO and HbD feature types at a significance level of 0.01, which suggests that sufficient training before the experiment is necessary to improve the classification accuracy.
Our research shows that classification of clench force motor imagery and clench speed motor imagery of the same hand is possible at a reasonable accuracy, which can be used to increase the number of commands for BCI control. The feature discretization and feature optimization method of this paper can also be used in online BCI applications.