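| Data Set (PC % of Total Variation for MM) | # Features | # of Folds¹ | Size of Data Set: Positive | Size of Data Set: Negative | Testing Size: Positive | Testing Size: Negative | Current Method (LVQ-SMOTE): TN (%) | Current Method (LVQ-SMOTE): TP (%) | Mean Method (MM): TN (%) | Mean Method (MM): TP (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| Satimage (39.1%) | 36 | One | 3397 | 3397 | 1530 | 1530 | | | 878 (57.4) | 1495 (97.7) |
| | 36 | | 3397 | 3397 | 1530 | 1530 | 1159 (75.8) | 1159 (75.8) | | |
| Breast (65%) | (9)² | 10 | 239 | 444 | | | | | 436 (98.2) | 229 (95.8) |
| | (9)² | None | 239 | 444 | | | | | 434 (97.7) | 225 (94.1) |
| | 10 | 10 | 444 | 444 | | | 301 (67.8) | 409 (92.1) | | |
| Blood (36.1%) | (3)³ | None | 178 | 570 | | | | | 343 (60.2) | 136 (76.4) |
| | 4 | 10 | 570 | 570 | | | 103 (18.1) | 565 (99.1) | | |
| Yeast (26.2%) | 8 | 10 | 1433 | 1433 | | | | | 1150 (80.3) | 1164 (81.2) |
| | 8 | None | 1433 | 1433 | | | | | 1200 (83.7) | 1125 (78.5) |
| | 8 | 10 | 1433 | 1433 | | | 1301 (90.8) | 329 (23.0) | | |
| Colon Cancer (5.2%) | (1788)⁴ | None | 40 | 22 | | | | | 20 (90.9) | 34 (85.0) |
| | 2000 | | 40 | 22 | | | | | 18 (81.8) | 30 (75.0) |
| | 2000 | 10 | 40 | 40 | | | 29 (72.5) | Unknown⁴ | | |
| Leukemia (4.5%) | (2000)⁴ | One | 11 | 27 | 14 | 20 | | | 16 (80.0) | 5 (35.7) |
| | 7129 | | 11 | 27 | 14 | 20 | | | 14 (70.0) | 7 (50.0) |
| | 7129 | | 11 | 27 | 14 | 20 | 20 (100.0) | Unknown⁴ | | |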
Table 1: Results for LVQ-SMOTE and MM on six data sets.
¹ The results are based on 10-fold, 1-fold (“One”), or no (“None”) cross-validation. “None” applies to MM only.
² For this data set there were only three true (linearly independent) features, i.e., Xtr has rank 3.
³ The number of features was reduced using the FR technique.
⁴ TP for LVQ-SMOTE could not be determined from the statistics reported in Nakamura et al.
⁵ These numbers are the class sizes of the training data sets.
⁶ For LVQ-SMOTE, the balanced data sets were obtained by over-sampling, except for Colon Cancer.
⁷ Over-sampling was not used for LVQ-SMOTE because results were not possible with it.
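The percentages in Table 1 are consistent with a simple reading of the counts: TN (%) is the number of true negatives divided by the number of negative cases (the testing size when one is listed, otherwise the data set size), and TP (%) is the number of true positives divided by the number of positive cases. The Python sketch below recomputes a few entries under that assumption. The helper name `rate_percent` is hypothetical; this is only a consistency check on the reported numbers, not part of either method.

```python
# Hypothetical consistency check for Table 1 (not part of LVQ-SMOTE or MM).
# Assumption: TN% = TN / negatives and TP% = TP / positives, where the class
# sizes are the testing sizes when listed, otherwise the data set sizes.

def rate_percent(correct: int, class_size: int) -> float:
    """Return correct / class_size as a percentage rounded to one decimal."""
    if class_size <= 0:
        raise ValueError("class_size must be positive")
    return round(100.0 * correct / class_size, 1)


# (row label, TN, negatives, TP, positives), counts copied from Table 1.
rows = [
    ("Breast, MM, 10 folds", 436, 444, 229, 239),
    ("Blood, MM, None", 343, 570, 136, 178),
    ("Yeast, LVQ-SMOTE, 10 folds", 1301, 1433, 329, 1433),
    ("Leukemia, MM, One (testing set)", 16, 20, 5, 14),
]

for label, tn, neg, tp, pos in rows:
    print(f"{label}: TN {tn} ({rate_percent(tn, neg)}%), "
          f"TP {tp} ({rate_percent(tp, pos)}%)")
```

Each printed percentage matches the value in parentheses in the corresponding table cell, which supports the assumed definitions of TN (%) and TP (%).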