DECISION TREE LEARNING WITH ERROR-CORRECTED INTERVAL VALUES OF NUMERICAL ATTRIBUTES IN TRAINING DATA SETS
Classification is one of the most important techniques in data mining, and the decision tree is among the most widely used classifiers in machine learning and data mining. Data measurement errors are common in any data collection process, and in many training data sets the values of numerical attributes are affected by such errors. We extend classical (certain) decision tree building algorithms to handle training data sets whose numerical attributes contain measurement errors. We have found that the classification accuracy of a classical decision tree classifier can be substantially improved if the measurement errors in the values of numerical (continuous) attributes are properly corrected. The present study proposes a new algorithm for decision tree construction, named the Interval Decision Tree (IDT) classifier. An interval is constructed around each value of each numerical attribute in the training data set; within that interval, the best error-corrected value is approximated, and the entropy is then calculated. Extensive experiments show that the resulting IDT classifiers are more accurate and efficient than classical decision tree classifiers.
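The interval idea sketched in the abstract can be illustrated with a minimal example. The sketch below assumes a symmetric measurement error, so each observed value v is replaced by the interval [v - err, v + err]; a sample whose interval straddles a candidate split threshold is placed on whichever side minimises the weighted entropy, approximating the "best error-corrected value" within the interval. All function names and the greedy placement strategy are illustrative assumptions, not the paper's exact algorithm.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def weighted_entropy(left, right):
    """Size-weighted average entropy of the two split branches."""
    n = len(left) + len(right)
    return (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)

def split_with_intervals(values, labels, err, threshold):
    """Partition samples at `threshold`.  A sample whose error interval
    [v - err, v + err] lies entirely on one side is placed there; a sample
    whose interval straddles the threshold is greedily placed on the side
    that yields the lower weighted entropy (the error-corrected choice)."""
    left, right, ambiguous = [], [], []
    for v, y in zip(values, labels):
        if v + err <= threshold:
            left.append(y)
        elif v - err > threshold:
            right.append(y)
        else:
            ambiguous.append(y)
    for y in ambiguous:
        if weighted_entropy(left + [y], right) <= weighted_entropy(left, right + [y]):
            left.append(y)
        else:
            right.append(y)
    return left, right

# Toy data: two samples near the threshold 2.0 have intervals that straddle it.
values = [1.0, 1.9, 2.1, 3.0]
labels = ['a', 'a', 'b', 'b']
left, right = split_with_intervals(values, labels, err=0.2, threshold=2.0)
print(left, right)  # both branches become pure: ['a', 'a'] ['b', 'b']
```

With exact (error-free) values the same threshold would already separate the classes, but when values are perturbed by noise the interval correction lets ambiguous samples move to the purer side, which is the accuracy gain the abstract claims for IDT classifiers.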