
Abdulmohsen Algarni


King Khalid University, Saudi Arabia

Title: Selecting Training Documents for Better Learning

Biography

Abdulmohsen Algarni received his PhD degree from the Faculty of Information Technology at Queensland University of Technology, Brisbane, Australia, in 2011. He is currently an assistant professor in the Department of Computer Science, King Khalid University. His research interests include web intelligence, data mining, text intelligence, information retrieval and information systems.

Abstract

In general, there are two types of feedback documents: positive feedback documents and negative feedback documents. Term-based approaches can extract many features from text documents, but most of these features include noise. All feedback documents contain some noisy information that affects the quality of the extracted features, and the amount of noise varies from one document to another. Therefore, reducing the noisy data in the training documents would help to reduce noise in the extracted features. Moreover, we believe that removing some training documents (those that contain more noisy data than useful data) can improve the effectiveness of a classifier. Following this reasoning, we found that short documents are more important than long documents. Testing this idea, we found that exploiting short training documents to improve the quality of the extracted features gives promising results. We also found that not all training documents are useful for training the classifier.
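The abstract does not say how short documents are identified or how their advantage is used during feature extraction. The following is only a minimal sketch of the general idea, not the author's method. It assumes that document length is measured in whitespace-separated tokens, that a fixed fraction of the shortest documents is kept, and that each term's weight is its length-normalised frequency, so a term in a short document counts more than the same term in a long one. The names `select_short_documents`, `extract_term_weights` and `keep_fraction` are hypothetical.

```python
import math
from collections import Counter


def select_short_documents(docs, keep_fraction=0.5):
    """Keep the shortest `keep_fraction` of documents, measured in whitespace tokens.

    Hypothetical heuristic: assumes shorter documents tend to carry less noise.
    """
    if not 0 < keep_fraction <= 1:
        raise ValueError("keep_fraction must be in (0, 1]")
    # sorted() is stable, so documents of equal length keep their input order.
    ranked = sorted(docs, key=lambda d: len(d.split()))
    n_keep = max(1, math.ceil(len(ranked) * keep_fraction))
    return ranked[:n_keep]


def extract_term_weights(docs):
    """Sum length-normalised term frequencies over the given documents.

    Each occurrence contributes 1 / len(document), so terms from short
    documents receive larger weights (an assumed weighting scheme).
    """
    weights = Counter()
    for doc in docs:
        tokens = doc.lower().split()
        if not tokens:
            continue  # skip empty documents to avoid division by zero
        for term, count in Counter(tokens).items():
            weights[term] += count / len(tokens)
    return weights
```

For example, with `keep_fraction=0.5` and three training documents of 3, 12 and 2 tokens, `select_short_documents` keeps the two shortest ones, and terms that appear only in the discarded long document never enter the feature set.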