Simple and Efficient Way to Cluster Documents for Growing Database
Dikhtiarenko Oleksandr1, Biloshchytskyi Andrii2
This article describes a new method for clustering text documents. Each document is characterized by a frequency table of the words it contains. The tables are built from term frequencies and then cleaned of words that do not characterize a specific document because they are common to all or most of the document set. To identify such words, we calculated the percentage of documents in which each word occurs (the document frequency, the quantity underlying inverse document frequency). The objectives of this publication were to determine whether the frequency dictionaries of documents can serve as their semantic characteristics and to define a clustering method based on frequency tables.
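The table-building step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function name `build_frequency_tables`, the whitespace tokenization, and the `max_doc_share` threshold are assumptions made for illustration only, since the abstract does not state how words are split or what percentage counts as "most" of the documents.

```python
from collections import Counter


def build_frequency_tables(documents, max_doc_share=0.8):
    """Return (tables, common_words) for a list of document strings.

    tables[i] maps each remaining word of document i to its term frequency.
    common_words holds the words dropped because they occur in more than
    max_doc_share of all documents (hypothetical threshold).
    """
    # Assumption: lowercase text and split on whitespace.
    tokenized = [doc.lower().split() for doc in documents]

    # Document frequency: in how many documents does each word occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    n_docs = len(documents)
    # Words shared by too large a share of the documents do not
    # characterize any single document, so they are filtered out.
    common_words = {
        word for word, df in doc_freq.items() if df / n_docs > max_doc_share
    }

    # Term-frequency table per document, without the common words.
    tables = [
        Counter(token for token in tokens if token not in common_words)
        for tokens in tokenized
    ]
    return tables, common_words


docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the stock market fell",
]
tables, common_words = build_frequency_tables(docs, max_doc_share=0.8)
```

Here "the" occurs in every document and is removed from all tables, while "cat" occurs in two of three documents (about 67%) and stays because it falls under the assumed 80% threshold.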