Mining association rules for clustered domains by separating disjoint sub-domains in Large Databases
Kanhaiya Lal1 and N. C. Mahanti2
Association rule mining algorithms discover valid rules by testing every item in the domain rather than only some known items, which makes the process inefficient because it generates a very large number of candidates. Most algorithms also take multiple passes over the entire database. Because the database is disk resident and cannot be held completely in main memory, these repeated passes incur a very high I/O cost and considerably reduce the performance of any known association rule mining algorithm. The proposed solution is to separate the domain into disjoint sub-domains, each composed of correlated elements. In this paper we concentrate on (1) the separation of sub-domains composed of correlated elements of the domain and (2) database summarization. The items of a large domain correlate with each other and form small sub-groups, i.e. the domain is clustered into small groups. This property appears in many real-world cases, e.g. bioinformatics and e-commerce. The elements of each sub-group can be processed separately to discover association rules, because the candidate set for a sub-group is small compared with the exponential number of candidates that Apriori in its pure form generates over the whole domain. The proposed algorithm also maintains, for the component being processed, a list of the transactions related to it, so only those transactions are processed instead of the entire database.
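The two ideas in the abstract, separating disjoint sub-domains and keeping a per-component list of related transactions, can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes that two items are "correlated" when they co-occur in at least `min_cooccurrence` transactions, and it finds the disjoint groups as connected components with a union-find structure. The function names, the threshold parameter, and the sample data are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def find(parent, x):
    """Return the representative of x's group, shortening the path as we go."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def split_subdomains(transactions, min_cooccurrence=1):
    """Split the item domain into disjoint groups of co-occurring items.

    Assumption (not from the paper): items are treated as correlated when
    they appear together in at least `min_cooccurrence` transactions.
    Returns a list of (items, transaction_ids) pairs, one per sub-domain.
    """
    # Count how often each pair of items appears in the same transaction.
    pair_counts = defaultdict(int)
    items = set()
    for transaction in transactions:
        unique = sorted(set(transaction))
        items.update(unique)
        for a, b in combinations(unique, 2):
            pair_counts[(a, b)] += 1

    # Merge the groups of every sufficiently frequent pair.
    parent = {item: item for item in items}
    for (a, b), count in pair_counts.items():
        if count >= min_cooccurrence:
            root_a, root_b = find(parent, a), find(parent, b)
            if root_a != root_b:
                parent[root_a] = root_b

    # Collect the members of each disjoint group.
    groups = defaultdict(set)
    for item in items:
        groups[find(parent, item)].add(item)

    # For each group, keep only the ids of transactions that touch it,
    # so later mining can skip the rest of the database.
    result = []
    for members in groups.values():
        tids = [i for i, t in enumerate(transactions) if members & set(t)]
        result.append((members, tids))
    result.sort(key=lambda group: min(group[0]))
    return result


# Hypothetical sample data: two unrelated groups of products.
transactions = [
    ["bread", "butter"],
    ["butter", "jam"],
    ["pen", "ink"],
    ["ink", "paper"],
    ["bread", "jam"],
]
subdomains = split_subdomains(transactions)
for members, tids in subdomains:
    print(sorted(members), tids)
```

On this sample the six items fall into two sub-domains of three items each. Mining each group separately considers at most 2 × (2^3 − 1) = 14 non-empty itemsets instead of 2^6 − 1 = 63 over the whole domain, and each group only reads the transactions in its own list.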