Association Rules Mining and Statistic Test over Multiple Datasets on TCM Drug Pairs
Shang E*, Duan J, Fan X, Tang Y and Ye L
Jiangsu Key Laboratory for TCM Formulae Research, Nanjing University of Chinese Medicine, Nanjing, China
*Corresponding Author:
Shang E
Jiangsu Key Laboratory for TCM Formulae Research
Nanjing University of Chinese Medicine
Nanjing 210023, China
Tel: +86 25 85811916
Received Date: December 16, 2016; Accepted Date: February 16, 2017; Published Date: March 10, 2017
Citation: Shang E, Duan J, Fan X, Tang Y, Ye L (2017) Association Rules Mining and Statistic Test Over Multiple Datasets on TCM Drug Pairs. Int J Biomed Data Min 6: 126. doi: 10.4172/2090-4924.1000126
Copyright: © 2017 Shang E, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Objective: A TCM drug pair consists of exactly two drugs. It is the smallest drug group that follows specific drug compatibility regulations. Formulae compatibility regulations are among the most important problems in TCM clinical practice and modern research, yet they remain largely unresolved. TCM drug pairs are therefore well suited to uncovering these complicated compatibility regulations. This paper applied association rules mining to study the structural characteristics of TCM drug pairs and to find special relationships between the two drugs. The results may help research on formulae compatibility regulations.

Methods: We presented an enhanced association rules mining method to find property associations between the two drugs of a TCM drug pair. We also introduced a binomial statistical test to assess the significance of the mined rules. Property data for 625 drug pairs involving 347 drugs were collected and analyzed. Most association rules mining methods operate on a single database. The new method was therefore designed to find rules across multiple databases, building on an initial Apriori mining pass. In this paper there are two databases, one for each drug in a pair. The statistical test was then applied to filter out insignificant rules.

Results: The Apriori algorithm and the new method were both applied to the TCM drug pairs for comparison. Rules found by Apriori showed falsely high support. Part of this support came from property associations within a single drug rather than between the two drugs of a pair. Apriori also could not find associations of a replicated property, such as liver-liver rules. The proposed method captured only the associations between the two drugs, including replicated-property rules. Some associations were mined with high support and significance.

Conclusion: This paper proposed an enhanced method for association rules mining over multiple databases. Compared with the Apriori algorithm, the new method obtained only the associations in which each item came from a different database. It was shown to be well suited to mining over multiple databases. The statistical test was also necessary to exclude false association rules.
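The contrast in the Results can be illustrated with a small sketch. One approach pools both drugs' properties into a single Apriori-style transaction. The other mines across the two drug "databases." This is not the authors' algorithm, which starts from an Apriori pass that the abstract does not detail.

The sketch makes several assumptions:

- Each pair is ordered as (drug 1, drug 2).
- Only single-item rules are counted.
- The property labels and the four toy pairs are hypothetical. They are not drawn from the 625-pair dataset.
- The function names `cross_database_support` and `merged_transaction_support` are illustrative.

```python
from collections import Counter
from itertools import product

# Hypothetical toy data (not the paper's dataset): each drug pair is
# (properties of drug 1, properties of drug 2), assuming the pair is ordered.
PAIRS = [
    ({"liver", "warm"}, {"liver", "cold"}),
    ({"liver", "lung"}, {"liver"}),
    ({"spleen", "warm"}, {"stomach", "warm"}),
    ({"liver", "warm"}, {"kidney"}),
]


def cross_database_support(pairs):
    """Support of single-item rules X -> Y with X from drug 1 and Y from drug 2.

    Each side is treated as its own database, so the same property may
    appear on both sides of a rule (e.g. liver -> liver).
    """
    counts = Counter()
    for left, right in pairs:
        for x, y in product(left, right):
            counts[(x, y)] += 1
    n = len(pairs)
    return {rule: c / n for rule, c in counts.items()}


def merged_transaction_support(pairs):
    """Pooled baseline: both drugs' properties form one transaction.

    Within-drug co-occurrences are counted as well, which can inflate support.
    A property shared by both drugs collapses to a single item in the set,
    so a replicated-property rule cannot be formed.
    """
    counts = Counter()
    for left, right in pairs:
        items = sorted(left | right)
        for i, x in enumerate(items):
            for y in items[i + 1:]:
                counts[(x, y)] += 1
    n = len(pairs)
    return {rule: c / n for rule, c in counts.items()}


cross = cross_database_support(PAIRS)
merged = merged_transaction_support(PAIRS)
print(cross[("liver", "liver")])       # 0.5
print(("liver", "liver") in merged)    # False
print(merged[("liver", "warm")])       # 0.5
print(("liver", "warm") in cross)      # False
```

The pooled baseline reports a liver/warm support of 0.5 on this toy data. However, warm on drug 1 co-occurs with liver on drug 2 in only one of the four pairs. The excess comes from the last pair, where both properties belong to drug 1. The replicated liver -> liver rule appears only in the cross-database count.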
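The abstract does not specify how the binomial test was formulated. One plausible reading is sketched here as an assumption, not the authors' exact procedure.

Under this reading, a rule X -> Y is a Bernoulli event in each of the n drug pairs. Under independence, its null probability is p0 = P(X on drug 1) × P(Y on drug 2). The one-sided tail P(count ≥ observed) serves as the p-value. The helper names and the counts in the example are hypothetical. Only the total of 625 pairs comes from the text.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), by exact summation of the pmf."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def rule_p_value(n_pairs, n_x, n_y, n_xy):
    """One-sided binomial test for a cross-database rule X -> Y.

    Assumed null hypothesis: property X on drug 1 and property Y on drug 2
    occur independently. The per-pair probability of the rule is then
    p0 = (n_x / n_pairs) * (n_y / n_pairs). The result is the probability
    of observing at least n_xy supporting pairs under that null.
    """
    p0 = (n_x / n_pairs) * (n_y / n_pairs)
    return binomial_upper_tail(n_xy, n_pairs, p0)


# Hypothetical counts over 625 pairs: X on drug 1 in 200 pairs, Y on drug 2
# in 180 pairs, and the rule X -> Y observed in 90 pairs (about 58 expected).
p = rule_p_value(625, 200, 180, 90)
print(p < 0.01)  # True
```

SciPy provides the same upper tail as `scipy.stats.binomtest(k, n, p, alternative="greater").pvalue`. The plain summation above is adequate for a few hundred pairs. It can overflow once n exceeds roughly a thousand, because the binomial coefficients are converted to floats.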