Special Issue Article
Transferring Knowledge Using Feature Extraction from Sparse Data for Drug Toxicity Prediction Using Utility and Drug Combinations
Effectively using readily available auxiliary data to improve predictive performance on new modeling tasks is a major problem in data mining. Data extraction is the process of retrieving data from (usually unstructured or poorly structured) data sources for further processing or storage (data migration). The extracted data is imported into an intermediate extraction system, followed by data transformation and possibly the addition of metadata, before export to the next stage of the data workflow. The goal is to transfer knowledge between sources of data, especially when ground-truth information for the new modeling task is scarce or expensive to collect and auxiliary sources of data are available. For seamless knowledge transfer among tasks, an effective representation of the data is critical, yet this remains a research area not fully explored by data engineers and data miners. Drug toxicity reaction (DTR) is one of the most important issues in the assessment of drug safety. Many toxic drug reactions are not discovered during limited pre-marketing clinical trials; instead, they are observed only after long-term post-marketing surveillance of drug usage. The detection of adverse drug reactions is therefore an important research topic for the pharmaceutical industry. Recently, the large number of reported adverse events and advances in data mining technology have motivated the development of statistical and data mining methods for detecting DTRs. Two algorithms, utility pattern growth (UP-Growth) and UP-Growth+, are proposed for mining utility item sets, together with a set of effective strategies for pruning candidate item sets. Information about utility item sets is maintained in a tree-based data structure, the utility pattern tree (UP-Tree), so that candidate item sets can be generated efficiently with only two scans of the database.
The performance of UP-Growth and UP-Growth+ is compared with state-of-the-art algorithms on both synthetic and real data sets of different types. Experimental results show that the proposed algorithms, especially UP-Growth+, not only reduce the number of candidates effectively but also substantially outperform other algorithms in terms of runtime, particularly when databases contain many long transactions.
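To make the notion of a utility item set concrete, the following is a minimal Python sketch of utility-based item set mining with candidate pruning. It is not UP-Growth or UP-Growth+ and builds no UP-Tree; it is a brute-force illustration that assumes the standard definitions of item set utility and transaction-weighted utility (TWU), where TWU is an upper bound on utility and can therefore be used to discard candidates. The function names, the toy database of drug combinations (items "A" to "D" with dose counts), the per-unit utility weights, and the threshold are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def transaction_utility(transaction, unit_utility):
    """Total utility of one transaction: sum of quantity * per-unit utility."""
    return sum(qty * unit_utility[item] for item, qty in transaction.items())


def contains(transaction, itemset):
    """True if every item of the item set occurs in the transaction."""
    return all(item in transaction for item in itemset)


def twu(itemset, database, unit_utility):
    """Transaction-weighted utility: sum of the total utilities of all
    transactions that contain the item set (an upper bound on its utility)."""
    return sum(
        transaction_utility(t, unit_utility)
        for t in database
        if contains(t, itemset)
    )


def itemset_utility(itemset, database, unit_utility):
    """Exact utility: utility contributed by the item set's own items,
    summed over the transactions that contain the whole item set."""
    return sum(
        sum(t[item] * unit_utility[item] for item in itemset)
        for t in database
        if contains(t, itemset)
    )


def high_utility_itemsets(database, unit_utility, min_util):
    """Brute-force search with TWU pruning (illustrative, not UP-Growth)."""
    items = sorted({item for t in database for item in t})
    # Items whose TWU is below the threshold cannot be part of any
    # qualifying item set, so they are dropped before enumeration.
    promising = [i for i in items if twu((i,), database, unit_utility) >= min_util]
    result = {}
    for size in range(1, len(promising) + 1):
        for candidate in combinations(promising, size):
            # Skip the exact computation when the upper bound already fails.
            if twu(candidate, database, unit_utility) < min_util:
                continue
            utility = itemset_utility(candidate, database, unit_utility)
            if utility >= min_util:
                result[candidate] = utility
    return result


# Hypothetical toy data: each record maps a drug to a dose count,
# and unit_utility assigns an assumed weight to one dose of each drug.
database = [
    {"A": 1, "B": 2},
    {"A": 2, "C": 1},
    {"A": 1, "B": 1, "C": 1},
    {"D": 1},
]
unit_utility = {"A": 5, "B": 3, "C": 4, "D": 1}

found = high_utility_itemsets(database, unit_utility, min_util=15)
print(found)
```

The sketch recomputes TWU and utility by rescanning the database for every candidate, which is exactly the cost that a compact tree structure and a two-scan strategy are meant to avoid.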