Identification of Potentially Relevant Citeable Articles using Association Rule Mining
Selen Uguroglu1, Oznur Tastan1, Judith Klein-Seetharaman1,2* and Sanford H. Leuba3*
- *Corresponding Author:
- Dr. Sanford H. Leuba
5117 Centre Avenue, 2.26a Hillman Cancer Center
Pittsburgh, PA 15213
Email: [email protected]
Biomedical Science Tower 3
Rm. 2051, 3501 Fifth Avenue
Tel: 412 383 7325
Fax: 412 648 8998
Email: [email protected]
Received date: December 01, 2011; Accepted date: December 01, 2011; Published date: December 03, 2011
Citation: Uguroglu S, Tastan O, Klein-Seetharaman J, Leuba SH (2011) Identification of Potentially Relevant Citeable Articles using Association Rule Mining. Medchem 1:e101. doi:10.4172/2161-0444.1000e101
Copyright: © 2011 Uguroglu S, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
As scientific reporting grows larger and more interdisciplinary, it is becoming more difficult to identify all the potentially relevant, citeable articles for the reference lists of publications such as scientific papers, reports, grant proposals and patent applications. Authors may miss citations or give inaccurate ones, which can hinder progress in a discipline and, on a personal level, alter the perceived importance and impact of an investigator’s work. Given the emphasis on quantitative means of assessing productivity, including the number of literature citations, efforts are needed to assist authors in identifying potentially relevant articles to cite. Prior work has analyzed the structure and characteristic features of citation networks and correlated them with other variables, such as country of origin, journal impact factor and open access status. These analyses have revealed problems such as underrepresentation of third-world countries, a high incidence of self-citation, and unsystematic quotation habits in review articles. With the exception of software for detecting gross plagiarism, however, no attempt has been made to develop a practical solution for identifying potentially relevant, citeable articles that may have been missed. Here, we use statistical methods to help retrieve relevant literature from existing publications. Specifically, we exploit the fact that publications reporting specific findings are typically quoted together, as grouped co-citations, in their respective contexts. Our approach can construct co-citation rules automatically by extracting overrepresented co-citations from manuscripts. This approach should help authors and reviewers identify potentially relevant, citeable articles.
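To make the idea of co-citation rules concrete, the following is a minimal sketch of pairwise association rule mining over citation groups. It is an illustration only, not the procedure used in this work: the support and confidence measures, the thresholds, the function names and the example reference labels are all assumptions introduced here.

```python
from collections import Counter
from itertools import combinations


def mine_cocitation_rules(groups, min_support=2, min_confidence=0.6):
    """Derive pairwise rules "x -> y" from groups of references cited together.

    Assumed measures (not taken from the article):
      support    = number of groups that contain both x and y
      confidence = support / number of groups that contain x
    """
    item_counts = Counter()
    pair_counts = Counter()
    for group in groups:
        refs = sorted(set(group))  # ignore duplicates within one group
        item_counts.update(refs)
        pair_counts.update(combinations(refs, 2))

    rules = []
    for (a, b), n_ab in pair_counts.items():
        if n_ab < min_support:
            continue
        # Test the rule in both directions; confidence is asymmetric.
        for x, y in ((a, b), (b, a)):
            confidence = n_ab / item_counts[x]
            if confidence >= min_confidence:
                rules.append((x, y, n_ab, confidence))
    # Strongest rules first, with a deterministic tie-break.
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))


def suggest_missing(cited, rules):
    """Return references implied by the rules but absent from `cited`."""
    cited = set(cited)
    return sorted({y for x, y, _, _ in rules if x in cited and y not in cited})


# Hypothetical co-citation groups; each inner list stands for references
# quoted together at one point in some manuscript.
groups = [
    ["A", "B", "C"],
    ["A", "B"],
    ["A", "B", "D"],
    ["C", "D"],
    ["A", "C"],
]

rules = mine_cocitation_rules(groups)
suggestions = suggest_missing(["A"], rules)
```

With these hypothetical groups, a manuscript that cites only "A" would be prompted to consider "B", because the two appear together in three of the four groups containing "A". A fuller treatment would likely need rules with more than one antecedent and a significance test for overrepresentation, neither of which this sketch attempts.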