GeneNarrator: Mining the Literaturome for Relations Among Genes
1Information Warehouse, Ohio State University Medical Center, 410 W. 10th Ave., Columbus, Ohio, 43210, USA, (614) 293-0776, fax (614) 293-2210
2Department of Information Science, University of Arkansas at Little Rock, 2801 University Ave., Little Rock, Arkansas, 72204, USA, (501) 683-7056, fax (501) 683-7049
3Miami Valley Laboratories, The Procter and Gamble Company, 11810 East Miami River Rd., Ross, Ohio, 45061, USA
4Miami Valley Laboratories, The Procter and Gamble Company, 11810 East Miami River Rd., Ross, Ohio, 45061, USA
5Department of Genetics, Development and Cell Biology, Iowa State University, Ames, Iowa, 50011, USA
6Miami Valley Laboratories, The Procter and Gamble Company, 11810 East Miami River Rd., Ross, Ohio, 45061, USA
*Corresponding Author:
Dr. Daniel Berleant
Department of Information Science
University of Arkansas at Little Rock
2801 University Ave., Little Rock
Arkansas, 72204, USA
Tel: (501) 683-7056
Fax: (501) 683-7049
Received Date: July 07, 2009; Accepted Date: August 23, 2009; Published Date: August 24, 2009
Citation: Ding J, Berleant D, Xu J, Juhlin K, et al. (2009) GeneNarrator: Mining the Literaturome for Relations Among Genes. J Proteomics Bioinform 2: 360-371. doi: 10.4172/jpb.1000096
Copyright: © 2009 Ding J, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
The rapid development of microarray and other genomic technologies now enables biologists to monitor the expression of hundreds, even thousands, of genes in a single experiment. Interpreting the biological meaning of the resulting expression patterns still relies largely on biologists’ domain knowledge, as well as on information collected from the literature and various public databases. Yet individual experts’ domain knowledge is insufficient for large data sets, and collecting and analyzing this information manually from the literature and/or public databases is tedious and time-consuming. Computer-aided functional analysis tools are therefore highly desirable.
We describe the architecture of GeneNarrator, a text mining system for functional analysis of microarray data. The system’s primary purpose is to test the feasibility of a more general system architecture based on a two-stage clustering strategy, which is explained in detail. Given a list of genes, GeneNarrator collects abstracts about them from PubMed and then clusters the abstracts into functional topics in a first clustering stage. In the second clustering stage, the genes are clustered into groups based on similarities in their distributions of occurrence across topics. This novel two-stage architecture, the primary contribution of this project, has benefits not easily provided by one-stage clustering.
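The two-stage strategy can be illustrated with a short Python sketch. This section does not say which document representation, similarity measure, or clustering algorithm GeneNarrator uses, so everything below is an assumption made for illustration: abstracts are represented as bag-of-words term-frequency vectors, both stages use a simple k-means variant with cosine-similarity assignment and farthest-first seeding, and PubMed retrieval is replaced by an in-memory dictionary of invented toy snippets. The function names (`two_stage_cluster`, `kmeans`, and so on) and the gene identifiers are hypothetical.

```python
import math
from collections import Counter

# Hypothetical sketch of a two-stage clustering pipeline; NOT GeneNarrator's code.
# Assumed: abstracts are already fetched (PubMed retrieval is stubbed with a dict),
# text is tokenized on whitespace, and both stages use the same cosine k-means variant.

def bag_of_words(text):
    """Term-frequency vector for one abstract, stored as a sparse dict."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse vectors (dicts); 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: w / len(vectors) for t, w in total.items()}

def nearest(v, centroids):
    """Index of the most cosine-similar centroid (ties go to the lowest index)."""
    return max(range(len(centroids)), key=lambda c: cosine(v, centroids[c]))

def kmeans(vectors, k, iterations=10):
    """k-means variant: farthest-first seeding, cosine-based assignment."""
    if not vectors:
        return []
    k = min(k, len(vectors))
    seeds = [vectors[0]]
    while len(seeds) < k:
        # Next seed: the vector least similar to its closest existing seed.
        seeds.append(min(vectors, key=lambda v: max(cosine(v, s) for s in seeds)))
    centroids = [dict(s) for s in seeds]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [nearest(v, centroids) for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = centroid(members)
    return labels

def two_stage_cluster(gene_abstracts, n_topics, n_groups):
    """{gene: [abstract text, ...]} -> {gene: gene-group label}."""
    # Stage 1: cluster the text of every (gene, abstract) pair into topics.
    pairs = [(g, text) for g, texts in gene_abstracts.items() for text in texts]
    topic_of = kmeans([bag_of_words(text) for _, text in pairs], n_topics)
    # Profile each gene as the distribution of its abstracts across topics.
    counts = {g: Counter() for g in gene_abstracts}
    for (g, _), topic in zip(pairs, topic_of):
        counts[g][topic] += 1
    genes = list(gene_abstracts)
    profiles = [
        {t: n / sum(counts[g].values()) for t, n in counts[g].items()} for g in genes
    ]
    # Stage 2: cluster genes by the similarity of their topic distributions.
    return dict(zip(genes, kmeans(profiles, n_groups)))

# Invented toy "abstracts" for four hypothetical genes.
toy = {
    "GENE_A": ["apoptosis caspase cell death", "caspase apoptosis signaling"],
    "GENE_B": ["lipid metabolism fatty acid", "fatty acid oxidation lipid"],
    "GENE_C": ["caspase cell death pathway", "apoptosis caspase activation"],
    "GENE_D": ["lipid fatty acid synthesis", "lipid metabolism enzyme"],
}
groups = two_stage_cluster(toy, n_topics=2, n_groups=2)
print(groups)  # {'GENE_A': 0, 'GENE_B': 1, 'GENE_C': 0, 'GENE_D': 1}
```

In this sketch, genes are compared through their topic distributions rather than through raw word vectors, so two genes can fall into the same group even when their abstracts share few exact words, provided those abstracts land in the same topics.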