Biologists and life science researchers are primarily interested in understanding complex cellular mechanisms and their interplay at the cellular, tissue and organ levels. The underlying quest is to decipher the correlation between genotype, phenotype and disease state. With data now available from high-throughput technologies and from multiple databases of genotypic and phenotypic information, it has become extremely challenging and time-consuming for a researcher to find relevant information and make sense of the large volume of available data. This becomes critical when designing high-throughput experiments and interpreting their results, all the more so because genotypic-phenotypic correlations vary with cell type, tissue type and organism. Hence, experimental data from one cell type, tissue or organism may not be directly extrapolated to another.
NCBI PubMed (http://www.ncbi.nlm.nih.gov/pubmed/) is one of the most widely used databases and is generally considered the primary source for biomedical literature, with about 23 million citations. A typical PubMed search returns a large number of articles, and reading through all of them is very time-consuming and may not be an efficient way to access the required information. To ease PubMed searching, tools that mine PubMed to rank articles, cluster articles, or enrich the results are already available. In addition, the Signaling Gateway Molecule Pages (SGMP) database provides information on the functional states of proteins and the biological processes associated with each state. However, to our knowledge, no tool exists that can mine PubMed directly by genotype to biological process (phenotype) correlations, especially for a particular cell type, tissue or organism. Apart from PubMed, biologists routinely use databases such as Gene Ontology (GO), BioCyc, KEGG and Reactome [8,9] to identify gene-process associations. However, these databases suffer from one or both of the following limitations: (1) not all gene to process associations are supported by relevant literature evidence; and (2) information on the relevance of a gene to process association for a particular cell type is missing [10,11].
To address these limitations and to offer an integrated gateway, we developed a tool, ‘BioGyan’ (Bio: Biology and Gyan: Knowledge) (http://biogyan.com/). BioGyan is a single-window search tool that mines multiple databases, including PubMed, PDB, GO and Reactome, simultaneously. It allows searching directly by gene-process association and retrieves multidimensional information that includes relevant articles, associated pathways, processes and 3-dimensional (3D) structures. BioGyan automates much of the literature and database searching and its interpretation by: (1) supporting combinatorial queries over lists of genes and processes; and (2) ranking research articles by ‘relevance’ to the queried gene to process associations. Relevance scoring and ranking of articles are performed by an in-house scoring algorithm that uses text mining and heuristic rules. When tested on a set of 6889 unique articles from PubMed, BioGyan accurately identified 85.46% of the articles for their relevance to the queried gene-process associations. Furthermore, for any search query, BioGyan retrieves data from multiple databases and stores it in XML format on the local system, thus allowing the user to work offline.
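To make the idea of combinatorial gene-process queries concrete, the following is a minimal illustrative sketch and not BioGyan's actual implementation. It assumes that each gene-process pair is expanded into one boolean query string of the kind PubMed accepts, optionally restricted by a cell type, tissue or organism term. The function names `build_queries` and `cooccurrence_score` are hypothetical, and the sentence-level co-occurrence score is only a toy stand-in for the in-house relevance algorithm, whose rules are not described here.

```python
import re
from itertools import product


def build_queries(genes, processes, context=None):
    """Return one boolean query string per (gene, process) pair.

    `context` is an optional cell type, tissue or organism term.
    All names here are hypothetical and for illustration only.
    """
    queries = []
    for gene, process in product(genes, processes):
        terms = [f'"{gene}"', f'"{process}"']
        if context:
            terms.append(f'"{context}"')
        queries.append(" AND ".join(terms))
    return queries


def cooccurrence_score(abstract, gene, process):
    """Toy heuristic: fraction of sentences that mention both terms.

    This is an assumed stand-in, not the scoring algorithm of the tool.
    """
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", abstract.strip()) if s]
    if not sentences:
        return 0.0
    gene_l, process_l = gene.lower(), process.lower()
    hits = sum(
        1 for s in sentences
        if gene_l in s.lower() and process_l in s.lower()
    )
    return hits / len(sentences)


if __name__ == "__main__":
    for query in build_queries(["TP53", "AKT1"], ["apoptosis"], "hepatocyte"):
        print(query)
    text = "TP53 induces apoptosis. AKT1 promotes survival."
    print(cooccurrence_score(text, "TP53", "apoptosis"))
```

Expanding the two lists with a Cartesian product keeps each query tied to exactly one gene-process pair, so retrieved articles can be scored and ranked per association rather than per search.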
In summary, we believe that BioGyan is a unique platform that not only offers a holistic search across multiple databases but also substantially reduces the time required to find relevant biological information by automating the entire process.
Citation: Kumar S, Ghadage VK, Subramanian I, Desai A, Singh VK, et al. (2014) BioGyan: A Tool to Identify Gene Functions from Literature. J Data Mining Genomics Proteomics 5:164. doi: 10.4172/2153-0602.1000164