alexa TOBFAC: the database of tobacco transcription factors.
Agri and Aquaculture

Agri and Aquaculture

Rice Research: Open Access

Author(s): Rushton PJ, Bokowiec MT, Laudeman TW, Brannock JF, Chen X, , Rushton PJ, Bokowiec MT, Laudeman TW, Brannock JF, Chen X,

Abstract Share this page

Abstract BACKGROUND: Regulation of gene expression at the level of transcription is a major control point in many biological processes. Transcription factors (TFs) can activate and/or repress the transcriptional rate of target genes and vascular plant genomes devote approximately 7\% of their coding capacity to TFs. Global analysis of TFs has only been performed for three complete higher plant genomes - Arabidopsis (Arabidopsis thaliana), poplar (Populus trichocarpa) and rice (Oryza sativa). Presently, no large-scale analysis of TFs has been made from a member of the Solanaceae, one of the most important families of vascular plants. To fill this void, we have analysed tobacco (Nicotiana tabacum) TFs using a dataset of 1,159,022 gene-space sequence reads (GSRs) obtained by methylation filtering of the tobacco genome. An analytical pipeline was developed to isolate TF sequences from the GSR data set. This involved multiple (typically 10-15) independent searches with different versions of the TF family-defining domain(s) (normally the DNA-binding domain) followed by assembly into contigs and verification. Our analysis revealed that tobacco contains a minimum of 2,513 TFs representing all of the 64 well-characterised plant TF families. The number of TFs in tobacco is higher than previously reported for Arabidopsis and rice. RESULTS: TOBFAC: the database of tobacco transcription factors, is an integrative database that provides a portal to sequence and phylogeny data for the identified TFs, together with a large quantity of other data concerning TFs in tobacco. The database contains an individual page dedicated to each of the 64 TF families. These contain background information, domain architecture via Pfam links, a list of all sequences and an assessment of the minimum number of TFs in this family in tobacco. Downloadable phylogenetic trees of the major families are provided along with detailed information on the bioinformatic pipeline that was used to find all family members. TOBFAC also contains EST data, a list of published tobacco TFs and a list of papers concerning tobacco TFs. The sequences and annotation data are stored in relational tables using a PostgrelSQL relational database management system. The data processing and analysis pipelines used the Perl programming language. The web interface was implemented in JavaScript and Perl CGI running on an Apache web server. The computationally intensive data processing and analysis pipelines were run on an Apple XServe cluster with more than 20 nodes. CONCLUSION: TOBFAC is an expandable knowledgebase of tobacco TFs with data currently available for over 2,513 TFs from 64 gene families. TOBFAC integrates available sequence information, phylogenetic analysis, and EST data with published reports on tobacco TF function. The database provides a major resource for the study of gene expression in tobacco and the Solanaceae and helps to fill a current gap in studies of TF families across the plant kingdom. TOBFAC is publicly accessible at http://compsysbio.achs.virginia.edu/tobfac/.
This article was published in BMC Bioinformatics and referenced in Rice Research: Open Access

Relevant Expert PPTs

Relevant Speaker PPTs

Recommended Conferences

Relevant Topics

Peer Reviewed Journals
 
Make the best use of Scientific Research and information from our 700 + peer reviewed, Open Access Journals
International Conferences 2017-18
 
Meet Inspiring Speakers and Experts at our 3000+ Global Annual Meetings

Contact Us

 
© 2008-2017 OMICS International - Open Access Publisher. Best viewed in Mozilla Firefox | Google Chrome | Above IE 7.0 version
adwords