Tatiana Tatusova

National Center for Biotechnology Information, USA

Title: Keeping Pace with Genome Sequence Data Deluge


Tatiana Tatusova completed her PhD in Physics and Mathematics at Moscow State University, Russia. She is a Senior Scientist at the National Center for Biotechnology Information (NCBI). She has 20+ years of experience as a Researcher and Senior Systems Analyst, with 15 years devoted to algorithm development and applied program package evaluation for genome-related research. She has published more than 100 papers in reputed journals and serves as an Editorial Board Member of several reputed journals.


Recent technological innovations have ignited an explosion in microbial genome sequencing that has fundamentally changed our understanding of the biology of microbes and profoundly impacted public health policy. This huge increase in DNA sequence data presents new challenges for bioinformatics tools for annotation, analysis and visualization. New strategies have been designed to bring order to this genome sequence shockwave and improve the usability of the associated data. Genomes are organized in a hierarchical distance tree, with distances calculated from single-copy ribosomal protein markers. The protein distance measures dissimilarity between markers of the same type, and the genomic distance is then averaged over the majority of marker distances, ignoring outliers. More than 60,000 genomes from public archives have been organized in a marker-distance tree, resulting in more than 6,000 species-level clades representing 7,597 taxonomic species. This computational infrastructure provides a foundation for prokaryotic gene and genome analysis, allowing easy access to pre-calculated genome groups at various distance levels. One of the most challenging problems in the current data deluge is presenting the relevant data at an appropriate resolution for each application, eliminating data redundancy while keeping biologically interesting variation.
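
The genomic distance described above, an average over the majority of per-marker distances with outliers ignored, can be illustrated with a minimal Python sketch. This is not the NCBI implementation. The abstract does not specify the protein distance or the outlier rule, so the sketch assumes a simple p-distance (fraction of mismatched aligned positions) and a symmetric trimmed mean. The function names, the `trim` parameter and the marker labels are hypothetical.

```python
from statistics import mean


def marker_distance(seq_a: str, seq_b: str) -> float:
    """Assumed protein distance: fraction of mismatched positions
    between two pre-aligned marker sequences (gap columns skipped)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("marker sequences must be aligned to the same length")
    compared = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not compared:
        raise ValueError("no comparable positions")
    return sum(a != b for a, b in compared) / len(compared)


def genome_distance(markers_a: dict, markers_b: dict, trim: float = 0.2) -> float:
    """Assumed genomic distance: trimmed mean of marker distances.

    Only markers of the same type present in both genomes are compared.
    A fraction `trim` of the sorted distances is dropped from each end,
    so the average is taken over the central majority of markers.
    """
    shared = sorted(set(markers_a) & set(markers_b))
    if not shared:
        raise ValueError("genomes share no markers")
    dists = sorted(marker_distance(markers_a[m], markers_b[m]) for m in shared)
    k = int(len(dists) * trim)           # number dropped from each end
    kept = dists[k:len(dists) - k] or dists
    return mean(kept)


# Toy example with hypothetical marker labels and made-up short sequences.
genome_a = {"L2": "MKVA", "L3": "GTRK", "S3": "AQLV", "S7": "PRRE", "S10": "IKLK"}
genome_b = {"L2": "MKVA", "L3": "GTRK", "S3": "AQLV", "S7": "PRRD", "S10": "WWWW"}

d = genome_distance(genome_a, genome_b)
print(round(d, 4))
```

In the toy data, marker `S10` differs at every position and acts as an outlier. With five shared markers and `trim=0.2`, the smallest and largest distances are dropped and the remaining three are averaged. Pairwise distances computed this way could then be passed to any standard hierarchical clustering routine to build a distance tree.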