ISSN: 1948-5948
Journal of Microbial & Biochemical Technology

A New Web Server for the Rapid Identification of Microorganisms

Olivier Croce1*, François Chevenet2 and Richard Christen1

1University of Nice Sophia-Antipolis and CNRS UMR 6543. Institute of Signaling, Developmental Biology and Cancer. Virtual Biology Lab. Parc Valrose. F06108 Nice, France

2Génétique et Evolution des Maladies Infectieuses, IRD / CNRS UMR 2724 - IRD, 911 avenue Agropolis, B.P. 64501, F34394 Montpellier Cedex 5, France

*Corresponding Author:
Dr. Olivier Croce
University of Nice Sophia-Antipolis,
Institute of Signaling, Developmental Biology and Cancer, Virtual Biology Lab,
Faculté des Sciences, Parc Valrose,
06108 Nice, France
Tel: +33 492 07 6947

Received date: June 08, 2010; Accepted date: June 29, 2010; Published date: June 29, 2010

Citation: Croce O, Chevenet F, Christen R (2010) A New Web Server for the Rapid Identification of Microorganisms. J Microbial Biochem Technol 2:084-088. doi:10.4172/1948-5948.1000029

Copyright: © 2010 Croce O, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.



Keywords: Identification of microorganisms; RNA sequences; BLAST; Web server; Phylogenetic tree


Microbial organisms are now usually identified by comparing their SSU rRNA gene sequences with those of known organisms (Stackebrandt and Goebel, 1994). The usual application is to study the composition of the microbial community in a given environmental or clinical sample. SSU rRNA gene sequences are obtained (McCabe et al., 1999) either by cloning the PCR products and randomly sequencing a set of clones (Amann et al., 1995; Hugenholtz and Pace, 1996) or by pyrosequencing (Jonasson et al., 2002; Roesch et al., 2007; Huse et al., 2008; Christen, 2008). The question is then whether these sequences are related to sequences already found in environmental samples and/or to well-known cultured microorganisms, and possibly to a type strain (Albuquerque et al., 2009).

The general process starts with a similarity search of the new sequence(s) against public databases, usually with BLAST (Altschul et al., 1990), followed by an alignment of the new sequence(s) with the most similar sequences and, finally, a classification or phylogenetic analysis that leads to identification. Online BLAST servers are a common choice for quickly finding related sequences, but the databases they rely on are now filled with partial, often poorly annotated sequences from environmental samples. For example, the NCBI BLAST server and the EBI server ( blast2/) allow environmental sequences to be excluded, but their databases still contain many inaccurate descriptions. A similar database restricted to 16S rRNA sequences exists at DDBJ. As a result, any BLAST query returns many (sometimes only) poorly described sequences (Lin et al., 2008; Clarridge, 2004), and a tedious manual analysis of the BLAST results is needed to find closely related, well-described species.

Other tools, such as Eztaxon (Chun et al., 2007), are more specific. Eztaxon is associated with a hand-curated database of 16S rRNA gene sequences from bacterial type strains. It allows users to perform similarity-based searches, multiple sequence alignment and various phylogenetic analyses. The Eztaxon database is extremely useful for characterizing a new species before its publication, but because it is restricted to bacterial type strains, it cannot identify well-documented cultured species whose strains have not been validated as type strains. It also does not support the identification of protozoa and archaea. Finally, it does not provide the taxonomic information required to build a fully annotated phylogenetic tree.

Our server allows BLAST searches restricted to cultured species, with options such as a minimum sequence length or a limit of two sequences per species. In addition, we developed an online tool named “Blast2Tree” (PHP/JavaScript/MySQL). The main goal of Blast2Tree is to derive, in a very simple way, a fully annotated phylogenetic tree from BLAST hits or from results obtained with Eztaxon. Annotations come from a local database and are rendered through a pipeline that uses ScripTree (Chevenet et al., 2010). ScripTree is a tool for scripting phylogenetic graphics; it manages multiple trees and the usual kinds of annotations, and it can be used either as a stand-alone package or within a pipeline linked to an HTTP server. Blast2Tree can also download every sequence from a clade, together with the related annotations, for use in software such as TreeDyn (Chevenet et al., 2006).

Materials and Methods

Finding similar rRNA sequences using a specific BLAST 

The database server described here contains SSU rRNA sequences extracted from the EMBL database. Each entry is parsed with a Python script that checks whether the definition line shows that a proper Latin species name has been used. Entries with descriptions such as ‘uncultured’, ‘env’, ‘Genus sp.’ or ‘genomosp.’ are therefore automatically excluded. When necessary, SSU rRNA subsequences are extracted from longer genomic fragments. The NCBI taxonomic description is also fully extracted, checked and associated with each sequence. Finally, we use the “List of Prokaryotic names with Standing in Nomenclature” (Euzeby, 2008) to identify all bacterial type strains. The data are stored in a local relational database (MySQL). Several BLAST databases are formatted from these data, for the Bacteria, Archaea and Protozoa divisions. For each division we provide sub-databases containing only sequences above a minimal length (500, 800, 1000 or 1200 nt), since longer sequences are better suited to resolving deeper branches. The databases often contain many sequences of the same species, so we also built sub-databases that include only two sequences per species; in each case, this yields a workable phylogenetic tree with proper outgroups. These two sequences are chosen among the longest available for each species.
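The filtering and sub-database steps above can be sketched as follows. This is a minimal illustration, not the authors' script: the record layout `(accession, description, sequence)`, the regular expressions and the function names are hypothetical, and the exclusion patterns are only modelled on the terms quoted in the text.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (accession, description, sequence): a hypothetical record layout for this sketch
Record = Tuple[str, str, str]

# Assumed exclusion rules modelled on the terms quoted in the text
# ('uncultured', 'env', 'Genus sp.', 'genomosp.'); the real script's rules may differ.
EXCLUDED = re.compile(r"\buncultured\b|\benv|\bsp\.|\bgenomosp\.", re.IGNORECASE)
# A Latin binomial at the start of the description: capitalised genus + lower-case epithet.
BINOMIAL = re.compile(r"^[A-Z][a-z]+ [a-z]+")


def has_proper_species_name(description: str) -> bool:
    """True if the description starts with a binomial and contains no excluded term."""
    return bool(BINOMIAL.match(description)) and not EXCLUDED.search(description)


def species_of(description: str) -> str:
    """The 'Genus species' prefix used to group sequences of the same species."""
    match = BINOMIAL.match(description)
    return match.group(0) if match else description


def build_subdatabases(records: Iterable[Record],
                       min_lengths=(500, 800, 1000, 1200),
                       per_species: int = 2) -> Dict[str, List[Record]]:
    kept = [r for r in records if has_proper_species_name(r[1])]
    subsets: Dict[str, List[Record]] = {}
    # Length-filtered subsets: longer sequences help resolve deeper branches.
    for min_len in min_lengths:
        subsets[f"min{min_len}"] = [r for r in kept if len(r[2]) >= min_len]
    # "N sequences per species" subset: keep the longest sequences of each species.
    by_species: Dict[str, List[Record]] = defaultdict(list)
    for r in kept:
        by_species[species_of(r[1])].append(r)
    subsets[f"{per_species}seq"] = [
        r
        for recs in by_species.values()
        for r in sorted(recs, key=lambda rec: len(rec[2]), reverse=True)[:per_species]
    ]
    return subsets
```

Each subset would then be written to FASTA and formatted as its own BLAST database; keeping the longest sequences per species favours the most informative representatives.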

Two different web interfaces are provided: the default NCBI BLAST interface (Ye et al., 2006; McGinnis and Madden, 2004) and the ViroBLAST interface (Deng et al., 2007), which extends BLAST to queries against multiple sequence databases and user-supplied sequence datasets and offers user-friendly output that is easy to parse and navigate. Because the databases contain a limited number of sequences, a BLAST search completes in only a few seconds (the maximum number of BLAST hits is limited to 500).
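The article does not state which BLAST version or command-line options the server uses. As a hedged sketch assuming the current NCBI BLAST+ tools (`makeblastdb`, `blastn`), formatting one sub-database and searching it with the 500-hit cap might look like this; the file and database names are hypothetical, and a 2010 server may well have used the legacy `formatdb`/`blastall` programs instead.

```python
import shlex
from typing import List

MAX_HITS = 500  # the server limits BLAST output to 500 hits


def makeblastdb_cmd(fasta_path: str, db_name: str) -> List[str]:
    """Format a nucleotide FASTA file as a BLAST+ database."""
    return ["makeblastdb", "-in", fasta_path, "-dbtype", "nucl", "-out", db_name]


def blastn_cmd(query_path: str, db_name: str, max_hits: int = MAX_HITS) -> List[str]:
    """Search a query against one sub-database, tabular output, at most max_hits subjects."""
    return ["blastn", "-query", query_path, "-db", db_name,
            "-max_target_seqs", str(max_hits), "-outfmt", "6"]


if __name__ == "__main__":
    # Hypothetical file and database names; pass these lists to subprocess.run(..., check=True)
    # on a machine where BLAST+ is installed.
    print(shlex.join(makeblastdb_cmd("bacteria_2seq.fasta", "bacteria_2seq")))
    print(shlex.join(blastn_cmd("query.fasta", "bacteria_2seq")))
```

Building the command as a list (rather than a shell string) avoids quoting problems with user-supplied file names. Note that in BLAST+ `-max_target_seqs` influences which subjects are kept during the search, not only how many are displayed.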

Recovering information and display using Blast2Tree

Recovering an informative taxonomy for a sequence is often very difficult with conventional BLAST servers. The FASTA definition line is often too long and is therefore truncated by phylogeny programs such as PHYLIP (Felsenstein, 1989) or Clustal (Larkin et al., 2007). Moreover, this line contains no information about taxonomic assignment. The online tool Blast2Tree is intended to solve these problems by making it easy to retrieve sequences and their associated information. Simply copying and pasting any number of lines from the BLAST hits retrieves the corresponding SSU rRNA sequences (in two different formats) as well as the associated taxonomy. This can be repeated with different parts of the same BLAST result or with results from different queries. The final list can be edited by hand, and the selection can be displayed as a table with complete taxonomic assignments. A phylogenetic tree can be drawn on screen and exported before or after data retrieval. The interface also accepts data copied and pasted from Eztaxon.
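The copy/paste step relies on recognising hit lines in pasted BLAST output. The article does not describe Blast2Tree's parser; the sketch below is a hypothetical illustration that assumes the one-line-per-hit layout shown in Table 2 (`ACCESSION|Genus species  score  E-value`) and ignores any line that does not match it, such as the header.

```python
import re
from typing import List, NamedTuple


class Hit(NamedTuple):
    accession: str
    species: str
    score: int
    evalue: float


# "ACCESSION|Genus species   score  E-value", as in the server output shown in Table 2
HIT_LINE = re.compile(r"^(?P<acc>\S+?)\|(?P<species>.+?)\s+(?P<score>\d+)\s+(?P<evalue>\S+)$")


def _parse_evalue(token: str) -> float:
    # Legacy BLAST may print very small E-values as e.g. "e-150" without a leading digit.
    return float("1" + token if token.startswith("e") else token)


def parse_pasted_hits(text: str) -> List[Hit]:
    """Extract accession, species, score and E-value from pasted BLAST hit lines."""
    hits = []
    for line in text.splitlines():
        match = HIT_LINE.match(line.strip())
        if match:  # headers and blank lines are skipped
            hits.append(Hit(match["acc"], match["species"],
                            int(match["score"]), _parse_evalue(match["evalue"])))
    return hits
```

The accession extracted this way is a short, stable key that can be used to fetch the exact SSU rRNA sequence and its taxonomy from the local database, sidestepping the long definition lines that phylogeny programs truncate.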

Downloads of the following files are possible:

1. The exact SSU rRNA sequences (FASTA format), even when such sequences are embedded into a larger sequence (such as complete genomes). It is possible either to download sequences as selected from the BLAST result, or to download a clade of sequences (such as every sequence in a genus or a given species).

2. The annotations (tlf format). These annotations comprise the complete taxonomic description, the strain name when available and an indication of whether the strain is a type strain. The tlf format is a simple ASCII format that can be used by software such as TreeDyn or ScripTree.

3. The full taxonomy (HTML format), with sequences sorted by shared taxonomic terms. The taxonomic terms can be retrieved from the pasted BLAST hits or from keywords entered in the form.

4. A phylogenetic tree (Newick format). Blast2Tree can automatically build a phylogenetic tree based on the MAFFT software (Katoh et al., 2009). MAFFT is a multiple sequence alignment program that offers various alignment strategies; we integrated the fastest (although least accurate) one, a simple progressive method similar to Clustal. The detailed algorithms are described by Katoh et al. (2002). Automatic tree building is also very fast: it takes less than 40 seconds for 500 leaves, 7 seconds for 100 leaves and 3 seconds for 10 sequences. Users can also upload their own tree.

5. An image of a phylogenetic tree (jpeg, ps or tiff formats). Blast2Tree integrates the scripting language “ScripTree” to generate high-quality images of phylogenetic trees. A single click displays a downloadable image. This image includes the sequence to identify, the set of sequences selected from the BLAST results and the related annotations.
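The article does not say which tree-building method Blast2Tree applies after the MAFFT alignment. To illustrate only the general path from aligned sequences to a Newick string, the sketch below swaps in UPGMA on uncorrected p-distances; it is not the server's method, and it assumes the sequences are already aligned to equal length and have Newick-safe names.

```python
from itertools import combinations
from typing import Dict


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences, skipping gapped columns."""
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        return 1.0
    return sum(x != y for x, y in sites) / len(sites)


def upgma_newick(aligned: Dict[str, str]) -> str:
    """Build a UPGMA tree from pre-aligned sequences and return it as a Newick string."""
    # cluster id -> (Newick text, number of leaves, node height)
    clusters = {name: (name, 1, 0.0) for name in aligned}
    dist = {frozenset(pair): p_distance(aligned[pair[0]], aligned[pair[1]])
            for pair in combinations(aligned, 2)}
    next_id = 0
    while len(clusters) > 1:
        closest = min(dist, key=dist.get)
        i, j = sorted(closest)  # sorted for a deterministic child order
        (ni, si, hi), (nj, sj, hj) = clusters.pop(i), clusters.pop(j)
        height = dist.pop(closest) / 2
        merged = f"({ni}:{height - hi:.4f},{nj}:{height - hj:.4f})"
        new_id = f"_node{next_id}"  # internal id, never printed
        next_id += 1
        # UPGMA update: size-weighted mean of the distances to the two merged clusters
        for k in clusters:
            d_ik = dist.pop(frozenset((i, k)))
            d_jk = dist.pop(frozenset((j, k)))
            dist[frozenset((new_id, k))] = (d_ik * si + d_jk * sj) / (si + sj)
        clusters[new_id] = (merged, si + sj, height)
    newick, _, _ = next(iter(clusters.values()))
    return newick + ";"
```

For example, `upgma_newick({"A": "AAAA", "B": "AAAT", "C": "TTTT"})` groups A with B and returns `(C:0.4375,(A:0.1250,B:0.1250):0.3125);`. UPGMA assumes a constant evolutionary rate, so, like the fast server-side option, it is best treated as a quick overview rather than a final tree.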

Results and Discussion

We compared the results of common BLAST servers (NCBI BLAST, EBI WU-Blast2, DDBJ BLAST) with those of our web server. As an example, we took a sequence described as an uncultured bacterium in its NCBI entry (Table 1). The objective was to determine which known species are most closely related to this input sequence. With similar options on each server, the results were very different:

1. Most of the 100 similar sequences returned by the NCBI BLAST server are annotated as “Uncultured...”. NCBI BLAST recently added an option to exclude sequences from environmental samples. With this option enabled, the results improve but remain unsatisfactory because of the many vague descriptions (e.g., many entries described as “Enterobacter sp.”).

2. WU-Blast2 on the EBI server with the “embl release” database gives results similar to NCBI. Better results are obtained by selecting the “embl standard prokaryotes” database, where the most frequently retrieved sequences are “Enterobacter sp.” mixed with various species. Although environmental sequences are not included in this database, many sequences are still poorly described.

3. BLAST at DDBJ allows users to choose a database of prokaryotic 16S rRNA sequences, which behaves very similarly to the “embl standard prokaryotes” database at EBI.

4. On our server, we selected the database containing bacterial 16S rRNA sequences with only two sequences per species. The top hits clearly show that the closest species belong to the genera Kluyvera and Enterobacter (Table 2).


Table 1: Sequence [EMBL: GU084214], described as an uncultured bacterium. This sequence is used as an example of an “unknown sequence” to highlight the differences between the results of the usual online BLAST servers (EBI, NCBI, DDBJ) and our BLAST server.

Sequences producing significant alignments: Score E-value
AM933754|Kluyvera cryocrescens 1405 0.0
AM778415|Enterobacter cloacae 1398 0.0
AJ251468|Enterobacter aerogenes 1398 0.0
AM933757|Kluyvera cryocrescens 1394 0.0
AJ854062.16S-RRNA|Serratia ureilytica 1392 0.0
AJ853891|Enterobacter ludwigii 1390 0.0
Z96078|Enterobacter cancerogenus 1382 0.0
DQ202394|Enterobacter cloacae 1382 0.0
AY825036|Enterobacter aerogenes 1382 0.0
AF025373|Citrobacter werkmanii 1382 0.0
AF025369|Citrobacter murliniae 1382 0.0
AF025368|Citrobacter braakii 1382 0.0
FJ662869|Serratia nematodiphila 1372 0.0
FJ462700|Leclercia adecarboxylata 1372 0.0
X93216|Raoultella planticola 1370 0.0
AF181574|Raoultella planticola 1370 0.0
EF175735|Enterobacter ludwigii 1370 0.0
DQ229104|Citrobacter tnt5 1368 0.0
FJ853424|Serratia marcescens 1366 0.0
AB364958|Raoultella ornithinolytica 1366 0.0
AJ251467|Raoultella ornithinolytica 1366 0.0
Z96077|Enterobacter nimipressuralis 1366 0.0
DQ294288|Citrobacter freundii 1366 0.0
DQ229103|Citrobacter tnt4 1366 0.0
EU221358|Enterobacter asburiae 1362 0.0
AF493976|Cedecea davisae 1362 0.0
EU914257|Serratia nematodiphila 1360 0.0
AM062693|Enterobacter amnigenus 1360 0.0
EF688006|Pantoea punctata 1358 0.0
AJ627202|Kluyvera ascorbata 1358 0.0
AJ627201|Kluyvera ascorbata 1358 0.0
AY567708|Candidatus Cuticobacterium kirbyi 1356 0.0

Table 2: Example of BLAST output from our server. The BLASTed sequence is [EMBL: GU084214] (the target sequence to identify) and the database used is “2 Seq/Bacteria”. Only the 33 most similar sequences are displayed.

This example highlights that none of the common public BLAST servers can easily retrieve either a set of sequences with an informative and complete taxonomy or subsequences embedded in a larger genomic fragment. Only a dedicated database returns the meaningful BLAST results required for easy identification of microbial rRNA sequences.

Figure 1 shows the phylogenetic tree built with Blast2Tree. This tree includes the sequences retrieved from our server and the sequence to identify. In the default view, each leaf shows the species name, the strain (with a superscript “T” if it is a type strain) and the accession number. In this figure, the sequence to identify (named “New.sequence”) is clearly related to the genus Kluyvera, and the figure is almost ready for publication. The user can refine the analysis further, for example by adding all sequences of Kluyvera.
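The leaf labels described above combine species, strain, type-strain mark and accession. The sketch below shows one hypothetical way to assemble such a label for a Newick file; it uses the Unicode modifier letter U+1D40 as a plain-text stand-in for the superscript “T” and replaces characters that would break an unquoted Newick label. The function name and the choice of replacement character are assumptions, not Blast2Tree's actual conventions.

```python
import re
from typing import Optional

TYPE_MARK = "\u1d40"  # MODIFIER LETTER CAPITAL T, a plain-text stand-in for a superscript T
NEWICK_UNSAFE = re.compile(r"[\s()\[\]:;,']+")  # characters that would break an unquoted Newick label


def leaf_label(species: str, strain: Optional[str], accession: str, is_type_strain: bool) -> str:
    """Join species, strain (marked if it is a type strain) and accession into one safe label."""
    parts = [species]
    if strain:
        parts.append(strain + (TYPE_MARK if is_type_strain else ""))
    parts.append(accession)
    return NEWICK_UNSAFE.sub("_", " ".join(parts))
```

For instance, a hypothetical type strain "S1" of Kluyvera cryocrescens with accession AM933754 becomes `Kluyvera_cryocrescens_S1ᵀ_AM933754`. Some tree viewers may not render the modifier letter, in which case a plain "T" suffix is a simpler fallback.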


Figure 1: Phylogenetic tree built by Blast2Tree. It contains the sequence to identify (in this example, [EMBL: GU084214], named “New.sequence” here) and the closest sequences from our server.

Besides these common BLAST servers, several projects based on dedicated rRNA databases include tools intended to help identify prokaryotic sequences.

1. The Greengenes web application (DeSantis et al., 2006) gives access to a database of aligned 16S rRNA sequences. It allows users to export, slice, browse and compare sequences, and to search for probes through many graphical tools. The data and tools provided by Greengenes aim to help researchers choose phylogenetically specific probes, interpret microarray results, and align and annotate novel sequences. The application also allows a given sequence to be compared against the Greengenes database, either with a dedicated BLAST or with a tool called “Simrank”, which is based on shared 7-mers. The BLAST or Simrank hits are displayed in an interactive table arranged by taxonomy.

2. The Ribosomal Database Project (RDP) (Cole et al., 2009) provides ribosome-related databases and online data analysis tools to the scientific community. As of May 2010 (release 10.20), RDP maintains 1,237,963 aligned and annotated 16S rRNA sequences of Archaea and Bacteria, including sequences from cultured organisms and sequences obtained from environmental samples. Among the available services, RDP provides a classifier that assigns 16S rRNA gene sequences to the new phylogenetically consistent higher-order bacterial taxonomy using a naïve Bayesian classifier (Wang et al., 2007). RDP also allows users to select a set of sequences using relevant criteria such as sequence length or isolate-only sequences, and then to build a crude online tree that can be downloaded in Newick format.

3. Silva is an online resource that provides comprehensive, quality-checked and regularly updated databases of aligned 16S, 18S, 23S and 28S ribosomal RNA sequences for Bacteria, Archaea and Eukarya (Pruesse et al., 2007). The “All-Species Living Tree” project is associated with the Silva databases; it aims to reconstruct a single 16S rRNA tree containing all sequenced type strains of the classified species of Archaea and Bacteria (Yarza et al., 2008). Silva also offers phylogenetic classification linked to the ARB software (Ludwig et al., 2004). An online form aligns the sequence to identify with the closest sequences in the Silva database. The resulting file (FASTA plus metadata, or ARB format) can be used with the ARB software for further phylogenetic studies and for tree visualization. However, the ARB software package must be installed locally and is not available for all operating systems (e.g., Windows). An independent BLAST server, SepsiTest, uses part of the Silva data. SepsiTest is similar to our BLAST server, but its database includes only type-strain sequences. Nevertheless, we also made it possible to paste SepsiTest BLAST output into Blast2Tree.
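The Simrank tool mentioned above ranks database sequences by shared 7-mers. Its exact scoring formula is not given here, so the sketch below assumes a simple version: the number of distinct shared k-mers divided by the size of the smaller k-mer set. The function names and the `database` dictionary are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def kmer_set(seq: str, k: int = 7) -> Set[str]:
    """Distinct k-mers of a sequence (empty if the sequence is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_score(query: str, subject: str, k: int = 7) -> float:
    """Fraction of shared distinct k-mers, relative to the smaller k-mer set (assumed scoring)."""
    q, s = kmer_set(query, k), kmer_set(subject, k)
    if not q or not s:
        return 0.0
    return len(q & s) / min(len(q), len(s))


def rank_by_kmers(query: str, database: Dict[str, str],
                  k: int = 7, top: int = 10) -> List[Tuple[float, str]]:
    """Score every database sequence against the query and return the best hits first."""
    scored = [(shared_kmer_score(query, seq, k), name) for name, seq in database.items()]
    scored.sort(key=lambda hit: (-hit[0], hit[1]))
    return scored[:top]
```

Because k-mer sets can be precomputed and compared without alignment, this kind of ranking is much cheaper than a full alignment, at the cost of ignoring the order and position of the shared words.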

The final phylogenetic tree should not include sequences that are not clearly named at the species level. Depending on the database used by these tools, or on the user's expertise, it can be very tedious to avoid sequences annotated with uninformative terms such as ‘Candidatus’, ‘unclassified’ or ‘environmental samples’. Moreover, the user may spend much time obtaining representative annotations to produce a “ready to print” tree for publication. Finally, it is not easy to get a quick overview of a relevant set of sequences closely related to an unknown sequence. Our web server is intended to fill this gap.


Quality of analysis and display 

The MAFFT algorithm used to build the trees is very fast but gives trees of only average quality, and no bootstrap confidence estimates or other statistical analyses are displayed. To obtain a more robust tree, we recommend retrieving the sequences from Blast2Tree and building the tree with classical phylogenetic methods. This procedure takes longer but should give better results, and the resulting tree (in Newick format) can then be re-imported into Blast2Tree. For the final tree image, Blast2Tree performs automatic procedures using the scripting facilities of ScripTree. Options allow users to select which annotations to add to the image. The files generated by Blast2Tree can also be exported to ScripTree to obtain a more customized tree. Alternatively, the tree can be modified with the stand-alone software TreeDyn, an advanced tool for the graphical management of trees.
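Bootstrap support, which the server does not compute, is one of the classical analyses recommended above. As a minimal sketch of its first step (not part of Blast2Tree), the function below draws bootstrap pseudo-alignments by resampling alignment columns with replacement; the function name and the seeded random generator are illustrative choices.

```python
import random
from typing import Dict, List


def bootstrap_alignment(aligned: Dict[str, str], n_replicates: int,
                        seed: int = 0) -> List[Dict[str, str]]:
    """Resample alignment columns with replacement to create bootstrap pseudo-alignments."""
    names = list(aligned)
    length = len(aligned[names[0]])
    if any(len(seq) != length for seq in aligned.values()):
        raise ValueError("all sequences must be aligned to the same length")
    rng = random.Random(seed)  # seeded for reproducible replicates
    replicates = []
    for _ in range(n_replicates):
        columns = [rng.randrange(length) for _ in range(length)]
        replicates.append({name: "".join(aligned[name][c] for c in columns) for name in names})
    return replicates
```

Each replicate would then be passed to the same tree-building method, and the support for a clade is the fraction of replicate trees that contain it (Felsenstein's nonparametric bootstrap). Resampling whole columns keeps the residues of all sequences at a site together.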

Advantages of the system 

BLAST searches on the main public servers are not appropriate for accurate identification of prokaryotes, because it is not easy to find taxonomically meaningful relatives of a query sequence. The main advantages of a dedicated server are (i) clear and informative results (every hit comes from a well-described species); (ii) fast BLAST searches, because the sequence databases are much smaller than public databases; and (iii) the possibility of exporting the exact SSU rRNA sequences and taxonomic descriptions, which facilitates phylogenetic analysis and helps the user integrate new sequences into a meaningful phylogenetic tree.

Existing tools based on dedicated rRNA databases offer many advantages, but they are not fully suited to the fast and easy identification of an unknown prokaryotic sequence that must be included in an annotated tree. In the future, however, high-quality databases such as those provided by Silva or Greengenes could be used in addition to the database currently used by our server.

Finally, the online tool Blast2Tree provides a fast and useful way to see where a new sequence is positioned within a tree of relevant, closely related sequences.

Availability and requirements

The server is freely accessible over the Internet, and online help with tutorials and examples of use is also available. This server for microbial identification has been running for about one year and is used within the framework of a European project. The server and database are available for local installation; interested individuals are invited to contact the authors for more information.


This work was supported by funds from the European Commission for the HEALTHY WATER project (FOOD-CT-2006-036306) to R. Christen. The authors are solely responsible for the content of this publication. It does not represent the opinion of the European Commission. The European Commission is not responsible for any use that might be made of data appearing therein.

