16S-23S Intergenic Spacer (ITS) Region Sequence Analysis: Applicability and Usefulness in Identifying Genera and Species Resembling Non-Hemolytic Streptococci

Xiaohui Chen Nielsen1*, Derya Carkaci1, Rimtas Dargis1, Lise Hannecke1, Ulrik Stenz Justesen2, Michael Kemp2, Monja Hammer3 and Jens Jørgen Christensen1
1 Department of Clinical Microbiology, Slagelse Hospital, Slagelse, Denmark
2 Department of Clinical Microbiology, Odense University Hospital, Odense, Denmark
3 Department of Microbiology Diagnostics and Virology, Statens Serum Institute, Copenhagen, Denmark


Introduction
Catalase-negative, gram-positive cocci not belonging to streptococci or enterococci represent a group of bacteria which, over the last decades, has become increasingly well characterized. The number of taxonomic entities has been growing steadily, thereby complicating their identification. They resemble the better-known genera of streptococci and enterococci and may consequently be mistaken for one of them. These species therefore often give rise to identification problems and, subsequently, to delayed reporting of the final identification to clinicians [1]. The strains most often recognized belong to the genera Gemella, Granulicatella, Abiotrophia and Aerococcus. Leuconostoc, Globicatella, Facklamia, Dolosicoccus, and Dolosigranulum are also isolated from blood cultures, though less often [1].
These bacteria are usually part of the normal oral, gastrointestinal and genitourinary flora of humans, and may cause a variety of opportunistic infections. Gemella, Granulicatella and Abiotrophia species are recognized etiologies, especially of infective endocarditis and brain abscesses [1,2]. Aerococcus urinae is known to cause urinary tract infections, septicaemia and infective endocarditis, the latter with a considerable mortality rate [3-5]. Leuconostoc has been isolated from blood cultures, particularly in immunocompromised patients [6,7].
As they all may cause serious infections in humans, a rapid and reliable identification method is desirable. Precise species identification of blood culture isolates helps identify the primary site of infection and can be a guide to antibiotic susceptibility, thereby having an impact on the therapeutic strategy and outcome. Identification of microorganisms from patient samples has in the past mainly been based on phenotypic characteristics exhibited by the putative pathogens, which is time-consuming and can produce ambiguous results. In clinical laboratories, Aerococcus, Gemella, Granulicatella, and Abiotrophia species are often misidentified as Non-Hemolytic Streptococci (NHS). The accuracy of commercial systems (VITEK, API 32 and ATB) commonly used for clinical identification was evaluated by Woo et al. [8], and their results showed frequent misidentification of strains from the genus Gemella as Streptococcus, Abiotrophia, or Granulicatella.
For routine identification of clinical bacterial strains, Matrix-Assisted Laser Desorption/Ionization Time of Flight mass spectrometry (MALDI-ToF MS) seems promising for strains belonging to the group of bacteria examined in this study [9]. Sequence-based identification methods, especially 16S rRNA gene analysis and detection/sequencing of selected genes, have revolutionized bacteriology in the last two to three decades [10]. 16S rRNA gene analysis has been shown to provide relatively good separation of the taxa examined in this study [11]. However, both methodologies are challenged by the closely related species in the Mitis group of NHS [12-14].
A variety of other gene targets, including the manganese-dependent superoxide dismutase gene (sodA) [15,16], the heat shock protein genes groESL [17,18], the RNA polymerase β-subunit gene rpoB [19], and the recombination and repair protein gene recN [20], have also been used for species identification within the genus Streptococcus, with promising results for most species except those of the Mitis group. However, other members of the catalase-negative, gram-positive cocci not belonging to Streptococcus and Enterococcus have only rarely been investigated [17,19,21].
The ribosomal 16S-23S Intergenic Spacer (ITS) region has been suggested as a good candidate for bacterial identification and strain typing [22,23]. In a previous study by our group, the feasibility of using the ITS sequence to identify clinical strains of NHS was established [24]. ITS sequence analysis was suggested as a first-line identification tool for the NHS group. However, a housekeeping gene, glucose-dehydrogenase (gdh), would also have to be analysed in order to safely differentiate between S. mitis, S. oralis and S. pneumoniae. Early and effective antimicrobial treatment can result in negative cultures from important clinical specimens, e.g., heart valve tissue or brain abscess material. This stresses the need for non-culture-based molecular examinations. Sequence-based methods that can separate the relevant taxa are natural candidates for this purpose. ITS sequence analyses have also proven useful in species identification of enterococcal strains [23]. For that reason, we found it of interest to expand ITS sequencing to other members of the catalase-negative, gram-positive cocci that resemble NHS morphologically. The purpose of this study was to investigate the possible role of ITS sequence analysis as a common key for the identification of clinical strains of NHS, enterococci, and the NHS-like taxa examined in this study.

Type strains
Twenty-five type strains belonging to 11 genera were received from the Culture Collection, University of Göteborg, Sweden (CCUG) (Table 1). Strains were grown and maintained on 5% Danish horse blood agar plates and stored at -80 °C in 10% glycerol broth (Statens Serum Institut, Copenhagen, Denmark). These strains underwent PCR, sequencing and subsequent editing to determine the ITS sequence (see the paragraph "Sequencing of ITS region and sequence editing" for details). Sequences from four other type strains with published ITS sequences were also included in the study (Table 1).

Clinical strains
A total of 103 clinical strains of gram-positive, catalase-negative cocci were included in this study, belonging to the following genera: Aerococcus, Abiotrophia, Facklamia, Granulicatella, Gemella, and Leuconostoc (Table 2). Among these, 75 strains were from the Reference Laboratory at Statens Serum Institut (SSI), Copenhagen, Denmark. These strains were sent from local departments of clinical microbiology in Denmark for identification between March 2000 and June 2010. Conventional phenotypic analysis, partial 16S rRNA gene sequence analysis (a 526 bp stretch) [12,25] and MALDI-ToF MS analysis (Bruker Biotyper, Germany) [9] were performed to characterize these strains. Furthermore, 28 clinical strains were purchased from CCUG, and MALDI-ToF MS was performed to confirm the identification of these strains. These included A.

DNA extraction
The genomic DNA of all strains was extracted by heating one to three colonies of each strain for 10 min at 95 °C in 100 µl PCR-grade water.

PCR amplification of ITS region
To amplify the ITS region, we used primers Strep16S-1471F (5'-GTG GGA TAG ATG ATT GGG GTG AAG T-3') and 6R-IGS (5'-GGG TTC CCC CAT TCG GAH AT-3') as previously described [24]. The PCR was performed in 50 µl reaction volumes consisting of 25 µl of Brilliant II SYBR Green master mix (Agilent Technologies), 0.5 µM (final concentration) of each primer, and 2 µl of the DNA template. The PCR program was: 94 °C for 10 minutes followed by 35 cycles of 94 °C for 30 seconds, 61 °C for 30 seconds, and 72 °C for 30 seconds. PCR was performed on an Mx3005P instrument (Stratagene, Agilent Technologies). The PCR products were analyzed both by real-time amplification and melting curves in the program MxPro (Stratagene, Agilent Technologies) and by the QIAxcel capillary electrophoresis system (Qiagen).
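The reaction set-up and cycling program above can be written down as a small worked calculation. The sketch below is illustrative only: the primer stock concentration (10 µM) and the use of water to make up the remaining volume are assumptions not stated in the text, and the function names are hypothetical. The reported hold time ignores temperature ramping and the melting-curve step.

```python
# Values taken from the text.
REACTION_UL = 50.0        # total reaction volume
MASTER_MIX_UL = 25.0      # Brilliant II SYBR Green master mix
TEMPLATE_UL = 2.0         # DNA template
PRIMER_FINAL_UM = 0.5     # final concentration of each primer

# ASSUMPTION: primer stock concentration is not given in the text.
PRIMER_STOCK_UM = 10.0


def primer_volume_ul(final_um, stock_um, reaction_ul):
    """Volume of primer stock needed, from C1 * V1 = C2 * V2."""
    return final_um * reaction_ul / stock_um


def reaction_mix():
    """Per-reaction volumes; water fills the remainder (assumption)."""
    primer = primer_volume_ul(PRIMER_FINAL_UM, PRIMER_STOCK_UM, REACTION_UL)
    water = REACTION_UL - MASTER_MIX_UL - TEMPLATE_UL - 2 * primer
    return {
        "master_mix_ul": MASTER_MIX_UL,
        "forward_primer_ul": primer,
        "reverse_primer_ul": primer,
        "template_ul": TEMPLATE_UL,
        "water_ul": water,
    }


# Cycling program from the text: (temperature in deg C, seconds).
INITIAL_DENATURATION = (94, 600)
CYCLE_STEPS = [(94, 30), (61, 30), (72, 30)]
N_CYCLES = 35


def hold_time_seconds():
    """Sum of programmed hold times, excluding ramping."""
    per_cycle = sum(seconds for _, seconds in CYCLE_STEPS)
    return INITIAL_DENATURATION[1] + N_CYCLES * per_cycle


if __name__ == "__main__":
    print(reaction_mix())
    print(hold_time_seconds() / 60, "minutes of programmed holds")
```

With a different primer stock concentration only `PRIMER_STOCK_UM` needs to change; the water volume adjusts accordingly.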

Sequencing of ITS region and sequence editing
Amplicons were sequenced at Eurofins MWG Operon (Germany) and GATC Biotech (Germany). The primers Strep16S-1471F and 6R-IGS were used as sequencing primers. Sequencing results were analyzed with CLC Main Workbench v6. The forward and reverse sequence reads were assembled to obtain the consensus sequence of the ITS region. The flanking regions belonging to the 16S and 23S rRNA genes were removed to obtain full-length ITS sequences beginning with CTAAGG at the 5' end and ending with TTAAGT/C at the 3' end.
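The trimming step can be illustrated with a short sketch. This is not the CLC Main Workbench procedure used in the study; it is a minimal stand-in that assumes the ITS starts at the first CTAAGG in the consensus and ends at the last TTAAGT or TTAAGC downstream of it. The function name `trim_its` and the example sequence are hypothetical.

```python
import re

# Boundary motifs reported in the text.
FIVE_PRIME_MOTIF = "CTAAGG"
# Lookahead so that overlapping occurrences of the 3' motif are all found.
THREE_PRIME_MOTIF = re.compile(r"(?=TTAAG[TC])")
MOTIF_LENGTH = 6


def trim_its(consensus):
    """Return the ITS region including both boundary motifs.

    ASSUMPTION: the first CTAAGG and the last TTAAGT/C mark the
    boundaries; the text does not say how repeated motifs are handled.
    """
    seq = consensus.upper()
    start = seq.find(FIVE_PRIME_MOTIF)
    if start == -1:
        raise ValueError("5' motif CTAAGG not found")
    search_from = start + MOTIF_LENGTH
    ends = [m.start() + MOTIF_LENGTH
            for m in THREE_PRIME_MOTIF.finditer(seq, search_from)]
    if not ends:
        raise ValueError("3' motif TTAAGT/C not found after the 5' motif")
    return seq[start:ends[-1]]


if __name__ == "__main__":
    # Made-up consensus: lowercase flanks stand in for 16S/23S sequence.
    example = "ggtgCTAAGGaaacccgggtttTTAAGTagacc"
    print(trim_its(example))
```

In practice the trimmed length can be compared with the expected ITS size range as a sanity check on the chosen boundaries.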

BLAST
The edited sequences of the ITS regions from both the type and clinical strains were compared to sequences deposited in the NCBI database by using the BLAST search engine (http://blast.ncbi.nlm.nih.gov/Blast.cgi), taking into consideration the % identity (number of identical bases between the query and the subject sequence in the database), the Maximum score (indication of alignment concordance), and the E value (indication of the statistical significance of a given alignment) for the best and the second-best taxon matches. A difference in Maximum score of at least 10 between the best and second-best taxon matches was used as the criterion for species differentiation.
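The decision rule can be sketched as follows. The threshold of 10 comes from the text; everything else (the `Hit` class, the `call_species` function, collapsing several hits of the same taxon to their highest score, and the example scores) is an assumption made for illustration.

```python
from dataclasses import dataclass

MIN_SCORE_GAP = 10  # minimum Maximum-score difference stated in the text


@dataclass(frozen=True)
class Hit:
    taxon: str
    max_score: float


def call_species(hits, min_gap=MIN_SCORE_GAP):
    """Return (best taxon, score gap to second-best taxon, unambiguous?).

    ASSUMPTION: multiple hits for one taxon are reduced to the highest
    Maximum score before taxa are ranked.
    """
    best_per_taxon = {}
    for hit in hits:
        current = best_per_taxon.get(hit.taxon, float("-inf"))
        if hit.max_score > current:
            best_per_taxon[hit.taxon] = hit.max_score
    if len(best_per_taxon) < 2:
        raise ValueError("need hits from at least two taxa")
    ranked = sorted(best_per_taxon.items(), key=lambda kv: kv[1], reverse=True)
    best_taxon, best_score = ranked[0]
    second_score = ranked[1][1]
    gap = best_score - second_score
    return best_taxon, gap, gap >= min_gap


if __name__ == "__main__":
    # Hypothetical scores chosen only to give a gap below the threshold.
    hits = [
        Hit("Gemella haemolysans", 500),
        Hit("Gemella morbillorum", 494),
        Hit("Gemella haemolysans", 480),
    ]
    print(call_species(hits))
```

A call flagged as ambiguous by this rule would need a second line of evidence before a species name is reported.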

Phylogenetic analysis
Intraspecies distances were calculated by aligning the ITS regions obtained from the clinical strains and the corresponding type strain with ClustalW (http://www.ebi.ac.uk/Tools/msa/clustalw2). The alignment was then used to compute pairwise distances with the Kimura 2-parameter model in the Molecular Evolutionary Genetics Analysis (MEGA) 5.0 program package (http://www.megasoftware.net). Interspecies distances were calculated in the same way with ITS sequences obtained from type strains belonging to the same genus. Phylogenetic analysis on the basis of the ITS sequences of each genus, including the type and clinical strains, was performed by the Neighbour-Joining method (MEGA 5.0). The robustness of the phylogenetic tree was determined with 1000 bootstrap replicates.
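For readers unfamiliar with the Kimura 2-parameter model, the pairwise distance is d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of sites showing transitions and transversions. The sketch below computes it for two pre-aligned sequences. It is not MEGA's implementation; in particular, skipping any site with a gap or ambiguous base in either sequence (pairwise deletion) is an assumption, since the gap treatment used in the study is not stated.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned sequences.

    ASSUMPTION: sites where either sequence has a gap or a non-ACGT
    symbol are ignored (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x not in BASES or y not in BASES:
            continue
        sites += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    w1 = 1 - 2 * p - q
    w2 = 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    d = -0.5 * math.log(w1 * math.sqrt(w2))
    return d if d > 0 else 0.0


if __name__ == "__main__":
    # Toy alignment with one transition among ten sites.
    print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 4))
```

Applying the function to every pair of aligned sequences gives the distance matrix from which intraspecies and interspecies ranges can be read off.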

Amplification and sequence analysis of the ITS region for type strains and determination of editing sites
One predominant amplicon was obtained from all 25 type strains. The sizes of the ITS PCR products varied between 184 and 377 bp. Sequences comprising the ITS region and parts of the 16S and 23S regions were generated. Published ITS sequences of four other type strains were downloaded. Alignment of these 25 ITS sequences with previously published streptococcal ITS sequences [24] revealed the editing sites at the 5' and 3' ends to be CTAAGG and TTAAGT/C, respectively. The edited ITS sequences of the 25 type strains were submitted to GenBank, and accession numbers are listed in Table 1.

Amplification and sequence analysis of the ITS region for the clinical strains
For most of the clinical strains, only one amplicon was detected. For two A. urinae strains, more than one amplicon was detected according to the dissociation curve analysis, although in all strains only one product was detected by the QIAxcel capillary electrophoresis system. Figure 1 presents an example with one such A. urinae strain. The forward and reverse sequences were assembled and a consensus sequence was obtained for all clinical strains (CLC Main Workbench). All sequences were edited as described in Materials and Methods. The sizes of the edited ITS sequences varied between 197 and 375 bp (Table 1).

ITS region sequences BLAST results of the type strains
ITS sequences from all the type strains were submitted to BLAST to assess how the ITS region performs as a candidate target for species identification. All type strains returned their own strain as the best taxon match (data not shown). The difference in Maximum score between the best and the second-best taxon match varied between 21 and 358 (Table 1).

Species identification based on ITS sequence BLAST
The species distribution and BLAST results of the clinical strains are shown in Table 2. For all 37 Aerococcus strains, of which A. urinae (n=27) dominated in number, the best taxon matches were in agreement with the presumed species identifications. The large difference in Maximum score values between the best and second-best taxon matches (216-308) made the identifications convincing. Similarly, the best taxon matches obtained for the 36 clinical strains belonging to the genera Abiotrophia, Facklamia, Granulicatella and Leuconostoc were in agreement with the presumed species identifications. The difference in Maximum score values between the best and second-best taxon matches was large (129-318). For the genus Gemella, all clinical strains (n=30) obtained best taxon matches that were in agreement with the presumed species identifications. However, two strains designated as G. haemolysans had a difference of only six in Maximum score values between the best and the next-best taxon match (G. morbillorum) (Table 2). In no case did misidentification occur.

Phylogenetic analysis
Phylogenetic analysis based on sequences of the ITS regions of strains belonging to the genera Aerococcus (Figure 2a

Interspecies and intraspecies distances
Pairwise comparisons of the ITS sequences were performed for type strains belonging to the same genus to calculate the interspecies distances. Intraspecies distances were calculated among the strains belonging to the same species. The interspecies distances among the type strains of Aerococcus, Facklamia, Granulicatella, and Leuconostoc were in the range of 0.067-0.266. The intraspecies distances for the strains belonging to these four genera were all less than 0.047; for some species the distance was zero (Table 3). The interspecies distance between the type strains of G. haemolysans and G. morbillorum was only 0.025. The interspecies distances among the type strains of G. bergeri, G. cuniculi, G. palaticanis, and G. sanguinis were in the range of 0.038 to 0.113. The intraspecies distances were less than 0.056 for the G. bergeri strains, less than 0.016 for the G. haemolysans strains, less than 0.036 for the G. morbillorum strains, and less than 0.026 for the G. sanguinis strains (Table 3).
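One way to read these figures is to ask whether the smallest interspecies distance exceeds the largest intraspecies distance, i.e. whether the two ranges are separated. The sketch below applies that comparison to the values reported above. The comparison itself and the function name `separates` are illustrative and not part of the study's method, and the "less than" bounds from the text are used as conservative upper limits for the intraspecies distances.

```python
def separates(min_interspecies, max_intraspecies):
    """True if every interspecies distance exceeds every intraspecies one."""
    return min_interspecies > max_intraspecies


# (minimum interspecies distance, upper bound on intraspecies distance),
# both taken from the text.
REPORTED = {
    "Aerococcus, Facklamia, Granulicatella, Leuconostoc": (0.067, 0.047),
    "G. haemolysans vs G. morbillorum": (0.025, max(0.016, 0.036)),
}

if __name__ == "__main__":
    for label, (inter, intra) in REPORTED.items():
        verdict = "separated" if separates(inter, intra) else "overlapping"
        print(f"{label}: interspecies >= {inter:.3f}, "
              f"intraspecies < {intra:.3f} -> {verdict}")
```

Under this reading the four non-Gemella genera show a clear gap, whereas the G. haemolysans / G. morbillorum pair does not, because the interspecies distance of 0.025 falls below the intraspecies bound of 0.036 for G. morbillorum.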

Discussion
We describe a method for species-level identification by ITS sequence analysis of strains belonging to 11 genera of catalase-negative, gram-positive cocci that are neither streptococci nor enterococci.
In our study, ITS sequences were determined for a total of 25 type strains, and ITS sequences were downloaded for four further type strains. When the ITS sequences of these 29 type strains were searched again with BLAST, the results showed a large distance from the best to the second-best taxon match. This indicates that the ITS region has great interspecies divergence and is suitable as a species identification target (Table 1).
A total of 103 clinical strains from 17 species of the genera Aerococcus, Abiotrophia, Facklamia, Granulicatella, Gemella, and Leuconostoc were examined and identified based on ITS sequence analysis. All clinical strains, irrespective of the Maximum score value obtained, were allocated to the expected species (Table 2). Large interspecies divergence, high intraspecies homology, and the distinct clustering demonstrated by the phylogenetic analysis supported the conclusion that the ITS region is a good target for species identification of strains belonging to the genera Aerococcus, Abiotrophia, Facklamia, Granulicatella, and Leuconostoc.
In the genus Gemella, 28 of the 30 clinical strains were unambiguously allocated to the expected species by BLAST. Two Gemella strains obtained G. haemolysans as the best taxon match, but the Maximum score distance to the second-best taxon match (G. morbillorum) was only six, which was too small to allow an unambiguous conclusion. The interspecies divergence between G. haemolysans and G. morbillorum based on ITS sequences was as small as 0.025. This contributes to the difficulty in differentiating strains of G. morbillorum and G. haemolysans. An earlier publication from our group applied MALDI-ToF MS for species identification of 23 strains from the same collection. It resulted in unreliable identifications for 14 of the 23 Gemella strains, a number that was reduced considerably after in-house database extension [9]. However, several strains of G. morbillorum and G. haemolysans had short distances to the second-best taxon match, illustrating the close relationship between some of the species. Species identification of Gemella strains based on the rpoB and groESL genes showed similar difficulties [12,18,19]. However, the phylogenetic analysis based on ITS sequences showed two distinct clusters of G. morbillorum and G. haemolysans strains, and the two strains in question were allocated to the G. haemolysans cluster (Figure 2d). Therefore, in this case, ITS sequence analysis combining BLAST and phylogenetic analysis was sufficient to identify these two G. haemolysans strains to the species level.
Using the heterogeneity of the 16S-23S ITS region has become more common in recent years for identification and typing of bacteria [26]. It is known that bacterial genomes can contain several rrn operons, e.g. Enterococcus has three copies of this operon [27] and Streptococcus pneumoniae has four operons [28]. Within the genus Streptococcus, the ITS sequence tends to present a mosaic organization of blocks that are highly conserved at the intra- and interspecies level [28]. In other genera, however, both length and copy number can vary from strain to strain [26]. Gurtler et al. established typing of Clostridium difficile strains by PCR amplification of ITS regions of variable length, which is still the standard for C. difficile PCR ribotyping [29]. Therefore, a challenge for species identification based on ITS sequence analysis could be that some strains generate more than one amplicon, though only one may be dominant. It is important to understand the variability of ITS sequences in a given genome to gain insights into bacterial taxonomy. Tung et al. [23] investigated the applicability of ITS for identification of Abiotrophia, Enterococcus, Granulicatella and Streptococcus. The correct species identification rate by ITS sequence analysis for the 217 clinical strains belonging to these four genera was 98.2%. Except for the genus Streptococcus, all the genera produced more than one amplicon. This made it necessary to perform agarose gel separation and purification of the amplicons before sequencing. Even after gel purification, two strains resulted in mixed sequences, and cloning was necessary to obtain unambiguous sequences. In our study, by optimizing the PCR conditions using SYBR Green master mix and changing the annealing temperature, it was possible to generate one predominant PCR amplicon for all the analyzed strains, and DNA sequences were obtained for these amplicons.
ITS sequence analysis seems promising for species identification of strains that are catalase-negative and gram-positive and that resemble NHS. Importantly, ITS sequence analysis suggested no species misidentifications among these strains. In a former study, the same method was applied to the genus Streptococcus and was shown to be sufficient for species identification of most streptococcal species [24]. For S. mitis, S. oralis, and S. pneumoniae strains, it was necessary to examine an additional housekeeping gene, gdh. ITS sequence analysis has also been shown to be a useful tool for species identification within the genus Enterococcus.
Several other gene targets have been applied for species-level identification of this group of bacteria. Drancourt et al. showed that the rpoB gene was useful in achieving species identification of the genus Streptococcus and related genera. However, a 99.4% similarity between G. haemolysans and G. morbillorum in their partial rpoB gene sequences was observed in the same study [19]. Moreover, only 1-3 strains were included for each species belonging to Abiotrophia, Granulicatella and Gemella [19]. Hung et al. developed a multiplex PCR attempting to differentiate strains of these three genera by the different sizes of the groESL PCR products. This method could only achieve identification to the genus level. High intraspecies variation was observed in the groESL gene sequences among G. haemolysans isolates [17]. Neither of these two studies included Aerococcus, Facklamia, or Leuconostoc. Bosshard et al. concluded that 16S rRNA gene sequence analysis was an effective means of identifying 171 clinical isolates of catalase-negative, gram-positive cocci. However, only a limited number of species were included from the genera Aerococcus, Gemella, Enterococcus, and Streptococcus [21]. The taxonomy in this area has developed considerably over the past ten years. Therefore, studies that include more species are necessary to support this conclusion.
In conclusion, ITS sequence analysis might be considered as a common identification key for catalase-negative, gram-positive cocci. Potentially, ITS sequence analysis can also be useful for detecting bacteria directly in culture-negative clinical specimens, as is done with direct 16S rRNA gene analysis on specimens.