Received date: January 5, 2016 Accepted date: January 27, 2016 Published date: January 31, 2016
Citation: Shen Z, Zhang N, Mustapha A, Lin M, Xu D, et al. (2016) Identification of Host-Specific Genetic Markers within 16S rDNA Intervening Sequences of 73 Genera of Fecal Bacteria. J Data Mining Genomics Proteomics 7:186. doi:10.4172/2153-0602.1000186
Copyright: © 2016 Zhenyu S, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Ribosomal intervening sequences (IVSs) were recently proposed as genetic markers for microbial source tracking (MST). This study comprehensively investigated host specificities of IVSs within the 16S rDNA of 73 genera of dominant fecal bacteria using the approaches of bioinformatics and next generation sequencing (NGS). Thirteen types of IVSs were identified in silico to be associated with particular host species; they were found within bacteria of the genera Anaerovibrio, Bacteroides, Faecalibacterium, Mitsuokella, Peptostreptococcus, Phascolarctobacterium, and Subdoligranulum. Based on the DNA sequences of the thirteen types of IVSs, polymerase chain reaction (PCR) assays were developed. PCR amplifications using fecal DNA samples of target and non-target host species demonstrated that eight out of the 13 IVSs were highly associated with human, chicken/turkey, beef cattle/pig, or horse/pig/human feces. Based on the IVS polymorphisms, NGS was applied to search for single-host-associated IVSs from those linked to multiple host species. Consequently, a new type of IVS specific to beef cattle was found and confirmed by PCR amplification using cattle and non-cattle fecal samples. The results suggest that some IVSs may be used as the genetic markers for MST and that NGS may be useful in identifying novel host-specific genetic markers.
Microbial source tracking; Human feces; Beef cattle feces; Host-specific PCR
Identification of the source (human vs. animal) of fecal pollution is critical in the assessment and mitigation of fecal pollution. In the past two decades, many microbial source tracking (MST) methods have been developed to determine the sources of fecal pollution in water, and these procedures have been comprehensively reviewed [2-5]. More recently, host-specific polymerase chain reaction (PCR) methods have become the most popular MST approaches; they determine the sources of fecal pollution by detecting genetic markers associated with fecal bacteria unique to a host animal species.
Among the host-specific genetic markers, 16S rDNA sequences of fecal bacteria are the most common. As an outcome of studies on the physiology, ecology, and biodiversity of intestinal flora, vast amounts of 16S rDNA have been sequenced for fecal microorganisms associated with human and animal intestinal contents or feces. These sequences are available in several public databases, such as the Ribosomal Database Project (RDP) and GenBank. Although the 16S rDNA sequences are highly conserved, their variable regions can provide discrimination capability to the subspecies level, at which host-specific microbes (and subsequently genetic markers) can be found. In fact, many host-specific genetic markers have been identified in the 16S rDNA of fecal bacteria, such as Bacteroides [9-11], Faecalibacterium, Bifidobacterium, Brevibacterium, and Catellicoccus. However, cross-reaction is not unusual for 16S rDNA genetic markers because of the high degree of sequence similarity among 16S rDNA molecules [16-20].
In contrast, the genes involved in fecal microorganism-host interactions, such as the genes required for bacterial colonization in host intestinal tracts or microbe-host symbiosis, are believed to be more host-specific and are ideal candidates as MST genetic markers. Unfortunately, this type of gene has rarely been identified in most fecal microorganisms. In an initial study, we used the ribosomal intervening sequences (IVSs) of fecal bacteria as genetic markers for MST, taking advantage of the vast 16S rDNA data and the host specificity of IVSs. Ribosomal IVSs are believed to be a result of the coevolution of the host and the bacteria, and IVSs appear to cause a high level of variation among both 16S rDNA and 23S rDNA in adapting to different environments [24,25]. An IVS within the 16S rDNA sequence of the genus Faecalibacterium was identified as specific to poultry (chicken and turkey) feces and was found to be distributed across a wide geographic area, indicating that IVSs might be alternative, if not superior, host-specific genetic markers to the conventional 16S rDNA sequences for MST.
The present study seeks to comprehensively examine, in silico, by PCR, and with NGS approaches, the host specificity of IVSs within the 16S rDNA sequences of 73 genera of fecal bacteria dominant in the intestinal tracts of humans and important agricultural animals. The study also aims to provide information essential to better understand the potential of IVSs as host-specific genetic markers. In this study, the term “composite sample” refers to a mixture of equal amounts of DNA extracted from the feces of at least 20 individual animals (or persons), and “target host” refers to an animal species (including human) whose feces is the target of a genetic marker.
Fecal sample collection and total DNA extraction
All animal fecal samples were collected, as far as could be ascertained, from separate individual animals on farms in Missouri, USA, including samples from chickens, turkeys, beef cattle, dairy cattle, goats, sheep, horses, and pigs. Human fecal samples were also collected in Missouri, from sewage at the waste treatment plant inflow. All fecal and sewage samples were kept on ice during transportation and stored at −70°C before DNA extraction. The total fecal DNA was extracted from the samples using the PowerSoil® DNA Isolation Kit (MO BIO Laboratories, Carlsbad, CA) and stored at −20°C before use. The DNA was adjusted to a concentration of 5 ng/μl for PCR amplification.
In silico analyses of host-specific IVSs within the 16S rDNA of the fecal bacteria
Seventy-three genera of bacteria (Table 1) were selected for this study as they are generally considered to be dominant in human and animal intestinal tracts [26-28]. The aligned 16S rDNA sequences of each genus were downloaded from the RDP database and viewed with the visualization software BioEdit. IVSs of each genus were manually identified and compiled in an Excel spreadsheet. The IVSs were then sorted by length with the Excel add-in tool DigDB (http://www.digdb.com/). All IVSs shorter than 70 bp were filtered out, as they might be too short for TaqMan qPCR assays, which are to be developed in a future study. The remaining IVSs were used as queries in a BLAST search against the National Center for Biotechnology Information’s (NCBI) GenBank database to retrieve any DNA sequences containing these IVSs. Information about the host and the host location of the IVS-containing DNA sequences was subsequently retrieved from GenBank. Any IVS that appeared to be associated with a particular bacterial genus and with no more than three host species was verified by PCR assay for its host specificity (section Verification of the host specificity of the IVSs by PCR assays). In total, 13 IVSs (110-140 bp) were thus identified.
|Genus||No. of 16S rDNA||Genus||No. of 16S rDNA|
|TM7 genera incertaesedis||3145|
Table 1: Bacterial phyla and genera along with the number of 16S rDNA sequences used in this study.
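The selection rules described in this section (discard IVSs shorter than 70 bp, then keep those tied to one bacterial genus and at most three host species) can be sketched in a few lines of Python. This is only an illustrative sketch: the study performed these steps manually with BioEdit, Excel, and BLAST, and the `IvsCandidate` record, its field names, and the example entries below are hypothetical.

```python
from dataclasses import dataclass

MIN_IVS_LENGTH_BP = 70   # IVSs shorter than this were filtered out
MAX_HOST_SPECIES = 3     # an IVS may be linked to at most three host species


@dataclass(frozen=True)
class IvsCandidate:
    """Hypothetical record for one IVS and its BLAST-derived annotations."""
    name: str
    sequence: str
    genera: frozenset   # bacterial genera in which the IVS was found
    hosts: frozenset    # host species reported for the matching sequences


def select_candidates(candidates):
    """Keep IVSs that pass the length, single-genus, and host-count rules."""
    return [
        c for c in candidates
        if len(c.sequence) >= MIN_IVS_LENGTH_BP
        and len(c.genera) == 1
        and 1 <= len(c.hosts) <= MAX_HOST_SPECIES
    ]


# Placeholder entries for illustration only (not real IVS sequences).
examples = [
    IvsCandidate("ivs_a", "A" * 120, frozenset({"GenusX"}),
                 frozenset({"chicken", "turkey"})),
    IvsCandidate("ivs_b", "A" * 60, frozenset({"GenusX"}),
                 frozenset({"pig"})),                       # too short
    IvsCandidate("ivs_c", "A" * 130, frozenset({"GenusY"}),
                 frozenset({"pig", "horse", "human", "goat"})),  # too many hosts
]
selected_names = [c.name for c in select_candidates(examples)]
```

With the placeholder entries above, only `ivs_a` survives the filter.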
Verification of the host specificity of the IVSs by PCR assays
For each of the 13 IVSs selected through the in silico analysis described in section In silico analyses of host-specific IVSs within the 16S rDNA of the fecal bacteria, a PCR primer set was designed using the NCBI Primer-BLAST program. The specificity of each primer pair was examined with the ProbeMatch program against the Ribosomal Database Project (RDP) database, allowing a maximum of three nucleotide mismatches in each primer. The optimal annealing temperature (TA) of each primer pair was determined through gradient PCR using a TA gradient from 45 to 65°C with the composite fecal DNA samples of the target host species (human or animal). If an amplicon of the expected size was generated, the PCR primer set was further tested by PCR amplification against all composite fecal DNA samples from both target and non-target host sources. Each composite DNA sample was a mixture of an equal amount of DNA extracted from the feces of at least 20 individual animals. The PCR was conducted with 40 cycles of the following thermocycle after initial denaturation at 95°C for 2 min: denaturation at 95°C for 30 sec, annealing at the optimal TA (Table 2) for 30 sec, and elongation at 72°C for 20 sec. The final elongation was at 72°C for 6 min. The 25 μl PCR reaction cocktail consisted of 10 ng of composite fecal DNA, 25 pmol of each PCR primer (Integrated DNA Technologies, Coralville, IA), and 10 μl of 2.5 × Taq 5 Prime MasterMix (5 Prime Inc., Gaithersburg, MD). PCR products were separated by electrophoresis in 2.0% agarose gels. The PCR amplicons with expected molecular sizes were assumed to be IVS amplicons and were then gel-purified using the GelElute Extraction Kit (5 Prime Inc., Gaithersburg, MD). The purified IVS amplicons were used as the DNA templates in the Illumina® PCR amplifications as described in section Search of novel host-specific IVSs by NGS.
|Target IVS||Primer pair (5’-> 3’)||Amplicon size (bp)||AT1 (°C)||Host specificity|
|76||45 to 65 gradient||NSA2|
|91||45 to 65
|82||59||Beef cattle and pig|
|83||59||Beef cattle and pig|
|104||55||Chicken and turkey|
|120||55||Chicken and turkey|
|107||45 to 65
|75||60||Beef cattle and pig|
|81||62||Horse, human, and pig|
|93||45 to 65
|91||55||Chicken and turkey|
Table 2: Host specificity of the 13 types of IVSs, determined by PCR assays.
PCR without DNA template served as a negative control, and PCR with the universal bacterial 16S rDNA primers, Bac1070F and Bac1392R, was used to determine the presence of possible PCR inhibitors. All PCR reactions were repeated at least in duplicate.
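As a quick consistency check on the reaction composition and thermocycle given in this section, the final concentrations and the programmed hold time can be worked out as below. This is a sketch for orientation only; it assumes the composite DNA was used at the 5 ng/μl working concentration stated for the extracted DNA, and it ignores ramping time between temperature steps.

```python
# Values taken from the PCR description in this section.
REACTION_VOLUME_UL = 25.0
PRIMER_PMOL = 25.0            # amount of each primer per reaction
MASTERMIX_VOLUME_UL = 10.0
MASTERMIX_STOCK_X = 2.5       # 2.5x concentrated master mix
TEMPLATE_NG = 10.0
TEMPLATE_STOCK_NG_PER_UL = 5.0  # assumption: composite DNA at 5 ng/ul

# pmol per ul is numerically equal to micromolar (uM).
primer_final_um = PRIMER_PMOL / REACTION_VOLUME_UL
mastermix_final_x = MASTERMIX_STOCK_X * MASTERMIX_VOLUME_UL / REACTION_VOLUME_UL
template_volume_ul = TEMPLATE_NG / TEMPLATE_STOCK_NG_PER_UL

# Thermocycle: 2 min initial denaturation, 40 cycles of 30 s + 30 s + 20 s,
# and a 6 min final elongation (ramping time not included).
CYCLES = 40
hold_seconds = 2 * 60 + CYCLES * (30 + 30 + 20) + 6 * 60
hold_minutes = hold_seconds / 60
```

Under these assumptions each primer is at 1 μM, the master mix is at 1×, the template occupies 2 μl, and the programmed holds add up to 3,680 s (a little over 61 min).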
Search of novel host-specific IVSs by NGS
To search for single-host-associated IVSs, based on IVS polymorphisms, among those linked to multiple host species, the following NGS approach was applied. Sequencing was performed on the Illumina® MiSeq platform with a 2 × 250 nt run at the DNA Core Facility of the University of Missouri in Columbia, Missouri. Briefly, the DNA containing possible IVSs was generated through PCR amplifications (designated as Illumina® PCRs) using the Illumina® primers with the purified IVS amplicons as the DNA templates. The Illumina® primers consisted of a pair of Illumina® universal forward/reverse adapters, a pair of the universal forward/reverse binding primers, and a pair of the host-specific IVS forward/reverse primers. A unique barcode sequence was incorporated into each Illumina® reverse primer so that all different Illumina® PCR amplifications were coded. The Illumina® PCR was performed using the conditions detailed in section Verification of the host specificity of the IVSs by PCR assays, except that the TA was 55°C and the elongation time was 60 sec. The resulting Illumina® PCR amplicons were purified with the GelElute Extraction Kit and then with Agencourt AMPure XP beads (Beckman Coulter, Inc., Pasadena, CA), following the manufacturers’ protocols. Purified DNA samples were quantified with a Qubit® Fluorometer (Thermo Fisher Scientific, Waltham, MA). The samples were combined in equimolar amounts to form a final 10 nM multiplex pool and were submitted for sequencing on the Illumina® MiSeq platform. The resulting raw reads were used to search for novel host-specific IVSs, as illustrated in Figure 1.
Briefly, the raw sequences were pre-processed using Trimmomatic to remove low-quality reads and trim any sequencing adaptors. PCR duplicates and sequences containing no Illumina® primers and/or with an ambiguous read were removed. The resulting sequences were aligned, and the identical sequences were clustered and identified as one read type. Only the read types accounting for 0.1% or more of the read population were compared to search for single-host-associated IVSs. Only host-specific IVSs longer than 70 nt were further examined for their host specificities in silico, as described in section In silico analyses of host-specific IVSs within the 16S rDNA of the fecal bacteria, and by PCR, as explained in section Verification of the host specificity of the IVSs by PCR assays.
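Combining amplicons in equal molar amounts requires converting each fluorometric mass concentration into a molar one. The sketch below uses the common approximation of about 660 g/mol per base pair of double-stranded DNA; the paper does not state how the conversion was done, so the formula choice, the function names, and the sample readings are assumptions made for illustration.

```python
AVG_BP_MASS_G_PER_MOL = 660.0  # approximate mass of one double-stranded base pair


def ng_per_ul_to_nm(conc_ng_per_ul, length_bp):
    """Convert a dsDNA concentration from ng/ul to nM for a given amplicon length."""
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS_G_PER_MOL * length_bp)


def equimolar_volumes_ul(concs_nm, fmol_per_sample):
    """Volume of each sample (ul) that contributes the same number of fmol.

    1 nM equals 1 fmol/ul, so volume = fmol / nM.
    """
    return {name: fmol_per_sample / nm for name, nm in concs_nm.items()}


# Hypothetical readings (ng/ul) and amplicon lengths (bp), for illustration only.
samples = {"sample_A": (6.6, 200), "sample_B": (13.2, 200)}
concs_nm = {name: ng_per_ul_to_nm(c, n) for name, (c, n) in samples.items()}
volumes = equimolar_volumes_ul(concs_nm, fmol_per_sample=100.0)
```

In this toy case sample_A is about 50 nM and sample_B about 100 nM, so twice the volume of sample_A is needed to contribute the same molar amount. The pool would then be diluted to the 10 nM target.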
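The clustering and filtering steps above can be illustrated with a minimal Python sketch: identical sequences are collapsed into read types, types below 0.1% of the read population are dropped, and the surviving types are compared across hosts. The function names and the toy reads are hypothetical, and the sketch omits the quality trimming, primer checks, and alignment that preceded this step in the study.

```python
from collections import Counter

MIN_FRACTION = 0.001  # read types must account for at least 0.1% of reads


def read_types_above_threshold(reads, min_fraction=MIN_FRACTION):
    """Collapse identical reads into read types and keep the abundant ones.

    Returns a dict mapping each retained sequence to its fraction of all reads.
    """
    counts = Counter(reads)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        seq: n / total
        for seq, n in counts.most_common()
        if n / total >= min_fraction
    }


def host_unique_types(types_by_host, host):
    """Read types retained for `host` that are absent from every other host."""
    others = set().union(
        *(set(types) for h, types in types_by_host.items() if h != host)
    )
    return set(types_by_host[host]) - others


# Toy reads for illustration only (not real IVS sequences).
beef_reads = ["ACGT"] * 1990 + ["ACGA"] * 9 + ["TTTT"] * 1
pig_reads = ["ACGT"] * 1000
types_by_host = {
    "beef cattle": read_types_above_threshold(beef_reads),
    "pig": read_types_above_threshold(pig_reads),
}
beef_only = host_unique_types(types_by_host, "beef cattle")
```

Here the single "TTTT" read (0.05% of the beef cattle reads) falls below the threshold, while "ACGA" (0.45%) is retained and is the only read type unique to beef cattle.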
The beef cattle-specific PCR assay
Based on the sequence of IVS_BacBC (Figure 2) identified by NGS followed by the multiple sequence alignment comparisons, a PCR assay specific to beef cattle feces was developed. The forward primer is Bac_BCF: 5’-GTAAAGCGTGCCGAAGACTG-3’; the reverse primer, Bac_BCR: 5’-TATCGGGGACTTGTAAGCCG-3’. The thermocycle conditions were those used in section Verification of the host specificity of the IVSs by PCR assays, except that the TA was 55°C. The composite fecal DNA samples of beef cattle and non-cattle host species were used to verify this PCR assay.
•Dots indicate the missing nucleotides in IVS Bacteroides_1, which could be found in both beef cattle and pig feces.
•The capital letters in consensus sequences indicate the identical nucleotides, and the lowercase letters, the different ones.
•The underlined letters indicate the targeting sequence of Bacteroides_1 PCR primers.
•The letters in bold indicate the targeting sequences of IVS_BacBC PCR primers (IVS_BacBC could only be found in beef cattle feces).
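Basic properties of the two primers of the beef cattle-specific assay can be checked with a short script. The GC fraction is exact; the melting temperature uses the Wallace rule (2°C per A/T, 4°C per G/C), which is only a rough rule of thumb for short oligonucleotides and is not the method used in the study, where primers were designed with Primer-BLAST and the annealing temperature was set empirically.

```python
PRIMERS = {
    "Bac_BCF": "GTAAAGCGTGCCGAAGACTG",
    "Bac_BCR": "TATCGGGGACTTGTAAGCCG",
}


def gc_fraction(seq):
    """Fraction of G and C bases in an oligonucleotide."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rough melting temperature (deg C) by the Wallace rule: 2(A+T) + 4(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


summary = {
    name: {"length": len(seq), "gc": gc_fraction(seq), "tm": wallace_tm(seq)}
    for name, seq in PRIMERS.items()
}
```

Both primers are 20 nt long with 55% GC, and the Wallace estimate is 62°C for each, so the pair is well balanced by this crude measure.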
In silico analyses of ribosomal IVSs in fecal bacteria
A total of 406,466 aligned 16S rDNA sequences from 73 bacterial genera were compiled from the RDP database. In order from the highest to the lowest frequency (Table 1), the 16S rDNA sequences were gathered from bacteria of the phyla Firmicutes, Bacteroidetes, Actinobacteria, Proteobacteria, Spirochaetes, and TM7. When the alignments were inspected manually in the BioEdit program, IVSs were found sporadically within variable region 1 (V1) of the 16S rDNA sequences from the genera Anaerovibrio, Bacteroides, Faecalibacterium, Mitsuokella, Peptostreptococcus, Phascolarctobacterium, and Subdoligranulum. Among them, 13 types of IVSs were identified as associated with fecal or intestinal samples from no more than three host sources, as detailed in Table 3.
|Name of IVS1:
sequence (length in bp)
|Dairy cattle||Japan and USA|
|Beef cattle and pig||Canada and USA|
|Beef cattle and pig||China, Korea, and USA|
|Faecalibacterium-1 (or IVS-p)2:
|Chicken and turkey||USA|
|Chicken and turkey||Australia, China, and USA|
|Chicken and turkey||China and USA|
|Beef cattle and pig||Korea and USA|
|Horse, human, and pig||Australia, China, India, Spain, and USA|
|Chicken and turkey||USA|
|Pig||Korea and USA|
Table 3: Characteristics of the host-specific IVSs identified in silico.
Using the aforementioned 13 types of IVSs as queries in a BLAST search against the GenBank database, IVS-containing DNA sequences were retrieved. All the retrieved IVS-containing sequences were 16S rDNA molecules, and each IVS was associated with only one genus of bacteria (Table 3). Single nucleotide polymorphisms (SNPs) were common among the retrieved IVSs, as indicated by nucleotide mismatches between the retrieved IVSs and the query IVSs (data not shown). In addition, IVSs found in the genera Anaerovibrio, Bacteroides, Faecalibacterium, and Mitsuokella appeared to have a wide geographic distribution, as these IVSs had been identified in more than one country, while the other IVSs had been reported in only one country at the time this study was conducted (Table 3).
Verification of host specificity of the IVSs
Based on the sequences of the 13 types of host-specific IVSs, 13 PCR primer pairs were designed, and the corresponding PCR assays were performed using composite fecal samples from target host species. Among the 13 primer pairs, nine produced amplicons of the expected size, while the remaining four failed to generate the expected amplicons at all annealing temperatures (45 to 65°C) tested (Table 2). The corresponding IVSs of the nine primer sets were then examined for their host specificities by PCR assays using composite fecal DNA samples from both target and non-target host sources, including beef cattle, chickens, dairy cattle, goats, horses, humans, pigs, sheep, and turkeys. The results demonstrated that the nine IVSs were only present in the fecal DNA of the target host source(s), which was in agreement with the results of the in silico analyses (Table 3).
Search for novel IVSs by NGS
Most NGS analyses generated 400,000 to 700,000 quality reads, which could be clustered into 1,000 to 10,000 read types. However, no quality reads were obtained for IVS Mitsuokella_1 against any of the associated host sources (horse, human, and pig) or for either Faecalibacterium_1 or Subdoligranulum_1 against the turkey source. The length and nucleotide distribution of reads derived from each type of IVS exhibited a high degree of nonuniformity, which was observed for each associated host source. Single nucleotide polymorphisms (SNPs) were ubiquitous throughout the sequences of the nine types of IVSs, but the mosaic nucleotide variations provided no power to discriminate the same type of IVS between or among different host sources. One exception was a unique read type of IVS Bacteroides_1, accounting for 0.42% of the read population (Table 4). The sequence was found in the feces of beef cattle but not in pig feces, the other source associated with Bacteroides_1 (Table 2). This unique IVS, significantly different from the IVS Bacteroides_1 (Figure 2), was designated IVS_BacBC (standing for an IVS from the genus Bacteroides that is specific to beef cattle). A BLAST search of GenBank with the IVS_BacBC sequence found no similar sequence, suggesting that the IVS is novel. A pair of PCR primers (IVS_BCF and IVS_BCR) was designed for IVS_BacBC, and the corresponding PCR assay was developed. The result of the PCR assay, using target and non-target fecal samples, confirmed IVS_BacBC to be specific to beef cattle feces.
|Target IVS||Source of fecal DNA||Number of valid reads||Number of IVS_BacBC reads||Number of Bacteroides_1 reads|
|Bacteroides_1||Beef cattle||620,942||2,608||618,334|
Table 4: Distributions of IVS Bacteroides_1 and IVS-BacBC determined by the NGS.
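The read counts in Table 4 can be checked against the 0.42% figure and the 0.1% read-type threshold with a few lines of arithmetic. The variable names below are illustrative; the numbers are those reported in the table.

```python
valid_reads = 620_942
ivs_bacbc_reads = 2_608
bacteroides_1_reads = 618_334

# The two read classes should account for all valid reads.
reads_sum_matches = ivs_bacbc_reads + bacteroides_1_reads == valid_reads

# Share of the read population made up by the IVS_BacBC read type.
ivs_bacbc_percent = 100 * ivs_bacbc_reads / valid_reads

# The read type must reach 0.1% of the population to be compared across hosts.
passes_threshold = ivs_bacbc_percent >= 0.1
```

The counts are internally consistent, and 2,608 of 620,942 reads is about 0.42%, roughly four times the 0.1% cutoff.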
This study investigated the potential of 16S rDNA IVSs from fecal bacteria as MST genetic markers. The 73 genera represented most, if not all, abundant bacteria in human and agriculturally important animal feces [27,34-37]. Thirteen types of IVSs were identified as specific to one, two, or three host sources (Tables 2 and 3). PCR assays were developed that detected nine of the 13 IVSs (Table 2). Derived from the nine IVSs and with the NGS approach, a novel IVS, IVS_BacBC, was found to be unique to the feces of beef cattle. Most of the host-specific IVSs found in this study could be detected in fecal samples from at least two countries, demonstrating their wide geographic distribution (Table 3). This suggests that these potential MST markers might be useful across a wide geographic range.
IVSs of 16S and 23S rDNA are collectively called ribosomal IVSs. They were first found in the 23S rDNA of Salmonella enterica Typhimurium and then in the 16S and 23S rDNA of other bacteria [39-42]. Ribosomal IVSs are excised from 16S or 23S rRNA by ribonucleases after transcription, resulting in rRNA fragmentation. The fragmentation, which is a bacterial response to living environments by adjusting rRNA levels, is believed to increase the rRNA degradation rate by creating more targets for ribonucleases. The presence of ribosomal IVSs might be an adaptation of microbes to their living environments, e.g., the intestinal tracts of their hosts, in the case of fecal microorganisms. In other words, IVSs might be host-adapted. In addition, IVSs contribute to the diversity of bacterial rDNA [24,44], thus providing a genetic basis for differentiating among the fecal microbes of different host sources. Therefore, ribosomal IVSs can potentially be used as host-specific genetic markers for MST.
We previously identified an IVS specific to the feces of poultry (chicken and turkey) within the 16S rDNA of the genus Faecalibacterium. This study identified 13 types of host-specific IVSs in seven genera. Surprisingly, although the bacteria of the genus Bacteroides are among the most abundant fecal microbes in many host species [25,37,45,46], only one host-specific IVS suitable for PCR detection was identified from this genus (Tables 2 and 3). IVSs of Faecalibacterium appeared to be more abundant and diverse. Three types of Faecalibacterium IVSs were identified as associated with poultry (chicken and turkey) feces, including the previously reported IVS-p, while one type of Faecalibacterium IVS was linked to the feces of beef cattle and pigs. The genus Faecalibacterium and the closely related genus Subdoligranulum account for 49% of the bacterial population in the cecum of chickens. For detecting poultry feces, IVSs of these two genera might be superior, in terms of abundance, to IVSs of fecal bacteria from other genera.
16S rDNA IVSs have been reported to be uncommon but widely distributed among bacterial genera. Our results support this conclusion: at least seven of the 73 genera of bacteria were found to contain 16S rDNA IVSs. However, the frequency of occurrence of IVSs in 16S rDNA might in fact be higher than what we observed, because smaller IVSs (< 70 bp) were not accounted for in this study and some large IVS-containing 16S rDNA sequences might have been excluded from the GenBank and RDP databases. The latter is due to the techniques used to obtain the 16S rDNA sequence data. Before the era of NGS, the data were commonly obtained through 16S rDNA cloning followed by DNA sequencing, where the larger-than-usual 16S rDNA would have had less chance of being cloned or was even intentionally excluded as non-16S rDNA. Although an IVS can be longer than 350 bp, our analysis did not find any IVSs larger than 200 bp, suggesting that large-IVS-containing 16S rDNA sequences are missing from the databases.
To better understand the diversity and occurrence of ribosomal IVSs in fecal bacteria, research beyond the publicly available data is needed.
Fecal pollution from humans, compared with other sources, poses the highest risk to human health because it can spread human diseases. In our study, three IVSs (Mitsuokella_1, Peptostreptococcus_1, and Peptostreptococcus_1) were identified in silico as associated with human feces. We were only able to develop PCR assays to detect the IVSs Mitsuokella_1 and Peptostreptococcus_1. However, IVS Mitsuokella_1 was found not only in human feces but also in horse and pig feces (Tables 2 and 3). Bacteria of Peptostreptococcus might be a good fecal indicator of a human source. It is a genus of anaerobic, Gram-positive, non-spore-forming bacteria that is unable to survive in an environment where oxygen is present and therefore can be used as an indicator of fresh fecal pollution. Furthermore, Peptostreptococcus is among the most abundant genera of human enteric microbiota. The current study presents only an initial investigation, however, and further experiments are needed to understand the full value of the IVS Peptostreptococcus_1 as a genetic marker for tracking human fecal pollution in the environment.
IVS_BacBC appeared to be a novel genetic marker for the detection of beef cattle feces. Further tests with fecal samples from wider geographic locations are needed to verify the value of this marker. Surprisingly, beef cattle shared the same types of IVSs more often with pigs than with dairy cattle. For example, the IVSs Anaerovibrio-3, Bacteroides-1, and Faecalibacterium-4 were found in both beef cattle and pigs but not in other animal species (Table 3). This result was in agreement with some previous research [25,37] reporting that beef cattle and pigs, but not dairy cattle, had similar intestinal microbiota. However, the cause of these observations remains unknown.
Our results demonstrated that some ribosomal IVSs may be used as genetic markers in MST and that large (>200 bp) IVSs were absent from public databases such as RDP and GenBank. This study suggests that ribosomal IVSs are a group of closely related sequences with varying host specificities. Our data suggest that the IVS Peptostreptococcus_1 and IVS_BacBC may be useful genetic markers for identification of human and beef cattle feces, respectively.
This research was financially supported by a USDA-NIFA Evans-Allen Grant (Project NO: MOX-Zheng, grant# 0223248, to Zheng).