Using 16S rRNA High-Throughput Methods

As a result of advancements in high-throughput technology, the sequencing of the pioneering 16S rRNA gene marker is gradually shedding light on the taxonomic characterization of the spectacular microbial diversity that inhabits the earth. 16S rRNA-based investigations of microbial environmental niches are currently conducted using several technologies, including large-scale clonal Sanger sequencing, oligonucleotide microarrays, and, particularly, 454 pyrosequencing that targets specific regions or is linked to barcoding strategies. Interestingly, the short read length


Introduction
Until recently, the vast majority of global microbial diversity was inaccessible or largely underestimated by culture-dependent methods, since the cultivated fraction of the 4-6 × 10^31 prokaryotic genomes moving around the biosphere (Whitman et al., 1998) is currently estimated to be 1% (Giovannoni and Stingl, 2005). However, the development of culture-independent methods and the commercialization of next-generation sequencing technology (Mardis, 2008; Rothberg and Leamon, 2008) have yielded powerful new tools in terms of time savings, cost effectiveness, and data production capability. These new tools allow for the gradual characterization of the unseen majority of environmental microbial communities. Microbial diversity has recently been explored in a great variety of environments, including soil (Roesch et al., 2007; Yergeau et al., 2008), sea (Huber et al., 2007; Sogin et al., 2006), air (Wilson et al., 2002), and, from a medical perspective, the human body, including the gastrointestinal tract (Andersson et al., 2008; Ley et al., 2006), oral cavity (Jenkinson and Lamont, 2005), vaginal tract (Zhou et al., 2004), and skin surface (Fierer et al., 2008). These microbial communities have been characterized in terms of community structure, composition, metabolic function, and ecological roles. Investigations of environmental microbial diversity have employed the 16S rRNA (16S) gene marker, which offers phylogenetic taxonomic classification without requiring isolation and cultivation. Although the use of the 16S phylogenetic marker is often criticized, due to its heterogeneity among operons of the same genome (Acinas et al., 2004) or its lack of resolution at the species level (Pontes et al., 2007), it is still considered a 'gold standard' for bacterial identification. The use of next-generation sequencing technology has increased the size of 16S sequence databases at an impressive speed (Tringe and Hugenholtz, 2008).
Supported by new high-throughput methods (454 pyrosequencing, PhyloChip microarrays) and strategies (barcoding), surveys of the 16S gene in the human microbiota attempt to provide a comprehensive picture of the community differences between healthy and diseased states. In this review, we focus on the 16S-gene-based characterization of microbial communities using clonal Sanger sequencing, phylogenetic oligonucleotide microarrays, and 454 pyrosequencing strategies as applied to medical research. Propelled by the launch of the Human Microbiome Project, 16S high-throughput methods show tremendous potential for identifying uncultivated or rare pathogenic agents, finding shifts in the bacterial community associated with disease states, understanding how microbiota are affected by environmental factors of the human host (diet, lifestyle, sex, age) (Fierer et al., 2008), and differentiating between a core human microbial community and inter-individual variability (Gao et al., 2007; Turnbaugh et al., 2008). These advances will contribute to a more comprehensive picture of both healthy and diseased states and will lead to the use of more appropriate medical treatments, such as targeted antibiotic therapy rather than the use of broad-range antibiotics.

Bacterial Taxonomic Classification

The 16S rRNA Gene: A Phylogenetic Marker
In the mid-1980s, major enhancements in bacterial typing and characterization of phylogenetic relationships were achieved, using new molecular approaches based on full-length sequencing of ribosomal genes. Pioneering work by Woese and colleagues (Woese, 1987) described bacterial rRNA genes as 'molecular clocks', due to their uncommon features such as universality, activity in cellular functions, and extremely conserved structure and nucleotide sequence. The three types of rRNA in prokaryotic ribosomes are classified as 23S, 16S, and 5S, according to their sedimentation rates, and have sequence lengths of about 3300, 1550, and 120 nucleotides, respectively (Rossello-Mora and Amann, 2001). Initially, microbial diversity studies involved sequencing the 5S rRNA gene obtained from environmental samples (Stahl et al., 1985). However, the relatively short 5S gene contains few phylogenetically informative sites, which limits its usefulness for taxonomic classification purposes. In addition, although the information content of the 23S rRNA gene is larger than that of the 16S gene, it is the 16S gene that has become a standard in bacterial taxonomic classification because it is more easily and rapidly sequenced (Spiegelman et al., 2005). It is widely accepted that a compelling classification of prokaryotes should be based on a 'polyphasic approach' that includes genomic, phenotypic, and phylogenetic information (Vandamme et al., 1996). However, most bacterial diversity surveys exclusively target the 16S gene in a single-step phylogenetic approach (Pace, 1997).
The 1550 base pairs of the 16S gene are a structural part of the 30S ribosomal small subunit (SSU) and consist of eight highly conserved regions (U1-U8) and nine variable regions (V1-V9) across the bacterial domain. As no lateral gene transfer seems to occur between 16S genes (Olsen et al., 1986) and as their structure contains both highly conserved and variable regions with different evolution rates, the relationships between 16S genes reflect evolutionary relationships between organisms. A comparison of 16S gene sequence similarities is usually used as the 'gold standard' for taxonomic identification at the species level. Although thresholds are arbitrary and controversial, a range of 0.5% to 1% sequence divergence is often used to delineate the species taxonomic rank (Clarridge III, 2004). Sequencing the 16S gene is currently the most common approach used in microbial classification as a result of its phylogenetic properties and the large number of 16S gene sequences available for comparison analyses.
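The divergence threshold mentioned above can be illustrated with a minimal sketch. It assumes that the two 16S sequences have already been aligned to the same length and that gapped columns are ignored; the function names and the 1% default are illustrative choices, not part of any cited tool.

```python
def percent_divergence(seq_a: str, seq_b: str) -> float:
    """Percent of mismatched columns between two pre-aligned sequences.

    Columns in which either sequence carries a gap ('-') are skipped
    (an assumption of this sketch, other conventions exist).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * mismatches / compared


def within_species_threshold(seq_a: str, seq_b: str, threshold: float = 1.0) -> bool:
    """True if the divergence does not exceed the chosen threshold (in percent)."""
    return percent_divergence(seq_a, seq_b) <= threshold
```

Because the threshold itself is arbitrary, it is left as a parameter rather than fixed in the code.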

16S Gene Sequence Databases
Accurate identification of organisms by comparative analysis of 16S gene sequences is strongly dependent on the quality of the database used. The curated Ribosomal Database Project (RDP-II, http://rdp.cme.msu.edu/) provides 623,174 bacterial and archaeal small subunit rRNA gene sequences in an aligned and annotated format and has achieved major improvements in the detection of sequence anomalies. Notably, among all of the online tools provided by the RDP-II web site, the RDP classifier tool has demonstrated effective taxonomic classification of short sequences produced by the new pyrosequencing technology. The Greengenes project (http://greengenes.lbl.gov/) offers annotated, chimera-checked, full-length 16S gene sequences in standard alignment formats (DeSantis et al., 2006) and has a particularly useful tool for 16S microarray design. The Silva project (http://www.arb-silva.de) (Pruesse et al., 2007) provides SSU as well as large subunit (LSU) rRNA sequences from Bacteria, Archaea, and Eukarya in a format that is fully compatible with the ARB package (Ludwig et al., 2004). The ARB package (www.arb-home.de) has been used in major 16S surveys (Eckburg et al., 2005; Ley et al., 2005; Ley et al., 2006; McKenna et al., 2008) and notably allows phylogenetic tree construction by insertion of partial or near-full sequences into a pre-established phylogenetic tree using a parsimony insertion tool. However, the lack of quality control of sequence entries (ragged sequence ends and outdated or faulty entries) in these major public sequence databases has led to the development of high-quality commercial databases, including MicroSeq 500 and the RIDOM Mycobacteria project (http://www.ridom-rdna.de) (Harmsen et al., 2002).
The MicroSeq 500 database targets the first 527-bp fragment of the 16S gene and is able to identify most of the clinically important bacterial strains with ambiguous biochemical profiles (Woo et al., 2003). The Ribosomal Differentiation of Medical Microorganisms (RIDOM) database targets the 5' end of the 16S sequence and is dedicated to Mycobacteria family analyses. However, although these commercial databases are continually expanding, the current total number of 16S entries remains uncertain, and the representation of taxonomic divisions is limited.
J Comput Sci Syst Biol, Volume 2(1): 074-092 (2009). ISSN: 0974-7230. JCSB, an open access journal.

Measures of Microbial Diversity
The assessment of microbial diversity in a natural environment involves two aspects, species richness (number of species present in a sample) and species evenness (distribution of relative abundance of species) (Magurran, 2005). In order to estimate species richness, researchers widely rely on the assignment of 16S sequences into Operational Taxonomic Unit (OTU or phylotype) clusters, for instance, as performed by DOTUR (Schloss and Handelsman, 2005). The criterion used to define an OTU at the species level is the percentage of nucleotide sequence divergence; the cutoff value is 1%, 3%, or 5%, depending on the study. As a result of these inconsistencies, reliable statistical comparisons or descriptions of species richness across studies are restricted (Martin, 2002). The total community diversity of a single environment, or the α-diversity, is often represented by rarefaction curves. These curves plot the cumulative number of OTUs or phylotypes captured as a function of sampling effort and, therefore, indicate only the OTU richness observed in a given set of samples (Eckburg et al., 2005). In contrast, nonparametric methods, including Chao1 or ACE, are richness estimators of overall α-diversity (Roesch et al., 2007). In addition, quantitative methods such as the Shannon or Simpson indices measure the evenness of the α-diversity. However, although these estimators can describe the diversity of the microbiota associated with a healthy or diseased state, they are not informative of the (phylo)genetic diversity of an environmental sample (Martin, 2002).
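As a rough illustration of the α-diversity measures named above, the following sketch computes an expected rarefaction value, the bias-corrected form of Chao1, and the Shannon and Simpson indices from a list of per-OTU read counts. The function names are hypothetical, the Simpson index is reported here as 1 - Σp², and the rarefaction uses the standard hypergeometric expectation rather than repeated random subsampling; dedicated packages should be preferred for real analyses.

```python
import math


def _checked_total(counts):
    """Sum of counts, rejecting empty or all-zero samples."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample contains no reads")
    return total


def rarefaction(counts, depth):
    """Expected number of OTUs observed when `depth` reads are drawn
    without replacement from the sample."""
    total = _checked_total(counts)
    if not 0 <= depth <= total:
        raise ValueError("depth must lie between 0 and the number of reads")
    denom = math.comb(total, depth)
    return sum(1 - math.comb(total - c, depth) / denom for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1))."""
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


def shannon(counts):
    """Shannon index H = -sum(p * ln p) over OTUs with at least one read."""
    total = _checked_total(counts)
    return sum(-(c / total) * math.log(c / total) for c in counts if c > 0)


def simpson(counts):
    """Simpson diversity reported as 1 - sum(p^2)."""
    total = _checked_total(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


if __name__ == "__main__":
    otu_counts = [10, 5, 2, 1, 1, 1]  # illustrative read counts per OTU
    curve = [rarefaction(otu_counts, n) for n in range(0, sum(otu_counts) + 1, 5)]
    print(curve, chao1(otu_counts), shannon(otu_counts), simpson(otu_counts))
```

Plotting the values returned by `rarefaction` against the sampling depth gives the rarefaction curve described in the text.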
In contrast to the α-diversity, the β-diversity measure offers a community structure comparison (taxon composition and relative abundance) between two or more environmental samples. For instance, β-diversity indices can compare similarities and differences in microbial communities in healthy and diseased states. A broad range of qualitative (presence/absence of taxa) and quantitative (taxon abundance) measures of community distance are available using several tools, including LIBSHUFF, P-test, TreeClimber (Schloss and Handelsman, 2006b), SONS (Schloss and Handelsman, 2006a), DPCoA, or UniFrac (Lozupone et al., 2006); these methods have been thoroughly detailed in a previous review (Lozupone and Knight, 2008). For example, the robust unweighted UniFrac tool measures the phylogenetic distance between two communities as the fraction of phylogenetic tree branch length that leads to descendants from either one community or the other, but not both. While unweighted UniFrac efficiently detects differences in the presence or absence of bacterial lineages, the recently developed weighted UniFrac is the quantitative version of the original UniFrac and provides an efficient detection of differences in the relative abundance of bacterial lineages.
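The unweighted UniFrac definition given above can be sketched on a toy tree. The tree encoding (a dictionary mapping each node to its parent and to the length of the branch joining them), the node names, and the function name are assumptions made for this illustration; this is not the UniFrac software itself.

```python
def unweighted_unifrac(tree, community_a, community_b):
    """Fraction of branch length leading to only one of two communities.

    `tree` maps node -> (parent, branch_length); the root has parent None.
    Communities are collections of leaf names present in each sample.
    """
    def branches_above(leaves):
        # Each node stands for the branch connecting it to its parent.
        covered = set()
        for leaf in leaves:
            node = leaf
            while tree[node][0] is not None and node not in covered:
                covered.add(node)
                node = tree[node][0]
        return covered

    def length(nodes):
        return sum(tree[n][1] for n in nodes)

    a = branches_above(community_a)
    b = branches_above(community_b)
    total = length(a | b)
    if total == 0:
        raise ValueError("no branch length covered by either community")
    return length(a ^ b) / total


if __name__ == "__main__":
    # Hypothetical four-leaf tree: ((t1, t2)X, (t3, t4)Y)root, all branches 1.0
    toy_tree = {
        "root": (None, 0.0),
        "X": ("root", 1.0), "Y": ("root", 1.0),
        "t1": ("X", 1.0), "t2": ("X", 1.0),
        "t3": ("Y", 1.0), "t4": ("Y", 1.0),
    }
    print(unweighted_unifrac(toy_tree, {"t1", "t3"}, {"t2", "t3"}))
```

A weighted variant would additionally scale each branch by the difference in relative abundance of the sequences beneath it, which is omitted here.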

16S Clonal Sanger Sequencing
Until the appearance over the last two decades of sequencing-by-synthesis methods (Ronaghi et al., 1996) such as those used in pyrosequencing, the Sanger sequencing method (Sanger et al., 1977) was the cornerstone of DNA sequence production. The Sanger method is based on DNA synthesis on a single-stranded template and di-deoxy chain-termination (Hall, 2007; Pettersson et al., 2008). Improvements in cost-effectiveness and the development of high-throughput techniques (e.g., fluorescent-labeled terminators, capillary separation, template preparation) (Hunkapiller et al., 1991) have enabled direct sequencing of clones, without laborious prior screening by restriction analysis. As a result, Sanger sequencing has produced the earliest in-depth analyses of microbial communities (Eckburg et al., 2005; Ley et al., 2006; Turnbaugh et al., 2006; Zhou et al., 2004). All 16S genes in a sample are amplified using a universally conserved primer pair that targets most of the species that have been sequenced and deposited in the ribosomal databases. After the cloning of amplified PCR products into specific vectors, the inserts are sequenced. Sanger sequencing offers high phylogenetic resolution power, as the method yields the longest sequencing read available, up to 1000 base pairs (bp) (Shendure and Ji, 2008).

16S Microarray-based Approach
The DNA microarray is a powerful technology that can simultaneously detect thousands of genes on a single glass slide or silicon surface (Gentry et al., 2006). Mainly used in gene expression profiling (DeRisi et al., 1997; Schena et al., 1996), DNA microarray technology has also been employed in bacterial identification and, more recently, has been adapted for exploring microbial community diversity in environmental niches. Microarrays used for microbial identification rely on the 16S gene and use short 20- to 70-mer oligonucleotide probes (Bae and Park, 2006) for multi-species detection. These are referred to as phylogenetic oligonucleotide microarrays (POAs).
Due to the phenomenal microbial diversity that might be present in an environmental sample and the lack of prior knowledge of its population composition, the oligonucleotide probe design strategy has been modified for use in community diversity analysis. Instead of using one unique probe that targets a specific region of one taxon, multiple probe sets are employed to target microorganisms at different taxonomic levels. An efficient probe design based on a hierarchical phylogenetic framework was established in 2003, using a database of curated and aligned 16S sequences known as ProkMSA. High-throughput microorganism detection by microarray technology was first achieved with high-density photolithography microarrays using 31,179 20-mer oligonucleotide probes in a deep air investigation (Wilson et al., 2002). These high-density microarrays present an alternative to clone library sequencing, since they can more deeply assess microbial diversity (DeSantis et al., 2007) without a cloning bias. With advancements in technology, the PhyloChip platform, an Affymetrix microarray product, was developed by the Lawrence Berkeley National Laboratory. The PhyloChip has rapidly identified up to 8,900 distinct environmental microorganisms at different taxonomy levels from soil (Yergeau et al., 2008), air, or uranium-contaminated site samples (Brodie et al., 2006) in a single experimental run. The PhyloChip is a glass slide with a small surface area, containing a high-density microarray of hundreds of immobilized oligonucleotides (15- to 25-mer length). The method employs parallel hybridization reactions, using a flow of fluorescently labeled DNA targets. The microarray slide is analyzed with a fluorescence microscope equipped with a cooled CCD camera.

Figure 1: The primer for the sequencing step is hybridized to a single-stranded DNA template and incubated with the enzymes DNA polymerase, ATP sulfurylase, luciferase, and apyrase, and with the substrates. Deoxyribonucleotide triphosphates (dNTPs) are added, one at a time, to the pyrosequencing reaction. The incorporation of a nucleotide is accompanied by the release of pyrophosphate (PPi). The ATP sulfurylase quantitatively converts PPi to ATP. The light signal produced by the luciferase-catalyzed reaction in the presence of ATP is detected by a charge-coupled device (CCD) camera and integrated as a peak in a pyrogram. The nucleotide-degrading enzyme apyrase continuously degrades excess ATP and unincorporated dNTPs. The process continues with the addition of the next dNTP, and the nucleotide sequence of the complementary DNA strand is inferred from the signal peaks of the pyrogram.

Sequencing Chemistry
Pyrosequencing is a DNA sequencing method (Clarke, 2005; Ronaghi, 2001; Ronaghi and Elahi, 2002) based on the sequencing-by-synthesis principle, which was first described in 1985 (Melamede and Wallace, 1985). This method relies on efficient detection of the sequential incorporation of natural nucleotides during DNA synthesis (Ronaghi et al., 1996; Ronaghi et al., 1998). The pyrosequencing technique includes four enzymes that are involved in a cascade reaction system (Figure 1).
During the reaction, the Klenow fragment of DNA polymerase I releases inorganic pyrophosphate molecules (PPi) upon the addition of one nucleotide to a primer hybridized to a single-stranded DNA template. The second reaction, catalyzed by ATP sulfurylase, produces ATP, using the released PPi as a substrate. The ATP molecules are then converted to a luminometric signal by the luciferase enzyme. Therefore, the light signal is detected only if a base pair is formed with the DNA template, and the signal strength is proportional to the number of nucleotides incorporated in a single nucleotide flow. Finally, the unincorporated nucleotides and excess ATP are degraded between base additions by a nucleotide-degrading enzyme such as apyrase (Ronaghi et al., 1998); at this point, another dNTP is added and a new cycle begins. The earliest attempts at pyrosequencing were performed using the PSQ96 system (Biotage AB, Uppsala, Sweden) and targeted the short 16S variable regions V1 and V3 (Tarnberg et al., 2002) or the human pathogen H. pylori (Hjalmarsson et al., 2004). This system produced reads with an average length of around 20-40 bases.
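Because the signal strength is proportional to the number of nucleotides incorporated per flow, a pyrogram can in principle be converted back to a sequence by rounding each peak to an integer count. The sketch below assumes an idealized, already normalized signal in which one incorporation gives a peak of about 1.0; the flow order, the signal values, and the function name are illustrative and do not reproduce any instrument's base-calling software.

```python
from itertools import cycle


def decode_pyrogram(flow_cycle, signals, unit=1.0):
    """Infer the synthesized strand from per-flow light signals.

    flow_cycle: order in which dNTPs are dispensed, repeated cyclically
                (for example "TACG").
    signals:    one normalized peak height per nucleotide flow.
    unit:       signal expected for a single incorporation.
    """
    if unit <= 0:
        raise ValueError("unit must be positive")
    bases = []
    for base, signal in zip(cycle(flow_cycle), signals):
        # Peak height ~ homopolymer length; noise below half a unit rounds to 0.
        incorporated = max(0, round(signal / unit))
        bases.append(base * incorporated)
    return "".join(bases)


if __name__ == "__main__":
    peaks = [1.02, 0.05, 2.1, 0.0, 0.96, 1.1, 0.03, 2.9]  # illustrative values
    print(decode_pyrogram("TACG", peaks))
```

The rounding step is where long homopolymer runs become hard to call, since the difference between consecutive integer counts shrinks relative to the noise as the peak grows.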

The 454 Life Sciences Pyrosequencing Platform
Margulies et al. (2005) first described a highly parallel sequencing platform (GS 20, 454 Life Sciences) using a pyrosequencing protocol optimized for solid support. The authors demonstrated the ability of the system to assemble complete genomes (Mycoplasma genitalium and Streptococcus pneumoniae) from short sequencing reads (Margulies et al., 2005). In 454 pyrosequencing, the DNA template is fragmented, and the resulting fragments are individually immobilized onto a bead by limiting dilution. Emulsion PCR is performed for the DNA amplification step, in which each DNA fragment is independently confined into a droplet of oil and water containing PCR reagents. The clonally amplified fragments are distributed into a picotiter plate, which contains ~1.6 million picoliter wells, with a well diameter allowing one bead per well.
Using the pyrosequencing protocol previously described, the chemiluminescent signal obtained from an incorporated nucleotide is recorded by a charge-coupled device (CCD) camera, and data analysis, such as image processing or de novo sequence/genome assembly, is performed with provided bioinformatics tools. Other massively parallel platforms

16S Pyrosequencing Strategies

Targeting Specific Regions
Due to the short sequencing read length generated by pyrosequencing technology (e.g., 100 bp for GS 20 and 200 bp for GS FLX) and due to the small amount of nucleotide variability in the 16S gene throughout the bacterial domain, full 16S gene sequence assembly and the taxonomic assignment of species present in a mixed microbial sample remain a computational challenge (Armougom and Raoult, 2008). One strategy for addressing this problem consists of targeting a specific variable 16S region that exhibits a sufficient phylogenetic signal to be accurately assigned at the genus level or below. Surprisingly, short sequence fragments (including those of 100 bases) can provide substantial phylogenetic resolution. By choosing appropriate 16S regions and simulating the read length obtained with GS FLX (250 bases), Liu et al. (2007) reproduced the same results obtained from full-length 16S sequences using the UniFrac clustering tool. The authors suggested that the F8-R357 primer set, which amplifies a sequence spanning the V2 and V3 hypervariable regions and generates a 250-bp amplicon, could be the optimal primer set for exploring microbial diversity using 16S 454 pyrosequencing with GS FLX. These primers were also used in a study of the macaque gut microbiome. Whereas Liu et al. (2007) laud the use of the F8-R357 primer pair, the study that introduced the concept of tag pyrosequencing was performed using the V6 hypervariable region amplified from deep water samples of the North Atlantic (Sogin et al., 2006). Based on the Shannon entropy measure, the 16S gene shows high variability in the V6 region. This region was selected in an analysis of the human gut microbiome combined with a barcoding strategy (Andersson et al., 2008).
In addition, in their recent study of a gut microbial community, Huse et al. (2008) showed that the use of tags of the V3 and V6 regions in 454 pyrosequencing notably provides taxonomic assignments equivalent to those obtained from full-length sequences generated using clonal Sanger sequencing.
The taxonomic assignment of pyrosequencing reads is sensitive to the classification method employed (Liu et al., 2008). Wang et al. (2007) showed a classification accuracy map of their RDP classifier tool along the 16S sequence position, which revealed that the variable V2 and V4 regions of 16S provided the best taxonomic assignment results at the genus level. No true consensus seems to emerge among the 16S 454 pyrosequencing studies that target variable regions (Table 1). However, this is not surprising since the power of phylogenetic resolution of variable 16S regions might differ depending on the taxa present in the microbial community studied; these results may also be due to the under-representation of 16S sequences of certain environments in the reference databases.
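The Shannon entropy measure used above to rank the variability of 16S positions can be sketched as a per-column calculation over a multiple sequence alignment. The toy alignment, the choice to ignore gap characters, and the function names are assumptions of this example.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the residues in one alignment column.

    Gap characters ('-') are ignored; an all-gap column scores 0.
    """
    counts = Counter(ch for ch in column.upper() if ch != "-")
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def entropy_profile(alignment):
    """Entropy of every column of a list of equal-length aligned sequences."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("aligned sequences must share one length")
    return [column_entropy("".join(col)) for col in zip(*alignment)]


if __name__ == "__main__":
    toy_alignment = ["ACGT", "ACGA", "ACCC", "ACTG"]  # illustrative only
    print(entropy_profile(toy_alignment))
```

Conserved columns score 0 bits and a column with all four nucleotides equally represented scores 2 bits, so averaging the profile over a window highlights hypervariable stretches.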

16S Barcoding Strategy
The currently available 454 GS FLX pyrosequencer can accommodate a maximum of sixteen independent samples, since a picotiter plate contains sixteen physically separated regions.
To overcome this limitation and expand the capacity by pooling DNA from independent samples in a single sequencing run, a barcoding approach was developed, which associates a short unique DNA sequence tag (barcode) with each DNA template origin. In contrast with Sanger sequencing, pyrosequencing technology such as GS FLX generates sequencing reads from the first position of the DNA template fragment. Therefore, the sequencing reaction driven by an oligonucleotide that is complementary to adaptor A and B first reads the barcode sequence, allowing the identification of the sample of origin.
Finally, the efficiency of the barcoding strategy was investigated using a set of eight-nucleotide barcodes based on error-correcting codes (called Hamming codes). The ability to detect sequencing errors that change sample assignments and to correct errors in the barcodes was evaluated at 92% (Hammady et al., 2008).
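The way a barcode read at the start of each sequence allows pooled samples to be separated can be sketched as follows. This is a simple nearest-barcode assignment with a mismatch tolerance, not the Hamming-code construction of Hammady et al. (2008); the barcodes, sample names, and function names are hypothetical, and the barcodes are assumed to differ from one another at three or more positions so that a single error can be corrected unambiguously.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(read, barcodes, max_mismatches=1):
    """Return (sample, insert) for a read, or (None, read) if unassignable.

    barcodes: dict mapping barcode sequence -> sample name; all barcodes
              are assumed to have the same length.
    """
    length = len(next(iter(barcodes)))
    tag, insert = read[:length], read[length:]
    if len(tag) < length:
        return None, read
    scored = sorted((hamming_distance(tag, bc), bc) for bc in barcodes)
    best_distance, best_barcode = scored[0]
    ambiguous = len(scored) > 1 and scored[1][0] == best_distance
    if best_distance > max_mismatches or ambiguous:
        return None, read
    return barcodes[best_barcode], insert


if __name__ == "__main__":
    sample_barcodes = {  # hypothetical 8-nt barcodes
        "ACGTACGT": "sample_1",
        "TGCATGCA": "sample_2",
        "GATCGATC": "sample_3",
    }
    print(assign_sample("ACGAACGTTTGACC", sample_barcodes))
```

Reads whose tag is too far from every barcode, or equally close to two of them, are left unassigned rather than guessed.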

Taxonomy Assignment of Short Pyrosequencing Reads
Taxonomy assignment using standard phylogenetic methods such as likelihood- or parsimony-based tree construction is impractical, given the large amount of sequence data (400,000 reads for GS FLX Titanium) generated by high-throughput pyrosequencing (Liu et al., 2008). In addition, due to the short read length resulting from pyrosequencing (~100 bp for GS 20 and 250 bp for GS FLX), full 16S sequence assembly is a computational challenge, especially for closely related species in a mixed bacterial sample. Therefore, current taxonomy classification tools such as the naïve Bayesian RDP-II classifier or the Greengenes classifier (DeSantis et al., 2006) employ rapid but approximate methods. In contrast to the RDP-II classifier, the Greengenes classifier requires a pre-computed alignment for its taxonomic classification. While tree-based methods are subject to large variations in assignment accuracy according to the DNA region examined (Liu et al., 2008), the tree-independent methods used in the RDP-II and Greengenes classification tools yield stable and accurate taxonomy assignment results. However, although these methods provide satisfactory assignment results at the genus level, they have a limited resolution power at the species level. This limitation could be reduced, but not yet solved, by the 400- to 500-base read length generated by the new GS Titanium pyrosequencing platform.
Pyrosequencing read classification based on a sequence similarity search using the BLAST algorithm can yield reliable results; however, the closest match is not inevitably the nearest phylogenetic neighbor (Koski and Golding, 2001). Sundquist et al. (2007) recently proposed a method based on a similarity search with BLAT, a BLAST-like alignment tool (Kent, 2002), and inferred specific phylogeny placement using a preliminary set of best BLAT match scores. Finally, a tag-mapping methodology has been recently introduced with GAST (Global Alignment for Sequence Taxonomy), which is a taxonomic assignment tool combining BLAST, multiple sequence alignment, and global distance measures. GAST has been used to identify taxa present in deep sea vent and human gut samples (Sogin et al., 2006).
The accuracy rate of classification methods of short 16S rRNA sequences is measured as the percentage of sequences correctly classified from a representative set of bacterial sequences of known classification. This accuracy rate can be associated with the misclassified sequence rate. Overall, the simulation tests showed efficient classification down to the genus level. Simulating 200-base segments (such as generated by the GS FLX), the RDP-II classifier tool indicated an overall taxonomic assignment accuracy of 83.2% at the genus level. Likewise, using the methodology of Sundquist and co-workers (Sundquist et al., 2007), the simulation of read resolution for 200-base segments in diverse and representative samples yielded an accuracy rate of around 80%, similar to that obtained for the RDP-II classifier. However, while the benchmark of the Sundquist method identified the V1 and the V2 regions of the 16S as the best targets for the pyrosequencing of 100 bases (such as generated by the GS 20), the benchmark of the RDP-II classifier method identified the V2 and the V4 regions. Finally, the comparison of the classification methods is rather difficult since the representative bacterial sequences of known classification used for the benchmarking are different for each method. A collection of reference sequences of known classification (defined as a gold standard) is required for classification method comparisons. For instance, collections of reference sequence alignments are generally used for the benchmarking of multiple sequence alignment methods (Armougom et al., 2006).
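The idea of a rapid, alignment-free naïve Bayesian assignment can be illustrated with a toy k-mer classifier. This is a sketch in the spirit of such tools, not the RDP-II implementation: the class name, the word size of 4 (chosen only so that the toy data fit), the smoothing scheme, and the training sequences are all assumptions of this example.

```python
import math
from collections import defaultdict


def kmers(seq, k):
    """Set of distinct overlapping words of length k in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


class KmerNaiveBayes:
    """Toy genus-level classifier based on k-mer presence."""

    def __init__(self, k=4):
        self.k = k
        self.n_seqs = 0
        self.genus_sizes = defaultdict(int)   # genus -> training sequences
        self.word_total = defaultdict(int)    # word -> sequences containing it
        self.word_in_genus = defaultdict(lambda: defaultdict(int))

    def train(self, labelled):
        """labelled: iterable of (genus, sequence) pairs of known taxonomy."""
        for genus, seq in labelled:
            self.n_seqs += 1
            self.genus_sizes[genus] += 1
            for word in kmers(seq, self.k):
                self.word_total[word] += 1
                self.word_in_genus[genus][word] += 1

    def classify(self, read):
        """Return the genus with the highest summed log word probability."""
        if not self.genus_sizes:
            raise ValueError("classifier has not been trained")
        words = kmers(read, self.k)
        best_genus, best_score = None, -math.inf
        for genus, size in self.genus_sizes.items():
            score = 0.0
            for word in words:
                # Smoothed prior keeps unseen words from giving log(0).
                prior = (self.word_total.get(word, 0) + 0.5) / (self.n_seqs + 1)
                seen = self.word_in_genus[genus].get(word, 0)
                score += math.log((seen + prior) / (size + 1))
            if score > best_score:
                best_genus, best_score = genus, score
        return best_genus


if __name__ == "__main__":
    model = KmerNaiveBayes(k=4)
    model.train([("GenusA", "ACGTACGTACGTACGT"), ("GenusB", "GGGCCCGGGCCCGGGC")])
    print(model.classify("ACGTACGT"))
```

No alignment or tree is built at any point, which is what makes this family of methods fast enough for hundreds of thousands of short reads; the price is the limited species-level resolution noted above.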

Metagenomic Approach
Metagenomics is a culture-independent genomic analysis of entire microbial communities inhabiting a particular niche, such as the human gut (Riesenfeld et al., 2004; Schmeisser et al., 2007). Metagenomic investigations aimed at finding "who's there and what are they doing" (Board On Life Sciences, 2007) are providing new insight into the genetic variability and metabolic capabilities of unknown or uncultured microorganisms (Turnbaugh et al., 2006). The inclusion of the metagenomic approach in this review was required, since metagenomic studies include the analysis of the 16S sequences contained in metagenome data, in order to identify community composition and determine bacterial relative abundance (Biddle et al., 2008; Edwards et al., 2006; Krause et al., 2008; Wegley et al., 2007). However, we distinguished the metagenomic approach from 16S high-throughput methods, as the principal purpose of metagenomics is to explore the entire gene content of metagenomes for metabolic pathways and to understand microbial community interactions (Tringe et al., 2005), rather than targeting a single gene such as 16S. In contrast with some examples in the scientific literature, and because they can only answer "who is there", 454 pyrosequencing investigations based on the 16S gene should not be regarded as metagenomic studies.
Before the commercialization of the 454 pyrosequencing platform, microbial community sequencing in early metagenomic studies, such as in the Sargasso Sea (Venter et al., 2004) or in acid mine drainage (Edwards et al., 2000), involved preliminary clone library construction and capillary sequencing. This approach limited the expansion of microbial diversity knowledge due to cloning bias and the cost of capillary sequencing. In contrast, the inexpensive next-generation 454 pyrosequencing technology can perform direct sequencing without requiring preliminary PCR amplification or library construction. By excluding cloning and PCR bias, 454 sequencing revolutionized the metagenomic field by capturing up to 100% (depending on the quality of DNA extraction and environmental sampling) of the microbial diversity present in a sample.

Case Studies

The Human Microbiome Project
Launched by the NIH Roadmap for Medical Research, the Human Microbiome Project (HMP) seeks to comprehensively characterize the human microbiota in order to better understand its role in human health and disease states (http://nihroadmap.nih.gov/). The HMP mainly focuses on the gastrointestinal, oral, vaginal, and skin microbiota.

The Gastrointestinal Microbiota
Until recently, characterization of the gut microbiota diversity was restricted to culture-based methods (Simon and Gorbach, 1986). While the cultivable fraction is currently estimated to be 442 bacterial, 3 archaeal, and 17 eukaryotic species (Zoetendal et al., 2008), the species richness of the gut microbiota is estimated to be 15,000 or 36,000 bacterial species, depending on the similarity cut-off applied in OTU clustering (Frank et al., 2007). With the development of culture-independent methods, 16S gene surveys have deeply enhanced the microbial diversity map of the human gut microbiota (Eckburg et al., 2005; Gill et al., 2006; Ley et al., 2006; Palmer et al., 2007). A major large-scale 16S investigation by Eckburg et al. (2005) indicated that the gut of healthy individuals was mainly composed of the Bacteroidetes and Firmicutes bacterial phyla (83% of sequences). The sequences also included the archaeal Methanobrevibacter species, as well as a majority of uncultivated species and novel microorganisms (Eckburg et al., 2005). Surprisingly, at the phylum level, the bacterial diversity of the gut microbiota is low; only 8 of 70 known bacterial phyla are represented. Despite the predominance of Firmicutes and Bacteroidetes phylotypes, the gut microbiota displays a great inter-individual specificity in its composition, especially in newborn babies (Palmer et al., 2007). Within elderly populations, the Bacteroidetes proportion can decline (Woodmansey, 2007). Recently, a barcoded pyrosequencing study of the gut microbiota of six elderly individuals showed that Actinobacteria was the second most abundant phylum, not Bacteroidetes as expected (Andersson et al., 2008). Age, caloric intake, antibiotic treatment, and diet are a few environmental factors that can influence the gut microbiota composition and thus affect human health.
Through the comparison of lean and obese individuals, a possible relationship between obesity and the composition of (and changes in) gut microbiota has been investigated (Turnbaugh et al., 2008) and reviewed (DiBaise et al., 2008). Ley et al. (2006) showed that obese subjects have a higher Firmicutes/Bacteroidetes ratio than lean controls. By testing dietary factors, the authors demonstrated a shift in Bacteroidetes and Firmicutes relative abundance that correlated with weight loss. Reduced bacterial diversity and familial similarity of gut microbiota composition within obese individuals have also been recently reported (Turnbaugh et al., 2008).

The Oral Cavity Microbiota
Understanding the healthy oral cavity microflora is essential for the prevention of oral diseases and requires unambiguous identification of the microorganism(s) associated with pathology. For instance, it is accepted that Streptococcus mutans is the etiological agent in dental caries (Jenkinson and Lamont, 2005). Previously limited by the cloning and sequencing approach, the characterization of the diversity of the human oral microflora was radically enhanced by the first 16S 454 pyrosequencing of saliva and supragingival plaque (Keijser et al., 2008). Keijser et al. revealed 3,621 and 6,888 phylotypes in saliva and plaque samples, respectively, and estimated the total microbial species richness to be 19,000 (3% dissimilarity cut-off). Within the 22 phyla identified, the main taxa are Firmicutes (genera Streptococcus and Veillonella) and Bacteroidetes (genus Prevotella) in saliva, while Firmicutes and Actinobacteria (genera Corynebacterium and Actinomyces) are the most common in plaque.
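Phylotype counts such as those above depend on the cut-off used to group sequences. The sketch below shows this with a simple greedy, centroid-based clustering of pre-aligned, equal-length sequences. It is a toy illustration under stated assumptions: the cut-off is read as a fraction of mismatched positions (3% = 0.03), the function names and sequences are hypothetical, and this is not the clustering procedure used in the cited study.

```python
def mismatch_fraction(seq_a, seq_b):
    """Fraction of positions that differ between two aligned,
    equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(a != b for a, b in zip(seq_a, seq_b))
    return mismatches / len(seq_a)


def greedy_otus(sequences, cutoff):
    """Assign each sequence to the first OTU whose centroid is within
    `cutoff` dissimilarity; otherwise start a new OTU with it."""
    centroids = []
    clusters = []
    for seq in sequences:
        for index, centroid in enumerate(centroids):
            if mismatch_fraction(seq, centroid) <= cutoff:
                clusters[index].append(seq)
                break
        else:
            # No existing centroid is close enough: seq founds a new OTU.
            centroids.append(seq)
            clusters.append([seq])
    return clusters


# Hypothetical 100-position "reads" differing at 0, 2 and 5 positions.
reads = ["A" * 100, "CC" + "A" * 98, "CCCCC" + "A" * 95]

for cutoff in (0.01, 0.03, 0.10):
    print(cutoff, len(greedy_otus(reads, cutoff)))
```

The same three reads fall into three, two, or one OTU as the cut-off is relaxed, which is why richness estimates should always be quoted together with the cut-off applied.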

The Vaginal Microbiota
Because of its exposure to the external environment, the reproductive functions of the female genital tract can easily be affected. Previous surveys of the human vaginal microbiota proposed that the normal vaginal microbiota can act as a defense mechanism, playing an essential role in preventing infections such as bacterial vaginosis or sexually transmitted diseases in women. Zhou et al. (2004) first characterized the vaginal microbiota by 16S clone library sequencing. As found in culture-dependent studies, the authors showed that Lactobacilli and Atopobium were generally the predominant organisms; they also reported the first identification of a Megasphaera species in the vagina. By employing a full 16S pyrosequencing strategy and developing a classification method for short 16S pyrosequencing reads, Sundquist et al. (2007) studied the human vagina during pregnancy and corroborated previous results. The authors identified Lactobacillus as the dominant genus and detected a significant presence of other genera including Psychrobacter, Magnetobacterium, Prevotella, Bifidobacterium, and Veillonella (Sundquist et al., 2007). However, Lactobacillus can be missing in healthy vaginal microbiota and replaced by other predominant genera such as Gardnerella, Pseudomonas, or Streptococcus (Hyman et al., 2005). Although inter-individual variability of the vaginal microbiota was demonstrated, the function of the communities was conserved and was shown to be involved in the production of lactic acid (Zhou et al., 2004). The maintenance of vaginal acidity preserves an unfavorable environment for the growth of many pathogenic organisms (Zhou et al., 2004). Important shifts in the relative abundances and types of bacteria in the healthy vagina, especially a decrease in lactic acid bacteria, are associated with bacterial vaginosis (Spiegel, 1991).
Compared to the healthy vaginal microbiota, bacterial vaginosis-associated microbiota showed greater species richness, different bacterial community structures, and a strong association with members of the Bacteroidetes and Actinobacteria phyla (Oakley et al., 2008). To enhance the understanding of the important variations in the incidence of bacterial vaginosis among racial or ethnic groups, the vaginal microbiota of Caucasian and black women were explored. Striking differences were demonstrated in community abundance and also in composition; for instance, the predominance of Lactobacillus in black women is lower than in Caucasian women (Zhou et al., 2007).

The Skin Microbiota
The skin probably offers one of the largest human-associated habitats and has a bacterial density of around 10^7 cells per square centimeter (Fredricks, 2001). The commensal bacterial communities and pathogenic microorganisms harbored by the skin's surface suggest an association with healthy, infectious, or noninfectious (psoriasis, eczema) skin states (Grice et al., 2008). Until recently, because skin infections generally involve rare pathogenic isolates, the resident skin bacteria remained poorly described and limited to culture-dependent studies that under-represented the extent of bacterial diversity.
A recent 16S survey of the resident skin microbiota of the inner elbow region, an area prone to atopic dermatitis, from five healthy humans generated 5,373 nearly complete 16S rRNA sequences. These sequences were assigned into 113 phylotypes belonging to the Proteobacteria (49%), Actinobacteria (28%), Firmicutes (12%), Bacteroidetes (9.7%), Cyanobacteria (<1%), and Acidobacteria (<1%) phyla (Grice et al., 2008). Most of the 16S sequences (90%) belong to the Proteobacteria phylum and, more precisely, to the Pseudomonas and Janthinobacterium genera. Finally, the authors' results indicated a low level of deep evolutionary lineage diversity and a similar diversity profile for all of the subjects, suggesting a common core skin microbiota among healthy subjects (Grice et al., 2008). Interestingly, the few cultivated commensal skin bacteria, including S. epidermidis and P. acnes, accounted for only 5% of the microbiota captured in this study (Grice et al., 2008).
In contrast, a study examining the diversity of skin microbiota from the superficial forearms in six healthy subjects indicated a small core set of phylotypes (2.2%) and a high degree of inter-individual variability in the microbiota (Gao et al., 2007). The distribution of the 182 identified phylotypes at the phylum level was 29% Proteobacteria, 35% Actinobacteria, 24% Firmicutes, 8% Bacteroidetes, 1.6% Deinococcus-Thermus, 0.5% Thermomicrobia, and 0.5% Cyanobacteria (Gao et al., 2007). However, only the three most abundant phyla were observed in all subjects. Although many phylotypes overlap between the inner elbow and forearm skin microbiota, the predominant phylotype belonging to the Proteobacteria phylum of the inner elbow microbiota is missing in the forearm microbiota. In addition, the forearm skin microbiota possesses more members of the Actinobacteria and Firmicutes phyla (Grice et al., 2008). It is difficult, however, to compare studies that employed different methods and skin sample locations.
The characterization of skin microbial communities and their interactions is still in its infancy, since the impacts of sex, age, clothing, and other factors have not been clearly determined. However, a recent study on the reduction of disease transmission by hand washing used a barcoded pyrosequencing approach to characterize the hand surface microbiota of 51 healthy men and women to determine how specific factors could affect the community structure (Fierer et al., 2008). Although the authors detected a core set of bacterial taxa on the hand surface, the results mainly revealed a high intra- and inter-individual variation in community structure when sex, hand washing, or handedness factors were considered. In addition, though the diversity observed on hand surfaces is high (sequences from >25 phyla were identified), 94% of the sequences belong to only three of these phyla (Actinobacteria, Firmicutes, and Proteobacteria) (Fierer et al., 2008). Finally, independently of the skin site sampled (Fierer et al., 2008; Gao et al., 2007; Grice et al., 2008), all of the studies shared the same predominant phyla: Proteobacteria, Actinobacteria, and Firmicutes.

Limits of 16S Analyses
The 16S gene is an efficient phylogenetic marker for bacterial identification and microbial community analyses. However, the multiple pitfalls of PCR-based analyses, including sample collection, cell lysis, PCR amplification, and cloning, can affect the estimation of the community composition in mixed microbial samples (Farrelly et al., 1995;von et al., 1997).
Although one or two 16S gene copies are commonly exhibited by a single genome, multiple and heterogeneous 16S genes in a single microbial genome are not rare and can lead to an overestimation of the abundance and bacterial diversity using culture-independent approaches (Acinas et al., 2004). Multiple copies of rRNA operons (rrn) per genome are generally found in rapidly growing microorganisms, especially in soil bacteria (Klappenbach et al., 2000). As a result, the number of copies of the 16S gene in a microbial genome can reach 10 or 12 copies in Bacillus subtilis (Stewart et al., 1982) or Bacillus cereus (Johansen et al., 1996), respectively, and up to 15 copies in Clostridium paradoxum (Rainey et al., 1996). In addition to the heterogeneity of the 16S gene copy number per genome, a bacterial species can display important nucleotide sequence variability among its 16S genes (Acinas et al., 2004; Rainey et al., 1996; Turova et al., 2001). Furthermore, it is well known that 16S gene sequencing lacks taxonomic resolution at the species level for some closely related species (Janda and Abbott, 2007), subspecies, or recently diverged species (Fox et al., 1992). Likewise, PCR amplification bias in a mixed microbial sample causes the taxon amplicon abundance to differ from the real proportions present in the community. Notably, PCR amplification bias can be induced by the choice of primer set, the number of replication cycles, or the enzyme system used (Qiu et al., 2001; Suzuki and Giovannoni, 1996). An obvious example is that the sensitivity of universal primers is limited to the currently known 16S sequences and does not reach 100% coverage. What is the primer sensitivity for unknown microbial sequences, and how can the same hybridization efficiency be guaranteed for all targets in the sample?
However, metagenomic studies, which can theoretically access up to 100% of the microbial diversity in a sample, have yielded powerful alternatives to bypass primer and cloning bias using 454 pyrosequencing.
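The effect of 16S copy number on abundance estimates, described earlier in this section, can be made concrete with a short sketch. It divides each taxon's read count by its 16S copy number before computing relative abundances. The copy number of 10 for Bacillus subtilis is the value quoted in the text; the read counts, the single-copy taxon, and the function name are hypothetical, and the sketch assumes that copy number is the only source of bias, which is not the case in practice.

```python
def copy_number_corrected_abundance(read_counts, copy_numbers):
    """Convert 16S read counts into relative abundances of genomes
    by dividing each count by the taxon's 16S copy number."""
    genome_equivalents = {
        taxon: reads / copy_numbers[taxon]
        for taxon, reads in read_counts.items()
    }
    total = sum(genome_equivalents.values())
    if total == 0:
        raise ValueError("no reads to normalize")
    return {taxon: value / total for taxon, value in genome_equivalents.items()}


# Copy number for B. subtilis as quoted in the text; the other entry
# is a hypothetical taxon carrying a single 16S copy.
copy_numbers = {"Bacillus subtilis": 10, "single-copy taxon": 1}

# Hypothetical read counts: B. subtilis appears ten times more abundant.
read_counts = {"Bacillus subtilis": 1000, "single-copy taxon": 100}

corrected = copy_number_corrected_abundance(read_counts, copy_numbers)
for taxon, fraction in corrected.items():
    print(f"{taxon}: {fraction:.2f}")
```

In this toy case the uncorrected read counts suggest a 10:1 community, whereas the corrected values indicate equal numbers of genomes from each taxon.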
Another aspect limiting the capture of the true microbial diversity in a mixed sample by 16S surveys is inherent to the cloning step. The efficiency of ligation to the plasmid, transformation, and amplification in the host can all have an effect. For instance, it has been suggested that many unclonable genes in the E. coli host are present in a single copy per genome and hence are under-represented in clone libraries due to inactive promoters or toxic effects induced by gene transcription/translation into the host (Kunin et al.,

Conclusions and Perspectives
Culture-independent methods based on the 16S rRNA gene yield a useful framework for exploring microbial diversity, by establishing the taxonomic composition and/or structure present in environmental samples using both α- and β-diversity measures, phylogenetic tree construction, and sequence similarity comparison.
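As a brief illustration of the α- and β-diversity measures mentioned above, the sketch below computes one common example of each: the Shannon index for within-sample diversity and the Bray-Curtis dissimilarity for between-sample comparison. These two measures are chosen here only as familiar examples; the text does not specify which measures were used, and the sample counts are hypothetical.

```python
import math


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with
    non-zero counts (an alpha-diversity measure)."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample contains no reads")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity between two samples given as dicts
    of taxon -> count (a beta-diversity measure; 0 = identical,
    1 = no shared taxa)."""
    taxa = set(sample_a) | set(sample_b)
    difference = sum(abs(sample_a.get(t, 0) - sample_b.get(t, 0)) for t in taxa)
    total = sum(sample_a.values()) + sum(sample_b.values())
    if total == 0:
        raise ValueError("both samples are empty")
    return difference / total


# Hypothetical phylum-level read counts for two samples.
sample_1 = {"Firmicutes": 50, "Bacteroidetes": 50}
sample_2 = {"Firmicutes": 90, "Bacteroidetes": 10}

print(f"Shannon, sample 1: {shannon_index(list(sample_1.values())):.3f}")
print(f"Shannon, sample 2: {shannon_index(list(sample_2.values())):.3f}")
print(f"Bray-Curtis: {bray_curtis(sample_1, sample_2):.2f}")
```

The evenly split sample has the higher Shannon index, and the Bray-Curtis value summarizes how far apart the two compositions are.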
Unlike culture methods, these recent high-throughput methods allow access to the true microbial diversity. From a clinical research point of view, new or uncultured etiologic agents can be identified in polymicrobial samples from disease states (pulmonary infections, brain abscess), which should lead to more appropriate antibiotic therapies in place of broad-range antibiotics. In addition, high-throughput sequencing methods revealed a reduction of Bacteroidetes members and an increase of the methanogen Methanobrevibacter smithii in obese patients compared to lean controls. These results suggest that modulating the relative abundance of some microbial groups of the gut microbiota could be beneficial for the treatment of obesity. The huge number of sequences provided by these new sequencing methods greatly increases the number of 16S sequences in databases and thus improves the ability to identify 16S sequences using sequence similarity search tools. In the near future, the accuracy of classification methods for short 16S sequences will be improved by the longer read length (450 bp) produced by the new 454 FLX Titanium instrument. In addition, the increased sequence production capability of the 454 FLX Titanium, combined with the barcoding strategy, will allow many more samples to be examined in a single pyrosequencing run.
However, 16S high-throughput methods cannot characterize the functional component (defined as the microbiome) of an environmental sample. This limitation arises from targeting a single gene marker. It can be overcome by a metagenomic approach, which focuses on the full gene content (gene-centric analysis) of a sample. Therefore, in addition to providing species richness and evenness information, the relatively unbiased metagenomic approach can also identify the metabolic capabilities of a microbial community and disclose specific adaptive gene sets that are potentially beneficial for survival in a given habitat.