Informed Carbohydrate Active Enzyme Discovery within the Human Distal Gut Microbiome

Introduction

Intestinal microbiomes as enzyme repositories
The human genome contains nearly one hundred glycoside hydrolases (www.cazy.org, [1]). Of these carbohydrate active enzymes (CAZymes), only a few are dedicated to the metabolism of dietary complex carbohydrates [2]. Human digestive enzymes are primarily responsible for the saccharification of α1,4-linked and α1,6-linked glucans (GH13s) and sucrose (α-D-glucopyranosyl-(1,2)-β-D-fructofuranoside; GH31s). By comparison, the majority of human CAZymes are involved in glycoprotein maturation, lysosomal processing, and cellular physiology [2]. In order to metabolise dietary fibre, humans rely on a symbiotic relationship with a community of microorganisms that colonize their intestine, referred to as the human distal gut microbiota (hDGM). Since the emergence of culture-independent, next-generation sequencing methods, there have been intensive efforts to catalog the composition of this community and determine how it responds to dynamic stimuli, such as diet and disease. Accordingly, the hDGM is one of the primary sites for the Human Microbiome Project, a global collaboration seeking to exhaustively document the genomes of microorganisms that colonize human beings [3,4]. Current estimates suggest that approximately 1,000 species-level phylotypes are present within the human intestine [5], which together encode an integrated microbiome catalog of nearly ten million genes [6]. Although the composition of the hDGM varies between healthy individuals, it has been suggested that humans share a 'core microbiome' that is dominated by Bacteroidetes and Firmicutes, with lesser contributions from Actinobacteria, Proteobacteria, Verrucomicrobia, methanogens, yeast, and viruses [5].
Analysis of the genomes of Bacteroides spp. has revealed that they are augmented with genes involved in carbohydrate metabolism. For example, six percent of the overall genome of Bacteroides thetaiotaomicron, a well-studied intestinal model microorganism, encodes CAZymes [7]. A prominent proportion of these are classified into CAZyme families that have previously been determined to modify pectins, α-mannans, arabinans, and various β-linked glycans (Figure 1). Monosaccharides released from dietary fibre by these enzymes are fermented into host-absorbable products (e.g. propionate, butyrate) that provide primary sources of energy for the colonic epithelia [8]. In this regard, the symbiotic relationship between the host and intestinal saccharophiles, such as B. thetaiotaomicron, fulfills a functional deficit present within the human metabolome.

Comparing CAZyme profiles highlights functional diversity
In recent years the metagenomic sequence space of the hDGM has been expanding at a rate that exceeds our ability to comprehensively interpret its function. For example, metagenomic surveys often report on the presence or absence of CAZyme families within a microbiome. This provides a snapshot of the metabolic potential contained within an intestinal community and may reveal niche-specific colonization behaviors of species that possess unique CAZyme families (Figure 1) [9-12]. Despite these insights, however, the scope of this approach is limited to genes that can be classified into previously defined families, and it does not account for the functional diversity that exists within CAZyme families (e.g. GH2, GH5, GH43; [9]). For example, the genome of B. thetaiotaomicron contains thirty-three GH43s (arabinofuranosidases and xylanases), twenty-three GH92s (α-mannosidases), and ten GH76s (α-mannanases; [10]). Comparing these collections to those of other Bacteroides spp. that colonize either a common host (e.g. Bacteroides vulgatus [13]: twenty-two GH43s, nine GH92s, seven GH76s) or different animal hosts (e.g. Bacteroides salanitronis [14]: thirteen GH43s, six GH92s, zero GH76s; Prevotella ruminicola [15]: twenty GH43s, eight GH92s, one GH76) reveals the presence of species-selective metabolic programs (Figure 1).
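The comparison above can be made concrete with a small sketch. It uses only the copy numbers quoted in the text for the three GH families and flags families that are expanded in one species relative to the others, borrowing the ≥1.3 ratio mentioned in the Figure 1 legend. How that ratio is computed is not stated in the text; here it is assumed to be the copy number divided by the highest copy number among the other species. The function and variable names are illustrative only.

```python
# Copy numbers as quoted in the text (species -> GH family -> gene count).
COPY_NUMBERS = {
    "B. thetaiotaomicron": {"GH43": 33, "GH92": 23, "GH76": 10},
    "B. vulgatus": {"GH43": 22, "GH92": 9, "GH76": 7},
    "B. salanitronis": {"GH43": 13, "GH92": 6, "GH76": 0},
    "P. ruminicola": {"GH43": 20, "GH92": 8, "GH76": 1},
}


def enriched_families(profiles, threshold=1.3):
    """Return, per species, the GH families whose copy number is at least
    `threshold` times the highest copy number found in any other species.

    ASSUMPTION: this is one possible reading of the ">=1.3 ratio" rule;
    the source does not define the calculation.
    """
    result = {}
    for species, counts in profiles.items():
        hits = []
        for family, n in counts.items():
            # Highest copy number for this family among all other species.
            best_other = max(
                other.get(family, 0)
                for name, other in profiles.items()
                if name != species
            )
            # Require a non-zero count so absent families are never flagged.
            if n > 0 and n >= threshold * best_other:
                hits.append(family)
        result[species] = hits
    return result


if __name__ == "__main__":
    for species, families in enriched_families(COPY_NUMBERS).items():
        print(f"{species}: {', '.join(families) or 'none'}")
```

Under this reading, B. thetaiotaomicron is flagged for all three families and the other species for none, which fits the text's point that copy-number profiles differ between species even when the same families are present.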

Complex carbohydrate utilization pathways within Bacteroides spp.
Bacteroides spp. cluster CAZymes into catabolic pathways called 'Polysaccharide Utilization Loci' (PULs; [16]). When a target polysaccharide is detected, a dedicated PUL responds by expressing gene products tailored to modify and transport the carbohydrate subunits within that polysaccharide. When presented with glycan mixtures, which more accurately represent the intestinal digesta in vivo, B. thetaiotaomicron selectively metabolises distinct polysaccharides by switching on PULs with varying levels of induction [17]. The size of an individual PUL reflects the complexity of the pathway required to metabolise the glycosidic backbone, side-chain linkages, and chemical modifications (e.g. methylesterifications, acetylations) that exist within the substrate. Correspondingly, PULs that target relatively simple homopolysaccharides, such as fructans (Bt1757-Bt1763) [18], are noticeably smaller than PULs that target large, complex, and highly branched heteropolysaccharides, such as rhamnogalacturonan-I (Bt4145-Bt4182) and rhamnogalacturonan-II (Bt0977-Bt1031) [11]. In this regard, transcriptomics of monocultures grown on purified carbohydrates has been instrumental, firstly for determining the presence and selectivity of PULs, and secondly for defining the boundaries of PULs within the genome.
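The size difference between these PULs can be estimated directly from the locus tag ranges quoted above. The following is a minimal sketch that assumes locus tags are numbered consecutively along the genome, so the span between the first and last tag approximates the number of genes in the locus. That assumption is not stated in the text, and the resulting spans are not confirmed gene counts. The helper names are hypothetical.

```python
import re

# Locus tag ranges as quoted in the text (substrate -> first and last tag).
PUL_RANGES = {
    "fructan": ("Bt1757", "Bt1763"),
    "rhamnogalacturonan-I": ("Bt4145", "Bt4182"),
    "rhamnogalacturonan-II": ("Bt0977", "Bt1031"),
}


def tag_number(tag):
    """Extract the numeric part of a locus tag such as 'Bt1757' or 'BT_1757'."""
    match = re.fullmatch(r"[A-Za-z]+_?(\d+)", tag)
    if match is None:
        raise ValueError(f"unrecognised locus tag: {tag!r}")
    return int(match.group(1))


def locus_span(first, last):
    """Number of locus tags from `first` to `last`, inclusive.

    ASSUMPTION: tags are consecutive integers, so the span approximates
    the gene count of the locus.
    """
    return tag_number(last) - tag_number(first) + 1


spans = {name: locus_span(*tags) for name, tags in PUL_RANGES.items()}

if __name__ == "__main__":
    for name, span in sorted(spans.items(), key=lambda item: item[1]):
        print(f"{name}: {span} locus tags")
```

Under this assumption the fructan locus spans 7 tags, against 38 and 55 for the two rhamnogalacturonan loci.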

Limitations of sequence-based approaches
Although comparative genomics of CAZymes (i.e. 'CAZomics') has utility for highlighting the metabolic potential of a microorganism, it does not establish how genes are regulated, how enzymes operate, or what products are released. Recently, CAZyme families have begun to be further classified into sequence-based subfamilies in an effort to provide more predictive power [19]. Ultimately, however, determining the biological significance of CAZyme profiles requires combinatorial functional genomic approaches, such as transcriptomics, enzymology, structural biochemistry, and metabolomics, to define how PULs sense and saccharify polysaccharides [18,20].

Harnessing Bacteroides spp. for informed CAZyme discovery
With mounting global population pressures for food and renewable resources, harvesting the chemical energy within structural carbohydrates from the cell walls of plants [21] and macroalgae [22] remains one of the most promising avenues for invested research. In this regard, unlocking the biocatalytic potential of the hDGM, and of other microbiomes tuned to the bioconversion of more recalcitrant biomass (e.g. domesticated or wild ruminants), is likely to continue to drive innovation towards discovering CAZymes with improved catalytic efficiency, defining cocktails that provide synergistic improvements in the turnover of a mixed substrate (i.e. artificial pathways), and isolating intestinal probiotics for biological pretreatments [23-25]. Furthermore, exploratory studies have demonstrated that it is possible to intentionally screen Bacteroides spp. to discover novel PULs that are active on select polysaccharides. For example, the rhamnogalacturonan-II pathway recently identified within B. thetaiotaomicron revealed the first CAZymes known to degrade this polysaccharide, a substrate that was previously believed to be impervious to enzyme-mediated deconstruction [26]. Future efforts to culture Bacteroides spp. ex vivo to evaluate selective growth proficiency, combined with transcriptomics and the biochemical characterization of PUL components, promise to define novel pathways that have utility for the saccharification of, or bioproduct generation from, industrially significant carbohydrates.

Figure 1. GH families shown: 2, 3, 5, 8, 9, 10, 13, 15, 16, 18, 20, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 35, 36, 38, 42, 43, 50, 51, 53, 57, 59, 63, 65, 66, 67, 73, 76, 77, 78, 84, 88, 89, 91, 92, 93, 95, 97, 99. Copy number is displayed as a heat map from green (zero) to red (≥15). Copy numbers that are present in ratios of ≥1.3 over other related species are bolded and underlined. Entries that represent unique GH families within an individual species are boxed (e.g. GH9 = β-1,4-glucanase). Adapted from www.cazy.org [9].