Analyzing Intracellular flux Distributions of Thermoanaerobacter sp. X514 under Various Substrate Conditions by Gibbs Sampling

Thermoanaerobacter sp. X514, capable of fermenting most hexose and pentose to produce ethanol, is of critical importance in industrial bioethanol production. This paper provides a detailed investigation of the intracellular metabolic flux flowing activities of X514 under different conditions, including sole glucose substrate condition, sole xylose substrate condition and mixed glucose and xylose substrates condition by means of Gibbs sampling algorithm and stoichiometric metabolic flux balances. Statistical analysis of the results show that all the flux distributions exhibit Gaussian or truncated Gaussian distributions under the assumption that noise of the system is Gaussian distribution. Pentose phosphate pathway becomes more active when xylose is the sole substrate whereas glycolysis pool is more active under sole glucose substrate condition. Major changes of the fluxes activities occur in the connection flux between glycolysis pool and pentose phosphate metabolite pool, indicating the balances between the need for NADPH, the requirement of ATP as well as the necessity of pyruvate under different substrate conditions. Generation of pyruvate reaches its maximum when sole glucose is fed into the system, indicating the preference of the system on glucose consumption. The whole metabolic flux distributions illustrate vivid information about the central metabolic map of X514. Journal of Biometrics & Biostatistics J o u rn al of Bio metrics & Bistatis t i c s ISSN: 2155-6180 Citation: Yang J, Ling X, Xie T, Yao L, Kadirkamanathan V (2012) Analyzing Intracellular flux Distributions of Thermoanaerobacter sp. X514 under Various Substrate Conditions by Gibbs Sampling. J Biomet Biostat S7:014. doi:10.4172/2155-6180.S7-014


Introduction
It is of paramount importance to find alternative energy type so as to replenish the decreasing fossil energy. Biofuel is currently recognized as one of the promising options to replace traditional fossil fuel [1]. Cellulosic biofuel, which generates biofuel from a wide array of feedstock, has attracted a lot of attention mainly because the sources are inexpensive and plentiful while its production is environmentally benign [2]. However, the technology that converts cellulose to bioethanol is at a stage that is far from commercialization. At present, only 40% energy in breweries can be converted to ethanol [3]. Therefore, improvement in cellulosic bioethanol technology is critical and urgent. In terms of the configurations during enzyme hydrolysis and sugar fermentation, traditional bioethanol processes contain Simultaneous Saccharification and Fermentation (SSF) and Separate Hydrolysis and Fermentation (SHF) [4]. Although SSF was proved to give a 13% higher overall ethanol yield than SHF [4], the high cost of enzymes used in cellulosic hydrolysis makes bioethanol uncompetitive to the traditional fossil fuel. Consolidated BioProcessing (CBP), which features cellulase production, cellulose hydrolysis and fermentation in one step, is viewed as the most promising way to convert cellulose to bioethanol at a minimum cost [5].
Microorganisms that can be applied in CBP are not yet available. Currently, the combination of a cellulolytic strain of Clostridium species and a saccharolytic strain of Thermoanaerobacter species which can produce cellulosic ethanol at high temperature (around 60°C) without the consumption of oxygen, provides an efficient means to rapidly convert cellulose to ethanol by CBP technology [5,6]. A number of thermophilic ethanolgenic anaerobes have been isolated from all over the world in recent years [7][8][9]. Those organisms exhibit some attractive properties such as: viable substrate utilization, high ethanol productivity, anaerobiosis and tolerance to high temperature (around 60°C) [10][11][12]. Despite the importance of thermophilic bacteria of the genus Thermoanaerobacter spp. in bioenergy production, the ethanol production abilities of these bacteria and the intracellular mechanisms that stimulate or inhibit ethanol yields were rarely investigated in full detail except for the microorganism T. ethanolicus JW200 [13,14]. However, it was illustrated that T. ethanolicus JW200 produced ethanol as the major product only at substrate concentrations below 1.0% [15], which prevents its universal industrialization. In contrast, Thermoanaerobacter sp. X514, which was discovered as a metalreducing thermophilic anaerobe [16] can ferment pentose and hexose to produce ethanol simultaneously. Research work on X514 is quite sparse with only a few references available as in [17][18][19]. In [19], 13 C labeled tracer experiments were applied to study the central metabolism of X514. These studies focused on obtaining only partial flux ratios for the central metabolic pathway of X514. A thorough metabolic flux analysis of X514 is, therefore, necessary so as to provide detailed information about the intracellular pathways of X514. The case where multiple substrates are utilized simultaneously by X514 needs addressing especially since the central metabolic flux directions and magnitudes delineate the intracellular reaction activities and convey fundamental information about the overall metabolic system. Metabolic Flux Analysis (MFA) has long been viewed as a common approach in metabolic systems analysis of microorganisms [20][21][22]. Software for MFA, for example, MetaFluxNet [23] and FiatFlux [24], are also freely available to researchers. However, standard MFA approaches, which act by formulating a user-defined cost function based upon a few constraints, are often questioned due to their prior assumption about the optimization target of the metabolic system of any microorganism. In this paper, without any specified cost function, one special form of Markov Chain Monte Carlo (MCMC) algorithm, Gibbs sampling [25], is applied to the metabolic flux quantification of X514. Though MCMC has been applied in metabolic flux estimation based upon 13 C fractional enrichment data [26], none of similar approaches has been applied in MFA without extra carbon labeling constraints. In this paper, without resorting to any optimization assumption about the metabolic system of X514 and by utilizing prior knowledge of the metabolic network and measured data, the paper reveals that Gibbs sampling is able to provide intracellular flux distributions with least constraints about the system. The results show that the flux distributions of X514 exhibit Gaussian or truncated Gaussian formats under different substrate conditions and constraints, which proves that utilizing a single optimized estimation to represent the corresponding flux without considering its sample distributions might be misleading. The connection flux between glycolysis pool and pentose phosphate metabolite pool and the pyruvate-generating fluxes experience major change under different substrate conditions, pointing out future metabolic engineering direction of X514.
The paper is organized as follows: the experiment setup and measurement are described in Section 2. Section 3 presents the Gibbs sampling algorithm. Section 4 illustrates the outcomes of the presented algorithm. Section 5 provides discussions and future prospects of the work.

Strains and media
X514 (ATCC BAA-938) was received from Institute for Environmental Genomics, University of Oklahoma, United States. Central metabolic map of X514 is shown in Figure 1. X514 was incubated under a N2 (99.99%) headspace in a flask in the mineral medium which is a modified version of the one used for Thermoanaerobacter pseudoethanolicus ATCC 33223 by Wiegel [27] at 60°C. The mineral medium contains (per litre of distilled water) 4.2 g Na 2 HPO 4 •12H 2 O, 1.5 g KH 2 PO 4 , 1.0 g NH 4 Cl, 0.2 g MgCl 2 •6H 2 O, 10.0 ml Wolfe's mineral solution [28] and 1.0 ml 0.1% (m/v) resazurin. The mineral medium was autoclaved for 20 minutes at 115°C. Reducing agents which were composed of 5 g/l Na 2 S 2 O 4 and 3 g/l NaHCO 3 were injected into the medium by a syringe after the basal medium was sterilized and cooled to room temperature. In sole substrate condition, 2.0 g/l of glucose or xylose was injected into the mineral medium, whereas in the multiple substrate condition, 1.0 g/l glucose and 1.0 g/l xylose were injected into the mineral medium. All chemicals were purchased from Amresoco (USA), USB (USA), Sigma (USA) or Sinopharm (China) with high purity [29].

Analytical methods
X514 was sampled firstly at 12 hours, then the samples were taken every 8 hours until 68 hours and the last sample was taken at 84 hours. The Optical Density (OD) was measured by Cary 50 spectrophotometer (Varian, USA) at the wavelength of 600 nm and calibrated to Cell Dry Weight (CDW). The relationship between OD and CDW is illustrated in Figure 2 (plot F). Concentrations of the substrates, glucose and xylose, and the two main products, acetate and lactate, were measured by ionic chromatography ics-3000 (Dionex, USA). Concentrations of ethanol were measured by Megazyme Enzyme Kit (Megazyme, Ireland). All experiments were performed in triplicate cultures.

Extracellular flux quantification
Assuming that the cell has exponential growth with a constant growth rate µ, according to the mass balance condition, the cell mass X , the substrate concentration s C and the product concentration p C at time t satisfy the following equations [30]: where s v and p v are substrate consumption rate and product formation rate, respectively. From equation (1), it can then be derived that: where (0) X is initial cell mass. Replacing ( ) X t in equation (1)

Gibbs sampling algorithm for metabolic flux distribution analysis
Based on the assumption that the metabolic system is on a quasisteady state, the following relationship holds [31] = Sv d vector whose elements are the flux quantities when the corresponding rows in S involve extracellular fluxes, or zero otherwise. m is the number of metabolites in the metabolic system studied and n is the number of fluxes involved in these reactions between different metabolites.
Considering the noise associated with the measured quantities and the possibility of unknown pathways existing in the studied metabolic system, equation (3) can be modified to include uncertain terms as, + = Sv w d (4) where the noise w is assumed to be Gaussian distribution with mean 0 and variance × Σ ∈ m m R , i.e. w~N(0, Σ). According to Bayes theorem, and the likelihood distribution ( | ) p d v can be derived from equation (4): The prior distribution of v , ( ) prior v , involves the upper and lower constraints of these quantities and expert knowledge about specified fluxes, for example, the bidirectionality or unidirectionality of a reaction, which can be formulated by the expression ≥ Uv f . Here × ∈ l n U R is a matrix specified values of 1 or -1 when the corresponding flux has lower or upper constraints, and 0, otherwise. l is the number of constraints and f is the vector containing all the corresponding constraints.
( ) prior v can then be expressed as, where ( ) Φ ⋅ is the Heaviside function taking the value of 1 when ≥

Uv f
, and vanishing otherwise. Incorporating prior distribution in equation (7) and the likelihood distribution in equation (6) into equation (5), the posterior distribution of can then be formulated as, Due to the fact that it is difficult to sample from ( ) v π directly, Gibbs sampling is adopted here to generate statistical samples from to the last item of the vector v, respectively, so that, can then be derived: where i S (S v -S v -d)  Table 1, where the maximum iteration is set as 10,000 by trials.
In order to test the convergence of the generated Markov Chains, three chains are generated by Gibbs sampling algorithm with random initial fluxes, simultaneously. Convergence of the chains with the burnin period removed is examined by comparing the variances between chains, b φ , and the variances within each chain, w φ [33]. Assuming that the variance for the i th chain with j samples is ij φ , then the average of the variances within each chain i φ and the average of the variances between all the chains φ are, where s and t are the length of each chain and the number of chains, respectively. Given As the simulation converges, Q should be decreasing and be close to 1. In the paper, an empirical threshold of 1.5 is taken, as a measure of the convergence. So that when the estimated Q is lower than 1.5, it is deemed that the chains have converged.

Results
Measured concentrations of the substrates and the products for sole glucose condition, sole xylose condition and mixed glucose and xylose condition are shown in Figure 2. By exponential fitting of the growth curve of X514, and linear regression between the concentrations of substrates and CDW, as well as the concentrations of products and CDW, extracellular fluxes are estimated and shown in Table 2. It is   then clear that the consumption rates for glucose and xylose under sole substrate condition are both much faster than their rates in the mixed substrate condition, whereas the generation rates of the three major products are pretty much the same in all the three cases, and the productions rates of lactate are always faster than that of acetate.
Gibbs sampling is applied to estimate the flux distributions of X514. The source code is implemented in Matlab (MathWorks). Considering the measurement noise nature and existence of unknown pathways in the metabolic system, the system variance matrix Σ is assumed to be a diagonal matrix with unique value 1 on the diagonal. The selection of a diagonal matrix as the variance matrix indicates that interactive disturbances between metabolites are ignored and each metabolite is freely active in the reaction pool. And the specified value for system variance represents the magnitude of the disturbance of unknown metabolic pathways to metabolic reactions. All fluxes are assumed unidirectional except the fluxes in the TCA cycle, pentose phosphate metabolite pool, and the flux G6P→F6P. G6P→6PG and 6PG→P5P+CO 2 are assumed to be unidirectional under all cases, thanks to the fact that NADPH, most of which has to be generated from the reaction, is required during the whole cell life cycle. The flux G6P→F6P is assumed in reversed direction under sole xylose substrate condition, due to the requirement of NADPH generated in its neighbor reaction G6P→6PG. Under mixed substrates condition, the same reaction is assumed flowing from G6P→F6P, based upon the fact that glucose consumption is normally in priority position in mixed glucose and xylose substrates condition. Maximum constraints for the fluxes are set as 50 or 0 and the lower constraints for the fluxes are set as 0 or -50 depending on their unidirectionality or bidirectionality characteristics. From paper [19], it is proved that under glucose substrate condition, the flux split ratio between G6P→6PG is less than 3%, the flux for this reaction is then set as 3% of the original glucose uptake rate under sole glucose substrate condition.
With available extracellular flux data shown in Table 2, distributions of the intracellular fluxes and extracellular fluxes are estimated and illustrated in Figure 3 and 4, respectively. Convergence of Gibbs sampling algorithm is tested by running three chains simultaneously with random starting values. The parameter Q (Materials and Methods) is calculated and compared to prespecified threshold, 1.5, during the simulation to ensure convergences of these chains. Mean of the samples of flux v 2 of the three chains in 10,000 runs under glucose substrate condition is taken as an example and illustrated in Figure 5. It is clear that the three chains have eventually converged to the same value.
In order to justify the intracellular flux estimation results obtained by using Gibbs sampling approach, common Flux Balance Analysis (FBA) method is applied to quantify the intracellular fluxes of X514 with the same constraints and the objective of maximizing biomass synthesis. The Matlab program linprog is used to get the optimization results. Comparisons of the estimation results from FBA and the mean estimation results from Gibbs sampling of intracellular fluxes under different substrate conditions are shown in Figure 6. It is obvious that the two results match closely. However, FBA method alone has to rely on predefined objective function and can only provide single estimations of the intracellular fluxes, whereas Gibbs sampling approach is able to provide the distributions of all intracellular fluxes without accounting on any assumptions of system preferences.
From the Gibbs sampling results, it is clear that all the fluxes exhibit Gaussian or truncated Gaussian distribution formats, where the truncations obviously originate from the prior constraints information imposed on these fluxes. Thorough comparison of the flux distributions and their statistical information under various substrate conditions indicate that the flux G6P→F6P experiences critical changes under these conditions. Mean of the reaction flux reaches its maximum under glucose substrate condition, whereas under mixed substrate condition, its mean value is in-between that of the other two conditions, indicating the variation of the active degree of the glycolysis pool under different substrate conditions. The mean values of the reactions G6P→6PG and 6PG→P5P+CO 2 , which are the major link reactions between the glycolysis pool and pentose phosphate pathway, increase gradually from the glucose substrate condition to xylose substrate condition and mixed substrates condition, implying the requirement for NADPH becomes much more intense along with the import of xylose into the metabolic system. Generation of pyruvate from glycolysis pathways is at its maximum when glucose is adopted as the sole substrate, whereas this value is gradually decreasing under the mixed substrate condition and sole xylose substrate condition, implying the system's priority on glucose consumption. The pentose phosphate metabolite pool is much more active when xylose is used as the sole substrate. The flux leading to TCA cycle, i.e. PYR→MAL, is quite small under all the three conditions. The reactions in the TCA cycle are much stable and the changes are negligible compared to these reactions in the other two metabolic pools. With the aim of maximizing ethanol yield, sole glucose substrate seems still the best candidate for bioethanol generation compared to sole xylose substrate condition and mixed glucose and xylose substrates condition ( Table 3).
The importance of prior constraints on metabolic flux distributions is illustrated in Figure 7, where the posterior flux distribution of the flux G6P→6PG under sole glucose substrate condition with or without the unidirectional prior constraint on it is compared. It is clear that the prior constraint information significantly affect the mean and the variance of the flux. Compared to standard MFA which can only provide an optimized estimation of intracellular metabolic fluxes based upon a user-defined objective function, MFA by means of Gibbs sampling can incorporate the prior constraints and generate the posterior statistical formats of these fluxes, which are more comprehensive and straightforward.

Discussion
Simple estimation of intracellular fluxes by traditional metabolic flux analysis approaches is often insufficient to provide critical information about a metabolic system. Contrary to standard MFA, which has to rely on system constraints and user-defined cost function to obtain an optimum solution of intracellular fluxes, in this paper, Gibbs sampling is applied to metabolic flux quantification. Without resorting to any objective function, the distributions of the intracellular fluxes are generated by means of deliberately constructed Markov Chains. Not only do the distributions provide means and variances of intracellular fluxes, but they also illustrate statistical formats of the estimations. The Gaussian or truncated Gaussian formats of these intracellular flux distributions address the importance of taking sample distributions into account during metabolic system analysis, as is the missing part in standard MFA. Analysis of the intracellular flux distributions of X514 illustrate that the connection flux between glycolysis pool and pentose phosphate metabolite pool as well as the pyruvate-generating fluxes are critical fluxes experiencing major changes during ethanol fermentation under different substrate condition. However, similar information will be difficult to obtain from standard MFA. Future metabolic engineering work in X514 will need to focus on these pathways.
However, one has to note that the distribution formats of Gibbs sampling are sensitive to the constraints of enzyme activities. indicating the importance of prior biological knowledge in metabolic flux analysis. Therefore, the future work will have to be twofold, one is to apply Gibbs     sampling algorithm together with available expert experiences about enzyme constraints to analyze metabolic fluxes and then use the results to perform metabolic engineering work; another one is to use the metabolic engineering feedback information to provide more accurate enzyme constraints and apply these constraints together with Gibbs sampling to further understand the studied system.