Chemical Master Equation Empirical Moment Closure

The numerical solution of the Chemical Master Equation (CME) governing gene regulatory networks and cell signaling processes remains a challenging task due to its complexity, exponentially growing with the number of species involved. When considering separated representations of the probability distribution function within the Proper Generalized Decomposition-PGD-frame-work the complexity of the CME grows only linearly with the number of state space dimensions. In order to speed up calculations moment-based descriptions are usually preferred, however these descriptions involve the necessity of using closure relations whose impact on the calculated solution is most of time unpredictable. In this work we propose an empirical closure, fitted from the solution of the chemical master equation, the last solved within the PGD framework. *Corresponding author: Francisco Chinesta, GeM, Ecole Centrale de Nantes, 1 rue de la Noe, BP 92101, F-44321 Nantes cedex 3, France, Tel: 33670799072; E-mail: Francisco.Chinesta@ec-nantes.fr Received September 18, 2015; Accepted January 13, 2016; Published January 21, 2016 Citation: Ammar A, Magnin M, Roux O, Cueto E, Chinesta F (2016) Chemical Master Equation Empirical Moment Closure. Biol Syst Open Access 5: 155. doi:10.4172/2329-6577.1000155 Copyright: © 2016 Ammar A, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. to the high-dimensional character of the CME as was successfully proved in our former works discussed later. However, and despite the fact of being able to solve the CME, its solution requires a significant amount of computation, and thus, the simulation of a variety of scenarios remains a challenging issue because its computational complexity. For alleviating such a computational complexity an appealing route consists of calculating the moments of the probability distribution function instead of the pdf itself. Moments constitute a valuable description of great interest in many practical applications and then moment-based descriptions represent an appealing alternative to pdf-based descriptions. However, as discussed later, when deriving the equations that govern the time evolution of the pdf-moments, usually the one related to a moment of a certain order depends on higher-order moments and so-on. In order to close the model at a certain order, we must approximate higher order moments as a function of the ones lower or equal to the one considered. Such an approached constitutes the closure-based description. As discussed later different closures has been proposed however, no closure relation is general enough to represent any possible scenario with the required accuracy. In this paper we propose using the expensive but very accurate CME solution efficiently obtained by invoking the PGD technology for fitting empirical polynomial closures. These closures are then used for obtaining moment-based solutions in an efficient way, because the integration of the evolution equations governing the time evolution of these moments can be performed almost in real-time, for scenarios that slightly differ from the ones that served to construct the empirical closure relation. In any case it is important to note that the validity and accuracy of the computed closure-based solutions is never assured but in many

to the high-dimensional character of the CME as was successfully proved in our former works discussed later. However, and despite the fact of being able to solve the CME, its solution requires a significant amount of computation, and thus, the simulation of a variety of scenarios remains a challenging issue because its computational complexity. For alleviating such a computational complexity an appealing route consists of calculating the moments of the probability distribution function instead of the pdf itself. Moments constitute a valuable description of great interest in many practical applications and then moment-based descriptions represent an appealing alternative to pdf-based descriptions. However, as discussed later, when deriving the equations that govern the time evolution of the pdf-moments, usually the one related to a moment of a certain order depends on higher-order moments and so-on. In order to close the model at a certain order, we must approximate higher order moments as a function of the ones lower or equal to the one considered. Such an approached constitutes the closure-based description.
As discussed later different closures has been proposed however, no closure relation is general enough to represent any possible scenario with the required accuracy. In this paper we propose using the expensive but very accurate CME solution efficiently obtained by invoking the PGD technology for fitting empirical polynomial closures. These closures are then used for obtaining moment-based solutions in an efficient way, because the integration of the evolution equations governing the time evolution of these moments can be performed almost in real-time, for scenarios that slightly differ from the ones that served to construct the empirical closure relation.

Introduction
Simulating the behavior of gene regulatory networks is a formidable task for several reasons. At this level of description, only a few molecules (maybe dozens or hundreds) of each species involved in the regulation process is present, and this fact limits the possibility of considering the process as deterministic, as is done very often in most chemical applications. Here, the concept of con-centration of the species does not make sense [1,2]. On the contrary, under some weak hypotheses the system can be considered as Markovian, and can be consequently modeled by the so-called Chemical Master Equation (CME) [3], which is in fact no more than a set of ordinary differential equations stating the conservation of the probability distribution function -pdf -P in time: Where p(z,t | z 0 , t 0 ) represents the probability of being at a state in which there are a number of molecules of each species stored in the vector z at time t when we started from a state z 0 at time t 0 . a j represents the propensity (i.e., the probability) of reaction j to occur, while v j represents the change in the number of molecules of each species if reaction j takes place. This change is given, of course, by the stoichiometry of the reaction at hand.
What is challenging, however, in this set of equations is that they are de-fined in a state space which possesses as many dimensions as the number of different species involved in the regulatory network. Under this framework, if we consider N different species, present at a number n of copies, the number of different possible states of the system is n N . This number can take the astronomical value of 10 6000 if we consider some types of proteins, for instance [3]. This phenomenon is known as the curse of dimensionality in many branches of science.
However, Monte Carlo techniques need for as many as possible individual realizations of the problem, leading to excessive time consuming simulations, together with great variance in the results.
Separated representations involved in the Proper Generalized Decomposition described below allow circumventing the issues related cases is a valuable tool for pre-analysis. As soon as the main tendencies are obtained using the fast closure-based simulations, by performing a lot of simulations, all them very fast, finer analysis can be performed by solving the CME in a few selected scenarios of special interest extracted from the previous fast closure-based simulations.

Proper generalized decomposition for alleviating the curse of dimensionality
Dealing with the problem of the curse of dimensionality in a very different context, the authors presented in a previous work a technique that is now known under the name of Proper Generalized Decomposition (PGD) based on the use of separated representations [6,7]. Essentially, to avoid the exponentially growing complexity of the problem with the number of state space dimensions, the method approximates the variable of interest, say u, as a finite sum of separable functions: The reason for this particular choice motivated the method itself that is conceived as a greedy algorithm that computes one sum at a time and one product at a time, within a fixed-point, alternating directions algorithm. This leads to a sequence of one-dimensional (lowdimensional, in general) problems, one for each function F i j that can be solved using your favorite technique (finite elements, finite volumes, finite differences, colocation, ...).
If M nodes are used to discretize each coordinate, the total number of PGD unknowns is N×M×D instead of the M D degrees of freedom involved in standard mesh-based discretization's. Moreover, all numerical experiments carried out to date with the PGD show that the number of terms N required to obtain an accurate solution is not a function of the problem dimension D, but it rather depends on the regularity and separability of the exact solution as well as on the solution constructor itself. The PGD thus avoids the exponential complexity with respect to the problem dimension.

A PGD approach to gene regulatory networks simulation
The PGD approach to the problem of effciently simulating gene regulatory networks begins by assuming that the probability of being at a particular state z at time t can be approximated as a finite sum of separable functions, i.e., Where, as mentioned before, the variables zi represent the number of molecules of species i present at a given time instant. This particular choice of the form of the basis functions allows for an important reduction in the number of degrees of freedom of the problem, N × nnod × (D + 1) instead of (n nod ) D , where D is the number of dimensions of the state space and nnod the number of degrees of freedom of each one-dimensional grid established for each spatial dimension. For this to be useful, one has to assume that the probability is negligible outside some interval, and therefore substitute the infinite domain by a subdomain [0,….,m-1] D , m being the chosen limit number of molecules for any species in the simulation. A similar assumption is behind other methods in the literature, such as the Finite State Projection algorithm, for instance [3].
Another important point to be highlighted is the presence of a function depending solely on time, F t j (t). This means that the algorithm is not incremental. Instead, it solves for the whole time history of the chemical species at each iteration of the method. If one then assumes that n terms of the sum given by equation (3) are already known, and look for the n + 1-th term, by substituting Equation (4) into the CME, Equation (1) gives a non-linear problem in 1 + that is solved by means of a fixed-point, alternating directions algorithm [8,9].
The separated representation just considered does not involve any assumption. Any solution defined in a high-dimensional space can be written, if it is regular enough, in a separated form if the number of terms in the finite sum decomposition is high enough. A polynomial of any order is no more than a sum of functional products, each depending on a different coordinate. Thus, solutions can be approximated with the desired accuracy by using a separated representation and an adequate constructor as the one described above.

Moments-based descriptions
Even if the use of separated representations allows circumventing the curse of dimensionality the computational cost remains considerable. This fact motivated in many other disciplines the replacement of the pdf by some of its moments [10][11][12][13], since many times the last suffice for having a view rich enough on the dynamics of the systems. The use of moment-based description was of major interest in different areas of statistical mechanics and it is being the more and more considered as an alternative to the discretization of the CME.
A moment represents the expected value of a random variable, z, raised to a certain power. An "expectation" is a specifically defined function in statistics, when in continuous spaces or Σf(z)P(z) in discrete spaces. In general, we can talk about the i th moment as: A probability distribution is uniquely defined by its full set of moments. Having access to these moments could eliminate the need to solve for the full distribution, depending on what information would be considered important. A special function, called the Moment Generating Function, is specifically in-tended for this purpose: we can see the moments emerging from this function, the i-th moment associated with the i-th power of θ: The following equations will be used extensively in the following derivation, so it will be useful to define them now:

From the Chemical Master Equation to moments based descriptions
Since we will be uniquely considering the structure of the Chemical Master Equation, we would like to derive a general version of the Moment Generating Function which can be used for any system. The CME for l reactions with stoichiometric change vl is: As we will see later on, the kind of rate laws associated with the system dramatically impact the complexity of the overall problem. We begin with the simplest case of kinetic mass action laws, following the derivation from Gillespie [10]. However, we would eventually like to take rational rate laws, such as Hill functions, as is seen in Milner et al. [14]. An example of a mass action rate law is where the law can be rewritten as a sum of coefficients c l,i and variables a l,i . This expanded, polynomial form will be exploited in our derivation.
Since we would like to talk about moments of the CME rather than probabilities, our first priority is to write this equation in terms of M, rather than in terms of P. We multiply both sides by e θz and sum over all possible values of z: that taking into account the previous definitions results Now, we can take the second definition of e θ into its Taylor series. Notice that the summation now begins at j=i. When j < i, the index will be out of bounds and not correspond to any physical state Remember that the initial goal was to isolate the coefficients of θ n in order to obtain the nth moments: Our next step will be isolate just the coefficients of θ in order to achieve a form in which we are creating ODE's of µ rather than M (θ,t). Since

Closures
It is easy to note from the previous expression [10] that the equation that governs the time evolution of the moments up to a certain order implies, in general, higher order moments, and then, before solving all them, higher order moments must be written as a combination of those involved in the considered time evolution equations. These relations have in most of cases an approximate character and are known as closure relations [12,15]. Before describing the technique that we are following for defining such closures for a given system, we introduce some notation.
When considering multicomponent systems involving D components, the state becomes a vector Z T = (Z 1 ,Z 2 ,……,Z D ) as previously discussed. Now the first moment also becomes a vector µ 1 of size D, defined by The second order moment µ 2 results a D×D matrix In what follows and without loss of generality, we consider reactions involving linear propensities. Thus, when considering the equations governing the time evolution of the first two moments µ 1 (t) and µ 2 (t) the third order moment µ 3 (t) remains in these equations, and it need to be expressed from both lower moments.
The simplest closure writes: S µ µ µ µ µ µ = ⊗ + ⊗ Thus, the third order closure relation [16] involves 6 coefficients to be determined. For this purpose, and for a given system, we solve the CME by using the PGD in order to circumvent the curse of dimensionality and then evaluate the third order moment according to Equation (11) and then we choose the alpha parameters in Hegland et al. [16] to provide the best fitting (in a least squares sense).
As soon as the alpha parameters are empirically fitted, the CME is substituted by the two ordinary differential equations governing the time evolution of µ 1 (t) and µ 2 (t), when considering the solution of the same systems for any other initial condition or slightly different kinetic rates. First, we solve the CME with the Case 1 conditions. The probability distribution function at 6 different times is depicted in Figure 1. Now from the pdf T P (z,t), z (#X, # ) Y = and t = [0,10], we compute the three first moments µ 1 (t), µ 2 (t), and µ 3 (t), respectively from Equations (9-11) that will be considered as reference moment solutions. The parameters alpha involved in the empirical closure relation [16] are then determined. In the present case, and taking into account the symmetries, µ 1 , is of size two, µ 2 , has three independent components and µ 3 , four. Figure 2 shows the different independent components of µ 1 , and µ 2 . Figure 3 compares the reference third moment with the one    fitted with the empirical closure relation, from which we can conclude that both are in perfect agreement. Now, by integrating in time the equations governing the time evolution of the components of µ 1 , and µ 2 , using the just identified empirical closure relation, we obtained the curves depicted in Figure 4 that are very close to those obtained from the probability distribution function that were depicted in Figure 2. Now, with the closure relation obtained from the analysis of Case 1 we are addressing Case 2 without modifying the closure relation. For that purpose the equations governing the time evolution of the different components of µ 1 , and µ 2 are integrated in time by considering the closure relation fitted in Case 1. In order to check the accuracy of those solutions we solve again the CME and compute the reference moments from the resulting probability distribution function. Figure 5 compares the moment-based and the pdf-based moments. Even if non negligeable deviations in the second order moment are noticed at the final time, results are qualitatively quite good.

Exclusive switch 5D model
We consider a gene regulatory network called exclusive switch. It describes the dynamics of two genes with an overlapping promoter region, and the corresponding proteins X 1 and X 2 . Both X 1 and X 2 are produced if no transcription factor is bound to the promoter region.
However if a molecule of type X 1 (X 2 ) is bound to the promotor then it inhibits the expression of the other protein i.e., molecules of type X 2 (X 1 ) cannot be produced. Only one molecule can be bound to the promotor region at a time which gives three possibilities for the state of the promoter region (free, X 1 bound, X 2 bound). The model is infinite in two dimensions (X 1 and X 2 ) and the reactions are given by: where (λ 1 ,λ 2 ,…..λ 10 ) = (2,5,0.005,0.005,0.005,0.002,0.02,0.02,2,5). The initial conditions are such that only one DNA molecule is present in the system while the molecular counts for the rest of the species are First, we solve the CME. The probability distribution function at 6 different times is depicted in Figure 6. Now from the pdf P(z,t) = (#X 1 ,….#X 2 ) and t = [0,60], we compute the three first moments µ 1 (t), µ 2 (t), and µ 3 (t), respectively from Equations (9-11) that will be considered as reference moment solutions. The parameters alpha involved in the empirical closure relation [16] are     Figure 7 shows the different independent components of µ 1 and µ 2 . Figure 8  = (50,100,0,0,1) that produces at time t=10 the pdf depicted in Figure  9. Now, the equations governing the time evolution of the different components of µ 1 and µ 2 are integrated in time by considering the closure relation just identified fitted. In order to check the accuracy of those solutions we solve again the CME and compute the reference moments from the resulting probability distribution function. Figure  10 compares the moment-based and the pdf-based moments, proving that the moment approach based on the use of an empirical closure produces a quite reasonable agreement.

The toggle
Mutually repressing gene pair, or gene toggle, can be found in biological systems as discussed in Hegland et al. [16]. As in their example, here we focus on protein dynamics, and more particularly in the toggle-switch network. The reactions consist of: (#X , #X ) =(90,50) as well as the parameters λ 1 =1, λ 2 =5, λ 3 =1 and λ 4 =10, and solve the associated chemical master equation for calculating the joint probability distribution function P (z,t). Now from the pdf P(z,t), z T = (#X 1 , #X 2 ) and t = [0,1], we compute the three first moments µ 1 (t), µ 2 (t), and µ 3 (t) respectively from Equations (9-11) that will be considered as reference moment solutions. The parameters alpha involved in the empirical closure relation [16] are then determined. Figures 11-13 compare respectively the different independent components of µ 1 , µ 2 , and µ 3 calculated from chemical master equation solution and    from the closure-based description. A very good agreement can be noticed.
With the closure relation just identified we are addressing a new scenario consisting of a different initial condition solve again the CME and compute the reference moments from the resulting probability distribution function. Figures 14-16 compare respectively the different independent components of of µ 1 , µ 2 , and µ 3 calculated from chemical master equation solution and from the closure-based description. Again a very good agreement can be noticed.

Conclusions
In this work we revisited the modeling of regulatory networks   Deterministic solutions of the CME were performed by using the separated representation involved in the PGD, allowing circumventing the so-called curse of dimensionality. In order to improve the computational efficiency a moment-based description is derived, however such description involves higher order moments that must be described from the ones of lower order. In that sense we proposed a simplest closure relation, of empirical nature, that can be fitted numerically from the probability distribution function, the last coming from the PGD solution of the CME. As soon as the closure relation is fitted, it can be used for solving similar problems to the one that served to identify the closure relation.