Parameter Estimation for Stochastic Models of Biochemical Reactions 
BioQuant, Im Neuenheimer Feld 267, 69120 Heidelberg, Germany
Christoph Zimmer^{*} and
BioQuant, Im Neuenheimer Feld 267, 69120 Heidelberg, Germany
Sven Sahle 

Corresponding Author : 
Christoph Zimmer
BioQuant, Im Neuenheimer Feld 267
69120 Heidelberg, Germany
Email:christoph.zimmer@bioquant.uniheidelberg.de 

Received October 08, 2012; Accepted November 07, 2012; Published November
10, 2012 

Citation: Zimmer C, Sahle S (2012) Parameter Estimation for Stochastic Models
of Biochemical Reactions. J Comput Sci Syst Biol 6: 011021.
doi: 10.4172/jcsb.1000095


Copyright: © 2012 Zimmer C, et al. This is an openaccess article distributed under
the terms of the Creative Commons Attribution License,which permits unrestricted
use, distribution, and reproduction in any medium, provided the original author and
source are credited. 

Related article at
Pubmed Scholar Google 


Abstract 

Parameter estimation is very important for the analysis of models in Systems Biology. Stochastic models are of increasing importance. However parameter estimation of stochastic models is still in the early phase of development and there is need for efficient methods to estimate model parameters from time course data which is intrinsically stochastic, only partially observed and has measurement noise.
In this article a fast and efficient method that is well established in the field of parameter estimation for systems of ordinary differential equations (ODE) is adapted to stochastic models. The focus is on the objective function which is shown to have advantageous properties that make it directly applicable to problems in systems biology. The proposed method can deal with stochastic systems where the behaviour qualitatively differs from the corresponding deterministic description. It works with measurements from a single realization of the stochastic process, and with partially observed processes including measurement errors. The objective function is deterministic, therefore a wide range of optimization methods, from derivative based methods to global optimization to Bayesian techniques can be applied. The computational effort required is comparable to similar methods for parameter estimation in deterministic models. To construct the objective function a multiple shooting procedure is used in which the continuity constraints are relaxed to allow for stochasticity. Unobserved states are treated by enlarging the optimization vector and using resulting values from the forward integration. Test functions are suggested that allow to monitor the validity of the approximations involved in this approach. The quality of the method is evaluated for some example models with a statistic of 50 estimates from 50 stochastic realizations. It is shown that the method performs well compared to established approaches.


Keywords 

Parameter estimation; Calcium oscillation model 

Introduction 

Parameter estimation is very important for the analysis of models
in systems biology. Computational modeling is a central approach
in systems biology, for studying increasingly complex biochemical
systems. Progress in experimental techniques, e.g. the possibility to
measure small numbers of molecules in single cells [1], highlights
the need for stochastic modeling approaches. Simulation methods
for stochastic processes are being developed for decades since [2],
and nowadays exist with a lot of variants [3]. Parameter estimation
methods for stochastic models, however, are still in the early phase of
development. 

We have identified a number of desirable properties that a
parameter estimation procedure should have, in order to be useful in
actual biological applications: It should work with measurements from
few or even a single realization of the physical process, i.e. it should
not require that enough measurements are made to reconstruct reliable
probability distributions, or other statistical measures for all time
points. It should work in cases where only partial measurements of
the systems’ state are possible, obviously also including measurement
errors. It should not assume that the stochastic system approaches
the large volume limit, i.e. the approach should also be valid, if the
stochastic system exhibits a behavior that is qualitatively different from
what would be covered by a deterministic model. On a more technical
side, it is beneficial if the actual target function is deterministic (and
differentiable), and if it can be computed efficiently. This means that
a wide range of numerical optimization techniques can be employed,
including gradient based local methods, numerous global optimization
schemes, or Bayesian approaches. 

The problem for parameter estimation for stochastic models can
be written in the following form: We assume that the experimental or
simulated data measures at each time point t_{i}, i=1,…,n, the number of molecules of species v_{i} in the system, hence . Each data set
v=(v_{1},…v_{n}) is stochastic. This means that even two simulated realization
of the system without measurement noise can be different, see figure
1. Figure 1 does also indicate that the mean of stochastic simulations
is not close to the ODE solution. Furthermore, it is assumed that the
systems behavior depends on some unknown parameters which
are to be estimated. To this goal, the data v is compared to simulation
with respect to some objective function F, which measures the quality
of the fit. Parameter estimation means finding the optimal value for the parameter. For models of ODE, the choice of the objective function
has been widely discussed [4], and in case of normally distributed
measurement error, the common choice is a least squares function.
The focus of this article will be the objective function F, for stochastic
models. 

Approaches exist for time series data using stochastic simulations.
Due to the Markov property of the time series, the likelihood function
factorizes into the product of transition probabilities. These transition
probabilities are generally unknown in stochastic modeling. They can
be estimated using stochastic simulations. This can be done with density
estimation methods [5,6]. Another approach is the use of a reversible
jump algorithm [7]. The parameter estimation is then performed with
Bayesian methods. An alternative to that is the use of a stochastic
gradient descent [8], and the use of a reversible jump Markov chain,
Monte Carlo method, for the estimation of the transition probabilities.
Using a surrogate probabilistic model as an approximation, is faster
from a computational point of view [9]. Another approximation is
suggested in form of an approximate maximum likelihood method
[10], where also a singular value decomposition likelihood method is
described. A use of approximate Bayesian methods is suggested [11]. 

A second class of methods focuses on a numerical solution of the
Chemical Master Equation (CME), which describes the probability for
each state, as a function of the time. These systems are generally high
dimensional. To address this problem, a state space truncation can be
used [12], or momentclosure methods, which are an approximation
focusing on a finite number of moments of the probability distribution
[13,14]. Parameter estimation with the CME and an approximation
for the likelihood function for small systems, or a hybrid method for
large systems is suggested in [15]. Deuflhard et al. [16] and Engblom
[17] use an adaptive Galerkin method for the solution of the CME. If
distribution information is available from measurement, a finite state
projection [18], can be used to solve the CME without simulations.
The common challenge is the fact that the solution of the CME, as well
as simulationbased methods become very timeconsuming, as the
number of states in the state space becomes larger. 

Moment matching methods use repeated measurements to estimate
moments of the distribution for the parameter estimation [19,20].
Further approaches can be taken from stochastic epidemic models, and
are based on expectation maximization algorithms or Martingale based
approaches [21]. In the case of large numbers of molecules, there are
theoretical results available, describing the properties of a leastsquares
estimate [22]. 

This article suggests a new method for estimating parameters of
stochastic models. The method is able to cope with data from a single
stochastic trajectory, which only measures some of the species in the
system at discrete time points. The behavior of this single trajectory
might be well different from the mean or an ODE solution (Figure 1).
The approach is based on a method proposed by Bock [23], for the
parameter estimation in systems of Ordinary Differential Equations
(ODE), further developed [24], and already successfully applied to
deterministic systems with chaotic behavior [25,26]. The method can
tackle models with fully observed and partially observed data sets. The
fact that it works without stochastic simulations and without solving a
high dimensional CME means that it is possible to approach systems of
a size, as large as realistic models being tackled with ODEs. As the the
evaluation of the objective function and the numerical optimization
procedure can be separated, the article focuses on the objective
function. For the numerical optimization of this deterministic function, all methods from derivative based methods to global optimization
techniques to Bayesian methods can be applied. These techniques then
also allow for important other features, such as confidence intervals
for the parameters or experimental design. The article restricts itself
to the test of the objective function, especially with respect to different
stochastic realizations. The objective function is based on short time
ODE integration and performs successfully, even in models which
behave qualitatively different modeled stochastically (see [27] for an
example of such a model). The advantage is very high speed, since
neither solving a high dimensional CME system nor lots of stochastic
simulations are required. 

As the objective function is deterministic, the estimator for a
single stochastic time course is unique. Using the same model with the
same parameters and initial conditions, the estimation will result in a
different value due to the intrinsic stochasticity. To test the quality of
the objective function, it is therefore, necessary to estimate more than
one stochastic realization of the same model, with the same parameters
and initial conditions. This article uses 50 different realizations for each
model. A statistic over the 50 estimations can then be used to asses the
quality of the objective function. It has to be underlined that the 50
realizations are only used to test the quality of the objective function.
For an estimation one realization is sufficient. 

The method is described with equidistant time points of
measurements here, for simplicity. It is possible to apply it without
changes to non equidistant time points of measurements, which is very
important for the applicability of optimum experimental design. Due
to its structure, the method is easily able to handle measurement noise
,although it will of course reduce information. As the method uses an
ODE approximation on short time intervals for stochastic data, test
functions will be suggested to check how well this approximation works,
and it will be demonstrated that this is not problematic, even in models
with irregular stochastic oscillations. Furthermore, it is demonstrated
that the method can even be successful in models which are structurally
not identifiable, using single shooting ODE methods. As the objective
function of the method is completely deterministic it depends on the
user’s choice, whether to apply derivative based methods or methods
without derivatives for the optimization. 

The article is structured as follows: 

The method section will show how the objective function is
composed. This objective function will be extended to partially
observed models. 

The results section will show the performance of the objective
function for different models: first an ImmigrationDeath model,
which is on the one hand an instructive example, but allows on the
other hand to compare the performance of the method to an exact
analytical solution. Second is a LotkaVolterra model also treated by
Boys et al. [7], followed by a Calcium oscillation model [27]. To evaluate
the method, it is applied to 50 different data sets of each model and
statistics of the 50 resulting estimates are presented. The conclusion
section will give some concluding remarks and an outlook. 

Method 

Stochastic models describe the systems dynamics with a discrete
continuous time Markov jump process. The time dependent
probabilities of this process are described by the CME. Stochastic
modeling can lead to system’s behavior, which is qualitatively different
from the behavior of the ODEmodeled system. Qualitatively, different means that it is not just the behavior of the ODE system plus some
noise. This can be illustrated by the Calcium oscillation model of
Kummer et al. [27]. This suggests that the methods for parameter
estimation in ODE models might not be suited for stochastic models. 

It is assumed that the experimental or simulated data measures at
each time point t_{i}, i=1,…,n, the number of molecules of species v_{i} in
the system, hence . Furthermore, it is assumed that the systems
behavior depends on some unknown parameters which are to
be estimated. To this goal, the data v is compared to simulation, with
respect to some objective function F, which measures the quality of
the fit. Parameter estimation means finding the optimal value
for the
parameter: 



As mentioned in the introduction for ODE models, the choice
of the objective function has been widely discussed. The focus of this
article will be the objective function F for stochastic models. 

Due to the Markov property, the likelihood function factorizes into
the product of transition probabilities: 



The transition probability p_{θ} is generally not known, and might
either be estimated by means of simulations, or calculated by using
the solution of a high dimensional CME system, both of which is very
time consuming. The approach approximates the system on the short
time interval [t_{i−1},t_{i} ] with an ODE model. Short in this context means
relatively short with respect to the systems dynamics, as an experiment
should be designed in a way that it covers the systems dynamics. The
fact that the approximation is done only on a very short time interval is
crucial. An ODE approximation requires stronger assumptions than a
linear noise approximation, but as this approximation only has to hold
on a short time interval, it is much less restrictive than the usual linear
noise approximation [28]. Further advantages of the relatively short
integration interval have already been described in detail for parameter
estimation for ODEs [29]: a prior information on state variables from
measurement can be included in the multiple shooting scheme. This
damps influence of parameter guesses. Numerical stability is increased
as splitting the intervals reduces error propagation. 

Fully Observed Case 

The proposed approach is motivated by the methods for parameter
estimation in ODE systems introduced by Bock [23], and uses short
time ODE integration: starting at time point t_{i−1} at state v_{i−1}, the
initial value problem of the systems of ODEs is solved with initial
value, v_{i−1} at time t_{i−1} until time t_{i} . The result, , is
compared to the data point v_{i}, and the residuum is defined as 



which for the model leads to the description 



If the residuals ε are independent normally distributed random variables with mean zero and constant variance, the least squares
estimator 



with 

(1) 

is a Maximum Likelihood estimator. In general, the distribution
for a time point in a stochastic model is not known, and not necessarily
Gaussian. Hence, theoretically the properties of a Maximum Likelihood
estimator can not be guaranteed. In practice, it is possible to test if
the ε_{i} perform approximately like independent normally distributed
random variables, with mean zero. If this is the case, the estimator
might still be quite powerful. Later in this section, test functions will be
suggested which describe the properties of the residuals. The structure
of equation (1) allows handling normally distributed measurement
noise, as well without loosing the desired properties. 

As equation (1) corresponds to the multiexperiment setting,
described by Schlöder and Bock [24], it is possible to use the efficient
methods suggested there to solve the optimization problem. 

Partially Observed Models 

Assume that only the first d components of the D dimensional
vector of species ν can be observed. At time t_{0}, the unobserved states
are also used as optimization variables now. In fact,
the unobserved states are discrete numbers, but for the optimization
purpose, this condition is relaxed and they are optimized as real
numbers. For the solution of the initial value problem on [t_{i−1},t_{i} ]
instead of the full measurement v, use the observed states from the measurement and for the unobserved states, the result
of the initial value problem on the previous time interval, hence
It is possible to enlarge the optimization
vector even more, and to include also further unobserved states.
Formally, define an index set which contains all time points
at which the unobserved states will be included in the optimization
variable. Denote with the unobserved states
in the optimization vector, at time t_{j} and with , the
union of unobserved states at different time points. Now define the
completion of the observed measurement as with 



Then again, as in the fully observed case, a distance measure is used
to compare the result of the integration with the data point: 

(2) 

The case K={t_{0}} means that just the first unobserved state is
included in the optimization, and the case K={t_{0},t_{1},…,t_{n}} means that all
unobserved states are included in the optimization. The results section
will show examples for different K. 

Comments on the Approximation 

As the method uses an approximation, it is very important to
investigate if a model satisfies the approximation. Therefore, this
subsection suggests test functions to see how well the approximation
works. The mean of the residuals can be calculated. If the
model is well approximated, this should be small, in comparison with
the range of the residuals.
To see if the residuals are approximately normally distributed,
calculate the KullbackLeibler divergence of a density estimate from
the ∈_{i} and a centered normal distribution of variance restricted
to the support of the density estimate. The KullbackLeibler divergence
is then minimized over . If the KullbackLeibler divergence for
the optimal is small, this means that the ∈_{i} can be approximated
well by a normal distribution with constant variance. The point that the
variance is constant is important, as it is not possible to estimate the
variance from only one observation per time point.
Instead of calculating the correlation between the time points, it
will be calculated how long it takes until the residuals are uncorrelated.
Estimate the autocorrelation of the residuals: 



where k represents the time step and an estimate for the variance,
within the residuals ε. If ε is such that exists define the autocorrelation time as . If the total horizon
of measurements t_{n}t_{0} is only of the same size as the autocorrelation
time of the residuals act(∈) his indicates correlated residuals, which
indicate that the approximation might not be suitable. Furthermore,
it is an important question for the quality of the estimation how much
information can be found with the MSS method in the (intrinsic) noisy
system. A signal to noise ratio is defined componentwise as 



The higher the SNR value is the more information is contained in
the data. If the SNR is small (<1), many measurements are needed. 

Optimization 

In case of the fully observed system, the dimension of the
optimization problem is exactly the number of parameters, namely
q. In case of the partially observed system, the dimension of the
optimization problem is the number of parameters plus the number
of unobserved species times, the number of time points included in the
optimization, q+(D−(d+1))×length(K). 

As the objective function is completely deterministic–meaning, it is
calculated without the use of stochastic simulations–the optimization
procedure is the same, as in parameter estimation for ODE systems: it
is possible to apply derivative based methods [29], or global methods
[30], as well as including it in an approximate Bayesian framework, for
example [11]. 

Certainly the optimization of (2) with larger K can be much more
challenging than with K={0}, due to the increased dimensionality of
the optimization problem. However, the numerical optimization techniques, especially questions of local minima and identifiability
issues, as well as the question, whether derivative based or global
method are preferable shall not be the focus of this article. Nevertheless,
these points are of importance and suggested for further research. 

Results 

The result section will test the performance of the MSS objective
function, with respect to the stochastic realizations. To this aim, 50
stochastic realizations for each scenario are obtained from simulations,
using the Gillespie method [2], with the software COPASI [31]. It is
important to note that each of the stochastic realizations might look
differently. For each single time course, data set the MSS function
is optimized and a parameter estimate obtained. To investigate the
influence of the intrinsic stochasticity, the mean of the 50 estimates is
calculated to see whether the MSS functional is biased. The standard
deviation of the 50 estimates is calculated to see how much the estimates
are spreading around the mean, due to the stochasticity and the mean
relative error for simpler comparison. These values will all be shown in
result tables. 

Calculating confidence intervals for each single estimate is desirable
as well. In general, this is possible with the MSS function, applying
more advanced numerical optimization techniques, as mentioned in
the methods section. In connection with optimal design of experiment,
this is planned for further research. As the focus of the article is the
investigation of intrinsic stochasticity, data is assumed to be noise free.
The LotkaVolterra model shows that adding measurement noise leads
to the expected behavior–namely that the accuracy of the estimates
reduces with increasing level of noise. Very important for practical
applicability are partially observed models, which are treated in detail
for the models. 

ImmigrationDeath Model 

The first example is an ImmigrationDeath model: 



where X is the substance and θ_{1},θ_{2} are parameters with a representation
in ODEs 



The investigation of this system is instructive, as it allows a
comparison between the performance of the MSS functional (1), and
the result with the exact transition probability, which can only be
calculated exactly in very simple cases. 

The exact transition probability from a state v_{i1} at time point t_{i1} to
a state v_{i} at time point t_{i} can be calculated as a solution of the CME, or
with the probability generating function [32]. 

To evaluate the performance of the MSS functional (1), it is
compared to an estimation using Exact transition probabilities (EL).
To that aim, a statistics is calculated of 50 data sets, which are obtained
from simulations using the Gillespie method [2] ,with the software
COPASI [31]. The initial condition is always the steady state of the
system. This scenario is of special interest as the system is structurally
unidentifiable, using traditional ODE methods. Therefore, it is also
the most difficult scenario for parameter estimation. For each of the
data sets, the MSS objective function and the objective function with the exact transition probabilities are optimized using the software
Mathematica [33]. Then, the mean and the standard deviation of both
estimators, as well as the relative error are calculated. This procedure
is done for different parameters and designs. The results are shown in
table 1. To check the assumptions, the mean and autocorrelation time
of the residuals, as well as the SNR are calculated (The letters in brackets
indicate the chosen design, see for information table 1): 



While all other test functions give acceptable values, the SNR is
very small in table 1 (A,B). This indicates that the systems dynamics
is not well approximated by the MSS method. Using 100 observations
does in this situation not give enough information, and leads to a
biased estimate. Using many more observations as in table 1B resolves
the problem. 

The results given in table 1 lead to two conclusions: An estimation
is possible and unbiased, if there are enough measurements, table 1
(B,C). If the trajectory is very short table 1A, the estimator might be
biased. The reason for that is that for a low SNR, more measurements
are needed to sum up enough information for the estimationsee
figure 2, which for each interval [t_{i−1},t_{i} ] shows the ODE dynamics,
in form of the solution of the corresponding initial value problem as well as the residuals (dotted red line). One can see
that for this situation, the system’s dynamics are not well represented
by the ODE solutions. 

The exact method, which is only possible in this simple example
model, makes much better use of the intrinsic fluctuations, and therefore,
results in more accurate estimates. The mean and autocorrelation time of the residuals behave very well, with respect to the comments on the
residuals given in the methods section. 

This example is a proof of concept, example as it shows that an
estimation with the MSS method is possible, even in a case which
would be structurally unidentifiable for traditional methods [30]. 

The results of table 1D can be compared to Wang et al. [8], where
a different method is applied to the same setting. As there is only the
result of one estimation procedure given a comparison with respect to
unbiasedness and variance is not possible. However, 4/5 of the relative
errors of table 1D are smaller than the relative error of Wang et al. [8]. 

Table 1 (BD) also suggest that using this methods models become
identifiable, which are unidentifiable using traditional ODE methods. 

LotkaVolterra Model 

Second example which is still small, but allows investigation of
behavior in partially observed models is a LotkaVolterra model: 

(3) 

where Y^{(1)} is the prey and Y^{(2)} the predator and θ_{1}, θ_{2}, θ_{3} parameters.
The first reaction of (3) is the prey reproduction, the second the
predator reproduction, and the third is the predator death. In terms of
ODEs, this system reads as 



This chapter also includes an investigation of the estimation with
noisy data. 

Fully observed case 

To investigate the behavior of the MSS objective functional (1),
a statistics is calculated from 50 data sets which are simulated with
the Gillespie algorithm, using the software COPASI [31]. The true
parameter was θ^{(0)} =(0.5, 0.0025, 0.3) and 40 observations were taken
with Δt=1. The initial conditions were =(71, 79). The setting
is chosen in a way such that it is identically with Boys et al. [7]. A
comparison of the results will be given later in this section. 

Measurement noise is simulated as follow: For each time point,
a normally distributed random variable is generated with the given
variance. Then, it is rounded to the next integer. This is done because
measurements are assumed to be integer counts. This is not necessary
for theoretical reasons or performance of the estimation. For σ=10
or σ=25 respectively, some negative measurements occur due to the
measurement error. For σ=10, these are not corrected, which shows
that the method can even handle negative data to some amount. For
σ=25 measurements which are negative are set to zero, which leads to
an measurement error, which is not centered around zero. The results
show that the method is still able to handle that situation. Note that
setting negative measurements to zero only implies the simulated ODE
dynamics of the next time interval. For all later time intervals, the
ODE is again initialized with a measurement, which well can be larger zero again. Further studies (without measurement noise) for cases in
which the species die out show an acceptable estimation. Of course,
the amount of information for the estimation is depending on the time
point of die out. 

For all data sets, the objective function is optimized with the software
Mathematica [33]. Table 2 gives averages and standard deviation for the
50 estimation results using (1). For each estimation result, the relative
error is calculated. The test functions give the following results: for
exact measurements =(0.16, 0.07),
=(12, 11) KL:(0.2, 0.1), act(∈)=(2, 2) and SNR:(4.3, 5.8). For measurements with noise level σ=10,
they are =(0.89, 0.62),
=(19,18), KL:(0.1, 0.1), act(∈)=(1,1) and
SNR:(2.2, 2.9). For noise level σ=25, they are =(3.48, 2.52), =(40,
40), KL:(0.04, 0.04), act(∈)=(1, 1) and SNR:(1.1,1.3). For σ=25, the
mean of the residuals is not close to zero, but considering the means of
the 10%− and 90%− quantiles, −90 and 47, one sees that the residuals
still are small.This shows that the approximation with the MSS method
works well in this model. 

The MSS method estimates parameters very well in this example. The
relative error of the estimation is in the range of the relative error of the
method proposed in Boys et al. [7], where only one data set is estimated.
The mean of the residuals is close to zero and the autocorrelation time
small, compared to the total duration of observations. The signal to
noise ratio is much better than in the ImmigrationDeath case. 

Partially observed LotkaVolterra model 

Now assume that only prey can be observed. As in the completely
observed case simulated data with true parameter θ^{(0)} =(0.5, 0.0025,
0.3) and 40 observation with Δt=1 is used. Initial condition is again
= (71,79), but as only prey can be observed, only = 71
will be used for parameter estimation. The optimization variable is now
(θ_{1}, θ_{2}, θ_{3}, ). Table 3 gives averages and standard deviation for the
50 estimation results using (2) with K={0}. For each estimation result,
the relative error is calculated. The estimation performs still quite well,
even if only one species is observed. The values of the test functions are for exact measurements =0.1,
=12, KL:0.1, act(∈)=2 and
SNR:4.4. For noise with σ=10, they are =0.7, =20, KL:0.04, act(∈)=1 and SNR:2.3. For noise with σ=2 they are =3.1, =19,
KL:0.04, act(∈)=1 and SNR:1.1. The residuals are still approximately
normally distributed as in the fully observed case, see also figure 3. 

The same model is used for parameter estimation in stochastic
models from Boys et al. [7]. For comparison in this article, the same
true parameter values and initial conditions are used. Although the
method in Boys et al. [7] is exact the estimate will be a random variable
depending on the intrinsic noisy data. This is the same in the MSS
method, which has additionally an approximation error. The relative
error with the MSS method lies in 2/3 of the cases, below the smallest
relative error of the methods suggested by Boys et al. [7], who gives one
estimation result. This shows that the approximation error of the MSS
method does not influence the estimation accuracy noticeable. 

Calcium Oscillation Model 

Third model is a Calcium oscillation model [27] 

(4) 

This model shows a qualitatively different behavior between
stochastic and ODE modeling for small particle numbers as shown
in Kummer et al. [27]. In this model, the estimation of the transition
probabilities is challenging due to the large state space. 

The ODEs and parameters correspond to reactions and rate laws.
These are interpreted stochastically using the software COPASI. To
investigate the behavior of the MSS objective functional (1), 50 data sets
are simulated from the stochastic model, corresponding to the ODEs
given in (4) with the Gillespie algorithm using the software COPASI.
The true parameter and initial conditions were θ^{(0)} =(212, 2.95, 1.52,190, 4.88, 1180, 1.24, 32240, 29090, 13.58, 153000, 160) 

(Ca_{0}, g_{0}, plc_{0}) =(10, 10, 10). 

Note that for this set of parameters, the system shows irregular
oscillations with large amplitudes modeled stochastically, but shows
regular oscillations with small amplitudes modeled with ODEs. 100
observations were taken with Δt=0.5, which cover about 4 oscillation
cycles. For all data sets, the objective function (1) is optimized using
a program implemented in the software Mathematica [33]. Table 4
gives averages and standard deviation for the 50 estimation results. For
each estimation result, the relative error is calculated. The estimation
performs quite successfully. The parameter θ_{2} which determines the
oscillatory behavior of the system has very small relative error. 

Calculating for one example trajectory, the residuals for all three
species yields the following numbers for 10%−Quantile, mean and 90%−
quantile: Ca, {−197, −10, 127}; g, {−244, −1, 171}; plc, {−111, −1, 106}.
The estimated variances are =(178, 187, 83), with a KL divergence
value of (0.7, 0.2, 0.05). The calculations for the autocorrelation time,
Ca:1, g:0.5, plc:1, show that these are much smaller than the total
observation time. The SNR are (10,11,19), which states that the system’s
dynamic is well represented, see also figure 4, which shows for calcium
that the residuals (red dotted lines) are small compared to the ODE
system’s dynamics (blue line). For eg, see supplementary file 1 and
supplementary file 2. 

The hybrid approach of Mikeev and Wolf [15] would be of interest
for a comparison. But as the species change the regime from very small
molecule numbers to relatively high numbers, an automatic switching
would be required. The MSS method handles this point technically less
involved. 

Partially observed Calcium oscillation model 

Observing g: At first, assume that only g can be observed. As in the
completely observed case, simulated data with true parameter θ^{(0)} and
100 observation with Δt=0.5 is used. Initial condition is again (Ca_{0}, g_{0},
plc_{0})=(10, 10, 10), but as only g can be observed only g_{0}=10 will be used
as data input for the parameter estimation. The system is structurally
unidentifiable now on a two dimensional manifold. The reason is that
a multiple of the true amount of Ca_{0} or Plc_{0} can be compensated by
adjusting the parameter values, see supplementary file 3. Therefore, for
the optimization (Ca_{0}, plc_{0}) is fixed and the optimization problem still
has 12 optimization variables, although the system is partially observed.
Table 5 gives averages and standard deviation for the 50 estimation
results, using (2) with K={0}. For each estimation result, the relative
error is calculated. 

Calculating the averages of the residuals yields, the following
numbers for 10%−quantile, mean and 90%−quantile: g, {−285,−2.7,270}.
The estimated variance is =232, with a KL divergence value of 0.07.
The calculations for the autocorrelation time, g: 0.5, show that these are
much smaller than the total observation time. Figure 5 shows that the
system’s dynamic is well represented, SNR is 7.1 which shows that the
estimation is possible even with only one observed species. 

Observing ca: This situation leads to difficulties. An estimation
performed with the functional (2) with K={0}, as it is done for g
observed results for one example, in an estimate of =(498, 2.76, 1.57,
14,1.89, 187, 5.49, 1183, 26244, 34848, 160653, 200) which is far from
the true parameter. The calculation of the residuals yields as above with
10%−Quantile, mean and 90%−quantile: ca, {−187, 424, 1552}, which seems to indicate a bias. SNR 1.5 and the system’s dynamics are not
at all represented, see also supplementary file 4. With the knowledge
of the true parameter, the problem can be identified: For a short time
interval, the system’s dynamics are well represented, but then due to
the development of the unobserved species, it is not well represented
any longer, see figure 6. Hence, in this case, it is better to enlarge the
optimization vector and use functional (2) with K={0,5,10,…,45},
which means that also unobserved states at other time points than zero are included in the optimization vector. The results show that an
estimation is possible with this functional, see table 6. 

Now, the situation has improved. Averages of the test functions:
10%−quantile, mean and 90%−quantile of the residuals are
(−1997,162,1214). The estimated variance is =1823, with a KL
divergence value of 0.67. The autocorrelation time is 1.16 and the SNR
is 1.7. Figure 7 shows that the system’s dynamic is well represented.
This example is highly important, as it shows that the short time ODE
integration method can still be used, if the system’s behavior is very different in ODE modeling or stochastic modeling. The method is even
applicable to partially observed models. 

Discussion 

The article presents a method for parameter estimation in
stochastic models, based on short time ODE integration. The method is
able to estimate parameters, even in models which behave qualitatively
different in stochastic modeling than in ODE modeling [27]. The data
which is used as input for the method is a single stochastic trajectory,
which might be well different from the mean behavior or other stochastic
realizations. The advantage of the objective function is that it does not
need stochastic simulations, nor a solution of a high dimensional CME,
which increases its speed. Hence, it is well applicable in larger realistic
size models which would be very time consuming in simulations based
methods. 

For the numerical optimization, this allows the user to choose
between various numerical optimization methods, such as derivative
based methods, global optimization techniques or Bayesian methods.
As these are well established techniques, the article restricts its focus to
the objective function. To test this objective function, it was optimized
for three different examples. A more rigorous analysis including the
calculation of confidence intervals and experimental design would
be desirable and is planned for further research. This article focuses
on the impact of the stochasticity of the models, and its influence on
the estimation. The approximation with an ODE model on a short
time interval is not a restriction for applicability, as the test functions
show that the approximation works, even in models with qualitatively different behavior in stochastic modeling than in ODE modeling. For
that it is important that the approximation is only done on a short time
interval. In models with very few reactions per time interval, such as
the ImmigrationDeath model, the signal to noise ratio is bad, so that
many observations are necessary. But in many realistic models, it is
not possible to measure fast enough to capture every single reaction,
so this fact does not reduce the applicability of the method much.
The MSS method proposed in this article performs very well, even in
oscillatory system such as LotkaVolterra or the Calcium oscillation
model, in which a larger state space makes other approaches more time
consuming. 

Instead of using the solution of the initial value problem of the
ODE system in (1), one could also use the mean of the stochastic
simulationshence the first momentwhich could be calculated with a
momentclosure [13], with only the first moment. Although calculating
the momentclosure should be fast compared to other calculations, this
would be slightly slower than the suggested method. The models in the
results section suggest that it is in many cases not necessary. If it would
be more stable, especially in cases of partially observed models, still
has to be investigated. The results for the ImmigrationDeath model
suggest that the method yields an unbiased result compared to an exact
analytical estimation, if there are enough measurements. The standard
deviation seems to be slightly higher. Compared to a stochastic
approach by [8], the suggested method performed well: the results
for sample data sets were more accurate in 80% of the 50 sample data
sets, which still might be due to the stochastic effects as [8], provided
a single data set. The case of table 1A, in which the method is biased is a situation when only very few reactions happen per time interval,
between two points of measurements. This is in reality generally not the
case. If the systems which is under investigation has that behavior, it is
suggested to use CME based methods, which should work fast in that
case due to the very small state space. 

The benefit of the MSS method using the measurement points as
initialization for the next interval is that it resolves the identifiability
problem, which is present in this scenario of the ImmigrationDeath
model, using traditional ODE methods. 

The results of the LotkaVolterra model show that the method is
well able to provide estimates for fully and partially observed models,
comparable to the accuracy of methods using simulations [7]. 

The Calcium oscillation model was a very important test case as
the method showed good performance in estimating the parameters,
although the system’s behavior is completely different in stochastic and
deterministic modeling. For the partially observed case, the amount
of information depends on the question which species is observed.
Observing g the estimation is possible; observing only ca, the objective
function has to be modified to (2) with K={0,5,…,45}. 

The examples show that although the Maximum Likelihood
property of the method is theoretically lost, the estimation is still
quite precise. The proposed test functions work fine as they identify
those situations, which lead to a bias and “accept” the others. This
argumentation also allows for the extension to the case of measurement
noise, in which even data points with negative resulting negative
molecule counts can be used. An example with measurement noise is
given for the LotkaVolterra example. 

To conclude the article, presents a method for parameter
estimation for stochastic models. The method is based on short time
ODE integration. It is shown that the approach exhibits the desirable
properties mentioned in the introduction, that makes it applicable to
realistic problems in systems biology: it works in models which behave
qualitatively different in stochastic modeling than in ODE modeling.
It is able to cope with partially observed models with measurement
noise. As the method is deterministic and works without stochastic
simulations, it is computationally efficient. 

Points for future research are confidence intervals for the
parameters, as well as optimum experimental design. The first is crucial
for the quality assessment of the estimates, and the second helps in
saving experimental cost, whilst increasing the estimation accuracy. 

Acknowledgements 

Christoph Zimmer was supported by a ViroQuant/HGS MathComp stipend and
Sven Sahle by the Klaus Tschira Stiftung. The funders had no role in study design,
data collection and analysis, decision to publish, or preparation of the manuscript.
The authors thank Jürgen Pahle for helpful discussions. 

References 

 Raj A, van Oudenaarden A (2009) Singlemolecule approaches to stochastic gene expression. Annu Rev Biophys 38: 255270.
 Gillespie DT (1976) A general method for numerically simulating the stochastic time evolution of coupled chemical reactions. J Comput Phys 22: 403434.
 Pahle J (2009) Biochemical simulations: stochastic, approximate stochastic and hybrid approaches. Brief Bioinform 10: 5364.
 Jagaman K, Danuser G (2006) Linking data to models: data regression. Nat Rev Mol Cell Biol 7: 813819.
 Tian T, Xu S, Gao J, Burrage K (2007) Simulated maximum likelihood method for estimating kinetic rates in gene expression. Bioinformatics 23: 8491.
 Poovathingal SK, Gunawan R (2010) Global parameter estimation methods for stochastic biochemical systems. BMC Bioinformatics 11: 414.
 Boys RJ, Wilkinson DJ, Kirkwood TBL (2008) Bayesian inference for a discretely observed stochastic kinetic model. Stat Comput 18: 125135.
 Wang Y, Christley S, Mjolsness E, Xie X (2010) Parameter inference for discretely observed stochastic kinetic models using stochastic gradient descent. BMC Syst Biol 4: 99.
 Henderson D, Boys R, Proctor C, Wilkinson D (2010) Linking systems biology models to data: a stochastic kinetic model of p53 oscillations. In: The Oxford Handbook of Applied Bayesian Analysis, O' Hagan A, West M (Eds). OUP Oxford, Oxford.
 Reinker S, Altman RM, Timmer J (2006) Parameter estimation in stochastic biochemical reactions. Syst Biol (Stevenage) 153: 168178.
 Toni T, Welch D, Strelkowa N, Ipsen A, Stumpf MP (2009) Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. J R Soc Interface 6: 187202.
 Andreychenko A, Mikeev L, Spieler D, Wolf V (2011) Parameter identification for markov models of biochemical reactions.
 Gillespie CS (2009) Momentclosure approximations for massaction models. IET Syst Biol 3: 5258.
 Gillespie CS, Golightly A (2010) Bayesian inference for generalized stochastic population growth models with application to aphids. J R Stat Soc Ser C Appl Stat 59: 341357.
 Mikeev L, Wolf V (2012) Parameter estimation for stochastic hybrid models of biochemical reaction networks. HSCC 12, Beijing.
 Deuflhard P, Huisinga W, Jahnke T, Wulkow M (2008) Adaptive discrete galerkin methods applied to the chemical master equation. SIAM J Sci Comput 30: 29903011.
 Engblom S (2006) A discrete spectral method for the chemical master equation. Technical Report, Uppsala University 2006036.
 Munsky B, Khammash M (2010) Identification from stochastic celltocell variation: a genetic switch case study. IET Syst Biol 4: 356366.
 Zechner C, Ruess J, Krenn P, Pelet S, Peter M, et al. (2012) Momentbased inference predicts bimodality in transient gene expression. Proc Natl Acad Sci U S A 109: 83408345.
 Kügler P (2012) Moment fitting for parameter inference in repeatedly and partially observed stochastic biological models. PLoS One 7: e43001.
 Andersson H, Britton T (2000) Stochastic epidemic models and their statistical analysis, Vol 151 of Lecture notes in statistics. Springer, Heidelberg.
 Rempala GA (2012) Least squares estimation in stochastic biochemical networks. Bull Math Biol 74: 19381955.
 Bock H (1981) Numerical treatment of inverse problems in chemical reaction kinetics. In: Ebert K, Deuhard P, Jäger W (eds), Modelling of Chemical Reaction Systems, Volume 18 of Springer Series in Chemical Physics. Springer, Heidelberg.
 Schlöder J, Bock H (1983) Identification of Rate Constants in Bistable Chemical Reactions. In: Deuhard P, Hairer E (eds), Numerical Treatment of Inverse Problems in Differential and Integral. Birkhäuser, Berlin.
 Baake E, Baake M, Bock HG, Briggs KM (1992) Fitting ordinary differential equations to chaotic data. Phys Rev A 45: 55245529.
 Kallrath J, Schlöder J, Bock HG (1993) Least squares estimation in chaotic differential equations. Celestial Mechanics and Dynamical Astronomy 56: 353371.
 Kummer U, Krajnc B, Pahle J, Green AK, Dixon CJ, et al. (2005) Transition from stochastic to deterministic behavior in calcium oscillations. Biophys J 89: 16031611.
 Komorowski M, Finkenstädt B, Harper CV, Rand DA (2009) Bayesian inference of biochemical kinetic parameters using the linear noise approximation. BMC Bioinformatics 10: 343.
 Bock HG, Kostina E, Schlöder JP (2007) Numerical methods for parameter estimation in nonlinear differential algebraic equations. GAMM Mitteilungen 30: 376408.
 Moles CG, Mendes P, Banga JR (2003) Parameter estimation in biochemical pathways: a comparision of global optimization methods. Genome Res 13: 24672474.
 Hoops S, Sahle S, Gauges R, Lee C, Pahle J, et al. (2006) COPASIa COmplex PAthway SImulator. Bioinformatics 22: 30673074.
 Cox DR, Miller HD (1965) The Theory of Stochastic Processes. Wiley, London.
 Wolfram Research (2010) Mathematica, version 8.0. Wolfram Research, Inc Champaign, IL.


References

 Raj A, van Oudenaarden A (2009) Singlemolecule approaches to stochastic gene expression. Annu Rev Biophys 38: 255270.
 Gillespie DT (1976) A general method for numerically simulating the stochastic time evolution of coupled chemical reactions. J Comput Phys 22: 403434.
 Pahle J (2009) Biochemical simulations: stochastic, approximate stochastic and hybrid approaches. Brief Bioinform 10: 5364.
 Jagaman K, Danuser G (2006) Linking data to models: data regression. Nat Rev Mol Cell Biol 7: 813819.
 Tian T, Xu S, Gao J, Burrage K (2007) Simulated maximum likelihood method for estimating kinetic rates in gene expression. Bioinformatics 23: 8491.
 Poovathingal SK, Gunawan R (2010) Global parameter estimation methods for stochastic biochemical systems. BMC Bioinformatics 11: 414.
 Boys RJ, Wilkinson DJ, Kirkwood TBL (2008) Bayesian inference for a discretely observed stochastic kinetic model. Stat Comput 18: 125135.
 Wang Y, Christley S, Mjolsness E, Xie X (2010) Parameter inference for discretely observed stochastic kinetic models using stochastic gradient descent. BMC Syst Biol 4: 99.
 Henderson D, Boys R, Proctor C, Wilkinson D (2010) Linking systems biology models to data: a stochastic kinetic model of p53 oscillations. In: The Oxford Handbook of Applied Bayesian Analysis, O' Hagan A, West M (Eds). OUP Oxford, Oxford.
 Reinker S, Altman RM, Timmer J (2006) Parameter estimation in stochastic biochemical reactions. Syst Biol (Stevenage) 153: 168178.
 Toni T, Welch D, Strelkowa N, Ipsen A, Stumpf MP (2009) Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. J R Soc Interface 6: 187202.
 Andreychenko A, Mikeev L, Spieler D, Wolf V (2011) Parameter identification for markov models of biochemical reactions.
 Gillespie CS (2009) Momentclosure approximations for massaction models. IET Syst Biol 3: 5258.
 Gillespie CS, Golightly A (2010) Bayesian inference for generalized stochastic population growth models with application to aphids. J R Stat Soc Ser C Appl Stat 59: 341357.
 Mikeev L, Wolf V (2012) Parameter estimation for stochastic hybrid models of biochemical reaction networks. HSCC 12, Beijing.
 Deuflhard P, Huisinga W, Jahnke T, Wulkow M (2008) Adaptive discrete galerkin methods applied to the chemical master equation. SIAM J Sci Comput 30: 29903011.
 Engblom S (2006) A discrete spectral method for the chemical master equation. Technical Report, Uppsala University 2006036.
 Munsky B, Khammash M (2010) Identification from stochastic celltocell variation: a genetic switch case study. IET Syst Biol 4: 356366.
 Zechner C, Ruess J, Krenn P, Pelet S, Peter M, et al. (2012) Momentbased inference predicts bimodality in transient gene expression. Proc Natl Acad Sci U S A 109: 83408345.
 Kügler P (2012) Moment fitting for parameter inference in repeatedly and partially observed stochastic biological models. PLoS One 7: e43001.
 Andersson H, Britton T (2000) Stochastic epidemic models and their statistical analysis, Vol 151 of Lecture notes in statistics. Springer, Heidelberg.
 Rempala GA (2012) Least squares estimation in stochastic biochemical networks. Bull Math Biol 74: 19381955.
 Bock H (1981) Numerical treatment of inverse problems in chemical reaction kinetics. In: Ebert K, Deuhard P, Jäger W (eds), Modelling of Chemical Reaction Systems, Volume 18 of Springer Series in Chemical Physics. Springer, Heidelberg.
 Schlöder J, Bock H (1983) Identification of Rate Constants in Bistable Chemical Reactions. In: Deuhard P, Hairer E (eds), Numerical Treatment of Inverse Problems in Differential and Integral. Birkhäuser, Berlin.
 Baake E, Baake M, Bock HG, Briggs KM (1992) Fitting ordinary differential equations to chaotic data. Phys Rev A 45: 55245529.
 Kallrath J, Schlöder J, Bock HG (1993) Least squares estimation in chaotic differential equations. Celestial Mechanics and Dynamical Astronomy 56: 353371.
 Kummer U, Krajnc B, Pahle J, Green AK, Dixon CJ, et al. (2005) Transition from stochastic to deterministic behavior in calcium oscillations. Biophys J 89: 16031611.
 Komorowski M, Finkenstädt B, Harper CV, Rand DA (2009) Bayesian inference of biochemical kinetic parameters using the linear noise approximation. BMC Bioinformatics 10: 343.
 Bock HG, Kostina E, Schlöder JP (2007) Numerical methods for parameter estimation in nonlinear differential algebraic equations. GAMM Mitteilungen 30: 376408.
 Moles CG, Mendes P, Banga JR (2003) Parameter estimation in biochemical pathways: a comparision of global optimization methods. Genome Res 13: 24672474.
 Hoops S, Sahle S, Gauges R, Lee C, Pahle J, et al. (2006) COPASIa COmplex PAthway SImulator. Bioinformatics 22: 30673074.
 Cox DR, Miller HD (1965) The Theory of Stochastic Processes. Wiley, London.
 Wolfram Research (2010) Mathematica, version 8.0. Wolfram Research, Inc Champaign, IL.

Tables at a glance
 
 
 
 
 

Table 1  
Table 2  
Table 3  
Table 4  
Table 5  
Table 6 
Figures at a glance
 
 
 

Figure 1  
Figure 2  
Figure 3  
Figure 4 
 
 

Figure 5  
Figure 6  
Figure 7 


