Improved estimations of stochastic chemical kinetics by finite state expansion

Stochastic reaction networks are a fundamental model to describe interactions between species where random fluctuations are relevant. The master equation provides the evolution of the probability distribution across the discrete state space consisting of vectors of population counts for each species. However, since its exact solution is often elusive, several analytical approximations have been proposed. The deterministic rate equation (DRE) gives a macroscopic approximation as a compact system of differential equations that estimate the average populations for each species, but it may be inaccurate in the case of nonlinear interaction dynamics. Here we propose finite state expansion (FSE), an analytical method mediating between the microscopic and the macroscopic interpretations of a stochastic reaction network by coupling the master equation dynamics of a chosen subset of the discrete state space with the mean population dynamics of the DRE. An algorithm translates a network into an expanded one where each discrete state is represented as a further distinct species. This translation exactly preserves the stochastic dynamics, but the DRE of the expanded network can be interpreted as a correction to the original one. The effectiveness of FSE is demonstrated in models that challenge state-of-the-art techniques due to intrinsic noise, multi-scale populations, and multi-stability.


Background theory
We briefly review here the preliminary definitions and notation on stochastic reaction networks used in the paper.
Consider a set of species S . Then, N S and R S are the sets of all integer and real-valued vectors, respectively, with coordinates represented by the elements in S . For a given vector σ ∈ R S (or σ ∈ N S ), we denote by σ S the value of the component corresponding to species S ∈ S . We generalize binary operations to the case where operands σ and μ are such that σ ∈ R S 1 and μ ∈ R S 2 , with S 1 = S 2 : each binary operation treats them as elements of R S 1 ∪S 2 .
Formally, we denote a reaction network as a pair (S , R), where R is a set of reactions. Each reaction is provided as a triple in the form where ρ ∈ N S are the reactants, π ∈ N S are the products, and f is the propensity function, f : R S → R + 0 , with arbitrary form. We will use the standard notation for reactants and products whereby only the non-zero components are written out, separated by the plus sign. For instance, given the species S = {A, B, C}, equation (2.1) corresponds to the reaction when ρ A = 1, ρ B = 2, ρ C = 0 and π A = π B = 0, π C = 1. A discrete state of a reaction network is described by a vector σ ∈ N S , where σ S denotes the population of species S in that state. Then, f (σ ) is the parameter of the exponential distribution of the firing time of that reaction. Upon firing, the system may transition from state σ to σ + π − ρ, thus defining a stochastic behaviour in terms of a Markov jump process. Its dynamics is defined by the master equation. It gives the probability P σ (t) of finding the Markov chain in state σ at time t: Formally, the master equation may be defined for all σ ∈ N S . However, its solution will be nonzero only for those states that are reachable from the states that have non-zero probability at time t = 0. The reachable set of states, also called the state space, can be defined as the smallest set such that the following hold: (i) σ is in the reachable set if P σ (0) > 0; (ii) σ is in the reachable set if σ is in the reachable set and there exists a reaction ρ f − → π such that σ + π − ρ = σ .
For simplicity (and without loss of generality), we will consider networks where the initial probability distribution P σ (0) is concentrated in one state only, which is called the initial state. Additionally, we shall restrict to well-defined reaction networks where each propensity function evaluates to zero for all multisets that do not have the minimum population counts described by the reactants. Formally, a reaction network is well defined if every reaction ρ f − → π is such that f (σ ) = 0 if ρ > σ (the inequalities shall be intended component-wise from now on). This guarantees that the Markov chain does not reach states with negative population counts.
The state space, hence the number of equations required for stochastic analysis, may be finite or infinite depending on the network stoichiometries. Even in the case of finite-state spaces, its size may grow combinatorially large with the population counts of the initial state. This may practically preclude exact analysis in most models of interest. The DRE provides a compact model with |S | variables. Each variable approximates the expected population level of each species at time t, denoted by the vector X(t) ∈ R S , as the solution of the system: The true expected population counts, denoted by E [Y], are known to satisfy However this system is not self-consistent because there are no equations for the expected values of the propensity functions appearing in the right-hand sides. The DRE closes the true equations for the expected values by replacing , introducing an approximation error if the propensity functions are not linear. Under mild conditions such error is known to vanish asymptotically when the initial population levels go to infinity and the DRE is understood as a system of re-scaled equations for the concentrations of species, rather than absolute population counts [7].

Methodology (a) Finite-state expansion
With FSE, the original set of species S of a reaction network is meant to represent the continuous dynamics. This is expanded with a set of auxiliary species, each tracking a specific discrete state.
Then, the expanded network will have auxiliary species A , B , C , A + B , A + C , B + C , A + B + C and 0 , where the last species denotes the zero vector being tracked. We remark that, similarly to the definition of the master equation, it is convenient to consider all states within the upper bound when describing the theory. However, also in this case, not all discrete states may be reached depending on the stoichiometry of the reaction network, hence they can be removed in practice during the analysis.
where η and ψ are vectors of original species, o is a target auxiliary species, and f o is a modified propensity function that accounts for the fact that the reactant auxiliary species is o . Specifically, we have This modification accounts for the fact that the tracked species o encodes additional population counts, as given by o.
For example, let us consider an expansion for the reaction in equation (2.2) assuming that it evolves with mass-action kinetics. In general, for a reaction with reagents ρ and kinetic parameter k > 0, the propensity function by mass-action kinetics for state σ is given by Here the propensity function is Assuming that the upper bounds are O A = O B = O C = 1, the expansion for the tracked state A + B is given by Since the tracked state A + B does not have enough copies of B, one further copy is used by the buffer species. The product of this reaction does not involve any of the buffer species because C is within the chosen bounds. By equation (3.3), f in the expanded reaction is We denote by R O the full set of reactions in the expanded network.
Every expansion is stochastically equivalent to the original network, in the sense that there is a unique marginal probability distribution for the overall population of each species at every time point. This equivalence can be stated in the sense of ordinary lumpability for Markov chains [12]. That is, with the master equation we show that the probability of being in a state in the original reaction network equals the sum of the probabilities across all states in the expanded network with the same overall abundances for each species. This relation holds at all time points, provided that it is satisfied for the respective probability distributions at time 0. 1 Theorem 3.1. Let P andP denote the solutions of the master equation in the original and expanded network, respectively. Then, for all t, By construction, if O = 0 then the original and expanded networks coincide. The other limit case, when the auxiliary set of species contains all discrete states, corresponds to a fully expanded reaction network. In this case, the DRE of the expanded network corresponds to the master equation of the original network, hence no approximation occurs.
reaction network with finite state expansion coupled mass-action propensity conditional probability (c) Figure 1. The FSE method applied to the Schlögl system [29]. (a) Mass-action reactions with kinetic parameters taken from Li et al. [33]: k 1 = 0.03, k 2 = 0.0001, k 3 = 200, k 4 = 3.5. (b) Stochastic simulations show the well-known bimodality of the steady-state probability distribution of species X. (c) For a given upper bound O X on the population of species X to be tracked explicitly, FSE yields the auxiliary species denoted by 0 , 1 , ..., O X . The original species X acts as buffer that collects untracked populations levels. For example, reaction R1.1 derives from reaction R1 when the autocatalytic formation of a new molecule occurs when the system tracks the discrete state O X , thus requiring to increase the buffer species X by one element. Even when the system tracks a discrete state that does not require buffering (R1.2), the propensity function f 1 (n) of the reaction effectively considers an overall kinetics of mass-action type, since the factor k 1 (X + n)(X + n − 1)/2 models the total rate due to number of possible collisions between pairs of X + n indistinguishable molecules. Intuitively, the factor n conditions these events to the system tracking n discrete molecules. (d) The original state space counts the number of copies of X. The state space in the expanded network consists of the pair tracked discrete state/population level of the buffer species. By theorem 3.1, the sum of the probabilities across all pairs that have the same overall population matches the corresponding probability in the original Markov chain (as exemplified by matching colours of the states). (e) The single-dimensional DRE of the original Schlögl model is expanded into a DRE with O X + 1 variables (where I denotes the indicator function); an estimate of the total mean population at time t can be computed as X(t) + n n · n (t).

Results
With a number of case studies from the literature, here we show that FSE can refine the accuracy of mean estimates of species populations, even with modest expansions. Ground-truth mean trajectories were computed by stochastic simulation via Gillespie's algorithm [2]. The numerical experiments herein reported were performed with an implementation of FSE within the tool ERODE [28], publicly available at https://www.erode.eu. DRE . The DRE approximation is unaffected by the variation of m while the true population averages are increased at higher burst sizes. Corrections for species X 3 are similar with a generally smaller error (cf. figure 12c). Kinetic parameters were set as follows: For a burst size of m = 5, the parameters are as in Grima [14]. Initial condition was the zero state. (Online version in colour.)

(a) Schlögl system
The well-known Schlögl system is an autocatalytic process for a single species X [29]. The DRE of the original Schlögl model has two equilibrium points, owing to its strong (cubic) nonlinearity [30], deterministically converging only to one [31]. Its discrepancy with respect to the average mean trajectory computed by stochastic simulation has been observed for a long time [32].

(b) Heterodimerization model
We now consider a model from Grima [14], where the main source of noise is the variance caused by the production of the two species X 1 and X 2 undergoing heterodimerization occurs in bursts of size m: We study two cases with m = 5 and m = 8; for a better comparison the influx rates were kept constant by setting k 1 = 2500/m. Because of the symmetries between X 1 and X 2 we consider equal observation bounds for them. The observation bound on X 3 was set to zero. In this model, the DRE is insensitive to the choice of m while the stochastic trajectories do depend on the burst size. Since a larger m introduces more noise, larger observation bounds are needed to increase the accuracy of the approximation (figure 2).

(c) Protein degradation model
The model of enzyme-catalysed protein degradation from Grima [14] allows us to study the behaviour of FSE when more species are to be tracked with non-zero population bounds than in the previous two case studies. Here, protein P is generated in bursts of size m and can bind to catalyst enzyme E, forming the enzyme-substrate complex C.  enzyme, it can be degraded, forming P d . The total amount E T of catalyst enzyme in the system always remains constant: By varying the burst production rate k 1 , different saturation levels of the catalyst enzyme are reached. The closer the ratio α = mk 1 /k 4 E T is to 1, the more saturated the enzyme becomes with substrate.
The parameters given in [14] for this model assume a burst size of m = 30 and a total enzyme population of E T = 60. For an accurate approximation, FSE requires non-zero observation bounds for E, C, and P (figure 3); similarly to the previous case, no observation bound is required for species P d . The size of the tracked state space grows proportionally to the product O E · O C · O P . Additionally, in contrast to the previous model, burst production rates are not adjusted when increasing the burst size here. The overall higher production rate increases the population additionally to the effect caused by higher variances. This exacerbates the need for higher observation bounds at higher burst sizes. For a fixed choice of the observation bounds, larger values of α tend to worsen the accuracy of the FSE approximation.

(d) Feedback switch
Let us now consider a model of a genetic feedback switch taken from [34,35]: Species D u and D b represent the state of a single gene when its promoter region is unbound (respectively, bound) to a protein P. The reaction propensities obey mass-action dynamics through the kinetic parameters on the arrows. This is a basic model for negative autoregulation, a well-known motif appearing in more than 40% of the known transcription factors in Escherichia

(e) Toggle switch
The toggle switch network is a fundamental regulatory system of two mutually repressing genes [37]. Its mathematical modelling is challenging because of multimodality [24,38], as well as stochastic noise due to the species such as mRNA present in low molecular abundances [39]. Here we study the reaction scheme analysed in [40], consisting of a mass-action variant from [37]: where M i and S i denote the precursor mRNA and the mRNA for target protein P i , respectively. The last two reactions model mutual inhibition by means of a precursor of one protein repressing the mRNA of the other. When protein production is controlled by low populations of precursor mRNA, the stochastic fluctuations are not adequately approximated by the DRE. By explicitly observing few copies of mRNA (up to tens), FSE provides precise estimates of the time courses of the mean populations ( figure 5). The resulting systems of equations, of size at most 2310, can be analysed effectively, as opposed to time-consuming simulations using hybrid stochastic-deterministic approaches such as in [40].

Comparison with related work
Using the models presented in the previous section, here we compare FSE with state-of-the-art analytical techniques that can be used to obtain approximate estimates of mean population levels in stochastic reaction networks. Specifically, we considered the following methods:  -Moment-closure approximation (MCA). We considered the second-order low-dispersion moment closure [41,42], in which variance and covariance are the highest observed moments and all higher-order central moments are set to zero; in all models considered in this paper, computing approximations with higher-order moments did not improve the quality of the approximation. -The effective mesoscopic rate equation (EMRE), which adds mean-correction terms to the linear-noise approximation under the assumption of an underlying Gaussian process [14]. -The method of conditional moments (MCM), a hybrid analytical technique combining a discrete representation of low-abundance species and a moment-based approximation of high-abundance ones [16]. -Finite-state projection (FSP), a well-known method to obtain a finite-dimensional master equation through a truncation of the state space by redirecting transitions towards unobserved states into an absorbing state with provable bounds [43].
For this study, we used an implementation of these techniques as available on the software tool CERENA [41]. The Schlögl system is known to stress MCA because of their reported difficulties with multimodal distributions [44,45]. Figure 6 shows that MCA behaves similarly to DRE in this case, while EMRE tends to overestimate the mean population of species X at longer time horizons. Similar results were obtained on the toggle switch network (figure 7). Here we confirm physically meaningless moment-closure estimates due to the presence of low-abundance species, as already reported in [22]. Furthermore, MCM could not be tested on this model since its implementation returned with an error.
Both MCA and EMRE provide accurate estimates in the heterodimerization model from equation (4.1) (figure 8). This model also allows examining the effects of burst size variation without also changing the total production rate. We used this model for a closer comparison to EMRE in this regard. Figure 9 shows that, similarly to FSE (with fixed observation bounds) and  the DRE approximation, the approximation error of EMRE increases with the burst size m. For each tested burst size, figure 9 also marks (secondary y-axis) the minimum FSE observation bound O X 1 /X 2 for which the accuracy of EMRE can be matched. In this study, the highest observation bound O X 1 /X 2 = 27 is enough to match EMRE's accuracy in the range 5 ≤ m ≤ 8. For burst sizes larger than 8, the loss of accuracy in EMRE outpaces the analogous effect in FSE, to the degree that the bound to match EMRE becomes small. The accuracy of FSE can be increased further by raising the bound above the marked values (figure 12c). The main challenge of the protein degradation model in equation (4.2) for FSE is the rapid growth of the state space as a function of the observation bounds. Using the observation bounds as in figure 3, FSE is outperformed by both EMRE and MCA for α = 0.8. For α = 0.6, FSE is more accurate than EMRE; however, it uses a significantly more complex system with 9214 differential equations (figure 10).
In the genetic feedback switch model, species D u and D b describe the distinct binary states of a single gene. Hence they represent the natural candidates of the low-abundance class when applying MCM. On this model, however, the method could not return valid results as early as time point 0.36. We further tested a gene regulatory model with an inhibition feedback loop taken from [46], where it was studied using a hybrid stochastic/deterministic method based on  Figure 9. Direct comparison against EMRE on the heterodimerization model across different burst sizes. Although it is consistently better than the DRE (broken line, shown for reference) for all tested values, the relative error of EMRE to the stochastic simulation rises on all species with increasing burst size. In green, the graph shows for each burst size the FSE bound O X 1 /X 2 that is necessary to match the accuracy of EMRE. (Online version in colour.) piecewise deterministic Markov processes. Here, MCM showed similar difficulties that confirm already reported numerical issues [3]. (The numerical results of this analysis, not shown here, are replicable using the supporting data of this article.) Stochastic models of gene expression such as these are amenable to domain-specific techniques (e.g. [25] and references therein), which may prove more effective. For example, in appendix B we compare FSE against the linear mapping approximation method from [25], showing that FSE requires larger systems of differential equations to obtain a similar level of accuracy. Defining incoming and outgoing transitions with respect to the buffer species maintained in the expanded network represents a crucial difference with FSP, where transitions towards unobserved state are collapsed into a sink state that absorbs the probability mass. Experimentally, this results in increased accuracy of mean estimates by FSE when tracking the same subset of the state space in both methods. For example, figure 11 shows this effect on the Schlögl model; a similar behaviour could be found in the other case studies (not reported here). More recently, variants of FSP have been proposed to cope with the decay of the probability mass, e.g. to obtain solutions that are able to estimate stationary distributions and passage times [47][48][49]. By means   figure 1f (450 and 650), FSE estimates the average population of X more precisely than finite-state projection (stochastic simulation performed with 100 000 runs). In particular, while FSE requires a state space with 650 states to accurately match stochastic simulations, FSP needs 750 states. (Online version in colour.) of a comparison against the most recent improvement by [49], in appendix C we show that FSE may provide accurate estimates using systems of fewer equations.

FSE bound matching EMRE
The solution by the aforementioned state-space truncation methods is a lower bound on the true probability distribution, and increasing the set of observed states tightens that bound. Instead, although FSE ensures that the expansion coincides with the master equation when the whole state space is tracked, it does not give theoretical guarantees on the degree of accuracy, nor does it guarantee monotonically increasing accuracy with larger observation bounds. Indeed, experimentally we confirmed that monotonicity of the error is model dependent. For instance, the relative percentage error between the mean population predicted by FSE and the estimated mean by stochastic simulation is not monotonic in the Schlögl system (figure 12a), while it is monotonic for the other case studies (figure 12b-e).

Conclusion
We have presented FSE as a novel analytical method that offers a trade-off between the exactness of the solution of the master equation and the approximation errors introduced by the DRE for stochastic reaction networks. FSE maintains a user-defined subset of the discrete state space and couples this with whole-population continuous dynamics to account for the behaviour of states that are not explicitly tracked. By an algorithmic translation of a network into an expanded one with auxiliary species and modified reactions, FSE leads to equations that can be interpreted as a refinement of the original DRE. A theoretical result of asymptotic correctness increases the confidence as to the effectiveness of the method, since it shows that the DRE of the expanded network corresponds to the original master equation.
The performance of FSE in correcting the original DRE when tracking a strict subset of the discrete state space has been shown numerically in models that turn out to be challenging for related state-of-the-art techniques. The effective mesoscopic rate equation relies on perturbation arguments around the linear-noise approximation, hence it inherently assumes a limiting regime, unlike FSE. Experimentally, we found that this resulted in less accurate mean estimates than FSE. With respect to analytical approximations of the master equation based on moment closure, the case studies proved difficult since the analyses returned unphysical results or exhibited numerical issues, as also reported in the literature. Unlike state-space truncation methods based on finite state projection, FSE does not provide error bounds with respect to the original master equation. Yet, we found excellent accuracy when the observed state space is large enough. Overall, these findings make FSE a useful tool to study chemical reaction networks for which exact stochastic analysis through the master equation is not accessible. Currently, the FSE translation provides a refinement of mean estimates. It is an interesting line of future work to extend the method to compute improved approximations of higher-order moments.
Despite the encouraging results herein reported, the applicability of FSE may not always be feasible. Since it is based on an enumeration of the discrete state space, FSE may too suffer from combinatorial complexity, such that the number of equations can grow rapidly large as a function of the observation bound. For larger networks in particular, this may require relatively small tracked state spaces to keep reasonable computational costs. On the other hand, if significant probability mass falls outside the tracked state space then the performance of FSE may not be adequate, as the Schlögl system shows when small enough bounds are used.
There are a number of methods that are worth investigating in the future in order to tackle these challenges. Model-reduction techniques could help relieve the computational cost of the analysis of the DRE by providing a lower-order approximation that preserves the dynamics of interest [50,51]. It would be particularly beneficial to study either bounds or monotonicity properties on the FSE approximation error in order to develop adaptive strategies to find optimal values for the observation bounds. In principle, there might be other expansions than the one presented here, which give rise to different correction behaviour of the DRE while still preserving the stochastic dynamics. A further line of improvement might consist in devising variants of FSE where the tracked state space can be arbitrarily fixed, instead of being dependent on an upper bound for the population counts. This might allow the fine-tuning of the choice of the discrete region where the probability mass is mostly concentrated. For such expansions, smaller observation bounds (hence lower computational cost) may suffice to obtain the same degree of accuracy as in this paper, thus potentially extending the practical applicability of FSE to more complex models. This appendix provides accompanying material for the main text. This section shows the proofs of theorems 3.1 and 3.2 in the main text. Appendix C presents the comparison of FSE against the method of slack reactants [49].
Notation. We define the following operations for any two σ , μ ∈ R S .
-Projection: Given P ⊆ S , σ |P ∈ R P is such that (σ |P ) P = σ P for all P ∈ P.
-Mapping: Given P ⊆ S , and a function m : S → P, σ m ∈ R P is such that P σ m = m(S)=P S σ , for all P ∈ P.
With these, the quantities in equation (3.2) in the main text can be rewritten as follows: (a) Proof of theorem 3.1 Let P andP denote the solutions of the master equation in the original and expanded network, respectively. Then for all t.
Proof. We prove the following equivalence for the derivatives of the solutions of the respective master equations from which the statement holds under the assumption of consistent initial conditions.
Now we consider the other limit case, namely when the auxiliary set of species contains all discrete states, corresponding to a fully expanded reaction network. In this case, the DRE of the expanded network corresponds to the master equation of the original network, hence no approximation occurs.

(b) Proof of theorem 3.2
In order to prove theorem 3.2 in the main text, we prove two preliminary results stated as lemmata. (ii) Lemma A.2 The following lemma proves that the expansion preserves the overall population jumps. That is, for each original reaction, every expanded reaction is such that each species is subject to the same change of its abundance level.
For case (2): For case (3): Proof. (Case i). This statement holds if, whenever X S (t) = 0, then dS X (t)/dt = 0 for all S ∈ S . The DRE for the expanded reaction network can be written as follows: Since o ∈ N S , every expanded reaction o + η f o −→ o + ψ will be such that ψ S = 0 for each S ∈ S , hence π S = 0 in equation (A 3). Let us now assume towards a contradiction that dS X (t)/dt =  method (SRM) [49], which has been shown to offer the best performance against the other state-space truncation techniques. The idea of SRM is to add 'buffer species' that turn a reaction network with an infinitestate space into one with a finite-state space, but without absorbing states, while still preserving convergence properties as with the original FSP. Eliminating absorbing states allows for stochastic analysis over long time horizons, such as first passage times and estimations of stationary distributions.
We compare FSE against SRM an example provided by the authors of SRM in ( [49], fig. 3a). The chemical reaction network is reported here for convenience:  Comparison between FSE and SRM using the example in [49]. The plots show the average populations of species X 1 and X 2 (computed by stochastic simulation); their DRE approximation; the approximation by FSE by setting O X 1 = O X 2 = 5; and two estimates by SRM by setting the initial population of the buffer species Y equal to 10 and 30, respectively (with the master equation solved by stochastic simulation). The SRM setting Y = 10 gives a comparable number of equations than the FSE setting; however, the accuracy of the mean approximation is worse than DRE. On the other hand, the SRM setting Y = 30 gives estimates of comparable accuracy to those by FSE, but it gives rise to 256 equations instead of 37 as in FSE. (Online version in colour.) a comparison between FSE and SRM can be done by measuring the difference between the estimations of the average populations given by the solution of the master equation of SRM and the DRE of FSE when both methods are set such that they give rise to systems of equations of similar size. Figure 14 shows the results of this comparison, indicating that SRM needs ca seven times more equations to estimate the average population of the original species X 1 and X 2 in this model.