SIR Model

Here, we consider a rare event estimation problem for a susceptible-infectious-removed (SIR) model. Note that the same problem appears in Cui, Dolgov, and Scheichl (2024), where it is described in further detail.

Problem Setup

We consider an SIR model with \(K \in \mathbb{N}\) compartments. The percentages of susceptible, infectious and recovered individuals in the \(k\)-th compartment at time \(t \in [0, T]\) are denoted by \(S_{k}(t)\), \(I_{k}(t)\) and \(R_{k}(t)\), respectively. The dynamics of \(S_{k}(t)\), \(I_{k}(t)\) and \(R_{k}(t)\) are governed by the ODEs

\[ \begin{align} \frac{\mathrm{d}S_{k}}{\mathrm{d}t} &= -\theta_{k}S_{k}I_{k} + \frac{1}{2}\sum_{j \in \mathcal{J}_{k}}(S_{j} - S_{k}), \\ \frac{\mathrm{d}I_{k}}{\mathrm{d}t} &= \theta_{k}S_{k}I_{k} - \nu_{k}I_{k} + \frac{1}{2}\sum_{j \in \mathcal{J}_{k}}(I_{j} - I_{k}), \\ \frac{\mathrm{d}R_{k}}{\mathrm{d}t} &= \nu_{k}I_{k} + \frac{1}{2}\sum_{j \in \mathcal{J}_{k}}(R_{j} - R_{k}), \end{align} \]

where \(\mathcal{J}_{k}\) denotes the set of neighbours of the \(k\)-th compartment, and the parameters \(\theta_{k} \in \mathbb{R}\) and \(\nu_{k} \in \mathbb{R}\) represent the infection and recovery rates of the \(k\)-th compartment.

In this example, we consider a model with \(K = 9\) compartments, where the connectivity of the compartments is based on the adjacency of the nine Austrian states (see Cui, Dolgov, and Scheichl 2024, fig. 1). For the initial condition, we set \(S_{1}(0) = 99, I_{1}(0) = 1, R_{1}(0) = 0\), and \(S_{k}(0) = 100, I_{k}(0) = 0, R_{k}(0) = 0\) in all other compartments. We specify the final time to be \(T = 5\).
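For concreteness, a minimal sketch of the right-hand side of this ODE system is given below. The function name and the representation of the connectivity as a symmetric 0-1 adjacency matrix, adj, are our own illustrative choices; the SIRCompartmentModel class used later encapsulates the actual implementation.

import torch
from torch import Tensor

def sir_rhs(u: Tensor, thetas: Tensor, nus: Tensor, adj: Tensor) -> Tensor:
    """Right-hand side of the compartmental SIR ODEs. Here, u has
    shape (3, K) and contains the current values of S, I and R, and
    adj is a symmetric (K, K) 0-1 adjacency matrix."""
    S, I, R = u

    def couple(v: Tensor) -> Tensor:
        # Coupling term: 0.5 * sum_{j in J_k} (v_j - v_k)
        return 0.5 * (adj @ v - adj.sum(dim=1) * v)

    dS = -thetas * S * I + couple(S)
    dI = thetas * S * I - nus * I + couple(I)
    dR = nus * I + couple(R)
    return torch.stack([dS, dI, dR])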

Prior

To parametrise the prior, we treat all parameters, \(\boldsymbol{x} := [\theta_{1}, \nu_{1}, \dots, \theta_{K}, \nu_{K}]^{\top}\), as independent and uniformly distributed on the interval \([0, 2]\); that is,

\[ \pi_{0}(\boldsymbol{x}) \propto \prod_{k=1}^{2K} \mathbb{I}_{[0, 2]}(x_{k}), \]

where \(\mathbb{I}_{[0, 2]}(\cdot)\) denotes the indicator function for the interval \([0, 2]\).
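For illustration, drawing samples from this prior is straightforward (a one-off sketch; the DIRT machinery introduced later handles the prior through a preconditioning mapping instead):

import torch

K = 9
xs = 2.0 * torch.rand(1000, 2 * K)  # 1000 samples, uniform on [0, 2]^(2K)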

Likelihood

We assume that we have access to noisy observations of the percentage of infected people in each compartment at time points \(t_{j} = 5j/6\), where \(j \in \{1, 2, 3, 4, 5, 6\}\). Each observation is corrupted by independent Gaussian noise; that is,

\[ y_{k, j} = I_{k}(t_{j}) + \eta_{k,j}, \quad \eta_{k, j} \sim \mathcal{N}(0, 1), \quad k = 1, \dots, K, \quad j = 1, \dots, 6. \]

Therefore, the likelihood takes the form

\[ \mathcal{L}(\boldsymbol{x}; \boldsymbol{y}) \propto \exp\left(-\frac{1}{2}\sum_{k=1}^{K}\sum_{j=1}^{6}\left(I_{k}(t_{j}; \boldsymbol{x}) - y_{k,j}\right)^{2}\right). \]

Rare Event

We wish to quantify the probability (under the posterior, \(\pi^{y}(\cdot)\), associated with the above likelihood and prior) that the percentage of infected people in compartment \(k=9\) exceeds the threshold \(I_{\mathrm{max}} = 69\) at any point in the time interval under consideration. That is, we aim to quantify the probability \(\mathbb{P}_{\pi^{y}}(\mathcal{F})\), where the set \(\mathcal{F}\) is defined by

\[ \mathcal{F} := \left\{\boldsymbol{x} \,|\, \max_{t \in [0, 5]} I_{K}(t; \boldsymbol{x}) > I_{\mathrm{max}}\right\}. \]
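Given infected-percentage trajectories evaluated on a fine time grid, membership of \(\mathcal{F}\) can be checked by taking the maximum of \(I_{K}\) over the grid. A small sketch (the trajectory layout is an assumption on our part; below, this role is played by model.response_func):

import torch
from torch import Tensor

def in_rare_event(I_traj: Tensor, I_max: float = 69.0) -> Tensor:
    """Check membership of F. I_traj is assumed to have shape
    (batch, num_times, K); the final compartment is k = K."""
    return I_traj[:, :, -1].max(dim=1).values > I_max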

Rare Event Estimator

We now describe how we can use importance sampling techniques to compute the aforementioned rare event probability. First, note that the probability \(\mathbb{P}_{\pi^{y}}(\mathcal{F})\) can be recast as the expectation

\[ \mathbb{P}_{\pi^{y}}(\mathcal{F}) = \mathbb{E}_{\pi^{y}}[\mathbb{I}_{\mathcal{F}}(\boldsymbol{x})] = \int \mathbb{I}_{\mathcal{F}}(\boldsymbol{x}) \pi^{y}(\boldsymbol{x}) \, \mathrm{d}\boldsymbol{x}, \]

where \(\mathbb{I}_{\mathcal{F}}(\cdot)\) denotes the indicator function associated with set \(\mathcal{F}\). This can be approximated using a Monte Carlo estimate, as

\[ \mathbb{P}_{\pi^{y}}(\mathcal{F}) \approx \frac{1}{N}\sum_{i=1}^{N}\mathbb{I}_{\mathcal{F}}(\boldsymbol{x}^{(i)}), \quad \boldsymbol{x}^{(i)} \sim \pi^{y}(\cdot). \]

However, we are unable to simulate directly from the posterior (nor would this be an efficient way to estimate a rare event probability); instead, we will use an importance sampling estimate. Given an importance sampling density \(p(\cdot)\) such that \(\mathrm{supp}(\pi^{y}) \subseteq \mathrm{supp}(p)\), such an estimate takes the form

\[ \begin{align} \mathbb{P}_{\pi^{y}}(\mathcal{F}) &= \mathbb{E}_{p}\left[\frac{\mathbb{I}_{\mathcal{F}}(\boldsymbol{x})\pi^{y}(\boldsymbol{x})}{p(\boldsymbol{x})}\right] \\ &\approx \frac{1}{N}\sum_{i=1}^{N}\frac{\mathbb{I}_{\mathcal{F}}(\boldsymbol{x}^{(i)})\pi^{y}(\boldsymbol{x}^{(i)})}{p(\boldsymbol{x}^{(i)})}, \quad \boldsymbol{x}^{(i)} \sim p(\cdot). \end{align} \tag{1}\]
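As a quick aside, the following toy example (ours, not part of the SIR problem) illustrates why importance sampling is attractive for rare events: we estimate \(\mathbb{P}(Z > 4) \approx 3.17 \times 10^{-5}\) for \(Z \sim \mathcal{N}(0, 1)\) using a proposal centred in the rare event region. A plain Monte Carlo estimate with the same number of samples would, on average, observe fewer than one event.

import torch

target = torch.distributions.Normal(0.0, 1.0)
proposal = torch.distributions.Normal(4.0, 1.0)  # centred on the event

zs = proposal.sample((10_000,))
# Weights: indicator of the event times the density ratio
log_ws = target.log_prob(zs) - proposal.log_prob(zs)
p_hat = ((zs > 4.0) * log_ws.exp()).mean()
# p_hat is close to 1 - Phi(4) ~ 3.17e-5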

The optimal choice of importance density, \(\hat{p}(\cdot)\), is of the form

\[ \hat{p}(\boldsymbol{x}) \propto \mathbb{I}_{\mathcal{F}}(\boldsymbol{x})\pi^{y}(\boldsymbol{x}). \]

We will construct a DIRT approximation to \(\hat{p}(\cdot)\); this will be denoted by \(\tilde{p}(\cdot)\).

Note, however, that we cannot evaluate the posterior density itself, which is required to compute the estimate in Equation 1; we can only evaluate the unnormalised posterior density, \(\bar{\pi}^{y}(\boldsymbol{x}) := \mathcal{L}(\boldsymbol{x}; \boldsymbol{y})\pi_{0}(\boldsymbol{x})\), which is related to the posterior by \(\pi^{y} = \bar{\pi}^{y}/Z\), where \(Z\) denotes the (unknown) normalising constant. To account for this, we will estimate \(Z\) using a separate importance sampling estimate. Given an importance sampling density \(q(\cdot)\) such that \(\mathrm{supp}(\pi^{y}) \subseteq \mathrm{supp}(q)\), this estimate takes the form

\[ \begin{align} Z = \int \bar{\pi}^{y}(\boldsymbol{x}) \, \mathrm{d}\boldsymbol{x} &= \mathbb{E}_q\left[\frac{\bar{\pi}^{y}(\boldsymbol{x})}{q(\boldsymbol{x})}\right] \\ &\approx \frac{1}{N}\sum_{i=1}^{N} \frac{\bar{\pi}^{y}(\boldsymbol{x}^{(i)})}{q(\boldsymbol{x}^{(i)})}, \quad \boldsymbol{x}^{(i)} \sim q(\cdot). \end{align} \tag{2}\]
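As a further toy illustration (again ours): estimating the normalising constant \(\int \exp(-x^{2}/2) \, \mathrm{d}x = \sqrt{2\pi} \approx 2.507\) of an unnormalised Gaussian density, using a heavy-tailed Student-t proposal whose support covers the whole real line:

import torch

proposal = torch.distributions.StudentT(df=3.0)

xs = proposal.sample((10_000,))
# Weights: ratio of the unnormalised target, exp(-x^2 / 2), to the
# proposal density
log_ws = -0.5 * xs.square() - proposal.log_prob(xs)
Z_hat = log_ws.exp().mean()  # close to sqrt(2 * pi) ~ 2.507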

In this case, the optimal choice of importance density, \(\hat{q}(\cdot)\), is equal to the posterior itself; that is,

\[ \hat{q}(\boldsymbol{x}) \propto \pi^{y}(\boldsymbol{x}). \]

We will also construct a DIRT approximation to \(\hat{q}(\cdot)\); this will be denoted by \(\tilde{q}(\cdot)\).

Let \(\tilde{P}_{N}\) denote an estimate in the form of Equation 1, computed using \(N\) samples from \(\tilde{p}(\cdot)\) and with the unnormalised density \(\bar{\pi}^{y}(\cdot)\) in place of \(\pi^{y}(\cdot)\), so that \(\tilde{P}_{N} \approx Z \, \mathbb{P}_{\pi^{y}}(\mathcal{F})\). Similarly, let \(\tilde{Q}_{N}\) denote an estimate of \(Z\) in the form of Equation 2, computed using \(N\) samples from \(\tilde{q}(\cdot)\). We can then estimate the rare event probability using the ratio of these two estimates, \(\tilde{R}_{N}\); that is,

\[ \mathbb{P}_{\pi^{y}}(\mathcal{F}) \approx \tilde{R}_{N} = \frac{\tilde{P}_{N}}{\tilde{Q}_{N}}. \tag{3}\]

We note that the estimator \(\tilde{R}_{N}\) is biased (it is the ratio of two unbiased estimators), but this bias is negligible; see Cui, Dolgov, and Scheichl (2024).
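The following quick simulation (ours, not part of the SIR example) illustrates this point: the ratio of two independent, unbiased sample means is biased, but the bias decays at a rate of \(O(1/N)\).

import torch

mu_P, mu_Q = 2.0, 4.0
reps = 1_000_000

for n in (10, 100, 1000):
    # Sample means of n iid N(mu, 1) variables, generated directly
    P_bars = mu_P + torch.randn(reps) / n ** 0.5
    Q_bars = mu_Q + torch.randn(reps) / n ** 0.5
    bias = (P_bars / Q_bars).mean() - mu_P / mu_Q
    # The estimated bias shrinks roughly like 1/n (up to Monte Carlo
    # noise)
    print(f"n = {n}, estimated bias = {float(bias):.1e}")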

Implementation in \(\mathtt{deep\_tensor}\)

from typing import Tuple

import torch
from torch import Tensor

import deep_tensor as dt
from examples.sir import SIRCompartmentModel, austria_adjacency
from examples.plotting import set_plot_style
torch.manual_seed(0)
torch.set_default_device("cuda" if torch.cuda.is_available() else "cpu")
set_plot_style()

We begin by defining the model.

num_compartments = 9
dim = 2 * num_compartments

# Specify initial condition
S0 = torch.full((num_compartments,), 100.0)
I0 = torch.zeros((num_compartments,))
R0 = torch.zeros((num_compartments,))
S0[0] = 99.0
I0[0] = 1.0

# Specify time horizon and evaluation times
t1 = 5.0
t_eval = torch.linspace(0.0, t1, 13*12+1)
inds_obs = torch.arange(0, 13*12+1, 13)

model = SIRCompartmentModel(austria_adjacency, S0, I0, R0, t_eval, inds_obs)

Next, we generate a set of synthetic observations using the true parameters \(\boldsymbol{x}_{\mathrm{true}} = [0.1, 1, \dots, 0.1, 1]^{\top}\). Note that, in the code below, the noise term is commented out, so the data are left unperturbed.

# Generate synthetic observations
true_param = torch.tensor([[0.1, 1.0] * num_compartments])
ys_true = model.get_obs(model.solve(true_param))

std_noise = 1.0
# The data are left noise-free here; uncomment the noise term to 
# perturb them
ys_obs = ys_true  # + std_noise * torch.randn_like(ys_true)

Target Functions

Next, we define the target functions for the numerator and denominator of the ratio estimator.

Note that the prior is uniform and we restrict the DIRT approximation domain to the support of the prior, so the posterior density is proportional to the likelihood function.

# Define numerator function. Note that this returns the potential 
# function associated with the posterior, and the maximum value of the 
# percentage of infected people over the time horizon, for each set of 
# parameters.
def rare_event(xs: Tensor) -> Tuple[Tensor, Tensor]:
    ys_full = model.solve(xs)
    ys = model.get_obs(ys_full)
    neglogliks = (ys-ys_obs).square().sum(dim=1) / (2*std_noise**2)
    response = model.response_func(ys_full)
    return neglogliks, response

# Define denominator function
def neglogpost(xs: Tensor) -> Tensor:
    ys = model.get_obs(model.solve(xs))
    neglogliks = (ys-ys_obs).square().sum(dim=1) / (2*std_noise**2)
    return neglogliks

I_max = 69.0

rare_event_func = dt.RareEventFunc(rare_event, threshold=I_max)
neglogpost_func = dt.TargetFunc(neglogpost)

Preconditioner

We now specify a reference density and a preconditioning mapping. Following Cui, Dolgov, and Scheichl (2024), we will use a Gaussian reference density with support truncated to \([-3, 3]^{2K}\).

domain = dt.BoundedDomain(bounds=torch.tensor([-3.0, 3.0]))
reference = dt.GaussianReference(domain)

We will define a preconditioner that maps between this reference density and the uniform density on \([0, 2]^{2K}\) (which coincides with the prior).

bounds = torch.tensor([[0.0, 2.0]]).tile((dim, 1))
preconditioner = dt.UniformMapping(bounds=bounds, reference=reference)

Functional Tensor Train

Next, we specify the properties of the functional tensor trains that will be used to approximate each incremental mapping in the final DIRT object.

basis = dt.Lagrange1(num_elems=16)
bases = dt.ApproxBases(basis, dim)

tt_options = dt.TTOptions(init_rank=1, max_rank=5, max_als=3, verbose=0)
tt = dt.TT(tt_options)
ftt = dt.FTT(bases, tt)

Bridging Densities

Next, we specify the intermediate (bridging) densities that successive layers of each DIRT construction will approximate.

Following Cui, Dolgov, and Scheichl (2024), we use a set of tempered densities, \(\{\pi^{(d)}_{k}(\cdot)\}_{k=1}^{M}\), to approximate the optimal importance sampling density associated with the denominator of the ratio estimator (i.e., the posterior). In particular, the \(k\)-th density takes the form

\[ \pi_{k}^{(d)}(\boldsymbol{x}) \propto \mathcal{L}(\boldsymbol{x}; \boldsymbol{y})^{\alpha_{k}} \pi_{0}(\boldsymbol{x}), \]

where the tempering parameters start at \(\alpha_{1} = 10^{-5}\) and increase geometrically, \(\alpha_{k+1} = 10^{1/3}\alpha_{k}\), until \(\alpha_{M} = 1\); this gives \(M = 16\).

betas = 10 ** torch.linspace(-5.0, 0.0, 16)
bridge_q_tilde = dt.Tempering(betas)

To approximate the optimal importance sampling density associated with the numerator of the ratio estimator, we first note that, if the boundary of the set \(\mathcal{F}\) is not aligned with the coordinate axes of the parameters, the FTT approximation of the (discontinuous) optimal importance density may require very high ranks. Therefore, following Cui, Dolgov, and Scheichl (2024), we replace the indicator function \(\mathbb{I}_{\mathcal{F}}(\cdot)\) with the (smooth) sigmoid function \(g_{\gamma}(\cdot)\), which is defined as

\[ g_{\gamma}(\boldsymbol{x}) := \left(1 + \exp\left[\gamma\left(I_{\mathrm{max}} - \max_{t \in [0, 5]} I_{K}(t; \boldsymbol{x}) \right)\right]\right)^{-1}. \]

Observe that in the limit \(\gamma \rightarrow \infty\), the above becomes the indicator function.
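Note that \(g_{\gamma}\) can be written using the standard logistic sigmoid, \(\sigma(z) = (1 + e^{-z})^{-1}\), as \(g_{\gamma}(\boldsymbol{x}) = \sigma(\gamma(\max_{t} I_{K}(t; \boldsymbol{x}) - I_{\mathrm{max}}))\). As a standalone illustration (ours, where m denotes the maximum infected percentage):

def g(m: Tensor, gamma: float) -> Tensor:
    # Smooth approximation of the indicator I(m > I_max); larger 
    # gamma gives a sharper transition at m = I_max
    return torch.sigmoid(gamma * (m - I_max))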

In addition, we temper the likelihood, as above. The intermediate densities for the numerator therefore take the form

\[ \pi_{k}^{(n)}(\boldsymbol{x}) \propto \mathcal{L}(\boldsymbol{x}; \boldsymbol{y})^{\beta_{k}} \pi_{0}(\boldsymbol{x}) g_{\gamma_{k}}(\boldsymbol{x}), \]

where \(\beta_{k} = \alpha_{k}\) and \(\gamma_{k} = \beta_{k}\gamma_{*}\). We choose a value of \(\gamma_{*} = 10^{4}/I_{\mathrm{max}}\).

betas = 10 ** torch.linspace(-5.0, 0.0, 16)
gamma_star = 1e04 / I_max
gammas = betas * gamma_star
bridge_p_tilde = dt.SigmoidSmoothing(gammas, betas)

DIRT Objects

We can now construct the DIRT approximations, \(\tilde{p}(\cdot)\) and \(\tilde{q}(\cdot)\), to the optimal importance densities used to form the numerator and denominator of the ratio estimator, \(\tilde{R}_{N}\) (Equation 3).

dirt_options = dt.DIRTOptions(verbose=0)
p_tilde = dt.DIRT(rare_event_func, preconditioner, ftt, bridge_p_tilde, dirt_options)
q_tilde = dt.DIRT(neglogpost_func, preconditioner, ftt, bridge_q_tilde, dirt_options)

Rare Event Estimation

Now that the DIRT approximations to the importance densities have been constructed, we can compute an importance sampling estimate of the rare event probability. We begin by defining a function that computes the ratio \(\tilde{R}_{N}\) using a specified number of samples.

def estimate_rare_event(
    rare_event_func: dt.TargetFunc,
    neglogpost_func: dt.TargetFunc,
    p_tilde: dt.DIRT,
    q_tilde: dt.DIRT,
    num_samples: int
) -> float:

    # Numerator (Equation 1): sample from the DIRT approximation to 
    # the optimal importance density for the rare event, then 
    # reweight using the true (unnormalised) target density
    rs = p_tilde.reference.random(num_samples, p_tilde.dim)
    xs, neglogevents_dirt = p_tilde.eval_irt(rs)
    neglogevents_true = rare_event_func(xs)

    P_is = dt.run_importance_sampling(neglogevents_dirt, neglogevents_true)
    P_hat = P_is.log_weights.exp().mean()

    # Denominator (Equation 2): estimate the normalising constant of 
    # the posterior using the DIRT approximation to the posterior
    rs = q_tilde.reference.random(num_samples, q_tilde.dim)
    xs, neglogposts_dirt = q_tilde.eval_irt(rs)
    neglogposts_true = neglogpost_func(xs)

    Q_is = dt.run_importance_sampling(neglogposts_dirt, neglogposts_true)
    Q_hat = Q_is.log_weights.exp().mean()

    # Ratio estimate (Equation 3)
    R_hat = P_hat / Q_hat
    return float(R_hat)

We then call this function repeatedly to obtain a set of estimates of the rare event probability.

num_estimates = 20
num_samples = 10_000

R_hats = torch.tensor([
    estimate_rare_event(
        rare_event_func,
        neglogpost_func,
        p_tilde, 
        q_tilde, 
        num_samples
    ) for _ in range(num_estimates)
])

print(f"Estimate of event probability: {R_hats.mean():.4e}.")
print(f"Standard deviation of estimate of event probability: {R_hats.std():.4e}.")
Estimate of event probability: 4.3823e-10.
Standard deviation of estimate of event probability: 2.9122e-12.

References

Cui, Tiangang, Sergey Dolgov, and Robert Scheichl. 2024. “Deep Importance Sampling Using Tensor Trains with Application to a Priori and a Posteriori Rare Events.” SIAM Journal on Scientific Computing 46 (1): C1–29. https://doi.org/10.1137/23M1546981.