Heat Equation

Here, we characterise the posterior distribution of the diffusion coefficient of a two-dimensional heat equation. We will consider a similar setup to that described in Cui and Dolgov (2022).

Problem Setup

We consider the domain \(\Omega := (0, 3) \times (0, 1)\), with boundary denoted by \(\partial \Omega\). The evolution of the temperature, \(u(\boldsymbol{x}, t)\), at each point in the domain over time can be modelled by the heat equation,

\[ \frac{\partial u(\boldsymbol{x}, t)}{\partial t} = \nabla \cdot (\kappa(\boldsymbol{x}) \nabla u(\boldsymbol{x}, t)) + f(\boldsymbol{x}, t), \quad \boldsymbol{x} \in \Omega, t \in (0, T], \]

where \(\kappa(\boldsymbol{x})\) denotes the (spatially varying) diffusion coefficient, and \(f(\boldsymbol{x}, t)\) denotes the forcing term, which models heat sources or sinks. We set the end time to \(T = 10\), and impose the initial and boundary conditions

\[ \begin{align} u(\boldsymbol{x}, 0) &= 0, \qquad \boldsymbol{x} \in \Omega, \\ \kappa(\boldsymbol{x}) \frac{\partial u(\boldsymbol{x}, t)}{\partial \boldsymbol{n}} &= 0, \qquad \boldsymbol{x} \in \partial\Omega, \; t \in (0, T]. \end{align} \]

In the above, \(\boldsymbol{n}\) denotes the outward-facing normal vector on the boundary of the domain.

We assume that the forcing term is given by

\[ f(\boldsymbol{x}, t) = c \left(\exp\left(-\frac{1}{2r^{2}}\left\lVert\boldsymbol{x} - \boldsymbol{a}\right\rVert^{2}\right) - \exp\left(-\frac{1}{2r^{2}}\left\lVert\boldsymbol{x} - \boldsymbol{b}\right\rVert^{2}\right)\right), \]

where \(\boldsymbol{a} = [1/2, 1/2]^{\top}\), \(\boldsymbol{b} = [5/2, 1/2]^{\top}\), and \(c = 5\pi \times 10^{-2}\).
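For reference, a minimal sketch of how this forcing term could be evaluated is given below; the kernel width \(r\) is left as an argument since its value is not stated here, and the term does not depend on \(t\).

import math
import torch

def forcing(x: torch.Tensor, r: float) -> torch.Tensor:
    """A sketch of the forcing term f(x, t), which is independent of t.
    `x` is an (n, 2) tensor of spatial locations and `r` is the kernel width.
    """
    a = torch.tensor([0.5, 0.5])
    b = torch.tensor([2.5, 0.5])
    c = 5.0 * math.pi * 1e-2
    source = torch.exp(-0.5 * (x - a).square().sum(dim=1) / r**2)
    sink = torch.exp(-0.5 * (x - b).square().sum(dim=1) / r**2)
    return c * (source - sink)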

Prior Density

We endow the logarithm of the unknown diffusion coefficient with a process convolution prior; that is,

\[ \log(\kappa(\boldsymbol{x})) = \log(\bar{\kappa}(\boldsymbol{x})) + \sum_{i=1}^{d} \xi^{(i)} \exp\left(-\frac{1}{2r^{2}}\left\lVert\boldsymbol{x} - \boldsymbol{x}^{(i)}\right\rVert^{2}\right), \]

where \(d=27\), \(\log(\bar{\kappa}(\boldsymbol{x}))=-5\), \(r=1/16\), the coefficients \(\{\xi^{(i)}\}_{i=1}^{d}\) are independent and follow the standard Gaussian distribution, and the centres of the kernel functions, \(\{\boldsymbol{x}^{(i)}\}_{i=1}^{d}\), form a grid over the domain (see Figure 1).
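As a concrete (hypothetical) illustration, the mapping from the coefficients to the log-diffusion coefficient could be written as follows; the actual mapping used later is provided by the \(\texttt{prior.transform}\) method returned by \(\texttt{setup\_heat\_problem}\), and the kernel centres are left as an argument here.

import torch

def log_diffusion(xi: torch.Tensor, x: torch.Tensor, centres: torch.Tensor) -> torch.Tensor:
    """A sketch of the process convolution parametrisation. `xi` is a (d,)
    tensor of coefficients, `x` is an (n, 2) tensor of evaluation points and
    `centres` is a (d, 2) tensor of kernel centres.
    """
    log_k_mean = -5.0
    r = 1.0 / 16.0
    # Squared distances between each evaluation point and each kernel centre
    sq_dists = (x[:, None, :] - centres[None, :, :]).square().sum(dim=2)
    return log_k_mean + (xi * torch.exp(-0.5 * sq_dists / r**2)).sum(dim=1)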

Data

To estimate the diffusion coefficient, we assume that we have access to measurements of the temperature at 13 locations in the model domain (see Figure 2), recorded at one-second intervals. This gives a total of 130 measurements. All measurements are corrupted by i.i.d. Gaussian noise with zero mean and a standard deviation of \(\sigma=1.65 \times 10^{-2}\).
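Writing \(\mathcal{G}(\boldsymbol{\xi})\) for the mapping from the coefficients \(\{\xi^{(i)}\}_{i=1}^{d}\) to the modelled temperatures at the observation locations and times, and \(\boldsymbol{d}_{\mathrm{obs}}\) for the vector of measurements (notation introduced here for convenience), this noise model gives the negative log-likelihood, up to an additive constant,

\[ -\log(\mathcal{L}(\boldsymbol{\xi})) = \frac{1}{2\sigma^{2}} \left\lVert \boldsymbol{d}_{\mathrm{obs}} - \mathcal{G}(\boldsymbol{\xi}) \right\rVert^{2}, \]

which is the potential function evaluated in the likelihood code further below.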

Implementation in \(\texttt{deep\_tensor}\)

We will now use \(\texttt{deep\_tensor}\) to construct a DIRT approximation to the posterior. To accelerate this process, we will use a reduced order model in place of the full model. Then, we will illustrate some debiasing techniques which use the DIRT approximation to the posterior, in combination with the full model, to accelerate the process of drawing exact posterior samples.

from matplotlib import pyplot as plt
import torch

import deep_tensor as dt

from examples.heat import setup_heat_problem
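
# The plotting helpers used in the code below (plot_dl_function, add_arrows,
# pairplot) are assumed to be provided alongside this example; they are not
# imported from deep_tensor.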

We begin by defining the prior, (full) model and reduced order model.

The full model is implemented in FEniCS, on a \(96 \times 32\) grid, using piecewise linear basis functions. Timestepping is done using the backward Euler method. The reduced order model is constructed using the proper orthogonal decomposition (POD; see, e.g., Benner, Gugercin, and Willcox 2015).
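To illustrate the idea behind POD, the following (hypothetical) sketch factorises a matrix of full-model snapshots with an SVD and retains the leading left singular vectors as a reduced basis. The snapshot matrix and truncation tolerance here are placeholders; the actual reduced order model is constructed inside \(\texttt{setup\_heat\_problem}\).

import torch

# Placeholder snapshot matrix; in practice, each column would be a full-model
# solution (at a given time) computed for a sample from the prior
snapshots = torch.randn(1000, 200)

# Factorise the snapshots and retain the leading modes that capture (say)
# 99.99% of the snapshot energy
U, S, _ = torch.linalg.svd(snapshots, full_matrices=False)
energy = torch.cumsum(S.square(), dim=0) / S.square().sum()
n_modes = int(torch.searchsorted(energy, torch.tensor(0.9999))) + 1
basis = U[:, :n_modes]  # the reduced order model evolves coefficients in this basis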

# Construct the prior, full model and reduced order model
prior, model_full, model_rom = setup_heat_problem()

Next, we will generate the true log-diffusion coefficient using a sample from the prior. The true log-diffusion coefficient is plotted in Figure 1.

xi_true = torch.randn(prior.dim)
logk_true = prior.transform(xi_true)
Code
fig, ax = plt.subplots(figsize=(6.0, 2.0))
cbar_label = r"$\log(\kappa(\bm{x}))$"
plot_dl_function(fig, ax, model_full.vec2func(logk_true), cbar_label)
ax.scatter(*prior.ss.T, s=16, c="k", marker="x")
ax.set_xlabel(r"$x_{0}$")
ax.set_ylabel(r"$x_{1}$")
plt.show()
Figure 1: The true log-diffusion coefficient, \(\log(\kappa(\boldsymbol{x}))\), and the centres of the kernel functions of the process convolution prior (black crosses).

Next, we will solve the (full) model to obtain the modelled temperatures corresponding to the true diffusion coefficient, and use these to generate some synthetic data. Figure 2 shows the true temperature field at time \(T=10\), as well as the observation locations.

# Generate true temperature field
u_true = model_full.solve(logk_true)

# Specify magnitude of observation noise
std_error = 1.65e-2
var_error = std_error ** 2

# Extract true temperatures at the observation locations and add 
# observation noise
d_obs = model_full.observe(u_true)
noise = std_error * torch.randn_like(d_obs)
d_obs += noise
Code
fig, ax = plt.subplots(figsize=(6.0, 2.0))
cbar_label = r"$u(\bm{x}, 10)$"
plot_dl_function(fig, ax, model_full.vec2func(u_true[:, -1]), cbar_label, vmin=-0.15, vmax=0.10)
ax.scatter(*model_full.xs_obs.T, s=16, c="k", marker=".")
ax.set_xlabel(r"$x_{0}$")
ax.set_ylabel(r"$x_{1}$")
plt.show()
Figure 2: The true final temperature distribution, \(u(\boldsymbol{x}, 10)\), and the observation locations (black dots).

Building the DIRT Object

Now we will build a DIRT object to approximate the posterior density of the coefficients, \(\{\xi^{(i)}\}_{i=1}^{d}\), used to parametrise the log-diffusion coefficient, using the reduced-order model in place of the full model. We begin by defining functions which return the potentials (that is, the negative log-densities) associated with the likelihood and prior.

def neglogpri(xs: torch.Tensor) -> torch.Tensor:
    """Returns the negative log prior density evaluated a given set of 
    samples.
    """
    return 0.5 * xs.square().sum(dim=1)

def _negloglik(model, xs: torch.Tensor) -> torch.Tensor:
    """Returns the negative log-likelihood, for a given model, 
    evaluated at each of a set of samples.
    """
    neglogliks = torch.zeros(xs.shape[0])
    for i, x in enumerate(xs):
        k = prior.transform(x)
        us = model.solve(k)
        d = model.observe(us)
        neglogliks[i] = 0.5 * (d - d_obs).square().sum() / var_error
    return neglogliks

def negloglik_full(xs: torch.Tensor) -> torch.Tensor:
    """Returns the negative log-likelihood for the full model (to be 
    used later).
    """
    return _negloglik(model_full, xs)

def negloglik_rom(xs: torch.Tensor) -> torch.Tensor:
    """Returns the negative log-likelihood for the reduced-order model."""
    return _negloglik(model_rom, xs)

Next, we specify a preconditioner. Because the prior of the coefficients \(\{\xi^{(i)}\}_{i=1}^{d}\) is the standard Gaussian, the mapping between a Gaussian reference and the prior is simply the identity mapping. This is an appropriate choice of preconditioner in the absence of any other information.

reference = dt.GaussianReference()
preconditioner = dt.IdentityMapping(prior.dim, reference)

Next, we specify the polynomial basis used to construct the tensor-train approximation at each layer of the DIRT.

poly = dt.Legendre(order=20)

Finally, we can construct the DIRT object.

# Reduce the initial and maximum tensor ranks to lower the cost of each layer
tt_options = dt.TTOptions(init_rank=12, max_rank=12)

dirt = dt.DIRT(
    negloglik_rom, 
    neglogpri,
    preconditioner,
    poly, 
    tt_options=tt_options
)
[DIRT] Iter:  1 | Cum. Fevals: 2.00e+03 | Cum. Time: 3.31e+00 s | Beta: 0.0001 | ESS: 0.9961
[ALS]  Iter | Func Evals | Max Rank | Max Local Error | Mean Local Error | Max Debug Error | Mean Debug Error
[ALS]     1 |     105252 |       14 |     1.00000e+00 |      1.00000e+00 |     2.75250e-04 |      2.46817e-04
[ALS]  ALS complete.
[DIRT] Iter:  2 | Cum. Fevals: 2.15e+05 | Cum. Time: 1.84e+02 s | Beta: 0.0098 | ESS: 0.5144 | DHell: 0.0008
[ALS]  Iter | Func Evals | Max Rank | Max Local Error | Mean Local Error | Max Debug Error | Mean Debug Error
[ALS]     1 |     105588 |       14 |     9.79286e-01 |      1.28998e-01 |     2.34390e-02 |      1.97313e-02
[ALS]  ALS complete.
[DIRT] Iter:  3 | Cum. Fevals: 4.28e+05 | Cum. Time: 3.72e+02 s | Beta: 0.0366 | ESS: 0.5183 | DHell: 0.0202
[ALS]  Iter | Func Evals | Max Rank | Max Local Error | Mean Local Error | Max Debug Error | Mean Debug Error
[ALS]     1 |     105588 |       14 |     7.23863e-01 |      1.54375e-01 |     1.12788e-02 |      1.41172e-02
[ALS]  ALS complete.
[DIRT] Iter:  4 | Cum. Fevals: 6.41e+05 | Cum. Time: 5.68e+02 s | Beta: 0.0972 | ESS: 0.5138 | DHell: 0.0427
[ALS]  Iter | Func Evals | Max Rank | Max Local Error | Mean Local Error | Max Debug Error | Mean Debug Error
[ALS]     1 |     105588 |       14 |     9.95568e-01 |      9.11834e-02 |     2.48469e-02 |      2.67500e-02
[ALS]  ALS complete.
[DIRT] Iter:  5 | Cum. Fevals: 8.54e+05 | Cum. Time: 7.68e+02 s | Beta: 0.2122 | ESS: 0.5119 | DHell: 0.0327
[ALS]  Iter | Func Evals | Max Rank | Max Local Error | Mean Local Error | Max Debug Error | Mean Debug Error
[ALS]     1 |     105588 |       14 |     1.95882e+00 |      1.27061e-01 |     3.21562e-02 |      2.97796e-02
[ALS]  ALS complete.
[DIRT] Iter:  6 | Cum. Fevals: 1.07e+06 | Cum. Time: 9.78e+02 s | Beta: 0.4001 | ESS: 0.5382 | DHell: 0.0400
[ALS]  Iter | Func Evals | Max Rank | Max Local Error | Mean Local Error | Max Debug Error | Mean Debug Error
[ALS]     1 |     105588 |       14 |     2.23520e+00 |      1.15147e-01 |     2.04326e-02 |      3.30888e-02
[ALS]  ALS complete.
[DIRT] Iter:  7 | Cum. Fevals: 1.28e+06 | Cum. Time: 1.22e+03 s | Beta: 0.7185 | ESS: 0.5394 | DHell: 0.0485
[ALS]  Iter | Func Evals | Max Rank | Max Local Error | Mean Local Error | Max Debug Error | Mean Debug Error
[ALS]     1 |     105588 |       14 |     7.65588e+00 |      3.24070e-01 |     3.16540e-02 |      2.91619e-02
[ALS]  ALS complete.
[DIRT] Iter:  8 | Cum. Fevals: 1.49e+06 | Cum. Time: 1.45e+03 s | Beta: 1.0000 | ESS: 0.7690 | DHell: 0.0643
[ALS]  Iter | Func Evals | Max Rank | Max Local Error | Mean Local Error | Max Debug Error | Mean Debug Error
[ALS]     1 |     105588 |       14 |     5.96450e-01 |      5.68127e-02 |     1.17163e-02 |      2.11140e-02
[ALS]  ALS complete.
[DIRT] DIRT construction complete.
[DIRT]  • Layers: 8.
[DIRT]  • Total function evaluations: 1,706,736.
[DIRT]  • Total time: 1.7e+03 s.
[DIRT]  • DHell: 0.0755.

Debiasing

We will now use the DIRT density as the proposal density for an independence MCMC sampler. This will allow us to sample exactly from the posterior associated with the full model.
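For intuition, the accept/reject rule underlying such a sampler can be sketched as follows: each proposal is accepted with a probability determined by the ratio of the exact (unnormalised) posterior density to the DIRT density at the proposal, relative to the same ratio at the current state. This is only an illustrative sketch; the actual sampler used below is \(\texttt{dt.run\_independence\_sampler}\).

def independence_sampler(xs, potentials_prop, potentials_target):
    """A sketch of an independence MCMC sampler. `xs` contains independent
    proposals drawn from the DIRT density, `potentials_prop` the corresponding
    DIRT potentials (negative log proposal densities), and `potentials_target`
    the exact potentials (negative log unnormalised posterior densities).
    """
    samples = torch.zeros_like(xs)
    i_curr = 0
    for i in range(xs.shape[0]):
        # Log-acceptance probability for moving from the current state to proposal i
        log_alpha = ((potentials_target[i_curr] - potentials_prop[i_curr])
                     - (potentials_target[i] - potentials_prop[i]))
        if torch.log(torch.rand(1)) < log_alpha:
            i_curr = i
        samples[i] = xs[i_curr]
    return samples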

# Generate a set of samples from the DIRT density
rs = dirt.reference.random(d=dirt.dim, n=5000)
xs, potentials_dirt = dirt.eval_irt(rs)

# Evaluate the true potential function (for the full model) at each sample
potentials_exact = neglogpri(xs) + negloglik_full(xs)

# Run independence sampler
res = dt.run_independence_sampler(xs, potentials_dirt, potentials_exact)

print(f"Acceptance rate: {res.acceptance_rate:.2f}")
print(f"Mean IACT (all parameters): {res.iacts.mean():.2f}")
print(f"Maximum IACT (all parameters): {res.iacts.max():.2f}")
Acceptance rate: 0.86
Mean IACT (all parameters): 1.38
Maximum IACT (all parameters): 1.73

For all parameters, the integrated autocorrelation time (IACT) is small, which suggests that the sampler is moving around the posterior efficiently. Figure 3 shows trace plots for the log-posterior density and selected parameters, which reinforces this idea.

Code
parameters = torch.hstack((res.potentials[:, None], res.xs[:, 21:23]))
ylabels = [r"$-\log(f(x))$", r"$\xi_{22}$", r"$\xi_{23}$"]

fig, axes = plt.subplots(1, 3, figsize=(7.5, 3))

for i, ax in enumerate(axes):
    ax.plot(parameters[:, i], c="tab:red", lw=0.5)
    ax.set_ylabel(ylabels[i])
    ax.set_box_aspect(1)
    add_arrows(ax)

axes[1].set_xlabel("Iteration")
plt.show()
Figure 3: Trace plots for the negative log-posterior density (left), and coefficients \(\xi_{22}\) (centre) and \(\xi_{23}\) (right).
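The IACT values reported above can be estimated from the empirical autocorrelations of each chain, \(\tau \approx 1 + 2\sum_{k \geq 1} \rho_{k}\). A rough sketch of such an estimate is given below; the values reported above are those computed by \(\texttt{dt.run\_independence\_sampler}\).

def estimate_iact(x: torch.Tensor, max_lag: int = 100) -> float:
    """A sketch of estimating the integrated autocorrelation time of a single
    chain, `x`, by summing its empirical autocorrelations up to `max_lag`.
    """
    x = x - x.mean()
    var = x.square().mean()
    rhos = torch.stack([(x[:-k] * x[k:]).mean() / var for k in range(1, max_lag)])
    return float(1.0 + 2.0 * rhos.sum())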

Finally, Figure 4 shows a set of prior and posterior samples for three selected coefficients. It is evident that the posterior is significantly different to the prior.

Code
xs_pri = dirt.reference.random(d=dirt.dim, n=1000)
xs_post = res.xs[::10]

labels = [r"$\xi_{"+f"{i}"+r"}$" for i in range(22, 25)]
bounds = torch.tensor([[-4.0, 4.0], [-4.0, 4.0], [-4.0, 4.0]])

pairplot(
    xs_post[:, 21:24], 
    xs_pri[:, 21:24], 
    truth=xi_true[21:24],
    labels=labels, 
    x_label="Posterior",
    y_label="Prior", 
    bounds=bounds
)
Figure 4: A pair plot showing samples from the prior and posterior of three selected coefficients, as well as the true values.

References

Benner, Peter, Serkan Gugercin, and Karen Willcox. 2015. “A Survey of Projection-Based Model Reduction Methods for Parametric Dynamical Systems.” SIAM Review 57 (4): 483–531. https://doi.org/10.1137/130932715.
Cui, Tiangang, and Sergey Dolgov. 2022. “Deep Composition of Tensor-Trains Using Squared Inverse Rosenblatt Transports.” Foundations of Computational Mathematics 22 (6): 1863–1922. https://doi.org/10.1007/s10208-021-09537-5.