from matplotlib import pyplot as plt
import torch
import deep_tensor as dt
from examples.heat import setup_heat_problem
Heat Equation
Here, we characterise the posterior distribution of the diffusion coefficient of a two-dimensional heat equation. We will consider a similar setup to that described in Cui and Dolgov (2022).
Problem Setup
We consider the domain \(\Omega := (0, 3) \times (0, 1)\), with boundary denoted by \(\partial \Omega\). The change in temperature, \(u(\boldsymbol{x}, t)\), at each point in the domain over time can be modelled by the heat equation,
\[ \frac{\partial u(\boldsymbol{x}, t)}{\partial t} = \nabla \cdot (\kappa(\boldsymbol{x}) \nabla u(\boldsymbol{x}, t)) + f(\boldsymbol{x}, t), \quad \boldsymbol{x} \in \Omega, t \in (0, T], \]
where \(\kappa(\boldsymbol{x})\) denotes the (spatially varying) diffusion coefficient, and \(f(\boldsymbol{x}, t)\) denotes the forcing term, which models heat sources or sinks. We set the end time to \(T = 10\), and impose the initial and boundary conditions
\[ \begin{align} u(\boldsymbol{x}, 0) &= 0, \qquad \boldsymbol{x} \in \Omega, \\ \frac{\partial \kappa(\boldsymbol{x}) u(\boldsymbol{x}, t)}{\partial \boldsymbol{n}} &= 0, \qquad \boldsymbol{x} \in \partial\Omega. \end{align} \]
In the above, \(\boldsymbol{n}\) denotes the outward-facing normal vector on the boundary of the domain.
We assume that the forcing term is given by
\[ f(\boldsymbol{x}, t) = c \left(\exp\left(-\frac{1}{2r^{2}}\left\lVert\boldsymbol{x} - \boldsymbol{a}\right\rVert^{2}\right) - \exp\left(-\frac{1}{2r^{2}}\left\lVert\boldsymbol{x} - \boldsymbol{b}\right\rVert^{2}\right)\right), \]
where \(\boldsymbol{a} = [1/2, 1/2]^{\top}\), \(\boldsymbol{b} = [5/2, 1/2]^{\top}\), and \(c = 5\pi \times 10^{-2}\).
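For concreteness, the forcing term could be evaluated at a set of points as follows. This is purely an illustrative sketch: the function name is ours, the kernel width \(r\) is left as an argument (its value is not specified above), and in the example code the forcing is handled by the model constructed below.
def forcing(x: torch.Tensor, r: float, c: float = 5 * torch.pi * 1e-2) -> torch.Tensor:
    """Evaluates the forcing term at a set of points x (an n x 2 tensor).
    Note that the forcing does not vary over time."""
    a = torch.tensor([0.5, 0.5])
    b = torch.tensor([2.5, 0.5])
    sq_dists_a = (x - a).square().sum(dim=1)
    sq_dists_b = (x - b).square().sum(dim=1)
    return c * ((-0.5 * sq_dists_a / r**2).exp() - (-0.5 * sq_dists_b / r**2).exp())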
Prior Density
We endow the logarithm of the unknown diffusion coefficient with a process convolution prior; that is,
\[ \log(\kappa(\boldsymbol{x})) = \log(\bar{\kappa}(\boldsymbol{x})) + \sum_{i=1}^{d} \xi^{(i)} \exp\left(-\frac{1}{2r^{2}}\left\lVert\boldsymbol{x} - \boldsymbol{x}^{(i)}\right\rVert^{2}\right), \]
where \(d=27\), \(\log(\bar{\kappa}(\boldsymbol{x}))=-5\), \(r=1/16\), the coefficients \(\{\xi^{(i)}\}_{i=1}^{d}\) are independent and follow the unit Gaussian distribution, and the centres of the kernel functions, \(\{\boldsymbol{x}^{(i)}\}_{i=1}^{d}\), form a grid over the domain (see Figure 1).
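To make the parametrisation concrete, the following sketch shows how \(\log(\kappa(\boldsymbol{x}))\) could be evaluated at a set of points, given the coefficients and kernel centres. The function name and signature are our own; in the example code, the mapping from the coefficients to the log-diffusion coefficient on the finite element mesh is provided by \(\texttt{prior.transform}\).
def log_kappa(x: torch.Tensor, xi: torch.Tensor, centres: torch.Tensor) -> torch.Tensor:
    """Evaluates log(kappa) at a set of points x (an n x 2 tensor), given
    coefficients xi (a d-vector) and kernel centres (a d x 2 tensor)."""
    log_kappa_bar, r = -5.0, 1.0 / 16.0
    # Squared distances between each point and each kernel centre (n x d)
    sq_dists = (x[:, None, :] - centres[None, :, :]).square().sum(dim=2)
    return log_kappa_bar + (-0.5 * sq_dists / r**2).exp() @ xi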
Data
To estimate the diffusion coefficient, we assume that we have access to measurements of the temperature at 13 locations in the model domain (see Figure 2), recorded at one-second intervals. This gives a total of 130 measurements. All measurements are corrupted by i.i.d. Gaussian noise with zero mean and a standard deviation of \(\sigma=1.65 \times 10^{-2}\).
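Under this noise model, the negative log-likelihood is, up to an additive constant,
\[ -\log(L(\boldsymbol{d}_{\mathrm{obs}} \mid \boldsymbol{\xi})) = \frac{1}{2\sigma^{2}} \left\lVert \boldsymbol{d}_{\mathrm{obs}} - \mathcal{G}(\boldsymbol{\xi}) \right\rVert^{2}, \]
where \(\boldsymbol{d}_{\mathrm{obs}}\) denotes the observations and \(\mathcal{G}\) denotes the mapping from the coefficients \(\{\xi^{(i)}\}_{i=1}^{d}\) to the modelled temperatures at the observation locations and times (this notation is ours; the corresponding code is the \(\texttt{\_negloglik}\) function defined later).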
Implementation in \(\texttt{deep\_tensor}\)
We will now use \(\texttt{deep\_tensor}\) to construct a DIRT approximation to the posterior. To accelerate this process, we will use a reduced order model in place of the full model. Then, we will illustrate some debiasing techniques which use the DIRT approximation to the posterior, in combination with the full model, to accelerate the process of drawing exact posterior samples.
We begin by defining the prior, (full) model and reduced order model.
The full model is implemented in FEniCS, on a \(96 \times 32\) grid, using piecewise linear basis functions. Timestepping is done using the backward Euler method. The reduced order model is constructed using the proper orthogonal decomposition (see, e.g., Benner, Gugercin, and Willcox 2015).
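To illustrate the key idea behind the proper orthogonal decomposition (this is only a sketch of the technique, not the construction used by \(\texttt{setup\_heat\_problem}\)): a reduced basis is formed from the leading left singular vectors of a matrix of solution snapshots, and the full-order system is then projected onto the span of this basis.
def pod_basis(snapshots: torch.Tensor, rank: int) -> torch.Tensor:
    """Returns a reduced basis formed from the leading left singular vectors
    of a matrix of solution snapshots (one snapshot per column)."""
    basis, _, _ = torch.linalg.svd(snapshots, full_matrices=False)
    return basis[:, :rank]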
# Construct the prior, full model and reduced order model
prior, model_full, model_rom = setup_heat_problem()
Next, we will generate the true log-diffusion coefficient using a sample from the prior. The true log-diffusion coefficient is plotted in Figure 1.
xi_true = torch.randn(prior.dim)
logk_true = prior.transform(xi_true)
Code
fig, ax = plt.subplots(figsize=(6.0, 2.0))
cbar_label = r"$\log(\kappa(\bm{x}))$"

plot_dl_function(fig, ax, model_full.vec2func(logk_true), cbar_label)
ax.scatter(*prior.ss.T, s=16, c="k", marker="x")
ax.set_xlabel(r"$x_{0}$")
ax.set_ylabel(r"$x_{1}$")
plt.show()
Figure 1: The true log-diffusion coefficient, and the centres of the kernel functions.
Next, we will solve the (full) model to obtain the modelled temperatures corresponding to the true diffusion coefficient, and use these to generate some synthetic data. Figure 2 shows the true temperature field at time \(T=10\), as well as the observation locations.
# Generate true temperature field
u_true = model_full.solve(logk_true)

# Specify magnitude of observation noise
std_error = 1.65e-2
var_error = std_error ** 2

# Extract true temperatures at the observation locations and add
# observation noise
d_obs = model_full.observe(u_true)
noise = std_error * torch.randn_like(d_obs)
d_obs += noise
Code
fig, ax = plt.subplots(figsize=(6.0, 2.0))
cbar_label = r"$u(\bm{x}, 10)$"

plot_dl_function(fig, ax, model_full.vec2func(u_true[:, -1]), cbar_label, vmin=-0.15, vmax=0.10)
ax.scatter(*model_full.xs_obs.T, s=16, c="k", marker=".")
ax.set_xlabel(r"$x_{0}$")
ax.set_ylabel(r"$x_{1}$")
plt.show()
Figure 2: The true temperature field at time \(T=10\), and the observation locations.
Building the DIRT Object
Now we will build a DIRT object to approximate the posterior density, under the reduced-order model, of the coefficients \(\{\xi^{(i)}\}_{i=1}^{d}\) used to parametrise the log-diffusion coefficient. We begin by defining functions which return the potentials associated with the likelihood and the prior.
def neglogpri(xs: torch.Tensor) -> torch.Tensor:
    """Returns the negative log prior density evaluated at a given set of
    samples.
    """
    return 0.5 * xs.square().sum(dim=1)
def _negloglik(model, xs: torch.Tensor) -> torch.Tensor:
    """Returns the negative log-likelihood, for a given model,
    evaluated at each of a set of samples.
    """
    neglogliks = torch.zeros(xs.shape[0])
    for i, x in enumerate(xs):
        k = prior.transform(x)
        us = model.solve(k)
        d = model.observe(us)
        neglogliks[i] = 0.5 * (d - d_obs).square().sum() / var_error
    return neglogliks
def negloglik_full(xs: torch.Tensor) -> torch.Tensor:
    """Returns the negative log-likelihood for the full model (to be
    used later).
    """
    return _negloglik(model_full, xs)

def negloglik_rom(xs: torch.Tensor) -> torch.Tensor:
    """Returns the negative log-likelihood for the reduced-order model."""
    return _negloglik(model_rom, xs)
Next, we specify a preconditioner. Because the prior of the coefficients \(\{\xi^{(i)}\}_{i=1}^{d}\) is the standard Gaussian, the mapping between a Gaussian reference and the prior is simply the identity mapping. This is an appropriate choice of preconditioner in the absence of any other information.
reference = dt.GaussianReference()
preconditioner = dt.IdentityMapping(prior.dim, reference)
Next, we specify a polynomial basis.
poly = dt.Legendre(order=20)
Finally, we can construct the DIRT object.
# Reduce the initial and maximum tensor ranks to reduce the cost of each layer
tt_options = dt.TTOptions(init_rank=12, max_rank=12)

dirt = dt.DIRT(
    negloglik_rom,
    neglogpri,
    preconditioner,
    poly,
    tt_options=tt_options
)
[DIRT] Iter: 1 | Cum. Fevals: 2.00e+03 | Cum. Time: 3.31e+00 s | Beta: 0.0001 | ESS: 0.9961
[ALS] Iter | Func Evals | Max Rank | Max Local Error | Mean Local Error | Max Debug Error | Mean Debug Error
[ALS] 1 | 105252 | 14 | 1.00000e+00 | 1.00000e+00 | 2.75250e-04 | 2.46817e-04
[ALS] ALS complete.
[DIRT] Iter: 2 | Cum. Fevals: 2.15e+05 | Cum. Time: 1.84e+02 s | Beta: 0.0098 | ESS: 0.5144 | DHell: 0.0008
[ALS] Iter | Func Evals | Max Rank | Max Local Error | Mean Local Error | Max Debug Error | Mean Debug Error
[ALS] 1 | 105588 | 14 | 9.79286e-01 | 1.28998e-01 | 2.34390e-02 | 1.97313e-02
[ALS] ALS complete.
[DIRT] Iter: 3 | Cum. Fevals: 4.28e+05 | Cum. Time: 3.72e+02 s | Beta: 0.0366 | ESS: 0.5183 | DHell: 0.0202
[ALS] Iter | Func Evals | Max Rank | Max Local Error | Mean Local Error | Max Debug Error | Mean Debug Error
[ALS] 1 | 105588 | 14 | 7.23863e-01 | 1.54375e-01 | 1.12788e-02 | 1.41172e-02
[ALS] ALS complete.
[DIRT] Iter: 4 | Cum. Fevals: 6.41e+05 | Cum. Time: 5.68e+02 s | Beta: 0.0972 | ESS: 0.5138 | DHell: 0.0427
[ALS] Iter | Func Evals | Max Rank | Max Local Error | Mean Local Error | Max Debug Error | Mean Debug Error
[ALS] 1 | 105588 | 14 | 9.95568e-01 | 9.11834e-02 | 2.48469e-02 | 2.67500e-02
[ALS] ALS complete.
[DIRT] Iter: 5 | Cum. Fevals: 8.54e+05 | Cum. Time: 7.68e+02 s | Beta: 0.2122 | ESS: 0.5119 | DHell: 0.0327
[ALS] Iter | Func Evals | Max Rank | Max Local Error | Mean Local Error | Max Debug Error | Mean Debug Error
[ALS] 1 | 105588 | 14 | 1.95882e+00 | 1.27061e-01 | 3.21562e-02 | 2.97796e-02
[ALS] ALS complete.
[DIRT] Iter: 6 | Cum. Fevals: 1.07e+06 | Cum. Time: 9.78e+02 s | Beta: 0.4001 | ESS: 0.5382 | DHell: 0.0400
[ALS] Iter | Func Evals | Max Rank | Max Local Error | Mean Local Error | Max Debug Error | Mean Debug Error
[ALS] 1 | 105588 | 14 | 2.23520e+00 | 1.15147e-01 | 2.04326e-02 | 3.30888e-02
[ALS] ALS complete.
[DIRT] Iter: 7 | Cum. Fevals: 1.28e+06 | Cum. Time: 1.22e+03 s | Beta: 0.7185 | ESS: 0.5394 | DHell: 0.0485
[ALS] Iter | Func Evals | Max Rank | Max Local Error | Mean Local Error | Max Debug Error | Mean Debug Error
[ALS] 1 | 105588 | 14 | 7.65588e+00 | 3.24070e-01 | 3.16540e-02 | 2.91619e-02
[ALS] ALS complete.
[DIRT] Iter: 8 | Cum. Fevals: 1.49e+06 | Cum. Time: 1.45e+03 s | Beta: 1.0000 | ESS: 0.7690 | DHell: 0.0643
[ALS] Iter | Func Evals | Max Rank | Max Local Error | Mean Local Error | Max Debug Error | Mean Debug Error
[ALS] 1 | 105588 | 14 | 5.96450e-01 | 5.68127e-02 | 1.17163e-02 | 2.11140e-02
[ALS] ALS complete.
[DIRT] DIRT construction complete.
[DIRT] • Layers: 8.
[DIRT] • Total function evaluations: 1,706,736.
[DIRT] • Total time: 1.7e+03 s.
[DIRT] • DHell: 0.0755.
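The DIRT object can now be used to generate approximate posterior samples. As a quick sketch (using the same calls that appear in the debiasing section below), reference samples are pushed through the DIRT mapping, and each resulting sample can be converted to a log-diffusion coefficient; the final conversion step is our own addition, for illustration.
# Draw approximate posterior samples (in coefficient space) from the DIRT density
rs = dirt.reference.random(d=dirt.dim, n=100)
xs_approx, potentials_approx = dirt.eval_irt(rs)

# Map each sample to the corresponding log-diffusion coefficient
logks_approx = torch.stack([prior.transform(x) for x in xs_approx])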
Debiasing
We will now use the DIRT density as the proposal density for an independence MCMC sampler. This will allow us to sample exactly from the posterior associated with the full model.
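For intuition, the core of such a sampler looks roughly as follows: each DIRT sample is proposed in turn, and accepted or rejected by comparing the exact and approximate (DIRT) potentials. This is only a sketch of the accept/reject logic, written in terms of the quantities computed below; it is not the implementation behind \(\texttt{dt.run\_independence\_sampler}\).
def independence_sampler_sketch(xs, potentials_prop, potentials_exact):
    """Minimal independence MCMC sketch. Each proposal xs[i] is accepted with
    probability min(1, exp(log_alpha)), where log_alpha compares the exact and
    proposal potentials (negative log-densities) of the proposal and the
    current state."""
    n_samples = xs.shape[0]
    chain = torch.zeros_like(xs)
    chain[0] = xs[0]
    pot_exact_curr, pot_prop_curr = potentials_exact[0], potentials_prop[0]
    n_accepted = 0
    for i in range(1, n_samples):
        log_alpha = ((pot_exact_curr - potentials_exact[i])
                     - (pot_prop_curr - potentials_prop[i]))
        if torch.rand(1).log() < log_alpha:
            chain[i] = xs[i]
            pot_exact_curr, pot_prop_curr = potentials_exact[i], potentials_prop[i]
            n_accepted += 1
        else:
            chain[i] = chain[i - 1]
    return chain, n_accepted / (n_samples - 1)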
# Generate a set of samples from the DIRT density
rs = dirt.reference.random(d=dirt.dim, n=5000)
xs, potentials_dirt = dirt.eval_irt(rs)

# Evaluate the true potential function (for the full model) at each sample
potentials_exact = neglogpri(xs) + negloglik_full(xs)

# Run independence sampler
res = dt.run_independence_sampler(xs, potentials_dirt, potentials_exact)

print(f"Acceptance rate: {res.acceptance_rate:.2f}")
print(f"Mean IACT (all parameters): {res.iacts.mean():.2f}")
print(f"Maximum IACT (all parameters): {res.iacts.max():.2f}")
Acceptance rate: 0.86
Mean IACT (all parameters): 1.38
Maximum IACT (all parameters): 1.73
For all parameters, the integrated autocorrelation time (IACT) is small, which suggests that the sampler is moving around the posterior efficiently. Figure 3 shows trace plots for the log-posterior density and selected parameters, which reinforces this idea.
Code
parameters = torch.hstack((res.potentials[:, None], res.xs[:, 21:23]))
ylabels = [r"$-\log(f(x))$", r"$\xi_{22}$", r"$\xi_{23}$"]

fig, axes = plt.subplots(1, 3, figsize=(7.5, 3))

for i, ax in enumerate(axes):
    ax.plot(parameters[:, i], c="tab:red", lw=0.5)
    ax.set_ylabel(ylabels[i])
    ax.set_box_aspect(1)
    add_arrows(ax)

axes[1].set_xlabel("Iteration")
plt.show()
Figure 3: Trace plots of the negative log-posterior density and two selected coefficients.
Finally, Figure 4 shows a set of prior and posterior samples for three selected coefficients. It is evident that the posterior is significantly different to the prior.
Code
xs_pri = dirt.reference.random(d=dirt.dim, n=1000)
xs_post = res.xs[::10]

labels = [r"$\xi_{"+f"{i}"+r"}$" for i in range(22, 25)]
bounds = torch.tensor([[-4.0, 4.0], [-4.0, 4.0], [-4.0, 4.0]])

pairplot(
    xs_post[:, 21:24],
    xs_pri[:, 21:24],
    truth=xi_true[21:24],
    labels=labels,
    x_label="Posterior",
    y_label="Prior",
    bounds=bounds
)
Figure 4: Prior and posterior samples for three selected coefficients.