Generative Models for Inverse Problems#
Bayesian Framing and Variational Reconstruction#
The whole module can be read as a progressive build-up toward this point. We started from classical inverse problems, then moved to supervised neural reconstruction, then discussed noise and realism in the data-generation process, and finally introduced several kinds of generative models. The final question is therefore inevitable: how can these generative models actually be used to solve inverse problems in imaging?
This chapter is the conceptual synthesis of the course. It shows how a learned image prior can be combined with the measurement model to produce reconstructions that are not only visually plausible, but also informed by the physics of acquisition.
Bayesian formulation.
Let

$$
\boldsymbol{y} = \boldsymbol{A}\boldsymbol{x} + \boldsymbol{\varepsilon}
$$

be the measurement model. If the noise is Gaussian,

$$
\boldsymbol{\varepsilon} \sim \mathcal{N}(\boldsymbol{0}, \sigma^2 \boldsymbol{I}),
$$

then the likelihood is

$$
p(\boldsymbol{y} \mid \boldsymbol{x}) \propto \exp\!\left( -\frac{1}{2\sigma^2} \|\boldsymbol{A}\boldsymbol{x} - \boldsymbol{y}\|_2^2 \right).
$$

If we also have a prior distribution \(p(\boldsymbol{x})\) over plausible images, Bayes’ theorem gives the posterior

$$
p(\boldsymbol{x} \mid \boldsymbol{y}) \;\propto\; p(\boldsymbol{y} \mid \boldsymbol{x})\, p(\boldsymbol{x}).
$$
This formula is the bridge between classical regularization and modern generative reconstruction. The likelihood encodes data consistency, while the prior encodes what kinds of images are considered plausible.
The great promise of generative models is precisely that they can provide a powerful learned approximation of the prior term.
MAP estimation and the classical variational form.
If one seeks only a single reconstruction, a natural choice is the maximum a posteriori (MAP) estimator:

$$
\widehat{\boldsymbol{x}}_{\mathrm{MAP}} = \arg\max_{\boldsymbol{x}} \; p(\boldsymbol{x} \mid \boldsymbol{y}).
$$

Taking the negative logarithm and dropping constants independent of \(\boldsymbol{x}\), this becomes

$$
\widehat{\boldsymbol{x}}_{\mathrm{MAP}} = \arg\min_{\boldsymbol{x}} \; \frac{1}{2\sigma^2} \|\boldsymbol{A}\boldsymbol{x} - \boldsymbol{y}\|_2^2 \;-\; \log p(\boldsymbol{x}).
$$

This is exactly the familiar variational form

$$
\min_{\boldsymbol{x}} \; \frac{1}{2} \|\boldsymbol{A}\boldsymbol{x} - \boldsymbol{y}\|_2^2 \;+\; \lambda R(\boldsymbol{x}),
$$

with a learned regularizer

$$
R(\boldsymbol{x}) = -\log p(\boldsymbol{x}).
$$
This identity is worth highlighting. It shows that generative inverse problems are not disconnected from classical regularization theory. They are a modern way of learning the prior instead of hand-designing it.
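The identity can be checked numerically in the simplest case. The sketch below (an illustration, not part of the derivation above) assumes a Gaussian prior \(p(\boldsymbol{x}) = \mathcal{N}(\boldsymbol{0}, \lambda^{-1}\boldsymbol{I})\), so the MAP problem reduces to Tikhonov regularization with a closed-form solution; plain gradient descent on the variational objective recovers the same point. All sizes and constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 8, 5
A = rng.standard_normal((m, n))       # underdetermined forward operator
y = rng.standard_normal(m)            # measurements (illustrative values)
sigma2, lam = 0.1, 1.0                # noise variance and prior precision

# Closed-form MAP estimate for a Gaussian prior N(0, I / lam):
# minimise ||A x - y||^2 / (2 sigma^2) + (lam / 2) ||x||^2.
x_map = np.linalg.solve(A.T @ A / sigma2 + lam * np.eye(n), A.T @ y / sigma2)

# The same point via gradient descent on the variational objective.
x = np.zeros(n)
lr = 1.0 / (np.linalg.norm(A, 2) ** 2 / sigma2 + lam)   # safe step size 1 / L
for _ in range(5000):
    grad = A.T @ (A @ x - y) / sigma2 + lam * x
    x -= lr * grad

print('max deviation from closed form:', np.max(np.abs(x - x_map)))
```

With a Gaussian prior the learned regularizer collapses to a quadratic penalty; richer priors change only the \(R(\boldsymbol{x})\) term, not the structure of the objective.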
Generative Priors Through Latent Models and Denoisers#
Note
This chapter is where the whole module closes conceptually: inverse problems, probabilistic modeling, and learned reconstruction algorithms meet in a single mathematical framework.
Suppose that a generative model provides a decoder or generator

$$
G_{\boldsymbol{\Theta}} : \mathbb{R}^k \to \mathbb{R}^n,
$$

with latent prior

$$
\boldsymbol{z} \sim \mathcal{N}(\boldsymbol{0}, \boldsymbol{I}).
$$

Then one may solve the inverse problem in latent space rather than in image space:

$$
\widehat{\boldsymbol{z}} = \arg\min_{\boldsymbol{z}} \; \|\boldsymbol{A}\, G_{\boldsymbol{\Theta}}(\boldsymbol{z}) - \boldsymbol{y}\|_2^2 \;+\; \lambda \|\boldsymbol{z}\|_2^2.
$$

The reconstruction is then

$$
\widehat{\boldsymbol{x}} = G_{\boldsymbol{\Theta}}(\widehat{\boldsymbol{z}}).
$$
This strategy is mathematically appealing because the search space has dimension \(k\) instead of \(n\), often with \(k \ll n\), as already emphasized in generative compressed sensing methods such as [2]. The generator acts as a nonlinear low-dimensional manifold of plausible images.
# Higher-dimensional latent optimization on an image patch.
from pathlib import Path

from PIL import Image
import numpy as np
import torch

def course_asset_path(name):
    here = Path.cwd().resolve()
    for base in (here, here.parent, here.parent.parent):
        candidate = base / 'imgs' / name
        if candidate.exists():
            return candidate
    raise FileNotFoundError(f'Could not locate imgs/{name} from {here}')

torch.manual_seed(0)
img = Image.open(course_asset_path('Mayo.png')).convert('L').resize((16, 16))
x_true = torch.tensor(np.array(img), dtype=torch.float32).reshape(-1) / 255.0

latent_dim = 12
image_dim = x_true.numel()
G = torch.randn(image_dim, latent_dim) / latent_dim**0.5
A = torch.randn(80, image_dim) / image_dim**0.5

# Project the true patch approximately onto the generator range.
z_ref = torch.linalg.lstsq(G, x_true.unsqueeze(1)).solution.squeeze(1)
x_range = G @ z_ref
y = A @ x_range

z = torch.zeros(latent_dim, requires_grad=True)
optimizer = torch.optim.Adam([z], lr=5e-2)
for step in range(200):
    x_hat = G @ z
    loss = torch.mean((A @ x_hat - y) ** 2) + 1e-3 * torch.mean(z ** 2)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if step in [0, 49, 99, 199]:
        image_rmse = torch.mean((x_hat.detach() - x_range) ** 2).sqrt().item()
        print(f'Step {step + 1:03d} | measurement loss = {loss.item():.6f} | image RMSE = {image_rmse:.6f}')
print('Reference latent norm:', float(torch.norm(z_ref)))
print('Recovered latent norm:', float(torch.norm(z.detach())))
Step 001 | measurement loss = 0.001296 | image RMSE = 0.033675
Step 050 | measurement loss = 0.000008 | image RMSE = 0.002910
Step 100 | measurement loss = 0.000001 | image RMSE = 0.000245
Step 200 | measurement loss = 0.000001 | image RMSE = 0.000041
Reference latent norm: 0.12879076600074768
Recovered latent norm: 0.1286381185054779
This version is closer to the real logic of latent reconstruction than the scalar example below: the unknown object now lives in a higher-dimensional image space, but the optimization still takes place in a lower-dimensional latent space. For comparison, here is the scalar example in full.
import torch

torch.manual_seed(0)

def generator(z):
    return torch.stack([z, z**2], dim=-1)

A = torch.tensor([[1.0, -0.5]])
z_true = torch.tensor(0.8)
x_true = generator(z_true)
y = (A @ x_true).squeeze()

z = torch.tensor(0.0, requires_grad=True)
optimizer = torch.optim.Adam([z], lr=0.1)
for _ in range(200):
    x = generator(z)
    loss = ((A @ x).squeeze() - y) ** 2 + 0.05 * z**2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print('True latent variable:', float(z_true))
print('Recovered latent variable:', float(z.detach()))
print('True image:', x_true)
print('Recovered image:', generator(z.detach()))
True latent variable: 0.800000011920929
Recovered latent variable: 0.5788173079490662
True image: tensor([0.8000, 0.6400])
Recovered image: tensor([0.5788, 0.3350])
Note that the recovered latent variable is visibly biased toward zero: the \(0.05\,z^2\) penalty pulls the solution toward the latent prior mean, trading data fidelity for prior plausibility.

Strengths and limitations of latent-space methods.
The strengths are clear:
strong regularization through dimensionality reduction;
explicit control through the latent prior;
compatibility with both VAEs and GAN generators.
But the limitations are equally important:
the optimization problem in latent space is nonconvex;
the result depends strongly on how expressive the generator range is;
if the true image lies outside the learned generator manifold, the reconstruction is biased no matter how informative the data are.
This last point should be stressed in class. A generative prior is powerful only to the extent that it covers the image class of interest.
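The range-restriction bias can be made concrete with a deliberately crude linear “generator” (a sketch, not a trained model): even with perfect, noiseless, complete measurements, the best reconstruction inside the generator range misses the component of the true image orthogonal to that range.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 20, 4
G = rng.standard_normal((n, k))   # linear "generator": its range plays the role of the prior manifold
x_true = rng.standard_normal(n)   # a generic image does not lie in range(G)
y = x_true.copy()                 # perfect, noiseless, complete measurements (A = I)

# Best possible reconstruction constrained to the generator range.
z_hat, *_ = np.linalg.lstsq(G, y, rcond=None)
x_hat = G @ z_hat

bias = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
print(f'relative reconstruction error despite perfect data: {bias:.3f}')
```

No amount of additional data can remove this error; only a more expressive generator can.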
Plug-and-play and denoiser priors.
A second strategy does not parameterize the solution explicitly as \(\boldsymbol{x}=G_{\boldsymbol{\Theta}}(\boldsymbol{z})\). Instead, it uses a learned denoiser as a prior step inside an iterative algorithm.
Start from a variational problem

$$
\min_{\boldsymbol{x}} \; \frac{1}{2} \|\boldsymbol{A}\boldsymbol{x} - \boldsymbol{y}\|_2^2 \;+\; \lambda R(\boldsymbol{x}).
$$

If one applies a splitting scheme such as half-quadratic splitting (HQS) or ADMM, the iteration alternates between:
a data-consistency update, which uses the forward model;
a prior update, which can be interpreted as denoising.
Schematically, one writes

$$
\boldsymbol{x}^{(k+1)} = \arg\min_{\boldsymbol{x}} \; \|\boldsymbol{A}\boldsymbol{x} - \boldsymbol{y}\|_2^2 + \mu \|\boldsymbol{x} - \boldsymbol{v}^{(k)}\|_2^2,
\qquad
\boldsymbol{v}^{(k+1)} = D_\tau\!\left(\boldsymbol{x}^{(k+1)}\right),
$$

where \(D_\tau\) is a learned denoiser with effective noise level \(\tau\).
This framework is attractive because it preserves a visible role for the forward operator while allowing the prior to be highly expressive.
# Toy HQS-like iteration with a simple denoiser surrogate.
from pathlib import Path

from PIL import Image
import numpy as np
import torch

def course_asset_path(name):
    here = Path.cwd().resolve()
    for base in (here, here.parent, here.parent.parent):
        candidate = base / 'imgs' / name
        if candidate.exists():
            return candidate
    raise FileNotFoundError(f'Could not locate imgs/{name} from {here}')

def smooth(v):
    kernel = torch.tensor([[1.0, 1.0, 1.0], [1.0, 2.0, 1.0], [1.0, 1.0, 1.0]], dtype=torch.float32)
    kernel = (kernel / kernel.sum()).view(1, 1, 3, 3)
    return torch.nn.functional.conv2d(v, kernel, padding=1)

img = Image.open(course_asset_path('GoPro.jpg')).convert('L').resize((32, 32))
x_true = torch.tensor(np.array(img), dtype=torch.float32).unsqueeze(0).unsqueeze(0) / 255.0
y = (x_true + 0.08 * torch.randn_like(x_true)).clamp(0.0, 1.0)

x = y.clone()
for k in range(5):
    z = 0.6 * y + 0.4 * x   # data-consistency step (here the forward model is the identity)
    x = smooth(z)           # prior step: surrogate denoiser
    rmse = torch.mean((x - x_true) ** 2).sqrt().item()
    print(f'Iteration {k + 1}: RMSE = {rmse:.6f}')
Iteration 1: RMSE = 0.081795
Iteration 2: RMSE = 0.089947
Iteration 3: RMSE = 0.092231
Iteration 4: RMSE = 0.092877
Iteration 5: RMSE = 0.093073
This is not a true plug-and-play denoiser learned from data, but it mirrors the algorithmic structure: alternate a data-oriented update with a prior-oriented denoising step. In later, more advanced versions of the course, this placeholder denoiser can be replaced by a pretrained neural model.
Diffusion and Conditional Generative Approaches to Posterior Inference#
Diffusion and score-based models offer an even richer possibility. Instead of only providing a denoising operator, they approximate the prior score

$$
\nabla_{\boldsymbol{x}} \log p(\boldsymbol{x}),
$$

or its noisy-scale analogues. This is crucial because the posterior score satisfies

$$
\nabla_{\boldsymbol{x}} \log p(\boldsymbol{x} \mid \boldsymbol{y}) = \nabla_{\boldsymbol{x}} \log p(\boldsymbol{x}) + \nabla_{\boldsymbol{x}} \log p(\boldsymbol{y} \mid \boldsymbol{x}).
$$

For Gaussian noise, the likelihood score is explicit:

$$
\nabla_{\boldsymbol{x}} \log p(\boldsymbol{y} \mid \boldsymbol{x}) = \frac{1}{\sigma^2} \boldsymbol{A}^\top (\boldsymbol{y} - \boldsymbol{A}\boldsymbol{x}).
$$
Hence, if a generative model supplies the prior score, then posterior-guided sampling becomes possible.
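The score decomposition can be verified directly in a scalar Gaussian case, where prior, likelihood, and posterior are all available in closed form. The values of `a`, `sigma2`, and `y` below are illustrative, not taken from the text.

```python
import numpy as np

a, sigma2 = 2.0, 0.5      # scalar forward model: y = a x + Gaussian noise (illustrative)
y = 1.0

def prior_score(x):       # score of the prior p(x) = N(0, 1)
    return -x

def likelihood_score(x):  # score of p(y | x) = N(a x, sigma2)
    return a * (y - a * x) / sigma2

# Analytic Gaussian posterior from conjugacy.
post_var = 1.0 / (1.0 + a**2 / sigma2)
post_mean = post_var * (a * y / sigma2)

def posterior_score(x):   # score of N(post_mean, post_var)
    return -(x - post_mean) / post_var

# The posterior score equals prior score plus likelihood score, pointwise.
for x in [-1.0, 0.0, 0.7]:
    assert abs(posterior_score(x) - (prior_score(x) + likelihood_score(x))) < 1e-10
print('posterior score = prior score + likelihood score (checked)')
```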
Diffusion posterior sampling.
In practice, one often combines a pretrained diffusion model with the measurement model during the reverse denoising process, as in diffusion posterior sampling approaches such as [3]. At each time step the algorithm performs two conceptual actions:
move toward the image prior using the learned diffusion denoiser or score;
correct the current sample toward data consistency using the likelihood gradient.
At a schematic level, a reverse step may look like

$$
\boldsymbol{x}_{t-1} = \big(\text{diffusion denoising step on } \boldsymbol{x}_t\big) \;-\; \eta\, \nabla_{\boldsymbol{x}_t} \big\| \boldsymbol{A}\, \widehat{\boldsymbol{x}}_0(\boldsymbol{x}_t) - \boldsymbol{y} \big\|_2^2,
$$

where \(\widehat{\boldsymbol{x}}_0(\boldsymbol{x}_t)\) denotes the current estimate of the clean image and \(\eta > 0\) is a guidance step size.
This is one of the most elegant modern answers to the inverse-problems question, because it combines learned image statistics and forward-model physics at every stage of generation.
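The sketch below is far simpler than actual diffusion posterior sampling, but it shows the same mechanism in miniature: unadjusted Langevin dynamics driven by the sum of an analytic prior score and the Gaussian likelihood score, in a scalar problem where the exact posterior is known. All numerical values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
a, sigma2, y = 1.5, 0.3, 0.9   # scalar model y = a x + N(0, sigma2) (illustrative)

# Analytic Gaussian posterior (prior N(0, 1)) for reference.
post_var = 1.0 / (1.0 + a**2 / sigma2)
post_mean = post_var * a * y / sigma2

def posterior_score(x):
    # prior score + likelihood score
    return -x + a * (y - a * x) / sigma2

# Unadjusted Langevin dynamics driven by the posterior score.
eps = 0.01
x = rng.standard_normal(5000)   # 5000 parallel chains started from the prior
for _ in range(2000):
    x = x + eps * posterior_score(x) + np.sqrt(2 * eps) * rng.standard_normal(x.shape)

print(f'empirical mean {x.mean():.3f} vs analytic {post_mean:.3f}')
print(f'empirical var  {x.var():.3f} vs analytic {post_var:.3f}')
```

Each update nudges the samples toward the prior and toward the data at the same time, which is exactly the two conceptual actions described above, and the chains end up distributed according to the posterior rather than collapsed onto a single point estimate.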
Why posterior sampling is different from point estimation.
Traditional reconstructions often output a single image. But in many inverse problems the posterior is broad or multimodal. A single estimate may hide genuine uncertainty.
Generative posterior methods allow one to sample multiple reconstructions compatible with the same measurement. This is scientifically important because:
it reveals ambiguity in the inverse problem;
it allows uncertainty quantification;
it distinguishes stable structures from hallucinated details.
This is a major conceptual advantage over purely deterministic end-to-end reconstructors.
import torch
prior_mean = 0.0
prior_var = 1.0
noise_var = 0.2
y = 1.2
posterior_var = 1.0 / (1.0 / prior_var + 1.0 / noise_var)
posterior_mean = posterior_var * (prior_mean / prior_var + y / noise_var)
map_estimate = posterior_mean
samples = posterior_mean + torch.sqrt(torch.tensor(posterior_var)) * torch.randn(5)
print('Posterior mean / MAP in this Gaussian example:', float(map_estimate))
print('Five posterior samples:')
print(samples)
Posterior mean / MAP in this Gaussian example: 0.9999999999999998
Five posterior samples:
tensor([0.8813, 0.3833, 0.8491, 0.8097, 0.7047])
Conditional generative models.
Another route is to train the generative model directly in conditional form:

$$
\boldsymbol{x} \sim p_{\boldsymbol{\Theta}}(\boldsymbol{x} \mid \boldsymbol{y}).
$$
This can be realized with conditional VAEs, conditional GANs, conditional diffusion models, or conditional flow matching models.
The benefit is clear: the model is specialized to the inverse problem from the start. The conditioning information is not added only at test time, but built into the learned generative mechanism.
However, there is an important tradeoff. Conditional models are usually more tightly tied to the operator and noise distribution seen during training. If the acquisition setting changes, retraining or careful adaptation may be necessary.
Conditional flow matching.
Flow matching offers a similar perspective with deterministic transport. A conditional vector field

$$
v_{\boldsymbol{\Theta}}(\boldsymbol{x}, t \mid \boldsymbol{y})
$$

is trained to move noise toward the distribution of reconstructions conditioned on the measured datum. At test time, integrating the corresponding ODE yields one or more plausible reconstructions.
This approach is attractive because it may require fewer sampling steps than diffusion while still allowing uncertainty-aware generation.
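As a minimal illustration of deterministic transport (with no training involved), consider a one-dimensional Gaussian target standing in for the conditional reconstruction distribution. Along the linear interpolation path the exact vector field is known in closed form, and integrating the ODE carries standard normal samples onto the target; the target parameters `m` and `s` below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
m, s = 2.0, 0.5   # target distribution N(m, s^2), standing in for p(x | y)

def v(x, t):
    # Exact vector field transporting N(0, 1) at t = 0 to N(m, s^2) at t = 1
    # along the linear path x_t = (1 - t + t s) z + t m.
    return m + (s - 1.0) * (x - t * m) / (1.0 - t + t * s)

# Integrate the ODE with Euler steps from noise to "reconstructions".
x = rng.standard_normal(10000)
n_steps = 200
dt = 1.0 / n_steps
for i in range(n_steps):
    x = x + dt * v(x, i * dt)

print(f'sample mean {x.mean():.3f} (target {m}), sample std {x.std():.3f} (target {s})')
```

In a real conditional flow matching model the vector field is a trained network that additionally takes \(\boldsymbol{y}\) as input; here it is hard-coded because the target is Gaussian.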
Reliability, Hallucinations, and the Main Cautionary Principle#
A final warning is essential. A strong generative prior can produce highly realistic images, but realism is not the same as truth. In scientific and medical imaging, this distinction is critical. A method may generate structures that look plausible according to the learned prior but are not sufficiently supported by the data.
For this reason, every generative reconstruction method should be judged along at least two axes:
prior plausibility;
data fidelity.
If a reconstruction is visually convincing but poorly supported by the measurement, then it may be scientifically dangerous.
This is the point where the themes of the entire course come together: architecture matters, the forward model matters, the noise model matters, and realistic evaluation matters.
Warning
In medical or scientific imaging, a plausible detail that is unsupported by the data can be more dangerous than a visibly imperfect reconstruction. This is why learned priors must always be discussed together with data fidelity and uncertainty.
Summary#
The final conceptual map should be:
Bayesian inverse problems combine likelihood and prior information;
generative models supply learned image priors;
VAEs and GANs are often used through latent-space optimization;
plug-and-play schemes use learned denoisers inside iterative reconstruction;
diffusion and flow matching enable posterior-aware sampling methods;
generative priors are powerful, but they must always be constrained by the physics of the measurements.
Exercises#
Starting from Bayes’ theorem, derive the MAP variational objective for Gaussian noise.
Explain the main benefit and the main limitation of latent-space reconstruction.
Describe the difference between a MAP estimate and posterior sampling.
Why can a strong generative prior also increase the risk of hallucinations?
Further Reading#
This chapter is the meeting point of the whole module, so it is worth revisiting after all previous notebooks are understood. A productive reading strategy is to classify each method by three questions: how is the prior represented, how is the data fidelity incorporated, and does the algorithm return a single estimate or a distribution of plausible reconstructions?
Challenge question for revision: compare an end-to-end reconstructor, a latent-space generative prior, and a diffusion posterior sampler on the same inverse problem. What is gained and what is lost in each case?