Homework 2: Generative Models for Inverse Problems#
This homework is based on the material of the generative-model chapter, with emphasis on the use of learned priors for inverse problems. The goal is to reuse pretrained generative models from the lecture notebooks and study how they can guide reconstruction under a known forward operator, following the broad perspective of learned priors surveyed in [3, 30].
Starting from clean Mayo images \(\{\boldsymbol{x}_i\}_{i=1}^N\), you will generate synthetic measurements of the form

\[
\boldsymbol{y}^{\delta} = K\boldsymbol{x} + \boldsymbol{e},
\]

where \(K\) is a known motion-blur operator and \(\boldsymbol{e}\) is additive Gaussian noise. You will then compare at least two reconstruction strategies:
a generator-range prior method based on a pretrained VAE decoder [6, 24];
a diffusion-prior method based on the pretrained denoiser from the diffusion notebook [7, 18].
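To make the corruption model concrete, here is a minimal sketch of generating \(\boldsymbol{y}^{\delta} = K\boldsymbol{x} + \boldsymbol{e}\). The horizontal averaging kernel and the noise level `sigma` are illustrative assumptions standing in for the course operator from `operators` and the `gaussian_noise` helper:

```python
import torch
import torch.nn.functional as F

# Illustrative stand-in for the course motion-blur operator K:
# a simple horizontal averaging kernel applied by convolution.
def motion_blur(x, kernel_size=5):
    # x: (B, 1, H, W); average along the horizontal direction.
    kernel = torch.ones(1, 1, 1, kernel_size) / kernel_size
    return F.conv2d(x, kernel, padding=(0, kernel_size // 2))

torch.manual_seed(0)
x = torch.rand(1, 1, 64, 64)   # stand-in for a clean 64x64 Mayo image
sigma = 0.05                   # noise level (assumed, not fixed by the course)
y_delta = motion_blur(x) + sigma * torch.randn_like(x)
```

In your notebook, the actual operator and noise model should come from the `IPPy` utilities loaded above, not from this toy kernel.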
The purpose of the assignment is not to reproduce a research paper line by line, but to demonstrate that you understand the algorithmic logic of generative priors: latent-space optimization, data fidelity, denoising guidance, and the tradeoff between realism and measurement consistency.
Homework Goals and Deliverables#
You are asked to complete the notebook and submit the following:
A completed version of this notebook with all TODO sections filled in.
The pretrained weights you reused from the lecture notebooks, together with any additional saved results you produced for this homework.
A short written discussion, included at the end of the notebook, answering the conceptual questions.
At least one figure comparing the corrupted datum and the final reconstructions produced by your methods on the same test image.
Your work should show that you can:
build a dataset pipeline consistent with the generative models used in the lectures;
simulate an inverse problem through a known operator \(K\);
load and reuse pretrained generative models correctly;
implement a latent-prior reconstruction procedure;
implement a diffusion-prior reconstruction procedure;
compare the methods critically rather than only visually.
Note
Use grayscale images resized to \(64 \times 64\), because this is the resolution used by the VAE, GAN, and diffusion lecture notebooks.
Warning
Do not use external pretrained foundation models, external inverse-problem toolkits, or black-box diffusion pipelines. This includes downloading a ready-made Hugging Face diffusion pipeline instead of reusing the course denoiser. The homework is about reusing and understanding the models developed in this course.
Suggested Structure of the Work#
A reasonable workflow is to start from the data and corruption model, then load the pretrained generative models, then implement the reconstruction procedures, and finally compare them on the same examples.
The mandatory part of the homework is the VAE-based latent-prior reconstruction of Part 2 and one diffusion-prior reconstruction, either DPS-style or DiffPIR-style, in Part 3.
An optional extension is to include a GAN-based generator prior or to implement both DPS-style and DiffPIR-style diffusion reconstructions and discuss how the comparison changes.
The notebook intentionally leaves several implementation choices open. You are expected to reuse the ideas developed in the lecture notebooks, but not simply copy them without understanding the role of each step.
import glob
import math
import sys
from pathlib import Path
import matplotlib.pyplot as plt
import torch
from PIL import Image
from torch import nn
from torch.utils.data import DataLoader, Dataset
from torchvision import transforms
from tqdm.auto import tqdm
book_root = Path('..').resolve()
ipp_root = book_root / 'IPPy'
if str(ipp_root) not in sys.path:
sys.path.append(str(ipp_root))
import operators
from utilities import gaussian_noise, get_device
from nn.diffusion import DiffusionUNet, cosine_beta_schedule, extract, denormalize_to_01
weights_dir = book_root / 'weights'
weights_dir.mkdir(exist_ok=True)
device = get_device()
torch.manual_seed(0)
print('Working device:', device)
print('Weights directory:', weights_dir)
Part 1: Data Pipeline, Corruption Model, and Pretrained Models#
In this first part, build a dataset for the Mayo images at \(64 \times 64\), define the synthetic inverse problem, and load the pretrained generative models from the lecture notebooks.
The objective of this part is to verify that the data pipeline, the corruption model, and the reused weights are all correct before the reconstruction procedures are implemented.
Be careful about architecture compatibility. The diffusion weights now correspond to the stronger lecture denoiser used in the diffusion notebook, so your DiffusionUNet definition must match that notebook exactly when you call load_state_dict. In particular, reuse the same configuration (in_ch=1, base_ch=64, channel_mults=(1, 2, 4), time_dim=256, attention enabled at the same levels, and the same diffusion schedule).
You should at least load:
the pretrained VAE from ../weights/VAE.pth;
the pretrained diffusion denoiser from ../weights/DDPMDenoiser.pth.
If you attempt the optional extension, you may also load the pretrained GAN generator from ../weights/GAN_G_EMA.pth.
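A common pitfall when reusing weights is rebuilding the network with a slightly different configuration: `load_state_dict` is strict by default and fails on any name or shape mismatch. The toy sketch below demonstrates this with two small linear layers standing in for a correctly and an incorrectly configured `DiffusionUNet`:

```python
import torch
from torch import nn

# Toy illustration: loading weights into a mismatched architecture fails.
# In the homework, the DiffusionUNet must be rebuilt with the exact lecture
# configuration (in_ch=1, base_ch=64, channel_mults=(1, 2, 4), time_dim=256).
net_a = nn.Linear(64, 64)   # "correct architecture" stand-in
net_b = nn.Linear(32, 64)   # "wrong architecture" stand-in

try:
    net_b.load_state_dict(net_a.state_dict())  # strict=True by default
    mismatch = False
except RuntimeError:
    mismatch = True         # size mismatch is reported before any copy
```

If loading the diffusion weights raises such an error, compare your constructor arguments against the lecture notebook before changing anything else.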
class MayoDataset(Dataset):
def __init__(self, data_path, data_shape=64):
super().__init__()
self.fname_list = sorted(glob.glob(f'{data_path}/*/*.png'))
self.transform = transforms.Compose([
transforms.Resize((data_shape, data_shape), antialias=True),
transforms.ToTensor(),
transforms.Normalize((0.5,), (0.5,)),
])
def __len__(self):
return len(self.fname_list)
def __getitem__(self, idx):
# TODO: complete the dataset implementation.
raise NotImplementedError
# TODO:
# 1. Build the training and test datasets and create the dataloaders.
# 2. Define the forward operator K and the additive noise level.
# 3. Visualize at least one clean / corrupted pair.
# 4. Recreate the pretrained model classes from the lecture notebooks
# and load the VAE and diffusion weights from ../weights/.
# For the diffusion model, match the lecture architecture exactly
# and keep the image normalization in [-1, 1].
# 5. If you do the optional extension, also load the GAN generator.
Part 2: Latent Reconstruction with a Generator Prior#
Use the pretrained VAE decoder as a generator prior. The idea is to reconstruct an image by optimizing a latent code \(\boldsymbol{z}\) so that the generated image both matches the measurements and remains plausible under the latent prior, in the spirit of deep generative priors for inverse problems [6, 24].
A standard objective has the form

\[
\min_{\boldsymbol{z}} \; \|K G(\boldsymbol{z}) - \boldsymbol{y}^{\delta}\|_2^2 + \lambda \|\boldsymbol{z}\|_2^2,
\]

where \(G\) denotes the decoder and \(\lambda\) controls the latent regularization.
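The structure of this optimization can be sketched with stand-ins: a toy linear decoder for \(G\) and an identity operator for \(K\) (both assumptions, replacing the course VAE decoder and blur operator). Only the loss composition and the latent-only optimization carry over to your implementation:

```python
import torch
from torch import nn

# Minimal sketch of min_z ||K G(z) - y||^2 + lam ||z||^2 with toy stand-ins.
torch.manual_seed(0)
latent_dim, img_dim = 8, 64
G = nn.Linear(latent_dim, img_dim)   # stand-in decoder (assumption)
K = lambda x: x                      # stand-in forward operator (assumption)
y_delta = torch.randn(img_dim)
lam = 1e-3

z = torch.zeros(latent_dim, requires_grad=True)
opt = torch.optim.Adam([z], lr=1e-1)  # note: only z is optimized, not G
for _ in range(200):
    opt.zero_grad()
    loss = ((K(G(z)) - y_delta) ** 2).sum() + lam * (z ** 2).sum()
    loss.backward()
    opt.step()

with torch.no_grad():
    final_loss = ((K(G(z)) - y_delta) ** 2).sum() + lam * (z ** 2).sum()
```

In the homework, the decoder parameters stay frozen; only the latent code is a trainable tensor, which is what makes this a generator-range prior.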
The mandatory part is to implement the VAE-based version. If you want the optional extension, you may implement the same idea with the GAN generator and compare the behavior.
def latent_objective(z, generator, y_delta, K, lam=1e-3):
# TODO:
# Decode z, apply the forward operator, and compute a latent-prior
# objective combining data fidelity and latent regularization.
raise NotImplementedError
def reconstruct_with_vae_prior(generator, y_delta, K, latent_dim, num_steps=500, lr=1e-2, lam=1e-3):
# TODO:
# Optimize the latent code z and return the final reconstruction.
raise NotImplementedError
# TODO:
# Reconstruct at least one corrupted test image with the VAE prior,
# visualize the result, and save any useful intermediate values such
# as the optimized latent code or objective history.
Part 3: Reconstruction with a Diffusion Prior#
Implement one diffusion-prior reconstruction method by reusing the pretrained denoiser from the lecture notebook. Choose either a DPS-style method [7] or a DiffPIR-style method [46], since these are the two reference examples discussed in the course notebook.
The purpose of this part is to show that you understand how the diffusion model enters the reconstruction algorithm: not through a tractable closed-form prior density, but through denoising or score information used iteratively together with the data-fidelity term.
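To illustrate how the data-fidelity term enters through the denoiser output, here is a sketch of a single DPS-style guidance step. The scalar `alpha_bar_t`, the linear "denoiser", and the identity operator are all toy assumptions; only the pattern (differentiate the measurement residual through the \(\hat{\boldsymbol{x}}_0\) estimate, then step against that gradient) reflects the method:

```python
import torch

# One DPS-style guidance step with toy stand-ins (not the course implementation).
torch.manual_seed(0)
alpha_bar_t = torch.tensor(0.5)
K = lambda x: x                     # stand-in forward operator (assumption)
y_delta = torch.randn(4)
eps_model = lambda x_t: 0.1 * x_t   # stand-in noise predictor (assumption)

x_t = torch.randn(4, requires_grad=True)
# Estimate x0 from x_t and the predicted noise, then measure data misfit.
x0_hat = (x_t - torch.sqrt(1 - alpha_bar_t) * eps_model(x_t)) / torch.sqrt(alpha_bar_t)
residual = ((K(x0_hat) - y_delta) ** 2).sum()
grad, = torch.autograd.grad(residual, x_t)      # gradient through x0_hat
guidance_scale = 0.1
x_t_guided = x_t.detach() - guidance_scale * grad
```

In the full algorithm this guidance step is interleaved with the standard ancestral sampling update at every timestep, which is what ties the prior to the measurements.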
Use the same diffusion schedule and denoiser conventions as in the lecture notebook. In particular, keep the images normalized to [-1, 1], rebuild the denoiser with the correct architecture, and use the matching diffusion schedule when loading ../weights/DDPMDenoiser.pth.
Keep the implementation compact and readable. It is acceptable to produce a pedagogical version rather than an exact research implementation, provided you explain the choice clearly in your discussion.
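The clean-image estimate relies on inverting the DDPM forward relation \(\boldsymbol{x}_t = \sqrt{\bar\alpha_t}\,\boldsymbol{x}_0 + \sqrt{1-\bar\alpha_t}\,\boldsymbol{\epsilon}\). A self-contained sketch (using a scalar \(\bar\alpha_t\); in the notebook the course `extract` helper gathers it per sample) verifies the identity:

```python
import torch

# Standard DDPM identity: x0 = (x_t - sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_bar_t).
def predict_x0(x_t, eps, alpha_bar_t):
    return (x_t - torch.sqrt(1 - alpha_bar_t) * eps) / torch.sqrt(alpha_bar_t)

torch.manual_seed(0)
x0 = torch.randn(2, 1, 64, 64)
eps = torch.randn_like(x0)
alpha_bar_t = torch.tensor(0.3)
# Forward process: noise x0 at level t, then recover it from the true eps.
x_t = torch.sqrt(alpha_bar_t) * x0 + torch.sqrt(1 - alpha_bar_t) * eps
x0_rec = predict_x0(x_t, eps, alpha_bar_t)
```

With the true noise the inversion is exact; in the sampler, `eps` is replaced by the denoiser's prediction, so \(\hat{\boldsymbol{x}}_0\) is only an estimate that improves as \(t\) decreases.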
def predict_x0_from_eps(x_t, eps_pred, t, alpha_bars):
# TODO:
# Reconstruct the clean-image estimate from x_t and the predicted noise.
raise NotImplementedError
def reconstruct_with_diffusion_prior(model, y_delta, K, alpha_bars, num_steps=40, guidance_scale=0.1):
# TODO:
# Implement either a DPS-style or a DiffPIR-style reconstruction method.
raise NotImplementedError
# TODO:
# Define the diffusion schedule exactly as in the lecture notebook,
# reconstruct the same corrupted test image used in Part 2, and
# visualize the diffusion-prior result.
Part 4: Visual and Quantitative Comparison#
Use the same corrupted test image for all methods and compare the outputs both visually and quantitatively.
At a minimum, compute the MSE. If you want a richer comparison, also compute PSNR and SSIM.
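MSE and PSNR are a few lines in torch; the sketch below assumes images scaled to \([0, 1]\) before comparison (remember the models work in \([-1, 1]\), so denormalize first). For SSIM, a dedicated implementation such as `skimage.metrics.structural_similarity` is the usual choice:

```python
import torch

# MSE and PSNR for images in [0, 1]; denormalize from [-1, 1] before use.
def mse(x, y):
    return ((x - y) ** 2).mean()

def psnr(x, y, max_val=1.0):
    return 10 * torch.log10(max_val ** 2 / mse(x, y))

x = torch.zeros(1, 1, 64, 64)
x_hat = x + 0.1                    # reconstruction with constant error 0.1
err = mse(x_hat, x)                # ≈ 0.01
quality = psnr(x_hat, x)           # ≈ 20 dB
```

Report the metrics for the corrupted input as well as for each reconstruction, so the comparison shows how much each method actually improves on the raw measurements.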
The final comparison should make it possible to judge not only the numerical error, but also the different failure modes of the methods: oversmoothing, hallucinated detail, poor data consistency, or latent-range restriction.
# TODO:
# Evaluate the reconstructed images quantitatively, build a clear
# comparison figure, and comment on the differences between the
# corrupted input, the latent-prior reconstruction, and the
# diffusion-prior reconstruction.
Deliverables and Discussion#
Complete the notebook by answering the following questions in a few sentences each.
How did the VAE-based latent prior behave? Did the latent regularization help, and if so in what sense?
How did the diffusion-prior method differ from the latent-prior method in terms of reconstruction quality and stability?
Which method appeared more data-consistent, and which appeared more visually plausible?
Did you observe any signs of hallucination or model bias? If yes, how did they manifest themselves?
Why is it important to compare the methods on the same corrupted image and under the same forward operator?
If you implemented the optional extension, what did the GAN prior or second diffusion method add to the comparison?
In your final submission, make sure that the notebook contains:
the completed code;
the loaded or reused weights;
the reconstruction figures;
the quantitative comparison;
and the written discussion.
Warning
A reconstruction that looks realistic is not automatically reliable. In inverse problems, generative models can improve plausibility while still violating the measurements or introducing biased details. Your discussion should explicitly address this point.