Homework 1: End-to-End Reconstruction Before Generative Models#

This homework is entirely based on the material covered before the generative-model chapter. The goal is to implement a compact end-to-end reconstruction pipeline, train a few neural architectures, and analyze their behavior on a controlled inverse problem, in the spirit of the supervised imaging pipelines discussed in [3, 30].

Starting from clean Mayo images \(\{\boldsymbol{x}_i\}_{i=1}^N\), you will generate synthetic measurements of the form

\[ \boldsymbol{y}_i^\delta = K\boldsymbol{x}_i + \boldsymbol{e}_i, \]

where \(K\) is a known motion-blur operator and \(\boldsymbol{e}_i\) is additive Gaussian noise drawn independently for each image. You will then train neural networks to reconstruct \(\boldsymbol{x}_i\) from \(\boldsymbol{y}_i^\delta\) and compare the results.
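As a concrete illustration of this corruption model, the sketch below blurs a random image with a simple horizontal motion kernel and adds Gaussian noise. The kernel here is only a stand-in: in the homework itself, \(K\) must come from IPPy's Motion Blur operator, and the noise level is up to you.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in for a clean image x, shaped (batch, channel, height, width).
x = torch.rand(1, 1, 256, 256)

# A horizontal box kernel as a placeholder for the motion-blur operator K.
kernel_len = 9
kernel = torch.full((1, 1, 1, kernel_len), 1.0 / kernel_len)

# y^delta = K x + e, with e ~ N(0, sigma^2 I).
sigma = 0.01
Kx = F.conv2d(x, kernel, padding=(0, kernel_len // 2))
y_delta = Kx + sigma * torch.randn_like(Kx)

print(y_delta.shape)  # same spatial size as x
```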

The purpose of the assignment is not to obtain the strongest possible model, but to demonstrate that you understand the full workflow discussed in the lectures: dataset preparation, synthetic corruption, model design, training, evaluation, and critical discussion of the results.

Homework Goals and Deliverables#

You are asked to complete the notebook and submit the following:

  1. A completed version of this notebook with all TODO sections filled in.

  2. The trained weights of your models, saved in ../weights/ with meaningful names.

  3. A short written discussion, included at the end of the notebook, answering the conceptual questions.

  4. At least one figure comparing the reconstructions produced by your models on the same test image.

Your work should show that you can:

  • build a dataset pipeline for imaging data;

  • simulate an inverse problem through a known operator \(K\);

  • implement simple end-to-end architectures in PyTorch;

  • train and compare the models fairly;

  • interpret the results rather than only showing them.

Note

You may work entirely on CPU, but training will be faster on a GPU if one is available. Keep the code readable, and add comments only where the implementation is not obvious.

Warning

This homework is restricted to the material of the end-to-end and cross-domain chapters. Do not use generative models, diffusion models, pretrained foundation models, or external black-box reconstruction libraries.

Suggested Structure of the Work#

A reasonable workflow is to start from the dataset and the corruption model, then move to the reconstruction architectures, then to training and evaluation. The mandatory part of the homework is based on SimpleCNN and ResCNN, that is, on a plain convolutional model and a residual variant connected to the lecture material on CNNs and residual learning [15, 25]. An additional comparison with a UNet [33] is recommended as an extension for students who want to go one step further.

The notebook intentionally leaves several implementation choices open. You are expected to reuse the ideas developed in the lecture notebooks, not to copy them without understanding.

import glob
import math
import sys
from pathlib import Path

import matplotlib.pyplot as plt
import torch
from PIL import Image
from torch import nn
from torch.utils.data import DataLoader, Dataset
from torchvision import transforms
from tqdm.auto import tqdm

sys.path.append('..')
from IPPy import operators, utilities

book_root = Path('..').resolve()
weights_dir = book_root / 'weights'
weights_dir.mkdir(exist_ok=True)

device = utilities.get_device()
torch.manual_seed(0)

print('Working device:', device)
print('Weights directory:', weights_dir)

Part 1: Data Pipeline and Synthetic Measurements#

In this first part, implement a dataset for the Mayo images and create the synthetic inverse problem. Use grayscale images resized to \(256 \times 256\).

Your implementation should build training and test dataloaders, then generate measurements of the form

\[ \boldsymbol{y}^\delta = K\boldsymbol{x} + \boldsymbol{e} \]

using a Motion Blur operator from IPPy and additive Gaussian noise.

The objective of this part is to verify that the data pipeline is correct before any model is trained.
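A few cheap assertions go a long way when checking the pipeline. The sketch below uses a random tensor as a stand-in for a batch from your dataloader; in the notebook you would run the same checks on `next(iter(train_loader))`.

```python
import torch

# Stand-in for a batch produced by your dataloader.
x_batch = torch.rand(4, 1, 256, 256)

assert x_batch.ndim == 4, "expect (batch, channel, height, width)"
assert x_batch.shape[1] == 1, "grayscale images should have one channel"
assert 0.0 <= float(x_batch.min()) and float(x_batch.max()) <= 1.0, \
    "ToTensor scales pixel values to [0, 1]"
print("batch shape:", tuple(x_batch.shape))
```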

class MayoDataset(Dataset):
    def __init__(self, data_path, data_shape=256):
        super().__init__()
        self.fname_list = sorted(glob.glob(f'{data_path}/*/*.png'))
        self.transform = transforms.Compose([
            transforms.ToTensor(),
            transforms.Resize((data_shape, data_shape)),
        ])

    def __len__(self):
        return len(self.fname_list)

    def __getitem__(self, idx):
        # TODO: complete the dataset implementation.
        raise NotImplementedError


# TODO:
# Build the training and test datasets, create the dataloaders,
# inspect a batch, define the operator K, and visualize at least
# one clean / corrupted pair.

Part 2: Reconstruction Networks#

In the lecture, we discussed a plain convolutional baseline and a residual variant. In this homework you must implement both:

  • SimpleCNN, which directly predicts the reconstruction;

  • ResCNN, which predicts a correction to be added back to the corrupted image.

Architecturally, both models should remain simple and close to the lecture notebooks, but the exact implementation details are part of the assignment.
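The structural difference between the two models can be reduced to one line in `forward`. The single convolution below is a placeholder body, not the architecture you should submit; only the direct-versus-residual wiring matters here.

```python
import torch
from torch import nn

# Placeholder body standing in for the convolutional layers of your model.
body = nn.Conv2d(1, 1, kernel_size=3, padding=1)

x = torch.rand(2, 1, 32, 32)

direct_out = body(x)        # SimpleCNN style: predict the reconstruction
residual_out = x + body(x)  # ResCNN style: predict a correction, add it back

print(direct_out.shape, residual_out.shape)
```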

Note

If you want an optional extension after the mandatory part is completed, you may also implement a plain UNet and include it in the final comparison.

class SimpleCNN(nn.Module):
    def __init__(self, in_ch=1, out_ch=1, n_filters=32, kernel_size=3):
        super().__init__()
        # TODO: implement the model.
        raise NotImplementedError

    def forward(self, x):
        raise NotImplementedError


class ResCNN(nn.Module):
    def __init__(self, in_ch=1, out_ch=1, n_filters=32, kernel_size=3):
        super().__init__()
        # TODO: implement the model.
        raise NotImplementedError

    def forward(self, x):
        raise NotImplementedError


# TODO:
# Instantiate the models you want to study and report their number
# of trainable parameters.
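Reporting the number of trainable parameters can be done with a small helper like the one sketched below; the toy convolution only demonstrates the helper in use.

```python
import torch
from torch import nn

def count_parameters(model: nn.Module) -> int:
    """Number of trainable parameters in a model."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Tiny placeholder model, just to show the helper in action.
toy = nn.Conv2d(1, 32, kernel_size=3)
print(count_parameters(toy))  # 32 * (1 * 3 * 3) weights + 32 biases = 320
```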

Part 3: Training, Saving, and Evaluating the Models#

Train the models by generating the corrupted measurements online during training, exactly as done in the lecture. Use nn.MSELoss() as the default loss function and track training progress with tqdm.

Your implementation must train the models fairly under the same corruption setup, save the trained weights in ../weights/, reload the weights before the final evaluation, and compare the models on the same test images.

A reasonable baseline is to train for about 20 epochs. If your hardware allows it, you may train longer.

def train_model(model, train_loader, K, weights_path, num_epochs=20, noise_level=0.01, lr=1e-3):
    model = model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    history = []

    # TODO: complete the training procedure.

    return history


# TODO:
# Train the models you implemented, save their weights, plot the
# training curves, and reload the weights before evaluation.
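One possible inner training step with online corruption looks like the sketch below. Here `K` is an identity placeholder and the model is a single convolution; in the notebook, `K` would be the IPPy operator and the model one of your architectures.

```python
import torch
from torch import nn

torch.manual_seed(0)

K = lambda x: x  # identity as a placeholder for the blur operator
model = nn.Conv2d(1, 1, kernel_size=3, padding=1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
noise_level = 0.01

x = torch.rand(4, 1, 32, 32)                          # clean batch
y_delta = K(x) + noise_level * torch.randn_like(x)    # corrupt on the fly

optimizer.zero_grad()
loss = loss_fn(model(y_delta), x)   # reconstruct the clean image
loss.backward()
optimizer.step()
print(float(loss))
```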

Part 4: Visual and Quantitative Comparison#

Use the same corrupted test image for all trained models and compare the outputs both visually and quantitatively.

At a minimum, compute the MSE. If you want to connect this homework to the evaluation section of the course, you may also compute PSNR and SSIM.

The final comparison should make it possible to judge both the visual quality and the numerical behavior of the models.
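For the optional quantitative metrics, PSNR follows directly from the MSE: \(\mathrm{PSNR} = 10 \log_{10}(\text{max\_val}^2 / \mathrm{MSE})\). A minimal sketch, with a constant-error toy example:

```python
import torch

def psnr(x_hat: torch.Tensor, x: torch.Tensor, max_val: float = 1.0) -> float:
    """PSNR in dB for images scaled to [0, max_val]."""
    mse = torch.mean((x_hat - x) ** 2)
    return float(10.0 * torch.log10(max_val**2 / mse))

x = torch.zeros(1, 1, 8, 8)
x_hat = x + 0.1  # constant error of 0.1 -> MSE = 0.01 -> PSNR ~ 20 dB
print(psnr(x_hat, x))
```

SSIM is more involved; if you want it, the structural-similarity implementations in common image-quality packages follow the same reduce-to-a-scalar pattern.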

# TODO:
# Evaluate the trained models on one or more test examples and build
# a clear visual and quantitative comparison.

Deliverables and Discussion#

Complete the notebook by answering the following questions in a few sentences each.

  1. Which model performed better in your experiments, and why do you think that happened?

  2. Did the residual architecture help? If yes, in what sense?

  3. How did the noise level affect training and reconstruction quality?

  4. Why is it important to generate the corruption through a known operator \(K\) instead of treating the problem as a generic image-to-image task?

  5. Why should one be cautious when evaluating pure end-to-end methods only through visual quality?

  6. If you implemented the optional UNet extension, how did it compare with the simpler CNN-based models?

In your final submission, make sure that the notebook contains:

  • the completed code;

  • the saved weights;

  • the training curves;

  • the final comparison figure;

  • and the written discussion.

Warning

A reconstruction that looks visually pleasing is not automatically reliable. In inverse problems, one must always keep in mind the role of the forward operator, the noise model, and the possibility of distribution shift.