Homework 4: Unsupervised Learning#

Warning

There is NO deadline for submitting the homeworks: you can upload them to Virtuale whenever you want. You are only required to upload them to Virtuale BEFORE your exam session, since the homeworks will be a central part of the oral exam.

You are asked to submit the homework in one of the two following formats:

  • A PDF (or Word) document, containing screenshots of code snippets, screenshots of the results generated by your code, and a brief comment on the obtained results.

  • A Python notebook (i.e. a .ipynb file), with cells containing the code required to solve the indicated exercises, alternating with brief comments on the obtained results in the form of markdown cells. We remark that the code SHOULD NOT be run during the exam: the student is expected to enter the exam with all programs already executed and the results clearly visible on screen.

Showing up at the oral exam with non-executed code, OR without a PDF file in which the obtained results are visible, will cause the student to be rejected.

Exercise 1: Compute the SVD of a Matrix & Verify the Reconstruction#

  1. Construct any non-square matrix \(A \in \mathbb{R}^{m \times n}\) (e.g. \(m=10\), \(n=6\)) with random or structured entries.

  2. Compute its SVD:

    \[ A = U\Sigma V^T. \]
  3. Verify numerically that:

    \[ A \approx U\Sigma V^T, \qquad \|A - U\Sigma V^T\|_F \approx 0. \]
  4. Print and plot the singular values \(\sigma_1,\dots,\sigma_{\min(m,n)}\).

  5. Comment on why singular values appear in descending order, why small singular values correspond to “less important” directions, and why floating-point arithmetic makes exact zeros rare.
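A minimal sketch of steps 1–4 above, assuming NumPy and Matplotlib are available; the matrix size and the random entries are just one possible choice.

```python
import numpy as np
import matplotlib.pyplot as plt

# Step 1: a random non-square matrix (m = 10, n = 6 as suggested)
m, n = 10, 6
A = np.random.randn(m, n)

# Step 2: reduced SVD; np.linalg.svd returns U, the vector of singular values s, and V^T
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Step 3: reconstruct A and check the Frobenius norm of the residual
A_rec = U @ np.diag(s) @ Vt
print("||A - U Sigma V^T||_F =", np.linalg.norm(A - A_rec, "fro"))

# Step 4: print and plot the singular values
print("singular values:", s)
plt.semilogy(np.arange(1, len(s) + 1), s, "o-")
plt.xlabel("index $i$")
plt.ylabel(r"$\sigma_i$")
plt.title("Singular values of A")
plt.show()
```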

Exercise 2: Best Rank-\(k\) Approximation#

  1. Implement a function:

    \[ A_k = \sum_{i=1}^k \sigma_i u_i v_i^T. \]
  2. For several values of \(k\), compute:

    \[ E(k) = \|A - A_k\|_F. \]
  3. Plot the approximation error \(E(k)\) vs \(k\).

  4. Explain why SVD gives the optimal rank-\(k\) approximation.
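A possible implementation, reusing the `A`, `U`, `s`, `Vt` computed in the Exercise 1 sketch.

```python
import numpy as np
import matplotlib.pyplot as plt

def rank_k_approx(U, s, Vt, k):
    """Truncated SVD: A_k = sum_{i=1}^{k} sigma_i u_i v_i^T."""
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

ks = np.arange(1, len(s) + 1)
errors = [np.linalg.norm(A - rank_k_approx(U, s, Vt, k), "fro") for k in ks]

plt.plot(ks, errors, "o-")
plt.xlabel("$k$")
plt.ylabel(r"$E(k) = \|A - A_k\|_F$")
plt.title("Approximation error vs rank")
plt.show()
```

As a consistency check for item 4, you can also verify numerically that \(E(k) = \sqrt{\sum_{i>k}\sigma_i^2}\), which is what the Eckart–Young theorem predicts for the Frobenius norm.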

Exercise 3: Image Compression with SVD#

Using the cameraman image (as in the notes):

  1. Load the image as a matrix \(X \in \mathbb{R}^{512\times512}\).

  2. Compute its SVD:

    \[ X = U\Sigma V^T. \]
  3. For \(k \in \{5, 20, 50, 100, 200\}\):

    • Compute the rank-\(k\) approximation \(X_k\),

    • Plot each reconstructed image,

    • Compute the compression factor:

      \[ c_k = 1 - \frac{k(m+n+1)}{mn}. \]
    • Plot reconstruction error vs \(k\),

    • Plot compression factor vs \(k\).

  4. Comment on:

    • How visual quality improves with \(k\),

    • Why most of the “energy” is contained in the first singular values,

    • The trade-off between compression and fidelity,

    • The connection between SVD and optimal low-rank approximation.
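A sketch of the compression pipeline. Here the cameraman image is taken from scikit-image (`skimage.data.camera()`, a 512×512 grayscale image); this is an assumption of the sketch, and loading the image from a file as in the notes works just as well.

```python
import numpy as np
import matplotlib.pyplot as plt
from skimage import data   # assumption: scikit-image installed; any 512x512 grayscale image works

# Step 1: load the image as a matrix X
X = data.camera().astype(float)
m, n = X.shape

# Step 2: reduced SVD of X
U, s, Vt = np.linalg.svd(X, full_matrices=False)

ks = [5, 20, 50, 100, 200]
errors, compressions = [], []

fig, axes = plt.subplots(1, len(ks), figsize=(15, 3))
for ax, k in zip(axes, ks):
    # Step 3: rank-k reconstruction, reconstruction error and compression factor
    Xk = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    errors.append(np.linalg.norm(X - Xk, "fro"))
    compressions.append(1 - k * (m + n + 1) / (m * n))
    ax.imshow(Xk, cmap="gray")
    ax.set_title(f"$k = {k}$")
    ax.axis("off")
plt.show()

plt.plot(ks, errors, "o-")
plt.xlabel("$k$"); plt.ylabel(r"$\|X - X_k\|_F$"); plt.show()

plt.plot(ks, compressions, "o-")
plt.xlabel("$k$"); plt.ylabel("compression factor $c_k$"); plt.show()
```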

Exercise 5: PCA + Clustering on a Real Dataset#

Use the MNIST Kaggle dataset: https://www.kaggle.com/datasets/animatronbot/mnist-digit-recognizer.

To reduce the computational burden, keep only the digits 3 and 4, exactly as in the teaching notes.

Do as follows:

  1. Load the CSV file into memory (you may use NumPy or pandas).

  2. Split into training and test sets.

  3. Center the training data:

    \[ X_{c,\text{train}} = X_{\text{train}} - c(X_{\text{train}}). \]
  4. Compute the reduced SVD of the centered training data.

  5. Choose \(k=2\) principal components.

  6. Project:

    • \(Z_{\text{train}} = X_{c,\text{train}} V_2\),

    • \(Z_{\text{test}} = X_{c,\text{test}} V_2\).

  7. Plot:

    • Scatterplot of \(Z_{\text{train}}\) colored by labels (3 vs 4),

    • Scatterplot of \(Z_{\text{test}}\) overlaid in the same plot.

  8. Repeat with digits 5 vs 8, or 1 vs 7, and discuss the differences in the results.

  9. Finally, try \(k=3\) and use a 3D scatterplot.
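A minimal sketch of steps 1–7, assuming the Kaggle CSV has been downloaded locally as `train.csv` (one `label` column followed by the 784 pixel columns); the file name and the 80/20 split are assumptions of this sketch.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Step 1: load the CSV and keep only digits 3 and 4
df = pd.read_csv("train.csv")                       # assumed local path
df = df[df["label"].isin([3, 4])]
X = df.drop(columns="label").to_numpy(dtype=float)
y = df["label"].to_numpy()

# Step 2: train/test split (80/20, random permutation)
rng = np.random.default_rng(0)
idx = rng.permutation(len(X))
n_train = int(0.8 * len(X))
X_train, y_train = X[idx[:n_train]], y[idx[:n_train]]
X_test, y_test = X[idx[n_train:]], y[idx[n_train:]]

# Steps 3-4: center with the TRAINING mean and compute the reduced SVD
c_train = X_train.mean(axis=0)
Xc_train = X_train - c_train
U, s, Vt = np.linalg.svd(Xc_train, full_matrices=False)

# Steps 5-6: project onto the first k = 2 principal directions
V2 = Vt[:2].T                                       # shape d x 2
Z_train = Xc_train @ V2
Z_test = (X_test - c_train) @ V2

# Step 7: scatterplot of the projections, train and test overlaid
for digit, color in zip([3, 4], ["tab:blue", "tab:orange"]):
    plt.scatter(*Z_train[y_train == digit].T, s=5, c=color, label=f"train {digit}")
    plt.scatter(*Z_test[y_test == digit].T, s=20, c=color, marker="x", label=f"test {digit}")
plt.xlabel("PC 1"); plt.ylabel("PC 2"); plt.legend(); plt.show()
```

For steps 8–9, the same code can be rerun with a different digit pair, and with `V2 = Vt[:3].T` together with a 3D axis (`plt.figure().add_subplot(projection="3d")`) for the 3D scatterplot.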

Exercise 6: Classification After PCA: Linear Classifier and Centroid Classifier#

In this exercise, you will build two simple classifiers after projecting the data with PCA:

  1. A linear classifier on 2D PCA features,

  2. A PCA-centroid classifier that assigns each test point to the closest class centroid in the reduced space.

You will perform all experiments on the same Kaggle dataset used previously: https://www.kaggle.com/datasets/animatronbot/mnist-digit-recognizer, filtering only digits 3 and 4, as you did in the previous exercise.

The aim of this exercise is to connect dimensionality reduction with classification and to illustrate how PCA can be used not only for visualization, but also as a preprocessing step in machine learning pipelines.

Step 1: PCA Projection#

Use exactly the same PCA workflow as in Exercise 5:

  1. Load and split digits {3,4} into train and test.

  2. Center the training data:

    \[ X_{c,\text{train}} = X_{\text{train}} - c(X_{\text{train}}). \]
  3. Compute the reduced SVD:

    \[ X_{c,\text{train}} = U \Sigma V^T. \]
  4. Take the first two principal directions:

    \[ P = V_2 \in \mathbb{R}^{d \times 2}. \]
  5. Project:

    \[ Z_{\text{train}} = X_{c,\text{train}} P, \qquad Z_{\text{test}} = (X_{\text{test}} - c(X_{\text{train}}))\, P. \]

You will now use these 2D representations \(Z_{\text{train}}\) and \(Z_{\text{test}}\) for classification.
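If the arrays from the Exercise 5 sketch (`X_train`, `X_test`, `c_train`, `Vt`) are still in memory, the projection step reduces to a few lines:

```python
# Reusing X_train, X_test, c_train, Vt from the Exercise 5 sketch
P = Vt[:2].T                       # first two principal directions, d x 2
Z_train = (X_train - c_train) @ P
Z_test = (X_test - c_train) @ P    # the test set is centered with the TRAINING mean
```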

Step 2: Fit a linear classifier#

  1. Use the logistic regression model:

    \[ f_{\Theta}(z) = \sigma(\Theta^T z), \qquad z \in \mathbb{R}^2, \]

    where \(\sigma\) is the sigmoid function, with the binary cross-entropy loss, and train the model using SGD.

  2. Once trained:

    1. Plot the PCA-reduced training data in 2D,

    2. Plot the decision boundary defined by:

      \[ \Theta^T z = 0 \quad \Rightarrow \quad \text{boundary line}. \]
  3. Compute, on the test set:

    • Accuracy,

    • Precision and Recall,

    • Confusion matrix.
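A from-scratch sketch of the logistic classifier, reusing `Z_train`, `Z_test`, `y_train`, `y_test` from Step 1. Standardizing the 2D features and appending a constant-1 feature (so that \(\Theta\) also contains a bias) are extra choices of this sketch, made only for SGD stability; they are not required by the exercise text.

```python
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Binary targets: digit 4 -> 1, digit 3 -> 0
t_train = (y_train == 4).astype(float)
t_test = (y_test == 4).astype(float)

# Standardize the 2D PCA features and append a constant-1 column (bias term)
mu, sd = Z_train.mean(axis=0), Z_train.std(axis=0)
Zs_train = (Z_train - mu) / sd
Zs_test = (Z_test - mu) / sd
Ztr = np.hstack([Zs_train, np.ones((len(Zs_train), 1))])
Zte = np.hstack([Zs_test, np.ones((len(Zs_test), 1))])

# SGD on the binary cross-entropy loss
rng = np.random.default_rng(0)
Theta = np.zeros(3)
lr, n_epochs = 0.05, 30
for epoch in range(n_epochs):
    for i in rng.permutation(len(Ztr)):
        grad = (sigmoid(Ztr[i] @ Theta) - t_train[i]) * Ztr[i]
        Theta -= lr * grad

# Test-set metrics
pred_lin = (sigmoid(Zte @ Theta) >= 0.5).astype(float)
tp = np.sum((pred_lin == 1) & (t_test == 1)); tn = np.sum((pred_lin == 0) & (t_test == 0))
fp = np.sum((pred_lin == 1) & (t_test == 0)); fn = np.sum((pred_lin == 0) & (t_test == 1))
print("accuracy :", (tp + tn) / len(t_test))
print("precision:", tp / (tp + fp))
print("recall   :", tp / (tp + fn))
print("confusion matrix:\n", np.array([[tn, fp], [fn, tp]]))

# Decision boundary Theta^T [z1, z2, 1] = 0  =>  z2 = -(Theta[0] z1 + Theta[2]) / Theta[1]
z1 = np.linspace(Zs_train[:, 0].min(), Zs_train[:, 0].max(), 100)
z2 = -(Theta[0] * z1 + Theta[2]) / Theta[1]
plt.scatter(*Zs_train.T, s=5, c=t_train, cmap="coolwarm")
plt.plot(z1, z2, "k--", label="decision boundary")
plt.xlabel("standardized PC 1"); plt.ylabel("standardized PC 2"); plt.legend(); plt.show()
```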

Step 3: PCA-Centroid Classifier (Nearest-Centroid in PCA Space)#

This classifier does not use a linear model. It relies purely on geometry in the reduced PCA space.

  1. For each class \(c\in\{3,4\}\), compute the centroid in PCA space:

    \[ \mu_c = \frac{1}{|\mathcal{D}_c|}\sum_{z_i \in \mathcal{D}_c} z_i. \]

    Plot both centroids \(\mu_3\) and \(\mu_4\) in the PCA plane.

  2. Given a test sample \(x\in\mathbb{R}^d\):

    1. Project it:

      \[ z = (x - c(X_{\text{train}}))\, P. \]
    2. Compute distances:

      \[ d_3 = \|z - \mu_3\|_2,\qquad d_4 = \|z - \mu_4\|_2. \]
    3. Assign:

      \[ \hat y = \arg\min_{c\in\{3,4\}} d_c. \]

    This is the nearest-centroid classifier in PCA space.

  3. Compute, on the test set:

    • Accuracy,

    • Precision and Recall,

    • Confusion matrix.

    Plot the test points with colors given by classification results.
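A sketch of the nearest-centroid classifier, again reusing `Z_train`, `Z_test`, `y_train`, `y_test` from Step 1.

```python
import numpy as np
import matplotlib.pyplot as plt

# Class centroids in the 2D PCA space
mu3 = Z_train[y_train == 3].mean(axis=0)
mu4 = Z_train[y_train == 4].mean(axis=0)

# Nearest-centroid prediction for every test point
d3 = np.linalg.norm(Z_test - mu3, axis=1)
d4 = np.linalg.norm(Z_test - mu4, axis=1)
pred_cen = np.where(d3 <= d4, 3, 4)

# Test-set metrics (digit 4 treated as the positive class, consistent with the logistic sketch)
tp = np.sum((pred_cen == 4) & (y_test == 4)); tn = np.sum((pred_cen == 3) & (y_test == 3))
fp = np.sum((pred_cen == 4) & (y_test == 3)); fn = np.sum((pred_cen == 3) & (y_test == 4))
print("accuracy :", (tp + tn) / len(y_test))
print("precision:", tp / (tp + fp))
print("recall   :", tp / (tp + fn))
print("confusion matrix:\n", np.array([[tn, fp], [fn, tp]]))

# Test points colored by the predicted class, centroids marked with stars
plt.scatter(*Z_test[pred_cen == 3].T, s=10, c="tab:blue", label="predicted 3")
plt.scatter(*Z_test[pred_cen == 4].T, s=10, c="tab:orange", label="predicted 4")
plt.scatter(*mu3, marker="*", s=200, c="k", label=r"$\mu_3$")
plt.scatter(*mu4, marker="*", s=200, c="k", label=r"$\mu_4$")
plt.xlabel("PC 1"); plt.ylabel("PC 2"); plt.legend(); plt.show()
```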

Step 4: Compare accuracy and error patterns#

  1. Compare the linear classifier on PCA features with the centroid classifier in PCA space.

  2. On the PCA scatterplot, plot:

    • the linear boundary,

    • the two centroids,

    • misclassified test samples highlighted in red.

  3. Repeat by changing the digits in \(\{ (1,7), (5,8), (2,3) \}\) and observe how class similarity influences:

    • the cluster shape,

    • centroid separation,

    • decision boundaries.
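A possible comparison plot in the original PCA coordinates, assuming the variables from the two previous sketches (`Theta`, `mu`, `sd`, `pred_lin`, `mu3`, `mu4`, `pred_cen`) are still in memory; the linear boundary is mapped back from the standardized coordinates used during training.

```python
import numpy as np
import matplotlib.pyplot as plt

# Misclassified test points for both classifiers
mis_lin = np.where(pred_lin == 1, 4, 3) != y_test     # logistic-regression errors
mis_cen = pred_cen != y_test                          # nearest-centroid errors
print("misclassified (linear)  :", mis_lin.sum(), "/", len(y_test))
print("misclassified (centroid):", mis_cen.sum(), "/", len(y_test))

# Map the boundary Theta^T [zs1, zs2, 1] = 0 back to the original PCA coordinates,
# using zs = (z - mu) / sd from the standardization step
z1 = np.linspace(Z_test[:, 0].min(), Z_test[:, 0].max(), 100)
z2 = mu[1] - sd[1] * (Theta[2] + Theta[0] * (z1 - mu[0]) / sd[0]) / Theta[1]

plt.scatter(*Z_test.T, s=10, c="lightgray", label="test points")
plt.scatter(*Z_test[mis_cen].T, s=25, c="red", label="misclassified (centroid)")
plt.plot(z1, z2, "k--", label="linear boundary")
plt.scatter(*mu3, marker="*", s=200, c="tab:blue", label=r"$\mu_3$")
plt.scatter(*mu4, marker="*", s=200, c="tab:orange", label=r"$\mu_4$")
plt.xlabel("PC 1"); plt.ylabel("PC 2"); plt.legend(); plt.show()
```

For the other digit pairs, it should be enough to change the digit filter in the Exercise 5 sketch (and the class labels used for the centroids and targets) and rerun the whole pipeline.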