HW 4: MLE/MAP

Warning

The submission of the homeworks has NO deadline: you can submit them whenever you want, on Virtuale. You are only required to upload them on Virtuale BEFORE your exam session, since the homeworks will be a central part of the oral exam.

You are asked to submit the homework in one of the two following modalities:

  • A PDF (or Word) document, containing screenshots of code snippets, screenshots of the results generated by your code, and a brief comment on the obtained results.

  • A Python Notebook (i.e. a .ipynb file), with cells containing the code required to solve the indicated exercises, alternating with brief comments on the obtained results in the form of markdown cells. We remark that the code SHOULD NOT be run during the exam: the student is asked to enter the exam with all the programs already executed and the results clearly visible on the screen.

Joining the oral exam with non-executed code, OR without a PDF file in which the obtained results are clearly visible, will cause the student to be rejected.

Maximum Likelihood Estimation (MLE) and Maximum a Posteriori (MAP)

Consider the theory and the notation provided in the MLE/MAP section (https://devangelista2.github.io/statistical-mathematical-methods/regression_classification/MLE_MAP.html). Let \(f_\theta(x)\) be a polynomial regression model as in the previous Homework, and let poly_regression_small.csv from Virtuale be the training set. Then, sample 20% of the data in the poly_regression_large.csv dataset to use as the test set.
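A minimal data-preparation sketch follows. It assumes the two CSV files sit in the working directory and contain two columns named "x" and "y" (these names are an assumption: adapt them to the actual headers); the 20% test sample is drawn at random with a fixed seed for reproducibility. The vandermonde() and predict() helpers introduced here are reused in the sketches at the end of this section.

```python
# Sketch only: file locations, column names and the seed are assumptions.
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

train_df = pd.read_csv("poly_regression_small.csv")
large_df = pd.read_csv("poly_regression_large.csv")

# Sample 20% of the large dataset (without replacement) as the test set.
n_test = int(0.2 * len(large_df))
test_idx = rng.choice(len(large_df), size=n_test, replace=False)
test_df = large_df.iloc[test_idx]

X_train, Y_train = train_df["x"].to_numpy(), train_df["y"].to_numpy()
X_test, Y_test = test_df["x"].to_numpy(), test_df["y"].to_numpy()

def vandermonde(x, K):
    """Feature matrix with columns 1, x, x^2, ..., x^K."""
    return np.vander(x, N=K + 1, increasing=True)

def predict(theta, x):
    """Evaluate the polynomial model f_theta(x) = sum_k theta_k x^k."""
    return vandermonde(x, len(theta) - 1) @ theta
```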

  • For a given value of \(K\), write three Python functions computing \(\theta_{MLE}\), i.e. the optimal parameters obtained by optimizing the MLE-related loss function with a Gaussian assumption on the likelihood \(p_\theta(y | x)\), by Gradient Descent, by Stochastic Gradient Descent (with batch_size = 5), and by the Normal Equations method with Cholesky decomposition (a hedged sketch of these three solvers is given after this list).

  • Compare the performance of the three regression models computed above. In particular, if \((X_{test}, Y_{test})\) is the test set from the poly_regression_large.csv dataset, for each of the models compute:

    \[ Err = \frac{1}{N_{test}} \sum_{i=1}^{N_{test}} (f_\theta(x^i) - y^i)^2, \]

    where \(N_{test}\) is the number of elements in the test set and \((x^i, y^i)\) are the input and output elements of the test set. Comment on the performance of the three models (a sketch of this error computation is given after this list).

  • For different values of \(K\), plot the training datapoints and the test datapoints with different colors and visualize (as a continuous line) the three learned regression models \(f_\theta(x)\). Comment on the results.

  • For increasing values of \(K\), compute the training and test errors as discussed above. Plot the two errors with respect to \(K\) and comment on the results (a plotting sketch for this point and the previous one is given after this list).

  • Repeat the same experiments considering the MAP formulation with a Gaussian assumption on the prior term \(p(\theta)\). Set \(K = 8\) and test different values of \(\lambda > 0\) in the experiments (a sketch of the MAP solvers is given after this list). Comment on the results, comparing:

    • the three optimization methods used to obtain \(\theta_{MAP}\) (i.e. GD, SGD and Normal Equations),

    • the different values of \(\lambda > 0\) tested,

    • the results obtained by \(\theta_{MLE}\) vs \(\theta_{MAP}\).
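For the first bullet, a minimal sketch of the three \(\theta_{MLE}\) solvers is given below, using the vandermonde() helper from the data-loading sketch. With a Gaussian likelihood the MLE loss reduces to least squares; the learning rates and iteration counts are placeholders that will likely need tuning, since the Vandermonde matrix is badly conditioned for large \(K\) (rescaling \(x\) may help the iterative methods converge).

```python
import numpy as np

def theta_mle_gd(x, y, K, lr=1e-3, n_iters=10_000):
    """theta_MLE by full-batch gradient descent on L(theta) = 1/(2N) ||Phi theta - y||^2."""
    Phi = vandermonde(x, K)
    N = len(y)
    theta = np.zeros(K + 1)
    for _ in range(n_iters):
        grad = Phi.T @ (Phi @ theta - y) / N
        theta = theta - lr * grad
    return theta

def theta_mle_sgd(x, y, K, lr=1e-3, n_epochs=1_000, batch_size=5, seed=0):
    """theta_MLE by mini-batch stochastic gradient descent with batch_size = 5."""
    rng = np.random.default_rng(seed)
    Phi = vandermonde(x, K)
    N = len(y)
    theta = np.zeros(K + 1)
    for _ in range(n_epochs):
        perm = rng.permutation(N)
        for start in range(0, N, batch_size):
            idx = perm[start:start + batch_size]
            grad = Phi[idx].T @ (Phi[idx] @ theta - y[idx]) / len(idx)
            theta = theta - lr * grad
    return theta

def theta_mle_cholesky(x, y, K):
    """theta_MLE from the normal equations Phi^T Phi theta = Phi^T y via Cholesky."""
    Phi = vandermonde(x, K)
    A = Phi.T @ Phi            # symmetric positive (semi-)definite
    b = Phi.T @ y
    L = np.linalg.cholesky(A)  # A = L L^T (may fail if A is numerically singular)
    z = np.linalg.solve(L, b)       # forward substitution
    return np.linalg.solve(L.T, z)  # backward substitution
```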
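The test-error formula \(Err\) in the second bullet is a mean squared error over the test set. A short sketch is given below; it relies on the predict() helper, the X_train / Y_train / X_test / Y_test arrays and the three solvers introduced in the previous sketches, and the chosen value of \(K\) is only a placeholder.

```python
import numpy as np

def mse(theta, x, y):
    """Err = 1/N * sum_i (f_theta(x^i) - y^i)^2."""
    return np.mean((predict(theta, x) - y) ** 2)

# Example comparison of the three solvers for a fixed (placeholder) K.
K = 5
solvers = {
    "GD": theta_mle_gd,
    "SGD": theta_mle_sgd,
    "Cholesky": theta_mle_cholesky,
}
for name, solver in solvers.items():
    theta = solver(X_train, Y_train, K)
    print(f"{name}: train Err = {mse(theta, X_train, Y_train):.4f}, "
          f"test Err = {mse(theta, X_test, Y_test):.4f}")
```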
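For the two visualization bullets, a plotting sketch with matplotlib is given below. For brevity it only uses the Cholesky solver; the same loops apply verbatim to the GD and SGD estimates, and the ranges of \(K\) are placeholders.

```python
import matplotlib.pyplot as plt
import numpy as np

xx = np.linspace(X_train.min(), X_train.max(), 300)  # grid for the continuous curve

# Learned model for a few values of K, on top of the train and test points.
for K in (1, 3, 5, 8):
    theta = theta_mle_cholesky(X_train, Y_train, K)
    plt.figure()
    plt.scatter(X_train, Y_train, color="tab:blue", label="train")
    plt.scatter(X_test, Y_test, color="tab:orange", label="test")
    plt.plot(xx, predict(theta, xx), color="tab:green", label=f"$f_\\theta(x)$, K={K}")
    plt.legend()
    plt.title(f"Polynomial regression, K = {K}")

# Training and test error as a function of K.
Ks = list(range(1, 11))
train_err, test_err = [], []
for K in Ks:
    theta = theta_mle_cholesky(X_train, Y_train, K)
    train_err.append(mse(theta, X_train, Y_train))
    test_err.append(mse(theta, X_test, Y_test))

plt.figure()
plt.plot(Ks, train_err, marker="o", label="train Err")
plt.plot(Ks, test_err, marker="o", label="test Err")
plt.xlabel("K")
plt.ylabel("Err")
plt.legend()
plt.show()
```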
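For the last bullet, a Gaussian prior on \(\theta\) turns the MLE loss into a ridge-regularized least-squares problem. The sketch below shows the Cholesky and GD variants (the SGD variant is analogous to the MLE one, with the extra \(\lambda \theta\) term in the gradient); the placement of the \(1/N\) factor and the tested values of \(\lambda\) are assumptions, so follow the convention used in the course notes. It reuses vandermonde() and mse() from the earlier sketches.

```python
import numpy as np

def theta_map_cholesky(x, y, K, lam):
    """Regularized normal equations (Phi^T Phi / N + lam I) theta = Phi^T y / N via Cholesky."""
    Phi = vandermonde(x, K)
    N = len(y)
    A = Phi.T @ Phi / N + lam * np.eye(K + 1)
    b = Phi.T @ y / N
    L = np.linalg.cholesky(A)
    return np.linalg.solve(L.T, np.linalg.solve(L, b))

def theta_map_gd(x, y, K, lam, lr=1e-3, n_iters=10_000):
    """Gradient descent on 1/(2N) ||Phi theta - y||^2 + lam/2 ||theta||^2."""
    Phi = vandermonde(x, K)
    N = len(y)
    theta = np.zeros(K + 1)
    for _ in range(n_iters):
        grad = Phi.T @ (Phi @ theta - y) / N + lam * theta
        theta = theta - lr * grad
    return theta

# Example: K = 8 and a few (placeholder) values of lambda.
K = 8
for lam in (1e-4, 1e-2, 1e0, 1e2):
    theta = theta_map_cholesky(X_train, Y_train, K, lam)
    print(f"lambda = {lam:g}: train Err = {mse(theta, X_train, Y_train):.4f}, "
          f"test Err = {mse(theta, X_test, Y_test):.4f}")
```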