Model Evaluation#

Regression vs. Classification#

Two very simple datasets are plotted below. Say you have to predict y from x in each of them.

  1. One of the datasets defines a classification problem and the other one defines a regression problem. Which is which?

  2. Explain your answer!

import numpy as np
import matplotlib.pyplot as plt

dataset_a = np.random.normal(0.0, 1.0, 100)

plt.scatter(dataset_a, dataset_a + np.random.uniform(-0.3, 0.3, 100))
plt.xlabel("x")
plt.ylabel("y")
plt.title("Dataset A")
plt.show()

dataset_b = np.concatenate([np.random.normal(i, 0.1, 33) for i in range(3)])

plt.scatter(dataset_b, np.concatenate([[value] * 33 for value in ["A", "B", "C"]]))
plt.title("Dataset B")
plt.xlabel("x")
plt.ylabel("y")
plt.show()

Regression#

MAE vs MSE#

Let \(y_i\) be the actual value and \(\hat{y}_i\) be the predicted value, then: \(MAE = \frac{1}{n} \sum_i |y_i - \hat{y}_i|\) and \(MSE = \frac{1}{n} \sum_i (y_i - \hat{y}_i)^2\).
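
These definitions translate directly into NumPy. As a quick sanity check, here is a minimal sketch (assuming two equally long arrays y_true and y_pred):

import numpy as np

def mae(y_true, y_pred):
    # mean of the absolute residuals
    return np.mean(np.abs(y_true - y_pred))

def mse(y_true, y_pred):
    # mean of the squared residuals
    return np.mean((y_true - y_pred) ** 2)

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.5, 2.0, 1.0])
print(mae(y_true, y_pred))  # (0.5 + 0.0 + 2.0) / 3 ≈ 0.833
print(mse(y_true, y_pred))  # (0.25 + 0.0 + 4.0) / 3 ≈ 1.417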

The MAE and MSE are applied to the example data below to evaluate a mean predictor model.

  1. What pros/cons do you see in this example for the MAE and the MSE respectively?

  2. When would you generally use MAE and when would you use MSE?

from sklearn.metrics import mean_absolute_error, mean_squared_error

mae_vs_mse_X = np.random.uniform(0, 10, 10)
mae_vs_mse_dataset = np.array([[x, 1] for x in mae_vs_mse_X])
mae_vs_mse_dataset[-1, 1] = 9

mean_predictor = np.mean(mae_vs_mse_dataset[:, 1])

print("mean =", mean_predictor)

mae = mean_absolute_error(mae_vs_mse_dataset[:, 1], [mean_predictor]*10)
mse = mean_squared_error(mae_vs_mse_dataset[:, 1], [mean_predictor]*10)

print(f"Mean Predictor MAE = {mae:.4f}")
print(f"Mean Predictor MSE = {mse:.4f}")

plt.scatter(mae_vs_mse_dataset[:, 0], mae_vs_mse_dataset[:, 1], s=10)
plt.axhline(mean_predictor)

plt.xlim([0, 10])
plt.ylim([0, 10])

plt.xlabel("x")
plt.ylabel("y")
plt.legend(["data", "mean predictor"])

plt.show()
mean = 1.8
Mean Predictor MAE = 1.4400
Mean Predictor MSE = 5.7600

\(R^2\)#

Let \(y_i\) be the actual value, \(\hat{y}_i\) the predicted value and \(\bar{y} = \frac{1}{n} \sum_i y_i\) the mean of all actual values. Then, \(R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_j (y_j - \bar{y})^2}\).

  1. Look at the formula and try to understand what the \(R^2\) measure intuitively captures. Hint: Consider the case where we only predict the mean \(\bar{y}\). Below you can find the \(R^2\) measure for the preceding example.

from sklearn.metrics import r2_score

r2 = r2_score(mae_vs_mse_dataset[:, 1], [mean_predictor]*10)
print(f"Mean Predictor R^2 = {r2:.4f}")
Mean Predictor R^2 = 0.0000
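
For comparison, the same value can be computed directly from the formula. A short sketch, reusing mae_vs_mse_dataset and mean_predictor from the cell above:

y_true = mae_vs_mse_dataset[:, 1]
y_pred = np.repeat(mean_predictor, len(y_true))

# residual sum of squares vs. total sum of squares around the mean
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)

print("Manual R^2 =", 1 - ss_res / ss_tot)  # 0.0 for the mean predictor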

Classification#

Prediction outcomes#

For a binary classification problem, what are the four possible prediction outcomes? List their names and briefly explain what they represent.
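
If you want to check your answer empirically, scikit-learn counts all four outcomes in its confusion matrix. A small sketch with made-up labels and predictions:

import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0])

# rows correspond to the actual class, columns to the predicted class:
# [[#(actual 0, predicted 0), #(actual 0, predicted 1)],
#  [#(actual 1, predicted 0), #(actual 1, predicted 1)]]
print(confusion_matrix(y_true, y_pred))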

Properties of precision and recall#

What unwanted strategies could a binary classifier (predicting class 0 or 1) employ to

  1. always get a precision of 1

  2. always get a recall of 1

Choosing between precision and recall#

In which applications is precision more important than recall, and in which is recall more important than precision? Hint: As an example, think of a spam filter which has to differentiate between spam and important mails from your boss.

Combine precision and recall#

  • What is a measure that combines precision and recall?

  • Define it.

  • Why do we use the harmonic rather than the arithmetic mean? (See the numeric sketch below.)
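
To make the last point concrete, here is a small numeric sketch comparing both means for a classifier with perfect precision but almost no recall (the values are made up for illustration):

precision, recall = 1.0, 0.01

arithmetic_mean = (precision + recall) / 2
harmonic_mean = 2 * precision * recall / (precision + recall)

print(f"arithmetic mean = {arithmetic_mean:.3f}")  # 0.505 -- looks deceptively decent
print(f"harmonic mean   = {harmonic_mean:.3f}")    # 0.020 -- dominated by the poor recall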

Reference values#

Consider the following label set:

  • y: roughly the same number of cases (1) as controls (0)

Now, calculate the accuracy, precision, recall, and ROC AUC.

  1. For a classifier that returns random labels. What do you observe?

  2. For a classifier that always returns 1. What do you observe?

Hints:

  • Wikipedia has a nice diagram at the top of its “Precision and recall” page, which can help to think about such questions.

  • You can simulate these classifiers without input data, i.e., by generating their predictions manually.

  • You can use numpy and scikit-learn to show these cases instead of answering theoretically.

import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score

np.random.seed(42)
n = 1000

# labels 1: balanced, i.e., cases (1) and controls (0) are equally likely
y = np.random.choice([0,1], p=(0.5, 0.5), size=n)

# predictions of different classifiers
y_pred_random = np.random.choice([0,1], p=(0.5, 0.5), size=n)
y_pred_1 = np.repeat(1, y.size)
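
One way to compute the requested metrics, shown here for both classifiers using the arrays defined above:

for name, y_pred in [("random", y_pred_random), ("always 1", y_pred_1)]:
    print(name)
    print("  accuracy :", accuracy_score(y, y_pred))
    print("  precision:", precision_score(y, y_pred))
    print("  recall   :", recall_score(y, y_pred))
    print("  ROC AUC  :", roc_auc_score(y, y_pred))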

Imbalanced data#

Consider the following label set:

  • y_imbalanced: only a small number of cases (1) compared to controls (0)

Now, calculate the accuracy, precision, recall, and ROC AUC.

  1. For a classifier that returns random labels. What do you observe?

  2. For a classifier that always predicts the majority class (0). What do you observe and why could this be an issue in practice (particularly if we only consider accuracy)?

Hints:

  • You can simulate these classifiers without input data, i.e., by generating their predictions manually.

  • You can use numpy and scikit-learn to show these cases instead of answering theoretically.

import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score

np.random.seed(42)
n = 1000

# labels 2 (imbalanced, i.e., only a small fraction of cases (label=1))
y_imbalanced = np.random.choice([0,1], p=(0.9, 0.1), size=n)

# predictions of different classifiers
y_pred_random = np.random.choice([0,1], p=(0.5, 0.5), size=n)
y_pred_majority = np.repeat(0, y_imbalanced.size)
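
Note that the always-0 classifier never predicts a positive, so its precision is undefined; scikit-learn warns and returns 0 in this case unless the zero_division parameter is set explicitly. A sketch for evaluating it, reusing y_imbalanced and y_pred_majority from above:

print("accuracy :", accuracy_score(y_imbalanced, y_pred_majority))
print("precision:", precision_score(y_imbalanced, y_pred_majority, zero_division=0))
print("recall   :", recall_score(y_imbalanced, y_pred_majority))
print("ROC AUC  :", roc_auc_score(y_imbalanced, y_pred_majority))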

Train / Test#

The following code uses train_test_split to shuffle and split the given data (X, y) into a train and a test dataset with an 80:20 ratio. It then trains a DecisionTreeClassifier on the training dataset and evaluates it using the ROC AUC score on both the training and the test dataset.

  1. What do you observe?

  2. What could be an issue with publications where no test set is used?

  3. What could go wrong if we keep on improving our model until it performs well on our fixed test set?

from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

np.random.seed(42)

n_samples = 1000
n_features = 100
X = np.random.random((n_samples, n_features))
y = np.random.choice([0, 1], p=(0.5, 0.5), size=n_samples)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=True)

model = DecisionTreeClassifier()
model.fit(X_train, y_train)

y_pred = model.predict_proba(X_train)
print(f"Train ROC AUC = {roc_auc_score(y_train, y_pred[:,1])}")

y_pred = model.predict_proba(X_test)
print(f"Test ROC AUC = {roc_auc_score(y_test, y_pred[:,1])}")
Train ROC AUC = 1.0
Test ROC AUC = 0.49240558292282427
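
To build intuition for question 3, here is a small simulation sketch (not part of the exercise): on the same pure-noise data we "tune" the tree's max_depth and keep whichever setting scores best on the fixed test set. Because the configuration is selected by its test score, the reported number tends to drift above the ~0.5 expected for random data.

best_test_auc = 0.0
for max_depth in range(1, 21):
    tuned = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
    tuned.fit(X_train, y_train)
    test_auc = roc_auc_score(y_test, tuned.predict_proba(X_test)[:, 1])
    # selecting the model by its *test* score leaks information
    # from the test set into the model choice
    best_test_auc = max(best_test_auc, test_auc)

print(f"Best test ROC AUC after tuning on the test set = {best_test_auc:.3f}")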

Cross Validation#

Consider the following dataset (X, y). We already split it into a training set (X_train, y_train), a validation set (X_val, y_val), and a test set (X_test, y_test).

  1. Why would we split the dataset into 3 parts like here, instead of just train-test?

  2. The ROC AUC score is computed on the train, validation, and test dataset. What do you observe?

  3. How does the application of cross validation improve the evaluation compared to just evaluating the model on a fixed evaluation set?

  4. Cross validation is performed below and the mean and standard deviation of the ROC AUC scores are reported. What does this tell you?

  5. In addition to the ROC AUC score on the test set, why would you also always report the mean and standard deviation of the cross validation scores?

  6. Why would you be a bit suspicious of the current ROC AUC score on the test set?

  7. BONUS: Given how the data is generated, why do we observe these results?

  8. Suppose the dataset contains medical data of patients and the same patient was examined multiple times, leading to multiple data instances. Why would the current implementation of cross validation be considered cheating, and how should it be changed in order to avoid cheating? (One possible remedy is sketched after the code below.)

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

np.random.seed(42)

n_samples = 200 * 6
X, y = make_classification(n_samples=n_samples, flip_y=0.01, n_redundant=2, n_informative=2)
# overwrite one sixth of the samples with pure random noise
X[-2*int(n_samples / 6):-int(n_samples / 6),:] = np.random.random((int(n_samples / 6), X.shape[1]))

# define the test set
X_intermediate, X_test, y_intermediate, y_test = train_test_split(X, y, test_size=1/6, shuffle=False)

# use the remaining data to define the train and validation set
X_train, X_val, y_train, y_val = train_test_split(X_intermediate, y_intermediate, test_size=0.2, shuffle=False)

# X_train, y_train
# X_val, y_val
# X_test, y_test

model = DecisionTreeClassifier()
model.fit(X_train, y_train)

y_pred = model.predict_proba(X_train)
print(f"Train ROC AUC = {roc_auc_score(y_train, y_pred[:,1])}")

y_pred = model.predict_proba(X_val)
print(f"Validation ROC AUC = {roc_auc_score(y_val, y_pred[:,1])}")

y_pred = model.predict_proba(X_test)
print(f"Test ROC AUC = {roc_auc_score(y_test, y_pred[:,1])}")

from sklearn.model_selection import cross_val_score
scores = cross_val_score(model, X_intermediate, y_intermediate, scoring="roc_auc")
print("CV ROC AUC mean:", np.mean(scores))
print("CV ROC AUC std:", np.std(scores))
Train ROC AUC = 1.0
Validation ROC AUC = 0.48036858974358976
Test ROC AUC = 0.9349999999999999
CV ROC AUC mean: 0.8560000000000001
CV ROC AUC std: 0.16454178800535743
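
Regarding question 8: if the same patient contributes several rows, the split should keep all of a patient's rows together, so that no patient appears in both the training and the test fold. scikit-learn supports this via group-aware splitters such as GroupKFold. A sketch, assuming a hypothetical patient_ids array that assigns each row of X_intermediate to a patient:

from sklearn.model_selection import GroupKFold, cross_val_score

# hypothetical grouping: pretend every 5 consecutive rows belong to one patient
patient_ids = np.arange(len(X_intermediate)) // 5

grouped_scores = cross_val_score(
    model,
    X_intermediate,
    y_intermediate,
    scoring="roc_auc",
    cv=GroupKFold(n_splits=5),
    groups=patient_ids,  # rows from the same patient are never split across train and test
)
print("Grouped CV ROC AUC mean:", np.mean(grouped_scores))
print("Grouped CV ROC AUC std:", np.std(grouped_scores))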