02: Learning from Data - Data Science Course

Warmup

To warm up, you will define the weights of a perceptron manually.

Have a look at the toy dataset that the following code creates.

Define the weights of the perceptron where the code asks you to fill in the manual_weights array.

The code cell after that gives you a plot that lets you check what your perceptron is doing.
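As a reminder of how the Perceptron class below computes its output, each input is augmented with a constant first entry 1, so that the first weight acts as the bias. The prediction is the sign of the dot product. The sketch below evaluates this on the four AND inputs with arbitrary, hypothetical weights. These weights are for illustration only and do not solve the task.

```python
import numpy as np

# Hypothetical weights [w0 (bias), w1, w2], chosen only for illustration
w = np.array([0.5, -1.0, 1.0])

# The four AND inputs, each augmented with a leading constant 1 for the bias
X = np.array([
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
    [1.0, 1.0, 1.0],
])

scores = X.dot(w)              # w0 + w1*x1 + w2*x2 for every row
predictions = np.sign(scores)  # +1 or -1 (0 only if a score is exactly zero)
print(scores)
print(predictions)
```

With these weights only the third input, (1, 0), gets a negative score and is predicted as -1.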

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

# defining a toy dataset
AND_data = pd.DataFrame({"x1": [0., 0., 1., 1.], "x2": [0., 1., 0., 1.], "y": [-1, -1, -1, +1]})
label_col = "y"
print("--- data ---")
print(AND_data.head())
print("------------")
print()

def plot_data(df):
    plt.scatter(df["x1"], df["x2"], c=df["y"])
    plt.xlim(-1, 2)
    plt.ylim(-1, 2)
    plt.xlabel("x1")
    plt.ylabel("x2")
    plt.show()

# visualizing our toy data
plot_data(AND_data)

class Perceptron:
    def __init__(self, weights) -> None:
        self.weights = weights

    def predict(self, x: np.ndarray) -> np.ndarray:
        return np.sign(x.dot(self.weights))

manual_weights = np.array([])  # TODO: fill the array as [w0 (bias), w1, w2]
manual_perceptron = Perceptron(manual_weights)

def plot_perceptron_on_data(perceptron, df):
    w = perceptron.weights
    # endpoints of the line where the score w0 + w1*x1 + w2*x2 is zero (assumes w2 != 0)
    x1s = np.linspace(-1, 2, 2)
    line = np.column_stack([x1s, -(w[0] + w[1] * x1s) / w[2]])

    plt.scatter(df["x1"], df["x2"], c=df["y"])
    line_artist = plt.plot(line[:, 0], line[:, 1])[0]
    plt.xlim(-1, 2)
    plt.ylim(-1, 2)
    plt.xlabel("x1")
    plt.ylabel("x2")
    plt.legend([line_artist], ["Perceptron"])
    plt.show()


plot_perceptron_on_data(manual_perceptron, AND_data)
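The line drawn by plot_perceptron_on_data consists of the points where the perceptron's score is zero, so the prediction switches between -1 and +1 there. Setting the score to zero and solving for $x_2$ gives the expression used in the code. This requires $w_2 \neq 0$; for $w_2 = 0$ the line is vertical and the code divides by zero.

$$
\begin{aligned}
w_0 + w_1 x_1 + w_2 x_2 &= 0 \\
\Longleftrightarrow \quad x_2 &= -\frac{w_0 + w_1 x_1}{w_2}, \qquad w_2 \neq 0.
\end{aligned}
$$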

Is this a supervised or unsupervised learning problem?
What is the learning task in this problem called?
Complete the definition of the hypothesis set so that it matches the set of candidate models you implicitly chose from when setting the weights manually in the first task of this section. (Do it in handwriting if you do not know $\LaTeX$ syntax.)

Supervised/Unsupervised (remove the wrong one)

Learning task:

Hypothesis set:
$$\mathcal{H} = \{\} \tag{1}$$

Perceptron Learning Algorithm (PLA)

Let’s start by implementing the perceptron learning algorithm from the lecture.

Rename the variable "something" to the name it has in the lecture’s pseudocode.
Complete the function implementation.
Use the code cell after that to plot your perceptron and verify your implementation.
You should now see a line in that plot. The plot labels it "Perceptron", but what is this line actually called?
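For reference, a single update of the standard PLA picks one misclassified point $(x_n, y_n)$ and sets $w \leftarrow w + y_n x_n$. The sketch below applies exactly one such update to hypothetical data and starting weights. The variable names are illustrative, and this is not the complete loop you are asked to write.

```python
import numpy as np

# Hypothetical augmented inputs (leading 1 = bias) and their labels
X = np.array([[1.0, 0.0, 0.0],
              [1.0, 1.0, 1.0]])
y = np.array([-1.0, 1.0])
w = np.array([0.5, 0.0, 0.0])  # hypothetical starting weights

wrong = np.sign(X.dot(w)) != y  # boolean mask: True where the prediction disagrees with the label
n = np.flatnonzero(wrong)[0]    # index of the first misclassified point
w = w + y[n] * X[n]             # PLA update: w <- w + y_n * x_n
print(n, w)
```

Here the update fixes point 0 but now misclassifies point 1. A single step can break other points, which is why PLA keeps iterating.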

def PLA(D: pd.DataFrame, label_col="y"):
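    if len(D) == 0:
        raise ValueError("Training data cannot be empty, but it is.")
    
    d = len(D.columns) - 1              # number of input features
    X = np.ones(shape=(len(D), d + 1))  # column 0 stays 1 and multiplies the bias weight w0
    X[:, 1:] = D[D.columns[D.columns != label_col]].to_numpy()  # feature columns
    Y = D[label_col].to_numpy()         # labels in {-1, +1}
    w = np.random.random(d + 1)         # random initial weights in [0, 1)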

    something = np.sign(X.dot(w)) != Y  # TODO: Start by renaming this variable. What is this?
    
    # TODO: add your code here
    
    return w  # return only the weights; the Perceptron class wraps them for prediction

PLA_weights = PLA(AND_data, label_col)
print("w =", PLA_weights)
PLA_perceptron = Perceptron(PLA_weights)

plot_perceptron_on_data(PLA_perceptron, AND_data)

Linear Separability

The criterion that tells us whether the PLA can fit a dataset without any errors, and therefore terminate, is called linear separability.

Fill in the following mathematical expression where there are underscores, so that it defines linear separability. (Do it in handwriting if you do not know $\LaTeX$ syntax.)
In the following code cell, move point 3 of the previous toy dataset to the closest possible position at which the data is no longer linearly separable.
How does your PLA implementation from above behave on this new data?
Copy your PLA implementation to the function PLA_robust below. Now change the loop condition to avoid this problem. Feel free to change the function signature as you see fit.
Obtain a perceptron from your function PLA_robust for a new dataset by running the last code cell in this section.

We call a dataset $D = \{(x_n, y_n) \mid x_n \in \_, y_n \in \_\}^N_{n=1}$ linearly separable iff
$$\exists \_ \in \_ \; \forall (x_i, y_i) \in D: \_ = y_i. \tag{2}$$
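To test a concrete candidate weight vector against a definition like this, one can check whether its predictions match every label. The sketch below does this for the OR function, which is not part of the exercise, using hypothetical weights. The helper name separates is made up for illustration. Finding one passing vector shows that the data is linearly separable, but a single failing vector proves nothing.

```python
import numpy as np

def separates(w, X, y):
    """Return True if sign(X @ w) reproduces every label in y."""
    return bool(np.all(np.sign(X @ w) == y))

# OR data, inputs augmented with a leading 1 for the bias
X_or = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
y_or = np.array([-1, 1, 1, 1])

print(separates(np.array([-0.5, 1.0, 1.0]), X_or, y_or))  # True
print(separates(np.array([0.5, 1.0, 1.0]), X_or, y_or))   # False: (0, 0) is predicted as +1
```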

not_linearly_separable_data = AND_data.copy()
not_linearly_separable_data.loc[3, "x1"] = 1  # TODO: change value
not_linearly_separable_data.loc[3, "x2"] = 1  # TODO: change value

plot_data(not_linearly_separable_data)

def PLA_robust(D: pd.DataFrame, label_col="y"):
    pass  # TODO: implement

blobs_xs, blobs_ys = make_blobs(100, 2, centers=[[0, 0], [0, 1], [1, 0], [1, 1]], cluster_std=.5)
noisy_AND_data = pd.DataFrame({"x1": blobs_xs[:, 0], "x2": blobs_xs[:, 1], "y": [1 if y == 3 else -1 for y in blobs_ys]})

PLA_robust_weights = PLA_robust(noisy_AND_data, label_col)
print("w =", PLA_robust_weights)
PLA_robust_perceptron = Perceptron(PLA_robust_weights)

plot_perceptron_on_data(PLA_robust_perceptron, noisy_AND_data)
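Noisy data like this is in general not linearly separable, so there is always at least one misclassified point and a loop that waits for none would never stop. It therefore helps to measure how good a given weight vector is. The sketch below computes the in-sample error, the fraction of misclassified training points, from an augmented input matrix X and label vector Y like the ones built inside PLA. The function name is hypothetical. A common robust variant, often called the pocket algorithm, runs PLA for a fixed number of iterations and keeps the weights with the lowest in-sample error seen so far. Whether this matches the intended solution for PLA_robust is an assumption.

```python
import numpy as np

def in_sample_error(w, X, Y):
    """Fraction of rows of X (leading column of ones) with sign(X @ w) != Y."""
    return float(np.mean(np.sign(X @ w) != Y))

# AND toy data, augmented with a leading 1 for the bias
X_and = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
Y_and = np.array([-1, -1, -1, 1])

# Hypothetical weights that misclassify exactly one of the four points, (0, 1)
print(in_sample_error(np.array([-0.5, 0.0, 1.0]), X_and, Y_and))  # 0.25
```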

Confusion Matrix

The confusion matrix is the basis for common classification model evaluation.

Run the following code to obtain a confusion matrix for the perceptron you have just trained.
Which cell in this matrix relates to which area in this figure: https://commons.wikimedia.org/wiki/File:Precisionrecall.svg?
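The linked figure is about precision and recall. The sketch below computes the four counts and the two ratios directly with NumPy. It assumes labels in {-1, +1} with +1 as the positive class, and the function name is hypothetical. sklearn.metrics.precision_score and recall_score (default pos_label=1) compute the same ratios.

```python
import numpy as np

def precision_recall(y_true, y_pred, positive=1):
    """Count TP/FP/FN/TN for the given positive class and derive precision and recall."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_pred == positive) & (y_true == positive)))
    fp = int(np.sum((y_pred == positive) & (y_true != positive)))
    fn = int(np.sum((y_pred != positive) & (y_true == positive)))
    tn = int(np.sum((y_pred != positive) & (y_true != positive)))
    precision = tp / (tp + fp) if tp + fp > 0 else float("nan")
    recall = tp / (tp + fn) if tp + fn > 0 else float("nan")
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall}

print(precision_recall([-1, -1, 1, 1, 1], [-1, 1, 1, 1, -1]))
```

In this small example there are 2 true positives, 1 false positive, 1 false negative and 1 true negative, so precision and recall are both 2/3.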

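# Augment the inputs with a leading 1 (bias) and predict all rows at once
X_noisy = np.column_stack([np.ones(len(noisy_AND_data)), noisy_AND_data["x1"], noisy_AND_data["x2"]])
y_pred = PLA_robust_perceptron.predict(X_noisy)
# sklearn expects (y_true, y_pred): rows are true labels, columns are predicted labels
confmat = confusion_matrix(noisy_AND_data["y"], y_pred, labels=[-1, 1])
confmat_display = ConfusionMatrixDisplay(confmat, display_labels=[-1, 1])
confmat_display.plot()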
plt.show()

Relations to Wikipedia figure

top-left:
top-right:
bottom-left:
bottom-right:

Bonus: Elements of learning

Let’s look at the final learning setup again, where you applied the PLA_robust learning algorithm to the noisy data, and determine the main elements that were involved.

Some questions will be tough to answer. Discuss them with fellow students or the class tutor if you feel there is no definitive answer.

Map the definition of a learning algorithm to this setup.
1. What is the task T?
2. What is the experience E?
3. What is the performance measure P?
Map the elements of the “learning setup diagram” from the lecture to this setup.
1. What is the target function?
2. What are the training examples?
3. What is the hypothesis set?
4. What is the learning algorithm?
5. What is the final hypothesis?