Neural Networks#

A simple neural network to regress a quadratic function

import numpy as np
import torch
import torch.nn as nn
import matplotlib.pyplot as plt

from tqdm import tqdm, trange
from torch.distributions.uniform import Uniform

torch.manual_seed(19)
np.random.seed(19)

In this tutorial, we investigate how a simple neural network approximates a non-linear function. In this case, we will try to approximate the quadratic function \(y = f(x) = x^2\):

# ground truth function f(x) = x^2 that we want to approximate
x_plot = np.linspace(-5, 5, 100)
y_plot = x_plot * x_plot

# draw training samples
x = np.random.uniform(-5, 5, 100)
y = x * x

def plot_samples(x, y, title = "samples"):
    global x_plot, y_plot
    plt.plot(x_plot, y_plot, color="tab:red", label="$f(x)=x^2$")
    plt.scatter(x, y, label="samples")
    plt.gca().set(xlabel="x", ylabel="y")
    plt.legend()
    plt.title(title)

plot_samples(x, y)
[Figure: training samples scattered on the ground-truth curve \(f(x)=x^2\)]

We will use PyTorch to build our neural network.
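
Before defining the model in code, it helps to spell out what such a network computes. Writing \(w^{(1)}_j, b^{(1)}_j\) for the weights and biases of the hidden layer and \(w^{(2)}_j, b^{(2)}\) for those of the output layer, a network with a single hidden layer of \(N\) ReLU units is the function \(\hat{y} = \sum_{j=1}^{N} w^{(2)}_j \max(0,\ w^{(1)}_j x + b^{(1)}_j) + b^{(2)}\), i.e. a piecewise-linear function with at most \(N\) kinks. It can never match the smooth parabola exactly, but with enough hidden units it can come very close on the interval we sample from.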

from torch.optim import SGD

# number of hidden units
N_HIDDEN = 8

# define the layout of our neural network
model = nn.Sequential(
    nn.Linear(1, N_HIDDEN),
    # we will use ReLU as our non-linear activation function
    nn.ReLU(),
    nn.Linear(N_HIDDEN, 1)
)

# you can ignore this for now
# later, this hook will save the latent representations (hidden-layer activations) for us
activations = dict()
def hook(module, x_in, x_out):
    global activations
    activations["latent"] = x_out.detach()
model[1].register_forward_hook(hook)

# the optimizer will update the model parameters for us
# we will use SGD: Stochastic Gradient Descent
LEARNING_RATE = 0.02
optimizer = SGD(model.parameters(), lr=LEARNING_RATE)

# number of iterations (epochs) for which we will train our model
EPOCHS = 5000

# our loss function: defines error between predictions and targets
# we will use Mean Squared Error, since it is a regression task
criterion = nn.MSELoss()

# for torch, we need to do some reshaping of our training data
# our network expects a single number (one feature) per sample
# so we reshape x/y from (100,) to (100, 1): the batch dimension first, then one feature dimension
x_train = torch.from_numpy(x).reshape(-1, 1).float()
y_train = torch.from_numpy(y).reshape(-1, 1).float()

loss_log = []

for i in trange(EPOCHS):
    # 1. forward pass: get model predictions
    y_hat = model(x_train)
    # 2. compute loss: error between predictions and targets
    loss = criterion(y_hat, y_train)
    # 3. backward pass: propagate the gradient through the network
    loss.backward()
    loss_log.append(loss.item())
    # 4. update parameters: perform one optimization step
    optimizer.step()
    # 5. reset torch gradients
    optimizer.zero_grad()
100%|██████████| 5000/5000 [00:01<00:00, 2839.69it/s]
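
For plain SGD with no momentum (which is what we configured above), the parameter update in step 4 is \(\theta \leftarrow \theta - \eta \nabla_\theta L\), where \(\eta\) is the learning rate. Below is a minimal sketch of the equivalent manual update, shown only for illustration (running it in addition to the loop would apply one extra step):

# roughly what optimizer.step() does for vanilla SGD (no momentum, no weight decay):
# subtract the learning rate times the gradient from every parameter
with torch.no_grad():
    for p in model.parameters():
        if p.grad is not None:
            p -= LEARNING_RATE * p.grad
# optimizer.zero_grad() then clears p.grad before the next iteration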

# plot the loss curve
plt.plot(loss_log)
plt.ylabel("MSE Loss")
plt.xlabel("epoch")
[Figure: MSE loss per training epoch]
# don't need gradients to predict on whole linspace
with torch.no_grad():
    x_test = torch.from_numpy(x_plot).reshape(-1, 1).float()
    y_pred = model(x_test).reshape(-1).detach().numpy()

# plot predictions
plot_samples(x_plot, y_pred, "model predictions");
[Figure: model predictions plotted against the ground-truth curve]
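
The fit looks good by eye. As a quick quantitative sanity check (not part of the original notebook), we can also evaluate the same criterion on the dense test grid:

# mean squared error of the model on the dense grid from x_plot
with torch.no_grad():
    y_test = torch.from_numpy(y_plot).reshape(-1, 1).float()
    test_mse = criterion(model(x_test), y_test).item()
print(f"test MSE on [-5, 5]: {test_mse:.4f}")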

But what’s going on within the model? Let’s investigate by visualizing the latent activations. This gives us a clue about what each neuron has learned:

z = activations["latent"]

for i in range(z.shape[1]):
    plt.figure()
    plot_samples(x_plot, z[:, i].detach().numpy(), f"hidden neuron {i}")
[Figures: the activation of each of the 8 hidden neurons plotted over x]
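
Each hidden neuron has learned a ReLU "hinge": zero on one side of its kink and linear on the other. The network's output is simply a weighted sum of these hinge features plus a bias. Here is a minimal sketch that rebuilds the prediction from the saved activations and the output layer's weights (it assumes z still holds the activations from the prediction on x_test above):

# reconstruct y_hat = sum_j w2[j] * z[:, j] + b2 from the latent activations
with torch.no_grad():
    w2 = model[2].weight.reshape(-1)   # output-layer weights, shape (8,)
    b2 = model[2].bias                 # output-layer bias, shape (1,)
    y_from_latent = z @ w2 + b2        # matches model(x_test)

plt.figure()
plot_samples(x_plot, y_from_latent.numpy(), "prediction rebuilt from hidden activations")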

Prompts#

  1. Try out different values for LEARNING_RATE and EPOCHS. How does the loss change? (A small helper for running such experiments is sketched after this list.)

  2. Try different numbers of hidden units by changing N_HIDDEN. How does the quality of the approximation change?

  3. Head to the TensorFlow Playground (https://playground.tensorflow.org) and play around with network configurations!
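
One possible way to set up the experiments in prompts 1 and 2 is to wrap the steps above in a small helper function. The name train_model and its signature are just one suggestion, not part of the original notebook:

def train_model(n_hidden=8, lr=0.02, epochs=5000):
    """Build and train a fresh network; return the model and its loss history."""
    model = nn.Sequential(nn.Linear(1, n_hidden), nn.ReLU(), nn.Linear(n_hidden, 1))
    optimizer = SGD(model.parameters(), lr=lr)
    losses = []
    for _ in trange(epochs):
        loss = criterion(model(x_train), y_train)
        loss.backward()
        losses.append(loss.item())
        optimizer.step()
        optimizer.zero_grad()
    return model, losses

# for example, train a wider network and compare:
# wide_model, wide_losses = train_model(n_hidden=32)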