Neural Networks#

A simple neural network to regress a quadratic function

import numpy as np
import torch
import torch.nn as nn
import matplotlib.pyplot as plt

from tqdm import trange

torch.manual_seed(19)
np.random.seed(19)

In this tutorial, we investigate how a simple neural network approximates a non-linear function. Specifically, we will try to approximate the quadratic function \(y = f(x) = x^2\):

# ground truth function f(x) = x^2 that we want to approximate
x_plot = np.linspace(-5, 5, 100)
y_plot = x_plot * x_plot

# draw training samples
x = np.random.uniform(-5, 5, 100)
y = x * x

def plot_samples(x, y, title = "samples"):
    global x_plot, y_plot
    plt.plot(x_plot, y_plot, color="tab:red", label="$f(x)=x^2$")
    plt.scatter(x, y, label="samples")
    plt.gca().set(xlabel="x", ylabel="y")
    plt.legend()
    plt.title(title)

plot_samples(x, y)
[Figure: training samples scattered on the true function \(f(x) = x^2\)]
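
Concretely, our network will have a single hidden layer of \(N = 8\) ReLU units. With hidden weights \(w_i\) and biases \(b_i\), output weights \(v_i\) and output bias \(c\), it computes

\[
\hat{y}(x) = c + \sum_{i=1}^{N} v_i \max(0,\, w_i x + b_i),
\]

which is a piecewise-linear function of \(x\). Training adjusts these parameters so that the piecewise-linear function bends to follow the parabola on the training range.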

We will use PyTorch to build our neural network:

from torch.optim import SGD

# number of hidden units
N_HIDDEN = 8

# define the layout of our neural network
model = nn.Sequential(
    nn.Linear(1, N_HIDDEN),
    # we will use ReLU as our non-linear activation function
    nn.ReLU(),
    nn.Linear(N_HIDDEN, 1)
)

# you can ignore this for now
# later, this forward hook will save the hidden-layer (latent) activations for us
activations = dict()
def hook(module, x_in, x_out):
    global activations
    activations["latent"] = x_out.detach()
model[1].register_forward_hook(hook)

# the optimizer will update the model parameters for us
# we will use SGD: Stochastic Gradient Descent
LEARNING_RATE = 0.02
optimizer = SGD(model.parameters(), lr=LEARNING_RATE)

# number of epochs for which we will train our model
EPOCHS = 5000

# our loss function: defines error between predictions and targets
# we will use Mean Squared Error, since it is a regression task
criterion = nn.MSELoss()

# for torch, we need to do some reshaping of our training data
# our network expects a single input feature per sample
# so we reshape x/y from (100,) to (100, 1): a batch of 100 samples with 1 feature each
x_train = torch.from_numpy(x).reshape(-1, 1).float()
y_train = torch.from_numpy(y).reshape(-1, 1).float()

loss_log = []

for i in trange(EPOCHS):
    # 1. forward pass: get model predictions
    y_hat = model(x_train)
    # 2. compute loss: error between predictions and targets
    loss = criterion(y_hat, y_train)
    # 3. backward pass: propagate gradient through the network
    loss.backward()
    loss_log.append(loss.item())
    # 4. update parameters: perform one optimization step
    optimizer.step()
    # 5. reset torch gradients
    optimizer.zero_grad()
# plot the loss curve
plt.plot(loss_log)
plt.ylabel("MSE Loss")
plt.xlabel("epoch")
[Figure: MSE loss over training epochs]
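
As a rough convergence check (a small sketch using the loss_log list from above), the last entry of the loss log is the training MSE after the final epoch:

# final training loss after the last epoch
print(f"final training MSE: {loss_log[-1]:.4f}")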
# we don't need gradients to predict on the whole linspace
with torch.no_grad():
    x_test = torch.from_numpy(x_plot).reshape(-1, 1).float()
    y_pred = model(x_test).reshape(-1).detach().numpy()

# plot predictions
plot_samples(x_plot, y_pred, "model predictions");
[Figure: model predictions on the plotting grid, together with the true function \(f(x) = x^2\)]
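
The fit looks reasonable by eye. For a quick quantitative check (a small sketch using only the arrays defined above), we can compute the mean squared error between the predictions and the true function on the plotting grid:

# mean squared error between the predictions and the true function f(x) = x^2,
# evaluated on the plotting grid
test_mse = np.mean((y_pred - y_plot) ** 2)
print(f"MSE on the plotting grid: {test_mse:.4f}")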

But what’s going on within the model? Let’s investigate by visualizing the latent activations. This gives us a clue about what each hidden neuron has learned:

z = activations["latent"]

for i in range(z.shape[1]):
    plt.figure()
    plot_samples(x_plot, z[:, i].detach().numpy(), f"hidden neuron {i}")
[Figures: activation of each of the 8 hidden neurons as a function of the input]
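
Each hidden neuron has learned one ReLU "hinge", and the output layer simply forms a weighted sum of these hinges plus a bias. As a sanity check (a minimal sketch; it relies on the hook having last fired during the prediction cell above, so z corresponds to x_plot), we can rebuild the predictions by hand from the stored activations and the output layer's parameters:

with torch.no_grad():
    w_out = model[2].weight   # output layer weights, shape (1, N_HIDDEN)
    b_out = model[2].bias     # output layer bias, shape (1,)
    # weighted sum of the hidden activations plus the output bias
    y_manual = (z @ w_out.T + b_out).reshape(-1).numpy()

# should match the model's predictions up to floating point error
print(np.allclose(y_manual, y_pred, atol=1e-5))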

Prompts#

  1. Try out different values for LEARNING_RATE and EPOCHS. How does the loss change? (The helper sketch below this list may make re-running the experiment easier.)

  2. Try different numbers of hidden units by changing N_HIDDEN. How does the quality of the approximation change?

  3. Head to the TensorFlow Playground (https://playground.tensorflow.org) and play around with network configurations!
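
The following helper is one possible starting point for prompts 1 and 2 (a sketch, not part of the notebook above): it rebuilds and retrains the model for a given configuration and returns the loss history, so different settings can be compared quickly.

def train_model(n_hidden=8, lr=0.02, epochs=5000):
    # build a fresh model with the requested number of hidden units
    model = nn.Sequential(
        nn.Linear(1, n_hidden),
        nn.ReLU(),
        nn.Linear(n_hidden, 1),
    )
    optimizer = SGD(model.parameters(), lr=lr)
    criterion = nn.MSELoss()
    losses = []
    for _ in trange(epochs):
        y_hat = model(x_train)
        loss = criterion(y_hat, y_train)
        loss.backward()
        losses.append(loss.item())
        optimizer.step()
        optimizer.zero_grad()
    return model, losses

# example: compare a small and a large hidden layer
# _, losses_small = train_model(n_hidden=2)
# _, losses_large = train_model(n_hidden=32)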