Why GNNs?
Answer briefly:
Consider the following limitations of shallow encoders, such as random-walk-based node embedding methods. How do GNNs address them?
Poor scalability ($O(|V| \cdot d)$ parameters are needed, $d$: dimension of the embedding space)
Transductive nature (Cannot obtain embeddings for nodes not in the training set.)
Cannot capture structural similarity
Cannot utilize node, edge, and graph features
Given an undirected graph $G$, let’s flatten the adjacency matrix (i.e., concatenate the rows into a single vector) and feed it to a Multi-Layer Perceptron (MLP). What’s wrong with this approach?
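To make the setup concrete, here is a minimal sketch of exactly this approach on a 3-node toy graph; the MLP layer sizes (9 → 16 → 2) are arbitrary choices for illustration, not part of the exercise. As you answer, note what the MLP’s input dimension is tied to and what happens if the nodes are relabeled.

import torch
import torch.nn as nn

# toy 3-node graph (illustrative only)
A = torch.tensor([
    [0., 1., 0.],
    [1., 0., 1.],
    [0., 1., 0.]
])

# an MLP whose input size is hard-wired to 3 * 3 = 9
mlp = nn.Sequential(nn.Linear(9, 16), nn.ReLU(), nn.Linear(16, 2))

out = mlp(A.flatten())  # flatten the adjacency matrix row by row
print(out)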
In a CNN, a convolutional kernel aggregates information from local pixel neighborhoods. What would “locality” mean on a graph? How could we define a convolution operation that respects this locality?
Permutation Invariance and Equivariance
Consider a graph with three node labels A, B, C and the adjacency matrix:
Suppose we feed this graph into a GNN layer defined as:
where AGG is the sum function.
If we permute the node order to (C, A, B), will the node embeddings change after the layer? Why or why not?
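If you want to verify your reasoning numerically, the sketch below runs a sum-aggregation layer (written here simply as $AX$) before and after permuting the node order. The adjacency matrix from the exercise is not reproduced above, so the 3-node path used here is only an assumed example.

import torch

# assumed example graph A -- B -- C, node order (A, B, C)
A = torch.tensor([
    [0., 1., 0.],
    [1., 0., 1.],
    [0., 1., 0.]
])
X = torch.eye(3)  # one-hot features for A, B, C

# sum aggregation over neighbors: row v of A @ X sums the features of v's neighbors
H = A @ X

# permutation matrix for the new node order (C, A, B)
P = torch.tensor([
    [0., 0., 1.],  # new position 1 <- old node C
    [1., 0., 0.],  # new position 2 <- old node A
    [0., 1., 0.]   # new position 3 <- old node B
])

H_perm = (P @ A @ P.T) @ (P @ X)  # embeddings of the permuted graph

# the permuted graph yields the same embeddings, just with the rows reordered
print(torch.allclose(H_perm, P @ H))  # True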
Let $A$ be the adjacency matrix and $X$ be the node feature matrix. Let $P$ be an $n \times n$ permutation matrix (it reorders node indices). Permutation of the graph means:

$$A \to PAP^\top, \qquad X \to PX$$
For each function below, determine whether it is:
Permutation invariant: $f(PX, PAP^\top) = f(X, A)$
Permutation equivariant: $f(PX, PAP^\top) = P\,f(X, A)$
Neither
Function | Inv./Equiv./Neither
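The entries of the table are not reproduced above, so here is a sketch of how any candidate function can be tested numerically; $f_1(X, A) = \mathbf{1}^\top X$ (the sum of the rows of $X$) and $f_2(X, A) = AX$ are purely illustrative stand-ins.

import torch

torch.manual_seed(0)
n, d = 4, 2

# random symmetric adjacency matrix and node features
A = torch.randint(0, 2, (n, n)).float()
A = torch.triu(A, 1)
A = A + A.T
X = torch.randn(n, d)

# random permutation matrix P
P = torch.eye(n)[torch.randperm(n)]
A_p, X_p = P @ A @ P.T, P @ X

f1 = lambda X, A: X.sum(dim=0)  # sum of the rows of X (illustrative)
f2 = lambda X, A: A @ X         # neighbor-sum aggregation (illustrative)

print(torch.allclose(f1(X_p, A_p), f1(X, A)))      # True -> permutation invariant
print(torch.allclose(f2(X_p, A_p), P @ f2(X, A)))  # True -> permutation equivariant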
One GNN Layer
Consider the following simple undirected graph $G$: the path 1–2–3 on three nodes.
The initial node feature matrix $X$ and the (unweighted) adjacency matrix $A$ are given as follows:

$$X = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 1 \end{bmatrix}, \qquad A = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}$$
We apply one Graph Convolutional Network (GCN) layer as defined by Kipf & Welling (2017):

$$H = \sigma\left(\tilde{D}^{-1/2}\,\tilde{A}\,\tilde{D}^{-1/2}\,X\,W\right), \qquad \tilde{A} = A + I,$$

where
$\tilde{D}$ is the degree matrix of $\tilde{A}$ ($\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$)
$\sigma$ is ReLU
and for simplicity, the weight matrix is $W = I_2$
Tasks:
Compute $\tilde{A} = A + I$ and $\tilde{D}$
Compute the normalized adjacency $\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}$
Multiply the normalized adjacency with $XW$
Apply the ReLU
Write down the resulting node embeddings $H$
Now compare the initial node features $X$ and the embeddings $H$. How did each node’s location in the embedding space change, and why?
Programming: One GNN Layer with Torch
In this exercise, you’ll implement the same task as in the previous example using torch.
Complete the following code snippet and reproduce your result.
import torch
# initial features
X = torch.tensor([
[1., 0.], # Node 1
[0., 1.], # Node 2
[1., 1.] # Node 3
])
# toy graph: 1–2–3
A = torch.tensor([
[0., 1., 0.],
[1., 0., 1.],
[0., 1., 0.]
])
# weight matrix (identity for simplicity)
W = torch.eye(2)
# Task 1: Compute A_hat and D_hat
A_hat = 0
D_hat = 0
# Task 2: Compute normalized adjacency
A_norm = 0
# Task 3-4: Multiply normalized adjacency with XW and apply ReLU
H = 0
print("Initial features X:\n", X)
print("\nNormalized adjacency A_norm:\n", A_norm)
print("\nEmbeddings H:\n", H)
Initial features X:
tensor([[1., 0.],
[0., 1.],
[1., 1.]])
Normalized adjacency A_norm:
0
Embeddings H:
0
Now, re-apply the GCN layer 10 more times. What happens to the embeddings? Interpret your results.
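If you want a self-contained cross-check, the sketch below recomputes the normalized adjacency (so compare it with your own answers to Tasks 1 and 2) and then re-applies the layer in a loop; interpreting what you see is still up to you.

import torch

X = torch.tensor([
    [1., 0.],
    [0., 1.],
    [1., 1.]
])
A = torch.tensor([
    [0., 1., 0.],
    [1., 0., 1.],
    [0., 1., 0.]
])
W = torch.eye(2)

# normalized adjacency with self-loops (cf. Tasks 1 and 2)
A_hat = A + torch.eye(3)
D_hat_inv_sqrt = torch.diag(A_hat.sum(dim=1).pow(-0.5))
A_norm = D_hat_inv_sqrt @ A_hat @ D_hat_inv_sqrt

H = torch.relu(A_norm @ X @ W)  # the single layer from the exercise
for _ in range(10):             # re-apply the same layer 10 more times
    H = torch.relu(A_norm @ H @ W)

print("Embeddings after 11 layers:\n", H)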