
Graph Neural Networks

Why GNNs?

Answer briefly:

  1. Consider the following limitations of shallow encoders such as random-walk-based node embedding methods. How do GNNs address them?

    • Poor scalability ($|V| \cdot d$ parameters are needed, where $d$ is the dimension of the embedding space)

    • Transductive nature (Cannot obtain embeddings for nodes not in the training set.)

    • Cannot capture structural similarity

    • Cannot utilize node, edge, and graph features

  2. Given an undirected graph $G=(V,E)$, let’s flatten the adjacency matrix $A$ (i.e., concatenate the rows into a single vector) and feed it to a Multi-Layer Perceptron (MLP). What’s wrong with this approach? (See the sketch after this list.)

  3. In a CNN, a convolutional kernel aggregates information from local pixel neighborhoods. What would “locality” mean on a graph? How could we define a convolution operation that respects this locality?
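
A minimal sketch for question 2, assuming PyTorch; the tiny 3-node graph, the permutation, and the MLP sizes are made up for illustration. It feeds the flattened adjacency matrix of the same graph, written down in two different node orders, through one MLP: the outputs generally differ, and the fixed input dimension of nn.Linear(9, 4) also ties the model to 3-node graphs.

import torch
import torch.nn as nn

# the same undirected graph under two node orderings
A = torch.tensor([[0., 1., 1.],
                  [1., 0., 0.],
                  [1., 0., 0.]])
P = torch.tensor([[0., 0., 1.],   # permutation matrix: a relabeling of the three nodes
                  [1., 0., 0.],
                  [0., 1., 0.]])
A_perm = P @ A @ P.T              # identical graph, relabeled nodes

torch.manual_seed(0)
mlp = nn.Sequential(nn.Linear(9, 4), nn.ReLU(), nn.Linear(4, 1))

out_original = mlp(A.flatten())       # prediction for the original ordering
out_permuted = mlp(A_perm.flatten())  # prediction for the permuted ordering
print(out_original, out_permuted)     # generally not equal, although the graph is the same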

Permutation Invariance and Equivariance

  1. Consider a graph with three nodes labeled A, B, C and the adjacency matrix:

    $$A = \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix}$$

    Suppose we feed this graph into a GNN layer defined as:

    $$h_v^{(1)} = \sigma (W \cdot \text{AGG}(\{h_u^{(0)}: u \in N(v)\}))$$

    where AGG is the sum function.

    If we permute the node order to (C, A, B), will the node embeddings change after the layer? Why or why not?

  2. Let $A \in \{0,1\}^{n \times n}$ be the adjacency matrix and $X \in \mathbb{R}^{n \times d}$ be the node feature matrix. Let $P$ be an $n \times n$ permutation matrix (it reorders the node indices). Permuting the graph means:

    $$A^\prime = PAP^\intercal, \quad X^\prime = PX$$

    For each function $f(A,X)$ below, determine whether it is (you can verify your answers numerically with the sketch after the table):

    • Permutation invariant: $f(A^\prime, X^\prime) = f(A,X)$

    • Permutation equivariant: $f(A^\prime, X^\prime) = P f(A,X)$

    • Neither

    | Function | Inv. / Equiv. / Neither |
    | --- | --- |
    | $f(A,X) = 1^\intercal X$ | |
    | $f(A,X) = X$ | |
    | $f(A,X) = AX$ | |
    | $f(A,X) = A^\intercal X$ | |
    | $f(A,X) = \text{ReLU}(AXW)$ | |
    | $f(A,X) = \frac{1}{n} 1^\intercal X$ | |
    | $f(A,X) = X^\intercal X$ | |
    | $f(A,X) = XW$ | |
    | $f(A,X) = A_{1,:}X$ | |
    | $f(A,X) = \text{sort}(X)$ | |
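
A small numerical check you can run while filling in the table (a sketch, assuming PyTorch; the graph size, feature dimension, and the check helper are made up for this exercise). It builds a random symmetric adjacency, applies a random permutation, and compares any candidate function against the invariance and equivariance definitions above.

import torch

torch.manual_seed(0)
n, d = 4, 3
A = torch.randint(0, 2, (n, n)).float()
A = torch.triu(A, diagonal=1)
A = A + A.T                              # random symmetric adjacency
X = torch.randn(n, d)                    # random node features
W = torch.randn(d, d)                    # random weight matrix
P = torch.eye(n)[torch.randperm(n)]      # random permutation matrix
A_p, X_p = P @ A @ P.T, P @ X            # the permuted graph

def check(f):
    """Report whether f looks invariant, equivariant, or neither on this random example."""
    out, out_p = f(A, X), f(A_p, X_p)
    if out.shape == out_p.shape and torch.allclose(out_p, out):
        return "looks invariant"
    if out.dim() == 2 and out.shape[0] == n and torch.allclose(out_p, P @ out):
        return "looks equivariant"
    return "neither (on this single example)"

# plug in any row of the table, e.g.:
print(check(lambda A, X: A @ X))

A single random example can only falsify invariance or equivariance, not prove it, so treat the output as a sanity check rather than a proof.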

One GNN Layer

Consider the following simple undirected graph $G=(V,E)$:

$$V=\{ 1,2,3 \}, \quad E=\{ \{1,2\},\{2,3\} \}$$

The initial node feature matrix and the (unweighted) adjacency matrix are given as follows:

$$X = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 1 \end{bmatrix}, \quad A = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}$$

We apply one Graph Convolutional Network (GCN) layer as defined by Kipf & Welling (2017):

$$H = \sigma(\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}XW)$$

where

  • $D$ is the degree matrix ($D=\text{diag}(1,2,1)$)

  • $\tilde{A}=A+I$

  • $\tilde{D}_{ii}=\sum_j \tilde{A}_{ij}$

  • $\sigma(\cdot)$ is ReLU

and for simplicity, the weight matrix is

$$W = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

Tasks:

  1. Compute $\tilde{A}$ and $\tilde{D}$

  2. Compute the normalized adjacency $\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}$ (an element-wise hint follows this list)

  3. Multiply with $XW$

  4. Apply the ReLU

  5. Write down the resulting node embeddings $H_1, H_2, H_3$

  6. Now compare the initial node features $X$ and the embeddings $H$. How did each node’s location in the embedding space change, and why?
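
As a hint for task 2 (referenced in the list above), the symmetric normalization can be written entry by entry as

$$\left(\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}\right)_{ij} = \frac{\tilde{A}_{ij}}{\sqrt{\tilde{D}_{ii}\,\tilde{D}_{jj}}}$$

so each entry $\tilde{A}_{ij}$, including the self-loops added by $\tilde{A}=A+I$, is scaled by the square roots of the degrees of its two endpoints.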

Programming: One GNN Layer with Torch

In this exercise, you’ll implement the same computation as in the previous exercise using torch.

  1. Complete the following code snippet and reproduce your result.

import torch

# initial features
X = torch.tensor([
    [1., 0.],  # Node 1
    [0., 1.],  # Node 2
    [1., 1.]   # Node 3
])

# toy graph: 1–2–3
A = torch.tensor([
    [0., 1., 0.],
    [1., 0., 1.],
    [0., 1., 0.]
])

# weight matrix (identity for simplicity)
W = torch.eye(2)

# Task 1: Compute A_hat (adjacency with self-loops) and D_hat (its degree matrix)
A_hat = 0  # TODO
D_hat = 0  # TODO

# Task 2: Compute the symmetrically normalized adjacency
A_norm = 0  # TODO

# Task 3-4: Multiply the normalized adjacency with XW and apply ReLU
H = 0  # TODO

print("Initial features X:\n", X)
print("\nNormalized adjacency A_norm:\n", A_norm)
print("\nEmbeddings H:\n", H)
Initial features X:
 tensor([[1., 0.],
        [0., 1.],
        [1., 1.]])

Normalized adjacency A_norm:
 0

Embeddings H:
 0
  2. Now, re-apply the GCN layer 10 more times. What happens to the embeddings? Interpret your results.
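
One possible completion of the skeleton above, including the repeated application asked for in question 2 (a reference sketch in plain torch; variable names follow the skeleton, and other solutions are equally valid):

import torch

# same setup as in the skeleton above
X = torch.tensor([[1., 0.], [0., 1.], [1., 1.]])
A = torch.tensor([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
W = torch.eye(2)

# Task 1: adjacency with self-loops and its degree matrix
A_hat = A + torch.eye(3)
D_hat = torch.diag(A_hat.sum(dim=1))

# Task 2: symmetric normalization D_hat^{-1/2} A_hat D_hat^{-1/2}
D_inv_sqrt = torch.diag(A_hat.sum(dim=1).pow(-0.5))
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt

# Tasks 3-4: propagate over the graph, transform with W, apply ReLU
H = torch.relu(A_norm @ X @ W)
print("Embeddings after 1 layer:\n", H)

# Question 2: re-apply the same layer 10 more times
for _ in range(10):
    H = torch.relu(A_norm @ H @ W)
print("Embeddings after 11 layers:\n", H)

With this setup you should observe the rows of H becoming close to scalar multiples of one another as the layer is re-applied: all values stay non-negative, so the ReLU changes nothing, and repeatedly multiplying by the normalized adjacency keeps averaging each node’s embedding with its neighbours’. The embeddings collapse toward a single direction and the nodes become hard to tell apart, which is the over-smoothing effect.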