
Graph Neural Networks

Why GNNs?

Answer briefly:

  1. Consider the following limitations of shallow encoders such as random-walk-based node embedding methods. How do GNNs address them?

    • Poor scalability ($|V| \cdot d$ parameters are needed, where $d$ is the dimension of the embedding space)

    • Transductive nature (Cannot obtain embeddings for nodes not in the training set.)

    • Cannot capture structural similarity

    • Cannot utilize node, edge, and graph features

  2. Given an undirected graph $G=(V,E)$, let’s flatten the adjacency matrix $A$ (i.e., concatenate the rows into a single vector) and feed it to a Multi-Layer Perceptron (MLP). What’s wrong with this approach? (See the sketch after this list.)

  3. In a CNN, a convolutional kernel aggregates information from local pixel neighborhoods. What would “locality” mean on a graph? How could we define a convolution operation that respects this locality?
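
A minimal sketch for question 2, assuming PyTorch; the tiny 3-node graph, the permutation, and the MLP sizes are made up for illustration. It feeds the flattened adjacency matrix of the same graph, written down in two different node orders, through one MLP: the outputs generally differ, and the fixed input dimension of nn.Linear(9, 4) also ties the model to 3-node graphs.

import torch
import torch.nn as nn

# the same undirected graph under two node orderings
A = torch.tensor([[0., 1., 1.],
                  [1., 0., 0.],
                  [1., 0., 0.]])
P = torch.tensor([[0., 0., 1.],   # permutation matrix: a relabeling of the three nodes
                  [1., 0., 0.],
                  [0., 1., 0.]])
A_perm = P @ A @ P.T              # identical graph, relabeled nodes

torch.manual_seed(0)
mlp = nn.Sequential(nn.Linear(9, 4), nn.ReLU(), nn.Linear(4, 1))

out_original = mlp(A.flatten())       # prediction for the original ordering
out_permuted = mlp(A_perm.flatten())  # prediction for the permuted ordering
print(out_original, out_permuted)     # generally not equal, although the graph is the same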

Permutation Invariance and Equivariance

  1. Consider a graph with three nodes labeled A, B, C and the adjacency matrix:

    $$A = \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix}$$

    Suppose we feed this graph into a GNN layer defined as:

    $$h_v^{(1)} = \sigma (W \cdot \text{AGG}(\{h_u^{(0)}: u \in N(v)\}))$$

    where AGG is the sum function.

    If we permute the node order to (C, A, B), will the node embeddings change after the layer? Why or why not?

  2. Let $A \in \{0,1\}^{n \times n}$ be the adjacency matrix and $X \in \mathbb{R}^{n \times d}$ be the node feature matrix. Let $P$ be an $n \times n$ permutation matrix (it reorders the node indices). Permuting the graph means:

    $$A^\prime = PAP^\intercal, \quad X^\prime = PX$$

    For each function $f(A,X)$ below, determine whether it is (you can verify your answers numerically with the sketch after the table):

    • Permutation invariant: $f(A^\prime, X^\prime) = f(A,X)$

    • Permutation equivariant: $f(A^\prime, X^\prime) = P f(A,X)$

    • Neither

    | Function | Inv. / Equiv. / Neither |
    | --- | --- |
    | $f(A,X) = 1^\intercal X$ | |
    | $f(A,X) = X$ | |
    | $f(A,X) = AX$ | |
    | $f(A,X) = A^\intercal X$ | |
    | $f(A,X) = \text{ReLU}(AXW)$ | |
    | $f(A,X) = \frac{1}{n} 1^\intercal X$ | |
    | $f(A,X) = X^\intercal X$ | |
    | $f(A,X) = XW$ | |
    | $f(A,X) = A_{1,:}X$ | |
    | $f(A,X) = \text{sort}(X)$ | |
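
A small numerical check you can run while filling in the table (a sketch, assuming PyTorch; the graph size, feature dimension, and the check helper are made up for this exercise). It builds a random symmetric adjacency, applies a random permutation, and compares any candidate function against the invariance and equivariance definitions above.

import torch

torch.manual_seed(0)
n, d = 4, 3
A = torch.randint(0, 2, (n, n)).float()
A = torch.triu(A, diagonal=1)
A = A + A.T                              # random symmetric adjacency
X = torch.randn(n, d)                    # random node features
W = torch.randn(d, d)                    # random weight matrix
P = torch.eye(n)[torch.randperm(n)]      # random permutation matrix
A_p, X_p = P @ A @ P.T, P @ X            # the permuted graph

def check(f):
    """Report whether f looks invariant, equivariant, or neither on this random example."""
    out, out_p = f(A, X), f(A_p, X_p)
    if out.shape == out_p.shape and torch.allclose(out_p, out):
        return "looks invariant"
    if out.dim() == 2 and out.shape[0] == n and torch.allclose(out_p, P @ out):
        return "looks equivariant"
    return "neither (on this single example)"

# plug in any row of the table, e.g.:
print(check(lambda A, X: A @ X))

A single random example can only falsify invariance or equivariance, not prove it, so treat the output as a sanity check rather than a proof.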

One GNN Layer

Consider the following simple undirected graph $G=(V,E)$:

$$V=\{ 1,2,3 \}, \quad E=\{ \{1,2\},\{2,3\} \}$$

The initial node feature matrix and the (unweighted) adjacency matrix are given as follows:

$$X = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 1 \end{bmatrix}, \quad A = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}$$

We apply one Graph Convolutional Network (GCN) layer as defined by Kipf & Welling (2017):

$$H = \sigma(\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}XW)$$

where

  • $D$ is the degree matrix ($D=\text{diag}(1,2,1)$)

  • $\tilde{A}=A+I$

  • $\tilde{D}_{ii}=\sum_j \tilde{A}_{ij}$

  • $\sigma(\cdot)$ is ReLU

and for simplicity, the weight matrix is

$$W = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

Tasks:

  1. Compute $\tilde{A}$ and $\tilde{D}$

  2. Compute the normalized adjacency $\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}$ (an element-wise hint follows this list)

  3. Multiply with $XW$

  4. Apply the ReLU

  5. Write down the resulting node embeddings $H_1, H_2, H_3$

  6. Now compare the initial node features $X$ and the embeddings $H$. How did each node’s location in the embedding space change, and why?
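
As a hint for task 2 (referenced in the list above), the symmetric normalization can be written entry by entry as

$$\left(\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}\right)_{ij} = \frac{\tilde{A}_{ij}}{\sqrt{\tilde{D}_{ii}\,\tilde{D}_{jj}}}$$

so each entry $\tilde{A}_{ij}$, including the self-loops added by $\tilde{A}=A+I$, is scaled by the square roots of the degrees of its two endpoints.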

Programming: One GNN Layer with Torch

In this exercise, you’ll implement the same computation as in the previous exercise using torch.

  1. Complete the following code snippet and reproduce your result.

import torch

# initial features
X = torch.tensor([
    [1., 0.],  # Node 1
    [0., 1.],  # Node 2
    [1., 1.]   # Node 3
])

# toy graph: 1–2–3
A = torch.tensor([
    [0., 1., 0.],
    [1., 0., 1.],
    [0., 1., 0.]
])

# weight matrix (identity for simplicity)
W = torch.eye(2)

# Task 1: Compute A_hat (adjacency with self-loops) and D_hat (its degree matrix)
A_hat = 0  # TODO
D_hat = 0  # TODO

# Task 2: Compute the symmetrically normalized adjacency
A_norm = 0  # TODO

# Task 3-4: Multiply the normalized adjacency with XW and apply ReLU
H = 0  # TODO

print("Initial features X:\n", X)
print("\nNormalized adjacency A_norm:\n", A_norm)
print("\nEmbeddings H:\n", H)
Initial features X:
 tensor([[1., 0.],
        [0., 1.],
        [1., 1.]])

Normalized adjacency A_norm:
 0

Embeddings H:
 0
  2. Now, re-apply the GCN layer 10 more times. What happens to the embeddings? Interpret your results.
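
One possible completion of the skeleton above, including the repeated application asked for in question 2 (a reference sketch in plain torch; variable names follow the skeleton, and other solutions are equally valid):

import torch

# same setup as in the skeleton above
X = torch.tensor([[1., 0.], [0., 1.], [1., 1.]])
A = torch.tensor([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
W = torch.eye(2)

# Task 1: adjacency with self-loops and its degree matrix
A_hat = A + torch.eye(3)
D_hat = torch.diag(A_hat.sum(dim=1))

# Task 2: symmetric normalization D_hat^{-1/2} A_hat D_hat^{-1/2}
D_inv_sqrt = torch.diag(A_hat.sum(dim=1).pow(-0.5))
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt

# Tasks 3-4: propagate over the graph, transform with W, apply ReLU
H = torch.relu(A_norm @ X @ W)
print("Embeddings after 1 layer:\n", H)

# Question 2: re-apply the same layer 10 more times
for _ in range(10):
    H = torch.relu(A_norm @ H @ W)
print("Embeddings after 11 layers:\n", H)

With this setup you should observe the rows of H becoming close to scalar multiples of one another as the layer is re-applied: all values stay non-negative, so the ReLU changes nothing, and repeatedly multiplying by the normalized adjacency keeps averaging each node’s embedding with its neighbours’. The embeddings collapse toward a single direction and the nodes become hard to tell apart, which is the over-smoothing effect.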