A General Perspective on Graph Neural Networks

General GNN Framework

Consider the following GNN layer:

$$h_v^{(l)} = \sigma \left( \operatorname{AGG}^{(l)} \left( \left\{ \operatorname{MSG}^{(l)}\left(h_u^{(l-1)}\right) \mid u \in N(v) \right\} \right) \right)$$
  1. Explain the role of AGG and MSG components.

  2. Why must AGG be permutation invariant?

  3. What issue arises in this formulation? (Hint: Consider how node $v$ incorporates information about itself.)

  4. How can we modify this equation to fix the issue in (3)?

  5. Modify the layer so that self-information and neighbor-information are treated differently. Provide the updated equation. (Hint: You are not limited to a single trainable weight matrix.)
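
As a concrete point of reference (an illustration added here, not part of the original question set), choosing a linear message function and SUM aggregation instantiates this framework as a basic message-passing layer:

$$\operatorname{MSG}^{(l)}\left(h_u^{(l-1)}\right) = W^{(l)} h_u^{(l-1)}, \qquad h_v^{(l)} = \sigma \left( \sum_{u \in N(v)} W^{(l)} h_u^{(l-1)} \right)$$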

Depth of a GNN

Consider the following GNN layer:

$$h_v^{(l)} = \operatorname{CONCAT} \big( \operatorname{AGG} \big( \{ m_u^{(l)} \mid u \in N(v) \} \big),\, m_v^{(l)} \big)$$

where $m_u^{(l)} = \operatorname{MSG}^{(l)}(h_u^{(l-1)})$ are the messages from the neighbors and $m_v^{(l)} = \operatorname{MSG}^{(l)}(h_v^{(l-1)})$ is the self-message.

  1. Assume that AGG is the SUM function and that the messages $m_u^{(l)}$ and $m_v^{(l)}$ are $d$-dimensional vectors. What is the dimension of $h_v^{(l)}$?

  2. Discuss potential issues or challenges this concatenation might introduce when stacking multiple GNN layers.

  3. Suggest a technique to control the dimension after concatenation.

  4. Now, assume that we change the layer definition so that the concatenated vector is fed to a $k$-layer multi-layer perceptron (MLP), as given in the following:

    $$h_v^{(l)} = \operatorname{MLP} \big( \operatorname{CONCAT} \big( \operatorname{AGG} \big( \{ m_u^{(l)} \mid u \in N(v) \} \big),\, m_v^{(l)} \big) \big)$$

    If we stack $L$ such layers, what will be the total depth of the GNN?

Graph Convolutional Networks

Consider the layer definition of the GCN by Kipf & Welling (paper link):

$$h_v^{(l+1)} = \sigma \left( \sum_{u \in N(v) \cup \{v\}} \frac{1}{\sqrt{\tilde{d}_v \tilde{d}_u}} \, h_u^{(l)} W^{(l)} \right)$$

where $\tilde{d}_v$ is the degree of node $v$ in the augmented adjacency $\tilde{A} = A + I$ obtained after adding self-loops.
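
For reference, stacking the node-wise rule over all nodes gives the equivalent matrix form from the paper, where $H^{(l)}$ collects the node features row-wise and $\tilde{D}$ is the diagonal degree matrix of $\tilde{A}$:

$$H^{(l+1)} = \sigma \left( \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(l)} W^{(l)} \right)$$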

  1. Compared to the simple normalization factor $\frac{1}{|N(v)|}$ we have seen in the lecture (W4, slide 17), what could be the reason to use $\frac{1}{\sqrt{\tilde{d}_v \tilde{d}_u}}$ (symmetric normalization)?

  2. How does the GCN process a node’s own message differently from its neighbors’ messages?

  3. GraphSAGE (SAmple and aggreGatE) by Hamilton et al. (paper link) is an extension of the GCN framework. Given the pseudocode below, list three aspects of GraphSAGE that differ from the original GCN and discuss how these changes improve the method.

    (Figure: GraphSAGE pseudocode from the paper.)
  4. How does the Graph Attention Network by Veličković et al. (paper) improve the GCN?

Programming: Simple GNN with Torch

In this exercise, you’ll implement a simple GNN with PyTorch and test it on the Cora dataset (link).

The layer definition of the GNN you’ll implement is given as follows:

$$h_v^{(l)} = \sigma \left( W_{\text{self}}^{(l)} \cdot h_v^{(l-1)} + W_{\text{neigh}}^{(l)} \cdot \operatorname{AGG}^{(l)}\left(\{h_u^{(l-1)} \mid u \in N(v)\}\right) \right)$$

It uses two learnable weight matrices, $W_{\text{self}}$ and $W_{\text{neigh}}$, which are multiplied with the node’s own message and the aggregated neighbor messages, respectively. The resulting vectors are then summed and fed to $\sigma$.

  1. Complete the code snippet below, given the following (one possible completion sketch appears after the skeleton):

    • Use the MEAN aggregator for neighbor messages.

    • Use ReLU as $\sigma$.

import torch
import torch.nn as nn
import torch.nn.functional as F

# A simple GNN layer that
# - aggregates neighbor features by mean
# - uses separate weight matrices for self and neighbor features
class SimpleGNNLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear_self = nn.Linear(in_dim, out_dim, bias=True)
        self.linear_neigh = nn.Linear(in_dim, out_dim, bias=True)

    def forward(self, x, edge_index):

        # edges j -> i (col = source, row = target)
        row, col = edge_index

        # messages are neighbor features
        messages = x[col]

        # aggregate by mean
        ############# your code here ############


    
        #########################################

        # update rule: combine self and neighbors with separate weights
        ############# your code here ############


    
        #########################################

        # return updated node features
        return out

# A simple 2-layer GNN model
class SimpleGNN(nn.Module):
    def __init__(self, in_dim, hidden_dim, out_dim):
        super().__init__()
        self.layer1 = SimpleGNNLayer(in_dim, hidden_dim)
        self.layer2 = SimpleGNNLayer(hidden_dim, out_dim)

    def forward(self, data):
        x, edge_index = data.x, data.edge_index

        # apply GNN layers with ReLU non-linearity
        ############# your code here ############


    
        #########################################

        return x
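
For reference, here is one possible completion of the blanks above, written as a sketch: mean aggregation is done with index_add_, and (following the skeleton’s comments) the non-linearity is applied in the model rather than inside the layer. The helper tensors agg and deg are illustrative names, not part of the skeleton.

# SimpleGNNLayer.forward -- one possible completion (sketch)
def forward(self, x, edge_index):
    row, col = edge_index                          # col = source, row = target (as above)
    messages = x[col]                              # neighbor features as messages

    # aggregate by mean: sum messages per target node, then divide by its degree
    agg = torch.zeros_like(x)
    agg.index_add_(0, row, messages)
    deg = torch.zeros(x.size(0), dtype=x.dtype, device=x.device)
    deg.index_add_(0, row, torch.ones_like(row, dtype=x.dtype))
    agg = agg / deg.clamp(min=1).unsqueeze(1)      # clamp avoids division by zero for isolated nodes

    # update rule: combine self and neighbor information with separate weights
    out = self.linear_self(x) + self.linear_neigh(agg)

    # return updated node features (sigma is applied in SimpleGNN.forward)
    return out

# SimpleGNN.forward -- one possible completion (sketch)
def forward(self, data):
    x, edge_index = data.x, data.edge_index
    x = F.relu(self.layer1(x, edge_index))         # ReLU as sigma between layers
    x = self.layer2(x, edge_index)                 # raw logits for cross_entropy
    return x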

Now, train your GNN using the following script.

from torch_geometric.datasets import Planetoid

# load the Cora dataset
dataset = Planetoid(root='/tmp/Cora', name='Cora')

# model, data, optimizer
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = SimpleGNN(in_dim=dataset.num_node_features, hidden_dim=16, out_dim=dataset.num_classes).to(device)
data = dataset[0].to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)

# training loop
train_losses = []
model.train()
for epoch in range(50):
    
    optimizer.zero_grad()
    out = model(data)
    
    # training loss
    loss = F.cross_entropy(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()
    train_losses.append(loss.item())
    print(f"Epoch {epoch}: Train Loss = {loss.item():.4f}")

# final eval
model.eval()

pred = model(data).argmax(dim=1)
correct = (pred[data.test_mask] == data.y[data.test_mask]).sum()
acc = int(correct) / int(data.test_mask.sum())

print(f"Test Accuracy: {acc:.4f}")
  2. Now, modify the SimpleGNN class so that it takes the number of layers as input (a sketch is given at the end of this exercise).

  3. Now, train two GNNs, one with 2 layers and another with 16. Then, compare the Mean Average Distance (MAD) of the embeddings learned by the models using the function given below (an example call appears in the sketch at the end of this exercise).

    • Mean Average Distance (MAD): Average pairwise Euclidean distance between node embeddings.

def get_mad(embeddings):

    # embeddings: [N, d]  (N nodes, d-dimensional embeddings)
    N = embeddings.size(0)

    # Normalize embeddings
    embeddings = F.normalize(embeddings, dim=1)

    # Compute all pairwise Euclidean distances → shape [N, N]
    dist_matrix = torch.cdist(embeddings, embeddings, p=2)

    # Average the distances over all distinct pairs
    mad_val = dist_matrix.sum() / (N * (N - 1))

    return mad_val.item()
  4. Interpret the accuracy and MAD values of the models with 2 and 16 layers.
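
For reference, a minimal sketch for questions 2 and 3: a variable-depth version of the model and a MAD comparison. The class name DeepGNN, the num_layers argument, and the use of the final-layer outputs as embeddings are illustrative choices, not prescribed by the exercise.

# A variable-depth GNN: one possible way to take the number of layers as input (sketch)
class DeepGNN(nn.Module):
    def __init__(self, in_dim, hidden_dim, out_dim, num_layers):
        super().__init__()
        dims = [in_dim] + [hidden_dim] * (num_layers - 1) + [out_dim]
        self.layers = nn.ModuleList(
            [SimpleGNNLayer(dims[i], dims[i + 1]) for i in range(num_layers)]
        )

    def forward(self, data):
        x, edge_index = data.x, data.edge_index
        for layer in self.layers[:-1]:
            x = F.relu(layer(x, edge_index))   # ReLU between layers
        return self.layers[-1](x, edge_index)  # final layer outputs logits

# Train a 2-layer and a 16-layer model with the training loop above, then compare MAD
model_2 = DeepGNN(dataset.num_node_features, 16, dataset.num_classes, num_layers=2).to(device)
model_16 = DeepGNN(dataset.num_node_features, 16, dataset.num_classes, num_layers=16).to(device)
# ... (train each model with the loop above) ...

with torch.no_grad():
    model_2.eval()
    model_16.eval()
    mad_2 = get_mad(model_2(data))    # final-layer outputs used as node embeddings here
    mad_16 = get_mad(model_16(data))

print(f"MAD (2 layers): {mad_2:.4f} | MAD (16 layers): {mad_16:.4f}")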