Skip to article frontmatterSkip to article content

Introduction

Graph Basics

Given the following undirected graph G=(V,E)G=(V,E):

V={A,B,C,D,E}V=\{A,B,C,D,E\}

E={{A,B},{A,C},{B,C},{C,D},{D,E}}E=\{\{A,B\},\{A,C\},\{B,C\},\{C,D\},\{D,E\}\}

  1. Draw the graph.
  2. Write down the adjaceny matrix AA.
  3. Compute the degree of each node and the average node degree.
  4. Is the graph connected? If not, how many connected components are there? (Hint: An undirected graph is connected if there is a path between every pair of nodes.)

Directed vs Undirected

For the same set of nodes and edges as above, assume now that all edges are directed from the alphabetically smaller node to the larger one (e.g., ABA \to B, BCB \to C).

  1. Write the adjacency matrix for this directed version.
  2. Compute in-degrees and out-degrees of each node.
  3. Identify nodes with zero in-degree or zero out-degree.

Large Graphs

You are given an undirected graph with 1 million nodes.

  1. What is the lower and upper bounds of the cardinality of EE? (Hint: Cardinality is the number of elements in a set.)
  2. If the graph had 5 million edges, what would be the density of the graph? (The graph density is defined to be the ratio of the number of edges with respect to the maximum possible edges.)
  3. With 1 million nodes and 5 million edges, which representation (adjacency list vs. adjacency matrix) would be more memory-efficient and why? (Hint: Adjacency list is a graph representation where each node stores a list of its neighboring nodes (i.e., nodes directly connected by an edge))
  4. Discuss which one is more convenient for performing:
    • Degree computations
    • Edge lookups
    • Neighborhood sampling (i.e., randomly selecting a subset of a node’s neighbors).

Heterogeneous Graphs

You are given a heterogeneous graph describing a small academic network:

  • Nodes: {paper, author, institution}
  • Edges:
    • author -> paper (“writes”)
    • paper -> paper (“cites”)
    • author -> institution (“affiliated_with”)
  1. Explain why using a simple adjacency matrix is not sufficient to represent this graph, and how would you represent it?
  2. What would be the maximum number of edges this graph may have with vpv_p paper, vav_a author and viv_i institution nodes?
  3. Assume that all possible edges exist in this graph. For vp=100v_p=100, va=20v_a=20 and vi=2v_i=2, what would be the density? (Hint: For the denominator, think of the case where there is no constraints on edges.)
  4. Give one example of:
    • A node-level task
    • An edge-level task
    • A graph-level task
  5. You are tasked to add auxiliary (implicit) edges between author nodes in this graph, what edge types would you add? How would you determine which author nodes to connect?

Local Structure

Consider the following graphs G1G1 and G2G2:

G1{width=100px}

G2{width=100px}

  1. Compute the degree of each node and the average degree in both graphs.
  2. For each node, list the degrees of its neighbors (this represents a 1-hop “neighborhood description”).
  3. Based only on this information (degrees and neighbor degrees), could a simple model that only “looks one hop around each node” tell G1G1 and G2G2 apart?
  4. How might a model do better?

Deep Learning with Graphs

Answer briefly:

  1. Why can’t we directly apply CNNs to graphs like we do for images?
  2. What property must the model have to generalize across graphs of different sizes?
  3. Why is this property natural for graphs but not for images?

Programming: NetworkX and PyG Basics

NetworkX is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks. For more info, check out its documentation.

PyG (PyTorch Geometric) is a library built upon PyTorch to easily write and train Graph Neural Networks (GNNs) for a wide range of applications related to structured data. Check out the documentation here.

Building a Simple Graph

Complete the following code snippet to build the graph given in the first exercise.

import networkx as nx

# create an empty graph
G=nx.Graph()

############# Your code here ############

#########################################

# draw the graph
nx.draw(G, with_labels=True)

Zachary’s karate club network

Now let’s consider a more interesting graph: The Karate Club Network is a graph which describes a social network of 34 members of a karate club and documents links between members who interacted outside the club.

The following code snippet loads and draws the graph. Also notice that the average_degree function is left for you to implement.

G = nx.karate_club_graph()

# Visualize the graph
nx.draw(G, with_labels = True)

def average_degree(num_edges, num_nodes):
    # TODO: Implement this function that takes number of edges
    # and number of nodes, and returns the average node degree of
    # the graph.

    avg_degree = 0

    ############# Your code here ############

    #########################################

    return avg_degree

num_edges = G.number_of_edges()
num_nodes = G.number_of_nodes()
avg_degree = average_degree(num_edges, num_nodes)
print("Average degree of karate club network is {}".format(avg_degree))
Average degree of karate club network is 0
<Figure size 640x480 with 1 Axes>

NetworkX Graph to PyTorch Tensor

Now let’s transform the graph GG into a PyTorch tensor. First, let’s check if PyTorch is properly installed.

import torch
print(torch.__version__)
2.4.1

We can generate PyTorch tensor with all zeros, ones or random values.

# Generate 3 x 4 tensor with all ones
ones = torch.ones(3, 4)
print(ones)

# Generate 3 x 4 tensor with all zeros
zeros = torch.zeros(3, 4)
print(zeros)

# Generate 3 x 4 tensor with random values on the interval [0, 1)
random_tensor = torch.rand(3, 4)
print(random_tensor)

# Get the shape of the tensor
print(ones.shape)
tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])
tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])
tensor([[0.7204, 0.9176, 0.1714, 0.6785],
        [0.2893, 0.4568, 0.6883, 0.8125],
        [0.4675, 0.7884, 0.3535, 0.7510]])
torch.Size([3, 4])

PyTorch tensor contains elements for a single data type, the dtype.

# Create a 3 x 4 tensor with all 32-bit floating point zeros
zeros = torch.zeros(3, 4, dtype=torch.float32)
print(zeros.dtype)

# Change the tensor dtype to 64-bit integer
zeros = zeros.type(torch.long)
print(zeros.dtype)
torch.float32
torch.int64

In the next code snippet, complete the two functions to get the edge list of the karate club network and transform it into torch.LongTensor. What is the torch.sum value of pos_edge_index tensor?

def graph_to_edge_list(G):
    # TODO: Implement the function that returns the edge list of
    # an nx.Graph. The returned edge_list should be a list of tuples
    # where each tuple is a tuple representing an edge connected
    # by two nodes.

    edge_list = []

    ############# Your code here ############

    #########################################

    return edge_list

def edge_list_to_tensor(edge_list):
    # TODO: Implement the function that transforms the edge_list to
    # tensor. The input edge_list is a list of tuples and the resulting
    # tensor should have the shape [2, len(edge_list)].

    edge_index = torch.tensor([])

    ############# Your code here ############

    #########################################

    return edge_index

pos_edge_list = graph_to_edge_list(G)
pos_edge_index = edge_list_to_tensor(pos_edge_list)
print("The pos_edge_index tensor has shape {}".format(pos_edge_index.shape))
print("The pos_edge_index tensor has sum value {}".format(torch.sum(pos_edge_index)))
The pos_edge_index tensor has shape torch.Size([0])
The pos_edge_index tensor has sum value 0.0

Question: What could be the benefit of the Edge Index Tensor representation over the adjacency list?