Introduction - Machine Learning with Graphs

Graph Basics¶

Given the following undirected graph $G=(V,E)$ :

$V=\{A,B,C,D,E\}$

$E=\{\{A,B\},\{A,C\},\{B,C\},\{C,D\},\{D,E\}\}$

Draw the graph.
Write down the adjaceny matrix $A$ .
Compute the degree of each node and the average node degree.
Is the graph connected? If not, how many connected components are there? (Hint: An undirected graph is connected if there is a path between every pair of nodes.)

Directed vs Undirected¶

For the same set of nodes and edges as above, assume now that all edges are directed from the alphabetically smaller node to the larger one (e.g., $A \to B$ , $B \to C$ ).

Write the adjacency matrix for this directed version.
Compute in-degrees and out-degrees of each node.
Identify nodes with zero in-degree or zero out-degree.

Large Graphs¶

You are given an undirected graph with 1 million nodes.

What is the lower and upper bounds of the cardinality of $E$ ? (Hint: Cardinality is the number of elements in a set.)
If the graph had 5 million edges, what would be the density of the graph? (The graph density is defined to be the ratio of the number of edges with respect to the maximum possible edges.)
With 1 million nodes and 5 million edges, which representation (adjacency list vs. adjacency matrix) would be more memory-efficient and why? (Hint: Adjacency list is a graph representation where each node stores a list of its neighboring nodes (i.e., nodes directly connected by an edge))
Discuss which one is more convenient for performing:
- Degree computations
- Edge lookups
- Neighborhood sampling (i.e., randomly selecting a subset of a node’s neighbors).

Heterogeneous Graphs¶

You are given a heterogeneous graph describing a small academic network:

Nodes: {paper, author, institution}
Edges:
- author -> paper (“writes”)
- paper -> paper (“cites”)
- author -> institution (“affiliated_with”)

Explain why using a simple adjacency matrix is not sufficient to represent this graph, and how would you represent it?
What would be the maximum number of edges this graph may have with $v_p$ paper, $v_a$ author and $v_i$ institution nodes?
Assume that all possible edges exist in this graph. For $v_p=100$ , $v_a=20$ and $v_i=2$ , what would be the density? (Hint: For the denominator, think of the case where there is no constraints on edges.)
Give one example of:
- A node-level task
- An edge-level task
- A graph-level task
You are tasked to add auxiliary (implicit) edges between author nodes in this graph, what edge types would you add? How would you determine which author nodes to connect?

Local Structure¶

Consider the following graphs $G1$ and $G2$ :

{width=100px}

Compute the degree of each node and the average degree in both graphs.
For each node, list the degrees of its neighbors (this represents a 1-hop “neighborhood description”).
Based only on this information (degrees and neighbor degrees), could a simple model that only “looks one hop around each node” tell $G1$ and $G2$ apart?
How might a model do better?

Deep Learning with Graphs¶

Answer briefly:

Why can’t we directly apply CNNs to graphs like we do for images?
What property must the model have to generalize across graphs of different sizes?
Why is this property natural for graphs but not for images?

Programming: NetworkX and PyG Basics¶

NetworkX is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks. For more info, check out its documentation.

PyG (PyTorch Geometric) is a library built upon PyTorch to easily write and train Graph Neural Networks (GNNs) for a wide range of applications related to structured data. Check out the documentation here.

Building a Simple Graph¶

Complete the following code snippet to build the graph given in the first exercise.

import networkx as nx

# create an empty graph
G=nx.Graph()

############# Your code here ############

#########################################

# draw the graph
nx.draw(G, with_labels=True)

Zachary’s karate club network¶

Now let’s consider a more interesting graph: The Karate Club Network is a graph which describes a social network of 34 members of a karate club and documents links between members who interacted outside the club.

The following code snippet loads and draws the graph. Also notice that the average_degree function is left for you to implement.

G = nx.karate_club_graph()

# Visualize the graph
nx.draw(G, with_labels = True)

def average_degree(num_edges, num_nodes):
    # TODO: Implement this function that takes number of edges
    # and number of nodes, and returns the average node degree of
    # the graph.

    avg_degree = 0

    ############# Your code here ############

    #########################################

    return avg_degree

num_edges = G.number_of_edges()
num_nodes = G.number_of_nodes()
avg_degree = average_degree(num_edges, num_nodes)
print("Average degree of karate club network is {}".format(avg_degree))

Average degree of karate club network is 0

NetworkX Graph to PyTorch Tensor¶

Now let’s transform the graph $G$ into a PyTorch tensor. First, let’s check if PyTorch is properly installed.

import torch
print(torch.__version__)

2.4.1

We can generate PyTorch tensor with all zeros, ones or random values.

# Generate 3 x 4 tensor with all ones
ones = torch.ones(3, 4)
print(ones)

# Generate 3 x 4 tensor with all zeros
zeros = torch.zeros(3, 4)
print(zeros)

# Generate 3 x 4 tensor with random values on the interval [0, 1)
random_tensor = torch.rand(3, 4)
print(random_tensor)

# Get the shape of the tensor
print(ones.shape)

tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])
tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])
tensor([[0.7204, 0.9176, 0.1714, 0.6785],
        [0.2893, 0.4568, 0.6883, 0.8125],
        [0.4675, 0.7884, 0.3535, 0.7510]])
torch.Size([3, 4])

PyTorch tensor contains elements for a single data type, the dtype.

# Create a 3 x 4 tensor with all 32-bit floating point zeros
zeros = torch.zeros(3, 4, dtype=torch.float32)
print(zeros.dtype)

# Change the tensor dtype to 64-bit integer
zeros = zeros.type(torch.long)
print(zeros.dtype)

torch.float32
torch.int64

In the next code snippet, complete the two functions to get the edge list of the karate club network and transform it into torch.LongTensor. What is the torch.sum value of pos_edge_index tensor?

def graph_to_edge_list(G):
    # TODO: Implement the function that returns the edge list of
    # an nx.Graph. The returned edge_list should be a list of tuples
    # where each tuple is a tuple representing an edge connected
    # by two nodes.

    edge_list = []

    ############# Your code here ############

    #########################################

    return edge_list

def edge_list_to_tensor(edge_list):
    # TODO: Implement the function that transforms the edge_list to
    # tensor. The input edge_list is a list of tuples and the resulting
    # tensor should have the shape [2, len(edge_list)].

    edge_index = torch.tensor([])

    ############# Your code here ############

    #########################################

    return edge_index

pos_edge_list = graph_to_edge_list(G)
pos_edge_index = edge_list_to_tensor(pos_edge_list)
print("The pos_edge_index tensor has shape {}".format(pos_edge_index.shape))
print("The pos_edge_index tensor has sum value {}".format(torch.sum(pos_edge_index)))

The pos_edge_index tensor has shape torch.Size([0])
The pos_edge_index tensor has sum value 0.0

Question: What could be the benefit of the Edge Index Tensor representation over the adjacency list?

Machine Learning with Graphs

Node Embeddings