Warm-up: What is actually learned?
In KG embedding models like TransE:
What are the trainable parameters?
Suppose a new entity is added to the KG after training. Can TransE produce an embedding for it without retraining? Why or why not?
Let’s assume that you observe one triple $(h, r, t_{\text{new}})$ involving the new entity $t_{\text{new}}$. Could you heuristically assign an embedding to $t_{\text{new}}$? What are the limitations? Hint: Think of the translation $\mathbf{h} + \mathbf{r} \approx \mathbf{t}$.
Given a trained TransE model and a query $(h, r, ?)$, inference is performed by computing the query embedding $\mathbf{q} = \mathbf{h} + \mathbf{r}$. How can we get semantic understanding of what the embedding vector $\mathbf{q}$ corresponds to?
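For the questions above and in the next section, recall the standard TransE scoring function: a triple $(h, r, t)$ is scored by how well the relation vector translates the head embedding onto the tail embedding,

$$f(h, r, t) = -\|\mathbf{h} + \mathbf{r} - \mathbf{t}\|_p, \qquad p \in \{1, 2\},$$

so a plausible triple satisfies $\mathbf{h} + \mathbf{r} \approx \mathbf{t}$, and higher (less negative) scores indicate more plausible triples.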
TransE Mechanics
Consider the following 2D embeddings:
1. Compute the TransE score of each of the two triples. Which triple is more plausible?
2. Assume that relation $r$ is symmetric (e.g., “siblingOf”), meaning that $(h, r, t)$ and $(t, r, h)$ are both true. Write down the TransE equations implied by symmetry (see the hint after these questions). What’s the issue here?
3. Assume that relation $r$ is 1-to-N (e.g., “studentOf”): $(h, r, t_1)$ and $(h, r, t_2)$ are true with $t_1 \neq t_2$. Write the TransE equations for both triples. What geometric constraint does this impose on $\mathbf{t}_1$ and $\mathbf{t}_2$?
4. Suppose we increase the embedding dimension from $d$ to some larger $d' > d$. Does this resolve the issues in (2) and (3)? Justify mathematically.
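Hint for (2): applying the translation constraint to both triples of a symmetric relation gives

$$\mathbf{h} + \mathbf{r} \approx \mathbf{t} \quad \text{and} \quad \mathbf{t} + \mathbf{r} \approx \mathbf{h};$$

consider what these two conditions jointly force $\mathbf{r}$ to be.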
Path Queries
Consider a knowledge graph with the following entity types:
Person
Company
City
University
Country
and the following directed relations:
worksAt(Person → Company)
locatedIn(Company → City)
studiedAt(Person → University)
locatedIn(University → City)
basedIn(City → Country)
foundedBy(Company → Person)
1. For each of the following natural-language queries, write the formal path query (a general notation sketch follows after these questions).
$q_1$: Which company does Alice work at?
$q_2$: In which city is the company where Alice works located?
$q_3$: In which country is the company where Alice works based?
$q_4$: Which people work at companies located in Berlin?
$q_5$: Which people studied at universities located in the same city as the company where Alice works?
$q_6$: Which people studied at universities located in cities where companies founded by Bob are based?
$q_7$: Which people both studied at universities in Berlin and work at companies based in Germany?
2. Which of the queries in (1) can TransE represent?
3. What if we simply use $-\mathbf{r}$ to represent the inverse of a relation $r$? Would that work?
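For reference, one common way (assumed here; the lecture notation may differ slightly) to write a path query is as an anchor entity followed by a sequence of relations,

$$q = (v_a, (r_1, r_2, \dots, r_n)),$$

which TransE embeds by translating the anchor embedding along the relation vectors, $\mathbf{q} = \mathbf{v}_a + \mathbf{r}_1 + \cdots + \mathbf{r}_n$; answers are the entities $t$ with the smallest distance $\|\mathbf{q} - \mathbf{t}\|$. This is exactly the vector-space traversal used in the programming exercise below.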
Programming: TransE on a Toy KG
In this exercise, we’ll use the PyG implementation of TransE on a toy KG, which is given in the following code snippet:
import torch
from torch_geometric.data import Data
from torch_geometric.nn.kge import TransE
# toy KG
entities = [
"Alice", "Bob", "Charlie",
"CompanyA", "CompanyB",
"Berlin", "Munich"
]
relations = ["worksAt", "locatedIn"]
triples = [
("Alice", "worksAt", "CompanyA"),
("Bob", "worksAt", "CompanyA"),
("Charlie", "worksAt", "CompanyB"),
("CompanyA","locatedIn", "Berlin"),
("CompanyB","locatedIn", "Munich"),
]
# helper mappings
ent2id = {e: i for i, e in enumerate(entities)}
rel2id = {r: i for i, r in enumerate(relations)}
head = torch.tensor([ent2id[h] for h, _, _ in triples], dtype=torch.long)
rel = torch.tensor([rel2id[r] for _, r, _ in triples], dtype=torch.long)
tail = torch.tensor([ent2id[t] for _, _, t in triples], dtype=torch.long)
# store as PyG data
edge_index = torch.stack([head, tail], dim=0)
edge_type = rel
data = Data(num_nodes=len(entities), edge_index=edge_index, edge_type=edge_type)

In the following code snippet, we define the model using the TransE implementation in PyG. Complete the train function. Use all the triples given in the KG.
# model setup
model = TransE(
num_nodes=data.num_nodes,
num_relations=len(relations),
hidden_channels=2,
margin=1.0,
p_norm=2.0, # use L2 distance
sparse=False,
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
# training loop
def train(steps=2000, seed=42):
torch.manual_seed(seed)
model.train()
# === YOUR CODE HERE ===
# ======================
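# For reference, one possible way to fill in the body above (a sketch, not the
# only valid solution). It relies on PyG's built-in KGEModel.loss(), which
# computes a margin-ranking loss against randomly corrupted negative triples:
#
#   for step in range(steps):
#       optimizer.zero_grad()
#       loss = model.loss(head, rel, tail)  # all triples of the toy KG
#       loss.backward()
#       optimizer.step()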
train()

Now, using the following inference functions, test the given cases (example invocations for each case are sketched after the helper definitions):
Rank Company tails for (Alice, worksAt, ?)
Rank City tails for (CompanyA, locatedIn, ?)
Vector-space path query: q = Alice + worksAt + locatedIn, rank cities by distance
Heuristic inverse: q = CompanyA - worksAt, rank persons by distance
# inference helpers
@torch.no_grad()
def rank_tails(head_name: str, rel_name: str, candidate_tail_names=None, topk=5):
"""
Ranks candidate tails for query (head, rel, ?) by TransE score.
Higher score = more plausible.
"""
model.eval()
h = torch.tensor([ent2id[head_name]], dtype=torch.long)
r = torch.tensor([rel2id[rel_name]], dtype=torch.long)
if candidate_tail_names is None:
cand_ids = torch.arange(len(entities), dtype=torch.long)
cand_names = entities
else:
cand_ids = torch.tensor([ent2id[x] for x in candidate_tail_names], dtype=torch.long)
cand_names = candidate_tail_names
# broadcast h,r to match candidates
h_rep = h.repeat(cand_ids.numel())
r_rep = r.repeat(cand_ids.numel())
scores = model(h_rep, r_rep, cand_ids) # forward() returns score for triplets
vals, pos = torch.topk(scores, k=min(topk, scores.numel()), largest=True)
return [(cand_names[i], float(v)) for i, v in zip(pos.tolist(), vals.tolist())]
# for vector-space traversal (q = h + r1 + r2)
@torch.no_grad()
def get_entity_emb(name: str) -> torch.Tensor:
return model.node_emb.weight[ent2id[name]]
@torch.no_grad()
def get_rel_emb(name: str) -> torch.Tensor:
return model.rel_emb.weight[rel2id[name]]
@torch.no_grad()
def rank_by_query_vector(q_vec: torch.Tensor, candidate_names):
cand = torch.stack([get_entity_emb(n) for n in candidate_names], dim=0)
# TransE score is -||q - t||, so ranking by smallest distance is equivalent
d = torch.norm(cand - q_vec.unsqueeze(0), p=2, dim=-1)
order = torch.argsort(d, descending=False)
return [(candidate_names[i], float(d[i])) for i in order.tolist()]
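As a sanity check, the four test cases above can be run with these helpers roughly as follows (a sketch; it assumes the train function has been completed and executed):

# 1) Rank Company tails for (Alice, worksAt, ?)
print(rank_tails("Alice", "worksAt", candidate_tail_names=["CompanyA", "CompanyB"]))
# 2) Rank City tails for (CompanyA, locatedIn, ?)
print(rank_tails("CompanyA", "locatedIn", candidate_tail_names=["Berlin", "Munich"]))
# 3) Vector-space path query: q = Alice + worksAt + locatedIn
q = get_entity_emb("Alice") + get_rel_emb("worksAt") + get_rel_emb("locatedIn")
print(rank_by_query_vector(q, ["Berlin", "Munich"]))
# 4) Heuristic inverse: q = CompanyA - worksAt
q_inv = get_entity_emb("CompanyA") - get_rel_emb("worksAt")
print(rank_by_query_vector(q_inv, ["Alice", "Bob", "Charlie"]))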