Warm-up: What is actually learned?
In KG embedding models like TransE:
What are the trainable parameters?
Suppose a new entity is added to the KG after training. Can TransE produce an embedding for it without retraining? Why or why not?
Let’s assume that you observe one triple $(h, r, t_{\text{new}})$ involving the new entity $t_{\text{new}}$. Could you heuristically assign an embedding to $t_{\text{new}}$? What are the limitations? Hint: Think of the translation $\mathbf{h} + \mathbf{r} \approx \mathbf{t}$.
Given a trained TransE model and a query $(h, r, ?)$, inference is performed by computing the query embedding $\mathbf{q} = \mathbf{h} + \mathbf{r}$. How can we get semantic understanding of what the embedding vector $\mathbf{q}$ corresponds to?
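For the questions above and in the next section, recall the standard TransE scoring function: a triple $(h, r, t)$ is scored by how well the relation vector translates the head embedding onto the tail embedding,

$$f(h, r, t) = -\|\mathbf{h} + \mathbf{r} - \mathbf{t}\|_p, \qquad p \in \{1, 2\},$$

so a plausible triple satisfies $\mathbf{h} + \mathbf{r} \approx \mathbf{t}$, and higher (less negative) scores indicate more plausible triples.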
TransE Mechanics
Consider the following 2D embeddings:
1. Compute the TransE score of each of the two triples. Which triple is more plausible?
2. Assume that relation $r$ is symmetric (e.g., “siblingOf”), meaning that $(h, r, t)$ and $(t, r, h)$ are both true. Write down the TransE equations implied by symmetry (see the hint after these questions). What’s the issue here?
3. Assume that relation $r$ is 1-to-N (e.g., “studentOf”): $(h, r, t_1)$ and $(h, r, t_2)$ are true with $t_1 \neq t_2$. Write the TransE equations for both triples. What geometric constraint does this impose on $\mathbf{t}_1$ and $\mathbf{t}_2$?
4. Suppose we increase the embedding dimension from $d$ to some larger $d' > d$. Does this resolve the issues in (2) and (3)? Justify mathematically.
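Hint for (2): applying the translation constraint to both triples of a symmetric relation gives

$$\mathbf{h} + \mathbf{r} \approx \mathbf{t} \quad \text{and} \quad \mathbf{t} + \mathbf{r} \approx \mathbf{h};$$

consider what these two conditions jointly force $\mathbf{r}$ to be.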
Path Queries
Consider a knowledge graph with the following entity types:
Person
Company
City
University
Country
and the following directed relations:
worksAt(Person → Company)
locatedIn(Company → City)
studiedAt(Person → University)
locatedIn(University → City)
basedIn(City → Country)
foundedBy(Company → Person)
1. For each of the following natural-language queries, write the formal path query (a general notation sketch follows after these questions).
$q_1$: Which company does Alice work at?
$q_2$: In which city is the company where Alice works located?
$q_3$: In which country is the company where Alice works based?
$q_4$: Which people work at companies located in Berlin?
$q_5$: Which people studied at universities located in the same city as the company where Alice works?
$q_6$: Which people studied at universities located in cities where companies founded by Bob are based?
$q_7$: Which people both studied at universities in Berlin and work at companies based in Germany?
2. Which of the queries in (1) can TransE represent?
3. What if we simply use $-\mathbf{r}$ to represent the inverse of a relation $r$? Would that work?
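For reference, one common way (assumed here; the lecture notation may differ slightly) to write a path query is as an anchor entity followed by a sequence of relations,

$$q = (v_a, (r_1, r_2, \dots, r_n)),$$

which TransE embeds by translating the anchor embedding along the relation vectors, $\mathbf{q} = \mathbf{v}_a + \mathbf{r}_1 + \cdots + \mathbf{r}_n$; answers are the entities $t$ with the smallest distance $\|\mathbf{q} - \mathbf{t}\|$. This is exactly the vector-space traversal used in the programming exercise below.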
Programming: TransE on a Toy KG
In this exercise, we’ll use the PyG implementation of TransE on a toy KG, which is given in the following code snippet:
import torch
from torch_geometric.data import Data
from torch_geometric.nn.kge import TransE
# toy KG
entities = [
"Alice", "Bob", "Charlie",
"CompanyA", "CompanyB",
"Berlin", "Munich"
]
relations = ["worksAt", "locatedIn"]
triples = [
("Alice", "worksAt", "CompanyA"),
("Bob", "worksAt", "CompanyA"),
("Charlie", "worksAt", "CompanyB"),
("CompanyA","locatedIn", "Berlin"),
("CompanyB","locatedIn", "Munich"),
]
# helper mappings
ent2id = {e: i for i, e in enumerate(entities)}
rel2id = {r: i for i, r in enumerate(relations)}
head = torch.tensor([ent2id[h] for h, _, _ in triples], dtype=torch.long)
rel = torch.tensor([rel2id[r] for _, r, _ in triples], dtype=torch.long)
tail = torch.tensor([ent2id[t] for _, _, t in triples], dtype=torch.long)
# store as PyG data
edge_index = torch.stack([head, tail], dim=0)
edge_type = rel
data = Data(num_nodes=len(entities), edge_index=edge_index, edge_type=edge_type)

In the following code snippet, we define the model using the TransE implementation in PyG. Complete the train function. Use all the triples given in the KG.
# model setup
model = TransE(
num_nodes=data.num_nodes,
num_relations=len(relations),
hidden_channels=2,
margin=1.0,
p_norm=2.0, # use L2 distance
sparse=False,
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
# training loop
def train(steps=2000, seed=42):
torch.manual_seed(seed)
model.train()
# === YOUR CODE HERE ===
# ======================
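# For reference, one possible way to fill in the body above (a sketch, not the
# only valid solution). It relies on PyG's built-in KGEModel.loss(), which
# computes a margin-ranking loss against randomly corrupted negative triples:
#
#   for step in range(steps):
#       optimizer.zero_grad()
#       loss = model.loss(head, rel, tail)  # all triples of the toy KG
#       loss.backward()
#       optimizer.step()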
train()

Now, using the following inference functions, test the given cases (example invocations for each case are sketched after the helper definitions):
Rank Company tails for (Alice, worksAt, ?)
Rank City tails for (CompanyA, locatedIn, ?)
Vector-space path query: q = Alice + worksAt + locatedIn, rank cities by distance
Heuristic inverse: q = CompanyA - worksAt, rank persons by distance
# inference helpers
@torch.no_grad()
def rank_tails(head_name: str, rel_name: str, candidate_tail_names=None, topk=5):
"""
Ranks candidate tails for query (head, rel, ?) by TransE score.
Higher score = more plausible.
"""
model.eval()
h = torch.tensor([ent2id[head_name]], dtype=torch.long)
r = torch.tensor([rel2id[rel_name]], dtype=torch.long)
if candidate_tail_names is None:
cand_ids = torch.arange(len(entities), dtype=torch.long)
cand_names = entities
else:
cand_ids = torch.tensor([ent2id[x] for x in candidate_tail_names], dtype=torch.long)
cand_names = candidate_tail_names
# broadcast h,r to match candidates
h_rep = h.repeat(cand_ids.numel())
r_rep = r.repeat(cand_ids.numel())
scores = model(h_rep, r_rep, cand_ids) # forward() returns score for triplets
vals, pos = torch.topk(scores, k=min(topk, scores.numel()), largest=True)
return [(cand_names[i], float(v)) for i, v in zip(pos.tolist(), vals.tolist())]
# for vector-space traversal (q = h + r1 + r2)
@torch.no_grad()
def get_entity_emb(name: str) -> torch.Tensor:
return model.node_emb.weight[ent2id[name]]
@torch.no_grad()
def get_rel_emb(name: str) -> torch.Tensor:
return model.rel_emb.weight[rel2id[name]]
@torch.no_grad()
def rank_by_query_vector(q_vec: torch.Tensor, candidate_names):
cand = torch.stack([get_entity_emb(n) for n in candidate_names], dim=0)
# TransE score is -||q - t||, so ranking by smallest distance is equivalent
d = torch.norm(cand - q_vec.unsqueeze(0), p=2, dim=-1)
order = torch.argsort(d, descending=False)
return [(candidate_names[i], float(d[i])) for i in order.tolist()]
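As a sanity check, the four test cases above can be run with these helpers roughly as follows (a sketch; it assumes the train function has been completed and executed):

# 1) Rank Company tails for (Alice, worksAt, ?)
print(rank_tails("Alice", "worksAt", candidate_tail_names=["CompanyA", "CompanyB"]))
# 2) Rank City tails for (CompanyA, locatedIn, ?)
print(rank_tails("CompanyA", "locatedIn", candidate_tail_names=["Berlin", "Munich"]))
# 3) Vector-space path query: q = Alice + worksAt + locatedIn
q = get_entity_emb("Alice") + get_rel_emb("worksAt") + get_rel_emb("locatedIn")
print(rank_by_query_vector(q, ["Berlin", "Munich"]))
# 4) Heuristic inverse: q = CompanyA - worksAt
q_inv = get_entity_emb("CompanyA") - get_rel_emb("worksAt")
print(rank_by_query_vector(q_inv, ["Alice", "Bob", "Charlie"]))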