eXplAInable Transformers

Master's Thesis

Since complex deep learning models are hard to interpret and are regarded as black boxes even by domain experts, multiple post-hoc interpretation methods have been proposed. For vision models, these methods are often referred to as saliency maps: they compute a heatmap of importance over the pixel space. Transformer-based models use the attention mechanism, through which the model learns the relevance of the input features during training. This project aims to compare the attribution maps produced by post-hoc interpretation methods with the attention maps of transformer-based models and to investigate their theoretical relationship.
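To illustrate the kind of comparison the project is about, here is a minimal sketch. It is not part of the project description: the toy single-block ViT-style classifier, the vanilla-gradient saliency map, and the Pearson-correlation agreement score are all illustrative assumptions; actual experiments would use pretrained transformers and established attribution methods.

```python
# Sketch: compare a gradient-based saliency map with a CLS-to-patch attention
# map for a toy ViT-style classifier (illustrative assumptions throughout).
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyViT(nn.Module):
    """Tiny ViT-style classifier that also returns its CLS-to-patch attention."""

    def __init__(self, img_size=32, patch=8, dim=64, n_classes=10):
        super().__init__()
        self.n_patches = (img_size // patch) ** 2
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, self.n_patches + 1, dim))
        self.qkv = nn.Linear(dim, 3 * dim)
        self.head = nn.Linear(dim, n_classes)
        self.dim = dim

    def forward(self, x):
        B = x.shape[0]
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2)    # (B, N, dim)
        tokens = torch.cat([self.cls_token.expand(B, -1, -1), tokens], dim=1)
        tokens = tokens + self.pos_embed
        q, k, v = self.qkv(tokens).chunk(3, dim=-1)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.dim ** 0.5, dim=-1)
        out = attn @ v
        logits = self.head(out[:, 0])            # classify from the CLS token
        return logits, attn[:, 0, 1:]            # CLS-to-patch attention weights


def saliency_and_attention(model, image, target):
    """Vanilla-gradient saliency map and upsampled CLS-to-patch attention map."""
    image = image.clone().requires_grad_(True)
    logits, cls_attn = model(image)
    logits[0, target].backward()
    saliency = image.grad.abs().max(dim=1).values[0]               # (H, W)
    side = int(cls_attn.shape[-1] ** 0.5)
    attn_map = F.interpolate(cls_attn.detach().reshape(1, 1, side, side),
                             size=saliency.shape, mode="bilinear",
                             align_corners=False)[0, 0]
    return saliency, attn_map


if __name__ == "__main__":
    torch.manual_seed(0)
    model = ToyViT()
    img = torch.rand(1, 3, 32, 32)
    sal, attn = saliency_and_attention(model, img, target=3)
    # One simple agreement score between the two heatmaps: Pearson correlation.
    corr = torch.corrcoef(torch.stack([sal.flatten(), attn.flatten()]))[0, 1]
    print(f"saliency/attention correlation: {corr.item():.3f}")
```

With an untrained toy model the correlation is essentially noise; the point is only to show the two heatmaps living on the same pixel grid so that they can be compared quantitatively.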

You can best reach us via e-mail.