In the previous post, we studied multi-headed attention mechanisms. In this post, we will look into the details of Graph Attention Networks. Graphs make very useful data structures: their flexibility and versatility allow us to easily capture complex relationships between data points. In fact, a number of important real-life datasets have a graph structure: social networks, brain connectomes, image scene graphs, etc. Being able to find the underlying patterns in this kind of data is therefore an important direction for machine learning research.
In the previous post, we briefly looked into soft and hard attention mechanisms. We discussed why soft attention is much more popular than hard attention in machine learning. In this post, we look into the details of multi-head attention. We will also study the reasons for, and effects of, using multiple heads in an attention mechanism.
In the previous part of this blog series, we looked into the details of self- and cross-attention mechanisms. We discussed how these attention mechanisms are utilised in various settings. In the examples we discussed earlier, we used a weighted average to calculate the target vector from a sequence of feature vectors (values) and their corresponding relevance scores. But is that the only way to calculate the target vector? And if there is another method, is it better? We try to answer these questions in this blog post.
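As a quick refresher, the weighted-average computation referred to above can be sketched as follows (a minimal illustration using NumPy; the dimensions and variable names are made up for this example, not taken from the series):

```python
import numpy as np

# A sequence of 4 value (feature) vectors, each of dimension 8 (toy sizes).
values = np.random.rand(4, 8)

# One relevance score per value vector, e.g. produced by some scoring function.
scores = np.array([2.0, 0.5, -1.0, 0.3])

# Normalise the scores into weights that sum to 1 (softmax).
weights = np.exp(scores) / np.sum(np.exp(scores))

# The target vector is the weighted average of the value vectors.
target = weights @ values  # shape: (8,)
```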
In the previous part of this series, we looked at an intuitive explanation of attention mechanisms and how to determine relevance between two feature vectors. Next, we classify attention mechanisms and study the types in detail.
Attention-based mechanisms have become quite popular in the field of machine learning. From 3D pose estimation to question answering, attention mechanisms have proven quite useful. Let’s dive right into what attention is and how it has become such a popular concept in machine learning.