1. Introduction to Attention Mechanisms
2. Query, Key, and Value Matrices
3. Getting Started with Our Step-by-Step Attention Calculation
4. Calculating Key Vectors
5. Query Matrix Introduction
6. Calculating Raw Attention Scores
7. Understanding the Mathematics Behind Dot Products and Vector Alignment
8. Visualising Raw Attention Scores in Two Dimensions
9. Converting Raw Attention Scores to Probability Distributions with Softmax
10. Normalisation and Scaling
11. Understanding the Value Matrix and Value Vector
12. Calculating the Final Context-Aware Rich Representation for the Word "river"
13. Understanding the Output
14. Understanding Multi-Head Attention
15. Multi-Head Attention Example and Subsequent Layers
16. Masked Language Modeling