Attention Mechanism
The neural network component that lets models focus on relevant parts of their input when generating each output.
Attention allows each part of the model's output to "attend to" (focus on) specific parts of the input. When translating "the cat sat on the mat" to French, the attention mechanism helps the model focus on "cat" when generating "chat." In self-attention (used in Transformers), every word in the input attends to every other word.
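The self-attention described above can be sketched as scaled dot-product attention, the form used in Transformers. This is a minimal NumPy illustration, not a production implementation; the projection matrices Wq, Wk, and Wv and the toy dimensions are illustrative choices.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the max before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Project each token embedding into a query, key, and value vector.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    # Score every token's query against every token's key,
    # scaled by sqrt(d_k) to keep the dot products in a stable range.
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax turns each row of scores into attention weights summing to 1:
    # how much each token "attends to" every other token.
    weights = softmax(scores, axis=-1)
    # Each output is a weighted average of the value vectors.
    return weights @ V, weights

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 8))       # 6 tokens, embedding dimension 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out, weights = self_attention(X, Wq, Wk, Wv)
print(out.shape)                  # (6, 8): one output vector per token
```

Note that `weights` is a 6x6 matrix: row i holds how strongly token i attends to every token in the sequence, which is exactly the "every word attends to every other word" behavior described above.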
Attention is the core innovation that makes Transformers work. It solved the problem of long-range dependencies (understanding relationships between distant words) that plagued earlier recurrent architectures, which had to pass information step by step through a sequence. The "Attention Is All You Need" paper title was nearly literal: the Transformer dropped recurrence and convolutions entirely, relying on attention plus simple feed-forward layers, and still outperformed the previous state of the art.
Jay Alammar: "The Illustrated Transformer" - https://jalammar.github.io/illustrated-transformer/
3Blue1Brown: "Attention in transformers" - https://www.youtube.com/watch?v=eMlx5fFNoYc