Attention Mechanism
The neural network component that lets models focus on relevant parts of their input when generating each output.
Attention allows each part of the model's output to "attend to" (focus on) specific parts of the input. When translating "the cat sat on the mat" to French, the attention mechanism helps the model focus on "cat" when generating "chat." In self-attention (used in Transformers), every word in the input attends to every other word.
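The self-attention described above can be sketched as scaled dot-product attention, the form used in Transformers. This is a minimal NumPy illustration, not a production implementation; the projection matrices Wq, Wk, and Wv and the toy dimensions are illustrative choices.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the max before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Project each token embedding into a query, key, and value vector.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    # Score every token's query against every token's key,
    # scaled by sqrt(d_k) to keep the dot products in a stable range.
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax turns each row of scores into attention weights summing to 1:
    # how much each token "attends to" every other token.
    weights = softmax(scores, axis=-1)
    # Each output is a weighted average of the value vectors.
    return weights @ V, weights

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 8))       # 6 tokens, embedding dimension 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out, weights = self_attention(X, Wq, Wk, Wv)
print(out.shape)                  # (6, 8): one output vector per token
```

Note that `weights` is a 6x6 matrix: row i holds how strongly token i attends to every token in the sequence, which is exactly the "every word attends to every other word" behavior described above.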
Attention is the core innovation that makes Transformers work. It solved the problem of long-range dependencies (understanding relationships between distant words) that plagued earlier recurrent architectures, which had to pass information step by step through a sequence. The "Attention Is All You Need" paper title was nearly literal: the Transformer dropped recurrence and convolutions entirely, relying on attention plus simple feed-forward layers, and still outperformed the previous state of the art.
Jay Alammar: "The Illustrated Transformer" - https://jalammar.github.io/illustrated-transformer/
3Blue1Brown: "Attention in transformers" - https://www.youtube.com/watch?v=eMlx5fFNoYc