Transformer
The neural network architecture behind virtually all modern language models, based on self-attention mechanisms.
The Transformer is a neural network architecture introduced in the 2017 paper "Attention Is All You Need." It processes input sequences using self-attention, which lets every element in the sequence attend to every other element simultaneously. This replaced older recurrent (RNN) and convolutional (CNN) approaches.
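The core of self-attention is scaled dot-product attention: each position projects into query, key, and value vectors, scores itself against every other position, and takes a weighted sum of values. Below is a minimal NumPy sketch of a single attention head; the random weight matrices and dimensions are illustrative, not from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention for one head.

    X: (seq_len, d_model) input sequence.
    Returns: (seq_len, d_k) attended output.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    # Every position scores every other position in one matrix multiply
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V

# Toy example with random weights (illustrative only)
rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
X = rng.standard_normal((seq_len, d_model))
Wq, Wk, Wv = (rng.standard_normal((d_model, d_k)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

Because the score matrix is computed in a single matrix product, every position attends to every other position simultaneously, which is what lets Transformers parallelize across the sequence in a way RNNs cannot.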
Nearly every major AI model today - GPT, Claude, Gemini, LLaMA, Whisper - is built on the Transformer architecture, and even diffusion models like Stable Diffusion rely on attention layers inside their networks. Understanding Transformers at a high level helps explain why modern AI works the way it does and why scale (more parameters, more data) has continued to improve performance.
Vaswani et al., "Attention Is All You Need" - https://arxiv.org/abs/1706.03762
Jay Alammar: "The Illustrated Transformer" - https://jalammar.github.io/illustrated-transformer/