AI-101

Paper #1

Attention Is All You Need (2017)

AI Confidence: 80%

AI-generated

TL;DR

This paper introduced the Transformer architecture, which replaced recurrence and convolution with self-attention. It is the foundation of virtually every modern large language model, including GPT, Claude, Gemini, and LLaMA.

What It Does

The authors proposed a new neural network architecture called the Transformer that processes entire sequences in parallel using a mechanism called self-attention. Instead of reading text word by word (like RNNs) or looking at local windows (like CNNs), the Transformer lets every word attend to every other word simultaneously.
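
To make this concrete, here is a minimal NumPy sketch of scaled dot-product self-attention computed over a whole sequence at once. The shapes and variable names are illustrative choices, not taken from the paper:

import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a full sequence in parallel.
    X: (seq_len, d_model) token embeddings
    Wq, Wk, Wv: (d_model, d_k) learned projection matrices
    """
    Q = X @ Wq  # queries: what each token is looking for
    K = X @ Wk  # keys: what each token offers
    V = X @ Wv  # values: the content that gets mixed together
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # every position scores every other position
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V  # each output is a weighted mix of all value vectors

# Toy usage: a "sentence" of 4 tokens with model width 8.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)  # shape (4, 8): one output per token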

A second key idea is multi-head attention: the model learns several different attention patterns in parallel, letting it capture different kinds of relationships (syntactic, semantic, positional) at the same time.
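
To illustrate, here is how multi-head attention could be assembled from the self_attention helper sketched above: each head gets its own smaller projections, the per-head outputs are concatenated, and a final learned matrix mixes them back to model width. The paper uses h = 8 heads with d_k = d_model / h; the two-head shapes below are just for the toy example:

def multi_head_attention(X, heads, Wo):
    """heads: a list of (Wq, Wk, Wv) triples, one per head.
    Each head attends with its own learned projections, so each can
    specialize in a different kind of relationship between tokens."""
    per_head = [self_attention(X, Wq, Wk, Wv) for Wq, Wk, Wv in heads]
    concat = np.concatenate(per_head, axis=-1)  # (seq_len, h * d_k)
    return concat @ Wo  # final projection back to (seq_len, d_model)

# Two heads of width 4 over the same d_model = 8 inputs from above.
heads = [tuple(rng.normal(size=(8, 4)) for _ in range(3)) for _ in range(2)]
Wo = rng.normal(size=(8, 8))
out = multi_head_attention(X, heads, Wo)  # still (4, 8)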

Why It Matters

Before this paper, the dominant sequence models processed tokens one at a time, which made training slow and limited their ability to capture long-range dependencies. The Transformer solved both problems at once: it trains much faster (every position is processed in parallel) and handles long-range dependencies better (any two positions are directly connected through attention rather than through a long chain of recurrent steps).

Nearly every major AI system today is built on Transformers. GPT-4, Claude, Gemini, LLaMA, Mistral, DALL-E, Stable Diffusion, Whisper, and hundreds more descend from this architecture. It is arguably the most impactful machine learning paper of the decade.

Key Details

Authors: Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin (Google Brain, Google Research, and University of Toronto).

Original task: Machine translation (WMT 2014 English-to-German and English-to-French).

Link to paper: https://arxiv.org/abs/1706.03762

Sources & Further Reading

Full paper: https://arxiv.org/abs/1706.03762

The Illustrated Transformer (Jay Alammar): https://jalammar.github.io/illustrated-transformer/

Video walkthrough (Yannic Kilcher): https://www.youtube.com/watch?v=iDulhoQ2pro

"Attention in transformers" (3Blue1Brown): https://www.youtube.com/watch?v=eMlx5fFNoYc