AI-101

Distillation

Training a smaller, faster model to mimic the behavior of a larger, more capable model.

training, optimization
AI Confidence: 85%

AI-generated

What It Means

Knowledge distillation trains a small "student" model to reproduce the outputs of a large "teacher" model. The student learns not only the correct answers ("hard labels") but also the teacher's full probability distribution over possible answers ("soft labels"), which carries richer information than binary correct/incorrect labels, such as which wrong answers the teacher considers plausible.
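A minimal sketch of the core idea, following the loss from Hinton et al. (2015): both teacher and student logits are softened with a temperature T, and the student is trained to minimize the KL divergence between the two softened distributions (in practice this term is combined with an ordinary cross-entropy loss on the hard labels; the function names and example logits here are illustrative).

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax with temperature scaling; higher T gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions.

    The T^2 factor keeps gradient magnitudes comparable across temperatures,
    as suggested in the Hinton et al. paper.
    """
    p = softmax(teacher_logits, temperature)  # teacher's soft targets
    q = softmax(student_logits, temperature)  # student's predictions
    return temperature ** 2 * sum(
        pi * math.log(pi / qi) for pi, qi in zip(p, q)
    )

# The loss is zero when the student matches the teacher exactly,
# and positive otherwise.
same = distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
different = distillation_loss([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

Raising the temperature spreads probability mass onto the wrong-but-plausible answers, which is exactly the extra signal the student learns from.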

Why It Matters

Distillation is how AI labs create smaller, cheaper models that remain capable: many fast, affordable models are distilled from larger ones. This makes AI practical for applications where cost and latency matter. For example, a distilled model can run on a phone while retaining most of the quality of a server-side model.

Sources & Further Reading

Hinton et al., "Distilling the Knowledge in a Neural Network" (2015) - https://arxiv.org/abs/1503.02531

Hugging Face: Model distillation guide - https://huggingface.co/docs/transformers/model_doc/distilbert