Distillation
Training a smaller, faster model to mimic the behavior of a larger, more capable model.
AI-generated
Knowledge distillation trains a small "student" model to reproduce the outputs of a large "teacher" model. Rather than learning only the correct answer, the student matches the teacher's full probability distribution over possible answers (its "soft targets"), which carries a richer signal than hard correct/incorrect labels: it encodes how confident the teacher is and which wrong answers it considers plausible.
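The soft-target idea can be sketched in a few lines of NumPy. This follows the loss from Hinton et al.: both teacher and student logits are softened with a temperature T, and the student minimizes the KL divergence between the two distributions, scaled by T² to keep gradient magnitudes comparable. The function names and the example logits here are illustrative, not from any particular library.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    # Divide logits by the temperature before normalizing;
    # higher temperatures produce softer (more uniform) distributions.
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # KL divergence between the softened teacher and student distributions,
    # scaled by T^2 as in Hinton et al. (2015).
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)), axis=-1)
    return (temperature ** 2) * kl.mean()

# Illustrative logits: the teacher is confident in class 0 but assigns
# some probability to class 1 — information a one-hot label would discard.
teacher = np.array([[4.0, 1.0, 0.2]])
student = np.array([[2.0, 2.0, 0.2]])
loss = distillation_loss(student, teacher)
```

In practice this KL term is usually mixed with the ordinary cross-entropy loss on the true labels, with a weighting hyperparameter balancing the two.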
Distillation is how AI labs create smaller, cheaper models that are still capable. Many fast, affordable models are distilled from larger ones. This makes AI accessible for applications where cost and speed matter. For example, a distilled model can run on a phone while providing most of the quality of a server-side model.
Hinton et al., "Distilling the Knowledge in a Neural Network" - https://arxiv.org/abs/1503.02531
Hugging Face: Model distillation guide - https://huggingface.co/docs/transformers/model_doc/distilbert