Inference
The process of using a trained AI model to generate outputs from new inputs.
AI-generated
Inference is when a trained model processes new input and produces output. When you send a message to Claude and get a response, that is inference. It is distinct from training, where the model learns; inference is applying what was already learned.
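The training/inference split can be sketched with a toy model (a hypothetical one-weight linear model, nothing like a real language model): training adjusts the weight from data once, and inference then applies the learned weight to inputs the model has never seen.

```python
# Toy illustration of training vs. inference (hypothetical model,
# not how Claude or any production system actually works).

def train(data, lr=0.01, epochs=200):
    """Training: adjust the weight w so that w * x fits the (x, y) pairs."""
    w = 0.0
    for _ in range(epochs):
        for x, y in data:
            pred = w * x
            grad = 2 * (pred - y) * x  # gradient of squared error w.r.t. w
            w -= lr * grad
    return w

def infer(w, x):
    """Inference: apply the already-learned weight to a new input.
    No learning happens here; w is fixed."""
    return w * x

w = train([(1, 2), (2, 4), (3, 6)])  # learn the rule y = 2x
print(infer(w, 10))                  # inference on an unseen input → 20.0
```

Note the asymmetry: `train` loops over the data many times and updates the model, while `infer` is a single cheap forward pass. That asymmetry scales up to real systems, where training happens once at great expense and inference runs billions of times.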
Inference cost and speed determine the practical usability of AI models. A model that is brilliant but takes 30 seconds to respond is less useful than one that is good and responds in 2 seconds. Inference costs are the dominant expense for companies running AI services, which is why efficient inference is a major area of innovation.
Hugging Face: Inference API - https://huggingface.co/docs/api-inference
NVIDIA: What is AI inference? - https://www.nvidia.com/en-us/glossary/ai-inference/