Inference
The process of using a trained AI model to generate outputs from new inputs.
AI-generated
Inference is when a trained model processes new input and produces output. When you send a message to Claude and get a response, that is inference. It is distinct from training, where the model learns; inference is applying what was already learned.
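The training/inference split can be sketched with a toy model (a hypothetical one-weight linear model, nothing like a real language model): training adjusts the weight from data once, and inference then applies the learned weight to inputs the model has never seen.

```python
# Toy illustration of training vs. inference (hypothetical model,
# not how Claude or any production system actually works).

def train(data, lr=0.01, epochs=200):
    """Training: adjust the weight w so that w * x fits the (x, y) pairs."""
    w = 0.0
    for _ in range(epochs):
        for x, y in data:
            pred = w * x
            grad = 2 * (pred - y) * x  # gradient of squared error w.r.t. w
            w -= lr * grad
    return w

def infer(w, x):
    """Inference: apply the already-learned weight to a new input.
    No learning happens here; w is fixed."""
    return w * x

w = train([(1, 2), (2, 4), (3, 6)])  # learn the rule y = 2x
print(infer(w, 10))                  # inference on an unseen input → 20.0
```

Note the asymmetry: `train` loops over the data many times and updates the model, while `infer` is a single cheap forward pass. That asymmetry scales up to real systems, where training happens once at great expense and inference runs billions of times.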
Inference cost and speed determine the practical usability of AI models. A model that is brilliant but takes 30 seconds to respond is less useful than one that is good and responds in 2 seconds. Inference costs are the dominant expense for companies running AI services, which is why efficient inference is a major area of innovation.
Hugging Face: Inference API - https://huggingface.co/docs/api-inference
NVIDIA: What is AI inference? - https://www.nvidia.com/en-us/glossary/ai-inference/