AI-101

Token

The basic unit of text that AI models process - roughly 3/4 of a word in English.

core-concepts, technical
AI Confidence: 85%

AI-generated

What It Means

Language models do not process text as characters or words. They break text into tokens using a tokenizer. In English, one token is roughly 3/4 of a word, so "artificial intelligence" is typically 3 tokens. Common words are often single tokens; rare words get split into multiple tokens.
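The "3/4 of a word" ratio gives a quick way to ballpark token counts before calling an API. A minimal sketch in Python using that rule of thumb (a heuristic only; real tokenizers use byte-pair encoding and give exact, text-dependent counts — tools like the OpenAI Tokenizer linked below show the true split):

```python
import math

def estimate_tokens(text: str) -> int:
    """Rough token estimate for English text.

    One token is roughly 3/4 of a word, so tokens ~= words * 4/3.
    This is a heuristic; actual counts vary by tokenizer and text.
    """
    words = len(text.split())
    return math.ceil(words * 4 / 3)

print(estimate_tokens("artificial intelligence"))  # -> 3
```

For exact counts, use the provider's own tokenizer rather than this estimate.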

Why It Matters

Tokens determine cost and limits. API pricing is per-token, and context window size is measured in tokens. When a model has a "200K context window," it can process approximately 150,000 words at once (200,000 tokens × ~0.75 words per token). Understanding tokens helps you estimate costs and work within model limits.
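Both estimates above can be sketched in a few lines. This assumes a hypothetical price of $3 per million input tokens purely for illustration; check your provider's current pricing:

```python
def estimate_cost_usd(num_tokens: int, price_per_million: float = 3.00) -> float:
    # API pricing is per token, usually quoted per million tokens.
    # price_per_million here is a hypothetical example rate.
    return num_tokens / 1_000_000 * price_per_million

def context_window_in_words(context_tokens: int) -> int:
    # One token is roughly 3/4 of an English word.
    return int(context_tokens * 0.75)

print(context_window_in_words(200_000))  # -> 150000 words
print(estimate_cost_usd(200_000))        # -> 0.6 (at the example rate)
```

The same arithmetic works in reverse: to fit a document into a budget or a context window, divide the token limit by 4/3 to get an approximate word limit.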

Sources & Further Reading

OpenAI Tokenizer tool - https://platform.openai.com/tokenizer

Anthropic: Token counting - https://docs.anthropic.com/en/docs/build-with-claude/token-counting