Token
The basic unit of text that AI models process - roughly 3/4 of a word in English.
AI-generated
Language models do not process text as characters or words; they break it into tokens using a tokenizer. In English, one token is roughly 3/4 of a word, so "artificial intelligence" is typically 3 tokens. Common words are often single tokens, while rare words get split into multiple tokens.
Tokens determine cost and limits: API pricing is per-token, and context window size is measured in tokens. When a model has a "200K context window," it can process roughly 150,000 English words at once. Understanding tokens helps you estimate costs and work within model limits.
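The arithmetic above can be sketched as a rough estimator. This is only the ~3/4-word-per-token heuristic from the text, not a real tokenizer; the function names and the per-million-token price are illustrative assumptions. For exact counts, use the provider tools linked below.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4/3 tokens per English word (heuristic only)."""
    words = len(text.split())
    return round(words * 4 / 3)

def estimate_cost_usd(text: str, usd_per_million_tokens: float) -> float:
    """Estimate input cost at a given (hypothetical) per-million-token price."""
    return estimate_tokens(text) / 1_000_000 * usd_per_million_tokens

# "artificial intelligence" is 2 words -> ~3 tokens, matching the example above
print(estimate_tokens("artificial intelligence"))  # → 3

# A 200K-token context window holds roughly 200,000 * 3/4 = 150,000 words
print(int(200_000 * 3 / 4))  # → 150000
```

Real tokenizers vary by model and handle punctuation, whitespace, and non-English text differently, so treat this as a ballpark figure for budgeting, not a billing-accurate count.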
OpenAI Tokenizer tool - https://platform.openai.com/tokenizer
Anthropic: Token counting - https://docs.anthropic.com/en/docs/build-with-claude/token-counting