Lesson 2
How AI "Thinks" (Without the PhD)
AI-generated
- Understand the concept of training data and what AI learns from it
- Grasp how AI generates responses through prediction, not retrieval
- Know why AI can sound confident while being completely wrong
- Develop intuition for when AI output might be unreliable
- Feel comfortable with a "good enough" understanding of the technical details
In the last lesson, you learned that AI is pattern-matching software. But how does that actually work? How can a computer learn from text and then generate new text that sounds remarkably human?
This lesson explains the mechanics in plain language. You do not need a computer science degree. You just need some useful mental models that will help you understand why AI behaves the way it does.
Imagine reading every book in a library, every article on the internet, every conversation ever posted online. Now imagine doing that not to memorize facts, but to notice patterns:
- Which words tend to follow other words
- How sentences are structured
- What topics relate to what other topics
That is roughly what happens during AI training. The model processes enormous amounts of text (hundreds of billions of words) and adjusts its internal parameters to get better at predicting what comes next. It is not building a lookup table of everything it read. Instead, it learns statistical relationships.
For example, the model learns:
- "The capital of France is" is very likely followed by "Paris"
- Formal emails start differently than text messages
- Python code follows certain syntax rules
This training process takes months and costs millions of dollars. The result is a model with billions of parameters (think of them as adjustable dials) that encode patterns from the training data.
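To make "learning statistical relationships" concrete, here is a minimal sketch that counts which word follows which in a tiny made-up corpus. This is only an analogy: real models adjust billions of parameters rather than keeping a table of counts, and the corpus and function names below are invented for illustration.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus; real training data is vastly larger.
corpus = [
    "the capital of france is paris",
    "the capital of italy is rome",
    "paris is the capital of france",
]

def train_bigram_counts(sentences):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for current_word, next_word in zip(words, words[1:]):
            counts[current_word][next_word] += 1
    return counts

def next_word_probabilities(counts, word):
    """Turn the raw follow-up counts for `word` into probabilities."""
    followers = counts[word]
    total = sum(followers.values())
    return {w: c / total for w, c in followers.items()}

counts = train_bigram_counts(corpus)
probs_after_capital = next_word_probabilities(counts, "capital")
probs_after_is = next_word_probabilities(counts, "is")
print(probs_after_capital)
print(probs_after_is)
```

In this toy corpus, "capital" is always followed by "of," while "is" is followed by several different words. The table stores no facts about France, only how often words appeared next to each other.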
When you ask AI a question, here is what happens under the hood:
1. The model receives your text as input
2. It processes that input through its neural network (a mathematical structure with many layers)
3. It produces a probability distribution over possible next words
4. It selects a word and adds it to the response
5. It repeats steps 2-4, now with the new word included, until the response is complete
The model might calculate: there is a 40% chance the next word is "the," a 15% chance it is "a," a 10% chance it is "Paris," and so on. It then picks one (usually favoring high-probability options).
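The selection step can be sketched in a few lines of Python. The probabilities below echo the example numbers in the paragraph above, with the remaining 35% lumped into a placeholder entry; they are illustrative, not taken from any real model, and `pick_next_word` is a hypothetical helper name.

```python
import random

# Hypothetical probabilities echoing the example above.
# "<other>" stands in for every remaining candidate word.
next_word_probs = {"the": 0.40, "a": 0.15, "Paris": 0.10, "<other>": 0.35}

def pick_next_word(probs, rng):
    """Sample one word at random, favoring high-probability options."""
    words = list(probs)
    weights = list(probs.values())
    return rng.choices(words, weights=weights, k=1)[0]

rng = random.Random(0)  # fixed seed so the demo is repeatable
samples = [pick_next_word(next_word_probs, rng) for _ in range(1000)]

# Show how often each word was picked across 1000 draws.
for word in next_word_probs:
    print(word, samples.count(word) / len(samples))
```

Over many draws, the likely words come up most often, but lower-probability words still appear sometimes. That built-in randomness is one reason two runs of the same prompt can diverge.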
This word-by-word generation is why AI is often called "autocomplete on steroids." Your phone's text prediction is built on the same basic idea, just with a much smaller model.
The key insight: AI generates rather than retrieves. When Claude writes a paragraph about French history, it is not pulling that paragraph from a database. It is generating new text, word by word. This is why the same prompt can produce different responses each time.
AI does not read text the way you do. It breaks text into chunks called tokens before processing.
Quick token math:
- 1 token ≈ 0.75 words in English
- "the" or "cat" = 1 token
- "tokenization" might be split as "token" + "ization" = 2 tokens (the exact split depends on the tokenizer)
- 100,000 tokens ≈ 75,000 words
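The rule of thumb above turns into simple arithmetic. This sketch assumes the 0.75 words-per-token average from the list, which is only a rough figure for English; the function names are made up for this example.

```python
WORDS_PER_TOKEN = 0.75  # rough English average from the list above

def tokens_to_words(tokens):
    """Estimate how many English words fit in a given number of tokens."""
    return round(tokens * WORDS_PER_TOKEN)

def words_to_tokens(words):
    """Estimate how many tokens a given number of English words uses."""
    return round(words / WORDS_PER_TOKEN)

print(tokens_to_words(100_000))  # 75000
print(words_to_tokens(1_500))    # 2000
```

Actual counts vary by model and by language, so treat these numbers as estimates.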
Why tokens matter to you:
- Pricing and limits are measured in tokens. When a service says "100,000 token context window," that is roughly 75,000 words.
- Tokenization explains AI quirks. Ask an AI to count the letters in "strawberry" and it often gets it wrong. The AI sees tokens, not individual letters. The word is processed as chunks, making character-level tasks surprisingly difficult.
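A toy illustration of the "strawberry" quirk: the split and the ID numbers below are invented for this example (real tokenizers choose their own pieces), but they show the gap between what a model receives and what a letter-counting program sees.

```python
# Hypothetical split and ID numbers, invented for illustration only.
toy_vocab = {"str": 496, "aw": 675, "berry": 15717}
pieces = ["str", "aw", "berry"]

# What the model works with: a short list of numbers, not letters.
token_ids = [toy_vocab[piece] for piece in pieces]
print(token_ids)  # [496, 675, 15717]

# What a letter-level program sees: the individual characters.
word = "".join(pieces)
print(word.count("r"))  # 3
```

Nothing in the list of IDs says how many times "r" appears, so the model has to rely on patterns it learned about the word rather than simply counting.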
The context window is how much text AI can "remember" during a conversation. Think of it as working memory.
What counts toward the context window:
- Everything you send
- Everything AI responds with
- Any documents you paste in
- The entire conversation history
2026 context window sizes:
- Standard models: ~128,000 tokens (about 100,000 words)
- Premium models: up to 1 million+ tokens
- Claude's largest: equivalent to several novels
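When the context window fills up, older parts of the conversation typically get truncated, and AI loses access to that information. A larger window lets the model keep more of the conversation and your documents in view, which tends to help it stay coherent over long exchanges. It does not guarantee accuracy.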
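Here is a minimal sketch of truncation, assuming the simplest strategy: keep the newest messages that fit and drop the oldest. Real products handle overflow in different ways, and the token estimate reuses the rough 0.75 words-per-token figure. The function names and the conversation are made up for illustration.

```python
def estimate_tokens(text):
    """Crude estimate: about 1 token per 0.75 words."""
    return round(len(text.split()) / 0.75)

def fit_to_window(messages, max_tokens):
    """Keep the most recent messages that fit; drop the oldest first."""
    kept = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = estimate_tokens(message)
        if used + cost > max_tokens:
            break  # this message and everything older is dropped
        kept.append(message)
        used += cost
    return list(reversed(kept))  # restore chronological order

history = [
    "My name is Sam and I am planning a trip to Lisbon",
    "Great, when are you travelling",
    "In May, for about a week",
    "What should I pack",
]
visible = fit_to_window(history, max_tokens=20)
print(visible)
```

With a 20-token budget, the first message falls out of the window, so the model would no longer see the name or the destination when answering the packing question.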
Here is something that trips up many users: AI always sounds confident, even when it is wrong.
Why? Two reasons:
- No built-in certainty meter. The model does not check its work. It just outputs a probable next word, using the same confident tone for facts and fabrications.
- Trained on confident text. Humans rarely write "I don't know" in published content. The model learned to mimic that confident style.
Bottom line: A response full of specific details, technical terms, and authoritative phrasing might be completely fabricated. The style tells you nothing about accuracy.
AI models have a "knowledge cutoff": a date after which they have no training data. For most current models, this is somewhere in late 2024 or early 2025.
Ask an AI about events after its cutoff and you might get:
- A correct admission: "I do not have information about that"
- A hallucinated answer that sounds plausible but is false
- Outdated information presented as current
Practical rule: For anything time-sensitive, treat AI responses as a starting point, not a final answer. Verify against current sources.
You might notice AI sometimes gives creative, varied responses and other times sticks to predictable answers. This is controlled by temperature.
| Temperature | Behavior | Best for |
|---|---|---|
| Low (0-0.3) | Predictable, consistent | Factual questions, code |
| Medium (0.5-0.7) | Balanced | General use |
| High (0.8-1.0+) | Creative, varied | Brainstorming, writing |
Most chat interfaces handle this automatically. But understanding temperature helps you interpret why AI behaves differently in different contexts.
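Temperature is commonly described as a number that rescales the model's raw scores before they become probabilities. The sketch below uses that standard formulation (a softmax with the scores divided by the temperature); the words and scores are hypothetical.

```python
import math

def softmax_with_temperature(scores, temperature):
    """Convert raw scores into probabilities, scaled by temperature."""
    scaled = [s / temperature for s in scores]
    highest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for three candidate next words.
words = ["Paris", "Lyon", "banana"]
scores = [3.0, 2.0, 0.5]

for temperature in (0.2, 0.7, 1.5):
    probs = softmax_with_temperature(scores, temperature)
    print(temperature, dict(zip(words, (round(p, 3) for p in probs))))
```

At low temperature, nearly all of the probability goes to the top-scoring word, so answers become predictable. At high temperature, the probabilities flatten out and unlikely words get picked more often. A temperature of exactly 0 would divide by zero in this sketch; it is usually treated as a special case that always takes the top word.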
Mental Model 1: Pattern Interpolation Machine. AI generates outputs by interpolating between patterns it learned. It excels when your request matches common patterns. It struggles with truly novel reasoning.
Mental Model 2: Confidence is Cosmetic. The style of AI output tells you nothing about accuracy. Evaluate content based on verifiability, not tone.
Mental Model 3: Generation is Probabilistic. Each response is somewhat random. If you need consistency, ask the same question multiple ways and compare.
Mental Model 4: Knowledge Has Boundaries. The training cutoff and data composition create blind spots. AI knows more about well-represented topics and less about niche, recent, or underrepresented ones.
Prompt 1: Variability Test
Generate three different responses to "What should I have for dinner?" and make each one different in style.
Notice how AI can vary its approach significantly for the same basic question.
Prompt 2: Knowledge Cutoff Check
What is the most recent news event you know about? And what is your knowledge cutoff date?
A well-behaved AI should honestly report its knowledge cutoff and not fabricate recent events.
Prompt 3: Fresh Generation Proof
Write a haiku about clouds. Now write another one without looking at the first. Are they different?
The two haikus should differ because the model generates each one fresh rather than retrieving a stored poem.
Goal: Test the probabilistic nature of AI generation.
Step 1: Open Claude or ChatGPT. Start a brand new conversation.
Step 2: Ask: "Explain photosynthesis in exactly two sentences."
Step 3: Write down the response.
Step 4: Start another new conversation (important: new conversation, not a follow-up) and ask the exact same question.
Step 5: Compare the two responses. They will likely be similar in content but different in wording. This demonstrates that AI generates responses fresh each time.
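If you want a rough number for "similar in content but different in wording," you can compare your two saved responses with Python's standard `difflib` module. The two responses below are made-up placeholders, not real AI output; paste in your own. The helper name `wording_similarity` is also just for this sketch.

```python
from difflib import SequenceMatcher

def wording_similarity(first, second):
    """Return a 0-1 score for how much word-level overlap two texts share."""
    first_words = first.lower().split()
    second_words = second.lower().split()
    return SequenceMatcher(None, first_words, second_words).ratio()

# Placeholder responses for illustration; replace with what you wrote down.
response_one = "Plants use sunlight to turn water and carbon dioxide into sugar."
response_two = "Plants turn water and carbon dioxide into sugar using sunlight."

score = wording_similarity(response_one, response_two)
print(round(score, 2))
```

A score of 1.0 means the wording is identical. Anything lower means the two responses were phrased differently, even if they say the same thing.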
Now try a third conversation with higher stakes: "Name three books about climate change that were published in 2025." Check whether these books actually exist. This tests both the generation process and the potential for hallucination.
- AI learns patterns, not a database of facts. From its training data, it picks up statistical relationships between words and concepts.
- Generation is word-by-word prediction. AI picks a likely next word, then repeats, creating responses from scratch each time.
- Confident tone does not equal accuracy. AI sounds certain even when completely wrong.
- Knowledge cutoffs matter. AI cannot know about events after its training data ends.
- Temperature controls creativity. Lower temperature gives more consistent responses; higher temperature gives more varied ones.