Lesson 2

How AI "Thinks" (Without the PhD)

AI-generated

Learning Objectives

Understand the concept of training data and what AI learns from it
Grasp how AI generates responses through prediction, not retrieval
Know why AI can sound confident while being completely wrong
Develop intuition for when AI output might be unreliable
Feel comfortable with a "good enough" understanding of the technical details

Introduction

In the last lesson, you learned that AI is pattern-matching software. But how does that actually work? How can a computer learn from text and then generate new text that sounds remarkably human?

This lesson explains the mechanics in plain language. You do not need a computer science degree. You just need some useful mental models that will help you understand why AI behaves the way it does.

Training: How AI Learns From Text

Imagine reading every book in a library, every article on the internet, every conversation ever posted online. Now imagine doing that not to memorize facts, but to notice patterns:

Which words tend to follow other words
How sentences are structured
What topics relate to what other topics

That is roughly what happens during AI training. The model processes enormous amounts of text (hundreds of billions of words) and adjusts its internal parameters so that its predictions match the patterns in that text. It does not store the text the way a database would. Instead, it learns statistical relationships.

For example, the model learns:

"The capital of France is" is very likely followed by "Paris"
Formal emails start differently than text messages
Python code follows certain syntax rules

This training process takes months and costs millions of dollars. The result is a model with billions of parameters (think of them as adjustable dials) that encode patterns from the training data.
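To make "learning which words tend to follow other words" concrete, here is a minimal sketch that counts word pairs in a tiny made-up corpus. It is only an illustration: real models use neural networks with billions of parameters rather than a lookup table, and the corpus and the function name `next_word_probabilities` are assumptions invented for this example.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus; a real model trains on hundreds of billions of words.
corpus = [
    "the capital of france is paris",
    "the capital of italy is rome",
    "paris is the capital of france",
]

# Count how often each word is followed by each other word.
follow_counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for current_word, next_word in zip(words, words[1:]):
        follow_counts[current_word][next_word] += 1


def next_word_probabilities(word):
    """Turn raw follow-counts into probabilities for the next word."""
    counts = follow_counts[word]
    total = sum(counts.values())
    return {candidate: n / total for candidate, n in counts.items()}


print(next_word_probabilities("capital"))  # "of" always follows "capital" here
print(next_word_probabilities("is"))       # three different words follow "is"
```

Notice that nothing here stores whole sentences. The table only records how often one word follows another, which is a (very crude) statistical relationship.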

Generation: Predicting the Next Word

When you ask AI a question, here is what happens under the hood:

1. The model receives your text as input
2. It processes that input through its neural network (a mathematical structure with many layers)
3. It produces a probability distribution over possible next words
4. It selects a word and adds it to the response
5. It repeats steps 2-4, now with the new word included, until the response is complete

The model might calculate: there is a 40% chance the next word is "the," a 15% chance it is "a," a 10% chance it is "Paris," and so on. It then picks one (usually favoring high-probability options).
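The loop described above can be sketched in a few lines. This is a toy under stated assumptions: `toy_model` is a hypothetical stand-in for the neural network, and its probability table is made up for illustration. Only the shape of the loop (predict, pick, append, repeat) reflects the process described in this lesson.

```python
import random


def toy_model(context):
    """Hypothetical stand-in for a neural network: returns next-word probabilities."""
    last_word = context[-1]
    table = {  # made-up numbers, for illustration only
        "is": {"Paris": 0.7, "the": 0.2, "a": 0.1},
        "Paris": {".": 0.9, ",": 0.1},
        "the": {"capital": 0.6, "city": 0.4},
    }
    return table.get(last_word, {".": 1.0})


def generate(prompt, max_words=5, seed=None):
    rng = random.Random(seed)
    words = prompt.split()                      # step 1: receive the input
    for _ in range(max_words):
        probs = toy_model(words)                # steps 2-3: get a probability distribution
        candidates = list(probs)
        weights = list(probs.values())
        # Step 4: pick one word at random, favoring high-probability options.
        next_word = rng.choices(candidates, weights=weights, k=1)[0]
        words.append(next_word)
        if next_word == ".":                    # stop once the sentence ends
            break
    return " ".join(words)


print(generate("The capital of France is"))
print(generate("The capital of France is"))  # may differ from the first run
```

Because the word is picked by weighted chance rather than always taking the top option, running the same prompt twice can give different results.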

This word-by-word generation is why AI is often called "autocomplete on steroids." Your phone's text prediction works on the same basic principle, just with a much smaller model.

The key insight: AI generates rather than retrieves. When Claude writes a paragraph about French history, it is not pulling that paragraph from a database. It is generating new text, word by word. This is why the same prompt can produce different responses each time.

Tokens: How AI Reads Text

AI does not read text the way you do. It breaks text into chunks called tokens before processing.

Quick token math:

1 token ≈ 0.75 words in English
"the" or "cat" = 1 token
"tokenization" might be split as "token" + "ization" = 2 tokens (the exact split depends on the tokenizer)
100,000 tokens ≈ 75,000 words
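The token math above is simple enough to write as two small helper functions. This is a rough sketch: the 0.75 ratio is the rule of thumb from this lesson, real ratios vary by tokenizer and language, and the function names are invented for this example.

```python
# Rule of thumb from this lesson; real ratios vary by tokenizer and language.
WORDS_PER_TOKEN = 0.75


def estimate_tokens(text):
    """Estimate how many tokens a piece of English text will use."""
    word_count = len(text.split())
    return round(word_count / WORDS_PER_TOKEN)


def tokens_to_words(token_count):
    """Estimate how many English words fit in a given number of tokens."""
    return round(token_count * WORDS_PER_TOKEN)


print(tokens_to_words(100_000))          # 75000
print(estimate_tokens("one two three"))  # 4
```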

Why tokens matter to you:

Pricing and limits are measured in tokens. When a service says "100,000 token context window," that is roughly 75,000 words.
Tokenization explains AI quirks. Ask an AI to count the letters in "strawberry" and it often gets it wrong. The AI sees tokens, not individual letters. The word is processed as chunks, making character-level tasks surprisingly difficult.
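To see why letter counting is awkward, here is a toy tokenizer. Everything in it is an assumption for illustration: the mini-vocabulary, the ID numbers, and the greedy longest-match rule are invented, and real tokenizers may split these words differently.

```python
# Hypothetical mini-vocabulary; real tokenizers have many thousands of entries
# and may split these words differently.
VOCAB = {"straw": 101, "berry": 102, "token": 103, "ization": 104}


def toy_tokenize(word):
    """Greedy longest-match split; unknown characters become one-character pieces."""
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest possible chunk first, then shorter ones.
        for j in range(len(word), i, -1):
            if word[i:j] in VOCAB:
                pieces.append(word[i:j])
                i = j
                break
        else:
            pieces.append(word[i])
            i += 1
    return pieces


pieces = toy_tokenize("strawberry")
ids = [VOCAB[p] for p in pieces]
print(pieces)                     # ['straw', 'berry']
print(ids)                        # [101, 102]
print("strawberry".count("r"))    # 3
```

In this toy setup the model would receive the two ID numbers, not ten letters. Ordinary code can count the letter "r" directly, but a model working from chunk IDs has no direct view of the spelling.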

Context Windows: AI's Working Memory

The context window is how much text AI can "remember" during a conversation. Think of it as working memory.

What counts toward the context window:

Everything you send
Everything AI responds with
Any documents you paste in
The entire conversation history

2026 context window sizes:

Standard models: ~128,000 tokens (about 96,000 words)
Premium models: up to 1 million+ tokens
Claude's largest: equivalent to several novels

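When the context window fills up, older parts of the conversation get truncated, and AI loses access to that information. A larger window lets the model keep more of the conversation in view, which tends to make long conversations more coherent and reduces errors caused by forgotten details.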
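The truncation behavior can be sketched as follows. This is a simplified assumption about how a chat application might trim history: real services may handle a full window differently, and `trim_history` and `rough_token_count` are hypothetical names invented for this example.

```python
def rough_token_count(text):
    # Rule of thumb from this lesson: 1 token is about 0.75 words.
    return round(len(text.split()) / 0.75)


def trim_history(messages, max_tokens, count_tokens=rough_token_count):
    """Drop the oldest messages until the conversation fits the token budget."""
    kept = list(messages)
    while kept and sum(count_tokens(m) for m in kept) > max_tokens:
        kept.pop(0)  # the oldest message is forgotten first
    return kept


conversation = [
    "my name is Ada",
    "tell me about cats please",
    "what is my name",
]

# With a tiny 12-token budget, the first message no longer fits.
print(trim_history(conversation, max_tokens=12))
```

In this example the message containing the name is the one that gets dropped, so a model working from the trimmed history could not answer the final question.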

Why AI Sounds So Confident

Here is something that trips up many users: AI always sounds confident, even when it is wrong.

Why? Two reasons:

No built-in certainty meter. The model does not check its work. It just outputs a probable next word, using the same confident tone for facts and fabrications.
Trained on confident text. Humans rarely write "I don't know" in published content. The model learned to mimic that confident style.

Bottom line: A response full of specific details, technical terms, and authoritative phrasing might be completely fabricated. The style tells you nothing about accuracy.

The Knowledge Cutoff Problem

AI models have a "knowledge cutoff": a date after which they have no training data. For most current models, this is somewhere in late 2024 or early 2025.

Ask an AI about events after its cutoff and you might get:

A correct admission: "I do not have information about that"
A hallucinated answer that sounds plausible but is false
Outdated information presented as current

Practical rule: For anything time-sensitive, treat AI responses as a starting point, not a final answer. Verify against current sources.

Temperature and Creativity

You might notice AI sometimes gives creative, varied responses and other times sticks to predictable answers. This is controlled by temperature.

Temperature	Behavior	Best for
Low (0-0.3)	Predictable, consistent	Factual questions, code
Medium (0.5-0.7)	Balanced	General use
High (0.8-1.0+)	Creative, varied	Brainstorming, writing

Most chat interfaces handle this automatically. But understanding temperature helps you interpret why AI behaves differently in different contexts.
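For the curious, here is a minimal sketch of one common way temperature is applied: the model's raw scores are divided by the temperature before being converted to probabilities (a softmax). The scores below are made up, and actual services may implement sampling differently.

```python
import math


def apply_temperature(scores, temperature):
    """Convert raw scores to probabilities, scaled by temperature (must be > 0)."""
    scaled = [s / temperature for s in scores.values()]
    highest = max(scaled)
    # Subtracting the highest value keeps the exponentials numerically stable.
    exps = [math.exp(s - highest) for s in scaled]
    total = sum(exps)
    return {word: e / total for word, e in zip(scores, exps)}


scores = {"the": 2.0, "a": 1.0, "Paris": 0.5}  # made-up raw scores

for t in (0.2, 0.7, 1.0):
    probs = apply_temperature(scores, t)
    print(t, {word: round(p, 3) for word, p in probs.items()})
```

At low temperature nearly all the probability lands on the top-scoring word, so responses are predictable. At higher temperature the probabilities spread out, so less likely words get picked more often.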

Putting It Together: Useful Mental Models

Mental Model 1: Pattern Interpolation Machine. AI generates outputs by interpolating between patterns it learned. It excels when your request matches common patterns. It struggles with truly novel reasoning.

Mental Model 2: Confidence is Cosmetic. The style of AI output tells you nothing about accuracy. Evaluate content based on verifiability, not tone.

Mental Model 3: Generation is Probabilistic. Each response is somewhat random. If you need consistency, ask the same question multiple ways and compare.

Mental Model 4: Knowledge Has Boundaries. The training cutoff and data composition create blind spots. AI knows more about well-represented topics and less about niche, recent, or underrepresented ones.

Example Prompts to Try

Prompt 1: Variability Test

Generate three different responses to "What should I have for dinner?" and make each one different in style.

Notice how AI can vary its approach significantly for the same basic question.

Prompt 2: Knowledge Cutoff Check

What is the most recent news event you know about? And what is your knowledge cutoff date?

A well-behaved AI should honestly report its knowledge cutoff and not fabricate recent events.

Prompt 3: Fresh Generation Proof

Write a haiku about clouds. Now write another one without looking at the first. Are they different?

The two haikus should differ because each is generated word by word rather than retrieved, even though the AI can still see the first one in its context window.

Hands-On Exercise

Goal: Test the probabilistic nature of AI generation.

Step 1: Open Claude or ChatGPT. Start a brand new conversation.

Step 2: Ask: "Explain photosynthesis in exactly two sentences."

Step 3: Write down the response.

Step 4: Start another new conversation (important: new conversation, not a follow-up) and ask the exact same question.

Step 5: Compare the two responses. They will likely be similar in content but different in wording. This demonstrates that AI generates responses fresh each time.

Now try a third conversation with higher stakes: "Name three books about climate change that were published in 2025." Check whether these books actually exist. This tests both the generation process and the potential for hallucination.
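If you want to put a number on the comparison in Step 5, the sketch below scores how similar two responses are in wording, using Python's standard `difflib` module. The two sample responses are made-up placeholders, not real AI output; paste in your own. The function name `wording_similarity` is invented for this example.

```python
from difflib import SequenceMatcher


def wording_similarity(response_a, response_b):
    """Return a score from 0.0 (no shared wording) to 1.0 (identical wording)."""
    a_words = response_a.lower().split()
    b_words = response_b.lower().split()
    return SequenceMatcher(None, a_words, b_words).ratio()


# Made-up placeholders, not real AI output. Replace with the responses you wrote down.
response_a = "Plants use sunlight to turn water and carbon dioxide into sugar."
response_b = "Using sunlight, plants convert carbon dioxide and water into sugar."

print(round(wording_similarity(response_a, response_b), 2))
```

A score well below 1.0 for two answers that say the same thing is what you would expect if each response is generated fresh rather than retrieved.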

Key Takeaways

AI learns patterns from training data, not facts. It recognizes statistical relationships between words and concepts.
Generation is word-by-word prediction. AI picks a likely next word, then repeats, creating responses from scratch each time.
Confident tone does not equal accuracy. AI sounds certain even when completely wrong.
Knowledge cutoffs matter. AI cannot know about events after its training data ends.
Temperature controls creativity. Lower temperature gives more consistent responses; higher temperature gives more varied ones.

Sources

IBM: What Are Large Language Models, https://www.ibm.com/think/topics/large-language-models
AWS: What is a Large Language Model, https://aws.amazon.com/what-is/large-language-model/
Wikipedia: Large language model, https://en.wikipedia.org/wiki/Large_language_model
Elastic: Understanding Large Language Models, https://www.elastic.co/what-is/large-language-models
HatchWorks AI: Large Language Models Guide 2026, https://hatchworks.com/blog/gen-ai/large-language-models-guide/
The AI Insider: What are Large Language Models and How They Are Changing the World, https://theaiinsider.tech/2026/03/09/what-are-large-language-models-and-how-they-are-changing-the-world/