Lesson 4

What AI Gets Wrong (And Why That Is OK)

AI-generated

Learning Objectives

Understand hallucination and why AI makes things up
Recognize common failure modes: math, counting, recent events, niche topics
Know why verification matters even when AI sounds confident
Develop healthy skepticism without paranoia
Learn the "trust but verify" mindset that makes AI useful

Introduction

Here is an uncomfortable truth: AI is wrong more often than it sounds wrong.

That confident, articulate response that cites specific details? It might be completely fabricated. AI can and does:

Make up facts
Invent citations
Miscalculate numbers
Confidently state things that are verifiably false

This is not a bug to be fixed in the next version. It is a fundamental characteristic of how current AI works. Understanding this makes you better at using AI effectively.

Hallucination: When AI Invents Facts

AI hallucination: when a model generates content that is false but presents it as fact.

Types of Hallucinations

Fabricated citations:

Books that do not exist
Papers that were never published
Quotes nobody ever said
Citations look real (plausible names, dates, titles) but are generated, not retrieved

Invented facts:

Specific statistics that sound precise but are made up
"Studies show that 67% of users prefer..." might be pure fabrication
The oddly specific percentage makes it sound researched

False confidence:

AI often does not say "I'm not sure about this"
It states hallucinated content in the same tone as accurate information

Why does this happen? AI generates text by predicting probable next words. When it does not "know" something, it does not stop; it produces plausible-sounding text that fits the pattern of what an answer should look like. The model is optimized to sound plausible, not to be accurate.
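To make the idea concrete, here is a toy next-word predictor in Python. It is a deliberately simplified sketch: real models use neural networks over tokens and enormous amounts of text, while this one only counts which word follows which in a tiny, made-up corpus. The names `CORPUS`, `train_bigrams`, `predict_next`, and `generate` are hypothetical. Notice that it always produces a continuation, even one about a survey it never saw.

```python
from collections import Counter, defaultdict

# A tiny, made-up training corpus (real models see vastly more text).
CORPUS = (
    "the study found that most users prefer dark mode . "
    "the study found that most users prefer short answers . "
    "the report found that most readers prefer short articles ."
).split()


def train_bigrams(words):
    """Count which word follows each word in the corpus."""
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(follows, word, fallback):
    """Return the most frequent follower of `word`.

    If `word` never appeared in training, return the overall most common
    word instead of admitting ignorance.
    """
    if word in follows:
        return follows[word].most_common(1)[0][0]
    return fallback


def generate(prompt, length=6):
    """Extend `prompt` by `length` words, always picking the likeliest next word."""
    follows = train_bigrams(CORPUS)
    fallback = Counter(CORPUS).most_common(1)[0][0]
    words = prompt.split()
    for _ in range(length):
        words.append(predict_next(follows, words[-1], fallback))
    return " ".join(words)


print(generate("a survey of 2026 voters found"))
```

The completion about the invented survey reads fluently because every word pair appeared in training; nothing in the procedure checks whether the resulting claim is true.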

Hallucination Rates: Getting Better, But Not Solved

Good news: Hallucination rates have dropped significantly.

Four major models now have hallucination rates below 1% on standard benchmarks
Some models saw their hallucination rates drop by 60% or more in the past year

Bad news: Rates vary dramatically by task:

Task Type	Hallucination Rate
General factual	< 1% (top models)
Legal information	~6% (best models)
Legal information (all models)	up to 18%
Specialized/technical	Higher risk

The 2025 research consensus: aim for "calibrated uncertainty." Systems should recognize when they do not know and decline to answer rather than guess. Until that is solved, your verification skills remain essential.
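One way to picture "calibrated uncertainty" is a system that attaches a probability to its best answer and declines below a threshold, plus a check that its stated confidence matches how often it is actually right. The sketch below simply assumes such probabilities are available; the function names, the 0.7 threshold, and all example numbers are hypothetical, and real systems estimate uncertainty in more involved ways.

```python
def answer_or_decline(candidates, threshold=0.7):
    """Return the most probable answer, or decline if it is not probable enough.

    `candidates` maps each possible answer to a (hypothetical) probability.
    """
    best, prob = max(candidates.items(), key=lambda item: item[1])
    return best if prob >= threshold else "I don't know"


def calibration_gap(predictions):
    """Average confidence minus actual accuracy.

    `predictions` is a list of (confidence, was_correct) pairs. A value near
    0 means well calibrated; a large positive value means overconfident.
    """
    avg_confidence = sum(conf for conf, _ in predictions) / len(predictions)
    accuracy = sum(1 for _, correct in predictions if correct) / len(predictions)
    return avg_confidence - accuracy


print(answer_or_decline({"Paris": 0.97, "Lyon": 0.02}))                # Paris
print(answer_or_decline({"1987": 0.35, "1989": 0.33, "1991": 0.32}))  # I don't know
# Claims 90% confidence but is right only half the time: overconfident.
print(round(calibration_gap([(0.9, True), (0.9, False), (0.9, True), (0.9, False)]), 2))  # 0.4
```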

Math and Logic: Surprisingly Unreliable

You might expect a computer to be good at math. Language models often are not.

Counting problems:

Ask an AI how many r's are in "strawberry"
Models have frequently gotten it wrong (answer: 3)
AI processes text as tokens (chunks of characters), not individual letters
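The sketch below illustrates why letter counting is awkward for a model. It splits "strawberry" with a toy greedy tokenizer over a three-entry vocabulary; the vocabulary, the split, the ID numbers, and the name `toy_tokenize` are invented for illustration. Real tokenizers learn their own chunks, so the actual split varies by model. The model receives the IDs, not the letters, whereas ordinary code can count characters directly.

```python
# Invented vocabulary and IDs; real tokenizers learn their own chunks.
VOCAB = {"str": 101, "aw": 102, "berry": 103}


def toy_tokenize(word, vocab):
    """Greedy longest-match split of `word` into vocabulary chunks."""
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest remaining piece first, then shorter ones.
        for end in range(len(word), i, -1):
            piece = word[i:end]
            if piece in vocab:
                tokens.append(piece)
                i = end
                break
        else:
            raise ValueError(f"no token covers {word[i:]!r}")
    return tokens


pieces = toy_tokenize("strawberry", VOCAB)
print(pieces)                       # ['str', 'aw', 'berry']
print([VOCAB[p] for p in pieces])   # [101, 102, 103]: what the model actually sees
print("strawberry".count("r"))      # 3: counting characters directly
```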

Arithmetic errors:

Multi-step calculations often contain mistakes
AI might get the approach right but flub the actual computation

Logic puzzles:

Step-by-step reasoning can go awry
The model may jump to an answer instead of working through each step

The workaround: For math that matters, verify independently. Use AI to explain the approach, but do calculations yourself or with a calculator.
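For example, suppose an AI walks you through pricing three items at $19.99 with 8.25% sales tax and reports a total of $64.47 (a made-up answer for illustration). A few lines of Python can redo the arithmetic. Using `Decimal` avoids binary floating-point surprises with money; the function name `total_with_tax` and the half-up rounding rule are assumptions, since tax rounding rules vary by jurisdiction.

```python
from decimal import Decimal, ROUND_HALF_UP


def total_with_tax(unit_price, quantity, tax_rate):
    """Subtotal plus tax, rounded to the cent (half-up rounding assumed)."""
    subtotal = Decimal(unit_price) * quantity
    total = subtotal * (1 + Decimal(tax_rate))
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


ai_answer = Decimal("64.47")  # hypothetical total copied from an AI response
checked = total_with_tax("19.99", 3, "0.0825")

print(checked)               # 64.92
print(checked == ai_answer)  # False: recheck before relying on the AI's figure
```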

Recent Events: The Knowledge Cutoff Wall

AI models have a knowledge cutoff: a date after which they have no training data.

When you ask about events after the cutoff, you may get any of these:

Correct admission: "I do not have information about that"
Hallucinated answer that sounds plausible but is false
Outdated information presented as current

Some AI systems now have web search capabilities, which helps. But the core model's knowledge is still frozen.

Practical rule: For anything time-sensitive, treat AI responses as a starting point, not a final answer.
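A minimal sketch of that rule, assuming you know your model's training cutoff: the date below is a placeholder, not any real model's cutoff, and `needs_fresh_source` is a hypothetical helper. Anything dated after the cutoff should be checked against a current source.

```python
from datetime import date

# Placeholder cutoff for illustration only; look up your model's actual cutoff.
MODEL_CUTOFF = date(2024, 6, 1)


def needs_fresh_source(event_date, cutoff=MODEL_CUTOFF):
    """True when the event is after the cutoff, so the core model has no training data on it."""
    return event_date > cutoff


print(needs_fresh_source(date(2027, 2, 1)))   # True: verify with a current source
print(needs_fresh_source(date(2020, 1, 15)))  # False: still worth checking if it matters
```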

Confident Nonsense: Why AI Rarely Says "I Do Not Know"

Traditional Software	AI
Fails loudly	Fails quietly
Returns error when data missing	Returns confident prose
Easy to spot problems	Hard to spot problems

Why? One reason: AI was trained on human text, and published writing rarely says "I don't know." The training data is full of confident assertions.

Develop your "AI smell": An intuition for when output might be unreliable.

Red flags that should trigger extra verification:

Specific claims with precise numbers
Obscure or niche topics
Recent events
High-stakes decisions
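The first red flag can be partly automated. This rough heuristic, a sketch with hypothetical names (`sentences_to_verify`, `NUMBER`, `SENTENCE_BREAK`), pulls out every sentence in a response that contains a number or percentage so you know what to fact-check. It cannot judge whether a topic is obscure, recent, or high-stakes; that part stays with you.

```python
import re

# Any standalone number, optionally with decimals or a trailing percent sign.
NUMBER = re.compile(r"\b\d+(?:\.\d+)?%?")
# Split after ., !, or ? when followed by whitespace.
SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+")


def sentences_to_verify(response):
    """Return the sentences in `response` that contain a specific number."""
    sentences = SENTENCE_BREAK.split(response.strip())
    return [s for s in sentences if NUMBER.search(s)]


sample = (  # hypothetical AI output
    "Many people like dark mode. "
    "Studies show that 67% of users prefer it. "
    "Adoption grew 3.5 times after launch."
)
for sentence in sentences_to_verify(sample):
    print("Verify:", sentence)
```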

The Right Mindset: Trust but Verify

Why use AI at all given these limitations?

Because AI is still extraordinarily useful when you approach it correctly. The key is calibrated trust.

Trust AI MORE for:

Brainstorming and ideation
First drafts of writing
Explaining well-documented concepts
Code in common patterns
Summarizing content you can check
Tasks where creativity matters more than precision

Trust AI LESS for:

Factual claims you cannot verify
Medical or legal advice
Recent events
Math and precise calculations
Obscure or specialized topics
Anything with significant consequences if wrong

The verification habit: Treat AI like a knowledgeable but occasionally confused colleague whose work you review before using.

Example Prompts to Try

Prompt 1: Knowledge Cutoff Test

Who won the Super Bowl in February 2027?

The model should recognize that this is beyond its training data and say so rather than hallucinate an answer.

Prompt 2: Letter Counting (Known Weakness)

How many letter r's are in the word "strawberry"? Count carefully.

Check whether the answer is correct (it should be 3). Many models have gotten this wrong, even when asked to count carefully.

Prompt 3: Hallucination Trap

Name three books by [insert an obscure author you actually know well].

Use an author you know so you can verify the answer. If the author is obscure enough, the AI may hallucinate titles.

Hands-On Exercise

Goal: Build your "AI smell" by testing AI on topics you actually know.

Step 1: Think of a topic you know well: your profession, a hobby, your hometown.

Step 2: Ask AI a specific factual question about this topic.

Step 3: Before reading the response, predict: will this be accurate?

Step 4: Read carefully and fact-check at least two claims.

Step 5: Score the response:

How many claims were accurate?
How many were wrong or unverifiable?
Did the tone match actual accuracy?

Repeat with different domains to calibrate your expectations.
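If you repeat the exercise across several domains, a small tally keeps the scores comparable. This is a sketch with hypothetical names (`Claim`, `score_response`); the verdicts come from your own fact-checking, and the example claims are placeholders. "Tone mismatch" here simply means the response sounded confident but was not fully accurate.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    text: str
    verdict: str  # "accurate", "wrong", or "unverifiable", from your own checking


def score_response(claims, sounded_confident):
    """Tally verdicts and flag a confident tone that the accuracy did not earn."""
    counts = {"accurate": 0, "wrong": 0, "unverifiable": 0}
    for claim in claims:
        counts[claim.verdict] += 1
    accuracy = counts["accurate"] / len(claims) if claims else 0.0
    tone_mismatch = sounded_confident and accuracy < 1.0
    return counts, accuracy, tone_mismatch


claims = [
    Claim("placeholder claim A", "accurate"),
    Claim("placeholder claim B", "wrong"),
    Claim("placeholder claim C", "unverifiable"),
    Claim("placeholder claim D", "accurate"),
]
print(score_response(claims, sounded_confident=True))
# ({'accurate': 2, 'wrong': 1, 'unverifiable': 1}, 0.5, True)
```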

Key Takeaways

Hallucination is fundamental, not a bug. AI generates plausible-sounding content without checking accuracy.
Math and counting are weaknesses. Do not trust AI for precise calculations or character counting.
Knowledge cutoffs create blind spots. The core model cannot know about events after its training cutoff.
Confident tone means nothing. AI sounds certain even when completely wrong.
Trust but verify is the right approach. Use AI freely, but check output when accuracy matters.

Sources

Upwork: Debunking 11 Common AI Myths in 2026, https://www.upwork.com/resources/artificial-intelligence-myths
Beam AI: Artificial Intelligence: The Truth Behind the 5 Biggest Myths, https://beam.ai/agentic-insights/artificial-intelligence-the-truth-behind-the-5-biggest-myths
SS&C Blue Prism: Debunking AI Myths and Misconceptions, https://www.blueprism.com/resources/blog/ai-myths-misconceptions/
IBM: What Are Large Language Models, https://www.ibm.com/think/topics/large-language-models
Wikipedia: Large language model (section on limitations), https://en.wikipedia.org/wiki/Large_language_model