AI-101

Lesson 4

What AI Gets Wrong (And Why That Is OK)

AI-generated

Learning Objectives
  • Understand hallucination and why AI makes things up
  • Recognize common failure modes: math, counting, recent events, niche topics
  • Know why verification matters even when AI sounds confident
  • Develop healthy skepticism without paranoia
  • Learn the "trust but verify" mindset that makes AI useful

Introduction

Here is an uncomfortable truth: AI is wrong more often than it sounds wrong.

That confident, articulate response that cites specific details? It might be completely fabricated. AI can and does:

  • Make up facts
  • Invent citations
  • Miscalculate numbers
  • Confidently state things that are verifiably false

This is not a bug to be fixed in the next version. It is a fundamental characteristic of how current AI works. Understanding this makes you better at using AI effectively.

Hallucination: When AI Invents Facts

AI hallucination: a model generates content that is false but presents it as fact.

Types of Hallucinations

Fabricated citations:

  • Books that do not exist
  • Papers that were never published
  • Quotes nobody ever said
  • These citations look real (plausible names, dates, titles) but are generated, not retrieved

Invented facts:

  • Specific statistics that sound precise but are made up
  • "Studies show that 67% of users prefer..." might be pure fabrication
  • The oddly specific percentage makes it sound researched

False confidence:

  • AI rarely says "I'm not sure about this" unprompted
  • It states hallucinated content in the same tone as accurate information

Why does this happen? AI generates text by predicting probable next words. When it does not "know" something, it does not stop. Instead, it generates plausible-sounding text that fits the pattern of what an answer should look like. The model is optimized to produce likely-sounding text, not to check that the text is true.
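To make the "predicting probable next words" idea concrete, here is a toy sketch. It is not how any real model is built: the follower counts, the fallback words, and the function names are all made up for illustration. The point it tries to show is that the generator keeps producing fluent-looking text even when it starts from a word it has never seen.

```python
import random

# Made-up follower counts standing in for patterns learned from text.
FOLLOWERS = {
    "studies": {"show": 8, "suggest": 2},
    "show": {"that": 10},
    "suggest": {"that": 10},
    "that": {"users": 5, "most": 5},
    "most": {"users": 10},
    "users": {"prefer": 10},
}
FALLBACK = ["the", "results", "clearly"]  # used when a word was never seen


def next_word(word, rng):
    """Pick a likely follower; never refuses, even for unseen words."""
    options = FOLLOWERS.get(word)
    if options is None:
        return rng.choice(FALLBACK)  # no knowledge, still produces text
    words = list(options)
    weights = [options[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]


def generate(start, length, seed=0):
    """Extend `start` by `length` words, one predicted word at a time."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length):
        out.append(next_word(out[-1], rng))
    return " ".join(out)


print(generate("studies", 5))
print(generate("zyzzyva", 5))  # unseen word: output still looks like text
```

Nothing in this loop checks whether the output is true. It only asks what tends to come next.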

Hallucination Rates: Getting Better, But Not Solved

Good news: Hallucination rates have dropped significantly.

  • Four major models now have sub-1% hallucination rates on standard benchmarks
  • Some models saw 60%+ drops in the past year

Bad news: Rates vary dramatically by task:

Task Type               Hallucination Rate
General factual         < 1% (top models)
Legal information       ~6% (best models)
Legal (all models)      up to 18%
Specialized/technical   Higher risk
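Small-looking rates add up across a long answer. The sketch below is a rough back-of-the-envelope calculation, assuming (as a simplification that the benchmarks above may not share) that the rate applies to each claim independently. The function name is hypothetical.

```python
def chance_of_any_error(rate, n_claims):
    """Probability that at least one of n independent claims is wrong."""
    return 1 - (1 - rate) ** n_claims


# Rates taken from the table above; 10 claims per answer is an assumption.
for label, rate in [("general factual", 0.01), ("legal, best models", 0.06)]:
    print(label, round(chance_of_any_error(rate, 10), 2))
```

Under these assumptions, a ten-claim answer at a 6% per-claim rate has a bit under a 50% chance of containing at least one error.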

2025 researcher consensus: Aim for "calibrated uncertainty." Systems should know when they do not know and safely decline to answer. Until that is solved, your verification skills remain essential.

Math and Logic: Surprisingly Unreliable

You might expect a computer to be good at math. AI language models often are not, because they predict text rather than compute results.

Counting problems:

  • Ask AI how many r's are in "strawberry"
  • It frequently gets it wrong (the answer is 3)
  • AI sees tokens (chunks of text), not individual letters
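The contrast between letters and tokens can be sketched in a few lines. Ordinary code sees every character, so counting is trivial. The token split shown here is hypothetical and chosen only for illustration. Real tokenizers split words differently.

```python
word = "strawberry"

# Character-level view: ordinary code can inspect each letter directly.
r_count = word.count("r")

# Hypothetical token split, for illustration only.
tokens = ["str", "aw", "berry"]
assert "".join(tokens) == word

# The r's are spread across chunks the model handles as whole units.
r_per_token = [t.count("r") for t in tokens]

print(r_count)      # 3
print(r_per_token)  # [1, 0, 2]
```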

Arithmetic errors:

  • Multi-step calculations often contain mistakes
  • AI might get the approach right but flub the actual computation

Logic puzzles:

  • Step-by-step reasoning can go awry
  • The model may jump to an answer instead of working through each step

The workaround: For math that matters, verify independently. Use AI to explain the approach, but do calculations yourself or with a calculator.
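As one way to apply this workaround, you can let AI describe the approach and then run the numbers yourself. The sketch below checks a bill-splitting calculation. The bill amount, the tip rate, and the "AI answer" are all invented for the example.

```python
from decimal import Decimal, ROUND_HALF_UP


def per_person_total(bill, tip_rate, people):
    """Bill plus tip, split evenly and rounded to the nearest cent."""
    total = Decimal(bill) * (1 + Decimal(tip_rate))
    share = total / people
    return share.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


claimed_by_ai = Decimal("24.82")  # hypothetical AI answer to check
actual = per_person_total("84.60", "0.15", 4)

print(actual)                   # 24.32
print(actual == claimed_by_ai)  # False: the claimed figure does not hold up
```

The approach (add the tip, then divide) can be right while the final number is still wrong, which is why the computation is worth redoing.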

Recent Events: The Knowledge Cutoff Wall

AI models have a knowledge cutoff: a date after which they have no training data.

When you ask about events after the cutoff, one of three things happens:

  1. Correct admission: "I do not have information about that"
  2. Hallucinated answer that sounds plausible but is false
  3. Outdated information presented as current

Some AI systems now have web search capabilities, which helps. But the core model's knowledge is still frozen.

Practical rule: For anything time-sensitive, treat AI responses as a starting point, not a final answer.
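The practical rule can be written as a simple date comparison. This is only a sketch: the cutoff date below is an assumed placeholder, since each model has its own cutoff, and the function name is hypothetical.

```python
from datetime import date

# Assumed placeholder; look up the actual cutoff for the model you use.
ASSUMED_CUTOFF = date(2025, 1, 1)


def needs_fresh_source(event_date, cutoff=ASSUMED_CUTOFF):
    """True when the event falls after the model's training data ends."""
    return event_date > cutoff


print(needs_fresh_source(date(2027, 2, 1)))  # True: verify elsewhere
print(needs_fresh_source(date(2020, 1, 1)))  # False: may be in training data
```

Even when the check returns False, the answer can still be outdated or wrong. The comparison only tells you when the model cannot have seen the event at all.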

Confident Nonsense: Why AI Never Says "I Do Not Know"

Traditional Software               AI
Fails loudly                       Fails quietly
Returns error when data missing    Returns confident prose
Easy to spot problems              Hard to spot problems

Why? AI was trained on human text, and humans rarely write "I don't know" in published content. The training data is full of confident assertions.

Develop your "AI smell": an intuition for when output might be unreliable.

Red flags that should trigger extra verification:

  • Specific claims with precise numbers
  • Obscure or niche topics
  • Recent events
  • High-stakes decisions
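Some of these red flags can be roughly approximated in code. The sketch below is a crude illustration rather than a reliable detector: the regular expression, the word list, and the function name are all assumptions, and it covers only two of the flags above.

```python
import re

# Matches percentages like "67%" or large figures like "1,250,000".
PRECISE_NUMBER = re.compile(r"\d+(?:\.\d+)?\s*%|\b\d{1,3}(?:,\d{3})+\b")

# Tiny illustrative word list; a real one would be much longer.
HIGH_STAKES_WORDS = {"diagnosis", "dosage", "lawsuit", "contract", "tax"}


def red_flags(text):
    """Return the red flags this crude heuristic finds in `text`."""
    flags = []
    if PRECISE_NUMBER.search(text):
        flags.append("precise number")
    words = set(re.findall(r"[a-z]+", text.lower()))
    if words & HIGH_STAKES_WORDS:
        flags.append("high-stakes topic")
    return flags


print(red_flags("Studies show that 67% of users prefer dark mode."))
print(red_flags("Try a warmer tone in the intro."))
```

A flag does not mean the claim is false. It means the claim is worth checking before you rely on it.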
The Right Mindset: Trust but Verify

Why use AI at all given these limitations?

Because AI is still extraordinarily useful when you approach it correctly. The key is calibrated trust.

Trust AI MORE for:

  • Brainstorming and ideation
  • First drafts of writing
  • Explaining well-documented concepts
  • Code in common patterns
  • Summarizing content you can check
  • Tasks where creativity matters more than precision

Trust AI LESS for:

  • Factual claims you cannot verify
  • Medical or legal advice
  • Recent events
  • Math and precise calculations
  • Obscure or specialized topics
  • Anything with significant consequences if wrong

The verification habit: Treat AI like a knowledgeable but occasionally confused colleague whose work you review before using.

Example Prompts to Try

Prompt 1: Knowledge Cutoff Test

Who won the Super Bowl in February 2027?

The model should recognize that this is beyond its training data and say so rather than hallucinate an answer.

Prompt 2: Letter Counting (Known Weakness)

How many letter r's are in the word "strawberry"? Count carefully.

Check whether the answer is correct (it should be 3). Many models get this wrong, even when asked to count carefully.

Prompt 3: Hallucination Trap

Name three books by [insert an obscure author you actually know well].

Use an author whose work you know so you can verify the answer. If the author is obscure enough, AI might hallucinate titles.

Hands-On Exercise

Goal: Build your "AI smell" by testing AI on topics you actually know.

Step 1: Think of a topic you know well: your profession, a hobby, your hometown.

Step 2: Ask AI a specific factual question about this topic.

Step 3: Before reading the response, predict: will this be accurate?

Step 4: Read carefully and fact-check at least two claims.

Step 5: Score the response:

  • How many claims were accurate?
  • How many were wrong or unverifiable?
  • Did the tone match actual accuracy?

Repeat with different domains to calibrate your expectations.
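If you want to keep track of your results across domains, the scoring in Step 5 can be tallied with a small helper. This is a minimal sketch: the three labels and the function name are assumptions chosen for the example, and you assign the labels yourself after fact-checking.

```python
from collections import Counter

VALID_LABELS = {"accurate", "wrong", "unverifiable"}


def score_response(labels):
    """Summarize fact-check labels, one label per claim you checked."""
    if not labels:
        raise ValueError("check at least one claim")
    unknown = set(labels) - VALID_LABELS
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    counts = Counter(labels)
    return {
        "accurate": counts["accurate"],
        "wrong_or_unverifiable": counts["wrong"] + counts["unverifiable"],
        "accuracy": counts["accurate"] / len(labels),
    }


result = score_response(["accurate", "accurate", "wrong", "unverifiable"])
print(result)
```

Comparing the accuracy figure with your Step 3 prediction shows how well your intuition is calibrated for that domain.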

Key Takeaways
  • Hallucination is fundamental, not a bug. AI generates plausible-sounding content without checking accuracy.
  • Math and counting are weaknesses. Do not trust AI for precise calculations or character counting.
  • Knowledge cutoffs create blind spots. AI cannot know about events after its training cutoff unless it has access to search.
  • Confident tone means nothing. AI sounds certain even when completely wrong.
  • Trust but verify is the right approach. Use AI freely, but check output when accuracy matters.