Lesson 22
Agent Mode Development
AI-generated
- Understand the agent loop (observe-think-act)
- Know when to use agent mode vs. chat or completion
- Maintain appropriate oversight during agent work
- Give effective high-level task descriptions
- Recover from agent mistakes
This lesson covers agent mode in AI coding assistants. Agent mode is where AI takes autonomous action: reading files, writing code, running commands, and iterating. This is developer-specific content; non-developers should skip to Unit 6.
Agent mode is powerful and risky. This lesson teaches you to use it effectively while maintaining control.
Understanding the agent loop helps you work with it rather than against it.
The Basic Cycle
- Observe: Agent reads relevant files, error messages, terminal output
- Think: Agent reasons about what to do next
- Act: Agent writes code, runs commands, or makes changes
- Repeat: Agent observes the results and continues
This loop continues until the task is complete or the agent gets stuck.
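The cycle above can be sketched in a few lines of Python. This is a toy illustration, not how any particular assistant is implemented: `AgentState`, `run_agent_loop`, and the `observe`/`think`/`act` callables are hypothetical names, and the `max_steps` cap is an assumption standing in for "the agent gets stuck".

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AgentState:
    """Hypothetical record of what the agent has seen and done so far."""
    observations: List[str] = field(default_factory=list)
    actions: List[str] = field(default_factory=list)
    done: bool = False


def run_agent_loop(
    observe: Callable[[AgentState], str],
    think: Callable[[AgentState, str], str],
    act: Callable[[AgentState, str], bool],
    max_steps: int = 10,
) -> AgentState:
    """Repeat observe -> think -> act until act() reports completion
    or the step budget runs out (the 'stuck' case)."""
    state = AgentState()
    for _ in range(max_steps):
        observation = observe(state)           # Observe: files, errors, output
        state.observations.append(observation)
        decision = think(state, observation)   # Think: choose the next step
        state.actions.append(decision)
        state.done = act(state, decision)      # Act: apply it, report if done
        if state.done:
            break
    return state


# Toy stand-ins: a "test suite" with two failures; each action fixes one.
failures = ["test_login", "test_logout"]


def observe(state: AgentState) -> str:
    return f"{len(failures)} failing"


def think(state: AgentState, observation: str) -> str:
    return f"fix {failures[0]}" if failures else "stop"


def act(state: AgentState, decision: str) -> bool:
    if failures:
        failures.pop(0)
    return not failures  # done once nothing is failing


final = run_agent_loop(observe, think, act)
print(final.actions)  # ['fix test_login', 'fix test_logout']
print(final.done)     # True
```

The step cap matters in practice: a loop with no budget has no way to stop when the agent keeps retrying the same failing approach.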
What Agents Can Do
Modern AI coding agents like Claude Code can:
- Read and search across your codebase (very large repositories may not fit in context all at once)
- Create new files and directories
- Make targeted edits to existing files
- Run terminal commands (tests, linters, builds)
- Iterate based on error messages
- Search and find relevant code
- Execute multi-step plans
What Agents Cannot Do (Well)
- Understand requirements you did not express
- Know your preferences without instruction files
- Access external services they are not configured for
- Make judgment calls about business requirements
- Guarantee correctness of complex logic
The Trust Gradient
| Task Type | Agent Reliability | Oversight Needed |
|---|---|---|
| Scaffolding/boilerplate | High | Light review |
| Tests for existing code | High | Review edge cases |
| Simple refactoring | Medium-High | Review changes |
| New feature implementation | Medium | Detailed review |
| Security-sensitive code | Lower | Careful review |
| Complex business logic | Lower | Verify requirements |
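One way to make the table actionable is to encode it as a lookup that a team script or checklist can consult. The sketch below copies the table's values; the key names, the `oversight_for` helper, and the cautious default for unlisted task types are assumptions for illustration.

```python
from typing import Dict, Tuple

# (agent reliability, oversight needed), copied from the table above.
TRUST_GRADIENT: Dict[str, Tuple[str, str]] = {
    "scaffolding": ("High", "Light review"),
    "tests_for_existing_code": ("High", "Review edge cases"),
    "simple_refactoring": ("Medium-High", "Review changes"),
    "new_feature": ("Medium", "Detailed review"),
    "security_sensitive": ("Lower", "Careful review"),
    "complex_business_logic": ("Lower", "Verify requirements"),
}

# Assumption: task types not in the table get the most cautious treatment.
DEFAULT_ENTRY: Tuple[str, str] = ("Lower", "Careful review")


def oversight_for(task_type: str) -> str:
    """Return a one-line reminder of how closely to review a task."""
    reliability, oversight = TRUST_GRADIENT.get(task_type, DEFAULT_ENTRY)
    return f"{task_type}: reliability {reliability}, oversight: {oversight}"


print(oversight_for("scaffolding"))
print(oversight_for("database_migration"))  # not in the table, falls back
```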
Agent mode is not always the right choice. Match the mode to the task.
When Agent Mode Excels
- Multi-file changes: Refactoring that touches many files
- Repetitive transformations: Apply the same pattern across the codebase
- Scaffolding: Generate boilerplate structure
- Test generation: Write tests for existing code
- Bug hunting: "Find and fix the bug where X happens"
- Documentation: Generate docs from code
When to Prefer Chat or Completion
- Learning: When you want to understand, not just get code
- Exploration: When you are unsure what you need
- Small changes: When editing is faster than explaining
- High-stakes logic: When you need to think through each line
- Security-critical: When AI errors could create vulnerabilities
The Complexity Threshold
A good rule: if explaining the task in natural language is faster than doing it yourself, use agent mode. If explaining would take longer, just write the code.
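The rule can be written as a simple comparison. In this sketch, `prefer_agent_mode` is a hypothetical helper and the time estimates are your own guesses; the optional `review_minutes` term is an added assumption (the rule itself only compares explaining with doing), included because reviewing the result also costs time.

```python
def prefer_agent_mode(
    explain_minutes: float,
    do_it_yourself_minutes: float,
    review_minutes: float = 0.0,
) -> bool:
    """True when describing (and optionally reviewing) the task is
    expected to take less time than writing the code by hand."""
    return explain_minutes + review_minutes < do_it_yourself_minutes


print(prefer_agent_mode(5, 60))                     # True: a long refactor
print(prefer_agent_mode(10, 2))                     # False: a one-line fix
print(prefer_agent_mode(5, 20, review_minutes=20))  # False: review dominates
```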
How you describe tasks determines agent success. Be clear and structured.
The Good Task Description Template
- Goal: What should be different when done?
- Scope: What should change and what should not?
- Constraints: Any requirements or limitations?
- Verification: How will we know it worked?
Examples of Good vs. Bad Descriptions
Bad: "Fix the login bug"
Good: "Users report they cannot log in after password reset. The bug is likely in the password reset flow or session handling. Find where the session is not being properly created after password reset and fix it. Verify by running the auth test suite."
Bad: "Add dark mode"
Good: "Add a dark mode toggle to the settings page. Use CSS custom properties (we already have --color-bg-primary etc. defined in globals.css). The toggle should persist to localStorage. Update all components in src/components/ that have hardcoded colors. Do not change the color values themselves; just ensure they use the CSS variables."
Specifying What NOT to Change
Often as important as what to do:
"Refactor the user service to use the repository pattern. Create interfaces first. Do not change any existing tests; they should still pass. Do not modify the controller layer; keep the existing API signatures."
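If you write task descriptions often, the template (goal, scope, constraints, verification) plus the "what not to change" list can be kept as a small structure that renders a prompt. This is only a sketch: `TaskDescription` and its field names are hypothetical, and the choice to require a goal and a verification step is an assumption. The example values are paraphrased from the repository-pattern description above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TaskDescription:
    goal: str                     # what should be different when done
    verification: str             # how we will know it worked
    in_scope: List[str] = field(default_factory=list)
    out_of_scope: List[str] = field(default_factory=list)
    constraints: List[str] = field(default_factory=list)

    def render(self) -> str:
        """Build a plain-text prompt, refusing to render without a
        goal and a verification step."""
        if not self.goal.strip() or not self.verification.strip():
            raise ValueError("goal and verification are required")
        lines = [f"Goal: {self.goal}"]
        lines += [f"In scope: {item}" for item in self.in_scope]
        lines += [f"Do not change: {item}" for item in self.out_of_scope]
        lines += [f"Constraint: {item}" for item in self.constraints]
        lines.append(f"Verification: {self.verification}")
        return "\n".join(lines)


task = TaskDescription(
    goal="Refactor the user service to use the repository pattern.",
    in_scope=["The user service; create interfaces first."],
    out_of_scope=["Existing tests", "The controller layer"],
    constraints=["Keep the existing API signatures."],
    verification="All existing tests still pass.",
)
print(task.render())
```

Forcing the verification field to be filled in is the useful part: a description without it gives the agent no way to tell when it is finished.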
Requesting Plan Review
For complex tasks, have the agent plan before executing:
"I want to add WebSocket support for real-time notifications. Before writing any code, show me your plan: what files you'll create, what you'll modify, and the rough implementation approach. I'll approve before you proceed."
The goal is appropriate oversight: enough control to catch problems, not so much that you lose agent benefits.
Oversight Strategies
- Low-risk tasks: Let the agent run, then review the diff at the end
- Medium-risk tasks: Request plan approval, then let the agent execute
- High-risk tasks: Step through in stages, approving each phase
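The three strategies differ only in where the human checkpoints sit. The sketch below models that with a hypothetical `run_with_oversight` function; the `approve` callback stands in for you saying yes or no, and the phase names are made up for the example. It does not call any real agent.

```python
from typing import Callable, List


def run_with_oversight(
    risk: str,
    phases: List[str],
    approve: Callable[[str], bool],
) -> List[str]:
    """Return a log of what ran, placing approval checkpoints by risk level."""
    if risk not in ("low", "medium", "high"):
        raise ValueError(f"unknown risk level: {risk}")
    log: List[str] = []
    # Medium risk: one checkpoint, on the plan, before anything runs.
    if risk == "medium" and not approve("plan"):
        log.append("stopped: plan rejected")
        return log
    for phase in phases:
        # High risk: a checkpoint before every phase.
        if risk == "high" and not approve(phase):
            log.append(f"stopped before: {phase}")
            return log
        log.append(f"ran: {phase}")
    # Every level ends with a review of the resulting diff.
    log.append("review final diff")
    return log


phases = ["add interfaces", "migrate service", "update wiring"]
print(run_with_oversight("low", phases, approve=lambda step: False))
print(run_with_oversight("high", phases, approve=lambda step: step != "migrate service"))
```

In the low-risk call the `approve` callback is never consulted, which is the point: the only checkpoint is the final diff.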
Reading Agent Diffs Effectively
When reviewing agent changes:
- Start with the file list. Any unexpected files?
- Check for deletions or large modifications
- Review new code for obvious issues
- Run tests before committing
Red Flags to Watch For
- Unexpected dependency additions
- Changes to files outside stated scope
- Removed error handling or validation
- Hardcoded values that should be configurable
- Test modifications (unless explicitly requested)
- Changes to security-sensitive code (auth, crypto, input handling)
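Some of these red flags can be spotted mechanically from the list of changed files before you read a single line of the diff. The sketch below scans text in the format printed by `git diff --name-status` (a status letter, a tab, then the path). The function name, the manifest file names, and the path hints are assumptions to adapt to your project. Flags such as removed error handling or hardcoded values cannot be detected from file names and still need a human read.

```python
from pathlib import PurePosixPath
from typing import Dict, List, Sequence

# Assumed examples of dependency manifests; extend for your stack.
DEPENDENCY_FILES = {"package.json", "requirements.txt", "pyproject.toml", "go.mod", "Cargo.toml"}
# Assumed path fragments that suggest security-sensitive code.
SENSITIVE_HINTS = ("auth", "crypto", "security")


def find_red_flags(name_status: str, allowed_prefixes: Sequence[str]) -> Dict[str, List[str]]:
    """Map each suspicious path to the reasons it deserves a closer look."""
    flags: Dict[str, List[str]] = {}
    for line in name_status.splitlines():
        if not line.strip():
            continue
        parts = line.split("\t")
        status, path = parts[0], parts[-1]  # for renames, the last field is the new path
        lowered = path.lower()
        reasons: List[str] = []
        if not any(path.startswith(prefix) for prefix in allowed_prefixes):
            reasons.append("outside stated scope")
        if status.startswith("D"):
            reasons.append("file deleted")
        if PurePosixPath(path).name in DEPENDENCY_FILES:
            reasons.append("dependency manifest changed")
        if "test" in lowered:
            reasons.append("test modified")
        if any(hint in lowered for hint in SENSITIVE_HINTS):
            reasons.append("security-sensitive path")
        if reasons:
            flags[path] = reasons
    return flags


# Sample input in `git diff --name-status` format (made-up paths).
sample = (
    "M\tsrc/components/Button.tsx\n"
    "M\tpackage.json\n"
    "D\tsrc/auth/session.ts\n"
    "M\tsrc/components/Button.test.tsx\n"
)
for path, reasons in find_red_flags(sample, allowed_prefixes=["src/components/"]).items():
    print(f"{path}: {', '.join(reasons)}")
```

The checks are deliberately crude substring matches; they are meant to tell you where to look first, not to replace the review.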
Correcting Agent Mistakes
When the agent goes wrong:
Minor issues: "Good progress, but fix these issues: [specific list]. Do not rewrite everything; just address these points."
Wrong direction: "Let's stop. This approach won't work because [reason]. Instead, try [alternative approach]. Start fresh on this task."
Recovery: "The last change broke tests. Read the error output and fix the issue. Do not change anything unrelated to the test failure."
- Understand the loop: Observe-think-act-repeat is how agents work
- Match mode to task: Agent mode for multi-file, repetitive, or investigative work
- Write clear descriptions: Goal, scope, constraints, verification
- Specify what not to do: Prevents scope creep and unintended changes
- Maintain appropriate oversight: Scale review to risk level
Try agent mode with this exercise:
- Pick a real task in a codebase:
  - Easy: Add a new component with tests
  - Medium: Refactor to extract common logic
  - Hard: Find and fix a bug you know exists
- Write a task description with:
  - Clear goal
  - Explicit scope
  - What not to change
  - How to verify
- Let the agent plan. Review the plan before approving.
- After execution, review all changes. Note:
  - What did the agent get right?
  - What needed correction?
  - How would you write the task differently next time?
- Claude Code agent documentation: https://docs.anthropic.com/en/docs/claude-code
- Research on AI agent oversight: https://arxiv.org/abs/2309.07870
- Human-AI collaboration patterns: https://dl.acm.org/doi/10.1145/3544548.3580969