name: systematic-debugging
description: Use when encountering any bug, test failure, or unexpected behavior,
before proposing fixes
persona: "name: "Richard Feynman"\n title: "The Great Explainer - Master of Root
\ Cause Analysis"\n expertise: ["Root Cause Analysis", "Scientific Method"
, "Problem Decomposition", "Hypothesis Testing"]\n philosophy: "The first
\ principle is that you must not fool yourself - and you are the easiest person
\ to fool."\n credentials:\n - "Nobel Prize in Physics for work on quantum
\ electrodynamics"\n - "Investigated the Challenger disaster"\n - "Famous
\ for explaining complex ideas simply"\n principles:\n - "Write down what
\ you know"\n - "Write down what you think you know"\n - "Test your assumptions"
\n - "The simplest explanation is usually the correct one"\n"
domain: development
Systematic Debugging
Overview
Random fixes waste time and create new bugs. Quick patches mask underlying issues.
Core principle: ALWAYS find root cause before attempting fixes. Symptom fixes are failure.
Violating the letter of this process is violating the spirit of debugging.
The Iron Law
NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST
If you haven't completed Phase 1, you cannot propose fixes.
When to Use
- Any bug, test failure, or unexpected behavior
- Performance problems
- Build failures
Use especially when:
- Under time pressure
- "Quick fix" seems obvious
- Previous fix didn't work
Four Phases
- Configure before, behavior, debugging, encountering, failure settings before first use
- Review output quality and adjust parameters
- Monitor performance metrics during execution
- Document custom configurations for team reference
- Schedule regular runs for consistent results
Phase 1: Root Cause Investigation
- Read errors carefully — Don't skip warnings
- Reproduce consistently — Can you trigger reliably?
- Check recent changes — What changed recently?
- Trace data flow — Find where bad value originates
- Add diagnostics — In multi-component systems
Phase 2: Pattern Analysis
- Find working examples in codebase
- Compare against references
- Identify differences
Phase 3: Hypothesis & Testing
- Form hypothesis: "X is root cause because Y"
- Test with smallest change
- Verify before continuing
- If unknown, say "I don't know"
Phase 4: Implementation
Create failing test first
Fix ONE thing at a time
Verify fix works
If 3+ fixes failed → question architecture
- Is this pattern fundamentally sound?
- Are we "sticking with it through sheer inertia"?
- Should we refactor architecture vs. continue fixing symptoms?
Discuss with your human partner before attempting more fixes
This is NOT a failed hypothesis - this is a wrong architecture.
Red Flags - STOP and Follow Process
If you catch yourself thinking:
- "Quick fix for now, investigate later"
- "Just try changing X and see if it works"
- "Add multiple changes, run tests"
- "Skip the test, I'll manually verify"
- "It's probably X, let me fix that"
- "I don't fully understand but this might work"
- "Pattern says X but I'll adapt it differently"
- "Here are the main problems: [lists fixes without investigation]"
- Proposing solutions before tracing data flow
- "One more fix attempt" (when already tried 2+)
- Each fix reveals new problem in different place
ALL of these mean: STOP. Return to Phase 1.
If 3+ fixes failed: Question the architecture (see Phase 4.5)
your human partner's Signals You're Doing It Wrong
Watch for these redirections:
- "Is that not happening?" - You assumed without verifying
- "Will it show us...?" - You should have added evidence gathering
- "Stop guessing" - You're proposing fixes without understanding
- "Ultrathink this" - Question fundamentals, not just symptoms
- "We're stuck?" (frustrated) - Your approach isn't working
When you see these: STOP. Return to Phase 1.
Common Rationalizations
| Excuse | Reality |
|---|---|
| "Issue is simple, don't need process" | Simple issues have root causes too. Process is fast for simple bugs. |
| "Emergency, no time for process" | Systematic debugging is FASTER than guess-and-check thrashing. |
| "Just try this first, then investigate" | First fix sets the pattern. Do it right from the start. |
| "I'll write test after confirming fix works" | Untested fixes don't stick. Test first proves it. |
| "Multiple fixes at once saves time" | Can't isolate what worked. Causes new bugs. |
| "Reference too long, I'll adapt the pattern" | Partial understanding guarantees bugs. Read it completely. |
| "I see the problem, let me fix it" | Seeing symptoms ≠ understanding root cause. |
| "One more fix attempt" (after 2+ failures) | 3+ failures = architectural problem. Question pattern, don't fix again. |
Quick Reference
| Phase | Key Activities | Success Criteria |
|---|---|---|
| 1. Root Cause | Read errors, reproduce, check changes, gather evidence | Understand WHAT and WHY |
| 2. Pattern | Find working examples, compare | Identify differences |
| 3. Hypothesis | Form theory, test minimally | Confirmed or new hypothesis |
| 4. Implementation | Create test, fix, verify | Bug resolved, tests pass |
When Process Reveals "No Root Cause"
If systematic investigation reveals issue is truly environmental, timing-dependent, or external:
- You've completed the process
- Document what you investigated
- Implement appropriate handling (retry, timeout, error message)
- Add monitoring/logging for future investigation
But: 95% of "no root cause" cases are incomplete investigation.
Supporting Techniques
These techniques are part of systematic debugging and available in this directory:
root-cause-tracing.md- Trace bugs backward through call stack to find original triggerdefense-in-depth.md- Add validation at multiple layers after finding root causecondition-based-waiting.md- Replace arbitrary timeouts with condition polling
Related skills:
- superpowers:test-driven-development - For creating failing test case (Phase 4, Step 1)
- superpowers:verification-before-completion - Verify fix worked before claiming success
Real-World Impact
From debugging sessions:
- Systematic approach: 15-30 minutes to fix
- Random fixes approach: 2-3 hours of thrashing
- First-time fix rate: 95% vs 40%
- New bugs introduced: Near zero vs common
When to Use
- Any bug, test failure, or unexpected behavior
- Before proposing any fix
- When something isn't working as expected
- Performance issues or crashes
When NOT to Use
- Quick exploratory work where root cause doesn't matter
- When you're just gathering information
- For confirming known issues (already have root cause)
Common Mistakes
- Fixing symptoms instead of root cause
- Making changes without understanding why they work
- Random trial-and-error debugging
- Not documenting what you tried
- Skipping the reproduction step
- Accepting "works now" without understanding why
Red Flags
- Agent applies fixes without understanding the root cause
- Debug output is left in production code after the fix
- Watch for shortcuts and skipped steps
Verification
After completing this skill, confirm:
- Root cause is identified and documented before applying fixes
- Debug output is removed from production code
- All required outputs generated
- Success criteria met