name: debug description: Root cause debugging before fixes. Use when investigating bugs, diagnosing test failures, troubleshooting unexpected behavior, or when previous fix attempts failed. Enforces investigate-first discipline. allowed-tools: '*'
Systematic Debugger
Find root cause before fixing. Symptom fixes are failure.
Iron Law: NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST
When to Use
Answer IN ORDER. Stop at first match:
- Bug, error, or test failure? → Use this skill
- Unexpected behavior? → Use this skill
- Previous fix didn't work? → Use this skill (especially important)
- Performance problem? → Use this skill
- None of above? → Skip this skill
Use especially when:
- Under time pressure (emergencies make guessing tempting)
- "Quick fix" seems obvious (red flag)
- Already tried 1+ fixes that didn't work
The Four Phases
Complete each phase before proceeding.
Phase 1: Root Cause Investigation
BEFORE attempting ANY fix:
1. Read Error Messages Completely
Don't skip past errors. They often contain the exact solution.
- Full stack trace (note line numbers, file paths)
- Error codes and messages
- Warnings that preceded the error
When the error mentions a library/framework symbol (a function, class, or module from a dependency) — read the installed version's docs before forming a hypothesis. Signatures, defaults, and deprecation behavior change between versions; training memory may describe a different version than the one running. Check package.json / lockfile, then the source wired up (Context7, official docs, README at the pinned ref).
2. Reproduce Consistently
| Can reproduce? | Action |
|---|---|
| Yes, every time | Proceed to step 3 |
| Sometimes | Gather more data - when does it happen vs not? |
| Never | Cannot debug what you cannot reproduce - gather logs |
When repro hinges on user-only context (env, timeline, "what were you doing"), call /elicit — its multiple-choice format constrains the answer space better than open prompts.
3. Check Recent Changes
git diff HEAD~5 # Recent code changes
git log --oneline -10 # Recent commits
What changed that could cause this? Dependencies? Config? Environment?
4. Trace Data Flow (Root Cause Tracing)
When error is deep in call stack:
Symptom: Error at line 50 in utils.js
↑ Called by handler.js:120
↑ Called by router.js:45
↑ Called by app.js:10 ← ROOT CAUSE: bad input here
Technique:
- Find where error occurs (symptom)
- Ask: "What called this with bad data?"
- Trace up until you find the SOURCE
- Fix at source, not at symptom
5. Multi-Component Systems
When system has multiple layers (API → service → database):
# Log at EACH boundary before proposing fixes
echo "=== Layer 1 (API): request=$REQUEST ==="
echo "=== Layer 2 (Service): input=$INPUT ==="
echo "=== Layer 3 (DB): query=$QUERY ==="
Run once to find WHERE it breaks. Then investigate that layer.
Phase 2: Pattern Analysis
1. Find Working Examples
Locate similar working code in same codebase. What works that's similar?
2. Identify Differences
| Working code | Broken code | Could this matter? |
|---|---|---|
| Uses async/await | Uses callbacks | Yes - timing |
| Validates input | No validation | Yes - bad data |
List ALL differences. Don't assume "that can't matter."
Phase 3: Hypothesis Testing
1. Form Single Hypothesis
Write it down: "I think X is the root cause because Y"
Be specific:
- ❌ "Something's wrong with the database"
- ✅ "Connection pool exhausted because connections aren't released in error path"
2. Test Minimally
| Rule | Why |
|---|---|
| ONE change at a time | Isolate what works |
| Smallest possible change | Avoid side effects |
| Don't bundle fixes | Can't tell what helped |
3. Evaluate Result
| Result | Action |
|---|---|
| Fixed | Phase 4 (verify) |
| Not fixed | NEW hypothesis (return to 3.1) |
| Partially fixed | Found one issue, continue investigating |
Root Cause Checkpoint (REQUIRED)
Before proceeding to Phase 4:
- Document root cause in ticket (under "## Root Cause" section):
- What is the actual cause (not symptom)?
- Why did this happen?
- How was it confirmed?
- If using tickets: Update frontmatter to
subtype: bug-investigated
Example:
## Root Cause
Connection pool exhausted because connections aren't released in error path.
Confirmed by adding pool logging - saw connections increment without decrement on errors.
Phase 4: Implementation
1. Create Failing Test
Before fixing, write test that fails due to the bug:
it('handles empty input without crashing', () => {
// This test should FAIL before fix, PASS after
expect(() => processData('')).not.toThrow();
});
2. Implement Fix
- Address ROOT CAUSE identified in Phase 1
- ONE change
- No "while I'm here" improvements
3. Verify
- New test passes
- Existing tests still pass
- Issue actually resolved (not just test passing)
End the fix report with Next: on its own line — imperative, name what's now (commit message to use, ticket to mark done, follow-up issue to open for related debt the investigation surfaced). A debug session that ends without naming the next move is incomplete.
Examples:
- Next: commit with
fix(auth): release pool connection in error path, mark ticket 234 done. - Next: commit the fix, then open a ticket for the unrelated logging gap surfaced at
src/auth.ts:88.
4. If Fix Doesn't Work
| Fix attempts | Action |
|---|---|
| 1-2 | Return to Phase 1 with new information |
| 3+ | STOP - Question architecture (see below) |
5. After 3+ Failed Fixes: Question Architecture
Pattern indicating architectural problem:
- Each fix reveals new coupling/shared state
- Fixes require "massive refactoring"
- Each fix creates new symptoms elsewhere
STOP and ask:
- Is this pattern fundamentally sound?
- Should we refactor vs. continue patching?
- Discuss with user before more fix attempts
- If user confirms the pattern is fundamentally unsound, call
/figure-it-outto evaluate approaches (refactor / redesign / extract) before implementing anything.
Red Flags - STOP Immediately
If you catch yourself thinking:
| Thought | Reality |
|---|---|
| "Quick fix for now, investigate later" | Investigate NOW or you never will |
| "Just try changing X" | That's guessing, not debugging |
| "I'll add multiple fixes and test" | Can't isolate what worked |
| "I don't fully understand but this might work" | You need to understand first |
| "One more fix attempt" (after 2+ failures) | 3+ failures = wrong approach |
ALL mean: STOP. Return to Phase 1.
Quick Reference
| Phase | Key Question | Success Criteria |
|---|---|---|
| 1. Root Cause | "WHY is this happening?" | Understand cause, not just symptom |
| 2. Pattern | "What's different from working code?" | Identified key differences |
| 3. Hypothesis | "Is my theory correct?" | Confirmed or formed new theory |
| 4. Implementation | "Does the fix work?" | Test passes, issue resolved |
Voice: plainspoken and concise — write to be scanned.