systematic-debugging - SKILL.md Agent Skill

name: systematic-debugging description: Use when encountering any bug, test failure, or unexpected behavior, before proposing fixes

Skill: systematic-debugging

When

Any bug, test failure, or unexpected behavior — before proposing fixes.

Follows ARCS CLI Primer: arcs --commands --json for discovery, --json --lean on all calls.

Flow

flowchart TD
    classDef decision fill:#f59e0b,color:#fff
    classDef stop fill:#ef4444,color:#fff

    Bug[Bug observed] --> ARCS[Check ARCS knowledge]
    ARCS --> Found{Match found?}
    Found -->|Yes| Verify[Verify it applies]
    Found -->|No| Observe

    Verify -->|Applies| Fix
    Verify -->|Doesn't apply| Observe

    Observe[Phase 1: Observe] --> Repro{Reproducible?}
    Repro -->|No| Instrument[Add logging/tracing]
    Instrument --> Observe
    Repro -->|Yes| Hypothesize[Phase 2: Hypothesize]

    Hypothesize --> Compare[Find working example, list differences]
    Compare --> Theory[Form single specific hypothesis]

    Theory --> Isolate[Phase 3: Isolate]
    Isolate --> Test{Smallest change confirms?}
    Test -->|Yes| Fix[Phase 4: Fix]
    Test -->|No| FailCount{3+ failures?}
    FailCount -->|No| Theory
    FailCount -->|Yes| Arch[Question architecture]

    Fix --> WriteFail[Write failing test]
    WriteFail --> Implement[Single targeted fix]
    Implement --> Green{Scoped tests pass?}
    Green -->|Yes| Capture[Capture as ARCS knowledge]
    Green -->|No| FailCount

    class Found,Repro,Test,FailCount,Green decision
    class Arch stop

Phase 1: Observe (Root Cause Investigation)

Read the actual error message completely
Reproduce consistently before proceeding
Check recent changes (git log, git diff)
Trace data flow backward from failure point
Instrument component boundaries if cause unclear
Pre-step: arcs knowledge search <slug> "<error>" --json for gotcha/lesson/pattern entries

Phase 2: Hypothesize (Pattern Analysis)

Find a working example in the same codebase
Compare working vs broken — list every difference
Understand the dependency chain
Form ONE specific hypothesis (not multiple)

Phase 3: Isolate

Test with the smallest possible change
One variable at a time — never stack fixes
If hypothesis fails, form a new one from evidence
Escalation: 3+ failed fixes → question the architecture, not the symptom

Phase 4: Fix

Write a failing test FIRST (proves the bug exists)
Implement a single targeted fix
Verify the scoped tests for the files you changed pass (your dispatch VERIFY command — never the full suite; the devil-advocate completion gate owns that)
If your fix introduces new failures in YOUR scoped tests, revert and return to Phase 2. Failures in files outside your scope are report-only (BLOCKED_BY) — likely a sibling agent's in-flight work; never fix or revert it

Log Triage Protocol

Scan order: failure point → errors → warnings → timing anomalies

rg -n "ERROR|FATAL|panic|exception" <logfile>   # Error grep
jq 'select(.level == "error")' <json-log>       # Structured logs

Output: Timeline of events leading to failure (T-5m, T-3m, T-0).

Git Bisect (Regressions)

git bisect start
git bisect bad HEAD
git bisect good <last-known-good>
git bisect run <test-command>

After finding the commit: read the diff, isolate specific lines, feed into Phase 2.

Dependency Conflict Diagnosis

Symptom	Likely Cause
`instanceof` fails across modules	Duplicate package copies
Type mismatch on same interface	Different versions loaded
"Cannot find module" intermittent	Hoisting conflict
Works with `--legacy-peer-deps`	Peer dep unsatisfied

Diagnose: npm ls <pkg>, npm explain <pkg>, check for multiple copies.

ARCS Knowledge Capture

After root cause identified, persist as knowledge:

gotcha — environmental/config traps
lesson — architectural insights from this session
pattern — reusable solution to recurring problem

Include: root cause summary, evidence, affected files, fix approach.

Capture Resolution as Knowledge

After resolving the issue, persist the learning:

# For a surprising behavior or trap
arcs knowledge create <slug> "Redis connection pool exhaustion under load" \
  --kind=gotcha \
  --summary="Pool size defaults to 10; under concurrent requests >50, connections time out silently" \
  --body="Root cause: default pool size. Fix: set poolSize to max(50, expectedConcurrency). Symptoms: intermittent 503s with no error logs." \
  --json

# For a reusable debugging technique or resolution pattern
arcs knowledge create <slug> "Diagnosing silent connection failures" \
  --kind=lesson \
  --summary="Enable connection-level event logging before load testing" \
  --body="Attach listeners to pool 'error' and 'timeout' events. Default Node.js behavior swallows these." \
  --json

# For a pattern that should be followed going forward
arcs knowledge create <slug> "Connection pool sizing formula" \
  --kind=pattern \
  --summary="Pool size = max(50, 2x expected peak concurrency)" \
  --body="Applies to Redis, Postgres, and HTTP agent pools. Validated under load test 2026-05-26." \
  --json

Kind selection guide:

gotcha — surprising behavior, trap, or non-obvious failure mode
lesson — learned technique, debugging approach, resolution method
pattern — reusable solution that should be applied going forward

Constraints

NO FIXES WITHOUT ROOT CAUSE INVESTIGATION. If Phase 1 incomplete, you cannot propose fixes.
One variable at a time. Never apply multiple changes simultaneously.
3+ failures = architectural problem. Stop fixing symptoms, question the pattern.
Test before fix. Failing test proves the bug; green test proves the fix.
Defense in depth: After fixing root cause, add validation at multiple layers to prevent recurrence.
Systematic is faster than thrashing. 15-30min systematic vs 2-3h random fixes.

Red Flags (Return to Phase 1)

"Quick fix for now, investigate later"
"Just try changing X and see"
Proposing solutions before tracing data flow
Each fix reveals a new problem in a different place
"I don't fully understand but this might work"
Human says "stop guessing" or "is that not happening?"