name: diagnose description: Disciplined diagnosis loop for hard bugs and performance regressions. Reproduce → minimise → hypothesise → instrument → fix → regression-test. Use when user says "diagnose this" / "debug this", reports a bug, says something is broken/throwing/failing, or describes a performance regression.
Diagnose
A discipline for hard bugs. Skip phases only when explicitly justified.
Phase 1 — Build a feedback loop
This is the skill. If you have a fast, deterministic, pass/fail signal for the bug, you will find the cause. If you don't, no amount of staring at code will save you.
Spend disproportionate effort here. Be aggressive. Be creative. Refuse to give up.
Ways to construct one (try in order):
- Failing test at whatever seam reaches the bug
- CLI invocation with a fixture input, diffing output against known-good snapshot
- Throwaway harness: minimal subset of the system that exercises the bug code path
- Property/fuzz loop: if the bug is "sometimes wrong output", run many random inputs
- Bisection harness: if bug appeared between two known states, automate "check at state X"
If you genuinely cannot build a loop — stop and say so. List what you tried. Ask the user for access to the reproducing environment or a captured artifact. Do not proceed to Phase 2 without a loop.
Phase 2 — Reproduce
Run the loop. Confirm:
- The loop produces the failure the user described — not a different nearby failure
- The failure is reproducible (or at a high enough rate for non-deterministic bugs)
- You have captured the exact symptom
Phase 3 — Hypothesise
Generate 3–5 ranked hypotheses before testing any of them. Each must be falsifiable:
"If X is the cause, then changing Y will make the bug disappear."
Show the ranked list to the user before testing. They often have domain knowledge that re-ranks instantly. Don't block on it — proceed if user is AFK.
Phase 4 — Instrument
Each probe must map to a specific prediction from Phase 3. Change one variable at a time.
- Debugger/breakpoint if env supports it — one breakpoint beats ten logs
- Targeted logs at boundaries that distinguish hypotheses
- Tag every debug log with a unique prefix e.g.
[DEBUG-a4f2]— cleanup becomes a single search
For performance regressions: establish a baseline measurement first, then bisect. Measure first, fix second.
Phase 5 — Fix + regression test
Write the regression test before the fix — but only if there is a correct seam for it.
If a correct seam exists:
- Turn the minimised repro into a failing test
- Watch it fail
- Apply the fix
- Watch it pass
- Re-run the Phase 1 loop against the original scenario
Phase 6 — Cleanup + post-mortem
- Original repro no longer reproduces
- Regression test passes (or absence of seam is documented)
- All
[DEBUG-...]instrumentation removed - Throwaway prototypes deleted
- The correct hypothesis is stated in the commit message
Then ask: what would have prevented this bug?