name: dx-bug-all
description: Run the full bug fix workflow — triage, verify, and fix — all in one command. Fetches the bug from ADO, reproduces it in browser, generates and executes a fix, and creates a PR. Use when starting work on a bug ticket.
when_to_use: "Use when starting work on a bug ticket — fetches from ADO, reproduces it in browser, generates a fix, and creates a PR. Trigger on 'fix bug', 'work on bug ticket', 'bug all', or when given a Bug work item ID."
argument-hint: ""
allowed-tools: ["read", "edit", "search", "write", "agent"]
You are a coordinator for the bug pipeline. You delegate each step via the Skill tool — you never implement a fix yourself. You run in one of two modes, decided once up front:
- Local / interactive (
DX_PIPELINE_MODEunset) — the original hands-on flow: parse the id, optional hub dispatch, run triage → verify → fix withTaskListprogress and a retry-once on failure. This mode is unchanged from before recovery existed. - Pipeline / autonomous (
DX_PIPELINE_MODE=true) — the same three steps wrapped in resumable recovery: the per-ticketbugfix/<id>-*branch is a durable state store, every step checkpointsresume-state.json, a blocked/crashed run resumes when a human replies the trigger token, and the ticket never goes silent.
PIPELINE_MODE="${DX_PIPELINE_MODE:-}" # "true" → autonomous; else local
If PIPELINE_MODE is true, follow Pipeline mode below and ignore the Local-mode section. Otherwise follow Local mode.
Local mode (interactive — unchanged)
Progress Tracking
Before creating tasks, use TaskList to check for existing tasks from a previous run. If stale tasks exist, cancel them so the list is clean. Then create one task per item with TaskCreate: (1) Triage bug, (2) Verify reproduction, (3) Fix bug. Mark each in_progress when starting and completed when done.
Local flow
digraph bug_all_local {
"Parse argument" [shape=box];
"Hub dispatch?" [shape=diamond];
"Hub dispatch" [shape=box];
"Dispatch triage" [shape=box];
"Triage succeeded?" [shape=diamond];
"Dispatch verify" [shape=box];
"Dispatch fix" [shape=box];
"Fix succeeded?" [shape=diamond];
"Final summary + log run" [shape=doublecircle];
"Retry triage" [shape=box];
"Report FAIL (triage)" [shape=doublecircle];
"Retry fix" [shape=box];
"Report FAIL (fix)" [shape=doublecircle];
"Parse argument" -> "Hub dispatch?";
"Hub dispatch?" -> "Hub dispatch" [label="hub mode active"];
"Hub dispatch?" -> "Dispatch triage" [label="not hub mode"];
"Dispatch triage" -> "Triage succeeded?";
"Triage succeeded?" -> "Dispatch verify" [label="success"];
"Triage succeeded?" -> "Retry triage" [label="FAIL"];
"Retry triage" -> "Dispatch verify" [label="success"];
"Retry triage" -> "Report FAIL (triage)" [label="FAIL"];
"Dispatch verify" -> "Dispatch fix" [label="success | blocked | FAIL (warn + continue)"];
"Dispatch fix" -> "Fix succeeded?";
"Fix succeeded?" -> "Final summary + log run" [label="success"];
"Fix succeeded?" -> "Retry fix" [label="FAIL"];
"Retry fix" -> "Final summary + log run" [label="success"];
"Retry fix" -> "Report FAIL (fix)" [label="FAIL"];
}
Parse argument
The argument is the ADO work item ID — a numeric value (e.g., 2453532). If the user provides a full ADO URL, extract the numeric ID. If no argument is provided, ask the user for the work item ID.
Hub dispatch?
Read shared/hub-dispatch.md. Hub mode is active when: DX_PIPELINE_MODE is NOT set, .ai/config.yaml has hub.enabled: true, and the cwd is a .hub/ directory. If active: print Hub mode detected. Use /dx-hub-dispatch <ticket-id> to dispatch this bug to repo terminals. and STOP. Otherwise proceed to triage.
Dispatch triage / verify / fix
Skill(/dx-bug-triage <id>)→ printStep 1/3 done —+ summary.Skill(/dx-bug-verify <id>)→ printStep 2/3 done —+ summary. success / blocked / FAIL all continue to fix (verify enhances confidence but is not required).Skill(/dx-bug-fix <id>)→ printStep 3/3 done —+ summary.
Retry / Report FAIL
Retry a failed triage or fix once with the same skill. If still failing, print which step failed and what succeeded, suggest running the individual skill, and STOP. Dependency rules: triage FAIL → STOP (no bug data); verify FAIL → WARN + continue; fix FAIL → STOP after reporting.
Final summary + log run
Find the spec directory and present the completion table (raw-bug.md / triage.md / verification.md / implement.md / verification-local.md, PR URL, what-was-done list). Then log the run (identical to before): append a record to .ai/learning/raw/runs.jsonl (flow:"bug-all", severity, priority, component, verification, fix_result, local_verification, pr_created) and a pattern to .ai/learning/raw/bugs.jsonl (component, root_cause, files_changed). Bug hotspot check: if any component has ≥3 bugs in bugs.jsonl, print a learning note. Known pattern check: if fixes.jsonl exists and the root cause matches a known error_type, print a note. Skip silently if none.
Pipeline mode (autonomous — resumable recovery)
Same three steps, wrapped in recovery. You DO read state and post ADO comments here (you do not in local mode). Coarse-step granularity: triage → verify → fix → finalize. The bugfix/<id>-<slug> branch (created by triage, pushed by every checkpoint) is the durable store; a crashed/blocked container resumes from the committed resume-state.json.
Argument + USER_INPUT
$ARGUMENTS is <id-or-url> [free text]. Extract numeric TICKET_ID; trailing text is INLINE_INPUT. The free text after the trigger token is the primary instruction — exactly like /dx-bug-all <id> <instruction> locally. Resolve USER_INPUT once:
- If
INLINE_INPUTnon-empty →USER_INPUT="$INLINE_INPUT",TRIGGER_COMMENT_ID="". - Else (fired by an ADO comment) fetch comments and pick the triggering one —
mcp__ado__wit_list_work_item_commentswithworkItemId=$TICKET_ID; choose the comment that is (a) authored by a non-bot identity, (b) contains$TRIGGER_TOKEN, (c) has the highest ADO comment id. Strip the token →USER_INPUT; record its id asTRIGGER_COMMENT_ID. USER_INPUTmay be empty (bare@<keyword>with no words) — then only the bug body drives the work.
USER_INPUT is authoritative where it conflicts with the bug body. Pass it to triage/fix as extra context, and use it (with TRIGGER_COMMENT_ID vs COMMENT_CURSOR) on the done-reopen and blocked-answer paths.
Recovery config (read once)
Never hardcode the token or the cap — read from .ai/config.yaml (defaults applied):
TRIGGER_TOKEN=$(bash .ai/lib/dx-common.sh yaml-val 'dx-bug-all.recovery.trigger-token'); TRIGGER_TOKEN=${TRIGGER_TOKEN:-@kai-bugfix}
KEYWORD=${TRIGGER_TOKEN#@} # bare form used in bot-emitted text
MAX_ATTEMPTS=$(bash .ai/lib/dx-common.sh yaml-val 'dx-bug-all.recovery.max-attempts'); MAX_ATTEMPTS=${MAX_ATTEMPTS:-3}
The ADO Service Hook comment filter MUST equal
trigger-token— single source of truth. The bot never emits the literal token (it writes the keyword bare and tells the human to prefix@), so its own comments can't self-trigger the hook.
Pre-flight (HARD GATE)
bash $CLAUDE_PLUGIN_ROOT/skills/dx-bug-all/scripts/preflight.sh
If it exits non-zero, STOP: post the stderr to ADO (mcp__ado__wit_add_work_item_comment), touch .ai/run-context/ado-comment-posted.flag, exit non-zero. No other action.
Then touch the orchestration flag — unconditionally, on EVERY pipeline run (fresh AND resume), before Phase 0:
mkdir -p .ai/run-context && touch .ai/run-context/orchestrating.flag
This is what makes the fix step's PR idempotent: /dx-pr-commit reads this flag to set ORCHESTRATED=1 and switch to update-mode when an Active PR already exists (no duplicate PR on a resume after a crash mid-fix). .ai/run-context/ is runtime-only (not committed to the branch), so a fresh resume container must re-create the flag here — do NOT rely on the fresh-init touch alone.
State files
| File | Created in | Purpose |
|---|---|---|
resume-state.json |
after triage (fresh) / read on resume | Durable status — status, last-completed-step, blocked-at-step, blocker{}, answer-attempts, comment-cursor, run-history[]. Phase 0 dispatches on it. Committed every checkpoint. |
raw-bug.md, triage.md |
triage | Bug dump + component analysis |
verification.md |
verify | Repro result + screenshots |
implement.md, verification-local.md |
fix | Fix plan + post-fix local check |
followup.md |
reopen path | Verbatim USER_INPUT that reopened a completed bug |
bug-progress.md |
initialized lazily; updated per step | Step status table |
report.md |
finalize / ABORT | Final human-readable report |
Flow
digraph bug_all_pipeline {
"Preflight (hard gate)" [shape=box];
"Phase 0: resume-check dispatch" [shape=box];
"Read resume comment + apply answer" [shape=box];
"Reopen completed (follow-up)?" [shape=diamond];
"Reopen: re-enter fix with delta" [shape=box];
"No-op: already done (PR exists)" [shape=doublecircle];
"Re-post hard note + exit" [shape=doublecircle];
"Ambiguous: ask which branch + exit" [shape=doublecircle];
"Step: triage" [shape=box];
"Triage ok?" [shape=diamond];
"Checkpoint triage" [shape=box];
"Step: verify (non-blocking)" [shape=box];
"Checkpoint verify" [shape=box];
"Step: fix (idempotent PR)" [shape=box];
"Fix ok?" [shape=diamond];
"Checkpoint fix" [shape=box];
"Finalize: report + ADO comment + flag + done" [shape=doublecircle];
"ABORT: classify blocker" [shape=box];
"Resumed needs-input gate failed again?" [shape=diamond];
"answer-attempts < max?" [shape=diamond];
"Post sharper question (blocked-needs-input)" [shape=box];
"Downgrade to blocked-hard" [shape=box];
"Persist state + ADO comment + flag + exit" [shape=doublecircle];
"Preflight (hard gate)" -> "Phase 0: resume-check dispatch";
"Phase 0: resume-check dispatch" -> "Step: triage" [label="fresh"];
"Phase 0: resume-check dispatch" -> "Step: triage" [label="resume-forward (RE_ENTER_STEP=triage|verify|fix)"];
"Phase 0: resume-check dispatch" -> "Finalize: report + ADO comment + flag + done" [label="resume-forward (RE_ENTER_STEP=finalize)"];
"Phase 0: resume-check dispatch" -> "Read resume comment + apply answer" [label="resume-blocked-input"];
"Phase 0: resume-check dispatch" -> "Re-post hard note + exit" [label="resume-blocked-hard"];
"Phase 0: resume-check dispatch" -> "Reopen completed (follow-up)?" [label="done"];
"Phase 0: resume-check dispatch" -> "Ambiguous: ask which branch + exit" [label="ambiguous-branch"];
"Reopen completed (follow-up)?" -> "Reopen: re-enter fix with delta" [label="fresh comment (id > cursor)"];
"Reopen completed (follow-up)?" -> "No-op: already done (PR exists)" [label="no"];
"Reopen: re-enter fix with delta" -> "Step: fix (idempotent PR)";
"Read resume comment + apply answer" -> "Step: triage" [label="re-enter RE_ENTER_STEP"];
"Step: triage" -> "Triage ok?";
"Triage ok?" -> "Checkpoint triage" [label="success"];
"Triage ok?" -> "ABORT: classify blocker" [label="FAIL (after 1 retry)"];
"Checkpoint triage" -> "Step: verify (non-blocking)";
"Step: verify (non-blocking)" -> "Checkpoint verify" [label="success | blocked | FAIL → warn"];
"Checkpoint verify" -> "Step: fix (idempotent PR)";
"Step: fix (idempotent PR)" -> "Fix ok?";
"Fix ok?" -> "Checkpoint fix" [label="success"];
"Fix ok?" -> "ABORT: classify blocker" [label="FAIL (after 1 retry)"];
"Checkpoint fix" -> "Finalize: report + ADO comment + flag + done";
"ABORT: classify blocker" -> "Resumed needs-input gate failed again?";
"Resumed needs-input gate failed again?" -> "answer-attempts < max?" [label="yes"];
"Resumed needs-input gate failed again?" -> "Persist state + ADO comment + flag + exit" [label="no (first failure)"];
"answer-attempts < max?" -> "Post sharper question (blocked-needs-input)" [label="yes"];
"answer-attempts < max?" -> "Downgrade to blocked-hard" [label="no (cap)"];
"Post sharper question (blocked-needs-input)" -> "Persist state + ADO comment + flag + exit";
"Downgrade to blocked-hard" -> "Persist state + ADO comment + flag + exit";
}
Node Details
Phase 0: resume-check dispatch
RC=$(bash $CLAUDE_PLUGIN_ROOT/skills/dx-bug-all/scripts/resume-check.sh "$TICKET_ID")
val() { sed -n "s/^$1=//p" <<<"$RC" | head -1; }
DISPATCH=$(val DISPATCH); BRANCH=$(val BRANCH); SPEC_DIR=$(val SPEC_DIR)
STATUS=$(val STATUS); RE_ENTER_STEP=$(val RE_ENTER_STEP)
ANSWER_ATTEMPTS=$(val ANSWER_ATTEMPTS); LAST_COMPLETED_STEP=$(val LAST_COMPLETED_STEP)
BLOCKED_AT_STEP=$(val BLOCKED_AT_STEP); COMMENT_CURSOR=$(val COMMENT_CURSOR)
MAX_ATTEMPTS=$(val MAX_ATTEMPTS)
resume-check.sh owns anchored discovery (bugfix/<id>-* / feature/<id>-* — never a bare *<id>* substring), the 0 / 1 / >1 decision, and the reverse branch→spec-dir mapping. It is read-only w.r.t. resume-state.json (never bumps answer-attempts). Dispatch on DISPATCH:
| DISPATCH | meaning | action |
|---|---|---|
fresh |
no branch / no committed state | run Step: triage (it creates bugfix/<id>-<slug> + spec dir). After triage, init state (below). |
resume-forward |
prior run crashed (in-progress) |
re-enter RE_ENTER_STEP (triage idempotently skips, verify/fix re-run). RE_ENTER_STEP may be finalize — a crash after the fix checkpoint (last-completed-step=fix) resumes straight at Finalize (reuse the committed report.md/PR; no re-triage/verify/fix). Reuse committed artifacts. Does NOT consume an answer-attempt. |
resume-blocked-input |
recoverable block awaiting a human answer | run Read resume comment + apply answer, then re-enter RE_ENTER_STEP. |
resume-blocked-hard |
unrecoverable, or answer-attempts cap reached |
Re-post hard note + exit — re-post the "needs manual fix / re-tag KAI-DEV-AUTOMATION" note; exit non-zero. |
done |
finalize already succeeded | Reopen completed work (below) on a fresh follow-up comment, else no-op. |
ambiguous-branch |
>1 (bugfix|feature)/<id>-* match |
post a needs-user-input comment asking which branch to resume (loop-safe); set blocked-needs-input; exit. Do NOT pick one. |
git-rules.md exception. Reusing the per-ticket branch for the same bug, continued is a deliberate, scoped exception, allowed only when the anchored match is unique (ambiguous → blocker, not reuse).
Fresh-run state init (only on fresh, after triage creates the spec dir):
SPEC_DIR=$(ls -d .ai/specs/${TICKET_ID}-*/ 2>/dev/null | head -1); SPEC_DIR="${SPEC_DIR%/}"
sed "s/{TICKET}/$TICKET_ID/" \
"$CLAUDE_PLUGIN_ROOT/skills/dx-bug-all/templates/resume-state.json.tmpl" \
> "$SPEC_DIR/resume-state.json"
(orchestrating.flag was already touched in Pre-flight.) On a resume dispatch resume-state.json already exists on the branch — read it, do not re-init.
Step: triage
Skill(/dx-bug-triage <id>), weaving USER_INPUT into the request if non-empty. On resume-forward/reopen, triage is idempotent (skips re-fetch when raw-bug.md is current). On FAIL, retry once; if still failing → ABORT with blocker class needs-user-input (component not identifiable) or transient (MCP/network). Triage FAIL is terminal for the run — no bug data, no fix.
Checkpoint triage
bash $CLAUDE_PLUGIN_ROOT/skills/dx-bug-all/scripts/update-progress.sh "$SPEC_DIR" "triage" "done" "<component>"
bash $CLAUDE_PLUGIN_ROOT/skills/dx-bug-all/scripts/save-state.sh "$SPEC_DIR" "triage"
If save-state.sh exits 3 (BRANCH-ADVANCED), STOP — a concurrent resume is in flight; do not force-push.
Step: verify (non-blocking)
Skill(/dx-bug-verify <id>). Verify enhances confidence but is not required: success, blocked (no browser/URL/tools), and FAIL (could-not-reproduce) all continue to fix — record the result, never abort. Then Checkpoint verify (save-state.sh "$SPEC_DIR" "verify").
Step: fix (idempotent PR)
Skill(/dx-bug-fix <id>). This plans (implement.md), executes, builds, and opens a PR via /dx-pr-commit. Idempotency (M5): /dx-bug-fix//dx-pr-commit detect an existing open PR for bugfix/<id>-* and switch to update-mode (push commits + refresh description) instead of creating a second PR — so a re-trigger after a crash mid-fix never duplicates the PR. On FAIL, retry once; if still failing → ABORT (hard for build failures surviving the inner retries; needs-user-input if the fix is ambiguous and /dx-bug-fix asked a question). Then Checkpoint fix.
Finalize: report + ADO comment + flag + done
- Render
report.md(completion table + PR URL + what-was-done + verify result). - Log the run exactly as Local mode does (
runs.jsonl+bugs.jsonl; hotspot + known-pattern checks). - Post a truncated report to ADO (
mcp__ado__wit_add_work_item_comment), then record it so the pipeline fallback doesn't double-post:mkdir -p .ai/run-context && touch .ai/run-context/ado-comment-posted.flag - Mark terminal + final checkpoint:
jq '.status="done"' "$SPEC_DIR/resume-state.json" > "$SPEC_DIR/.rs.tmp" && mv "$SPEC_DIR/.rs.tmp" "$SPEC_DIR/resume-state.json" bash $CLAUDE_PLUGIN_ROOT/skills/dx-bug-all/scripts/save-state.sh "$SPEC_DIR" "finalize" rm -f .ai/run-context/orchestrating.flag
Read resume comment + apply answer
Only on resume-blocked-input (or ambiguous-branch resolution). Fetch comments; select the one that is non-bot, contains $TRIGGER_TOKEN, has the highest id, and id > stored comment-cursor. If none qualifies → cheap exit (stray re-trigger). Strip the token → continue-input; apply it to blocked-at-step (e.g. component/repo hint for triage; repro URL for verify; file/anchor guidance for fix; the chosen branch for ambiguous-branch). Commit the updated comment-cursor atomically before re-entering (via jq into resume-state.json then save-state.sh), else a crash reprocesses the same comment. Re-enter RE_ENTER_STEP. If the gate fails again, see ABORT's re-ask loop. When quoting the human's prior answer, never echo the literal token — render @<keyword>.
Reopen completed work (follow-up comment)
Only on DISPATCH=done. A completed run is normally a no-op, but a reviewer may want a small refinement on the same branch. Reopen only if USER_INPUT is non-empty and TRIGGER_COMMENT_ID > COMMENT_CURSOR (numeric; empty cursor = 0). Otherwise → no-op: post once "Already completed in PR. Reply @<keyword> <change> to request a follow-up, or open a new ticket." and exit 0.
To reopen: record USER_INPUT verbatim in followup.md; seed status="in-progress", last-completed-step="verify", advance comment-cursor=$TRIGGER_COMMENT_ID, append a run-history reopen entry; save-state.sh "$SPEC_DIR" "verify" (advancing the cursor before re-entry is the loop guard — once this run reaches done, the same comment is at/below the cursor and can't reopen again). Then re-enter Step: fix with the follow-up delta (reuse committed triage.md/implement.md; do not re-fetch or re-triage). This path does NOT touch answer-attempts.
ABORT path (any terminal/blocked step)
Order matters: classify first, persist state, comment, flag, then exit.
- Classify the blocker → set
blocker.class(needs-user-input|transient|hard),blocker.recoverable,blocker.reason,blocker.needs,blocked-at-step, andstatusinresume-state.json(viajq). Use the taxonomy below. Re-ask loop (resumedneeds-inputstep that failed again): ifanswer-attempts < $MAX_ATTEMPTS, bump it, setstatus=blocked-needs-input, post a sharper question (loop-safe — never echo the literal token). Else downgrade tostatus=blocked-hardand recommend re-tagKAI-DEV-AUTOMATION. - Write the failure
report.md(verified: false, failing step marked, Recovery section using@<keyword>). - Persist state — commit + push:
(Exit 3 =bash $CLAUDE_PLUGIN_ROOT/skills/dx-bug-all/scripts/save-state.sh "$SPEC_DIR" "$BLOCKED_AT_STEP"BRANCH-ADVANCED→ concurrent resume; stop.) - Discard uncommitted source edits (spec files are committed and survive; the fix re-runs on resume):
git checkout -- . 2>/dev/null || true - Post the classified ADO comment + record the flag so the pipeline fallback doesn't double-post:
mkdir -p .ai/run-context && touch .ai/run-context/ado-comment-posted.flag - Clean up
orchestrating.flag(leaveado-comment-posted.flagfor the pipeline). Exit non-zero.
Blocker taxonomy
| class | examples | recovery |
|---|---|---|
needs-user-input |
component not identifiable in triage, ambiguous repro, missing repro URL, /dx-bug-fix asked a clarifying question, ambiguous-branch |
human replies @<keyword> <answer> → resume at blocked step |
transient |
MAX_TURNS / step timeout / MCP unreachable / network blip — infra only | re-trigger (any @<keyword> reply, or re-run) → resume forward from last-completed-step |
hard |
build failure surviving /dx-bug-fix's inner retries, fix not achievable as a small change, answer-attempts cap reached |
not recoverable here → recommend re-tag KAI-DEV-AUTOMATION (full DevAgent) |
Blocker comment format (loop-safe)
The literal trigger token never appears in the bot's own text — say the keyword bare and tell the user to prefix @. Holds even when quoting the human's prior answer.
⚠️ /dx-bug-all paused on #<id> — needs your input
**What failed:** <step> — <reason>.
**Recoverable:** yes (answer-attempt <n> of <max>)
**What I need:** reply on this Bug, beginning your comment with the kai-bugfix
keyword (prefixed with @ so it re-triggers me), followed by <the specific input>, e.g.
@<keyword> repro-url=https://qa.example.com/...
<!-- dx-bug-all:blocked step=<step> reason=<short> attempt=<n> comment-cursor=<id> -->
The HTML marker lets the next run parse step/attempt/comment-cursor deterministically. Render the keyword as @<keyword> in examples so the line can't self-trigger the hook.
Return contract
When invoked from an orchestrator, emit at the end:
## Return
verdict: pass | warn | fail
summary: <one sentence>
artifacts:
- $SPEC_DIR/report.md
- $SPEC_DIR/resume-state.json
next_action: <human-readable next step or "none">
Rules
- Coordinator only — delegate every step via the Skill tool; never implement a fix yourself.
- Mode gate is first — decide Local vs Pipeline once; never mix (no checkpoints/ADO comments in Local mode, no
TaskList-only flow in Pipeline mode). - Sequential dependencies — never dispatch step N+1 until step N returns (except verify FAIL → continue to fix).
- Strict checkpoints (Pipeline) — checkpoint after every successful step; a
BRANCH-ADVANCED(exit 3) means stop, never force-push. - Never go silent (Pipeline) — every terminal path posts an ADO comment and touches
ado-comment-posted.flag. - Loop-safe — the bot never emits the literal trigger token.
Platform Compatibility
Skill() calls work on Claude Code and Copilot CLI. Copilot CLI / VS Code Chat fallback (if subagent skill invocation fails): run the steps manually — /dx-bug-triage <id> → /dx-bug-verify <id> → /dx-bug-fix <id>.