name: fix-bug
description: "Use when the user reports an error with stack trace or screenshot, describes unexpected behavior, build/test failures occur, OR provides a batch of issues to fix against a running system that exposes an end-to-end verification surface — API, CLI, REPL, chat agent, or mobile deeplink ('fix these N issues against the API', 'dogfood this batch', '修一批 issue 通过平台自验证'). Triggers: '修 bug', '报错', '不work', '为什么', 'fix this', stack trace pasted, multi-issue list, #N / issue N GitHub references. Single-bug input runs the linear diagnostic; multi-issue input WITH the verification surface present switches to multi-issue loop mode (multi-issue WITHOUT a verification surface falls back to per-issue single-bug flow, not loop mode). Compound 'why does X behave + fix X' inputs stay here — the Hard Gate resolves the why-question via primary sources before hypothesis generation. Not when: user only wants an explanation of behavior with no reported defect (answer directly), or wants a feature added (use brainstorm or write-plan)."
Input
Trigger this command when:
- User reports an error with stack trace or screenshot
- User describes unexpected behavior
- Build/test failures occur
- User provides a batch of bugs/issues to fix against a running system
If input is incomplete, use AskUserQuestion with a single batch covering the missing pieces:
- Steps to reproduce
- Expected vs actual behavior
- Full error message or stack trace
Ask in one turn, not three. Only include questions for fields you don't already have.
Fallback: if AskUserQuestion is not available in the current invocation context (e.g., skill invoked via hook or programmatic dispatch), ask in prose as a single consolidated message instead — do not split into sequential turns.
Mode Detection
Inspect the input. If all of these hold, switch to the multi-issue loop documented in dev-workflow/references/multi-issue-loop.md:
- Input references 2+ issues (e.g., #N1 #N2 #N3, "fix these 4 issues", "dogfood this batch", a list of bug IDs, or 2+ symptom paragraphs separated as items)
- The system under repair has an end-to-end verification surface (HTTP API, CLI, REPL, chat agent, mobile deeplink, or equivalent)
- The user expects verification through that surface (not just unit-test green) — explicit ("verify via the platform itself"), or implicit (the bugs are user-visible behaviors that only manifest at runtime)
In loop mode, the linear flow below is wrapped by a Baseline → Bundle → per-bundle pipeline. Within each bundle, this skill's Steps 1–6 (diagnostic) still run per-issue — only Step 7 (/write-plan) is invoked once per bundle, covering all issues in that bundle. Read dev-workflow/references/multi-issue-loop.md and follow its L0–L5 process; treat the steps below as the diagnostic substrate that L4.0 calls into.
Otherwise (single bug, OR no end-to-end verification surface): proceed with the steps below as the normal single-bug flow.
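The three conditions above form a plain conjunction, which can be sketched as follows. This is a minimal illustration only: the `BugInput` class, its field names, and the returned mode strings are hypothetical and not part of the skill.

```python
from dataclasses import dataclass


@dataclass
class BugInput:
    """Hypothetical summary of what the user supplied."""
    issue_count: int                    # distinct issues or symptom items referenced
    has_verification_surface: bool      # HTTP API, CLI, REPL, chat agent, deeplink, ...
    expects_surface_verification: bool  # explicit, or implied by runtime-only symptoms


def detect_mode(inp: BugInput) -> str:
    """Return "multi-issue-loop" only when all three conditions hold."""
    if (
        inp.issue_count >= 2
        and inp.has_verification_surface
        and inp.expects_surface_verification
    ):
        return "multi-issue-loop"
    # Single bug, or no end-to-end surface: per-issue single-bug flow.
    return "single-bug"
```

A batch of issues without a verification surface therefore still resolves to the single-bug flow, matching the fallback described above.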
Hard Gate: "Why does X behave this way" Questions
When the bug report includes a "why does X happen" / "how is X supposed to work" / "what's the design intent of Y" sub-question (separate from "fix the error"), resolve it via primary sources before generating hypotheses:
- Required first action: Read the relevant files (or dispatch Explore) and locate the actual behavior. Then run git log -p {file} or git blame if intent over time matters.
- Forbidden: Inferring "what was intended" from function names, comments alone, or training-data patterns. Step 2.5 (Understand intent) already enforces this for non-trivial logic, but the gate also applies the moment the user asks the question: do not answer from memory and then dive into hypotheses.
- Self-check: Did I read or grep at least one specific file in this turn before stating what the code "is supposed to" do? If no, do that first.
The user's friction reports show speculative architecture answers are a top frustration mode; this gate prevents starting a fix-bug flow with a wrong premise.
现状/预期 Required Block (Pre-Edit Gate)
In single-bug mode, before invoking any Edit / Write / MultiEdit / NotebookEdit tool, you MUST print this block — verbatim format, both lines required:
**现状**: <one sentence describing what the user actually sees, no code-layer terms>
**预期**: <one sentence describing what the user should see after fix, no code-layer terms>
Rules:
- "现状" describes the user-observable symptom (e.g., "点相册按钮变成拍照功能", i.e. tapping the album button opens the camera instead), NOT the root cause hypothesis
- "预期" describes the user-observable success state (e.g., "点相册按钮进入照片选择界面", i.e. tapping the album button opens the photo picker), NOT the planned code change
- Write in the project's primary language (Chinese if the bug was reported in Chinese; English otherwise)
- The colon may be half-width (:) or full-width (:); the hook recognizes both
English variant (use when the bug was reported in English). Note that the hook accepts only the ASCII colon for the English form:
**Current**: <one sentence describing what the user actually sees>
**Expected**: <one sentence describing what the user should see after fix>
The bug-fix-gate hook (dev-workflow/hooks/bug-fix-gate.py) detects this block; if the block is missing inside /fix-bug, the Edit is blocked.
The block's truthfulness is your responsibility: the hook can only verify presence, not whether "现状" actually matches what the user reported. Writing a fake block to bypass the gate violates fix-bug protocol.
If you see [fix-gate] after fixing a [readback-mandate] block: both gates are independent. readback-mandate checks user-intent alignment; fix-gate checks the bug-fix expected-behavior anchor. Satisfy each separately.
Why this exists: the frustration audit (.claude/research/frustration-audit-2026-05-23.md) showed multiple cases where the model fixed the wrong bug or did a partial undo because the misunderstanding wasn't surfaced before code changes. Stating the user-visible target explicitly gives the user a checkpoint to redirect cheaply.
Multi-issue loop mode: see dev-workflow/references/multi-issue-loop.md — that mode's per-bundle reproduction step already serves this function. Skip the block requirement when multi-issue mode is active.
Process
Step Echo (audit aid — read before executing any step)
Before executing any step below (pre-0, 0, 0.5, 0.7, 0.8, 0.9, 1, 2, 2.5, 3, 4, 4.5, 5, 6, 7, 8, 9, 10), emit a one-line marker as the FIRST line of the response chunk for that step:
[fix-bug] Step={n} — {name}
Where {n} is the step number (e.g., 3 or 4.5) and {name} is the step's bold title (e.g., BV: Generate falsifiable assertions).
This is an audit aid, not an enforced gate: no hook intercepts a missing marker. Its value is post-hoc: the user can grep the response for [fix-bug] Step= to see which steps actually ran. Skipping from Step 2 directly to Step 7 ("Plan the fix") without emitting markers for 3/4/5/6 leaves the gap visible.
Why this exists: the 2026-04 to 2026-05 insights report shows that the assistant frequently lands on the first plausible hypothesis and ships a fix, skipping the BV (Step 3) / verification (Step 4) / value-domain trace (Step 5) gates. Explicit step markers make skipping visible to the user even when it's not blocked.
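The post-hoc audit amounts to collecting the markers from a transcript and listing the steps that never appeared. A minimal sketch, assuming the marker format shown above; the function names are hypothetical, and note that Steps 2.5, 4.5, 5, and 6 are trigger-based, so a missing marker for those is a prompt to check rather than proof of a skip.

```python
import re

# Step order as listed in the Step Echo rule.
EXPECTED_STEPS = [
    "pre-0", "0", "0.5", "0.7", "0.8", "0.9", "1", "2", "2.5",
    "3", "4", "4.5", "5", "6", "7", "8", "9", "10",
]
# Matches the literal marker "[fix-bug] Step={n} — {name}" at the start of a line.
MARKER = re.compile(r"^\[fix-bug\] Step=(\S+) — ", re.MULTILINE)


def emitted_steps(transcript: str) -> list[str]:
    """Return step identifiers in the order their markers appear."""
    return MARKER.findall(transcript)


def missing_between(transcript: str, first: str, last: str) -> list[str]:
    """List expected steps from first to last (inclusive) that have no marker."""
    seen = set(emitted_steps(transcript))
    i, j = EXPECTED_STEPS.index(first), EXPECTED_STEPS.index(last)
    return [s for s in EXPECTED_STEPS[i:j + 1] if s not in seen]
```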
Step pre-0: Readback Gate (always — required for fix-bug)
Before any reproduction / diagnosis / planning work:
Collect inputs for readback:
- user_request: the user's original prompt (full text)
- context_terms: 3-5 project-specific terms the user used in this session
Dispatch the readback:intent-echoer agent via the Task tool (subagent_type = readback:intent-echoer).
Write .claude/readback-state.json:
First, capture the agent's verbatim output into a shell variable. Substitute the literal content between the EOF markers with the actual text returned by intent-echoer; do not modify, escape, or summarize. (If agent output contains the literal string EOF_AGENT_OUTPUT, use a unique marker variant for both lines.)
AGENT_OUTPUT=$(cat <<'EOF_AGENT_OUTPUT'
{paste the intent-echoer agent's literal output here; do not modify}
EOF_AGENT_OUTPUT
)
Then write state:
mkdir -p .claude
jq -n \
  --arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --arg text "$AGENT_OUTPUT" \
  '{
    created_at: $ts,
    session_id: null,
    skill: "fix-bug",
    readback_done: true,
    readback_text: $text,
    user_confirmed: false,
    confirmed_at: null,
    correction_count: 0
  }' > .claude/readback-state.json
Note: skill bash writes session_id: null (it cannot read hook stdin); the readback plugin's PreToolUse hook stamps the real session id into the file on first read after user confirmation. See readback/references/state-schema.md for the v2 two-phase identity model.
Present agent output VERBATIM to user. Stop. Do not proceed to Step 0.
Wait for user response:
- "go" / "OK" / "对" / an equivalent expression → update state user_confirmed: true, confirmed_at: <now> → continue to Step 0
- Correction → increment correction_count, re-dispatch agent with correction, present again
- correction_count ≥ 2 → STOP, suggest user invoke /dev-workflow:brainstorm (alignment broken upstream)
Why this gate exists
readback plugin's pre-tool-use.sh hook is registered globally. When skill=fix-bug and user_confirmed=false, it BLOCKS Write/Edit/MultiEdit/NotebookEdit. This step ensures the model produces a plain-language echo and gets user confirmation before any code changes. See the readback plugin in the indie-toolkit marketplace (readback@indie-toolkit).
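The response handling in this step is a small state machine over the fields written to the state file. A minimal sketch under stated assumptions: the `CONFIRMATIONS` set and the function name are hypothetical, recognizing "equivalent expressions" needs judgment that a lookup cannot provide, and stopping as soon as the count reaches 2 is one reading of the rule above.

```python
from datetime import datetime, timezone

# Illustrative only; real confirmations are judged in context.
CONFIRMATIONS = {"go", "ok", "对"}


def apply_user_response(state: dict, response: str) -> str:
    """Update readback state in place; return "continue", "re-dispatch", or "stop"."""
    if response.strip().lower() in CONFIRMATIONS:
        state["user_confirmed"] = True
        state["confirmed_at"] = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        return "continue"
    # Anything else is treated as a correction.
    state["correction_count"] += 1
    if state["correction_count"] >= 2:
        return "stop"  # alignment is broken upstream; suggest brainstorm
    return "re-dispatch"
```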
0. Parse input and read GitHub Issue (if reference provided)
If input contains #N or issue N (e.g., /fix-bug #5, /fix-bug issue 5):
- Extract issue number N
- Run: gh issue view N --json title,body,labels,milestone
- If gh issue view returns an error (issue not found, gh not installed, no network): inform the user of the error and fall through to Step 0.5 without prior hypotheses
- Parse the issue body and extract the content under the ### Prior Hypotheses section
- Present: "Prior hypotheses from issue #N:" followed by the extracted assertions
- Show the full issue body for context
- Store these hypotheses for use in Step 3
If input does not contain an issue reference, skip to Step 0.5.
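The two parsing actions in this step, finding the issue number and pulling out the Prior Hypotheses section, can be sketched as below. This is an illustration under assumptions: the function names and the regular expression are hypothetical, and the section is assumed to end at the next markdown heading.

```python
import re
from typing import Optional

# Matches "#5" or "issue 5" (case-insensitive).
ISSUE_REF = re.compile(r"(?:#|\bissue\s+)(\d+)", re.IGNORECASE)
HEADING = re.compile(r"^#{1,6}\s")


def extract_issue_number(user_input: str) -> Optional[int]:
    """Return the first referenced issue number, or None if there is none."""
    match = ISSUE_REF.search(user_input)
    return int(match.group(1)) if match else None


def extract_prior_hypotheses(issue_body: str) -> str:
    """Return the text under '### Prior Hypotheses' up to the next heading."""
    collected, capturing = [], False
    for line in issue_body.splitlines():
        if line.strip().lower() == "### prior hypotheses":
            capturing = True
            continue
        if capturing and HEADING.match(line):
            break  # next section starts here
        if capturing:
            collected.append(line)
    return "\n".join(collected).strip()
```

An empty return value from `extract_prior_hypotheses` corresponds to proceeding without prior hypotheses.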
0.5. Retrieve historical context (via local knowledge base)
- Extract 3-5 keywords from the bug description (error type, component name, API name, symptom)
- Invoke the dev-workflow:kb skill via the Skill tool, passing the keywords as the query. The kb skill searches ~/.claude/knowledge/ (categories: api-misuse / api-usage / architecture / bug-postmortem / data-research) and returns relevant past lessons.
- If results are returned: present them as "Related historical records:" before proceeding
- If the kb skill returns no matches, or the knowledge directory is empty: skip silently and proceed to Step 1
- Do not block investigation if the kb skill is unavailable in the current invocation context
0.7. Project Health — see dev-workflow/references/project-health-scanner.md. If scanner exists and cached state is missing/red/stale (>7 days), run full mode with --reason fix --max-ms 5000 --write-state. Treat red/yellow signals as regression guards for the fix plan.
0.8. Project Context Contract + Ubiquitous Language — see dev-workflow/references/project-context-contract.md. Read docs/00-AI-CONTEXT.md if present; otherwise mark Project context contract: missing and continue. Do not create CONTEXT.md. Also check docs/02-architecture/ubiquitous-language.md (per dev-workflow/references/ubiquitous-language-pattern.md); if present, read it and use the term mappings when reasoning about the bug and describing the fix — keeps AI vocabulary aligned with the project's domain language. If absent, do not auto-create it; suggesting maintenance is part of brainstorm's flow, not fix-bug's.
Parallel execution note (Steps 0.5 / 0.7 / 0.8): these three preflight steps are read-only and independent. Issue them in one tool-call batch (single message, parallel Skill / Bash / Read calls) — Opus 4.7's 1M context easily holds all three results. Sequential execution costs latency without informational benefit.
Agent dispatch verification gate (applies to any agent dispatch later in this skill, including Step 7 /write-plan and any Explore calls): the dev-workflow/hooks/verify-agent-output.py hook intercepts sub-agent stdout — when an agent claims it wrote a file but the file isn't actually on disk, the hook surfaces "files NOT on disk" to this skill. Treat agent stdout as a claim, not a fact: after every Agent return, before acting on the agent's reported changes, verify with ls or Read against the actual paths it claims to have written. Apply this gate without exception (we have a confirmed past case of sub-agent reporting file writes that did not persist).
Step 0.9: Feedback Loop (mandatory before Step 3)
Pick the lowest viable Feedback Loop level. Long-form rationale at dev-workflow/references/feedback-loop-ladder.md.
| Level | Signal |
|---|---|
| 1 | Failing test (unit/integration/E2E) |
| 2 | Curl / HTTP against running dev server |
| 3 | CLI invocation with fixture input, diff stdout vs snapshot |
| 4 | Headless browser script (Playwright/Puppeteer) |
| 5 | Replay captured trace (network request/payload/event log) |
| 6 | Throwaway harness (minimal subset, one function call) |
| 7 | Property / fuzz loop (1000 random inputs) |
| 8 | Bisection harness (git bisect run-able) |
| 9 | Differential loop (old vs new, diff outputs) |
| 10 | HITL bash script (drive human via structured loop, last resort) |
Before Step 1, output:
[Feedback Loop] level={N} (1–10) — {one-line description of the signal}
Command: {exact command to invoke the loop}
Pass: {what stdout / exit code / state means "bug NOT present"}
Fail: {what stdout / exit code / state means "bug PRESENT"}
If no level is constructable, output [Feedback Loop] level=0 — not constructable, blocking on: {what's missing} and either build it or escalate.
Step 3 cross-reference: each assertion in Step 3 must declare which Feedback Loop level its Verify: line runs on.
1. Reproduce first
- Confirm the bug can be reproduced
- If cannot reproduce: ask for more context, do not guess
- Document the exact steps that trigger the bug
2. Understand the error
- Read the complete error message and stack trace
- Identify the exact location where the error occurs
- Note any relevant context (user action, data state)
- Native crash without stack trace (iOS/Android, simulator, or device): apply user-level CLAUDE.md "Native crash → stack trace gate" before generating any hypothesis at Step 3. If user-level CLAUDE.md is not present, the inline fallback is to get the stack trace before any hypothesis: on an iOS device, run idevicecrashreport -e -k <path> and then read the .ips file; on Android, run adb logcat -b crash; on a simulator, read the stderr stream. A hypothesis must reference a specific frame (image symbol + osVersion) from the stack trace before being voiced. Hypothesis switches without a stack trace (e.g., guessing "double sheet" → "detents" → "FocusState") are instances of the same "guess known SDK bug" framework, not a framework switch under the 3-Strike Rule.
2.5. Understand intent (trigger: bug location involves non-trivial logic — conditional branches, state machines, multi-step transformations)
Before generating assertions, answer these questions by reading code + git history:
[Intent Analysis]
- Original intent: what was this code trying to achieve?
(cite: comments, function name, commit message, surrounding context)
- Actual behavior: what does it actually do? (cite: code trace)
- Gap type: intentional workaround / unintentional bug /
incomplete implementation / outdated assumption
- Evidence for gap type: {git blame date, TODO comment, related issue, etc.}
- Current goal: what do we want it to do now?
- Delta: {original intent} -> {current goal} — same or different?
- If different: upstream/downstream impact assessment required before proceeding
If gap type = "intentional workaround": Stop. Do not treat as bug. Present finding to user: "This appears to be an intentional workaround from {date/commit}. Confirm: fix it or preserve the workaround?"
3. BV: Generate falsifiable assertions
Based on error symptoms and code context, generate 3-5 specific, falsifiable assertions. Each assertion must include: hypothesis + file location + verification method + expected outcome for both cases + which Feedback Loop level (Step 0.9) the verification runs on.
If Step 0 extracted prior hypotheses from a GitHub Issue: prepend them to the assertion list as highest-priority items, marked with source: "Prior hypothesis from issue #N". Verify these first before generating additional assertions.
[Bug Assertion 1] {specific hypothesis}
Location: {file:line}
Verify: read {file:line}; if correct, expect {X}; if wrong, expect {Y}
Assertions must cover different dimensions (pick the 3-5 most suspicious):
- Value domain error — type/range/format mismatch
- State timing error — race condition, wrong lifecycle stage
- Path routing error — data reaching wrong handler
- Missing guard — null/empty/boundary unhandled
- Stale code interference — replaced component still active
Principle: backward verification (testing specific assertions) is significantly more accurate than forward generation (guessing a single cause). Even if all assertions are falsified, the verification process exposes reasoning paths that reveal the root cause.
4. Verify assertions systematically
- Test each assertion from Step 3, one at a time
- Record result: confirmed / falsified / inconclusive
- If an assertion is confirmed: proceed to fix
- If all falsified: the verification traces usually reveal the actual cause; form a new assertion based on what you learned
- Do not test multiple assertions at once
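The one-at-a-time rule can be pictured as a loop that records each outcome and stops at the first confirmed assertion. A minimal sketch: the `Result` enum and `verify_sequentially` are hypothetical names, and stopping at the first confirmation is one reading of "proceed to fix".

```python
from enum import Enum
from typing import Callable, Optional


class Result(Enum):
    CONFIRMED = "confirmed"
    FALSIFIED = "falsified"
    INCONCLUSIVE = "inconclusive"


def verify_sequentially(
    assertions: list[str],
    check: Callable[[str], Result],
) -> tuple[Optional[str], dict[str, Result]]:
    """Test assertions in order, one at a time, recording every result.

    Returns the first confirmed assertion (or None) plus the result log.
    """
    log: dict[str, Result] = {}
    for assertion in assertions:
        log[assertion] = check(assertion)
        if log[assertion] is Result.CONFIRMED:
            return assertion, log
    # Nothing confirmed: the log is the raw material for a new assertion.
    return None, log
```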
4.5. Hypothesis reset (trigger: user negates a core assumption underlying the confirmed assertion)
Active self-check: After each user response during Step 4 verification, ask yourself: "Does the user's response contradict any premise of the current assertions?" If yes, invoke this step.
When the user provides information that invalidates the foundation of the current diagnosis (e.g., "the data is user-provided, not AI-generated"):
- Stop current investigation path
- Record what was invalidated and the user's correction
- Return to Step 3: regenerate assertions incorporating the user's new information
- After new assertions are verified, Step 7 (plan the fix) must be executed again — the previous plan (if any) was based on invalidated premises
Do NOT patch the old hypothesis. A negated foundation requires new assertions. Do NOT skip ahead to implementation — reset means the full gate chain (Step 3 → 4 → 7) restarts.
5. Trace value domain — MANDATORY GATE for value-related bugs
⛔ If bug symptom is value-related, this step is MANDATORY. Do not proceed to Step 7 without producing the [值域检查] table below.
A bug is value-related if: a wrong number/string/enum appears where a different one was expected, OR a field displays data from the wrong source, OR a computed result is incorrect. When in doubt, treat as value-related.
- Reverse-trace from bug location to data source (record each variable rename)
- Forward-trace consumers (LSP first, grep fallback):
  - If the project has a registered LSP server for the file's language (Swift / TypeScript / Python / Rust / Go), prefer LSP findReferences at the symbol's declaration site: it identifies real references (handles same-name-different-scope, follows renames, ignores comment/string occurrences).
  - If no LSP server, or LSP returns errors: fall back to Grep for the source field name + each intermediate variable name.
  - Hybrid is OK: LSP for the declaration's direct references, then grep for transformed copies (e.g., field renamed via destructuring).
- Verify unit/domain/format assumptions at each consumer
- Output format: [值域检查] {file:line} — {生产/消费} — 假设值域 {X} — ✅ 一致 / ❌ 同类问题 (fields: producer/consumer; assumed value domain; ✅ consistent / ❌ same class of problem)
- All ❌ must be fixed in the same pass
- Skipping this step for value-related fixes = incomplete fix, even if original symptom disappears
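A value-domain trace produces one row per producer or consumer, and every inconsistent row becomes part of the fix. A minimal sketch of that bookkeeping: the `DomainCheck` class, the file paths, and the cents-versus-dollars scenario are hypothetical; only the row format comes from the step above.

```python
from dataclasses import dataclass


@dataclass
class DomainCheck:
    location: str        # "file:line"
    role: str            # "生产" (producer) or "消费" (consumer)
    assumed_domain: str  # unit / range / format this site assumes
    consistent: bool

    def render(self) -> str:
        verdict = "✅ 一致" if self.consistent else "❌ 同类问题"
        return f"[值域检查] {self.location} — {self.role} — 假设值域 {self.assumed_domain} — {verdict}"


def unresolved(rows: list[DomainCheck]) -> list[DomainCheck]:
    """Rows that must be fixed in the same pass as the original bug."""
    return [row for row in rows if not row.consistent]


# Hypothetical trace: the producer emits cents, one consumer assumes dollars.
rows = [
    DomainCheck("api/price.py:42", "生产", "cents (int)", True),
    DomainCheck("ui/cart.py:88", "消费", "dollars (float)", False),
]
for row in rows:
    print(row.render())
```

If `unresolved(rows)` is non-empty, the fix is not complete even when the original symptom has disappeared.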
6. Check for parallel paths (trigger: Step 5 finds multiple producers, or same core function has multiple upstream callers)
- List all processing paths from source to sink
- Check coordination mechanisms between paths (shared state, mutex, idempotency check)
- Format (核心函数 = core function; 路径 = path; 协调机制 = coordination mechanism, given as a concrete code location or 无 for none):
[路径检查] {核心函数}
- 路径 A: {file:line} → {file:line} → {核心函数}
- 路径 B: {file:line} → {核心函数}
- 协调机制: {具体代码位置 / 无}
- Parallel paths without coordination = architectural issue; flag as "⚠️ needs architectural fix" and inform user; do not fix only one path
7. Plan the fix — MANDATORY GATE
⛔ DO NOT write any fix code until this step is completed and the user has approved the plan.
Expectation Gate (precedes any "how"):
Before the Task Contract structured block below, write two plain-language preambles:
- [Expected behavior]: 1–3 sentences describing what the user/reviewer will observe after the fix lands. No technical jargon. No file paths. No API names. Pure user-visible result.
- [Verifiable steps]: a bullet list. Each bullet is a single action the user or reviewer can run independently (no AI required), paired with the output they should see when the fix works. If a step requires a device or environment the AI cannot reach, mark it (needs-device).
These two blocks anchor the rest of Step 7 in user reality. The structured Task Contract below restates the same Expected behavior in machine-readable form for plan-verifier; they reinforce, not duplicate.
Do NOT proceed to the Task Contract block until [Expected behavior] and [Verifiable steps] are written.
Pre-check: If bug symptom involves value display/transfer, verify Step 5 [值域检查] table was produced. If not → return to Step 5 before proceeding.
Task Contract: Before presenting the fix plan, write:
[Task Contract]
- Expected behavior: {what the user should observe after the fix}
- Current behavior: {observed failure}
- Reproduction / verification method: {exact command, test, API call, or device path}
- Regression shield: {adjacent behavior that must remain unchanged}
- Project Health: {red/yellow signals from Step 0.7, or none}
Relationship to write-plan's Task Contract schema: when fix complexity is Complex and Step 7 invokes /write-plan, the structured **Task Contract:** block in the resulting plan uses a different (more granular) field shape: Expected behavior / Automated verify / Real path verify / Manual/device verify (see dev-workflow/skills/write-plan/SKILL.md § Task Structure). Translation when handing off: this skill's [Task Contract].Expected behavior → plan's Expected behavior; Reproduction / verification method → split into plan's Automated verify (the command/fixture) plus Real path verify (the user-perspective check); Regression shield → plan's per-task Regression shield: line. Do not duplicate this skill's [Task Contract] block into the plan; let /write-plan regenerate using its own schema.
If the bug touches Swift, iOS, macOS, SwiftUI, SwiftData, .xcodeproj, or .xcworkspace, load apple-dev:apple-swift-context internally before the fix plan.
Consumer impact (mandatory for any fix that changes a field's value or source):
Before presenting the plan, enumerate all consumers of the modified field. Use LSP findReferences on the field's declaration to get the canonical reference list (when an LSP server is available for the language). For untyped languages or projects without LSP, fall back to Grep of the field name + transformed copies. List all callers (当前读取 = currently reads; 修复后读取 = reads after fix; 行为变化 = behavior change):
[Consumer Impact]
- {consumer file:line} — 当前读取: {X} — 修复后读取: {Y} — 行为变化: {description}
Cannot produce this list = have not traced the data flow = return to Step 5.
Classify fix complexity:
Simple — ALL must be true:
- Fix is confined to ≤2 file locations
- All changes are directly evident from verified assertions
- Step 5 not triggered, or all consumers showed ✅
- Step 6 not triggered, or no parallel path issues flagged
Complex — ANY is true:
- Fix spans 3+ file locations
- Step 5 found any ❌ consumer beyond the original bug site
- Step 6 flagged parallel paths without coordination
- Fix requires architectural changes
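The two checklists translate directly into an any/all test. A minimal sketch: the `FixFacts` class and its field names are hypothetical, and treating a fix that meets neither list exactly (for example, two locations but not evident from the verified assertions) as Complex is a conservative assumption, not something the skill states.

```python
from dataclasses import dataclass


@dataclass
class FixFacts:
    file_locations: int                  # number of file locations the fix touches
    evident_from_assertions: bool        # all changes follow from verified assertions
    failing_consumers_beyond_site: int   # ❌ rows from Step 5 (0 if not triggered)
    uncoordinated_parallel_paths: bool   # Step 6 flag (False if not triggered)
    needs_architectural_change: bool


def classify(facts: FixFacts) -> str:
    """Return "Simple" only when every Simple condition holds."""
    is_complex = (
        facts.file_locations >= 3
        or facts.failing_consumers_beyond_site > 0
        or facts.uncoordinated_parallel_paths
        or facts.needs_architectural_change
    )
    if is_complex:
        return "Complex"
    is_simple = facts.file_locations <= 2 and facts.evident_from_assertions
    # Assumption: anything not clearly Simple takes the heavier path.
    return "Simple" if is_simple else "Complex"
```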
Required actions:
→ If Simple: enter Claude Code native plan mode (EnterPlanMode). Present diagnosis context (confirmed assertions, consumer impact) in the plan. User reviews and approves within plan mode, then proceed to Step 8.
→ If Complex: invoke dev-workflow:write-plan with a structured diagnosis bundle as input. The invocation prompt MUST begin with this caller marker line (literal, on its own line as the first non-empty line of the prompt):
Caller: dev-workflow:fix-bug
write-plan reads this marker as the single source of truth for caller identity (gates both Step 1 item 12 Bug-diagnosis population AND Step 2.5 echo-only mode). Without it, write-plan falls through to standalone flow.
Bundle contents (write-plan Step 1 item 12 consumes these and lands them in the plan header's **Bug diagnosis:** field):
- Confirmed assertions from Step 4: list every [Bug Assertion N] that resolved to "confirmed", with the file:line evidence cited during verification
- [值域检查] table from Step 5: paste verbatim if Step 5 was triggered; include every ❌ consumer (write-plan tasks must address all of them)
- [路径检查] table from Step 6: paste verbatim if Step 6 was triggered; include the coordination-mechanism finding
- [Consumer Impact] list from this Step 7: every consumer of the modified field with current vs post-fix read values
Readback continuity: this skill's Step pre-0 already obtained user_confirmed: true for the bug report. write-plan's Step 2.5 echo-only mode will skip re-prompting iff all of: (a) the Caller: marker above is present, (b) .claude/readback-state.json is fresh (created within last 30 min), (c) no new requirements were introduced after pre-0. Conservative default: when in doubt about (c), state it explicitly when invoking write-plan so it falls through to the full readback flow. Re-prompting costs one echo; silently skipping alignment costs a misaligned plan.
Session-freshness caveat: this continuity claim assumes Step pre-0 and Step 7 ran in the same Claude Code session. If you resumed fix-bug from a prior session (e.g., dev-workflow/references/multi-issue-loop.md state restore), the readback state file is stale relative to the current session and write-plan's freshness check will fail; full readback will run automatically. To force a fresh readback even within the same session, delete .claude/readback-state.json before invoking write-plan.
Wait for plan approval before proceeding.
Proceeding to Step 8 without a user-approved plan is a violation of this skill's protocol.
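The readback-continuity conditions described in this step can be sketched as a single predicate. This is an illustration of the three stated conditions, not write-plan's actual implementation: the function names are hypothetical, the timestamp format is taken from the state-file command in Step pre-0, and checking user_confirmed is an added assumption.

```python
from datetime import datetime, timedelta, timezone


def is_fresh(created_at: str, now: datetime,
             max_age: timedelta = timedelta(minutes=30)) -> bool:
    """True when the state file was created within max_age before now."""
    created = datetime.strptime(created_at, "%Y-%m-%dT%H:%M:%SZ").replace(
        tzinfo=timezone.utc
    )
    return timedelta(0) <= now - created <= max_age


def can_skip_readback(caller_marker_present: bool, state: dict,
                      no_new_requirements: bool, now: datetime) -> bool:
    """All conditions must hold; any doubt falls through to full readback."""
    return (
        caller_marker_present                        # (a) Caller: marker
        and state.get("user_confirmed") is True      # assumed prerequisite
        and is_fresh(state["created_at"], now)       # (b) within 30 minutes
        and no_new_requirements                      # (c) nothing new since pre-0
    )
```

Passing `no_new_requirements=False` whenever condition (c) is uncertain reproduces the conservative default: one extra echo instead of a misaligned plan.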
8. Fix the root cause
- Address the actual cause, not just the symptom
- Consider edge cases and related scenarios
- Ensure the fix doesn't introduce new issues
- After the fix is complete, suggest: "Consider running /collect-lesson to record this bug pattern for future retrieval."
9. Verify the fix
- Build the project
- Reproduce the original bug scenario - confirm it's fixed
- Test related scenarios to catch regressions
- If this fix originated from a GitHub Issue (Step 0): ask the user: "Close issue #N?" If yes, run gh issue close N, then display the closed issue URL.
10. Tradeoff Report
After verification, produce a fix report. Format depends on complexity (same classification as Step 7):
Simple fix (1-2 locations):
[Fix Summary] 修复 X/Y 项(跳过: {items} — {理由})
[Tradeoff] {修复内容} — 行为变化: {before -> after} — 代价: {known cost} — 验证: {status}
Complex fix (3+ locations):
[Fix Summary] 修复 X/Y 项(跳过: {items} — {理由})
| Issue | 修复前行为 | 修复后行为 | 收益 | 代价 | 验证状态 | 回归风险 |
|-------|-----------|-----------|------|------|---------|---------|
| ... | ... | ... | ... | ... | ... | ... |
Mandatory fields:
- 验证状态 (verification status): verified (ran test/command and saw expected output) or needs-device-verification: {specific steps} (cannot verify in current environment)
- 回归风险 (regression risk): if behavioral change, state impact scope (which callers/consumers affected)
- Completeness: "修复 X/Y 项" ("fixed X of Y items"); every skipped item must have a reason
Build passing alone is compile-time verification only. Runtime behavioral changes
(conditional rendering, data-dependent logic, network failure paths) require either
a test or explicit needs-device-verification annotation.
Escalation Rules
3-Strike Rule
If you have attempted 3+ fixes and none resolved the issue:
Stop fixing. Question the architecture.
Signals of an architectural problem:
- Each fix reveals new coupling or shared state in a different place
- Fixes require "massive refactoring" to implement
- Each fix creates new symptoms elsewhere
Action: Discuss with the user before attempting more fixes. This is not a failed hypothesis; this is likely a wrong architecture.
Proposal Rejection Circuit Breaker
If the user rejects 2 consecutive fix proposals:
Stop proposing. Return to diagnosis.
A rejection is any user response that does not approve proceeding with the proposed fix. Partial approvals ("direction is right but...") count as rejections; they indicate the proposal was insufficient.
- The rejections indicate incomplete understanding, not a communication problem
- Return to Step 5 (value domain trace) or Step 6 (parallel paths); whichever was skipped or incomplete
- Produce the full [值域检查] or [路径检查] output before making another proposal
- Do NOT ask "is this direction correct?"; show the trace results and let the data speak
- If both Step 5 and Step 6 were already completed, escalate to the 3-Strike Rule or ask the user what dimension was missed
Continuing to propose without deeper investigation = repeating the same mistake with different words.
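The circuit breaker is a counter of consecutive rejections that resets on approval. A minimal sketch: the `ProposalTracker` class and its return strings are hypothetical. Per the rule above, a partial approval is passed in as `approved=False`.

```python
class ProposalTracker:
    """Counts consecutive rejected fix proposals."""

    def __init__(self, limit: int = 2) -> None:
        self.limit = limit
        self.consecutive_rejections = 0

    def record(self, approved: bool) -> str:
        """Record one user response and return the next action."""
        if approved:
            self.consecutive_rejections = 0
            return "proceed"
        self.consecutive_rejections += 1
        if self.consecutive_rejections >= self.limit:
            # Stop proposing; go back to Step 5 or Step 6.
            return "return-to-diagnosis"
        return "revise-proposal"
```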
Diagnostic Layering (multi-component systems)
When the bug spans multiple components (e.g., CI -> build -> signing, API -> service -> database):
Before proposing any fix, add diagnostic instrumentation at each component boundary:
For EACH component boundary:
- Log what data enters the component
- Log what data exits the component
- Verify environment/config propagation
- Check state at each layer
Run once to gather evidence showing WHERE the break occurs. Then narrow investigation to the specific failing component. Do not guess which layer is broken.
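Boundary instrumentation of this kind can be added with a small logging decorator. A minimal sketch, assuming a hypothetical three-layer chain (api, service, database); the function names and payloads are invented for illustration, and the environment/config checks from the list above would need their own log lines.

```python
import functools
import logging

log = logging.getLogger("boundary")


def boundary(name: str):
    """Log what enters and what exits the wrapped component."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            log.info("%s IN  args=%r kwargs=%r", name, args, kwargs)
            result = fn(*args, **kwargs)
            log.info("%s OUT %r", name, result)
            return result
        return wrapper
    return decorator


# Hypothetical chain: api -> service -> database.
@boundary("api")
def handle_request(raw: dict) -> dict:
    return lookup(raw["user_id"])


@boundary("service")
def lookup(user_id: str) -> dict:
    return query(int(user_id))


@boundary("database")
def query(user_id: int) -> dict:
    return {"id": user_id, "found": user_id == 1}
```

With logging configured at INFO level, a single call to `handle_request` produces one IN and one OUT line per layer, so the first layer whose output is wrong marks where to narrow the investigation.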
Pattern Analysis
When the root cause isn't obvious from assertions alone:
- Find a working example — locate similar working code in the same codebase
- Compare systematically — list every difference between working and broken, however small
- Don't assume irrelevance — "that can't matter" is how bugs hide
Completion Criteria
- Root cause identified with code evidence (file:line)
- Fix applied and build passes
- Original bug scenario verified fixed (Step 9 completed)
- All ❌ consumers from value domain trace fixed (if Step 5 triggered)
- No parallel path coordination issues left unresolved (if Step 6 triggered)
- Tradeoff Report produced with verification status for every fix item (Step 10 completed)