name: issue-tracer
description: "Use when asked to trace, investigate, root-cause, plan, fix, close, or prepare a PR for a GitHub issue or bug report. Runs an evidence-first issue workflow: GitHub intake, reproduction, reasoning-guided localization, no-gap fix planning, independent critic review, user approval gate, implementation, tests, and PR-ready closure."
allowed-tools: Read Grep Glob Bash Edit MultiEdit Write WebFetch TodoWrite
Issue Tracer
Use this skill to drive a GitHub issue or bug report from intake to a reviewed closure plan, then, after explicit approval, to a minimal verified fix and PR-ready output.
The default behavior is plan-first. You MUST trace the issue end to end, produce a rock-solid plan, send that plan to an independent critic, incorporate the critic's feedback, present the reviewed plan to the user, and wait for explicit user approval before changing production code.
Source Policy
Use these sources in this order.
- GitHub source of truth:
- If `github_mcp_direct` is available, prefer it for issue fetch, PR metadata, repository metadata, file content, and repository search before falling back to CLI commands.
- Prefer `gh issue view`, `gh issue list`, `gh pr view`, `gh api`, `git log`, `git blame`, `git diff`, and local repo files.
- If a GitHub MCP server is available, use it for issue, PR, discussion, and repository metadata.
- Do not ask the user for GitHub credentials. If GitHub access fails, report the exact blocked operation and fall back to local issue text only.
- Web source of truth:
- Use `WebFetch` or equivalent web access for current framework/API behavior, release notes, deprecations, security advisories, and external service semantics.
- Any plan claim based on external docs must include the URL in the plan.
- Repository source of truth:
- Never speculate about code. Open every file before referencing it.
- Verify every symbol, type, command, test, config entry, and path against the repo.
Non-Negotiable Rules
- Quality is the only metric that matters. Time pressure does not exist.
- Do not implement before the user explicitly approves the reviewed plan.
- Reproduce or explain non-reproducibility before localizing.
- Localize before fixing. A plausible patch is not enough.
- Prefer the smallest patch that fully closes the issue without unwired functionality, untested branches, or hidden regressions.
- Use parallel reads/searches for independent files and subsystems whenever available.
- Maintain written artifacts so context compaction or handoff cannot erase the investigation state.
- If confidence drops below 90%, stop and surface the uncertainty instead of guessing.
- Do not disable, delete, weaken, or skip tests to make the run green.
- Do not push, merge, publish, delete data, drop databases, rewrite history, or perform destructive operations without explicit user approval.
- Evidence-grounded reporting: every claim that a command, build, test, lint, or check "passed" or "was validated" MUST include the exact command and its captured output or exit status. Never assert success you did not observe — show the evidence instead of describing it.
- Tests passing is "plausible," not "correct." A patch that turns the suite green can still overfit the test and miss the real defect. Before declaring closure you MUST justify, in writing, why the fix is correct against the issue's intended behavior — not merely that tests are green.
Required Artifacts
Create a trace directory before meaningful investigation:
.claude/issue-traces/<issue-id-or-slug>/
├── 01-issue-summary.md
├── 02-reproduction.md
├── 03-localization-log.md
├── 04-root-cause.md
├── 05-fix-plan.md
├── 06-critic-review.md
├── 07-approved-plan.md
├── 08-test-results.md
├── 08b-implementation-review.md
├── 09-final-critic.md
├── 10-pr-body.md
└── state.md
If .claude/ is not writable or should not be modified in the project, use tmp/issue-traces/<issue-id-or-slug>/ and say so.
Always update state.md at phase boundaries with current phase, completed gates, active hypothesis, selected fix candidate, unresolved risks, and next action.
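The setup described above could be scripted roughly as follows. This is a minimal sketch, assuming bash and standard coreutils; the helper name `init_trace_dir` and the `state.md` field layout are illustrative assumptions, not a format this skill prescribes.

```bash
# Sketch: create the trace directory and seed state.md.
# init_trace_dir and the state.md layout are illustrative assumptions.

init_trace_dir() {
  local slug="$1"
  local base="${2:-.claude/issue-traces}"   # pass tmp/issue-traces as the fallback
  local dir="$base/$slug"

  mkdir -p "$dir" || return 1

  # Seed state.md only once so an existing investigation is never overwritten.
  if [ ! -e "$dir/state.md" ]; then
    cat > "$dir/state.md" <<EOF
# State
- Current phase: 0
- Completed gates: none
- Active hypothesis: none
- Selected fix candidate: none
- Unresolved risks: none
- Next action: intake
EOF
  fi

  # Print the directory so callers can capture it.
  printf '%s\n' "$dir"
}
```

A call such as `trace_dir=$(init_trace_dir 123-example-slug)` would return the path to use for the numbered artifacts; the slug shown is a placeholder.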
Detailed templates are in:
- `references/evidence-artifacts.md`
- `references/localization-playbook.md`
- `references/critic-gate.md`
- `assets/pr-template.md`
Read the relevant reference before starting that phase.
Phase 0: Setup and Scope Control
- Parse the user request into:
- issue URL, issue number, or bug description
- repo path or GitHub owner/repo if provided
- requested mode: plan-only, plan-then-approval, or approved implementation
- Check repo state:
- `git status --short`
- current branch
- remotes
- top-level files such as `CLAUDE.md`, `README*`, package manifests, test configs, CI configs
- If the worktree has unrelated user changes, do not overwrite them. Continue read-only until you can isolate your changes or ask the user.
- Create the trace directory and initialize `state.md`.
- Create a todo list with all phases. Mark only one step in progress at a time, and mark steps complete only after gate verification.
Phase 0 Gate
Proceed only when:
- repo and issue target are identified or the missing identifier is explicitly documented
- worktree safety is checked
- trace directory exists
- todo list exists
- `state.md` records the starting state
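The mechanically checkable parts of this gate can be sketched in shell. This is only an illustration under stated assumptions: `phase0_gate` is a hypothetical helper, it receives the `git status --short` text as an argument so the logic does not depend on being inside a repository, and it cannot verify the issue target or the todo list, which still need a manual check.

```bash
# Sketch: check the file-based parts of the Phase 0 gate.
# phase0_gate is a hypothetical helper, not part of the skill.

phase0_gate() {
  local trace_dir="$1" status_text="$2" failed=0

  [ -d "$trace_dir" ] || { echo "missing: trace directory $trace_dir"; failed=1; }
  [ -s "$trace_dir/state.md" ] || { echo "missing: state.md starting state"; failed=1; }

  # Non-empty status text means the worktree has uncommitted changes.
  if [ -n "$status_text" ]; then
    echo "note: worktree has uncommitted changes; stay read-only until isolated"
  fi

  return "$failed"
}

# Typical use inside a repository:
#   phase0_gate "$trace_dir" "$(git status --short)"
```

Passing the status text in, rather than calling `git` inside the function, keeps the decision logic easy to exercise on its own.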
Phase 1: Intake and Reproduction
Goal: convert the issue into a precise, reproducible engineering problem.
- Retrieve and read the full issue:
- `gh issue view <id> --comments --json number,title,body,author,labels,state,comments,createdAt,updatedAt,url`
- Also read linked PRs, commits, discussions, screenshots, logs, and external docs referenced by the issue.
- Extract into `01-issue-summary.md`:
- observed behavior
- expected behavior
- exact error messages and stack traces
- reproduction steps
- environment, platform, versions, feature flags, config
- acceptance criteria
- ambiguity list
- Identify the project's verification commands by reading actual repo files:
- `CLAUDE.md`
- `README*`
- package manifests
- Makefiles
- CI workflow files
- test configs
- Attempt reproduction using the smallest faithful command or scenario.
- Capture exact commands, exit codes, and output in `02-reproduction.md`.
- If no reproduction exists, create a minimal failing test, script, fixture, or manual reproduction checklist. The reproduction must target the reported behavior, not a guessed implementation detail.
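Capturing the exact command, exit code, and output, as the steps above require, might look like the following. It is a minimal sketch assuming bash; `record_run` and the Markdown layout it writes are illustrative assumptions rather than a format defined by this skill.

```bash
# Sketch: run a command and append the command line, exit status, and
# combined output to an evidence file such as 02-reproduction.md.
# record_run and the layout it writes are illustrative assumptions.

record_run() {
  local evidence_file="$1"
  shift
  local output status

  # Capture stdout and stderr together and keep the real exit status.
  output=$("$@" 2>&1)
  status=$?

  {
    # "$*" drops the original quoting; acceptable for a human-readable log.
    printf '## Command\n\n    %s\n\n' "$*"
    printf 'Exit status: %s\n\n' "$status"
    printf '## Output\n\n'
    printf '%s\n' "$output" | sed 's/^/    /'
    printf '\n'
  } >> "$evidence_file"

  # Propagate the command's status so callers can still branch on it.
  return "$status"
}
```

The same helper could serve `08-test-results.md` later, since the evidence-grounded reporting rule asks for the command and its observed result in both places.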
Phase 1 Gate
Proceed only when one is true:
- the issue is reproduced with exact failing output
- a regression test is written and confirmed failing for the reported behavior
- the issue is not reproducible, and `02-reproduction.md` documents every attempted command, environment mismatch, and missing input needed from the user
If reproduction is impossible because required data, credentials, environment, or hardware is missing, stop and ask for the minimum missing information. Do not jump to a speculative fix.
Phase 2: Root-Cause Localization
Goal: isolate the root cause to the narrowest truthful granularity: file, symbol, line range, invariant, and triggering input.
Use references/localization-playbook.md.
- Build candidate locations from issue evidence:
- stack traces and error text
- failing test names
- UI route/API endpoint/CLI command names
- labels and linked PRs
- recent commits touching related areas
- Search and read in parallel where possible:
- `rg` for symbols, routes, commands, strings, errors, config keys
- `git grep` for tracked-file confirmation
- `git log --oneline --decorate -- <path>`
- `git blame -L <start>,<end> -- <path>` where useful
- Use reasoning-guided hierarchical localization:
- file-level: which files can plausibly affect the symptom
- element-level: which functions, classes, handlers, tests, or configs matter
- line-level: which conditions, calls, assignments, invariants, or boundary checks are wrong
- Maintain `03-localization-log.md`:
- every hypothesis
- files read and why
- commands run and results
- evidence for and against each hypothesis
- ruled-out paths
- Follow call chains in both directions:
- from input/event to failure
- from failure back to origin
- through config, serialization, async boundaries, state transitions, and feature flags
- Stop localization only when you can write `04-root-cause.md` in this shape:
# Root Cause
## Summary
[One paragraph: what failed, where, and why.]
## Exact Location
- File: `path/to/file.ext`
- Symbol: `functionOrClassName`
- Lines: `start-end`
## Broken Contract
[The invariant/assumption/contract that was violated.]
## Triggering Conditions
[Inputs, environment, state, flags, or sequence required.]
## Evidence Chain
1. [Issue symptom or failing test]
2. [Code path evidence]
3. [Focused command/test evidence]
4. [Why alternatives were ruled out]
Reasoning-guided ranking and confirmation
State-of-the-art agentic fault localization ranks candidates by explicit reasoning, not surface similarity, and confirms high-risk faults with an independent pass:
- For each surviving candidate location, write a one-paragraph bug-specific explanation: precisely why this exact symbol/line could produce the observed symptom under the triggering conditions. "This file looks related" is not a ranking — a candidate with no causal explanation is ranked last or dropped.
- Rank candidates by causal explanation strength plus direct code evidence (stack-trace/test-spectrum agreement, data-flow reachability, recent diffs), localizing hierarchically: file -> element (function/class/handler/config) -> exact line/condition.
- Do not propose any patch until the fault is justified at the line/condition level, not merely the file level.
- When the fault is high-risk (security, isolation, IPC, auth, data integrity, concurrency) or the top two candidates are close, run a second, independent localization pass that does not read the first pass's conclusion, then reconcile. Agreement across independent passes raises confidence; disagreement means keep localizing rather than guessing.
Phase 2 Gate
Proceed only when:
- at least two plausible hypotheses were considered or the stack trace/repro uniquely identifies the fault
- the selected root cause has direct code evidence
- every referenced symbol/path was opened and verified
- the triggering condition is known
- each retained candidate has a written bug-specific explanation, and the chosen root cause is localized to the line/condition level
- alternative explanations are ruled out or explicitly documented as residual risk
If two or more hypotheses remain equally plausible, stop and ask for the smallest additional evidence needed.
Phase 3: Fix Plan and Independent Critic Gate
Goal: produce a no-gap plan, independently review it, revise it, and ask the user for approval before implementation.
Use references/critic-gate.md.
- Generate 3-5 fix candidates when realistic. For trivial single-line defects, include at least the chosen fix and one rejected alternative.
- Rank candidates by:
- correctness against root cause
- minimality
- regression risk
- public API compatibility
- architectural fit
- testability
- rollback simplicity
- Perform impact analysis:
- callers/importers of changed symbols
- affected tests and fixtures
- config and docs surfaces
- UI/API/CLI contracts
- persistence/migration implications
- concurrency, async, idempotency, and retry behavior
- security and privacy implications
- Write `05-fix-plan.md` with:
- issue summary
- root cause
- candidates considered and ranking
- selected fix
- exact files expected to change
- functions/classes expected to change
- edge cases
- test plan
- rollout/risk/rollback
- explicit "unwired functionality" checklist
- Send the plan to an independent critic:
- If running in the main session and the `Agent` tool is available, launch a separate critic subagent with `references/critic-gate.md` and the trace artifacts as context.
- If running as a subagent through `.claude/agents/issue-tracer.md`, do not attempt nested subagent invocation. Claude Code subagents cannot spawn other subagents. Run the full fallback self-critic pass from `references/critic-gate.md` and disclose this to the user.
- If no independent subagent is available in the current environment, run the fallback adversarial critic pass and clearly label it "Fallback self-critic: independent critic unavailable."
- The critic must return one of:
- `APPROVE`
- `NEEDS_REVISION`
- `BLOCKED`
- Revise `05-fix-plan.md` until all critic blockers are resolved or explicitly escalated.
- Copy the final reviewed plan to `07-approved-plan.md` with an unchecked approval line.
- Present the final reviewed plan to the user and stop. Ask for explicit approval to implement.
High-risk: candidate-patch sampling
For high-risk or close-call fixes, do not commit to a single patch shape prematurely. Draft 2-3 concrete candidate patches for the selected root cause, and choose between them by which one makes the reproduction test pass while keeping the regression suite green and the diff minimal. On a genuine tie, prefer the smallest, most contract-preserving patch and record why the alternatives were rejected. This mirrors validate-then-select repair: a patch is chosen on evidence (tests + minimality), not on first-draft intuition.
Phase 3 Gate
Do not write production code until:
- `05-fix-plan.md` exists
- `06-critic-review.md` exists
- all critic blockers are resolved or disclosed
- `07-approved-plan.md` exists
- the user explicitly approves implementation
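The file-based conditions of this gate can be checked mechanically. The sketch below assumes bash and `grep -E`; `phase3_gate` is a hypothetical helper, and the approval line format (`- [ ] Approved by user` when unchecked, `- [x] Approved by user` when checked) is an assumption, since the skill does not specify one. Whether critic blockers are truly resolved still needs human or critic judgment.

```bash
# Sketch: verify the artifact and approval conditions before writing code.
# phase3_gate and the checkbox format are illustrative assumptions.

phase3_gate() {
  local trace_dir="$1" blocked=0 name

  for name in 05-fix-plan.md 06-critic-review.md 07-approved-plan.md; do
    if [ ! -s "$trace_dir/$name" ]; then
      echo "blocked: $name is missing or empty"
      blocked=1
    fi
  done

  # Assumed format: a checked Markdown checkbox line starting with "Approved".
  if ! grep -Eq '^- \[[xX]\] Approved' "$trace_dir/07-approved-plan.md" 2>/dev/null; then
    echo "blocked: approval line is not checked"
    blocked=1
  fi

  return "$blocked"
}
```

A non-zero return means production code should not be touched yet; the printed lines say which condition is unmet.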
Phase 4: Implementation After Approval
Goal: implement the smallest complete patch that matches the approved plan.
Begin only after explicit user approval.
- Re-check `git status --short`.
- Create or confirm an isolated branch unless the user asked otherwise:
- `git switch -c fix/<issue-id>-<short-slug>` or equivalent
- Write or update the failing regression test first.
- Run the regression test and confirm it fails for the expected reason.
- Apply the minimal fix.
- Re-read every changed file.
- Run the regression test and confirm it passes.
- Run impacted tests based on the dependency graph and changed files.
- Run project quality checks discovered in Phase 1:
- test suite or impacted suite
- lint
- typecheck
- formatting check
- build
- security/static checks if the repo already has them
- Record commands and results in `08-test-results.md`.
- If any test fails unexpectedly, treat it as signal. Re-enter localization for that failure before changing code again.
Phase 4 Gate
Proceed only when:
- implementation matches the approved plan or deviations are documented and approved
- regression protection exists
- impacted tests pass, with the exact commands and captured output recorded in `08-test-results.md` (no asserted-but-unshown results)
- required quality checks pass or failures are documented as unrelated with clean-`origin/main` evidence
- a written correctness justification explains why the patch fixes the root cause and does not merely satisfy the test (plausible != correct)
- no TODO, stub, placeholder, dead branch, or unwired path was introduced
Phase 4.5: Independent Implementation Review
Goal: have a fresh, independent context try to refute the implemented patch before it is presented as done. The context that wrote the patch must not be the only context that approves it.
This is distinct from the Phase 3 plan critic: Phase 3 challenges the plan; Phase 4.5 challenges the actual diff and its validation evidence.
- Run the review in an independent context:
- If subagent delegation is available, launch a separate reviewer subagent with `references/critic-gate.md` (Implementation Review section), given only the diff, `08-test-results.md`, and the trace artifacts, not your own reasoning narrative.
- If running as a subagent that cannot spawn subagents, or no independent context is available, run the fallback adversarial self-review in a clean pass and label it "Fallback self-review: independent reviewer unavailable."
- The reviewer's mandate is adversarial: find a concrete input, environment, caller, or sequence for which the patch is wrong, incomplete, overfits the regression test, leaves a runtime path unwired, or regresses a contract. It must verify claims against the actual code and command output, not trust the summary.
- The reviewer returns `APPROVE`, `NEEDS_REVISION`, or `BLOCKED` and writes `08b-implementation-review.md`.
- Resolve every `NEEDS_REVISION`/`BLOCKED` item by changing code or evidence, then re-review. Do not downgrade a blocker by rewording it.
- If subagent delegation is available and the user/session has authorized issue-tracer or swarm work, independent implementation review is mandatory for any code, test, docs, package metadata, release note, or skill-file edit. Fallback self-review is allowed only when no independent context is available, and that limitation must be disclosed in the artifact and final response.
- Any edit after reviewer approval invalidates that approval. Re-run the implementation review on the latest diff and evidence before closure.
Phase 4.5 Gate
Proceed only when:
- `08b-implementation-review.md` exists with a verdict
- the review ran on the real diff and the captured validation evidence
- every blocker is resolved with a code or evidence change, or explicitly escalated to the user
- if the independent reviewer was unavailable, that limitation is disclosed in the artifact and to the user
- the latest edit happened before the latest reviewer approval
Phase 4.6: Final Critic Gate
Goal: have a context distinct from the implementation reviewer challenge the entire completion claim after implementation review approval.
This gate catches drift between code, tests, docs, release notes, package metadata, and the trace evidence.
- Run the critic after Phase 4.5 approval:
- If subagent delegation is available, launch a separate critic with `references/critic-gate.md` (Final Critic section), given the current diff, `08-test-results.md`, `08b-implementation-review.md`, and the trace artifacts.
- If no independent critic is available, run the fallback adversarial critic pass and label it "Fallback final critic: independent critic unavailable."
- The critic returns `APPROVE`, `NEEDS_REVISION`, or `BLOCKED` and writes `09-final-critic.md`.
- Resolve every `NEEDS_REVISION`/`BLOCKED` item by changing code, docs, tests, or evidence, then re-run implementation review when the fix changes the diff and re-run the final critic.
- Any edit after final critic approval invalidates that approval. Re-run the critic on the latest diff and evidence.
Phase 4.6 Gate
Proceed only when:
- `09-final-critic.md` exists with verdict `APPROVE`
- the critic reviewed the latest diff after implementation reviewer approval
- every reviewer/critic blocker is resolved and re-reviewed
- the latest edit happened before the latest reviewer and critic approvals
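The "latest edit happened before the latest approval" condition in the Phase 4.5 and 4.6 gates can be approximated with file modification times. This is a rough sketch assuming bash (the `-nt` test is widely supported but not strictly POSIX); `stale_since_approval` is a hypothetical helper, and using mtime as a stand-in for "when the edit or approval happened" is an assumption that can be wrong after checkouts or copies.

```bash
# Sketch: detect files edited after an approval artifact was written.
# Assumption: modification times approximate edit and approval times.

stale_since_approval() {
  local approval_file="$1" path found=1
  shift

  for path in "$@"; do
    # -nt is true when $path was modified more recently than the approval.
    if [ "$path" -nt "$approval_file" ]; then
      echo "edited after approval: $path"
      found=0
    fi
  done

  # Like grep: 0 means a stale path was found, 1 means none were found.
  return "$found"
}
```

If the function reports any path, the corresponding review or critic pass would need to be re-run on the latest diff before closure.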
Phase 5: Closure and PR-Ready Output
Goal: leave the issue ready for human review or PR creation.
- Inspect the final diff:
- `git diff --stat`
- `git diff`
- `git diff --check`
- Verify no unrelated files changed.
- Write `10-pr-body.md` as a draft using `assets/pr-template.md`.
- Prepare a conventional commit message:
- `fix(<scope>): <short issue-specific description>`
- Publication is governed by the single source of truth, `../commit-pr/SKILL.md`. When the user asks you to commit, push, or open/update a PR, and only after confirming there are no unrelated changes, switch to that skill and follow it for the PR title, the PR body contract (`Closes #<issue>`, `## Summary`, `## Invariant audit`, `## Test plan`), the release fragment, the invariant audit, the issue comment, and CI closeout. `assets/pr-template.md` is a drafting aid; the published PR body must satisfy the `commit-pr` contract (the `pr-standards` check enforces it). Do not invent a parallel PR format.
- Final response must include:
- root cause with file/line references
- exact change summary
- tests and checks run with results
- regression coverage
- unresolved risks, if any
- PR body or PR link if created
Test Validation and Drift Review
This section applies to every phase.
Whenever any of the following change, actively review tests for drift:
- command selection logic or framework detection
- fixture expectations or test helper behavior
- workflow assertions or pipeline step ordering
- scanner coverage behavior or tool registration
- prompt content that affects agent behavior
- documentation or comments claiming system behavior
Requirements:
- Touched tests must be verified against current and intended behavior.
- Stale tests must be realigned to verified behavior, not left as drift.
- Prefer behavior-level validation over brittle string-only expectations.
- New behavior needs positive and negative cases.
- Boundary-critical or security-sensitive behavior needs adversarial cases.
- The release verification sweep must include a focused test-drift regression check.
- Do not accept work where tests pass by coincidence rather than correctness.
No-Gap Closure Checklist
Before declaring the issue ready:
- The reported symptom is reproduced or non-reproducibility is proven.
- The root cause is localized to exact code and triggering conditions.
- The fix addresses the root cause, not only the visible symptom.
- Every changed path is wired into the actual runtime path.
- Public API, CLI, UI, persistence, config, and docs surfaces are checked where relevant.
- Edge cases are tested or explicitly ruled out.
- Regression test fails before the fix and passes after the fix when feasible.
- Impacted tests, lint/type/build checks are run, with commands and captured output recorded.
- Independent critic review of the plan completed before user approval.
- User approval obtained before implementation.
- Independent implementation review (Phase 4.5) completed on the real diff and evidence; blockers resolved.
- Final critic review (Phase 4.6) approved the latest diff and evidence after implementation review.
- No edit occurred after the latest reviewer and critic approvals.
- A written correctness justification distinguishes "tests green" from "root cause fixed."
- Every "passed"/"validated" claim cites the exact command and its captured output.
- Publication (commit/push/PR) followed `../commit-pr/SKILL.md` (the single source of truth).
- PR-ready summary is complete.
Escalation Triggers
Stop and ask the user or present options when:
- reproduction requires unavailable credentials, secrets, data, hardware, or external services
- the issue is actually a feature request or product decision
- a fix requires breaking public API compatibility
- a data migration or destructive operation is required
- root cause spans multiple subsystems and the approved scope is too narrow
- the critic returns `BLOCKED`
- confidence remains below 90% after reasonable investigation
Method Provenance (state of the art)
The quality methods in this skill are grounded in current agentic-repair and agent-reliability research, adapted to a plan-first, evidence-first workflow:
- Hierarchical file -> function -> line localization, multi-sample candidate patches, and validate-then-select repair: Agentless (Xia et al. 2024, https://arxiv.org/abs/2407.01489).
- Reasoning-guided, explanation-ranked fault localization (causal explanation per candidate, not surface similarity): RGFL (https://arxiv.org/pdf/2601.18044); structure/spectrum-aware search: AutoCodeRover (https://arxiv.org/abs/2404.05427).
- "Tests passing is plausible, not correct" / patch overfitting: patch-correctness survey (https://dl.acm.org/doi/10.1145/3702972).
- Self-consistency across independent passes: Wang et al. 2022 (https://arxiv.org/abs/2203.11171).
- Fresh independent context refutes the result (the doer is not the grader) and evidence-grounded reporting (show the command and its output, do not assert success): Anthropic, "Effective harnesses for long-running agents" (https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents).
- Plan -> implement -> review separation as explicit quality gates: Anthropic, "Building Effective Agents" (https://www.anthropic.com/research/building-effective-agents).
- Escalate when the issue lacks reproducible steps or acceptance criteria (issue clarity predicts resolution success): GitHub coding-agent best practices (https://docs.github.com/copilot/how-tos/agents/copilot-coding-agent/best-practices-for-using-copilot-to-work-on-tasks).