name: verify-pr-fix description: >- Manually verify that a bug-fix PR actually works by reproducing the failure before the fix and confirming it is gone after, with captured evidence, then posting findings to the PR (and optionally the linked issue). Use when asked to manually verify a PR/fix, reproduce an issue and its fix, confirm a fix works end to end, or take screenshots proving a change. argument-hint: '[PR URL or number]'
Verify PR Fix
Prove a bug-fix PR works by reproducing the failure first, then showing the fix removes it, with evidence a reader can check. A fix that "passes" means nothing unless you first showed the bug.
This is behavioral verification, distinct from the local lint/test loop in .agents/skills/verify/SKILL.md
($verify) and from review skills ($adversarial-pr-review, $post-merge-audit). Use this when the
question is "does the fix actually fix the reported problem?", not "does it lint and pass CI?".
Memorable invocation: $verify-pr-fix <PR> or "manually verify this fix and reproduce the issue".
Core principles
- Show the bug before the fix. Always capture a failing "before" and a passing "after" of the same reproduction. Skipping the "before" is the most common way a verification lies.
- Faithful over convenient. Reproduce through the same code path / API the product uses. If you must
build a harness, build it on the real mechanism (the actual module, the real
cluster/HTTP/render path), not a paraphrase of it. - Evidence before assertions. Never claim "verified" without captured output, a state check
(
ps/pgrep, HTTP status, DOM, exit code), or a screenshot. Paste the real output, including PIDs, codes, and timings. Never fabricate or assume output (seeAGENTS.md). - State what you did NOT exercise. If the reproduction is mechanism-level rather than the full app, say so plainly and name the residual-risk path. Honesty about scope is part of the deliverable.
- Reproduce the actual condition. Many bugs are intermittent or race-dependent. Recreate the triggering condition deterministically (the right input, a settle delay, the blocking state, the concurrency) before concluding "no repro". A clean run can mean you never triggered the bug, not that it is absent.
- Leave the machine clean. Kill every process you spawned and remove scratch files. Verify with
pgrep/psthat nothing leaked.
Instructions
- Read the PR and the linked issue.
gh pr view <n> --json title,body,files,commits,url,stateandgh issue view <linked> --json title,body,url. Extract: the claimed bug, the expected vs actual behavior, the reproduction the reporter described, and the validation the author already ran (look for a "Manual / Residual Risk" or "Validation" section — verify what they left UNKNOWN). - Locate the changed surface. Read the diff (
gh pr diff <n>or the files in the worktree). Identify the exact behavioral change and the smallest observable signal that distinguishes broken from fixed (an orphaned process, an HTTP 500 vs 200, a hydration mismatch, a cache key collision, an exit code). - Choose the cheapest faithful reproduction, in this order:
- Full app run when feasible — highest fidelity. This often means the repo's integration test
app(s) plus the end-to-end/browser test command (see
AGENTS.md→ Agent Workflow Configuration) for browser-visible behavior, plus any repo-specific e2e/manual-testing docs. - Minimal faithful harness when the full app is too heavy to stand up quickly (needs a license,
real bundles, a renderer, external services). Build it on the same real API the product uses and
label it mechanism-level. For example, drive the same underlying runtime/process API the product
uses (such as Node's real
clustermodule a renderer uses) rather than booting the whole subsystem. - For renderer/process-level changes, follow any repo-specific validation docs (see
AGENTS.md).
- Full app run when feasible — highest fidelity. This often means the repo's integration test
app(s) plus the end-to-end/browser test command (see
- Reproduce the bug (the "before"). Run the reproduction against pre-fix behavior. Get pre-fix code by
the least invasive means: check out the parent commit in a scratch worktree,
git stashan uncommitted change, check out one file at its pre-fix revision (git checkout <fix-commit>~1 -- <file>), or (for a harness) model the pre-fix path explicitly. Capture the failure. If it does not fail, you have not reproduced it — recreate the triggering condition (input, timing, blocking, concurrency) and retry before concluding anything. - Verify the fix (the "after"). Restore the post-fix code first (
git stash pop,git checkout HEAD -- <file>, or leave the worktree), then run the identical reproduction. Capture the now-passing result and confirm the specific signal flipped (orphans 6/6 -> 0/6, exit code, status, DOM). Confirmgit statusis clean so the "after" really ran against post-fix code. - Capture evidence. Save real terminal output. For UI/browser changes take screenshots via Playwright MCP. For a shareable visual of terminal results you may render the captured output with the visualize tool, but the render must reproduce real output verbatim — never stage numbers.
- Clean up. Kill spawned processes, remove scratch dirs/worktrees, and confirm nothing leaked
(
pgrep -fl <marker>should report none). - Report to the PR. Post a comment with the structured format below. Before posting to GitHub (an
outward-facing action), confirm with the user unless they already told you to post. Write the body to a
temp file and use
gh pr comment <n> --body-file(avoids inline-formatting issues; see global Git workflow rules). - Cross-link the issue (optional). If asked, comment on the linked issue with a 2-3 sentence summary
and a link to the PR comment URL returned by step 8:
gh issue comment <n> --body-file.
Reproduction tactics by change type
- Process / renderer lifecycle (signals, workers, teardown, ports): drive the real
cluster/child process API; assert withps/pgrep/kill -0on captured PIDs and on the master's exit code. Emulate the real supervisor (e.g. Foreman: signal only the master PID, SIGKILL after its ~5s window). - Rendering / hydration / framework output (render output, FOUC, streaming, cache keys): boot the relevant integration test app, hit the affected route, compare server-rendered output vs hydrated DOM, watch the renderer log, diff cache keys. Browser-visible? Screenshot before/after with Playwright MCP.
- Generators / installers / scaffolding: run the generator into a temp app and diff the produced files against expectation; for behavioral output, boot the generated app.
- Caching / dedupe / digests: construct the colliding or repeated inputs and assert hit/miss and that failed renders are not cached.
- Types-only changes: usually covered by the repo's type-check command (see
AGENTS.md→ Agent Workflow Configuration); behavioral reproduction is normally not warranted — say so rather than staging a fake one.
Environment notes
- Pre-fix-on-one-file tactic (cheap before/after for an already-merged PR): when the fix is in a single
file and the PR added regression specs, run the post-fix spec against the pre-fix file —
git checkout <fix-commit>~1 -- <file>, run the spec (it should fail on exactly the new tests), then always restore withgit checkout HEAD -- <file>and confirmgit statusis clean before moving on. - Read the changed function from git, not from memory: when a harness needs the exact pre/post logic
(a regex, a digest, a key format), extract it verbatim with
git show <rev>:<file>so the reproduction can't drift from the real code.
When NOT to use this skill
- Docs, comments, changelog, CI/workflow plumbing, benchmark tooling, refactors with no behavioral change,
license-header enforcement, and pure type narrowing rarely have a user-observable runtime symptom to
reproduce. Route those to
$verify(local checks) or a review skill instead, and say why a behavioral repro adds nothing.
Output format (PR comment)
## Manual verification: reproduced the bug and confirmed the fix ✅ / ❌
<one line on what was verified and how (full app vs mechanism-level harness)>
### Reproduction
<how the bug was triggered; the key condition that makes it deterministic>
### Results
| Scenario | Behavior | Result |
| --------------------- | ---------- | --------------- |
| Pre-fix, <condition> | <observed> | <BROKEN signal> |
| Post-fix, <condition> | <observed> | <FIXED signal> |
```text
<real captured output, before and after>
```
### Caveat
<what was NOT exercised; the residual-risk path that remains UNKNOWN>
Keep the comment evidence-first and honest about scope. End behavioral claims with the captured proof, not adjectives.