name: code-review
description: This skill should be used when the user says "review my code", "check these changes", or wants feedback on code before creating a PR. Also used after completing a task during iterative implementation.
Code Review
Reviews code changes using dynamically selected reviewer personas. Spawns parallel sub-agents that return structured JSON, then merges and deduplicates findings into a single report.
When to Use
- After completing a plan section (during the iterative:implementing skill)
- Before finishing work and creating a PR
- When feedback is needed on any code changes
- Can be invoked standalone
Severity Scale
All reviewers use P0–P3:
| Level | Meaning | Action |
|---|---|---|
| P0 | Critical breakage, exploitable vulnerability, data loss/corruption | Must fix before merge |
| P1 | High-impact defect likely hit in normal usage, breaking contract | Should fix |
| P2 | Moderate issue with meaningful downside (edge case, perf regression, maintainability trap) | Fix if straightforward |
| P3 | Low-impact, narrow scope, minor improvement | User's discretion |
Reviewers
8 personas in two tiers. See references/persona-catalog.md for the full catalog.
Always-on (every review):
| Agent | Focus |
|---|---|
| correctness-reviewer | Logic errors, edge cases, state bugs, error propagation |
| testing-reviewer | Coverage gaps, weak assertions, brittle tests |
| maintainability-reviewer | Coupling, complexity, naming, dead code, abstraction debt |
Conditional (selected per diff):
| Agent | Select when diff touches... |
|---|---|
| security-reviewer | Auth, public endpoints, user input, permissions |
| performance-reviewer | DB queries, data transforms, caching, async |
| api-contract-reviewer | Routes, serializers, type signatures, versioning |
| data-migrations-reviewer | Migrations, schema changes, backfills |
| reliability-reviewer | Error handling, retries, timeouts, background jobs |
Review Scope
By default, every review spawns all 3 always-on reviewers plus any applicable conditionals, so the tier model scales the team to the diff. A small config change triggers no conditionals, for 3 reviewers in total. A large auth feature might trigger security and reliability, for 5 reviewers in total. No separate "mode" is needed.
How to Run
Stage 1: Determine scope
Compute the diff range, file list, and diff in a single Bash call. This minimizes permission prompts. Do not run extra commands.
Chain everything into one command using && and labeled output markers (BASE:, FILES:, DIFF:) so you can parse each section:
- From implementing (section-level): The caller provides a baseline SHA. Use it directly.
- From implementing (final/branch-level): The caller provides the base branch. Compute merge-base inline.
- Standalone: Detect base branch and compute merge-base inline.
- Explicit files: If the caller specifies files, skip merge-base and use those directly.
Standalone example (single Bash call):
BASE=$(git merge-base HEAD $(git rev-parse --verify origin/main >/dev/null 2>&1 && echo origin/main || echo origin/master)) && echo "BASE:$BASE" && echo "FILES:" && git diff --name-only ${BASE}..HEAD -- . ':!*.md' && echo "DIFF:" && git diff -U10 ${BASE}..HEAD -- . ':!*.md'
Parse: BASE: = merge-base SHA, FILES: = file list, DIFF: = diff. If no commits on the branch, fall back to unstaged changes (git diff -U10 -- . ':!*.md').
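The labeled output could be split along these lines. This is a minimal sketch, assuming the command output has been captured as one string; `parse_scope_output` is a hypothetical helper name, not part of the skill.

```python
def parse_scope_output(output):
    """Split Stage 1 output into the BASE SHA, the file list, and the diff text."""
    base = ""
    files = []
    diff_lines = []
    section = None
    for line in output.splitlines():
        # Markers are only recognized before the diff starts, so diff
        # content can never be mistaken for a section label.
        if section != "DIFF":
            if line.startswith("BASE:"):
                base = line[len("BASE:"):].strip()
                continue
            if line == "FILES:":
                section = "FILES"
                continue
            if line == "DIFF:":
                section = "DIFF"
                continue
        if section == "FILES" and line.strip():
            files.append(line.strip())
        elif section == "DIFF":
            diff_lines.append(line)
    return {"base": base, "files": files, "diff": "\n".join(diff_lines)}
```

An empty `files` list here would be the signal to fall back to the unstaged diff.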
Stage 2: Intent discovery
Understand what the change is trying to accomplish. Run a single Bash call, where ${BASE} is the merge-base SHA from Stage 1:
echo "BRANCH:" && git rev-parse --abbrev-ref HEAD && echo "COMMITS:" && git log --oneline ${BASE}..HEAD
Combined with conversation context (plan section summary, caller-provided description), write a 2-3 line intent summary:
Intent: Simplify tax calculation by replacing the multi-tier rate lookup
with a flat-rate computation. Must not regress edge cases in tax-exempt handling.
Pass this to every reviewer in their spawn prompt. Intent shapes how hard each reviewer looks, not which reviewers are selected.
When intent is ambiguous: Ask one question: "What is the primary goal of these changes?" Do not spawn reviewers until intent is established.
Stage 3: Select reviewers
Read the diff and file list from Stage 1. The 3 always-on reviewers are automatic. For each conditional persona in the catalog (references/persona-catalog.md), decide whether the diff warrants it. This is agent judgment, not keyword matching.
Announce the team before spawning:
Review team:
- correctness (always)
- testing (always)
- maintainability (always)
- security — new endpoint in routes.rb accepts user-provided redirect URL
- data-migrations — adds migration 20260303_add_index_to_orders
This is progress reporting, not a blocking confirmation.
Stage 4: Spawn sub-agents
Spawn each selected reviewer as a parallel sub-agent using the template in references/subagent-template.md. Each sub-agent receives:
- Their persona file content (identity, failure modes, calibration, suppress conditions)
- Shared diff-scope rules from references/diff-scope.md
- The JSON output contract from references/findings-schema.json
- Review context: intent summary, file list, diff
Sub-agents are read-only: they review and return structured JSON. They do not edit files, run commands, or propose refactors.
Each sub-agent returns JSON matching references/findings-schema.json:
{
"reviewer": "security",
"findings": [...],
"residual_risks": [...],
"testing_gaps": [...]
}
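A payload of this shape might be checked as follows before merging. This is only a sketch: the top-level keys come from the example above, but the required finding fields are an assumption inferred from the merge stage, and references/findings-schema.json remains the source of truth. `validate_payload` is a hypothetical name.

```python
# Top-level keys taken from the example payload above.
TOP_LEVEL_KEYS = ("reviewer", "findings", "residual_risks", "testing_gaps")
# Assumed required finding fields; the real list lives in findings-schema.json.
REQUIRED_FINDING_FIELDS = ("file", "line", "title", "severity", "confidence")


def validate_payload(payload):
    """Return (valid_findings, dropped_count) for one reviewer payload."""
    if not isinstance(payload, dict) or any(k not in payload for k in TOP_LEVEL_KEYS):
        # The caller can record this reviewer as failed in the coverage section.
        raise ValueError("payload is missing a top-level key")
    valid, dropped = [], 0
    for finding in payload["findings"]:
        if isinstance(finding, dict) and all(f in finding for f in REQUIRED_FINDING_FIELDS):
            valid.append(finding)
        else:
            dropped += 1  # malformed findings are dropped, but counted
    return valid, dropped
```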
Stage 5: Merge findings
Convert multiple reviewer JSON payloads into one deduplicated, confidence-gated finding set.
- Validate. Check each output against the schema. Drop malformed findings (missing required fields). Record the drop count.
- Confidence gate. Suppress findings below 0.50 confidence. Record the suppressed count.
- Deduplicate. Compute fingerprint: normalize(file) + line_bucket(line, ±3) + normalize(title). When fingerprints match, merge: keep highest severity, keep highest confidence with strongest evidence, union evidence, note which reviewers flagged it.
- Separate pre-existing. Pull out findings with pre_existing: true into a separate list.
- Sort. Order by severity (P0 first) → confidence (descending) → file path → line number.
- Collect coverage data. Union residual_risks and testing_gaps across reviewers.
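The gate, deduplication, split, and sort steps above can be sketched as one function. Several things here are assumptions rather than the skill's actual contract: each finding is a dict already tagged with its `reviewer` name, `evidence` is a list of strings, and the ±3 line bucket is interpreted as "within 3 lines of the first finding in the group". `merge_findings` and `normalize` are hypothetical names.

```python
SEVERITY_RANK = {"P0": 0, "P1": 1, "P2": 2, "P3": 3}


def normalize(text):
    """Lowercase and collapse whitespace so trivially different strings compare equal."""
    return " ".join(str(text).lower().split())


def merge_findings(findings, min_confidence=0.50, line_tolerance=3):
    """Return (new_findings, pre_existing_findings, suppressed_count)."""
    # Confidence gate: drop low-confidence findings but remember how many.
    kept = [f for f in findings if f["confidence"] >= min_confidence]
    suppressed = len(findings) - len(kept)

    # Sorting by (file, title, line) places duplicate candidates next to each other.
    merged = []
    for f in sorted(kept, key=lambda f: (normalize(f["file"]), normalize(f["title"]), f["line"])):
        last = merged[-1] if merged else None
        same = (
            last is not None
            and normalize(last["file"]) == normalize(f["file"])
            and normalize(last["title"]) == normalize(f["title"])
            and abs(last["line"] - f["line"]) <= line_tolerance
        )
        if not same:
            # Copy so the caller's dicts and evidence lists are not mutated.
            merged.append({**f, "reviewers": [f["reviewer"]],
                           "evidence": list(f.get("evidence", []))})
            continue
        # Merge: highest severity, highest confidence, union of evidence and reviewers.
        if SEVERITY_RANK[f["severity"]] < SEVERITY_RANK[last["severity"]]:
            last["severity"] = f["severity"]
        last["confidence"] = max(last["confidence"], f["confidence"])
        last["evidence"] += [e for e in f.get("evidence", []) if e not in last["evidence"]]
        if f["reviewer"] not in last["reviewers"]:
            last["reviewers"].append(f["reviewer"])

    def order(f):
        # Severity (P0 first), then confidence descending, then file path, then line.
        return (SEVERITY_RANK[f["severity"]], -f["confidence"], f["file"], f["line"])

    new = sorted((f for f in merged if not f.get("pre_existing")), key=order)
    pre_existing = sorted((f for f in merged if f.get("pre_existing")), key=order)
    return new, pre_existing, suppressed
```

A fixed-width bucket such as `line // 3` would be simpler, but it splits neighbouring lines that straddle a bucket boundary, which is why this sketch compares line distance directly.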
Stage 6: Synthesize and present
Assemble the final report using the template in references/review-output-template.md:
- Header. Scope, intent, reviewer team with per-conditional justifications.
- Findings. Grouped by severity (P0, P1, P2, P3). Each finding shows file, issue, reviewer(s), confidence.
- Pre-existing. Separate section, does not count toward verdict.
- Coverage. Suppressed count, residual risks, testing gaps, failed/timed-out reviewers.
- Verdict. Ready to merge / Ready with fixes / Not ready. Fix order if applicable.
Do not include time estimates. When invoked from iterative:implementing: omit the **Fix order:** line — implementing handles prioritization through its own severity acceptance flow.
Language-Agnostic
This skill does NOT use language-specific reviewer agents. Reviewers adapt their criteria to the language/framework based on project context (loaded automatically). This keeps the skill simple and avoids maintaining parallel reviewers per language.
After Review
When invoked from iterative:implementing: return findings directly — implementing owns its own fix loop. Do not enter the standalone fix loop below.
When invoked standalone or from implementation-wrapup: run the standalone fix loop.
Standalone Fix Loop
After presenting findings and verdict (Stage 6), handle the full fix-review cycle.
Step 5: Severity Acceptance
This is its own prompt — do not combine it with next-step options. Present severity acceptance whenever the review has findings at ANY severity. Do not interpret "no P0/P1" as "clean" — clean means zero findings. If zero findings, skip to Step 8. Use the platform's interactive question tool — AskUserQuestion (Claude Code) or request_user_input (Codex) — for all severity acceptance prompts. Both platforms provide an automatic "Other" free-form option — do not add one manually.
Present a single prompt listing all severity levels with findings. No intermediate "choose which..." step.
Claude Code — use AskUserQuestion with multiSelect: true:
When P0 or P1 issues exist, pre-check the P0+P1 option:
- ☑ P0 + P1 (Recommended) — N Critical, N High
- ☐ P2 — Moderate (N issues)
- ☐ P3 — Low (N issues)
When only P2/P3 issues exist, nothing pre-checked:
- ☐ P2 — Moderate (N issues)
- ☐ P3 — Low (N issues)
Only include severity levels that have findings.
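The multi-select option list could be assembled from per-severity counts roughly like this. The `label`, `description`, and `checked` field names are illustrative assumptions, not the documented AskUserQuestion parameters, and `build_multiselect_options` is a hypothetical helper.

```python
def build_multiselect_options(counts):
    """Build option dicts from a mapping such as {"P0": 1, "P1": 2, "P3": 4}."""
    options = []
    critical, high = counts.get("P0", 0), counts.get("P1", 0)
    if critical or high:
        # P0 and P1 share a single option, pre-checked as the recommendation.
        options.append({
            "label": "P0 + P1 (Recommended)",
            "description": f"{critical} Critical, {high} High",
            "checked": True,
        })
    for level, name in (("P2", "Moderate"), ("P3", "Low")):
        n = counts.get(level, 0)
        if n:  # only levels that actually have findings are offered
            options.append({
                "label": level,
                "description": f"{name} ({n} issues)",
                "checked": False,
            })
    return options
```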
Codex — use request_user_input (single-select, build combined options):
When P0 or P1 issues exist:
- Fix P0 + P1 (Recommended) — N issues
- Fix P0 + P1 + P2 — N issues
- Fix all — N issues
- Skip fixes
When only P2/P3 issues exist:
- Fix P2 only — N issues
- Fix P2 + P3 — N issues
- Skip fixes
Only include options where findings exist at those levels. Omit options that would duplicate another (e.g., if no P3, omit "Fix all" since it equals the line above).
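Because the single-select options are cumulative, both rules (skip empty options, skip duplicates) reduce to one check: an option is kept only if it covers more issues than the previous one kept. A minimal sketch, with `build_single_select_options` as a hypothetical helper that returns `(label, issue_count)` pairs:

```python
LEVELS = ("P0", "P1", "P2", "P3")


def build_single_select_options(counts):
    """Return [(label, issue_count)] ending with "Skip fixes", or [] if there are no findings."""
    def total(levels):
        return sum(counts.get(level, 0) for level in levels)

    if total(("P0", "P1")):
        candidates = [
            ("Fix P0 + P1 (Recommended)", ("P0", "P1")),
            ("Fix P0 + P1 + P2", ("P0", "P1", "P2")),
            ("Fix all", LEVELS),
        ]
    else:
        candidates = [
            ("Fix P2 only", ("P2",)),
            ("Fix P2 + P3", ("P2", "P3")),
        ]

    options, previous = [], 0
    for label, levels in candidates:
        n = total(levels)
        if n > previous:  # skips empty options and ones that add nothing new
            options.append((label, n))
            previous = n
    if options:
        options.append(("Skip fixes", 0))
    return options
```

With findings at P0, P1, and P2 but none at P3, "Fix all" covers the same issues as "Fix P0 + P1 + P2" and is left out.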
Step 6: Apply Fixes via Subagent
Fix only the selected severities. Spawn one or more subagents with the filtered findings, affected file paths, and diff range from Stage 1. Each subagent applies fixes, runs tests, and commits.
Wait for all fixes to complete before proceeding.
Step 7: Re-review Offer
After fixes land, present an interactive choice:
- Run another review round (Recommended) — verify fixes and check for new issues
- Proceed without re-review
If another round: run the full Stage 1–Step 7 flow again (fresh sub-agents, fresh scope).
Step 8: Post-fix Options
After the fix-review cycle completes (clean verdict or user chose to stop), present next steps via the platform's interactive question tool.
On a feature branch:
- Create a PR (Recommended) — push and open a pull request
- Continue without PR — stay on the branch
- Exit — done for now
On main/master:
- Continue — proceed with next steps
- Exit — done for now
If "Create a PR": push the branch and use gh pr create with a title and summary derived from the branch changes.
Fallback
If the platform doesn't support parallel sub-agents, run reviewers sequentially. Everything else (stages, output format, merge pipeline) stays the same.
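The idea that only the dispatch changes, not the pipeline, can be illustrated with a small sketch. Sub-agent spawning is a platform feature, so `run_one` below is a hypothetical stand-in for "spawn one reviewer and return its JSON payload", and threads merely model parallel execution.

```python
from concurrent.futures import ThreadPoolExecutor


def run_reviewers(names, run_one, parallel=True):
    """Call run_one(name) for each reviewer; return (payloads, failed_names)."""
    def safe(name):
        try:
            return name, run_one(name), None
        except Exception as exc:  # one failed reviewer should not sink the review
            return name, None, exc

    if parallel:
        with ThreadPoolExecutor(max_workers=max(1, len(names))) as pool:
            results = list(pool.map(safe, names))  # map preserves input order
    else:
        results = [safe(name) for name in names]

    payloads, failed = [], []
    for name, payload, error in results:
        if error is None:
            payloads.append(payload)
        else:
            failed.append(name)  # reported under Coverage in the final report
    return payloads, failed
```

Both modes return results in the same order, so the merge stage downstream does not need to know which one ran.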