name: review version: 1.0.0 description: Review before merge. Stage-1 spec-compliance gate, then 11 Stage-2 canonical axes (analyst, architect, qa, security, devops, roadmap, reliability, observability, agent-safety, decision-rigor, code-quality) plus 3 chained skills (code-qualities-assessment, golden-principles, taste-lints). Run after /test. Run for a full pre-merge review. Do NOT invoke code-qualities-assessment, golden-principles, or taste-lints directly for a full review; review chains them. argument-hint: branch-or-pr-number allowed-tools: Task, Skill, Read, Glob, Grep, Bash(*) user-invocable: true license: MIT
Review
Review: $ARGUMENTS
If no argument, review the current branch diff against the base branch. Detect the base branch from gh pr view --json baseRefName or fall back to main.
Triggers
| Trigger Phrase | Operation |
|---|---|
/review |
Run the Stage-1 spec-compliance gate, then the Stage-2 review against the current branch diff |
/review BRANCH_OR_PR |
Run the Stage-1 gate, then the Stage-2 review against the named branch or PR |
review before merge |
Same as /review |
Convergence contract (REQ-008-04)
/review evaluates every canonical axis the project ships, plus three local-only skill axes that CI cannot afford. The canonical axis prompts are authored at references/{role}.md co-located with this skill, with the canonical path expressed as .claude/skills/review/references/{role}.md in the source repo (the single source of truth). /review auto-discovers the axis set from references/*.md rather than hardcoding a list, so adding a references/{role}.md file enrolls the axis with no edit to this skill body. When CI exists in a project, the project syncs the canonical axes into its own CI prompts via the project's generator and drift checks. The build pipeline copies the entire skill directory (including references/ and scripts/) into vendored plugin installs so the command runs without a CI dependency in any harness that supports plugins.
The canonical set is spec-compliance as the Stage-1 gate plus 11 Stage-2 canonical axes (analyst, architect, qa, security, devops, roadmap, reliability, observability, agent-safety, decision-rigor, code-quality). spec-compliance runs first and gates Stage 2: only a CRITICAL_FAIL short-circuits the review (see Process step 2). A Stage-1 UNKNOWN (INCONCLUSIVE) does NOT short-circuit; Stage 2 still runs and the UNKNOWN is preserved as an UNKNOWN in the merge, because no spec or acceptance criteria could be located and that absence must never suppress a real Stage-2 finding. The caller decides how the merged verdict gates its workflow.
/review is a strict superset of CI: any finding CI surfaces, /review will surface first locally. CI may wire a subset of the canonical axes (its per-axis job list can lag the references/ directory); /review always runs the full discovered set, so it never surfaces fewer axes than CI. The 3 chained skill extras (code-qualities-assessment, golden-principles, taste-lints) cannot run in CI (they require code execution + repo state) and so layer on top.
Path resolution (harness-agnostic)
This skill runs in two layouts: the source Claude Code project (where .claude/ is the repo root) and a vendored plugin install (Copilot CLI and similar harnesses) where the consumer repo has no .claude/ directory and the plugin lives outside the consumer's tree.
- Canonical axis prompts (
{role}is the stem of eachreferences/*.mdfile:spec-compliance|analyst|architect|qa|security|devops|roadmap|reliability|observability|agent-safety|decision-rigor|code-quality, discovered, not hardcoded): resolve thereferences/directory via the first candidate that exists, then glob*.mdinside it for the axis set:${CLAUDE_SKILL_DIR}/references/(ifCLAUDE_SKILL_DIRis set by the harness).claude/skills/review/references/(Claude Code project layout)skills/review/references/resolved relative to plugin install root (vendored install)
- Verdict library (
merge_verdicts,extract_verdict,get_verdict_emoji,FAIL_VERDICTS): try each candidate in order, use the first that exists:.claude/lib/ai_review_common/verdict.py(Claude Code project layout)lib/ai_review_common/verdict.pyresolved relative to the plugin install root (vendored install)
- Complexity tiers reference (
engineering-complexity-tiers.md): try each candidate in order, use the first that exists:.claude/skills/analyze/references/engineering-complexity-tiers.md(Claude Code project layout)skills/analyze/references/engineering-complexity-tiers.mdresolved relative to plugin install root (vendored install)
- Chained-skill scripts (local axes 2 and 3:
golden-principles/scripts/scan_principles.py,taste-lints/scripts/taste_lints.py): these are sibling skills, not under this skill'sreferences/, soCLAUDE_SKILL_DIRdoes not locate them. For each, try each candidate in order, use the first that exists:.claude/skills/{skill}/scripts/{script}(Claude Code project layout)skills/{skill}/scripts/{script}resolved relative to plugin install root (vendored install)
The skill body MUST NOT hard-fail when the .claude/ path is missing; it MUST attempt the vendored-install path for the verdict library and the chained-skill scripts before reporting an error. If neither candidate for a chained-skill script exists, mark that axis UNKNOWN (per UNKNOWN handling), do not abort the review.
Scripts
| Script | Purpose | Exit codes |
|---|---|---|
scripts/validate_review_marker.py |
Validates the SHA-bound Reviewed-By: /review@... marker that /ship requires. |
0 valid marker, 1 missing or stale marker, 2 config error |
Process
Run axes sequentially. Each axis emits a verdict token (PASS, WARN, CRITICAL_FAIL, or UNKNOWN) plus structured findings (severity, category, location, recommendation). The final merged verdict comes from merge_verdicts (resolve via the "Path resolution" section above).
Read the diff with three-dot range syntax (
git diff "origin/$BASE_BRANCH"...HEAD, whereBASE_BRANCHis the detected base branch and the remote-tracking ref is used because the local base branch may not exist in fresh clones) so every evaluation step uses the same change set as the diff-scoped gates in step 5.- Build one
CONTEXT_MODEvalue before invoking any canonical axis: usefullonly when the complete diff plus any linked REQ/DESIGN/TASK docs or PR-body acceptance criteria are present; usesummarywhen only file names or diff stats are present; usepartialwhen only a bounded slice is present. If context completeness is unknown, usepartial. - Every Task input for a canonical axis MUST begin with
CONTEXT_MODE: $CONTEXT_MODE, followed by a blank line and then the diff and supporting context. This is the local/reviewequivalent of the CI header. A missing or unrecognized value remains notfull, so the axis prompts cannot emitPASSon absent evidence.
- Build one
Run the Stage-1 spec-compliance gate before the complexity classifier and all Stage-2 axes. Load the canonical
spec-complianceaxis prompt via the "Path resolution" section above and invokeTask(subagent_type="general-purpose")with that prompt as the system instruction and the CONTEXT_MODE-prefixed diff plus any linked REQ/DESIGN/TASK docs (or PR-body acceptance criteria) as input. Extract its verdict withextract_verdict.- CRITICAL_FAIL only: short-circuit. Do NOT run the complexity classifier, the 11 Stage-2 canonical axes, or the 3 chained skills. Mark all of them
SKIPPEDin the output table, set the FINAL VERDICT to the Stage-1CRITICAL_FAIL, and emit only the Stage-1 findings. The author fixes the unmet criterion, then re-runs/review. UNKNOWN is NOT a short-circuit (see next bullet): a Stage-1 UNKNOWN means no spec or acceptance criteria could be located, so it must never suppress a real Stage-2 finding (e.g. a security or qaCRITICAL_FAIL). - PASS, WARN, or UNKNOWN (INCONCLUSIVE): record the Stage-1 verdict and continue to step 3. A WARN or UNKNOWN here does not block Stage 2; it is merged alongside the other axes (per UNKNOWN handling, a Stage-1 UNKNOWN never overrides a real Stage-2 finding, and on an otherwise-PASS run it surfaces the one-line reason no spec or acceptance criteria could be located). The full FINAL VERDICT comes from
merge_verdictsin step 7.
- CRITICAL_FAIL only: short-circuit. Do NOT run the complexity classifier, the 11 Stage-2 canonical axes, or the 3 chained skills. Mark all of them
Classify complexity tier: Task(subagent_type="analyst"): Read
engineering-complexity-tiers.md(resolved via the "Path resolution" section above) and the diff. Assess as Tier 1-5. Use this to calibrate axis depth.Run every Stage-2 canonical axis, discovered from
references/*.mdafter excludingspec-compliance(already run in step 2). Resolve the directory via the "Path resolution" section, then glob*.mdand sort by stem for a stable order. Do not hardcode the axis list; the directory is the source of truth. For each axis, load the canonical prompt for{role}(the file stem), then invokeTask(subagent_type="{role}")with that prompt as the system instruction, the same CONTEXT_MODE-prefixed diff as input, and the structured Output Schema from the canonical file as the response contract. If the harness does not register a{role}subagent type in itsTaskenum (Copilot CLI today, and any axis with no matching agent such asreliability,observability,agent-safety,decision-rigor), fall back toTask(subagent_type="general-purpose")with the canonical axis prompt as the system instruction; the prompt drives the review, not the subagent identity. The current Stage-2 canonical set (11 axes) is:- axis 1:
analyst - axis 2:
architect - axis 3:
qa - axis 4:
security - axis 5:
devops - axis 6:
roadmap - axis 7:
reliability(general-purpose fallback; no dedicated subagent) - axis 8:
observability(general-purpose fallback; no dedicated subagent) - axis 9:
agent-safety(general-purpose fallback; no dedicated subagent) - axis 10:
decision-rigor(general-purpose fallback; no dedicated subagent) - axis 11:
code-quality(general-purpose fallback; no dedicated subagent)
- axis 1:
Run 3 chained skill axes (local-only; CI does not run these). These run after every canonical axis. Scope the golden-principles and taste-lints axes to the PR diff by passing the base branch detected in step 1 (stored as
BASE_BRANCH) as--diff-scope, quoted, so the gates evaluate only changed files, not the whole tree:- local axis 1: Skill(skill="code-qualities-assessment")
- local axis 2: Skill(skill="golden-principles"), invoking
python3 <scan_principles.py> --diff-scope "origin/$BASE_BRANCH", where<scan_principles.py>isgolden-principles/scripts/scan_principles.pyresolved via the "Path resolution" section (chained-skill scripts). Do not assume the.claude/layout. - local axis 3: Skill(skill="taste-lints"), invoking
python3 <taste_lints.py> --diff-scope "origin/$BASE_BRANCH", where<taste_lints.py>istaste-lints/scripts/taste_lints.pyresolved via the "Path resolution" section (chained-skill scripts). Do not assume the.claude/layout.
Extract verdict per axis. Each axis output ends with a line matching
(?m)^\s*(?i:(?:Final\s+)?Verdict):\s*\[?(PASS|WARN|CRITICAL_FAIL|REJECTED|FAIL|NEEDS_REVIEW|NON_COMPLIANT|COMPLIANT|PARTIAL|UNKNOWN)(?![|A-Z_])\]?(label case-insensitive; tokens case-sensitive uppercase; trailing lookahead rejects template-form lines likeVERDICT: [PASS|WARN|CRITICAL_FAIL]and token-prefix collisions). Useextract_verdictfrom the verdict library (resolved per "Path resolution") to parse. If a skill crashes or returns no parseable verdict, mark that axisUNKNOWNand continue (do not abort).Merge verdicts via
merge_verdicts([...]), passing the Stage-1spec-complianceverdict plus one verdict per Stage-2 axis (the 11 discovered non-spec canonical axes plus the 3 chained skills; 15 total with the current set). Rules: any token inFAIL_VERDICTS(CRITICAL_FAIL/REJECTED/FAIL/NEEDS_REVIEW/NON_COMPLIANT) ->CRITICAL_FAIL; anyWARNorPARTIAL->WARN; anyUNKNOWNor unrecognized token ->UNKNOWN; allPASS/COMPLIANT->PASS; empty ->UNKNOWN. When Stage 1 returnsCRITICAL_FAIL(step 2 short-circuit), skip this merge: the FINAL VERDICT isCRITICAL_FAIL. A Stage-1UNKNOWNdoes NOT short-circuit; it is merged like any other axis (issue #2690).Emit findings table (see Output below).
Vendored install (REQ-008-06)
/review MUST work in a vendored install in any harness that supports plugins (Claude Code, Copilot CLI, and similar). The skill body and every canonical axis file MUST NOT assume a single hard-coded layout; resolve the verdict library via the "Path resolution" section. The build pipeline copies the entire skill directory (including references/ and scripts/) into plugin installs at src/copilot-cli/skills/review/, so ${CLAUDE_SKILL_DIR}/references/ resolves in both layouts without a fallback chain, and the axis set is discovered from that directory. Project-side paths (CI prompts, generator, sync infrastructure) are mentioned in this skill for project maintainers reading the prose, not as runtime dependencies.
UNKNOWN handling
- A skill that crashes or exits non-zero -> mark axis
UNKNOWN, log the failure, continue with remaining axes. - A canonical axis whose output cannot be parsed by
extract_verdict-> markUNKNOWN. - UNKNOWN does NOT override real findings:
merge_verdicts(["WARN", "UNKNOWN"])returnsWARN. UNKNOWN only matters when it would otherwise be PASS. - UNKNOWN axes are surfaced explicitly in the output table so the reviewer sees what was not evaluated.
Output
Findings table with one row per axis:
| Axis | Verdict | Emoji | Summary |
|---|---|---|---|
| spec-compliance | PASS | (from get_verdict_emoji) | (Stage-1 gate; SKIPPED rows below when it short-circuits) |
| analyst | PASS | (from get_verdict_emoji) | (one-line summary) |
| architect | WARN | ... | ... |
| qa | ... | ... | ... |
| security | ... | ... | ... |
| devops | ... | ... | ... |
| roadmap | ... | ... | ... |
| reliability | ... | ... | ... |
| observability | ... | ... | ... |
| agent-safety | ... | ... | ... |
| decision-rigor | ... | ... | ... |
| code-quality | ... | ... | ... |
| code-qualities-assessment | ... | ... | ... |
| golden-principles | ... | ... | ... |
| taste-lints | ... | ... | ... |
FINAL VERDICT: [PASS|WARN|CRITICAL_FAIL|UNKNOWN] (from merge_verdicts)
Followed by per-axis findings in detail. Each finding:
- severity: CRITICAL | IMPORTANT | SUGGESTION
- category: short keyword
- location:
file:line - recommendation: one-sentence fix
Categorize a finding as Critical if its axis verdict is CRITICAL_FAIL, Important if WARN, Suggestion otherwise.
Write the SHA-bound PASS marker (Issue #1938)
On a PASS final verdict (or a WARN where every WARN was acknowledged), write a
SHA-bound marker so /ship can prove the shipped code was reviewed at its current
state without re-running the review. Skip this on CRITICAL_FAIL, on a Stage-1
short-circuit, and on UNKNOWN: a non-PASS verdict must not leave a marker.
The marker is a git trailer (vendor-safe: it lives in the commit, travels in every
clone, needs no .agents/ access). Its contract, quoted verbatim from the reader
.claude/skills/review/scripts/validate_review_marker.py (MARKER_TRAILER_KEY = "Reviewed-By"),
is:
Reviewed-By: /review@<comma-separated-axis-list> on <reviewed-tip-sha>
A commit cannot name its own SHA in a trailer (the SHA hashes the trailer, so writing the SHA changes the SHA, with no fixed point). So record the reviewed tip, then write an empty marker commit on top of it:
REVIEWED_TIP=$(git rev-parse HEAD)(the code that was reviewed).- Build the axis list from the axes that ran (comma-separated stems, e.g.
analyst,architect,qa,security,...). git commit --allow-empty -m "review: /review PASS marker" --trailer "Reviewed-By: /review@<axis-list> on $REVIEWED_TIP"
The marker commit adds no code. SHA-binding holds: the marker is valid only while its parent (the reviewed tip) is HEAD's parent. Land any new code commit and the marker no longer sits on HEAD's parent, so the review is correctly treated as stale. Issue #1938 records the design.
Re-running /review after the verdict is still PASS writes another marker commit;
that is safe (idempotent in effect: the latest marker binds the current tip).
Principles
- Strict superset of CI. Any finding CI surfaces,
/reviewsurfaces first. - Drift fails closed. If
.claude/skills/review/references/and.github/prompts/diverge, the pre-push hook blocks the push. CI re-checks as a backstop. - UNKNOWN is information. A skill that did not evaluate is not a silent PASS.
- Vendored survival.
/reviewworks in a.claude/-only checkout. No axis or skill references.agents/or.github/.
Verification
- The
spec-complianceStage-1 axis file exists underreferences/spec-compliance.mdand runs before Stage 2 (Process step 2) - Every non-spec
references/*.mdfile is discovered and run as a Stage-2 canonical axis (11 with the current set: analyst, architect, qa, security, devops, roadmap, reliability, observability, agent-safety, decision-rigor, code-quality) - Each axis emits a parseable verdict line per the
extract_verdictregex - The verdict library resolves under one of the two documented candidate paths
-
merge_verdictsproduces a single final verdict consistent with the rules in Process step 7 - On a Stage-1
CRITICAL_FAIL, Stage 2 axes are markedSKIPPEDand the final verdict is the Stage-1CRITICAL_FAIL; on a Stage-1UNKNOWN(INCONCLUSIVE), Stage 2 STILL runs and the UNKNOWN is carried into the merge (it never suppresses a real Stage-2 finding such as a securityCRITICAL_FAIL) - When Stage 2 runs, the output table contains the spec-compliance row plus one row per discovered non-spec canonical axis plus the 3 chained skills (15 rows with the current set), plus the final verdict line
Refs
- Verdict module:
.claude/lib/ai_review_common/verdict.py(Claude layout) orlib/ai_review_common/verdict.py(vendored layout, plugin-root relative). - Canonical axes: every
.claude/skills/review/references/*.md(Claude layout) or${CLAUDE_SKILL_DIR}/references/*.mdresolved at runtime (works in both layouts);spec-complianceis the Stage-1 gate, and the non-spec files form the discovered Stage-2 axis set. - Skill chain:
.claude/skills/{code-qualities-assessment,golden-principles,taste-lints}/(the build pipeline copies these into the plugin install too).
(Spec, generator, and drift hook live outside the vendored surface and are not referenced from this skill body. Vendored installs work without them.)