name: review disable-model-invocation: true metadata: internal: true description: Review a Pi run after /pi:execute. Run the full verification suite, perform holistic evaluation, and present the final scorecard with any remaining repair guidance. argument-hint: "[optional task id or filter]" allowed-tools: > Bash(git status ) Bash(git diff ) Bash(git log ) Bash(git branch ) Bash(git rev-parse ) Bash(codex ) Bash(gemini ) Bash(cat .agents/work/) Bash(cat .agents/work/runs//) Bash(cat .agents/work/runs///) Bash(cat .agents/work/runs////) Bash(ls .agents/work/) Bash(ls .agents/work/runs/) Bash(ls .agents/work/runs//) Bash(ls .agents/work/runs///) Read Write Edit Grep Glob
Pi state snapshot
echo "PI_REVIEW_PREFLIGHT_$(date +%s%N)"
git rev-parse --show-toplevel 2>/dev/null || echo "not a git repo"
git branch --show-current 2>/dev/null
git status --short 2>/dev/null | head -30
test -f .agents/work/current.json && cat .agents/work/current.json || echo "no active run"
ls -1 .agents/work/runs 2>/dev/null || echo "no runs"
timeout 3 codex --version 2>&1 || echo "codex: not installed"
timeout 3 gemini --version 2>&1 || echo "gemini: not installed"
The block above runs at skill-load time. Use its output to confirm the brief
and tasks exist for the selected run before starting the full verification
suite, and to gate the final Codex review on CLI availability. The
coordinator must still resolve the run and re-read its state.json before
acting.
Read the Pi protocol module (internal/protocol/README.md in this plugin) and execute Phase 3: Review.
User input: $ARGUMENTS
Active state root: .agents/work/runs/<slug>/ via .agents/work/current.json
Coordinator Pipeline
You (the main thread) are the coordinator. Subagents cannot spawn other subagents, so you own all agent orchestration.
Phase A — Load and Verify Prerequisites
- Resolve the active run:
- Read
.agents/work/current.jsonif present and use its slug. - If
current.jsonis missing and exactly one run exists under.agents/work/runs/, auto-select it and continue. - Otherwise stop and tell the user to run
/pi:planto select or create a run.
- Read
- Read
state.jsonfrom the resolvedstate_root. If phase is notexecuteor later (review,done), tell the user to run/pi:executefirst. - Read
brief.md,rubric.json,tasks/*.json,research/consensus-matrix.mdfrom the active state root. On the first state write of this phase, setstate.json.orchestrator.last_command_cli = "claude"withorchestrator.updated_at= current ISO-8601 time. - Read all evaluations from
evaluations/.
Phase B — Full Verification Suite
- Run the complete local verification suite the project supports.
- Run per-task verification: iterate each task's
verificationarray and record results. - If
rubric.json.criteria.visual_design.applicableistrue, gather browser-backed UI evidence before final scoring:- prefer a Bash-runnable capture path: the project's existing UI/e2e test
harness, or
npx playwright/playwright testrendering desktop and mobile widths. If a Browser or Chrome DevTools tool is available in this session, you may use it instead. - check console errors, obvious layout overlap, responsive behavior, and the
selected direction in
research/ui-layout-decision.md - record screenshot or test-artifact paths
If no screenshot path can be produced, mark visual verification blocked; do
not let the final
visual_designscore pass on prose alone.
- prefer a Bash-runnable capture path: the project's existing UI/e2e test
harness, or
- Write suite results to
evaluations/suite-results.json.
Phase C — Final Evaluation with External Critics
Read
research_policy.providersfromrubric.json. For each provider, spawn the matching reviewer for a final independent read of the full build:codex→ spawncodex-reviewer, save toreviews/codex-final.jsongemini→ spawngemini-reviewer, save toreviews/gemini-final.json
When both providers are active, spawn them in parallel. If the list is empty, skip this step.
If a selected CLI is not available, check
execution_policyfromrubric.json. Ifcodex_policyisskip, proceed without that provider's review. Ifrequired, halt and warn the user. Ifoptional, applydegraded_mode:warn_and_continue— note the absence in the scorecard and proceed;block— halt and warn the user. The same policy applies to Gemini in this phase.Spawn
evaluator(foreground) with:- the brief, rubric, full build (not just last repair)
- per-task verification arrays and consensus matrix
- suite results and the review output from every active provider
- all prior evaluations for context
- for UI work,
research/ui-layout-decision.mdand screenshot/test artifact paths from the suite
Write final evaluation to
evaluations/review.json.
Phase D — Scorecard and Learnings
- Present the full scorecard:
- Global rubric scores (functionality, code_quality, product_depth, visual_design if applicable)
- Per-task verification results (task_id, checks passed/failed)
- Consensus matrix cross-reference: flag any implementation that contradicts resolved planning decisions
- Full-suite test results
- Known gaps
- Repair passes used during execute
- Which external providers were consulted (Codex, Gemini, both, or none) and where any of them changed the outcome
- Which builder ran (
claudeorcodex, fromexecution_policy.primary_executor) - For UI work, screenshot/test artifact paths and whether the chosen layout direction was followed
- If the build still misses the bar:
- Update
state.json:phase->"execute"(not"done") - Present the focused repair plan to the user
- Do NOT write LEARNINGS.md or mark done — the workflow returns to
/pi:executefor another repair cycle
- Update
- If the build passes:
- Append durable project-specific learnings to
LEARNINGS.md - Update
state.json:phase->"done"
- Append durable project-specific learnings to
Review is complete when evaluations/suite-results.json and
evaluations/review.json exist, every task verification has a pass/fail/blocked
state, external-review skips are policy-labeled, and state.json is either
returned to "execute" for repair or advanced to "done".
Follow the protocol exactly. Present the full scorecard, test results, and any remaining repair guidance to the human.