ywc-impl-review

name: ywc-impl-review description: >- (ywc) Use when implementation is complete and the user wants a pre-PR code quality review, spec conformance check, or comprehensive implementation validation. Triggers: "구현 검증", "impl review", "implementation review", "사양 적합성", "코드 리뷰", "구현 리뷰", "PR 전 검증", "check my implementation", "実装レビュー", "実装確認". Do not use for active code generation (use ywc-code-gen), spec-only review before code exists (use ywc-spec-validate), for product/business-level review (use ywc-product-review), for capturing or reading accumulated review preferences as durable learnings (use ywc-review-learnings), or for UI-only live design audits (use ywc-ui-ux-review).

Announce at start: "I'm using the ywc-impl-review skill to run a five-axis (architecture / design / devex / security / QA) implementation review."

Implementation conformance review skill. Runs five parallel Codex review workers (Phase 1) and escalates only ambiguous findings to a short higher-capability advisor pass (Phase 2). The four code-review aspects (architecture / design / devex / security) stay separate so each worker has a bounded review surface. QA stays as a separate axis because coverage analysis is mostly mechanical. See Advisor Pattern for why this shape is used.

Rationalization Defense

When tempted to skip a step, check this table first:

Excuse	Reality
"Phase 1 had no findings, the spec is satisfied"	Absence of findings ≠ proof of conformance. State what was checked, not what was missing.
"This finding looks ambiguous, send it all to the advisor"	Phase 2 budget is 5. Reserve for genuinely ambiguous cases — not first-pass uncertainty.
"Forwarding the full spec gives the advisor better context"	Never forward full files or full project context. Only the finding text + bounded snippet + spec excerpt.
"Severity feels somewhere between High and Medium"	Pick based on impact, not feeling. If truly between, that is a Phase 2 candidate.
"`--no-advisor` saves time on this review"	Skip Phase 2 only on throwaway/prototype code. Production review needs ambiguity escalation.
"Reviewer workers agree, so the finding is correct"	Multiple workers drawing the same wrong conclusion is still wrong. Escalate when stakes are high.
"User wants a quick review, severity ratings are optional"	Without severity, the user cannot triage. Always rate Critical / High / Medium / Low.
"Surface every nitpick to be thorough"	A review that buries one real bug under ten style nits trains the reader to ignore all of them. `chill` is the default: suppress the Style/Docs/polish tail unless `--profile assertive`.
"This finding is plausible, surface it with a 'might be'"	Verify-before-surface: a finding without primary evidence (file:line, traced symbol, command output) is dropped, not hedged.
"There's a `docs/review-learnings.md` FALSE-POSITIVE entry but I'll flag it anyway to be safe"	A `FALSE-POSITIVE` learning carries the team's reason it is a non-issue here. Re-raising it reintroduces the exact noise the learning was created to stop.

Violating the letter of these rules is violating the spirit. Review without honest severity is theater.

Arguments

Parameter	Format	Example	Description
`--spec`	`--spec <path>`	`--spec docs/outline/02-api.md`	Specification file path (required)
`--code`	`--code <path>`	`--code api/src/routes/`	Code path to review (required). `--code` and `--git-range` are mutually exclusive
`--git-range`	`--git-range <sha>..<sha>`	`--git-range abc1234..HEAD`	Git range to derive the review target. Run `git diff --name-only <range>` to obtain the changed-file list. `--code` and `--git-range` are mutually exclusive
`--no-advisor`	flag		Skip Phase 2 entirely. Use when running on throwaway or prototype code where higher-capability advisor judgment on ambiguous findings is not worth the latency
`--profile`	`--profile chill\|assertive`		Verbosity dial (default `chill`). `chill` surfaces correctness / security / logic / runtime-risk findings and suppresses Style/Docs/Devex-polish `Low`/`Info` nits; `assertive` emits those too. Critical/High/Medium are never suppressed. See coderabbit-methodology.md §1
`--skip-learnings`	flag		Skip Step 0 (loading `docs/review-learnings.md`). Use when no learnings file exists or a clean-room review is wanted
`--advisor-budget`	`--advisor-budget <n>`	`--advisor-budget 3`	Maximum number of Phase 2 advisor calls. Default: 5. Applies across all categories combined
`--format`	`--format markdown\|html`	`--format html`	Output format. Default `markdown`. With `html`, writes a self-contained HTML report to `claudedocs/`. See html-output.md

Advisor Pattern

This skill uses Pattern B (Two-Phase Review) from advisor-pattern.md. The rationale: review findings range from mechanical (hardcoded secret, missing null check, trivial OWASP match) to genuinely ambiguous (architectural judgment call, severity debate between two OWASP categories, spec-conformance question with more than one reasonable reading). Running every reviewer at maximum reasoning depth wastes advisor capacity on the mechanical cases; running every reviewer at ordinary depth undersells the ambiguous ones. Phase 1 handles the mechanical cases with bounded Codex workers; Phase 2 escalates only the ambiguous ones to a higher-capability advisor with tightly bounded context.

Budget discipline (see advisor-pattern.md §6): default cap is 5 advisor calls per invocation, shared across all categories. Use fewer when possible. Never forward full files or full project context to the Phase 2 advisor — only the finding text, a bounded snippet, and the relevant spec excerpt.

Execution Steps

Load Review Learnings (skip if --skip-learnings) — Invoke ywc-review-learnings --mode read --target <changed files / --code path> to load the project's accumulated review preferences from docs/review-learnings.md. DO / DO-NOT entries become extra checks injected into the matching reviewer worker; FALSE-POSITIVE entries tell that worker to stop raising a known non-issue. If the file is absent, proceed with an empty set — never block a review on missing learnings. See coderabbit-methodology.md §7.
Collect Project Context — Read AGENTS.md, CODEX.md, CLAUDE.md, and package.json where present to identify conventions, tech stack, and PR gate conditions. If docs/ubiquitous-language.md exists, read it — the Design worker must flag identifiers that match a "Synonyms to Avoid" entry instead of the canonical term. Per coderabbit-methodology.md §3, treat the spec / PR description as the statement of intent and trace each changed symbol to its callers/callees before judging it.
Read Spec + Code — When --git-range is provided instead of --code: run git diff --name-only <range> to obtain the changed-file list and treat those files as the review target. Read the specification file and all target code files. This context stays with the parent; do not forward it wholesale to Phase 2.

Phase 1 — Parallel Executor Review — Use Codex subagent delegation to run five review workers in parallel. Do not pass Claude Code-only model fields; each worker receives its role from the prompt and matching reference file:

Architecture worker — Module boundaries, layering, structural patterns, dependency direction, simplicity / over-abstraction, structural spec conformance. Reference: references/architecture-agent.md. When the diff touches DB schema or migrations, also apply the shared schema review checklist (../references/schema/core.md Part C); raise cascade ↔ API status and multi-tenant scope gaps as one-line cross-references to the Security worker rather than duplicating them.
Design worker — API/interface design, naming, signatures, error models, return shapes, public-surface discipline, contract spec conformance. Reference: references/design-agent.md.
Devex worker — Readability, error messages, logging, documentation, debuggability, config UX. The operator-experience dimension. Reference: references/devex-agent.md.
Security worker — OWASP Top 10 analysis. Reference: references/security-agent.md.
QA worker — Test coverage gaps and missing test cases. Reference: references/qa-agent.md. Coverage-gap detection is largely mechanical (file enumeration, assertion counting, branch enumeration), so keep the prompt narrow and evidence-based.

Inject the Step 0 learnings into each worker prompt, filtered to that aspect's category. Apply the --profile dial: in chill (default), suppress Style/Docs/Devex-polish Low/Info nits and surface only correctness/security/logic/runtime-risk findings; assertive emits the tail too. Verify before surfacing (coderabbit-methodology.md §2): every finding must cite primary evidence — exact file:line, a traced symbol, or fresh command output — and a finding that cannot be substantiated is dropped, not hedged. Where the project ships linters/scanners, run them and feed output to the relevant worker as evidence to triage, not as the verdict (§4).

Each worker must return two artifacts:

Confirmed findings — issues the executor is confident about (Phase 1 complete, no escalation needed).
Advisor candidates — findings the executor flags for Phase 2 review. Each candidate must include: the finding text, a bounded code snippet (≤100 lines), the relevant spec excerpt (if any), and a one-sentence reason the executor wants a second opinion.

Each reference file (references/*-agent.md) explains what "confident" vs "needs advisor" means for that category. The four code-review aspects (architecture / design / devex / security) stay in their own lanes: an Architecture finding does not include naming polish (Design) or error-message wording (Devex). Cross-aspect concerns surface as one-line cross-references, never as duplicated findings.

Language-specific depth via Codex custom agents — When the Codex custom-agent catalog at codex/agents/ is installed, route language-dominant or performance-sensitive review packets to the matching read-only specialist:

Signal in changed-file list or evidence packet	Specialist	Packet boundary
TypeScript / TSX / JavaScript dominates	`ywc-typescript-reviewer`	changed TS/JS files, relevant diff, tsconfig/framework excerpt, forwarded `tsc`/lint output if available
Python dominates	`ywc-python-reviewer`	changed Python files, relevant diff, Python/framework version, forwarded mypy/pyright/pytest/ruff output if available
Go dominates	`ywc-go-reviewer`	changed Go files, relevant diff, go.mod/go.sum excerpt, forwarded vet/staticcheck/test/race output if available
Performance budget, profiler, Lighthouse, query plan, or bundle evidence is central	`ywc-performance-engineer`	bounded code/diff plus the measurement, budget, trace, query plan, or bundle report

Do not import Claude Code named-agent syntax (subagent_type: ...) or rely on claude-code/agents/; those are Claude Code-only surfaces. If Codex custom-agent dispatch is unavailable, keep the worker generic and pass the same specialist checklist explicitly in the bounded prompt.

Aggregate and Select Phase 2 Candidates — Combine candidate lists from all five workers:
- Deduplicate findings that share {file}:{line} across categories.
- Cap the total at --advisor-budget (default 5). If candidates exceed the cap, prioritize: Critical > High > Medium, and within the same severity prefer Security > Architecture > Design > Devex > QA. (Architecture / Design / Devex order reflects irreversibility — structural decisions are hardest to walk back, then contracts, then operator UX.)
- Log the candidates that were dropped due to the cap in the final report so the user can see what was not escalated.
Phase 2 — Advisor Pass (skip entirely if --no-advisor) — For each surviving candidate, run a short higher-capability advisor pass:
- Context payload: only the candidate's finding text, the bounded snippet, the spec excerpt, and the category-specific severity rubric from the matching reference file. Do not forward the full spec, the full file, or the Phase 1 transcripts.
- Expected output: a short verdict (≤200 words) containing: confirmed severity, a one-line rationale, and either "confirmed" or "adjusted" (with the adjustment if any). The 200-word cap is tighter than advisor-pattern.md §3's observed ceiling of "typically <500 words" — 200 is an operational choice for this skill. If a genuinely complex candidate needs more room, invoke the §6 override: exceed the cap and justify the overrun in the final report's Advisor Budget Report section.
- Advisor calls are sequential, not parallel — each is small and fast, and sequential execution keeps the budget enforcement simple and auditable.
Merge and Output Report — Combine Phase 1 confirmed findings with Phase 2 verdicts. Mark each finding in the final report with its provenance (Phase 1 vs Phase 2) so the user can see which decisions involved higher-capability advisor judgment.
Capture Learnings (skip if --skip-learnings) — After the report, offer to promote durable lessons into docs/review-learnings.md via ywc-review-learnings --mode update --source review: recurring confirmed findings become DO/DO-NOT learnings, and findings the user dismisses as false positives become FALSE-POSITIVE learnings with the dismissal reason. Do not write learnings without the user-confirmation CHANGESET that ywc-review-learnings enforces.

Output Format

## Implementation Review Result: {spec} vs {code}

### Summary
- Phase 1 findings: Architecture A, Design D, Devex V, Security M, QA K
- Phase 2 advisor calls: X of Y budget used
- Phase 2 adjustments: N confirmed as-is, M severity-adjusted

### Architecture
1. [severity] [P1|P2] {file}:{line} — Description
   (if P2) Advisor verdict: {one-line rationale}

### Design
1. [severity] [P1|P2] {file}:{line} — Description
   (if P2) Advisor verdict: {one-line rationale}

### Developer Experience (Devex)
1. [severity] [P1|P2] {file}:{line} — Description
   (if P2) Advisor verdict: {one-line rationale}

### Security
1. [severity] [P1|P2] {file}:{line} — Description
   (if P2) Advisor verdict: {one-line rationale}

### Testing (QA)
1. [severity] [P1|P2] — Description
   (if P2) Advisor verdict: {one-line rationale}

### Fix Priority
1. (Sorted by Critical first, with [P1|P2] markers preserved)

### Advisor Budget Report
- Used: X of Y calls
- Dropped (over budget): {list, if any}

### Completion Status
(One of: DONE | DONE_WITH_CONCERNS | BLOCKED | NEEDS_CONTEXT)

Reporting Symbols: Replace the bracketed [severity] placeholder above with the corresponding symbol from symbols.md. Severity vocabulary is shared across all ywc-* review skills — use the same symbol for the same tier.

Symbol	Severity	Replaces
`🚨`	Critical	`[Critical]`
`🔴`	High	`[High]`
`🟡`	Medium	`[Medium]`
`🔵`	Low	`[Low]`
`ℹ️`	Info	`[Info]`

Example finding line:

🚨 [P2] src/api/users.ts:42 — SQL injection: input concatenated into query.
   Advisor verdict: confirm Critical; remediation requires parameterization.

The [P1] / [P2] marker is preserved unchanged — it describes Phase 1 vs Phase 2 escalation history, while the severity symbol describes finding impact. Both pieces of information are needed; do not collapse them.

For the Confidence Gate score in the report header, use the band marker from symbols.md: ✅ for PROCEED, ⚠️ for REVIEW, ❌ for STOP.

Completion Status rules:

Status	When to use
`DONE`	Review complete, no Critical or High findings
`DONE_WITH_CONCERNS`	Review complete but Critical/High findings were identified — the report details them; human action required before merging
`BLOCKED`	Review cannot proceed — spec file missing, code unreadable, or a Phase 2 escalation returned an inconclusive verdict
`NEEDS_CONTEXT`	Spec and code paths are ambiguous; cannot determine what conformance means without clarification

[P1] marks findings confirmed entirely by the Phase 1 executor. [P2] marks findings that went through the Phase 2 advisor. This distinction matters when the user calibrates trust in the output — Phase 2 items represent the decisions the executor deemed genuinely ambiguous.

HTML mode (--format html) — emits the same findings as a self-contained HTML report: severity color coding, tab navigation, and a Copy as Markdown button. Structure and conventions follow html-output.md. The Markdown surface is preserved inside the file, so downstream integration is unaffected.

Worker Prompt References

Read the corresponding reference file when spawning each worker and include the relevant section in the worker prompt. The reference files describe both the standard review dimensions and the "advisor candidate" criteria specific to each category:

references/architecture-agent.md — Architecture's review dimensions (structural spec conformance, pattern consistency, module interface, simplicity, surgical changes) + advisor triggers
references/design-agent.md — Design's review dimensions (contract spec conformance, naming, signatures, error model, return shapes, public-surface discipline) + advisor triggers
references/devex-agent.md — Devex's review dimensions (readability, error messages, logging, documentation, debuggability, config UX) + advisor triggers
references/security-agent.md — Security's OWASP Top 10 checklist and advisor-escalation triggers
references/qa-agent.md — QA's coverage analysis and advisor-escalation triggers

If a reference file does not yet contain an "Advisor Candidate Criteria" section, fall back to the three-property test in advisor-pattern.md §5: objective trigger, irreversibility, ambiguity. A finding must satisfy all three to be a Phase 2 candidate.

CodeRabbit-derived methodology — the verbosity dial (--profile), verify-before-surface precision gate, intent/context phase, static-analysis-as-evidence, and per-project learnings loop are specified in references/coderabbit-methodology.md. These techniques raise review quality toward bot-reviewer parity without depending on CodeRabbit runtime. The learnings half pairs with ywc-review-learnings, which owns docs/review-learnings.md.

Recurring real-world defects catalog — every reviewer worker additionally consults references/recurring-defects.md, a derived-from-data catalog of the defect classes that production bot reviewers (CodeRabbit, Codex Review) flag most often: data-layer access-boundary (ownership / tenant isolation), data-integrity / NULL handling, error-swallowing, external-call resilience, validation / fail-fast, HTTP-status semantics, and test fidelity. Each worker's "High-frequency real-world checks" section points at its catalog slice. The catalog tells the reviewer where real bugs cluster; the per-aspect rubric still governs severity and escalation.

Confidence Gate

This skill applies the Confidence Gate to the aggregated review output before emitting the final report. The gate sits between Phase 2 advisor consolidation and report emission.

In addition, when the report's gate band lands in PROCEED and findings include a DONE (or DONE_WITH_CONCERNS) completion claim against the implementation, the surface must follow ywc-verify-done: the verification block (command, output excerpt, exit code) appears before the status line, no should / probably / seems wording appears in the conclusion, and any "this finding's fix was verified" claim cites the fresh command output that proves it. A report that reads "all clear, looks good" without an evidence block is not a Confidence-Gate PROCEED — downgrade to REVIEW until the evidence is attached.

Required dimensions (must each score ≥ 70):

Evidence quality — Every finding must cite a verified source: file path, line number, or test output. Findings phrased as "this might be wrong" without primary evidence reduce the score even if the finding itself is plausible.
Root cause identified — A finding that names only the symptom ("test fails") without identifying the underlying cause ("test fixture is shared across cases") cannot be remediated correctly. The reviewer must reach root cause before recording the finding.

Band-to-status mapping for this skill:

Gate band	Completion status	Action
PROCEED (≥ 90)	DONE or DONE_WITH_CONCERNS	Emit the report; status follows from the existing Critical/High count rule.
REVIEW (70 – 89)	NEEDS_CONTEXT	Emit the report with the gate score and weakest dimension flagged at the top. Reviewer judgment required before merge.
STOP (< 70)	BLOCKED	Do not emit findings as authoritative. Report which dimensions failed and what additional Phase 2 advisor calls would be needed to raise the score.

The gate score must appear in the report header. Per-finding [P1] / [P2] markers remain unchanged — they describe escalation history, while the gate describes overall report confidence.

Validation

Before returning the report, verify that every finding includes source evidence, Phase 2 advisor usage is counted against the budget, the Confidence Gate band maps to the final status, and any "verified" claim includes the fresh command output required by ywc-verify-done.

Integration

upstream: ywc-sequential-executor or ywc-parallel-executor (auto-invoked via --review flag)
downstream: PR creation
pairs with: ywc-review-learnings — loaded in Step 0 (read) to apply accumulated review preferences and called in Step 7 (update --source review) to capture new ones
pattern source: references/advisor-pattern.md, references/coderabbit-methodology.md