name: review-spec
description: Use when reviewing a ticket's scenarios (.feature source, with legacy test-definitions.md fallback) — auto-fired by the bdd scenario-gate and re-invokable after scenario edits. Runs vacuous-pass, AODI, determinism, negative-case, and cross-cutting checks and produces a structured findings report. NOT for spec.md JTBD/AC/persona framing — that is self-review.
allowed-tools: '*'
Review Spec — Scenario Quality Gate
Adversarially review a ticket's scenarios: treat them as if you're trying to break them — find the one that passes for the wrong reason, the missing rejection path, the flaky assertion. This is the bdd scenario-gate procedure, extracted so it runs two ways:
- Auto-fire — the bdd flow invokes this on entering the
scenario-gatephase. - Manual re-run — invoke
/review-specanytime afterdefine-behavior(e.g., scenarios changed during implement and you want to re-validate). Allowed on a closed ticket too — a post-hoc audit is still readable.
Read the active ticket's .feature source first; use test-definitions.md only as the R/G/R ledger and as a legacy scenario fallback when no feature source exists. test-definitions.md is the R/G/R ledger. Run every check below against the scenarios, and present findings in the Findings format at the end. (Not a spec.md framing review — JTBD/AC/persona checks live in self-review.)
Vacuous-pass test
Run this first — a scenario that would pass without the feature invalidates every check below it. Mentally delete the implementation and ask: could this scenario still pass? If yes, it is vacuous: flag it and propose a stronger Then, not just a warning. (A good test is behavioral — if the behavior changed, the result should change; a scenario that survives a deleted feature tests nothing.)
Common vacuous patterns, each with its fix:
- Existence-only
Then("a response is returned") → assert the actual value, not that something came back. - Given-echo ("Given a row with X exists … Then a read returns X") → that exercises the store, not the feature; assert something the feature must compute or change.
- Trivially-true setup — the
Givenalready makes theThentrue regardless of theWhen→ move the real precondition out of the assertion. - Non-claim
Then("the system remains running") → assert a falsifiable outcome the feature produces.
AODI validation
Validate each scenario against four criteria:
| Criterion | Check | Red flag |
|---|---|---|
| Atomic | Tests ONE behavior | Multiple When/Then pairs |
| Observable | Has externally visible outcome | Internal state only |
| Deterministic | Same result on repeated runs | Time/random/external dependency |
| Independent | No ordering dependency | "After Scenario 2 runs..." |
Determinism risks
Sharpen AODI's Deterministic check with the patterns that actually flake in CI — each with its fix:
- Time without a wait — a
Thenthat depends on elapsed time, or asserts an async result after a fixed delay → wait on an observable condition (poll/await the state), never a baresleep. - Order-dependent comparison — asserting an unordered collection as if it were ordered → sort, or compare as a set, before asserting.
- Unsequenced concurrency — a
Thenover concurrent operations with no stated ordering → assert on the settled end-state, or name the ordering guarantee the scenario relies on.
Assertion strength (weak vs strong Then) isn't repeated here — it is testing Iron Law 2, and the vacuous-pass check above already coaches a stronger Then.
Adversarial pass
After AODI validation, argue against your own scenario list: "What breaks that none of these scenarios catch?" Present any findings to the user.
One lens to always run — negative-case coverage: for each happy-path scenario, is there a rejection-path counterpart? Partitioning should already have produced the invalid-input classes (equivalence partitioning covers invalid ranges, not only valid ones); this pass is the backstop. Common pairs — create ↔ duplicate, read ↔ not-found, update ↔ not-allowed, act ↔ precondition-failed. Treat a gap as should-strengthen, not must-fix — a sibling AC often already covers the rejection: "Happy path X has no rejection counterpart — add a scenario for path Z?" For one behavior across many inputs, use a Scenario Outline.
Cross-cutting checks
Five lenses across the whole scenario set (not per scenario) — each asks "what's missing?":
- Conflict — do two scenarios contradict (one allows X, another rejects it) with no distinguishing precondition?
- Boundary — zero / one / max / empty / null covered where they apply?
- Failure — external-dependency failures covered (timeout, 5xx, malformed, partition)? Distinct from the feature's own rejections (the negative-case lens above).
- Security — authn/authz failures and abuse vectors covered?
- Persona consistency — is each scenario's triggering persona clear, and would another persona experience it differently?
Findings format
Report findings the way safeword talks to the user — lead with the answer, structure only because a multi-finding review earns it, end with the call:
- Lead with a tally —
**Findings:** N must-fix, M should-strengthen, P looks-good. - Three tiers — Must Fix (correctness/structure), Should Strengthen (clarity/specificity), Looks Good (specific acknowledgement, never padding).
- One
####per finding with the scenario id + a short issue; under it, Current (quote the G/W/T, bold the offending phrase) → why → Proposed (the rewrite). Fix last, so the explanation reads as the answer, not justification. - Bulk — when one pattern hits ≥3 scenarios: one header, an Affected id list, one Representative quote, one Proposed pattern.
- End with
**Next:**— the single fix to start.
**Findings:** 1 must-fix, 0 should-strengthen.
#### oauth.PO1.AC2.change_applies — Then joins two assertions with "and"
Current: "Then the config shows B and later auths use B" — two independent observables.
Proposed: "Then later authentications use User Source B."
**Next:** split the AC2 scenario, then re-run the gate.
After the review
When invoked auto-fire from the bdd scenario-gate, hand control back to bdd/SCENARIOS.md for the Scenario Gate Exit (assign test layers, advance the phase, work-log). When invoked manually, stop after presenting findings — the driver decides what to fix and whether to re-run.