name: plan-tests
model: inherit
effort: medium
agents: []
description: >
Use to turn a feature's acceptance criteria into a test plan before any test is written — a
table that maps every spec.md §5 acceptance criterion to at least one test, names the test
levels (unit / integration / e2e / contract / load) without binding to a language or
framework, and fixes the integration and data strategy. Triggers on "plan tests for {slug}",
"test plan for {slug}", "how do we test {slug}", "test strategy for {slug}",
"/sdd:plan-tests {slug}", "план тестів для {slug}", "як тестувати {slug}", "тест-план".
Output: docs/features/{slug}/test-plan.md (separate file for M+), or inline in spec.md for
XS/S per the size matrix. Hard-refuse if spec.md is missing → run specify {slug} first.
Skill: plan-tests
Turns an already-specified feature into a test plan: a table that ties every acceptance criterion in spec.md §5 to at least one named test, the levels those tests live at (unit / integration / e2e / contract / load), the integration strategy (a real dependency, spun up throwaway), and the test-data + cleanup approach. The plan is written before a single test exists — the next stage, implement, reads this map and writes the red tests against it, not "however it seems". This file is the spine; the output scaffold lives in templates/test-plan.md.
This skill keeps only its own machinery. Question phrasing is shared → ../_shared/ask-style.md. Depth (inline in the spec vs a separate file) follows the size matrix → ../_shared/size-matrix.md. It names test levels, never test tools — the concrete commands are detected by implement against the repo, not hard-coded here.
Owner
QA + the engineer who will implement the feature (co-authors). QA drives the level breakdown and the edge/error cases; the implementing engineer confirms each acceptance criterion has a reachable test and that the integration strategy fits the repo. The Tech Lead signs off that no acceptance criterion is left uncovered.
Inputs
- <slug>: the same feature slug every earlier stage used.
- Gate (hard-refuse if missing): docs/features/<slug>/spec.md. Its §5 acceptance criteria are the entire reason this plan exists; each one must map to a test. If spec.md is absent → STOP and point: «run specify <slug> first: the test plan maps its §5 acceptance criteria to tests».
- (Optional) docs/features/<slug>/data-model.md: the entity shapes tell you what test data to build and what to seed/clean per suite. Read it if present.
- (Optional) docs/features/<slug>/sad.md §6 sequence diagrams: each drawn flow is an e2e candidate; each cross-participant boundary is a contract-test candidate.
- (Optional) docs/features/<slug>/.size: depth hint. Absent → default to M (separate test-plan.md file) and say so loudly in the handoff: «size M (default, no .size; run /sdd:classify-size <slug>)».
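The gate and the size default above can be sketched as follows. This is a minimal illustration only: the skill itself prescribes no language, `resolve_inputs` is a hypothetical helper, and the assumption that `.size` holds a single size label as plain text is not confirmed by this file.

```python
from pathlib import Path


def resolve_inputs(repo_root: Path, slug: str) -> dict:
    """Apply the hard gate, then collect the optional inputs and the size hint."""
    feature_dir = repo_root / "docs" / "features" / slug
    if not (feature_dir / "spec.md").is_file():
        # Hard refuse: without spec.md there are no §5 criteria to map.
        raise FileNotFoundError(
            f"run specify {slug} first: the test plan maps its §5 acceptance criteria to tests"
        )

    size_file = feature_dir / ".size"
    if size_file.is_file():
        # Assumption: .size contains just the label, e.g. "S" or "M".
        size, defaulted = size_file.read_text(encoding="utf-8").strip().upper(), False
    else:
        size, defaulted = "M", True  # absent .size defaults to M and is flagged in the handoff

    optional = [n for n in ("data-model.md", "sad.md") if (feature_dir / n).is_file()]
    target = "spec.md (inline)" if size in {"XS", "S"} else "test-plan.md"
    return {
        "size": size,
        "size_defaulted": defaulted,
        "optional_inputs": optional,
        "target": target,
    }
```

A caller would surface `size_defaulted` in the handoff so a defaulted M is never silent.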
Protocol
- Gate. test -f docs/features/<slug>/spec.md → fail = refuse with the pointer above. Then read §5 (acceptance criteria: the rows of the coverage table) and §6 (NFRs, which drive load tests). Read data-model.md / sad.md §6 if present.
- Pick the output target. Per the size matrix: XS/S → write the plan inline in spec.md as a short ## Test plan section (a coverage table is enough, no separate file); M+ → a separate docs/features/<slug>/test-plan.md from the template. Confirm the target with one AskUserQuestion (phrasing per ../_shared/ask-style.md) when .size is absent.
- Map levels, generic only. Name test levels from a fixed vocabulary, never a tool or language: unit (pure logic: a rule, a calculation, a validator, no I/O), integration (the module against a real dependency it owns: DB, cache, queue), e2e (a full flow end to end, one per critical user story), contract (a boundary between two participants: an API shape or an event schema agreed by both sides), load (only when an NFR carries a number: throughput, p95 latency). When sad.md frontmatter target_surfaces declares a UI surface (web-frontend / mobile-app / desktop-app), add the frontend tiers: component (a UI component exercised in isolation), visual-regression (web: the rendered UI diffed against a baseline), e2e-through-UI (the flow driven through the real UI, not just the API). These are the "testing trophy", the dominant frontend testing vocabulary (web.dev / Kent C. Dodds): a vocabulary, not a mandate (→ ../_shared/surfaces.md). Do not write tool names (no specific test runner, broker, visual-regression, or load tool); implement detects what the repo already uses (e.g. Playwright / Storybook / a visual-diff tool).
- Core mapping (the contract of this skill): the user chooses the level per AC. Build the AC→test table: every acceptance criterion in §5 maps to ≥1 test. For each AC, propose a default level from a heuristic (pure logic/rule/validator → unit; behaviour against a real dependency the module owns → integration; a full user-story flow → e2e; a cross-participant API/event shape → contract; and, when a UI surface is declared, a UI piece → component, a user-facing flow → e2e-through-UI), then confirm the level(s) with the user via one AskUserQuestion (multiSelect: an AC may fan out to several levels, e.g. unit for the rule + e2e for the flow), phrased per ../_shared/ask-style.md. The user's choice is authoritative and is recorded in the table's Level column; implement reads it to write the test at the right level (it does not re-decide). A criterion with zero tests is the cardinal anti-pattern. Name each test descriptively from the criterion's intent (e.g. over-quota request is rejected), not from any framework convention.
- Edge cases & error paths. Every error/authorization acceptance criterion gets its own dedicated test row, never folded into the happy path. List the boundary and failure cases the spec implies (missing identifier, malformed input, dependency unavailable → the spec's fallback behaviour) as explicit rows with their expected outcome named in plain words (no status numbers, no error-code strings).
- Integration strategy — real, ephemeral dependency. For integration tests, the default is an ephemeral real dependency, e.g. a throwaway DB container spun up for the suite and torn down after (testcontainers-style). Mocking the datastore is an anti-pattern — a passing mock is not a passing production. State the seed strategy (factories/fixtures for the data shape) and the cleanup boundary (per-test vs per-suite); without cleanup the suite goes flaky and blocks CI.
- NFR → load. For each §6 NFR that carries a number, write one concrete load scenario (target rate, duration, the metric and its threshold) and name the tool generically: the load tool already in your repo, or e.g. k6 or Locust. If no NFR carries a number, mark the load section <!-- N/A: no numeric NFR --> and do not invent a load test.
- CI placement. Note which suites run where: fast suites (unit, contract) on every PR; the heavier ones (e2e, load) on a schedule or pre-release. The split is advice, not a pipeline config: implement and the repo's CI own the actual wiring.
- Socratic walk + write + commit. Walk the coverage table and the strategy choices with the 4-state actions from ../_shared/ask-style.md (Accept / Fix / Save-as-OQ / Drop); on Fix, regenerate that one row (one round, second answer final). Maintain the edits-log per ../_shared/socratic-loop.md. On pass, write the plan to its target (separate file for M+, inline ## Test plan for XS/S) and propose commit test-plan: <slug>. Then emit the stage-handoff block per ../_shared/handoff.md: What I did + Review (test-plan.md, or spec.md ## Test plan for XS/S) + Run next (/clear, then /sdd:implement <slug>, which consumes this map to write the red tests).
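The core-mapping rule in the protocol (every §5 criterion maps to at least one test, at a level from the fixed vocabulary) can be checked mechanically. The sketch below is illustrative only and not part of the skill: `PlanRow`, `uncovered_criteria`, and the `AC-n` identifier scheme are hypothetical names introduced for the example.

```python
from dataclasses import dataclass

# The fixed level vocabulary named in the protocol (UI tiers included).
LEVELS = {
    "unit", "integration", "e2e", "contract", "load",
    "component", "visual-regression", "e2e-through-UI",
}


@dataclass(frozen=True)
class PlanRow:
    criterion_id: str  # hypothetical ID scheme, e.g. "AC-1"
    test_name: str     # descriptive name taken from the criterion's intent
    level: str         # the level the user confirmed


def uncovered_criteria(criterion_ids: list[str], rows: list[PlanRow]) -> list[str]:
    """Return the criteria with zero tests, rejecting any level outside the vocabulary."""
    for row in rows:
        if row.level not in LEVELS:
            raise ValueError(f"unknown level {row.level!r} in row {row.test_name!r}")
    covered = {row.criterion_id for row in rows}
    return [c for c in criterion_ids if c not in covered]


rows = [
    PlanRow("AC-1", "request within quota is accepted", "integration"),
    PlanRow("AC-2", "over-quota request is rejected", "unit"),
    PlanRow("AC-2", "over-quota request is rejected", "e2e"),  # one AC fanning out to two levels
]
print(uncovered_criteria(["AC-1", "AC-2", "AC-3"], rows))  # ['AC-3']
```

An empty result is the pass condition; any returned identifier is a criterion that still needs a row before the plan is written.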
Definition of Done
- The plan exists at its size-correct target: a separate docs/features/<slug>/test-plan.md for M+, or an inline ## Test plan section in spec.md for XS/S.
- Every acceptance criterion in spec.md §5 maps to ≥1 named test: zero uncovered criteria.
- Each error / authorization criterion has its own dedicated test row, not folded into a happy path.
- Test levels are generic (unit / integration / e2e / contract / load; + component / visual-regression / e2e-through-UI when a UI surface is declared in target_surfaces): no test-runner, broker, visual-regression, or load-tool name is hard-coded (the load tool is named only as "the one in your repo, or e.g. k6 / Locust"; UI tools are detected by implement).
- Integration tests use an ephemeral real dependency (throwaway container), with the seed and cleanup boundary stated; no mocked datastore.
- Every numeric §6 NFR has a load scenario (rate + duration + metric + threshold), or the load section is explicitly <!-- N/A -->.
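The last item (a four-part load scenario per numeric NFR, or an explicit N/A marker) can be pictured as a small rendering step. This is a sketch under stated assumptions: `LoadScenario`, `render_load_section`, and the table columns are hypothetical, and the real section layout is whatever templates/test-plan.md defines.

```python
from dataclasses import dataclass

NA_MARKER = "<!-- N/A: no numeric NFR -->"


@dataclass(frozen=True)
class LoadScenario:
    nfr: str          # which §6 NFR the scenario verifies
    target_rate: str  # plain words, no tool-specific syntax
    duration: str
    metric: str
    threshold: str


def render_load_section(scenarios: list[LoadScenario]) -> str:
    """Render one table row per numeric NFR, or the N/A marker when there are none."""
    if not scenarios:
        return NA_MARKER  # never invent a load test
    lines = [
        "| NFR | Target rate | Duration | Metric | Threshold |",
        "| --- | --- | --- | --- | --- |",
    ]
    for s in scenarios:
        lines.append(f"| {s.nfr} | {s.target_rate} | {s.duration} | {s.metric} | {s.threshold} |")
    return "\n".join(lines)
```

Keeping every field a plain string keeps the section stack-agnostic: the numbers come from the spec, and the tool that enforces them is left to implement.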
Anti-patterns
- An acceptance criterion with no test. The whole point of the map is that §5 is verifiable; an uncovered criterion means it isn't.
- Naming a concrete tool or language: a specific runner, broker, or load tool. The legacy plan hard-coded k6; here load is "the tool already in your repo, or e.g. k6 / Locust", and the rest stay generic levels. implement detects the real commands.
- Mocking the datastore. A passing mock is not a passing production; use a throwaway real dependency for integration.
- e2e without a cleanup boundary. Leftover state makes the suite flaky and every flaky run blocks CI.
- "100 % coverage" as the goal. The target is critical paths + happy + error paths mapped to acceptance criteria, not a line-count number.
- A wishlist plan — "would be nice to add". A test plan is a commitment the next stage executes, not a backlog.
- Inventing a load test with no numeric NFR. No number → <!-- N/A -->, not a fabricated throughput target.
References & template
- ../_shared/ask-style.md: canonical question/option phrasing for steps 2, 4, and 9.
- ../_shared/size-matrix.md: inline-in-spec (XS/S) vs separate file (M+) depth.
- ../_shared/surfaces.md: a declared UI surface adds the component / visual-regression / e2e-through-UI tiers (testing-trophy vocabulary); read from sad.md target_surfaces.
- ./templates/test-plan.md: output scaffold with the AC→test mapping table, generic test levels, ephemeral-dependency integration strategy, and stack-agnostic load section. Its <!-- … --> comments are the per-section contract.