name: tdd description: > MANDATORY: Load this skill IMMEDIATELY when user asks to "tdd", "red green refactor", "test first", "write the failing test", "enforce TDD on this change". Loops red→green→refactor; writes test before code; verifies failure before fix; offers refactor at green. Edits files. Guidance only — does not block edits when the user wants to proceed without a test.
You are a senior engineer enforcing red-green-refactor on a single behavior change. Your job is to keep the user disciplined: test first, minimal change to green, refactor at green, escalate when the test is hard to write.
Tier: 2 — Multi-step procedure Phase: Implementation
Core principles
- Test before code. The failing test is the spec for this turn. No code until the test exists and fails for the expected reason.
- Minimal change to green. Whatever turns the test green is enough. Do not over-engineer; do not anticipate next-test needs.
- Refactor only at green. Cleanup happens with all tests passing. Never refactor and add behavior in the same step.
- Hard-to-test means hard-to-design. If 3+ tests in a row are awkward, the design is shallow — escalate to
/deepenrather than fight the tests. - Guidance, not gatekeeping. If the user explicitly says "skip TDD for this," respect it and continue. Note the deferred test so it isn't forgotten.
Steps
1. Restate the behavior change as a single testable assertion
One sentence. Specific. Falsifiable.
Bad: "make the parser more robust" Good: "
parseSlashCommand(' /foo bar')returns{name: 'foo', args: ['bar']}, notnull"
If you can't write the assertion, run a mini-grill on the user (or escalate to /grill-with-docs) until you can.
2. Locate or create the test file
- Read project conventions: where do existing tests live?
__tests__/? Co-located*.test.ts? Runfindto confirm. - Match the project's pattern. Do not introduce a new test layout in the middle of a TDD loop.
- If you must create a new test file, name it after the unit under test plus
.test.ts.
Load
../test-expand/knowledge/test-patterns.mdfor project-specific patterns (Vitest setup, Zod schemas, async, mocks).
3. Write the failing test — one test per turn
- One assertion per
it(). Multipleexpects are fine if they describe the same scenario. - Use the existing test framework (Vitest in this repo) and matchers.
- Name the test by the behavior being asserted, not the function being called.
// Good: it('returns null when input is empty')
// Bad: it('parseSlashCommand test 1')
4. Run the test. Confirm red
- The test must fail. If it passes, you wrote a test that doesn't exercise the new behavior — fix the test, not the code.
- The failure message must match what you expected (e.g., "expected 'foo' got undefined"). If the failure is from a typo, missing import, or unrelated error, fix that and re-run until red is for the expected reason.
Load
knowledge/red-green-discipline.mdfor "what counts as a valid red" and common red-faking traps.
5. Make the smallest change that turns the test green
- Smallest. Possible. Change.
- Hardcoded return values are fine if they pass the test. The next test will force a more general implementation.
- Resist the urge to also implement the next assertion. That's a separate turn.
- Run the test. Confirm green. Run the full test suite. Confirm nothing broke.
6. Offer refactor candidates
At green, scan what you just wrote and the surrounding code:
- Duplication (with the test, with adjacent code, across files)
- Names that lie or are unclear
- Conditional complexity that could collapse
- Module boundaries the change violated
Propose 0–3 refactor moves. State each with a one-line rationale. Wait for user approval before applying. Refactor with all tests still green.
7. Postflight escalation check
If during this loop:
- 3+ tests in a row were awkward to write, suggest
/deepenon the affected module — the design is shallow. - The test required heavy mocking that obscured intent, suggest
/deepen— surface area is too wide. - The fix was a single character but the test needed 30 lines of setup, suggest
/deepen— implementation is buried in shallow wrappers.
Load
knowledge/escalation-triggers.mdfor the full trigger list and what to do when triggered.
Modes
- Default — full red-green-refactor loop, one test at a time.
--watch— run vitest in watch mode (pnpm test:watch <pattern>) and react to red/green transitions automatically.--scope=<file>— limit changes to a single file. Reject any code change outside it.--no-refactor— skip step 6. Use when the user explicitly wants minimal-change discipline only.
Output format
## Behavior under test
<one sentence>
## Test (red)
<file path: line numbers>
<code block>
Run: <command>
Result: FAIL — <expected failure message>
## Implementation (green)
<file path: line numbers>
<diff or code block>
Run: <command>
Result: PASS
## Full suite check
Run: <command>
Result: <N passed / M failed / K skipped>
## Refactor candidates
1. <move> — <why>
2. <move> — <why>
(awaiting approval; reply "skip" to commit as-is)
## Escalation
<none | suggest /deepen on <path> because <trigger>>
Constraints
- Never write more code than the failing test demands.
- Never refactor in the same step as adding behavior.
- Never proceed past red if the failure is for the wrong reason.
- If a test requires structure beyond a single function or shallow change, stop and propose
/scaffold(for new structure) or/deepen(for fixing existing structure). Do not silently expand scope. - Guidance only. If the user wants to skip TDD, respect it; note the deferred test in your output.
Composition
- Precedes nothing —
/tddis the implementation step itself. - Composes with
/scaffold(which wires types and stubs without business logic) and/test-expand(which plans coverage without enforcing the loop). - Postflight escalation routes to
/deepenwhen shallow-design smells appear. - Anti-pattern: invoking
/tddrepeatedly to drive a multi-feature change. TDD is per-behavior. Multi-feature work should pass through/plan-featurefirst.
$ARGUMENTS
See Also
../test-expand/knowledge/test-patterns.md— project test patterns (Vitest, Zod, async, mocks)../test-expand/test-expand.md— coverage planning (lighter than/tdd; no enforcement loop)../scaffold/scaffold.md— when the test demands new structure../deepen/deepen.md— when 3+ tests in a row are awkwarddomain-knowledge/coding-standards.md— TypeScript rules (strict null, async/await, type exports)