allowed-tools:
- Task
- TaskOutput
- TaskCreate
- TaskUpdate
- TaskList
- Read
- Grep
- Glob
- LS
- Edit
- Write
- AskUserQuestion
- Bash(go test *)
- Bash(go tool *)
- Bash(golangci-lint *)
- Bash(pytest *)
- Bash(uv run pytest *)
- Bash(bun test *)
- Bash(bun run *)
- Bash(npm test *)
- Bash(npx playwright *)
- Bash(bunx playwright *)
argument-hint: '[review|refactor|coverage|tdd|full]'
context: fork
description: Improve test design and coverage with behavior-focused tests, useful seams, characterization tests, TDD, and test refactoring. Use when improving tests, adding coverage, refactoring brittle tests, removing test waste, or working test-first. NOT for fixing production bugs (use fixing-code), production-code refactors (use refactoring-code), or reviewing non-test code quality (use reviewing-code).
name: improving-tests
user-invocable: true
Test Improvement
Follow the base skill. This Claude overlay only defines tool use and execution details.
Improve tests through public behavior seams. Do not inflate coverage with low-value assertions. Do not change production behavior unless the selected TDD slice requires it.
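A minimal sketch of the difference between a coverage-inflating assertion and a behavior-focused one. The apply_discount function and the test names are hypothetical, invented only for illustration:

```python
# Hypothetical production function, invented for this illustration only.
def apply_discount(total_cents: int, percent: int) -> int:
    """Return the total after subtracting a whole-percent discount."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return total_cents - (total_cents * percent) // 100


def test_low_value_executes_code_but_pins_nothing():
    # Raises coverage, yet would still pass for almost any wrong result.
    assert apply_discount(1000, 10) is not None


def test_discount_reduces_total_by_percentage():
    # Pins the observable behavior a caller relies on.
    assert apply_discount(1000, 10) == 900


def test_rejects_percent_above_100():
    # Error path, checked through the same public function.
    try:
        apply_discount(1000, 150)
    except ValueError:
        return
    raise AssertionError("expected ValueError for percent > 100")
```

The first test is the kind to remove or replace. The other two would fail if the behavior regressed.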
Arguments
- review: find weak, duplicate, brittle, missing, slow, or flaky tests.
- refactor: simplify tests without changing covered behavior.
- coverage: add useful tests for uncovered business behavior or error paths.
- tdd: one red-green-refactor slice at a time.
- full: review, refactor, and add coverage.
If mode is missing, use AskUserQuestion with those options. Ask before adding a
new test framework.
Use TaskCreate and TaskUpdate when the session has more than two steps:
- Choose mode and scope.
- Inspect test structure and project conventions.
- Select behavior seam.
- Apply one cluster or one TDD slice.
- Verify and report.
Tool order
- Use Read, Grep, Glob, and LS to find tests, fixtures, helpers, and nearby patterns.
- Load only matching language references.
- Run the narrow test or coverage command only when it helps the selected mode.
- Use Edit for existing tests and Write only for new files.
- Run the relevant verification before final output.
Use direct reads/search for small scopes. Spawn read-only agents only for broad or mixed-language audits.
Command discipline
Use only commands supported by the repo and available tools. Examples:
go test ./...
go tool cover -func=/tmp/coverage.out
golangci-lint run ./...
pytest -v
uv run pytest -v
bun test
bun run tsc --noEmit
npm test
npx playwright test --list
bunx playwright test --list
If a referenced command is unavailable, report it as skipped with the exact reason. Do not install a test framework or tool without user approval.
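One possible way to picture the skip-and-report rule, assuming "unavailable" means the executable is not on PATH. CheckResult and run_check are hypothetical names, not part of this skill or any tool it uses:

```python
import shutil
import subprocess
from dataclasses import dataclass


@dataclass
class CheckResult:
    """Hypothetical record for one line of the Verification section."""
    command: str
    status: str  # "pass", "fail", or "skipped"
    reason: str = ""


def run_check(command: list[str]) -> CheckResult:
    text = " ".join(command)
    # Report a missing executable as skipped instead of installing it.
    if shutil.which(command[0]) is None:
        return CheckResult(text, "skipped", f"{command[0]} not found on PATH")
    completed = subprocess.run(command, capture_output=True, text=True)
    status = "pass" if completed.returncode == 0 else "fail"
    return CheckResult(text, status)
```

The reason string carries the exact cause, so the final report can quote it without guessing.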
TDD mode
For tdd, write one failing test for one behavior, confirm it fails for the expected
reason, implement the smallest passing code, then refactor only while green. Do not
write a bulk suite for imagined future behavior.
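A single slice might look like the sketch below. The slugify function and its test are hypothetical, and the three steps are shown in one file only to keep the example self-contained:

```python
# Step 1 (red): one test for one behavior. While slugify does not exist yet,
# calling this test raises NameError, so it fails because the behavior is
# missing and not because of a mistake in the test itself.
def test_slugify_replaces_spaces_with_hyphens():
    assert slugify("Hello World") == "hello-world"


# Step 2 (green): the smallest implementation that makes the test pass.
def slugify(title: str) -> str:
    return title.lower().replace(" ", "-")


# Step 3 (refactor): tidy only while the test stays green. Nothing to tidy
# here, and punctuation or unicode handling waits for a later slice.
```

Each further behavior, such as stripping punctuation, would get its own failing test first.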
Scope control
- Test through public module, package, API, CLI, component, or service boundaries.
- Mock only system boundaries.
- Delete shallow duplicates only after stronger public-boundary tests cover them.
- Do not force table-driven, parametrized, or it.each consolidation when separate tests make distinct behavior clearer.
- If no safe behavior seam exists, use BLOCKED or Proposed Changes.
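The first two rules above can be sketched as follows, assuming the wall clock is the only system boundary involved. SessionStore is a hypothetical class written for this example:

```python
import time
from typing import Callable


class SessionStore:
    """Hypothetical module under test; only public methods are exercised."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.time):
        self._ttl = ttl_seconds
        self._clock = clock  # system boundary: the wall clock
        self._started = {}

    def start(self, user: str) -> None:
        self._started[user] = self._clock()

    def is_active(self, user: str) -> bool:
        started = self._started.get(user)
        return started is not None and self._clock() - started < self._ttl


def test_session_expires_after_ttl():
    now = [1000.0]
    # Fake only the clock; everything else is the real implementation.
    store = SessionStore(ttl_seconds=60, clock=lambda: now[0])

    store.start("ada")
    assert store.is_active("ada")

    now[0] += 61  # advance time instead of sleeping or reading _started
    assert not store.is_active("ada")
```

The test never touches the private dictionary, so the storage can change shape without breaking it.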
Output
Use TEST IMPROVEMENT COMPLETE for applied changes:
TEST IMPROVEMENT COMPLETE
=========================
Mode: review | refactor | coverage | tdd | full
Tests changed: N
Waste removed: N
Coverage: before → after | not measured
Status: CLEAN | NEEDS ATTENTION
Key improvements:
- file:line — change
Verification:
- <command> — pass/fail/skipped with reason
Use BLOCKED or Proposed Changes when tools, framework, scope, permission, or a
safe seam is missing. Include the exact missing input and the command the applier
should run.
Do not claim clean without a passing check or explicit skipped-check reason.
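One reading of this rule, as a rough sketch: every check must either pass or be skipped with a stated reason, and a report with no checks at all is not clean. The report_status helper and the tuple layout are assumptions made for illustration:

```python
def report_status(checks: list[tuple[str, str, str]]) -> str:
    """Each check is (command, outcome, reason); outcome is pass, fail, or skipped."""
    if not checks:
        return "NEEDS ATTENTION"  # nothing was verified
    for _command, outcome, reason in checks:
        if outcome == "pass":
            continue
        if outcome == "skipped" and reason.strip():
            continue  # an explicit reason makes the skip acceptable
        return "NEEDS ATTENTION"
    return "CLEAN"
```

A stricter variant could also require at least one passing check before reporting CLEAN.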