improving-tests - SKILL.md Agent Skill

allowed-tools:

Task
TaskOutput
TaskCreate
TaskUpdate
TaskList
Read
Grep
Glob
LS
Edit
Write
AskUserQuestion
Bash(go test *)
Bash(go tool *)
Bash(golangci-lint *)
Bash(pytest *)
Bash(uv run pytest *)
Bash(bun test *)
Bash(bun run *)
Bash(npm test *)
Bash(npx playwright *)
Bash(bunx playwright *)

argument-hint: '[review|refactor|coverage|tdd|full]'
context: fork
description: Improve test design and coverage with behavior-focused tests, useful seams, characterization tests, TDD, and test refactoring. Use when improving tests, adding coverage, refactoring brittle tests, removing test waste, or working test-first. NOT for fixing production bugs (use fixing-code), production-code refactors (use refactoring-code), or reviewing non-test code quality (use reviewing-code).
name: improving-tests
user-invocable: true

Test Improvement

Follow the base skill. This Claude overlay only defines tool use and execution details.

Improve tests through public behavior seams. Do not inflate coverage with low-value assertions. Do not change production behavior unless the selected TDD slice requires it.

Arguments

review: find weak, duplicate, brittle, missing, slow, or flaky tests.
refactor: simplify tests without changing covered behavior.
coverage: add useful tests for uncovered business behavior or error paths.
tdd: one red-green-refactor slice at a time.
full: review, refactor, and add coverage.

If mode is missing, use AskUserQuestion with those options. Ask before adding a new test framework.

Use TaskCreate and TaskUpdate when the session has more than two steps:

Choose mode and scope.
Inspect test structure and project conventions.
Select behavior seam.
Apply one cluster or one TDD slice.
Verify and report.

Tool order

Use Read, Grep, Glob, and LS to find tests, fixtures, helpers, and nearby patterns.
Load only matching language references.
Run the narrowest test or coverage command that serves the selected mode; skip it when it adds nothing.
Use Edit for existing tests and Write only for new files.
Run the relevant verification before final output.

Use direct reads and searches for small scopes. Spawn read-only agents only for broad or mixed-language audits.

Command discipline

Use only commands supported by the repo and available tools. Examples:

go test ./...
go tool cover -func=/tmp/coverage.out
golangci-lint run ./...
pytest -v
uv run pytest -v
bun test
bun run tsc --noEmit
npm test
npx playwright test --list
bunx playwright test --list

If a referenced command is unavailable, report it as skipped with the exact reason. Do not install a test framework or tool without user approval.
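A minimal sketch of the skip rule above, assuming availability means "the executable is on PATH". The function name `check_command` and the wording of the reason string are hypothetical, not part of this skill.

```python
import shutil


def check_command(argv):
    """Return (runnable, reason) for a command given as an argv list.

    Assumption for this sketch: a command counts as available when its
    executable resolves on PATH. Nothing is installed or executed here.
    """
    executable = argv[0]
    if shutil.which(executable) is None:
        # Report the exact reason instead of silently dropping the check.
        return False, f"skipped: '{executable}' not found on PATH"
    return True, "available"
```

A caller would record the returned reason under Verification rather than installing the missing tool.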

TDD mode

For tdd, write one failing test for one behavior, confirm it fails for the expected reason, implement the smallest passing code, then refactor only while green. Do not write a bulk suite for imagined future behavior.
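One slice might look like the sketch below. `slugify` is a hypothetical function used only for illustration. Run the test before step 2 exists and it should fail with a NameError, which is the expected reason for this slice.

```python
# Step 1 (red): one test for one behavior, written before the code exists.
def test_slugify_replaces_spaces_with_hyphens():
    assert slugify("hello world") == "hello-world"


# Step 2 (green): the smallest implementation that makes that test pass.
# No lowercasing, trimming, or punctuation handling yet: each of those
# would be its own later slice with its own failing test.
def slugify(text):
    return text.replace(" ", "-")
```

Refactor only after this test is green, and add the next behavior as a separate slice.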

Scope control

Test through public module, package, API, CLI, component, or service boundaries.
Mock only system boundaries, such as network calls, the filesystem, the clock, or external services.
Delete shallow duplicates only after stronger public-boundary tests cover them.
Do not force table-driven, parametrized, or it.each consolidation when separate tests make distinct behavior clearer.
If no safe behavior seam exists, use BLOCKED or Proposed Changes.
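The first two rules above can be sketched as follows. `WelcomeService`, `greeting_for`, and the injected `http_get` callable are hypothetical names; the only assumption is that the HTTP call is the system boundary and is passed in.

```python
from unittest.mock import Mock


class WelcomeService:
    """Hypothetical service; `http_get` is the injected system boundary."""

    def __init__(self, http_get):
        self._http_get = http_get

    def greeting_for(self, user_id):
        profile = self._http_get(f"/users/{user_id}")
        return self._format(profile["name"])

    def _format(self, name):
        # Private helper: exercised through greeting_for, never tested directly.
        return f"Welcome, {name.strip().title()}!"


def test_greeting_uses_profile_name():
    # Only the boundary is mocked; the formatting logic runs for real.
    http_get = Mock(return_value={"name": "  ada lovelace "})
    service = WelcomeService(http_get)
    assert service.greeting_for(42) == "Welcome, Ada Lovelace!"
```

Because the test asserts on the public result, `_format` can be renamed or inlined without touching the test.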

Output

Use TEST IMPROVEMENT COMPLETE for applied changes:

TEST IMPROVEMENT COMPLETE
=========================
Mode: review | refactor | coverage | tdd | full
Tests changed: N
Waste removed: N
Coverage: before → after | not measured
Status: CLEAN | NEEDS ATTENTION

Key improvements:
- file:line — change

Verification:
- <command> — pass/fail/skipped with reason

Use BLOCKED or Proposed Changes when tools, framework, scope, permission, or a safe seam is missing. Include the exact missing input and the command the applier should run.

Do not report Status: CLEAN without a passing check or an explicit skipped-check reason.
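A minimal sketch of that status rule. The function name `report_status` and the tuple shape are hypothetical. Treating any failed check as NEEDS ATTENTION is an assumption of this sketch, not something the skill states.

```python
def report_status(checks):
    """Decide the Status line from verification results.

    `checks` is a list of (command, outcome, reason) tuples, where outcome
    is "pass", "fail", or "skipped". Hypothetical shape for illustration.
    """
    if not checks:
        # Nothing was verified, so CLEAN cannot be claimed.
        return "NEEDS ATTENTION"
    for _command, outcome, reason in checks:
        if outcome == "fail":
            # Assumption: a failing check always blocks CLEAN.
            return "NEEDS ATTENTION"
        if outcome == "skipped" and not reason:
            # A skipped check needs an explicit reason.
            return "NEEDS ATTENTION"
    return "CLEAN"
```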