testing - SKILL.md Agent Skill

name: testing description: Test Protocol

Tests are the immune system — they reject bugs, not document them.

Test Truth Table

Code	Test	Interpretation
Working	Green	Good
Buggy	Red	Good (bug exposed)
Working	Red	Wrong expectations
Buggy	Green	DANGEROUS
Unknown	Red	STOP — use Analysis Framework below

ANTI-PATTERN: The most dangerous instinct is to "fix" failing tests by accepting whatever source currently does.

When Tests Fail

Ask for context:

Is the code in production? Not a guarantee of correctness, yet to consider
Is the test much more recent than the source code? Could explain wrong expectations

Default heuristic: Tests encode intent; assume the test is correct until evidence suggests otherwise. Source drifts; tests usually don't drift without reason.

Analysis Framework:

Examine both logics independently
Consider system design: method contract, error patterns
Identify sounder logic given context
When uncertain, ask user
Type hints may be wrong — verify actual runtime behavior before trusting signatures

Test Modification

FORBIDDEN without approval:

Changing assertions to match buggy behavior
Weakening expectations
Adding exception handlers to mask failures

Allowed: Fresh repro tests capturing current failure (existing tests untouched).

Writing Tests

Quality Standards:

Deterministic, behavioral testing over type checking
Exception testing with message validation: pytest.raises(ValueError, match="...")
Systematic edge cases: None, boundaries, dates
Branch coverage: success and failure paths

Assertion Strength: Would this assertion pass for a broken implementation? If yes (tautology, type-only, existence-only, shape-only), strengthen it.

Mocking Discipline: Test the code, not the scaffolding. Heavy mock setup suggests you're testing assumptions about dependencies, not behavior. Mock external boundaries (APIs, DBs, filesystems), not internal logic.

Paired Coverage: For each positive test, consider a negative counterpart. Happy-path-only is a red flag for complex source.

Coverage Relevance: "Tests pass" ≠ "tests exercise changed code". For non-trivial changes, verify tests cover modified paths.

Signal Hierarchy: Integration failures > Unit failures for architectural changes. Green units with red integration = units work but don't compose.

Structure: Prefer GIVEN / WHEN / THEN blocks for readability. Omit a section when empty; the structure clarifies intent, not ceremony.

Template Principle: Identify one high-quality test file as reference. Match its patterns.

Flaky Tests: Flakiness is a bug, not noise. When encountered: isolate, report, do not retry-until-green. Flakiness masks real failures.