name: test-first-evidence
description:
Govern an implementation change with failing-test evidence — classify the change, capture a failing test (or an explicit waiver) before editing production code, keep the change scoped, re-run final validation, and record it all through the nils-cli test-first-evidence command. Use when work should be governed by failing-test evidence before production changes, or when a repo / user-global [test_first].require gate needs a verified record.
Test First Evidence
Contract
Prereqs:
- test-first-evidence is installed from the released nils-cli package and available on PATH.
- The implementation change is classified before production behavior is edited.
- The output directory is explicit.
Inputs:
- Implementation task, target behavior or bug, done criteria, relevant files, known test command, and constraints when available.
- Classification and production path.
- Failing command and exit code, or an explicit waiver reason.
- Final validation command and pass/fail status.
Outputs:
- Change classification.
- A deterministic test-first evidence record: failing-test evidence before production edits, or an explicit waiver with substitute validation, plus a passing final validation.
- A verification result usable as delivery evidence for the forge-cli test-first gate.
Failure modes:
- Production behavior changed without failing evidence or waiver.
- Final validation is missing.
- The evidence record is incomplete or malformed.
- No usable test harness exists and a waiver is not acceptable for the change.
Discipline
The engineering judgment behind the record — follow it whether or not the
forge-cli gate is enabled:
- Classify before editing production code. Decide whether the request changes testable production behavior. Treat bug fixes, parser logic, state machines, API contracts, workflow logic, user-visible behavior, and new features as testable by default. Docs-only, generated-only, formatting-only, visual-only, exploratory spikes, emergency hotfixes, or repos with no usable test harness may use a waiver.
- Failing test first. For a testable behavior change, add or identify a focused regression / unit / integration / acceptance test and capture failing evidence (command, exit code, failing test name, concise failure summary) before editing production code. Do not weaken, skip, or overfit the test to the planned implementation.
- Waiver when test-first does not apply. State the waiver before editing: the reason, why a failing test is not practical now, and the substitute validation you will run.
- Implement after evidence. Only edit production code after recording failing evidence or a waiver. Keep the change scoped to making the failing test pass; add broader tests only when blast radius or a shared contract justifies it.
- Final validation. Re-run the failing test and the smallest meaningful related validation; record command, result, and any skipped checks.
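The "failing test first" rule can be sketched as a small guard that refuses to treat a passing test as evidence. This is a minimal illustration, not part of the released CLI: `capture_failing` is a hypothetical helper name, and it captures only the exit code, so the failing test name and summary still have to be noted by hand.

```shell
# Hypothetical helper (not part of nils-cli): run a test command and print
# its exit code only when the command actually fails.
capture_failing() {
  local cmd="$1"
  local code=0
  # Run the command in a child shell and discard its output; keep the status.
  sh -c "$cmd" >/dev/null 2>&1 || code=$?
  if [ "$code" -eq 0 ]; then
    echo "test passed; there is no failing evidence to record" >&2
    return 1
  fi
  printf '%s\n' "$code"
}
```

A caller could pass the printed value to the `--exit-code` flag of `record-failing`, and stop before touching production code if the helper returns non-zero.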
Entrypoint
Use the released CLI directly (point --out at an explicit evidence directory
resolved through agent-out, not a hand-written /tmp path):
test-first-evidence init --out "$evidence_dir" --classification behavior-change --production-path src/lib.rs
test-first-evidence record-failing --out "$evidence_dir" --command "cargo test bug_repro" --exit-code 101 --summary "bug reproduced"
test-first-evidence record-waiver --out "$evidence_dir" --reason "docs-only change"
test-first-evidence record-final --out "$evidence_dir" --command "cargo test bug_repro" --status pass
test-first-evidence verify --out "$evidence_dir" --format json
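In the command list above, `record-failing` and `record-waiver` are alternatives: a record carries failing-test evidence or a waiver. A minimal sketch of a wrapper that makes the choice explicit follows. `record_pre_edit` and its `failing` / `waiver` mode names are hypothetical; only the subcommands and flags already shown above are used.

```shell
# Hypothetical wrapper: record exactly one kind of pre-edit evidence.
# Usage: record_pre_edit <evidence_dir> failing <command> <exit_code> <summary>
#        record_pre_edit <evidence_dir> waiver  <reason>
record_pre_edit() {
  local evidence_dir="$1"
  local mode="$2"
  case "$mode" in
    failing)
      test-first-evidence record-failing --out "$evidence_dir" \
        --command "$3" --exit-code "$4" --summary "$5"
      ;;
    waiver)
      test-first-evidence record-waiver --out "$evidence_dir" --reason "$3"
      ;;
    *)
      # Reject anything else before the CLI is invoked.
      echo "mode must be 'failing' or 'waiver'" >&2
      return 2
      ;;
  esac
}
```

For example, `record_pre_edit "$evidence_dir" waiver "docs-only change"` would issue the same `record-waiver` call as the list above.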
Workflow
- Classify the change, then initialize evidence before editing production behavior.
- Record a failing test when practical (see "Failing test first" under Discipline).
- Record a waiver when the change is docs-only, config-only, or otherwise not amenable to failing-test evidence (see "Waiver when test-first does not apply" under Discipline).
- Implement scoped to the failing test, then record final validation.
- Verify the record before using it as delivery evidence — a record is complete only with a failing test or waiver and a passing final validation.
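The completeness rule in the last step can be written out as a predicate. This sketch only restates the rule for illustration; it is not how `verify` is implemented, and the function name and its `yes` / `pass` argument values are assumptions.

```shell
# Illustrative restatement of the completeness rule (not the CLI's logic).
# Arguments: has_failing (yes/no), has_waiver (yes/no), final_status (pass/fail)
record_is_complete() {
  local has_failing="$1"
  local has_waiver="$2"
  local final_status="$3"
  # Pre-edit evidence: a failing test or a waiver must be present.
  if [ "$has_failing" != "yes" ] && [ "$has_waiver" != "yes" ]; then
    echo "incomplete: no failing evidence and no waiver" >&2
    return 1
  fi
  # Post-edit evidence: final validation must have passed.
  if [ "$final_status" != "pass" ]; then
    echo "incomplete: final validation did not pass" >&2
    return 1
  fi
  return 0
}
```

In practice `test-first-evidence verify` remains the source of truth; the predicate is just a compact way to read the rule.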
Delivery gate
When [test_first].require resolves true — from a repo .forge-cli.toml or
the user-global ${XDG_CONFIG_HOME:-~/.config}/forge-cli/config.toml layer —
forge-cli pr create / pr deliver require --test-first-evidence <dir> for
--kind feature / bug PRs, pointing at a directory whose record this skill
produced and verify accepts. docs / chore / ci / refactor kinds are
exempt. A missing, incomplete, or unreadable record fails the PR with
test_first_evidence_required / _incomplete / _unreadable. The gate is the
release-surface enforcement point; this skill is how you satisfy it. See
core/policies/git-delivery.md for the delivery-side contract.
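The kind-based exemption described above can be sketched as a lookup. This is an illustration of the stated rule, assuming `[test_first].require` has already resolved true; `kind_requires_evidence` is a hypothetical name, not a forge-cli command.

```shell
# Illustrative only: which PR kinds need --test-first-evidence when the
# [test_first].require gate is enabled, per the rule stated above.
kind_requires_evidence() {
  case "$1" in
    feature|bug) return 0 ;;              # evidence directory required
    docs|chore|ci|refactor) return 1 ;;   # exempt kinds
    *)
      echo "unknown PR kind: $1" >&2
      return 2
      ;;
  esac
}
```

A delivery script might call `kind_requires_evidence "$kind"` to decide whether to run `test-first-evidence verify` before invoking `forge-cli pr create`.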
Boundary
test-first-evidence owns the evidence record mechanics and the engineering
judgment about classification, when a failing test is practical, and whether a
waiver is acceptable. The release-surface enforcement (the config-gated
requirement on forge-cli pr create / pr deliver) lives in nils-cli, not in
skill prose; this skill produces the record that gate verifies.