name: test-provenance-guard
description: >
Detects tests that pass by construction — tests that define a private
copy of the function under test instead of importing the production
module — and self-heals by extracting the inline logic to an exported
function, updating production callers, and rewriting the test to
import the export. Two checks: (1) static — the test file must import
the SUT and must not shadow its exported names; (2) mutation —
blanking the production function body re-runs the test and expects
failure. Runs autonomously inside autonomous-workflow Phase 4 and as
a slash command for human-driven PR review. Use when adding new tests
for existing or refactored code, when CI is green but you are unsure
whether the tests actually exercise production, or when reviewing a
PR for tests-by-construction. Triggers on "test provenance", "tests
by construction", "verify tests cover real code", "tests duplicate
logic", "mutation sanity check", "are these tests fake",
"/test-provenance-guard".
disable-model-invocation: false
argument-hint: '[--mode static|mutate|both] [--fix|--report-only] [
Test Provenance Guard
Catches the failure mode where a new test file re-implements the function it claims to test, then asserts against its own copy. The tests pass. The CI is green. The tests provide zero regression protection.
This skill detects that pattern and, when invoked autonomously, fixes it without user intervention.
This
SKILL.mdis a thin index. Detailed procedures live inrules/*.mdand load on demand. The case-study post-mortem and failure-mode taxonomy live inreferences/*.md.
Contents
- Inputs
- Workflow
- Quick Decision Flow
- Required Reading by Phase
- Output Contract
- Core Principles
- Anti-patterns
- Definition of Done
- Related Skills
Inputs
Parse $ARGUMENTS:
| Argument | Default | Meaning |
|---|---|---|
--diff |
off | Discover changed test files via git diff --name-only <base>...HEAD. |
--base <ref> |
main |
Base ref for --diff. |
<positional paths> |
— | Explicit test files to check (overrides --diff). |
--mode static|mutate|both |
both |
Which checks to run. |
--fix |
on when called from autonomous-workflow; off for slash invocation | Apply self-heal: extract logic, rewrite tests, re-verify. Gated by confidence(code) ≥ 90 % unless --no-confidence-gate is passed. |
--report-only |
off | Force off --fix even when called from autonomous-workflow. |
--no-confidence-gate |
off | Manual override — skip the pre-heal confidence gate. Reserved for human-driven slash invocations; never set inside the autonomous loop. |
The skill defaults to --fix in the autonomous loop, but autofix is gated by Skill("confidence", "code") ≥ 90 % before any file is mutated (see Self-Heal Step 3). Below the threshold, the heal is skipped, the finding is emitted as heal-skipped-low-confidence, and the autonomous-workflow's stuck-loop protocol takes over.
Manual slash invocation defaults to report-only so a human can review the proposed extraction first; pass --fix explicitly to apply. The confidence gate is recommended for manual invocations too, but a human can pass --no-confidence-gate to override.
If no test files resolve from either --diff or positional args, exit cleanly with no test files in scope, skipping.
Workflow
| Phase | Name | Rule file | Gate |
|---|---|---|---|
| 1 | Static check | rules/static-check.md |
Every test file imports the SUT; no shadowed exports |
| 2 | Mutation check | rules/mutation-check.md |
Sabotaged production code makes the test FAIL |
| 3 | Self-heal (if --fix) |
rules/self-heal.md |
confidence(code) ≥ 90 % on the proposed extraction; extracted helper exists, test imports it, mutation check now passes |
Phase 1 is mandatory.
Phase 2 runs only when Phase 1 passes (a static-shadow finding already explains the failure — no need to mutate).
Phase 3 runs only on detected findings and only when --fix is on and the pre-heal confidence gate clears (≥ 90 %).
Quick Decision Flow
for each test file in scope:
finding = static_check(file) # rules/static-check.md
if finding.kind == "shadowed-export":
if --fix: gated_self_heal(finding) # rules/self-heal.md
else: record(finding)
continue
if finding.kind == "no-sut-import":
record(finding) # cannot mutate without an SUT
continue
if mode in {"mutate", "both"}:
finding = mutation_check(file) # rules/mutation-check.md
if finding.kind == "test-survives-sabotage":
if --fix: gated_self_heal(finding)
else: record(finding)
report(findings)
exit code = 0 if all healed or no findings, else 1
def gated_self_heal(finding):
proposal = plan_extraction(finding) # Step 1+2 of self-heal
if not --no-confidence-gate:
score = Skill("confidence", "code", proposal)
if score < 90:
record(finding, status="heal-skipped-low-confidence", score=score)
return
self_heal(finding, proposal) # Steps 4-6 of self-heal
Required Reading by Phase
Load on demand — do not preload.
| Phase | Files |
|---|---|
| 1 | rules/static-check.md |
| 2 | rules/mutation-check.md |
| 3 | rules/self-heal.md |
| — | references/failure-modes.md — taxonomy |
| — | references/pr-12340-postmortem.md — origin case |
Output Contract
After every run, emit a structured report. The autonomous-workflow Phase 4 logger consumes the JSON line; humans read the Markdown summary.
TEST PROVENANCE REPORT
======================
Files checked: <N>
Findings:
- <test-file>:<line> — <kind> — <one-line evidence>
Action: <healed | reported | skipped | skipped-low-confidence>
Confidence (if gated): <N>%
Self-heal summary (if --fix):
- Pre-heal confidence: <N>% (gate ≥ 90 %)
- Extracted <symbol> from <prod-file>:<line> → <new-export-file>:<line>
- Rewrote <test-file> to import <symbol>
- Mutation check after heal: PASS
Exit: <0 | 1>
When called from autonomous-workflow Phase 4, also append one line to .agent/{branch}/plan.md's Progress Log:
- [<ISO-8601>] Phase 4: test-provenance-guard — <N> file(s) checked, <M> finding(s), <K> healed (confidence ≥ 90 %), <L> skipped-low-confidence
Core Principles
- A test that does not import the production module is not a test of that module. The static check is the cheapest, strongest signal.
- A test that survives sabotage of its claimed target is by-construction. Mutation is the second-line evidence — slow but conclusive.
- Self-heal is a refactor, not a rewrite. Extract the inline logic verbatim; do not redesign the function while moving it.
- Never silently change test assertions. Self-heal moves logic and rewrites imports; assertions are preserved unchanged.
- Report, then act.
Even when
--fixis on, the report is emitted before the heal so the audit trail survives. - Skip cleanly when out of scope. No test files? Exit 0. Test files in unsupported languages? Skip them with a one-line note and continue.
Anti-patterns
- Treating a passing test as proof the test is correct. A test can pass for the wrong reason — that is the entire raison d'être of this skill.
- Mutating without restoring.
Every sabotage step is paired with a restore —
git restore <sut-file>when the file was clean, or moving the recorded backup over it when it was dirty — seerules/mutation-check.md. - Self-healing with a redesign. Moving logic and improving it in the same step is two refactors masquerading as one — split them.
- Running mutation when static already failed. Wasted tokens. Skip Phase 2 when Phase 1 already produced a finding.
- Asking the user during autonomous runs.
This skill self-heals when
--fixis on; if it cannot, it reports the finding and exits — the autonomous-workflow stuck-loop protocol takes over.
Definition of Done
- Every test file in scope passed Phase 1 (or was healed).
- Every test file in scope passed Phase 2 (or was healed).
- Self-heal summary lists the extraction(s) with file paths and line numbers.
- All originally passing tests still pass after self-heal (
<TEST_CMD>re-run is green). - Report emitted in the structured format above.
- When called from autonomous-workflow, the Progress Log line is appended.
Related Skills
autonomous-workflow— invokes this skill from Phase 4 Step 5.tdd— Test-After Mode's "mutate to verify" rule is the conceptual ancestor of Phase 2 here.code-quality— Pass 12 ("Testability") covers adjacent design issues; this skill targets the specific by-construction failure.confidence— when self-heal cannot resolve a finding, the autonomous-workflow Phase 4 stuck-loop protocol callsconfidence(analysis)next.