verification-before-completion - SKILL.md Agent Skill

name: verification-before-completion description: "Forces verification commands before success claims. Evidence before assertions. Triggers: complete, fixed, passing, done, ready, verified." user-invocable: false allowed-tools: Read

Verification Before Completion

The Iron Law

NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE

If you haven't run the verification command in this message, you cannot claim it passes.

Claiming work is complete without verification is dishonesty, not efficiency.

The Gate Function

BEFORE claiming any status or expressing satisfaction:

1. IDENTIFY: What command proves this claim?
2. RUN: Execute the FULL command (fresh, complete)
3. READ: Full output, check exit code, count failures
4. VERIFY: Does output confirm the claim?
   - If NO: State actual status with evidence
   - If YES: State claim WITH evidence
5. ONLY THEN: Make the claim

Skip any step = lying, not verifying

When To Apply

ALWAYS before:

ANY variation of success/completion claims
ANY expression of satisfaction ("Great!", "Perfect!", "Done!")
ANY positive statement about work state
Committing, PR creation, task completion
Moving to next task
Delegating to agents

Don't Assume It Exists

A prompt — yours, the user's, or another agent's — naming a file, table, column, endpoint, env var, config key, or dependency is a hint, not a fact. The name existing in text is not the same as the thing existing in the repo.

Before you read from, write to, import, or quote any such resource, confirm it: Read the file, ls/grep the path, list the table, check the lockfile. One look beats a confident guess.
Never reconstruct a file's contents, a function's body, or an API's signature from its name alone. The name tells you almost nothing about the shape. If you haven't read it, you don't know it.
A plausible-sounding path or symbol is the easiest thing in the world to invent. Treat your own fluency here as a warning sign, not as evidence.
AUTHORIZED security work (CTF, sanctioned pentest, defensive review) still verifies — probing whether a path/endpoint/parameter actually exists is part of the job, not an exception to it.

Declare Ungrounded as a Real Outcome

When search, KB lookup, or tool calls come back with nothing relevant, the correct move is to say so and stop — not to backfill the hole from training memory and present it as established fact.

"Not found in <source>" is a complete, honest answer. State which source you checked and that it came up empty.
Do not promote a half-remembered detail to a definite claim just because the gap feels uncomfortable. An unverified recollection is a hypothesis; label it as one or leave it out.
If the missing piece blocks the task, surface the gap and ask — do not paper over it with a guess dressed as a finding.

Common Failures

Claim	Requires	Not Sufficient
Tests pass	Test command output: 0 failures	Previous run, "should pass"
Linter clean	Linter output: 0 errors	Partial check, extrapolation
Build succeeds	Build command: exit 0	Linter passing, logs look good
Bug fixed	Test original symptom: passes	Code changed, assumed fixed
Regression test works	Red-green cycle verified	Test passes once
Agent completed	VCS diff shows changes	Agent reports "success"
Requirements met	Line-by-line checklist	Tests passing
No dead code (Art. VI.1)	Grep for every removed/renamed symbol: 0 references	"I cleaned up what I touched"
Behavior change covered (Art. VI.2)	Integration test for the API surface + unit test + docs updated	Unit test on the helper only
Diff is clean (Art. VI.4)	Re-read full diff: no orphaned imports, no stale docs, no skipped fixes	"I only changed what I needed"
Resource exists (file/table/env var/endpoint/dep)	Read/list/grep it and see it	The prompt mentioned it, or the name sounds real
File contents / API signature	Open the file, read the actual lines	Inferring shape from the filename or symbol name
Answer is grounded	A source you actually read returns it	Recalling it from training memory

Red Flags — STOP

Using "should", "probably", "seems to"
Expressing satisfaction before verification
About to commit/push/PR without verification
Trusting agent success reports without independent check
Relying on partial verification
Thinking "just this once"

Rationalization Prevention

Excuse	Reality
"Should work now"	RUN the verification
"I'm confident"	Confidence is not evidence
"Just this once"	No exceptions
"Linter passed"	Linter is not compiler
"Agent said success"	Verify independently
"Partial check is enough"	Partial proves nothing
"Different words so rule doesn't apply"	Spirit over letter

Pre-Completion Self-Audit

Before you type "done", run these fast yes/no gates over your own draft. Each one has a fix — if the answer is bad, do the fix, don't ship the draft.

Self-check	If the honest answer is "no" / "yes, I did"
Did I actually run the verification command this message, or am I asserting success from memory?	Run it now. Memory is not a test result.
Does every factual claim trace to output or a source I actually saw?	Cite the evidence, or cut the claim.
Did I invent any path, filename, symbol, table, flag, or version number?	Verify it exists, or remove it.
Did I quote file contents or an API signature I never opened?	Open and confirm, or stop quoting.
Did a search/KB lookup come back empty that I then "filled in" anyway?	Replace the fill-in with "not found in `<source>`".
Am I about to commit/push/PR on the strength of a claim I haven't proven?	Prove it first.

Same ethos as the rest of this skill: evidence before assertions, applied to your own output one line at a time.

Key Patterns

Tests:

CORRECT:  [Run test command] [See: 34/34 pass] "All tests pass"
WRONG:    "Should pass now" / "Looks correct"

Regression tests (TDD Red-Green):

CORRECT:  Write → Run (pass) → Revert fix → Run (MUST FAIL) → Restore → Run (pass)
WRONG:    "I've written a regression test" (without red-green verification)

Requirements:

CORRECT:  Re-read plan → Create checklist → Verify each → Report gaps or completion
WRONG:    "Tests pass, phase complete"

Agent delegation:

CORRECT:  Agent reports success → Check VCS diff → Verify changes → Report actual state
WRONG:    Trust agent report at face value

Live-App Rubric Verification (optional)

When the success criterion is behavioral or visual (a UI flow, a generated app, a multi-step interaction), a pass/fail command is not enough — the proof is the running app, observed. Use a weighted rubric instead of a single assertion:

Define the rubric BEFORE building — 3–6 criteria, each with a weight and an explicit pass bar. Example:

Criterion	Weight	Pass bar
Core flow completes end-to-end	0.40	No error, reaches success state
Empty / loading / error states render	0.25	All three visible
Matches the requested layout	0.20	No major deviation
No console errors	0.15	Console clean

Launch the app and observe — actually run it and capture the behavior (screenshot, console, network). Do not infer from the source.
Score with a fresh evaluator — have an independent agent grade the observed behavior against the rubric, not the implementer who wrote it (self-grading anchors high). Compute the weighted score.
Gate on the threshold — below the bar (e.g. < 0.8) the claim is NOT verified: list the failing criteria as concrete defects and iterate. At or above the bar, state the score WITH the captured evidence.

The rubric is the verification command for work that has no green/red exit code. The same Iron Law applies: observed evidence before the claim, every time.

Constitutional Anchors

This skill enforces Constitution Art. VI.4 (Verify Before Claiming Done). The diff re-read is not optional: before any completion claim, confirm no orphaned references, no missing test coverage for changed paths, no stale docs. A task is not done while any of those exist.

The Bottom Line

Run the command. Read the output. THEN claim the result.

This is non-negotiable.