skill-verification-gate - SKILL.md Agent Skill

name: skill-verification-gate description: "Evidence before claims — run verification commands before declaring work complete, fixed, or passing"

Host: Codex CLI — This skill was designed for Claude Code and adapted for Codex. Cross-reference commands use installed skill names in Codex rather than /octo:* slash commands. Use the active Codex shell and subagent tools. Do not claim a provider, model, or host subagent is available until the current session exposes it. For host tool equivalents, see skills/blocks/codex-host-adapter.md.

Verification Gate

The Iron Law

NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE

NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE

If you haven't run the verification command in this turn, you cannot claim it passes.

The Gate

Before claiming any success or expressing satisfaction:

IDENTIFY — What command proves this claim?
RUN — Execute the full command (fresh, not cached)
READ — Full output, check exit code, count failures
VERIFY — Does output actually confirm the claim?
ONLY THEN — State the claim WITH evidence

Skip any step = the claim is unverified.

What Counts as Evidence

Claim	Requires	NOT Sufficient
Tests pass	Test command output showing 0 failures	Previous run, "should pass"
Build succeeds	Build command exit 0	Linter passing
Bug fixed	Reproduce original symptom: now passes	"Code changed, should work"
Regression test works	Red (fail without fix) → Green (pass with fix)	Test passes once
Subagent completed task	`git diff` shows expected changes	Subagent says "done"
Requirements met	Line-by-line checklist against spec	Tests passing
Provider dispatch worked	Output contains expected content	No error ≠ success

Red Flags — STOP and Verify

If you catch yourself thinking any of these, STOP:

Thought	What to do instead
"Should work now"	Run the verification
"I'm confident"	Confidence ≠ evidence
"Just this once"	No exceptions
"The linter passed"	Linter ≠ tests ≠ build
"The agent said it worked"	Verify independently
"It's a small change"	Small changes cause big bugs

Multi-Provider Context

In Claude Octopus workflows, verification is especially critical because:

Provider outputs can be hallucinated — Codex, Gemini, Antigravity, Copilot, and other providers may claim success without evidence
Consensus ≠ correctness — three models agreeing doesn't mean they're right
Synthesis files may be stale — check timestamps, don't assume freshness
orchestrate.sh exit code 0 ≠ quality — the script ran, but did it produce good output?

After any multi-provider workflow:

# Verify synthesis file exists and is recent
ls -la ~/.claude-octopus/results/*-synthesis-*.md | tail -1

# Verify it has content (not just headers)
wc -l ~/.claude-octopus/results/*-synthesis-*.md | tail -1

When to Apply

ALWAYS before:

Committing code
Creating PRs
Marking tasks complete
Moving to next workflow phase
Reporting results to user
Claiming a bug is fixed

In orchestrate.sh workflows:

After probe (discover) — verify synthesis file exists
After grasp (define) — verify consensus score meets threshold
After tangle (develop) — verify tests pass, not just that code was written
After ink (deliver) — verify review actually ran, not just that it was dispatched

Examples

Correct: Evidence-Based Claim

$ npm test
  ✓ user.create() saves to database (45ms)
  ✓ user.create() validates email (12ms)
  Tests: 2 passed, 2 total

All 2 tests pass. ← Claim backed by output.

Incorrect: Claim Without Evidence

I've implemented the feature. It should work now. The tests should pass.
← No test was run. "Should" is not evidence.

Correct: Regression Test Red-Green

1. Write test → run → FAIL (expected, proves test detects the bug)
2. Implement fix → run → PASS (proves fix works)
3. Revert fix → run → FAIL (proves test isn't false-positive)
4. Restore fix → run → PASS (final confirmation)

Integration with Other Skills

This skill is referenced by:

flow-develop.md — verification gate after implementation
flow-deliver.md — verification gate before delivery
skill-code-review.md — verify review findings before reporting
skill-tdd.md — red-green cycle requires evidence at each step
skill-factory.md — autonomous pipeline must verify at every phase