---
name: uverify
description: Use after execute to attack the change and demonstrate how it's broken. Default stance — the change is broken; prove and demonstrate it. Builds an attack checklist (happy-path / negative / invariant / interface hypotheses), runs each freshly, smokes the end-to-end, writes a short summary to the task file's Verify section, loops back to execute on any demonstrated break.
---
# Verify
Verify is the adversary. Default stance: the change is broken — prove and demonstrate it with running evidence in this message. Each check is a hypothesis about how the change bites; verify tries to make the bite land. Pass = honest attack produced no demonstrated break. Fail = break demonstrated, here's the evidence.
Verify's verdict either advances the workflow to `up:ureview` or bounces it back to `up:uexecute` with remediation notes. A short summary is persisted to the task file's `## Verify` section so review (and later readers) see what was attacked.
Steelman the critique: don't fish for cases that pass — fish for the case that would bite a future reader, on-call, or downstream user. If you didn't try to break it, you didn't verify it.
## Brevity
## Phase 1 — Build the attack list (happy-path, negative, invariant, interface)
Each CK is a hypothesis: a specific way the change might bite. Build them from the Plan, the IV / PC / AS entries from Design, and your own adversarial reading, not from imagined confirmations.
Checks are numbered CK1..CKN within the task file. Each one targets exactly one bite.
- Happy-path attack (CK): hypothesize the claimed happy path falls over for some input shape, env condition, ordering, or hidden state the author didn't consider. "POST /items: find a valid payload the handler mishandles (unicode name, max-length, concurrent submit, retried request)." Not "show it returns 201" — find the valid case where it doesn't.
- Negative attack (CK): hypothesize bad input slips past validation, gets silently coerced, or surfaces the wrong error. "POST /items: find a missing-field shape that bypasses validation (null, empty string, whitespace, type confusion, deeply nested)." Not "show it returns 400" — find the bad input that doesn't get rejected.
- Invariant attack (CK): hypothesize the IV is already violated, or trivially bypassable. Reference the IV by its ID. "IV1: hunt for `from training` in `src/dataset/`, and re-exports / dynamic imports that smuggle it in."
- Interface attack (CK): hypothesize the declared contract doesn't match real callers or real returns. Reference the IF by its ID. "IF1: find a caller passing the wrong type, or a return path that violates the declared signature (None on error, str instead of bytes)."
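A negative attack can be run as a small probe loop. The sketch below assumes a hypothetical in-process stand-in, `handle_post_item`, for the POST /items handler (not a real API) that returns an HTTP-style status code. The probe feeds it CK3-style bad payloads and keeps only the ones that were *not* rejected with 400, treating an unhandled exception as a 500.

```python
from __future__ import annotations


# Hypothetical stand-in for the POST /items handler; illustration only.
def handle_post_item(payload: dict) -> int:
    """Return an HTTP-style status code for a create-item request."""
    if "name" not in payload:
        return 400
    # Deliberate bug: None reaches .strip() and raises AttributeError.
    if payload["name"].strip() == "":
        return 400
    return 201


# Bad inputs from the CK3 hypothesis: empty, null, whitespace-only, missing field.
BAD_PAYLOADS = [{"name": ""}, {"name": None}, {"name": "   "}, {}]


def negative_attack(handler, payloads) -> list[tuple[dict, object]]:
    """Return (payload, outcome) for every bad payload that was NOT rejected with 400."""
    landed = []
    for payload in payloads:
        try:
            outcome = handler(payload)
        except Exception as exc:  # an unhandled crash means the attack landed
            outcome = f"500 ({type(exc).__name__})"
        if outcome != 400:
            landed.append((payload, outcome))
    return landed


if __name__ == "__main__":
    for payload, outcome in negative_attack(handle_post_item, BAD_PAYLOADS):
        print(f"broke: {payload!r} -> {outcome}")
```

An empty result means the attack held for these inputs only. It says nothing about shapes you did not try, so widen the list before calling it `held`.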
The attack list lives in-session. It is not written to the task file.
<good-example>
Negative:
- CK3 — POST /items: try {name: ""}, {name: null}, {name: " "}, missing field entirely — find one that doesn't 400
- CK4 — Dataset.load: try missing file, directory, symlink to /dev/null, file with no read perms — find one that doesn't raise cleanly
Invariants / assumptions:
- CK5 (IV1) — grep "from training" src/dataset/ and re-export chains — try to find a smuggled import
- CK6 (IV2) — find a DB write that bypasses transaction() (raw cursor, ORM escape hatch)
- CK7 (AS1) — sample upstream /users — find a non-UTF-8 `email` in the wild
Interfaces:
- CK8 (IF1) — grep `Parser.parse` callers — find one passing non-str
- CK9 (IF2) — invoke `Formatter.render` with malformed AST — find a return that violates `-> str`
</good-example>
<bad-example>
"I'll test the happy path." Confirmation, not attack. No hypothesis of how it breaks.
</bad-example>
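For a Python codebase, the CK5 sweep can go a step past `grep` by parsing each module, which also catches dynamic imports that a text search for `from training` misses. This is a minimal sketch, assuming the forbidden package is literally named `training` and the code lives under `src/dataset/` (both taken from the IV1 example). It matches module names only and does not follow re-export chains, so those still need a manual trace.

```python
from __future__ import annotations

import ast
from pathlib import Path

FORBIDDEN = "training"  # package IV1 keeps out of src/dataset/ (example assumption)


def _is_forbidden(module: str) -> bool:
    return module == FORBIDDEN or module.startswith(FORBIDDEN + ".")


def smuggled_imports(source: str, filename: str = "<string>") -> list[tuple[str, int, str]]:
    """Return (file, line, snippet) for static and common dynamic imports of FORBIDDEN."""
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, ast.ImportFrom) and node.module and _is_forbidden(node.module):
            hits.append((filename, node.lineno, f"from {node.module} import ..."))
        elif isinstance(node, ast.Import):
            for alias in node.names:
                if _is_forbidden(alias.name):
                    hits.append((filename, node.lineno, f"import {alias.name}"))
        elif isinstance(node, ast.Call) and node.args:
            # importlib.import_module("training...") or __import__("training...")
            func = node.func
            name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", None)
            arg = node.args[0]
            if (
                name in {"import_module", "__import__"}
                and isinstance(arg, ast.Constant)
                and isinstance(arg.value, str)
                and _is_forbidden(arg.value)
            ):
                hits.append((filename, node.lineno, f"{name}({arg.value!r})"))
    # ast.walk is breadth-first, so sort hits back into source order.
    return sorted(hits, key=lambda hit: hit[1])


def scan_tree(root: str = "src/dataset") -> list[tuple[str, int, str]]:
    """Run smuggled_imports over every .py file under root."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        hits.extend(smuggled_imports(path.read_text(encoding="utf-8"), str(path)))
    return hits


if __name__ == "__main__":
    for file, line, snippet in scan_tree():
        print(f"{file}:{line}: {snippet}")
```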
## Phase 2 — Run each attack freshly, in this message
Evidence before claims. If you haven't run the attack in this message, you cannot claim "no break demonstrated."
- Use `/up:try`-style minimal probes — the direct command, no harness
- One-off scripts go in project-local `tmp/` (gitignored); clean up after
- Capture *actual* output. "Looks right" is not evidence; a stack trace is.
- Decide on what you saw, not what you expected. A passed attack (you tried hard and couldn't break it) is a real outcome; so is a landed attack (you broke it — record the repro).
If an attack failed to land in an earlier session — re-run it. State drifts; what couldn't be broken yesterday may break today.
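One way to keep the "actual output" rule honest is to have every probe return its raw exit code and output, then derive the one-line evidence from that rather than from a summary typed by hand. This is a minimal sketch using only the standard library. `run_probe` and `evidence_line` are illustrative names, not part of any `/up:try` tooling.

```python
from __future__ import annotations

import shlex
import subprocess
import sys


def run_probe(argv: list[str], timeout: float = 60.0) -> dict:
    """Run one attack command directly (no harness) and record what actually happened."""
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    return {
        "command": shlex.join(argv),
        "exit_code": proc.returncode,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }


def evidence_line(ev: dict) -> str:
    """One line of real output: last stderr line on failure (usually the exception), else first stdout line."""
    if ev["exit_code"] != 0 and ev["stderr"].strip():
        line = ev["stderr"].strip().splitlines()[-1]
    else:
        out = ev["stdout"].strip()
        line = out.splitlines()[0] if out else "(no output)"
    return f"`{ev['command']}` → exit {ev['exit_code']}: {line}"


if __name__ == "__main__":
    ev = run_probe([sys.executable, "-c", "raise ValueError('bad input')"])
    print(evidence_line(ev))
```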
## Phase 3 — Smoke the end-to-end (lowest-bar attack)
The smoke is the most basic attack hypothesis: the change is broken enough that it doesn't even run end-to-end in its real shape.
Run the shortest full path:
- CLI change → invoke the command with representative input
- API change → `curl` against a running server
- UI change → open in a browser, click through the feature
- ML change → run a tiny training step or inference call
If smoke fails, that's a demonstrated break — record it. If smoke passes, the change clears the lowest bar; the per-attack hypotheses still need to land or fail to land on their own.
If you can't run the smoke (infra unavailable), say so explicitly. Do not fabricate success. Do not substitute a unit test for the smoke.
If the smoke only exercises a proxy for the Goal — a small or local stand-in for a full-scale or real-world run (converting 10 files when the Goal is 700GB on a pod) — say so: name what the proxy covered and what real-world validation still stands between here and the Goal. That gap is what `/up:make` step 11 hands to the user.
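For an API change, the smoke can also be scripted. The sketch below uses only the standard library. `smoke_post` is a hypothetical helper, and the URL and payload are placeholders. It keeps three outcomes apart: a 2xx is a pass, an HTTP error status is a demonstrated break, and an unreachable server is reported as deferred rather than recorded as success.

```python
from __future__ import annotations

import json
import urllib.error
import urllib.request


def smoke_post(url: str, payload: dict, timeout: float = 10.0) -> tuple[str, str]:
    """POST one representative payload end to end; return (outcome, one-line record)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return "passed", f"Smoke: `POST {url}` → {response.status}"
    except urllib.error.HTTPError as err:
        # The server answered with a 4xx/5xx: the smoke ran and the change failed it.
        return "failed", f"Smoke: `POST {url}` → {err.code}"
    except OSError as err:
        # Nothing answered (refused, DNS, timeout): infra unavailable, so defer rather than guess.
        return "deferred", f"smoke not run: {url} unreachable ({err})"


if __name__ == "__main__":
    print(smoke_post("http://localhost:8000/items", {"name": "widget"}))
```

`HTTPError` is a subclass of `URLError`, which is a subclass of `OSError`, so the order of the `except` clauses matters. With this ordering, a hung server also lands in `deferred`. If a hang is itself the break you are hunting, handle `TimeoutError` separately.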
## Phase 4 — Write the Verify summary to the task file
<required>
Append (or replace) the `## Verify` section of `docs/tasks/<slug>.md`. Keep it short — this is not a transcript, it's an audit trail of what was attacked.
Per-CK verdict vocabulary:
- `held` — attack ran, no break demonstrated.
- `broke` — attack landed; evidence required (one line, real output).
- `deferred` — attack couldn't be run (infra, scope); name what blocks it.
Overall `Result:` is `passed` only if every CK is `held` (or justifiably `deferred` with user-visible deferral). Any `broke` → `failed`.
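The `Result:` rule is mechanical enough to state as code. This sketch uses a hypothetical `Check` record. `surfaced` stands in for "the deferral is user-visible" (for example, logged in Notes or under `### Deferred`). Refusing to score an empty attack list is this sketch's own assumption, in the spirit of "if you didn't try to break it, you didn't verify it."

```python
from __future__ import annotations

from dataclasses import dataclass

VERDICTS = {"held", "broke", "deferred"}


@dataclass
class Check:
    ck: str                 # e.g. "CK3"
    hypothesis: str         # how the change might bite
    verdict: str            # held | broke | deferred
    detail: str = ""        # evidence for broke, blocker for deferred
    surfaced: bool = False  # deferral is visible to the user


def overall_result(checks: list[Check]) -> str:
    """Return "passed" or "failed" following the per-CK verdict rules."""
    if not checks:
        raise ValueError("no attacks ran; nothing was verified")
    for c in checks:
        if c.verdict not in VERDICTS:
            raise ValueError(f"{c.ck}: unknown verdict {c.verdict!r}")
        if c.verdict == "broke" and not c.detail:
            raise ValueError(f"{c.ck}: 'broke' needs one line of real output as evidence")
    if any(c.verdict == "broke" for c in checks):
        return "failed"
    acceptable = all(
        c.verdict == "held" or (c.verdict == "deferred" and bool(c.detail) and c.surfaced)
        for c in checks
    )
    return "passed" if acceptable else "failed"
```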
Format:
```markdown
## Verify
**Result:** passed | failed
Happy-path:
- CK1 — <attack hypothesis> — held
- CK2 — <attack hypothesis> — broke: <evidence>
Negative:
- CK3 — <attack hypothesis> — held
Invariants / assumptions:
- CK4 (IV1) — <attack hypothesis> — held: <how attacked>
- CK5 (AS1) — <attack hypothesis> — deferred: <what blocks>
Interfaces:
- CK6 (IF1) — <attack hypothesis> — held
- CK7 (IF2) — <attack hypothesis> — broke: <evidence>
Smoke: `<command>` → <one-line result> (omit if not run; never substitute a non-smoke)
Goal: proxy only — <what the smoke covered, what real-world validation remains> (omit when the smoke exercised the full Goal)
Notes: <break repros, deferrals, re-runs> (omit if none)
```
Write this whether verify passed or failed. On failure, Notes names the demonstrated break(s) and points to where execute should pick up.
</required>
<good-example>
Passing form:
```markdown
## Verify
**Result:** passed
Happy-path:
- CK1 — unicode/long/double-submit POST /items — held
- CK2 — LF/CRLF/BOM variants of good.csv — held
Negative:
- CK3 — null/empty/whitespace/missing name on POST /items — held (all 400)
- CK4 — missing/dir/symlink-to-null inputs to Dataset.load — held (all raise)
Invariants / assumptions:
- CK5 (IV1) — grep + re-export sweep for `from training` in `src/dataset/` — held
- CK6 (IV2) — manual trace of write paths — held, all go through `transaction()`
Interfaces:
- CK7 (IF1) — caller-type sweep for `Dataset.load` — held
- CK8 (IF2) — malformed AST to `Formatter.render` — held (raises ValueError, doesn't violate `-> str`)
Smoke: `curl -X POST /items ...` → 201 — end-to-end OK
```
</good-example>
<good-example>
Failing form (a break landed):
```markdown
## Verify
**Result:** failed
Negative:
- CK3 — null name on POST /items — broke: `{"name": null}` → 500 (TypeError in handler), not 400
Notes: validation layer doesn't reject `null` before the handler; should reject with 400 "name is required". Loop back to execute.
```
</good-example>
## Phase 5 — Consolidate: held loops to review, broke loops to execute
- Every attack held (or justifiably deferred) → declare verify passed. Invoke `up:ureview`.
- Any attack broke → for each, describe how it should have worked conceptually (not "add the missing line" — the behavior it was supposed to exhibit under the attack). Loop back to `up:uexecute` with these notes. Do not move forward.
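The hand-off can be sketched the same way. This sketch assumes a hypothetical `Remediation` record in which each break carries both what the attack observed and the behavior it was supposed to exhibit, so execute receives intent rather than a patch instruction.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Remediation:
    ck: str        # e.g. "CK3"
    observed: str  # what the landed attack showed
    intended: str  # the behavior it should exhibit under that attack


def route(result: str, remediations: list[Remediation]) -> tuple[str, str]:
    """Return (next stage, hand-off notes) for a verify result."""
    if result == "passed":
        return "up:ureview", ""
    if result != "failed":
        raise ValueError(f"unknown result {result!r}")
    if not remediations:
        raise ValueError("a failed verify must describe intended behavior for each break")
    notes = "\n".join(
        f"- {r.ck}: observed {r.observed}; intended: {r.intended}" for r in remediations
    )
    return "up:uexecute", notes


if __name__ == "__main__":
    stage, notes = route("failed", [Remediation(
        "CK3", '{"name": null} -> 500 (TypeError)', 'reject with 400 "name is required"')])
    print(stage)
    print(notes)
```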
## Future Work vs. incomplete work — the slacking-loophole rule
When a check fails or surfaces ambiguity, do not move it to `## Conclusion → Future Work` unless you have justification.
## Red flags — STOP, do not claim pass
If any of these was the basis of a pass verdict: back to Phase 1.
## Never
- Claim a CK held without running the attack in this message
- Declare pass when any attack broke
- Build the attack list as restatements of the happy path (no adversarial angle = not an attack)
- Skip verify to get to review faster
- Trust a prior session's verdict — re-run
## Hands-off mode
See `up:handsoff` for the full contract. Stage-specific delta: verify's behavior is unchanged — the pass → review / fail → execute loop already runs without user confirmation. Infeasible smoke tests (infra unavailable, etc.) are logged under `### Deferred (needs user input)` so the user knows what wasn't verified end-to-end. Never fabricate success to skip a deferred entry.
## Terminal state
Verify summary written to task file. Pass → invoke `up:ureview`. Fail → invoke `up:uexecute` with failure notes describing intended behavior.