name: aspirations-verify description: "Phase 5 of the aspirations loop: verifies a just-executed goal using hypothesis outcomes, unified outcome/check evaluation, Q1/Q2/Q3 verification escalation, streak tracking, and dependent-goal unblocking. Use whenever /aspirations-execute finishes and the loop must confirm the goal actually met its verification criteria before state-update fires. Internal sub-skill — never invoke outside the aspirations loop; assumes the just-executed goal is still in working context." user-invocable: false parent-skill: aspirations conventions: [aspirations, goal-schemas] minimum_mode: autonomous execution_history: total_invocations: 0 outcome_tracking: successful: 0 unsuccessful: 0 success_rate: 0.0 last_invocation: null revision_id: "skill-bootstrap-aspirations-verify-c60403"
previous_revision_id: null
/aspirations-verify — Goal Completion Verification
Verifies whether a goal achieved its desired end state after execution. Handles hypothesis goals, unified verification checks, and the Q1/Q2/Q3 structured escalation protocol for goals with empty checks.
Abbreviation Policy
Mandatory writes for this obligation: see core/config/obligation-schema.yaml
→ obligations.verify. Abbreviation is permitted only when one of the
abbreviated_allowed_when conditions holds (routine outcome, or
context_budget.zone == tight). Zone is distance-to-autocompact, not raw
usage — see core/scripts/context-budget-status.py classify_zone. When
the tight-zone condition is the basis for abbreviation, Bash: bash core/scripts/context-budget-banner.sh and quote its output before claiming
the condition. When abbreviating on the tight-zone condition, log TWO lines
in the iteration journal entry (in this order):
OBLIGATION ABBREVIATED: verify — {condition}
CTX: raw N% | of-autocompact N% | zone tight | headroom N tokens | env ... | updated ...
The banner line must be the actual output captured from the banner script.
core/scripts/context-citation-audit.sh audits the pair. For other
conditions (routine outcome), the banner line is not required.
Then satisfy minimum_inline. Otherwise invoke this Skill literally — a
bare status write is NOT a verify. The learning-gate audit (Phase 9.5d)
scans those journal lines and logs false claims to
agents/<agent>/session/obligation-audit.jsonl.
Step 0: Load Conventions — Bash: load-conventions.sh with each name from the conventions: front matter.
Inputs (from orchestrator)
goal: The executed goal object (with verification field)result: Execution result (from Phase 4)source: Queue origin ("world"or"agent") — pass--source {source}to allaspirations-*.shcallsprior_checks: dict (optional, default{}) — map of checks already passed in a prior turn that was interrupted by autocompact or graceful stop. Keys:q1_passed,q1_artifact,q1_5_passed,q1_5_checklist,q2_passed,q2_failure_mode_checked,q3_scope,standard_checks_passed. Threaded in by the Phase -1.4 Graceful Stop Handler fromiteration-checkpoint.json'sphase_progressfield. See core/config/conventions/compact-recovery.md "Iteration Checkpointphase_progressField".
Outputs (to orchestrator)
goal_completed: Boolean — did the goal pass verification?aspiration_complete: Boolean — is the parent aspiration now fully complete?
Hypothesis Goal Verification
if goal.hypothesis_id:
if result == "CONFIRMED" or result == "CORRECTED":
if not goal.recurring:
Bash: aspirations-update-goal.sh --source {source} <goal-id> status completed
Bash: aspirations-update-goal.sh --source {source} <goal-id> completed_date <today>
Bash: aspirations-update-goal.sh --source {source} <goal-id> achievedCount <N+1>
Update recurring streaks if applicable (see Recurring Streak Logic below)
Unblock dependent goals
elif result == "EXPIRED":
Bash: aspirations-update-goal.sh --source {source} <goal-id> status expired
else:
# PENDING — hypothesis hasn't resolved yet
Bash: aspirations-update-goal.sh --source {source} <goal-id> status pending
Unified Verification Checks
Scripted Check Evaluation
Structured verification.checks[] entries are evaluated by
verify-check-eval.sh (sister script to predicate.py). Empty checks[]
falls through to the Q1/Q2/Q3 Empty-Checks Escalation Protocol below — the
scripted path handles structured checks only; Q1/Q2/Q3 is the LLM-judgment
path for investigation / novel-research goals that legitimately lack
structured checks.
Bash: bash core/scripts/verify-check-eval.sh --goal <goal-id> --all
Read JSON result:
flags = []: all_passed → standard-pass path
flags = ["checks_empty"]: fall through to Q1/Q2/Q3 (LLM evidence)
flags = ["checks_failed"]: goal fails verification; mark pending; record blocker if warranted
flags = ["has_string_checks"]: structured passed but string checks exist; run Q1/Q2/Q3 too
Sub-Phase Checkpoint Helper (shared by Q1/Q2/Q3 and standard checks)
Each Q-check and the standard-checks block writes its outcome to
agents/<agent>/session/iteration-checkpoint.json via the single-writer wrapper
(g-248-36). This makes verify resumable mid-phase: if autocompact or graceful
stop interrupts after Q1 passes but before Q3 runs, the re-invoked verify
sees prior_checks.q1_passed and skips Q1. Pattern (reused below, one line
per write — dotted phase_progress.<key> paths deep-merge into existing
phase_progress dict):
bash core/scripts/loop-state-save.sh update --set "phase_progress.<key>=<value>"
The wrapper provides typed-key validation, atomic tempfile+rename, and
deep-merge into nested dicts. The checkpoint auto-deletes at LOOP_CONTINUE,
so phase_progress resets on any successful iteration.
Each check also appends a diary breadcrumb (execution-diary.sh append, JSON
on stdin) so postcompact-restore surfaces Q-check outcomes in fresh context.
Pre-Escalation Retrieval (G2 / R6)
Before answering Q1/Q2/Q3, load verification-related guardrails and reasoning-bank entries so the answers are checked against accumulated anti-patterns rather than written from memory.
Per .claude/rules/retrieve-before-deciding.md decision point 2 ("verifying
a goal's outcome"), this retrieval is mandatory whenever checks is empty
AND no prior_checks shortcut applies. Fail-open: if retrieve.sh errors,
log the failure and proceed to Q1/Q2/Q3 anyway — verify must not block.
IF len(checks) == 0 AND NOT (prior_checks.q1_passed AND prior_checks.q2_passed AND prior_checks.q3_scope):
# Token-overlap match against verification anti-patterns
Bash: retrieve.sh --category "verification escalation Q1 Q2 Q3 evidence anti-pattern" --depth shallow
From the returned JSON, surface to the Q1/Q2/Q3 answers:
- guardrails[] whose rule mentions positive-state, negation-claim,
artifact-evidence, schema-probe, exhaustive-search, or canonical-probe
- reasoning_bank[] entries flagged with verification or
verify-before-assuming categories
Use the loaded entries to:
1. Sharpen Q1: does the claimed artifact match an entry's "what counts
as evidence" pattern? If a guardrail (e.g., guard-346 output-completeness,
positive-state-gate companion) flags the claim shape, Q1 must run
the explicit probe before passing.
2. Sharpen Q2: do any retrieved entries describe failure modes for this
goal's category that the agent should explicitly check?
3. Sharpen Q3: do any entries warn about unit-only verification for this
class of work?
Diary breadcrumb:
echo '{"entry_type":"finding","goal_id":"<goal.id>","content":"Verification retrieval: rb=<N> guard=<M> loaded"}' | bash core/scripts/execution-diary.sh append
ELSE:
# checks[] is non-empty (scripted path) OR all Q-checks already passed via prior_checks.
# Skip pre-escalation retrieval — checks evaluation is already structured,
# and prior_checks means a prior turn loaded what it needed.
Skip
Collaborative Goal Agent-Leg Terminal State (g-305-04)
Fires BEFORE Q1/Q2/Q3, ONLY for a collaborative goal that declares an agent
leg. Lets the agent close its own leg cleanly instead of inventing closure
justification for the user-gated portion (the US-04 anti-pattern: zeta hedged
g-250-80 closure twice because the schema forced "agent done, user pending"
through the single outcomes field). Backward-compatible: goals without
verification.outcomes_agent_leg, or non-collaborative goals, skip this branch
entirely and verify exactly as before.
IF "user" in (goal.participants or []) AND goal.verification.outcomes_agent_leg is a non-empty list:
# Evaluate ONLY the agent-leg outcomes against in-hand artifacts -- these
# are the criteria the agent can satisfy WITHOUT the user. Apply the same
# artifact-checking rigor as Q1 (cite the concrete artifact per outcome;
# no hedging on observed evidence, communication-clarity.md Rule 6).
agent_leg_met = every entry in outcomes_agent_leg is satisfied by a cited artifact
IF agent_leg_met:
# VALID TERMINAL STATE -- "agent-leg-complete". The agent leg is the
# authoritative closure criterion for the agent's portion. The user leg
# (outcomes minus outcomes_agent_leg) is tracked SEPARATELY via
# user_leg_scope + any follow-up goal -- NOT hedged into this closure
# and NOT a reason to hold the goal pending.
all_passed = true
status = completed
outcome_note = "agent-leg-complete; user-leg pending (user_leg_scope: <scope or participants:[...,user]>)"
Assert (Rule 6 form): "Agent leg complete because <artifact> satisfies each
outcomes_agent_leg entry; user leg (<remaining outcomes>) remains pending,
tracked via user_leg_scope -- closing agent-leg-complete, not hedged."
Bash: echo '{"entry_type":"finding","goal_id":"<goal.id>","content":"agent-leg-complete terminal state: agent outcomes met, user leg pending"}' | bash core/scripts/execution-diary.sh append
SKIP Q1/Q2/Q3 -- agent-leg closure is established. Proceed to streak/attribution (Phase 5.3).
ELSE:
# The agent's OWN leg is not done yet -- fall through to the standard
# Q1/Q2/Q3 escalation below (do not close on a partial agent leg).
Continue to Empty-Checks Escalation Protocol.
Empty-Checks Escalation Protocol (Q1/Q2/Q3)
When len(checks) == 0, the agent MUST answer three structured questions.
Each Q respects prior_checks — a Q already passed in a prior turn is skipped.
Assertion format requirement (communication-clarity.md Rule 6): Every verification statement in Q1/Q2/Q3 responses MUST use the form "X is Y because Z" — cite the concrete evidence source (script output, file read, command result, artifact path). Hedging language on observed evidence — "possibly", "might be", "could be", "seems to", "appears to" — on an outcome the execution actually produced triggers re-verification: the reviewer cannot distinguish "evidence was genuinely ambiguous" from "agent didn't look carefully enough," so hedge detection forces another pass. When evidence is truly partial, state what was observed and what is unknown as two separate claims, not one hedged claim.
Q1 EVIDENCE: "What concrete artifact (file, output, state change, commit) proves this goal succeeded?" Must reference a checkable artifact.
IF prior_checks.q1_passed: log"Q1 skipped (prior checkpoint: artifact={prior_checks.q1_artifact})"; skip to Q2.- Else: if references concrete artifact → attempt to verify (Read file, check existence).
- If artifact verification fails:
all_passed = false, status → pending. - If no concrete reference:
all_passed = false.
- If artifact verification fails:
- On Q1 PASS (artifact verified):
bash core/scripts/loop-state-save.sh update --set "phase_progress.q1_passed=true" --set "phase_progress.q1_artifact=<artifact-path>" echo '{"entry_type":"finding","goal_id":"<goal.id>","content":"Q1 passed: artifact=<artifact-path>"}' | bash core/scripts/execution-diary.sh append - Positive-state audit on Q1 claim (verify-before-assuming.md Positive
File-State Claims): if the Q1 artifact claim references a specific file
(e.g., "handoff.yaml reflects session N", "config.yaml contains Y"), the
claim must be backed by an in-turn Read/ls/stat of that file — not narrated
from prior-session memory. Run:
Exit 1 = claim unverified →py core/scripts/positive-state-gate.py --claim "<Q1 artifact claim>" --evidence "<concatenated in-turn Read/ls outputs referencing the file>"all_passed = false, status → pending, and appendverification_gapto sensory_buffer citing the gate reason. Re-read the file before re-verifying. Known false positive → re-call with--override "<justification>".
Q1.5 GENERATED CHECKLIST (TICKing All the Boxes, 2410.03608 — BRD Gap 15;
runs only when Q1 passed): decompose "did this goal succeed?" into a concrete,
artifact-checkable checklist, then surface verification dimensions the goal's
OWN criteria never declared. Primarily DETECTIVE — its escalation gate is
conservative (covered-criterion failures only); its main product is the
goal-template-improvement signal in meta/missing-verification-criteria.jsonl.
IF prior_checks.q1_5_passed: log"Q1.5 skipped (prior checkpoint)"; skip to Q2.IF outcome_class == "routine" OR context_budget.zone == "tight": abbreviate — run a 3-item mental checklist, skip logging, log"Q1.5 abbreviated ({condition})"; skip to Q2. (The gaps it would surface on routine / at-budget work are low-value.)- Else:
- GENERATE 5–10 yes/no checklist items from
goal.title+goal.description- the goal's primary action/skill. Each item is one concrete property a correct artifact MUST have — phrased as a checkable yes/no question, NOT a restatement of the goal. Span four dimensions: deliverable PRESENCE, content CORRECTNESS, INTEGRATION/wiring, and absence of obvious BREAKAGE.
- EVALUATE each item against the produced artifact(s) → pass / fail / NA. Reuse the evidence already in hand from Q1 plus targeted Read/Grep. Do NOT re-execute the goal.
- CLASSIFY each item's coverage: is this property already represented in
goal.verification.outcomesorgoal.verification.checks(semantic match, LLM judgment)? COVERED = an existing verification entry would catch its failure; UNCOVERED otherwise. covered_failures= items with result==fail AND coverage==covered.uncovered_failures= items with result==fail AND coverage==uncovered.- CONSERVATIVE HARD GATE: IF
covered_failuresis non-empty → the checklist caught a real miss Q1's single-artifact check was too shallow to see:all_passed = false, status → pending, append each covered failure to sensory_buffer as averification_gap. (UNCOVERED failures NEVER gate — they are gaps in how the goal DECLARED its criteria, not current-goal failures. This keeps Q1.5 low-risk per the BRD: an over-strict generated item can only log, never block.) - RECORD
uncovered_failuresfor later goal-template improvement (the BRD's "misses →missing-verification-criteria.jsonl"). One append per verify:
The companion script appends one locked JSONL line to# only when uncovered_failures is non-empty echo '{"goal_id":"<goal.id>","source":"<source>","category":"<goal.category>","artifact":"<Q1 artifact>","items":[<uncovered item strings>],"note":"Q1.5 generated-checklist gap"}' \ | bash core/scripts/missing-criteria-log.shmeta/missing-verification-criteria.jsonl(detective/telemetry only — it never mutates goal state; fail-open so it cannot block verify). - On Q1.5 assessed (whether or not items failed):
bash core/scripts/loop-state-save.sh update --set "phase_progress.q1_5_passed=true" --set "phase_progress.q1_5_checklist=<n_pass>/<n_total>" echo '{"entry_type":"finding","goal_id":"<goal.id>","content":"Q1.5 checklist: <n_pass>/<n_total> pass, <n_uncovered> uncovered gap(s) logged"}' | bash core/scripts/execution-diary.sh append
- GENERATE 5–10 yes/no checklist items from
Q2 NEGATIVE CHECK (only when Q1 passes): "What would it look like if this APPEARED to succeed but actually failed? Did I check for that?"
IF prior_checks.q2_passed: log"Q2 skipped (prior checkpoint: failure-mode={prior_checks.q2_failure_mode_checked})"; skip to Q3.- Else: HARD GATE — if you CAN name a failure mode but DIDN'T check → check NOW.
- If check reveals problem:
all_passed = false, status → pending. - If vague/empty: soft signal → append
verification_gapto sensory_buffer.
- If check reveals problem:
- Automated negation check (rb-229 capability-gate extensions, g-115-44/g-115-45 wire-up, rb-245 zero-count extension):
Extract any negation claim from the result text or your Q2 failure-mode answer. Three classes:
- Knowledge / capability ("isn't built", "doesn't exist", "can't be done") → exhaustive-search-gate.py
- Infrastructure / operational ("is down", "not responding", "connection refused") → verify-before-assuming-gate.py
- Statistical / audit ("N% have X=0", "0 records have Y", "missing field", "zero-utilization") → zero-count-gate.py (requires a jsonl-field-probe.py run first)
# ALL gates are Python scripts — invoke with `py` (or `python3`), NEVER `bash`. # `bash <file>.py` silently fails: bash tries to parse Python as shell, exits 2, # stdout is empty, and the caller can't see `would_block` or `reason`. # Knowledge / capability negations: py core/scripts/exhaustive-search-gate.py --claim-text "<negation>" --tiers-used "<csv>" --queries-count <N> # AND (for infrastructure negations only): py core/scripts/verify-before-assuming-gate.py --claim-text "<negation>" --signals-count <N_nonsilent> [--infra-check-run] # AND (for statistical/audit negations — rb-245): run probe FIRST, then gate. py core/scripts/jsonl-field-probe.py --file <jsonl-path> --field <dot.path> # → capture field_present into <found|missing> py core/scripts/zero-count-gate.py --claim-text "<negation>" --file-probed "<jsonl-path>" --field-probed "<dot.path>" --probe-result "<found|missing>"A claim may trigger multiple gates (knowledge + statistical for "no records found with field X"). ALL triggered gates must pass (exit 0). Overlap is intentional defense-in-depth, not a bug. Interpret exit codes:
0= evidence sufficient (or no trigger matched — no-op), pass;1= evidence insufficient, treat as Q2 FAIL →all_passed = false, status → pending, and appendverification_gapto sensory_buffer citing the gate reason. If a gate'swould_block=trueis a known false positive, re-call with--override "<justification>"(audit trail on stderr is automatic). No negation claim in the result → skip; the gates are no-ops on non-negation text. - On Q2 PASS (failure mode named and checked, no problem revealed):
bash core/scripts/loop-state-save.sh update --set "phase_progress.q2_passed=true" --set "phase_progress.q2_failure_mode_checked=<one-liner>" echo '{"entry_type":"finding","goal_id":"<goal.id>","content":"Q2 passed: failure-mode=<one-liner>"}' | bash core/scripts/execution-diary.sh append
Q3 INTEGRATION SCOPE: "Did I verify at the integration level (caller → target → side effect), or only the unit level?"
IF prior_checks.q3_scope is set: log"Q3 skipped (prior: {prior_checks.q3_scope})"; proceed.- Else: assess scope →
"unit"or"integration". - On Q3 assessed:
bash core/scripts/loop-state-save.sh update --set "phase_progress.q3_scope=<unit|integration>" echo '{"entry_type":"observation","goal_id":"<goal.id>","content":"Q3 scope: <unit|integration>"}' | bash core/scripts/execution-diary.sh append
Standard Checks
IF len(checks) > 0:
IF prior_checks.standard_checks_passed matches "<N>/<N>" where N == len(checks):
log "Standard checks skipped (prior checkpoint: <N>/<N>)"
all_passed = true
ELSE:
all_passed = all(check_passes(c) for c in checks)
passed_count = sum(1 for c in checks if check_passes(c))
# Record outcome to checkpoint + diary
bash core/scripts/loop-state-save.sh update --set "phase_progress.standard_checks_passed=<passed_count>/<len(checks)>"
echo '{"entry_type":"finding","goal_id":"<goal.id>","content":"Verify: <passed_count>/<len(checks)> standard checks passed"}' | bash core/scripts/execution-diary.sh append
On Pass
if all_passed:
if not goal.recurring:
Bash: aspirations-update-goal.sh --source {source} <goal-id> status completed
Bash: aspirations-update-goal.sh --source {source} <goal-id> completed_date <today>
Bash: aspirations-update-goal.sh --source {source} <goal-id> achievedCount <N+1>
Update recurring streaks if applicable
# OUTPUT-CENTRIC HANDOFF (arXiv 2603.28990: factual outputs > status)
# Generate output_summary BEFORE unblocking — dependent goals need it.
IF source == "world":
output_summary = one-line factual summary of WHAT was produced/found/changed
echo "${output_summary}" | Bash: board-post.sh --channel coordination --type handoff --tags ${goal.id},${goal.category}
Unblock dependent goals (with output passing — see below)
Unblock Dependent Goals (with Output Passing)
When unblocking goals that depend on this completed goal, inject the output
into dependent goals that declared depends_on (see goal-schemas.md). This
is bash-enforced via dependent-unblock.sh (rb-428 lineage; siblings:
experience-archive-goal.sh, findings-gate.sh, decision-rules-append.sh,
loop-state-save.sh, skill-quality-score.sh, journal-append.sh). The wrapper
walks both world and agent aspirations.jsonl, removes <completed-goal-id>
from any goal's blocked_by, and prepends `## Predecessor Output (
Bash: bash core/scripts/dependent-unblock.sh --goal <completed-goal-id> --summary "${output_summary}"
The wrapper stamps unblocked_by = <completed-goal-id> and
unblocked_summary = ${output_summary} on each dependent goal so consumers
(downstream agents, audit tooling, postcompact-restore) can see which
predecessor produced the injected text. Replaces the prior LLM-orchestrated
loop that was silently drifting (early sessions skipped the description
prepend; the rb-428 family hoists each such loop into a single bash unit).
On Fail
Bash: aspirations-update-goal.sh --source {source} <goal-id> status pending # retry next cycle
log "Goal executed but verification check failed"
# Breadcrumb: postcompact-restore surfaces the last 10 diary entries to the next turn
echo '{"entry_type":"failure","goal_id":"<goal.id>","content":"Verify failed: <reason>"}' | bash core/scripts/execution-diary.sh append
# Post what was learned from the failure (not just "failed")
IF source == "world":
failure_summary = one-line summary of what was attempted and why it failed
echo "${failure_summary}" | Bash: board-post.sh --channel coordination --type release --tags ${goal.id},${goal.category}
Recurring Streak Logic
Recurring goals NEVER set status to "completed" — they stay "pending".
Goal-selector time gate prevents re-selection until interval_hours elapses.
interval = goal.interval_hours (fallback: remind_days * 24, default: 24)
elapsed = hours_since(goal.lastAchievedAt)
Bash: aspirations-update-goal.sh --source {source} <goal-id> lastAchievedAt "$(date +%Y-%m-%dT%H:%M:%S)"
if elapsed is not None and elapsed > 2 * interval:
new_streak = 1 # Missed interval — reset
else:
new_streak = currentStreak + 1
Bash: aspirations-update-goal.sh --source {source} <goal-id> currentStreak <new_streak>
Bash: aspirations-update-goal.sh --source {source} <goal-id> longestStreak <max(new_streak, longestStreak)>
Return Protocol
See .claude/rules/return-protocol.md — last action must be a tool call, not text.
Chaining
- Called by:
/aspirationsorchestrator (Phase 5) - Calls:
aspirations-update-goal.sh --source {source} - Reads: Goal verification field, hypothesis result