name: ev-loop-confidence
description: >-
Execution loop for tiered-transform work. Runs a phase as a
sequence of tiers, each tier processing a batch of files under a tier
contract, gated by evaluator verdicts and pre-flight checks. Writes
tactical retros between tiers. Dispatches to the bin/loom CLI
directly; composes /guild-validate; composes no other loop. Use
when a phase is a bulk transform, audit, or find-replace-style
operation across many files.
argument-hint: " "
user-invocable: true
allowed-tools: Read, Write, Edit, Bash, Agent, Skill, mcp__github__get_file_contents, mcp__github__subscribe_pr_activity, Bash(loom *), Bash(guild *), Bash(griot *)
/ev-loop-confidence
Execute one phase of a project as a confidence loop: tiered transforms, ratcheting from small/safe to large/risky, with an evaluator verdict per unit and a tactical retro per tier.
Composes: bin/loom CLI (via Bash) for substrate
operations; /guild-validate (via the Skill tool) for the antagonist
panel.
Does not compose: other loops. Peer loops are invoked by the router,
not by each other. No ambient /loom-* skills — substrate plumbing
dispatches directly to the CLIs (see § Substrate compositions).
Format reference: docs/LOOM-CONVENTIONS.md (plugin-relative
path; present in every install of the loom plugin).
Skill invocations like /guild-validate below mean
Skill(skill: <name>, args: "…"). CLI invocations like
loom phase update mean Bash("loom phase update <args>").
Antagonist evaluation runs through /guild-validate, which spawns
evaluator agents in parallel via /guild-spawn; the loop itself never
calls the Agent tool directly.
Preflight
/ev-run owns the substrate preflight (Tier 1 presence, Tier 2
format-skew, Tier 3 freshness) and runs it once per dispatch as
the single point where freshness is gated — see /ev-run § Preflight
for the canonical probe and its tier interpretation. This loop does
not re-implement that logic.
Dispatched by
/ev-run(the normal path): the router already ran the preflight this dispatch. Trust its verdict — do not re-probe.Invoked standalone (not via the router): run the preflight yourself, once, before any substrate op. First the presence check:
Bash("command -v loom guild griot >/dev/null 2>&1 || { echo 'ev-loop-confidence requires loom + guild + griot plugins on PATH. Enable them with: claude plugin enable loom@krambuhl guild@krambuhl griot@krambuhl' >&2; exit 1; }")If that exits non-zero, stop and surface the message verbatim. Then, once the slug is known, run
loom doctor <slug>and apply/ev-run§ Preflight's Tier 2 (format-skew → fall back to repo-localnodeentries) and Tier 3 (freshness → surfaceguild-cache-skew/guild-codegen-driftadvisories) interpretation. Both tiers are advisory, not blocking — surface and proceed, never stop.
Substrate compositions
Every substrate operation this loop performs dispatches directly to
bin/loom, bin/griot, or bin/guild — no ambient
skills, no trout scripts. The unit/tier loop steps below cite recipes
by name (e.g. "checkpoint per § Compose PR"). All § <Recipe>
references in this body resolve in docs/SUBSTRATE-COMPOSITIONS.md.
For loom verb shapes and event vocabulary, see
docs/LOOM-CONVENTIONS.md.
Arguments
<project-slug-or-path>— resolved by loom's standard slug resolution (full slug → date-less suffix → relative or absolute path).<phase-number>— which phase of the project to run. Must exist in the manifest and not be incompletedstate.
If <phase-number> is missing, already completed, or the slug does
not resolve, stop and ask the user rather than guessing.
Scope directory
This loop creates and uses a scope directory at
./projects/<slug>/<phase-slug>/ where <phase-slug> is a kebab-case
form of the phase name. Inside:
inventory.md— the full list of files in scope, generated in step 1
Create this directory if it doesn't exist. Phase state lives in the
project's manifest.toml (via § Phase update); tactical retros live
in the project's retros/ directory (via § Retro write). The scope
directory is now just the inventory home.
Phase-level process
Plan
Every phase runs a multi-engineer design pass once before Step 0
(pre-flight). The plan output becomes shared reference material
for every tier in the phase (cited in each tier contract's Inputs:
line). This step is always-on: the loop invokes
/guild-plan at phase start regardless of explicit
configuration; an optional **Plan**: override from the parsed
plan overrides defaults.
Default behavior (no **Plan**: override in the parsed plan):
engineers= all currently registeredplan-*agents, resolved via glob of.claude/agents/plan-*.md.topic= the phase name.rounds= 1.
Override — read the phase's plan field from
loom parse-plan <slug> (a phase-level **Plan**: block wins
over the plan-level one). loom parse-plan hands off the raw block
string as the single source; do not re-grep PLAN.md. This loop parses
the semicolon-delimited DSL:
engineers=<comma-separated names>; recipe=<name>; topic=<one-line topic>; rounds=<N>
Any field in the block overrides the corresponding default. Partial blocks are allowed.
Recipe citation — recipe=<name> names a panel recipe instead of
inlining its members: the loop resolves it via guild recipe <name>
(which emits {name, members}) and uses members as the engineer
list. Prefer this for a known multi-domain panel —
recipe=design-systems rather than spelling out
engineers=plan-composition,plan-abstraction,....
recipe= and engineers= are mutually exclusive; if both appear, stop
and surface a contract error. guild recipe fails loud on an unknown
name (recipe-not-found, non-zero exit) — surface it and stop, never
fall back to the glob or an empty panel (a mis-cited recipe must not
silently degrade to a thin plan).
Plan artifact path:
projects/<slug>/plans/<phase-number>-<topic-slug>.md. Create
the parent directory if it doesn't exist.
Per-round invocation: for each round 1..N, invoke
/guild-plan via the Skill tool with engineers=<list>,
brief=<topic + any phase context>, plan=<path>. The skill
auto-detects round number from existing file state, so re-running is
idempotent (a re-invocation with the same plan file detects
existing rounds and appends a NEW round). For round 2+, the skill
constructs per_agent_context from prior round state so engineers
can address contradictions.
Bootstrapping case (no engineers registered): if the
.claude/agents/plan-*.md glob returns zero matches AND no
explicit engineers= override is given, log a one-line note ("no
plan engineers registered — skipping plan step") and
proceed directly to Step 0.
L-004 session-boundary: if any of the resolved plan-*
engineers were authored in the current session, drop them from the
effective list manually and surface the override in the next tier's
first checkin Notes for the PR. The runtime registry is loaded once
per Claude Code process start; /clear is NOT a session boundary.
Step 0. Pre-flight
Before any work:
- Refresh state per § State refresh.
- Confirm working tree is clean (
git status --porcelain). If not, stop and ask the user to commit or stash. - Branch state. If the manifest's phase has no
branchyet (first tier in this phase), cut a fresh branch from updatedmain—git checkout main && git pull --ff-only origin main && git checkout -b <branch-name>— using the naming convention fromdocs/LOOM-CONVENTIONS.md§ Branch naming:<project-name>.<phase-lazy-name>(e.g.loom-absorb-draft.phase-7-griot-writes). Otherwise confirm the current branch matches the phase's recorded branch in the manifest; if not, stop and ask whether to switch. Branch hygiene: every checkin / phase / event write below commits to whatever branch is checked out, so re-confirmgit branch --show-currentis the phase branch before the first write — a write on the wrong branch strands the work (seedocs/AGENT-CONVENTIONS.md§ Branch hygiene before substrate writes). - Run the verification commands from the manifest's
[config]as a baseline. Record exit status. A red baseline before any work means the loop stops — you are not making a red build redder.
Step 1. Coverage before transforms
Build inventory.md listing every file or item in scope for this phase.
The phase description in PLAN.md or the manifest's [config] tells you the pattern
(e.g. "all .ts/.tsx files using ESLint disable comments"). Use git ls-files,
grep, find, or equivalent to enumerate. Include counts.
Format:
# Scope inventory — Phase <N> <name>
**Generated**: YYYY-MM-DD HH:MM
**Total items**: <count>
**Pattern**: <description of what makes an item in-scope>
## Items
- [ ] path/to/file-1.ts (tier: <tier>)
- [ ] path/to/file-2.tsx (tier: <tier>)
Tier assignment is a judgment call (see Tier assignment below). Assign tiers as you build the inventory, or leave them unassigned and prompt the user.
Do not begin transforms until the inventory is complete. Partial inventory = unknown blast radius.
Step 2. Tier assignment
Divide inventory items into tiers of increasing risk/complexity:
- Tier 1 — mechanical, obvious, identical across items
- Tier 2 — same shape with small variations, low risk
- Tier 3 — requires judgment, possible side effects
- Tier 4+ — bespoke, high risk, may need human-paired work
If tier 4+ items appear, consider routing them to /ev-loop-interactive
rather than handling here. Surface this to the user and ask.
Step 3. Execute tiers in order
For each tier, run a tier loop (see Tier-level process below). Between tiers, write a tactical retro and re-run pre-flight.
Step 4. Phase close
When all tiers in this phase are complete, the loop's job is to open
the PR and stop — the release boundary. This mirrors the interactive
loop's Step 3 and applies the same semantics
(docs/AGENT-CONVENTIONS.md § Guild-offload posture, release-boundary
semantics — the single source); confidence adopts them, it does not
re-define them.
- Verify every inventory item is checked off.
- Run full verification.
- Ensure the latest checkin exists.
- Open / refresh the PR per § Compose PR so it reflects the final state.
- Stop. Leave the phase
--status=in-progress— do not mark itcompletedhere. Return control to the router.
Completion is merge-gated, not phase-close-gated. A phase becomes
completed only when its PR has merged, and the router makes that
transition: the next /ev-run derives live PR state via loom pr discover, sees MERGED, advances the phase, and dispatches the next
phase off a freshly-pulled base. Marking completed here would tell the
router the dependency is satisfied and dispatch the next phase against
an unmerged parent. (The "Output to router" Status: completed below is
the loop's run status — "ran cleanly to phase close" — not the
manifest phase status, which stays in-progress until merge.)
Full-stack option (--phases=all). As in the interactive loop:
instead of a ready PR, open a draft PR per phase (loom pr open --draft) and auto-advance to the next phase, cutting its branch
stacked on the current phase branch (gt create, not main) so the
dependency is satisfied by the stack without a merge. Emit
auto-mode-converged at each clean phase close. At stack end, stop and
leave the drafts for the human (no auto-ready, no auto-merge). A
closed gate-and-ratchet mid-stack halts the auto-advance at that
phase's draft PR — the ratchet is the confidence loop's natural stop
(see § Gate-and-ratchet).
Escape hatch. When an armed run cannot finish — a closed gate from a
budget exhaust or an unresolvable stall, a fork with no panel raiseable,
or the per-phase fork-panel cap exceeded — the loop stops into a
reviewable state per docs/AGENT-CONVENTIONS.md § Guild-offload posture
(escape hatch) + § Budget-exhausted recovery (the single source for the
artifact shapes — do not restate them): open a draft PR with
work-so-far, write UNRESOLVED.md + RECOVERY-STATUS.json plus the
closed-gate / undecided-fork reason into the PR body, emit
auto-mode-budget-exhausted, and stop — never self-deciding the stall.
Under --phases=all this halts the auto-advance at the stalled phase's
draft PR.
Tier-level process
Each tier runs as a sequence of units. A unit is one batch of inventory items transformed together — sized so that one checkin covers it cleanly.
Batch sizing
- Tier 1: batch of 10–30 items (mechanical, cheap to redo)
- Tier 2: batch of 5–15 items
- Tier 3: batch of 1–5 items
- Tier 4+: batch of 1 item
Size down if verification grows slow or evaluator flags pile up. Size up if you're burning checkins on trivial changes.
Tier contract (a specialization of unit contract)
Before the first unit of a tier, write a tier contract as the first checkin of that tier. The Contract section includes tier-wide rules that every unit in the tier must satisfy:
## Contract
- **Goal**: apply <transform> to all Tier <N> items
- **Acceptance criteria**:
- Every item in this tier is updated
- `<verification command>` passes after each batch
- No unrelated files modified
- **Rules applied**:
- <style/lint rules>
- Verification: `<command>`
- **Disqualifiers**:
- Any regression in <area>
- Any file in scope left untouched
- **Inputs**: inventory.md Tier <N> items
Subsequent units inside the tier can reference the tier contract instead
of restating it, as long as the unit's checkin contract says
Rules applied: tier-<N> contract (see checkin <NN>) and lists only
unit-specific deltas.
Unit loop
For each unit inside a tier:
Negotiate. Compose a Checkin JSON with the Contract substructure populated and write it per § Checkin write. The file still needs the full required envelope (
schema_version,number,created,branch) even when onlycontractcarries content —loom checkin writerejects a file missing those. Pick the items for this batch from inventory.md (mark them with a tier tag if not already). Execution / Verdict / Notes-for-PR substructures stay empty for now — they're filled in by a later checkin once the work resolves. (Loom checkins are immutable, so the "fill it in later" pattern is "write a new numbered checkin," not "edit the existing one." The negotiation checkin and the resolution checkin together tell the story.) For a tier unit that resolves in one pass, a single complete checkin after the panel approves is equally valid — the create-once store doesn't require the two-checkin split.Under the armed posture (
--mode=auto—docs/AGENT-CONVENTIONS.md§ Guild-offload posture, gate-to-resolver table): tier/unit-contract negotiation is a convergent gate, so the human approve/redirect is replaced byevaluator-contract-fitauditing the contract against the tier's inputs —approvedis the accept,flaggedis a redirect (each flagged finding addressed, then re-audit). NoAskUserQuestion. Convergence and the two-budget defaults follow the convention (§ Auto-mode and the two-budget shape); on budget exhaust the unit fails to negotiate cleanly and the loop takes the escape hatch (Step 4) rather than committing a half-ambiguous tier contract.Execute. Do the transform on the batch. Keep to scope.
Execution forks under the armed posture. A genuine fork in the confidence loop is most often a tier-assignment / batch-sizing judgment the loop cannot settle from the tier contract (the interactive loop's forks are more often mid-deliverable design choices — § Both loops). When armed, the loop does not ask the human; it routes the fork through
/guild-plan(via theSkilltool — it composes/guild-spawn; never a directAgentcall) and applies the convention's convergence rule (docs/AGENT-CONVENTIONS.md§ Fork-to-panel convergence rule — the single source; do not restate it). Two bounds, both routing to the escape hatch (Step 4) rather than letting the loop self-decide: empty-roster safety (an emptyplan-*glob with no explicit engineers → escape hatch, MUST NOT self-decide) and the per-phase fork-panel cap = 5 (the 6th fork → escape hatch). Record the fork and its resolution in the checkin.Implementer delegation (per-unit switch, default ON). This loop defaults to delegating the transform: compose the implementer candidate set via
Bash("guild derive-panel --phase=implementer"), select theimplementer-<domain>matching the tier's domain, and delegate the write through/guild-spawnwith the tier contract as the brief. Route through/guild-spawn, not a directAgentcall —Agentis inallowed-tools, but the seam through guild coordination is deliberate. The operator can opt a tier/unit out back to inline transform when a batch wants a human hand. Record the switch state in the checkin'snotes_for_pr. Delegation changes who writes, never whether it's gated — step 3 (Evaluate) fires regardless. While noimplementer-*agents are registered (the registry-mirror lag), fall back to inline transform with a one-line note; the live-spawn proof is the Phase 6 runtime gate.Evaluate. Invoke
/guild-validatevia theSkilltool to run the antagonist panel against this unit. Compose the panel by auto-derivation from the unit's file list (see § Panel auto-derivation below) — the result is contextual to the artifact rather than a fixed list.evaluator-contract-fitis always included as the baseline. If a panel spawn fails with a transient529 Overloaded, retry with backoff before treating it as a real failure (seedocs/AGENT-CONVENTIONS.md§ Long-loop resilience) — a single 529 mid-panel is capacity noise, not a verdict. The spec (file-type → evaluator mapping, precedence list, tokens-vs-naming boundary) lives indocs/PANEL-COMPOSITION.md; the derivation logic is § Derive panel.agents: comma-separated output of § Derive panel (paths composed per § Panel auto-derivation below).packet: build a dense packet (see shape below). The substrate default is dense — verbose packets correlate with budget-exhaustion failures underevaluator-*'smaxTurns=5. Live examples in PR #13's checkins 02-06.
Dense packet shape (three sections, in this order):
## How to evaluate efficiently You have a tight tool-use budget (maxTurns=5). Pre-computed verification below is authoritative — do not re-run lint/build/ test/grep unless you find specific evidence the artifact summary contradicts itself. Spot-check at most ONE or TWO criteria with targeted reads, then emit `VERDICT:`. If you cannot reach a verdict within budget, emit `VERDICT: flagged` with `parse-failure: budget-exhausted` so the loop escalates rather than no-ops. ## Contract (paraphrased) <Goal in 1-3 sentences. Acceptance criteria as a numbered list, condensed (full text in <checkin path>). Disqualifiers as a single-line summary. Inputs as a bulleted list of paths.> ## Artifact **Files** (in scope for this batch): <bulleted paths> **Pre-computed verification (authoritative — do not re-run)**: - `npm run lint` → <result> - `npm run build` → <result> - `npm run test` → <result> - <other verification: tier-specific checks, codemod diff samples, etc.> **Direct mappings to acceptance criteria** (for spot-check efficiency): <AC N → file:line ranges or section pointers> **Iteration story** (if applicable): <prior panel runs and what was addressed; helps the evaluator avoid re-flagging fixed issues> ## Original ask <verbatim from PLAN.md phase description or the triggering message> ## Suggested spot-check (one tool use) <the most efficient single read for confirming the most-suspicious criterion; optional but reduces investigation thrashing>Pass the contract as a paraphrased summary plus the checkin file path link, not verbatim — the checkin file is in the repo and renders one click away. The packet's job is orientation; the depth is one click away.
The skill returns a structured verdict (
approved|flagged|flagged-conflict) withblocking_findings,advisory_findings,cli_runs, andconflictslists. See.claude/agents/evaluator-base.mdfor the per-evaluator verdict shape that/guild-validateparses and aggregates.Iterate or commit.
If flagged: write a new resolution checkin (per § Checkin write) recording the remedy in Execution + Verdict (still flagged from the panel's view; the resolution checkin documents what changed). Re-invoke
/guild-validate. Maximum 2 re-iterations per unit — on the third flag, stop and escalate to the user.Fixer delegation (per-unit switch, default ON). This loop defaults to delegating the remedy: compose
fixer-<domain>viaBash("guild derive-panel --phase=fixer"), hand the flagged-finding packet to it through/guild-spawnfor the minimal remedy (fixer'sdefault_personalityispragmatist), then re-invoke/guild-validate. Route through/guild-spawn, not a directAgentcall. The operator can opt a tier/unit out back to inline fix. The re-evaluation gates regardless — delegation changes who fixes, not whether it's re-gated. Record the switch state in the resolution checkin'snotes_for_pr. While nofixer-*agents are registered (the registry-mirror lag), fall back to inline fix with a one-line note; the live-spawn proof is the Phase 6 runtime gate.If approved: write a finalization checkin with Execution / Verdict
approved/ Notes-for-PR populated. Check off the inventory items.
This loop does not append guild findings to
.guild-findings.jsonl(unlike/ev-loop-interactivestep 4.5, it has no per-finding frequency-tracking step), so there is no findings-harvest step here — there is nothing in the scratch stream to fold. If this loop ever gains a findings-append step, mirror/ev-loop-interactivestep 4.5e and runloom findings harvest <slug> --branch=<branch>at unit close (serial, single-writer, never mid-panel).Scope-shift detection (restrictive default). Runs only on approved units (flagged-and-iterating units skip this step). Look for signals that PLAN.md is stale; offer a plan revision (per § Revise PLAN.md) ONLY on two-signal concurrence.
Signal sources:
- Evaluator finding mentioning a missing or changed phase, deliverable, or load-bearing decision (in either blocking or advisory findings).
- User comment during the unit that proposed a change to plan structure (not just tactical refinement of this unit's contract).
- Plan contradiction (round 2+ plan surfaces a disagreement between engineers that current PLAN doesn't resolve).
- Phase boundary (this unit is the last in its phase OR the next phase is about to start).
Two-signal-concurrence rule: offer a plan revision only when 2+ signal sources fire for the same shift. Single signals get a note (see below); the loop does NOT interrupt.
Offer flow: surface a short paragraph naming the two signals and a proposed one-line rationale. Use
AskUserQuestion(or natural-language confirm) for accept/decline/defer. Default: decline (no interrupt unless the user explicitly accepts).On accept: integrate the change per § Revise PLAN.md. After the revision lands, proceed to step 6 (Phase update). Do not re-execute the unit.
On single signal (no concurrence): append the signal to the unit's
notes_for_prarray in the checkin JSON:signal: <signal type>: <one-line description> (single signal; no revise offered)Loop continues normally.
On zero signals: no action. Loop continues.
Phase update. After a checkin lands, the checkin-created event auto-fires from § Checkin write. Then update phase state per § Phase update with
--status=in-progress --branch=<branch>(no need to set--prhere; the PR reference lives on the phase only after § Compose PR runs).Checkpoint? Evaluate the should-checkpoint policy (below). If any condition holds, refresh the PR per § Compose PR so it tracks the latest state. Otherwise continue to the next unit.
Panel auto-derivation
The agents list passed to /guild-validate is computed from the
unit's file list at evaluation time, not hardcoded. The composition
rules (file-type → evaluator mapping, precedence ordering, conflict
policy) live in docs/PANEL-COMPOSITION.md and are the
source of truth.
- Collect file paths. Take the unit's changed and created files
from the tier batch. Practical recipe:
git status --shortminus deletions and substrate carryovers, plus any freshly-authored untracked paths. - Derive the panel per § Derive panel, passing the file paths
from step 1. Use the verb's stdout as the
agents=argument verbatim. - Pass to
/guild-validate. Use the verb's stdout as theagents=argument verbatim. Confidence-loop tiers tend to touch a single file family (a codemod over .module.css, a rename over .tsx imports, etc.), so the derived panel is typically narrower than the interactive loop's — common case is contract-fit plus one or two domain lenses.
Edge cases follow the same shape as /ev-loop-interactive's § Panel
auto-derivation:
- Empty file list → contract-fit only.
- Substrate-only files → contract-fit only.
- L-004 session-boundary: if the derive-panel output includes an
evaluator authored during this session, drop it from
agents=manually and note the override in the checkin's Notes section.
Should-checkpoint policy
Checkpoint (refresh the PR per § Compose PR) when any of the following hold. All are read off state — there is no callable function.
- A full tier has just finished.
- The number of units since last PR update ≥ 5.
- Verification is currently green and we're about to start a riskier tier.
- The user has explicitly asked for a checkpoint.
Do not checkpoint mid-tier unless verification is green.
Tactical retro between tiers
Immediately after the last unit of a tier is approved, before moving to the next tier:
- Re-run pre-flight (working tree, verification).
- Compose the retro JSON (terse fields:
items_processed,units,verification_at_close,what_went_smoothly,what_bit_us,adjustment_for_next) and write per § Retro write with--type=session --phase=<N> --tier=<M>.
Tactical retros are short and specific. Strategic retrospection happens
at /loom-archive, not here.
Gate-and-ratchet
Before starting tier N+1, the gate closes if:
- Any tier N unit is still flagged.
- Verification is red.
- The tier retro identified a blocker for tier N+1.
A closed gate stops the loop and reports to the user. The user decides whether to resume, re-tier, or bail.
Under the armed posture (--mode=auto), the gate-and-ratchet is the
natural autonomous stop — the confidence loop's structural
equivalent of the interactive loop's phase boundary
(docs/AGENT-CONVENTIONS.md § Guild-offload posture, § Both loops). A
closed gate is not an inline AskUserQuestion: the run surfaces it at
the release boundary — it stops on the open PR (or, on a budget
exhaust / unresolvable stall, the escape hatch: a draft PR +
UNRESOLVED.md carrying the closed-gate reason), per Step 4. The loop
does not auto-resume past a closed gate and does not self-decide a
re-tier; a closed gate under autonomy is a stop-and-surface point even
mid-stack, exactly as the ratchet is designed to be. The non-armed
behavior above (report + the user decides inline) is unchanged when the
posture is not armed.
Message-driven redirects
If the router passes a message like "address feedback on #14", this loop:
- Triage the PR's comments per § Triage PR comments + draft responses to get a classified list.
- Treats each
blockercomment as a new unit in the current tier (or a new tier if the feedback rewrites scope). - Iterates the unit loop. When done, refreshes the PR per § Compose PR.
Rules
- Coverage before transforms. Do not start tier 1 without a complete inventory.
- Pre-flight before any tier. Every tier starts from a known-good state.
- One contract per tier, restated per unit only for deltas.
- Evaluator always runs. No exceptions. Never self-approve.
- Scope discipline. Fixes outside this phase's pattern get noted in "Notes for the PR" and are deferred — not absorbed silently.
- Record corrections in the checkin. If the user redirects a unit
mid-flight, overrides a decision, or the evaluator flags something
the generator defaulted to incorrectly, note it verbatim in the
checkin JSON's
execution.corrections[]array. The session handoff (§ Save session) surfaces unresolved corrections intoopen_threads; § Capture finding (from-checkin pathway) promotes notable ones intolearnings/session-notes/at session close, and/griot-compactdecides which get promoted further. The loop itself never writes tolearnings/. - No
AskUserQuestionunder the armed posture. When--mode=autois armed (docs/AGENT-CONVENTIONS.md§ Guild-offload posture), the loop makes noAskUserQuestioncalls mid-phase. Tier-contract negotiation goes toevaluator-contract-fit; a genuine fork goes to a/guild-planpanel or the escape hatch; a closed gate-and-ratchet surfaces at the release boundary. The harness would not silence a strayAskUserQuestioneven in its own auto mode (RESEARCH § A), so any mid-phase call would hang an unattended run — this is an invariant, not a preference. The non-armed default keeps every human touchpoint. - Delegation default ON is preserved under the posture. Implementer / fixer delegation defaults ON in this loop (bulk transform → delegate the write), and the armed posture does not change that — the posture changes who decides (a panel vs the human), never who writes. The interactive loop's OFF default is the deliberate divergence (§ Both loops).
- No emojis.
Output to router
On any termination — phase close, closed gate, or escalation — return:
- Status: completed | gated | escalated | aborted
- Phase:
<N><name> - Tiers run: list with counts (e.g.,
1: 3 files, 2: 7 files) - Last checkin:
<NN> - Last PR update:
<url>ornone - PR subscription:
active(subscribed at open) |unavailable(localgh-only session — router falls back to § Wait for merge) |n.a.(no PR opened this run) - Reason (if not completed): one-line cause
- Next action: what the router or user should do next
Failure modes
- Pre-flight fails → stop, report, do not proceed.
- Evaluator flags 3× → stop, escalate to user, do not auto-merge or force-approve.
- Inventory shrinks mid-phase (items disappear) → stop; something moved under you. Regenerate inventory and reconcile.
- Working tree dirty at boundary → stop; do not stash silently.