ev-loop-confidence

Execution loop for tiered-transform work. Runs a phase as a sequence of tiers, each tier processing a batch of files under a tier contract, gated by evaluator verdicts and pre-flight checks. Writes tactical retros between tiers. Dispatches to the bin/loom CLI directly; composes /guild-validate; composes no other loop. Use when a phase is a bulk transform, audit, or find-replace-style operation across many files.

By krambuhl. Updated 6/3/2026

name: ev-loop-confidence
description: >-
  Execution loop for tiered-transform work. Runs a phase as a sequence of tiers, each tier processing a batch of files under a tier contract, gated by evaluator verdicts and pre-flight checks. Writes tactical retros between tiers. Dispatches to the bin/loom CLI directly; composes /guild-validate; composes no other loop. Use when a phase is a bulk transform, audit, or find-replace-style operation across many files.
argument-hint: " "
user-invocable: true
allowed-tools: Read, Write, Edit, Bash, Agent, Skill, mcp__github__get_file_contents, mcp__github__subscribe_pr_activity, Bash(loom *), Bash(guild *), Bash(griot *)

/ev-loop-confidence

Execute one phase of a project as a confidence loop: tiered transforms, ratcheting from small/safe to large/risky, with an evaluator verdict per unit and a tactical retro per tier.

Composes: bin/loom CLI (via Bash) for substrate operations; /guild-validate (via the Skill tool) for the antagonist panel. Does not compose: other loops. Peer loops are invoked by the router, not by each other. No ambient /loom-* skills — substrate plumbing dispatches directly to the CLIs (see § Substrate compositions).

Format reference: docs/LOOM-CONVENTIONS.md (plugin-relative path; present in every install of the loom plugin).

Skill invocations like /guild-validate below mean Skill(skill: <name>, args: "…"). CLI invocations like loom phase update mean Bash("loom phase update <args>"). Antagonist evaluation runs through /guild-validate, which spawns evaluator agents in parallel via /guild-spawn; the loop itself never calls the Agent tool directly.

Preflight

/ev-run owns the substrate preflight (Tier 1 presence, Tier 2 format-skew, Tier 3 freshness) and runs it once per dispatch as the single point where freshness is gated — see /ev-run § Preflight for the canonical probe and its tier interpretation. This loop does not re-implement that logic.

  • Dispatched by /ev-run (the normal path): the router already ran the preflight this dispatch. Trust its verdict — do not re-probe.

  • Invoked standalone (not via the router): run the preflight yourself, once, before any substrate op. First the presence check:

    Bash("for tool in loom guild griot; do command -v $tool >/dev/null 2>&1 || { echo 'ev-loop-confidence requires loom + guild + griot plugins on PATH. Enable them with: claude plugin enable loom@krambuhl guild@krambuhl griot@krambuhl' >&2; exit 1; }; done")
    

    If that exits non-zero, stop and surface the message verbatim. Then, once the slug is known, run loom doctor <slug> and apply /ev-run § Preflight's Tier 2 (format-skew → fall back to repo-local node entries) and Tier 3 (freshness → surface guild-cache-skew / guild-codegen-drift advisories) interpretation. Both tiers are advisory, not blocking — surface and proceed, never stop.
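The presence check above can also be pictured as a per-tool lookup. The following is a minimal Python sketch, not part of the loop itself: `missing_tools` and its injectable `which` parameter are hypothetical names introduced for illustration, and only the three tool names come from the text. It checks each tool individually so that one missing plugin is enough to fail.

```python
import shutil
from typing import Callable, Iterable, Optional

# Tool names taken from the presence check above.
REQUIRED_TOOLS = ("loom", "guild", "griot")


def missing_tools(
    tools: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the tools that cannot be found on PATH, in the order given.

    `which` is injectable only so the check can be exercised without
    touching the real PATH (an assumption made for this sketch).
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        raise SystemExit(f"missing on PATH: {', '.join(missing)}")
```

An empty list means the standalone run may continue to `loom doctor <slug>`; any entry means stop and surface the message.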

Substrate compositions

Every substrate operation this loop performs dispatches directly to bin/loom, bin/griot, or bin/guild — no ambient skills, no trout scripts. The unit/tier loop steps below cite recipes by name (e.g. "checkpoint per § Compose PR"). All § <Recipe> references in this body resolve in docs/SUBSTRATE-COMPOSITIONS.md. For loom verb shapes and event vocabulary, see docs/LOOM-CONVENTIONS.md.

Arguments

  • <project-slug-or-path> — resolved by loom's standard slug resolution (full slug → date-less suffix → relative or absolute path).
  • <phase-number> — which phase of the project to run. Must exist in the manifest and not be in completed state.

If <phase-number> is missing, already completed, or the slug does not resolve, stop and ask the user rather than guessing.

Scope directory

This loop creates and uses a scope directory at ./projects/<slug>/<phase-slug>/ where <phase-slug> is a kebab-case form of the phase name. Inside:

  • inventory.md — the full list of files in scope, generated in step 1

Create this directory if it doesn't exist. Phase state lives in the project's manifest.toml (via § Phase update); tactical retros live in the project's retros/ directory (via § Retro write). The scope directory is now just the inventory home.
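As an illustration of how the scope directory path might be derived, here is a small Python sketch. The exact kebab-case rule is an assumption (lowercase, runs of non-alphanumerics collapsed to one hyphen); loom may normalize phase names differently, and `kebab_case` and `scope_dir` are hypothetical helper names.

```python
import re
from pathlib import PurePosixPath


def kebab_case(name: str) -> str:
    """Lowercase the name and collapse each run of non-alphanumerics into one hyphen.

    Assumed rule for this sketch; the real slug rule is not specified here.
    """
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def scope_dir(slug: str, phase_name: str) -> PurePosixPath:
    """Build projects/<slug>/<phase-slug> relative to the repository root."""
    return PurePosixPath("projects") / slug / kebab_case(phase_name)
```

For example, `scope_dir("demo", "Remove ESLint disables")` yields `projects/demo/remove-eslint-disables`, with `inventory.md` living directly inside it.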

Phase-level process

Plan

Every phase runs a multi-engineer design pass once before Step 0 (pre-flight). The plan output becomes shared reference material for every tier in the phase (cited in each tier contract's Inputs: line). This step is always-on: the loop invokes /guild-plan at phase start regardless of explicit configuration; an optional **Plan**: override from the parsed plan overrides defaults.

Default behavior (no **Plan**: override in the parsed plan):

  • engineers = all currently registered plan-* agents, resolved via glob of .claude/agents/plan-*.md.
  • topic = the phase name.
  • rounds = 1.

Override — read the phase's plan field from loom parse-plan <slug> (a phase-level **Plan**: block wins over the plan-level one). loom parse-plan hands off the raw block string as the single source; do not re-grep PLAN.md. This loop parses the semicolon-delimited DSL:

engineers=<comma-separated names>; recipe=<name>; topic=<one-line topic>; rounds=<N>

Any field in the block overrides the corresponding default. Partial blocks are allowed.

Recipe citation: recipe=<name> names a panel recipe instead of inlining its members. The loop resolves it via guild recipe <name> (which emits {name, members}) and uses members as the engineer list. Prefer this for a known multi-domain panel: recipe=design-systems rather than spelling out engineers=plan-composition,plan-abstraction,.... recipe= and engineers= are mutually exclusive; if both appear, stop and surface a contract error. guild recipe fails loud on an unknown name (recipe-not-found, non-zero exit). Surface it and stop; never fall back to the glob or an empty panel, because a mis-cited recipe must not silently degrade to a thin plan.
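The override rules above can be sketched as a small parser. This is a minimal Python illustration, not the loop's actual implementation: `PlanConfig` and `parse_plan_block` are hypothetical names, rejecting unknown fields is an assumption, and a topic containing a semicolon is not handled. Resolving a recipe to its members (via guild recipe) is left to the caller.

```python
from dataclasses import dataclass, field
from typing import Optional

KNOWN_FIELDS = {"engineers", "recipe", "topic", "rounds"}


@dataclass
class PlanConfig:
    engineers: list[str] = field(default_factory=list)
    recipe: Optional[str] = None
    topic: str = ""
    rounds: int = 1


def parse_plan_block(
    block: str, default_engineers: list[str], default_topic: str
) -> PlanConfig:
    """Apply a semicolon-delimited override block on top of the defaults."""
    config = PlanConfig(engineers=list(default_engineers), topic=default_topic)
    seen: dict[str, str] = {}
    for part in block.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing semicolon
        key, sep, value = part.partition("=")
        key, value = key.strip(), value.strip()
        if not sep or key not in KNOWN_FIELDS:
            # Assumption: unknown fields are an error rather than ignored.
            raise ValueError(f"unrecognized plan field: {part!r}")
        seen[key] = value
    if "engineers" in seen and "recipe" in seen:
        raise ValueError("contract error: recipe= and engineers= are mutually exclusive")
    if "engineers" in seen:
        config.engineers = [n.strip() for n in seen["engineers"].split(",") if n.strip()]
    if "recipe" in seen:
        config.recipe = seen["recipe"]
        config.engineers = []  # members come from the recipe, resolved by the caller
    if "topic" in seen:
        config.topic = seen["topic"]
    if "rounds" in seen:
        config.rounds = int(seen["rounds"])
    return config
```

A partial block such as `topic=token rename; rounds=2` keeps the default engineer list and changes only the two named fields.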

Plan artifact path: projects/<slug>/plans/<phase-number>-<topic-slug>.md. Create the parent directory if it doesn't exist.

Per-round invocation: for each round 1..N, invoke /guild-plan via the Skill tool with engineers=<list>, brief=<topic + any phase context>, plan=<path>. The skill auto-detects the round number from existing file state, so the loop does not track rounds itself: a re-invocation with the same plan file detects the existing rounds and appends a new one instead of overwriting them. For round 2+, the skill constructs per_agent_context from prior round state so engineers can address contradictions.

Bootstrapping case (no engineers registered): if the .claude/agents/plan-*.md glob returns zero matches AND no explicit engineers= override is given, log a one-line note ("no plan engineers registered — skipping plan step") and proceed directly to Step 0.

L-004 session-boundary: if any of the resolved plan-* engineers were authored in the current session, drop them from the effective list manually and surface the override in the next tier's first checkin Notes for the PR. The runtime registry is loaded once per Claude Code process start; /clear is NOT a session boundary.

Step 0. Pre-flight

Before any work:

  • Refresh state per § State refresh.
  • Confirm working tree is clean (git status --porcelain). If not, stop and ask the user to commit or stash.
  • Branch state. If the manifest's phase has no branch yet (first tier in this phase), cut a fresh branch from updated main with git checkout main && git pull --ff-only origin main && git checkout -b <branch-name>, using the naming convention from docs/LOOM-CONVENTIONS.md § Branch naming: <project-name>.<phase-lazy-name> (e.g. loom-absorb-draft.phase-7-griot-writes). Otherwise confirm the current branch matches the phase's recorded branch in the manifest; if not, stop and ask whether to switch. Branch hygiene: every checkin / phase / event write below commits to whatever branch is checked out, so re-confirm that git branch --show-current is the phase branch before the first write. A write on the wrong branch strands the work (see docs/AGENT-CONVENTIONS.md § Branch hygiene before substrate writes).
  • Run the verification commands from the manifest's [config] as a baseline. Record exit status. A red baseline before any work means the loop stops — you are not making a red build redder.

Step 1. Coverage before transforms

Build inventory.md listing every file or item in scope for this phase. The phase description in PLAN.md or the manifest's [config] tells you the pattern (e.g. "all .ts/.tsx files using ESLint disable comments"). Use git ls-files, grep, find, or equivalent to enumerate. Include counts.

Format:

# Scope inventory — Phase <N> <name>

**Generated**: YYYY-MM-DD HH:MM
**Total items**: <count>
**Pattern**: <description of what makes an item in-scope>

## Items
- [ ] path/to/file-1.ts  (tier: <tier>)
- [ ] path/to/file-2.tsx  (tier: <tier>)

Tier assignment is a judgment call (see Tier assignment below). Assign tiers as you build the inventory, or leave them unassigned and prompt the user.

Do not begin transforms until the inventory is complete. Partial inventory = unknown blast radius.
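To make the inventory format concrete, the sketch below renders it from an already-enumerated list. It is a hedged Python illustration: `render_inventory` is a hypothetical helper, the enumeration itself (git ls-files, grep, find) is assumed to have happened elsewhere, and sorting the items is a choice made here for stable diffs rather than something the format requires.

```python
from datetime import datetime


def render_inventory(
    phase_number: int,
    phase_name: str,
    pattern: str,
    items: list[tuple[str, int]],
    generated: datetime,
) -> str:
    """Render inventory.md text from (path, tier) pairs."""
    lines = [
        # \u2014 is the dash character used in the heading format above.
        f"# Scope inventory \u2014 Phase {phase_number} {phase_name}",
        "",
        f"**Generated**: {generated:%Y-%m-%d %H:%M}",
        f"**Total items**: {len(items)}",
        f"**Pattern**: {pattern}",
        "",
        "## Items",
    ]
    for path, tier in sorted(items):
        lines.append(f"- [ ] {path}  (tier: {tier})")
    return "\n".join(lines) + "\n"
```

Because the count is derived from the same list that produces the checkboxes, the "Total items" line cannot drift from the items actually listed.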

Step 2. Tier assignment

Divide inventory items into tiers of increasing risk/complexity:

  • Tier 1 — mechanical, obvious, identical across items
  • Tier 2 — same shape with small variations, low risk
  • Tier 3 — requires judgment, possible side effects
  • Tier 4+ — bespoke, high risk, may need human-paired work

If tier 4+ items appear, consider routing them to /ev-loop-interactive rather than handling here. Surface this to the user and ask.

Step 3. Execute tiers in order

For each tier, run a tier loop (see Tier-level process below). Between tiers, write a tactical retro and re-run pre-flight.

Step 4. Phase close

When all tiers in this phase are complete, the loop's job is to open the PR and stop — the release boundary. This mirrors the interactive loop's Step 3 and applies the same semantics (docs/AGENT-CONVENTIONS.md § Guild-offload posture, release-boundary semantics — the single source); confidence adopts them, it does not re-define them.

  • Verify every inventory item is checked off.
  • Run full verification.
  • Ensure the latest checkin exists.
  • Open / refresh the PR per § Compose PR so it reflects the final state.
  • Stop. Leave the phase --status=in-progress — do not mark it completed here. Return control to the router.

Completion is merge-gated, not phase-close-gated. A phase becomes completed only when its PR has merged, and the router makes that transition: the next /ev-run derives live PR state via loom pr discover, sees MERGED, advances the phase, and dispatches the next phase off a freshly-pulled base. Marking completed here would tell the router the dependency is satisfied and dispatch the next phase against an unmerged parent. (The "Output to router" Status: completed below is the loop's run status — "ran cleanly to phase close" — not the manifest phase status, which stays in-progress until merge.)

Full-stack option (--phases=all). As in the interactive loop: instead of a ready PR, open a draft PR per phase (loom pr open --draft) and auto-advance to the next phase, cutting its branch stacked on the current phase branch (gt create, not main) so the dependency is satisfied by the stack without a merge. Emit auto-mode-converged at each clean phase close. At stack end, stop and leave the drafts for the human (no auto-ready, no auto-merge). A closed gate-and-ratchet mid-stack halts the auto-advance at that phase's draft PR — the ratchet is the confidence loop's natural stop (see § Gate-and-ratchet).

Escape hatch. When an armed run cannot finish — a closed gate from a budget exhaust or an unresolvable stall, a fork with no panel raiseable, or the per-phase fork-panel cap exceeded — the loop stops into a reviewable state per docs/AGENT-CONVENTIONS.md § Guild-offload posture (escape hatch) + § Budget-exhausted recovery (the single source for the artifact shapes — do not restate them): open a draft PR with work-so-far, write UNRESOLVED.md + RECOVERY-STATUS.json plus the closed-gate / undecided-fork reason into the PR body, emit auto-mode-budget-exhausted, and stop — never self-deciding the stall. Under --phases=all this halts the auto-advance at the stalled phase's draft PR.

Tier-level process

Each tier runs as a sequence of units. A unit is one batch of inventory items transformed together — sized so that one checkin covers it cleanly.

Batch sizing

  • Tier 1: batch of 10–30 items (mechanical, cheap to redo)
  • Tier 2: batch of 5–15 items
  • Tier 3: batch of 1–5 items
  • Tier 4+: batch of 1 item

Size down if verification grows slow or evaluator flags pile up. Size up if you're burning checkins on trivial changes.
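The sizing guidance can be sketched as a chunking helper. In this Python illustration the upper bounds come from the list above, while `max_batch_size` and `batches` are hypothetical names and starting at the upper bound by default is an assumption; the loop is free to size down or up within the range.

```python
from typing import Optional

# Upper bounds from the batch-sizing list; tier 4 and above always use 1.
MAX_BATCH = {1: 30, 2: 15, 3: 5}


def max_batch_size(tier: int) -> int:
    """Largest batch the sizing guidance allows for a tier."""
    return MAX_BATCH.get(tier, 1)


def batches(items: list[str], tier: int, size: Optional[int] = None) -> list[list[str]]:
    """Split a tier's items into units, never exceeding the tier's upper bound."""
    limit = max_batch_size(tier)
    size = limit if size is None else max(1, min(size, limit))
    return [items[i:i + size] for i in range(0, len(items), size)]
```

Passing a smaller `size` models "size down if verification grows slow"; a requested size above the tier's bound is clamped rather than honored.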

Tier contract (a specialization of unit contract)

Before the first unit of a tier, write a tier contract as the first checkin of that tier. The Contract section includes tier-wide rules that every unit in the tier must satisfy:

## Contract
- **Goal**: apply <transform> to all Tier <N> items
- **Acceptance criteria**:
  - Every item in this tier is updated
  - `<verification command>` passes after each batch
  - No unrelated files modified
- **Rules applied**:
  - <style/lint rules>
  - Verification: `<command>`
- **Disqualifiers**:
  - Any regression in <area>
  - Any file in scope left untouched
- **Inputs**: inventory.md Tier <N> items

Subsequent units inside the tier can reference the tier contract instead of restating it, as long as the unit's checkin contract says Rules applied: tier-<N> contract (see checkin <NN>) and lists only unit-specific deltas.

Unit loop

For each unit inside a tier:

  1. Negotiate. Compose a Checkin JSON with the Contract substructure populated and write it per § Checkin write. The file still needs the full required envelope (schema_version, number, created, branch) even when only contract carries content — loom checkin write rejects a file missing those. Pick the items for this batch from inventory.md (mark them with a tier tag if not already). Execution / Verdict / Notes-for-PR substructures stay empty for now — they're filled in by a later checkin once the work resolves. (Loom checkins are immutable, so the "fill it in later" pattern is "write a new numbered checkin," not "edit the existing one." The negotiation checkin and the resolution checkin together tell the story.) For a tier unit that resolves in one pass, a single complete checkin after the panel approves is equally valid — the create-once store doesn't require the two-checkin split.

    Under the armed posture (--mode=auto; see docs/AGENT-CONVENTIONS.md § Guild-offload posture, gate-to-resolver table): tier/unit-contract negotiation is a convergent gate, so the human approve/redirect is replaced by evaluator-contract-fit auditing the contract against the tier's inputs. approved is the accept; flagged is a redirect (each flagged finding addressed, then re-audit). No AskUserQuestion. Convergence and the two-budget defaults follow the convention (§ Auto-mode and the two-budget shape); on budget exhaust the unit fails to negotiate cleanly and the loop takes the escape hatch (Step 4) rather than committing a half-ambiguous tier contract.

  2. Execute. Do the transform on the batch. Keep to scope.

    Execution forks under the armed posture. A genuine fork in the confidence loop is most often a tier-assignment / batch-sizing judgment the loop cannot settle from the tier contract (the interactive loop's forks are more often mid-deliverable design choices — § Both loops). When armed, the loop does not ask the human; it routes the fork through /guild-plan (via the Skill tool — it composes /guild-spawn; never a direct Agent call) and applies the convention's convergence rule (docs/AGENT-CONVENTIONS.md § Fork-to-panel convergence rule — the single source; do not restate it). Two bounds, both routing to the escape hatch (Step 4) rather than letting the loop self-decide: empty-roster safety (an empty plan-* glob with no explicit engineers → escape hatch, MUST NOT self-decide) and the per-phase fork-panel cap = 5 (the 6th fork → escape hatch). Record the fork and its resolution in the checkin.

    Implementer delegation (per-unit switch, default ON). This loop defaults to delegating the transform: compose the implementer candidate set via Bash("guild derive-panel --phase=implementer"), select the implementer-<domain> matching the tier's domain, and delegate the write through /guild-spawn with the tier contract as the brief. Route through /guild-spawn, not a direct Agent call — Agent is in allowed-tools, but the seam through guild coordination is deliberate. The operator can opt a tier/unit out back to inline transform when a batch wants a human hand. Record the switch state in the checkin's notes_for_pr. Delegation changes who writes, never whether it's gated — step 3 (Evaluate) fires regardless. While no implementer-* agents are registered (the registry-mirror lag), fall back to inline transform with a one-line note; the live-spawn proof is the Phase 6 runtime gate.

  3. Evaluate. Invoke /guild-validate via the Skill tool to run the antagonist panel against this unit. Compose the panel by auto-derivation from the unit's file list (see § Panel auto-derivation below) — the result is contextual to the artifact rather than a fixed list. evaluator-contract-fit is always included as the baseline. If a panel spawn fails with a transient 529 Overloaded, retry with backoff before treating it as a real failure (see docs/AGENT-CONVENTIONS.md § Long-loop resilience) — a single 529 mid-panel is capacity noise, not a verdict. The spec (file-type → evaluator mapping, precedence list, tokens-vs-naming boundary) lives in docs/PANEL-COMPOSITION.md; the derivation logic is § Derive panel.

    • agents: comma-separated output of § Derive panel (paths composed per § Panel auto-derivation below).
    • packet: build a dense packet (see shape below). The substrate default is dense — verbose packets correlate with budget-exhaustion failures under evaluator-*'s maxTurns=5. Live examples in PR #13's checkins 02-06.

    Dense packet shape (five sections, in this order):

    ## How to evaluate efficiently
    
    You have a tight tool-use budget (maxTurns=5). Pre-computed
    verification below is authoritative — do not re-run lint/build/
    test/grep unless you find specific evidence the artifact summary
    contradicts itself. Spot-check at most ONE or TWO criteria with
    targeted reads, then emit `VERDICT:`. If you cannot reach a verdict
    within budget, emit `VERDICT: flagged` with `parse-failure:
    budget-exhausted` so the loop escalates rather than no-ops.
    
    ## Contract (paraphrased)
    
    <Goal in 1-3 sentences. Acceptance criteria as a numbered list,
    condensed (full text in <checkin path>). Disqualifiers as a
    single-line summary. Inputs as a bulleted list of paths.>
    
    ## Artifact
    
    **Files** (in scope for this batch): <bulleted paths>
    
    **Pre-computed verification (authoritative — do not re-run)**:
    - `npm run lint` → <result>
    - `npm run build` → <result>
    - `npm run test` → <result>
    - <other verification: tier-specific checks, codemod diff samples, etc.>
    
    **Direct mappings to acceptance criteria** (for spot-check
    efficiency): <AC N → file:line ranges or section pointers>
    
    **Iteration story** (if applicable): <prior panel runs and what
    was addressed; helps the evaluator avoid re-flagging fixed issues>
    
    ## Original ask
    
    <verbatim from PLAN.md phase description or the triggering message>
    
    ## Suggested spot-check (one tool use)
    
    <the most efficient single read for confirming the most-suspicious
    criterion; optional but reduces investigation thrashing>
    

    Pass the contract as a paraphrased summary plus a link to the checkin file path, not verbatim. The packet's job is orientation; the full contract lives in the repo, one click away.

    The skill returns a structured verdict (approved | flagged | flagged-conflict) with blocking_findings, advisory_findings, cli_runs, and conflicts lists. See .claude/agents/evaluator-base.md for the per-evaluator verdict shape that /guild-validate parses and aggregates.

  4. Iterate or commit.

    • If flagged: write a new resolution checkin (per § Checkin write) recording the remedy in Execution + Verdict (still flagged from the panel's view; the resolution checkin documents what changed). Re-invoke /guild-validate. Maximum 2 re-iterations per unit — on the third flag, stop and escalate to the user.

      Fixer delegation (per-unit switch, default ON). This loop defaults to delegating the remedy: compose fixer-<domain> via Bash("guild derive-panel --phase=fixer"), hand the flagged-finding packet to it through /guild-spawn for the minimal remedy (fixer's default_personality is pragmatist), then re-invoke /guild-validate. Route through /guild-spawn, not a direct Agent call. The operator can opt a tier/unit out back to inline fix. The re-evaluation gates regardless — delegation changes who fixes, not whether it's re-gated. Record the switch state in the resolution checkin's notes_for_pr. While no fixer-* agents are registered (the registry-mirror lag), fall back to inline fix with a one-line note; the live-spawn proof is the Phase 6 runtime gate.

    • If approved: write a finalization checkin with Execution / Verdict approved / Notes-for-PR populated. Check off the inventory items.

    This loop does not append guild findings to .guild-findings.jsonl (unlike /ev-loop-interactive step 4.5, it has no per-finding frequency-tracking step), so there is no findings-harvest step here — there is nothing in the scratch stream to fold. If this loop ever gains a findings-append step, mirror /ev-loop-interactive step 4.5e and run loom findings harvest <slug> --branch=<branch> at unit close (serial, single-writer, never mid-panel).

  5. Scope-shift detection (restrictive default). Runs only on approved units (flagged-and-iterating units skip this step). Look for signals that PLAN.md is stale; offer a plan revision (per § Revise PLAN.md) ONLY on two-signal concurrence.

    Signal sources:

    • Evaluator finding mentioning a missing or changed phase, deliverable, or load-bearing decision (in either blocking or advisory findings).
    • User comment during the unit that proposed a change to plan structure (not just tactical refinement of this unit's contract).
    • Plan contradiction (round 2+ plan surfaces a disagreement between engineers that current PLAN doesn't resolve).
    • Phase boundary (this unit is the last in its phase OR the next phase is about to start).

    Two-signal-concurrence rule: offer a plan revision only when 2+ signal sources fire for the same shift. Single signals get a note (see below); the loop does NOT interrupt.

    Offer flow: surface a short paragraph naming the two signals and a proposed one-line rationale. Use AskUserQuestion (or natural-language confirm) for accept/decline/defer. Default: decline (no interrupt unless the user explicitly accepts).

    On accept: integrate the change per § Revise PLAN.md. After the revision lands, proceed to step 6 (Phase update). Do not re-execute the unit.

    On single signal (no concurrence): append the signal to the unit's notes_for_pr array in the checkin JSON:

    signal: <signal type>: <one-line description> (single signal; no revise offered)
    

    Loop continues normally.

    On zero signals: no action. Loop continues.

  6. Phase update. After a checkin lands, the checkin-created event auto-fires from § Checkin write. Then update phase state per § Phase update with --status=in-progress --branch=<branch> (no need to set --pr here; the PR reference lives on the phase only after § Compose PR runs).

  7. Checkpoint? Evaluate the should-checkpoint policy (below). If any condition holds, refresh the PR per § Compose PR so it tracks the latest state. Otherwise continue to the next unit.
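The two-signal-concurrence rule from step 5 can be sketched as follows. This is a minimal Python illustration under stated assumptions: the signal-type labels, the `shift` key used to decide that two signals concern "the same shift", and the `triage_signals` name are all hypothetical, since in practice that grouping is a judgment the loop makes. Only the note format is taken from the text.

```python
from collections import defaultdict

# Hypothetical labels for the four signal sources listed in step 5.
SIGNAL_SOURCES = {"evaluator-finding", "user-comment", "plan-contradiction", "phase-boundary"}


def triage_signals(signals: list[tuple[str, str, str]]) -> tuple[list[str], list[str]]:
    """Split (shift, source, description) signals into revise offers and notes.

    A shift earns an offer only when two or more distinct sources fire for it;
    a shift with one source produces a notes_for_pr line instead.
    """
    by_shift: dict[str, dict[str, str]] = defaultdict(dict)
    for shift, source, description in signals:
        if source not in SIGNAL_SOURCES:
            raise ValueError(f"unknown signal source: {source}")
        # Repeats from the same source count once; keep the first description.
        by_shift[shift].setdefault(source, description)

    offers: list[str] = []
    notes: list[str] = []
    for shift, sources in by_shift.items():
        if len(sources) >= 2:
            offers.append(shift)
        else:
            (source, description), = sources.items()
            notes.append(
                f"signal: {source}: {description} (single signal; no revise offered)"
            )
    return offers, notes
```

Two comments from the same source do not concur with each other, which keeps a talkative single source from triggering an interrupt.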

Panel auto-derivation

The agents list passed to /guild-validate is computed from the unit's file list at evaluation time, not hardcoded. The composition rules (file-type → evaluator mapping, precedence ordering, conflict policy) live in docs/PANEL-COMPOSITION.md and are the source of truth.

  1. Collect file paths. Take the unit's changed and created files from the tier batch. Practical recipe: git status --short minus deletions and substrate carryovers, plus any freshly-authored untracked paths.
  2. Derive the panel per § Derive panel, passing the file paths from step 1.
  3. Pass to /guild-validate. Use the verb's stdout as the agents= argument verbatim. Confidence-loop tiers tend to touch a single file family (a codemod over .module.css, a rename over .tsx imports, etc.), so the derived panel is typically narrower than the interactive loop's — common case is contract-fit plus one or two domain lenses.

Edge cases follow the same shape as /ev-loop-interactive's § Panel auto-derivation:

  • Empty file list → contract-fit only.
  • Substrate-only files → contract-fit only.
  • L-004 session-boundary: if the derive-panel output includes an evaluator authored during this session, drop it from agents= manually and note the override in the checkin's Notes section.
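The edge cases above amount to a small post-processing pass over the derived panel. The Python sketch below is illustrative only: `finalize_panel` is a hypothetical helper, agents are shown as bare names although the real agents= value may be paths, `evaluator-css` and `evaluator-new` in the usage are made-up names, and the baseline evaluator is assumed never to be session-authored.

```python
BASELINE = "evaluator-contract-fit"


def finalize_panel(
    derived: list[str], session_authored: set[str]
) -> tuple[list[str], list[str]]:
    """Return (agents, dropped) after applying the L-004 manual drop.

    `dropped` is what the checkin's notes should record as an override.
    """
    dropped = [agent for agent in derived if agent in session_authored]
    agents = [agent for agent in derived if agent not in session_authored]
    # An empty or substrate-only derivation still runs the baseline evaluator.
    if BASELINE not in agents:
        agents.insert(0, BASELINE)
    return agents, dropped
```

For instance, `finalize_panel(["evaluator-contract-fit", "evaluator-css", "evaluator-new"], {"evaluator-new"})` keeps the first two agents and reports `evaluator-new` as dropped.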

Should-checkpoint policy

Checkpoint (refresh the PR per § Compose PR) when any of the following hold. All are read off state — there is no callable function.

  • A full tier has just finished.
  • The number of units since last PR update ≥ 5.
  • Verification is currently green and we're about to start a riskier tier.
  • The user has explicitly asked for a checkpoint.

Do not checkpoint mid-tier unless verification is green.
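Although the policy is read off state rather than called, writing it as a predicate can make the interaction between the four triggers and the mid-tier guard easier to see. This Python sketch is one reading of the policy, not a loom verb: `LoopState` and its field names are hypothetical, and treating "mid-tier" as "the tier has not just finished" is an assumption.

```python
from dataclasses import dataclass


@dataclass
class LoopState:
    tier_just_finished: bool
    units_since_pr_update: int
    verification_green: bool
    about_to_start_riskier_tier: bool
    user_requested: bool


def should_checkpoint(state: LoopState) -> bool:
    """True when any trigger holds and the mid-tier guard does not veto it."""
    triggered = (
        state.tier_just_finished
        or state.units_since_pr_update >= 5
        or (state.verification_green and state.about_to_start_riskier_tier)
        or state.user_requested
    )
    if not triggered:
        return False
    # Guard: do not checkpoint mid-tier unless verification is green.
    mid_tier = not state.tier_just_finished
    return state.verification_green or not mid_tier
```

Under this reading, five units since the last PR update with red verification mid-tier does not checkpoint; the same count with green verification does.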

Tactical retro between tiers

Immediately after the last unit of a tier is approved, before moving to the next tier:

  1. Re-run pre-flight (working tree, verification).
  2. Compose the retro JSON (terse fields: items_processed, units, verification_at_close, what_went_smoothly, what_bit_us, adjustment_for_next) and write per § Retro write with --type=session --phase=<N> --tier=<M>.
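A retro payload with the terse fields named in step 2 might be composed like this. The field names come from the text; everything else in this Python sketch is an assumption, including the value types, the `compose_retro` helper name, and the example values in the usage note. The authoritative schema is whatever § Retro write accepts.

```python
import json


def compose_retro(
    items_processed: int,
    units: int,
    verification_at_close: str,
    what_went_smoothly: list[str],
    what_bit_us: list[str],
    adjustment_for_next: str,
) -> str:
    """Serialize the tactical retro fields as JSON (types are assumed)."""
    retro = {
        "items_processed": items_processed,
        "units": units,
        "verification_at_close": verification_at_close,
        "what_went_smoothly": what_went_smoothly,
        "what_bit_us": what_bit_us,
        "adjustment_for_next": adjustment_for_next,
    }
    return json.dumps(retro, indent=2)
```

A call such as `compose_retro(42, 3, "green", ["codemod applied cleanly"], ["two files had mixed line endings"], "size tier 2 batches at 10")` produces a payload short enough to stay tactical.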

Tactical retros are short and specific. Strategic retrospection happens at /loom-archive, not here.

Gate-and-ratchet

Before starting tier N+1, the gate closes if:

  • Any tier N unit is still flagged.
  • Verification is red.
  • The tier retro identified a blocker for tier N+1.

A closed gate stops the loop and reports to the user. The user decides whether to resume, re-tier, or bail.
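The three gate conditions can be sketched as a function that returns the reasons the gate is closed, so the report to the user can name them. This is a minimal Python illustration; `gate_reasons` and its parameter names are hypothetical, and the reason strings are invented for the example.

```python
def gate_reasons(
    flagged_units: list[str],
    verification_green: bool,
    retro_blockers: list[str],
) -> list[str]:
    """Return every reason the gate before tier N+1 is closed (empty means open)."""
    reasons: list[str] = []
    if flagged_units:
        reasons.append(f"tier units still flagged: {', '.join(flagged_units)}")
    if not verification_green:
        reasons.append("verification is red")
    reasons.extend(f"retro blocker: {blocker}" for blocker in retro_blockers)
    return reasons
```

Collecting all reasons rather than stopping at the first gives the user the full picture when deciding whether to resume, re-tier, or bail.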

Under the armed posture (--mode=auto), the gate-and-ratchet is the natural autonomous stop — the confidence loop's structural equivalent of the interactive loop's phase boundary (docs/AGENT-CONVENTIONS.md § Guild-offload posture, § Both loops). A closed gate is not an inline AskUserQuestion: the run surfaces it at the release boundary — it stops on the open PR (or, on a budget exhaust / unresolvable stall, the escape hatch: a draft PR + UNRESOLVED.md carrying the closed-gate reason), per Step 4. The loop does not auto-resume past a closed gate and does not self-decide a re-tier; a closed gate under autonomy is a stop-and-surface point even mid-stack, exactly as the ratchet is designed to be. The non-armed behavior above (report + the user decides inline) is unchanged when the posture is not armed.

Message-driven redirects

If the router passes a message like "address feedback on #14", this loop:

  1. Triages the PR's comments per § Triage PR comments + draft responses to get a classified list.
  2. Treats each blocker comment as a new unit in the current tier (or a new tier if the feedback rewrites scope).
  3. Iterates the unit loop. When done, refreshes the PR per § Compose PR.

Rules

  • Coverage before transforms. Do not start tier 1 without a complete inventory.
  • Pre-flight before any tier. Every tier starts from a known-good state.
  • One contract per tier, restated per unit only for deltas.
  • Evaluator always runs. No exceptions. Never self-approve.
  • Scope discipline. Fixes outside this phase's pattern get noted in "Notes for the PR" and are deferred — not absorbed silently.
  • Record corrections in the checkin. If the user redirects a unit mid-flight, overrides a decision, or the evaluator flags something the generator defaulted to incorrectly, note it verbatim in the checkin JSON's execution.corrections[] array. The session handoff (§ Save session) surfaces unresolved corrections into open_threads; § Capture finding (from-checkin pathway) promotes notable ones into learnings/session-notes/ at session close, and /griot-compact decides which get promoted further. The loop itself never writes to learnings/.
  • No AskUserQuestion under the armed posture. When --mode=auto is armed (docs/AGENT-CONVENTIONS.md § Guild-offload posture), the loop makes no AskUserQuestion calls mid-phase. Tier-contract negotiation goes to evaluator-contract-fit; a genuine fork goes to a /guild-plan panel or the escape hatch; a closed gate-and-ratchet surfaces at the release boundary. The harness would not silence a stray AskUserQuestion even in its own auto mode (RESEARCH § A), so any mid-phase call would hang an unattended run — this is an invariant, not a preference. The non-armed default keeps every human touchpoint.
  • Delegation default ON is preserved under the posture. Implementer / fixer delegation defaults ON in this loop (bulk transform → delegate the write), and the armed posture does not change that — the posture changes who decides (a panel vs the human), never who writes. The interactive loop's OFF default is the deliberate divergence (§ Both loops).
  • No emojis.

Output to router

On any termination — phase close, closed gate, or escalation — return:

  • Status: completed | gated | escalated | aborted
  • Phase: <N> <name>
  • Tiers run: list with counts (e.g., 1: 3 files, 2: 7 files)
  • Last checkin: <NN>
  • Last PR update: <url> or none
  • PR subscription: active (subscribed at open) | unavailable (local gh-only session — router falls back to § Wait for merge) | n.a. (no PR opened this run)
  • Reason (if not completed): one-line cause
  • Next action: what the router or user should do next

Failure modes

  • Pre-flight fails → stop, report, do not proceed.
  • Evaluator flags 3× → stop, escalate to user, do not auto-merge or force-approve.
  • Inventory shrinks mid-phase (items disappear) → stop; something moved under you. Regenerate inventory and reconcile.
  • Working tree dirty at boundary → stop; do not stash silently.
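
The "Evaluator flags 3×" failure mode, together with the unit loop's cap of two re-iterations, can be sketched as a bounded loop. In this Python illustration `run_unit`, `evaluate`, and `remedy` are hypothetical stand-ins for the /guild-validate invocation and the resolution checkin plus fix; any verdict other than approved is treated as a flag, which is an assumption covering flagged-conflict.

```python
from typing import Callable

MAX_REITERATIONS = 2  # from the unit loop: the third flag escalates


def run_unit(
    evaluate: Callable[[], str], remedy: Callable[[], None]
) -> tuple[str, int]:
    """Return (outcome, evaluations run); outcome is 'approved' or 'escalated'."""
    # One initial evaluation plus at most MAX_REITERATIONS re-evaluations.
    for attempt in range(1, MAX_REITERATIONS + 2):
        verdict = evaluate()
        if verdict == "approved":
            return "approved", attempt
        if attempt <= MAX_REITERATIONS:
            remedy()  # record the fix, then loop to re-evaluate
    return "escalated", MAX_REITERATIONS + 1
```

The loop never approves on its own: the only path to "approved" is an approved verdict from the evaluator, and the third flag returns control to the user.
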
Install via CLI
npx skills add https://github.com/krambuhl/agents --skill ev-loop-confidence