name: afk
description: Autonomous loop that drains the ready-for-agent queue on the issue tracker. Each iteration claims an issue, runs it in an isolated worktree, executes with claude or codex, merges back to main, and closes the issue. Use when the user wants to run AFK execution, drain a PRD, hammer specific issues, or otherwise let agents grind through the backlog.
argument-hint: "[--prd N | --issues N,N,N] [--runner claude|codex|opencode] [--alternate] [--fallback-runner] [--request TEXT] [-n N] [--once] [--boot-only] | fleet [N] | fleet stop | monitor | dashboard | daily-review | weekly-review | retake N | reap"
/afk
Read, don't reverse-engineer. This SKILL.md is the contract; the source is a build artifact.
Drain the agent-ready backlog. Single skill that owns issue selection, worktree isolation, inner-agent execution, GitHub state coordination, merge-back, and runner-fallback.
Runtime & Invocation
Run this skill; do not read its code. This `SKILL.md` is the complete behavioural contract. The `bin/` bundle and the `scripts/` shell files are build/runtime artifacts, not documentation: opening them to "understand what `/afk` does" wastes context and is never required. Everything an agent needs to operate `/afk` is in this file.
The skill ships a single committed runtime bundle. Invoke it as:
RED_AFK_RUNNER=<claude|codex> node "$CLAUDE_PLUGIN_ROOT/skills/engineering/afk/bin/afk.mjs" <command> [params]
The invoking LLM is responsible for setting RED_AFK_RUNNER to its own host runner (codex from Codex, claude from Claude Code). Do not infer a different runner from binaries on PATH; use --runner only when the user explicitly pinned one.
afk.mjs is a dedicated forwarder (ADR 0039 entrypoint, build role run:dev): every argument is passed straight to the dev bundle, whose own command surface (run, monitor, fleet, …) is documented below. So … afk.mjs run --once, … afk.mjs monitor, and the bare … afk.mjs --issues 42 all reach the orchestrator. The generic entrypoint verbs (run <plugin> / fetch) belong to red-fetch.mjs, not to this launcher — they do not shadow the bundle's commands (#434).
Commands and their parameters are documented in When To Use below — that section is authoritative for the CLI surface. The commands are run (the default — a bare token routes here with argv preserved), monitor, dashboard, daily-review, weekly-review, retake, fleet, reap, codex-statusline (inspect/fix Codex's native footer widgets), codex-monitor-agent (emit the read-only Codex monitor-agent prompt for host-layer spawning), statusline (the shared RedSkills statusline producer for command-backed host adapters; reads the host payload on stdin and resolves the project root from $1 or the payload), and the hidden __supervise (the fleet supervisor entrypoint; never invoked by hand). run accepts --prd, --issues, --runner, --alternate, --fallback-runner, --request/-r, -n, --once, and --boot-only; monitor accepts --once; dashboard accepts --period N|Nd and --json; daily-review and weekly-review accept --json; retake accepts #ISSUE, --apply, --json, --repo OWNER/REPO, and --pr-limit N; fleet accepts an optional numeric target N, the stop subcommand, --request/-r, and --runner; reap takes no flags; codex-statusline accepts --config, --fix, and --json; codex-monitor-agent accepts --project-root, --mode run|fleet, --interval-seconds, and --json; statusline accepts an optional project-root path as $1.
The bundle is a single self-contained build (one file, one inlined runtime dependency, no node_modules, no install step) and is the public entrypoint. Every command — orchestration, supervisor, statusline, and hooks — executes natively in the bundle; the legacy shell orchestrator under scripts/ has been removed (ADR 0032, ADR 0034). Treat this SKILL.md as the contract: run the bundle, don't read its source.
Execution Substrate (ADR 0033)
The per-issue agent run executes on @reddb-io/red-castle — reddb.io's vendored sandcastle fork, a packages/red-castle submodule consumed as source (ADR 0061) — not on a hand-rolled claude -p / codex exec session whose stdout is grepped for stage transitions. The boundary is clean: the substrate owns execution, AFK owns the issue policy.
- sandcastle (one `run()` call per attempt) spawns the inner agent, creates and manages the git worktree, runs the configured sandbox, captures the agent's stream, detects the completion signal, and lands the agent's commits on the worker branch.
- AFK keeps everything around that call: issue selection, the three-layer claim, the handoff file, the package-aware feedback gate, lock-toggled landing (ADR 0030), base resolution (ADR 0031), the terminal-event envelope, close, and the boot/monitor/mirror sweeps.
AFK drives the sandcastle Orchestrator through injected providers (SandcastleDeps: run, agentFor, sandboxFor) so the single adapter module is the only code coupled to the package. The pure mapping (buildRunOptions → RunOptions, interpretOutcome → outcome) is unit-tested with run injected; the real providers are wired lazily once, on the first agent run, so a monitor / reap / empty-queue path never imports sandcastle. AFK's canonical sentinels <promise>DONE</promise> and <promise>BLOCKED</promise> are registered as sandcastle completionSignals, so the AGENT-PROMPT.md contract is unchanged — the agent still authors its own exit.
run() returns { branch, commits, completionSignal }; AFK maps completionSignal to an outcome (done / blocked / no-sentinel) and proceeds with its own feedback → landing → envelope → close. Execution is a single runAgent call, not a multi-mode dispatch over named run-modes.
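The signal-to-outcome mapping described above can be sketched as a small pure function. This is only an illustration in Python, not the bundle's code: it assumes `completionSignal` carries the matched sentinel string (or nothing when the agent exits without one), and the name `interpret_outcome` simply mirrors the `interpretOutcome` mapping mentioned earlier.

```python
from typing import Optional

# AFK's canonical sentinels, as named in this document.
DONE_SIGNAL = "<promise>DONE</promise>"
BLOCKED_SIGNAL = "<promise>BLOCKED</promise>"


def interpret_outcome(completion_signal: Optional[str]) -> str:
    """Map a completion signal to one of: done, blocked, no-sentinel.

    Assumption: the signal is the matched sentinel text, or None when the
    agent stopped without emitting any registered sentinel.
    """
    if completion_signal == DONE_SIGNAL:
        return "done"
    if completion_signal == BLOCKED_SIGNAL:
        return "blocked"
    # Anything else (including None) counts as "the agent never authored its exit".
    return "no-sentinel"
```

Keeping the mapping pure is what makes it unit-testable with `run` injected: no process, worktree, or network is needed to exercise it.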
When To Use
- `/afk`: every issue currently labelled `ready-for-agent`.
- `/afk --prd 42`: only issues that reference PRD #42 (by `prd: #42` line in body, parent link, or `prd:42` label).
- `/afk --issues 356,359,362`: explicit list, in that order.
- `/afk --runner codex`: pin a backend (disables detection cascade; mutually exclusive with `--alternate`).
- `/afk --alternate`: opt in to round-robin runner rotation between issues (claude → codex → claude → …).
- `/afk --fallback-runner`: opt in to swapping runners mid-issue when one returns `RUNNER_EXHAUSTED`. Without this flag, exhaustion routes the issue through bounded `blocked:quota` recovery and stops the outer run with exit 75.
- `/afk --request "dont run cargo tests for this issue resolution"` or `/afk -r "..."`: add a special user request block to every inner-agent prompt for this run.
- `/afk -n 5`: cap at five issues. `-n N` caps the run at `N` issues; `-n 0` (and omitting `-n`) drains the whole queue until it is empty (`0` means unlimited, not zero). For a no-agent dry-run use `--boot-only` instead.
- `/afk --once`: single supervised iteration. Use for debugging the prompt.
- `/afk --boot-only`: run the boot sweeps then exit without claiming or spawning an agent; a safe dry-run to inspect bootstrap / orphan-cleanup / unblock-sweep / precheck.
- `/afk monitor`: readonly status board that aggregates every `.red/tmp/workers/*/*/afk.state.json`, so you see all live workers from another terminal. Also (binding): mirrors live workers onto the host runner's native task surface, which is `TaskCreate`/`TaskUpdate` under Claude Code and the sub-agent surface under Codex when present (falls back to the dashboard otherwise). See Task Mirror below; this is not optional and you must do it on every tick, even when the user only asked "how are we doing?".
- `/afk dashboard [--period 30d] [--json]`: readonly process dashboard: open PRDs/issues, global `running` issues, local AFK workers on this machine, issue/PR flow metrics, and DORA proxy metrics.
- `/afk daily-review [--json]`: readonly daily operational review from yesterday local midnight to now: delivery big numbers, local worker attempts/time, token spend when available, HITL/blocker challenges, and issue/PR cycle times.
- `/afk weekly-review [--json]`: readonly six-day operational review from six-days-ago local midnight to now, with the same sections as `daily-review`.
- `/afk retake 123 [--apply] [--json]`: issue resumption report: reads the issue, linked PRs, matching local/remote branches, matching local worktrees, HITL state, and prints the next command to continue, fix, recreate a ship worktree, or run `/ship`. With `--apply`, executes only safe local setup `git` operations and still leaves merges/HITL to `/ship` or `/hitl`. The parser accepts `#123` too; quote it when invoking through a shell.
- `/afk fleet [N]`: launch the supervisor maintaining `N` concurrent workers (default `2`). See Fleet Mode below.
- `/afk fleet stop`: gracefully shut down a running fleet supervisor and cancel its auto-monitor cron.
- `/afk reap`: run branch hygiene without starting a worker: one count line for remote `afk/*`, remote `afk-attempts/*`, and local `afk/*`, then the same safe reapers used at boot.
Running /afk in an execution environment (GitHub Actions)
The same /afk --issues N --runner opencode --once command runs unchanged in a
GitHub Actions runner — one attempt, one issue, one PR per invocation, no fleet,
no admin-merge. Only the trigger and the secret-injection surface differ.
The lane is packaged as three layers (ADR 0059/0062): the reusable workflow
.github/workflows/reusable-afk-attempt.yml (triggers + trust gate) → the composite
action .github/actions/afk-attempt (execution) → the afk.mjs launcher +
Release bundle (runtime). Two adoption paths: turnkey (call the reusable) or
composable (uses: reddb-io/red-skills/.github/actions/afk-attempt@v1 with
your own triggers/gate). Pin @v1/SHA for reproducibility. The composite action
carries its own red-skills checkout, so the launcher resolves in any adopter repo
— no workspace build, no submodule.
Triggers: issues: labeled/opened (on ready-for-agent), workflow_dispatch,
workflow_call. Trust gate (ADR 0056): author + label-actor must both be
allowlisted. Runner: opencode (API-auth); point it at OpenAI/MiniMax/OpenRouter
by wiring the matching key + a <provider>/<model> slug via the model input.
→ Full adopter guide:
actions-lane.md (architecture, both examples, all
inputs, triggers, trust gate, auth precedence, the MiniMax recipe, permissions).
The k8s job manifest + real-environment E2E remain tracked as #631 (ADR 0059).
Parallelization
/afk is trivially parallel — just open another terminal and run /afk again. No flag, no coordination, no slot to manage.
/afk # terminal A → spawns worker "wZ2R4"
/afk # terminal B → spawns worker "wK7M2"
/afk # terminal C → spawns worker "w9RQP"
Each invocation generates its own worker ID — literal w plus 4 random characters from [A-Z0-9] (e.g. wZ2R4, ~1.7M possible IDs) — and uses it as the prefix for every per-run file. The leading w makes the worker directory .red/tmp/workers/{id} an unambiguous live-worker anchor. The ID is printed on the first line of the run so you can tail or kill it later.
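As a rough illustration of the ID scheme, the following Python sketch draws four characters from `[A-Z0-9]` and prefixes a literal `w`. The function name `new_worker_id` and the `is_live` callback (standing in for a "does `.red/tmp/workers/{id}` already exist?" check) are hypothetical; the real generator lives in the bundle.

```python
import secrets
import string

# [A-Z0-9]: 26 letters + 10 digits = 36 symbols.
ALPHABET = string.ascii_uppercase + string.digits

# 36 ** 4 possible suffixes, which is where the "~1.7M" figure comes from.
ID_SPACE = len(ALPHABET) ** 4


def new_worker_id(is_live=lambda worker_id: False) -> str:
    """Return 'w' + 4 random characters, retrying while the ID is already live.

    `is_live` is a hypothetical hook; by default nothing is considered live.
    """
    while True:
        worker_id = "w" + "".join(secrets.choice(ALPHABET) for _ in range(4))
        if not is_live(worker_id):
            return worker_id
```

With 1,679,616 possible IDs and only a handful of terminals, a collision is rare, but the retry loop keeps the live-directory anchor unambiguous when one does occur.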
Per-attempt files live under .red/tmp/workers/{id}/{N}-a{n}/ in the primary checkout, where {id} is the worker ID, {N} is the issue number, and {n} is the per-issue attempt counter (derived by the attempt-ledger — every retry, even by a different worker, gets a fresh a{n} directory). Everything for one (worker, issue, attempt) is in one directory — when the attempt ends successfully the whole directory is removed; when it blocks the whole directory is preserved. The worker also holds a single per-worker liveness anchor at .red/tmp/workers/{id}/worker.pid (see the worker.pid row below).
| Path | Purpose |
|---|---|
| `.red/tmp/workers/{id}/worker.pid` | Per-worker liveness anchor: the orchestrator's PID, written once at bootstrap and removed on the worker's EXIT trap (along with `rmdir` of the empty worker dir). The single liveness anchor for the worker; the fleet supervisor's slot matching keys off it. |
| `.red/tmp/workers/{id}/{N}-a{n}/worktree/` | Git worktree for issue `N` on attempt `n`. Lives inside the gitignored `.red/tmp/` so it never pollutes sibling directories. |
| `.red/tmp/workers/{id}/{N}-a{n}/afk.log` | Append-only plain log for this attempt (orchestrator output + inner-agent stdout + heartbeat lines). Per-attempt scope: each attempt gets a fresh log. |
| `.red/tmp/workers/{id}/{N}-a{n}/agent.log.jsonl` | Clean agent lane (issue #250): one `type=agent` JSONL record per assistant turn and nothing synthetic, so it is the true liveness signal and reads as a live transcript: `tail -f … \| jq -r .msg`. Single-writer. |
| `.red/tmp/workers/{id}/{N}-a{n}/log.jsonl` | The firehose (issue #250): every record of the attempt in the uniform JSONL envelope: agent turns, heartbeat vitals, hook dispatches, runner timings, and errors. Flock-serialised (many concurrent writers). |
| `.red/tmp/workers/{id}/{N}-a{n}/afk.state.json` | State snapshot for this attempt. Schema in `docs/ENVELOPE.md`. |
| `.red/tmp/workers/{id}/{N}-a{n}/handoff.md` | Handoff file the inner agent reads: `<issue-body>` (issue body verbatim, including the `## Agent brief` markdown section), `<previous-attempts>`, `<human-guidance-thread>` (one `<human-guidance>` per extracted directive), `<thread-discussion>` (advisory comments with no directive marker), `<agent-notes>`. Top-level XML wrappers make body/comments/notes unambiguous. Template in `docs/HANDOFF.md`. |
Two workers cannot claim the same issue thanks to a local mkdir lock at .red/tmp/claims/{N}/ plus a gh issue view pre-check before the edit. The gh edit itself is not atomic (see Issue Lifecycle below for the full three-layer scheme). The race surface is the brief window between two separate checkouts on the same host — acceptable for the intended scale.
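The local lock relies on directory creation being atomic: of several processes racing to create the same directory, exactly one succeeds. A minimal Python sketch of that idea follows; the helper names `try_claim` and `release_claim` are hypothetical, and the `gh issue view` pre-check is deliberately left out.

```python
import os
from pathlib import Path


def try_claim(claims_root: Path, issue: int) -> bool:
    """Try to take the local claim for `issue`; True means this caller won.

    `claims_root` stands for `.red/tmp/claims/` in the primary checkout.
    """
    claims_root.mkdir(parents=True, exist_ok=True)
    try:
        # os.mkdir fails with FileExistsError if another worker got there first.
        os.mkdir(claims_root / str(issue))
    except FileExistsError:
        return False
    return True


def release_claim(claims_root: Path, issue: int) -> None:
    """Drop the claim so a later run can pick the issue up again."""
    os.rmdir(claims_root / str(issue))
```

The loser of the race simply skips the issue; no retry or waiting is needed because another worker is already on it.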
Hard Preconditions
Refuse to start if any fail — the user fixes them.
- `git remote -v`: SSH only. Reject HTTPS; never auto-rewrite.
- `gh auth status` succeeds.
- Repo has `main` branch: `git -C primary log -1 main` works.
- Label `ready-for-agent` exists; if not, point at `/triage`.
- `pnpm` is on PATH.
Bootstrap
Run before the first iteration:
- Ensure `.red/tmp/` exists (create) and is in `.gitignore` (append if missing).
- Generate worker ID: `w` + 4 random `[A-Z0-9]` chars (e.g. `wZ2R4`). Regenerate on live-directory collision. Print `worker: {id}` first.
- Detect runner (first wins; log `runner: <r> (detected via <method>)`). Load the matching runner doc. Never probe `command -v`; swap only via `--fallback-runner`. Cascade: `--runner X` pin (`opencode` valid only here or via env) → `RED_AFK_RUNNER` env → env-var sniff (`CLAUDECODE` / `CLAUDE_CODE_ENTRYPOINT` / `CLAUDE_CODE_SSE_PORT` → `claude`; `CODEX_HOME` / `CODEX_SANDBOX` / `CODEX_SANDBOX_NETWORK_DISABLED` / `CODEX_MANAGED_BY_NPM` → `codex`) → process-tree → path (`~/.claude/` → `claude`; `~/.codex/` → `codex`) → default `claude`.
- Read `SAFETY.md`: binding for every shell action.
- Write `worker.pid`: create `.red/tmp/workers/{id}/`, write current PID once. This is the worker's liveness anchor for its whole lifetime.
- Install signal handlers (SIGINT/SIGTERM/EXIT): release claim, preserve attempt dir, remove `worker.pid`, rmdir empty worker dir.
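The runner-detection cascade in the list above is a "first match wins" chain. The Python sketch below shows one way to express it under stated assumptions: the function name `detect_runner` and the method strings (`"pin"`, `"env"`, `"env-sniff"`, `"path"`, `"default"`) are hypothetical, the process-tree step is omitted, and checking the Claude variables before the Codex ones is a guess based on the order they are listed in.

```python
from pathlib import Path
from typing import Mapping, Optional, Tuple

CLAUDE_VARS = ("CLAUDECODE", "CLAUDE_CODE_ENTRYPOINT", "CLAUDE_CODE_SSE_PORT")
CODEX_VARS = (
    "CODEX_HOME",
    "CODEX_SANDBOX",
    "CODEX_SANDBOX_NETWORK_DISABLED",
    "CODEX_MANAGED_BY_NPM",
)


def detect_runner(
    pin: Optional[str], env: Mapping[str, str], home: Path
) -> Tuple[str, str]:
    """Return (runner, method); the first matching rule wins."""
    if pin:  # explicit --runner X
        return pin, "pin"
    if env.get("RED_AFK_RUNNER"):
        return env["RED_AFK_RUNNER"], "env"
    if any(name in env for name in CLAUDE_VARS):
        return "claude", "env-sniff"
    if any(name in env for name in CODEX_VARS):
        return "codex", "env-sniff"
    # The process-tree probe would sit here; it is left out of this sketch.
    if (home / ".claude").is_dir():
        return "claude", "path"
    if (home / ".codex").is_dir():
        return "codex", "path"
    return "claude", "default"
```

Note that nothing here inspects binaries on PATH, in line with the "never probe `command -v`" rule.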
Boot-time sweeps
At boot the bundle reclaims stale state — orphan attempt dirs (issue-state TTL), the per-issue attempt cap (#257), the afk-attempts/* snapshot-branch grace cleanup (#258), and the on-demand /afk reap branch reaper (#275). Mechanics: docs/BOOT-SWEEPS.md.
Dependency Unblock — req:N edges, close cascade + boot sweep
Dependencies are first-class req:N edge labels (one per blocker), and a dependency-blocked issue holds the blocked:dependency state — not ready-for-human (it is healthy, waiting, and never pages). Two mechanisms promote it to ready-for-agent:
1. Close cascade (event-driven, the fast path). Immediately after /afk closes an issue #N on the DONE path (after the completion sweep), it re-evaluates every dependent of #N:
- `gh issue list --label req:N --state open --json number,labels`.
- For each dependent, read its `req:*` labels and resolve each referenced issue's state (the just-closed #N is known closed; others via a cached lookup).
- When every `req:*` of a dependent is now closed: `gh issue edit --remove-label blocked:dependency --add-label ready-for-agent` + post `🤖 /afk unblocked: all dependencies closed (#…)`.
Best-effort: a gh failure here logs a warn: and never fails the close — the boot sweep below catches anything the cascade missed.
2. Unblock Sweep (boot-time, the safety net). After orphan cleanup and before Straggler Check, /afk re-scans dependency-blocked issues by label and promotes any whose deps all closed:
- `gh issue list` for open `blocked:dependency` issues with `number,labels,body`.
- Deps come from the `req:*` labels (the source of truth); for pre-`req:N` issues with no such label, fall back to extracting `#N` refs under the literal `## Blocked by` body heading (`- [ ] #N`) only when the issue is still labelled `blocked:dependency`.
- Resolve each dep via `gh issue view <N> --json state`; promote only when every dep is `CLOSED`.
- On promotion: remove the holding label (`blocked:dependency`), add `ready-for-agent`, post the audit comment, and log `unblocked N issue(s): #A #B`.
ready-for-human is a human gate, not dependency-wait. The boot sweep must not promote it from a legacy ## Blocked by body parse, because a closed blocker can still encode a failed measurement or a no-go decision. blocked:dependency issues do not have that ambiguity: the label means dependency-wait, which is the whole point of separating it from ready-for-human.
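The promotion rule shared by the close cascade and the boot sweep boils down to "every `req:*` edge points at a closed issue, and the issue is held by `blocked:dependency`". The following Python sketch captures just that decision; `req_deps`, `should_promote`, and the `is_closed` lookup are hypothetical names, labels are simplified to plain strings, and the legacy `## Blocked by` body fallback is not modelled.

```python
from typing import Callable, Iterable, List


def req_deps(labels: Iterable[str]) -> List[int]:
    """Extract blocker issue numbers from `req:N` labels, in label order."""
    deps = []
    for label in labels:
        prefix, _, rest = label.partition(":")
        if prefix == "req" and rest.isascii() and rest.isdigit():
            deps.append(int(rest))
    return deps


def should_promote(labels: Iterable[str], is_closed: Callable[[int], bool]) -> bool:
    """True when a dependency-blocked issue can move to ready-for-agent."""
    labels = list(labels)
    # Only the dependency-wait state is auto-promoted; ready-for-human never is.
    if "blocked:dependency" not in labels:
        return False
    deps = req_deps(labels)
    # No req:* edges means this sketch has nothing to prove, so it stays put.
    return bool(deps) and all(is_closed(n) for n in deps)
```

In practice `is_closed` would be backed by a cached issue-state lookup, so the same blocker is not queried once per dependent.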
Current Blocker State
Human gates are first-class issue-body state, not implicit thread archaeology. Before claiming an issue, /afk checks for an active ## Current blocker block:
## Current blocker
<!-- red:blocker-state v1 -->
status: blocked
kind: decision
ref: #856
summary: Phase 2 measured no columnar read win.
next: Human must decide whether to stop, redesign, or continue anyway.
<!-- /red:blocker-state -->
If this block is present with status: blocked, /afk does not create an attempt. It removes ready-for-agent, adds ready-for-human plus the typed blocker label, leaves the issue open, and waits for /hitl.
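To make the pre-claim check concrete, here is a small Python sketch that extracts the `key: value` fields between the two marker comments and reports whether the issue is human-gated. It is an illustration only: the function names are hypothetical, and keying the match on the HTML markers alone (without also requiring the `## Current blocker` heading) is an assumption.

```python
import re
from typing import Dict, Optional

# Everything between the opening and closing marker comments.
BLOCK_RE = re.compile(
    r"<!-- red:blocker-state v1 -->(.*?)<!-- /red:blocker-state -->",
    re.DOTALL,
)


def parse_blocker_state(body: str) -> Optional[Dict[str, str]]:
    """Return the block's fields as a dict, or None when no block exists."""
    match = BLOCK_RE.search(body)
    if match is None:
        return None
    fields: Dict[str, str] = {}
    for line in match.group(1).splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip blank or malformed lines
            fields[key.strip()] = value.strip()
    return fields


def is_human_gated(body: str) -> bool:
    """True when the issue body carries an active `status: blocked` block."""
    state = parse_blocker_state(body)
    return state is not None and state.get("status") == "blocked"
```

For the example block above, `parse_blocker_state` yields `kind` as `decision` and `ref` as `#856`, and `is_human_gated` is true, so no attempt would be created.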
When an attempt escalates to a terminal human page (for example BLOCKED, validation failure, non-recoverable stall/infra, or a recoverable reason after retry-budget exhaustion), the runtime writes or replaces this block so the next /hitl turn can start from the current blocker instead of re-reading every old envelope. /hitl clears the block to None, records it under ## Resolved blockers, refreshes ## Agent brief, and moves the issue back to ready-for-agent only when the next agent can continue without guessing.
Use ## Blocked by only for mechanical dependencies that should auto-promote on close. Use ## Current blocker / ## Human decision needed for gates, measurements, product calls, or any state where "the referenced issue closed" is not enough to prove the work is delegable.
Straggler Check
Before issue selection, /afk counts open issues in states it cannot consume:
- unlabeled: never triaged
- `needs-triage`: triage in progress
- `needs-info`: waiting on reporter

If any of those counts is non-zero, print a warning and (on a TTY, not in `--once`) prompt to confirm before proceeding. This catches the "lost issue" case, where a fresh report never made it through `/triage` and is silently invisible to `/afk`.
The systemic fix is the red-issues-needs-triage.yml workflow installed by /setup-red-skills, which auto-applies needs-triage to every fresh issue. The straggler check is the in-loop safety net for repos where the workflow isn't installed yet.
Issue Selection
Pull: gh issue list --label ready-for-agent --state open --json number,title,labels,body --limit 100. Drop every type:prd issue before any filter (log /to-issues N warning for each). Prepend priority:urgent issues before any filter, oldest first.
Filters for the non-urgent remainder:
- `--issues N…`: keep those numbers in argument order; error if missing or not `ready-for-agent`; PRDs rejected.
- `--prd N`: keep issues with `prd: #N` in body, parent link, or `prd:N` label; PRD itself excluded.
- Default: all remaining `ready-for-agent`, `priority:high` first, then ascending by number.
Final queue: [urgent…] + [filtered…], deduped. Empty → <promise>NO MORE TASKS</promise>, exit 0.
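The ordering rules can be sketched as a pure function over the fetched issues. This Python example is a simplification under several assumptions: labels are plain strings rather than the objects `gh` returns, "oldest first" is approximated by ascending issue number, the `--prd` filter is omitted, and `build_queue` and its `only` parameter (standing in for `--issues`) are hypothetical names.

```python
from typing import List, Optional, Sequence, TypedDict


class Issue(TypedDict):
    number: int
    labels: List[str]


def build_queue(issues: Sequence[Issue], only: Optional[Sequence[int]] = None) -> List[int]:
    """Return issue numbers in run order: [urgent...] + [filtered...], deduped."""
    # PRDs are dropped before any filter.
    ready = [i for i in issues if "type:prd" not in i["labels"]]
    known = {i["number"] for i in ready}
    # Urgent issues always go first (ascending number as an age proxy).
    urgent = sorted(i["number"] for i in ready if "priority:urgent" in i["labels"])
    rest = [i for i in ready if "priority:urgent" not in i["labels"]]

    if only is not None:
        missing = [n for n in only if n not in known]
        if missing:
            raise ValueError(f"not ready-for-agent (or a PRD): {missing}")
        filtered = list(only)  # keep argument order
    else:
        # priority:high first (False sorts before True), then ascending number.
        ordered = sorted(rest, key=lambda i: ("priority:high" not in i["labels"], i["number"]))
        filtered = [i["number"] for i in ordered]

    queue: List[int] = []
    for number in urgent + filtered:
        if number not in queue:
            queue.append(number)
    return queue
```

An empty result corresponds to the `<promise>NO MORE TASKS</promise>` exit described above.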
Issue Lifecycle (the /afk slice)
Canonical state machine lives in setup-red-skills/triage-labels.md. The portion /afk touches:
ready-for-agent
│
(1) claim
remove ready-for-agent
add running
post start comment
│
▼
running
┌───┴───┐
│ │ inner agent works in worktree → emits DONE | BLOCKED
│ │ orchestrator runs feedback loops, then merges to main
│ │
│ ├──── DONE + green + merged + pushed
│ │ │
│ │ (4a) close
│ │ remove running
│ │ gh issue close --reason completed
│ │ │
│ │ ▼
│ │ closed
│ │
│ └──── terminal failure
│ │
│ classify Attempt Outcome
│ add typed blocked:<reason>
│ │
│ ┌────────┴────────┐
│ │ │
│ │ recoverable and │ non-recoverable, or
│ │ attempt < cap │ recoverable at/over cap
│ │ │
│ ▼ ▼
│ remove running remove running
│ add add
│ ready-for-agent ready-for-human
│ post/retry post blocker/budget
│ audit exhausted comment
│ │ │
│ ▼ ▼
│ ready-for-agent ready-for-human
│ (fresh attempt) (human gate)
│
└──── orchestrator interrupted (SIGINT/SIGTERM)
│
(4c) release
remove running
restore ready-for-agent
post interruption comment
│
▼
ready-for-agent (next /afk run can pick it up)
Label transitions are not atomic at the gh level — gh issue edit --remove-label A --add-label B resolves the new label set client-side and submits the union, so a removed-but-no-longer-present label is a silent no-op and the edit returns 0. To prevent two parallel /afk runners from both thinking they claimed the same issue, the per-issue claim uses three layers:
- Local `mkdir` lock at `.red/tmp/claims/{N}/` (POSIX-atomic). Workers in the same checkout race here, and the loser skips.
- Pre-check via `gh issue view --json labels`: if `ready-for-agent` is already gone or `running` is already present, abort before the edit. Cuts the cross-checkout race window to roughly one round-trip.
- Stale-lock sweep at boot, during orphan cleanup: any `.red/tmp/claims/{N}/` whose recorded pid is dead gets reclaimed automatically.
Residual gap: two clones of the same repo on the same host (or different hosts) do not share .red/tmp/, so each holds its own mkdir lock and the gh edit race re-opens for the brief window the pre-check leaves uncovered. Acceptable for the intended scale (a few terminals, one checkout). If you need cross-host claim safety, gate /afk on a proper coordinator instead of GitHub labels.
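The stale-lock sweep (the third layer) can be pictured as follows. This POSIX-only Python sketch assumes each claim directory records its owner in a file named `pid`; that file name, the function names, and the choice to leave unreadable claims untouched are all assumptions for illustration, not the bundle's documented layout.

```python
import os
import shutil
from pathlib import Path
from typing import Callable, List


def pid_alive(pid: int) -> bool:
    """POSIX liveness probe: signal 0 checks existence without sending anything."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def sweep_stale_claims(
    claims_root: Path, is_alive: Callable[[int], bool] = pid_alive
) -> List[int]:
    """Remove claim dirs whose recorded pid is dead; return reclaimed issue numbers."""
    reclaimed: List[int] = []
    for entry in sorted(claims_root.iterdir()):
        if not (entry.is_dir() and entry.name.isdigit()):
            continue
        try:
            pid = int((entry / "pid").read_text().strip())  # hypothetical file name
        except (OSError, ValueError):
            continue  # no readable pid: leave the claim alone (conservative choice)
        if not is_alive(pid):
            shutil.rmtree(entry)
            reclaimed.append(int(entry.name))
    return sorted(reclaimed)
```

Injecting `is_alive` keeps the sweep testable without spawning or killing real processes.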
Typed Failure Labels And Recovery Caps
AFK labels terminal failures with a descriptive blocked:<reason> label in addition to the routing label. The typed label is observability: a retry path still adds ready-for-agent, and an escalated path still adds ready-for-human.
| Attempt Outcome | typed label | recovery policy | default cap |
|---|---|---|---|
| `exhausted` | `blocked:quota` | `quota` | `RED_AFK_RETRY_QUOTA=3` |
| `runner-transient` | `blocked:runner-transient` | `runner-transient` | `RED_AFK_RETRY_RUNNER_TRANSIENT=3` |
| `merge-conflict` | `blocked:merge-conflict` | `merge-conflict` | `RED_AFK_RETRY_MERGE=3` |
| `ci-failed` | `blocked:ci` | none; escalates to a human/CI-aware finisher (never re-runs the agent) | n/a |
| `ci-pending` | `blocked:ci` | none; escalates to a human/CI-aware finisher (never re-runs the agent) | n/a |
| `no-sentinel` | `blocked:crashed` | `crashed` | `RED_AFK_RETRY_CRASH=1` |
| `hook-aborted` | `blocked:policy` | `policy` | `RED_AFK_RETRY_POLICY=1` |
| `blocked` | `blocked:spec` | none; escalates immediately | n/a |
| `feedback-failed` | `blocked:validation` | none; escalates immediately | n/a |
| `stalled` | `blocked:stalled` | none; escalates immediately in the per-issue path | n/a |
| `infra` | `blocked:infra` | none; escalates immediately | n/a |
| `done` / `claim-lost` | none | none | n/a |
Recoverable reasons retry while the 1-based attempt number is less than the cap. At the cap and above, the same reason escalates to ready-for-human, keeps the typed blocked:<reason> label, and posts a retry-budget-exhausted comment. Missing, non-numeric, zero, or negative RED_AFK_RETRY_* values fall back to the default cap.
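The retry rule and the environment fallback can be written down compactly. The Python sketch below is an illustration, not the runtime's code: the names `resolve_cap` and `route_failure` are hypothetical, and it only handles failure outcomes (`done` and `claim-lost` never reach this decision).

```python
from typing import Mapping

# Recoverable outcomes with their env override and default cap, per the table above.
RECOVERABLE = {
    "exhausted": ("RED_AFK_RETRY_QUOTA", 3),
    "runner-transient": ("RED_AFK_RETRY_RUNNER_TRANSIENT", 3),
    "merge-conflict": ("RED_AFK_RETRY_MERGE", 3),
    "no-sentinel": ("RED_AFK_RETRY_CRASH", 1),
    "hook-aborted": ("RED_AFK_RETRY_POLICY", 1),
}


def resolve_cap(env: Mapping[str, str], var: str, default: int) -> int:
    """Missing, non-numeric, zero, or negative values fall back to the default."""
    try:
        value = int(env.get(var, ""))
    except ValueError:
        return default
    return value if value > 0 else default


def route_failure(outcome: str, attempt: int, env: Mapping[str, str]) -> str:
    """Return the routing label for a failed attempt (`attempt` is 1-based)."""
    if outcome not in RECOVERABLE:
        return "ready-for-human"  # non-recoverable: escalate immediately
    var, default = RECOVERABLE[outcome]
    cap = resolve_cap(env, var, default)
    return "ready-for-agent" if attempt < cap else "ready-for-human"
```

Read literally, a cap of 1 means the first attempt is already at the cap, so `no-sentinel` and `hook-aborted` escalate without a retry unless the cap is raised through the environment.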
blocked:ci never re-runs the agent (#812). On an enforce_admins base, an admin-merge cannot bypass required status checks, so a completed, MERGEABLE PR whose required checks failed (ci-failed) or are still pending past the CI-wait timeout (ci-pending) is not a merge conflict. These outcomes carry blocked:ci and escalate straight to ready-for-human with the PR left open — they are deliberately NON-recoverable so AFK never re-runs the whole inner agent (re-spending tokens) for work that is already done and only awaiting CI. A human / CI-aware finisher drives the existing PR to merge once CI is green. This is gated by afk.merge.ci_aware (see step 8); with it off, the unlocked path admin-merges immediately (correct only on a base with no required checks).
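One way to picture the CI-aware routing is as a decision over the PR's settled state. The Python sketch below uses heavily simplified inputs: `merge_state` stands for the PR's `mergeStateStatus`, while the two booleans summarising the required checks, the `review_only_block` flag, the `"merge"` / `"wait"` return values, and the precedence between the branches are all assumptions made for illustration.

```python
def route_pr(
    merge_state: str,
    required_check_failed: bool,
    timed_out: bool,
    review_only_block: bool = False,
) -> str:
    """Decide the next step for an opened PR under CI-aware merging.

    Returns "merge", "wait", or an Attempt Outcome name.
    """
    if merge_state in ("DIRTY", "BEHIND"):
        return "merge-conflict"  # a real git problem: bounded recovery applies
    if required_check_failed:
        return "ci-failed"  # blocked:ci, PR left open, escalate to a human
    if merge_state == "CLEAN" or (merge_state == "BLOCKED" and review_only_block):
        return "merge"  # safe to admin-merge now
    if timed_out:
        return "ci-pending"  # blocked:ci, PR left open, escalate to a human
    return "wait"  # keep polling within the CI-wait budget


# Both CI outcomes share one typed label and never re-run the inner agent.
TYPED_LABEL = {"ci-failed": "blocked:ci", "ci-pending": "blocked:ci"}
```

The point of separating these branches is cost: only `merge-conflict` may trigger another agent run, while the two CI outcomes keep the finished work parked on the open PR.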
Per-Issue Loop
For each issue N:
1. Claim. `gh issue edit N --remove-label ready-for-agent --add-label running`. Then resolve the attempt number `{n}` from the attempt-ledger (per-issue across all workers), create the attempt directory `.red/tmp/workers/{id}/{N}-a{n}/`, open `afk.log` (tee target for orchestrator output), and initialise `afk.state.json` per `docs/ENVELOPE.md`. The orchestrator PID is already recorded once in the per-worker `worker.pid` (written at bootstrap) and is also embedded in `afk.state.json`'s `.pid` field; there is no per-attempt pid file. Comment a start line on the issue: ISO timestamp, runner identity, worktree path. If labelling fails because someone else already claimed it, abandon the attempt directory and skip to the next issue.
2. Worktree. Resolve the base branch with precedence lock > pin > main (ADR 0031): the primary checkout's branch-lock value (`.red/tmp/branch-lock.yaml`, written by the branch-lock skill) wins when set; else the pinned branch (ADR 0008: the issue's own `branch:` line, else its parent PRD's); else `main`. (`{pinned}` below denotes this resolved base.) Then `git -C primary fetch origin {pinned}` and `git worktree add .red/tmp/workers/{id}/{N}-a{n}/worktree -b afk/{id}/{N}-{slug} origin/{pinned}` from the primary checkout. The worktree lives inside the gitignored `.red/tmp/` tree so it never appears in `git status` for `main`. Immediately after worktree creation the runtime mirrors the new branch on origin (`git push origin -u HEAD:refs/heads/afk/{id}/{N}-{slug} --force-with-lease`) and installs a per-worktree `post-commit` hook that fire-and-forgets a `git push origin HEAD --force-with-lease` after every inner-agent commit. Both calls are best-effort: a network/auth failure logs a `warn:` line and the iteration continues; the `afk-attempts/*` failure-push net (see `docs/ENVELOPE.md`) still fires on terminal failure. Net effect: `afk/{id}/{N}-{slug}` is a remote-tracked branch throughout the iteration, so a SIGKILL anywhere from here on preserves the diff on origin without manual recovery.
3. Handoff file. Materialise the handoff into `.red/tmp/workers/{id}/{N}-a{n}/handoff.md` using the template below. Top-level XML wrappers (`<issue-body>`, `<previous-attempts>`, `<prior-attempt-context>`, `<human-guidance-thread>`, `<agent-notes>`) keep the issue body, orchestrator-authored prior attempts, the restart-informed retry block, human comments, and the inner-agent scratchpad unambiguous. `<issue-body>` carries the issue body verbatim (including the `## Agent brief` section written by `/triage`). The handoff file lives one level above the worktree so the inner agent reads it via `../handoff.md` from inside the worktree, and so it survives a worktree wipe on retry.
    - Restart-informed retries (PRD #244, issue #255). On a terminal failure the orchestrator writes two marker files into the failing attempt dir: `snapshot-branch.ref` (the `afk-attempts/{id}/{N}-{slug}` ref it pushed to) and `failure.reason` (the envelope summary). On the next attempt, the runtime reads those markers before the current attempt dir is created, so it sees the prior attempt's state; the handoff builder then fetches that snapshot branch into the worktree under the local ref `refs/afk/prior-attempt` and emits a `<prior-attempt-context>` element carrying `prev-snapshot-branch`, the verbatim `prev-failure-reason`, and `prev-fetched-ref`. The retry still branches fresh off the base (step 2 is unchanged), so a wrong prior approach never compounds; the fetched ref is read-only history for the inner agent to inspect. First attempts skip all of this and are byte-for-byte unchanged.
4. Local heartbeat marker. Write one `[heartbeat] iteration started for #N` line to `afk.log`. Slice D retired the periodic GitHub-comment heartbeat (:one: :two: :three: :four:); local liveness is now signalled by the inner-agent stdout stream tee'd into `afk.log` plus state-file mtime, both of which already exist.
5. Inner agent. Drive the inner agent via the single sandcastle `runAgent` call (ADR 0033, Execution Substrate above): the handoff file is the `promptFile`, the resolved runner/model selects the provider, the resolved sandbox mode selects the isolation backend, and the worker branch is the `branchStrategy` target forked off the base resolved in step 2. The optional `--request`/`-r` special user request block is materialised into the handoff. sandcastle captures the agent's stream (surfaced through the `onAgentStreamEvent` callback, which AFK fans out to `agent.log.jsonl` + the firehose) and detects the `<promise>DONE|BLOCKED</promise>` completion signal; AFK reads stages off that stream (see `docs/ENVELOPE.md`). The call's termination bounds (`idleTimeoutSeconds`, `maxIterations`, and the commit-anchored attempt guard) are documented under Attempt Completion & Termination Bounds.
6. Inner result.
    - Inner committed and emits `<promise>DONE</promise>` → continue to feedback loops.
    - Inner emits `<promise>BLOCKED</promise>` plus notes appended to the handoff file → comment the blocker on the issue, re-label `ready-for-human`, drop the worktree, go to next issue.
    - Inner emits `<promise>NO MORE TASKS</promise>` from inside one iteration → ignored. That sentinel is for the outer loop.
    - Runner-exhausted signal (rate limit / quota error string per runner) → without `--fallback-runner`, terminate this issue as Attempt Outcome `exhausted`; route it through bounded recovery (`blocked:quota`, retry under `RED_AFK_RETRY_QUOTA`, escalate at/over cap). With `--fallback-runner`, keep the same worktree and handoff, swap runner once, and only route `exhausted` if the swapped runner also exhausts.
7. Feedback loops. In the worktree, derive relevant package scopes from the worker branch diff against the pinned base, then run `test`, `typecheck`, `lint`, and `build` with `pnpm -C <scope>` for each touched package that declares the script. Root-only repos keep using the root package. Any missing script is reported as an explicit per-scope skip in the validation section. Any failure blocks the merge and flips the issue to `ready-for-human` with the validation report in the blocker envelope.
8. Merge. All steps target the base branch resolved in step 2 (`{pinned}`, defaults to `main`). The integration prelude is shared; landing is lock-toggled by the branch-lock state (ADR 0030).
    - Primary dirty? Auto-stage and commit `chore(afk): pre-merge snapshot for #N` in primary. Never `git stash`. Never `git checkout -- .`.
    - `git -C primary fetch origin {pinned}`. The primary checkout is pinned to `main` by the precheck; when `{pinned}` is not `main`, switch the primary checkout onto it for the merge (creating the local branch from `origin/{pinned}` if needed) and restore it to `main` on every exit path.
    - Integrate the fetched tip into local `{pinned}` before merging: fast-forward when local is strictly behind, otherwise rebase local commits onto `origin/{pinned}`. Without this the worker branch merges onto the stale boot-time HEAD and the push is rejected non-fast-forward whenever origin moved mid-run. If integration fails (diverged history that won't rebase), abort the merge and route the `merge-conflict` outcome through bounded recovery.
    - Capture the integrated tip (`pre_merge_sha`) for rollback, then land per lock state:
        - Locked (`.red/tmp/branch-lock.yaml` present; `{pinned}` is the locked branch): `git -C primary merge --no-ff afk/{id}/{N}-{slug} -m "merge: #{N} {title}"` directly into the local locked branch, then `git -C primary push origin {pinned}`. Nothing reaches `main`; promoting the locked branch to `main` is the operator's call. Conflict → one-shot self-resolve; still-conflicting → `git merge --abort` → bounded `merge-conflict` recovery. Push rejected → roll back to `pre_merge_sha` → bounded `merge-conflict` recovery.
        - Unlocked: land via an admin-merged PR. Force-push the attempt branch's final state to origin, open (or reuse) a PR `--base {pinned} --head afk/{id}/{N}-{slug}`, then `gh pr merge --admin --merge`. The PR is the durable per-attempt history; it survives the branch deletion in step 11. No completed work reaches `{pinned}` except through this admin-merge. Then fast-forward local `{pinned}` to the PR merge commit so the closing envelope's `merge_sha` is correct.
            - CI-aware merge (#812, `afk.merge.ci_aware: true`). On an `enforce_admins` base the admin-merge cannot bypass required status checks, so admin-merging a just-opened PR with checks pending is rejected. When `ci_aware` is on, after opening/reusing the PR poll `gh pr view --json mergeStateStatus,statusCheckRollup` on a bounded loop (budget `RED_AFK_MERGE_CI_TIMEOUT_S`, default 1800s) until the PR settles, then `gh pr merge --admin --merge` only once `mergeStateStatus == CLEAN` (or it is `BLOCKED` solely by a required review, which `--admin` waives). Route the distinct failure modes instead of collapsing all to `merge-conflict`: a real git conflict / `DIRTY` / `BEHIND` → `merge-conflict` (bounded recovery, which is correct here); a failed required check → `ci-failed` (`blocked:ci`); checks still pending at the timeout → `ci-pending` (`blocked:ci`). `ci-failed` / `ci-pending` leave the PR open and escalate to `ready-for-human`; they never re-run the inner agent for already-complete work (see Typed Failure Labels And Recovery Caps). With `ci_aware` off (the default), a push/create/admin-merge failure routes through bounded `merge-conflict` recovery as before.
- Push. Folded into step 8: the locked path pushes the locked branch over SSH (rollback on reject); the unlocked path's push is the admin-merge of the PR. Either way, do not retry-loop indefinitely.
- Close. Validation comment on the issue: tests pass/fail, lint, typecheck, build, commits added, files touched. Then gh issue close N --reason completed. Remove running label. Once the close succeeds, delete the live remote branch (git push origin --delete afk/{id}/{N}-{slug}) so the remote graveyard stays tidy; the merge commit on {pinned} already carries the diff. Best-effort: a failed delete (branch protection, network) logs a warn: line and the close still completes; the orphan afk/* branch can be cleaned up later.
- Cleanup (split teardown, issue #256). Every close path, success and failure/blocker alike, always drops the heavy worktree (git worktree remove .red/tmp/workers/{id}/{N}-a{n}/worktree) while retaining the cheap artifacts (the JSONL lanes log.jsonl / agent.log.jsonl and the handoff.md) in the attempt directory for post-mortem. On DONE the merged branch is also deleted (git branch -d afk/{id}/{N}-{slug}, after the worktree is gone). The retained attempt's state file is marked not-live (pid: 0) so monitor / mirror / statusline read it as finished. No worktree survives a close; the attempt dir itself is reclaimed later by the boot-time orphan sweep's TTL or, on DONE, immediately by the completion sweep below. The remote afk/{id}/{N}-{slug} ref was deleted in step 10 on DONE; failure paths leave the remote ref intact and instead push the canonical afk-attempts/{id}/{N}-{slug} ref (see docs/ENVELOPE.md).
  - Completion sweep (issue #257). Once an issue is closed and merged, the runtime reclaims every attempt dir for that issue across all workers via the canonical .red/tmp/workers/*/{N}-a* glob, not just the worker that completed it. The split-teardown retention only buys time for the orphan-sweep TTL; a completed issue needs none of it, so its retained dirs (including this worker's just-closed one) go now. A live worker's active attempt (one whose own state file still carries a live pid) is always skipped, though the claim lock makes a live duplicate of a just-completed issue unlikely.
- Tick. Update state file. Recompute ETA from rolling average of last 3 issue durations. Print one summary line: finished {done}/{total} ({pct}%) — next: #{next}.
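The CI-aware routing in the Merge step can be read as a small decision function. The following is a minimal sketch, not the bundle's implementation: the function name and its boolean inputs are hypothetical, and it assumes the caller has already reduced the gh pr view --json mergeStateStatus,statusCheckRollup output to those inputs.

```python
from typing import Optional


def route_ci_aware_merge(
    merge_state: str,
    required_check_failed: bool,
    timed_out: bool,
    blocked_only_by_review: bool = False,
) -> Optional[str]:
    """Return "merge", a failure outcome, or None to keep polling (hypothetical helper)."""
    # A real git conflict or a stale base is the one case where
    # bounded merge-conflict recovery is the right route.
    if merge_state in ("DIRTY", "BEHIND"):
        return "merge-conflict"
    # A failed required check escalates; the inner agent is not re-run.
    if required_check_failed:
        return "ci-failed"
    # CLEAN merges. BLOCKED merges only when a required review is the
    # sole blocker, because --admin waives that.
    if merge_state == "CLEAN" or (merge_state == "BLOCKED" and blocked_only_by_review):
        return "merge"
    # Checks still pending when the polling budget runs out.
    if timed_out:
        return "ci-pending"
    return None
```

Keeping the three failure outcomes distinct is the point of the design: only merge-conflict re-enters recovery, while ci-failed and ci-pending leave the PR open for a human.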
Runner Fallback
Default behaviour is no rotation and no fallback — the runner resolved by the detection cascade (see Bootstrap step 4) is used for every issue in the run. RUNNER_EXHAUSTED is first handled as the per-issue Attempt Outcome exhausted: the issue gets blocked:quota, returns to ready-for-agent while under RED_AFK_RETRY_QUOTA, and escalates to ready-for-human at/over the cap. The outer session then stops the drain and returns exit 75 (EX_TEMPFAIL) so a supervisor can retry later instead of treating runner quota as a clean queue drain. Both rotation/fallback behaviours are opt-in:
- --alternate re-enables round-robin rotation between consecutive issues (claude → codex → claude → …). Mutually exclusive with --runner.
- --fallback-runner re-enables mid-issue swap when the active runner returns RUNNER_EXHAUSTED. Without it, exhaustion is terminal for the current runner invocation and routes through bounded recovery as blocked:quota.
Exhaustion detection lives in runner-claude.md, runner-codex.md, and runner-opencode.md — they own the per-runner error strings. The orchestrator only sees RUNNER_EXHAUSTED as a structured signal. Note opencode is an API-auth runner; the auth key rides in OpenCodeOptions.env and the model slug's leading segment (openai/, minimax/, openrouter/...) tells OpenCode which endpoint to dispatch to. See runner-opencode.md Auth env precedence for the env-var order (OPENAI_API_KEY > MINIMAX_API_KEY > OPENROUTER_API_KEY). In an API-key-only lane with no host session, run it without --fallback-runner so exhaustion is terminal-through-recovery rather than a swap to a session-auth runner that is not present.
When swap happens mid-issue (only with --fallback-runner), the same worktree and handoff file are reused; the new runner sees the previous agent's Notes appended.
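As an illustration of the default (no fallback) exhaustion path described above, here is a minimal sketch. The function name and the returned dict shape are hypothetical, and it assumes the failure counter already includes the current exhaustion, so "at/over the cap" means quota_failures >= retry_quota.

```python
EX_TEMPFAIL = 75  # exit code the outer session returns on runner exhaustion


def route_exhausted(quota_failures: int, retry_quota: int) -> dict:
    """Pick the queue label for an exhausted attempt (hypothetical helper).

    quota_failures is assumed to count the current failure as well.
    """
    # Under the cap the issue goes back to the agent queue; at or over
    # the cap it escalates to a human.
    queue = "ready-for-agent" if quota_failures < retry_quota else "ready-for-human"
    # Either way the drain stops with EX_TEMPFAIL so a supervisor can retry later.
    return {"labels": ["blocked:quota", queue], "exit_code": EX_TEMPFAIL}
```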
Attempt Completion & Termination Bounds (<promise> is canonical — ADR 0028)
The <promise>…</promise> sentinel the inner agent emits is the canonical "attempt is over" signal. AFK registers <promise>DONE</promise> and <promise>BLOCKED</promise> as sandcastle completionSignals, so sandcastle stops re-invoking the agent the moment one is observed (line-anchored, so the agent quoting the sentinel in planning prose does not false-positive). sandcastle owns the stream read and signal detection: there is no hand-rolled foreground pipe reader, no recursive SIGTERM/SIGKILL of a claude | jq | grep | tee pipeline, and no RED_AFK_ATTEMPT_GRACE_S / RED_AFK_ATTEMPT_KILL_S / RED_AFK_WATCHDOG_GRACE_S tear-down knobs. This is the architecture fix flagged during the #216 bash-hang diagnosis ("we have to be more sensitive to the result of the promise"): the completion signal is the terminator, and the substrate enforces it.
runAgent maps the returned completionSignal to an outcome: <promise>DONE</promise> → done, <promise>BLOCKED</promise> → blocked, no signal → no-sentinel. The completion signal is the real terminator — a normal issue finishes in 1-3 iterations — but three independent bounds cap a run that never signals so a stuck agent cannot burn cycles forever:
- idleTimeoutSeconds (default 600 s, env RED_AFK_IDLE_TIMEOUT_S): sandcastle's per-iteration silence watchdog. An iteration producing no stream output for this long is aborted. This is the actual termination bound on a quiet hang.
- maxIterations (default 12, env RED_AFK_MAX_ITERATIONS): the sandcastle Orchestrator re-invocation ceiling (issue #322). sandcastle's own default is 1, which would cut the agent off after a single agentic invocation before it can emit DONE; AFK raises it so the completion signal stays the terminator while bounding repeated no-sentinel failures. A non-numeric / zero / negative value (env or config) is ignored and falls back to the default, so a typo can never disable the cap or pin the agent to 1.
- Commit-anchored attempt guard (default 2700 s, env RED_AFK_ATTEMPT_TIMEOUT_S / afk.attempt_timeout, ADR 0044/0045): proof-of-progress. A run that stays busy (re-exploring, re-running tests) without landing a new commit within the cap is aborted, resetting on every commit. This catches the "productive infinite loop" that idleTimeoutSeconds misses because the agent is never silent. It maps to a timeout outcome → blocked:stalled / ready-for-human, preserving the worktree/PR. Armed only under none (no-sandbox) isolation, where the worker branch's commits land in the shared .git so HEAD advance is observable; under docker/podman the commits are not host-visible until final sync, so a commit-anchored guard would false-fire and is skipped (idle timeout + maxIterations still apply). The fleet hard stall reaper (see Fleet Mode) is separately gated by the active-vitest/tsc/cargo-descendant + flat-cpu predicate, so a worker mid-build/test is never killed for being idle on the agent lane.
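The "ignored and falls back to the default" rule for these bounds can be sketched as follows. This is an illustrative helper, not the runtime's code: the function name and the optional env mapping parameter are assumptions; only the env variable names and defaults come from the list above.

```python
import os
from typing import Mapping, Optional


def positive_int_env(name: str, default: int, env: Optional[Mapping[str, str]] = None) -> int:
    """Read a positive integer bound, falling back to default (hypothetical helper)."""
    source = os.environ if env is None else env
    raw = source.get(name, "")
    try:
        value = int(raw.strip())
    except ValueError:
        # Non-numeric or empty: ignore the override.
        return default
    # Zero and negative values are ignored too, so a typo cannot
    # disable the cap or pin the agent to a single iteration.
    return value if value > 0 else default
```

For example, positive_int_env("RED_AFK_MAX_ITERATIONS", 12) would yield 12 when the variable is unset, "0", "-3", or "abc".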
No sentinel is on_attempt_error. When sandcastle's run completes with no completion signal, the agent never declared the attempt over (crash, kill, or a daemon that ended without speaking): the outcome is no-sentinel, on_attempt_error fires (error class no-sentinel), and post_attempt does not fire for that invocation. The issue routes through bounded blocked:crashed recovery. With the default RED_AFK_RETRY_CRASH=1, the first such failure escalates to ready-for-human; a higher cap can requeue it first. Runner exhaustion (RUNNER_EXHAUSTED, detected by matching the per-runner quota/rate-limit strings against the thrown sandcastle error) stays out of the sentinel channel — it keeps its own exhausted outcome and the --fallback-runner swap. A transient runner transport/setup failure maps to runner-transient and is bounded by AFK's retry policy rather than escaping as a crash.
The parsed outcome rides into the post_attempt mutable context as result.outcome and the RED_AFK_RESULT_OUTCOME env var, so hooks (and the Memory attempt.hooks record, #216) see the agent-authored exit, not just success/fail.
Preventive counterpart lives in AGENT-PROMPT.md under Background Tasks and Polling — inner agents are required to cap every polling loop with a deadline. The termination bounds are the safety net; the prompt rule is the design.
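To make the signal-to-outcome mapping concrete, here is a minimal sketch. sandcastle owns the real stream read and signal detection, so the regular expression below is only an assumption about what "line-anchored" could mean (the sentinel alone on a line with no leading text), and the function name is hypothetical.

```python
import re

# Assumed anchoring: the sentinel must be the only content on its line.
_SENTINEL = re.compile(r"^<promise>(DONE|BLOCKED)</promise>\s*$", re.MULTILINE)

_OUTCOMES = {"DONE": "done", "BLOCKED": "blocked"}


def outcome_from_stream(text: str) -> str:
    """Map agent output to done / blocked / no-sentinel (hypothetical helper)."""
    match = _SENTINEL.search(text)
    if match is None:
        # The agent never declared the attempt over.
        return "no-sentinel"
    return _OUTCOMES[match.group(1)]
```

Because the pattern is anchored to the start of a line, prose such as "I will emit <promise>DONE</promise> when finished" does not count as a completion signal in this sketch.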
Liveness & stall protection
Local liveness = the clean agent.log.jsonl lane + the firehose + state-file mtime + a per-minute orchestrator heartbeat (the GitHub-thread heartbeat was retired in Slice D). A solo run is guarded by the commit-anchored attempt-progress guard (#400) and the lane-idle reaper (#363), both armed only under no-sandbox isolation. Details: docs/LIVENESS.md.
Terminal-event envelope, stages & state file
Every terminal event posts exactly one structured <details data-attempt-status=…> comment (the canonical record). Stages are read off the sandcastle stream; the terminal header redraws every 3s; per-attempt state lives in afk.state.json. Schemas + the Attempt-Outcome→status mapping: docs/ENVELOPE.md.
Auto-Monitor Loop (Claude Code only — binding)
When /afk is invoked to spawn a worker (i.e., not the monitor subcommand), the agent additionally schedules a recurring /dev:afk monitor cron inside the current Claude Code session so the user sees progress without re-typing. Death of every worker auto-cancels the cron.
Setup (runs immediately after the run worker is launched in the background):
- Fetch CronCreate and CronList via ToolSearch if not already loaded (they are deferred tools).
- CronList: if any existing job has prompt == "/dev:afk monitor", skip step 3 (don't double-schedule when the user runs a second parallel /afk in the same session).
- CronCreate(cron="*/10 * * * *", prompt="/dev:afk monitor", recurring=true). The cron is session-only: it dies when the Claude Code session ends, so no risk of orphans across sessions. Auto-expires after 7 days regardless.
- Tell the user one line: monitor loop scheduled (every 10 min) — auto-cancels when all workers exit.
The monitor invocation handles its own teardown — see Self-Cancel under the Monitor section below.
Skip the auto-loop when:
- The invocation is /afk monitor (not a worker spawn).
- The invocation is /afk --once (single supervised iteration; user is already watching).
- CronCreate is unavailable (not running under Claude Code, e.g. Codex). Print one line and continue: monitor loop unavailable in this runner; tail .red/tmp/workers/*/*/afk.log manually.
Codex Monitor Agent (Codex only — binding)
Codex does not expose Claude Code's TaskCreate / TaskUpdate task surface, and
its tui.status_line only renders built-in footer widgets. It does expose a
native sub-agent UI in hosts where the sub-agent primitive is available. For
Codex runs, use that sub-agent UI as a read-only presentation layer over the
canonical /afk monitor dashboard.
When /afk launches a normal detached worker under Codex (run, not
monitor, not --once, not --boot-only):
- Fetch a sub-agent spawn primitive via ToolSearch (query: spawn agent background monitor).
- If unavailable, continue the worker launch and print: monitor loop unavailable in this runner; run /dev:afk monitor or tail .red/tmp/workers/*/*/afk.log manually.
- If available, emit the canonical prompt from the bundle:
  RED_AFK_RUNNER=codex node "$CODEX_PLUGIN_ROOT/skills/engineering/afk/bin/afk.mjs" codex-monitor-agent --project-root "$PWD" --mode run
  Spawn exactly one monitor agent with that prompt. The monitor agent is a presentation consumer only: it periodically runs /dev:afk monitor --once, reports concise progress in the Codex UI, and exits once no supervisor or live workers remain.
- Tell the user one line: Codex monitor agent spawned — auto-closes when AFK exits; manual monitor: /dev:afk monitor.
Hard boundaries for the monitor agent are non-negotiable: it must never edit
files, claim issues, change labels, comment, stop workers, run validation, push,
merge, /ship, /hitl, /triage, /afk run, /afk fleet, /afk fleet stop,
/afk reap, or /afk requeue. Closing it manually must not affect the AFK
worker.
Fleet Mode (runner-portable — binding)
/dev:afk fleet [N] and /dev:afk fleet stop are the user-facing fleet commands. They let one terminal command spin up (or shut down) N concurrent run workers on the current checkout, with the supervisor handling:
- Respawn.
- The circuit breaker.
- The passive stall detector. It samples each slot's per-attempt agent lane agent.log.jsonl mtime (the clean liveness signal) every RED_AFK_STALL_POLL_S=30s and flags any slot alive ≥ RED_AFK_STALL_THRESHOLD_S=600 whose agent lane has been idle ≥ the same; this surfaces as ⏸️ stalled in /dev:afk monitor. It keys off the agent lane, never afk.log/log.jsonl, because the orchestrator heartbeat writes those every minute and would mask a real stall (the masking that defeated detection in #243).
- The hard stall reaper. A slot silent on the agent lane past RED_AFK_STALL_KILL_THRESHOLD_S=1800 is only a candidate: the irreversible kill is gated behind a reaper-signal predicate. A worker mid-build/test (an active vitest/tsc/cargo/… descendant under its tree, or non-trivial aggregate cpu) is busy and left alone. A genuinely stuck worker (idle past the threshold, no active descendant, flat cpu) is killed tree-wide, a data-attempt-status="no-sentinel" envelope is posted with the attempt-dir afk.log tail, the issue label is rotated back to ready-for-agent, the worktree + attempt dir are removed, and the slot is freed for the next health-check respawn. RED_AFK_STALL_KILL_THRESHOLD_S must be strictly greater than RED_AFK_STALL_THRESHOLD_S, validated at supervisor boot.
- Per-slot build isolation.
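The two stall mechanisms above can be summarised as a pure classification. This is a sketch under stated assumptions: the function names, the returned strings, and the idea that the supervisor has already measured the four inputs are all hypothetical; only the thresholds and the predicate come from the text.

```python
def validate_thresholds(stall_s: int, kill_s: int) -> None:
    """Boot-time check: the kill threshold must be strictly greater."""
    if kill_s <= stall_s:
        raise ValueError("RED_AFK_STALL_KILL_THRESHOLD_S must exceed RED_AFK_STALL_THRESHOLD_S")


def classify_slot(
    alive_s: float,
    lane_idle_s: float,
    has_active_descendant: bool,
    cpu_flat: bool,
    stall_s: int = 600,
    kill_s: int = 1800,
) -> str:
    """Return "ok", "stalled", or "kill" for one slot (hypothetical helper)."""
    # Hard reaper: idle past the kill threshold AND no build/test
    # descendant AND flat cpu. A busy worker never reaches this branch.
    if lane_idle_s >= kill_s and not has_active_descendant and cpu_flat:
        return "kill"
    # Passive detector: only flags the slot, it never kills.
    if alive_s >= stall_s and lane_idle_s >= stall_s:
        return "stalled"
    return "ok"
```

In this sketch a worker that is idle on the agent lane for 2000 s but still has an active test runner is reported as stalled, not killed, which mirrors the "only a candidate" wording.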
Worker env passthrough. Any RED_AFK_* variable exported in the operator's shell before /dev:afk fleet is auto-forwarded to every worker the supervisor spawns. Use this for worker-side toggles like RED_AFK_SKIP_PERF=1 or RED_AFK_SKIP_COMPETITIVE_BASELINE=1 without writing a hook. Internal supervisor knobs (RED_AFK_TARGET, RED_AFK_POLL_S, RED_AFK_STALL_*, RED_AFK_CIRCUIT_*, RED_AFK_RUNNER, RED_AFK_REQUEST, RED_AFK_PLUGIN_DIR) and the per-slot *_BASE build-isolation vars are excluded — they have dedicated wiring and the supervisor denylists them from passthrough. The supervisor re-pins RED_AFK_RUNNER=<runner> for each worker.
$ export RED_AFK_SKIP_PERF=1
$ export RED_AFK_SKIP_COMPETITIVE_BASELINE=1
$ /dev:afk fleet 1 # every worker sees both vars
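The passthrough rule can be sketched as a filter over the operator's environment. This is illustrative only: the function name is hypothetical, and treating every name ending in _BASE as a build-isolation variable is an assumption, since the exact names of those variables are not listed here.

```python
from typing import Dict, Mapping

# Supervisor knobs named in the paragraph above.
DENY_EXACT = {
    "RED_AFK_TARGET",
    "RED_AFK_POLL_S",
    "RED_AFK_RUNNER",
    "RED_AFK_REQUEST",
    "RED_AFK_PLUGIN_DIR",
}
DENY_PREFIXES = ("RED_AFK_STALL_", "RED_AFK_CIRCUIT_")


def worker_env(operator_env: Mapping[str, str], runner: str) -> Dict[str, str]:
    """Build the RED_AFK_* slice forwarded to a worker (hypothetical helper)."""
    forwarded = {
        key: value
        for key, value in operator_env.items()
        if key.startswith("RED_AFK_")
        and key not in DENY_EXACT
        and not key.startswith(DENY_PREFIXES)
        and not key.endswith("_BASE")  # assumption: per-slot *_BASE vars
    }
    # The supervisor re-pins the runner for every worker.
    forwarded["RED_AFK_RUNNER"] = runner
    return forwarded
```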
Fleet mode is runner-portable: the supervisor is plain process orchestration, not a Claude Code primitive. Claude Code, Codex, and bare terminals may all launch and stop the supervisor when the normal AFK hard preconditions pass. Runner-specific observability degrades independently:
- Claude Code: schedule the auto-monitor cron when CronCreate / CronList are available; if not, launch fleet anyway and print: monitor loop unavailable in this runner; run /dev:afk monitor or tail .red/tmp/afk-supervisor.log manually.
- Codex: launch fleet with RED_AFK_RUNNER=codex, skip cron, and spawn one read-only Codex monitor agent from the bundle's codex-monitor-agent --mode fleet prompt when a sub-agent primitive is available. If no sub-agent primitive is available, launch fleet anyway and print the same manual-monitor guidance.
- Bare terminal / unknown runner: launch fleet, skip cron/native monitor, and print the manual-monitor guidance.
/dev:afk fleet [N] — launch
N is optional and defaults to 2. Parse it as a non-negative integer; reject anything else (including stop, which is the other subcommand and routes below). Steps the agent must perform, in order:
- Resolve runner. Determine the active runner using the same intent as the normal AFK cascade: explicit user --runner if present, else RED_AFK_RUNNER, else runner env/process/path signals, else claude. The resolved value is carried into the supervisor as RED_AFK_RUNNER=<runner> so detached workers do not fall through to the supervisor's historical claude fallback. Under Codex, this must resolve to codex.
- PID-file pre-check. Read .red/tmp/afk-supervisor.pid. If it exists and kill -0 <pid> succeeds, refuse the launch:
  ✗ fleet already running (supervisor pid=<pid>, log .red/tmp/afk-supervisor.log).
    to stop it: /dev:afk fleet stop
  Do not touch the file or attempt to recover. A stale PID file (file exists but kill -0 fails) is left alone; the fleet command clears it itself when it acquires the supervisor lock.
- Launch the fleet. From the project root, run the bundle's fleet command with the target and any flags:
  RED_AFK_RUNNER=<runner> node "$CLAUDE_PLUGIN_ROOT/skills/engineering/afk/bin/afk.mjs" fleet <N> [--request <text>]
  The command performs the PID-file pre-check from step 2 itself (refusing if a live supervisor already runs), detaches the supervisor, and forwards the resolved runner and the --request / -r text to every worker it spawns. It waits up to 3 s for .red/tmp/afk-supervisor.pid to appear and contain a live PID, then prints the launched supervisor PID and target; on failure it reports the tail of .red/tmp/afk-supervisor.log. Capture the reported PID for the Report back step. The launched supervisor is the native __supervise entrypoint of the same bundle.
- Attach the best available monitor surface.
  - Claude Code: same flow as Auto-Monitor Loop. CronList first to deduplicate, then CronCreate(cron="*/10 * * * *", prompt="/dev:afk monitor", recurring=true). If cron tools are unavailable, skip and use the manual-monitor line.
  - Codex: fetch a sub-agent spawn primitive via ToolSearch (query: spawn agent background monitor). If available, emit the canonical prompt with RED_AFK_RUNNER=codex node "$CODEX_PLUGIN_ROOT/skills/engineering/afk/bin/afk.mjs" codex-monitor-agent --project-root "$PWD" --mode fleet and spawn exactly one read-only Codex monitor agent for this newly-launched supervisor. Its task: from the project root, periodically run /dev:afk monitor --once (the bundle's monitor --once), report concise progress, and auto-close when .red/tmp/afk-supervisor.pid is missing/dead and no [live] workers remain. It must never edit files, claim issues, stop workers, or run merges. The user may close it manually; workers continue. If the primitive is unavailable, skip and use the manual-monitor line.
  - Bare/unknown: skip native monitor setup and use the manual-monitor line.
- Report back. Print:
  🚀 fleet launched (supervisor pid=<pid>, target=<N>)
  log: .red/tmp/afk-supervisor.log
  stop: /dev:afk fleet stop
  <monitor-status-line>
  Monitor status line choices:
  - Claude cron scheduled: monitor loop scheduled (every 10 min) — auto-cancels when all workers exit.
  - Claude cron already existed: monitor loop already running (existing cron <id>).
  - Codex monitor agent spawned: Codex monitor agent spawned — auto-closes when fleet exits; manual monitor: /dev:afk monitor.
  - Native monitor unavailable: monitor loop unavailable in this runner; run /dev:afk monitor or tail .red/tmp/afk-supervisor.log manually.
/dev:afk fleet stop — graceful shutdown
Steps, in order:
- Liveness check. Read .red/tmp/afk-supervisor.pid. The three cases:
  - File missing → print "no fleet running." and continue to step 3 (still try runner-specific monitor teardown).
  - File present but kill -0 fails → stale. Print "no fleet running (stale pid file at .red/tmp/afk-supervisor.pid — cleaning).", rm -f it, and continue to step 3.
  - File present and PID alive → continue to step 2.
- Touch the stop file. touch .red/tmp/afk-supervisor.stop. The supervisor's health-check cycle (default RED_AFK_POLL_S=15s) picks it up and runs cleanup, which SIGTERMs every worker, removes the PID file, removes the stop file, and exits. Wait up to 30 s for the PID file to disappear (poll every 1 s, deadline-bounded, never a bare while). If it's gone, print "🛑 fleet stopped (supervisor pid=<pid> exited).". If the deadline trips, print one warning line naming the PID and the log path, and continue to step 3 anyway; the stop file is still there and the supervisor will pick it up eventually.
- Tear down runner-specific monitors.
  - Claude Code: CronList → find every job whose prompt == "/dev:afk monitor" (there will normally be one, possibly zero, occasionally more if the user manually /loop-ed). CronDelete each. Print one line: "auto-monitor cron cancelled (<count> entr{y,ies})." (or "no auto-monitor cron to cancel." when count is zero). If cron tools are unavailable, print "auto-monitor cron unavailable in this runner; skipped."
  - Codex: do not stop workers through the monitor agent. It auto-closes when it observes no supervisor/live workers, and the user may close it manually. Print "Codex monitor agent will self-close when it observes fleet stopped."
  - Bare/unknown: print "no native monitor teardown for this runner."
- Idempotency. Re-running /dev:afk fleet stop after a successful stop just hits the "file missing" branch in step 1 and the runner-specific teardown no-op in step 3. Exit 0 either way.
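The liveness check and the deadline-bounded wait from the stop steps can be sketched as below. This is a POSIX-only illustration (os.kill with signal 0 mirrors kill -0); the function names and return strings are hypothetical, and treating an unparseable PID file as stale is an assumption.

```python
import os
import time
from pathlib import Path


def pid_alive(pid: int) -> bool:
    """Equivalent of kill -0 <pid> on POSIX."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def supervisor_state(pid_file: Path) -> str:
    """Classify the PID file as "missing", "stale", or "alive"."""
    try:
        raw = pid_file.read_text().strip()
    except FileNotFoundError:
        return "missing"
    try:
        pid = int(raw)
    except ValueError:
        return "stale"  # assumption: garbage content counts as stale
    return "alive" if pid > 0 and pid_alive(pid) else "stale"


def wait_until_gone(pid_file: Path, deadline_s: float = 30.0, poll_s: float = 1.0) -> bool:
    """Poll until the PID file disappears or the deadline passes."""
    deadline = time.monotonic() + deadline_s
    while time.monotonic() < deadline:
        if not pid_file.exists():
            return True
        time.sleep(poll_s)
    return not pid_file.exists()
```

The loop is bounded by a monotonic deadline, so it cannot spin forever if the supervisor never removes its PID file.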
Circuit Trip Sweep
When the circuit breaker parks a slot (CIRCUIT_K fast deaths inside CIRCUIT_WINDOW_S) the supervisor — not a human — runs sweep_parked_slot to clean up after the burned workers. Three actions, in order, gated on the trip:
- Sweep affected attempt dirs. From the slot log (afk-supervisor-slot-{slot}.log) the supervisor parses every [afk] worker: w… boot stamp emitted while the slot was alive, globs .red/tmp/workers/{wid}/*/ for each ID, and reads afk.state.json's .current.number to identify the affected issues. Each attempt dir is rm -rf'd after its issue has been processed.
- Post a discard envelope on each affected issue. Same <details data-attempt-status="…"> schema as the per-issue terminal envelope, with status="discarded" and a summary line that names the runner and the trip cause (runner-broken, slot parked after K fast deaths). The envelope's data-section="summary" block carries the slot index, comma-joined worker IDs, fast-death count, and the supervisor log path. No notes, drop, or log sections: the attempts produced no usable artefacts.
- Restore label state on each affected issue. Single gh issue edit adds ready-for-agent and runner-error, removes ready-for-human and (defensively) running. This covers both the "issue had already been promoted to ready-for-human" path and the "issue was still running at the moment of trip" path.
The runner-error label is created idempotently by /setup-red-skills (see triage-labels.md). The supervisor still calls gh label create runner-error on the fly during a trip so cleanup never fails just because the label is missing.
Idempotency: SLOT_SWEPT[slot]=1 blocks a second sweep within the same supervisor lifetime. Across restarts a new trip yields fresh worker IDs and fresh attempt dirs, so re-tripping never re-touches the previously swept issues. A trip that finds no claimed issues (all workers exited before claiming) parks the slot but posts no envelopes — the attempt-dir sweep is a no-op.
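A minimal sketch of the trip condition and the label-restore call follows. Both function names are hypothetical. The sketch assumes the caller passes only timestamps of deaths already judged "fast" (that criterion is not defined in this section), and it builds the gh issue edit argument list with the CLI's --add-label / --remove-label flags without running it.

```python
from typing import List


def circuit_tripped(fast_death_epochs: List[float], k: int, window_s: float, now: float) -> bool:
    """True when at least k fast deaths fall inside the trailing window."""
    recent = [t for t in fast_death_epochs if 0 <= now - t <= window_s]
    return len(recent) >= k


def restore_labels_argv(issue: int) -> List[str]:
    """Argument list for the single label-restoring edit (not executed here)."""
    return [
        "gh", "issue", "edit", str(issue),
        "--add-label", "ready-for-agent,runner-error",
        # running is removed defensively, whether or not it is present.
        "--remove-label", "ready-for-human,running",
    ]
```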
Refs
- The bundle's fleet / fleet stop commands: the entrypoints this section drives. Stop-file path, env contract, circuit breaker, and trip-sweep are part of the supervisor behaviour described above.
- Auto-Monitor Loop above: the cron lifecycle Fleet Mode hooks into.
- Self-Cancel under Monitor: the dual teardown path (cron tears itself down when no workers remain; fleet stop tears it down immediately).
Monitor
BINDING — every monitor tick must do BOTH of the following, in order. No shortcuts.
- Render the dashboard (the bundle's monitor --once).
- Mirror live workers onto the host runner's native task surface. Per-runner mapping:
  - Claude Code: pipe the tracked-task JSONL into the bundle's monitor --mirror-plan and apply the emitted call plan via TaskCreate (one task per live worker, titled #<n> w<id> — <title>) and TaskUpdate (description carries stage:<x>, terminal events flip state to completed/failed). See Task Mirror below for the full protocol.
  - Codex: run monitor --mirror-plan --runner codex. Today Codex exposes no native task surface, so the sink emits an empty plan and the mirror falls back to the dashboard plus a one-line notice. That is the mirror under Codex; do not silently skip. If Codex grows a native surface, the sink emits the same call-plan descriptors against it.
  - Bare terminal / unknown runner: skip the mirror silently; the monitor dashboard is the canonical view.

The mirror is the only way the user sees per-worker progress advance in their native UI. Skipping it (because "nothing changed" or "just answering a status question") is a bug, not a shortcut: monitor --mirror-plan is idempotent and emits zero descriptors when nothing changed.
/afk monitor is the readonly aggregated view across all live workers. Run the bundle's monitor command — do not reinvent the rendering in inline bash. It:
- Globs .red/tmp/workers/*/*/afk.state.json and renders one section per active attempt.
- Verifies liveness via the orchestrator PID recorded in afk.state.json (.pid field). Attempts whose PID is dead are flagged stale/gone and excluded, not counted as running.
- Optionally tails the sibling afk.log for the most recent line under each worker's header.
- Renders the 48h sparkline header (next subsection) on every refresh.
To invoke, from the project root:
RED_AFK_RUNNER=<runner> node "$CLAUDE_PLUGIN_ROOT/skills/engineering/afk/bin/afk.mjs" monitor
The command has two modes, auto-selected by stdout type:
- TTY (real terminal): full box-drawing layout, refreshes every 3 s, clear between frames. Ctrl-C to exit.
- Non-TTY (piped, captured by an agent, redirected): one-shot compact dashboard (one sparkline header + one line per worker), then exit 0. Force this with --once or RED_AFK_MONITOR_COMPACT=1 even from a TTY.
Compact output shape (≈3 lines total for 2 workers — fits inline without truncation in an agent transcript):
48h: ···············································█ (4 closed, peak 4/h, all workers) Δ fleet +382 -45
wZ2R4 [live] claude issues 4/5 #150 [blog/D] Agent SDK on RedDB stage:impl 00:23:01 +382 -45
wK7M2 [live] codex issues 0/16 idle +0 -0
The progress counter is issues <done>/<total> — issues closed over the queue total, not a completion percentage (the old (80%) form read as "no work done" while a worker had already committed thousands of lines). The real volume signal is the +A -R diff (committed + uncommitted, measured from the branch's merge-base with origin/main), which is rendered on every worker line unconditionally — idle and +0 -0 included — and summed across the fleet into the Δ fleet +A -R suffix on the sparkline header, so the total diff volume is always visible at a glance.
When invoking from inside another agent session (Claude Code, Codex), prefer --once even if stdout is already a pipe: explicit beats inference. Don't use the full TTY mode in agent transcripts; the 3 s refresh loop floods the captured stream and gets truncated to garbage.
Single-worker operation shows one section/line. Multi-worker adds one section/line per live worker, sorted by started_at. The sparkline aggregates all workers in this checkout's .red/state/afk-history.jsonl — not fractured per-worker; the Δ fleet diff total likewise sums every worker.
The header of every render shows a 48h sparkline of issues closed, one glyph per hour, scaled to the peak hour:
48h: ·▁··▁·▁·▁··█▁▁··▁·▁···▁·▁·▆▁▁··▁···▁▆·▁··▁▃▁·▃▁· (35 closed, peak 5/h)
Source data: .red/state/afk-history.jsonl, an append-only event log written by the orchestrator on every terminal event:
{"ts":"2026-05-17T12:14:00-03:00","epoch":1747494840,"worker":"wK7M2","issue":571,"event":"done","duration_s":816,"runner":"codex","merge_sha":"0936ba54"}
{"ts":"...","epoch":...,"worker":"wK7M2","issue":569,"event":"blocked","duration_s":120,"runner":"codex","reason":"merge-conflict"}
{"ts":"...","epoch":...,"worker":"wK7M2","issue":568,"event":"exhausted","duration_s":0,"runner":"claude","reason":"both-runners"}
.red/state/ is gitignored. The orchestrator creates it during bootstrap, parallel workers serialise appends via flock, and the boot-time orphan sweep truncates the file to the last 10000 lines if it grows past that cap.
The sparkline only counts event == "done". Blockers and exhausted runs are recorded for forensics but excluded from the throughput view.
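The aggregation behind the sparkline can be sketched as follows. This is not the bundle's renderer: the function name is hypothetical and the glyph scaling (eight block levels, rounded up relative to the peak hour) is an assumption, so the exact glyph chosen for a given count may differ from the real dashboard. Only the event == "done" filter, the one-glyph-per-hour layout, and the epoch field come from the text.

```python
import json
from typing import Iterable

GLYPHS = "▁▂▃▄▅▆▇█"  # assumed level ramp; "·" marks an empty hour


def sparkline(history_lines: Iterable[str], now_epoch: int, hours: int = 48) -> str:
    """Render closed issues per hour from afk-history.jsonl lines."""
    buckets = [0] * hours
    for line in history_lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        # Blockers and exhausted runs are kept for forensics only.
        if event.get("event") != "done":
            continue
        age_h = (now_epoch - event["epoch"]) // 3600
        if 0 <= age_h < hours:
            # The rightmost glyph is the most recent hour.
            buckets[hours - 1 - age_h] += 1
    peak = max(buckets)

    def glyph(count: int) -> str:
        if count == 0:
            return "·"
        level = -(-count * len(GLYPHS) // peak)  # ceiling division, 1..8
        return GLYPHS[level - 1]

    bar = "".join(glyph(count) for count in buckets)
    return f"48h: {bar} ({sum(buckets)} closed, peak {peak}/h)"
```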
Self-Cancel (binding when invoked under Claude Code)
Every /afk monitor run — whether typed by the user or fired by the auto-monitor cron — is responsible for tearing down the cron once there's nothing left to watch.
After rendering the dashboard, the agent must:
- Count workers with status [live] in the rendered output (i.e., orchestrator pid alive, post-orphan-cleanup).
- If live_workers == 0:
  - Fetch CronList and CronDelete via ToolSearch if not already loaded.
  - CronList: find every job with prompt == "/dev:afk monitor". There will normally be exactly one; multiples can appear if the user manually invoked /loop 3m /dev:afk monitor on top of the auto-loop.
  - CronDelete each match.
  - Append one line to the user-facing output: 🛑 no live workers — auto-cancelled monitor loop (cron <id>).
- If live_workers >= 1 or .red/tmp/afk-supervisor.pid resolves to a live PID: arm the cron when unwatched (observe path, same dedup as the spawn path):
  - Fetch CronCreate and CronList via ToolSearch if not already loaded (deferred tools).
  - CronList: if any existing job has prompt == "/dev:afk monitor", skip (cron already present; it continues firing every 10 minutes).
  - Otherwise CronCreate(cron="*/10 * * * *", prompt="/dev:afk monitor", recurring=true) and tell the user one line: monitor loop scheduled (every 10 min) — auto-cancels when all workers exit.
  - If CronCreate is unavailable (non-Claude-Code host), skip silently.
When CronList / CronDelete are unavailable (Codex runner, or /afk monitor invoked outside Claude Code), skip the teardown silently — the cron infrastructure isn't running there to begin with.
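The self-cancel and re-arm rules can be expressed as a pure planning function. This is a sketch only: the job dict shape (id and prompt keys), the tuple-based action format, and the function name are hypothetical, since the real CronList output shape is not specified here. It also assumes that a live supervisor with zero live workers still counts as something to watch, a case the steps above leave ambiguous.

```python
from typing import Dict, List, Tuple

MONITOR_PROMPT = "/dev:afk monitor"
MONITOR_CRON = "*/10 * * * *"


def plan_cron_actions(live_workers: int, supervisor_alive: bool, jobs: List[Dict[str, str]]) -> List[Tuple[str, ...]]:
    """Return the cron calls a monitor tick should make (hypothetical helper)."""
    matching = [job["id"] for job in jobs if job.get("prompt") == MONITOR_PROMPT]
    if live_workers == 0 and not supervisor_alive:
        # Nothing left to watch: delete every matching job.
        return [("delete", job_id) for job_id in matching]
    if matching:
        # Already watched: never double-schedule.
        return []
    return [("create", MONITOR_CRON, MONITOR_PROMPT)]
```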
Task Mirror And Codex Monitor Agent (binding)
Every /dev:afk monitor run also mirrors each live worker onto the runner's native task list when that runner exposes one, so a /afk session surfaces progress on the host's native UI — advancing through stages on its own, with no extra typing. This is a read-only reflection of afk.state.json; the mirror never writes state and never touches the orchestration.
Host capability matrix (binding — no parity). The Task mirror is per-runner by construction (ADR 0003/0015): there is no shared native task API across the hosts, so each runner gets its own explicit adapter, never a generic cross-runner abstraction. The honest matrix — encoded as taskMirrorCapability(host) in core/mirror.ts and exercised by tests/mirror.test.ts — is:
| Host (Agent runner) | Surface | Native task API | Adapter / sink | Today's behavior |
|---|---|---|---|---|
| Claude Code | `native-task` | yes | `mirrorPlan` (`TaskCreate`/`TaskUpdate`) | the in-session Agent runner drives the native Task mirror through the host task tools |
| Codex | `monitor-agent` | no | `codexSinkPlan` | no task API; the mirror falls back to the monitor dashboard plus one read-only Codex monitor agent |
| OpenCode runner | `headless` | no | none (empty plan) | a headless API-auth Worker with no host session; there is no surface to mirror into, so no native calls are ever emitted |
The three surfaces are deliberately distinct values (never a single supported: boolean) so the matrix can never imply parity. Exactly one host (Claude Code) exposes a native task API; the other two degrade explicitly, each on its own adapter. An unknown host fails loudly rather than silently inheriting the Claude native path.
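A minimal sketch of how such a matrix might be encoded, assuming host identifiers `"claude"`, `"codex"`, and `"opencode"`. This is not the actual `taskMirrorCapability` from `core/mirror.ts`; the function name `taskMirrorCapabilitySketch`, the `MirrorCapability` type, and the host strings are illustrative assumptions. It shows the two properties the text insists on: three distinct surface values, and a loud failure for an unknown host.

```typescript
type Surface = "native-task" | "monitor-agent" | "headless";

// Hypothetical record shape; the adapter names are taken from the table above.
interface MirrorCapability {
  surface: Surface;
  adapter: "mirrorPlan" | "codexSinkPlan" | null;
}

function taskMirrorCapabilitySketch(host: string): MirrorCapability {
  switch (host) {
    case "claude":
      return { surface: "native-task", adapter: "mirrorPlan" };
    case "codex":
      return { surface: "monitor-agent", adapter: "codexSinkPlan" };
    case "opencode":
      return { surface: "headless", adapter: null };
    default:
      // Fail loudly: an unknown host must never inherit the Claude native path.
      throw new Error(`unknown host: ${host}`);
  }
}
```

Because the surface is a three-valued discriminant and not a boolean, a caller has to handle each host's degradation path by name.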
The mirror surfaces two signals on one lifecycle (issue #811): the task title carries the calm macro phase — w<id> [<n>/5 <phase>] #<issue> <slug> — while the task description carries the fine micro stage — stage: <impl|explore|tests|commit>. The phase vocabulary is the ordered setup → coding → validating → merging → done (1-based n/5), plus the terminal blocked which drops the n/5 and renders [blocked]. The title changes only when the macro phase moves, so it never flickers on every inner-agent tool call.
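The title grammar can be illustrated with a small formatter and parser. This is a sketch under assumptions: `formatTitle` and `parseTitle` are hypothetical helpers, not functions from the bundle, and the worker id is treated as the full leading token (for example `wAAAA`), which is how the plan keys in this document appear to be built.

```typescript
// Ordered macro phases from the text; "blocked" is terminal and carries no n/5.
const PHASES = ["setup", "coding", "validating", "merging", "done"] as const;
type OrderedPhase = (typeof PHASES)[number];
type Phase = OrderedPhase | "blocked";

// workerId is the full leading token, e.g. "wAAAA".
function formatTitle(workerId: string, phase: Phase, issue: number, slug: string): string {
  const bracket =
    phase === "blocked" ? "[blocked]" : `[${PHASES.indexOf(phase) + 1}/5 ${phase}]`;
  return `${workerId} ${bracket} #${issue} ${slug}`;
}

interface ParsedTitle {
  key: string; // "<worker_id>:<issue>"
  phase: string; // the word inside the bracket, after any n/5
}

// Returns null for titles that are not mirror-owned.
function parseTitle(title: string): ParsedTitle | null {
  const m = /^(w\S+) \[(?:\d\/5 )?([a-z]+)\] #(\d+) /.exec(title);
  if (m === null) return null;
  const workerId = m[1];
  const phase = m[2];
  const issue = m[3];
  if (workerId === undefined || phase === undefined || issue === undefined) return null;
  return { key: `${workerId}:${issue}`, phase };
}
```

For instance, `formatTitle("wAAAA", "coding", 22, "extract state.sh")` yields `wAAAA [2/5 coding] #22 extract state.sh`, and parsing that title recovers the key `wAAAA:22` and the phase `coding`.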
The mirror is a pure diff: it reconciles the live worker state files against the tasks already on the native surface and emits a call plan. After rendering the dashboard, the agent (under Claude Code only) must:
1. Fetch `TaskCreate`, `TaskUpdate`, and `TaskList` via `ToolSearch` if not already loaded (deferred tools).
2. Build the tracked set. `TaskList` → keep the mirror-owned tasks (those whose title matches `w<id> [<…>] #<n> <slug>`). For each, emit one JSONL line `{"key":"<worker_id>:<issue>","stage":"<last stage>","phase":"<last phase>"}`, reading the key (`worker_id` from the leading token, `issue` from the `#<n>`) and the phase (the word inside the title's `[…]` bracket, after any `n/5`) from the title, and the stage from the description (`stage: <x>`). Keep a key→task_id map for step 4.
3. Compute the plan. Pipe the tracked JSONL from step 2 into the bundle's `monitor --mirror-plan` subcommand:

       printf '%s\n' "$tracked" | node "$CLAUDE_PLUGIN_ROOT/skills/engineering/afk/bin/afk.mjs" monitor --mirror-plan

   The command globs the state files and reconciles them against the tracked set on stdin (keyed by `worker_id:issue`, so parallel workers each get exactly one task and re-runs never duplicate), then prints a JSONL call plan to stdout, one descriptor per harness call (empty stdin → cold reconcile; empty plan → no output). A `TaskUpdate` rewrites the title when the macro phase moves and refreshes the description when the micro stage moves; a terminal failure re-titles to `[blocked]` and flips the task to `failed`:

       {"call":"TaskCreate","key":"wAAAA:22","title":"wAAAA [2/5 coding] #22 extract state.sh","description":"stage: impl","state":"in_progress"}
       {"call":"TaskUpdate","key":"wAAAA:22","title":"wAAAA [3/5 validating] #22 extract state.sh","description":"stage: tests","state":"in_progress"}
       {"call":"TaskUpdate","key":"wAAAA:22","state":"completed"}

4. Apply the plan. For each descriptor in order:
   - `TaskCreate` → create the task; record `key → task_id`.
   - `TaskUpdate` → resolve `key` to its `task_id` via the map and update. A `state` of `completed`/`failed` marks the worker's terminal event (done/blocked); the task drops off the active list and the mirror self-cleans. A descriptor whose `key` has no known `task_id` (e.g. a complete for a task that was never created in this session) is skipped.
An empty plan means nothing changed since the last tick — apply no calls. Because the plan is keyed by worker_id:issue, an idempotent re-run with no stage advance emits zero descriptors.
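The apply step can be sketched as a loop over the JSONL plan with a key→task_id map. This is an illustration only: `TaskTools` is a hypothetical stand-in for the host's `TaskCreate`/`TaskUpdate` tools (whose real call signatures are not given in this document), and `applyPlan` is an assumed helper name.

```typescript
interface Descriptor {
  call: "TaskCreate" | "TaskUpdate";
  key: string; // "<worker_id>:<issue>"
  title?: string;
  description?: string;
  state?: "in_progress" | "completed" | "failed";
}

// Hypothetical wrapper around the host task tools.
interface TaskTools {
  create(d: Descriptor): string; // returns the new task_id
  update(taskId: string, d: Descriptor): void;
}

// Applies descriptors in order and returns how many calls were made.
function applyPlan(planJsonl: string, tools: TaskTools, taskIds: Map<string, string>): number {
  let applied = 0;
  for (const line of planJsonl.split("\n")) {
    if (line.trim() === "") continue; // empty plan → no calls
    const d = JSON.parse(line) as Descriptor;
    if (d.call === "TaskCreate") {
      taskIds.set(d.key, tools.create(d)); // record key → task_id
      applied++;
      continue;
    }
    const taskId = taskIds.get(d.key);
    if (taskId === undefined) continue; // unknown key → skipped, per the rule above
    tools.update(taskId, d);
    applied++;
  }
  return applied;
}
```

Seeding `taskIds` from the `TaskList` pass in step 2 lets an update for a task created in an earlier tick resolve correctly.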
Re-hydration on session reopen. A native task dies with the Claude Code session; the nohup AFK worker does not. When a session opens with workers still running, TaskList (step 2) returns no mirror-owned tasks, so the tracked set is empty and monitor --mirror-plan reconciles cold — emitting a TaskCreate for every live worker. The status bar recovers the per-worker tasks with no operator action. This is the same path as steady-state, not a new one: only workers whose orchestrator PID (the .pid field in afk.state.json, via state_is_live) is alive re-hydrate (dead workers are untracked-terminal on a cold tick → no ghost task), and the next tick is idempotent because the freshly-created tasks now form the tracked set.
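The cold-reconcile and idempotency properties can be seen in a small pure diff. This sketch is not the bundle's reconciler: `WorkerState` is a hypothetical projection of `afk.state.json` (with `live` standing in for the orchestrator PID check), `reconcile` is an assumed name, and terminal `completed`/`failed` transitions are deliberately left out to keep it short.

```typescript
type Phase = "setup" | "coding" | "validating" | "merging" | "done" | "blocked";

// Hypothetical projection of one worker's state file.
interface WorkerState {
  workerId: string; // full leading token, e.g. "wAAAA"
  issue: number;
  slug: string;
  phase: Phase;
  stage: string;
  live: boolean; // orchestrator PID alive
}

interface TrackedTask {
  key: string;
  stage: string;
  phase: string;
}

interface PlanCall {
  call: "TaskCreate" | "TaskUpdate";
  key: string;
  title: string;
  description: string;
  state: "in_progress";
}

const ORDER: Phase[] = ["setup", "coding", "validating", "merging", "done"];

function titleOf(w: WorkerState): string {
  const bracket =
    w.phase === "blocked" ? "[blocked]" : `[${ORDER.indexOf(w.phase) + 1}/5 ${w.phase}]`;
  return `${w.workerId} ${bracket} #${w.issue} ${w.slug}`;
}

function reconcile(workers: WorkerState[], tracked: TrackedTask[]): PlanCall[] {
  const known = new Map<string, TrackedTask>();
  for (const t of tracked) known.set(t.key, t);

  const plan: PlanCall[] = [];
  for (const w of workers) {
    const key = `${w.workerId}:${w.issue}`;
    const description = `stage: ${w.stage}`;
    const prev = known.get(key);
    if (prev === undefined) {
      // Cold path: only live workers re-hydrate; dead ones leave no ghost task.
      if (w.live) plan.push({ call: "TaskCreate", key, title: titleOf(w), description, state: "in_progress" });
      continue;
    }
    // Steady state: emit an update only when phase or stage actually moved.
    if (prev.phase !== w.phase || prev.stage !== w.stage) {
      plan.push({ call: "TaskUpdate", key, title: titleOf(w), description, state: "in_progress" });
    }
  }
  return plan;
}
```

With an empty tracked set every live worker yields one `TaskCreate`; feeding the resulting tasks back as the tracked set on the next tick yields an empty plan.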
When TaskCreate / TaskUpdate are unavailable because the session is outside any runner (a bare terminal), skip the mirror silently — there is no native surface to drive, and the monitor dashboard is already the canonical view.
Codex sink (runner-specific — binding). The mirror is per-runner, mirroring the runner-claude.md / runner-codex.md split (ADR 0003). Under Codex the state reader and plan reconciler are reused unchanged — only the sink differs. After rendering the dashboard, the Codex agent runs monitor --mirror-plan --runner codex instead of the Claude TaskCreate/TaskUpdate loop:
- If Codex grows a native background-task surface, the sink emits the same call-plan descriptors the Claude sink applies — apply them against the Codex primitive.
- Otherwise (today's reality), `--runner codex` emits an empty plan, so the mirror falls back to the `monitor` dashboard and a one-line notice. No native calls are emitted, so there is no half-rendered state, and a dashboard hiccup is swallowed so the tick never crashes.
Codex monitor agent (Codex-only — binding). Codex has a native sub-agent UI even though it does not expose the Claude-style TaskCreate/TaskUpdate task API. When /dev:afk run launches a normal detached worker under Codex, or /dev:afk fleet N launches a new supervisor under Codex, the agent should spawn exactly one read-only Codex monitor agent when the sub-agent primitive is available. Generate its prompt with codex-monitor-agent --mode run|fleet so the read-only rules stay identical across single-worker and fleet launches. That monitor agent periodically runs /dev:afk monitor --once, reports concise progress, and exits once no supervisor or live workers remain. It is a presentation consumer only: it must not edit files, stop workers, claim issues, run validation, push, merge, or repair state. Closing it manually must not affect the worker or fleet.
Do not invent a cross-runner task abstraction (rejected in ADR 0003) — keep the adapter explicitly per-runner.
Handoff file template
The inner agent reads ../handoff.md — top-level XML wrappers (<issue-body>, <previous-attempts>, <prior-attempt-context>, <human-guidance-thread>, <thread-discussion>, <agent-notes>) keep body/comments/notes unambiguous. Full template: docs/HANDOFF.md.
Stop Conditions
- Queue drained → `<promise>NO MORE TASKS</promise>` → exit 0.
- `-n N` reached → summary + exit 0.
- Runner exhaustion / runner transport failure → route the current issue through bounded recovery (`blocked:quota` or `blocked:runner-transient`), then stop the outer run with exit 75.
- Uncaught error in orchestrator → leave worktree in place, exit 1, print recovery hint. (No heartbeat sub-shell to kill since Slice D.)
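A wrapper script that needs to react to these outcomes could map them as below. This is a sketch: the `StopReason` labels and `exitCodeFor` are hypothetical names introduced for illustration, and only the numeric codes come from the list above. Exit 75 happens to match `EX_TEMPFAIL` in BSD `sysexits.h`; this document does not say whether that is intentional.

```typescript
// Hypothetical labels for the four stop conditions listed above.
type StopReason =
  | "queue-drained"
  | "limit-reached"
  | "runner-exhausted"
  | "runner-transport-failure"
  | "uncaught-error";

function exitCodeFor(reason: StopReason): number {
  switch (reason) {
    case "queue-drained":
    case "limit-reached":
      return 0; // clean stop
    case "runner-exhausted":
    case "runner-transport-failure":
      return 75; // retryable: issue was routed through bounded recovery first
    case "uncaught-error":
      return 1; // worktree left in place for manual recovery
  }
}
```

A supervising process can treat 75 as "try again later" and 1 as "needs a human", without parsing any output.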
Reporting
After every issue, print:
✓ #142 wire OAuth callback 12m 14s tests:✓ lint:✓ types:✓ build:✓ merged b3f2a91
finished 4 / 12 (33%) — next: #143
After the loop, a final block:
/afk done.
runner : codex (3 issues), claude (1 issue)
duration : 01:14:22
processed : 4 closed, 0 blocked, 0 failed
remaining : 8 still ready-for-agent
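The numeric fields in these report lines can be reproduced with a few small helpers. This is a sketch under assumptions: the helper names are hypothetical, and whether the real runtime rounds or truncates the percentage is not stated here; rounding is assumed because it matches the `4 / 12 (33%)` example.

```typescript
// Per-issue elapsed time, e.g. 734 seconds → "12m 14s".
function formatElapsed(totalSeconds: number): string {
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  return `${minutes}m ${seconds}s`;
}

// Final-block duration, e.g. 4462 seconds → "01:14:22".
function formatClock(totalSeconds: number): string {
  const pad = (n: number): string => (n < 10 ? `0${n}` : String(n));
  const hours = Math.floor(totalSeconds / 3600);
  const minutes = Math.floor((totalSeconds % 3600) / 60);
  const seconds = totalSeconds % 60;
  return `${pad(hours)}:${pad(minutes)}:${pad(seconds)}`;
}

// Progress percentage; rounding is an assumption (see lead-in).
function percentDone(finished: number, total: number): number {
  if (total === 0) return 0; // assumed convention for an empty queue
  return Math.round((finished / total) * 100);
}
```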
Configuration & lifecycle hooks
All .red/config.yaml knobs + RED_AFK_* env overrides (sandbox, runner, model/effort, timeouts, retry caps, stall thresholds, backpressure) and the lifecycle-hook contract live in docs/CONFIG.md. The runtime supplies the documented base env (RED_AFK_REPO, RED_AFK_ROOT, RED_AFK_WORKSPACE, RED_AFK_RUNNER, optional RED_AFK_SLOT) and layers each hook event's documented RED_AFK_* variables from that event's JSON context. Runner/model resolution policy: ../model-tier-policy/SKILL.md.
Safety
See SAFETY.md. The orchestrator and the inner agent both inherit those rules. Violations abort the loop.
Source Of Truth
This skill is the single source of truth for autonomous execution in red-skills repos.