name: ir:onboarding-factory/record
description: >
Carry one assessed cell to a committed, verified recording: check
prerequisites, port any missing driver step, drive the live agent CLI under a
recording daemon via of record run, verify EVERY websocket observation
(state + model + cost + tokens + agent) via of record verify, refresh the
replay golden, and commit. Backflows a correction into the cell when the live
recording disagrees with the assessment. Invoked as
/ir:onboarding-factory record <agent> <scenario>.
record
You run as a focused subagent with no parent context. This verb DRIVES A LIVE AGENT CLI and SPENDS API TOKENS; auth is set up out-of-band, so run it without ceremony (no key checks, no "this will spend money" prompts). It must be serialized: only one record runs against the live daemon at a time. When done, return only the "Return contract" block.
Preconditions
- The cell is assessed and on a recordable route. Read it:
  of status --agent <agent> --scenario <scenario> --json
  - route record / record-known-failing → proceed.
  - route driver-gap → port the step first (Step 1), then proceed.
  - route frozen → STOP, return status: frozen (nothing to record).
- Prerequisites met. of record prereq-check --agent <agent> lists the human actions a recording needs (auth mode, env vars, a mock, a local model server). If one is unmet and you can't satisfy it, STOP and return status: prereq_blocked naming the exact blocker; never ask the dispatcher.
- Clean replaydata/ tree. The recording precheck refuses a dirty replaydata/ (re-records must be deliberate commits). If the assessment isn't committed yet, that's the dispatcher's ordering bug: return status: infra_fail with that note.
- A recording daemon is up. Use --attach against the user's running irrlichd --record (the dashboard stays connected; the session shows up live). The precheck refuses if no --record daemon is found: return status: infra_fail. A recording daemon must run with IRRLICHT_PERMISSION_MODE=grant-all; otherwise the consent-first gate (#570) leaves a fresh daemon monitoring nothing until its wizard is answered. (run-cell.sh / run-cell-multi.sh set it on the daemons they spawn.)
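The route handling in the first precondition can be sketched as a small dispatch function. This is only an illustration: route_action and its output labels are hypothetical names, not part of the of CLI, and the shape of the --json output (a top-level route field) is an assumption.

```shell
#!/usr/bin/env bash
# Sketch: map a cell's route to the next action listed in the preconditions.
# route_action and its output labels are hypothetical, not part of the `of` CLI.
route_action() {
  case "$1" in
    record|record-known-failing) echo "proceed" ;;            # record directly
    driver-gap)                  echo "port-then-proceed" ;;  # port the step first
    frozen)                      echo "stop:frozen" ;;        # nothing to record
    *)                           echo "unknown-route"; return 1 ;;
  esac
}

# Possible usage (ASSUMPTION: the JSON carries a top-level "route" field):
#   route=$(of status --agent "$agent" --scenario "$scenario" --json | jq -r '.route')
#   route_action "$route"
```

Treating an unrecognized route as an error rather than guessing keeps the subagent from recording a cell that was never routed for it.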
Steps
1. Driver-gap → port the missing step (only if route is driver-gap)
Port the primitive named in the cell's driver=gap:<primitive> from the
reference driver into the agent's driver — the recipe is sound, only a step type
is missing:
grep -n '<primitive>)' replaydata/agents/claudecode/driver-interactive.sh \
replaydata/agents/codex/driver-interactive.sh
Adapt the three seams (tmux input; turn/effect detection — a
reset_session/restart/resume must detect the NEW session id, not the old;
the multi-session contract session.uuids/transcript.paths for primitives
that mint a new session). Add the primitive to the driver's DRIVE_ELICITS
constant so recipe-lint treats it as genuinely produced. Verify + commit the
driver alone:
bash -n replaydata/agents/<agent>/driver-interactive.sh
source tools/onboarding-factory/scripts/lib/recipe-lint.sh
driver_step_types_from_file replaydata/agents/<agent>/driver-interactive.sh | grep -qx '<primitive>'
git add replaydata/agents/<agent>/driver-interactive.sh
git commit -m "feat(onboard): teach <agent> driver the <primitive> step type"
If the primitive has no claudecode/codex reference, it's a NEW grammar element —
STOP and return status: needs_design. Don't invent one.
2. Record (live capture)
of record run --attach --agent <agent> --scenario <scenario>
of record run resolves the driver + orchestration script, prints the
prerequisites, and drives the agent under the recording daemon: it walks the
recipe in tmux and captures the daemon's events.jsonl + the agent's transcript
into a STAGING dir (.build/refresh/<agent>/<folder>-<ts>/). It does NOT touch
replaydata/ — promotion is the next step. (--dry-run prints the resolved plan
without driving — useful to confirm wiring.)
Then promote the staged capture into the cell's recordings/<name>/:
tools/promote-recording.sh <staging-dir> <agent> <folder>
This copies events.jsonl + the transcript + a manifest.json into a new
replaydata/agents/<agent>/scenarios/<folder>/recordings/<name>/. It does NOT
write any artifacts cache into metadata.json: the on-disk recordings/<name>/
tree IS the record (the single source of truth). The replay golden is added by
Step 5; nothing else needs wiring.
Retry exactly once on a timeout / transcript_missing outcome (often a
lazy-transcript nudge or trailing-sleep timing issue). On a second failure, or
on a classified cli_not_found / cli_too_old / auth_failed /
daemon-not-running, return status: infra_fail (don't loop, don't mark the cell
un-doable — the environment is the problem). When unsure of the failure class,
classify the staging dir:
bash tools/onboarding-factory/scripts/lib/classify-failure.sh <staging-dir>
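The retry rule above can be written down as a pure decision function. This is a sketch under assumptions: next_action is a hypothetical helper, the ok outcome label is invented for illustration, and the failure class names are taken from the text as written.

```shell
#!/usr/bin/env bash
# Sketch: decide what to do after a record attempt.
# next_action <outcome> <attempt>   (attempt is 1 for the first run)
# Hypothetical helper; "ok" is an assumed label for a successful capture.
next_action() {
  local outcome="$1" attempt="$2"
  case "$outcome" in
    ok)
      echo "verify" ;;
    timeout|transcript_missing)
      # Retry exactly once; a second failure is an environment problem.
      if [ "$attempt" -lt 2 ]; then echo "retry"; else echo "infra_fail"; fi ;;
    cli_not_found|cli_too_old|auth_failed|daemon-not-running)
      # Classified environment failures are never retried.
      echo "infra_fail" ;;
    *)
      # Unsure of the class: run classify-failure.sh on the staging dir.
      echo "classify" ;;
  esac
}
```

For example, next_action timeout 1 prints retry, while next_action timeout 2 prints infra_fail.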
3. Verify EVERY observation
of record verify --agent <agent> --scenario <scenario>
This runs the go-test-style verify engine: the state-phase validation AND the
observation vector — exact-match model/agent, non-zero + tolerance
cost/tokens, with a soft-diff of the full vector against the prior committed
recording (flagged, not failed, on live jitter). Report the per-field result in
observations. Hard spec-phase failures are real: a sub-100% pass that is NOT
known_failing still commits (the recording is real captured data and
replay-fixtures.sh should surface the drift) but the notes MUST say
"VALIDATION DRIFT — needs editorial review." Never rebase expected.jsonl to
make a failing verify pass — resolving real drift is a separate maintainer
task.
Things that legitimately differ run-to-run (don't tighten for these):
timestamps, UUIDs, PIDs, token counts, cost, cache-read counts. Structural
drift (state-transition order, distinct session count, process_exited count)
between two consecutive recordings means the recipe has variance — tighten it
(more sleep, different ordering) before committing.
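To make the per-field reporting concrete, the sketch below builds the observations line from already-extracted values. It is not the verify engine: the helper names are hypothetical, the tolerance comparison against the prior recording is left out, and only the exact-match and non-zero checks are shown.

```shell
#!/usr/bin/env bash
# Sketch: derive the observations labels from extracted values.
# All helper names are hypothetical; `of record verify` remains the real check.

# Exact-match fields (model, agent): ok or MISMATCH.
exact_field() {
  if [ "$1" = "$2" ]; then echo ok; else echo MISMATCH; fi
}

# Non-zero fields (cost, tokens): ok or zero. awk handles decimal costs.
nonzero_field() {
  awk -v v="$1" 'BEGIN { if (v + 0 > 0) print "ok"; else print "zero" }'
}

# observations_line <expected_model> <actual_model> <cost> <tokens> <expected_agent> <actual_agent>
observations_line() {
  printf 'model=%s cost=%s tokens=%s agent=%s\n' \
    "$(exact_field "$1" "$2")" "$(nonzero_field "$3")" \
    "$(nonzero_field "$4")" "$(exact_field "$5" "$6")"
}
```

Cost and token values are only checked for being present and positive here, which matches the list of things that legitimately differ from run to run.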
4. Backflow — correct the cell if the recording disagrees
If the LIVE recording refutes the doc-based assessment (e.g. assessed
daemon=full but the transcript/store proved the signal isn't emitted →
incapable; or it's atomic so streaming never happens), correct the cell IN THE
SAME COMMIT — this is the backflow loop, not a cue:
of cell write --agent <agent> --scenario <scenario> --file /tmp/<agent>-<scenario>.corrected-metadata.json
Update the affected pillar, add a caveat citing the recording that proved it,
and set observability_correction in your return. For a daemon=bug cell, do
not run gh issue create — outward-facing writes are denied in your
subagent permission context, so it silently degrades to a cell note and files
nothing. Instead, write the issue body to a temp file and hand it back for the
dispatcher to file (with the user's consent):
cat > /tmp/<agent>-<scenario>.issue.md <<'EOF'
<cited events.jsonl evidence + what the spec requires>
EOF
Return it as the issue: payload (the path + a one-line title). Keep
known_failing set in the spec meta with the bug behavior cited in notes; the
issue number is wired into the cell by a later touch, once the dispatcher has
filed it.
5. Refresh the replay golden (mandatory)
A fresh recording without its transcript.jsonl.replay.json.golden leaves
go test ./core/... (the byte-identity replay test) red. Regenerate this
cell's golden(s) only — never a blanket UPDATE_REPLAY_GOLDENS=1 across the
tree (that commits other agents' pre-existing drift):
tools/onboarding-factory/scripts/refresh-golden.sh <agent> <scenario>
It's idempotent — a --re-record that reproduced byte-identical output reports
"no golden change."
6. Commit the recording (mandatory before returning)
git add replaydata/agents/<agent>/scenarios/<id>_<scenario>/
git commit -m "feat(onboard): record <agent>/<scenario> (<pass_rate>)"
git rev-parse --short HEAD
Always commit before returning — a dirty replaydata/ tree makes the next
cell's recording precheck refuse. of validate should pass after the commit; it
now also gates recording completeness — the newest recording must carry
events.jsonl, manifest.json, a transcript, and (for a jsonl transcript) its
transcript.jsonl.replay.json.golden.
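The completeness gate described above can be approximated locally before committing. This is a sketch, not the of validate implementation: recording_complete is a hypothetical helper, and it assumes the transcript is stored as transcript.<ext> inside the recording directory.

```shell
#!/usr/bin/env bash
# Sketch: check that a recordings/<name>/ directory carries the required files.
# recording_complete is hypothetical; `of validate` is the authoritative gate.
# ASSUMPTION: the transcript file is named transcript.<ext>.
recording_complete() {
  local dir="$1" f transcript=""
  [ -f "$dir/events.jsonl" ]  || { echo "missing events.jsonl"; return 1; }
  [ -f "$dir/manifest.json" ] || { echo "missing manifest.json"; return 1; }
  for f in "$dir"/transcript.*; do
    case "$f" in *.golden) continue ;; esac   # the golden is not the transcript
    if [ -f "$f" ]; then transcript="$f"; break; fi
  done
  [ -n "$transcript" ] || { echo "missing transcript"; return 1; }
  # Only a jsonl transcript needs the replay golden.
  if [ "${transcript##*.}" = "jsonl" ]; then
    [ -f "$transcript.replay.json.golden" ] || { echo "missing golden"; return 1; }
  fi
  echo "complete"
}
```

Running it against the newest recording directory before the commit gives an early hint that the golden refresh was skipped.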
End commit messages with the trailer
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>.
Return contract
Return ONLY this (≤7 lines). Shared semantics + envelope rules live in
../return-contract.md:
status: pass | infra_fail | prereq_blocked | needs_design | frozen
commit_sha: <short sha> # the recording commit (or driver commit), "n/a" otherwise
pass_rate: <N/M phases> # "n/a" for non-pass statuses
observations: model=<ok|MISMATCH> cost=<ok|zero> tokens=<ok|zero> agent=<ok|MISMATCH>
observability_correction: <none | the live recording overrode the assess verdict — e.g. assessed daemon=full but the store proved no trace (→ incapable/bug)>
issue: <none | /tmp/<agent>-<scenario>.issue.md title="<one-line title>"> # daemon=bug only; the dispatcher files it
notes: <one or two sentences — drift flag, retry count, infra/prereq reason>
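One way to keep the return block to exactly these seven fields is to emit it from a single function. The sketch below is illustrative only: emit_return is a hypothetical helper, and the argument order simply follows the field order above.

```shell
#!/usr/bin/env bash
# Sketch: print the seven-line return block in the order given above.
# emit_return is a hypothetical helper, not part of the tooling.
# args: status commit_sha pass_rate observations observability_correction issue notes
emit_return() {
  if [ "$#" -ne 7 ]; then
    echo "emit_return: expected 7 fields, got $#" >&2
    return 1
  fi
  case "$1" in
    pass|infra_fail|prereq_blocked|needs_design|frozen) ;;
    *) echo "emit_return: invalid status: $1" >&2; return 1 ;;
  esac
  printf 'status: %s\ncommit_sha: %s\npass_rate: %s\nobservations: %s\nobservability_correction: %s\nissue: %s\nnotes: %s\n' "$@"
}

# Example with placeholder values:
#   emit_return frozen n/a n/a n/a none none "route frozen; nothing to record"
```

Rejecting an unknown status or a wrong field count up front avoids returning a block the dispatcher cannot parse.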
Anti-patterns
- Don't write replaydata/ by hand. of record run stages; promote-recording.sh copies the staged capture into recordings/<name>/; refresh-golden.sh writes the golden; of cell write does the backflow correction; the driver is a script under replaydata/agents/<agent>/. No jq -i, no hand-edited recordings or metadata. The on-disk recording is the single source of truth; there is no artifacts cache to maintain.
- Don't retry more than once, and don't retry a driver gap: a missing step won't appear on a re-run; port it (Step 1) or return.
- Don't rebase expected.jsonl to make a failing verify pass. Flag the drift, don't paper over it.
- Don't run gh issue create. Outward-facing writes are denied in your context, so it files nothing. Return the issue: payload and let the dispatcher file it with the user's consent.
- Don't run an isolated daemon while production irrlichd is up; use --attach.
- Don't skip the golden refresh, and don't blanket-regenerate goldens; refresh only this cell's.
- Don't return without committing; it breaks the next cell in a serialized sweep.