name: otbox
description: Snapshottable full test environment for opentraces. Provisions isolated boxes (local fs, docker, or SSH-leased remote), seeds a fully-populated world, runs declarative journey TOMLs against tiered (bronze/silver/gold) checkpoints, supports artifact-preferred captures with synthetic fallback, and tears down with zero host residue.
otbox — agent playbook
otbox runs from the repo-root shim ./otbox (mirrors otd). It builds
isolated opentraces "worlds" (a box = ~/.opentraces + project repo +
fake HF remote, all under .otbox/boxes/<id>/), snapshots them, and
runs declarative TOML journeys with assertions. You mostly work at the
top: pick a journey or write a new TOML, run it, inspect the verdict.
Verbose reference: tests/otbox/README.md. This file is the imperative
recipe sheet.
Lifecycle (Tier 0 — local, default, no network)
./otbox up --seed smoke # provision + seed
./otbox journey agent-session-trail-explain-happy # run a gold journey
./otbox artifacts # bundle PR evidence
./otbox down # zero-residue teardown
Lifecycle (Tier 1 — remote SSH/Tailscale, opt-in)
export OT_OTBOX_TIER1=1
export OT_OTBOX_SSH_TARGET=user@host # or a Tailscale name
./otbox warmup # provision a remote box
./otbox sync # rsync dirty working tree in
./otbox seed smoke
./otbox snapshot t1-base # archive on remote, pulled back
./otbox down
./otbox up --from t1-base --driver remote
./otbox journey cli-publish-happy-path --artifacts
Catalogue: what's already there
./otbox list # boxes, snapshots, drivers, seeds, journeys
make otbox-inventory # rebuild Click x journey ownership map (strict)
Inspect the rollup at tests/otbox/catalogue/journey-inventory.md
(185 lines, regenerated by ./otbox matrix --inventory --strict). It
maps every Click command to its trajectory, current max tier, owning
journeys, and JTBD one-liner.
Current state: 59 catalogue journeys, 5 gold-tier
(agent-session-trail-explain-happy, dataset-sync-skill-history,
pr-blame-on-captured-branch, security-sanitize-captured-content,
survival-walk-reverted).
Recipe: add a new journey TOML
Drop a file under tests/otbox/catalogue/journeys/<name>.toml. The
runner is generic — no harness code change required.
name = "your-journey-name"
description = """One paragraph explaining what consumer surface this exercises."""
lane = "core" # core | extended | diagnostic
tier = 0 # 0 = local/docker, 1 = remote
tier_label = "gold" # bronze (smoke) | silver | gold (real captured state)
trajectories = ["build-dataset-from-lineage"] # must exist in kb/plans/063
persona = "agent"
requires = ["cli", "git"] # capability gate; missing -> SKIP not FAIL
from_checkpoints = ["c-captured-real-session"] # explicit checkpoint pin
# Declarative preconditions; the matrix resolves a satisfying checkpoint
# when from_checkpoints is absent. When both are present the pin wins
# but is validated against these (SKIP on conflict).
[preconditions]
min_captured_traces = 1
# requires_survival_states = ["alive_on_path", "reverted"]
# requires_skills = ["..."]
# requires_branch_commits_min = 2
# requires_security_findings = true
[[steps]]
type = "cli" # cli | shell | write_file | sync | service | http_get | tmux
id = "explain-edit"
argv = ["trail", "explain", "--trace", "{trace_id}", "--step", "{step_index}", "--json"]
[[assertions]]
kind = "returncode" # returncode | stdout_contains | stderr_contains
step = "explain-edit" # stdout_json | path_exists | file_count_min
equals = 0
[[assertions]]
kind = "stdout_json"
step = "explain-edit"
path = "relation"
equals = "anchored_in_git"
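To make the stdout_json assertion above concrete, here is a minimal sketch of how such a check could be evaluated. The check_stdout_json helper is hypothetical, and treating path as a dot-separated key chain is an assumption (the example only uses a single top-level key); the real runner may resolve paths differently.

```python
import json


def check_stdout_json(stdout: str, path: str, equals: object) -> bool:
    """Hypothetical evaluator for a stdout_json assertion.

    Parses the step's stdout as JSON, walks `path` one key at a time
    (dot-separated nesting is an assumption), and compares the result
    with the expected value.
    """
    value = json.loads(stdout)
    for key in path.split("."):
        value = value[key]
    return value == equals


# Sample stdout is made up for illustration.
sample_stdout = '{"relation": "anchored_in_git"}'
print(check_stdout_json(sample_stdout, "relation", "anchored_in_git"))
```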
Templating vars come from journey._context(): {project},
{home}, {fake_remote}, {state_dir}, {opentraces_dir},
{box_root}, {box_id}, {repo_root}, {port}. Captured-session
checkpoints add {trace_id}, {session_id}, {commit_sha},
{step_index}, {transcript_path} (from
box.notes["c_captured_session_audit"]). PR-branch captures add
{branch_name}, {base_commit_sha}, {head_commit_sha},
{branch_commit_count}.
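As a rough illustration of the templating described above, the sketch below expands {name} placeholders in a step's argv with Python's str.format_map. The render_argv helper and the sample context values are hypothetical; the actual substitution is driven by journey._context() and may behave differently.

```python
from typing import Mapping, Sequence


def render_argv(argv: Sequence[str], context: Mapping[str, object]) -> list[str]:
    """Expand {name} placeholders in each argv element (hypothetical helper).

    A missing key raises KeyError, so a journey that references a variable
    its checkpoint does not provide fails loudly instead of passing a
    literal "{trace_id}" to the CLI.
    """
    return [arg.format_map(context) for arg in argv]


# Sample values are made up for illustration only.
context = {"trace_id": "trace-123", "step_index": 4}
argv = ["trail", "explain", "--trace", "{trace_id}", "--step", "{step_index}", "--json"]
rendered = render_argv(argv, context)
print(rendered)
```

Failing on an unknown placeholder is a design choice of this sketch: it surfaces a journey pinned to a checkpoint that cannot supply {trace_id} or similar variables.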
Reference: tests/otbox/catalogue/journeys/agent-session-trail-explain-happy.toml
(gold, real captured state) and
tests/otbox/catalogue/journeys/build-dataset-lineage-explain-empty-state-contract.toml
(bronze, empty-state contract sibling).
Recipe: add a captured-session checkpoint
Captured-session checkpoints are artifact-preferred with synthetic
fallback (plan 072). Copy
tests/otbox/checkpoints/_captured_session.py as a template — it is
the canonical shape. Wire it up in
tests/otbox/checkpoints/__init__.py next to the others:
from . import _your_checkpoint # noqa: E402,F401
The delta function MUST start with restore_from_capture(driver, box, _CAPTURE_NAME). On hit, re-derive the audit from the restored box
state via read_state_json(driver, box) + trace_for_session(state, session_id) and stamp capture_metadata_from_artifact(metadata).
On miss, run the synthetic chain (fake harness + ingest + commit +
watcher tick + trail mature) and stamp
synthetic_capture_metadata().
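The hit/miss flow above can be sketched in isolation. This is not the code in _captured_session.py: resolve_capture is a hypothetical helper, the two callables stand in for restore_from_capture(...) and the synthetic chain, and returning None on a miss is an assumed convention.

```python
from typing import Callable, Optional


def resolve_capture(
    restore: Callable[[], Optional[dict]],
    run_synthetic: Callable[[], dict],
) -> dict:
    """Prefer a restored artifact; fall back to the synthetic chain on a miss.

    `restore` stands in for the restore_from_capture(...) call and is assumed
    to return artifact metadata on a hit or None on a miss. `run_synthetic`
    stands in for the fake-harness chain and returns its own metadata.
    """
    metadata = restore()
    if metadata is not None:
        # Hit: stamp the audit as artifact-sourced.
        return {**metadata, "source": "artifact"}
    # Miss: run the fallback and stamp it as synthetic.
    return {**run_synthetic(), "source": "synthetic"}


# Illustrative stand-ins; the metadata keys are made up.
hit = resolve_capture(lambda: {"agent_version": "x"}, lambda: {})
miss = resolve_capture(lambda: None, lambda: {"note": "fake harness"})
print(hit["source"], miss["source"])
```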
Register with provides={...} so declarative preconditions resolve:
register(Checkpoint(
name="c-captured-your-variant",
composed_from="c-captured-real-session",
delta=_your_delta,
cache=True,
description="...",
provides={
"captured_traces": 1,
"survival_states": ["reverted"],
"branch_commits": 2,
# "skills": [...], "has_security_findings": True,
},
))
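To show how a provides map could satisfy a journey's [preconditions] table, here is a minimal matching sketch. The satisfies function is hypothetical, and the pairing of keys (for example min_captured_traces against captured_traces) is inferred from the names in the two snippets; the matrix's actual resolver may apply different rules.

```python
def satisfies(provides: dict, preconditions: dict) -> bool:
    """Return True when `provides` covers every declared precondition.

    The key pairing is an assumption inferred from the field names, not
    the matrix's actual resolution logic.
    """
    # Numeric minimums: the checkpoint must provide at least this many.
    if provides.get("captured_traces", 0) < preconditions.get("min_captured_traces", 0):
        return False
    if provides.get("branch_commits", 0) < preconditions.get("requires_branch_commits_min", 0):
        return False
    # Set-valued requirements: every required item must be provided.
    for need, have in (
        ("requires_survival_states", "survival_states"),
        ("requires_skills", "skills"),
    ):
        if not set(preconditions.get(need, [])) <= set(provides.get(have, [])):
            return False
    # Boolean requirement.
    if preconditions.get("requires_security_findings") and not provides.get("has_security_findings"):
        return False
    return True


provides = {"captured_traces": 1, "survival_states": ["reverted"], "branch_commits": 2}
print(satisfies(provides, {"min_captured_traces": 1}))
print(satisfies(provides, {"requires_survival_states": ["alive_on_path", "reverted"]}))
```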
Snapshots cache content-addressed at
.otbox/snapshots/_checkpoint-<name>-<hash>.tar.gz. Editing your
delta source invalidates the cache automatically.
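The cache behaviour above follows from content addressing: the snapshot name includes a hash, so a changed input yields a different file name and the old archive is never matched. The sketch below assumes the hash covers the delta source text and the parent's hash, and that it is a truncated SHA-256; the real hash inputs and length are not stated here, and snapshot_name is a hypothetical helper.

```python
import hashlib


def snapshot_name(checkpoint: str, delta_source: str, parent_hash: str = "") -> str:
    """Derive a content-addressed snapshot file name (illustrative only).

    Hash inputs (parent hash + delta source) and the 12-character
    truncation are assumptions made for this sketch.
    """
    payload = (parent_hash + delta_source).encode("utf-8")
    digest = hashlib.sha256(payload).hexdigest()[:12]
    return f"_checkpoint-{checkpoint}-{digest}.tar.gz"


before = snapshot_name("c-captured-your-variant", "def _your_delta(): ...")
after = snapshot_name("c-captured-your-variant", "def _your_delta(): ...  # edited")
print(before != after)
```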
Recipe: add a simulated-user scenario
Drop a TOML under tests/otbox/simulated_users/scenarios/<name>.toml:
name = "your-scenario"
description = "..."
agent = "claude" # claude | codex | hermes | echo
binary_name = "claude" # resolved via shutil.which() at runtime
[initial_state]
template = "single-file-python-project" # dir under simulated_users/templates/
[[turns]]
prompt = "Add a farewell helper to src/app.py"
expect_regex = "(?i)(adding|let me)"
timeout_s = 60
[[turns]]
prompt = "yes"
expect_regex = "(?i)(done|committed)"
timeout_s = 60
[capture]
artifact_dir = "your-scenario" # leaf under tests/otbox/captures/
expected_paths = ["src/app.py"]
Reference: tests/otbox/simulated_users/scenarios/echo-meta.toml
(default-CI safe — uses the synthetic echo binary).
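For intuition about expect_regex, the sketch below checks a turn's pattern against captured pane text with re.search. The turn_matched helper and the sample output are hypothetical; how the harness actually polls the tmux pane and applies timeout_s is not shown.

```python
import re


def turn_matched(pane_output: str, expect_regex: str) -> bool:
    """Return True when the pattern occurs anywhere in the pane text.

    The inline (?i) flag in the scenario patterns makes the match
    case-insensitive.
    """
    return re.search(expect_regex, pane_output) is not None


# Made-up agent output for illustration.
print(turn_matched("Let me add that helper.", "(?i)(adding|let me)"))
```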
Recipe: refresh a captured artifact
make capture-refresh SCENARIO=echo-meta # always works (echo binary)
make capture-refresh SCENARIO=add-helper-function # needs real `claude` on PATH
What happens: the target resolves the base checkpoint
(c-installed-source by default), forks a fresh box, overlays the
scenario's template, drives the agent binary through tmux, and snapshots
the resulting box to tests/otbox/captures/<artifact_dir>/snapshot.tar.gz
plus metadata.json (scenario digest, agent version, schema version,
captured-at timestamp).
Verify the artifact lights up: re-resolve the matching captured-session checkpoint and inspect the audit:
./otbox down --all
./otbox up # cold-resolves checkpoint chain
./otbox status --json | jq '.notes' # check capture_metadata.source
# expect "source": "artifact" (or "synthetic" on miss)
The refresh SKIPs cleanly (exit 0) when binary_name is not on PATH, so
default CI never depends on a real agent install.
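The PATH gate can be sketched with shutil.which, which the scenario comments name as the resolution mechanism. The binary_available helper is hypothetical, and the messages below are illustrative rather than real harness output.

```python
import shutil


def binary_available(binary_name: str) -> bool:
    """Return True when the agent binary resolves on PATH."""
    return shutil.which(binary_name) is not None


# A name chosen so that it will not exist on any realistic machine.
name = "otbox-no-such-agent-binary"
if binary_available(name):
    print(f"{name} found, scenario can run")
else:
    # In this sketch a missing binary is a SKIP, not a failure.
    print(f"SKIP: {name} is not on PATH")
```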
Recipe: investigate a journey/scenario failure
- Per-step transcripts: ./otbox logs --box <id> or read .otbox/boxes/<id>/logs/journey-<name>.json.
- Capture-refresh pane log (forensic context, even on FAIL/SKIP): .otbox/boxes/<id>/logs/capture-refresh/<scenario>/pane.log.
- The box is left up on capture-refresh FAIL (exit 3) for inspection. Drop into it with ./otbox ssh --box <id> (Tier 0 = local shell at box.project; Tier 1 = SSH into the remote).
- Force a cold rebuild of one checkpoint by deleting its snapshot: ./otbox snapshot-rm _checkpoint-c-captured-real-session-<hash> (the hash is visible via ./otbox list --json | jq .snapshots).
- Re-run the full matrix with filters: ./otbox matrix --journey 'agent-session-*' --checkpoint 'c-captured-*'.
The make-target matrix
| Target | Gates / runs |
|---|---|
| make otbox-slice | thin Tier 0 vertical slice (one journey, fast smoke) |
| make otbox-journeys | full Tier 0 catalogue sweep + zero-residue check |
| make otbox-tier1 | full Tier 1 slice + catalogue (sets OT_OTBOX_TIER1=1) |
| make otbox-matrix | ./otbox matrix: every (journey x base-checkpoint) pair |
| make otbox-inventory | ./otbox matrix --inventory --strict: fails on plan 063 SSoT drift |
| make otbox-agent-session | plan 064 captured-session vertical slice |
| make capture-refresh SCENARIO=<name> | refresh one capture artifact (SKIPs without binary) |
make otbox-journeys is the autonomous-delivery verification command
— Tier 0 only, no network, no Docker, runs against the repo .venv.
What this skill MUST NOT touch
- The synthetic fallback. Every captured-session checkpoint has a synthetic chain (fake claude + fixture corpus) that runs when the artifact is absent. It exists so default CI passes without committing multi-MB artifacts. Do not delete or skip it; it is the OSS-safe path.
- AGENT_FACING_TRAJECTORIES_MIN_GOLD in tests/otbox/jtbd.py. This frozenset names trajectories that MUST have at least one gold journey or --strict inventory fails. Adding to it is a consensus decision (it forces new captured-state coverage); shipping a gold journey for an existing entry is the safe edit.
- refs/opentraces/local/events/v1. The canonical TrailEvent log ref. Journey assertions on event_log_ref literal-match this; don't rename it.
- box_id paths inside artifact archives. restore_from_capture rewrites the origin box root to the current one; manual untar will bake the wrong absolute paths into restored boxes.