otbox

Snapshottable full test environment for opentraces. Provisions isolated boxes (local fs, docker, or SSH-leased remote), seeds a fully-populated world, runs declarative journey TOMLs against tiered (bronze/silver/gold) checkpoints, supports artifact-preferred captures with synthetic fallback, and tears down with zero host residue.

By JayFarei. Updated 5/16/2026.

otbox — agent playbook

otbox runs from the repo-root shim ./otbox (mirrors otd). It builds isolated opentraces "worlds" (a box = ~/.opentraces + project repo + fake HF remote, all under .otbox/boxes/<id>/), snapshots them, and runs declarative TOML journeys with assertions. You mostly work at the top: pick a journey or write a new TOML, run it, inspect the verdict.

Verbose reference: tests/otbox/README.md. This file is the imperative recipe sheet.

Lifecycle (Tier 0 — local, default, no network)

./otbox up --seed smoke                            # provision + seed
./otbox journey agent-session-trail-explain-happy  # run a gold journey
./otbox artifacts                                  # bundle PR evidence
./otbox down                                       # zero-residue teardown

Lifecycle (Tier 1 — remote SSH/Tailscale, opt-in)

export OT_OTBOX_TIER1=1
export OT_OTBOX_SSH_TARGET=user@host    # or a Tailscale name
./otbox warmup                          # provision a remote box
./otbox sync                            # rsync dirty working tree in
./otbox seed smoke
./otbox snapshot t1-base                # archive on remote, pulled back
./otbox down
./otbox up --from t1-base --driver remote
./otbox journey cli-publish-happy-path --artifacts

Catalogue: what's already there

./otbox list                            # boxes, snapshots, drivers, seeds, journeys
make otbox-inventory                    # rebuild Click x journey ownership map (strict)

Inspect the rollup at tests/otbox/catalogue/journey-inventory.md (185 lines, regenerated by ./otbox matrix --inventory --strict). It maps every Click command to its trajectory, current max tier, owning journeys, and JTBD one-liner.

Current state: 59 catalogue journeys, 5 gold-tier (agent-session-trail-explain-happy, dataset-sync-skill-history, pr-blame-on-captured-branch, security-sanitize-captured-content, survival-walk-reverted).

Recipe: add a new journey TOML

Drop a file under tests/otbox/catalogue/journeys/<name>.toml. The runner is generic — no harness code change required.

name = "your-journey-name"
description = """One paragraph explaining what consumer surface this exercises."""
lane = "core"                  # core | extended | diagnostic
tier = 0                       # 0 = local/docker, 1 = remote
tier_label = "gold"            # bronze (smoke) | silver | gold (real captured state)
trajectories = ["build-dataset-from-lineage"]   # must exist in kb/plans/063
persona = "agent"
requires = ["cli", "git"]      # capability gate; missing -> SKIP not FAIL
from_checkpoints = ["c-captured-real-session"]  # explicit checkpoint pin

# Declarative preconditions; the matrix resolves a satisfying checkpoint
# when from_checkpoints is absent. When both are present the pin wins
# but is validated against these (SKIP on conflict).
[preconditions]
min_captured_traces = 1
# requires_survival_states = ["alive_on_path", "reverted"]
# requires_skills = ["..."]
# requires_branch_commits_min = 2
# requires_security_findings = true

[[steps]]
type = "cli"                   # cli | shell | write_file | sync | service | http_get | tmux
id = "explain-edit"
argv = ["trail", "explain", "--trace", "{trace_id}", "--step", "{step_index}", "--json"]

[[assertions]]
kind = "returncode"            # returncode | stdout_contains | stderr_contains
step = "explain-edit"          #   stdout_json | path_exists | file_count_min
equals = 0

[[assertions]]
kind = "stdout_json"
step = "explain-edit"
path = "relation"
equals = "anchored_in_git"

Templating vars come from journey._context(): {project}, {home}, {fake_remote}, {state_dir}, {opentraces_dir}, {box_root}, {box_id}, {repo_root}, {port}. Captured-session checkpoints add {trace_id}, {session_id}, {commit_sha}, {step_index}, {transcript_path} (from box.notes["c_captured_session_audit"]). PR-branch captures add {branch_name}, {base_commit_sha}, {head_commit_sha}, {branch_commit_count}.
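To make the substitution concrete, here is a minimal sketch of how `{name}` placeholders in a step's argv could be expanded. It assumes plain `str.format_map` semantics; `render_argv` and `MissingTemplateVar` are hypothetical names, not part of the otbox harness, and the context values are made up.

```python
from typing import Mapping, Sequence


class MissingTemplateVar(KeyError):
    """Raised when an argv entry references a variable the context lacks."""


def render_argv(argv: Sequence[str], context: Mapping[str, object]) -> list[str]:
    """Expand {name} placeholders in each argv entry (hypothetical helper)."""
    rendered = []
    for arg in argv:
        try:
            rendered.append(arg.format_map(context))
        except KeyError as exc:
            # Fail loudly instead of passing a literal "{trace_id}" to the CLI.
            raise MissingTemplateVar(f"unknown template var {exc} in {arg!r}") from exc
    return rendered


# Illustrative context only; real values would come from the box's notes.
context = {"trace_id": "trace-abc", "step_index": 3}
argv = ["trail", "explain", "--trace", "{trace_id}", "--step", "{step_index}", "--json"]
print(render_argv(argv, context))
```

Under this assumption, a journey pinned to a checkpoint that does not provide `{trace_id}` would fail at render time rather than run a malformed command.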

Reference: tests/otbox/catalogue/journeys/agent-session-trail-explain-happy.toml (gold, real captured state) and tests/otbox/catalogue/journeys/build-dataset-lineage-explain-empty-state-contract.toml (bronze, empty-state contract sibling).
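As a mental model for the two assertion kinds used in the recipe above, the sketch below evaluates `returncode` and `stdout_json` against a step result. It is not the runner's code: `check_assertion` and `lookup` are hypothetical, and treating `path` as a dotted path is an assumption (the recipe only shows a single top-level key).

```python
import json
from typing import Any


def lookup(document: Any, path: str) -> Any:
    """Walk a dotted path through nested dicts and lists (assumed path syntax)."""
    current = document
    for part in path.split("."):
        if isinstance(current, list):
            current = current[int(part)]
        else:
            current = current[part]
    return current


def check_assertion(assertion: dict, returncode: int, stdout: str) -> bool:
    """Return True when the assertion holds for one step's result."""
    kind = assertion["kind"]
    if kind == "returncode":
        return returncode == assertion["equals"]
    if kind == "stdout_json":
        try:
            value = lookup(json.loads(stdout), assertion["path"])
        except (ValueError, KeyError, IndexError, TypeError):
            # Unparseable output or a missing key counts as a failed assertion.
            return False
        return value == assertion["equals"]
    raise ValueError(f"assertion kind not covered by this sketch: {kind}")


stdout = '{"relation": "anchored_in_git"}'
ok = check_assertion(
    {"kind": "stdout_json", "path": "relation", "equals": "anchored_in_git"}, 0, stdout
)
print(ok)
```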

Recipe: add a captured-session checkpoint

Captured-session checkpoints are artifact-preferred with synthetic fallback (plan 072). Copy tests/otbox/checkpoints/_captured_session.py as a template — it is the canonical shape. Wire it up in tests/otbox/checkpoints/__init__.py next to the others:

from . import _your_checkpoint  # noqa: E402,F401

The delta function MUST start with restore_from_capture(driver, box, _CAPTURE_NAME). On hit, re-derive the audit from the restored box state via read_state_json(driver, box) + trace_for_session(state, session_id) and stamp capture_metadata_from_artifact(metadata). On miss, run the synthetic chain (fake harness + ingest + commit + watcher tick + trail mature) and stamp synthetic_capture_metadata().
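The control flow described above can be sketched as follows. The three callables are stand-ins for the real helpers (`restore_from_capture`, the `read_state_json` + `trace_for_session` re-derivation, and the synthetic chain); their signatures and the convention that a miss returns `None` are assumptions made for illustration.

```python
from typing import Callable, Optional


def run_delta(
    restore: Callable[[], Optional[dict]],
    derive_audit: Callable[[], dict],
    run_synthetic_chain: Callable[[], dict],
) -> dict:
    """Prefer a captured artifact; fall back to the synthetic chain on a miss."""
    metadata = restore()  # stand-in for restore_from_capture(driver, box, _CAPTURE_NAME)
    if metadata is not None:
        # Hit: re-derive the audit from the restored box state.
        audit = derive_audit()
        audit["capture_metadata"] = {**metadata, "source": "artifact"}
        return audit
    # Miss: build the same shape of audit from the synthetic chain.
    audit = run_synthetic_chain()
    audit["capture_metadata"] = {"source": "synthetic"}
    return audit


hit = run_delta(lambda: {"agent_version": "x"}, lambda: {"trace_id": "t1"}, lambda: {})
miss = run_delta(lambda: None, lambda: {}, lambda: {"trace_id": "t2"})
print(hit["capture_metadata"]["source"], miss["capture_metadata"]["source"])
```

Either branch ends with an audit of the same shape, which is what lets journeys run unchanged whether or not an artifact is committed.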

Register with provides={...} so declarative preconditions resolve:

register(Checkpoint(
    name="c-captured-your-variant",
    composed_from="c-captured-real-session",
    delta=_your_delta,
    cache=True,
    description="...",
    provides={
        "captured_traces": 1,
        "survival_states": ["reverted"],
        "branch_commits": 2,
        # "skills": [...], "has_security_findings": True,
    },
))
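To illustrate how a journey's `[preconditions]` might be matched against a checkpoint's `provides`, here is a rough sketch. The key pairing (for example `min_captured_traces` against `captured_traces`) and the subset rule for list-valued keys are inferred from the names alone; `satisfies` is a hypothetical function, not the matrix resolver.

```python
def satisfies(preconditions: dict, provides: dict) -> bool:
    """Return True when a checkpoint's provides covers the preconditions (assumed rules)."""
    # Numeric minimums: the checkpoint must provide at least the requested count.
    if provides.get("captured_traces", 0) < preconditions.get("min_captured_traces", 0):
        return False
    if provides.get("branch_commits", 0) < preconditions.get("requires_branch_commits_min", 0):
        return False
    # List-valued requirements: every requested item must be provided.
    pairs = (("requires_survival_states", "survival_states"), ("requires_skills", "skills"))
    for wanted_key, provided_key in pairs:
        wanted = set(preconditions.get(wanted_key, []))
        if not wanted <= set(provides.get(provided_key, [])):
            return False
    # Boolean requirement.
    if preconditions.get("requires_security_findings") and not provides.get("has_security_findings"):
        return False
    return True


provides = {"captured_traces": 1, "survival_states": ["reverted"], "branch_commits": 2}
print(satisfies({"min_captured_traces": 1}, provides))
print(satisfies({"requires_survival_states": ["alive_on_path", "reverted"]}, provides))
```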

Checkpoint snapshots are cached by content hash at .otbox/snapshots/_checkpoint-<name>-<hash>.tar.gz, so editing your delta source invalidates the cache automatically.
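The sketch below shows why a source edit produces a cache miss under content addressing. The hash inputs (parent hash plus delta source), SHA-256, and the 12-character truncation are all assumptions; `snapshot_name` is a hypothetical helper, and only the file name pattern comes from this playbook.

```python
import hashlib


def snapshot_name(checkpoint: str, delta_source: str, parent_hash: str = "") -> str:
    """Derive a cache file name from the checkpoint's inputs (assumed scheme)."""
    digest = hashlib.sha256()
    digest.update(parent_hash.encode("utf-8"))
    digest.update(b"\0")  # separator so the two inputs cannot run together
    digest.update(delta_source.encode("utf-8"))
    return f"_checkpoint-{checkpoint}-{digest.hexdigest()[:12]}.tar.gz"


before = snapshot_name("c-captured-your-variant", "def _your_delta(): return 1")
after = snapshot_name("c-captured-your-variant", "def _your_delta(): return 2")
print(before != after)
```

Because the name changes with the source, the old archive is simply never looked up again.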

Recipe: add a simulated-user scenario

Drop a TOML under tests/otbox/simulated_users/scenarios/<name>.toml:

name = "your-scenario"
description = "..."
agent = "claude"               # claude | codex | hermes | echo
binary_name = "claude"         # resolved via shutil.which() at runtime

[initial_state]
template = "single-file-python-project"   # dir under simulated_users/templates/

[[turns]]
prompt = "Add a farewell helper to src/app.py"
expect_regex = "(?i)(adding|let me)"
timeout_s = 60

[[turns]]
prompt = "yes"
expect_regex = "(?i)(done|committed)"
timeout_s = 60

[capture]
artifact_dir = "your-scenario"      # leaf under tests/otbox/captures/
expected_paths = ["src/app.py"]

Reference: tests/otbox/simulated_users/scenarios/echo-meta.toml (default-CI safe — uses the synthetic echo binary).

Recipe: refresh a captured artifact

make capture-refresh SCENARIO=echo-meta              # always works (echo binary)
make capture-refresh SCENARIO=add-helper-function    # needs real `claude` on PATH

What happens: the target resolves the base checkpoint (c-installed-source by default), forks a fresh box, overlays the scenario's template, drives the agent binary through tmux, and snapshots the resulting box to tests/otbox/captures/<artifact_dir>/snapshot.tar.gz plus metadata.json (scenario digest, agent version, schema version, captured-at timestamp).

Verify that the artifact is picked up: re-resolve the matching captured-session checkpoint and inspect the audit:

./otbox down --all
./otbox up                                # cold-resolves checkpoint chain
./otbox status --json | jq '.notes'       # check capture_metadata.source
# expect "source": "artifact" (or "synthetic" on miss)

make capture-refresh SKIPs cleanly (exit 0) when binary_name is not on PATH, so default CI never depends on a real agent install.
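If you would rather script the source check than read the jq output by eye, something like the following could work. The exact layout of the status JSON is an assumption (the sample payload is invented), so the hypothetical `capture_sources` searches the whole tree for any `capture_metadata` object instead of relying on a fixed path.

```python
from typing import Any, Iterator


def capture_sources(node: Any) -> Iterator[str]:
    """Yield every capture_metadata.source value found anywhere in a JSON tree."""
    if isinstance(node, dict):
        meta = node.get("capture_metadata")
        if isinstance(meta, dict) and "source" in meta:
            yield meta["source"]
        for value in node.values():
            yield from capture_sources(value)
    elif isinstance(node, list):
        for item in node:
            yield from capture_sources(item)


# Invented payload shape, for illustration only.
status = {
    "notes": {
        "c_captured_session_audit": {
            "trace_id": "trace-abc",
            "capture_metadata": {"source": "artifact"},
        }
    }
}
print(list(capture_sources(status)))
```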

Recipe: investigate a journey/scenario failure

  • Per-step transcripts: ./otbox logs --box <id> or read .otbox/boxes/<id>/logs/journey-<name>.json.
  • Capture-refresh pane log (forensic context, even on FAIL/SKIP): .otbox/boxes/<id>/logs/capture-refresh/<scenario>/pane.log.
  • The box is left up on capture-refresh FAIL (exit 3) for inspection. Drop into it with ./otbox ssh --box <id> (Tier 0 = local shell at box.project; Tier 1 = SSH into the remote).
  • Force a cold rebuild of one checkpoint by deleting its snapshot: ./otbox snapshot-rm _checkpoint-c-captured-real-session-<hash> (the hash is visible via ./otbox list --json | jq .snapshots).
  • Re-run the full matrix with filters: ./otbox matrix --journey 'agent-session-*' --checkpoint 'c-captured-*'.
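The matrix filters in the last bullet read like shell-style globs. Assuming they follow `fnmatch` semantics and that the unfiltered matrix is a plain cross product (both are guesses, and the real matrix may also prune pairs by preconditions), the selection can be pictured like this; `select_pairs` is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from itertools import product


def select_pairs(journeys, checkpoints, journey_glob="*", checkpoint_glob="*"):
    """Return the (journey, checkpoint) pairs whose names match both globs."""
    return [
        (journey, checkpoint)
        for journey, checkpoint in product(journeys, checkpoints)
        if fnmatchcase(journey, journey_glob) and fnmatchcase(checkpoint, checkpoint_glob)
    ]


journeys = ["agent-session-trail-explain-happy", "cli-publish-happy-path"]
checkpoints = ["c-captured-real-session", "c-installed-source"]
print(select_pairs(journeys, checkpoints, "agent-session-*", "c-captured-*"))
```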

The make-target matrix

| Target | Gates / runs |
| --- | --- |
| make otbox-slice | thin Tier 0 vertical slice (one journey, fast smoke) |
| make otbox-journeys | full Tier 0 catalogue sweep + zero-residue check |
| make otbox-tier1 | full Tier 1 slice + catalogue (sets OT_OTBOX_TIER1=1) |
| make otbox-matrix | ./otbox matrix, covering every (journey x base-checkpoint) pair |
| make otbox-inventory | ./otbox matrix --inventory --strict (fails on plan 063 SSoT drift) |
| make otbox-agent-session | plan 064 captured-session vertical slice |
| make capture-refresh SCENARIO=<name> | refresh one capture artifact (SKIPs without binary) |

make otbox-journeys is the autonomous-delivery verification command — Tier 0 only, no network, no Docker, runs against the repo .venv.

What this skill MUST NOT touch

  • The synthetic fallback. Every captured-session checkpoint has a synthetic chain (fake claude + fixture corpus) that runs when the artifact is absent. It exists so default CI passes without committing multi-MB artifacts. Do not delete or skip it — it is the OSS-safe path.
  • AGENT_FACING_TRAJECTORIES_MIN_GOLD in tests/otbox/jtbd.py. This frozenset names trajectories that MUST have at least one gold journey or --strict inventory fails. Adding to it is a consensus decision (it forces new captured-state coverage); shipping a gold journey for an existing entry is the safe edit.
  • refs/opentraces/local/events/v1. The canonical TrailEvent log ref. Journey assertions on event_log_ref literal-match this; don't rename it.
  • box_id paths inside artifact archives. restore_from_capture rewrites the origin box root to the current one; manual untar will bake the wrong absolute paths into restored boxes.
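To see why adding an entry to AGENT_FACING_TRAJECTORIES_MIN_GOLD forces new coverage, consider this sketch of the kind of check a strict inventory could run. The field names `tier_label` and `trajectories` come from the journey TOML; `missing_gold` and the second trajectory name are hypothetical.

```python
def missing_gold(required: frozenset, journeys: list[dict]) -> set:
    """Return required trajectories that no gold-tier journey covers."""
    covered = set()
    for journey in journeys:
        if journey.get("tier_label") == "gold":
            covered.update(journey.get("trajectories", []))
    return set(required) - covered


journeys = [
    {"name": "a", "tier_label": "gold", "trajectories": ["build-dataset-from-lineage"]},
    {"name": "b", "tier_label": "bronze", "trajectories": ["example-uncovered-trajectory"]},
]
required = frozenset({"build-dataset-from-lineage", "example-uncovered-trajectory"})
print(missing_gold(required, journeys))
```

A non-empty result is what a strict run would treat as a failure, which is why shipping a gold journey for an existing entry is the safe direction of change.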
Install via CLI
npx skills add https://github.com/JayFarei/opentraces --skill otbox