name: drive description: Drive the REAL claude binary through clawback's PTY channel at a scripted inter-turn gap schedule — the faithful load arm. Because the bytes on the wire are produced by genuine Claude Code, this is the only driver that exercises the 1h-TTL nested-cache-control rewrite (arm A2) authentically. Use for headline/moat numbers and to cross-check the HTTP replay arm (.skills/replay). Requires node-pty and a running clawback proxy.
clawback PTY-driven load driver (faithful arm)
Run benchmark/bin/drive_pty.js from the project root. It launches the real
claude binary inside a PTY clawback owns (reusing src/launch_claude.js →
ptyProcess.write(), the same mechanism auto-continue uses) and types
scripted prompts on a schedule. Data is captured by the proxy's
--turn-log — start the proxy (with its turn-log) via .skills/monitor
first, pointed at the same --port.
Why this arm exists
The request bytes are, by definition, exactly what Claude Code sends —
including the precise cache_control breakpoint nesting that the 1h-TTL moat
rewrites. The HTTP replay arm (.skills/replay) can only approximate that
with a fixture. Agreement between the two arms is what makes a claim
bulletproof; divergence is itself a finding.
Requirements
node-ptyinstalled (clawback's optional dep). Without it the driver errors out — it cannot fall back to non-PTY (it needs the master FD to inject keystrokes).- A running clawback proxy with a
--turn-log. - The real
claudebinary onPATH(or--command <path>). A real run spends real tokens throughclaude.
Usage
# print the planned gap schedule + prompt loading, spawn nothing:
node benchmark/bin/drive_pty.js --dry-run --profile L0 --turns 5
# real run at L2 (5–30 min gaps), 30 turns:
node benchmark/bin/drive_pty.js --profile L2 --turns 30 \
--prompts benchmark/prompts/coding.txt --settle-sec 8
Key options
--profile L0|L1|L2|L3|L4— inter-turn gap range (mirrors.skills/replay). L4 (overnight/date-rollover) can't be faithfully automated in a short run; use a real overnight window.--turns N,--gap-sec N(fixed-gap override).--prompts PATH— one prompt per line (#comments ok); the driver cyclesprompt[i % N]. The shippedbenchmark/prompts/coding.txtis read-only-ish on purpose (asks for explanation, not edits) so long unattended runs don't mutate the tree.--settle-sec N— turn is "done" after N s of PTY silence (quiescence; Claude Code's TUI has no machine done-marker).--max-turn-seccaps it.--model ID— pin claude's model (cost control; e.g.claude-haiku-4-5-20251001).--effort low|medium|high|xhigh|max— pin claude's reasoning level (claude --effort). Effort resizes the thinking blocks that ride inside each turn and get cached, so it shifts the cache economics clawback is measured on. Jagged availability: Haiku has none;xhighis Opus 4.7/4.8 only.--command claude,--cwd DIR,--port 8787,--transcript PATH.
Reads no fixture; the proxy writes them now
This arm is the real client, so it reads no benchmark/fixtures/ — it
types live prompts at claude. The PTY itself sees only terminal bytes, never
the decrypted HTTP body, so the driver can't write a fixture. The proxy
can: --capture-body dumps the pristine /v1/messages body before any
mutation. .skills/capture wires this driver (one turn) to a
--capture-body proxy to mint a real fixture for the replay arm. See
.skills/capture.
After a run
Analyze the proxy's turn-log with .skills/bench, then chart with
.skills/plot. Compare against the .skills/replay (clean) numbers.