history-compression

name: history-compression
description: How stagewise's agent history compression pipeline works (boundary selection, recency bias, chained compressions, and the SQLite-backed test harness for replaying real compressions in LLM playgrounds). Use when debugging, tuning, or extending history compression, when investigating context-window overflow, or when the user wants to probe compression quality against real chat histories.

History Compression

Stagewise summarizes long agent histories into a single briefing stored on a "boundary" message. Everything before the boundary is replaced by the briefing; everything after stays verbatim. Recency bias is baked into both the boundary math and the LLM prompt.

Key files

All paths are repo-root-relative.

apps/browser/src/backend/agents/shared/base-agent/base-agent.ts — trigger + boundary logic (compressHistoryInternal, ~L1898; trigger check in handlePostStep ~L2208).
apps/browser/src/backend/agents/shared/base-agent/history-compression/index.ts — model cascade + generateSimpleCompressedHistory.
apps/browser/src/backend/agents/shared/base-agent/history-compression/prompt.ts — COMPRESSION_SYSTEM_PROMPT, COMPRESSION_TARGET_CHARS = 30_000, buildCompressionUserMessage (dynamic budget hint).
apps/browser/src/backend/agents/shared/base-agent/history-compression/serialization.ts — convertAgentMessagesToCompactMessageHistoryString, estimateMessageTokens.
scripts/experiments/extract-compression-test-data.ts — SQLite → playground-ready per-compression bundles.

Trigger

After every step → handlePostStep checks:

usedTokens > min(compactionThreshold × contextWindow, 200k)

compactionThreshold default 0.65; chat agent overrides to 0.5.
200k hard cap (HISTORY_COMPRESSION_HARD_CAP_TOKENS) → large-context models trigger on the absolute count: a 1M-ctx model compresses at 200k tokens, not at 650k (0.65 × 1M).
Runs via void (async, non-blocking). Guarded by _isCompressingHistory flag → no concurrent runs.
Silent failure: agent keeps going; a later context overflow surfaces as a normal model error.
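A minimal sketch of the trigger check, assuming it reduces to the formula above. The function names are hypothetical; only the constant name, its value, and the default threshold come from these notes.

```typescript
// Value taken from the notes above; the real constant lives in base-agent.ts.
const HISTORY_COMPRESSION_HARD_CAP_TOKENS = 200_000;

// Hypothetical helper: token count above which compression should start.
function compressionTriggerTokens(
  contextWindow: number,
  compactionThreshold = 0.65,
): number {
  return Math.min(
    compactionThreshold * contextWindow,
    HISTORY_COMPRESSION_HARD_CAP_TOKENS,
  );
}

// Hypothetical helper mirroring the check described for handlePostStep.
function shouldCompress(
  usedTokens: number,
  contextWindow: number,
  compactionThreshold = 0.65,
): boolean {
  return usedTokens > compressionTriggerTokens(contextWindow, compactionThreshold);
}
```

With the default threshold, a 200k-ctx model crosses the trigger just above 130k used tokens, while a 1M-ctx model is held to the 200k cap.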

Boundary selection (`compressHistoryInternal`)

Kept-budget = min(0.2 × contextWindow, 40k tokens) (KEPT_BUDGET_FRACTION, KEPT_BUDGET_HARD_CAP_TOKENS). Preferred floor = max(5, config.minUncompressedMessages ?? 10).

Walk backward from history end:

Accumulate estimateMessageTokens(msg) until next msg would bust budget → boundary there.
Else stop once kept-count ≥ floor.
Edge: single last msg > budget → keep just that one, warn.
boundary < 1 → nothing to compress, skip.

Then: messagesToCompact = history.slice(0, boundary) → compress → write result to history[boundary].metadata.compressedHistory.
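The backward walk can be sketched as below. This is an illustration of the steps listed above, not the code in compressHistoryInternal: `Msg`, its `tokens` field (standing in for estimateMessageTokens(msg)), and `selectBoundary` are hypothetical, and the real stop conditions may differ in detail.

```typescript
// Hypothetical message shape; `tokens` stands in for estimateMessageTokens(msg).
interface Msg {
  id: string;
  tokens: number;
}

const KEPT_BUDGET_FRACTION = 0.2;
const KEPT_BUDGET_HARD_CAP_TOKENS = 40_000;

// Returns the index of the first kept message, or null when nothing can be compressed.
function selectBoundary(
  history: Msg[],
  contextWindow: number,
  minUncompressedMessages?: number,
): number | null {
  const budget = Math.min(
    KEPT_BUDGET_FRACTION * contextWindow,
    KEPT_BUDGET_HARD_CAP_TOKENS,
  );
  const floor = Math.max(5, minUncompressedMessages ?? 10);

  let boundary = history.length; // nothing kept yet
  let keptTokens = 0;

  for (let i = history.length - 1; i >= 0; i--) {
    const keptCount = history.length - boundary;
    if (keptTokens + history[i].tokens > budget) {
      // Edge case: the very last message alone exceeds the budget, keep just it.
      if (keptCount === 0) boundary = i;
      break;
    }
    keptTokens += history[i].tokens;
    boundary = i;
    // Budget not exhausted: stop once the preferred floor is reached.
    if (history.length - boundary >= floor) break;
  }

  // boundary < 1 means there is nothing before the boundary to compress.
  return boundary < 1 ? null : boundary;
}
```

The caller would then compress `history.slice(0, boundary)` and attach the briefing to `history[boundary]`.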

Token estimation quirks

estimateMessageTokens = ceil(chars / 4). Includes:

Text parts.
Tool-call toolName + JSON-stringified input + output.
Metadata overhead: env-snapshot, compressedHistory, mentions, attachments.
PER_MESSAGE_OVERHEAD_CHARS = 400 flat: accounts for XML wrappers/role tags that the pipeline injects but that aren't in parts. Without it, budget walk under-counts → compression triggers too late.
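A rough sketch of the chars/4 estimate, assuming a simplified message shape. `Part`, `SketchMessage`, and the JSON-length treatment of metadata are assumptions made for illustration; the real estimateMessageTokens in serialization.ts may count these fields differently.

```typescript
const PER_MESSAGE_OVERHEAD_CHARS = 400;

// Hypothetical, simplified part and message shapes.
type Part =
  | { type: "text"; text: string }
  | { type: "tool"; toolName: string; input: unknown; output?: unknown };

interface SketchMessage {
  parts: Part[];
  // Stand-in for env-snapshot, compressedHistory, mentions, attachments.
  metadata?: Record<string, unknown>;
}

// Length of the JSON form, treating undefined as empty.
const jsonLen = (value: unknown): number =>
  value === undefined ? 0 : JSON.stringify(value).length;

function estimateMessageTokensSketch(msg: SketchMessage): number {
  let chars = PER_MESSAGE_OVERHEAD_CHARS; // flat wrapper/role-tag overhead
  for (const part of msg.parts) {
    if (part.type === "text") {
      chars += part.text.length;
    } else {
      chars += part.toolName.length + jsonLen(part.input) + jsonLen(part.output);
    }
  }
  chars += jsonLen(msg.metadata); // assumption: metadata counted by JSON length
  return Math.ceil(chars / 4);
}
```

Even an empty message costs 100 tokens under this estimate, which is the effect of the flat overhead.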

Chained compressions

When messagesToCompact already contains a prior compressedHistory:

Serializer (convertAgentMessagesToCompactMessageHistoryString) walks backward and stops at first compressedHistory it finds, emitting it as <previous-chat-history>...</previous-chat-history>. Older raw messages never re-serialized.
buildCompressionUserMessage reads prior briefing length → injects ratio-bucketed budget hint:
- <60% target → "incorporate verbatim, do NOT shorten".
- 60–85% → "light condensation to oldest sections".
- ≥85% → "condense oldest fully-resolved sections".
Prompt mandates: keep every ## heading, shorten oldest sections only, preserve all [](path:...) links + user decisions + outcomes verbatim. Recent sections untouched.

→ Chain is bounded: each round re-absorbs prior briefing under the same 30k target.
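The ratio bucketing can be sketched as follows. The bucket thresholds and the 30k target come from the notes above; `BudgetHint` and `budgetHintFor` are hypothetical names, and the real buildCompressionUserMessage emits prose hints rather than these labels.

```typescript
const COMPRESSION_TARGET_CHARS = 30_000;

// Hypothetical labels for the three hint buckets.
type BudgetHint =
  | "incorporate-verbatim"
  | "light-condensation"
  | "condense-oldest-resolved";

function budgetHintFor(priorBriefingChars: number): BudgetHint {
  const ratio = priorBriefingChars / COMPRESSION_TARGET_CHARS;
  if (ratio < 0.6) return "incorporate-verbatim"; // plenty of room left
  if (ratio < 0.85) return "light-condensation"; // trim the oldest sections a little
  return "condense-oldest-resolved"; // near or over target
}
```

For example, a 10k-char prior briefing is carried over untouched, while a 27k-char one asks the model to condense its oldest resolved sections.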

Serialization format

Input to LLM is XML-ish:

<user> — text + [attached: ...], [mentioned: ...] metadata annotations.
<assistant> — text + one-liner tool markers: [read: path], [edited: path (N edits)], [shell: label → ✓ / exit N / timed out], [lint: paths → clean / N errors, M warnings], [asked user: title → field: answer; ...], [searched: "query"], [created: path], [wrote: path].
<previous-chat-history> — inlined prior briefing (see above).
Error state on any tool → ✗ <msg> suffix.
Unknown tool types → [tool-xxx] generic marker (never silently dropped).
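To make the marker format concrete, here is a small sketch covering three of the markers plus the unknown-tool fallback. The `ToolEvent` shape and `toolMarker` are hypothetical, and the placement of the ✗ suffix inside the brackets is a guess; only the marker texts themselves come from the list above.

```typescript
// Hypothetical, simplified tool event shapes.
type ToolEvent =
  | { type: "read"; path: string; error?: string }
  | { type: "edited"; path: string; edits: number; error?: string }
  | { type: "shell"; label: string; exitCode: number | null; timedOut?: boolean; error?: string }
  | { type: "unknown"; toolType: string; error?: string };

function markerBody(ev: ToolEvent): string {
  if (ev.type === "read") return `read: ${ev.path}`;
  if (ev.type === "edited") return `edited: ${ev.path} (${ev.edits} edits)`;
  if (ev.type === "shell") {
    const status = ev.timedOut
      ? "timed out"
      : ev.exitCode === 0
        ? "✓"
        : `exit ${ev.exitCode}`;
    return `shell: ${ev.label} → ${status}`;
  }
  // Unknown tools still get a generic marker instead of being dropped.
  return ev.toolType;
}

function toolMarker(ev: ToolEvent): string {
  // Assumption: the error suffix sits inside the brackets.
  return ev.error ? `[${markerBody(ev)} ✗ ${ev.error}]` : `[${markerBody(ev)}]`;
}
```

One line per tool call keeps the serialized history compact while still telling the compression model what happened and whether it succeeded.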

Prompt design (`apps/browser/src/backend/agents/shared/base-agent/history-compression/prompt.ts`)

Target 30k chars soft ("goal, not ceiling — longer > losing detail").
2nd-person for agent, 3rd-person for user.
## headings per topic, flowing prose inside. No bullets/tables/code blocks.
Recency bias: old resolved = 2–4 sentences; recent/active = full detail ending with current status.
MUST preserve verbatim: [](path:...) links, markdown links, user decisions/preferences/constraints, color values, directory structures, config.
Output plain markdown. Never emit <previous-chat-history> or any XML wrapper in output.

Model cascade (`apps/browser/src/backend/agents/shared/base-agent/history-compression/index.ts`)

1. gemini-3.1-flash-lite → 2. gpt-5.4-nano → 3. claude-haiku-4.5.

Each 30s abort timeout, temperature: 0.1, maxOutputTokens: 20000.
Min valid output: 30 chars (shorter → fallback).
Final fallback: active chat model (only if not already tried).
All fail → throws; caller (compressHistoryInternal) logs + reports, agent continues uncompressed.
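The cascade behaviour described above could look roughly like this. It is a sketch under assumptions: `Generate`, `compressWithCascade`, and `MIN_VALID_OUTPUT_CHARS` are hypothetical names, the actual LLM call (with temperature 0.1 and maxOutputTokens 20000) is left to the injected `generate` function, and the timeout only takes effect if that function honours the abort signal.

```typescript
const HISTORY_COMPRESSION_MODELS = [
  "gemini-3.1-flash-lite",
  "gpt-5.4-nano",
  "claude-haiku-4.5",
];
const MIN_VALID_OUTPUT_CHARS = 30; // hypothetical constant name

// Hypothetical signature: the caller supplies the actual model call.
type Generate = (model: string, signal: AbortSignal) => Promise<string>;

async function compressWithCascade(
  generate: Generate,
  activeChatModel: string,
  timeoutMs = 30_000,
): Promise<string> {
  const candidates = [...HISTORY_COMPRESSION_MODELS];
  // Final fallback: the active chat model, unless it was already in the list.
  if (!candidates.includes(activeChatModel)) candidates.push(activeChatModel);

  for (const model of candidates) {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    try {
      const text = await generate(model, controller.signal);
      if (text.length >= MIN_VALID_OUTPUT_CHARS) return text;
      // Too short: treat as a failed attempt and try the next model.
    } catch {
      // Timeout or provider error: fall through to the next model.
    } finally {
      clearTimeout(timer);
    }
  }
  throw new Error("history compression failed for every model");
}
```

The caller is expected to catch the final error, log it, and let the agent continue uncompressed.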

Tuning knobs

Knob	Where	Default	Effect
`compactionThreshold`	`config.historyCompressionThreshold`	0.65 (chat: 0.5)	Trigger fraction of ctx window
`HISTORY_COMPRESSION_HARD_CAP_TOKENS`	`base-agent.ts` const	200_000	Absolute trigger cap
`KEPT_BUDGET_FRACTION`	`base-agent.ts` const	0.2	Fraction kept uncompressed
`KEPT_BUDGET_HARD_CAP_TOKENS`	`base-agent.ts` const	40_000	Absolute kept cap
`minUncompressedMessages`	`config`	10	Floor on kept msg count
`COMPRESSION_TARGET_CHARS`	`prompt.ts` const	30_000	Soft briefing size target
`HISTORY_COMPRESSION_TIMEOUT_MS`	`index.ts` const	30_000	Per-model attempt timeout
`HISTORY_COMPRESSION_MODELS`	`index.ts` const	3-model cascade	Compression model order
`PER_MESSAGE_OVERHEAD_CHARS`	`serialization.ts`	400	Flat per-message wrapper overhead (chars)

Invariant: kept budget < compression trigger (else nothing ever compresses).

Test harness (`scripts/experiments/extract-compression-test-data.ts`)

Replays every real compression from local stagewise SQLite into playground-ready bundles.

npx tsx scripts/experiments/extract-compression-test-data.ts --channel prerelease
npx tsx scripts/experiments/extract-compression-test-data.ts --channel dev --min-messages 10

Channels map to <appData>/{stagewise | stagewise-prerelease | stagewise-dev}/stagewise/agents/instances.sqlite.

Per chat, for each boundary message (every real compression event):

Slices messages[0..boundary).
Runs real convertAgentMessagesToCompactMessageHistoryString + buildCompressionUserMessage (imported from app source → fidelity guaranteed).
Writes to experiments-data/history-compression/<channel>/NNN-title/compression-NNN/:
- system-prompt.md — static prompt.
- user-message.md — dynamic user msg with budget hint.
- compact-history.xml — raw serialized input.
- actual-output.md — what the real in-app LLM produced.
- metadata.json — indices, char counts, prev-tag leak check.

→ Paste system + user into AI Studio/Claude → diff against actual-output.md. Covers full chain (1st → Nth compression) so chained-compression drift is testable.
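When comparing a playground result against actual-output.md, a quick sanity check like the one below can help. It is a hypothetical helper, not part of the extraction script: it recomputes the output size against the 30k target and repeats the kind of tag-leak check that metadata.json records.

```typescript
const COMPRESSION_TARGET_CHARS = 30_000;

// Hypothetical result shape for a single playground output.
interface OutputCheck {
  chars: number;
  ratioToTarget: number;
  includesPreviousTag: boolean;
}

function checkPlaygroundOutput(output: string): OutputCheck {
  return {
    chars: output.length,
    ratioToTarget: output.length / COMPRESSION_TARGET_CHARS,
    // Matches an opening or closing <previous-chat-history> tag.
    includesPreviousTag: /<\/?previous-chat-history\b/.test(output),
  };
}
```

Running it over each compression in a chain shows whether briefing size drifts upward from one round to the next.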

Common tasks

"Why didn't compression trigger?" → check usedTokens vs trigger formula; verify compactionThreshold is a sane fraction (0 < value < 1; at ≥ 1 the trigger sits at or past the full window for models with ≤200k context); check _isCompressingHistory not stuck.
"Compression is too aggressive/lossy" → bump COMPRESSION_TARGET_CHARS; lower compactionThreshold so it triggers earlier with smaller inputs.
"Too few kept messages after compression" → raise minUncompressedMessages or KEPT_BUDGET_FRACTION (but keep < trigger).
"Output leaks <previous-chat-history> tags" → check metadata.json actualOutputIncludesPreviousTag; prompt already forbids it, so likely a model regression → reorder the cascade (HISTORY_COMPRESSION_MODELS).
"Boundary drift after compression" → compressHistoryInternal re-finds boundary by id after LLM round-trip (user may have undone messages mid-compression); missing id → silent skip + warn.