pre-turn-budget-guardian - SKILL.md Agent Skill

name: pre-turn-budget-guardian
description: Enforce a token budget ceiling on Claude Code sessions by checking projected usage BEFORE each turn and halting with a structured stop reason if the budget would be exceeded. Prevents runaway loops and silent token burn. Completes the token management trilogy alongside boot-tax-monitor (measures startup) and token-burn-auditor (measures waste).

Pre-Turn Token Budget Guardian

Enforces a hard or soft token budget on the current session. Unlike boot-tax-monitor (measures startup overhead) and token-burn-auditor (finds waste after the fact), this skill prevents overspend by checking projected usage before each significant operation.

Trigger

Use when the user says "set a token budget", "budget guardian", "limit my spend", "cap this session", "don't let me burn more than X tokens", "token ceiling", "prevent runaway", or wants to enforce cost limits on a Claude Code session.

Phase 1: Set Budget Parameters

Ask for or accept defaults:

Parameter	Default	Description
`budget_tokens`	500,000	Max total tokens (input + output) for this session
`budget_usd`	$5.00	Max estimated cost for this session
`warning_threshold`	80%	Alert when this % of budget is consumed
`enforcement`	soft	`soft` = warn and ask to continue; `hard` = halt immediately
`model`	auto-detect	Model in use (for pricing calculation)

If the user gives a dollar amount, convert to tokens using current pricing:

Model	Input $/1M	Output $/1M	Blended avg $/1M
Haiku 3.5	$0.80	$4.00	~$2.40
Sonnet 4	$3.00	$15.00	~$9.00
Opus 4	$15.00	$75.00	~$45.00

Formula: budget_tokens = (budget_usd / blended_usd_per_1M) * 1,000,000, where blended_usd_per_1M is the "Blended avg $/1M" value for the model in use.
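As a rough illustration of the conversion, the sketch below applies the formula to the blended rates in the table. The dictionary keys (`haiku-3.5`, `sonnet-4`, `opus-4`) and the function name `usd_to_tokens` are hypothetical labels chosen for this example, not identifiers used by any real API.

```python
# Blended $/1M-token rates copied from the pricing table above.
# The keys are illustrative labels, not official model identifiers.
BLENDED_USD_PER_MILLION = {
    "haiku-3.5": 2.40,
    "sonnet-4": 9.00,
    "opus-4": 45.00,
}


def usd_to_tokens(budget_usd: float, model: str) -> int:
    """Convert a dollar budget into a token budget using the blended rate."""
    rate = BLENDED_USD_PER_MILLION[model]
    # budget_tokens = (budget_usd / blended_usd_per_1M) * 1,000,000
    return int(budget_usd / rate * 1_000_000)


if __name__ == "__main__":
    for model in BLENDED_USD_PER_MILLION:
        print(model, usd_to_tokens(5.00, model))
```

The blended rate weights input and output tokens equally, so the result is only an approximation; a session dominated by input tokens will get more tokens per dollar than this suggests.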

Phase 2: Measure Current Usage

Check current session token consumption:

If the agent platform DB is available (store/the agent platform.db):

SELECT SUM(context_tokens) as total_input, SUM(output_tokens) as total_output,
       SUM(cost_usd) as total_cost, COUNT(*) as turns
FROM token_usage WHERE session_id = '<current_session>';

If session JSONL is available (~/.claude/projects/*/sessions/): Parse the most recent session file for token usage entries.
If neither is available, estimate from conversation length:
- Count user messages and assistant responses
- Estimate ~1,000 tokens per user turn, ~2,000 per assistant turn
- This is a rough fallback -- note the estimate in output
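The first and last of these options can be sketched as follows. This is a minimal example that assumes a SQLite database with a `token_usage` table shaped like the query above (the real schema may differ), and it uses the per-turn figures from the fallback rule. The function names and the `estimated` flag are hypothetical. JSONL parsing is omitted because the session file layout is not specified here.

```python
import sqlite3

# Per-turn fallback figures taken from the estimation rule above.
EST_USER_TURN_TOKENS = 1_000
EST_ASSISTANT_TURN_TOKENS = 2_000


def usage_from_db(conn: sqlite3.Connection, session_id: str) -> dict:
    """Sum recorded usage for one session (assumed token_usage schema)."""
    row = conn.execute(
        "SELECT COALESCE(SUM(context_tokens), 0), "
        "       COALESCE(SUM(output_tokens), 0), "
        "       COALESCE(SUM(cost_usd), 0.0), "
        "       COUNT(*) "
        "FROM token_usage WHERE session_id = ?",
        (session_id,),  # parameterized to avoid quoting problems
    ).fetchone()
    total_input, total_output, total_cost, turns = row
    return {
        "used_tokens": total_input + total_output,
        "total_cost": total_cost,
        "turns": turns,
        "estimated": False,
    }


def usage_from_turn_counts(user_turns: int, assistant_turns: int) -> dict:
    """Rough fallback when no recorded usage is available."""
    used = (user_turns * EST_USER_TURN_TOKENS
            + assistant_turns * EST_ASSISTANT_TURN_TOKENS)
    # Flag the result so the output can note that it is an estimate.
    return {"used_tokens": used, "turns": assistant_turns, "estimated": True}


if __name__ == "__main__":
    # In-memory database with made-up rows, for demonstration only.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE token_usage (session_id TEXT, context_tokens INTEGER, "
        "output_tokens INTEGER, cost_usd REAL)"
    )
    conn.executemany(
        "INSERT INTO token_usage VALUES (?, ?, ?, ?)",
        [("s1", 12_000, 800, 0.05), ("s1", 15_000, 1_200, 0.06)],
    )
    print(usage_from_db(conn, "s1"))
    print(usage_from_turn_counts(4, 4))
```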

Phase 3: Project Next Turn Cost

Before the user's next significant operation, estimate what it will cost:

Simple question/answer: ~3,000-5,000 tokens
Code generation: ~5,000-15,000 tokens
Multi-file edit: ~15,000-30,000 tokens
Agent task with tool use: ~20,000-50,000 tokens
Sub-agent spawn: ~50,000-100,000 tokens

Calculate:

used_tokens = total_input + total_output
projected_after_next = used_tokens + estimated_next_turn
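remaining = budget_tokens - used_tokens

If projected_after_next exceeds budget_tokens, treat the budget as reached before running the operation.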
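The calculation above might look like this in practice. The sketch assumes the upper end of each estimate range is used, which is a conservative choice made for this example rather than something the skill prescribes. The operation keys and the `Projection` type are hypothetical names.

```python
from dataclasses import dataclass

# Upper bounds of the per-operation ranges listed above (a conservative
# assumption; the midpoint or lower bound would also be defensible).
NEXT_TURN_ESTIMATE = {
    "simple_qa": 5_000,
    "code_generation": 15_000,
    "multi_file_edit": 30_000,
    "agent_task": 50_000,
    "sub_agent": 100_000,
}


@dataclass
class Projection:
    used_tokens: int
    projected_after_next: int
    remaining: int
    would_exceed: bool


def project_next_turn(total_input: int, total_output: int,
                      budget_tokens: int, operation: str) -> Projection:
    """Project usage after the next operation and compare it to the budget."""
    used = total_input + total_output
    projected = used + NEXT_TURN_ESTIMATE[operation]
    return Projection(
        used_tokens=used,
        projected_after_next=projected,
        remaining=budget_tokens - used,
        would_exceed=projected > budget_tokens,
    )


if __name__ == "__main__":
    print(project_next_turn(400_000, 60_000, 500_000, "agent_task"))
```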

Phase 4: Enforce

At warning threshold (default 80%):

Token Budget Warning
====================
Used: XXK / XXK tokens (XX%)
Est. cost: $X.XX / $X.XX
Remaining: ~XX turns at current rate

Continue? [y/n]

At budget limit:

Soft mode: Show the warning and ask whether to continue or stop.
Hard mode: Output a structured stop and refuse further operations:

Token Budget Exceeded
=====================
Budget: XXK tokens ($X.XX)
Used: XXK tokens ($X.XX)
Stop reason: BUDGET_EXCEEDED

Session halted. To continue, run /pre-turn-budget-guardian with a higher budget.
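One way to express the enforcement decision is a small function that returns an action for the caller to act on. This is a sketch under assumptions: it treats the limit as reached when used tokens plus the next-turn estimate would exceed the budget, and the action names (`proceed`, `warn`, `ask`, `halt`) are invented for this example. Only the `BUDGET_EXCEEDED` stop reason comes from the template above.

```python
def check_budget(used_tokens: int, estimated_next_turn: int,
                 budget_tokens: int, enforcement: str = "soft",
                 warning_threshold_pct: int = 80) -> dict:
    """Decide what to do before the next turn runs."""
    if enforcement not in ("soft", "hard"):
        raise ValueError("enforcement must be 'soft' or 'hard'")

    projected = used_tokens + estimated_next_turn
    if projected > budget_tokens:
        if enforcement == "hard":
            # Hard mode: structured stop, no further operations.
            return {"action": "halt", "stop_reason": "BUDGET_EXCEEDED"}
        # Soft mode: show the warning and let the user decide.
        return {"action": "ask", "stop_reason": None}

    # Integer comparison avoids floating-point edge cases at the threshold.
    if used_tokens * 100 >= warning_threshold_pct * budget_tokens:
        return {"action": "warn", "stop_reason": None}

    return {"action": "proceed", "stop_reason": None}


if __name__ == "__main__":
    print(check_budget(490_000, 20_000, 500_000, enforcement="hard"))
```

Returning a value instead of printing keeps the decision separate from how the warning or stop message is rendered.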

Phase 5: Ongoing Monitoring

After setting the budget, append a status line after each significant response:

[Budget: XXK/XXK tokens (XX%) | ~XX turns remaining]

If the user runs multiple operations, track cumulative usage and warn proactively when approaching the threshold.
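A status line in the format shown above could be produced as follows. The sketch assumes "turns remaining" means the remaining budget divided by the session's average tokens per turn so far. That is one reasonable reading of "at current rate", not a definition given by the skill. The function name is hypothetical.

```python
def status_line(used_tokens: int, budget_tokens: int, turns_so_far: int) -> str:
    """Format the per-response budget status line."""
    pct = used_tokens * 100 // budget_tokens
    remaining = max(budget_tokens - used_tokens, 0)
    if used_tokens > 0 and turns_so_far > 0:
        # remaining / (used_tokens / turns_so_far), kept in integer math
        turns_left = str(remaining * turns_so_far // used_tokens)
    else:
        turns_left = "?"  # no usage history yet, so no rate to extrapolate
    return (f"[Budget: {used_tokens // 1000}K/{budget_tokens // 1000}K tokens "
            f"({pct}%) | ~{turns_left} turns remaining]")


if __name__ == "__main__":
    print(status_line(200_000, 500_000, 10))
```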

Runaway Loop Detection

The specific failure mode this prevents: Claude Code's autoCompact or retry logic silently burning thousands of tokens in a loop. Detect this by:

If 3+ consecutive turns have >10,000 output tokens each, flag as potential loop
If token consumption rate doubles between consecutive turns, flag as acceleration
If the same tool is called 5+ times in a row with similar inputs, flag as retry storm

On detection:

Runaway Loop Detected
=====================
Pattern: [potential loop / token acceleration / retry storm]
Tokens burned in loop: ~XXK
Action: [HALTED (hard) / WARNING (soft)]
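The three detection rules can be sketched as checks over a list of recent turns. The `Turn` record and its fields are hypothetical. "Similar inputs" is simplified here to an exact match on a tool-call signature string, which is stricter than the rule as written. The thresholds (3 turns, 10,000 output tokens, a doubling, 5 calls) come from the rules above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Turn:
    output_tokens: int
    total_tokens: int
    # Tool name plus a normalized form of its input; None if no tool was used.
    tool_call: Optional[str] = None


def detect_runaway(turns: List[Turn]) -> List[str]:
    """Return the names of any runaway patterns seen in the latest turns."""
    flags = []

    # Rule 1: 3+ consecutive turns with more than 10,000 output tokens each.
    if len(turns) >= 3 and all(t.output_tokens > 10_000 for t in turns[-3:]):
        flags.append("potential loop")

    # Rule 2: consumption at least doubles between consecutive turns.
    if (len(turns) >= 2 and turns[-2].total_tokens > 0
            and turns[-1].total_tokens >= 2 * turns[-2].total_tokens):
        flags.append("token acceleration")

    # Rule 3: the same tool call repeated 5+ times in a row.
    last5 = turns[-5:]
    if (len(last5) == 5 and last5[0].tool_call is not None
            and all(t.tool_call == last5[0].tool_call for t in last5)):
        flags.append("retry storm")

    return flags


if __name__ == "__main__":
    history = [Turn(300, 4_000, "Bash:npm test") for _ in range(5)]
    print(detect_runaway(history))
```

In a real session, a fuzzy comparison of tool inputs would likely catch more retry storms than the exact match used here.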

Integration Notes

boot-tax-monitor: Measures what you pay at startup (static). Run first to know your baseline.
token-burn-auditor: Finds waste patterns after the fact (diagnostic). Run periodically.
pre-turn-budget-guardian: Prevents overspend in real-time (enforcement). Run during sessions.

Together these form the token management trilogy: measure, diagnose, enforce.

Source Attribution

Technique derived from Nate's Newsletter (2026-04-03): "Your Agent Is 80% Plumbing" -- pre-turn token budget checking as a core infrastructure primitive, inspired by Claude Code's internal budget guardian that checks projected usage before each API call and halts with structured stop reasons on budget exhaustion.