name: 15-cost-context-management
description: "Learn to manage session costs, context window usage, and model routing to get the most value from Claude Code."
Cost and Context Management
Estimated time: 15 minutes
Prerequisites: Module 01 (Claude Code installed and working)
Learn to manage session costs, context window usage, and model routing to get the most value from Claude Code.
Orientation
Print this once at the start:
You're learning cost and context management.
This takes about 15 minutes.
We'll cover:
1. Understanding session costs and the /cost command
2. Context window — what fills it and how to manage it
3. Model routing — when to use which model
4. Session discipline patterns
5. Fast mode and model selection
6. Auto-compaction settings
7. June 2026 billing changes (Agent SDK credits)
You'll need:
- Claude Code installed and working (Module 01)
Progress Tracking
On module start, write a progress marker:
mkdir -p ~/.claude/courseware-progress && date -u +%Y-%m-%dT%H:%M:%SZ > ~/.claude/courseware-progress/15.started
Preflight
# Check 1 — Claude Code installed?
command -v claude &>/dev/null && echo "EXISTS: Claude Code" || echo "MISSING: Claude Code — run /learn-01-vertex-setup first"
# Check 2 — Check current CLAUDE.md for cost-related instructions
if [ -f "CLAUDE.md" ]; then
  # -E enables extended regex so "|" means alternation on both GNU and BSD grep
  if grep -qiE "cost|budget|context|compact|model" CLAUDE.md 2>/dev/null; then
    echo "EXISTS: CLAUDE.md mentions cost/context topics"
  else
    echo "INFO: CLAUDE.md exists but doesn't cover cost management"
  fi
else
  echo "INFO: No CLAUDE.md in current directory"
fi
# Check 3 — Check for subagent/model routing in settings
if [ -f "$HOME/.claude/settings.json" ]; then
  echo "EXISTS: Settings file present"
else
  echo "INFO: No global settings file"
fi
Print a summary of the preflight results (which checks reported EXISTS, MISSING, or INFO).
Step 1 — Understanding Session Costs
Tell the user:
Every message in Claude Code sends your full conversation context
to the model. The cost depends on:
Input tokens — everything Claude reads (your messages, tool results,
system prompt, CLAUDE.md, MCP server data)
Output tokens — everything Claude writes back (responses, tool calls)
The /cost command shows your current session's token usage and cost.
Let's check it now.
Note: /cost and /stats have been merged into /usage. Both old commands
still work as shortcuts, but /usage is the canonical command.
Ask the user to type /cost or show them the current session metrics. Explain:
Key numbers to watch:
- Total cost: cumulative for this session
- Context size: how much of the window is used
- Cache hit rate: higher is better (repeated content is cheaper)
Cost scales with context size. A session that starts at $0.01
per message can grow to $0.10+ per message as context fills up.
The most effective cost control is keeping sessions short and focused.
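The link between context size and per-message cost can be sketched as simple arithmetic. The helper below is a minimal illustration, not how Claude Code actually bills: `estimate_message_cost` is a hypothetical name, the prices are placeholder arguments in dollars per million tokens, and caching is ignored.

```shell
# estimate_message_cost INPUT_TOKENS OUTPUT_TOKENS INPUT_PRICE OUTPUT_PRICE
# Prices are dollars per million tokens (placeholders, not real rates).
estimate_message_cost() {
  awk -v in_tok="$1" -v out_tok="$2" -v in_price="$3" -v out_price="$4" \
    'BEGIN { printf "%.4f\n", (in_tok * in_price + out_tok * out_price) / 1000000 }'
}

# Same reply size, ten times the context: the input side dominates the bill.
estimate_message_cost 10000 1000 3 15
estimate_message_cost 100000 1000 3 15
```

Replace the placeholder prices with your provider's current rates before drawing conclusions from the numbers.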
Step 2 — Context Window Management
Tell the user:
The context window is finite. Everything loads into it:
ALWAYS LOADED (every message):
- System prompt
- CLAUDE.md files (workspace + project)
- Memory files loaded by auto-memory
- Your conversation history (messages + tool results)
- MCP server descriptions
LOADED ON DEMAND:
- File contents (Read tool results)
- Command output (Bash tool results)
- Search results
- Subagent results
When the window fills up, Claude Code compresses older messages.
This is why Claude "forgets" things from early in a session.
Management tools:
/compact — manually compress context (keeps a summary)
/clear — wipe context and start fresh (same session)
/cost — check current usage
New session — exit and relaunch (cleanest reset)
Demonstrate the impact:
# Show what's in the always-loaded context
echo "=== Context contributors ==="
# CLAUDE.md files
WORKSPACE_CLAUDE=$(find ../.. -maxdepth 1 -name "CLAUDE.md" 2>/dev/null | head -1)
if [ -n "$WORKSPACE_CLAUDE" ]; then
  WC_SIZE=$(wc -c < "$WORKSPACE_CLAUDE" 2>/dev/null | tr -d ' ')
  echo "Workspace CLAUDE.md: ${WC_SIZE} bytes"
fi
if [ -f "CLAUDE.md" ]; then
  PC_SIZE=$(wc -c < "CLAUDE.md" | tr -d ' ')
  echo "Project CLAUDE.md: ${PC_SIZE} bytes"
fi
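# Memory files. Claude Code appears to name project directories by replacing "/"
# with "-", which leaves a leading "-"; fall back to the stripped form if absent.
PROJECT_KEY=$(pwd | sed 's|/|-|g')
MEM_DIR="$HOME/.claude/projects/$PROJECT_KEY/memory"
[ -d "$MEM_DIR" ] || MEM_DIR="$HOME/.claude/projects/${PROJECT_KEY#-}/memory"
if [ -d "$MEM_DIR" ]; then
  MEM_SIZE=$(du -sh "$MEM_DIR" 2>/dev/null | cut -f1)
  MEM_COUNT=$(ls "$MEM_DIR"/*.md 2>/dev/null | wc -l | tr -d ' ')
  echo "Memory files: $MEM_COUNT files ($MEM_SIZE)"
fi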
# Skills
SKILL_COUNT=$(find "$HOME/.claude/skills" -name "SKILL.md" 2>/dev/null | wc -l | tr -d ' ')
echo "Global skills: $SKILL_COUNT"
echo ""
echo "All of these load into EVERY message. Keep them lean."
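To put the byte counts above in terms of the context window, a rough conversion can help. This sketch assumes about four bytes per token, a common rule of thumb for English text rather than an exact tokenizer count; `approx_tokens` is a hypothetical helper name.

```shell
# approx_tokens FILE: rough token estimate, assuming ~4 bytes per token.
approx_tokens() {
  local bytes
  bytes=$(wc -c < "$1" | tr -d ' ')
  echo $(( (bytes + 3) / 4 ))   # integer division, rounded up
}

# Example: estimate the project CLAUDE.md if one exists
if [ -f "CLAUDE.md" ]; then
  echo "Project CLAUDE.md: ~$(approx_tokens CLAUDE.md) tokens"
fi
```

Treat the result as an order-of-magnitude guide; code and non-English text often tokenize less efficiently than this ratio suggests.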
Step 3 — Model Routing
Tell the user:
Claude Code can route to different models for different tasks.
This is a powerful cost lever:
Opus — most capable, most expensive
Use for: planning, debugging, architecture, complex reasoning
Cost: highest per token
Sonnet — balanced capability and cost
Use for: code review, search, analysis, moderate tasks
Haiku — fastest, cheapest
Use for: formatting, parsing, simple extraction, mechanical tasks
Routing strategies:
1. MANUAL: Use /model to switch mid-session
Good for: switching from planning (Opus) to execution (Sonnet)
2. SKILLS: Set model: haiku in skill frontmatter
Good for: repetitive skills that don't need full reasoning
3. SUBAGENTS: Dispatch subagents with model selection
Good for: isolating expensive work from cheap work
4. FAST MODE: Toggle /fast for faster output on the same model
Good for: when you need speed, not a model change
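For the SKILLS strategy, it can help to audit which skills already declare a model. The sketch below assumes skills are stored as `SKILL.md` files under a directory such as `$HOME/.claude/skills` and that the frontmatter key appears on its own line as `model: <name>`; `list_skill_models` is a hypothetical helper, and it does not parse YAML properly.

```shell
# list_skill_models DIR: print "<skill directory>: <model>" for each SKILL.md.
# Takes the first line starting with "model:"; skills without one are flagged.
list_skill_models() {
  local skills_dir="$1" skill model
  find "$skills_dir" -name "SKILL.md" 2>/dev/null | sort | while IFS= read -r skill; do
    model=$(sed -n 's/^model:[[:space:]]*//p' "$skill" | head -1)
    echo "$(basename "$(dirname "$skill")"): ${model:-(no model set)}"
  done
}
```

Running `list_skill_models "$HOME/.claude/skills"` gives a quick list of mechanical skills that might be candidates for a cheaper model.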
Step 4 — Session Discipline Patterns
Tell the user:
Patterns that keep costs down and quality up:
ONE TASK PER SESSION
Start a session, do one thing, exit. Context stays small,
costs stay low, Claude doesn't get confused.
FRONT-LOAD CONTEXT
Put critical info in CLAUDE.md, not in conversation.
It loads at a fixed size every time instead of growing with each turn.
USE SUBAGENTS FOR RESEARCH
Research fills context fast. Dispatch an agent for research,
get back a summary. The full search results stay in the
subagent's context, not yours.
COMPACT EARLY
Don't wait for context to fill. /compact after completing
a major step, before starting the next one.
EXIT AND RESTART
The cheapest message is the first one in a new session.
Don't be afraid to start fresh.
WRITE IT DOWN
Plans, findings, decisions — write to files, not conversation.
Files survive context compression and session restarts.
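The WRITE IT DOWN pattern can be as light as a one-line shell helper. This is a minimal sketch: `note` and the `NOTES_FILE` variable are hypothetical names, and the default file `NOTES.md` is only an example location.

```shell
# note TEXT...: append a UTC-timestamped line to a notes file.
# NOTES_FILE is a hypothetical variable; it defaults to NOTES.md in the current directory.
note() {
  printf '%s  %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$*" >> "${NOTES_FILE:-NOTES.md}"
}
```

For example, `note "Decision: keep retry logic in the client wrapper"` records the decision where a fresh session can read it back with a single file read.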
Step 5 -- Fast mode
Explain:
Fast mode makes Claude respond faster by using Opus with optimized
output speed. It does NOT downgrade to a smaller model — you get the
same Opus quality, just faster.
Toggle fast mode:
/fast
Fast mode is available on Opus 4.6 and Opus 4.7. It costs more per
token but saves wall-clock time. Use it when:
- You're iterating quickly and waiting is the bottleneck
- You're debugging and need rapid feedback
- The task is time-sensitive
Don't use it when:
- You're running long autonomous tasks (cost adds up)
- You're doing bulk processing (use Sonnet subagents instead)
Step 6 -- Context visualization and compaction
Explain:
Two commands help you understand and manage context:
/context — visualizes your context window usage breakdown
Shows how much space is used by: system prompt,
CLAUDE.md, skills, conversation, tool results
/usage — shows token usage and cost for the session
Auto-compaction settings (in ~/.claude/settings.json or CLAUDE.md):
autoCompactAt — percentage of context to trigger auto-compact
(default: ~80%, written as a fraction such as 0.8). Lower = more aggressive compaction.
sessionLimit — hard token limit per session
thinkingTokenLimit — cap on thinking/reasoning tokens
Example settings.json:
{
  "autoCompactAt": 0.6,
  "sessionLimit": 100000
}
Prompt caching also reduces costs. When you send the same context
repeatedly (like CLAUDE.md content), Claude caches it. The cache has
a 5-minute TTL — keeping sessions active means cache hits, which are
significantly cheaper than cold reads.
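The effect of cache hits on input cost can be sketched with the same kind of arithmetic. Everything here is an assumption to adjust: `cached_input_cost` is a hypothetical helper, the price and the cached-read multiplier are placeholder arguments to check against your provider's pricing, and any surcharge for writing to the cache is ignored.

```shell
# cached_input_cost TOKENS PRICE_PER_MTOK HIT_RATE READ_MULTIPLIER
# HIT_RATE: fraction of input tokens served from cache (0 to 1).
# READ_MULTIPLIER: cached-read price as a fraction of the normal input price.
cached_input_cost() {
  awk -v tok="$1" -v price="$2" -v hit="$3" -v mult="$4" \
    'BEGIN { printf "%.4f\n", tok * (hit * mult + (1 - hit)) * price / 1000000 }'
}

cached_input_cost 100000 3 0 0.1     # cold: nothing served from cache
cached_input_cost 100000 3 0.9 0.1   # warm: 90% of the input read from cache
```

With these placeholder values the warm message costs roughly a fifth of the cold one, which is why a pause longer than the cache TTL makes the next message noticeably more expensive.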
Step 7 -- Billing changes (June 2026)
Explain:
Starting June 15, 2026, Agent SDK and headless (claude -p) usage on
Anthropic subscription plans moves to a separate monthly credit:
Plan       Monthly Agent SDK Credit
Pro        $20
Max 5x     $100
Max 20x    $200
Usage against this credit is metered at full API rates, and unused credit does not roll over to the next month.
IMPORTANT FOR RED HAT: This billing change does NOT affect Vertex AI
usage. Red Hat uses Claude Code through Google Cloud Vertex AI, which
is billed through your GCP project. The Agent SDK credit separation
only applies to direct Anthropic subscription plans.
What this means in practice:
- Interactive Claude Code sessions: unaffected (same as today)
- claude -p (headless): if you use Anthropic directly, tracked separately
- GitHub Actions with Claude: if using Anthropic directly, tracked separately
- All Vertex AI usage: billed through GCP, no change
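A quick way to see which side of this split a given shell falls on is to check how Claude Code is configured to reach the model. The sketch below assumes Vertex routing is enabled through the `CLAUDE_CODE_USE_VERTEX=1` environment variable; `billing_path` is a hypothetical helper, and it does not distinguish other providers.

```shell
# billing_path: report which billing route this shell's configuration implies.
# Assumes Vertex routing is signalled by CLAUDE_CODE_USE_VERTEX=1.
billing_path() {
  if [ "${CLAUDE_CODE_USE_VERTEX:-0}" = "1" ]; then
    echo "vertex: billed through the GCP project"
  else
    echo "not vertex: check whether the Anthropic subscription credit applies"
  fi
}

billing_path
```

Running it in the same shell or CI job that invokes `claude -p` shows whether headless usage there would be tracked against the separate credit.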
Verification
Ask the user:
Let's verify understanding. Answer these:
1. What's the single most effective way to reduce session costs?
2. You're 30 minutes into a session and /cost shows $3.50.
The next message will send all that context again.
What should you do before continuing?
3. You have a skill that reformats JSON files.
What model should it use and why?
Expected answers:
1. Keep sessions short and focused (one task per session, or restart between tasks).
2. Use /compact to compress context, or start a new session if the current task is done.
3. Haiku: it's a mechanical task (formatting) that doesn't need reasoning.
Accept reasonable variations.
If successful, print:
All checks passed. You understand cost and context management.
Challenge
Audit your current session and make one improvement:
1. Check /cost (or /usage) for this session
2. Count how many CLAUDE.md files load into your context
(workspace + project + any parent directories)
3. Identify the single largest context contributor
4. Make one change to reduce your per-message context size
(trim a CLAUDE.md, remove an unused skill, compact, etc.)
Tell me:
1. Your session cost so far
2. Number of CLAUDE.md files loading
3. Biggest context contributor
4. What change you made
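For item 2 of the audit, the candidate files can be listed by walking up from the current directory. This is a sketch only: `find_claude_md_upward` is a hypothetical helper, and which of these files Claude Code actually loads depends on its own lookup rules, so treat the list as candidates.

```shell
# find_claude_md_upward [DIR]: print every CLAUDE.md from DIR up to the filesystem root.
find_claude_md_upward() {
  local dir
  dir=$(cd "${1:-.}" && pwd) || return 1
  while :; do
    [ -f "$dir/CLAUDE.md" ] && echo "$dir/CLAUDE.md"
    [ "$dir" = "/" ] && break
    dir=$(dirname "$dir")
  done
}

echo "CLAUDE.md files on the path: $(find_claude_md_upward | wc -l | tr -d ' ')"
```

Piping the output through `xargs wc -c` then shows which of the files is the largest contributor.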
Challenge Verification
Any reasonable answers pass. The key insight is that context size drives cost, and managing it is an active practice.
Write the completion marker:
date -u +%Y-%m-%dT%H:%M:%SZ > ~/.claude/courseware-progress/15.done
If successful, print:
Module 15 complete.
You now understand cost and context management.
Key takeaways:
- Every message sends full context — cost grows with conversation
- /cost, /compact, /clear are your management tools
- Model routing (Opus/Sonnet/Haiku) is a direct cost lever
- CLAUDE.md loads every message — keep it lean and relevant
- Subagents isolate expensive research from your main context
- One task per session is the simplest cost discipline
- /usage (replaces /cost + /stats) for session cost and token breakdown
- /context for visualizing context window usage
- /fast for faster Opus output (same quality, higher speed)
- Auto-compaction settings: autoCompactAt, sessionLimit
- Vertex AI billing is separate from Anthropic subscription changes
Cost reduction checklist:
[ ] Session cost target: under $5 for most tasks
[ ] /compact after major steps
[ ] Subagents for research-heavy work
[ ] Haiku for mechanical skills
[ ] CLAUDE.md under 500 lines
[ ] Exit and restart between unrelated tasks
Next module: /learn-16-multi-repo-workspaces
Questions or feedback? https://github.com/rhpds/claude-code-courseware/issues