15-cost-context-management - SKILL.md Agent Skill

name: 15-cost-context-management description: "Learn to manage session costs, context window usage, and model routing to get the most value from Claude Code."

Cost and Context Management

Estimated time: 15 minutes Prerequisites: Module 01 (Claude Code installed and working)

Learn to manage session costs, context window usage, and model routing to get the most value from Claude Code.

Orientation

Print this once at the start:

You're learning cost and context management.
This takes about 15 minutes.

We'll cover:
  1. Understanding session costs and the /cost command
  2. Context window — what fills it and how to manage it
  3. Model routing — when to use which model
  4. Session discipline patterns
  5. Fast mode and model selection
  6. Auto-compaction settings
  7. June 2026 billing changes (Agent SDK credits)

You'll need:
  - Claude Code installed and working (Module 01)

Progress Tracking

On module start, write a progress marker:

mkdir -p ~/.claude/courseware-progress && date -u +%Y-%m-%dT%H:%M:%SZ > ~/.claude/courseware-progress/15.started

Preflight

# Check 1 — Claude Code installed?
command -v claude &>/dev/null && echo "EXISTS: Claude Code" || echo "MISSING: Claude Code — run /learn-01-vertex-setup first"

# Check 2 — Check current CLAUDE.md for cost-related instructions
if [ -f "CLAUDE.md" ]; then
  if grep -qi "cost\|budget\|context\|compact\|model" CLAUDE.md 2>/dev/null; then
    echo "EXISTS: CLAUDE.md mentions cost/context topics"
  else
    echo "INFO: CLAUDE.md exists but doesn't cover cost management"
  fi
else
  echo "INFO: No CLAUDE.md in current directory"
fi

# Check 3 — Check for subagent/model routing in settings
if [ -f "$HOME/.claude/settings.json" ]; then
  echo "EXISTS: Settings file present"
else
  echo "INFO: No global settings file"
fi

Print a summary.

Step 1 — Understanding Session Costs

Tell the user:

Every message in Claude Code sends your full conversation context
to the model. The cost depends on:

  Input tokens  — everything Claude reads (your messages, tool results,
                  system prompt, CLAUDE.md, MCP server data)
  Output tokens — everything Claude writes back (responses, tool calls)

The /cost command shows your current session's token usage and cost.
Let's check it now.

Note: /cost and /stats have been merged into /usage. Both old commands
still work as shortcuts, but /usage is the canonical command.

Ask the user to type /cost or show them the current session metrics. Explain:

Key numbers to watch:
  - Total cost: cumulative for this session
  - Context size: how much of the window is used
  - Cache hit rate: higher is better (repeated content is cheaper)

Cost scales with context size. A session that starts at $0.01
per message can grow to $0.10+ per message as context fills up.
The most effective cost control is keeping sessions short and focused.

Step 2 — Context Window Management

Tell the user:

The context window is finite. Everything loads into it:

  ALWAYS LOADED (every message):
    - System prompt
    - CLAUDE.md files (workspace + project)
    - Memory files loaded by auto-memory
    - Your conversation history (messages + tool results)
    - MCP server descriptions

  LOADED ON DEMAND:
    - File contents (Read tool results)
    - Command output (Bash tool results)
    - Search results
    - Subagent results

When the window fills up, Claude Code compresses older messages.
This is why Claude "forgets" things from early in a session.

Management tools:
  /compact     — manually compress context (keeps a summary)
  /clear       — wipe context and start fresh (same session)
  /cost        — check current usage
  New session  — exit and relaunch (cleanest reset)

Note: /cost and /stats have been merged into /usage. Both old commands
still work as shortcuts, but /usage is the canonical command.

Demonstrate the impact:

# Show what's in the always-loaded context
echo "=== Context contributors ==="

# CLAUDE.md files
WORKSPACE_CLAUDE=$(find ../.. -maxdepth 1 -name "CLAUDE.md" 2>/dev/null | head -1)
if [ -n "$WORKSPACE_CLAUDE" ]; then
  WC_SIZE=$(wc -c < "$WORKSPACE_CLAUDE" 2>/dev/null | tr -d ' ')
  echo "Workspace CLAUDE.md: ${WC_SIZE} bytes"
fi

if [ -f "CLAUDE.md" ]; then
  PC_SIZE=$(wc -c < "CLAUDE.md" | tr -d ' ')
  echo "Project CLAUDE.md: ${PC_SIZE} bytes"
fi

# Memory files
MEM_DIR="$HOME/.claude/projects/$(pwd | sed 's|/|-|g' | sed 's|^-||')/memory"
if [ -d "$MEM_DIR" ]; then
  MEM_SIZE=$(du -sh "$MEM_DIR" 2>/dev/null | cut -f1)
  MEM_COUNT=$(ls "$MEM_DIR"/*.md 2>/dev/null | wc -l | tr -d ' ')
  echo "Memory files: $MEM_COUNT files ($MEM_SIZE)"
fi

# Skills
SKILL_COUNT=$(find "$HOME/.claude/skills" -name "SKILL.md" 2>/dev/null | wc -l | tr -d ' ')
echo "Global skills: $SKILL_COUNT"

echo ""
echo "All of these load into EVERY message. Keep them lean."

Step 3 — Model Routing

Tell the user:

Claude Code can route to different models for different tasks.
This is a powerful cost lever:

  Opus    — most capable, most expensive
            Use for: planning, debugging, architecture, complex reasoning
            Cost: highest per token

  Sonnet  — balanced capability and cost
            Use for: code review, search, analysis, moderate tasks

  Haiku   — fastest, cheapest
            Use for: formatting, parsing, simple extraction, mechanical tasks

Routing strategies:

  1. MANUAL: Use /model to switch mid-session
     Good for: switching from planning (Opus) to execution (Sonnet)

  2. SKILLS: Set model: haiku in skill frontmatter
     Good for: repetitive skills that don't need full reasoning

  3. SUBAGENTS: Dispatch subagents with model selection
     Good for: isolating expensive work from cheap work

  4. FAST MODE: Toggle /fast for faster output on the same model
     Good for: when you need speed, not a model change

Step 4 — Session Discipline Patterns

Tell the user:

Patterns that keep costs down and quality up:

  ONE TASK PER SESSION
    Start a session, do one thing, exit. Context stays small,
    costs stay low, Claude doesn't get confused.

  FRONT-LOAD CONTEXT
    Put critical info in CLAUDE.md, not in conversation.
    It loads every time without growing the context.

  USE SUBAGENTS FOR RESEARCH
    Research fills context fast. Dispatch an agent for research,
    get back a summary. The full search results stay in the
    subagent's context, not yours.

  COMPACT EARLY
    Don't wait for context to fill. /compact after completing
    a major step, before starting the next one.

  EXIT AND RESTART
    The cheapest message is the first one in a new session.
    Don't be afraid to start fresh.

  WRITE IT DOWN
    Plans, findings, decisions — write to files, not conversation.
    Files survive context compression and session restarts.

Step 5 -- Fast mode

Explain:

Fast mode makes Claude respond faster by using Opus with optimized
output speed. It does NOT downgrade to a smaller model — you get the
same Opus quality, just faster.

Toggle fast mode:
  /fast

Fast mode is available on Opus 4.6 and Opus 4.7. It costs more per
token but saves wall-clock time. Use it when:
  - You're iterating quickly and waiting is the bottleneck
  - You're debugging and need rapid feedback
  - The task is time-sensitive

Don't use it when:
  - You're running long autonomous tasks (cost adds up)
  - You're doing bulk processing (use Sonnet subagents instead)

Step 6 -- Context visualization and compaction

Explain:

Two commands help you understand and manage context:

  /context   — visualizes your context window usage breakdown
               Shows how much space is used by: system prompt,
               CLAUDE.md, skills, conversation, tool results

  /usage     — shows token usage and cost for the session

Auto-compaction settings (in ~/.claude/settings.json or CLAUDE.md):

  autoCompactAt       — percentage of context to trigger auto-compact
                       (default: ~80%). Lower = more aggressive compaction.
  sessionLimit        — hard token limit per session
  thinkingTokenLimit  — cap on thinking/reasoning tokens

Example settings.json:
  {
    "autoCompactAt": 0.6,
    "sessionLimit": 100000
  }

Prompt caching also reduces costs. When you send the same context
repeatedly (like CLAUDE.md content), Claude caches it. The cache has
a 5-minute TTL — keeping sessions active means cache hits, which are
significantly cheaper than cold reads.

Step 7 -- Billing changes (June 2026)

Explain:

Starting June 15, 2026, Agent SDK and headless (claude -p) usage on
Anthropic subscription plans moves to a separate monthly credit:

  Plan          Monthly Agent SDK Credit
  Pro           $20
  Max 5x        $100
  Max 20x       $200

This is metered at full API rates with no rollover.

IMPORTANT FOR RED HAT: This billing change does NOT affect Vertex AI
usage. Red Hat uses Claude Code through Google Cloud Vertex AI, which
is billed through your GCP project. The Agent SDK credit separation
only applies to direct Anthropic subscription plans.

What this means in practice:
  - Interactive Claude Code sessions: unaffected (same as today)
  - claude -p (headless): if you use Anthropic directly, tracked separately
  - GitHub Actions with Claude: if using Anthropic directly, tracked separately
  - All Vertex AI usage: billed through GCP, no change

Verification

Ask the user:

Let's verify understanding. Answer these:

1. What's the single most effective way to reduce session costs?

2. You're 30 minutes into a session and /cost shows $3.50.
   The next message will send all that context again.
   What should you do before continuing?

3. You have a skill that reformats JSON files.
   What model should it use and why?

Expected answers:

Keep sessions short and focused (one task per session, or restart between tasks).
Use /compact to compress context, or start a new session if the current task is done.
Haiku — it's a mechanical task (formatting) that doesn't need reasoning.

Accept reasonable variations.

If successful, print:

All checks passed. You understand cost and context management.

Challenge

Audit your current session and make one improvement:

1. Check /cost (or /usage) for this session

Note: /cost and /stats have been merged into /usage. Both old commands
still work as shortcuts, but /usage is the canonical command.

2. Count how many CLAUDE.md files load into your context
   (workspace + project + any parent directories)
3. Identify the single largest context contributor
4. Make one change to reduce your per-message context size
   (trim a CLAUDE.md, remove an unused skill, compact, etc.)

Tell me:
  1. Your session cost so far
  2. Number of CLAUDE.md files loading
  3. Biggest context contributor
  4. What change you made

Challenge Verification

Any reasonable answers pass. The key insight is that context size drives cost, and managing it is an active practice.

Write the completion marker:

date -u +%Y-%m-%dT%H:%M:%SZ > ~/.claude/courseware-progress/15.done

If successful, print:

Module 15 complete.

You now understand cost and context management.
Key takeaways:
  - Every message sends full context — cost grows with conversation
  - /cost, /compact, /clear are your management tools
  - Model routing (Opus/Sonnet/Haiku) is a direct cost lever
  - CLAUDE.md loads every message — keep it lean and relevant
  - Subagents isolate expensive research from your main context
  - One task per session is the simplest cost discipline
  - /usage (replaces /cost + /stats) for session cost and token breakdown
  - /context for visualizing context window usage
  - /fast for faster Opus output (same quality, higher speed)
  - Auto-compaction settings: autoCompactAt, sessionLimit
  - Vertex AI billing is separate from Anthropic subscription changes

Cost reduction checklist:
  [ ] Session cost target: under $5 for most tasks
  [ ] /compact after major steps
  [ ] Subagents for research-heavy work
  [ ] Haiku for mechanical skills
  [ ] CLAUDE.md under 500 lines
  [ ] Exit and restart between unrelated tasks

Next module: /learn-16-multi-repo-workspaces

Questions or feedback? https://github.com/rhpds/claude-code-courseware/issues