claude-code-effort-models

star 0

Use when deciding how to work on a task - choosing effort level, model, or fast mode to balance reasoning depth, speed, and token cost.

ucsandman By ucsandman schedule Updated 6/8/2026

name: claude-code-effort-models description: "Use when deciding how to work on a task - choosing effort level, model, or fast mode to balance reasoning depth, speed, and token cost." user-invocable: true

Claude Code Effort, Models, and Fast Mode

Quick reference for tuning how Claude Code works: model selection, effort levels, and speed mode. Read this mid-session when deciding how to approach a task.


Models via /model

Switch with /model (opens picker) or /model <name> to set directly. Persists to next session.

Available models:

  • opus → Claude Opus 4.8 (latest, strongest reasoning)
  • sonnet → Claude Sonnet 4.6 (daily coding, balanced)
  • haiku → Claude Haiku 4.5 (fast, simple tasks)
  • best → currently opus
  • opusplan → opus during planning, auto-switches to sonnet for execution
  • opus[1m] / sonnet[1m] → same model with 1M token context window (long sessions only)

Model IDs (full names):

  • claude-opus-4-8
  • claude-sonnet-4-6
  • claude-haiku-4-5-20251001

When to pick:

  • Opus: hard reasoning, architecture decisions, complex refactors, code review, debugging. Default for Max/Team/Enterprise/API; higher token cost.
  • Sonnet: most coding work, features, edits, tests. Good speed/capability tradeoff.
  • Haiku: searches, simple formatting, routine tasks. Cheapest.
  • opusplan: complex feature design where planning matters but execution is straightforward.

Effort Levels via /effort

Controls adaptive reasoning depth per message. Raise it for complex problems; lower it for routine tasks. Persistent across sessions unless overridden by env var.

Available levels (varies by model):

  • low — minimal thinking, fastest, cheapest. Use: latency-sensitive, low-complexity tasks.
  • medium — lighter reasoning, cost-conscious work that trades some intelligence.
  • high — default on Opus 4.8, Opus 4.6, Sonnet 4.6. Balances tokens and capability.
  • xhigh — deeper reasoning, higher token spend. Default on Opus 4.7. Use: tricky architecture, intricate bugs.
  • max — deepest reasoning, unbounded tokens, session-only. Can overthink; test first.

Special: /effort ultracode (Opus only, session-only) sends xhigh to model AND orchestrates dynamic workflows for substantive tasks. Reserved for ambitious multi-phase work.

Usage:

  • /effort — open slider picker
  • /effort high — set directly
  • /effort auto — reset to model default
  • Env: CLAUDE_CODE_EFFORT_LEVEL=xhigh
  • Skill frontmatter: effort: xhigh

Token tradeoff: low < medium < high < xhigh < max. Each step costs more tokens but enables deeper reasoning for complex tasks.


Fast Mode via /fast

Opus only. Same model quality, ~2.5x faster output, higher cost per token. Toggle with /fast or "fastMode": true in settings.json.

Pricing (per MTok):

  • Opus 4.8 fast: $10 input / $50 output (vs $5/$25 standard)
  • Opus 4.7 fast: $30 input / $150 output (vs $5/$25 standard)
  • Opus 4.6 fast: $30 input / $150 output (deprecated; migrate to 4.8 or 4.7)

How it works:

  • NOT a different model. Same Opus, different API config prioritizing latency.
  • Auto-switches you to Opus if on Sonnet/Haiku.
  • Persists to next session by default (admins can set per-session reset).
  • ↯ icon shows it's active.
  • Shares rate limit pool across Opus 4.8/4.7/4.6; auto-falls back to standard speed if rate-limited.

When to use:

  • Rapid iteration, live debugging, tight deadlines. Cost matters less than speed.
  • NOT good for long autonomous tasks, batch work, CI/CD.

Cost gotcha: enabling fast mode mid-conversation re-caches full history at fast-mode price. Enable at session start for best cost.

Requirements:

  • Anthropic API or Claude subscription (Pro/Max/Team/Enterprise) with usage credits enabled.
  • NOT available on Bedrock, Vertex, Foundry, or AWS Platform.
  • Team/Enterprise admins must explicitly enable it; disabled by default org-wide.

Token and Cache Angle

Model choice drives cost:

  • Haiku ~5x cheaper input than Sonnet, ~15x cheaper than Opus. Suitable for searches/format work.
  • Sonnet ~1/3 Opus input cost. Suitable for most coding.
  • Opus 3–10x more per token than Sonnet; justifiable for hard reasoning.

Effort and fast mode interact:

  • Higher effort = more tokens on same model.
  • Fast mode = same tokens, higher per-token price, faster latency.
  • Combine lower effort + fast mode for max speed on straightforward tasks.
  • Don't raise both for cost-sensitive work.

Prompt caching: a stable warm prefix re-reads at ~10% of input price. Keep the prefix stable (don't rewrite early messages mid-session). Switching /model mid-session invalidates the cache for the next turn.


Quick Decision Tree

Task is routine (format, search, simple edit)?haiku, low effort, standard mode.

Task is typical coding (features, tests, refactors)?sonnet, high effort (or medium to cut cost), standard mode.

Task is hard (architecture, complex bug, design)?opus, xhigh effort, standard mode.

You need output in seconds, not minutes? → Use fast mode on Opus (higher cost, lower latency). Not a model change.

You need the deepest reasoning on an ambitious task?opus, max effort, standard mode. Session-only, unbounded tokens.


Environment Variables

  • ANTHROPIC_MODEL=<name> — set model for this session only.
  • CLAUDE_CODE_EFFORT_LEVEL=<level> — effort level; overrides session choice.
  • CLAUDE_CODE_DISABLE_FAST_MODE=1 — disable fast mode entirely.
  • ANTHROPIC_DEFAULT_OPUS_MODEL / ANTHROPIC_DEFAULT_SONNET_MODEL / ANTHROPIC_DEFAULT_HAIKU_MODEL — pin specific model versions (useful on Bedrock, Vertex, Foundry).

See Also

Install via CLI
npx skills add https://github.com/ucsandman/claude-code-capability-primer --skill claude-code-effort-models
Repository Details
star Stars 0
call_split Forks 0
navigation Branch main
article Path SKILL.md
More from Creator