name: claude-batch-ai
description: |
Fast batch AI processing using claude -p CLI with OAuth credentials.
Use when: (1) you need programmatic Claude calls but have no API key,
(2) batch processing hundreds of items with AI (titles, summaries, labels),
(3) claude -p is slow (75s+ per call) due to MCP/tool loading overhead.
Key insight: --tools "" --disable-slash-commands cuts startup from 75s to 9s.
Covers parallel execution with ThreadPoolExecutor for throughput.
author: Claude Code
version: 1.0.0
date: 2026-02-11
Claude CLI Batch AI Processing
Problem
You need to make many programmatic Claude API calls (e.g., generating titles, summaries, labels for hundreds of items) but:
- No
ANTHROPIC_API_KEYis available in the environment - Claude Code uses OAuth credentials stored in the macOS keychain, not API keys
- Naive
claude -pcalls take 75+ seconds each due to MCP server and tool loading
Context / Trigger Conditions
- Need batch AI processing (100+ items)
- No API key set (
ANTHROPIC_API_KEYenv var is empty) - Claude Code is installed and authenticated via OAuth (
claude.ailogin) - Keychain shows
Claude Code-credentialswithclaudeAiOauthtokens - Individual
claude -pcalls are unacceptably slow
Solution
1. Strip CLI overhead with flags
The critical optimization: disable all tools and skills to eliminate MCP server startup.
# SLOW (75s) — loads all MCP servers, tools, skills
echo "prompt" | claude -p --model haiku
# FAST (9s) — bare model call, no tool/MCP overhead
echo "prompt" | claude -p --model haiku --no-session-persistence --tools "" --disable-slash-commands
Flag breakdown:
--tools ""— Disables ALL tool loading (Bash, Edit, Read, MCP servers). This is the biggest win.--disable-slash-commands— Skips skill/slash-command loading--no-session-persistence— Don't save session to disk (prevents thousands of junk JSONL files)--model haiku— Use cheapest/fastest model for simple tasks
2. Parallelize with ThreadPoolExecutor
import concurrent.futures
import subprocess
CLAUDE_BIN = str(Path.home() / ".local" / "bin" / "claude")
def call_claude(prompt):
result = subprocess.run(
[CLAUDE_BIN, "-p", "--model", "haiku",
"--no-session-persistence", "--tools", "", "--disable-slash-commands"],
input=prompt,
capture_output=True,
text=True,
timeout=60,
)
if result.returncode != 0:
raise RuntimeError(result.stderr.strip()[:200])
return result.stdout.strip()
# 6 workers: ~0.5 items/sec throughput (9s per call / 6 parallel = 1.5s effective)
with concurrent.futures.ThreadPoolExecutor(max_workers=6) as executor:
futures = {executor.submit(call_claude, prompt): item_id
for item_id, prompt in work_items}
for future in concurrent.futures.as_completed(futures):
item_id = futures[future]
result = future.result()
# process result...
3. Performance characteristics
| Workers | Per-call | Throughput | 764 items |
|---|---|---|---|
| 1 | ~9s | 0.11/s | ~115 min |
| 3 | ~9s | 0.33/s | ~39 min |
| 6 | ~9s | 0.5/s | ~24 min |
| 8 | ~10s | 0.7/s | ~18 min |
Beyond 8 workers, diminishing returns due to CPU contention from Node.js startup.
4. Clean model output
Haiku sometimes returns markdown formatting or preamble. Clean it:
import re
title = result.stdout.strip()
title = re.sub(r'\*+', '', title) # Remove **bold**
title = re.sub(r'^#+\s*', '', title) # Remove # headers
title = title.strip('"\'').rstrip('.') # Remove quotes/periods
if '\n' in title:
title = title.split('\n')[0].strip() # Take first line only
Verification
# Single call should complete in <15s
time echo "Say hello" | claude -p --model haiku --no-session-persistence --tools "" --disable-slash-commands
# Should output just the response, no tool calls or system messages
Example
Real-world usage: generating AI titles for 764 conversation sessions.
cd ~/.claude-conversations
python3 generate_titles.py --workers 6
# Result: 764 titles, 0 errors, 24.3 minutes, ~0.5 titles/sec
Notes
- OAuth tokens are stored in macOS keychain under
Claude Code-credentials - The
--tools ""flag is the key insight — without it, each call loads all configured MCP servers (kubernetes, playwright, memory-keeper, etc.), adding 60+ seconds - For API-key users, the
anthropicPython SDK is faster than CLI subprocess calls - Rate limiting is handled by the Claude infrastructure; no manual throttling needed
- Each
claude -pcall spawns a full Node.js process — memory usage scales with workers --no-session-persistenceprevents generating thousands of JSONL transcript files