claude-batch-ai - SKILL.md Agent Skill

name: claude-batch-ai description: | Fast batch AI processing using `claude -p` CLI with OAuth credentials. Use when: (1) you need programmatic Claude calls but have no API key, (2) batch processing hundreds of items with AI (titles, summaries, labels), (3) `claude -p` is slow (75s+ per call) due to MCP/tool loading overhead. Key insight: `--tools "" --disable-slash-commands` cuts startup from 75s to 9s. Covers parallel execution with ThreadPoolExecutor for throughput. author: Claude Code version: 1.0.0 date: 2026-02-11

Claude CLI Batch AI Processing

Problem

You need to make many programmatic Claude API calls (e.g., generating titles, summaries, labels for hundreds of items) but:

No ANTHROPIC_API_KEY is available in the environment
Claude Code uses OAuth credentials stored in the macOS keychain, not API keys
Naive claude -p calls take 75+ seconds each due to MCP server and tool loading

Context / Trigger Conditions

Need batch AI processing (100+ items)
No API key set (ANTHROPIC_API_KEY env var is empty)
Claude Code is installed and authenticated via OAuth (claude.ai login)
Keychain shows Claude Code-credentials with claudeAiOauth tokens
Individual claude -p calls are unacceptably slow

Solution

1. Strip CLI overhead with flags

The critical optimization: disable all tools and skills to eliminate MCP server startup.

# SLOW (75s) — loads all MCP servers, tools, skills
echo "prompt" | claude -p --model haiku

# FAST (9s) — bare model call, no tool/MCP overhead
echo "prompt" | claude -p --model haiku --no-session-persistence --tools "" --disable-slash-commands

Flag breakdown:

--tools "" — Disables ALL tool loading (Bash, Edit, Read, MCP servers). This is the biggest win.
--disable-slash-commands — Skips skill/slash-command loading
--no-session-persistence — Don't save session to disk (prevents thousands of junk JSONL files)
--model haiku — Use cheapest/fastest model for simple tasks

2. Parallelize with ThreadPoolExecutor

import concurrent.futures
import subprocess

CLAUDE_BIN = str(Path.home() / ".local" / "bin" / "claude")

def call_claude(prompt):
    result = subprocess.run(
        [CLAUDE_BIN, "-p", "--model", "haiku",
         "--no-session-persistence", "--tools", "", "--disable-slash-commands"],
        input=prompt,
        capture_output=True,
        text=True,
        timeout=60,
    )
    if result.returncode != 0:
        raise RuntimeError(result.stderr.strip()[:200])
    return result.stdout.strip()

# 6 workers: ~0.5 items/sec throughput (9s per call / 6 parallel = 1.5s effective)
with concurrent.futures.ThreadPoolExecutor(max_workers=6) as executor:
    futures = {executor.submit(call_claude, prompt): item_id
               for item_id, prompt in work_items}
    for future in concurrent.futures.as_completed(futures):
        item_id = futures[future]
        result = future.result()
        # process result...

3. Performance characteristics

Workers	Per-call	Throughput	764 items
1	~9s	0.11/s	~115 min
3	~9s	0.33/s	~39 min
6	~9s	0.5/s	~24 min
8	~10s	0.7/s	~18 min

Beyond 8 workers, diminishing returns due to CPU contention from Node.js startup.

4. Clean model output

Haiku sometimes returns markdown formatting or preamble. Clean it:

import re
title = result.stdout.strip()
title = re.sub(r'\*+', '', title)        # Remove **bold**
title = re.sub(r'^#+\s*', '', title)     # Remove # headers
title = title.strip('"\'').rstrip('.')   # Remove quotes/periods
if '\n' in title:
    title = title.split('\n')[0].strip() # Take first line only

Verification

# Single call should complete in <15s
time echo "Say hello" | claude -p --model haiku --no-session-persistence --tools "" --disable-slash-commands

# Should output just the response, no tool calls or system messages

Example

Real-world usage: generating AI titles for 764 conversation sessions.

cd ~/.claude-conversations
python3 generate_titles.py --workers 6
# Result: 764 titles, 0 errors, 24.3 minutes, ~0.5 titles/sec

Notes

OAuth tokens are stored in macOS keychain under Claude Code-credentials
The --tools "" flag is the key insight — without it, each call loads all configured MCP servers (kubernetes, playwright, memory-keeper, etc.), adding 60+ seconds
For API-key users, the anthropic Python SDK is faster than CLI subprocess calls
Rate limiting is handled by the Claude infrastructure; no manual throttling needed
Each claude -p call spawns a full Node.js process — memory usage scales with workers
--no-session-persistence prevents generating thousands of JSONL transcript files