invoking-cli-agents - SKILL.md Agent Skill

name: invoking-cli-agents description: Use when needing to programmatically invoke a CLI coding agent (Claude Code, OpenCode) from Python, delegate work to a sub-agent, or orchestrate multiple agents. Covers AgentShell instantiation, prompt execution, response streaming, session resumption, tool scoping, and cost tracking.

Invoking CLI Agents with AgentShell

AgentShell is a Python library that runs CLI coding agents headlessly and returns structured output. It hides agent-specific CLI differences behind a unified interface so your code works regardless of which agent runs underneath.

When to Use

You need to invoke Claude Code, OpenCode, or another CLI agent from Python
You want to delegate a coding task to a sub-agent and collect the result
You need to orchestrate multi-step workflows across agents
You want to stream agent output in real-time

When NOT to Use

You want to call the Anthropic API directly (use the SDK instead)
The CLI agent is already running interactively and you just need its output

Installation

uv add agent-shell-py

Core Concepts

AgentShell has two methods: execute() for collecting a complete response, and stream() for real-time event processing. Both are async.

Execute: Run and Collect

Use when you want the final answer and don't need intermediate output.

from agent_shell.shell import AgentShell
from agent_shell.models.agent import AgentType

shell = AgentShell(agent_type=AgentType.CLAUDE_CODE)

response = await shell.execute(
    cwd="/path/to/project",
    prompt="Analyse the authentication module and list all public functions",
    allowed_tools=["Read", "Glob", "Grep"],
    model="sonnet",
)

print(response.response)      # Full text output
print(f"Cost: ${response.cost:.4f}")
print(f"Session: {response.session_id}")

Stream: Real-Time Events

Use when you need progress feedback, want to display output incrementally, or need to react to specific event types (tool use, thinking, errors).

async for event in shell.stream(
    cwd="/path/to/project",
    prompt="Refactor the auth module to use dependency injection",
    allowed_tools=["Read", "Edit", "Bash"],
    model="sonnet",
    effort="high",
    include_thinking=True,
):
    if event.type == "system":
        print(f"Session: {event.session_id}")
    elif event.type == "thinking":
        print(f"[thinking] {event.content}")
    elif event.type == "tool_use":
        print(f"[tool] {event.content}")
    elif event.type == "text":
        print(event.content)
    elif event.type == "result":
        print(f"Done. Cost: ${event.cost:.4f}, Duration: {event.duration:.1f}s")

Session Resumption

Pass session_id from a previous response to continue the conversation. This enables multi-turn workflows where each step builds on the last.

# Step 1: Analyse
analysis = await shell.execute(
    cwd="/path/to/project",
    prompt="Analyse this codebase and identify areas that need refactoring",
    allowed_tools=["Read", "Glob", "Grep"],
    model="sonnet",
)

# Step 2: Act on the analysis (same session)
refactor = await shell.execute(
    cwd="/path/to/project",
    prompt="Now refactor the top priority item you identified",
    allowed_tools=["Read", "Edit", "Bash"],
    model="sonnet",
    session_id=analysis.session_id,
)

Parameters

Parameter	Type	Default	Purpose
`cwd`	`str`	required	Working directory (must exist)
`prompt`	`str`	required	Task or question for the agent
`allowed_tools`	`list[str] \| None`	`None`	Restrict which tools the agent can use. `None` = all tools. Claude Code only.
`model`	`str \| None`	`None`	Model alias or full name (e.g. `"sonnet"`, `"claude-sonnet-4-6"`)
`effort`	`str \| None`	`None`	Reasoning effort: `"low"`, `"medium"`, `"high"`, `"max"`. Claude Code only.
`include_thinking`	`bool`	`False`	Filter only: yields thinking events in `stream()` if already present in CLI output. Does not add a CLI flag. Claude Code `stream()` only. Dropped by `execute()`.
`auto_approve`	`bool`	`True`	Skip tool permission prompts. Claude Code only.
`session_id`	`str \| None`	`None`	Resume a previous session

Agent parity warning: OpenCode currently only maps model and session_id to CLI flags. Parameters like allowed_tools, effort, include_thinking, and auto_approve are accepted but silently ignored. Do not rely on allowed_tools for safety when using OpenCode.

Supported Agents

from agent_shell.models.agent import AgentType

AgentType.CLAUDE_CODE   # Claude Code CLI
AgentType.OPENCODE      # OpenCode CLI

Gemini CLI, Copilot CLI, and Codex have enum values but no adapter yet.

Tool Scoping (Claude Code Only)

Restrict what the agent can do by passing allowed_tools. This is critical for safety when delegating work.

Important: Tool scoping only works with Claude Code. OpenCode ignores allowed_tools — the agent will have access to all tools regardless of what you pass. Do not use OpenCode for safety-sensitive delegation where tool restriction is required.

Gotcha: allowed_tools=[] (empty list) is falsy in Python, so no --allowed-tools flag is sent — the agent gets full tool access. To restrict tools, always pass a non-empty list. There is no way to disable all tools via this parameter.

# Read-only analysis (Claude Code)
shell = AgentShell(agent_type=AgentType.CLAUDE_CODE)

response = await shell.execute(
    cwd=project_path,
    prompt="Review this code for security issues",
    allowed_tools=["Read", "Glob", "Grep"],  # No write access
)

# Full write access for implementation
response = await shell.execute(
    cwd=project_path,
    prompt="Implement the fix you recommended",
    allowed_tools=["Read", "Edit", "Write", "Bash", "Glob", "Grep"],
)

Error Handling

AgentShell raises exceptions for some errors, but CLI agent failures are reported as events, not exceptions.

from pathlib import Path

# cwd validation - raises ValueError if directory doesn't exist
try:
    response = await shell.execute(cwd="/nonexistent", prompt="hello")
except ValueError as e:
    print(f"Bad directory: {e}")

# Unsupported agent type - raises ValueError at construction
try:
    shell = AgentShell(agent_type=AgentType.GEMINI_CLI)
except ValueError as e:
    print(f"No adapter: {e}")

# KeyboardInterrupt - AgentShell cancels the subprocess cleanly
try:
    response = await shell.execute(cwd=project_path, prompt="long task...")
except KeyboardInterrupt:
    print("Agent cancelled")

CLI agent failures do not raise exceptions. execute() returns an AgentResponse with whatever text was accumulated (which may be empty) and gives no indication of failure. When streaming, failures surface in two ways:

StreamEvent(type="error") — CLI process errors or OpenCode agent errors
StreamEvent(type="result", content="error") — Claude Code agent-level failures

Check for both when streaming:

async for event in shell.stream(cwd=project_path, prompt="do something"):
    if event.type == "error":
        print(f"Agent error: {event.content}")
        break
    elif event.type == "result" and event.content == "error":
        print("Agent reported failure")
        break
    elif event.type == "text":
        print(event.content)

Logging

import logging

logging.getLogger("agent_shell").setLevel(logging.DEBUG)
logging.getLogger("agent_shell").addHandler(logging.StreamHandler())

INFO captures tool calls, session IDs, costs, errors. DEBUG adds raw JSON events.

Quick Reference

Want to...	Do this
Get a complete answer	`await shell.execute(cwd, prompt)`
Stream events live	`async for event in shell.stream(cwd, prompt)`
Continue a conversation	Pass `session_id=response.session_id`
Limit agent capabilities	Pass `allowed_tools=["Read", "Glob"]`
Track costs	Read `response.cost` or `event.cost`
Use a specific model	Pass `model="sonnet"`
Increase reasoning depth	Pass `effort="high"`
See agent thinking	Pass `include_thinking=True`
Cancel a running agent	`KeyboardInterrupt` (handled automatically)

Common Mistakes

Mistake	Fix
Passing a non-existent `cwd`	Validate the path exists before calling
Forgetting `await` on `execute()`	Both `execute()` and the async iterator from `stream()` require async context
Not scoping `allowed_tools` (Claude Code)	Always restrict tools to the minimum needed for the task
Assuming `allowed_tools` works with OpenCode	OpenCode ignores this parameter — use Claude Code for tool-restricted tasks
Expecting `execute()` to include thinking	`include_thinking` only affects `stream()` events; `execute()` drops thinking
Not checking for error events in `stream()`	Agent failures yield `StreamEvent(type="error")`, they don't raise exceptions
Ignoring `session_id` for multi-step work	Without it, each call starts a fresh conversation with no prior context
Using an unsupported `AgentType`	Only `CLAUDE_CODE` and `OPENCODE` have adapters currently

API Reference

For detailed model definitions, event types, and adapter protocol: see api-reference.md.