pydantic-ai-harness

By pydantic. Updated 6/1/2026

name: pydantic-ai-harness
description: Extend Pydantic AI agents with batteries-included capabilities from pydantic-ai-harness -- currently Code Mode, which collapses many tool calls into one sandboxed Python execution. Use when the user mentions pydantic-ai-harness, CodeMode, Monty, code mode, or tool sandboxing, when they want an agent to run agent-written Python, or when a Pydantic AI agent would benefit from orchestrating multiple tool calls in a single sandboxed script.
license: MIT
compatibility: Requires Python 3.10+ and pydantic-ai-slim>=1.95.1
metadata:
  version: "0.1.0"
  author: pydantic

Building with Pydantic AI Harness

Pydantic AI Harness is the official capability library for Pydantic AI. Capabilities that need model or framework support -- and those fundamental to every agent -- live in core pydantic-ai; optional, batteries-included capabilities live here. Both are composed onto an agent through the same capabilities=[...] API.

This skill covers the capabilities shipped by pydantic-ai-harness. For the core framework -- agents, tools, structured output, hooks, and testing -- use the building-pydantic-ai-agents skill instead.

When to Use This Skill

Invoke this skill when:

  • The user mentions pydantic-ai-harness, CodeMode, code mode, or the Monty sandbox
  • An agent makes many sequential tool calls that could collapse into one sandboxed Python execution
  • The user wants the model to write Python that loops, branches, aggregates, or parallelizes tool calls with asyncio.gather
  • The user asks to sandbox or constrain the code an agent runs

Do not use this skill for:

  • Core Pydantic AI usage -- building agents, adding tools, structured output, streaming, or testing (use building-pydantic-ai-agents)
  • Capabilities that ship in core pydantic-ai, such as web search, tool search, and thinking
  • The Pydantic validation library on its own (pydantic/BaseModel without agents)

Supported Capabilities

| Capability | Description | Reference |
| --- | --- | --- |
| CodeMode | Wraps eligible tools into a single sandboxed run_code tool so the model orchestrates them in Python | Code Mode |

More capability areas are tracked in the capability matrix; as they stabilize, this skill grows to cover them.

Install

uv add pydantic-ai-harness

Each capability declares its own extra. Code Mode needs the Monty sandbox:

uv add "pydantic-ai-harness[codemode]"   # `code-mode` is also accepted as an alias

Requires Python 3.10+ and pydantic-ai-slim>=1.95.1.
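
The version floor can be checked at startup before an agent is built. The following is a minimal sketch that uses only the standard library; `parse_version`, `meets_minimum`, and `check_environment` are hypothetical helper names, and the version parsing is deliberately naive (it stops at the first non-numeric suffix, so pre-release tags are ignored).

```python
from __future__ import annotations

import sys
from importlib import metadata


def parse_version(text: str) -> tuple[int, ...]:
    """Naively turn '1.95.1' into (1, 95, 1), stopping at any non-numeric suffix."""
    parts = []
    for piece in text.split('.'):
        digits = ''
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare two dotted versions numerically rather than as strings."""
    return parse_version(installed) >= parse_version(minimum)


def check_environment() -> list[str]:
    """Return human-readable problems; an empty list means the floor is met."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append('Python 3.10+ is required')
    try:
        installed = metadata.version('pydantic-ai-slim')
    except metadata.PackageNotFoundError:
        problems.append('pydantic-ai-slim is not installed')
    else:
        if not meets_minimum(installed, '1.95.1'):
            problems.append(f'pydantic-ai-slim {installed} is older than 1.95.1')
    return problems
```

Comparing integer tuples avoids the string-comparison trap where '1.100.0' would sort below '1.95.1'.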

Quick Start

A harness capability is added to the agent like any other. Here CodeMode wraps an MCP server's tools into a single run_code tool that the model drives with Python.

from pydantic_ai import Agent
from pydantic_ai.capabilities import MCP  # MCP ships in core pydantic-ai

from pydantic_ai_harness import CodeMode

agent = Agent(
    'anthropic:claude-sonnet-4-6',
    capabilities=[
        # native=False routes the MCP tools through a local toolset so CodeMode can wrap them.
        # Without it, providers with native MCP run the tools server-side and bypass the sandbox.
        MCP('https://hn.caseyjhand.com/mcp', native=False),
        CodeMode(),
    ],
)

result = agent.run_sync(
    'Across the top and best Hacker News feeds, find the most-discussed story with at '
    'least 100 points and summarize its comment thread in one paragraph.'
)
print(result.output)
#> The most-discussed story clearing 100 points is ...

Instead of one model round-trip per tool call, the model writes a single Python script that fetches both feeds with asyncio.gather, dedupes and ranks them in plain Python, and pulls the winning thread -- collapsing many calls into one run_code.
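
To make the shape of such a script concrete, here is a sketch of what the model might write for this prompt. The functions `get_feed` and `get_thread` and the canned data are hypothetical stand-ins; in a real run the wrapped MCP tools would appear under whatever names the server defines, and the script would run inside the sandbox rather than under `asyncio.run`.

```python
import asyncio

# Hypothetical canned data standing in for the MCP server's responses.
_FEEDS = {
    'top': [
        {'id': 1, 'title': 'A', 'points': 250, 'comments': 90},
        {'id': 2, 'title': 'B', 'points': 80, 'comments': 400},
    ],
    'best': [
        {'id': 1, 'title': 'A', 'points': 250, 'comments': 90},
        {'id': 3, 'title': 'C', 'points': 120, 'comments': 310},
    ],
}


async def get_feed(name):
    """Stand-in for a wrapped tool call that returns a feed's stories."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return _FEEDS[name]


async def get_thread(story_id):
    """Stand-in for a wrapped tool call that returns a comment thread."""
    await asyncio.sleep(0)
    return {'story_id': story_id, 'comments': ['first comment', 'second comment']}


async def main():
    # Fetch both feeds concurrently instead of one round-trip each.
    top, best = await asyncio.gather(get_feed('top'), get_feed('best'))

    # Dedupe by story id, then keep stories with at least 100 points.
    by_id = {story['id']: story for story in top + best}
    eligible = [s for s in by_id.values() if s['points'] >= 100]

    # Rank by comment count and pull only the winning thread.
    winner = max(eligible, key=lambda s: s['comments'])
    thread = await get_thread(winner['id'])
    return winner, thread


# asyncio.run is used here only so the sketch runs outside the sandbox.
winner, thread = asyncio.run(main())
```

Three tool calls happen inside one execution, and only the final result needs to go back to the model.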

Key Practices

  • Confirm a harness capability is actually needed. If core Pydantic AI tools and capabilities are enough, use the building-pydantic-ai-agents skill instead -- don't reach for the harness by default.
  • Read the reference before writing code. Each capability has its own configuration, constraints, and gotchas -- load the linked reference (e.g. Code Mode) first.
  • Install the capability's extra. Importing CodeMode without pydantic-ai-harness[codemode] raises an ImportError; the Monty sandbox is an optional dependency.
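
The missing-extra failure can be surfaced with an actionable message instead of a bare traceback. The following is a minimal sketch; `load_code_mode` is a hypothetical helper, and the import path is the one used in the Quick Start.

```python
def load_code_mode():
    """Return the CodeMode class, or raise with an install hint if it is missing."""
    try:
        # Deferred import so the rest of the application loads without the extra.
        from pydantic_ai_harness import CodeMode
    except ImportError as exc:
        raise RuntimeError(
            'CodeMode is unavailable; install it with: '
            'uv add "pydantic-ai-harness[codemode]"'
        ) from exc
    return CodeMode
```

Chaining with `from exc` keeps the original ImportError in the traceback for debugging.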

Common Gotchas

  • native=True tools bypass CodeMode. Provider-native MCP servers and web search execute server-side, so run_code never sees them. Construct them with native=False to keep them local and wrappable.
  • The Monty sandbox is a Python subset. No class definitions, no third-party imports, and only a small stdlib allowlist -- read the Code Mode reference before debugging generated code that fails to run.
  • CodeMode needs its extra. Install pydantic-ai-harness[codemode], not the bare package.
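
When generated code fails to run, a quick static pass can show whether it hit one of the subset restrictions listed above. The sketch below uses the standard `ast` module; `find_subset_violations` is a hypothetical helper, and `ASSUMED_ALLOWED_IMPORTS` is a placeholder set chosen for illustration, not Monty's actual allowlist (consult the Code Mode reference for that).

```python
from __future__ import annotations

import ast

# Placeholder allowlist for illustration only; NOT Monty's actual list.
ASSUMED_ALLOWED_IMPORTS = {'asyncio', 'json', 'math'}


def find_subset_violations(source: str) -> list[str]:
    """Flag class definitions and imports outside the assumed allowlist."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            found.append((node.lineno, f'class definition {node.name!r}'))
        elif isinstance(node, ast.Import):
            for alias in node.names:
                root = alias.name.split('.')[0]
                if root not in ASSUMED_ALLOWED_IMPORTS:
                    found.append((node.lineno, f'import of {root!r}'))
        elif isinstance(node, ast.ImportFrom):
            root = (node.module or '').split('.')[0]
            if root not in ASSUMED_ALLOWED_IMPORTS:
                found.append((node.lineno, f'import of {root!r}'))
    # Report in source order so messages line up with the script.
    return [f'line {lineno}: {message}' for lineno, message in sorted(found)]
```

This only checks the two restrictions named here; a clean result does not guarantee the sandbox will accept the script.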