code-execution - SKILL.md Agent Skill

name: code-execution
description: Writes and runs Python code in a sandbox. Describe the task in plain English; the skill will write and execute the program.

Code Execution

You are a coding agent. When given a task description, write Python code to accomplish it and execute it using the run_code tool.

Translate the task description into working Python code
Use await llm(prompt) when the task requires reasoning about text
Execute the code and return the result
Report any errors clearly and retry with a fix if needed
Variables and definitions persist across run_code calls in the same task — do expensive work (especially await llm(...)) once and reuse the result in later calls rather than re-computing.

Sandbox

Code runs in Monty, a minimal sandboxed Python interpreter. Only these features are available:

Types: int, float, str, bool, list, dict, tuple, set, frozenset, None
Control flow: if/elif/else, for, while, break, continue
Functions: def, lambda, return, async/await (no classes, no match statements)
Built-in modules: sys, typing, asyncio, dataclasses, json, math, re, os (os.environ only)
Built-in functions: print, len, range, enumerate, zip, map, filter, sorted, reversed, min, max, sum, abs, round, isinstance, type, getattr, str, int, float, bool, list, dict, tuple, set, divmod
await llm(prompt: str) -> str — One-shot LLM call. Use this when the task involves understanding, classifying, summarizing, or extracting information from text.

Not available: classes, match statements, context managers, generators, most standard library modules, third-party packages, file/network access.
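Because classes are unavailable, structured data can be kept in plain dicts with ordinary functions operating on them. The following is a minimal sketch that uses only features from the list above. The names make_point and distance and the sample values are hypothetical. It assumes json.dumps and math.sqrt behave in Monty as they do in standard Python, which the list above does not spell out.

```python
import json
import math

# Hypothetical record constructor: a dict stands in for a class instance.
def make_point(name, x, y):
    return {"name": name, "x": x, "y": y}

# A plain function takes the place of a method.
def distance(a, b):
    return math.sqrt((a["x"] - b["x"]) ** 2 + (a["y"] - b["y"]) ** 2)

origin = make_point("origin", 0, 0)
target = make_point("target", 3, 4)

report = {
    "from": origin["name"],
    "to": target["name"],
    "distance": distance(origin, target),
}
print(json.dumps(report))  # {"from": "origin", "to": "target", "distance": 5.0}
```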

Example

# Texts to classify
items = ["The food was great!", "Terrible service.", "Okay experience."]
results = []
for item in items:
    # One llm call per item; the reply is stored exactly as returned
    sentiment = await llm(f"Classify as positive/negative/neutral: {item}")
    results.append({"text": item, "sentiment": sentiment})
print(results)
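Since llm returns free text, a reply may come back as "Positive." or as a full sentence rather than a bare label. One way to handle this is to normalize the reply before storing it. The sketch below is an illustration, not part of the skill: normalize_label, LABELS, and the "unknown" fallback are hypothetical names, and it assumes re.findall is supported by the sandbox's re module.

```python
import re

LABELS = ["positive", "negative", "neutral"]

def normalize_label(reply):
    # Lowercase the reply and split it into alphabetic words.
    words = re.findall(r"[a-z]+", reply.lower())
    # Return the first word that is a known label.
    for word in words:
        if word in LABELS:
            return word
    # No known label found; the caller decides how to handle this.
    return "unknown"

print(normalize_label("Positive."))                          # positive
print(normalize_label("I'd call this one Neutral overall"))  # neutral
print(normalize_label("hard to say"))                        # unknown
```

In the loop above, this would wrap the call as normalize_label(await llm(...)). The first matching word wins, so a reply such as "not positive" would still map to "positive"; asking for a one-word answer in the prompt reduces that risk.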

Splitting across calls

Variables and definitions persist between run_code calls, so expensive work should be done once and reused — not repeated.

# Call 1 — classify once
items = ["The food was great!", "Terrible service.", "Okay experience."]
sentiments = [await llm(f"positive/negative/neutral: {item}") for item in items]
print(sentiments)

# Call 2 — reuse items and sentiments, no re-classification
positives = [item for item, s in zip(items, sentiments) if "positive" in s.lower()]
print(positives)
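The same reuse idea can be applied per prompt by keeping a dict of earlier replies, which persists across calls like any other variable. The following is a sketch under assumptions: cached_ask, llm_cache, and fake_llm are hypothetical names, and fake_llm is a stand-in so the example can be tried outside the sandbox. Whether asyncio.run is available in Monty is not stated in this document; inside the sandbox one would presumably write await cached_ask(llm, prompt) directly.

```python
import asyncio

# Replies keyed by prompt; defined once and reused in later calls.
llm_cache = {}

async def cached_ask(ask, prompt):
    # Call ask(prompt) only the first time this exact prompt is seen.
    if prompt not in llm_cache:
        llm_cache[prompt] = await ask(prompt)
    return llm_cache[prompt]

# --- Local demo only: fake_llm stands in for the sandbox's llm ---
calls = []

async def fake_llm(prompt):
    calls.append(prompt)  # record each real call
    return "positive"

async def demo():
    prompt = "positive/negative/neutral: The food was great!"
    first = await cached_ask(fake_llm, prompt)
    second = await cached_ask(fake_llm, prompt)  # served from llm_cache
    return first, second

first, second = asyncio.run(demo())
print(first, second, len(calls))  # positive positive 1
```

Passing the callable in as ask keeps the helper independent of how llm is provided, at the cost of one extra argument per call.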