name: "toolself-unifying-task-execution"
description: "Implement self-reconfiguring agent workflows where configuration (sub-goals, strategy, toolbox, context) is a mutable tool the agent calls at runtime. Use when: 'build an adaptive agent', 'self-reconfiguring pipeline', 'agent that adjusts its own strategy', 'dynamic tool selection workflow', 'multi-stage task with runtime adaptation', 'ToolSelf pattern'."
ToolSelf: Self-Reconfiguring Agent Workflows
This skill teaches Claude to build agent systems that treat their own configuration as a callable tool, following the ToolSelf paradigm (arXiv:2602.07883). Instead of hard-coding agent behavior upfront, you expose four mutable parameters -- sub-goals, execution strategy, active toolbox, and accumulated context -- as a reconfigure() action the agent invokes between execution stages. This unifies task execution and self-adjustment into a single action loop, letting agents autonomously pivot strategy, prune tools, and compress context as tasks evolve.
When to Use
- When building a multi-stage agent pipeline where later stages depend on discoveries from earlier ones (e.g., research-then-code, crawl-then-analyze)
- When the user asks for an agent that "adapts its approach" or "adjusts strategy mid-task" rather than following a rigid plan
- When implementing error recovery where the agent should autonomously try a different approach after failure
- When designing a workflow with a large tool catalog where only a subset is relevant at each stage
- When context windows risk pollution from earlier stages and you need stage-local context with a compressed global history
- When building orchestration code that dispatches sub-agents and needs to dynamically reassign goals
Key Technique
Configuration-as-Tool. Traditional agent loops fix their system prompt, tool list, and goals before execution begins. ToolSelf replaces this static config with a dynamic 4-tuple C = (q, sigma, T, K): the current sub-goal q, the execution strategy/persona sigma, the active toolbox T (a subset of all available tools), and task-relevant knowledge K. At the end of each execution stage, the agent can call a reconfigure(proposed_goal, rationale, requirements) tool that archives the current stage's summary into a global history and generates a fresh configuration for the next stage. This means the agent never accumulates raw conversation history across stages -- it works with a compressed summary plus fresh, stage-specific context.
Why it works. The reconfiguration boundary serves three purposes: (1) it forces the agent to articulate what it learned and what it needs next (the rationale), which improves planning quality; (2) it resets stage-local context to prevent token bloat -- the paper shows max input tokens grow only ~8% despite 4x more execution steps; (3) it enables toolbox pruning so the agent only sees tools relevant to its current sub-goal, reducing decision complexity. The pattern is implementable without any model fine-tuning -- the training stages (CAT) described in the paper improve internalization, but the architectural pattern alone yields significant gains.
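Toolbox pruning, the third purpose above, can be sketched in a few lines. This is an illustrative sketch rather than the paper's implementation: `ToolSpec`, `CATALOG`, `META_TOOLS`, and `prune_toolbox` are hypothetical names, and selecting tools by category tag is only one possible way to choose the subset T.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolSpec:
    name: str
    category: str  # e.g. "retrieval", "compute", "output"


# Hypothetical catalog; names and categories are illustrative only.
CATALOG = [
    ToolSpec("web_search", "retrieval"),
    ToolSpec("web_fetch", "retrieval"),
    ToolSpec("code_exec", "compute"),
    ToolSpec("file_write", "output"),
]

# The two meta-tools stay visible in every stage.
META_TOOLS = ("reconfigure", "terminate")


def prune_toolbox(catalog, wanted_categories):
    """Return the tool names visible in one stage: matching tools plus meta-tools."""
    wanted = set(wanted_categories)
    active = [tool.name for tool in catalog if tool.category in wanted]
    return active + list(META_TOOLS)


research_tools = prune_toolbox(CATALOG, ["retrieval"])
coding_tools = prune_toolbox(CATALOG, ["compute", "output"])
```

In this sketch a research stage never sees `code_exec`, and a coding stage never sees the web tools, which is the reduction in decision complexity the paragraph describes.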
Step-by-Step Workflow
1. Define the global tool catalog. List every tool/function the agent could need across all stages. Assign each a short name and a category tag (e.g., web_search:retrieval, code_exec:compute, file_write:output).
2. Design the configuration schema. Create a data structure with four fields:

       @dataclass
       class AgentConfig:
           sub_goal: str            # What the agent is trying to accomplish this stage
           strategy: str            # How it should approach the sub-goal (persona/method)
           active_tools: list[str]  # Subset of tool names available this stage
           knowledge: str           # Compressed relevant context from prior stages

3. Implement the reconfigure tool. Write a function the agent can call that accepts three arguments: proposed_goal (what to do next), rationale (why the current stage is complete and what was learned), and requirements (what strategy/tools/knowledge the next stage needs). This function archives the current stage summary into a global history list and returns a new AgentConfig.
4. Implement the terminate tool. Write a companion function the agent calls when the overall task is complete, which collects the final answer from the last stage.
5. Build the stage execution loop. Each iteration: inject the current AgentConfig into the system prompt, restrict available tools to active_tools plus reconfigure and terminate, run the agent until it calls one of those two meta-tools, then either loop (reconfigure) or return (terminate).
6. Seed the initial configuration. Parse the user's request into an initial sub-goal, pick a default strategy (e.g., "analytical reasoning"), select a starting tool subset, and set knowledge to the raw user query.
7. Implement global history compression. When the agent calls reconfigure, summarize the current stage's trajectory into 2-3 sentences and append it to a running history list. Pass only this compressed history (not the raw conversation) into the next stage's knowledge field.
8. Add guardrails. Set a maximum stage count (e.g., 8) to prevent infinite reconfiguration loops. Require the rationale field to be non-empty. Log each reconfiguration event for debugging.
9. Wire up error-triggered reconfiguration. When a tool call fails or returns unexpected results within a stage, the agent should call reconfigure with a rationale explaining the failure and a revised strategy, rather than retrying blindly.
10. Test with a multi-hop task. Run a task requiring at least 3 distinct phases (e.g., "find data, analyze it, generate a report") and verify that the agent produces distinct configurations per stage with appropriate tool subsets.
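The reconfigure tool and its guardrails from the workflow above might look roughly like the following. This is a minimal sketch under stated assumptions: in a real system the next configuration would be produced by a model call, which is replaced here by a caller-supplied `make_config` function. `ReconfigureRejected`, `make_config`, and `stub_make_config` are hypothetical names. The agent itself would pass only the first three arguments; the runtime would bind the rest.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AgentConfig:
    sub_goal: str
    strategy: str
    active_tools: list[str]
    knowledge: str


class ReconfigureRejected(ValueError):
    """Raised when a reconfigure call fails a guardrail."""


def reconfigure(
    proposed_goal: str,
    rationale: str,
    requirements: str,
    current: AgentConfig,
    global_history: list[str],
    make_config: Callable[[str, str, list[str]], AgentConfig],
    max_stages: int = 8,
) -> AgentConfig:
    # Guardrail: the rationale must be non-empty.
    if not rationale.strip():
        raise ReconfigureRejected("rationale must not be empty")
    # Guardrail: len(global_history) archived stages plus the current one
    # must leave room for one more stage.
    if len(global_history) + 1 >= max_stages:
        raise ReconfigureRejected("maximum stage count reached")
    # Archive a short summary of the stage that is ending.
    stage_no = len(global_history) + 1
    global_history.append(f"Stage {stage_no} ({current.sub_goal}): {rationale.strip()}")
    # Assumption: make_config stands in for the model-generated configuration.
    return make_config(proposed_goal, requirements, global_history)


def stub_make_config(goal: str, requirements: str, history: list[str]) -> AgentConfig:
    """Hypothetical stand-in that builds a config without calling a model."""
    return AgentConfig(goal, requirements, ["code_exec"], " ".join(history))


history: list[str] = []
stage_one = AgentConfig("collect data", "web research", ["web_search"], "")
stage_two = reconfigure(
    "plot data", "Collected five figures.", "code tools",
    stage_one, history, stub_make_config,
)
```

Raising an exception on a rejected call is one design choice; returning an error message to the agent as the tool result would work equally well and keeps the agent in the current stage.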
Concrete Examples
Example 1: Adaptive Research-to-Code Pipeline
User: "Find the current population of the 5 largest EU countries,
       then write a Python script that creates a bar chart of the results."

Stage 1 Config:
  sub_goal: "Retrieve current population figures for the 5 largest EU countries"
  strategy: "web research with cross-reference verification"
  active_tools: [web_search, web_fetch]
  knowledge: "User wants populations of top 5 EU countries by population"

Agent executes: searches for populations, collects data for Germany,
  France, Italy, Spain, Poland.

Agent calls reconfigure():
  proposed_goal: "Generate a Python bar chart script using the collected data"
  rationale: "Retrieved verified populations: DE=84.5M, FR=68.2M,
              IT=58.9M, ES=48.6M, PL=36.8M. Research phase complete."
  requirements: "Need code generation tools. No further web access needed.
                 Carry forward the population data."

Stage 2 Config:
  sub_goal: "Generate a Python bar chart script with the population data"
  strategy: "code generation with matplotlib best practices"
  active_tools: [code_write, code_exec]
  knowledge: "Populations: DE=84.5M, FR=68.2M, IT=58.9M, ES=48.6M, PL=36.8M"

Agent executes: writes and tests the matplotlib script.
Agent calls terminate() with the final script.
Example 2: Error Recovery via Self-Reconfiguration
User: "Calculate the driving distance between the two most
       recently founded state capitals in the US."

Stage 1 Config:
  sub_goal: "Identify the two most recently founded US state capitals"
  strategy: "systematic web lookup"
  active_tools: [web_search, web_fetch]
  knowledge: "Need founding dates of all 50 US state capitals"

Agent executes: searches for "most recently founded state capitals."
  Initial results are ambiguous -- conflicting dates for several cities.

Agent calls reconfigure():
  proposed_goal: "Verify founding dates by checking individual city pages"
  rationale: "Bulk search returned conflicting data. Need per-city
              verification. Narrowed candidates to: Juneau (1881),
              Oklahoma City (1889), Phoenix (1881), Carson City (1858)."
  requirements: "Keep web tools. Strategy should shift to targeted
                 per-city lookups rather than aggregate searches."

Stage 2 Config:
  sub_goal: "Verify founding dates of candidate capitals"
  strategy: "targeted per-city Wikipedia lookups with date extraction"
  active_tools: [web_search, web_fetch]
  knowledge: "Candidates: Juneau, Oklahoma City, Phoenix, Carson City.
              Need verified founding/incorporation dates."

Agent executes: finds that Oklahoma City (founded 1889) became capital
  in 1910 and that Juneau became capital in 1906.

Agent calls reconfigure():
  proposed_goal: "Calculate driving distance between Oklahoma City and Juneau"
  rationale: "Confirmed: Juneau became capital 1906, OKC became capital 1910.
              These are the two most recent. Now need distance calculation."
  requirements: "Need a distance/maps lookup. Web search is still useful
                 for route data; add code execution."

Stage 3 Config:
  sub_goal: "Get driving distance between Oklahoma City, OK and Juneau, AK"
  strategy: "Use mapping API or known route data"
  active_tools: [web_search, code_exec]
  knowledge: "Two most recent state capitals: Oklahoma City (1910), Juneau (1906)"

Agent executes and calls terminate() with the answer.
Example 3: Implementation in Code
# Minimal ToolSelf loop implementation.
# The helpers (seed_initial_config, build_stage_prompt, execute_stage,
# compress_stage, generate_next_config) and the reconfigure_tool /
# terminate_tool objects are assumed to be defined elsewhere.
from dataclasses import dataclass


@dataclass
class AgentConfig:
    sub_goal: str
    strategy: str
    active_tools: list[str]
    knowledge: str


def run_toolself_agent(user_query: str, all_tools: dict, max_stages: int = 8):
    global_history = []
    config = seed_initial_config(user_query, all_tools)
    result = None  # stays None only if max_stages is 0

    for stage in range(max_stages):
        # Build stage-local prompt with current config
        prompt = build_stage_prompt(config, global_history)

        # Expose only this stage's tools (skipping unknown names) plus the meta-tools
        available = {k: all_tools[k] for k in config.active_tools if k in all_tools}
        available["reconfigure"] = reconfigure_tool
        available["terminate"] = terminate_tool

        # Run agent within this stage until it calls a meta-tool
        result = execute_stage(prompt, available)

        if result.tool_called == "terminate":
            return result.final_answer

        if result.tool_called == "reconfigure":
            # Archive and compress current stage
            summary = compress_stage(stage, config, result.trajectory)
            global_history.append(summary)

            # Generate next config from reconfiguration request
            config = generate_next_config(
                result.proposed_goal,
                result.rationale,
                result.requirements,
                all_tools,
                global_history,
            )

    return "Max stages reached. Last result: " + str(result)
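The loop above leaves its helpers undefined. As a rough illustration, two of them could be sketched as follows. Both bodies are assumptions, not the paper's method: the prompt layout is invented for this example, the trajectory is assumed to be a list of strings, and `compress_stage` uses simple truncation where a real system would more likely ask a model for a 2-3 sentence summary.

```python
from dataclasses import dataclass


@dataclass
class AgentConfig:
    sub_goal: str
    strategy: str
    active_tools: list[str]
    knowledge: str


def build_stage_prompt(config: AgentConfig, global_history: list[str]) -> str:
    """Render the current configuration and compressed history as a prompt."""
    history = "\n".join(f"- {entry}" for entry in global_history) or "- (none yet)"
    return (
        f"Sub-goal: {config.sub_goal}\n"
        f"Strategy: {config.strategy}\n"
        f"Tools: {', '.join(config.active_tools)}, reconfigure, terminate\n"
        f"Knowledge: {config.knowledge}\n"
        f"Completed stages:\n{history}"
    )


def compress_stage(
    stage: int, config: AgentConfig, trajectory: list[str], max_chars: int = 200
) -> str:
    """Placeholder compression: keep only the last trajectory entry, truncated."""
    outcome = trajectory[-1] if trajectory else "no recorded steps"
    if len(outcome) > max_chars:
        outcome = outcome[: max_chars - 3] + "..."
    # stage is the zero-based loop index, so report it one-based.
    return f"Stage {stage + 1}: {config.sub_goal} -> {outcome}"


cfg = AgentConfig(
    "Retrieve populations", "web research", ["web_search", "web_fetch"], "top 5 EU countries"
)
prompt = build_stage_prompt(cfg, [])
summary = compress_stage(0, cfg, ["searched", "found five figures"])
```

Note that the raw trajectory never reaches the next prompt; only the one-line summary does, which is what keeps stage-local context small.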
Best Practices
Do: Make the rationale field mandatory and substantive. The quality of self-reconfiguration depends entirely on the agent articulating what it learned and why it needs to change. A rationale like "done with stage 1" is useless; "Found that the API returns paginated results, need to switch to batch fetching" drives good next-stage configuration.
Do: Aggressively prune the toolbox per stage. If a stage is about code generation, remove web search tools. Fewer tools means fewer irrelevant options for the agent to consider and fewer hallucinated tool calls.
Do: Keep global history entries short (2-3 sentences each). The entire point of stage boundaries is context compression. If you pass full conversation logs, you lose the benefit.
Do: Log every reconfiguration event (stage number, old config, rationale, new config) for debugging. Reconfiguration traces are the primary diagnostic artifact.
Avoid: Letting the agent reconfigure without a meaningful state change. If the proposed next goal is identical to the current one, reject the reconfiguration and force continued execution or a genuinely different approach.
Avoid: Setting max stages too high. Most tasks resolve in 3-5 stages. More than 8 suggests the task decomposition is too granular or the agent is thrashing.
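Two of these practices, rejecting no-op reconfigurations and logging every event, lend themselves to a short sketch. The names `is_noop` and `log_reconfiguration` are hypothetical, and the comparison rule is one possible choice: a reconfiguration counts as a no-op here only when the goal, strategy, and toolbox are all unchanged, so keeping the same goal with a genuinely different strategy is still allowed.

```python
import json
import logging
from dataclasses import dataclass, asdict

logger = logging.getLogger("toolself")


@dataclass
class AgentConfig:
    sub_goal: str
    strategy: str
    active_tools: list[str]
    knowledge: str


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial rewording does not count."""
    return " ".join(text.lower().split())


def is_noop(old: AgentConfig, new: AgentConfig) -> bool:
    """True when goal, strategy, and toolbox are all effectively unchanged."""
    return (
        _normalize(old.sub_goal) == _normalize(new.sub_goal)
        and _normalize(old.strategy) == _normalize(new.strategy)
        and sorted(old.active_tools) == sorted(new.active_tools)
    )


def log_reconfiguration(stage: int, old: AgentConfig, rationale: str, new: AgentConfig) -> str:
    """Emit one JSON record per reconfiguration and return it for inspection."""
    record = json.dumps(
        {
            "stage": stage,
            "old_config": asdict(old),
            "rationale": rationale,
            "new_config": asdict(new),
        },
        sort_keys=True,
    )
    logger.info(record)
    return record


bulk = AgentConfig("Verify dates", "bulk search", ["web_search"], "")
reworded = AgentConfig("verify   dates", "Bulk Search", ["web_search"], "new notes")
targeted = AgentConfig("Verify dates", "per-city lookups", ["web_search"], "")
```

A caller would check `is_noop(current, proposed)` before accepting a reconfiguration and call `log_reconfiguration` once it is accepted.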
Error Handling
| Problem | Response |
|---|---|
| Agent loops between two configurations | Detect repeated sub-goals in global history; force a strategy change or escalate to the user |
| Reconfigure called with empty rationale | Reject the call and instruct the agent to explain what it learned before reconfiguring |
| Tool call fails within a stage | Agent should call reconfigure with failure details rather than retrying the same approach indefinitely |
| Max stages exhausted | Return the best partial result from the last completed stage with an explanation of where progress stalled |
| Knowledge field exceeds token budget | Truncate oldest history entries first; keep the most recent 2-3 stage summaries intact |
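The first and last rows of the table can be sketched as small helper functions. These are illustrative assumptions rather than a prescribed implementation: `is_thrashing` and `fit_history` are hypothetical names, sub-goals are assumed to be tracked in their own list alongside the summaries, and whitespace-separated word count stands in for a real tokenizer.

```python
from collections import Counter
from typing import Callable


def is_thrashing(sub_goal_history: list[str], max_visits: int = 2) -> bool:
    """True if any sub-goal has been entered more than max_visits times."""
    counts = Counter(" ".join(goal.lower().split()) for goal in sub_goal_history)
    return any(n > max_visits for n in counts.values())


def fit_history(
    entries: list[str],
    budget: int,
    keep_recent: int = 3,
    count_tokens: Callable[[str], int] = lambda s: len(s.split()),
) -> list[str]:
    """Drop the oldest entries until within budget, never touching the newest keep_recent."""
    kept = list(entries)  # copy so the caller's list is not mutated
    while len(kept) > keep_recent and sum(count_tokens(e) for e in kept) > budget:
        kept.pop(0)
    return kept


goals = ["find data", "verify data", "find data", "verify data", "Find  data"]
summaries = ["one two three", "four five", "six", "seven eight"]
trimmed = fit_history(summaries, budget=4, keep_recent=2)
```

If the newest `keep_recent` entries alone exceed the budget, this sketch returns them anyway, so the caller would still need a fallback such as re-summarizing those entries.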
Limitations
- Not for single-shot tasks. If a task can be completed in one tool call or one reasoning step, the reconfiguration overhead adds latency for no benefit.
- Requires a meaningful tool catalog. If the agent only has 2-3 tools, toolbox pruning provides negligible value. The pattern shines with 10+ tools.
- Configuration generation quality. Without the CAT fine-tuning described in the paper, the quality of auto-generated configurations depends on the base model's instruction-following ability. Explicit schema validation helps.
- No backtracking. The linear stage progression doesn't support reverting to a prior stage's configuration. If the agent reconfigures incorrectly, its only recovery is to reconfigure again in a later stage, which can waste stages.
- Debugging complexity. Multi-stage traces are harder to debug than single-pass agent runs. Invest in logging from the start.
Reference
ToolSelf: Unifying Task Execution and Self-Reconfiguration via Tool-Driven Intrinsic Adaptation -- Zhou et al., 2026. Key sections: Section 3 for the configuration-as-tool formalism (C = (q, sigma, T, K) and the reconfigure/terminate tool definitions), Section 4 for the CAT training pipeline, and Section 5.3 for the case studies showing multi-stage reconfiguration traces on GAIA benchmarks.