---
name: "explore-vs-exploit-state-machine"
description: "Use this skill when the agent must decide whether to continue gathering information or commit to action."
---
Skill: Explore vs Exploit — State Machine Protocol for AI Agents
Purpose
Use this skill when the agent must decide whether to continue gathering information or commit to action.
This skill is a control system for:
- research tasks
- debugging
- decision support
- planning
- recommendation generation
- documentation review
- ambiguous problem solving
The purpose is to prevent both:
- premature action on weak evidence
- endless searching with no stopping rule
Core Law
The agent must not keep searching by default, and must not act by default.
It must deliberately choose:
- Explore: gather more information
- Exploit: commit to the best current option
That choice must be justified by the expected value of more evidence, the risk of acting wrongly, and diminishing returns.
Mandatory Diagnostic Artifact
Before or during active search, the agent must create search-budget.md.
Required fields:
# Search Budget
## Task
<what decision or action this search supports>
## Current Best Hypotheses / Options
- <option A>
- <option B>
- <option C>
## What Is Unknown
- <unknown 1>
- <unknown 2>
## Cost of Acting Too Early
<low/medium/high>
## Cost of Continued Search
<low/medium/high>
## Expected Information Gain
<low/medium/high>
## Search Budget
<max passes / max queries / max inspections>
## Sufficiency Criteria
<what evidence is enough to act>
## Disconfirming Evidence to Look For
<what would break the current best hypothesis>
## Stop Condition
<when exploration must end>
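If the agent also keeps the budget in memory, a small record type can render the file. The sketch below is a minimal Python illustration. The class name `SearchBudget`, its field names, and the `to_markdown` method are hypothetical. Collapsing the three budget limits into one free-text field is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Literal

Level = Literal["low", "medium", "high"]


@dataclass
class SearchBudget:
    """Hypothetical in-memory form of search-budget.md (names are illustrative)."""
    task: str
    options: list[str]
    unknowns: list[str]
    cost_of_acting_too_early: Level
    cost_of_continued_search: Level
    expected_information_gain: Level
    search_budget: str  # e.g. "3 passes / 10 queries / 5 inspections"
    sufficiency_criteria: str
    disconfirming_evidence: str
    stop_condition: str

    def to_markdown(self) -> str:
        """Render the fields in the same section order as the template."""
        def bullets(items: list[str]) -> str:
            return "\n".join(f"- {item}" for item in items)

        sections = [
            ("Task", self.task),
            ("Current Best Hypotheses / Options", bullets(self.options)),
            ("What Is Unknown", bullets(self.unknowns)),
            ("Cost of Acting Too Early", self.cost_of_acting_too_early),
            ("Cost of Continued Search", self.cost_of_continued_search),
            ("Expected Information Gain", self.expected_information_gain),
            ("Search Budget", self.search_budget),
            ("Sufficiency Criteria", self.sufficiency_criteria),
            ("Disconfirming Evidence to Look For", self.disconfirming_evidence),
            ("Stop Condition", self.stop_condition),
        ]
        body = "\n\n".join(f"## {title}\n{text}" for title, text in sections)
        return f"# Search Budget\n\n{body}\n"
```

To persist the file, the rendered text can be written with the standard library, for example `pathlib.Path("search-budget.md").write_text(budget.to_markdown())`.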
State Machine
State 0 — Intake
Goal:
- define what decision exploration is supposed to improve
Questions:
- What action is waiting on this information?
- What are the current leading options or explanations?
- Is the agent trying to reduce uncertainty or avoid commitment?
Allowed actions:
- frame the decision
- list the main options or hypotheses
Disallowed actions:
- searching without a decision target
- acting without naming the alternatives
Exit condition:
- exploration target is clear
- top options/hypotheses are named
State 1 — Uncertainty Assessment
Goal:
- decide whether exploration is justified
Assess:
- cost of wrong action
- cost of delay
- expected information gain
- whether more search is likely to change the choice
- whether current evidence is already dominant
Rule: If additional search is unlikely to change the decision, exploration should stop.
Allowed actions:
- uncertainty analysis
- setting risk tolerance
- defining sufficiency criteria
Disallowed actions:
- searching just because uncertainty feels uncomfortable
- pretending more search is always better
Exit condition:
- search-budget.md created
- search budget and stop condition defined
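The stop rule for this state can be made concrete with a decision-relevance test: exploration is only worth it if plausible new evidence could reorder the leading options. The Python sketch below makes two assumptions. First, each option carries a numeric score, such as a subjective probability. Second, one more round of evidence moves any score by at most `max_shift`. The scoring scheme and both function names are hypothetical, and the cost comparison is one simple heuristic among many.

```python
from typing import Literal

Level = Literal["low", "medium", "high"]
RANK = {"low": 0, "medium": 1, "high": 2}


def search_could_change_decision(scores: dict[str, float], max_shift: float) -> bool:
    """True if plausible new evidence could reorder the top two options.

    Assumption: one more round of evidence moves any score by at most
    max_shift, so the leader can fall and the runner-up rise by that much.
    """
    if len(scores) < 2:
        return False  # with fewer than two options there is nothing to decide
    first, second = sorted(scores.values(), reverse=True)[:2]
    return first - second <= 2 * max_shift


def exploration_justified(scores: dict[str, float], max_shift: float,
                          cost_of_wrong_action: Level, cost_of_delay: Level,
                          expected_gain: Level) -> bool:
    """Heuristic go/no-go for Targeted Exploration; thresholds are illustrative."""
    if not search_could_change_decision(scores, max_shift):
        return False  # search that cannot change the choice should stop
    if expected_gain == "low":
        return False  # little to learn, even though the choice is close
    # Keep exploring only if a wrong action costs at least as much as waiting.
    return RANK[cost_of_wrong_action] >= RANK[cost_of_delay]
```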
State 2 — Targeted Exploration
Goal:
- gather only the information most likely to change the decision
Rules:
- search intentionally
- prefer highest-yield evidence first
- compare competing hypotheses
- look for disconfirming evidence
- stay within the budget
Allowed actions:
- focused inspection
- targeted searches
- comparison of top explanations/options
- evidence updates to the artifact
Disallowed actions:
- random wandering
- repetitive near-duplicate searches
- scope expansion without reason
- searching to postpone action
Exit condition:
- budget exhausted
- or sufficiency criteria met
- or dominant hypothesis/option emerges early
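Here is a minimal Python sketch of budgeted exploration that tries the highest-yield evidence first. It assumes the agent can attach a rough expected gain and a positive cost to each candidate probe. `Probe`, `explore`, and the case/whitespace duplicate filter are hypothetical illustrations, not a prescribed algorithm.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Probe:
    """A hypothetical candidate search or inspection."""
    query: str
    expected_gain: float  # rough estimate of how likely it is to change the ranking
    cost: float           # must be positive; units (minutes, tool calls) are up to the agent


def explore(probes: list[Probe], max_probes: int,
            run: Callable[[Probe], str],
            sufficient: Callable[[list[str]], bool]) -> tuple[list[str], str]:
    """Run probes in descending gain-per-cost order until a stop rule fires."""
    findings: list[str] = []
    seen: set[str] = set()
    ordered = sorted(probes, key=lambda p: p.expected_gain / p.cost, reverse=True)
    for probe in ordered:
        # Crude near-duplicate filter: ignore case and extra whitespace.
        key = " ".join(probe.query.lower().split())
        if key in seen:
            continue
        seen.add(key)
        findings.append(run(probe))
        if sufficient(findings):
            return findings, "sufficiency criteria met"
        if len(findings) >= max_probes:
            return findings, "budget exhausted"
    return findings, "candidate probes exhausted"
```

Sorting by gain per unit cost is one way to read "highest-yield evidence first". A cheap probe with moderate gain can outrank an expensive one with slightly higher gain.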
State 3 — Decision Checkpoint
Goal:
- explicitly decide whether to continue exploring or switch to exploit
Questions:
- Did new evidence materially change the ranking?
- Is one option now clearly dominant?
- Has information gain flattened?
- Is the agent still learning, or just circling?
Decision:
- Continue Explore
- Switch to Exploit
- Narrow scope / escalate
Allowed actions:
- ranking update
- confidence estimate
- search termination
Disallowed actions:
- silent continuation of exploration
- switching to exploit without acknowledging uncertainty
Exit condition:
- explicit go/no-go decision on further exploration
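The checkpoint questions can be reduced to an explicit return value, so exploration never continues silently. This Python sketch is one possible reading. `checkpoint`, the 0.2 dominance margin, and the flatness threshold are hypothetical tuning choices. `gain_history` is assumed to hold one information-gain estimate per exploration pass, and `new_scores` is assumed to be non-empty.

```python
def checkpoint(prev_scores: dict[str, float], new_scores: dict[str, float],
               gain_history: list[float], budget_left: bool = True,
               dominance_margin: float = 0.2, flat_threshold: float = 0.05,
               flat_window: int = 2) -> str:
    """Return an explicit decision: 'exploit', 'explore', or 'narrow/escalate'."""
    ranking = sorted(new_scores, key=new_scores.get, reverse=True)
    previous_ranking = sorted(prev_scores, key=prev_scores.get, reverse=True)
    runner_up = new_scores[ranking[1]] if len(ranking) > 1 else 0.0
    dominant = new_scores[ranking[0]] - runner_up >= dominance_margin
    recent = gain_history[-flat_window:]
    flattened = len(recent) == flat_window and all(g < flat_threshold for g in recent)
    if dominant:
        return "exploit"  # one option is clearly ahead
    if (flattened and ranking == previous_ranking) or not budget_left:
        return "narrow/escalate"  # circling or out of budget, and nothing dominates
    return "explore"  # still learning; continue within budget
```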
State 4 — Exploitation
Goal:
- act on the best current option
Rules:
- commit cleanly
- state major assumptions if relevant
- keep action reversible when possible
- avoid reopening full search unless new evidence appears
Allowed actions:
- recommendation
- execution
- next-step commitment
Disallowed actions:
- pretending certainty exceeds evidence
- restarting search because commitment feels uncomfortable
Exit condition:
- action taken or recommendation delivered
State 5 — Stop / Escalate
Goal:
- end search cleanly
Escalate or narrow scope if:
- the evidence remains too weak after budgeted search
- the cost of wrong action is high but sufficiency criteria were not met
- blast radius is unknown
- the decision is still dominated by missing external facts
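The states and their permitted exits can be written as a transition table. An illegal jump, such as Intake straight to Exploitation, is then rejected instead of happening silently. In this Python sketch the enum names and the `advance` helper are hypothetical. Two transitions are interpretations rather than rules stated above. First, Uncertainty Assessment may skip straight to the Decision Checkpoint when more search is unlikely to change the decision. Second, Exploitation may return to Uncertainty Assessment only when new evidence reopens the question.

```python
from enum import Enum


class State(Enum):
    INTAKE = 0
    UNCERTAINTY_ASSESSMENT = 1
    TARGETED_EXPLORATION = 2
    DECISION_CHECKPOINT = 3
    EXPLOITATION = 4
    STOP_ESCALATE = 5


# Permitted exits from each state; anything else is rejected.
TRANSITIONS: dict[State, frozenset[State]] = {
    State.INTAKE: frozenset({State.UNCERTAINTY_ASSESSMENT}),
    # Interpretation: skip search when it is unlikely to change the decision.
    State.UNCERTAINTY_ASSESSMENT: frozenset(
        {State.TARGETED_EXPLORATION, State.DECISION_CHECKPOINT}
    ),
    State.TARGETED_EXPLORATION: frozenset({State.DECISION_CHECKPOINT}),
    State.DECISION_CHECKPOINT: frozenset(
        {State.TARGETED_EXPLORATION, State.EXPLOITATION, State.STOP_ESCALATE}
    ),
    # Interpretation: reopen only via reassessment, and only on new evidence.
    State.EXPLOITATION: frozenset({State.UNCERTAINTY_ASSESSMENT}),
    State.STOP_ESCALATE: frozenset(),  # terminal
}


def advance(current: State, target: State) -> State:
    """Return target if the move is permitted, otherwise raise ValueError."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.name} -> {target.name}")
    return target
```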
Tool Gating Guidance
During Explore
Tools/search/retrieval are allowed and expected.
During Exploit
Further search should be blocked unless:
- new contradictory evidence appears
- the decision target changes
- a high-risk assumption fails
Rule: Exploration tools should serve the decision, not replace it.
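Gating can be enforced in code by checking the mode before any retrieval call. In this Python sketch, `Mode`, `search_allowed`, and the trigger strings are hypothetical labels for the three reopen conditions listed above. Discomfort with commitment is deliberately not a trigger.

```python
from enum import Enum


class Mode(Enum):
    EXPLORE = "explore"
    EXPLOIT = "exploit"


# Hypothetical labels for the three conditions that may reopen search.
REOPEN_TRIGGERS = frozenset(
    {"new_contradictory_evidence", "decision_target_changed", "high_risk_assumption_failed"}
)


def search_allowed(mode: Mode, events: set[str]) -> bool:
    """Allow search freely while exploring; during exploitation, only on a trigger."""
    if mode is Mode.EXPLORE:
        return True
    return not REOPEN_TRIGGERS.isdisjoint(events)
```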
Unknowns and Blast Radius Rule
Before exploiting on a shared or consequential decision, the agent must state:
## Unknowns Still Open
- <item>
## Blast Radius Confidence
<high / medium / low>
## Is Action Still Justified?
<yes / no, with reason>
If blast radius confidence is low and the action is high-risk, the agent should not exploit without first narrowing scope, making the action reversible, or escalating (see State 5).
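The pre-exploit statement can be produced from a small record so it is never skipped. In this Python sketch, `PreActionCheck`, its fields, and the `reversible` escape hatch are hypothetical. Accepting a reversible step under low confidence is one reading of the guidance to keep actions reversible when possible, not a rule stated above.

```python
from dataclasses import dataclass
from typing import Literal

Level = Literal["low", "medium", "high"]


@dataclass
class PreActionCheck:
    """Hypothetical record behind the pre-exploit statement."""
    unknowns_still_open: list[str]
    blast_radius_confidence: Level
    action_risk: Level
    reversible: bool

    def action_justified(self) -> tuple[bool, str]:
        """Apply the low-confidence / high-risk rule; reversibility is an assumed escape hatch."""
        if self.blast_radius_confidence == "low" and self.action_risk == "high":
            if self.reversible:
                return True, "only as a reversible step, with open unknowns stated"
            return False, "blast radius uncertain on a high-risk, irreversible action; narrow scope or escalate"
        return True, f"{len(self.unknowns_still_open)} open unknown(s) acknowledged"

    def to_markdown(self) -> str:
        """Render the three required sections."""
        justified, reason = self.action_justified()
        lines = ["## Unknowns Still Open"]
        lines += [f"- {item}" for item in (self.unknowns_still_open or ["none"])]
        lines += [
            "",
            "## Blast Radius Confidence",
            self.blast_radius_confidence,
            "",
            "## Is Action Still Justified?",
            f"{'yes' if justified else 'no'}, {reason}",
        ]
        return "\n".join(lines)
```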
Circuit Breakers
Stop and reassess if:
- the same search pattern repeats without new information
- new evidence expands the problem scope
- the task becomes high-risk midstream
- the best option remains weak after budgeted exploration
- the search target is no longer well-defined
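The first circuit breaker is the hardest to notice from the inside. A tracker that compares each search with what has already been seen can help catch it. In this Python sketch, `CircuitBreaker`, `repeat_limit`, and `other_breakers` are hypothetical choices, as is treating word order as irrelevant when deciding that a pattern repeats. The remaining breakers are passed in as flags because detecting them depends on the task.

```python
from dataclasses import dataclass, field


@dataclass
class CircuitBreaker:
    """Hypothetical detector for repeated searches that yield nothing new."""
    repeat_limit: int = 2
    seen_findings: set[str] = field(default_factory=set, init=False)
    stale_counts: dict[str, int] = field(default_factory=dict, init=False)

    def record(self, query: str, findings: list[str]) -> list[str]:
        """Log one search and return any breakers it trips."""
        # Treat queries with the same words in any order as one pattern.
        pattern = " ".join(sorted(query.lower().split()))
        new = [f for f in findings if f not in self.seen_findings]
        self.seen_findings.update(findings)
        if new:
            self.stale_counts[pattern] = 0  # the pattern is still paying off
            return []
        self.stale_counts[pattern] = self.stale_counts.get(pattern, 0) + 1
        if self.stale_counts[pattern] >= self.repeat_limit:
            return ["repeated search pattern without new information"]
        return []


def other_breakers(*, scope_expanded: bool, became_high_risk: bool,
                   best_option_weak_after_budget: bool,
                   target_well_defined: bool) -> list[str]:
    """Translate the remaining circuit-breaker conditions into reasons to reassess."""
    reasons = []
    if scope_expanded:
        reasons.append("new evidence expanded the problem scope")
    if became_high_risk:
        reasons.append("task became high-risk midstream")
    if best_option_weak_after_budget:
        reasons.append("best option still weak after budgeted exploration")
    if not target_well_defined:
        reasons.append("search target no longer well-defined")
    return reasons
```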
Failure Modes This Skill Prevents
- endless search loops
- first-answer lock-in
- search as procrastination
- false convergence
- acting on weak evidence without a stop-rule audit
Definition of Done
This skill is correctly applied when:
- search-budget.md exists
- exploration had a target and a stopping rule
- the agent either found sufficient evidence or stopped honestly
- exploitation happened at the right time
- the agent did not loop indefinitely or act recklessly early
Final Instruction
Search with purpose.
Stop deliberately.
Act when more search is no longer earning its keep.