
name: "explore-vs-exploit-state-machine"
description: "Use this skill when the agent must decide whether to continue gathering information or commit to action."

Skill: Explore vs Exploit — State Machine Protocol for AI Agents

Purpose

Use this skill when the agent must decide whether to continue gathering information or commit to action.

This skill is a control system for:

research tasks
debugging
decision support
planning
recommendation generation
documentation review
ambiguous problem solving

The purpose is to prevent both:

premature action on weak evidence
endless searching with no stopping rule

Core Law

The agent must not keep searching by default, and must not act by default.

It must deliberately choose:

Explore: gather more information
Exploit: commit to the best current option

That choice must be justified by evidence value, risk, and diminishing returns.

Mandatory Diagnostic Artifact

The agent must create search-budget.md before active search begins (by the end of State 1) and update it as evidence arrives during exploration.

Required fields:

# Search Budget

## Task
<what decision or action this search supports>

## Current Best Hypotheses / Options
- <option A>
- <option B>
- <option C>

## What Is Unknown
- <unknown 1>
- <unknown 2>

## Cost of Acting Too Early
<low/medium/high>

## Cost of Continued Search
<low/medium/high>

## Expected Information Gain
<low/medium/high>

## Search Budget
<max passes / max queries / max inspections>

## Sufficiency Criteria
<what evidence is enough to act>

## Disconfirming Evidence to Look For
<what would break the current best hypothesis>

## Stop Condition
<when exploration must end>
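
As an illustration only, the artifact could be generated from a small data structure so that no required field is forgotten. The sketch below is an assumption, not part of the skill: `SearchBudget` and `render` are hypothetical names, and the fields simply mirror the headings of the template.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SearchBudget:
    """Hypothetical container; fields mirror the headings of search-budget.md."""
    task: str
    options: List[str]
    unknowns: List[str]
    cost_of_acting_too_early: str    # low / medium / high
    cost_of_continued_search: str    # low / medium / high
    expected_information_gain: str   # low / medium / high
    search_budget: str               # e.g. "max 5 queries"
    sufficiency_criteria: str
    disconfirming_evidence: str
    stop_condition: str

    def render(self) -> str:
        """Return the artifact text using the template's headings."""
        def bullets(items: List[str]) -> str:
            return "\n".join(f"- {item}" for item in items)

        sections = [
            ("Task", self.task),
            ("Current Best Hypotheses / Options", bullets(self.options)),
            ("What Is Unknown", bullets(self.unknowns)),
            ("Cost of Acting Too Early", self.cost_of_acting_too_early),
            ("Cost of Continued Search", self.cost_of_continued_search),
            ("Expected Information Gain", self.expected_information_gain),
            ("Search Budget", self.search_budget),
            ("Sufficiency Criteria", self.sufficiency_criteria),
            ("Disconfirming Evidence to Look For", self.disconfirming_evidence),
            ("Stop Condition", self.stop_condition),
        ]
        body = "\n\n".join(f"## {title}\n{text}" for title, text in sections)
        return f"# Search Budget\n\n{body}\n"
```

Because every field is a required constructor argument, a missing section fails when the object is built rather than going unnoticed in the written file.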

State Machine

State 0 — Intake

Goal:

define what decision exploration is supposed to improve

Questions:

What action is waiting on this information?
What are the current leading options or explanations?
Is the agent trying to reduce uncertainty or avoid commitment?

Allowed actions:

frame the decision
list the main options or hypotheses

Disallowed actions:

searching without a decision target
acting without naming the alternatives

Exit condition:

exploration target is clear
top options/hypotheses are named

State 1 — Uncertainty Assessment

Goal:

decide whether exploration is justified

Assess:

cost of wrong action
cost of delay
expected information gain
whether more search is likely to change the choice
whether current evidence is already dominant

Rule: If additional search is unlikely to change the decision, exploration should stop.

Allowed actions:

uncertainty analysis
setting risk tolerance
defining sufficiency criteria

Disallowed actions:

searching just because uncertainty feels uncomfortable
pretending more search is always better

Exit condition:

search-budget.md created
search budget and stop condition defined
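
The assessment in State 1 can be sketched as a small function. Only the first check comes from the skill (search that cannot change the decision should stop). The scoring that follows is an assumed heuristic for illustration, and `exploration_justified` and `LEVELS` are hypothetical names.

```python
# Ordinal scale for the low / medium / high ratings used in search-budget.md.
LEVELS = {"low": 1, "medium": 2, "high": 3}


def exploration_justified(
    cost_of_wrong_action: str,
    cost_of_delay: str,
    expected_gain: str,
    likely_to_change_choice: bool,
    evidence_already_dominant: bool,
) -> bool:
    """Return True if State 2 exploration looks worth its cost."""
    # Rule from the skill: if more search is unlikely to change the
    # decision, or current evidence already dominates, stop exploring.
    if not likely_to_change_choice or evidence_already_dominant:
        return False
    # Assumed heuristic (not from the skill): explore only when the risk
    # that search can reduce, weighted by expected gain, outweighs delay.
    benefit = LEVELS[cost_of_wrong_action] * LEVELS[expected_gain]
    return benefit > LEVELS[cost_of_delay]
```

For example, a high-stakes decision with high expected gain and cheap delay returns `True`, while any case where more search would not change the choice returns `False` regardless of the ratings.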

State 2 — Targeted Exploration

Goal:

gather only the information most likely to change the decision

Rules:

search intentionally
prefer highest-yield evidence first
compare competing hypotheses
look for disconfirming evidence
stay within the budget

Allowed actions:

focused inspection
targeted searches
comparison of top explanations/options
evidence updates to the artifact

Disallowed actions:

random wandering
repetitive near-duplicate searches
scope expansion without reason
searching to postpone action

Exit condition:

budget exhausted
or sufficiency criteria met
or dominant hypothesis/option emerges early
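
One way to keep State 2 inside its budget is to count passes explicitly and check the three exit conditions after each one. This is a minimal sketch under assumed names (`ExplorationTracker`, `record_pass`, `should_exit`); the skill itself does not prescribe an implementation.

```python
from dataclasses import dataclass


@dataclass
class ExplorationTracker:
    """Hypothetical tracker for the State 2 exit condition."""
    max_passes: int
    passes_used: int = 0
    sufficiency_met: bool = False
    dominant_option_found: bool = False

    def record_pass(self) -> None:
        """Count one search or inspection; refuse to exceed the budget."""
        if self.passes_used >= self.max_passes:
            raise RuntimeError("search budget exhausted")
        self.passes_used += 1

    def should_exit(self) -> bool:
        """True when any of the three exit conditions holds."""
        return (
            self.passes_used >= self.max_passes
            or self.sufficiency_met
            or self.dominant_option_found
        )
```

Raising on an over-budget pass makes silent budget overruns impossible; the caller has to move to the decision checkpoint instead.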

State 3 — Decision Checkpoint

Goal:

explicitly decide whether to continue exploring or switch to exploit

Questions:

Did new evidence materially change the ranking?
Is one option now clearly dominant?
Has information gain flattened?
Is the agent still learning, or just circling?

Decision:

Continue Explore
Switch to Exploit
Narrow scope / escalate

Allowed actions:

ranking update
confidence estimate
search termination

Disallowed actions:

silent continuation of exploration
switching to exploit without acknowledging uncertainty

Exit condition:

explicit go/no-go decision on further exploration
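
The checkpoint can be made explicit by forcing it to return one of the three named decisions. The mapping from answers to decisions below is an assumption for illustration; `Decision` and `checkpoint` are hypothetical names.

```python
from enum import Enum


class Decision(Enum):
    CONTINUE_EXPLORE = "continue explore"
    SWITCH_TO_EXPLOIT = "switch to exploit"
    NARROW_OR_ESCALATE = "narrow scope / escalate"


def checkpoint(
    one_option_dominant: bool,
    gain_flattened: bool,
    budget_remaining: int,
) -> Decision:
    """Assumed mapping from the checkpoint questions to a decision."""
    # A clearly dominant option ends exploration.
    if one_option_dominant:
        return Decision.SWITCH_TO_EXPLOIT
    # No dominant option, and search is circling or out of budget.
    if gain_flattened or budget_remaining <= 0:
        return Decision.NARROW_OR_ESCALATE
    # Still learning, with budget left.
    return Decision.CONTINUE_EXPLORE
```

Because the function always returns a `Decision`, there is no code path for silent continuation.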

State 4 — Exploitation

Goal:

act on the best current option

Rules:

commit cleanly
state major assumptions if relevant
keep action reversible when possible
avoid reopening full search unless new evidence appears

Allowed actions:

recommendation
execution
next-step commitment

Disallowed actions:

pretending certainty exceeds evidence
restarting search because commitment feels uncomfortable

Exit condition:

action taken or recommendation delivered

State 5 — Stop / Escalate

Goal:

end search cleanly

Escalate or narrow scope if:

the evidence remains too weak after budgeted search
the cost of wrong action is high but sufficiency criteria were not met
blast radius is unknown
the decision is still dominated by missing external facts
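
Taken together, the six states can be sketched as an enum with a transition table. The skill names the states but does not spell out every transition, so the table below is one plausible reading rather than a specification. `State`, `TRANSITIONS`, and `advance` are hypothetical names.

```python
from enum import Enum


class State(Enum):
    INTAKE = 0
    UNCERTAINTY_ASSESSMENT = 1
    TARGETED_EXPLORATION = 2
    DECISION_CHECKPOINT = 3
    EXPLOITATION = 4
    STOP_ESCALATE = 5


# Assumed transition table: one plausible reading of the states above.
# Reopening search from EXPLOITATION on new evidence is omitted for brevity.
TRANSITIONS = {
    State.INTAKE: {State.UNCERTAINTY_ASSESSMENT},
    # Skip exploration when more search would not change the decision.
    State.UNCERTAINTY_ASSESSMENT: {State.TARGETED_EXPLORATION, State.EXPLOITATION},
    State.TARGETED_EXPLORATION: {State.DECISION_CHECKPOINT},
    State.DECISION_CHECKPOINT: {
        State.TARGETED_EXPLORATION,
        State.EXPLOITATION,
        State.STOP_ESCALATE,
    },
    State.EXPLOITATION: set(),
    State.STOP_ESCALATE: set(),
}


def advance(current: State, target: State) -> State:
    """Move to `target`, rejecting transitions the table does not allow."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.name} -> {target.name}")
    return target
```

In this reading, the only route back into exploration runs through the decision checkpoint, which is what keeps continued search an explicit choice.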

Tool Gating Guidance

During Explore

Tools/search/retrieval are allowed and expected.

During Exploit

Further search should be blocked unless:

new contradictory evidence appears
the decision target changes
a high-risk assumption fails

Rule: Exploration tools should serve the decision, not replace it.
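
The gating rule can be sketched as a single predicate that a tool wrapper consults before each search call. The function name and the string mode values are assumptions made for illustration.

```python
def search_allowed(
    mode: str,
    *,
    new_contradictory_evidence: bool = False,
    decision_target_changed: bool = False,
    high_risk_assumption_failed: bool = False,
) -> bool:
    """Return True if a search tool call is permitted in the given mode."""
    if mode == "explore":
        # Tools, search, and retrieval are allowed and expected.
        return True
    if mode == "exploit":
        # Search stays blocked unless one of the three exceptions applies.
        return (
            new_contradictory_evidence
            or decision_target_changed
            or high_risk_assumption_failed
        )
    raise ValueError(f"unknown mode: {mode!r}")
```

The exceptions are keyword-only so that a caller reopening search during exploit has to name the reason at the call site.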

Unknowns and Blast Radius Rule

Before exploiting on a shared or consequential decision, the agent must state:

## Unknowns Still Open
- <item>

## Blast Radius Confidence
<high / medium / low>

## Is Action Still Justified?
<yes / no, with reason>

If blast radius confidence is low and the action is high-risk, the agent should not exploit directly; it should escalate, narrow scope, or choose a reversible action first.
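
The final rule can be checked mechanically before exploitation. The sketch below assumes the action's risk is rated on the same low / medium / high scale as blast radius confidence; that rating and the function name are illustrative assumptions, not part of the skill.

```python
VALID_LEVELS = {"low", "medium", "high"}


def requires_extra_scrutiny(blast_radius_confidence: str, action_risk: str) -> bool:
    """True when low blast radius confidence meets a high-risk action."""
    for level in (blast_radius_confidence, action_risk):
        if level not in VALID_LEVELS:
            raise ValueError(f"expected low/medium/high, got {level!r}")
    return blast_radius_confidence == "low" and action_risk == "high"
```

A `True` result does not decide the outcome by itself; it only signals that the agent should stop and justify the action explicitly, or hand the decision to State 5.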

Circuit Breakers

Stop and reassess if:

the same search pattern repeats without new information
new evidence expands the problem scope
the task becomes high-risk midstream
the best option remains weak after budgeted exploration
the search target is no longer well-defined
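
The first circuit breaker (a repeating search pattern with no new information) is the easiest one to automate. The sketch below is one possible approach, assuming the agent logs each query together with a count of new facts it produced. `normalize`, `repeated_without_gain`, and the `patience` parameter are hypothetical.

```python
from typing import List, Tuple


def normalize(query: str) -> str:
    """Collapse case, word order, and repeats so near-duplicate queries match."""
    return " ".join(sorted(set(query.lower().split())))


def repeated_without_gain(history: List[Tuple[str, int]], patience: int = 2) -> bool:
    """history holds (query, new_facts_found) pairs, oldest first.

    Trips when the last `patience` searches were near-duplicates of one
    another and none of them produced new information.
    """
    recent = history[-patience:]
    if len(recent) < patience:
        return False
    same_pattern = len({normalize(q) for q, _ in recent}) == 1
    no_gain = all(found == 0 for _, found in recent)
    return same_pattern and no_gain
```

Token-set matching is a deliberately crude notion of "near-duplicate"; it catches reworded repeats of the same terms but not paraphrases, which would need a stronger similarity measure.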

Failure Modes This Skill Prevents

endless search loops
first-answer lock-in
search as procrastination
false convergence
acting on weak evidence without a stop-rule audit

Definition of Done

This skill is correctly applied when:

search-budget.md exists
exploration had a target and a stopping rule
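the agent either found sufficient evidence or stopped and reported that the evidence was insufficient
exploitation began once sufficiency criteria were met or further search stopped changing the decision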
the agent did not loop indefinitely or act recklessly early

Final Instruction

Search with purpose.
Stop deliberately.
Act when more search is no longer earning its keep.