explore-vs-exploit-state-machine

Use this skill when the agent must decide whether to continue gathering information or commit to action.

By StepowskiEric. Updated 4/23/2026

name: "explore-vs-exploit-state-machine"
description: "Use this skill when the agent must decide whether to continue gathering information or commit to action."

Skill: Explore vs Exploit — State Machine Protocol for AI Agents

Purpose

Use this skill when the agent must decide whether to continue gathering information or commit to action.

This skill is a control system for:

  • research tasks
  • debugging
  • decision support
  • planning
  • recommendation generation
  • documentation review
  • ambiguous problem solving

The purpose is to prevent both:

  • premature action on weak evidence
  • endless searching with no stopping rule

Core Law

The agent must not keep searching by default, and must not act by default.

It must deliberately choose:

  • Explore: gather more information
  • Exploit: commit to the best current option

That choice must be justified by evidence value, risk, and diminishing returns.


Mandatory Diagnostic Artifact

The agent must create search-budget.md before targeted exploration begins (it is an exit condition of State 1) and keep it updated as evidence arrives.

Required fields:

# Search Budget

## Task
<what decision or action this search supports>

## Current Best Hypotheses / Options
- <option A>
- <option B>
- <option C>

## What Is Unknown
- <unknown 1>
- <unknown 2>

## Cost of Acting Too Early
<low/medium/high>

## Cost of Continued Search
<low/medium/high>

## Expected Information Gain
<low/medium/high>

## Search Budget
<max passes / max queries / max inspections>

## Sufficiency Criteria
<what evidence is enough to act>

## Disconfirming Evidence to Look For
<what would break the current best hypothesis>

## Stop Condition
<when exploration must end>
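The template above can also be filled in programmatically. The following is a minimal sketch, not part of the skill itself: the `SearchBudget` class, its field names, and the example values are hypothetical, chosen only to mirror the required fields one for one.

```python
from dataclasses import dataclass
from typing import List

LEVELS = ("low", "medium", "high")


@dataclass
class SearchBudget:
    """Hypothetical container mirroring the required fields of search-budget.md."""
    task: str
    options: List[str]
    unknowns: List[str]
    cost_of_acting_too_early: str    # low / medium / high
    cost_of_continued_search: str    # low / medium / high
    expected_information_gain: str   # low / medium / high
    search_budget: str               # e.g. "max 5 queries"
    sufficiency_criteria: str
    disconfirming_evidence: str
    stop_condition: str

    def __post_init__(self) -> None:
        # Reject ratings outside the template's low/medium/high scale.
        for name in ("cost_of_acting_too_early",
                     "cost_of_continued_search",
                     "expected_information_gain"):
            value = getattr(self, name)
            if value not in LEVELS:
                raise ValueError(f"{name} must be one of {LEVELS}, got {value!r}")

    def to_markdown(self) -> str:
        def bullets(items: List[str]) -> str:
            return "\n".join(f"- {item}" for item in items)

        sections = [
            ("Task", self.task),
            ("Current Best Hypotheses / Options", bullets(self.options)),
            ("What Is Unknown", bullets(self.unknowns)),
            ("Cost of Acting Too Early", self.cost_of_acting_too_early),
            ("Cost of Continued Search", self.cost_of_continued_search),
            ("Expected Information Gain", self.expected_information_gain),
            ("Search Budget", self.search_budget),
            ("Sufficiency Criteria", self.sufficiency_criteria),
            ("Disconfirming Evidence to Look For", self.disconfirming_evidence),
            ("Stop Condition", self.stop_condition),
        ]
        lines = ["# Search Budget", ""]
        for title, body in sections:
            lines += [f"## {title}", body, ""]
        return "\n".join(lines)


# Illustrative values only.
budget = SearchBudget(
    task="Pick a fix for the flaky login test",
    options=["retry race in session setup", "stale fixture data"],
    unknowns=["does it fail outside CI?"],
    cost_of_acting_too_early="medium",
    cost_of_continued_search="low",
    expected_information_gain="high",
    search_budget="max 5 inspections",
    sufficiency_criteria="failure reproduced locally with one cause isolated",
    disconfirming_evidence="failure persists with retries disabled",
    stop_condition="budget spent or cause isolated",
)
```

Writing `budget.to_markdown()` to `search-budget.md` would produce a file with the same headings as the template. Validating the three ratings at construction time is a design choice of this sketch, not a requirement stated by the skill.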

State Machine

State 0 — Intake

Goal:

  • define what decision exploration is supposed to improve

Questions:

  • What action is waiting on this information?
  • What are the current leading options or explanations?
  • Is the agent trying to reduce uncertainty or avoid commitment?

Allowed actions:

  • frame the decision
  • list the main options or hypotheses

Disallowed actions:

  • searching without a decision target
  • acting without naming the alternatives

Exit condition:

  • exploration target is clear
  • top options/hypotheses are named

State 1 — Uncertainty Assessment

Goal:

  • decide whether exploration is justified

Assess:

  • cost of wrong action
  • cost of delay
  • expected information gain
  • whether more search is likely to change the choice
  • whether current evidence is already dominant

Rule: If additional search is unlikely to change the decision, exploration should stop.

Allowed actions:

  • uncertainty analysis
  • setting risk tolerance
  • defining sufficiency criteria

Disallowed actions:

  • searching just because uncertainty feels uncomfortable
  • pretending more search is always better

Exit condition:

  • search-budget.md created
  • search budget and stop condition defined
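One way to make the State 1 rule concrete is a small predicate over the assessed factors. This is a sketch under stated assumptions: the function name, its parameters, and the two threshold heuristics are illustrative and not defined by the skill. Only the first check (search that cannot change the decision should stop) comes directly from the rule above.

```python
LEVEL = {"low": 0, "medium": 1, "high": 2}


def exploration_justified(cost_of_wrong_action: str,
                          cost_of_delay: str,
                          expected_information_gain: str,
                          search_could_change_choice: bool,
                          evidence_already_dominant: bool) -> bool:
    """Return True if entering targeted exploration looks justified."""
    # From the State 1 rule: search that cannot change the decision should stop.
    if not search_could_change_choice or evidence_already_dominant:
        return False
    # Assumed heuristic: a low expected gain is not worth a search pass.
    if LEVEL[expected_information_gain] == 0:
        return False
    # Assumed heuristic: explore only when a wrong action costs at least
    # as much as the delay that exploration introduces.
    return LEVEL[cost_of_wrong_action] >= LEVEL[cost_of_delay]
```

For example, `exploration_justified("high", "low", "medium", True, False)` returns `True`, while the same call with `search_could_change_choice=False` returns `False`.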

State 2 — Targeted Exploration

Goal:

  • gather only the information most likely to change the decision

Rules:

  • search intentionally
  • prefer highest-yield evidence first
  • compare competing hypotheses
  • look for disconfirming evidence
  • stay within the budget

Allowed actions:

  • focused inspection
  • targeted searches
  • comparison of top explanations/options
  • evidence updates to the artifact

Disallowed actions:

  • random wandering
  • repetitive near-duplicate searches
  • scope expansion without reason
  • searching to postpone action

Exit condition (any one of):

  • budget exhausted
  • sufficiency criteria met
  • dominant hypothesis/option emerges early
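The State 2 rules (highest-yield evidence first, stay within the budget, stop once the evidence suffices) can be sketched as a bounded loop. Everything named here is hypothetical: `targeted_exploration`, the `(expected_yield, action)` probe pairs, and the caller-supplied `is_sufficient` predicate, which is assumed to cover both "sufficiency criteria met" and "dominant option emerges early".

```python
from typing import Callable, List, Sequence, Tuple

# (expected yield, action that returns one finding); both are illustrative.
Probe = Tuple[float, Callable[[], str]]


def targeted_exploration(probes: Sequence[Probe],
                         max_passes: int,
                         is_sufficient: Callable[[List[str]], bool]
                         ) -> Tuple[List[str], str]:
    """Run probes in descending expected yield until an exit condition holds."""
    findings: List[str] = []
    # Prefer the highest-yield evidence first.
    ordered = sorted(probes, key=lambda probe: probe[0], reverse=True)
    # Slicing enforces the budget: never run more than max_passes probes.
    for _, run in ordered[:max_passes]:
        findings.append(run())
        if is_sufficient(findings):
            return findings, "sufficiency criteria met"
    if len(findings) >= max_passes:
        return findings, "budget exhausted"
    return findings, "no probes left"


probes = [
    (0.2, lambda: "log excerpt"),
    (0.9, lambda: "stack trace"),
    (0.5, lambda: "config diff"),
]
```

The function returns the exit reason alongside the findings so the decision checkpoint that follows exploration has an explicit record of why exploration ended.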

State 3 — Decision Checkpoint

Goal:

  • explicitly decide whether to continue exploring or switch to exploit

Questions:

  • Did new evidence materially change the ranking?
  • Is one option now clearly dominant?
  • Has information gain flattened?
  • Is the agent still learning, or just circling?

Decision:

  • Continue Explore
  • Switch to Exploit
  • Narrow scope / escalate (State 5)

Allowed actions:

  • ranking update
  • confidence estimate
  • search termination

Disallowed actions:

  • silent continuation of exploration
  • switching to exploit without acknowledging uncertainty

Exit condition:

  • explicit go/no-go decision on further exploration
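A checkpoint that always returns a decision together with a reason makes silent continuation impossible by construction. The sketch below is one possible encoding, not the skill's own: the `Decision` enum, the `checkpoint` function, and the ordering of the checks are assumptions.

```python
from enum import Enum
from typing import Tuple


class Decision(Enum):
    CONTINUE_EXPLORE = "continue explore"
    SWITCH_TO_EXPLOIT = "switch to exploit"
    NARROW_OR_ESCALATE = "narrow scope / escalate"


def checkpoint(ranking_changed: bool,
               one_option_dominant: bool,
               gain_flattened: bool,
               passes_left: int,
               wrong_action_is_costly: bool) -> Tuple[Decision, str]:
    """Return an explicit decision plus the reason for it."""
    if one_option_dominant:
        return Decision.SWITCH_TO_EXPLOIT, "one option is clearly dominant"
    if passes_left > 0 and ranking_changed and not gain_flattened:
        return Decision.CONTINUE_EXPLORE, "evidence is still changing the ranking"
    # Assumed tie-break: when search has stalled, the cost of a wrong action
    # decides between escalating and acting with stated uncertainty.
    if wrong_action_is_costly:
        return (Decision.NARROW_OR_ESCALATE,
                "search stalled and a wrong action would be costly")
    return (Decision.SWITCH_TO_EXPLOIT,
            "search stalled; acting on the best option with uncertainty stated")
```

The reason string is meant to be recorded in the artifact, so a switch to exploit always carries its acknowledged uncertainty.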

State 4 — Exploitation

Goal:

  • act on the best current option

Rules:

  • commit cleanly
  • state major assumptions if relevant
  • keep action reversible when possible
  • avoid reopening full search unless new evidence appears

Allowed actions:

  • recommendation
  • execution
  • next-step commitment

Disallowed actions:

  • pretending certainty exceeds evidence
  • restarting search because commitment feels uncomfortable

Exit condition:

  • action taken or recommendation delivered

State 5 — Stop / Escalate

Goal:

  • end search cleanly

Escalate or narrow scope if:

  • the evidence remains too weak after budgeted search
  • the cost of wrong action is high but sufficiency criteria were not met
  • blast radius is unknown
  • the decision is still dominated by missing external facts
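The six states can be held in a small transition table so that illegal jumps fail loudly. The skill describes each state's exit condition but does not list the transitions explicitly, so the table below is an inferred reading and should be treated as an assumption, as should the names `State`, `TRANSITIONS`, and `advance`.

```python
from enum import Enum


class State(Enum):
    INTAKE = 0
    UNCERTAINTY_ASSESSMENT = 1
    TARGETED_EXPLORATION = 2
    DECISION_CHECKPOINT = 3
    EXPLOITATION = 4
    STOP_ESCALATE = 5


# Inferred transitions (an assumption, not stated by the skill).
TRANSITIONS = {
    State.INTAKE: {State.UNCERTAINTY_ASSESSMENT},
    # Skip exploration when more search would not change the decision.
    State.UNCERTAINTY_ASSESSMENT: {State.TARGETED_EXPLORATION,
                                   State.EXPLOITATION},
    State.TARGETED_EXPLORATION: {State.DECISION_CHECKPOINT},
    State.DECISION_CHECKPOINT: {State.TARGETED_EXPLORATION,
                                State.EXPLOITATION,
                                State.STOP_ESCALATE},
    # Reopen assessment only if new evidence appears during exploitation.
    State.EXPLOITATION: {State.UNCERTAINTY_ASSESSMENT},
    State.STOP_ESCALATE: set(),
}


def advance(current: State, target: State) -> State:
    """Move to target, or raise if the table does not allow the jump."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.name} -> {target.name}")
    return target
```

With this table, going from Intake straight to Exploitation raises a `ValueError`, which matches the Intake rule against acting without naming the alternatives.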

Tool Gating Guidance

During Explore

Tools/search/retrieval are allowed and expected.

During Exploit

Further search should be blocked unless:

  • new contradictory evidence appears
  • the decision target changes
  • a high-risk assumption fails

Rule: Exploration tools should serve the decision, not replace it.
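The gating guidance reduces to a single predicate that a tool dispatcher could consult before each search call. This is a minimal sketch; the function name, the `phase` strings, and the keyword flags are hypothetical.

```python
def search_allowed(phase: str,
                   new_contradictory_evidence: bool = False,
                   decision_target_changed: bool = False,
                   high_risk_assumption_failed: bool = False) -> bool:
    """Gate search tools by phase: open in explore, conditional in exploit."""
    if phase == "explore":
        return True
    if phase == "exploit":
        # Search reopens only on one of the three listed exceptions.
        return (new_contradictory_evidence
                or decision_target_changed
                or high_risk_assumption_failed)
    raise ValueError(f"unknown phase: {phase!r}")
```

For example, `search_allowed("exploit")` returns `False`, while `search_allowed("exploit", high_risk_assumption_failed=True)` returns `True`.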


Unknowns and Blast Radius Rule

Before exploiting on a shared or consequential decision, the agent must state:

## Unknowns Still Open
- <item>

## Blast Radius Confidence
<high / medium / low>

## Is Action Still Justified?
<yes / no, with reason>

If blast radius confidence is low and the action is high-risk, the agent should not exploit until it has escalated or narrowed scope (see State 5).
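The pre-exploit statement can be generated from three inputs. The sketch below is an illustration only: `pre_exploit_statement` is a hypothetical helper, and it treats the low-confidence plus high-risk combination as blocking, which is a strict reading of the rule. The auto-generated "yes" reason is a placeholder; a real agent would supply its own justification.

```python
from typing import List


def pre_exploit_statement(unknowns: List[str],
                          blast_radius_confidence: str,
                          action_is_high_risk: bool) -> str:
    """Render the three required sections before a consequential action."""
    if blast_radius_confidence not in ("high", "medium", "low"):
        raise ValueError(f"bad confidence: {blast_radius_confidence!r}")
    # Assumed strict reading: low confidence plus high risk blocks the action.
    blocked = blast_radius_confidence == "low" and action_is_high_risk
    if blocked:
        verdict = "no, blast radius confidence is low and the action is high-risk"
    else:
        verdict = "yes, blast radius confidence is adequate for this risk level"
    open_items = "\n".join(f"- {item}" for item in unknowns) or "- none"
    return (
        f"## Unknowns Still Open\n{open_items}\n\n"
        f"## Blast Radius Confidence\n{blast_radius_confidence}\n\n"
        f"## Is Action Still Justified?\n{verdict}\n"
    )
```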


Circuit Breakers

Stop and reassess if:

  • the same search pattern repeats without new information
  • new evidence expands the problem scope
  • the task becomes high-risk midstream
  • the best option remains weak after budgeted exploration
  • the search target is no longer well-defined
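The first circuit breaker (the same search pattern repeating without new information) is the easiest to automate. A minimal sketch, assuming findings can be compared as strings; the `StaleSearchBreaker` class and its default threshold of two stale passes are hypothetical choices.

```python
from typing import Iterable, Set


class StaleSearchBreaker:
    """Trips after several consecutive passes that add no new findings."""

    def __init__(self, max_stale_passes: int = 2) -> None:
        self.max_stale_passes = max_stale_passes
        self.seen: Set[str] = set()
        self.stale_passes = 0

    def record(self, findings: Iterable[str]) -> bool:
        """Record one search pass; return True if the breaker has tripped."""
        new = set(findings) - self.seen
        self.seen |= new
        # Any new finding resets the counter; a repeat-only pass increments it.
        self.stale_passes = 0 if new else self.stale_passes + 1
        return self.stale_passes >= self.max_stale_passes
```

When `record` returns `True`, the agent would stop and reassess rather than issue another near-duplicate search. The other breakers (scope expansion, risk changes, an ill-defined target) depend on judgment and are not modeled here.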

Failure Modes This Skill Prevents

  • endless search loops
  • first-answer lock-in
  • search as procrastination
  • false convergence
  • acting on weak evidence without a stop-rule audit

Definition of Done

This skill is correctly applied when:

  • search-budget.md exists
  • exploration had a target and a stopping rule
  • the agent either found sufficient evidence or stopped honestly
  • exploitation began only once sufficiency criteria were met, a dominant option emerged, or information gain flattened
  • the agent did not loop indefinitely or act recklessly early

Final Instruction

Search with purpose.
Stop deliberately.
Act when more search is no longer earning its keep.

Install via CLI
npx skills add https://github.com/StepowskiEric/GrimoireStack --skill explore-vs-exploit-state-machine