name: campaign
description: >
Question-level autonomous campaign runner (task #586) — the
orchestration layer ABOVE /issue. Takes a kind: campaign task number
pinned to ONE docs/open_questions.md anchor and loops: ingest landed
child clean-results into a writable world model → check stop criteria →
propose next experiments → file + spawn children as ordinary
/issue <child> --auto sessions (full adversarial-planner + critic
stack unchanged) → repeat, maximizing parallelism across
shared-nothing arms. Executes ONLY from status approved (the user
reviews the ## Campaign Brief first — workflow.yaml §
gates.campaign_brief_approval); budget/stop enforcement lives in
campaign_state.py + the watcher's campaign pass, not in the prompt.
Re-entrant: every invocation runs one decision round and exits; the
/campaign-tick cron re-drives it.
user_invocable: true
/campaign — question-level autonomous campaign runner
One required argument: the campaign task number <N> (the integer naming
tasks/<status>/<N>/, kind: campaign).
This skill runs in a DEDICATED autonomous Happy session spawned via
uv run python scripts/spawn_session.py spawn-campaign --issue <N>
(env: EPM_AUTONOMOUS_SESSION=1, EPM_CAMPAIGN_SESSION=1). It is the
campaign's only driver. Children are ordinary kind: experiment tasks
executed by their own /issue <child> sessions with the full
planner / critic / analyzer stack unchanged — this skill NEVER runs an
experiment itself, never touches a pod, and never reads a child's raw
results: it ingests only the CRITIC-GATED clean-result body at
awaiting_promotion / completed.
Design grounding (2025-26 autonomous-research literature): invest in the proposal operator, not tree search; keep a writable claim→evidence world model; the deciding eval lives outside the agent (critic-gated clean-results); parallel arms share state through the world model and are designed shared-nothing so they can run concurrently; humans own the problem + rubric (brief IN, digest + final report OUT).
State surfaces
tasks/<status>/<N>/artifacts/campaign-state.json— machine state (budget, limits, stop, experiment DAG). Read/write ONLY viaexplore_persona_space.campaign_state(atomic, schema-validated).tasks/<status>/<N>/artifacts/world-model.md— claim →#taskevidence ledger, appended at every ingest.docs/open_questions.mdstays user-gated read-only; the final report PROPOSES a diff, never applies it.- Commit both artifacts by explicit path after every mutation
(concurrent-committer rules: never
git add -A).
Resolve the artifacts directory via uv run python scripts/task.py find <N> — never a hand-built tasks/... path.
No user asks, ever
This skill never calls AskUserQuestion — the ONE human gate is upstream:
a campaign executes only from status approved, after the user reviews
the ## Campaign Brief and runs task.py set-status <N> approved
(workflow.yaml § gates.campaign_brief_approval). Everything after that
gate is autonomous: decisions follow the autonomous carve-out (pick max
info-gain per GPU-hour toward the question, post Decision: <X> in the
relevant marker note, continue). Gates OUT are the daily
epm:campaign-digest and the final report; per-child promotion stays
user-only but does NOT block the campaign.
Step 0 — validate + arm (idempotent)
uv run python scripts/task.py view <N> --json. Verifykind: campaign— anything else: print one line ("not a campaign task") and EXIT.Verify status:
proposed/planning/plan_pending→ EXIT with ONE line: the user must review the## Campaign Briefand runuv run python scripts/task.py set-status <N> approved(workflow.yaml § gates.campaign_brief_approval). Do not loop, do not arm the cron.approved→ first entry; continue.running→ re-entry (tick re-drive or watcher respawn); skip to Step 1 after re-arming the cron (sub-steps 5-7 are idempotent).completed/archived/blocked→ CRON-TEARDOWN (see /campaign-tick Step 4), one line, EXIT.
Verify the body has a
## Campaign BriefH2 carrying: the question anchor (q:<slug>intodocs/open_questions.md), the hypothesis list, the initial experiment table (columnsid | title | hypothesis | depends_on | gpu_hours_est), and optional frontmattercampaign:budget overrides. A malformed brief → postepm:failure v1(failure_class: data, naming the missing piece), set statusblocked, EXIT — fail loud, never guess a DAG.Initialize state if absent (
init_state_from_briefparses the brief and persists atomically):uv run python -c " from explore_persona_space import campaign_state, task_workflow t = task_workflow.get_task(<N>) try: campaign_state.load_state(<N>) print('state: already initialized') except FileNotFoundError: campaign_state.init_state_from_brief(<N>, t['frontmatter'], t['body']) print('state: initialized from brief') "Commit
artifacts/campaign-state.jsonby explicit path.Budget/limit seeding is single-pathed through
init_state_from_briefwith fixed precedence: frontmattercampaign:overrides (the user-reviewed brief wins) > caps recorded in~/.eps-autonomous/campaign-<N>.jsonbyspawn-campaign's CLI flags (--budget-gpu-hours/--max-concurrent/--per-child-cap) >campaign_statemodule defaults. The state file is the ONLY budget enforcement surface after init.Set status →
running(task.py set-status <N> running).Register this session for the watcher's campaign pass (idempotent — safe on re-entry):
uv run python scripts/spawn_session.py register-current --issue <N> --mode campaign.Arm the recurring tick (idempotent ARM-GUARD:
CronListfirst, skip if a job with prompt exactly"/campaign-tick <N>"exists):CronCreate("*/45 * * * *", prompt="/campaign-tick <N>", recurring=True, durable=False). (45 min as of 2026-06-12 — the 10-min watcher carries fast detection; the tick is the in-session re-driver of last resort.)Post
epm:campaign-started v1(anchor, budget, limits, DAG size). Then fall through to Step 1.
Step 1 — decision round (the core loop body)
Re-entered on every tick re-drive and on every child-landing wake. Each round is bounded: reconcile → stop-check → extend-if-dry → file+spawn → digest → EXIT (the session idles between rounds; the cron is the driver).
1.1 Reconcile children → ingest landed results
uv run python scripts/task.py list-children <N> --json.
Orphan-child adoption (crash-recovery cross-check, FIRST): any
list-children row whose id does NOT match a known child_task in state
is an orphan from a crash between task.py new and the state save. Read
its body for the embedded <!-- campaign_experiment_id: <id> --> line
(Step 1.4 stamps it into every child) and ADOPT it into that DAG row:
set child_task: <orphan id>, status from the child's actual status
(filed if no session is driving it yet — re-spawn it in 1.4), and add
its gpu_hours_est to budget.gpu_hours_committed if the row was still
planned (the crash window lost the commit). An orphan with NO embedded
experiment id (or one not in the DAG) → post epm:failure v1
(failure_class: data, naming the orphan child id), set status
blocked, EXIT — never guess which arm a child belongs to.
Then, for each experiment row in state with status filed / running /
landed / waiting-user, compare its child_task's current status:
- Child reached
awaiting_promotionorcompleted→ ingest:- Read the child's clean-result body — title claim + confidence tag
- the findings-skim ONLY (the critic-gated artifact; never raw
completions, never
epm:interpretationdrafts). For a v3 child (sentinel<!-- clean-result-v3 -->) the findings-skim is## Takeaways+## Findings; for a v2/legacy child it is the
## TL;DRblock — branch on the sentinel. - the findings-skim ONLY (the critic-gated artifact; never raw
completions, never
- Record
headline+confidenceon the experiment row. - Append a claim→evidence row to
artifacts/world-model.md:| <claim> | #<child> | <confidence> | <date> |. - Set
belief_shift: yes|no— did this result change the campaign's working belief relative to the world model's prior state?yes→ resetstop.dry_counterto 0;no→ increment it. THEN updatestop.current_confidence— the CAMPAIGN-LEVEL confidence in the world model's current answer to the question (judged across the accumulated evidence ledger, NOT copied from this child's per-claim tag; LOW/MODERATE/HIGH/DETERMINATE). This field is what the confidence-target stop criterion reads. - Reconcile committed hours (IDEMPOTENT recipe — used here and at a
waiting-userresume): if the child's approved plan recorded actual GPU-hours (epm:planmarker / plan frontmattergpu_hours_total) andplan_hours != row.gpu_hours_est, adjustbudget.gpu_hours_committedby(plan_hours − row.gpu_hours_est)AND setrow.gpu_hours_est = plan_hours. After any reconcile the row's est always equals its committed contribution, so re-running the recipe (e.g. at ingest after awaiting-userresume already applied it) is a no-op — never double-add. Note the original est-vs-plan gap for the next digest. - Mark the row
ingested, save state, commit both artifacts by explicit path, postepm:campaign-child-ingested v1.
- Read the child's clean-result body — title claim + confidence tag
- Child
blocked→ leave one respawn attempt to the normal watcher path this round; if STILLblockedon the next round, mark the rowabandonedand RELEASE its committed hours: first run the idempotent reconcile recipe (step 5) sorow.gpu_hours_estequals the row's committed contribution, then subtractrow.gpu_hours_estfrombudget.gpu_hours_committed, note the abandonment + any already-burned pod hours in the world model, save + commit. (Released hours are an optimistic refund — a pod that already burned time is gone; the digest surfaces the gap.) - Child parked at
plan_pending(its plan exceeded the per-child GPU-hour cap) → mark the rowwaiting-user; it does NOT occupy a concurrency slot; surface it in the next digest. If the user later approves, the row flips back torunningat the next reconcile — and RECONCILE its committed hours to the approved plan'sgpu_hours_total(the whole reason it parked is that the plan exceeded the estimate) using the SAME idempotent recipe as ingest step 5: adjustbudget.gpu_hours_committedby(plan_hours − row.gpu_hours_est), then setrow.gpu_hours_est = plan_hours— a later re-run at ingest is then a no-op. Surface the est-vs-plan gap in the next digest. - Child still in flight (any ACTIVE status) → row stays
running.
1.2 Stop-check
uv run python -c "
from explore_persona_space import campaign_state, task_workflow
state = campaign_state.load_state(<N>)
tags = task_workflow.get_task(<N>)['frontmatter'].get('tags') or []
print(campaign_state.check_stop(state, user_stop='campaign-stop' in tags))
"
The user-stop signal is the campaign-stop tag
(task.py add-tag <N> campaign-stop). On (True, reason) → record
stop.stopped = true, stop.stop_reason = reason, save, go to Step 2.
1.3 Extend the DAG if dry
If ready_experiments(state) is empty AND open_slots(state) > 0 AND
the stop-check did not fire: spawn ONE fresh-context proposal agent
(Agent(subagent_type="general-purpose")) whose brief contains PATHS to
(never inlined bodies of): the campaign task body, world-model.md, the
docs/open_questions.md anchor section, and the clean-result titles of
the parent + all ingested children. It returns 1-4 NEW experiments in
the DAG row format (id | title | hypothesis | depends_on | gpu_hours_est), each justified by info-gain-per-GPU-hour and designed
for INDEPENDENCE (shared-nothing arms: own pod, own adapters, no shared
mutable artifacts) wherever possible so they can run concurrently.
Append the rows to state (validate ids unique, deps resolve), save +
commit. A proposal round that returns zero viable experiments increments
stop.dry_counter (it will trip the dry-limit stop within
dry_limit rounds — no infinite proposal loop).
1.4 File + spawn (parallelize aggressively; state save BEFORE spawn)
Select from ready_experiments(state), in order, at most
open_slots(state) rows, taking each row only while the cumulative
selected gpu_hours_est stays <= budget_headroom(state) — never
commit hours past the headroom. For each selected experiment row, in
THIS order (the state save sits between filing and spawning so a crash
in any window cannot duplicate a child or leak committed hours):
- File the child:
uv run python scripts/task.py new --kind experiment --parent <N> --title "<row title>" --goal "<one-sentence goal from the hypothesis>" --body-file <tmp>— the body OPENS with the fixed line<!-- campaign_experiment_id: <row id> -->(the deterministic reconcile key for orphan adoption, Step 1.1) and carries the hypothesis, the single-variable design sketch,gpu_hours_est, and a pointer to the campaign world model (tasks/<status>/<N>/artifacts/world-model.mdvia the dashboard path, plus the question anchor). - IMMEDIATELY persist the filing — BEFORE any spawn: update the row to
status: filed,child_task: <child>,budget.gpu_hours_committed += gpu_hours_est; save state and commit the state artifact by explicit path. (A crash aftertask.py newbut before this save is recovered by 1.1's orphan adoption via the embedded experiment id; a crash after this save merely leaves afiledrow whose session the next round spawns — no duplicate, no leaked hours.) - Spawn its autonomous session:
uv run python scripts/spawn_session.py spawn-issue --issue <child> --auto --auto-approve-gpu-hours <per_child_gpu_hours_cap>. Stagger successive spawns by a few seconds (429 token-pacing). - Flip the row
filed→running; save + commit. - Post
epm:campaign-child-spawned v1(child id, experiment id, committed hours, remaining headroom, open slots).
A filed row with a child_task but no live session (observed at the
next reconcile) is the crash-between-save-and-spawn case: re-run steps
3-5 for it — task.py new is NEVER re-run for a row that already has a
child_task.
The full /issue pipeline (clarifier → adversarial-planner → critic
ensemble → implementer → experimenter → analyzer → clean-result-critic)
runs unchanged inside each child session. The campaign never bypasses
it.
1.5 Digest
If last_digest_at is null or older than 24h: post
epm:campaign-digest v1 (budget spent/committed/total in GPU-hours,
children table, current working belief + stop.current_confidence, next
planned arms, any waiting-user rows, and any est-vs-approved-plan
GPU-hour gaps noted at 1.1's reconciliations) + a one-line
PushNotification (soft-fail — swallow errors). Set last_digest_at,
save + commit. Then EXIT the round (the tick cron re-drives).
Step 2 — finalize
- Post
epm:campaign-stopped v1withstop_reason. - Write the final campaign report into the task body
(
task.py set-body <N> --file <report> --snapshot— snapshots the original brief tooriginal-body.md): the question, the belief trajectory (initial hypothesis set → final working belief + confidence), the per-child table (id / title / headline / confidence / GPU-hours), the claim→evidence ledger, and a PROPOSEDdocs/open_questions.mddiff in a fenced block — NOT applied; living-docs mutations stay user-gated, unchanged. - Set status
completed. - CRON-TEARDOWN (delete the
"/campaign-tick <N>"cron — whole-string prompt equality, same rule as /issue-tick Step 4). PushNotificationone-liner (campaign done + stop reason), then EXIT. The watcher's campaign pass reapscampaign-<N>.jsonon its next tick.
Failure posture
- Any unrecoverable error (state file corrupt, brief unparseable,
task.pymutations failing repeatedly) → postepm:failure v1with the appropriatefailure_class, set statusblocked, EXIT. The watcher GCs the registry entry atblocked; the user triages. - Child failures are NOT campaign failures: a blocked child is retried once by the watcher, then abandoned (Step 1.1) and the campaign continues toward the question with the remaining budget.
- Budget discipline is structural: Step 1.4 cannot commit past
budget_headroom,check_stophalts the loop at exhaustion, and the watcher's budget backstop alerts if committed ever exceeds total (harness-side circuit breaker). All caps are GPU-hours, never dollars.