name: autoresearch
description: "Autonomous iteration loop: modify, verify, keep/discard against any metric"
version: 2.2.1
Autoresearch — Autonomous Goal-directed Iteration
Safety Invariants (all subcommands)
- Never push, publish, or deploy without explicit user approval.
- Bounded by default. Override with `Iterations: unlimited`.
- All results are logged to the `autoresearch/{subcommand}-{YYMMDD}-{HHMM}/` directory.
- Chain handoff via `handoff.json`. Evals reads `*-results.tsv`.
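The results-directory naming pattern can be illustrated with a small sketch. The helper name `run_dir` and the choice of a caller-supplied timestamp are assumptions made for illustration; only the `autoresearch/{subcommand}-{YYMMDD}-{HHMM}/` layout comes from the invariant above.

```python
from datetime import datetime
from pathlib import Path


def run_dir(subcommand: str, now: datetime, root: Path = Path("autoresearch")) -> Path:
    """Build autoresearch/{subcommand}-{YYMMDD}-{HHMM} for one run.

    Hypothetical helper: the real tool may derive the timestamp differently.
    """
    stamp = now.strftime("%y%m%d-%H%M")  # two-digit year, month, day, then hour and minute
    return root / f"{subcommand}-{stamp}"


print(run_dir("debug", datetime(2025, 3, 9, 14, 5)).as_posix())
# autoresearch/debug-250309-1405
```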
Dispatch (bare $autoresearch)
Parse the invocation in this order:
| Condition | Mode |
|---|---|
| `Metric:` or `Verify:` present | Classic: existing metric loop, unchanged |
| Free-form natural-language goal, no metric/verify | Orchestrator: see Orchestrator section |
| Nothing | Setup wizard: interactive config builder |
| `--classic` flag | Force Classic regardless of goal text |
| `--auto` flag | Force Orchestrator regardless of goal text |
Print a banner on every invocation: [autoresearch] mode: classic | orchestrator | wizard.
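A minimal sketch of the dispatch rules, assuming the invocation arrives as a single argument string. The function names are hypothetical, and two details are assumptions rather than documented behavior: the force flags are checked first because they apply "regardless of goal text", and `--classic` wins if both flags are present.

```python
import re

# Matches a "Metric:" or "Verify:" field anywhere in the invocation text.
_FIELD = re.compile(r"\b(?:Metric|Verify):")


def dispatch_mode(args: str) -> str:
    """Return 'classic', 'orchestrator', or 'wizard' for a bare invocation."""
    tokens = args.split()
    # Assumption: force flags override every other signal.
    if "--classic" in tokens:
        return "classic"
    if "--auto" in tokens:
        return "orchestrator"
    if _FIELD.search(args):
        return "classic"
    # Simplification: any non-flag token counts as goal text.
    if any(not t.startswith("--") for t in tokens):
        return "orchestrator"
    return "wizard"


def banner(args: str) -> str:
    """Format the banner printed on every invocation."""
    return f"[autoresearch] mode: {dispatch_mode(args)}"


print(banner("make the checkout flow faster"))
# [autoresearch] mode: orchestrator
```

The sketch does not handle flags that take a value (for example `--max-cycles 10`), whose value would be misread as goal text; a real parser would consume those first.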
Subcommands
| Command | Does | Default Iterations |
|---|---|---|
| `$autoresearch` | Iterate against a metric: modify → verify → keep/discard | 25 |
| `$autoresearch plan` | Convert a goal into validated Scope, Metric, Verify config | N/A |
| `$autoresearch debug` | Hunt bugs: hypothesize → test → falsify → repeat | 15 |
| `$autoresearch fix` | Crush errors one-by-one until zero remain | 20 |
| `$autoresearch security` | STRIDE + OWASP audit with red-team personas | 15 |
| `$autoresearch ship` | Ship through 8 phases: checklist → dry-run → deploy → verify | N/A |
| `$autoresearch scenario` | Generate edge cases across 12 dimensions | 20 |
| `$autoresearch predict` | 5 expert personas debate before implementation | N/A |
| `$autoresearch learn` | Scout codebase → generate docs or wiki → validate → fix loop | 10 |
| `$autoresearch reason` | Adversarial debate with blind judges until convergence | 8 |
| `$autoresearch probe` | 8 personas interrogate requirements until saturation | 15 |
| `$autoresearch improve` | Research ICP challenges, discover improvements, generate PRDs | 15 |
| `$autoresearch evals` | Analyze iteration results: trends, plateaus, regressions | N/A |
| `$autoresearch regression` | Regression stability gate: baseline vs candidate, verdict STABLE/UNSTABLE | N/A |
Universal Flags
| Flag | Applies To | Purpose |
|---|---|---|
| `Iterations: N` | All looping | Set iteration count |
| `Iterations: unlimited` | All looping | Opt-in unbounded |
| `--evals` | All looping | Mid-loop checkpoints + final summary |
| `--evals-interval N` | All looping | Override checkpoint frequency |
| `--chain <targets>` | All | Sequential handoff after completion |
| `--<subcommand>` | All | Shorthand for `--chain <subcommand>` |
| `--dry-run` | Orchestrator | Print derived config + planned pipeline; no execution |
| `--max-cycles N` | Orchestrator | Hard ceiling on orchestration cycles (default 50) |
| `--classic` | Bare `$autoresearch` | Force Classic metric-loop mode |
| `--auto` | Bare `$autoresearch` | Force Orchestrator mode |
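How the `Iterations:` field interacts with the per-subcommand defaults can be sketched as follows. The function and table names are hypothetical, the defaults are copied from the Subcommands table, and returning `None` for the unbounded opt-in is an illustrative choice, not the tool's documented representation.

```python
import re
from typing import Optional

# Defaults from the Subcommands table; "" stands for bare $autoresearch.
# Subcommands listed as N/A there (plan, ship, predict, evals, regression) are absent.
DEFAULT_ITERATIONS = {
    "": 25, "debug": 15, "fix": 20, "security": 15, "scenario": 20,
    "learn": 10, "reason": 8, "probe": 15, "improve": 15,
}

_ITER = re.compile(r"\bIterations:\s*(unlimited|\d+)\b")


def iteration_budget(subcommand: str, args: str) -> Optional[int]:
    """Return the iteration cap; None means unbounded (explicit opt-in only)."""
    match = _ITER.search(args)
    if match:
        value = match.group(1)
        return None if value == "unlimited" else int(value)
    if subcommand not in DEFAULT_ITERATIONS:
        raise ValueError(f"{subcommand!r} has no default iteration count")
    return DEFAULT_ITERATIONS[subcommand]


print(iteration_budget("fix", "Iterations: 5"))  # 5
print(iteration_budget("reason", ""))            # 8
```

Keeping the unbounded case as an explicit sentinel means a missing field can never be mistaken for "run forever", which matches the bounded-by-default invariant.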
Orchestrator
Activated when a plain-language goal is given without Metric:/Verify:. Classifies the goal into a Goal archetype — see references/orchestrator-routing.md for the archetype table and router decision table.
Two modes based on archetype:
- Orchestration loop — predicate-bearing archetypes (ship-ready, optimize-metric, fix-broken, harden, build-feature, explore). Goal has a mechanical Success predicate; the loop runs until that predicate is met.
- Single-pass dispatch — subjective/terminal archetypes (document, what-to-build, decide-design). Routes once to the fitting subcommand (learn / improve / reason), lets it self-terminate, then reports. No loop, no Plateau, no ship gate.
Orchestration Loop Steps
Backed by scripts/orchestrate.sh (deterministic seam — all routing logic lives there). Subcommands exposed: classify, next-hop, units, plateau, screen-cmd, verdict, validate-state, screen-state-predicate.
- Classify: `scripts/orchestrate.sh classify "<goal>"` → archetype label + mode.
- Derive predicate: reuse `plan` logic to produce a concrete Success predicate: exact shell command + expected output. For `optimize-metric`, run the full plan/wizard derivation internally.
- Confirm: ONE `request_user_input` showing: archetype, mode, concrete predicate (command + expected output), terminal choice (stop-at-verified vs proceed-to-ship). Misclassifications are caught here, not mid-run.
- Round-0 dry-run: prove the predicate command runs and returns a value; safety-screen every derived command via `screen-cmd`; print projected cycle budget. Stop here if `--dry-run`.
- Loop until predicate satisfied:
  a. Assess state via cheap signals (last `handoff.json`, regression verdict, error count) + affected-test verify.
  b. `scripts/orchestrate.sh next-hop orchestrator-state.json` → next subcommand.
  c. Run subcommand (its own bounded inner loop).
  d. Record per-hop outcome ∈ {progressed, no-op, failed, blocked}.
  e. Fold hop's `handoff.json` into `orchestrator-state.json`.
  f. `scripts/orchestrate.sh units` → recompute Units remaining.
- Stop conditions (checked after each hop):
  - Predicate met → ship gate (only if ship is in the pipeline) else `CONVERGED`.
  - `scripts/orchestrate.sh plateau orchestrator-state.json` → true → stop + report `PLATEAU`.
  - Cycles > ceiling (default 50, override `--max-cycles N`) → stop + report `CEILING`.
  - Hop outcome `blocked`/`failed` with no alternative route → checkpoint + stop + report `BLOCKED`.
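The stop conditions can be read as a small decision function evaluated after each hop. The sketch below is illustrative only: the real routing lives in `scripts/orchestrate.sh`, the `CycleStatus` fields and the `SHIP_GATE` label are hypothetical names, and checking the conditions in the order they are listed is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CycleStatus:
    predicate_met: bool          # did the pinned Success predicate pass?
    plateau: bool                # result of the plateau check
    cycles: int                  # orchestration cycles completed so far
    hop_outcome: str             # progressed | no-op | failed | blocked
    has_alternative_route: bool  # can the router try a different subcommand?
    ship_in_pipeline: bool       # was proceed-to-ship chosen at confirmation?


def stop_verdict(s: CycleStatus, max_cycles: int = 50) -> Optional[str]:
    """Return a stop label, or None when the loop should continue."""
    if s.predicate_met:
        # SHIP_GATE is a placeholder label; ship still needs explicit user approval.
        return "SHIP_GATE" if s.ship_in_pipeline else "CONVERGED"
    if s.plateau:
        return "PLATEAU"
    if s.cycles > max_cycles:
        return "CEILING"
    if s.hop_outcome in {"blocked", "failed"} and not s.has_alternative_route:
        return "BLOCKED"
    return None


status = CycleStatus(False, False, 51, "progressed", True, False)
print(stop_verdict(status))  # CEILING
```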
Orchestrator State
orchestrator-state.json — orchestrator-owned, additive. Tracks: goal, archetype, predicate, terminal-choice, units_remaining history, cycle count, per-hop pipeline log with outcomes, current incumbent. Each hop's handoff.json is unchanged (single-hop bridge); the orchestrator reads it and folds it in. Two clearly-owned state objects, no overlap.
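To make the ledger concrete, here is a hedged sketch of a required-fields and coarse-types check over a state object. Every key name and type below is an assumption inferred from the tracked items listed above; the actual schema is whatever `scripts/orchestrate.sh validate-state` enforces.

```python
import json
from typing import Any, List

# Hypothetical key names and coarse types; not the tool's real schema.
REQUIRED_FIELDS = {
    "goal": str,
    "archetype": str,
    "predicate": dict,                # assumed shape: command + expected output
    "terminal_choice": str,
    "units_remaining_history": list,
    "cycle_count": int,
    "pipeline_log": list,             # per-hop entries with outcomes
    "incumbent": str,
}


def validate_state(state: Any) -> List[str]:
    """Return a list of problems; an empty list means the ledger may be routed from."""
    if not isinstance(state, dict):
        return ["ledger is not a JSON object"]
    problems = []
    for key, expected in REQUIRED_FIELDS.items():
        if key not in state:
            problems.append(f"missing field: {key}")
            continue
        value = state[key]
        # bool is a subclass of int in Python; no field here expects a bool.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(f"wrong type for {key}: expected {expected.__name__}")
    return problems


ledger = json.loads("""{
  "goal": "make CI green", "archetype": "fix-broken",
  "predicate": {"command": "make test", "expected": "exit 0"},
  "terminal_choice": "stop-at-verified",
  "units_remaining_history": [7, 4], "cycle_count": 2,
  "pipeline_log": [], "incumbent": "HEAD"
}""")
print(validate_state(ledger))  # []
```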
Orchestrator Safety Invariants
- Never auto-approve ship/deploy/push. The orchestrator never passes `--auto` to `ship`; deploy always requires explicit user approval.
- Data-migration behind anchored DB-URL allowlist. Reuses regression's allowlist: host must be `localhost`/`127.0.0.1`/container hostname, or database name carries a `_test`/`_ci` suffix. Bare substring match does not qualify. Anything else is refused.
- screen-cmd on every derived command: run before the loop starts AND on every command read from a persisted state file on resume. Persisted commands are never trusted; resume re-screens the pinned predicate via `screen-state-predicate` and refuses on `refuse`.
- No un-screened commands mid-loop. The autonomous loop cannot introduce new shell commands that bypass `screen-cmd`.
- Predicate pinned, not re-derived. Round-0 writes the derived Success predicate verbatim into `orchestrator-state.json`; every cycle and every resume reuses that exact string so "done" is reproducible across runs.
- Validate the ledger before routing. `validate-state` gates `orchestrator-state.json` (required fields + coarse types); a malformed ledger is not trusted to route from.
- Independent verify before convergence. High-impact changes accepted on the working signal set `pending_verify`; `next-hop` routes to a `verify` hop (held-out / adversarial check) before `DONE` or ship. The verify hop never auto-approves ship.
- Unknown-units cycles excluded from Plateau counter. A cycle where `units` returns `unknown` (e.g. runner crash) is not counted as zero-progress; repeated `unknown` routes to `BLOCKED`.
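The difference between an anchored allowlist and a bare substring match is easiest to see in code. This is a sketch under stated assumptions, not the regression subcommand's actual check: `db_url_allowed` is a hypothetical name, and "container hostname" is modeled as a caller-supplied set because the document does not say how such hosts are recognized.

```python
from urllib.parse import urlsplit

LOCAL_HOSTS = {"localhost", "127.0.0.1"}
SAFE_DB_SUFFIXES = ("_test", "_ci")


def db_url_allowed(url: str, container_hosts: frozenset = frozenset()) -> bool:
    """Allow only exact local/container hosts or database names with a safe suffix."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    db_name = parts.path.lstrip("/")
    # Anchored: the whole hostname must match, so "localhost.evil.example.com" fails.
    if host in LOCAL_HOSTS or host in container_hosts:
        return True
    # Anchored: the name must end with the suffix, so "test_app_prod" fails.
    return db_name.endswith(SAFE_DB_SUFFIXES)


print(db_url_allowed("postgres://u:p@localhost:5432/app"))              # True
print(db_url_allowed("postgres://u:p@localhost.evil.example.com/app"))  # False
print(db_url_allowed("postgres://u:p@db.prod.example.com/app_test"))    # True
```

A check such as `"localhost" in url` or `"_test" in url` would accept the second URL and names like `test_app_prod`, which is the failure mode the anchored rule is meant to rule out.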