name: swarm description: Dispatch parallel agents. practices:
- microservices
- team-topologies
- mythical-man-month hexagonal_role: supporting consumes:
- implement
- vibe produces:
- .agents/swarm/results/*.json context_rel:
- kind: customer-of
with: crank
skill_api_version: 1
context:
window: fork
intent:
mode: task
sections:
exclude:
- HISTORY intel_scope: full metadata: tier: orchestration dependencies:
- implement
- vibe output_contract: .agents/swarm/results/*.json
Swarm Skill
Spawn isolated agents to execute tasks in parallel. Fresh context per agent (Ralph Wiggum pattern).
Loop position
Move 5 (wave execution) of the operating loop, specifically the parallel-fork primitive /crank invokes. Refuses to spawn parallel agents on a wave that has not cleared the wave-validity check in the slice validation plan: write scopes must be disjoint, no shared migration/contract/CLI surface, integration order declared when it matters, one owner per slice, discard path per slice. Parallelism is explicit ownership, not swarm chaos. Default to sequential when the wave-validity rows are not all green.
Coupled-chain rule (DERIVED-surface collision). Two slices that both regenerate a shared derived surface — cli-command-surface, registry.json, context-map, or the codex manifest — collide even if their source files are disjoint. Such a coupled chain (schema→write-model→export→gate→tests) MUST run sequential: each link branches off the freshly-MERGED prior link's SHA, never a pre-fan snapshot. Reserve parallel waves for provably-disjoint surfaces, cap 4–6.
Integration modes:
- Direct - Create TaskList tasks, invoke
/swarm - Via Crank -
/crankcreates tasks from beads, invokes/swarmfor each wave
Requires multi-agent runtime. Swarm needs a runtime that can spawn parallel subagents. If unavailable, work must be done sequentially in the current session.
Architecture (Mayor-First)
Mayor (this session)
|
+-> Plan: TaskCreate with dependencies
|
+-> Identify wave: tasks with no blockers
|
+-> Select spawn backend (gc if available; runtime-native: Claude teams in Claude runtime, Codex sub-agents in Codex runtime; fallback tasks if unavailable)
|
+-> Assign: TaskUpdate(taskId, owner="worker-<id>", status="in_progress")
|
+-> Spawn workers via selected backend
| Workers receive pre-assigned task, execute atomically
|
+-> Wait for completion (wait() | SendMessage | TaskOutput)
|
+-> Validate: Review changes when complete
|
+-> Cleanup backend resources (close_agent | TeamDelete | none)
|
+-> Repeat: New team + new plan if more work needed
Execution
Read references/execution-steps.md when you need the full procedural detail (Steps 0–6): backend detection, gc dispatch, task typing + file manifests, context briefing, manifest auto-population, advisory bead clustering, wave identification, pre-spawn conflict check, test-file naming validation, multi-wave base-SHA refresh, and worker dispatch.
Every TaskCreate must include metadata.issue_type plus a metadata.files array.
Do not spawn workers with overlapping file manifests into the same shared-worktree wave.
Example Flow
Mayor: "Let's build a user auth system"
1. /plan -> Creates tasks:
#1 [pending] Create User model
#2 [pending] Add password hashing (blockedBy: #1)
#3 [pending] Create login endpoint (blockedBy: #1)
#4 [pending] Add JWT tokens (blockedBy: #3)
#5 [pending] Write tests (blockedBy: #2, #3, #4)
2. /swarm -> Spawns agent for #1 (only unblocked task)
3. Agent #1 completes -> #1 now completed
-> #2 and #3 become unblocked
4. /swarm -> Spawns agents for #2 and #3 in parallel
5. Continue until #5 completes
6. /vibe -> Validate everything
Scope-Escape Protocol
When a worker discovers work outside their assigned scope, they MUST NOT modify files outside their file manifest. Instead, append to .agents/swarm/scope-escapes.jsonl:
{"worker": "<worker-id>", "finding": "<description>", "suggested_files": ["path/to/file"], "timestamp": "<ISO8601>"}
For richer scope-escape narration (status classification, concrete next step, evidence), see references/scope-escape-template.md. Use the template when a single-line JSONL entry is insufficient for the operator to act on.
The lead reviews scope escapes after each wave and creates follow-up tasks as needed.
Key Points
- Runtime-native local mode - Auto-selects the native backend for the current runtime (gc pool, Claude teams, or Codex sub-agents)
- Universal orchestration contract - Same swarm behavior across Claude and Codex sessions
- Pre-assigned tasks - Mayor assigns tasks before spawning; workers never race-claim
- Fresh worker contexts - New sub-agents/teammates per wave preserve Ralph isolation
- Wave execution - Only unblocked tasks spawn
- Mayor orchestrates - You control the flow, workers write results to disk
- Thin results - Workers write
.agents/swarm/results/<id>.json, orchestrator reads files (NOT Task returns or SendMessage content) - Retry via message/input - Use
send_input(Codex) orSendMessage(Claude) for coordination only - Atomic execution - Each worker works until task done
- Graceful degradation - If multi-agent unavailable, work executes sequentially in current session
Workflow Integration
This ties into the full workflow:
/research -> Understand the problem
/plan -> Decompose into beads issues
/crank -> Autonomous epic loop
+-- /swarm -> Execute each wave in parallel
/vibe -> Validate results
/post-mortem -> Extract learnings
Direct use (no beads):
TaskCreate -> Define tasks
/swarm -> Execute in parallel
The knowledge flywheel captures learnings from each agent.
Task Management Commands
# List all tasks
TaskList()
# Mark task complete after notification
TaskUpdate(taskId="1", status="completed")
# Add dependency between tasks
TaskUpdate(taskId="2", addBlockedBy=["1"])
Parameters
| Parameter | Description | Default |
|---|---|---|
--max-workers=N |
Max concurrent workers | 5 |
--from-wave <json-file> |
Load wave from OL hero hunt output (see OL Wave Integration) | - |
--per-task-commits |
Commit per task instead of per wave (for attribution/audit) | Off (per-wave) |
When to Use Swarm
| Scenario | Use |
|---|---|
| Multiple independent tasks | /swarm (parallel) |
| Sequential dependencies | /swarm with blockedBy |
| Mix of both | /swarm spawns waves, each wave parallel |
Why This Works: Ralph Wiggum Pattern
Follows the Ralph Wiggum Pattern: fresh context per execution unit.
- Wave-scoped worker set = spawn workers -> execute -> cleanup -> repeat (fresh context each wave)
- Mayor IS the loop - Orchestration layer, manages state across waves
- Workers are atomic - One task, one spawn, one result
- TaskList as memory - State persists in task status, not agent context
- Filesystem for EVERYTHING - Code artifacts AND result status written to disk, not passed through context
- Backend messaging for signals only - Short coordination signals (under 100 tokens), never work details
Ralph alignment source: ../shared/references/ralph-loop-contract.md.
Integration with Crank
When /crank invokes /swarm: Crank bridges beads to TaskList, swarm executes with fresh-context agents, crank syncs results back.
| You Want | Use | Why |
|---|---|---|
| Fresh-context parallel execution | /swarm |
Each spawned agent is a clean slate |
| Autonomous epic loop | /crank |
Loops waves via swarm until epic closes |
| Just swarm, no beads | /swarm directly |
TaskList only, skip beads |
| RPI progress gates | /ratchet |
Tracks progress; does not execute work |
Codex-Specific Coordination (folded from codex-team)
When the selected backend is Codex (Codex CLI on PATH or Codex sub-agents), swarm follows the same wave model, but Codex agents are headless — they cannot negotiate locks or wait turns. The orchestrator IS the lock manager: all conflict prevention happens before spawning, via file-target analysis.
Backend selection (Codex path), in order:
spawn_agentavailable → Codex experimental sub-agents (preferred)- Codex CLI available → Codex CLI via Bash (
codex exec ...) skilltool read-only (OpenCode) → OpenCode subagents (task(subagent_type="general", ...))- None → fall back to the runtime-native swarm backend / sequential
Pre-flight (CLI backend only): verify which codex, then test the configured default model with codex exec --full-auto -C "$(pwd)" "echo ok". If either fails, fall back to another backend.
Canonical command + flag order:
codex exec --full-auto -C "$(pwd)" -o <output-file> "<prompt>"
Flag order: --full-auto → -C → -o → prompt (insert -m "<model>" before -C only when intentionally pinning a model; otherwise the user's default is used). Valid flags: --full-auto, -m, -C, -o, --json, --output-schema, --add-dir, -s. Do NOT use -q / --quiet (don't exist).
- Cross-project tasks:
--add-dir /path/to/other/repo(repeatable) grants access beyond-C. - Progress monitoring: add
--jsonto stream JSONL events (turn.started/turn.completedwith tokenusage). No events for 60s → agent likely stuck. - Sandbox levels (
-s):read-onlyfor judges/reviewers,workspace-write(default with--full-auto),danger-full-accessonly in externally-sandboxed environments. Prefer-s read-onlyfor review/analysis tasks.
File-target strategy (apply before spawning Codex agents):
| File Overlap | Strategy | Action |
|---|---|---|
| All tasks touch same file | Merge | Combine into 1 agent with all fixes |
| Some tasks share files | Multi-wave | Shared-file tasks go sequential across waves |
| No overlap | Parallel | Spawn all agents at once |
For multi-wave Codex runs, the lead reads prior-wave results, then injects a summary of what changed (not raw diffs) into the next wave's prompts. Limits: ≤6 agents/wave, 120s default timeout, ≤3 waves (reconsider decomposition beyond that). Output dir: .agents/codex-team/ (or the standard .agents/swarm/results/). Concrete tool calls: references/backend-codex-subagents.md.
Codex troubleshooting: codex not found → npm i -g @openai/codex or fall back; default model unavailable → verify the echo ok pre-flight or pin with -m; empty/missing output → ensure the -o directory exists and is writable.
OL Wave Integration
Read references/ol-wave-integration.md when you invoke /swarm --from-wave <json-file> — covers pre-flight ol CLI check, input JSON format, task creation from wave entries, completion backflow via ol hero ratchet, and example flow.
References
- Executable acceptance: references/swarm.feature — wave-validity gate, fresh-context workers, conflict-free ownership, results+cleanup (soc-qk4b)
- Local Mode Details:
skills/swarm/references/local-mode.md - Validation Contract:
skills/swarm/references/validation-contract.md
Examples
Building a User Auth System
User says: /swarm
What happens:
- Agent identifies unblocked tasks from TaskList (e.g., "Create User model")
- Agent selects spawn backend using runtime-native priority (Claude session -> Claude teams; Codex session -> Codex sub-agents)
- Agent spawns worker for task #1, assigns ownership via TaskUpdate
- Worker completes, team lead validates changes
- Agent identifies next wave (tasks #2 and #3 now unblocked)
- Agent spawns two workers in parallel for Wave 2
Result: Multi-wave execution with fresh-context workers per wave, zero race conditions.
Direct Swarm Without Beads
User says: Create three tasks for API refactor, then /swarm
What happens:
- User creates TaskList tasks with TaskCreate
- Agent calls
/swarmwithout beads integration - Agent identifies parallel tasks (no dependencies)
- Agent spawns all three workers simultaneously
- Workers execute atomically, report to team lead via SendMessage or task completion
- Team lead validates all changes, commits once per wave
Result: Parallel execution of independent tasks using TaskList only.
Worktree Isolation (Multi-Epic Dispatch)
Read references/shared-checkout-discipline.md first when the target checkout (~/dev/<repo>) is shared with peer agents — it documents when worktrees are mandatory (vs. optional) and the three failure modes (branch-deletion data loss, swarm attribution confounded, destructive-recovery temptation) that motivate the discipline.
Read references/worktree-isolation.md when you need to dispatch workers across multiple epics or run waves with overlapping files — covers isolation semantics per backend, effort levels, post-spawn verification, manual worktree creation/routing/merge-back, the Merge Arbiter Protocol, cleanup, and the --worktrees / --no-worktrees parameters.
Worktree reaping (teardown). After a worker's PR is confirmed MERGED (gh pr view --json state = MERGED), reap its tree: git worktree remove <path> --force then git worktree prune. Leave unmerged-PR worktrees intact. Target zero orphaned worktrees — bound the live count to in-flight PRs. (The committed disposition gate scans the tracked file set, not live on-disk worktrees, so this teardown is the operational backstop.)
Troubleshooting
Read references/troubleshooting.md for full diagnostics.
| Problem | See |
|---|---|
| Worktree isolation did not engage | references/troubleshooting.md |
| Workers produce file conflicts | references/troubleshooting.md |
| Team creation fails | references/troubleshooting.md |
| Codex agents unavailable | references/troubleshooting.md |
| Workers timeout or hang | references/troubleshooting.md |
| gc backend detected but workers unresponsive | references/troubleshooting.md |
| Tasks assigned but workers never spawn | references/troubleshooting.md |
Related skills
/using-atm— out-of-session ATM substrate when a swarm needs persistent panes and human attach/steer.
Reference Documents
- references/shared-checkout-discipline.md
- references/agent-genie-coordination-contract.md — Eight-field contract each parallel stream declares before claiming work- references/conflict-recovery.md
- references/cold-start-contexts.md
- references/backend-background-tasks.md
- references/backend-claude-teams.md
- references/backend-codex-subagents.md
- references/backend-inline.md
- references/claude-code-latest-features.md
- references/execution-steps.md
- references/local-mode.md
- references/ol-wave-integration.md
- references/ralph-loop-contract.md
- references/troubleshooting.md
- references/validation-contract.md
- references/worker-pitfalls.md
- references/worker-specs.md
- references/worktree-isolation.md
- ../shared/references/backend-background-tasks.md
- ../shared/references/backend-claude-teams.md
- ../shared/references/backend-codex-subagents.md
- ../shared/references/backend-inline.md
- ../shared/references/claude-code-latest-features.md
- references/pre-spawn-friction-gates.md
- references/scope-escape-template.md
- references/worker-pre-task-checks.md
- ../shared/references/ralph-loop-contract.md