name: choosing-swarm-patterns
description: Use when coordinating multiple AI agents with Agent Relay's workflow engine and need to pick the right orchestration pattern - covers the 10 core patterns (fan-out, pipeline, hub-spoke, consensus, mesh, handoff, cascade, dag, debate, hierarchical) plus 14 specialized ones, with decision framework and accurate workflow/YAML examples.
Choosing Swarm Patterns
Overview
The Agent Relay workflow engine (@relayflows/core) supports 24 swarm patterns via a single swarm.pattern field. Patterns are configured declaratively in YAML or programmatically via the workflow() fluent builder; there are no standalone fanOut(...) / hubAndSpoke(...) helpers. Pick the simplest pattern that solves the problem, and add complexity only when that pattern proves insufficient.
Two ways to run a pattern
1. YAML (portable):
import { runWorkflow } from '@relayflows/core';

const run = await runWorkflow('workflows/feature-dev.yaml', {
  vars: { task: 'Add OAuth login' },
});
2. Fluent builder (programmatic):
import { workflow } from '@relayflows/core';

const run = await workflow('feature-dev')
  .pattern('hub-spoke')
  .channel('swarm-feature-dev')
  .agent('lead', { cli: 'claude', role: 'lead' })
  .agent('developer', { cli: 'codex', role: 'worker', interactive: false })
  .step('plan', { agent: 'lead', task: 'Plan {{task}}' })
  .step('implement', { agent: 'developer', task: 'Implement: {{steps.plan.output}}', dependsOn: ['plan'] })
  .run();
Both paths hit the same WorkflowRunner.
Quick Decision Framework
Is the task independent per agent?
  YES → fan-out (parallel workers, hub collects)
Does each step need the previous step's output?
  YES → Is it strictly linear?
    YES → pipeline
    NO  → dag (parallel where possible, `dependsOn` edges)
Does a coordinator need to stay alive and adapt?
  YES → hub-spoke (single-level hub + workers)
        hierarchical (structurally identical in current impl; use for naming/intent)
Is the task about making a decision?
  YES → Do agents need to argue opposing sides?
    YES → debate (adversarial, full mesh)
    NO  → consensus (cooperative, full mesh + coordination.consensusStrategy)
Does the right specialist emerge during processing?
  YES → handoff (sequential chain, one active at a time)
Do all agents need to freely collaborate?
  YES → mesh (full peer-to-peer edges)
Is cost the primary concern?
  YES → cascade (chain of increasingly capable agents; each step's prompt
        decides whether to pass through or redo the prior output)
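The framework above can be read as an ordered checklist. The sketch below encodes it as a plain function; the `TaskTraits` flags and `choosePattern` are hypothetical names for illustration, not part of @relayflows/core, and treating the questions as strictly ordered (first match wins) is an assumption.

```typescript
// Hypothetical trait flags describing a task; not an @relayflows/core type.
interface TaskTraits {
  independentPerAgent?: boolean;
  needsPreviousOutput?: boolean;
  strictlyLinear?: boolean;
  needsLiveCoordinator?: boolean;
  isDecision?: boolean;
  adversarial?: boolean;
  specialistEmerges?: boolean;
  freeCollaboration?: boolean;
  costSensitive?: boolean;
}

type CorePattern =
  | 'fan-out' | 'pipeline' | 'dag' | 'hub-spoke' | 'debate'
  | 'consensus' | 'handoff' | 'mesh' | 'cascade';

// Walks the questions in the order they appear above; the first match wins.
function choosePattern(t: TaskTraits): CorePattern | undefined {
  if (t.independentPerAgent) return 'fan-out';
  if (t.needsPreviousOutput) return t.strictlyLinear ? 'pipeline' : 'dag';
  if (t.needsLiveCoordinator) return 'hub-spoke';
  if (t.isDecision) return t.adversarial ? 'debate' : 'consensus';
  if (t.specialistEmerges) return 'handoff';
  if (t.freeCollaboration) return 'mesh';
  if (t.costSensitive) return 'cascade';
  return undefined; // no question matched; start with the simplest pattern
}
```

The returned string is what would go into swarm.pattern (YAML) or .pattern(...) (builder).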
Pattern Reference (Core 10)
| # | Pattern | Topology (actual edges) | Best For |
|---|---|---|---|
| 1 | fan-out | Hub broadcasts to N workers; workers reply to hub only | Independent subtasks (reviews, research, tests) |
| 2 | pipeline | Linear chain (agent_i → agent_{i+1}) | Ordered stages (design → implement → test) |
| 3 | hub-spoke | Hub ↔ spokes (bidirectional); no spoke-to-spoke | Dynamic coordination, lead reviews/adjusts |
| 4 | consensus | Full mesh; decision via coordination.consensusStrategy | Architecture decisions, approval gates |
| 5 | mesh | Full mesh (every agent ↔ every other) | Brainstorming, collaborative debugging |
| 6 | handoff | Chain; passes control forward | Triage, specialist routing |
| 7 | cascade | Chain of dependsOn steps; all run on success, downstream skipped on upstream failure (no built-in "fall through") | Cost optimization: cheap first, each step's prompt passes through or redoes |
| 8 | dag | Edges from step dependsOn | Mixed dependencies, parallel where possible |
| 9 | debate | Full mesh (same topology as mesh; roles drive behavior) | Rigorous adversarial examination |
| 10 | hierarchical | Hub + subordinates (single-level in current impl) | Large teams; semantic distinction from hub-spoke |
Heads up: hierarchical resolves to the same edge structure as hub-spoke in coordinator.ts:313-319. Multi-level tree topology is not currently implemented; use the pattern name for intent, but expect the same runtime graph.
Additional Patterns (role-driven)
These 14 additional patterns exist in SwarmPattern (types.ts:114-139). The coordinator has role-based auto-selection heuristics (coordinator.ts:51-165), but they only fire when swarm.pattern is omitted — YAML validation requires it (runner.ts:2105-2117), so auto-selection is effectively a programmatic-API feature. In YAML, set swarm.pattern explicitly.
Topology is still resolved per-pattern once selected; the "Roles the topology keys off" column lists what the coordinator looks for to shape edges (per coordinator.ts:250-450):
| Pattern | Roles the topology keys off | Topology |
|---|---|---|
| map-reduce | mapper + reducer | coordinator → mappers → reducers → coordinator |
| scatter-gather | none | hub → workers → hub |
| supervisor | supervisor | supervisor ↔ workers |
| reflection | critic or reviewer (auto-select uses critic only) | producers → critic → producers (loop) |
| red-team | attacker/red-team + defender/blue-team | adversarial mesh with optional judges |
| verifier | verifier | producers → verifiers → back to producers |
| auction | auctioneer | auctioneer → bidders → auctioneer |
| escalation | tier-* | tiered chain, escalate up / report down |
| saga | saga-orchestrator, compensate-handler | orchestrator ↔ participants |
| circuit-breaker | primary + fallback/backup | try primary, fallback on failure |
| blackboard | blackboard / shared-workspace | shared state hub |
| swarm | hive-mind / swarm-agent | stigmergy-style |
| competitive | none (declared explicitly) | independent parallel implementations + judge |
| review-loop | implement* + 2+ reviewer* | implementer ↔ reviewers |
Structured Squad Review Loop
For serious implementation work, especially workflow generation or product-contract changes, prefer a composite squad-review-loop recipe over a plain single implementer plus final reviewer. This is a workflow authoring recipe built from existing patterns, not a separate SDK enum unless the local runner has added one.
Use this when the fastest reliable path is small teams of 2-3 agents working in parallel with live feedback:
- Split the work into bounded implementation squads. Each squad owns a non-overlapping file or subsystem scope.
- Give each squad an implementer plus a shadow/review partner. The shadow follows the implementer in real time, checks alignment with the spec, and posts concise feedback before the work drifts.
- Require the implementer to self-reflect before external review: compare the final diff against the spec, AGENTS.md / CLAUDE.md, recent local conventions, tests, and declared non-goals.
- Run an independent self-review/fresh-eyes agent that reads the actual files and recent repo context, not just the chat transcript.
- Send that review back to the implementer for one repair round.
- After squads converge, run a final two-agent review team, usually one Claude reviewer and one Codex reviewer, independently. They compare notes, merge findings, and produce one final verdict.
- Spawn fresh fix agents for final-review findings. Those fix agents self-reflect, then the final reviewers re-check the post-fix state until the spec is fully satisfied or a blocker is documented.
Pattern selection for this recipe:
- Use supervisor or hub-spoke when a lead needs to coordinate live squads.
- Use review-loop when the main risk is code quality and feedback iteration.
- Use reflection when critic feedback should loop directly back to producers.
- Use verifier when completion evidence matters more than design debate.
- Use competitive only when independent alternative implementations are useful; otherwise split by ownership scope.
Keep squads small. Two or three agents per squad is usually the useful limit: implementer, shadow/reviewer, and optionally test/validation owner. More agents belong in separate squads or in the final review team.
Pattern Details
All examples below use real API shapes (WorkflowBuilder / YAML), verified against @relayflows/core's builder.d.ts and schema.d.ts.
YAML fragments vs complete configs: The per-pattern YAML snippets below are fragments that show only the pattern-relevant shape. A runnable YAML file also requires version: "1.0" and name: <id> at the top (runner.ts:2105-2117). See the Complete YAML Example for the full structure.

Topology edges exclude interactive: false agents. resolveTopology (coordinator.ts:218-237) drops non-interactive agents from the message graph; they run as one-shot subprocesses with no relay connection. Topology claims like "hub ↔ spokes" describe the interactive-agent edges; workers marked interactive: false are spawned and collected via stdout, not via relay messages.
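To make the filtering rule concrete, here is a minimal sketch of how hub-spoke edges could be derived once non-interactive agents are dropped. It mirrors the behaviour described above but is not the coordinator's source; `AgentDef` and `hubSpokeEdges` are hypothetical names, and the hub choice (role lead, else first agent) follows the description in the hub-spoke section.

```typescript
// Hypothetical, simplified agent shape for illustration.
interface AgentDef {
  name: string;
  role?: string;
  interactive?: boolean; // treated as true unless explicitly false
}

// Returns directed [from, to] edges for a single-level hub-spoke graph.
function hubSpokeEdges(agents: AgentDef[]): Array<[string, string]> {
  // Non-interactive agents never join the message graph.
  const live = agents.filter((a) => a.interactive !== false);
  const hub = live.find((a) => a.role === 'lead') ?? live[0];
  if (!hub) return [];
  const edges: Array<[string, string]> = [];
  for (const a of live) {
    if (a.name === hub.name) continue;
    // Bidirectional hub <-> spoke; no spoke-to-spoke edges.
    edges.push([hub.name, a.name], [a.name, hub.name]);
  }
  return edges;
}
```

With a lead, one interactive worker, and one interactive: false worker, only the lead and the interactive worker end up connected.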
1. fan-out — Parallel Workers
await workflow('review')
  .pattern('fan-out')
  .agent('lead', { cli: 'claude', role: 'lead' })
  .agent('auth-rev', { cli: 'claude', role: 'worker', interactive: false })
  .agent('db-rev', { cli: 'claude', role: 'worker', interactive: false })
  .step('review-auth', { agent: 'auth-rev', task: 'Review auth.ts' })
  .step('review-db', { agent: 'db-rev', task: 'Review db.ts' })
  .run();
Workers run independently; hub aggregates. No inter-worker edges.
2. pipeline — Sequential Stages
swarm: { pattern: pipeline }
agents:
  - { name: designer, cli: claude }
  - { name: implementer, cli: codex, interactive: false }
  - { name: tester, cli: codex, interactive: false }
workflows:
  - name: build
    steps:
      - name: design
        agent: designer
        task: 'Design the API schema'
        verification: { type: output_contains, value: DONE }
      - name: implement
        agent: implementer
        dependsOn: [design]
        task: 'Implement: {{steps.design.output}}'
      - { name: test, agent: tester, dependsOn: [implement], task: 'Write integration tests' }
Each stage receives the previous stage's output via {{steps.<name>.output}}. Halts on step failure unless onError: retry / continue.
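As a rough illustration of how a task template picks up an upstream result, the sketch below substitutes {{name}} and {{steps.<name>.output}} placeholders in a string. `renderTask` is a hypothetical helper written for this document; the runner's actual interpolation rules (escaping, missing values, nested keys) may differ.

```typescript
// Hypothetical helper: replaces {{var}} and {{steps.<name>.output}} placeholders.
function renderTask(
  template: string,
  vars: Record<string, string>,
  stepOutputs: Record<string, string>,
): string {
  return template.replace(/\{\{\s*([\w.-]+)\s*\}\}/g, (match: string, key: string) => {
    const stepMatch = /^steps\.([\w-]+)\.output$/.exec(key);
    if (stepMatch) {
      const stepName = stepMatch[1] ?? '';
      // Assumption: unknown placeholders are left untouched rather than blanked.
      return stepOutputs[stepName] ?? match;
    }
    return vars[key] ?? match;
  });
}

// Example: the implement stage sees the design stage's output inline.
const implementTask = renderTask(
  'Implement: {{steps.design.output}}',
  {},
  { design: 'POST /login returns a session token' },
);
```

Because the downstream prompt contains the upstream output verbatim, long outputs make long prompts; asking upstream steps for concise, structured output keeps later stages focused.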
3. hub-spoke — Persistent Coordinator
await workflow('api-build')
  .pattern('hub-spoke')
  .channel('swarm-api')
  .agent('lead', { cli: 'claude', role: 'lead' })
  .agent('db-worker', { cli: 'claude', role: 'worker' }) // interactive by default, so the hub can DM it
  .agent('api-worker', { cli: 'claude', role: 'worker' }) // interactive by default, so the hub can DM it
  .step('models', { agent: 'db-worker', task: 'Build database models' })
  .step('routes', { agent: 'api-worker', task: 'Build route handlers', dependsOn: ['models'] })
  .step('review', { agent: 'lead', task: 'Review everything', dependsOn: ['routes'] })
  .run();
Hub (picked via role: lead or first agent) stays on the channel and direct-messages interactive workers via the flat send_dm MCP tool, often exposed by workflow prompts as mcp__relaycast__send_dm.
Don't set interactive: false on a hub-spoke worker if you want it to receive coordination DMs; resolveTopology strips non-interactive agents from the message graph (coordinator.ts:218-237). Use interactive: false only when the worker is a one-shot subprocess whose stdout you collect via {{steps.X.output}} without any mid-run coordination.
4. consensus — Cooperative Voting
swarm: { pattern: consensus }
agents:
  - { name: perf, cli: claude, role: reviewer }
  - { name: dx, cli: claude, role: reviewer }
  - { name: sec, cli: claude, role: reviewer }
coordination:
  consensusStrategy: majority # declarative marker: majority | unanimous | quorum
  votingThreshold: 0.66
workflows:
  - name: decide
    steps:
      - { name: evaluate-perf, agent: perf, task: 'Evaluate perf of Fastify migration' }
      - { name: evaluate-dx, agent: dx, task: 'Evaluate DX of Fastify migration' }
      - { name: evaluate-sec, agent: sec, task: 'Evaluate security of Fastify migration' }
Full-mesh topology. Caveat: coordination.consensusStrategy and votingThreshold are declared in CoordinationConfig (types.ts:768-772), but the runner has no built-in vote-tallying logic; the fields only influence coordinator auto-selection (coordinator.ts:63-64). To implement voting, aggregate the step outputs in a downstream lead/judge step that reads each evaluation's output, for example {{steps.evaluate-perf.output}}, {{steps.evaluate-dx.output}}, and {{steps.evaluate-sec.output}}.
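Since tallying has to happen outside the engine, the aggregation logic might look like the sketch below, for instance inside a script that a downstream step runs over the collected outputs. Everything here is an assumption made for illustration: the "VOTE: approve|reject" line convention, the `parseVote` and `tally` helpers, and the meaning given to each strategy name.

```typescript
type Strategy = 'majority' | 'unanimous' | 'quorum';
type Vote = 'approve' | 'reject' | 'abstain';

// Assumed convention (not enforced by the runner): each reviewer's output
// contains a line such as "VOTE: approve" or "VOTE: reject".
function parseVote(output: string): Vote {
  const m = /^VOTE:\s*(approve|reject)\s*$/im.exec(output);
  if (!m) return 'abstain';
  return (m[1] ?? '').toLowerCase() === 'approve' ? 'approve' : 'reject';
}

// Assumed semantics: majority = more than half approve, unanimous = all approve,
// quorum = approval share reaches the threshold. Abstentions count against.
function tally(outputs: string[], strategy: Strategy, threshold = 0.5): boolean {
  const votes = outputs.map(parseVote);
  const total = votes.length;
  if (total === 0) return false;
  const approvals = votes.filter((v) => v === 'approve').length;
  if (strategy === 'unanimous') return approvals === total;
  if (strategy === 'quorum') return approvals / total >= threshold;
  return approvals > total / 2; // majority
}
```

Asking each reviewer to end with an explicit vote line keeps the judge step's job mechanical, whether a script or an agent performs it.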
5. mesh — Peer Collaboration
await workflow('debug-auth')
  .pattern('mesh')
  .channel('swarm-debug')
  .agent('logs', { cli: 'claude' })
  .agent('code', { cli: 'claude' })
  .agent('repro', { cli: 'claude' })
  .step('logs', { agent: 'logs', task: 'Check server logs' })
  .step('code', { agent: 'code', task: 'Review auth code' })
  .step('repro', { agent: 'repro', task: 'Write repro test' })
  .run();
Every agent ↔ every other agent. Use for collaborative exploration without hierarchy.
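A quick way to see why full mesh stops scaling is to count peer links. The arithmetic below is standard graph counting, not engine code; the function names are illustrative.

```typescript
// Undirected peer links in a full mesh of n agents: n * (n - 1) / 2.
function fullMeshLinks(n: number): number {
  return (n * (n - 1)) / 2;
}

// Links in a single-level hub-spoke topology: one per spoke.
function hubSpokeLinks(n: number): number {
  return Math.max(n - 1, 0);
}

// 3 agents: 3 links; 5 agents: 10 links; 10 agents: 45 links (hub-spoke: 9).
const meshGrowth = [3, 5, 10].map((n) => ({
  agents: n,
  mesh: fullMeshLinks(n),
  hubSpoke: hubSpokeLinks(n),
}));
```

Mesh links grow quadratically while hub-spoke links grow linearly, which is the reason to reserve mesh for small groups.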
6. handoff — Dynamic Routing
swarm: { pattern: handoff }
agents:
  - { name: triage, cli: claude }
  - { name: billing, cli: claude }
  - { name: tech, cli: claude }
workflows:
  - name: support
    steps:
      - { name: triage, agent: triage, task: 'Triage: {{request}}' }
      - { name: billing, agent: billing, dependsOn: [triage], task: 'Handle billing' }
      - { name: tech, agent: tech, dependsOn: [triage], task: 'Handle tech issues' }
Chain passes control forward. Note: The runner doesn't support "route to one branch and skip the others" declaratively — dependsOn steps all run when their dependencies complete, and skipping is only triggered by upstream failure (runner.ts:7057-7088). For true pick-one routing, have the triage step emit a routing token in its output and let each downstream step's prompt check {{steps.triage.output}} and no-op if it doesn't match.
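The routing-token workaround can be sketched as two small helpers: one that builds a guarded downstream task, and one that extracts the token if you post-process outputs yourself. The "ROUTE: <branch>" line format, the "SKIPPED" reply, and both function names are assumptions chosen for this example; the engine does not define them.

```typescript
// Assumed convention: triage ends its output with a line "ROUTE: <branch>".
function extractRoute(triageOutput: string): string | undefined {
  const m = /^ROUTE:\s*([\w-]+)\s*$/m.exec(triageOutput);
  return m ? m[1] : undefined;
}

// Builds a downstream task that tells the agent to no-op unless it was chosen.
// The {{steps.triage.output}} placeholder is left for the runner to fill in.
function guardedTask(branch: string, task: string): string {
  return [
    'Routing decision from triage:',
    '{{steps.triage.output}}',
    `If the line "ROUTE: ${branch}" is not present above, reply "SKIPPED" and stop.`,
    `Otherwise: ${task}`,
  ].join('\n');
}

const billingTask = guardedTask('billing', 'Handle billing');
```

Every branch still spawns and runs, so this saves work inside each agent rather than avoiding the spawn itself.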
7. cascade — Cost-Aware Fallthrough
await workflow('answer')
  .pattern('cascade')
  .agent('haiku', { cli: 'claude', model: 'claude-haiku-4-5-20251001' })
  .agent('sonnet', { cli: 'claude', model: 'claude-sonnet-4-6' })
  .agent('opus', { cli: 'claude', model: 'claude-opus-4-7' })
  .step('try-haiku', { agent: 'haiku', task: '{{question}}' })
  .step('try-sonnet', {
    agent: 'sonnet',
    task: 'If this is a complete answer, echo it verbatim. Otherwise answer anew:\n{{steps.try-haiku.output}}',
    dependsOn: ['try-haiku'],
  })
  .step('try-opus', {
    agent: 'opus',
    task: 'Final-tier answer, using prior attempts for context:\n{{steps.try-sonnet.output}}',
    dependsOn: ['try-sonnet'],
  })
  .run();
Important: cascade only sets edge topology. The runner has no skip-on-success logic for the cascade pattern — a chain of dependsOn steps all execute in order on success, and failed upstream steps mark their dependents as skipped (step-executor.ts:329-334, runner.ts:7057-7088). So a verification-gated first step won't "fall through" to later steps on failure, and won't skip them on success either. The idiom above delegates the escalation decision to the prompt of each downstream step (read the upstream answer and pass-through or redo). No confidence-score parsing exists in-engine.
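The failure behaviour described above can be modelled in a few lines. This is an illustrative sketch, not the runner's implementation: `CascadeStep` and `skippedAfterFailure` are hypothetical, and the sketch assumes skips propagate transitively through dependsOn.

```typescript
// Hypothetical, simplified step shape.
interface CascadeStep {
  name: string;
  dependsOn?: string[];
}

// Returns the steps that would be skipped if `failed` fails, in declaration order.
function skippedAfterFailure(steps: CascadeStep[], failed: string): string[] {
  const dead = new Set<string>([failed]);
  let changed = true;
  // Repeat until no new step is found that depends on a failed/skipped step.
  while (changed) {
    changed = false;
    for (const s of steps) {
      if (dead.has(s.name)) continue;
      if ((s.dependsOn ?? []).some((d) => dead.has(d))) {
        dead.add(s.name);
        changed = true;
      }
    }
  }
  dead.delete(failed); // the failed step itself is failed, not skipped
  return steps.filter((s) => dead.has(s.name)).map((s) => s.name);
}

const cascadeSteps: CascadeStep[] = [
  { name: 'try-haiku' },
  { name: 'try-sonnet', dependsOn: ['try-haiku'] },
  { name: 'try-opus', dependsOn: ['try-sonnet'] },
];
```

Under these assumptions a failing first tier takes the stronger tiers down with it, which is the opposite of a fallthrough and the reason the escalation decision lives in the prompts.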
8. dag — Directed Acyclic Graph
await workflow('fullstack')
  .pattern('dag')
  .maxConcurrency(3)
  .agent('dev', { cli: 'codex', role: 'worker' })
  .step('scaffold', { agent: 'dev', task: 'Create project scaffold' })
  .step('frontend', { agent: 'dev', task: 'Build React UI', dependsOn: ['scaffold'] })
  .step('backend', { agent: 'dev', task: 'Build API', dependsOn: ['scaffold'] })
  .step('integrate', { agent: 'dev', task: 'Wire together', dependsOn: ['frontend', 'backend'] })
  .run();
Runner derives execution waves from dependsOn; independent nodes run in parallel up to swarm.maxConcurrency. When swarm.pattern is omitted (programmatic API only), the dag pattern is auto-selected if any step has dependsOn.
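Execution waves are a layered topological sort: each wave holds the steps whose dependencies all finished in earlier waves. The sketch below shows the general technique under that reading; `DagStep` and `executionWaves` are hypothetical names, and the sketch ignores maxConcurrency, which would only cap how many steps of one wave run at once.

```typescript
// Hypothetical, simplified step shape.
interface DagStep {
  name: string;
  dependsOn?: string[];
}

// Groups steps into waves; every step in a wave depends only on earlier waves.
function executionWaves(steps: DagStep[]): string[][] {
  const done = new Set<string>();
  const waves: string[][] = [];
  let remaining = [...steps];
  while (remaining.length > 0) {
    const ready = remaining.filter((s) => (s.dependsOn ?? []).every((d) => done.has(d)));
    if (ready.length === 0) {
      // Nothing can run: a dependency cycle or a reference to an unknown step.
      throw new Error('Unresolvable dependencies: ' + remaining.map((s) => s.name).join(', '));
    }
    waves.push(ready.map((s) => s.name));
    for (const s of ready) done.add(s.name);
    remaining = remaining.filter((s) => !done.has(s.name));
  }
  return waves;
}

const fullstackWaves = executionWaves([
  { name: 'scaffold' },
  { name: 'frontend', dependsOn: ['scaffold'] },
  { name: 'backend', dependsOn: ['scaffold'] },
  { name: 'integrate', dependsOn: ['frontend', 'backend'] },
]);
// fullstackWaves: [['scaffold'], ['frontend', 'backend'], ['integrate']]
```

Sketching the waves before running a workflow is a cheap way to spot accidental serialization, such as a dependsOn edge that forces frontend to wait for backend.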
9. debate — Adversarial Refinement
Debate currently shares the full-mesh topology with mesh and consensus. Differentiate via roles + task prompts:
swarm: { pattern: debate }
agents:
  - { name: pro, cli: claude, role: debater, task: 'Argue FOR monorepo' }
  - { name: con, cli: claude, role: debater, task: 'Argue FOR polyrepo' }
  - { name: judge, cli: claude, role: judge, task: 'Decide after 3 rounds' }
coordination:
  barriers:
    - { name: debate-done, waitFor: [pro-round-3, con-round-3] }
Drive rounds and verdicts through the agent's system prompt/task, not a dedicated maxRounds knob — there isn't one at the pattern level.
10. hierarchical — Multi-Level (structurally hub-spoke today)
await workflow('large-team')
  .pattern('hierarchical')
  .agent('lead', { cli: 'claude', role: 'lead' })
  .agent('fe-coord', { cli: 'claude', role: 'coordinator' })
  .agent('be-coord', { cli: 'claude', role: 'coordinator' })
  .agent('fe-dev', { cli: 'codex', role: 'worker', interactive: false })
  .agent('be-dev', { cli: 'codex', role: 'worker', interactive: false })
  .step('plan', { agent: 'lead', task: 'Coordinate full-stack app' })
  .step('fe-plan', { agent: 'fe-coord', task: 'Manage frontend', dependsOn: ['plan'] })
  .step('be-plan', { agent: 'be-coord', task: 'Manage backend', dependsOn: ['plan'] })
  .step('fe-impl', { agent: 'fe-dev', task: 'Build components', dependsOn: ['fe-plan'] })
  .step('be-impl', { agent: 'be-dev', task: 'Build API', dependsOn: ['be-plan'] })
  .run();
The coordinator/worker distinction is expressed in the step dependsOn graph, not in the topology. Agent edges collapse to single-level hub-spoke.
Verification & Completion Signals
An agent step can complete in several ways (runner.ts:5353-5395, runner.ts:4527-4538):
- Verification pass: the step declares a verification block and the output satisfies it.
- Clean process exit: the agent exits 0 with no verification configured.
- Evidence-based: channel posts, file changes, or coordination signals trigger completion.
- Owner decision: a lead-role agent posts COMPLETE / INCOMPLETE_RETRY / INCOMPLETE_FAIL for the step.
Verification block shape:
verification:
  type: output_contains # or: exit_code | file_exists | custom
  value: DONE # or: PLAN_COMPLETE, IMPLEMENTATION_COMPLETE, REVIEW_COMPLETE
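For intuition, an output_contains check can be thought of as a substring test over the captured step output. The sketch below assumes a plain, case-sensitive match; that matching rule and the names `OutputContains` and `outputContainsPasses` are assumptions for illustration, not the runner's documented behaviour.

```typescript
// Hypothetical mirror of the verification block shape shown above.
interface OutputContains {
  type: 'output_contains';
  value: string;
}

// Assumption: a plain, case-sensitive substring match over the step output.
function outputContainsPasses(v: OutputContains, stepOutput: string): boolean {
  return stepOutput.includes(v.value);
}

const planCheck: OutputContains = { type: 'output_contains', value: 'PLAN_COMPLETE' };
const planPassed = outputContainsPasses(planCheck, 'Plan written to plan.md\nPLAN_COMPLETE');
```

If the match is literal, the task prompt should tell the agent to print the exact token, since a paraphrase such as "plan complete" would not satisfy the check.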
Conventional signals expected by the @relayflows/core runner:
- ACK: ... (received a task)
- DONE: ... (task complete)
The runner captures PTY chunks as step output and also records channel posts + file changes as StepCompletionEvidence. Legacy fallback: a file at .relay/summaries/{stepName}.md is read if PTY output is empty.
Agent Relay MCP - Correct Tool Names
The old category-expanded names are wrong. Current Agent Relay MCP tools are
flat names. In a client that decorates MCP tools, the prefix comes from the
configured server key; workflow prompts commonly show mcp__relaycast__send_dm,
while an agent-relay server key may expose mcp__agent_relay__send_dm.
| Purpose | Canonical tool | Common workflow-prefixed form |
|---|---|---|
| Send DM to another agent | send_dm | mcp__relaycast__send_dm |
| Check inbox | check_inbox | mcp__relaycast__check_inbox |
| List agents | list_agents | mcp__relaycast__list_agents |
| Post to a channel | post_message | mcp__relaycast__post_message |
| Reply in a thread | reply_to_thread | mcp__relaycast__reply_to_thread |
| Spawn sub-agent | add_agent | mcp__relaycast__add_agent |
| Remove sub-agent | remove_agent | mcp__relaycast__remove_agent |
interactive: false agents run as non-interactive subprocesses with no relay connection. They must not call Relay MCP tools.
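The relationship between a flat tool name and its decorated form can be sketched as a small helper. The scheme below is inferred only from the two prefixed names mentioned in this section (relaycast and agent_relay); `prefixedToolName` is hypothetical, and other MCP clients may decorate names differently.

```typescript
// Assumed decoration scheme: mcp__<server key>__<tool>, with characters outside
// [A-Za-z0-9_] in the server key replaced by "_" (so "agent-relay" -> "agent_relay").
function prefixedToolName(serverKey: string, tool: string): string {
  const key = serverKey.replace(/[^A-Za-z0-9_]/g, '_');
  return `mcp__${key}__${tool}`;
}

const relaycastDm = prefixedToolName('relaycast', 'send_dm');
const agentRelayDm = prefixedToolName('agent-relay', 'send_dm');
```

When writing prompts, prefer the flat canonical name and mention the prefixed form only for the server key actually configured in that workflow.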
Reflection (Trajectories)
Reflection is not a reflectionThreshold callback. It's configured via the trajectories: block:
trajectories:
  enabled: true
  reflectOnBarriers: true # config flag exists but runner does NOT currently invoke this path
  reflectOnConverge: true # fires at parallel convergence points (runner.ts:2762-2779)
  autoDecisions: true # record retry/skip/fail decisions
What actually runs today: only reflectOnConverge is wired into the runner (runner.ts:2762-2779). shouldReflectOnBarriers is defined in trajectory.ts:486-487 but not called — set the flag if you want forward compatibility, but don't depend on it.
Programmatic equivalent:
workflow('x').trajectories({ enabled: true, reflectOnConverge: true });
For a first-class critic loop, use the reflection pattern (agents with role: critic get wired as reviewers in coordinator.ts:363-378).
Common Mistakes
| Mistake | Why It Fails | Fix |
|---|---|---|
| Using mesh/debate for everything | Full-mesh blows up message volume past ~5 agents | Use hub-spoke or dag for most tasks |
| Pipeline for independent work | Sequential bottleneck | Use fan-out or dag |
| Hub-spoke for 2 agents | Hub is unnecessary overhead | Use pipeline or fan-out |
| Expecting consensusStrategy to tally votes | Runner has no vote-tally logic; field only affects coordinator auto-selection | Aggregate votes in a judge/lead step that reads {{steps.*.output}} |
| Handoff with "routing = skip other branches" | Skipping only fires on upstream failure, not routing decisions | Emit a routing token in triage output; downstream prompts self-no-op if token doesn't match |
| Cascade expecting skip-on-success | Runner has no cascade skip logic; failed upstream skips downstream | Chain downstream prompts to pass-through or redo based on {{steps.previous.output}} |
| Relying on reflectOnBarriers | Config flag exists but runner never calls it | Use reflectOnConverge for convergence reflection; use reflection pattern for critic loops |
| interactive: false agent calling MCP | Non-interactive subprocess has no relay | Use interactive: true (default) or emit output on stdout |
| Relying on multi-level hierarchical | Topology is single-level hub in current impl | Use pattern for naming; model levels via dependsOn graph |
| Writing mcp__relaycast__send(...) | Wrong tool name | Use post_message / mcp__relaycast__post_message or send_dm / mcp__relaycast__send_dm |
Resume & Re-run
// Resume a failed run:
await runWorkflow('feature-dev.yaml', { resume: '<runId>' });
// Skip ahead, re-using cached outputs from an earlier run:
await runWorkflow('feature-dev.yaml', {
  startFrom: 'review',
  previousRunId: '<runId>',
});
Cached outputs live in .agent-relay/step-outputs/; runs in .agent-relay/workflow-runs.jsonl. Env vars RESUME_RUN_ID, START_FROM, PREVIOUS_RUN_ID are auto-detected.
Complete YAML Example
version: '1.0'
name: feature-dev
description: 'Blueprint-style feature development with quality gates.'
swarm:
  pattern: hub-spoke
  maxConcurrency: 2
  timeoutMs: 3600000
  channel: swarm-feature-dev
  idleNudge: { nudgeAfterMs: 120000, escalateAfterMs: 120000, maxNudges: 1 }
agents:
  - { name: lead, cli: claude, role: lead, permissions: { access: full } }
  - { name: planner, cli: codex, role: planner, interactive: false, permissions: { access: readonly } }
  - { name: developer, cli: codex, role: worker, interactive: false, permissions: { access: readwrite } }
  - { name: reviewer, cli: claude, role: reviewer, permissions: { access: readonly } }
workflows:
  - name: feature-delivery
    onError: retry
    preflight:
      - { command: 'git status --porcelain', failIf: non-empty, description: 'Clean worktree' }
    steps:
      - name: plan
        agent: planner
        task: 'Plan: {{task}}'
        verification: { type: output_contains, value: PLAN_COMPLETE }
      - name: implement
        agent: developer
        dependsOn: [plan]
        task: 'Implement: {{steps.plan.output}}'
        verification: { type: output_contains, value: IMPLEMENTATION_COMPLETE }
      - name: test
        type: deterministic
        dependsOn: [implement]
        command: npm test
      - name: review
        agent: reviewer
        dependsOn: [test]
        task: 'Review implementation'
        verification: { type: output_contains, value: REVIEW_COMPLETE }
coordination:
  barriers:
    - { name: delivery-ready, waitFor: [plan, implement, review], timeoutMs: 900000 }
trajectories:
  enabled: true
  reflectOnBarriers: true
  reflectOnConverge: true
errorHandling:
  strategy: retry
  maxRetries: 2
  retryDelayMs: 5000
Built-in templates live in @relayflows/core/dist/builtin-templates/ (feature-dev, bug-fix, code-review, competitive, documentation, refactor, review-loop, security-audit).
Source of Truth
| Claim | File |
|---|---|
| Pattern enum (24 patterns) | @relayflows/core/dist/schema.d.ts (SwarmPattern) |
| Topology resolution per pattern | @relayflows/core/dist/coordinator.js |
| Interactive-only topology edges | @relayflows/core/dist/coordinator.js filters interactive: false agents |
| Pattern auto-selection heuristics | @relayflows/core/dist/coordinator.js |
| WorkflowBuilder fluent API | @relayflows/core/dist/builder.d.ts |
| runWorkflow(yamlPath, options) | @relayflows/core/dist/run.d.ts |
| YAML validation requires version + name + swarm.pattern | @relayflows/core/dist/runner.js |
| MCP tool names | packages/cli/src/cli/agent-relay-mcp.ts, @relayflows/core/dist/channel-messenger.js |
| Completion modes (verification / evidence / owner / process-exit) | @relayflows/core/dist/runner.js, @relayflows/core/dist/step-executor.js |
| Trajectory reflection | @relayflows/core/dist/trajectory.js, @relayflows/core/dist/runner.js |