choosing-swarm-patterns

name: choosing-swarm-patterns
description: Use when coordinating multiple AI agents with Agent Relay's workflow engine and you need to pick the right orchestration pattern. Covers the 10 core patterns (fan-out, pipeline, hub-spoke, consensus, mesh, handoff, cascade, dag, debate, hierarchical) plus 14 specialized ones, with a decision framework and accurate workflow/YAML examples.

Overview

The Agent Relay workflow engine (@relayflows/core) supports 24 swarm patterns via a single swarm.pattern field. Patterns are configured declaratively in YAML or programmatically via the workflow() fluent builder; there are no standalone fanOut(...) / hubAndSpoke(...) helpers. Pick the simplest pattern that solves the problem, and add complexity only when that simpler pattern proves insufficient.

Two ways to run a pattern

1. YAML (portable):

import { runWorkflow } from '@relayflows/core';

const run = await runWorkflow('workflows/feature-dev.yaml', {
  vars: { task: 'Add OAuth login' },
});

2. Fluent builder (programmatic):

import { workflow } from '@relayflows/core';

const run = await workflow('feature-dev')
  .pattern('hub-spoke')
  .channel('swarm-feature-dev')
  .agent('lead', { cli: 'claude', role: 'lead' })
  .agent('developer', { cli: 'codex', role: 'worker', interactive: false })
  .step('plan', { agent: 'lead', task: 'Plan {{task}}' })
  .step('implement', { agent: 'developer', task: 'Implement: {{steps.plan.output}}', dependsOn: ['plan'] })
  .run();

Both paths hit the same WorkflowRunner.
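Both examples pass data through `{{...}}` placeholders: `{{task}}` comes from `vars`, and `{{steps.plan.output}}` from an earlier step. The runner's exact interpolation rules (escaping, missing values) are not described here, so the following is only a minimal sketch of the idea. `render` and `RenderContext` are hypothetical names, not part of `@relayflows/core`, and leaving unknown placeholders untouched is an assumption.

```typescript
// Hypothetical illustration of placeholder substitution; not the runner's implementation.
interface RenderContext {
  vars: Record<string, string>;
  steps: Record<string, { output: string }>;
}

// Replaces {{name}} with vars[name] and {{steps.<step>.output}} with that step's output.
// Unknown placeholders are left as-is (an assumption for this sketch).
function render(template: string, ctx: RenderContext): string {
  return template.replace(/\{\{\s*([^}]+?)\s*\}\}/g, (match: string, key: string) => {
    const stepMatch = /^steps\.(.+)\.output$/.exec(key);
    const stepName = stepMatch ? stepMatch[1] : undefined;
    if (stepName !== undefined) {
      const step = ctx.steps[stepName];
      return step ? step.output : match;
    }
    const value = ctx.vars[key];
    return value !== undefined ? value : match;
  });
}

const ctx: RenderContext = {
  vars: { task: 'Add OAuth login' },
  steps: { plan: { output: '1. Add provider config' } },
};
console.log(render('Implement: {{steps.plan.output}}', ctx));
// prints: Implement: 1. Add provider config
```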

Quick Decision Framework

Is the task independent per agent?
  YES → fan-out (parallel workers, hub collects)

Does each step need the previous step's output?
  YES → Is it strictly linear?
    YES → pipeline
    NO  → dag (parallel where possible, `dependsOn` edges)

Does a coordinator need to stay alive and adapt?
  YES → hub-spoke (single-level hub + workers)
        hierarchical (structurally identical in current impl; use for naming/intent)

Is the task about making a decision?
  YES → Do agents need to argue opposing sides?
    YES → debate (adversarial, full mesh)
    NO  → consensus (cooperative, full mesh + coordination.consensusStrategy)

Does the right specialist emerge during processing?
  YES → handoff (sequential chain, one active at a time)

Do all agents need to freely collaborate?
  YES → mesh (full peer-to-peer edges)

Is cost the primary concern?
  YES → cascade (chain of increasingly capable agents; each step's prompt
        decides whether to pass through or redo the prior output)
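
The framework above reads as an ordered series of checks where the first match wins. As a rough sketch of that ordering, assuming a hypothetical `TaskTraits` shape and `choosePattern` helper (neither exists in `@relayflows/core`):

```typescript
// Hypothetical helper that mirrors the question order of the decision framework.
type CorePattern =
  | 'fan-out' | 'pipeline' | 'dag' | 'hub-spoke' | 'debate'
  | 'consensus' | 'handoff' | 'mesh' | 'cascade';

interface TaskTraits {
  independentPerAgent?: boolean;
  needsPreviousOutput?: boolean;
  strictlyLinear?: boolean;
  coordinatorAdapts?: boolean;
  isDecision?: boolean;
  opposingSides?: boolean;
  specialistEmerges?: boolean;
  freeCollaboration?: boolean;
  costSensitive?: boolean;
}

// Walks the questions top to bottom; the first question answered YES decides.
function choosePattern(t: TaskTraits): CorePattern | undefined {
  if (t.independentPerAgent) return 'fan-out';
  if (t.needsPreviousOutput) return t.strictlyLinear ? 'pipeline' : 'dag';
  if (t.coordinatorAdapts) return 'hub-spoke';
  if (t.isDecision) return t.opposingSides ? 'debate' : 'consensus';
  if (t.specialistEmerges) return 'handoff';
  if (t.freeCollaboration) return 'mesh';
  if (t.costSensitive) return 'cascade';
  return undefined; // no question matched
}
```

The returned string is what you would put in `swarm.pattern` or pass to `.pattern(...)`.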

Pattern Reference (Core 10)

#	Pattern	Topology (actual edges)	Best For
1	fan-out	Hub broadcasts to N workers; workers reply to hub only	Independent subtasks (reviews, research, tests)
2	pipeline	Linear chain (agent[i] → agent[i+1])	Ordered stages (design → implement → test)
3	hub-spoke	Hub ↔ spokes (bidirectional); no spoke-to-spoke	Dynamic coordination, lead reviews/adjusts
4	consensus	Full mesh; `coordination.consensusStrategy` is a declarative marker (no built-in vote tally)	Architecture decisions, approval gates
5	mesh	Full mesh (every agent ↔ every other)	Brainstorming, collaborative debugging
6	handoff	Chain; passes control forward	Triage, specialist routing
7	cascade	Chain of `dependsOn` steps; all run on success, downstream skipped on upstream failure (no built-in "fall through")	Cost optimization: cheap first, each step's prompt passes through or redoes
8	dag	Edges from step `dependsOn`	Mixed dependencies, parallel where possible
9	debate	Full mesh (same topology as mesh; roles drive behavior)	Rigorous adversarial examination
10	hierarchical	Hub + subordinates (single-level in current impl)	Large teams; semantic distinction from hub-spoke

Heads up: hierarchical resolves to the same edge structure as hub-spoke in coordinator.ts:313-319. Multi-level tree topology is not currently implemented, so use the pattern name to signal intent but expect the same runtime graph.
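
To make the topology column concrete, the sketch below groups the core patterns into three edge shapes and builds the directed edges for each. It is a simplified model under stated assumptions, not the coordinator's code: `Shape`, `shapeOf`, and `edgesFor` are hypothetical names, and the first agent is assumed to be the hub.

```typescript
// Hypothetical model of the edge sets in the table above; not the coordinator's code.
type Edge = [string, string]; // [from, to]
type Shape = 'hub' | 'chain' | 'mesh';

// Simplified grouping by edge shape. dag is omitted: its edges come from dependsOn.
const shapeOf: Record<string, Shape> = {
  'fan-out': 'hub', 'hub-spoke': 'hub', hierarchical: 'hub',
  pipeline: 'chain', handoff: 'chain', cascade: 'chain',
  consensus: 'mesh', mesh: 'mesh', debate: 'mesh',
};

function edgesFor(shape: Shape, agents: string[]): Edge[] {
  const edges: Edge[] = [];
  if (shape === 'hub') {
    const [hub, ...spokes] = agents; // assumption: first agent acts as the hub
    if (hub === undefined) return edges;
    for (const spoke of spokes) edges.push([hub, spoke], [spoke, hub]); // no spoke-to-spoke
  } else if (shape === 'chain') {
    let prev: string | undefined;
    for (const agent of agents) {
      if (prev !== undefined) edges.push([prev, agent]); // agent[i] -> agent[i+1]
      prev = agent;
    }
  } else {
    for (const a of agents) for (const b of agents) if (a !== b) edges.push([a, b]);
  }
  return edges;
}

const team = ['a', 'b', 'c', 'd', 'e', 'f'];
console.log(edgesFor('hub', team).length, edgesFor('chain', team).length, edgesFor('mesh', team).length);
// prints: 10 5 30
```

With six agents a full mesh already has three times the directed edges of a hub, which is why mesh-shaped patterns get noisy as teams grow.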

Additional Patterns (role-driven)

These 14 additional patterns exist in SwarmPattern (types.ts:114-139). The coordinator has role-based auto-selection heuristics (coordinator.ts:51-165), but they only fire when swarm.pattern is omitted — YAML validation requires it (runner.ts:2105-2117), so auto-selection is effectively a programmatic-API feature. In YAML, set swarm.pattern explicitly.

Topology is still resolved per-pattern once selected; the "Roles the topology keys off" column reflects what the coordinator looks for to shape edges (per coordinator.ts:250-450):

Pattern	Roles the topology keys off	Topology
`map-reduce`	`mapper` + `reducer`	coordinator → mappers → reducers → coordinator
`scatter-gather`	—	hub → workers → hub
`supervisor`	`supervisor`	supervisor ↔ workers
`reflection`	`critic` or `reviewer` (auto-select uses `critic` only)	producers → critic → producers (loop)
`red-team`	`attacker`/`red-team` + `defender`/`blue-team`	adversarial mesh with optional judges
`verifier`	`verifier`	producers → verifiers → back to producers
`auction`	`auctioneer`	auctioneer → bidders → auctioneer
`escalation`	`tier-*`	tiered chain, escalate up / report down
`saga`	`saga-orchestrator`, `compensate-handler`	orchestrator ↔ participants
`circuit-breaker`	`primary` + `fallback`/`backup`	try primary, fallback on failure
`blackboard`	`blackboard` / `shared-workspace`	shared state hub
`swarm`	`hive-mind` / `swarm-agent`	stigmergy-style
`competitive`	— (declared explicitly)	independent parallel implementations + judge
`review-loop`	`implement` + 2+ `reviewer`	implementer ↔ reviewers
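
For intuition about role-keyed selection, here is an illustrative lookup modeled on a few rows of the table above. It is not the coordinator's heuristic: `suggestPattern` is a hypothetical name, the check order is an assumption, and the real rules in coordinator.ts may differ in order and detail.

```typescript
// Illustrative only: a lookup modeled on the table above, not the coordinator's heuristics.
function suggestPattern(roles: string[]): string | undefined {
  const has = (role: string): boolean => roles.indexOf(role) !== -1;
  if (has('mapper') && has('reducer')) return 'map-reduce';
  if (has('supervisor')) return 'supervisor';
  if (has('critic')) return 'reflection'; // per the table, auto-select keys off critic only
  if (has('verifier')) return 'verifier';
  if (has('auctioneer')) return 'auction';
  if (has('primary') && (has('fallback') || has('backup'))) return 'circuit-breaker';
  return undefined; // no role match: set swarm.pattern explicitly
}
```

In YAML this kind of inference never applies, since `swarm.pattern` is required there.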

Structured Squad Review Loop

1. Split the work into bounded implementation squads. Each squad owns a non-overlapping file or subsystem scope.
2. Give each squad an implementer plus a shadow/review partner. The shadow follows the implementer in real time, checks alignment with the spec, and posts concise feedback before the work drifts.
3. Require the implementer to self-reflect before external review: compare the final diff against the spec, AGENTS.md / CLAUDE.md, recent local conventions, tests, and declared non-goals.
4. Run an independent self-review/fresh-eyes agent that reads the actual files and recent repo context, not just the chat transcript.
5. Send that review back to the implementer for one repair round.
6. After squads converge, run a final two-agent review team, usually one Claude reviewer and one Codex reviewer, independently. They compare notes, merge findings, and produce one final verdict.
7. Spawn fresh fix agents for final-review findings. Those fix agents self-reflect, then the final reviewers re-check the post-fix state until the spec is fully satisfied or a blocker is documented.

Pattern choice for this loop:

- Use supervisor or hub-spoke when a lead needs to coordinate live squads.
- Use review-loop when the main risk is code quality and feedback iteration.
- Use reflection when critic feedback should loop directly back to producers.
- Use verifier when completion evidence matters more than design debate.
- Use competitive only when independent alternative implementations are useful; otherwise split by ownership scope.

Pattern Details

1. fan-out — Parallel Workers

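import { workflow } from '@relayflows/core';

// Two non-interactive reviewers work independently; the lead then merges their outputs.
await workflow('review')
  .pattern('fan-out')
  .agent('lead', { cli: 'claude', role: 'lead' })
  .agent('auth-rev', { cli: 'claude', role: 'worker', interactive: false })
  .agent('db-rev', { cli: 'claude', role: 'worker', interactive: false })
  .step('review-auth', { agent: 'auth-rev', task: 'Review auth.ts' })
  .step('review-db', { agent: 'db-rev', task: 'Review db.ts' })
  .step('collect', {
    agent: 'lead',
    task: 'Merge the findings:\n{{steps.review-auth.output}}\n{{steps.review-db.output}}',
    dependsOn: ['review-auth', 'review-db'],
  })
  .run();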

2. pipeline — Sequential Stages

swarm: { pattern: pipeline }
agents:
  - { name: designer, cli: claude }
  - { name: implementer, cli: codex, interactive: false }
  - { name: tester, cli: codex, interactive: false }
workflows:
  - name: build
    steps:
      - {
          name: design,
          agent: designer,
          task: 'Design the API schema',
          verification: { type: output_contains, value: DONE },
        }
      - {
          name: implement,
          agent: implementer,
          dependsOn: [design],
          task: 'Implement: {{steps.design.output}}',
        }
      - { name: test, agent: tester, dependsOn: [implement], task: 'Write integration tests' }

3. hub-spoke — Persistent Coordinator

await workflow('api-build')
  .pattern('hub-spoke')
  .channel('swarm-api')
  .agent('lead', { cli: 'claude', role: 'lead' })
  .agent('db-worker', { cli: 'claude', role: 'worker' }) // interactive by default — hub DMs it
  .agent('api-worker', { cli: 'claude', role: 'worker' }) // interactive by default — hub DMs it
  .step('models', { agent: 'db-worker', task: 'Build database models' })
  .step('routes', { agent: 'api-worker', task: 'Build route handlers', dependsOn: ['models'] })
  .step('review', { agent: 'lead', task: 'Review everything', dependsOn: ['routes'] })
  .run();

4. consensus — Cooperative Voting

swarm: { pattern: consensus }
agents:
  - { name: perf, cli: claude, role: reviewer }
  - { name: dx, cli: claude, role: reviewer }
  - { name: sec, cli: claude, role: reviewer }
coordination:
  consensusStrategy: majority # declarative marker: majority | unanimous | quorum
  votingThreshold: 0.66
workflows:
  - name: decide
    steps:
      - { name: evaluate-perf, agent: perf, task: 'Evaluate perf of Fastify migration' }
      - { name: evaluate-dx, agent: dx, task: 'Evaluate DX of Fastify migration' }
      - { name: evaluate-sec, agent: sec, task: 'Evaluate security of Fastify migration' }
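
`consensusStrategy` is a declarative marker here, so the vote count itself has to happen in something you write, such as a judge or lead step that reads the three outputs. A minimal sketch of that tally, assuming each reviewer ends its output with a hypothetical `VOTE: APPROVE` or `VOTE: REJECT` line. `tally` is your own helper, not a runner API, and treating the threshold as the approving fraction for `quorum` is an assumption.

```typescript
type Strategy = 'majority' | 'unanimous' | 'quorum';

// Counts outputs that contain an explicit approve marker (hypothetical convention).
function countApprovals(outputs: string[]): number {
  return outputs.filter((o) => /VOTE:\s*APPROVE\b/i.test(o)).length;
}

// threshold is consulted only for 'quorum': the fraction of voters that must approve.
function tally(outputs: string[], strategy: Strategy, threshold: number = 0.66): boolean {
  if (outputs.length === 0) return false;
  const approvals = countApprovals(outputs);
  switch (strategy) {
    case 'unanimous':
      return approvals === outputs.length;
    case 'majority':
      return approvals > outputs.length / 2;
    case 'quorum':
      return approvals / outputs.length >= threshold;
  }
}

const votes = ['Fast enough. VOTE: APPROVE', 'DX is fine. VOTE: APPROVE', 'Plugin risk. VOTE: REJECT'];
console.log(tally(votes, 'majority'), tally(votes, 'unanimous'));
// prints: true false
```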

5. mesh — Peer Collaboration

await workflow('debug-auth')
  .pattern('mesh')
  .channel('swarm-debug')
  .agent('logs', { cli: 'claude' })
  .agent('code', { cli: 'claude' })
  .agent('repro', { cli: 'claude' })
  .step('logs', { agent: 'logs', task: 'Check server logs' })
  .step('code', { agent: 'code', task: 'Review auth code' })
  .step('repro', { agent: 'repro', task: 'Write repro test' })
  .run();

6. handoff — Dynamic Routing

swarm: { pattern: handoff }
agents:
  - { name: triage, cli: claude }
  - { name: billing, cli: claude }
  - { name: tech, cli: claude }
workflows:
  - name: support
    steps:
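      # Both branches run after triage; the runner does not skip a branch on a routing decision.
      # ROUTE=... is an illustrative token convention, not a runner feature.
      - { name: triage, agent: triage, task: 'Triage: {{request}}. End with ROUTE=billing or ROUTE=tech.' }
      - { name: billing, agent: billing, dependsOn: [triage], task: 'If ROUTE=billing, handle it; otherwise reply SKIPPED. Triage: {{steps.triage.output}}' }
      - { name: tech, agent: tech, dependsOn: [triage], task: 'If ROUTE=tech, handle it; otherwise reply SKIPPED. Triage: {{steps.triage.output}}' }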
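
Because both downstream steps depend only on triage, both will run; routing has to be expressed in the outputs themselves. If you post-process step outputs in your own code, a routing token can be checked as sketched below. The `ROUTE=<name>` format and the `parseRoute` / `shouldHandle` helpers are hypothetical conventions, and how the triage output reaches your code is left open.

```typescript
// Extracts the last ROUTE=<name> token from triage output, if any (hypothetical convention).
function parseRoute(triageOutput: string): string | undefined {
  const matches = triageOutput.match(/ROUTE=([a-z-]+)/gi);
  if (!matches || matches.length === 0) return undefined;
  const last = matches[matches.length - 1];
  if (last === undefined) return undefined;
  return last.slice('ROUTE='.length).toLowerCase();
}

// A branch acts only when the token names it; otherwise it should no-op.
function shouldHandle(triageOutput: string, branch: string): boolean {
  return parseRoute(triageOutput) === branch;
}
```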

7. cascade — Cost-Aware Fallthrough

await workflow('answer')
  .pattern('cascade')
  .agent('haiku', { cli: 'claude', model: 'claude-haiku-4-5-20251001' })
  .agent('sonnet', { cli: 'claude', model: 'claude-sonnet-4-6' })
  .agent('opus', { cli: 'claude', model: 'claude-opus-4-7' })
  .step('try-haiku', { agent: 'haiku', task: '{{question}}' })
  .step('try-sonnet', {
    agent: 'sonnet',
    task: 'If this is a complete answer, echo it verbatim. Otherwise answer anew:\n{{steps.try-haiku.output}}',
    dependsOn: ['try-haiku'],
  })
  .step('try-opus', {
    agent: 'opus',
    task: 'Final-tier answer, using prior attempts for context:\n{{steps.try-sonnet.output}}',
    dependsOn: ['try-sonnet'],
  })
  .run();
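
The runner executes every tier in this chain; the cascade behavior comes from each tier's prompt choosing to echo the prior output instead of redoing it. The sketch below models only that pass-through-or-redo contract, with plain functions standing in for agents. `Tier`, `runCascade`, and the `isComplete` check are hypothetical, and in a real workflow the "is this complete?" judgment is made by the model in the prompt, not by code.

```typescript
interface Tier {
  name: string;
  answer: (question: string) => string;
}

// Models the prompt contract: a later tier echoes the prior output when it is
// already complete, and produces a fresh answer otherwise.
function runCascade(
  question: string,
  tiers: Tier[],
  isComplete: (output: string) => boolean,
): { output: string; answeredBy: string | undefined } {
  let output = '';
  let answeredBy: string | undefined;
  for (const tier of tiers) {
    if (answeredBy !== undefined && isComplete(output)) continue; // echo verbatim
    output = tier.answer(question);
    answeredBy = tier.name;
  }
  return { output, answeredBy };
}
```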

8. dag — Directed Acyclic Graph

await workflow('fullstack')
  .pattern('dag')
  .maxConcurrency(3)
  .agent('dev', { cli: 'codex', role: 'worker' })
  .step('scaffold', { agent: 'dev', task: 'Create project scaffold' })
  .step('frontend', { agent: 'dev', task: 'Build React UI', dependsOn: ['scaffold'] })
  .step('backend', { agent: 'dev', task: 'Build API', dependsOn: ['scaffold'] })
  .step('integrate', { agent: 'dev', task: 'Wire together', dependsOn: ['frontend', 'backend'] })
  .run();
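
To see which steps in this graph can overlap, the `dependsOn` edges can be grouped into waves. This is a minimal sketch of the dependency logic only: `executionWaves` is a hypothetical helper, and the actual runner may start a step as soon as its own dependencies finish and also applies `maxConcurrency`, which this sketch ignores.

```typescript
type Steps = Record<string, string[]>; // step name -> dependsOn

// Groups steps into waves: a step lands in the first wave after all its dependencies.
function executionWaves(steps: Steps): string[][] {
  const done: Record<string, boolean> = {};
  let remaining = Object.keys(steps);
  const waves: string[][] = [];
  while (remaining.length > 0) {
    const ready = remaining.filter((name) =>
      (steps[name] || []).every((dep) => done[dep] === true),
    );
    if (ready.length === 0) {
      throw new Error('Cycle or unknown dependency among: ' + remaining.join(', '));
    }
    ready.forEach((name) => { done[name] = true; });
    remaining = remaining.filter((name) => done[name] !== true);
    waves.push(ready);
  }
  return waves;
}

console.log(JSON.stringify(executionWaves({
  scaffold: [],
  frontend: ['scaffold'],
  backend: ['scaffold'],
  integrate: ['frontend', 'backend'],
})));
// prints: [["scaffold"],["frontend","backend"],["integrate"]]
```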

9. debate — Adversarial Refinement

swarm: { pattern: debate }
agents:
  - { name: pro, cli: claude, role: debater, task: 'Argue FOR monorepo' }
  - { name: con, cli: claude, role: debater, task: 'Argue FOR polyrepo' }
  - { name: judge, cli: claude, role: judge, task: 'Decide after 3 rounds' }
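coordination:
  barriers:
    # waitFor lists step names; pro-round-3 and con-round-3 must be defined under workflows (omitted in this fragment)
    - { name: debate-done, waitFor: [pro-round-3, con-round-3] }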

10. hierarchical — Multi-Level (structurally hub-spoke today)

await workflow('large-team')
  .pattern('hierarchical')
  .agent('lead', { cli: 'claude', role: 'lead' })
  .agent('fe-coord', { cli: 'claude', role: 'coordinator' })
  .agent('be-coord', { cli: 'claude', role: 'coordinator' })
  .agent('fe-dev', { cli: 'codex', role: 'worker', interactive: false })
  .agent('be-dev', { cli: 'codex', role: 'worker', interactive: false })
  .step('plan', { agent: 'lead', task: 'Coordinate full-stack app' })
  .step('fe-plan', { agent: 'fe-coord', task: 'Manage frontend', dependsOn: ['plan'] })
  .step('be-plan', { agent: 'be-coord', task: 'Manage backend', dependsOn: ['plan'] })
  .step('fe-impl', { agent: 'fe-dev', task: 'Build components', dependsOn: ['fe-plan'] })
  .step('be-impl', { agent: 'be-dev', task: 'Build API', dependsOn: ['be-plan'] })
  .run();

Verification & Completion Signals

An agent step can complete in several ways in the `@relayflows/core` runner (verification, evidence, owner, or process exit). Verification is the one you declare per step:

verification:
  type: output_contains # or: exit_code | file_exists | custom
  value: DONE # or: PLAN_COMPLETE, IMPLEMENTATION_COMPLETE, REVIEW_COMPLETE
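
As a rough mental model of two of these verification types, the check reduces to a predicate over a step's result. This is a sketch under assumptions, not the runner's code: `Verification`, `StepResult`, and `passes` are hypothetical names, `exit_code` is assumed to compare against an expected code, and `file_exists` / `custom` are left out.

```typescript
type Verification =
  | { type: 'output_contains'; value: string }
  | { type: 'exit_code'; value: number };

interface StepResult {
  output: string;
  exitCode: number;
}

// Returns true when the step result satisfies the declared verification.
function passes(v: Verification, r: StepResult): boolean {
  switch (v.type) {
    case 'output_contains':
      return r.output.indexOf(v.value) !== -1; // plain substring match
    case 'exit_code':
      return r.exitCode === v.value;
  }
}
```

Under this model a marker such as `PLAN_COMPLETE` only works if the step's prompt tells the agent to print it.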

Agent Relay MCP - Correct Tool Names

The old category-expanded names are wrong: current Agent Relay MCP tools use flat names. In a client that decorates MCP tools, the prefix comes from the configured server key, so workflow prompts commonly show mcp__relaycast__send_dm, while an agent-relay server key may expose mcp__agent_relay__send_dm.

Purpose	Canonical tool	Common workflow-prefixed form
Send DM to another agent	`send_dm`	`mcp__relaycast__send_dm`
Check inbox	`check_inbox`	`mcp__relaycast__check_inbox`
List agents	`list_agents`	`mcp__relaycast__list_agents`
Post to a channel	`post_message`	`mcp__relaycast__post_message`
Reply in a thread	`reply_to_thread`	`mcp__relaycast__reply_to_thread`
Spawn sub-agent	`add_agent`	`mcp__relaycast__add_agent`
Remove sub-agent	`remove_agent`	`mcp__relaycast__remove_agent`

`interactive: false` agents run as non-interactive subprocesses with no relay connection. They must not call Relay MCP tools.
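
The prefixed forms in the table follow from the server key. A small sketch of that mapping, where `decoratedToolName` is a hypothetical helper and the rule that non-identifier characters become underscores is inferred from the `agent-relay` example above; individual clients may decorate names differently.

```typescript
// Builds the decorated name a client might show for a flat tool name under a server key.
function decoratedToolName(serverKey: string, tool: string): string {
  // Assumption: characters outside [A-Za-z0-9_] in the key are replaced with "_".
  const key = serverKey.replace(/[^A-Za-z0-9_]/g, '_');
  return 'mcp__' + key + '__' + tool;
}

console.log(decoratedToolName('relaycast', 'send_dm'));
// prints: mcp__relaycast__send_dm
console.log(decoratedToolName('agent-relay', 'send_dm'));
// prints: mcp__agent_relay__send_dm
```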

Reflection (Trajectories)

Reflection is not a `reflectionThreshold` callback. It's configured via the `trajectories:` block:

trajectories:
  enabled: true
  reflectOnBarriers: true # config flag exists but runner does NOT currently invoke this path
  reflectOnConverge: true # fires at parallel convergence points (runner.ts:2762-2779)
  autoDecisions: true # record retry/skip/fail decisions

Common Mistakes

Mistake	Why It Fails	Fix
Using mesh/debate for everything	Full-mesh blows up message volume past ~5 agents	Use hub-spoke or dag for most tasks
Pipeline for independent work	Sequential bottleneck	Use fan-out or dag
Hub-spoke for 2 agents	Hub is unnecessary overhead	Use pipeline or fan-out
Expecting `consensusStrategy` to tally votes	Runner has no vote-tally logic; field only affects coordinator auto-selection	Aggregate votes in a judge/lead step that reads `{{steps.*.output}}`
Handoff with "routing = skip other branches"	Skipping only fires on upstream failure, not routing decisions	Emit a routing token in triage output; downstream prompts self-no-op if token doesn't match
Cascade expecting skip-on-success	Runner has no cascade skip logic; failed upstream skips downstream	Chain downstream prompts to pass-through or redo based on `{{steps.previous.output}}`
Relying on `reflectOnBarriers`	Config flag exists but runner never calls it	Use `reflectOnConverge` for convergence reflection; use `reflection` pattern for critic loops
`interactive: false` agent calling MCP	Non-interactive subprocess has no relay	Use `interactive: true` (default) or emit output on stdout
Relying on multi-level `hierarchical`	Topology is single-level hub in current impl	Use pattern for naming; model levels via `dependsOn` graph
Writing `mcp__relaycast__send(...)`	Wrong tool name	Use `post_message` / `mcp__relaycast__post_message` or `send_dm` / `mcp__relaycast__send_dm`

Resume & Re-run

```ts
import { runWorkflow } from '@relayflows/core';

// Resume a failed run:
await runWorkflow('feature-dev.yaml', { resume: '<runId>' });

// Skip ahead, re-using cached outputs from an earlier run:
await runWorkflow('feature-dev.yaml', {
  startFrom: 'review',
  previousRunId: '<runId>',
});
```

Complete YAML Example

```yaml

version: '1.0'
name: feature-dev
description: 'Blueprint-style feature development with quality gates.'
swarm:
  pattern: hub-spoke
  maxConcurrency: 2
  timeoutMs: 3600000
  channel: swarm-feature-dev
  idleNudge: { nudgeAfterMs: 120000, escalateAfterMs: 120000, maxNudges: 1 }
agents:
  - { name: lead, cli: claude, role: lead, permissions: { access: full } }
  - { name: planner, cli: codex, role: planner, interactive: false, permissions: { access: readonly } }
  - { name: developer, cli: codex, role: worker, interactive: false, permissions: { access: readwrite } }
  - { name: reviewer, cli: claude, role: reviewer, permissions: { access: readonly } }
workflows:
  - name: feature-delivery
    onError: retry
    preflight:
      - { command: 'git status --porcelain', failIf: non-empty, description: 'Clean worktree' }
    steps:
      - name: plan
        agent: planner
        task: 'Plan: {{task}}'
        verification: { type: output_contains, value: PLAN_COMPLETE }
      - name: implement
        agent: developer
        dependsOn: [plan]
        task: 'Implement: {{steps.plan.output}}'
        verification: { type: output_contains, value: IMPLEMENTATION_COMPLETE }
      - name: test
        type: deterministic
        dependsOn: [implement]
        command: npm test
      - name: review
        agent: reviewer
        dependsOn: [test]
        task: 'Review implementation'
        verification: { type: output_contains, value: REVIEW_COMPLETE }
coordination:
  barriers:
    - { name: delivery-ready, waitFor: [plan, implement, review], timeoutMs: 900000 }
trajectories:
  enabled: true
  reflectOnBarriers: true # flag is accepted, but the runner does not currently invoke this path
  reflectOnConverge: true
errorHandling:
  strategy: retry
  maxRetries: 2
  retryDelayMs: 5000
```

Source of Truth

Claim	File
Pattern enum (24 patterns)	`@relayflows/core/dist/schema.d.ts` (`SwarmPattern`)
Topology resolution per pattern	`@relayflows/core/dist/coordinator.js`
Interactive-only topology edges	`@relayflows/core/dist/coordinator.js` filters `interactive: false` agents
Pattern auto-selection heuristics	`@relayflows/core/dist/coordinator.js`
`WorkflowBuilder` fluent API	`@relayflows/core/dist/builder.d.ts`
`runWorkflow(yamlPath, options)`	`@relayflows/core/dist/run.d.ts`
YAML validation requires `version` + `name` + `swarm.pattern`	`@relayflows/core/dist/runner.js`
MCP tool names	`packages/cli/src/cli/agent-relay-mcp.ts`, `@relayflows/core/dist/channel-messenger.js`
Completion modes (verification / evidence / owner / process-exit)	`@relayflows/core/dist/runner.js`, `@relayflows/core/dist/step-executor.js`
Trajectory reflection	`@relayflows/core/dist/trajectory.js`, `@relayflows/core/dist/runner.js`