choosing-swarm-patterns


Use when you are coordinating multiple AI agents with Agent Relay's workflow engine and need to pick the right orchestration pattern. Covers the 10 core patterns (fan-out, pipeline, hub-spoke, consensus, mesh, handoff, cascade, dag, debate, hierarchical) plus 14 specialized ones, with a decision framework and accurate workflow/YAML examples.

By AgentWorkforce · Updated 6/10/2026


Choosing Swarm Patterns

Overview

The Agent Relay workflow engine (@relayflows/core) supports 24 swarm patterns via a single swarm.pattern field. Patterns are configured declaratively in YAML or programmatically via the workflow() fluent builder — there are no standalone fanOut(...) / hubAndSpoke(...) helpers. Pick the simplest pattern that solves the problem; add complexity only when the system proves it's insufficient.

Two ways to run a pattern

1. YAML (portable):

import { runWorkflow } from '@relayflows/core';

const run = await runWorkflow('workflows/feature-dev.yaml', {
  vars: { task: 'Add OAuth login' },
});

2. Fluent builder (programmatic):

import { workflow } from '@relayflows/core';

const run = await workflow('feature-dev')
  .pattern('hub-spoke')
  .channel('swarm-feature-dev')
  .agent('lead', { cli: 'claude', role: 'lead' })
  .agent('developer', { cli: 'codex', role: 'worker', interactive: false })
  .step('plan', { agent: 'lead', task: 'Plan {{task}}' })
  .step('implement', { agent: 'developer', task: 'Implement: {{steps.plan.output}}', dependsOn: ['plan'] })
  .run();

Both paths hit the same WorkflowRunner.

Quick Decision Framework

Is the task independent per agent?
  YES → fan-out (parallel workers, hub collects)

Does each step need the previous step's output?
  YES → Is it strictly linear?
    YES → pipeline
    NO  → dag (parallel where possible, `dependsOn` edges)

Does a coordinator need to stay alive and adapt?
  YES → hub-spoke (single-level hub + workers)
        hierarchical (structurally identical in current impl; use for naming/intent)

Is the task about making a decision?
  YES → Do agents need to argue opposing sides?
    YES → debate (adversarial, full mesh)
    NO  → consensus (cooperative, full mesh + coordination.consensusStrategy)

Does the right specialist emerge during processing?
  YES → handoff (sequential chain, one active at a time)

Do all agents need to freely collaborate?
  YES → mesh (full peer-to-peer edges)

Is cost the primary concern?
  YES → cascade (chain of increasingly capable agents; each step's prompt
        decides whether to pass through or redo the prior output)
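The questions above can be sketched as a small lookup function. This is a hypothetical helper, not part of @relayflows/core: the `TaskTraits` field names are invented for illustration, and it assumes the questions are asked in the order listed, with the first "yes" winning.

```typescript
// Hypothetical sketch of the decision framework; nothing here is a library API.
type CorePatternChoice =
  | 'fan-out' | 'pipeline' | 'dag' | 'hub-spoke' | 'debate'
  | 'consensus' | 'handoff' | 'mesh' | 'cascade';

interface TaskTraits {
  independentPerAgent?: boolean;
  needsPreviousOutput?: boolean;
  strictlyLinear?: boolean;
  needsLiveCoordinator?: boolean;
  isDecision?: boolean;
  opposingSides?: boolean;
  specialistEmerges?: boolean;
  freeCollaboration?: boolean;
  costIsPrimary?: boolean;
}

// Assumption: questions are evaluated top to bottom and the first match wins.
function choosePattern(t: TaskTraits): CorePatternChoice | undefined {
  if (t.independentPerAgent) return 'fan-out';
  if (t.needsPreviousOutput) return t.strictlyLinear ? 'pipeline' : 'dag';
  if (t.needsLiveCoordinator) return 'hub-spoke';
  if (t.isDecision) return t.opposingSides ? 'debate' : 'consensus';
  if (t.specialistEmerges) return 'handoff';
  if (t.freeCollaboration) return 'mesh';
  if (t.costIsPrimary) return 'cascade';
  return undefined; // no question matched; fall back to the simplest pattern that works
}
```

For example, `choosePattern({ needsPreviousOutput: true })` returns `'dag'`, because the work is dependent but not strictly linear.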

Pattern Reference (Core 10)

| # | Pattern | Topology (actual edges) | Best For |
|---|---------|-------------------------|----------|
| 1 | fan-out | Hub broadcasts to N workers; workers reply to hub only | Independent subtasks (reviews, research, tests) |
| 2 | pipeline | Linear chain (agent_i → agent_{i+1}) | Ordered stages (design → implement → test) |
| 3 | hub-spoke | Hub ↔ spokes (bidirectional); no spoke-to-spoke | Dynamic coordination, lead reviews/adjusts |
| 4 | consensus | Full mesh; decision via coordination.consensusStrategy | Architecture decisions, approval gates |
| 5 | mesh | Full mesh (every agent ↔ every other) | Brainstorming, collaborative debugging |
| 6 | handoff | Chain; passes control forward | Triage, specialist routing |
| 7 | cascade | Chain of dependsOn steps; all run on success, downstream skipped on upstream failure (no built-in "fall through") | Cost optimization: cheap first, each step's prompt passes through or redoes |
| 8 | dag | Edges from step dependsOn | Mixed dependencies, parallel where possible |
| 9 | debate | Full mesh (same topology as mesh; roles drive behavior) | Rigorous adversarial examination |
| 10 | hierarchical | Hub + subordinates (single-level in current impl) | Large teams; semantic distinction from hub-spoke |

Heads up: hierarchical resolves to the same edge structure as hub-spoke in coordinator.ts:313-319. Multi-level tree topology is not currently implemented; use the pattern name to signal intent, but expect the same runtime graph.
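To make the three basic edge shapes concrete, here is a minimal sketch that builds directed edge lists for a hub, a chain, and a full mesh. The function names are hypothetical and the model is simplified; it is not how coordinator.ts is written, only an illustration of the shapes described in the table.

```typescript
// Hypothetical sketch: directed edges as [from, to] pairs.
type TopologyEdge = [string, string];

// Hub shape (fan-out, hub-spoke, hierarchical in this simplified model):
// hub can message each spoke and each spoke can reply; no spoke-to-spoke edges.
function hubEdges(hub: string, spokes: string[]): TopologyEdge[] {
  const edges: TopologyEdge[] = [];
  for (const spoke of spokes) {
    edges.push([hub, spoke], [spoke, hub]);
  }
  return edges;
}

// Chain shape (pipeline, handoff, cascade): each agent points at the next one.
function chainEdges(agents: string[]): TopologyEdge[] {
  const edges: TopologyEdge[] = [];
  let previous: string | undefined;
  for (const current of agents) {
    if (previous !== undefined) edges.push([previous, current]);
    previous = current;
  }
  return edges;
}

// Full mesh (mesh, consensus, debate): every ordered pair of distinct agents.
function meshEdges(agents: string[]): TopologyEdge[] {
  const edges: TopologyEdge[] = [];
  for (const from of agents) {
    for (const to of agents) {
      if (from !== to) edges.push([from, to]);
    }
  }
  return edges;
}
```

With six agents, `meshEdges` yields 30 directed edges while a hub with five spokes yields 10, which is one way to see why full-mesh patterns get noisy as the team grows.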

Additional Patterns (role-driven)

These 14 additional patterns exist in SwarmPattern (types.ts:114-139). The coordinator has role-based auto-selection heuristics (coordinator.ts:51-165), but they only fire when swarm.pattern is omitted — YAML validation requires it (runner.ts:2105-2117), so auto-selection is effectively a programmatic-API feature. In YAML, set swarm.pattern explicitly.

Topology is still resolved per-pattern once selected; the "Roles the topology keys off" column lists the roles the coordinator looks for when shaping edges (per coordinator.ts:250-450):

| Pattern | Roles the topology keys off | Topology |
|---------|-----------------------------|----------|
| map-reduce | mapper + reducer | coordinator → mappers → reducers → coordinator |
| scatter-gather | | hub → workers → hub |
| supervisor | supervisor | supervisor ↔ workers |
| reflection | critic or reviewer (auto-select uses critic only) | producers → critic → producers (loop) |
| red-team | attacker/red-team + defender/blue-team | adversarial mesh with optional judges |
| verifier | verifier | producers → verifiers → back to producers |
| auction | auctioneer | auctioneer → bidders → auctioneer |
| escalation | tier-* | tiered chain, escalate up / report down |
| saga | saga-orchestrator, compensate-handler | orchestrator ↔ participants |
| circuit-breaker | primary + fallback/backup | try primary, fallback on failure |
| blackboard | blackboard / shared-workspace | shared state hub |
| swarm | hive-mind / swarm-agent | stigmergy-style |
| competitive | none (declared explicitly) | independent parallel implementations + judge |
| review-loop | implement* + 2+ reviewer* | implementer ↔ reviewers |

Structured Squad Review Loop

For serious implementation work, especially workflow generation or product-contract changes, prefer a composite squad-review-loop recipe over a plain single implementer plus final reviewer. This is a workflow authoring recipe built from existing patterns, not a separate SDK enum unless the local runner has added one.

Use this when the fastest reliable path is small teams of 2-3 agents working in parallel with live feedback:

  1. Split the work into bounded implementation squads. Each squad owns a non-overlapping file or subsystem scope.
  2. Give each squad an implementer plus a shadow/review partner. The shadow follows the implementer in real time, checks alignment with the spec, and posts concise feedback before the work drifts.
  3. Require the implementer to self-reflect before external review: compare the final diff against the spec, AGENTS.md / CLAUDE.md, recent local conventions, tests, and declared non-goals.
  4. Run an independent self-review/fresh-eyes agent that reads the actual files and recent repo context, not just the chat transcript.
  5. Send that review back to the implementer for one repair round.
  6. After squads converge, run a final two-agent review team, usually one Claude reviewer and one Codex reviewer, independently. They compare notes, merge findings, and produce one final verdict.
  7. Spawn fresh fix agents for final-review findings. Those fix agents self-reflect, then the final reviewers re-check the post-fix state until the spec is fully satisfied or a blocker is documented.

Pattern selection for this recipe:

  • Use supervisor or hub-spoke when a lead needs to coordinate live squads.
  • Use review-loop when the main risk is code quality and feedback iteration.
  • Use reflection when critic feedback should loop directly back to producers.
  • Use verifier when completion evidence matters more than design debate.
  • Use competitive only when independent alternative implementations are useful; otherwise split by ownership scope.

Keep squads small. Two or three agents per squad is usually the useful limit: implementer, shadow/reviewer, and optionally test/validation owner. More agents belong in separate squads or in the final review team.
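The squad constraints in this recipe (two or three agents per squad, non-overlapping ownership) can be checked mechanically before a workflow is authored. The sketch below is a hypothetical planning helper, not an engine feature; the `SquadPlan` shape is invented for illustration, and it only detects exact path collisions, not overlapping directories.

```typescript
// Hypothetical planning helper for the squad-review-loop recipe.
interface SquadPlan {
  name: string;
  agents: string[]; // e.g. implementer, shadow/reviewer, optional test owner
  scope: string[];  // files or subsystems this squad owns
}

// Returns human-readable problems; an empty array means the plan looks sane.
function squadProblems(squads: SquadPlan[]): string[] {
  const problems: string[] = [];
  const owner = new Map<string, string>();
  for (const squad of squads) {
    if (squad.agents.length < 2 || squad.agents.length > 3) {
      problems.push(`${squad.name}: expected 2-3 agents, found ${squad.agents.length}`);
    }
    for (const path of squad.scope) {
      const existing = owner.get(path);
      if (existing !== undefined) {
        problems.push(`${path} is claimed by both ${existing} and ${squad.name}`);
      } else {
        owner.set(path, squad.name);
      }
    }
  }
  return problems;
}
```

Running this over a draft plan before spawning agents is a cheap way to catch two squads editing the same file.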

Pattern Details

All examples below use real API shapes (WorkflowBuilder / YAML), verified against @relayflows/core's builder.d.ts and schema.d.ts.

YAML fragments vs complete configs: The per-pattern YAML snippets below are fragments that show only the pattern-relevant shape. A runnable YAML file also requires version: "1.0" and name: <id> at the top (runner.ts:2105-2117). See the Complete YAML Example for the full structure.
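As a rough illustration of why a fragment alone will not run, the following sketch checks a parsed config object for the three fields this document says YAML validation requires (version, name, swarm.pattern). The function and interface are hypothetical; the real validation in the runner may check more and report errors differently.

```typescript
// Hypothetical pre-flight check; mirrors only the requirements stated in this document.
interface WorkflowConfigSketch {
  version?: string;
  name?: string;
  swarm?: { pattern?: string };
}

// Returns the names of required top-level fields that are missing or empty.
function missingRequiredFields(config: WorkflowConfigSketch): string[] {
  const missing: string[] = [];
  if (!config.version) missing.push('version');
  if (!config.name) missing.push('name');
  if (!config.swarm?.pattern) missing.push('swarm.pattern');
  return missing;
}
```

A per-pattern fragment such as `{ swarm: { pattern: 'pipeline' } }` would report `version` and `name` as missing.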

Topology edges exclude interactive: false agents. resolveTopology (coordinator.ts:218-237) drops non-interactive agents from the message graph — they run as one-shot subprocesses with no relay connection. Topology claims like "hub ↔ spokes" describe the interactive-agent edges; workers marked interactive: false are spawned and collected via stdout, not via relay messages.
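The filtering described here can be pictured with a short sketch: drop non-interactive agents first, then build hub edges over what remains. This is an illustrative model under stated assumptions (hub is the first agent with role `lead`, else the first interactive agent; `interactive` defaults to true), not the actual resolveTopology code.

```typescript
// Hypothetical model of "non-interactive agents are excluded from the message graph".
interface AgentSketch {
  name: string;
  role?: string;
  interactive?: boolean; // assumed to default to true when omitted
}

function interactiveAgents(agents: AgentSketch[]): AgentSketch[] {
  return agents.filter((agent) => agent.interactive !== false);
}

// Builds hub <-> spoke edges over interactive agents only.
function hubSpokeEdges(agents: AgentSketch[]): Array<[string, string]> {
  const live = interactiveAgents(agents);
  const hub = live.find((agent) => agent.role === 'lead') ?? live[0];
  if (!hub) return [];
  const edges: Array<[string, string]> = [];
  for (const agent of live) {
    if (agent.name === hub.name) continue;
    edges.push([hub.name, agent.name], [agent.name, hub.name]);
  }
  return edges;
}
```

In this model a worker declared with `interactive: false` never appears in any edge, which matches the point that its results come back through stdout rather than relay messages.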

1. fan-out — Parallel Workers

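import { workflow } from '@relayflows/core';

// The two reviewers are one-shot workers (interactive: false); their output is collected via stdout.
await workflow('review')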
  .pattern('fan-out')
  .agent('lead', { cli: 'claude', role: 'lead' })
  .agent('auth-rev', { cli: 'claude', role: 'worker', interactive: false })
  .agent('db-rev', { cli: 'claude', role: 'worker', interactive: false })
  .step('review-auth', { agent: 'auth-rev', task: 'Review auth.ts' })
  .step('review-db', { agent: 'db-rev', task: 'Review db.ts' })
  .run();

Workers run independently; hub aggregates. No inter-worker edges.

2. pipeline — Sequential Stages

swarm: { pattern: pipeline }
agents:
  - { name: designer, cli: claude }
  - { name: implementer, cli: codex, interactive: false }
  - { name: tester, cli: codex, interactive: false }
workflows:
  - name: build
    steps:
      - {
          name: design,
          agent: designer,
          task: 'Design the API schema',
          verification: { type: output_contains, value: DONE },
        }
      - {
          name: implement,
          agent: implementer,
          dependsOn: [design],
          task: 'Implement: {{steps.design.output}}',
        }
      - { name: test, agent: tester, dependsOn: [implement], task: 'Write integration tests' }

Each stage receives the previous stage's output via {{steps.<name>.output}}. The run halts on step failure unless onError is set to retry or continue.
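To show what the `{{steps.<name>.output}}` placeholder does conceptually, here is a minimal interpolation sketch. It is an assumption-laden stand-in for the engine's templating: `renderTask` and `RenderContext` are hypothetical names, and leaving unknown placeholders untouched is a choice made for this example, not documented engine behavior.

```typescript
// Hypothetical template renderer illustrating {{var}} and {{steps.<name>.output}}.
interface RenderContext {
  vars: Record<string, string>;
  steps: Record<string, { output: string }>;
}

function renderTask(template: string, ctx: RenderContext): string {
  return template.replace(/\{\{\s*([\w.-]+)\s*\}\}/g, (match: string, key: string) => {
    const stepMatch = /^steps\.([\w-]+)\.output$/.exec(key);
    if (stepMatch) {
      const stepName = stepMatch[1];
      const step = stepName === undefined ? undefined : ctx.steps[stepName];
      return step ? step.output : match; // unknown step: leave the placeholder as-is
    }
    const value = ctx.vars[key];
    return value === undefined ? match : value; // unknown var: leave the placeholder as-is
  });
}
```

For instance, rendering `'Implement: {{steps.design.output}}'` with a `design` step whose output is `'schema v1'` gives `'Implement: schema v1'`.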

3. hub-spoke — Persistent Coordinator

import { workflow } from '@relayflows/core';

await workflow('api-build')
  .pattern('hub-spoke')
  .channel('swarm-api')
  .agent('lead', { cli: 'claude', role: 'lead' })
  .agent('db-worker', { cli: 'claude', role: 'worker' }) // interactive by default — hub DMs it
  .agent('api-worker', { cli: 'claude', role: 'worker' }) // interactive by default — hub DMs it
  .step('models', { agent: 'db-worker', task: 'Build database models' })
  .step('routes', { agent: 'api-worker', task: 'Build route handlers', dependsOn: ['models'] })
  .step('review', { agent: 'lead', task: 'Review everything', dependsOn: ['routes'] })
  .run();

Hub (picked via role: lead or first agent) stays on the channel and direct-messages interactive workers via the flat send_dm MCP tool, often exposed by workflow prompts as mcp__relaycast__send_dm.

Don't set interactive: false on a hub-spoke worker if you want it to receive coordination DMs — resolveTopology strips non-interactive agents from the message graph (coordinator.ts:218-237). Use interactive: false only when the worker is a one-shot subprocess whose stdout you collect via {{steps.X.output}} without any mid-run coordination.

4. consensus — Cooperative Voting

swarm: { pattern: consensus }
agents:
  - { name: perf, cli: claude, role: reviewer }
  - { name: dx, cli: claude, role: reviewer }
  - { name: sec, cli: claude, role: reviewer }
coordination:
  consensusStrategy: majority # declarative marker: majority | unanimous | quorum
  votingThreshold: 0.66
workflows:
  - name: decide
    steps:
      - { name: evaluate-perf, agent: perf, task: 'Evaluate perf of Fastify migration' }
      - { name: evaluate-dx, agent: dx, task: 'Evaluate DX of Fastify migration' }
      - { name: evaluate-sec, agent: sec, task: 'Evaluate security of Fastify migration' }

Full-mesh topology. Caveat: coordination.consensusStrategy and votingThreshold are declared in CoordinationConfig (types.ts:768-772) but the runner has no built-in vote-tallying logic — the fields only influence coordinator auto-selection (coordinator.ts:63-64). To implement voting, aggregate the step outputs in a downstream lead/judge step that reads {{steps.evaluate-*.output}}.
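Because tallying has to happen outside the engine, a judge step (or a post-processing script) needs some convention for reading votes. The sketch below assumes each reviewer ends its output with `VOTE: APPROVE` or `VOTE: REJECT`; that convention, the function names, and the interpretation of each strategy (majority means more than half, quorum means the approval share meets `votingThreshold`) are all assumptions made for illustration.

```typescript
// Hypothetical vote tally over collected step outputs; not an engine feature.
type ConsensusStrategySketch = 'majority' | 'unanimous' | 'quorum';
type Vote = 'approve' | 'reject' | 'abstain';

// Assumed convention: outputs contain "VOTE: APPROVE" or "VOTE: REJECT".
function parseVote(output: string): Vote {
  const match = /VOTE:\s*(APPROVE|REJECT)/i.exec(output);
  if (!match || match[1] === undefined) return 'abstain';
  return match[1].toUpperCase() === 'APPROVE' ? 'approve' : 'reject';
}

// Abstentions count toward the total but not toward approvals.
function isApproved(
  outputs: string[],
  strategy: ConsensusStrategySketch,
  votingThreshold: number = 0.5,
): boolean {
  if (outputs.length === 0) return false;
  const approvals = outputs.map(parseVote).filter((vote) => vote === 'approve').length;
  const share = approvals / outputs.length;
  if (strategy === 'unanimous') return approvals === outputs.length;
  if (strategy === 'quorum') return share >= votingThreshold;
  return share > 0.5; // majority
}
```

With two approvals out of three reviewers, `majority` passes, `unanimous` fails, and `quorum` passes at a threshold of 0.66 but fails at 0.7.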

5. mesh — Peer Collaboration

import { workflow } from '@relayflows/core';

await workflow('debug-auth')
  .pattern('mesh')
  .channel('swarm-debug')
  .agent('logs', { cli: 'claude' })
  .agent('code', { cli: 'claude' })
  .agent('repro', { cli: 'claude' })
  .step('logs', { agent: 'logs', task: 'Check server logs' })
  .step('code', { agent: 'code', task: 'Review auth code' })
  .step('repro', { agent: 'repro', task: 'Write repro test' })
  .run();

Every agent ↔ every other agent. Use for collaborative exploration without hierarchy.

6. handoff — Dynamic Routing

swarm: { pattern: handoff }
agents:
  - { name: triage, cli: claude }
  - { name: billing, cli: claude }
  - { name: tech, cli: claude }
workflows:
  - name: support
    steps:
      - { name: triage, agent: triage, task: 'Triage: {{request}}' }
      - { name: billing, agent: billing, dependsOn: [triage], task: 'Handle billing' }
      - { name: tech, agent: tech, dependsOn: [triage], task: 'Handle tech issues' }

Chain passes control forward. Note: The runner doesn't support "route to one branch and skip the others" declaratively — dependsOn steps all run when their dependencies complete, and skipping is only triggered by upstream failure (runner.ts:7057-7088). For true pick-one routing, have the triage step emit a routing token in its output and let each downstream step's prompt check {{steps.triage.output}} and no-op if it doesn't match.
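The routing-token workaround can be sketched in two parts: a parser for the token and a task builder that tells each branch to no-op when the token does not match. The `ROUTE: <branch>` line format, the `SKIPPED` reply, and both function names are assumptions chosen for this example; only the `{{steps.triage.output}}` placeholder comes from the workflow syntax shown in this document.

```typescript
// Hypothetical routing-token helpers for the handoff workaround.

// Assumed convention: the triage step prints a line like "ROUTE: billing".
function parseRoute(triageOutput: string): string | undefined {
  const match = /^ROUTE:\s*([\w-]+)\s*$/m.exec(triageOutput);
  return match ? match[1] : undefined;
}

// Builds a downstream task that checks the token itself and no-ops on mismatch.
function branchTask(branch: string, work: string): string {
  return [
    'Triage result:\n{{steps.triage.output}}',
    `If the triage result does not contain the line "ROUTE: ${branch}", reply "SKIPPED" and stop.`,
    `Otherwise: ${work}`,
  ].join('\n\n');
}
```

Each branch step would then use something like `branchTask('billing', 'Handle billing')` as its task, so every branch still runs but only the matching one does real work.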

7. cascade — Cost-Aware Fallthrough

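import { workflow } from '@relayflows/core';

// All three tiers run in order on success; each prompt decides whether to pass through or redo.
await workflow('answer')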
  .pattern('cascade')
  .agent('haiku', { cli: 'claude', model: 'claude-haiku-4-5-20251001' })
  .agent('sonnet', { cli: 'claude', model: 'claude-sonnet-4-6' })
  .agent('opus', { cli: 'claude', model: 'claude-opus-4-7' })
  .step('try-haiku', { agent: 'haiku', task: '{{question}}' })
  .step('try-sonnet', {
    agent: 'sonnet',
    task: 'If this is a complete answer, echo it verbatim. Otherwise answer anew:\n{{steps.try-haiku.output}}',
    dependsOn: ['try-haiku'],
  })
  .step('try-opus', {
    agent: 'opus',
    task: 'Final-tier answer, using prior attempts for context:\n{{steps.try-sonnet.output}}',
    dependsOn: ['try-sonnet'],
  })
  .run();

Important: cascade only sets edge topology. The runner has no skip-on-success logic for the cascade pattern — a chain of dependsOn steps all execute in order on success, and failed upstream steps mark their dependents as skipped (step-executor.ts:329-334, runner.ts:7057-7088). So a verification-gated first step won't "fall through" to later steps on failure, and won't skip them on success either. The idiom above delegates the escalation decision to the prompt of each downstream step (read the upstream answer and pass-through or redo). No confidence-score parsing exists in-engine.
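The run/skip behavior described above is easy to misread, so here is a tiny simulation of a linear dependsOn chain. It models only what this section states (every step runs while upstream succeeds; once one fails, everything downstream is skipped) and ignores retries; `simulateChain` is a hypothetical name, not runner code.

```typescript
// Hypothetical simulation of a linear dependsOn chain; retries are ignored.
type ChainStepStatus = 'completed' | 'failed' | 'skipped';

// succeeds[i] says whether step i would succeed if it were allowed to run.
function simulateChain(succeeds: boolean[]): ChainStepStatus[] {
  const statuses: ChainStepStatus[] = [];
  let upstreamOk = true;
  for (const ok of succeeds) {
    if (!upstreamOk) {
      statuses.push('skipped'); // an upstream failure skips this step
      continue;
    }
    statuses.push(ok ? 'completed' : 'failed');
    upstreamOk = ok;
  }
  return statuses;
}
```

`simulateChain([true, true, true])` gives three `completed` entries (no tier is skipped on success), and `simulateChain([false, true, true])` gives `failed, skipped, skipped` (no fall-through to the stronger tiers).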

8. dag — Directed Acyclic Graph

import { workflow } from '@relayflows/core';

await workflow('fullstack')
  .pattern('dag')
  .maxConcurrency(3)
  .agent('dev', { cli: 'codex', role: 'worker' })
  .step('scaffold', { agent: 'dev', task: 'Create project scaffold' })
  .step('frontend', { agent: 'dev', task: 'Build React UI', dependsOn: ['scaffold'] })
  .step('backend', { agent: 'dev', task: 'Build API', dependsOn: ['scaffold'] })
  .step('integrate', { agent: 'dev', task: 'Wire together', dependsOn: ['frontend', 'backend'] })
  .run();

The runner derives execution waves from dependsOn; independent nodes run in parallel, up to swarm.maxConcurrency. When swarm.pattern is omitted (possible only through the programmatic API), the coordinator auto-selects dag if any step has dependsOn.
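One way to picture wave derivation is a repeated "take everything whose dependencies are done" pass, capped by the concurrency limit. The sketch below is a simplified, hypothetical model: the real runner may start a step as soon as its own dependencies finish rather than waiting for a whole wave, and `executionWaves` is not a library function.

```typescript
// Hypothetical wave computation over a dependsOn graph.
interface DagStepSketch {
  name: string;
  dependsOn?: string[];
}

function executionWaves(steps: DagStepSketch[], maxConcurrency: number = Infinity): string[][] {
  const done = new Set<string>();
  let pending = steps.slice();
  const waves: string[][] = [];
  while (pending.length > 0) {
    // A step is ready once every dependency has completed in an earlier wave.
    const ready = pending.filter((step) => (step.dependsOn ?? []).every((dep) => done.has(dep)));
    if (ready.length === 0) {
      throw new Error('dependsOn graph has a cycle or references an unknown step');
    }
    const wave = ready.slice(0, maxConcurrency).map((step) => step.name);
    waves.push(wave);
    for (const name of wave) done.add(name);
    pending = pending.filter((step) => !done.has(step.name));
  }
  return waves;
}
```

For the fullstack example this yields `scaffold`, then `frontend` and `backend` together, then `integrate`; with a concurrency limit of 1 the middle wave splits into two.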

9. debate — Adversarial Refinement

Debate currently shares the full-mesh topology with mesh and consensus. Differentiate via roles + task prompts:

swarm: { pattern: debate }
agents:
  - { name: pro, cli: claude, role: debater, task: 'Argue FOR monorepo' }
  - { name: con, cli: claude, role: debater, task: 'Argue FOR polyrepo' }
  - { name: judge, cli: claude, role: judge, task: 'Decide after 3 rounds' }
coordination:
  barriers:
    - { name: debate-done, waitFor: [pro-round-3, con-round-3] }

Drive rounds and verdicts through the agent's system prompt/task, not a dedicated maxRounds knob — there isn't one at the pattern level.
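Since there is no round counter in the engine, one option is to unroll the rounds into explicit steps. The generator below is a hypothetical authoring helper: it emits step objects using the `name` / `agent` / `task` / `dependsOn` fields seen elsewhere in this document, names them `<debater>-round-<n>` so they line up with the barrier's `waitFor` entries, and uses prompt wording invented for the example.

```typescript
// Hypothetical helper that unrolls debate rounds into explicit steps.
interface RoundStepSketch {
  name: string;
  agent: string;
  task: string;
  dependsOn: string[];
}

function debateRounds(debaters: string[], rounds: number, topic: string): RoundStepSketch[] {
  const steps: RoundStepSketch[] = [];
  for (let round = 1; round <= rounds; round++) {
    // Every step in round N waits for all steps of round N-1.
    const previous: string[] =
      round === 1 ? [] : debaters.map((debater) => `${debater}-round-${round - 1}`);
    const priorOutputs = previous.map((step) => `{{steps.${step}.output}}`).join('\n');
    for (const debater of debaters) {
      steps.push({
        name: `${debater}-round-${round}`,
        agent: debater,
        task:
          round === 1
            ? `Opening argument on: ${topic}`
            : `Respond to the previous round on ${topic}:\n${priorOutputs}`,
        dependsOn: previous,
      });
    }
  }
  return steps;
}
```

`debateRounds(['pro', 'con'], 3, 'monorepo vs polyrepo')` produces six steps ending in `pro-round-3` and `con-round-3`, which a judge step or barrier can then wait on.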

10. hierarchical — Multi-Level (structurally hub-spoke today)

import { workflow } from '@relayflows/core';

await workflow('large-team')
  .pattern('hierarchical')
  .agent('lead', { cli: 'claude', role: 'lead' })
  .agent('fe-coord', { cli: 'claude', role: 'coordinator' })
  .agent('be-coord', { cli: 'claude', role: 'coordinator' })
  .agent('fe-dev', { cli: 'codex', role: 'worker', interactive: false })
  .agent('be-dev', { cli: 'codex', role: 'worker', interactive: false })
  .step('plan', { agent: 'lead', task: 'Coordinate full-stack app' })
  .step('fe-plan', { agent: 'fe-coord', task: 'Manage frontend', dependsOn: ['plan'] })
  .step('be-plan', { agent: 'be-coord', task: 'Manage backend', dependsOn: ['plan'] })
  .step('fe-impl', { agent: 'fe-dev', task: 'Build components', dependsOn: ['fe-plan'] })
  .step('be-impl', { agent: 'be-dev', task: 'Build API', dependsOn: ['be-plan'] })
  .run();

The coordinator/worker distinction is expressed in the step dependsOn graph, not in the topology. Agent edges collapse to single-level hub-spoke.

Verification & Completion Signals

An agent step can complete in several ways (runner.ts:5353-5395, runner.ts:4527-4538):

  • Verification pass — when the step declares a verification block and the output satisfies it.
  • Clean process exit — agent exits 0 with no verification configured.
  • Evidence-based — channel posts, file changes, or coordination signals trigger completion.
  • Owner decision — a lead-role agent posts COMPLETE / INCOMPLETE_RETRY / INCOMPLETE_FAIL for the step.

Verification block shape:

verification:
  type: output_contains # or: exit_code | file_exists | custom
  value: DONE # or: PLAN_COMPLETE, IMPLEMENTATION_COMPLETE, REVIEW_COMPLETE

Conventional signals expected by the @relayflows/core runner:

  • ACK: ... — received a task
  • DONE: ... — task complete

The runner captures PTY chunks as step output and also records channel posts + file changes as StepCompletionEvidence. Legacy fallback: a file at .relay/summaries/{stepName}.md is read if PTY output is empty.
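For intuition, an `output_contains` check can be modeled as a plain substring test over the captured output, and the ACK/DONE convention as a line filter. Both functions below are hypothetical sketches; the real runner may normalize output (for example, stripping terminal escape codes) or match differently.

```typescript
// Hypothetical model of output_contains verification and ACK/DONE signal lines.
interface OutputContainsCheck {
  type: 'output_contains';
  value: string;
}

// Assumption: a case-sensitive substring match over the whole captured output.
function passesOutputContains(check: OutputContainsCheck, output: string): boolean {
  return output.includes(check.value);
}

// Returns trimmed lines that start with the conventional "ACK:" or "DONE:" prefixes.
function signalLines(output: string): string[] {
  return output
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => /^(ACK|DONE):/.test(line));
}
```

Under a substring model like this one, a short token such as `DONE` would also match text like "NOT DONE YET", which is one reason to prefer distinctive tokens such as `PLAN_COMPLETE`.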

Agent Relay MCP - Correct Tool Names

The old category-expanded names are wrong. Current Agent Relay MCP tools are flat names. In a client that decorates MCP tools, the prefix comes from the configured server key; workflow prompts commonly show mcp__relaycast__send_dm, while an agent-relay server key may expose mcp__agent_relay__send_dm.

| Purpose | Canonical tool | Common workflow-prefixed form |
|---------|----------------|-------------------------------|
| Send DM to another agent | send_dm | mcp__relaycast__send_dm |
| Check inbox | check_inbox | mcp__relaycast__check_inbox |
| List agents | list_agents | mcp__relaycast__list_agents |
| Post to a channel | post_message | mcp__relaycast__post_message |
| Reply in a thread | reply_to_thread | mcp__relaycast__reply_to_thread |
| Spawn sub-agent | add_agent | mcp__relaycast__add_agent |
| Remove sub-agent | remove_agent | mcp__relaycast__remove_agent |

interactive: false agents run as non-interactive subprocesses with no relay connection. They must not call Relay MCP tools.
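The relationship between the flat tool name and its decorated form can be sketched as simple string assembly. This is an assumption about how a decorating client builds the name (prefix `mcp__`, the server key with non-identifier characters replaced by underscores, then the tool name), inferred from the `relaycast` and `agent-relay` examples above; check your client for its actual behavior.

```typescript
// Hypothetical reconstruction of the decorated MCP tool name.
function prefixedToolName(serverKey: string, tool: string): string {
  // Assumption: characters outside [A-Za-z0-9_] in the server key become underscores.
  const normalizedKey = serverKey.replace(/[^A-Za-z0-9_]/g, '_');
  return `mcp__${normalizedKey}__${tool}`;
}
```

Under that assumption, `prefixedToolName('relaycast', 'send_dm')` gives `mcp__relaycast__send_dm` and `prefixedToolName('agent-relay', 'send_dm')` gives `mcp__agent_relay__send_dm`.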

Reflection (Trajectories)

Reflection is not a reflectionThreshold callback. It's configured via the trajectories: block:

trajectories:
  enabled: true
  reflectOnBarriers: true # config flag exists but runner does NOT currently invoke this path
  reflectOnConverge: true # fires at parallel convergence points (runner.ts:2762-2779)
  autoDecisions: true # record retry/skip/fail decisions

What actually runs today: only reflectOnConverge is wired into the runner (runner.ts:2762-2779). shouldReflectOnBarriers is defined in trajectory.ts:486-487 but not called — set the flag if you want forward compatibility, but don't depend on it.

Programmatic equivalent:

workflow('x').trajectories({ enabled: true, reflectOnConverge: true });

For a first-class critic loop, use the reflection pattern (agents with role: critic get wired as reviewers in coordinator.ts:363-378).

Common Mistakes

| Mistake | Why It Fails | Fix |
|---------|--------------|-----|
| Using mesh/debate for everything | Full-mesh blows up message volume past ~5 agents | Use hub-spoke or dag for most tasks |
| Pipeline for independent work | Sequential bottleneck | Use fan-out or dag |
| Hub-spoke for 2 agents | Hub is unnecessary overhead | Use pipeline or fan-out |
| Expecting consensusStrategy to tally votes | Runner has no vote-tally logic; field only affects coordinator auto-selection | Aggregate votes in a judge/lead step that reads {{steps.*.output}} |
| Handoff with "routing = skip other branches" | Skipping only fires on upstream failure, not routing decisions | Emit a routing token in triage output; downstream prompts self-no-op if token doesn't match |
| Cascade expecting skip-on-success | Runner has no cascade skip logic; failed upstream skips downstream | Chain downstream prompts to pass-through or redo based on {{steps.previous.output}} |
| Relying on reflectOnBarriers | Config flag exists but runner never calls it | Use reflectOnConverge for convergence reflection; use reflection pattern for critic loops |
| interactive: false agent calling MCP | Non-interactive subprocess has no relay | Use interactive: true (default) or emit output on stdout |
| Relying on multi-level hierarchical | Topology is single-level hub in current impl | Use pattern for naming; model levels via dependsOn graph |
| Writing mcp__relaycast__send(...) | Wrong tool name | Use post_message / mcp__relaycast__post_message or send_dm / mcp__relaycast__send_dm |

Resume & Re-run

import { runWorkflow } from '@relayflows/core';

// Resume a failed run:
await runWorkflow('feature-dev.yaml', { resume: '<runId>' });

// Skip ahead, re-using cached outputs from an earlier run:
await runWorkflow('feature-dev.yaml', {
  startFrom: 'review',
  previousRunId: '<runId>',
});

Cached step outputs live in .agent-relay/step-outputs/, and run records in .agent-relay/workflow-runs.jsonl. The env vars RESUME_RUN_ID, START_FROM, and PREVIOUS_RUN_ID are auto-detected.
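If you want to see which resume options a given environment would imply before launching a run, a small mapping function is enough. The sketch below assumes each env var corresponds to the option of the same meaning shown in the snippet above (`RESUME_RUN_ID` to `resume`, `START_FROM` to `startFrom`, `PREVIOUS_RUN_ID` to `previousRunId`); that mapping and the function name are assumptions, since the engine performs its own detection.

```typescript
// Hypothetical mapping from the auto-detected env vars to runWorkflow-style options.
interface ResumeOptionsSketch {
  resume?: string;
  startFrom?: string;
  previousRunId?: string;
}

// Takes the environment as a parameter so it can be tested without touching process.env.
function resumeOptionsFromEnv(env: Record<string, string | undefined>): ResumeOptionsSketch {
  const options: ResumeOptionsSketch = {};
  const resume = env['RESUME_RUN_ID'];
  const startFrom = env['START_FROM'];
  const previousRunId = env['PREVIOUS_RUN_ID'];
  if (resume) options.resume = resume;
  if (startFrom) options.startFrom = startFrom;
  if (previousRunId) options.previousRunId = previousRunId;
  return options;
}
```

In a Node script you could call it with `process.env` to log what a run would pick up.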

Complete YAML Example

version: '1.0'
name: feature-dev
description: 'Blueprint-style feature development with quality gates.'
swarm:
  pattern: hub-spoke
  maxConcurrency: 2
  timeoutMs: 3600000
  channel: swarm-feature-dev
  idleNudge: { nudgeAfterMs: 120000, escalateAfterMs: 120000, maxNudges: 1 }
agents:
  - { name: lead, cli: claude, role: lead, permissions: { access: full } }
  - { name: planner, cli: codex, role: planner, interactive: false, permissions: { access: readonly } }
  - { name: developer, cli: codex, role: worker, interactive: false, permissions: { access: readwrite } }
  - { name: reviewer, cli: claude, role: reviewer, permissions: { access: readonly } }
workflows:
  - name: feature-delivery
    onError: retry
    preflight:
      - { command: 'git status --porcelain', failIf: non-empty, description: 'Clean worktree' }
    steps:
      - name: plan
        agent: planner
        task: 'Plan: {{task}}'
        verification: { type: output_contains, value: PLAN_COMPLETE }
      - name: implement
        agent: developer
        dependsOn: [plan]
        task: 'Implement: {{steps.plan.output}}'
        verification: { type: output_contains, value: IMPLEMENTATION_COMPLETE }
      - name: test
        type: deterministic
        dependsOn: [implement]
        command: npm test
      - name: review
        agent: reviewer
        dependsOn: [test]
        task: 'Review implementation'
        verification: { type: output_contains, value: REVIEW_COMPLETE }
coordination:
  barriers:
    - { name: delivery-ready, waitFor: [plan, implement, review], timeoutMs: 900000 }
trajectories:
  enabled: true
  reflectOnBarriers: true
  reflectOnConverge: true
errorHandling:
  strategy: retry
  maxRetries: 2
  retryDelayMs: 5000

Built-in templates live in @relayflows/core/dist/builtin-templates/ (feature-dev, bug-fix, code-review, competitive, documentation, refactor, review-loop, security-audit).

Source of Truth

| Claim | File |
|-------|------|
| Pattern enum (24 patterns) | @relayflows/core/dist/schema.d.ts (SwarmPattern) |
| Topology resolution per pattern | @relayflows/core/dist/coordinator.js |
| Interactive-only topology edges | @relayflows/core/dist/coordinator.js filters interactive: false agents |
| Pattern auto-selection heuristics | @relayflows/core/dist/coordinator.js |
| WorkflowBuilder fluent API | @relayflows/core/dist/builder.d.ts |
| runWorkflow(yamlPath, options) | @relayflows/core/dist/run.d.ts |
| YAML validation requires version + name + swarm.pattern | @relayflows/core/dist/runner.js |
| MCP tool names | packages/cli/src/cli/agent-relay-mcp.ts, @relayflows/core/dist/channel-messenger.js |
| Completion modes (verification / evidence / owner / process-exit) | @relayflows/core/dist/runner.js, @relayflows/core/dist/step-executor.js |
| Trajectory reflection | @relayflows/core/dist/trajectory.js, @relayflows/core/dist/runner.js |

Install via CLI

npx skills add https://github.com/AgentWorkforce/relay --skill choosing-swarm-patterns

Repository Details

  • Stars: 726
  • Forks: 58
  • Branch: main
  • Path: SKILL.md