coordination-patterns


This skill should be used when the user asks about "agent coordination", "MAS architecture", "blackboard pattern", "orchestrator pattern", "how agents communicate", "multi-agent workflow", "event-driven agents", "context engineering", "control flow", "stateless reducers", or needs to design how multiple agents work together. Covers patterns from Planner-Executor-Verifier to event-driven architectures, plus 12 Factor Agents principles.

By MarkusDegen · Updated 1/17/2026

name: Coordination Patterns
version: 1.1.0

Coordination Patterns

Pattern Selection Guide

Pattern                     When to Use                     Complexity
Planner→Executor→Verifier   Default starting point          Low
Blackboard                  Multiple agents, shared state   Medium
Orchestrator-Worker         Dynamic task assignment         Medium
Hierarchical                Deep delegation chains          High
Event-Driven                High reliability needs          High
Market-Based                Dynamic load balancing          High

Pattern 1: Planner → Executor → Verifier (Minimum Viable MAS)

The baseline pattern that works most often.

User Request
    │
    ▼
┌─────────┐
│ Planner │ ──► Task Graph
└─────────┘
    │
    ▼
┌──────────┐
│ Executor │ ──► Results
└──────────┘
    │
    ▼
┌──────────┐
│ Verifier │ ──► PASS/FAIL
└──────────┘
    │
    ├─► PASS ──► Return Result
    │
    └─► FAIL ──► Re-plan with Feedback

Implementation

## Coordination Protocol

1. Planner receives requirements, outputs task graph
2. Executor processes tasks sequentially or in parallel
3. Verifier checks all outputs against requirements
4. On FAIL: Planner receives feedback, re-plans
5. Max 3 iterations before escalation
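The protocol can be sketched as a small driver loop. This is a minimal sketch. It assumes the planner, executor, and verifier are plain callables that you supply. The task graph is flattened to a list, and tasks run sequentially. `run_pev`, `Verdict`, and `EscalationRequired` are illustrative names, not part of any library.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    passed: bool
    feedback: str = ""


class EscalationRequired(Exception):
    """Raised when the loop exhausts its iteration budget without a PASS."""


def run_pev(
    requirements: str,
    plan: Callable[[str, str], list[str]],         # (requirements, feedback) -> tasks
    execute: Callable[[str], str],                 # task -> result
    verify: Callable[[str, list[str]], Verdict],   # (requirements, results) -> verdict
    max_iterations: int = 3,
) -> list[str]:
    feedback = ""
    for _ in range(max_iterations):
        tasks = plan(requirements, feedback)       # 1. Planner outputs the task graph
        results = [execute(t) for t in tasks]      # 2. Executor runs tasks (sequentially here)
        verdict = verify(requirements, results)    # 3. Verifier checks against requirements
        if verdict.passed:
            return results
        feedback = verdict.feedback                # 4. On FAIL, re-plan with feedback
    # 5. Iteration budget exhausted: hand off to a human or higher-level agent
    raise EscalationRequired(f"no PASS after {max_iterations} iterations: {feedback}")
```

The verifier's feedback is the only channel back to the planner. This keeps each re-plan grounded in a concrete failure reason rather than a bare retry.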

When to Use

  • Starting a new MAS project
  • Tasks have clear decomposition
  • Verification is important but not adversarial

Limitations

  • Sequential bottleneck when tasks are independent and could run in parallel
  • A single verifier may miss issues
  • No intermediate checkpoints between stages

Pattern 2: Blackboard Architecture

All agents read from and write to shared state.

┌─────────────────────────────────────┐
│           BLACKBOARD                │
│  ┌─────────┐ ┌─────────┐ ┌───────┐  │
│  │  Plan   │ │ Results │ │ State │  │
│  └─────────┘ └─────────┘ └───────┘  │
└─────────────────────────────────────┘
       ▲            ▲           ▲
       │            │           │
   ┌───┴───┐   ┌────┴────┐  ┌───┴────┐
   │Planner│   │Executors│  │Verifier│
   └───────┘   └─────────┘  └────────┘

Implementation

{
  "blackboard": {
    "sections": {
      "plan": {
        "owner": "Planner",
        "writers": ["Planner"],
        "readers": ["Executor", "Verifier"]
      },
      "results": {
        "owner": "Executor",
        "writers": ["Executor"],
        "readers": ["Verifier", "Orchestrator"]
      },
      "verdicts": {
        "owner": "Verifier",
        "writers": ["Verifier"],
        "readers": ["Planner", "Orchestrator"]
      }
    }
  }
}

Key Rules

  1. No direct overwrites: Agents cannot modify others' sections
  2. Versioned updates: Every write increments version
  3. Read permissions: Explicit per section
  4. Conflict-free: Writers have exclusive sections
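The four rules can be enforced by a thin wrapper around the section config shown above. This is a minimal, single-process sketch with no locking. `Blackboard` and `PermissionDenied` are hypothetical names. Letting a section's writers also read it is an assumption that the JSON config does not state.

```python
from copy import deepcopy


class PermissionDenied(Exception):
    pass


class Blackboard:
    """In-memory blackboard; `sections` uses the same shape as the JSON config."""

    def __init__(self, sections: dict[str, dict]):
        self._acl = sections
        self._data = {name: None for name in sections}
        self._version = {name: 0 for name in sections}
        self.history: list[tuple[str, str, int]] = []  # (section, writer, version) audit trail

    def write(self, agent: str, section: str, value) -> int:
        # Rules 1 and 4: only the section's writers may modify it
        if agent not in self._acl[section]["writers"]:
            raise PermissionDenied(f"{agent} cannot write {section}")
        self._data[section] = deepcopy(value)
        self._version[section] += 1  # Rule 2: every write increments the version
        self.history.append((section, agent, self._version[section]))
        return self._version[section]

    def read(self, agent: str, section: str):
        # Rule 3: explicit per-section read permissions (writers may read their own section)
        acl = self._acl[section]
        if agent not in acl["readers"] and agent not in acl["writers"]:
            raise PermissionDenied(f"{agent} cannot read {section}")
        return deepcopy(self._data[section]), self._version[section]
```

Returning the version with every read lets an agent detect that a section changed since it last looked. Storing copies prevents readers from mutating shared state behind the writer's back.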

When to Use

  • Multiple agents need shared context
  • Want to avoid direct agent-to-agent communication
  • Need audit trail of all state changes

Pattern 3: Orchestrator-Worker

Central orchestrator assigns tasks to worker agents.

          ┌──────────────┐
          │ Orchestrator │
          └──────────────┘
           /      |      \
          ▼       ▼       ▼
    ┌────────┐ ┌────────┐ ┌────────┐
    │Worker 1│ │Worker 2│ │Worker 3│
    └────────┘ └────────┘ └────────┘

Implementation

## Orchestrator Responsibilities
- Receive user request
- Decompose into tasks
- Assign to available workers
- Collect results
- Handle failures and retries
- Return final result

## Worker Protocol
- Poll for tasks or receive them via push
- Execute assigned task
- Report result to orchestrator
- No direct worker-to-worker communication
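The responsibilities above can be sketched with Python's standard `concurrent.futures.ThreadPoolExecutor` as the worker pool, which gives a push model. This is a sketch under stated assumptions. `decompose` and `worker` are caller-supplied callables, the tasks are independent, and the retry count is illustrative. Workers only return results to the orchestrator and never see each other.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


class Orchestrator:
    def __init__(self, decompose: Callable[[str], list[str]],
                 worker: Callable[[str], str], n_workers: int = 3, max_retries: int = 2):
        self.decompose = decompose      # request -> independent tasks
        self.worker = worker            # task -> result; workers are interchangeable
        self.n_workers = n_workers
        self.max_retries = max_retries

    def _run_with_retries(self, task: str) -> str:
        last_error: Exception | None = None
        for _ in range(1 + self.max_retries):
            try:
                return self.worker(task)
            except Exception as exc:    # failures are handled centrally, not by workers
                last_error = exc
        raise RuntimeError(f"task {task!r} failed after retries") from last_error

    def handle(self, request: str) -> list[str]:
        tasks = self.decompose(request)                  # decompose into tasks
        with ThreadPoolExecutor(max_workers=self.n_workers) as pool:
            # assign to available workers; map() yields results in task order
            results = list(pool.map(self._run_with_retries, tasks))
        return results                                   # return the final result
```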

When to Use

  • Dynamic workload distribution
  • Workers are interchangeable
  • Need central control point

Pattern 4: Hierarchical Agent

Layered delegation with parent-child relationships.

              ┌───────────┐
              │  Manager  │
              └───────────┘
               /         \
              ▼           ▼
       ┌──────────┐ ┌──────────┐
       │ TeamLead │ │ TeamLead │
       └──────────┘ └──────────┘
        /       \       |
       ▼         ▼      ▼
   ┌──────┐ ┌──────┐ ┌──────┐
   │Worker│ │Worker│ │Worker│
   └──────┘ └──────┘ └──────┘

Implementation

## Delegation Rules
- Parents decompose tasks for children
- Children report completion to parent only
- No cross-branch communication
- Escalation goes up the hierarchy

## Span of Control
- Optimal: 3-5 direct reports per agent
- Max: 7 (coordination overhead grows with each additional direct report)
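The delegation rules and span-of-control limits can be encoded in a simple tree. This is a sketch. `AgentNode` is a hypothetical name. "Decomposition" here only labels subtasks (`T.1`, `T.1.2`, ...) rather than doing real planning. The 5/7 limits come from the numbers above.

```python
from dataclasses import dataclass, field

OPTIMAL_SPAN = 5
MAX_SPAN = 7


@dataclass
class AgentNode:
    name: str
    parent: "AgentNode | None" = field(default=None, repr=False, compare=False)
    children: list["AgentNode"] = field(default_factory=list)
    escalations: list[str] = field(default_factory=list)

    def add_child(self, child: "AgentNode") -> None:
        if len(self.children) >= MAX_SPAN:
            raise ValueError(f"{self.name} already has {MAX_SPAN} direct reports")
        child.parent = self
        self.children.append(child)

    def over_optimal_span(self) -> bool:
        return len(self.children) > OPTIMAL_SPAN

    def delegate(self, task: str) -> list[str]:
        """Parents split work among their own children; leaves do the work."""
        if not self.children:
            return [f"{self.name} did {task}"]
        reports = []
        for i, child in enumerate(self.children):
            # Each child reports only to this parent; no cross-branch calls
            reports.extend(child.delegate(f"{task}.{i + 1}"))
        return reports

    def escalate(self, issue: str) -> "AgentNode":
        """Push an issue up one level; the root records issues nobody above can take."""
        target = self.parent or self
        target.escalations.append(f"{self.name}: {issue}")
        return target
```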

When to Use

  • Complex domains with natural hierarchy
  • Different abstraction levels needed
  • Clear chains of responsibility

Pattern 5: Event-Driven Architecture

Agents react to events rather than direct calls.

┌─────────────────────────────────────┐
│           EVENT BUS                 │
└─────────────────────────────────────┘
     ▲         ▲         ▲         ▲
     │ publish │ publish │ publish │
     │         │         │         │
┌────┴──┐ ┌────┴──┐ ┌────┴──┐ ┌────┴──┐
│Agent A│ │Agent B│ │Agent C│ │Agent D│
└───────┘ └───────┘ └───────┘ └───────┘
     │         │         │         │
     ▼ sub     ▼ sub     ▼ sub     ▼ sub
┌─────────────────────────────────────┐
│           EVENT BUS                 │
└─────────────────────────────────────┘

Event Types

{
  "event_types": {
    "TaskCreated": { "triggers": ["Executor"] },
    "TaskCompleted": { "triggers": ["Verifier", "Logger"] },
    "VerificationFailed": { "triggers": ["Planner"] },
    "SystemError": { "triggers": ["AlertHandler"] }
  }
}
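The event table maps naturally onto a publish/subscribe registry. This is a minimal, in-process sketch: `EventBus` is a hypothetical class, not a real broker. A production system would put a durable message broker behind the same idea. Every published event is appended to a log, which provides the audit trail and enables replay.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class EventBus:
    def __init__(self):
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)
        self.log: list[dict] = []  # append-only history: auditable and replayable

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, data: dict) -> None:
        event = {"seq": len(self.log) + 1, "type": event_type, "data": data}
        self.log.append(event)
        self._dispatch(event)

    def _dispatch(self, event: dict) -> None:
        # Publishers never learn who (if anyone) handled the event: loose coupling
        for handler in self._subscribers[event["type"]]:
            handler(event)

    def replay(self, from_seq: int = 1) -> None:
        """Re-deliver logged events, e.g. after a subscriber recovers from a crash."""
        for event in self.log[from_seq - 1:]:
            self._dispatch(event)
```

Replay re-delivers events to every current subscriber. Handlers should therefore be idempotent, or track the last `seq` they processed.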

Benefits

  • Reliable coordination: Replayable events enable recovery
  • Loose coupling: Agents don't know about each other
  • Scalable: Easy to add new agents
  • Auditable: Complete event history

When to Use

  • High reliability requirements
  • Need fault tolerance
  • Complex event dependencies
  • Async processing beneficial

Pattern 6: Escalation Over Consensus

When agents disagree, escalate instead of voting.

┌─────────┐ ┌─────────┐
│ Agent A │ │ Agent B │
└─────────┘ └─────────┘
     │           │
     ▼           ▼
  Output A    Output B
     │           │
     └─────┬─────┘
           │ (conflict?)
           ▼
     ┌───────────┐
     │ Escalator │ ──► Final Decision
     └───────────┘

Why Not Vote?

  • Averaging dilutes correctness
  • Majority can be wrong
  • Loses minority insight

Escalation Protocol

1. Detect conflict (outputs differ significantly)
2. Preserve both outputs with reasoning
3. Escalate to higher-authority agent
4. Authority decides based on evidence, not popularity
5. Log decision rationale for learning
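The protocol can be sketched as a single resolve step. Conflict detection here uses `difflib.SequenceMatcher` text similarity. This heuristic is swapped in for illustration, and the 0.8 threshold is an assumption. A real system might compare structured fields or ask a judge model instead. The `decide` callable stands in for the higher-authority agent and is hypothetical.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable


@dataclass
class AgentOutput:
    agent: str
    output: str
    reasoning: str


@dataclass
class Decision:
    output: str
    rationale: str
    escalated: bool


def resolve(a: AgentOutput, b: AgentOutput,
            decide: Callable[[AgentOutput, AgentOutput], tuple[str, str]],
            decision_log: list[Decision],
            threshold: float = 0.8) -> Decision:
    # 1. Detect conflict: similarity below the threshold counts as "differ significantly"
    similarity = SequenceMatcher(None, a.output, b.output).ratio()
    if similarity >= threshold:
        decision = Decision(a.output, f"outputs agree (similarity {similarity:.2f})", False)
    else:
        # 2-4. Hand both outputs, with their reasoning, to the authority; no vote is taken
        output, rationale = decide(a, b)
        decision = Decision(output, rationale, True)
    decision_log.append(decision)  # 5. Log the rationale for later learning
    return decision
```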

Anti-Patterns to Avoid

Anti-Pattern 1: Synchronous Blocking Chains

Bad: Agent A calls Agent B calls Agent C, each waiting.

Impact: Latency adds up along the chain, and a single failure blocks every caller upstream of it.

Fix: Use async message passing or events.
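One way to apply the fix is to give each agent an inbox queue and let it post results to the next agent's inbox. With this setup, no agent sits on a call stack waiting for another. The sketch below uses standard `asyncio` queues. The stages are pipelined, so stage A can work on message 2 while stage B processes message 1. A failing step emits a `Failure` message instead of stalling the chain. `agent_loop`, `run_pipeline`, and `Failure` are hypothetical names.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass
class Failure:
    agent: str
    error: str


Step = Callable[[str], Awaitable[str]]


async def agent_loop(name: str, step: Step, inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    """Consume from the inbox and post to the outbox; never call another agent directly."""
    while True:
        msg = await inbox.get()
        if msg is None:                  # shutdown sentinel: forward it and stop
            await outbox.put(None)
            return
        if isinstance(msg, Failure):     # upstream failure: pass it through untouched
            await outbox.put(msg)
            continue
        try:
            await outbox.put(await step(msg))
        except Exception as exc:         # a failure becomes a message, not a stalled chain
            await outbox.put(Failure(name, str(exc)))


async def run_pipeline(steps: list[tuple[str, Step]], messages: list[str]) -> list:
    """Wire agents together with queues; each stage works independently."""
    queues = [asyncio.Queue() for _ in range(len(steps) + 1)]
    workers = [asyncio.create_task(agent_loop(name, step, queues[i], queues[i + 1]))
               for i, (name, step) in enumerate(steps)]
    for msg in [*messages, None]:
        await queues[0].put(msg)
    results = []
    while (out := await queues[-1].get()) is not None:
        results.append(out)
    await asyncio.gather(*workers)
    return results
```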

Anti-Pattern 2: Implicit State Sharing

Bad: Agents assume shared context without explicit state.

Impact: Race conditions, state corruption.

Fix: Use blackboard with explicit read/write permissions.

Anti-Pattern 3: Perfect Harmony

Bad: System designed for agents to always agree.

Impact: Groupthink, missed errors.

Fix: Add controlled friction (critics, independent verification).

12 Factor Agents: Control Flow and State

The 12 Factor Agents framework provides engineering principles for coordination.

Factor 3: Own Your Context Building

Principle: Everything that makes agents good is context engineering. Understand what happens at the token level.

Context building components:

  1. System prompt - Agent identity and instructions
  2. RAG results - Retrieved relevant information
  3. Memory - Episodic and semantic recall
  4. Agentic history - Previous steps in this workflow
  5. Structured output instructions - Format requirements

Explicit Context Building Pattern:

def build_context(agent_id: str, task: dict) -> list[dict]:
    """Explicit context assembly - no magic.

    AGENT_PROMPTS, retrieve, recall, get_workflow_history and the format_*
    helpers are assumed to be supplied by your own codebase.
    """
    context = []

    # 1. System prompt (Factor 2 - own your prompts)
    context.append({"role": "system", "content": AGENT_PROMPTS[agent_id]})

    # 2. RAG - retrieve relevant documents (skip the message if nothing was found)
    relevant_docs = retrieve(task["query"], top_k=3)
    if relevant_docs:
        context.append({"role": "system", "content": format_docs(relevant_docs)})

    # 3. Memory - recall from past
    memories = recall(agent_id, task["context"])
    if memories:
        context.append({"role": "system", "content": format_memories(memories)})

    # 4. Agentic history - what happened so far
    history = get_workflow_history(task["workflow_id"])
    if history:
        context.append({"role": "system", "content": format_history(history)})

    # 5. Current task
    context.append({"role": "user", "content": task["input"]})

    return context

Context Budget:

Component        Token Budget   Purpose
System prompt    500-1000       Agent identity
RAG results      1000-2000      Relevant knowledge
Memory           500-1000       Past experiences
History          500-1000       Workflow context
Task input       Variable       Current request

Key insight: If you don't understand what happens at the token level, you miss optimization opportunities.
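The budget table can be enforced before the context is assembled. This is a sketch under loud assumptions. Tokens are approximated as whitespace-separated words, whereas a real system should use the model's own tokenizer. Each component is capped at the upper end of its range, and the task input is left unbudgeted. Truncation keeps the beginning of the text; summarizing (the Summarize operation under Factor 8) is usually the better choice.

```python
# Upper bounds from the budget table; task input is deliberately unbudgeted.
BUDGETS = {"system": 1000, "rag": 2000, "memory": 1000, "history": 1000}


def count_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def fit_to_budget(text: str, budget: int) -> str:
    words = text.split()
    if len(words) <= budget:
        return text
    return " ".join(words[:budget]) + " [truncated]"


def budget_components(components: dict[str, str]) -> dict[str, str]:
    """Trim each budgeted component; pass anything without a budget through unchanged."""
    return {name: fit_to_budget(text, BUDGETS[name]) if name in BUDGETS else text
            for name, text in components.items()}
```

For the history component, keeping the most recent text (the tail) is often more useful than keeping the head.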

Factor 5/6: Unified Execution and Business State

Principle: Enable Launch/Pause/Resume with simple APIs. Unify what's happening (execution) with what's happened (business).

Unified State Schema:

{
  "workflow_id": "uuid",
  "status": "running|paused|completed|failed",

  "execution_state": {
    "current_step": "step_name",
    "next_step": "step_name|null",
    "waiting_for": "human_input|external_api|null",
    "pause_reason": "string|null",
    "retry_config": {
      "attempts": 2,
      "max_attempts": 3,
      "backoff_ms": 1000
    }
  },

  "business_state": {
    "messages": [],
    "tool_calls": [],
    "tool_results": [],
    "decisions_made": [],
    "artifacts_produced": []
  },

  "timestamps": {
    "created": "ISO8601",
    "last_updated": "ISO8601",
    "paused_at": "ISO8601|null",
    "resumed_at": "ISO8601|null"
  }
}

Launch/Pause/Resume API:

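from datetime import datetime, timezone


def now() -> str:
    return datetime.now(timezone.utc).isoformat()


class WorkflowController:
    """create_initial_state and execute_next_step are assumed to be defined elsewhere."""

    def __init__(self, state_store):
        self.state_store = state_store  # any store exposing save(id, state) / load(id)

    def launch(self, workflow_id: str, initial_input: dict) -> str:
        """Start workflow, return workflow_id."""
        state = create_initial_state(workflow_id, initial_input)
        self.state_store.save(workflow_id, state)
        self.execute_next_step(workflow_id)
        return workflow_id

    def pause(self, workflow_id: str, reason: str) -> bool:
        """Pause workflow, preserving all state."""
        state = self.state_store.load(workflow_id)
        if state["status"] != "running":
            return False  # only a running workflow can be paused
        state["status"] = "paused"
        state["timestamps"]["paused_at"] = now()
        state["execution_state"]["pause_reason"] = reason
        self.state_store.save(workflow_id, state)
        return True

    def resume(self, workflow_id: str) -> bool:
        """Resume from exactly where we left off."""
        state = self.state_store.load(workflow_id)
        if state["status"] != "paused":
            return False  # refuse to resume anything that was not paused
        state["status"] = "running"
        state["timestamps"]["resumed_at"] = now()
        state["execution_state"]["pause_reason"] = None
        self.state_store.save(workflow_id, state)
        self.execute_next_step(workflow_id)
        return True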

See references/state-management.md for detailed implementation.

Factor 8: Own Your Control Flow

Principle: Don't let the LLM control the entire DAG. If you own control flow, you can Break, Switch, Summarize, Judge.

Control Flow Operations:

Operation   Purpose                    When to Use
Break       Stop agent loop early      Error threshold, timeout, explicit stop signal
Switch      Route to different agent   Based on output classification
Summarize   Compress context           Approaching token limit
Judge       Evaluate quality           Before committing results

Anti-pattern: LLM-Controlled DAG

# BAD: LLM decides what to do next autonomously
response = llm.call("You have full control. What should we do next?")
next_action = parse_action(response)  # LLM controls flow

Pattern: Code-Controlled DAG

# GOOD: Code owns the control flow
# execute_step is assumed to exist; it runs one step and returns a result dict.
def transition(state: dict, next_step: str) -> dict:
    """Return a new state pointing at next_step (the input is not mutated)."""
    return {**state, "current_step": next_step}


def workflow_step(state: dict) -> dict:
    # 1. Execute current step
    result = execute_step(state["current_step"], state)

    # 2. Code decides next step (not LLM)
    if result["needs_human"]:
        return transition(state, "human_input")  # BREAK for human
    elif result["context_tokens"] > 6000:        # example threshold; tune per model
        return transition(state, "summarize")    # SUMMARIZE
    elif result["quality_uncertain"]:
        return transition(state, "verify")       # JUDGE
    elif result["category"] == "complex":
        return transition(state, "specialist")   # SWITCH
    else:
        return transition(state, result["next"]) # CONTINUE

Smaller Focused Prompts Beat Long Autonomous Runs:

Instead of:
  "Do everything: plan, execute, verify, report"

Use:
  Step 1: "Create a plan" (focused prompt)
  [Code evaluates plan quality]
  Step 2: "Execute task X" (focused prompt)
  [Code checks result]
  Step 3: "Verify result" (focused prompt)
  [Code decides next step]
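The three-step sequence can be sketched as a single function in which code gates every step. This is a sketch under stated assumptions. `call_llm` is a hypothetical callable (prompt in, text out) that you supply. The plan is parsed as one task per line. The quality checks are deliberately simple placeholders.

```python
from typing import Callable

LLM = Callable[[str], str]


def run_focused(goal: str, call_llm: LLM, max_plan_steps: int = 5) -> dict:
    # Step 1: focused planning prompt, parsed as one task per non-empty line
    raw_plan = call_llm(f"Create a plan for: {goal}")
    plan = [line.strip("- ").strip() for line in raw_plan.splitlines() if line.strip()]
    # Code evaluates plan quality (placeholder check)
    if not plan or len(plan) > max_plan_steps:
        return {"status": "needs_replan", "plan": plan}

    # Step 2: one focused prompt per task
    results = []
    for task in plan:
        result = call_llm(f"Execute task: {task}")
        if not result.strip():                   # code checks each result
            return {"status": "failed", "task": task}
        results.append(result)

    # Step 3: focused verification prompt
    verdict = call_llm("Verify result. Answer PASS or FAIL.\n" + "\n".join(results))
    # Code, not the LLM, decides what happens next
    status = "done" if verdict.strip().upper().startswith("PASS") else "needs_replan"
    return {"status": status, "plan": plan, "results": results}
```

Because every model call is short and single-purpose, a failure is localized to one step. The caller can retry or re-plan that step without rerunning the whole workflow.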

Factor 12: Stateless Reducers

Principle: Agent logic as pure functions that reduce (state, event) → new_state. Enables replay, debugging, and reasoning about behavior.

Reducer Pattern:

from copy import deepcopy


def agent_reducer(state: dict, event: dict) -> dict:
    """
    Pure function: no side effects, deterministic output.

    Args:
        state: Current workflow state
        event: What just happened (user input, tool result, etc.)

    Returns:
        New state (never mutates input)
    """
    new_state = deepcopy(state)
    business = new_state["business_state"]
    execution = new_state["execution_state"]

    match event["type"]:
        case "USER_INPUT":
            business["messages"].append(event["data"])
            execution["next_step"] = "plan"

        case "PLAN_CREATED":
            business["plan"] = event["data"]
            execution["next_step"] = "execute"

        case "TASK_COMPLETED":
            business.setdefault("results", []).append(event["data"])
            # get_remaining_tasks is an assumed helper: plan tasks without a result yet
            remaining = get_remaining_tasks(new_state)
            execution["next_step"] = "execute" if remaining else "verify"

        case "VERIFICATION_FAILED":
            execution["retry_config"]["attempts"] += 1
            execution["next_step"] = "replan"

    # Use the event's own timestamp, not the wall clock, so replay stays deterministic
    new_state["timestamps"]["last_updated"] = event["timestamp"]
    return new_state

Benefits of Reducer Pattern:

  • Replay: Feed same events, get same state
  • Debugging: Inspect state at any point
  • Testing: Pure functions are easy to test
  • Time travel: Rollback by replaying subset of events

Event Log for Replay:

{
  "workflow_id": "uuid",
  "events": [
    {"seq": 1, "type": "USER_INPUT", "data": {...}, "timestamp": "..."},
    {"seq": 2, "type": "PLAN_CREATED", "data": {...}, "timestamp": "..."},
    {"seq": 3, "type": "TASK_COMPLETED", "data": {...}, "timestamp": "..."}
  ]
}

To replay (with Python's functools.reduce): final_state = reduce(agent_reducer, events, initial_state)
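Replay and time travel can be demonstrated end to end with `functools.reduce`. To keep the example self-contained, it uses a cut-down reducer (`tiny_reducer`, hypothetical). That reducer tracks only messages, the plan, and the next step, instead of the full `agent_reducer` above. Rolling back is just replaying a prefix of the event log.

```python
from copy import deepcopy
from functools import reduce


def tiny_reducer(state: dict, event: dict) -> dict:
    """Cut-down stand-in for agent_reducer: pure, always returns a new dict."""
    new_state = deepcopy(state)
    if event["type"] == "USER_INPUT":
        new_state["messages"].append(event["data"])
        new_state["next_step"] = "plan"
    elif event["type"] == "PLAN_CREATED":
        new_state["plan"] = event["data"]
        new_state["next_step"] = "execute"
    new_state["last_seq"] = event["seq"]
    return new_state


initial_state = {"messages": [], "plan": None, "next_step": None, "last_seq": 0}
events = [
    {"seq": 1, "type": "USER_INPUT", "data": "build a report"},
    {"seq": 2, "type": "PLAN_CREATED", "data": ["fetch", "summarize"]},
]

final_state = reduce(tiny_reducer, events, initial_state)      # replay the full log
rolled_back = reduce(tiny_reducer, events[:1], initial_state)  # state as of seq 1
```

Because the reducer never mutates its input, `initial_state` can be reused for any number of replays.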

Additional Resources

Reference Files

For detailed implementation patterns:

  • references/event-driven-details.md - Complete event-driven implementation
  • references/state-management.md - State synchronization strategies (includes unified state, reducers)
  • ../agent-specification/references/twelve-factor-agents.md - Quick reference for all 12 factors

Related Skills

  • agent-specification - Define each agent properly (Factors 1, 2, 4, 7)
  • production-readiness - Add observability and error handling (Factors 9, 11)
  • mas-decision-gate - Decide if multi-agent is needed (Factor 10)
Install via CLI
npx skills add https://github.com/MarkusDegen/multi-agent-skill --skill coordination-patterns