orchestration-scaffold


Use this when you need to: scaffold a local AI orchestration stack; set up a LangGraph supervisor; connect Ollama to OpenCode; configure MCP servers for agent tools; wire OpenCode as the developer interface for a multi-agent system; set up the A2A protocol between LangGraph and OpenCode; configure VS Code for agent development; run the orchestration stack on a TrueNAS / Docker Compose homelab; choose between the AG2 and CrewAI sub-frameworks; set up an agent harness with PreToolUse hooks; define SLIs for an AI orchestration stack; design a generator-evaluator pattern for agent output; or route local inference to Qwen3 or Mistral Small.

By drewid74 · Updated 5/31/2026


Orchestration Scaffold


Architecture Overview

┌─────────────────────────────────────────────────────────────────┐
│  Developer Interface                                            │
│  OpenCode TUI + VS Code (opencode.ai)                          │
│    ├── MCP Tools → Docker MCP servers                          │
│    │     ├── mcp-filesystem  :8811  (read/write workspace)     │
│    │     └── mcp-custom      :8812  (domain-specific tools)    │
│    └── A2A Protocol → LangGraph Supervisor  :8123              │
│              └── Sub-framework (choose one — see below)        │
│                        ↓                                       │
│               Ollama  :11434                                   │
│               Qwen3:32b (coding) | Mistral-Small-3.1 (fast)   │
└─────────────────────────────────────────────────────────────────┘

Flowise  :3000  — visual prototyping ONLY, not production data path

Model split:

  • Architect / planner agent → cloud model (Claude Sonnet via LiteLLM gateway)
  • Coder / executor agent → local Qwen3:32b (data locality, zero cost per call)
  • Fast routing / classification → Mistral Small 3.1
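The split above can be written down as a small routing table. This is only a sketch: the role names, the `route_model` helper, and the `ollama/mistral-small3.1` tag are assumptions for illustration, while the other two identifiers are the ones used in the agent files later in this document.

```python
# Hypothetical role-to-model table; adjust identifiers to your gateway and Ollama tags.
MODEL_BY_ROLE = {
    "planner": "claude-sonnet-4-5",       # cloud model, reached through the LiteLLM gateway
    "coder": "ollama/qwen3:32b",          # local model, so code never leaves the host
    "router": "ollama/mistral-small3.1",  # assumed tag for Mistral Small 3.1
}


def route_model(role: str) -> str:
    """Return the model identifier for an agent role, failing loudly on unknown roles."""
    try:
        return MODEL_BY_ROLE[role]
    except KeyError:
        raise ValueError(f"no model configured for role {role!r}") from None
```

Failing on an unknown role, rather than falling back to a default, keeps a misconfigured agent from silently sending local data to the cloud model.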

Sub-Framework Decision

Choose one. Do not use both — three orchestration layers (LangGraph → CrewAI → AG2) is over-abstracted.

| Criterion | Use LangGraph + AG2 | Use LangGraph + CrewAI |
|---|---|---|
| Primary workload | Code generation, tool-calling loops, iterative refinement | Role-based delegation (researcher, writer, reviewer) |
| Team mental model | State machine / graph | Crew of specialists |
| Best for | Dev agent, CI runner, code review bot | Document pipeline, research assistant, content workflows |
| Complexity | Lower (AG2 is thin) | Higher (CrewAI adds role scaffolding) |
| Default | ✅ Recommended for dev tooling | When role semantics add real clarity |

Required: Harness Engineering Checklist

These are non-negotiable before any production task. Missing any one = not production-safe.

1. PreToolUse Hooks (bypass-proof)

Define in opencode.json or a hook plugin. Minimum blocklist:

# hooks/pre_tool_use.py
BLOCKED_PATTERNS = [
    r"\.env$", r"\.pem$", r"\.key$", r"id_rsa", r"id_ed25519",
    r"rm\s+-rf\s+/", r"DROP\s+TABLE", r"DELETE\s+FROM.*WHERE\s+1",
]
BLOCKED_PATHS = ["/etc/passwd", "/etc/shadow", "~/.ssh/"]

def pre_tool_use(tool_name: str, input: dict) -> PreToolResult:
    # Block secrets and destructive patterns regardless of model reasoning
    ...

Hooks enforce constraints. AGENTS.md rules are advice the model can reason around. They are not equivalent.
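One way the stub above could be filled in is sketched here. The `PreToolResult` dataclass, the `_strings` helper, and the idea that tool input arrives as a nested dict are assumptions for illustration, not an OpenCode API; the blocklists are copied from the snippet above.

```python
import os
import re
from dataclasses import dataclass

BLOCKED_PATTERNS = [
    r"\.env$", r"\.pem$", r"\.key$", r"id_rsa", r"id_ed25519",
    r"rm\s+-rf\s+/", r"DROP\s+TABLE", r"DELETE\s+FROM.*WHERE\s+1",
]
BLOCKED_PATHS = ["/etc/passwd", "/etc/shadow", "~/.ssh/"]

_COMPILED = [re.compile(p, re.IGNORECASE) for p in BLOCKED_PATTERNS]
# Expand "~" once so "~/.ssh/" also matches absolute home paths.
_ALL_PATHS = BLOCKED_PATHS + [os.path.expanduser(p) for p in BLOCKED_PATHS]


@dataclass(frozen=True)
class PreToolResult:  # hypothetical result type
    allowed: bool
    reason: str = ""


def _strings(value):
    """Yield every string nested anywhere inside a tool-input payload."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for v in value.values():
            yield from _strings(v)
    elif isinstance(value, (list, tuple)):
        for v in value:
            yield from _strings(v)


def pre_tool_use(tool_name: str, tool_input: dict) -> PreToolResult:
    """Deny the call if any string argument hits a blocked path or pattern."""
    for text in _strings(tool_input):
        for path in _ALL_PATHS:
            if path in text:
                return PreToolResult(False, f"{tool_name}: blocked path {path}")
        for pattern in _COMPILED:
            if pattern.search(text):
                return PreToolResult(False, f"{tool_name}: blocked pattern {pattern.pattern}")
    return PreToolResult(True)
```

Note that the `rm\s+-rf\s+/` pattern as written rejects any recursive delete of an absolute path, including ones inside the workspace, so agents would need to use relative paths for cleanup.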

2. Loop Termination (all three required)

# docker-compose.yml — langgraph-supervisor service
environment:
  MAX_TURNS: "25"           # hard turn limit per task
  TASK_TIMEOUT_SEC: "300"   # wall-clock timeout
  LOOP_DETECT_WINDOW: "3"   # same action N times → terminate

An agent without a step budget is a billing incident. Log which termination condition fired for every completed session.
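The three limits can be enforced by one small guard object inside the supervisor loop. The sketch below is an assumption about how that might look: `LoopGuard`, its method names, and the reason strings are hypothetical, and only the environment variable names come from the compose snippet above.

```python
import os
import time
from collections import deque
from typing import Callable, Optional


class LoopGuard:
    """Tracks the three termination conditions for one agent task."""

    def __init__(self, max_turns: int, timeout_sec: float, detect_window: int,
                 clock: Callable[[], float] = time.monotonic):
        self.max_turns = max_turns
        self.timeout_sec = timeout_sec
        self.turns = 0
        self._clock = clock
        self._start = clock()
        self._recent = deque(maxlen=detect_window)

    @classmethod
    def from_env(cls) -> "LoopGuard":
        # Defaults mirror the compose values above.
        return cls(
            max_turns=int(os.environ.get("MAX_TURNS", "25")),
            timeout_sec=float(os.environ.get("TASK_TIMEOUT_SEC", "300")),
            detect_window=int(os.environ.get("LOOP_DETECT_WINDOW", "3")),
        )

    def check(self, action: str) -> Optional[str]:
        """Call before each action; returns a termination reason, or None to proceed."""
        if self._clock() - self._start >= self.timeout_sec:
            return "timeout"
        if self.turns >= self.max_turns:
            return "max_turns"
        self._recent.append(action)
        # The same action filling the whole window counts as a loop.
        if len(self._recent) == self._recent.maxlen and len(set(self._recent)) == 1:
            return "loop_detected"
        self.turns += 1
        return None
```

Returning the reason as a string makes it straightforward to log which condition fired and to use it as the `reason` label on a termination counter.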

3. Generator-Evaluator Separation

Never let an agent grade its own output.

Generator:   produces the artifact (code, PR, response)
Evaluator:   grades against predetermined criteria
             — separate agent, separate context
             — criteria defined before generation started
             — does NOT receive the generator's reasoning trace

Minimum eval suite per harness run:

  • task_completion_rate: % tasks completed without escalation
  • error_rate: % runs ending in unhandled exception
  • cost_per_task: mean + p95 token cost
  • escalation_rate: % tasks hitting human-in-loop gate
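The four metrics above can be computed from per-run records with the standard library alone. The `RunRecord` shape is an assumption about what the harness logs, and the p95 uses the nearest-rank method, which is one of several common percentile definitions.

```python
import math
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass(frozen=True)
class RunRecord:  # hypothetical record shape, one per harness run
    completed: bool
    escalated: bool
    unhandled_exception: bool
    token_cost: float


def p95(values: Sequence[float]) -> float:
    """Nearest-rank 95th percentile."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]


def summarize(runs: Sequence[RunRecord]) -> dict:
    if not runs:
        raise ValueError("need at least one run")
    n = len(runs)
    costs = [r.token_cost for r in runs]
    return {
        "task_completion_rate": sum(r.completed and not r.escalated for r in runs) / n,
        "error_rate": sum(r.unhandled_exception for r in runs) / n,
        "cost_per_task_mean": mean(costs),
        "cost_per_task_p95": p95(costs),
        "escalation_rate": sum(r.escalated for r in runs) / n,
    }
```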

4. State Handoff Contract

When LangGraph supervisor spawns an OpenCode task via A2A, it MUST pass:

{
  "session_id": "<uuid>",
  "branch": "agent/<issue-id>",
  "workspace_dir": "/tmp/workspace-<issue-id>",
  "progress_file": "/tmp/workspace-<issue-id>/progress.md",
  "files_modified": [],
  "in_scope": ["src/", "tests/"],
  "out_of_scope": [".env", "infra/", "*.pem"],
  "max_turns": 20
}
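A contract is only useful if the receiving side checks it. The following is a minimal sketch of such a check, assuming POSIX-style paths and glob semantics for `out_of_scope` entries; `validate_handoff` and `is_in_scope` are hypothetical helper names.

```python
import fnmatch
from pathlib import PurePosixPath

REQUIRED_KEYS = {
    "session_id", "branch", "workspace_dir", "progress_file",
    "files_modified", "in_scope", "out_of_scope", "max_turns",
}


def validate_handoff(contract: dict) -> list:
    """Return a list of problems; an empty list means the contract is usable."""
    missing = sorted(REQUIRED_KEYS - set(contract))
    if missing:
        return [f"missing keys: {', '.join(missing)}"]
    errors = []
    workspace = PurePosixPath(contract["workspace_dir"])
    if workspace not in PurePosixPath(contract["progress_file"]).parents:
        errors.append("progress_file must live inside workspace_dir")
    if not isinstance(contract["max_turns"], int) or contract["max_turns"] <= 0:
        errors.append("max_turns must be a positive integer")
    if not contract["in_scope"]:
        errors.append("in_scope must list at least one path")
    return errors


def is_in_scope(path: str, contract: dict) -> bool:
    """out_of_scope always wins; otherwise the path must sit under an in_scope prefix."""
    name = PurePosixPath(path).name
    for pattern in contract["out_of_scope"]:
        if pattern.endswith("/"):
            if path.startswith(pattern):
                return False
        elif fnmatch.fnmatch(path, pattern) or fnmatch.fnmatch(name, pattern):
            return False
    return any(path.startswith(prefix) for prefix in contract["in_scope"])
```

Checking `out_of_scope` first means a file such as `src/.env` is rejected even though `src/` is in scope.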

5. AGENTS.md (≤ 100 lines, table-of-contents only)

# AGENTS.md

## System of Record
→ docs/architecture/overview.md
→ docs/conventions/code-style.md
→ docs/operations/runbook.md

## Prohibited Actions (enforced by hook — listed here for visibility)
- Do not write to .env, *.pem, *.key, id_*
- Do not run destructive patterns outside workspace
- Do not modify files in out_of_scope list

## Escalate When
- Three consecutive failed attempts on same task
- Any write to production systems
- Unsure of scope — ask, don't guess

Depth lives in docs/. Progressive disclosure: agents navigate to deeper docs as needed.


OpenCode Configuration

opencode.json

{
  "model": "claude-sonnet-4-5",
  "mcp": {
    "filesystem": {
      "type": "local",
      "command": ["docker", "run", "--rm", "-i", "-v", "${workspace}:/workspace",
                  "mcp-filesystem:latest"],
      "environment": { "WORKSPACE_DIR": "${workspace}" }
    },
    "custom-tools": {
      "type": "remote",
      "url": "http://localhost:8812/mcp"
    },
    "langgraph": {
      "type": "remote",
      "url": "http://localhost:8123/mcp"
    }
  },
  "agents": {
    "default": ".opencode/agents/orchestrator.md"
  }
}

.opencode/agents/orchestrator.md

---
mode: primary
model: claude-sonnet-4-5
description: Orchestration coordinator — plans tasks, delegates to LangGraph, validates results
tools:
  - mcp:filesystem
  - mcp:custom-tools
  - mcp:langgraph
---

You are the orchestration coordinator. You plan tasks and delegate execution to the LangGraph supervisor.

Rules:
- Never execute code directly — delegate to LangGraph via the `langgraph` MCP tool
- Validate all outputs from sub-agents before reporting completion
- Follow the state handoff contract when spawning tasks
- Read progress.md before taking any action on a resumed task

.opencode/agents/coder.md

---
mode: subagent
model: ollama/qwen3:32b
description: Code executor — implements code changes, runs tests, reports results
tools:
  - mcp:filesystem
  - bash
---

You are the code executor. You implement exactly what the orchestrator specifies.

Rules:
- Stay in scope (in_scope list from handoff contract)
- Write progress.md after each significant action
- Run tests before reporting completion
- If tests fail 3 times, escalate — do not continue guessing

Docker Compose

services:
  ollama:
    image: ollama/ollama:latest
    ports: ["11434:11434"]
    volumes:
      - ollama_models:/root/.ollama
    healthcheck:
      test: ["CMD", "ollama", "list"]   # the ollama image does not ship curl
      interval: 30s
      timeout: 10s
      retries: 3
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

  langgraph-supervisor:
    build: ./langgraph
    ports: ["8123:8123"]
    environment:
      OLLAMA_URL: http://ollama:11434
      LITELLM_URL: http://litellm:4000         # cloud model gateway
      MAX_TURNS: "25"
      TASK_TIMEOUT_SEC: "300"
      LOOP_DETECT_WINDOW: "3"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8123/health"]
      interval: 15s
      timeout: 5s
      retries: 3
    depends_on:
      ollama:
        condition: service_healthy

  mcp-filesystem:
    image: mcp/filesystem:latest
    ports: ["8811:8811"]
    volumes:
      - workspace:/workspace:rw
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8811/health"]
      interval: 15s

  mcp-custom-tools:
    build: ./mcp-tools
    ports: ["8812:8812"]
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8812/health"]
      interval: 15s

  flowise:
    image: flowiseai/flowise:latest
    ports: ["3000:3000"]
    volumes:
      - flowise_data:/root/.flowise
    profiles: ["dev"]          # dev profile only — not started in production
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/api/v1/ping"]
      interval: 30s

  prometheus:
    image: prom/prometheus:latest
    ports: ["9090:9090"]
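    volumes:
      - ./monitoring/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - ./monitoring/alerts.yml:/etc/prometheus/alerts.yml:ro   # must also be listed under rule_files in prometheus.yml
      - prometheus_data:/prometheus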

  grafana:
    image: grafana/grafana:latest
    ports: ["3001:3000"]
    volumes:
      - ./monitoring/grafana/dashboards:/etc/grafana/provisioning/dashboards:ro
      - ./monitoring/grafana/datasources:/etc/grafana/provisioning/datasources:ro
      - grafana_data:/var/lib/grafana

  loki:
    image: grafana/loki:latest
    ports: ["3100:3100"]
    volumes:
      - ./monitoring/loki-config.yml:/etc/loki/local-config.yaml:ro
      - loki_data:/loki

volumes:
  ollama_models:
  workspace:
  flowise_data:
  prometheus_data:
  grafana_data:
  loki_data:

SRE: SLIs, Alerts, Runbooks

SLIs (define before going live)

# Task completion rate (aggregate both sides so the status label does not break the division)
sum(rate(orchestration_tasks_total{status="complete"}[5m]))
/ sum(rate(orchestration_tasks_total[5m]))

# LangGraph supervisor p99 latency
histogram_quantile(0.99,
  sum by (le) (rate(langgraph_task_duration_seconds_bucket[5m]))
)

# Ollama inference error rate
sum(rate(ollama_requests_total{status=~"5.."}[5m]))
/ sum(rate(ollama_requests_total[5m]))

# Share of tasks terminated by the max_turns limit (should stay low)
sum(rate(agent_loop_terminations_total{reason="max_turns"}[1h]))
/ sum(rate(orchestration_tasks_total[1h]))

Alerts

# monitoring/alerts.yml
groups:
  - name: orchestration
    rules:
      - alert: TaskCompletionRateLow
        expr: |
          sum(rate(orchestration_tasks_total{status="complete"}[10m]))
          / sum(rate(orchestration_tasks_total[10m])) < 0.80
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Task completion below 80%"
          runbook: "docs/operations/runbook.md#task-completion-low"

      - alert: OllamaHighErrorRate
        expr: |
          sum(rate(ollama_requests_total{status=~"5.."}[5m]))
          / sum(rate(ollama_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Ollama error rate above 5%"
          runbook: "docs/operations/runbook.md#ollama-errors"

      - alert: AgentLoopRunaway
        expr: |
          sum(rate(agent_loop_terminations_total{reason="max_turns"}[1h]))
          / sum(rate(orchestration_tasks_total[1h])) > 0.1
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "More than 10% of agents hitting max_turns limit"
          runbook: "docs/operations/runbook.md#loop-runaway"

Rules:

  • Alert only on user-facing impact — not CPU/memory thresholds
  • Every alert needs a for: duration, a severity label, and a runbook annotation
  • Tail-sample traces: 100% errors, 1% success
  • Dashboards provisioned from git — never manual Grafana UI
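The tail-sampling rule above (keep every error trace, keep about 1% of successful ones) can be sketched as a pure decision function. Hashing the trace ID instead of drawing a random number is a design choice assumed here so that every service makes the same decision for the same trace; `keep_trace` is a hypothetical name, not part of any tracing SDK.

```python
import hashlib


def keep_trace(trace_id: str, is_error: bool, success_rate: float = 0.01) -> bool:
    """Keep all error traces and a deterministic fraction of successful ones."""
    if is_error:
        return True
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < success_rate
```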

Runbook Template

# Runbook: <Alert Name>

## Symptom
<What the user sees>

## Immediate Mitigation
```bash
# Copy-paste commands
docker compose restart langgraph-supervisor
```

## Investigation
```bash
# Logs
docker compose logs langgraph-supervisor --tail=100
```

## Root Cause Categories
- Ollama OOM → check docker stats ollama
- Model not loaded → curl localhost:11434/api/tags
- Loop runaway → check agent_loop_terminations_total metric

## Escalate If

A2A Integration (LangGraph → OpenCode)

# langgraph/nodes/opencode_node.py
import httpx
from typing import TypedDict

class AgentTask(TypedDict):
    session_id: str
    branch: str
    workspace_dir: str
    progress_file: str
    files_modified: list[str]
    in_scope: list[str]
    out_of_scope: list[str]
    max_turns: int
    instruction: str

async def call_opencode(task: AgentTask) -> dict:
    """Delegate a task to OpenCode via A2A protocol."""
    async with httpx.AsyncClient(timeout=300) as client:
        response = await client.post(
            "http://localhost:28482/a2a/tasks/send",  # a2a-opencode adapter
            json={
                "message": {
                    "role": "user",
                    "parts": [{"type": "text", "text": task["instruction"]}],
                },
                "metadata": {k: v for k, v in task.items() if k != "instruction"},
            }
        )
        response.raise_for_status()
        return response.json()

Reference: github.com/shashikanth-gs/a2a-opencode
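For completeness, here is one way the `AgentTask` passed to `call_opencode` might be assembled so that it always matches the state handoff contract. The helper names `new_agent_task` and `build_payload` are hypothetical, the scope defaults are copied from the contract example, and the payload shape simply mirrors the request body used in the node above.

```python
import uuid


def new_agent_task(issue_id: str, instruction: str, max_turns: int = 20) -> dict:
    """Build a task whose paths and branch follow the handoff contract layout."""
    workspace = f"/tmp/workspace-{issue_id}"
    return {
        "session_id": str(uuid.uuid4()),
        "branch": f"agent/{issue_id}",
        "workspace_dir": workspace,
        "progress_file": f"{workspace}/progress.md",
        "files_modified": [],
        "in_scope": ["src/", "tests/"],
        "out_of_scope": [".env", "infra/", "*.pem"],
        "max_turns": max_turns,
        "instruction": instruction,
    }


def build_payload(task: dict) -> dict:
    """Instruction becomes the message text; every other field travels as metadata."""
    return {
        "message": {
            "role": "user",
            "parts": [{"type": "text", "text": task["instruction"]}],
        },
        "metadata": {k: v for k, v in task.items() if k != "instruction"},
    }
```

Keeping payload construction in a pure function makes the contract testable without a running adapter.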


VS Code Integration

Recommended Extensions

  • ms-vscode-remote.remote-containers: run OpenCode sessions inside a devcontainer
  • ms-azuretools.vscode-docker: inspect running stack containers
  • grafana.vscode-jsonnet: edit dashboard JSON/Jsonnet provisioning files
  • redhat.vscode-yaml: docker-compose and Prometheus config validation

.vscode/tasks.json

{
  "version": "2.0.0",
  "tasks": [
    {
      "label": "Start orchestration stack",
      "type": "shell",
      "command": "docker compose up -d ollama langgraph-supervisor mcp-filesystem mcp-custom-tools prometheus grafana loki",
      "group": "build"
    },
    {
      "label": "Start with Flowise (dev)",
      "type": "shell",
      "command": "docker compose --profile dev up -d",
      "group": "build"
    },
    {
      "label": "Tail agent logs",
      "type": "shell",
      "command": "docker compose logs -f langgraph-supervisor mcp-custom-tools",
      "isBackground": true
    },
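    {
      "label": "Open Grafana",
      "type": "shell",
      "command": "xdg-open http://localhost:3001",
      "windows": { "command": "start http://localhost:3001" },
      "osx": { "command": "open http://localhost:3001" },
      "group": "test"
    }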
  ]
}

.vscode/launch.json (attach to LangGraph)

{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Attach to LangGraph Supervisor",
      "type": "debugpy",
      "request": "attach",
      "connect": { "host": "localhost", "port": 5678 },
      "pathMappings": [
        { "localRoot": "${workspaceFolder}/langgraph", "remoteRoot": "/app" }
      ]
    }
  ]
}

Security Checklist

  • PreToolUse hook blocks secrets files and destructive patterns
  • MCP servers with write operations require an API key, compared in constant time (hmac.compare_digest)
  • Agent subprocess env is an explicit allowlist; never inherit the parent environment
  • out_of_scope list in every handoff contract includes .env, *.pem, *.key
  • Flowise behind authentication when dev profile is active
  • LiteLLM virtual key per agent — budget-capped, not shared master key
  • Secret detection in PostToolUse hook before logging tool outputs
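Two of the checklist items above translate directly into a few lines of standard-library Python. This is a sketch under stated assumptions: the allowlist contents and the helper names are hypothetical, and how the expected key is stored is left out.

```python
import hmac
import os
import subprocess
from typing import Mapping, Optional, Sequence

ENV_ALLOWLIST = ("PATH", "HOME", "LANG")  # assumed minimal set; extend deliberately


def api_key_is_valid(presented: str, expected: str) -> bool:
    """Constant-time comparison, so response timing does not leak key prefixes."""
    return hmac.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))


def build_agent_env(parent_env: Mapping[str, str],
                    allowlist: Sequence[str] = ENV_ALLOWLIST) -> dict:
    """Copy only allowlisted variables; everything else, including secrets, is dropped."""
    return {name: parent_env[name] for name in allowlist if name in parent_env}


def run_agent_command(args: Sequence[str],
                      parent_env: Optional[Mapping[str, str]] = None):
    """Run a subprocess with the filtered environment instead of the inherited one."""
    source = os.environ if parent_env is None else parent_env
    return subprocess.run(list(args), env=build_agent_env(source),
                          capture_output=True, text=True, check=False)
```

Passing `env=` explicitly matters because `subprocess.run` inherits the full parent environment when it is omitted.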

Common Pitfalls

| Pitfall | Symptom | Fix |
|---|---|---|
| Three orchestration layers (LangGraph + CrewAI + AG2) | Debugging impossible; latency compounds | Choose one sub-framework |
| Flowise in production data path | Non-reproducible behavior, hard to test | Flowise is dev/prototyping only (profiles: ["dev"]) |
| No max_turns on LangGraph nodes | Runaway agent burns tokens until timeout | Set MAX_TURNS env var; log termination reason |
| Single MCP server for all tools | One restart takes down all tools | One MCP server per domain |
| AGENTS.md as the only constraint layer | Model reasons around "prohibited" rules | PreToolUse hooks for anything critical |
| Self-evaluation | Agent always reports success | Separate Evaluator agent |
| Progress file not written | Context reset = full restart from zero | Coder agent writes progress.md after each action |
| Dashboards created in Grafana UI | Lost on container rebuild | Provision from monitoring/grafana/dashboards/*.json in git |

Install via CLI
npx skills add https://github.com/drewid74/ai_skills --skill orchestration-scaffold