---
name: orchestration-scaffold
description: >
  Use this when: scaffold a local AI orchestration stack, set up LangGraph
  supervisor, connect Ollama to OpenCode, configure MCP servers for agent tools,
  wire OpenCode as the developer interface for a multi-agent system, set up A2A
  protocol between LangGraph and OpenCode, configure VS Code for agent
  development, orchestration stack on TrueNAS / Docker Compose homelab, choose
  between AG2 and CrewAI sub-frameworks, set up agent harness with PreToolUse
  hooks, define SLIs for an AI orchestration stack, design generator-evaluator
  pattern for agent output, local inference routing with Qwen3 or Mistral Small.
---
Orchestration Scaffold
Architecture Overview
```text
┌─────────────────────────────────────────────────────────────────┐
│  Developer Interface                                            │
│  OpenCode TUI + VS Code (opencode.ai)                           │
│  ├── MCP Tools → Docker MCP servers                             │
│  │   ├── mcp-filesystem  :8811  (read/write workspace)          │
│  │   └── mcp-custom      :8812  (domain-specific tools)         │
│  └── A2A Protocol → LangGraph Supervisor :8123                  │
│      └── Sub-framework (choose one — see below)                 │
│            ↓                                                    │
│          Ollama :11434                                          │
│          Qwen3:32b (coding) | Mistral-Small-3.1 (fast)          │
└─────────────────────────────────────────────────────────────────┘
```
Flowise :3000 — visual prototyping ONLY, not production data path
Model split:
- Architect / planner agent → cloud model (Claude Sonnet via LiteLLM gateway)
- Coder / executor agent → local Qwen3:32b (data locality, zero cost per call)
- Fast routing / classification → Mistral Small 3.1
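The split above can be expressed as a small routing table in the supervisor. A minimal sketch, assuming hypothetical role names (`planner`, `coder`, `router`); `ollama/qwen3:32b` matches the coder agent file below, while the other two model strings are placeholders to adjust to your LiteLLM and Ollama setup. The `require_local` flag shows one way to keep data-local work off the cloud route.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRoute:
    model: str     # model identifier passed to the gateway or to Ollama
    base_url: str  # endpoint that serves this model
    local: bool    # True = prompts and data never leave the host


# Hypothetical role names and model IDs; adjust to your LiteLLM / Ollama setup.
ROUTES = {
    "planner": ModelRoute("claude-sonnet-4-5", "http://litellm:4000", local=False),
    "coder": ModelRoute("ollama/qwen3:32b", "http://ollama:11434", local=True),
    "router": ModelRoute("ollama/mistral-small3.1", "http://ollama:11434", local=True),
}


def route(role: str, *, require_local: bool = False) -> ModelRoute:
    """Pick the model for an agent role; refuse cloud routes for local-only data."""
    try:
        choice = ROUTES[role]
    except KeyError:
        raise ValueError(f"unknown agent role: {role!r}") from None
    if require_local and not choice.local:
        raise PermissionError(f"role {role!r} routes to a cloud model")
    return choice
```

Keeping the table in one place means swapping Qwen3 for another local model is a one-line change rather than an edit in every agent.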
Sub-Framework Decision
Choose one. Do not use both: stacking three orchestration layers (LangGraph → CrewAI → AG2) is over-abstracted.
| Criterion | Use LangGraph + AG2 | Use LangGraph + CrewAI |
|---|---|---|
| Primary workload | Code generation, tool-calling loops, iterative refinement | Role-based delegation (researcher, writer, reviewer) |
| Team mental model | State machine / graph | Crew of specialists |
| Best for | Dev agent, CI runner, code review bot | Document pipeline, research assistant, content workflows |
| Complexity | Lower (AG2 is thin) | Higher (CrewAI adds role scaffolding) |
| Default | ✅ Recommended for dev tooling | When role semantics add real clarity |
Required: Harness Engineering Checklist
These are non-negotiable before any production task. Missing any one means the stack is not production-safe.
1. PreToolUse Hooks (bypass-proof)
Define in opencode.json or a hook plugin. Minimum blocklist:
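```python
# hooks/pre_tool_use.py
import os
import re
from typing import NamedTuple

BLOCKED_PATTERNS = [
    r"\.env$", r"\.pem$", r"\.key$", r"id_rsa", r"id_ed25519",
    r"rm\s+-rf\s+/", r"DROP\s+TABLE", r"DELETE\s+FROM.*WHERE\s+1",
]
BLOCKED_PATHS = ["/etc/passwd", "/etc/shadow", os.path.expanduser("~/.ssh/")]

class PreToolResult(NamedTuple):
    allow: bool
    reason: str = ""

def pre_tool_use(tool_name: str, input: dict) -> PreToolResult:
    # Block secrets and destructive patterns regardless of model reasoning
    for value in map(str, input.values()):
        hit = any(re.search(p, value, re.IGNORECASE) for p in BLOCKED_PATTERNS)
        hit = hit or any(os.path.expanduser(value).startswith(p) for p in BLOCKED_PATHS)
        if hit:
            return PreToolResult(False, f"{tool_name} blocked: {value!r}")
    return PreToolResult(True)
```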
Hooks enforce constraints. AGENTS.md rules are advice the model can reason around. They are not equivalent.
2. Loop Termination (all three required)
```yaml
# docker-compose.yml — langgraph-supervisor service
environment:
  MAX_TURNS: "25"            # hard turn limit per task
  TASK_TIMEOUT_SEC: "300"    # wall-clock timeout
  LOOP_DETECT_WINDOW: "3"    # same action N times → terminate
```
An agent without a step budget is a billing incident. Log which termination condition fired for every completed session.
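A minimal sketch of how the supervisor might enforce all three limits and report which one fired. It reads the same environment variables as the compose fragment; the `LoopGuard` class and its method names are hypothetical, and "same action" is approximated as an identical (tool name, arguments) pair on consecutive turns.

```python
import os
import time
from collections import deque
from typing import Optional


class LoopGuard:
    """Enforces turn budget, wall-clock timeout, and repeated-action detection."""

    def __init__(self, env=os.environ, clock=time.monotonic):
        self.max_turns = int(env.get("MAX_TURNS", "25"))
        self.timeout_sec = float(env.get("TASK_TIMEOUT_SEC", "300"))
        self.window = int(env.get("LOOP_DETECT_WINDOW", "3"))
        self.clock = clock
        self.started = clock()
        self.turns = 0
        self.recent = deque(maxlen=self.window)  # last N actions only

    def check(self, tool_name: str, args: dict) -> Optional[str]:
        """Call once per turn; returns the termination reason, or None to continue."""
        self.turns += 1
        self.recent.append((tool_name, repr(sorted(args.items()))))
        if self.turns > self.max_turns:
            return "max_turns"
        if self.clock() - self.started > self.timeout_sec:
            return "timeout"
        if len(self.recent) == self.window and len(set(self.recent)) == 1:
            return "loop_detected"
        return None
```

The returned string doubles as the `reason` label for the termination metric, so "log which condition fired" falls out of the same call.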
3. Generator-Evaluator Separation
Never let an agent grade its own output.
```text
Generator: produces the artifact (code, PR, response)
Evaluator: grades against predetermined criteria
           — separate agent, separate context
           — criteria defined before generation started
           — does NOT receive the generator's reasoning trace
```
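One way to make the separation structural rather than a convention: the evaluator's interface has no parameter for the reasoning trace, and the criteria are frozen before the generator runs. A minimal sketch; the rule-based checks stand in for what would normally be a separate evaluator agent with its own context, and all names are hypothetical.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Callable, Mapping


@dataclass(frozen=True)
class Artifact:
    content: str          # what gets graded
    reasoning_trace: str  # generator-internal; never forwarded to the evaluator


Check = Callable[[str], bool]


def freeze_criteria(criteria: Mapping[str, Check]) -> Mapping[str, Check]:
    """Read-only snapshot taken before generation, so criteria cannot drift."""
    return MappingProxyType(dict(criteria))


def evaluate(content: str, criteria: Mapping[str, Check]) -> dict[str, bool]:
    """Grades only the artifact content; there is no way to pass the trace in."""
    return {name: check(content) for name, check in criteria.items()}


def run(task: str, generate: Callable[[str], Artifact],
        criteria: Mapping[str, Check]) -> dict[str, bool]:
    frozen = freeze_criteria(criteria)          # 1. criteria fixed first
    artifact = generate(task)                   # 2. generator produces the artifact
    return evaluate(artifact.content, frozen)   # 3. evaluator sees content only
```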
Minimum eval suite per harness run:
- `task_completion_rate`: % tasks completed without escalation
- `error_rate`: % runs ending in unhandled exception
- `cost_per_task`: mean + p95 token cost
- `escalation_rate`: % tasks hitting human-in-loop gate
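These four numbers can be computed from per-run records at the end of a harness run. A minimal sketch, assuming each run is logged with a hypothetical `status` (`complete`, `escalated`, or `error`) and a `cost_usd` field; the p95 uses the nearest-rank method.

```python
import math
from dataclasses import dataclass
from statistics import mean


@dataclass
class RunRecord:
    status: str      # "complete" | "escalated" | "error" (hypothetical values)
    cost_usd: float  # token cost of the run


def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile."""
    ordered = sorted(values)
    return ordered[math.ceil(0.95 * len(ordered)) - 1]


def eval_suite(runs: list[RunRecord]) -> dict[str, float]:
    if not runs:
        raise ValueError("no runs to evaluate")
    n = len(runs)
    costs = [r.cost_usd for r in runs]
    return {
        "task_completion_rate": sum(r.status == "complete" for r in runs) / n,
        "error_rate": sum(r.status == "error" for r in runs) / n,
        "cost_per_task_mean": mean(costs),
        "cost_per_task_p95": p95(costs),
        "escalation_rate": sum(r.status == "escalated" for r in runs) / n,
    }
```

Reporting p95 next to the mean matters because a single runaway session can dominate cost while barely moving the average.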
4. State Handoff Contract
When the LangGraph supervisor spawns an OpenCode task via A2A, it MUST pass:
```json
{
  "session_id": "<uuid>",
  "branch": "agent/<issue-id>",
  "workspace_dir": "/tmp/workspace-<issue-id>",
  "progress_file": "/tmp/workspace-<issue-id>/progress.md",
  "files_modified": [],
  "in_scope": ["src/", "tests/"],
  "out_of_scope": [".env", "infra/", "*.pem", "*.key"],
  "max_turns": 20
}
```
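Because the contract is a MUST, it is worth rejecting malformed handoffs before any work is spawned. A minimal sketch with hypothetical helpers: `validate_handoff` checks the rules stated in this document (branch prefix, progress file inside the workspace, and `.env`, `*.pem`, `*.key` always out of scope, as the security checklist requires), and `path_in_scope` assumes trailing-slash entries are directory prefixes while everything else is a shell-style glob matched against the full path or the file name.

```python
import fnmatch
import posixpath
import uuid

REQUIRED_KEYS = {"session_id", "branch", "workspace_dir", "progress_file",
                 "files_modified", "in_scope", "out_of_scope", "max_turns"}
MANDATORY_OUT_OF_SCOPE = {".env", "*.pem", "*.key"}  # per the security checklist


def validate_handoff(contract: dict) -> list[str]:
    """Return a list of problems; an empty list means the handoff may proceed."""
    missing = sorted(REQUIRED_KEYS - set(contract))
    if missing:
        return [f"missing key: {k}" for k in missing]
    problems = []
    try:
        uuid.UUID(str(contract["session_id"]))
    except ValueError:
        problems.append("session_id is not a UUID")
    if not contract["branch"].startswith("agent/"):
        problems.append("branch must start with agent/")
    workspace = posixpath.normpath(contract["workspace_dir"])
    progress = posixpath.normpath(contract["progress_file"])
    try:
        inside = progress != workspace and posixpath.commonpath([workspace, progress]) == workspace
    except ValueError:  # one path absolute, the other relative
        inside = False
    if not inside:
        problems.append("progress_file must live inside workspace_dir")
    lacking = MANDATORY_OUT_OF_SCOPE - set(contract["out_of_scope"])
    if lacking:
        problems.append(f"out_of_scope lacks {sorted(lacking)}")
    if not isinstance(contract["max_turns"], int) or contract["max_turns"] <= 0:
        problems.append("max_turns must be a positive integer")
    return problems


def path_in_scope(path: str, in_scope: list[str], out_of_scope: list[str]) -> bool:
    """out_of_scope always wins; a path must also match some in_scope entry."""
    def matches(pattern: str) -> bool:
        if pattern.endswith("/"):
            return path.startswith(pattern)
        return (fnmatch.fnmatch(path, pattern)
                or fnmatch.fnmatch(posixpath.basename(path), pattern))

    if any(matches(p) for p in out_of_scope):
        return False
    return any(matches(p) for p in in_scope)
```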
5. AGENTS.md (≤ 100 lines, table-of-contents only)
```markdown
# AGENTS.md

## System of Record
→ docs/architecture/overview.md
→ docs/conventions/code-style.md
→ docs/operations/runbook.md

## Prohibited Actions (enforced by hook — listed here for visibility)
- Do not write to .env, *.pem, *.key, id_*
- Do not run destructive patterns outside workspace
- Do not modify files in out_of_scope list

## Escalate When
- Three consecutive failed attempts on same task
- Any write to production systems
- Unsure of scope — ask, don't guess
```
Depth lives in docs/. Progressive disclosure: agents navigate to deeper docs as needed.
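The 100-line cap and the table-of-contents shape tend to drift, so a CI check can enforce both. A minimal sketch, assuming the `→ path` pointer convention used in the AGENTS.md example; `lint_agents_md` is a hypothetical helper, not part of OpenCode.

```python
from pathlib import Path

MAX_LINES = 100


def lint_agents_md(repo_root: Path) -> list[str]:
    """Check AGENTS.md length and that every '→ path' pointer resolves."""
    lines = (repo_root / "AGENTS.md").read_text(encoding="utf-8").splitlines()
    errors = []
    if len(lines) > MAX_LINES:
        errors.append(f"AGENTS.md has {len(lines)} lines (max {MAX_LINES})")
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if stripped.startswith("→"):
            target = stripped[1:].strip()  # drop the arrow, keep the path
            if not (repo_root / target).exists():
                errors.append(f"line {number}: {target} does not exist")
    return errors
```

A dangling pointer is worse than none: an agent following it hits a missing file and falls back to guessing.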
OpenCode Configuration
opencode.json
```json
{
  "model": "claude-sonnet-4-5",
  "mcp": {
    "filesystem": {
      "type": "local",
      "command": ["docker", "run", "-i", "--rm", "-v", "${workspace}:/workspace",
                  "mcp-filesystem:latest"],
      "environment": { "WORKSPACE_DIR": "${workspace}" }
    },
    "custom-tools": {
      "type": "remote",
      "url": "http://localhost:8812/mcp"
    },
    "langgraph": {
      "type": "remote",
      "url": "http://localhost:8123/mcp"
    }
  },
  "agents": {
    "default": ".opencode/agents/orchestrator.md"
  }
}
```
.opencode/agents/orchestrator.md
```markdown
---
mode: primary
model: claude-sonnet-4-5
description: Orchestration coordinator — plans tasks, delegates to LangGraph, validates results
tools:
  - mcp:filesystem
  - mcp:custom-tools
  - mcp:langgraph
---
You are the orchestration coordinator. You plan tasks and delegate execution to the LangGraph supervisor.

Rules:
- Never execute code directly — delegate to LangGraph via the `langgraph` MCP tool
- Validate all outputs from sub-agents before reporting completion
- Follow the state handoff contract when spawning tasks
- Read progress.md before taking any action on a resumed task
```
.opencode/agents/coder.md
```markdown
---
mode: subagent
model: ollama/qwen3:32b
description: Code executor — implements code changes, runs tests, reports results
tools:
  - mcp:filesystem
  - bash
---
You are the code executor. You implement exactly what the orchestrator specifies.

Rules:
- Stay in scope (in_scope list from handoff contract)
- Write progress.md after each significant action
- Run tests before reporting completion
- If tests fail 3 times, escalate — do not continue guessing
```
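Both agent files rely on `progress.md` to survive context resets. A minimal sketch of an append-only format the coder could write and the orchestrator could read back on resume; the heading-per-entry layout and the function names are assumptions, not an OpenCode convention.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def append_progress(progress_file: Path, action: str, files: list[str], status: str) -> None:
    """Append one entry per significant action; earlier entries are never rewritten."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # Collapse whitespace so an entry never contains the blank-line separator
    lines = [f"## {stamp} {status}", f"- action: {' '.join(action.split())}"]
    lines += [f"- modified: {f}" for f in files]
    with progress_file.open("a", encoding="utf-8") as fh:
        fh.write("\n".join(lines) + "\n\n")  # blank line separates entries


def last_entry(progress_file: Path) -> Optional[str]:
    """Most recent entry, i.e. where a resumed task should pick up."""
    if not progress_file.exists():
        return None
    text = progress_file.read_text(encoding="utf-8")
    entries = [e for e in text.split("\n\n") if e.startswith("## ")]
    return entries[-1] if entries else None


def modified_files(progress_file: Path) -> list[str]:
    """Union of files recorded so far, e.g. to fill files_modified in the handoff."""
    prefix = "- modified: "
    lines = progress_file.read_text(encoding="utf-8").splitlines()
    return sorted({line[len(prefix):] for line in lines if line.startswith(prefix)})
```

Appending rather than rewriting keeps the history intact even if an agent crashes mid-write.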
Docker Compose
```yaml
services:
  ollama:
    image: ollama/ollama:latest
    ports: ["11434:11434"]
    volumes:
      - ollama_models:/root/.ollama
    healthcheck:
      # The ollama/ollama image does not ship curl; probe with the bundled CLI
      test: ["CMD", "ollama", "list"]
      interval: 30s
      timeout: 10s
      retries: 3
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

  langgraph-supervisor:
    build: ./langgraph
    ports: ["8123:8123"]
    environment:
      OLLAMA_URL: http://ollama:11434
      LITELLM_URL: http://litellm:4000   # cloud model gateway; no litellm service is defined in this file
      MAX_TURNS: "25"
      TASK_TIMEOUT_SEC: "300"
      LOOP_DETECT_WINDOW: "3"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8123/health"]
      interval: 15s
      timeout: 5s
      retries: 3
    depends_on:
      ollama:
        condition: service_healthy

  mcp-filesystem:
    image: mcp/filesystem:latest
    ports: ["8811:8811"]
    volumes:
      - workspace:/workspace:rw
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8811/health"]
      interval: 15s

  mcp-custom-tools:
    build: ./mcp-tools
    ports: ["8812:8812"]
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8812/health"]
      interval: 15s

  flowise:
    image: flowiseai/flowise:latest
    ports: ["3000:3000"]
    volumes:
      - flowise_data:/root/.flowise
    profiles: ["dev"]   # dev profile only — not started in production
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/api/v1/ping"]
      interval: 30s

  prometheus:
    image: prom/prometheus:latest
    ports: ["9090:9090"]
    volumes:
      - ./monitoring/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - ./monitoring/alerts.yml:/etc/prometheus/alerts.yml:ro   # list under rule_files in prometheus.yml
      - prometheus_data:/prometheus

  grafana:
    image: grafana/grafana:latest
    ports: ["3001:3000"]
    volumes:
      - ./monitoring/grafana/dashboards:/etc/grafana/provisioning/dashboards:ro
      - ./monitoring/grafana/datasources:/etc/grafana/provisioning/datasources:ro
      - grafana_data:/var/lib/grafana

  loki:
    image: grafana/loki:latest
    ports: ["3100:3100"]
    volumes:
      - ./monitoring/loki-config.yml:/etc/loki/local-config.yaml:ro
      - loki_data:/loki

volumes:
  ollama_models:
  workspace:
  flowise_data:
  prometheus_data:
  grafana_data:
  loki_data:
```
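After `docker compose up -d`, a quick host-side check confirms that the endpoints the healthchecks probe are answering. A minimal sketch using only the standard library; the URL list mirrors the ports published in the compose file, and `fetch` and `sleep` are injectable (a design choice for this sketch) so the polling logic can be tested without a running stack.

```python
import time
import urllib.request
from typing import Callable

ENDPOINTS = {
    "ollama": "http://localhost:11434/api/tags",
    "langgraph-supervisor": "http://localhost:8123/health",
    "mcp-filesystem": "http://localhost:8811/health",
    "mcp-custom-tools": "http://localhost:8812/health",
}


def http_ok(url: str) -> bool:
    """True on a 2xx response; connection errors and HTTP errors count as down."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return 200 <= resp.status < 300
    except OSError:  # URLError and HTTPError are OSError subclasses
        return False


def wait_for_stack(endpoints: dict[str, str], fetch: Callable[[str], bool] = http_ok,
                   deadline_sec: float = 120, interval_sec: float = 5,
                   sleep: Callable[[float], None] = time.sleep) -> list[str]:
    """Poll until every endpoint is healthy or the deadline passes; return failures."""
    pending = dict(endpoints)
    waited = 0.0
    while pending:
        pending = {name: url for name, url in pending.items() if not fetch(url)}
        if not pending or waited >= deadline_sec:
            break
        sleep(interval_sec)
        waited += interval_sec
    return sorted(pending)
```

Usage: `failed = wait_for_stack(ENDPOINTS)` returns the names still down after two minutes, which maps directly onto `docker compose logs <name>`.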
SRE: SLIs, Alerts, Runbooks
SLIs (define before going live)
```promql
# Task completion rate (sum() so the status label does not break vector matching)
sum(rate(orchestration_tasks_total{status="complete"}[5m]))
  / sum(rate(orchestration_tasks_total[5m]))

# LangGraph supervisor p99 latency
histogram_quantile(0.99,
  sum by (le) (rate(langgraph_task_duration_seconds_bucket[5m]))
)

# Ollama inference error rate
sum(rate(ollama_requests_total{status=~"5.."}[5m]))
  / sum(rate(ollama_requests_total[5m]))

# Share of tasks terminated by the max_turns limit (should stay low)
sum(rate(agent_loop_terminations_total{reason="max_turns"}[1h]))
  / sum(rate(orchestration_tasks_total[1h]))
```
Alerts
```yaml
# monitoring/alerts.yml
groups:
  - name: orchestration
    rules:
      - alert: TaskCompletionRateLow
        expr: |
          sum(rate(orchestration_tasks_total{status="complete"}[10m]))
            / sum(rate(orchestration_tasks_total[10m])) < 0.80
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Task completion below 80%"
          runbook: "docs/operations/runbook.md#task-completion-low"

      - alert: OllamaHighErrorRate
        expr: |
          sum(rate(ollama_requests_total{status=~"5.."}[5m]))
            / sum(rate(ollama_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Ollama error rate above 5%"
          runbook: "docs/operations/runbook.md#ollama-errors"

      - alert: AgentLoopRunaway
        expr: |
          sum(rate(agent_loop_terminations_total{reason="max_turns"}[1h]))
            / sum(rate(orchestration_tasks_total[1h])) > 0.1
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "More than 10% of agents hitting max_turns limit"
          runbook: "docs/operations/runbook.md#loop-runaway"
```
Rules:
- Alert only on user-facing impact — not CPU/memory thresholds
- Every alert needs `for:`, `severity:`, and `runbook:` annotation
- Tail-sample traces: 100% errors, 1% success
- Dashboards provisioned from git — never manual Grafana UI
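The tail-sampling rule is a per-trace decision made once the trace is complete. A minimal sketch, assuming a hypothetical trace ID and error flag; hashing the ID instead of calling `random` keeps the 1% decision deterministic, so every component that sees the same trace makes the same call. In practice this decision usually lives in a collector (for example the OpenTelemetry Collector's tail sampling processor) rather than in application code.

```python
import hashlib

SUCCESS_SAMPLE_RATE = 0.01  # keep 1% of successful traces


def keep_trace(trace_id: str, has_error: bool, rate: float = SUCCESS_SAMPLE_RATE) -> bool:
    """Keep every errored trace and a deterministic `rate` fraction of the rest."""
    if has_error:
        return True  # 100% of errors
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:6], "big") / 2**48  # uniform in [0, 1)
    return bucket < rate
```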
Runbook Template
````markdown
# Runbook: <Alert Name>

## Symptom
<What the user sees>

## Immediate Mitigation
```bash
# Copy-paste commands
docker compose restart langgraph-supervisor
```

## Investigation
```bash
# Logs
docker compose logs langgraph-supervisor --tail=100
```

## Root Cause Categories
- Ollama OOM → check `docker stats ollama`
- Model not loaded → `curl localhost:11434/api/tags`
- Loop runaway → check `agent_loop_terminations_total` metric

## Escalate If
<Conditions that require a human>
````
A2A Integration (LangGraph → OpenCode)
```python
# langgraph/nodes/opencode_node.py
import os
from typing import TypedDict

import httpx

# a2a-opencode adapter endpoint. "localhost" only resolves to the host when the
# supervisor runs outside Docker; A2A_OPENCODE_URL is an illustrative override.
A2A_URL = os.environ.get("A2A_OPENCODE_URL", "http://localhost:28482/a2a/tasks/send")
TASK_TIMEOUT_SEC = float(os.environ.get("TASK_TIMEOUT_SEC", "300"))


class AgentTask(TypedDict):
    session_id: str
    branch: str
    workspace_dir: str
    progress_file: str
    files_modified: list[str]
    in_scope: list[str]
    out_of_scope: list[str]
    max_turns: int
    instruction: str


async def call_opencode(task: AgentTask) -> dict:
    """Delegate a task to OpenCode via A2A protocol."""
    async with httpx.AsyncClient(timeout=TASK_TIMEOUT_SEC) as client:
        response = await client.post(
            A2A_URL,
            json={
                "message": {
                    "role": "user",
                    "parts": [{"type": "text", "text": task["instruction"]}],
                },
                # Everything except the instruction travels as the handoff contract
                "metadata": {k: v for k, v in task.items() if k != "instruction"},
            },
        )
        response.raise_for_status()
        return response.json()
```
Reference: github.com/shashikanth-gs/a2a-opencode
VS Code Integration
Recommended Extensions
- `ms-vscode-remote.remote-containers` — run OpenCode sessions inside a devcontainer
- `ms-azuretools.vscode-docker` — inspect running stack containers
- `grafana.vscode-jsonnet` — edit dashboard JSON/Jsonnet provisioning files
- `redhat.vscode-yaml` — docker-compose and prometheus config validation
.vscode/tasks.json
```json
{
  "version": "2.0.0",
  "tasks": [
    {
      "label": "Start orchestration stack",
      "type": "shell",
      "command": "docker compose up -d ollama langgraph-supervisor mcp-filesystem mcp-custom-tools prometheus grafana loki",
      "group": "build"
    },
    {
      "label": "Start with Flowise (dev)",
      "type": "shell",
      "command": "docker compose --profile dev up -d",
      "group": "build"
    },
    {
      "label": "Tail agent logs",
      "type": "shell",
      "command": "docker compose logs -f langgraph-supervisor mcp-custom-tools",
      "isBackground": true
    },
    {
      "label": "Open Grafana",
      "type": "shell",
      "command": "xdg-open http://localhost:3001",
      "osx": { "command": "open http://localhost:3001" },
      "windows": { "command": "start http://localhost:3001" },
      "group": "test"
    }
  ]
}
```
.vscode/launch.json (attach to LangGraph)
```json
{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Attach to LangGraph Supervisor",
      "type": "python",
      "request": "attach",
      "connect": { "host": "localhost", "port": 5678 },
      "pathMappings": [
        { "localRoot": "${workspaceFolder}/langgraph", "remoteRoot": "/app" }
      ]
    }
  ]
}
```
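The attach configuration only works if the supervisor process is listening for a debugger on port 5678 and that port is published; the compose file in this document publishes only 8123. A minimal sketch for the supervisor's entrypoint, assuming the `debugpy` package is installed in the image and using hypothetical `DEBUGPY_ENABLE`, `DEBUGPY_PORT`, and `DEBUGPY_WAIT` variables so the listener stays off by default.

```python
import os
from typing import Mapping


def maybe_enable_debugger(env: Mapping[str, str] = os.environ) -> bool:
    """Start a debugpy listener when DEBUGPY_ENABLE=1; return whether it started."""
    if env.get("DEBUGPY_ENABLE") != "1":  # hypothetical opt-in flag
        return False
    import debugpy  # imported lazily so production images need not ship it

    port = int(env.get("DEBUGPY_PORT", "5678"))  # must match launch.json "port"
    debugpy.listen(("0.0.0.0", port))  # bind all interfaces inside the container
    if env.get("DEBUGPY_WAIT") == "1":
        debugpy.wait_for_client()  # block startup until VS Code attaches
    return True
```

Publish 5678 only in a development override file: anyone who can reach a debugpy port can execute code in the process.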
Security Checklist
- PreToolUse hook blocks secrets files and destructive patterns
- MCP servers with write operations require an API key, compared with `hmac.compare_digest`
- Agent subprocess env is an explicit allowlist — never `inherit=True`
- `out_of_scope` list in every handoff contract includes `.env`, `*.pem`, `*.key`
- Flowise behind authentication when the `dev` profile is active
- LiteLLM virtual key per agent — budget-capped, not a shared master key
- Secret detection in PostToolUse hook before logging tool outputs
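Three of these items translate directly into small helpers. A minimal sketch with hypothetical function names: a constant-time API-key check with `hmac.compare_digest`, a subprocess environment built from an explicit allowlist instead of copying `os.environ`, and a regex redactor to run in the PostToolUse path before tool output is logged. The secret patterns are illustrative and far from exhaustive.

```python
import hmac
import os
import re
import subprocess
from typing import Mapping, Optional


def api_key_valid(presented: str, expected: str) -> bool:
    """Constant-time comparison so response timing does not leak the key."""
    return hmac.compare_digest(presented.encode(), expected.encode())


ENV_ALLOWLIST = ("PATH", "HOME", "LANG")  # extend deliberately, never wholesale


def agent_env(extra: Optional[Mapping[str, str]] = None,
              source: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Subprocess env from the allowlist only; parent-process secrets are dropped."""
    env = {k: source[k] for k in ENV_ALLOWLIST if k in source}
    env.update(extra or {})
    return env


SECRET_PATTERNS = [
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----", re.S),
    re.compile(r"\bsk-[A-Za-z0-9_-]{16,}"),     # sk- prefixed API / virtual keys
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),        # AWS access key IDs
    re.compile(r"(?i)\b(password|secret|token)\s*[:=]\s*\S+"),
]


def redact(text: str) -> str:
    """Mask likely secrets in tool output before it reaches logs."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text


def run_tool(cmd: list[str]) -> str:
    """Run a tool with the allowlisted env and return redacted stdout."""
    result = subprocess.run(cmd, env=agent_env(), capture_output=True, text=True, timeout=60)
    return redact(result.stdout)
```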
Common Pitfalls
| Pitfall | Symptom | Fix |
|---|---|---|
| Three orchestration layers (LangGraph + CrewAI + AG2) | Debugging impossible; latency compounds | Choose one sub-framework |
| Flowise in production data path | Non-reproducible behavior, hard to test | Flowise is dev/prototyping only (profiles: ["dev"]) |
| No max_turns on LangGraph nodes | Runaway agent burns tokens until timeout | Set MAX_TURNS env var; log termination reason |
| Single MCP server for all tools | One restart takes down all tools | One MCP server per domain |
| AGENTS.md as the only constraint layer | Model reasons around "prohibited" rules | PreToolUse hooks for anything critical |
| Self-evaluation | Agent always reports success | Separate Evaluator agent |
| Progress file not written | Context reset = full restart from zero | Coder agent writes progress.md after each action |
| Dashboards created in Grafana UI | Lost on container rebuild | Provision from monitoring/grafana/dashboards/*.json in git |
References
- OpenCode docs: https://opencode.ai/docs
- OpenCode agents: https://opencode.ai/docs/agents
- OpenCode MCP: https://opencode.ai/docs/mcp-servers
- A2A adapter: https://github.com/shashikanth-gs/a2a-opencode
- LangGraph: https://langchain-ai.github.io/langgraph/
- AG2 (AutoGen successor): https://ag2.ai/docs
- CrewAI: https://docs.crewai.com
- Ollama: https://ollama.com/library
- FastMCP: https://github.com/jlowin/fastmcp