agent-workflow-graphs - SKILL.md Agent Skill

name: agent-workflow-graphs
description: >
  Guide the design of graph-based workflows for AI agents -- branching, chaining, merging, conditions, suspend/resume for human-in-the-loop, streaming updates to users, and observability with tracing. Use when the user needs to build structured multi-step agent processes, add deterministic control flow to LLM-powered systems, implement durable workflows that survive crashes, or add tracing and observability. Also use when the user mentions "workflow", "graph", "branching", "chaining", "suspend", "resume", "tracing", "observability", "OpenTelemetry", or "durable execution", or asks how to make their agent follow a specific step sequence.

Agent Workflow Graphs

When agents have too much freedom, they produce unpredictable results. Graph-based workflows constrain the agent to a structured process while still leveraging LLM intelligence at each step.

Core Concept

A workflow graph breaks a complex task into discrete steps connected by edges. Each step can:

Call an LLM for a focused decision
Execute deterministic code
Call external APIs
Wait for human input

The key insight: each LLM call makes one small, focused decision (often a binary one) instead of a single call making one big decision.

Workflow Primitives

Branching (Fan-Out)

Trigger multiple LLM calls on the same input in parallel:

Input --> Step 1 (check symptom A)
     --> Step 2 (check symptom B)
     --> Step 3 (check symptom C)

Use when: You need to check multiple independent things. It is better to run 12 parallel calls that each check one symptom than 1 call that checks all 12, because each call stays focused on a single question.
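A minimal sketch of fan-out, assuming a hypothetical `checkSymptom` function that stands in for one focused LLM call. Here it is a deterministic placeholder so the example runs without a model; `fanOut` and the `SymptomCheck` type are illustrative names, not part of any library.

```typescript
// Result of one focused check (shape is an assumption for this sketch).
type SymptomCheck = { symptom: string; present: boolean };

// Hypothetical stand-in for a single LLM call that checks exactly one symptom.
// A real implementation would call a model API here.
async function checkSymptom(input: string, symptom: string): Promise<SymptomCheck> {
  return { symptom, present: input.toLowerCase().includes(symptom) };
}

// Fan-out: start one check per symptom on the same input and run them concurrently.
async function fanOut(input: string, symptoms: string[]): Promise<SymptomCheck[]> {
  return Promise.all(symptoms.map((s) => checkSymptom(input, s)));
}
```

`Promise.all` preserves input order, so result `i` corresponds to symptom `i`. It also rejects as soon as any branch rejects; if branches should fail independently, `Promise.allSettled` is the usual alternative.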

Chaining (Sequential)

Feed the output of one step into the next:

Step 1 (fetch data) --> Step 2 (analyze) --> Step 3 (summarize)

Each step waits for the previous step and has access to prior results via a shared context object.

Use when: Steps have dependencies -- each needs the previous step's output.
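One way to sketch chaining with a shared context object is shown here. The `ChainContext` fields and the three step functions are assumptions chosen to mirror the fetch, analyze, summarize diagram; they do not call any real API.

```typescript
// Shared context that accumulates each step's output (field names are illustrative).
type ChainContext = { raw?: number[]; mean?: number; summary?: string };
type ChainStep = (ctx: ChainContext) => Promise<ChainContext>;

// Placeholder for a step that would fetch data from an external API.
const fetchData: ChainStep = async (ctx) => ({ ...ctx, raw: [3, 5, 10] });

// Deterministic analysis step that reads the previous step's output.
const analyze: ChainStep = async (ctx) => {
  const raw = ctx.raw ?? [];
  const mean = raw.length > 0 ? raw.reduce((a, b) => a + b, 0) / raw.length : 0;
  return { ...ctx, mean };
};

// Placeholder for a step that would ask an LLM to write a summary.
const summarize: ChainStep = async (ctx) => ({ ...ctx, summary: `mean=${ctx.mean}` });

// Run the steps in order; each one waits for the previous step to finish.
async function runChain(steps: ChainStep[], initial: ChainContext = {}): Promise<ChainContext> {
  let ctx = initial;
  for (const step of steps) {
    ctx = await step(ctx);
  }
  return ctx;
}
```

Calling `runChain([fetchData, analyze, summarize])` resolves to a context whose `summary` is `"mean=6"`. Each step returns a new context rather than mutating the old one, which keeps every step's input and output easy to inspect.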

Merging (Fan-In)

After parallel branches complete, converge their results in a single step:

Step 1 --\
           --> Merge step (combine results) --> Output
Step 2 --/

Use when: You branched earlier and need to combine independent results into a single output.
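The fan-in step can be sketched as a plain function that runs once every branch has finished. The two branch functions below (`checkBilling`, `checkShipping`) are hypothetical placeholders for LLM or API calls.

```typescript
type BranchResult = { branch: string; finding: string };

// Hypothetical branch steps; real ones would call an LLM or an external API.
async function checkBilling(input: string): Promise<BranchResult> {
  return { branch: "billing", finding: input.includes("invoice") ? "billing issue" : "none" };
}
async function checkShipping(input: string): Promise<BranchResult> {
  return { branch: "shipping", finding: input.includes("late") ? "delivery delay" : "none" };
}

// Merge step: combines the independent branch results into a single output.
function merge(results: BranchResult[]): string {
  const findings = results.filter((r) => r.finding !== "none").map((r) => r.finding);
  return findings.length > 0 ? findings.join("; ") : "no issues found";
}

// Branch, wait for every branch to finish, then merge.
async function branchAndMerge(input: string): Promise<string> {
  const results = await Promise.all([checkBilling(input), checkShipping(input)]);
  return merge(results);
}
```

Keeping `merge` synchronous and free of model calls makes the combination logic deterministic and easy to unit test on its own.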

Conditions

Execute steps conditionally based on intermediate results:

Step 1 (fetch data)
  |
  v
  [condition: fetchData.status === "success"]
  |
  v
Step 2 (process data)

Use when: Workflow paths depend on runtime results (success/failure, data type, user choice).
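A conditional edge is ordinary code evaluated on the previous step's output. In this sketch, `fetchStep` and `processStep` are hypothetical steps, and the local variable `fetchData` mirrors the name used in the diagram above.

```typescript
type FetchResult = { status: "success" | "error"; data?: string };

// Hypothetical fetch step; a real one would call an external API.
async function fetchStep(source: string): Promise<FetchResult> {
  return source === ""
    ? { status: "error" }
    : { status: "success", data: `rows from ${source}` };
}

// Hypothetical processing step that only makes sense when data exists.
async function processStep(data: string): Promise<string> {
  return data.toUpperCase();
}

// Step 2 runs only when the condition on step 1's result holds.
async function runConditional(source: string): Promise<string> {
  const fetchData = await fetchStep(source);
  if (fetchData.status === "success" && fetchData.data !== undefined) {
    return processStep(fetchData.data);
  }
  return "skipped: fetch failed";
}
```

Because the condition is deterministic code rather than an LLM decision, the same intermediate result always takes the same path.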

Best Practices for Workflow Steps

Meaningful I/O at each step: Design inputs and outputs so they make sense in your tracing UI
One LLM call per step maximum: Each step should do one focused thing
Combine primitives: Loops, retries, and complex patterns are all compositions of these four primitives
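As an illustration of composing primitives, a retry can be read as chaining (run the step again) plus a condition (did the last attempt succeed?). The `withRetry` helper and the `Attempt` type below are assumptions made for this sketch.

```typescript
// Outcome of one attempt (shape is an assumption for this sketch).
type Attempt<T> = { status: "ok"; value: T } | { status: "failed"; error: string };

// Retry built from two primitives: a chain of attempts and a condition on each result.
async function withRetry<T>(
  step: (attempt: number) => Promise<Attempt<T>>,
  maxAttempts: number
): Promise<Attempt<T>> {
  let last: Attempt<T> = { status: "failed", error: "not run" };
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    last = await step(attempt);
    if (last.status === "ok") {
      return last; // condition edge: success leaves the loop
    }
  }
  return last; // condition edge: attempts exhausted, report the last failure
}

// Hypothetical flaky step that fails twice and succeeds on the third attempt.
async function flakyStep(attempt: number): Promise<Attempt<string>> {
  return attempt < 3
    ? { status: "failed", error: `fail ${attempt}` }
    : { status: "ok", value: "done" };
}
```

With `flakyStep`, `withRetry(flakyStep, 5)` resolves to the successful third attempt, while `withRetry(flakyStep, 2)` resolves to the failure from attempt 2.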

Suspend and Resume

Problem

Workflows sometimes need to pause for external input (human approval, webhook callback, long-running external process).

Solution

Persist the workflow state, then resume from exactly where it left off.

Step 1 --> Step 2 --> [SUSPEND] --> waiting for human approval
                                        |
                          human approves |
                                        v
                                    [RESUME] --> Step 3 --> Step 4

Implementation Pattern

Define suspendSchema on the step that needs to pause
Call suspend() with a payload (what you're waiting for)
The workflow persists its state to a database
When the external event arrives, call resume() with the response data
The workflow continues from the suspended step
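The steps above can be sketched without any framework. This example does not use a real library's `suspend()` or `resume()` API; `startRun`, `resumeRun`, and the `Snapshot` shape are hypothetical, and the `Map` is only a stand-in for a database so the sketch runs on its own.

```typescript
// Hypothetical persisted snapshot: where the run paused, what it waits for, and prior results.
type Snapshot = {
  runId: string;
  suspendedAt: string;
  payload: { question: string }; // what the workflow is waiting for
  context: { draft: string };    // outputs of the steps that already ran
};

// Stand-in for a database table. Production code needs durable storage, not memory.
const store = new Map<string, string>();

// Earlier steps run, then the workflow suspends; no process has to stay alive afterwards.
function startRun(runId: string, topic: string): Snapshot {
  const draft = `Draft about ${topic}`; // placeholder for the earlier steps' output
  const snapshot: Snapshot = {
    runId,
    suspendedAt: "approval",
    payload: { question: `Approve "${draft}"?` },
    context: { draft },
  };
  store.set(runId, JSON.stringify(snapshot)); // persist the serialized state
  return snapshot;
}

// Called when the external event arrives, for example from a webhook handler.
function resumeRun(runId: string, response: { approved: boolean }): string {
  const raw = store.get(runId);
  if (raw === undefined) throw new Error(`no suspended run ${runId}`);
  const snapshot = JSON.parse(raw) as Snapshot;
  store.delete(runId); // in this sketch a run can be resumed only once
  return response.approved ? `Published: ${snapshot.context.draft}` : "Rejected";
}
```

Serializing the snapshot to JSON before storing it makes the constraint explicit: everything the later steps need must survive a round trip through storage.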

Key Insight

This is the workflow equivalent of HITL (human-in-the-loop). The workflow doesn't keep a running process alive -- it serializes state and picks back up later.

Streaming Updates

Why Streaming Matters

A 10-second blank screen feels broken. The same 10 seconds with live progress updates feels fast and responsive.

What to Stream

LLM tokens: Show text as it's generated
Workflow step updates: "Searching... Analyzing... Writing..."
Partial results: Push intermediate outputs before the workflow completes

How to Build Streaming

Stream as much as you can: Tokens, workflow steps, custom data
Use reactive tools: for example ElectricSQL, Turbo Streams, or server-sent events (SSE) for real-time updates
Provide escape hatches: If a step is stuck waiting, push the partial results it already has to the frontend

Pattern: Streaming from Workflow Steps

Each step can emit progress updates to the client while executing. The client renders updates as they arrive, creating a responsive experience even for multi-minute agent runs.
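A minimal sketch of this pattern, assuming an async generator as the workflow and SSE as the transport. The `WorkflowEvent` shape, `runWorkflow`, and `toSse` are illustrative names; the step bodies are placeholders rather than real search or LLM calls.

```typescript
// Events the workflow can push to the client (shape is an assumption for this sketch).
type WorkflowEvent =
  | { type: "step"; name: string }
  | { type: "partial"; text: string }
  | { type: "done"; result: string };

// Each yield is a progress update emitted while the workflow is still running.
async function* runWorkflow(query: string): AsyncGenerator<WorkflowEvent> {
  yield { type: "step", name: "Searching" };
  const found = `notes on ${query}`; // placeholder for a search step
  yield { type: "partial", text: found }; // push an intermediate output early
  yield { type: "step", name: "Writing" };
  yield { type: "done", result: `Summary of ${found}` };
}

// Format one event as a server-sent events frame: "data: <json>" followed by a blank line.
function toSse(event: WorkflowEvent): string {
  return `data: ${JSON.stringify(event)}\n\n`;
}

// Collects frames for demonstration; a server would write each frame to the response instead.
async function collectFrames(query: string): Promise<string[]> {
  const frames: string[] = [];
  for await (const event of runWorkflow(query)) {
    frames.push(toSse(event));
  }
  return frames;
}
```

On the client side, a browser `EventSource` can consume frames in this format and render each update as it arrives.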

Observability and Tracing

Why Observability is Critical

LLMs are non-deterministic. The question isn't whether your agent will go off the rails. It's when and how much.

Tracing

A trace is a tree of spans showing the input and output of every function called during an agent run. Think of it like a flame chart or a nested HTML document.

Standard format: OpenTelemetry (OTel) -- use it for portability across vendors.
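To make the tree-of-spans idea concrete, here is a deliberately simplified tracer. It is not the OpenTelemetry SDK: `TraceSpan`, `Tracer`, and `render` are hypothetical names, real OTel spans also carry trace and span IDs, timestamps, and attributes, and this sketch only handles synchronous functions.

```typescript
// Simplified span: a name, the input/output of the call, and nested child spans.
type TraceSpan = {
  name: string;
  input: unknown;
  output?: unknown;
  durationMs?: number;
  children: TraceSpan[];
};

class Tracer {
  readonly root: TraceSpan = { name: "agent_run", input: null, children: [] };
  private current: TraceSpan = this.root;

  // Run fn inside a new child span of whichever span is currently active.
  trace<T>(name: string, input: unknown, fn: () => T): T {
    const span: TraceSpan = { name, input, children: [] };
    const parent = this.current;
    parent.children.push(span);
    this.current = span;
    const start = Date.now();
    try {
      const output = fn();
      span.output = output;
      return output;
    } finally {
      span.durationMs = Date.now() - start;
      this.current = parent; // restore the parent even if fn throws
    }
  }
}

// Render the tree as indented lines, one per span.
function render(span: TraceSpan, depth = 0): string[] {
  const line = `${"  ".repeat(depth)}${span.name}`;
  return span.children.reduce<string[]>((acc, c) => acc.concat(render(c, depth + 1)), [line]);
}

// Usage: nested calls produce nested spans.
const tracer = new Tracer();
tracer.trace("process_request", { q: "hi" }, () => {
  const parsed = tracer.trace("parse_input", "hi", () => "HI");
  return tracer.trace("api_call", parsed, () => `echo ${parsed}`);
});
```

Here `render(tracer.root)` yields `agent_run`, then `process_request` indented once, then `parse_input` and `api_call` indented twice, which is the same nesting a tracing UI displays.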

What a Tracing UI Shows

Trace view: How long each step took (parse_input, process_request, api_call, etc.)
Input/output inspection: Exact JSON data flowing in and out of each LLM call
Call metadata: Status, start/end times, latency, operation type

Eval Integration

Tracing UIs also show eval results:

Side-by-side comparison of agent response vs. expected
Overall score per PR (to catch regressions)
Score over time, filterable by tags and run date

Best Practices

Emit traces in OpenTelemetry format for vendor portability
Use a hosted tracing tool for production and a local one (like Mastra's dev UI) for development
Look at production traces regularly -- they reveal failure patterns that tests miss

Decision Framework

Situation	Pattern
Multiple independent checks on same input	Branching (fan-out)
Sequential dependent steps	Chaining
Combining parallel results	Merging (fan-in)
Runtime-dependent paths	Conditions
Need human approval mid-workflow	Suspend/resume
Users waiting for multi-step results	Streaming updates
Debugging production agent failures	Tracing + observability

Gotchas

Workflows add complexity. Only use them when agents are too unpredictable without structure.
Design each step's I/O to be meaningful -- you'll be reading it in traces.
One LLM call per step. Multi-call steps are harder to debug and trace.
Suspend/resume requires persistent state storage (database, not memory).
Streaming isn't optional for production agent UX. Users need to see progress.
Use OpenTelemetry for tracing. Proprietary formats lock you into a vendor.

For implementation examples, see references/workflow-examples.md.