plan

Create orchestration-ready YAML task files (0N_TASKS.yaml) for /orchestrate. Decomposes goals into tasks with explicit runner, backend, mode, and lane fields. Supports code-only, design-only, hybrid plans, and explicit opt-in goal-closure execution loops. Use when user says "plan this", "create task file", "break this down into tasks", or "run the plan until the goal is closed".

By grahama1970. Updated 5/9/2026

name: plan
description: >
  Create orchestration-ready YAML task files (0N_TASKS.yaml) for /orchestrate. Decomposes goals into tasks with explicit runner, backend, mode, and lane fields. Supports code-only, design-only, hybrid plans, and explicit opt-in goal-closure execution loops. Use when user says "plan this", "create task file", "break this down into tasks", or "run the plan until the goal is closed".
allowed-tools: Bash, Read, Write, Edit, Glob, Grep, Task, AskUserQuestion
triggers:
  - plan this
  - create task file
  - break this down
  - prepare tasks
  - plan implementation
  - create 0N_TASKS
  - task breakdown
  - decompose this
  - let's plan
  - run plan until done
  - execute plan with goal closure
  - plan and iterate until complete
  - assess whether the plan goal was achieved
metadata:
  short-description: Create orchestration-ready YAML task files
provides:
  - task-planning
composes:
  - governance
  - memory
  - assess
  - task-monitor
  - best-practices-plan
  - review-plan
  - recommend-skill-chain
  - orchestrate
read_before_use:
  - plan.py
  - src/plan_skill/code_runner_contract.py
  - src/plan_skill/dag.py
  - src/plan_skill/goal_closure.py
  - src/plan_skill/mutations.py
  - design_pipeline.py
  - interviews.py
taxonomy:
  - orchestration
  - planning

STOP. READ THIS ENTIRE SKILL.MD BEFORE CALLING ANY ENDPOINT.

/plan

Create YAML task files that /orchestrate executes directly. No markdown intermediate.

Before writing any plan, run /governance and read /best-practices-plan - governance proves task understanding before planning starts, and best-practices-plan has the rules this skill enforces.

Workflow

The full pipeline is /governance -> /plan -> /review-plan -> /orchestrate. When the user says /plan, the agent drives all four stages in order and pauses for human approval before /orchestrate (steps 11-12). The user only needs to say /plan once.

1. /governance        - Prove understanding before planning (BLOCKING)
2. /memory recall     - Check if this problem was already solved
3. SKILL DISCOVERY    - Find existing skills that do what's needed (BLOCKING)
4. /assess            - Read the target codebase
5. Identify persona   - WHO uses this? Name them.
6. Decompose          - Break into tasks with runner/backend/mode
7. Output YAML        - Write 0N_TASKS.yaml
8. plan.py --dag      - Show execution DAG to human for approval
9. plan.py --validate - Schema validation
10. /review-plan      - Full validation (claims, routing, blind tests, overlap)
11. If PASS -> ask human: "Plan ready. Run /orchestrate?"
12. If human approves -> /orchestrate run 0N_TASKS.yaml
13. Optional, only when explicitly requested: run deterministic goal closure.

Optional Goal-Closure Loop

Normal /plan remains plan-first: it writes and validates YAML, then asks before execution. Do not run the closure loop unless the user explicitly asks for "execute until done", "run with goal closure", "iterate until complete", or equivalent wording.

When requested, /plan becomes the deterministic outer loop:

/plan --execute-closure 0N_TASKS.yaml --max-replans N
  -> plan.py --validate
  -> /review-plan review
  -> /orchestrate run
  -> /plan --assess-result using the orchestrate session
  -> stop on goal_achieved, or write follow-up/interview artifacts

Closed outcomes:

goal_achieved
partially_achieved
blocked
wrong_plan
insufficient_evidence

Closed recommended actions:

none
create_followup_plan
revise_existing_plan
ask_human

/plan decides whether to create a follow-up/amended plan or ask the human:

| Closure result | Plan response |
|---|---|
| goal_achieved | Stop successfully |
| partially_achieved with failed tasks | Create a follow-up plan stub for remaining work |
| blocked | Write an /interview request artifact |
| wrong_plan | Stop before execution and write an /interview request artifact |
| insufficient_evidence | Write an /interview request artifact |
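
As a minimal sketch of the decision table above, the mapping can be written as a small pure function. The names `Outcome` and `plan_response` and the returned labels are hypothetical and are not taken from goal_closure.py. The table does not say what happens for partially_achieved without failed tasks, so escalating that case to the human is an assumption of this sketch.

```python
from enum import Enum


class Outcome(str, Enum):
    """The closed set of closure outcomes listed above."""
    GOAL_ACHIEVED = "goal_achieved"
    PARTIALLY_ACHIEVED = "partially_achieved"
    BLOCKED = "blocked"
    WRONG_PLAN = "wrong_plan"
    INSUFFICIENT_EVIDENCE = "insufficient_evidence"


def plan_response(outcome: str, has_failed_tasks: bool = False) -> str:
    """Map a closure outcome to a (hypothetical) response label."""
    result = Outcome(outcome)  # raises ValueError outside the closed set
    if result is Outcome.GOAL_ACHIEVED:
        return "stop"
    if result is Outcome.PARTIALLY_ACHIEVED and has_failed_tasks:
        return "write_followup_plan_stub"
    # blocked, wrong_plan and insufficient_evidence all end in an /interview
    # request. ASSUMPTION: partially_achieved without failed tasks is not
    # covered by the table, so this sketch escalates it to the human as well.
    return "write_interview_request"
```

Keeping the outcome set closed means an unexpected string fails loudly instead of silently triggering a replan.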

/orchestrate remains the execution engine and evidence producer. It does not own replanning. /code-runner remains a bounded worker inside /orchestrate.

Step 2: Runner Selection (which tasks get /code-runner)

Not every task needs /code-runner. Code-runner is EXPENSIVE — worktree isolation, git commit/revert cycle, multi-round LLM loop, T0 scoring, memory learning. Most tasks are simpler than that.

Runner is auto-routed if left empty. Set it explicitly only when the heuristic is wrong.

| Runner | When to use | Auto-routed when | Example |
|---|---|---|---|
| local | Shell command, no LLM needed | Has command, no prompt | pytest, npm install, ruff check --fix |
| scillm | One-shot LLM call, simple edit | Has prompt, no allowlist+DoD assertion | Add a field, rename variable, generate docstring, classify |
| code-runner | Complex bounded code task with verification, iteration, and context isolation | Has prompt + allowlist + DoD command/assertion + blind_tests | Fix multi-file bug, implement feature with test suite |

Default away from code-runner. Use it ONLY when ALL of these are true:

  • The task writes/edits 1-3 specific files (use allowlist)
  • There's a runnable DoD command with a verifiable assertion
  • The DoD is expected to fail before the code change (a gate that already passes belongs on the local runner)
  • The fix may need multiple attempts (not a mechanical edit)
  • The task benefits from context isolation: only the allowlisted files should be writable
  • Dependencies are known and listed in read_context
  • Hidden/blind checks are listed in blind_tests for the orchestrate information barrier
  • The DoD command does NOT require a live server (code-runner uses git worktree isolation — the running dev server serves from the main working directory, not the worktree)

Do NOT use code-runner for:

  • setup/bootstrap/deployment orchestration
  • Docker Compose validation
  • config/docs/changelog edits
  • pure test/lint/build gates
  • already-implemented work where the DoD already passes
  • broad architecture/design decisions

For those, use local for deterministic commands or scillm for one-shot edits.

Concurrency rule: if multiple code-runner tasks share one cwd, set execution.max_concurrency: 1 unless they run in isolated worktrees or are patch-only tasks with disjoint allowlists. If any task uses apply_to_source: true, /orchestrate serializes tasks sharing that cwd.

NEVER use code-runner when the DoD calls a live HTTP endpoint (e.g. curl http://localhost:3001/...). Code-runner edits files in an isolated worktree. The dev server doesn't see those edits. The DoD curl will always hit the OLD code and fail. Use scillm (one-shot edit to the working directory) + a separate local task to restart the server and verify with curl.
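
The live-endpoint rule above lends itself to a mechanical check. The sketch below is illustrative only: `needs_live_server` and `check_runner` are hypothetical helpers, and treating any loopback URL in the DoD command as "needs a live server" is an assumption, not the detection logic used by plan.py or /review-plan.

```python
import re

# ASSUMPTION: a DoD "needs a live server" when its command mentions a
# loopback URL such as http://localhost:3001/...
LIVE_ENDPOINT = re.compile(r"https?://(localhost|127\.0\.0\.1)(:\d+)?", re.IGNORECASE)


def needs_live_server(dod_command: str) -> bool:
    """True if the DoD command appears to call a locally running server."""
    return bool(LIVE_ENDPOINT.search(dod_command))


def check_runner(task: dict) -> list[str]:
    """Return routing problems; an empty list means no conflict was found."""
    dod_command = (task.get("definition_of_done") or {}).get("command", "")
    if task.get("runner") == "code-runner" and needs_live_server(dod_command):
        return [
            f"task {task.get('id')}: DoD hits a live server; "
            "use scillm plus a separate local verification task"
        ]
    return []
```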

Use scillm (not code-runner) for:

  • Mechanical edits: add a field, update an import, rename a variable
  • Config changes: update YAML, add an entry to a list
  • Text generation: docstrings, summaries, classifications
  • Any task where "just do it once, correctly" is sufficient
  • Server endpoint code where the DoD requires curl to a live server — edits must land in the working directory, not a worktree

Use local for:

  • Running tests, linters, formatters, build commands
  • File operations: copy, move, create directories
  • Anything that's a shell command, not LLM reasoning

code-runner fields the project agent must provide:

  • allowlist — files the LLM can write (scope boundary)
  • read_context — files the LLM should read for interface context (NOT write)
  • definition_of_done — runnable verification command with assertions
  • blind_tests — hidden assertions in /test-lab (required for code-runner, enforced by /review-plan)
  • apply_to_source — explicit complete-task opt-in; default false
  • commit_on_success — required true when apply_to_source is true
  • rollback_on_failure — required true when apply_to_source is true

If any of these are missing, the plan is wrong. Route the task to local/scillm, or split the plan into: one implementation task and one separate local verification task.

For a source-mutating code task, prefer complete-task mode:

apply_to_source: true
commit_on_success: true
rollback_on_failure: true

The plan must still provide blind_tests. /orchestrate runs those tests after the source commit and reverts that commit if hidden checks fail.
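
The field rules for complete-task mode can be checked before a plan reaches /review-plan. This is a sketch under stated assumptions: `check_complete_task_mode` is a hypothetical helper, and the real contract lives in src/plan_skill/code_runner_contract.py, which may enforce more than this.

```python
def check_complete_task_mode(task: dict) -> list[str]:
    """Return contract violations for one task dict (empty list means OK)."""
    errors = []
    # apply_to_source defaults to false; both companion flags are required
    # to be true once it is switched on.
    if task.get("apply_to_source", False):
        for flag in ("commit_on_success", "rollback_on_failure"):
            if task.get(flag) is not True:
                errors.append(f"{flag} must be true when apply_to_source is true")
    # blind_tests stay mandatory for code-runner in either mode.
    if task.get("runner") == "code-runner" and not task.get("blind_tests"):
        errors.append("blind_tests is required for code-runner tasks")
    return errors
```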

Step 2b: Code-runner course corrections

If /orchestrate dispatches a task to /code-runner and preflight returns status: preflight_fail with course_correction, /plan owns the amendment. Do not ask code-runner to continue and do not let it silently mutate YAML.

Use the correction this way:

  1. Read {task_id}.course_correction.json or result.json.course_correction.
  2. Apply the recommended reroute/amend/split to the plan YAML.
  3. Re-run plan.py --validate.
  4. Re-run /review-plan.
  5. Resume /orchestrate only after validation passes.

Valid outcomes:

| recommended_action | Plan response |
|---|---|
| use_local | Change task to deterministic local command |
| use_scillm | Change task to one-shot scillm prompt |
| amend_code_runner_contract | Keep code-runner, but add missing prompt/allowlist/DoD/blind tests |
| split_task | Split implementation from live verification |
| reroute_or_amend | Choose the smallest safe route based on the correction rationale |
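
A small lookup makes the closed set of actions explicit. The sketch below assumes a JSON layout that the text does not spell out: that `{task_id}.course_correction.json` holds `recommended_action` at the top level and that `result.json` nests the same object under `course_correction`. The function name `plan_response_for` is hypothetical.

```python
import json

# Responses copied from the table above.
PLAN_RESPONSES = {
    "use_local": "Change task to deterministic local command",
    "use_scillm": "Change task to one-shot scillm prompt",
    "amend_code_runner_contract": "Keep code-runner, but add missing prompt/allowlist/DoD/blind tests",
    "split_task": "Split implementation from live verification",
    "reroute_or_amend": "Choose the smallest safe route based on the correction rationale",
}


def plan_response_for(raw_json: str) -> str:
    """Return the plan response for a course-correction JSON document."""
    doc = json.loads(raw_json)
    # ASSUMPTION: result.json wraps the correction in "course_correction";
    # the standalone file is the correction object itself.
    correction = doc.get("course_correction", doc)
    action = correction["recommended_action"]
    if action not in PLAN_RESPONSES:
        raise ValueError(f"unknown recommended_action: {action}")
    return PLAN_RESPONSES[action]
```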

Step 3: Skill Discovery (BLOCKING — do NOT skip)

Before writing ANY task, check if an existing skill already does it. The agent MUST NOT write bespoke code when a skill exists. This is the #1 source of architectural debt.

How to check:

# Search the manifest (fastest — one file, all 225+ skills)
cat ~/.pi/skills-manifest.json | python3 -c "
import json, sys
data = json.load(sys.stdin)
for s in data['skills']:
    d = (s.get('description') or '').lower()
    if any(kw in d for kw in ['cache', 'redis', 'api']):
        print(f'  /{s[\"name\"]}: {s[\"description\"][:100]}')
"

# Or search skill names directly
ls ~/.pi/skills/ | grep -i cache

# Or ask memory
/memory recall "skill:caching" OR "skill:redis"

Decision for each piece of work:

| Existing skill covers it? | Action |
|---|---|
| Yes, fully | CALL the skill. Do NOT rewrite it. |
| Yes, 60%+ | EXTEND the skill. Add what's missing. |
| No match | CREATE new code, but document WHY in capability_overlap. |

Every task in the YAML should map to CALL, EXTEND, or CREATE. If the plan is mostly CREATE, you haven't looked hard enough. /review-plan will flag tasks that overlap with existing skills as FAIL.
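
The CALL/EXTEND/CREATE decision can be sketched as a threshold function. Everything here is illustrative: `decide` and `mostly_create` are hypothetical names, the coverage fraction is a human or agent estimate rather than something the manifest provides, and treating coverage below 60% as CREATE is an assumption because the table only names "60%+" and "No match".

```python
def decide(coverage: float) -> str:
    """Map an estimated coverage fraction (0.0 to 1.0) to CALL, EXTEND or CREATE."""
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be between 0.0 and 1.0")
    if coverage >= 1.0:
        return "CALL"
    if coverage >= 0.6:
        return "EXTEND"
    # ASSUMPTION: anything under 60% is treated like "No match".
    return "CREATE"


def mostly_create(decisions: list[str]) -> bool:
    """Flag a plan where more than half of the tasks are CREATE."""
    return bool(decisions) and decisions.count("CREATE") * 2 > len(decisions)
```

A plan that trips `mostly_create` is a prompt to search the manifest again before writing new code.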

Step 4: Compliance Governance (BLOCKING for SPARTA/CAE plans)

Plans that involve CAE (Claims-Arguments-Evidence) trees, SPARTA controls, compliance verdicts, or posture assessments MUST enforce "analyst workbench, not truth engine":

Non-negotiable rules:

| Requirement | Implementation |
|---|---|
| NEEDS_VERIFICATION default | All CAE claims/verdicts default to NEEDS_VERIFICATION, never auto-PASS |
| Human review gate | Every posture change requires compliance officer review before status change |
| No autonomous certification | Agent NEVER writes "certified", "compliant", "approved" without human gate |
| Retrieval language only | Use "retrieved", "found", "extracted"; never "determined", "verified", "confirmed" |

In YAML plans, add to metadata:

metadata:
  compliance_governance:
    principle: "Analyst workbench, not truth engine"
    verdicts: "All default to NEEDS_VERIFICATION"
    human_gate: "Compliance officer reviews before status change"

In task prompts, include:

COMPLIANCE GOVERNANCE: This is an analyst workbench, not a truth engine.
- All verdicts default to NEEDS_VERIFICATION
- Human reviews before any status change
- Use retrieval language ("retrieved", "found"), not judgment language ("verified", "confirmed")

/review-plan will FAIL plans that:

  • Auto-approve verdicts without human gates
  • Use "certified", "compliant", "approved" without a human review step
  • Omit compliance_governance metadata for SPARTA/CAE plans
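
A rough pre-check for the failure conditions above might look like the following. It is only a sketch: `compliance_findings` is a hypothetical helper, the per-task `human_review` flag is an invented field used to stand in for "has a human gate", and plain word matching is crude (it would also flag "non-compliant"). The checks in /review-plan may work quite differently.

```python
import re

CERTIFICATION_WORDS = ("certified", "compliant", "approved")
_PATTERN = re.compile(r"\b(" + "|".join(CERTIFICATION_WORDS) + r")\b", re.IGNORECASE)


def compliance_findings(plan: dict) -> list[str]:
    """Return human-readable findings for a SPARTA/CAE plan dict."""
    findings = []
    metadata = plan.get("metadata") or {}
    if "compliance_governance" not in metadata:
        findings.append("missing metadata.compliance_governance")
    for task in plan.get("tasks") or []:
        prompt = task.get("prompt") or ""
        hits = sorted({m.group(1).lower() for m in _PATTERN.finditer(prompt)})
        # ASSUMPTION: "human_review" is a hypothetical per-task flag.
        if hits and not task.get("human_review"):
            findings.append(
                f"task {task.get('id')}: uses {', '.join(hits)} without a human review step"
            )
    return findings
```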

Reference: docs/WALKTHROUGH_QRA_COVERAGE_AND_EVIDENCE_CASES.md

Steps 8-11 happen automatically once the YAML is written. The human only intervenes if /review-plan FAILs (fix the plan) or if they want to amend the DAG before execution.

If /review-plan FAILs

Triage each failure:

| Failure type | Action |
|---|---|
| Missing field (mode, DoD command) | Fix it yourself; no human input needed |
| Wrong runner/backend | Fix if obvious, ask human if ambiguous |
| Missing capability_overlap | Run /memory recall and fill it in |
| Skill overlap detected | Ask human: use existing skill or justify new code? |
| Ambiguous requirements | Use /interview for structured choices |
| Claims don't match codebase | Re-read the code with /assess |

After fixes, re-run /review-plan. Do NOT proceed to /orchestrate until PASS. Do NOT silently skip FAILs — every FAIL must be resolved or explicitly waived by the human.

Plan Types

Auto-detected from goal text. All types use the same YAML schema.

| Type | Detected When | Pattern |
|---|---|---|
| code | No UI keywords | local for setup/verification, scillm for one-shot edits, code-runner only for bounded iterative implementation |
| design | views, components, TSX, dashboard, UI, React | /mockup-lab (Stitch) → /ux-lab (code) → /mockup-lab review (VLM verify) → /test-interactions |
| hybrid | Both UI and code keywords | Stitch pipeline for UI views, code tasks in later waves |

Design Plans

Rule: The agent NEVER designs UI. Stitch designs it. The agent codes it.

Design plans MUST specify device type (desktop, mobile, tablet). Pass --device to every /mockup-lab command. Stitch defaults to mobile if not specified.

For any plan with UI work, each component follows 3 steps:

  1. /mockup-lab — Stitch generates design, human approves (--device desktop)
  2. /ux-lab — Agent codes React component from approved screenshots
  3. /mockup-lab review — Gemini VLM verifies implementation matches design

For small changes (colors, spacing, adding a column), skip step 1 and use /ux-lab + /review-design directly.

Read .pi/skills/mockup-lab/design-to-code.yaml for the detailed checklist.

DoD for UI tasks must be visual — "tsc compiles" is not done. Done means a screenshot shows real data matching the approved design.

YAML Schema

version: 1
kind: orchestrate-plan

metadata:
  title: "Feature Name"
  goal: "one-line summary"
  plan_type: code          # code, design, or hybrid
  created: "2026-03-17"
  primary_persona:         # WHO uses this (required)
    name: "Nico Bailon"
    role: "QA Engineer"
    source: ".pi/agents/nico-bailon/AGENTS.md"

execution:
  max_concurrency: 3       # parallel lanes
  wave_barrier: false      # if true, wave N must complete before wave N+1 starts
  scheduling_policy: dependency_only  # or wave_then_dependency

capability_overlap:        # Phase 0 evidence (required)
  - "/memory recall returned: no prior Redis caching solution"
  - "Checked /fetcher, /dogpile — no overlap"

questions_blockers:
  - "None"

lanes:
  - id: "0"
    label: "Wave 0: Setup"
  - id: "1"
    label: "Wave 1: Implementation"
  - id: "2"
    label: "Wave 2: Validation"

tasks:
  - id: "1"
    title: "Add Redis to docker-compose"
    lane: "0"
    runner: "local"          # deterministic shell command
    backend: ""              # no LLM needed
    mode: ""
    depends_on: []
    command: "docker compose up -d redis && redis-cli ping"
    definition_of_done:
      command: "redis-cli ping"
      assertion: "Returns PONG"

  - id: "2"
    title: "Create cache utility module"
    lane: "1"
    runner: "code-runner"       # self-improvement loop
    backend: "codex"            # which LLM
    mode: "iterative"           # iterative/one_shot/review
    depends_on: ["1"]
    preconditions:              # checked before task dispatch
      - type: command_succeeds  # Redis does not speak HTTP, so probe it with redis-cli
        command: "redis-cli ping"
      - type: path_exists
        path: "src/"
    read_context:               # structured: path + optional line range
      - path: "src/config.py"
        start_line: 1
        end_line: 50
      - "src/types.py"          # simple string also works
    implementation:
      - "Create src/cache.py with get/set/invalidate"
      - "TTL-based expiration using Redis SETEX"
    tests:
      - "test-lab/run.sh verify-task 2 src/ --domain python"
      - "tests/test_cache.py::test_set_get_ttl"
    definition_of_done:
      command: "uv run pytest tests/test_cache.py -q"
      assertion: "Value expires after TTL"

Task Fields Reference

Runner (how the task executes)

Leave runner empty: plan.py auto-routes based on task shape. Set it explicitly only to override.

| Runner | Auto-routed when | Required Fields |
|---|---|---|
| local | Has command, no prompt | command |
| skill | Has skill field set | skill (+ optional skill_command, skill_args) |
| scillm | Has prompt, no allowlist + DoD assertion | prompt (backend/mode auto-filled) |
| code-runner | Has prompt + allowlist + DoD command/assertion + blind_tests and genuinely needs iteration/isolation | prompt, allowlist, definition_of_done.command, definition_of_done.assertion, blind_tests, read_context |
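
The auto-routing rules in the table above can be approximated by a few shape checks. This is a sketch, not the heuristic in plan.py: `route_runner` is a hypothetical name, the order in which the checks run is an assumption, and the "genuinely needs iteration/isolation" judgment cannot be derived from fields alone.

```python
def route_runner(task: dict) -> str:
    """Pick a runner from the task's shape, following the table above."""
    if task.get("runner"):  # an explicit runner always overrides routing
        return task["runner"]
    if task.get("skill"):
        return "skill"
    dod = task.get("definition_of_done") or {}
    if task.get("prompt"):
        # code-runner needs the full contract; anything less is one-shot scillm.
        full_contract = all([
            task.get("allowlist"),
            dod.get("command"),
            dod.get("assertion"),
            task.get("blind_tests"),
        ])
        return "code-runner" if full_contract else "scillm"
    if task.get("command"):
        return "local"
    raise ValueError("task has no skill, prompt, or command to route on")
```

Defaulting to scillm whenever any part of the contract is missing matches the "default away from code-runner" guidance.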

Backend (which LLM model)

| Backend | scillm Model | Best For | Cost |
|---|---|---|---|
| text-claude | Claude Sonnet 4.6 (OAuth) | Boilerplate, scaffolding, monitoring | Low |
| text-claude-opus | Claude Opus 4.5 (OAuth) | Architecture, novel design, cross-skill composition | High |
| gpt-5.3-codex | OpenAI Codex (OAuth) | Code review, deep analysis, refactoring | Medium |
| text-gemini-oauth | Gemini via CLI (OAuth) | Long content, large context, visual tasks | Medium |
| text | Chutes DeepSeek (PAYG) | Batch work: high concurrency, no OAuth limits | Low |

Decision heuristic: "call existing script and check output" → text-claude. "understand 3 systems and wire them together" → text-claude-opus. "review this code" → gpt-5.3-codex. "batch 100 items" → text.

Note: OAuth models (text-claude*, gpt-5.3-codex) are NOT for batch — use text (Chutes) for high-concurrency batch work.

Mode (execution style)

| Mode | Use For |
|---|---|
| iterative | Multi-turn agent work (coding, design iteration) |
| one_shot | Single LLM inference (classification, extraction) |
| review | Review/assessment tasks |

Execution (plan-level scheduling)

| Field | Type | Default | Description |
|---|---|---|---|
| max_concurrency | int | 3 | Max parallel tasks across all lanes |
| wave_barrier | bool | false | If true, all wave N tasks must complete before wave N+1 starts |
| scheduling_policy | string | dependency_only | dependency_only: run when deps satisfied. wave_then_dependency: enforce wave barrier first |

Wave numbers are derived from lane IDs: lane "0", "0.1", "0.2" = wave 0; lane "1", "1.1" = wave 1.
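
The lane-to-wave rule is simple enough to show directly. A minimal sketch, assuming lane IDs are always strings of the form "N" or "N.M"; the helper names `wave_of` and `group_by_wave` are hypothetical and not taken from dag.py.

```python
def wave_of(lane_id: str) -> int:
    """Return the wave number: the integer part before the first dot."""
    return int(lane_id.split(".", 1)[0])


def group_by_wave(lane_ids: list[str]) -> dict[int, list[str]]:
    """Group lane IDs by wave, with waves in ascending order."""
    waves: dict[int, list[str]] = {}
    for lane in lane_ids:
        waves.setdefault(wave_of(lane), []).append(lane)
    return dict(sorted(waves.items()))
```

With `wave_barrier: true`, every lane grouped under wave 0 would have to finish before any lane under wave 1 starts.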

Preconditions (per-task)

Checked before task dispatch. Task fails immediately if any precondition fails.

| Type | Required Field | Description |
|---|---|---|
| path_exists | path | File or directory must exist |
| service_reachable | url | HTTP GET must return 2xx |
| env_var_set | var | Environment variable must be non-empty |
| command_succeeds | command | Shell command must exit 0 |

preconditions:
  - type: service_reachable          # HTTP endpoints only; probe Redis with command_succeeds instead
    url: "http://localhost:3001"     # illustrative port
  - type: path_exists
    path: "src/config.py"
  - type: env_var_set
    var: "DATABASE_URL"
  - type: command_succeeds
    command: "docker ps | grep -q postgres"
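
To make the four precondition types concrete, here is a minimal sketch of how each could be evaluated with the Python standard library. The function names and the 5-second timeout are assumptions; the actual checks run inside /orchestrate and may differ (for example in how redirects or timeouts are handled).

```python
import os
import subprocess
import urllib.error
import urllib.request


def check_precondition(pre: dict, timeout: float = 5.0) -> bool:
    """Evaluate one precondition dict; True means the task may dispatch."""
    kind = pre["type"]
    if kind == "path_exists":
        return os.path.exists(pre["path"])
    if kind == "env_var_set":
        return bool(os.environ.get(pre["var"]))  # unset or empty both fail
    if kind == "command_succeeds":
        done = subprocess.run(pre["command"], shell=True, capture_output=True)
        return done.returncode == 0
    if kind == "service_reachable":
        try:
            with urllib.request.urlopen(pre["url"], timeout=timeout) as resp:
                return 200 <= resp.status < 300
        except (urllib.error.URLError, OSError, ValueError):
            return False  # refused, timed out, non-2xx, or malformed URL
    raise ValueError(f"unknown precondition type: {kind}")


def failed_preconditions(task: dict) -> list[dict]:
    """Return the preconditions that block this task (empty list means go)."""
    return [p for p in task.get("preconditions", []) if not check_precondition(p)]
```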

Structured read_context

Files the LLM reads for context but does NOT write. Supports line ranges to reduce prompt size.

read_context:
  - "src/types.py"                    # full file (string)
  - path: "src/config.py"             # structured: full file
  - path: "src/large_module.py"       # structured: lines 100-200 only
    start_line: 100
    end_line: 200
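
Both entry forms can be normalized to one shape before the files are read. The sketch below assumes that `start_line` and `end_line` are 1-indexed and inclusive, which the text implies ("lines 100-200 only") but does not state; `normalize_entry` and `read_entry` are hypothetical helpers.

```python
from pathlib import Path


def normalize_entry(entry) -> dict:
    """Turn a bare string entry into the structured {'path': ...} form."""
    if isinstance(entry, str):
        return {"path": entry}
    return dict(entry)


def read_entry(entry, root: str = ".") -> str:
    """Return the text selected by one read_context entry."""
    spec = normalize_entry(entry)
    lines = (Path(root) / spec["path"]).read_text().splitlines()
    # ASSUMPTION: line numbers are 1-indexed and the range is inclusive.
    start = spec.get("start_line", 1)
    end = spec.get("end_line", len(lines))
    return "\n".join(lines[start - 1:end])
```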

Usage

# Emit YAML template
plan.py

# Start planning guidance for a new goal; requires matching /governance PASS
plan.py "Add Redis caching to API"

# Validate existing plan
plan.py --validate 01_TASKS.yaml

# Visualize execution DAG (waves, parallelism, routing)
plan.py --dag 01_TASKS.yaml

# Output DAG as Mermaid flowchart (for docs/PRs)
plan.py --mermaid 01_TASKS.yaml

# Convert legacy markdown to YAML
plan.py --convert 01_TASKS.md -o 01_TASKS.yaml

# Render YAML as markdown (for human review)
plan.py --render 01_TASKS.yaml

# Add a task to an existing plan (auto-assigns ID, wires deps)
plan.py --add-task 01_TASKS.yaml "title=Run integration tests|runner=local|lane=2|depends_on=2|command=pytest tests/"

# Remove a task (cleans up dangling deps)
plan.py --remove-task 01_TASKS.yaml:3

# Assess whether a completed /orchestrate session achieved the plan goal
plan.py --assess-result 01_TASKS.yaml --session /path/to/orchestrate/session --json

# Explicit opt-in execution loop: validate, review, orchestrate, assess, optionally replan
plan.py --execute-closure 01_TASKS.yaml --max-replans 2 --json
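
The `--add-task` argument shown above is a pipe-delimited list of key=value pairs. As a rough illustration of that format only, the following sketch parses such a string into a task dict. `parse_task_spec` is hypothetical and is not plan.py's parser; splitting `depends_on` on commas is an assumption, and a command that itself contains `|` cannot be expressed in this naive version.

```python
def parse_task_spec(spec: str) -> dict:
    """Parse 'key=value|key=value' into a task dict."""
    task: dict = {}
    for part in spec.split("|"):
        key, sep, value = part.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected key=value, got: {part!r}")
        task[key.strip()] = value.strip()
    # ASSUMPTION: multiple dependencies are comma-separated.
    if "depends_on" in task:
        task["depends_on"] = [d.strip() for d in task["depends_on"].split(",") if d.strip()]
    return task
```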

Use skills/plan/run.sh as the supported entrypoint. It pins the uv environment to /mnt/storage12tb/skills/plan/.venv and refuses repo-local PLAN_UV_ENV values.

Direct goal planning is gated in plan.py, not only in run.sh, but direct uv run --project skills/plan ... is unsupported unless UV_PROJECT_ENVIRONMENT is exported before uv starts. plan.py cannot move a repo-local .venv after uv has already created it.

Use PLAN_SKIP_GOVERNANCE=1 or --skip-governance only for developer maintenance and tests.

Pipeline Position

/governance -> /plan -> /review-plan -> /orchestrate

For design plans, the execution pipeline inside /orchestrate is:

/mockup-lab generate → /interview (human review) → /mockup-lab iterate
      ↓ approved design
/ux-lab (code React component)
      ↓ built component
/mockup-lab review (Gemini VLM visual diff)
      ↓ match_score < 90 → fix code → re-review
      ↓ match_score >= 90 → done
/test-interactions (verify interactions)
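
The review loop in the diagram above (re-review until match_score reaches 90) can be sketched with two callbacks. The threshold comes from the diagram; the `max_rounds` cap and the names `review_until_match`, `review`, and `fix` are assumptions added so the loop is guaranteed to terminate.

```python
from typing import Callable

PASS_SCORE = 90  # threshold shown in the diagram


def review_until_match(
    review: Callable[[], float],
    fix: Callable[[float], None],
    max_rounds: int = 3,  # ASSUMPTION: the diagram shows no explicit cap
) -> bool:
    """Run review, fix, re-review until the score passes or rounds run out."""
    for attempt in range(max_rounds):
        score = review()
        if score >= PASS_SCORE:
            return True
        if attempt < max_rounds - 1:
            fix(score)  # only fix when another review will follow
    return False
```

Returning False instead of looping forever leaves the decision to escalate with the caller.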

| Skill | Role |
|---|---|
| /governance | Deterministic pre-plan understanding gate |
| /best-practices-plan | Rules this skill MUST follow (read first) |
| /review-plan | Validates the YAML before /orchestrate runs it |
| /orchestrate | Executes the YAML with per-task dispatch |
| /test-lab | Generates blind adversarial tests (required per task) |
| /memory | Phase 0: check for prior solutions |
| /assess | Phase 0: read the target codebase |
| /interview | Gather requirements when goal is ambiguous |
| /dogpile | Research unfamiliar dependencies |
| /mockup-lab | Design generation (Stitch) + VLM review (scillm) for UI plans |
| /ux-lab | React component development with Vite HMR |

Common Mistakes

WRONG: Writing bespoke code when a skill already exists

tasks:
  - id: "1"
    title: "Implement PDF extraction"
    runner: "code-runner"
    # A /extractor skill already does this!

RIGHT: Check skill manifest first, call existing skills

cat ~/.pi/skills-manifest.json | python3 -c "..."  # search for existing skills
# Then in YAML: runner: local, command: ".pi/skills/extractor/run.sh ..."

WRONG: Skipping /review-plan and going straight to /orchestrate

/orchestrate run 01_TASKS.yaml  # untested plan, may have skill overlap or wrong routing

RIGHT: Always validate plan before execution

plan.py --validate 01_TASKS.yaml
/review-plan 01_TASKS.yaml  # must PASS before /orchestrate

WRONG: Design tasks without specifying --device for Stitch

tasks:
  - title: "Generate mockup"
    command: "mockup-lab generate"  # defaults to mobile!

RIGHT: Always specify device type for design tasks

tasks:
  - title: "Generate mockup"
    command: "mockup-lab generate --device desktop"

WRONG: Inline Python that reads .env without load_dotenv

tasks:
  - id: "1"
    title: "Query ArangoDB"
    runner: "local"
    command: |
      python3 -c "
      import os
      from arango import ArangoClient
      # Fails: ARANGO_PASS not loaded from .env
      client = ArangoClient().db('memory', password=os.environ['ARANGO_PASS'])
      "

RIGHT: Load .env before accessing environment variables

tasks:
  - id: "1"
    title: "Query ArangoDB"
    runner: "local"
    command: |
      python3 -c "
      from dotenv import load_dotenv, find_dotenv
      load_dotenv(find_dotenv())  # Load .env FIRST
      import os
      from arango import ArangoClient
      client = ArangoClient().db('memory', password=os.environ['ARANGO_PASS'])
      "
Install via CLI
npx skills add https://github.com/grahama1970/agent-skills --skill plan