plan - SKILL.md Agent Skill

name: plan
description: >
  Create orchestration-ready YAML task files (0N_TASKS.yaml) for /orchestrate.
  Decomposes goals into tasks with explicit runner, backend, mode, and lane fields.
  Supports code-only, design-only, hybrid plans, and explicit opt-in goal-closure
  execution loops. Use when user says "plan this", "create task file", "break this
  down into tasks", or "run the plan until the goal is closed".
allowed-tools: Bash, Read, Write, Edit, Glob, Grep, Task, AskUserQuestion
triggers:
  - plan this
  - create task file
  - break this down
  - prepare tasks
  - plan implementation
  - create 0N_TASKS
  - task breakdown
  - decompose this
  - let's plan
  - run plan until done
  - execute plan with goal closure
  - plan and iterate until complete
  - assess whether the plan goal was achieved
metadata:
  short-description: Create orchestration-ready YAML task files
  provides:
    - task-planning
  composes:
    - governance
    - memory
    - assess
    - task-monitor
    - best-practices-plan
    - review-plan
    - recommend-skill-chain
    - orchestrate
  read_before_use:
    - plan.py
    - src/plan_skill/code_runner_contract.py
    - src/plan_skill/dag.py
    - src/plan_skill/goal_closure.py
    - src/plan_skill/mutations.py
    - design_pipeline.py
    - interviews.py
  taxonomy:
    - orchestration
    - planning

STOP. READ THIS ENTIRE SKILL.MD BEFORE CALLING ANY ENDPOINT.

/plan

Create YAML task files that /orchestrate executes directly. No markdown intermediate.

Before writing any plan, run /governance and read /best-practices-plan. /governance proves task understanding before planning starts; /best-practices-plan defines the rules this skill enforces.

Workflow

The full pipeline is: /governance -> /plan -> /review-plan -> /orchestrate. When the user says /plan, the agent drives all four stages; the user only needs to say /plan once. The agent still stops for human approval before /orchestrate (step 11).

1. /governance        - Prove understanding before planning (BLOCKING)
2. /memory recall     - Check if this problem was already solved
3. SKILL DISCOVERY    - Find existing skills that do what's needed (BLOCKING)
4. /assess            - Read the target codebase
5. Identify persona   - WHO uses this? Name them.
6. Decompose          - Break into tasks with runner/backend/mode
7. Output YAML        - Write 0N_TASKS.yaml
8. plan.py --dag      - Show execution DAG to human for approval
9. plan.py --validate - Schema validation
10. /review-plan      - Full validation (claims, routing, blind tests, overlap)
11. If PASS -> ask human: "Plan ready. Run /orchestrate?"
12. If human approves -> /orchestrate run 0N_TASKS.yaml
13. Optional, only when explicitly requested: run deterministic goal closure.

Optional Goal-Closure Loop

Normal /plan remains plan-first: it writes and validates YAML, then asks before execution. Do not run the closure loop unless the user explicitly asks for "execute until done", "run with goal closure", "iterate until complete", or equivalent wording.

When requested, /plan becomes the deterministic outer loop:

/plan --execute-closure 0N_TASKS.yaml --max-replans N
  -> plan.py --validate
  -> /review-plan review
  -> /orchestrate run
  -> /plan --assess-result using the orchestrate session
  -> stop on goal_achieved, or write follow-up/interview artifacts
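The loop above can be sketched as a bounded driver. This is a minimal sketch, not plan.py's implementation: each stage is injected as a plain callable standing in for plan.py --validate, /review-plan, /orchestrate, and /plan --assess-result. The parameter names, the `invalid_plan` status, and the rule that only `partially_achieved` triggers another pass are all assumptions.

```python
from typing import Callable


def execute_closure(
    plan_path: str,
    max_replans: int,
    validate: Callable[[str], bool],      # stands in for plan.py --validate
    review: Callable[[str], bool],        # stands in for /review-plan
    orchestrate: Callable[[str], str],    # stands in for /orchestrate run; returns a session id
    assess: Callable[[str, str], str],    # stands in for /plan --assess-result; returns an outcome
    make_followup: Callable[[str], str],  # hypothetical: writes a follow-up plan, returns its path
) -> str:
    """Validate -> review -> orchestrate -> assess, replanning at most max_replans times."""
    if max_replans < 0:
        raise ValueError("max_replans must be >= 0")
    for attempt in range(max_replans + 1):
        # Never execute a plan that has not passed validation and review.
        if not validate(plan_path) or not review(plan_path):
            return "invalid_plan"  # hypothetical status, not one of the closed outcomes
        session = orchestrate(plan_path)
        outcome = assess(plan_path, session)
        # Assumption: only partial success with replan budget left loops again;
        # every other outcome stops and hands control back to /plan or the human.
        if outcome != "partially_achieved" or attempt == max_replans:
            return outcome
        plan_path = make_followup(plan_path)
    raise AssertionError("unreachable: the last attempt always returns")
```

Passing the stages in as callables keeps /orchestrate as the execution engine. The loop itself only sequences stages and counts replans.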

Closed outcomes:

goal_achieved
partially_achieved
blocked
wrong_plan
insufficient_evidence

Closed recommended actions:

none
create_followup_plan
revise_existing_plan
ask_human

/plan decides whether to create a follow-up/amended plan or ask the human:

Closure result	Plan response
`goal_achieved`	Stop successfully
`partially_achieved` with failed tasks	Create a follow-up plan stub for remaining work
`blocked`	Write an `/interview` request artifact
`wrong_plan`	Stop before any further execution and write an `/interview` request artifact
`insufficient_evidence`	Write an `/interview` request artifact
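The table reads as a small total function over the closed outcome set. The sketch below encodes it. The pairing of each row with one of the closed action names is an assumption, because the table names artifacts, not actions. So is routing `partially_achieved` with no failed tasks to a human.

```python
# Closed vocabularies copied from the outcome and action lists above.
OUTCOMES = {"goal_achieved", "partially_achieved", "blocked", "wrong_plan", "insufficient_evidence"}
ACTIONS = {"none", "create_followup_plan", "revise_existing_plan", "ask_human"}


def plan_response(outcome: str, failed_tasks: list[str]) -> tuple[str, str]:
    """Return (recommended_action, artifact) for a closure result."""
    if outcome not in OUTCOMES:
        raise ValueError(f"unknown closure outcome: {outcome!r}")
    if outcome == "goal_achieved":
        return "none", "stop"
    if outcome == "partially_achieved" and failed_tasks:
        return "create_followup_plan", "followup_plan_stub"
    # blocked, wrong_plan, insufficient_evidence and (assumption) a partial result
    # with no failed tasks all go to a human through an /interview request.
    return "ask_human", "interview_request"
```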

/orchestrate remains the execution engine and evidence producer. It does not own replanning. /code-runner remains a bounded worker inside /orchestrate.

Step 2: Runner Selection (which tasks get /code-runner)

Not every task needs /code-runner. Code-runner is EXPENSIVE — worktree isolation, git commit/revert cycle, multi-round LLM loop, T0 scoring, memory learning. Most tasks are simpler than that.

Runner is auto-routed if left empty. Set it explicitly only when the heuristic is wrong.

Runner	When to use	Auto-routed when	Example
`local`	Shell command, no LLM needed	Has `command`, no `prompt`	`pytest`, `npm install`, `ruff check --fix`
`scillm`	One-shot LLM call, simple edit	Has `prompt`, no `allowlist`+DoD assertion	Add a field, rename variable, generate docstring, classify
`code-runner`	Complex bounded code task with verification, iteration, and context isolation	Has `prompt` + `allowlist` + DoD command/assertion + `blind_tests`	Fix multi-file bug, implement feature with test suite

Default away from code-runner. Use it ONLY when ALL of these are true:

The task writes/edits 1-3 specific files (use allowlist)
There's a runnable DoD command with a verifiable assertion
The DoD is expected to fail before the code change (a gate that already passes belongs in a local task)
The fix may need multiple attempts (not a mechanical edit)
The task benefits from context isolation: only the allowlisted files should be writable
Dependencies are known and listed in read_context
Hidden (blind) checks are listed in blind_tests so /orchestrate can keep them behind its information barrier
The DoD command does NOT require a live server (code-runner uses git worktree isolation — the running dev server serves from the main working directory, not the worktree)
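Some of the criteria above can be checked mechanically. Two cannot be checked statically: "fails before the change" and "needs multiple attempts". The sketch below covers only the mechanical rules. Its regex treats any http(s) URL on localhost or 127.0.0.1 in the DoD command as a live-server dependency, which is a heuristic assumption, not /review-plan's actual rule.

```python
import re

# Assumption: an http(s) URL on localhost or 127.0.0.1 in the DoD means "live server".
LIVE_URL = re.compile(r"https?://(localhost|127\.0\.0\.1)\b", re.IGNORECASE)


def code_runner_problems(task: dict) -> list[str]:
    """Reasons a task should not go to code-runner; empty means the checkable rules pass."""
    problems = []
    if not 1 <= len(task.get("allowlist") or []) <= 3:
        problems.append("allowlist must name 1-3 files")
    dod = task.get("definition_of_done") or {}
    command = dod.get("command") or ""
    if not command or not dod.get("assertion"):
        problems.append("definition_of_done needs a command and an assertion")
    if not task.get("read_context"):
        problems.append("read_context is empty")
    if not task.get("blind_tests"):
        problems.append("blind_tests is empty")
    if LIVE_URL.search(command):
        problems.append("DoD calls a live server that cannot see worktree edits")
    return problems
```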

Do NOT use code-runner for:

setup/bootstrap/deployment orchestration
Docker Compose validation
config/docs/changelog edits
pure test/lint/build gates
already-implemented work where the DoD already passes
broad architecture/design decisions

For those, use local for deterministic commands or scillm for one-shot edits.

Concurrency rule: if multiple code-runner tasks share one cwd, set execution.max_concurrency: 1 unless they run in isolated worktrees or are patch-only tasks with disjoint allowlists. If any task uses apply_to_source: true, /orchestrate serializes tasks sharing that cwd.
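The concurrency rule can be expressed as a clamp on `max_concurrency`. This is a sketch only. The per-task fields `cwd`, `isolated_worktree`, and `patch_only` are hypothetical stand-ins for however a plan records those properties. The `apply_to_source` serialization is left to /orchestrate, as the rule says.

```python
def effective_max_concurrency(tasks: list[dict], requested: int) -> int:
    """Return 1 if code-runner tasks sharing a cwd could collide, else the requested value."""
    by_cwd: dict[str, list[dict]] = {}
    for task in tasks:
        if task.get("runner") == "code-runner":
            by_cwd.setdefault(task.get("cwd", "."), []).append(task)  # `cwd` is hypothetical
    for group in by_cwd.values():
        if len(group) < 2:
            continue
        if all(t.get("isolated_worktree") for t in group):  # hypothetical flag
            continue  # each task edits its own worktree
        if all(t.get("patch_only") for t in group):  # hypothetical flag
            seen: set[str] = set()
            disjoint = True
            for t in group:
                files = set(t.get("allowlist", []))
                if seen & files:
                    disjoint = False
                seen |= files
            if disjoint:
                continue  # patch-only tasks that never touch the same file
        return 1
    return requested
```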

NEVER use code-runner when the DoD calls a live HTTP endpoint (e.g. curl http://localhost:3001/...). Code-runner edits files in an isolated worktree. The dev server doesn't see those edits. The DoD curl will always hit the OLD code and fail. Use scillm (one-shot edit to the working directory) + a separate local task to restart the server and verify with curl.

Use scillm (not code-runner) for:

Mechanical edits: add a field, update an import, rename a variable
Config changes: update YAML, add an entry to a list
Text generation: docstrings, summaries, classifications
Any task where "just do it once, correctly" is sufficient
Server endpoint code where the DoD requires curl to a live server — edits must land in the working directory, not a worktree

Use local for:

Running tests, linters, formatters, build commands
File operations: copy, move, create directories
Anything that's a shell command, not LLM reasoning

code-runner fields the project agent must provide:

allowlist — files the LLM can write (scope boundary)
read_context — files the LLM should read for interface context (NOT write)
definition_of_done — runnable verification command with assertions
blind_tests — hidden assertions in /test-lab (required for code-runner, enforced by /review-plan)
apply_to_source — explicit complete-task opt-in; default false
commit_on_success — required true when apply_to_source is true
rollback_on_failure — required true when apply_to_source is true

If any required field is missing (commit_on_success and rollback_on_failure are required only when apply_to_source is true), the plan is wrong. Route the task to local or scillm, or split it into one implementation task and a separate local verification task.

For a source-mutating code task, prefer complete-task mode:

apply_to_source: true
commit_on_success: true
rollback_on_failure: true

The plan must still provide blind_tests. /orchestrate runs those tests after the source commit and reverts that commit if hidden checks fail.
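The complete-task flags have a simple consistency rule. The sketch below checks it, treating a missing `apply_to_source` as false (the documented default). It is an illustration, not the validator plan.py ships.

```python
def apply_to_source_errors(task: dict) -> list[str]:
    """Check the complete-task flags; apply_to_source defaults to False."""
    errors = []
    if task.get("apply_to_source", False):
        for flag in ("commit_on_success", "rollback_on_failure"):
            if task.get(flag) is not True:
                errors.append(f"{flag} must be true when apply_to_source is true")
        if not task.get("blind_tests"):
            # /orchestrate runs these after the source commit and reverts on failure.
            errors.append("blind_tests are required to guard the source commit")
    return errors
```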

Step 2b: Code-runner course corrections

If /orchestrate dispatches a task to /code-runner and preflight returns status: preflight_fail with course_correction, /plan owns the amendment. Do not ask code-runner to continue and do not let it silently mutate YAML.

Use the correction this way:

Read {task_id}.course_correction.json or result.json.course_correction.
Apply the recommended reroute/amend/split to the plan YAML.
Re-run plan.py --validate.
Re-run /review-plan.
Resume /orchestrate only after validation passes.

Valid outcomes:

`recommended_action`	Plan response
`use_local`	Change task to deterministic `local` command
`use_scillm`	Change task to one-shot `scillm` prompt
`amend_code_runner_contract`	Keep `code-runner`, but add missing prompt/allowlist/DoD/blind tests
`split_task`	Split implementation from live verification
`reroute_or_amend`	Choose the smallest safe route based on the correction rationale
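The mechanical rows of this table can be sketched as a task rewrite. Several details are assumptions: the `run_dir` layout, the JSON shape (a `recommended_action` key), and the `.verify` ID scheme for split tasks. The two judgment rows are deliberately left to the planner. After any rewrite, plan.py --validate and /review-plan still have to pass.

```python
import json
from pathlib import Path
from typing import Optional


def load_course_correction(run_dir: Path, task_id: str) -> Optional[dict]:
    """Prefer {task_id}.course_correction.json; fall back to result.json["course_correction"]."""
    direct = run_dir / f"{task_id}.course_correction.json"
    if direct.exists():
        return json.loads(direct.read_text())
    result = run_dir / "result.json"
    if result.exists():
        return json.loads(result.read_text()).get("course_correction")
    return None


def amend_task(task: dict, correction: dict) -> list[dict]:
    """Return replacement task(s) for the mechanical recommended_action values."""
    action = correction.get("recommended_action")
    if action == "use_local":
        # plan.py --validate will still demand a `command` for the local runner.
        return [{**task, "runner": "local", "backend": "", "mode": ""}]
    if action == "use_scillm":
        return [{**task, "runner": "scillm"}]
    if action == "split_task":
        implement = {**task, "runner": "scillm"}  # edit lands in the working directory
        verify = {
            "id": f"{task['id']}.verify",  # hypothetical ID scheme
            "title": f"Verify: {task['title']}",
            "runner": "local",
            "lane": task.get("lane", ""),
            "depends_on": [task["id"]],
            "command": task["definition_of_done"]["command"],
        }
        return [implement, verify]
    # amend_code_runner_contract and reroute_or_amend need planner judgment.
    raise NotImplementedError(f"amend manually for recommended_action={action!r}")
```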

Step 3: Skill Discovery (BLOCKING — do NOT skip)

Before writing ANY task, check if an existing skill already does it. The agent MUST NOT write bespoke code when a skill exists. This is the #1 source of architectural debt.

How to check:

# Search the manifest (fastest — one file, all 225+ skills)
cat ~/.pi/skills-manifest.json | python3 -c "
import json, sys
data = json.load(sys.stdin)
for s in data['skills']:
    d = (s.get('description') or '').lower()
    if any(kw in d for kw in ['cache', 'redis', 'api']):
        print(f'  /{s[\"name\"]}: {s[\"description\"][:100]}')
"

# Or search skill names directly
ls ~/.pi/skills/ | grep -i cache

# Or ask memory
/memory recall "skill:caching" OR "skill:redis"
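The same manifest search, as a reusable function that also matches skill names. It assumes the manifest shape the one-liner above relies on: a top-level `skills` list of objects with `name` and `description`. Here `description` may be null, which the function tolerates.

```python
import json
from pathlib import Path


def search_manifest(manifest: Path, keywords: list[str], width: int = 100) -> list[str]:
    """Return '/name: description' lines for skills whose name or description has a keyword."""
    data = json.loads(manifest.read_text())
    hits = []
    for skill in data.get("skills", []):
        name = skill.get("name") or ""
        desc = skill.get("description") or ""
        text = f"{name} {desc}".lower()
        if any(kw.lower() in text for kw in keywords):
            hits.append(f"/{name}: {desc[:width]}")
    return hits
```

For example, `search_manifest(Path.home() / ".pi" / "skills-manifest.json", ["cache", "redis"])` runs the same search as the first command above, but it also matches on names.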

Decision for each piece of work:

Existing skill covers it?	Action
Yes, fully	CALL the skill. Do NOT rewrite it.
Yes, 60%+	EXTEND the skill. Add what's missing.
No match	CREATE new code — but document WHY in capability_overlap.

Every task in the YAML should map to CALL, EXTEND, or CREATE. If the plan is mostly CREATE, you haven't looked hard enough. /review-plan will flag tasks that overlap with existing skills as FAIL.
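The decision table reduces to two thresholds. In this sketch, `coverage` is the planner's own estimate, from 0.0 to 1.0, of how much of the work an existing skill already does. Reading "mostly CREATE" as "more than half the tasks" is an assumed interpretation.

```python
def classify(coverage: float) -> str:
    """Map the fraction of the work an existing skill covers to CALL / EXTEND / CREATE."""
    if coverage >= 1.0:
        return "CALL"
    if coverage >= 0.6:
        return "EXTEND"
    return "CREATE"


def mostly_create(coverages: list[float]) -> bool:
    """True when more than half the tasks would be new code (threshold assumed)."""
    actions = [classify(c) for c in coverages]
    return actions.count("CREATE") * 2 > len(actions)
```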

Step 4: Compliance Governance (BLOCKING for SPARTA/CAE plans)

Plans that involve CAE (Claims-Arguments-Evidence) trees, SPARTA controls, compliance verdicts, or posture assessments MUST enforce "analyst workbench, not truth engine":

Non-negotiable rules:

Requirement	Implementation
NEEDS_VERIFICATION default	All CAE claims/verdicts default to NEEDS_VERIFICATION, never auto-PASS
Human review gate	Every posture change requires compliance officer review before status change
No autonomous certification	Agent NEVER writes "certified", "compliant", "approved" without human gate
Retrieval language only	Use "retrieved", "found", "extracted" — never "determined", "verified", "confirmed"

In YAML plans, add to metadata:

metadata:
  compliance_governance:
    principle: "Analyst workbench, not truth engine"
    verdicts: "All default to NEEDS_VERIFICATION"
    human_gate: "Compliance officer reviews before status change"

In task prompts, include:

COMPLIANCE GOVERNANCE: This is an analyst workbench, not a truth engine.
- All verdicts default to NEEDS_VERIFICATION
- Human reviews before any status change
- Use retrieval language ("retrieved", "found"), not judgment language ("verified", "confirmed")

/review-plan will FAIL plans that:

Auto-approve verdicts without human gates
Use "certified", "compliant", "approved" without human review step
Omit compliance_governance metadata from SPARTA/CAE plans
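A rough lint for these rules might look like the sketch below. It is not /review-plan's actual check. It verifies that the metadata block exists and that each prompt carries the governance preamble verbatim. It scans agent output, not prompts, for certification or judgment words, because the preamble itself quotes those words.

```python
import re

PREAMBLE = "COMPLIANCE GOVERNANCE: This is an analyst workbench, not a truth engine."
FORBIDDEN = re.compile(r"\b(certified|compliant|approved|determined|verified|confirmed)\b",
                       re.IGNORECASE)


def plan_findings(plan: dict) -> list[str]:
    """Plan-level checks for SPARTA/CAE plans."""
    findings = []
    if "compliance_governance" not in (plan.get("metadata") or {}):
        findings.append("metadata.compliance_governance is missing")
    for task in plan.get("tasks", []):
        if task.get("prompt") and PREAMBLE not in task["prompt"]:
            findings.append(f"task {task.get('id')}: prompt lacks the governance preamble")
    return findings


def output_findings(text: str) -> list[str]:
    """Judgment or certification words in agent output (retrieval language only)."""
    return sorted({m.group(0).lower() for m in FORBIDDEN.finditer(text)})
```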

Reference: docs/WALKTHROUGH_QRA_COVERAGE_AND_EVIDENCE_CASES.md

Steps 8-11 run automatically once the YAML is written (step 7). The human only intervenes if /review-plan FAILs (fix the plan), if they want to amend the DAG before execution, or at the step 11 approval prompt.

If /review-plan FAILs

Triage each failure:

Failure type	Action
Missing field (mode, DoD command)	Fix it yourself — no human input needed
Wrong runner/backend	Fix if obvious, ask human if ambiguous
Missing capability_overlap	Run `/memory recall` and fill it in
Skill overlap detected	Ask human: use existing skill or justify new code?
Ambiguous requirements	Use `/interview` for structured choices
Claims don't match codebase	Re-read the code with `/assess`

After fixes, re-run /review-plan. Do NOT proceed to /orchestrate until PASS. Do NOT silently skip FAILs — every FAIL must be resolved or explicitly waived by the human.

Plan Types

Auto-detected from goal text. All types use the same YAML schema.

Type	Detected When	Pattern
code	No UI keywords	`local` for setup/verification, `scillm` for one-shot edits, `code-runner` only for bounded iterative implementation
design	views, components, TSX, dashboard, UI, React	`/mockup-lab` (Stitch) → `/ux-lab` (code) → `/mockup-lab review` (VLM verify) → `/test-interactions`
hybrid	Both UI and code keywords	Stitch pipeline for UI views, code tasks in later waves
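Keyword detection along these lines can be sketched with word-boundary matching, so that "build" does not match "ui". The UI list comes from the table. The code keyword list is hypothetical, because the table does not enumerate one.

```python
import re

UI_KEYWORDS = ("view", "component", "tsx", "dashboard", "ui", "react")   # from the table
CODE_KEYWORDS = ("api", "endpoint", "cache", "database", "cli", "test")  # hypothetical


def has_any(text: str, words: tuple[str, ...]) -> bool:
    """Whole-word match, allowing a trailing plural 's'."""
    return any(re.search(rf"\b{w}s?\b", text, re.IGNORECASE) for w in words)


def detect_plan_type(goal: str) -> str:
    ui, code = has_any(goal, UI_KEYWORDS), has_any(goal, CODE_KEYWORDS)
    if ui and code:
        return "hybrid"
    return "design" if ui else "code"
```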

Design Plans

Rule: The agent NEVER designs UI. Stitch designs it. The agent codes it.

Design plans MUST specify device type (desktop, mobile, tablet). Pass --device to every /mockup-lab command. Stitch defaults to mobile if not specified.

For any plan with UI work, each component follows 3 steps:

/mockup-lab — Stitch generates design, human approves (--device desktop)
/ux-lab — Agent codes React component from approved screenshots
/mockup-lab review — Gemini VLM verifies implementation matches design

For small changes (colors, spacing, adding a column), skip step 1 and use /ux-lab + /review-design directly.

Read .pi/skills/mockup-lab/design-to-code.yaml for the detailed checklist.

DoD for UI tasks must be visual — "tsc compiles" is not done. Done means a screenshot shows real data matching the approved design.

YAML Schema

version: 1
kind: orchestrate-plan

metadata:
  title: "Feature Name"
  goal: "one-line summary"
  plan_type: code          # code, design, or hybrid
  created: "2026-03-17"
  primary_persona:         # WHO uses this (required)
    name: "Nico Bailon"
    role: "QA Engineer"
    source: ".pi/agents/nico-bailon/AGENTS.md"

execution:
  max_concurrency: 3       # parallel lanes
  wave_barrier: false      # if true, wave N must complete before wave N+1 starts
  scheduling_policy: dependency_only  # or wave_then_dependency

capability_overlap:        # Phase 0 evidence (required)
  - "/memory recall returned: no prior Redis caching solution"
  - "Checked /fetcher, /dogpile — no overlap"

questions_blockers:
  - "None"

lanes:
  - id: "0"
    label: "Wave 0: Setup"
  - id: "1"
    label: "Wave 1: Implementation"
  - id: "2"
    label: "Wave 2: Validation"

tasks:
  - id: "1"
    title: "Add Redis to docker-compose"
    lane: "0"
    runner: "local"          # deterministic shell command
    backend: ""              # no LLM needed
    mode: ""
    depends_on: []
    command: "docker compose up -d redis && redis-cli ping"
    definition_of_done:
      command: "redis-cli ping"
      assertion: "Returns PONG"

  - id: "2"
    title: "Create cache utility module"
    lane: "1"
    runner: "code-runner"       # self-improvement loop
    backend: "gpt-5.3-codex"    # which LLM (see Backend table)
    mode: "iterative"           # iterative/one_shot/review
    depends_on: ["1"]
    preconditions:              # checked before task dispatch
      - type: command_succeeds  # Redis speaks RESP, not HTTP, so service_reachable would fail
        command: "redis-cli ping | grep -q PONG"
      - type: path_exists
        path: "src/"
    read_context:               # structured: path + optional line range
      - path: "src/config.py"
        start_line: 1
        end_line: 50
      - "src/types.py"          # simple string also works
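    prompt: "Create src/cache.py with get/set/invalidate and TTL-based expiration via Redis SETEX"
    allowlist:                  # the only files the LLM may write
      - "src/cache.py"
    blind_tests:                # hidden /test-lab checks; entry format here is illustrative
      - "test-lab/run.sh verify-task 2 src/ --domain python"
    implementation:
      - "Create src/cache.py with get/set/invalidate"
      - "TTL-based expiration using Redis SETEX"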
    tests:
      - "test-lab/run.sh verify-task 2 src/ --domain python"
      - "tests/test_cache.py::test_set_get_ttl"
    definition_of_done:
      command: "uv run pytest tests/test_cache.py -q"
      assertion: "Value expires after TTL"

Task Fields Reference

Runner (how the task executes)

Leave runner empty — plan.py auto-routes based on task shape. Set explicitly only to override.

Runner	Auto-routed when	Required Fields
`local`	Has `command`, no `prompt`	`command`
`skill`	Has `skill` field set	`skill` (+ optional `skill_command`, `skill_args`)
`scillm`	Has `prompt`, no `allowlist` + DoD assertion	`prompt` (backend/mode auto-filled)
`code-runner`	Has `prompt` + `allowlist` + DoD command/assertion + `blind_tests` and genuinely needs iteration/isolation	`prompt`, `allowlist`, `definition_of_done.command`, `definition_of_done.assertion`, `blind_tests`, `read_context`
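A sketch of the auto-routing rules in this table follows. Checking `skill` before `prompt` is an assumed precedence. The "genuinely needs iteration/isolation" condition is a judgment call that field presence cannot capture.

```python
def auto_route(task: dict) -> str:
    """Pick a runner from task shape; an explicit runner always wins."""
    if task.get("runner"):
        return task["runner"]
    if task.get("skill"):
        return "skill"
    dod = task.get("definition_of_done") or {}
    if task.get("prompt"):
        full_contract = (task.get("allowlist") and dod.get("command")
                         and dod.get("assertion") and task.get("blind_tests"))
        return "code-runner" if full_contract else "scillm"
    if task.get("command"):
        return "local"
    raise ValueError(f"task {task.get('id')!r} has no command, prompt, or skill to route on")
```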

Backend (which LLM model)

Backend	scillm Model	Best For	Cost
`text-claude`	Claude Sonnet 4.6 (OAuth)	Boilerplate, scaffolding, monitoring	Low
`text-claude-opus`	Claude Opus 4.5 (OAuth)	Architecture, novel design, cross-skill composition	High
`gpt-5.3-codex`	OpenAI Codex (OAuth)	Code review, deep analysis, refactoring	Medium
`text-gemini-oauth`	Gemini via CLI (OAuth)	Long content, large context, visual tasks	Medium
`text`	Chutes DeepSeek (PAYG)	Batch work — high concurrency, no OAuth limits	Low

Decision heuristic: "call existing script and check output" → text-claude. "understand 3 systems and wire them together" → text-claude-opus. "review this code" → gpt-5.3-codex. "batch 100 items" → text.

Note: OAuth models (text-claude*, gpt-5.3-codex) are NOT for batch — use text (Chutes) for high-concurrency batch work.

Mode (execution style)

Mode	Use For
`iterative`	Multi-turn agent work (coding, design iteration)
`one_shot`	Single LLM inference (classification, extraction)
`review`	Review/assessment tasks

Execution (plan-level scheduling)

Field	Type	Default	Description
`max_concurrency`	int	3	Max parallel tasks across all lanes
`wave_barrier`	bool	false	If true, all wave N tasks must complete before wave N+1 starts
`scheduling_policy`	string	`dependency_only`	`dependency_only`: run when deps satisfied. `wave_then_dependency`: enforce wave barrier first

Wave numbers are derived from lane IDs: lane "0", "0.1", "0.2" = wave 0; lane "1", "1.1" = wave 1.
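The wave rule and both scheduling policies can be sketched together. Here `barrier=True` stands for either `wave_barrier: true` or `scheduling_policy: wave_then_dependency`; treating the two as equivalent is an assumption. The sketch ignores tasks already running and does not apply `max_concurrency`.

```python
def wave_of(lane_id: str) -> int:
    """Lane "1.2" belongs to wave 1: the integer before the first dot."""
    return int(lane_id.split(".")[0])


def ready_tasks(tasks: list[dict], done: set[str], barrier: bool) -> list[str]:
    """IDs whose deps are done; with barrier=True, only from the earliest unfinished wave."""
    pending = [t for t in tasks if t["id"] not in done]
    earliest = min((wave_of(t["lane"]) for t in pending), default=0)
    ready = []
    for t in pending:
        if not all(dep in done for dep in t.get("depends_on", [])):
            continue
        if barrier and wave_of(t["lane"]) > earliest:
            continue  # an earlier wave still has unfinished work
        ready.append(t["id"])
    return ready
```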

Preconditions (per-task)

Checked before task dispatch. Task fails immediately if any precondition fails.

Type	Required Field	Description
`path_exists`	`path`	File or directory must exist
`service_reachable`	`url`	HTTP GET must return 2xx
`env_var_set`	`var`	Environment variable must be non-empty
`command_succeeds`	`command`	Shell command must exit 0

preconditions:
  - type: service_reachable
    url: "http://localhost:8000/health"   # example HTTP endpoint; use command_succeeds for non-HTTP services like Redis
  - type: path_exists
    path: "src/config.py"
  - type: env_var_set
    var: "DATABASE_URL"
  - type: command_succeeds
    command: "docker ps | grep -q postgres"
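A stdlib-only sketch of the four precondition types follows. It is an illustration, not /orchestrate's checker. Unknown types fail closed, and any connection error, non-2xx status, or non-HTTP reply counts as unreachable. The 5-second timeout is an assumed default.

```python
import http.client
import os
import subprocess
import urllib.request
from pathlib import Path


def check_precondition(pre: dict, timeout: float = 5.0) -> bool:
    """Evaluate one precondition entry; unknown types fail closed."""
    kind = pre.get("type")
    if kind == "path_exists":
        return Path(pre["path"]).exists()
    if kind == "env_var_set":
        return bool(os.environ.get(pre["var"]))  # set AND non-empty
    if kind == "command_succeeds":
        try:
            proc = subprocess.run(pre["command"], shell=True, capture_output=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            return False
        return proc.returncode == 0
    if kind == "service_reachable":
        try:
            with urllib.request.urlopen(pre["url"], timeout=timeout) as resp:
                return 200 <= resp.status < 300
        except (OSError, ValueError, http.client.HTTPException):
            # URLError/HTTPError are OSErrors; a non-HTTP reply raises HTTPException.
            return False
    return False


def failed_preconditions(preconditions: list[dict], timeout: float = 5.0) -> list[dict]:
    """Entries that fail; a task with any failure should not be dispatched."""
    return [p for p in preconditions if not check_precondition(p, timeout)]
```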

Structured read_context

Files the LLM reads for context but does NOT write. Supports line ranges to reduce prompt size.

read_context:
  - "src/types.py"                    # full file (string)
  - path: "src/config.py"             # structured: full file
  - path: "src/large_module.py"       # structured: lines 100-200 only
    start_line: 100
    end_line: 200
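Both entry forms can be normalized into one loader. This sketch assumes `start_line`/`end_line` are 1-indexed and inclusive, which is how "lines 100-200" reads. Because entries are keyed by path, two ranges of the same file would overwrite each other here.

```python
from pathlib import Path


def load_read_context(entries: list, root: Path = Path(".")) -> dict[str, str]:
    """Return {path: text}, honoring optional 1-indexed inclusive start_line/end_line."""
    out = {}
    for entry in entries:
        spec = {"path": entry} if isinstance(entry, str) else entry
        lines = (root / spec["path"]).read_text().splitlines(keepends=True)
        start = spec.get("start_line", 1)
        end = spec.get("end_line", len(lines))
        out[spec["path"]] = "".join(lines[start - 1:end])
    return out
```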

Usage

# Emit YAML template
plan.py

# Start planning guidance for a new goal; requires matching /governance PASS
plan.py "Add Redis caching to API"

# Validate existing plan
plan.py --validate 01_TASKS.yaml

# Visualize execution DAG (waves, parallelism, routing)
plan.py --dag 01_TASKS.yaml

# Output DAG as Mermaid flowchart (for docs/PRs)
plan.py --mermaid 01_TASKS.yaml

# Convert legacy markdown to YAML
plan.py --convert 01_TASKS.md -o 01_TASKS.yaml

# Render YAML as markdown (for human review)
plan.py --render 01_TASKS.yaml

# Add a task to an existing plan (auto-assigns ID, wires deps)
plan.py --add-task 01_TASKS.yaml "title=Run integration tests|runner=local|lane=2|depends_on=2|command=pytest tests/"

# Remove a task (cleans up dangling deps)
plan.py --remove-task 01_TASKS.yaml:3

# Assess whether a completed /orchestrate session achieved the plan goal
plan.py --assess-result 01_TASKS.yaml --session /path/to/orchestrate/session --json

# Explicit opt-in execution loop: validate, review, orchestrate, assess, optionally replan
plan.py --execute-closure 01_TASKS.yaml --max-replans 2 --json
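The --add-task spec is a pipe-delimited list of key=value pairs. Below is a sketch of how such a string could be parsed; plan.py's real parser may differ. Splitting `depends_on` on commas is an assumption. Values may contain "=", but a command containing "|" (for example `docker ps | grep`) would be split apart by this format.

```python
def parse_task_spec(spec: str) -> dict:
    """Parse 'key=value|key=value' into a task dict (illustrative, not plan.py's parser)."""
    task: dict = {}
    for part in spec.split("|"):
        if not part.strip():
            continue
        key, sep, value = part.partition("=")  # split on the first '=' only
        if not sep:
            raise ValueError(f"expected key=value, got {part!r}")
        key, value = key.strip(), value.strip()
        if key == "depends_on":
            task[key] = [d.strip() for d in value.split(",") if d.strip()]  # comma list assumed
        else:
            task[key] = value
    return task
```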

Use skills/plan/run.sh as the supported entrypoint. It pins the uv environment to /mnt/storage12tb/skills/plan/.venv and refuses repo-local PLAN_UV_ENV values.

The /governance gate for direct goal planning is enforced in plan.py itself, not only in run.sh. Even so, invoking uv run --project skills/plan ... directly is unsupported unless UV_PROJECT_ENVIRONMENT is exported before uv starts, because plan.py cannot relocate a repo-local .venv after uv has already created it.

Use PLAN_SKIP_GOVERNANCE=1 or --skip-governance only for developer maintenance and tests.

Pipeline Position

/governance -> /plan -> /review-plan -> /orchestrate

For design plans, the execution pipeline inside /orchestrate is:

/mockup-lab generate → /interview (human review) → /mockup-lab iterate
      ↓ approved design
/ux-lab (code React component)
      ↓ built component
/mockup-lab review (Gemini VLM visual diff)
      ↓ match_score < 90 → fix code → re-review
      ↓ match_score >= 90 → done
/test-interactions (verify interactions)
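The review/fix cycle in this pipeline is a threshold loop on `match_score`. In this sketch, `review` and `fix` are placeholders for /mockup-lab review and the /ux-lab code fix. `max_rounds` is a hypothetical cap, since the pipeline above does not state one.

```python
from typing import Callable


def review_until_match(review: Callable[[], int], fix: Callable[[int], None],
                       threshold: int = 90, max_rounds: int = 5) -> tuple[bool, int]:
    """Re-review after each fix until match_score >= threshold or the round cap is hit."""
    score = review()
    rounds = 0
    while score < threshold and rounds < max_rounds:
        fix(score)        # placeholder for fixing the component code
        score = review()  # placeholder for the VLM visual diff returning a new match_score
        rounds += 1
    return score >= threshold, score
```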

Skill	Role
`/governance`	Deterministic pre-plan understanding gate
`/best-practices-plan`	Rules this skill MUST follow (read first)
`/review-plan`	Validates the YAML before `/orchestrate` runs it
`/orchestrate`	Executes the YAML with per-task dispatch
`/test-lab`	Generates blind adversarial tests (required per task)
`/memory`	Phase 0: check for prior solutions
`/assess`	Phase 0: read the target codebase
`/interview`	Gather requirements when goal is ambiguous
`/dogpile`	Research unfamiliar dependencies
`/mockup-lab`	Design generation (Stitch) + VLM review (scillm) for UI plans
`/ux-lab`	React component development with Vite HMR

Common Mistakes

WRONG: Writing bespoke code when a skill already exists

tasks:
  - id: "1"
    title: "Implement PDF extraction"
    runner: "code-runner"
    # A /extractor skill already does this!

RIGHT: Check skill manifest first, call existing skills

cat ~/.pi/skills-manifest.json | python3 -c "..."  # search for existing skills
# Then in YAML: runner: local, command: ".pi/skills/extractor/run.sh ..."

WRONG: Skipping /review-plan and going straight to /orchestrate

/orchestrate run 01_TASKS.yaml  # untested plan, may have skill overlap or wrong routing

RIGHT: Always validate plan before execution

plan.py --validate 01_TASKS.yaml
/review-plan 01_TASKS.yaml  # must PASS before /orchestrate

WRONG: Design tasks without specifying --device for Stitch

tasks:
  - title: "Generate mockup"
    command: "mockup-lab generate"  # defaults to mobile!

RIGHT: Always specify device type for design tasks

tasks:
  - title: "Generate mockup"
    command: "mockup-lab generate --device desktop"

WRONG: Inline Python that reads .env without load_dotenv

tasks:
  - id: "1"
    title: "Query ArangoDB"
    runner: "local"
    command: |
      python3 -c "
      import os
      from arango import ArangoClient
      # Fails: ARANGO_PASS not loaded from .env
      client = ArangoClient().db('memory', password=os.environ['ARANGO_PASS'])
      "

RIGHT: Load .env before accessing environment variables

tasks:
  - id: "1"
    title: "Query ArangoDB"
    runner: "local"
    command: |
      python3 -c "
      from dotenv import load_dotenv, find_dotenv
      load_dotenv(find_dotenv())  # Load .env FIRST
      import os
      from arango import ArangoClient
      client = ArangoClient().db('memory', password=os.environ['ARANGO_PASS'])
      "