name: run-plan
description: Execute all remaining steps of a plan autonomously in a loop, without pausing between steps. Delegates each step to a subagent that runs the /next-step-taker workflow (execute, validate, review, update tracking). The main agent is purely an orchestrator. It reads the plan, spawns subagents, commits, and loops. Only stops on blockers (test failures, unresolved review findings, design decisions requiring user input) or when the plan is finished. Use when asked to "run the plan", "execute all steps", "finish the plan", "auto-run", or "keep going until done". Argument is the plan name (e.g., "/run-plan splash-modal-prerender").
argument-hint: plan-name
Run Plan Skill
Autonomously execute all remaining steps of a plan. The main agent is an orchestrator only: it reads the plan, delegates each step to a subagent running /next-step-taker, has a commit subagent commit the result, and loops. It never directly edits implementation files, runs tests, or runs build/make commands.
Branch Guard
- If on `main`/`master`: run `gmas`, suggest a feature branch, ask user to confirm before proceeding
- If on a feature branch: proceed
Sandbox Rules
- Git commands (`git add`, `git commit`, `git status`, `git diff`, etc.): run in default sandbox; never use `dangerouslyDisableSandbox`
- Docker/make commands: use `dangerouslyDisableSandbox: true` (Docker socket requires it)
- Subagent prompts must include these rules so subagents follow them too
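The sandbox rules above amount to a lookup on the command's executable. A minimal sketch, assuming the decision can be made from the first word of the command; `needs_sandbox_disabled` and `bash_call` are hypothetical helper names used for illustration, not part of any real tool API:

```python
import shlex

# Tools that the Sandbox Rules say need the sandbox disabled.
UNSANDBOXED_TOOLS = {"make", "docker"}


def needs_sandbox_disabled(command: str) -> bool:
    """Return True when the command's first word is make or docker.

    Simplification (assumption): only the first word is inspected, so a
    compound command such as "cd app && make build" is not detected.
    """
    tokens = shlex.split(command)
    return bool(tokens) and tokens[0] in UNSANDBOXED_TOOLS


def bash_call(command: str) -> dict:
    """Build hypothetical Bash tool arguments that follow the sandbox rules."""
    call = {"command": command}
    if needs_sandbox_disabled(command):
        call["dangerouslyDisableSandbox"] = True
    return call
```

Git commands fall through to the default sandbox because the flag is simply never set for them.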
Workflow
Step 1: Locate and Read the Plan
- Find the plan in `plans/**` matching $ARGUMENTS (plans are now nested at `plans/<topic>/<name>.md`)
- Derive `<topic>` from the plan file's parent directory name (e.g., if the plan is at `plans/auth/login-flow.md`, `<topic>` is `auth`)
- Read the full plan to understand scope and remaining work
- Count total steps and identify which are already complete (`[x]`) vs remaining (`[ ]`)
- Report to user: "Found N remaining steps out of M total. Starting from Phase/Step X."
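The lookup and counting described in Step 1 can be sketched as follows. This is an illustration only: it assumes every step is a markdown task-list item (`- [x]` or `- [ ]`) and that exactly one file under `plans/` matches the plan name. `locate_plan` and `summarize_plan` are hypothetical names.

```python
import re
from pathlib import Path

# Assumption: each plan step is one markdown task-list item.
CHECKBOX = re.compile(r"^[ \t]*[-*][ \t]+\[([ xX])\]", re.MULTILINE)


def locate_plan(plans_root: Path, plan_name: str) -> Path:
    """Find plans/<topic>/<plan_name>.md anywhere under plans_root."""
    matches = sorted(plans_root.glob(f"**/{plan_name}.md"))
    if len(matches) != 1:
        raise LookupError(
            f"expected exactly one plan named {plan_name!r}, found {len(matches)}"
        )
    return matches[0]


def summarize_plan(plan_path: Path) -> dict:
    """Derive <topic> from the parent directory and count step checkboxes."""
    marks = CHECKBOX.findall(plan_path.read_text(encoding="utf-8"))
    done = sum(1 for mark in marks if mark.lower() == "x")
    return {
        "topic": plan_path.parent.name,
        "total": len(marks),
        "remaining": len(marks) - done,
    }
```

For a plan at `plans/auth/login-flow.md`, `summarize_plan` reports the topic as `auth`, matching the example in the list above.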
Step 2: Execution Loop
For each remaining incomplete step, the main agent orchestrates:
2a. Spawn Execution Subagent
Launch a subagent via the Agent tool with this prompt pattern:
Run /next-step-taker for the plan "<plan-name>".
The next incomplete step is Step N: <step name>.
Important overrides for this run:
- Do NOT pause at the end to ask the user — complete the full workflow (execute, validate, review, fix, update tracking) and return your final report.
- Follow all CLAUDE.md guidelines.
- Use dangerouslyDisableSandbox: true for Docker/make commands only. Never for git commands.
- CRITICAL: Never dismiss a test failure as "pre-existing" or "flaky" because the test file wasn't modified on this branch. Current changes can break tests indirectly (shared fixtures, CSS/selector changes, templates, timing, imports). For every failure: (1) read the traceback, (2) check if branch changes could affect the failing path, (3) fix if related, (4) if confirmed unrelated, rerun in isolation 2-3 times and report findings.
The subagent handles the entire next-step-taker workflow internally:
- Execute the step (read files, implement changes)
- Validate (build, tests via its own sub-subagents)
- Review pipeline (3 parallel review subagents + fix subagent)
- Update plan tracking (checkmarks, dates)
- Return its final report (what changed, validation results, review findings)
2b. Evaluate Subagent Result
The main agent reads the subagent's report and decides:
Auto-continue when ALL of these are true:
- Validation passed (build clean, tests green)
- Review pipeline returned all PASS or all findings were fixed
- No unresolved items requiring user decisions
Stop and ask when ANY of these are true:
- Test failures that couldn't be auto-fixed
- Review has UNRESOLVED findings (require user decision)
- Fix subagent returned `VALIDATION: FAIL`
- The step's plan instructions were ambiguous
- The plan's `finished` flag was set to `true` (all steps done)
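The auto-continue and stop conditions above can be read as a single decision function. A minimal sketch, assuming the subagent's free-text report has already been reduced to a few booleans; the `StepReport` fields are hypothetical and are not a format the subagent is known to return:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StepReport:
    """Hypothetical structured view of a subagent's final report."""
    validation_passed: bool
    unresolved_findings: int = 0
    fix_validation_failed: bool = False
    instructions_ambiguous: bool = False
    plan_finished: bool = False


def stop_reasons(report: StepReport) -> List[str]:
    """Collect every stop condition that applies; an empty list means continue."""
    reasons = []
    if not report.validation_passed:
        reasons.append("test failures that could not be auto-fixed")
    if report.unresolved_findings > 0:
        reasons.append("review has UNRESOLVED findings")
    if report.fix_validation_failed:
        reasons.append("fix subagent returned VALIDATION: FAIL")
    if report.instructions_ambiguous:
        reasons.append("plan instructions were ambiguous")
    if report.plan_finished:
        reasons.append("plan is marked finished")
    return reasons


def should_auto_continue(report: StepReport) -> bool:
    """Auto-continue only when no stop condition is present."""
    return not stop_reasons(report)
```

Returning the list of reasons, not just a boolean, gives the orchestrator the text it needs when it reports a blocker to the user.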
2c. Smoke Test After UI-Affecting Steps
If the completed step changed frontend code (JS, templates, CSS) or test locators/selectors, spawn a smoke test subagent before committing to catch issues early:
Run a quick UI smoke test against built assets. Execute:
make test-ui-parallel-built n=2
Write the full test output to /tmp/claude/smoke-test-step-N.txt.
Report: total passed, total failed. If failures, include test names and error summaries.
CRITICAL: Every Bash call that runs `make` or `docker` MUST set dangerouslyDisableSandbox: true.
Example:
Bash(command: "make test-ui-parallel-built n=2 > \"/tmp/claude/smoke-test-step-N.txt\" 2>&1", dangerouslyDisableSandbox: true)
If the smoke test fails, enter the Test Fix Loop (Section 2e) before committing. If it passes, proceed to commit.
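Whether a step was UI-affecting can be judged from the output of `git diff --name-only`, which the main agent is allowed to run. A rough sketch; the suffix set and the `tests/ui/` prefix are assumptions chosen for illustration and would need to match the actual repository layout:

```python
from pathlib import PurePosixPath

# Assumptions: which suffixes count as frontend code or templates, and where
# UI tests (locators/selectors) live. Neither is specified by this skill.
UI_SUFFIXES = {".js", ".css", ".html"}
UI_TEST_PREFIXES = ("tests/ui/",)


def needs_smoke_test(diff_name_only_output: str) -> bool:
    """Return True if any changed path looks UI-affecting."""
    for line in diff_name_only_output.splitlines():
        path = line.strip()
        if not path:
            continue
        if PurePosixPath(path).suffix.lower() in UI_SUFFIXES:
            return True
        if path.startswith(UI_TEST_PREFIXES):
            return True
    return False
```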
2d. Commit (via subagent)
Spawn a commit subagent to handle the entire commit workflow. The main agent must never load or execute /git-commit itself — doing so pulls staging, diff analysis, message drafting, and pre-commit hook fix loops into the main context window.
Commit the current changes using /git-commit.
Follow all CLAUDE.md guidelines.
Never use dangerouslyDisableSandbox for git commands.
Why a subagent: /git-commit involves reading diffs, generating messages, and potentially multiple pre-commit fix iterations — all of which pollute the orchestrator's context if run inline. The subagent absorbs this work and returns only a short summary.
2e. Test Fix Loop (when tests fail)
When any test run (smoke test, final suite, or step validation) reports failures:
- Test runner subagent writes full output to a temp file (`/tmp/claude/<descriptive-name>.txt`)
- Main agent reads the temp file to understand failure count and patterns
- Fix subagent is spawned with:
Fix the following UI/integration test failures. The full test output is at: <path-to-temp-file>
Read this file to understand all failures, then fix them.
<include user decisions if any were provided>
CRITICAL: Never dismiss a failure as "pre-existing"; check if branch changes could affect the failing path indirectly (shared fixtures, CSS, templates, timing, imports). If confirmed unrelated, rerun in isolation 2-3 times to verify flakiness.
After fixing, run `make vite-build` and `make test-js` to verify JS changes. Do NOT run the full test suite; just implement fixes.
CRITICAL: Every Bash call that runs `make` or `docker` MUST set dangerouslyDisableSandbox: true.
Example:
Bash(command: "make vite-build > \"/tmp/claude/vite-build.txt\" 2>&1", dangerouslyDisableSandbox: true)
- Re-run test subagent writes output to a new temp file
- Repeat up to 3 iterations. If failures persist after 3 rounds, stop and report to user.
- Clean up temp files — delete all temp test output files once tests pass or the loop exits.
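The loop above can be sketched as a bounded retry with guaranteed cleanup. This is an illustration under stated assumptions: `run_fix` and `rerun_tests` are hypothetical callables standing in for the fix subagent and the re-run test subagent, and each re-run is assumed to return a pass flag plus the path of its new output file.

```python
from pathlib import Path
from typing import Callable, Tuple

MAX_FIX_ROUNDS = 3  # "Repeat up to 3 iterations"


def run_test_fix_loop(
    first_output: Path,
    run_fix: Callable[[Path], None],
    rerun_tests: Callable[[int], Tuple[bool, Path]],
) -> bool:
    """Alternate fix and re-run until tests pass or three rounds are used up.

    Returns True if a re-run passed, False if failures persist (the caller
    then stops and reports to the user).
    """
    outputs = [first_output]
    passed = False
    try:
        for round_no in range(1, MAX_FIX_ROUNDS + 1):
            run_fix(outputs[-1])  # fix subagent reads the latest output file
            passed, new_output = rerun_tests(round_no)
            outputs.append(new_output)
            if passed:
                break
        return passed
    finally:
        # Clean up temp files once tests pass or the loop exits.
        for path in outputs:
            path.unlink(missing_ok=True)
```

Putting the cleanup in `finally` means the temp files are removed on both exits, which is what the last bullet asks for.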
Step 3: Completion
When all steps are done or the plan is marked finished:
- Run the full test suite via subagents, sequentially and never simultaneously:
  - Spawn integration test subagent: `make test-integration-parallel`. Write output to `/tmp/claude/final-integration-results.txt`.
  - After it completes, spawn UI test subagent: `make test-ui-parallel-built`. Write output to `/tmp/claude/final-ui-results.txt`.
  - Always use `test-ui-parallel-built`: UI tests must run against built Vite assets, never the dev server.
  - CRITICAL: Tell each test subagent that every Bash call running `make` or `docker` MUST set `dangerouslyDisableSandbox: true`. Include an example: `Bash(command: "make test-integration-parallel > \"/tmp/claude/final-integration-results.txt\" 2>&1", dangerouslyDisableSandbox: true)`.
  - Main agent reads each result file to determine pass/fail.
  - Investigate every failure. Never dismiss a failure as "pre-existing" or "flaky" because the test file wasn't modified on this branch. Current changes can break tests indirectly (shared fixtures, CSS/selector changes, templates, timing, imports). For each failure: read the traceback, check if branch changes could affect the failing path, and either fix it or confirm it's unrelated by rerunning in isolation 2-3 times.
- If failures exist, enter the Test Fix Loop (Section 2e).
- Clean up all temp test output files.
- Delete all files in `plans/<topic>/tmp/`. This is the subagent communication directory created during the run; it is not needed after completion.
- Report final summary:
Plan "<name>" — COMPLETE
Steps completed this run: X
Total steps: Y
Test results: integration PASS/FAIL, UI PASS/FAIL
All changes committed. Ready for /git-push when you are.
If failures persist after the fix loop, report them and stop for user guidance.
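The sequential ordering of the final suites can be sketched as a plain loop in which each subagent call blocks before the next one starts. `spawn_test_subagent` is a hypothetical callable standing in for the Agent tool; it is assumed to return True when the suite passed.

```python
from typing import Callable, Dict

# Commands and output paths are taken from Step 3; the labels are illustrative.
FINAL_SUITES = [
    ("integration", "make test-integration-parallel",
     "/tmp/claude/final-integration-results.txt"),
    ("UI", "make test-ui-parallel-built",
     "/tmp/claude/final-ui-results.txt"),
]


def run_final_suites(
    spawn_test_subagent: Callable[[str, str], bool],
) -> Dict[str, bool]:
    """Run integration first, then UI; never both at once."""
    results = {}
    for label, command, output_path in FINAL_SUITES:
        # Blocks until the subagent replies, so the suites cannot overlap.
        results[label] = spawn_test_subagent(command, output_path)
    return results


def format_test_results(results: Dict[str, bool]) -> str:
    """Render the "Test results" line of the final summary."""
    parts = [f"{label} {'PASS' if ok else 'FAIL'}" for label, ok in results.items()]
    return "Test results: " + ", ".join(parts)
```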
Key Differences from next-step-taker
| Behavior | next-step-taker | run-plan |
|---|---|---|
| Pause after each step | Always | Only on blockers |
| Auto-commit | No | Yes, via commit subagent |
| Final test suite | No | Yes, runs all tests at end |
| Scope | Single step | All remaining steps |
| Main agent edits code | Never | Never — delegates to subagent |
Important Notes
- Main agent is orchestrator only — never directly edit implementation files, run tests, run make commands, or make code changes. ALL execution (code changes, tests, builds, make targets) MUST be delegated to subagents. This includes the final test suite in Step 3.
- Main agent CAN: read the plan, read temp test output files, run `git diff --name-only`, spawn subagents (including commit subagents), re-read the plan between steps, and report progress. Main agent CANNOT invoke `/git-commit` directly; it must always be delegated to a subagent.
- Tests run via synchronous Bash inside subagents. Subagents invoke `make test-*` synchronously (with `dangerouslyDisableSandbox: true`) and block until it exits. The orchestrator waits for the subagent's Agent-tool reply; that reply IS the completion signal. Do not poll the subagent's result file while the subagent is still in flight, do not arm a Monitor on it, and do not reach into a container to inspect process state (`docker compose exec ... pgrep`, raw `ps`, etc.). After the subagent replies, the orchestrator may read `/tmp/claude/test-*.txt` for full output.
- Each subagent runs the full /next-step-taker workflow, including its own validation and review sub-subagents. The main agent does not duplicate that work.
- Test output goes to temp files: test runner subagents write output to `/tmp/claude/<name>.txt`. The main agent or fix subagent reads from these files. Clean up temp files when no longer needed.
- Sandbox discipline: git commands use default sandbox; Docker/make commands use `dangerouslyDisableSandbox: true`. Include this rule in all subagent prompts.
- If a step modifies the plan itself (e.g., adds sub-steps), the main agent re-reads the plan before continuing.
- When stopping on a blocker, report: which step failed, what was tried, what needs user input.
- Investigate every test failure — never dismiss a failure as "pre-existing" or "flaky" because the test file wasn't modified on this branch. Current changes can break tests indirectly (shared fixtures, CSS/selector changes, templates, timing, imports). For each failure: read the traceback, check if branch changes could affect the failing path, and either fix it or confirm it's unrelated by rerunning in isolation 2-3 times. Include this rule in all subagent prompts that run or evaluate tests.