name: execute-plan
description: "Use when the user says 'execute the plan', 'run the plan', 'implement the plan', '执行计划', '跑计划', or has a verified plan file ready for execution. Executes plan tasks in segments via the Workflow tool, with explicit hard-stop checkpoint gates (batch 1 always; any batch with a dependency hub; any batch with an explicit <!-- checkpoint --> marker) — at each hard-stop the skill presents the segment summary and waits for the user to say 'continue'. Cross-session resume is authoritative via the on-disk checkpoint file .claude/execute-plan-checkpoint.json (completed map). Reports blocked/failed tasks for the user to fix — does not attempt fixes itself. Not when: plan has not been verified (run verify-plan first). Also invoked by run-phase at Step 4."
model: sonnet
Overview
This skill orchestrates plan execution in three layers:
1. **Main agent (this skill)**: owns ALL bookkeeping file I/O. It parses the plan, runs `compute_checkpoints.py`, and is the SOLE writer of BOTH the checkpoint file (`completed` map + segment metadata) and `execution-report.md`, written per-segment from the Workflow's schema-validated results array. Owns ALL user gating at hard-stops. Invokes the Workflow tool once per segment.
2. **Workflow script** (`${CLAUDE_PLUGIN_ROOT}/skills/execute-plan/execute-plan.workflow.js`): owns the deterministic per-segment loop. Receives a segment payload via `args`, dispatches one `agent()` per task with `model:'sonnet'` and the `{task_id, status, files_written, evidence}` schema, honors dependency-skip, and returns the per-task results array. Has no filesystem access; cannot pause for user input. One Workflow invocation = one segment (a segment runs from the current position up to and including the next hard-stop batch, or the end).
3. **Task-executor agent** (`dev-workflow:execute-plan`, model=sonnet): invoked by the Workflow script for each task. Makes the task's code/file changes, runs its `**Verify:**`, and returns the structured `{task_id, status, files_written, evidence}` result. Writes NO bookkeeping files: not the checkpoint, not the report. (This is the single-writer model: delegating mutation of the shared resume-critical JSON to per-task agents produced inconsistent shapes, so the main agent records everything from the reliable schema-validated return.)
A hard-stop is a natural Workflow return point — the script returns at the end of a hard-stop batch, the main agent presents the segment summary, and waits for "continue" before invoking the next segment.
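The dependency-skip behaviour attributed to the Workflow script above can be sketched as follows. This is an illustration only, not the actual `execute-plan.workflow.js` (which is JavaScript): the function name `run_segment`, the `execute` callable standing in for `agent()`, and the choice of `blocked` as the status for a skipped dependent are all assumptions.

```python
def run_segment(tasks, batches, failed_or_blocked, execute):
    """Run one segment, skipping any task whose dependency did not complete.

    tasks: [{"id": str, "depends_on": str | None}], one dependency per task
    batches: list of lists of task ids, in execution order
    failed_or_blocked: ids that failed or were blocked before this segment
    execute: hypothetical stand-in for agent(); returns a result dict
    """
    depends_on = {task["id"]: task.get("depends_on") for task in tasks}
    bad = set(failed_or_blocked)  # seeded so earlier failures still block dependents
    results = []
    for batch in batches:
        for task_id in batch:
            dep = depends_on.get(task_id)
            if dep in bad:
                # Assumed status for a skipped dependent; the source lists
                # done/failed/blocked as the possible statuses.
                result = {
                    "task_id": task_id,
                    "status": "blocked",
                    "files_written": [],
                    "evidence": f"dependency {dep} did not complete",
                }
            else:
                result = execute(task_id)
            if result["status"] != "done":
                bad.add(task_id)  # propagate to this task's own dependents
            results.append(result)
    return results
```

Because the tracker is seeded from `failed_or_blocked`, a failure recorded in an earlier segment blocks dependents in a later one without the script needing any filesystem access.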
Workflow opt-in note: The Workflow tool requires a sanctioned trigger — a skill whose instructions tell it to call Workflow. This skill IS that sanctioned trigger; calling Workflow({scriptPath, args}) from this skill is the authorized entry point, not "ultracode".
Cross-session resume: Workflow({resumeFromRunId}) is same-session only. The on-disk .claude/execute-plan-checkpoint.json is the sole cross-session source of truth. resumeFromRunId is treated as a same-session cache optimization; if it disagrees with the checkpoint's completed map, the checkpoint wins. Only ids absent from completed are passed to the workflow.
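A minimal sketch of the "checkpoint wins" rule, assuming the checkpoint has already been parsed into a dict. The name `pending_ids` and the `run_cache_done` parameter (ids the same-session `resumeFromRunId` cache believes are finished) are hypothetical.

```python
def pending_ids(all_ids, checkpoint, run_cache_done=frozenset()):
    """Return (pending, stale): pending is derived from the checkpoint only.

    stale lists ids the same-session cache claims are finished but the
    checkpoint does not record; they are still treated as pending.
    """
    completed = checkpoint.get("completed", {})
    stale = sorted(set(run_cache_done) - set(completed))
    pending = [task_id for task_id in all_ids if task_id not in completed]
    return pending, stale
```

The cache never removes an id from `pending`; it is only compared against the checkpoint so a disagreement can be surfaced.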
The plan's final verification task is no longer required — full test suite execution is handled separately by dev-workflow:test-changes.
Process
Step 1: Pre-checks
- Read the plan file
- Verification pre-check: Look for a `## Verification` section with `Verdict: Approved` in the plan file
  - If found: verification is done, continue
  - If not found: invoke `dev-workflow:verify-plan` before proceeding. If verify-plan returns "must-revise", apply revisions and re-verify before continuing
- Task Contract pre-check:
  - If plan frontmatter has `contract_version: 1` or later, every task in the execution range must include `**Task Contract:**`.
  - If any selected task lacks `Task Contract`, collect its id into a `contract_failed` list (do NOT write the checkpoint here: it does not exist until Step 2.3). These ids are recorded as `failed` during the post-segment Record step (Step 2.4d) once the file exists, and are passed in `failed_or_blocked` so their dependents are skipped. Never dispatch a contract-failed task.
  - If `contract_version` is missing, treat the plan as legacy mode: warn once in the execution report and continue.
- Decision Points: If the plan file contains a `## Decisions` section with unresolved decisions (no `**Chosen:**` line), present them before dispatching:
  - For each DP, write a short-form translation in the `question` field: a one-line summary of what the decision controls, plus each option prefixed with its original `A:`/`B:`/`C:` label describing what concretely happens. Do NOT paste the full DP block (Context / Options / Recommendation headings) verbatim. The plan file's DP body stays unchanged.
  - For each `blocking` decision: present via AskUserQuestion (one call per DP).
  - For `recommended` decisions: batch via a single AskUserQuestion; all content inside the `question` field, DPs separated by `\n---\n`, ending with `\n\n全部接受推荐,还是逐个审查?` ("Accept all recommendations, or review them one by one?").
  - If the user does NOT choose to accept all: present each DP individually via separate AskUserQuestion calls. Do not assume any DP is accepted until the user explicitly confirms it.
  - Record user choices: edit the plan file, replace `**Recommendation:**` or `**Recommendation (unverified):**` with `**Chosen:** {Option A | B | C}` using the original label.
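The Task Contract and Decision Points pre-checks above could be approximated with plain text scanning. The sketch below is illustrative only: the helper names are hypothetical, it assumes task headings follow the `### Task N:` form described in Step 2, and it assumes the whole `**Recommendation:**` line (marker plus rationale) is replaced by the `**Chosen:**` line.

```python
import re

TASK_HEADING = re.compile(r"^### Task (\d+(?:-tests|-impl)?):", re.MULTILINE)
RECOMMENDATION = re.compile(
    r"^\*\*Recommendation(?: \(unverified\))?:\*\*.*$", re.MULTILINE
)


def task_bodies(plan_text):
    """Map each canonical task id to the text from its heading to the next one."""
    matches = list(TASK_HEADING.finditer(plan_text))
    bodies = {}
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(plan_text)
        bodies[match.group(1)] = plan_text[match.start():end]
    return bodies


def contract_precheck(plan_text, selected_ids, contract_version):
    """Return (contract_failed, legacy_mode). Never writes the checkpoint."""
    if contract_version is None:
        return [], True  # legacy mode: warn once in the report and continue
    bodies = task_bodies(plan_text)
    failed = [
        task_id for task_id in selected_ids
        if "**Task Contract:**" not in bodies.get(task_id, "")
    ]
    return failed, False


def is_unresolved(dp_block):
    """A decision is unresolved while it has no Chosen line."""
    return "**Chosen:**" not in dp_block


def record_choice(dp_block, label):
    """Replace the first Recommendation line with a Chosen line (assumed form)."""
    chosen = f"**Chosen:** Option {label}"
    updated, count = RECOMMENDATION.subn(lambda _match: chosen, dp_block, count=1)
    if count == 0:
        raise ValueError("no Recommendation line to replace")
    return updated
```

Note that the last task's body runs to the end of the file in this sketch, so a real check would want to stop at the next `##` section.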
Behavior Note: Plan-time test-impl split pattern
Some plans contain task pairs like Task N-tests and Task N-impl — a single conceptual unit split across two tasks. The pair is treated as a single unit for batch grouping by compute_checkpoints.py (pair-keep rule: a batch may run one over batch_size to keep a -tests/-impl pair intact, so no batch boundary — and therefore no hard-stop — lands mid-pair). Each pair member is dispatched to the task-executor agent as a normal sequential task; the second agent reads the now-committed test files as input and is constrained by Task N-impl's Regression shield: to not modify them.
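The pair-keep rule can be illustrated with a small grouping function. This is not the actual `compute_checkpoints.py` logic, only a sketch under the assumption that tasks are grouped in plan order and that a `-tests` task is immediately followed by its `-impl` partner; `group_batches` is a hypothetical name.

```python
def group_batches(task_ids, batch_size):
    """Group ids into batches, allowing one extra slot to keep a pair together."""
    batches, current = [], []
    for task_id in task_ids:
        # True when this id is the "-impl" half of the pair whose "-tests"
        # half was the last task placed in the current batch.
        completes_pair = bool(
            current
            and task_id.endswith("-impl")
            and current[-1] == task_id[: -len("-impl")] + "-tests"
        )
        if len(current) >= batch_size and not completes_pair:
            batches.append(current)
            current = []
        current.append(task_id)
    if current:
        batches.append(current)
    return batches
```

With `batch_size=2`, the ids `["1", "2-tests", "2-impl", "3"]` group as `[["1", "2-tests", "2-impl"], ["3"]]`: the first batch runs one over so that no boundary lands between the pair.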
If you encounter a single task whose Files list includes BOTH test and non-test files (i.e., the plan was hand-written without the split), execute it as-is — do not attempt to split at dispatch time (that would require agent contract changes not implemented in this codebase).
Step 2: Initialize and Dispatch (segmented)
1. Read the plan file and count total tasks (count `### Task N:` / `### Task N-tests:` / `### Task N-impl:` headings; canonical ids are `"<N>"` / `"<N>-tests"` / `"<N>-impl"`).

2. Run `compute_checkpoints.py` against the plan file to obtain the batch plan:

   ```bash
   python3 ${CLAUDE_PLUGIN_ROOT}/skills/execute-plan/scripts/compute_checkpoints.py <plan_file> --k 3 --batch-size 5
   ```

   Parse the JSON output: `{batch_size, total, batches, dependents, hard_stops, tasks}`. The `tasks` array is `[{id, depends_on}]`: the per-task dependency edges the workflow needs for dependency-skip. (Forward it in the args payload; do NOT re-derive deps from the plan by hand.) Note: `depends_on` captures one `**Depends on:**` ref per task (the canonical plan form), not multiple.

3. Initialize / resume checkpoint file at `.claude/execute-plan-checkpoint.json`:
   - If exists and `status` is `in_progress`: resume. Read the `completed` map; only ids absent from `completed` are pending (an entry with any status, not only `done`, is terminal).
     - Plan-edit reconciliation (always run on resume): if `actual_total != state.total`, update `state.total = actual_total` and surface "ℹ️ Plan now has {N} tasks (was {M}); resuming with updated total."
   - If exists and `status` is `complete`: skip execution, proceed to Step 3.
   - If does not exist: create with initial state:

     ```json
     {
       "plan_file": "<plan file path>",
       "total": <N>,
       "batch_size": <from compute_checkpoints.py>,
       "k": <from compute_checkpoints.py>,
       "hard_stops": [<from compute_checkpoints.py>],
       "completed": {},
       "status": "in_progress"
     }
     ```

   Writer discipline (single-writer model): The main agent is the SOLE writer of the checkpoint file (both `completed` and segment metadata) AND `execution-report.md`. The task-executor agents write neither: they only make code changes and return the schema-validated `{task_id, status, files_written, evidence}` result. After each Workflow segment returns, the main agent records `completed` + the report from that results array (Step 2.4 Record). Rationale: per-task agents mutating the shared resume-critical JSON produced inconsistent shapes (a list instead of an `{id:status}` map, losing the failed status); the schema-validated return is the only reliable structured source, so all bookkeeping flows from it.
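The initialize-or-resume decision for the checkpoint file can be sketched as below. The function name, the returned action strings, and the handling of any status other than `in_progress` or `complete` are assumptions made for illustration; the field names follow the initial state shown in the text.

```python
import json
from pathlib import Path


def load_or_init_checkpoint(path, plan_file, actual_total, batch_size, k, hard_stops):
    """Return (state, action, notes); action is 'created', 'resume', 'skip' or 'unspecified'."""
    path = Path(path)
    if not path.exists():
        state = {
            "plan_file": plan_file,
            "total": actual_total,
            "batch_size": batch_size,
            "k": k,
            "hard_stops": list(hard_stops),
            "completed": {},
            "status": "in_progress",
        }
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(state, indent=2))
        return state, "created", []

    state = json.loads(path.read_text())
    if state.get("status") == "complete":
        return state, "skip", []  # proceed straight to Step 3
    if state.get("status") != "in_progress":
        return state, "unspecified", []  # case not defined by the text; left to the caller

    notes = []
    previous_total = state.get("total")
    if previous_total != actual_total:
        # Plan-edit reconciliation: the plan gained or lost tasks since the last run.
        state["total"] = actual_total
        path.write_text(json.dumps(state, indent=2))
        notes.append(
            f"Plan now has {actual_total} tasks (was {previous_total}); "
            "resuming with updated total."
        )
    return state, "resume", notes
```

Only the main agent would call something like this, in keeping with the single-writer rule.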
4. Segment loop: repeat until no pending tasks remain. Pending = ids absent from `completed`. Once an id is in `completed` with ANY status (`done`/`failed`/`blocked`) it is terminal for this run and is never re-dispatched: a failed task is surfaced for the fix pass, not retried in the same loop (that would just fail again). The loop exits when every id is in `completed`.

   a. Build the next segment. Take the pending ids (absent from `completed`), group them into the batches from `compute_checkpoints.py`'s output, and slice up to and including the next hard-stop batch (or the last batch if no more hard-stops). The plan's first batch (batch 0) is always a hard-stop, so the first segment always ends at the first batch. Drop any `contract_failed` ids from the dispatch set (they are recorded as failed in step d, not run).

   b. Invoke the Workflow tool with the segment payload:

      ```js
      Workflow({
        scriptPath: "${CLAUDE_PLUGIN_ROOT}/skills/execute-plan/execute-plan.workflow.js",
        args: {
          plan_file: <plan file path>,
          checkpoint_file: ".claude/execute-plan-checkpoint.json",
          project_root: <project root>,
          tasks: <the `tasks` array from compute_checkpoints.py: [{id, depends_on}]>,
          batches: <sliced segment batches>,
          failed_or_blocked: <ids already in `completed` as failed/blocked, PLUS contract_failed ids>,
        }
      })
      ```

      Await the Workflow call's completion and read its returned results array. (`failed_or_blocked` seeds the workflow's in-run dependency tracker so a dependency that failed in an EARLIER segment, or pre-flight, still blocks its dependents here.)

   c. Spot-check claimed files. For each result, verify every path in `files_written` exists on disk. (The dev-workflow `verify-agent-output.py` hook surfaces mismatches at agent return time; the spot-check here is defense in depth.)

   d. Record `completed` from the return array (authoritative). For every result in the returned array, write `completed[task_id] = {status, files_written, evidence}` to the checkpoint. Also write any `contract_failed` ids as `{status:"failed", evidence:"missing Task Contract"}`. This is the single source of `completed`: the agents wrote nothing, so the array is authoritative and the `{id:status}` shape is guaranteed. (If a resumed checkpoint's `completed` is ever a malformed shape from an older run, normalize it to the `{id:{status}}` map here.)

   e. Write `execution-report.md` from the return array. If the report doesn't exist or lacks a section for this plan, create it with the header (`## Execution Report` / `**Plan:** <path>` / `**Status:** in-progress` / `**Tasks:** …` / `### Task Results`). Append one row per result (`- Task {id}: {title} ✅/❌/⏭️ — {evidence}`) and update this plan's header counters (completed/blocked/failed). All scoped to THIS plan's section (the report is a shared append-log).

   f. Hard-stop gate: if the segment ended on a hard-stop batch AND there are still pending tasks, present the segment summary + verification, then STOP and wait for the user to say "continue" before invoking the next segment. A hard-stop is NOT completion; do NOT emit the "execution complete" signal here.

   g. If no pending tasks remain (every id in `completed`): exit the segment loop and proceed to Terminal-write.

5. Terminal-write (the completion signal; this fixes the run-phase handshake). When the loop exits with every id in `completed`:

   a. Compute terminal status: `complete` if every `completed[id].status == "done"`, else `completed_with_failures`.

   b. Write the checkpoint `status: "<terminal>"` FIRST.

   c. Flip `execution-report.md`'s `**Status:** in-progress` → `**Status:** <terminal>`, scoped to THIS plan's section (locate the `**Plan:** <this plan path>` heading and flip the Status line under it; `execution-report.md` is a shared append-log with one Status line per plan, so never do a global replace).

   d. Cleanup: if terminal is `complete`, delete `.claude/execute-plan-checkpoint.json`. If `completed_with_failures`, KEEP it: it is the cross-session source for the fix pass and for re-running failed tasks. Delete only AFTER the terminal signal is written (b+c), never as the success-detection mechanism itself.

   e. The skill's explicit return is the primary completion signal (run-phase reads it in-context); the report `**Status:**` line is the durable cross-session backup. Resume (Step 2.3, `status: complete` → skip) relies on the backup.

Retrying a failed task across sessions: because pending = "absent from `completed`", a `failed`/`blocked` task is NOT re-run on resume. To retry one after fixing it, the fixer (run-phase Step 7 / standalone Step 3) must clear that id's `completed` entry (or delete the whole checkpoint); otherwise the loop sees it as terminal and exits without retrying.
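The segment-building and recording steps of the loop above can be sketched as three small helpers. The function names are hypothetical, and the sketch assumes `hard_stops` holds 0-based indices into `batches`, which the text suggests ("batch 0") but does not state outright.

```python
def next_segment(batches, hard_stops, completed, contract_failed=()):
    """Slice pending work up to and including the next hard-stop batch."""
    excluded = set(completed) | set(contract_failed)
    segment = []
    for index, batch in enumerate(batches):
        remaining = [task_id for task_id in batch if task_id not in excluded]
        if not remaining:
            continue  # batch already finished (or nothing dispatchable in it)
        segment.append(remaining)
        if index in hard_stops:
            break  # the segment ends on the hard-stop batch
    return segment


def failed_or_blocked(completed, contract_failed=()):
    """Ids that must block their dependents in the next Workflow call."""
    bad = [
        task_id for task_id, entry in completed.items()
        if entry["status"] in ("failed", "blocked")
    ]
    return bad + [task_id for task_id in contract_failed if task_id not in bad]


def record_results(completed, results, contract_failed=()):
    """Write the returned results array into the completed map (main agent only)."""
    for result in results:
        completed[result["task_id"]] = {
            "status": result["status"],
            "files_written": result["files_written"],
            "evidence": result["evidence"],
        }
    for task_id in contract_failed:
        completed[task_id] = {"status": "failed", "evidence": "missing Task Contract"}
    return completed
```

An empty list from `next_segment` corresponds to step g: nothing is pending, so the loop exits to Terminal-write.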
Step 3: Process Results
When execution completes (all tasks have a completed entry):
- Read the report file at `docs/06-plans/execution-report.md`
  - If the agent's return message contains `Report written to:`, use that path
  - If the file does not exist: fall back to parsing the agent's return message
- Present summary to the user: completed/blocked/failed counts
- If blocked or failed tasks exist: list them with reasons
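The summary presented to the user can be derived from the per-task entries. The sketch below assumes those entries are available in the `completed`-map shape used during the segment loop (for example, still held in context); `summarize` is a hypothetical helper.

```python
from collections import Counter


def summarize(completed):
    """Return (counts, problems) for the user-facing summary."""
    tally = Counter(entry["status"] for entry in completed.values())
    counts = {
        "completed": tally["done"],
        "blocked": tally["blocked"],
        "failed": tally["failed"],
    }
    # Blocked and failed tasks are listed with their recorded reason.
    problems = [
        (task_id, entry["status"], entry.get("evidence", ""))
        for task_id, entry in completed.items()
        if entry["status"] != "done"
    ]
    return counts, problems
```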
Standalone mode (not within run-phase):
- If failures: fix in main context (opus)
- Suggest `dev-workflow:test-changes` to run the full test suite
- Suggest `dev-workflow:implementation-reviewer` for plan-vs-code audit
- Suggest `dev-workflow:finish-branch` for branch integration
State Integration
When running within a phase orchestrated by run-phase:
If .claude/dev-workflow-state.json exists and phase_step is execute:
- After the Terminal-write step (every id in `completed`, terminal status written), return an explicit completion signal to run-phase: `Execution complete: <complete | completed_with_failures>` with the report path. `phase_step` is owned by the orchestrator (do NOT mutate it from this skill). run-phase reads this return in-context (it is the same main agent), so this is the primary signal; the report `**Status:**` line is the durable backup. Do NOT rely on the checkpoint `status` for the run-phase handshake: on `complete` the checkpoint is deleted.
- A hard-stop pause is NOT completion: return `Paused at hard-stop: waiting for "continue"` instead, and do NOT emit the completion signal. run-phase must not advance to Step 5 on a pause return.
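The choice between the two return signals can be sketched as a single function. The name `handoff_signal` is hypothetical, and how the report path is attached to the completion line is an assumption; the signal wording itself follows the text.

```python
def handoff_signal(all_ids, completed, report_path):
    """Return the string handed back to run-phase for the current state."""
    if any(task_id not in completed for task_id in all_ids):
        # Tasks are still pending, so this can only be a hard-stop pause.
        return 'Paused at hard-stop: waiting for "continue"'
    all_done = all(completed[task_id]["status"] == "done" for task_id in all_ids)
    terminal = "complete" if all_done else "completed_with_failures"
    # Assumed formatting for "with the report path".
    return f"Execution complete: {terminal} (report: {report_path})"
```

The signal is computed from the in-context `completed` map, not from the checkpoint file, which may already have been deleted on `complete`.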
Completion Criteria
- All plan tasks have a `completed` entry in `.claude/execute-plan-checkpoint.json`
- Execution report reviewed and presented to user
- Checkpoint file cleaned up (deleted) if all `completed` entries are `done`; otherwise retained with `status: "completed_with_failures"`
- When in run-phase context: handoff signal output (only after full completion, never at a hard-stop pause)
- When standalone: wrap-up suggestions presented