name: ~aod-run
description: Full lifecycle orchestrator that chains all 6 AOD stages (Discover, Define, Plan, Build, Deliver, Document) with disk-persisted state for session resilience and governance gates at every boundary. Use this skill when you need to run the full lifecycle, orchestrate stages, resume orchestration, or check orchestration status.
Full Lifecycle Orchestrator Skill
Purpose
Single-command lifecycle orchestrator that chains all 6 AOD stages autonomously, pausing at governance gates for Triad sign-offs and persisting state to disk for session resilience. After Deliver completes, Document runs automatically as Stage 6.
Entry Points:
- Raw idea: "description" → start at Discover
- Issue number: #NNN or NNN → resume from issue's current stage
- Resume: --resume → continue from last state checkpoint
- Status: --status / --status #NNN → read-only display
State File: .aod/run-state.json (atomic write-then-rename via run-state.sh)
Navigation
| Section | Purpose | Location |
|---|---|---|
| Step 1: Route by Mode | Entry point routing | This file |
| Step 2: Core State Machine Loop | Central orchestration logic | This file |
| Plan Substage Tracking | Spec → project_plan → tasks | This file |
| Stage Skill Mapping | Stage-to-skill invocation table | This file |
| Post-Stage Context Extraction | Artifact discovery after each stage | This file |
| Stage Map Display | Visual progress indicators | This file |
| Transition Messages | Stage transition headers | This file |
| GitHub Integration | Label updates after stage completion | This file |
| Error Logging | Error/event capture in state file | This file |
| Governance Gate Detection | Reading approval/rejection from frontmatter | references/governance.md |
| Governance Tier | Light/Standard/Full gate rules | references/governance.md |
| Rejection / Retry / Circuit Breaker / Blocked | Governance result handling | references/governance.md |
| New Idea / Issue / Resume / Status Entry | Mode-specific handlers | references/entry-modes.md |
| Dry-Run Entry | --dry-run preview handler (read-only) | references/dry-run.md |
| Corrupted State / Lifecycle Complete | Error and completion handlers | references/error-recovery.md |
Execution
When this skill is invoked, the command file passes a parsed mode and arguments:
Mode: {idea | issue | resume | status}
Issue: {number or "none"}
Idea: {text or "none"}
DryRun: {true or false}
Step 1: Route by Mode
Read the mode and DryRun flag from the invocation context. Route to the appropriate handler:
DryRun + Status check (first): If DryRun == true AND Mode == status, display "Note: --status is already read-only. --dry-run flag ignored." and route to Status Entry as normal.
DryRun check (second): If DryRun == true (and Mode is NOT status):
MANDATORY: You MUST use the Read tool to load references/dry-run.md before proceeding with dry-run handling. Do NOT rely on memory of prior dry-run content. If the file cannot be read, display an error and STOP.
Follow the Dry-Run Entry instructions from that file. The Dry-Run Entry handler will perform read-only detection and display a preview, then exit without entering the Core Loop.
Mode routing (if DryRun is false):
MANDATORY: You MUST use the Read tool to load references/entry-modes.md before proceeding with any entry mode handler. Do NOT rely on memory of prior entry mode content. If the file cannot be read, display an error and STOP.
| Mode | Handler | Description |
|---|---|---|
| idea | New Idea Entry (in entry-modes.md) | Create initial state, start at Discover |
| issue | Issue Entry (in entry-modes.md) | Read GitHub Issue, create/load state, resume |
| resume | Resume Entry (in entry-modes.md) | Load state file, validate, continue |
| status | Status Entry (in entry-modes.md) | Read-only display, then exit |
After the entry handler sets up state, all modes (except status) converge to the Consent Prompt, then the Core Loop.
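The routing precedence above (DryRun + status first, DryRun second, then Mode) can be sketched as a small POSIX shell function. This is only an illustration: the function name route_handler and the labels it prints are hypothetical and do not exist in the AOD scripts; the real handlers are the sections in references/entry-modes.md and references/dry-run.md.

```shell
# Sketch only: route_handler and the printed labels are illustrative names,
# not part of the AOD scripts.
# usage: route_handler MODE DRY_RUN
route_handler() {
  mode=$1
  dry_run=$2
  if [ "$dry_run" = "true" ] && [ "$mode" = "status" ]; then
    # --status is already read-only, so the dry-run flag is ignored
    echo "status-entry"
  elif [ "$dry_run" = "true" ]; then
    echo "dry-run-entry"
  else
    case "$mode" in
      idea)   echo "new-idea-entry" ;;
      issue)  echo "issue-entry" ;;
      resume) echo "resume-entry" ;;
      status) echo "status-entry" ;;
      *)      echo "unknown mode: $mode" >&2; return 1 ;;
    esac
  fi
}

route_handler resume false
```

Checking the DryRun + status combination first keeps the ordering explicit, so a later reordering of the branches cannot silently send --status into the dry-run handler.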
Step 1b: Autonomous Mode Consent
Before entering the Core Loop, display the autonomous mode consent prompt and capture the user's choice. This is the only human interaction point in the entire autonomous run.
Display:
AOD ORCHESTRATOR — Autonomous Mode
====================================
This will run the full lifecycle autonomously:
Discover → Define → Plan → Build → Deliver → Document
Autonomous mode will:
- Auto-select defaults for all interactive prompts
- Auto-retry governance rejections (up to 3 attempts)
- Halt on circuit breaker or BLOCKED (requires manual fix + resume)
- Split across sessions if the feature is too large
All decisions are logged to run-state.json for post-run review.
Proceed? (Y/n)
Handle response:
- Yes (or empty/default): Set autonomous_mode = true. All stage skill invocations will include --autonomous in args. Continue to Core Loop.
- No: Set autonomous_mode = false. Skills are invoked without --autonomous (interactive mode). Continue to Core Loop.
On resume (--resume): If the state file contains "autonomous_mode": true, skip the consent prompt and continue in autonomous mode. Do not re-ask.
Step 2: Core State Machine Loop
This is the central orchestration logic. It runs after any entry handler has established or loaded state.
Loop algorithm:
1. Read loop context: Use Bash to read loop context via bash -c 'source .aod/scripts/bash/run-state.sh && aod_state_get_loop_context', which returns {stage}|{substage}|{stage_status} (e.g., plan|spec|in_progress). Parse the pipe-delimited result. Do NOT use aod_state_read here; the compound helper extracts only the 3 fields needed for routing.
2. Check completion: If stage_status from step 1 indicates all stages may be complete, verify by checking all 6 stage statuses via bash -c 'source .aod/scripts/bash/run-state.sh && aod_state_get_multi ".stages.discover.status" ".stages.define.status" ".stages.plan.status" ".stages.build.status" ".stages.deliver.status" ".stages.document.status"'. If all show completed, MANDATORY: You MUST use the Read tool to load references/error-recovery.md, then follow the Lifecycle Complete instructions. Do NOT rely on memory of prior error-recovery content. If the file cannot be read, display an error and STOP. Note: If .stages.document.status returns null (legacy 5-stage state file), treat as pending.
3. Determine next stage: Use the current_stage and stage_status from step 1. If status is "completed", advance to the next stage in sequence: discover → define → plan → build → deliver → document
4. Handle Plan substages: If current_stage is plan, use the substage from step 1 and cycle through spec → project_plan → tasks. Only advance past Plan when all 3 substages complete. When advancing between substages, apply context boundary (see Plan Substage Tracking step 3a) to clear previous substage content and retain only approval metadata.
5. Write pre-stage checkpoint: Update state with current_stage status = "in_progress" and current timestamp. Write atomically via bash -c 'source .aod/scripts/bash/run-state.sh && aod_state_write '"'"'<json>'"'"''.
6. Display stage map: Show current progress (see Stage Map Display)
7. Display transition message: Show formatted header for the stage about to execute (see Transition Messages)
7a. Aggressive pre-Build boundary: If the stage about to execute is build, apply an extra-aggressive context boundary before invocation:
   - Summarize all prior stages into a compact metadata block (~500 tokens max):
     - Feature: {feature_id}-{feature_name}
     - Branch: {branch}
     - Spec: {path} (APPROVED by PM)
     - Plan: {path} (APPROVED by PM + Architect)
     - Tasks: {path} (APPROVED by PM + Architect + Team-Lead)
     - Wave count: {N}
   - Clear ALL prior stage content from working context
   - Build skill reads its own context (tasks.md, plan.md, assignments) fresh
   - Display: "[Pre-Build boundary] Prior stages summarized (~500 tokens). Build reads context fresh."
8. Invoke stage skill: Use the Skill tool to invoke the appropriate stage skill (see Stage Skill Mapping). Pass required context (idea text, issue number, artifact paths from prior stages).
9. Detect governance result: After the skill returns, first check the governance cache via bash -c 'source .aod/scripts/bash/run-state.sh && aod_state_get_governance_cache "{artifact}" "{reviewer}"'. If the cache returns a verdict (not "null"), use the cached result. If the cache returns "null", MANDATORY: You MUST use the Read tool to load references/governance.md before proceeding with governance gate detection. Do NOT rely on memory of prior governance content. If the file cannot be read, display an error and STOP. Follow the Governance Gate Detection and Governance Tier instructions from that file. Apply tier-specific rules.

   Parallel Triad Reviews: When a governance gate requires multiple reviewers (e.g., plan.md requires PM + Architect, tasks.md requires PM + Architect + Team-Lead), execute reviews in parallel using multiple Agent tool calls in a single message:
   - Parallel execution: Launch all required reviewers simultaneously via Agent tool calls in one message. Each reviewer runs in its own agent context, preventing cross-contamination.
   - Cache all verdicts: After all reviewers return, cache each verdict via aod_state_cache_governance before evaluating the aggregate result.
   - Aggregate evaluation: After all reviewers complete, evaluate the combined result. If any reviewer returned CHANGES_REQUESTED or BLOCKED, handle per step 10 (use the most severe status).
   - Same checklists and criteria: Parallel execution uses the same reviewers, review prompts, and approval criteria as sequential; only the execution model changes.

   Context note: Parallel reviews are safe with 1M context. Each reviewer runs in an isolated agent context, so there is no cross-reviewer drift. Re-grounding (see below) applies once after all reviewers return, not between each reviewer.
10. Handle result:
   - APPROVED / APPROVED_WITH_CONCERNS: Mark stage completed in state, record artifacts, write checkpoint, continue loop
   - CHANGES_REQUESTED: Follow the Rejection Handling instructions in references/governance.md (re-read if not already loaded). This includes Retry Tracking and Max-Retry Circuit Breaker checks.
   - BLOCKED: Follow the Blocked Handling instructions in references/governance.md (re-read if not already loaded).
   - No governance gate for this stage/tier: Mark completed, continue
11. Write post-stage checkpoint: Update state with completion status, artifacts, governance results, timestamp. Write atomically via bash -c 'source .aod/scripts/bash/run-state.sh && aod_state_write '"'"'<json>'"'"''.
11a. Apply context boundary (all stage transitions): When advancing from one stage to the next:
   - Display: "[Context boundary] Clearing {completed_stage} context"
   - Retain ONLY from the completed stage:
     - status (APPROVED / completed)
     - artifact_paths (list of file paths)
     - feature_id, github_issue, branch
     - governance verdict summary (one line per reviewer)
   - The next stage skill will re-read any artifacts it needs via Read tool
   - Skill file content from the completed stage is NOT carried forward
12. Update GitHub Issue label: If gh available, update stage label (see GitHub Integration)
13. Loop: Return to step 1
Re-grounding policy (context-thrifty): Re-read reference files only when variable-length output has been injected into context since the last read. This prevents template drift without wasting context on redundant reads.
- After governance reviews: If reviewer feedback, rejection details, or override justifications produced significant output (>50 lines), re-read references/governance.md before continuing the loop. With parallel reviews, this applies once after all reviewers return, since each reviewer runs in an isolated agent context.
- After rejection/blocked handling: Re-read references/governance.md before re-entering the loop, since user interaction and error display inject variable-length content.
- After lifecycle complete display: Re-read references/error-recovery.md to ensure the completion template is followed exactly.
- Skip re-grounding when: the previous step produced minimal output (stage map display, short status messages, cache hits). Unnecessary re-reads waste ~4-7K tokens each.
Stage sequence: discover → define → plan (spec → project_plan → tasks) → build → deliver → document
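As a minimal sketch of the stage sequence, the two lookups below encode the stage and Plan-substage order in POSIX shell. The function names next_stage and next_substage, and the "complete" sentinel printed after document, are assumptions made for this example; the actual orchestrator reads and writes this information through run-state.sh.

```shell
# Sketch only: next_stage and next_substage are hypothetical helpers.
# Prints the stage that follows the given one.
next_stage() {
  case "$1" in
    discover) echo "define" ;;
    define)   echo "plan" ;;
    plan)     echo "build" ;;
    build)    echo "deliver" ;;
    deliver)  echo "document" ;;
    document) echo "complete" ;;  # sentinel chosen for this sketch
    *)        echo "unknown stage: $1" >&2; return 1 ;;
  esac
}

# Prints the Plan substage that follows the given one.
# An empty result means all 3 substages are done and Plan can complete.
next_substage() {
  case "$1" in
    spec)         echo "project_plan" ;;
    project_plan) echo "tasks" ;;
    tasks)        echo "" ;;
    *)            echo "unknown substage: $1" >&2; return 1 ;;
  esac
}

next_stage plan
next_substage spec
```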
Exit conditions:
- All stages completed → display lifecycle summary (6/6)
- BLOCKED with no resolution → save state, exit
- User chooses to pause → save state, exit
- Session ends → user resumes with --resume in new session
Plan Substage Tracking
The Plan stage contains 3 substages executed in strict sequence. The orchestrator tracks each substage independently and only advances past Plan when all 3 are complete.
Substage sequence: spec → project_plan → tasks
Algorithm (detailed expansion of Core Loop step 4):
1. When entering Plan stage: If current_substage is null, set it to spec (first substage). Update state: stages.plan.status = "in_progress", stages.plan.started_at = {now}, current_substage = "spec".
2. Determine active substage: Read current_substage from state. Check the substage's status in stages.plan.substages.{substage}.status:
   - If "completed": Advance to next substage in sequence
   - If "pending" or "in_progress": This is the active substage to execute
3. Substage advancement logic:
   - spec completed → set current_substage = "project_plan", set stages.plan.substages.project_plan.status = "in_progress". Apply context boundary (see step 3a).
   - project_plan completed → set current_substage = "tasks", set stages.plan.substages.tasks.status = "in_progress". Apply context boundary (see step 3a).
   - tasks completed → set current_substage = null, mark overall Plan stage as "completed", set stages.plan.completed_at = {now}
3a. Context boundary at substage transitions (FR-013, FR-014, FR-015, FR-016) — retained for context thrift; prevents accumulation even with 1M window:
When advancing from one substage to the next (spec → project_plan, or project_plan → tasks), apply a context boundary to prevent context accumulation:
Step 1 - Display boundary message:
[Context boundary] Clearing {previous_substage} context
Where {previous_substage} is spec or project_plan.
Step 2 - Extract and retain only approval metadata: From the completed substage, retain only:
- status: The approval status (e.g., "APPROVED")
- artifact_path: Path to the artifact (e.g., specs/{NNN}-*/spec.md)
- feature_id: The 3-digit feature ID (e.g., "047")
These values are already persisted in run-state.json under stages.plan.substages.{substage} and do not require re-extraction.
Step 3 - Clear previous substage full content: The previous substage's artifact content (spec.md or plan.md full text) is NOT carried forward. The next substage skill will read its own required context fresh.
Step 4 - On-demand re-read available:
If the next substage explicitly needs details from a prior artifact, it can re-read the artifact using the path stored in metadata. This is NOT automatic — the substage skill must explicitly invoke the Read tool with the artifact path from stages.plan.substages.{prior_substage}.artifacts.
Example boundary output:
[Context boundary] Clearing spec context
Retained metadata: {status: "APPROVED", artifact_path: "specs/047-*/spec.md", feature_id: "047"}
Write substage checkpoint: After each substage transition, write state atomically. This ensures that if the session dies between substages, the orchestrator can resume at the correct substage.
Skill invocation per substage:
- spec substage → invoke aod.spec skill
- project_plan substage → invoke aod.project-plan skill
- tasks substage → invoke aod.tasks skill

Governance per substage: Each substage has its own governance gate. First check the governance cache via aod_state_get_governance_cache (see Core Loop step 9). If the cache returns "null", fall back to reading the substage's artifact frontmatter (load references/governance.md for the detection algorithm):
- spec: Check specs/{NNN}-*/spec.md for PM sign-off
- project_plan: Check specs/{NNN}-*/plan.md for PM + Architect sign-off
- tasks: Check specs/{NNN}-*/tasks.md for PM + Architect + Team-Lead sign-off

Display: When displaying the stage map during Plan, show the active substage:
[>] Plan (spec) — spec substage in progress
[>] Plan (plan) — project_plan substage in progress
[>] Plan (tasks) — tasks substage in progress
Stage Skill Mapping
Each lifecycle stage maps to an existing AOD skill invoked via the Skill tool. The orchestrator delegates all stage work — it never re-implements stage logic.
| Stage | Substage | Skill to Invoke | Skill Tool Name | Arguments to Pass |
|---|---|---|---|---|
| Discover | — | Discovery flow | aod.discover | --autonomous "{idea_text}" (if autonomous_mode) or idea text only |
| Define | — | PRD creation | aod.define | --autonomous "{feature_title}" (if autonomous_mode) or feature title only |
| Plan | spec | Specification | aod.spec | --autonomous (if autonomous_mode) or no args |
| Plan | project_plan | Architecture plan | aod.project-plan | --autonomous (if autonomous_mode) or no args |
| Plan | tasks | Task breakdown | aod.tasks | --autonomous (if autonomous_mode) or no args |
| Build | — | Implementation | aod.build | --orchestrated --autonomous (if autonomous_mode) or --orchestrated only |
| Deliver | — | Delivery retrospective | aod.deliver | --autonomous "FEATURE: {NNN} - {name}" {deliver_flags...} (if autonomous_mode) or "FEATURE: {NNN} - {name}" {deliver_flags...} (interactive). {deliver_flags...} expands the per-feature flags forwarded from /aod.orchestrate via the Task description (e.g. --no-tests="<reason>"). When /aod.run is invoked standalone (not via /aod.orchestrate), {deliver_flags...} is empty. See aod-orchestrate SKILL.md Step 4.7 and Step 7.1.1 step 2.5 for the upstream contract. |
| Document | — | Documentation review | aod.document | --autonomous (if autonomous_mode) or no args |
Invocation pattern: Use the Skill tool with skill: "{skill_name}" and pass arguments as args: "{arguments}".
Context passing between stages (FR-012):
- After Discover completes: extract GitHub Issue number from discovery output; store in state as github_issue
- After Define completes: PRD path is at docs/product/02_PRD/{NNN}-*.md; store in state artifacts
- After Plan:spec completes: spec path is at specs/{NNN}-*/spec.md; store in state artifacts
- After Plan:project_plan completes: plan path is at specs/{NNN}-*/plan.md; store in state artifacts
- After Plan:tasks completes: tasks path is at specs/{NNN}-*/tasks.md; store in state artifacts
- After Build completes: implementation files tracked via tasks.md [X] markers
- After Deliver completes: delivery summary and metrics captured
Argument formatting per stage:
When autonomous_mode == true, prepend --autonomous to all skill args:
- Discover: args: "--autonomous {idea_text}" (flag + raw idea description)
- Define: args: "--autonomous {feature_title}" (flag + feature title/topic)
- Plan stages: args: "--autonomous" (flag only; skills read context from branch)
- Build: args: "--orchestrated --autonomous" (both flags enable orchestrated + autonomous modes)
- Deliver: args: "--autonomous FEATURE: {NNN} - {feature_name} {deliver_flags...}" (flag + feature info + optional per-feature deliver flags forwarded from /aod.orchestrate, e.g. --no-tests="<reason>"). When /aod.run is invoked standalone (not via orchestrator), {deliver_flags...} is empty. See aod-orchestrate SKILL.md Step 4.7 + 7.1.1 step 2.5 for the upstream contract.
- Document: args: "--autonomous" (flag only; skill reads context from branch)

When autonomous_mode == false (interactive), omit --autonomous:
- Discover: args: "{idea_text}"
- Define: args: "{feature_title}"
- Plan stages: no args
- Build: args: "--orchestrated"
- Deliver: args: "FEATURE: {NNN} - {feature_name} {deliver_flags...}" ({deliver_flags...} expansion is empty unless forwarded from /aod.orchestrate; see Stage Skill Mapping table note)
- Document: no args
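The argument rules above can be sketched as one POSIX shell function that assembles the args string for a stage. This is an illustration under stated assumptions: stage_args is a hypothetical helper, the payload parameter stands in for the idea text, feature title, or the already-expanded "FEATURE: ..." string (including any deliver flags), and the real invocation goes through the Skill tool rather than a shell.

```shell
# Sketch only: stage_args is a hypothetical helper, not part of the AOD scripts.
# usage: stage_args STAGE AUTONOMOUS PAYLOAD
#   STAGE      discover|define|spec|project_plan|tasks|build|deliver|document
#   AUTONOMOUS "true" or "false"
#   PAYLOAD    idea text, feature title, or "FEATURE: ..." string (may be empty)
stage_args() {
  stage=$1
  autonomous=$2
  payload=$3
  flag=""
  if [ "$autonomous" = "true" ]; then
    flag="--autonomous"
  fi
  case "$stage" in
    discover|define|deliver)          out="$flag $payload" ;;
    spec|project_plan|tasks|document) out="$flag" ;;
    build)                            out="--orchestrated $flag" ;;  # --orchestrated is always passed
    *) echo "unknown stage: $stage" >&2; return 1 ;;
  esac
  # Trim the single space left behind by an empty flag or payload
  out=${out# }
  out=${out% }
  printf '%s\n' "$out"
}

stage_args build true ""
```

Keeping --orchestrated unconditional for Build mirrors the table: only the --autonomous flag depends on the consent choice.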
Post-Stage Context Extraction
After each stage skill returns, the orchestrator extracts context from the produced artifacts and updates the state file. This ensures subsequent stages receive the correct inputs.
After Discover completes:
- The Discover skill creates a GitHub Issue and outputs the issue number. Read the orchestration output to find the issue number.
- Use Bash to scan for new GitHub Issues: gh issue list --label "stage:discover" --json number,title --limit 5 (if gh available)
- Update state fields:
  - github_issue: Set to the issue number
  - feature_id: Zero-pad the issue number to 3 digits (e.g., 22 → "022")
  - branch: Set to {feature_id}-{feature_name} (e.g., "022-add-dark-mode-toggle")
- Create the feature branch if not already on it: git checkout -b {branch} (or confirm current branch matches)
- Record artifacts: Add the GitHub Issue URL to stages.discover.artifacts
- Write updated state atomically
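A minimal sketch of the feature_id and branch derivation, assuming the issue number is a plain decimal integer without leading zeros and that the feature name is already a branch-safe slug. The helper names feature_id and branch_name are hypothetical.

```shell
# Sketch only: feature_id and branch_name are hypothetical helpers.
# Zero-pad an issue number to at least 3 digits (22 -> 022).
# Numbers above 999 are printed unchanged, not truncated.
feature_id() {
  printf '%03d\n' "$1"
}

# Build the branch name as {feature_id}-{feature_name}.
branch_name() {
  printf '%s-%s\n' "$(feature_id "$1")" "$2"
}

branch_name 22 add-dark-mode-toggle
```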
After Define completes:
- Use Glob to find the PRD: docs/product/02_PRD/{NNN}-*.md where NNN is feature_id
- Record artifacts: Add the PRD path to stages.define.artifacts
- Write updated state atomically
After Plan:spec completes:
- Use Glob to find the spec: specs/{NNN}-*/spec.md
- Record artifacts: Add the spec path to stages.plan.substages.spec.artifacts
- Write updated state atomically
After Plan:project_plan completes:
- Use Glob to find the plan: specs/{NNN}-*/plan.md
- Record artifacts: Add the plan path to stages.plan.substages.project_plan.artifacts
- Write updated state atomically
After Plan:tasks completes:
- Use Glob to find tasks and assignments: specs/{NNN}-*/tasks.md, specs/{NNN}-*/agent-assignments.md
- Record artifacts: Add paths to stages.plan.substages.tasks.artifacts
- Mark overall Plan stage as completed
- Write updated state atomically
After Plan:tasks completes (continued) — Size Estimation Display:
After recording Plan:tasks artifacts and before the Core Loop advances to Build, perform a size estimation:
1. Read agent-assignments.md from specs/{NNN}-*/agent-assignments.md
2. Count the number of waves (sections labeled "Wave 1", "Wave 2", etc.)
3. Apply the heuristic:
   - wave_count <= 3 → session_strategy = "one-shot", estimated_sessions = 1
   - wave_count <= 6 → session_strategy = "cautious", estimated_sessions = 2
   - wave_count > 6 → session_strategy = "multi-session", estimated_sessions = ceil(wave_count / 3)
4. Update state with the estimation: bash -c 'source .aod/scripts/bash/run-state.sh && aod_state_write '"'"'{"session_strategy":"{strategy}","estimated_sessions":{N},"build_progress":{"total_waves":{wave_count},"completed_waves":0,"session_breaks":[]}}'"'"''
5. Display the build estimate:
   --- BUILD ESTIMATE ---
   Waves: {wave_count}
   Estimated sessions: {estimated_sessions}
   {If wave_count <= 3: "Expected to complete in this session."}
   {If wave_count > 3: "May require ~{estimated_sessions} sessions. The orchestrator will auto-break and resume if needed."}
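The size heuristic can be sketched in POSIX shell as follows. The function name estimate_sessions and its "strategy sessions" output format are assumptions for this example; ceil(wave_count / 3) is computed with integer arithmetic as (wave_count + 2) / 3.

```shell
# Sketch only: estimate_sessions is a hypothetical helper.
# Prints "<session_strategy> <estimated_sessions>" for a given wave count.
estimate_sessions() {
  waves=$1
  if [ "$waves" -le 3 ]; then
    echo "one-shot 1"
  elif [ "$waves" -le 6 ]; then
    echo "cautious 2"
  else
    # Integer ceiling of waves / 3
    echo "multi-session $(( (waves + 2) / 3 ))"
  fi
}

estimate_sessions 7
```

Because the branches are checked in order, the second rule effectively covers 4 to 6 waves even though it is written as wave_count <= 6.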
After Build completes:
- Read tasks.md from specs/{NNN}-*/tasks.md
- Count total tasks (lines matching - [ ] or - [X])
- Count completed tasks (lines matching - [X])
- Determine the last completed wave by reading agent-assignments.md and cross-referencing completed tasks
- If completed < total (session break protocol):
  - Parse the AOD_BUILD_RESULT comment from build output
  - Log in error_log with type "build_incomplete" and message including wave progress
  - Update build_progress in state: bash -c 'source .aod/scripts/bash/run-state.sh && aod_state_write '"'"'{"build_progress":{"total_waves":{total},"completed_waves":{N},"session_breaks":[{"session":{session_count},"waves_completed":"{range}","timestamp":"{now}"}]}}'"'"''
  - Log the session break decision to autonomous_decisions: bash -c 'source .aod/scripts/bash/run-state.sh && aod_state_append ".autonomous_decisions" '"'"'{"decision":"session_break","reason":"Build incomplete: {completed}/{total} tasks after wave {N}","timestamp":"{now}"}'"'"''
  - Display the session break message:
    AOD ORCHESTRATOR — Session Break
    ==================================
    Feature: {feature_name} (#{github_issue})
    Build: Wave {N}/{total_waves} complete ({completed_tasks}/{total_tasks} tasks)
    The remaining waves require a new conversation.
    To continue: /aod.run --resume
    The orchestrator will pick up from Wave {N+1} automatically.
    Resume prompt (copy-paste): claude "Resume aod.run for #{github_issue} — {feature_name}. Run /aod.run --resume"
  - STOP: Do NOT continue to Deliver. Exit the Core Loop.
- If completed == total: Mark Build as completed, continue to Deliver
- Record artifacts: Add "tasks.md (all tasks completed)" to stages.build.artifacts
- Write updated state atomically
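The task counting in the first steps can be sketched with grep. This is an illustration only: count_tasks is a hypothetical helper, and it assumes each task sits on its own line starting with optional whitespace followed by "- [ ]" or "- [X]" (uppercase X, as written above).

```shell
# Sketch only: count_tasks is a hypothetical helper.
# usage: count_tasks PATH_TO_TASKS_MD
# Prints "<completed> <total>".
count_tasks() {
  # grep -c prints 0 (and exits non-zero) when nothing matches; only the
  # printed count is used here.
  total=$(grep -c -E '^[[:space:]]*- \[[ X]\]' "$1")
  completed=$(grep -c -E '^[[:space:]]*- \[X\]' "$1")
  echo "$completed $total"
}

# Decide whether the session break protocol applies.
# Returns 0 (true) when completed < total.
build_incomplete() {
  set -- $(count_tasks "$1")
  [ "$1" -lt "$2" ]
}
```

A caller could then branch with something like: if build_incomplete specs/{NNN}-*/tasks.md, run the session break steps; otherwise mark Build as completed.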
After Deliver completes:
1. The deliver stage produces a delivery summary
2. Check for halt record (FR-024, FR-025): before marking Deliver as completed, inspect .aod/state/deliver-{NNN}.halt.json (where NNN is the zero-padded feature_id from state). The /aod.deliver skill writes this file when it halts for review per the three-channel halt protocol. Schema and exit-code taxonomy are documented in specs/139-delivery-verified-not-documented/contracts/halt-record.md.

   Exit-code taxonomy (from halt-record contract §Channel 3):

   | Code | Meaning | /aod.run policy |
   |---|---|---|
   | 0 | Success | Proceed to Document stage |
   | 10 | Halted for review (E2E fail, AC-coverage fail, or abandoned heal) | Halt lifecycle; emit halt record to operator; do NOT advance to Document |
   | 11 | Lockfile conflict (concurrent /aod.deliver live) | Halt lifecycle; log holding PID from lockfile; operator resolves |
   | 12 | Abandoned heal sentinel (crash-recovery) | Halt lifecycle; emit manual-cleanup prompt; do NOT auto-retry |
   | 1-9 | Pre-existing delivery errors | Handle per existing stage_error logic (log, surface to operator) |

   Inspection algorithm:
   - Derive the halt-record path: halt_record_path = ".aod/state/deliver-{NNN}.halt.json"
   - Check existence via Bash: test -f "$halt_record_path" && echo EXISTS
   - If the file does NOT exist: assume Deliver succeeded; proceed to step 3 below (record artifacts, continue to Document).
   - If the file exists, parse it via jq: jq -r '[.reason, .recovery_status, (.heal_pr_url // "null"), (.heal_pr_number // "null"), (.failing_scenarios | tostring), .timestamp] | @tsv' "$halt_record_path"
   - Extract fields: reason, recovery_status, heal_pr_url, heal_pr_number, failing_scenarios, timestamp.
   - Emit human-readable halt summary to the operator (surfaced in stdout via the skill's output):
     =====================================================
     LIFECYCLE HALT — Deliver stage halted for review
     =====================================================
     Feature: {feature_id} — {feature_name}
     Reason: {reason}
     Recovery Status: {recovery_status}
     Heal-PR: {heal_pr_url} (#{heal_pr_number})
     Failing Scenarios:
       - {scenario_1}
       - {scenario_2}
       ...
     Halted At: {timestamp}
     =====================================================
     Next steps:
       - Review the heal-PR for failure context and attempted fixes
       - Fix the underlying failure and merge the heal-PR (requires human approval — no auto-merge per FR-023)
       - Re-run /aod.run --resume to retry Deliver after the fix is merged
   - Mark Deliver as halted in state: bash -c 'source .aod/scripts/bash/run-state.sh && aod_state_write '"'"'{"stages":{"deliver":{"status":"failed","halt_record":{"reason":"{reason}","recovery_status":"{recovery_status}","heal_pr_url":"{heal_pr_url}","heal_pr_number":{heal_pr_number},"failing_scenarios":{failing_scenarios_json},"timestamp":"{timestamp}"}}}}'"'"''
   - Append to error_log (per Error Logging contract) with type: "stage_error" and a summary message.
   - Autonomous mode policy: If autonomous_mode == true, the halt is non-recoverable within the current run and the lifecycle stops here. Do NOT advance to Document. The operator resolves via manual heal-PR review, then re-runs /aod.run --resume to re-invoke the Deliver stage. Emit an additional line: Autonomous mode: lifecycle halted. Manual intervention required before --resume.
   - Interactive mode policy: If autonomous_mode == false, display the halt summary and prompt: Deliver stage halted for review. Options: [Pause / Abort]
     - Pause: Save state, exit cleanly. Operator resumes after fix via /aod.run --resume.
     - Abort: Save state with aborted status, exit.
   - Exit the Core Loop: do NOT advance to Document. The halt is terminal for the current session.
3. Record artifacts: Add "delivery complete" to stages.deliver.artifacts (only when no halt record is present)
4. Write updated state atomically
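The exit-code taxonomy for Deliver maps naturally onto a shell case statement. The sketch below is illustrative only: deliver_policy and the policy labels it prints are hypothetical names, and the behaviour for codes outside the documented set (anything other than 0, 1-9, 10, 11, 12) is an assumption.

```shell
# Sketch only: deliver_policy and its labels are hypothetical.
# usage: deliver_policy EXIT_CODE
deliver_policy() {
  case "$1" in
    0)     echo "proceed-to-document" ;;
    10)    echo "halt-for-review" ;;        # E2E fail, AC-coverage fail, or abandoned heal
    11)    echo "halt-lockfile-conflict" ;; # concurrent /aod.deliver is live
    12)    echo "halt-manual-cleanup" ;;    # abandoned heal sentinel; no auto-retry
    [1-9]) echo "stage-error" ;;            # pre-existing delivery errors
    *)     echo "unknown" ;;                # not covered by the taxonomy (assumption)
  esac
}

deliver_policy 10
```

The [1-9] pattern matches a single character, so the two-digit codes 10 to 12 never fall into the stage-error branch regardless of the order of the cases.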
After Document completes:
- The document stage produces documentation review artifacts (CHANGELOG updates, docstrings, API docs, code simplification commits)
- Record artifacts: Add PR URL and commit SHAs from the document branch merge to stages.document.artifacts
- Write updated state atomically
- MANDATORY: You MUST use the Read tool to load references/error-recovery.md, then follow the Lifecycle Complete instructions. If the file cannot be read, display an error and STOP.
Stage Map Display
Display the stage map after each stage transition to show progress. This is referenced by Core Loop step 6.
Algorithm:
- Read state from .aod/run-state.json
- For each stage in sequence (discover, define, plan, build, deliver, document), determine its display marker:
  - status == "completed" → [x]
  - status == "in_progress" → [>]
  - status == "pending" → [ ]
  - status == "failed" → [!]
- For the Plan stage, append the active substage in parentheses if in progress:
  - If current_substage == "spec" → Plan (spec)
  - If current_substage == "project_plan" → Plan (plan)
  - If current_substage == "tasks" → Plan (tasks)
  - If Plan is completed → Plan
- Display the formatted stage map:
Stage Map:
{marker} Discover {marker} Define {marker} Plan{substage} {marker} Build {marker} Deliver {marker} Document
Examples:
Starting a new lifecycle:
Stage Map:
[>] Discover [ ] Define [ ] Plan [ ] Build [ ] Deliver [ ] Document
After Discover and Define complete, Plan:spec in progress:
Stage Map:
[x] Discover [x] Define [>] Plan (spec) [ ] Build [ ] Deliver [ ] Document
Mid-lifecycle with Build in progress:
Stage Map:
[x] Discover [x] Define [x] Plan [>] Build [ ] Deliver [ ] Document
All stages complete:
Stage Map:
[x] Discover [x] Define [x] Plan [x] Build [x] Deliver [x] Document
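A minimal sketch of the marker and Plan-label lookups used to build the stage map. The helper names stage_marker and plan_label are hypothetical, and the "[?]" fallback for an unrecognized status is an assumption; reading the statuses out of .aod/run-state.json is left out.

```shell
# Sketch only: stage_marker and plan_label are hypothetical helpers.
# Map a stage status to its display marker.
stage_marker() {
  case "$1" in
    completed)   echo "[x]" ;;
    in_progress) echo "[>]" ;;
    pending)     echo "[ ]" ;;
    failed)      echo "[!]" ;;
    *)           echo "[?]" ;;  # unrecognized status (assumption)
  esac
}

# Label for the Plan stage given the active substage (empty when none).
plan_label() {
  case "$1" in
    spec)         echo "Plan (spec)" ;;
    project_plan) echo "Plan (plan)" ;;
    tasks)        echo "Plan (tasks)" ;;
    *)            echo "Plan" ;;
  esac
}

echo "$(stage_marker completed) Define $(stage_marker in_progress) $(plan_label spec)"
```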
Transition Messages
Display a formatted transition header before each stage begins executing. This is referenced by Core Loop step 7.
Algorithm (called by Core Loop step 7, before each stage skill invocation):
1. Read current state: Get current_stage and current_substage from state.
2. Map stage to number and detail: Use the lookup table below:

   | current_stage | current_substage | N | STAGE_NAME | Substage Detail |
   |---|---|---|---|---|
   | discover | null | 1 | DISCOVER | — |
   | define | null | 2 | DEFINE | — |
   | plan | spec | 3 | PLAN | sub-stage 1/3: Feature Specification |
   | plan | project_plan | 3 | PLAN | sub-stage 2/3: Architecture Plan |
   | plan | tasks | 3 | PLAN | sub-stage 3/3: Task Breakdown |
   | build | null | 4 | BUILD | — |
   | deliver | null | 5 | DELIVER | — |
   | document | null | 6 | DOCUMENT | — |

3. Format and display:
   - For non-Plan stages: --- STAGE {N}: {STAGE_NAME} ---
   - For Plan substages: --- STAGE 3: PLAN ({substage detail}) ---
Examples:
--- STAGE 1: DISCOVER ---
--- STAGE 3: PLAN (sub-stage 1/3: Feature Specification) ---
--- STAGE 4: BUILD ---
GitHub Integration
After each stage completes successfully (governance gate passed), update the GitHub Issue's stage:* label to reflect the new current stage. This keeps the GitHub Issue board in sync with the orchestration state (FR-023).
Algorithm (called by Core Loop step 12, after post-stage checkpoint):
1. Check prerequisites:
   - Read `github_issue` from state. If null (no GitHub Issue for this feature), skip entirely.
   - Check if the `gh` CLI is available: `command -v gh >/dev/null 2>&1`. If not, skip silently.
   - Check if `gh` is authenticated: `gh auth status >/dev/null 2>&1`. If not, skip silently.
2. Determine the new stage label: Map the newly completed stage to the next stage in the sequence:

| Completed Stage | Completed Substage | New Label |
|---|---|---|
| discover | - | stage:define |
| define | - | stage:plan |
| plan | spec | stage:plan (still in Plan) |
| plan | project_plan | stage:plan (still in Plan) |
| plan | tasks | stage:build |
| build | - | stage:deliver |
| deliver | - | stage:document |
| document | - | stage:done |

   Note: Plan substage completions (spec, project_plan) do not change the label. The issue stays at `stage:plan` until all 3 substages complete.
3. Update the label: Use the `github-lifecycle.sh` function:
   `bash -c 'source .aod/scripts/bash/github-lifecycle.sh && aod_gh_update_stage {github_issue} {new_stage}'`
4. Handle failures gracefully: If the label update fails, log a warning but do NOT halt orchestration.
   - Display: "Note: GitHub label update skipped ({reason}). Orchestration continues."
5. Backlog refresh: After updating the label:
   `bash .aod/scripts/bash/backlog-regenerate.sh 2>/dev/null || true`
   This is fire-and-forget; failure does not affect orchestration.
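As a rough illustration of the label mapping, the table can be expressed as a dictionary keyed by (completed stage, completed substage). The names `NEXT_LABEL`, `next_stage_label`, and `label_changes` are hypothetical and exist only for this sketch; the real update is performed by `aod_gh_update_stage` in `github-lifecycle.sh`, whose internals are not shown here.

```python
from typing import Optional

# (completed stage, completed substage) -> new label, copied from the table.
NEXT_LABEL = {
    ("discover", None): "stage:define",
    ("define", None): "stage:plan",
    ("plan", "spec"): "stage:plan",
    ("plan", "project_plan"): "stage:plan",
    ("plan", "tasks"): "stage:build",
    ("build", None): "stage:deliver",
    ("deliver", None): "stage:document",
    ("document", None): "stage:done",
}


def next_stage_label(completed_stage: str, completed_substage: Optional[str] = None) -> str:
    """Look up the label the issue should carry after a stage completes."""
    return NEXT_LABEL[(completed_stage, completed_substage)]


def label_changes(completed_stage: str, completed_substage: Optional[str] = None) -> bool:
    """Return False when the issue keeps its current label (Plan:spec, Plan:project_plan)."""
    return next_stage_label(completed_stage, completed_substage) != f"stage:{completed_stage}"
```

With this shape, a caller could skip the `gh` call entirely whenever `label_changes(...)` is false, which matches the note that the first two Plan substages leave the label at `stage:plan`.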
Error Logging
Capture stage errors and significant events in the state file's error_log array for debugging and auditability. Error entries follow Entity 4 schema.
Algorithm (called whenever an error or significant event occurs during orchestration):
- Build error entry:
{
"timestamp": "{current ISO 8601 timestamp}",
"stage": "{current_stage}",
"type": "{error_type}",
"message": "{descriptive error message}",
"recoverable": true
}
Error types (standardized values for the `type` field):

| Type | When Used | Recoverable |
|---|---|---|
| stage_error | A stage skill invocation fails or returns an error | true |
| governance_rejection | A governance gate returns CHANGES_REQUESTED | true |
| governance_blocked | A governance gate returns BLOCKED | true |
| circuit_breaker | Max retries (3) reached on a governance gate | true |
| user_abort | User chose to abort orchestration | true |
| artifact_missing | Artifact recorded in state not found on disk | true |
| state_corruption | State file failed validation | true |
| github_error | GitHub CLI operation failed | true |
| skill_invocation_error | Skill tool invocation returned unexpected result | true |

- Append to state: Use the `aod_state_append` function:
  `bash -c 'source .aod/scripts/bash/run-state.sh && aod_state_append ".error_log" '"'"'{"timestamp":"...","stage":"...","type":"...","message":"...","recoverable":true}'"'"''`

When to log errors:
- Stage skill failure: When a Skill tool invocation produces an error or unexpected output
- Governance gate rejection: Also tracked in `gate_rejections`; log a summary in `error_log`
- Circuit breaker activation: When max retries are reached
- User abort: When the user chooses to abort
- Artifact inconsistency: When resume validation detects missing artifacts
- State corruption: When the state file fails validation
- GitHub errors: When `gh` CLI operations fail
Error entries are append-only: Never remove or modify existing error log entries. The log provides a chronological audit trail.
Additional error type:

| Type | When Used | Recoverable |
|---|---|---|
| build_incomplete | Post-build verification finds incomplete tasks | true |
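To make the append-only behavior concrete, here is a minimal Python sketch of building an entry and appending it with a write-then-rename update. It is an assumption-laden illustration, not a port of `aod_state_append` (whose implementation lives in `run-state.sh` and is not reproduced in this file). The helper names `build_error_entry` and `append_error` are hypothetical.

```python
import json
import os
import tempfile
from datetime import datetime, timezone


def build_error_entry(stage, error_type, message, recoverable=True):
    """Build an entry with the five fields shown in the error entry above."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),  # ISO 8601, UTC
        "stage": stage,
        "type": error_type,
        "message": message,
        "recoverable": recoverable,
    }


def append_error(state_path, entry):
    """Append one entry to error_log without touching existing entries."""
    with open(state_path, encoding="utf-8") as f:
        state = json.load(f)
    state.setdefault("error_log", []).append(entry)

    # Write to a temp file in the same directory, then rename over the
    # original so readers never observe a half-written state file.
    directory = os.path.dirname(os.path.abspath(state_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        json.dump(state, tmp, indent=2)
    os.replace(tmp_path, state_path)
```

Because entries are only ever appended, the order of `error_log` doubles as the chronological audit trail.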
Adaptive Session Management
The orchestrator adapts its session strategy based on feature size. Small features complete in one session; large features automatically split across sessions when context runs out.
Size Estimation Heuristic (executed after Plan:tasks completes, before Build):
wave_count = count of waves in agent-assignments.md
if wave_count <= 3:
session_strategy = "one-shot"
estimated_sessions = 1
elif wave_count <= 6:
session_strategy = "cautious"
estimated_sessions = 2
else:
session_strategy = "multi-session"
estimated_sessions = ceil(wave_count / 3)
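For reference, the same heuristic as a small runnable Python function. How `wave_count` is extracted from `agent-assignments.md` is not specified here, so the sketch takes it as an argument; the function name `estimate_session_strategy` is hypothetical.

```python
import math
from typing import Tuple


def estimate_session_strategy(wave_count: int) -> Tuple[str, int]:
    """Return (session_strategy, estimated_sessions) for a given wave count."""
    if wave_count <= 3:
        return "one-shot", 1
    if wave_count <= 6:
        return "cautious", 2
    # Larger features: roughly three waves per session, rounded up.
    return "multi-session", math.ceil(wave_count / 3)
```

For example, 8 waves fall into the last branch and yield an estimate of `ceil(8 / 3) = 3` sessions.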
Session breaks are reactive, not predictive: Token heuristics don't work because Claude Code doesn't expose token usage to skills. Instead, aod.build runs waves until either all waves complete (AOD_BUILD_RESULT:COMPLETE) or context degrades (AOD_BUILD_RESULT:PARTIAL). Post-build verification in the orchestrator then triggers a session break if tasks remain incomplete.
State fields for session management:
| Field | Type | Description |
|---|---|---|
| session_strategy | string | "one-shot", "cautious", or "multi-session"; set after Plan:tasks |
| estimated_sessions | number | Estimated session count from heuristic |
| build_progress | object | {total_waves, completed_waves, session_breaks[]}; updated after each build run |
| autonomous_decisions | array | Log of automated decisions (session breaks, auto-retries) for post-run review |
Resume-after-break flow: When aod.run --resume loads a state where Build is in_progress, it re-invokes aod.build --orchestrated --autonomous. Build's Step 1.6 detects completed waves via [X] markers and continues from the next incomplete wave. This repeats (recursive session breaks) until all waves complete, then the orchestrator advances to Deliver.
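The post-build decision can be sketched as follows. This assumes tasks are tracked as Markdown checkboxes (`- [X]` done, `- [ ]` open), which is inferred from the `[X]` markers mentioned above rather than from a documented file format. The function names and the returned action strings are hypothetical.

```python
import re
from typing import Tuple

# Matches "- [ ]", "- [x]", or "- [X]" at the start of a line (assumed format).
CHECKBOX = re.compile(r"^[ \t]*[-*][ \t]+\[([ xX])\]", re.MULTILINE)


def task_progress(tasks_markdown: str) -> Tuple[int, int]:
    """Return (completed, total) checkbox counts for a tasks document."""
    marks = CHECKBOX.findall(tasks_markdown)
    completed = sum(1 for mark in marks if mark in ("x", "X"))
    return completed, len(marks)


def next_action(build_result: str, tasks_markdown: str) -> str:
    """Decide whether to advance to Deliver or take a session break."""
    completed, total = task_progress(tasks_markdown)
    if build_result == "AOD_BUILD_RESULT:COMPLETE" and completed == total:
        return "advance_to_deliver"
    # PARTIAL result, or COMPLETE with unchecked tasks: break and resume later.
    return "session_break"
```

Checking the task markers in addition to the build result mirrors the post-build verification step: a build that reports completion while tasks remain unchecked still results in a session break.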