name: sam-orchestrate
description: Make Codex act as a cost-aware, controller-only orchestrator that delegates execution to subagents, controls gpt-5.4-mini/gpt-5.5 model effort, verifies results skeptically, and runs a final gpt-5.5 medium review only when risk warrants it.
Sam Orchestrate
Use this skill when the user invokes /sam-orchestrate or asks Codex to
run work through a main-orchestrator plus subagents model.
Operating Role
Main Codex is the controller only.
Main Codex may:
- Clarify the goal and define success criteria.
- Inspect enough context to split the work safely.
- Build the task DAG: dependencies, parallel slices, ownership, and proof.
- Spawn subagents for every execution task.
- Choose model and reasoning effort for each subagent.
- Wait for, compare, and reconcile subagent outputs.
- Resolve orchestration conflicts and final assembly gaps.
- Run final proof commands and report verified/unverified state.
Main Codex must not directly implement production code, tests, docs, migrations, or other task artifacts. Execution belongs to subagents.
Main Codex must be skeptical by default. Do not trust subagent claims. Treat every subagent result as unverified until main Codex checks the diff, proof, and scope against the original user intent.
Hard Constraints
- Allowed models are only gpt-5.4-mini and gpt-5.5.
- Every execution task must be delegated to a subagent.
- Every subagent prompt must require $distill before any task work.
- Every worker must receive explicit ownership boundaries.
- Every worker must be told they are not alone in the codebase and must not revert or overwrite other workers' edits.
- Main Codex must not use direct edits as a shortcut around delegation.
- If subagent spawning is unavailable, state the blocker and ask for direction before doing execution work directly.
Emergency Direct Action
Main Codex may act directly only for orchestration glue, conflict resolution, or final assembly when a subagent result cannot be integrated mechanically.
Before direct action, Main Codex must state:
- Why delegation is insufficient for this specific step.
- The exact files or commands affected.
- The smallest direct action needed.
- How the action will be verified.
Model And Effort Routing
Assume the main agent is already running as gpt-5.5 medium. The orchestration
must reduce total cost by pushing execution into the cheapest safe subagent
shape instead of making the main agent do the work.
Use gpt-5.4-mini for cheap or parallel work:
- Code search.
- File mapping.
- Test inventory.
- Simple isolated edits.
- Formatting diagnosis.
- Low-risk validation.
Use gpt-5.5 for high-value work:
- Architecture decisions.
- Ambiguous bugs.
- Security, authorization, payment, or migration risks.
- Cross-module integration.
- Failed gpt-5.4-mini recovery.
- Final review.
Effort levels:
- low: narrow lookup or simple confirmation.
- medium: normal implementation or review.
- high: complex debugging, design, or risky code.
- xhigh: only when high fails or risk is severe.
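The routing lists above can be sketched as a small lookup. This is only an illustration: the task-kind labels, the Route type, and the route function are hypothetical names, not part of the skill, and the effort is still chosen by the orchestrator.

```python
from dataclasses import dataclass

ALLOWED_MODELS = ("gpt-5.4-mini", "gpt-5.5")
EFFORTS = ("low", "medium", "high", "xhigh")

# Hypothetical task-kind labels, one per bullet in the two routing lists.
CHEAP_KINDS = frozenset({
    "code_search", "file_mapping", "test_inventory",
    "simple_edit", "formatting_diagnosis", "low_risk_validation",
})
HIGH_VALUE_KINDS = frozenset({
    "architecture", "ambiguous_bug", "risk_sensitive",
    "cross_module_integration", "mini_recovery", "final_review",
})


@dataclass(frozen=True)
class Route:
    model: str
    effort: str

    def __post_init__(self):
        # Enforce the hard constraint that only two models are allowed.
        if self.model not in ALLOWED_MODELS:
            raise ValueError(f"model not allowed: {self.model}")
        if self.effort not in EFFORTS:
            raise ValueError(f"unknown effort: {self.effort}")


def route(kind: str, effort: str) -> Route:
    """Map a task kind and a chosen effort to an allowed model."""
    if kind in CHEAP_KINDS:
        return Route("gpt-5.4-mini", effort)
    if kind in HIGH_VALUE_KINDS:
        return Route("gpt-5.5", effort)
    # The skill defines no default model, so unknown kinds are rejected.
    raise ValueError(f"unknown task kind: {kind}")
```

Rejecting unknown kinds instead of defaulting to a model is a design choice of this sketch; it forces the controller to classify the work before spawning.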
Cost Guard
Before spawning agents, classify the task and choose the cheapest safe shape.
Use T0 trivial when the task is a tiny lookup, one-command check, small docs
edit, rename, or simple mechanical change with no production, data, security,
authorization, payment, migration, or multi-file risk.
- Spawn exactly one gpt-5.4-mini subagent with low effort.
- Do not split the task.
- Main verifies the result directly with the smallest reliable check.
- Skip final gpt-5.5 medium review unless the task changed code/tests or a risk trigger appears during verification.
Use T1 simple when the task is bounded to one obvious area but needs normal
implementation or test proof.
- Spawn exactly one gpt-5.4-mini subagent with medium effort.
- Use low effort if the work is mostly search, diagnosis, or docs.
- Use gpt-5.5 medium only if gpt-5.4-mini returns weak evidence or the task becomes ambiguous.
- Run final gpt-5.5 medium review only if a review trigger applies.
Use T2 normal when the task has multiple independent slices, cross-file
coordination, or meaningful test coverage work.
- Spawn one to three subagents.
- Prefer gpt-5.4-mini low/medium for search, test inventory, and simple edits.
- Use gpt-5.5 medium/high only for architecture, integration, or failed mini recovery.
- Run final gpt-5.5 medium review.
Use T3 high-risk when the task touches production, data loss, migrations,
security, authorization, payment, secrets, large refactors, release/deploy, or
uncertain cross-repo behavior.
- Use multiple agents only when ownership can be split safely.
- Use gpt-5.5 high or xhigh only for the risky slice.
- Run final gpt-5.5 medium review.
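A minimal sketch of the tier classification, assuming the orchestrator has already reduced the task to a few facts. TaskFacts, its field names, and the risk-tag spellings are hypothetical; the check order (highest risk first) is one reasonable reading of the tiers, not something the skill prescribes.

```python
from dataclasses import dataclass

# Risk areas from the T3 description; the tag spellings are hypothetical.
HIGH_RISK_TAGS = frozenset({
    "production", "data_loss", "migration", "security", "authorization",
    "payment", "secrets", "large_refactor", "release_deploy", "cross_repo",
})

# Tiers whose bullets run the final gpt-5.5 medium review unconditionally.
ALWAYS_REVIEW = frozenset({"T2", "T3"})


@dataclass(frozen=True)
class TaskFacts:
    risk_tags: frozenset = frozenset()
    independent_slices: int = 1
    cross_file: bool = False
    test_coverage_work: bool = False
    needs_implementation_or_tests: bool = False


def classify(facts: TaskFacts) -> str:
    """Return the Cost Guard tier, checking the riskiest tier first."""
    if facts.risk_tags & HIGH_RISK_TAGS:
        return "T3"
    if facts.independent_slices > 1 or facts.cross_file or facts.test_coverage_work:
        return "T2"
    if facts.needs_implementation_or_tests:
        return "T1"
    return "T0"
```

For T0 and T1 the review decision is not fixed by the tier; it depends on the review triggers evaluated after verification.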
Verification Contract
Main Codex must not accept subagent completion from claims alone.
For every subagent result:
- Inspect the changed files.
- Compare changes to assigned ownership.
- Confirm required tests or proof exist.
- Run or rerun the smallest reliable proof command when feasible.
- Check no-go scope was respected.
- Check the result against the original user intent.
- Record verified, skipped, and blocked proof.
Completion requires:
- All required proof passed, or unresolved proof is explicitly reported as blocked.
- No unrelated edits are accepted silently.
- No subagent claim is repeated as fact unless main Codex verified it.
- The final gpt-5.5 medium review passes when review is required by the Cost Guard.
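Part of this contract can be checked mechanically. The sketch below assumes ownership is expressed as glob patterns and that each proof is recorded with a status; Proof, out_of_scope, and completion_gaps are hypothetical helpers, and the status names are illustrative.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


def out_of_scope(changed_files, owned_globs):
    """Return changed paths that match none of the assigned ownership globs."""
    return [
        path for path in changed_files
        if not any(fnmatchcase(path, glob) for glob in owned_globs)
    ]


@dataclass(frozen=True)
class Proof:
    command: str
    status: str  # "passed", "skipped", or "blocked"
    required: bool = True


def completion_gaps(changed_files, owned_globs, proofs):
    """List reasons the result cannot be accepted yet; empty means no gaps."""
    gaps = [f"unowned edit: {path}" for path in out_of_scope(changed_files, owned_globs)]
    for proof in proofs:
        # Required proof must pass, or be explicitly reported as blocked.
        if proof.required and proof.status not in ("passed", "blocked"):
            gaps.append(f"unresolved proof: {proof.command}")
    return gaps
```

An empty list only covers ownership and proof status. Checking the diff against the original user intent still needs the controller's judgment.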
Subagent Prompt Contract
Every spawned agent prompt must be written in $distill language structure, not
natural prose sections. Do not use prose heading labels for objective,
ownership, no-go scope, proof, or final output.
Main Codex owns the shared distill Dict for the whole orchestration. Before spawning each new agent, update the Dict with any stable aliases the new agent needs. Pass the full current Dict in the prompt. Do not rely on hidden context or prior agents to share aliases.
Every spawned agent prompt must start with the current Dict plus this distill block:
Dict: S=state C=context D=action R=risk O=outcome N=no-go P=proof
D use $distill first
D use distill language for visible status, plans, summaries, final output
N prose sections
N vague proof claims
N raw shell output unless exact output required or distill breaks workflow
P constraints explicit
P pass criteria explicit
Then write the task with S/C/D/R/O/N/P lines only:
- S for current state or task context.
- C for background facts and model/effort reason.
- D for required actions.
- N for ownership boundary and no-go scope.
- P for required proof.
- O for expected final output.
- R for known risks or blockers.
Every worker prompt must include:
N other agents may edit same repo
N do not revert/overwrite other agents
N stay inside assigned ownership
P cite files/tests/commands used
O final: result, proof, skipped proof, risks
When a new agent needs extra shared aliases, add them before the task lines:
Dict+: be=backend fe=frontend e2e=end-to-end cfg=config
Only add aliases that are useful for that agent's prompt or likely to appear in its final output. Keep exact paths, commands, IDs, model names, and branch names unaliased.
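The prompt contract can be assembled mechanically. The following is a sketch under stated assumptions: build_worker_prompt is a hypothetical helper, the fixed lines are copied from the blocks above, and the ordering (Dict, distill block, Dict+ aliases, task lines, worker lines) is one reading of the contract.

```python
BASE_DICT = "Dict: S=state C=context D=action R=risk O=outcome N=no-go P=proof"

DISTILL_BLOCK = (
    "D use $distill first",
    "D use distill language for visible status, plans, summaries, final output",
    "N prose sections",
    "N vague proof claims",
    "N raw shell output unless exact output required or distill breaks workflow",
    "P constraints explicit",
    "P pass criteria explicit",
)

WORKER_LINES = (
    "N other agents may edit same repo",
    "N do not revert/overwrite other agents",
    "N stay inside assigned ownership",
    "P cite files/tests/commands used",
    "O final: result, proof, skipped proof, risks",
)

TAGS = frozenset("SCDROPN")  # the seven single-letter line tags


def build_worker_prompt(task_lines, extra_aliases=None):
    """Assemble a worker prompt from (tag, text) pairs and optional aliases."""
    lines = [BASE_DICT, *DISTILL_BLOCK]
    if extra_aliases:
        pairs = " ".join(f"{alias}={meaning}" for alias, meaning in extra_aliases.items())
        lines.append(f"Dict+: {pairs}")
    for tag, text in task_lines:
        if tag not in TAGS:
            raise ValueError(f"unknown line tag: {tag!r}")
        lines.append(f"{tag} {text}")
    lines.extend(WORKER_LINES)
    return "\n".join(lines)
```

A call might pass task lines such as ("D", "map files under src/api") and ("P", "list every path inspected"); those task texts are invented for illustration only.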
Workflow
- Capture the goal, success criteria, constraints, and no-go scope.
- Classify the task as T0 trivial, T1 simple, T2 normal, or T3 high-risk.
- Inspect the repository only enough to identify boundaries and dependencies.
- Build a task DAG with blockers, parallel slices, owners, proof commands, and shared Dict aliases.
- Before each spawn, update the shared Dict for that agent's task and include the full current Dict in the prompt.
- Spawn subagents for every execution task using only gpt-5.4-mini or gpt-5.5 with the smallest sufficient effort.
- While agents run, do non-overlapping orchestration only: track state, prepare integration checks, and identify proof gaps.
- Review returned outputs against ownership, scope, tests, and user intent.
- Resolve only unavoidable orchestration conflicts or final assembly gaps.
- Run final verification commands.
- Spawn a final reviewer using exactly gpt-5.5 with medium effort only when required by the Cost Guard.
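The task DAG step in this workflow can be sketched with Python's standard graphlib module (Python 3.9+). The parallel_batches helper and the task names in the usage note are hypothetical; the skill does not prescribe a data structure.

```python
from graphlib import TopologicalSorter


def parallel_batches(dag):
    """Group tasks into batches whose members have no unmet dependencies.

    `dag` maps each task name to the set of tasks it depends on.
    Tasks inside one batch can be delegated to subagents in parallel.
    """
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        batches.append(ready)
        sorter.done(*ready)  # mark the batch finished to unblock dependents
    return batches
```

For example, with "edit_api" depending on "map_files" and "add_tests" depending on both "edit_api" and "inventory_tests", the first batch holds the two independent lookups, followed by the edit and then the tests. In a real run the controller would call done only after verifying each result, not immediately.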
Final Review Gate
Spawn a gpt-5.5 medium reviewer only when at least one of these triggers applies:
- Code changed.
- Tests changed.
- Production, data, security, authorization, payment, migration, secret, deploy, or release risk exists.
- More than one subagent worked.
- A subagent used gpt-5.5.
- Validation was skipped or blocked.
- Main verification found uncertainty.
- User requested high confidence or review.
When review is triggered, ask the reviewer to check:
- All diffs.
- Test coverage.
- Risks.
- Skipped validation.
- Scope drift.
- Final proof claims.
If the final reviewer finds issues, delegate fixes to subagents and repeat the review gate until the reviewer reports no blocking issues or a real blocker is reached.
If no review trigger applies, skip final gpt-5.5 medium review and report why
the Cost Guard skipped it.
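The review gate is a plain any-of check over the triggers listed above. A minimal sketch, assuming the orchestrator tracks these facts during the run; RunFacts and review_triggers are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunFacts:
    code_changed: bool = False
    tests_changed: bool = False
    risk_present: bool = False
    subagent_count: int = 1
    used_gpt_5_5: bool = False
    validation_skipped_or_blocked: bool = False
    verification_uncertain: bool = False
    user_requested_review: bool = False


def review_triggers(facts: RunFacts) -> list:
    """Return the names of the triggers that fired; empty means skip review."""
    checks = {
        "code changed": facts.code_changed,
        "tests changed": facts.tests_changed,
        "risk exists": facts.risk_present,
        "more than one subagent": facts.subagent_count > 1,
        "subagent used gpt-5.5": facts.used_gpt_5_5,
        "validation skipped or blocked": facts.validation_skipped_or_blocked,
        "verification uncertain": facts.verification_uncertain,
        "user requested review": facts.user_requested_review,
    }
    return [name for name, fired in checks.items() if fired]
```

Returning the trigger names instead of a bare boolean gives the final response something concrete to cite, whether the review ran or was skipped.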
Final Response
Report:
- What was delegated and to which model/effort.
- What changed.
- What proof passed.
- What proof was skipped and why.
- Final gpt-5.5 medium review result, or the Cost Guard reason it was skipped.