name: skill-maker
description: >
End-to-end orchestrator for creating, reviewing, improving, optimizing, and packaging Claude
skills and coordinated skill teams. Use when the user says "create a skill", "make a skill",
"build me a skill", "write a skill", "run the skill pipeline", "review this skill", "harden
this skill", "take this skill to 100", "ship this skill", "make it production-ready", "create
and review a skill", "iterate this skill to perfection", or describes a desired skill behavior
without naming skill-maker. Also use when admiral delegates skill or team creation. Routes
all drafting, evals, fixes, rubric scoring, and packaging to specialists; do not use for
general code review, architecture, or non-skill authoring tasks.
version: 1.0.0
Skill Maker
Single entry point for the full skill creation, adversarial review, and iterative improvement lifecycle. Delegates all substantive work to two specialists — never modifies skill output directly. Skill-maker can be invoked standalone or as a delegated sub-orchestrator inside admiral (the SupremeTeam pipeline orchestrator).
"Orchestrate, delegate, gate. The orchestrator routes work and enforces the quality loop. It never writes skill content or scores rubric dimensions — that is the specialists' job."
Entry Routing
Skill-maker is a component of the Admiral delivery pipeline; admiral (the SupremeTeam
pipeline orchestrator) is the primary entry orchestrator (see ../routing-doctrine.md —
gates which orchestrator owns a given request and prevents double-routing). Before starting Stage 0, run the
active-handoff check — a handoff is present when the prompt carries a ### Save Context
block, an active run lock / session_pin: true exists under skillset-saves/, or admiral
explicitly delegates a skill/team-creation request (the ## Admiral Integration contract
below).
- Handoff present → proceed with the pipeline; this is a delegated Admiral run.
- No handoff (cold/direct invocation for a create/review/improve request) → do not run
standalone. Start
admiralfirst (its Create-skill / Create-team mode), let it run intake and persistence, then accept the delegation back. This is the loop guard: Admiral delegates with the handoff signal, so a routed call proceeds immediately and never re-bootstraps Admiral.
Pipeline overview
User Request
│
▼
┌─────────────┐
│ Stage 0 │ Intake — classify mode, gather context
│ INTAKE │
└──────┬──────┘
│
▼
┌─────────────┐
│ Stage 1 │ Delegate to skill-creator (Create mode)
│ CREATE │ → draft SKILL.md + supporting files
└──────┬──────┘
│
▼
┌─────────────┐
│ Stage 2 │ Delegate to skill-reviewer
│ REVIEW │ → scorecard + findings report
└──────┬──────┘
│
▼
score = 100? ──yes──→ Stage 4 (Optimize)
│
no
│
▼
┌─────────────┐
│ Stage 3 │ Delegate to skill-creator (Improve mode)
│ IMPROVE │ → apply reviewer findings
└──────┬──────┘
│
└──→ Loop back to Stage 2
(max 5 cycles)
▼
┌─────────────┐
│ Stage 4 │ Delegate to skill-creator (Optimize mode)
│ OPTIMIZE │ → description trigger accuracy
└──────┬──────┘
│
▼
┌─────────────┐
│ Stage 5 │ Delegate to skill-creator (Package mode)
│ PACKAGE & │ → .skill file + delivery report
│ DELIVER │
└─────────────┘
Stage 0 — Intake
Before classifying, run the ../grill-me-doctrine.md intake interview — gates entry into
the pipeline by ensuring the skill's purpose, trigger language, and success criteria are
understood before any drafting begins — one question at a time, always recommending an
answer, and exploring existing skills and code instead of asking when the answer is
discoverable. If the doctrine file is absent (standalone install), run a brief inline
intake interview covering purpose, trigger language, and success criteria instead.
Classify the user's request into one of four entry modes:
| Mode | Entry condition | Starts at |
|---|---|---|
| Full pipeline | "Create a skill", "make a skill", "take this to 100" | Stage 1 |
| Review-only | "Review this skill", "score my skill", "audit this" | Stage 2 (skip 1) |
| Improve-only | "Fix these findings", "improve this skill" | Stage 3 (skip 1-2) |
| Optimize-only | "Optimize the description", "fix triggering" | Stage 4 (skip 1-3) |
For full pipeline and improve-only, confirm the user's intent and constraints before proceeding. For review-only, just need the skill path. For optimize-only, need the skill path and optionally existing eval queries.
If the user provides an existing skill path, verify the directory exists and contains a SKILL.md before proceeding.
Stage 1 — Create
Delegate to skill-creator in Create mode.
Handoff includes:
- User intent and constraints from intake
- Entry mode (new skill vs. harden existing)
- Path to existing skill (if hardening)
- Any example files or context the user provided
Expected return:
- Draft SKILL.md with frontmatter
- Supporting files (references/, scripts/, agents/, examples/ as needed)
- Behavioral eval results (if Track A evals were run)
- Skill directory path
On return: Verify the skill directory exists and SKILL.md is present. If skill-creator reports blockers, surface them to the user and re-delegate with clarifications. Proceed to Stage 2.
Stage 2 — Review
Delegate to skill-reviewer.
Handoff includes:
- Skill directory path
- Iteration number (1 for first review, N+1 for subsequent)
- Previous scorecard (if iteration > 1, for delta tracking)
- Behavioral eval results (if available from Stage 1 or Stage 3)
Expected return:
- Scorecard (10 dimensions, each scored with evidence)
- Findings list (F-01, F-02, ... with severity, location, fix instructions)
- Verdict: SHIP (100/100) | ITERATE (< 100) | BLOCKED (critical findings)
- Iteration history (if iteration > 1)
On return:
- If SHIP → proceed to Stage 4 (Optimize)
- If ITERATE → proceed to Stage 3 (Improve)
- If BLOCKED → surface critical findings to user, get guidance, then either proceed to Stage 3 or abort
Present the scorecard and key findings to the user between stages. The user should see progress at every iteration boundary.
Stage 3 — Improve
Delegate to skill-creator in Improve mode.
Handoff includes:
- Skill directory path
- Full findings list from reviewer (prioritized: critical → major → minor)
- Current scorecard with scores per dimension
- Iteration number
- Any user guidance or overrides ("skip finding F-04", "prioritize security")
Expected return:
- Updated skill files
- Summary of changes made (which findings addressed, which deferred)
- Updated behavioral eval results (if re-run)
On return: Proceed to Stage 2 (Review) for the next scoring pass.
Stage 4 — Optimize
Delegate to skill-creator in Optimize mode.
Handoff includes:
- Skill directory path
- Current description (confirmed at 100/100 by reviewer)
- Existing eval queries (if any from previous stages)
Expected return:
- Optimized description
- Trigger eval results (before/after accuracy)
best_descriptionselected by test-set score
On return: If the optimized description differs materially from the reviewed one, optionally run a quick re-review (Stage 2) to confirm the score holds. Otherwise proceed to Stage 5.
Stage 5 — Package & Deliver
Delegate to skill-creator in Package mode.
Handoff includes:
- Skill directory path
Expected return:
.skillfile path- Package contents summary
Final delivery to user:
Compile the delivery report using references/delivery-template.md and present it
with the packaged skill. Include the full iteration history, final scorecard, and
changes summary.
Quality gate management
Review-improve loop rules
| Rule | Value |
|---|---|
| Max review-improve cycles | 5 |
| Plateau detection | If score unchanged for 2 consecutive iterations, escalate |
| Plateau escalation | Present findings to user with: "Score plateaued at X/100. The remaining findings may need your input. Options: (a) override and ship, (b) provide guidance on specific findings, (c) abort." |
| Critical finding policy | Any finding with severity "critical" blocks shipping. User can override with explicit acknowledgment. |
| Score threshold for optimization | 100/100 (description optimization only runs after perfect score) |
What the orchestrator never does
- Never modifies skill files directly — route all changes through skill-creator.
- Never scores rubric dimensions — route all scoring through skill-reviewer.
- Never overrides a reviewer verdict. If the reviewer says ITERATE, the orchestrator iterates — unless the user explicitly overrides.
- Never invents findings. Passes reviewer output to creator verbatim.
User visibility
Present to the user at every stage boundary:
- Current score and delta from previous iteration
- Key findings (critical and major only for brevity)
- What happens next and estimated remaining stages
- Option to override, skip stages, or abort
Adaptive behavior
Partial pipeline support
Users can enter at any stage and exit early:
- "Just create, don't review" → Run Stage 0-1, skip 2-5
- "Just review" → Run Stage 0 + 2, skip 1/3/4/5
- "Review and improve but don't package" → Run Stage 0 + 2-3 loop, skip 4-5
- "Skip description optimization" → Run Stage 0-3 loop + 5, skip 4
Honor explicit user requests to skip stages. If the user says "ship it" during the review loop, present the current score and confirm before skipping remaining iterations.
Handling user feedback mid-loop
The user may provide feedback at any stage boundary:
- Redirect: "Actually, the skill should also handle X" → add to constraints, re-delegate to creator in Improve mode with the new requirement
- Override: "F-04 is not a real issue, skip it" → note the override, exclude from future improve handoffs, tell reviewer to not re-flag
- Abort: "Stop, this isn't working" → present current state, offer to save progress, clean exit
Error recovery
| Error | Response |
|---|---|
| Specialist returns incomplete output | Re-delegate with explicit note about what is missing |
| Specialist cannot access skill files | Verify path, ask user to confirm location |
| Review score regresses (lower than previous) | Flag regression to user, include both scorecards, ask whether to continue or revert |
| Max iterations reached without 100/100 | Present final state, offer: (a) ship at current score, (b) continue manually, (c) start over |
| User provides conflicting instructions | Surface the conflict, ask for clarification, do not guess |
Team Creation Protocol
When the request is to create a coordinated team of skills (e.g. "create a team of skills", "build me a pipeline of skills") rather than a single skill:
- Capture the team purpose, pipeline stages, specialist roles, and interaction patterns.
- Generate the orchestrator skill first — it defines the delegation surface and stage model.
- Generate each specialist skill with inputs/outputs shaped for the orchestrator's handoff contracts.
- Generate the gatekeeper skill with verdict vocabulary and evidence standards matching the orchestrator's boundary rules.
- Run each generated skill through the standard review-improve loop (Stages 2-3).
- Package all skills as a coordinated team with a team manifest describing relationships.
Admiral Integration
When invoked by Admiral (or another parent orchestrator) as a utility orchestrator:
- Admiral provides the skill intent or team description as input.
- Skill-maker runs its full pipeline autonomously and returns the delivery package.
- Verdict mapping for Admiral's handoff protocol:
SHIP→APPROVED,ITERATE→REVISE,BLOCKED→ESCALATE. - Admiral's
gatekeeper-admiralvalidates the skill-maker output at the cross-pipeline boundary. - Maximum revision cycles when called by Admiral: 2 (per Admiral's standard handoff rules), distinct from the internal max of 5 review-improve cycles.
Save Protocol
When a ### Save Context block is included in the delegation prompt with
Persistence active: yes:
- Write deliverables (reports, evidence bundles, review packets) to the save path specified in the Save Context block.
- Use filenames that match the deliverable type, such as
deliverable_{name}.md,report_{name}.md, orreview-packet.md. - Do not create or manage
_phase-state.md— that is the delegating orchestrator's responsibility.
When Save Context is absent or Persistence active: no, skip all save operations and
deliver output inline as usual.
Reference files
| File | Purpose | When to read |
|---|---|---|
../grill-me-doctrine.md |
Binding intake interview protocol | At Stage 0, before classifying the request |
references/workflow-protocol.md |
State machine, transitions, resume protocol | Before starting any pipeline run |
references/handoff-templates.md |
Delegation templates for all 5 handoff types | Before each delegation |
references/delivery-template.md |
Final delivery report format | At Stage 5 |
references/skill-guide.md |
Canonical skill authoring guide (shared) | When user asks about skill structure |
intake-brief.yaml |
Trigger set, inputs, outputs, and acceptance contract | Confirming the pipeline's intake surface |
stub-contract.md |
Stage model, handoff rules, quality and delivery contract | Confirming stage boundaries and verdict vocabulary |
agent/agent-manifest.yaml |
Agent-mode delegation capabilities and fallback behavior | When invoked as a sub-orchestrator (e.g. by Admiral) |
skill-creator/SKILL.md |
Creation specialist capabilities and modes | Understanding what creator can do |
skill-reviewer/SKILL.md |
Review specialist capabilities and phases | Understanding what reviewer reports |
Bundled Support Surface
The skill-creator/ subtree is intentionally bundled with skill-maker because the
orchestrator delegates every substantive create, improve, optimize, eval, and package
operation there.
| Path | Purpose | Owner |
|---|---|---|
skill-creator/agents/grader.md |
Grades assertion results against skill outputs. | skill-creator eval mode |
skill-creator/agents/comparator.md |
Runs blind A/B comparisons for current vs. baseline outputs. | skill-creator eval mode |
skill-creator/agents/analyzer.md |
Explains why one output beat another and proposes improvement themes. | skill-creator improve mode |
skill-creator/assets/eval_review.html |
Static query-review template for trigger eval approval. | skill-creator optimize mode |
skill-creator/eval-viewer/ |
Generates and serves reviewable eval-result reports. | skill-creator eval mode |
skill-creator/scripts/ |
Validation, eval, benchmark aggregation, description optimization, reporting, and packaging utilities. | skill-creator modes |
Treat generated eval workspaces, packaged archives, benchmark outputs, and review reports as run artifacts. Keep them outside this skill directory unless a script explicitly writes a temporary file that is excluded from packaging.