name: skill-optimizer description: SkillOpt-flavored offline training loop for any SKILL.md. Treats accumulated learn-rule corrections as training trajectories, proposes bounded patches via an optimizer LLM, gates each candidate against a held-out validation set built from the user's own past corrections, and ships only candidates that demonstrably improve the score. Inspired by Microsoft SkillOpt's ReflACT pipeline (rollout → reflect → aggregate → select → update → evaluate) adapted to pro-workflow's SQLite store. Use when a skill has accumulated 8+ learn-rule rows and the user wants the skill itself to get better, not just longer.
Skill Optimizer
Train an existing SKILL.md the way a deep-learning optimizer trains weights: via rollouts, gradient-like reflections, validation-gated acceptance. No model retraining; only the skill markdown changes.
When to use
Use this skill when:
- A pro-workflow skill has accumulated 8+ learn-rule rows for it
- The user reports the skill is "getting bloated" or "rules keep being repeated"
- The user wants offline, budget-capped improvement over multiple sessions
Do not use when:
- Skill has fewer than 8 trajectories (nothing to learn from)
- The user wants real-time edits (this is offline, single-shot)
- No
ANTHROPIC_API_KEY(or equivalent provider key) is available
Architecture (mirrors SkillOpt's six-stage loop)
rollout pull recent learnings from SQLite (existing learn-rule rows)
reflect optimizer LLM analyzes a minibatch, proposes add/delete/replace patches
aggregate vote-merge patches across minibatches
select clip by LR budget (default: 3 adds, 2 deletes, 3 replaces per step)
update apply selected patches to a candidate skill content
evaluate evaluator LLM scores candidate against held-out validation items
gate accept candidate only if weighted score >= current + acceptThreshold
slow update at epoch boundary, consolidate accepted edits into a coherent rewrite
Failed candidates are stored in a rejection buffer and fed back to the next reflect step so the optimizer doesn't propose the same patch twice.
Run it
/skill-optimize <slug> [options]
Options (all optional; sensible defaults shown):
| Flag | Default | Notes |
|---|---|---|
--epochs N |
3 | Outer loop count |
--batch-size N |
8 | Trajectories per minibatch |
--minibatches N |
2 | Minibatches per epoch |
--holdout N |
6 | Validation items reserved (max ~25% of trajectories) |
--budget-usd X |
0.50 | Hard cap; loop aborts when spent |
--optimizer-model M |
claude-sonnet-4-6 |
Reflect + slow-update model |
--evaluator-model M |
claude-haiku-4-5-20251001 |
Gate model (cheaper) |
--max-adds N |
3 | LR budget per step |
--max-deletes N |
2 | |
--max-replaces N |
3 | |
--accept-threshold X |
0.0 | Minimum score delta to accept candidate |
--max-skill-tokens N |
2000 | Hard cap on candidate length |
--slow-every N |
2 | Epochs between consolidation passes |
--json |
off | Machine-readable output |
Kill switch: touch ~/.pro-workflow/STOP aborts the loop between steps.
Output
- Candidate accepted → SKILL.md overwritten, hash stamp appended in HTML comment
- Run details persist in
optimization_runs,optimization_candidates,optimization_patches,optimization_rejections - Validation set persists in
optimization_validation(reusable across runs)
Inspect after:
sqlite3 ~/.pro-workflow/data.db "SELECT id, skill_slug, initial_score, best_score, accepted_steps, rejected_steps, spent_usd FROM optimization_runs ORDER BY id DESC LIMIT 5"
Rules
- Validation set is frozen at run start. Never re-derive from new corrections mid-run.
- One candidate per step. No parallel branches.
- Slow-update output is itself a candidate; it must pass the gate to replace the best.
- The optimizer LLM and evaluator LLM may be different models. Mixing a strong optimizer with a cheap evaluator is the SkillOpt-recommended config.
- If
spent_usd >= budget_usdat any step boundary, the loop ends withstopped_reason="budget exhausted". - Patches whose anchor is no longer present in the skill (because a prior patch in the same step removed it) are recorded as rejected with reason
anchor_missing.
Provenance
Inspired by Microsoft SkillOpt (arXiv:2605.23904). The six-stage rollout/reflect/aggregate/select/update/evaluate pipeline, LR budget, rejection buffer, and slow / meta update mechanics are adapted to pro-workflow's existing SQLite + learn-rule data plane. No SkillOpt code is reused. "ReflACT" is not a SkillOpt term and is not used here; the loop is referred to by stage names only.