skill-optimizer

star 2.3k

SkillOpt-flavored offline training loop for any SKILL.md. Treats accumulated learn-rule corrections as training trajectories, proposes bounded patches via an optimizer LLM, gates each candidate against a held-out validation set built from the user's own past corrections, and ships only candidates that demonstrably improve the score. Inspired by Microsoft SkillOpt's ReflACT pipeline (rollout → reflect → aggregate → select → update → evaluate) adapted to pro-workflow's SQLite store. Use when a skill has accumulated 8+ learn-rule rows and the user wants the skill itself to get better, not just longer.

rohitg00 By rohitg00 schedule Updated 6/3/2026

name: skill-optimizer description: SkillOpt-flavored offline training loop for any SKILL.md. Treats accumulated learn-rule corrections as training trajectories, proposes bounded patches via an optimizer LLM, gates each candidate against a held-out validation set built from the user's own past corrections, and ships only candidates that demonstrably improve the score. Inspired by Microsoft SkillOpt's ReflACT pipeline (rollout → reflect → aggregate → select → update → evaluate) adapted to pro-workflow's SQLite store. Use when a skill has accumulated 8+ learn-rule rows and the user wants the skill itself to get better, not just longer.

Skill Optimizer

Train an existing SKILL.md the way a deep-learning optimizer trains weights: via rollouts, gradient-like reflections, validation-gated acceptance. No model retraining; only the skill markdown changes.

When to use

Use this skill when:

  • A pro-workflow skill has accumulated 8+ learn-rule rows for it
  • The user reports the skill is "getting bloated" or "rules keep being repeated"
  • The user wants offline, budget-capped improvement over multiple sessions

Do not use when:

  • Skill has fewer than 8 trajectories (nothing to learn from)
  • The user wants real-time edits (this is offline, single-shot)
  • No ANTHROPIC_API_KEY (or equivalent provider key) is available

Architecture (mirrors SkillOpt's six-stage loop)

rollout      pull recent learnings from SQLite (existing learn-rule rows)
reflect      optimizer LLM analyzes a minibatch, proposes add/delete/replace patches
aggregate    vote-merge patches across minibatches
select       clip by LR budget (default: 3 adds, 2 deletes, 3 replaces per step)
update       apply selected patches to a candidate skill content
evaluate     evaluator LLM scores candidate against held-out validation items
gate         accept candidate only if weighted score >= current + acceptThreshold
slow update  at epoch boundary, consolidate accepted edits into a coherent rewrite

Failed candidates are stored in a rejection buffer and fed back to the next reflect step so the optimizer doesn't propose the same patch twice.

Run it

/skill-optimize <slug> [options]

Options (all optional; sensible defaults shown):

Flag Default Notes
--epochs N 3 Outer loop count
--batch-size N 8 Trajectories per minibatch
--minibatches N 2 Minibatches per epoch
--holdout N 6 Validation items reserved (max ~25% of trajectories)
--budget-usd X 0.50 Hard cap; loop aborts when spent
--optimizer-model M claude-sonnet-4-6 Reflect + slow-update model
--evaluator-model M claude-haiku-4-5-20251001 Gate model (cheaper)
--max-adds N 3 LR budget per step
--max-deletes N 2
--max-replaces N 3
--accept-threshold X 0.0 Minimum score delta to accept candidate
--max-skill-tokens N 2000 Hard cap on candidate length
--slow-every N 2 Epochs between consolidation passes
--json off Machine-readable output

Kill switch: touch ~/.pro-workflow/STOP aborts the loop between steps.

Output

  • Candidate accepted → SKILL.md overwritten, hash stamp appended in HTML comment
  • Run details persist in optimization_runs, optimization_candidates, optimization_patches, optimization_rejections
  • Validation set persists in optimization_validation (reusable across runs)

Inspect after:

sqlite3 ~/.pro-workflow/data.db "SELECT id, skill_slug, initial_score, best_score, accepted_steps, rejected_steps, spent_usd FROM optimization_runs ORDER BY id DESC LIMIT 5"

Rules

  • Validation set is frozen at run start. Never re-derive from new corrections mid-run.
  • One candidate per step. No parallel branches.
  • Slow-update output is itself a candidate; it must pass the gate to replace the best.
  • The optimizer LLM and evaluator LLM may be different models. Mixing a strong optimizer with a cheap evaluator is the SkillOpt-recommended config.
  • If spent_usd >= budget_usd at any step boundary, the loop ends with stopped_reason="budget exhausted".
  • Patches whose anchor is no longer present in the skill (because a prior patch in the same step removed it) are recorded as rejected with reason anchor_missing.

Provenance

Inspired by Microsoft SkillOpt (arXiv:2605.23904). The six-stage rollout/reflect/aggregate/select/update/evaluate pipeline, LR budget, rejection buffer, and slow / meta update mechanics are adapted to pro-workflow's existing SQLite + learn-rule data plane. No SkillOpt code is reused. "ReflACT" is not a SkillOpt term and is not used here; the loop is referred to by stage names only.

Install via CLI
npx skills add https://github.com/rohitg00/pro-workflow --skill skill-optimizer
Repository Details
star Stars 2,284
call_split Forks 224
navigation Branch main
article Path SKILL.md
More from Creator