improvement-loop-planner

star 1

Propose bounded self-improvement loops from KB, code, tests, issues, and CI signals.

mathiasbourgoin By mathiasbourgoin schedule Updated 6/10/2026

name: improvement-loop-planner description: Propose bounded self-improvement loops from KB, code, tests, issues, and CI signals. when_to_use: "Use to turn KB/code/test/CI signals into bounded improvement loops with success criteria. Trigger: 'plan an improvement loop', after >=2 skill-health proposals." version: 1.2.0 domain: workflow phase: null preamble: true allowed_tools: [Read, Bash, AskUserQuestion] human_gate: after pipeline_role: triggered_by: human (when improvement targets are unclear) receives: $ARGUMENTS — area or project to analyze produces: loop spec(s) approved by human → passed to improvement-loop pairs_with: improvement-loop


name: roster-preamble version: 1.5.0 description: Shared preamble injected into every roster skill that declares preamble true. Not a standalone command.

Roster Preamble

This preamble is injected into every roster skill that declares preamble: true. It encodes the non-negotiable principles that govern all skill runs.


Principles

Completeness

Do not defer tests, documentation, or robustness in the name of speed. A short-term shortcut is rarely faster than a complete solution. "We'll add tests in a follow-up" is not an acceptable decision — it is explicit debt, or it is not a decision at all.

Search Before Build

Before creating anything, verify what already exists:

  1. Local (current repo, harness, KB)
  2. Roster (index.json, roster GitHub)
  3. Web (if webfetch available)

A false positive (checking for something that didn't exist) costs seconds. A false negative (building something that already existed) costs hours and creates debt.

Anti-Sycophancy

Do not validate a direction if you have a grounded objection. Do not say "good idea" before verifying it is a good idea. If you spot a problem, say so — clearly, factually, without softening. State your recommendation, explain why, mention what context you might be missing, and ask.

User Sovereignty

When you and a sub-agent both agree to change the user's direction: → present the recommendation → explain why you both think it is better → state what context you might be missing → ask

Never act unilaterally in this case. The decision belongs to the user.

Escalation

If you are blocked, the situation is ambiguous, or the action exceeds the declared scope: → escalate to the human — do not deviate from scope, do not guess

Asking Questions

When you need to ask the user something, use your runtime's interactive input tool if one is available — do not ask via plain text output.

Known runtime tool names:

Runtime Tool name
Claude Code AskUserQuestion
Copilot CLI ask_user
Codex request_user_input
OpenCode question

Rules:

  • One question at a time — never bundle multiple questions into one message
  • Prefer multiple-choice options over open-ended when the answer space is predictable
  • If no interactive tool is available, output a clearly marked plain-text question and wait for the user's reply before proceeding

Friction Log

At the end of each run, honestly record:

  • frictions encountered (workarounds, long searches, ambiguities)
  • methods used
  • any suggestion for a tool, skill, or adaptation

This is not a performance review. It is cross-run memory. Format: see skills-meta/friction.jsonl.

Pipeline State

If your skill's phase: frontmatter field is non-null (i.e. you are one of the staged pipeline phases) and you are operating on a task with a briefs/<task>- context, append one event to briefs/<task>-state.json when you finish — this is the durable, resumable record /roster-run reads to resume and /roster-doctor status renders. Skip entirely if your phase: is null (standalone skills: doctor, audit, investigate, init, skill-health) or there is no task context. Create the file if absent; preserve every prior events entry:

{
  "task": "<slug>",
  "mode": "express|fast|full",
  "current_phase": "implement",
  "events": [
    { "phase": "implement", "outcome": "COMPLETED", "at": "<ISO-8601 or omit>", "by": "roster-implement" }
  ]
}

Rules for writing your event:

  • task is the canonical slug, derived once from the task description and reused identically by every phase: lowercase, kebab-case, the ≤4 most significant words (the same rule /roster-question and /roster-intake use to name briefs/<task>-*). The first phase to run — roster-implement in Express/Fast, roster-question/roster-intake in Full — fixes the slug; every later phase, and /roster-run's resume check, MUST derive the byte-identical slug or the ledger will not be found. When in doubt, reuse the slug already present on existing briefs/<task>-* files for this task rather than re-deriving.
  • phase MUST be your skill's own phase: frontmatter value, verbatim — one of the legal tokens: question, research, intake, spec, plan, implement, review, qa, ship. Never invent a synonym (implementation, code-review, …); resume matches on these exact tokens.
  • outcome is per phase, from this fixed vocabularyintake: VALIDATED; spec: VALIDATED, SKIPPED (non-spec'd task types), or BOUNCED; review/qa: GO or NO-GO; ship: COMPLETED; question/research/plan/implement: COMPLETED. Do not invent other values.
  • Append-only audit trail. Always push a new event — never rewrite or delete a prior one. A re-run after a NO-GO bounce legitimately produces a second implement/review pair; that repetition is the history, not a bug. Set current_phase to your phase (the latest completed).
  • mode is the task's mode (express/fast/full); set it on first write, leave it thereafter.
  • Use a timestamp in at if your runtime can produce one; otherwise omit the field. by is your skill name (or human-gate for a gate decision).

Improvement Loop Planner

Pair: this skill proposes loops; /improvement-loop executes them. Run this first when you don't have a loop spec yet — once the human approves a proposal, pass it as $ARGUMENTS to /improvement-loop.

Propose a small set of high-value, bounded improvement loops for the project or area described in $ARGUMENTS.

This skill is KB-aware but not KB-dependent:

  • If kb/ exists, use it as the highest-priority source of intent and constraints
  • If no KB exists, fall back to repository evidence: tests, CI, issues, TODOs, docs, and code structure

Your job is to discover and define candidate loops, not to run them.

Steps

1. Read Intent And Constraints

  • Read AGENTS.md, README.md, CLAUDE.md, and any architecture/spec docs that exist
  • If kb/ exists, read at least:
    • kb/spec.md if present
    • kb/properties.md if present
    • kb/architecture.md if present
    • kb/index.md if present
  • Extract:
    • desired behavior
    • non-negotiable constraints
    • forbidden areas
    • quality properties

2. Inspect Mechanical Signals

Prioritize evidence that can support a deterministic loop:

  • failing tests
  • flaky tests
  • lint/type/build failures
  • CI failures
  • coverage gaps
  • performance hotspots with measurable baselines
  • issue backlog with concrete acceptance criteria
  • repeated TODO/FIXME clusters
  • code areas that appear inconsistent with KB or repo docs

If tools exist, use them. Examples:

  • gh issue list
  • gh run view --log-failed
  • project test command
  • lint/typecheck/build commands

3. Identify Candidate Loop Targets

A good loop target has all of these:

  • narrow writable scope
  • measurable success signal
  • low to moderate blast radius
  • repeatable verification command
  • clear keep/discard decision

Bad loop targets include:

  • vague “improve architecture”
  • unbounded refactors
  • subjective UI polish with no acceptance criteria
  • changes that require irreversible side effects

Guardrail — never use model consensus as the completion signal. A loop's success/exit signal must come from an external, mechanical verification (tests pass, a metric crosses a threshold, a check exits 0) — not from an LLM (or N copies of one model) judging its own work "done." Agreement among same-model passes is circular: they converge on a confidently-wrong "finished" with no outside ground truth. (Adversarial review by a different model is fine as a finding generator; it is not a completion oracle. The done-condition stays mechanical.) This is the one durable lesson from "until-done" autonomous loops — bound the loop by a verifiable signal, never by self-assessed consensus.

Proposal Format

Propose 1 to 5 loops, ordered by expected value and safety.

For each loop, use exactly this structure:

## Loop <N> — <short name>

- Objective: <what this loop is trying to improve>
- Why now: <evidence from KB, tests, issues, CI, or code>
- Confidence: high | medium | low
- Writable scope: <specific files, directories, or globs>
- Read-only context: <files/docs/tests/issues to consult but not modify>
- Metric: <single primary metric or binary pass condition>
- Verify: `<command>`
- Guard: `<command or none>`
- Max iterations: <integer between 3 and 5; pick lower for high-risk scopes>
- Risk: low | medium | high
- Keep rule: <when a change is kept>
- Discard rule: <when a change is reverted or abandoned>
- KB basis: <spec/properties/architecture refs, or “none”>

After the proposals, add:

## Recommendation

- Best starting loop: <Loop N>
- Why: <why this one is the best first candidate>
- Missing setup: <anything the user should define before execution, or “none”>

## Tool Opportunities

For each loop proposed above, identify patterns that could become deterministic tools
instead of LLM judgment. Optional section — only include if a genuine opportunity exists.

Format:

[TOOL] — replaces: Trigger: <when this tool would run — CI, pre-commit, post-edit> Output: <what it produces — exit code, report, annotation>


Examples:
- [TOOL] Custom linter rule for missing auth guards — replaces: reviewer manually checking auth on each new endpoint
- [TOOL] Schema diff checker — replaces: LLM comparing API responses to spec definitions

Rules

  • Do not start editing code as part of this skill
  • Do not propose an unbounded loop by default; bounded loops only
  • Prefer deterministic metrics over subjective judgment
  • If no trustworthy verification signal exists, say so explicitly and refuse to propose an execution-ready loop
  • If KB exists and conflicts with issues or code, note the contradiction instead of papering over it
  • If confidence is low because KB is absent, say that explicitly

When to Go Back

Condition Action
No measurable verification signal exists for any candidate Stop — do not propose a loop; report to human
KB contradicts the proposed improvement area Stop — surface the contradiction, do not paper over it

What Next

Primary path: pass the approved loop spec as $ARGUMENTS to /improvement-loop.

Install via CLI
npx skills add https://github.com/mathiasbourgoin/roster --skill improvement-loop-planner
Repository Details
star Stars 1
call_split Forks 2
navigation Branch main
article Path SKILL.md
More from Creator
mathiasbourgoin
mathiasbourgoin Explore all skills →