name: distill
description: "One-time snapshot extracting patterns from work history and accumulated lessons, distills into concrete improvements — new agent/skill suggestions, roster quality review, memory pruning, consolidating lessons into rules/agent updates, or performing bin/ extraction from /audit --efficiency candidates."
argument-hint: '[review | prune | lessons | executables [] | "external " | ""] [--eager]'
disable-model-invocation: true
allowed-tools: Read, Edit, Bash, Glob, Grep, Write, AskUserQuestion, Agent, WebFetch, TaskCreate, TaskUpdate, TaskList
effort: low
Analyze how Claude Code is used and surface concrete improvements — new agents/skills to reduce repetition, or consolidate lessons into governance files (rules, agent instructions, skill updates) — without duplicating what exists.
NOT for single-file edits or quality checks — use /foundry:audit for config quality checks.
NOT for audit-only scan for extraction candidates (use /foundry:audit --efficiency instead of distill executables for detection-only).
- $ARGUMENTS: optional. Modes:
- Omitted — analyze existing patterns and agents; generate suggestions proactively.
prune [--eager]— evaluate project memory file for stale, redundant, or verbose entries. Default: advisory diff + apply prompt.--eager: score every entry (Usage likelihood × Impact → Tier P0/P1/P2), print full scored table with#column, let user select by tier or item numbers, delegate edits tofoundry:curator.lessons [--eager]— read.notes/lessons.mdand memory feedback files, distill recurring patterns into proposed rule files, agent instruction updates, and skill workflow changes.--eager: include Pattern count, Strength, and Tier columns in proposal table; let user select clusters to promote by tier or item numbers; delegate writes tofoundry:curator.review [--eager]— review existing agent/skill roster for quality and gaps without suggesting new additions.--eager: lower overlap flag threshold from >50% to >30% scope coverage; surface any shared single capability between agents as boundary issue; add "Sharpen Boundary" section to output.external <source> [--eager]— analyse external plugin, skill, or agentic resource and produce structured adoption proposal.<source>is URL, file path, or local directory.--eager: lower adoption bar — recommend partial adoption even for single useful components.executables [--eager] [<run-dir-or-report-path>]— perform bin/ extraction from/foundry:audit --efficiencyCheck 33 candidates. Auto-detects latest run dir under.reports/audit/; pass optional path to target a specific run dir or report file. Runs inline Check 33 scan when no report exists. Default gates on HIGH/MEDIUM verdict.--eager: also surface LOW verdict clusters as extraction candidates. Spawnsfoundry:sw-engineerper cluster. Skip to Mode: Executables Extraction below.[--eager] <recurring task description>— use description as context when generating suggestions.--eager: lower frequency threshold from 3+ to 2+ occurrences; single high-effort occurrence also qualifies.
Task hygiene:
# audit-skip: resilience-replication
_FS=$(python "${CLAUDE_PLUGIN_ROOT:-plugins/foundry}/bin/resolve_shared_path.py" foundry skills/_shared 2>/dev/null || echo "plugins/foundry/skills/_shared") # timeout: 5000
Read $_FS/task-hygiene.md — follow task hygiene protocol.
EAGER=false
[[ "$ARGUMENTS" == *"--eager"* ]] && EAGER=true
ARGUMENTS=$(echo "$ARGUMENTS" | sed 's/--eager//g' | xargs) # timeout: 3000
echo "EAGER=$EAGER" # shell vars don't persist across Bash calls — read from stdout
echo "ARGUMENTS_STRIPPED=$ARGUMENTS"
Note:
EAGERand strippedARGUMENTSare set by this Bash block, but shell variable state does not persist across separate Bash() tool calls. After this block runs, read its stdout (EAGER=true/false,ARGUMENTS_STRIPPED=...) and carry those values as model-context references for all subsequent mode dispatch and threshold decisions. Do not rely on$EAGERas a live shell variable in later steps — substitute the literal boolean value read from stdout.
Step 1: Inventory existing agents and skills
Use Glob tool to enumerate agents and skills across all sources — project-local AND plugin-namespaced — to avoid false-gap findings when candidate already exists in plugin:
- Project-local: pattern
agents/*.md, path.claude/; patternskills/*/SKILL.md, path.claude/ - Plugin source (workspace): pattern
*/agents/*.md, pathplugins/; pattern*/skills/*/SKILL.md, pathplugins/ - Installed plugin cache (if accessible): resolve cache root —
PLUGIN_CACHE="${CLAUDE_PLUGIN_ROOT:-plugins/foundry}"— then use Glob tool on$PLUGIN_CACHEfor pattern*/agents/*.mdand*/skills/*/SKILL.md
For each agent/skill found, extract: name, description, tools, purpose. Tag each entry with plugin namespace (e.g. foundry:sw-engineer, oss:resolve) — used in Step 3 gap analysis to prevent recommending duplicates of plugin-namespaced agents/skills.
Step 2: Analyze work patterns
Mode-token normalization — all mode dispatches below compare against the first whitespace-delimited token of the stripped
ARGUMENTS(after--eagerremoval). Use this single rule consistently; do not rely on exact equality of the full$ARGUMENTSstring, since trailing flags/spaces from prior parsing may differ.
If first token equals executables (i.e. executables alone or executables <path>, NOT a path or word that merely starts with the string executables): skip Steps 2–5 entirely and go to "Mode: Executables Extraction" below.
If first token equals prune: skip Steps 2–5 entirely and go to "Mode: Memory Pruning" below.
If first token equals lessons: skip Steps 2–5 entirely and go to "Mode: Lessons Distillation" below.
If first token equals external (i.e. external <source>, NOT a word that merely starts with the string external): skip Steps 2–5 entirely and go to "Mode: External Distillation" below.
If $ARGUMENTS is review: skip git analysis below and go directly to Step 3 (Gap analysis). Use agent/skill descriptions from Step 1 as sole input — goal is to assess quality and coverage of existing roster, not look for new patterns in recent work. In Step 5, suppress all "Recommend: New Agent/Skill" sections and output only "Existing Coverage", "Recommend: Enhance Existing", and "No Action Needed" entries. With --eager: apply stricter overlap detection in Step 4 (threshold drops to >30%; any shared single named capability flags as boundary issue); add "Recommend: Sharpen Boundary" section to Step 5 output listing all partial-overlap pairs with specific capability to split.
Otherwise, look for signals of repetitive or specialist work. First three git commands are independent — run in parallel:
# timeout: 3000
# --- run these three in parallel ---
# Recent git history — what kinds of changes are common?
git log --oneline -50
# What file types are being worked on?
git log --name-only --pretty="" -30 | sort | uniq -c | sort -rn | head -20
# Commit message patterns — what verbs appear most?
git log --oneline -100 | cut -d' ' -f2 | sort | uniq -c | sort -rn | head -15
Then use Glob tool (pattern todo_*.md, path .plans/active/) to list active task files; read each with Read tool. Also read .notes/lessons.md (if exists) for task history and conversation hints.
If $ARGUMENTS provided, use as additional context for pattern analysis.
Frequency Heuristics
- 3+ occurrences of pattern in recent history → candidate for automation
- 2+ different projects using same manual process → cross-project skill
- significant manual effort per occurrence (subjective — use git history context) → high-value automation target
- Domain-specific knowledge required → candidate for specialist agent (not just skill)
With --eager (lower thresholds):
- 2+ occurrences → candidate for automation
- 1 occurrence with significant manual effort → qualifies as high-value candidate
- Domain-specific threshold unchanged
Step 3: Gap analysis
reviewmode: focus on agent/skill quality and coverage gaps — skip "Recommend: New Agent/Skill" analysis and focus on "Existing Coverage" and "Recommend: Enhance Existing".
For each identified pattern, check:
- Already covered? — search existing agent/skill descriptions for overlap
- Frequent enough? — recurring ≥ 3 times or clearly domain-specialized (See Step 2 heuristics — combine ≥3 occurrences with effort/frequency signals from Steps 1–2)
- Would specialist add quality? — does it require deep domain knowledge?
- Too narrow? — single-use task doesn't warrant persistent agent
Thresholds for recommendation:
- New agent: recurring specialist role, complex decision-making, 5+ distinct capabilities
- New skill: workflow orchestration, multi-step process with fixed structure
- No new file needed: one-off or already covered by existing agent
Step 4: Check for duplication
reviewmode: duplication checks still apply — review mode does not skip this step.
Before recommending anything, run overlap check and anti-pattern checklist:
For each candidate agent/skill:
- Does any existing agent cover >50% of its scope? → enhance existing instead
(with --eager: lower to >30%; any shared single named capability → flag as boundary issue)
- Is the name/description confusingly similar to an existing one? → rename existing
Anti-pattern checklist — reject candidate if any apply:
- Role vs task confusion: agents are roles, not tasks. Do not create agent for every different topic.
- Near-duplicate: candidate duplicates existing agent with slightly different name. Enhance existing instead.
- Thin wrapper: candidate skill just calls one agent with fixed args. Not enough value to justify new skill file. Exception: skills that add measure-first/measure-after bookends, multi-mode dispatch across 3+ agents, or safety breaks (retry limits, validation gates) justify wrapper even if only one agent executes for given invocation.
Step 5: Report
## Agent/Skill Suggestions
### Existing Coverage (no gaps found)
- [agent/skill]: covers [pattern] well — no new file needed
### Recommend: New Agent — [name]
**Trigger**: [what recurring pattern or gap justifies this]
**Gap**: [what existing agents don't cover]
**Scope**: [what it would do — 3-5 bullet points]
**Suggested tools**: [Read, Write, Edit, Bash, etc.]
**Draft description**: "[one-line description for frontmatter]"
### Recommend: New Skill — [name]
**Trigger**: [what repetitive workflow justifies this]
**Gap**: [why existing skills don't cover it]
**Scope**: [what workflow steps it would orchestrate]
**Draft description**: "[one-line description for frontmatter]"
### Recommend: Enhance Existing — [agent/skill name]
**Add**: [specific capability missing from current version]
**Why**: [what recurring task would benefit]
### No Action Needed
[pattern]: already handled by [existing agent/skill]
## Confidence
**Score**: [0.N]
**Gaps**: [e.g., git history too shallow, task files not present, descriptions too generic to compare]
**Refinements**: N passes. [Pass 1: <what improved>. Pass 2: <what improved>.] — omit if 0 passes
Mode: Memory Pruning — only when $ARGUMENTS == "prune"
Locate, evaluate, and trim project memory file.
Find memory file:
# timeout: 5000 — uses canonical resolve_memory_dir.py (aligned with modes/lessons.md)
MEMORY_DIR=$(python "${CLAUDE_PLUGIN_ROOT:-plugins/foundry}/bin/resolve_memory_dir.py" 2>/dev/null)
MEMORY_FILE="$MEMORY_DIR/MEMORY.md"
if [ -n "$MEMORY_DIR" ] && [ -f "$MEMORY_FILE" ]; then
echo "PRUNE_FOUND"
echo "PRUNE_FOUND_PATH: $MEMORY_FILE"
else
echo "PRUNE_ABORT"
echo "PRUNE_ABORT_REASON: no memory file at $MEMORY_FILE — skipping prune mode"
fi
Short-circuit: exit 0 inside this bash block would terminate only the bash subprocess, not the surrounding skill — so without the explicit gate below the skill would continue into the prune-evaluation steps with no memory file to operate on. After the block above runs, scan bash output for a line where the entire line content is exactly PRUNE_ABORT (use exact-line match: [[ "$line" == "PRUNE_ABORT" ]], not substring match). If present, stop the prune mode entirely: skip every remaining prune step (read, evaluate, P1–P3, summary) and end the response with the Confidence block. Otherwise, extract the memory file path from the PRUNE_FOUND_PATH: <path> output line for use in subsequent Read calls. The remaining prune-mode prose below assumes PRUNE_FOUND was in output.
Read memory file with Read tool. Also read .claude/CLAUDE.md to identify overlap — anything already covered in CLAUDE.md need not live in memory.
If $EAGER == true — skip P1–P3 below; execute P-eager steps:
P-eager-1: Score every MEMORY.md section against two dimensions:
- Usage likelihood: High = needed every session · Moderate = occasional · Low = rare/one-off
- Impact if missing: High = wrong behavior without it · Moderate = degraded output · Low = no effect
- Tier (derived): P0 = keep · P1 = trim candidate · P2 = drop/convert candidate
- P0: High×High or High×Moderate
- P1: any Moderate×Moderate or mixed High/Low signal
- P2: Low usage OR Low impact (especially both)
- Action: entries whose content could live in
rules/*.mdor an agent file → mark→ rulein Action column
Print scored table:
| # | Section | Usage likelihood | Impact if missing | Tier | Action |
|----|---------|-----------------|-------------------|------|-------------|
| 1 | ... | High | High | P0 | Keep |
| 2 | ... | Low | Low | P2 | Drop |
| 3 | ... | Moderate | High | P1 | Trim |
| 4 | ... | Low | High | P2 | → rule |
Legend:
Usage likelihood — High: every session · Moderate: occasional · Low: rare/one-off
Impact if missing — High: wrong behavior · Moderate: degraded · Low: no effect
Tier — P0: keep · P1: trim candidate · P2: drop/convert candidate
Action — "→ rule" entries can be promoted to rules/*.md then dropped from memory
P-eager-2: Call AskUserQuestion tool — do NOT write question as plain text:
- question: "Which entries to prune? Select tier or type item numbers (e.g. 2, 4, 7)."
- (a) label:
All P2— description: drop all tier-P2 entries; apply→ ruleconversions as proposals - (b) label:
All P1 + P2— description: trim P1 entries and drop P2 entries - (c) label:
Specific items— description: enter item numbers in next message; applies only those - (d) label:
Skip— description: leave MEMORY.md untouched; user edits manually
If user picks (c): print "Enter item numbers (e.g. 2, 4, 7):" and wait for next message; resolve item numbers against # column before proceeding.
P-eager-3: Spawn foundry:curator to apply selected edits. Substitute absolute memory file path (resolved from PRUNE_FOUND_PATH) inline before issuing the Agent call:
Read MEMORY.md at <absolute-path>.
Apply these prune actions (sections identified by # from scored table):
<list: # — section name — action (Drop | Trim | Convert to rule)>
Rules:
- Drop: remove entire section including heading
- Trim: keep operational directive only (1 line max per entry); remove rationale/backstory
- Convert to rule: remove section from MEMORY.md; print proposed rule file content inline in response for user review before writing — do NOT write the rule file
Write MEMORY.md changes using the Edit tool.
Return ONLY: {"status":"done","sections_dropped":N,"sections_trimmed":N,"rule_conversions":N,"confidence":0.N}
Print compact summary after curator completes:
Pruned MEMORY.md — <date>
Dropped: N sections — [names]
Trimmed: N sections — [names]
Rule conversions proposed: N — [names] (review and write manually or via /manage)
Kept: N sections unchanged
Saved: ~N lines
End response with ## Confidence block per CLAUDE.md output standards.
Otherwise ($EAGER == false) — standard read-only advisory flow:
Evaluate each section against these criteria:
- Drop: content no longer accurate (removed features, resolved one-time issues, superseded decisions), or fully duplicated in CLAUDE.md
- Trim: sections still accurate but containing implementation history or rationale no longer needed day-to-day — keep operational facts (what/where), drop why-it-was-built backstory
- Keep: rules actively applied every session; project-specific facts absent from CLAUDE.md; anything model needs to act correctly
Memory-write gate — project CLAUDE.md Memory Policy prohibits auto-writes to MEMORY.md. Prune mode runs read-only by default and produces advisory diff/report rather than applying edits silently:
P1: Read memory file and analyse for stale, redundant, and verbose entries.
P2: Print proposed prune report to terminal (sections to drop + sections to trim, with line ranges and reasoning):
Prune proposals (apply manually unless explicitly approved below):
Drop — <section name>: <reason>
Trim — <section name>: <what to remove vs keep>
...
P3: Call AskUserQuestion — do NOT write question as plain text. Map options directly into tool call:
- question: "Apply prune edits to MEMORY.md?"
- (a) label:
Apply now— description: use Edit tool to apply all proposals to memory file - (b) label:
Show diff first— description: print line-by-line preview before applying any change - (c) label:
Skip— description: leave MEMORY.md untouched; user will edit manually
Only after user picks (a) (or (b) followed by approval) may Edit be invoked on memory file. Never apply prune edits silently.
Print compact summary after applying (or after user declines):
Pruned MEMORY.md — <date>
Dropped: N sections — [names]
Trimmed: N sections — [names]
Kept: N sections unchanged
Saved: ~N lines
End response with ## Confidence block per CLAUDE.md output standards.
Mode: Lessons Distillation — only when $ARGUMENTS == "lessons"
Read and execute ${CLAUDE_PLUGIN_ROOT:-plugins/foundry}/skills/distill/modes/lessons.md.
Mode: External Distillation — only when $ARGUMENTS begins with external
Read and execute ${CLAUDE_PLUGIN_ROOT:-plugins/foundry}/skills/distill/modes/external.md.
Mode: Executables Extraction — only when $ARGUMENTS begins with executables
Read and execute ${CLAUDE_PLUGIN_ROOT:-plugins/foundry}/skills/distill/modes/executables.md.
Skill is introspective: looks at tooling itself, not just code
Invoke periodically (e.g., monthly) or after burst of correction/feedback; one-time snapshot, not continuous monitor
Suggestions are proposals — always review before creating new files
After creating new agent/skill based on suggestion, re-run skill once to confirm gap resolved, then stop
lessonsmode is primary consolidation path — run after any session with significant corrections to prevent lesson drift back into MEMORY.md noiseAgent Teams signal tracking: when reviewing patterns, also look for:
- Skills using
--teamor team-mode heuristics more/less than expected → flag over/under-use relative to decision matrix inCLAUDE.md § Agent Teams - Security findings appearing in reviews for non-auth code → suggests foundry:qa-specialist teammate scope too broad; narrow it
- Model tier mismatches (e.g., heavy analysis assigned to
sonnetteammates) → flag for tier adjustment
- Skills using
externalmode calibration: two concrete GT fixture cases defined in calibrate skills mode file — find viafind "${CLAUDE_PLUGIN_ROOT:-plugins/foundry}" -maxdepth 5 -path "*/calibrate/modes/skills.md" 2>/dev/null | head -1with fallback toplugins/foundry/skills/calibrate/modes/skills.md:- caveman plugin — narrow, self-contained communication mode, no local structural overlap → GT: install-as-is recommended, Group A empty or thin
- Karpathy autoresearch — research automation tool, strong overlap with
research:plugin structure → GT: Group A candidates map to research plugin, digest recommended, install-as-is not triggered - Ground truth = static snapshot of each tool's agent/skill/rule files (no live fetch needed); score adoption-table lane assignments against GT outcomes
Follow-up chains:
- Suggestion accepted for new agent/skill →
/foundry:manage createto scaffold and register it - Suggestion to enhance existing → edit agent/skill directly, then
/foundry:setup lessonsproposals applied →/foundry:setupto propagate;/foundry:audit rulesto verify new rule files structurally soundexecutablesextraction complete →/foundry:setupto propagate bin/ scripts; run/foundry:audit --efficiencyto confirmclusters == 0
- Suggestion accepted for new agent/skill →