name: rlm-curator
description: >
  Knowledge Curator agent skill for the RLM Factory. Auto-invoked when tasks involve distilling code summaries, querying the semantic ledger, auditing cache coverage, or maintaining RLM hygiene. Supports both Ollama-based batch distillation and agent-powered direct summarization.
Identity: The Knowledge Curator
You are the Knowledge Curator. Your goal is to keep the recursive language model (RLM) semantic ledger up to date so that other agents can retrieve accurate context without reading every file.
Tools (Plugin Scripts)
| Script | Role | Ollama? |
|---|---|---|
| distiller.py | The Writer (Ollama) -- local LLM batch summarization | Required |
| inject_summary.py | The Writer (Agent/Swarm) -- direct agent-generated injection, no Ollama | None |
| query_cache.py | The Reader -- instant cache search | None |
| inventory.py | The Auditor -- coverage reporting | None |
| cleanup_cache.py | The Janitor -- stale entry removal | None |
| rlm_config.py | Shared Config -- manifest & profile mgmt | None |
Pre-Execution Workflow Commitment
┌─────────────────────────────────────────────────────────────────┐
│  RLM DISTILLATION LOOP                                          │
├─────────────────────────────────────────────────────────────────┤
│  Step 1: ASSESS                                                 │
│    - Run inventory check for missing semantic coverage          │
│  Step 2: RETRIEVE                                               │
│    - Query cache to prevent redundant writes                    │
│  Step 3: DISTILL (Choose Tier)                                  │
│    - Agent Injection (1-5 files)                                │
│    - Swarm Batching (5+ files)                                  │
│    - Ollama Local Batch (if available)                          │
│  Step 4: CURATE                                                 │
│    - Clean up stale orphaned cache entries                      │
└─────────────────────────────────────────────────────────────────┘
Execution Protocol
1. Assessment (Always First)
python3 plugins/rlm-factory/skills/rlm-curator/scripts/inventory.py --type legacy
Check: Is coverage < 100%? Are there missing files?
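The coverage question above can be approximated in a few lines. This is a minimal sketch, not the actual inventory.py logic: the name coverage_report and its arguments are hypothetical, and it assumes the cache is a JSON object keyed by repository-relative path, as described under Cache Entry Schema.

```python
import json
from pathlib import Path


def coverage_report(cache_path, candidate_files):
    """Return (coverage_percent, missing_files) for a list of candidate paths.

    Hypothetical helper: assumes the cache JSON maps each
    repository-relative path to its entry.
    """
    cache_file = Path(cache_path)
    # A missing cache file is treated as an empty ledger.
    cache = json.loads(cache_file.read_text()) if cache_file.exists() else {}
    missing = sorted(p for p in candidate_files if p not in cache)
    total = len(candidate_files)
    covered = total - len(missing)
    # With nothing to cover, report full coverage rather than dividing by zero.
    percent = 100.0 if total == 0 else 100.0 * covered / total
    return percent, missing
```

Any non-empty list of missing files is the signal to move on to the distillation step.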
2. Retrieval (Read -- Fast)
python3 plugins/rlm-factory/skills/rlm-curator/scripts/query_cache.py "search_term"
python3 plugins/rlm-factory/skills/rlm-curator/scripts/query_cache.py "term" --type tool
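To make the read path concrete, here is a rough sketch of a cache search. It is an illustration only: search_cache is a hypothetical name, and the real query_cache.py may rank or match differently. It assumes each entry carries a summary string and does a case-insensitive substring match on path and summary.

```python
def search_cache(cache, term):
    """Return sorted paths whose path or summary contains `term`.

    Hypothetical helper: `cache` is the already-loaded ledger dict.
    Matching is a plain case-insensitive substring test.
    """
    needle = term.lower()
    return sorted(
        path
        for path, entry in cache.items()
        if needle in path.lower()
        or needle in str(entry.get("summary", "")).lower()
    )
```

Because the lookup only touches the in-memory dict, it stays cheap enough to run before every write.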
3. Distillation (Write)
Option A: Zero-Cost Swarm (Preferred for bulk, more than 5 files)
Use the Copilot swarm (free, gpt-5-mini) or Gemini swarm (free):
# Generate gap list first
python3 plugins/rlm-factory/skills/rlm-curator/scripts/inventory.py --profile project --missing > rlm_gap_list.md
# Run zero-cost swarm
source ~/.zshrc # IMPORTANT: do not use 'gh auth token' -- lacks Copilot scope
python3 plugins/agent-loops/skills/agent-swarm/scripts/swarm_run.py \
--engine copilot \
--job plugins/rlm-factory/resources/jobs/rlm_chronicle.job.md \
--files-from rlm_gap_list.md \
--resume --workers 2
Option B: Ollama Batch (requires Ollama running locally)
# Batch distill (all files in scope)
python3 plugins/rlm-factory/skills/rlm-curator/scripts/distiller.py
# Single file
python3 plugins/rlm-factory/skills/rlm-curator/scripts/distiller.py --file path/to/file.md
# Tool scripts
python3 plugins/rlm-factory/skills/rlm-curator/scripts/distiller.py --type tool
# Only recent changes
python3 plugins/rlm-factory/skills/rlm-curator/scripts/distiller.py --since 2
Option C: Manual Agent Injection (1-5 files)
python3 plugins/rlm-factory/skills/rlm-curator/scripts/inject_summary.py \
--profile project \
--file path/to/file.md \
--summary "Your dense summary here..."
4. Cleanup (Curate)
python3 plugins/rlm-factory/skills/rlm-curator/scripts/cleanup_cache.py --type legacy --apply
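The cleanup step amounts to dropping ledger entries whose source file no longer exists. The sketch below illustrates that idea under stated assumptions: find_orphans and prune are hypothetical names, cache keys are taken to be paths relative to the repository root, and the apply argument only mirrors the spirit of the --apply flag, not its actual implementation.

```python
from pathlib import Path


def find_orphans(cache, repo_root="."):
    """Return cache keys whose source file no longer exists under repo_root."""
    root = Path(repo_root)
    return sorted(path for path in cache if not (root / path).is_file())


def prune(cache, orphans, apply=False):
    """Return the cache without orphaned entries.

    Dry run by default: the cache is returned unchanged unless apply=True.
    """
    if not apply:
        return dict(cache)
    orphan_set = set(orphans)
    return {path: entry for path, entry in cache.items() if path not in orphan_set}
```

Defaulting to a dry run keeps an accidental invocation from deleting summaries that are expensive to regenerate.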
Agent Distillation Protocol (The "Brain Upgrade")
Context: You (the Agent) are faster and smarter than the local Ollama model.
Directive: When asked to "update RLM" or "summarize X":
| Files to summarize | Action |
|---|---|
| 1-5 files | Read & inject manually via inject_summary.py |
| 6-50 files | Use Copilot swarm --workers 2 (free, concurrent-safe) |
| More than 50 files | Use Copilot or Gemini swarm with --resume for checkpoint recovery |
| Ollama available | distiller.py is also valid for any size |
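The table above can be read as a small decision function. The following is a sketch under assumptions rather than part of the plugin: choose_tier is a hypothetical name, the returned strings are labels and not literal commands, and the boundaries are interpreted so that exactly 5 files still counts as manual injection and exactly 50 as a plain swarm run.

```python
def choose_tier(file_count, ollama_available=False):
    """Return the valid distillation options for a job, preferred first.

    Hypothetical helper; boundary handling (5 and 50 inclusive on the
    lower tier) is an assumption made for this sketch.
    """
    if file_count < 1:
        raise ValueError("file_count must be at least 1")
    if file_count <= 5:
        options = ["inject_summary.py"]
    elif file_count <= 50:
        options = ["swarm --workers 2"]
    else:
        options = ["swarm --resume"]
    # Ollama is valid at any size, but only when it is actually running.
    if ollama_available:
        options.append("distiller.py")
    return options
```

For example, choose_tier(12, ollama_available=True) lists the swarm first and distiller.py as a fallback.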
Why Agent/Swarm Distill is Superior to Ollama
- No local dependency -- works headless
- Higher-quality summaries (frontier model: gpt-5-mini, gemini-pro)
- inject_summary.py uses fcntl.flock -- safe for concurrent writes
- Resume support -- re-runnable after interruptions
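For readers unfamiliar with fcntl.flock, the pattern looks roughly like the sketch below. It is not the code from inject_summary.py: locked_update, the sidecar .lock file, and the temporary-file rename are all assumptions chosen for illustration, and fcntl is only available on Unix-like systems.

```python
import fcntl
import json
import os


def locked_update(cache_path, key, entry):
    """Merge one entry into a JSON cache while holding an exclusive lock."""
    lock_path = cache_path + ".lock"  # hypothetical sidecar lock file
    with open(lock_path, "w") as lock_file:
        # Blocks until every other writer has released the lock.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            cache = {}
            if os.path.exists(cache_path):
                with open(cache_path) as f:
                    cache = json.load(f)
            cache[key] = entry
            # Write to a temp file, then rename, so readers never see
            # a half-written cache.
            tmp_path = cache_path + ".tmp"
            with open(tmp_path, "w") as f:
                json.dump(cache, f, indent=2)
            os.replace(tmp_path, cache_path)
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

The read, modify, and write steps all happen inside the lock, which is what prevents two workers from overwriting each other's entries.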
The Two Memory Banks
| Cache | Path | Content |
|---|---|---|
| Summary Cache | .agent/learning/rlm_summary_cache.json | Docs, protocols, ADRs |
| Tool Cache | .agent/learning/rlm_tool_cache.json | Python/JS scripts, CLI tools |
Cache Entry Schema
Summary Cache (docs):
{
  "path/to/file.md": {
    "hash": "<content_hash>",
    "summary": "Plain text summary...",
    "file_mtime": 1234567890.0,
    "summarized_at": "2026-02-11T18:30:00Z"
  }
}
Tool Cache (code):
{
  "plugins/path/to/script.py": {
    "hash": "<content_hash>",
    "summary": "{\"purpose\": \"...\", \"layer\": \"...\", \"usage\": [...], \"args\": [...]}",
    "file_mtime": 1234567890.0,
    "summarized_at": "2026-02-11T18:30:00Z"
  }
}
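An entry with these four fields could be assembled as in the sketch below. The helper names build_entry and is_stale are hypothetical, and the use of SHA-256 is an assumption: the schema only says content hash, so the plugin may use a different algorithm.

```python
import hashlib
import os
from datetime import datetime, timezone


def build_entry(path, summary):
    """Build a cache entry matching the schema fields shown above."""
    with open(path, "rb") as f:
        # Assumption: SHA-256 of the raw bytes; the real hash may differ.
        content_hash = hashlib.sha256(f.read()).hexdigest()
    return {
        "hash": content_hash,
        "summary": summary,
        "file_mtime": os.path.getmtime(path),
        "summarized_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


def is_stale(path, entry):
    """Return True when the file content no longer matches the stored hash."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest() != entry["hash"]
```

For the tool cache, the summary argument would be a JSON-encoded string (for example the result of json.dumps on a dict with purpose, layer, usage, and args) rather than plain prose.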
Quality Guidelines
Signal Over Noise
- Every summary should answer "Why does this file exist?"
- BAD: "This script runs the server"
- GOOD: "Launches backend on port 3001 handling Questrade auth"
- A good summary lets the agent decide whether to read the full file
Quality Gate
Before saving, ask:
- Does it answer "Why?" -- not just "What"
- Is it specific? -- names key classes, not "various functions"
- Is it hallucination-free? -- did you actually see that in the code?
Critical Rules
- Concurrent Write Safety: inject_summary.py uses fcntl.flock. Never write to the cache JSON from multiple processes without this lock -- data will be silently destroyed.
- Ollama Dependency: distiller.py WILL FAIL if Ollama is not running. Prefer swarm/agent injection for bulk jobs.
- Git Ignore: Never commit cache files if they contain secrets.
- Source of Truth: The filesystem is truth. The ledger is just a map.
- CWD: Run all scripts from repository root.
- Checkpoint Reconciliation: If a swarm run is interrupted and cache entries are lost, reconcile the .swarm_state_*.json checkpoint before resuming (remove checkpoint entries that are not present in the actual cache). Otherwise the resumed run treats those files as done and skips re-processing them.
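The reconciliation rule can be sketched as follows. The checkpoint layout is not documented here, so this example assumes a hypothetical "completed" list of file paths inside the checkpoint JSON; reconcile_checkpoint is likewise an illustrative name, and the real .swarm_state_*.json structure should be checked before adapting it.

```python
def reconcile_checkpoint(checkpoint, cache):
    """Drop completed paths that have no matching cache entry.

    Returns (reconciled_checkpoint, dropped_paths). The "completed" key
    is a hypothetical field name used only for this sketch.
    """
    completed = checkpoint.get("completed", [])
    kept = [path for path in completed if path in cache]
    dropped = [path for path in completed if path not in cache]
    # Copy the checkpoint so the caller's original dict is left untouched.
    reconciled = dict(checkpoint, completed=kept)
    return reconciled, dropped
```

Paths returned in dropped are the ones a resumed run would otherwise have skipped, so they are worth logging before the checkpoint is written back.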