rlm-curator - SKILL.md Agent Skill

name: rlm-curator description: > Knowledge Curator agent skill for the RLM Factory. Auto-invoked when tasks involve distilling code summaries, querying the semantic ledger, auditing cache coverage, or maintaining RLM hygiene. Supports both Ollama-based batch distillation and agent-powered direct summarization.

Identity: The Knowledge Curator 🧠

You are the Knowledge Curator. Your goal is to keep the recursive language model (RLM) semantic ledger up to date so that other agents can retrieve accurate context without reading every file.

Tools (Plugin Scripts)

Script	Role	Ollama?
`distiller.py`	The Writer (Ollama) — local LLM batch summarization	Required
`inject_summary.py`	The Writer (Agent/Swarm) -- direct agent-generated injection, no Ollama	None
`query_cache.py`	The Reader -- instant cache search	None
`inventory.py`	The Auditor -- coverage reporting	None
`cleanup_cache.py`	The Janitor -- stale entry removal	None
`rlm_config.py`	Shared Config -- manifest & profile mgmt	None

🗺️ Pre-Execution Workflow Commitment

┌─────────────────────────────────────────────────────────────────┐
│                    RLM DISTILLATION LOOP                        │
├─────────────────────────────────────────────────────────────────┤
│  Step 1: ASSESS                                                 │
│  ✓ Run inventory check for missing semantic coverage            │
│  Step 2: RETRIEVE                                               │
│  ✓ Query cache to prevent redundant writes                      │
│  Step 3: DISTILL (Choose Tier)                                  │
│  ✓ Agent Injection (1-5 files)                                  │
│  ✓ Swarm Batching (5+ files)                                    │
│  ✓ Ollama Local Batch (if available)                            │
│  Step 4: CURATE                                                 │
│  ✓ Clean up stale orphaned cache entries                        │
└─────────────────────────────────────────────────────────────────┘

📂 Execution Protocol

1. Assessment (Always First)

python3 plugins/skills/rlm-curator/scripts/inventory.py --type legacy

Check: Is coverage < 100%? Are there missing files?

2. Retrieval (Read — Fast)

python3 plugins/skills/rlm-curator/scripts/query_cache.py "search_term"
python3 plugins/skills/rlm-curator/scripts/query_cache.py "term" --type tool

3. Distillation (Write)

Option A: Zero-Cost Swarm (Preferred for bulk > 10 files)

Use the Copilot swarm (free, gpt-5-mini) or Gemini swarm (free):

# Generate gap list first
python3 plugins/rlm-factory/skills/rlm-curator/scripts/inventory.py --profile project --missing > rlm_gap_list.md

# Run zero-cost swarm
source ~/.zshrc   # IMPORTANT: do not use 'gh auth token' -- lacks Copilot scope
python3 plugins/agent-loops/skills/agent-swarm/scripts/swarm_run.py \
  --engine copilot \
  --job plugins/rlm-factory/resources/jobs/rlm_chronicle.job.md \
  --files-from rlm_gap_list.md \
  --resume --workers 2

Option B: Ollama Batch (requires Ollama running locally)

# Batch distill (all files in scope)
python3 plugins/rlm-factory/skills/rlm-curator/scripts/distiller.py

# Single file
python3 plugins/rlm-factory/skills/rlm-curator/scripts/distiller.py --file path/to/file.md

# Tool scripts
python3 plugins/rlm-factory/skills/rlm-curator/scripts/distiller.py --type tool

# Only recent changes
python3 plugins/rlm-factory/skills/rlm-curator/scripts/distiller.py --since 2

Option C: Manual Agent Injection (< 5 files)

python3 plugins/rlm-factory/skills/rlm-curator/scripts/inject_summary.py \
  --profile project \
  --file path/to/file.md \
  --summary "Your dense summary here..."

4. Cleanup (Curate)

python3 plugins/skills/rlm-curator/scripts/cleanup_cache.py --type legacy --apply

Agent Distillation Protocol (The "Brain Upgrade")

Context: You (the Agent) are faster and smarter than the local Ollama model. Directive: When asked to "update RLM" or "summarize X":

Files to summarize	Action
1-5 files	Read & inject manually via `inject_summary.py`
5-50 files	Use Copilot swarm `--workers 2` (free, concurrent-safe)
50+ files	Use Copilot or Gemini swarm with `--resume` for checkpoint recovery
Ollama available	`distiller.py` is also valid for any size

Why Agent/Swarm Distill is Superior to Ollama

No local dependency -- works headless
Higher-quality summaries (frontier model: gpt-5-mini, gemini-pro)
inject_summary.py uses fcntl.flock -- safe for concurrent writes
Resume support -- re-runnable after interruptions

The Two Memory Banks

Cache	Path	Content
Summary Cache	`.agent/learning/rlm_summary_cache.json`	Docs, protocols, ADRs
Tool Cache	`.agent/learning/rlm_tool_cache.json`	Python/JS scripts, CLI tools

Cache Entry Schema

Summary Cache (docs):

{
  "path/to/file.md": {
    "hash": "<content_hash>",
    "summary": "Plain text summary...",
    "file_mtime": 1234567890.0,
    "summarized_at": "2026-02-11T18:30:00Z"
  }
}

Tool Cache (code):

{
  "plugins/path/to/script.py": {
    "hash": "<content_hash>",
    "summary": "{\"purpose\": \"...\", \"layer\": \"...\", \"usage\": [...], \"args\": [...]}",
    "file_mtime": 1234567890.0,
    "summarized_at": "2026-02-11T18:30:00Z"
  }
}

Quality Guidelines

Signal Over Noise

Every summary should answer "Why does this file exist?"
BAD: "This script runs the server"
GOOD: "Launches backend on port 3001 handling Questrade auth"
A good summary lets the agent decide whether to read the full file

Quality Gate

Before saving, ask:

Does it answer "Why?" — not just "What"
Is it specific? — names key classes, not "various functions"
Is it hallucination-free? — did you actually see that in the code?

Critical Rules

Concurrent Write Safety: inject_summary.py uses fcntl.flock. Never write to the cache JSON from multiple processes without this lock -- data will be silently destroyed.
Ollama Dependency: distiller.py WILL FAIL if Ollama is not running. Prefer swarm/agent injection for bulk jobs.
Git Ignore: Never commit cache files if they contain secrets.
Source of Truth: The filesystem is truth. The ledger is just a map.
CWD: Run all scripts from repository root.
Checkpoint Reconciliation: If a swarm run is interrupted and cache entries are lost, reconcile the .swarm_state_*.json checkpoint before resuming (remove entries not present in the actual cache) to avoid skipping re-processing.