name: kai-retro
description: Run a learning retrospective on the Kai harness. Mines gate-failure logs and 30-day performance results into lessons, triages candidate lessons (promote/keep/retire), and graduates repeated lessons into enforced gate checks with golden corpus cases. Use when "retro", "what have we learned", "triage lessons", "promote lessons", "why does this keep failing", "harness retrospective", or monthly / after any heavy content sprint.
Run the Kai learning retrospective. This is how the harness gets smarter: raw failure logs become lessons, repeated lessons become enforced checks. Read memory/MEMORY.md first for the graduation ladder.
When to run
- Monthly, or after any sprint that produced 5+ gated pieces
- Whenever the same gate failure shows up twice in one session
- When a 30-day performance check grades new losers
Step 1 — Mine the gate logs
python scripts/self_improvement/lesson_capture.py mine
This groups recurring failure signatures from data/learning/gate_runs.jsonl. Append candidates with --write. If the log is empty, note it and move on — the gates only log when they run.
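As a rough illustration of what "grouping recurring failure signatures" can mean, the sketch below counts (gate, signature) pairs across failed runs in a JSONL log. It is not the actual logic of lesson_capture.py: the field names `gate`, `passed`, and `failures`, the sample records, and the `min_count` threshold of 2 are all assumptions made for the example.

```python
import json
from collections import Counter
from typing import Iterable


def mine_signatures(lines: Iterable[str], min_count: int = 2):
    """Count (gate, signature) pairs across failed runs and keep the recurring ones."""
    counts = Counter()
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines in the log
        record = json.loads(raw)
        # Hypothetical schema: a run records whether it passed and which signatures failed.
        if record.get("passed", False):
            continue
        for signature in record.get("failures", []):
            counts[(record.get("gate", "unknown"), signature)] += 1
    # most_common() sorts by count, highest first
    return [(key, n) for key, n in counts.most_common() if n >= min_count]


# Made-up log lines standing in for data/learning/gate_runs.jsonl
sample_log = [
    '{"gate": "seo_lint", "passed": false, "failures": ["overclaim:guaranteed"]}',
    '{"gate": "seo_lint", "passed": false, "failures": ["overclaim:guaranteed"]}',
    '{"gate": "seo_lint", "passed": true, "failures": []}',
    '{"gate": "voice", "passed": false, "failures": ["banned-word:synergy"]}',
]
recurring = mine_signatures(sample_log)
```

In this sample only the seo_lint signature recurs, so it is the only candidate returned; the single voice failure stays below the threshold.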
Step 2 — Diagnose losers
python scripts/self_improvement/lesson_capture.py losers
For each undiagnosed loser, read the piece and its content_log.json entry, write a one-line diagnosis (hook type, persona mismatch, seasonality, thin proof — name the cause, not the symptom), and add it to memory/what-doesnt-work.md under "Measured losers" with the piece id. Check seasonality and competitor moves before blaming the content (see memory/edge-cases.md EC-15).
Step 3 — Triage every lesson
Go through memory/lessons.md:
| Verdict | Criteria | Action |
|---|---|---|
| Promote | Fired 3+ times, or checkable by a regex/threshold | Graduate it (Step 4), mark (promoted) |
| Keep | True, useful, not yet recurring | Upgrade candidate → active if verified |
| Merge | Near-duplicate of another lesson | Combine into the more general one |
| Retire | No longer true (platform changed, gate fixed) | Mark (retired) with the reason — never delete |
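The table above can be read as a small decision function. The sketch below is one possible encoding, assuming a hypothetical `Lesson` record; the order in which the verdicts are checked (retire, then merge, then promote) is an assumption, since the table does not rank them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lesson:
    """Hypothetical record for one entry in memory/lessons.md."""
    text: str
    fired: int = 0                      # how many times the lesson has fired
    checkable: bool = False             # can a regex or threshold enforce it?
    still_true: bool = True             # False once the platform changed or the gate was fixed
    duplicate_of: Optional[str] = None  # the more general lesson it duplicates, if any


def triage(lesson: Lesson) -> str:
    """Return the verdict from the triage table for a single lesson."""
    if not lesson.still_true:
        return "retire"   # mark (retired) with the reason, never delete
    if lesson.duplicate_of is not None:
        return "merge"    # combine into the more general lesson
    if lesson.fired >= 3 or lesson.checkable:
        return "promote"  # graduate it in Step 4
    return "keep"
```

For example, `triage(Lesson("Avoid 'guaranteed' in headlines", fired=1, checkable=True))` returns "promote" even though the lesson has fired only once, because a regex can enforce it.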
Step 4 — Graduate promoted lessons
Pick the strongest enforcement target, in this order:
- Lint rule / contract check: a new entry in a banned-word tier, a new overclaim regex in scripts/quality_gates/seo_lint.py, or a deterministic_checks line in the format's skill contract.
- Checklist line: the relevant knowledge/checklists/*.md.
- CLAUDE.md / framework rule: only for judgment calls code can't check.
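To make the first target concrete, here is a minimal sketch of what an overclaim regex check might look like. The pattern, the phrases it catches, and the function name `find_overclaims` are hypothetical; the real rules in seo_lint.py are not shown in this document.

```python
import re

# Hypothetical overclaim pattern, for illustration only.
OVERCLAIM = re.compile(
    r"\b(guaranteed|100% (?:safe|effective)|never fails)\b",
    re.IGNORECASE,
)


def find_overclaims(text: str) -> list:
    """Return every overclaim phrase found, lowercased, in document order."""
    return [match.group(0).lower() for match in OVERCLAIM.finditer(text)]


hits = find_overclaims("Results are Guaranteed and 100% safe.")
clean = find_overclaims("Results vary by account.")
```

A check of this shape is deterministic, which is what makes it a stronger enforcement target than a checklist line or a framework rule.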
Non-negotiable: any change to a gate script requires a matching case in evals/golden/manifest.json (one sample proving the new check fires, plus confirmation that the existing pass samples still pass). Then run:
python scripts/quality_gates/golden_check.py
A gate change without a golden case is not a promotion — it's a regression waiting to happen.
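The idea behind a golden case can be sketched as follows: every sample carries an expected outcome, and the run reports any sample whose actual outcome differs. This is not the implementation of golden_check.py; the case fields (`id`, `text`, `expect`), the sample cases, and the `RANK_PROMISE` check are assumptions made for the example.

```python
import re

# Hypothetical new check being promoted: fires when the text promises a #1 ranking.
RANK_PROMISE = re.compile(r"\brank(?:s|ing)? #?1\b", re.IGNORECASE)


def new_check(text: str) -> bool:
    """Return True when the check fires (the text would fail the gate)."""
    return RANK_PROMISE.search(text) is not None


def run_golden(cases, check) -> list:
    """Return the ids of cases whose actual outcome differs from the expected one."""
    mismatches = []
    for case in cases:
        fired = bool(check(case["text"]))
        expected_to_fire = case["expect"] == "fail"
        if fired != expected_to_fire:
            mismatches.append(case["id"])
    return mismatches


# Hypothetical cases; the real manifest.json schema is not shown in this document.
golden_cases = [
    {"id": "overclaim-rank-001", "text": "We will get you ranking #1 in a week.", "expect": "fail"},
    {"id": "pass-baseline-001", "text": "Rankings depend on competition and content quality.", "expect": "pass"},
]
mismatches = run_golden(golden_cases, new_check)
```

An empty `mismatches` list covers both halves of the requirement: the new sample proves the check fires, and the existing pass sample still passes.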
Step 5 — Refresh the memory index
- Update the "Current standing lessons" section of memory/MEMORY.md (keep the file under 200 lines).
- Cross-check memory/edge-cases.md: mark any entry whose "Enforcement: none" you just fixed; add new edge cases discovered this cycle.
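Both checks in this step are easy to script. The sketch below assumes a hypothetical layout for memory/edge-cases.md in which each entry starts with a line beginning with its id (such as EC-15) and contains an "Enforcement: ..." line; the sample entries are made up for illustration.

```python
import re

# Matches an entry id at the start of a line, with or without markdown heading marks.
ENTRY_START = re.compile(r"^\s*#*\s*(EC-\d+)\b")


def within_budget(text: str, limit: int = 200) -> bool:
    """True when the text has fewer than `limit` lines."""
    return len(text.splitlines()) < limit


def unenforced_edge_cases(text: str) -> list:
    """Return the ids of entries that still carry an 'Enforcement: none' line."""
    current = None
    found = []
    for line in text.splitlines():
        match = ENTRY_START.match(line)
        if match:
            current = match.group(1)
        if "Enforcement: none" in line and current and current not in found:
            found.append(current)
    return found


# Made-up entries in the assumed layout
sample_edge_cases = """## EC-14 Holiday dip
Enforcement: checklist line
## EC-15 Competitor launch
Enforcement: none
"""
still_open = unenforced_edge_cases(sample_edge_cases)
```

Running the second function before and after the retro gives a quick list of entries whose enforcement status needs updating.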
Step 6 — Report
Output a retro summary:
## Kai Retro — [date]
**Mined:** [N] recurring failure signatures ([gate]: [signature] ×[count], ...)
**Losers diagnosed:** [N] ([id]: [one-line diagnosis], ...)
**Promoted:** [lesson] → [enforcement target] (+ golden case [id])
**Retired:** [lesson] — [reason]
**Edge cases:** [new/closed entries]
**Open risks:** [lessons at 2 occurrences — one more and they must promote]
Commit the memory and gate changes together so the diff shows the lesson and its enforcement side by side. If a promotion changes publishing behavior (new hard block), flag it for human approval rather than applying silently.
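Two lines of the summary template can be filled mechanically from counts gathered earlier in the retro. The sketch below assumes hypothetical inputs: a list of ((gate, signature), count) pairs for the mined signatures and a mapping from lesson name to occurrence count for the open risks.

```python
def mined_line(recurring) -> str:
    """Render the Mined line from ((gate, signature), count) pairs."""
    line = f"**Mined:** {len(recurring)} recurring failure signatures"
    if recurring:
        details = ", ".join(f"{gate}: {sig} ×{n}" for (gate, sig), n in recurring)
        line += f" ({details})"
    return line


def open_risks_line(fire_counts) -> str:
    """Render the Open risks line: lessons at exactly 2 occurrences."""
    at_risk = sorted(name for name, n in fire_counts.items() if n == 2)
    return "**Open risks:** " + (", ".join(at_risk) if at_risk else "none")


# Made-up inputs for illustration
summary = "\n".join([
    mined_line([(("seo_lint", "overclaim:guaranteed"), 2)]),
    open_risks_line({"thin proof in intros": 2, "stale stats": 3, "weak hook": 1}),
])
```

The remaining lines (losers, promotions, retirements, edge cases) carry diagnoses and reasons, so they are better written by hand.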