context: fork
user-invocable: false
name: lifelong-learning
description: |
  Continuous learning pipeline that captures session experiences, performs batch learning via GRPO, and transfers validated knowledge between System 1 and System 2 caches.
  Auto-activates when: session end, pattern discovery during routing, knowledge transfer triggers, manual /learn command.
  Triggers: learn, experience, knowledge, transfer, promote, demote, grpo, batch, pattern, skill development, growth, continuous improvement
lang: [en]
platforms: [claude-code, gemini-cli, codex-cli, cursor]
level: 2
triggers:
  - "learn"
  - "skill development"
  - "knowledge"
  - "growth"
  - "continuous improvement"
agents:
  - "planner"
tokens: "~2K"
category: "learning"
source_hash: d5a5a1f0
whenNotToUse: "One-off tasks or throwaway experiments where no routing pattern or user preference is worth persisting; do not trigger during active task execution."
Lifelong Learning
Contents
- When This Skill Applies
- Core Guidance
- Configuration
- Workflow Checklist
- Human Checkpoints
- Freedom Levels
- Quick Reference
- Rationalizations
When This Skill Applies
- Session end (automatic via nightly-learner hook)
- Pattern discovery during routing decisions
- Knowledge transfer between System 1 and System 2
- Periodic consolidation of routing experience
- Manual learning triggers via the /learn command
Core Guidance
1. Learning Pipeline
Session Experiences
         |
         v
+------------------+
| Experience       |  Collect routing decisions + outcomes
| Collector        |  during the session
+--------+---------+
         |
         v
+------------------+
| Batch Learner    |  Process experiences in batches (size: 50)
| (GRPO)           |  Group Relative Policy Optimization
+--------+---------+
         |
         v
+------------------+
| Knowledge        |  Promote/demote patterns between
| Transfer         |  System 1 and System 2 caches
+--------+---------+
         |
         v
+------------------+
| Persistence      |  Save updated caches to disk
| Layer            |  ~/.claude/artibot/
+------------------+
2. Experience Collection
Each routing decision is recorded as an experience entry:
| Field | Type | Description |
|---|---|---|
| input | string | User request (anonymized) |
| complexity | number | Computed complexity score |
| routed_to | string | "system1" or "system2" |
| outcome | string | "success", "escalated", "failed" |
| latency_ms | number | Processing time in milliseconds |
| confidence | number | Router confidence at decision time |
| timestamp | string | ISO timestamp |
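The schema above could be captured as a small validated record. This is a minimal sketch, not the pipeline's actual collector: the `Experience` class and `record_experience` helper are hypothetical names, and only the field names and allowed values come from the table.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

VALID_ROUTES = {"system1", "system2"}
VALID_OUTCOMES = {"success", "escalated", "failed"}


@dataclass(frozen=True)
class Experience:
    """One routing decision, mirroring the field table (hypothetical class)."""
    input: str          # user request, assumed already anonymized by the caller
    complexity: float   # computed complexity score
    routed_to: str      # "system1" or "system2"
    outcome: str        # "success", "escalated", or "failed"
    latency_ms: float   # processing time in milliseconds
    confidence: float   # router confidence at decision time
    timestamp: str      # ISO timestamp

    def __post_init__(self):
        # Reject values outside the sets listed in the table.
        if self.routed_to not in VALID_ROUTES:
            raise ValueError(f"routed_to must be one of {sorted(VALID_ROUTES)}")
        if self.outcome not in VALID_OUTCOMES:
            raise ValueError(f"outcome must be one of {sorted(VALID_OUTCOMES)}")


def record_experience(log, **fields):
    """Validate one entry, stamp the current UTC time if absent, append it to `log`."""
    fields.setdefault("timestamp", datetime.now(timezone.utc).isoformat())
    entry = Experience(**fields)
    log.append(asdict(entry))
    return entry
```

Validating at record time keeps malformed entries out of the batch learner, so later stages can rely on `routed_to` and `outcome` without re-checking them.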
3. GRPO (Group Relative Policy Optimization)
Batch learning algorithm that groups similar experiences and optimizes routing thresholds:
1. Group experiences by domain + complexity range (group size: 5)
2. For each group:
   a. Calculate success rate per routing decision
   b. Compare System 1 vs System 2 outcomes
   c. Compute relative advantage: advantage = s2_success - s1_success
3. Update routing threshold:
   - If System 2 is consistently better -> lower threshold (more System 2)
   - If System 1 reliably handles the group -> raise threshold (more System 1)
4. Adjustment step: adaptRate * advantage (clamped to [-0.1, 0.1])
| Parameter | Default | Description |
|---|---|---|
| batchSize | 50 | Experiences per batch |
| grpoGroupSize | 5 | Experiences per comparison group |
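The four GRPO steps above can be sketched as follows. Several details are assumptions, not taken from the source: the `domain` key on each experience, the complexity bucket width of 0.25, averaging the advantage across groups, the default `adapt_rate` of 0.05 (the source gives no value for adaptRate), and the reading that the threshold is the complexity score above which requests go to System 2, so a positive advantage subtracts from it.

```python
from collections import defaultdict


def success_rate(group, route):
    """Fraction of 'success' outcomes among entries sent to `route`, or None if none."""
    routed = [e for e in group if e["routed_to"] == route]
    if not routed:
        return None
    return sum(e["outcome"] == "success" for e in routed) / len(routed)


def comparison_groups(experiences, group_size=5, bucket_width=0.25):
    """Yield full groups of `group_size` sharing a (domain, complexity bucket) key."""
    buckets = defaultdict(list)
    for e in experiences:
        key = (e.get("domain", "general"), int(e["complexity"] // bucket_width))
        buckets[key].append(e)
    for members in buckets.values():
        # Emit only complete groups; leftovers are ignored in this sketch.
        for i in range(0, len(members) - group_size + 1, group_size):
            yield members[i:i + group_size]


def grpo_update(threshold, experiences, adapt_rate=0.05, group_size=5):
    """Return the routing threshold after one batch of experiences."""
    advantages = []
    for group in comparison_groups(experiences, group_size):
        s1 = success_rate(group, "system1")
        s2 = success_rate(group, "system2")
        if s1 is None or s2 is None:
            continue  # need outcomes from both systems to compare
        advantages.append(s2 - s1)  # advantage = s2_success - s1_success
    if not advantages:
        return threshold
    mean_advantage = sum(advantages) / len(advantages)
    # Adjustment step: adaptRate * advantage, clamped to [-0.1, 0.1].
    step = max(-0.1, min(0.1, adapt_rate * mean_advantage))
    # System 2 better (positive advantage) lowers the threshold.
    return threshold - step
```

For example, a group in which System 1 failed every time and System 2 succeeded every time has an advantage of 1.0; with `adapt_rate=0.05` the threshold drops by 0.05, and no single batch can move it by more than 0.1 in either direction.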
4. Knowledge Transfer
Validated patterns move between System 1 and System 2 caches:
Promotion (System 2 -> System 1)
- Pattern succeeds promotionThreshold (3) consecutive times in System 2
- Confidence consistently > 0.8
- Action: Cache pattern in System 1 for fast retrieval
Demotion (System 1 -> System 2)
- Pattern fails demotionThreshold (2) consecutive times in System 1
- Confidence drops below minConfidence
- Action: Remove from System 1 cache, flag for System 2 analysis
  System 2 Cache                            System 1 Cache
+-------------------+    promote (3x)    +-------------------+
| Complex patterns  | =================> | Fast patterns     |
| Deep analysis     |                    | Cached heuristics |
| New discoveries   | <================= | Quick matches     |
+-------------------+    demote (2x)     +-------------------+
5. Knowledge Transfer Parameters
| Parameter | Default | Description |
|---|---|---|
| promotionThreshold | 3 | Consecutive successes to promote |
| demotionThreshold | 2 | Consecutive failures to demote |
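The promotion and demotion rules can be illustrated with a streak counter. This is a rough sketch under stated assumptions: `TransferTracker` is a hypothetical name, any non-success outcome in System 1 is counted as a failure, and the minConfidence demotion condition is left out because the source does not give its value.

```python
class TransferTracker:
    """Tracks consecutive outcomes per pattern and decides cache moves (hypothetical)."""

    def __init__(self, promotion_threshold=3, demotion_threshold=2,
                 promote_confidence=0.8):
        self.promotion_threshold = promotion_threshold
        self.demotion_threshold = demotion_threshold
        self.promote_confidence = promote_confidence
        self.system1 = set()   # patterns currently cached in System 1
        self.streaks = {}      # pattern -> current consecutive qualifying outcomes

    def observe(self, pattern, outcome, confidence):
        """Record one outcome and return 'promote', 'demote', or 'hold'."""
        in_system1 = pattern in self.system1
        if in_system1:
            # In System 1 we count consecutive failures (assumed: any non-success).
            qualifies = outcome != "success"
            limit = self.demotion_threshold
        else:
            # In System 2 we count consecutive successes with confidence > 0.8.
            qualifies = outcome == "success" and confidence > self.promote_confidence
            limit = self.promotion_threshold

        # A non-qualifying outcome breaks the streak.
        self.streaks[pattern] = (self.streaks.get(pattern, 0) + 1) if qualifies else 0
        if self.streaks[pattern] < limit:
            return "hold"

        self.streaks[pattern] = 0  # start fresh in the new cache
        if in_system1:
            self.system1.discard(pattern)
            return "demote"
        self.system1.add(pattern)
        return "promote"
```

The asymmetric defaults (3 to promote, 2 to demote) make the System 1 cache slower to fill than to empty, which fits the idea that only validated patterns should be served from the fast path.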
6. Persistence
Learning state is saved to ~/.claude/artibot/:
| File | Purpose |
|---|---|
| ~/.claude/artibot/daily-experiences.json | Daily experience log (JSON array) |
| ~/.claude/artibot/learning-log.json | Record of batch learning rounds |
| ~/.claude/artibot/system1-patterns.json | Promoted System 1 patterns |
| ~/.claude/artibot/transfer-log.json | Promotion/demotion history |
| ~/.claude/artibot/evaluations.json | Self-Rewarding evaluation results |
| ~/.claude/artibot/tool-history.json | Tool-usage learning records |
| ~/.claude/artibot/patterns/ | Directory of extracted patterns |
| ~/.claude/artibot/memory/ | Memory store (errors, context, preferences) |
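As one possible way to write these files safely, the sketch below appends to daily-experiences.json using a temp-file-then-rename pattern so a crash mid-write does not leave a truncated JSON array. The helper names are hypothetical, and the atomic-write strategy is an assumption, not something the source specifies.

```python
import json
import os
import tempfile
from pathlib import Path

DEFAULT_DIR = Path.home() / ".claude" / "artibot"


def load_json(path, default):
    """Return parsed JSON from `path`, or `default` if the file does not exist."""
    path = Path(path)
    if not path.exists():
        return default
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def save_json_atomic(path, data):
    """Write to a temp file in the same directory, then rename it over the target."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(data, fh, ensure_ascii=False, indent=2)
        os.replace(tmp_name, path)  # atomic when source and target share a filesystem
    except BaseException:
        os.unlink(tmp_name)  # clean up the partial temp file before re-raising
        raise


def append_experience(entry, base_dir=DEFAULT_DIR):
    """Append one entry to daily-experiences.json and return the new entry count."""
    path = Path(base_dir) / "daily-experiences.json"
    entries = load_json(path, [])
    entries.append(entry)
    save_json_atomic(path, entries)
    return len(entries)
```

Passing `base_dir` explicitly makes the helpers easy to point at a scratch directory instead of the real ~/.claude/artibot/ store.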
7. Integration with Cognitive Routing
The lifelong learning system feeds back into the cognitive router:
- Updated thresholds are loaded at session start
- Promoted patterns are available to System 1 immediately
- Demoted patterns are flagged for System 2 re-evaluation
- Transfer history informs meta-cognitive monitoring
Configuration
Settings in artibot.config.json under learning.lifelong and learning.knowledgeTransfer:
{
"learning": {
"lifelong": { "batchSize": 50, "grpoGroupSize": 5 },
"knowledgeTransfer": { "promotionThreshold": 3, "demotionThreshold": 2 },
"schedule": { "enabled": false, "nightlyLearner": "3 2 * * *", "driftCheck": "7 6 * * 1" }
}
}
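A consumer of this file might read it with defaults filled in for any missing keys. The sketch below assumes that behaviour for illustration; `load_learning_config` and `deep_merge` are hypothetical helpers, and only the key names and default values come from the configuration shown above.

```python
import json
from copy import deepcopy

# Defaults copied from the configuration block in this section.
DEFAULTS = {
    "learning": {
        "lifelong": {"batchSize": 50, "grpoGroupSize": 5},
        "knowledgeTransfer": {"promotionThreshold": 3, "demotionThreshold": 2},
        "schedule": {
            "enabled": False,
            "nightlyLearner": "3 2 * * *",
            "driftCheck": "7 6 * * 1",
        },
    }
}


def deep_merge(base, override):
    """Return a copy of `base` with `override` applied recursively to nested dicts."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def load_learning_config(text):
    """Parse artibot.config.json content and return its `learning` section."""
    user = json.loads(text) if text.strip() else {}
    return deep_merge(DEFAULTS, user)["learning"]
```

With this approach a config that overrides only `batchSize` still yields `grpoGroupSize` of 5 and the default transfer thresholds.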
Automatic Scheduling (CronCreate)
When learning.schedule.enabled is true, the learning pipeline can be automatically scheduled
within the current Claude Code session via the CronCreate tool. Jobs are session-only (in-memory)
and auto-expire after 7 days. See the scheduled-learning skill for full setup details.
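Assuming the schedule strings use the standard five-field cron layout (minute, hour, day of month, month, day of week), the default nightlyLearner value "3 2 * * *" would run at 02:03 every day and driftCheck "7 6 * * 1" at 06:07 every Monday. The short sketch below only labels the fields; `parse_cron` is a hypothetical helper and does not call or model the CronCreate tool.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def parse_cron(expr):
    """Split a five-field cron expression into a dict of named fields."""
    parts = expr.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected {len(CRON_FIELDS)} fields, got {len(parts)}")
    return dict(zip(CRON_FIELDS, parts))


nightly = parse_cron("3 2 * * *")   # minute 3, hour 2, every day
drift = parse_cron("7 6 * * 1")     # minute 7, hour 6, day_of_week 1 (Monday)
```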
Workflow Checklist
Copy this checklist and track progress:
Progress:
- [ ] Step 1: Collect routing experiences during session
- [ ] Step 2: Batch experiences (size: 50) for GRPO processing
- [ ] Step 3: Group by domain + complexity range (group size: 5)
- [ ] Step 4: Compare System 1 vs System 2 outcomes per group
- [ ] Step 5: Update routing threshold (adaptRate * advantage)
- [ ] Step 6: Transfer knowledge — promote/demote between caches
- [ ] Step 7: Persist updated caches to disk
Human Checkpoints
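Checkpoint 1: Review GRPO comparison results (After Step 4)
Context: The System 1 vs System 2 success-rate comparison is complete. The per-group results must be checked for plausibility so that the routing threshold adjustment can be trusted.
Ask: "I have reviewed the Step 4 GRPO group comparison results. Do the success-rate differences in each group look reasonable?"
Options:
- Accept: results are reasonable; proceed to the Step 5 threshold adjustment
- Reset group data: clear the group data and re-aggregate
Default: 1 (GRPO results can be trusted when there is enough data)
Skippable: No. A faulty comparison can distort the threshold.
Freedom: LOW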
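Checkpoint 2: Confirm threshold adjustment direction (After Step 5)
Context: The routing threshold has just been adjusted using the adaptRate * advantage formula. Verify that the direction of the adjustment (raise or lower) matches the patterns actually observed.
Ask: "The routing threshold has been adjusted. Does the direction of the adjustment (more or less System 2) match the patterns observed in this session?"
Options:
- Apply: apply the adjustment and proceed to Step 6 knowledge transfer
- Revert adjustment: cancel this adjustment and keep the existing threshold
Default: 1 (safe because the step is clamped to [-0.1, 0.1])
Skippable: No. An adjustment in the wrong direction can cumulatively degrade routing quality.
Freedom: LOW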
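Checkpoint 3: Review promotion/demotion decisions (After Step 6)
Context: The decisions to move patterns between System 1 and System 2 are complete. Human judgment may be needed on whether the automatic criteria (3 consecutive successes / 2 consecutive failures) fit the context.
Ask: "Knowledge transfer decisions have been generated. Do the promote/demote/hold decisions for each pattern look sound?"
Options:
- Promote: promote the pattern to System 1
- Demote: demote the pattern to System 2
- Hold: keep the pattern where it is for this cycle
Default: Apply the automatic result (threshold-based decisions are the default)
Skippable: Yes (uses the default). Decide by the automatic criteria and proceed to Step 7 persistence.
Freedom: MEDIUM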
Freedom Levels
| Step | Freedom | Guidance |
|---|---|---|
| Collect experiences | LOW | Schema is fixed, record all fields |
| Batch processing | LOW | Batch size (50) and group size (5) are configured |
| Group by domain | MEDIUM | Domain classification may require interpretation |
| Compare outcomes | LOW | Success rate calculation is deterministic |
| Update threshold | LOW | Formula is defined, clamped to [-0.1, 0.1] |
| Knowledge transfer | LOW | Promotion (3x) and demotion (2x) thresholds are fixed |
| Persist to disk | LOW | File paths and formats are defined |
Quick Reference
Learning Cycle: Collect -> Batch (GRPO) -> Transfer -> Persist
Promotion: 3 consecutive System 2 successes -> System 1 cache
Demotion: 2 consecutive System 1 failures -> System 2 re-analysis
Storage: ~/.claude/artibot/
Rationalizations
The following table captures common excuses agents make to skip the discipline of this skill, paired with factual rebuttals.
| Excuse | Rebuttal |
|---|---|
| "learning across sessions breaks reproducibility" | reproducibility comes from versioned knowledge stores, not from amnesia — snapshot and replay |
| "GRPO is overkill for my use case" | GRPO is just group-relative comparison; it's the simplest correct way to extract preference signal from rollouts |
| "validated knowledge is stale by the time it transfers" | staleness is managed via freshness scoring; untransferred knowledge has infinite staleness |
| "System 1 to System 2 transfer introduces bugs" | transfer WITH validation catches bugs; transfer is not the risk — untested promotion is |
| "I'll curate the training set manually" | manual curation is where bias enters; automated capture with review gates is more neutral |