---
name: learning-engine
description: "Self-learning system based on SICA, VIGIL, and Mem0 patterns. Auto-triggers after task completion. Captures patterns, consolidates memory, generates skills, recovers from errors."
triggers: ["*"]
priority: 1
metadata:
  version: "4.0"
  research_basis:
    - "SICA: Self-Improving Coding Agent (arXiv:2504.15228)"
    - "VIGIL: Reflective Runtime for Self-Healing (arXiv:2512.07094)"
    - "Mem0: Memory Consolidation Architecture"
    - "Synapse: Spreading Activation Memory (arXiv:2601.02744)"
    - "ReCreate: Experience-Driven Agent Creation (arXiv:2601.11100)"
---
Learning Engine v4.0 (Self-Learning Core)
Overview
The learning engine implements a continuous self-improvement loop:
```
OBSERVE  →  REFLECT  →  PROPOSE  →  TEST     →  INTEGRATE
   ↓           ↓           ↓           ↓           ↓
Capture     Analyze     Generate    Rank        Accept if
metrics     failures    options     by merit    beneficial
& traces    & gaps
```
Self-Learning Loop (SICA Pattern)
1. Observe
- Capture all tool executions via the `learning-capture.py` hook
- Record decisions, outcomes, durations
- Store episodic traces in `~/.claude/learning/episodes/`
2. Reflect
- Analyze failures with the VIGIL pattern (`error-recovery.py` hook)
- Classify errors and suggest recovery strategies
- Generate a reflection on what went wrong
3. Propose
- When patterns repeat 5+ times, propose skill generation
- Run `scripts/generate-skill.py` to create new skills
- Store them in `~/.claude/skills/generated/`
4. Test
- Validate new patterns against success criteria
- Compare against baseline performance
- Check for regressions
5. Integrate
- Accept improvements if performance exceeds the baseline
- Consolidate memory with `scripts/consolidate-memory.py`
- Update agent capabilities and routing
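The five steps can be condensed into a single iteration function. This is a minimal sketch, not the engine's actual code: `LoopState`, `improvement_step`, and the `propose`/`evaluate` callables are hypothetical names, and a candidate is reduced to a string identifier scored by a benchmark where higher is better.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoopState:
    """Hypothetical container: current baseline score plus accepted changes."""
    baseline: float
    accepted: List[str] = field(default_factory=list)


def improvement_step(
    state: LoopState,
    failures: List[str],
    propose: Callable[[List[str]], List[str]],
    evaluate: Callable[[str], float],
) -> LoopState:
    """Run one OBSERVE -> REFLECT -> PROPOSE -> TEST -> INTEGRATE iteration.

    `failures` stands in for the observed traces after reflection;
    `propose` and `evaluate` are caller-supplied (hypothetical) hooks.
    """
    if not failures:  # REFLECT found nothing worth fixing
        return state
    candidates = propose(failures)  # PROPOSE: generate options
    if not candidates:
        return state
    # TEST: score each candidate once and pick the highest-ranked one
    scores = {c: evaluate(c) for c in candidates}
    best = max(scores, key=scores.get)
    # INTEGRATE: accept only if the best candidate beats the baseline
    if scores[best] > state.baseline:
        state.accepted.append(best)
        state.baseline = scores[best]
    return state
```

Using a strict `>` against the running baseline mirrors the Integrate step: a candidate that only ties the current best is discarded.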
Quick Classification
After task completion, classify:
- Task Type: feature | bug | refactor | test | docs | research | security
- Domain: frontend | backend | database | devops | security | ai-ml | orchestration
- Complexity: simple | moderate | complex | critical
- Agent Used: Record which agent handled the task
- Outcome: success | partial | failed | escalated
- Learning Value: low | medium | high (errors are high value)
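A classification record can be validated against the vocabularies listed above. The sketch below is hypothetical: the `TaskClassification` name is illustrative, and deriving Learning Value from the outcome and complexity is an assumed heuristic that only encodes the documented rule that errors are high value.

```python
from dataclasses import dataclass

# Allowed values, copied from the classification list above.
TASK_TYPES = {"feature", "bug", "refactor", "test", "docs", "research", "security"}
DOMAINS = {"frontend", "backend", "database", "devops", "security", "ai-ml", "orchestration"}
COMPLEXITIES = {"simple", "moderate", "complex", "critical"}
OUTCOMES = {"success", "partial", "failed", "escalated"}


@dataclass(frozen=True)
class TaskClassification:
    task_type: str
    domain: str
    complexity: str
    agent_used: str
    outcome: str

    def __post_init__(self) -> None:
        # Reject values outside the documented vocabularies.
        checks = [
            ("task_type", self.task_type, TASK_TYPES),
            ("domain", self.domain, DOMAINS),
            ("complexity", self.complexity, COMPLEXITIES),
            ("outcome", self.outcome, OUTCOMES),
        ]
        for name, value, allowed in checks:
            if value not in allowed:
                raise ValueError(f"{name}={value!r} not in {sorted(allowed)}")

    @property
    def learning_value(self) -> str:
        # Errors are high value (per the list above); the medium/low split
        # is a hypothetical heuristic, not part of the engine's spec.
        if self.outcome in ("failed", "escalated"):
            return "high"
        if self.outcome == "partial" or self.complexity in ("complex", "critical"):
            return "medium"
        return "low"
```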
Memory Architecture (Three-Tier)
Tier 1: Working Memory
- Current conversation context
- TTL: 15 minutes
- Storage: In-context
- Purpose: Immediate recall
Tier 2: Episodic Memory
- Full interaction traces with timestamps
- TTL: 30 days
- Storage: `~/.claude/learning/episodes/`
- Purpose: Experience replay, pattern extraction
Tier 3: Semantic Memory
- Consolidated facts and patterns
- TTL: Permanent
- Storage: `~/.claude/learning/semantic/`
- Purpose: Long-term knowledge
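Expiry for the three tiers reduces to a TTL lookup. This sketch uses the 15-minute, 30-day, and permanent TTLs stated above and assumes each stored record carries `tier` and `created_at` fields (hypothetical names).

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional

# TTLs from the three-tier layout above; None means the tier never expires.
TIER_TTL: Dict[str, Optional[timedelta]] = {
    "working": timedelta(minutes=15),
    "episodic": timedelta(days=30),
    "semantic": None,
}


def is_expired(tier: str, created_at: datetime, now: datetime) -> bool:
    ttl = TIER_TTL[tier]
    return ttl is not None and now - created_at > ttl


def prune(records: List[dict], now: datetime) -> List[dict]:
    """Keep only records whose tier TTL has not elapsed.

    Assumes each record has hypothetical "tier" and "created_at" keys.
    """
    return [r for r in records if not is_expired(r["tier"], r["created_at"], now)]
```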
Consolidation Process (Mem0 Pattern)
Run periodically (after every 5 interactions or at the end of a session):
```bash
python3 ~/.claude/skills/learning-engine/scripts/consolidate-memory.py
```
Operations:
- ADD: Genuinely new information
- UPDATE: Augment existing with recent details
- DELETE: Remove contradicted facts
- NOOP: Already exists or irrelevant
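The four operations can be illustrated with a small decision function. Mem0 makes this choice with an LLM that compares new facts against semantically similar memories; the sketch below substitutes exact key matching and treats an incoming `None` value as a contradiction. Both simplifications are assumptions for illustration, not a description of `consolidate-memory.py`.

```python
from typing import Dict, List, Optional, Tuple


def decide(memory: Dict[str, str], key: str, text: Optional[str]) -> str:
    """Pick a Mem0-style operation for one incoming fact.

    Hypothetical rules: facts are matched by exact key, and text=None marks
    an incoming statement that contradicts (retracts) the stored fact.
    """
    existing = memory.get(key)
    if text is None:
        return "DELETE" if existing is not None else "NOOP"
    if existing is None:
        return "ADD"      # genuinely new information
    if text in existing:
        return "NOOP"     # already known (identical or a substring)
    return "UPDATE"       # newer or more detailed version of the fact


def consolidate(
    memory: Dict[str, str], updates: List[Tuple[str, Optional[str]]]
) -> List[Tuple[str, str]]:
    """Apply updates in order, mutating `memory`; return an (operation, key) log."""
    log = []
    for key, text in updates:
        op = decide(memory, key, text)
        if op in ("ADD", "UPDATE"):
            memory[key] = text
        elif op == "DELETE":
            del memory[key]
        log.append((op, key))
    return log
```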
Dynamic Skill Generation (ReCreate Pattern)
When patterns repeat 5+ times:
```bash
python3 ~/.claude/skills/learning-engine/scripts/generate-skill.py
```
Generated skills are stored in `~/.claude/skills/generated/` with:
- SKILL.md definition
- Metadata (confidence, domain, pattern type)
- Auto-incremented instance count
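The 5+ repetition trigger and the generated metadata can be sketched as follows. The function names and the SKILL.md metadata keys (`confidence`, `domain`, `pattern_type`, `instance_count`) are assumptions chosen to match the list above; `generate-skill.py` may use different ones.

```python
from collections import Counter
from typing import Iterable, List, Set

SKILL_THRESHOLD = 5  # "When patterns repeat 5+ times"


def skills_to_generate(observed: Iterable[str], existing: Set[str]) -> List[str]:
    """Return pattern ids seen at least SKILL_THRESHOLD times that have no skill yet."""
    counts = Counter(observed)
    return sorted(p for p, n in counts.items() if n >= SKILL_THRESHOLD and p not in existing)


def render_skill_md(name: str, domain: str, pattern_type: str, confidence: float, instances: int) -> str:
    """Build a minimal SKILL.md header; the metadata field names are assumptions."""
    return (
        "---\n"
        f"name: {name}\n"
        f'description: "Generated from {instances} observed instances of {pattern_type}"\n'
        "metadata:\n"
        f"  confidence: {confidence:.2f}\n"
        f"  domain: {domain}\n"
        f"  pattern_type: {pattern_type}\n"
        f"  instance_count: {instances}\n"
        "---\n"
    )
```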
Error Recovery (VIGIL Pattern)
Error taxonomy with recovery strategies:
| Error Type | Severity | Recovery |
|---|---|---|
| file_not_found | recoverable | Check path, use Glob |
| permission_denied | recoverable | Check permissions |
| syntax_error | recoverable | Review code, validate syntax |
| import_error | recoverable | Install dependency |
| type_error | recoverable | Check signatures |
| network_error | transient | Retry, check connectivity |
| timeout_error | transient | Increase timeout, chunk work |
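The taxonomy can be applied by matching error text and retrying only transient failures. In this sketch the regular expressions, the `unknown`/`Escalate` fallback, and the exponential backoff parameters are all assumptions; only the type, severity, and recovery columns come from the table.

```python
import re
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")

# Rows mirror the table above; the regexes that recognise each type are assumptions.
TAXONOMY = [
    (r"FileNotFoundError|No such file", "file_not_found", "recoverable", "Check path, use Glob"),
    (r"PermissionError|Permission denied|EACCES", "permission_denied", "recoverable", "Check permissions"),
    (r"SyntaxError", "syntax_error", "recoverable", "Review code, validate syntax"),
    (r"ModuleNotFoundError|ImportError", "import_error", "recoverable", "Install dependency"),
    (r"TypeError", "type_error", "recoverable", "Check signatures"),
    (r"ConnectionError|Connection refused|ECONNREFUSED", "network_error", "transient", "Retry, check connectivity"),
    (r"TimeoutError|timed out|ETIMEDOUT", "timeout_error", "transient", "Increase timeout, chunk work"),
]


def classify(message: str) -> Tuple[str, str, str]:
    """Return (error_type, severity, recovery) for an error message."""
    for pattern, error_type, severity, recovery in TAXONOMY:
        if re.search(pattern, message):
            return error_type, severity, recovery
    return "unknown", "unknown", "Escalate"  # fallback is hypothetical


def retry_transient(fn: Callable[[], T], attempts: int = 3, base_delay: float = 0.5) -> T:
    """Retry fn() with exponential backoff, but only for transient errors."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:
            _, severity, _ = classify(f"{type(exc).__name__}: {exc}")
            if severity != "transient" or attempt == attempts - 1:
                raise  # recoverable errors need a fix, not a retry
            time.sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```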
Gap Detection Signals
- No good agent match found for task
- User corrections needed after completion
- Repeated explanations required
- Task outside known domains
- Low success rate (<70%) on task type
- Multiple escalations needed (3+)
- Confidence below threshold (0.7)
When a gap is detected:
- Log to `~/.claude/learning/gaps/`
- Suggest agent creation
- Track for pattern emergence
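The gap signals translate into threshold checks. This is a hypothetical sketch: the `TaskSignals` fields are illustrative, the 70% success-rate floor, 0.7 confidence threshold, and 3-escalation limit come from the list above, and reading "repeated explanations" as more than one explanation is an assumption.

```python
from dataclasses import dataclass
from typing import List

SUCCESS_RATE_FLOOR = 0.7
CONFIDENCE_FLOOR = 0.7
ESCALATION_LIMIT = 3


@dataclass
class TaskSignals:
    """Per-task observations; field names are hypothetical."""
    agent_matched: bool
    user_corrections: int
    repeated_explanations: int
    in_known_domain: bool
    task_type_success_rate: float  # historical rate for this task type, 0..1
    escalations: int
    routing_confidence: float


def detect_gaps(s: TaskSignals) -> List[str]:
    """Return the name of every gap signal that fired (empty list means no gap)."""
    fired = []
    if not s.agent_matched:
        fired.append("no_agent_match")
    if s.user_corrections > 0:
        fired.append("user_corrections")
    if s.repeated_explanations > 1:  # "repeated" read as more than once (assumption)
        fired.append("repeated_explanations")
    if not s.in_known_domain:
        fired.append("unknown_domain")
    if s.task_type_success_rate < SUCCESS_RATE_FLOOR:
        fired.append("low_success_rate")
    if s.escalations >= ESCALATION_LIMIT:
        fired.append("multiple_escalations")
    if s.routing_confidence < CONFIDENCE_FLOOR:
        fired.append("low_confidence")
    return fired
```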
Storage Structure
```
~/.claude/learning/
├── episodes/                  # Episodic memory (by date)
│   └── YYYY-MM-DD-episodes.jsonl
├── semantic/                  # Semantic memory (consolidated)
│   ├── facts.jsonl
│   ├── patterns.jsonl
│   └── decisions.jsonl
├── patterns/                  # Extracted patterns (by domain)
│   ├── go/
│   ├── typescript/
│   └── general/
├── solutions/                 # Problem-solution pairs
│   ├── error_resolution/
│   └── feature_implementation/
├── gaps/                      # Detected capability gaps
│   └── index.json
├── metrics/                   # Usage statistics
│   └── index.json
├── errors/                    # Error log for VIGIL
│   └── error-log.jsonl
└── consolidated/              # Consolidation history
    └── consolidation-log.jsonl
```
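Episodes are stored as one JSON object per line in a file named by date. A minimal sketch of writing and reading that layout, assuming each episode is a plain JSON-serializable dict (the helper names are hypothetical):

```python
import json
from datetime import date
from pathlib import Path
from typing import List


def episode_path(root: Path, day: date) -> Path:
    # Mirrors episodes/YYYY-MM-DD-episodes.jsonl from the tree above.
    return root / "episodes" / f"{day.isoformat()}-episodes.jsonl"


def append_episode(root: Path, episode: dict, day: date) -> Path:
    """Append one episode as a single JSON line, creating directories as needed."""
    path = episode_path(root, day)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(episode, ensure_ascii=False) + "\n")
    return path


def read_episodes(root: Path, day: date) -> List[dict]:
    """Load all episodes for one day; returns [] if the file does not exist."""
    path = episode_path(root, day)
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

In practice `root` would be `Path.home() / ".claude" / "learning"`. Appending one line per record keeps writes cheap and lets `tail ... | jq .` read the file directly.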
Pattern Storage Format
```json
{
  "id": "auth-jwt-refresh",
  "domain": "backend",
  "type": "code_pattern",
  "title": "JWT Token Refresh",
  "description": "Pattern for refreshing expired JWT tokens",
  "code_snippet": "...",
  "tags": ["auth", "jwt", "security"],
  "usage_count": 5,
  "success_rate": 0.9,
  "confidence": 0.85,
  "created": "2026-01-15",
  "last_used": "2026-01-26",
  "hash": "abc123def456"
}
```
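The record above maps directly onto a dataclass, so a stored line can be loaded with `Pattern(**json.loads(line))`. In this sketch, how `hash` is computed (a 12-character SHA-256 prefix over title and snippet) and the running-mean update of `success_rate` are assumptions; the running mean is only exact if `success_rate` was computed over the previous `usage_count` uses.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from datetime import date
from typing import List


@dataclass
class Pattern:
    # Field names match the JSON record above.
    id: str
    domain: str
    type: str
    title: str
    description: str
    code_snippet: str
    tags: List[str] = field(default_factory=list)
    usage_count: int = 0
    success_rate: float = 0.0
    confidence: float = 0.0
    created: str = ""
    last_used: str = ""
    hash: str = ""

    def compute_hash(self) -> str:
        # Assumption: first 12 hex chars of SHA-256 over title + snippet, as a dedup key.
        data = f"{self.title}\n{self.code_snippet}".encode("utf-8")
        return hashlib.sha256(data).hexdigest()[:12]

    def record_use(self, succeeded: bool, on: date) -> None:
        """Bump usage_count and fold the outcome into success_rate (running mean)."""
        self.usage_count += 1
        self.success_rate += (float(succeeded) - self.success_rate) / self.usage_count
        self.last_used = on.isoformat()

    def to_json(self) -> str:
        return json.dumps(asdict(self))
```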
Hooks Integration
learning-capture.py (PostToolUse)
- Captures all tool executions
- Extracts patterns from successful interactions
- Updates metrics
error-recovery.py (PostToolUse)
- Analyzes errors with VIGIL pattern
- Suggests recovery strategies
- Logs for future learning
telemetry-collector.py (PostToolUse)
- Tracks routing decisions
- Measures latency and cost
- Feeds into optimization
context-optimizer.py (PreToolUse)
- Applies progressive disclosure
- Manages token budget
- Tracks context usage
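A PostToolUse hook such as `learning-capture.py` has to turn each tool call into an episode record. This sketch assumes the hook receives the call as JSON on stdin with `session_id`, `tool_name`, `tool_input`, and `tool_response` keys, and that a truthy `error` key in the response marks a failure; `build_episode` and `handle` are hypothetical names.

```python
import json
import time
from typing import Any, Dict, Optional


def build_episode(payload: Dict[str, Any], now: Optional[float] = None) -> Dict[str, Any]:
    """Map a PostToolUse payload to an episode record.

    Assumed payload keys: session_id, tool_name, tool_input, tool_response.
    """
    response = payload.get("tool_response")
    # Assumption: a dict response carrying a truthy "error" key marks a failure.
    failed = isinstance(response, dict) and bool(response.get("error"))
    return {
        "ts": time.time() if now is None else now,
        "session": payload.get("session_id"),
        "tool": payload.get("tool_name"),
        "input": payload.get("tool_input"),
        "outcome": "failed" if failed else "success",
    }


def handle(raw: str) -> Dict[str, Any]:
    """Entry point for a hook script: parse the stdin text, return the episode."""
    return build_episode(json.loads(raw))
```

A hook script would call `handle(sys.stdin.read())` and append the result to the current day's episode file.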
Commands
```bash
# Run memory consolidation
python3 ~/.claude/skills/learning-engine/scripts/consolidate-memory.py

# Generate skills from patterns
python3 ~/.claude/skills/learning-engine/scripts/generate-skill.py

# View learning metrics
jq . ~/.claude/learning/metrics/index.json

# View recent episodes
tail -20 ~/.claude/learning/episodes/$(date +%Y-%m-%d)-episodes.jsonl | jq .

# View error log
tail -20 ~/.claude/learning/errors/error-log.jsonl | jq .
```
Performance Targets
| Metric | Target | Measurement |
|---|---|---|
| Pattern Extraction | 80%+ success rate | patterns with usage_count > 3 |
| Memory Consolidation | <500ms latency | consolidation run time |
| Skill Generation | 70%+ confidence | generated skill confidence |
| Error Recovery | 60%+ auto-resolution | errors with recovery applied |
| Gap Detection | 90%+ coverage | detected vs undetected gaps |
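The targets can be checked mechanically once each rate has been measured. This sketch assumes the caller already computes every rate as a fraction in [0, 1] (how Pattern Extraction success is counted is left open, as in the table); the metric keys and helper names are hypothetical.

```python
import time
from typing import Callable, Dict

# Thresholds from the Performance Targets table (rates as fractions).
RATE_TARGETS: Dict[str, float] = {
    "pattern_extraction": 0.80,
    "skill_generation_confidence": 0.70,
    "error_auto_resolution": 0.60,
    "gap_detection_coverage": 0.90,
}
MAX_CONSOLIDATION_MS = 500.0


def ratio(part: int, whole: int) -> float:
    """Safe fraction, e.g. resolved / total errors; 0.0 when there is no data."""
    return part / whole if whole else 0.0


def time_ms(fn: Callable[[], object]) -> float:
    """Wall-clock duration of fn() in milliseconds."""
    start = time.perf_counter()
    fn()
    return (time.perf_counter() - start) * 1000.0


def check_targets(measured: Dict[str, float]) -> Dict[str, bool]:
    """Return pass/fail per metric; missing metrics count as failures."""
    results = {name: measured.get(name, 0.0) >= target for name, target in RATE_TARGETS.items()}
    results["consolidation_latency"] = measured.get("consolidation_ms", float("inf")) < MAX_CONSOLIDATION_MS
    return results
```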
Advanced Resources
- @scripts/generate-skill.py - Dynamic skill generation
- @scripts/consolidate-memory.py - Memory consolidation
- @resources/advanced-metrics.md - Detailed metrics guide
- @resources/error-taxonomy.md - Full error classification