name: evolve description: "Ring 3 evolution engine. Analyzes session observations, generates and improves evolved skills, shows metrics dashboard. Subcommands: status, history, rollback, reset. Use for post-session review and skill improvement."
/evolve — Manual Evolution Trigger
CRITICAL: Run HARNESS_DIR=$(epic path) first. NEVER use .harness/ in the project directory.
You are the Evolution Engine — analyze past sessions to improve skills.
Sub-commands
/evolve (default) — Run evolution now
- Read observation logs from
$HARNESS_DIR/obs/ - Analyze failure patterns across all sessions
- Identify weak areas (error types, recurring failures)
- Generate or improve evolved skills in
$HARNESS_DIR/evolved/ - Gate: validate new skills (format, dedup, cap of 10)
- Report what changed
/evolve status — Show evolution dashboard
Read $HARNESS_DIR/metrics.json and $HARNESS_DIR/evolution.jsonl, then display:
## Evolution Dashboard
### Overview
- Sessions analyzed: {total_sessions}
- Average success rate: {avg_success_rate}%
- Best score: {best_score} (session: {best_session})
- Trend: {trend} ({score_history.length} data points)
- Stagnation count: {stagnation_count} / 3 (rollback at 3)
### Score History (last 5 sessions)
| Session | Success Rate | Avg Score | Observations | Tool Success | Output Quality |
|---------|-------------|-----------|--------------|-------------|---------------|
### Evolved Skills
(list $HARNESS_DIR/evolved/*/SKILL.md with name and description from frontmatter)
### Last Session Analysis
(read last entry from evolution.jsonl)
- Error patterns: {error_patterns}
- Failure patterns: {failure_patterns[].pattern_type}
- Skills seeded: {skills_seeded}
- Skills rolled back: {skills_rolled_back}
- Analysis: {analysis_summary}
/evolve history — Long-term analysis
Read $HARNESS_DIR/evolution.jsonl (full history), then display:
## Evolution History
### Trend Over Time
| Session # | Date | Success Rate | Avg Score | Skills | Patterns |
|-----------|------|-------------|-----------|--------|----------|
### Cumulative Pattern Frequency
| Pattern | Total Count | First Seen | Last Seen |
|---------|-------------|------------|-----------|
### Skill Effectiveness
| Skill | Sessions Active | Avg Score With | Avg Score Without | Delta |
|-------|----------------|----------------|-------------------|-------|
### Dispatch Analysis
| Skill | Times Invoked | Top Trigger Signals |
|-------|--------------|---------------------|
/evolve rollback — Undo last evolution
- If
$HARNESS_DIR/evolved_backup/exists, restore it to$HARNESS_DIR/evolved/ - Otherwise, read
$HARNESS_DIR/evolution.jsonlfor last entry, remove skills seeded in that entry - Append a rollback record to evolution.jsonl
- Report what was rolled back
/evolve reset — Clear all evolution data
- Remove
$HARNESS_DIR/evolved/,$HARNESS_DIR/evolved_backup/ - Clear
metrics.jsonandevolution.jsonl - Confirm with user first
How Evolution Works
Observe (PostToolUse — multi-dimensional scoring)
↓ $HARNESS_DIR/obs/session_YYYYMMDD.jsonl
Analyze (Stop or /evolve)
↓ SessionAnalysis: per-tool, per-ext, score distribution
↓ Pattern detection: repeated_same_error, fix_then_break, long_debug_loop, thrashing
Seed (auto-generate targeted skills)
↓ 4 seeding paths: pattern / weak tool / weak file type / high-freq error
Gate (validate: format, dedup, cap of 10)
↓ Stagnation check: 3 sessions no improvement → rollback to best checkpoint
Reload (next session resume reports metrics + loads evolved skills)
Scoring System
- Composite:
0.5 × tool_success + 0.3 × output_quality + 0.2 × execution_cost - Failure classification: 9 categories (type_error, syntax_error, test_fail, lint_fail, build_fail, permission_denied, timeout, not_found, runtime_error)
- Pattern detection: 4 types (repeated_same_error, fix_then_break, long_debug_loop, thrashing)
Red Flags
- Evolving after only 1-2 observations (not enough data)
- Keeping evolved skills that never trigger
- Not reviewing evolved skills periodically
- Ignoring stagnation warnings