name: phase-4-heal description: Systematic error recovery using root cause investigation before fixes allowed-tools: [Read, Write, Bash, WebSearch, Grep, Glob] user-invocable: false
Phase 4: Systematic Healing
Goal
Systematically investigate and fix implementation errors using root cause analysis. NO FIXES WITHOUT INVESTIGATION FIRST.
The Iron Law
NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST
If you haven't completed Phase 1 (Root Cause Investigation), you CANNOT propose fixes.
Input
- Task ID:
{task_id} - Error message:
{error_msg} - Error context: Stack trace, test output, build logs
Workflow
Step 0: Initialize CLI (Automatic)
IMPORTANT: This skill requires the Ralph-dev CLI. It will build automatically on first use.
# Bootstrap CLI - runs automatically, builds if needed
source ${CLAUDE_PLUGIN_ROOT}/shared/bootstrap-cli.sh
# Verify CLI is ready
ralph-dev --version
# Context-compression resilience: Verify current phase
CURRENT_PHASE=$(ralph-dev state get --json 2>/dev/null | jq -r '.phase // "none"')
echo "Current phase: $CURRENT_PHASE"
# Expected: implement or heal (invoked by phase-3)
Step 1: Load Context
# Get task details (context-compression safe)
TASK_JSON=$(ralph-dev tasks get "$TASK_ID" --json)
TASK_DESC=$(echo "$TASK_JSON" | jq -r '.data.description // "No description"')
# Get test command from language config
TEST_CMD=$(ralph-dev detect --json | jq -r '.verifyCommands[] | select(contains("test"))' | head -1)
echo "Task: $TASK_ID"
echo "Description: $TASK_DESC"
echo "Test command: $TEST_CMD"
The Four Phases
MANDATORY: Complete each phase before proceeding to the next.
Phase 1: Root Cause Investigation
- Read error completely - Extract line number, file path, error code
- Reproduce consistently - Run
CI=true {test_command}to confirm - Check recent changes -
git log -5,git diff HEAD~1 - Trace data flow - Find where the bad value originates
Phase 2: Pattern Analysis
- Find working examples - Search codebase for similar working patterns
- Compare working vs broken - Identify key differences
- Check dependencies - Verify modules are installed and versions match
Phase 3: Hypothesis and Testing
- Form hypothesis - Based on investigation, classify error type
- Use WebSearch - Confirm hypothesis with proven solutions
- Validate approach - Ensure fix addresses root cause, not symptoms
Error Classification:
| Error Type | Indicators | Fix Type |
|---|---|---|
| Missing dependency | "Module not found", "Cannot find module" | npm install |
| Type error | "TypeError", type mismatch | Code fix |
| Undefined reference | "is not defined", "ReferenceError" | Import/declaration |
| Test failure | "Expected X but got Y" | Implementation logic |
| Build error | "compilation error", "build failed" | Config fix |
Phase 4: Implementation with Verification
Max 3 attempts. After 3 failures → mark task as failed.
MAX_ATTEMPTS=3
ATTEMPT=1
while [ $ATTEMPT -le $MAX_ATTEMPTS ]; do
# Apply single fix based on hypothesis
# Run tests: CI=true {test_command}
if [ $? -eq 0 ]; then
# SUCCESS - return result
break
fi
ATTEMPT=$((ATTEMPT + 1))
# If still failing, refine hypothesis and retry
done
CRITICAL: Always show full test output as verification evidence.
Return Result
REQUIRED Output Format:
---HEALING RESULT---
task_id: {task_id}
status: success | failed
verification_passed: true | false
attempts: {N}
fix_type: dependency | code | implementation | config
hypothesis: {what was wrong}
notes: {brief summary}
---END HEALING RESULT---
Safety Limits
| Limit | Value | Purpose |
|---|---|---|
| Max attempts | 3 | Prevent infinite retry loops |
| Timeout | 1 hour | Prevent stuck healing |
| Single fix per attempt | 1 | Isolate variables |
Constraints
- NEVER apply fixes without completing Phase 1 investigation
- NEVER apply multiple fixes at once (isolate variables)
- NEVER retry more than 3 times
- ALWAYS use
CI=truewhen running tests - ALWAYS show full test output as verification evidence
- ALWAYS return structured HEALING RESULT block
Error Handling
| Error | Action |
|---|---|
| Cannot reproduce | Gather more data, check environment |
| No working examples | Use WebSearch for reference implementations |
| All 3 attempts fail | Mark task failed, require manual intervention |
| Hypothesis unclear | Return to Phase 1 with more investigation |
Key Principles
- Investigation before action - Understand the problem first
- One fix at a time - Isolate variables for debugging
- Evidence-based - Show test output, don't just assert success
- WebSearch to confirm - Don't guess, find proven solutions
- Circuit breaker - 3 attempts max, then escalate