name: harness-executor
description: Execute development tasks autonomously with self-validation. Auto-bootstraps harness via harness-creator if missing. Use when the user asks to implement features, fix bugs, refactor code, execute plans, or make any code change in an existing or new codebase.
Harness Executor
Execute development tasks autonomously: setup → plan → execute → validate → verify → record → present.
Core Philosophy: "The Agent Harness is the Operating System. The LLM is just the CPU." Verify your changes mechanically through automated checks, not hope.
Architecture Principle: The coordinator manages state; subagents execute code. The coordinator spawns subagents for code changes and verification. Subagents never call task_state.py.
Script Execution
This skill bundles helper scripts in its scripts/ subdirectory. Before running any script, determine this skill's installation directory from the path of this SKILL.md file, and set:
SKILL_DIR="<directory containing this SKILL.md>"
Then call scripts as: python3 "$SKILL_DIR/scripts/xxx.py". All bash examples below assume SKILL_DIR has been set this way.
Execution Flow
Every task follows the same seven steps. No exceptions, no shortcuts.
COORDINATOR
═══════════════════════════════════════════
1. SETUP bootstrap → check interrupted → query memory → load context
2. PLAN scope the work → init state → (multi-phase: plan file + user approval)
3. EXECUTE spawn executor subagent → make code changes → checkpoint
4. VALIDATE static validation (build, lint, test)
5. VERIFY spawn verifier subagent → functional verification (MANDATORY)
6. RECORD task_state.py complete → episodic memory → AutoHarness
7. PRESENT results summary to user
═══════════════════════════════════════════
⚠️ CRITICAL: Steps 4 and 5 are BOTH mandatory for ALL tasks. Static validation proves code compiles. Functional verification proves code works. Never skip Step 5.
Step 1: Setup
1.1 Bootstrap Harness
test -f AGENTS.md && echo "HARNESS_EXISTS=true" || echo "HARNESS_EXISTS=false"
If HARNESS_EXISTS=false: invoke Skill(skill="harness-creator") first.
1.2 Check Interrupted Tasks
python3 "$SKILL_DIR/scripts/task_state.py" list
If an in_progress task matches the current request → Resume Protocol (see below).
1.3 Query Memory
if [ -d "harness/memory" ]; then
python3 "$SKILL_DIR/scripts/memory_query.py" search "<relevant-keyword>" --json 2>/dev/null || echo '{"results": []}'
else
echo "No memory store yet — skipping"
fi
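The same guard and fallback can be written in Python if the coordinator calls the script programmatically. This is a minimal sketch rather than part of the skill: the function name `query_memory` is hypothetical, and it assumes only what the shell snippet shows, namely that `memory_query.py search <keyword> --json` prints JSON and that any failure should degrade to an empty result set.

```python
import json
import subprocess
from pathlib import Path


def query_memory(skill_dir: str, keyword: str, project_root: str = ".") -> dict:
    """Return memory search results, or an empty result set on any failure."""
    empty = {"results": []}
    # Mirror the shell guard: skip entirely when there is no memory store yet.
    if not (Path(project_root) / "harness" / "memory").is_dir():
        return empty
    cmd = ["python3", str(Path(skill_dir) / "scripts" / "memory_query.py"),
           "search", keyword, "--json"]
    try:
        proc = subprocess.run(cmd, cwd=project_root, capture_output=True,
                              text=True, check=True)
        return json.loads(proc.stdout)
    except (OSError, subprocess.CalledProcessError, json.JSONDecodeError):
        # A missing interpreter, non-zero exit, or unparsable output all
        # count as "no results", like the `|| echo` fallback in the shell.
        return empty
```

Returning the empty structure instead of raising keeps Step 1 non-blocking: a missing or broken memory store never stops a task from starting.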
1.4 Load Context
Read: AGENTS.md, docs/ARCHITECTURE.md, docs/DEVELOPMENT.md.
Extract: build command, test command, lint command, validation script path.
Step 2: Plan
All tasks: Identify files to modify/create, decide the approach, initialize task state.
Multi-phase tasks (touching 3+ files or requiring sequential changes): Write a plan file, get user approval.
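The multi-phase rule is mechanical enough to write down. A minimal sketch, where the function name and parameters are hypothetical and only the "3+ files or sequential changes" threshold comes from the text:

```python
def is_multi_phase(files_touched: list[str], needs_sequential_changes: bool = False) -> bool:
    """Apply the Step 2 rule: 3+ files or sequential changes means multi-phase."""
    # De-duplicate so a file listed under both "modify" and "create" counts once.
    return len(set(files_touched)) >= 3 or needs_sequential_changes
```

A task that returns `True` here needs a plan file and user approval before Step 3.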
Initialize Task State (all tasks)
TASK_ID=$(python3 "$SKILL_DIR/scripts/task_state.py" init "<task-name>" \
--phases <N> \
--description "<description>" \
--plan-path "docs/exec-plans/active/YYYY-MM-DD-<slug>.md") # optional for single-phase
echo "Task ID: $TASK_ID"
Multi-Phase Plan File
mkdir -p docs/exec-plans/active
Write to docs/exec-plans/active/YYYY-MM-DD-<task-slug>.md:
# [Task Name]
**Created**: YYYY-MM-DD
## Goal
One sentence describing what success looks like.
## Scope
- **Files to modify**: [list]
- **Files to create**: [list]
## Phases
### Phase 1: [Name]
- [ ] Step 1.1: [action]
- **Validates with**: `[command]`
### Phase 2: [Name]
- [ ] Step 2.1: [action]
- **Validates with**: `[command]`
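The plan file name follows the `YYYY-MM-DD-<task-slug>.md` pattern. One way to derive it is sketched below; the slug rule (lowercase, non-alphanumeric runs collapsed to `-`) is an assumption, since the skill does not define how slugs are formed, and `plan_path` is a hypothetical helper.

```python
import re
from datetime import date
from pathlib import Path


def plan_path(task_name: str, created: date) -> Path:
    """Build docs/exec-plans/active/YYYY-MM-DD-<slug>.md for a task."""
    # Assumed slug rule: lowercase, runs of non-alphanumerics become "-".
    slug = re.sub(r"[^a-z0-9]+", "-", task_name.lower()).strip("-")
    return Path("docs/exec-plans/active") / f"{created.isoformat()}-{slug}.md"
```

The resulting path can be passed as `--plan-path` when initializing task state.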
Multi-Phase User Approval
Use AskUserQuestion with options: Approve / Approve with changes / Reject.
Step 3: Execute
Spawn an executor subagent to make code changes. The coordinator never writes code directly.
Executor Subagent Prompt
Agent(
description="Execute: [task-name]",
prompt="""
You are a code executor. Your ONLY job is to make code changes.
## Task
[task description]
## Project Root
[absolute path]
## Files to Modify/Create
[explicit list]
## Validation Command
After making changes, run:
[project-specific command, e.g., go build ./... && make lint-arch]
## Prior Lessons
[paste lessons from memory_query, or "none"]
## Output Format
Return this JSON block at the end of your response:
```json
{
"status": "success | failed | blocked",
"summary": "one paragraph describing what you did",
"files_changed": ["file1.go", "file2.go"],
"files_created": ["new_file.go"],
"validation_result": "pass | fail",
"validation_output": "relevant output if failed",
"lessons": ["any insights worth remembering"],
"blockers": ["if blocked, describe what's stopping you"]
}
```
## Rules
- Focus ONLY on making code changes
- Do NOT manage task state or checkpoints; the coordinator handles that
- If validation fails, fix and retry (max 3 attempts)
- If blocked, return with status "blocked"
"""
)
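Because the executor returns its result as a JSON block at the end of free-form text, the coordinator has to extract and check it before acting. A minimal sketch, assuming the block is wrapped in a json-tagged fence as the prompt requests; `parse_executor_result` is a hypothetical helper, not one of the bundled scripts.

```python
import json
import re

VALID_STATUSES = {"success", "failed", "blocked"}


def parse_executor_result(response: str) -> dict:
    """Extract and sanity-check the last json-fenced block in a subagent response."""
    # `{3} matches a triple-backtick fence without writing one literally here.
    blocks = re.findall(r"`{3}json\s*(.*?)`{3}", response, flags=re.DOTALL)
    if not blocks:
        raise ValueError("no json block found in executor response")
    result = json.loads(blocks[-1])  # malformed JSON also raises ValueError
    if result.get("status") not in VALID_STATUSES:
        raise ValueError(f"unexpected status: {result.get('status')!r}")
    return result
```

Taking the last block matches the instruction that the JSON appears at the end of the response, and rejecting unknown statuses keeps the template value `"success | failed | blocked"` from slipping through unedited.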
### Checkpoint (after successful executor return)
```bash
python3 "$SKILL_DIR/scripts/task_state.py" checkpoint \
--task-id "$TASK_ID" \
--phase <N> \
--summary "<phase summary from subagent>" \
--files-changed <file1> <file2> \
--decisions '["key decisions from subagent lessons"]'
```
Failure Handling
| Subagent Status | Action |
|---|---|
| `success` | Continue to Step 4 |
| `failed` | Retry with additional context (max 2 retries) |
| `blocked` | Escalate to user |
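The failure-handling table amounts to a small decision function. In the sketch below, the function name and the returned action labels are hypothetical, and escalating once the two retries are spent is an assumption carried over from the retry rule in Step 4.

```python
MAX_RETRIES = 2  # from the table: "max 2 retries"


def next_action(status: str, retries_used: int) -> str:
    """Map an executor status to the coordinator's next move."""
    if status == "success":
        return "validate"  # continue to Step 4
    if status == "failed":
        # Retry with extra context until the retry budget is spent (assumed: then escalate).
        return "retry" if retries_used < MAX_RETRIES else "escalate"
    if status == "blocked":
        return "escalate"  # hand the blocker to the user
    raise ValueError(f"unknown status: {status!r}")
```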
Step 4: Validate (Static)
Run static validation to ensure code compiles and passes lints/tests.
if [ -f "scripts/validate.py" ]; then
python3 scripts/validate.py .
else
# Use commands from docs/DEVELOPMENT.md
<build-command> && <lint-command> && <test-command>
fi
If static validation fails:
- Analyze error output
- Return to Step 3 with fix instructions (spawn executor again)
- Max 2 retries, then escalate to user
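When there is no `scripts/validate.py`, the build, lint, and test commands run in order and stop at the first failure. A minimal Python sketch of that fallback follows; the function name and the shape of the returned dict are hypothetical, and the commands themselves would come from docs/DEVELOPMENT.md.

```python
import subprocess


def run_static_validation(commands: list[str], cwd: str = ".") -> dict:
    """Run build, lint, and test commands in order, stopping at the first failure."""
    for command in commands:
        proc = subprocess.run(command, shell=True, cwd=cwd,
                              capture_output=True, text=True)
        if proc.returncode != 0:
            # Keep the output so it can be passed back to the executor as fix context.
            return {"status": "fail", "failed_command": command,
                    "output": proc.stdout + proc.stderr}
    return {"status": "pass", "failed_command": None, "output": ""}
```

Capturing the failing command and its output gives the coordinator the "fix instructions" it needs when it returns to Step 3.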
Step 5: Verify (Functional) — MANDATORY
⚠️ This step is MANDATORY for ALL tasks. Do NOT skip to Step 6 without completing verification.
Static checks only prove code compiles. Functional verification proves code works — by starting the actual application, making real HTTP requests, and verifying observable behavior.
5.1 Design Verification Scenarios
Based on what changed, design 1-3 task-specific scenarios (see references/scenario-design-guide.md):
| Change Type | Scenarios to Design |
|---|---|
| New endpoint | Create success, validation error, persistence check |
| Modified endpoint | New behavior works, old behavior unchanged |
| New validation | Valid input accepted, invalid input rejected |
| Permission change | Authorized user succeeds, unauthorized user rejected |
| Bug fix | The specific bug is fixed |
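The table can be kept as data so that scenario design starts from the same outlines every time. This is only a sketch: the dictionary keys, the `design_scenarios` helper, and the `name`/`target` fields are hypothetical, and the schema the verifier actually expects is described in references/scenario-design-guide.md.

```python
# Scenario outlines per change type, taken from the table above.
SCENARIO_OUTLINES = {
    "new_endpoint": ["create success", "validation error", "persistence check"],
    "modified_endpoint": ["new behavior works", "old behavior unchanged"],
    "new_validation": ["valid input accepted", "invalid input rejected"],
    "permission_change": ["authorized user succeeds", "unauthorized user rejected"],
    "bug_fix": ["the specific bug is fixed"],
}


def design_scenarios(change_type: str, target: str) -> list[dict]:
    """Expand a change type into 1-3 scenario stubs for the verifier prompt."""
    return [
        {"name": outline, "target": target}  # hypothetical fields
        for outline in SCENARIO_OUTLINES[change_type]
    ]
```

For example, `design_scenarios("new_endpoint", "POST /orders")` yields three stubs that still need concrete requests and expected responses filled in.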
5.2 Spawn Verifier Subagent
Agent(
description="Functional Verifier: [task-name]",
prompt="""
You are a Functional Verifier agent. Read the verifier guide at:
$SKILL_DIR/agents/verifier.md
## Task Context
- Project root: [absolute path]
- Task description: [what was implemented]
- Files changed/created: [list]
## Environment Context (from environment.json if exists)
- Startup: [command], Readiness: [check config]
- Services: [databases, caches], Env Vars: [required vars]
## Scenarios to Verify
[your designed scenarios as JSON array]
## Your Responsibilities
1. Start the application server
2. Execute ALL scenarios
3. For each: verify behavior AND side effects with real HTTP requests
4. Stop the server cleanly
5. Save results to: harness/trace/verification-report.json
## Output Requirements
Your verification-report.json MUST include:
- server.started: true (prove you started the app)
- At least one scenario with request/response evidence
"""
)
5.3 Handle Verifier Result
| Result | Action |
|---|---|
| `pass` | Continue to Step 6 |
| `partial` | Fix failing scenarios related to task, log unrelated as warnings |
| `fail` | Return to Step 3 with fix instructions, max 2 retries, then escalate |
5.4 If Verification Cannot Run
If the application cannot be started (no server, library project, missing infrastructure), write a skip report:
mkdir -p harness/trace
cat > harness/trace/verification-report.json << 'EOF'
{
"overall_status": "skip",
"skip_reason": "[explain why: e.g., 'Library project with no runnable server', 'Missing required database']",
"server": {"started": false},
"task_specific_scenarios": [],
"summary": {"task_specific_total": 0, "task_specific_passed": 0, "pass_rate": 0}
}
EOF
Step 6: Record & Complete
Complete Task
python3 "$SKILL_DIR/scripts/task_state.py" complete \
--task-id "$TASK_ID" \
--summary "Completed: <overall summary>" \
--files-changed file1 file2 \
--files-created new_file \
--validation '{"build": "pass", "lint": "pass", "test": "pass"}' \
--lessons '["lesson1", "lesson2"]'
⚠ Completion Gate: `complete` checks for `harness/trace/verification-report.json`. It rejects if:
- File is missing (Step 5 was skipped)
- Report lacks `server.started` or HTTP evidence (unless `overall_status: "skip"`)
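The gate's rules can be checked locally before calling `complete`. The sketch below is an approximation, not the actual logic in task_state.py: `gate_rejection_reason` is a hypothetical helper, and the assumption that each scenario carries `request` and `response` keys as evidence is illustrative. The field names `overall_status`, `server.started`, and `task_specific_scenarios` come from the skip report shown in Step 5.4.

```python
from typing import Optional


def gate_rejection_reason(report: Optional[dict]) -> Optional[str]:
    """Return why completion would be rejected, or None if the report passes."""
    if report is None:
        return "verification-report.json is missing (Step 5 was skipped)"
    if report.get("overall_status") == "skip":
        return None  # skip reports are exempt from the evidence checks
    if not report.get("server", {}).get("started"):
        return "report lacks server.started"
    scenarios = report.get("task_specific_scenarios", [])
    # Assumed evidence shape: each scenario carries "request" and "response".
    if not any(s.get("request") and s.get("response") for s in scenarios):
        return "report lacks HTTP request/response evidence"
    return None
```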
Move plan file (if exists):
mkdir -p docs/exec-plans/completed
mv "docs/exec-plans/active/<plan-file>.md" "docs/exec-plans/completed/" 2>/dev/null || true
AutoHarness Check
TASK_COUNT=$(python3 "$SKILL_DIR/scripts/task_state.py" list --json 2>/dev/null | \
python3 -c "import sys,json; d=json.load(sys.stdin); print(len([t for t in d if t.get('status')=='completed']))" 2>/dev/null || echo 0)
if [ "$TASK_COUNT" -ge 3 ]; then
python3 "$SKILL_DIR/scripts/harness_critic.py" --since 7d 2>/dev/null || true
fi
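The inline `python3 -c` one-liner is easier to read as a named function. This sketch assumes, as the one-liner does, that `task_state.py list --json` prints a JSON array of task objects with a `status` field; `should_run_critic` is a hypothetical name.

```python
import json


def should_run_critic(task_list_json: str, threshold: int = 3) -> bool:
    """True once at least `threshold` tasks have status 'completed'."""
    try:
        tasks = json.loads(task_list_json)
    except json.JSONDecodeError:
        return False  # same effect as the shell fallback to a count of 0
    if not isinstance(tasks, list):
        return False
    completed = [t for t in tasks
                 if isinstance(t, dict) and t.get("status") == "completed"]
    return len(completed) >= threshold
```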
Step 7: Present Results
## Task Complete
### Changes Made
- Modified `path/to/file` — [what changed]
- Created `path/to/new-file` — [purpose]
### Validation Results
- Build: PASS | Lint: PASS | Test: PASS
### Verification Results
- Server started: YES
- Scenarios: [N] designed, [N] passed
- Evidence: [summary of what was verified]
### Lessons Recorded
- [aggregated lessons]
### Next Steps
1. Create PR
2. Commit to current branch
Resume Protocol
When Step 1.2 finds an interrupted task:
python3 "$SKILL_DIR/scripts/task_state.py" show --task-id <TASK_ID> --json
Resume from the last successful checkpoint:
- Read `harness/tasks/<task-id>/state/context.json`
- Pass context to subagent for the next phase
- Continue the execution loop
Reference Files
| File | When to Read | Contents |
|---|---|---|
| `agents/verifier.md` | Step 5.2: spawn Functional Verifier | Verifier subagent instructions, bootstrap protocol, output format |
| `references/scenario-design-guide.md` | Step 5.1: designing scenarios | Scenario design patterns and examples |
| `references/functional-verification-guide.md` | Understanding the verification flow | Static validation → Functional Verifier architecture |
| `references/environment-schema.md` | Reading environment.json | environment.json contract: startup, services, env_vars |
| `references/validation-guide.md` | Step 4: static validation | Validation order, error recovery |
| `references/state-management.md` | Task state operations | task.json/context.json/checkpoint schemas |
Guardrails
These are hard constraints. Violating them causes task completion to fail.
| Guardrail | Enforced By | Consequence |
|---|---|---|
| Must spawn verifier subagent | `complete` command | Rejects without `verification-report.json` |
| Must have HTTP evidence | `complete` command | Rejects if report lacks request/response |
| Must start application | `complete` command | Rejects if `server.started=false` (unless skip) |
If you find yourself wanting to bypass these guardrails, stop and reconsider. The guardrails exist because skipping verification is the #1 cause of "it compiled but doesn't work" bugs.