harness-executor - SKILL.md Agent Skill

name: harness-executor description: Execute development tasks autonomously with self-validation. Auto-bootstraps harness via harness-creator if missing. Use when the user asks to implement features, fix bugs, refactor code, execute plans, or make any code change in an existing or new codebase.

Harness Executor

Execute development tasks autonomously: setup → plan → execute → validate → verify → record → present.

Core Philosophy: "The Agent Harness is the Operating System. The LLM is just the CPU." Verify your changes mechanically through automated checks, not hope.

Architecture Principle: Coordinator manages state, Subagent executes code. The coordinator spawns subagents for code changes and verification. The subagent never calls task_state.py.

Script Execution

This skill bundles helper scripts in its scripts/ subdirectory. Before running any script, determine this skill's installation directory from the path of this SKILL.md file, and set:

SKILL_DIR="<directory containing this SKILL.md>"

Then call scripts as: python3 "$SKILL_DIR/scripts/xxx.py". All bash examples below assume SKILL_DIR has been set this way.

Execution Flow

Every task follows the same seven steps. No exceptions, no shortcuts.

COORDINATOR
═══════════════════════════════════════════

 1. SETUP      bootstrap → check interrupted → query memory → load context
 2. PLAN       scope the work → init state → (multi-phase: plan file + user approval)
 3. EXECUTE    spawn executor subagent → make code changes → checkpoint
 4. VALIDATE   static validation (build, lint, test)
 5. VERIFY     spawn verifier subagent → functional verification (MANDATORY)
 6. RECORD     task_state.py complete → episodic memory → AutoHarness
 7. PRESENT    results summary to user

═══════════════════════════════════════════

⚠️ CRITICAL: Steps 4 and 5 are BOTH mandatory for ALL tasks. Static validation proves code compiles. Functional verification proves code works. Never skip Step 5.

Step 1: Setup

1.1 Bootstrap Harness

test -f AGENTS.md && echo "HARNESS_EXISTS=true" || echo "HARNESS_EXISTS=false"

If HARNESS_EXISTS=false: invoke Skill(skill="harness-creator") first.

1.2 Check Interrupted Tasks

python3 "$SKILL_DIR/scripts/task_state.py" list

If an in_progress task matches the current request → Resume Protocol (see below).

1.3 Query Memory

if [ -d "harness/memory" ]; then
  python3 "$SKILL_DIR/scripts/memory_query.py" search "<relevant-keyword>" --json 2>/dev/null || echo '{"results": []}'
else
  echo "No memory store yet — skipping"
fi

1.4 Load Context

Read: AGENTS.md, docs/ARCHITECTURE.md, docs/DEVELOPMENT.md.

Extract: build command, test command, lint command, validation script path.

Step 2: Plan

All tasks: Identify files to modify/create, decide the approach, initialize task state.

Multi-phase tasks (touching 3+ files or requiring sequential changes): Write a plan file, get user approval.

Initialize Task State (all tasks)

TASK_ID=$(python3 "$SKILL_DIR/scripts/task_state.py" init "<task-name>" \
  --phases <N> \
  --description "<description>" \
  --plan-path "docs/exec-plans/active/YYYY-MM-DD-<slug>.md")  # optional for single-phase
echo "Task ID: $TASK_ID"

Multi-Phase Plan File

mkdir -p docs/exec-plans/active

Write to docs/exec-plans/active/YYYY-MM-DD-<task-slug>.md:

# [Task Name]

**Created**: YYYY-MM-DD

## Goal
One sentence describing what success looks like.

## Scope
- **Files to modify**: [list]
- **Files to create**: [list]

## Phases

### Phase 1: [Name]
- [ ] Step 1.1: [action]
- **Validates with**: `[command]`

### Phase 2: [Name]
- [ ] Step 2.1: [action]
- **Validates with**: `[command]`

Multi-Phase User Approval

Use AskUserQuestion with options: Approve / Approve with changes / Reject.

Step 3: Execute

Spawn an executor subagent to make code changes. The coordinator never writes code directly.

Executor Subagent Prompt

Agent(
  description="Execute: [task-name]",
  prompt="""
You are a code executor. Your ONLY job is to make code changes.

## Task
[task description]

## Project Root
[absolute path]

## Files to Modify/Create
[explicit list]

## Validation Command
After making changes, run:

[project-specific command, e.g., go build ./... && make lint-arch]


## Prior Lessons
[paste lessons from memory_query, or "none"]

## Output Format
Return this JSON block at the end of your response:
```json
{
  "status": "success | failed | blocked",
  "summary": "one paragraph describing what you did",
  "files_changed": ["file1.go", "file2.go"],
  "files_created": ["new_file.go"],
  "validation_result": "pass | fail",
  "validation_output": "relevant output if failed",
  "lessons": ["any insights worth remembering"],
  "blockers": ["if blocked, describe what's stopping you"]
}

Rules

Focus ONLY on making code changes
Do NOT manage task state or checkpoints — the coordinator handles that
If validation fails, fix and retry (max 3 attempts)
If blocked, return with status "blocked" """ )


### Checkpoint (after successful executor return)

```bash
python3 "$SKILL_DIR/scripts/task_state.py" checkpoint \
  --task-id "$TASK_ID" \
  --phase <N> \
  --summary "<phase summary from subagent>" \
  --files-changed <file1> <file2> \
  --decisions '["key decisions from subagent lessons"]'

Failure Handling

Subagent Status	Action
`success`	Continue to Step 4
`failed`	Retry with additional context (max 2 retries)
`blocked`	Escalate to user

Step 4: Validate (Static)

Run static validation to ensure code compiles and passes lints/tests.

if [ -f "scripts/validate.py" ]; then
  python3 scripts/validate.py .
else
  # Use commands from docs/DEVELOPMENT.md
  <build-command> && <lint-command> && <test-command>
fi

If static validation fails:

Analyze error output
Return to Step 3 with fix instructions (spawn executor again)
Max 2 retries, then escalate to user

Step 5: Verify (Functional) — MANDATORY

⚠️ This step is MANDATORY for ALL tasks. Do NOT skip to Step 6 without completing verification.

Static checks only prove code compiles. Functional verification proves code works — by starting the actual application, making real HTTP requests, and verifying observable behavior.

5.1 Design Verification Scenarios

Based on what changed, design 1-3 task-specific scenarios (see references/scenario-design-guide.md):

Change Type	Scenarios to Design
New endpoint	Create success, validation error, persistence check
Modified endpoint	New behavior works, old behavior unchanged
New validation	Valid input accepted, invalid input rejected
Permission change	Authorized user succeeds, unauthorized user rejected
Bug fix	The specific bug is fixed

5.2 Spawn Verifier Subagent

Agent(
    description="Functional Verifier: [task-name]",
    prompt="""
You are a Functional Verifier agent. Read the verifier guide at:
$SKILL_DIR/agents/verifier.md

## Task Context
- Project root: [absolute path]
- Task description: [what was implemented]
- Files changed/created: [list]

## Environment Context (from environment.json if exists)
- Startup: [command], Readiness: [check config]
- Services: [databases, caches], Env Vars: [required vars]

## Scenarios to Verify
[your designed scenarios as JSON array]

## Your Responsibilities
1. Start the application server
2. Execute ALL scenarios
3. For each: verify behavior AND side effects with real HTTP requests
4. Stop the server cleanly
5. Save results to: harness/trace/verification-report.json

## Output Requirements
Your verification-report.json MUST include:
- server.started: true (prove you started the app)
- At least one scenario with request/response evidence
"""
)

5.3 Handle Verifier Result

Result	Action
`pass`	Continue to Step 6
`partial`	Fix failing scenarios related to task, log unrelated as warnings
`fail`	Return to Step 3 with fix instructions, max 2 retries, then escalate

5.4 If Verification Cannot Run

If the application cannot be started (no server, library project, missing infrastructure), write a skip report:

mkdir -p harness/trace
cat > harness/trace/verification-report.json << 'EOF'
{
  "overall_status": "skip",
  "skip_reason": "[explain why: e.g., 'Library project with no runnable server', 'Missing required database']",
  "server": {"started": false},
  "task_specific_scenarios": [],
  "summary": {"task_specific_total": 0, "task_specific_passed": 0, "pass_rate": 0}
}
EOF

Step 6: Record & Complete

Complete Task

python3 "$SKILL_DIR/scripts/task_state.py" complete \
  --task-id "$TASK_ID" \
  --summary "Completed: <overall summary>" \
  --files-changed file1 file2 \
  --files-created new_file \
  --validation '{"build": "pass", "lint": "pass", "test": "pass"}' \
  --lessons '["lesson1", "lesson2"]'

⚠ Completion Gate: complete checks for harness/trace/verification-report.json. It rejects if:

File is missing (Step 5 was skipped)

Report lacks server.started or HTTP evidence (unless overall_status: "skip")

Move plan file (if exists):

mkdir -p docs/exec-plans/completed
mv "docs/exec-plans/active/<plan-file>.md" "docs/exec-plans/completed/" 2>/dev/null || true

AutoHarness Check

TASK_COUNT=$(python3 "$SKILL_DIR/scripts/task_state.py" list --json 2>/dev/null | \
  python3 -c "import sys,json; d=json.load(sys.stdin); print(len([t for t in d if t.get('status')=='completed']))" 2>/dev/null || echo 0)

if [ "$TASK_COUNT" -ge 3 ]; then
  python3 "$SKILL_DIR/scripts/harness_critic.py" --since 7d 2>/dev/null || true
fi

Step 7: Present Results

## Task Complete

### Changes Made
- Modified `path/to/file` — [what changed]
- Created `path/to/new-file` — [purpose]

### Validation Results
- Build: PASS | Lint: PASS | Test: PASS

### Verification Results
- Server started: YES
- Scenarios: [N] designed, [N] passed
- Evidence: [summary of what was verified]

### Lessons Recorded
- [aggregated lessons]

### Next Steps
1. Create PR
2. Commit to current branch

Resume Protocol

When Step 1.2 finds an interrupted task:

python3 "$SKILL_DIR/scripts/task_state.py" show --task-id <TASK_ID> --json

Resume from the last successful checkpoint:

Read harness/tasks/<task-id>/state/context.json
Pass context to subagent for the next phase
Continue the execution loop

Reference Files

File	When to Read	Contents
`agents/verifier.md`	Step 5.2: spawn Functional Verifier	Verifier subagent instructions, bootstrap protocol, output format
`references/scenario-design-guide.md`	Step 5.1: designing scenarios	Scenario design patterns and examples
`references/functional-verification-guide.md`	Understanding the verification flow	Static validation → Functional Verifier architecture
`references/environment-schema.md`	Reading environment.json	environment.json contract: startup, services, env_vars
`references/validation-guide.md`	Step 4: static validation	Validation order, error recovery
`references/state-management.md`	Task state operations	task.json/context.json/checkpoint schemas

Guardrails

These are hard constraints. Violating them causes task completion to fail.

Guardrail	Enforced By	Consequence
Must spawn verifier subagent	`complete` command	Rejects without verification-report.json
Must have HTTP evidence	`complete` command	Rejects if report lacks request/response
Must start application	`complete` command	Rejects if server.started=false (unless skip)

If you find yourself wanting to bypass these guardrails, stop and reconsider. The guardrails exist because skipping verification is the #1 cause of "it compiled but doesn't work" bugs.