orchestrator-test-monitor - SKILL.md Agent Skill

name: orchestrator-test-monitor
description: >
  Monitor and evaluate orchestrator test execution plans end-to-end. Use when the user wants to run a test execution plan from docs/showcases/, observe the orchestrator's full-pipeline processing, and get a final assessment. Triggers on: "run test plan", "execute plan", "monitor orchestrator", "test the orchestrator", "run execution plan", "run showcase", or any request to observe/evaluate orchestrator behavior on a showcase plan. This skill is OBSERVE-ONLY: never intervene in the orchestrator's execution.

Orchestrator Test Monitor

Observe and evaluate the orchestrator's execution of a test plan. You are a monitor, not a participant. Never modify code, fix bugs, or intervene in execution. Only observe, report, and assess.

Workflow

Phase 1: Plan Selection

List all files under docs/showcases/ in the project working directory
Present each plan with a one-line summary (read the "Task Goal" or opening section)
Ask the user which plan to execute
Read the selected plan thoroughly — extract:
- Expected workflow steps and their order
- Success criteria
- Key checkpoints
- Known anomaly patterns
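
The listing step can be sketched in shell as follows. This is a minimal sketch, not part of the orchestrator tooling: `plan_summary` and `list_plans` are hypothetical helper names, and it assumes the summary is the first non-empty line after a line containing "Task Goal", falling back to the first non-empty line of the file.

```shell
# Hypothetical helpers (not part of the orchestrator CLI).
# Assumption: the one-line summary is the first non-empty line that follows
# a line containing "Task Goal"; otherwise the first non-empty line is used.
plan_summary() {
  awk '
    found && NF   { print; printed = 1; exit }  # first non-empty line after the heading
    /Task Goal/   { found = 1 }
    !seen && NF   { first = $0; seen = 1 }      # remember the opening line as a fallback
    END           { if (!printed) print first }
  ' "$1"
}

# Print "<file name>: <summary>" for every regular file in the plan directory.
list_plans() {
  dir=${1:-docs/showcases}
  for plan in "$dir"/*; do
    [ -f "$plan" ] || continue
    printf '%s: %s\n' "${plan##*/}" "$(plan_summary "$plan")"
  done
}
```

Calling `list_plans` with no argument reads docs/showcases/ relative to the current directory. The output is only a starting point for the question to the user; the selected plan still has to be read in full.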

Phase 2: Pre-Execution Setup — Build, Restart & Baseline

Ensure the daemon and CLI are built from the latest code and the daemon is running before launching any task.

2.1 Rebuild daemon & CLI

cargo build --release -p orchestratord -p orchestrator-cli

If the build fails, report the error to the user and stop — do NOT proceed with a stale binary.

2.2 Restart the daemon

IMPORTANT: Check .claude/CLAUDE.md for a daemon-pid-guard section first. If orchestratord is managing this session (i.e., this skill was triggered BY the orchestrator), you MUST NOT kill the daemon. In that case, skip the restart and reuse the running daemon.
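
A minimal sketch of that guard check, under stated assumptions: `restart_allowed` is a hypothetical helper, and it treats the literal text "daemon-pid-guard" anywhere in the file as proof that the orchestrator is managing the session. The real layout of that section is not specified here, so read the file itself if the result is unclear.

```shell
# Hypothetical helper: returns 0 when a restart looks safe, 1 when the guard is present.
# Assumption: the guard is detectable as the literal string "daemon-pid-guard".
restart_allowed() {
  guard_file=${1:-.claude/CLAUDE.md}
  if [ -f "$guard_file" ] && grep -q 'daemon-pid-guard' "$guard_file"; then
    return 1   # guard found: skip the restart and reuse the running daemon
  fi
  return 0     # no guard file or no guard text: restart is considered safe
}
```

For example, `restart_allowed || echo "daemon is guarded; reusing the running instance"` prints the message only when the guard text is found.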

If it is safe to restart:

# Stop old daemon if running
kill $(cat data/daemon.pid 2>/dev/null) 2>/dev/null; sleep 2

# Start fresh daemon
nohup ./target/release/orchestratord --foreground --workers 2 > /tmp/orchestratord.log 2>&1 &

# Verify it's running (wait a moment for startup)
sleep 3
pgrep -f orchestratord

If the daemon fails to start, check /tmp/orchestratord.log and report the error.
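
If the fixed three-second sleep proves too short or too long, the startup check can be polled instead. The sketch below is illustrative; `wait_for` is a hypothetical helper that retries any command once per second.

```shell
# Hypothetical helper: retry a command once per second, up to <tries> attempts.
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
wait_for() {
  tries=$1
  shift
  while [ "$tries" -gt 0 ]; do
    "$@" >/dev/null 2>&1 && return 0
    tries=$((tries - 1))
    sleep 1
  done
  return 1
}
```

A possible use is `wait_for 10 pgrep -f orchestratord || tail -n 20 /tmp/orchestratord.log`, which shows the end of the daemon log only when the process never appears.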

2.3 Set up CLI environment

The CLI connects to the daemon over a Unix domain socket (UDS). If the CLI fails to connect (e.g., "failed to connect to https://..."), set the ORCHESTRATOR_SOCKET env var to force UDS transport:

export ORCHESTRATOR_SOCKET=data/orchestrator.sock

Either export the variable once, as above, or prefix each subsequent orchestrator command with it. Then verify connectivity with orchestrator task list.

2.4 Collect baseline state

Note the current git state: git status, git log --oneline -3
Note existing tasks: orchestrator task list
Run any plan-specific baseline checks (e.g., grep for specific code patterns)
Record the baseline state for later comparison
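
One way to record the baseline is to write each command's output to a file, so the same capture can be repeated after the run and compared. This is a sketch only: `capture_baseline` and the file names are hypothetical, and the directory should sit outside the repository so the capture itself does not show up in git status.

```shell
# Hypothetical helper: snapshot git and task state into <out_dir>.
# Errors (e.g. a missing command) are written into the file instead of aborting,
# so an incomplete baseline is still visible to the reader.
capture_baseline() {
  out_dir=$1
  mkdir -p "$out_dir"
  git status --short      > "$out_dir/git-status.txt" 2>&1 || true
  git log --oneline -3    > "$out_dir/git-log.txt"    2>&1 || true
  orchestrator task list  > "$out_dir/tasks.txt"      2>&1 || true
}
```

For example, run `capture_baseline "$(mktemp -d)/before"` now and a second capture into a sibling directory after Phase 5; `diff -r` on the two directories then shows what changed.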

Phase 3: Task Launch

Directly execute the startup steps described in the selected plan — do NOT ask the user to run them.

Run any plan-specific resource initialization commands (e.g., orchestrator delete, orchestrator init, orchestrator apply)
Create the task using the exact command from the plan
Record the returned task ID for monitoring
Immediately proceed to Phase 4 (Live Monitoring)

Phase 4: Live Monitoring (OBSERVE ONLY)

Monitor the task using these commands in a loop:

orchestrator task watch <task_id>       # Real-time status panel
orchestrator task trace <task_id>       # Event timeline with anomaly detection
orchestrator task logs --follow <task_id>  # Stream logs
orchestrator task info <task_id>        # Detailed status snapshot

Monitoring Checklist

For each workflow step, verify:

Step started (event emitted)
Correct agent was selected
Step completed with expected exit code
Output matches expected structure (JSON for structured steps)
No timeout or stall detected
Cycle transitions are correct

Suspicious Indicators — Report Immediately

Step stalled (no progress for >60s without explanation)
Unexpected step skip or branch
Agent mismatch (wrong agent for capability)
Non-zero exit code on critical steps
Missing expected events in timeline
Database state inconsistency
Cycle number not advancing as expected
Items not generated when expected (evolution workflow)

On suspicion: Immediately tell the user what you observed, which step/event triggered concern, and the raw evidence (log line, event, status).
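
The 60-second stall rule reduces to a timestamp comparison. The sketch below assumes the time of the last observed event is available as a Unix epoch value; how to extract it from the trace or log output is left open, since that format is not described here. `is_stalled` is a hypothetical helper.

```shell
# Hypothetical helper: succeed when more than <limit> seconds have passed
# since the last observed event.
# Arguments: <last_event_epoch> [now_epoch] [limit_seconds, default 60]
is_stalled() {
  last_event=$1
  now=${2:-$(date +%s)}
  limit=${3:-60}
  [ $((now - last_event)) -gt "$limit" ]
}
```

A positive result is only a prompt to report: a step that is legitimately long-running counts as an explanation, so check the logs before calling it a stall.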

Confirmed Anomaly — Record Ticket

If an anomaly is clearly a bug (not just suspicious), create a ticket file:

docs/ticket/YYYYMMDD-<short-slug>.md

Format:

# <Title>

- **Observed during**: <plan name>, step <step>, cycle <N>
- **Severity**: critical | major | minor
- **Symptom**: <what happened>
- **Expected**: <what should have happened>
- **Evidence**: <relevant log lines, events, or DB state>
- **Status**: open

Do NOT attempt to fix the anomaly. Only record it.
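
The ticket file name can be derived from the title as sketched below. `ticket_path` is a hypothetical helper, and the slug rule (lowercase, with runs of non-alphanumeric characters collapsed to a single hyphen) is an assumption; the naming pattern above only asks for a short slug.

```shell
# Hypothetical helper: build docs/ticket/YYYYMMDD-<short-slug>.md from a title.
# Arguments: <title> [date as YYYYMMDD, default today]
# Assumption: the slug is the lowercased title with every run of characters
# outside a-z and 0-9 replaced by one hyphen, and edge hyphens trimmed.
ticket_path() {
  title=$1
  day=${2:-$(date +%Y%m%d)}
  slug=$(printf '%s' "$title" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[^a-z0-9]\{1,\}/-/g' -e 's/^-//' -e 's/-$//')
  printf 'docs/ticket/%s-%s.md\n' "$day" "$slug"
}
```

For instance, `ticket_path "Cycle number not advancing"` yields a path under docs/ticket/ that starts with today's date and ends in `-cycle-number-not-advancing.md`.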

Phase 5: Post-Execution Verification

After the task reaches completed or failed:

Check final task status:

orchestrator task info <task_id>
orchestrator task trace <task_id>

Verify success criteria from the plan:
- Did all expected steps execute?
- Did self_test pass (if applicable)?
- Are cycle counts correct?
- Did loop_guard terminate properly?

Check artifacts:
- git diff --stat — were code changes made as expected?
- Check docs/qa/ for generated QA documents (if applicable)
- Check docs/ticket/ for any auto-generated tickets
- Verify DB state via sqlite3 if the plan specifies DB checks

Run plan-specific validations:
- For self-bootstrap: verify compilation gate passed, QA docs generated
- For self-evolution: verify candidates generated, benchmarks ran, winner selected

Phase 6: Assessment Report

Produce a structured assessment:

## Test Execution Report

### Plan: <plan name>
### Task ID: <id>
### Duration: <start to end>
### Final Status: <completed/failed>

### Step-by-Step Results
| Step | Status | Duration | Notes |
|------|--------|----------|-------|
| ...  | ...    | ...      | ...   |

### Anomalies Detected
- <list anomalies with severity, or "None">

### Tickets Created
- <list ticket files, or "None">

### Success Criteria Evaluation
| Criterion | Met? | Evidence |
|-----------|------|----------|
| ...       | ...  | ...      |

### Overall Assessment
<Pass/Partial/Fail with explanation>

### Recommendations
<Actionable next steps, if any>

Rules

NEVER modify source code, config files, or workflow definitions
NEVER fix bugs — only record them as tickets
ALWAYS report suspicious observations immediately; do not wait
ALWAYS show raw evidence (log lines, events) when reporting issues
If the orchestrator is stuck and you suspect it will not recover, inform the user with evidence and let THEM decide whether to intervene
ALLOWED in Phase 2 only: cargo build (daemon/CLI rebuild), killing and restarting the daemon process, and orchestrator CLI commands for resource setup
ALLOWED in Phase 3: running orchestrator CLI commands to initialize resources and create tasks
NEVER run git commit, git push, or any other git write operation