name: orchestrator-test-monitor
description: >
  Monitor and evaluate orchestrator test execution plans end-to-end. Use when the user wants to run a test execution plan from docs/showcases/, observe the orchestrator's full-pipeline processing, and get a final assessment. Triggers on: "run test plan", "execute plan", "monitor orchestrator", "test the orchestrator", "run execution plan", "run showcase", or any request to observe/evaluate orchestrator behavior on a showcase plan. This skill is OBSERVE-ONLY: never intervene in the orchestrator's execution.
Orchestrator Test Monitor
Observe and evaluate the orchestrator's execution of a test plan. You are a monitor, not a participant. Never modify code, fix bugs, or intervene in execution. Only observe, report, and assess.
Workflow
Phase 1: Plan Selection
- List all files under docs/showcases/ in the project working directory
- Present each plan with a one-line summary (read the "Task Goal" or opening section)
- Ask the user which plan to execute
- Read the selected plan thoroughly and extract:
  - Expected workflow steps and their order
  - Success criteria
  - Key checkpoints
  - Known anomaly patterns
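The listing step above can be sketched as a small shell helper. This is only an illustration: list_plans is a hypothetical name, and using the first non-empty line as the summary is an assumption. A real plan may keep its "Task Goal" further down, in which case read that section instead.

```shell
# Print "<file>: <first non-empty line>" for every plan file in a directory.
# Assumption: the first non-empty line is a usable one-line summary.
list_plans() {
  dir="${1:-docs/showcases}"
  for f in "$dir"/*; do
    [ -f "$f" ] || continue
    # grep exits non-zero on an empty file; tolerate that.
    summary=$(grep -m 1 -v '^[[:space:]]*$' "$f" || true)
    printf '%s: %s\n' "$f" "$summary"
  done
}
```

Calling list_plans with no argument scans docs/showcases/ relative to the current directory.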
Phase 2: Pre-Execution Setup — Build, Restart & Baseline
Ensure the daemon and CLI are built from the latest code and the daemon is running before launching any task.
2.1 Rebuild daemon & CLI
cargo build --release -p orchestratord -p orchestrator-cli
If the build fails, report the error to the user and stop — do NOT proceed with a stale binary.
2.2 Restart the daemon
IMPORTANT: Check .claude/CLAUDE.md for a daemon-pid-guard section first. If
orchestratord is managing this session (i.e., this skill was triggered BY the
orchestrator), you MUST NOT kill the daemon. In that case, skip the restart and reuse the
running daemon.
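A minimal sketch of the guard check, assuming the section can be recognized by the literal string daemon-pid-guard. The helper name daemon_guard_present is hypothetical. It only detects that the section exists; you still need to read the section to decide whether orchestratord is managing this session.

```shell
# Exit 0 when the guard section is mentioned in the given file.
# Assumption: the section contains the literal text "daemon-pid-guard".
daemon_guard_present() {
  file="${1:-.claude/CLAUDE.md}"
  [ -f "$file" ] && grep -q 'daemon-pid-guard' "$file"
}
```

If daemon_guard_present succeeds, treat the restart as unsafe until the section has been read.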
If it is safe to restart:
# Stop old daemon if running
kill $(cat data/daemon.pid 2>/dev/null) 2>/dev/null; sleep 2
# Start fresh daemon
nohup ./target/release/orchestratord --foreground --workers 2 > /tmp/orchestratord.log 2>&1 &
# Verify it's running (wait a moment for startup)
sleep 3
pgrep -f orchestratord
If the daemon fails to start, check /tmp/orchestratord.log and report the error.
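The fixed sleep 3 above may be too short on a slow machine. One possible alternative, shown here as a sketch with a hypothetical helper named wait_for, is to poll a check command until it succeeds or a timeout expires.

```shell
# Retry a command once per second until it succeeds or <timeout> seconds pass.
# Usage: wait_for <timeout_seconds> <command> [args...]
wait_for() {
  timeout="$1"; shift
  elapsed=0
  until "$@" >/dev/null 2>&1; do
    [ "$elapsed" -ge "$timeout" ] && return 1
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 0
}
```

For example, wait_for 10 pgrep -f orchestratord returns non-zero if no matching process appears within about ten seconds, at which point /tmp/orchestratord.log is the place to look.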
2.3 Set up CLI environment
The CLI connects to the daemon over a Unix domain socket (UDS). If the CLI fails to
connect (e.g., "failed to connect to https://..."), set the ORCHESTRATOR_SOCKET env var
to force UDS transport:
export ORCHESTRATOR_SOCKET=data/orchestrator.sock
Prefix all subsequent orchestrator commands with this env var, or export it once.
Verify connectivity: orchestrator task list.
2.4 Collect baseline state
- Note the current git state: git status, git log --oneline -3
- Note existing tasks: orchestrator task list
- Run any plan-specific baseline checks (e.g., grep for specific code patterns)
- Record the baseline state for later comparison
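One way to record the baseline is to save each command's output to a labeled file so it can be diffed after the run. The sketch below is an assumption about how to organize this: capture and BASELINE_DIR are hypothetical names, and the directory can be any writable location.

```shell
# Save a command's combined stdout/stderr under a label for later comparison.
# BASELINE_DIR is a hypothetical location, not something the orchestrator defines.
BASELINE_DIR="${BASELINE_DIR:-/tmp/orchestrator-baseline}"
capture() {
  label="$1"; shift
  mkdir -p "$BASELINE_DIR"
  # A failing command is still recorded; its error output is part of the baseline.
  "$@" > "$BASELINE_DIR/$label.txt" 2>&1 || true
}
```

Typical calls would be capture git-status git status, capture git-log git log --oneline -3, and capture tasks orchestrator task list.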
Phase 3: Task Launch
Directly execute the startup steps described in the selected plan — do NOT ask the user to run them.
- Run any plan-specific resource initialization commands (e.g., orchestrator delete, orchestrator init, orchestrator apply)
- Create the task using the exact command from the plan
- Record the returned task ID for monitoring
- Immediately proceed to Phase 4 (Live Monitoring)
Phase 4: Live Monitoring (OBSERVE ONLY)
Monitor the task using these commands in a loop:
orchestrator task watch <task_id> # Real-time status panel
orchestrator task trace <task_id> # Event timeline with anomaly detection
orchestrator task logs --follow <task_id> # Stream logs
orchestrator task info <task_id> # Detailed status snapshot
Monitoring Checklist
For each workflow step, verify:
- Step started (event emitted)
- Correct agent was selected
- Step completed with expected exit code
- Output matches expected structure (JSON for structured steps)
- No timeout or stall detected
- Cycle transitions are correct
Suspicious Indicators — Report Immediately
- Step stalled (no progress for >60s without explanation)
- Unexpected step skip or branch
- Agent mismatch (wrong agent for capability)
- Non-zero exit code on critical steps
- Missing expected events in timeline
- Database state inconsistency
- Cycle number not advancing as expected
- Items not generated when expected (evolution workflow)
On suspicion: Immediately tell the user what you observed, which step/event triggered concern, and the raw evidence (log line, event, status).
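The 60-second stall rule can be expressed as a simple timestamp comparison. This sketch assumes you already have the Unix epoch time of the last observed event. How that timestamp is obtained from the trace or logs is not specified here, and is_stalled is a hypothetical helper name.

```shell
# Exit 0 when more than <threshold> seconds (default 60) have passed
# since the last observed event.
# Usage: is_stalled <last_event_epoch> [now_epoch] [threshold_seconds]
is_stalled() {
  last_event_epoch="$1"
  now_epoch="${2:-$(date +%s)}"
  threshold="${3:-60}"
  [ $((now_epoch - last_event_epoch)) -gt "$threshold" ]
}
```

A positive result is only a trigger to report: the "without explanation" part of the rule still requires checking the logs for a legitimate long-running step.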
Confirmed Anomaly — Record Ticket
If an anomaly is clearly a bug (not just suspicious), create a ticket file:
docs/ticket/YYYYMMDD-<short-slug>.md
Format:
# <Title>
- **Observed during**: <plan name>, step <step>, cycle <N>
- **Severity**: critical | major | minor
- **Symptom**: <what happened>
- **Expected**: <what should have happened>
- **Evidence**: <relevant log lines, events, or DB state>
- **Status**: open
Do NOT attempt to fix the anomaly. Only record it.
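The ticket path docs/ticket/YYYYMMDD-<short-slug>.md can be derived from a title as sketched below. The slug rule (lowercase, runs of non-alphanumerics collapsed to a hyphen) is an assumption, and slugify and ticket_path are hypothetical helper names.

```shell
# Lowercase the title and collapse every run of non-alphanumerics into "-".
slugify() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[^a-z0-9]\{1,\}/-/g' -e 's/^-//' -e 's/-$//'
}

# Build the ticket path; the date defaults to today in YYYYMMDD form.
# Usage: ticket_path <title> [YYYYMMDD]
ticket_path() {
  printf 'docs/ticket/%s-%s.md\n' "${2:-$(date +%Y%m%d)}" "$(slugify "$1")"
}
```

Writing the ticket file is the one file-creation action this skill permits during monitoring; the helper only computes the path.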
Phase 5: Post-Execution Verification
After the task reaches completed or failed:
Check final task status:
orchestrator task info <task_id>
orchestrator task trace <task_id>
Verify success criteria from the plan:
- Did all expected steps execute?
- Did self_test pass (if applicable)?
- Are cycle counts correct?
- Did loop_guard terminate properly?
Check artifacts:
- git diff --stat: were code changes made as expected?
- Check docs/qa/ for generated QA documents (if applicable)
- Check docs/ticket/ for any auto-generated tickets
- Verify DB state via sqlite3 if the plan specifies DB checks
Run plan-specific validations:
- For self-bootstrap: verify compilation gate passed, QA docs generated
- For self-evolution: verify candidates generated, benchmarks ran, winner selected
Phase 6: Assessment Report
Produce a structured assessment:
## Test Execution Report
### Plan: <plan name>
### Task ID: <id>
### Duration: <start to end>
### Final Status: <completed/failed>
### Step-by-Step Results
| Step | Status | Duration | Notes |
|------|--------|----------|-------|
| ... | ... | ... | ... |
### Anomalies Detected
- <list anomalies with severity, or "None">
### Tickets Created
- <list ticket files, or "None">
### Success Criteria Evaluation
| Criterion | Met? | Evidence |
|-----------|------|----------|
| ... | ... | ... |
### Overall Assessment
<Pass/Partial/Fail with explanation>
### Recommendations
<Actionable next steps, if any>
Rules
- NEVER modify source code, config files, or workflow definitions
- NEVER fix bugs — only record them as tickets
- ALWAYS report suspicious observations immediately; do not wait
- ALWAYS show raw evidence (log lines, events) when reporting issues
- If the orchestrator is stuck and you suspect it will not recover, inform the user with evidence and let THEM decide whether to intervene
- ALLOWED in Phase 2 only: cargo build (daemon/CLI rebuild), killing and restarting the daemon process, and orchestrator CLI commands for resource setup
- ALLOWED in Phase 3: running orchestrator CLI commands to initialize resources and create tasks
- NEVER run git commit, git push, or any git write operations