name: ops-doctor
description: Collect session incidents (mistakes, broken workflows, user corrections) and send an honest report to Codex for fixes. Ground truth only, with no spin and no blame-shifting.
user_invocable: true
/ops:doctor — Honest Incident Reporter
When something goes wrong — you made a mistake, the user corrected you, a workflow broke, you got locked out, you blamed tooling when it was your fault — this skill collects the ground truth and sends it to Codex for diagnosis and fixes.
You are not the judge. Codex is. Your job is to report facts accurately. Codex decides what to fix.
When to Use
- User says you made a mistake
- User says "fix yourself", "tell codex", "what went wrong"
- You got blocked/locked out by the permission gate
- A workflow failed because you skipped steps
- You blamed tooling when the real problem was your behavior
- Any time the user points out something isn't working
Process
Step 0: Ingest Automatic Incident Logs First
Before writing anything manually, check for ground-truth incidents captured by hooks in:
- ~/.claude/state/ops-doctor/incidents-<cwdHash>.jsonl
- ~/.claude/state/ops-doctor/pending-<cwdHash>.json
For the current repo/workspace cwd:
- If the JSONL file exists, read all incidents from it first.
- Treat those incidents as primary evidence — do not omit or rewrite them.
- If the pending flag exists, note that the threshold was reached automatically.
- Then add any extra manual incidents for things the hooks could not observe (for example: wasted tokens, user corrections, wrong reasoning, or skipped workflows that never triggered a hook).
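The ingestion order above can be sketched as a small shell helper. This is a minimal sketch, not the skill's actual implementation: the function name `ingest_incidents` is hypothetical, and the caller is assumed to already know the `<cwdHash>` value, since how the hooks derive it is not specified here.

```shell
# Hypothetical helper: print the automatic incidents for one workspace.
# Usage: ingest_incidents <state_dir> <cwd_hash>
ingest_incidents() (
  state_dir=$1
  cwd_hash=$2
  incidents="$state_dir/incidents-$cwd_hash.jsonl"
  pending="$state_dir/pending-$cwd_hash.json"

  if [ -f "$incidents" ]; then
    # Emit every non-empty line unchanged: these records are primary evidence.
    grep -v '^[[:space:]]*$' "$incidents" || true
  fi

  if [ -f "$pending" ]; then
    # The pending flag means the threshold was reached automatically.
    echo "pending flag present: threshold reached automatically" >&2
  fi
)
```

A call might look like `ingest_incidents "$HOME/.claude/state/ops-doctor" "$CWD_HASH"`, where `CWD_HASH` is an assumed variable holding the hash for the current workspace. The note about the pending flag goes to stderr so that stdout contains only the untouched incident records.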
After a successful mailbox send to Codex:
- Archive the JSONL file to incidents-<cwdHash>-sent-<timestamp>.jsonl
- Remove the matching pending-<cwdHash>.json flag if it exists
Step 1: Gather Additional Manual Incidents
Collect every important incident from the current session that is not already captured in the automatic log. For each one, document:
- What happened — the observable fact (error message, blocked tool, wrong output)
- What you did — your exact actions that led to it (be specific: which commands, which shortcuts, which workflow steps you skipped)
- What you should have done — the correct workflow
- What the user said — their exact correction or complaint
- Root cause — why you did the wrong thing (skipped a step, took a shortcut, blamed tooling, didn't read the workflow, etc.)
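One way to keep manual incidents in the same line-oriented shape as the automatic log is to emit each one as a single JSON line. The sketch below assumes `jq` is available (it is already used in the send step); the function name `manual_incident` and every field name are hypothetical, because the schema of the automatic log is not documented here.

```shell
# Hypothetical helper: emit one manual incident as a compact JSON line.
# Arguments, in order: what happened, what you did, what you should have
# done, the user's exact words, root cause.
# Field names are illustrative only, not the hooks' real schema.
manual_incident() {
  jq -cn \
    --arg what_happened "$1" \
    --arg what_i_did "$2" \
    --arg should_have_done "$3" \
    --arg user_said "$4" \
    --arg root_cause "$5" \
    '{
      source: "manual",
      what_happened: $what_happened,
      what_i_did: $what_i_did,
      should_have_done: $should_have_done,
      user_said: $user_said,
      root_cause: $root_cause
    }'
}
```

Passing the text through `--arg` lets `jq` handle quoting, so the user's words can be recorded verbatim even when they contain quotation marks.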
Step 2: Classify Each Incident
| Category | Meaning |
|---|---|
| supervisor_mistake | You (Claude) did something wrong: skipped a workflow step, took a shortcut, blamed tooling |
| workflow_gap | The workflow/skill is missing a step or has ambiguous instructions |
| gate_issue | The permission gate blocked something it shouldn't have, or didn't block something it should have |
| tooling_bug | An actual bug in OPS tools, GSD tools, hooks, or scripts |
| config_issue | A configuration problem (session state, Redis, file permissions) |
Default to supervisor_mistake unless you have concrete evidence otherwise. If you're unsure, it's probably your fault.
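The defaulting rule can be expressed as a tiny guard. This is only a sketch of the rule as stated above; the function name `classify_incident` and the idea of passing the evidence as a string are assumptions made for illustration.

```shell
# Hypothetical helper: apply the "default to supervisor_mistake" rule.
# Usage: classify_incident <proposed_category> <concrete_evidence>
classify_incident() {
  case "$1" in
    workflow_gap|gate_issue|tooling_bug|config_issue)
      # A non-default category is accepted only with concrete evidence.
      if [ -n "$2" ]; then
        echo "$1"
      else
        echo supervisor_mistake
      fi
      ;;
    *)
      # Unknown or missing categories fall back to the default.
      echo supervisor_mistake
      ;;
  esac
}
```

For example, `classify_incident tooling_bug ""` prints `supervisor_mistake`, because no evidence was supplied. The guard only checks that evidence was provided; judging whether that evidence is actually concrete remains a human (or Codex) decision.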
Step 3: Build the Report
Write a JSON envelope with ALL incidents — automatic first, then manual additions. Do not cherry-pick. Do not minimize.
{
  "task_type": "investigate",
  "objective": "<1-line: what went wrong this session>",
  "why": "The Supervisor made mistakes and/or found issues. This is an honest incident report for diagnosis and fixes.",
  "deliverable": "recommendation",
  "evidence": "<automatic JSONL incidents + full manual timeline with ground truth>",
  "acceptance_criteria": [
    {"id": "AC-1", "requirement": "Analyze each incident and classify as supervisor_mistake vs actual tooling issue", "verification_method": "Written analysis per incident"},
    {"id": "AC-2", "requirement": "For tooling issues: implement fixes if warranted", "verification_method": "Code changes or explicit skip with rationale"},
    {"id": "AC-3", "requirement": "For supervisor mistakes: propose operating rules to prevent recurrence", "verification_method": "Concrete rules for the Supervisor role definition"}
  ],
  "scope_in": ["<files involved in the incidents>"],
  "scope_out": [],
  "authority_flags": {"can_create_files": false, "can_delete_files": false, "can_modify_deps": false, "can_edit_dirty_files": true}
}
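Before sending, it may help to confirm that the envelope file parses as JSON and that the fields filled in by hand are not empty. The check below is a sketch under assumptions: `validate_envelope` is a hypothetical name, the chosen checks are illustrative, and whatever validation the mailbox itself performs is not known here.

```shell
# Hypothetical pre-flight check for the envelope file.
# Usage: validate_envelope <path-to-envelope.json>
# Returns 0 when the checks pass, non-zero otherwise.
validate_envelope() {
  jq -e '
    (.task_type == "investigate")
    and (.objective | type == "string" and length > 0)
    and (.evidence | type == "string" and length > 0)
    and (.acceptance_criteria | type == "array" and length > 0)
  ' "$1" >/dev/null
}
```

With `-e`, `jq` exits non-zero when the expression evaluates to `false`, and it also fails on unparseable input, so a single call covers both malformed JSON and missing fields.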
Step 4: Send to Codex
Dispatch via mailbox:
cat > /tmp/mailbox-task.json <<'JSON'
<the JSON envelope>
JSON
TASK_ID=$(node ~/.claude/scripts/mailbox send codex "$(cat /tmp/mailbox-task.json)" | jq -r '.id')
echo "$TASK_ID"
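Because `jq -r '.id'` prints the literal string `null` when the response has no `id` field, and prints nothing when it receives no input, the captured task ID is worth checking before treating the send as successful. The helper below is a sketch; the name `send_succeeded` is hypothetical, and it assumes a failed send shows up as an empty or `null` ID, which the mailbox script's real behavior may or may not match.

```shell
# Hypothetical helper: decide whether a captured task ID looks valid.
# Usage: send_succeeded "$TASK_ID"
send_succeeded() {
  case "$1" in
    ""|null)
      # Empty output or a missing .id field: treat the send as failed.
      return 1
      ;;
    *)
      return 0
      ;;
  esac
}
```

A caller could gate the archive step on `send_succeeded "$TASK_ID"` and otherwise leave the incident log in place, so nothing is archived for a report that never reached Codex.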
Step 5: Archive Automatic Logs After Successful Send
If mailbox send succeeds:
- Rename incidents-<cwdHash>.jsonl to incidents-<cwdHash>-sent-<timestamp>.jsonl
- Remove pending-<cwdHash>.json if present
- Leave archived incident files untouched for history
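The archive step can be sketched as one function. Everything named here is an assumption for illustration: `archive_incidents` is hypothetical, the caller supplies the state directory and `<cwdHash>`, and the default timestamp format (UTC, `YYYYMMDDTHHMMSSZ`) is a guess, since the expected `<timestamp>` format is not specified.

```shell
# Hypothetical helper: archive the incident log after a successful send.
# Usage: archive_incidents <state_dir> <cwd_hash> [timestamp]
archive_incidents() (
  state_dir=$1
  cwd_hash=$2
  # Assumed timestamp format; pass a third argument to override it.
  timestamp=${3:-$(date -u +%Y%m%dT%H%M%SZ)}
  incidents="$state_dir/incidents-$cwd_hash.jsonl"

  if [ -f "$incidents" ]; then
    mv "$incidents" "$state_dir/incidents-$cwd_hash-sent-$timestamp.jsonl"
  fi

  # rm -f succeeds whether or not the pending flag exists.
  rm -f "$state_dir/pending-$cwd_hash.json"
)
```

The function only renames the live log and removes the flag. It never touches files that already carry a `-sent-` suffix, so earlier archives stay intact.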
Step 6: Report to User
Tell the user:
- How many incidents were reported
- Brief summary of each (1 line)
- The mailbox task ID
- That Codex will judge independently what needs fixing
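The four items above could be assembled with a short formatter. This is a sketch with a hypothetical function name and wording: it reads one-line summaries from stdin, one incident per line, and takes the mailbox task ID as its only argument.

```shell
# Hypothetical helper: print the user-facing summary.
# Usage: printf '%s\n' "summary 1" "summary 2" | report_to_user <task_id>
report_to_user() (
  task_id=$1
  summaries=$(cat)
  # Count non-empty lines; grep exits non-zero when the count is 0.
  count=$(printf '%s\n' "$summaries" | grep -c . || true)

  printf 'Reported %s incident(s) to Codex.\n' "$count"
  if [ "$count" -gt 0 ]; then
    # One bullet per incident summary, skipping blank lines.
    printf '%s\n' "$summaries" | grep . | sed 's/^/- /'
  fi
  printf 'Mailbox task ID: %s\n' "$task_id"
  printf 'Codex will judge independently what needs fixing.\n'
)
```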
Rules
Ground truth only. Report exactly what happened. No euphemisms, no passive voice that hides who did what.
- Bad: "The session encountered an authentication issue"
- Good: "I didn't run /ops:run and tried to hack a stale session instead"
Don't minimize. If you wasted 200K tokens on a mistake, say so. If you blamed tooling when it was your fault, say so.
Don't prescribe fixes. Describe the problem. Let Codex decide the solution. You already proved your judgment is unreliable by making the mistake in the first place.
Include the user's words. When the user corrected you, quote them. Their perspective is more reliable than yours about what went wrong.
Default to your fault. Unless you have concrete evidence that tooling is broken (error in code, missing file, wrong logic), assume the problem is your behavior.