odyssey-debug

star 382

Long-running debug cycle — archaeology, diagnosis, fix, confirmation, generalization, discovery, and knowledge persistence

catlog22 By catlog22 schedule Updated 6/16/2026

name: odyssey-debug description: "Long-running debug cycle — archaeology, diagnosis, fix, confirmation, generalization, discovery, and knowledge persistence" argument-hint: " [--skip-fix] [--skip-generalize] [--auto] [-y] [-c]" allowed-tools: spawn_agents_on_csv, Read, Write, Edit, Bash, Glob, Grep, request_user_input

Closed-loop deep debugging: archaeology (what changed) → explore (call chains, error gaps) → diagnose (hypothesis-driven) → fix & confirm → generalize (举一反三) → discover siblings → persist learnings.

Unlike quality-debug (fast fix), this treats every bug as a learning signal — digs into git history before hypotheses, confirms fixes with CLI review, scans for siblings of the root cause.

Core philosophy:

  • Archaeology before hypothesis — look at what changed before guessing why
  • Fix one, find many — a single bug reveals a class of bugs
  • Decision journal — human-judgment items recorded, not lost
  • CLI-assisted review — delegate for second-opinion analysis

三句哲学约束(穷尽迭代):

  1. 零遗留 — 根因必须确认到底,修复必须验证通过,泛化必须扫描穷尽
  2. 穷尽迭代 — 假设失败不放弃:扩范围 → 换视角 → 升级工具,直到根因确认或明确 INCONCLUSIVE
  3. 改进即标准 — 修复后重新确认同区域无新问题,泛化发现的同类 bug 全部处理
**范围内:** 单一 bug/issue 的完整闭环 — 考古 → 探索 → 诊断 → 修复 → 确认 → 泛化 → 沉淀 **范围外:** 新功能 → `$odyssey-planex` | 代码审查 → `$odyssey-review-test-fix` | UI 优化 → `$odyssey-ui` **探索自由度:** 边界内自由 — 可追踪任意调用链、分析任意历史、测试任意假设 **Zero-residual principle:** Every finding MUST have a concrete action (fix / issue / decision). "Report and shelve" is not allowed. "Pre-existing issue" is not a valid skip reason — if discovered within scope, it must be addressed. **模板:** `--template ` — performance | memory-leak | race-condition | regression | crash **三条铁律(所有阶段适用):** 1. **Phase auto-commit** — 阶段完成后**自动** `git commit`,无需用户确认(session.json/evidence.ndjson 不纳入) 2. **Confident edits only, but must attempt** — only modify what you're confident about; record decisions only when genuinely requiring human judgment - Confident → edit code directly, commit - Needs decision → record `evidence.ndjson {"phase":"decision","status":"pending"}`, don't touch code - No speculative changes - ⚠️ **Decision gate** — ONLY these qualify as decisions (not fixes): - Cross-module architectural tradeoffs requiring human direction - Ambiguous business semantics where the fix could alter intended behavior - Requires new dependency or breaking API change - ❌ "Unsure how to fix", "Large scope", "Pre-existing issue" are NOT valid decision reasons — either fix it, or explain specifically why it's unfixable 3. **多 CLI 辅助** — `maestro delegate` 多 `--role`(analyze/review/explore)交叉验证关键判断 $ARGUMENTS — issue description and optional flags.

Flags: --skip-fix analysis-only | --skip-generalize quick fix | --template <name> 预定义策略 | --auto no delegate confirmation | -y auto-confirm all decisions | -c resume last session

Session: SESSION_DIR = .workflow/scratch/{YYYYMMDD}-debug-odyssey-{slug}/

Output — 4 files:

SESSION_DIR/
  ├── session.json       # state + confirmation + patterns + phase_goals
  ├── evidence.ndjson    # append-only evidence trail (phase field distinguishes origin)
  ├── explore.json       # structured CLI exploration snapshot
  └── understanding.md   # evolving narrative — 9 sections

evidence.ndjson — unified trail:

{"ts":"","phase":"archaeology|explore|diagnosis|discovery|decision|self-iteration","type":"","source":"","content":"","note":""}

Phase-specific fields:

  • archaeology: sha, author, date, message, relevance (high|medium|low)
  • explore: category (call_chain|recent_change|error_gap|similar_pattern), detail
  • diagnosis: hypothesis, result (confirmed|disproved|inconclusive)
  • discovery: file, line, classification (safe|risk|bug), action
  • decision: question, options, context, status (pending|resolved|deferred), resolution

explore.json schema:

{
  "call_chains": [{"entry":"","chain":["file:line"]}],
  "recent_changes": [{"file":"","commits":[{"sha":"","message":"","date":""}]}],
  "error_gaps": [{"file":"","line":0,"description":""}],
  "similar_patterns": [{"file":"","line":0,"description":""}],
  "cli_tool": "", "timestamp": ""
}

session.json schema:

{
  "session_id": "debug-odyssey-{YYYYMMDD-HHmmss}", "issue": "",
  "flags": { "skip_fix": false, "skip_generalize": false, "auto": false, "auto_confirm": false },
  "current_state": "S_INTAKE", "diagnosis_retries": 0,
  "root_cause": null, "patterns": [], "confirmation": null,
  "phase_goals": [], "phase_goals_all_done": false, "self_iteration_log": [],
  "cross_phase_loops": 0, "max_loops": 5,
  "generalization_stats": null,
  "created_at": "", "updated_at": ""
}

phase_goals[] — auto-derived from flags:

ID Goal done_when phase skip_when
G1 Root cause identified evidence.ndjson has phase=diagnosis result=confirmed S_DIAGNOSE
G2 Explore context gathered explore.json ≥1 category populated S_EXPLORE
G3 Fix applied and confirmed confirmation.overall == confirmed S_CONFIRM skip_fix
G4 Pattern generalized patterns[] ≥1 entry S_GENERALIZE skip_generalize
G5 Discoveries triaged all scan hits classified S_DISCOVER skip_generalize
G6 Learnings persisted spec entries created OR no actionable learnings S_RECORD

When flags[skip_when] == true → auto set status: "skipped", completion_confirmed: true.

understanding.md — 9 sections (written by owning phase):

  1. Issue & Scope ← S_INTAKE | 2. Archaeology ← S_ARCHAEOLOGY | 3. Exploration ← S_EXPLORE
  2. Hypotheses ← S_DIAGNOSE | 5. Root Cause ← S_DIAGNOSE | 6. Fix & Confirmation ← S_FIX+S_CONFIRM
  3. Generalization ← S_GENERALIZE | 8. Discoveries ← S_DISCOVER | 9. Learnings ← S_RECORD

Pre-load(可选,缺失不阻塞)

层级 命令 作用
Codebase docs Read .workflow/codebase/ARCHITECTURE.md 模块边界
Wiki search maestro search "<issue keywords>" --json 先前调查(top 5)
Specs + tools maestro spec load --category debug --keyword "<symptom>" 已知 issue/workaround
Role knowledge maestro search --category debug → 选相关 → maestro wiki load <id> 领域知识
Prior sessions Glob(".workflow/scratch/*-debug-odyssey-*") 相关会话

Knowledge Persistence(S_RECORD 中写入产出文件)

S_RECORD 将可沉淀知识 写入 understanding.md §9 Learnings,按分类结构化:

分类 写入内容 后续建议命令
反复根因模式 模式描述 + 触发条件 + 修复模板 $spec-add debug "..."
非显而易见 workaround 问题场景 + 解决方案 + 适用范围 $spec-add learning "..."
架构边界违反 违反描述 + 正确边界 + 检查方法 $spec-add arch "..."
可复用泛化 pattern pattern 签名 + 风险说明 + fix 模板 $spec-add coding "..."

两步模式: 执行中写入产出文件(临时记录)→ 任务完成后用户沉淀为永久知识。执行过程中不调用外部 skill。

**Quality Gate (适用: S_ARCHAEOLOGY, S_EXPLORE, S_DIAGNOSE, S_GENERALIZE)**
维度 sufficient insufficient
Coverage 已知相关文件/模块均已分析 遗漏 grep/git log 可发现的目标
Depth ≥80% 发现有 file:line 级证据 多数仅泛泛描述
Actionability 每条结论有具体后续动作 仅"建议关注"类无操作性结论

Rules: Phase complete → evaluate 3 dimensions → any insufficient → re-enter (max 3 rounds per phase).

  • Round 1: Broaden scope — add directories, git log depth ×2, add delegate angles
  • Round 2: Switch perspective — different CLI tool, reverse tracing, manual code reading
  • Round 3: Combine both + targeted deep-dive on remaining gaps

Exit: All sufficient → advance | 3-round limit → record gaps and continue. Log to evidence.ndjson + session.json.self_iteration_log[].

### Shared Output Schema (all waves) ```json { "type": "object", "properties": { "id": {"type":"string"}, "result_status": {"type":"string","enum":["completed","failed"]}, "findings": {"type":"string","maxLength":500}, "evidence": {"type":"string"}, "error": {"type":"string"} }, "required": ["id","result_status","findings"] } ```

Termination Contract (embed in every instruction):

You MUST call report_agent_job_result EXACTLY ONCE before exiting.
Success → result_status=completed | Failure → result_status=failed with error | Timeout → completed with partial.
NEVER continue indefinitely. NEVER exit silently. Read-only — do NOT modify source files.
Do NOT write to tasks.csv, wave-*.csv, results.csv. Do NOT call spawn_agents_on_csv.

tasks.csv

id,title,description,task_type,deps,wave,status,findings,evidence,error
  • Wave 1: Archaeology (git-timeline, git-blame) — parallel
  • Wave 2: Generalization (syntax-grep, semantic-scan, structural-match, historical-grep) — parallel, depends on root cause
  • Single-agent stages (explore, diagnose, fix, confirm) remain inline
1. **Iron Law**: NO FIX without root cause evidence 2. **Archaeology first**: git history MUST precede any hypothesis 3. **Evidence append-only**: never modify/delete evidence.ndjson entries 4. **Session is source of truth**: session.json holds all state 5. **Phase goal tracking**: each phase MUST mark its goal on completion 6. **Decision journal integrity**: all human-judgment items recorded phase=decision 7. **`-y` defers, never drops**: auto-confirm records "deferred", never silently skips 8. **CLI delegate is background**: all `maestro delegate` → `run_in_background: true`, wait for callback (do NOT halt the Odyssey flow) 9. **Resumable state**: current_state saved at every transition 10. **Goal is outcome-oriented**: `/goal` user-bound; odyssey outputs prompt then continues 11. **Violation = BLOCK**: violating any invariant blocks the operation

Stage 1: Intake (S_INTAKE)

  1. Parse arguments: issue description, flags
  2. Generate slug, create SESSION_DIR
  3. Search: maestro search "<keywords>" + Glob prior sessions + ARCHITECTURE.md + Grep keywords
  4. Derive phase_goals[] from flags (apply skip_when)
  5. Write session.json + understanding.md §1
  6. Display Goal Prompt (Appendix), continue without blocking

Resume (-c): Glob latest session → read session.json → restore current_state → jump.

📌 Auto-commit: git add understanding.md && git commit -m "odyssey-debug({slug}): S_INTAKE — 目标解析"

Stage 2: Archaeology (S_ARCHAEOLOGY)

Step 1 — Git archaeology (spawn_agents_on_csv, Wave 1):

Write tasks.csv with Wave 1 rows:

id,title,description,task_type,deps,wave,status,findings,evidence,error
"arch-timeline","Git Timeline","Run git log --oneline -20 -- {files}. Return [{sha,date,author,message,files_changed}] as JSON.","archaeology","","1","pending","","",""
"arch-blame","Git Blame","Top 3 suspicious files: git blame -L {region}. Return [{file,line_range,sha,author,date,content}] as JSON.","archaeology","","1","pending","","",""
spawn_agents_on_csv({ csv_path:"tasks.csv", id_column:"id",
  instruction: ARCHAEOLOGY_INSTRUCTION + TERMINATION_CONTRACT,
  max_concurrency:2, max_runtime_seconds:300,
  output_csv_path:"wave-1-results.csv", output_schema: SHARED_OUTPUT_SCHEMA })

Merge results → evidence.ndjson (phase: "archaeology").

Step 2 — CLI change review:

maestro delegate "PURPOSE: Review recent modifications related to: {issue}
TASK: Analyze intent | Identify risky modifications | Flag potential bug sources
MODE: analysis
CONTEXT: @{relevant_files} | Git log: {top_10_commits}
EXPECTED: JSON [{commit_sha, risk_level, analysis, could_cause_issue, explanation}]
CONSTRAINTS: Focus on behavioral changes, not formatting
" --role analyze --mode analysis

Execute with run_in_background: true, then wait for callback (do NOT halt the Odyssey flow). Append results.

Step 3: Update understanding.md §2.

📌 Auto-commit: git add understanding.md && git commit -m "odyssey-debug({slug}): S_ARCHAEOLOGY — 考古"

Stage 3: Exploration (S_EXPLORE)

Skip if no enabled CLI tools (W006).

maestro delegate "PURPOSE: Gather codebase evidence for: {issue}
TASK: Trace call chains | Find recent changes | Identify error gaps | Check similar patterns
MODE: analysis
CONTEXT: @**/*
EXPECTED: JSON {call_chains, recent_changes, error_gaps, similar_patterns}
CONSTRAINTS: Max 20 entries/category | Symptom-related code paths
Symptoms: {issue}  Archaeology hints: {suspicious_commits}
" --role explore --mode analysis

Execute with run_in_background: true, then wait for callback (do NOT halt the Odyssey flow).

Parse → write explore.json + evidence (phase: "explore"). Update §3. Mark G2 done.

📌 Auto-commit: git add understanding.md && git commit -m "odyssey-debug({slug}): S_EXPLORE — 探索"

Stage 4: Diagnosis (S_DIAGNOSE)

  1. Form hypotheses from evidence (archaeology + explore), ranked [HIGH]/[MEDIUM]/[LOW] → §4
  2. Test each (rank order): design test → execute → evidence (phase: "diagnosis")
  3. Decision journal: ambiguity → evidence (phase: "decision"); Normal: request_user_input | -y: defer
  4. Root cause: confirmed → session.json.root_cause + §5. Mark G1 done.

Escalation (3-strike): All hypotheses fail → increment diagnosis_retries.

  • < 3: broaden scope via maestro delegate --role analyze, form new hypotheses.
  • = 3: Normal → request_user_input (broaden/new/INCONCLUSIVE) | -y → auto INCONCLUSIVE, proceed to S_RECORD.

📌 Auto-commit: git add understanding.md && git commit -m "odyssey-debug({slug}): S_DIAGNOSE — 诊断"

Stage 5: Fix (S_FIX)

Skip if --skip-fix.

  1. Present root cause + proposed fix. Normal: request_user_input | -y: auto proceed
  2. Implement fix
  3. Record in evidence (phase: "decision")

📌 Auto-commit: git add -A && git commit -m "odyssey-debug({slug}): S_FIX — 修复"

Stage 6: Confirmation (S_CONFIRM)

Skip if --skip-fix.

  1. Tests: auto-detect framework, run covering tests
  2. CLI fix review:
maestro delegate "PURPOSE: Review fix for: {issue}
TASK: Verify correctness | Check regressions | Assess completeness
MODE: analysis
CONTEXT: @{modified_files} | Root cause: {summary} | Diff: {git_diff}
EXPECTED: JSON {verdict, findings [{severity, description, suggestion}], regression_risk}
CONSTRAINTS: Focus on correctness, not style
" --role review --mode analysis

Execute with run_in_background: true, then wait for callback (do NOT halt the Odyssey flow).

  1. Write session.json.confirmation: {test_result, cli_review, overall, timestamp}
  2. Update §6. needs_rework → Stage 5. confirmed → mark G3 done, advance.

📌 Auto-commit: git add understanding.md && git commit -m "odyssey-debug({slug}): S_CONFIRM — 确认"

Stage 7: Generalization (S_GENERALIZE)

Skip if --skip-generalize. 举一反三: extract pattern, scan for siblings.

Step 1 — Multi-layer pattern extraction:

Layer Method Example
Syntax Regex patterns (direct Grep) eval(, missing await, unclosed resource
Semantic Anti-pattern description (Agent scan) Unhandled async errors, unvalidated input
Structural Architecture-level similarity Same import structure, missing override

Write session.json.patterns[]: [{id, source, layer, signature, description, risk, fix_template}]

Step 2 — 4-agent scan (spawn_agents_on_csv, Wave 2):

Append Wave 2 rows to tasks.csv:

"gen-syntax","Syntax Grep","Grep syntax-layer signatures '${signature}' across project. Return [{file,line,context,risk_level,layer:'syntax',confidence}].","generalization","","2","pending","","",""
"gen-semantic","Semantic Scan","Check related modules for anti-pattern: ${description}. Return [{file,line,context,risk_level,layer:'semantic',confidence}].","generalization","","2","pending","","",""
"gen-structural","Structural Match","Find structurally similar files to ${buggy_files}, check for anti-pattern. Return [{file,line,description,risk,layer:'structural',confidence}].","generalization","","2","pending","","",""
"gen-historical","Historical Grep","Run git log -S '${signature}' --oneline. Return [{sha,file,date,type:'introduced|fixed',context}].","generalization","","2","pending","","",""
spawn_agents_on_csv({ csv_path:"tasks.csv", id_column:"id",
  instruction: GENERALIZATION_INSTRUCTION + TERMINATION_CONTRACT,
  max_concurrency:4, max_runtime_seconds:300,
  output_csv_path:"wave-2-results.csv", output_schema: SHARED_OUTPUT_SCHEMA })

Step 3 — Cross-layer dedup: same file:line multi-layer → boost confidence | single-layer → needs_review | historical fixed → regression_risk

Step 4 — Iterative deepening: module ≥3 hits → targeted deep scan (max 1 round).

Step 5 — Quality Gate (self-iteration).

Step 6: Write §7 + session.json.generalization_stats: {patterns_extracted, total_hits, cross_layer_confirmed, regression_risks, by_layer, deepening_triggered}. Mark G4 done.

📌 Auto-commit: git add understanding.md && git commit -m "odyssey-debug({slug}): S_GENERALIZE — 泛化"

Stage 8: Discovery (S_DISCOVER)

Skip if no generalization hits.

  1. Triage each hit: read ±10 lines → classify safe/risk/bug
  2. Route (explicit action required for each classification):
    • bug + directly fixable → fix immediately (not just log an issue) → back to S_FIX
    • bug + requires cross-module/architectural decision → create issue (with fix suggestion + impact analysis)
    • risk → evaluate if guard/validation can mitigate directly; if yes, fix it
    • safe → mark skip See Appendix -y behavior. Append evidence (phase: "discovery" + "decision")
  3. Cross-phase loop:
    • discovery finds new bug → S_DIAGNOSE (cross_phase_loops++)
    • same-pattern bug + fix template → S_FIX
    • S_DISCOVER → S_RECORD: triage complete AND remaining_actionable == 0
    • S_DISCOVER → S_RECORD: loops >= max_loops → MUST log each unfixed item with specific reason (blanket "pre-existing" is forbidden)
  4. Update §8. Mark G5 done.

📌 Auto-commit: git add understanding.md && git commit -m "odyssey-debug({slug}): S_DISCOVER — 发现"

Stage 9: Record (S_RECORD)

  1. Finalize understanding.md §9
  2. Write learnings to understanding.md §9: 按 Knowledge Persistence 表分类记录(临时),completion summary 列出建议的后续命令
  3. Mark G6 done. Pending decisions: Normal → request_user_input | -y → show deferred count
  4. Goal audit: all confirmed → phase_goals_all_done = true. Any false: Normal → request_user_input (回退/跳过/接受) | -y → auto accept
  5. Completion: current_state = "COMPLETED", emit summary:
--- DEBUG ODYSSEY COMPLETE ---
Issue:      {issue}
Root cause: {root_cause.hypothesis}
Fix:        {applied|skipped|inconclusive}
Patterns:   {patterns_extracted} ({by_layer} distribution)
Scan hits:  {total_hits} ({cross_layer_confirmed} cross-layer confirmed)
Issues:     {N} created
Decisions:  {N} resolved, {M} pending, {K} deferred
Learnings:  {N} spec entries persisted
Self-iter:  {N} quality gate rounds across {M} stages
Goals:      {done}/{total} confirmed ({skipped} skipped)
---

Next steps: $manage-issue list --source debug-odyssey, $learn-decompose <module>, $quality-review, $learn-second-opinion <understanding.md>, $learn-investigate "<question>"

📌 Auto-commit: git add understanding.md && git commit -m "odyssey-debug({slug}): S_RECORD — 总结"

Goal Prompt Template

时机守卫:仅在 Stage 1 完成后显示一次。Stage 9 完成时禁止重新显示。

📋 Debug Odyssey 会话已创建。可随时复制以下 /goal 设定终止条件:

/goal 穷尽迭代:直到根因确认(或明确 INCONCLUSIVE)且修复验证通过
且泛化扫描穷尽且 phase_goals_all_done=true 才停。
假设失败时扩范围→换视角→升级工具,不轻易放弃。
泛化发现的同类 bug 全部修复或创建 issue,不允许遗留。
遇到 phase=decision 的 pending 必须 request_user_input,不得自行 resolve。

Odyssey 输出提示词后继续执行不阻塞。/goal 由用户任意时刻输入。

-y Auto-Confirm Behavior

Decision Point Normal -y mode
Stage 4 ambiguity request_user_input blocks record deferred, best-effort continue
Stage 4 3-strike request_user_input 3-way auto INCONCLUSIVE
Stage 5 fix direction request_user_input confirm auto proceed
Stage 8 bug triage request_user_input route auto create issue
Stage 8 ambiguous request_user_input batch all deferred
Stage 9 decisions request_user_input per-item skip, show deferred count
Stage 9 goal audit request_user_input 3-way auto accept current state

deferred items shown as "待决策" in summary; recoverable via -c.

Phase Goal Lifecycle

pending → done (confirmed=true) normal | pending → skipped (confirmed=true) flags/manual | pending → failed (confirmed=false) INCONCLUSIVE

phase_goals_all_done = true only when ALL goals have completion_confirmed == true.

| Code | Severity | Condition | Recovery | |------|----------|-----------|----------| | E001 | error | No issue and no session to resume | Provide issue or use -c | | E003 | error | Resume but no session found | Start new session | | E004 | error | Delegate execution failed | Retry or proceed without CLI | | W001 | warning | No relevant git history | Proceed with limited context | | W002 | warning | All hypotheses inconclusive after 3 retries | INCONCLUSIVE | | W003 | warning | Generalization scan 0 hits | Skip discovery | | W004 | warning | Delegate parse failed | Use raw output | | W005 | warning | Pending decisions unresolved | Filter evidence.ndjson phase=decision | | W006 | warning | CLI exploration skipped (no tools) | Proceed without explore.json | - [ ] Session directory created with 4 output files - [ ] Prior knowledge searched (maestro search + sessions + architecture) - [ ] Git archaeology (spawn Wave 1) + CLI change review, evidence phase=archaeology - [ ] CLI exploration, explore.json written, evidence phase=explore - [ ] Hypotheses formed from archaeology + explore, tested and logged phase=diagnosis - [ ] Root cause declared with evidence refs - [ ] understanding.md tracks all 9 sections progressively - [ ] Fix implemented + confirmed with test + CLI review (unless --skip-fix) - [ ] Multi-layer patterns extracted (syntax/semantic/structural) (unless --skip-generalize) - [ ] 4-agent scan (spawn Wave 2) + cross-layer dedup + iterative deepening - [ ] Discoveries classified and routed (fix/issue/decision/skip) - [ ] Decision journal: all human-judgment items in evidence.ndjson phase=decision - [ ] phase_goals derived from flags, skip_when applied, each phase marks its goal - [ ] Goal audit in Stage 9 — unmet goals surfaced, phase_goals_all_done set correctly - [ ] Goal Prompt displayed once after session creation - [ ] `-y`: all decisions auto-resolve/defer, deferred count in summary - [ ] State saved at each transition (resumable via -c) - [ ] Quality Gate self-iteration when insufficient, logged in self_iteration_log - [ ] Spec entries persisted for reusable learnings - [ ] Completion summary with all stats - [ ] **Every unfixed finding has individual classification and reason** — blanket "pre-existing" labels are forbidden
Install via CLI
npx skills add https://github.com/catlog22/maestro-flow --skill odyssey-debug
Repository Details
star Stars 382
call_split Forks 51
navigation Branch main
article Path SKILL.md
More from Creator