name: debug description: >- Systematic debugging for kokoro-coreml: consult README/ (guides and learnings) and CLAUDE.md first, pull current library docs via Context7 MCP when coremltools/PyTorch/API behavior is uncertain, parallelize investigation with multiple subagents when stuck, prove fixes before calling success, then capture one consolidated note in README/Notes via write-notes as the final step before ending the session. For extreme cases, delegate a multi-agent audit via the Claude Code CLI. Use when the user invokes debug, use debug, asks to debug or fix a tricky bug, or work is blocked by unclear failure modes after quick local checks.
Debug
Purpose
Reduce guesswork: README/ and CLAUDE.md first, Context7 when library behavior is uncertain, parallel hypotheses when stuck, proof before “fixed”, one write-notes pass at session end, CLI multi-agent audit only for exceptional escalation.
Use When
- The user says debug, use the debug skill, debug this, or the task is bug investigation / root-cause analysis that is stalling.
- Failures involve coremltools, PyTorch, Core ML runtime, or Swift integration where training data may be stale—use Context7 before assuming APIs.
Do Not Use When
- The user only wants a trivial fix with an obvious stack trace and no doc
ambiguity (still skim
README/if the area is known to be finicky). - The task is greenfield feature build with no defect.
Workflow (in order)
1. Read repo knowledge first (mandatory)
Before writing code or deep-diving the stack:
- Skim
README/for files that match the subsystem (conversion, export, Core ML runtime, tokenizer bridge,kokoro.js, etc.). Open the most relevant guides and look for known issues, workarounds, and checklists. - Skim
README/Notes/(when present) and high-level docs linked fromREADME.mdfor past investigations. - Read
CLAUDE.mdfor PyTorch → Core ML playbook constraints (static shapes, ANE layout, divide-and-conquer).
2. Context7 MCP (library and platform docs)
When the bug touches coremltools, PyTorch, Swift Core ML, or related APIs:
- Use Context7: resolve the library ID, then fetch focused docs for the API or behavior in question.
- Prefer Context7 over memory or generic web search for API shape and version-sensitive behavior.
If Context7 is unavailable, fall back to official docs—do not invent APIs.
3. Reproduce and narrow
- Confirm minimal repro or the exact export step / predict call / log line that fails.
- State a one-sentence hypothesis and what evidence would falsify it.
4. Parallel investigation when stuck
If the problem is hard, cross-cutting, or you have been going in circles after the reproduce-and-narrow pass:
- In one turn, spin up multiple
Tasksubagents withreadonly: trueand narrow charters (e.g. “MIL/export path only”, “Swift shapes only”). - Merge overlapping hypotheses; avoid duplicate deep reads of the same file.
When in doubt, parallelize (same posture as audit, but
hypothesis-driven).
5. Exceptionally frustrating bugs — CLI + audit
When still stuck after parallel investigation, or blast radius is high:
- Use the Claude Code CLI from the repo root to run a multi-agent
auditover the scoped slice that matters. - In the handoff, cite relevant
README/paths, symptom + ruled-out items, and instruct use of theauditskill with readonly subagents.
Escalation only—after README, Context7, and a fair parallel pass.
6. Prove the fix (before claiming success)
Do not say fixed until there is objective proof: failing test passes, export completes, Core ML predict matches tolerance, bad log line absent—whatever matches the bug.
7. Write notes once (mandatory last step)
After investigation, perform a single README/Notes/ update using
write-notes (consolidate the session).
- One pass: symptom, repro, ruled out, root cause (
TBDif unknown), fix, verification (proven / not), pointers to guides. - Skip only when the user explicitly wants no note or the outcome is trivial.
Output expectations
- What was read from
README//CLAUDE.md(paths). - Verification: what proved the fix (or unverified / blocked).
- Which note file(s) were updated in step 7 (or why skipped).
- Whether Context7 was used and for which topic.
- Leading hypothesis(es) if still open, else root cause summary.
Anti-patterns
- Skipping
README//CLAUDE.mdto “save time.” - Assuming API behavior without Context7 or official docs when versions matter.
- Serial thrashing instead of parallel charters.
- Declaring victory without verification.
- Patching notes throughout instead of one end-of-session pass.
Relation to other skills
audit: structured review; use via CLI escalation when debug maxes out.write-notes: once at end of debug session.phase-audit: plan-phase checks—not the same as production debugging.