tech-debt-audit - SKILL.md Agent Skill

name: tech-debt-audit
description: "Thorough, file-cited technical debt audit across 9 dimensions using AST-grep (tree-sitter), grep, language-native tooling, and optionally CodeGraph knowledge graph. Produces TECH_DEBT_AUDIT.md with severity, effort estimates, and prioritized fixes. Use when asked for codebase health check, tech debt audit, architecture review, code quality assessment, or cleanup planning. Triggers: 'tech debt', 'technical debt', 'debt audit', 'code health', 'technical debt audit', 'codebase health check', 'find tech debt', 'debt analysis', 'audit code quality'."

Tech Debt Audit Protocol

Model-agnostic technical debt audit for oh-my-openagent (OMO). Uses OMO's built-in tools (grep, glob, bash with sg, read, lsp_diagnostics, task) plus optional CodeGraph MCP for enhanced code graph analysis when available. Produces a grounded, citable TECH_DEBT_AUDIT.md artifact.

CodeGraph Enhancement (Optional)

If you have CodeGraph installed (check with codegraph status), its MCP tools (codegraph_search, codegraph_callers, codegraph_callees, codegraph_impact, codegraph_explore, etc.) can supersede or augment the standard tool searches in the dimensions marked below. CodeGraph gives you:

Symbol search — instant by-name lookup via FTS5
Call graph analysis — callers/callees for any function
Impact analysis — blast radius before changing any symbol
Smart context building — entry points, related symbols, and snippets in one call
Framework-aware routes — URL patterns linked to their handlers

To use CodeGraph, ensure the codegraph MCP server is configured in your project's .mcp.json or global MCP config. The skill will auto-detect CodeGraph by checking if codegraph MCP tools are available. Sub-agents spawned via task() cannot use CodeGraph — they use the standard tool fallback.

Output

Write results to TECH_DEBT_AUDIT.md in the repo root with:

Executive Summary — 3-5 sentences: overall health, worst dimension, quick wins count
Mental Model — the repo's architecture in 1 paragraph (what it does, stack, module boundaries)
Findings Table — columns: ID, Category, File:Line, Severity (Critical/High/Medium/Low), Effort (Hours), Description, Recommendation
Top 5 Priorities — ranked by impact/effort ratio
Quick Wins Checklist — items under 30 minutes each
"Looks Bad But Is Fine" — patterns that look like debt but are intentional
Open Questions — things the maintainer should clarify
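
One way to keep the Findings Table consistent is to model each row as a typed record and render it mechanically. The sketch below assumes a Markdown table; the Finding interface, its field names, and renderFindingsTable are hypothetical and not part of the skill itself.

```typescript
// Hypothetical shape for one row of the Findings Table.
// Field names are illustrative and mirror the columns listed above.
type Severity = "Critical" | "High" | "Medium" | "Low";

interface Finding {
  id: string;
  category: string;
  file: string;
  line: number;
  severity: Severity;
  effortHours: number;
  description: string;
  recommendation: string;
}

const HEADER: string[] = [
  "ID", "Category", "File:Line", "Severity", "Effort (Hours)", "Description", "Recommendation",
];

// Escape pipes so free-text cells cannot break the table layout.
function cell(value: string | number): string {
  return String(value).replace(/\|/g, "\\|");
}

function renderFindingsTable(findings: Finding[]): string {
  const rows = findings.map((f) =>
    [f.id, f.category, `${f.file}:${f.line}`, f.severity, f.effortHours, f.description, f.recommendation]
      .map((value) => cell(value))
      .join(" | "),
  );
  return [
    `| ${HEADER.join(" | ")} |`,
    `| ${HEADER.map(() => "---").join(" | ")} |`,
    ...rows.map((row) => `| ${row} |`),
  ].join("\n");
}
```

Rendering from a typed record makes a missing citation or severity a compile-time error rather than a gap discovered while reading the report.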

Phase 0: Orient

Standard (always run)

glob("**/*.ts") / glob("**/*.py") / etc — map the language stack
glob("**/package.json") + read() — dependencies and build tooling
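bash("git log -200 --name-only --pretty=format: | grep . | sort | uniq -c | sort -rn | head -20") — churn: rank files by how often they changed in the last 200 commits
glob("**/*") + bash("wc -l <file>") — find largest files (>300 LOC are candidates)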
Cross-reference high-churn + large = debt hot zones
Write the mental model paragraph in your own working context
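
The cross-reference step can be sketched as a small ranking function. This is an illustration only: the FileStats shape, the hotZones name, and the commits-times-LOC score are assumptions, and any monotonic combination of churn and size would serve the same purpose.

```typescript
// Sketch: intersect churn and size to rank debt hot zones.
interface FileStats {
  path: string;
  commits: number; // times the file appears in the sampled git history
  loc: number;     // lines of code
}

function hotZones(files: FileStats[], minLoc = 300, top = 10): FileStats[] {
  // Assumed score: commits * loc. filter() returns a new array,
  // so the caller's input is not reordered by sort().
  return files
    .filter((f) => f.loc > minLoc && f.commits > 0)
    .sort((a, b) => b.commits * b.loc - a.commits * a.loc)
    .slice(0, top);
}
```

Small but busy files and large but untouched files both drop out, which keeps the audit focused on code that is both hard to read and frequently edited.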

CodeGraph Enhancement (if available)

Instead of guessing module boundaries, query the code graph:

codegraph_explore(query="architecture overview and main modules")

This returns symbol relationships and source grouped by file. Use the structure as your architectural mental model instead of hand-inferring it from directory names.

codegraph_explore(query="main entry points and execution flow")

This surfaces entry points and call chains. Use these to understand how the code actually flows vs how the directory layout suggests it flows.

Phase 1: Audit Across 9 Dimensions

Use OMO tools for each dimension. Run parallel tool calls within each dimension. Every finding MUST cite file:line:col.

1. Architectural Decay

Standard (always run)

bash("sg -p \"import { \\$\\$\\$ } from '\\$SRC'\" -l ts .") — map module graph, look for circular patterns
bash("sg -p 'class $NAME { $$$ }' -l ts .") — check for god classes
grep("TODO|FIXME|HACK|XXX|WORKAROUND|TEMP") — tagged debt markers
grep("async|await") on sync-looking files — misplaced async boundaries
bash("wc -l <file>") on each large file found in Phase 0

CodeGraph Enhancement (if available)

Dead code detection:

codegraph_callers(symbol="<suspected-dead-function>")
codegraph_callers(symbol="<suspected-dead-class>")

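Run codegraph_callers on suspected dead exports found via grep/glob. If the result shows zero callers (excluding test files), treat it as a dead-code candidate, then confirm it is not a public API export, a framework entry point, or a dynamically referenced symbol before reporting it.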

Circular dependency detection:

codegraph_impact(target="<module-or-file>", direction="upstream")

Use codegraph_impact on key modules to trace their dependents. If A depends on B and B depends on A, that's a cycle.

Architecture boundaries:

codegraph_explore(query="module dependencies and architecture boundaries")

Use codegraph_explore to survey actual module structure.

What to flag

Files > 500 LOC (god files)
Functions > 80 LOC or > 4 nesting levels
Classes with > 15 methods or > 400 LOC
Import cycles (A → B → A)
Dead exports: function/class defined but never imported elsewhere (CodeGraph: codegraph_callers)
Commented-out code blocks (>3 consecutive lines)
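
For the import-cycle item, once the import searches have been turned into an adjacency list, a depth-first search is enough to surface a cycle. The sketch below is a generic algorithm, not something the skill prescribes; ImportGraph and findImportCycle are hypothetical names.

```typescript
// Module graph as an adjacency list: file -> files it imports.
type ImportGraph = Record<string, string[]>;

// Depth-first search that returns the first import cycle found, or null.
function findImportCycle(graph: ImportGraph): string[] | null {
  const done = new Set<string>(); // fully explored and known cycle-free
  const stack: string[] = [];     // current DFS path

  const visit = (node: string): string[] | null => {
    const at = stack.indexOf(node);
    if (at !== -1) return [...stack.slice(at), node]; // back edge closes a cycle
    if (done.has(node)) return null;
    stack.push(node);
    for (const dep of graph[node] ?? []) {
      const cycle = visit(dep);
      if (cycle) return cycle;
    }
    stack.pop();
    done.add(node);
    return null;
  };

  for (const node of Object.keys(graph)) {
    const cycle = visit(node);
    if (cycle) return cycle;
  }
  return null;
}
```

The returned path lists the files in import order and repeats the first file at the end, which maps directly onto the A → B → A notation used in the finding.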

2. Consistency Rot

Standard (always run)

bash("sg -p \"import \\$CLIENT from '\\$PKG'\" -l ts .") — multiple HTTP clients
grep("console.log|console.error|console.warn") — direct console use vs logger
bash("sg -p 'try { $$$ } catch ($$$) { $$$ }' -l ts .") — error handling patterns
grep("as any|@ts-ignore|@ts-expect-error|as unknown") — type escapes
grep("eslint-disable|prettier-ignore") — lint suppressions

What to flag

3+ ways of doing the same thing (HTTP, logging, validation, config)
Mixed naming conventions (camelCase + snake_case + PascalCase)
Multiple date/time handling libraries
Mixed error response shapes across modules

3. Type & Contract Debt

Standard (always run)

bash("sg -p '$VALUE as any' -l ts .") — as any assertions (compile-time type escapes)
grep("@ts-expect-error") — suppressed errors
grep("@ts-ignore") — suppressed errors (legacy)
bash("sg -p '$NAME: any' -l ts .") — typed as any
lsp_diagnostics(filePath="<src-dir>") — current type errors

What to flag

any types on public APIs and exported interfaces
Untyped function parameters
Missing schema validation at API/IO boundaries
LSP type errors grouped by file
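
To illustrate the first and third items, the sketch below contrasts any at a boundary with unknown plus a runtime check. UserPayload, isUserPayload, and parseUser are hypothetical names; a real project might use a schema validation library instead of a hand-written guard.

```typescript
// Hypothetical payload shape, used only for illustration.
interface UserPayload {
  id: number;
  email: string;
}

// Type guard: narrows `unknown` to UserPayload after checking each field at runtime.
function isUserPayload(value: unknown): value is UserPayload {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return typeof record["id"] === "number" && typeof record["email"] === "string";
}

// Accept `unknown` at the IO boundary instead of `any`, then validate before use.
function parseUser(json: string): UserPayload {
  const data: unknown = JSON.parse(json);
  if (!isUserPayload(data)) {
    throw new TypeError("Invalid user payload");
  }
  return data;
}
```

Because parseUser returns a checked UserPayload, callers downstream need no casts, which is the property the audit looks for at API and IO boundaries.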

4. Test Debt

Standard (always run)

glob("**/*.test.ts") — find all test files
bash("bun test 2>&1 | grep -E '(fail|skip|todo)'") — current test health
Cross-reference Phase 0 high-churn files with test existence

What to flag

Critical-path files with zero tests
Skipped tests (test.skip, describe.skip)
Tests asserting implementation details vs behavior
Slow tests (>1s each)
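
The cross-reference between high-churn files and test existence can be approximated by matching base names. This sketch assumes the *.test.ts naming convention from the glob above; untestedFiles is a hypothetical helper, and it will miss cases where two source files in different directories share a base name.

```typescript
// Returns source files that have no `<name>.test.ts` counterpart anywhere in the repo.
function untestedFiles(sourceFiles: string[], testFiles: string[]): string[] {
  // Reduce "src/auth.test.ts" and "src/auth.ts" to the same key: "auth".
  const baseName = (path: string): string => {
    const file = path.split("/").pop() ?? path;
    return file.replace(/\.test\.ts$/, "").replace(/\.ts$/, "");
  };
  const tested = new Set(testFiles.map(baseName));
  return sourceFiles.filter((file) => !tested.has(baseName(file)));
}
```

Feeding it the high-churn list from Phase 0 gives a first cut of critical-path files with zero tests, to be confirmed by reading the suspect files.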

5. Dependency & Config Debt

Standard (always run)

bash("npm audit --omit=dev 2>&1 | head -40") — known vulnerabilities in production dependencies (requires a lockfile such as package-lock.json)
read("package.json") — check dependency count and stale deps
grep("\\.env|process\\.env|Bun\\.env") — env var usage
grep("API_KEY|SECRET|PASSWORD|TOKEN") in non-config files — hardcoded config

CodeGraph Enhancement (if available)

Blast radius of core dependencies:

codegraph_impact(target="<core-utility-function>", direction="upstream")

Run this on a few key internal modules (logger, config loader, HTTP client) to see how widely they're used. A widely-depended-on module with poor error handling or type safety is a high-priority refactor target because changes to it ripple everywhere.

What to flag

Outdated major-version deps
Dependencies that do the same thing (duplicate libraries)
Referenced env vars not documented in README
Hardcoded environment-specific values
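
Checking for undocumented env vars can be done with a simple scan. The sketch below is an assumption-laden approximation: referencedEnvVars and undocumentedEnvVars are hypothetical helpers, only dot access (process.env.NAME, Bun.env.NAME) is detected, and "documented" means the name appears anywhere in the README text.

```typescript
// Collect names referenced as process.env.NAME or Bun.env.NAME in a source string.
function referencedEnvVars(source: string): string[] {
  const pattern = /(?:process|Bun)\.env\.([A-Za-z_][A-Za-z0-9_]*)/g;
  const names = new Set<string>();
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    const name = match[1];
    if (name) names.add(name);
  }
  return Array.from(names).sort();
}

// Names used in code that the README never mentions.
function undocumentedEnvVars(source: string, readme: string): string[] {
  return referencedEnvVars(source).filter((name) => !readme.includes(name));
}
```

Bracket access such as process.env["NAME"] and destructuring are not covered, so treat an empty result as a hint rather than proof.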

6. Performance & Resource Hygiene

Standard (always run)

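bash("sg -p 'for ($$$ of $$$) { $$$ await $$$ }' -l ts .") — async-in-loop
grep("await.*map|await.*filter|await.*forEach") — await combined with array methods; review each hit, since map/filter/forEach never wait for async callbacks unless the result is wrapped in Promise.all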
grep("Promise\\.all|Promise\\.allSettled") — existing parallel patterns (good signal)
grep("addEventListener|on\\(|subscribe") without removeEventListener|off\\(|unsubscribe nearby — listener hygiene

What to flag

await inside for/of loops (sequential when parallel possible)
N+1 query patterns
Missing cleanup on event listeners, intervals, handles
Unnecessary serialization/deserialization
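
The first item is easiest to recognize side by side with its alternative. In this sketch fetchUser is a hypothetical stand-in for any independent async call; the parallel form is only appropriate when the calls do not depend on each other and the target can handle concurrent requests.

```typescript
// Hypothetical async lookup standing in for a network or database call.
async function fetchUser(id: number): Promise<string> {
  await new Promise<void>((resolve) => setTimeout(resolve, 10));
  return `user-${id}`;
}

// Flagged pattern: each iteration waits for the previous one to finish.
async function loadSequential(ids: number[]): Promise<string[]> {
  const users: string[] = [];
  for (const id of ids) {
    users.push(await fetchUser(id));
  }
  return users;
}

// Alternative for independent calls: start them all, then await together.
// Promise.all keeps results in input order and rejects on the first failure.
async function loadParallel(ids: number[]): Promise<string[]> {
  return Promise.all(ids.map((id) => fetchUser(id)));
}
```

Both functions return the same array; the difference is total latency, which grows with the number of ids in the sequential version. Sequential awaits that exist for ordering or rate limiting belong in the "Looks Bad But Is Fine" section.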

7. Error Handling & Observability

Standard (always run)

bash("sg -p 'catch ($$$) { $$$ }' -l ts .") — catch blocks
grep("catch\\s*(\\([^)]*\\))?\\s*\\{\\s*\\}") — empty catch blocks on a single line
grep("console.error|logger\\.error|log\\.error") — actual error logging
bash("sg -p 'throw new $ERR($$$)' -l ts .") — error types used

CodeGraph Enhancement (if available)

Trace error propagation through call chains:

codegraph_callers(symbol="<key-error-handler-or-middleware>")
codegraph_explore(query="how errors propagate through <key-error-handler>")

Use codegraph_callers to find who calls your error handlers. If errors are caught and swallowed at multiple levels, that's a finding.

Impact of changing error types:

codegraph_impact(target="<error-class-or-interface>", direction="upstream")

Check the blast radius of custom error classes. If changing an error type would break 20+ consumers, those consumers are too tightly coupled to that error type.

What to flag

Empty catch blocks (worst offense)
Generic catch (e) { console.error(e) } without recovery
Inconsistent error shapes across modules
Missing structured logging on critical paths
Errors swallowed in promise chains (.catch(() => {}))
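
The contrast between a swallowed error and a handled one can be sketched as follows. Everything here is illustrative: logError stands in for whatever structured logger the project uses, and ConfigError, readPort, and the event name are hypothetical.

```typescript
// Hypothetical structured logger; a real project would call its logging library.
const logged: Array<{ event: string; message: string }> = [];
function logError(event: string, err: unknown): void {
  logged.push({ event, message: err instanceof Error ? err.message : String(err) });
}

// Flagged: the empty catch hides the failure and the caller gets undefined.
function readPortSwallowed(raw: string): number | undefined {
  let port: number | undefined;
  try {
    port = JSON.parse(raw).port;
  } catch {}
  return port;
}

// Domain error that keeps the original failure attached.
class ConfigError extends Error {
  original: unknown;
  constructor(message: string, original: unknown) {
    super(message);
    this.name = "ConfigError";
    this.original = original;
  }
}

// Preferred: log with a stable event name, then rethrow a typed error.
function readPort(raw: string): number {
  try {
    const parsed: unknown = JSON.parse(raw);
    const port = (parsed as { port?: unknown }).port;
    if (typeof port !== "number") throw new Error("port must be a number");
    return port;
  } catch (err) {
    logError("config.parse_failed", err);
    throw new ConfigError("Could not read port from config", err);
  }
}
```

The second version gives the caller one error shape to handle and leaves a searchable log entry, which addresses the inconsistent-shape and missing-logging items together.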

8. Security Hygiene

Standard (always run)

grep("api[Kk]ey|api_secret|password|secret|token|credential") in source files (not config or env)
grep("SELECT .* FROM|INSERT INTO|UPDATE.*SET|DELETE FROM") — SQL construction
grep("innerHTML|dangerouslySetInnerHTML") — XSS vectors
grep("eval\\(|Function\\(|setTimeout\\(\\s*['\"`]|setInterval\\(\\s*['\"`]") — code injection (eval, Function constructor, string-literal timers)

What to flag

Hardcoded secrets in source
String-concatenated SQL
innerHTML / dangerouslySetInnerHTML usage
eval() or string-based setTimeout/setInterval
Permissive CORS or auth middleware
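
For the string-concatenated SQL item, the difference the audit looks for is whether user input ends up inside the SQL text. The sketch below uses a hypothetical Query shape with PostgreSQL-style $1 placeholders as an assumption; other drivers use ? or named parameters, and the table and column names are invented.

```typescript
// Assumed query shape: SQL text with placeholders, values passed separately.
interface Query {
  text: string;
  values: unknown[];
}

// Flagged: user input is spliced directly into the SQL text.
function findUserUnsafe(email: string): string {
  return "SELECT * FROM users WHERE email = '" + email + "'";
}

// Safer: the SQL text is constant and the input travels as a bound value.
function findUserQuery(email: string): Query {
  return { text: "SELECT * FROM users WHERE email = $1", values: [email] };
}
```

A grep hit on SELECT or INSERT is therefore only a finding when the surrounding code builds the statement with concatenation or template interpolation of untrusted values.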

9. Documentation Drift

Standard (always run)

read("README.md") — check if claims match reality
grep("@param|@returns|@throws") — docstring coverage
grep("FIXME|TODO|HACK|XXX|WORKAROUND") — fixme density
Compare README API examples with actual signatures

What to flag

README claiming features that don't exist
Public functions without any doc comment
Comments that contradict the code
Stale architecture decision records (ADRs) if present

Phase 2: Deeper Dives (Parallel Sub-Agents)

For large codebases (>50k LOC), delegate heavy dimensions to parallel sub-agents. Sub-agents CANNOT use CodeGraph — they use standard tools only:

task(category="unspecified-low", run_in_background=true, load_skills=[], prompt="[CONTEXT] Tech debt audit. [GOAL] Audit dimensions 1 (Architecture) and 2 (Consistency). [REQUEST] Run sg (ast-grep) and grep searches for dimensions 1-2 from the tech-debt-audit skill. Report every finding with file:line:col. Tag severity: Critical/High/Medium/Low.")
task(category="unspecified-low", run_in_background=true, load_skills=[], prompt="[CONTEXT] Tech debt audit. [GOAL] Audit dimensions 3 (Type debt) and 7 (Error handling). [REQUEST] Run searches for dimensions 3 and 7 from the tech-debt-audit skill. Report every finding with file:line:col. Tag severity.")

Spawn 2-3 sub-agents for the heaviest dimensions, collect results in parallel, then synthesize. The main agent handles CodeGraph queries itself while sub-agents run the standard tool passes.

Phase 3: Synthesize & Deliver

Collect all findings from direct tool calls, CodeGraph queries (if available), and sub-agent results
Deduplicate — same issue mentioned by multiple dimensions
Classify severity:
- Critical — Causes incorrect behavior, data loss, or security vulnerability
- High — Will cause problems in production; blocks maintenance
- Medium — Reduces maintainability; violates conventions
- Low — Cosmetic; should fix when in the area
Estimate effort in hours per finding (conservative)
Write TECH_DEBT_AUDIT.md with all required sections
Report summary to the user
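
The deduplicate and rank steps can be sketched as two small functions. The numeric severity weights, the 0.25-hour effort floor, and the choice to key duplicates on location alone are assumptions made for illustration; the skill defines the severity labels and the impact/effort criterion, not these numbers.

```typescript
type SeverityLevel = "Critical" | "High" | "Medium" | "Low";

interface AuditFinding {
  location: string; // "file:line:col"
  category: string;
  severity: SeverityLevel;
  effortHours: number;
}

// Assumed weights used as a stand-in for "impact".
const IMPACT: Record<SeverityLevel, number> = { Critical: 8, High: 4, Medium: 2, Low: 1 };

// Keep one finding per location; the first occurrence wins.
function dedupe(findings: AuditFinding[]): AuditFinding[] {
  const seen = new Set<string>();
  return findings.filter((f) => {
    if (seen.has(f.location)) return false;
    seen.add(f.location);
    return true;
  });
}

// Rank by impact/effort ratio, highest first.
function topPriorities(findings: AuditFinding[], count = 5): AuditFinding[] {
  const ratio = (f: AuditFinding): number => IMPACT[f.severity] / Math.max(f.effortHours, 0.25);
  return dedupe(findings)
    .sort((a, b) => ratio(b) - ratio(a))
    .slice(0, count);
}
```

Keying on location is a simplification: two distinct issues on the same line would collapse into one, so a real pass should compare descriptions before dropping a row.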

Severity Rubric

Critical = actively causing bugs or security holes
High     = will cause problems under normal operation; blocks changes
Medium   = reduces maintainability; inconsistent; violates team conventions
Low      = cosmetic; would be nice to fix when nearby

Quick Checks Before Finishing

Every concrete finding has file:line:col citation
No generic claims without evidence
"Looks Bad But Is Fine" section explains at least 2-3 patterns
Top 5 priorities ranked by impact/effort
Quick wins are things that can be fixed in <30 minutes each