name: code-audit
description: "Use when the user says '/code-audit', 'audit this code', 'check code quality', 'find duplication', 'find dead code', 'code cleanup', 'technical debt audit', 'code review module', or wants a structured noetic investigation of code quality. This skill runs external analysis tools and structured manual review, producing Empirica artifacts (findings, goals, decisions) that any praxic agent can execute."
version: 1.0.0
Code Audit: Noetic Investigation Skill
Investigate code quality. Produce structured remediation plans.
This skill is purely noetic — it discovers, triages, and plans. It does NOT make changes. The output (findings, goals, decisions) feeds directly into the Empirica workflow for any praxic agent to pick up and execute.
How to Run
/code-audit # Audit entire project
/code-audit --target src/handlers/ # Audit specific directory
/code-audit --target src/auth.py # Audit specific file
/code-audit --focus duplication # Focus on one dimension
Phase 1: Scope
Determine what to audit. If the user specified a target, use it. Otherwise, audit the current project root.
# Determine scope
TARGET="${1:-.}" # Default to current directory
# Quick size assessment
find "$TARGET" -name "*.py" | wc -l # File count
find "$TARGET" -name "*.py" -exec cat {} + | wc -l # Total LOC (a single count, even when find batches the files)
Log the audit scope:
empirica finding-log --finding "Audit scope: $TARGET — N files, N LOC" --impact 0.1
Phase 2: Automated Tool Passes
Run external tools and parse results into findings. Each tool covers a different dimension. Skip any tool that isn't installed — the audit still works without it.
2a. Linting & Style (ruff)
ruff check "$TARGET" --statistics --output-format json 2>/dev/null
Parse results:
- Group by rule category (import order, unused imports, complexity, etc.)
- Log aggregates as findings, not individual violations
- Impact scoring: unused imports = 0.2, complexity violations = 0.5, security = 0.8
# Example: aggregate finding
empirica finding-log --finding "ruff: 23 unused imports across 8 files (F401)" --impact 0.2
empirica finding-log --finding "ruff: 5 functions exceed complexity limit (C901)" --impact 0.5
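The aggregation step above can be sketched in Python. This is a minimal illustration rather than part of the skill: it assumes ruff's statistics JSON is an array of objects carrying `code` and `count` fields (verify against your installed ruff version), and `summarize_ruff`, `is_security_rule`, and the fallback impact of 0.3 are hypothetical.

```python
import json

# Impacts from the scoring guidance above; the fallback value is an assumption.
IMPACT_BY_CODE = {"F401": 0.2, "C901": 0.5}
SECURITY_IMPACT = 0.8
FALLBACK_IMPACT = 0.3


def is_security_rule(code):
    # Assumes flake8-bandit style codes: "S" followed only by digits (e.g. S608).
    # This deliberately excludes other S-prefixed families such as SIM102.
    return code.startswith("S") and code[1:].isdigit()


def summarize_ruff(raw_json):
    """Return (finding text, impact) pairs, one per rule code, largest count first."""
    entries = json.loads(raw_json)
    findings = []
    for entry in sorted(entries, key=lambda e: e["count"], reverse=True):
        code = entry.get("code") or "unknown"
        if is_security_rule(code):
            impact = SECURITY_IMPACT
        else:
            impact = IMPACT_BY_CODE.get(code, FALLBACK_IMPACT)
        findings.append((f"ruff: {entry['count']} violations of {code}", impact))
    return findings
```

Each returned pair maps onto one `empirica finding-log` call, which keeps the finding count proportional to the number of rule codes instead of the number of violations.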
2b. Dead Code (vulture)
vulture "$TARGET" --min-confidence 80
Parse results:
- Filter out false positives (dynamically called functions, CLI entry points)
- Log confirmed dead code as findings
- Log uncertain cases as unknowns
empirica finding-log --finding "vulture: 12 unused functions detected (80%+ confidence)" --impact 0.4
empirica unknown-log --unknown "vulture flagged handle_legacy_sync() — verify if called dynamically"
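As a rough sketch of the findings/unknowns split, the helper below parses vulture's plain-text report lines of the form `path:line: unused <kind> '<name>' (NN% confidence)`. The function name `triage_vulture` and the cutoff of 90% are assumptions for illustration. Confidence alone does not rule out dynamic calls, so the "confirmed" bucket still needs the manual false-positive check described above.

```python
import re

# Matches lines such as: src/app.py:12: unused function 'foo' (80% confidence)
LINE_RE = re.compile(
    r"^(?P<path>.+?):(?P<line>\d+): unused (?P<kind>[\w ]+?) "
    r"'(?P<name>[^']+)' \((?P<conf>\d+)% confidence"
)


def triage_vulture(output, confirm_at=90):
    """Split report lines into (findings, unknowns) by confidence.

    confirm_at is an assumed cutoff, not a vulture default.
    Lines that do not match the pattern are skipped.
    """
    findings, unknowns = [], []
    for line in output.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue
        item = (
            match["path"],
            int(match["line"]),
            match["kind"],
            match["name"],
            int(match["conf"]),
        )
        if item[4] >= confirm_at:
            findings.append(item)
        else:
            unknowns.append(item)
    return findings, unknowns
```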
2c. Complexity Metrics (radon)
radon cc "$TARGET" -s -a --min C # Only show functions rated C or worse
radon mi "$TARGET" -s --min B # Maintainability index
Parse results:
- Functions rated C (11-20) → finding with impact 0.4
- Functions rated D (21-30) → finding with impact 0.6
- Functions rated E (31-40) or F (41+) → finding with impact 0.8
- Log the worst offenders by name and file
empirica finding-log --finding "radon: process_vectors() in workflow_commands.py has CC=27 (D)" --impact 0.6
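The rank-to-impact mapping can be written down directly. The sketch below uses radon's documented rank bands (A 1-5, B 6-10, C 11-20, D 21-30, E 31-40, F 41+); the helper names `radon_rank` and `cc_impact` are hypothetical, and ranks A and B deliberately produce no finding.

```python
# Upper bound of each rank band, in ascending order; anything above 40 is F.
RANK_BOUNDS = ((5, "A"), (10, "B"), (20, "C"), (30, "D"), (40, "E"))

# Impacts from the list above; everything at CC 31 or higher is scored 0.8.
IMPACT_BY_RANK = {"C": 0.4, "D": 0.6, "E": 0.8, "F": 0.8}


def radon_rank(cc):
    """Map a cyclomatic complexity score to its letter rank."""
    for upper, rank in RANK_BOUNDS:
        if cc <= upper:
            return rank
    return "F"


def cc_impact(cc):
    """Return the finding impact for a score, or None when no finding is needed."""
    return IMPACT_BY_RANK.get(radon_rank(cc))
```

For the `process_vectors()` example, `cc_impact(27)` yields 0.6.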
2d. Type Errors (pyright)
pyright "$TARGET" --outputjson 2>/dev/null
Parse results:
- Count errors by category (missing types, incompatible types, possibly unbound)
- Log aggregates, not individual errors
empirica finding-log --finding "pyright: 45 type errors — 20 missing annotations, 15 incompatible, 10 unbound" --impact 0.3
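Counting errors by category might look like the sketch below. It assumes the pyright JSON report has a top-level `generalDiagnostics` array whose entries carry `severity` and an optional `rule` field; treat those field names as assumptions to confirm against your pyright version. `count_pyright_errors` is a hypothetical helper.

```python
import json
from collections import Counter


def count_pyright_errors(raw_json):
    """Count error-severity diagnostics per rule name.

    Diagnostics without a rule are grouped under "unclassified".
    """
    report = json.loads(raw_json)
    counts = Counter()
    for diagnostic in report.get("generalDiagnostics", []):
        if diagnostic.get("severity") == "error":
            counts[diagnostic.get("rule", "unclassified")] += 1
    return counts
```

`counts.most_common()` then gives the categories in descending order, ready to be folded into a single aggregate finding.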
Phase 3: Structural Review (Manual)
These dimensions require AI judgment — tools can't catch them.
3a. File Size Analysis
find "$TARGET" -name "*.py" -exec wc -l {} + | grep -v ' total$' | sort -rn | head -20
Thresholds:
- >2000 LOC → finding with impact 0.5 ("needs splitting")
- >1000 LOC → finding with impact 0.3 ("consider splitting")
- >500 LOC → note only
empirica finding-log --finding "project_commands.py: 4214 LOC, contains 17 unrelated handlers, needs splitting" --impact 0.5
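The thresholds can be applied mechanically once line counts are known. In this sketch `size_findings` is a hypothetical helper that takes a mapping of path to line count; it only encodes the three thresholds listed above and leaves the judgment about what to split to the reviewer.

```python
def size_findings(loc_by_file):
    """Map {path: line count} to (path, loc, impact, label), largest file first.

    Files at or below 500 lines are omitted; impact is None for note-only files.
    """
    results = []
    ordered = sorted(loc_by_file.items(), key=lambda kv: kv[1], reverse=True)
    for path, loc in ordered:
        if loc > 2000:
            results.append((path, loc, 0.5, "needs splitting"))
        elif loc > 1000:
            results.append((path, loc, 0.3, "consider splitting"))
        elif loc > 500:
            results.append((path, loc, None, "note only"))
    return results
```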
3b. Duplication Detection
Search for functions defined in multiple places:
# Find duplicated function definitions
grep -rhoE "^def [A-Za-z_][A-Za-z0-9_]*" --include="*.py" "$TARGET" | sort | uniq -c | sort -rn | head -20
# Find copy-pasted patterns (same function name in multiple files)
grep -rn "def get_instance_id\|def find_project_root\|def resolve_project" "$TARGET"
For each duplicate: Read both copies. Assess whether they're identical, similar, or intentionally different. Log accordingly:
# Identical copies
empirica finding-log --finding "get_instance_id() duplicated identically in 6 files" --impact 0.6
# Intentionally different (record WHY)
empirica decision-log --choice "Keep separate find_project_root() in hooks" \
--rationale "Hooks must be standalone — can't import from package" \
--reversibility exploratory
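Before reading both copies by hand, a quick structural comparison can separate exact duplicates from copies that have drifted. The sketch below compares the AST of a named top-level function in two source strings, so differences in comments, blank lines, and formatting are ignored. `function_shape` and `classify_duplicate` are hypothetical names, and deciding whether "different" copies are merely similar or intentionally different still requires judgment.

```python
import ast


def function_shape(source, func_name):
    """Return an AST dump of the named top-level function, or None if absent.

    ast.dump omits line numbers by default, so layout changes do not matter.
    """
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef) and node.name == func_name:
            return ast.dump(node)
    return None


def classify_duplicate(source_a, source_b, func_name):
    """Return "identical", "different", or "missing" for the two copies."""
    shape_a = function_shape(source_a, func_name)
    shape_b = function_shape(source_b, func_name)
    if shape_a is None or shape_b is None:
        return "missing"
    return "identical" if shape_a == shape_b else "different"
```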
3c. Module Boundary Assessment
Read the import graph. Check for:
- Circular imports — A imports B, B imports A
- Layer violations — CLI importing from data, data importing from core
- God modules — single file that everything imports
# Check for late imports (circular dependency workaround)
grep -rnE "^[[:space:]]+(import [A-Za-z_]|from [A-Za-z_.]+ import )" --include="*.py" "$TARGET" | head -20
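Circular imports can also be found from the import graph itself. The following is a minimal sketch, assuming module sources are supplied as a mapping from module name to source text; `imported_modules` and `find_cycle` are hypothetical helpers. Relative imports are ignored, and imports nested inside functions are included because the whole tree is walked.

```python
import ast


def imported_modules(source):
    """Collect absolute module names imported anywhere in the source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module)
    return names


def find_cycle(sources):
    """Return one import cycle as a list of module names, or None.

    Only imports between the modules given in `sources` are considered.
    """
    known = set(sources)
    graph = {mod: sorted(imported_modules(src) & known) for mod, src in sources.items()}
    state = {}  # module -> "visiting" or "done"

    def visit(mod, path):
        state[mod] = "visiting"
        for dep in graph[mod]:
            if state.get(dep) == "visiting":
                # Back edge: the cycle runs from dep's position to here.
                return path[path.index(dep):] + [dep]
            if dep not in state:
                cycle = visit(dep, path + [dep])
                if cycle:
                    return cycle
        state[mod] = "done"
        return None

    for mod in sorted(graph):
        if mod not in state:
            cycle = visit(mod, [mod])
            if cycle:
                return cycle
    return None
```

A result such as `["a", "b", "a"]` reads as "a imports b, b imports a".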
3d. Naming & Convention Consistency
Spot check:
- Mixed naming styles (snake_case vs camelCase)
- Inconsistent patterns (some files use handle_X_command, others use X_handler)
- Magic strings/numbers without constants
3e. Error Handling
# Find broad handlers that swallow errors ("pass" is usually on the following line)
grep -rn -A1 "except Exception" "$TARGET" | grep -B1 -w "pass"
# Find bare except clauses
grep -rn "except:" "$TARGET"
empirica finding-log --finding "12 bare 'except Exception: pass' handlers — errors silently swallowed" --impact 0.5
Phase 4: Triage
Review all findings. Assign categories and prioritize.
Categories:
| Tag | Meaning |
|---|---|
| duplication | Same code in multiple places |
| complexity | Function/file too complex or too large |
| dead-code | Unused code that should be removed |
| consistency | Naming, patterns, conventions don't match |
| architecture | Module boundaries, layering, coupling |
| reliability | Error handling, edge cases, silent failures |
| security | Input validation, injection risks |
Impact scoring guide:
| Score | Meaning |
|---|---|
| 0.1-0.3 | Cosmetic — style, minor inconsistency |
| 0.4-0.6 | Structural — duplication, complexity, dead code |
| 0.7-0.9 | Critical — bug source, security risk, architecture violation |
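The scoring guide translates into a small prioritization helper. This is a sketch with hypothetical names (`impact_tier`, `triage`); because the table leaves gaps between bands (for example 0.35), it assumes the lower bound of each band acts as the cutoff.

```python
def impact_tier(score):
    """Map an impact score in [0, 1] to the tier names from the guide."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"impact out of range: {score}")
    if score >= 0.7:
        return "critical"
    if score >= 0.4:
        return "structural"
    return "cosmetic"


def triage(findings):
    """Sort (text, impact) pairs by impact, highest first, and attach the tier."""
    ordered = sorted(findings, key=lambda finding: finding[1], reverse=True)
    return [(text, impact, impact_tier(impact)) for text, impact in ordered]
```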
Phase 5: Create Remediation Goals
Group related findings into actionable goals. Each goal should be independently executable by a praxic agent.
Goal template:
empirica goals-create --objective "Split project_commands.py into focused modules (artifact_log, workspace, ecosystem)"
Good goals have:
- Clear scope — which files, what changes
- Success criteria — how to verify it's done
- Linked findings — why this goal exists (reference finding IDs)
- Dependencies — what must happen first
Goal sizing:
- Each goal = 1 transaction of praxic work
- If a goal would take 3+ transactions, split it further
- If two goals always need to happen together, merge them
Phase 6: Record Decisions
Not every finding needs a fix. Some are acceptable trade-offs. Record these explicitly:
# Accept a trade-off
empirica decision-log --choice "Accept hook utility duplication" \
--rationale "Hooks must be standalone — extracting to shared module is the fix, not eliminating all copies" \
--reversibility exploratory
# Defer work
empirica decision-log --choice "Defer type hint completion to separate effort" \
--rationale "Functional correctness over type coverage for now" \
--reversibility easily_reversible
Why this matters: Enterprise auditors want to see that decisions were conscious, not accidental. A documented "we chose to keep this" is better than unexplained duplication.
Phase 7: Summarize
Present the audit results to the user:
- Scope — what was audited
- Tool results — aggregate numbers from ruff, vulture, radon, pyright
- Top findings — sorted by impact
- Goals created — the remediation plan
- Decisions recorded — what was accepted/deferred and why
- Unknowns remaining — what needs further investigation
# Show the full picture
empirica goals-list
Output Contract
After /code-audit completes, the following artifacts exist in the Empirica DB:
| Artifact Type | Purpose |
|---|---|
| Findings (impact-scored) | What was discovered — issues, patterns, metrics |
| Unknowns | What needs further investigation before acting |
| Decisions | What was consciously accepted, deferred, or prioritized |
| Goals | Remediation work packages — ready for any praxic agent |
Any agent can then:
empirica goals-list # See the work
empirica preflight-submit - << 'EOF' # Start a transaction
# ... pick up a goal and execute it
Re-Running
The skill is idempotent. Running /code-audit again after remediation shows:
- Which findings are resolved (goals completed)
- Which persist (goals still open)
- New findings from changes (regression detection)
Compare audit runs over time to track code quality trajectory.
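Comparing two runs reduces to set arithmetic, provided each finding has a stable key. The sketch below assumes such keys exist (for example a rule code plus file path, which is an illustrative choice and not something the Empirica CLI is known to provide); `compare_runs` is a hypothetical helper.

```python
def compare_runs(previous, current):
    """Classify finding keys from two audit runs.

    previous, current: sets of stable finding keys.
    """
    return {
        "resolved": previous - current,    # present before, gone now
        "persisting": previous & current,  # present in both runs
        "new": current - previous,         # introduced since the last run
    }
```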