code-audit - SKILL.md Agent Skill

name: code-audit description: "Use when the user says '/code-audit', 'audit this code', 'check code quality', 'find duplication', 'find dead code', 'code cleanup', 'technical debt audit', 'code review module', or wants a structured noetic investigation of code quality. This skill runs external analysis tools and structured manual review, producing Empirica artifacts (findings, goals, decisions) that any praxic agent can execute." version: 1.0.0

Code Audit: Noetic Investigation Skill

Investigate code quality. Produce structured remediation plans.

This skill is purely noetic — it discovers, triages, and plans. It does NOT make changes. The output (findings, goals, decisions) feeds directly into the Empirica workflow for any praxic agent to pick up and execute.

How to Run

/code-audit                              # Audit entire project
/code-audit --target src/handlers/       # Audit specific directory
/code-audit --target src/auth.py         # Audit specific file
/code-audit --focus duplication          # Focus on one dimension

Phase 1: Scope

Determine what to audit. If the user specified a target, use it. Otherwise, audit the current project root.

# Determine scope
TARGET="${1:-.}"  # Default to current directory

# Quick size assessment
find "$TARGET" -name "*.py" | wc -l          # File count
find "$TARGET" -name "*.py" -exec wc -l {} + | tail -1  # Total LOC

Log the audit scope:

empirica finding-log --finding "Audit scope: $TARGET — N files, N LOC" --impact 0.1

Phase 2: Automated Tool Passes

Run external tools and parse results into findings. Each tool covers a different dimension. Skip any tool that isn't installed — the audit still works without it.

2a. Linting & Style (ruff)

ruff check "$TARGET" --statistics --output-format json 2>/dev/null

Parse results:

Group by rule category (import order, unused imports, complexity, etc.)
Log aggregates as findings, not individual violations
Impact scoring: unused imports = 0.2, complexity violations = 0.5, security = 0.8

# Example: aggregate finding
empirica finding-log --finding "ruff: 23 unused imports across 8 files (F401)" --impact 0.2
empirica finding-log --finding "ruff: 5 functions exceed complexity limit (C901)" --impact 0.5

2b. Dead Code (vulture)

vulture "$TARGET" --min-confidence 80

Parse results:

Filter out false positives (dynamically called functions, CLI entry points)
Log confirmed dead code as findings
Log uncertain cases as unknowns

empirica finding-log --finding "vulture: 12 unused functions detected (80%+ confidence)" --impact 0.4
empirica unknown-log --unknown "vulture flagged handle_legacy_sync() — verify if called dynamically"

2c. Complexity Metrics (radon)

radon cc "$TARGET" -s -a --min C    # Only show functions rated C or worse
radon mi "$TARGET" -s --min B       # Maintainability index

Parse results:

Functions rated C (11-20) → finding with impact 0.4
Functions rated D (21-30) → finding with impact 0.6
Functions rated F (31+) → finding with impact 0.8
Log the worst offenders by name and file

empirica finding-log --finding "radon: process_vectors() in workflow_commands.py has CC=27 (D)" --impact 0.6

2d. Type Errors (pyright)

pyright "$TARGET" --outputjson 2>/dev/null

Parse results:

Count errors by category (missing types, incompatible types, possibly unbound)
Log aggregates, not individual errors

empirica finding-log --finding "pyright: 45 type errors — 20 missing annotations, 15 incompatible, 10 unbound" --impact 0.3

Phase 3: Structural Review (Manual)

These dimensions require AI judgment — tools can't catch them.

3a. File Size Analysis

find "$TARGET" -name "*.py" -exec wc -l {} + | sort -rn | head -20

Thresholds:

>2000 LOC → finding with impact 0.5 ("needs splitting")
>1000 LOC → finding with impact 0.3 ("consider splitting")
>500 LOC → note only

empirica finding-log --finding "project_commands.py: 4214 LOC — contains 17 unrelated handlers, needs splitting" --impact 0.6

3b. Duplication Detection

Search for functions defined in multiple places:

# Find duplicated function definitions
grep -rn "^def " "$TARGET" | awk -F: '{print $NF}' | sort | uniq -c | sort -rn | head -20

# Find copy-pasted patterns (same function name in multiple files)
grep -rn "def get_instance_id\|def find_project_root\|def resolve_project" "$TARGET"

For each duplicate: Read both copies. Assess whether they're identical, similar, or intentionally different. Log accordingly:

# Identical copies
empirica finding-log --finding "get_instance_id() duplicated identically in 6 files" --impact 0.6

# Intentionally different (record WHY)
empirica decision-log --choice "Keep separate find_project_root() in hooks" \
  --rationale "Hooks must be standalone — can't import from package" \
  --reversibility exploratory

3c. Module Boundary Assessment

Read the import graph. Check for:

Circular imports — A imports B, B imports A
Layer violations — CLI importing from data, data importing from core
God modules — single file that everything imports

# Check for late imports (circular dependency workaround)
grep -rn "import " "$TARGET" | grep "def \|if " | head -20

3d. Naming & Convention Consistency

Spot check:

Mixed naming styles (snake_case vs camelCase)
Inconsistent patterns (some files use handle_X_command, others use X_handler)
Magic strings/numbers without constants

3e. Error Handling

# Find bare exception handlers
grep -rn "except Exception" "$TARGET" | grep "pass"
grep -rn "except:" "$TARGET"

empirica finding-log --finding "12 bare 'except Exception: pass' handlers — errors silently swallowed" --impact 0.5

Phase 4: Triage

Review all findings. Assign categories and prioritize.

Categories:

Tag	Meaning
`duplication`	Same code in multiple places
`complexity`	Function/file too complex or too large
`dead-code`	Unused code that should be removed
`consistency`	Naming, patterns, conventions don't match
`architecture`	Module boundaries, layering, coupling
`reliability`	Error handling, edge cases, silent failures
`security`	Input validation, injection risks

Impact scoring guide:

Score	Meaning
0.1-0.3	Cosmetic — style, minor inconsistency
0.4-0.6	Structural — duplication, complexity, dead code
0.7-0.9	Critical — bug source, security risk, architecture violation

Phase 5: Create Remediation Goals

Group related findings into actionable goals. Each goal should be independently executable by a praxic agent.

Goal template:

empirica goals-create --objective "Split project_commands.py into focused modules (artifact_log, workspace, ecosystem)"

Good goals have:

Clear scope — which files, what changes
Success criteria — how to verify it's done
Linked findings — why this goal exists (reference finding IDs)
Dependencies — what must happen first

Goal sizing:

Each goal = 1 transaction of praxic work
If a goal would take 3+ transactions, split it further
If two goals always need to happen together, merge them

Phase 6: Record Decisions

Not every finding needs a fix. Some are acceptable trade-offs. Record these explicitly:

# Accept a trade-off
empirica decision-log --choice "Accept hook utility duplication" \
  --rationale "Hooks must be standalone — extracting to shared module is the fix, not eliminating all copies" \
  --reversibility exploratory

# Defer work
empirica decision-log --choice "Defer type hint completion to separate effort" \
  --rationale "Functional correctness over type coverage for now" \
  --reversibility easily_reversible

Why this matters: Enterprise auditors want to see that decisions were conscious, not accidental. A documented "we chose to keep this" is better than unexplained duplication.

Phase 7: Summarize

Present the audit results to the user:

Scope — what was audited
Tool results — aggregate numbers from ruff, vulture, radon, pyright
Top findings — sorted by impact
Goals created — the remediation plan
Decisions recorded — what was accepted/deferred and why
Unknowns remaining — what needs further investigation

# Show the full picture
empirica goals-list

Output Contract

After /code-audit completes, the following artifacts exist in the Empirica DB:

Artifact Type	Purpose
Findings (impact-scored)	What was discovered — issues, patterns, metrics
Unknowns	What needs further investigation before acting
Decisions	What was consciously accepted, deferred, or prioritized
Goals	Remediation work packages — ready for any praxic agent

Any agent can then:

empirica goals-list                    # See the work
empirica preflight-submit - << 'EOF'   # Start a transaction
# ... pick up a goal and execute it

Re-Running

The skill is idempotent. Running /code-audit again after remediation shows:

Which findings are resolved (goals completed)
Which persist (goals still open)
New findings from changes (regression detection)

Compare audit runs over time to track code quality trajectory.