code-docs-align - SKILL.md Agent Skill

name: code-docs-align description: "Use when the user says '/code-docs-align', 'check if docs match code', 'verify docstrings', 'find stale comments', 'audit TODOs', 'check ref-doc accuracy', 'documentation accuracy', or wants to verify that documentation, docstrings, comments, and ref-docs actually reflect the current state of the code. This skill bridges /code-audit (code quality) and docs-assess (doc coverage) by checking ACCURACY — do the docs match what the code actually does?" version: 1.0.0

Code-Docs Alignment: Documentation Accuracy Investigation

Verify that documentation matches code. Find stale, misleading, or phantom docs.

This skill is purely noetic — it discovers mismatches between documentation and code. It does NOT fix anything. The output (findings, goals, unknowns) feeds into the Empirica workflow for praxic remediation.

Why this matters: For AI-based workflows and enterprise evaluation, stale documentation is worse than missing documentation — it actively misleads. /code-audit checks code quality. docs-assess checks doc coverage. This skill checks the gap: do the docs match the code?

How to Run

/code-docs-align                              # Check entire project
/code-docs-align --target src/handlers/       # Check specific directory
/code-docs-align --focus docstrings           # Focus on one dimension
/code-docs-align --focus todos                # Focus on TODO/FIXME audit
/code-docs-align --focus ref-docs             # Focus on ref-doc accuracy

Phase 0: PREFLIGHT

Open an epistemic transaction before any investigation. Required by Sentinel.

empirica preflight-submit - << 'EOF'
{
  "vectors": {
    "know": 0.2, "do": 0.0, "context": 0.3,
    "clarity": 0.2, "coherence": 0.3, "signal": 0.2, "density": 0.1,
    "state": 0.1, "change": 0.0, "completion": 0.0, "impact": 0.0,
    "engagement": 0.8, "uncertainty": 0.7
  },
  "current_phase": "noetic",
  "notes": "Starting code-docs-align investigation"
}
EOF

Then gate through CHECK:

empirica check-submit - << 'EOF'
{
  "vectors": {
    "know": 0.2, "do": 0.0, "context": 0.3,
    "clarity": 0.3, "coherence": 0.3, "signal": 0.2, "density": 0.1,
    "state": 0.1, "change": 0.0, "completion": 0.0, "impact": 0.0,
    "engagement": 0.8, "uncertainty": 0.6
  },
  "current_phase": "noetic"
}
EOF

Phase 1: Scope

Determine what to check. If the user specified a target, use it. Otherwise, check the current project root.

# Determine scope
TARGET="${1:-.}"

# File count and LOC
find "$TARGET" -name "*.py" | wc -l
find "$TARGET" -name "*.py" -exec wc -l {} + | tail -1

# Prioritize recently-changed files (most likely to have stale docs)
git log --name-only --format="" HEAD~20..HEAD | sort -u | grep '\.py$'

# Collect registered ref-docs
empirica docs-assess --output json 2>/dev/null | python3 -c "import sys,json; d=json.load(sys.stdin); [print(r.get('path','')) for r in d.get('reference_docs',[])]"

Log the scope:

empirica finding-log --finding "Docs-align scope: $TARGET — N files, N LOC, N recently changed, N ref-docs registered" --impact 0.1

Phase 2: Docstring Accuracy (AST-based)

For each high-priority file (recently changed or large), read and compare signatures against docstrings. This requires AI judgment — parse the code, read the docstring, check alignment.

What to Check

For each function/method with a docstring:

Parameters: Compare function signature params against docstring Args section
- Phantom param: documented but not in signature → finding (impact 0.6)
- Missing param: in signature but not documented → finding (impact 0.3)
- Wrong type: docstring type doesn't match annotation → finding (impact 0.4)
Returns: Compare return annotation against docstring Returns section
- Wrong return type/description → finding (impact 0.6)
- Return documented but function returns None → finding (impact 0.6)
Raises: Compare actual raise statements against docstring Raises section
- Stale raises (documented but never raised) → finding (impact 0.5)
- Undocumented raises (raised but not in docstring) → finding (impact 0.3)
Behavioral claims: Does the docstring describe what the function ACTUALLY does?
- "Validates X" but no validation code → finding (impact 0.7)
- "Returns None if not found" but actually raises → finding (impact 0.7)

Impact Scoring

Mismatch Type	Impact	Rationale
Phantom param (in docstring, not in code)	0.6	Actively misleading
Missing param (in code, not in docstring)	0.3	Incomplete but not misleading
Wrong return description	0.6	Actively misleading
Stale raises clause	0.5	Moderately misleading
Behavioral mismatch	0.7	Dangerously misleading

Logging

empirica finding-log --finding "Phantom param 'timeout' in docstring of connect() at db/client.py:45 — param was removed in batch-3 cleanup" --impact 0.6
empirica unknown-log --unknown "process_batch() docstring mentions 'retry_count' param — unclear if this is intentional kwargs passthrough or stale"

Phase 3: Inline Comment Staleness

Scan for comments that reference patterns, functions, or behaviors that no longer exist.

What to Check

References to removed code:
- Comments mentioning function/class names that no longer exist in the codebase
- "See also X" where X was deleted
- References to files that were moved or removed
Contradicted behavior:
- "This uses bare except for safety" in files where bare excepts were already fixed
- "Temporary workaround" for code that's been in place >6 months
- "TODO: remove after migration" where migration is complete
Stale section headers:
- # --- Legacy handlers --- where the legacy code was removed
- Module-level docstrings describing features that were split elsewhere

Detection Strategy

# Find all comments
grep -rn "^[[:space:]]*#" "$TARGET" --include="*.py" | head -50

# Cross-reference: do mentioned symbols still exist?
# For each comment mentioning a specific function/class name,
# check if that symbol still exists in the codebase

Logging

empirica finding-log --finding "Stale comment at session_resolver.py:42 references get_identity_dir() — function was removed in batch-4" --impact 0.4
empirica finding-log --finding "Comment at workflow_commands.py:15 says 'bare except for safety' — already fixed to except Exception: in batch-5" --impact 0.3

Phase 4: TODO/FIXME Audit

Audit every TODO and FIXME in the codebase. Each one is either stale (work done), active (untracked work), or deferred (consciously parked).

What to Check

# Collect all TODOs and FIXMEs
grep -rn "TODO\|FIXME\|HACK\|XXX" "$TARGET" --include="*.py"

For each:

Is the described work already done? → stale TODO → finding (impact 0.4)
Is it untracked work that should be a goal? → active TODO → unknown
Was it consciously deferred? → check decision-log → skip if already recorded
Is it a stub placeholder? → finding (impact 0.2) if the stub has been fleshed out

Logging

# Stale TODO (feature is built)
empirica finding-log --finding "Stale TODO at memory_gap_detector.py:23 — feature was implemented in batch-2" --impact 0.4

# Active TODO (untracked work)
empirica unknown-log --unknown "TODO at firewall.py:89 — 'implement rate limiting' — is this planned or deferred?"

# Already covered by decision
# (skip — no artifact needed)

Phase 5: Ref-Doc Alignment

For each registered reference document, check whether the symbols (functions, classes, file paths) it mentions still exist in the codebase.

What to Check

# Get registered ref-docs
empirica docs-assess --output json

For each ref-doc:

Read the document
Extract mentioned file paths, function names, class names
Verify each still exists in the codebase
Check if code examples still work (syntax, imports)
Flag dead references

Impact Scoring

Issue	Impact	Rationale
Dead file path reference	0.7	Document points to nothing
Dead function/class reference	0.6	Symbol was renamed/removed
Stale code example	0.5	Example won't work if copied
Outdated architectural claim	0.7	Misleads about system structure

Logging

empirica finding-log --finding "Ref-doc 'architecture.md' references empirica/core/metrics.py — file was removed" --impact 0.7
empirica finding-log --finding "Ref-doc 'api-guide.md' shows import from empirica.utils.helpers — module renamed to empirica.utils.session_resolver" --impact 0.6

Phase 6: Plugin Meta-Check

Check SKILL.md files, CLAUDE.md, and plugin configuration for references to CLI commands, flags, or workflows that may have changed.

What to Check

CLI commands in SKILL.md files:
```
empirica --help   # Get current command list
```
Cross-reference against commands mentioned in all SKILL.md files.
Flags in CLAUDE.md: Check that CLI flags mentioned in the system prompt still exist.
Hook references: Verify that hook scripts reference valid tools and events.

Logging

empirica finding-log --finding "SKILL.md for 'empirica' references 'empirica status' command — actual command is 'empirica project-status'" --impact 0.5
empirica unknown-log --unknown "CLAUDE.md references --type flag on project-search — verify this flag still exists"

Phase 7: Triage + Goal Creation

Categorize all findings and group into remediation goals.

Finding Tags

Tag	Meaning
`phantom-param`	Documented param doesn't exist in code
`missing-param`	Code param missing from documentation
`stale-todo`	TODO describes work that's already done
`dead-ref-doc`	Ref-doc references code that doesn't exist
`stale-comment`	Comment describes removed/changed behavior
`wrong-raises`	Docstring raises section doesn't match code
`meta-drift`	SKILL.md/CLAUDE.md references stale commands
`behavioral-mismatch`	Docstring claims don't match actual behavior

Goal Creation

Group related findings into actionable goals:

# Example goals
empirica goals-create --objective "Fix phantom and missing params in docstrings across cli/command_handlers/"
empirica goals-create --objective "Remove 8 stale TODOs for completed features"
empirica goals-create --objective "Update ref-doc architecture.md — 5 dead symbol references"
empirica goals-create --objective "Clean stale inline comments referencing removed functions"

Goal sizing: Each goal = 1 praxic transaction. If a goal spans 3+ files in different domains, split it. If two goals always need to happen together, merge them.

Phase 8: Summarize

Present the alignment audit results:

Scope — files checked, ref-docs checked, TODOs audited
Docstring accuracy — phantom params, missing params, wrong returns/raises
Comment staleness — stale comments found and categorized
TODO status — stale vs active vs deferred
Ref-doc health — dead references, outdated examples
Meta accuracy — SKILL.md/CLAUDE.md drift
Goals created — remediation plan
Unknowns remaining — ambiguous cases needing human judgment

# Show the full picture
empirica goals-list

Output Contract

After /code-docs-align completes, the following artifacts exist in the Empirica DB:

Artifact Type	Purpose
Findings (impact-scored)	Documentation-code mismatches, stale content
Unknowns	Ambiguous cases requiring human judgment
Decisions	Conscious choices to keep certain doc patterns
Goals	Remediation work packages for praxic agents

Key Design Principle

This skill uses AI judgment, not just pattern matching. A phantom param might be intentional (kwargs passthrough omitted from docs). A stale TODO might be a conscious deferral. A behavioral claim might be approximately correct.

When uncertain: log an unknown, not a finding. False positives erode trust in the audit results. A well-calibrated unknown is more valuable than a noisy finding.

Re-Running

The skill is idempotent. Running /code-docs-align again after remediation shows:

Which mismatches are resolved (goals completed)
Which persist (goals still open)
New mismatches from recent changes (regression detection)

Compare alignment runs over time to track documentation accuracy trajectory.