---
name: code-docs-align
description: "Use when the user says '/code-docs-align', 'check if docs match code', 'verify docstrings', 'find stale comments', 'audit TODOs', 'check ref-doc accuracy', 'documentation accuracy', or wants to verify that documentation, docstrings, comments, and ref-docs actually reflect the current state of the code. This skill bridges /code-audit (code quality) and docs-assess (doc coverage) by checking ACCURACY — do the docs match what the code actually does?"
version: 1.0.0
---
# Code-Docs Alignment: Documentation Accuracy Investigation
Verify that documentation matches code. Find stale, misleading, or phantom docs.
This skill is purely noetic — it discovers mismatches between documentation and code. It does NOT fix anything. The output (findings, goals, unknowns) feeds into the Empirica workflow for praxic remediation.
Why this matters: for AI-based workflows and enterprise evaluation, stale documentation is worse than missing documentation because it actively misleads. /code-audit checks code quality and docs-assess checks doc coverage; this skill checks the gap between them: do the docs match the code?
## How to Run
```
/code-docs-align                          # Check entire project
/code-docs-align --target src/handlers/   # Check specific directory
/code-docs-align --focus docstrings       # Focus on one dimension
/code-docs-align --focus todos            # Focus on TODO/FIXME audit
/code-docs-align --focus ref-docs         # Focus on ref-doc accuracy
```
## Phase 0: PREFLIGHT
Open an epistemic transaction before any investigation. Required by Sentinel.
```bash
empirica preflight-submit - << 'EOF'
{
  "vectors": {
    "know": 0.2, "do": 0.0, "context": 0.3,
    "clarity": 0.2, "coherence": 0.3, "signal": 0.2, "density": 0.1,
    "state": 0.1, "change": 0.0, "completion": 0.0, "impact": 0.0,
    "engagement": 0.8, "uncertainty": 0.7
  },
  "current_phase": "noetic",
  "notes": "Starting code-docs-align investigation"
}
EOF
```
Then gate through CHECK:
```bash
empirica check-submit - << 'EOF'
{
  "vectors": {
    "know": 0.2, "do": 0.0, "context": 0.3,
    "clarity": 0.3, "coherence": 0.3, "signal": 0.2, "density": 0.1,
    "state": 0.1, "change": 0.0, "completion": 0.0, "impact": 0.0,
    "engagement": 0.8, "uncertainty": 0.6
  },
  "current_phase": "noetic"
}
EOF
```
## Phase 1: Scope
Determine what to check. If the user specified a target, use it. Otherwise, check the current project root.
```bash
# Determine scope
TARGET="${1:-.}"

# File count and total LOC (cat feeds a single wc, so the total is not split
# across the multiple batches that `find -exec ... +` may run)
find "$TARGET" -name "*.py" | wc -l
find "$TARGET" -name "*.py" -exec cat {} + | wc -l

# Prioritize recently-changed files within the target (most likely to have stale docs)
git log --name-only --format="" HEAD~20..HEAD -- "$TARGET" | sort -u | grep '\.py$'

# Collect registered ref-docs
empirica docs-assess --output json 2>/dev/null | python3 -c "import sys,json; d=json.load(sys.stdin); [print(r.get('path','')) for r in d.get('reference_docs',[])]"
```
Log the scope:
```bash
empirica finding-log --finding "Docs-align scope: $TARGET — N files, N LOC, N recently changed, N ref-docs registered" --impact 0.1
```
## Phase 2: Docstring Accuracy (AST-based)
For each high-priority file (recently changed or large), read and compare signatures against docstrings. This requires AI judgment — parse the code, read the docstring, check alignment.
### What to Check
For each function/method with a docstring:
Parameters: Compare function signature params against docstring Args section
- Phantom param: documented but not in signature → finding (impact 0.6)
- Missing param: in signature but not documented → finding (impact 0.3)
- Wrong type: docstring type doesn't match annotation → finding (impact 0.4)
Returns: Compare return annotation against docstring Returns section
- Wrong return type/description → finding (impact 0.6)
- Return documented but function returns None → finding (impact 0.6)
Raises: Compare actual raise statements against docstring Raises section
- Stale raises (documented but never raised) → finding (impact 0.5)
- Undocumented raises (raised but not in docstring) → finding (impact 0.3)
Behavioral claims: Does the docstring describe what the function ACTUALLY does?
- "Validates X" but no validation code → finding (impact 0.7)
- "Returns None if not found" but actually raises → finding (impact 0.7)
### Impact Scoring
| Mismatch Type | Impact | Rationale |
|---|---|---|
| Phantom param (in docstring, not in code) | 0.6 | Actively misleading |
| Missing param (in code, not in docstring) | 0.3 | Incomplete but not misleading |
| Wrong param type (docstring vs annotation) | 0.4 | Mildly misleading |
| Wrong return description | 0.6 | Actively misleading |
| Stale raises clause | 0.5 | Moderately misleading |
| Undocumented raises | 0.3 | Incomplete but not misleading |
| Behavioral mismatch | 0.7 | Dangerously misleading |
### Logging
```bash
empirica finding-log --finding "Phantom param 'timeout' in docstring of connect() at db/client.py:45 — param was removed in batch-3 cleanup" --impact 0.6
empirica unknown-log --unknown "process_batch() docstring mentions 'retry_count' param — unclear if this is intentional kwargs passthrough or stale"
```
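To keep impact scores consistent across a run, the logging step can be driven from a single lookup. This is a minimal sketch, not part of the Empirica CLI: `IMPACT`, `log_command`, and the label keys are hypothetical. It only builds argument lists for the `finding-log` and `unknown-log` commands shown above, and routes uncertain cases to an unknown rather than a finding.

```python
import shlex

# Impact values follow the Phase 2 checklist; the dictionary keys are hypothetical labels.
IMPACT = {
    "phantom-param": 0.6,
    "missing-param": 0.3,
    "wrong-type": 0.4,
    "wrong-return": 0.6,
    "stale-raises": 0.5,
    "undocumented-raises": 0.3,
    "behavioral-mismatch": 0.7,
}


def log_command(kind, message, confident=True):
    """Build the empirica argv: a scored finding when confident, otherwise an unknown."""
    if not confident:
        return ["empirica", "unknown-log", "--unknown", message]
    return ["empirica", "finding-log", "--finding", message, "--impact", str(IMPACT[kind])]


if __name__ == "__main__":
    cmd = log_command("phantom-param", "Phantom param 'timeout' in docstring of connect() at db/client.py:45")
    # Print a shell-safe version; pass `cmd` to subprocess.run(cmd, check=True) to actually log it.
    print(shlex.join(cmd))
```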
## Phase 3: Inline Comment Staleness
Scan for comments that reference patterns, functions, or behaviors that no longer exist.
### What to Check
References to removed code:
- Comments mentioning function/class names that no longer exist in the codebase
- "See also X" where X was deleted
- References to files that were moved or removed
Contradicted behavior:
- "This uses bare except for safety" in files where bare excepts were already fixed
- "Temporary workaround" for code that's been in place >6 months
- "TODO: remove after migration" where migration is complete
Stale section headers:
- `# --- Legacy handlers ---` where the legacy code was removed
- Module-level docstrings describing features that were split elsewhere
### Detection Strategy
```bash
# List full-line comments (first 50 only; trailing inline comments are not matched)
grep -rn "^[[:space:]]*#" "$TARGET" --include="*.py" | head -50

# Cross-reference: do mentioned symbols still exist?
# For each comment mentioning a specific function/class name,
# check if that symbol still exists in the codebase
```
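The cross-reference step can be sketched with the standard library. `tokenize` isolates real `#` comments, so strings that happen to contain `#` are ignored, and `ast` collects every defined function and class. The symbol regex is a heuristic assumption (it only recognises `name()` and CamelCase words), and `stale_comment_refs` is a hypothetical helper. Every hit still needs a read before it becomes a finding.

```python
import ast
import builtins
import io
import re
import tokenize

# Heuristic: `name()` or CamelCase words inside a comment count as symbol references.
SYMBOL_RE = re.compile(r"\b([a-z_][a-z0-9_]*)\(\)|\b([A-Z][a-z0-9]+(?:[A-Z][a-z0-9]+)+)\b")


def defined_symbols(sources):
    """Names of every function and class defined across the given module sources."""
    names = set(dir(builtins))  # don't flag print(), ValueError, etc.
    for source in sources:
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                names.add(node.name)
    return names


def comment_references(source):
    """Yield (line, symbol) for symbol-like tokens found in # comments only."""
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            for match in SYMBOL_RE.finditer(tok.string):
                yield tok.start[0], match.group(1) or match.group(2)


def stale_comment_refs(sources_by_path):
    """Return (path, line, symbol) for comment references with no matching definition."""
    known = defined_symbols(sources_by_path.values())
    return [
        (path, line, symbol)
        for path, source in sources_by_path.items()
        for line, symbol in comment_references(source)
        if symbol not in known
    ]
```

Definitions are collected across all files before checking, so a comment that mentions a function defined in another module is not flagged.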
### Logging
```bash
empirica finding-log --finding "Stale comment at session_resolver.py:42 references get_identity_dir() — function was removed in batch-4" --impact 0.4
empirica finding-log --finding "Comment at workflow_commands.py:15 says 'bare except for safety' — already fixed to except Exception: in batch-5" --impact 0.3
```
## Phase 4: TODO/FIXME Audit
Audit every TODO and FIXME in the codebase. Each one is either stale (work done), active (untracked work), or deferred (consciously parked).
### What to Check
```bash
# Collect all TODO/FIXME/HACK/XXX markers (-E keeps the alternation portable beyond GNU grep)
grep -rnE "TODO|FIXME|HACK|XXX" "$TARGET" --include="*.py"
```
For each:
- Is the described work already done? → stale TODO → finding (impact 0.4)
- Is it untracked work that should be a goal? → active TODO → unknown
- Was it consciously deferred? → check decision-log → skip if already recorded
- Is it a stub placeholder? → finding (impact 0.2) if the stub has been fleshed out
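Unlike a plain grep, a tokenizer-based pass skips marker words that appear inside string literals and reports exact line numbers for triage. This is a sketch under assumptions: `collect_markers` and `scan_tree` are hypothetical names, and the code only gathers candidates. Deciding stale vs active vs deferred still requires reading the code and checking the decision log.

```python
import io
import re
import tokenize
from pathlib import Path

# Same markers as the grep, but matched only inside real # comments.
MARKER_RE = re.compile(r"\b(TODO|FIXME|HACK|XXX)\b[:\s]*(.*)")


def collect_markers(source, path="<string>"):
    """Return (path, line, marker, text) for each marker found in a comment."""
    found = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            match = MARKER_RE.search(tok.string)
            if match:
                found.append((path, tok.start[0], match.group(1), match.group(2).strip()))
    return found


def scan_tree(root):
    """Collect markers from every .py file under root, in a stable order."""
    results = []
    for path in sorted(Path(root).rglob("*.py")):
        results.extend(collect_markers(path.read_text(encoding="utf-8"), str(path)))
    return results
```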
### Logging
```bash
# Stale TODO (feature is built)
empirica finding-log --finding "Stale TODO at memory_gap_detector.py:23 — feature was implemented in batch-2" --impact 0.4

# Active TODO (untracked work)
empirica unknown-log --unknown "TODO at firewall.py:89 — 'implement rate limiting' — is this planned or deferred?"

# Already covered by decision
# (skip — no artifact needed)
```
## Phase 5: Ref-Doc Alignment
For each registered reference document, check whether the symbols (functions, classes, file paths) it mentions still exist in the codebase.
### What to Check
```bash
# Get registered ref-docs
empirica docs-assess --output json
```
For each ref-doc:
- Read the document
- Extract mentioned file paths, function names, class names
- Verify each still exists in the codebase
- Check if code examples still work (syntax, imports)
- Flag dead references
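The existence and syntax checks in this list can be sketched with regular expressions and `ast.parse`. Several assumptions apply. File paths in a ref-doc are resolved relative to the repository root. Module references are recognised only from `from`/`import` text. "Still works" is reduced to "still parses", since running examples is out of scope for a noetic pass. `check_ref_doc`, `module_exists`, and the regex patterns are hypothetical and should be tuned to how your ref-docs are actually written.

```python
import ast
import re
from pathlib import Path

# Heuristic patterns; tune them to the conventions your ref-docs use.
PATH_RE = re.compile(r"[\w./-]+\.(?:py|md|toml|yaml|yml|json)\b")
MODULE_RE = re.compile(r"\b(?:from|import)\s+(?!(?:from|import)\b)([A-Za-z_][\w.]*)")
FENCE_RE = re.compile(r"```python\n(.*?)```", re.DOTALL)


def module_exists(root, dotted):
    """True if a dotted module name maps to a .py file or package under root."""
    base = Path(root, *dotted.split("."))
    return base.with_suffix(".py").is_file() or (base / "__init__.py").is_file()


def check_ref_doc(text, root, package_prefix):
    """Return (tag, reference, impact) tuples for dead refs and unparseable examples."""
    issues = []
    for ref in sorted(set(PATH_RE.findall(text))):
        if not Path(root, ref).exists():  # paths assumed relative to the repo root
            issues.append(("dead-file-ref", ref, 0.7))
    for module in sorted(set(MODULE_RE.findall(text))):
        # Only the project's own package is checked; third-party imports are skipped.
        if module.startswith(package_prefix) and not module_exists(root, module):
            issues.append(("dead-symbol-ref", module, 0.6))
    for block in FENCE_RE.findall(text):
        try:
            ast.parse(block)  # syntax only; imports are covered by MODULE_RE above
        except SyntaxError as err:
            issues.append(("stale-example", f"line {err.lineno}: {err.msg}", 0.5))
    return issues
```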
### Impact Scoring
| Issue | Impact | Rationale |
|---|---|---|
| Dead file path reference | 0.7 | Document points to nothing |
| Dead function/class reference | 0.6 | Symbol was renamed/removed |
| Stale code example | 0.5 | Example won't work if copied |
| Outdated architectural claim | 0.7 | Misleads about system structure |
### Logging
```bash
empirica finding-log --finding "Ref-doc 'architecture.md' references empirica/core/metrics.py — file was removed" --impact 0.7
empirica finding-log --finding "Ref-doc 'api-guide.md' shows import from empirica.utils.helpers — module renamed to empirica.utils.session_resolver" --impact 0.6
```
## Phase 6: Plugin Meta-Check
Check SKILL.md files, CLAUDE.md, and plugin configuration for references to CLI commands, flags, or workflows that may have changed.
### What to Check
CLI commands in SKILL.md files:

```bash
empirica --help  # Get current command list
```

Cross-reference this list against the commands mentioned in all SKILL.md files.
Flags in CLAUDE.md: Check that CLI flags mentioned in the system prompt still exist.
Hook references: Verify that hook scripts reference valid tools and events.
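The CLI cross-reference can be sketched as a set difference. The exact layout of `empirica --help` output isn't specified here, so the sketch takes the known command names as an input set; parsing the help text into that set is left as an assumption. `referenced_commands` and `meta_drift` are hypothetical helpers. They look only inside inline code spans and fenced blocks, so prose such as "the empirica workflow" is not mistaken for a command.

```python
import re
from pathlib import Path

# Only inline code spans and fenced blocks are scanned, so prose mentions are ignored.
CODE_RE = re.compile(r"```.*?```|`[^`\n]+`", re.DOTALL)
COMMAND_RE = re.compile(r"\bempirica\s+([a-z][a-z0-9-]*)")


def referenced_commands(markdown):
    """Set of `empirica <command>` names used in code spans of a markdown file."""
    commands = set()
    for span in CODE_RE.findall(markdown):
        commands.update(COMMAND_RE.findall(span))
    return commands


def meta_drift(skill_files, known_commands):
    """Return (file, command) pairs for commands missing from the known set."""
    drift = []
    for path in skill_files:
        text = Path(path).read_text(encoding="utf-8")
        for command in sorted(referenced_commands(text) - set(known_commands)):
            drift.append((str(path), command))
    return drift
```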
### Logging
```bash
empirica finding-log --finding "SKILL.md for 'empirica' references 'empirica status' command — actual command is 'empirica project-status'" --impact 0.5
empirica unknown-log --unknown "CLAUDE.md references --type flag on project-search — verify this flag still exists"
```
## Phase 7: Triage + Goal Creation
Categorize all findings and group into remediation goals.
### Finding Tags
| Tag | Meaning |
|---|---|
| `phantom-param` | Documented param doesn't exist in code |
| `missing-param` | Code param missing from documentation |
| `stale-todo` | TODO describes work that's already done |
| `dead-ref-doc` | Ref-doc references code that doesn't exist |
| `stale-comment` | Comment describes removed/changed behavior |
| `wrong-raises` | Docstring raises section doesn't match code |
| `meta-drift` | SKILL.md/CLAUDE.md references stale commands |
| `behavioral-mismatch` | Docstring claims don't match actual behavior |
### Goal Creation
Group related findings into actionable goals:
```bash
# Example goals
empirica goals-create --objective "Fix phantom and missing params in docstrings across cli/command_handlers/"
empirica goals-create --objective "Remove 8 stale TODOs for completed features"
empirica goals-create --objective "Update ref-doc architecture.md — 5 dead symbol references"
empirica goals-create --objective "Clean stale inline comments referencing removed functions"
```
Goal sizing: Each goal = 1 praxic transaction. If a goal spans 3+ files in different domains, split it. If two goals always need to happen together, merge them.
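One way to apply the sizing rule mechanically is to group findings by tag and by parent directory, treating the directory as a stand-in for a domain. That is an assumption which may not fit every repository. The sketch only drafts `empirica goals-create --objective ...` argument lists; `group_findings` and `goal_commands` are hypothetical, and the merge step ("always need to happen together") still needs judgment.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_findings(findings):
    """Group (tag, path) findings by tag and parent directory (a proxy for domain)."""
    groups = defaultdict(list)
    for tag, path in findings:
        groups[(tag, str(PurePosixPath(path).parent))].append(path)
    return groups


def goal_commands(findings):
    """Draft one goals-create argv per (tag, directory) group."""
    commands = []
    for (tag, directory), paths in sorted(group_findings(findings).items()):
        files = len(set(paths))
        objective = f"Fix {len(paths)} {tag} finding(s) across {files} file(s) in {directory}/"
        commands.append(["empirica", "goals-create", "--objective", objective])
    return commands
```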
## Phase 8: Summarize
Present the alignment audit results:
- Scope — files checked, ref-docs checked, TODOs audited
- Docstring accuracy — phantom params, missing params, wrong returns/raises
- Comment staleness — stale comments found and categorized
- TODO status — stale vs active vs deferred
- Ref-doc health — dead references, outdated examples
- Meta accuracy — SKILL.md/CLAUDE.md drift
- Goals created — remediation plan
- Unknowns remaining — ambiguous cases needing human judgment
```bash
# Show the full picture
empirica goals-list
```
## Output Contract
After /code-docs-align completes, the following artifacts exist in the Empirica DB:
| Artifact Type | Purpose |
|---|---|
| Findings (impact-scored) | Documentation-code mismatches, stale content |
| Unknowns | Ambiguous cases requiring human judgment |
| Decisions | Conscious choices to keep certain doc patterns |
| Goals | Remediation work packages for praxic agents |
## Key Design Principle
This skill uses AI judgment, not just pattern matching. A phantom param might be intentional (kwargs passthrough omitted from docs). A stale TODO might be a conscious deferral. A behavioral claim might be approximately correct.
When uncertain: log an unknown, not a finding. False positives erode trust in the audit results. A well-calibrated unknown is more valuable than a noisy finding.
## Re-Running
The skill is idempotent. Running /code-docs-align again after remediation shows:
- Which mismatches are resolved (goals completed)
- Which persist (goals still open)
- New mismatches from recent changes (regression detection)
Compare alignment runs over time to track the trajectory of documentation accuracy.
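Comparing two runs reduces to set operations once each finding has a stable identity. This sketch assumes a hypothetical key of `(tag, file, symbol)` and deliberately leaves line numbers out. Unrelated edits shift line numbers, so including them would make every persisting mismatch look new.

```python
def compare_runs(previous, current):
    """Classify finding keys from two runs as resolved, persisting, or new."""
    before, after = set(previous), set(current)
    return {
        "resolved": sorted(before - after),    # fixed since the last run
        "persisting": sorted(before & after),  # goals likely still open
        "new": sorted(after - before),         # regressions from recent changes
    }
```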