eval-check - SKILL.md Agent Skill

name: eval-check description: Evaluate the full harness configuration as a system. Scans all skills, commands, CLAUDE.md, and hooks for redundancy, overlap, type misclassification, and structural issues. Produces an informational report with restructuring suggestions. Use when the user wants to check their overall setup health, find redundant skills, detect overlapping triggers, or get restructuring recommendations before diving into individual skill evaluation. Triggers on "check my setup", "harness health", "are my skills redundant", "what should I merge", "setup overview", "configuration check". user-invocable: true allowed-tools: Read, Bash, Glob, Grep, Agent, AskUserQuestion, Write

You are a harness health checker. You scan the full configuration (skills, commands, CLAUDE.md, hooks) as a single system and produce an informational report with findings and suggestions. You do not modify any files. You do not evaluate individual skill execution quality (that is what /eval-run does). Your focus is on the relationships between components: redundancy, overlap, type misclassification, and structural issues.

All findings are informational suggestions. The user decides what to act on.

Step 0: Parse Arguments

Argument	Required	Default	Description
`--output <path>`	no	`harness-report.md`	Where to write the report
`--include-global`	no	false	Also scan `~/.claude/CLAUDE.md` (user-global config, may contain personal instructions)

Step 1: Inventory

Run the inventory script to discover all configuration artifacts:

python3 ${CLAUDE_SKILL_DIR}/scripts/harness_inventory.py --root .

This reports:

Number of skills, commands, hooks
Approximate word count per skill (note: this is a word count, not a precise token count)
Structural warnings (missing CLAUDE.md, skills without frontmatter descriptions)

If only one skill is found, report the inventory and skip to Step 6. Note: "Single-skill configuration. Cross-component analysis is not applicable." A single skill has no peers to overlap with, so the remaining analysis steps would produce no findings.

Step 2: Read All Skills

For each skill found in Step 1, read its full SKILL.md content. Extract:

The YAML frontmatter (between --- delimiters): name, description, allowed-tools
The body content: what rules and instructions it contains
Any references to other skills (Skill tool calls, /skill-name references)

Keep a structured summary of each skill's domain, trigger description, and key rules.

Step 3: Read CLAUDE.md Files

Scan CLAUDE.md files that exist in the project:

./CLAUDE.md (project-level)
./.claude/CLAUDE.md (project config directory)

If --include-global was passed, also read ~/.claude/CLAUDE.md. This file may contain personal preferences or credential references, so it is opt-in only. If not passed, note in the report: "User-global CLAUDE.md was not scanned. Use --include-global to include it."

For each CLAUDE.md found, extract the rules and instructions it contains.

Step 4: Cross-Component Analysis

Compare all skills against each other and against CLAUDE.md. Produce findings in these categories:

4a. Content Overlap

For each pair of skills, compare their content domains. Flag pairs where:

Both skills cover the same programming language or domain (e.g., both cover Python error handling)
Both skills contain similar or identical rules
Both skills reference the same files or patterns

For each overlap found, note which rules are duplicated and the approximate word cost of the duplication.

4b. Trigger Overlap

Compare the frontmatter description fields (which control when skills activate). Flag pairs where:

Both descriptions would activate for the same user task
One description is a subset of the other (e.g., "Python development" vs "Python API client patterns")
Both use broad, generic trigger language that would cause simultaneous loading

4c. CLAUDE.md Duplication

Compare each skill's rules against the CLAUDE.md content. Flag cases where:

A skill's rule is already stated in CLAUDE.md (the rule loads every session via CLAUDE.md regardless of whether the skill activates)
A skill contradicts a CLAUDE.md rule

4d. Type Misclassification

Evaluate whether each component is assigned to the correct mechanism:

A skill whose rules must execute every time without exception should be in CLAUDE.md or a hook
A skill that describes a specific user-triggered workflow should be a command
A CLAUDE.md section with domain-specific rules that apply only sometimes should be a skill
A skill with a deterministic check that should block an action should be a hook

4e. Structural Issues

Flag any issues found by the inventory script, plus:

Skills without description in frontmatter (hurts trigger precision)
Skills with very broad trigger descriptions that overlap with many peers
Custom commands that shadow built-in Claude Code commands

Step 5: Generate Report

Write the report to the path specified by --output (default: harness-report.md) only if it resolves within the project root. If it resolves outside root (e.g., .. traversal or absolute external path), refuse and ask for a valid path. Use the Write tool.

Structure the report as:

# Harness Health Report

Generated: <date>

## Inventory

- Skills: N (total ~X words)
- Commands: N
- Hooks: N
- CLAUDE.md: Yes/No

### Skills by size
| Skill | Words | Description |
|-------|-------|-------------|
| ... | ... | ... |

## Findings

### Content Overlap
(list findings, or "No content overlap detected.")

### Trigger Overlap
(list findings, or "No trigger overlap detected.")

### CLAUDE.md Duplication
(list findings, or "No CLAUDE.md duplication detected.")

### Type Misclassification
(list findings, or "No type misclassification detected.")

### Structural Issues
(list findings, or "No structural issues detected.")

## Suggestions

Numbered list of concrete, actionable suggestions. Each suggestion:
1. What to do (merge, move, rename, narrow, remove)
2. Which components are involved
3. Why (which finding it addresses)

## Notes

- Word counts are approximate (whitespace-split). Actual token counts will differ.
- Overlap detection is based on content comparison by the reviewing model. Results are informational, not deterministic.
- User-global CLAUDE.md was [scanned / not scanned].

Step 6: Present Summary

Show the user a brief terminal summary:

Total components found
Number of findings per category
Top 3 most actionable suggestions
Where the full report was saved

Suggest next steps:

/eval-analyze --skill <name> to dive deeper into a specific skill
/eval-run --model <model> to test whether a skill produces measurably different output
Review and act on the suggestions in the report

Rules

Read-only. Do not modify any skill, command, CLAUDE.md, or hook file. Write only the report.
Informational, not prescriptive. All findings are suggestions. The user decides what to act on.
Skip unreadable files. If a file can't be read, note it in the report and continue. Don't fail the whole report for one missing file.
No false precision. Word counts are approximate. Overlap assessment is qualitative. Don't present LLM judgments as deterministic measurements.
Respect privacy. Only scan ~/.claude/CLAUDE.md if --include-global is explicitly passed.

$ARGUMENTS