name: skill-audit
description: >
Audit Agent Skills against the agentskills.io spec and authoring best
practices. Use when the user asks to "audit", "triage", "review", "lint",
or "check" a skill, plugin, or marketplace; when reviewing a SKILL.md
before merging it; or when looking for description-tuning, progressive-
disclosure, or Gotchas opportunities. Produces a layered report — L1 spec
compliance (pass/fail), L2 structural metrics (counts), L3 craft
recommendations (ranked) — plus a prioritized quick-wins list. With
--tier local, toolshed, or marketplace it also emits a promotion-readiness
verdict gated on a privacy/genericization scan (real IDs, machine paths, key
material) and L1 spec compliance. Read-only: never modifies the audited
skills. Trigger even when the user doesn't say "audit" if they're asking for a
structured quality review of skill files or whether a skill is ready to promote.
argument-hint: "[||] [--quick-wins-only] [--tier <local|toolshed|marketplace>]"
user-invocable: true
allowed-tools: Read Glob Grep Bash
Skill Audit
Audit Agent Skills against the canonical agentskills.io spec and authoring guides. Produces a layered, read-only report:
- L1 — spec compliance (pass/fail).
- L2 — structural metrics (counts, thresholds).
- L3 — craft recommendations (ranked, judgment-laden).
Plus a prioritized quick-wins shortlist.
Gotchas
- Read
references/criteria.mdfirst. It pins the L1/L2/L3 criteria and the canonical rule-key catalog. Do not score from general intuition. - Every finding must cite a rule key from the catalog. Format:
- [<layer> <severity>] rule: <rule-key> — <one-line what>. The rule key is what makes the report machine-parseable later — keep the format stable. If the catalog doesn't cover a concern, put it in open questions or residual risk instead of inventing ad-hoc strings. - Read-only. Never edit the audited skills, never write into the audit target's directory tree. The report is the only output.
- L1 and L2 are quantitative; L3 is judgment. Do not produce a single
craft "score" — list ranked findings with severities (
high/medium/low). Quote the SKILL.md for each finding when possible. - Don't recommend running the skill. This audit only reads files.
- Token estimate is ~4 chars/token. Use it for L2 thresholds; do not invent a tokenizer.
- Generic advice is anti-gotcha. When evaluating L3 Gotchas quality, flag entries that say "handle errors appropriately" or "follow best practices" — those add no information.
Arguments
$ARGUMENTS is the audit target. Resolve in this order:
- Single skill — a path to a directory containing
SKILL.md, or a path to aSKILL.mdfile directly. Audit just that skill. - Plugin — a path to a directory containing
.claude-plugin/plugin.jsonor.codex-plugin/plugin.json, or askills/subdirectory. Audit everySKILL.mdunderskills/. - Repo / marketplace — a path to a directory containing
.claude-plugin/marketplace.jsonorplugins/. Audit everySKILL.mdunder every plugin. - No argument — treat the current working directory as the target and resolve as above.
Optional flag: --quick-wins-only — skip the per-skill L1/L2/L3 sections
and emit just the cross-cutting quick-wins shortlist.
Optional flag: --tier <local|toolshed|marketplace> — also run the
privacy/genericization rule class and emit a promotion-readiness verdict for
that tier (see Step 6.5). Composes with a full audit or with
--quick-wins-only. This is the input the memento:promote flow consumes as
Gate-1 — it does not push or modify anything.
Procedure
Step 1: Read the criteria
Read references/criteria.md in full. The audit applies its definitions
literally — do not improvise thresholds or rules.
Step 2: Resolve the target and inventory skills
Detect target type:
TARGET="${ARG:-.}"
if [ -f "$TARGET/SKILL.md" ] || [ "$(basename "$TARGET")" = "SKILL.md" ]; then
MODE=skill
elif [ -d "$TARGET/skills" ] || [ -f "$TARGET/.claude-plugin/plugin.json" ]; then
MODE=plugin
else
MODE=repo
fi
Build the skill list:
case "$MODE" in
skill) SKILLS=("$TARGET/SKILL.md") ;;
plugin) SKILLS=( $(find "$TARGET/skills" -name SKILL.md | sort) ) ;;
repo) SKILLS=( $(find "$TARGET" -path '*/skills/*/SKILL.md' | sort) ) ;;
esac
For each SKILL.md collect the inventory row: body lines, body chars (token
estimate = chars / 4), description chars, presence of references/,
scripts/, assets/ directories.
for f in "${SKILLS[@]}"; do
dir=$(dirname "$f")
body_chars=$(awk '/^---$/{c++; next} c>=2{print}' "$f" | wc -c | tr -d ' ')
body_lines=$(awk '/^---$/{c++; next} c>=2{print}' "$f" | wc -l | tr -d ' ')
# description char count, references presence, etc. — Read the file with
# the Read tool for parsing rather than chaining shell more.
done
Read each SKILL.md with the Read tool to parse frontmatter and section
structure. Bash is for size measurement and file enumeration; YAML parsing
and section detection happen in-context.
Step 3: L1 spec compliance
For each skill, check every L1 rule from references/criteria.md. Record
defects per skill. A skill with zero defects gets OK.
Common defects to test:
namemissing, malformed, or not matching directory.namecontains</>or the substringsanthropic/claude.descriptionmissing, empty, or > 1024 chars.descriptioncontains</>.compatibility> 500 chars.licensenot a string when present.allowed-toolspresent but not a string (YAML list ≠ spec — flag as warn).
Step 4: L2 structural metrics
For each skill, compute:
- Body lines, body tokens (chars / 4).
- Description chars.
- Inline fenced code blocks ≥ 30 lines (candidates for
assets/). - Sections (
## …) whose body exceeds 50 lines (candidates forreferences/). - Presence of
references/,assets/,scripts/.
Flag values past the thresholds in references/criteria.md. Do not flag
in-bounds values.
Step 5: L3 craft analysis
For each skill, read the SKILL.md body in full (already in context from Step 2) and produce ranked findings per the L3 criteria. Each finding:
- Severity (
high/medium/low). - One-line what, quoting SKILL.md where possible.
- One-line why referencing the best-practice.
- One-line fix sketch — direction only, do not prescribe wording.
Aim for 0–6 findings per skill. If a skill is in great shape, the section can be brief — do not invent findings to pad.
Step 6: Quick-wins synthesis
Across the audit, surface cross-cutting recommendations sorted by impact ÷ effort. Bias toward changes that:
- Touch many skills (e.g., "add Gotchas to N skills").
- Are mechanical (template extractions, version bumps).
Cap at 5–7 items. Reference the per-skill findings the items roll up from.
Step 6.5: Tier verdict (only when --tier is passed)
Run the privacy / genericization rule class from references/criteria.md
across every audited skill's files (SKILL.md, references/, scripts/,
assets/). Use Grep/Read — recognize concrete instances of each privacy-*
class (machine paths, cloud keys, key material, chat object IDs, internal
emails, customer/tenant/account identifiers, real people, real
groups/affiliations, private/ references — the Memento may be personal, so
people/groups include friends, family, and contacts, not just colleagues and
work teams; recognized semantically, not by regex). This is the agent-time advisory;
the deterministic enforcement is scripts/privacy-scan at the pre-push boundary
— do not invoke it from here, just cite the same rule keys.
Then apply the tier bars table from references/criteria.md and emit one
verdict per audited skill:
- Collect blocking findings for the tier: any
privacy-*finding (hard-block attoolshed/marketplace); any L1 defect (hard requirement attoolshed/marketplace); formarketplace, behavioral-CI + dedup are prerequisites out of audit scope — report them as unmet-unless-proven, not as a pass. - Verdict is
readyonly if every hard requirement for the tier is met; otherwisenot-ready, listing the blocking rule keys.
Mask any real value you cite (e.g. /U…dh) — never reproduce a real ID, path,
or key in the report; the report is itself a public-bound artifact when the
audited target is.
Step 7: Render the report
Use the template in assets/templates/report.md. Match its structure
exactly — the rule-key format on every finding line keeps the report
machine-parseable for any future eval runner. If --quick-wins-only was
passed, render only the inventory + quick-wins sections; skip the
per-skill L1/L2/L3 sections. If --tier was passed, also render the
Tier verdict section from the template (it stays machine-parseable: one
verdict: ready|not-ready line per skill plus the blocking rule keys).
What this skill does not do
- Run the audited skills.
- Modify any audited file.
- Score craft numerically across L3.
- Speculate about behavior not visible in
SKILL.mdand adjacent files.
References
references/criteria.md— pinned L1/L2/L3 criteria and rule-key catalog from agentskills.io.assets/templates/report.md— canonical report shape.