description: 'Use when asked to lint, audit, review, or score AI-facing instruction files such as SKILL.md, AGENT.md, AGENTS.md, CLAUDE.md, platform body.md files, prompt files, rules, policies, and agent-facing references. NOT for application code review, harness configuration review, ordinary docs, tests, or generated build output.'
name: reviewing-instructions
Instruction Review
Review AI-facing instruction files for routing precision, behavioral signal, output contracts, failure handling, grounding, and score stability. Do not score ordinary docs or source code.
Read first
- references/scoring-rubric.md for gates, 0-10 bands, caps, confidence, and output schema.
- references/model-resolution.md for model alias mapping and fallback rules.
- references/calibration.md only when a score is borderline or confidence is low.
- references/models/<family>.md only after model family resolution.
Accepted inputs
The user may pass:
- file path, directory path, or plugin name
- omitted scope, meaning discover likely instruction files
- --model <name> to override model family or variant
- requests such as lint, audit, review, score, compare, or rerank
Plugin name without a path separator expands to matching src/skills/<name>,
src/agents/<name>, or src/plugins/<name> when present.
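The expansion rule above can be sketched as follows. This is a minimal illustration, not the skill's implementation: the function name `expand_plugin_name` and the `repo_root` parameter are hypothetical, and treating any token with a path separator as a literal path is an assumption.

```python
from pathlib import Path

# Candidate roots taken from the text above, in the order listed there.
PLUGIN_ROOTS = ("src/skills", "src/agents", "src/plugins")


def expand_plugin_name(name: str, repo_root: Path) -> list[Path]:
    """Expand a bare plugin name to the matching directories that exist.

    Hypothetical helper: a token containing a path separator is assumed
    to be a path already and is returned unchanged.
    """
    if "/" in name or "\\" in name:
        return [repo_root / name]
    candidates = [repo_root / root / name for root in PLUGIN_ROOTS]
    # Keep only the candidates that are present on disk.
    return [path for path in candidates if path.exists()]
```

An empty result means the name matched nothing, which is a reasonable point to report the candidate as not reviewed.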
Scope boundaries
Review only markdown or prompt files that guide an AI agent or coding assistant. Include support files only when an entrypoint tells the agent to read them or when they live under that skill or agent folder.
Do not review:
- application source code, tests, or generated artifacts
- ordinary README, changelog, product, or design docs unless agent-facing
- harness config quality; use evolving-config
- code quality; use reviewing-code
If a candidate is ambiguous, put it in Candidates Not Reviewed with the reason.
Discovery
Build the review set in this order:
- Explicit paths from the user.
- Entrypoints: SKILL.md, AGENT.md, AGENTS.md, CLAUDE.md.
- Support files referenced by entrypoints: body.md, references, prompt, rules, context, and policy markdown.
- High-confidence agent-facing markdown in agents, skills, prompts, instructions, references, or rules directories.
For a single explicit file, review that file only unless the user asks for linked files. For a directory, include its entrypoint and local support files.
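As a rough sketch of the entrypoint step in the discovery order, assuming entrypoints are identified purely by file name: the helper `find_entrypoints` is hypothetical, and it does not cover support files or the high-confidence markdown step.

```python
from pathlib import Path

# Entrypoint names listed in the discovery order above.
ENTRYPOINT_NAMES = {"SKILL.md", "AGENT.md", "AGENTS.md", "CLAUDE.md"}


def find_entrypoints(root: Path) -> list[Path]:
    """Return entrypoint files under root, sorted for a stable review set."""
    return sorted(
        path for path in root.rglob("*.md") if path.name in ENTRYPOINT_NAMES
    )
```

Sorting keeps the review set identical between runs, which matters for repeated scoring.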
Model resolution
Use references/model-resolution.md.
Resolution order:
- --model <name> from the user.
- File frontmatter model or platform metadata.
- Parent entrypoint model for support files.
- Tool or target folder family when obvious.
- generic.
Report one line per review set: Model context: <family>/<variant or generic> — source <arg|frontmatter|parent|folder|generic>.
If resolution is ambiguous, use generic and set review confidence to medium or low.
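The resolution order amounts to a first-match lookup. The sketch below assumes each source has already been read into an optional string; `resolve_model_context` is a hypothetical name, and alias mapping (covered by references/model-resolution.md) is deliberately left out.

```python
from typing import Optional


def resolve_model_context(
    arg: Optional[str] = None,
    frontmatter: Optional[str] = None,
    parent: Optional[str] = None,
    folder: Optional[str] = None,
) -> tuple[str, str]:
    """Return (model, source) from the first source that supplies a value."""
    # Order mirrors the resolution order in the text above.
    candidates = [
        ("arg", arg),
        ("frontmatter", frontmatter),
        ("parent", parent),
        ("folder", folder),
    ]
    for source, value in candidates:
        if value:
            return value, source
    # Nothing resolved: fall back to generic.
    return "generic", "generic"
```

The returned source label maps directly onto the `source` field of the report line.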
Structural pre-pass
Run the lint script scoped to the review target when Bash is available:
uv run python src/skills/reviewing-instructions/scripts/lint-instructions.py <scope>
If scope is omitted, run the whole-repo pre-pass. If the script ignores scope, filter reported findings to reviewed files before scoring.
If the script fails or is unavailable, record Structural pre-pass: skipped with
the exact reason and continue semantic review.
The pre-pass is advisory. Semantic review and the scoring rubric are authoritative.
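When the script ignores scope, the filtering step could look like the sketch below. The finding shape (a dict with a `path` key) is an assumption for illustration; the real output format of lint-instructions.py is not described here.

```python
from pathlib import PurePosixPath


def filter_findings(findings: list[dict], reviewed: set[str]) -> list[dict]:
    """Keep only findings whose path belongs to the review set.

    Assumes each finding is a dict with a 'path' key (hypothetical shape).
    """
    normalized = {PurePosixPath(path).as_posix() for path in reviewed}
    return [
        finding
        for finding in findings
        if PurePosixPath(finding["path"]).as_posix() in normalized
    ]
```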
Semantic review
For each confirmed file:
- Read the file fully.
- Confirm it is agent-facing.
- Resolve model context.
- Apply hard gates from the scoring rubric.
- Score each dimension using band-first 0-10 anchors.
- Apply caps and confidence rules.
- Rate applicable lint rules as PASS, WARN, or FAIL.
- List the top 1-3 improvements by impact.
Use evidence for every score and finding: section name, line number, exact text, or missing evidence. No evidence, no finding.
Scoring stability rules
- Choose the rubric band first, then choose the midpoint unless evidence justifies an edge.
- Apply caps before computing the final score.
- Round final scores to the nearest 0.5.
- Use low confidence instead of over-precise scoring when context is partial.
- Do not let one polished section hide a missing hard gate.
- For repeated scoring or reranking, use the same scope, model context, and rubric version.
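The arithmetic behind these rules can be sketched as below. The band bounds come from references/scoring-rubric.md, so any values passed in are placeholders. Combining dimensions with an unweighted mean and rounding ties upward are assumptions, not rules stated here.

```python
import math
from statistics import mean
from typing import Optional


def band_midpoint(low: float, high: float) -> float:
    """Default score inside a chosen rubric band."""
    return (low + high) / 2


def round_half(score: float) -> float:
    """Round to the nearest 0.5; exact ties round up (assumption)."""
    return math.floor(score * 2 + 0.5) / 2


def final_score(dimension_scores: list[float], cap: Optional[float] = None) -> float:
    """Combine dimension scores, apply any cap, then round.

    The unweighted mean is an assumption; the rubric may weight dimensions.
    """
    raw = mean(dimension_scores)
    if cap is not None:
        # The cap is applied before the final score is computed.
        raw = min(raw, cap)
    return round_half(raw)
```

For example, `final_score([8, 9, 7], cap=6)` yields 6.0: strong dimension scores do not lift a file past a gate cap.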
Output
## Instruction Review Report
Model context: <family/variant> — source <source>
Rubric version: <date or file path>
Review confidence: high | medium | low
### Summary
- Files reviewed: N
- Candidates not reviewed: N
- Structural pre-pass: <errors/warnings or skipped reason>
- Score range: X-Y / 10
- Main risk: <one sentence>
### Scores
path/to/file.md — overall X / 10, confidence <high|medium|low>
- Gates: pass | capped at N because <reason>
- Signal Density: X — <evidence>
- Scope Specificity: X — <evidence>
- Output Structure: X — <evidence>
- Format Efficiency: X — <evidence>
- Failure Handling: X — <evidence>
- Grounding Discipline: X — <evidence>
- Routing Precision: X — <evidence>
- Progressive Disclosure: X — <evidence>
- Lint: PASS <ids>; WARN <ids>; FAIL <ids>
### Findings
1. path — <severity> <rule or dimension>: <issue>. Evidence: <section/line/text>. Fix: <concrete fix>.
### Top Improvements
1. <highest-impact change>
2. <next change>
3. <next change>
### Candidates Not Reviewed
- path — <reason>
Omit empty sections. If no findings remain after evidence checks, say No confirmed findings.
Failure handling
- Scope is missing and a broad review would be expensive: ask one clarifying question.
- Unknown model alias: use generic, report the alias gap, and lower confidence.
- Vendor docs unavailable: use local model reference or generic; do not block review.
- Conflicting local and vendor guidance: local project rules win; report the conflict.
- Parallel or delegated reviews disagree: apply the same gates and caps, then keep the lower-confidence result out of confirmed findings.
## Claude platform additions
The host SKILL.md is canonical. This overlay only adds Claude-specific argument, task, and aggregation behavior.
Argument parsing
From $ARGUMENTS:
- first non-flag token: file path, directory path, plugin name, or omitted scope
- --model <name>: override model resolution
- --team: use parallel review agents for large scopes
- --rerank: use calibration anchors and pairwise comparison when comparing versions
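A minimal sketch of this parsing, assuming $ARGUMENTS arrives as one shell-style string. The function name `parse_arguments` and the returned dict shape are hypothetical.

```python
import shlex


def parse_arguments(raw: str) -> dict:
    """Split an argument string into scope and the three flags described above."""
    tokens = shlex.split(raw)
    parsed = {"scope": None, "model": None, "team": False, "rerank": False}
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token == "--model":
            # The value is the next token; a trailing --model leaves it unset.
            if i + 1 < len(tokens):
                parsed["model"] = tokens[i + 1]
                i += 1
        elif token == "--team":
            parsed["team"] = True
        elif token == "--rerank":
            parsed["rerank"] = True
        elif not token.startswith("--") and parsed["scope"] is None:
            # First non-flag token is the scope; later ones are ignored.
            parsed["scope"] = token
        i += 1
    return parsed
```

A `None` scope corresponds to the omitted-scope case, where likely instruction files are discovered instead.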
Task use
Use direct review for one file or one small skill folder.
Use parallel tasks only when scope contains multiple independent files or plugins. Launch at most 3 review tasks at once. Batch deterministically by sorted path, not by estimated difficulty.
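One way to read the batching rule is to sort the paths and cut them into at most three contiguous groups, one per task. That interpretation, and the helper name `batch_paths`, are assumptions; the text fixes only the task limit and the sort order.

```python
def batch_paths(paths: list[str], max_tasks: int = 3) -> list[list[str]]:
    """Split sorted, de-duplicated paths into at most max_tasks contiguous batches."""
    ordered = sorted(set(paths))
    if not ordered:
        return []
    task_count = min(max_tasks, len(ordered))
    size, extra = divmod(len(ordered), task_count)
    batches, start = [], 0
    for index in range(task_count):
        # The first `extra` batches take one additional path each.
        end = start + size + (1 if index < extra else 0)
        batches.append(ordered[start:end])
        start = end
    return batches
```

Because the split depends only on the sorted path list, the same scope always produces the same batches regardless of input order.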
Each task prompt must include:
- exact file list
- resolved model context or instruction to resolve it
- path to references/scoring-rubric.md
- path to references/model-resolution.md
- requirement to cite evidence for every score
- requirement to apply gates and caps before final score
Aggregation
When task results return:
- Verify each reviewed file was in scope.
- Recompute caps when a task forgot them.
- Deduplicate findings by file plus rule or dimension.
- If two scores differ by more than 1 point, compare gates and caps first.
- Use the lower-confidence score only as a signal; do not average incompatible scopes.
- Put unresolved disagreements in the report as low-confidence notes.
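Two of the aggregation steps are mechanical enough to sketch. The finding shape (dicts with `path` and `rule` keys) and both function names are hypothetical.

```python
def dedupe_findings(findings: list[dict]) -> list[dict]:
    """Drop repeated findings, keyed by file plus rule or dimension.

    Assumes each finding has 'path' and 'rule' keys (hypothetical shape).
    The first occurrence wins and input order is preserved.
    """
    seen = set()
    unique = []
    for finding in findings:
        key = (finding["path"], finding["rule"])
        if key not in seen:
            seen.add(key)
            unique.append(finding)
    return unique


def needs_gate_comparison(score_a: float, score_b: float) -> bool:
    """True when two scores for the same file differ by more than 1 point."""
    return abs(score_a - score_b) > 1
```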
Rerank mode
When $ARGUMENTS contains --rerank, read references/calibration.md.
For two versions of a file, compare gates first, then dimensions, then final score.
If the difference is less than 0.5, report a tie.
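The tie rule for the final-score step can be sketched as below. It covers only that last step, not the gate and dimension comparisons that come first, and the name `compare_versions` and its return labels are hypothetical.

```python
def compare_versions(old_score: float, new_score: float, threshold: float = 0.5) -> str:
    """Return 'tie', 'new', or 'old' for two final scores of the same file."""
    difference = new_score - old_score
    if abs(difference) < threshold:
        return "tie"
    return "new" if difference > 0 else "old"
```

Since final scores are rounded to the nearest 0.5, a difference below 0.5 in practice means the two rounded scores are equal.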
Web and model docs
Use web lookup for model docs only when local references are missing or the user asks for current vendor guidance. If web access fails, use local references or generic context and report the gap.