reviewing-instructions - SKILL.md Agent Skill

description: 'Use when asked to lint, audit, review, or score AI-facing instruction files such as SKILL.md, AGENT.md, AGENTS.md, CLAUDE.md, platform body.md files, prompt files, rules, policies, and agent-facing references. NOT for application code review, harness configuration review, ordinary docs, tests, or generated build output.'
name: reviewing-instructions

Instruction Review

Review AI-facing instruction files for routing precision, behavioral signal, output contracts, failure handling, grounding, and score stability. Do not score ordinary docs or source code.

Read first

references/scoring-rubric.md for gates, 0-10 bands, caps, confidence, and output schema.
references/model-resolution.md for model alias mapping and fallback rules.
references/calibration.md only when a score is borderline or confidence is low.
references/models/<family>.md only after model family resolution.

Accepted inputs

The user may pass:

file path, directory path, or plugin name
omitted scope, which means: discover likely instruction files
--model <name> to override model family or variant
requests such as lint, audit, review, score, compare, or rerank

A plugin name without a path separator expands to whichever of src/skills/<name>, src/agents/<name>, or src/plugins/<name> exist.
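The expansion rule above can be sketched as follows. This is a minimal illustration, not the skill's implementation: the function name `expand_plugin_name` and the `root` parameter are hypothetical, and returning an empty list for input that contains a separator is an assumption (such input is treated as a path by the caller).

```python
from pathlib import Path
from typing import List


def expand_plugin_name(name: str, root: Path) -> List[Path]:
    """Return the existing src/<kind>/<name> folders for a bare plugin name.

    Hypothetical helper: `root` is assumed to be the repository root.
    """
    # A name containing a path separator is a path, not a plugin name.
    if "/" in name or "\\" in name:
        return []
    candidates = [root / "src" / kind / name for kind in ("skills", "agents", "plugins")]
    # Keep only the locations that are actually present.
    return [path for path in candidates if path.exists()]
```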

Scope boundaries

Review only markdown or prompt files that guide an AI agent or coding assistant. Include support files only when an entrypoint tells the agent to read them or when they live under that skill or agent folder.

Do not review:

application source code, tests, or generated artifacts
ordinary README, changelog, product, or design docs unless agent-facing
harness config quality; use evolving-config
code quality; use reviewing-code

If a candidate is ambiguous, put it in Candidates Not Reviewed with the reason.

Discovery

Build the review set in this order:

Explicit paths from the user.
Entrypoints: SKILL.md, AGENT.md, AGENTS.md, CLAUDE.md.
Support files referenced by entrypoints: body.md, references, prompt, rules, context, and policy markdown.
High-confidence agent-facing markdown in agents, skills, prompts, instructions, references, or rules directories.

For a single explicit file, review that file only unless the user asks for linked files. For a directory, include its entrypoint and local support files.

Model resolution

Use references/model-resolution.md.

Resolution order:

--model <name> from the user.
File frontmatter model or platform metadata.
Parent entrypoint model for support files.
Tool or target folder family when obvious.
generic.

Report one line per review set: Model context: <family>/<variant or generic> — source <arg|frontmatter|parent|folder|generic>.

If resolution is ambiguous, use generic and set review confidence to medium or low.
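The resolution order above amounts to a first-match precedence chain. A minimal sketch, assuming each source has already been read into an optional string; the function name and parameter names are hypothetical, and alias mapping from references/model-resolution.md is not modelled here.

```python
from typing import Optional, Tuple


def resolve_model(
    arg: Optional[str] = None,
    frontmatter: Optional[str] = None,
    parent: Optional[str] = None,
    folder: Optional[str] = None,
) -> Tuple[str, str]:
    """Return (model, source) using the first source that provides a value."""
    ordered = (
        (arg, "arg"),                  # --model <name> from the user
        (frontmatter, "frontmatter"),  # file frontmatter or platform metadata
        (parent, "parent"),            # parent entrypoint model
        (folder, "folder"),            # tool or target folder family
    )
    for value, source in ordered:
        if value:
            return value, source
    # Nothing resolved: fall back to generic.
    return "generic", "generic"
```

The returned source label matches the `<arg|frontmatter|parent|folder|generic>` field of the report line.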

Structural pre-pass

Run the lint script scoped to the review target when Bash is available:

uv run python src/skills/reviewing-instructions/scripts/lint-instructions.py <scope>

If scope is omitted, run the whole-repo pre-pass. If the script ignores scope, filter reported findings to reviewed files before scoring.

If the script fails or is unavailable, record Structural pre-pass: skipped with the exact reason and continue semantic review.

The pre-pass is advisory. Semantic review and the scoring rubric are authoritative.
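When the script ignores scope, the filtering step could look like the sketch below. The shape of a finding (a dict with a `path` key) is an assumption for illustration; the lint script's real output format is not specified here, and `filter_findings` is a hypothetical name.

```python
from typing import Dict, Iterable, List


def filter_findings(findings: Iterable[Dict[str, str]], reviewed_paths: Iterable[str]) -> List[Dict[str, str]]:
    """Keep only pre-pass findings whose path is in the review set.

    Assumes each finding is a dict with a "path" key (hypothetical shape).
    """
    reviewed = set(reviewed_paths)
    return [finding for finding in findings if finding["path"] in reviewed]
```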

Semantic review

For each confirmed file:

Read the file fully.
Confirm it is agent-facing.
Resolve model context.
Apply hard gates from the scoring rubric.
Score each dimension using band-first 0-10 anchors.
Apply caps and confidence rules.
Rate applicable lint rules as PASS, WARN, or FAIL.
List the top 1-3 improvements by impact.

Use evidence for every score and finding: section name, line number, exact text, or missing evidence. No evidence, no finding.

Scoring stability rules

Choose the rubric band first, then choose the midpoint unless evidence justifies an edge.
Apply caps before computing the final score.
Round final scores to the nearest 0.5.
Use low confidence instead of over-precise scoring when context is partial.
Do not let one polished section hide a missing hard gate.
For repeated scoring or reranking, use the same scope, model context, and rubric version.
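The cap-then-round order can be sketched as below. Two assumptions are made loudly: the raw score is already computed by whatever aggregation the rubric defines, and exact ties round up (Python's built-in `round` uses banker's rounding, so it is avoided). The name `final_score` is hypothetical.

```python
import math
from typing import Optional


def final_score(raw: float, cap: Optional[float] = None) -> float:
    """Apply the cap first, then round to the nearest 0.5.

    Assumption: ties (for example 7.25) round up to the higher half point.
    """
    capped = raw if cap is None else min(raw, cap)
    # Scale to half-point units, round half up, scale back.
    return math.floor(capped * 2 + 0.5) / 2
```

For example, `final_score(7.3)` gives 7.5, and `final_score(8.8, cap=6.0)` gives 6.0 because the cap applies before rounding.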

Output

## Instruction Review Report

Model context: <family/variant> — source <source>
Rubric version: <date or file path>
Review confidence: high | medium | low

### Summary

- Files reviewed: N
- Candidates not reviewed: N
- Structural pre-pass: <errors/warnings or skipped reason>
- Score range: X-Y / 10
- Main risk: <one sentence>

### Scores

path/to/file.md — overall X / 10, confidence <high|medium|low>

- Gates: pass | capped at N because <reason>
- Signal Density: X — <evidence>
- Scope Specificity: X — <evidence>
- Output Structure: X — <evidence>
- Format Efficiency: X — <evidence>
- Failure Handling: X — <evidence>
- Grounding Discipline: X — <evidence>
- Routing Precision: X — <evidence>
- Progressive Disclosure: X — <evidence>
- Lint: PASS <ids>; WARN <ids>; FAIL <ids>

### Findings

1. path — <severity> <rule or dimension>: <issue>. Evidence: <section/line/text>. Fix: <concrete fix>.

### Top Improvements

1. <highest-impact change>
2. <next change>
3. <next change>

### Candidates Not Reviewed

- path — <reason>

Omit empty sections. If no findings remain after evidence checks, say No confirmed findings.

Failure handling

Scope is missing and a broad review would be expensive: ask one clarifying question.
Unknown model alias: use generic, report the alias gap, and lower confidence.
Vendor docs unavailable: use local model reference or generic; do not block review.
Conflicting local and vendor guidance: local project rules win; report the conflict.
Parallel or delegated reviews disagree: apply the same gates and caps, then keep the lower-confidence result out of confirmed findings.

## Claude platform additions

The host SKILL.md is canonical. This overlay only adds Claude-specific argument, task, and aggregation behavior.

Argument parsing

From $ARGUMENTS:

first non-flag token: file path, directory path, plugin name, or omitted scope
--model <name>: override model resolution
--team: use parallel review agents for large scopes
--rerank: use calibration anchors and pairwise comparison when comparing versions
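A minimal sketch of this parsing, assuming $ARGUMENTS arrives as a single shell-style string. The function name and the returned dict keys are hypothetical; unknown flags are silently ignored here, which is a simplification.

```python
import shlex
from typing import Any, Dict


def parse_arguments(raw: str) -> Dict[str, Any]:
    """Split an argument string into scope, model override, and mode flags."""
    tokens = shlex.split(raw)
    parsed: Dict[str, Any] = {"scope": None, "model": None, "team": False, "rerank": False}
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token == "--model" and i + 1 < len(tokens):
            parsed["model"] = tokens[i + 1]
            i += 2  # skip the flag and its value
            continue
        if token == "--team":
            parsed["team"] = True
        elif token == "--rerank":
            parsed["rerank"] = True
        elif not token.startswith("--") and parsed["scope"] is None:
            # First non-flag token is the scope: file, directory, or plugin name.
            parsed["scope"] = token
        i += 1
    return parsed
```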

Task use

Use direct review for one file or one small skill folder.

Use parallel tasks only when scope contains multiple independent files or plugins. Launch at most 3 review tasks at once. Batch deterministically by sorted path, not by estimated difficulty.
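One way to batch deterministically is sketched below. It assumes "at most 3 tasks" means the sorted file list is split into at most three contiguous, near-equal batches; that reading, and the name `batch_paths`, are assumptions rather than part of the skill.

```python
from typing import List


def batch_paths(paths: List[str], max_tasks: int = 3) -> List[List[str]]:
    """Split paths into at most `max_tasks` contiguous batches in sorted order."""
    ordered = sorted(set(paths))  # sorting makes the batches reproducible
    if not ordered:
        return []
    task_count = min(max_tasks, len(ordered))
    size, extra = divmod(len(ordered), task_count)
    batches: List[List[str]] = []
    start = 0
    for index in range(task_count):
        # The first `extra` batches take one additional file.
        end = start + size + (1 if index < extra else 0)
        batches.append(ordered[start:end])
        start = end
    return batches
```

Because the split depends only on the sorted paths, repeated runs over the same scope produce the same task assignments.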

Each task prompt must include:

exact file list
resolved model context or instruction to resolve it
path to references/scoring-rubric.md
path to references/model-resolution.md
requirement to cite evidence for every score
requirement to apply gates and caps before final score

Aggregation

When task results return:

Verify each reviewed file was in scope.
Recompute caps when a task forgot them.
Deduplicate findings by file plus rule or dimension.
If two scores differ by more than 1 point, compare gates and caps first.
Use the lower-confidence score only as a signal; do not average incompatible scopes.
Put unresolved disagreements in the report as low-confidence notes.
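The deduplication step can be sketched as follows, assuming each finding is a dict with `path` and `rule` keys (a hypothetical shape, where `rule` holds the rule id or dimension name). Keeping the first occurrence is also an assumption.

```python
from typing import Dict, Iterable, List, Set, Tuple


def dedupe_findings(findings: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Drop repeated findings that share the same file and rule or dimension."""
    seen: Set[Tuple[str, str]] = set()
    unique: List[Dict[str, str]] = []
    for finding in findings:
        key = (finding["path"], finding["rule"])
        if key not in seen:
            seen.add(key)
            unique.append(finding)  # first occurrence wins
    return unique
```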

Rerank mode

When $ARGUMENTS contains --rerank, read references/calibration.md. For two versions of a file, compare gates first, then dimensions, then final score. If the final scores differ by less than 0.5, report a tie.
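The tie rule alone can be sketched as below. This covers only the last comparison step and assumes gates and dimensions have already been compared; the function name and the returned labels are hypothetical.

```python
def compare_final_scores(old: float, new: float, tie_threshold: float = 0.5) -> str:
    """Return "tie" when the scores differ by less than the threshold."""
    difference = new - old
    if abs(difference) < tie_threshold:
        return "tie"
    return "new" if difference > 0 else "old"
```

Since final scores are rounded to the nearest 0.5, a difference below 0.5 in practice means the two versions received the same final score.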

Web and model docs

Use web lookup for model docs only when local references are missing or the user asks for current vendor guidance. If web access fails, use local references or generic context and report the gap.