review-ai-prompt-changes - SKILL.md Agent Skill

name: review-ai-prompt-changes description: Review prompt and tool-description changes in a branch or PR. Use when asked to audit system prompts, prompt builders, tool descriptions, prompt inputs, or nearby evals/tests; summarize what changed; identify guidance that feels eval-shaped or too heavy for product use; and find repeated instructions between the main prompt and tool descriptions.

Review Prompt Changes

Review prompt-related diffs with a product lens. Focus on what actually changed, whether the guidance belongs in the prompt layer where it was added, and whether any wording is repetitive or overfit to evals.

Scope

Default to main...HEAD unless the user names a different base.
Prioritize runtime files that shape model behavior:
- system prompts
- prompt builders and prompt fragments
- tool descriptions and schema field descriptions
- response-formatting instructions
Read evals and unit tests only after the runtime diffs so you can infer the intent behind the change without letting the tests define the review.

In this repo, likely files live under:

apps/web/utils/ai/**
apps/web/utils/ai/assistant/**
apps/web/__tests__/eval/**
apps/web/__tests__/** for prompt-adjacent unit tests

Workflow

Find the changed files with git diff --name-only main...HEAD.
Narrow to prompt-relevant files first. Ignore unrelated UI or plumbing unless it changes what reaches the model.
Read diffs with enough context to see the full prompt block or tool description.
Then read nearby tests or evals to understand what failure mode the change is trying to prevent.
Separate your notes into four buckets:
- behavior changes
- product-helpful prompt guidance
- heavy or eval-shaped guidance
- repeated guidance across layers
Prefer exact file and line references in the review.

What To Look For

Product-helpful changes

Capability gating that matches real product states or disabled tools
Moving provider-specific or tool-specific guidance into the relevant tool description
Adding richer tool outputs that support grounded answers
Clear write-confirmation or safety rules tied to real UI behavior

Heavy or eval-shaped changes

Cross-cutting language added mainly to defend a known eval failure mode
Response-policy coaching inside a tool description when it is not tool-local
Repeated “treat as evidence,” “state the conflict plainly,” or “do not invent causes” language across multiple prompt layers
Tiny wording splits that duplicate long blocks for web vs messaging, provider A vs provider B, or enabled vs disabled states
Instructions that read like judging criteria instead of operating guidance

Repetition to call out

Check whether the same instruction appears in more than one of these places:

the system prompt
tool descriptions
input schema field descriptions
formatting blocks
tests or eval-driven comments copied into runtime prompt text

Tool-local guidance usually belongs in the tool description. Cross-cutting behavior rules usually belong in the main prompt. Flag cases where both layers say the same thing without adding meaning.

Review Output

Use this structure:

Findings
- findings first, ordered by severity
- include file and line references
- focus on heaviness, redundancy, or misplaced guidance
What Changed
- summarize the actual prompt/tool behavior changes
- mention tests or evals only as evidence of intent
Net Read
- say which changes feel product-helpful
- say which changes feel eval-shaped
- say what the best trim targets are

Keep the review concise, but cover the full prompt story. The user asked for both the diff summary and the judgment call.

Repo Notes

Project-specific skills live under .claude/skills/<skill-name>/SKILL.md.
Match the house style: short frontmatter, concise body, no extra README files.
Do not edit prompts by default. This skill is for audit/review unless the user explicitly asks for changes.