name: prompting-assist
description: "사용자가 LLM에 보낼 프롬프트를 개선·리뷰·피드백받고 싶어할 때 사용. Anthropic 공식 프롬프팅 모범 사례에 근거한 체크리스트로 진단하고 개선안을 제시한다. '프롬프트 개선해줘', '이 프롬프트 리뷰해줘', '프롬프팅 팁', '/prompting' 등 명시적 어구에만 발동하며, 일반 대화 속 '프롬프트'라는 단어만으로는 발동하지 않는다."
group: writing
model: sonnet
allowed-tools: Read, Edit, AskUserQuestion, WebFetch
Prompting Assist
Purpose
Diagnose a user-authored prompt against Anthropic's official prompting best practices and propose concrete improvements. Activates only when prompt authoring / improvement is the explicit subject — not whenever the word "prompt" appears.
Trigger Policy
Korean trigger phrases are kept verbatim because they must match user utterances directly.
Activate on:
- "프롬프트 개선해줘"
- "이 프롬프트 리뷰해줘" / "이 프롬프트 피드백 줘"
- "프롬프팅 팁 알려줘"
- /prompting
- "system prompt 개선해줘"
Do NOT activate on:
- "프롬프트가 너무 길어서..." (word appears ≠ improvement request)
- "프롬프트 엔지니어링이 뭐야?" (concept question)
- "이 프롬프트 의미가 뭐야?" (interpretation request)
If intent is ambiguous, ask one clarifying question first: "이 프롬프트를 개선해드릴까요, 아니면 의미를 설명해드릴까요?"
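The policy above amounts to a three-way decision, which can be sketched roughly as follows. This is only an illustration, not how the skill runtime routes requests: `Decision`, `classify_utterance`, and the plain substring matching are hypothetical, and only the phrases themselves come from the lists above.

```python
from enum import Enum


class Decision(Enum):
    ACTIVATE = "activate"
    SKIP = "skip"
    CLARIFY = "clarify"


# Explicit request phrases from the "Activate on" list, kept verbatim.
ACTIVATE_PHRASES = (
    "프롬프트 개선해줘",
    "이 프롬프트 리뷰해줘",
    "이 프롬프트 피드백 줘",
    "프롬프팅 팁",
    "/prompting",
    "system prompt 개선해줘",
)

# Phrasings from the "Do NOT activate on" list.
SKIP_PHRASES = (
    "프롬프트가 너무 길어서",
    "프롬프트 엔지니어링이 뭐야",
    "이 프롬프트 의미가 뭐야",
)

CLARIFYING_QUESTION = "이 프롬프트를 개선해드릴까요, 아니면 의미를 설명해드릴까요?"


def classify_utterance(text: str) -> Decision:
    """Hypothetical helper: map a user utterance to a trigger decision."""
    if any(phrase in text for phrase in ACTIVATE_PHRASES):
        return Decision.ACTIVATE
    if any(phrase in text for phrase in SKIP_PHRASES):
        return Decision.SKIP
    # The keyword alone is not an invocation: ask once instead of diagnosing.
    if "프롬프트" in text or "prompt" in text.lower():
        return Decision.CLARIFY
    return Decision.SKIP
```

When the result is `Decision.CLARIFY`, the skill would ask `CLARIFYING_QUESTION` once and wait for the answer before entering the workflow.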
Workflow
Stage 1: Context Collection
Acquire the prompt source.
- If already pasted into chat, confirm the range.
- If a file path is given, Read it.
- If absent, request once: "어떤 프롬프트를 보고 싶으신가요?"
Collect the minimum necessary context via AskUserQuestion (batch the questions, do not re-ask):
- Target model: Claude 4.x family / another LLM / unknown
- Primary use case: one-shot / agentic / tool-calling / long-context / coding
- Hard constraints: response length / cost / latency / output format
If the model is unknown, default to Claude 4.6/4.7 and state the assumption explicitly.
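One way to keep the collected context and the stated assumption together is a small record, sketched below. The `PromptContext` type and `resolve_target_model` helper are hypothetical names for illustration; the three fields mirror the three questions above, and the default string is the one named in this stage.

```python
from dataclasses import dataclass, field
from typing import List, Optional

DEFAULT_TARGET_MODEL = "Claude 4.6/4.7"  # default named in Stage 1


@dataclass
class PromptContext:
    target_model: Optional[str] = None  # None means "unknown"
    use_case: Optional[str] = None      # e.g. "agentic", "long-context"
    hard_constraints: List[str] = field(default_factory=list)
    assumptions: List[str] = field(default_factory=list)


def resolve_target_model(ctx: PromptContext) -> PromptContext:
    """Fill in the default model and record the assumption so it can be stated."""
    if ctx.target_model is None:
        ctx.target_model = DEFAULT_TARGET_MODEL
        ctx.assumptions.append(
            f"Target model unknown; assuming {DEFAULT_TARGET_MODEL}."
        )
    return ctx
```

Every entry in `assumptions` would then be surfaced to the user alongside the diagnosis rather than applied silently.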
Stage 2: Reference Load
The diagnostic baseline is the inline checklist in Stage 3 (10 categories). For current, authoritative phrasing and code snippets, live-fetch Anthropic's official prompt engineering guide:
https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/overview
WebFetch is a deferred tool — load its schema first with ToolSearch (query select:WebFetch), then fetch. The overview links to per-technique pages (be clear and direct, multishot examples, chain-of-thought, XML tags, system prompts, prefill, prompt chaining, long-context tips); fetch the ones relevant to the failing categories. Reusable code snippets from the guide can be quoted verbatim as improvement examples.
On any failure (tool not loaded, offline, rate limit, layout change), fall back to the Stage 3 inline checklist and tell the user in one line: "Anthropic 가이드 라이브 페치 실패 — 인라인 체크리스트 기반으로 진단합니다."
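The fetch-then-fall-back behavior can be sketched as below. In the skill itself the fetch is a WebFetch tool call, not Python, so the `fetch` callable here is a stand-in assumption and `load_reference` is a hypothetical name. The point is only that every failure mode collapses into the same fallback path.

```python
from typing import Callable, Optional, Tuple

GUIDE_URL = (
    "https://docs.anthropic.com/en/docs/build-with-claude/"
    "prompt-engineering/overview"
)


def load_reference(fetch: Callable[[str], str]) -> Tuple[Optional[str], bool]:
    """Return (guide_text, used_fallback).

    `fetch` stands in for the WebFetch tool call. Any exception or an
    empty response means the inline checklist is used instead.
    """
    try:
        text = fetch(GUIDE_URL)
    except Exception:
        # Tool not loaded, offline, rate limit, layout change: all fall back.
        return None, True
    if not text or not text.strip():
        return None, True
    return text, False
```

When `used_fallback` is true, the caller prints the one-line notice quoted above and continues with Stage 3 rather than stopping.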
Stage 3: Diagnosis
Judge pass / fail per checklist category:
| Category | Key question |
|---|---|
| Clarity & specificity | Is the desired outcome explicit? Are scope and exceptions clear? |
| Context & motivation | Is the reason for each constraint stated? |
| Examples | Are 3–5 examples present for few-shot tasks? Are they wrapped in <example>? |
| Structure | Are content types separated by XML tags? Is long context at the top, query at the bottom? |
| Role & identity | Does the system prompt assign a role? |
| Output control | Is the language prescriptive (do) rather than prohibitive (don't)? Does the prompt avoid relying on last-turn prefill? |
| Thinking & effort | Does the effort setting match task difficulty? Does the prompt avoid aggressive wording that causes over-triggering? |
| Tool use & agentic | Is the action-vs-suggest intent clear? Is parallel intent marked? |
| Long-horizon | Is state held in structured files? Are completion criteria verifiable? |
| Anti-patterns | Does the prompt avoid test hard-coding, over-defensive coding, and pressure toward needless abstraction? |
For each failing item, record a short justification + improvement direction. Cite the checklist category (or a specific section of the fetched Anthropic guide).
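The per-category verdicts can be held in a simple structure that enforces the rule above: a failing item must carry both a justification and an improvement direction. This is a sketch under assumptions. The `Finding` type and `failing` helper are hypothetical, and the category names are copied from the table.

```python
from dataclasses import dataclass
from typing import List

CATEGORIES = (
    "Clarity & specificity",
    "Context & motivation",
    "Examples",
    "Structure",
    "Role & identity",
    "Output control",
    "Thinking & effort",
    "Tool use & agentic",
    "Long-horizon",
    "Anti-patterns",
)


@dataclass
class Finding:
    category: str
    passed: bool
    justification: str = ""
    direction: str = ""

    def __post_init__(self) -> None:
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")
        # A failing item must explain why it fails and how to improve it.
        if not self.passed and not (self.justification and self.direction):
            raise ValueError("a failing item needs a justification and a direction")


def failing(findings: List[Finding]) -> List[Finding]:
    """Return only the findings that did not pass."""
    return [f for f in findings if not f.passed]
```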
Stage 4: Proposal
Pick the proposal format by change magnitude:
- Small fix (≤ 3 items): per-section diff
  Before: "Make it better"
  After: "Refactor the loop to use parallel tool calls (see Parallel tool-call prompt)."
  Why: Clarity & specificity (§Stage 3), Tool use (§parallel)
- Full rewrite (multiple failures): the improved prompt in full + a bullet list of key changes
Prefer presenting options when the user has a real choice: lay out "Option A (terse)" vs "Option B (strict)".
Close with a one-line checklist coverage report: "10개 범주 중 7개 합격, 3개 개선 반영."
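The format choice and the closing line are mechanical enough to sketch. The helper names below are hypothetical. The threshold of 3 and the wording of the coverage line come from this stage, while treating anything above 3 failures as a full rewrite is an assumed reading of "multiple failures".

```python
def choose_format(num_failures: int) -> str:
    """Per-section diff for up to 3 failing items, otherwise a full rewrite."""
    return "per-section diff" if num_failures <= 3 else "full rewrite"


def coverage_line(total: int, failed: int) -> str:
    """Build the one-line coverage report that closes the proposal."""
    if not 0 <= failed <= total:
        raise ValueError("failed must be between 0 and total")
    passed = total - failed
    return f"{total}개 범주 중 {passed}개 합격, {failed}개 개선 반영."
```

For example, `coverage_line(10, 3)` reproduces the sample line quoted above.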
Constraints
- Preserve original intent. Never change what the user is trying to do — only raise quality.
- Evidence-backed. Do not assert anything outside Anthropic's official guidance. Every recommendation maps to a checklist category (or a section of the fetched Anthropic guide).
- Language preservation. Keep the prompt's original language in the artifact. Diagnosis and explanation follow the conversation language (default Korean).
- Brevity. Diagnosis report: 1–2 lines per category. Strip filler.
- Model-version awareness. Claude 4.5 → 4.6 → 4.7 diverge in non-trivial ways. When the target model is unknown, state the assumption and proceed.
References
- Anthropic prompt engineering overview — primary reference; live-fetch per Stage 2 (per-technique pages are linked from the overview)
- The Stage 3 inline checklist is the offline fallback baseline when the live fetch fails
Gotchas
- WebFetch is deferred: load it first. allowed-tools only pre-grants permission; the tool is not callable until its schema is loaded via ToolSearch (select:WebFetch). On any fetch failure, fall back to the Stage 3 inline checklist and say so in one line. Never block the diagnosis on the network.
- Do not over-trigger. The description intentionally encodes do/don't patterns. Treat the word "prompt" in a user sentence as a keyword, not an invocation. Default to one clarifying question when the intent is ambiguous.
- Never edit the prompt in place without consent. Edit is in allowed-tools for cases where the prompt lives in a file the user asked to be improved. Always show the proposal first, then apply the edit only after explicit confirmation.
- Model-version drift. Claude 4.5 → 4.6 → 4.7 differ enough (extended thinking defaults, parallel tool-call norms, effort tuning) that a checklist pass for 4.5 can be a near-fail for 4.7. When unknown, default to the latest and state the assumption.
Eval Criteria
EVAL 1: Trigger precision
Question: Given a user utterance that only contains the word "프롬프트" without
an improvement-request framing, does the skill decline to activate
(or ask a clarifying question) instead of running a full diagnosis?
Pass: Skill does not run Stage 2–4 without an explicit improvement intent.
Fail: Skill begins full diagnosis on mere keyword presence.
EVAL 2: Reference grounding
Question: Does every improvement recommendation cite a specific checklist
category (or a section of the fetched Anthropic guide)?
Pass: Each recommendation has an anchor (category name or §section).
Fail: Any recommendation is stated without a reference anchor.
EVAL 3: Intent preservation
Question: Does the proposed prompt preserve the user's original goal,
scope, and persona?
Pass: Target task, constraints, and role remain intact; only phrasing/
structure/specificity changes.
Fail: Meaning drifts — task narrowed/broadened, constraints dropped, or
persona replaced.
EVAL 4: Proposal structure
Question: Is the proposal formatted per Stage 4 (Before/After diff for
small fixes, full rewrite + key changes for larger ones) and
closed with a one-line coverage report?
Pass: Format matches change magnitude; coverage line present.
Fail: Format mismatched, or coverage report missing.
EVAL 5: Language fidelity
Question: Is the rewritten prompt in the same language as the original,
with diagnosis written in the conversation language?
Pass: Prompt language preserved; diagnosis in conversation language.
Fail: Prompt translated silently, or diagnosis in the wrong language.
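EVAL 2 lends itself to a mechanical check. The sketch below flags any recommendation that cites neither a checklist category name nor a § section. The function name and the plain substring test are assumptions for illustration, not part of the skill.

```python
from typing import Iterable, List


def ungrounded(recommendations: Iterable[str], categories: Iterable[str]) -> List[str]:
    """Return recommendations that cite neither a category name nor a § section."""
    names = tuple(categories)
    return [
        rec
        for rec in recommendations
        if "§" not in rec and not any(name in rec for name in names)
    ]
```

An empty result corresponds to a pass; any returned item is a recommendation stated without a reference anchor.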