name: prompting-effectively description: Structure Claude prompts for clarity and better results using roles, explicit instructions, context, positive framing, and strategic organization. Use when crafting prompts for complex tasks, long documents, tool workflows, or code generation.
Prompt Architecture
Effective prompts have structure. This skill covers the principles that make prompts clear, reliable, and maintainable — whether you're writing commands, agents, modes, skills, or API system prompts.
For detailed patterns and examples of the architecture principles below, see reference.md.
The Two Zones
The most important architectural decision: what goes where.
Behavior zone (system prompt, agents, modes):
- Identity, personality, tone
- Constraints and hard rules
- Tool usage policies (→ tool-design covers designing the tools themselves)
- Decision frameworks ("when X, do Y")
- Formatting preferences
Task zone (commands, skills, task prompts):
- Temporary role shifts and focus areas
- Step-by-step procedures or constraints for the task
- Context the agent needs to do this specific thing
Knowledge zone (skills, documents, user context, CLAUDE.md):
- Reference material and domain knowledge
- User-specific context and preferences
- Project information
- Data that changes between sessions
System prompt content gets treated as foundational identity. Task-zone content layers on top — it redirects the agent without replacing who it is. Knowledge gets treated as reference material. Mixing these up creates agents that treat their own rules as optional suggestions, or user-turn prompts that fight the system prompt for control of identity.
→ For the orthogonal API-level question — what goes in the system message vs the user message of a single API call — see system-vs-user-prompt.
XML for Instruction Domains
Use XML tags to separate concern domains (identity, safety, tools, formatting). Rules:
- Section-level boundaries only — don't wrap every paragraph
- Two levels of nesting maximum for instructions (deeper is fine for examples)
- Over-tagging creates noise that dilutes the structural signal
Over-formatted system prompts bleed into over-formatted responses — use the lightest formatting that makes content clear.
Tone Registers
The voice you use affects how the model interprets instructions — and the right voice depends on where the prompt lives.
Third person → Identity (system prompts only):
"Claude cares about people's wellbeing." "The agent avoids over-formatting responses."
Establishes personality traits. The model internalizes these as "who I am." This framing belongs in system prompts, agents, and modes — not in commands or skills.
"You are X" → Identity declaration (system prompts only):
"You are a senior security auditor who reviews code for vulnerabilities."
Defines the agent's core role. In a system prompt, this is foundational. In a user-turn prompt (command, skill, task), it conflicts with the identity already established — use "Act as X for this task" instead.
"Act as X" → Temporary role (commands, skills, task prompts):
"Act as a security auditor for this review. Focus on auth flows and input validation."
Layers a role on top of the existing identity without overriding it. The agent applies this lens for the current task, then returns to its base behavior.
Second person → Operations:
"When you encounter X, stop immediately." "You should use the minimum number of tools needed."
Direct instruction for procedures and workflows. Works in both zones.
Imperative → Hard rules:
"NEVER reproduce song lyrics." "ALWAYS search before responding about current events."
Absolute rules that should not be overridden by context.
The Escalation Ladder
- Suggestion: "It's fine to..." / "Claude can..."
- Default: "Claude does..." / "Claude avoids..."
- Instruction: "You should..." / "Always prefer..."
- Hard rule: "NEVER..." / "MUST..." / "CRITICAL:..."
- Repeated hard rule: Same rule in multiple sections with different framing
Most guidance sits at levels 2–3. Reserve 4–5 for genuine non-negotiables. If everything is CRITICAL, nothing is.
Subtract Before You Add
When the model produces bad output, scan the prompt for the instruction causing it and delete that instruction. The model has strong defaults — out-of-distribution behaviors (wrapping output in unusual tags, inventing structure, weird formatting) almost always trace back to a prompt that pushed the model there. The fix is subtraction, not addition.
Worked example. A tool description read:
"Task instruction wrapped in
<fork-task>verbatim."
The model wrapped its output in <fork-task> tags because the description told it to. The minimal fix is deletion:
"The task this fork should execute."
What goes wrong if you skip the delete step:
- Adding a prohibition while leaving the cause: keep the "wrapped in
<fork-task>" instruction, then bolt on "DO NOT wrap in<fork-task>tags." Now the prompt contradicts itself, and the prohibition primes attention toward the very tags you don't want. - Replacing with a positive description when none was needed: "One self-contained instruction, e.g. 'Send a DM to Priya.'" Reads fine, but the field name and schema already convey this. You've spent tokens telling the model to write normally — which is what it would have done anyway.
Order of preference:
- Delete the instruction causing the misbehavior. Often nothing needs to take its place — the field name, type, or surrounding context already conveys what's expected.
- Rephrase positively when the field genuinely lacks a description. "DO NOT use ellipses" → "Use periods instead" — describe the target, not its inverse.
- Prohibit only when you have observed the failure without any prompt pushing the model there. Rare. The bar is evidence, not preference.
When negative framing earns its place: the model has an observed prior toward the wrong behavior with no prompt cause. "Avoid Inter" works because Opus genuinely defaults to Inter. "Be conservative" in a code review actively suppresses findings. Otherwise: trust the default.
Heuristic: before adding anything in response to bad output, find the line in the prompt that caused it. If you find one, delete it and stop. Telling the model to "write normally" or "respond naturally" is a token-burning no-op — it would have done that already if you hadn't told it otherwise.
Positive Framing
For instructions that survive the subtraction check — genuine deviations from default behavior — describe the target, not its inverse.
Weak: "Do not use ellipses." Strong: "Use periods or commas to end clauses."
The model produces what you describe. "Don't do X" requires the model to represent X in order to avoid it, which can prime the failure mode rather than suppress it. "Do Y" steers directly. This holds across domains:
- Output style: "don't be verbose" → "keep responses to 2–3 sentences for simple questions"
- Tool use: "don't call this tool unnecessarily" → "call this tool when you see X, Y, or Z"
- Behavioral guidance: "don't be sycophantic" → "open with the answer, not a compliment"
Concrete positive instructions outperform abstract negative ones, especially in long prompts where any single rule's attention weight is diluted by surrounding context. If you find yourself writing a "don't," ask what the corresponding "do" is — that's usually the better instruction.
Examples
Rules:
- Place examples adjacent to the rules they illustrate — never in a separate appendix
- Use good/bad pairs for nuanced behaviors
- Add rationale tags when the reasoning matters for generalization
<example>
<user>What should I eat for lunch?</user>
<good_response>A Mediterranean bowl with...</good_response>
</example>
<example>
<user>Search for a recent article about fisheries.</user>
<response>I found the article... [paraphrased summary]</response>
<rationale>Paraphrases entirely in own words, conveying key facts
while respecting copyright.</rationale>
</example>
The bad example is as important as the good one — it shows the specific failure mode you're preventing, which is often a plausible response that happens to be wrong.
Strategic Repetition
Repetition is a deliberate design tool, not sloppiness. In long prompts, a rule stated once may get diluted by surrounding context.
Repeat (2–3 locations, different framing):
- Safety boundaries
- The 2–3 most important behavioral rules
- Cross-cutting constraints that apply across multiple contexts
Don't repeat:
- Personality traits (once is enough)
- Formatting preferences
- General tone guidance
Heuristic: If violating the rule would cause real harm or a terrible experience, repeat it. If it's stylistic, state it once.
→ For how repetition interacts with token budgets, placement effects, and caching in long prompts, see context-management.
Decision Frameworks
When the agent needs judgment calls, provide a framework — not an exhaustive rule list.
Checklist pattern — sequential with stop conditions:
## Step 0 — Does this need a visual at all?
Most requests are conversational. Check first.
## Step 1 — Is a specialized tool available?
Scan connected tools. Category match → use it. Stop.
## Step 2 — Did the person ask for a file?
"file," "download," "save" → create file. Stop.
## Step 3 — Default path
No matches → inline rendering.
Trigger pattern — concrete matching criteria:
**Always use this tool when you see:**
- Explicit references: "continue our conversation about..."
- Temporal references: "what did we talk about yesterday"
- Implicit signals:
- Past tense verbs suggesting prior exchanges
- Definite articles assuming shared knowledge
Both patterns beat "use the tool when appropriate" because they give the model concrete decision criteria.
Procedures vs Constraints
Choose the right structure for the guidance you're giving. Neither is inherently better.
Use procedures when the order genuinely matters — steps that must happen in sequence, every time, or the outcome breaks. A deployment checklist, a multi-stage data pipeline, a protocol with dependencies between steps. If step 3 can't happen before step 2, that's a procedure.
1. Run the test suite
2. If tests pass, build the artifact
3. Deploy to staging
4. Run smoke tests against staging
5. If smoke tests pass, promote to production
This is correct as a procedure because each step depends on the previous one. Rewriting it as constraints ("ensure tests pass", "verify staging health") loses the ordering that makes it work.
Use constraints when the guidance is about what matters, not what order to do it in. Behavioral rules, quality standards, and goals where the agent should decide how to get there.
Plan before building. A missed dependency or wrong assumption
caught during planning costs nothing — caught during implementation
it means rework. Surface your plan to the user before writing code.
This is correct as a constraint because the agent should adapt how it plans to the specific task.
The failure modes are opposite:
- Procedures where constraints belong → brittle agents that follow the letter and miss the point
- Constraints where procedures belong → agents that skip critical steps or reorder dependencies
When you're unsure
If it's not clear from the prompt whether something should be procedural or flexible, ask. "Should these steps happen in this exact order every time, or is the order flexible?" is a better question than guessing wrong.
The middle ground: principles + examples
When the guidance is flexible but the application isn't obvious, state the reasoning and ground it with an example:
Over-constrained: "Always spawn exactly 3 explore agents before planning." Too vague: "Explore proportional to codebase complexity." Sweet spot: "Understand the scope of changes before committing to a plan — misunderstanding the codebase is the most expensive mistake. For example, if you're unsure how a subsystem works, spinning up 2–3 explore agents to map it out before writing a plan is usually worth the cost."
The sweet spot teaches the principle, explains why it matters, and shows what good judgment looks like — while leaving the agent free to adapt.
Signs of wrong structure
Procedure masquerading as constraints:
- Steps that keep getting reordered or skipped, causing failures
- You keep adding "make sure X happened before Y" patches
Constraints masquerading as procedure:
- Fixed counts where the right number depends on the situation
- The prompt keeps growing as you patch failure modes with more rules
- The model follows instructions faithfully but produces worse outcomes than a looser prompt would
Scoping & Progressive Disclosure
Include: Rules specific to your agent's domain, constraints on available tools, edge cases from testing, behavioral patterns users will encounter.
Exclude: Generic LLM knowledge, exhaustive third-party docs, frequently changing information (put in user context), rules for features the agent doesn't have.
Progressive disclosure: List resources with short descriptions upfront, load detail on demand:
<available_resources>
<resource>
<name>pdf-creation</name>
<description>Use when creating PDF files</description>
<location>/path/to/detailed-instructions</location>
</resource>
</available_resources>
This is the pattern behind Claude Code skills — descriptions loaded upfront, full instructions loaded on demand.
Common Mistakes
- Over-formatting — heavily formatted system prompts produce heavily formatted responses
- Instruction-stuffing — trying to anticipate every edge case with exact steps; teach principles and let the model decide how to apply them
- Adding when you should delete — bad output usually traces back to a prompt instruction pushing the model there. Find the cause and remove it before adding anything new. "Write normally" is not a fix; deleting the line that made it write abnormally is
- Flat structure — long prompts with no XML sections or headers; the model can't locate guidance
- XML-everything — wrapping every paragraph; structural noise dilutes the signal
- Missing examples — "be helpful but not too eager" means nothing without a demonstration
- Repeating everything — if 80% of rules are repeated, repetition stops being a signal
- Behavioral rules in documents — agent instructions in uploaded docs get treated as suggestions
- Static knowledge in system prompts — facts that change (pricing, team members) belong in retrievable documents
Applying to Claude Code Components
| Component | Zone | Role framing | Key consideration |
|---|---|---|---|
| Agents | Behavior | "You are X" — defines identity | 3rd person for traits, 2nd person for operations. |
| Modes | Behavior | "You are X" — defines identity | append keeps standard identity; replace for full custom persona. |
| Commands | Task | "Act as X" — temporary lens | Constraints or procedures depending on the task. Minimal tokens. |
| Skills | Knowledge | No role — reference material | Progressive disclosure. Overview in SKILL.md, depth in reference files. |
| CLAUDE.md | Knowledge | No role — project context | Guardrails and pointers. Constraints Claude would get wrong without. |
| Rules | Behavior | No role — constraints only | Declarative constraints scoped by file pattern. |
Deeper guides for each component:
- Agents → multi-agent-orchestration for designing systems of cooperating agents
- Commands → commands-authoring
- Skills → skills-authoring
- CLAUDE.md → claude-md-authoring
- Rules → rules-authoring
- Hooks (lifecycle automation around prompts) → hooks
Related Skills
Cross-cutting prompting topics with their own skills:
- system-vs-user-prompt — placement of instructions in
systemvsusermessages at the API level - tool-design — designing tool descriptions, parameter schemas, and error messages for function-calling agents
- structured-output — reliable typed JSON via constrained decoding, JSON Schema, Zod, Pydantic
- context-management — token budgets, RAG, prompt caching, compression, multi-turn strategy
- output-variety — defeating sameyness in repeated LLM calls (personality lines, names, commentary)
- eval-and-quality-gates — measuring whether prompt changes actually help; LLM-as-judge, regression tests, guardrails
- sounding-human — removing AI-tells from generated text