200k-prompt-engineering - SKILL.md Agent Skill

name: 200k-prompt-engineering
description: |
  The engineering skill behind every 200k-class output. Three layers: prompt engineering (writing instructions), context engineering (designing what agents know), and agentic workflow engineering (designing how agents work autonomously).
  Use when writing system prompts, SKILL.md files, CLAUDE.md files, agent instructions, MCP configurations, Archon workflows, or any artifact that controls agent behavior.
  Trigger on: "write a prompt", "design the context", "create a workflow", "improve this skill", "optimize the instructions", "agent architecture", "context window design", "prompt engineering", "how should the agent think about this".
author: Robin Westerlund
version: 1.0.0

200k Prompt Engineering

The skill that makes other skills work. Write prompts that produce consistent, measurable, 200k-class output — every time, any agent.

Purpose

After blueprint (what to build) and pipeline (how to create skills), this is the third pillar: how to engineer the instructions that control agents.

Three layers, one decision tree:

What are you engineering?
│
├── The WORDS an agent reads?
│   └── Layer 1: PROMPT ENGINEERING
│       System prompts, SKILL.md, CLAUDE.md, user instructions
│
├── The KNOWLEDGE an agent has access to?
│   └── Layer 2: CONTEXT ENGINEERING
│       Memory tiers, boot sequences, tool pools, knowledge bases
│
└── The STEPS an agent executes autonomously?
    └── Layer 3: AGENTIC WORKFLOW ENGINEERING
        DAG design, node types, approval gates, error recovery

Most tasks touch all three layers. The decision tree determines where to START.

Layer 1: PROMPT ENGINEERING

What you're controlling: The literal text an agent reads before acting.

The 7 Rules

#	Rule	In practice	Anti-pattern
1	Imperative form	"Read the file"	"You should read the file" (extra words, no extra clarity)
2	Tables over prose	3+ parallel items with shared attributes → table	Paragraphs listing options, which are slower to scan
3	Front-loaded value	Most important content first in every section	Burying the key insight in paragraph 3
4	Anti-patterns stated	Every major section: what NOT to do	Describing only the happy path
5	Decision tables	If/Then/Because for every choice point	"Use your best judgment"
6	Testable rules	"Produce a 3-step outline before code"	"Think carefully about structure"
7	Progressive disclosure	Metadata ~100 words, body < 500 lines, refs unlimited	Dumping everything in one file

Trigger Optimization

The description field in SKILL.md frontmatter is the PRIMARY triggering mechanism.

Pattern	Effect	Example
Include action verbs	Agent recognizes intent faster	"Use when reviewing PRs, auditing code, checking quality"
Be "pushy"	Agent triggers when it SHOULD, not just when explicitly asked	"Also use when the user mentions code quality, even without saying 'review'"
List trigger phrases	Direct keyword matching	"Trigger on: 'review', 'audit', 'check', 'PR'"
State negative triggers	Prevents false activation	"NOT for: general questions, architecture design"

Anti-pattern: A description that only says what the skill IS without saying WHEN to use it.
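
The four patterns above can be checked mechanically. A minimal sketch, assuming the description uses the same marker phrases as the table's examples ("Use when", "Also use when", "Trigger on:", "NOT for:"); the function name and the keys of the returned dict are hypothetical:

```python
def trigger_signals(description: str) -> dict[str, bool]:
    """Report which trigger patterns a SKILL.md description appears to contain.

    Heuristic only: it looks for the marker phrases used in the examples,
    so a description worded differently may be reported as missing a pattern.
    """
    text = description.lower()
    return {
        "says_when": "use when" in text,
        "pushy": "also use when" in text,
        "trigger_phrases": "trigger on:" in text,
        "negative_triggers": "not for:" in text,
    }


description = (
    "Use when reviewing PRs, auditing code, checking quality. "
    "Trigger on: 'review', 'audit'. NOT for: general questions."
)
missing = [name for name, found in trigger_signals(description).items() if not found]
print(missing)
```

A description that fails `says_when` is the anti-pattern in its purest form: it says what the skill is, never when to use it.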

Fixed/Variable Zone Pattern

Apply this pattern to every template, every SKILL.md section, and every output format:

<!-- [FIXED] This structure never changes -->
# [VARIABLE: Title]

> [VARIABLE: One-liner]

<!-- [VARIABLE] Content sections below -->

<!-- [FIXED] Verification section always last -->
## Verification

FIXED zones = consistency between uses. VARIABLE zones = customization per use. An agent reading the template knows EXACTLY what to keep and what to replace.
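
A minimal sketch of how a script could fill such a template, assuming variable zones are written as `[VARIABLE: Name]`; the function name and the sample values are hypothetical. Everything outside a placeholder is treated as FIXED and passes through untouched:

```python
import re

# Matches "[VARIABLE: Name]" and captures the name. Bare "[VARIABLE]" and
# "[FIXED]" markers inside comments do not match and are left alone.
PLACEHOLDER = re.compile(r"\[VARIABLE: ([^\]]+)\]")


def fill_template(template: str, values: dict[str, str]) -> str:
    """Replace every named VARIABLE zone; fail loudly if a value is missing."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for variable zone {name!r}")
        return values[name]

    return PLACEHOLDER.sub(substitute, template)


template = "# [VARIABLE: Title]\n\n> [VARIABLE: One-liner]\n\n## Verification\n"
print(fill_template(template, {"Title": "Code Review", "One-liner": "Review PRs."}))
```

Raising on a missing value keeps a half-filled template from shipping with a literal placeholder in it.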

Prompt Quality Gate (from skill-forge)

Before shipping any prompt/skill/instruction:

Description >= 50 chars with trigger phrases
Imperative form throughout (0 instances of "you should")
Tables used where 3+ parallel items exist
At least 1 anti-pattern per major section
First section delivers standalone value
Every decision point has a decision table
Body < 500 lines (move depth into reference files)
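
Some of these checks are mechanical and can run before a human review. A minimal sketch covering only the countable items (description length, trigger phrases, "you should", line count); the function name is hypothetical, and the remaining items still need a reader's judgment:

```python
def quality_gate(description: str, body: str) -> list[str]:
    """Return the failed mechanical checks; an empty list means they all passed."""
    failures = []
    if len(description) < 50:
        failures.append("description shorter than 50 chars")
    # Assumption: trigger phrases are introduced with "Trigger on", as in this skill.
    if "trigger on" not in description.lower():
        failures.append("description lists no trigger phrases")
    if "you should" in body.lower():
        failures.append('body contains "you should"')
    if len(body.splitlines()) >= 500:
        failures.append("body is 500 lines or longer")
    return failures


print(quality_gate("short", "You should read it."))
```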

Layer 2: CONTEXT ENGINEERING

What you're controlling: What the agent knows and when it knows it.

The Memory Tier Model

Tier	What	Size	When loaded	Source
L0	Identity	~50 tokens	Always	`identity.txt`, CLAUDE.md first line
L1	Critical facts	~120 tokens	Always	Auto-generated from palace/project
L2	Topic recall	On demand	When topic surfaces	MemPalace search, file reads
L3	Deep search	Unlimited	Explicit query	Semantic search, web search

Rule: Load L0+L1 at boot (~170 tokens). Search L2-L3 only when needed. A context window stuffed with L3 data from boot is SLOWER than one that searches on demand.
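
The rule can be sketched as a small loader that holds L0 and L1 from boot and pulls L2 notes in only when a topic comes up. This is an illustration of the tiering idea, not MemPalace's API; the class, field, and method names are hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class TieredContext:
    identity: str        # L0: always loaded
    critical_facts: str  # L1: always loaded
    topic_notes: dict[str, str] = field(default_factory=dict)  # L2: keyed by topic
    loaded: list[str] = field(default_factory=list)            # what the agent currently sees

    def boot(self) -> None:
        """Load only L0 and L1; L2 stays on disk until asked for."""
        self.loaded = [self.identity, self.critical_facts]

    def recall(self, topic: str) -> bool:
        """Load one L2 note on demand. Returns False if no note exists for the topic."""
        note = self.topic_notes.get(topic)
        if note is None:
            return False
        if note not in self.loaded:
            self.loaded.append(note)
        return True


ctx = TieredContext(
    identity="Repo agent for the billing service.",
    critical_facts="Stack: Python. Tests: pytest.",
    topic_notes={"deploy": "Deploy with the release script."},
)
ctx.boot()
ctx.recall("deploy")
print(len(ctx.loaded))
```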

Staged Boot Sequence

Session start
│
├── 1. Read CLAUDE.md (L0+L1 — who am I, what project is this)
├── 2. Read AGENTS.md (verify commands, stack, rules)
├── 3. Call mempalace_status (L1 — palace overview + protocol)
├── 4. Scan INDEX.md (L2 — what knowledge exists)
├── 5. Check .skills/ (L2 — what project-local skills are available)
├── 6. Check active tasks (L2 — is there ongoing work?)
│
└── READY — agent has enough context to route any request
    L3 is searched only when a specific question requires it

Tool Pool Assembly

Principle	Rule
Minimum viable tools	Load only tools needed for the current task
MCP server selection	Connect servers relevant to the project, not all available
Progressive tool loading	Start with read-only tools, escalate to write when needed
Tool descriptions matter	Agent uses tool descriptions to decide when to call them

Anti-pattern: Loading 50 MCP tools at session start "just in case." Each tool description consumes context tokens and adds decision overhead.
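
A minimal sketch of minimum-viable, progressive tool loading, assuming each tool carries a set of task tags and a flag saying whether it writes. The `Tool` type, the tags, and the tool names are hypothetical and not tied to any MCP server:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    tags: frozenset[str]  # kinds of task the tool is relevant to
    writes: bool          # True if the tool can modify state


def assemble_pool(catalog: list[Tool], task_tags: set[str], allow_write: bool = False) -> list[str]:
    """Select only tools relevant to the task; hold back write tools until escalated."""
    return [
        tool.name
        for tool in catalog
        if tool.tags & task_tags and (allow_write or not tool.writes)
    ]


catalog = [
    Tool("read_file", frozenset({"code"}), writes=False),
    Tool("write_file", frozenset({"code"}), writes=True),
    Tool("send_email", frozenset({"comms"}), writes=True),
]
print(assemble_pool(catalog, {"code"}))
print(assemble_pool(catalog, {"code"}, allow_write=True))
```

The first call yields the read-only pool for a code task; the second shows the escalation step once a write is actually needed.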

Knowledge Base Design

If the knowledge is...	Store it as...	Load it when...
Always needed (identity, rules)	CLAUDE.md / L0-L1	Every session
Sometimes needed (domain facts)	knowledge/ markdown files	Topic surfaces
Rarely needed (deep reference)	MemPalace drawers / web	Explicit query
Generated per session	.tmp/ or docs/	Never pre-load

Layer 3: AGENTIC WORKFLOW ENGINEERING

What you're controlling: How an agent executes multi-step work autonomously.

DAG Design (Archon Format)

name: my-workflow
description: What this workflow does
provider: claude
model: sonnet

nodes:
  - id: first-node
    prompt: "Inline AI prompt"

  - id: parallel-a
    depends_on: [first-node]
    prompt: "Runs alongside parallel-b"

  - id: parallel-b
    depends_on: [first-node]
    bash: "echo 'deterministic step'"

  - id: join
    depends_on: [parallel-a, parallel-b]
    loop:
      prompt: "Iterate until done: <promise>COMPLETE</promise>"
      until: COMPLETE
      max_iterations: 5
      fresh_context: true
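
To see which nodes in the workflow above can run concurrently, group them into waves by their `depends_on` lists. A minimal sketch of that grouping; it illustrates the DAG idea and is not Archon's scheduler, and the function name is hypothetical:

```python
def execution_waves(nodes: list[dict]) -> list[list[str]]:
    """Group node ids into waves; every node in a wave has all its dependencies done."""
    remaining = {node["id"]: set(node.get("depends_on", [])) for node in nodes}
    done: set[str] = set()
    waves = []
    while remaining:
        ready = sorted(nid for nid, deps in remaining.items() if deps <= done)
        if not ready:
            # Nothing can start: a cycle, or a dependency on a node that does not exist.
            raise ValueError("cycle or unknown dependency among: " + ", ".join(sorted(remaining)))
        waves.append(ready)
        done.update(ready)
        for nid in ready:
            del remaining[nid]
    return waves


nodes = [
    {"id": "first-node"},
    {"id": "parallel-a", "depends_on": ["first-node"]},
    {"id": "parallel-b", "depends_on": ["first-node"]},
    {"id": "join", "depends_on": ["parallel-a", "parallel-b"]},
]
print(execution_waves(nodes))
# [['first-node'], ['parallel-a', 'parallel-b'], ['join']]
```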

Node Type Decision Table

If the step...	Use node type...	Because...
Requires AI reasoning	`prompt:`	AI fills in intelligence
Is purely mechanical (git, build, test)	`bash:`	Deterministic, no AI cost
Requires iteration until success	`loop:`	Agent retries with context
Loads a pre-written instruction file	`command:`	Separates prompt from workflow
Requires human approval	`loop: { interactive: true }`	Pauses for input

Key Design Decisions

Decision	Options	Recommendation
fresh_context per iteration	`true` (clean slate) vs `false` (carry forward)	`true` for long loops — prevents context pollution
Parallel vs sequential	Nodes without deps run concurrently	Parallelize independent work (research + scaffold)
Human gates	`interactive: true` on loop nodes	Add before irreversible actions (push, deploy, publish)
Error recovery	`trigger_rule: all_done` on join nodes	Continue even if one branch fails
Model selection	`model: sonnet` vs `opus` vs `haiku`	Sonnet for most; Haiku for classification; Opus for complex reasoning
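
The interaction between `until`, `max_iterations`, and `fresh_context` can be sketched as a plain loop. This is an assumed reading of the semantics for illustration, not Archon's implementation; `run_loop` and the `step` callable are hypothetical:

```python
from typing import Callable


def run_loop(
    step: Callable[[list[str]], str],
    until: str,
    max_iterations: int,
    fresh_context: bool,
) -> tuple[bool, int]:
    """Call step until its output contains the until marker or the cap is reached.

    Returns (completed, iterations_used).
    """
    history: list[str] = []
    for iteration in range(1, max_iterations + 1):
        # fresh_context=True hands each iteration a clean slate;
        # False carries all prior outputs forward.
        context = [] if fresh_context else list(history)
        output = step(context)
        history.append(output)
        if until in output:
            return True, iteration
    return False, max_iterations


outputs = iter(["working", "still working", "<promise>COMPLETE</promise>"])
print(run_loop(lambda context: next(outputs), "COMPLETE", 5, fresh_context=True))
# (True, 3)
```

The cap is what turns a stuck task into a reported failure instead of an endless run.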

Workflow Quality Checklist

Every node has a clear single responsibility
No node exceeds 3 paragraphs of prompt (use command: files for long prompts)
Independent nodes are parallel (no false dependencies)
At least one verification node after any build/write step
max_iterations set on all loop nodes (prevent infinite loops)
bash: nodes have timeout: (prevent hung processes)
Human approval gate before pushing to remote or deploying
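
Two of these items (loop caps and bash timeouts) are easy to lint once the workflow is parsed into a list of node dicts. A minimal sketch; it assumes `timeout` sits next to `bash` on the node, which the checklist implies but does not spell out, and the function name is hypothetical:

```python
def lint_workflow(nodes: list[dict]) -> list[str]:
    """Flag loop nodes without max_iterations and bash nodes without timeout."""
    problems = []
    for node in nodes:
        node_id = node.get("id", "<missing id>")
        loop = node.get("loop")
        if loop is not None and "max_iterations" not in loop:
            problems.append(f"{node_id}: loop has no max_iterations")
        # Assumption: timeout is a sibling key of bash on the node.
        if "bash" in node and "timeout" not in node:
            problems.append(f"{node_id}: bash node has no timeout")
    return problems


nodes = [
    {"id": "build", "bash": "npm run build"},
    {"id": "fix", "loop": {"prompt": "Fix the build", "until": "COMPLETE"}},
    {"id": "test", "bash": "npm test", "timeout": 60},
]
print(lint_workflow(nodes))
```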

Anti-patterns

Do NOT	What happens	Instead
Put all logic in one giant prompt node	Agent loses focus after 3 paragraphs	Split into focused nodes with `depends_on`
Use `prompt:` for deterministic steps	Wastes tokens, introduces variance	Use `bash:` for build/test/git operations
Skip `fresh_context: true` on long loops	Context accumulates garbage from prior iterations	Fresh context = clean reasoning each time
Omit `max_iterations`	Loop runs forever on stuck tasks	Always cap: 5 for build, 10 for implement, 3 for review
Parse free-form output from AI nodes	AI output varies between runs	Use `output_format:` (JSON schema) for structured data
Skip verification after build steps	Broken code propagates to downstream nodes	Add `bash: "npm run build"` or equivalent between steps

The Decision Tree — What Kind of Engineering?

You need to improve agent output quality.
│
├── The output FORMAT is wrong (inconsistent, unstructured)?
│   └── Layer 1: Fix the PROMPT
│       Add Fixed/Variable zones. Add output templates. Add decision tables.
│
├── The agent DOESN'T KNOW enough (missing facts, wrong assumptions)?
│   └── Layer 2: Fix the CONTEXT
│       Add to L1 boot. Create knowledge files. Configure MemPalace search.
│
├── The agent STOPS or FAILS mid-task (loses focus, forgets steps)?
│   └── Layer 3: Fix the WORKFLOW
│       Split into DAG nodes. Add fresh_context. Add verification steps.
│
├── The output is MEDIOCRE (correct but not impressive)?
│   └── All three layers:
│       L1: Add anti-patterns + quality markers to prompt
│       L2: Add domain expertise to context (examples, reference articles)
│       L3: Add review/polish node to workflow
│
└── Not sure what's wrong?
    └── Read the output. Find the FIRST point where quality drops.
        That point tells you which layer to fix.
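
The tree above reduces to a lookup from symptom to layer. A minimal sketch; the symptom labels are hypothetical shorthand for the four branches:

```python
# Symptom label -> layers to fix, in the order the tree lists them.
LAYERS_FOR_SYMPTOM = {
    "wrong_format": ["prompt"],
    "missing_knowledge": ["context"],
    "stops_or_fails": ["workflow"],
    "mediocre": ["prompt", "context", "workflow"],
}


def layers_to_fix(symptom: str) -> list[str]:
    """Map a diagnosed symptom to the layer(s) to work on."""
    try:
        return LAYERS_FOR_SYMPTOM[symptom]
    except KeyError:
        # The "not sure" branch: there is no shortcut, so go back to the output.
        raise ValueError(
            f"unknown symptom {symptom!r}: read the output and find where quality first drops"
        ) from None


print(layers_to_fix("mediocre"))
# ['prompt', 'context', 'workflow']
```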

Integration with 200k System

Component	How this skill connects
skill-forge	This skill's Layer 1 rules are WHAT skill-forge enforces during AUTHOR
200k-blueprint	Blueprint output feeds into Layer 3 (workflow design for the product)
quality-gate	The 66-item checklist IS a Layer 1 artifact — produced by prompt engineering
MemPalace	Layer 2's memory tier model IS MemPalace's architecture (L0-L3)
Archon	Layer 3's DAG format IS Archon's workflow format
knowledge/gamechangers/	Layer 2 domain knowledge — 15 architecture insights
skill-engine	Layer 1 trigger optimization feeds the resolver's ranking model

Quick Reference

LAYER 1 (Prompt):    Imperative form · Tables > prose · Front-loaded value
                     Anti-patterns stated · Decision tables · Testable rules
                     Progressive disclosure · Fixed/Variable zones

LAYER 2 (Context):   L0-L3 memory tiers · Staged boot · Minimum tools
                     Knowledge base design · Search > preload

LAYER 3 (Workflow):  DAG nodes · prompt/bash/loop/command types
                     fresh_context on loops · Parallel independents
                     Human gates · Verification after build · max_iterations

DECISION:            Wrong format? → Fix prompt
                     Missing knowledge? → Fix context
                     Stops/fails? → Fix workflow
                     Mediocre? → Fix all three

Notes

This skill is the ENGINEERING discipline behind the 200k system
It does not replace skill-forge (which handles packaging) or 200k-blueprint (which handles product design)
It is the quality layer that makes both of those produce better output
Apply Layer 1 rules to EVERY prompt you write, even quick ones
Apply Layer 2 design to EVERY project CLAUDE.md
Apply Layer 3 patterns to any task with more than 3 steps