prompt-craft - SKILL.md Agent Skill

name: prompt-craft description: Prompt engineering techniques for dual-model systems — template structure, few-shot design, structured output prompting, model-adaptive strategies for both small (4B-8B) and large (GPT-5, Claude) models

Prompt Scaffold

[ROLE]        Who the model is / expertise
[CONTEXT]     Background, domain constraints
[TASK]        One clear instruction
[FORMAT]      Output structure with example
[CONSTRAINTS] Boundaries, edge cases, what to avoid
[EXAMPLES]    Few-shot demonstrations (if needed)

Claude: use XML tags. GPT: markdown headers. Small models: minimal structure, maximum explicitness.

Dual-Model Adaptation

For Small Models (4B-8B Ollama)

Max 1000 tokens prompt length. Cut ruthlessly.
One task per prompt. No compound instructions.
Complete worked example is mandatory — the model mimics format.
Every enum value listed explicitly: quality: one of "excellent", "good", "fair", "poor"
No chain-of-thought unless tested (often hurts structured output).
Positive instructions: "Write X" not "Don't do Y."
Sandwich: format spec at start AND end.

For Large Models (GPT-5, Claude)

Multi-section prompts with nuanced instructions.
<thinking> tags for chain-of-thought (Claude).
Few-shot optional — large models generalize from descriptions.
Can handle "Don't X unless Y" conditional constraints.
System prompt for persona, user prompt for task.

Few-Shot Design

3-5 examples. More than 7 has diminishing returns.
Include at least one edge case example.
Show reasoning, not just input→output.
Order: easy → medium → edge case.
For small models: examples are the prompt. They define the contract.

Structured Output

Provide the exact schema with field descriptions.
Include a complete worked example (JSON, ready to copy).
Specify handling for missing/ambiguous fields.
For enums, list ALL valid values in the prompt.
Show good AND bad examples for critical fields:

## Dilemma ID Naming (CRITICAL)
GOOD: `host_benevolent_or_self_serving`
BAD: `host_motivation`

Defensive Patterns

Sandwich: Repeat critical instructions at start and end.
Validate → Feedback → Repair: Validate output, format structured errors, ask model to fix.
Discuss → Freeze → Serialize: Separate exploration from structured output generation.
Anti-pleasantry: "Do NOT end with 'Good luck!' or similar."

Systematic Testing

Define success criteria before iterating.
Test set of 10-20 diverse inputs including edge cases.
Change one thing at a time.
Track: prompt version, model, temperature, pass rate, failure modes.
Failure taxonomy: wrong format, hallucination, refusal, partial output, off-topic.