name: presentation-generator
description: Generate superb 16:9 widescreen presentations (PDF and PPTX) where every slide is a custom AI-rendered image. Each slide is free to take any visual form — full-bleed photograph, structured infographic, architecture diagram, big-number callout, comparison cards, UI mockup with annotations, timeline, quote card, do/don't table — whatever the slide actually needs. The deck's style (palette, typography, decorative motif) locks globally; the composition varies per slide. The skill researches the topic (local files or web), designs a narrative arc (SCQA / Duarte oscillation / Kawasaki 10/20/30), locks a global visual style via a reference image, then generates all slides in parallel (concurrency 4) by calling the image-generation skill. Use whenever the user asks to create, build, design, generate, or draft a presentation, slideshow, slide deck, pitch deck, keynote, briefing deck, masterplan, or any visual document where slides should be visually distinctive rather than templated. SKIP when the user wants a text document, a Google Doc, a Notion page, or a natively-editable PowerPoint with shapes-and-text (use the official anthropics/skills pptx skill for that).
allowed-tools: Read, Write, Bash, Edit, WebSearch, WebFetch
Presentation Generator
Image-as-slide decks. Every slide is a single 16:9 PNG generated by gpt-image-2. The slide is free to be anything visual: a cinematic photograph, a structured infographic with stacked principle cards, an architecture flowchart, a big-number stat, a comparison split-screen, a UI mockup with side annotations, a quote card, a do/don't table, an icon grid, a timeline, a hand-drawn whiteboard sketch.
The skill's job is to make Claude think like a creative director — narrative arc first, visual style locked, composition chosen per slide, prompts engineered with intention — before spending API tokens on imagery.
The two axes of a slide
- Style (locks at the deck level): aesthetic, palette, typography vibe, decorative motif. Constant across all slides — this is what makes the deck feel like one artifact.
- Composition (varies per slide): is this slide a hero photo, a flowchart, an infographic, a big number, a comparison? Pick what serves the slide's job.
Both NotebookLM's Cinematic Video Overviews and well-designed brand decks do exactly this. Lock style; vary composition.
The six phases
Research → Narrative Plan → Style Lock → Parallel Generation (×4) → QA → Assemble
Runs end-to-end autonomously. No approval gate between phases.
Phase 1 — Research
Build a dossier on the topic before touching the deck.
- Local files referenced ("the project plan", a path, a slug) → read with the Read tool.
- Online topic → WebSearch + WebFetch.
- Mixed → both. Local files are ground truth; online is supporting context.
- Brand book exists → if there's a BRAND.md from the brand-system skill in the project, read it. Use its palette, typography, and motif language verbatim in the deck plan's style_brief. Set brand_source in the plan to point at it. This gives pixel-tight brand fidelity.
Keep the dossier in working memory: 5-15 key facts, quotes worth surfacing, numbers worth visualizing, audience signals.
Phase 2 — Narrative Plan
This is the most important step in the entire workflow. It is what separates a memorable deck from a generic one. Read reference/narrative-frameworks.md first.
Before drafting, answer in your head (or out loud):
- Audience — who is this for, and what's their starting state of mind?
- One takeaway — the one thing they must walk away believing.
- SCQA — Situation → Complication → Question → Answer.
- Slide count — default 8-10 (Kawasaki). 6 for a quick brief, 12 for a deep explainer. Hard cap at 15 unless the user explicitly asks for more.
- Tone — formal / conversational / inspiring / pragmatic. Pick one.
- Visual aesthetic — modern editorial photography / dark UI infographic / hand-drawn whiteboard / cinematic moody / minimalist tech / corporate isometric. Pick one. Lock it.
- Palette — 3-5 hex colors. Primary, accent, neutrals.
- Recurring motif — a visual element that appears (in some form) on most slides to tie them together. See reference/visual-style-brief.md.
Then write deck-plan.json to the cwd. Use the schema in templates/deck-plan.schema.json and the worked example in templates/deck-plan.example.json — the example is an 8-slide infographic deck with eight different compositions.
Every slide entry has:
- id: 01, 02, ..., zero-padded.
- role (free-form): this slide's narrative function ("cover", "establish problem with visceral contrast", "land the headline metric", "walk through the architecture", "closing CTA"). Descriptive, not from an enum.
- composition (free-form): this slide's visual format. Examples: "title cover with central decorative motif", "comparison split-screen (left = current, right = proposed)", "stacked principle cards (4 cards with icon + heading + body)", "big-number callout with supporting metric cards", "architecture flowchart with router and three branches", "horizontal timeline with three milestone cards", "two-column do/don't comparison table", "three-column checklist". Pick the composition that fits the slide's job. Read reference/slide-compositions.md for the full vocabulary: 15+ formats with prompt skeletons.
- idea: one sentence stating the single thing this slide must communicate.
- text_in_image: exact text to render IN the image. Headlines, labels, the quote, the big number, axis labels, callout text. Use exact wording. For Hebrew / Arabic / CJK, write in the target script. Empty string for image-only slides.
- image_prompt: the engineered prompt sent to openai-image.sh. Describe the composition explicitly (where elements sit on the canvas), the subject of each element, and any text-in-image with exact wording in quotes. Don't repeat the deck's style_brief; it's appended automatically. See reference/image-prompting.md.
- speaker_notes: 40-60 words of narration the presenter says while this slide is up.
- model: usually openai. Use gemini only if the slide needs >2 reference images merged or has been failing on openai.
The deck plan also has top-level fields: title, subtitle, audience, takeaway, aesthetic, palette (array of hex), motif, style_brief (a paragraph that gets injected into every image prompt), and optional brand_source and language.
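The field lists above can be checked mechanically before any image is generated. The following is a minimal sketch, not the skill's own validator: the authoritative schema is templates/deck-plan.schema.json, and the top-level key name `slides`, the treatment of `model` as optional, and all values in `example_plan` are assumptions made for illustration.

```python
import json

# Field names come from the prose above. The real schema lives in
# templates/deck-plan.schema.json and is not reproduced here.
REQUIRED_TOP_LEVEL = ("title", "subtitle", "audience", "takeaway",
                      "aesthetic", "palette", "motif", "style_brief")
# "model" is left out on the assumption that it defaults to openai.
REQUIRED_SLIDE = ("id", "role", "composition", "idea",
                  "text_in_image", "image_prompt", "speaker_notes")


def find_plan_problems(plan: dict) -> list[str]:
    """Return human-readable problems; an empty list means nothing is missing."""
    problems = [f"missing top-level field: {name}"
                for name in REQUIRED_TOP_LEVEL if name not in plan]
    # Assumption: slide entries live under a top-level "slides" array.
    for index, slide in enumerate(plan.get("slides", []), start=1):
        label = slide.get("id", f"#{index}")
        for name in REQUIRED_SLIDE:
            if name not in slide:
                problems.append(f"slide {label}: missing field {name}")
        # Ids are zero-padded and assumed sequential: 01, 02, ...
        if slide.get("id") != f"{index:02d}":
            problems.append(f"slide {label}: expected id {index:02d}")
    return problems


# Hypothetical plan with made-up values, for illustration only.
example_plan = {
    "title": "Example deck", "subtitle": "Illustrative only",
    "audience": "engineering leads", "takeaway": "one sentence",
    "aesthetic": "dark UI infographic",
    "palette": ["#0B1020", "#4F8CFF", "#F5F7FA"],
    "motif": "thin concentric rings",
    "style_brief": "Dark UI infographic, bold modern sans-serif.",
    "slides": [{
        "id": "01", "role": "cover",
        "composition": "title cover with central decorative motif",
        "idea": "Introduce the deck.",
        "text_in_image": "Example deck",
        "image_prompt": "Centered title 'Example deck' above the motif.",
        "speaker_notes": "Welcome, and here is what we will cover.",
        "model": "openai",
    }],
}

print(json.dumps(find_plan_problems(example_plan)))  # prints []
```

A check like this only catches missing fields and misnumbered ids; whether the role, composition, and prompt are any good still needs a human or model read.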
Phase 3 — Style Lock
~/.claude/skills/presentation-generator/scripts/lock-style.py \
--plan deck-plan.json \
--output-dir ./refs/
Generates 1-2 abstract reference frames from the style_brief alone — no slide content yet, just the aesthetic, palette, and motif.
Read the resulting PNGs with the Read tool. Claude is multimodal — actually look at the pixels. Score them:
- Is the palette right (compare hex codes to the planned palette)?
- Does the motif read as intended?
- Does the aesthetic match what you wrote?
If one or both refs land, pick the stronger one and set its path as style_ref in deck-plan.json. If neither lands, edit the style_brief and re-run. If the refs are mediocre but acceptable, ship them; Phase 5 QA will catch drift.
The chosen ref gets passed to openai-image.sh via --ref for every subsequent slide. This is what makes 8-10 independently-generated slides — even with totally different compositions — feel like one coherent deck.
Sibling decks — reuse the style ref. When the user is building a related deck for the same engagement (e.g., a product plan + a quote, or a pitch deck + an internal explainer), set both deck-plans' style_ref field to the same ref PNG path — copy or symlink it into the second deck's refs/ folder, or point both at an absolute path. Skip Phase 3 entirely for the second deck. This guarantees the two decks read as one engagement (same palette tone, same accent treatment, same motif), which is exactly what brand consistency demands.
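Setting style_ref, for one deck or for several sibling decks, is a small JSON edit. The sketch below assumes each deck-plan.json is a single JSON object with a top-level `style_ref` field, as described above; the helper names `set_style_ref` and `share_style_ref` are hypothetical and not part of the skill's scripts.

```python
import json
from pathlib import Path


def set_style_ref(plan_path: Path, ref_png: Path) -> None:
    """Write the chosen reference image into the plan's style_ref field."""
    plan = json.loads(plan_path.read_text(encoding="utf-8"))
    # An absolute path lets sibling decks in other folders share one ref.
    plan["style_ref"] = str(ref_png.resolve())
    plan_path.write_text(
        json.dumps(plan, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )


def share_style_ref(plan_paths: list[Path], ref_png: Path) -> None:
    """Point every sibling deck plan at the same reference PNG."""
    for plan_path in plan_paths:
        set_style_ref(plan_path, ref_png)
```

For example, `share_style_ref([Path("plan/deck-plan.json"), Path("quote/deck-plan.json")], Path("plan/refs/style-ref-1.png"))` would leave both plans pointing at the same absolute path, so Phase 3 can be skipped for the second deck.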
Phase 4 — Parallel Generation (concurrency 4)
~/.claude/skills/presentation-generator/scripts/generate-deck.py \
--plan deck-plan.json \
--output-dir ./slides/ \
--concurrency 4
Reads the deck plan, builds one shell command per slide (calling openai-image.sh by default), and runs them through a Python ThreadPoolExecutor(max_workers=4). The deck's style_brief is appended to every per-slide prompt. The style_ref is passed via --ref for style coherence.
Outputs land in ./slides/slide-NN-<slug>.png. Default size: 2560×1440 (16:9, both edges multiples of 16 — required by gpt-image-2; 1920×1080 is invalid because 1080 isn't a multiple of 16). Quality: high. Background: opaque.
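The size rule is easy to verify with integer arithmetic. This sketch checks only the two constraints named above (exact 16:9 and both edges divisible by 16); the model may impose other limits that it does not cover.

```python
def is_valid_slide_size(width: int, height: int) -> bool:
    """True when the size is exactly 16:9 and both edges are multiples of 16."""
    is_16_by_9 = width * 9 == height * 16
    edges_align = width % 16 == 0 and height % 16 == 0
    return is_16_by_9 and edges_align


print(is_valid_slide_size(2560, 1440))  # True: 2560 = 160 * 16, 1440 = 90 * 16
print(is_valid_slide_size(1920, 1080))  # False: 1080 = 67 * 16 + 8
```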
Progress streams to stderr. On failure, the slide is logged but does not kill siblings.
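The generation loop described above can be sketched as follows. This is an illustration of the pattern, not the contents of generate-deck.py: only the `--ref` flag is named in this document, so `--prompt` and `--output` are placeholder flag names, and the helper names are hypothetical.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable


def build_command(slide: dict, style_brief: str, style_ref: str,
                  out_path: str) -> list[str]:
    """Assemble one image call. --prompt and --output are assumed flag names."""
    # The deck-level style_brief is appended to the per-slide prompt.
    prompt = f"{slide['image_prompt']}\n\n{style_brief}"
    return ["openai-image.sh", "--prompt", prompt,
            "--ref", style_ref, "--output", out_path]


def run_command(command: list[str]) -> None:
    """Run one command; a non-zero exit raises CalledProcessError."""
    subprocess.run(command, check=True)


def generate_all(commands: dict[str, list[str]],
                 runner: Callable[[list[str]], None] = run_command,
                 concurrency: int = 4) -> dict[str, str]:
    """Run all commands with bounded concurrency; map slide id to outcome."""
    results: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = {pool.submit(runner, command): slide_id
                   for slide_id, command in commands.items()}
        for future in as_completed(futures):
            slide_id = futures[future]
            try:
                future.result()
                results[slide_id] = "ok"
            except Exception as error:
                # A failed slide is recorded; the other slides keep running.
                results[slide_id] = f"failed: {error}"
            print(f"slide {slide_id}: {results[slide_id]}", file=sys.stderr)
    return results
```

Catching the exception per future, rather than letting it propagate out of the pool, is what keeps one failed slide from killing its siblings. Passing `runner` as a parameter also makes the loop testable without spending API calls.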
Phase 5 — QA
~/.claude/skills/presentation-generator/scripts/qa-slides.py \
--plan deck-plan.json \
--slides-dir ./slides/
Mechanical checks (file exists, dimensions correct, file size sane, not all-black/white). Then visual QA: Claude reads every slide PNG with the Read tool and scores it against the slide's idea, composition, and text_in_image:
- Did the composition land (right format, right layout)?
- Is text legible and spelled correctly? (Pay close attention to in-image text — the model's biggest failure mode is mangled letterforms or wrong wording.)
- Does the palette adhere to the plan?
- Is the motif present?
- Any defects (broken geometry, drifted style, wrong icons)?
Slides that fail visual QA get regenerated with a refined prompt. Cap at 2 retries per slide. If a slide still fails after 2 retries, surface it to the user with a specific description of what's wrong.
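A rough sketch of the mechanical checks and the retry cap follows, using only the standard library. It is not the contents of qa-slides.py. The `min_bytes` threshold is an assumed stand-in for "file size sane", the all-black/white check is omitted because it needs pixel decoding (for example with Pillow), and the `check` callable stands in for the visual QA that Claude performs by reading the PNG.

```python
import struct
from pathlib import Path
from typing import Callable

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path: Path) -> tuple[int, int]:
    """Read width and height from the IHDR chunk at the start of a PNG file."""
    with path.open("rb") as handle:
        header = handle.read(24)
    if (len(header) < 24 or header[:8] != PNG_SIGNATURE
            or header[12:16] != b"IHDR"):
        raise ValueError(f"{path.name} is not a PNG file")
    width, height = struct.unpack(">II", header[16:24])
    return width, height


def mechanical_problems(path: Path, expected: tuple[int, int] = (2560, 1440),
                        min_bytes: int = 50_000) -> list[str]:
    """Return a list of mechanical defects; empty means the file passes."""
    if not path.is_file():
        return ["file missing"]
    problems = []
    if path.stat().st_size < min_bytes:  # assumed threshold
        problems.append("file suspiciously small")
    try:
        size = png_size(path)
    except ValueError as error:
        return problems + [str(error)]
    if size != expected:
        problems.append(f"size {size[0]}x{size[1]}, "
                        f"expected {expected[0]}x{expected[1]}")
    return problems


def generate_with_retries(generate: Callable[[int], Path],
                          check: Callable[[Path], list[str]],
                          max_retries: int = 2) -> tuple[Path, list[str]]:
    """Generate once, then retry up to max_retries times while check fails."""
    for attempt in range(max_retries + 1):
        path = generate(attempt)  # attempt > 0 implies a refined prompt
        problems = check(path)
        if not problems:
            break
    # A non-empty list here means the slide should be surfaced to the user.
    return path, problems
```

With `max_retries=2`, a slide is generated at most three times in total: the original attempt plus two regenerations.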
Phase 6 — Assemble
Emits both PDF and PPTX by default. Pass --format pdf or --format pptx to emit only one.
# PDF (HTML deck → Chrome headless)
~/.claude/skills/presentation-generator/scripts/render-pdf.sh \
--plan deck-plan.json \
--slides-dir ./slides/ \
--output ./output/<deck-slug>-v1.pdf
# PPTX (python-pptx, 13.333" × 7.5", full-bleed image, speaker notes attached)
~/.claude/skills/presentation-generator/scripts/build-pptx.py \
--plan deck-plan.json \
--slides-dir ./slides/ \
--output ./output/<deck-slug>-v1.pptx
Both run sequentially after Phase 5 (cheap relative to image generation). See reference/output-formats.md for dimension specifics.
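As a quick sanity check on the PPTX dimensions, the arithmetic below works out the effective resolution when a 2560×1440 slide image is stretched full-bleed over a 13.333" × 7.5" slide. It uses only the figures quoted in this document and does not reproduce anything from build-pptx.py.

```python
slide_width_in, slide_height_in = 13.333, 7.5
image_width_px, image_height_px = 2560, 1440

# Effective resolution when the image fills the whole slide.
dpi_horizontal = image_width_px / slide_width_in
dpi_vertical = image_height_px / slide_height_in

# 13.333 is a rounded 40/3, so the slide is 16:9 only to within that rounding.
aspect_error = abs(slide_width_in / slide_height_in - 16 / 9)

print(f"{dpi_horizontal:.1f} x {dpi_vertical:.1f} dpi, "
      f"aspect error {aspect_error:.5f}")
# prints: 192.0 x 192.0 dpi, aspect error 0.00004
```

At roughly 192 pixels per inch in both directions, the image fills the slide without visible stretching.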
Output convention
./
├── deck-plan.json # The narrative + composition plan
├── refs/
│ ├── style-ref-1.png
│ └── style-ref-2.png
├── slides/
│ ├── slide-01-cover.png
│ ├── slide-02-comparison-split.png
│ └── ...
└── output/
├── <deck-slug>-v1.pdf
└── <deck-slug>-v1.pptx
What's the style, what's the composition?
Before writing the deck plan, hold this distinction in your head:
| Locks across the entire deck (in style_brief) | Varies per slide (in each slide's composition + image_prompt) |
|---|---|
| Palette (exact hex codes) | What's depicted on the canvas |
| Typography vibe (e.g. "bold modern sans-serif") | Layout pattern (split / grid / single-subject / diagram) |
| Decorative background motif | Whether text dominates or imagery dominates |
| Aesthetic register (editorial / hand-drawn / dark UI) | Whether icons, charts, photos, or mockups appear |
| Card border styling, corner radius, glow treatments | Subject of any photographic content |
If you find yourself describing the palette per slide, you're doing it wrong — push it up to style_brief. If you find yourself describing layouts in style_brief, push them down to per-slide composition/image_prompt.
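One symptom of palette leaking into per-slide prompts is a hex code inside image_prompt. The sketch below flags those; it is a heuristic rather than part of the skill, it assumes slide entries live under a top-level `slides` key, and it only matches six-digit codes such as #4F8CFF.

```python
import re

# Six hex digits after '#', not followed by further word characters.
HEX_COLOR = re.compile(r"#[0-9a-fA-F]{6}\b")


def palette_leaks(plan: dict) -> dict[str, list[str]]:
    """Map slide id to the hex codes found in that slide's image_prompt."""
    leaks = {}
    for slide in plan.get("slides", []):  # "slides" key is an assumption
        found = HEX_COLOR.findall(slide.get("image_prompt", ""))
        if found:
            leaks[slide["id"]] = found
    return leaks


# Hypothetical two-slide plan for illustration.
plan = {"slides": [
    {"id": "01", "image_prompt": "Centered title above the motif."},
    {"id": "02", "image_prompt": "Left card in #4F8CFF, right card in #F5F7FA."},
]}
print(palette_leaks(plan))  # {'02': ['#4F8CFF', '#F5F7FA']}
```

Any slide that shows up in the result is a candidate for moving its color language up into style_brief.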
Composition rhythm — alternate to keep audiences awake
No two adjacent slides should use the same composition. Alternate high-info structured slides (cards, diagrams, charts) with low-info atmospheric slides (photo, quote, big-number). NotebookLM does this rigorously. So does any well-designed deck.
A typical 8-slide rhythm:
1. cover with motif ← title card
2. comparison split ← problem statement, visceral contrast
3. stacked principle cards ← codify the philosophy
4. big-number callout ← land the headline metric
5. architecture flowchart ← show how it works
6. timeline with cards ← roadmap
7. do/don't comparison ← brand voice / anti-patterns
8. three-column checklist ← CTA / next steps
Eight slides, eight different compositions, one consistent palette and motif. Reference: see templates/deck-plan.example.json for this exact structure as a worked example.
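The no-adjacent-repeats rule can be checked with a short pass over the plan. Because composition is free-form text, this sketch only catches neighbours whose wording is identical after case and whitespace normalisation; two differently worded descriptions of the same layout will slip through. The function name is hypothetical.

```python
def repeated_neighbours(slides: list[dict]) -> list[tuple[str, str]]:
    """Return id pairs of adjacent slides whose composition text is identical."""
    def normalise(text: str) -> str:
        return " ".join(text.lower().split())

    return [(first["id"], second["id"])
            for first, second in zip(slides, slides[1:])
            if normalise(first["composition"]) == normalise(second["composition"])]


# Hypothetical slide list for illustration.
slides = [
    {"id": "01", "composition": "cover with motif"},
    {"id": "02", "composition": "Big-number callout"},
    {"id": "03", "composition": "big-number  callout"},
    {"id": "04", "composition": "timeline with cards"},
]
print(repeated_neighbours(slides))  # [('02', '03')]
```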
Hebrew / RTL / non-Latin scripts
gpt-image-2 renders Hebrew, Arabic, CJK, Hindi, and Bengali materially better than any prior model. For decks in those languages: write text_in_image in the target script, set language accordingly in the deck plan, and proceed normally. For long Hebrew/RTL paragraphs that come out garbled, see ~/.claude/skills/image-generation/reference/hebrew-rtl.md for the fallback two-stage workflow.
When NOT to use this skill
- The user wants a text document, Google Doc, or Notion page.
- The user wants natively-editable PowerPoint with rewritable text-and-shape slides → use the official anthropics/skills pptx skill.
- The user wants a single hero image or marketing visual → use the image-generation skill directly.
- The user wants a video / animated explainer → out of scope. Static decks only.
API keys and dependencies
- OPENAI_IMAGE_API_KEY: required. From ~/.claude/projects/-Users-shaharshavit/memory/api-keys.md → "OpenAI (image generation)".
- GEMINI_IMAGE_API_KEY: optional. Only needed if any slide opts into model: gemini.
- Python 3.10+ with python-pptx (pip install python-pptx) and Pillow.
- Chrome / Chromium, for PDF rendering.
Cost discipline
A 10-slide deck at gpt-image-2 high (2560×1440) costs roughly $3-5 per generation pass, plus 2 style-lock refs ($0.40) and any QA-driven regenerations. Budget **$6 per finished deck**. If you exceed $12 on a single deck, stop and consult the user.
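The budget figures can be turned into a rough estimator. The per-image numbers here are derived from the figures above rather than taken from a price list: $3-5 for a 10-slide pass implies about $0.30-0.50 per slide, and $0.40 for two refs implies about $0.20 per ref. Amounts are kept in integer cents to avoid floating-point rounding.

```python
SLIDE_COST_CENTS = (30, 50)   # derived: $3-5 per 10-slide pass
REF_COST_CENTS = 20           # derived: $0.40 for 2 style-lock refs
STOP_THRESHOLD_CENTS = 1200   # stop and consult the user above $12


def estimate_cents(slides: int, regenerations: int = 0,
                   refs: int = 2) -> tuple[int, int]:
    """Return a (low, high) cost estimate in cents for one deck."""
    images = slides + regenerations
    low, high = (images * cost + refs * REF_COST_CENTS
                 for cost in SLIDE_COST_CENTS)
    return low, high


low, high = estimate_cents(slides=10, regenerations=3)
print(f"${low / 100:.2f} to ${high / 100:.2f}")  # $4.30 to $6.90
print(high > STOP_THRESHOLD_CENTS)               # False
```

Under these derived rates, a 10-slide deck with three QA-driven regenerations stays well under the $12 stop threshold.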
Reference docs
| File | What's in it |
|---|---|
| reference/narrative-frameworks.md | SCQA, Minto Pyramid, Duarte oscillation, Kawasaki 10/20/30 — distilled and operationalized |
| reference/slide-compositions.md | The composition vocabulary — 15+ formats with prompt skeletons (hero photo, big-number, comparison split, flowchart, timeline, infographic, UI mockup, quote card, etc.) |
| reference/visual-style-brief.md | How to write a style_brief that actually locks the deck's look. Brand-system integration. |
| reference/image-prompting.md | Per-slide prompt engineering for gpt-image-2 — text rendering, structured layouts, label legibility, palette injection |
| reference/consistency-tactics.md | Style lock, palette pinning, motif repetition, model selection |
| reference/output-formats.md | PDF vs PPTX dimensions, safe areas, gotchas |
| reference/examples/product-pitch-10.md | Worked example: 10-slide product pitch deck plan |
| reference/examples/research-explainer-7.md | Worked example: 7-slide research explainer |