presentation-generator - SKILL.md Agent Skill

name: presentation-generator
description: Generate superb 16:9 widescreen presentations (PDF and PPTX) where every slide is a custom AI-rendered image. Each slide is free to take any visual form (full-bleed photograph, structured infographic, architecture diagram, big-number callout, comparison cards, UI mockup with annotations, timeline, quote card, do/don't table), whatever the slide actually needs. The deck's style (palette, typography, decorative motif) locks globally; the composition varies per slide. The skill researches the topic (local files or web), designs a narrative arc (SCQA / Duarte oscillation / Kawasaki 10/20/30), locks a global visual style via a reference image, then generates all slides in parallel (concurrency 4) by calling the image-generation skill. Use whenever the user asks to create, build, design, generate, or draft a presentation, slideshow, slide deck, pitch deck, keynote, briefing deck, masterplan, or any visual document where slides should be visually distinctive rather than templated. SKIP when the user wants a text document, a Google Doc, a Notion page, or a natively-editable PowerPoint with shapes-and-text (use the official anthropics/skills `pptx` skill for that).
allowed-tools: Read, Write, Bash, Edit, WebFetch

Presentation Generator

Image-as-slide decks. Every slide is a single 16:9 PNG generated by gpt-image-2. The slide is free to be anything visual: a cinematic photograph, a structured infographic with stacked principle cards, an architecture flowchart, a big-number stat, a comparison split-screen, a UI mockup with side annotations, a quote card, a do/don't table, an icon grid, a timeline, a hand-drawn whiteboard sketch.

The skill's job is to make Claude think like a creative director — narrative arc first, visual style locked, composition chosen per slide, prompts engineered with intention — before spending API tokens on imagery.

The two axes of a slide

Style (locks at the deck level): aesthetic, palette, typography vibe, decorative motif. Constant across all slides — this is what makes the deck feel like one artifact.
Composition (varies per slide): is this slide a hero photo, a flowchart, an infographic, a big number, a comparison? Pick what serves the slide's job.

Both NotebookLM's Cinematic Video Overviews and well-designed brand decks do exactly this. Lock style; vary composition.

The six phases

Research → Narrative Plan → Style Lock → Parallel Generation (×4) → QA → Assemble

Runs end-to-end autonomously. No approval gate between phases.

Phase 1 — Research

Build a dossier on the topic before touching the deck.

Local files referenced ("the project plan", a path, a slug) → read with the Read tool.
Online topic → WebSearch + WebFetch.
Mixed → both. Local files are ground truth; online is supporting context.
Brand book exists → if there's a BRAND.md from the brand-system skill in the project, read it. Use its palette, typography, and motif language verbatim in the deck plan's style_brief, and set brand_source in the plan to point at it. Reusing the brand book's exact wording keeps the generated slides as close to the brand as the image model allows.

Keep the dossier in working memory: 5-15 key facts, quotes worth surfacing, numbers worth visualizing, audience signals.

Phase 2 — Narrative Plan

This is the most important step in the entire workflow. It is what separates a memorable deck from a generic one. Read reference/narrative-frameworks.md first.

Before drafting, answer in your head (or out loud):

Audience — who is this for, and what's their starting state of mind?
One takeaway — the one thing they must walk away believing.
SCQA — Situation → Complication → Question → Answer.
Slide count — default 8-10 (Kawasaki). 6 for a quick brief, 12 for a deep explainer. Hard cap at 15 unless the user explicitly asks for more.
Tone — formal / conversational / inspiring / pragmatic. Pick one.
Visual aesthetic — modern editorial photography / dark UI infographic / hand-drawn whiteboard / cinematic moody / minimalist tech / corporate isometric. Pick one. Lock it.
Palette — 3-5 hex colors. Primary, accent, neutrals.
Recurring motif — a visual element that appears (in some form) on most slides to tie them together. See reference/visual-style-brief.md.

Then write deck-plan.json to the cwd. Use the schema in templates/deck-plan.schema.json and the worked example in templates/deck-plan.example.json — the example is an 8-slide infographic deck with eight different compositions.

Every slide entry has:

id — 01, 02, ..., zero-padded
role — free-form: this slide's narrative function ("cover", "establish problem with visceral contrast", "land the headline metric", "walk through the architecture", "closing CTA"). Descriptive, not from an enum.
composition — free-form: this slide's visual format. Examples: "title cover with central decorative motif", "comparison split-screen (left = current, right = proposed)", "stacked principle cards (4 cards with icon + heading + body)", "big-number callout with supporting metric cards", "architecture flowchart with router and three branches", "horizontal timeline with three milestone cards", "two-column do/don't comparison table", "three-column checklist". Pick the composition that fits the slide's job. Read reference/slide-compositions.md for the full vocabulary — 15+ formats with prompt skeletons.
idea — one sentence: the single thing this slide must communicate.
text_in_image — exact text to render IN the image. Headlines, labels, the quote, the big number, axis labels, callout text. Use exact wording. For Hebrew / Arabic / CJK, write in the target script. Empty string for image-only slides.
image_prompt — the engineered prompt sent to openai-image.sh. Describe the composition explicitly (where elements sit on the canvas), the subject of each element, and any text-in-image with exact wording in quotes. Don't repeat the deck's style_brief — it's appended automatically. See reference/image-prompting.md.
speaker_notes — 40-60 words of narration the presenter says while this slide is up.
model — usually openai. Use gemini only if the slide needs >2 reference images merged or has been failing on openai.

The deck plan also has top-level fields: title, subtitle, audience, takeaway, aesthetic, palette (array of hex), motif, style_brief (a paragraph that gets injected into every image prompt), and optional brand_source and language.
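As a rough illustration of the fields listed above, the sketch below reports which of them are absent from a plan. The function name `missing_fields` and the `slides` key are hypothetical, and treating every listed field as required is an assumption; the authoritative rules remain in templates/deck-plan.schema.json.

```python
# Field names come from the lists above. Treating all of them as required is
# an assumption; the real rules live in templates/deck-plan.schema.json.
SLIDE_FIELDS = (
    "id", "role", "composition", "idea",
    "text_in_image", "image_prompt", "speaker_notes", "model",
)
TOP_LEVEL_FIELDS = (
    "title", "subtitle", "audience", "takeaway",
    "aesthetic", "palette", "motif", "style_brief",
)


def missing_fields(plan: dict) -> list[str]:
    """Return readable paths for every listed field absent from the plan."""
    problems = [name for name in TOP_LEVEL_FIELDS if name not in plan]
    # "slides" is an assumed key name for the list of slide entries.
    for index, slide in enumerate(plan.get("slides", [])):
        label = slide.get("id", f"#{index + 1}")
        problems += [
            f"slides[{label}].{name}" for name in SLIDE_FIELDS if name not in slide
        ]
    return problems
```

Running a check like this before Phase 3 is cheap compared with discovering a missing image_prompt after generation has started.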

Phase 3 — Style Lock

~/.claude/skills/presentation-generator/scripts/lock-style.py \
  --plan deck-plan.json \
  --output-dir ./refs/

Generates 1-2 abstract reference frames from the style_brief alone — no slide content yet, just the aesthetic, palette, and motif.

Read the resulting PNGs with the Read tool. Claude is multimodal — actually look at the pixels. Score them:

Is the palette right (compare hex codes to the planned palette)?
Does the motif read as intended?
Does the aesthetic match what you wrote?

If at least one ref lands, pick the stronger one and set its path as style_ref in deck-plan.json. If none lands, edit the style_brief and re-run. If the refs are mediocre but acceptable, ship them; Phase 5 QA will catch drift.

The chosen ref gets passed to openai-image.sh via --ref for every subsequent slide. This is what makes 8-10 independently-generated slides — even with totally different compositions — feel like one coherent deck.

Sibling decks — reuse the style ref. When the user is building a related deck for the same engagement (e.g., a product plan + a quote, or a pitch deck + an internal explainer), set both deck-plans' style_ref field to the same ref PNG path — copy or symlink it into the second deck's refs/ folder, or point both at an absolute path. Skip Phase 3 entirely for the second deck. This guarantees the two decks read as one engagement (same palette tone, same accent treatment, same motif), which is exactly what brand consistency demands.
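Pointing two plans at one reference image can be sketched as below. This is a minimal illustration that assumes style_ref is a top-level string field in deck-plan.json; the helper name `set_style_ref` is hypothetical and not part of the skill's scripts.

```python
import json
from pathlib import Path


def set_style_ref(plan_path: str, ref_path: str) -> dict:
    """Write ref_path into the plan's style_ref field and return the updated plan."""
    path = Path(plan_path)
    plan = json.loads(path.read_text(encoding="utf-8"))
    # An absolute path lets sibling decks in other folders share one ref PNG.
    plan["style_ref"] = str(Path(ref_path).resolve())
    path.write_text(
        json.dumps(plan, indent=2, ensure_ascii=False) + "\n", encoding="utf-8"
    )
    return plan
```

Calling it once per deck plan with the same ref path leaves both decks referencing the identical image, which is the condition for skipping Phase 3 on the second deck.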

Phase 4 — Parallel Generation (concurrency 4)

~/.claude/skills/presentation-generator/scripts/generate-deck.py \
  --plan deck-plan.json \
  --output-dir ./slides/ \
  --concurrency 4

Reads the deck plan, builds one shell command per slide (calling openai-image.sh by default), and runs them through a Python ThreadPoolExecutor(max_workers=4). The deck's style_brief is appended to every per-slide prompt. The style_ref is passed via --ref for style coherence.

Outputs land in ./slides/slide-NN-<slug>.png. Default size: 2560×1440 (16:9, both edges multiples of 16 — required by gpt-image-2; 1920×1080 is invalid because 1080 isn't a multiple of 16). Quality: high. Background: opaque.
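The size rule above reduces to two arithmetic checks. The sketch below encodes only the constraints stated here (exact 16:9 and both edges divisible by 16); any further limits gpt-image-2 may impose are not covered, and the function name is hypothetical.

```python
def is_valid_slide_size(width: int, height: int) -> bool:
    """True when the size is exactly 16:9 and both edges are multiples of 16."""
    is_16_by_9 = width * 9 == height * 16  # integer cross-multiplication, no rounding
    return is_16_by_9 and width % 16 == 0 and height % 16 == 0
```

With this check, 2560×1440 passes, while 1920×1080 fails because 1080 divided by 16 leaves a remainder of 8.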

Progress streams to stderr. On failure, the slide is logged but does not kill siblings.
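The fan-out described above can be sketched as follows. This is not the contents of generate-deck.py: `render` is a hypothetical callable standing in for the openai-image.sh invocation (whose full argument list is not shown in this document), the `slides` key is assumed, and the blank-line separator used when appending the style brief is also an assumption.

```python
import sys
from concurrent.futures import ThreadPoolExecutor, as_completed


def build_prompt(slide: dict, style_brief: str) -> str:
    """Append the deck-level style brief to the per-slide prompt."""
    return f"{slide['image_prompt'].strip()}\n\n{style_brief.strip()}"


def generate_all(plan: dict, render, max_workers: int = 4) -> dict:
    """Render every slide concurrently and return {slide id: succeeded}.

    render(slide_id, prompt, style_ref) is a hypothetical stand-in for the
    real image-generation call.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            pool.submit(
                render,
                slide["id"],
                build_prompt(slide, plan["style_brief"]),
                plan.get("style_ref"),
            ): slide["id"]
            for slide in plan["slides"]
        }
        for future in as_completed(futures):
            slide_id = futures[future]
            try:
                future.result()
                results[slide_id] = True
            except Exception as error:
                # Log the failure and keep going so sibling slides still finish.
                print(f"slide {slide_id} failed: {error}", file=sys.stderr)
                results[slide_id] = False
    return results
```

Catching the exception per future is what keeps one failed slide from cancelling the rest of the batch.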

Phase 5 — QA

~/.claude/skills/presentation-generator/scripts/qa-slides.py \
  --plan deck-plan.json \
  --slides-dir ./slides/

The script runs the mechanical checks: file exists, dimensions are correct, file size is sane, and the image is not all black or all white. Visual QA follows: Claude reads every slide PNG with the Read tool and scores it against the slide's idea, composition, and text_in_image:

Did the composition land (right format, right layout)?
Is text legible and spelled correctly? (Pay close attention to in-image text — the model's biggest failure mode is mangled letterforms or wrong wording.)
Does the palette adhere to the plan?
Is the motif present?
Any defects (broken geometry, drifted style, wrong icons)?
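The mechanical checks named at the start of this phase can be approximated with the standard library alone. The sketch below is not qa-slides.py; it covers existence, file size, and dimensions by reading the PNG header, and leaves out the all-black/all-white test because that needs pixel decoding (for example with Pillow). The 50,000-byte floor is an assumed threshold for "sane".

```python
import struct
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data: bytes) -> tuple[int, int]:
    """Read (width, height) from the IHDR chunk at the start of a PNG file."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # Width and height are big-endian 32-bit integers at bytes 16..23.
    return struct.unpack(">II", data[16:24])


def mechanical_problems(path: str, expected=(2560, 1440), min_bytes=50_000) -> list[str]:
    """Return a list of problems; an empty list means the file passed."""
    file = Path(path)
    if not file.is_file():
        return ["file missing"]
    data = file.read_bytes()
    problems = []
    if len(data) < min_bytes:  # assumed threshold, tune per deck
        problems.append(f"file is only {len(data)} bytes")
    try:
        size = png_size(data)
    except ValueError as error:
        return problems + [str(error)]
    if size != expected:
        problems.append(
            f"dimensions {size[0]}x{size[1]}, expected {expected[0]}x{expected[1]}"
        )
    return problems
```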

Slides that fail visual QA get regenerated with a refined prompt. Cap at 2 retries per slide. If a slide still fails after 2 retries, surface it to the user with a specific description of what's wrong.
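The retry cap can be pictured as a bounded loop. In practice the review step is Claude looking at the PNG rather than a function, so `render`, `review`, and `refine` below are hypothetical callables used only to make the control flow concrete.

```python
def generate_with_retries(slide: dict, render, review, refine, max_retries: int = 2):
    """Return (passed, attempts, last_feedback) for one slide.

    review(image) returns an empty string when the slide passes, otherwise a
    specific description of what is wrong.
    """
    prompt = slide["image_prompt"]
    feedback = ""
    # One initial attempt plus at most max_retries regenerations.
    for attempt in range(1, max_retries + 2):
        image = render(prompt)
        feedback = review(image)
        if not feedback:
            return True, attempt, ""
        prompt = refine(prompt, feedback)
    # Still failing: the caller surfaces last_feedback to the user.
    return False, max_retries + 1, feedback
```

Returning the last feedback string gives the caller the specific description to show the user when a slide is surfaced.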

Phase 6 — Assemble

Emits both PDF and PPTX by default. Pass --format pdf or --format pptx to produce only one.

# PDF (HTML deck → Chrome headless)
~/.claude/skills/presentation-generator/scripts/render-pdf.sh \
  --plan deck-plan.json \
  --slides-dir ./slides/ \
  --output ./output/<deck-slug>-v1.pdf

# PPTX (python-pptx, 13.333" × 7.5", full-bleed image, speaker notes attached)
~/.claude/skills/presentation-generator/scripts/build-pptx.py \
  --plan deck-plan.json \
  --slides-dir ./slides/ \
  --output ./output/<deck-slug>-v1.pptx

Both run sequentially after Phase 5 (cheap relative to image generation). See reference/output-formats.md for dimension specifics.
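The 13.333" × 7.5" figure follows from the 16:9 ratio. The sketch below works the arithmetic in EMU (914,400 per inch, the unit PowerPoint files store lengths in) and shows the pixel density a 2560-pixel-wide slide image has at that size. It is an illustration of the numbers only, not an excerpt from build-pptx.py, and the function name is hypothetical.

```python
from fractions import Fraction

EMU_PER_INCH = 914_400  # English Metric Units per inch


def widescreen_slide(height_in: Fraction = Fraction(15, 2), image_width_px: int = 2560):
    """Return (width_emu, height_emu, pixels_per_inch) for a 16:9 slide."""
    width_in = height_in * Fraction(16, 9)  # 7.5 in * 16/9 = 13.333... in
    pixels_per_inch = Fraction(image_width_px) / width_in
    return (
        int(width_in * EMU_PER_INCH),
        int(height_in * EMU_PER_INCH),
        float(pixels_per_inch),
    )


print(widescreen_slide())  # (12192000, 6858000, 192.0)
```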

Output convention

./
├── deck-plan.json          # The narrative + composition plan
├── refs/
│   ├── style-ref-1.png
│   └── style-ref-2.png
├── slides/
│   ├── slide-01-cover.png
│   ├── slide-02-comparison-split.png
│   └── ...
└── output/
    ├── <deck-slug>-v1.pdf
    └── <deck-slug>-v1.pptx

What's the style, what's the composition?

Before writing the deck plan, hold this distinction in your head:

| Locks across the entire deck (in `style_brief`) | Varies per slide (in each slide's `composition` + `image_prompt`) |
| --- | --- |
| Palette (exact hex codes) | What's depicted on the canvas |
| Typography vibe (e.g. "bold modern sans-serif") | Layout pattern (split / grid / single-subject / diagram) |
| Decorative background motif | Whether text dominates or imagery dominates |
| Aesthetic register (editorial / hand-drawn / dark UI) | Whether icons, charts, photos, or mockups appear |
| Card border styling, corner radius, glow treatments | Subject of any photographic content |

If you find yourself describing the palette per slide, you're doing it wrong — push it up to style_brief. If you find yourself describing layouts in style_brief, push them down to per-slide composition/image_prompt.
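One way to catch palette details leaking into per-slide prompts is to scan each image_prompt for hex color codes. The sketch below is a hypothetical lint, not part of the skill's scripts; it assumes slides live under a `slides` key and only detects hex codes, so color names such as "teal" slip through.

```python
import re

# Matches #RGB or #RRGGBB, but not longer or odd-length hex runs.
HEX_COLOR = re.compile(r"#(?:[0-9a-fA-F]{6}|[0-9a-fA-F]{3})\b")


def palette_leaks(plan: dict) -> dict[str, list[str]]:
    """Map slide id to the hex codes found in that slide's image_prompt."""
    leaks = {}
    for slide in plan.get("slides", []):
        found = HEX_COLOR.findall(slide.get("image_prompt", ""))
        if found:
            leaks[slide["id"]] = found
    return leaks
```

Any slide id that shows up in the result is a candidate for moving its color language up into style_brief.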

Composition rhythm — alternate to keep audiences awake

No two adjacent slides should use the same composition. Alternate high-info structured slides (cards, diagrams, charts) with low-info atmospheric slides (photo, quote, big-number). NotebookLM does this rigorously. So does any well-designed deck.

A typical 8-slide rhythm:

1. cover with motif      ← title card
2. comparison split      ← problem statement, visceral contrast
3. stacked principle cards ← codify the philosophy
4. big-number callout    ← land the headline metric
5. architecture flowchart ← show how it works
6. timeline with cards   ← roadmap
7. do/don't comparison   ← brand voice / anti-patterns
8. three-column checklist ← CTA / next steps

Eight slides, eight different compositions, one consistent palette and motif. Reference: see templates/deck-plan.example.json for this exact structure as a worked example.
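The no-adjacent-repeats rule is easy to check mechanically. Because composition is free-form text, the sketch below only catches literal repeats after trimming and lowercasing; two differently worded descriptions of the same layout still need a human (or Claude) to spot. The function name is hypothetical.

```python
def adjacent_repeats(compositions: list[str]) -> list[tuple[int, int]]:
    """Return 1-based slide-number pairs whose compositions match exactly."""
    normalized = [text.strip().lower() for text in compositions]
    return [
        (i, i + 1)
        for i in range(1, len(normalized))
        if normalized[i - 1] == normalized[i]
    ]


rhythm = [
    "cover with motif", "comparison split", "stacked principle cards",
    "big-number callout", "architecture flowchart", "timeline with cards",
    "do/don't comparison", "three-column checklist",
]
print(adjacent_repeats(rhythm))  # []
```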

Hebrew / RTL / non-Latin scripts

gpt-image-2 renders Hebrew, Arabic, CJK, Hindi, and Bengali materially better than any prior model. For decks in those languages: write text_in_image in the target script, set language accordingly in the deck plan, and proceed normally. For long Hebrew/RTL paragraphs that come out garbled, see ~/.claude/skills/image-generation/reference/hebrew-rtl.md for the fallback two-stage workflow.

When NOT to use this skill

The user wants a text document, Google Doc, or Notion page.
The user wants natively-editable PowerPoint with rewritable text-and-shape slides → use the official anthropics/skills pptx skill.
The user wants a single hero image or marketing visual → use the image-generation skill directly.
The user wants a video / animated explainer → out of scope. Static decks only.

API keys and dependencies

OPENAI_IMAGE_API_KEY — required. From ~/.claude/projects/-Users-shaharshavit/memory/api-keys.md → "OpenAI (image generation)".
GEMINI_IMAGE_API_KEY — optional. Only needed if any slide opts into model: gemini.
Python 3.10+ with python-pptx (pip install python-pptx) and Pillow.
Chrome / Chromium — for PDF rendering.

Cost discipline

A 10-slide deck at gpt-image-2 high (2560×1440) costs roughly $3-5 per generation pass, plus 2 style-lock refs ($0.40) and any QA-driven regenerations. Budget **$6 per finished deck**. If you exceed $12 on a single deck, stop and consult the user.
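The figures above can be turned into a quick estimate. The per-slide range of $0.30 to $0.50 is derived by dividing the $3-5 pass cost by 10 slides, which assumes cost scales linearly with slide count; actual pricing may differ. Function names are hypothetical.

```python
def estimate_cost(slides: int, regenerations: int = 0,
                  per_slide=(0.30, 0.50), refs_cost: float = 0.40) -> tuple[float, float]:
    """Return a (low, high) dollar estimate for one deck."""
    images = slides + regenerations
    low, high = (round(images * rate + refs_cost, 2) for rate in per_slide)
    return low, high


def should_stop(spent: float, hard_limit: float = 12.0) -> bool:
    """True once spending on a single deck exceeds the consult-the-user limit."""
    return spent > hard_limit


print(estimate_cost(10))      # (3.4, 5.4)
print(estimate_cost(10, 3))   # (4.3, 6.9)
```

Under these assumptions a 10-slide deck with three regenerations can already edge past the $6 budget at the high end, which is why the retry cap in Phase 5 matters.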

Reference docs

| File | What's in it |
| --- | --- |
| reference/narrative-frameworks.md | SCQA, Minto Pyramid, Duarte oscillation, Kawasaki 10/20/30, distilled and operationalized |
| reference/slide-compositions.md | The composition vocabulary: 15+ formats with prompt skeletons (hero photo, big-number, comparison split, flowchart, timeline, infographic, UI mockup, quote card, etc.) |
| reference/visual-style-brief.md | How to write a `style_brief` that actually locks the deck's look. Brand-system integration. |
| reference/image-prompting.md | Per-slide prompt engineering for gpt-image-2: text rendering, structured layouts, label legibility, palette injection |
| reference/consistency-tactics.md | Style lock, palette pinning, motif repetition, model selection |
| reference/output-formats.md | PDF vs PPTX dimensions, safe areas, gotchas |
| reference/examples/product-pitch-10.md | Worked example: 10-slide product pitch deck plan |
| reference/examples/research-explainer-7.md | Worked example: 7-slide research explainer |