name: illustrations description: > Generates the visual layer of a talk: deck illustrations (FULL / IMG+TXT slides with a shared style anchor), progressive-reveal build chains, and YouTube thumbnails. Owns style-strategy collaboration (informed by the speaker's visual_style_history in the vault), prompt safety, edit-vs-regenerate asymmetry, build chaining, title-safe-zone composition, and thumbnail composition with a real speaker photo. Invoked by presentation-creator during illustration strategy (Phase 2), illustration generation and application to the deck (Phase 5), and the post-event YouTube thumbnail (Phase 7). Triggers: "illustrate the deck", "generate illustrations", "create slide visuals", "design the visual style", "make a thumbnail", "build a YouTube thumbnail", "add visuals to my talk", "regenerate slide image", "fix the thumbnail", "generate progressive reveals", "build sequence for a slide". user-invocable: true
Illustrations
This skill is an action router — pick the step that matches the user's intent and execute only that step. Do not run other steps; do not parallelize.
Step 1 inspects the request and the talk-directory state to decide the mode (Strategy / Generation / Thumbnail) and which subsequent step is the entry point. The "Multi-mode chaining" section at the end of Step 1 is the one explicit exception, and only triggers when a single invocation requests multiple modes.
Owns every AI-generated image the toolkit produces: deck illustrations, build
chains, and thumbnails. Reads the vault for visual history, outline.yaml for
the style anchor and slide-level prompts, and the speaker-profile.json
for visual_style_history and publishing_process.thumbnail config.
The auto-loaded steering rules are the constitution: illustration-rules
(edit vs regenerate, build chains, iteration hygiene), title-overlay-rules
(safe-zone composition), and thumbnail-generation-rules (Phase 7 specifics).
Do not restate them here — apply them.
Key Files & References
| File / Reference | Purpose |
|---|---|
outline.yaml |
Source of truth — style_anchor (model, per-format anchors, composition, embedded_footer, text_treatment) + per-slide format / image_prompt / text_overlay / safe_zone / builds |
speaker-profile.json → visual_style_history |
Default style, departures, mode profiles, confirmed visual intents |
speaker-profile.json → publishing_process.thumbnail |
Speaker photo path + aesthetic preference |
illustrations/ (alongside outline) |
Generated images, builds, model-comparison output |
style-explore/ (alongside outline) |
Phase 2 exploration grid (style × model × format) + index.md + rendered.json manifest |
| skills/illustrations/references/strategy.md | Phase 2 detail — idea-sourcing wizard, optimization priorities, model shortlist, style proposals, exploration render, the render-before-bake gate, continuity devices |
| skills/illustrations/references/generation.md | Deck generation, edit/fix workflow, model comparison |
| skills/illustrations/references/builds.md | Backwards-chained build generation |
| skills/illustrations/references/thumbnails.md | Phase 7 thumbnail composition + slide selection |
| skills/illustrations/references/style-explore-candidates-schema.md | --style-explore contract — candidates.json input + rendered.json output |
skills/illustrations/scripts/model_registry.py |
Model roster, aliases, attributes; --check-freshness + --shortlist |
skills/illustrations/scripts/generate-illustrations.py |
Deck illustrations, edits, fixes, builds, model comparison, style exploration |
skills/illustrations/scripts/apply-illustrations-to-deck.py |
Insert illustrations + builds into a .pptx |
skills/illustrations/scripts/generate-thumbnail.py |
YouTube thumbnail composition |
Step 1 — Route by Mode
Determine which of three modes applies and execute only the matching steps:
- Strategy — outline has no STYLE ANCHOR yet, or the author wants to revise it. Run Step 2 (freshness), then Steps 3-9 (style strategy), then stop unless generation was also requested.
- Generation — outline has a STYLE ANCHOR and per-slide prompts. Run
Step 2 (freshness), Step 10 (illustrations), Step 11 (builds, if any slides
have a
builds:block), and Step 12 (apply to deck, if a .pptx exists). - Thumbnail — talk has been delivered and a video URL is available. Skip to Step 13.
Multi-mode chaining
If — and only if — a single invocation requests multiple modes (e.g., "design the visual style, then generate everything"), run them in order Strategy → Generation → Thumbnail. Proceed immediately to the first applicable step; do not pause for confirmation between modes. A single-mode invocation runs exactly the one matching step's chain and stops.
"Proceed immediately" governs handoff between modes — it never overrides the within-Strategy human gates (Steps 3, 4, 9). Those stop for the speaker even in a chained invocation.
Step 2 — Check Image-Model Freshness
Run the deterministic precheck first, then report its verdict in one line — never skip this step silently:
python3 skills/illustrations/scripts/model_registry.py --check-freshness
The script emits JSON: last_reviewed, age_days, stale, and the full
models roster (ids, aliases, attributes). State the verdict before
proceeding (e.g. "Registry reviewed 2026-06-02, 0 days old — fresh").
stale: true— useWebSearchto identify the current flagship image models from Google (Gemini image, Imagen) and OpenAI (gpt-image-*), plus any other vendor with a public image API. Reconcile the registry inskills/illustrations/scripts/model_registry.py: add new flagships, drop discontinued ones, refresh the attribute tiers, bumpREGISTRY_LAST_REVIEWED. "Nano Banana" is Google's codename for the Gemini image line (Nano Banana Pro = Gemini 3 Pro Image) — fold a codename into the matching entry'saliases, never add it as a separate model. The speaker approves the edits.stale: false— the roster is current; do not web-search.
For Generation mode entering an existing outline, also confirm the outline's
baked model is still in the roster. Check it against the JSON models list
(the alias map resolves codenames). If absent or clearly superseded, surface
that to the speaker — they keep the baked model or re-run the exploration
(Step 3). Never silently swap the model.
generate-illustrations.py dispatches by family: gemini-* / nano-banana-*
(Google generateContent), imagen-* (Google :predict), gpt-image-*
(OpenAI). A model from a new vendor family needs a model_family() +
_call_<vendor> adapter before it can render — surface that as a follow-up
script change.
Proceed immediately to Step 3 or Step 10 per Step 1's routing.
Step 3 — Source Style Ideas
Open the strategy collaboration by asking where the visual ideas come from.
Present an AskUserQuestion multi-select of idea sources — your usual, mode /
series match, new to you, wild card, what's trending, "I'll drive (here are my
references)" — plus a Quick-default fast path. The 3-4 proposals in Step 7 span
the checked sources. Never skip this step silently; never bake a model or style
from this menu by reasoning — every candidate reaches the speaker as a rendered
sample in Steps 8-9.
Source vocabulary, the Quick-default path, no-profile degradation: skills/illustrations/references/strategy.md.
Proceed immediately to Step 4.
Step 4 — Establish Optimization Priorities
Elicit what the speaker optimizes for with an AskUserQuestion multi-select
(checkboxes, not radio): cost, speed, quality, build-editability. Auto-add
build-editability when any slide has a Builds: block. Never skip this step
silently — the priorities drive the shortlist in Step 6.
Proceed immediately to Step 5.
Step 5 — Define Format Vocabulary + Composition
Ask the speaker — AskUserQuestion, never infer — how titles and footers are rendered:
- Bleed — title + footer baked into every image, stylized to the art (the
example noir deck). Striking and consistent, but not editable after
generation. Sets
style_anchor.composition: poster-theatricaland locks every illustrated slide to FULL (no IMG+TXT, no safe zones — EXCEPTION/screenshot slides without animage_promptare exempt); the text treatment + footer are recorded in the anchor at Step 9. - Overlay — titles + footers added by PowerPoint over a per-slide safe zone. Editable, uniform font, less integrated. Standard composition; the format vocabulary is FULL / IMG+TXT / EXCEPTION + any talk-specific additions.
Detail: skills/illustrations/references/strategy.md.
Proceed immediately to Step 6.
Step 6 — Narrow the Model Shortlist
Filter the roster to a shortlist by the Step 4 priorities — no render yet:
python3 skills/illustrations/scripts/model_registry.py --shortlist <priorities>
The roster is a seed cache, not an allowlist. To rank a model not in it (a new
flagship from Step 2, or one the speaker names), WebSearch its attributes and
pass --add '<json>', or list its id directly in candidates.json — see
strategy.md.
Proceed immediately to Step 7.
Step 7 — Propose Styles Across the Checked Sources
Propose 3-4 style options spanning the Step 3 sources, grounded in concept fit +
vault context (the speaker's visual_style_history, rhetoric-style-summary.md
Section 13). These are candidates for the render, not a commitment — the speaker
picks from rendered pixels, not prose. Option template:
skills/illustrations/references/strategy.md.
Proceed immediately to Step 8.
Step 8 — Render the Exploration Grid
Write style-explore/candidates.json (styles × shortlist × formats per the
schema), then render:
python3 skills/illustrations/scripts/generate-illustrations.py \
<outline> --style-explore style-explore/candidates.json
This writes the grid under style-explore/, an index.md contact sheet, and a
rendered.json manifest of what actually rendered. The grid is the only place a
model becomes eligible to bake. Never skip this step silently. The speaker picks
a style + model from style-explore/index.md.
candidates.json input + rendered.json output contract:
skills/illustrations/references/style-explore-candidates-schema.md.
Proceed immediately to Step 9.
Step 9 — Bake the Anchor, Then Verify
Write the style_anchor block — chosen model + per-format anchors + visual
continuity devices (conventions) — into outline.yaml. For poster-theatrical
composition, also set style_anchor.composition: poster-theatrical,
style_anchor.embedded_footer: <text>, and style_anchor.text_treatment: <how baked title + footer are rendered> (e.g. "glowing hand-script neon on an
in-scene surface").
Anchor vs per-slide split — everything that must look identical across slides lives in the anchor; only the scene and the literal title vary per slide:
- Anchor: visual style (per-format anchors),
text_treatment(the title + footer rendering style), andembedded_footer(the full footer text). These keep every baked title/footer consistent. - Per slide:
image_promptcarries only the scene;text_overlaycarries only the literal title string for that slide — never the text styling.
Then run the deterministic precheck and report its verdict in one line; never skip this step silently:
python3 skills/illustrations/scripts/generate-illustrations.py \
<outline> --check-style-explore
It confirms the baked model was rendered in the grid (exit 0) or refuses with an
actionable message (exit non-zero). On failure the baked model was never shown —
pick a rendered model from style-explore/index.md and re-verify, or re-render
(Step 8). Step 10 generation re-runs the same gate, so a baked-but-unrendered
model cannot generate.
Full protocol — the option template, continuity options, the gate contract: skills/illustrations/references/strategy.md.
Proceed immediately to Step 10 if generation was also requested; otherwise finish here.
Step 10 — Generate Deck Illustrations
Batch-generate every missing slide illustration from the outline:
python3 skills/illustrations/scripts/generate-illustrations.py \
outline.yaml remaining
Review with the author. For targeted corrections use --fix (preserves the
near-good output); for additions use full regeneration; for removals use
--edit. The edit-vs-regenerate asymmetry rule (illustration-rules §1)
governs which to pick. Save iteration versions (v2, v3) instead of
overwriting — see illustration-rules Iteration Hygiene.
Operational detail (compare modes, prompt patterns, retry ladder): skills/illustrations/references/generation.md.
Proceed immediately to Step 11.
Step 11 — Generate Builds
If any slides in the outline have a builds: block, generate the
backwards-chained build images. Each step's input is the previous step's
output — never regenerate independently from prompts. The edit prompt for each
step comes from builds[].erase (the additive builds[].desc is the
human-facing reveal); --build skips any non-final step whose erase prompt
lacks a "Keep ..." clause.
python3 skills/illustrations/scripts/generate-illustrations.py \
outline.yaml --build all
Output: illustrations/builds/slide-NN-build-MM.<ext> where <ext> is
the MIME-derived extension for each step (.jpg / .png / .webp
depending on the model and source image). Build-00 is the empty frame;
build-N is the full image. Detail and the per-step contract:
skills/illustrations/references/builds.md.
If no slides specify builds, proceed silently to Step 12.
Step 12 — Apply Illustrations to Deck
Insert generated illustrations and build sequences into the .pptx. Build slides replace their parent slide rather than duplicating after it; speaker notes go on the final build step only.
The script contract is DECK ILLUSTRATIONS_DIR OUTLINE_YAML (positional, in
that order), with optional --out, --image-ext, --scrim-color,
--scrim-alpha. It writes a new <stem>-with-titles.pptx next to the
input deck unless --out is given.
python3 skills/illustrations/scripts/apply-illustrations-to-deck.py \
deck.pptx illustrations/ outline.yaml
If no .pptx exists yet (Phase 5 hasn't run), finish here — presentation-creator Phase 5 will call back into this skill at Step 12 once the deck is built.
Step 13 — Generate Thumbnail
Run the thumbnail composition for a delivered talk. Surface 3–5 candidate slides ranked by visual impact, let the speaker pick, then compose:
python3 skills/illustrations/scripts/generate-thumbnail.py \
--slide-image illustrations/slide-NN.png \
--speaker-photo "$SPEAKER_PHOTO" \
--title "HOOK TITLE" \
--aesthetic <photo|comic_book>
Aesthetic precedence (thumbnail-generation-rules §7): explicit speaker
preference → default_illustration_style → confirmed intents → photo.
For illustrated decks, also pass --portrait-style "<anchor>" so the
portrait is pre-stylized to match the deck.
Iteration is conversational — change one thing at a time (style variant, expression, colors, title text, slide). Detail: skills/illustrations/references/thumbnails.md.
Finish here.