visual-storyboard - SKILL.md Agent Skill

name: visual-panels
description: >
  Render a 4K VISUAL PANEL SHEET (a single composite IMAGE: a clean full-bleed grid of numbered panels, each cell is the panel image edge-to-edge plus a plain corner number) by composing the image-generation prompt (a single multi-section block tuned for Nano Banana Pro / GPT Image 2) and calling `generate_image`. The deliverable is the rendered image, never a raw prompt for the user to copy.
  This skill does NOT author or plan a story: the editable SOURCE storyboard is the `interactive-storyboard` JSON, and this skill renders that finished JSON into a visual sheet.
  THE CYCLE is always: interactive storyboard → visual panels → video — do NOT skip the interactive storyboard. If the user wants panels (or a video) for a multi-shot story but there is NO storyboard yet, author the `interactive-storyboard` FIRST; render straight from a bare idea (no storyboard) ONLY when the user EXPLICITLY asks for just the image/sheet and not a storyboard.
  Use ONLY when the user explicitly wants the VISUAL / IMAGE output — a "panel sheet", "visual panels", "panel layout", "render the storyboard as an image", "break this story into panels", panels they can SEE — or when they pair "storyboard" with an image / video generation tool (Nano Banana Pro, GPT Image, Midjourney, DALL-E, Seedance, Kling, Sora, Veo, Runway, Luma, Hailuo, Wan, Higgsfield, Flux), or upload character references for a visual.
  Do NOT use this skill for a bare "make me a storyboard" / "shot list" / "scene plan" with no mention of images — that is the interactive JSON (`interactive-storyboard`), not this skill.
  Works for any visual style — 3D animation, live-action, anime, 2D animation, stop-motion, editorial, comic book, or any other aesthetic. To animate the approved sheet into the final video (per-clip prompt + rendering + timeline assembly), see `visual-panels-to-video`.

Visual Panels

Turn a story idea + character references into a 4K visual panel sheet: a single composite image, a clean full-bleed grid of numbered panels (each cell is the panel image edge-to-edge plus a plain number in the top-left corner).

This skill is end-to-end — it composes the image prompt AND calls generate_image to render the sheet. The deliverable is the rendered image, NOT a prompt string for the user to copy elsewhere.

⛔ Style: USE what the user already gave you — ask ONLY when it's genuinely missing

The visual style is a USER decision. NEVER infer it from the brand, topic, or vibe: a Chanel storyboard is not automatically "live-action luxury", a Pokémon one is not automatically "anime", and a kids' product is not automatically "3D family-film". But "don't infer" does NOT mean "always ask". First GATHER the style the user has already provided, and ask ONLY if none exists. Re-asking for a style the user already gave is a reported bug in its own right ("why is it asking me for the style again if I already told it / if it's already in the storyboard?").

GATHER the style from ALL of these before deciding, and use the first one that is explicit:

1. Anything the user said in THIS conversation, not just their first or latest message. "in anime style", "photorealistic", "like a movie", "the same style as before" all count, wherever they said it.
2. The source interactive storyboard's stylePrompt. When NON-EMPTY, it IS an explicit choice the user made upstream (the visor's style field): USE it and do NOT re-ask. (An EMPTY stylePrompt is the only "no choice" case; only then is the style genuinely missing.)
3. The brief / # WORKING AREA context the user pointed you at.

If an explicit style exists in ANY of the above → proceed with it, NO form. Raise the prompt_form style picker (Step 1 of STORYBOARD_ANATOMY.md: the 3 presets + custom + optional character-ref pickers + notes) ONLY when NONE of the sources carry an explicit style.
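The gathering rule above can be sketched as a small resolver. This is a minimal sketch, not the runtime's code. `StyleSources`, `resolveStyle` and the field names are hypothetical, and the sketch assumes each source has already been reduced upstream to an explicit style string (or left undefined). Sources are checked in the same order as the list, and a whitespace-only stylePrompt counts as "no choice".

```typescript
// Hypothetical shape: each source is already reduced to an explicit style string, or undefined.
interface StyleSources {
  conversation?: string;          // style the user stated anywhere in this conversation
  storyboardStylePrompt?: string; // stylePrompt from the source interactive storyboard JSON
  workingArea?: string;           // style stated in the brief / # WORKING AREA
}

type StyleDecision =
  | { kind: "use"; style: string; from: keyof StyleSources }
  | { kind: "ask" }; // nothing explicit anywhere: raise the prompt_form style picker

// Same priority as the list above: the first explicit value wins.
const STYLE_SOURCE_ORDER: (keyof StyleSources)[] = [
  "conversation",
  "storyboardStylePrompt",
  "workingArea",
];

function resolveStyle(sources: StyleSources): StyleDecision {
  for (const key of STYLE_SOURCE_ORDER) {
    // Empty or whitespace-only means no choice was made in that source.
    const value = sources[key]?.trim();
    if (value) {
      return { kind: "use", style: value, from: key };
    }
  }
  return { kind: "ask" };
}
```

There is deliberately no input for brand or topic, because the style is never inferred from them. For example, `resolveStyle({ storyboardStylePrompt: "" })` yields `{ kind: "ask" }`, which is the only path that raises the picker.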

Same rule for character references — pull them from the storyboard's references / the user's attachments; do NOT re-ask for a photo you already have (see INPUTS). General principle: STOP asking for anything you can already read from the conversation, the storyboard JSON, or the working area — ask only the genuine unknowns.
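The same rule for character references, as a minimal sketch. `gatherCharacterRefs` is hypothetical. It assumes that the storyboard's references and the user's attachments have both already been reduced to file paths or URLs; the real JSON shape may differ.

```typescript
interface RefDecision {
  refs: string[];   // every reference already available, deduplicated, storyboard refs first
  mustAsk: boolean; // true only when neither source provides anything
}

function gatherCharacterRefs(storyboardRefs: string[], attachments: string[]): RefDecision {
  // A Set keeps first-seen order, so storyboard refs stay ahead of attachments.
  const refs = Array.from(new Set([...storyboardRefs, ...attachments]));
  return { refs, mustAsk: refs.length === 0 };
}
```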

This skill's reference files (read them, don't paraphrase from memory)

The authoritative specs live in this skill's own references/ directory. When the skill is activated, the runtime returns the skill's absolute directory plus a resources list — read each file from <that directory>/references/<file> (or list_skills → this skill's directory). NEVER hardcode ~/.koi/skills/...; in a dev checkout the skill resolves to the plugin repo path, so always use the activation-returned directory.
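Resolving a reference file against the activation-returned directory is a plain join. The sketch below assumes POSIX-style paths. `referencePath` and the example directories are hypothetical. The point is that the base always comes from activation, never from a hardcoded `~/.koi/skills/...`.

```typescript
function referencePath(skillDir: string, file: string): string {
  // Activation returns an absolute directory; "~" or a relative path means it was hardcoded or guessed.
  if (!skillDir.startsWith("/")) {
    throw new Error(`expected the absolute directory returned on activation, got "${skillDir}"`);
  }
  // Drop trailing slashes so "/x/skill/" and "/x/skill" resolve identically.
  const base = skillDir.replace(/\/+$/, "");
  return `${base}/references/${file}`;
}
```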

- references/STORYBOARD_ANATOMY.md: the authoritative Phase 1 spec. It covers the 6 steps (gather inputs → analyse references → break the story into beats → compose the prompt with sections A–H → call generate_image at 4K → companion note), the grid chooser, the per-section prompt template, the length targets, and the handling-variations table. Read it before writing any prompt; the SKILL.md you're reading right now is just the entrypoint.
- references/STYLE_PRESETS.md: the 3 official visual style presets (Premium 3D / Claymation / Realistic UGC) with ready-to-paste phrasing blocks for section B of the prompt, plus the custom-style flow for anything else (anime, live-action, watercolor, cyberpunk, …). POV is not a style here. It is a per-shot camera angle that combines with any of the styles.
- references/VIDEO_TYPE_<TYPE>.md: five per-type spec files (ad / explainer / tutorial / demo / social-post). Read ONLY the one matching the video type the user named, for its brief-context note (internal: it informs the panels and is NOT rendered), the caption style, the shot mix and the audio cue. Never read all five. Skip them entirely when the user didn't name a video type.
- Sheet FORMAT lives in the prose of section E. The deliverable is a clean full-bleed grid: each panel's image fills its cell edge-to-edge, panels are separated ONLY by thin black gutter lines, each panel has a plain number in its top-left corner, and the grid runs to all four margins. NOTHING else: no title banner, no cards, no drop shadows, no caption bars, no timecodes, no number badges, no footer or legend. See STORYBOARD_ANATOMY.md → section E for the exact wording. (references/LAYOUT_TEMPLATE.png is obsolete. The format used to be copied from that attached skeleton; do NOT attach it or any other format reference.)
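Which reference files to read for a given request can be sketched as follows. `referencesToRead` is hypothetical, and the `<TYPE>` spelling is an assumption that may not match the real file names. The sketch upper-cases the type and turns hyphens into underscores, so `social-post` becomes `VIDEO_TYPE_SOCIAL_POST.md`. LAYOUT_TEMPLATE.png is never part of the list.

```typescript
type VideoType = "ad" | "explainer" | "tutorial" | "demo" | "social-post";

function referencesToRead(videoType?: VideoType): string[] {
  // Always: the Phase 1 spec, then the style presets.
  const files = ["STORYBOARD_ANATOMY.md", "STYLE_PRESETS.md"];
  // At most ONE per-type file, and only when the user named a type.
  if (videoType !== undefined) {
    // ASSUMED naming: upper-case, hyphens replaced by underscores.
    files.push(`VIDEO_TYPE_${videoType.toUpperCase().replace(/-/g, "_")}.md`);
  }
  return files;
}
```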

⚠ STAMP THE SOURCE-STORYBOARD METADATA — non-negotiable

Every generate_image call this skill makes MUST carry metadata that declares where the sheet came from. This metadata lets the downstream visual-panels-to-video skill recover per-shot durations, dialogue and SFX exactly as the user set them in the visor. Without this link, the next step has to guess from pixels and may confabulate (the reported bug: "suddenly it changes topic: it was an old man with a watch and it said it was SOC 2").

Two cases, ONE metadata field. Pick the shape that matches the source and ALWAYS pass it:

Source = interactive storyboard JSON (most common — the user has a storyboard open in the visor and asked to render it visually):

```
metadata: {
  sourceStoryboard: "/Users/.../.koi/storyboards/<id>.json",  // ← absolute path
  storyboardPart: K,           // 1-based SHEET index (1 if single-sheet)
  storyboardParts: K_total,    // total SHEETS (1 if single-sheet)
  storyboardShotIds: ["sh1","sh2", …],  // ALL shot ids on THIS sheet (union across its clips)
  // panel→clip map — a sheet can hold several clips; this is what lets
  // visual-panels-to-video render one generate_video per clip from the
  // right panels. clipIndex is GLOBAL (1-based, timeline order across sheets).
  // See references/STORYBOARD_ANATOMY.md → Chunking → Step B.
  clips: [
    { clipIndex: 1, shotIds: ["sh1","sh2"], panels: [1,2,3], durationSec: 12 },
    { clipIndex: 2, shotIds: ["sh3"],        panels: [4,5],   durationSec: 8  }
  ]
}
```

You already read the JSON with read_file to compose the prompt; pass that same absolute path as sourceStoryboard.
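The linked-storyboard shape can be typed and checked before the call. The sketch below is an illustration, not the runtime's schema. `StoryboardLinkMetadata`, `checkSheetMetadata` and the specific checks are inferred from the field comments in the example, and the rule that a panel belongs to exactly one clip is an assumption.

```typescript
interface ClipEntry {
  clipIndex: number;   // GLOBAL, 1-based, timeline order across sheets
  shotIds: string[];
  panels: number[];    // panel numbers on THIS sheet
  durationSec: number;
}

interface StoryboardLinkMetadata {
  sourceStoryboard: string;    // absolute path to the storyboard JSON
  storyboardPart: number;      // 1-based sheet index
  storyboardParts: number;     // total sheets
  storyboardShotIds: string[]; // all shot ids on this sheet
  clips: ClipEntry[];
}

// Returns a list of problems; an empty list means the metadata looks consistent.
function checkSheetMetadata(m: StoryboardLinkMetadata): string[] {
  const problems: string[] = [];

  if (!m.sourceStoryboard.startsWith("/")) {
    problems.push("sourceStoryboard must be an absolute path");
  }
  if (m.storyboardPart < 1 || m.storyboardPart > m.storyboardParts) {
    problems.push(`storyboardPart ${m.storyboardPart} is outside 1..${m.storyboardParts}`);
  }

  // storyboardShotIds should equal the union of the clips' shotIds.
  const union = new Set<string>();
  for (const clip of m.clips) {
    for (const id of clip.shotIds) {
      union.add(id);
    }
  }
  const declared = new Set(m.storyboardShotIds);
  const sameSet = union.size === declared.size && m.storyboardShotIds.every((id) => union.has(id));
  if (!sameSet) {
    problems.push("storyboardShotIds must equal the union of the clips' shotIds");
  }

  // ASSUMPTION: each panel is mapped to exactly one clip.
  const seenPanels = new Set<number>();
  for (const clip of m.clips) {
    if (!Number.isInteger(clip.clipIndex) || clip.clipIndex < 1) {
      problems.push(`clipIndex ${clip.clipIndex} must be a 1-based integer`);
    }
    if (!(clip.durationSec > 0)) {
      problems.push(`clip ${clip.clipIndex} needs a positive durationSec`);
    }
    for (const panel of clip.panels) {
      if (seenPanels.has(panel)) {
        problems.push(`panel ${panel} is mapped to more than one clip`);
      }
      seenPanels.add(panel);
    }
  }
  return problems;
}
```

Run on the example above, with storyboardShotIds spelled out as `["sh1","sh2","sh3"]` and a single sheet, it returns an empty list. Leaving out `"sh3"` is reported, because that shot is still mapped to clip 2.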

Source = idea + refs only (no JSON: the user's first message was "make me a storyboard of X" with refs / notes, never an interactive JSON):
```
metadata: {
  storyboardOrigin: "idea"
}
```
This explicitly declares that there is no JSON to link to, which keeps the audit trail clean.

Never call generate_image from this skill without one of those two metadata shapes. Both the downstream tools and the image-lineage notes depend on it. The runtime now logs a loud warning when a call from this skill (label: "visual_storyboard") carries neither shape. Don't ignore the warning; fix the call.
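The "one of those two shapes" rule can be enforced with a guard before calling generate_image. This is a minimal sketch. `ImageCallArgs`, `hasLineage` and `assertImageCall` are hypothetical names, and the argument shape models only the two things this page makes mandatory (`resolution: "4k"` and the metadata); it is not the tool's real signature. Rejecting a block that carries both shapes is an added assumption.

```typescript
interface ImageCallArgs {
  prompt: string;
  resolution: string;
  metadata?: Record<string, unknown>;
}

// True when the metadata carries exactly one of the two lineage shapes.
function hasLineage(metadata: Record<string, unknown> | undefined): boolean {
  if (!metadata) return false;
  const source = metadata["sourceStoryboard"];
  const linked = typeof source === "string" && source.length > 0;
  const fromIdea = metadata["storyboardOrigin"] === "idea";
  // ASSUMPTION: carrying both shapes is treated as ambiguous and rejected.
  return linked !== fromIdea;
}

function assertImageCall(args: ImageCallArgs): void {
  if (args.resolution !== "4k") {
    throw new Error(`resolution must be "4k", got "${args.resolution}"`);
  }
  if (!hasLineage(args.metadata)) {
    throw new Error('metadata needs exactly one of sourceStoryboard or storyboardOrigin: "idea"');
  }
}
```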

High-level flow

1. Activate this skill. The runtime returns the absolute directory; remember it for the reads below.
2. read_file references/STORYBOARD_ANATOMY.md. That's the spec: every step you need is there.
3. read_file references/STYLE_PRESETS.md to grab the phrasing block for the chosen style (or follow the custom-style flow there for anything outside the 3 presets).
4. read_file references/VIDEO_TYPE_<TYPE>.md IF the user named a video type. Skip it otherwise.
5. Follow STORYBOARD_ANATOMY's 6 steps verbatim: gather inputs → analyse references → break the story into beats → compose the prompt → call generate_image (with resolution: "4k" AND the metadata block above, both mandatory) → show_result + companion note.

That's it. The detail lives in STORYBOARD_ANATOMY.md. Don't re-derive it here.

Pairs with

visual-panels-to-video (downstream): once the user approves the sheet, that skill composes the cinematic per-clip video prompt, renders each clip, and assembles the clips on a timeline.