name: pixio-story
description: Produce movies, TV shows, animated shorts, and illustrated stories end-to-end with the Pixio API. Orchestrates script breakdown, character anchors, keyframes, image-to-video chaining, audio, and final stitching. Trigger when the user wants a multi-shot video, animated story, comic, music video, trailer, episode, or any narrative output that requires consistent characters across many scenes.
Pixio Story Workflow
Produces narrative video or illustrated stories end-to-end through Pixio. Always uses the pixio-skill (or its API reference docs) as the underlying tool layer — this skill is the director, pixio-skill is the camera and crew.
Hard requirements
- A valid Pixio API key in PIXIO_API_KEY (or ask the user).
- PowerShell on Windows or bash + curl on Unix. Pipeline scripts ship in PowerShell; port to bash if needed.
- ffmpeg on PATH for stitching and audio mux (final stage only).
The 6-stage pipeline
Run stages strictly in order. Each stage writes its outputs to a project folder under ./projects/<project-name>/ so the pipeline is resumable and inspectable.
1. plan — story → outline → shot list (JSON)
2. bible — character + style reference sheet
3. anchors — one locked image per character (text-to-image)
4. keyframes — per-shot still using anchor + scene prompt (image-to-image edit)
5. animate — per-shot 5–10s clip from keyframe (image-to-video)
6. assemble — TTS, music, SFX, ffmpeg concat → final mp4
Stages 4 and 5 are the cost drivers. Encourage the user to run stages 1–3 first, review the results, and only then commit to stages 4–6.
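The strict ordering and resumability described above can be sketched as follows. This is a minimal illustration, not the logic of the shipped pipeline script: the `checkpoint.json` file name and its `done` field are hypothetical, chosen only to show how a project folder can record which stages have finished.

```python
import json
from pathlib import Path

# Stage names taken from the 6-stage list above, in their required order.
STAGES = ["plan", "bible", "anchors", "keyframes", "animate", "assemble"]


def completed_stages(project_dir):
    """Return the stages recorded as done in the (hypothetical) checkpoint file."""
    checkpoint = Path(project_dir) / "checkpoint.json"
    if not checkpoint.exists():
        return []
    return json.loads(checkpoint.read_text())["done"]


def next_stage(project_dir):
    """Return the first stage not yet done, or None once all six have run."""
    done = set(completed_stages(project_dir))
    for stage in STAGES:
        if stage not in done:
            return stage
    return None


def mark_done(project_dir, stage):
    """Record a stage as done, refusing to skip ahead of the strict order."""
    expected = next_stage(project_dir)
    if stage != expected:
        raise ValueError(f"expected stage {expected!r}, got {stage!r}")
    project = Path(project_dir)
    project.mkdir(parents=True, exist_ok=True)
    done = completed_stages(project_dir) + [stage]
    (project / "checkpoint.json").write_text(json.dumps({"done": done}, indent=2))
```

A runner built on this would call `next_stage` at startup, so an interrupted run resumes at the first unfinished stage instead of regenerating (and re-paying for) earlier ones.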
Default model picks
Read references/model-picks.md for the full matrix and the reasoning. Defaults baked into the pipeline:
- Anchors (text-to-image, photoreal): pixio/flux-pro/v1.1-ultra (7c)
- Anchors (anime/stylized): pixio/gpt-image-1.5 (6c) or pixio/imagen4/ultra (6c)
- Keyframes (character lock + scene compose): pixio/nano-banana-2/edit (7c)
- Animate (image-to-video): pixio/wan/v2.7/image-to-video or pixio/kling-video/v2.5/standard/image-to-video (check current credits)
- Add audio to clip: pixio/video-ops/add-audio (0c; combines tracks only)
Switch models per-project by editing the project's config.json (see examples/example-story.json).
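One way to picture the per-project override is a merge of the defaults with whatever the project's config.json supplies. The sketch below is illustrative only: the role keys (`anchor_photoreal`, `keyframe`, and so on) and the top-level `models` field are hypothetical names, and the real schema is whatever examples/example-story.json defines.

```python
import json
from pathlib import Path

# Model IDs copied from the defaults above; the role keys are hypothetical.
DEFAULT_MODELS = {
    "anchor_photoreal": "pixio/flux-pro/v1.1-ultra",
    "anchor_stylized": "pixio/gpt-image-1.5",
    "keyframe": "pixio/nano-banana-2/edit",
    "animate": "pixio/wan/v2.7/image-to-video",
    "add_audio": "pixio/video-ops/add-audio",
}


def resolve_models(config_path):
    """Merge the defaults with overrides from a project's config.json.

    Assumes a hypothetical {"models": {role: model_id}} layout.
    """
    path = Path(config_path)
    overrides = {}
    if path.exists():
        overrides = json.loads(path.read_text()).get("models", {})
    unknown = set(overrides) - set(DEFAULT_MODELS)
    if unknown:
        # Fail loudly on typos rather than silently ignoring an override.
        raise KeyError(f"unknown model roles: {sorted(unknown)}")
    return {**DEFAULT_MODELS, **overrides}
```

Rejecting unknown roles is a deliberate choice here: a misspelled key would otherwise leave the default model in place without any warning.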
Protocol
When invoked:
- Detect intent: confirm the output type (short film, episode, illustrated story, music video, trailer) and approximate length. Shot count ≈ length ÷ ~5 s per shot.
- Cost preview: estimate credits before any generation. Formula in references/cost-formula.md.
- Run stage 1 (plan): use prompts/story-to-shotlist.md to produce shots.json. Show the user the plan and pause for approval.
- Run stage 2 (bible): use prompts/character-bible.md to produce bible.json. Pause for approval if any character will appear in more than 3 shots.
- Run stages 3–6: invoke scripts/pipeline.ps1 with the project folder. The script is resumable; if a stage fails midway, it picks up from the last successful checkpoint.
- Always save generation contentIds so URLs can be refreshed (signed URLs expire in 1 hour; re-poll /api/v1/generations/{id} to get fresh ones).
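The arithmetic behind the first two protocol steps and the URL-expiry note can be sketched as below. This is not the formula from references/cost-formula.md, which is not reproduced here: it assumes exactly one generation per anchor, keyframe, and clip with no retries, uses the 7c figures from the default model picks, and leaves the per-clip video cost as a parameter because the document says to check current credits. All function names and the 5-minute safety margin are hypothetical.

```python
import math

SIGNED_URL_TTL_SECONDS = 3600  # signed URLs expire in 1 hour


def estimate_shot_count(target_seconds, seconds_per_shot=5.0):
    """Shot count is roughly target length divided by ~5 s per shot."""
    return max(1, math.ceil(target_seconds / seconds_per_shot))


def estimate_credits(num_characters, num_shots, video_credits_per_clip,
                     anchor_credits=7, keyframe_credits=7):
    """Rough lower bound: one anchor per character, one keyframe and one clip per shot."""
    return (num_characters * anchor_credits
            + num_shots * keyframe_credits
            + num_shots * video_credits_per_clip)


def url_needs_refresh(fetched_at, now, safety_margin=300):
    """True when a signed URL is within safety_margin seconds of its 1-hour expiry.

    Both timestamps are seconds, e.g. from time.time().
    """
    return now - fetched_at >= SIGNED_URL_TTL_SECONDS - safety_margin
```

For a 60-second piece with two characters and a hypothetical 20 credits per clip, `estimate_shot_count(60)` gives 12 shots and `estimate_credits(2, 12, 20)` gives 338 credits. Retries and rejected generations push the real figure higher.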
Continuity rules
These are non-negotiable for coherent output. Detailed in references/continuity.md:
- One anchor image per character. Every keyframe edits from that anchor.
- Same anchor + same style suffix appended to every prompt.
- Last frame of clip N is the input to keyframe N+1 (for shot-to-shot motion continuity within a scene).
- Same aspect ratio for the entire project. Set once in config.json.
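The continuity rules above can be read as a small planning step. The sketch below is one interpretation, not the behaviour of the shipped pipeline: it assumes each shot carries hypothetical `id` and `scene` fields, that a keyframe edit can take the anchor plus the previous clip's last frame as inputs, and that last frames are saved under a hypothetical `shots/<id>_last.png` path. The ffmpeg flags used (`-sseof` to seek relative to the end of the input, `-update 1` to keep overwriting a single image file) are standard options.

```python
def build_prompt(scene_prompt, style_suffix):
    """Append the project-wide style suffix to a scene prompt."""
    return f"{scene_prompt.strip()}, {style_suffix.strip()}"


def last_frame_command(clip_path, frame_path):
    """ffmpeg argument list that writes the final frame of a clip to frame_path.

    -sseof -1 starts decoding one second before the end of the input;
    -update 1 overwrites the same output image, so the last frame survives.
    """
    return ["ffmpeg", "-y", "-sseof", "-1", "-i", str(clip_path),
            "-update", "1", str(frame_path)]


def keyframe_inputs(shots, anchor_path):
    """Map each shot id to the images its keyframe edit should start from.

    Every keyframe gets the anchor; shots after the first in a scene also
    get the previous clip's last frame (hypothetical path convention).
    """
    plan = {}
    previous = None
    for shot in shots:
        inputs = [anchor_path]
        if previous is not None and previous["scene"] == shot["scene"]:
            inputs.append(f"shots/{previous['id']}_last.png")
        plan[shot["id"]] = inputs
        previous = shot
    return plan
```

Resetting the chain at scene boundaries keeps motion continuity within a scene without dragging the previous location into a new one.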
Audio stage
references/audio.md covers TTS (per-character voice IDs), music generation, and SFX. The image-to-video models generate silent clips; audio is layered last with ffmpeg or Pixio's add-audio op.
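For the ffmpeg route, the final layering step typically comes down to two commands: concatenate the silent clips, then mux the mixed audio track onto the result. The sketch below only builds the argument lists and does not claim to match what scripts/pipeline.ps1 runs; the helper names are hypothetical. It assumes all clips share codec, resolution, and frame rate (so stream copy is safe) and that file paths contain no single quotes.

```python
from pathlib import Path


def write_concat_list(clip_paths, list_path):
    """Write an ffmpeg concat-demuxer list file, one clip per line, in order.

    Relative paths in the list are resolved relative to the list file itself.
    """
    lines = [f"file '{Path(p).as_posix()}'" for p in clip_paths]
    Path(list_path).write_text("\n".join(lines) + "\n")


def concat_command(list_path, out_path):
    """Join the listed clips without re-encoding."""
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
            "-i", str(list_path), "-c", "copy", str(out_path)]


def mux_command(video_path, audio_path, out_path):
    """Attach one audio track to a silent video, stopping at the shorter stream."""
    return ["ffmpeg", "-y", "-i", str(video_path), "-i", str(audio_path),
            "-map", "0:v:0", "-map", "1:a:0",
            "-c:v", "copy", "-c:a", "aac", "-shortest", str(out_path)]
```

Each list can be passed to `subprocess.run(cmd, check=True)`. If clips come from different models or aspect ratios, stream copy will likely fail or glitch and a re-encode is needed instead.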
Output
Final deliverables in ./projects/<name>/output/:
- final.mp4: full assembly with audio
- shots/: individual clips
- keyframes/: stills (also usable for storyboards, posters, thumbnails)
- bible.json + shots.json: source of truth; edit and rerun any stage
Sharing this skill
references/sharing.md documents how to zip + send, publish to a marketplace, or import on a teammate's machine.
When NOT to use
- Single-image generations: use pixio-skill directly.
- Live-action editing of a user-supplied video: use pixio/video-ops/* directly.
- Anything that needs frame-exact lipsync: current image-to-video models can't hold mouth shapes to a TTS track reliably. Note this limitation to the user up front.