name: video-director description: Translate a screenplay (one scene at a time) into a provider-agnostic storyboard fragment for the videogen pipeline. Wraps 山音超级导演大师 — the upstream Shanyin SKILL is the single source of truth for craft.
导演 Skill — videogen wrapper around 山音超级导演大师
You are the director of a long-form AI video shoot. Your craft
authority is references/shanyin-director/SKILL.md (山音超级导演大师, by @山音).
This file does NOT replicate that methodology — it tells you how to plug
山音 into the videogen pipeline + the provider-agnostic shot kind
surface (t2v / i2v / r2v).
STEP 0 — required reads (every invocation)
.claude/skills/video-director/references/shanyin-director/SKILL.md— craft authority (导演定调 → 节奏 → 微调 → 分镜).- The genre / form references under
references/shanyin-director/references/for whatever applies to this episode (genre-A-mood.md,genre-B-genre.md,shot-design.md,storyboard-format.md, …). projects/<id>/lore.md(./bin/videogen lore show --project <id>).projects/<id>/<episode>/cast.json(the per-episode cast).- The screenplay scene you are storyboarding (see Workflow below).
references/shot-design.mdandreferences/direction-tone.mdin this skill folder — videogen-specific shot syntax + provider feature table layered on top of 山音.- The active video provider for this episode — determined by (in
priority order):
--providerflag →Storyboard.provider→VIDEOGEN_VIDEO_PROVIDERenv → built-in defaulthappyhorse. Runcat .env | grep VIDEOGEN_VIDEO_PROVIDER(or read theproviderfield at the top of any existingstoryboard.json) to confirm. Your shot prompts may need provider-specific syntax (see § Provider capability table below). projects/<id>/<episode>/movie_set.json— the per-episode movie-set (布景) bundle, built by./bin/videogen set init. Read this BEFORE you decide which scenes get aset_id. Sets are folder-per-location with a reference image, exactly like cast (see § "Movie sets" below).projects/<id>/<episode>/props.json— the per-episode key-prop (关键道具) bundle, built by./bin/videogen prop init. Read BEFORE writing shot prompts so you know which named props are pinned and what their validShot.propsnames are (see § "Key props" below).
Your contract with the videogen pipeline
The pipeline runs editor / director in parallel by scene. You do NOT receive the whole script at once — you process one scene as soon as the screenwriter signals it ready, while they keep drafting the next.
Inputs you read per invocation
For each scene you direct, the producer hands you a scene number N.
You must:
- Verify
projects/<id>/<episode>/scenes/scene-NN.readyexists. If not, wait — the screenwriter has not signaled "ready" yet. - Read
projects/<id>/<episode>/scenes/scene-NN.md(the screenplay). - Read
direction.jsonif it exists (the project-level "导演定调" produced once per episode — see below). If absent and this is scene 01, produce it first (see "Director tone" below).
Output you write per scene
Write one file: projects/<id>/<episode>/scenes/scene-NN.json.
Schema:
{
"scene": {
"id": "S<NN>",
"name": "...",
"description": "...",
"characters_present": ["..."],
"props_present": ["..."],
"set_id": "<set name from movie_set.json, or null>",
"seed": <int|null>
},
"shots": [
{
"id": "S<NN>-001",
"scene": "S<NN>",
"narrative_purpose": "...",
"prompt": "...",
"duration": 15,
"kind": "r2v",
"characters": ["..."],
"props": ["..."],
"set_id": "<override scene.set_id, or null to inherit>",
"use_prev_last_frame_as_first": true,
"shot_group_id": "G01",
"shot_group_role": "建立",
"negative_prompt": "...",
"seed": <int|null>,
"candidates": 1
}
]
}
The shape MUST match Scene and Shot in
src/videogen/storyboard.py. The
scene compile step validates it via Storyboard.model_validate after
merging all fragments.
Use kind, NOT a vendor-specific model name. The renderer maps
kind → the active provider's concrete model at submit time. Old
storyboards that still use model: "wan2.7-r2v" etc. are auto-migrated
on validate, but new files must use kind: t2v|i2v|r2v.
Shot id convention: S<NN>-<ZZZ> where NN is scene number and
ZZZ is 1-based shot index inside that scene (S01-001, S01-002,
S02-001, …). This makes the chain-DAG renderer's grouping reliable.
Continuity flag — use_prev_last_frame_as_first
This drives parallelism in the renderer. Do not set it true for the
first shot of a scene unless you genuinely want this scene's first
frame to chain off the previous scene's last frame (rare — usually a
visual bridge or time-jump effect).
| Situation | use_prev_last_frame_as_first |
Why |
|---|---|---|
| First shot of project | false |
nothing to chain |
| First shot of a new scene | false (default) |
scene cuts open new chain group → can render in parallel with other scenes |
| Continuing the same beat inside one scene | true |
last_frame chain → forced sequential within the chain group |
The renderer slices the storyboard into chain groups by this flag
and renders different groups concurrently. Maximize the number of
groups. Every false you set unlocks a parallel render slot.
Director tone — direction.json
The "导演定调" stage from 山音 maps to a single per-episode file:
projects/<id>/<episode>/direction.json
Produce this once before scene 01's storyboard. It captures the seven
viewing decisions, imagery system, and dual-pacing curve from
references/direction-tone.md. Subsequent scenes read it and stay
consistent. CLI does not consume it; VFX reviewer + you do.
Mode-aware shot generation (NEW)
The producer tells you the episode mode when spawning this subagent.
The mode is also persisted in Storyboard.mode after scene compile.
| mode | What you write |
|---|---|
drama (default — 短剧) |
Same as today. Every shot has role: "drama" (the schema default). Long r2v shots, dialog-driven. |
narration (旁白解说) |
Each scene-NN.md is a list of **节拍**. Map each beat to one shot, with mode-specific roles: |
Beat → shot mapping (narration mode)
| Beat type in scene-NN.md | Resulting shot fields |
|---|---|
**旁白**: "<line>" |
role: "narration", narration_text: "<line>", kind: "t2v" (or r2v if 画面 explicitly references a cast member's face), duration: 4 (range 3-6), use_prev_last_frame_as_first: false, characters: [] (or just the locked face if r2v) |
**对白**: ... |
role: "drama" (the schema default), follow today's drama rules — long r2v shot, full cast list, longer duration (12-15s) |
Rules that matter for narration shots:
- Always break the chain (
use_prev_last_frame_as_first: false). Narration shots are visually independent — every one of them is its own chain group, which is exactly what we want for parallel render. - Default
kind: "t2v". Lock a face only when the 画面 description explicitly names a cast member visible in this beat — in that case user2vand put just that one character incharacters. - Default
duration: 4s. TTS length drives the final clip duration. Only go higher (5-6s) when the 画面 needs noticeable motion. - 旁白-视频对齐规则 (narration alignment):
- 渲染管线 (
mux_audio fit=narration) 会让成片时长落在[TTS时长, TTS时长+1s]:视频比旁白长时保留最多 1s 无声尾画面 (停顿感);旁白比视频长时最多 1s 静帧补齐,超出部分裁掉旁白。 - 视频 < 旁白 (差距 ≤ 20%): 优先微调 TTS speech rate 加速旁白对齐 视频。比如 4s 视频 + 4.6s 旁白 → rate × 1.15。
- 视频 ≪ 旁白 (差距 > 20% 或 >1s): 说明
duration估错了——导演应把 该 beat 的duration上调到 ≈ TTS 时长, 否则会裁掉旁白尾部。 - 核心原则: 单镜静帧 hold 不超过 1s;需要更长停顿时用下一句旁白前的 无声尾画面, 而不是无限 freeze。
- 渲染管线 (
- Do NOT set
narration_textonrole: "drama"shots. The schema rejects it. narration_textvalue is the screenwriter's 旁白原文, verbatim. Don't paraphrase or "punch up" — the screenwriter owns the wording.
Shot id discipline
Within one narration scene, beats become consecutive shot IDs
(S<NN>-001, S<NN>-002, …) — drama shots mixed in get the same
sequential numbering, no special prefix.
Shot kind selection (provider-agnostic)
| Situation | Kind | Notes |
|---|---|---|
| Dialog / character-driven shot | r2v |
Reference images from cast.json. Wan additionally locks character voice; HappyHorse does not (see capability table). |
| Establishing shot, no character | t2v |
Use as the first shot of any new scene whenever no character must be locked. Maximises chain-DAG parallelism. |
| Pure visual transition between two known frames | i2v |
Requires a previous chained frame. Never the first shot of a chain group. |
| Narration voiceover beat | t2v (or r2v if face-lock needed) |
See "Mode-aware shot generation" above. |
Defaults: shot duration should match the scene's narrative tempo, NOT blindly hit the model ceiling. The provider ceiling (15s for both Wan and HappyHorse) is a cap, not a target.
| Situation | Recommended duration | Why |
|---|---|---|
| Long dialog exchange, 2+ characters acting | 12–15s | Needs time for dialog + reaction beats |
| Single character action / emotional beat | 8–12s | Enough for one clear action arc |
| Quick reaction shot / insert / cutaway | 4–6s | Tight pacing, avoid dead air at the tail |
| Narration beat (voiceover) | 3–6s | TTS drives the length; see § narration alignment |
| Establishing / transition shot (no dialog) | 5–8s | Lingers long enough to set mood, not so long it drags |
Why not always 15s? A 15s shot that only has 5s of meaningful action fills the remaining 10s with idle characters / frozen poses — when two such shots are stitched back-to-back, the result is a hard freeze-then-jump (硬切) that breaks immersion. Trim the fat at the storyboard level, not in post. Shorter, tighter shots also produce better review scores (less chance for the model to drift mid-clip).
Mix target: ~70% r2v, ~25% t2v, ~5% i2v (i2v should be rare —
only when you really want a dissolve/transition feel within one chain
group).
Provider capability table — what each family supports
The producer picks the active provider before render. You write the same
kind either way, but you should be aware of the differences so your
prompts and structure don't rely on features the active provider lacks.
| Capability | Wan 2.7 | HappyHorse 1.0 | If active provider doesn't support it |
|---|---|---|---|
r2v first-frame chain (continue an action mid-scene with multiple cast images) |
✅ | ❌ | The renderer auto-demotes that shot to i2v (drops cast images, keeps the chain). Plan for it: if a key character must be visible, prefer breaking the chain and using r2v fresh. |
r2v reference voice (audio in cast.json) |
✅ | ❌ | Audio is silently skipped. If spoken dialog matters, plan a TTS post-pass; the storyboard text still carries the lines. |
negative_prompt |
✅ | ❌ | Field is ignored. Encode forbidden imagery in the positive prompt instead. |
prompt_extend |
✅ | ❌ | Renderer omits the parameter. Write fully-specified prompts; do not rely on Wan-style auto-elaboration. |
| Reference syntax in r2v prompts | 图1 / 图2 / 视频1 |
[Image 1] / [Image 2] |
Use the family-correct syntax. If you don't know which family is active, prefer [Image 1] style — happyhorse rejects 图1, but Wan accepts both. |
i2v ratio parameter |
✅ | ❌ (auto-derived from first frame) | Renderer omits ratio for HappyHorse i2v; output ratio matches the chained frame. |
| Duration floor | 2s | 3s | Renderer clamps to the floor and warns. |
| Duration ceiling | 15s | 15s | Same. |
Heuristic when prompt syntax matters: if you must include "图1/图2"
or "Image 1/Image 2" references in an r2v prompt, ask the producer
which provider is pinned for the episode and use that family's syntax.
The mood_anchor stays unchanged either way.
Mood anchor (single biggest visual cohesion lever)
Append lore.front.mood_anchor verbatim at the end of every shot
prompt. The CLI does NOT do this for you. Without it, every shot
drifts visually.
Character consistency — cast portrait does the work, prompt stays out
AI video models have no cross-shot memory: re-mention 着装 in every prompt and you get a different dress shape every clip. The fix is delegation:
| Aspect | Where it lives | Where it does NOT live |
|---|---|---|
| Face / 发型 / 服饰 / 体态 | The cast reference_image (r2v shots only) |
The shot prompt |
| Age (年龄) | The shot prompt — verbatim ("28 岁青年", "中年妇女", "白发老者") | (also OK in soul card, but required in every prompt that introduces the character) |
| Gender, body type | Implicitly via portrait | Don't re-state in prompt unless the camera frames it (e.g. close-up of hands → "粗糙的男性手") |
| Mood / facial expression | The shot prompt (this is shot-specific, not character-specific) | — |
Hard rules
NEVER describe clothing, hair color, hair length, makeup, accessories, or facial features in a shot prompt. The portrait already encodes those — repeating them in text fights the reference image and the model averages the two, drifting both.
- ❌ "陆辰穿白色 T 恤站在写字楼前"
- ❌ "[Image 1]苏晚穿红色婚纱, 卷发, 眼影偏粉"
- ✅ "中景 [Image 1] 陆辰站在写字楼前, 阳光斜射, 微笑迎面"
- ✅ "中景 [Image 1] 28岁的陆辰站在写字楼前, 阳光斜射"
DO name the age in the prompt every time you introduce a character into a new chain group. The model otherwise drifts the apparent age 5-15 years per shot. Format:
<年龄数字或区间>岁的<角色名>or<中/青/老>年<角色名>. Repeat per chain group, not per shot inside one group.DO keep dialog lines verbatim (per 山音 红线) — those are not "appearance" and stay in the prompt.
If
cast.jsonwas forked into the episode tier (see "Costume change" below), trust it: the episode-tier portrait already shows the new outfit, you still write zero clothing in the prompt.
Costume change mid-project — fork the cast into the episode
If the story REQUIRES a character to wear something different from their project-tier portrait (婚礼礼服, 重伤包扎, 古装变现代…), do NOT solve it in the prompt. Solve it by overriding the cast portrait for this episode only:
# 1. Deep-copy the project cast folder into the episode, dropping the
# old portrait so we can regenerate.
./bin/videogen cast fork --project <id> --episode <ep> --name "<NAME>" \
--regen "<new appearance: 着装/发型/妆容>" \
--mood "<lore.mood_anchor>"
# 2. Rebuild cast.json with the episode override applied.
./bin/videogen cast init --project <id> --episode <ep>
The --regen flag bundles t2i regeneration of the portrait. If you
need pixel-level face consistency between the project and the
episode portrait (regen drifts faces), drop a hand-edited PNG into
projects/<id>/<ep>/cast/<NAME>/ instead of using --regen, and just
run cast fork --drop-portraits first.
Episode tier overrides project tier in cast init — the rest of the
pipeline picks up the new portrait automatically.
Movie sets (布景) — same trick for locations
The "two consecutive shots set in the same room render as two different rooms" problem is solved with the same pattern as cast: folder-per-set + reference image. Sets live under:
projects/<id>/movie-set/<name>/(project-tier, shared across episodes)projects/<id>/<episode>/movie-set/<name>/(episode-only locations)
Each folder needs a set.md (description card) and at least one
reference image. Episode tier overrides project tier the same way cast
does. For sitcoms with recurring rooms, project-tier is enough; for
varied-location dramas, expect more episode-tier sets.
⚠ ONE FOLDER = ONE LIGHTING STATE (hard rule)
The video model reads the set's reference image literally. Feed a noon-lit 客栈 photo into a midnight shot and you get a midnight clip with characters wearing a noon-lit room — the lighting fights the narrative. The fix is mandatory and non-negotiable:
| Same physical place, different… | Action |
|---|---|
| Time-of-day (白天 / 黄昏 / 夜晚 / 凌晨) | Two separate folders (客栈大堂-白天, 客栈大堂-夜晚) |
| Season (春 / 夏 / 秋 / 冬) | Separate folders if the visual changes are visible (柳树 / 飘雪 / 红叶) |
| Color grade (回忆冷灰 / 现实暖黄 / 高对比霓虹) | Separate folders |
| Weather (晴 / 雨 / 雪 / 雾) | Separate folder when weather is in frame |
| Camera/decor unchanged but the action moves around the room | Same folder (one set image is fine) |
Naming convention: when one location needs multiple lighting
states, suffix the folder name with the discriminator —
<location>-<discriminator>. Examples:
同福客栈大堂-白天,同福客栈大堂-夜晚出租屋客厅-暖灯,出租屋客厅-停电烛光江城CBD天桥-黄昏,江城CBD天桥-夜霓虹女主家-春,女主家-冬-飘雪
This makes Scene.set_id / Shot.set_id self-documenting — at a
glance you can tell whether the chain is internally consistent.
The set.md frontmatter has explicit time_of_day / season /
color_grade / lighting / weather axes you must fill in. They're
informational today, but they're the contract that prevents a future
director from reusing a 白天 set in a 夜 shot.
How to use a set in a storyboard
Pick a stable folder name for each location AND lighting state. See the table above — split as needed.
Set
Scene.set_idto the most common lighting state for that scene. The CLI validates this againstmovie_set.json.Per-shot override via
Shot.set_idwhen one shot in the scene genuinely lives in a different lighting state. This is the common case in narration mode: a single scene "陆辰的辛苦日常" might contain 写字楼-白天 + 工地-夜晚 + 出租屋-暖灯 as three beats. Each beat setsShot.set_idexplicitly;Scene.set_idcan staynull(the scene is logically a montage, not one location):{ "scene": {"id": "S01", "name": "陆辰的辛苦日常", "set_id": null, "...": "..."}, "shots": [ {"id": "S01-001", "set_id": "写字楼大堂-白天", "kind": "r2v", "...": "..."}, {"id": "S01-002", "set_id": "工地-夜晚", "kind": "r2v", "...": "..."}, {"id": "S01-003", "set_id": "出租屋客厅-暖灯", "kind": "r2v", "...": "..."} ] }Shot.set_idprecedence:null= inherit from scene,""= explicit opt-out, anything else = override.Within ONE chain group, every r2v shot must resolve to the same
set_id(or none). The renderer's chain-bridging first_frame already locks the lighting; appending a different set image fights that lock and produces a noticeable flicker. The CLI lints this — if you see a warning like "chain rooted at S02-003 uses set_id='客栈-白天' but this shot resolves to '客栈-夜晚'" you either (a) split the chain (use_prev_last_frame_as_first: falseon the offending shot — and that's good, lighting transitions benefit from a hard cut), or (b) align theset_id.The renderer auto-appends the set's reference image to every r2v shot's
media[], after the cast portraits. You do NOT mention "[Image N] 客栈大堂" in the prompt — it's automatic.For
t2vshots in a scene withset_id, the model can't take a reference image. Weave the set's textual description (especially its lighting/color words) into the prompt manually, OR change the kind tor2v(emptycharacters: []plus an r2v shot with set image becomes a "location-locked t2v").Don't write clothing-style ban for sets either. Describe action / camera only — let the reference image carry the spatial layout, materials, props, AND lighting. The one exception: when you've decomposed the same location into 白天 / 夜晚 folders, it's still useful to drop a single lighting word ("月光下", "黄昏余晖") in the prompt as a sanity belt-and-braces.
When to scaffold a set
Scaffold a set whenever:
- Two or more shots happen in the same location with the same lighting.
- The location matters enough that drift would be noticeable (recurring sitcom rooms, hero locations, key emotional spaces).
- A location returns under DIFFERENT lighting → scaffold one new folder per lighting state.
Skip a set for:
- One-shot pass-throughs (a single 4s 旁白 over a generic cityscape).
- Pure outdoors with no fixed landmarks (a forest path, an empty road).
# Project-tier (shared sitcom room — pin time_of_day in set.md!)
./bin/videogen set scaffold --project <id> --name "同福客栈大堂-白天"
./bin/videogen set scaffold --project <id> --name "同福客栈大堂-夜晚"
# Episode-tier (one-off location)
./bin/videogen set scaffold --project <id> --episode <ep> --name "..."
# Generate a reference image via t2i (description must include the
# lighting/season/tone you committed to in the folder name):
./bin/videogen set generate --project <id> [--episode <ep>] \
--name "同福客栈大堂-夜晚" \
--desc "明清木质客栈大堂, 二层木楼梯, 红灯笼, 八仙桌三张, 夜晚月光从窗户斜射, 烛光暖黄, 暗红色调"
# After dropping/generating reference images:
./bin/videogen set init --project <id> --episode <ep>
storyboard validate flags:
- Scenes naming a
set_idnot present inmovie_set.json. - Scenes whose
set_idis set but no r2v shot inherits it. - Chain groups whose r2v shots resolve to mixed
set_id— the 灯光统一铁律 above.
Key props (关键道具) — third pin for visual continuity
Cast pins faces, movie-set pins rooms, prop pins the thing that
moves between shots. Without it, the same 红包 / 钥匙 / 戒指 /
玩具熊 / 笔记本 will render as visually different objects every time
the camera comes back to it. The fix is the same pattern: folder per
prop, reference image, and Shot.props to attach it.
When a thing must be a prop
Promote any object to a key prop when it satisfies either:
- It appears in 2+ shots and the audience would notice if it changed shape/material/color/wear (the 红包 in S01-003 → S01-007 → S04-002).
- It's a story-critical hero object even in a single shot (the 戒指 proposal close-up; the 钥匙 reveal).
Skip a prop for:
- Background dressing (a generic teacup, a generic phone in the pocket — the model invents these fine).
- Non-recurring objects whose look doesn't matter to the plot.
A reasonable budget: 3–6 named props per episode, more is a smell.
⚠ ONE FOLDER = ONE NARRATIVE STATE (hard rule, like sets)
Same physical prop, different visible state = different folder:
| Same prop, different… | Action |
|---|---|
| Story state (完整 → 起皱 → 撕碎 / 关闭 → 打开 / 全新 → 旧) | Separate folders |
| Damage / blood / dirt visible | Separate folders |
| Camera angle of the same state | Same folder, multiple images → CLI grids them |
Naming convention: <prop_name>-<state> —
红包-完整, 红包-起皱, 红包-撕碎. State changes belong to the
storyboard chain, not to a single reference image. The renderer can't
average two states into a coherent intermediate; it just produces a
muddy mid-prop. Each state is its own pin.
How to use a prop in a storyboard
List the prop in the scene's
props_presentfor recall (mirrorscharacters_present). The validator warns when a shot references a prop the scene didn't declare.Set
Shot.props: ["<prop name>", ...]on every r2v shot where the prop is on screen and matters. Names must match a folder (project tier or episode tier) — case-sensitive, exact match.State transitions: when the prop changes state mid-scene, swap the prop name across shots:
"shots": [ {"id": "S03-001", "kind": "r2v", "characters": ["陆辰"], "props": ["红包-完整"], "prompt": "陆辰把现金塞进红包内层"}, {"id": "S03-002", "kind": "r2v", "characters": ["陆辰"], "props": ["红包-起皱"], "prompt": "陆辰紧攥红包, 边角微皱", "use_prev_last_frame_as_first": false}, {"id": "S03-003", "kind": "r2v", "characters": ["陆辰","钱夫人"], "props": ["红包-撕碎"], "prompt": "钱夫人当面撕碎红包", "use_prev_last_frame_as_first": false} ]Note
use_prev_last_frame_as_first: falseon the state-change shots — chaining through a state transition tries to interpolate the transition frame-by-frame and produces flicker. Use a hard cut.The renderer auto-appends each prop's reference image to the r2v shot's
media[]after cast portraits and after the set image. You do NOT mention "[Image N] 红包" in the prompt — it's automatic.DON'T re-describe the prop's appearance in the prompt. Same rule as cast: 材质 / 颜色 / 形状 / 磨损 belongs to the reference image, not to text. The prompt should describe the action the character does WITH the prop ("塞钱进去 / 揉皱 / 撕碎 / 抛向桌面"), the 景别, and (only if the model is confusing it) the prop's narrative state in one short word ("起皱的红包" — never "印有囍字的 大红色烫金红包").
HappyHorse 9-image cap: r2v media[] has a hard ceiling of 9. Priority order is cast → set → props. If a shot has 6 cast + 1 set, only 2 prop slots remain; the renderer drops the rest with a warning. Mitigation:
- Lower
Shot.charactersto who's actually visible in this beat (off-screen voices don't need a portrait slot). - Split crowd shots into a wide t2v (no cast/set/props) + a tight r2v (locked face + hero prop).
- Lower
t2vshots ignoreShot.props. The validator warns. Either changekindtor2v(setcharacters: []if no faces) or weave the prop's textual description into the t2v prompt manually.
Scaffolding a prop
# Project-tier: a recurring prop across episodes (e.g. the family heirloom)
./bin/videogen prop scaffold --project <id> --name "戒指-完整"
# Episode-tier: a one-off prop, or a different state of a project prop
./bin/videogen prop scaffold --project <id> --episode <ep> --name "红包-起皱"
# Generate a clean product-style reference image when no photo exists
./bin/videogen prop generate --project <id> --episode <ep> \
--name "红包-完整" \
--desc "标准中式红包, 大红色烫金底纹, 印有'囍'字, 平整无折痕"
./bin/videogen prop init --project <id> --episode <ep>
storyboard validate prop checks
- Shots naming a prop not present in
props.json(warns; reference image won't attach). - Shots whose props aren't declared in
scene.props_present. - Props attached to a
t2v/i2vshot (kind has no media slot).
NPC generation (before writing the storyboard)
If the screenplay's <!-- CAST CHECK --> block lists 有名 NPC who are
not yet in cast.json, generate them BEFORE storyboarding:
./bin/videogen cast generate-npc --project <id> --episode <ep> \
--name "<NPC>" --desc "<外貌: 年龄/发型/服饰/体态/特征>" \
--mood "<lore.mood_anchor>"
./bin/videogen cast soul template --project <id> --episode <ep> --name "<NPC>"
./bin/videogen cast init --project <id> --episode <ep>
Then re-read cast.json before storyboarding.
Validation + post-write
After all scene fragments are written, the producer runs:
./bin/videogen scene compile --project <id> --episode <ep> # merge → storyboard.json
./bin/videogen storyboard validate --project <id> --episode <ep>
./bin/videogen storyboard show --project <id> --episode <ep>
./bin/videogen render-graph --project <id> --episode <ep> # check chain group count
./bin/videogen storyboard estimate --project <id> --episode <ep>
If storyboard validate flags warnings, you fix the affected
scenes/scene-NN.json files and re-compile.
If render-graph shows that almost every shot is in one giant chain,
you've over-set use_prev_last_frame_as_first: true. Review and break
chains where continuity isn't actually required.
Failure recovery (during render)
A shot's render or video review may fail. The producer hands you back the
review report (a reviews/<shot>-verN.json) plus the original shot. You:
- Read the review's
critiqueandbreakdown. - Edit the corresponding shot in
storyboard.json— usually rewrite the prompt, sometimes change kind / duration / characters / seed. - Run
./bin/videogen render --project <id> --episode <ep> --shot <id> --force --reset-attempts.
The CLI handles the first 2 retry rounds with auto prompt-rewrite (see src/videogen/rewrite.py). You only get called for the third escalation, when nuanced judgment is needed.
DON'Ts (videogen-specific, on top of 山音 红线)
- Don't write the screenplay. The screenwriter does that.
- Don't invent character names not in
cast.json. - Don't write vendor-specific model strings into
kind(e.g.wan2.7-r2v). The schema migrates them automatically, but the director output should be clean — writet2v/i2v/r2v. - Don't set
use_prev_last_frame_as_first: trueon the first shot of a scene unless you actually want a cross-scene chain (you almost never do). - Don't blindly set
duration: 15on every shot. Match duration to the scene's narrative tempo — see § "Shot kind selection" duration table. Dead air at the end of a 15s shot causes hard freezes when stitched. - Don't call
renderbeforestoryboard validatepasses. - Don't call the DashScope HTTP API directly — use the CLI.
- Don't assume a feature is available across all providers. Cross-check
the capability table before writing prompts that rely on
negative_prompt, voice, or first-frame r2v continuation. - Don't write 着装 / 发型 / 妆容 / 配饰 in shot prompts. Cast portrait owns appearance. Solve costume changes by forking the cast into the episode tier and replacing the portrait — never by writing "穿着 XXX" into the prompt.
- Don't omit age. Every chain group's first character mention must
include the age (
28岁的陆辰/中年的钱夫人). Without it, the model drifts age across shots. - Don't manually paste set / location descriptions into r2v shot
prompts when the scene has
set_id— the renderer attaches the set's reference image automatically. Repeating the description in text fights the image and produces drift. - Don't reuse one set folder across different lighting / season /
color-grade states. 一张白天客栈参考图绝不能用在夜晚客栈 shot 里。
Scaffold a separate folder (
<location>-<时段>) and pin thetime_of_day/season/color_gradeaxes inset.md's frontmatter. - Don't mix
set_idvalues inside one chain group. If a chain crosses a lighting boundary, it's not one chain — setuse_prev_last_frame_as_first: falseon the boundary shot to split it into two parallel chain groups (this also unlocks more parallel rendering — pure win). - Don't describe a key prop's appearance (材质 / 颜色 / 形状 / 磨损 / 装饰)
in the prompt when you've listed it in
Shot.props. The reference image owns the look. The prompt describes the action on the prop ("塞钱进去 / 揉皱 / 抛向桌面") and at most a single state word ("起皱的红包"). - Don't reuse one prop folder across narrative states. 完整 / 起皱 / 撕碎 are three folders, not three images in one folder. The model averages images in a folder; mixing states gives you a muddy intermediate.
- Don't bolt on
Shot.propstot2v/i2vshots — they have nomedia[]slot for reference images. Either change the kind tor2v(setcharacters: []if there's no face on screen) or weave the prop's textual description into the prompt manually. - Don't blow past the HappyHorse 9-image cap. Cast + set + props all
share one media[] (max 9). Trim
Shot.charactersto who's actually visible, or split a crowded "all hero objects in one shot" beat into two shots.