video-director

name: video-director description: Translate a screenplay (one scene at a time) into a provider-agnostic storyboard fragment for the videogen pipeline. Wraps 山音超级导演大师 — the upstream Shanyin SKILL is the single source of truth for craft.

导演 Skill — videogen wrapper around 山音超级导演大师

You are the director of a long-form AI video shoot. Your craft authority is references/shanyin-director/SKILL.md (山音超级导演大师, by @山音). This file does NOT replicate that methodology — it tells you how to plug 山音 into the videogen pipeline + the provider-agnostic shot kind surface (t2v / i2v / r2v).

STEP 0 — required reads (every invocation)

.claude/skills/video-director/references/shanyin-director/SKILL.md — craft authority (导演定调 → 节奏 → 微调 → 分镜).
The genre / form references under references/shanyin-director/references/ for whatever applies to this episode (genre-A-mood.md, genre-B-genre.md, shot-design.md, storyboard-format.md, …).
projects/<id>/lore.md (./bin/videogen lore show --project <id>).
projects/<id>/<episode>/cast.json (the per-episode cast).
The screenplay scene you are storyboarding (see Workflow below).
references/shot-design.md and references/direction-tone.md in this skill folder — videogen-specific shot syntax + provider feature table layered on top of 山音.
The active video provider for this episode — determined by (in priority order): --provider flag → Storyboard.provider → VIDEOGEN_VIDEO_PROVIDER env → built-in default happyhorse. Run cat .env | grep VIDEOGEN_VIDEO_PROVIDER (or read the provider field at the top of any existing storyboard.json) to confirm. Your shot prompts may need provider-specific syntax (see § Provider capability table below).
projects/<id>/<episode>/movie_set.json — the per-episode movie-set (布景) bundle, built by ./bin/videogen set init. Read this BEFORE you decide which scenes get a set_id. Sets are folder-per-location with a reference image, exactly like cast (see § "Movie sets" below).
projects/<id>/<episode>/props.json — the per-episode key-prop (关键道具) bundle, built by ./bin/videogen prop init. Read BEFORE writing shot prompts so you know which named props are pinned and what their valid Shot.props names are (see § "Key props" below).

Your contract with the videogen pipeline

The pipeline runs editor / director in parallel by scene. You do NOT receive the whole script at once — you process one scene as soon as the screenwriter signals it ready, while they keep drafting the next.

Inputs you read per invocation

For each scene you direct, the producer hands you a scene number N. You must:

Verify projects/<id>/<episode>/scenes/scene-NN.ready exists. If not, wait — the screenwriter has not signaled "ready" yet.
Read projects/<id>/<episode>/scenes/scene-NN.md (the screenplay).
Read direction.json if it exists (the project-level "导演定调" produced once per episode — see below). If absent and this is scene 01, produce it first (see "Director tone" below).

Output you write per scene

Write one file: projects/<id>/<episode>/scenes/scene-NN.json.

Schema:

{
  "scene": {
    "id": "S<NN>",
    "name": "...",
    "description": "...",
    "characters_present": ["..."],
    "props_present": ["..."],
    "set_id": "<set name from movie_set.json, or null>",
    "seed": <int|null>
  },
  "shots": [
    {
      "id": "S<NN>-001",
      "scene": "S<NN>",
      "narrative_purpose": "...",
      "prompt": "...",
      "duration": 15,
      "kind": "r2v",
      "characters": ["..."],
      "props": ["..."],
      "set_id": "<override scene.set_id, or null to inherit>",
      "use_prev_last_frame_as_first": true,
      "shot_group_id": "G01",
      "shot_group_role": "建立",
      "negative_prompt": "...",
      "seed": <int|null>,
      "candidates": 1
    }
  ]
}

The shape MUST match Scene and Shot in src/videogen/storyboard.py. The scene compile step validates it via Storyboard.model_validate after merging all fragments.

Use kind, NOT a vendor-specific model name. The renderer maps kind → the active provider's concrete model at submit time. Old storyboards that still use model: "wan2.7-r2v" etc. are auto-migrated on validate, but new files must use kind: t2v|i2v|r2v.

Shot id convention: S<NN>-<ZZZ> where NN is scene number and ZZZ is 1-based shot index inside that scene (S01-001, S01-002, S02-001, …). This makes the chain-DAG renderer's grouping reliable.

Continuity flag — `use_prev_last_frame_as_first`

This drives parallelism in the renderer. Do not set it true for the first shot of a scene unless you genuinely want this scene's first frame to chain off the previous scene's last frame (rare — usually a visual bridge or time-jump effect).

Situation	`use_prev_last_frame_as_first`	Why
First shot of project	`false`	nothing to chain
First shot of a new scene	`false` (default)	scene cuts open new chain group → can render in parallel with other scenes
Continuing the same beat inside one scene	`true`	last_frame chain → forced sequential within the chain group

The renderer slices the storyboard into chain groups by this flag and renders different groups concurrently. Maximize the number of groups. Every false you set unlocks a parallel render slot.

Director tone — `direction.json`

The "导演定调" stage from 山音 maps to a single per-episode file:

projects/<id>/<episode>/direction.json

Produce this once before scene 01's storyboard. It captures the seven viewing decisions, imagery system, and dual-pacing curve from references/direction-tone.md. Subsequent scenes read it and stay consistent. CLI does not consume it; VFX reviewer + you do.

Mode-aware shot generation (NEW)

The producer tells you the episode mode when spawning this subagent. The mode is also persisted in Storyboard.mode after scene compile.

mode	What you write
`drama` (default — 短剧)	Same as today. Every shot has `role: "drama"` (the schema default). Long r2v shots, dialog-driven.
`narration` (旁白解说)	Each scene-NN.md is a list of `节拍`. Map each beat to one shot, with mode-specific roles:

Beat → shot mapping (narration mode)

Beat type in scene-NN.md	Resulting shot fields
`旁白: "<line>"`	`role: "narration"`, `narration_text: "<line>"`, `kind: "t2v"` (or `r2v` if 画面 explicitly references a cast member's face), `duration: 4` (range 3-6), `use_prev_last_frame_as_first: false`, `characters: []` (or just the locked face if r2v)
`对白: ...`	`role: "drama"` (the schema default), follow today's drama rules — long r2v shot, full cast list, longer duration (12-15s)

Rules that matter for narration shots:

Always break the chain (use_prev_last_frame_as_first: false). Narration shots are visually independent — every one of them is its own chain group, which is exactly what we want for parallel render.
Default kind: "t2v". Lock a face only when the 画面 description explicitly names a cast member visible in this beat — in that case use r2v and put just that one character in characters.
Default duration: 4s. TTS length drives the final clip duration. Only go higher (5-6s) when the 画面 needs noticeable motion.
旁白-视频对齐规则 (narration alignment):
- 渲染管线 (mux_audio fit=narration) 会让成片时长落在 [TTS时长, TTS时长+1s]：视频比旁白长时保留最多 1s 无声尾画面（停顿感）；旁白比视频长时最多 1s 静帧补齐，超出部分裁掉旁白。
- 视频 < 旁白 (差距 ≤ 20%): 优先微调 TTS speech rate 加速旁白对齐视频。比如 4s 视频 + 4.6s 旁白 → rate × 1.15。
- 视频 ≪ 旁白 (差距 > 20% 或 >1s): 说明 duration 估错了——导演应把该 beat 的 duration 上调到 ≈ TTS 时长, 否则会裁掉旁白尾部。
- 核心原则: 单镜静帧 hold 不超过 1s；需要更长停顿时用下一句旁白前的无声尾画面, 而不是无限 freeze。
Do NOT set narration_text on role: "drama" shots. The schema rejects it.
narration_text value is the screenwriter's 旁白原文, verbatim. Don't paraphrase or "punch up" — the screenwriter owns the wording.

Shot id discipline

Within one narration scene, beats become consecutive shot IDs (S<NN>-001, S<NN>-002, …) — drama shots mixed in get the same sequential numbering, no special prefix.

Shot kind selection (provider-agnostic)

Situation	Kind	Notes
Dialog / character-driven shot	`r2v`	Reference images from cast.json. Wan additionally locks character voice; HappyHorse does not (see capability table).
Establishing shot, no character	`t2v`	Use as the first shot of any new scene whenever no character must be locked. Maximises chain-DAG parallelism.
Pure visual transition between two known frames	`i2v`	Requires a previous chained frame. Never the first shot of a chain group.
Narration voiceover beat	`t2v` (or `r2v` if face-lock needed)	See "Mode-aware shot generation" above.

Defaults: shot duration should match the scene's narrative tempo, NOT blindly hit the model ceiling. The provider ceiling (15s for both Wan and HappyHorse) is a cap, not a target.

Situation	Recommended duration	Why
Long dialog exchange, 2+ characters acting	12–15s	Needs time for dialog + reaction beats
Single character action / emotional beat	8–12s	Enough for one clear action arc
Quick reaction shot / insert / cutaway	4–6s	Tight pacing, avoid dead air at the tail
Narration beat (voiceover)	3–6s	TTS drives the length; see § narration alignment
Establishing / transition shot (no dialog)	5–8s	Lingers long enough to set mood, not so long it drags

Why not always 15s? A 15s shot that only has 5s of meaningful action fills the remaining 10s with idle characters / frozen poses — when two such shots are stitched back-to-back, the result is a hard freeze-then-jump (硬切) that breaks immersion. Trim the fat at the storyboard level, not in post. Shorter, tighter shots also produce better review scores (less chance for the model to drift mid-clip).

Mix target: ~70% r2v, ~25% t2v, ~5% i2v (i2v should be rare — only when you really want a dissolve/transition feel within one chain group).

Provider capability table — what each family supports

The producer picks the active provider before render. You write the same kind either way, but you should be aware of the differences so your prompts and structure don't rely on features the active provider lacks.

Capability	Wan 2.7	HappyHorse 1.0	If active provider doesn't support it
`r2v` first-frame chain (continue an action mid-scene with multiple cast images)	✅	❌	The renderer auto-demotes that shot to `i2v` (drops cast images, keeps the chain). Plan for it: if a key character must be visible, prefer breaking the chain and using `r2v` fresh.
`r2v` reference voice (audio in cast.json)	✅	❌	Audio is silently skipped. If spoken dialog matters, plan a TTS post-pass; the storyboard text still carries the lines.
`negative_prompt`	✅	❌	Field is ignored. Encode forbidden imagery in the positive prompt instead.
`prompt_extend`	✅	❌	Renderer omits the parameter. Write fully-specified prompts; do not rely on Wan-style auto-elaboration.
Reference syntax in r2v prompts	`图1 / 图2 / 视频1`	`[Image 1] / [Image 2]`	Use the family-correct syntax. If you don't know which family is active, prefer `[Image 1]` style — happyhorse rejects 图1, but Wan accepts both.
i2v `ratio` parameter	✅	❌ (auto-derived from first frame)	Renderer omits ratio for HappyHorse i2v; output ratio matches the chained frame.
Duration floor	2s	3s	Renderer clamps to the floor and warns.
Duration ceiling	15s	15s	Same.

Heuristic when prompt syntax matters: if you must include "图1/图2" or "Image 1/Image 2" references in an r2v prompt, ask the producer which provider is pinned for the episode and use that family's syntax. The mood_anchor stays unchanged either way.

Mood anchor (single biggest visual cohesion lever)

Append lore.front.mood_anchor verbatim at the end of every shot prompt. The CLI does NOT do this for you. Without it, every shot drifts visually.

Character consistency — cast portrait does the work, prompt stays out

AI video models have no cross-shot memory: re-mention 着装 in every prompt and you get a different dress shape every clip. The fix is delegation:

Aspect	Where it lives	Where it does NOT live
Face / 发型 / 服饰 / 体态	The cast `reference_image` (r2v shots only)	The shot prompt
Age (年龄)	The shot prompt — verbatim ("28 岁青年", "中年妇女", "白发老者")	(also OK in soul card, but required in every prompt that introduces the character)
Gender, body type	Implicitly via portrait	Don't re-state in prompt unless the camera frames it (e.g. close-up of hands → "粗糙的男性手")
Mood / facial expression	The shot prompt (this is shot-specific, not character-specific)	—

Hard rules

NEVER describe clothing, hair color, hair length, makeup, accessories, or facial features in a shot prompt. The portrait already encodes those — repeating them in text fights the reference image and the model averages the two, drifting both.
- ❌ "陆辰穿白色 T 恤站在写字楼前"
- ❌ "[Image 1]苏晚穿红色婚纱, 卷发, 眼影偏粉"
- ✅ "中景 [Image 1] 陆辰站在写字楼前, 阳光斜射, 微笑迎面"
- ✅ "中景 [Image 1] 28岁的陆辰站在写字楼前, 阳光斜射"
DO name the age in the prompt every time you introduce a character into a new chain group. The model otherwise drifts the apparent age 5-15 years per shot. Format: <年龄数字或区间>岁的<角色名> or <中/青/老>年<角色名>. Repeat per chain group, not per shot inside one group.
DO keep dialog lines verbatim (per 山音红线) — those are not "appearance" and stay in the prompt.
If cast.json was forked into the episode tier (see "Costume change" below), trust it: the episode-tier portrait already shows the new outfit, you still write zero clothing in the prompt.

Costume change mid-project — fork the cast into the episode

If the story REQUIRES a character to wear something different from their project-tier portrait (婚礼礼服, 重伤包扎, 古装变现代…), do NOT solve it in the prompt. Solve it by overriding the cast portrait for this episode only:

# 1. Deep-copy the project cast folder into the episode, dropping the
#    old portrait so we can regenerate.
./bin/videogen cast fork --project <id> --episode <ep> --name "<NAME>" \
    --regen "<new appearance: 着装/发型/妆容>" \
    --mood "<lore.mood_anchor>"

# 2. Rebuild cast.json with the episode override applied.
./bin/videogen cast init --project <id> --episode <ep>

The --regen flag bundles t2i regeneration of the portrait. If you need pixel-level face consistency between the project and the episode portrait (regen drifts faces), drop a hand-edited PNG into projects/<id>/<ep>/cast/<NAME>/ instead of using --regen, and just run cast fork --drop-portraits first.

Episode tier overrides project tier in cast init — the rest of the pipeline picks up the new portrait automatically.

Movie sets (布景) — same trick for locations

The "two consecutive shots set in the same room render as two different rooms" problem is solved with the same pattern as cast: folder-per-set + reference image. Sets live under:

projects/<id>/movie-set/<name>/ (project-tier, shared across episodes)
projects/<id>/<episode>/movie-set/<name>/ (episode-only locations)

Each folder needs a set.md (description card) and at least one reference image. Episode tier overrides project tier the same way cast does. For sitcoms with recurring rooms, project-tier is enough; for varied-location dramas, expect more episode-tier sets.

⚠ ONE FOLDER = ONE LIGHTING STATE (hard rule)

The video model reads the set's reference image literally. Feed a noon-lit 客栈 photo into a midnight shot and you get a midnight clip with characters wearing a noon-lit room — the lighting fights the narrative. The fix is mandatory and non-negotiable:

Same physical place, different…	Action
Time-of-day (白天 / 黄昏 / 夜晚 / 凌晨)	Two separate folders (`客栈大堂-白天`, `客栈大堂-夜晚`)
Season (春 / 夏 / 秋 / 冬)	Separate folders if the visual changes are visible (柳树 / 飘雪 / 红叶)
Color grade (回忆冷灰 / 现实暖黄 / 高对比霓虹)	Separate folders
Weather (晴 / 雨 / 雪 / 雾)	Separate folder when weather is in frame
Camera/decor unchanged but the action moves around the room	Same folder (one set image is fine)

Naming convention: when one location needs multiple lighting states, suffix the folder name with the discriminator — <location>-<discriminator>. Examples:

同福客栈大堂-白天, 同福客栈大堂-夜晚
出租屋客厅-暖灯, 出租屋客厅-停电烛光
江城CBD天桥-黄昏, 江城CBD天桥-夜霓虹
女主家-春, 女主家-冬-飘雪

This makes Scene.set_id / Shot.set_id self-documenting — at a glance you can tell whether the chain is internally consistent.

The set.md frontmatter has explicit time_of_day / season / color_grade / lighting / weather axes you must fill in. They're informational today, but they're the contract that prevents a future director from reusing a 白天 set in a 夜 shot.

How to use a set in a storyboard

Pick a stable folder name for each location AND lighting state. See the table above — split as needed.
Set Scene.set_id to the most common lighting state for that scene. The CLI validates this against movie_set.json.
Per-shot override via Shot.set_id when one shot in the scene genuinely lives in a different lighting state. This is the common case in narration mode: a single scene "陆辰的辛苦日常" might contain 写字楼-白天 + 工地-夜晚 + 出租屋-暖灯 as three beats. Each beat sets Shot.set_id explicitly; Scene.set_id can stay null (the scene is logically a montage, not one location):
```
{
  "scene": {"id": "S01", "name": "陆辰的辛苦日常", "set_id": null, "...": "..."},
  "shots": [
    {"id": "S01-001", "set_id": "写字楼大堂-白天", "kind": "r2v", "...": "..."},
    {"id": "S01-002", "set_id": "工地-夜晚",       "kind": "r2v", "...": "..."},
    {"id": "S01-003", "set_id": "出租屋客厅-暖灯", "kind": "r2v", "...": "..."}
  ]
}
```
Shot.set_id precedence: null = inherit from scene, "" = explicit opt-out, anything else = override.
Within ONE chain group, every r2v shot must resolve to the same set_id (or none). The renderer's chain-bridging first_frame already locks the lighting; appending a different set image fights that lock and produces a noticeable flicker. The CLI lints this — if you see a warning like "chain rooted at S02-003 uses set_id='客栈-白天' but this shot resolves to '客栈-夜晚'" you either (a) split the chain (use_prev_last_frame_as_first: false on the offending shot — and that's good, lighting transitions benefit from a hard cut), or (b) align the set_id.
The renderer auto-appends the set's reference image to every r2v shot's media[], after the cast portraits. You do NOT mention "[Image N] 客栈大堂" in the prompt — it's automatic.
For t2v shots in a scene with set_id, the model can't take a reference image. Weave the set's textual description (especially its lighting/color words) into the prompt manually, OR change the kind to r2v (empty characters: [] plus an r2v shot with set image becomes a "location-locked t2v").
Don't write clothing-style ban for sets either. Describe action / camera only — let the reference image carry the spatial layout, materials, props, AND lighting. The one exception: when you've decomposed the same location into 白天 / 夜晚 folders, it's still useful to drop a single lighting word ("月光下", "黄昏余晖") in the prompt as a sanity belt-and-braces.

When to scaffold a set

Scaffold a set whenever:

Two or more shots happen in the same location with the same lighting.
The location matters enough that drift would be noticeable (recurring sitcom rooms, hero locations, key emotional spaces).
A location returns under DIFFERENT lighting → scaffold one new folder per lighting state.

Skip a set for:

One-shot pass-throughs (a single 4s 旁白 over a generic cityscape).
Pure outdoors with no fixed landmarks (a forest path, an empty road).

# Project-tier (shared sitcom room — pin time_of_day in set.md!)
./bin/videogen set scaffold --project <id> --name "同福客栈大堂-白天"
./bin/videogen set scaffold --project <id> --name "同福客栈大堂-夜晚"
# Episode-tier (one-off location)
./bin/videogen set scaffold --project <id> --episode <ep> --name "..."

# Generate a reference image via t2i (description must include the
# lighting/season/tone you committed to in the folder name):
./bin/videogen set generate --project <id> [--episode <ep>] \
    --name "同福客栈大堂-夜晚" \
    --desc "明清木质客栈大堂, 二层木楼梯, 红灯笼, 八仙桌三张, 夜晚月光从窗户斜射, 烛光暖黄, 暗红色调"

# After dropping/generating reference images:
./bin/videogen set init --project <id> --episode <ep>

storyboard validate flags:

Scenes naming a set_id not present in movie_set.json.
Scenes whose set_id is set but no r2v shot inherits it.
Chain groups whose r2v shots resolve to mixed set_id — the 灯光统一铁律 above.

Key props (关键道具) — third pin for visual continuity

Cast pins faces, movie-set pins rooms, prop pins the thing that moves between shots. Without it, the same 红包 / 钥匙 / 戒指 / 玩具熊 / 笔记本 will render as visually different objects every time the camera comes back to it. The fix is the same pattern: folder per prop, reference image, and Shot.props to attach it.

When a thing must be a prop

Promote any object to a key prop when it satisfies either:

It appears in 2+ shots and the audience would notice if it changed shape/material/color/wear (the 红包 in S01-003 → S01-007 → S04-002).
It's a story-critical hero object even in a single shot (the 戒指 proposal close-up; the 钥匙 reveal).

Skip a prop for:

Background dressing (a generic teacup, a generic phone in the pocket — the model invents these fine).
Non-recurring objects whose look doesn't matter to the plot.

A reasonable budget: 3–6 named props per episode, more is a smell.

⚠ ONE FOLDER = ONE NARRATIVE STATE (hard rule, like sets)

Same physical prop, different visible state = different folder:

Same prop, different…	Action
Story state (完整 → 起皱 → 撕碎 / 关闭 → 打开 / 全新 → 旧)	Separate folders
Damage / blood / dirt visible	Separate folders
Camera angle of the same state	Same folder, multiple images → CLI grids them

Naming convention: <prop_name>-<state> — 红包-完整, 红包-起皱, 红包-撕碎. State changes belong to the storyboard chain, not to a single reference image. The renderer can't average two states into a coherent intermediate; it just produces a muddy mid-prop. Each state is its own pin.

How to use a prop in a storyboard

List the prop in the scene's props_present for recall (mirrors characters_present). The validator warns when a shot references a prop the scene didn't declare.
Set Shot.props: ["<prop name>", ...] on every r2v shot where the prop is on screen and matters. Names must match a folder (project tier or episode tier) — case-sensitive, exact match.

State transitions: when the prop changes state mid-scene, swap the prop name across shots:

"shots": [
  {"id": "S03-001", "kind": "r2v", "characters": ["陆辰"],
   "props": ["红包-完整"], "prompt": "陆辰把现金塞进红包内层"},
  {"id": "S03-002", "kind": "r2v", "characters": ["陆辰"],
   "props": ["红包-起皱"], "prompt": "陆辰紧攥红包, 边角微皱",
   "use_prev_last_frame_as_first": false},
  {"id": "S03-003", "kind": "r2v", "characters": ["陆辰","钱夫人"],
   "props": ["红包-撕碎"], "prompt": "钱夫人当面撕碎红包",
   "use_prev_last_frame_as_first": false}
]

Note use_prev_last_frame_as_first: false on the state-change shots — chaining through a state transition tries to interpolate the transition frame-by-frame and produces flicker. Use a hard cut.

The renderer auto-appends each prop's reference image to the r2v shot's media[] after cast portraits and after the set image. You do NOT mention "[Image N] 红包" in the prompt — it's automatic.
DON'T re-describe the prop's appearance in the prompt. Same rule as cast: 材质 / 颜色 / 形状 / 磨损 belongs to the reference image, not to text. The prompt should describe the action the character does WITH the prop ("塞钱进去 / 揉皱 / 撕碎 / 抛向桌面"), the 景别, and (only if the model is confusing it) the prop's narrative state in one short word ("起皱的红包" — never "印有囍字的大红色烫金红包").
HappyHorse 9-image cap: r2v media[] has a hard ceiling of 9. Priority order is cast → set → props. If a shot has 6 cast + 1 set, only 2 prop slots remain; the renderer drops the rest with a warning. Mitigation:
- Lower Shot.characters to who's actually visible in this beat (off-screen voices don't need a portrait slot).
- Split crowd shots into a wide t2v (no cast/set/props) + a tight r2v (locked face + hero prop).
t2v shots ignore Shot.props. The validator warns. Either change kind to r2v (set characters: [] if no faces) or weave the prop's textual description into the t2v prompt manually.

Scaffolding a prop

# Project-tier: a recurring prop across episodes (e.g. the family heirloom)
./bin/videogen prop scaffold --project <id> --name "戒指-完整"

# Episode-tier: a one-off prop, or a different state of a project prop
./bin/videogen prop scaffold --project <id> --episode <ep> --name "红包-起皱"

# Generate a clean product-style reference image when no photo exists
./bin/videogen prop generate --project <id> --episode <ep> \
    --name "红包-完整" \
    --desc "标准中式红包, 大红色烫金底纹, 印有'囍'字, 平整无折痕"

./bin/videogen prop init --project <id> --episode <ep>

`storyboard validate` prop checks

Shots naming a prop not present in props.json (warns; reference image won't attach).
Shots whose props aren't declared in scene.props_present.
Props attached to a t2v / i2v shot (kind has no media slot).

NPC generation (before writing the storyboard)

If the screenplay's  block lists 有名 NPC who are not yet in cast.json, generate them BEFORE storyboarding:

./bin/videogen cast generate-npc --project <id> --episode <ep> \
    --name "<NPC>" --desc "<外貌: 年龄/发型/服饰/体态/特征>" \
    --mood "<lore.mood_anchor>"
./bin/videogen cast soul template --project <id> --episode <ep> --name "<NPC>"
./bin/videogen cast init --project <id> --episode <ep>

Then re-read cast.json before storyboarding.

Validation + post-write

After all scene fragments are written, the producer runs:

./bin/videogen scene compile   --project <id> --episode <ep>   # merge → storyboard.json
./bin/videogen storyboard validate --project <id> --episode <ep>
./bin/videogen storyboard show     --project <id> --episode <ep>
./bin/videogen render-graph        --project <id> --episode <ep>   # check chain group count
./bin/videogen storyboard estimate --project <id> --episode <ep>

If storyboard validate flags warnings, you fix the affected scenes/scene-NN.json files and re-compile.

If render-graph shows that almost every shot is in one giant chain, you've over-set use_prev_last_frame_as_first: true. Review and break chains where continuity isn't actually required.

Failure recovery (during render)

A shot's render or video review may fail. The producer hands you back the review report (a reviews/<shot>-verN.json) plus the original shot. You:

Read the review's critique and breakdown.
Edit the corresponding shot in storyboard.json — usually rewrite the prompt, sometimes change kind / duration / characters / seed.
Run ./bin/videogen render --project <id> --episode <ep> --shot <id> --force --reset-attempts.

The CLI handles the first 2 retry rounds with auto prompt-rewrite (see src/videogen/rewrite.py). You only get called for the third escalation, when nuanced judgment is needed.

DON'Ts (videogen-specific, on top of 山音红线)

Don't write the screenplay. The screenwriter does that.
Don't invent character names not in cast.json.
Don't write vendor-specific model strings into kind (e.g. wan2.7-r2v). The schema migrates them automatically, but the director output should be clean — write t2v / i2v / r2v.
Don't set use_prev_last_frame_as_first: true on the first shot of a scene unless you actually want a cross-scene chain (you almost never do).
Don't blindly set duration: 15 on every shot. Match duration to the scene's narrative tempo — see § "Shot kind selection" duration table. Dead air at the end of a 15s shot causes hard freezes when stitched.
Don't call render before storyboard validate passes.
Don't call the DashScope HTTP API directly — use the CLI.
Don't assume a feature is available across all providers. Cross-check the capability table before writing prompts that rely on negative_prompt, voice, or first-frame r2v continuation.
Don't write 着装 / 发型 / 妆容 / 配饰 in shot prompts. Cast portrait owns appearance. Solve costume changes by forking the cast into the episode tier and replacing the portrait — never by writing "穿着 XXX" into the prompt.
Don't omit age. Every chain group's first character mention must include the age (28岁的陆辰 / 中年的钱夫人). Without it, the model drifts age across shots.
Don't manually paste set / location descriptions into r2v shot prompts when the scene has set_id — the renderer attaches the set's reference image automatically. Repeating the description in text fights the image and produces drift.
Don't reuse one set folder across different lighting / season / color-grade states. 一张白天客栈参考图绝不能用在夜晚客栈 shot 里。 Scaffold a separate folder (<location>-<时段>) and pin the time_of_day / season / color_grade axes in set.md's frontmatter.
Don't mix set_id values inside one chain group. If a chain crosses a lighting boundary, it's not one chain — set use_prev_last_frame_as_first: false on the boundary shot to split it into two parallel chain groups (this also unlocks more parallel rendering — pure win).
Don't describe a key prop's appearance (材质 / 颜色 / 形状 / 磨损 / 装饰) in the prompt when you've listed it in Shot.props. The reference image owns the look. The prompt describes the action on the prop ("塞钱进去 / 揉皱 / 抛向桌面") and at most a single state word ("起皱的红包").
Don't reuse one prop folder across narrative states. 完整 / 起皱 / 撕碎 are three folders, not three images in one folder. The model averages images in a folder; mixing states gives you a muddy intermediate.
Don't bolt on Shot.props to t2v / i2v shots — they have no media[] slot for reference images. Either change the kind to r2v (set characters: [] if there's no face on screen) or weave the prop's textual description into the prompt manually.
Don't blow past the HappyHorse 9-image cap. Cast + set + props all share one media[] (max 9). Trim Shot.characters to who's actually visible, or split a crowded "all hero objects in one shot" beat into two shots.