name: video-script description: > Write a timestamped Chinese narration script (解说词 / 旁白) for an already-analyzed video, then lint/validate it. Use after video-understanding has produced agent_narration_brief.md + vlm_analysis.json, when you need to author the recap narration (style, anti-hallucination, 字数公式, density, hook/throughline). Input: the understanding index in work_dir. Output: narration.json (validated). 触发词: 解说词, 写解说, 视频旁白, narration script, 写稿, 解说文案.
What this does
Authoring + validation of the narration script. The agent writes work_dir/narration.json
following the rules below; then validate.py lints it against the understanding index, and in
full mode time-aligns it to quiet windows.
Step 1 — read the brief
Read work_dir/agent_narration_brief.md (scenes, durations, quiet windows, char budget) first.
Digest long dialogue via asr_writing_chunks.json; judge "is there speech/a silent slot here?"
via timeline_fusion.json. Check raw vlm_analysis.json / asr_result.json for details.
All timestamps are original-video time.
Step 2 — write narration.json
[
{"start": 5.0, "end": 12.0, "narration": "解说文本。", "pause_after_ms": 250, "overlaps_speech": true}
]
| Field | Meaning |
|---|---|
start / end |
narration start/end seconds (original-video time) |
narration |
narration text |
pause_after_ms |
pause after segment, default 250 (keeps a tight rhythm) |
overlaps_speech |
overlaps original dialogue; default true for continuous-bed style, false only in true silence |
Optionally also author original_subtitles.json — [{start,end,text}] (OUTPUT time) — the calibrated
original dialogue burned during the original-audio gaps (ASR errors/names fixed, only what is actually
spoken there). Rendered in 「」 to set it apart from narration. If omitted, assemble falls back to a
conservative auto-ASR mapping. See the brief's 原声留白字幕 section.
写作规则(连续原声铺底的高密度 recap)
- 连续解说:沿整条时间轴用短促 beat 连续解说,原声作为压低的背景一直存在。
- 达到密度目标:按 brief 头部给出的目标(约 9.6 段/分钟,最低 6.24),相邻 beat 间隔不要超过 11 秒。
- 默认重叠原声:
overlaps_speech默认 true;只有刻意放进真正静音空档的 beat 才设 false。 - 每段短小:约一句短句(1-2 行字幕),宁短不长。字数 ≤
(end - start - 0.25) × 3。 - 不要看图说话:观众看得见动作表情,解说讲动机、关系、潜台词和剧情意义(基于画面可见的证据,不要编造)。
- 用已知角色名:
--context或background_research.json给出角色名时优先使用。 - 完整句子:以句号 / 问号 / 感叹号结束,不写半句。
解说手法(区分“解说”和“字幕”)
- 钩子:开头 1-2 个 beat 制造悬念或利害,不是交代场景。
- 主线:选一条主线(目标 / 关系 / 悬念),每个 beat 都推进它。
- 递进:信息和张力逐步升级。悬念缺口:提前埋后果,后面回收。
- 收尾:最后 1-2 个 beat 给出结果或反转,不要泛泛收场。
- 衔接:解说块和它旁边的原声留白是同一个 beat——块尾给原声铺垫,下一块承接原声刚呈现的内容,别和原声各说各的。
- 给信息而非念画面;去废词:用具体名词动词,删空泛形容。
Step 2.5 — review GATE (advisory, logged, overridable)
A separate quality pass (LLM-as-judge), distinct from the mechanical lint below. Needs the chat API key (same as VLM).
- Run:
python3 scripts/review.py --work-dir <work_dir> - Open
narration_review.md. For everyerrorfinding (ESPECIALLYcategory=hallucination— a claim not grounded in the visual/ASR evidence), revisenarration.jsonand re-run review until either:- (a) verdict ==
OKwith zeroerrorfindings, OR - (b) you consciously OVERRIDE a remaining finding (next step).
- (a) verdict ==
- To OVERRIDE: append a block to
work_dir/narration_review_override.mdnaming WHICH finding (segment + category), WHY it is acceptable, and who signed off. Unaddressederrorfindings with no override entry mean the draft is NOT ready. - Only then proceed to Step 3 (
validate.py— the hard gate).
GATE rule: review NEVER blocks the tooling (it leans on a flaky chat API and a re-render is cheap).
validate.py is the deterministic hard gate. The override log makes "we saw the finding and chose to
ship it" auditable — review.py / validate.py never read it; it is a record for the human in the loop.
Override block shape — work_dir/narration_review_override.md (append-only):
## Override — <date>
- Finding: segment 4 / category=hallucination
- Reviewer said: "‘他早已知情’无画面/对白依据"
- Decision: KEEP — grounded in the --context synopsis (s2 reveal); reviewer lacked that context.
- Signed: <agent/human>
Step 3 — validate
python3 scripts/validate.py --work-dir <work_dir> --mode full # or --mode cut
Writes narration_lint.json; in full mode rewrites narration.json with quiet-window alignment.
Fix any lint errors and re-run until clean.
Cut mode (long video → short recap)
Before narration, write work_dir/clip_plan.json (original-time source ranges to keep), optionally
self-review it in clip_plan_review.md (agent-only; the tooling does not read it), then write
narration.json with timestamps that fall inside the kept clips. The video-cut skill maps both to
the shortened timeline. Validate with --mode cut.
{"target_duration": "10m", "clips": [{"start": 12.0, "end": 38.0, "reason": "冲突开端"}]}
剪辑模式写作要点(解说要对上剪后的画面,不是原片):
- 按成片时长写:解说的 beat 数量对齐目标成片(brief 头部已按目标时长算好),不是原片时长。中点落在任何保留片段之外的 beat 会被丢弃——别在被剪掉的段落上写解说。
- 每段落在单个片段内:一个 beat 的
[start, end]不要跨片段边界,否则会被裁到片段长度,配音就会念到剪掉的画面(--mode cut会以crosses_clip_boundary警告)。 - 按成片顺序讲:beat 按它们所属片段在成片里出现的先后排序,让解说在剪后画面上连成一条线。
- 重复源区间:clips 复用了重叠的源区间时,给 beat 标
source_clip_id,确保映射到正确片段。
片名/题材明确但缺乏剧情上下文时,先按 背景调研指南 写
background_research.json再写解说——否则解说只能"看图说话"。brief 在 substrate 偏薄时会把密度目标降为上限而非配额:宁可少写、写实,也不要为凑数堆画面描述。
What this skill does NOT do
- Does NOT run ASR/VLM or analyze the video — it consumes the understanding index.
- Does NOT synthesize audio or render video.
review.pydoes NOT edit narration.json and does NOT block the pipeline — it is advisory.validate.pydoes NOT rewrite the meaning of the text — it only checks/aligns timing and quiet windows.