name: subtitle-craft description: Create, translate, repair, and burn subtitles through the Subtitle Craft plugin. risk_class: mutating_scoped
Subtitle Craft Skill Definition
1. Trigger Scenarios
Use this skill when the user wants to:
- 自动给视频/音频生成字幕(听写)
- 把已有 SRT 翻译为另一种语言(单语 or 双语字幕)
- 修复时间轴对齐 / 断行重排 / 重叠问题
- 把字幕烧入视频(软字幕 sidecar 或硬烧 in-stream)
- 区分多说话人 / 把 SPEAKER_xx 自动映射成角色名
- 估算「这段视频做字幕大概要花多少 ¥」
Keywords (zh): 字幕, 听写, 转录, 翻译字幕, 字幕修复, 字幕烧制, 双语字幕, 角色识别, 说话人分离 Keywords (en): subtitle, caption, transcribe, translate srt, burn subtitle, diarization, speaker
2. Command Reference (4 tools)
| Tool | Purpose |
|---|---|
subtitle_craft_create |
Create a subtitle task (mode = auto_subtitle / translate / repair / burn) |
subtitle_craft_status |
Inspect a single task's status, pipeline step, error_kind |
subtitle_craft_list |
List recent tasks (default 10) |
subtitle_craft_cancel |
Cooperative cancel of a running task |
v1.0 explicitly does NOT declare any
subtitle_craft_handoff_*tool — cross-plugin dispatch is deferred to v2.0 with schema reservation only.
3. Input Schema
subtitle_craft_create
{
"mode": "auto_subtitle | translate | repair | burn",
"source_path": "/abs/path/to/video.mp4", // for auto_subtitle / burn
"srt_path": "/abs/path/to/input.srt", // for translate / repair / burn
"source_lang": "", // empty = auto-detect (Paraformer language_hints)
"target_lang": "en", // for translate
"translation_model": "qwen-mt-flash", // qwen-mt-flash | qwen-mt-plus | qwen-mt-lite
"diarization_enabled": false, // auto_subtitle only
"speaker_count": 0, // 0 = auto
"character_identify_enabled": false, // requires diarization_enabled=true
"disfluency_removal_enabled": false, // remove um/uh/嗯/啊
"bilingual": false, // translate only — keep both lines
"subtitle_style": "default", // default | cinema | youtube | tiktok | tv | <custom_id>
"burn_engine": "ass", // ass | html
"burn_mode": "hard" // hard (in-stream) | soft (mp4 sidecar)
}
subtitle_craft_status
{ "task_id": "abc123def456" }
subtitle_craft_cancel
{ "task_id": "abc123def456" }
4. Output Schema
Task envelope (returned by create / get / list elements)
{
"id": "abc123def456",
"mode": "auto_subtitle",
"status": "pending|running|succeeded|failed|canceled",
"pipeline_step": "setup_environment|estimate_cost|prepare_assets|asr_or_load|identify_characters|translate_or_repair|render_output|burn_or_finalize",
"progress": 0.42,
"source_path": "...",
"output_srt_path": "...",
"output_vtt_path": "...",
"output_video_path": "...",
"error_kind": "network|timeout|auth|quota|moderation|dependency|format|duration|unknown",
"error_message": "...",
"error_hints": "..."
}
5. Mode Selection Heuristics (when the user is vague)
| User says | Choose mode | Notes |
|---|---|---|
| "给这段视频加字幕" / "transcribe video" | auto_subtitle |
If the user mentions multiple people, also turn on diarization_enabled |
| "把字幕翻译成英文" / "translate srt to english" | translate |
Default translation_model=qwen-mt-flash (best value) |
| "字幕重叠了" / "时间轴乱" / "fix overlap" | repair |
No API cost — pure local rewrite |
| "烧制" / "硬字幕" / "burn subtitle" | burn |
Default burn_engine=ass (FFmpeg native) — switch to html only when the user wants custom CSS |
| 提到 "区分谁在说话" / "speaker" | auto_subtitle + diarization_enabled=true |
Add character_identify_enabled=true if the user wants character names instead of SPEAKER_xx |
6. Cost Hints (for setting expectations before invoking)
- Paraformer-v2: ¥0.00027 / second of audio (~¥1 / hour)
- Qwen-MT Flash: ¥0.000007 / token (~¥0.005 / 1k tokens)
- Qwen-MT Plus: ¥0.000016 / token
- Qwen-MT Lite: ¥0.000003 / token
- Qwen-VL Max (character ID): one call per task, ¥0.020 typical
- Local burn (FFmpeg/Playwright): free
A 30-min YouTube → English subtitles round-trip is usually ¥0.50–¥0.80.
7. Error Handling Contract (9 canonical kinds)
The error_kind field always falls into one of these 9 values, identical
to clip-sense's taxonomy:
error_kind |
What to tell the user |
|---|---|
network |
Suggest checking VPN / DNS, then retry |
timeout |
Increase Paraformer timeout in Settings, then retry |
auth |
API key wrong / 4xx → open Settings |
quota |
Bailian balance / quota exhausted |
moderation |
Content flagged → ask the user to trim sensitive parts |
dependency |
FFmpeg / Playwright missing — HTML burn auto-falls back to ASS |
format |
Bad SRT encoding (must be UTF-8) / unsupported video container |
duration |
File >2 GB or audio >12 h — ask user to split first |
unknown |
Surface the raw error_message and link to logs |
8. SSE Live Updates
The plugin emits a single event name task_update with payload:
{
"task_id": "...",
"status": "...",
"mode": "...",
"pipeline_step": "...",
"progress": 0.42,
"error_kind": "...",
"error_message": "...",
"error_hints": "..."
}
Subscribe via onEvent("task_update", handler) — the plugin host fans
this out to the iframe via bridge:event.
9. v1.0 Scope Limit (read carefully before suggesting v2 features)
v1.0 ships without:
/handoff/*routes → no cross-plugin dispatch surfacesubtitle_craft_handoff_*tools → only the 4 above- "Send to clip-sense" / "送往 …" UI buttons → no cross-plugin CTA
assets_bus.write/tasks.origin_*fills → schema is reserved but always NULL in v1.0
When the user asks "can it pipe into clip-sense?" answer with: "v1.0
exports SRT/VTT files; v2.0 will add a one-click Handoff. For now you
can manually re-upload the SRT into clip-sense's burn_subtitle mode."
10. Quick Reference Recipe
// Pure transcription, Chinese, no diarization
{ "mode": "auto_subtitle", "source_path": "/x.mp4", "source_lang": "zh" }
// Transcription + diarization + character ID (the only place the toggle
// is active — embedded inside auto_subtitle, not a standalone mode)
{ "mode": "auto_subtitle", "source_path": "/x.mp4",
"diarization_enabled": true, "character_identify_enabled": true,
"speaker_count": 3 }
// Translate an existing SRT to English, bilingual output
{ "mode": "translate", "srt_path": "/x.srt", "target_lang": "en",
"bilingual": true, "translation_model": "qwen-mt-flash" }
// Repair a glitchy SRT (no API cost)
{ "mode": "repair", "srt_path": "/x.srt" }
// Burn an existing SRT into video using ffmpeg ASS (recommended)
{ "mode": "burn", "source_path": "/x.mp4", "srt_path": "/x.srt",
"burn_engine": "ass", "burn_mode": "hard", "subtitle_style": "youtube" }
// AI Hook Picker (v1.1) — Qwen-Plus selects an opening hook
{ "mode": "hook_picker", "srt_path": "/x.srt",
"instruction": "find the strongest opening line",
"target_duration_sec": 12, "hook_model": "qwen-plus" }
11. AI Hook Picker (mode hook_picker, added in v1.1)
A 5th processing mode that runs an existing SRT through Qwen-Plus to
pick the strongest opening "hook" dialogue for short-form video. The
algorithm is ported 1:1 from CutClaw's
Screenwriter_scene_short.py and lives in the dedicated
subtitle_hook_picker.py module — the prompt (SELECT_HOOK_DIALOGUE_PROMPT),
the fuzzy-match threshold (0.55) and the 3-window-with-2-retries loop
are red lines: do NOT change them without re-validating against
CutClaw output.
Inputs (extends CreateTaskBody)
| field | type | default | meaning |
|---|---|---|---|
srt_path |
str | "" | Required when from_task_id is empty |
from_task_id |
str | "" | Reuse another task's output_srt_path |
instruction |
str | "" | Free-form direction for the AI |
main_character |
str | "" | Constrain selection to this speaker |
target_duration_sec |
float | 12.0 | Hook length target (6–30) |
prompt_window_mode |
str | "tail_then_head" |
or "random_window" |
random_window_attempts |
int | 3 | 1–5; cost scales linearly |
hook_model |
str | "qwen-plus" |
Or qwen-plus-2025-09-11 / qwen-max |
Outputs
task_dir/hook.srt— single-cue SRT for the chosen window.task_dir/hook.json—{ hook: {...}, telemetry: {...} }payload (lines, timed_lines, source_start/_end, duration_seconds, selected_window, selected_attempt, reason).GET /tasks/{task_id}enriches succeeded hook tasks withhook+hook_telemetryso the UI right-paneHookResultPanelrenders without an extra fetch.GET /library/hooksaggregates every succeeded hook task ({items: [...], total: N}).
Skipped pipeline steps
hook_picker declares skip_steps = ("prepare_assets", "identify_characters", "translate_or_repair", "burn_or_finalize").
ASR is NOT skipped; instead _step_asr_or_load short-circuits to
_load_srt_input (the SRT must contain ≥5 cues — fewer raises
PipelineError(kind="format")).
Cost guard
Server-side estimate_cost bills qwen-plus (¥0.005/round) ×
(2 + random_window_attempts) rounds; a typical run costs < ¥0.01.
The UI surfaces this estimate in the right-pane oa-preview-card
before the user clicks Start.