subtitle-craft

star 1.8k

Create, translate, repair, and burn subtitles through the Subtitle Craft plugin.

openakita By openakita schedule Updated 6/13/2026

name: subtitle-craft description: Create, translate, repair, and burn subtitles through the Subtitle Craft plugin. risk_class: mutating_scoped

Subtitle Craft Skill Definition

1. Trigger Scenarios

Use this skill when the user wants to:

  • 自动给视频/音频生成字幕(听写)
  • 把已有 SRT 翻译为另一种语言(单语 or 双语字幕)
  • 修复时间轴对齐 / 断行重排 / 重叠问题
  • 把字幕烧入视频(软字幕 sidecar 或硬烧 in-stream)
  • 区分多说话人 / 把 SPEAKER_xx 自动映射成角色名
  • 估算「这段视频做字幕大概要花多少 ¥」

Keywords (zh): 字幕, 听写, 转录, 翻译字幕, 字幕修复, 字幕烧制, 双语字幕, 角色识别, 说话人分离 Keywords (en): subtitle, caption, transcribe, translate srt, burn subtitle, diarization, speaker

2. Command Reference (4 tools)

Tool Purpose
subtitle_craft_create Create a subtitle task (mode = auto_subtitle / translate / repair / burn)
subtitle_craft_status Inspect a single task's status, pipeline step, error_kind
subtitle_craft_list List recent tasks (default 10)
subtitle_craft_cancel Cooperative cancel of a running task

v1.0 explicitly does NOT declare any subtitle_craft_handoff_* tool — cross-plugin dispatch is deferred to v2.0 with schema reservation only.

3. Input Schema

subtitle_craft_create

{
  "mode": "auto_subtitle | translate | repair | burn",
  "source_path": "/abs/path/to/video.mp4",      // for auto_subtitle / burn
  "srt_path":    "/abs/path/to/input.srt",      // for translate / repair / burn
  "source_lang": "",                             // empty = auto-detect (Paraformer language_hints)
  "target_lang": "en",                           // for translate
  "translation_model": "qwen-mt-flash",          // qwen-mt-flash | qwen-mt-plus | qwen-mt-lite
  "diarization_enabled": false,                  // auto_subtitle only
  "speaker_count": 0,                            // 0 = auto
  "character_identify_enabled": false,           // requires diarization_enabled=true
  "disfluency_removal_enabled": false,           // remove um/uh/嗯/啊
  "bilingual": false,                            // translate only — keep both lines
  "subtitle_style": "default",                   // default | cinema | youtube | tiktok | tv | <custom_id>
  "burn_engine": "ass",                          // ass | html
  "burn_mode": "hard"                            // hard (in-stream) | soft (mp4 sidecar)
}

subtitle_craft_status

{ "task_id": "abc123def456" }

subtitle_craft_cancel

{ "task_id": "abc123def456" }

4. Output Schema

Task envelope (returned by create / get / list elements)

{
  "id": "abc123def456",
  "mode": "auto_subtitle",
  "status": "pending|running|succeeded|failed|canceled",
  "pipeline_step": "setup_environment|estimate_cost|prepare_assets|asr_or_load|identify_characters|translate_or_repair|render_output|burn_or_finalize",
  "progress": 0.42,
  "source_path": "...",
  "output_srt_path": "...",
  "output_vtt_path": "...",
  "output_video_path": "...",
  "error_kind": "network|timeout|auth|quota|moderation|dependency|format|duration|unknown",
  "error_message": "...",
  "error_hints": "..."
}

5. Mode Selection Heuristics (when the user is vague)

User says Choose mode Notes
"给这段视频加字幕" / "transcribe video" auto_subtitle If the user mentions multiple people, also turn on diarization_enabled
"把字幕翻译成英文" / "translate srt to english" translate Default translation_model=qwen-mt-flash (best value)
"字幕重叠了" / "时间轴乱" / "fix overlap" repair No API cost — pure local rewrite
"烧制" / "硬字幕" / "burn subtitle" burn Default burn_engine=ass (FFmpeg native) — switch to html only when the user wants custom CSS
提到 "区分谁在说话" / "speaker" auto_subtitle + diarization_enabled=true Add character_identify_enabled=true if the user wants character names instead of SPEAKER_xx

6. Cost Hints (for setting expectations before invoking)

  • Paraformer-v2: ¥0.00027 / second of audio (~¥1 / hour)
  • Qwen-MT Flash: ¥0.000007 / token (~¥0.005 / 1k tokens)
  • Qwen-MT Plus: ¥0.000016 / token
  • Qwen-MT Lite: ¥0.000003 / token
  • Qwen-VL Max (character ID): one call per task, ¥0.020 typical
  • Local burn (FFmpeg/Playwright): free

A 30-min YouTube → English subtitles round-trip is usually ¥0.50–¥0.80.

7. Error Handling Contract (9 canonical kinds)

The error_kind field always falls into one of these 9 values, identical to clip-sense's taxonomy:

error_kind What to tell the user
network Suggest checking VPN / DNS, then retry
timeout Increase Paraformer timeout in Settings, then retry
auth API key wrong / 4xx → open Settings
quota Bailian balance / quota exhausted
moderation Content flagged → ask the user to trim sensitive parts
dependency FFmpeg / Playwright missing — HTML burn auto-falls back to ASS
format Bad SRT encoding (must be UTF-8) / unsupported video container
duration File >2 GB or audio >12 h — ask user to split first
unknown Surface the raw error_message and link to logs

8. SSE Live Updates

The plugin emits a single event name task_update with payload:

{
  "task_id": "...",
  "status":  "...",
  "mode":    "...",
  "pipeline_step": "...",
  "progress": 0.42,
  "error_kind": "...",
  "error_message": "...",
  "error_hints": "..."
}

Subscribe via onEvent("task_update", handler) — the plugin host fans this out to the iframe via bridge:event.

9. v1.0 Scope Limit (read carefully before suggesting v2 features)

v1.0 ships without:

  • /handoff/* routes → no cross-plugin dispatch surface
  • subtitle_craft_handoff_* tools → only the 4 above
  • "Send to clip-sense" / "送往 …" UI buttons → no cross-plugin CTA
  • assets_bus.write / tasks.origin_* fills → schema is reserved but always NULL in v1.0

When the user asks "can it pipe into clip-sense?" answer with: "v1.0 exports SRT/VTT files; v2.0 will add a one-click Handoff. For now you can manually re-upload the SRT into clip-sense's burn_subtitle mode."

10. Quick Reference Recipe

// Pure transcription, Chinese, no diarization
{ "mode": "auto_subtitle", "source_path": "/x.mp4", "source_lang": "zh" }

// Transcription + diarization + character ID (the only place the toggle
// is active — embedded inside auto_subtitle, not a standalone mode)
{ "mode": "auto_subtitle", "source_path": "/x.mp4",
  "diarization_enabled": true, "character_identify_enabled": true,
  "speaker_count": 3 }

// Translate an existing SRT to English, bilingual output
{ "mode": "translate", "srt_path": "/x.srt", "target_lang": "en",
  "bilingual": true, "translation_model": "qwen-mt-flash" }

// Repair a glitchy SRT (no API cost)
{ "mode": "repair", "srt_path": "/x.srt" }

// Burn an existing SRT into video using ffmpeg ASS (recommended)
{ "mode": "burn", "source_path": "/x.mp4", "srt_path": "/x.srt",
  "burn_engine": "ass", "burn_mode": "hard", "subtitle_style": "youtube" }

// AI Hook Picker (v1.1) — Qwen-Plus selects an opening hook
{ "mode": "hook_picker", "srt_path": "/x.srt",
  "instruction": "find the strongest opening line",
  "target_duration_sec": 12, "hook_model": "qwen-plus" }

11. AI Hook Picker (mode hook_picker, added in v1.1)

A 5th processing mode that runs an existing SRT through Qwen-Plus to pick the strongest opening "hook" dialogue for short-form video. The algorithm is ported 1:1 from CutClaw's Screenwriter_scene_short.py and lives in the dedicated subtitle_hook_picker.py module — the prompt (SELECT_HOOK_DIALOGUE_PROMPT), the fuzzy-match threshold (0.55) and the 3-window-with-2-retries loop are red lines: do NOT change them without re-validating against CutClaw output.

Inputs (extends CreateTaskBody)

field type default meaning
srt_path str "" Required when from_task_id is empty
from_task_id str "" Reuse another task's output_srt_path
instruction str "" Free-form direction for the AI
main_character str "" Constrain selection to this speaker
target_duration_sec float 12.0 Hook length target (6–30)
prompt_window_mode str "tail_then_head" or "random_window"
random_window_attempts int 3 1–5; cost scales linearly
hook_model str "qwen-plus" Or qwen-plus-2025-09-11 / qwen-max

Outputs

  • task_dir/hook.srt — single-cue SRT for the chosen window.
  • task_dir/hook.json{ hook: {...}, telemetry: {...} } payload (lines, timed_lines, source_start/_end, duration_seconds, selected_window, selected_attempt, reason).
  • GET /tasks/{task_id} enriches succeeded hook tasks with hook + hook_telemetry so the UI right-pane HookResultPanel renders without an extra fetch.
  • GET /library/hooks aggregates every succeeded hook task ({items: [...], total: N}).

Skipped pipeline steps

hook_picker declares skip_steps = ("prepare_assets", "identify_characters", "translate_or_repair", "burn_or_finalize"). ASR is NOT skipped; instead _step_asr_or_load short-circuits to _load_srt_input (the SRT must contain ≥5 cues — fewer raises PipelineError(kind="format")).

Cost guard

Server-side estimate_cost bills qwen-plus (¥0.005/round) × (2 + random_window_attempts) rounds; a typical run costs < ¥0.01. The UI surfaces this estimate in the right-pane oa-preview-card before the user clicks Start.

Install via CLI
npx skills add https://github.com/openakita/openakita --skill subtitle-craft
Repository Details
star Stars 1,816
call_split Forks 257
navigation Branch main
article Path SKILL.md
More from Creator