opc

name: opc description: "多引擎 AI 创作工具链。TTS 语音合成（edge-tts / Qwen3-TTS）、ASR 语音识别与卡拉OK字幕、ComfyUI 图片生成（ERNIE/Qwen/Z-Image + PromptKG 知识图谱）、视频剪辑（Cut）。使用场景：(1) 文本转语音播放，(2) 音频转录生成 SRT/ASS 字幕，(3) AI 图片生成与风格探索，(4) Prompt 知识图谱查询与模板发现，(5) 字幕级视频剪辑。触发词：语音、TTS、ASR、字幕、图片生成、prompt、知识图谱、KG、视频剪辑、cut"

opc - AI 创作工具链

TTS + ASR + AI 图片生成 + 视频剪辑。

环境安装

curl -LsSf https://astral.sh/uv/install.sh | sh
cd ~/.claude/skills/opc-cli && uv sync

跨平台：Linux 用 CUDA，macOS 用 MLX，命令一致。

模型下载源：

opc config --set-model-source modelscope   # 默认
opc config --set-model-source huggingface  # 备选
opc config --set-model-cache-dir ~/models

快速开始

opc discover --set-default          # 发现播放设备
opc tts "你好" -e edge-tts          # 生成语音
opc say "你好"                       # 生成并播放
opc asr audio.mp3 --format srt      # 生成字幕
opc image -w ernie-turbo -p "a cat"  # AI 生图
opc image kg skeleton subject:food style:photography  # KG prompt 规划

所有命令通过 uv run --project ~/.claude/skills/opc-cli python -m scripts.opc 执行。opc 是上述命令的简写别名。

TTS 命令

`opc tts <text>` — 生成语音文件

opc tts "你好" -e edge-tts                           # edge-tts
opc tts "你好" -e edge-tts --rate +20% --pitch +5Hz  # 带语速/音调
opc tts "你好" -e qwen --speaker Vivian              # qwen 内置音色
opc tts "你好" -e qwen --instruct "用愤怒的语气说"    # 情绪指令
opc tts "你好" -e qwen --mode voice_design --instruct "温柔的女声"  # 声音设计
opc tts "你好" -e qwen --mode voice_clone --ref-audio ref.wav --ref-text "参考"  # 克隆

参数： -e 引擎(edge-tts|qwen)、-v 音色、-l 语言、-o 输出路径、--stdin 从 stdin 读 edge-tts： --rate、--pitch、--volume qwen： -m 模式(custom_voice|voice_design|voice_clone)、-s 音色、-i 情绪指令、--ref-audio、--ref-text

`opc say <text>` — 生成并播放

参数同 tts，额外 -d 指定播放设备。

`opc voices` / `opc discover`

opc voices -e edge-tts   # 322 个音色
opc voices -e qwen       # 9 个内置音色
opc discover --set-default

ASR Pipeline

4 阶段 Pipeline：ASR + Forced Alignment → Sentence Breaking → CSV Fix → Render

opc asr audio.mp3                    # 转录到 stdout
opc asr audio.mp3 --format srt       # 生成 SRT + ASS
opc asr audio.mp3 --format json -o result.json
opc asr audio.mp3 --format srt --fix-dir ./fixes             # CSV 修正
opc asr audio.mp3 --format srt --resume-from break           # 从断句阶段恢复

断句规则

两遍扫描：Pass 1 按句号分段，Pass 2 按逗号断行。没有标点绝对不断行。 超长行由 Check 标记，用 opc asr-split 手动拆分：

opc asr-split audio.lines.json --line 10 --after "理解，"
opc asr audio.mp3 --format srt --resume-from render

CSV 修正格式

在 --fix-dir 放 fix_1.csv, fix_2.csv...，格式：原文本，新文本（新文本留空=删除行）。# 开头为注释。

Image 命令

AI 图片生成 + Prompt 知识图谱 + 模板系统。详见 references/image.md。

核心流程：

opc image kg skeleton subject:food style:photography  # KG 规划
opc image -w ernie-full -p "..."                       # 生成
opc image analyze output.png --describe                # 分析

Cut 命令

基于 ASR 字词级时间戳的视频剪辑。详见 references/cut.md。

opc cut --video video.mp4        # 启动剪辑 Web 界面

Dashboard

技能管理面板。详见 references/dashboard.md。

配置

配置文件：~/.opc_cli/opc/config.json

键	默认值	说明
`tts_engine`	`edge-tts`	默认 TTS 引擎
`edge_voice`	`zh-CN-XiaoxiaoNeural`	edge-tts 音色
`qwen_speaker`	`Vivian`	qwen 音色
`default_device`		播放设备
`asr_model_size`	`1.7B`	ASR 模型
`workspace_dir`	`~/opc-workspace`	工作目录
`comfyui_host`	`127.0.0.1`	ComfyUI 地址
`comfyui_port`	`8188`	ComfyUI 端口
`dashboard_host`	`0.0.0.0`	Dashboard 监听地址
`dashboard_port`	`12080`	Dashboard 端口
`model_source`	`modelscope`	模型下载源

opc config --show
opc config --set-engine qwen
opc config --set-comfyui-host 192.168.1.100