name: clip-sense
description: Guide AI-powered video editing, highlight extraction, silence removal, and talking-head polish through ClipSense.
risk_class: mutating_scoped
ClipSense Skill Definition
1. Trigger Scenarios
Use this skill when the user wants to:
- Edit, trim, or cut a video
- Extract highlights from a long video
- Remove silence from a video
- Split a video by topics
- Clean up talking-head / podcast content
- Generate subtitles from a video
Keywords: 剪辑, 高光, 静音, 拆条, 口播, 字幕, video edit, clip, trim, silence, highlight
2. Command Reference
| Tool | Purpose |
|---|---|
| clip_sense_create | Create an editing task |
| clip_sense_status | Check task status |
| clip_sense_list | List recent tasks |
| clip_sense_transcribe | Transcribe a video |
| clip_sense_cancel | Cancel a running task |
3. Input Schema
clip_sense_create
{
"mode": "highlight_extract|silence_clean|topic_split|talking_polish",
"source_video_path": "/path/to/video.mp4",
"flavor": "optional: funny/controversial/informative",
"target_count": 5,
"target_duration": 30,
"threshold_db": -40,
"min_silence_sec": 0.5,
"padding_sec": 0.1,
"burn_subtitle": false
}
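As a minimal sketch, the payload above could be assembled and sanity-checked before it is passed to clip_sense_create. The helper name `build_create_payload` and its validation rules (including treating `source_video_path` as required) are assumptions made for illustration; only the field names and the four mode values come from the schema.

```python
from typing import Any, Dict

# Mode values taken from the clip_sense_create schema above.
VALID_MODES = {"highlight_extract", "silence_clean", "topic_split", "talking_polish"}


def build_create_payload(mode: str, source_video_path: str, **options: Any) -> Dict[str, Any]:
    """Hypothetical helper: assemble a clip_sense_create payload.

    Options passed as None are dropped so the tool can apply its own defaults.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if not source_video_path:
        # Assumption: the source path is mandatory for every mode.
        raise ValueError("source_video_path is required")
    payload: Dict[str, Any] = {"mode": mode, "source_video_path": source_video_path}
    payload.update({key: value for key, value in options.items() if value is not None})
    return payload


payload = build_create_payload(
    "silence_clean",
    "/path/to/video.mp4",
    threshold_db=-40,
    min_silence_sec=0.5,
    padding_sec=0.1,
    flavor=None,  # omitted from the payload because it is None
)
```

Dropping unset options keeps the request small and avoids sending mode-specific fields with placeholder values.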
4. Output Schema
Task Response
{
"id": "abc123def456",
"status": "pending|running|succeeded|failed|cancelled",
"mode": "silence_clean",
"pipeline_step": "setup|check_deps|transcribe|analyze|execute|subtitle|finalize",
"output_path": "/path/to/output.mp4",
"subtitle_path": "/path/to/subtitle.srt",
"error_kind": "network|timeout|auth|...",
"error_message": "...",
"error_hints": ["hint1", "hint2"]
}
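The response fields above can be read roughly as follows. This is a sketch under assumptions: the helper names `is_finished` and `summarize` are hypothetical, and treating succeeded, failed, and cancelled as terminal states is inferred from the status names rather than stated by the schema.

```python
from typing import Any, Dict

# Assumption: these statuses are final; pending and running may still change.
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def is_finished(task: Dict[str, Any]) -> bool:
    """Return True once the task status is one of the assumed terminal states."""
    return task.get("status") in TERMINAL_STATUSES


def summarize(task: Dict[str, Any]) -> str:
    """Hypothetical helper: build a one-line summary of a task response."""
    task_id = task.get("id", "?")
    status = task.get("status", "unknown")
    if status == "succeeded":
        return f"Task {task_id} succeeded: {task.get('output_path')}"
    if status == "failed":
        return f"Task {task_id} failed ({task.get('error_kind')}): {task.get('error_message')}"
    if status == "cancelled":
        return f"Task {task_id} was cancelled"
    return f"Task {task_id} is {status} at step {task.get('pipeline_step')}"


running = {"id": "abc123def456", "status": "running", "pipeline_step": "transcribe"}
print(summarize(running))
```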
5. Error Codes
| Kind | Meaning | User Action |
|---|---|---|
| network | Connection failed | Check network/proxy |
| timeout | Task timed out (>15min) | Refresh, may still be running |
| auth | Invalid API key | Reconfigure in Settings |
| quota | Insufficient balance | Top up at Alibaba Cloud |
| moderation | Content flagged | Use different video |
| dependency | FFmpeg missing | Install ffmpeg >= 4.0 |
| format | Invalid video format | Use MP4/MOV/MKV |
| duration | Video too long (>120min) | Trim before upload |
| unknown | Unexpected error | Report task_id |
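One way to turn the table above into a user-facing hint is a plain lookup keyed on `error_kind`. The function name `user_action` is hypothetical, and falling back to the `unknown` row for unlisted kinds is an assumption; the action strings are copied from the table.

```python
from typing import Any, Dict

# error_kind -> suggested user action, copied from the Error Codes table.
USER_ACTIONS = {
    "network": "Check network/proxy",
    "timeout": "Refresh, may still be running",
    "auth": "Reconfigure in Settings",
    "quota": "Top up at Alibaba Cloud",
    "moderation": "Use different video",
    "dependency": "Install ffmpeg >= 4.0",
    "format": "Use MP4/MOV/MKV",
    "duration": "Trim before upload",
    "unknown": "Report task_id",
}


def user_action(task: Dict[str, Any]) -> str:
    """Hypothetical helper: pick the suggested action for a failed task.

    Assumption: kinds missing from the table are treated as "unknown".
    """
    kind = task.get("error_kind") or "unknown"
    return USER_ACTIONS.get(kind, USER_ACTIONS["unknown"])


print(user_action({"status": "failed", "error_kind": "dependency"}))
```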
6. Mode Decision Tree
User wants to edit video →
├── "Remove silence/pauses" → silence_clean
├── "Get best parts/highlights" → highlight_extract
├── "Split into chapters/topics" → topic_split
├── "Clean up talking/podcast" → talking_polish
└── Not sure → Ask about the goal, default to highlight_extract
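The decision tree can be approximated with simple keyword matching. This is only a sketch: the keyword lists and the function name `choose_mode` are hypothetical, not part of ClipSense. Returning `None` stands for the "Not sure" branch, where the goal should be clarified before falling back to the default.

```python
from typing import Optional

# Hypothetical keyword lists, checked in the same order as the tree above.
MODE_KEYWORDS = [
    ("silence_clean", ("silence", "pause")),
    ("highlight_extract", ("highlight", "best part")),
    ("topic_split", ("chapter", "topic", "split")),
    ("talking_polish", ("talking", "podcast")),
]

# Default named by the "Not sure" branch of the tree.
DEFAULT_MODE = "highlight_extract"


def choose_mode(request: str) -> Optional[str]:
    """Return the first mode whose keywords appear in the request, else None."""
    text = request.lower()
    for mode, keywords in MODE_KEYWORDS:
        if any(keyword in text for keyword in keywords):
            return mode
    return None


# None means: ask about the goal first, then fall back to the default.
mode = choose_mode("Make this video better") or DEFAULT_MODE
```

Because the lists are checked in order, a request that mentions both silence and highlights resolves to silence_clean; a real router would likely need something more careful than substring matching.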
7. Cost Estimation
- silence_clean: ¥0 (runs entirely in local FFmpeg)
- Other modes: ~¥0.05/min (ASR) + ~¥0.002/min (Qwen), or about ¥1.56 for a 30-minute video
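The arithmetic behind the estimate can be sketched as below. The function name `estimate_cost_cny` is hypothetical, and the per-minute rates are the approximate figures quoted above, not a billing guarantee.

```python
# Approximate per-minute rates in CNY, taken from the estimate above.
ASR_RATE_PER_MIN = 0.05
QWEN_RATE_PER_MIN = 0.002


def estimate_cost_cny(mode: str, duration_min: float) -> float:
    """Hypothetical helper: rough cost of one task, rounded to 2 decimals."""
    if duration_min < 0:
        raise ValueError("duration_min must be non-negative")
    if mode == "silence_clean":
        # Runs locally through FFmpeg, so no API charges apply.
        return 0.0
    return round((ASR_RATE_PER_MIN + QWEN_RATE_PER_MIN) * duration_min, 2)


print(estimate_cost_cny("highlight_extract", 30))  # (0.05 + 0.002) * 30
print(estimate_cost_cny("silence_clean", 30))
```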
8. Common Templates
Extract 5 highlights from a podcast
clip_sense_create mode=highlight_extract source_video_path=/uploads/podcast.mp4 target_count=5 flavor=informative
Quick silence removal
clip_sense_create mode=silence_clean source_video_path=/uploads/talk.mp4 silence_preset=standard
Split lecture into chapters
clip_sense_create mode=topic_split source_video_path=/uploads/lecture.mp4 target_segment_duration=180
9. Testing
# Unit tests (no network)
python -m pytest tests/ -q -m "not integration"
# Integration test (needs DASHSCOPE_API_KEY + ffmpeg)
DASHSCOPE_API_KEY=sk-... python -m pytest tests/integration/ -m integration
10. Known Limitations
- Paraformer provides sentence-level timestamps (not word-level); cut boundaries have ~0.5-2s precision
- Pure-Python silence detection is slower than a numpy-based implementation for files longer than 30 minutes
- No auto-installation of FFmpeg; user must install manually
- Maximum video duration: 120 minutes
- Transcript text fed to Qwen is truncated at 20,000 characters
- Topic split outputs multiple files; only the first is shown in preview
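Two of the limits above lend themselves to a pre-flight check on the caller's side. The sketch below is illustrative only: the function names are hypothetical, and keeping the head of the transcript is an assumption, since the list does not say which part ClipSense retains when it truncates.

```python
# Limits taken from the Known Limitations list above.
MAX_DURATION_MIN = 120
MAX_TRANSCRIPT_CHARS = 20_000


def exceeds_duration_limit(duration_sec: float) -> bool:
    """Return True if the video is longer than the 120-minute maximum."""
    return duration_sec > MAX_DURATION_MIN * 60


def truncate_transcript(text: str, limit: int = MAX_TRANSCRIPT_CHARS) -> str:
    """Keep at most `limit` characters (assumption: the head is kept)."""
    return text[:limit]


print(exceeds_duration_limit(2 * 60 * 60))       # exactly 120 minutes
print(len(truncate_transcript("a" * 25_000)))    # capped at the limit
```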