clip-sense - SKILL.md Agent Skill

name: clip-sense description: Guide AI-powered video editing, highlight extraction, silence removal, and talking-head polish through ClipSense. risk_class: mutating_scoped

ClipSense Skill Definition

1. Trigger Scenarios

Use this skill when the user wants to:

Edit, trim, or cut a video
Extract highlights from a long video
Remove silence from a video
Split a video by topics
Clean up talking-head / podcast content
Generate subtitles from a video

Keywords: 剪辑, 高光, 静音, 拆条, 口播, 字幕, video edit, clip, trim, silence, highlight

2. Command Reference

Tool	Purpose
`clip_sense_create`	Create an editing task
`clip_sense_status`	Check task status
`clip_sense_list`	List recent tasks
`clip_sense_transcribe`	Transcribe a video
`clip_sense_cancel`	Cancel a running task

3. Input Schema

clip_sense_create

{
  "mode": "highlight_extract|silence_clean|topic_split|talking_polish",
  "source_video_path": "/path/to/video.mp4",
  "flavor": "optional: funny/controversial/informative",
  "target_count": 5,
  "target_duration": 30,
  "threshold_db": -40,
  "min_silence_sec": 0.5,
  "padding_sec": 0.1,
  "burn_subtitle": false
}

4. Output Schema

Task Response

{
  "id": "abc123def456",
  "status": "pending|running|succeeded|failed|cancelled",
  "mode": "silence_clean",
  "pipeline_step": "setup|check_deps|transcribe|analyze|execute|subtitle|finalize",
  "output_path": "/path/to/output.mp4",
  "subtitle_path": "/path/to/subtitle.srt",
  "error_kind": "network|timeout|auth|...",
  "error_message": "...",
  "error_hints": ["hint1", "hint2"]
}

5. Error Codes

Kind	Meaning	User Action
`network`	Connection failed	Check network/proxy
`timeout`	Task timed out (>15min)	Refresh, may still be running
`auth`	Invalid API key	Reconfigure in Settings
`quota`	Insufficient balance	Top up at Alibaba Cloud
`moderation`	Content flagged	Use different video
`dependency`	FFmpeg missing	Install ffmpeg >= 4.0
`format`	Invalid video format	Use MP4/MOV/MKV
`duration`	Video too long (>120min)	Trim before upload
`unknown`	Unexpected error	Report task_id

6. Mode Decision Tree

User wants to edit video →
  ├── "Remove silence/pauses" → silence_clean
  ├── "Get best parts/highlights" → highlight_extract
  ├── "Split into chapters/topics" → topic_split
  ├── "Clean up talking/podcast" → talking_polish
  └── Not sure → Ask about the goal, default to highlight_extract

7. Cost Estimation

silence_clean: ¥0 (pure local FFmpeg)
Others: ~¥0.05/min (ASR) + ~¥0.002/min (Qwen) ≈ ¥1.5 for 30-min video

8. Common Templates

Extract 5 highlights from a podcast

clip_sense_create mode=highlight_extract source_video_path=/uploads/podcast.mp4 target_count=5 flavor=informative

Quick silence removal

clip_sense_create mode=silence_clean source_video_path=/uploads/talk.mp4 silence_preset=standard

Split lecture into chapters

clip_sense_create mode=topic_split source_video_path=/uploads/lecture.mp4 target_segment_duration=180

9. Testing

# Unit tests (no network)
python -m pytest tests/ -q -m "not integration"

# Integration test (needs DASHSCOPE_API_KEY + ffmpeg)
DASHSCOPE_API_KEY=sk-... python -m pytest tests/integration/ -m integration

10. Known Limitations

Paraformer provides sentence-level timestamps (not word-level); cut boundaries have ~0.5-2s precision
Pure-Python silence detection is slower than numpy-based for files >30min
No auto-installation of FFmpeg; user must install manually
Maximum video duration: 120 minutes
Transcript text fed to Qwen is truncated at 20,000 characters
Topic split outputs multiple files; only the first is shown in preview