sketch-to-video - SKILL.md Agent Skill

name: sketch-to-video
description: Turn hand-drawn sketches into AI music videos. Guides through extracting sketches, generating style variations, creating video transition prompts, composing a song, and editing a beat-synced music video. Use when the user wants to make a music video from sketches, animate drawings, create video from art, or turn sketches into videos.
disable-model-invocation: true
allowed-tools: Read Write Edit Bash Glob Grep Agent Skill
argument-hint: "[source-image-or-pdf] [page-number]"

Sketch to Video

Turn hand-drawn sketches into full AI music videos through a guided creative workflow.

Paths below use {base} as shorthand for this skill's base directory.

How It Works

This is an interactive, multi-phase creative workflow. You guide the user through each phase, presenting options and waiting for their input before proceeding. Be creative, opinionated, and collaborative — you're a creative director working with an artist.

The Phases

┌─────────────┐    ┌──────────────┐    ┌────────────────┐
│  1. SOURCE  │ →  │ 2. STORYBOARD│ →  │  3. VARIATIONS │
│  Extract/   │    │  Plan story, │    │  Generate all  │
│  import art │    │  pick styles │    │  style images  │
└─────────────┘    └──────────────┘    └────────────────┘
                                               │
                                               ▼
┌─────────────┐    ┌──────────────┐    ┌────────────────┐
│  6. EDIT    │ ←  │  5. SONG     │ ←  │ 4. TRANSITIONS │
│  Beat-sync  │    │  Suno prompt │    │  Video prompts │
│  music video│    │  + structure │    │  + story arc   │
└─────────────┘    └──────────────┘    └────────────────┘

Stop after each phase and ask the user if they want to continue, adjust, or skip ahead.

Phase 1: Source

Extract or import the sketch(es) the user wants to work with.

If given a PDF (e.g., reMarkable export):

# Extract a specific page as PNG
{base}/scripts/extract-page.sh "path/to/file.pdf" <page-number> /tmp/sketch.png

# Extract all pages for review
{base}/scripts/extract-page.sh "path/to/file.pdf" all /tmp/sketches/

If given an image file:

Copy it to a working location and read it to understand the content.

Actions:

Extract/load the sketch image(s)
View each sketch — read the image and describe what you see
Ask the user which sketch(es) they want to work with
Copy selected sketch(es) to Archive/Files/ with descriptive names
Copy to Desktop as sketch-original.png (1:1 aspect ratio, padded if needed)
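
The 1:1 padding in the last step comes down to a little arithmetic. Here is a minimal sketch, assuming the sketch is centered on the square canvas; `square_canvas` is a hypothetical helper and not part of this skill's scripts.

```python
def square_canvas(width: int, height: int) -> tuple[int, int, int]:
    """Return (side, x_offset, y_offset) for centering an image on a square canvas."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    side = max(width, height)        # the square is as large as the longer edge
    x_offset = (side - width) // 2   # padding to the left of the sketch
    y_offset = (side - height) // 2  # padding above the sketch
    return side, x_offset, y_offset


# Example: a portrait page of 1404x1872 pixels (dimensions assumed for illustration)
print(square_canvas(1404, 1872))  # (1872, 234, 0)
```

With an image library such as Pillow, you would create a white canvas of `side` by `side` pixels and paste the sketch at `(x_offset, y_offset)`.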

Present findings: Describe each sketch's character, mood, and potential. Be enthusiastic about what you see.

Phase 2: Storyboard

Plan the creative direction collaboratively with the user.

Questions to explore:

What styles? Present the style library (see references/style-library.md) and suggest styles that would work well with this specific sketch. Recommend 6-12 styles.
What story? Propose a narrative arc — how should the styles flow? What emotional journey? Suggest an ordering that tells a compelling story.
How many images? The user controls this. More styles = longer video, more transitions.
Any specific requests? Colors, themes, moods, styles to include/exclude.

Default approach if user says "you choose":

Pick 8-10 styles that create a natural narrative arc
Start grounded (pencil, watercolor), build through energy (anime, cartoon), peak with intensity (psychedelic, cosmic), and resolve gently
Always include the original sketch as a bookend

Output: A numbered list of styles with the proposed story arc and ordering. Wait for approval.

Phase 3: Variations

Generate all style variations from the reference sketch.

Generation:

Use the generate-image skill for each variation:

python3 ~/.claude/skills/generate-image/scripts/generate.py "detailed prompt" \
  --name "descriptive-name" \
  --reference "path/to/sketch.png"

Run all generations in parallel (multiple Bash calls) for speed.
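
If you would rather launch the generations from a single script than from several Bash calls, the following is one possible sketch. It uses only the `generate.py` path and the `--name` and `--reference` flags shown above; `build_command`, `run_all`, and the worker count of 4 are assumptions for illustration.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

GENERATE_SCRIPT = Path("~/.claude/skills/generate-image/scripts/generate.py").expanduser()


def build_command(prompt: str, name: str, reference: str) -> list[str]:
    """Build the argv for one generate.py call, using the flags shown above."""
    return ["python3", str(GENERATE_SCRIPT), prompt, "--name", name, "--reference", reference]


def run_all(commands, runner=subprocess.run, max_workers=4):
    """Run every command concurrently and return the results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(runner, commands))
```

Call `run_all([build_command(prompt, name, sketch_path), ...])` with one command per style. Threads are enough here because each worker only waits on a subprocess.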

For each variation:

Write a detailed, specific prompt describing the sketch's content in the target style
Include the sketch as --reference
Name files descriptively: creature-style-name.png

After generation:

View all images — read each one to verify quality
Copy all to Desktop with numbered filenames matching story order: 00-original.png, 01-pencil.png, etc.
Create a blank white start frame if the story begins from nothing
Present the gallery to the user with descriptions
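
The numbered filenames can be derived directly from the approved story order. A minimal sketch, assuming the style names are already filesystem-safe; `numbered_filenames` is a hypothetical helper.

```python
def numbered_filenames(styles: list[str], ext: str = "png") -> list[str]:
    """Prefix each style with a zero-padded index so files sort in story order."""
    # Use at least two digits, and more if there are 100+ styles.
    width = max(2, len(str(len(styles) - 1)))
    return [f"{i:0{width}d}-{style}.{ext}" for i, style in enumerate(styles)]


story = ["original", "pencil", "watercolor", "anime"]
print(numbered_filenames(story))
# ['00-original.png', '01-pencil.png', '02-watercolor.png', '03-anime.png']
```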

Wait for feedback. User may want to regenerate specific styles or adjust the order.

Phase 4: Transitions

Create the video transition prompts and story document.

For each consecutive pair of images:

Write a detailed transition prompt — describe the visual journey from style A to style B
Choose a transition mood — discovery, intensification, transcendence, horror, recovery, peace, etc.
Suggest transition duration — default 8 seconds, but some transitions benefit from more/less time
Suggest motion level — low, medium, high

Transition prompt guidelines (see references/transition-types.md):

Describe the START state (what you see in image A)
Describe the TRANSFORMATION (what happens during the transition)
Describe the END state (what you see in image B)
Include negative prompts
Specify motion amount and style
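
One way to keep every transition consistent with these guidelines is to hold its parts in a small record and assemble the prompt from it. This is only a sketch: the `Transition` class, its field names, and the output layout are illustrative assumptions, not a format required by any video tool.

```python
from dataclasses import dataclass

MOTION_LEVELS = ("low", "medium", "high")


@dataclass
class Transition:
    """One clip between two consecutive images (field names are illustrative)."""
    start: str            # what is visible in image A
    transformation: str   # what happens during the clip
    end: str              # what is visible in image B
    negative: str         # things the video model should avoid
    mood: str = "discovery"
    duration_s: int = 8   # the default duration suggested above
    motion: str = "medium"

    def __post_init__(self):
        if self.motion not in MOTION_LEVELS:
            raise ValueError(f"motion must be one of {MOTION_LEVELS}")

    def prompt(self) -> str:
        """Assemble the positive prompt in START / TRANSFORMATION / END order."""
        return (
            f"START: {self.start}\n"
            f"TRANSFORMATION: {self.transformation}\n"
            f"END: {self.end}\n"
            f"Mood: {self.mood}. Motion: {self.motion}. Duration: {self.duration_s}s."
        )


example = Transition(
    start="a pencil sketch of the creature on white paper",
    transformation="graphite lines soften and bleed into wet pigment",
    end="the same creature as a loose watercolor painting",
    negative="text, watermark, extra limbs",
)
print(example.prompt())
```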

Output:

Create a markdown file in the vault: Archive/<Project Name> - Transition Prompts.md

Include:

Story overview and arc structure
Each transition with start/end frame references, prompt, negative prompt, motion notes
Sequence overview table with timing
Desktop file reference table

Open the file in Obsidian for the user to review.

Phase 5: Song

Create Suno prompts that match the visual story.

Key principle: BPM alignment

Pick a BPM where each transition duration aligns with bar boundaries (assuming 4/4 time, one bar lasts 240 / BPM seconds):

120 BPM: 8 seconds = 4 bars (most natural)
90 BPM: 8 seconds = 3 bars
150 BPM: 8 seconds = 5 bars
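
The bar counts above follow from bars = seconds × BPM / (60 × beats per bar). Here is a small sketch for checking other tempos, assuming 4/4 time and whole-number durations and BPMs; both function names are hypothetical.

```python
from fractions import Fraction


def bars_for(duration_s: int, bpm: int, beats_per_bar: int = 4) -> Fraction:
    """Exact number of bars that fit in duration_s seconds at the given tempo."""
    return Fraction(duration_s * bpm, 60 * beats_per_bar)


def aligned_bpms(duration_s: int, candidates, beats_per_bar: int = 4) -> list[int]:
    """Keep only the tempos where the duration is a whole number of bars."""
    return [
        bpm for bpm in candidates
        if bars_for(duration_s, bpm, beats_per_bar).denominator == 1
    ]


print(aligned_bpms(8, range(80, 161, 10)))  # [90, 120, 150]
```

`Fraction` avoids floating-point rounding, so a tempo is only reported as aligned when the bar count is exactly an integer.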

Generate two options:

Instrumental — style prompt + section structure markers
Vocal — style prompt + lyrics that narrate the visual story

For vocal version:

Write lyrics where each section's words describe what's happening visually in that transition. This creates a powerful sync between what you see and what you hear.

Section mapping:

Map each visual transition to a song section:

Quiet visuals → Intro, Verse, Bridge, Outro
Building visuals → Pre-Chorus, Build
Intense visuals → Chorus, Drop
Dark visuals → Breakdown
Recovery visuals → Bridge, Verse 2
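
The mapping above can be turned into a first draft of the timing map. This is a rough sketch under two assumptions: each transition is tagged with one of five mood labels (the lowercase keys are invented here), and the first listed section is always chosen. In practice you would pick among the candidates by hand.

```python
SECTIONS = {
    "quiet": ["Intro", "Verse", "Bridge", "Outro"],
    "building": ["Pre-Chorus", "Build"],
    "intense": ["Chorus", "Drop"],
    "dark": ["Breakdown"],
    "recovery": ["Bridge", "Verse 2"],
}


def timing_map(transitions):
    """transitions: list of (label, mood, duration_s) with whole-second durations.

    Returns (label, section, start_timestamp) rows in story order.
    """
    rows, elapsed = [], 0
    for label, mood, duration_s in transitions:
        minutes, seconds = divmod(elapsed, 60)
        rows.append((label, SECTIONS[mood][0], f"{minutes}:{seconds:02d}"))
        elapsed += duration_s  # the next transition starts where this one ends
    return rows


for row in timing_map([
    ("00 to 01", "quiet", 8),
    ("01 to 02", "building", 8),
    ("02 to 03", "intense", 8),
]):
    print(row)
```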

Output:

Create a markdown file: Archive/<Project Name> - Song Prompts.md

Include:

Timing map (transition → song section → timestamp)
Style prompt (paste into Suno)
Lyrics/structure (paste into Suno)
Post-production instructions
ffmpeg commands for combining audio + video
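
For the ffmpeg item, one common way to combine a finished video with the song is to copy the video stream, encode the audio as AAC, and stop at the shorter input. The sketch below only builds and prints the command; `mux_command` and the file names are placeholders, and the exact options you need may differ.

```python
import shlex


def mux_command(video: str, audio: str, output: str) -> list[str]:
    """Build an ffmpeg argv that pairs the video stream with the song's audio."""
    return [
        "ffmpeg",
        "-i", video,       # input 0: the video
        "-i", audio,       # input 1: the song
        "-map", "0:v:0",   # take video from input 0
        "-map", "1:a:0",   # take audio from input 1
        "-c:v", "copy",    # do not re-encode the video
        "-c:a", "aac",     # encode audio for the MP4 container
        "-shortest",       # end when the shorter stream ends
        output,
    ]


print(shlex.join(mux_command("project-raw.mp4", "song.mp3", "project-with-audio.mp4")))
```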

Open in Obsidian. The user generates the song in Suno themselves.

Phase 6: Edit

After the user has:

Generated transition videos (in Runway/Kling/ElevenLabs)
Generated a song (in Suno)
Downloaded everything

Step 6a: Stitch source videos

{base}/scripts/stitch-videos.sh ~/Downloads/ ~/Desktop/project-raw.mp4

The script auto-detects and orders the transition videos.
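
If the detected order looks wrong, it helps to check how the downloaded files would sort. How `stitch-videos.sh` orders clips is not documented here, so the following is only an illustrative sketch that sorts by the first number in each filename; `story_order` is a hypothetical helper.

```python
import re
from pathlib import Path


def story_order(paths):
    """Sort clip paths by the first number in the filename, then by name."""
    def key(path):
        match = re.search(r"\d+", Path(path).stem)
        # Files without any number sort last.
        number = int(match.group()) if match else float("inf")
        return (number, Path(path).name)
    return sorted(paths, key=key)


print(story_order(["10-cosmic.mp4", "2-pencil.mp4", "01-original.mp4"]))
# ['01-original.mp4', '2-pencil.mp4', '10-cosmic.mp4']
```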

Step 6b: Transcribe the song

python3 {base}/scripts/transcribe-audio.py "path/to/song.mp3"

This step uses the Gemini API to get precise lyric timestamps and section markers.

Step 6c: Edit the music video

python3 {base}/scripts/edit-music-video.py \
  --video ~/Desktop/project-raw.mp4 \
  --audio "path/to/song.mp3" \
  --sections "section-map.json" \
  --output ~/Desktop/project-final.mp4

The editor:

Maps each source clip to its matching song section using transcription timestamps
Speed-ramps within sections based on audio energy (quiet = slow, loud = fast)
Applies transitions between clips (flash, crossfade, glitch) based on musical moment
Adds effects (vignette on dark sections, etc.)
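
To make the speed-ramping idea concrete (quiet = slow, loud = fast), here is a minimal sketch that maps a normalized audio energy value to a playback speed. The linear mapping and the 0.5x to 2.0x range are assumptions for illustration; `edit-music-video.py` may use a different curve.

```python
def speed_for_energy(energy: float, min_speed: float = 0.5, max_speed: float = 2.0) -> float:
    """Map audio energy in [0, 1] to a playback speed between min_speed and max_speed."""
    energy = min(1.0, max(0.0, energy))  # clamp out-of-range measurements
    return min_speed + (max_speed - min_speed) * energy


for level in (0.0, 0.5, 1.0):
    print(level, speed_for_energy(level))
```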

If auto-detection doesn't match perfectly:

Manually define sections in the script call or adjust the section map JSON.

Final output:

The path passed to --output in Step 6c (~/Desktop/project-final.mp4 in the example above): the finished music video
Open in default player for review

Quick Reference

Phase	What Happens	User Does	Claude Does
1. Source	Extract sketches	Points to file	Extracts, views, copies
2. Storyboard	Plan styles & story	Chooses styles, approves arc	Proposes styles, narrative
3. Variations	Generate images	Reviews, requests changes	Generates all in parallel
4. Transitions	Write video prompts	Reviews prompts	Writes prompts + story doc
5. Song	Create music	Generates in Suno	Writes Suno prompts + lyrics
6. Edit	Final music video	Downloads videos + song	Stitches, transcribes, edits

Resuming Mid-Workflow

The user can invoke this skill at any phase:

/sketch-to-video — start from Phase 1
/sketch-to-video storyboard — jump to Phase 2 (if images already exist)
/sketch-to-video edit — jump to Phase 6 (if videos + song are ready)

Check what assets already exist on the Desktop to determine where to resume.
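
The asset check can be approximated by looking for the filenames used in earlier phases. This is a heuristic sketch only: `next_phase` is hypothetical, and it assumes the default names from this document (`sketch-original.png`, numbered images like `01-pencil.png`, and `project-raw.mp4`).

```python
import re


def next_phase(filenames) -> int:
    """Guess which phase to resume from, given the filenames found on the Desktop."""
    names = set(filenames)
    if "project-raw.mp4" in names:
        return 6  # clips are already stitched, continue editing
    if any(re.fullmatch(r"\d{2}-.+\.png", name) for name in names):
        return 4  # numbered variations exist, write transitions next
    if "sketch-original.png" in names:
        return 2  # the sketch is extracted, plan the storyboard
    return 1      # nothing found, start from the source


print(next_phase(["sketch-original.png", "00-original.png", "01-pencil.png"]))  # 4
```

Always confirm the guess with the user before skipping phases, since leftover files from another project can produce a wrong answer.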

Creative Direction

You are a creative collaborator, not just a tool operator. Throughout the workflow:

Be opinionated — suggest styles, story arcs, and transitions proactively
Explain your choices — "I'd put the watercolor last because..."
Respond to feedback — adjust the vision based on user input
Think cinematically — transitions should tell a story, not just change styles
Match music to visuals — lyrics should describe what's happening on screen