audio-score-design

star 1

Use when designing audio, music, sound effects, or voice/dialogue for video storyboards, AI video generation pipelines, film scoring, or any project requiring per-shot synchronized sound design with three layers (score, SFX, voice). Triggers include "audio design", "film score", "sound design", "voice acting", "BGM", "soundtrack", "配乐", "音效", "配音", "声音设计".

2799662352 By 2799662352 schedule Updated 3/1/2026

name: audio-score-design description: Use when designing audio, music, sound effects, or voice/dialogue for video storyboards, AI video generation pipelines, film scoring, or any project requiring per-shot synchronized sound design with three layers (score, SFX, voice). Triggers include "audio design", "film score", "sound design", "voice acting", "BGM", "soundtrack", "配乐", "音效", "配音", "声音设计". metadata: version: "1.0" author: tecx

Audio Score Design

Three-layer audio design methodology for video storyboard pipelines. Every shot gets synchronized score, SFX, and voice — anchored to real film scores, physical parameters, and emotion tension values.

When to Use

  • Designing audio for storyboard shots (any genre/culture)
  • Writing audio field descriptions for AI video generation prompts
  • Choosing reference film scores for a scene's emotional tone
  • Parameterizing voice/dialogue delivery for TTS or voice direction
  • Fixing audio-visual desync in generated video

Core Rule: One Composer Per Shot

Select exactly ONE reference work per shot. Match via: scene emotion tension (0-10) → tension dimension in references/emotion-tension-map.md → pick ONE composer whose sound DNA fits the cultural context.

Do NOT stack 2-3 composers. One precise reference > three vague ones.

Three-Layer Template

audio: {
  score: "ref:{Composer}/{Work} → {instruments}, {dynamics}(pp-ff), {tempo}bpm, tension:{0-10}",
  sfx: "{material}+{action}+{freq Hz}+{decay s}+{spatial L/C/R, distance m}",
  voice: "{character}:'{line}' → pitch:{Hz}, breathiness:{%}, rate:{chars/s}, {physical}, reverb RT60:{s}"
}

Audio-Action Sync Formulas

score_bpm = body_motion_frequency_Hz × 60
volume_dB = tension × 6 - 40        (tension 0 = -40dB floor, 10 = +20dB peak)
pitch_shift = tension_delta × 50 cents
breath_interval_ms = 2000 / tension  (tension 5 = 400ms, 10 = 200ms)

Five-Phase Rhythm Arc

Phase Name Tension Audio Behavior
1 Establish 1-3 Ambient + single instrument, pp
2 Build 3-5 Rhythm pattern enters, layers add, mp→mf
3 Focus 5-7 Detail SFX amplified, space narrows
4 Peak 8-9 All layers stacked, ff, rhythm max
5 Resolve fallback Layers exit, return to ambient, pp

Map your shot sequence onto this arc. Not every scene uses all 5 phases.

Voice Parameterization

Dimension Parameter Range
Pitch Fundamental freq Male 100-180Hz / Female 180-350Hz / Child 300-500Hz
Texture Descriptor sandpaper / velvet / crystal / metallic / hollow
Breathiness Air ratio 0% pure phonation → 100% pure breath
Dynamics Force pp whisper → mp conversation → mf declaration → f shout → ff scream
Emotion externalization Physical only vocal tremor Hz / pitch drift cents / breath gap ms

Do NOT use emotion adjectives (sexy/angry/sad). Use physical parameters.

Audio Dodge (D7 Extension)

Forbidden Safe Replacement
moaning/groaning vocal tremor / breath acceleration
screaming frequency spike >3kHz / vocal overload
panting/gasping breath rate increase to 40bpm
celebrity voice imitation physical parameters (pitch+texture+breathiness)

audio.sfx and audio.score do NOT need dodge — instrument/foley terms are safe.

Quick Reference

See references/emotion-tension-map.md for the full 9-dimension tension → composer lookup.

See references/composer-reference-library.md for 45+ works across 30+ composers.

See references/instrument-palettes.md for cultural instrument groups.

Common Mistakes

Mistake Fix
Stacking 2-3 composers per shot Pick ONE. Match tension dimension + cultural context.
Using emotion words for voice Replace with Hz/bpm/%/cents physical params
Ignoring sync formula bpm must = motion_freq × 60 ± 10%
Same composer across all shots Each shot picks independently from tension map
No five-phase arc consideration Map shots to establish→build→focus→peak→resolve
Audio volume unrelated to tension Use formula: dB = tension × 6 - 40
Install via CLI
npx skills add https://github.com/2799662352/ai-image-master --skill audio-score-design
Repository Details
star Stars 1
call_split Forks 0
navigation Branch main
article Path SKILL.md
More from Creator