
---
name: digital-human-integrator
description: Integrates AI-generated digital human avatars into video content. Handles avatar selection, script-to-speech conversion, lip-sync, and presenter integration.
---

# Digital Human Integrator

This skill manages the integration of digital human avatars (AI presenters) into video content, including avatar configuration, speech synthesis, lip-sync, and scene composition.

## When to Use This Skill

- Creating talking-head presenter videos without real humans
- Adding a consistent brand spokesperson across videos
- Generating multilingual presenter videos
- Creating personalized video at scale
- Needing a professional presenter without filming costs
- Producing content that requires an always-available presenter

## What This Skill Does

1. **Avatar Selection**: Chooses or configures an appropriate digital human
2. **Script Optimization**: Adapts the script for natural digital human delivery
3. **Voice Generation**: Converts text to natural-sounding speech
4. **Lip-Sync**: Synchronizes the avatar's mouth movements with the audio
5. **Background Integration**: Composites the avatar onto branded backgrounds
6. **Expression Control**: Manages facial expressions and gestures
7. **Output Rendering**: Generates the final video with the digital presenter

## How to Use

### Basic Integration

Create digital human video from this script: [script or filename]

Add digital presenter to explain [topic]

### With Specifications

Create a video with:
- Avatar: [professional female/male, style]
- Script: [paste or filename]
- Background: [office/studio/custom]
- Duration: [length]

## Instructions

When the user requests digital human integration:

### Check Project Context

Checking project context for digital human preferences...

**From project-context.md**:
- Avatar type: [Preference from context or ask]
- Voice style: [From brand voice or ask]
- Background setting: [From brand guidelines or ask]
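The context lookup above can be sketched in Python. This is a minimal sketch, assuming `project-context.md` stores preferences as `Label: value` lines (optionally bulleted or bolded); the field labels and the helpers `load_preferences`, `load_project_context`, and `missing_fields` are hypothetical. A `None` value marks a preference the agent should ask the user about.

```python
import re
from pathlib import Path
from typing import Dict, List, Optional

# Hypothetical labels: adjust to match the actual project-context.md wording.
FIELDS = {
    "avatar_type": "Avatar type",
    "voice_style": "Voice style",
    "background": "Background setting",
}


def load_preferences(text: str) -> Dict[str, Optional[str]]:
    """Extract 'Label: value' lines; None marks a preference to ask about."""
    prefs: Dict[str, Optional[str]] = {}
    for key, label in FIELDS.items():
        # Matches "- Avatar type: X", "**Voice style**: X", or "Background setting: X".
        pattern = rf"^[ \t]*[-*]?[ \t]*\**{re.escape(label)}\**[ \t]*:[ \t]*(.+)$"
        match = re.search(pattern, text, re.MULTILINE | re.IGNORECASE)
        prefs[key] = match.group(1).strip() if match else None
    return prefs


def load_project_context(path: str = "project-context.md") -> Dict[str, Optional[str]]:
    """Read the context file if it exists; otherwise every preference is unknown."""
    file = Path(path)
    if not file.exists():
        return {key: None for key in FIELDS}
    return load_preferences(file.read_text(encoding="utf-8"))


def missing_fields(prefs: Dict[str, Optional[str]]) -> List[str]:
    """Fields the agent still needs to ask the user about."""
    return [key for key, value in prefs.items() if not value]
```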

### Avatar Configuration

## Digital Human Configuration

**Avatar Selection**:

**Appearance**:
- [ ] Professional business person
- [ ] Friendly expert
- [ ] Young professional
- [ ] Mature authority figure
- [ ] Custom specification

**Demographics**:
- Gender: [Male/Female/Non-binary]
- Age range: [20s/30s/40s/50+]
- Ethnicity: [Specification or diverse]
- Style: [Contemporary/Classic/Casual]

**Attire**:
- [ ] Business formal (suit/blazer)
- [ ] Business casual (smart casual)
- [ ] Casual professional (polo/button-down)
- [ ] Industry-specific (lab coat, scrubs, etc.)
- [ ] Custom: ___________

**Avatar Platform**:
- [ ] D-ID
- [ ] Synthesia
- [ ] HeyGen
- [ ] Runway ML
- [ ] Other: ___________

[Note: Availability depends on the user's subscriptions]

### Voice Configuration

## Voice Setup

**Voice Selection**:

**Voice Type**:
- Gender: [Match avatar or specify]
- Age: [Young/Middle-aged/Mature]
- Accent: [American/British/Australian/Neutral/Other]
- Language: [Primary language]

**Voice Characteristics**:
- Tone: [Professional/Friendly/Authoritative/Casual]
- Pace: [Slow/Medium/Fast] (~[XXX] words/minute)
- Energy: [Calm/Moderate/Energetic]
- Pitch: [Lower/Medium/Higher]

**Voice Platform**:
- [ ] ElevenLabs
- [ ] Murf.ai
- [ ] Platform's built-in (D-ID, Synthesia)
- [ ] Azure Speech
- [ ] Google Text-to-Speech
- [ ] Amazon Polly

**Voice Sample Needed**: [Yes/No - provide reference if yes]
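The pace setting determines how long a script runs. A minimal sketch of that arithmetic, assuming duration is simply word count divided by words per minute (pauses and SSML breaks are ignored); `estimate_duration` and `words_for_duration` are hypothetical helper names.

```python
def estimate_duration(script: str, wpm: int = 150) -> str:
    """Return an M:SS estimate for reading `script` at `wpm` words per minute."""
    words = len(script.split())
    total_seconds = round(words / wpm * 60)
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes}:{seconds:02d}"


def words_for_duration(target_seconds: float, wpm: int = 150) -> int:
    """How many words fit into a target duration at the given pace."""
    return int(target_seconds * wpm / 60)


# A 300-word script at 150 wpm runs about two minutes.
print(estimate_duration("word " * 300))  # 2:00
# A 90-second video at 150 wpm leaves room for roughly 225 words.
print(words_for_duration(90))  # 225
```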

### Script Optimization for Digital Human

## Script Adaptation

**Original Script Review**:
- Total word count: [XXX] words
- Estimated duration: [X:XX] at 150 wpm
- Complexity check: [Simple/Moderate/Complex]

**Optimization for Digital Human**:

✅ **Optimizations Made**:
- [ ] Removed tongue twisters
- [ ] Simplified complex words
- [ ] Added natural pauses [pause]
- [ ] Clarified pronunciation: [word] → [phonetic]
- [ ] Adjusted pacing markers
- [ ] Shortened overly long sentences
- [ ] Added emphasis markers *word*

**Pronunciation Guide**:
| Word | Phonetic | Context |
|------|----------|---------|
| [Brand name] | [Pronunciation] | [When appears] |
| [Technical term] | [Pronunciation] | [When appears] |

**Adjusted Script**:

[Script with digital human optimizations]
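Platform inputs such as D-ID's `input` field take script text with SSML tags, so the markers used in the adjusted script (`[pause]`, `*word*`, and the pronunciation guide) need converting. A minimal sketch, assuming the chosen TTS engine accepts the standard SSML `<speak>`, `<sub>`, `<emphasis>`, and `<break>` elements (support varies by engine and voice, so check the provider's SSML documentation); `to_ssml` and the 500 ms default pause are illustrative choices.

```python
import re
from typing import Dict
from xml.sax.saxutils import escape


def to_ssml(script: str, pronunciations: Dict[str, str], pause_ms: int = 500) -> str:
    """Convert script markers into SSML.

    [pause]  -> <break time="...ms"/>
    *word*   -> <emphasis>word</emphasis>
    terms in `pronunciations` -> <sub alias="spoken form">term</sub>
    """
    text = escape(script)  # escape &, <, > so literal text stays valid XML
    for term, spoken in pronunciations.items():
        alias = escape(spoken, {'"': "&quot;"})
        pattern = rf"\b{re.escape(escape(term))}\b"
        text = re.sub(pattern, lambda m, a=alias: f'<sub alias="{a}">{m.group(0)}</sub>', text)
    text = re.sub(r"\*([^*]+)\*", r"<emphasis>\1</emphasis>", text)
    text = text.replace("[pause]", f'<break time="{pause_ms}ms"/>')
    return f"<speak>{text}</speak>"


print(to_ssml("Welcome to Acme. [pause] This is *important*.", {"Acme": "AK-mee"}))
# <speak>Welcome to <sub alias="AK-mee">Acme</sub>. <break time="500ms"/> This is <emphasis>important</emphasis>.</speak>
```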

### Background & Scene Setup

## Scene Configuration

**Background Selection**:

**Background Type**:
- [ ] Virtual office
- [ ] Studio (plain/gradient)
- [ ] Home office
- [ ] Industry-specific setting
- [ ] Custom uploaded background
- [ ] Green screen for custom compositing

**Background Details**:
- Style: [Modern/Classic/Minimal/Detailed]
- Color scheme: [From brand colors]
- Lighting: [Bright/Balanced/Dramatic]
- Blur level: [None/Slight/Heavy - for focus on presenter]

**Brand Elements**:
- Logo placement: [Yes/No - where if yes]
- Brand colors: [Incorporated how]
- Props/elements: [Any specific items]

**Camera Framing**:
- Shot type: [Head & shoulders/Medium/Waist-up]
- Position: [Center/Rule of thirds]
- Headroom: [Standard professional framing]
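For the framing options above, placing the avatar on the center line or a rule-of-thirds line is simple arithmetic. A minimal sketch, assuming a rendered avatar clip of known pixel size that is bottom-aligned in the frame (typical for head-and-shoulders shots); `place_avatar` is a hypothetical helper that returns the top-left offset a compositor would need.

```python
from typing import Tuple


def place_avatar(frame_w: int, frame_h: int, avatar_w: int, avatar_h: int,
                 position: str = "center") -> Tuple[int, int]:
    """Top-left (x, y) that centers the avatar on a vertical guide line."""
    guides = {
        "left": frame_w / 3,        # left rule-of-thirds line
        "center": frame_w / 2,
        "right": frame_w * 2 / 3,   # right rule-of-thirds line
    }
    x = round(guides[position] - avatar_w / 2)
    x = max(0, min(x, frame_w - avatar_w))  # keep the avatar inside the frame
    y = frame_h - avatar_h                  # bottom-aligned
    return x, y


# 1080p frame, 800x900 avatar clip placed on the right third.
print(place_avatar(1920, 1080, 800, 900, "right"))  # (880, 180)
```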

### Expression & Gesture Control

## Avatar Behavior Settings

**Facial Expressions**:
- Default: [Neutral friendly/Slight smile/Professional]
- Smile intensity: [Subtle/Moderate/Warm]
- Eye contact: [Direct/Natural variation]
- Blink rate: [Natural/Frequent/Infrequent]

**Gestures** (if platform supports):
- Hand movements: [None/Subtle/Moderate/Expressive]
- Head movements: [Minimal nods/Natural movement/Static]
- Body language: [Formal/Relaxed/Animated]

**Expression Cues from Script**:
| Time | Script Line | Expression |
|------|-------------|------------|
| 0:05 | "Welcome..." | Warm smile |
| 0:30 | "The problem is..." | Concerned |
| 1:00 | "Here's the solution..." | Confident |
| 2:00 | "Imagine..." | Excited |
| 2:45 | "Get started today" | Encouraging |
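The expression cue table can drive a simple lookup: at any moment, the active expression is the last cue at or before that time. A minimal sketch, assuming `M:SS` timestamps and a neutral default before the first cue; the `CUES` list mirrors the example table, and `expression_at` is a hypothetical helper, not a platform API.

```python
from bisect import bisect_right
from typing import List, Tuple

CUES: List[Tuple[str, str]] = [
    ("0:05", "Warm smile"),
    ("0:30", "Concerned"),
    ("1:00", "Confident"),
    ("2:00", "Excited"),
    ("2:45", "Encouraging"),
]


def to_seconds(timestamp: str) -> int:
    minutes, seconds = timestamp.split(":")
    return int(minutes) * 60 + int(seconds)


def expression_at(cues: List[Tuple[str, str]], t: float,
                  default: str = "Neutral friendly") -> str:
    """Return the expression active at time t (in seconds)."""
    ordered = sorted(cues, key=lambda cue: to_seconds(cue[0]))
    starts = [to_seconds(ts) for ts, _ in ordered]
    index = bisect_right(starts, t)  # number of cues starting at or before t
    return ordered[index - 1][1] if index else default


print(expression_at(CUES, 45))  # Concerned
```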

### Platform-Specific Configuration

**For D-ID**:

### D-ID Configuration

**API Settings**:
- Driver image: [Avatar image or stock]
- Audio source: [Generated TTS or uploaded]
- Stitch: [true/false - true composites the animated face back into the full driver image instead of returning a cropped face]
- Result format: [mp4, webm]

**Script Format**:
```json
{
  "source_url": "[Driver image URL]",
  "script": {
    "type": "text",
    "input": "[Script text with SSML tags]",
    "ssml": true,
    "provider": {
      "type": "microsoft",
      "voice_id": "[Voice ID]"
    }
  },
  "config": {
    "fluent": true,
    "pad_audio": 0,
    "stitch": true
  }
}
```
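A request carrying this payload can be assembled with Python's standard library. This is a sketch, not a verified client: the endpoint `https://api.d-id.com/talks`, the `Authorization: Basic <API key>` header, and the `ssml` flag reflect D-ID's public API documentation as best understood here and should be checked against the current reference before use. `build_talk_request` is a hypothetical helper; it only builds the request and does not send it.

```python
import json
import urllib.request

DID_TALKS_URL = "https://api.d-id.com/talks"  # assumption: verify in D-ID's API reference


def build_talk_request(api_key: str, source_url: str, ssml_text: str,
                       voice_id: str, stitch: bool = True) -> urllib.request.Request:
    """Build (but do not send) a POST request that creates a D-ID talk."""
    payload = {
        "source_url": source_url,  # driver image of the avatar
        "script": {
            "type": "text",
            "input": ssml_text,
            "ssml": True,  # assumption: flag telling D-ID the input is SSML
            "provider": {"type": "microsoft", "voice_id": voice_id},
        },
        "config": {"fluent": True, "pad_audio": 0, "stitch": stitch},
    }
    return urllib.request.Request(
        DID_TALKS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Basic {api_key}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


# Sending it requires a real key and network access:
# with urllib.request.urlopen(build_talk_request(...)) as resp:
#     talk = json.load(resp)  # expected to include the talk "id" for later polling
```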


**For Synthesia**:
```markdown
### Synthesia Configuration

**Avatar**: [Avatar ID from library]
**Voice**: [Voice ID]
**Background**: [Template ID or custom]

**Script Input**:

[Scene 1]
[Script text]
[Pause: 1000 ms]

[Scene 2]
[Script text]

**Output Settings**:
- Resolution: 1080p / 4K
- Subtitles: On/Off
- Background music: [Yes/No]
```

**For HeyGen**:

### HeyGen Configuration

**Avatar Selection**: [Avatar from library or custom]
**Voice Cloning**: [Yes/No - if yes, upload sample]

**Scene Settings**:
- Avatar position: [Center/Left/Right]
- Avatar size: [Scale percentage]
- Background: [Template or custom]

**Text Input**:

[Script with timing markers], using:
- `{pause: 1}` for a 1-second pause
- `{emphasis}` for emphasized words

### Rendering Workflow

## Video Generation Process

**Step 1: Audio Generation**
- Generate voice from script using [platform]
- Duration: [X:XX]
- Preview and verify pronunciation
- Adjust if needed: [List any adjustments]

**Step 2: Avatar Animation**
- Upload audio to digital human platform
- Configure avatar and scene
- Generate video with lip-sync
- Estimated render time: [X minutes]
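Rendering is asynchronous on most platforms, so the agent typically polls until the video is ready. A minimal sketch, assuming a `fetch_status` callable that returns a dict with a `status` field; the status strings (`done`, `error`, `rejected`) are modeled on D-ID's talk statuses and will differ on other platforms. `wait_for_render` is hypothetical, and `sleep` is injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable, Dict


def wait_for_render(fetch_status: Callable[[], Dict], timeout_s: float = 600,
                    interval_s: float = 5,
                    sleep: Callable[[float], None] = time.sleep) -> Dict:
    """Poll until the render finishes, fails, or the timeout elapses."""
    waited = 0.0
    while waited <= timeout_s:
        status = fetch_status()
        state = status.get("status")
        if state == "done":
            return status  # typically includes a result URL
        if state in ("error", "rejected"):
            raise RuntimeError(f"Render failed: {status}")
        sleep(interval_s)
        waited += interval_s
    raise TimeoutError(f"Render not finished after {timeout_s} seconds")
```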

**Step 3: Background Compositing** (if needed)
- Add custom branded background
- Insert logo/graphics
- Apply color grading to match brand

**Step 4: Post-Processing** (optional)
- Add text overlays
- Insert B-roll cutaways
- Add background music
- Apply transitions

**Step 5: Quality Check**
- [ ] Lip-sync accurate
- [ ] Audio quality clear
- [ ] Video quality high
- [ ] Background appropriate
- [ ] Brand elements present
- [ ] Natural expressions
- [ ] Smooth movements
- [ ] Correct duration

### Integration with Video Workflow

## Video Composition

**Digital Human Segments**:

| Segment | Duration | Script | Avatar | Background |
|---------|----------|--------|--------|------------|
| Opening | 0:00-0:15 | [Intro] | [Avatar] | [Setting] |
| Section 1 | 0:15-0:45 | [Content] | [Avatar] | [Setting] |
| Section 2 | 0:45-1:15 | [Content] | [Avatar] | [Setting] |
| Closing | 1:15-1:30 | [CTA] | [Avatar] | [Setting] |

**B-Roll Integration**:
- Cut away to B-roll at: [timestamps]
- Return to avatar at: [timestamps]
- Picture-in-picture: [Yes/No - where]

**Final Composition**:

Timeline:
- [00:00-00:05] Title card
- [00:05-00:15] Avatar intro
- [00:15-00:20] B-roll product shot
- [00:20-00:45] Avatar content
- [00:45-00:50] B-roll feature demo
- [00:50-01:15] Avatar content
- [01:15-01:30] Avatar CTA + End screen
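Before rendering, it is worth confirming that the timeline has no gaps or overlaps. A minimal sketch, assuming entries written as `[MM:SS-MM:SS] Label` (either run together on one line or one per line); `parse_timeline` and `timeline_issues` are hypothetical helpers.

```python
import re
from typing import List, Tuple

ENTRY = re.compile(r"\[(\d+):(\d{2})-(\d+):(\d{2})\]\s*([^\[\n]+)")


def parse_timeline(text: str) -> List[Tuple[int, int, str]]:
    """Return (start_s, end_s, label) tuples in the order they appear."""
    entries = []
    for match in ENTRY.finditer(text):
        sm, ss, em, es, label = match.groups()
        entries.append((int(sm) * 60 + int(ss), int(em) * 60 + int(es), label.strip()))
    return entries


def timeline_issues(entries: List[Tuple[int, int, str]]) -> List[str]:
    """Report gaps, overlaps, and segments that end before they start."""
    issues = []
    for start, end, label in entries:
        if end <= start:
            issues.append(f"{label} has non-positive length")
    for (_, prev_end, prev), (start, _, label) in zip(entries, entries[1:]):
        if start != prev_end:
            kind = "gap" if start > prev_end else "overlap"
            issues.append(f"{kind} between {prev} ({prev_end}s) and {label} ({start}s)")
    return issues
```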

### Quality Assurance Checklist

## QA Checklist

**Audio Quality**:
- [ ] Voice sounds natural (not robotic)
- [ ] Pronunciation correct for all words
- [ ] Pacing feels natural
- [ ] Volume levels consistent
- [ ] No audio artifacts or glitches
- [ ] Background music balanced (if any)

**Visual Quality**:
- [ ] Lip-sync accurate throughout
- [ ] Facial expressions appropriate
- [ ] No uncanny valley effects
- [ ] Lighting on avatar natural
- [ ] Background complements avatar
- [ ] Resolution and quality high
- [ ] No visual artifacts or glitches

**Brand Alignment**:
- [ ] Avatar matches brand personality
- [ ] Voice matches brand tone
- [ ] Background reflects brand aesthetic
- [ ] Colors aligned with brand palette
- [ ] Logo/branding properly placed
- [ ] Overall feeling matches brand

**Content Accuracy**:
- [ ] Script delivered as written
- [ ] All key points covered
- [ ] Duration matches target
- [ ] CTA clear and prominent
- [ ] No errors or misstatements
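The "Volume levels consistent" check can be approximated by measuring the RMS level of each window of audio. A minimal sketch, assuming 16-bit mono PCM samples; the 6 dB spread limit and the -50 dBFS silence floor are illustrative thresholds, not standards, and `window_dbfs` and `volume_consistent` are hypothetical helpers.

```python
import math
from typing import List, Sequence

# Samples can come from a WAV file via the standard library, e.g.:
#   with wave.open("voice.wav", "rb") as w:
#       samples = array.array("h", w.readframes(w.getnframes()))
# (assumes 16-bit mono audio on a little-endian host)


def window_dbfs(samples: Sequence[int], window: int) -> List[float]:
    """RMS level in dBFS for each full window of 16-bit samples."""
    levels = []
    for i in range(0, len(samples) - window + 1, window):
        chunk = samples[i:i + window]
        rms = math.sqrt(sum(s * s for s in chunk) / window)
        levels.append(20 * math.log10(rms / 32768) if rms > 0 else float("-inf"))
    return levels


def volume_consistent(levels: List[float], max_spread_db: float = 6.0,
                      silence_db: float = -50.0) -> bool:
    """True if non-silent windows stay within max_spread_db of each other."""
    voiced = [level for level in levels if level > silence_db]
    return not voiced or max(voiced) - min(voiced) <= max_spread_db
```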

### Save Configuration

Save all settings to `./output/digital-human/config-[project-name]-[date].md` for reuse in future videos.
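Saving the configuration can be sketched as follows. This assumes the project name is slugified (lowercase, hyphen-separated) and the date uses ISO `YYYY-MM-DD` format, neither of which the skill specifies; `config_path`, `render_config`, and `save_config` are hypothetical helpers.

```python
import re
from datetime import date
from pathlib import Path
from typing import Dict, Optional


def config_path(project_name: str, on: Optional[date] = None,
                root: str = "./output/digital-human") -> Path:
    """Build ./output/digital-human/config-[project-name]-[date].md."""
    slug = re.sub(r"[^a-z0-9]+", "-", project_name.lower()).strip("-")
    day = (on or date.today()).isoformat()
    return Path(root) / f"config-{slug}-{day}.md"


def render_config(project_name: str, settings: Dict[str, str]) -> str:
    """Render settings as a small Markdown document."""
    lines = [f"# Digital Human Configuration: {project_name}", ""]
    lines += [f"- **{key}**: {value}" for key, value in settings.items()]
    return "\n".join(lines) + "\n"


def save_config(project_name: str, settings: Dict[str, str],
                on: Optional[date] = None, root: str = "./output/digital-human") -> Path:
    """Write the rendered configuration and return its path."""
    path = config_path(project_name, on, root)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(render_config(project_name, settings), encoding="utf-8")
    return path
```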

### Provide Summary

✅ Digital human video configuration complete!

**Configuration Saved**: `./output/digital-human/config-[project-name]-[date].md`

**Video Details**:
- Avatar: [Description]
- Voice: [Type]
- Background: [Setting]
- Duration: [X:XX]
- Platform: [Platform]

**Generation Status**:
- Audio: [Generated/Pending]
- Video: [Generated/Rendering/Pending]
- Estimated completion: [Time]

**Output Location**: `./output/videos/digital-human-[filename].mp4`

**Next Steps**:
1. Review generated video
2. Request adjustments if needed:
   - "Adjust voice to be more [characteristic]"
   - "Change avatar expression to [emotion]"
   - "Update background to [setting]"
3. Integrate with other video segments
4. Add post-production elements

**For Future Videos**:
- Configuration saved for reuse
- Use same avatar: "Create another video with same avatar"
- Change settings: "Modify avatar configuration for [change]"

Ready to generate or need adjustments?

## Output Format

Digital human videos should:

- **Look natural**: Avoid uncanny valley effects
- **Sound professional**: Clear, natural voice
- **Match brand**: Align with brand personality and style
- **Be reusable**: Save configurations for consistency
- **Integrate smoothly**: Work well with other video elements

## Tips

- **Test a short clip first**: Generate 10-15 seconds to verify quality
- **Natural language**: Write conversational scripts
- **Pronunciation guide**: Provide phonetics for difficult words
- **Consistent avatar**: Use the same avatar across a series
- **Background simplicity**: Keep backgrounds simple so they don't distract from the message
- **Voice preview**: Test voice options before full generation
- **Save config**: Document settings for future videos
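For the short test clip, the excerpt length follows from the speaking pace: at 150 words per minute, 15 seconds is about 37 words. A minimal sketch, assuming plain-text scripts and preferring to cut at the last complete sentence; `preview_excerpt` is a hypothetical helper.

```python
def preview_excerpt(script: str, seconds: float = 15, wpm: int = 150) -> str:
    """First words of the script that fit in `seconds`, trimmed to a sentence end if possible."""
    max_words = int(seconds * wpm / 60)
    words = script.split()[:max_words]
    for i in range(len(words) - 1, -1, -1):
        if words[i].endswith((".", "!", "?")):
            return " ".join(words[: i + 1])
    return " ".join(words)  # no sentence boundary found: hard cut
```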

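## Platform Comparison

| Feature | D-ID | Synthesia | HeyGen |
|---------|------|-----------|--------|
| Custom avatar | ✓ Easy | ✓ Premium | ✓ Easy |
| Voice cloning | ✓ | ✓ | ✓✓ Best |
| Gestures | Limited | ✓ Good | ✓✓ Best |
| API access | ✓✓ Best | ✓ | ✓ |
| Cost | $$ | $$$ | $$ |
| Quality | High | Highest | High |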

## Related Skills

- Use `script-writer` to create optimized scripts first
- Reference `project-context` for avatar preferences
- Combine with `video-compositor` for the final video
- Use `audio-sync-optimizer` if adjustments are needed