name: ai-avatar-video
description: |
  Complete AI avatar and talking-head video toolkit. Includes script writing for talking-head content, persona selection frameworks, multi-scene composition, voice/audio best practices, green screen techniques, and UGC-style authenticity guides. Integrates with Creatify API for AI Avatar v1/v2, Aurora, Text-to-Speech, Custom Avatars, and Voice Cloning.
  Use when: "avatar video", "talking head", "AI spokesperson", "lip sync", "UGC video", "Aurora", "text to speech", "voice clone", "AI presenter", "lipsync", "custom avatar", "green screen avatar", "AI narrator", "spokesperson video", "script for avatar", "multi-scene video", "TTS", "voiceover", "AI talking head", or any AI avatar/talking-head task.
AI Avatar Video
Complete framework for creating realistic AI talking-head videos — from script writing to multi-scene production.
Part 1: Avatar Video Strategy (Standalone)
1.1 Script Writing for Talking-Head Content
Avatar scripts must feel like natural speech, not written copy. Follow these rules:
Pacing Rules
| Tone | Words per Second | Words per 30s | Style |
|---|---|---|---|
| Conversational | 2.5-3.0 | 75-90 | Natural pauses, filler words ok |
| Professional | 2.0-2.5 | 60-75 | Clean, measured delivery |
| Energetic/Sales | 3.0-3.5 | 90-105 | Fast, punchy, short sentences |
| Educational | 1.8-2.2 | 54-66 | Slower, with pauses for comprehension |
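The pacing table can be turned into a quick length check before a script is sent for rendering. The following is a minimal sketch: the words-per-second ranges are copied from the table, while `PACING_WPS`, `count_words`, and `estimate_duration` are hypothetical helper names, and real TTS timing will vary by voice.

```python
import re

# Words-per-second ranges copied from the pacing table above.
PACING_WPS = {
    "conversational": (2.5, 3.0),
    "professional": (2.0, 2.5),
    "energetic": (3.0, 3.5),
    "educational": (1.8, 2.2),
}


def count_words(script: str) -> int:
    """Count whitespace-separated tokens that contain a letter or digit."""
    return sum(1 for token in script.split() if re.search(r"\w", token))


def estimate_duration(script: str, tone: str = "conversational") -> tuple[float, float]:
    """Return (shortest, longest) spoken duration in seconds for a tone."""
    slow, fast = PACING_WPS[tone]
    words = count_words(script)
    return (words / fast, words / slow)
```

A 75-word conversational script comes out at roughly 25 to 30 seconds, which matches the "Words per 30s" column. Pause markers such as "..." are not counted as words, so scripts with many pauses will run longer than the estimate.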
Authenticity Markers
Scripts that sound like real people include:
- Contractions: "I'm" not "I am", "don't" not "do not"
- Sentence fragments: "Pretty cool, right?" not "This is quite impressive, is it not?"
- Casual transitions: "So here's the thing..." / "Now, this is where it gets interesting..."
- Direct address: "You know what? You're gonna love this."
- Self-correction: "It's fast — actually, it's really fast."
Hook-to-CTA Structure for Avatar Scripts
15-second script template:
HOOK (0-3s): [Pattern interrupt or question — 8-12 words]
BRIDGE (3-7s): [Connect hook to product — 15-20 words]
BENEFIT (7-12s): [Core value proposition — 15-20 words]
CTA (12-15s): [Clear next step — 8-12 words]
30-second script template:
HOOK (0-3s): [Attention grab — 8-12 words]
PROBLEM (3-8s): [Relatable pain point — 15-25 words]
SOLUTION (8-15s): [Product introduction + key feature — 20-30 words]
PROOF (15-22s): [Social proof or demonstration — 15-25 words]
CTA (22-30s): [Urgency + next step — 15-25 words]
60-second script template:
HOOK (0-5s): [Strong opening — 12-18 words]
STORY/PROBLEM (5-15s): [Relatable scenario — 25-40 words]
DISCOVERY (15-25s): [How you found the product — 25-35 words]
FEATURES (25-40s): [2-3 key benefits with specifics — 35-50 words]
PROOF (40-50s): [Results, testimonials, data — 25-35 words]
CTA (50-60s): [Compelling close — 20-30 words]
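The word ranges in these templates lend themselves to an automated draft check. The sketch below encodes the 30-second template as data and reports sections that are missing or outside their range; `TEMPLATE_30S` and `check_script` are illustrative names, not part of any library.

```python
# Word budgets copied from the 30-second template above: section -> (min, max).
TEMPLATE_30S = {
    "HOOK": (8, 12),
    "PROBLEM": (15, 25),
    "SOLUTION": (20, 30),
    "PROOF": (15, 25),
    "CTA": (15, 25),
}


def check_script(sections: dict[str, str],
                 template: dict[str, tuple[int, int]]) -> list[str]:
    """Return human-readable problems; an empty list means the draft fits."""
    problems = []
    for name, (low, high) in template.items():
        word_count = len(sections.get(name, "").split())
        if word_count == 0:
            problems.append(f"{name}: missing")
        elif word_count < low:
            problems.append(f"{name}: {word_count} words, below {low}")
        elif word_count > high:
            problems.append(f"{name}: {word_count} words, above {high}")
    return problems
```

The same function works for the 15-second and 60-second templates if their ranges are supplied as a second dictionary.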
Script Do's and Don'ts
| Do | Don't |
|---|---|
| Use short sentences (8-15 words) | Write long compound sentences |
| Include natural pauses with "..." | Rush from point to point |
| Write phonetically for hard words | Use jargon or acronyms without context |
| End on a clear action | Trail off or end abruptly |
| Match script tone to avatar age/style | Use Gen Z slang with a professional avatar |
1.2 Avatar/Persona Selection Framework
Choosing the right avatar is as important as the script. Match demographics to your target audience.
Trust Signals by Vertical
| Vertical | Ideal Avatar Profile | Why |
|---|---|---|
| Healthcare/Supplements | 30-50, professional appearance | Credibility and trust |
| Beauty/Skincare | 20-35, relatable, well-groomed | Peer recommendation effect |
| Tech/SaaS | 25-40, casual-professional | Approachable expertise |
| Finance/Insurance | 35-55, suited, authoritative | Trust and stability |
| Fitness | 25-35, athletic, energetic | Aspirational but attainable |
| Food/Beverage | 25-45, warm, approachable | Relatable lifestyle |
| Education | 30-50, friendly, professional | Authority without intimidation |
| DTC/E-commerce | 20-30, casual, authentic | UGC/peer recommendation |
Diversity Considerations
- Test multiple demographics — different audiences respond to different presenters
- Match market: For geo-targeted campaigns, use avatars that reflect the local audience
- A/B test gender: Some products convert better with male vs female presenters (test, don't assume)
- Age alignment: Your avatar should look like your customer OR who your customer trusts
1.3 Multi-Scene Composition Guide
Multi-scene videos feel more dynamic and retain attention better than single-shot talking heads.
When to Switch Scenes
- Every 5-8 seconds for fast-paced content (TikTok/Reels)
- Every 8-15 seconds for medium-paced (YouTube, Feed ads)
- At every major transition point (problem → solution, feature → feature)
Scene Transition Best Practices
- Cut on speech breaks — switch scenes at natural pauses
- Alternate speaker/background — change avatar, background, or both
- Use B-roll inserts — product shots between talking segments
- Progress the story — each scene should advance the narrative
Recommended Scene Structures
2-Scene (15s):
Scene 1: Hook + Problem (avatar talking, neutral background)
Scene 2: Solution + CTA (avatar talking, product-relevant background)
3-Scene (30s):
Scene 1: Hook + Problem (avatar A, office background)
Scene 2: Solution + Features (avatar A, product demo background)
Scene 3: Social Proof + CTA (avatar A or B, branded background)
5-Scene (60s):
Scene 1: Hook (avatar, eye-catching background)
Scene 2: Problem deep-dive (avatar, relatable setting)
Scene 3: Product introduction (product B-roll or demo)
Scene 4: Features + Proof (avatar with data/reviews overlay)
Scene 5: CTA (avatar, clean branded background)
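One way to apply the "cut on speech breaks" rule programmatically is to group whole sentences into scenes that stay under a target length. This is a rough sketch under stated assumptions: `split_into_scenes` is a hypothetical helper, the 2.75 words-per-second default is the midpoint of the conversational pacing range, and the output dictionaries use the `text` and `creator` keys that the multi-scene example in Part 2 uses.

```python
import re


def split_into_scenes(script: str, creator_id: str, max_seconds: float = 8.0,
                      words_per_second: float = 2.75) -> list[dict]:
    """Group whole sentences into scenes that each fit within max_seconds."""
    max_words = int(max_seconds * words_per_second)
    # Split after ., ! or ? followed by whitespace so cuts land on speech breaks.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", script) if s.strip()]
    scenes, current, current_words = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        # Start a new scene when adding this sentence would exceed the budget.
        if current and current_words + n > max_words:
            scenes.append({"text": " ".join(current), "creator": creator_id})
            current, current_words = [], 0
        current.append(sentence)
        current_words += n
    if current:
        scenes.append({"text": " ".join(current), "creator": creator_id})
    return scenes
```

A single sentence longer than the budget is kept whole in its own scene rather than cut mid-sentence, so very long sentences should be shortened in the script first.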
1.4 Audio & Voice Best Practices
Emotion Modulation
- Excitement: Slightly faster pace, higher energy, emphasis on benefit words
- Empathy: Slower pace, softer tone, pause after pain points
- Authority: Measured pace, confident tone, declarative sentences
- Urgency: Fast pace, short sentences, emphasis on time/scarcity words
Pronunciation Guidance
For product names, brand names, or technical terms:
- Write phonetically in the script: "Creatify" → "cree-ATE-ih-fy"
- Use periods for letter-by-letter: "A.I." reads as "A I" not "ai"
- Numbers: Write "fifteen percent" not "15%"
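These pronunciation rules can be applied as a preprocessing pass before text is sent to a TTS engine. The sketch below is one possible approach; `prepare_for_tts` is a hypothetical helper, and the respelling map and acronym set are supplied by the caller rather than detected automatically.

```python
import re


def prepare_for_tts(script: str, phonetic: dict[str, str], acronyms: set[str]) -> str:
    """Apply phonetic respellings and spell acronyms out letter by letter."""
    # Lookarounds restrict matches to whole tokens, so "AI" never matches inside "PAID".
    for written, spoken in phonetic.items():
        pattern = rf"(?<!\w){re.escape(written)}(?!\w)"
        script = re.sub(pattern, lambda _match, s=spoken: s, script)
    for acronym in acronyms:
        dotted = ".".join(acronym) + "."
        # The optional trailing period avoids "A.I.." at the end of a sentence.
        pattern = rf"(?<!\w){re.escape(acronym)}(?!\w)\.?"
        script = re.sub(pattern, lambda _match, d=dotted: d, script)
    return script
```

For example, passing `{"Creatify": "cree-ATE-ih-fy", "15%": "fifteen percent"}` and `{"AI"}` rewrites "Creatify uses AI to save 15% of your time." into a form that follows all three rules above.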
Pause Placement
- After the hook (let it sink in)
- Before the CTA (build anticipation)
- After stating a surprising stat
- Between major sections
Accent Selection by Market
- US Market: Standard American English
- UK Market: British RP or light regional
- Global/Neutral: Standard American (the most widely understood)
- Local campaigns: Match regional accent to target geography
1.5 Green Screen / Transparent Background Techniques
AI avatars with transparent backgrounds can be overlaid on:
| Use Case | Application |
|---|---|
| Website widgets | Avatar explaining features on your landing page |
| Product demos | Avatar narrating over screen recordings |
| Email thumbnails | Avatar thumbnail that links to full video |
| Presentations | Avatar presenter in corner of slides |
| Social ads | Avatar over product imagery or B-roll |
Best Practices for Transparent Overlays
- Use 9:16 format for mobile overlays, 1:1 for square placements
- Position avatar in lower-third or right side (don't block main content)
- Keep gestures contained — wide arm movements may clip at edges
- Match avatar lighting to background lighting direction
- Use WebM format for transparency (standard H.264 MP4 has no alpha channel)
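To place a transparent avatar in the lower-right corner of a background clip, a tool such as ffmpeg can composite the two. The sketch below only builds the command; it assumes ffmpeg is installed, that the avatar file is VP9 WebM with alpha, and that forcing the `libvpx-vp9` decoder is needed to keep the alpha plane (this is the commonly reported behaviour, but verify it on your build). `build_overlay_command` is a hypothetical helper name.

```python
def build_overlay_command(background: str, avatar_webm: str, output: str,
                          margin: int = 40) -> list[str]:
    """Build an ffmpeg command that pins the avatar to the lower-right corner."""
    # main_w/main_h and overlay_w/overlay_h are variables of ffmpeg's overlay filter.
    position = f"x=main_w-overlay_w-{margin}:y=main_h-overlay_h-{margin}"
    return [
        "ffmpeg", "-y",
        "-i", background,
        # Assumption: decode the WebM with libvpx-vp9 so transparency is preserved.
        "-c:v", "libvpx-vp9", "-i", avatar_webm,
        "-filter_complex", f"[0:v][1:v]overlay={position}",
        output,
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Keeping the margin as a parameter makes it easy to follow the "don't block main content" guideline for different layouts.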
1.6 UGC-Style Avatar Strategy
Making AI avatars feel like authentic user-generated content:
Visual Authenticity Cues
- Casual backgrounds: Living room, kitchen, outdoor — not studio
- Natural lighting: Slightly warm, not perfectly even
- Casual framing: Slightly off-center, phone-selfie angle
- Minimal branding: No logos in first 3 seconds
Script Tone for UGC
- First person: "I've been using this for 2 weeks and..."
- Imperfect language: "Honestly? I was skeptical at first"
- Specific details: "I ordered the blue one on Tuesday" (not "I purchased the product")
- Emotional reactions: "I was literally shook when I saw the results"
- Conversational asides: "Okay but wait, it gets even better"
UGC Avatar Selection
- Choose avatars that look 20-35, casually dressed
- Avoid "too polished" presenters — slightly imperfect = more authentic
- Match the avatar to your customer demographic
- Test multiple avatars — UGC performance varies wildly by presenter
Part 2: API Automation
Automate avatar video production at scale.
2.1 Setup & Authentication
import requests

CREATIFY_API_ID = "your-api-id"
CREATIFY_API_KEY = "your-api-key"

HEADERS = {
    "Content-Type": "application/json",
    "X-API-ID": CREATIFY_API_ID,
    "X-API-KEY": CREATIFY_API_KEY,
}

BASE_URL = "https://api.creatify.ai/api"
Don't have an API key yet? No problem — grab one in under 2 minutes:
- Sign up free at creatify.ai
- Go to Settings → API
- Copy your API ID and API Key — that's it. New accounts get free credits to start.
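Once you have the credentials, you may prefer to keep them out of source code. A minimal sketch, assuming the values are exported as environment variables named `CREATIFY_API_ID` and `CREATIFY_API_KEY` (a naming convention chosen here for illustration, not something the API requires); `load_creatify_headers` is a hypothetical helper.

```python
import os
from collections.abc import Mapping


def load_creatify_headers(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Build request headers from environment variables instead of literals."""
    try:
        api_id = env["CREATIFY_API_ID"]
        api_key = env["CREATIFY_API_KEY"]
    except KeyError as missing:
        # Name the missing variable so misconfiguration is easy to diagnose.
        raise RuntimeError(f"Set the {missing.args[0]} environment variable") from None
    return {
        "Content-Type": "application/json",
        "X-API-ID": api_id,
        "X-API-KEY": api_key,
    }
```

The result has the same shape as the `HEADERS` dictionary above, so it can be used as a drop-in replacement.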
import time

def poll_until_done(url, headers, max_wait=600, interval=10):
    """Poll a status endpoint until the job completes or max_wait seconds pass."""
    elapsed = 0
    while elapsed < max_wait:
        resp = requests.get(url, headers=headers)
        resp.raise_for_status()  # surface HTTP errors instead of parsing an error body
        data = resp.json()
        status = data.get("status")
        if status == "done":
            return data
        if status in ("failed", "error"):
            raise RuntimeError(f"Job failed: {data.get('failed_reason', 'Unknown')}")
        time.sleep(interval)
        elapsed += interval
    raise TimeoutError(f"Job did not complete within {max_wait}s")
2.2 AI Avatar v1 (Single Scene)
Generate a video of a single avatar speaking from text. Simple, fast, great for short content.
Cost: 5 credits per 30 seconds
List Available Personas
def list_personas():
    """Get all 1,500+ available avatar personas."""
    resp = requests.get(f"{BASE_URL}/personas/", headers=HEADERS)
    resp.raise_for_status()
    return resp.json()  # Each has: id, name, gender, thumbnail, etc.
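For the gender A/B tests suggested in Part 1, the persona list can be narrowed before picking candidates. This sketch assumes the response is a list of dictionaries with a `gender` field, as the comment above indicates; the exact values the API returns (for example `"m"` and `"f"`) are an assumption, and `filter_personas` is a hypothetical helper.

```python
def filter_personas(personas: list[dict], gender: str) -> list[dict]:
    """Keep personas whose "gender" field matches, ignoring case."""
    wanted = gender.lower()
    return [p for p in personas if str(p.get("gender", "")).lower() == wanted]
```

Personas without a `gender` field are skipped rather than raising an error, which keeps the helper safe if the response shape differs from the assumption.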
Create Avatar Video
def create_avatar_video(text, creator_id, aspect_ratio="9:16", model_version="aurora_v1_fast"):
    """Generate a single-scene avatar video from text."""
    resp = requests.post(f"{BASE_URL}/lipsyncs/", headers=HEADERS, json={
        "text": text,
        "creator": creator_id,
        "aspect_ratio": aspect_ratio,
        "model_version": model_version,
    })
    resp.raise_for_status()
    return resp.json()

def check_avatar_status(lipsync_id):
    """Check avatar video generation status."""
    resp = requests.get(f"{BASE_URL}/lipsyncs/{lipsync_id}/", headers=HEADERS)
    resp.raise_for_status()
    return resp.json()
Create Transparent Background Avatar
def create_transparent_avatar(text, creator_id, aspect_ratio="9:16"):
    """Generate avatar with transparent background (WebM format)."""
    resp = requests.post(f"{BASE_URL}/lipsyncs/", headers=HEADERS, json={
        "text": text,
        "creator": creator_id,
        "aspect_ratio": aspect_ratio,
        "transparent_background": True,
    })
    resp.raise_for_status()
    return resp.json()
2.3 AI Avatar v2 (Multi-Scene)
Create multi-scene videos with different avatars, voices, backgrounds, and CTAs per scene.
Cost: 5 credits per 30 seconds
def create_multi_scene_video(scenes, aspect_ratio="9:16", webhook_url=None):
    """Create a multi-scene avatar video.

    scenes: list of dicts, each with:
      - text (str): Script for this scene
      - creator (str): Avatar persona ID
      - voice_id (str, optional): Override voice
      - background (str, optional): Background image/video URL
    """
    payload = {
        "scenes": scenes,
        "aspect_ratio": aspect_ratio,
    }
    if webhook_url:
        payload["webhook_url"] = webhook_url
    resp = requests.post(f"{BASE_URL}/lipsyncs_v2/", headers=HEADERS, json=payload)
    resp.raise_for_status()
    return resp.json()

# Example: 3-scene product ad
scenes = [
    {
        "text": "Stop what you're doing. I need to tell you about something.",
        "creator": "18fccce8-86e7-5f31-abc8-18915cb872be",
    },
    {
        "text": "This serum literally transformed my skin in two weeks. No exaggeration.",
        "creator": "18fccce8-86e7-5f31-abc8-18915cb872be",
    },
    {
        "text": "Link is in my bio. Trust me, your future self will thank you.",
        "creator": "18fccce8-86e7-5f31-abc8-18915cb872be",
    },
]
video = create_multi_scene_video(scenes, aspect_ratio="9:16")
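Because every request spends credits, it can help to check the scenes list locally before posting it. The sketch below validates against the keys named in the docstring above; `validate_scenes` is a hypothetical helper, and it treats any key not listed there as unexpected, which may be stricter than what the API actually accepts.

```python
REQUIRED_KEYS = {"text", "creator"}
OPTIONAL_KEYS = {"voice_id", "background"}


def validate_scenes(scenes: list[dict]) -> list[str]:
    """Return a list of problems found in a scenes payload; empty means OK."""
    if not scenes:
        return ["scenes list is empty"]
    errors = []
    for index, scene in enumerate(scenes, start=1):
        keys = set(scene)
        missing = sorted(REQUIRED_KEYS - keys)
        unknown = sorted(keys - REQUIRED_KEYS - OPTIONAL_KEYS)
        if missing:
            errors.append(f"scene {index}: missing {', '.join(missing)}")
        if unknown:
            errors.append(f"scene {index}: unexpected {', '.join(unknown)}")
        if "text" in scene and not str(scene["text"]).strip():
            errors.append(f"scene {index}: text is empty")
    return errors
```

Calling it on the three-scene example above returns an empty list, so the request can proceed.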
2.4 Aurora (Image + Audio → Video)
Generate studio-grade avatar videos from a reference image and audio file. Best-in-class lip sync.
Cost: 5 credits per 30 seconds
def create_aurora_video(image_url, audio_url, model_version="aurora_v1_fast", webhook_url=None):
    """Generate a studio-grade avatar video from image + audio."""
    payload = {
        "image": image_url,
        "audio": audio_url,
        "model_version": model_version,
    }
    if webhook_url:
        payload["webhook_url"] = webhook_url
    resp = requests.post(f"{BASE_URL}/aurora/", headers=HEADERS, json=payload)
    resp.raise_for_status()
    return resp.json()

def check_aurora_status(aurora_id):
    """Check Aurora generation status."""
    resp = requests.get(f"{BASE_URL}/aurora/{aurora_id}/", headers=HEADERS)
    resp.raise_for_status()
    return resp.json()
2.5 Text to Speech
Convert scripts into studio-quality voiceover audio.
Cost: 1 credit per 30 seconds
def list_voices():
    """List all available TTS voices and accents."""
    resp = requests.get(f"{BASE_URL}/voices/", headers=HEADERS)
    resp.raise_for_status()
    return resp.json()

def generate_tts(script, accent_id, webhook_url=None):
    """Generate voiceover audio from a script."""
    payload = {
        "script": script,
        "accent": accent_id,
    }
    if webhook_url:
        payload["webhook_url"] = webhook_url
    resp = requests.post(f"{BASE_URL}/text_to_speech/", headers=HEADERS, json=payload)
    resp.raise_for_status()
    return resp.json()

def check_tts_status(tts_id):
    """Check TTS generation status."""
    resp = requests.get(f"{BASE_URL}/text_to_speech/{tts_id}/", headers=HEADERS)
    resp.raise_for_status()
    return resp.json()
2.6 Voice Cloning
Clone a custom voice for brand consistency.
def clone_voice(audio_url, name):
    """Clone a voice from an audio sample."""
    resp = requests.post(f"{BASE_URL}/voices/clone/", headers=HEADERS, json={
        "audio_url": audio_url,
        "name": name,
    })
    resp.raise_for_status()
    return resp.json()
2.7 Custom Avatars (BYOA)
Upload your own video to create a custom avatar persona.
Note: Custom avatar creation takes 1-2 days for processing/approval.
def create_custom_avatar(lipsync_video_url, name, gender="m", scene="office"):
    """Create a custom avatar from your own video."""
    resp = requests.post(f"{BASE_URL}/personas/", headers=HEADERS, json={
        "lipsync_input": lipsync_video_url,
        "creator_name": name,
        "gender": gender,
        "video_scene": scene,
    })
    resp.raise_for_status()
    return resp.json()

def check_custom_avatar_status(persona_id):
    """Check custom avatar creation status."""
    resp = requests.get(f"{BASE_URL}/personas/{persona_id}/", headers=HEADERS)
    resp.raise_for_status()
    return resp.json()
2.8 Recipes
Recipe: TTS → Aurora Pipeline
Generate audio first, then pair with any image for avatar video.
def tts_to_aurora(script, accent_id, image_url):
    """Pipeline: script → audio → avatar video."""
    # Step 1: Generate audio
    tts = generate_tts(script, accent_id)
    tts_result = poll_until_done(
        f"{BASE_URL}/text_to_speech/{tts['id']}/", HEADERS, max_wait=120
    )
    audio_url = tts_result["output"]

    # Step 2: Generate Aurora video
    aurora = create_aurora_video(image_url, audio_url)
    aurora_result = poll_until_done(
        f"{BASE_URL}/aurora/{aurora['id']}/", HEADERS, max_wait=600
    )
    return aurora_result
Recipe: Batch Avatar A/B Test
Test multiple avatars with the same script to find the best performer.
def batch_avatar_ab_test(script, creator_ids, aspect_ratio="9:16"):
    """Generate the same script with multiple avatars for A/B testing."""
    # Submit every job first so the videos render in parallel on the server.
    jobs = []
    for creator_id in creator_ids:
        video = create_avatar_video(script, creator_id, aspect_ratio)
        jobs.append({"creator_id": creator_id, "video_id": video["id"]})

    # Then collect results, recording failures without aborting the batch.
    results = []
    for job in jobs:
        try:
            result = poll_until_done(
                f"{BASE_URL}/lipsyncs/{job['video_id']}/", HEADERS, max_wait=600
            )
            results.append({
                "creator_id": job["creator_id"],
                "video_url": result.get("output") or result.get("video_output"),
                "status": "done",
            })
        except Exception as e:
            results.append({
                "creator_id": job["creator_id"],
                "error": str(e),
                "status": "failed",
            })
    return results
Recipe: Multi-Script Avatar Batch
Generate multiple scripts with the same avatar for hook testing.
def multi_script_batch(scripts, creator_id, aspect_ratio="9:16"):
    """Generate multiple scripts with the same avatar."""
    jobs = []
    for script in scripts:
        video = create_avatar_video(script, creator_id, aspect_ratio)
        # Keep only the first 50 characters as a label for the results.
        jobs.append({"script": script[:50], "video_id": video["id"]})

    results = []
    for job in jobs:
        try:
            result = poll_until_done(
                f"{BASE_URL}/lipsyncs/{job['video_id']}/", HEADERS, max_wait=600
            )
            results.append({
                "script_preview": job["script"],
                "video_url": result.get("output") or result.get("video_output"),
                "status": "done",
            })
        except Exception as e:
            results.append({
                "script_preview": job["script"],
                "error": str(e),
                "status": "failed",
            })
    return results
2.9 Credit Costs & Latency Reference
| Endpoint | Credits | Typical Latency |
|---|---|---|
| AI Avatar v1 | 5 per 30s | ~10x the video length (15s video ≈ 150s of processing) |
| AI Avatar v2 (multi-scene) | 5 per 30s | ~2-5 minutes |
| Aurora | 5 per 30s | ~2-3 minutes |
| Text to Speech | 1 per 30s | ~30-60 seconds |
| Voice Cloning | Varies | Minutes |
| Custom Avatar Creation | Free (slot required) | 1-2 days |
| Preview (v1 or v2) | 1 per 30s | ~1-2 minutes |
| Render (v2) | 4 per 30s | ~2-3 minutes |
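The per-30-second rates above make batch budgeting a matter of simple arithmetic. The sketch below assumes that each started 30-second block is billed in full; the table does not state how partial blocks are charged, so treat the result as an upper-bound estimate. `CREDITS_PER_30S` and `estimate_credits` are hypothetical names.

```python
import math

# Rates copied from the table above (credits per 30 seconds of output).
CREDITS_PER_30S = {"avatar_v1": 5, "avatar_v2": 5, "aurora": 5, "tts": 1}


def estimate_credits(product: str, seconds: float, videos: int = 1) -> int:
    """Estimate credits, assuming each started 30-second block is billed in full."""
    blocks = math.ceil(seconds / 30)
    return CREDITS_PER_30S[product] * blocks * videos
```

For instance, five 30-second AI Avatar v1 videos come to 25 credits, and a 45-second voiceover is estimated at 2 credits under the rounding assumption.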
2.10 Decision Matrix
| I want to... | Use this | Credits |
|---|---|---|
| Quick single-avatar video | AI Avatar v1 | 5/30s |
| Multi-scene video with transitions | AI Avatar v2 | 5/30s |
| Best possible lip sync quality | Aurora | 5/30s |
| Just generate audio narration | Text to Speech | 1/30s |
| Use my own face/person | Custom Avatar | Free (slot) |
| Use my own voice | Voice Cloning | Varies |
| Avatar over custom background | Transparent + overlay | 5/30s |
| A/B test 5 avatar styles | Batch Avatar v1 x5 | 25/30s |
See Also
- video-ad-generator — Product URL → video ad pipeline
- ai-ad-prompt-guide — Battle-tested prompting for AI ad creative
- ad-creative-evaluator — Score any video ad with AI expert panel
- video-ad-reverse-engineer — Reverse-engineer competitor ads
- static-ad-concept-generator — 320+ proven ad concept templates