ai-avatar-video

name: ai-avatar-video
description: |
  Complete AI avatar and talking-head video toolkit. Includes script writing for talking-head content, persona selection frameworks, multi-scene composition, voice/audio best practices, green screen techniques, and UGC-style authenticity guides. Integrates with Creatify API for AI Avatar v1/v2, Aurora, Text-to-Speech, Custom Avatars, and Voice Cloning.

  Use when: "avatar video", "talking head", "AI spokesperson", "lip sync", "UGC video", "Aurora", "text to speech", "voice clone", "AI presenter", "lipsync", "custom avatar", "green screen avatar", "AI narrator", "spokesperson video", "script for avatar", "multi-scene video", "TTS", "voiceover", "AI talking head", or any AI avatar/talking-head task.

AI Avatar Video

Complete framework for creating realistic AI talking-head videos — from script writing to multi-scene production.

Part 1: Avatar Video Strategy (Standalone)

1.1 Script Writing for Talking-Head Content

Avatar scripts must feel like natural speech, not written copy. Follow these rules:

Pacing Rules

Tone	Words per Second	Words per 30s	Style
Conversational	2.5-3.0	75-90	Natural pauses, filler words ok
Professional	2.0-2.5	60-75	Clean, measured delivery
Energetic/Sales	3.0-3.5	90-105	Fast, punchy, short sentences
Educational	1.8-2.2	54-66	Slower, with pauses for comprehension
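The pacing table can be turned into a quick word-budget check. The sketch below is only an illustration: the function names `word_budget` and `estimate_seconds` are hypothetical, and words are counted by splitting on whitespace, which is a rough approximation of spoken length.

```python
# Words-per-second ranges copied from the pacing table above.
PACING_WPS = {
    "conversational": (2.5, 3.0),
    "professional": (2.0, 2.5),
    "energetic": (3.0, 3.5),
    "educational": (1.8, 2.2),
}


def word_budget(tone, seconds):
    """Return the (min, max) word count that fits `seconds` at the given tone."""
    low, high = PACING_WPS[tone]
    return round(low * seconds), round(high * seconds)


def estimate_seconds(script, tone):
    """Return the (fastest, slowest) read time in seconds for a script."""
    words = len(script.split())  # whitespace split: a rough word count
    low, high = PACING_WPS[tone]
    return words / high, words / low
```

For example, `word_budget("conversational", 30)` gives `(75, 90)`, which matches the "Words per 30s" column for that row.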

Authenticity Markers

Scripts that sound like real people include:

Contractions: "I'm" not "I am", "don't" not "do not"
Sentence fragments: "Pretty cool, right?" not "This is quite impressive, is it not?"
Casual transitions: "So here's the thing..." / "Now, this is where it gets interesting..."
Direct address: "You know what? You're gonna love this."
Self-correction: "It's fast — actually, it's really fast."

Hook-to-CTA Structure for Avatar Scripts

15-second script template:

HOOK (0-3s): [Pattern interrupt or question — 8-12 words]
BRIDGE (3-7s): [Connect hook to product — 15-20 words]
BENEFIT (7-12s): [Core value proposition — 15-20 words]
CTA (12-15s): [Clear next step — 8-12 words]

30-second script template:

HOOK (0-3s): [Attention grab — 8-12 words]
PROBLEM (3-8s): [Relatable pain point — 15-25 words]
SOLUTION (8-15s): [Product introduction + key feature — 20-30 words]
PROOF (15-22s): [Social proof or demonstration — 15-25 words]
CTA (22-30s): [Urgency + next step — 15-25 words]

60-second script template:

HOOK (0-5s): [Strong opening — 12-18 words]
STORY/PROBLEM (5-15s): [Relatable scenario — 25-40 words]
DISCOVERY (15-25s): [How you found the product — 25-35 words]
FEATURES (25-40s): [2-3 key benefits with specifics — 35-50 words]
PROOF (40-50s): [Results, testimonials, data — 25-35 words]
CTA (50-60s): [Compelling close — 20-30 words]
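A draft can be checked against a template before it is sent for rendering. The sketch below uses the word ranges from the 30-second template above; `check_segments` and the dictionary layout are hypothetical names chosen for illustration, not part of any tool referenced here.

```python
# Word ranges copied from the 30-second template above.
TEMPLATE_30S = [
    ("HOOK", 8, 12),
    ("PROBLEM", 15, 25),
    ("SOLUTION", 20, 30),
    ("PROOF", 15, 25),
    ("CTA", 15, 25),
]


def check_segments(segments, template=TEMPLATE_30S):
    """Return (name, word_count, verdict) for each segment in the template.

    segments: dict mapping a segment name such as "HOOK" to its script text.
    Missing segments are counted as zero words.
    """
    report = []
    for name, low, high in template:
        count = len(segments.get(name, "").split())
        if count < low:
            verdict = "too short"
        elif count > high:
            verdict = "too long"
        else:
            verdict = "ok"
        report.append((name, count, verdict))
    return report
```

The same function works for the 15- and 60-second templates if their ranges are passed in as the `template` argument.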

Script Do's and Don'ts

Do	Don't
Use short sentences (8-15 words)	Write long compound sentences
Include natural pauses with "..."	Rush from point to point
Write phonetically for hard words	Use jargon or acronyms without context
End on a clear action	Trail off or end abruptly
Match script tone to avatar age/style	Use Gen Z slang with a professional avatar

1.2 Avatar/Persona Selection Framework

Choosing the right avatar is as important as the script. Match demographics to your target audience.

Trust Signals by Vertical

Vertical	Ideal Avatar Profile	Why
Healthcare/Supplements	30-50, professional appearance	Credibility and trust
Beauty/Skincare	20-35, relatable, well-groomed	Peer recommendation effect
Tech/SaaS	25-40, casual-professional	Approachable expertise
Finance/Insurance	35-55, suited, authoritative	Trust and stability
Fitness	25-35, athletic, energetic	Aspirational but attainable
Food/Beverage	25-45, warm, approachable	Relatable lifestyle
Education	30-50, friendly, professional	Authority without intimidation
DTC/E-commerce	20-30, casual, authentic	UGC/peer recommendation

Diversity Considerations

Test multiple demographics — different audiences respond to different presenters
Match market: Use local-looking avatars for geo-targeted campaigns
A/B test gender: Some products convert better with male vs female presenters (test, don't assume)
Age alignment: Your avatar should look like your customer OR who your customer trusts

1.3 Multi-Scene Composition Guide

Multi-scene videos tend to feel more dynamic and hold attention better than single-shot talking heads.

When to Switch Scenes

Every 5-8 seconds for fast-paced content (TikTok/Reels)
Every 8-15 seconds for medium-paced (YouTube, Feed ads)
At every major transition point (problem → solution, feature → feature)

Scene Transition Best Practices

Cut on speech breaks — switch scenes at natural pauses
Alternate speaker/background — change avatar, background, or both
Use B-roll inserts — product shots between talking segments
Progress the story — each scene should advance the narrative
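One way to apply "cut on speech breaks" automatically is to group whole sentences into scenes that stay under a time budget. This is a minimal sketch under stated assumptions: sentence endings stand in for natural pauses, the default pace of 2.75 words per second is simply the midpoint of the conversational range, and `split_into_scenes` is a hypothetical helper name.

```python
import re


def split_into_scenes(script, max_seconds=8.0, words_per_second=2.75):
    """Group sentences into scenes whose estimated read time stays under max_seconds."""
    # Split after ., ! or ? followed by whitespace: a rough stand-in for speech breaks.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", script.strip()) if s]
    max_words = max_seconds * words_per_second
    scenes, current, current_words = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        # Start a new scene when adding this sentence would exceed the budget.
        if current and current_words + n > max_words:
            scenes.append(" ".join(current))
            current, current_words = [], 0
        current.append(sentence)
        current_words += n
    if current:
        scenes.append(" ".join(current))
    return scenes
```

A sentence that is longer than the budget on its own is kept whole as a single scene rather than being cut mid-sentence.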

Recommended Scene Structures

2-Scene (15s):

Scene 1: Hook + Problem (avatar talking, neutral background)
Scene 2: Solution + CTA (avatar talking, product-relevant background)

3-Scene (30s):

Scene 1: Hook + Problem (avatar A, office background)
Scene 2: Solution + Features (avatar A, product demo background)
Scene 3: Social Proof + CTA (avatar A or B, branded background)

5-Scene (60s):

Scene 1: Hook (avatar, eye-catching background)
Scene 2: Problem deep-dive (avatar, relatable setting)
Scene 3: Product introduction (product B-roll or demo)
Scene 4: Features + Proof (avatar with data/reviews overlay)
Scene 5: CTA (avatar, clean branded background)

1.4 Audio & Voice Best Practices

Emotion Modulation

Excitement: Slightly faster pace, higher energy, emphasis on benefit words
Empathy: Slower pace, softer tone, pause after pain points
Authority: Measured pace, confident tone, declarative sentences
Urgency: Fast pace, short sentences, emphasis on time/scarcity words

Pronunciation Guidance

For product names, brand names, or technical terms:

Write phonetically in the script: "Creatify" → "cree-ATE-ih-fy"
Use periods for letter-by-letter: "A.I." reads as "A I" not "ai"
Numbers: Write "fifteen percent" not "15%"
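Phonetic rewrites are easier to keep consistent when they live in one lookup table that is applied to every script. The sketch below assumes a small hand-maintained lexicon; `LEXICON` and `apply_pronunciations` are hypothetical names, and only the "Creatify" and "A.I." entries come from the guidance above.

```python
import re

# Hypothetical lexicon; both entries reuse the examples above.
LEXICON = {
    "Creatify": "cree-ATE-ih-fy",
    "AI": "A.I.",
}


def apply_pronunciations(script, lexicon=LEXICON):
    """Replace whole-word matches with their phonetic spelling."""
    for term, spoken in lexicon.items():
        # \b keeps "AI" from matching inside words such as "MAIL".
        script = re.sub(rf"\b{re.escape(term)}\b", spoken, script)
    return script
```

Keep the original script alongside the rewritten one, since the phonetic version is meant for the voice engine and not for captions.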

Pause Placement

After the hook (let it sink in)
Before the CTA (build anticipation)
After stating a surprising stat
Between major sections

Accent Selection by Market

US Market: Standard American English
UK Market: British RP or light regional
Global/Neutral: Standard American (widely understood across markets)
Local campaigns: Match regional accent to target geography

1.5 Green Screen / Transparent Background Techniques

AI avatars with transparent backgrounds can be overlaid on:

Use Case	Application
Website widgets	Avatar explaining features on your landing page
Product demos	Avatar narrating over screen recordings
Email thumbnails	Avatar thumbnail that links to full video
Presentations	Avatar presenter in corner of slides
Social ads	Avatar over product imagery or B-roll

Best Practices for Transparent Overlays

Use 9:16 format for mobile overlays, 1:1 for square placements
Position avatar in lower-third or right side (don't block main content)
Keep gestures contained — wide arm movements may clip at edges
Match avatar lighting to background lighting direction
Use WebM format for transparency (MP4 with the usual H.264 codec doesn't carry an alpha channel)
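As an illustration of the lower-right placement, the sketch below builds an ffmpeg command that overlays a transparent avatar on a background video. ffmpeg is not part of the toolkit described here, so treat this as an assumption: it presumes the WebM is VP9 with alpha, the file names are placeholders, and `build_overlay_command` is a hypothetical helper. Audio handling is left to ffmpeg's defaults.

```python
def build_overlay_command(background, avatar_webm, output, margin=40):
    """Build an ffmpeg argument list that overlays the avatar bottom-right."""
    # In ffmpeg's overlay filter, W/H are the background size and w/h the overlay size.
    position = f"W-w-{margin}:H-h-{margin}"
    return [
        "ffmpeg",
        "-i", background,
        # Assumption: the WebM is VP9 with alpha. ffmpeg's built-in VP9 decoder
        # ignores the alpha plane, so the libvpx-vp9 decoder is requested for this input.
        "-c:v", "libvpx-vp9", "-i", avatar_webm,
        "-filter_complex", f"[0:v][1:v]overlay={position}",
        output,
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where ffmpeg is installed.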

1.6 UGC-Style Avatar Strategy

Making AI avatars feel like authentic user-generated content:

Visual Authenticity Cues

Casual backgrounds: Living room, kitchen, outdoor — not studio
Natural lighting: Slightly warm, not perfectly even
Casual framing: Slightly off-center, phone-selfie angle
Minimal branding: No logos in first 3 seconds

Script Tone for UGC

First person: "I've been using this for 2 weeks and..."
Imperfect language: "Honestly? I was skeptical at first"
Specific details: "I ordered the blue one on Tuesday" (not "I purchased the product")
Emotional reactions: "I was literally shook when I saw the results"
Conversational asides: "Okay but wait, it gets even better"

UGC Avatar Selection

Choose avatars that look 20-35, casually dressed
Avoid "too polished" presenters — slightly imperfect = more authentic
Match the avatar to your customer demographic
Test multiple avatars — UGC performance varies wildly by presenter

Part 2: API Automation

Automate avatar video production at scale.

2.1 Setup & Authentication

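import os

import requests

# Read credentials from the environment so they stay out of source control.
# The variable names are a suggestion; the placeholders are used if they are unset.
CREATIFY_API_ID = os.environ.get("CREATIFY_API_ID", "your-api-id")
CREATIFY_API_KEY = os.environ.get("CREATIFY_API_KEY", "your-api-key")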

HEADERS = {
    "Content-Type": "application/json",
    "X-API-ID": CREATIFY_API_ID,
    "X-API-KEY": CREATIFY_API_KEY,
}
BASE_URL = "https://api.creatify.ai/api"

Don't have an API key yet? You can get one in under 2 minutes:

1. Sign up free at creatify.ai
2. Go to Settings → API
3. Copy your API ID and API Key. New accounts get free credits to start.

import time

def poll_until_done(url, headers, max_wait=600, interval=10):
    """Poll a status endpoint until the job completes."""
    elapsed = 0
    while elapsed < max_wait:
        resp = requests.get(url, headers=headers, timeout=30)
        resp.raise_for_status()  # surface HTTP errors instead of parsing an error body
        data = resp.json()
        status = data.get("status")
        if status == "done":
            return data
        if status in ("failed", "error"):
            raise RuntimeError(f"Job failed: {data.get('failed_reason', 'Unknown')}")
        time.sleep(interval)
        elapsed += interval
    raise TimeoutError(f"Job did not complete within {max_wait}s")
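A fixed 10-second interval is simple, but short jobs such as TTS finish sooner and long jobs generate many requests. The variant below is a sketch, not part of the Creatify API: it doubles the delay between attempts and takes the status call as a function, so it can be exercised without network access. `poll_with_backoff` and its parameters are hypothetical names.

```python
import time


def poll_with_backoff(fetch, max_wait=600, first_delay=2, max_delay=30, sleep=time.sleep):
    """Call fetch() until it reports "done", doubling the delay between attempts.

    fetch: zero-argument callable returning the job's status dict.
    sleep: injectable for testing; defaults to time.sleep.
    """
    waited, delay = 0, first_delay
    while True:
        data = fetch()
        status = data.get("status")
        if status == "done":
            return data
        if status in ("failed", "error"):
            raise RuntimeError(f"Job failed: {data.get('failed_reason', 'Unknown')}")
        if waited >= max_wait:
            raise TimeoutError(f"Job did not complete within {max_wait}s")
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay)
```

With the status helpers defined in the sections below, a call could look like `poll_with_backoff(lambda: check_avatar_status(video["id"]))`.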

2.2 AI Avatar v1 (Single Scene)

Generate a video of a single avatar speaking from text. Simple, fast, great for short content.

Cost: 5 credits per 30 seconds

List Available Personas

def list_personas():
    """Get all 1,500+ available avatar personas."""
    resp = requests.get(f"{BASE_URL}/personas/", headers=HEADERS)
    resp.raise_for_status()
    return resp.json()  # Each has: id, name, gender, thumbnail, etc.

Create Avatar Video

def create_avatar_video(text, creator_id, aspect_ratio="9:16", model_version="aurora_v1_fast"):
    """Generate a single-scene avatar video from text."""
    resp = requests.post(f"{BASE_URL}/lipsyncs/", headers=HEADERS, json={
        "text": text,
        "creator": creator_id,
        "aspect_ratio": aspect_ratio,
        "model_version": model_version,
    })
    resp.raise_for_status()
    return resp.json()

def check_avatar_status(lipsync_id):
    """Check avatar video generation status."""
    resp = requests.get(f"{BASE_URL}/lipsyncs/{lipsync_id}/", headers=HEADERS)
    resp.raise_for_status()
    return resp.json()

Create Transparent Background Avatar

def create_transparent_avatar(text, creator_id, aspect_ratio="9:16"):
    """Generate avatar with transparent background (WebM format)."""
    resp = requests.post(f"{BASE_URL}/lipsyncs/", headers=HEADERS, json={
        "text": text,
        "creator": creator_id,
        "aspect_ratio": aspect_ratio,
        "transparent_background": True,
    })
    resp.raise_for_status()
    return resp.json()

2.3 AI Avatar v2 (Multi-Scene)

Create multi-scene videos with different avatars, voices, backgrounds, and CTAs per scene.

Cost: 5 credits per 30 seconds

def create_multi_scene_video(scenes, aspect_ratio="9:16", webhook_url=None):
    """Create a multi-scene avatar video.

    scenes: list of dicts, each with:
        - text (str): Script for this scene
        - creator (str): Avatar persona ID
        - voice_id (str, optional): Override voice
        - background (str, optional): Background image/video URL
    """
    payload = {
        "scenes": scenes,
        "aspect_ratio": aspect_ratio,
    }
    if webhook_url:
        payload["webhook_url"] = webhook_url

    resp = requests.post(f"{BASE_URL}/lipsyncs_v2/", headers=HEADERS, json=payload)
    resp.raise_for_status()
    return resp.json()

# Example: 3-scene product ad
scenes = [
    {
        "text": "Stop what you're doing. I need to tell you about something.",
        "creator": "18fccce8-86e7-5f31-abc8-18915cb872be",
    },
    {
        "text": "This serum literally transformed my skin in two weeks. No exaggeration.",
        "creator": "18fccce8-86e7-5f31-abc8-18915cb872be",
    },
    {
        "text": "Link is in my bio. Trust me, your future self will thank you.",
        "creator": "18fccce8-86e7-5f31-abc8-18915cb872be",
    },
]

video = create_multi_scene_video(scenes, aspect_ratio="9:16")
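When every scene uses the same avatar, the scene list can be generated from plain script strings instead of being written by hand. This is a small sketch: `build_scenes` is a hypothetical helper, and it only uses the `text`, `creator`, and `background` keys listed in the docstring above.

```python
def build_scenes(texts, creator_id, backgrounds=None):
    """Turn a list of scene scripts into scene dicts for a multi-scene request.

    backgrounds: optional list of URLs aligned with texts; use None to skip a scene.
    """
    scenes = []
    for index, text in enumerate(texts):
        scene = {"text": text, "creator": creator_id}
        # Only attach a background when one was supplied for this scene.
        if backgrounds and index < len(backgrounds) and backgrounds[index]:
            scene["background"] = backgrounds[index]
        scenes.append(scene)
    return scenes
```

The result can be passed straight to `create_multi_scene_video(scenes)`.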

2.4 Aurora (Image + Audio → Video)

Generate studio-grade avatar videos from a reference image and audio file. Best-in-class lip sync.

Cost: 5 credits per 30 seconds

def create_aurora_video(image_url, audio_url, model_version="aurora_v1_fast", webhook_url=None):
    """Generate a studio-grade avatar video from image + audio."""
    payload = {
        "image": image_url,
        "audio": audio_url,
        "model_version": model_version,
    }
    if webhook_url:
        payload["webhook_url"] = webhook_url

    resp = requests.post(f"{BASE_URL}/aurora/", headers=HEADERS, json=payload)
    resp.raise_for_status()
    return resp.json()

def check_aurora_status(aurora_id):
    """Check Aurora generation status."""
    resp = requests.get(f"{BASE_URL}/aurora/{aurora_id}/", headers=HEADERS)
    resp.raise_for_status()
    return resp.json()

2.5 Text to Speech

Convert scripts into studio-quality voiceover audio.

Cost: 1 credit per 30 seconds

def list_voices():
    """List all available TTS voices and accents."""
    resp = requests.get(f"{BASE_URL}/voices/", headers=HEADERS)
    resp.raise_for_status()
    return resp.json()

def generate_tts(script, accent_id, webhook_url=None):
    """Generate voiceover audio from a script."""
    payload = {
        "script": script,
        "accent": accent_id,
    }
    if webhook_url:
        payload["webhook_url"] = webhook_url

    resp = requests.post(f"{BASE_URL}/text_to_speech/", headers=HEADERS, json=payload)
    resp.raise_for_status()
    return resp.json()

def check_tts_status(tts_id):
    """Check TTS generation status."""
    resp = requests.get(f"{BASE_URL}/text_to_speech/{tts_id}/", headers=HEADERS)
    resp.raise_for_status()
    return resp.json()

2.6 Voice Cloning

Clone a custom voice for brand consistency.

def clone_voice(audio_url, name):
    """Clone a voice from an audio sample."""
    resp = requests.post(f"{BASE_URL}/voices/clone/", headers=HEADERS, json={
        "audio_url": audio_url,
        "name": name,
    })
    resp.raise_for_status()
    return resp.json()

2.7 Custom Avatars (BYOA)

Upload your own video to create a custom avatar persona.

Note: Custom avatar creation takes 1-2 days for processing/approval.

def create_custom_avatar(lipsync_video_url, name, gender="m", scene="office"):
    """Create a custom avatar from your own video."""
    resp = requests.post(f"{BASE_URL}/personas/", headers=HEADERS, json={
        "lipsync_input": lipsync_video_url,
        "creator_name": name,
        "gender": gender,
        "video_scene": scene,
    })
    resp.raise_for_status()
    return resp.json()

def check_custom_avatar_status(persona_id):
    """Check custom avatar creation status."""
    resp = requests.get(f"{BASE_URL}/personas/{persona_id}/", headers=HEADERS)
    resp.raise_for_status()
    return resp.json()

2.8 Recipes

Recipe: TTS → Aurora Pipeline

Generate audio first, then pair with any image for avatar video.

def tts_to_aurora(script, accent_id, image_url):
    """Pipeline: script → audio → avatar video."""
    # Step 1: Generate audio
    tts = generate_tts(script, accent_id)
    tts_result = poll_until_done(
        f"{BASE_URL}/text_to_speech/{tts['id']}/", HEADERS, max_wait=120
    )
    audio_url = tts_result["output"]

    # Step 2: Generate Aurora video
    aurora = create_aurora_video(image_url, audio_url)
    aurora_result = poll_until_done(
        f"{BASE_URL}/aurora/{aurora['id']}/", HEADERS, max_wait=600
    )

    return aurora_result

Recipe: Batch Avatar A/B Test

Test multiple avatars with the same script to find the best performer.

def batch_avatar_ab_test(script, creator_ids, aspect_ratio="9:16"):
    """Generate the same script with multiple avatars for A/B testing."""
    jobs = []
    for creator_id in creator_ids:
        video = create_avatar_video(script, creator_id, aspect_ratio)
        jobs.append({"creator_id": creator_id, "video_id": video["id"]})

    results = []
    for job in jobs:
        try:
            result = poll_until_done(
                f"{BASE_URL}/lipsyncs/{job['video_id']}/", HEADERS, max_wait=600
            )
            results.append({
                "creator_id": job["creator_id"],
                "video_url": result.get("output") or result.get("video_output"),
                "status": "done"
            })
        except Exception as e:
            results.append({
                "creator_id": job["creator_id"],
                "error": str(e),
                "status": "failed"
            })

    return results

Recipe: Multi-Script Avatar Batch

Generate multiple scripts with the same avatar for hook testing.

def multi_script_batch(scripts, creator_id, aspect_ratio="9:16"):
    """Generate multiple scripts with the same avatar."""
    jobs = []
    for script in scripts:
        video = create_avatar_video(script, creator_id, aspect_ratio)
        jobs.append({"script": script[:50], "video_id": video["id"]})

    results = []
    for job in jobs:
        try:
            result = poll_until_done(
                f"{BASE_URL}/lipsyncs/{job['video_id']}/", HEADERS, max_wait=600
            )
            results.append({
                "script_preview": job["script"],
                "video_url": result.get("output") or result.get("video_output"),
                "status": "done"
            })
        except Exception as e:
            results.append({
                "script_preview": job["script"],
                "error": str(e),
                "status": "failed"
            })

    return results

2.9 Credit Costs & Latency Reference

Endpoint	Credits	Typical Latency
AI Avatar v1	5 per 30s	~1:10 ratio (15s video ≈ 150s)
AI Avatar v2 (multi-scene)	5 per 30s	~2-5 minutes
Aurora	5 per 30s	~2-3 minutes
Text to Speech	1 per 30s	~30-60 seconds
Voice Cloning	Varies	Minutes
Custom Avatar Creation	Free (slot required)	1-2 days
Preview (v1 or v2)	1 per 30s	~1-2 minutes
Render (v2)	4 per 30s	~2-3 minutes
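The per-30-second rates above make it easy to budget a batch before launching it. The sketch below is an estimate only: it assumes every started 30-second block is billed in full, which the table does not state, and the product keys and the `estimate_credits` name are hypothetical.

```python
import math

# Credits per 30 seconds, copied from the table above.
CREDITS_PER_30S = {
    "avatar_v1": 5,
    "avatar_v2": 5,
    "aurora": 5,
    "tts": 1,
}


def estimate_credits(product, seconds, variants=1):
    """Estimate credits, assuming each started 30-second block is billed in full."""
    blocks = math.ceil(seconds / 30)
    return CREDITS_PER_30S[product] * blocks * variants
```

For instance, `estimate_credits("avatar_v1", 30, variants=5)` gives 25, the cost of rendering one 30-second script with five different avatars.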

2.10 Decision Matrix

I want to...	Use this	Credits
Quick single-avatar video	AI Avatar v1	5/30s
Multi-scene video with transitions	AI Avatar v2	5/30s
Best possible lip sync quality	Aurora	5/30s
Just generate audio narration	Text to Speech	1/30s
Use my own face/person	Custom Avatar	Free (slot)
Use my own voice	Voice Cloning	Varies
Avatar over custom background	Transparent + overlay	5/30s
A/B test 5 avatar styles	Batch Avatar v1 x5	25/30s
