ai-avatar-video

Complete AI avatar and talking-head video toolkit. Includes script writing for talking-head content, persona selection frameworks, multi-scene composition, voice/audio best practices, green screen techniques, and UGC-style authenticity guides. Integrates with Creatify API for AI Avatar v1/v2, Aurora, Text-to-Speech, Custom Avatars, and Voice Cloning. Use when: "avatar video", "talking head", "AI spokesperson", "lip sync", "UGC video", "Aurora", "text to speech", "voice clone", "AI presenter", "lipsync", "custom avatar", "green screen avatar", "AI narrator", "spokesperson video", "script for avatar", "multi-scene video", "TTS", "voiceover", "AI talking head", or any AI avatar/talking-head task.

By creatify-ai · Updated 3/3/2026

AI Avatar Video

Complete framework for creating realistic AI talking-head videos — from script writing to multi-scene production.


Part 1: Avatar Video Strategy (Standalone)

1.1 Script Writing for Talking-Head Content

Avatar scripts must feel like natural speech, not written copy. Follow these rules:

Pacing Rules

| Tone | Words per Second | Words per 30s | Style |
|---|---|---|---|
| Conversational | 2.5-3.0 | 75-90 | Natural pauses, filler words OK |
| Professional | 2.0-2.5 | 60-75 | Clean, measured delivery |
| Energetic/Sales | 3.0-3.5 | 90-105 | Fast, punchy, short sentences |
| Educational | 1.8-2.2 | 54-66 | Slower, with pauses for comprehension |
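
The pacing table can be turned into a quick word-budget check. The sketch below is illustrative only: the names `PACING_WPS`, `word_budget`, and `fits_pacing` are hypothetical helpers, and word counts use a simple whitespace split.

```python
# Words-per-second ranges copied from the pacing table above.
PACING_WPS = {
    "conversational": (2.5, 3.0),
    "professional": (2.0, 2.5),
    "energetic": (3.0, 3.5),
    "educational": (1.8, 2.2),
}


def word_budget(tone, seconds):
    """Return the (min, max) word count that fits `seconds` at the given tone."""
    low, high = PACING_WPS[tone]
    return round(low * seconds), round(high * seconds)


def fits_pacing(script, tone, seconds):
    """True if the script's word count does not exceed the tone's upper bound."""
    _, max_words = word_budget(tone, seconds)
    return len(script.split()) <= max_words
```

For example, `word_budget("conversational", 30)` reproduces the 75-90 range in the table.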

Authenticity Markers

Scripts that sound like real people include:

  • Contractions: "I'm" not "I am", "don't" not "do not"
  • Sentence fragments: "Pretty cool, right?" not "This is quite impressive, is it not?"
  • Casual transitions: "So here's the thing..." / "Now, this is where it gets interesting..."
  • Direct address: "You know what? You're gonna love this."
  • Self-correction: "It's fast — actually, it's really fast."

Hook-to-CTA Structure for Avatar Scripts

15-second script template:

HOOK (0-3s): [Pattern interrupt or question — 8-12 words]
BRIDGE (3-7s): [Connect hook to product — 15-20 words]
BENEFIT (7-12s): [Core value proposition — 15-20 words]
CTA (12-15s): [Clear next step — 8-12 words]

30-second script template:

HOOK (0-3s): [Attention grab — 8-12 words]
PROBLEM (3-8s): [Relatable pain point — 15-25 words]
SOLUTION (8-15s): [Product introduction + key feature — 20-30 words]
PROOF (15-22s): [Social proof or demonstration — 15-25 words]
CTA (22-30s): [Urgency + next step — 15-25 words]

60-second script template:

HOOK (0-5s): [Strong opening — 12-18 words]
STORY/PROBLEM (5-15s): [Relatable scenario — 25-40 words]
DISCOVERY (15-25s): [How you found the product — 25-35 words]
FEATURES (25-40s): [2-3 key benefits with specifics — 35-50 words]
PROOF (40-50s): [Results, testimonials, data — 25-35 words]
CTA (50-60s): [Compelling close — 20-30 words]
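
One way to keep a draft aligned with a template is to store the template as data and compare word counts per section. This is a minimal sketch using the 30-second template; `TEMPLATE_30S` and `check_sections` are hypothetical names, not part of any Creatify tooling.

```python
# 30-second template from above: (section, start_s, end_s, min_words, max_words)
TEMPLATE_30S = [
    ("HOOK", 0, 3, 8, 12),
    ("PROBLEM", 3, 8, 15, 25),
    ("SOLUTION", 8, 15, 20, 30),
    ("PROOF", 15, 22, 15, 25),
    ("CTA", 22, 30, 15, 25),
]


def check_sections(sections, template=TEMPLATE_30S):
    """Compare each section's word count with the template's range.

    sections: dict mapping section name -> script text.
    Returns a list of (name, word_count, status) tuples,
    where status is "short", "ok", or "long".
    """
    report = []
    for name, _start, _end, min_words, max_words in template:
        count = len(sections.get(name, "").split())
        if count < min_words:
            status = "short"
        elif count > max_words:
            status = "long"
        else:
            status = "ok"
        report.append((name, count, status))
    return report
```

Note that the section maximums in the 30-second template sum to 117 words, which is above the 105-word ceiling for Energetic/Sales in the pacing table, so a draft at every section's upper bound is likely to run long.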

Script Do's and Don'ts

| Do | Don't |
|---|---|
| Use short sentences (8-15 words) | Write long compound sentences |
| Include natural pauses with "..." | Rush from point to point |
| Write phonetically for hard words | Use jargon or acronyms without context |
| End on a clear action | Trail off or end abruptly |
| Match script tone to avatar age/style | Use Gen Z slang with a professional avatar |

1.2 Avatar/Persona Selection Framework

Choosing the right avatar is as important as the script. Match demographics to your target audience.

Trust Signals by Vertical

| Vertical | Ideal Avatar Profile | Why |
|---|---|---|
| Healthcare/Supplements | 30-50, professional appearance | Credibility and trust |
| Beauty/Skincare | 20-35, relatable, well-groomed | Peer recommendation effect |
| Tech/SaaS | 25-40, casual-professional | Approachable expertise |
| Finance/Insurance | 35-55, suited, authoritative | Trust and stability |
| Fitness | 25-35, athletic, energetic | Aspirational but attainable |
| Food/Beverage | 25-45, warm, approachable | Relatable lifestyle |
| Education | 30-50, friendly, professional | Authority without intimidation |
| DTC/E-commerce | 20-30, casual, authentic | UGC/peer recommendation |

Diversity Considerations

  • Test multiple demographics — different audiences respond to different presenters
  • Match market: Use local-looking avatars for geo-targeted campaigns
  • A/B test gender: Some products convert better with male vs female presenters (test, don't assume)
  • Age alignment: Your avatar should look like your customer OR who your customer trusts

1.3 Multi-Scene Composition Guide

Multi-scene videos feel more dynamic and retain attention better than single-shot talking heads.

When to Switch Scenes

  • Every 5-8 seconds for fast-paced content (TikTok/Reels)
  • Every 8-15 seconds for medium-paced (YouTube, Feed ads)
  • At every major transition point (problem → solution, feature → feature)

Scene Transition Best Practices

  1. Cut on speech breaks — switch scenes at natural pauses
  2. Alternate speaker/background — change avatar, background, or both
  3. Use B-roll inserts — product shots between talking segments
  4. Progress the story — each scene should advance the narrative

Recommended Scene Structures

2-Scene (15s):

Scene 1: Hook + Problem (avatar talking, neutral background)
Scene 2: Solution + CTA (avatar talking, product-relevant background)

3-Scene (30s):

Scene 1: Hook + Problem (avatar A, office background)
Scene 2: Solution + Features (avatar A, product demo background)
Scene 3: Social Proof + CTA (avatar A or B, branded background)

5-Scene (60s):

Scene 1: Hook (avatar, eye-catching background)
Scene 2: Problem deep-dive (avatar, relatable setting)
Scene 3: Product introduction (product B-roll or demo)
Scene 4: Features + Proof (avatar with data/reviews overlay)
Scene 5: CTA (avatar, clean branded background)

1.4 Audio & Voice Best Practices

Emotion Modulation

  • Excitement: Slightly faster pace, higher energy, emphasis on benefit words
  • Empathy: Slower pace, softer tone, pause after pain points
  • Authority: Measured pace, confident tone, declarative sentences
  • Urgency: Fast pace, short sentences, emphasis on time/scarcity words

Pronunciation Guidance

For product names, brand names, or technical terms:

  • Write phonetically in the script: "Creatify" → "cree-ATE-ih-fy"
  • Use periods for letter-by-letter: "A.I." reads as "A I" not "ai"
  • Numbers: Write "fifteen percent" not "15%"
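
These substitutions can be applied automatically before a script is sent to text-to-speech. The sketch below is an illustration, not a Creatify feature: `prepare_for_tts` and `number_to_words` are hypothetical helpers, and only whole-number percentages from 0 to 100 are spelled out.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n):
    """Spell out an integer from 0 to 100."""
    if n < 20:
        return ONES[n]
    if n == 100:
        return "one hundred"
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("-" + ONES[ones] if ones else "")


def prepare_for_tts(script, pronunciations=None):
    """Apply phonetic respellings and spell out whole-number percentages."""
    for word, spoken in (pronunciations or {}).items():
        # A function replacement avoids backslash escapes in `spoken`.
        script = re.sub(rf"\b{re.escape(word)}\b", lambda m, s=spoken: s, script)

    def spell_percent(match):
        value = int(match.group(1))
        if value > 100:
            return match.group(0)  # leave out-of-range values untouched
        return f"{number_to_words(value)} percent"

    # The lookbehind skips decimals such as "1.5%" and longer numbers.
    return re.sub(r"(?<![\d.])(\d{1,3})%", spell_percent, script)
```

Usage: `prepare_for_tts("Save 15% with Creatify", {"Creatify": "cree-ATE-ih-fy"})` yields a script with "fifteen percent" and the phonetic brand spelling.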

Pause Placement

  • After the hook (let it sink in)
  • Before the CTA (build anticipation)
  • After stating a surprising stat
  • Between major sections

Accent Selection by Market

  • US Market: Standard American English
  • UK Market: British RP or light regional
  • Global/Neutral: Standard American (widely understood across markets)
  • Local campaigns: Match regional accent to target geography

1.5 Green Screen / Transparent Background Techniques

AI avatars with transparent backgrounds can be overlaid on:

| Use Case | Application |
|---|---|
| Website widgets | Avatar explaining features on your landing page |
| Product demos | Avatar narrating over screen recordings |
| Email thumbnails | Avatar thumbnail that links to full video |
| Presentations | Avatar presenter in corner of slides |
| Social ads | Avatar over product imagery or B-roll |

Best Practices for Transparent Overlays

  • Use 9:16 format for mobile overlays, 1:1 for square placements
  • Position avatar in lower-third or right side (don't block main content)
  • Keep gestures contained — wide arm movements may clip at edges
  • Match avatar lighting to background lighting direction
  • Use WebM format for transparency (standard H.264 MP4 has no alpha channel)
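
As an illustration of the overlay step, the sketch below builds an ffmpeg command that places a transparent WebM avatar in the lower-right corner of a background video. It assumes ffmpeg is installed with libvpx support and that the avatar file is VP9-encoded; `build_overlay_command` and the file names are hypothetical.

```python
def build_overlay_command(background, avatar_webm, output, margin=40):
    """Build an ffmpeg command that overlays the avatar in the lower-right corner."""
    # W/H are the background size, w/h the avatar size (ffmpeg overlay variables).
    overlay = f"[0:v][1:v]overlay=W-w-{margin}:H-h-{margin}[v]"
    return [
        "ffmpeg", "-y",
        "-i", background,
        # Decode the avatar with libvpx-vp9 so its alpha channel is kept
        # (assumption: the WebM uses VP9 with alpha).
        "-c:v", "libvpx-vp9", "-i", avatar_webm,
        "-filter_complex", overlay,
        "-map", "[v]",
        "-map", "1:a?",  # take the avatar's audio track if it has one
        output,
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Keeping command construction separate from execution makes the arguments easy to inspect before running.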

1.6 UGC-Style Avatar Strategy

Making AI avatars feel like authentic user-generated content:

Visual Authenticity Cues

  • Casual backgrounds: Living room, kitchen, outdoor — not studio
  • Natural lighting: Slightly warm, not perfectly even
  • Casual framing: Slightly off-center, phone-selfie angle
  • Minimal branding: No logos in first 3 seconds

Script Tone for UGC

  • First person: "I've been using this for 2 weeks and..."
  • Imperfect language: "Honestly? I was skeptical at first"
  • Specific details: "I ordered the blue one on Tuesday" (not "I purchased the product")
  • Emotional reactions: "I was literally shook when I saw the results"
  • Conversational asides: "Okay but wait, it gets even better"

UGC Avatar Selection

  • Choose avatars that look 20-35, casually dressed
  • Avoid "too polished" presenters — slightly imperfect = more authentic
  • Match the avatar to your customer demographic
  • Test multiple avatars — UGC performance varies wildly by presenter

Part 2: API Automation

Automate avatar video production at scale.

2.1 Setup & Authentication

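import os
import requests

# Read credentials from the environment so keys stay out of source control.
# The placeholder strings are only fallbacks for local experiments.
CREATIFY_API_ID = os.environ.get("CREATIFY_API_ID", "your-api-id")
CREATIFY_API_KEY = os.environ.get("CREATIFY_API_KEY", "your-api-key")

HEADERS = {
    "Content-Type": "application/json",
    "X-API-ID": CREATIFY_API_ID,
    "X-API-KEY": CREATIFY_API_KEY,
}
BASE_URL = "https://api.creatify.ai/api"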

Don't have an API key yet? No problem — grab one in under 2 minutes:

  1. Sign up free at creatify.ai
  2. Go to Settings → API
  3. Copy your API ID and API Key. New accounts get free credits to start.

Polling helper used by the recipes in section 2.8:

import time

def poll_until_done(url, headers, max_wait=600, interval=10):
    """Poll a status endpoint until the job completes."""
    elapsed = 0
    while elapsed < max_wait:
        resp = requests.get(url, headers=headers, timeout=30)
        resp.raise_for_status()  # surface HTTP errors instead of parsing an error body
        data = resp.json()
        status = data.get("status")
        if status == "done":
            return data
        if status in ("failed", "error"):
            raise RuntimeError(f"Job failed: {data.get('failed_reason', 'Unknown')}")
        time.sleep(interval)
        elapsed += interval
    raise TimeoutError(f"Job did not complete within {max_wait}s")
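
For unit tests it can help to separate the polling loop from the HTTP call. The variant below is a sketch of that idea, not part of the Creatify API: `poll_job`, its `fetch` callable, and the injectable `sleep` are hypothetical, while the status values mirror the helper above.

```python
import time


def poll_job(fetch, max_wait=600, interval=10, sleep=time.sleep):
    """Call fetch() until it reports "done", raising on failure or timeout.

    fetch: zero-argument callable returning the job's JSON as a dict.
    sleep: injectable so tests can skip real waiting.
    """
    elapsed = 0
    while elapsed < max_wait:
        data = fetch()
        status = data.get("status")
        if status == "done":
            return data
        if status in ("failed", "error"):
            raise RuntimeError(f"Job failed: {data.get('failed_reason', 'Unknown')}")
        sleep(interval)
        elapsed += interval
    raise TimeoutError(f"Job did not complete within {max_wait}s")
```

In production, `fetch` would wrap the real status request; in tests, it can return canned dicts in sequence.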

2.2 AI Avatar v1 (Single Scene)

Generate a video of a single avatar speaking from text. Simple, fast, great for short content.

Cost: 5 credits per 30 seconds

List Available Personas

def list_personas():
    """Get all 1,500+ available avatar personas."""
    resp = requests.get(f"{BASE_URL}/personas/", headers=HEADERS)
    resp.raise_for_status()
    return resp.json()  # Each has: id, name, gender, thumbnail, etc.
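
Once the persona list is loaded, a small filter helps narrow it to the profile chosen in section 1.2. This sketch assumes the response is a list of dicts with the `name` and `gender` fields mentioned in the comment above; the actual response shape and field values should be checked against the API. `filter_personas` is a hypothetical helper.

```python
def filter_personas(personas, gender=None, name_contains=None):
    """Return personas matching the optional gender and name filters.

    personas: list of dicts (assumed shape: at least "name" and "gender").
    """
    matches = []
    for persona in personas:
        if gender and persona.get("gender") != gender:
            continue
        name = (persona.get("name") or "").lower()
        if name_contains and name_contains.lower() not in name:
            continue
        matches.append(persona)
    return matches
```

For example, `filter_personas(list_personas(), gender="f")` would keep only personas whose `gender` field equals `"f"`, assuming that is how the API encodes it.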

Create Avatar Video

def create_avatar_video(text, creator_id, aspect_ratio="9:16", model_version="aurora_v1_fast"):
    """Generate a single-scene avatar video from text."""
    resp = requests.post(f"{BASE_URL}/lipsyncs/", headers=HEADERS, json={
        "text": text,
        "creator": creator_id,
        "aspect_ratio": aspect_ratio,
        "model_version": model_version,
    })
    resp.raise_for_status()
    return resp.json()

def check_avatar_status(lipsync_id):
    """Check avatar video generation status."""
    resp = requests.get(f"{BASE_URL}/lipsyncs/{lipsync_id}/", headers=HEADERS)
    resp.raise_for_status()
    return resp.json()

Create Transparent Background Avatar

def create_transparent_avatar(text, creator_id, aspect_ratio="9:16"):
    """Generate avatar with transparent background (WebM format)."""
    resp = requests.post(f"{BASE_URL}/lipsyncs/", headers=HEADERS, json={
        "text": text,
        "creator": creator_id,
        "aspect_ratio": aspect_ratio,
        "transparent_background": True,
    })
    resp.raise_for_status()
    return resp.json()

2.3 AI Avatar v2 (Multi-Scene)

Create multi-scene videos with different avatars, voices, backgrounds, and CTAs per scene.

Cost: 5 credits per 30 seconds

def create_multi_scene_video(scenes, aspect_ratio="9:16", webhook_url=None):
    """Create a multi-scene avatar video.

    scenes: list of dicts, each with:
        - text (str): Script for this scene
        - creator (str): Avatar persona ID
        - voice_id (str, optional): Override voice
        - background (str, optional): Background image/video URL
    """
    payload = {
        "scenes": scenes,
        "aspect_ratio": aspect_ratio,
    }
    if webhook_url:
        payload["webhook_url"] = webhook_url

    resp = requests.post(f"{BASE_URL}/lipsyncs_v2/", headers=HEADERS, json=payload)
    resp.raise_for_status()
    return resp.json()

# Example: 3-scene product ad
scenes = [
    {
        "text": "Stop what you're doing. I need to tell you about something.",
        "creator": "18fccce8-86e7-5f31-abc8-18915cb872be",
    },
    {
        "text": "This serum literally transformed my skin in two weeks. No exaggeration.",
        "creator": "18fccce8-86e7-5f31-abc8-18915cb872be",
    },
    {
        "text": "Link is in my bio. Trust me, your future self will thank you.",
        "creator": "18fccce8-86e7-5f31-abc8-18915cb872be",
    },
]

video = create_multi_scene_video(scenes, aspect_ratio="9:16")
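
When every scene uses the same avatar, the `scenes` list can be generated from a list of scripts instead of being written by hand. The helper below is a sketch that follows the scene fields listed in the docstring above (`text`, `creator`, `voice_id`, `background`); `build_scenes` itself is a hypothetical name.

```python
def build_scenes(texts, creator_id, backgrounds=None, voice_id=None):
    """Turn a list of scene scripts into a list of scene dicts.

    backgrounds: optional list of URLs, matched to scenes by position;
    scenes without a matching entry get no "background" key.
    """
    backgrounds = backgrounds or []
    scenes = []
    for index, text in enumerate(texts):
        scene = {"text": text, "creator": creator_id}
        if voice_id:
            scene["voice_id"] = voice_id
        if index < len(backgrounds) and backgrounds[index]:
            scene["background"] = backgrounds[index]
        scenes.append(scene)
    return scenes
```

The result can be passed directly as the `scenes` argument of `create_multi_scene_video`.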

2.4 Aurora (Image + Audio → Video)

Generate studio-grade avatar videos from a reference image and audio file. Best-in-class lip sync.

Cost: 5 credits per 30 seconds

def create_aurora_video(image_url, audio_url, model_version="aurora_v1_fast", webhook_url=None):
    """Generate a studio-grade avatar video from image + audio."""
    payload = {
        "image": image_url,
        "audio": audio_url,
        "model_version": model_version,
    }
    if webhook_url:
        payload["webhook_url"] = webhook_url

    resp = requests.post(f"{BASE_URL}/aurora/", headers=HEADERS, json=payload)
    resp.raise_for_status()
    return resp.json()

def check_aurora_status(aurora_id):
    """Check Aurora generation status."""
    resp = requests.get(f"{BASE_URL}/aurora/{aurora_id}/", headers=HEADERS)
    resp.raise_for_status()
    return resp.json()

2.5 Text to Speech

Convert scripts into studio-quality voiceover audio.

Cost: 1 credit per 30 seconds

def list_voices():
    """List all available TTS voices and accents."""
    resp = requests.get(f"{BASE_URL}/voices/", headers=HEADERS)
    resp.raise_for_status()
    return resp.json()

def generate_tts(script, accent_id, webhook_url=None):
    """Generate voiceover audio from a script."""
    payload = {
        "script": script,
        "accent": accent_id,
    }
    if webhook_url:
        payload["webhook_url"] = webhook_url

    resp = requests.post(f"{BASE_URL}/text_to_speech/", headers=HEADERS, json=payload)
    resp.raise_for_status()
    return resp.json()

def check_tts_status(tts_id):
    """Check TTS generation status."""
    resp = requests.get(f"{BASE_URL}/text_to_speech/{tts_id}/", headers=HEADERS)
    resp.raise_for_status()
    return resp.json()

2.6 Voice Cloning

Clone a custom voice for brand consistency.

def clone_voice(audio_url, name):
    """Clone a voice from an audio sample."""
    resp = requests.post(f"{BASE_URL}/voices/clone/", headers=HEADERS, json={
        "audio_url": audio_url,
        "name": name,
    })
    resp.raise_for_status()
    return resp.json()

2.7 Custom Avatars (BYOA)

Upload your own video to create a custom avatar persona.

Note: Custom avatar creation takes 1-2 days for processing/approval.

def create_custom_avatar(lipsync_video_url, name, gender="m", scene="office"):
    """Create a custom avatar from your own video."""
    resp = requests.post(f"{BASE_URL}/personas/", headers=HEADERS, json={
        "lipsync_input": lipsync_video_url,
        "creator_name": name,
        "gender": gender,
        "video_scene": scene,
    })
    resp.raise_for_status()
    return resp.json()

def check_custom_avatar_status(persona_id):
    """Check custom avatar creation status."""
    resp = requests.get(f"{BASE_URL}/personas/{persona_id}/", headers=HEADERS)
    resp.raise_for_status()
    return resp.json()

2.8 Recipes

Recipe: TTS → Aurora Pipeline

Generate audio first, then pair with any image for avatar video.

def tts_to_aurora(script, accent_id, image_url):
    """Pipeline: script → audio → avatar video."""
    # Step 1: Generate audio
    tts = generate_tts(script, accent_id)
    tts_result = poll_until_done(
        f"{BASE_URL}/text_to_speech/{tts['id']}/", HEADERS, max_wait=120
    )
    audio_url = tts_result["output"]

    # Step 2: Generate Aurora video
    aurora = create_aurora_video(image_url, audio_url)
    aurora_result = poll_until_done(
        f"{BASE_URL}/aurora/{aurora['id']}/", HEADERS, max_wait=600
    )

    return aurora_result

Recipe: Batch Avatar A/B Test

Test multiple avatars with the same script to find the best performer.

def batch_avatar_ab_test(script, creator_ids, aspect_ratio="9:16"):
    """Generate the same script with multiple avatars for A/B testing."""
    jobs = []
    for creator_id in creator_ids:
        video = create_avatar_video(script, creator_id, aspect_ratio)
        jobs.append({"creator_id": creator_id, "video_id": video["id"]})

    results = []
    for job in jobs:
        try:
            result = poll_until_done(
                f"{BASE_URL}/lipsyncs/{job['video_id']}/", HEADERS, max_wait=600
            )
            results.append({
                "creator_id": job["creator_id"],
                "video_url": result.get("output") or result.get("video_output"),
                "status": "done"
            })
        except Exception as e:
            results.append({
                "creator_id": job["creator_id"],
                "error": str(e),
                "status": "failed"
            })

    return results

Recipe: Multi-Script Avatar Batch

Generate multiple scripts with the same avatar for hook testing.

def multi_script_batch(scripts, creator_id, aspect_ratio="9:16"):
    """Generate multiple scripts with the same avatar."""
    jobs = []
    for script in scripts:
        video = create_avatar_video(script, creator_id, aspect_ratio)
        jobs.append({"script": script[:50], "video_id": video["id"]})

    results = []
    for job in jobs:
        try:
            result = poll_until_done(
                f"{BASE_URL}/lipsyncs/{job['video_id']}/", HEADERS, max_wait=600
            )
            results.append({
                "script_preview": job["script"],
                "video_url": result.get("output") or result.get("video_output"),
                "status": "done"
            })
        except Exception as e:
            results.append({
                "script_preview": job["script"],
                "error": str(e),
                "status": "failed"
            })

    return results
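
For hook testing, the `scripts` list is often one shared body and CTA with different opening lines. A minimal sketch of building those variants follows; `build_hook_variants` is a hypothetical helper and the example text is illustrative.

```python
def build_hook_variants(hooks, body, cta):
    """Return one full script per hook, sharing the same body and CTA."""
    return [
        " ".join(part.strip() for part in (hook, body, cta) if part.strip())
        for hook in hooks
    ]
```

The output can be passed as `scripts` to `multi_script_batch`, so that the hook is the only variable between videos.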

2.9 Credit Costs & Latency Reference

| Endpoint | Credits | Typical Latency |
|---|---|---|
| AI Avatar v1 | 5 per 30s | ~1:10 ratio (15s video ≈ 150s) |
| AI Avatar v2 (multi-scene) | 5 per 30s | ~2-5 minutes |
| Aurora | 5 per 30s | ~2-3 minutes |
| Text to Speech | 1 per 30s | ~30-60 seconds |
| Voice Cloning | Varies | Minutes |
| Custom Avatar Creation | Free (slot required) | 1-2 days |
| Preview (v1 or v2) | 1 per 30s | ~1-2 minutes |
| Render (v2) | 4 per 30s | ~2-3 minutes |
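
The per-30-second rates above make it straightforward to estimate a batch budget. The sketch below assumes usage is rounded up to whole 30-second blocks, which is an assumption and not something the table states; check actual billing behavior before relying on it. `CREDITS_PER_30S` and `estimate_credits` are hypothetical names.

```python
import math

# Rates copied from the table above (credits per 30 seconds of output).
CREDITS_PER_30S = {"avatar_v1": 5, "avatar_v2": 5, "aurora": 5, "tts": 1}


def estimate_credits(product, seconds, variants=1):
    """Estimate credits for `variants` outputs of `seconds` each.

    Assumption: duration is billed in whole 30-second blocks, rounded up.
    """
    blocks = math.ceil(seconds / 30)
    return CREDITS_PER_30S[product] * blocks * variants
```

For example, five 30-second Avatar v1 variants come to 25 credits under this assumption, matching the A/B test row in the decision matrix.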

2.10 Decision Matrix

| I want to... | Use this | Credits |
|---|---|---|
| Quick single-avatar video | AI Avatar v1 | 5/30s |
| Multi-scene video with transitions | AI Avatar v2 | 5/30s |
| Best possible lip sync quality | Aurora | 5/30s |
| Just generate audio narration | Text to Speech | 1/30s |
| Use my own face/person | Custom Avatar | Free (slot) |
| Use my own voice | Voice Cloning | Varies |
| Avatar over custom background | Transparent + overlay | 5/30s |
| A/B test 5 avatar styles | Batch Avatar v1 x5 | 25/30s |

Install via CLI
npx skills add https://github.com/creatify-ai/ai-avatar-video --skill ai-avatar-video
Repository Details
Stars: 18
Forks: 2
Branch: main
Path: SKILL.md