24-ai-avatar-production-global


AI Avatar production pipeline for global markets — 3-tier tools (Free/Pro/Enterprise), 4 workflows (single avatar, translate, batch, hybrid), voice clone, anti-detection, QA Score 100. Has 4 region variants for DISCLOSURE LAW (US FTC, EU AI Act, SEA per country, LATAM mixed). Tools: HeyGen, Synthesia, ElevenLabs, Captions, Rask AI. Trigger: 'AI avatar', 'HeyGen', 'Synthesia', 'avatar AI video', 'talking head AI', 'AI video translate', 'batch AI video'.

By minhnv0807 | Updated 5/11/2026

name: 24-ai-avatar-production-global
description: "AI Avatar production pipeline for global markets: 3-tier tools (Free/Pro/Enterprise), 4 workflows (single avatar, translate, batch, hybrid), reference image intake for avatar prompts, face/style/logo/palette replacement workflows, voice clone, anti-detection, QA Score 100. Has 4 region variants for DISCLOSURE LAW (US FTC, EU AI Act, SEA per country, LATAM mixed). Tools: HeyGen, Synthesia, ElevenLabs, Captions, Rask AI. Trigger: 'AI avatar', 'HeyGen', 'Synthesia', 'avatar AI video', 'talking head AI', 'AI video translate', 'batch AI video', 'avatar reference image', 'AI avatar prompt', 'replace avatar face'."
metadata:
  version: 1.1.0
  category: content
  license: MIT
  triggers:
    - "AI avatar"
    - "HeyGen"
    - "Synthesia"
    - "avatar AI video"
    - "talking head AI"
    - "AI video translate"
    - "batch AI video"
    - "avatar reference image"
    - "AI avatar prompt"
    - "replace avatar face"
  related:
    - 25-voice-clone-podcast-global
    - 04-script-video-global
    - 26-thought-leadership-content-global
    - references/ai-video-disclosure-global
    - references/voice-clone-prompts-global

AI Avatar Production (Global) — Pipeline 3-Tier, 4 Workflows, QA Score 100

Flagship skill of the AI Content cluster. Covers the full pipeline from zero to publish, voice clone, anti-detection, and region-specific disclosure law.


For newbies

What is an AI Avatar?

An AI Avatar is a digital presenter whose voice and motion are generated by AI. It can be modeled on your face or on a stock stand-in. You provide one photo or a short selfie video, and the tool produces a finished video with natural-looking speech, gestures, and expressions. No filming crew, studio, or actor is required.

What do you need to start?

| Method | Requirement | Quality |
|--------|-------------|---------|
| Portrait photo | 1 forward-facing photo, clean background, 1024x1024 or larger | Medium: mouth movement less natural |
| Selfie video | 30s video, looking at the lens, speaking naturally | Good: better lipsync |
| Custom avatar | 2-5 min recording with teleprompter + lavalier mic | Excellent: near photo-real |

Minimum gear: Phone with HD front camera + lavalier mic (or headset mic).

How long does it take?

  • A single 60s video: 30-60 min (script + render)
  • Batch of 10: 1-2 days
  • Batch of 30: 4-5 days (with optimized process)

What does it cost?

| Tier | USD/month | Output |
|------|-----------|--------|
| Free | $0 | 1-3 videos, watermark |
| Pro | $30-100 | 10-30 videos, no watermark |
| Enterprise | $200-500+ | 30+ videos, custom avatar, API |

5 common newbie mistakes

  1. Lipsync drift: Script read too fast or voice mismatched -> slow speech by 10-15% and use a voice clone instead of the default voice.
  2. Voice doesn't sound like you: Sample too short or noisy -> re-record 3-5 minutes in a quiet room with phonetically varied script.
  3. Video flagged as "AI content": Platform pattern detection -> see Anti-detection section below.
  4. Blurry / pixelated output: Low-quality input -> use 1024x1024+ photo, natural lighting, no filters.
  5. Slow render: Free tier queue -> render off-peak (e.g. during US night-time, which is early morning in Asia) or upgrade to Pro.

Information collection (4 questions max)

Ask up to 4 questions before starting:

  1. Primary use case? Brand awareness / Sales / Education / Internal training?
  2. Primary platform? TikTok / YouTube / Facebook / Instagram / LinkedIn / X / Threads?
  3. Budget tier? Free ($0) / Pro ($30-100/mo) / Enterprise ($200+/mo)?
  4. Videos per month target? 1-5 / 10-30 / 30+?

Based on the 4 answers, auto-select Tier + Workflow. If the user has already uploaded reference images, do not ask a long intake form first; classify the images, create the setup/prompt, then ask only for missing assets.
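Below is a minimal Python sketch of the auto-select step. The tier follows the volume bands in the tier table below and is capped by the stated budget. The workflow rule (10+ videos per month -> Batch, otherwise Single) and every name here (`Intake`, `select_plan`) are hypothetical; the skill does not define them. Translate and Hybrid depend on signals the four questions do not capture, such as target markets and trust goals.

```python
from dataclasses import dataclass
from typing import Optional

# Volume bands taken from the "Limit" column of the tier table.
VOLUME_TO_TIER = {"1-5": "Free", "10-30": "Pro", "30+": "Enterprise"}
TIER_RANK = {"Free": 0, "Pro": 1, "Enterprise": 2}


@dataclass
class Intake:
    use_case: str     # Question 1 (unused by these illustrative rules)
    platform: str     # Question 2 (unused by these illustrative rules)
    budget_tier: str  # Question 3: "Free" / "Pro" / "Enterprise"
    volume: str       # Question 4: "1-5" / "10-30" / "30+"


def select_plan(intake: Intake) -> dict:
    """Pick the tier the volume needs, capped by budget, plus a workflow."""
    needed = VOLUME_TO_TIER[intake.volume]
    warning: Optional[str] = None
    if TIER_RANK[needed] > TIER_RANK[intake.budget_tier]:
        # Budget cannot cover the volume: stay within budget and flag the gap.
        tier = intake.budget_tier
        warning = f"{intake.volume} videos/month usually needs {needed}"
    else:
        # Budget covers the volume: do not upsell beyond what is needed.
        tier = needed
    # Hypothetical rule: recurring volume goes to the batch workflow.
    workflow = "Batch" if intake.volume != "1-5" else "Single"
    return {"tier": tier, "workflow": workflow, "warning": warning}
```

Capping by budget and returning a warning keeps the choice inside what the user said they can spend, while still surfacing a volume/budget mismatch.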


Tier decision — Tools and pricing

| Tier | Suggested tool | Price/month | Quality | Limit | Fits |
|------|----------------|-------------|---------|-------|------|
| Free | Captions Free, HeyGen Trial, D-ID Trial | $0 | 6/10 (watermark, limited duration) | 1-5 videos, max 60s/video | Personal tests, new freelancers |
| Pro | HeyGen Creator ($29), Synthesia Starter ($29), ElevenLabs Pro ($22) | $30-100 | 8/10 (no watermark, HD) | 10-30 videos, max 5 min/video | SMEs, small agencies, content creators |
| Enterprise | HeyGen Business ($89+), Synthesia Enterprise (custom) | $200-500+ | 9.5/10 (custom avatar, API, priority render) | 30+ videos, unlimited | Large agencies, large brands, e-learning |

Quick recommendations:

  • Just starting: HeyGen Trial (1 video free, full experience)
  • Serious but budget-limited: Captions Pro ($10/mo) for lipsync + ElevenLabs Starter ($5) for voice
  • Scale fast: HeyGen Creator + ElevenLabs Pro = best price/quality combo
  • Enterprise: Synthesia Enterprise + ElevenLabs Scale

Workflow 1: Single Avatar Production

One video, end-to-end in 30-60 minutes.

6-step process

| Step | Task | Tool | Time |
|------|------|------|------|
| 1. Script | About 150 words for a 60s video (natural speech runs roughly 130-160 words per minute) | Skill 04-script-video-global | 10 min |
| 2. Voice | Generate a voice or use a voice clone | ElevenLabs / HeyGen Voice | 5 min |
| 3. Avatar | Pick a stock avatar or upload your media | HeyGen / Synthesia / D-ID | 3 min |
| 4. Render | Combine voice + avatar; choose background and gestures | Tool from step 3 | 5-15 min (render) |
| 5. QA | QA Score 100 review (see section below) | Manual review | 5 min |
| 6. Publish | Export MP4 -> post to platform | Manual / Scheduler | 2 min |

Script template for AI Avatar (60s)

[HOOK — 3s] Curiosity hook, frame the problem
[PROBLEM — 10s] Describe the customer pain
[SOLUTION — 25s] Your solution, 2-3 key points
[PROOF — 12s] Numbers, testimonial, result
[CTA — 10s] Concrete action: "Link in bio for..."
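To keep a draft inside the 60s template, you can turn the section timings into word budgets. This Python sketch assumes a natural speaking rate of 150 words per minute; that figure is an assumption, so calibrate it against your own voice or clone. The "slow speech 10-15%" fix from the newbie mistakes list is modeled as a lower rate. `word_budget` and `spoken_seconds` are illustrative helper names.

```python
# Section lengths in seconds, from the 60s script template.
SECTIONS = {"HOOK": 3, "PROBLEM": 10, "SOLUTION": 25, "PROOF": 12, "CTA": 10}

# Assumed natural speaking rate; calibrate against your own voice or clone.
WORDS_PER_MINUTE = 150


def word_budget(sections: dict, wpm: int = WORDS_PER_MINUTE) -> dict:
    """Words per section, rounded down so each section fits its time slot."""
    return {name: seconds * wpm // 60 for name, seconds in sections.items()}


def spoken_seconds(text: str, wpm: int = WORDS_PER_MINUTE, slowdown: float = 0.0) -> float:
    """Estimated read-aloud time; slowdown=0.1 means speaking 10% slower."""
    words = len(text.split())
    return words / (wpm * (1 - slowdown)) * 60
```

For example, a 150-word draft read 10% slower runs about 67s. In that case, trim the script before rendering rather than speeding the voice back up.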

Workflow 2: Multi-language translate

One source video -> many languages for global rollout. Use cases: DTC brand expanding markets, multi-language courses, multi-country agency work.

Tool comparison

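| Tool | Languages | Price | Notes |
|------|-----------|-------|-------|
| Rask AI | 130+ | $50/mo (Pro) | Best dedicated translation tool at time of writing |
| HeyGen Translate | 40+ | Included in Creator and above | Built-in, convenient |
| Synthesia Translate | 35+ | Included in Enterprise | Best for e-learning |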

Process

  1. Create source video (Workflow 1)
  2. Upload to translate tool (Rask AI recommended)
  3. Pick target language — tool auto-translates and lipsyncs
  4. Review with a native speaker
  5. Export and publish per market

Caveat: Auto-lipsync is weaker for tonal languages (Mandarin, Vietnamese, Thai). Workaround: produce a native voice clone and a native avatar for each of these languages.
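You can apply this caveat per market when planning a rollout. In this Python sketch, the tonal-language set is limited to the three languages named in the caveat, written as ISO 639-1 codes (`zh` stands in for Mandarin). The route labels are illustrative.

```python
# Tonal languages named in the caveat, as ISO 639-1 codes ("zh" for Mandarin).
TONAL_LANGUAGES = {"zh", "vi", "th"}


def plan_translation(targets: list) -> dict:
    """Route each target language to auto-translate or native production."""
    plan = {}
    for lang in targets:
        if lang in TONAL_LANGUAGES:
            # Weaker auto-lipsync: rebuild with a native voice clone + avatar.
            plan[lang] = "native voice clone + native avatar"
        else:
            # Default path: translate tool, then native-speaker review.
            plan[lang] = "auto-translate + lipsync, native review"
    return plan
```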

See full disclosure law per region in the variant files.


Workflow 3: Batch Production

10 videos per 5-day cycle, repeated for 30 videos a month: an assembly-line process.

Detailed timeline

| Day | Task | Output | Tool |
|-----|------|--------|------|
| Day 1 | Script batch: write 10 scripts from template | 10 scripts (.md) | Skill 04-script-video-global + AI assist |
| Day 2 | Voice batch: render 10 audio files | 10 audio files (.mp3) | ElevenLabs API |
| Day 3 | Avatar batch: upload audio + avatar, queue renders | 10 videos rendering | HeyGen Batch / Synthesia |
| Day 4 | QA batch: review 10 videos, fix issues, re-render | 10 QA'd videos | Manual + QA Score |
| Day 5 | Publish batch: export, add captions, schedule | 10 videos published | Buffer / Later / Manual |

Repeat the cycle for 3 weeks to reach 30 videos, or scale Days 1-2 to 15 scripts per week to reach 30 in 2 weeks.
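The cycle arithmetic can be sketched in Python: split a monthly target into 5-day cycles, each running the five daily stages from the timeline. `batch_plan` is a hypothetical helper, and `per_cycle=15` models the scaled-up variant.

```python
# Daily stages of one cycle, from the timeline table.
STAGES = ["Script", "Voice", "Avatar render", "QA", "Publish"]


def batch_plan(total_videos: int, per_cycle: int = 10) -> list:
    """Split a monthly target into 5-day cycles of at most per_cycle videos."""
    cycles = []
    for start in range(0, total_videos, per_cycle):
        end = min(start + per_cycle, total_videos)
        cycles.append({
            "cycle": len(cycles) + 1,
            "videos": list(range(start + 1, end + 1)),  # 1-based video numbers
            "days": list(STAGES),                       # Day 1 .. Day 5
        })
    return cycles
```

`batch_plan(30)` yields three cycles of 10, and `batch_plan(30, per_cycle=15)` yields two. A remainder, such as 25 videos, lands in a shorter final cycle.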

Cost estimate: batch of 30 videos/month

| Tier | Tool combo | Monthly cost | Per-video cost |
|------|------------|--------------|----------------|
| Free | HeyGen Trial + Captions Free | $0 (limited to 3-5 videos) | $0 (watermark) |
| Pro | HeyGen Creator + ElevenLabs Pro | ~$51 | ~$1.70 |
| Enterprise | HeyGen Business + ElevenLabs Scale | ~$189 | ~$6.30 |
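Per-video cost is the flat monthly subscription spread over the month's output, so it falls as volume rises. This Python sketch uses the table's approximate figures. List prices change, so re-check them before budgeting.

```python
# Approximate monthly costs from the cost table (time-sensitive list prices).
MONTHLY_COST = {"Pro": 51.0, "Enterprise": 189.0}


def per_video_cost(monthly_cost: float, videos_per_month: int) -> float:
    """Flat subscription divided by output, rounded to cents."""
    if videos_per_month <= 0:
        raise ValueError("videos_per_month must be positive")
    return round(monthly_cost / videos_per_month, 2)
```

At 30 videos this reproduces ~$1.70 and ~$6.30. At 10 videos, the same Pro plan costs about $5.10 per video.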

Batch optimization tips

  • Templated scripts: 3-5 frameworks, swap the core content
  • Voice consistency: One voice clone for the entire series
  • Off-peak rendering: Queue renders overnight to avoid peak-hour waits
  • QA checklist: Print the QA Score sheet and check videos in sequence, assembly-line style

Workflow 4: Hybrid Real + AI

Real face for trust + AI body for speed.

Use cases

  • Real face intro 5s + AI body 55s (save filming time)
  • AI video weekdays + Real video weekly (balance quality/effort)
  • Real talking head + AI B-roll (studio-grade output)

Assembly + tools

  1. Film real intro 5-10s (eye contact, natural greeting); use Captions for lipsync fixes
  2. Create AI for the rest with same outfit/background (HeyGen / Synthesia)
  3. Edit in CapCut / Premiere (precise cuts, smooth transitions)
  4. Color match AI to real footage (LUT, or the free version of DaVinci Resolve)

Trust gain: Real face up front -> 20-35% more engagement than full-AI.


Voice Clone Protocol

Voice sample requirements

| Criterion | Requirement |
|-----------|-------------|
| Duration | 3-5 minutes |
| Quality | WAV/FLAC, 44.1 kHz or higher, mono, quiet room |
| Script content | Phonetically varied passages (all vowels, hard consonants) |
| Emotion | Read in a normal, natural tone; do not act |
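You can check duration, sample rate, and channel count before upload with Python's standard `wave` module. This sketch covers WAV only, because FLAC needs a third-party reader. It cannot judge room noise or script variety. `check_voice_sample` is an illustrative name.

```python
import wave

# Requirements from the voice sample table.
MIN_SECONDS, MAX_SECONDS = 3 * 60, 5 * 60
MIN_SAMPLE_RATE = 44_100


def check_voice_sample(path: str) -> list:
    """Return problems with a WAV voice sample (empty list means it passes)."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        seconds = wav.getnframes() / rate
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        problems.append(f"duration {seconds:.0f}s outside 3-5 min")
    if rate < MIN_SAMPLE_RATE:
        problems.append(f"sample rate {rate} Hz below 44.1 kHz")
    if channels != 1:
        problems.append(f"{channels} channels; expected mono")
    return problems
```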

Tool comparison

| Tool | Price | Quality | Notes |
|------|-------|---------|-------|
| ElevenLabs | From $5/mo | 9/10 | Best overall, 30+ languages |
| HeyGen Voice | Included in Creator and above | 6/10 | Convenient if already using HeyGen |
| Resemble AI | From $99/mo | 7/10 | Strong API |
| PlayHT | From $39/mo | 7/10 | Good for narration |

Consent form template

MANDATORY before cloning anyone's voice.

VOICE USAGE CONSENT

I, [FULL NAME], consent to [COMPANY] using my voice for: [SPECIFIC PURPOSE].
Term: [X months / Until revoked]
Date: [YYYY-MM-DD]
Signature: _______________

Reference: See references/voice-clone-prompts-global.md
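Before each render, a pipeline can refuse to use a cloned voice unless a matching consent record is in force. This minimal Python sketch mirrors the template fields. The `revoked_on` field and the exact-match purpose check are assumptions added for illustration.

```python
import calendar
from dataclasses import dataclass
from datetime import date
from typing import Optional


def add_months(start: date, months: int) -> date:
    """Add calendar months, clamping to the last day of shorter months."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


@dataclass
class VoiceConsent:
    full_name: str
    company: str
    purpose: str
    signed_on: date
    term_months: Optional[int]         # None = "Until revoked"
    revoked_on: Optional[date] = None  # hypothetical field, not in the template

    def is_valid(self, on: date, purpose: str) -> bool:
        """True if consent covers this purpose and is in force on that date."""
        if purpose != self.purpose or on < self.signed_on:
            return False
        if self.revoked_on is not None and on >= self.revoked_on:
            return False
        if self.term_months is not None and on >= add_months(self.signed_on, self.term_months):
            return False
        return True
```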


Avatar Setup Checklist

Before recording / uploading photo or video for an AI avatar:

  • Lighting: Natural light or softbox; no harsh shadows on the face
  • Background: Solid (white / gray) or real environment (office, store)
  • Wardrobe: On-brand; avoid small busy patterns (AI moire)
  • Framing: Chest up; eyes on the upper-third line
  • Eye contact: Look directly at the lens (not the screen)
  • Gestures: Natural; hands can rest or do light gestures
  • Resolution: Minimum 1080p (1920x1080); 4K preferred
  • Aspect ratio: 9:16 (TikTok / Reels), 16:9 (YouTube), 1:1 (Feed)
  • File format: MP4 (H.264) for video, PNG / JPG for photo
  • Backup: Keep originals on cloud (Google Drive / OneDrive) before uploading to the tool
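You can check the resolution and aspect-ratio items from a file's frame size. In this Python sketch, 1080p is read as "shorter side at least 1080 px", so 1080x1920 portrait passes; that reading is an interpretation. `check_frame` is a hypothetical helper, and the frame dimensions would come from your editor or a media probe tool.

```python
from math import gcd

# Aspect ratios and minimum resolution from the setup checklist.
TARGET_RATIOS = {(9, 16): "9:16 (TikTok / Reels)", (16, 9): "16:9 (YouTube)", (1, 1): "1:1 (Feed)"}
MIN_SHORT_SIDE = 1080  # 1080p: 1920x1080 landscape or 1080x1920 portrait


def check_frame(width: int, height: int) -> list:
    """Return problems with a frame size (empty list means it passes)."""
    problems = []
    if min(width, height) < MIN_SHORT_SIDE:
        problems.append(f"{width}x{height} is below 1080p")
    d = gcd(width, height)
    ratio = (width // d, height // d)
    if ratio not in TARGET_RATIOS:
        problems.append(f"aspect {ratio[0]}:{ratio[1]} is not 9:16, 16:9 or 1:1")
    return problems
```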

Reference Image -> Avatar Prompt Director

Use this when the user drops one or more reference images and wants to create an avatar, replace a face, adapt brand colors, add a logo, or create the prompt before uploading assets into a tool.

Classify Input Images

| Image type | Role | Requirement |
|------------|------|-------------|
| Style ref | Mood, lighting, background, outfit, camera angle | Do not use as identity unless requested |
| Face ref | Identity preservation / face replacement | 1-3 clear face images, no filter, front + 3/4 angle |
| Selfie video | Better custom avatar / natural lipsync | 30s-2 min, looking at camera, speaking naturally |
| Logo/palette | Personal/company brand adaptation | PNG/SVG logo + 2-4 hex colors |
| Product/location | Prop or avatar environment | Clear product label or location/background image |

Multiple Images = Multiple Flows

## Avatar Flows

| Flow | Input image | Role | Suggested tool | Missing assets |
|------|-------------|------|----------------|----------------|
| A | style-01 | style/background | Design Master -> HeyGen | face ref, logo |
| B | face-01 | identity | HeyGen custom avatar | script, voice sample |

  • If every image is a different style direction, create a separate prompt for each flow.
  • If images support one avatar, group by role: style + face + logo + palette + product.
  • Ask for each next asset explicitly: face image, selfie video, logo, hex colors, script, voice sample.
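The asset requests and the palette requirement from the classification table lend themselves to a quick check. In this Python sketch, the asking order follows the bullet above, and requiring six-digit hex codes is an assumption. A real flow would skip items its tool route does not need, for example a voice sample for a static avatar.

```python
import re

# Assets to request explicitly, in the order listed in the flow rules.
ASSET_ORDER = ["face image", "selfie video", "logo", "hex colors", "script", "voice sample"]
# Assumption: brand colors arrive as six-digit hex codes such as "#1A2B3C".
HEX_COLOR = re.compile(r"#[0-9A-Fa-f]{6}")


def missing_assets(provided: set) -> list:
    """Assets still to ask for, one explicit request each."""
    return [asset for asset in ASSET_ORDER if asset not in provided]


def check_palette(colors: list) -> list:
    """Problems with a brand palette that should hold 2-4 hex colors."""
    problems = [f"not a hex color: {c}" for c in colors if not HEX_COLOR.fullmatch(c)]
    if not 2 <= len(colors) <= 4:
        problems.append(f"expected 2-4 colors, got {len(colors)}")
    return problems
```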

Prompt Setup Output

## Avatar Prompt Setup — Flow A

- Style ref:
- Face ref:
- Brand assets:
- Target platform:
- Tool route:

## Copy-Paste Visual Prompt
[English prompt for avatar/source image generation]

## Upload Next
- Face/selfie video:
- Logo:
- Brand colors:
- Voice sample:
- Script:

For a static personal avatar only, route to 30-design-master-global personal-brand mode. For talking-head video, continue this workflow.


Anti-detection for FB / IG / TikTok / YouTube

5 detection signals and fixes

| Signal | Platforms flagging | Fix |
|--------|--------------------|-----|
| Stiff face, no natural blinking | FB, IG | Use a selfie video over a photo; pick avatars with micro-expressions |
| Monotone voice, no natural pauses | TikTok, FB | Use a voice clone (natural pacing) over default TTS |
| Fully static background | FB, IG | Add slight noise/grain, or use a real-world background |
| Isolated motion (only the mouth moves) | TikTok | Pick avatars with gestures (hands, head); use HeyGen v3+ |
| Metadata flagged as AI tool | YouTube (monetized) | Re-export through CapCut (strips metadata); add a color grade |

Techniques to add "human feel"

  1. Add film grain / noise: 2-5% in CapCut or Premiere
  2. Zoom and crop: 5-10% crop with subtle motion (Ken Burns)
  3. Color grade: Apply film LUT or manually grade — avoid "too clean"
  4. Text overlay: Add subtitles, callouts, stickers to cover AI weak spots
  5. B-roll insert: Drop 2-3 b-roll clips (product, lifestyle) every 15-20s
  6. Sound design: Background music + light SFX (immersion + masks AI voice)

Per platform

  • TikTok: Most lenient — content quality wins over AI checks
  • Facebook / Instagram: Moderate scrutiny — anti-detection matters
  • LinkedIn: Practically no detection — best fit for AI avatars
  • YouTube: Strict for monetized videos under YPP; YouTube also requires all creators to disclose realistic altered or synthetic content

CRITICAL: NEVER use AI avatars to impersonate real people without consent. This is illegal in many jurisdictions and grounds for permanent platform bans.


Ethics and Disclosure — Region selector

Disclosure laws differ dramatically by region. Pick the matching variant:

| Region | Variant file | Key law |
|--------|--------------|---------|
| US / Canada | variants/01-us.md | FTC Endorsement Guides (16 CFR Part 255), 2023 update |
| EU / EEA / UK | variants/02-eu.md | EU AI Act Article 50 (always disclose) + UCPD + GDPR |
| Southeast Asia | variants/03-sea.md | Per-country: ASAS (SG), AKARI (ID), DTI (PH), MCMC (MY), TH |
| Latin America | variants/04-latam.md | CONAR + LGPD (BR), PROFECO (MX), AAIP (AR), per-country |

ALWAYS read the matching variant BEFORE publishing AI avatar content in that region. Penalties range from warning to multi-thousand-USD fines per influencer (US) and can stack under EU AI Act + GDPR.

Universal disclosure rule of thumb

When in doubt, disclose. Disclosure is rarely penalized; non-disclosure can be.

"This video uses AI Avatar technology for visuals and voice."

Placement: video description, first 3 seconds on-screen text, OR platform "AI-generated" tag (where available — Meta, TikTok, YouTube all now support this).


QA Score — 100 points

Scorecard

| # | Criterion | Points | Description |
|---|-----------|--------|-------------|
| 1 | Lipsync | /10 | Mouth tracks speech within 0.2s |
| 2 | Voice match | /10 | Voice sounds like the speaker (if cloned) or natural (if TTS) |
| 3 | Visual quality | /10 | Sharp image, no artifacts, no blur |
| 4 | Background | /10 | Background suits the context, no render glitches |
| 5 | Lighting | /10 | Even light, no harsh shadows, matches background |
| 6 | Gesture | /10 | Natural, no jitter, hand/head movement present |
| 7 | Script flow | /10 | Hook -> Problem -> Solution -> Proof -> CTA |
| 8 | Disclosure | /10 | AI disclosure compliant with region (see variant) |
| 9 | Platform fit | /10 | Correct aspect ratio, duration, format for platform |
| 10 | CTA | /10 | Clear call-to-action, easy to execute |

Action thresholds

| Tier | Score | Action |
|------|-------|--------|
| Excellent | 90-100 | Publish now |
| Good | 70-89 | Publish; note improvements for next round |
| Needs fix | 50-69 | Fix items scoring under 7, then re-render |
| Redo | <50 | Rebuild from script + voice + avatar |
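Totaling the scorecard and picking the action is mechanical, so you can script it. This is a Python sketch of the thresholds above. `qa_verdict` is an illustrative name, and it also lists the criteria scoring under 7, which the "Needs fix" row says to fix.

```python
# The ten scorecard criteria, each scored 0-10.
CRITERIA = ["Lipsync", "Voice match", "Visual quality", "Background", "Lighting",
            "Gesture", "Script flow", "Disclosure", "Platform fit", "CTA"]


def qa_verdict(scores: dict) -> tuple:
    """Sum the scorecard and map the total to the action thresholds."""
    missing = [c for c in CRITERIA if c not in scores]
    if missing:
        raise ValueError(f"unscored criteria: {missing}")
    if any(not 0 <= scores[c] <= 10 for c in CRITERIA):
        raise ValueError("each criterion is scored 0-10")
    total = sum(scores[c] for c in CRITERIA)
    if total >= 90:
        action = "Publish now"
    elif total >= 70:
        action = "Publish; note improvements for next round"
    elif total >= 50:
        action = "Fix items scoring under 7, then re-render"
    else:
        action = "Rebuild from script + voice + avatar"
    weak = [c for c in CRITERIA if scores[c] < 7]  # items to fix first
    return total, action, weak
```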

Output template

# AI Avatar Video — [Title] | [Region variant] | [Date]

1. Workflow used: [Single / Translate / Batch / Hybrid]
2. Script: [Content, 150-300 words]
3. Voice: [Tool] — [Voice ID / clone name] — Consent: [Yes / N/A]
4. Avatar: [Tool] — [Avatar ID / custom]
5. QA Score: [X]/100 (10 criteria)
6. Disclosure (per region variant): [Text + placement]
7. Publish: [Platform] — [Aspect ratio] — [Link]

Quality checklist

  • Information collection completed (4 questions)
  • Tier picked (Free / Pro / Enterprise) and aligns with budget + volume
  • Workflow picked (Single / Translate / Batch / Hybrid)
  • Voice clone consent recorded (if cloning a real person)
  • Avatar setup checklist completed before recording
  • Anti-detection techniques applied for the target platform
  • Region variant read and disclosure compliant
  • QA Score >= 70 before publishing

Related skills

  • 25-voice-clone-podcast-global — voice clone deep-dive + podcast pipeline
  • 04-script-video-global — script writing for AI avatar
  • 26-thought-leadership-content-global — content strategy for personal brand
  • references/ai-video-disclosure-global — full legal reference
  • references/voice-clone-prompts-global — voice clone training prompts

Global Skill 24 (AI Avatar Production) | Over Powers Agency | v1.1.0

Install via CLI
npx skills add https://github.com/minhnv0807/ai-business-skills --skill 24-ai-avatar-production-global
Repository details: Stars 458 | Forks 202 | Branch main | Path SKILL.md