ugc - SKILL.md Agent Skill

name: ugc description: > Plan and run UGC-style creator ads and social proof videos with genmedia. Use this for direct-to-camera creator scripts, talking-head ads, product demos, testimonials, founder clips, unboxing, reaction, before-after, faceless voiceover, and short vertical social videos.

UGC production with genmedia

Use this skill when the user wants creator-style content rather than polished studio advertising. Load references as needed:

references/formats.md
references/workflows.md
references/examples.md

Load model-routing alongside this skill for default endpoint choices.

Keep outputs believable, platform-native, and claim-safe. Do not write fake testimonials, fake metrics, invented reviews, medical claims, financial claims, or readable legal copy unless the user provides the exact wording.

Inputs to collect

Only ask when the answer changes execution.

Product or offer: what is being shown, sold, taught, or explained.
Format: direct-to-camera, demo, reaction, unboxing, founder, faceless b-roll.
Speaker: supplied portrait/video, generated avatar, no face, or voiceover.
Script source: exact script, bullet points, offer copy, or ask to draft.
Platform: TikTok, Reels, Shorts, paid social, landing page, prototype.
Runtime and crop: usually 9:16, 6-15 seconds for hooks, 15-45 seconds for ads.
Source media: portrait, product image, product video, logo, b-roll, audio.
Claims: proof the user supplied, required disclaimers, banned phrases.
Tone: casual, expert, founder-led, skeptical, excited, calm, documentary.

Genmedia workflow

Start from routed endpoint IDs.

genmedia models --endpoint_id veed/fabric-1.0 --json
genmedia models --endpoint_id veed/fabric-1.0/text --json
genmedia models --endpoint_id fal-ai/creatify/aurora --json
genmedia models --endpoint_id fal-ai/sync-lipsync/v2 --json
genmedia models --endpoint_id bytedance/seedance-2.0/image-to-video --json
genmedia models --endpoint_id openai/gpt-image-2 --json

Use text search only as fallback discovery for missing roles:

genmedia models "ugc talking head avatar" --json
genmedia models "short social video product demo" --json
genmedia docs "lip sync video" --json

Inspect the selected endpoint before running.

genmedia schema <endpoint_id> --json
genmedia pricing <endpoint_id> --json

Upload all local source media.

genmedia upload ./creator.jpg --json
genmedia upload ./product.png --json
genmedia upload ./voiceover.wav --json
genmedia upload ./existing-clip.mp4 --json

Choose the production route.
- Portrait plus audio: veed/fabric-1.0.
- Portrait plus text: veed/fabric-1.0/text when schema supports it.
- Avatar with stronger visual direction: fal-ai/creatify/aurora.
- Existing video with new speech: fal-ai/sync-lipsync/v2.
- Product b-roll: bytedance/seedance-2.0/image-to-video from an approved still or product reference.
- Hook frames and thumbnails: openai/gpt-image-2, especially when text or product fidelity matters.

Run long video jobs async and download from status.

genmedia run <endpoint_id> \
  --prompt "<ugc visual direction or shot prompt>" \
  --async \
  --json

genmedia status <endpoint_id> <request_id> \
  --download "./outputs/ugc/{request_id}_{index}.{ext}" \
  --json

Use schema fields exactly. Mirror the model's field names such as image_url, audio_url, video_url, text, prompt, visual_direction, aspect_ratio, duration, or seed.

Script build order

Build scripts as short spoken beats, not polished ad copy:

Hook: one concrete tension, problem, result, or curiosity gap.
Context: why this speaker cares or what situation they are in.
Product moment: product appears, is used, or is shown solving a problem.
Proof: sensory detail, visible demo, supplied metric, or user-provided fact.
Turn: before-after, objection answered, or unexpected benefit.
Close: soft CTA, next action, or clean final product frame.

Keep claims grounded. Replace unsupported claims with observable statements: "the texture looks lighter" beats "this cures acne".

Prompt build order

For each UGC clip or shot, write:

Speaker and frame: creator type, age range if supplied, setting, crop.
Performance: eye contact, casual delivery, gestures, expression, pace.
Product action: held up, opened, applied, compared, demonstrated, shown.
Camera: handheld phone, desk tripod, mirror shot, close-up, b-roll insert.
Lighting and audio feel: natural window, bathroom light, car interior.
Platform constraints: 9:16, safe top/bottom zones, no generated captions.
Guardrails: no fake claims, no new logos, no changed product packaging.

Model routing

Talking head from portrait and audio: veed/fabric-1.0.
Talking head from portrait and text: veed/fabric-1.0/text.
Avatar with visual direction: fal-ai/creatify/aurora.
Lip-sync existing footage: fal-ai/sync-lipsync/v2.
Product b-roll or demo motion: bytedance/seedance-2.0/image-to-video.
Fast draft b-roll: xai/grok-imagine-video/image-to-video.
Creator keyframes, thumbnails, and product-faithful stills: openai/gpt-image-2 or fal-ai/nano-banana-pro.
TTS: use fal-models-catalog/references/text-to-audio.md and inspect schema. Prefer short test sentences before full scripts.

Quality bar

Before returning:

The clip feels like plausible creator content, not a studio commercial.
Spoken script length fits the requested runtime.
Mouth motion is synced when a speaking face is used.
Product shape, packaging, and logo are stable.
Claims are supplied by the user or phrased as visible observations.
The first 1-2 seconds have a clear hook.
Captions or text overlays are not hallucinated inside the video.
Output paths come from downloaded_files[], not manually fetched URLs.

If the face drifts, shorten the script, use cleaner portrait/audio inputs, or switch from generated avatar to lip-syncing approved source footage.