content-creation

star 2

Full-stack content creation pipeline: idea or reference to published blog post, audio narration, video, and social media distribution. Orchestrates research, reference extraction, storytelling, AI visual assets (Nano Banana, Veo 3.1), TTS audio (Voicebox, kokoro-tts, Edge TTS), Remotion video, and social copy into a complete content package. Use when: (1) creating a blog post, case study, or writing entry, (2) turning an idea or experience into structured narrative, (3) using a reference post/video (X, LinkedIn, YouTube) as inspiration, (4) generating visual content (AI images, clips, screenshots, GIFs), (5) generating audio narration for posts, (6) creating social media content (X threads, Instagram carousels) from long-form writing, (7) packaging and distributing content across platforms. Triggers on 'new post', 'blog post', 'case study', 'content creation', 'write about', 'content pipeline', 'social content from post', 'use this as reference', 'generate audio', 'tts', 'narration'.

broomva By broomva schedule Updated 6/4/2026

name: content-creation description: "Full-stack content creation pipeline: idea or reference to published blog post, audio narration, video, and social media distribution. Orchestrates research, reference extraction, storytelling, AI visual assets (Nano Banana, Veo 3.1), TTS audio (Voicebox, kokoro-tts, Edge TTS), Remotion video, and social copy into a complete content package. Use when: (1) creating a blog post, case study, or writing entry, (2) turning an idea or experience into structured narrative, (3) using a reference post/video (X, LinkedIn, YouTube) as inspiration, (4) generating visual content (AI images, clips, screenshots, GIFs), (5) generating audio narration for posts, (6) creating social media content (X threads, Instagram carousels) from long-form writing, (7) packaging and distributing content across platforms. Triggers on 'new post', 'blog post', 'case study', 'content creation', 'write about', 'content pipeline', 'social content from post', 'use this as reference', 'generate audio', 'tts', 'narration'."

Content Creation Pipeline

Eight-phase workflow: reference/idea → published multimedia content package with social distribution.

REFERENCE (optional) → RESEARCH → NARRATIVE → VISUAL ASSETS → AUDIO → VIDEO → SOCIAL → DEPLOY

Setup & Onboarding

Before using the pipeline, the agent MUST check which tools are available and guide the user through setup for any missing ones. Run the checklist below at the start of every content creation session.

Prerequisite Check (run this first)

# Check what's already available
echo "=== Required ==="
which yt-dlp && echo "✓ yt-dlp" || echo "✗ yt-dlp — needed for video download"
which ffmpeg && echo "✓ ffmpeg" || echo "✗ ffmpeg — needed for video processing"
echo ""
echo "=== API Keys ==="
[ -n "$GEMINI_API_KEY" ] && echo "✓ GEMINI_API_KEY set" || echo "✗ GEMINI_API_KEY — needed for Nano Banana, Veo 3.1, Gemini analysis"
[ -n "$FAL_KEY" ] && echo "✓ FAL_KEY set" || echo "✗ FAL_KEY — optional, for fal.ai multi-provider"
[ -n "$ELEVENLABS_API_KEY" ] && echo "✓ ELEVENLABS_API_KEY set" || echo "✗ ELEVENLABS_API_KEY — optional, for voiceover"
echo ""
echo "=== TTS (Audio Narration) ==="
which kokoro-tts && echo "✓ kokoro-tts" || echo "✗ kokoro-tts — pip install kokoro-tts"
curl -sf http://localhost:17493/health > /dev/null 2>&1 && echo "✓ Voicebox server running" || echo "✗ Voicebox — optional, for premium TTS (voicebox.sh)"
which edge-tts && echo "✓ edge-tts" || echo "✗ edge-tts — pip install edge-tts (fallback TTS)"
echo ""
echo "=== Optional ==="
which agent-browser && echo "✓ agent-browser" || echo "✗ agent-browser — optional, for screenshots"
which nano-banana && echo "✓ nano-banana CLI" || echo "✗ nano-banana CLI — optional, Gemini SDK works without it"
which xurl && echo "✓ xurl" || echo "✗ xurl — optional, for X posting"

Tier 1: Core (required for any content creation)

Tool Install Purpose
yt-dlp brew install yt-dlp Download video from X, YouTube, any platform
ffmpeg brew install ffmpeg Video processing, format conversion, GIF creation
Remotion bun add remotion @remotion/cli (per project) Programmatic video composition

Tier 2: AI Generation (required for AI-powered assets)

Tool Setup Purpose
GEMINI_API_KEY Get free key at aistudio.google.com → API keys Nano Banana images, Veo 3.1 video, Gemini analysis, embeddings
@google/genai bun add @google/genai (per project) SDK for all Gemini models

How to get GEMINI_API_KEY:

  1. Go to aistudio.google.com
  2. Click "Get API key" → "Create API key"
  3. Set it: export GEMINI_API_KEY="your-key" (add to .zshrc for persistence)
  4. Free tier: ~500 image gen/day, video gen included, no credit card needed

Tier 3: Enhanced (optional, for specific features)

Tool Setup Purpose When needed
FAL_KEY Sign up at fal.ai, get key from dashboard Multi-provider: Veo, Sora, Kling via one API When you want to swap between video models
ELEVENLABS_API_KEY Sign up at elevenlabs.io AI voiceover generation When video needs narration
kokoro-tts pip install kokoro-tts CLI TTS generation (82M model) Default audio narration for posts
Voicebox voicebox.sh or docker compose up Premium local TTS with voice cloning When best quality audio needed
edge-tts pip install edge-tts Microsoft Neural voices (free, unofficial) Fallback when no GPU available
agent-browser npm install -g @anthropic-ai/agent-browser Screenshots, web interaction When capturing live app screenshots
nano-banana CLI npm install -g @the-focus-ai/nano-banana Quick CLI image generation Convenience; SDK works without it
xurl brew install --cask xdevplatform/tap/xurl Post directly to X When publishing X threads
TweetSave MCP claude mcp add -s user tweetsave -- npx -y mcp-remote https://mcp.tweetsave.org/sse Read X posts from Claude Code For reference extraction without yt-dlp
mcp-veo3 uvx mcp-veo3 --output-dir ~/Videos/Generated Veo 3.1 via MCP When generating video from Claude Code
@aeven/nanobanana-mcp Add to Claude MCP config Nano Banana via MCP When generating images from Claude Code

Agent Behavior

When the skill is triggered, the agent should:

  1. Run the prerequisite check silently (don't dump output unless something is missing)
  2. If critical tools are missing (yt-dlp, ffmpeg, GEMINI_API_KEY), tell the user what's needed and offer to install/configure:
    • For CLI tools: offer the brew install command
    • For API keys: explain where to get them and how to set them
    • For npm packages: offer to install in the current project
  3. If optional tools are missing, proceed without them and mention alternatives:
    • No agent-browser? Use FxTwitter API + yt-dlp for extraction
    • No FAL_KEY? Use @google/genai directly
    • No xurl? Generate the post copy for manual publishing
  4. Never fail silently — if an API call fails due to missing key, explain which key is needed and how to get it
  5. Adapt the pipeline to available tools — use the best available method, not the ideal one

Phase 0: Reference Extraction (when user provides a link)

When the user provides a URL to a post, video, or thread as creative reference, extract and analyze it before anything else. See references/x-content-extraction.md and references/reference-based-content-creation.md for full details.

Step 1: Extract content from the link

X/Twitter posts (fastest — no auth required):

# Extract tweet text, images, video URLs, engagement metrics
TWEET_ID="2034332847893574080"  # from the URL path
curl -s "https://api.fxtwitter.com/status/$TWEET_ID" | jq .

# Download video directly
yt-dlp "https://x.com/user/status/$TWEET_ID" -o reference_video.mp4

# Or via FxTwitter direct download
curl -sL "https://d.fxtwitter.com/user/status/$TWEET_ID" -o reference_video.mp4

YouTube / other platforms:

yt-dlp "URL" -o reference_video.mp4

Any URL with agent-browser (screenshot + text extraction):

agent-browser open "URL" && agent-browser wait --load networkidle
agent-browser screenshot reference_screenshot.png --full
agent-browser get text body > reference_text.txt

Step 2: Analyze with Gemini (video understanding + style extraction)

Upload the downloaded video to Gemini for deep analysis:

import { GoogleGenAI } from "@google/genai";
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const file = await ai.files.upload({ file: "reference_video.mp4", config: { mimeType: "video/mp4" } });

const analysis = await ai.models.generateContent({
  model: "gemini-2.5-flash",
  contents: [
    { fileData: { fileUri: file.uri, mimeType: file.mimeType } },
    { text: `Analyze this video as a content creation reference. Extract:
1. Visual Style: color palette (hex values), lighting, camera angles, framing
2. Pacing: shot durations, rhythm, fast/slow sections with timestamps
3. Transitions: types used (cuts, fades, zooms) and when
4. Text Overlays: fonts, positioning, animation, timing
5. Structure: hook (first 3s), body, CTA placement
6. Audio: music style, SFX, voiceover style
7. Engagement Hooks: techniques for retention
Return as structured JSON.` },
  ],
});

For text/image posts, use Gemini or Claude directly on the extracted text + screenshot to analyze hook type, structure, messaging, and CTA pattern.

Step 3: Generate a style brief

The analysis produces a style brief that guides all downstream phases:

  • Narrative phase: match hook type, structure, CTA pattern
  • Visual phase: match color palette, typography, layout
  • Video phase: match pacing, transitions, aspect ratio, duration
  • Social phase: match platform conventions and engagement patterns

Multimodal Embedding (for similarity search)

Use Gemini Embedding 2 to find similar content in your library:

// Embed a reference video into the same space as your content
const embedding = await ai.models.embedContent({
  model: "gemini-embedding-2-preview",
  contents: [{ inlineData: { mimeType: "video/mp4", data: videoBase64 } }],
  config: { outputDimensionality: 768 },
});
// Compare with cosine similarity against your content embeddings

Supports text, images, video (up to 120s), audio (up to 80s), and PDFs in a single vector space.


Phase 1: Research

Gather evidence, capture production state, pull metrics. Never write without data.

Compounding skills: /deep-research, /agent-browser, /competitor-intel, curl/API data pulls.

Checklist:

  • Core claim identified (what are you proving?)
  • 3-5 validating data points gathered
  • Production screenshots captured (full-page + detail)
  • Metrics pulled from APIs (never fabricate numbers)
  • Target audience and their concerns identified
  • Reference post analyzed (if provided) — style brief generated

Phase 2: Narrative

Structure using a proven framework. See references/storytelling.md for full guides.

Content Type Framework Structure
Case study PSI Challenge → Solution → Quantified results
Industry highlight ABT Context AND, BUT challenge, THEREFORE outcome
Technical deep dive 1-3-1 One idea, three evidence points, one takeaway
Product launch Pixar Spine Once upon a time... Every day... Until one day...
Data story Data Arc Context → Tension → Resolution

Blog Post Structure

frontmatter (title, summary, date, published, tags)
Hook (1-2 sentences — open loop or surprising claim)
Hero media (video or key image)
Numbers section (table with headline metrics)
Problem section + image
Solution section + image + progressive detail image
Evidence section + dashboard screenshots + data visualizations
Context section + variant screenshots
Practice section + animated GIF
Generalization section
Closing (memorable one-liner)

Rules: Lead with numbers. One image per ~300 words. Bold key terms on first use. 3-4 sentence paragraphs max. Use <video> for MP4, ![alt](path) for images/GIFs.

Output: MDX file at apps/chat/content/writing/{slug}.mdx.

Phase 3: Visual Assets

Compounding skills:

  • /agent-browser — production screenshots, UI workflows
  • /pencil (MCP) — design social cards, diagrams, slides in .pen files
    • get_guidelines(topic) for design-system/landing-page/slides guidance
    • get_style_guide(tags) for visual consistency
    • batch_design for multi-element compositions
    • get_screenshot to export assets
  • /before-and-after — visual diffs for transformation stories
  • /frontend-design — custom visual components
  • /arcan-glass — BroomVA brand styling

AI-Generated Assets

See references/ai-video-generation.md for full API details, code examples, and Remotion integration patterns.

Image generation (Nano Banana / Gemini):

# CLI: quick hero images, social cards, diagrams
nano-banana "A hero image for {topic}, dark theme, glass effects, 1080x1080"
# SDK: @google/genai with model "gemini-3.1-flash-image" (Nano Banana 2)
# MCP: @aeven/nanobanana-mcp for Claude Code integration

Video generation (Veo 3.1):

# MCP server for Claude Code
uvx mcp-veo3 --output-dir assets/ai-clips/
# SDK: @google/genai with model "veo-3.1-generate-preview"
# Capabilities: 4K, native audio, image-to-video, frame interpolation
# Duration: 4-8s per clip, chain up to 20 extensions (~148s)

Multi-provider (fal.ai):

# Single API for Veo 3.1, Sora 2 Pro, Kling 3 Pro, 600+ models
bun add @fal-ai/client
# Swap models by changing endpoint string, no code changes

Preprocessing AI clips for Remotion (critical):

ffmpeg -i ai_clip.mp4 -c:v libx264 -crf 18 -movflags +faststart -r 30 processed.mp4

Manual Asset Pipeline

Image pipeline:

magick input.png -resize 1200x -quality 85 output-opt.png
magick f1.png f2.png f3.png -resize 1200x675! -set delay 200 -loop 0 flow.gif
mkdir -p apps/chat/public/images/writing/{slug}/

Naming: {subject}-{descriptor}-opt.png

Checklist: Hero image/video, 1 image per section (5-7 min), 1+ animated GIF, all < 500KB, descriptive alt text.

Phase 4: Audio Narration

Generate TTS audio for each post so readers can listen. Pre-generate at pipeline time, not on-demand. See references/tts-audio-generation.md for full engine comparison, API details, and batch scripts.

Compounding skills: /openrocket-sim (batch scripting patterns), /remotion-best-practices (media pipeline).

TTS Engine Selection

Engine When to use
Voicebox (localhost:17493) Best quality. Voice cloning. GPU available. POST /generateGET /audio/{id}
kokoro-tts Fast batch default. CLI-first. kokoro-tts input.txt output.wav --voice af_sarah
Edge TTS No local GPU. edge-tts --text "..." --voice en-US-AndrewNeural --write-media out.mp3

Quick Generation

# Strip frontmatter from MDX, generate audio
slug="my-post"
sed '1{/^---$/!q;};1,/^---$/d' apps/chat/content/writing/$slug.mdx \
  | kokoro-tts - /tmp/$slug.wav --voice af_sarah
ffmpeg -i /tmp/$slug.wav -codec:a libmp3lame -b:a 128k apps/chat/public/audio/writing/$slug.mp3

Integration

  1. Place MP3 at public/audio/writing/{slug}.mp3
  2. Add audio: /audio/writing/{slug}.mp3 to post frontmatter
  3. ContentArticle component renders <audio> player with full controls (play/pause, seek, skip ±10s)
  4. Falls back to Web Speech API if no audio file

Checklist:

  • TTS engine running (Voicebox server or kokoro-tts installed)
  • Audio generated and placed in public/audio/writing/{slug}.mp3
  • audio field added to post frontmatter
  • Audio plays correctly on the deployed page

Phase 5: Video

Compounding skills: /remotion-best-practices — read rules for animations, sequencing, transitions, images, text. Audio from Phase 4 can be used as voiceover track in Remotion compositions.

Video structure (15-30s):

Title (3-4s) → Stats (3s) → Screenshots (2-3s each) → Workflow (3-4s) → Closing (3-4s)

Key Remotion rules: Use Img + staticFile() (never <img>). Use spring() for organic motion. Use Sequence with premountFor. No CSS transitions or Tailwind animation classes.

Hybrid AI + Remotion Pipeline

Combine AI-generated footage with Remotion motion graphics for production-quality output:

Nano Banana → hero images, backgrounds, social cards
Veo 3.1 → cinematic B-roll clips (8s each, 4K, with audio)
Remotion → motion graphics, titles, transitions, data viz
FFmpeg → preprocess AI clips, final GIF conversion

In Remotion compositions:

// AI-generated video as background layer
<OffthreadVideo src={staticFile("assets/veo-clip.mp4")} style={{ objectFit: "cover" }} />

// AI-generated image
<Img src={staticFile("assets/nano-banana-hero.png")} />

// Dynamic duration from AI clips
// Use @remotion/media-parser parseMedia() with calculateMetadata

Use <TransitionSeries> from @remotion/transitions to blend AI clips with motion graphics scenes via fade/wipe/slide transitions.

Render Commands

cd /tmp/{project}-remotion && bun install
npx remotion render {Id} --output out/video.mp4
ffmpeg -y -i out/video.mp4 -vf "fps=12,scale=960:-1:flags=lanczos" -c:v gif out/video.gif

Phase 6: Social Distribution

See references/social-distribution.md for copy patterns and references/social-publishing.md for CLI/MCP tool setup.

X Thread (5-8 tweets)

  1. Hook — surprising stat, contrarian claim, or earned insight (50% of effort here)
  2. Context — set the scene 3-6. Key insights — one per tweet, image every 2-3 tweets
  3. Strongest evidence
  4. CTA — link, follow, or question

Publishing: Use xurl CLI or Twitter MCP server to post directly.

xurl post "1/7 — [Hook tweet text]"
xurl media upload hero-image.png  # returns MEDIA_ID
xurl post "2/7 — [Context]" --media-id MEDIA_ID

Instagram Carousel (8-12 slides, 1080x1350px)

Use /pencil to design slides. Cover → Problem → Insights (1/slide) → Stat → Summary → CTA.

Publishing: Use Instagram MCP server (ig-mcp) or Meta Graph API.

LinkedIn Post

Hook in first 210 chars. 2-3 paragraphs + bullet list + CTA. 3-5 hashtags.

Publishing: Use LinkedIn MCP server (linkedin-mcp) or REST API with OAuth token.

Phase 7: Deploy

git checkout -b content/{slug}
git add apps/chat/content/writing/{slug}.mdx apps/chat/public/images/writing/{slug}/
git commit -m "content: add {title}"
git push -u origin content/{slug}
gh pr create --title "content: {short title}" --body "..."

Dependency Map

┌─ REFERENCE EXTRACTION ──────────────────────────────────────────┐
│  FxTwitter API (no auth)  yt-dlp    /agent-browser               │
│  TweetSave MCP            Gemini 2.5 (video understanding)       │
│  Gemini Embedding 2 (multimodal similarity)                      │
├─ RESEARCH ──────────────────────────────────────────────────────┤
│  /deep-research    /agent-browser    /competitor-intel    curl   │
├─ AI GENERATION ─────────────────────────────────────────────────┤
│  Nano Banana (@google/genai)   Veo 3.1 (@google/genai)          │
│  fal.ai (@fal-ai/client)      ElevenLabs (voiceover)            │
│  nano-banana CLI               mcp-veo3 (MCP server)            │
│  @aeven/nanobanana-mcp         veo-mcp-server                   │
├─ AUDIO (TTS) ──────────────────────────────────────────────────┤
│  Voicebox (localhost:17493)  kokoro-tts CLI   edge-tts          │
│  mlx-audio (Apple Silicon)   ffmpeg (WAV→MP3)                   │
├─ DESIGN ────────────────────────────────────────────────────────┤
│  /pencil (MCP)   /before-and-after   /frontend-design           │
│  /arcan-glass    magick/ffmpeg                                   │
├─ VIDEO ─────────────────────────────────────────────────────────┤
│  /remotion-best-practices   /skills-showcase   /json-render-remotion │
│  @remotion/media-parser     @remotion/transitions                │
├─ NARRATIVE ─────────────────────────────────────────────────────┤
│  references/storytelling.md   references/social-distribution.md  │
│  references/visual-content.md                                    │
├─ PUBLISH ───────────────────────────────────────────────────────┤
│  xurl (X CLI)    twitter-mcp-server    linkedin-mcp              │
│  ig-mcp          Ayrshare MCP (multi-platform)                   │
├─ DEPLOY ────────────────────────────────────────────────────────┤
│  git + gh CLI    /vercel-cli    Vercel preview CI/CD             │
└─────────────────────────────────────────────────────────────────┘

Reference Files

Install via CLI
npx skills add https://github.com/broomva/skills --skill content-creation
Repository Details
star Stars 2
call_split Forks 0
navigation Branch main
article Path SKILL.md
More from Creator