name: ai-media-gen
description: Generate raster images and video from prompts via the ai-cli (Vercel AI Gateway). Use when you need a real raster image or video, such as hero banners, conceptual illustrations, social cards, mockup imagery, or any pixel/motion asset that mermaid, visual-explainer HTML, excalidraw, or ascii-wireframe (all vector) cannot produce. Also use to generate or compare images across models from the terminal. Triggers on "generate an image", "make a hero image", "render a picture/photo", "create a video", "illustration for the doc".
allowed-tools: Read, Glob, Grep, Bash
ai-media-gen
cc generates only vector visuals (mermaid, visual-explainer HTML, excalidraw MCP,
ascii-wireframe). This skill closes the raster + motion gap by wrapping the ai CLI
(ai-cli, npm bin ai, v0.3.0) behind a cc-aware wrapper: scripts/bin/ai-media.
Always call scripts/bin/ai-media, not raw ai. The wrapper injects a default output path,
forces --json, and maps generation results to a three-state exit code (0 success, 1 failure,
2 partial; 3 is reserved for config errors). Because output always goes to a file, binary
data never lands on a non-TTY stdout.
When to Use
- A raster hero banner or conceptual illustration for a visual-explainer/frontend-design page
- A photo/picture/render an agent or user explicitly asks for
- Video from a prompt or from a still image
- Side-by-side comparison of one prompt across multiple models
- A composable media pipeline (image → video, image edit via stdin)
Do NOT use for diagrams, charts, schemas, flowcharts, or tables — those are vector work owned by
visual-explainer (HTML/mermaid), mermaid-diagrams, or ascii-wireframe.
Prerequisites
AI_GATEWAY_API_KEY must be set. The canonical location is ~/.env (your ~/.zshrc auto-exports
it via set -a; source ~/.env; set +a). If it is unset, the wrapper exits 3 with guidance.
Verify gateway auth at any time with scripts/bin/ai-media models --type image.
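For scripts that want to fail fast before invoking the wrapper, the same check can be mirrored locally. This is a minimal sketch, not the wrapper's actual code: the function name preflight and the message text are hypothetical, and only the exit value 3 and the ~/.env location come from this document.

```shell
# Hypothetical preflight helper mirroring the wrapper's documented behavior:
# return 3 when AI_GATEWAY_API_KEY is unset or empty, 0 otherwise.
preflight() {
  if [ -z "${AI_GATEWAY_API_KEY:-}" ]; then
    echo "AI_GATEWAY_API_KEY is unset; add it to ~/.env and re-source your shell" >&2
    return 3
  fi
  return 0
}

# Usage (commented out so the sketch has no side effects):
#   preflight || exit $?
#   scripts/bin/ai-media models --type image
```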
Free tier vs paid (image/video). The CLI's default image model openai/gpt-image-2 requires
paid AI Gateway credits and errors on the free tier. Verified free-tier-accessible image model:
bfl/flux-2-flex (-m bfl/flux-2-flex, ~9-11s). ai text runs free (openai/gpt-5.5). For
image/video on free tier, always pass -m with an accessible model or set AI_CLI_IMAGE_MODEL /
AI_CLI_VIDEO_MODEL. Check access per-model with scripts/bin/ai-media models --type image.
Commands
scripts/bin/ai-media image "a sunset over matte-black mountains" # generate an image
scripts/bin/ai-media video "a slow drone shot of a canyon" # generate a video
scripts/bin/ai-media text "summarize this" < notes.txt # generate text
scripts/bin/ai-media models --type image # list image models
Key Flags (passed through to ai)
-m, --model <id> Model id (creator/name or short), comma-separated for multi-model
-o, --output <path> Output file or dir. Wrapper defaults to docs/diagrams/assets/ if omitted.
-n, --count <n> Generations per model
-i, --image <path> Reference image (image cmd) / vision input (text cmd), repeatable
--size <WxH> Image size
--aspect-ratio <W:H> Aspect ratio
Output Behavior (important for agents)
The wrapper always writes the artifact to a file and prints the --json result to stdout.
Read the artifact path from results[].file; stdout never carries raw binary. Without an
explicit -o, output lands in ${AI_CLI_OUTPUT_DIR:-docs/diagrams/assets}/.
{ "elapsed_ms": 3420, "count": 1,
"results": [ { "index": 1, "model": "openai/gpt-image-2", "success": true, "file": ".../output.png" } ] }
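Pulling the artifact paths out of that JSON is straightforward with jq. The following is a sketch that assumes jq is installed and that the payload has the shape shown above; the function name successful_files is hypothetical, and whether failed entries carry a file field is not stated here, so the filter selects on success only.

```shell
# Hypothetical helper: read wrapper JSON on stdin and print the file path
# of every result whose "success" is true, one path per line.
# Assumes jq is available on PATH.
successful_files() {
  jq -r '.results[] | select(.success == true) | .file'
}

# Usage (commented out; needs the wrapper and a configured key):
#   scripts/bin/ai-media image "a dragon" -m bfl/flux-2-flex | successful_files
```

Filtering on success keeps the output usable after a partial run, where some entries in results did not produce a file.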
Exit Codes
| Code | Class | Meaning |
|---|---|---|
| 0 | generation | all results succeeded |
| 1 | generation | all failed, or ai errored / emitted non-JSON |
| 2 | generation | partial: some succeeded, some failed |
| 3 | config | AI_GATEWAY_API_KEY unset (preflight, before any call) |
Branch on 2 to handle partial multi-model/multi-count runs distinctly from total failure.
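The branching described above can be sketched as a small case statement. The helper name classify_exit and the labels it prints are hypothetical; only the numeric codes and their meanings come from the table.

```shell
# Hypothetical helper: translate a wrapper exit code into a short label.
classify_exit() {
  case "$1" in
    0) echo "ok" ;;       # all results succeeded
    1) echo "failed" ;;   # all failed, or ai errored / emitted non-JSON
    2) echo "partial" ;;  # some succeeded, some failed
    3) echo "config" ;;   # AI_GATEWAY_API_KEY unset
    *) echo "unknown" ;;  # not a documented code
  esac
}

# Usage (commented out; needs the wrapper and a configured key):
#   rc=0
#   scripts/bin/ai-media image "a dragon" -n 3 > result.json || rc=$?
#   case "$(classify_exit "$rc")" in
#     ok|partial) echo "keep the files listed in result.json" ;;
#     config)     echo "fix the environment first" >&2 ;;
#     *)          echo "generation failed" >&2 ;;
#   esac
```

Treating partial like ok for the purpose of collecting files, while still reporting it, is one reasonable policy for multi-model comparisons; stricter callers may prefer to retry the failed entries.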
Piping Patterns
# Summarize piped content (text)
git diff | scripts/bin/ai-media text "write a commit message"
# Image → video pipeline
scripts/bin/ai-media image "a dragon" -o /tmp/d.png && \
scripts/bin/ai-media video "animate this" -i /tmp/d.png
Embedding into a visual-explainer / frontend-design page
Generate, base64-encode, inline for self-containment:
scripts/bin/ai-media image "isometric matte-navy server rack, single cyan accent" \
-o /tmp/hero.png --aspect-ratio 16:9
IMG=$(base64 -w 0 /tmp/hero.png) # Linux; macOS: base64 -i /tmp/hero.png
# <img src="data:image/png;base64,${IMG}" alt="...">
rm /tmp/hero.png
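The Linux/macOS flag difference in the snippet above can be sidestepped by feeding the file on stdin and stripping line wraps. This is a sketch under that assumption; the helper names b64_inline and img_tag are hypothetical, and the alt text is inserted as-is without HTML escaping.

```shell
# Hypothetical helper: base64-encode a file as a single line.
# Reading from stdin avoids the -w / -i flag difference between GNU and
# BSD base64; tr removes any line wrapping the encoder adds.
b64_inline() {
  base64 < "$1" | tr -d '\n'
}

# Hypothetical helper: emit an <img> tag with the PNG inlined as a data URI.
# $1 = path to a PNG, $2 = alt text (not escaped; keep it free of quotes).
img_tag() {
  printf '<img src="data:image/png;base64,%s" alt="%s">\n' "$(b64_inline "$1")" "$2"
}

# Usage (commented out):
#   img_tag /tmp/hero.png "isometric server rack" >> page.html
```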
Match the prompt to the page palette and aesthetic direction (style + dominant colors). Specific prompts beat vague ones — "isometric illustration of a message queue, cyan nodes on dark navy" beats "a diagram of a queue".
Cross-References
| Need | Skill |
|---|---|
| The page the image goes into | visual-explainer, frontend-design |
| Vector diagrams (do NOT use this skill) | mermaid-diagrams, ascii-wireframe |
| Brand logos (not generated) | svgl |
Source / adoption verdict: docs/recon/vercel-labs-ai-cli.{md,html}.