name: civitai-gen
description: Generate images, videos, audio, and more using Civitai's orchestration API. Use when the user wants text-to-image, video generation (11+ engines), text-to-speech, music, transcription, bulk batches, experiment sweeps, or buzz cost estimation. Not for browsing or searching Civitai models (use the Civitai MCP server).
license: MIT
compatibility: Requires Node.js 18+ (native fetch) and a CIVITAI_API_KEY. Network access required. ffmpeg optional (audio post-processing only).
metadata: { "author": "Civitai", "version": "1.0.5", "homepage": "https://github.com/civitai/civitai-gen-skill" }
civitai-gen
Unified CLI for Civitai's orchestration Workflow API. All generation types share the same workflow lifecycle: submit, poll, download.
Setup
Requires Node.js 18+. Set CIVITAI_API_KEY either as an environment variable or in a .env file in this skill's directory (next to generate.mjs). Copy .env.example to .env to start. Get a key at: https://civitai.com/user/account
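As an illustration of the two configuration sources described above, here is a minimal sketch of resolving the key from either place. The helper names `parseDotEnv` and `resolveApiKey` are hypothetical, and the precedence (environment variable first, then .env) is an assumption; the skill's own loader may behave differently.

```javascript
// Hypothetical helper: parse KEY=value lines from the text of a .env file.
function parseDotEnv(text) {
  const vars = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue; // skip blanks and comments
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // ignore lines with no key before "="
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    // Strip one pair of matching surrounding quotes, if present.
    const first = value[0];
    if (value.length >= 2 && (first === '"' || first === "'") && value[value.length - 1] === first) {
      value = value.slice(1, -1);
    }
    vars[key] = value;
  }
  return vars;
}

// Assumed precedence: the environment variable wins over the .env file.
function resolveApiKey(env, dotEnvText) {
  if (env.CIVITAI_API_KEY) return env.CIVITAI_API_KEY;
  const fromFile = parseDotEnv(dotEnvText).CIVITAI_API_KEY;
  if (fromFile) return fromFile;
  throw new Error("CIVITAI_API_KEY is not set (environment or .env)");
}
```

In practice `env` would be `process.env` and `dotEnvText` the contents of the .env file next to generate.mjs.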
Paths below are relative to this skill's directory. Run the scripts from there (`cd` into it), or prefix with the install path your runtime uses.
Capabilities
| Command | What it does | Details |
|---|---|---|
| `wait` | Submit + poll + download (default) | All-in-one blocking generation |
| `submit` | Fire-and-forget, returns workflow ID | For async workflows |
| `status` | Check workflow progress | --poll for live updates |
| `download` | Fetch completed media | From a workflow ID |
| `cost` | Dry-run buzz estimation | whatif=true, 0 buzz spent |
| `engines` | List video engines + live status | 11+ engines |
| `tts` | Text-to-speech | See docs/tts.md |
| `music` | Music/song generation (ACE Step 1.5) | See docs/music.md |
| `transcribe` | Speech-to-text transcription | See docs/transcription.md |
Quick Examples
# Image (defaults to Flux.1, 4 images)
node generate.mjs wait --prompt "A knight at sunset" -o ./out
# Multiple concurrent prompts
node generate.mjs wait --prompt "A warrior" --prompt "A mage" -o ./out
# Video (VEO 3)
node generate.mjs wait --engine veo3 --prompt "A robot walking" -o ./out
# Text-to-speech
node generate.mjs tts --text "Hello world" --speaker serena -o ./out
# Music
node generate.mjs music --prompt "upbeat electronic dance track" -o ./out
# Transcription
node generate.mjs transcribe --media-url "https://example.com/audio.mp3" -o ./out
# Cost check (any type, 0 buzz)
node generate.mjs cost --prompt "A cat" -n 100
node generate.mjs cost --engine veo3 --prompt "A robot" --duration 8
# Experiment mode (wildcard expansion)
node experiment.mjs --spec experiment.json -o ./out
Posting What You Generate
Generate an image (the skill saves it locally), then post the local file in one command:
node generate.mjs wait --prompt "a red apple" -n 1 -o ./out
node mcp-cli.mjs post-image ./out/step_0-0.png --title "My apple"
# -> Post published. URL: https://civitai.com/posts/<id>
post-image reads the local file, uploads it, and publishes the post — it prints the public URL. Pull mcp-cli.mjs first if you don't have it (curl -fsSL https://mcp.civitai.com/cli -o mcp-cli.mjs); CIVITAI_API_KEY is required.
Flags: --title (defaults to the filename), --detail, --nsfw <level>, --draft (leave unpublished), --json. For video/audio, or to attach a pre-uploaded image by UUID, call create_post directly (node mcp-cli.mjs schema create_post). Prefer posting the local file over the generate output's remoteUrls[].url, which is a short-lived signed URL.
Choosing an Engine & Model
Read docs/engines.md to pick the right generator. The key split:
| Path | Engines | How to pick the model |
|---|---|---|
| Open-weight ecosystem | SD1, SDXL, Pony, Illustrious, Flux.1/2, Qwen, Z-Image, Chroma, Anima | Find a checkpoint AIR + compatible LoRAs via the Civitai MCP (search_models / get_model_version) → --model / --resources |
| Closed API engine | OpenAI, Google/Gemini, Seedream, Grok, MAI, ERNIE + all video/audio | Engine name only — no checkpoint search, no LoRA |
Model discovery uses the Civitai MCP server (hosted at https://mcp.civitai.com/mcp) — call its tools to search checkpoints/LoRAs and get AIR URNs. Don't search inside this skill. A LoRA's base model must match the checkpoint's ecosystem.
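The ecosystem rule above can be checked before submitting. The sketch below assumes AIR URNs follow the layout `urn:air:{ecosystem}:{type}:{source}:{id}@{version}` and that the ecosystem segment is what must match; both are assumptions to verify against the URNs the MCP tools actually return. The function names and the IDs in the usage note are hypothetical.

```javascript
// Assumed AIR layout: urn:air:{ecosystem}:{type}:{source}:{id}@{version}
function parseAir(air) {
  const m = /^urn:air:([^:]+):([^:]+):([^:]+):([^@:]+)(?:@(.+))?$/.exec(air);
  if (!m) throw new Error(`Not a recognizable AIR URN: ${air}`);
  return {
    ecosystem: m[1].toLowerCase(),
    type: m[2].toLowerCase(),
    source: m[3],
    id: m[4],
    version: m[5] ?? null, // null when the URN carries no @version suffix
  };
}

// Return the resource URNs whose ecosystem differs from the checkpoint's.
function incompatibleResources(checkpointAir, resourceAirs) {
  const base = parseAir(checkpointAir).ecosystem;
  return resourceAirs.filter((air) => parseAir(air).ecosystem !== base);
}
```

For example, with placeholder IDs, `incompatibleResources("urn:air:sdxl:checkpoint:civitai:111@222", ["urn:air:sd1:lora:civitai:555@666"])` returns the single sd1 LoRA, which signals that it should not be passed to --resources alongside that checkpoint.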
If the Civitai MCP isn't connected, add it (browse tools need no API key):
claude mcp add --transport http civitai https://mcp.civitai.com/mcp
Can't add MCP config in your runtime? Pull the zero-dep CLI (Node 18+) and call tools over the shell instead:
curl -fsSL https://mcp.civitai.com/cli -o mcp-cli.mjs
node mcp-cli.mjs call search_models '{"query":"anime portrait","type":"Checkpoint","supportsGeneration":true,"baseModel":"SDXL 1.0"}'
# Call these Civitai MCP tools directly (they return AIR URNs ready for --model / --resources):
search_models { query: "anime portrait", type: "Checkpoint", supportsGeneration: true, baseModel: "SDXL 1.0" }
search_models { query: "neon style", type: "LORA", baseModel: "SDXL 1.0" } # LoRA base MUST match checkpoint
get_model_version { ids: [<versionId>] } # full version details + AIR URN + trigger words
Engine availability/params drift — run node generate.mjs engines for the live list and see https://developer.civitai.com/orchestration/recipes for per-engine parameters.
Domain-Specific Docs
For detailed parameters, read the relevant doc:
- Images: `--prompt`, `--model`, `--resources`, `--aspect`, `--resolution`, `--source-image` (img2img). Run `node generate.mjs --help` for the full flag list.
- Videos: `--engine`, `--duration`, `--video-aspect`, `--generate-audio`. Run `node generate.mjs engines` for available engines.
- TTS: Read `docs/tts.md` for built-in speakers, voice cloning, and style instructions.
- Music: Read `docs/music.md` for ACE Step 1.5, lyrics, and duration.
- Transcription: Read `docs/transcription.md` for ASR, timestamps, and language hints.
- Experiments: Run `node experiment.mjs --help` for wildcards, parameter sweeps, and naming.
Workflow Lifecycle
All generation types follow the same pattern:
- Build steps: each job becomes a workflow step with a `$type` (`textToImage`, `videoGen`, `textToSpeech`, `aceStepAudio`, `transcription`)
- Submit: POST to orchestration API, get workflow ID
- Poll: check status until terminal state (succeeded/failed/expired)
- Download: fetch output media (images, videos, audio files)
- JSON summary: clean JSON to stdout with paths, costs, and metadata
Use cost (whatif) to estimate buzz before spending. Use --quiet for agent-friendly output.
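The build and poll stages of the lifecycle above can be sketched as follows. This is an illustration, not the skill's implementation: the `{ $type, input }` step shape, the `prompt` and `quantity` field names, and the `getStatus` callback are assumptions, and no endpoint is hard-coded. Only the step type `textToImage` and the three terminal states come from this document.

```javascript
// Terminal states named in the lifecycle description.
const TERMINAL_STATES = new Set(["succeeded", "failed", "expired"]);

function isTerminal(status) {
  return TERMINAL_STATES.has(String(status).toLowerCase());
}

// One workflow step per prompt. The input field names are hypothetical.
function buildImageSteps(prompts, quantity = 4) {
  return prompts.map((prompt) => ({
    $type: "textToImage",
    input: { prompt, quantity },
  }));
}

// Poll via a caller-supplied getStatus(workflowId) until a terminal state
// is reached, giving up after maxAttempts polls.
async function pollUntilTerminal(getStatus, workflowId, { intervalMs = 2000, maxAttempts = 150 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await getStatus(workflowId);
    if (isTerminal(status)) return status;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Workflow ${workflowId} did not finish after ${maxAttempts} polls`);
}
```

Injecting `getStatus` keeps the loop independent of the real API, so it can be exercised with a stub before any buzz is spent.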
Guidelines
- Default to 4 images per prompt. Use `-n 1` only when a single image is needed.
- For video: always check `cost` first. Video costs 500-2000+ buzz per clip.
- Multiple `--prompt` flags create concurrent steps in one workflow.
- Use `--bulk file.json` for large batches.
- Use experiment mode for systematic parameter sweeps.
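To make the idea of a systematic parameter sweep concrete, here is a minimal sketch that expands a grid of values into every combination. It does not reproduce the experiment.mjs spec format, which is documented in its own help output; the function name `expandSweep` and the parameter names in the usage note are hypothetical.

```javascript
// Expand { key: [values...] } into every combination (cartesian product).
function expandSweep(grid) {
  let combos = [{}];
  for (const [key, values] of Object.entries(grid)) {
    const next = [];
    for (const combo of combos) {
      for (const value of values) {
        next.push({ ...combo, [key]: value });
      }
    }
    combos = next;
  }
  return combos;
}
```

A grid such as `{ steps: [20, 30], cfg: [3, 5, 7] }` expands to 6 jobs. Counting the combinations first is a cheap way to judge how large a sweep is before running a cost check on it.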
Developer Guide
See CLAUDE.md for architecture, module layout, and how to add new step types.