name: ai-image-creator
description: Generate PNG images using AI (multiple models via OpenRouter including Gemini, FLUX.2, Riverflow, SeedDream, GPT-5 Image, GPT-5.4 Image 2, proxied through Cloudflare AI Gateway BYOK). Also analyze/describe existing images using multimodal AI vision. Use when user asks to "generate an image", "create a PNG", "make an icon", "make it transparent", "describe this image", "analyze this image", "what's in this image", "explain this image", or needs AI-generated visual assets for the project. Supports model selection via keywords (gemini, riverflow, flux2, seedream, gpt5, gpt5.4), configurable aspect ratios/resolutions, transparent backgrounds (-t), reference image editing (-r), image analysis (--analyze), and per-project cost tracking (--costs).
allowed-tools: Bash, Read, Write
compatibility: Requires uv (Python runner) and network access. Environment variables for CF AI Gateway or direct API keys must be configured in shell profile (~/.zshrc on macOS, ~/.bashrc on Linux, or System Environment Variables on Windows).
metadata:
  tags: image-generation, ai, openrouter, cloudflare, gemini, flux2, riverflow, seedream, gpt5, gpt54
AI Image Creator
Generate PNG images via multiple AI models, routed through Cloudflare AI Gateway BYOK or directly via OpenRouter/Google AI Studio.
Model Selection
When the user mentions a model keyword in their image request, use the corresponding --model flag:
| Keyword | Model | Use When User Says |
|---|---|---|
| `gemini` | Google Gemini 3.1 Flash (default) | "gemini", "generate an image" (no model specified) |
| `riverflow` | Sourceful Riverflow v2 Pro | "riverflow", "use riverflow" |
| `flux2` | FLUX.2 Max | "flux2", "flux", "use flux" |
| `seedream` | ByteDance SeedDream 4.5 | "seedream", "use seedream" |
| `gpt5` | OpenAI GPT-5 Image | "gpt5", "gpt5 image", "use gpt5" |
| `gpt5.4` | OpenAI GPT-5.4 Image 2 | "gpt5.4", "gpt-5.4 image", "use gpt5.4" |
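The keyword table above could be applied with a small lookup like the one below. This is a minimal sketch, not the skill's actual logic: `pick_model`, `KEYWORD_TRIGGERS`, and the substring-matching rule are hypothetical. Only the keywords and trigger phrases come from the table.

```python
# Hypothetical helper: map a user's request text to a --model keyword.
# Order matters: "gpt5.4" is checked before "gpt5" because "gpt5" is a prefix of it.
KEYWORD_TRIGGERS = [
    ("gpt5.4", ("gpt5.4", "gpt-5.4")),
    ("gpt5", ("gpt5",)),
    ("riverflow", ("riverflow",)),
    ("flux2", ("flux",)),  # "flux", "flux2", and "use flux" all contain "flux"
    ("seedream", ("seedream",)),
    ("gemini", ("gemini",)),
]


def pick_model(request: str) -> str:
    """Return the model keyword named in the request, or the default."""
    text = request.lower()
    for keyword, triggers in KEYWORD_TRIGGERS:
        if any(trigger in text for trigger in triggers):
            return keyword
    return "gemini"  # default when no model is mentioned
```

Plain substring matching is an assumption made for brevity; a request that mentions a keyword in passing would need stricter matching.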
Instructions
Routing check: If the user asks to describe, analyze, or explain an existing image (not generate a new one), skip directly to the Image Analysis (--analyze) section below. No prompt enhancement or output path needed.
Step 1: Write Prompt
For long or complex prompts (recommended), write to ${CLAUDE_SKILL_DIR}/tmp/prompt.txt using the Write tool:
Write prompt text to ${CLAUDE_SKILL_DIR}/tmp/prompt.txt
For short prompts (under 200 chars, no special characters), pass inline via --prompt.
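The choice between the prompt file and `--prompt` can be sketched as a simple check. The 200-character limit comes from the rule above. The definition of "special characters" is an assumption (the source does not list them), and `can_pass_inline` is a hypothetical name.

```python
import re

INLINE_LIMIT = 200  # from the rule above: "under 200 chars"

# Assumption: anything outside letters, digits, spaces, and basic punctuation
# counts as a "special character". The source does not define the set.
SAFE_INLINE = re.compile(r"[A-Za-z0-9 .,:;!?()-]*")


def can_pass_inline(prompt: str) -> bool:
    """True if the prompt is short and plain enough for --prompt."""
    return len(prompt) < INLINE_LIMIT and SAFE_INLINE.fullmatch(prompt) is not None
```

When this returns False, writing the prompt to the file avoids shell quoting problems entirely.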
CRITICAL — Prompt Quality Tips:
- Be detailed and descriptive. Include style, colors, composition, background, and intended use.
- Good: "A flat-design globe icon with vertical timezone band lines in blue and teal, white background, clean vector style, suitable for a web app at 512x512 pixels"
- Bad: "globe icon"
- Specify "transparent background" or "white background" explicitly.
- For icons, mention the target size (e.g., "512x512", "favicon at 32x32").
- For photos, describe lighting, camera angle, and mood.
Step 1.5: Prompt Enhancement (Optional — Progressive Disclosure)
Professional prompt patterns are available in 3 reference files. These are not loaded by default — only read them when the user's request matches a category or they explicitly ask for enhancement.
Category Detection — Match the user's request to a category:
| If request mentions... | Category | Also read |
|---|---|---|
| "product shot", "product photo", "hero image" | `product_hero` | prompt-core.md + prompt-categories.md § product_hero |
| "lifestyle", "in-use", "in context" | `lifestyle` | prompt-core.md + prompt-categories.md § lifestyle |
| "instagram", "social media", "tiktok", "pinterest" | `social_media` | prompt-core.md + prompt-platforms.md + prompt-categories.md § social_media |
| "banner", "ad", "email header" | `marketing_banner` | prompt-core.md + prompt-platforms.md + prompt-categories.md § marketing_banner. Routing hint: If user has an existing logo and wants multiple standard sizes → use composite mode instead (see ## Composite Banners). |
| "website", "app", "logo", "ad format", "leaderboard", "skyscraper" | `web_app` | prompt-core.md + prompt-platforms.md + prompt-categories.md § web_app. Routing hint: For "logo banners" or "OG images with my logo" where user has existing logo → use composite-banners.py. For "design me a new logo" → use generate-image.py. |
| "brand kit", "logo banners", "banner sizes", "IAB sizes", "consistent banners" + user has existing logo | `composite` | Read references/composite-reference.md, use composite-banners.py |
| "icon", "favicon", "app icon" | `icon_logo` | prompt-core.md + prompt-categories.md § icon_logo |
| "mascot", "character", "illustration", "artwork" | `illustration` | prompt-core.md + prompt-categories.md § illustration |
| "food", "drink", "recipe", "restaurant" | `food_drink` | prompt-core.md + prompt-categories.md § food_drink |
| "building", "interior", "room", "architecture" | `architecture` | prompt-core.md + prompt-categories.md § architecture |
| "chart", "infographic", "data", "diagram" | `infographic` | prompt-core.md + prompt-categories.md § infographic |
| "t-shirt", "mug design", "poster", "POD", "print-on-demand" | `pod_design` | prompt-core.md + prompt-platforms.md + prompt-categories.md § pod_design |
| "describe", "analyze", "what's in this image", "explain image" | `analyze` | Skip prompt enhancement; use --analyze mode directly. Read references/analyze-reference.md for advanced analysis patterns |
| No match / simple request | -- | Skip patterns, generate directly |
When to skip enhancement:
- User's prompt is already detailed (150+ words with camera/lighting/composition specifics)
- Simple/direct requests ("generate a blue circle on white background")
- User says "no pattern" or provides a fully formed prompt
When to apply:
- User says "use product_hero pattern" or "apply social_media pattern" (explicit)
- Request clearly matches a category above (auto-detect)
- User asks for "enhanced prompt" or "professional quality"
Reference files (in references/ directory):
- prompt-core.md: foundational rules (narrative prompting, camera/lens/lighting specs, text rendering rules, model recommendations)
- prompt-platforms.md: social media ratios, IAB ad sizes, web dimensions, POD specs, all mapped to -a/-s flags
- prompt-categories.md: 11 category formulas with templates and complete example prompts
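The "Also read" column of the category table can be expressed as a lookup that returns only the files a category needs. This is a sketch under stated assumptions: `files_to_read` and the set names are hypothetical, and the `composite` and `analyze` rows are left out because they route to other modes rather than to prompt patterns.

```python
# File names as listed in the references/ directory description.
CORE = "prompt-core.md"
PLATFORMS = "prompt-platforms.md"
CATEGORIES = "prompt-categories.md"

# Categories whose table row also lists prompt-platforms.md.
NEEDS_PLATFORMS = {"social_media", "marketing_banner", "web_app", "pod_design"}

ALL_CATEGORIES = NEEDS_PLATFORMS | {
    "product_hero", "lifestyle", "icon_logo", "illustration",
    "food_drink", "architecture", "infographic",
}


def files_to_read(category):
    """Return the reference files for a category, or [] to skip enhancement."""
    if category not in ALL_CATEGORIES:
        return []  # no match or simple request: generate directly
    files = [CORE]
    if category in NEEDS_PLATFORMS:
        files.append(PLATFORMS)
    files.append(CATEGORIES)
    return files
```

Returning an empty list for unmatched requests mirrors the progressive-disclosure rule: nothing is loaded unless a category applies.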
Step 2: Run Generation Script
uv run python ${CLAUDE_SKILL_DIR}/scripts/generate-image.py \
-o "OUTPUT_PATH" \
[--provider openrouter|google] \
[-a "16:9"] \
[-s "2K"] \
[-m "model-id"] \
[-r "ref-image.png"] \
[-t]
With a specific model:
uv run python ${CLAUDE_SKILL_DIR}/scripts/generate-image.py \
-o "OUTPUT_PATH" \
-m riverflow \
-p "A serene mountain lake at sunset"
With transparent background (requires ffmpeg + imagemagick):
uv run python ${CLAUDE_SKILL_DIR}/scripts/generate-image.py \
-o "mascot.png" \
-t \
-p "A friendly robot mascot character"
With reference image for editing/style transfer (multimodal models only):
uv run python ${CLAUDE_SKILL_DIR}/scripts/generate-image.py \
-o "edited.png" \
-r "original.png" \
-p "Change the background to a sunset scene"
Or with inline prompt (default model):
uv run python ${CLAUDE_SKILL_DIR}/scripts/generate-image.py \
-o "OUTPUT_PATH" \
-p "A simple blue circle on white background"
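When the command is launched from Python rather than typed into a shell, building it as an argument list sidesteps quoting issues in prompts. The sketch below is illustrative only: `build_generate_command` is a hypothetical helper, and it assumes `CLAUDE_SKILL_DIR` is set in the environment. The flags themselves are the ones shown in the commands above.

```python
import os
from pathlib import Path


def build_generate_command(output, prompt=None, model=None, aspect_ratio=None,
                           image_size=None, refs=(), transparent=False):
    """Assemble the argv list for generate-image.py (hypothetical helper)."""
    # Assumption: CLAUDE_SKILL_DIR points at the skill's root directory.
    skill_dir = Path(os.environ.get("CLAUDE_SKILL_DIR", "."))
    cmd = ["uv", "run", "python",
           str(skill_dir / "scripts" / "generate-image.py"),
           "-o", output]
    if model:
        cmd += ["-m", model]
    if aspect_ratio:
        cmd += ["-a", aspect_ratio]
    if image_size:
        cmd += ["-s", image_size]
    for ref in refs:  # -r is repeatable
        cmd += ["-r", ref]
    if transparent:
        cmd.append("-t")
    if prompt is not None:  # omit -p to let the script read the prompt file
        cmd += ["-p", prompt]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`; because no shell is involved, quotes and dollar signs in the prompt reach the script unchanged.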
Step 3: Clean Up (if temp file used)
rm -f ${CLAUDE_SKILL_DIR}/tmp/prompt.txt
Step 4: Verify Output
file OUTPUT_PATH
Confirm it shows "PNG image data" and report the file path and size to the user.
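Where the `file` utility is unavailable, the same check can be approximated by reading the 8-byte PNG signature directly. This is a minimal sketch and `describe_png` is a hypothetical name. It confirms only the header, not that the rest of the file is a valid image.

```python
from pathlib import Path

# Fixed 8-byte signature that begins every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def describe_png(path):
    """Return (is_png, size_in_bytes) for the file at path."""
    p = Path(path)
    with p.open("rb") as fh:
        header = fh.read(len(PNG_SIGNATURE))
    return header == PNG_SIGNATURE, p.stat().st_size
```

The returned size can be reported to the user alongside the file path.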
Step 5: Post-Processing (optional)
If the user needs resizing, format conversion, or other manipulation, first detect available image tools, then use them. See Image Tools section below.
Parameters
| Argument | Short | Required | Default | Description |
|---|---|---|---|---|
| `--output` | `-o` | Yes | -- | Output file path (parent dirs auto-created) |
| `--prompt` | `-p` | No | -- | Inline prompt text |
| `--prompt-file` | -- | No | `../tmp/prompt.txt` | Path to prompt file |
| `--provider` | -- | No | `openrouter` | openrouter or google |
| `--aspect-ratio` | `-a` | No | model default | OpenRouter only: 1:1, 16:9, 9:16, 3:2, 2:3, 4:3, 3:4, 4:5, 5:4, 21:9 |
| `--image-size` | `-s` | No | model default | OpenRouter only: 0.5K, 1K, 2K, 4K |
| `--model` | `-m` | No | `gemini` | Model keyword (gemini, riverflow, flux2, seedream, gpt5, gpt5.4) or full model ID |
| `--ref` | `-r` | No | -- | Reference image file (repeatable). For editing/style transfer. Multimodal models only (gemini, gpt5) |
| `--analyze` | -- | No | -- | Analyze/describe a reference image (text-only output, no image generated). Requires -r. Multimodal models only |
| `--transparent` | `-t` | No | -- | Generate with transparent background. Requires ffmpeg + imagemagick |
| `--costs` | -- | No | -- | Display generation/cost history for this project and exit |
| `--list-models` | -- | No | -- | List available model keywords and exit |
Environment Variables
| Variable | Required For | Description |
|---|---|---|
| `AI_IMG_CREATOR_CF_ACCOUNT_ID` | Gateway mode | Cloudflare account ID |
| `AI_IMG_CREATOR_CF_GATEWAY_ID` | Gateway mode | AI Gateway name |
| `AI_IMG_CREATOR_CF_TOKEN` | Gateway mode | Gateway auth token |
| `AI_IMG_CREATOR_OPENROUTER_KEY` | Direct OpenRouter | OpenRouter API key (sk-or-...) |
| `AI_IMG_CREATOR_GEMINI_KEY` | Direct Google | Google AI Studio API key |
Gateway mode activates when all three AI_IMG_CREATOR_CF_* variables are set. If the gateway request fails, the script falls back to direct mode.
For first-time setup, see references/setup-guide.md.
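The activation rule for gateway mode can be sketched as a check over the environment. This is not the script's actual code: `detect_mode` is a hypothetical name, the `"unconfigured"` label is invented for illustration, and treating empty strings as unset is an assumption. The runtime fallback from gateway to direct mode is not modeled.

```python
import os

CF_VARS = (
    "AI_IMG_CREATOR_CF_ACCOUNT_ID",
    "AI_IMG_CREATOR_CF_GATEWAY_ID",
    "AI_IMG_CREATOR_CF_TOKEN",
)
DIRECT_VARS = ("AI_IMG_CREATOR_OPENROUTER_KEY", "AI_IMG_CREATOR_GEMINI_KEY")


def detect_mode(env=None):
    """Return "gateway", "direct", or "unconfigured" for the given environment."""
    env = os.environ if env is None else env
    if all(env.get(name) for name in CF_VARS):  # all three must be non-empty
        return "gateway"
    if any(env.get(name) for name in DIRECT_VARS):
        return "direct"
    return "unconfigured"  # hypothetical label, not a value the script emits
```

Running `detect_mode()` before generation gives an early hint about which credentials the script will find.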
Transparent Mode (-t)
Generates images with transparent backgrounds using a 3-step pipeline:
- Green screen generation — Prompt is augmented to place subject on solid #00FF00 green
- FFmpeg chroma key — Removes green background + green fringe from edges
- ImageMagick auto-crop — Trims transparent padding
Requirements: ffmpeg and ImageMagick. Install with brew install ffmpeg imagemagick (macOS) or apt install ffmpeg imagemagick (Debian/Ubuntu).
Use cases: Game sprites, icons, logos, mascots, marketing assets with transparency.
uv run python ${CLAUDE_SKILL_DIR}/scripts/generate-image.py \
-o "sprite.png" -t -p "A pixel art treasure chest"
Reference Images (-r)
Send existing images alongside text prompts for editing, style transfer, or guided generation. Supports multiple references. Multimodal models only (gemini, gpt5) — image-only models (riverflow, flux2, seedream) will error.
# Edit an existing image
uv run python ${CLAUDE_SKILL_DIR}/scripts/generate-image.py \
-o "edited.png" -r "photo.png" -p "Make the background white"
# Style transfer with multiple references
uv run python ${CLAUDE_SKILL_DIR}/scripts/generate-image.py \
-o "combined.png" -r "style1.png" -r "content.png" -p "Apply the style of the first image to the second"
Supported formats: PNG, JPEG, WebP, GIF.
Image Analysis (--analyze)
Describe, analyze, or explain existing images using multimodal AI vision. Returns text-only output (no image generated). Multimodal models only (gemini, gpt5).
No -o output path needed. No prompt enhancement needed. The script outputs JSON to stdout with the model's analysis in the analysis field.
# Analyze with default prompt (describes subject, style, colors, composition, mood, text)
uv run python ${CLAUDE_SKILL_DIR}/scripts/generate-image.py \
--analyze -r "photo.png"
# Analyze with custom prompt
uv run python ${CLAUDE_SKILL_DIR}/scripts/generate-image.py \
--analyze -r "photo.png" -p "Describe this image in plain text and also in JSON structured output"
# Analyze with a specific model
uv run python ${CLAUDE_SKILL_DIR}/scripts/generate-image.py \
--analyze -r "photo.png" -m gpt5 -p "What text is visible in this image?"
# Analyze multiple images together
uv run python ${CLAUDE_SKILL_DIR}/scripts/generate-image.py \
--analyze -r "before.png" -r "after.png" -p "Compare these two images and describe the differences"
JSON output format:
{"ok": true, "analyze": true, "analysis": "<model text>", "provider": "openrouter", "model": "...", "mode": "gateway", "elapsed_seconds": 3.2, "ref_images": 1}
Incompatible flags: --analyze cannot be combined with -o, -t, -a, or -s.
For advanced analysis prompt patterns (structured output, comparison, targeted analysis), read references/analyze-reference.md.
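A caller might validate flags before running the script and then pull the text out of the JSON result. The sketch below uses only the fields shown in the output format above. `check_analyze_flags` and `extract_analysis` are hypothetical helpers, only short flag forms are checked, and treating `"ok": false` as an error is an assumption because the source does not show a failure payload.

```python
import json

# Short flags that the section above lists as incompatible with --analyze.
INCOMPATIBLE_WITH_ANALYZE = {"-o", "-t", "-a", "-s"}


def check_analyze_flags(argv):
    """Return the flags in argv that --analyze does not allow, sorted."""
    if "--analyze" not in argv:
        return []
    return sorted(INCOMPATIBLE_WITH_ANALYZE.intersection(argv))


def extract_analysis(stdout_text):
    """Return the model's analysis text from the script's JSON output."""
    result = json.loads(stdout_text)
    if not result.get("ok"):  # assumption: a falsy "ok" signals failure
        raise RuntimeError("analysis did not succeed")
    return result["analysis"]
```

Checking flags up front saves a round trip when `-o` or `-t` is passed alongside `--analyze` out of habit.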
Cost Tracking (--costs)
Every generation is logged to .ai-image-creator/costs.json in your project directory. View history:
uv run python ${CLAUDE_SKILL_DIR}/scripts/generate-image.py --costs
Shows per-model breakdown: generation count, total tokens, elapsed time, and recent entries. Security: Only non-sensitive data is logged (model, tokens, timing, file path). No API keys or credentials are ever stored.
Consider adding .ai-image-creator/ to your .gitignore.
Composite Banners
Generate consistent logo banners across multiple sizes from a JSON config. Uses ImageMagick for offline compositing — no API calls, no network required. Composites an existing logo/mark onto branded backgrounds with text at standard dimensions.
Composite vs. AI Generation — Decision Rule
Use composite-banners.py when ALL of these are true:
- User has an existing logo/mark they want to use as-is (provides or references a logo file)
- User wants consistent branding across multiple standard sizes (not one creative image)
- The output is logo + text on a solid/gradient background (not a photograph, illustration, or creative design)
Use generate-image.py (AI generation) when ANY of these are true:
- User wants a creative/artistic banner design (describes a scene, mood, concept, or style)
- User wants AI to design the visual content (product shots, illustrations, creative layouts)
- User wants a single banner with artistic content, not a multi-size brand kit
When composite mode applies, read references/composite-reference.md for full config schema, preset dimensions, and font handling details.
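The decision rule reduces to a single conjunction: composite mode needs all three conditions, and anything else goes to AI generation. A minimal sketch, with `choose_script` and its parameter names as hypothetical labels for the three conditions listed above:

```python
def choose_script(has_existing_logo, wants_multiple_sizes, plain_background):
    """Pick the script per the decision rule: composite only if ALL conditions hold."""
    if has_existing_logo and wants_multiple_sizes and plain_background:
        return "composite-banners.py"
    # Any creative, artistic, or single-image request goes to AI generation.
    return "generate-image.py"
```

For example, a user with a logo who wants one artistic banner fails the multi-size condition, so the sketch returns `generate-image.py`.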
Quick Start
- Init config: uv run python ${CLAUDE_SKILL_DIR}/scripts/composite-banners.py --init
- Edit banner-config.json: set logo path, brand text, colors, banner sizes
- Validate: uv run python ${CLAUDE_SKILL_DIR}/scripts/composite-banners.py --validate
- Generate: uv run python ${CLAUDE_SKILL_DIR}/scripts/composite-banners.py -c banner-config.json -o ./banners/
Composite Parameters
| Argument | Short | Default | Description |
|---|---|---|---|
| `--config` | `-c` | `banner-config.json` | Config JSON path |
| `--output-dir` | `-o` | `.` | Output directory |
| `--name` | `-n` | all | Generate single banner by name |
| `--format` | `-f` | `png` | png, webp, jpeg |
| `--list-presets` | -- | -- | List IAB/social/web size presets |
| `--init` | -- | -- | Generate starter config |
| `--validate` | -- | -- | Check config, exit 0 or 2 |
| `--dry-run` | -- | -- | Preview without rendering |
| `--json` | -- | -- | Structured JSON to stdout |
| `--verbose` | `-v` | -- | Verbose output |
Requirements: ImageMagick 7 (brew install imagemagick or apt install imagemagick).
Workflow Hints
Starting composite mode:
- Ask user for: logo file path, brand name, tagline text, brand colors (hex)
- If user doesn't have a logo yet → use generate-image.py to create one first
- Run --init to scaffold config, then help user fill in their brand values
During generation:
- Always run --validate before generating to catch font/logo issues early
- Use --name to iterate on one banner before generating the full set
- Show user 3-4 representative sizes (hero, OG, square, leaderboard) for approval
After generation:
- If user wants creative/artistic redesign of banner visuals → switch to generate-image.py (composite only does logo + text on gradient/solid backgrounds)
- If banners look too plain → suggest AI-generating a textured or photographic background first, then compositing the logo onto it
Combined workflow (most powerful):
- Use generate-image.py to AI-create a hero background or textured pattern
- Use composite-banners.py to overlay the logo + text onto that background at all standard sizes

This gives both creative AI visuals AND pixel-perfect logo consistency.
Image Tools
On first invocation, detect available image manipulation tools:
which magick convert sips ffmpeg 2>/dev/null
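The same detection can be done from Python with the standard library's `shutil.which`, which returns the executable's path or `None`. This is a minimal sketch; `detect_image_tools` and `preferred_imagemagick` are hypothetical helpers, and the tool names are the ones from the command above.

```python
import shutil


def detect_image_tools(candidates=("magick", "convert", "sips", "ffmpeg")):
    """Map each tool name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in candidates}


def preferred_imagemagick(tools):
    """Prefer ImageMagick 7 (magick) over ImageMagick 6 (convert)."""
    for name in ("magick", "convert"):
        if tools.get(name):
            return name
    return None  # neither found: ImageMagick needs to be installed
```

Caching the result of `detect_image_tools()` once per session avoids repeating the lookup before every post-processing step.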
Available Tools
| Tool | Check | Key Operations |
|---|---|---|
| ImageMagick 7 (`magick`) | `magick --version` | Resize, crop, convert, composite |
| ImageMagick 6 (`convert`) | `convert --version` | Same ops, legacy command name |
| sips (macOS) | `sips --help` | Resize, format conversion |
| ffmpeg | `ffmpeg -version` | Convert formats, resize |
Common Post-Processing
# Resize
magick output.png -resize 512x512 icon-512.png
# Multiple sizes (icons)
for s in 16 32 48 64 128 256 512; do magick output.png -resize ${s}x${s} icon-${s}.png; done
# Convert to WebP
magick output.png output.webp
# Maskable icon (add safe-zone padding)
magick output.png -gravity center -extent 120%x120% maskable.png
# macOS sips resize
sips --resampleWidth 512 --resampleHeight 512 output.png --out icon-512.png
CRITICAL: Check tool availability before using. Prefer magick (IM7) over convert (IM6). If no tools are found, tell the user to install ImageMagick: brew install imagemagick (macOS) or apt install imagemagick (Linux).
Common Issues
"No API credentials configured"
Cause: Environment variables not set or not exported.
Fix: Add exports to your shell profile (~/.zshrc on macOS, ~/.bashrc on Linux), then reload it (for example, source ~/.zshrc). See references/setup-guide.md.
"HTTP 401: Unauthorized"
Cause: Invalid or expired API key/token.
Fix: Check AI_IMG_CREATOR_CF_TOKEN (gateway) or AI_IMG_CREATOR_OPENROUTER_KEY (direct). Regenerate if needed.
"No images in response"
Cause: Model returned text only (safety filter, unclear prompt, or unsupported request).
Fix: Make the prompt more specific and descriptive. Avoid prohibited content.
"Connection error" / timeout
Cause: Network issue or image generation taking too long (120s timeout).
Fix: Retry. If persistent, try --provider google as alternative. Check CF gateway status.
Detailed API Reference
For full API formats, response schemas, BYOK configuration, and curl examples: see references/api-reference.md
For first-time setup instructions: see references/setup-guide.md