ai-image-creator

name: ai-image-creator
description: Generate PNG images using AI (multiple models via OpenRouter including Gemini, FLUX.2, Riverflow, SeedDream, GPT-5 Image, GPT-5.4 Image 2, proxied through Cloudflare AI Gateway BYOK). Also analyze/describe existing images using multimodal AI vision. Use when user asks to "generate an image", "create a PNG", "make an icon", "make it transparent", "describe this image", "analyze this image", "what's in this image", "explain this image", or needs AI-generated visual assets for the project. Supports model selection via keywords (gemini, riverflow, flux2, seedream, gpt5, gpt5.4), configurable aspect ratios/resolutions, transparent backgrounds (-t), reference image editing (-r), image analysis (--analyze), and per-project cost tracking (--costs).
allowed-tools: Bash, Read, Write
compatibility: Requires uv (Python runner) and network access. Environment variables for CF AI Gateway or direct API keys must be configured in shell profile (~/.zshrc on macOS, ~/.bashrc on Linux, or System Environment Variables on Windows).
metadata:
  tags: image-generation, ai, openrouter, cloudflare, gemini, flux2, riverflow, seedream, gpt5, gpt54

AI Image Creator

Generate PNG images via multiple AI models, routed through Cloudflare AI Gateway BYOK or directly via OpenRouter/Google AI Studio.

Model Selection

When the user mentions a model keyword in their image request, use the corresponding --model flag:

Keyword	Model	Use When User Says
`gemini`	Google Gemini 3.1 Flash (default)	"gemini", "generate an image" (no model specified)
`riverflow`	Sourceful Riverflow v2 Pro	"riverflow", "use riverflow"
`flux2`	FLUX.2 Max	"flux2", "flux", "use flux"
`seedream`	ByteDance SeedDream 4.5	"seedream", "use seedream"
`gpt5`	OpenAI GPT-5 Image	"gpt5", "gpt5 image", "use gpt5"
`gpt5.4`	OpenAI GPT-5.4 Image 2	"gpt5.4", "gpt-5.4 image", "use gpt5.4"
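
The keyword lookup in the table can be sketched as a small function. This is an illustrative sketch only, not the skill's actual logic: `pick_model_keyword` and `MODEL_TRIGGERS` are hypothetical names, and plain substring matching is an assumption (it will also fire on words such as "influx").

```python
# Trigger phrases taken from the table above; the returned keyword is what
# would be passed to --model. Order matters: "gpt5.4" must be tested before
# "gpt5" because the shorter trigger is a prefix of the longer one.
MODEL_TRIGGERS = [
    ("gpt5.4", ("gpt5.4", "gpt-5.4")),
    ("gpt5", ("gpt5",)),
    ("riverflow", ("riverflow",)),
    ("flux2", ("flux",)),  # covers "flux2", "flux", "use flux"
    ("seedream", ("seedream",)),
    ("gemini", ("gemini",)),
]


def pick_model_keyword(request: str) -> str:
    """Return the model keyword for a request, defaulting to gemini."""
    text = request.lower()
    for keyword, triggers in MODEL_TRIGGERS:
        if any(trigger in text for trigger in triggers):
            return keyword
    return "gemini"  # no model mentioned: use the default
```

For example, `pick_model_keyword("use flux for a logo")` yields `flux2`, which would then be passed as `-m flux2`.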

Instructions

Routing check: If the user asks to describe, analyze, or explain an existing image (not generate a new one), skip directly to the Image Analysis (--analyze) section below. No prompt enhancement or output path needed.

Step 1: Write Prompt

For long or complex prompts (recommended), write to ${CLAUDE_SKILL_DIR}/tmp/prompt.txt using the Write tool:

Write prompt text to ${CLAUDE_SKILL_DIR}/tmp/prompt.txt

For short prompts (under 200 chars, no special characters), pass inline via --prompt.
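
The inline-versus-file choice above could be expressed as follows. This is a minimal sketch: the 200-character threshold comes from the text, but the definition of "special characters" is not given, so the allow-list in `SAFE_INLINE` is an assumption, and `use_inline_prompt` is a hypothetical helper name.

```python
import re

INLINE_MAX_CHARS = 200  # "under 200 chars" from the rule above

# Assumption: anything outside letters, digits, spaces and basic punctuation
# counts as a "special character" and should go through the prompt file.
SAFE_INLINE = re.compile(r"[A-Za-z0-9 .,:;!?()-]*")


def use_inline_prompt(prompt: str) -> bool:
    """True if the prompt is short and plain enough to pass via --prompt."""
    return len(prompt) < INLINE_MAX_CHARS and SAFE_INLINE.fullmatch(prompt) is not None
```

Prompts containing quotes, `$`, backticks, or newlines fall through to the prompt file, which avoids shell-quoting problems.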

CRITICAL — Prompt Quality Tips:

Be detailed and descriptive. Include style, colors, composition, background, and intended use.
Good: "A flat-design globe icon with vertical timezone band lines in blue and teal, white background, clean vector style, suitable for a web app at 512x512 pixels"
Bad: "globe icon"
Specify "transparent background" or "white background" explicitly.
For icons, mention the target size (e.g., "512x512", "favicon at 32x32").
For photos, describe lighting, camera angle, and mood.

Step 1.5: Prompt Enhancement (Optional — Progressive Disclosure)

Professional prompt patterns are available in 3 reference files. These are not loaded by default — only read them when the user's request matches a category or they explicitly ask for enhancement.

Category Detection — Match the user's request to a category:

If request mentions...	Category	Also read
"product shot", "product photo", "hero image"	`product_hero`	`prompt-core.md` + `prompt-categories.md` § product_hero
"lifestyle", "in-use", "in context"	`lifestyle`	`prompt-core.md` + `prompt-categories.md` § lifestyle
"instagram", "social media", "tiktok", "pinterest"	`social_media`	`prompt-core.md` + `prompt-platforms.md` + `prompt-categories.md` § social_media
"banner", "ad", "email header"	`marketing_banner`	`prompt-core.md` + `prompt-platforms.md` + `prompt-categories.md` § marketing_banner. Routing hint: If user has an existing logo and wants multiple standard sizes → use composite mode instead (see `## Composite Banners`).
"website", "app", "logo", "ad format", "leaderboard", "skyscraper"	`web_app`	`prompt-core.md` + `prompt-platforms.md` + `prompt-categories.md` § web_app. Routing hint: For "logo banners" or "OG images with my logo" where user has existing logo → use `composite-banners.py`. For "design me a new logo" → use `generate-image.py`.
"brand kit", "logo banners", "banner sizes", "IAB sizes", "consistent banners" + user has existing logo	`composite`	Read `references/composite-reference.md`, use `composite-banners.py`
"icon", "favicon", "app icon"	`icon_logo`	`prompt-core.md` + `prompt-categories.md` § icon_logo
"mascot", "character", "illustration", "artwork"	`illustration`	`prompt-core.md` + `prompt-categories.md` § illustration
"food", "drink", "recipe", "restaurant"	`food_drink`	`prompt-core.md` + `prompt-categories.md` § food_drink
"building", "interior", "room", "architecture"	`architecture`	`prompt-core.md` + `prompt-categories.md` § architecture
"chart", "infographic", "data", "diagram"	`infographic`	`prompt-core.md` + `prompt-categories.md` § infographic
"t-shirt", "mug design", "poster", "POD", "print-on-demand"	`pod_design`	`prompt-core.md` + `prompt-platforms.md` + `prompt-categories.md` § pod_design
"describe", "analyze", "what's in this image", "explain image"	`analyze`	Skip prompt enhancement — use `--analyze` mode directly. Read `references/analyze-reference.md` for advanced analysis patterns
No match / simple request	—	Skip patterns, generate directly
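
One possible way to implement the detection table is whole-word phrase matching, sketched below. The phrases are copied from the table; everything else is an assumption, including the names `CATEGORY_PHRASES` and `detect_category` and the tie-break of preferring the longest matching phrase (so "app icon" wins over "app"). The `composite` row is left out because it also depends on whether the user already has a logo file.

```python
import re
from typing import Optional

CATEGORY_PHRASES = {
    "product_hero": ["product shot", "product photo", "hero image"],
    "lifestyle": ["lifestyle", "in-use", "in context"],
    "social_media": ["instagram", "social media", "tiktok", "pinterest"],
    "marketing_banner": ["banner", "ad", "email header"],
    "web_app": ["website", "app", "logo", "ad format", "leaderboard", "skyscraper"],
    "icon_logo": ["icon", "favicon", "app icon"],
    "illustration": ["mascot", "character", "illustration", "artwork"],
    "food_drink": ["food", "drink", "recipe", "restaurant"],
    "architecture": ["building", "interior", "room", "architecture"],
    "infographic": ["chart", "infographic", "data", "diagram"],
    "pod_design": ["t-shirt", "mug design", "poster", "POD", "print-on-demand"],
    "analyze": ["describe", "analyze", "what's in this image", "explain image"],
}


def detect_category(request: str) -> Optional[str]:
    """Return the category of the longest matching phrase, or None."""
    text = request.lower()
    best = None  # (phrase length, category)
    for category, phrases in CATEGORY_PHRASES.items():
        for phrase in phrases:
            # Whole-word match so that "ad" does not fire inside "shadow".
            pattern = r"(?<!\w)" + re.escape(phrase.lower()) + r"(?!\w)"
            if re.search(pattern, text) and (best is None or len(phrase) > best[0]):
                best = (len(phrase), category)
    return best[1] if best else None
```

A `None` result corresponds to the last table row: skip the patterns and generate directly.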

When to skip enhancement:

User's prompt is already detailed (150+ words with camera/lighting/composition specifics)
Simple/direct requests ("generate a blue circle on white background")
User says "no pattern" or provides a fully formed prompt

When to apply:

User says "use product_hero pattern" or "apply social_media pattern" (explicit)
Request clearly matches a category above (auto-detect)
User asks for "enhanced prompt" or "professional quality"

Reference files (in references/ directory):

prompt-core.md — Foundational rules: narrative prompting, camera/lens/lighting specs, text rendering rules, model recommendations
prompt-platforms.md — Social media ratios, IAB ad sizes, web dimensions, POD specs — all mapped to -a/-s flags
prompt-categories.md — 11 category formulas with templates and complete example prompts

Step 2: Run Generation Script

uv run python ${CLAUDE_SKILL_DIR}/scripts/generate-image.py \
  -o "OUTPUT_PATH" \
  [--provider openrouter|google] \
  [-a "16:9"] \
  [-s "2K"] \
  [-m "model-id"] \
  [-r "ref-image.png"] \
  [-t]

With a specific model:

uv run python ${CLAUDE_SKILL_DIR}/scripts/generate-image.py \
  -o "OUTPUT_PATH" \
  -m riverflow \
  -p "A serene mountain lake at sunset"

With transparent background (requires ffmpeg + imagemagick):

uv run python ${CLAUDE_SKILL_DIR}/scripts/generate-image.py \
  -o "mascot.png" \
  -t \
  -p "A friendly robot mascot character"

With reference image for editing/style transfer (multimodal models only):

uv run python ${CLAUDE_SKILL_DIR}/scripts/generate-image.py \
  -o "edited.png" \
  -r "original.png" \
  -p "Change the background to a sunset scene"

Or with inline prompt (default model):

uv run python ${CLAUDE_SKILL_DIR}/scripts/generate-image.py \
  -o "OUTPUT_PATH" \
  -p "A simple blue circle on white background"

Step 3: Clean Up (if temp file used)

rm -f ${CLAUDE_SKILL_DIR}/tmp/prompt.txt

Step 4: Verify Output

file OUTPUT_PATH

Confirm it shows "PNG image data" and report the file path and size to the user.
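
Where the `file` command is not available, the same check can be approximated by reading the 8-byte PNG signature. This is a sketch only; `check_png` is a hypothetical helper and is not part of the skill's scripts.

```python
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def check_png(path: str) -> int:
    """Return the file size in bytes, or raise ValueError if not a PNG."""
    png_path = Path(path)
    with png_path.open("rb") as fh:
        if fh.read(8) != PNG_SIGNATURE:
            raise ValueError(f"{path} does not start with the PNG signature")
    return png_path.stat().st_size
```

The returned size can be reported to the user together with the file path.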

Step 5: Post-Processing (optional)

If the user needs resizing, format conversion, or other manipulation, first detect available image tools, then use them. See Image Tools section below.

Parameters

Argument	Short	Required	Default	Description
`--output`	`-o`	Yes	--	Output file path (parent dirs auto-created)
`--prompt`	`-p`	No	--	Inline prompt text
`--prompt-file`	--	No	`../tmp/prompt.txt`	Path to prompt file
`--provider`	--	No	`openrouter`	`openrouter` or `google`
`--aspect-ratio`	`-a`	No	model default	OpenRouter only: `1:1`, `16:9`, `9:16`, `3:2`, `2:3`, `4:3`, `3:4`, `4:5`, `5:4`, `21:9`
`--image-size`	`-s`	No	model default	OpenRouter only: `0.5K`, `1K`, `2K`, `4K`
`--model`	`-m`	No	`gemini`	Model keyword (`gemini`, `riverflow`, `flux2`, `seedream`, `gpt5`, `gpt5.4`) or full model ID
`--ref`	`-r`	No	--	Reference image file (repeatable). For editing/style transfer. Multimodal models only (gemini, gpt5)
`--analyze`	--	No	--	Analyze/describe a reference image (text-only output, no image generated). Requires `-r`. Multimodal models only
`--transparent`	`-t`	No	--	Generate with transparent background. Requires ffmpeg + imagemagick
`--costs`	--	No	--	Display generation/cost history for this project and exit
`--list-models`	--	No	--	List available model keywords and exit

Environment Variables

Variable	Required For	Description
`AI_IMG_CREATOR_CF_ACCOUNT_ID`	Gateway mode	Cloudflare account ID
`AI_IMG_CREATOR_CF_GATEWAY_ID`	Gateway mode	AI Gateway name
`AI_IMG_CREATOR_CF_TOKEN`	Gateway mode	Gateway auth token
`AI_IMG_CREATOR_OPENROUTER_KEY`	Direct OpenRouter	OpenRouter API key (`sk-or-...`)
`AI_IMG_CREATOR_GEMINI_KEY`	Direct Google	Google AI Studio API key

Gateway mode activates when all three AI_IMG_CREATOR_CF_* variables are set. If a gateway request fails, the script falls back to direct mode.
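
The activation rule can be sketched as below. The variable names come from the table above; `select_mode` is a hypothetical function, and treating an empty value the same as an unset one is an assumption about the script's behavior.

```python
import os
from typing import Mapping

CF_VARS = (
    "AI_IMG_CREATOR_CF_ACCOUNT_ID",
    "AI_IMG_CREATOR_CF_GATEWAY_ID",
    "AI_IMG_CREATOR_CF_TOKEN",
)


def select_mode(env: Mapping[str, str] = os.environ) -> str:
    """Return "gateway" when all three CF variables are non-empty, else "direct"."""
    return "gateway" if all(env.get(name) for name in CF_VARS) else "direct"
```

Passing the environment as a parameter keeps the check easy to test without touching the real shell profile.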

For first-time setup, see references/setup-guide.md.

Transparent Mode (`-t`)

Generates images with transparent backgrounds using a 3-step pipeline:

Green screen generation — Prompt is augmented to place subject on solid #00FF00 green
FFmpeg chroma key — Removes green background + green fringe from edges
ImageMagick auto-crop — Trims transparent padding
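
To make the three steps concrete, here is a rough sketch of the commands such a pipeline might run. It only builds argument lists and does not execute anything. The exact filters, thresholds, and prompt wording used by `generate-image.py` are not documented here, so the `chromakey`/`despill` settings, the prompt suffix, and all function names below are assumptions.

```python
from typing import List

GREEN = "#00FF00"  # key color named in the pipeline description


def augment_prompt(prompt: str) -> str:
    # Step 1: ask the model for a flat green backdrop (wording is an assumption).
    return f"{prompt}, isolated on a solid {GREEN} green background"


def chroma_key_cmd(src: str, dst: str) -> List[str]:
    # Step 2: ffmpeg's chromakey filter makes the green transparent and
    # despill tones down green fringing. Similarity/blend values are guesses.
    filters = "chromakey=0x00FF00:0.15:0.05,despill=type=green,format=rgba"
    return ["ffmpeg", "-y", "-i", src, "-vf", filters, "-frames:v", "1", dst]


def auto_crop_cmd(src: str, dst: str) -> List[str]:
    # Step 3: -trim removes the uniform transparent border, +repage resets
    # the canvas offset left behind by the trim.
    return ["magick", src, "-trim", "+repage", dst]
```

Each list could be passed to `subprocess.run(cmd, check=True)` once the tools are installed.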

Requirements: ffmpeg and ImageMagick, e.g. brew install ffmpeg imagemagick (macOS) or apt install ffmpeg imagemagick (Debian/Ubuntu).

Use cases: Game sprites, icons, logos, mascots, marketing assets with transparency.

uv run python ${CLAUDE_SKILL_DIR}/scripts/generate-image.py \
  -o "sprite.png" -t -p "A pixel art treasure chest"

Reference Images (`-r`)

Send existing images alongside text prompts for editing, style transfer, or guided generation. Supports multiple references. Multimodal models only (gemini, gpt5) — image-only models (riverflow, flux2, seedream) will error.

# Edit an existing image
uv run python ${CLAUDE_SKILL_DIR}/scripts/generate-image.py \
  -o "edited.png" -r "photo.png" -p "Make the background white"

# Style transfer with multiple references
uv run python ${CLAUDE_SKILL_DIR}/scripts/generate-image.py \
  -o "combined.png" -r "style1.png" -r "content.png" -p "Apply the style of the first image to the second"

Supported formats: PNG, JPEG, WebP, GIF.

Image Analysis (`--analyze`)

Describe, analyze, or explain existing images using multimodal AI vision. Returns text-only output (no image generated). Multimodal models only (gemini, gpt5).

No -o output path needed. No prompt enhancement needed. The script outputs JSON to stdout with the model's analysis in the analysis field.

# Analyze with default prompt (describes subject, style, colors, composition, mood, text)
uv run python ${CLAUDE_SKILL_DIR}/scripts/generate-image.py \
  --analyze -r "photo.png"

# Analyze with custom prompt
uv run python ${CLAUDE_SKILL_DIR}/scripts/generate-image.py \
  --analyze -r "photo.png" -p "Describe this image in plain text and also in JSON structured output"

# Analyze with a specific model
uv run python ${CLAUDE_SKILL_DIR}/scripts/generate-image.py \
  --analyze -r "photo.png" -m gpt5 -p "What text is visible in this image?"

# Analyze multiple images together
uv run python ${CLAUDE_SKILL_DIR}/scripts/generate-image.py \
  --analyze -r "before.png" -r "after.png" -p "Compare these two images and describe the differences"

JSON output format:

{"ok": true, "analyze": true, "analysis": "<model text>", "provider": "openrouter", "model": "...", "mode": "gateway", "elapsed_seconds": 3.2, "ref_images": 1}
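
A caller might extract the analysis text from that JSON roughly as follows. The `ok` and `analysis` fields are taken from the sample above; how failures are reported is not documented here, so treating a falsy `ok` as an error is an assumption, and `read_analysis` is a hypothetical helper.

```python
import json


def read_analysis(stdout_text: str) -> str:
    """Parse the script's JSON output and return the model's analysis text."""
    result = json.loads(stdout_text)
    if not result.get("ok"):
        # Assumption: a failed run sets "ok" to false or omits it.
        raise RuntimeError(f"analysis failed: {result}")
    return result["analysis"]
```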

Incompatible flags: --analyze cannot be combined with -o, -t, -a, or -s.
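
Before invoking the script, the flag rules for analysis mode can be pre-checked with something like the sketch below. The flag names come from the Parameters table; `check_analyze_args` is a hypothetical helper, and it inspects tokens naively, so a prompt whose text is literally `-t` would be misread.

```python
from typing import List

# Short and long forms of the flags that --analyze does not accept.
ANALYZE_CONFLICTS = {
    "-o", "--output",
    "-t", "--transparent",
    "-a", "--aspect-ratio",
    "-s", "--image-size",
}


def check_analyze_args(argv: List[str]) -> List[str]:
    """Return a list of problems; an empty list means the combination is allowed."""
    if "--analyze" not in argv:
        return []
    problems = [
        f"{flag} cannot be combined with --analyze"
        for flag in argv
        if flag in ANALYZE_CONFLICTS
    ]
    if "-r" not in argv and "--ref" not in argv:
        problems.append("--analyze requires at least one -r reference image")
    return problems
```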

For advanced analysis prompt patterns (structured output, comparison, targeted analysis), read references/analyze-reference.md.

Cost Tracking (`--costs`)

Every generation is logged to .ai-image-creator/costs.json in your project directory. View history:

uv run python ${CLAUDE_SKILL_DIR}/scripts/generate-image.py --costs

Shows per-model breakdown: generation count, total tokens, elapsed time, and recent entries.

Security: Only non-sensitive data is logged (model, tokens, timing, file path). No API keys or credentials are ever stored.

Consider adding .ai-image-creator/ to your .gitignore.

Composite Banners

Generate consistent logo banners across multiple sizes from a JSON config. Uses ImageMagick for offline compositing — no API calls, no network required. Composites an existing logo/mark onto branded backgrounds with text at standard dimensions.

Composite vs. AI Generation — Decision Rule

Use composite-banners.py when ALL of these are true:

User has an existing logo/mark they want to use as-is (provides or references a logo file)
User wants consistent branding across multiple standard sizes (not one creative image)
The output is logo + text on a solid/gradient background (not a photograph, illustration, or creative design)

Use generate-image.py (AI generation) when ANY of these are true:

User wants a creative/artistic banner design (describes a scene, mood, concept, or style)
User wants AI to design the visual content (product shots, illustrations, creative layouts)
User wants a single banner with artistic content, not a multi-size brand kit
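
Read as code, the decision rule reduces to a single conjunction: composite mode needs all three conditions, and anything else goes to AI generation. The function and parameter names below are hypothetical and only illustrate the rule.

```python
def choose_script(
    has_existing_logo: bool,
    wants_multiple_sizes: bool,
    plain_background: bool,
) -> str:
    """Pick the script per the decision rule: composite only if ALL three hold."""
    if has_existing_logo and wants_multiple_sizes and plain_background:
        return "composite-banners.py"
    # Any creative, single-image, or AI-designed request lands here.
    return "generate-image.py"
```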

When composite mode applies, read references/composite-reference.md for full config schema, preset dimensions, and font handling details.

Quick Start

Init config: uv run python ${CLAUDE_SKILL_DIR}/scripts/composite-banners.py --init
Edit banner-config.json — set logo path, brand text, colors, banner sizes
Validate: uv run python ${CLAUDE_SKILL_DIR}/scripts/composite-banners.py --validate
Generate: uv run python ${CLAUDE_SKILL_DIR}/scripts/composite-banners.py -c banner-config.json -o ./banners/

Composite Parameters

Argument	Short	Default	Description
`--config`	`-c`	`banner-config.json`	Config JSON path
`--output-dir`	`-o`	`.`	Output directory
`--name`	`-n`	all	Generate single banner by name
`--format`	`-f`	`png`	`png`, `webp`, `jpeg`
`--list-presets`			List IAB/social/web size presets
`--init`			Generate starter config
`--validate`			Check config and exit with code 0 or 2
`--dry-run`			Preview without rendering
`--json`			Structured JSON to stdout
`--verbose`	`-v`		Verbose output

Requirements: ImageMagick 7 (brew install imagemagick or apt install imagemagick).

Workflow Hints

Starting composite mode:

Ask user for: logo file path, brand name, tagline text, brand colors (hex)
If user doesn't have a logo yet → use generate-image.py to create one first
Run --init to scaffold config, then help user fill in their brand values

During generation:

Always run --validate before generating to catch font/logo issues early
Use --name to iterate on one banner before generating the full set
Show user 3-4 representative sizes (hero, OG, square, leaderboard) for approval

After generation:

If user wants creative/artistic redesign of banner visuals → switch to generate-image.py (composite only does logo + text on gradient/solid backgrounds)
If banners look too plain → suggest AI-generating a textured or photographic background first, then compositing the logo onto it

Combined workflow (most powerful):

Use generate-image.py to AI-create a hero background or textured pattern
Use composite-banners.py to overlay the logo + text onto that background at all standard sizes

This gives both creative AI visuals AND pixel-perfect logo consistency.

Image Tools

On first invocation, detect available image manipulation tools:

which magick convert sips ffmpeg 2>/dev/null

Available Tools

Tool	Check	Key Operations
ImageMagick 7 (`magick`)	`magick --version`	Resize, crop, convert, composite
ImageMagick 6 (`convert`)	`convert --version`	Same ops, legacy command name
sips (macOS)	`sips --help`	Resize, format conversion
ffmpeg	`ffmpeg -version`	Convert formats, resize

Common Post-Processing

# Resize
magick output.png -resize 512x512 icon-512.png

# Multiple sizes (icons)
for s in 16 32 48 64 128 256 512; do magick output.png -resize ${s}x${s} icon-${s}.png; done

# Convert to WebP
magick output.png output.webp

# Maskable icon (add safe-zone padding)
magick output.png -gravity center -extent 120%x120% maskable.png

# macOS sips resize
sips --resampleWidth 512 --resampleHeight 512 output.png --out icon-512.png

CRITICAL: Check tool availability before using. Prefer magick (IM7) over convert (IM6). If no tools found, inform user: brew install imagemagick.
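
The availability check can also be done from Python with `shutil.which`, as in this sketch. The preference for `magick` over `convert` follows the note above; the position of `sips` and `ffmpeg` in the list is an assumption, and `pick_image_tool` is a hypothetical helper.

```python
import shutil
from typing import Callable, Optional

# magick (IM7) is preferred over convert (IM6); the rest of the order is a guess.
CANDIDATES = ("magick", "convert", "sips", "ffmpeg")


def pick_image_tool(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the first available tool name, or None if nothing is installed."""
    for name in CANDIDATES:
        if which(name):
            return name
    return None
```

A `None` result is the case where the user should be told to install ImageMagick.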

Common Issues

"No API credentials configured"

Cause: Environment variables not set or not exported. Fix: Add the exports to your shell profile (~/.zshrc on macOS, ~/.bashrc on Linux) and reload it, e.g. source ~/.zshrc. See references/setup-guide.md.

"HTTP 401: Unauthorized"

Cause: Invalid or expired API key/token. Fix: Check AI_IMG_CREATOR_CF_TOKEN (gateway) or AI_IMG_CREATOR_OPENROUTER_KEY (direct). Regenerate if needed.

"No images in response"

Cause: Model returned text only (safety filter, unclear prompt, or unsupported request). Fix: Make the prompt more specific and descriptive. Avoid prohibited content.

"Connection error" / timeout

Cause: Network issue or image generation taking too long (120s timeout). Fix: Retry. If persistent, try --provider google as alternative. Check CF gateway status.

Detailed API Reference

For full API formats, response schemas, BYOK configuration, and curl examples: see references/api-reference.md

For first-time setup instructions: see references/setup-guide.md