ai-image-creator

star 2.4k

Generate PNG images using AI (multiple models via OpenRouter including Gemini, FLUX.2, Riverflow, SeedDream, GPT-5 Image, GPT-5.4 Image 2, proxied through Cloudflare AI Gateway BYOK). Also analyze/describe existing images using multimodal AI vision. Use when user asks to "generate an image", "create a PNG", "make an icon", "make it transparent", "describe this image", "analyze this image", "what's in this image", "explain this image", or needs AI-generated visual assets for the project. Supports model selection via keywords (gemini, riverflow, flux2, seedream, gpt5, gpt5.4), configurable aspect ratios/resolutions, transparent backgrounds (-t), reference image editing (-r), image analysis (--analyze), and per-project cost tracking (--costs).

centminmod By centminmod schedule Updated 4/22/2026

name: ai-image-creator description: Generate PNG images using AI (multiple models via OpenRouter including Gemini, FLUX.2, Riverflow, SeedDream, GPT-5 Image, GPT-5.4 Image 2, proxied through Cloudflare AI Gateway BYOK). Also analyze/describe existing images using multimodal AI vision. Use when user asks to "generate an image", "create a PNG", "make an icon", "make it transparent", "describe this image", "analyze this image", "what's in this image", "explain this image", or needs AI-generated visual assets for the project. Supports model selection via keywords (gemini, riverflow, flux2, seedream, gpt5, gpt5.4), configurable aspect ratios/resolutions, transparent backgrounds (-t), reference image editing (-r), image analysis (--analyze), and per-project cost tracking (--costs). allowed-tools: Bash, Read, Write compatibility: Requires uv (Python runner) and network access. Environment variables for CF AI Gateway or direct API keys must be configured in shell profile (~/.zshrc on macOS, ~/.bashrc on Linux, or System Environment Variables on Windows). metadata: tags: image-generation, ai, openrouter, cloudflare, gemini, flux2, riverflow, seedream, gpt5, gpt54

AI Image Creator

Generate PNG images via multiple AI models, routed through Cloudflare AI Gateway BYOK or directly via OpenRouter/Google AI Studio.

Model Selection

When the user mentions a model keyword in their image request, use the corresponding --model flag:

Keyword Model Use When User Says
gemini Google Gemini 3.1 Flash (default) "gemini", "generate an image" (no model specified)
riverflow Sourceful Riverflow v2 Pro "riverflow", "use riverflow"
flux2 FLUX.2 Max "flux2", "flux", "use flux"
seedream ByteDance SeedDream 4.5 "seedream", "use seedream"
gpt5 OpenAI GPT-5 Image "gpt5", "gpt5 image", "use gpt5"
gpt5.4 OpenAI GPT-5.4 Image 2 "gpt5.4", "gpt-5.4 image", "use gpt5.4"

Instructions

Routing check: If the user asks to describe, analyze, or explain an existing image (not generate a new one), skip directly to the Image Analysis (--analyze) section below. No prompt enhancement or output path needed.

Step 1: Write Prompt

For long or complex prompts (recommended), write to ${CLAUDE_SKILL_DIR}/tmp/prompt.txt using the Write tool:

Write prompt text to ${CLAUDE_SKILL_DIR}/tmp/prompt.txt

For short prompts (under 200 chars, no special characters), pass inline via --prompt.

CRITICAL — Prompt Quality Tips:

  • Be detailed and descriptive. Include style, colors, composition, background, and intended use.
  • Good: "A flat-design globe icon with vertical timezone band lines in blue and teal, white background, clean vector style, suitable for a web app at 512x512 pixels"
  • Bad: "globe icon"
  • Specify "transparent background" or "white background" explicitly.
  • For icons, mention the target size (e.g., "512x512", "favicon at 32x32").
  • For photos, describe lighting, camera angle, and mood.

Step 1.5: Prompt Enhancement (Optional — Progressive Disclosure)

Professional prompt patterns are available in 3 reference files. These are not loaded by default — only read them when the user's request matches a category or they explicitly ask for enhancement.

Category Detection — Match the user's request to a category:

If request mentions... Category Also read
"product shot", "product photo", "hero image" product_hero prompt-core.md + prompt-categories.md § product_hero
"lifestyle", "in-use", "in context" lifestyle prompt-core.md + prompt-categories.md § lifestyle
"instagram", "social media", "tiktok", "pinterest" social_media prompt-core.md + prompt-platforms.md + prompt-categories.md § social_media
"banner", "ad", "email header" marketing_banner prompt-core.md + prompt-platforms.md + prompt-categories.md § marketing_banner. Routing hint: If user has an existing logo and wants multiple standard sizes → use composite mode instead (see ## Composite Banners).
"website", "app", "logo", "ad format", "leaderboard", "skyscraper" web_app prompt-core.md + prompt-platforms.md + prompt-categories.md § web_app. Routing hint: For "logo banners" or "OG images with my logo" where user has existing logo → use composite-banners.py. For "design me a new logo" → use generate-image.py.
"brand kit", "logo banners", "banner sizes", "IAB sizes", "consistent banners" + user has existing logo composite Read references/composite-reference.md, use composite-banners.py
"icon", "favicon", "app icon" icon_logo prompt-core.md + prompt-categories.md § icon_logo
"mascot", "character", "illustration", "artwork" illustration prompt-core.md + prompt-categories.md § illustration
"food", "drink", "recipe", "restaurant" food_drink prompt-core.md + prompt-categories.md § food_drink
"building", "interior", "room", "architecture" architecture prompt-core.md + prompt-categories.md § architecture
"chart", "infographic", "data", "diagram" infographic prompt-core.md + prompt-categories.md § infographic
"t-shirt", "mug design", "poster", "POD", "print-on-demand" pod_design prompt-core.md + prompt-platforms.md + prompt-categories.md § pod_design
"describe", "analyze", "what's in this image", "explain image" analyze Skip prompt enhancement — use --analyze mode directly. Read references/analyze-reference.md for advanced analysis patterns
No match / simple request Skip patterns, generate directly

When to skip enhancement:

  • User's prompt is already detailed (150+ words with camera/lighting/composition specifics)
  • Simple/direct requests ("generate a blue circle on white background")
  • User says "no pattern" or provides a fully formed prompt

When to apply:

  • User says "use product_hero pattern" or "apply social_media pattern" (explicit)
  • Request clearly matches a category above (auto-detect)
  • User asks for "enhanced prompt" or "professional quality"

Reference files (in references/ directory):

  • prompt-core.md — Foundational rules: narrative prompting, camera/lens/lighting specs, text rendering rules, model recommendations
  • prompt-platforms.md — Social media ratios, IAB ad sizes, web dimensions, POD specs — all mapped to -a/-s flags
  • prompt-categories.md — 11 category formulas with templates and complete example prompts

Step 2: Run Generation Script

uv run python ${CLAUDE_SKILL_DIR}/scripts/generate-image.py \
  -o "OUTPUT_PATH" \
  [--provider openrouter|google] \
  [-a "16:9"] \
  [-s "2K"] \
  [-m "model-id"] \
  [-r "ref-image.png"] \
  [-t]

With a specific model:

uv run python ${CLAUDE_SKILL_DIR}/scripts/generate-image.py \
  -o "OUTPUT_PATH" \
  -m riverflow \
  -p "A serene mountain lake at sunset"

With transparent background (requires ffmpeg + imagemagick):

uv run python ${CLAUDE_SKILL_DIR}/scripts/generate-image.py \
  -o "mascot.png" \
  -t \
  -p "A friendly robot mascot character"

With reference image for editing/style transfer (multimodal models only):

uv run python ${CLAUDE_SKILL_DIR}/scripts/generate-image.py \
  -o "edited.png" \
  -r "original.png" \
  -p "Change the background to a sunset scene"

Or with inline prompt (default model):

uv run python ${CLAUDE_SKILL_DIR}/scripts/generate-image.py \
  -o "OUTPUT_PATH" \
  -p "A simple blue circle on white background"

Step 3: Clean Up (if temp file used)

rm -f ${CLAUDE_SKILL_DIR}/tmp/prompt.txt

Step 4: Verify Output

file OUTPUT_PATH

Confirm it shows "PNG image data" and report the file path and size to the user.

Step 5: Post-Processing (optional)

If the user needs resizing, format conversion, or other manipulation, first detect available image tools, then use them. See Image Tools section below.

Parameters

Argument Short Required Default Description
--output -o Yes -- Output file path (parent dirs auto-created)
--prompt -p No -- Inline prompt text
--prompt-file -- No ../tmp/prompt.txt Path to prompt file
--provider -- No openrouter openrouter or google
--aspect-ratio -a No model default OpenRouter only: 1:1, 16:9, 9:16, 3:2, 2:3, 4:3, 3:4, 4:5, 5:4, 21:9
--image-size -s No model default OpenRouter only: 0.5K, 1K, 2K, 4K
--model -m No gemini Model keyword (gemini, riverflow, flux2, seedream, gpt5) or full model ID
--ref -r No -- Reference image file (repeatable). For editing/style transfer. Multimodal models only (gemini, gpt5)
--analyze -- No -- Analyze/describe a reference image (text-only output, no image generated). Requires -r. Multimodal models only
--transparent -t No -- Generate with transparent background. Requires ffmpeg + imagemagick
--costs -- No -- Display generation/cost history for this project and exit
--list-models -- No -- List available model keywords and exit

Environment Variables

Variable Required For Description
AI_IMG_CREATOR_CF_ACCOUNT_ID Gateway mode Cloudflare account ID
AI_IMG_CREATOR_CF_GATEWAY_ID Gateway mode AI Gateway name
AI_IMG_CREATOR_CF_TOKEN Gateway mode Gateway auth token
AI_IMG_CREATOR_OPENROUTER_KEY Direct OpenRouter OpenRouter API key (sk-or-...)
AI_IMG_CREATOR_GEMINI_KEY Direct Google Google AI Studio API key

Gateway mode activates when all 3 CF_* vars are set. Falls back to direct mode if gateway fails.

For first-time setup, see references/setup-guide.md.

Transparent Mode (-t)

Generates images with transparent backgrounds using a 3-step pipeline:

  1. Green screen generation — Prompt is augmented to place subject on solid #00FF00 green
  2. FFmpeg chroma key — Removes green background + green fringe from edges
  3. ImageMagick auto-crop — Trims transparent padding

Requirements: brew install ffmpeg imagemagick

Use cases: Game sprites, icons, logos, mascots, marketing assets with transparency.

uv run python ${CLAUDE_SKILL_DIR}/scripts/generate-image.py \
  -o "sprite.png" -t -p "A pixel art treasure chest"

Reference Images (-r)

Send existing images alongside text prompts for editing, style transfer, or guided generation. Supports multiple references. Multimodal models only (gemini, gpt5) — image-only models (riverflow, flux2, seedream) will error.

# Edit an existing image
uv run python ${CLAUDE_SKILL_DIR}/scripts/generate-image.py \
  -o "edited.png" -r "photo.png" -p "Make the background white"

# Style transfer with multiple references
uv run python ${CLAUDE_SKILL_DIR}/scripts/generate-image.py \
  -o "combined.png" -r "style1.png" -r "content.png" -p "Apply the style of the first image to the second"

Supported formats: PNG, JPEG, WebP, GIF.

Image Analysis (--analyze)

Describe, analyze, or explain existing images using multimodal AI vision. Returns text-only output (no image generated). Multimodal models only (gemini, gpt5).

No -o output path needed. No prompt enhancement needed. The script outputs JSON to stdout with the model's analysis in the analysis field.

# Analyze with default prompt (describes subject, style, colors, composition, mood, text)
uv run python ${CLAUDE_SKILL_DIR}/scripts/generate-image.py \
  --analyze -r "photo.png"

# Analyze with custom prompt
uv run python ${CLAUDE_SKILL_DIR}/scripts/generate-image.py \
  --analyze -r "photo.png" -p "Describe this image in plain text and also in JSON structured output"

# Analyze with a specific model
uv run python ${CLAUDE_SKILL_DIR}/scripts/generate-image.py \
  --analyze -r "photo.png" -m gpt5 -p "What text is visible in this image?"

# Analyze multiple images together
uv run python ${CLAUDE_SKILL_DIR}/scripts/generate-image.py \
  --analyze -r "before.png" -r "after.png" -p "Compare these two images and describe the differences"

JSON output format:

{"ok": true, "analyze": true, "analysis": "<model text>", "provider": "openrouter", "model": "...", "mode": "gateway", "elapsed_seconds": 3.2, "ref_images": 1}

Incompatible flags: --analyze cannot be combined with -o, -t, -a, or -s.

For advanced analysis prompt patterns (structured output, comparison, targeted analysis), read references/analyze-reference.md.

Cost Tracking (--costs)

Every generation is logged to .ai-image-creator/costs.json in your project directory. View history:

uv run python ${CLAUDE_SKILL_DIR}/scripts/generate-image.py --costs

Shows per-model breakdown: generation count, total tokens, elapsed time, and recent entries. Security: Only non-sensitive data is logged (model, tokens, timing, file path). No API keys or credentials are ever stored.

Consider adding .ai-image-creator/ to your .gitignore.

Composite Banners

Generate consistent logo banners across multiple sizes from a JSON config. Uses ImageMagick for offline compositing — no API calls, no network required. Composites an existing logo/mark onto branded backgrounds with text at standard dimensions.

Composite vs. AI Generation — Decision Rule

Use composite-banners.py when ALL of these are true:

  • User has an existing logo/mark they want to use as-is (provides or references a logo file)
  • User wants consistent branding across multiple standard sizes (not one creative image)
  • The output is logo + text on a solid/gradient background (not a photograph, illustration, or creative design)

Use generate-image.py (AI generation) when ANY of these are true:

  • User wants a creative/artistic banner design (describes a scene, mood, concept, or style)
  • User wants AI to design the visual content (product shots, illustrations, creative layouts)
  • User wants a single banner with artistic content, not a multi-size brand kit

When composite mode applies, read references/composite-reference.md for full config schema, preset dimensions, and font handling details.

Quick Start

  1. Init config: uv run python ${CLAUDE_SKILL_DIR}/scripts/composite-banners.py --init
  2. Edit banner-config.json — set logo path, brand text, colors, banner sizes
  3. Validate: uv run python ${CLAUDE_SKILL_DIR}/scripts/composite-banners.py --validate
  4. Generate: uv run python ${CLAUDE_SKILL_DIR}/scripts/composite-banners.py -c banner-config.json -o ./banners/

Composite Parameters

Argument Short Default Description
--config -c banner-config.json Config JSON path
--output-dir -o . Output directory
--name -n all Generate single banner by name
--format -f png png, webp, jpeg
--list-presets List IAB/social/web size presets
--init Generate starter config
--validate Check config, exit 0 or 2
--dry-run Preview without rendering
--json Structured JSON to stdout
--verbose -v Verbose output

Requirements: ImageMagick 7 (brew install imagemagick or apt install imagemagick).

Workflow Hints

Starting composite mode:

  • Ask user for: logo file path, brand name, tagline text, brand colors (hex)
  • If user doesn't have a logo yet → use generate-image.py to create one first
  • Run --init to scaffold config, then help user fill in their brand values

During generation:

  • Always run --validate before generating to catch font/logo issues early
  • Use --name to iterate on one banner before generating the full set
  • Show user 3-4 representative sizes (hero, OG, square, leaderboard) for approval

After generation:

  • If user wants creative/artistic redesign of banner visuals → switch to generate-image.py (composite only does logo + text on gradient/solid backgrounds)
  • If banners look too plain → suggest AI-generating a textured or photographic background first, then compositing the logo onto it

Combined workflow (most powerful):

  1. Use generate-image.py to AI-create a hero background or textured pattern
  2. Use composite-banners.py to overlay the logo + text onto that background at all standard sizes This gives both creative AI visuals AND pixel-perfect logo consistency.

Image Tools

On first invocation, detect available image manipulation tools:

which magick convert sips ffmpeg 2>/dev/null

Available Tools

Tool Check Key Operations
ImageMagick 7 (magick) magick --version Resize, crop, convert, composite
ImageMagick 6 (convert) convert --version Same ops, legacy command name
sips (macOS) sips --help Resize, format conversion
ffmpeg ffmpeg -version Convert formats, resize

Common Post-Processing

# Resize
magick output.png -resize 512x512 icon-512.png

# Multiple sizes (icons)
for s in 16 32 48 64 128 256 512; do magick output.png -resize ${s}x${s} icon-${s}.png; done

# Convert to WebP
magick output.png output.webp

# Maskable icon (add safe-zone padding)
magick output.png -gravity center -extent 120%x120% maskable.png

# macOS sips resize
sips --resampleWidth 512 --resampleHeight 512 output.png --out icon-512.png

CRITICAL: Check tool availability before using. Prefer magick (IM7) over convert (IM6). If no tools found, inform user: brew install imagemagick.

Common Issues

"No API credentials configured"

Cause: Environment variables not set or not exported. Fix: Add exports to ~/.zshrc and run source ~/.zshrc. See references/setup-guide.md.

"HTTP 401: Unauthorized"

Cause: Invalid or expired API key/token. Fix: Check AI_IMG_CREATOR_CF_TOKEN (gateway) or AI_IMG_CREATOR_OPENROUTER_KEY (direct). Regenerate if needed.

"No images in response"

Cause: Model returned text only (safety filter, unclear prompt, or unsupported request). Fix: Make the prompt more specific and descriptive. Avoid prohibited content.

"Connection error" / timeout

Cause: Network issue or image generation taking too long (120s timeout). Fix: Retry. If persistent, try --provider google as alternative. Check CF gateway status.

Detailed API Reference

For full API formats, response schemas, BYOK configuration, and curl examples: see references/api-reference.md

For first-time setup instructions: see references/setup-guide.md

Install via CLI
npx skills add https://github.com/centminmod/my-claude-code-setup --skill ai-image-creator
Repository Details
star Stars 2,405
call_split Forks 227
navigation Branch main
article Path SKILL.md
More from Creator