image-gen - SKILL.md Agent Skill

name: image-gen description: AI image generation — Stable Diffusion, Midjourney, DALL-E, ComfyUI. Prompt engineering for images, inpainting, outpainting, ControlNet domain: content tags:

content-creation
digital-content
gen
image
media

Overview

AI image generation creates images from text prompts using diffusion models. This skill covers prompt engineering for consistent results, inpainting/outpainting for editing, ControlNet for precise control, and API integration for production workflows.

Capabilities

Generate images from text prompts (DALL-E, Stable Diffusion, Midjourney)
Engineer prompts with positive/negative prompts, style modifiers, and weights
Edit existing images with inpainting and outpainting
Use ControlNet for pose, depth, and edge-guided generation
Batch generate images for content pipelines
Integrate via API (OpenAI, Stability AI, Replicate)
Set up local generation with ComfyUI or Automatic1111

When to Use

Creating marketing visuals, thumbnails, social media images
Generating product mockups and concept art
Building content pipelines that need custom images
Editing photos (remove backgrounds, change styles)
Creating consistent character/brand imagery

When NOT to Use

Task is about content strategy, not creation (use strategy skills)
Task is about content distribution (use distribution skills)
You need to analyze content performance (use analytics skills)
Task is about content moderation (use moderation tools)
You don't have content guidelines
Task requires domain expertise (consult experts)

Pseudo Code

The image-gen workflow follows a standard pipeline pattern.

Core flow:

# image-gen primary flow
input = prepare(raw_data)
result = process(input, config={comfyui, controlnet, dall, diffusion, engineering})
validate(result)
deliver(result)

Error handling:

on error:
  log(error_details)
  retry_with_backoff(max=3)
  if still_failing: alert_and_escalate()

Core Workflow

# image-gen primary flow
input = prepare(raw_data)
result = process(input, config={comfyui, controlnet, dall, diffusion, engineering})
validate(result)
deliver(result)

Error Handling

on error:
  log(error_details)
  retry_with_backoff(max=3)
  if still_failing: alert_and_escalate()

OpenAI DALL-E API

from openai import OpenAI
client = OpenAI()

response = client.images.generate(
    model="dall-e-3",
    prompt="A minimalist logo for a tech startup called Nova, flat design, blue and white",
    size="1024x1024",
    quality="hd",
    n=1,
)

image_url = response.data[0].url

Stability AI API

import requests

response = requests.post(
    "https://api.stability.ai/v1/generation/stable-diffusion-xl-1024-v1-0/text-to-image",
    headers={"Authorization": f"Bearer {STABILITY_API_KEY}"},
    json={
        "text_prompts": [
            {"text": "cyberpunk cityscape at night, neon lights, rain", "weight": 1},
            {"text": "blurry, low quality, watermark", "weight": -1},
        ],
        "cfg_scale": 7,
        "steps": 30,
        "width": 1024,
        "height": 1024,
    },
)

Prompt Engineering for Images

Structure: [subject] [style] [details] [lighting] [camera] [quality]

Good prompt:
"A portrait of a cyberpunk hacker, neon lighting, rain-soaked streets,
cinematic composition, 8k, photorealistic, volumetric lighting,
shot on Sony A7III, f/1.4 bokeh"

Negative prompt:
"blurry, low quality, watermark, text, deformed, ugly, extra limbs,
bad anatomy, bad hands, cropped, worst quality"

Weights (Automatic1111):
"(neon:1.5) city at night" — emphasize neon
"ugly, (deformed:1.3)" — strongly avoid deformed

ComfyUI Workflow (Local)

# Load ComfyUI workflow JSON
import json

with open("workflow.json") as f:
    workflow = json.load(f)

# Modify prompt
workflow["6"]["inputs"]["text"] = "a cat wearing a space helmet, digital art"

# Queue generation
import requests
requests.post("http://127.0.0.1:8188/prompt", json={"prompt": workflow})

Batch Generation Pipeline

import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()

async def generate_image(prompt: str, index: int):
    response = await client.images.generate(
        model="dall-e-3", prompt=prompt, size="1024x1024", quality="standard"
    )
    return {"index": index, "url": response.data[0].url}

prompts = [
    "Minimalist tech blog header, abstract circuits",
    "Team collaboration illustration, flat design",
    "Cloud infrastructure diagram, isometric",
]

results = await asyncio.gather(*[generate_image(p, i) for i, p in enumerate(prompts)])

Common Patterns

Batch processing: Process multiple items in parallel for throughput
Retry with backoff: Handle transient failures gracefully
Rate limiting: Respect API limits with configurable delays
Logging: Structured logging for debugging and audit trails

Consistent Characters

Prompt template with fixed descriptors:
"Character NAME, [fixed appearance], [scene description], [style]"

Example:
"Luna, young woman with silver hair and blue eyes, standing in a cyberpunk market,
anime style, studio lighting"

Style Transfer

# img2img: transform existing image
response = client.images.edit(
    model="dall-e-2",
    image=open("original.png", "rb"),
    prompt="Transform into watercolor painting style",
    size="1024x1024",
)

Inpainting (Edit Part of Image)

response = client.images.edit(
    model="dall-e-2",
    image=open("photo.png", "rb"),
    mask=open("mask.png", "rb"),  # White = edit area
    prompt="Replace background with beach sunset",
)

How to Use

Define content goal (traffic, engagement, conversion, brand awareness)
Research target audience pain points and search intent
Generate content using appropriate AI tools
Edit and humanize output for authenticity
Optimize for target platform (SEO, hashtags, format)
Schedule and distribute across channels
Measure performance and iterate

Red Flags

AI-generated content sounds robotic: Always run through humanizer before publishing
Engagement dropping week-over-week: Content fatigue or algorithm change — vary formats
Duplicate content across platforms: Adapt content per platform, don't just cross-post
No content calendar: Sporadic posting kills audience retention
Ignoring analytics: Content without measurement is just publishing, not marketing

Verification

Skill output matches expected behavior

Process

Analyze the task requirements
Apply domain expertise
Verify output quality