name: image-gen description: "Generate and edit images using the Gemini API. Text-to-image, image editing, multi-turn iteration, 4K resolution, search grounding." metadata: { "requires": { "bins": ["curl", "jq"], "env": ["GEMINI_API_KEY"] } }
Image Generation Skill
Generate and edit images via Gemini's native image generation API using curl.
Models
| Model | ID | Best for |
|---|---|---|
| Nano Banana | gemini-2.5-flash-image |
Fast, high-volume, low-latency |
| Nano Banana Pro | gemini-3-pro-image-preview |
Pro asset production, complex prompts, accurate text rendering, 4K |
Default to gemini-2.5-flash-image unless the user asks for high quality, 4K, search grounding, or text-heavy images.
Text-to-Image
curl -s -X POST \
"https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image:generateContent" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"contents": [{"parts": [{"text": "YOUR PROMPT HERE"}]}]
}' | jq -r '.candidates[0].content.parts[] | select(.inlineData) | .inlineData.data' | base64 -D > output.png
Read the output image file to show it to the user.
Image Editing (image + text → image)
Encode an existing image as base64 and send it alongside a text prompt:
BASE64_IMG=$(base64 -i input.png)
curl -s -X POST \
"https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image:generateContent" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-H "Content-Type: application/json" \
-d "{
\"contents\": [{
\"parts\": [
{\"text\": \"YOUR EDIT PROMPT HERE\"},
{\"inline_data\": {\"mime_type\": \"image/png\", \"data\": \"$BASE64_IMG\"}}
]
}]
}" | jq -r '.candidates[0].content.parts[] | select(.inlineData) | .inlineData.data' | base64 -D > output.png
Pro Model Options
When using gemini-3-pro-image-preview, you can set aspect ratio and resolution:
curl -s -X POST \
"https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"contents": [{"parts": [{"text": "YOUR PROMPT HERE"}]}],
"generationConfig": {
"responseModalities": ["TEXT", "IMAGE"],
"imageConfig": {
"aspectRatio": "16:9",
"imageSize": "2K"
}
}
}' | jq -r '.candidates[0].content.parts[] | select(.inlineData) | .inlineData.data' | base64 -D > output.png
Aspect Ratios
1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9
Resolutions (Pro only)
1K (default), 2K, 4K — must be uppercase K.
Search Grounding (Pro only)
Generate images based on real-time info (weather, news, etc.):
curl -s -X POST \
"https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"contents": [{"parts": [{"text": "YOUR PROMPT HERE"}]}],
"tools": [{"google_search": {}}],
"generationConfig": {
"responseModalities": ["TEXT", "IMAGE"]
}
}' | jq -r '.candidates[0].content.parts[] | select(.inlineData) | .inlineData.data' | base64 -D > output.png
Handling Responses
The API returns JSON with parts that can contain text and/or image data. Extract text and image separately:
# save full response
RESPONSE=$(curl -s -X POST ... )
# extract text (if any)
echo "$RESPONSE" | jq -r '.candidates[0].content.parts[] | select(.text) | .text'
# extract and save image
echo "$RESPONSE" | jq -r '.candidates[0].content.parts[] | select(.inlineData) | .inlineData.data' | base64 -D > output.png
Workflow
- Understand what the user wants to generate or edit
- Pick the right model (flash for speed, pro for quality/text/4K)
- Write a detailed, descriptive prompt — more detail = better results
- Run the curl command, save the image
- Read the image file to display it inline
- If the user wants edits, use the image editing flow with the previous output as input
Tips
- Prompts should be descriptive and specific — style, composition, lighting, mood
- For image editing, describe what to change, not what to keep
- Pro model has a "thinking" mode — may take longer but produces better results
- All generated images include a SynthID watermark
- Pro supports up to 14 reference images in a single request (up to 6 objects + 5 humans)
- If the response has no
inlineData, check for error messages in the JSON