image-generation - SKILL.md Agent Skill

name: image-generation description: Generate images from text prompts (and optionally edit/remix input images). Use when the user asks to create, generate, draw, render, or edit an image, illustration, logo, icon, diagram, or photo.

Image Generation

Generate images via Letta's hosted endpoint POST /v1/images/generations. The API usually returns base64 image bytes, but some providers return signed image URLs; save either form to a local image file before replying.

Example

Generate the image, save it locally, then show it inline:

curl -sS -X POST "https://api.letta.com/v1/images/generations" \
  -H "Authorization: Bearer $LETTA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"provider":"gemini","prompt":"a friendly robot mascot waving, flat vector logo, mint green background","n":1}' \
  > image-response.json

python3 - <<'PY'
import base64, json, urllib.request

with open("image-response.json") as f:
    response = json.load(f)

image = response["images"][0]
if image.get("b64_json"):
    data = base64.b64decode(image["b64_json"])
else:
    data = urllib.request.urlopen(image["url"]).read()

with open("robot-mascot.png", "wb") as f:
    f.write(data)

print("saved robot-mascot.png; credits:", response["billing"]["credits_charged"])
PY

In Bash tools launched by Letta Code, the current Letta credential is available as $LETTA_API_KEY. This works for both Letta auth modes: it may be a normal Letta API key, or the OAuth access token from a Letta Cloud OAuth login. Reference it directly. If it is missing, the user needs to authenticate with Letta Cloud (or provide a Letta API key); do not ask for a Flux/OpenAI/Gemini provider key. This endpoint also does not use /connect BYOK providers — the only provider values supported here are flux, gemini, and openai.

Then show the image to the user by embedding the saved file in your reply:

Here's the mascot:

![a friendly robot mascot waving, flat vector logo](./robot-mascot.png)

The Letta Code UI renders local file paths in markdown image tags, so the image appears inline. Always display generated images this way — don't just report the path, and never paste the raw base64 / a data: URI. The markdown path must match where you saved the file. For n > 1, save each image to its own file and embed each on its own line. Also tell the user the credits_charged.

Request body

Field	Type	Notes
`provider`	`"flux"` \| `"gemini"` \| `"openai"`	Required.
`prompt`	string	Required, 1–32000 chars.
`model`	string	Optional; defaults per provider (below).
`n`	int 1–4	Optional, default 1. Request variations in one call.
`size`	string	Optional, e.g. `"1024x1024"` (OpenAI).
`quality`	`low`\|`medium`\|`high`\|`auto`	Optional (OpenAI; higher = more credits).
`output_format`	`png`\|`jpeg`\|`webp`	Optional (OpenAI).
`input_images`	string[] (max 14)	Optional. Base64 data URLs for edit/remix.
`seed`	int	Optional.

Provider	Default model	Use for
`flux`	`flux-2-pro`	Default for normal text-to-image. High-quality general image generation; commonly returns signed URLs.
`gemini`	`gemini-3-pro-image`	Strong prompt adherence, image editing/remix.
`openai`	`gpt-image-2`	Photoreal output, explicit `size`/`quality`/`output_format`.

Default to flux for normal text-to-image requests. Use gemini when the user provides input images or wants image editing/remix. Use openai when the user wants photoreal output or a specific size/quality.

Response

{
  "provider": "gemini",
  "model": "gemini-3-pro-image",
  "images": [{ "b64_json": "<base64>", "mime_type": "image/png" }],
  "billing": { "credits_charged": 12, "...": "..." }
}

Each images[] entry has either b64_json or url, plus mime_type. Gemini always returns b64_json. Flux commonly returns a signed url; download it to your local image file immediately because signed URLs expire. If OpenAI returns a url, download that URL instead of base64-decoding.

Editing / remixing images

Pass source images in input_images as base64 data URLs (data:<mime>;base64,<data>) and describe the edit in prompt. Gemini handles multi-image edits well. To build a data URL from a local file:

DATA_URL="data:image/png;base64,$(base64 < input.png | tr -d '\n')"

Notes

Billing: every success charges credits; don't loop needlessly, and report credits_charged.
Errors: 402 = insufficient credits (credits_required in body); 400/500 return { "message": "..." } — surface it to the user.
Only flux, gemini, and openai are supported here.