name: image-generate
description: Generate an image from a text prompt via the cloud LLM image proxy, persist it as a content-addressed workspace asset, and return a ContentBlock that downstream renderers can attach. Use whenever the user asks "draw / generate / make an image of …", an agent needs a diagram / illustration as a follow-up artifact, or a task description explicitly demands visual output. Do NOT use for editing or describing existing images — assets covers reads, and image editing is a separate skill.
applies_to: [hermes, claude-code, openclaw, codex]
requires:
- assets
phaseModel:
  defaultPhase: tool_use
version: 1
Image Generate
Turn a text prompt into a content-addressed image asset plus a v2.0 §4.6 ContentBlock that the chat renderer can surface inline. The skill is a thin orchestration over two existing surfaces:
- LLM image gateway: the cloud's NewAPI proxy at `POST /api/v1/images/generations` (OpenAI-compatible; see §"Hand-off" if the endpoint is not yet wired in your deployment).
- Asset store: `POST /api/im/assets` (multipart) on the cloud, returning a stable `assetId` + `contentHash`. Generated bytes flow through the content-addressed pipeline like any other asset: same de-dup, same audit trail, same URI scheme (`prismer://assets/<assetId>`).
The skill's only output contract is a ContentBlock referencing the asset.
Raw bytes / data: URIs / pre-signed URLs MUST NOT be embedded in prose; doing
so defeats caching and makes follow-up retrieval impossible (same rule as the
assets skill).
When to use
- The user says "draw", "generate an image of …", "make me a picture / poster / diagram / illustration".
- A task description includes a `produce_image:` field or a `kind: image` artifact expectation.
- You need to fabricate a visual that does not exist in any source. If the visual already exists, use `assets` to read it, not this skill.
- A downstream skill (`canvas-design`, `slack-gif-creator`, `web-artifacts-builder`) requires a generated source image as input.
When not to use
- Editing / variation / inpainting an existing image. That's a separate upcoming skill (`image-edit`). Don't fake it by reading + regenerating.
- Describing what's in an image. Use a vision-capable adapter; no generation is needed.
- Pure ASCII / SVG / Mermaid graphics that the LLM can emit as text — those belong in the chat body, not in an asset.
- Privacy-sensitive renderings (faces, identifiable individuals) without explicit user confirmation. The skill does not gate this; the agent must.
API Reference
There is no `cloud image generate` subcommand in the runtime CLI today
(release 201 audit, `sdk/prismer-cloud/runtime/src/cli/commands/`). Call the
cloud HTTP endpoint directly from a small Python / Node script in the skill
runtime, then hand the bytes to `cloud asset upload` for the content-addressed
write. A dedicated CLI verb is tracked for a future release; until then,
do not invent the command, because it will exit with "unknown command".
HTTP shape:
POST /api/v1/images/generations
Authorization: Bearer <user JWT or sk-prismer-* key>
Content-Type: application/json
{
  "prompt": "<text prompt, 1..4000 chars>",
  "model": "gpt-image-1",          // or "dall-e-3", deployment-dependent
  "size": "1024x1024",             // 256x256 | 512x512 | 1024x1024 | 1792x1024 | 1024x1792
  "n": 1,                          // skill always uses 1 (return single ContentBlock)
  "response_format": "b64_json"    // skill requires bytes; never "url"
}
Successful response (OpenAI shape):
{
"created": 1716345600,
"data": [{ "b64_json": "<base64 PNG bytes>" }]
}
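The request and response shapes above can be sketched in Python as follows. This is a minimal illustration, not the skill's actual implementation: the function names `build_generation_request` and `extract_b64` are hypothetical, and the default model is simply the one shown in the example body.

```python
import json

# Size allow-list copied from the request shape above.
ALLOWED_SIZES = {"256x256", "512x512", "1024x1024", "1792x1024", "1024x1792"}


def build_generation_request(prompt: str, size: str = "1024x1024",
                             model: str = "gpt-image-1") -> dict:
    """Build the JSON body for POST /api/v1/images/generations (hypothetical helper)."""
    if not 1 <= len(prompt) <= 4000:
        raise ValueError("prompt must be 1..4000 characters")
    if size not in ALLOWED_SIZES:
        raise ValueError(f"unsupported size: {size}")
    # n is pinned to 1 and response_format to b64_json, per the skill contract.
    return {
        "prompt": prompt,
        "model": model,
        "size": size,
        "n": 1,
        "response_format": "b64_json",
    }


def extract_b64(response_body: str) -> str:
    """Pull the single base64 image out of an OpenAI-shaped response body."""
    payload = json.loads(response_body)
    data = payload.get("data") or []
    if len(data) != 1 or "b64_json" not in data[0]:
        raise ValueError("expected exactly one b64_json image in the response")
    return data[0]["b64_json"]
```

Rejecting bad input before the POST keeps an invalid prompt or size from reaching the billed proxy at all.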
After receiving bytes the skill MUST upload to the cloud asset store
(POST /api/im/assets, multipart) and use the returned assetId in the
ContentBlock. Bytes never leak into chat.
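As a rough sketch of the multipart upload, the body and headers could be assembled with the Python standard library alone. The helper name `encode_multipart` is hypothetical, and sending the SHA-256 as a lowercase hex digest in `x-content-sha256` is an assumption; check the asset store's expected encoding before relying on it.

```python
import hashlib
import uuid


def encode_multipart(fields: dict, filename: str, file_bytes: bytes,
                     mime: str = "image/png"):
    """Return (body, headers) for a multipart/form-data upload (hypothetical helper).

    `fields` carries the text parts (workspaceId, kind, description, ...);
    the image bytes go into a part named "file".
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode("utf-8")
        )
    parts.append(
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         f"Content-Type: {mime}\r\n\r\n").encode("utf-8")
    )
    parts.append(file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode("utf-8"))
    headers = {
        "Content-Type": f"multipart/form-data; boundary={boundary}",
        # Assumption: the server expects the hex digest of the raw file bytes.
        "x-content-sha256": hashlib.sha256(file_bytes).hexdigest(),
    }
    return b"".join(parts), headers
```

The returned pair can be passed to any HTTP client, for example `urllib.request.Request(url, data=body, headers=headers, method="POST")`, together with the `Authorization` header.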
Workflow
1. Validate inputs. Prompt 1..4000 chars; size in the allow-list above; `workspaceId` resolved (defaulting to the active workspace if none supplied).
2. Generate. POST to `/api/v1/images/generations` with `response_format: 'b64_json'`. Capture `b64_json` (single image).
3. Decode + hash. Base64-decode to bytes, compute SHA-256 client-side, and compare against the upload response's `contentHash` field (the server validates too; send the hash in the `x-content-sha256` header).
4. Upload as asset. Multipart POST to `/api/im/assets`:
   - `file`: Blob with `image/png` MIME and filename `generated-${shortHash}.png`
   - `workspaceId`, `kind=image`, `description=<prompt[0..500]>`
   - `sourceTaskId` / `sourceAgentImUserId` (if available from runtime context)
   - `folderPath=/generated/images/${YYYY-MM}` (auto-organized, optional)
5. Emit ContentBlock. Return, and only return, a ContentBlock pointing at the new asset. Shape exactly as v2.0 §4.6 (Anthropic-shape, not OpenAI):
   { "kind": "image", "assetId": "<returned assetId>", "mediaType": "image/png", "alt": "<prompt truncated to 100 chars>" }
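The decode, hash, and emit steps of the workflow can be sketched as below. The helper names are hypothetical, the 8-character short hash is an assumption (the source only says `shortHash`), and the upload response is assumed to have the `{ data: { id, contentHash } }` shape shown in Example 1.

```python
import base64
import hashlib
from datetime import datetime, timezone


def prepare_asset(b64_json: str, prompt: str, now=None) -> dict:
    """Decode the generated image and derive the upload fields (hypothetical helper)."""
    image_bytes = base64.b64decode(b64_json, validate=True)
    digest = hashlib.sha256(image_bytes).hexdigest()
    now = now or datetime.now(timezone.utc)
    return {
        "bytes": image_bytes,
        "sha256": digest,
        # Assumption: the short hash is the first 8 hex chars of the digest.
        "filename": f"generated-{digest[:8]}.png",
        "description": prompt[:500],
        "folderPath": f"/generated/images/{now:%Y-%m}",
    }


def verify_and_build_block(local_sha256: str, upload_response: dict,
                           prompt: str) -> dict:
    """Compare hashes, then build the image ContentBlock that references the asset."""
    asset = upload_response["data"]
    if asset["contentHash"] != local_sha256:
        # Abort and surface the mismatch instead of returning a block.
        raise RuntimeError("contentHash mismatch: bytes corrupted in flight")
    return {
        "kind": "image",
        "assetId": asset["id"],
        "mediaType": "image/png",
        "alt": prompt[:100],
    }
```

Keeping the hash comparison inside the function that builds the block means a ContentBlock cannot be produced for an asset whose bytes were never verified.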
Operating Rules
- Always `response_format: b64_json`. Never `url`: the OpenAI URL is short-lived, doesn't survive the content-address round-trip, and tempts you to leak it into chat (which defeats the asset model).
- One image per call. `n=1` only. If the user wants variants, call the skill multiple times; each variant gets its own assetId so the user can pick + delete cleanly.
- Hash check is non-negotiable. Server enforces `x-content-sha256`; if the hashes disagree, abort and surface the mismatch, because the bytes were corrupted in flight.
- Never inline base64 / `data:<mime>;base64,...` in the reply body. The whole point of the skill is to avoid that anti-pattern. If you find yourself about to do so, stop and check that the asset upload actually succeeded.
- Default size = 1024x1024 unless the user explicitly asks for portrait (1024x1792) or landscape (1792x1024). 256/512 only when budget is tight.
- Cost-aware: image gen is far more expensive than a chat completion. Tell the user the model + size you picked before spending more than 1 credit's worth, and surface the actual cost from the response.
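The size rule above could be encoded as a small lookup. This is only a sketch: `pick_size` is a hypothetical helper, and falling back to 512x512 (rather than 256x256) for a tight budget is an assumption, since the rule allows either.

```python
DEFAULT_SIZE = "1024x1024"
ORIENTATION_SIZES = {"portrait": "1024x1792", "landscape": "1792x1024"}


def pick_size(orientation=None, low_budget=False) -> str:
    """Choose an image size from an explicit orientation request or a budget hint."""
    if orientation in ORIENTATION_SIZES:
        # An explicit user request for portrait/landscape wins.
        return ORIENTATION_SIZES[orientation]
    if low_budget:
        # Assumption: 512x512 is the tight-budget choice; 256x256 is also allowed.
        return "512x512"
    return DEFAULT_SIZE
```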
ContentBlock output (v2.0 §4.6 / Gap E-⑤)
This skill produces a single image ContentBlock per successful generation. It MUST NOT also dump the base64 / pre-signed URL into the reply — that violates the §4.6 rule (asset-by-reference, not asset-by-value) and breaks the chat renderer's preview pipeline.
{ "kind": "image", "assetId": "<assetId>", "mediaType": "image/png", "alt": "<prompt summary, ≤100 chars>" }
The reply envelope from this skill (when invoked via the agent runtime) looks like:
{
  "ok": true,
  "result": {
    "assetId": "<assetId>",
    "contentHash": "<sha256 hex>",
    "sizeBytes": 123456,
    "cdnUrl": "<optional, server may include>",
    "model": "gpt-image-1",
    "size": "1024x1024",
    "promptHash": "<sha256 of prompt for de-dup>"
  },
  "contentBlocks": [
    { "kind": "image", "assetId": "<assetId>", "mediaType": "image/png", "alt": "<prompt[0..100]>" }
  ]
}
Legacy callers that only know how to parse result.assetId keep working
(field preserved). Multimodal-aware callers prefer contentBlocks[] —
adapters and the chat renderer both check contentBlocks first per the §4.6
prefer-blocks rule.
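A minimal sketch of building this envelope, and of a caller applying the prefer-blocks rule, might look like the following. Both function names are hypothetical, and hashing the UTF-8 bytes of the prompt for `promptHash` is an assumption about how the de-dup hash is computed.

```python
import hashlib


def build_reply_envelope(asset: dict, prompt: str, model: str, size: str) -> dict:
    """Assemble the success envelope from the asset store's returned record."""
    result = {
        "assetId": asset["id"],
        "contentHash": asset["contentHash"],
        "sizeBytes": asset["sizeBytes"],
        "model": model,
        "size": size,
        # Assumption: promptHash is SHA-256 over the UTF-8 encoded prompt.
        "promptHash": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
    }
    if asset.get("cdnUrl"):
        result["cdnUrl"] = asset["cdnUrl"]  # optional; the server may omit it
    block = {
        "kind": "image",
        "assetId": asset["id"],
        "mediaType": "image/png",
        "alt": prompt[:100],
    }
    return {"ok": True, "result": result, "contentBlocks": [block]}


def first_image_asset_id(envelope: dict):
    """Check contentBlocks first, then fall back to the legacy result.assetId."""
    for block in envelope.get("contentBlocks") or []:
        if block.get("kind") == "image":
            return block["assetId"]
    return (envelope.get("result") or {}).get("assetId")
```

Because `result.assetId` is always populated, a legacy caller and a multimodal-aware caller resolve to the same asset.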
Failure modes
| Status | Where | Cause | What to surface |
|---|---|---|---|
| 400 | LLM proxy | prompt too long / disallowed content | Prompt rejected; surface the proxy's error message verbatim. Do NOT retry the same prompt. |
| 402 | LLM proxy | not enough credits | Tell the user the cost + ask them to top up. Don't burn credits on retries. |
| 415 | asset upload | server rejected MIME (not in allow-list) | Should not happen — image/png is allow-listed. If it does, this is a deployment bug; flag it. |
| 422 | asset upload | x-content-sha256 mismatch | Bytes corrupted in flight. Retry once; if it persists, fail the skill and tell the user. |
| 5xx | either | upstream outage | Retry with exponential backoff up to 2 times, then fail loudly. Do NOT fabricate the assetId. |
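The retry policy in the table can be expressed as a small decision function. This is a sketch only: `classify_failure` is a hypothetical name, and the one-second backoff base is an assumption (the table specifies exponential backoff but not its base).

```python
def classify_failure(status: int, retries_so_far: int) -> dict:
    """Decide whether a failed call may be retried, per the failure-mode table."""
    if status == 422:
        # Hash mismatch: retry once, then fail the skill.
        retry = retries_so_far < 1
    elif 500 <= status <= 599:
        # Upstream outage: up to 2 retries with exponential backoff.
        retry = retries_so_far < 2
    else:
        # 400 / 402 / 415 and other 4xx: never retry, surface to the user.
        retry = False
    # Assumption: backoff starts at 1 second and doubles per retry.
    delay_seconds = 2 ** retries_so_far if retry else 0
    return {"retry": retry, "delay_seconds": delay_seconds}
```

For example, `classify_failure(503, 1)` allows a second retry after a 2-second wait, while `classify_failure(402, 0)` refuses to retry so no further credits are spent.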
Output reporting
After successful generation:
[image-generate] generated assetId=<id> model=<model> size=<WxH> sha=<short>
cost=<credits>c prompt="<first 60 chars>…"
Then in chat, return ONLY the ContentBlock (the renderer surfaces the image preview). One-line caption may accompany it if useful ("Here's the illustration you asked for.").
After failure: report status + proxy error code + message verbatim. Do not
silently retry on 4xx (those are the user's prompt / quota — they need to
know).
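One way to produce the success report line is sketched below. `format_report` is a hypothetical helper; the 8-character short hash, the two-space continuation indent, and appending the ellipsis only when the prompt is actually truncated are all assumptions.

```python
def format_report(asset_id: str, model: str, size: str, sha256_hex: str,
                  credits, prompt: str) -> str:
    """Render the two-line [image-generate] success report."""
    # Show at most the first 60 characters of the prompt.
    preview = prompt[:60] + ("…" if len(prompt) > 60 else "")
    return (
        f"[image-generate] generated assetId={asset_id} model={model} "
        f"size={size} sha={sha256_hex[:8]}\n"
        f'  cost={credits}c prompt="{preview}"'
    )
```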
Backing capabilities (Gap E-⑤ mapping)
- LLM image gateway: `POST /api/v1/images/generations` (NewAPI proxy wired in Wave 6 G1 at `src/app/api/images/generations/route.ts`; gateway reuses `proxyToNewAPI` in `src/lib/llm-proxy.ts` with image-specific billing via `calculateImageCredits`). Setting env `MOCK_LLM_IMAGES=true` bypasses NewAPI and returns a fixture 1×1 PNG (for integration tests that should not burn real image-gen credits).
- Asset store: `POST /api/im/assets` (multipart), `src/im/api/assets.ts` line 2729+. Returns `IMAsset` with `id`, `contentHash`, `cdnUrl`, `sizeBytes`.
- ContentBlock contract: v2.0 §4.6, `sdk/prismer-cloud/typescript/src/types.ts` lines 278–296 (8-variant discriminated union, Anthropic-shape).
- Reply attachment plumbing: `AgentDispatchReplyPayload.attachments`, the same path as the `assets` skill's image-resolve output. Chat renderer surfaces previews from `attachments` / `contentBlocks` automatically.
Examples
Example 1 — User asks for an illustration
User: "Draw an isometric server room with glowing blue racks"
Skill flow:
POST /api/v1/images/generations
{ prompt: "An isometric...", model: "gpt-image-1",
size: "1024x1024", n: 1, response_format: "b64_json" }
← 200 { data: [{ b64_json: "<bytes>" }] }
decode + sha256 → "8f4a..."
POST /api/im/assets (multipart)
file=generated-8f4a.png kind=image workspaceId=...
x-content-sha256: 8f4a...
← 200 { data: { id: "asset_<...>", contentHash: "8f4a...",
cdnUrl: "/api/im/assets/asset_<...>" } }
Reply envelope:
{ ok: true,
result: { assetId: "asset_<...>", contentHash: "8f4a...", ... },
contentBlocks: [
{ kind: "image", assetId: "asset_<...>", mediaType: "image/png",
alt: "An isometric server room with glowing blue racks" }
] }
Example 2 — Task-pinned generation for kanban artifact
Task input: { produce_image: { prompt: "Logo: minimalist owl, monochrome",
size: "1024x1024" } }
Skill call: POST /api/v1/images/generations with the prompt, then
`cloud asset upload generated.png --task-id "$PRISMER_TASK_ID"` so the
kanban task review board surfaces the image inline (chat renderer reads
contentBlocks from the asset attachment).
Example 3 — Failure: prompt rejected
User: "<disallowed content>"
LLM proxy: 400 { error: { code: "content_policy_violation", ... } }
Skill response:
{ ok: false,
error: { code: "content_policy_violation",
message: "<proxy's verbatim message>" } }
Do NOT retry. Do NOT fabricate an assetId.
Anti-patterns
- ❌ Returning the raw base64 in `result.image_b64` for chat to render. The chat renderer expects a ContentBlock referencing an asset; raw bytes bypass caching.
- ❌ Using `response_format: 'url'` and pasting the OpenAI URL into the reply. URL expires; user clicks later → 404.
- ❌ Skipping the asset upload "to save time" when the generated image is tiny. Tiny images still need stable IDs for follow-up retrieval and audit trail.
- ❌ Calling the skill in a loop with an unchanged prompt to "generate variants". Call once per variant with explicit prompt deltas. The de-dup hash will catch identical prompts.
- ❌ Setting `n > 1`. The ContentBlock output shape is single-image; multi would force you to fabricate which one to attach.
Hand-off — endpoint provisioning
Cloud-side endpoint wired in Wave 6 G1. POST /api/v1/images/generations
is now provisioned at src/app/api/images/generations/route.ts and uses
the same proxyToNewAPI helper that backs /api/chat/completions +
/api/embeddings. Image-specific billing lives in
src/lib/llm-pricing.ts::calculateImageCredits (per-image USD pricing,
no token semantics). Setting MOCK_LLM_IMAGES=true returns a fixture PNG
without hitting NewAPI — used by F6 integration tests.
Historical handler template (matches the landed implementation):
// src/app/api/images/generations/route.ts (NEW)
import { NextRequest, NextResponse } from 'next/server';
import { apiGuard } from '@/lib/api-guard';
import { checkRateLimit, rateLimitResponse } from '@/lib/rate-limit';
import { FEATURE_FLAGS } from '@/lib/feature-flags';
import { proxyToNewAPI } from '@/lib/llm-proxy';
import { ensureNacosConfig } from '@/lib/nacos-config';
export async function POST(request: NextRequest) {
  await ensureNacosConfig();
  if (!FEATURE_FLAGS.LLM_PROXY_ENABLED) {
    return NextResponse.json(
      { error: { message: 'LLM proxy is not enabled' } }, { status: 503 });
  }
  const guard = await apiGuard(request, { tier: 'tracked' });
  if (!guard.ok) return guard.response;
  const rl = checkRateLimit(guard.auth.userId, 'llm');
  if (!rl.allowed) return rateLimitResponse(rl);
  return proxyToNewAPI(request, guard, '/v1/images/generations');
}
In deployments where the endpoint has not yet landed, the skill operates in
mock mode. See `scripts/test-image-generate-skill.ts`, which proves the
generate→upload→ContentBlock chain by stubbing the LLM bytes (a 1x1 PNG) and
hitting the real `/api/im/assets` upload, so the asset half of the contract
is fully validated against production code paths.