image-generate - SKILL.md Agent Skill

name: image-generate
description: Generate an image from a text prompt via the cloud LLM image proxy, persist it as a content-addressed workspace asset, and return a ContentBlock that downstream renderers can attach. Use whenever the user asks "draw / generate / make an image of …", an agent needs a diagram / illustration as a follow-up artifact, or a task description explicitly demands visual output. Do NOT use for editing or describing existing images; `assets` covers reads, and image editing is a separate skill.
applies_to: [hermes, claude-code, openclaw, codex]
requires:
  - assets
phaseModel:
  defaultPhase: tool_use
version: 1

Image Generate

Turn a text prompt into a content-addressed image asset plus a v2.0 §4.6 ContentBlock that the chat renderer can surface inline. The skill is a thin orchestration over two existing surfaces:

LLM image gateway — the cloud's NewAPI proxy at POST /api/v1/images/generations (OpenAI-compatible — see §"Hand-off" if the endpoint is not yet wired in your deployment).
Asset store — POST /api/im/assets (multipart) on the cloud, returning a stable assetId + contentHash. Generated bytes flow through the content-addressed pipeline like any other asset — same de-dup, same audit trail, same URI scheme (prismer://assets/<assetId>).

The skill's only output contract is a ContentBlock referencing the asset. Raw bytes / data: URIs / pre-signed URLs MUST NOT be embedded in prose; doing so defeats caching and makes follow-up retrieval impossible (same rule as the assets skill).

When to use

The user says "draw", "generate an image of …", "make me a picture / poster / diagram / illustration".
A task description includes a produce_image: field or a kind: image artifact expectation.
You need to fabricate a visual that does not exist in any source — if the visual already exists, use assets to read it, not this skill.
A downstream skill (canvas-design, slack-gif-creator, web-artifacts-builder) requires a generated source image as input.

Not when to use

Editing / variation / inpainting an existing image — that's a separate upcoming skill (image-edit). Don't fake it by reading + regenerating.
Describing what's in an image — use a vision-capable adapter, no generation needed.
Pure ASCII / SVG / Mermaid graphics that the LLM can emit as text — those belong in the chat body, not in an asset.
Privacy-sensitive renderings (faces, identifiable individuals) without explicit user confirmation. The skill does not gate this; the agent must.

API Reference

There is no cloud image generate subcommand in the runtime CLI today (release 201 audit, sdk/prismer-cloud/runtime/src/cli/commands/). Call the cloud HTTP endpoint directly from a small Python / Node script in the skill runtime, then hand the bytes to cloud asset upload for the content-addressed write. Adding a dedicated CLI verb is tracked as a future release; until then, do not invent the command — it will exit with "unknown command".

HTTP shape:

POST /api/v1/images/generations
Authorization: Bearer <user JWT or sk-prismer-* key>
Content-Type: application/json

{
  "prompt":  "<text prompt, 1..4000 chars>",
  "model":   "gpt-image-1",                 // or "dall-e-3", deployment-dependent
  "size":    "1024x1024",                   // 256x256 | 512x512 | 1024x1024 | 1792x1024 | 1024x1792
  "n":       1,                             // skill always uses 1 (return single ContentBlock)
  "response_format": "b64_json"             // skill requires bytes — never "url"
}

Successful response (OpenAI shape):
{
  "created": 1716345600,
  "data": [{ "b64_json": "<base64 PNG bytes>" }]
}

After receiving bytes the skill MUST upload to the cloud asset store (POST /api/im/assets, multipart) and use the returned assetId in the ContentBlock. Bytes never leak into chat.
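
A minimal sketch of building the request body and pulling the single image out of the response, assuming the HTTP shape above. The helper names (`buildGenerationRequest`, `extractBase64`) and the interfaces are hypothetical, not part of the runtime; only the field names and the fixed values `n: 1` and `response_format: "b64_json"` come from this document.

```typescript
// Hypothetical types mirroring the request/response shape documented above.
type ImageSize = "256x256" | "512x512" | "1024x1024" | "1792x1024" | "1024x1792";

interface GenerationRequest {
  prompt: string;
  model: string;
  size: ImageSize;
  n: 1;                        // the skill always asks for one image
  response_format: "b64_json"; // the skill needs bytes, never "url"
}

interface GenerationResponse {
  created: number;
  data: Array<{ b64_json?: string }>;
}

// Build the JSON body for POST /api/v1/images/generations.
function buildGenerationRequest(
  prompt: string,
  size: ImageSize = "1024x1024",
  model: string = "gpt-image-1", // deployment-dependent
): GenerationRequest {
  return { prompt, model, size, n: 1, response_format: "b64_json" };
}

// Return the base64 payload of the first (and only) image, or throw.
function extractBase64(response: GenerationResponse): string {
  const first = response.data[0];
  if (!first || !first.b64_json) {
    throw new Error("generation response carried no b64_json payload");
  }
  return first.b64_json;
}
```

The body is built in one place so that `n` and `response_format` cannot drift from the values the rest of the skill depends on.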

Workflow

1. Validate inputs. Prompt 1..4000 chars; size in the allow-list above; workspaceId resolved (defaulting to the active workspace if none supplied).
2. Generate. POST to /api/v1/images/generations with response_format: 'b64_json'. Capture b64_json (single image).
3. Decode + hash. Base64-decode to bytes and compute SHA-256 client-side. Send the digest in the x-content-sha256 header on upload (the server validates it), then compare it against the contentHash field in the upload response.
4. Upload as asset. Multipart POST to /api/im/assets:
   - file: Blob with image/png MIME and filename generated-${shortHash}.png
   - workspaceId, kind=image, description=<prompt[0..500]>
   - sourceTaskId / sourceAgentImUserId (if available from runtime context)
   - folderPath=/generated/images/${YYYY-MM} (auto-organized, optional)
5. Emit ContentBlock. Return, and only return, a ContentBlock pointing at the new asset. Shape exactly as v2.0 §4.6 (Anthropic-shape, not OpenAI):
```
{
  "kind":      "image",
  "assetId":   "<returned assetId>",
  "mediaType": "image/png",
  "alt":       "<prompt truncated to 100 chars>"
}
```
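
The network-free parts of the workflow above can be sketched as pure functions. This assumes a Node runtime (`node:crypto`, `Buffer`). All function names are hypothetical, the 8-character short hash is an assumption (the document does not fix its length), and the UTC month used for `folderPath` is likewise a guess at what `${YYYY-MM}` means.

```typescript
import { createHash } from "node:crypto";

const ALLOWED_SIZES: readonly string[] = [
  "256x256", "512x512", "1024x1024", "1792x1024", "1024x1792",
];

interface ImageContentBlock {
  kind: "image";
  assetId: string;
  mediaType: "image/png";
  alt: string;
}

// Step 1: reject bad input before spending credits.
function validateInputs(prompt: string, size: string): void {
  if (prompt.length < 1 || prompt.length > 4000) {
    throw new RangeError("prompt must be 1..4000 chars");
  }
  if (!ALLOWED_SIZES.includes(size)) {
    throw new RangeError(`size ${size} is not in the allow-list`);
  }
}

// Step 3: decode the b64_json payload and compute the SHA-256 hex digest.
function decodeAndHash(b64: string): { bytes: Buffer; sha256: string } {
  const bytes = Buffer.from(b64, "base64");
  const sha256 = createHash("sha256").update(bytes).digest("hex");
  return { bytes, sha256 };
}

// Step 4: the non-file multipart fields plus the generated filename.
function uploadFields(prompt: string, sha256: string, workspaceId: string, now: Date) {
  const month = `${now.getUTCFullYear()}-${String(now.getUTCMonth() + 1).padStart(2, "0")}`;
  return {
    filename: `generated-${sha256.slice(0, 8)}.png`, // short-hash length is an assumption
    workspaceId,
    kind: "image",
    description: prompt.slice(0, 500),
    folderPath: `/generated/images/${month}`,
  };
}

// Step 3 (second half): compare the local digest with the server's contentHash.
function assertHashMatches(localSha256: string, serverContentHash: string): void {
  if (localSha256.toLowerCase() !== serverContentHash.toLowerCase()) {
    throw new Error(`content hash mismatch: local=${localSha256} server=${serverContentHash}`);
  }
}

// Step 5: the only thing the skill hands back to chat.
function toContentBlock(assetId: string, prompt: string): ImageContentBlock {
  return { kind: "image", assetId, mediaType: "image/png", alt: prompt.slice(0, 100) };
}
```

Keeping these steps free of I/O means the hashing and truncation rules can be unit-tested without touching the gateway or the asset store.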

Operating Rules

Always response_format: b64_json. Never url — the OpenAI URL is short-lived, doesn't survive the content-address round-trip, and tempts you to leak it into chat (which defeats the asset model).
One image per call. n=1 only. If the user wants variants, call the skill multiple times — each variant gets its own assetId so the user can pick + delete cleanly.
Hash check is non-negotiable. Server enforces x-content-sha256; if the hashes disagree, the bytes were corrupted in flight. Retry the upload once; if the mismatch persists, abort and surface it.
Never inline base64 / data:<mime>;base64,... in the reply body. The whole point of the skill is to avoid that anti-pattern. If you find yourself about to do so, stop and check that the asset upload actually succeeded.
Default size = 1024x1024 unless the user explicitly asks for portrait (1024x1792) or landscape (1792x1024). 256/512 only when budget is tight.
Cost-aware: image gen is far more expensive than a chat completion. Tell the user the model + size you picked before spending more than 1 credit's worth, and surface the actual cost from the response.
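
Two of these rules lend themselves to small guards. The sketch below is illustrative only: `pickSize` and `containsInlineBytes` are hypothetical names, and the 200-character threshold for spotting a bare base64 run is an arbitrary heuristic, not something the runtime defines. The smaller 256/512 sizes are deliberately left out because the document does not say which one to prefer when budget is tight.

```typescript
type Orientation = "square" | "portrait" | "landscape";

// Default to 1024x1024 unless the user explicitly asked for portrait or landscape.
function pickSize(orientation: Orientation = "square"): string {
  if (orientation === "portrait") return "1024x1792";
  if (orientation === "landscape") return "1792x1024";
  return "1024x1024";
}

// data:<mime>;base64,... anywhere in the text.
const DATA_URI = /data:[\w.+-]+\/[\w.+-]+;base64,/i;
// A long unbroken base64-looking run (heuristic threshold, an assumption).
const LONG_BASE64_RUN = /[A-Za-z0-9+/]{200,}={0,2}/;

// True when a reply body looks like it carries image bytes by value.
function containsInlineBytes(replyText: string): boolean {
  return DATA_URI.test(replyText) || LONG_BASE64_RUN.test(replyText);
}
```

A check like `containsInlineBytes` could run on the reply text just before it is sent; a positive result is a signal to stop and confirm the asset upload succeeded.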

ContentBlock output (v2.0 §4.6 / Gap E-⑤)

This skill produces a single image ContentBlock per successful generation. It MUST NOT also dump the base64 / pre-signed URL into the reply — that violates the §4.6 rule (asset-by-reference, not asset-by-value) and breaks the chat renderer's preview pipeline.

{ "kind": "image", "assetId": "<assetId>", "mediaType": "image/png", "alt": "<prompt summary, ≤100 chars>" }

The reply envelope from this skill (when invoked via the agent runtime) looks like:

{
  "ok": true,
  "result": {
    "assetId":     "<assetId>",
    "contentHash": "<sha256 hex>",
    "sizeBytes":   123456,
    "cdnUrl":      "<optional, server may include>",
    "model":       "gpt-image-1",
    "size":        "1024x1024",
    "promptHash":  "<sha256 of prompt for de-dup>"
  },
  "contentBlocks": [
    { "kind": "image", "assetId": "<assetId>", "mediaType": "image/png", "alt": "<prompt[0..100]>" }
  ]
}

Legacy callers that only know how to parse result.assetId keep working (field preserved). Multimodal-aware callers prefer contentBlocks[] — adapters and the chat renderer both check contentBlocks first per the §4.6 prefer-blocks rule.
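
A caller-side sketch of the prefer-blocks rule, assuming the envelope shape shown above. `SkillReply` and `resolveImageAssetId` are hypothetical names; the `prismer://assets/<assetId>` URI scheme is the one described for the asset store.

```typescript
// Hypothetical view of the reply envelope; only the fields used here are typed.
interface SkillReply {
  ok: boolean;
  result?: { assetId: string; [key: string]: unknown };
  contentBlocks?: Array<{ kind: string; assetId?: string }>;
}

// Prefer contentBlocks[]; fall back to the legacy result.assetId field.
function resolveImageAssetId(reply: SkillReply): string | undefined {
  const block = reply.contentBlocks?.find((b) => b.kind === "image" && b.assetId);
  if (block) return block.assetId;
  return reply.result?.assetId;
}

// Turn an assetId into the asset-by-reference URI.
function toAssetUri(assetId: string): string {
  return `prismer://assets/${assetId}`;
}
```

With this ordering, a legacy envelope without `contentBlocks` still resolves, while a multimodal envelope never falls through to `result`.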

Failure modes

| Status | Where | Cause | What to surface |
| --- | --- | --- | --- |
| 400 | LLM proxy | prompt too long / disallowed content | Prompt rejected; surface the proxy's error message verbatim. Do NOT retry the same prompt. |
| 402 | LLM proxy | not enough credits | Tell the user the cost + ask them to top up. Don't burn credits on retries. |
| 415 | asset upload | server rejected MIME (not in allow-list) | Should not happen; `image/png` is allow-listed. If it does, this is a deployment bug; flag it. |
| 422 | asset upload | `x-content-sha256` mismatch | Bytes corrupted in flight. Retry once; if it persists, fail the skill and tell the user. |
| 5xx | either | upstream outage | Retry with exponential backoff up to 2 times, then fail loudly. Do NOT fabricate the assetId. |
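
The retry column of the table can be expressed as a small policy function. This is a sketch under stated assumptions: `maxRetries` and `backoffDelaysMs` are hypothetical names, and the 500 ms base delay is invented for illustration (the document only says "exponential backoff").

```typescript
type Stage = "llm-proxy" | "asset-upload";

// How many retries the failure table allows for a given status and stage.
function maxRetries(status: number, stage: Stage): number {
  if (status >= 500) return 2;                               // upstream outage, either stage
  if (status === 422 && stage === "asset-upload") return 1;  // hash mismatch: retry once
  return 0;                                                  // 400 / 402 / 415: never retry
}

// Exponential backoff schedule: base, 2*base, 4*base, ...
function backoffDelaysMs(retries: number, baseMs: number = 500): number[] {
  return Array.from({ length: retries }, (_, i) => baseMs * 2 ** i);
}
```

For a 503 from the proxy, `backoffDelaysMs(maxRetries(503, "llm-proxy"))` yields two waits before the skill fails loudly.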

Output reporting

After successful generation:

[image-generate] generated assetId=<id> model=<model> size=<WxH> sha=<short>
                 cost=<credits>c prompt="<first 60 chars>…"

Then in chat, return ONLY the ContentBlock (the renderer surfaces the image preview). A one-line caption may accompany it if useful ("Here's the illustration you asked for.").

After failure: report status + proxy error code + message verbatim. Do not silently retry on 4xx (those are the user's prompt / quota — they need to know).
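
A sketch of a formatter for the success line shown above. `GenerationReport` and `formatReport` are hypothetical names, and the 8-character short hash is an assumption; the layout and the 60-character prompt cut come from the template.

```typescript
interface GenerationReport {
  assetId: string;
  model: string;
  size: string;    // "<W>x<H>"
  sha256: string;  // full hex digest
  credits: number; // actual cost surfaced from the response
  prompt: string;
}

// Render the two-line [image-generate] log entry.
function formatReport(r: GenerationReport): string {
  const shortSha = r.sha256.slice(0, 8); // short-hash length is an assumption
  // Append the ellipsis only when the prompt was actually cut.
  const head = r.prompt.length > 60 ? `${r.prompt.slice(0, 60)}…` : r.prompt;
  return [
    `[image-generate] generated assetId=${r.assetId} model=${r.model} size=${r.size} sha=${shortSha}`,
    `                 cost=${r.credits}c prompt="${head}"`,
  ].join("\n");
}
```

The report goes to the log; the chat reply still carries only the ContentBlock.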

Backing capabilities (Gap E-⑤ mapping)

LLM image gateway: POST /api/v1/images/generations (NewAPI proxy wired in Wave 6 G1 at src/app/api/images/generations/route.ts; gateway reuses proxyToNewAPI in src/lib/llm-proxy.ts with image-specific billing via calculateImageCredits). Setting env MOCK_LLM_IMAGES=true bypasses NewAPI and returns a fixture 1×1 PNG (for integration tests that should not burn real image-gen credits).
Asset store: POST /api/im/assets (multipart) — src/im/api/assets.ts line 2729+. Returns IMAsset with id, contentHash, cdnUrl, sizeBytes.
ContentBlock contract: v2.0 §4.6 — sdk/prismer-cloud/typescript/src/types.ts lines 278–296 (8-variant discriminated union, Anthropic-shape).
Reply attachment plumbing: AgentDispatchReplyPayload.attachments — same path as the assets skill's image-resolve output. Chat renderer surfaces previews from attachments / contentBlocks automatically.

Examples

Example 1 — User asks for an illustration

User: "Draw an isometric server room with glowing blue racks"

Skill flow:
  POST /api/v1/images/generations
    { prompt: "An isometric...", model: "gpt-image-1",
      size: "1024x1024", n: 1, response_format: "b64_json" }
    ← 200 { data: [{ b64_json: "<bytes>" }] }

  decode + sha256 → "8f4a..."

  POST /api/im/assets (multipart)
    file=generated-8f4a.png  kind=image  workspaceId=...
    x-content-sha256: 8f4a...
    ← 200 { data: { id: "asset_<...>", contentHash: "8f4a...",
                    cdnUrl: "/api/im/assets/asset_<...>" } }

Reply envelope:
  { ok: true,
    result: { assetId: "asset_<...>", contentHash: "8f4a...", ... },
    contentBlocks: [
      { kind: "image", assetId: "asset_<...>", mediaType: "image/png",
        alt: "An isometric server room with glowing blue racks" }
    ] }

Example 2 — Task-pinned generation for kanban artifact

Task input: { produce_image: { prompt: "Logo: minimalist owl, monochrome",
                                size: "1024x1024" } }

Skill call: POST /api/v1/images/generations with the prompt, then
`cloud asset upload generated.png --task-id "$PRISMER_TASK_ID"` so the
kanban task review board surfaces the image inline (chat renderer reads
contentBlocks from the asset attachment).

Example 3 — Failure: prompt rejected

User: "<disallowed content>"
LLM proxy: 400 { error: { code: "content_policy_violation", ... } }

Skill response:
  { ok: false,
    error: { code: "content_policy_violation",
             message: "<proxy's verbatim message>" } }

Do NOT retry. Do NOT fabricate an assetId.
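
The mapping in Example 3 can be sketched as follows. `ProxyErrorBody`, `SkillFailure` and `toSkillFailure` are hypothetical names, and the `http_<status>` fallback code is an assumption for the case where the proxy returns no error code; the document does not specify one.

```typescript
// Hypothetical view of the proxy's error body (OpenAI-style).
interface ProxyErrorBody {
  error?: { code?: string; message?: string };
}

interface SkillFailure {
  ok: false;
  error: { code: string; message: string };
}

// Pass the proxy's code and message through verbatim; never invent an assetId.
function toSkillFailure(status: number, body: ProxyErrorBody): SkillFailure {
  return {
    ok: false,
    error: {
      code: body.error?.code ?? `http_${status}`, // fallback code is an assumption
      message: body.error?.message ?? `request failed with status ${status}`,
    },
  };
}
```

Because the failure envelope has no `result` or `contentBlocks`, callers cannot mistake it for a successful generation.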

Anti-patterns

❌ Returning the raw base64 in result.image_b64 for chat to render. The chat renderer expects ContentBlock referencing an asset; raw bytes bypass caching.
❌ Using response_format: 'url' and pasting the OpenAI URL into the reply. URL expires; user clicks later → 404.
❌ Skipping the asset upload "to save time" when the generated image is tiny. Tiny images still need stable IDs for follow-up retrieval and audit trail.
❌ Calling the skill in a loop with the same prompt to "generate variants". Call once per variant with explicit prompt deltas; the de-dup hash will catch identical prompts.
❌ Setting n > 1. The ContentBlock output shape is single-image; multi would force you to fabricate which one to attach.

Hand-off — endpoint provisioning

Cloud-side endpoint wired in Wave 6 G1. POST /api/v1/images/generations is now provisioned at src/app/api/images/generations/route.ts and uses the same proxyToNewAPI helper that backs /api/chat/completions + /api/embeddings. Image-specific billing lives in src/lib/llm-pricing.ts::calculateImageCredits (per-image USD pricing, no token semantics). Setting MOCK_LLM_IMAGES=true returns a fixture PNG without hitting NewAPI — used by F6 integration tests.

Historical handler template (matches the landed implementation):

// src/app/api/images/generations/route.ts (NEW)
import { NextRequest, NextResponse } from 'next/server';
import { apiGuard } from '@/lib/api-guard';
import { checkRateLimit, rateLimitResponse } from '@/lib/rate-limit';
import { FEATURE_FLAGS } from '@/lib/feature-flags';
import { proxyToNewAPI } from '@/lib/llm-proxy';
import { ensureNacosConfig } from '@/lib/nacos-config';

export async function POST(request: NextRequest) {
  await ensureNacosConfig();
  if (!FEATURE_FLAGS.LLM_PROXY_ENABLED) {
    return NextResponse.json(
      { error: { message: 'LLM proxy is not enabled' } }, { status: 503 });
  }
  const guard = await apiGuard(request, { tier: 'tracked' });
  if (!guard.ok) return guard.response;
  const rl = checkRateLimit(guard.auth.userId, 'llm');
  if (!rl.allowed) return rateLimitResponse(rl);
  return proxyToNewAPI(request, guard, '/v1/images/generations');
}

In deployments where the endpoint is not yet wired, the skill operates in mock mode: see scripts/test-image-generate-skill.ts, which proves the generate→upload→ContentBlock chain by stubbing the LLM bytes (a 1x1 PNG) and hitting the real /api/im/assets upload, so the asset half of the contract is fully validated against production code paths.