name: media-creation description: > Creates images and video via Alibaba Wan 2.6 (DashScope), Google Gemini/Veo, and OpenAI GPT Image 1.5 APIs, plus background extraction workflows. Trigger terms: image generation, video generation, dashscope, wan 2.6, alibaba, gemini, veo, gpt image, openai images, background removal, alpha extraction, transparent png.
Use when
- Generate images or video via Alibaba Wan, Google Gemini/Veo, or OpenAI GPT Image APIs.
- Create transparent PNGs (prefer GPT Image 1.5 for native transparency support).
- Convert consistent renders (3D, compositing) with different backgrounds into a transparent RGBA output.
- Prefer Gemini for general image generation; use Wan when content is restricted (e.g., babies).
Don't use when
- If API access or credentials are not available.
- If the task does not involve media generation or background extraction.
Outputs
- Generated media files in
artifacts/(PNG, WEBP, MP4, etc.) or API JSON responses when requested.
Templates or Examples
- Use the API request examples below as templates.
Transparent Image Generation (Recommended Approach)
Option 1: GPT Image 1.5 Native Transparency (BEST)
GPT Image 1.5 supports native transparency output. This is the simplest and most reliable method:
curl -X POST "https://api.openai.com/v1/images/generations" \
-H "Authorization: Bearer ${OPENAI_API_KEY}" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-image-1.5",
"prompt": "A cute cartoon cat mascot",
"size": "1024x1024",
"quality": "high",
"background": "transparent",
"output_format": "png"
}'
Notes:
background: "transparent"requiresoutput_format: "png"or"webp"- Returns base64 data in
data[0].b64_json - This is the only method that produces TRUE transparency from a single generation
Option 2: Three-Background Extraction (For Consistent Renders Only)
⚠️ IMPORTANT LIMITATION: This workflow ONLY works when you have control over the exact pixel output:
- ✅ 3D renders (Blender, Maya, etc.)
- ✅ Compositing software with controlled backgrounds
- ✅ Screenshots with different desktop backgrounds
- ❌ Generative AI (each generation produces different results)
The algorithm requires IDENTICAL foreground pixels across all three images. Generative AI models produce different outputs even with the same prompt.
For 3D/compositing use:
python3 scripts/extract_transparency.py \
--black render_black.png \
--white render_white.png \
--colored render_red.png \
--output result.png
Option 3: AI Image + Manual Background Removal
For AI-generated images that need transparency:
- Generate the image with any provider
- Use a dedicated background removal tool (rembg, remove.bg API, etc.)
Inputs the Agent Should Ask For (only if missing; otherwise proceed)
- Provider: Alibaba Wan (DashScope), Google (Gemini/Veo), or OpenAI (GPT Image).
- Model ID and task type (T2I, I2V, T2V).
- Prompt text and any input image path (for I2V).
- Output size/resolution and aspect ratio.
- Desired output format and count.
- For transparency: whether native transparency (GPT Image) or background extraction is needed.
- For background extraction: paths to black/white/red background images and the colored background RGB (0-1).
API Keys
The following environment variables should be set for API access:
OPENAI_API_KEY- For GPT Image 1.5 generationsGOOGLE_GENAI_API_KEY- For Gemini image/Veo video generationDASHSCOPE_API_KEY- For Alibaba Wan 2.6 image/video generation
Outputs / Definition of Done
- A clear, credential-safe request plan or script snippet.
- For generation: task submission, polling, and decode/download steps.
- For background removal: algorithm steps and expected RGBA output.
Procedure
- Use
references/alibaba-wan-api.mdfor Wan 2.6 endpoints and parameters (image, T2V, I2V). - Use
references/gemini-banana-api.mdfor Gemini image and Veo video in the Gemini API. - Use
references/openai-gpt-image-api.mdfor GPT Image 1.5 endpoints and parameters. - Use
references/background-removal-3-bg.mdfor the three-background alpha extraction algorithm. - API keys in code examples are stored encrypted using
<encrypted>tags.
Model Quick Reference
| Provider | Model | Use Case |
|---|---|---|
| OpenAI | gpt-image-1.5 |
Best for transparent images, high quality |
| OpenAI | gpt-image-1 |
Image edits/inpainting |
gemini-2.5-flash-image |
Fast image generation | |
veo-3.1-generate-preview |
Video generation | |
| Alibaba | wan2.6-t2v |
Text-to-video |
| Alibaba | wan2.6-i2v |
Image-to-video |
| Alibaba | wan2.6-image |
Image generation (fewer restrictions) |
Checks & Guardrails
- API keys must be wrapped in
<encrypted>tags; they are encrypted at rest. - Validate image sizes/formats and rate limits.
- Ensure base64 encoding formats match API expectations.
- For transparency: verify the workflow matches the source type (render vs. AI).
References
- references/alibaba-wan-api.md
- references/gemini-banana-api.md
- references/openai-gpt-image-api.md
- references/background-removal-3-bg.md
Scripts
scripts/extract_transparency.py- Extract RGBA from black/white/red background images. Usage:python3 scripts/extract_transparency.py --black img_black.png --white img_white.png --colored img_red.png --output result.png