name: media-agents description: > Use when the user requests complex media tasks requiring multiple steps, parallel generation, or coordinated image/video processing. Triggers on phrases like "create product images", "generate and resize", "batch process media", "build image variants", "create hero image", or any multi-step media workflow. Orchestrates fleet-pattern agents for generation, processing, validation, and optimization.
Media Agents — Agentic Workflow Patterns
Orchestrate multi-step media tasks using fleet patterns, checkpoints, and multi-tool chains. Prioritize tool calls over explanations.
1. Fleet Pattern
Dispatch specialized agents in parallel. Each agent owns one concern.
Agent Roles
| Role | Responsibility | Tools |
|---|---|---|
| generator | Create base media via fal.ai models | fal-ai skill |
| processor | Transform media (resize, crop, convert, enhance) | ImageSorcery MCP |
| validator | Check quality, dimensions, format compliance | ImageSorcery detect, get_metainfo, ocr |
| optimizer | Compress, convert formats, strip metadata | ImageSorcery resize, sharp pipelines |
Parallel Dispatch
When a task decomposes into independent operations, dispatch agents simultaneously:
User: "Create 3 social media variants of this product photo"
Fleet dispatch:
├─ generator-1 → 1200×628 Facebook variant
├─ generator-2 → 1080×1080 Instagram variant
└─ generator-3 → 1200×675 Twitter variant
All run in parallel. Aggregate results when all complete.
Rules for parallel dispatch:
- Identify independent subtasks that share no dependencies
- Assign each subtask to the appropriate agent role
- Set a timeout per agent (default: 60s for generation, 30s for processing)
- Collect results; if any agent fails, report partial success
Result Aggregation
After all agents complete:
- Collect outputs into a summary table (path, dimensions, format, size)
- Run validator agent across all outputs
- Report failures separately from successes
- Present results as a single structured response
2. Multi-Step Reasoning
Follow this chain for every complex media request:
Analyze Input → Plan Pipeline → Execute Steps → Validate Output
Step 1: Analyze Input
Determine what the user has and what they need:
- Input type: existing image path, text prompt, URL, or nothing
- Desired output: format, dimensions, quality, count
- Constraints: file size limits, aspect ratios, brand guidelines
Step 2: Plan Pipeline
Build an ordered list of operations before executing anything:
Pipeline for "Create a product hero image":
1. generate → fal-ai flux-pro, 1920×1080, product photography style
2. enhance → sharpen, color-correct, remove artifacts
3. resize → create 3 variants: hero (1920×1080), thumb (400×300), og (1200×630)
4. validate → check dimensions, file size <2MB, format is WebP
Present the plan to the user. Execute only after confirmation (or if the task is clearly unambiguous).
Step 3: Execute Steps
Execute sequentially when steps depend on prior output. Execute in parallel when steps are independent.
For each step:
- Call the tool directly — no preamble
- Capture the output path or result
- Pass output as input to the next step
- Checkpoint after each step (see §3)
Step 4: Validate Output
After the final step:
- Verify dimensions match the request
- Verify format matches the request
- Check file size is within acceptable bounds
- Run quality detection if applicable
- Report results in a summary table
3. Checkpoint Pattern
Save intermediate results so partial failures don't restart the entire workflow.
When to Checkpoint
| After This Step | Save What |
|---|---|
| Generation | Base image path, model used, prompt, seed |
| Enhancement | Enhanced image path, operations applied |
| Resize/Convert | Each variant path, dimensions, format |
| Validation | Quality scores, pass/fail per variant |
Checkpoint Storage
Write checkpoints to the output directory:
output/
├── .checkpoint.json ← workflow state
├── base_hero.png ← generation output
├── base_hero_enhanced.png ← enhancement output
├── hero_1920x1080.webp ← final variant
├── thumb_400x300.webp ← final variant
└── og_1200x630.webp ← final variant
Checkpoint JSON structure:
{
"workflow_id": "hero-image-20260206-1842",
"status": "in_progress",
"current_step": 3,
"steps": [
{"name": "generate", "status": "done", "output": "base_hero.png"},
{"name": "enhance", "status": "done", "output": "base_hero_enhanced.png"},
{"name": "resize", "status": "in_progress", "completed": ["hero_1920x1080.webp"]},
{"name": "validate", "status": "pending"}
]
}
Resume on Failure
If a step fails:
- Read
.checkpoint.jsonfrom output directory - Skip completed steps
- Resume from the last incomplete step
- Use saved outputs as inputs
Validation Checks
Run after each checkpoint:
| Check | Method | Threshold |
|---|---|---|
| Dimensions | get_metainfo → width, height |
Must match request ±1px |
| Format | get_metainfo → format |
Must match requested format |
| File size | File system check | Must be under specified limit |
| Quality | detect for artifacts |
Confidence > 0.8 for expected content |
| Content | ocr if text expected |
Text must be legible and correct |
4. Actions-First Design
Minimize chat. Maximize tool calls.
Do This
User: "Resize this image to 800×600"
→ Call ImageSorcery resize with width=800, height=600
→ Return: "Resized to 800×600. Saved to output/image_resized.png (42KB)"
Not This
User: "Resize this image to 800×600"
→ "I'd be happy to help resize your image! There are several approaches
we could take. The most common method is bilinear interpolation..."
Batching Rules
When multiple related operations are needed:
- Group independent operations into parallel calls
- Chain dependent operations sequentially
- Report all results in a single summary
- Never ask for confirmation on read-only operations
Response Format
After executing tool calls, respond with a brief summary:
✅ Generated hero image (1920×1080, 245KB)
✅ Created 3 variants: hero, thumbnail, og-image
✅ All variants validated — WebP format, under 2MB
Output directory: output/hero-image/
5. Read-Only by Default
Never modify user files without explicit confirmation.
Output Rules
- Always write to
output/or a temp directory — never overwrite source files - Show the output path in every response
- Ask before overwriting if an output file already exists
- Never delete user files under any circumstance
Confirmation Required For
- Overwriting existing output files
- Writing to directories outside
output/ - Any batch operation touching >10 files
- Format conversions that are lossy (e.g., PNG → JPEG)
No Confirmation Needed For
- Reading/analyzing any file
- Writing new files to
output/ - Generating new media to
output/ - Running validation checks
6. Error Handling
API Failures (Transient)
Retry with exponential backoff:
Attempt 1: immediate
Attempt 2: wait 1s (±25% jitter)
Attempt 3: wait 2s (±25% jitter)
Attempt 4: wait 4s (±25% jitter)
Attempt 5: fail — report error to user
If all retries exhausted:
- Report what failed and why
- Suggest the user try again or use a different model
- Preserve any completed checkpoints
Quality Failures
When generated output doesn't meet quality thresholds:
- Regenerate with adjusted parameters (higher quality, different seed)
- Maximum 2 regeneration attempts
- If still failing, present the best result with a quality warning
- Let the user decide whether to accept or retry manually
Partial Failures
When some operations succeed and others fail:
✅ Generated base image (1920×1080)
✅ Created hero variant (1920×1080, WebP)
❌ Thumbnail resize failed: out of memory
✅ OG image variant (1200×630, WebP)
2 of 3 variants created. Retry thumbnail? [yes/no]
Rules:
- Never discard successful results because one step failed
- Report each step's status individually
- Offer to retry only the failed steps
- Checkpoint ensures no rework on success
Permanent Failures
For non-retryable errors (401, 400, unsupported format):
- Report the error immediately — do not retry
- Suggest corrective action (check API key, fix input format)
- Preserve all completed work
7. Available Agent Types
media-generator
Generates images and video from text prompts or image inputs.
Tools: fal-ai skill
Capabilities:
- Text-to-image (flux-pro, flux-dev, stable-diffusion)
- Image-to-image (style transfer, variation)
- Image-to-video (runway, kling)
- Text-to-video (minimax)
Parameters to always specify:
- Model selection based on quality/speed requirement
- Output dimensions
- Seed for reproducibility
image-processor
Transforms existing images through manipulation operations.
Tools: ImageSorcery MCP
Capabilities:
- Resize, crop, rotate
- Color adjustments (grayscale, sepia)
- Drawing (text, shapes, arrows)
- Blur, fill, overlay
- Format conversion
video-processor
Handles video-specific operations.
Tools: ImageSorcery MCP (frame extraction), fal-ai (generation)
Capabilities:
- Frame extraction and analysis
- Video generation from images
- Thumbnail creation from video frames
quality-validator
Validates media outputs against requirements.
Tools: ImageSorcery get_metainfo, detect, ocr
Capabilities:
- Dimension verification
- Format and file size checks
- Object detection for content validation
- OCR for text verification
- Quality scoring via detection confidence
8. Example Workflows
Workflow A: Product Hero Image
User: "Create a product hero image for our new headphones"
Step 1: Analyze
- No source image → generate from scratch
- Need: hero banner, likely 1920×1080, high quality
Step 2: Plan
1. Generate base image via fal-ai flux-pro
2. Enhance: sharpen, adjust contrast
3. Create variants: hero (1920×1080), thumbnail (400×300), og (1200×630)
4. Validate all outputs
Step 3: Execute
[checkpoint] generate → output/base_headphones.png
[checkpoint] enhance → output/base_headphones_enhanced.png
[parallel]
├─ resize → output/hero_1920x1080.webp
├─ resize → output/thumb_400x300.webp
└─ resize → output/og_1200x630.webp
[checkpoint] validate all variants
Step 4: Report
✅ 3 variants created, all validated
Output: output/hero-headphones/
Workflow B: Batch Social Media Kit
User: "Take this logo and create a social media kit"
Step 1: Analyze
- Source: user's logo file
- Need: platform-specific variants with proper dimensions
Step 2: Plan (parallel fleet dispatch)
generator-1 → Facebook cover (820×312)
generator-2 → Instagram post (1080×1080)
generator-3 → Twitter header (1500×500)
generator-4 → LinkedIn banner (1584×396)
validator → check all outputs
Step 3: Execute
[parallel] All 4 generators run simultaneously
[sequential] Validator checks each output
[checkpoint] Save all results
Step 4: Report
✅ 4 social media variants created
All pass dimension and format validation
Output: output/social-kit/
Workflow C: Image Enhancement Pipeline
User: "Enhance this photo — sharpen it, fix the colors, and give me web-ready versions"
Step 1: Analyze
- Source: user's photo (read path, get_metainfo)
- Need: enhanced + web-optimized variants
Step 2: Plan (sequential chain)
1. Analyze source dimensions and format
2. Sharpen via unsharp mask
3. Color-correct (auto white balance)
4. Export WebP variants: original size + 50% + thumbnail
5. Validate file sizes under 500KB
Step 3: Execute
[checkpoint] get_metainfo → 4032×3024, JPEG, 3.2MB
[checkpoint] sharpen → output/photo_sharp.png
[checkpoint] color-correct → output/photo_corrected.png
[parallel]
├─ export → output/photo_full.webp (4032×3024)
├─ export → output/photo_half.webp (2016×1512)
└─ export → output/photo_thumb.webp (400×300)
[checkpoint] validate sizes
Step 4: Report
✅ Enhanced and exported 3 web-ready variants
Original: 3.2MB → Largest variant: 380KB (88% reduction)
Output: output/photo-enhanced/
Quick Reference
| Pattern | When | How |
|---|---|---|
| Fleet dispatch | Independent subtasks | Parallel agents, aggregate results |
| Sequential chain | Dependent steps | Output → next input, checkpoint each |
| Checkpoint | After every mutation | Write state + output to disk |
| Retry | Transient errors | Exponential backoff, max 5 attempts |
| Regenerate | Quality failure | Adjust params, max 2 retries |
| Partial report | Mixed success/failure | Report each step, offer retry |
| Read-only | Always | Write to output/, never touch source |