image-generator - SKILL.md Agent Skill

name: image-generator

description: |

Generate professional visuals using Gemini via browser automation with 6-gate quality control.

Use when creating chapter illustrations, diagrams, or teaching visuals.

NOT for stock photos or decorative images.

dependencies:

browser-use

Image Generator

Generate professional teaching visuals using Gemini 3 with multi-turn reasoning partnership.

Quick Start


# 1. Start browser (via browser-use skill)

bash .claude/skills/browser-use/scripts/start-server.sh



# 2. Navigate to Gemini

# Use browser_navigate to https://gemini.google.com/



# 3. Generate image from creative brief

# Paste creative brief → Wait 30-35s → Verify 6 gates → Download

Core Principles

Reasoning over prediction - Creative briefs (Story/Intent/Metaphor) activate reasoning; pixel specs don't
Multi-turn partnership - Teach Gemini your standards through principle-based feedback
6-gate quality - Explicit pass/fail before download
Autonomous batch - No permission-asking between visuals

Input: Creative Brief Format

Receive from visual-asset-workflow:


## The Story

[Narrative about what's visualized]



## Emotional Intent

[What it should FEEL like]



## Visual Metaphor

[Universal concept for instant comprehension]



## Subject / Composition / Action / Location / Style

[Gemini 3 prompt structure]



## Color Semantics

Blue (#2563eb) = Authority | Green (#10b981) = Execution



## Typography Hierarchy

Largest: Key insight | Medium: Supporting | Smallest: Context

Do NOT convert to pixel specs - use as-is to activate reasoning.

Workflow (Per Visual)

| Step | Action | Tool |

|------|--------|------|

| 1 | Navigate to gemini.google.com | browser_navigate |

| 2 | Select "🍌 Create Image" | browser_click |

| 3 | Paste creative brief | browser_type |

| 4 | Wait 30-35 seconds | browser_wait_for |

| 5 | Verify 6 gates (below) | Visual inspection |

| 6 | If fail: Iterate with feedback (max 3) | browser_type |

| 7 | If pass: Download full size | browser_click |

| 8 | Copy to apps/learn-app/static/img/part-{N}/chapter-{NN}/ | Bash |

| 9 | Embed in lesson immediately | Edit |

| 10 | NEW CHAT for next visual | browser_navigate |

Quality Gates (ALL Must Pass)

| Gate | Criterion | Fail Action |

|------|-----------|-------------|

| 1. Spelling | 99% accuracy (Y-Combinator, Kubernetes) | Iterate |

| 2. Layout | Proportions match prompt (2×2 not 3×1) | Iterate |

| 3. Color | Brand colors match (#2563eb not #002050) | Iterate |

| 4. Typography | Largest = key concept (not decoration) | Iterate |

| 5. Teaching | <5 sec concept grasp at target proficiency | Iterate |

| 6. Uniqueness | Not duplicate of existing chapter image | New chat |

Decision: ALL pass → Download | ANY fail → Iterate (max 3 tries)

Iteration: Principle-Based Feedback

When gate fails, provide teaching feedback:


Gate 4 FAILED: Typography hierarchy incorrect



The largest text is "$100K" (supporting detail) but should be "$3T"

(key insight students must grasp).



Increase '$3T' to dominant size. Reduce '$100K' to supporting size.

Information importance drives sizing.

Batch Mode

When invoked with "generate all visuals":


For EACH visual in list:

  A. NEW CHAT (context isolation)

  B. Generate (paste brief)

  C. Verify 6 gates

  D. Iterate if needed (max 3)

  E. Download when pass

  F. Embed in lesson

  G. Log "✅ N/M"

  H. NEXT (no stopping)

Never ask: "Continue?" "Pause here?" "Review?"

Report at END only:


BATCH COMPLETE

✅ Generated: 16/18

⚠️ Deferred: 2 (quality issues)

Location: apps/learn-app/static/img/part-{N}/

Proficiency Limits

| Level | Max Elements | Grasp Time |

|-------|--------------|------------|

| A2 | 5-7 | <5 sec |

| B1 | 7-10 | <10 sec |

| C2 | No limit | N/A |

Token Conservation (Batch Mode)

For >8 visuals, condense briefs:

Original (250 tokens):


"Top Layer shows Coordinator at center top with label 'Orchestrator'

featuring conductor icon, with role 'Strategic oversight'..."

Condensed (80 tokens):


"Top Layer - Coordinator: Center top, 'Orchestrator' (conductor),

Role: 'Strategic oversight', Gold (#fbbf24), Large hexagon."

Keep: Story, Intent, Metaphor, Colors, Reasoning

Condense: Long examples → Short labels

Anti-Patterns

| Don't | Why |

|-------|-----|

| Accept first output without 6 gates | Quality standard violation |

| Ask permission between batch items | Breaks autonomous agency |

| Convert briefs to pixel specs | Defeats reasoning activation |

| Skip embedding step | Creates orphan images |

| Reuse same chat for next visual | Context contamination |

Session Interruption

If session ends mid-batch, create checkpoint:


# Checkpoint: Part {N}

Status: INTERRUPTED at 8/18



## Completed:

- ✅ Image 1: filename (embedded lesson-01.md)

- ✅ Image 2: filename (embedded lesson-02.md)



## Remaining:

- ⏳ Image 8: filename

On continuation: Read checkpoint → Resume → Update incrementally

Success Indicators

✅ All 6 gates verified before download
✅ Batch completion without permission-asking
✅ Principle-based iteration feedback
✅ Images organized by part/chapter
✅ Immediate embedding (no orphans)
✅ >85% production-ready rate