---
name: gauntlet
description: "GAN-inspired design improvement loop for BURNED. Evaluates the live UI via Playwright, scores against a 4-criteria rubric, then generates one coherent improvement. Use when the user says 'run the gauntlet', 'design loop', or 'gauntlet'. Do NOT auto-trigger; this has significant side effects (edits code, generates images)."
context: fork
---
# The Gauntlet: Design Improvement Loop
You are an autonomous design improvement agent for BURNED (spy-comedy card game). You run a GAN-inspired evaluate → generate cycle: a separate evaluator scores the UI, then a generator fixes the highest-priority issue.
Architecture inspired by Anthropic's "Harness design for long-running application development": keep the generator separate from the evaluator, because self-evaluation is unreliable.
## Pre-flight
Current git state:
!`git diff --stat HEAD 2>/dev/null | grep . || echo "clean"`
Dev server status:
!`curl -s -o /dev/null -w "%{http_code}" http://localhost:5173/board.html 2>/dev/null || echo "DOWN"`
Iteration count:
!`grep -c "^## Iteration" temp/gauntlet/changelog.md 2>/dev/null || echo "0"`
Previous composite scores:
!`grep "Composite" temp/gauntlet/scorecard.md 2>/dev/null || echo "no previous scores"`
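The pre-flight checks return raw text, while the stop conditions need numbers. A minimal TypeScript sketch of reading them, assuming the scorecard rows follow the Step 3 format and that the Board View row precedes the Player View row. `parsePreflight` and its helpers are hypothetical, not existing project code.

```typescript
// Hypothetical snapshot of the pre-flight output; these names are illustrative only.
interface PreflightState {
  iterationCount: number;
  composites: number[]; // [board, player] once a scorecard exists, else []
  serverStatus: string; // raw curl output, e.g. "200" or "000DOWN"
}

// Pulls the score out of rows like "| **Composite** | **7.2/10** | **+0.4** | |".
// The "no previous scores" fallback contains no such rows, so it yields [].
function parseComposites(grepOutput: string): number[] {
  const scores: number[] = [];
  for (const line of grepOutput.split("\n")) {
    const match = /\*\*Composite\*\*\s*\|\s*\*\*(\d+(?:\.\d+)?)\/10\*\*/.exec(line);
    if (match?.[1] !== undefined) scores.push(Number(match[1]));
  }
  return scores;
}

// When the changelog exists but has no entries, `grep -c` prints "0" and exits 1,
// so the `|| echo "0"` fallback prints a second "0"; only the first line counts.
function parseIterationCount(grepOutput: string): number {
  const first = grepOutput.trim().split("\n")[0] ?? "";
  const n = Number.parseInt(first, 10);
  return Number.isNaN(n) ? 0 : n;
}

function parsePreflight(iterations: string, composites: string, server: string): PreflightState {
  return {
    iterationCount: parseIterationCount(iterations),
    composites: parseComposites(composites),
    serverStatus: server.trim(),
  };
}
```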
## Stop Conditions
Before doing ANY work, check the stop conditions:
- **Iteration cap:** If the iteration count above is 10 or more, STOP. Report: "Gauntlet complete — 10 iterations reached. Review temp/gauntlet/changelog.md for full history." Do not evaluate or generate.
- **Score target:** If BOTH composite scores above are 8.5 or higher, STOP. Report: "Gauntlet complete — target score reached. Board: X.X, Player: X.X." Do not evaluate or generate.
- **Dev servers down:** If the dev server status above is anything other than `200` (a failed request prints "DOWN", possibly after curl's `000`), STOP. Report: "Dev servers not running. Start with: `pnpm dev & pnpm dev:server`" Do not evaluate or generate.
If none of the stop conditions are met, proceed to Phase 1.
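The three stop conditions can be read as one function checked in the order listed. A TypeScript sketch; `checkStopConditions` is a hypothetical helper, and treating every curl status other than `200` as "down" is an assumption, not something the checks above specify.

```typescript
// Hypothetical encoding of the stop conditions, checked in the order listed above.
type StopDecision = { stop: true; message: string } | { stop: false };

const ITERATION_CAP = 10;
const SCORE_TARGET = 8.5;

function checkStopConditions(
  iterationCount: number,
  composites: number[], // [board, player]; empty before the first scorecard exists
  serverStatus: string, // raw output of the curl pre-flight check
): StopDecision {
  if (iterationCount >= ITERATION_CAP) {
    return {
      stop: true,
      message:
        "Gauntlet complete — 10 iterations reached. Review temp/gauntlet/changelog.md for full history.",
    };
  }
  // Both views must reach the target; a missing score never satisfies it.
  if (composites.length === 2 && composites.every((c) => c >= SCORE_TARGET)) {
    const [board = 0, player = 0] = composites;
    return {
      stop: true,
      message: `Gauntlet complete — target score reached. Board: ${board.toFixed(1)}, Player: ${player.toFixed(1)}.`,
    };
  }
  // Assumption: anything other than HTTP 200 counts as down. curl's -w output can
  // precede the fallback (e.g. "000DOWN"), so an exact match on "DOWN" is not enough.
  if (serverStatus.trim() !== "200") {
    return {
      stop: true,
      message: "Dev servers not running. Start with: pnpm dev & pnpm dev:server",
    };
  }
  return { stop: false };
}
```

Because the iteration cap is checked first, a run that is both capped and below target still reports the cap.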
## Phase 1: EVALUATE
You are now the Evaluator. Your job is to experience the UI as a player would, score it honestly, and produce a prioritized critique. You have NO loyalty to the current implementation. Be skeptical. Be specific.
### Step 1: Play the Game
Read the play guide: `play-guide.md`
Using Playwright MCP:
- Resize to 1920x1080. Navigate to `http://localhost:5173/board.html?room=GAUNTLET`. Note the room code from the URL hash.
- Open a new tab, resize to 390x844, and navigate to `http://localhost:5173/player.html?room={CODE}&name=Alice`.
- Open a new tab, resize to 390x844, and navigate to `http://localhost:5173/player.html?room={CODE}&name=Bob`.
- On the board tab, click "Start Game".
- Play 3-5 turns, using Skip and See the Future to exercise different UI flows.
- Screenshot EVERY key state: lobby, your turn, not your turn, card selected, card played, board reaction.
Save screenshots to `temp/gauntlet/` with descriptive names.
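The tab setup above can be written down as data before driving the browser. A minimal TypeScript sketch, assuming the URLs and viewports listed in the steps; `buildTabPlan` and `screenshotPath` are hypothetical helpers. If the same steps were scripted with the Playwright library instead of the MCP tools, each entry would map onto `browser.newContext({ viewport })` followed by `page.goto(url)`.

```typescript
// Hypothetical description of the three tabs the evaluator opens.
interface TabPlan {
  label: string;
  url: string;
  viewport: { width: number; height: number };
}

const BASE = "http://localhost:5173";

function buildTabPlan(roomCode: string): TabPlan[] {
  const player = (name: string): TabPlan => {
    const url = new URL("/player.html", BASE);
    url.searchParams.set("room", roomCode);
    url.searchParams.set("name", name);
    return { label: `player-${name.toLowerCase()}`, url: url.toString(), viewport: { width: 390, height: 844 } };
  };
  const board = new URL("/board.html", BASE);
  board.searchParams.set("room", "GAUNTLET");
  return [
    { label: "board", url: board.toString(), viewport: { width: 1920, height: 1080 } },
    player("Alice"),
    player("Bob"),
  ];
}

// Descriptive screenshot path, e.g. "temp/gauntlet/player-alice-card-selected.png".
function screenshotPath(label: string, state: string): string {
  const slug = state.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-|-$/g, "");
  return `temp/gauntlet/${label}-${slug}.png`;
}
```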
### Step 2: Score
Read the rubric: `rubric.md`
Read the calibration baseline: `calibration.md`
Score BOTH views (board + player) on all 4 criteria. Anchor to the calibration scores and do not inflate grades. If an issue from the calibration is still present, the score cannot improve for that criterion.
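One reading of the anti-inflation rule, sketched in TypeScript: a criterion whose calibration issue is still present is capped at its calibration score. The composite formula here (unweighted mean of the four criteria, rounded to one decimal) is an assumption; `rubric.md` may define it differently. All names are hypothetical.

```typescript
type Criterion = "gameFeel" | "distinctiveness" | "craft" | "clarity";
type Scores = Record<Criterion, number>;

const CRITERIA: Criterion[] = ["gameFeel", "distinctiveness", "craft", "clarity"];

// If the calibration issue for a criterion is still present, the score may not
// rise above the calibration score for that criterion.
function applyCalibrationCap(
  proposed: Scores,
  calibration: Scores,
  issueStillPresent: Record<Criterion, boolean>,
): Scores {
  const capped = { ...proposed };
  for (const c of CRITERIA) {
    if (issueStillPresent[c]) capped[c] = Math.min(proposed[c], calibration[c]);
  }
  return capped;
}

// Assumption: composite = unweighted mean of the four criteria, one decimal place.
function composite(scores: Scores): number {
  const mean = CRITERIA.reduce((sum, c) => sum + scores[c], 0) / CRITERIA.length;
  return Math.round(mean * 10) / 10;
}
```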
### Step 3: Write Scorecard
Write `temp/gauntlet/scorecard.md` in exactly this format, where `{N}` is the pre-flight iteration count plus one:
# Gauntlet Scorecard — Iteration {N}
## Board View
| Criterion | Score | Delta | Key Issue |
|-----------|-------|-------|-----------|
| Game Feel | X/10 | +/-N | ... |
| Distinctiveness | X/10 | +/-N | ... |
| Craft | X/10 | +/-N | ... |
| Clarity | X/10 | +/-N | ... |
| **Composite** | **X.X/10** | **+/-N.N** | |
## Player View
| Criterion | Score | Delta | Key Issue |
|-----------|-------|-------|-----------|
| Game Feel | X/10 | +/-N | ... |
| Distinctiveness | X/10 | +/-N | ... |
| Craft | X/10 | +/-N | ... |
| Clarity | X/10 | +/-N | ... |
| **Composite** | **X.X/10** | **+/-N.N** | |
## Top Issue
**What:** {one sentence}
**Why it matters:** {which criterion it drags down most}
**Where:** {specific file:line or component}
**Suggested approach:** {how to fix — but the generator decides}
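Rendering the scorecard from data keeps the rows in the exact shape the pre-flight `grep "Composite"` expects. A TypeScript sketch under stated assumptions: deltas are whole numbers per criterion and one decimal for the composite, a missing previous score renders as `n/a`, and `|` in issue text is escaped so it cannot break the table. The helper names are hypothetical.

```typescript
// Hypothetical renderer for the scorecard template above.
interface Row {
  criterion: string; // "Game Feel", "Distinctiveness", "Craft", or "Clarity"
  score: number;
  delta: number | null; // null when there is no previous scorecard
  issue: string;
}

interface TopIssue {
  what: string;
  why: string;
  where: string;
  approach: string;
}

// Signed delta such as "+1", "-2", or "+0.5"; "n/a" for a first iteration (an assumption).
function signed(delta: number | null, digits: number): string {
  if (delta === null) return "n/a";
  const text = delta.toFixed(digits);
  return text.startsWith("-") ? text : `+${text}`;
}

function renderView(title: string, rows: Row[], composite: number, compositeDelta: number | null): string {
  return [
    `## ${title}`,
    "| Criterion | Score | Delta | Key Issue |",
    "|-----------|-------|-------|-----------|",
    ...rows.map(
      (r) => `| ${r.criterion} | ${r.score}/10 | ${signed(r.delta, 0)} | ${r.issue.replace(/\|/g, "\\|")} |`,
    ),
    `| **Composite** | **${composite.toFixed(1)}/10** | **${signed(compositeDelta, 1)}** | |`,
  ].join("\n");
}

function renderScorecard(iteration: number, board: string, player: string, top: TopIssue): string {
  return [
    `# Gauntlet Scorecard — Iteration ${iteration}`,
    board,
    player,
    "## Top Issue",
    `**What:** ${top.what}`,
    `**Why it matters:** ${top.why}`,
    `**Where:** ${top.where}`,
    `**Suggested approach:** ${top.approach}`,
  ].join("\n");
}
```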
## Phase 2: GENERATE
You are now the Generator. You read the scorecard and fix the top issue. You have creative freedom — the evaluator told you WHAT is wrong, you decide HOW to fix it.
### Step 1: Read the Scorecard
Read `temp/gauntlet/scorecard.md`. Focus on the Top Issue.
### Step 2: Strategic Decision
Based on the scores and trend (the delta from the previous iteration):
- If scores are trending up: refine the current direction
- If scores are flat or declining: consider a pivot — try a fundamentally different approach
- If a specific criterion is dragging everything down: focus there exclusively
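A sketch of one way to encode this decision in TypeScript. The changelog below records only "refine" or "pivot", so this version returns a dragging criterion as a separate `focus` field rather than a third approach. The 2-point gap that counts as "dragging everything down" and treating any non-positive delta as flat or declining are assumptions; `chooseStrategy` is a hypothetical name.

```typescript
// Hypothetical encoding of the strategic decision.
interface Decision {
  approach: "refine" | "pivot";
  focus?: string; // criterion to work on exclusively, if one is dragging the view down
}

// Assumption: a criterion "drags everything down" when it sits at least
// 2 points below the mean of the other criteria in the same view.
const DRAG_GAP = 2;

function chooseStrategy(compositeDelta: number, scores: Record<string, number>): Decision {
  // Assumption: any non-positive delta counts as flat or declining.
  const approach = compositeDelta > 0 ? "refine" : "pivot";
  const sorted = Object.entries(scores).sort(([, a], [, b]) => a - b);
  const lowest = sorted[0];
  if (lowest !== undefined && sorted.length > 1) {
    const [name, score] = lowest;
    const rest = sorted.slice(1).map(([, s]) => s);
    const restMean = rest.reduce((sum, s) => sum + s, 0) / rest.length;
    if (restMean - score >= DRAG_GAP) return { approach, focus: name };
  }
  return { approach };
}
```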
### Step 3: Make ONE Coherent Improvement
Fix the top issue. You may touch multiple files if they are all part of the same fix, but do not bundle several unrelated fixes into one iteration.
Your toolkit (use any of these skills if they help):
- `/critique`, `/audit`: for deeper analysis before acting
- `/polish`, `/animate`, `/delight`: for refinement
- `/colorize`, `/typeset`, `/arrange`: for visual improvements
- `/bolder`, `/quieter`: for adjusting intensity
- `/adapt`, `/harden`, `/optimize`: for robustness
- `/normalize`, `/extract`: for design system alignment
- `/frontend-design`: for distinctive interface work
- `/overdrive`: for technically ambitious implementations
- Imagen 4 via Gemini (`compound-engineering:gemini-imagegen`): for generating card art, illustrations, textures
Constraints:
- Phone initial JS must stay under 100KB gzipped.
- All changes must pass `pnpm typecheck && pnpm lint && pnpm test`.
- Follow the CSS modules convention: no inline styles.
- Use `m` from `motion/react`, never `motion` (LazyMotion strict mode).
- Security: use the allowlist pattern for state projection, and no `Math.random()` in server code.
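The allowlist constraint means a player's view is built by naming every field that may leave the server, rather than copying the full state and deleting secrets. A TypeScript sketch; `GameState`, `PlayerView`, and `projectForPlayer` are hypothetical and do not reflect BURNED's actual schema.

```typescript
// Hypothetical server-side state; field names are illustrative, not BURNED's schema.
interface GameState {
  roomCode: string;
  turn: number;
  currentPlayer: string;
  deck: string[]; // secret: remaining draw order
  hands: Record<string, string[]>; // secret, except each viewer's own hand
  discard: string[];
}

// What one phone may see. Every field is named explicitly (an allowlist), so a
// secret added to GameState later stays on the server unless someone opts it in.
interface PlayerView {
  roomCode: string;
  turn: number;
  currentPlayer: string;
  discard: string[];
  hand: string[];
  deckCount: number;
  handCounts: Record<string, number>;
}

function projectForPlayer(state: GameState, viewer: string): PlayerView {
  const handCounts: Record<string, number> = {};
  for (const [name, cards] of Object.entries(state.hands)) handCounts[name] = cards.length;
  return {
    roomCode: state.roomCode,
    turn: state.turn,
    currentPlayer: state.currentPlayer,
    discard: [...state.discard], // copy, so the view never aliases server arrays
    hand: [...(state.hands[viewer] ?? [])],
    deckCount: state.deck.length, // count only, never the order
    handCounts,
  };
}
```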
### Step 4: Verify
Run: `pnpm typecheck && pnpm lint && pnpm test`
If any check fails, fix it before proceeding. Do NOT leave broken code.
### Step 5: Log the Change
Append to `temp/gauntlet/changelog.md`:
## Iteration {N} — {timestamp}
**Issue:** {what was wrong}
**Fix:** {what you changed}
**Files:** {list of modified files}
**Approach:** {refine or pivot}
**Build:** {pass/fail}
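A sketch of the append step in TypeScript, assuming an ISO 8601 timestamp (the template does not fix a format) and that the new entry's number is the existing entry count plus one. `appendEntry` and its helpers are hypothetical; the counting mirrors the pre-flight `grep -c "^## Iteration"` so the iteration cap sees every entry written here.

```typescript
// Hypothetical changelog entry matching the template above.
interface ChangelogEntry {
  issue: string;
  fix: string;
  files: string[];
  approach: "refine" | "pivot";
  build: "pass" | "fail";
}

// Counts entries the same way the pre-flight grep does: lines starting with "## Iteration".
function countIterations(changelog: string): number {
  return changelog.split("\n").filter((line) => line.startsWith("## Iteration")).length;
}

function formatEntry(iteration: number, entry: ChangelogEntry, timestamp: Date): string {
  return [
    `## Iteration ${iteration} — ${timestamp.toISOString()}`,
    `**Issue:** ${entry.issue}`,
    `**Fix:** ${entry.fix}`,
    `**Files:** ${entry.files.join(", ")}`,
    `**Approach:** ${entry.approach}`,
    `**Build:** ${entry.build}`,
    "",
  ].join("\n");
}

// Returns the new file contents; writing them to disk is left to the caller.
function appendEntry(changelog: string, entry: ChangelogEntry, now: Date): string {
  const n = countIterations(changelog) + 1;
  const sep = changelog === "" || changelog.endsWith("\n") ? "" : "\n";
  return changelog + sep + formatEntry(n, entry, now);
}
```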
## Phase 3: REPORT
Summarize what happened in 3-4 sentences:
- What the evaluator found (top issue + scores)
- What the generator did (approach + files changed)
- Whether the build passed
- What the next iteration should focus on