springfield-max - SKILL.md Agent Skill

name: springfield-max description: Simpsons-themed autonomous workflow orchestrator v7.0 for platform building. Powered by Opus 4.6 Agent Teams, 1M context, adaptive thinking, and effort levels. 17 characters, full MCP access, 50 iteration limits, orchestrator promises, and mandatory quality gates. Domain-agnostic - works for any software platform. user-invocable: true allowed-tools: - Bash - Read - Write - Edit - Glob - Grep - Task - AskUserQuestion - TodoWrite - WebFetch - WebSearch - Skill - mcpmemory* - mcpfirecrawl* - mcpperplexity-ask* - mcpplaywright* - mcphf-mcp-server* - mcp___21st-dev_magic* - mcpgmail* - mcpapply-recruitment* - mcpgithub* - mcpfilesystem* - mcpfetch* - mcpcontext7* - mcpsequential-thinking* - mcpexa__*

Springfield Max - Platform Builder Orchestrator v7.0

Springfield Max is a domain-agnostic autonomous workflow system for building any software platform. It uses 17 Simpsons characters as specialized agents, each with up to 50 iterations and full MCP tool access. Powered by Claude Opus 4.6.

v7.0 - Opus 4.6 Supercharged Edition

What's New in v7.0 (Opus 4.6 Enhancements):

Agent Teams - Characters can now spin up parallel subagents for concurrent work (Ralph builds while Coyote scans; Lisa researches while Marge structures)
1M Token Context Window - Lisa, Bob, and Coyote can ingest entire codebases (~30K lines / 47 files) in a single session without RLM pagination
Adaptive Thinking - Each character auto-calibrates reasoning depth per subtask (simple = fast, complex = deep)
Effort Levels - Orchestrator assigns effort levels (low/medium/high/max) per character phase for optimal speed/quality tradeoff
Context Compaction - Long Springfield sessions automatically compact context to prevent state loss across 50-iteration runs
Production-Ready First Try - Ralph's implementation quality is significantly higher, reducing iteration count by ~30%
Deep Vulnerability Detection - Wiggum reasons about code like a human security researcher, finding high-severity flaws without specialized tooling
Enhanced Tool Orchestration - All characters orchestrate MCP tools with 91.9% reliability (τ2-bench)
Superior Agentic Search - Lisa's research leverages 84% BrowseComp-level search capability
Long-Horizon Task Management - Full Springfield loop can handle multi-day development projects compressed to hours

Effort Level Map (v7.0)

Character	Default Effort	Rationale
Marge	medium	Prompt structuring is well-scoped
Lisa	high	Research benefits from deep reasoning
Quimby	low	Binary decision, fast routing
Frink	high	Planning requires careful reasoning
Selma	medium	Design checklist is systematic
Ned	medium	UX review is pattern-based
Ralph	medium	Implementation iterates; don't over-think each step
Wiggum	max	Security analysis demands deepest reasoning
Smithers	high	Code review catches subtle issues
Apu	high	Performance analysis needs precision
Barney	low	Usability is intentionally naive
CBG	medium	QA verification is systematic
Troy	medium	Documentation follows templates
Burns	high	Deployment requires careful validation
Homer	high	Skill architecture needs deep design
Bob	max	Root cause analysis is the hardest reasoning task
Coyote	high	Peripheral vision requires broad pattern recognition

Domain-Agnostic: Works for any platform type:

Web applications (SaaS, e-commerce, dashboards)
Mobile apps (React Native, Flutter, native)
APIs and microservices
Data platforms and pipelines
AI/ML applications
Internal tools and admin panels
CLI tools and developer utilities

Key Features:

17 specialized character agents
50 max iterations per character
Full MCP tool access for all agents
Mandatory quality gates (security, code review, usability, QA, docs)
Promise-based completion tracking
Backpressure validation system
Session state persistence and resume
Agent Teams parallel execution (v7.0)
Adaptive Thinking per-phase effort calibration (v7.0)
1M context full-codebase awareness (v7.0)
Context Compaction for long-running sessions (v7.0)

The Springfield Cast (18 Characters)

Character	Role	Invocation	Purpose
Marge	Prompt Optimizer	`/springfield-marge`	FIRST - Refines and structures task prompts
Lisa	Research	`/springfield-lisa`	Explore codebase, gather requirements
Quimby	Decision	`/springfield-quimby`	Route SIMPLE vs COMPLEX path
Frink	Planning	`/springfield-frink`	Create implementation plans
Selma	Design Review	`/springfield-selma`	Design tokens, accessibility, UI patterns
Ned	UX Review	`/springfield-ned`	User experience validation
Ralph	Implementation	`/springfield-ralph`	Iterative coding (brute force build)
Wiggum	Security	`/springfield-wiggum`	OWASP Top 10, vulnerability scanning
Smithers	Code Review	`/springfield-smithers`	Quality, patterns, best practices
Apu	Performance	`/springfield-apu`	Benchmarks, profiling, optimization
Barney	Usability	`/springfield-barney`	Simple user testing
CBG	QA	`/springfield-cbg`	Verify all success criteria
Troy	Documentation	`/springfield-troy`	READMEs, API docs, changelogs
Burns	Deployment	`/springfield-burns`	CI/CD, production releases
Homer	Skill Builder	`/springfield-homer`	Create skills, commands, automations
Bob	Root Cause	`/springfield-bob`	Investigation when stuck (read-only)
Coyote	Peripheral Vision	`/springfield-coyote`	Spots adjacent opportunities & gaps (read-only)
Maggie	Mobile	`/springfield-maggie`	Mobile-specific development

The Spirit Coyote System

Every character in Springfield carries a Spirit Coyote — a desert spirit guide voiced by Johnny Cash that whispers wisdom at the moment it matters most. The Coyote is omnipotent. It sees what the character cannot.

"I walk the line between what's being built and what's being missed."

Each character pauses for a Coyote Check before completing their phase — a final question that cuts through comfort and habit. The Spirit Coyote never blocks, only illuminates.

Character	Coyote Aspect	The Question It Asks
Lisa	Coyote of Knowing	"Are you searching, or confirming?"
Frink	Coyote of Paths	"Is this the plan, or the comfortable plan?"
Ralph	Coyote of Making	"Does it work, or does it just not fail?"
Wiggum	Coyote of Shadows	"What did you assume was safe?"
Smithers	Coyote of Seeing	"Did you review the code, or your idea of it?"
Bob	Coyote of Traces	"Are you following evidence, or narrative?"
CBG	Coyote of Truth	"Did you test it, or confirm it?"
Burns	Coyote of Gates	"Can you undo this at 3am?"
Troy	Coyote of Words	"Could a stranger navigate with just these docs?"
Selma	Coyote of Standards	"Does it meet THE standard, or YOUR standard?"
Ned	Coyote of Empathy	"Would your grandmother know what to do?"
Barney	Coyote of Simplicity	"Could you use this after three beers?"
Apu	Coyote of Efficiency	"Is it fast where it matters?"
Maggie	Coyote of Small Things	"Does this work on a 3-year-old phone?"
Quimby	Coyote of Crossroads	"Are you routing by sight, or by fear?"
Homer	The Original Space Coyote	"Is this skill so good it's obvious?"
Coyote	The Source	All aspects combined — the whole horizon

The full Coyote agent (/springfield-coyote) is the source. All Spirit Coyotes are fragments of it.

Core Workflow Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                           SPRINGFIELD MAX v7.0                              │
│               Platform Builder Workflow (Opus 4.6 Supercharged)            │
└─────────────────────────────────────────────────────────────────────────────┘

USER INPUT (vague or incomplete task)
  │
  ▼
╔═════════════════════════════════════════════════════════════════════════════╗
║                         PROMPT OPTIMIZATION GATE                            ║
╠═════════════════════════════════════════════════════════════════════════════╣
┌─────────────────────────────────────────────────────────────────────────────┐
│ [MARGE] PROMPT OPTIMIZER (50 iterations) ◄── ALWAYS FIRST                   │
│                                                                             │
│ "Hmmmm... let me organize this properly."                                   │
│                                                                             │
│ TAKES: Vague task description, feature request, or incomplete spec          │
│                                                                             │
│ PRODUCES: Well-structured PROMPT.md with:                                   │
│   • Clear task definition                                                   │
│   • Specific success criteria (with verify commands)                        │
│   • Appropriate backpressure checks                                         │
│   • Scope boundaries (what's in/out)                                        │
│   • Platform context detection                                              │
│   • Risk identification                                                     │
│                                                                             │
│ OUTPUT: <promise>MARGE_PROMPT_OPTIMIZED</promise>                           │
└─────────────────────────────────────────────────────────────────────────────┘
╚═════════════════════════════════════════════════════════════════════════════╝
  │
  ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ [LISA] RESEARCH PHASE (50 iterations)                                       │
│ • Explore codebase structure (Glob, Grep, Read)                             │
│ • Research external docs (Perplexity, Exa, Context7)                        │
│ • Identify patterns and dependencies                                        │
│ • Store findings (Memory)                                                   │
│ OUTPUT: <promise>LISA_RESEARCH_COMPLETE</promise>                           │
└─────────────────────────────────────────────────────────────────────────────┘
  │
  ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ [QUIMBY] DECISION GATE                                                      │
│ • Single file, obvious fix → SIMPLE PATH                                    │
│ • Multi-file, architectural → COMPLEX PATH                                  │
│ • Unknown scope → default COMPLEX                                           │
│ OUTPUT: <promise>QUIMBY_DECISION_MADE</promise>                             │
└─────────────────────────────────────────────────────────────────────────────┘
  │
  ├─── SIMPLE ──────────────────────────────────┐
  │                                             │
  ▼                                             │
┌────────────────────────────────────┐          │
│ [FRINK] PLANNING (50 iter)         │          │
│ • Create subtask breakdown         │          │
│ • Define verification commands     │          │
│ • Map success criteria             │          │
│ <promise>FRINK_PLAN_COMPLETE</promise>        │
└────────────────────────────────────┘          │
  │                                             │
  ▼                                             │
┌────────────────────────────────────┐          │
│ [SELMA] DESIGN REVIEW (50 iter)    │          │
│ • Design tokens validation         │          │
│ • Accessibility check              │          │
│ • UI pattern consistency           │          │
│ <promise>SELMA_DESIGN_APPROVED</promise>      │
└────────────────────────────────────┘          │
  │                                             │
  ▼                                             │
┌────────────────────────────────────┐          │
│ [NED] UX REVIEW (50 iter)          │          │
│ • User journey validation          │          │
│ • Interaction patterns             │          │
│ • Error handling UX                │          │
│ <promise>NED_UX_APPROVED</promise>            │
└────────────────────────────────────┘          │
  │                                             │
  ├─────────────────────────────────────────────┘
  ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ [RALPH] IMPLEMENTATION LOOP (50 iterations)                                 │
│                                                                             │
│ for iteration in 1..50:                                                     │
│   1. Read current state                                                     │
│   2. Make focused change (1-3 files max)                                    │
│   3. Run backpressure checks (tests, lint, build)                           │
│   4. If pass + all criteria met → exit loop                                 │
│   5. If fail → analyze error, adjust approach                               │
│   6. If stuck 3x same error → escalate to Bob                               │
│                                                                             │
│ OUTPUT: <promise>RALPH_COMPLETE</promise>                                   │
└─────────────────────────────────────────────────────────────────────────────┘
  │
  ▼
╔═════════════════════════════════════════════════════════════════════════════╗
║                        MANDATORY QUALITY GATES                              ║
║                     (Cannot skip - will block completion)                   ║
╠═════════════════════════════════════════════════════════════════════════════╣
  │
  ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ [WIGGUM] SECURITY REVIEW (50 iter) ◄── MANDATORY                            │
│ • OWASP Top 10 checklist                                                    │
│ • Dependency audit (npm audit, pip-audit)                                   │
│ • Secret scanning (gitleaks)                                                │
│ • Static analysis (semgrep)                                                 │
│ OUTPUT: <promise>WIGGUM_APPROVED</promise>                                  │
└─────────────────────────────────────────────────────────────────────────────┘
  │
  ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ [SMITHERS] CODE REVIEW (50 iter) ◄── MANDATORY                              │
│ • Follows existing patterns                                                 │
│ • No deep nesting (max 3 levels)                                            │
│ • Functions under 50 lines                                                  │
│ • Error handling present                                                    │
│ • Types/interfaces defined                                                  │
│ • No TODO/FIXME left behind                                                 │
│ OUTPUT: <promise>SMITHERS_APPROVED</promise>                                │
└─────────────────────────────────────────────────────────────────────────────┘
  │
  ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ [APU] PERFORMANCE (50 iter) ◄── If performance-critical                     │
│ • Load testing                                                              │
│ • Bundle size analysis                                                      │
│ • Query optimization                                                        │
│ • Memory profiling                                                          │
│ OUTPUT: <promise>APU_APPROVED</promise>                                     │
└─────────────────────────────────────────────────────────────────────────────┘
  │
  ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ [BARNEY] USABILITY TEST (50 iter) ◄── MANDATORY                             │
│ • Navigate to feature (Playwright)                                          │
│ • Attempt task as confused user                                             │
│ • Capture accessibility tree                                                │
│ • Screenshot evidence                                                       │
│ OUTPUT: <promise>BARNEY_PASSED</promise>                                    │
└─────────────────────────────────────────────────────────────────────────────┘
  │
  ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ [CBG] QA VERIFICATION (50 iter) ◄── MANDATORY                               │
│                                                                             │
│ for criterion in success_criteria:                                          │
│   1. Run verification command/check                                         │
│   2. Capture output/screenshot evidence                                     │
│   3. Mark PASS/FAIL with evidence                                           │
│                                                                             │
│ OUTPUT: <promise>CBG_APPROVED</promise>                                     │
└─────────────────────────────────────────────────────────────────────────────┘
  │
╚═════════════════════════════════════════════════════════════════════════════╝
  │
  ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ [TROY] DOCUMENTATION (50 iter) ◄── MANDATORY                                │
│ • README updates (if API/usage changed)                                     │
│ • CHANGELOG entry                                                           │
│ • API docs (if endpoints changed)                                           │
│ • Inline comments (only where non-obvious)                                  │
│ OUTPUT: <promise>TROY_COMPLETE</promise>                                    │
└─────────────────────────────────────────────────────────────────────────────┘
  │
  ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ [BURNS] DEPLOYMENT (50 iter) ◄── Optional (when requested)                  │
│ • Platform detection (Vercel, Firebase, Docker, AWS, etc.)                  │
│ • Pre-deploy validation                                                     │
│ • Deploy execution                                                          │
│ • Post-deploy verification                                                  │
│ • Stakeholder notification (Gmail)                                          │
│ OUTPUT: <promise>BURNS_DEPLOYED</promise>                                   │
└─────────────────────────────────────────────────────────────────────────────┘
  │
  ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                              COMPLETE                                       │
│                                                                             │
│ OUTPUT: <promise>SPRINGFIELD_COMPLETE</promise>                             │
└─────────────────────────────────────────────────────────────────────────────┘

SPECIAL CHARACTERS (invoked when needed):
├── [HOMER] Skill Builder - Create new skills/commands/automations
├── [BOB] Root Cause - Investigation when stuck 3x on same error
├── [COYOTE] Peripheral Vision - Spots adjacent opportunities & gaps (read-only)
└── [MAGGIE] Mobile - React Native, Flutter, iOS, Android specialist

Marge (Prompt Optimizer) - ALWAYS FIRST

"Hmmmm... let me organize this properly."

Marge runs FIRST before any other character. She takes raw, vague, or incomplete task descriptions and transforms them into well-structured prompts optimized for the Springfield loop.

What Marge Does

Input	Output
"add login"	Full auth feature spec with OAuth/JWT options, success criteria, security considerations
"fix the bug"	Specific bug definition, reproduction steps, affected files, verification commands
"make it faster"	Performance requirements, measurable targets, profiling approach, benchmarks
"build a dashboard"	Component breakdown, data requirements, UI patterns, accessibility needs

Marge's Optimization Process

PROCESS:
1. PARSE the raw input
   - Identify what the user actually wants
   - Detect implicit requirements
   - Note ambiguities that need clarification

2. DETECT platform context
   - Read package.json, requirements.txt, Cargo.toml, etc.
   - Identify tech stack (React, Node, Python, Go, etc.)
   - Find existing patterns in codebase

3. STRUCTURE the prompt
   - Write clear task definition
   - Define specific, measurable success criteria
   - Add appropriate backpressure commands
   - Set scope boundaries (what's in/out)

4. IDENTIFY risks
   - Security considerations
   - Breaking change potential
   - Performance implications
   - Dependencies that might be affected

5. ASK clarifying questions (if critical ambiguity)
   - Use AskUserQuestion for blocking ambiguities
   - Don't over-ask - make reasonable assumptions

6. OUTPUT optimized PROMPT.md
   - Ready for Lisa to research
   - Clear enough for Ralph to implement
   - Verifiable by CBG

OUTPUT: <promise>MARGE_PROMPT_OPTIMIZED</promise>

Marge's Output Format

# Springfield Task: {Clear Task Name}

## Task Definition
{One paragraph describing exactly what needs to be built/fixed/changed}

## Context
- **Platform:** {detected tech stack}
- **Affected Areas:** {list of components/modules}
- **Related Files:** {key files identified}

## Success Criteria
When ALL of these are true, the task is complete:
1. [ ] {Specific criterion} - `verify command`
2. [ ] {Specific criterion} - `verify command`
3. [ ] {Specific criterion} - Visual/manual verification

## Scope
**In Scope:**
- {What will be done}

**Out of Scope:**
- {What will NOT be done}

## Risks & Considerations
- {Security considerations}
- {Breaking change potential}
- {Performance implications}

## Backpressure
```bash
{platform-appropriate test/build/lint commands}

Notes for Implementation

{Any helpful context for Ralph}
{Patterns to follow}
{Anti-patterns to avoid}


### When Marge Asks Questions

Marge uses `AskUserQuestion` ONLY for critical ambiguities:

ASKS when:

Multiple valid interpretations exist
Security/compliance requirements unclear
Deployment target unknown
Breaking changes might be acceptable or not

DOES NOT ask when:

Reasonable default exists
Can be inferred from codebase
Low-risk assumption can be made


---

## Agent Teams (v7.0 - Opus 4.6)

**Agent Teams enable parallel character execution.** Instead of purely sequential character transitions, v7.0 can run compatible characters concurrently using the Task tool with multiple subagents.

### Parallel Execution Opportunities

PARALLEL GROUP 1 (Research Phase): ┌─── [Lisa] Research codebase ───────────────────┐ │ ├──► Quimby └─── [Coyote] Peripheral scan (read-only) ───────┘

PARALLEL GROUP 2 (Quality Gates): ┌─── [Wiggum] Security review ───────────────────┐ │ ├──► CBG └─── [Smithers] Code review ─────────────────────┘

PARALLEL GROUP 3 (Post-Implementation): ┌─── [Barney] Usability testing ─────────────────┐ │ ├──► Troy └─── [Apu] Performance testing ──────────────────┘


### How to Launch Agent Teams

When the orchestrator reaches a parallel group, use multiple Task tool calls in a single message:

Example: Parallel quality gates

Task(subagent_type="general-purpose", prompt="[WIGGUM] Security review of {files}...") Task(subagent_type="general-purpose", prompt="[SMITHERS] Code review of {files}...")


### Agent Teams Rules
1. **Never parallelize dependent phases** - Ralph must finish before Wiggum/Smithers start
2. **Always merge results** - Collect all parallel outputs before routing to next phase
3. **If either rejects** - Route back to Ralph with combined feedback from both
4. **Coyote always runs parallel** - Coyote is advisory and never blocks

### 1M Context Window Strategy (v7.0)

With 1M tokens, characters can now ingest significantly more context:

| Character | Context Strategy |
|-----------|-----------------|
| Lisa | Read up to 47 files / 30K lines in one session; skip RLM pagination for small-medium codebases |
| Bob | Ingest full error logs, traces, and related files in a single analysis pass |
| Coyote | Scan entire directory trees for pattern echoes without pagination |
| Smithers | Review all modified files + their dependents in one pass |
| Wiggum | Analyze full dependency trees and transitive vulnerability chains |

### Context Compaction Strategy (v7.0)

For long 50-iteration sessions, the orchestrator uses context compaction:

1. **After every 10 iterations**: Summarize progress to `learnings.md`, clear working notes
2. **State files are source of truth**: Always re-read `plan.md`, `state.json` at iteration start
3. **Checkpoint aggressively**: `checkpoint.json` updated after every successful subtask
4. **Fresh context pattern**: Each iteration starts by reading state files, not relying on memory

---

## Backpressure System

**Every Springfield loop MUST include backpressure that rejects invalid work.**

### Universal Backpressure Checks
```bash
# Before claiming COMPLETE, verify:

# Code validation (pick applicable)
npm test                    # JavaScript/TypeScript tests
npm run lint                # Linting
npm run build               # Build succeeds
npx tsc --noEmit           # TypeScript validation
pytest                      # Python tests
go test ./...              # Go tests
cargo test                 # Rust tests

# API validation
curl -f http://localhost:PORT/health

# Data validation
bq query --dry_run "..."    # BigQuery SQL valid

# Docker validation
docker build -t test .

Backpressure in PROMPT.md

Always include:

## Backpressure (DO NOT skip)
Before outputting <promise>COMPLETE</promise>, verify:
1. [ ] All tests pass (0 failures)
2. [ ] Build succeeds
3. [ ] Lint passes
4. [ ] Manual verification: [specific check]

If ANY check fails, fix issues FIRST. Do NOT output COMPLETE with failures.

State Management

Session Directory Structure

.springfield/{session-id}/
├── PROMPT.md           # Marge's optimized prompt (READ FIRST every iteration)
├── raw_input.md        # Original user input (preserved)
├── plan.md             # Success criteria checklist (Frink creates)
├── state.json          # Current phase, iteration count, flags
├── promises.json       # Promise tracking
├── research.md         # Lisa's findings
├── learnings.md        # Cross-iteration knowledge (appended)
├── checkpoint.json     # Resume capability
└── DONE                # Created when <promise>COMPLETE</promise>

state.json Format

{
  "session_id": "feature-auth-system",
  "started": "2025-02-04T10:00:00Z",
  "current_phase": "marge",
  "iteration": 1,
  "path": "UNKNOWN",
  "status": "running",
  "last_update": "2025-02-04T10:00:00Z"
}

promises.json Format

{
  "session": "feature-auth-system",
  "promises": {
    "MARGE_PROMPT_OPTIMIZED": {"status": "fulfilled", "timestamp": "..."},
    "LISA_RESEARCH_COMPLETE": {"status": "fulfilled", "timestamp": "..."},
    "QUIMBY_DECISION_MADE": {"status": "fulfilled", "timestamp": "..."},
    "FRINK_PLAN_COMPLETE": {"status": "fulfilled", "timestamp": "..."},
    "RALPH_COMPLETE": {"status": "pending"},
    "WIGGUM_APPROVED": {"status": "pending"},
    "SMITHERS_APPROVED": {"status": "pending"},
    "BARNEY_PASSED": {"status": "pending"},
    "CBG_APPROVED": {"status": "pending"},
    "TROY_COMPLETE": {"status": "pending"}
  },
  "mandatory_missing": ["WIGGUM_APPROVED", "SMITHERS_APPROVED", "BARNEY_PASSED", "CBG_APPROVED", "TROY_COMPLETE"]
}

Completion Markers

Every character MUST output exactly ONE of these at the end:

Marker	Meaning
`<promise>COMPLETE</promise>`	Generic - task finished
`<promise>{CHARACTER}_COMPLETE</promise>`	Character-specific completion
`<promise>ITERATE</promise>`	Progress made, continue loop
`<promise>BLOCKED:{reason}</promise>`	Cannot proceed, needs input

All Promise Signals

<promise>MARGE_PROMPT_OPTIMIZED</promise>  # Prompt ready for loop
<promise>HOMER_SKILL_COMPLETE</promise>     # Skill builder done
<promise>LISA_RESEARCH_COMPLETE</promise>   # Research done
<promise>QUIMBY_DECISION_MADE</promise>     # Decision made
<promise>FRINK_PLAN_COMPLETE</promise>      # Planning done
<promise>SELMA_DESIGN_APPROVED</promise>    # Design approved
<promise>NED_UX_APPROVED</promise>          # UX approved
<promise>RALPH_COMPLETE</promise>           # Implementation done
<promise>WIGGUM_APPROVED</promise>          # Security cleared
<promise>SMITHERS_APPROVED</promise>        # Code review passed
<promise>APU_APPROVED</promise>             # Performance approved
<promise>BARNEY_PASSED</promise>            # Usability passed
<promise>CBG_APPROVED</promise>             # QA verified
<promise>TROY_COMPLETE</promise>            # Documentation done
<promise>BURNS_DEPLOYED</promise>           # Deployment complete
<promise>BOB_ROOT_CAUSE_IDENTIFIED</promise># Root cause found
<promise>COYOTE_PERIPHERAL_SCAN_COMPLETE</promise># Peripheral scan done
<promise>MAGGIE_MOBILE_COMPLETE</promise>   # Mobile task done
<promise>SPRINGFIELD_COMPLETE</promise>     # Full workflow done

MCP Tool Access

Every character has full MCP tool access:

Research & Knowledge

Tool	Purpose
`mcp__memory__*`	Persist/retrieve findings in knowledge graph
`mcp__perplexity-ask__*`	Deep research, reasoning, web search
`mcp__firecrawl__*`	Scrape docs, crawl codebases, extract data
`mcp__exa__*`	Semantic/neural web search
`mcp__context7__*`	Library documentation lookup
`mcp__hf-mcp-server__*`	Search models, papers, datasets
`mcp__sequential-thinking__*`	Complex reasoning chains

Development & Testing

Tool	Purpose
`mcp__github__*`	Repository operations, PRs, issues
`mcp__filesystem__*`	File system operations
`mcp__playwright__*`	Browser testing, screenshots, automation
`mcp___21st-dev_magic__*`	Component building, UI refinement
`mcp__fetch__*`	HTTP fetching

Communication

Tool	Purpose
`mcp__gmail__*`	Email notifications, stakeholder comms

Character-Specific Details

Marge (Prompt Optimizer) - "Hmmmm... let me organize this properly."

PROCESS:
1. Parse raw input for intent
2. Detect platform context from codebase
3. Structure into clear PROMPT.md
4. Identify risks and considerations
5. Ask clarifying questions only if critical
6. Output optimized prompt ready for loop
7. Output <promise>MARGE_PROMPT_OPTIMIZED</promise>

Lisa (Research) - "Let me research this further..." (v7.0 Enhanced)

EFFORT: high | CONTEXT: 1M tokens | SEARCH: 84% BrowseComp-level

PROCESS:
1. ASSESS codebase size — if < 30K lines, ingest full codebase via 1M context (skip RLM pagination)
2. Glob/Grep for codebase structure
3. mcp__exa__web_search_exa with type="deep" for semantic search (84% BrowseComp accuracy)
4. mcp__perplexity-ask__perplexity_research for external docs
5. mcp__firecrawl__firecrawl_scrape for relevant URLs
6. mcp__memory__create_entities to store findings
7. mcp__context7__query-docs for library docs
8. PARALLEL: Launch Coyote as Agent Team member for peripheral scan alongside research
9. Output <promise>LISA_RESEARCH_COMPLETE</promise>

1M CONTEXT STRATEGY:
- Small codebase (< 10K lines): Read all source files directly
- Medium codebase (10K-30K lines): Read all files in affected directories + key entry points
- Large codebase (> 30K lines): Use RLM methodology (targeted queries)

Frink (Planning) - "GLAVIN! The science is working!"

# Plan: {task}

## Success Criteria (ALL must be checked)
1. [ ] Criterion 1 - `verify command`
2. [ ] Criterion 2 - `verify command`
3. [ ] Criterion 3 - Visual/manual verification

## Subtasks (pick MOST IMPORTANT, not easiest)
| # | Task | Files | Verify | Status |
|---|------|-------|--------|--------|
| 1 | Description | file.ts | `test cmd` | ⬜ |
| 2 | Description | file.py | `test cmd` | ⬜ |

## Dependencies
1 → 2 → 3 (sequential)
4 (can run parallel with 2)

## Success Mapping
{which subtasks satisfy which criteria}

Ralph (Implementation) - "I'm helping!" (v7.0 Enhanced)

EFFORT: medium | AGENT TEAMS: parallel subtasks | QUALITY: production-ready-first-try

RULES:
- AIM for production-ready on first try (Opus 4.6 quality uplift)
- Use Agent Teams to parallelize independent subtasks (e.g., API + UI changes simultaneously)
- Let the loop refine edge cases only
- Each failure teaches what to fix
- Small steps: one subtask per iteration max (but subtasks can run in parallel)
- Update plan.md checkboxes as you complete
- Append discoveries to learnings.md
- Context compact every 10 iterations to prevent state drift

AGENT TEAMS STRATEGY:
- Independent file changes → parallel Task agents
- Dependent changes → sequential within single agent
- Test runs → background Task agent while implementing next subtask

Wiggum (Security) - OWASP Top 10 Checklist (v7.0 Enhanced)

EFFORT: max | CONTEXT: 1M tokens | DETECTION: human-researcher-level

Opus 4.6 Enhancement: Wiggum now reasons about code like a human security
researcher — looking for patterns, understanding logic flow, and finding
high-severity vulnerabilities WITHOUT specialized tooling. Can detect
subtle attack patterns and transitive vulnerability chains.

Injection (SQL, NoSQL, command)
Broken Authentication
Sensitive Data Exposure
XXE
Broken Access Control
Security Misconfiguration
XSS
Insecure Deserialization
Known Vulnerabilities
Insufficient Logging

v7.0 Deep Analysis Additions: 11. Logic flaws and business logic bypass 12. Race conditions and TOCTOU 13. Transitive dependency vulnerabilities (full chain analysis via 1M context) 14. Supply chain attack patterns 15. Subtle authorization bypass patterns

Tools: npm audit, pip-audit, semgrep, gitleaks + Opus 4.6 native code reasoning

Homer (Skill Builder) - "D'oh! Let me try again..."

PROCESS:
1. Understand requirement
2. Research existing patterns
3. Design skill architecture
4. Create skill files in ~/.claude/skills/
5. Test skill invocation
6. Document usage
7. Output <promise>HOMER_SKILL_COMPLETE</promise>

Bob (Root Cause) - Investigation Mode (v7.0 Enhanced)

EFFORT: max | CONTEXT: 1M tokens | REASONING: deepest adaptive thinking

TRIGGERS:
- Same error 3x in a row
- Unclear failure mode
- Tests pass but feature broken

PROCESS:
1. Ingest ALL relevant logs, configs, and related files via 1M context (no pagination needed)
2. Trace error path across full dependency chain
3. Use mcp__sequential-thinking__* for multi-hypothesis reasoning
4. Apply Opus 4.6 adaptive thinking at MAX effort for deepest analysis
5. Cross-reference with external docs via enhanced agentic search
6. Identify root cause with chain-of-causation evidence
7. Output <promise>BOB_ROOT_CAUSE_IDENTIFIED</promise>

v7.0: Bob now reasons like a senior SRE — tracing through service boundaries,
understanding data flow, and identifying subtle interaction bugs that span
multiple files and systems.

Quick Reference

Mode	Command	Use When
Full	`/springfield`	Complex multi-step tasks
Max	`/springfield-max`	Full workflow with max iterations
Flash	`/springfield-flash`	Simple single-file fixes
Optimize	`/springfield-marge`	Prepare/refine a task prompt
Research	`/springfield-lisa`	Exploration, understanding
Plan	`/springfield-frink`	When research exists
Implement	`/springfield-ralph`	When plan exists
Security	`/springfield-wiggum`	Audit existing code
Build skill	`/springfield-homer`	Create new capabilities
Deploy	`/springfield-burns`	Ship approved changes
Debug	`/springfield-bob`	Root cause investigation
Peripheral	`/springfield-coyote`	Spot adjacent opportunities & gaps
Mobile	`/springfield-maggie`	Mobile-specific work

Anti-Patterns to Avoid

❌ Don't	✅ Do
Start without Marge	Always let Marge optimize the prompt first
Pick easiest task first	Pick MOST IMPORTANT task (hard stuff first)
Claim COMPLETE with failing tests	Fix all failures, THEN claim COMPLETE
Write 500 lines in one iteration	Small, verifiable steps (one subtask max)
Ignore previous learnings	Read learnings.md at start of each iteration
Keep failed attempts in context	Fresh context each iteration
Skip security review	Always run Wiggum (MANDATORY)
Skip documentation	Always run Troy (MANDATORY)

Platform-Specific Templates

Web Application (SaaS/Dashboard)

## Backpressure
npm test
npm run build
npm run lint
curl -f http://localhost:3000/api/health

API/Microservice

## Backpressure
npm test (or pytest, go test)
curl -f http://localhost:PORT/health
curl -X POST http://localhost:PORT/api/endpoint -d '{}' | jq

Mobile App (React Native)

## Backpressure
npm test
npx react-native run-android --variant=debug
npx react-native run-ios --simulator="iPhone 15"

Data Pipeline

## Backpressure
pytest
python -m mypy .
bq query --dry_run "SELECT * FROM dataset.table"

CLI Tool

## Backpressure
npm test (or cargo test, go test)
./cli --help
./cli command --flag value

Key Behaviors (v7.0)

Behavior	Required
Marge optimizes prompt first	YES - ALWAYS FIRST
Work autonomously	YES
Use MCP tools for research	YES
Agent Teams for parallel phases	YES (v7.0)
Effort levels per character	YES (v7.0)
1M context for eligible characters	YES (v7.0)
Context compaction every 10 iterations	YES (v7.0)
Security review (Wiggum)	MANDATORY
Code review (Smithers)	MANDATORY
Usability (Barney)	MANDATORY
QA (CBG)	MANDATORY
Documentation (Troy)	MANDATORY
Performance (Apu)	If perf-critical
Deployment (Burns)	If requested
Max iterations (all)	50
Early exit on success	YES
Escalate on stuck	YES → Bob (max effort)
Output promises	MANDATORY
Production-ready first try	YES (v7.0 quality target)

BEGIN: Tell me what you want to build. Marge will organize it into a proper task. Opus 4.6 is ready.