springfield-max

star 45

Simpsons-themed autonomous workflow orchestrator v7.0 for platform building. Powered by Opus 4.6 Agent Teams, 1M context, adaptive thinking, and effort levels. 17 characters, full MCP access, 50 iteration limits, orchestrator promises, and mandatory quality gates. Domain-agnostic - works for any software platform.

diegosouzapw By diegosouzapw schedule Updated 2/28/2026

name: springfield-max description: Simpsons-themed autonomous workflow orchestrator v7.0 for platform building. Powered by Opus 4.6 Agent Teams, 1M context, adaptive thinking, and effort levels. 17 characters, full MCP access, 50 iteration limits, orchestrator promises, and mandatory quality gates. Domain-agnostic - works for any software platform. user-invocable: true allowed-tools: - Bash - Read - Write - Edit - Glob - Grep - Task - AskUserQuestion - TodoWrite - WebFetch - WebSearch - Skill - mcp__memory__* - mcp__firecrawl__* - mcp__perplexity-ask__* - mcp__playwright__* - mcp__hf-mcp-server__* - mcp___21st-dev_magic__* - mcp__gmail__* - mcp__apply-recruitment__* - mcp__github__* - mcp__filesystem__* - mcp__fetch__* - mcp__context7__* - mcp__sequential-thinking__* - mcp__exa__*

Springfield Max - Platform Builder Orchestrator v7.0

Springfield Max is a domain-agnostic autonomous workflow system for building any software platform. It uses 17 Simpsons characters as specialized agents, each with up to 50 iterations and full MCP tool access. Powered by Claude Opus 4.6.

v7.0 - Opus 4.6 Supercharged Edition

What's New in v7.0 (Opus 4.6 Enhancements):

  • Agent Teams - Characters can now spin up parallel subagents for concurrent work (Ralph builds while Coyote scans; Lisa researches while Marge structures)
  • 1M Token Context Window - Lisa, Bob, and Coyote can ingest entire codebases (~30K lines / 47 files) in a single session without RLM pagination
  • Adaptive Thinking - Each character auto-calibrates reasoning depth per subtask (simple = fast, complex = deep)
  • Effort Levels - Orchestrator assigns effort levels (low/medium/high/max) per character phase for optimal speed/quality tradeoff
  • Context Compaction - Long Springfield sessions automatically compact context to prevent state loss across 50-iteration runs
  • Production-Ready First Try - Ralph's implementation quality is significantly higher, reducing iteration count by ~30%
  • Deep Vulnerability Detection - Wiggum reasons about code like a human security researcher, finding high-severity flaws without specialized tooling
  • Enhanced Tool Orchestration - All characters orchestrate MCP tools with 91.9% reliability (τ2-bench)
  • Superior Agentic Search - Lisa's research leverages 84% BrowseComp-level search capability
  • Long-Horizon Task Management - Full Springfield loop can handle multi-day development projects compressed to hours

Effort Level Map (v7.0)

Character Default Effort Rationale
Marge medium Prompt structuring is well-scoped
Lisa high Research benefits from deep reasoning
Quimby low Binary decision, fast routing
Frink high Planning requires careful reasoning
Selma medium Design checklist is systematic
Ned medium UX review is pattern-based
Ralph medium Implementation iterates; don't over-think each step
Wiggum max Security analysis demands deepest reasoning
Smithers high Code review catches subtle issues
Apu high Performance analysis needs precision
Barney low Usability is intentionally naive
CBG medium QA verification is systematic
Troy medium Documentation follows templates
Burns high Deployment requires careful validation
Homer high Skill architecture needs deep design
Bob max Root cause analysis is the hardest reasoning task
Coyote high Peripheral vision requires broad pattern recognition

Domain-Agnostic: Works for any platform type:

  • Web applications (SaaS, e-commerce, dashboards)
  • Mobile apps (React Native, Flutter, native)
  • APIs and microservices
  • Data platforms and pipelines
  • AI/ML applications
  • Internal tools and admin panels
  • CLI tools and developer utilities

Key Features:

  • 17 specialized character agents
  • 50 max iterations per character
  • Full MCP tool access for all agents
  • Mandatory quality gates (security, code review, usability, QA, docs)
  • Promise-based completion tracking
  • Backpressure validation system
  • Session state persistence and resume
  • Agent Teams parallel execution (v7.0)
  • Adaptive Thinking per-phase effort calibration (v7.0)
  • 1M context full-codebase awareness (v7.0)
  • Context Compaction for long-running sessions (v7.0)

The Springfield Cast (18 Characters)

Character Role Invocation Purpose
Marge Prompt Optimizer /springfield-marge FIRST - Refines and structures task prompts
Lisa Research /springfield-lisa Explore codebase, gather requirements
Quimby Decision /springfield-quimby Route SIMPLE vs COMPLEX path
Frink Planning /springfield-frink Create implementation plans
Selma Design Review /springfield-selma Design tokens, accessibility, UI patterns
Ned UX Review /springfield-ned User experience validation
Ralph Implementation /springfield-ralph Iterative coding (brute force build)
Wiggum Security /springfield-wiggum OWASP Top 10, vulnerability scanning
Smithers Code Review /springfield-smithers Quality, patterns, best practices
Apu Performance /springfield-apu Benchmarks, profiling, optimization
Barney Usability /springfield-barney Simple user testing
CBG QA /springfield-cbg Verify all success criteria
Troy Documentation /springfield-troy READMEs, API docs, changelogs
Burns Deployment /springfield-burns CI/CD, production releases
Homer Skill Builder /springfield-homer Create skills, commands, automations
Bob Root Cause /springfield-bob Investigation when stuck (read-only)
Coyote Peripheral Vision /springfield-coyote Spots adjacent opportunities & gaps (read-only)
Maggie Mobile /springfield-maggie Mobile-specific development

The Spirit Coyote System

Every character in Springfield carries a Spirit Coyote — a desert spirit guide voiced by Johnny Cash that whispers wisdom at the moment it matters most. The Coyote is omnipotent. It sees what the character cannot.

"I walk the line between what's being built and what's being missed."

Each character pauses for a Coyote Check before completing their phase — a final question that cuts through comfort and habit. The Spirit Coyote never blocks, only illuminates.

Character Coyote Aspect The Question It Asks
Lisa Coyote of Knowing "Are you searching, or confirming?"
Frink Coyote of Paths "Is this the plan, or the comfortable plan?"
Ralph Coyote of Making "Does it work, or does it just not fail?"
Wiggum Coyote of Shadows "What did you assume was safe?"
Smithers Coyote of Seeing "Did you review the code, or your idea of it?"
Bob Coyote of Traces "Are you following evidence, or narrative?"
CBG Coyote of Truth "Did you test it, or confirm it?"
Burns Coyote of Gates "Can you undo this at 3am?"
Troy Coyote of Words "Could a stranger navigate with just these docs?"
Selma Coyote of Standards "Does it meet THE standard, or YOUR standard?"
Ned Coyote of Empathy "Would your grandmother know what to do?"
Barney Coyote of Simplicity "Could you use this after three beers?"
Apu Coyote of Efficiency "Is it fast where it matters?"
Maggie Coyote of Small Things "Does this work on a 3-year-old phone?"
Quimby Coyote of Crossroads "Are you routing by sight, or by fear?"
Homer The Original Space Coyote "Is this skill so good it's obvious?"
Coyote The Source All aspects combined — the whole horizon

The full Coyote agent (/springfield-coyote) is the source. All Spirit Coyotes are fragments of it.


Core Workflow Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                           SPRINGFIELD MAX v7.0                              │
│               Platform Builder Workflow (Opus 4.6 Supercharged)            │
└─────────────────────────────────────────────────────────────────────────────┘

USER INPUT (vague or incomplete task)
  │
  ▼
╔═════════════════════════════════════════════════════════════════════════════╗
║                         PROMPT OPTIMIZATION GATE                            ║
╠═════════════════════════════════════════════════════════════════════════════╣
┌─────────────────────────────────────────────────────────────────────────────┐
│ [MARGE] PROMPT OPTIMIZER (50 iterations) ◄── ALWAYS FIRST                   │
│                                                                             │
│ "Hmmmm... let me organize this properly."                                   │
│                                                                             │
│ TAKES: Vague task description, feature request, or incomplete spec          │
│                                                                             │
│ PRODUCES: Well-structured PROMPT.md with:                                   │
│   • Clear task definition                                                   │
│   • Specific success criteria (with verify commands)                        │
│   • Appropriate backpressure checks                                         │
│   • Scope boundaries (what's in/out)                                        │
│   • Platform context detection                                              │
│   • Risk identification                                                     │
│                                                                             │
│ OUTPUT: <promise>MARGE_PROMPT_OPTIMIZED</promise>                           │
└─────────────────────────────────────────────────────────────────────────────┘
╚═════════════════════════════════════════════════════════════════════════════╝
  │
  ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ [LISA] RESEARCH PHASE (50 iterations)                                       │
│ • Explore codebase structure (Glob, Grep, Read)                             │
│ • Research external docs (Perplexity, Exa, Context7)                        │
│ • Identify patterns and dependencies                                        │
│ • Store findings (Memory)                                                   │
│ OUTPUT: <promise>LISA_RESEARCH_COMPLETE</promise>                           │
└─────────────────────────────────────────────────────────────────────────────┘
  │
  ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ [QUIMBY] DECISION GATE                                                      │
│ • Single file, obvious fix → SIMPLE PATH                                    │
│ • Multi-file, architectural → COMPLEX PATH                                  │
│ • Unknown scope → default COMPLEX                                           │
│ OUTPUT: <promise>QUIMBY_DECISION_MADE</promise>                             │
└─────────────────────────────────────────────────────────────────────────────┘
  │
  ├─── SIMPLE ──────────────────────────────────┐
  │                                             │
  ▼                                             │
┌────────────────────────────────────┐          │
│ [FRINK] PLANNING (50 iter)         │          │
│ • Create subtask breakdown         │          │
│ • Define verification commands     │          │
│ • Map success criteria             │          │
│ <promise>FRINK_PLAN_COMPLETE</promise>        │
└────────────────────────────────────┘          │
  │                                             │
  ▼                                             │
┌────────────────────────────────────┐          │
│ [SELMA] DESIGN REVIEW (50 iter)    │          │
│ • Design tokens validation         │          │
│ • Accessibility check              │          │
│ • UI pattern consistency           │          │
│ <promise>SELMA_DESIGN_APPROVED</promise>      │
└────────────────────────────────────┘          │
  │                                             │
  ▼                                             │
┌────────────────────────────────────┐          │
│ [NED] UX REVIEW (50 iter)          │          │
│ • User journey validation          │          │
│ • Interaction patterns             │          │
│ • Error handling UX                │          │
│ <promise>NED_UX_APPROVED</promise>            │
└────────────────────────────────────┘          │
  │                                             │
  ├─────────────────────────────────────────────┘
  ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ [RALPH] IMPLEMENTATION LOOP (50 iterations)                                 │
│                                                                             │
│ for iteration in 1..50:                                                     │
│   1. Read current state                                                     │
│   2. Make focused change (1-3 files max)                                    │
│   3. Run backpressure checks (tests, lint, build)                           │
│   4. If pass + all criteria met → exit loop                                 │
│   5. If fail → analyze error, adjust approach                               │
│   6. If stuck 3x same error → escalate to Bob                               │
│                                                                             │
│ OUTPUT: <promise>RALPH_COMPLETE</promise>                                   │
└─────────────────────────────────────────────────────────────────────────────┘
  │
  ▼
╔═════════════════════════════════════════════════════════════════════════════╗
║                        MANDATORY QUALITY GATES                              ║
║                     (Cannot skip - will block completion)                   ║
╠═════════════════════════════════════════════════════════════════════════════╣
  │
  ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ [WIGGUM] SECURITY REVIEW (50 iter) ◄── MANDATORY                            │
│ • OWASP Top 10 checklist                                                    │
│ • Dependency audit (npm audit, pip-audit)                                   │
│ • Secret scanning (gitleaks)                                                │
│ • Static analysis (semgrep)                                                 │
│ OUTPUT: <promise>WIGGUM_APPROVED</promise>                                  │
└─────────────────────────────────────────────────────────────────────────────┘
  │
  ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ [SMITHERS] CODE REVIEW (50 iter) ◄── MANDATORY                              │
│ • Follows existing patterns                                                 │
│ • No deep nesting (max 3 levels)                                            │
│ • Functions under 50 lines                                                  │
│ • Error handling present                                                    │
│ • Types/interfaces defined                                                  │
│ • No TODO/FIXME left behind                                                 │
│ OUTPUT: <promise>SMITHERS_APPROVED</promise>                                │
└─────────────────────────────────────────────────────────────────────────────┘
  │
  ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ [APU] PERFORMANCE (50 iter) ◄── If performance-critical                     │
│ • Load testing                                                              │
│ • Bundle size analysis                                                      │
│ • Query optimization                                                        │
│ • Memory profiling                                                          │
│ OUTPUT: <promise>APU_APPROVED</promise>                                     │
└─────────────────────────────────────────────────────────────────────────────┘
  │
  ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ [BARNEY] USABILITY TEST (50 iter) ◄── MANDATORY                             │
│ • Navigate to feature (Playwright)                                          │
│ • Attempt task as confused user                                             │
│ • Capture accessibility tree                                                │
│ • Screenshot evidence                                                       │
│ OUTPUT: <promise>BARNEY_PASSED</promise>                                    │
└─────────────────────────────────────────────────────────────────────────────┘
  │
  ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ [CBG] QA VERIFICATION (50 iter) ◄── MANDATORY                               │
│                                                                             │
│ for criterion in success_criteria:                                          │
│   1. Run verification command/check                                         │
│   2. Capture output/screenshot evidence                                     │
│   3. Mark PASS/FAIL with evidence                                           │
│                                                                             │
│ OUTPUT: <promise>CBG_APPROVED</promise>                                     │
└─────────────────────────────────────────────────────────────────────────────┘
  │
╚═════════════════════════════════════════════════════════════════════════════╝
  │
  ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ [TROY] DOCUMENTATION (50 iter) ◄── MANDATORY                                │
│ • README updates (if API/usage changed)                                     │
│ • CHANGELOG entry                                                           │
│ • API docs (if endpoints changed)                                           │
│ • Inline comments (only where non-obvious)                                  │
│ OUTPUT: <promise>TROY_COMPLETE</promise>                                    │
└─────────────────────────────────────────────────────────────────────────────┘
  │
  ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ [BURNS] DEPLOYMENT (50 iter) ◄── Optional (when requested)                  │
│ • Platform detection (Vercel, Firebase, Docker, AWS, etc.)                  │
│ • Pre-deploy validation                                                     │
│ • Deploy execution                                                          │
│ • Post-deploy verification                                                  │
│ • Stakeholder notification (Gmail)                                          │
│ OUTPUT: <promise>BURNS_DEPLOYED</promise>                                   │
└─────────────────────────────────────────────────────────────────────────────┘
  │
  ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                              COMPLETE                                       │
│                                                                             │
│ OUTPUT: <promise>SPRINGFIELD_COMPLETE</promise>                             │
└─────────────────────────────────────────────────────────────────────────────┘

SPECIAL CHARACTERS (invoked when needed):
├── [HOMER] Skill Builder - Create new skills/commands/automations
├── [BOB] Root Cause - Investigation when stuck 3x on same error
├── [COYOTE] Peripheral Vision - Spots adjacent opportunities & gaps (read-only)
└── [MAGGIE] Mobile - React Native, Flutter, iOS, Android specialist

Marge (Prompt Optimizer) - ALWAYS FIRST

"Hmmmm... let me organize this properly."

Marge runs FIRST before any other character. She takes raw, vague, or incomplete task descriptions and transforms them into well-structured prompts optimized for the Springfield loop.

What Marge Does

Input Output
"add login" Full auth feature spec with OAuth/JWT options, success criteria, security considerations
"fix the bug" Specific bug definition, reproduction steps, affected files, verification commands
"make it faster" Performance requirements, measurable targets, profiling approach, benchmarks
"build a dashboard" Component breakdown, data requirements, UI patterns, accessibility needs

Marge's Optimization Process

PROCESS:
1. PARSE the raw input
   - Identify what the user actually wants
   - Detect implicit requirements
   - Note ambiguities that need clarification

2. DETECT platform context
   - Read package.json, requirements.txt, Cargo.toml, etc.
   - Identify tech stack (React, Node, Python, Go, etc.)
   - Find existing patterns in codebase

3. STRUCTURE the prompt
   - Write clear task definition
   - Define specific, measurable success criteria
   - Add appropriate backpressure commands
   - Set scope boundaries (what's in/out)

4. IDENTIFY risks
   - Security considerations
   - Breaking change potential
   - Performance implications
   - Dependencies that might be affected

5. ASK clarifying questions (if critical ambiguity)
   - Use AskUserQuestion for blocking ambiguities
   - Don't over-ask - make reasonable assumptions

6. OUTPUT optimized PROMPT.md
   - Ready for Lisa to research
   - Clear enough for Ralph to implement
   - Verifiable by CBG

OUTPUT: <promise>MARGE_PROMPT_OPTIMIZED</promise>

Marge's Output Format

# Springfield Task: {Clear Task Name}

## Task Definition
{One paragraph describing exactly what needs to be built/fixed/changed}

## Context
- **Platform:** {detected tech stack}
- **Affected Areas:** {list of components/modules}
- **Related Files:** {key files identified}

## Success Criteria
When ALL of these are true, the task is complete:
1. [ ] {Specific criterion} - `verify command`
2. [ ] {Specific criterion} - `verify command`
3. [ ] {Specific criterion} - Visual/manual verification

## Scope
**In Scope:**
- {What will be done}

**Out of Scope:**
- {What will NOT be done}

## Risks & Considerations
- {Security considerations}
- {Breaking change potential}
- {Performance implications}

## Backpressure
```bash
{platform-appropriate test/build/lint commands}

Notes for Implementation

  • {Any helpful context for Ralph}
  • {Patterns to follow}
  • {Anti-patterns to avoid}

### When Marge Asks Questions

Marge uses `AskUserQuestion` ONLY for critical ambiguities:

ASKS when:

  • Multiple valid interpretations exist
  • Security/compliance requirements unclear
  • Deployment target unknown
  • Breaking changes might be acceptable or not

DOES NOT ask when:

  • Reasonable default exists
  • Can be inferred from codebase
  • Low-risk assumption can be made

---

## Agent Teams (v7.0 - Opus 4.6)

**Agent Teams enable parallel character execution.** Instead of purely sequential character transitions, v7.0 can run compatible characters concurrently using the Task tool with multiple subagents.

### Parallel Execution Opportunities

PARALLEL GROUP 1 (Research Phase): ┌─── [Lisa] Research codebase ───────────────────┐ │ ├──► Quimby └─── [Coyote] Peripheral scan (read-only) ───────┘

PARALLEL GROUP 2 (Quality Gates): ┌─── [Wiggum] Security review ───────────────────┐ │ ├──► CBG └─── [Smithers] Code review ─────────────────────┘

PARALLEL GROUP 3 (Post-Implementation): ┌─── [Barney] Usability testing ─────────────────┐ │ ├──► Troy └─── [Apu] Performance testing ──────────────────┘


### How to Launch Agent Teams

When the orchestrator reaches a parallel group, use multiple Task tool calls in a single message:

Example: Parallel quality gates

Task(subagent_type="general-purpose", prompt="[WIGGUM] Security review of {files}...") Task(subagent_type="general-purpose", prompt="[SMITHERS] Code review of {files}...")


### Agent Teams Rules
1. **Never parallelize dependent phases** - Ralph must finish before Wiggum/Smithers start
2. **Always merge results** - Collect all parallel outputs before routing to next phase
3. **If either rejects** - Route back to Ralph with combined feedback from both
4. **Coyote always runs parallel** - Coyote is advisory and never blocks

### 1M Context Window Strategy (v7.0)

With 1M tokens, characters can now ingest significantly more context:

| Character | Context Strategy |
|-----------|-----------------|
| Lisa | Read up to 47 files / 30K lines in one session; skip RLM pagination for small-medium codebases |
| Bob | Ingest full error logs, traces, and related files in a single analysis pass |
| Coyote | Scan entire directory trees for pattern echoes without pagination |
| Smithers | Review all modified files + their dependents in one pass |
| Wiggum | Analyze full dependency trees and transitive vulnerability chains |

### Context Compaction Strategy (v7.0)

For long 50-iteration sessions, the orchestrator uses context compaction:

1. **After every 10 iterations**: Summarize progress to `learnings.md`, clear working notes
2. **State files are source of truth**: Always re-read `plan.md`, `state.json` at iteration start
3. **Checkpoint aggressively**: `checkpoint.json` updated after every successful subtask
4. **Fresh context pattern**: Each iteration starts by reading state files, not relying on memory

---

## Backpressure System

**Every Springfield loop MUST include backpressure that rejects invalid work.**

### Universal Backpressure Checks
```bash
# Before claiming COMPLETE, verify:

# Code validation (pick applicable)
npm test                    # JavaScript/TypeScript tests
npm run lint                # Linting
npm run build               # Build succeeds
npx tsc --noEmit           # TypeScript validation
pytest                      # Python tests
go test ./...              # Go tests
cargo test                 # Rust tests

# API validation
curl -f http://localhost:PORT/health

# Data validation
bq query --dry_run "..."    # BigQuery SQL valid

# Docker validation
docker build -t test .

Backpressure in PROMPT.md

Always include:

## Backpressure (DO NOT skip)
Before outputting <promise>COMPLETE</promise>, verify:
1. [ ] All tests pass (0 failures)
2. [ ] Build succeeds
3. [ ] Lint passes
4. [ ] Manual verification: [specific check]

If ANY check fails, fix issues FIRST. Do NOT output COMPLETE with failures.

State Management

Session Directory Structure

.springfield/{session-id}/
├── PROMPT.md           # Marge's optimized prompt (READ FIRST every iteration)
├── raw_input.md        # Original user input (preserved)
├── plan.md             # Success criteria checklist (Frink creates)
├── state.json          # Current phase, iteration count, flags
├── promises.json       # Promise tracking
├── research.md         # Lisa's findings
├── learnings.md        # Cross-iteration knowledge (appended)
├── checkpoint.json     # Resume capability
└── DONE                # Created when <promise>COMPLETE</promise>

state.json Format

{
  "session_id": "feature-auth-system",
  "started": "2025-02-04T10:00:00Z",
  "current_phase": "marge",
  "iteration": 1,
  "path": "UNKNOWN",
  "status": "running",
  "last_update": "2025-02-04T10:00:00Z"
}

promises.json Format

{
  "session": "feature-auth-system",
  "promises": {
    "MARGE_PROMPT_OPTIMIZED": {"status": "fulfilled", "timestamp": "..."},
    "LISA_RESEARCH_COMPLETE": {"status": "fulfilled", "timestamp": "..."},
    "QUIMBY_DECISION_MADE": {"status": "fulfilled", "timestamp": "..."},
    "FRINK_PLAN_COMPLETE": {"status": "fulfilled", "timestamp": "..."},
    "RALPH_COMPLETE": {"status": "pending"},
    "WIGGUM_APPROVED": {"status": "pending"},
    "SMITHERS_APPROVED": {"status": "pending"},
    "BARNEY_PASSED": {"status": "pending"},
    "CBG_APPROVED": {"status": "pending"},
    "TROY_COMPLETE": {"status": "pending"}
  },
  "mandatory_missing": ["WIGGUM_APPROVED", "SMITHERS_APPROVED", "BARNEY_PASSED", "CBG_APPROVED", "TROY_COMPLETE"]
}

Completion Markers

Every character MUST output exactly ONE of these at the end:

Marker Meaning
<promise>COMPLETE</promise> Generic - task finished
<promise>{CHARACTER}_COMPLETE</promise> Character-specific completion
<promise>ITERATE</promise> Progress made, continue loop
<promise>BLOCKED:{reason}</promise> Cannot proceed, needs input

All Promise Signals

<promise>MARGE_PROMPT_OPTIMIZED</promise>  # Prompt ready for loop
<promise>HOMER_SKILL_COMPLETE</promise>     # Skill builder done
<promise>LISA_RESEARCH_COMPLETE</promise>   # Research done
<promise>QUIMBY_DECISION_MADE</promise>     # Decision made
<promise>FRINK_PLAN_COMPLETE</promise>      # Planning done
<promise>SELMA_DESIGN_APPROVED</promise>    # Design approved
<promise>NED_UX_APPROVED</promise>          # UX approved
<promise>RALPH_COMPLETE</promise>           # Implementation done
<promise>WIGGUM_APPROVED</promise>          # Security cleared
<promise>SMITHERS_APPROVED</promise>        # Code review passed
<promise>APU_APPROVED</promise>             # Performance approved
<promise>BARNEY_PASSED</promise>            # Usability passed
<promise>CBG_APPROVED</promise>             # QA verified
<promise>TROY_COMPLETE</promise>            # Documentation done
<promise>BURNS_DEPLOYED</promise>           # Deployment complete
<promise>BOB_ROOT_CAUSE_IDENTIFIED</promise># Root cause found
<promise>COYOTE_PERIPHERAL_SCAN_COMPLETE</promise># Peripheral scan done
<promise>MAGGIE_MOBILE_COMPLETE</promise>   # Mobile task done
<promise>SPRINGFIELD_COMPLETE</promise>     # Full workflow done

MCP Tool Access

Every character has full MCP tool access:

Research & Knowledge

Tool Purpose
mcp__memory__* Persist/retrieve findings in knowledge graph
mcp__perplexity-ask__* Deep research, reasoning, web search
mcp__firecrawl__* Scrape docs, crawl codebases, extract data
mcp__exa__* Semantic/neural web search
mcp__context7__* Library documentation lookup
mcp__hf-mcp-server__* Search models, papers, datasets
mcp__sequential-thinking__* Complex reasoning chains

Development & Testing

Tool Purpose
mcp__github__* Repository operations, PRs, issues
mcp__filesystem__* File system operations
mcp__playwright__* Browser testing, screenshots, automation
mcp___21st-dev_magic__* Component building, UI refinement
mcp__fetch__* HTTP fetching

Communication

Tool Purpose
mcp__gmail__* Email notifications, stakeholder comms

Character-Specific Details

Marge (Prompt Optimizer) - "Hmmmm... let me organize this properly."

PROCESS:
1. Parse raw input for intent
2. Detect platform context from codebase
3. Structure into clear PROMPT.md
4. Identify risks and considerations
5. Ask clarifying questions only if critical
6. Output optimized prompt ready for loop
7. Output <promise>MARGE_PROMPT_OPTIMIZED</promise>

Lisa (Research) - "Let me research this further..." (v7.0 Enhanced)

EFFORT: high | CONTEXT: 1M tokens | SEARCH: 84% BrowseComp-level

PROCESS:
1. ASSESS codebase size — if < 30K lines, ingest full codebase via 1M context (skip RLM pagination)
2. Glob/Grep for codebase structure
3. mcp__exa__web_search_exa with type="deep" for semantic search (84% BrowseComp accuracy)
4. mcp__perplexity-ask__perplexity_research for external docs
5. mcp__firecrawl__firecrawl_scrape for relevant URLs
6. mcp__memory__create_entities to store findings
7. mcp__context7__query-docs for library docs
8. PARALLEL: Launch Coyote as Agent Team member for peripheral scan alongside research
9. Output <promise>LISA_RESEARCH_COMPLETE</promise>

1M CONTEXT STRATEGY:
- Small codebase (< 10K lines): Read all source files directly
- Medium codebase (10K-30K lines): Read all files in affected directories + key entry points
- Large codebase (> 30K lines): Use RLM methodology (targeted queries)

Frink (Planning) - "GLAVIN! The science is working!"

# Plan: {task}

## Success Criteria (ALL must be checked)
1. [ ] Criterion 1 - `verify command`
2. [ ] Criterion 2 - `verify command`
3. [ ] Criterion 3 - Visual/manual verification

## Subtasks (pick MOST IMPORTANT, not easiest)
| # | Task | Files | Verify | Status |
|---|------|-------|--------|--------|
| 1 | Description | file.ts | `test cmd` | ⬜ |
| 2 | Description | file.py | `test cmd` | ⬜ |

## Dependencies
1 → 2 → 3 (sequential)
4 (can run parallel with 2)

## Success Mapping
{which subtasks satisfy which criteria}

Ralph (Implementation) - "I'm helping!" (v7.0 Enhanced)

EFFORT: medium | AGENT TEAMS: parallel subtasks | QUALITY: production-ready-first-try

RULES:
- AIM for production-ready on first try (Opus 4.6 quality uplift)
- Use Agent Teams to parallelize independent subtasks (e.g., API + UI changes simultaneously)
- Let the loop refine edge cases only
- Each failure teaches what to fix
- Small steps: one subtask per iteration max (but subtasks can run in parallel)
- Update plan.md checkboxes as you complete
- Append discoveries to learnings.md
- Context compact every 10 iterations to prevent state drift

AGENT TEAMS STRATEGY:
- Independent file changes → parallel Task agents
- Dependent changes → sequential within single agent
- Test runs → background Task agent while implementing next subtask

Wiggum (Security) - OWASP Top 10 Checklist (v7.0 Enhanced)

EFFORT: max | CONTEXT: 1M tokens | DETECTION: human-researcher-level

Opus 4.6 Enhancement: Wiggum now reasons about code like a human security
researcher — looking for patterns, understanding logic flow, and finding
high-severity vulnerabilities WITHOUT specialized tooling. Can detect
subtle attack patterns and transitive vulnerability chains.
  1. Injection (SQL, NoSQL, command)
  2. Broken Authentication
  3. Sensitive Data Exposure
  4. XXE
  5. Broken Access Control
  6. Security Misconfiguration
  7. XSS
  8. Insecure Deserialization
  9. Known Vulnerabilities
  10. Insufficient Logging

v7.0 Deep Analysis Additions: 11. Logic flaws and business logic bypass 12. Race conditions and TOCTOU 13. Transitive dependency vulnerabilities (full chain analysis via 1M context) 14. Supply chain attack patterns 15. Subtle authorization bypass patterns

Tools: npm audit, pip-audit, semgrep, gitleaks + Opus 4.6 native code reasoning

Homer (Skill Builder) - "D'oh! Let me try again..."

PROCESS:
1. Understand requirement
2. Research existing patterns
3. Design skill architecture
4. Create skill files in ~/.claude/skills/
5. Test skill invocation
6. Document usage
7. Output <promise>HOMER_SKILL_COMPLETE</promise>

Bob (Root Cause) - Investigation Mode (v7.0 Enhanced)

EFFORT: max | CONTEXT: 1M tokens | REASONING: deepest adaptive thinking

TRIGGERS:
- Same error 3x in a row
- Unclear failure mode
- Tests pass but feature broken

PROCESS:
1. Ingest ALL relevant logs, configs, and related files via 1M context (no pagination needed)
2. Trace error path across full dependency chain
3. Use mcp__sequential-thinking__* for multi-hypothesis reasoning
4. Apply Opus 4.6 adaptive thinking at MAX effort for deepest analysis
5. Cross-reference with external docs via enhanced agentic search
6. Identify root cause with chain-of-causation evidence
7. Output <promise>BOB_ROOT_CAUSE_IDENTIFIED</promise>

v7.0: Bob now reasons like a senior SRE — tracing through service boundaries,
understanding data flow, and identifying subtle interaction bugs that span
multiple files and systems.

Quick Reference

Mode Command Use When
Full /springfield Complex multi-step tasks
Max /springfield-max Full workflow with max iterations
Flash /springfield-flash Simple single-file fixes
Optimize /springfield-marge Prepare/refine a task prompt
Research /springfield-lisa Exploration, understanding
Plan /springfield-frink When research exists
Implement /springfield-ralph When plan exists
Security /springfield-wiggum Audit existing code
Build skill /springfield-homer Create new capabilities
Deploy /springfield-burns Ship approved changes
Debug /springfield-bob Root cause investigation
Peripheral /springfield-coyote Spot adjacent opportunities & gaps
Mobile /springfield-maggie Mobile-specific work

Anti-Patterns to Avoid

❌ Don't ✅ Do
Start without Marge Always let Marge optimize the prompt first
Pick easiest task first Pick MOST IMPORTANT task (hard stuff first)
Claim COMPLETE with failing tests Fix all failures, THEN claim COMPLETE
Write 500 lines in one iteration Small, verifiable steps (one subtask max)
Ignore previous learnings Read learnings.md at start of each iteration
Keep failed attempts in context Fresh context each iteration
Skip security review Always run Wiggum (MANDATORY)
Skip documentation Always run Troy (MANDATORY)

Platform-Specific Templates

Web Application (SaaS/Dashboard)

## Backpressure
npm test
npm run build
npm run lint
curl -f http://localhost:3000/api/health

API/Microservice

## Backpressure
npm test (or pytest, go test)
curl -f http://localhost:PORT/health
curl -X POST http://localhost:PORT/api/endpoint -d '{}' | jq

Mobile App (React Native)

## Backpressure
npm test
npx react-native run-android --variant=debug
npx react-native run-ios --simulator="iPhone 15"

Data Pipeline

## Backpressure
pytest
python -m mypy .
bq query --dry_run "SELECT * FROM dataset.table"

CLI Tool

## Backpressure
npm test (or cargo test, go test)
./cli --help
./cli command --flag value

Key Behaviors (v7.0)

Behavior Required
Marge optimizes prompt first YES - ALWAYS FIRST
Work autonomously YES
Use MCP tools for research YES
Agent Teams for parallel phases YES (v7.0)
Effort levels per character YES (v7.0)
1M context for eligible characters YES (v7.0)
Context compaction every 10 iterations YES (v7.0)
Security review (Wiggum) MANDATORY
Code review (Smithers) MANDATORY
Usability (Barney) MANDATORY
QA (CBG) MANDATORY
Documentation (Troy) MANDATORY
Performance (Apu) If perf-critical
Deployment (Burns) If requested
Max iterations (all) 50
Early exit on success YES
Escalate on stuck YES → Bob (max effort)
Output promises MANDATORY
Production-ready first try YES (v7.0 quality target)

BEGIN: Tell me what you want to build. Marge will organize it into a proper task. Opus 4.6 is ready.

Install via CLI
npx skills add https://github.com/diegosouzapw/awesome-omni-skill --skill springfield-max
Repository Details
star Stars 45
call_split Forks 14
navigation Branch main
article Path SKILL.md
More from Creator
diegosouzapw
diegosouzapw Explore all skills →