oracle-consensus

star 0

Run 2x oracle sessions (FOR + AGAINST stances) to validate design decisions, plans, or bead readiness. Default = two Fable subagents; escalate to PAL 2x GPT-Pro (always paired with Fable) for extremely important or complex validations. Use after design rounds, before implementation, or to challenge architecture decisions.

oysteinkrog By oysteinkrog schedule Updated 6/10/2026

name: oracle-consensus model: opus description: Run 2x oracle sessions (FOR + AGAINST stances) to validate design decisions, plans, or bead readiness. Default = two Fable subagents; escalate to PAL 2x GPT-Pro (always paired with Fable) for extremely important or complex validations. Use after design rounds, before implementation, or to challenge architecture decisions. argument-hint: " [--rounds N] [--models M1,M2]"

Oracle Consensus

Oracle policy (2026-06): Fable (claude-fable-5) is the primary oracle — the default FOR/AGAINST debate runs as two fresh Fable subagents (Agent tool with model: "fable"; Fable is not reachable through PAL). The PAL 2x GPT-Pro consensus is the escalation tier, for extremely important or complex validations only, and must always run alongside a Fable consultation on the same question. See /consult-oracles and the Oracle Consultation Policy in ~/CLAUDE.md.

Run a structured FOR/AGAINST debate between two high-capability oracle sessions on a design decision, plan, or bead set. Produces a scored verdict with specific actionable corrections.

When to Use

  • After a design swarm to validate decisions
  • Before committing to an architecture direction
  • After bead creation to validate readiness
  • When two approaches seem equally valid
  • Any time "validate this with oracles" is requested

Prerequisites

For the escalation tier only: verify PAL MCP is running before launching GPT oracles. If mcp__pal__listmodels fails or returns empty, alert the user — agents silently fall back to self-analysis without PAL, producing unreliable results. The default Fable tier needs no PAL.

Workflow

Step 1: Frame the Evaluation

Write a clear, specific evaluation prompt. This is what both models will see.

Good framing:

Evaluate this sensor plate settings redesign for a WPF desktop app:
- Plan: [summary of key decisions]
- Key constraint: Must support 1-8 sensor plates, composite plates, and auto-detection
- Files affected: [list]
- Risk areas: [list]
Score 1-10 on: correctness, completeness, feasibility, UX quality, architecture quality

Bad framing:

Is this plan good?

The evaluation prompt must be self-contained — models do not share context between stances.

Step 2: Configure Stances

Default tier (Fable): spawn two fresh Fable subagents in a single message with opposing stances. Each prompt is self-contained (evaluation prompt + file paths); the agents share no context.

Agent({ subagent_type: "general-purpose", model: "fable",
  prompt: "<evaluation prompt>\n\nStance: Advocate for this design. Identify its strengths, explain why the decisions are sound, and argue that it should be approved. Be specific — cite exact decisions and explain their merit. Score honestly; 'for' does not mean blindly positive." })

Agent({ subagent_type: "general-purpose", model: "fable",
  prompt: "<evaluation prompt>\n\nStance: Challenge this design. Find weaknesses, missing considerations, contradictions, and risks. Propose specific corrections for each issue found. Score honestly; 'against' does not mean blindly negative." })

Then skip to Step 4 and synthesize the two responses yourself.

Escalation tier (PAL 2x GPT-Pro — extremely important/complex only, always paired with a parallel Fable consult on the same evaluation prompt):

{
  "models": [
    {"model": "gpt-5.5-pro", "stance": "for", "stance_prompt": "Advocate for this design. Identify its strengths, explain why the decisions are sound, and argue that it should be approved. Be specific — cite exact decisions and explain their merit. Score honestly; 'for' does not mean blindly positive."},
    {"model": "gpt-5.5-pro", "stance": "against", "stance_prompt": "Challenge this design. Find weaknesses, missing considerations, contradictions, and risks. Propose specific corrections for each issue found. Score honestly; 'against' does not mean blindly negative."}
  ]
}

Alternate Configurations (escalation tier)

Architecture validation (3 models):

[
  {"model": "gpt-5.5-pro", "stance": "for"},
  {"model": "gpt-5.5-pro", "stance": "against"},
  {"model": "gemini-3.1-pro-preview", "stance": "neutral", "stance_prompt": "Provide an independent technical assessment. Focus on feasibility, risk, and alternatives the other evaluators may miss."}
]

Bead readiness (2 models, specific stance prompts):

[
  {"model": "gpt-5.5-pro", "stance": "for", "stance_prompt": "Argue these beads are implementation-ready. Each bead should have: clear ACs in Given/When/Then, correct file paths, correct dependencies, no spec contradictions, and be self-contained."},
  {"model": "gpt-5.5-pro", "stance": "against", "stance_prompt": "Find beads that are NOT ready. Look for: vague ACs, wrong file paths, missing dependencies, contradictions between beads, beads too large for atomic implementation, cross-cutting requirements not embedded."}
]

Step 3: Run Consensus (escalation tier only)

Use the PAL MCP consensus tool, with the paired Fable subagent launched in the same message. The tool manages the multi-step flow internally:

  1. Step 1 (your analysis): Write the evaluation prompt and your own independent assessment
  2. Steps 2-N (model consultations): Each model responds with its stance
  3. Final step (synthesis): You synthesize all responses into a verdict
mcp__pal__consensus(
  step="Evaluate the following design for [topic]:\n\n[evaluation prompt]\n\n[relevant context]",
  step_number=1,
  total_steps=4,  // 1 (your analysis) + N models + 1 (synthesis)
  next_step_required=true,
  findings="[your independent analysis before seeing model responses]",
  models=[...],
  relevant_files=["/absolute/path/to/plan.md", ...]
)

Step 4: Extract Corrections

From the synthesized consensus, extract:

  1. Score (1-10) with breakdown by category
  2. Unanimous findings — both FOR and AGAINST agree (highest confidence)
  3. Contested findings — disagreement between stances (needs human judgment)
  4. Corrections — specific, actionable changes to make

Format corrections as:

## Oracle Consensus Results

**Overall Score:** 8/10
**Unanimous:** 3 findings | **Contested:** 1 finding | **Corrections:** 5

### Unanimous Findings
1. [Finding both stances agreed on]

### Corrections (ordered by impact)
1. **CRITICAL:** [correction] — Reason: [why]
2. **HIGH:** [correction] — Reason: [why]
3. **MEDIUM:** [correction] — Reason: [why]

### Contested
1. FOR says [X], AGAINST says [Y] — **Recommendation:** [your judgment]

Step 5: Apply Corrections

For each correction:

  1. Verify it against the original plan/code/beads
  2. Apply if valid; reject with rationale if not
  3. Track applied vs rejected corrections

Step 6: Optional Re-validation

If corrections were extensive (5+ CRITICAL/HIGH), run a second oracle round on the corrected version. Use a shorter evaluation prompt focused on whether corrections were properly applied.

Convergence criterion: Stop when oracle score is 8+ AND zero CRITICAL corrections remain.

Multiple Concurrent Oracles

For large targets (e.g., 80 beads across 10 epics), run parallel oracle sessions:

Oracle 1: Evaluate epics 1-3 (data model + safety)
Oracle 2: Evaluate epics 4-6 (UI + UX)
Oracle 3: Evaluate epics 7-10 (integration + testing)
Oracle 4: Evaluate cross-cutting concerns (deps, ordering, completeness)

Each oracle session is independent. Compile all corrections after all complete.

Scoring Guide

Score Meaning Action
9-10 Excellent, minor polish only Ship it
7-8 Good, specific corrections needed Apply corrections, no re-validation needed
5-6 Significant issues Apply corrections + re-validate
1-4 Fundamental problems Redesign required

Example Invocations

/swarm-oracle foundation/product/features/sensor-plate-implementation-plan.md
/swarm-oracle "Should we split StationConfigurationVM into 5 VMs or 4?"
/swarm-oracle --rounds 2 .beads/  # validate all open beads, re-validate if needed
Install via CLI
npx skills add https://github.com/oysteinkrog/dotfiles --skill oracle-consensus
Repository Details
star Stars 0
call_split Forks 0
navigation Branch main
article Path SKILL.md
More from Creator