name: consensus-engine
description: Multi-validator agreement gate with 5-state synthesis and multi-round debate iteration. ALWAYS use when the user says "consensus validation", "validate with N reviewers", "agreement gate", "high-confidence validation", "consensus gate", or needs confidence-scored verdicts. Spawns ≥2 (default 3) independent validators in isolated evidence dirs, applies the 5-state synthesis table (UNANIMOUS_PASS / UNANIMOUS_FAIL / MAJORITY_PASS / MAJORITY_FAIL / SPLIT), and escalates SPLIT or borderline MAJORITY to up to 3 rounds of filesystem-mediated debate.
triggers:
  - "consensus validation"
  - "validate with N reviewers"
  - "agreement gate"
  - "high-confidence validation"
  - "consensus gate"
  - "validate with debate"
consensus-engine
Execution-time agreement gate. Spawns ≥2 (default 3) independent validators against the same feature; applies Shannon's 5-state synthesis table; escalates non-unanimous outcomes to multi-round debate via the judge-with-debate engine. Synthesizes a confidence-scored verdict.
5-state synthesis table (Shannon-distinctive)
| State | Condition | Verdict | Confidence |
|---|---|---|---|
| UNANIMOUS_PASS | All PASS | PASS | HIGH |
| UNANIMOUS_FAIL | All FAIL | FAIL | HIGH |
| MAJORITY_PASS | ≥⅔ PASS (after debate if needed) | PASS | MEDIUM |
| MAJORITY_FAIL | ≥⅔ FAIL (after debate if needed) | FAIL | MEDIUM |
| SPLIT | Neither side ⅔ after max debate rounds | DISAGREEMENT_UNRESOLVED | LOW |
Confidence never upgrades — a late-arriving PASS verdict does NOT turn SPLIT into UNANIMOUS_PASS, even if the late verdict is well-evidenced. Confidence is fixed at the time of synthesis.
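The table can be read as a pure function from verdict counts to a state. The following is a minimal sketch under that reading; the function name `synthesize` and the tuple return shape are illustrative assumptions, not part of the skill itself.

```python
def synthesize(verdicts):
    """Map validator verdicts ("PASS"/"FAIL") to (state, verdict, confidence).

    Hypothetical helper: the name and return shape are assumptions made
    for illustration; only the five states come from the table above.
    """
    if len(verdicts) < 2:
        raise ValueError("consensus requires at least 2 validators")
    if any(v not in ("PASS", "FAIL") for v in verdicts):
        raise ValueError("each verdict must be 'PASS' or 'FAIL'")

    total = len(verdicts)
    passes = verdicts.count("PASS")
    fails = total - passes

    if passes == total:
        return ("UNANIMOUS_PASS", "PASS", "HIGH")
    if fails == total:
        return ("UNANIMOUS_FAIL", "FAIL", "HIGH")
    # Integer form of passes/total >= 2/3, which avoids float rounding.
    if 3 * passes >= 2 * total:
        return ("MAJORITY_PASS", "PASS", "MEDIUM")
    if 3 * fails >= 2 * total:
        return ("MAJORITY_FAIL", "FAIL", "MEDIUM")
    return ("SPLIT", "DISAGREEMENT_UNRESOLVED", "LOW")
```

Under this reading, three binary verdicts always reach at least a ⅔ majority (2 of 3), so SPLIT can only arise with other validator counts, for example 1-1 with two validators or 3-2 with five.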
Behavior contract
Phase 1: Setup
- Dispatch meta-judge ONCE to generate validation rubric YAML (`agents/meta-judge.md`).
- Create evidence directory structure:

      e2e-evidence/consensus/<run-id>/
        validator-1/
        validator-2/
        validator-3/
        report.md (synthesizer-owned)

- Store meta-judge YAML at `.shannon/state/consensus-rubric-<run-id>.yaml`.
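The directory layout above could be created with a few lines of `pathlib`. This is only a sketch: the function name `create_evidence_dirs` and the `root` parameter are assumptions, and the document does not say which agent performs this step. `report.md` is deliberately not created here because it is synthesizer-owned.

```python
from pathlib import Path


def create_evidence_dirs(root, run_id, n_validators=3):
    """Create e2e-evidence/consensus/<run-id>/validator-<i>/ under root.

    Hypothetical helper; returns (run_dir, [validator dirs]).
    """
    if n_validators < 2:
        raise ValueError("consensus requires at least 2 validators")
    run_dir = Path(root) / "e2e-evidence" / "consensus" / run_id
    validator_dirs = []
    for i in range(1, n_validators + 1):
        vdir = run_dir / f"validator-{i}"
        # parents=True also creates run_dir; exist_ok makes reruns harmless.
        vdir.mkdir(parents=True, exist_ok=True)
        validator_dirs.append(vdir)
    return run_dir, validator_dirs
```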
Phase 2: Independent validation (parallel)
- Spawn N consensus-validator subagents in parallel via `Task` (SINGLE assistant response, N Task tool calls). Each receives:
  - The feature / artifact under validation.
  - The meta-judge YAML verbatim.
  - Its assigned validator directory (exclusive write zone).
- Each validator runs the full journey list independently. Blind to peer verdicts in this phase.
- Each emits a structured YAML verdict header at the START of its report:
---
VALIDATOR: {1|2|3}
VERDICT: PASS | FAIL
SCORE: X.X/5.0
CRITERIA:
- {criterion_1}: {X.X}/5.0
- {criterion_2}: {X.X}/5.0
ISSUES:
- {issue with file:line}
EVIDENCE:
- {evidence_path_1}
- {evidence_path_2}
---
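A synthesizer needs to read this header back out of each report. The sketch below is a hand-rolled, standard-library parser for exactly the layout shown above; it is not a general YAML parser, and the function name `parse_verdict_header` and the returned dictionary shape are assumptions.

```python
def parse_verdict_header(report_text):
    """Parse the leading '---' ... '---' verdict header of a validator report."""
    lines = report_text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("report must start with a '---' verdict header")
    end = next((i for i, line in enumerate(lines[1:], start=1)
                if line.strip() == "---"), None)
    if end is None:
        raise ValueError("unterminated verdict header")

    header = {"CRITERIA": {}, "ISSUES": [], "EVIDENCE": []}
    current = None  # the most recent KEY: line, which owns following '- ' items
    for raw in lines[1:end]:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("- "):
            item = line[2:].strip()
            if current == "CRITERIA":
                # "name: 4.5/5.0" -> split on the LAST colon, keep the numerator
                name, _, score = item.rpartition(":")
                header["CRITERIA"][name.strip()] = float(score.split("/")[0])
            elif current in ("ISSUES", "EVIDENCE"):
                header[current].append(item)
            else:
                raise ValueError(f"list item outside a list field: {line!r}")
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"unrecognized header line: {line!r}")
        current, value = key.strip(), value.strip()
        if current == "VALIDATOR":
            header["VALIDATOR"] = int(value)
        elif current == "VERDICT":
            if value not in ("PASS", "FAIL"):
                raise ValueError(f"invalid verdict: {value!r}")
            header["VERDICT"] = value
        elif current == "SCORE":
            header["SCORE"] = float(value.split("/")[0])
    return header
```

Because issue lines carry `file:line` references, list items are detected by their leading `- ` before any key lookup, so a colon inside an issue is never mistaken for a field name.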
Phase 3: Initial synthesis
Aggregate verdict headers. Apply 5-state synthesis table:
- Compute `(pass_count, fail_count, total)` per criterion AND overall.
- Map to a state.
If UNANIMOUS_PASS or UNANIMOUS_FAIL → emit verdict with HIGH confidence; skip Phase 4.
Otherwise → check escalation conditions:
- SPLIT (neither side ⅔) → escalate to debate.
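- MAJORITY with borderline scores (any criterion's scores differ by more than 1.0 point across validators, or overall scores differ by more than 0.5 points) → escalate to debate.
- MAJORITY with tight agreement (overall scores within 0.5 points and every criterion within 1.0 point) → emit MAJORITY verdict with MEDIUM confidence.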
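The routing decision in this phase can be sketched as a single predicate. The name `needs_debate` and the input shapes are assumptions. The sketch also takes a conservative reading of the conditions: any MAJORITY result that is not tight on both the overall and per-criterion thresholds is treated as borderline and sent to debate.

```python
def spread(values):
    """Largest gap between any two scores, rounded to dodge float noise."""
    return round(max(values) - min(values), 6)


def needs_debate(state, overall_scores, criterion_scores):
    """Return True when the initial synthesis should escalate to Phase 4.

    state            -- one of the five synthesis states
    overall_scores   -- one overall score per validator
    criterion_scores -- {criterion name: [one score per validator]}
    """
    if state in ("UNANIMOUS_PASS", "UNANIMOUS_FAIL"):
        return False  # HIGH confidence, Phase 4 is skipped
    if state == "SPLIT":
        return True
    # MAJORITY_*: escalate unless agreement is tight everywhere.
    criteria_borderline = any(spread(s) > 1.0 for s in criterion_scores.values())
    overall_tight = spread(overall_scores) <= 0.5
    return criteria_borderline or not overall_tight
```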
Phase 4: Multi-round debate (escalation path — engine: judge-with-debate)
Delegate to judge-with-debate skill for the iteration mechanics. Inputs:
- The N initial validator reports (`.specs/reports/consensus-<run-id>.[1|N].md`).
- The artifact / feature under validation.
- The same meta-judge YAML (no new rubric).
Debate runs up to 3 rounds:
- Each round: validators read peers' reports DIRECTLY FROM THE FILESYSTEM. Coordinator does NOT relay arguments.
- Each validator either defends or revises their position, citing the meta-judge YAML rubric as grounding. Arguments must reference specification criteria, NOT opinion.
- Updates are APPEND-ONLY: each validator appends `## Debate Round R` to their own report file. Previous rounds are preserved.
- Convergence thresholds (Shannon-adopted from sadd):
  - Overall scores within 0.5 points across all validators, AND
  - Every criterion score within 1.0 point across all validators.
- Sycophancy detection: if a validator agrees with another without citing unique evidence, the synthesizer flags the agreement as low-quality and triggers another debate round (up to 3 max).
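The round loop, convergence thresholds, and sycophancy flag can be combined into a short control-flow sketch. All names here (`has_converged`, `is_sycophantic`, `run_debate`, and the `run_round` callback) are hypothetical, and "unique evidence" is interpreted as evidence the peer has not already cited, which is an assumption about the intended rule.

```python
def spread(values):
    return round(max(values) - min(values), 6)


def has_converged(overall_scores, criterion_scores):
    """Both thresholds must hold: overall within 0.5, each criterion within 1.0."""
    return spread(overall_scores) <= 0.5 and all(
        spread(scores) <= 1.0 for scores in criterion_scores.values()
    )


def is_sycophantic(agreed_with_peer, own_evidence, peer_evidence):
    """Agreement counts as low-quality when it adds no evidence of its own."""
    return bool(agreed_with_peer) and not (set(own_evidence) - set(peer_evidence))


def run_debate(run_round, max_rounds=3):
    """Run up to max_rounds; return (rounds_used, converged).

    run_round(round_no) is a caller-supplied callback returning
    (overall_scores, criterion_scores, low_quality_agreement).
    """
    for round_no in range(1, max_rounds + 1):
        overall, criteria, low_quality = run_round(round_no)
        # A flagged round never counts as convergence, so the loop continues.
        if has_converged(overall, criteria) and not low_quality:
            return round_no, True
    return max_rounds, False
```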
After convergence (or 3 rounds, whichever comes first):
- Re-apply 5-state synthesis table to the converged scores.
- If converged to UNANIMOUS → emit with MEDIUM confidence (post-debate UNANIMOUS is not the same as pre-debate UNANIMOUS).
- If converged to MAJORITY → emit MAJORITY verdict with MEDIUM confidence.
- If still SPLIT after 3 rounds → emit the SPLIT state (verdict DISAGREEMENT_UNRESOLVED) with LOW confidence; escalate to the user with all reports and the debate history.
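The confidence rules above reduce to a small lookup once the number of debate rounds is known. A minimal sketch, with `final_confidence` as an assumed name:

```python
def final_confidence(state, debate_rounds):
    """Confidence for a synthesized state, capped once debate has run.

    HIGH is only reachable when validators were unanimous with no debate
    (debate_rounds == 0); unanimity reached through debate stays MEDIUM.
    """
    if state == "SPLIT":
        return "LOW"
    if state in ("UNANIMOUS_PASS", "UNANIMOUS_FAIL") and debate_rounds == 0:
        return "HIGH"
    return "MEDIUM"
```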
Phase 5: Per-criterion side-by-side score table
The synthesizer emits a unified report:
# Consensus Report — <run-id>
## Per-criterion scores
| Criterion | V1 (initial) | V2 (initial) | V3 (initial) | V1 (final) | V2 (final) | V3 (final) | Consensus | Within Threshold? |
|-----------|-------------|-------------|-------------|-----------|-----------|-----------|-----------|-------------------|
| {criterion_1} | {X.X} | {X.X} | {X.X} | {X.X} | {X.X} | {X.X} | {avg} | YES/NO |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
## Verdict
- State: {UNANIMOUS_PASS | UNANIMOUS_FAIL | MAJORITY_PASS | MAJORITY_FAIL | SPLIT}
- Final: {PASS | FAIL | DISAGREEMENT_UNRESOLVED}
- Confidence: {HIGH | MEDIUM | LOW}
- Debate Rounds: {0-3}
## Disagreement Protocol Applied (if non-unanimous)
- Diverging criteria: {list}
- Resolution: {converged via debate / re-ran validator / sharpened criterion / discarded erroring validator}
## Cited Evidence
- {file:line} — {context}
- ...
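One row of the per-criterion table could be rendered as follows. This is a sketch only: `criterion_row` is a hypothetical helper, the Consensus column is assumed to be the mean of the final scores, and "Within Threshold?" is assumed to mean the final scores sit within the 1.0-point criterion threshold.

```python
def criterion_row(name, initial, final):
    """Render one Markdown row: initial scores, final scores, mean, YES/NO."""
    if len(initial) != len(final) or not final:
        raise ValueError("initial and final need one score per validator")
    consensus = sum(final) / len(final)
    within = round(max(final) - min(final), 6) <= 1.0
    cells = (
        [name]
        + [f"{score:.1f}" for score in list(initial) + list(final)]
        + [f"{consensus:.1f}", "YES" if within else "NO"]
    )
    return "| " + " | ".join(cells) + " |"


# Example with made-up scores for a hypothetical "login_flow" criterion:
print(criterion_row("login_flow", [4.0, 2.5, 4.5], [4.0, 3.5, 4.5]))
```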
File ownership (LOAD-BEARING)
- Coordinator: writes NOTHING.
- Validator-N: writes ONLY to `validator-<N>/` subdirectory (append-only across debate rounds).
- Synthesizer: writes ONLY `report.md`.
- Meta-judge: writes ONLY `.shannon/state/consensus-rubric-<run-id>.yaml`.
A validator that writes outside its directory invalidates the run.
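The ownership rules can be checked mechanically before a run is accepted. The sketch below assumes repository-relative POSIX paths and role names of the form `coordinator`, `synthesizer`, `meta-judge`, and `validator-<N>`; the function name `is_write_allowed` and those role strings are illustrative assumptions.

```python
import posixpath
import re
from pathlib import PurePosixPath


def is_write_allowed(role, target, run_id):
    """Return True if `role` may write `target` under the ownership rules."""
    run_dir = PurePosixPath("e2e-evidence/consensus") / run_id
    # normpath collapses '..' so a validator cannot escape its directory.
    path = PurePosixPath(posixpath.normpath(target))

    if role == "coordinator":
        return False  # the coordinator writes nothing
    if role == "synthesizer":
        return path == run_dir / "report.md"
    if role == "meta-judge":
        return path == PurePosixPath(
            f".shannon/state/consensus-rubric-{run_id}.yaml"
        )
    if re.fullmatch(r"validator-\d+", role):
        # Allowed only for files strictly inside the validator's own directory.
        return (run_dir / role) in path.parents
    return False
```

Normalizing the path first matters: without it, `validator-1/../validator-2/x` would appear to sit under `validator-1/`.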
Disagreement protocol (non-debate fallback)
If debate is not chosen (e.g., user explicitly disables debate), the legacy single-round disagreement protocol still applies:
- Identify diverging criteria.
- Invoke `codebase-analysis` on the divergent evidence.
- Re-resolve via case:
  - (a) re-run missing validator,
  - (b) re-run minority,
  - (c) sharpen criterion (escalate to planner),
  - (d) discard erroring validator.
- If unresolvable → SPLIT with LOW confidence; escalate.
When to use
- High-stakes change (security, payments, production data).
- Pre-major-release validation.
- After a near-miss bug to harden the gate.
- User explicitly requests `--mode consensus`.
When NOT to use
- Routine refactor with no behavior change.
- Exploratory work.
- Speed-critical iteration (consensus is 3× the wall time of single-validator).
- Quorum-style plan-review or evidence-audit: use `judge` for those (Shannon-distinctive quorum semantics differ from consensus).
Iron rules
- Confidence never upgrades — a late-arriving PASS does not turn SPLIT into UNANIMOUS.
- Evidence quality cannot substitute for agreement. Three highly-evidenced validators in SPLIT remain SPLIT.
- SPLIT is a real, reportable outcome. Do NOT silently downgrade to MAJORITY.
- File ownership is enforced. Validator-N writes ONLY validator-N/.
- Filesystem is the debate medium. Coordinator does NOT relay arguments.
- Append-only on debate rounds. Each validator appends `## Debate Round R`; never overwrites.
- Debate arguments cite the meta-judge YAML rubric, not opinion.
- Sycophancy detection — agreement without unique evidence triggers re-debate.
- Meta-judge runs ONCE per consensus run; YAML reused across rounds.
- Threshold hidden from validators (delegated via meta-judge YAML).
Related Skills
- `judge-with-debate`: the iteration engine for Phase 4 escalation.
- `agents/meta-judge.md`: generates the consensus rubric.
- `judge`: alternative quorum gate (pre-execution plan review, post-execution evidence audit); distinct from consensus's "did this feature pass?" semantics.
- `judge`: single-validator primitive used inside each validator.
- `codebase-analysis`: fallback non-debate disagreement protocol.
- `dispatch-parallel`: orchestrates parallel validator spawn.
Cross-references
- `agents/consensus-validator.md`: the validator agent.
- `agents/meta-judge.md`: rubric generator.
- `core/SUBAGENT_PATTERNS.md`: consensus vs quorum vs debate design rationale.