dwa-council - SKILL.md Agent Skill

name: dwa-council
description: Debate-Weighted Aggregation council for critical multi-agent decisions. Automatically triggers for security, architectural, and high-risk operations requiring expert deliberation.
tags: [decision-making, multi-agent, voting, deliberation]

DWA Council Methodology

Multi-agent deliberation system using Debate-Weighted Aggregation for critical decisions.

When Council Convenes

Council automatically triggers for 8 conditions:

Architectural Decisions - Design choices, tech stack, migrations
Security/Risk Decisions - Auth, secrets, vulnerabilities
Agent Disagreements - Multiple conflicting approaches
Quality Gate Failures - TDD violations, linting errors
Ethical Flags - Privacy, bias, GDPR concerns
Low Aggregate Confidence - Confidence < 0.75
External Commitments - Deploys, API calls, publishing
Novel/OOD Queries - Unfamiliar technology decisions
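A minimal sketch of how pattern-based trigger detection could work for a few of the conditions above. The pattern table, the function name `detect_trigger_sketch`, and the returned labels are hypothetical; the rules used by the actual detector are not shown in this document. The low-confidence condition is read here as aggregate confidence below 0.75.

```python
import re
from typing import Optional

# Hypothetical keyword patterns; the real detector's rules are not documented here.
TRIGGER_PATTERNS = {
    "security": re.compile(r"\b(auth\w*|secrets?|vulnerab\w*|jwt)\b", re.IGNORECASE),
    "architecture": re.compile(r"\b(architecture|migration|tech stack)\b", re.IGNORECASE),
    "external_commitment": re.compile(r"\b(deploy\w*|publish\w*)\b", re.IGNORECASE),
}

CONFIDENCE_FLOOR = 0.75  # threshold from the "Low Aggregate Confidence" condition


def detect_trigger_sketch(
    operation_text: str, aggregate_confidence: Optional[float] = None
) -> Optional[str]:
    """Return the label of the first matching trigger, or None if nothing matches."""
    # Patterns are checked in insertion order, so "security" takes precedence.
    for label, pattern in TRIGGER_PATTERNS.items():
        if pattern.search(operation_text):
            return label
    # Fall back to the confidence check when a confidence value is supplied.
    if aggregate_confidence is not None and aggregate_confidence < CONFIDENCE_FLOOR:
        return "low_confidence"
    return None
```

Checking patterns in a fixed order keeps the result deterministic when a request matches several conditions, at the cost of reporting only the first one.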

DWA Voting Formula

Score = Σ (Vote × Confidence × Expertise Weight)

Where:

Vote: 1.0 for APPROVE, 0.0 for REJECT/ABSTAIN
Confidence: Agent's confidence in their recommendation (0-1)
Expertise Weight: Agent's domain expertise (0-1, from YAML frontmatter)
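The formula can be sketched in a few lines of Python. The `Ballot` class and the `dwa_score` function are illustrative names, not part of the council package; the sketch only assumes the three inputs defined above.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Ballot:
    approve: bool      # APPROVE counts as 1.0; REJECT and ABSTAIN count as 0.0
    confidence: float  # agent's confidence in its recommendation, 0-1
    expertise: float   # agent's expertise weight for the domain, 0-1


def dwa_score(ballots: Iterable[Ballot]) -> float:
    """Sum Vote x Confidence x Expertise Weight over all ballots for one proposal."""
    return sum(
        (1.0 if b.approve else 0.0) * b.confidence * b.expertise for b in ballots
    )
```

Because REJECT and ABSTAIN both map to 0.0, a dissenting agent never lowers a proposal's score; it only withholds its own contribution.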

Council Workflow

1. TRIGGER DETECTION
   └─> Pattern matching or risk-level check
       └─> Infer domain (security, architecture, api_design, etc.)

2. AGENT SELECTION
   └─> Query expertise registry for relevant agents
       └─> Filter by min_expertise >= 0.5
           └─> Select top N agents (default: 5)

3. PROPOSAL GENERATION
   └─> Each agent generates independent proposal
       ├─> Recommendation (what to do)
       ├─> Reasoning chain (step-by-step logic)
       ├─> Confidence score (0-1)
       └─> Domain relevance (0-1)

4. DEBATE (Optional - only if needed)
   └─> Check: Low consensus OR low confidence?
       └─> Round 1: Agents critique each other's proposals
           └─> Round 2 (if needed): Refine based on critiques

5. VOTING AGGREGATION
   └─> Each agent votes (typically for own proposal)
       └─> Apply DWA formula
           └─> Winner = highest weighted score

6. ESCALATION CHECK
   └─> Aggregate confidence < 0.7? → Escalate
   └─> Tie vote (within 5%)? → Escalate
   └─> High disagreement (HHI < 0.3)? → Escalate
       └─> Consult o3/Gemini/DeepSeek for expert opinion

7. DECISION FINALIZATION
   └─> Persist session to memory-keeper
       └─> Audit trail for compliance
           └─> Return decision to user
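Step 2 of the workflow (agent selection) can be sketched as a filter followed by a sort. The registry shape used here, a mapping from agent name to its per-domain expertise weights, is an assumption; the real expertise registry may be structured differently.

```python
from typing import Dict, List, Tuple

# Assumed registry shape: agent name -> {domain: expertise weight}
Registry = Dict[str, Dict[str, float]]


def select_agents(
    registry: Registry,
    domain: str,
    min_expertise: float = 0.5,
    top_n: int = 5,
) -> List[Tuple[str, float]]:
    """Return up to top_n (agent, weight) pairs for a domain, highest weight first."""
    # Agents that do not declare the domain are treated as having weight 0.0.
    candidates = [(name, weights.get(domain, 0.0)) for name, weights in registry.items()]
    eligible = [(name, w) for name, w in candidates if w >= min_expertise]
    # sorted() is stable, so agents with equal weights keep their registry order.
    return sorted(eligible, key=lambda pair: pair[1], reverse=True)[:top_n]
```

With a registry containing security-auditor (1.0), code-reviewer (0.9), backend-developer (0.7), and a hypothetical agent at 0.2 for the security domain, the first three are selected and the last is filtered out by the 0.5 floor.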

Escalation Triggers

Council may escalate to external models (o3, Gemini, DeepSeek) when:

Low Confidence: Aggregate confidence < 0.7
Tie Vote: Top two proposals within 5% score
High Disagreement: Herfindahl-Hirschman Index < 0.3
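The three checks can be sketched as follows. Two details are assumptions, since the document does not define them: the Herfindahl-Hirschman Index is computed over each proposal's share of the total weighted score, and "within 5%" is measured relative to the top score. The function names are illustrative.

```python
from typing import List, Sequence


def hhi(scores: Sequence[float]) -> float:
    """Herfindahl-Hirschman Index: sum of squared score shares (1/n up to 1.0)."""
    total = sum(scores)
    if total <= 0:
        return 0.0  # no support for any proposal; treated as maximal disagreement
    return sum((s / total) ** 2 for s in scores)


def escalation_reasons(scores: Sequence[float], aggregate_confidence: float) -> List[str]:
    """Return the escalation triggers that fire for a set of proposal scores."""
    reasons = []
    if aggregate_confidence < 0.7:
        reasons.append("low_confidence")
    ranked = sorted(scores, reverse=True)
    # Assumption: the gap between the top two scores is measured against the top score.
    if len(ranked) >= 2 and ranked[0] > 0 and (ranked[0] - ranked[1]) / ranked[0] <= 0.05:
        reasons.append("tie_vote")
    if hhi(scores) < 0.3:
        reasons.append("high_disagreement")
    return reasons
```

Under this definition the HHI of n proposals is never below 1/n, so the 0.3 threshold can only fire when at least four distinct proposals receive support.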

Example: Security Decision

Trigger: "Design JWT authentication system"
Domain: security
Agents: security-auditor (1.0), code-reviewer (0.9), backend-developer (0.7)

Proposals:
  1. security-auditor: "Use RS256 with public key verification"
     - Confidence: 0.90
     - Reasoning: Asymmetric signing prevents forgery

  2. code-reviewer: "Use RS256 with public key verification"
     - Confidence: 0.85
     - Reasoning: Industry standard, better security

  3. backend-developer: "Use HS256 for simplicity"
     - Confidence: 0.70
     - Reasoning: Easier implementation

Voting (DWA):
  RS256 (proposals 1 and 2): (0.90 × 1.0) + (0.85 × 0.9) = 1.665
  HS256 (proposal 3): (0.70 × 0.7) = 0.490

Winner: RS256 approach (proposals 1 and 2) with 77% confidence
Decision: "Use JWT with RS256 signing for enhanced security"
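The arithmetic in this example can be reproduced with a short script. The example does not say how the 77% figure is derived; it is consistent with the winner's share of the total weighted score (1.665 / 2.155), which is the reading assumed here.

```python
from collections import defaultdict

# (option, confidence, expertise weight) for each agent in the example
votes = [
    ("RS256", 0.90, 1.0),  # security-auditor
    ("RS256", 0.85, 0.9),  # code-reviewer
    ("HS256", 0.70, 0.7),  # backend-developer
]

# Each agent approves its own option, so Vote = 1.0 for every row.
scores = defaultdict(float)
for option, confidence, expertise in votes:
    scores[option] += confidence * expertise

winner = max(scores, key=scores.get)
# Assumed definition: decision confidence = winner's share of the total score.
share = scores[winner] / sum(scores.values())

for option, score in scores.items():
    print(f"{option}: {score:.3f}")
print(f"Winner: {winner} ({share:.0%} of total weighted score)")
```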

Configuration

Agent Expertise Weights

Agents declare expertise in YAML frontmatter:

---
name: security-auditor
expertise_weights:
  security: 1.0      # Primary domain
  architecture: 0.7
  api_design: 0.5
  performance: 0.4
council_role: proposer  # proposer | reviewer | abstainer
---
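A standard-library-only sketch of reading `expertise_weights` out of such a frontmatter block. The function name `parse_expertise_weights` is hypothetical, and the parser handles only the flat layout shown above; the actual expertise registry may well use a full YAML library instead.

```python
from typing import Dict

SAMPLE = """\
---
name: security-auditor
expertise_weights:
  security: 1.0      # Primary domain
  architecture: 0.7
  api_design: 0.5
  performance: 0.4
council_role: proposer  # proposer | reviewer | abstainer
---
Agent instructions follow here.
"""


def parse_expertise_weights(markdown_text: str) -> Dict[str, float]:
    """Extract the expertise_weights mapping from a leading '---' frontmatter block."""
    lines = markdown_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no frontmatter present
    weights: Dict[str, float] = {}
    in_weights = False
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter
        content = line.split("#", 1)[0].rstrip()  # drop trailing comments
        if not content.strip():
            continue
        if not content.startswith((" ", "\t")):
            # A top-level key either opens or closes the expertise_weights section.
            in_weights = content.strip() == "expertise_weights:"
            continue
        if in_weights:
            key, _, value = content.strip().partition(":")
            weights[key.strip()] = float(value)
    return weights


weights = parse_expertise_weights(SAMPLE)
```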

Hook Integration

Council hook runs after intelligent_gate.py:

{
  "PreToolUse": [
    {"type": "command", "command": "python3 ~/.claude/hooks/intelligent_gate.py"},
    {"type": "command", "command": "python3 ~/.claude/hooks/council_hook.py"}
  ]
}

Performance Characteristics

Token Overhead: ~12,000 tokens for full council (5 agents + debate)
Latency: 30-60 seconds (parallel proposal generation)
Cost Optimization: Local Ollama models for proposals ($0), Claude Opus only for critical domains
Escalation Cost: $0.016-$0.06 per external model consultation

Audit Trail

All council sessions are persisted to the memory-keeper MCP:

Channel: council:sessions - Full session data
Channel: council:audit - Audit trail entries
Channel: council:decisions - Decision index

Query with:

from council.state_manager import list_sessions

# Recent security decisions
sessions = list_sessions(domain="security", limit=10)

# High-confidence decisions only
sessions = list_sessions(min_confidence=0.85)

Phase 2 Status (Current)

✅ Implemented:

Trigger detection (8 conditions)
Expertise registry (YAML frontmatter parsing)
Voting aggregation (DWA formula + HHI)
Debate manager (consensus checks)
State manager (persistence framework)
Council hook (PreToolUse integration)

⏳ Phase 3 TODO:

Actual Ollama proposal generation (currently simulated)
External model escalation (o3/Gemini integration)
Message bus integration
Performance metrics

Usage

Council convenes automatically via the hook when a trigger is detected.

Manual convocation (for testing):

from council.trigger_detector import detect_trigger
from council.orchestrator import convene_council

# Detect trigger
trigger = detect_trigger(
    tool_name="Bash",
    operation_text="Deploy authentication system to production",
    risk_level="HIGH"
)

# Convene council
if trigger:
    session = convene_council(trigger, context="Production deployment")
    print(f"Decision: {session.decision}")
    print(f"Confidence: {session.decision_confidence}")

References

Plan: /Users/mac/.claude/plans/synthetic-soaring-quill.md
Implementation: /Users/mac/.claude/council/
Tests: /Users/mac/.claude/council/test_phase1.py
Hooks: /Users/mac/.claude/hooks/council_hook.py

DWA Council: Making critical decisions through expert deliberation.