name: peer-grill-with-agents
description: Two (or more) agents independently audit the SAME existing agent stack against the codebase, then reconcile via the peer-grill file-based protocol. Each agent walks the same 13-branch agent-stack audit (grill-me-with-agents) on its own, dumps a claims.yaml grounded in concrete file paths, and the disagreements get ratified or escalated. Use when the agent topology is already implemented and you need multi-agent triangulation on whether a change to it is sound — single-agent grilling has known blind spots, peer-grill alone has no code anchor, this combines them. Triggers — "peer-grill the agent stack", "two agents audit the topology", "stress-test our agents from two angles", "triangulate the agent design", "reconcile our reading of the stack". Do NOT use for greenfield design (use grill-me-agents), for single-agent code-aware grilling (use grill-me-with-agents), or for non-agent state reconciliation (use peer-grill).
Flagship skill for the Athenaeum orchestration package. Where grill-me-with-agents audits an implemented stack with one agent's lens, peer-grill-with-agents runs that audit through N agents in parallel and reconciles the deltas. The output is a multi-agent-ratified AGENT_DESIGN.md that no single model authored alone: the design has survived independent code-anchored interrogation by every peer.
This skill is the dialectic protocol at agent-design scale. It composes two existing primitives without replacing them: the 13-branch audit from grill-me-with-agents (what to read in the repo, in what order) and the file-based reconciliation protocol from peer-grill (claims → diff → grill → ratify → sign-off). Read both for full mechanics; this skill describes how they fit together.
When to invoke
- A design change is proposed against an existing multi-agent stack and the stakes warrant more than one model's read of it.
- Two parallel sessions are working on the same agent-topology question and you suspect they're forming divergent mental models.
- A non-Claude peer (Gemini, GPT, local model) has been asked to review an agent design and you want the review to converge with the Claude perspective, not just sit alongside it.
- Authoring or amending an AGENT_DESIGN.md: peer-grill-with-agents produces the canonical artifact.
Do NOT invoke when:
- The agent stack doesn't exist yet (grill-me-agents, greenfield).
- One agent is enough: single-agent rigor is satisfied by grill-me-with-agents.
- The reconciliation isn't about agents (vanilla peer-grill).
Setup — ask the user before any writes
- Topic slug: what change or design is being grilled? e.g. athenaeum-rollout, hermes-demotion, pt-dissent-gate. Default to athenaeum-rollout if unspecified.
- Identities: what name is this agent? (claude-opus-4-7, gemini-2.5-pro, gpt-5-codex, etc.) What identities are the peers, and how do they invoke the skill (separate session, separate machine, manual prompt)?
- Stack scope: which subdirectories of .claude/agents/, .claude/skills/, .claude/shared-agents/, etc. are in scope? Anything outside is filtered out before diffing.
- Round budget: max grilling rounds per disputed branch before ESCALATE (default: 3, per peer-grill).
- poll_timeout_minutes: wall-clock cap on Phase 2 and Phase 5 waits (default: 30).
If a peer is non-Claude, offer to draft a one-page system-prompt summary the user can paste into that peer so it follows the same protocol. The protocol is symmetric; it does not assume Claude-on-both-sides.
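The setup answers can be captured in one small record before any file is written. This is a minimal sketch, not part of the skill: the class name GrillSetup and its field names are hypothetical, while the three default values are the ones stated in the list above.

```python
from dataclasses import dataclass


@dataclass
class GrillSetup:
    """Hypothetical container for the setup answers collected from the user."""

    agent: str                              # this agent's identity
    peers: list[str]                        # identities of the other peers
    stack_scope: list[str]                  # paths that are in scope for the audit
    topic: str = "athenaeum-rollout"        # default topic slug from the setup list
    round_budget: int = 3                   # grilling rounds before ESCALATE
    poll_timeout_minutes: int = 30          # cap on Phase 2 and Phase 5 waits

    def __post_init__(self) -> None:
        # Two peers sharing one name would make claims files indistinguishable.
        if self.agent in self.peers:
            raise ValueError(f"identity collision: {self.agent}")
        if not self.stack_scope:
            raise ValueError("stack scope must list at least one path")
```

Rejecting a duplicate identity here is only an early convenience; the authoritative identity-collision check stays in Phase 2.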
Working directory
All artifacts live in .peer-grill/<topic>/ relative to repo root — same convention as peer-grill. Files (see peer-grill SKILL.md for full semantics):
| File | Writers | Purpose |
|---|---|---|
| <agent>.claims.yaml | only that agent | self-reported audit of the agent stack |
| diff.md | last writer (append-only) | three-bucket diff: agreed / disagreed / only-one |
| grill-log.md | append-only, all peers | Q&A transcript with timestamps + agent identity + branch number |
| state.merged.yaml | merge writer (first alphabetically) | the ratified ground truth, drives AGENT_DESIGN.md generation |
| unresolved.md | append-only, all peers | escalated disputes; both/all positions recorded |
| signoff.md | append-only, all peers | sha256-attested final state per peer |
Hard rules from peer-grill apply unchanged — never edit a peer's claims file; only the merge writer writes state.merged.yaml; append-only files stay append-only.
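One way to respect the append-only rule mechanically is to open shared files in append mode only. The helper below is a sketch under assumptions: the function name append_entry and the line layout (timestamp, agent in brackets, text) are illustrative, and the real entry format is whatever the peer-grill SKILL.md specifies.

```python
from datetime import datetime, timezone
from pathlib import Path


def append_entry(path: Path, agent: str, text: str) -> str:
    """Append one timestamped, attributed line and return it.

    The line layout is an assumption made for illustration only.
    """
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    line = f"{stamp} [{agent}] {text}\n"
    path.parent.mkdir(parents=True, exist_ok=True)
    # Mode "a" adds to the end of the file and never rewrites earlier entries.
    with path.open("a", encoding="utf-8", newline="\n") as fh:
        fh.write(line)
    return line
```

A call such as append_entry(Path(".peer-grill/athenaeum-rollout/grill-log.md"), "claude-opus-4-7", "...") leaves every previous entry untouched.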
Protocol — single thread, 7 phases
1. Independent audit (code-anchored dump)
Each peer walks the 13 branches from grill-me-with-agents independently, citing concrete file paths from this repo. Each branch becomes one or more claims in the peer's <agent>.claims.yaml. The 13 branches, in order — do not descend until upstream is settled within your own audit:
1. Goal & success criteria
2. Agent roster: for each agent in .claude/agents/ or .claude/shared-agents/, confirm responsibility against its frontmatter
3. Topology: diff the proposed change against the current topology; flag every removed/repurposed agent
4. Context boundaries: read each agent's tools: allowlist
5. Handoffs & contracts: find where messages cross agent boundaries today (file conventions, MCP calls, settings hooks)
6. Shared state: locate mutable persistence (files, MCP servers, env)
7. Tool access: read each agent's tool list, flag privilege creep
8. Per-agent failure modes: inspect hooks + error paths in settings.json
9. Inter-agent disagreement & authority: find precedence rules in prompts or orchestration code
10. Human-in-the-loop: locate AskUserQuestion, ExitPlanMode, confirmation gates
11. Termination: current stop conditions in orchestration code
12. Observability: logging / tracing / trace files / status endpoints already wired
13. Cost & latency: model assignments per agent (model: frontmatter)
Claim format (extends peer-grill's schema with a branch field):
agent: <agent-name>
session_started: <ISO8601>
scope: [<stack-paths-in-scope>]
claims:
  - id: <stable-slug>          # e.g., athenaeum-port, hermes-role, pt-dissent-gate
    branch: 1..13              # which branch this claim sits in
    statement: <one sentence>
    confidence: high | medium | low
    source: <file:line | command-output | quoted-frontmatter>
    last_verified: <ISO8601>
Confidence rules tighten here: high requires a quoted file path + line range AND a verifying read; medium is "I read it but the artifact may be stale"; low is "I inferred this from adjacent files." If you cannot find a source for a claim within scope, the correct answer is "no claim" — do NOT invent.
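Part of these rules can be checked before a claims file is written. The sketch below assumes each claim has already been parsed into a dict and that a high-confidence source looks like path:start-end (a single line number is also accepted here). The function name claim_problems, the regex, and the example path are illustrative. The verifying read itself cannot be checked this way.

```python
import re

CONFIDENCE_LEVELS = {"high", "medium", "low"}
# Assumed shape of a high-confidence source: path:line or path:start-end.
FILE_LINE_RANGE = re.compile(r"^\S+:\d+(-\d+)?$")


def claim_problems(claim: dict) -> list[str]:
    """Return the rule violations found in one parsed claim (empty list if none)."""
    problems = []
    branch = claim.get("branch")
    if not isinstance(branch, int) or not 1 <= branch <= 13:
        problems.append("branch must be an integer from 1 to 13")
    confidence = claim.get("confidence")
    if confidence not in CONFIDENCE_LEVELS:
        problems.append("confidence must be high, medium, or low")
    source = str(claim.get("source") or "").strip()
    if not source:
        # No source in scope means "no claim", never an invented one.
        problems.append("no source: drop the claim instead of inventing one")
    elif confidence == "high" and not FILE_LINE_RANGE.match(source):
        problems.append("high confidence needs a file path with a line range")
    return problems
```

For example, a claim with confidence high and source quoted-frontmatter is flagged, which signals that it should either gain a file:line source or drop to medium.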
Do NOT read peer claims yet. Independent dump is load-bearing; context leakage defeats triangulation.
2. Read & validate peers' claims
Poll for each peer's <peer>.claims.yaml per peer-grill Phase 2 (60s poll, poll_timeout_minutes cap, parse-retry on YAML failure, identity-collision check). Terminal events from peer-grill apply: TIMEOUT: phase=2, PARSE-FAIL, IDENTITY-COLLISION.
Additional check for this skill: every peer's scope field must match. If scopes differ, emit SCOPE-MISMATCH: <peer> to grill-log.md — the audit isn't comparable until scopes align. Pause until the user reconciles.
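The scope check could be sketched as follows. It assumes "match" means the same set of paths regardless of order, which the text does not spell out; the function name scope_mismatches is hypothetical. The returned strings are the log lines to append to grill-log.md.

```python
def scope_mismatches(own_scope: list[str], peer_scopes: dict[str, list[str]]) -> list[str]:
    """Return one SCOPE-MISMATCH line per peer whose scope differs from ours.

    Assumption: scopes are compared as sets, so ordering and duplicates are ignored.
    """
    expected = set(own_scope)
    return [
        f"SCOPE-MISMATCH: {peer}"
        for peer, scope in sorted(peer_scopes.items())
        if set(scope) != expected
    ]
```

An empty result means the audits are comparable and diffing can start; anything else pauses the protocol until the user reconciles the scopes.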
3. Diff per branch
Compute the three-bucket diff (agreed, disagreed, only-one-knows) and group by branch number in diff.md. Grouping matters: branch 1 (Goal) disagreements likely cascade into branch 3 (Topology); resolve upstream first.
If a peer's diff disagrees with yours (different bucket assignment for the same id), that's itself a disagreement — log it in grill-log.md and resolve before grilling claims. Silent-drop bugs are the most expensive failure mode of this protocol.
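A two-peer version of the per-branch diff might look like the sketch below. It makes two assumptions: two claims agree when they share an id and their statements are equal after collapsing whitespace, and a claim present on both sides is filed under the first peer's branch. Real agreement may need a semantic judgment that this code does not attempt. The function names are illustrative.

```python
from collections import defaultdict


def normalize(statement: str) -> str:
    """Collapse runs of whitespace so formatting differences do not count."""
    return " ".join(statement.split())


def diff_by_branch(mine: list[dict], theirs: list[dict]) -> dict[int, dict[str, list[str]]]:
    """Group claim ids into agreed / disagreed / only-one buckets per branch."""
    mine_by_id = {c["id"]: c for c in mine}
    theirs_by_id = {c["id"]: c for c in theirs}
    buckets = defaultdict(lambda: {"agreed": [], "disagreed": [], "only-one": []})
    for claim_id in sorted(mine_by_id.keys() | theirs_by_id.keys()):
        a, b = mine_by_id.get(claim_id), theirs_by_id.get(claim_id)
        branch = (a or b)["branch"]
        if a is None or b is None:
            bucket = "only-one"
        elif normalize(a["statement"]) == normalize(b["statement"]):
            bucket = "agreed"
        else:
            bucket = "disagreed"
        buckets[branch][bucket].append(claim_id)
    # Sorted by branch number so upstream branches are read first.
    return dict(sorted(buckets.items()))
```

Because every id from either side lands in exactly one bucket, a claim cannot silently disappear between the claims files and diff.md.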
4. Grill loop — branch-ordered
Run the peer-grill grill loop, but iterate branch-by-branch in 1→13 order. Within a branch, iterate claims by id alphabetically.
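The iteration order reduces to a two-key sort. In this small sketch the function name grill_order is hypothetical, and "alphabetically" is taken to mean Python's default string ordering.

```python
def grill_order(disputed: list[dict]) -> list[str]:
    """Order disputed claims by branch number, then by id within a branch."""
    ordered = sorted(disputed, key=lambda claim: (claim["branch"], claim["id"]))
    return [claim["id"] for claim in ordered]
```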
Per-question rigor floor (stricter than peer-grill):
- Every question must cite the specific file path or frontmatter field the asker doubts.
- Every answer must quote ≤ 80 characters from a verifiable source. "Trust me, I read it" is grounds for another round.
- For disagreements where one peer claims high and the other low: the high claim must produce its source on first request, OR drop to medium immediately. Confidence asymmetry without sourcing is a tell.
- Ratification (RATIFY: <claim-id> | <agreed-statement>) follows peer-grill semantics. Both/all peers' statements must match exactly modulo whitespace.
Round budget exhaustion → ESCALATE: <claim-id> to unresolved.md. Protocol continues; only that claim is finalized as unresolved.
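Two of the rigor checks lend themselves to a mechanical test: the 80-character quote limit and the exact-match rule for ratification. The sketch below assumes "modulo whitespace" means collapsing whitespace runs and trimming the ends, and that a quote counts as verifiable when it appears verbatim in the cited source text. Both function names are hypothetical.

```python
MAX_QUOTE_CHARS = 80


def quote_is_acceptable(quote: str, source_text: str) -> bool:
    """True if the quote is non-empty, at most 80 characters, and found in the source."""
    return 0 < len(quote) <= MAX_QUOTE_CHARS and quote in source_text


def can_ratify(statements: list[str]) -> bool:
    """True if at least two peers submitted statements that match modulo whitespace."""
    normalized = {" ".join(s.split()) for s in statements}
    return len(statements) >= 2 and len(normalized) == 1
```

A failed quote check is grounds for another round; a failed ratification check means the peers keep grilling until the round budget runs out.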
5. Merge — driven by branch order
The merge writer (peer whose name sorts first alphabetically) waits for every claim to be ratified or escalated, then overwrites state.merged.yaml. Schema:
topic: <topic>
ratified_at: <ISO8601>
peers: [<agent-a>, <agent-b>, ...]
stack_scope: [<paths>]
claims:
  - id: <id>
    branch: 1..13
    statement: <agreed statement>
    sources: [<from agent-a>, <from agent-b>, ...]   # all sources preserved
Non-writer peers poll for the file's appearance per peer-grill Phase 5 (TIMEOUT: phase=5 on overrun).
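Choosing the merge writer and deciding when the merge may start are both simple enough to sketch. The ordering used here is Python's default string sort, which is an assumption: peer-grill may define case handling differently. The function names are illustrative.

```python
def merge_writer(peers: list[str]) -> str:
    """Return the peer whose name sorts first (plain code-point ordering assumed)."""
    if not peers:
        raise ValueError("at least one peer is required")
    return sorted(peers)[0]


def ready_to_merge(claim_ids: set[str], ratified: set[str], escalated: set[str]) -> bool:
    """True once every claim has been either ratified or escalated."""
    return claim_ids <= (ratified | escalated)
```

With the identities used as examples in the setup list, merge_writer(["gemini-2.5-pro", "claude-opus-4-7", "gpt-5-codex"]) picks claude-opus-4-7.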
6. Sign-off
Per peer-grill Phase 6: LF-normalize state.merged.yaml, compute sha256, append a signoff with attestation. Mismatch handling, supersession, and INTEGRITY-FAIL semantics carry over unchanged.
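The normalize-then-hash step can be sketched with the standard library. One assumption to flag: this version converts both CRLF and lone CR to LF, while the exact normalization rule is the one defined in peer-grill Phase 6. The function name state_sha256 is hypothetical.

```python
import hashlib


def state_sha256(raw: bytes) -> str:
    """Hash the merged state after converting line endings to LF."""
    normalized = raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return hashlib.sha256(normalized).hexdigest()
```

Hashing bytes read with Path(".peer-grill/<topic>/state.merged.yaml").read_bytes() gives peers on different operating systems the same digest for the same content, which is what lets the sign-off hashes converge.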
7. Author AGENT_DESIGN.md
Once all peers have sign-offs and hashes converge, the merge writer generates AGENT_DESIGN.md from state.merged.yaml, organizing by the 13-branch ordering. Template: ~/.claude/skills/grill-me-agents/AGENT_DESIGN.template.md if present; otherwise mirror the structure of existing AGENT_DESIGN.md files in the repo (read the most recent for style).
The generated AGENT_DESIGN.md includes a footer:
*Peer-ratified by: <agent-a>, <agent-b>, ...*
*State sha256: <hex from signoff.md>*
*Reconciliation artifacts: .peer-grill/<topic>/*
This footer is non-negotiable — it's the audit trail. Removing it severs the design from its provenance.
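Generating the footer from the signed-off values, rather than typing it by hand, keeps it consistent with signoff.md. This is a minimal sketch; the function name render_footer and its parameters are hypothetical.

```python
def render_footer(peers: list[str], state_sha256: str, topic: str) -> str:
    """Build the three provenance lines appended to AGENT_DESIGN.md."""
    return "\n".join(
        [
            f"*Peer-ratified by: {', '.join(peers)}*",
            f"*State sha256: {state_sha256}*",
            f"*Reconciliation artifacts: .peer-grill/{topic}/*",
        ]
    )
```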
Multi-peer (3+)
peer-grill's pairwise-chain extension applies: run A↔B, then merged↔C, document chain order in grill-log.md. Each pairwise reconciliation produces a state.merged.<n>.yaml to feed the next pair. Star topologies remain out of scope.
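The chain order can be planned up front and written to grill-log.md. The sketch below assumes the numbering of state.merged.<n>.yaml starts at 1 and that each intermediate file stands in as the left-hand participant of the next pair. The function name chain_plan is hypothetical.

```python
def chain_plan(peers: list[str]) -> list[tuple[str, str, str]]:
    """Return (left, right, output_file) steps for A<->B, then merged<->C, and so on.

    Assumption: intermediate files are numbered from 1.
    """
    if len(peers) < 2:
        raise ValueError("a pairwise chain needs at least two peers")
    steps = []
    left = peers[0]
    for n, right in enumerate(peers[1:], start=1):
        output = f"state.merged.{n}.yaml"
        steps.append((left, right, output))
        left = output  # the merged result feeds the next pair
    return steps
```

For N peers the plan always has N-1 steps, so three peers need two pairwise reconciliations.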
For agent-stack audits, prefer 2-3 peers across model families (e.g., Opus + Gemini + GPT) over 3+ Claude sessions — same-family agents converge on shared blind spots.
Guardrails specific to agent audits
- The codebase is the tiebreaker. When peers disagree and neither can produce a stronger source, re-read the file together (both peers cite the same file:line-range in their next round). The repo wins; agent recollection does not.
- Frontmatter beats prose. A claim about an agent's tools sourced from the tools: allowlist beats a claim sourced from the prose body of the agent's prompt.
- AGENT_DESIGN.md's supersedes: field is load-bearing. When generating a new AGENT_DESIGN.md, the prior design (if any) must be in the supersedes: field with its sha256. Otherwise the audit trail breaks at the design-doc boundary.
- Don't peer-grill yourself. If both peers are Claude sessions on the same machine sharing context, the protocol degrades into single-agent grilling with extra steps. Different machine, different model, or different user-driven session; never shared transcript history.
Sibling skills
- grill-me-with-agents: single-agent, code-aware audit of an existing stack. The 13-branch audit lives there; this skill composes it.
- grill-me-agents: greenfield agent design (codebase-blind).
- peer-grill: non-agent state reconciliation; this skill's protocol substrate.
- grill-me: single-track plan grilling, not agent-specific.
Provenance
This skill was authored as the flagship for the Athenaeum orchestration package — the Athenaeum design itself (AGENT_DESIGN.md at vault root) was produced by a single-agent run of grill-me-with-agents, and the natural next-iteration upgrade is multi-agent ratification. Lineage: grill-me → grill-me-with-agents → peer-grill → this skill. Each step adds rigor without discarding the prior level.