distributed-consensus



By claude-dev-suite. Updated 6/1/2026

name: distributed-consensus
description: |
  Distributed-systems architecture at the protocol level: consensus (Raft, Paxos, BFT), replication and quorums, consistency models, clock synchronization, and the CAP/PACELC trade-offs. Architect-level: how to make state agree and survive failures.

  USE WHEN: designing replicated/consensus systems, "Raft", "Paxos", "BFT", "quorum", "leader election", "consistency model", "linearizability", "CAP", "PACELC", "split-brain", replication topology, distributed state machines.

  DO NOT USE FOR: app microservice wiring (use web/enterprise patterns); message queues (use messaging skills); blockchain specifics (use bitcoin skills).
allowed-tools: Read, Grep, Glob

Distributed Consensus & Replication

Consensus protocol selection

| Protocol | Fault model | Notes |
|---|---|---|
| Raft | Crash-fault (tolerates f of 2f+1) | Understandable, leader-based; used by etcd and Consul |
| (Multi-)Paxos | Crash-fault | Foundational, subtle; powers Spanner/Chubby lineage |
| BFT (PBFT, Tendermint) | Byzantine (tolerates f of 3f+1) | For untrusted/adversarial nodes; higher message cost |
| Viewstamped Replication | Crash-fault | Raft-like, predates it |

Use crash-fault consensus inside a trust boundary. Use BFT only when nodes can be malicious (cross-org, blockchain): tolerating f faults then takes 3f+1 nodes instead of 2f+1, plus more messages per decision.
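The sizing rules in the table can be sketched in a few lines. This is a minimal illustration, assuming the standard bounds (2f+1 nodes with a majority quorum for crash faults, 3f+1 nodes with a 2f+1 quorum for Byzantine faults); the function names are hypothetical and not taken from any library.

```python
def crash_fault_cluster(f: int) -> dict:
    """Sizes for a crash-fault protocol (Raft, Multi-Paxos) tolerating f failures."""
    n = 2 * f + 1
    return {"nodes": n, "quorum": n // 2 + 1}  # majority quorum, equal to f + 1


def byzantine_cluster(f: int) -> dict:
    """Sizes for a BFT protocol (PBFT-style) tolerating f malicious nodes."""
    n = 3 * f + 1
    return {"nodes": n, "quorum": 2 * f + 1}


def max_crash_faults(n: int) -> int:
    """Largest f such that 2f + 1 <= n."""
    return (n - 1) // 2


def max_byzantine_faults(n: int) -> int:
    """Largest f such that 3f + 1 <= n."""
    return (n - 1) // 3


if __name__ == "__main__":
    for f in (1, 2, 3):
        print(f, crash_fault_cluster(f), byzantine_cluster(f))
```

Note that a 4-node crash-fault cluster still tolerates only one failure, the same as a 3-node cluster, which is one reason odd cluster sizes are the usual choice.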

Replication & quorums

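  • Quorum: choose W + R > N so every read quorum overlaps every write quorum and a read reaches at least one replica holding the latest acknowledged write (this is what gives read-your-writes). Tune per workload (e.g. N=3, W=2, R=2).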
  • Leader-based (strong, simple, leader bottleneck) vs leaderless (Dynamo-style, available, needs read-repair/anti-entropy + conflict handling).
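  • Sync vs async replication: synchronous replication pays write latency for durability (RPO near zero on failover); asynchronous replication keeps writes fast but can lose the most recent writes on failover (non-zero RPO).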
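The quorum rule above can be checked mechanically. The sketch below is illustrative only: `quorums_overlap` applies the W + R > N inequality, and `overlap_by_enumeration` (a hypothetical helper) brute-forces every possible write set and read set to confirm that the inequality really is the overlap condition.

```python
from itertools import combinations


def quorums_overlap(n: int, w: int, r: int) -> bool:
    """True when any W-replica write set must intersect any R-replica read set."""
    return w + r > n


def overlap_by_enumeration(n: int, w: int, r: int) -> bool:
    """Brute-force check: every write quorum shares a replica with every read quorum."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


if __name__ == "__main__":
    for n, w, r in [(3, 2, 2), (3, 1, 1), (5, 3, 3), (5, 1, 5)]:
        print((n, w, r), quorums_overlap(n, w, r), overlap_by_enumeration(n, w, r))
```

With N=3, the pair W=2, R=2 overlaps while W=1, R=1 does not; W=1, R=N also overlaps, trading cheap writes for expensive reads.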

Consistency models (state the one you need)

Strongest to weakest: linearizable → sequential → causal → eventual. Stronger models need more coordination, which means higher latency and lower availability. Don't ask for linearizable if causal suffices.

CAP / PACELC

Under a Partition, choose Consistency or Availability; Else (no partition), trade Latency against Consistency. Real systems are points on a spectrum (Spanner: CP, or PC/EC in PACELC terms, backed by TrueTime clocks; Dynamo: AP, or PA/EL).

Clocks: avoid relying on wall-clock ordering, because clocks on different machines drift and can step backwards; use logical or hybrid logical clocks, or bounded-uncertainty clocks (TrueTime) for external consistency.
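As one concrete alternative to wall-clock ordering, here is a minimal sketch of a hybrid logical clock, following the commonly published update rules (a physical component `l` plus a logical counter `c`). The class name, method names, and the injectable `physical_now` parameter are assumptions made for illustration; this is not a production implementation and it omits drift bounds and persistence.

```python
import time
from typing import Callable, Tuple

Timestamp = Tuple[int, int]  # (l, c): compare as ordinary tuples


def _wall_clock_ms() -> int:
    return time.time_ns() // 1_000_000


class HybridLogicalClock:
    def __init__(self, physical_now: Callable[[], int] = _wall_clock_ms) -> None:
        self.physical_now = physical_now  # injectable so tests can fake time
        self.l = 0  # highest physical time seen so far
        self.c = 0  # logical counter breaking ties within the same l

    def now(self) -> Timestamp:
        """Timestamp a local or send event."""
        pt = self.physical_now()
        if pt > self.l:
            self.l, self.c = pt, 0
        else:
            self.c += 1  # physical clock stalled or went backwards
        return (self.l, self.c)

    def update(self, remote: Timestamp) -> Timestamp:
        """Merge a timestamp received from another node."""
        rl, rc = remote
        pt = self.physical_now()
        new_l = max(self.l, rl, pt)
        if new_l == self.l and new_l == rl:
            self.c = max(self.c, rc) + 1
        elif new_l == self.l:
            self.c += 1
        elif new_l == rl:
            self.c = rc + 1
        else:
            self.c = 0  # local physical time is strictly ahead
        self.l = new_l
        return (self.l, self.c)


if __name__ == "__main__":
    a = HybridLogicalClock()
    b = HybridLogicalClock()
    sent = a.now()
    received = b.update(sent)
    print(sent, received, received > sent)
```

The receive timestamp always compares greater than the send timestamp, even if the receiver's physical clock is behind the sender's, which is the property wall-clock ordering cannot give.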

When to recommend what

  • Config/coordination, small cluster → Raft (etcd).
  • Global strong consistency → Paxos/Spanner-style + synchronized clocks.
  • High availability, geo, conflict-tolerant → leaderless Dynamo-style + CRDTs.
  • Adversarial/multi-party → BFT.
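
The mapping above can be written down as a small decision helper. This is only a sketch: the `Requirements` fields and the `recommend` function are hypothetical names, and the order in which the checks run (adversarial first, then global strong consistency, then geo availability) is an assumption, since the list does not say how to resolve overlapping requirements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirements:
    adversarial_nodes: bool = False          # nodes may be malicious (multi-party)
    global_strong_consistency: bool = False  # strong consistency across regions
    geo_high_availability: bool = False      # stay available, tolerate conflicts


def recommend(req: Requirements) -> str:
    """Return the recommendation from the list above for the given requirements."""
    if req.adversarial_nodes:
        return "BFT"
    if req.global_strong_consistency:
        return "Paxos/Spanner-style + synchronized clocks"
    if req.geo_high_availability:
        return "leaderless Dynamo-style + CRDTs"
    # Default case: config/coordination on a small cluster.
    return "Raft (etcd)"


if __name__ == "__main__":
    print(recommend(Requirements()))
    print(recommend(Requirements(geo_high_availability=True)))
```
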
Install via CLI
npx skills add https://github.com/claude-dev-suite/claude-dev-suite --skill distributed-consensus
Repository Details
Stars: 20
Forks: 5
Branch: main
Path: SKILL.md