name: distributed-consistency
description: Choose a consistency model for a cross-node operation (strong/eventual/causal), covering the CAP/PACELC trade-off, idempotency + dedup, and when consensus (Raft/quorum) is warranted vs. a simpler option.
distributed-consistency
"Consistent" is not one thing. Name the model you need per operation — most need less than strong, a few need more than eventual — and pay only for that.
When to use
- A piece of data lives on / is read from more than one node, region, or service, and correctness depends on what readers see.
- Someone proposed a consensus system (Raft/Paxos/quorum) and you need to confirm it's warranted, not cargo-culted.
- A workflow invokes polymath-architecture:distributed-consistency.
Not this when: the consistency boundary is inside one service's object model — that is aggregate-design. For the messaging mechanics that deliver updates, see event-driven-design.
Inputs
- The operation, its read/write pattern, and the cost of a stale or conflicting read (money? safety? mild annoyance?).
- Latency and availability requirements; single- vs multi-region.
Procedure
- Classify the operation's need: strong/linearizable (a read sees the latest write — uniqueness, balances, locks), causal (reads respect cause→effect order — comments, sessions), or eventual (converges, brief staleness fine — counters, feeds). Most operations are not strong.
- Apply CAP under partition, PACELC otherwise: during a partition, choose consistency or availability; even with no partition, there is a latency↔consistency trade. State which you pick and the business reason.
- For eventual consistency, define convergence: the staleness budget, conflict resolution (last-writer-wins with caution, version vectors, or CRDTs for commutative merges), and read-your-writes if users expect it.
- Make writes idempotent: idempotency keys + dedup so retried/duplicated writes (inevitable in distributed systems) don't double-apply.
- Justify consensus only when needed: leader election, a replicated log, or strict linearizability across nodes warrant Raft/quorum (e.g. R+W>N). It costs latency and operational complexity; if a single writer or a managed transactional store suffices, prefer that and say so.
- Check time assumptions: don't rely on wall-clock ordering across nodes; use logical clocks / versions where ordering matters.
- Name the failure semantics: what a client sees during a partition, a failover, and a conflict.
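The conflict-resolution and time-assumption steps above can be illustrated with a minimal version-vector sketch. It shows how two replicas can tell whether one write happened before another or whether the two are concurrent (a real conflict), without consulting wall clocks. The names `VersionVector`, `compare`, and `merge` are hypothetical, chosen for illustration; they are not taken from any particular library.

```python
from typing import Dict

# Assumption: a version vector maps a node id to the number of writes
# seen from that node. Missing entries count as zero.
VersionVector = Dict[str, int]


def compare(a: VersionVector, b: VersionVector) -> str:
    """Return how `a` relates to `b`: 'equal', 'before', 'after', or 'concurrent'."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # b has seen everything a has: b supersedes a
    if b_le_a:
        return "after"       # a supersedes b
    return "concurrent"      # neither saw the other: needs conflict resolution


def merge(a: VersionVector, b: VersionVector) -> VersionVector:
    """Element-wise maximum: the vector of a value that has seen both histories."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


# Two replicas accepted writes independently during a partition.
replica_x = {"x": 2, "y": 1}
replica_y = {"x": 1, "y": 2}
relation = compare(replica_x, replica_y)
merged = merge(replica_x, replica_y)
```

Only the `concurrent` case calls for a resolution policy (last-writer-wins, an application merge, or a CRDT); the `before` and `after` cases can be resolved by keeping the newer value.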
Output
- A one-page decision: per-operation consistency model, the CAP/PACELC choice with its business justification, idempotency/dedup mechanism, conflict-resolution strategy, and a consensus-vs-simpler recommendation.
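One way to keep the decision reviewable is to record it as structured data, one record per operation. The sketch below is an assumption about what such a record might contain, derived from the fields listed above; `ConsistencyDecision`, its field names, and `quorums_overlap` are hypothetical. The quorum helper checks only the R+W>N overlap arithmetic, which on its own does not establish linearizability.

```python
from dataclasses import dataclass

ALLOWED_MODELS = {"strong", "causal", "eventual"}


@dataclass(frozen=True)
class ConsistencyDecision:
    """Hypothetical per-operation record mirroring the one-page decision."""
    operation: str
    model: str                 # exactly one of ALLOWED_MODELS
    partition_choice: str      # "consistency" or "availability"
    business_reason: str
    dedup_mechanism: str
    conflict_resolution: str
    consensus_needed: bool

    def __post_init__(self) -> None:
        if self.model not in ALLOWED_MODELS:
            raise ValueError(f"unknown consistency model: {self.model!r}")
        if self.partition_choice not in {"consistency", "availability"}:
            raise ValueError(f"unknown partition choice: {self.partition_choice!r}")


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """True when every read quorum intersects every write quorum (R + W > N)."""
    return r + w > n


# Illustrative values only; not taken from any real system.
decision = ConsistencyDecision(
    operation="reserve-username",
    model="strong",
    partition_choice="consistency",
    business_reason="duplicate usernames cannot be repaired after the fact",
    dedup_mechanism="idempotency key per signup request",
    conflict_resolution="not applicable: single linearizable register",
    consensus_needed=False,
)
```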
Quality bar
- Each operation names exactly one consistency model and why weaker won't do.
- The partition/failover behavior the client sees is stated.
- Consensus is recommended only with a concrete need; otherwise the simpler option is named.
- Writes have an idempotency story — "we'll retry" without dedup is flagged.
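As an illustration of what an idempotency story can look like, here is a minimal single-process sketch of idempotency keys with dedup. `DedupStore` and `apply_once` are hypothetical names, and the in-memory dict stands in for a durable table; in a real system the dedup record and the write itself would need to commit atomically, which this sketch does not attempt.

```python
from typing import Callable, Dict


class DedupStore:
    """In-memory stand-in (assumption) for a durable table keyed by idempotency key."""

    def __init__(self) -> None:
        self._results: Dict[str, int] = {}

    def apply_once(self, key: str, write: Callable[[], int]) -> int:
        """Run `write` the first time `key` is seen; replay the stored result after."""
        if key in self._results:
            return self._results[key]   # retry or duplicate: do not re-apply
        result = write()
        self._results[key] = result
        return result


balance = {"acct-1": 100}


def debit_30() -> int:
    balance["acct-1"] -= 30
    return balance["acct-1"]


store = DedupStore()
first = store.apply_once("req-123", debit_30)   # applies the debit
retry = store.apply_once("req-123", debit_30)   # same key: replayed, not re-applied
```

The client generates the key once per logical request and reuses it on every retry, so a duplicated delivery returns the original result instead of debiting twice.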