name: replication-patterns description: "Use when designing how a database keeps multiple copies of its data in agreement across nodes for availability, read scaling, and disaster recovery: the three foundational topologies (single-leader / primary-replica, multi-leader / multi-primary, leaderless / quorum), synchronous vs asynchronous replication and the replication-lag trade-off, log shipping vs statement replication vs trigger-based replication, the read-after-write consistency problem and its mitigations (sticky session, read-from-leader, monotonic reads), the failover model and split-brain risk, and the relationship to the CAP/PACELC choices the topology realizes. Do NOT use for horizontal partitioning across nodes (use sharding-strategy), the CAP theoretical frame itself (use cap-theorem-tradeoffs), single-node transactional guarantees (use acid-fundamentals), or query tuning (use query-optimization)." license: MIT allowed-tools: Read Grep metadata: relations: "{"related":["acid-fundamentals","query-optimization","indexing-strategy","cap-theorem-tradeoffs","sharding-strategy","transaction-isolation"],"suppresses":["sharding-strategy","cap-theorem-tradeoffs"],"verify_with":["transaction-isolation","cap-theorem-tradeoffs","sharding-strategy"]}" subject: data-engineering scope: "Designing database replication topologies and operational guardrails for keeping multiple copies of the same data in agreement across nodes: single-leader, multi-leader, leaderless/quorum; synchronous, semi-synchronous, and asynchronous replication; log-shipping and statement/row/trigger/logical mechanisms; read-after-write mitigations; failover and split-brain prevention; conflict resolution; and backup-vs-replica boundaries. Portable across distributed database systems. Excludes horizontal partitioning across nodes (sharding-strategy), the abstract CAP/PACELC frame itself (cap-theorem-tradeoffs), single-node transaction guarantees (acid-fundamentals), isolation-level choice (transaction-isolation), and query/index tuning (query-optimization/indexing-strategy)." public: "true" taxonomy_domain: engineering/data stability: experimental keywords: "["replication","primary replica","multi-leader","leaderless","quorum","synchronous replication","asynchronous replication","replication lag","read-after-write","failover"]" triggers: "["single-leader vs multi-leader","synchronous vs async replication","what happens on failover","split brain","read-after-write consistency"]" examples: "["design replication topology for a service with one region writing and three regions reading","decide between synchronous and asynchronous replication given a target RPO","diagnose stale reads after a write — likely replication lag without read-after-write handling","explain the split-brain risk in multi-leader replication"]" anti_examples: "["horizontally partition data across nodes (use sharding-strategy)","reason about the CAP theorem abstractly (use cap-theorem-tradeoffs)","explain ACID properties (use acid-fundamentals)"]" mental_model: "|" purpose: "|" concept_boundary: "|" analogy: "Replication is to a database what mirror copies of a master photograph are to a museum's record — single-leader async is the photographer keeping the negative and printing copies as requested; single-leader sync is the conservator requiring two darkroom signatures before any print leaves the building; multi-leader is multiple authorized photographers in different cities each accepting submissions and reconciling at intervals; leaderless quorum is asking three of five archivists to vote on whether this print matches the master, accepting their verdict. Failover is replacing the negative-keeper when they retire; split brain is what happens when the agency forgets to revoke the old keeper's keys." misconception: "|" skill_graph_source_repo: "https://github.com/jacob-balslev/skill-graph" skill_graph_project: Skill Graph skill_graph_canonical_skill: skills/data-engineering/replication-patterns/SKILL.md
Concept of the skill
What it is: Replication is the design discipline for keeping multiple copies of the same data on multiple nodes so a database can survive failures, scale reads, or place data near users.
Mental model: The core choices are topology, synchrony, replication mechanism, read-freshness policy, failover policy, and conflict handling. Single-leader systems serialize writes through one primary, multi-leader systems accept writes in more than one place and must merge conflicts, and leaderless systems use quorums so clients can read or write through multiple nodes.
Why it exists: A single database copy creates one point of failure and one read bottleneck. Replication adds resilience and scale, but it also creates lag, failover, stale-read, and split-brain risks that the application must deliberately handle.
What it is NOT: It is not sharding, which splits different data across nodes. It is not CAP theory itself, single-node ACID guarantees, isolation-level selection, query tuning, indexing, or backups.
Adjacent concepts: sharding-strategy partitions data; cap-theorem-tradeoffs names the consistency/availability frame; acid-fundamentals and transaction-isolation describe local transaction guarantees; backup and restore practice protects against replicated corruption or deletion.
One-line analogy: Replication is like keeping synchronized copies of a critical operations log in several control rooms: the system keeps working when one room fails, but everyone needs rules for who may write, how copies catch up, and who takes charge after an outage.
Common misconception: Turning replication on does not automatically create zero data loss, fresh reads, safe failover, or backups; each of those safety properties requires an explicit topology, synchrony, routing, fencing, monitoring, and recovery choice.
Replication Patterns
Coverage
The catalog of replication topologies and the operational discipline that makes them work in production. Covers the three foundational topologies (single-leader / primary-replica, multi-leader / multi-primary, leaderless / quorum), the synchrony spectrum (sync, semi-sync, async, quorum-sync), the mechanism choices (statement-based, row-based, trigger-based, logical, physical), the read-after-write consistency problem and its mitigations (sticky session, read-from-leader, monotonic reads, version tokens), the failover model and quorum-based split-brain prevention, the conflict-resolution choices in multi-leader and leaderless systems (LWW, CRDTs, vector clocks, application merge), and the relationship to the CAP/PACELC choices the topology realizes.
Philosophy of the skill
Replication is the discipline that gives a database fault tolerance, read scaling, and disaster recovery — at the cost of consistency-handling, conflict resolution, and operational complexity.
The default starting point is single-leader with asynchronous replication: simple, well-understood, sufficient for most read-mostly workloads. The departures from this default — multi-leader, leaderless, synchronous, geographic — each address a specific requirement (multi-region writes, strong consistency under partition, RPO=0) and add proportional complexity.
The most common production failures are not in the replication topology itself but in the application's handling of its consequences: stale reads producing user-visible bugs, split brain producing silent data divergence, failover producing unrehearsed surprises. The discipline is treating replication as an architecture the application is co-designed with, not a database feature that handles itself.
Topology Selection
| Topology | Best for | Trade-offs |
|---|---|---|
| Single-leader, async | Read-heavy workloads with tolerance for stale reads | Replication lag; possible data loss on leader failure |
| Single-leader, sync | Workloads requiring RPO=0 | Write latency; replica failure can block writes |
| Multi-leader | Multi-region writes; active-active disaster recovery | Conflict resolution complexity |
| Leaderless (quorum) | High availability with tunable consistency | Read latency = slowest of R; quorum-sizing decisions |
| Synchronous via Raft/Paxos (Spanner, CockroachDB) | Strong consistency at scale | High write latency for distant replicas |
The starting point for most workloads is single-leader async; depart from this default only when the workload requires it.
Synchronous vs Asynchronous Trade-off
| Property | Synchronous | Asynchronous |
|---|---|---|
| Write latency | Higher (waits for replica) | Lower (acks immediately) |
| RPO (data loss on failure) | Zero | Up to lag window |
| Read consistency from replicas | Strong | Eventual |
| Replica failure impact | Can block writes | None |
| Use case | High-stakes financial transactions, strong-RPO systems | Most production read-replicas |
Semi-sync (wait for one replica with timeout-to-async fallback) is the middle ground; production-default for many systems.
Read-After-Write Mitigations
| Mitigation | How it works | Cost |
|---|---|---|
| Read-from-leader for a window after write | Client routes back to leader for N seconds | Loses read scaling for write-heavy users |
| Sticky session | Client always reads from same replica | Replica failure invalidates session |
| Monotonic reads | Client tracks last-seen version; replica must be ≥ that fresh | Requires version-token plumbing |
| Wait-for-replica | Client waits until replica catches up | Adds latency to the read |
| Accept stale reads | Application tolerates staleness | Only viable when stale data is OK |
Every read-mostly workload with async replicas must choose one. Defaulting to "read from any replica" produces stale-data bugs.
Failover and Split-Brain Prevention
| Mechanism | Split-brain risk | Used by |
|---|---|---|
| Manual operator failover | None (if procedure is correct) | Small / legacy systems |
| Heuristic auto-failover (no quorum) | High under partition | Older Postgres tools (without proper consensus) |
| Quorum-based promotion (Raft / Paxos) | Eliminated by majority requirement | Modern HA tools (Patroni, etcd, CockroachDB internal) |
| STONITH fencing | Old leader killed; cannot continue writing | Pacemaker / Corosync HA |
A system that must not split-brain uses quorum-based promotion. The cost is requiring an odd number of voting nodes ≥ 3.
Verification
After applying this skill, verify:
- The replication topology is intentional and documented: single-leader / multi-leader / leaderless; sync / async / semi-sync; physical / logical / trigger-based.
- Read-after-write consistency is handled explicitly. Sticky session, read-from-leader, monotonic reads, version tokens, or accept-stale is a named choice.
- Replication lag is monitored. Lag thresholds trigger alerts before users notice.
- Failover procedure is documented, tested, and rehearsed. Untested failover is failover that fails first time.
- Split-brain prevention uses quorum-based promotion for any system requiring it. Heuristic failover is recognized as risky.
- Backups exist separately from replication. Replica state and backup state are distinguished.
- For multi-leader: the conflict resolution mechanism (LWW / CRDT / vector clocks / app merge) is defined and tested with concurrent-write scenarios.
- For leaderless: W and R values are chosen for the workload's consistency-vs-latency target; W+R>N if strong consistency is required.
- Cross-region replication latency is measured and accepted as part of the design budget.
Do NOT Use When
| Instead of this skill | Use | Why |
|---|---|---|
| Horizontally partitioning data across nodes | sharding-strategy |
sharding partitions data; replication copies it |
| Reasoning about the CAP/PACELC theoretical frame | cap-theorem-tradeoffs |
CAP names the trade-off; this skill realizes it |
| Single-node transactional guarantees | acid-fundamentals |
ACID is the single-system frame |
| Choosing isolation levels | transaction-isolation |
transaction-isolation owns concurrency; this owns multi-node |
| Indexing | indexing-strategy |
indexing is within-node retrieval |
| Tuning a slow query | query-optimization |
query-optimization is per-query |
Key Sources
- Kleppmann, M. (2017). Designing Data-Intensive Applications. O'Reilly. Chapter 5 (Replication) — modern comprehensive treatment of all three topologies with practical detail.
- Lamport, L. (1998). "The Part-Time Parliament". ACM TOCS, 16(2). The Paxos consensus algorithm; foundation of synchronous replication with split-brain prevention.
- Ongaro, D., & Ousterhout, J. (2014). "In Search of an Understandable Consensus Algorithm (Raft)". USENIX ATC 2014. The Raft consensus algorithm; basis of etcd, CockroachDB, many modern HA systems.
- DeCandia, G., Hastorun, D., Jampani, M., et al. (2007). "Dynamo: Amazon's Highly Available Key-Value Store". SOSP 2007. The foundational paper on leaderless (Dynamo-style) replication; basis of Cassandra, Riak, DynamoDB.
- PostgreSQL Global Development Group. "PostgreSQL Documentation — Replication" and "Logical Replication". Reference for Postgres physical streaming and logical replication.
- MySQL Reference Manual. "Replication". MySQL replication including binlog formats, semi-sync, and group replication.
- MongoDB Manual. "Replication". MongoDB replica sets, automatic failover, and oplog-based replication.
- Vogels, W. (2009). "Eventually Consistent". ACM Queue. Practitioner essay on eventual consistency in replicated systems.
- Shapiro, M., Preguiça, N., Baquero, C., & Zawirski, M. (2011). "A Comprehensive Study of Convergent and Commutative Replicated Data Types". CRDT foundational paper; basis for multi-leader conflict resolution without data loss.
- Brewer, E. (2012). "CAP Twelve Years Later". IEEE Computer. Brewer's retrospective on CAP, useful for understanding the consistency-availability trade-offs replication realizes.
Skill Graph context
Classification
- Subject:
data-engineering - Public:
true - Domain:
engineering/data - Scope: Designing database replication topologies and operational guardrails for keeping multiple copies of the same data in agreement across nodes: single-leader, multi-leader, leaderless/quorum; synchronous, semi-synchronous, and asynchronous replication; log-shipping and statement/row/trigger/logical mechanisms; read-after-write mitigations; failover and split-brain prevention; conflict resolution; and backup-vs-replica boundaries. Portable across distributed database systems. Excludes horizontal partitioning across nodes (sharding-strategy), the abstract CAP/PACELC frame itself (cap-theorem-tradeoffs), single-node transaction guarantees (acid-fundamentals), isolation-level choice (transaction-isolation), and query/index tuning (query-optimization/indexing-strategy).
When to use
- design replication topology for a service with one region writing and three regions reading
- decide between synchronous and asynchronous replication given a target RPO
- diagnose stale reads after a write — likely replication lag without read-after-write handling
- explain the split-brain risk in multi-leader replication
- Triggers:
single-leader vs multi-leader,synchronous vs async replication,what happens on failover,split brain,read-after-write consistency
Not for
- horizontally partition data across nodes (use sharding-strategy)
- reason about the CAP theorem abstractly (use cap-theorem-tradeoffs)
- explain ACID properties (use acid-fundamentals)
Related skills
- Verify with:
transaction-isolation,cap-theorem-tradeoffs,sharding-strategy - Related:
acid-fundamentals,query-optimization,indexing-strategy,cap-theorem-tradeoffs,sharding-strategy,transaction-isolation
Concept
- Mental model: |
- Purpose: |
- Boundary: |
- Analogy: Replication is to a database what mirror copies of a master photograph are to a museum's record — single-leader async is the photographer keeping the negative and printing copies as requested; single-leader sync is the conservator requiring two darkroom signatures before any print leaves the building; multi-leader is multiple authorized photographers in different cities each accepting submissions and reconciling at intervals; leaderless quorum is asking three of five archivists to vote on whether this print matches the master, accepting their verdict. Failover is replacing the negative-keeper when they retire; split brain is what happens when the agency forgets to revoke the old keeper's keys.
- Common misconception: |
Keywords
replication,primary replica,multi-leader,leaderless,quorum,synchronous replication,asynchronous replication,replication lag,read-after-write,failover