name: raft-2026-05-03 description: Raft consensus algorithm for replicated log management across server clusters. Covers leader election, log replication, joint consensus for membership changes, and snapshot compaction. Use when building fault-tolerant distributed systems, implementing replicated state machines, or configuring Raft-based databases (etcd, CockroachDB, TiKV).
Raft Consensus Algorithm
Overview
Raft is a consensus algorithm for managing a replicated log across a cluster of servers. Designed by Diego Ongaro and John Ousterhout at Stanford University, it produces results equivalent to (multi-)Paxos but decomposes the problem into relatively independent subproblems — leader election, log replication, and safety — making it significantly more understandable. Raft is not Byzantine fault tolerant; it assumes all participants are trustworthy and failures are crash-stop.
A Raft cluster typically has 5 servers (tolerating 2 failures). At any time each server is a leader, follower, or candidate. The leader handles all client requests, replicates log entries to followers via AppendEntries RPCs, and commits entries once replicated on a majority. If the leader fails, followers detect this via election timeouts and trigger a new election.
When to Use
- Implementing a consensus-based replicated state machine from scratch
- Building fault-tolerant coordination services (key-value stores, configuration management, leader election)
- Understanding or debugging Raft-based systems (etcd, CockroachDB, TiKV, Kafka KRaft, ScyllaDB)
- Designing cluster membership change protocols with zero-downtime reconfiguration
- Reasoning about distributed safety properties and linearizability
- Choosing between consensus algorithms (Raft vs Paxos vs Viewstamped Replication)
Core Concepts
Replicated state machines: Each server runs an identical state machine fed by a replicated log. The consensus algorithm ensures all logs contain the same commands in the same order, so all state machines reach identical states.
Terms: Time is divided into numbered terms (logical clock). Each term begins with an election. A winning candidate serves as leader for the rest of the term. Terms detect stale information — servers reject requests with older term numbers and revert to follower on discovering a higher term.
Roles:
- Leader — handles all client requests, replicates log entries, commits entries
- Follower — passive; responds to RPCs from leaders and candidates only
- Candidate — transitional state during elections; votes for self and requests votes
Log entries: Each entry has an index (position), a term (when created), and a command for the state machine. Entries retain their original term number across all logs, enabling consistency checks.
Majority rule: A cluster of N servers tolerates floor(N/2) failures. Decisions require votes from a majority. For N=5, any 3 servers form a majority.
Advanced Topics
Core Protocol: Server states, persistent/volatile state, RequestVote and AppendEntries RPCs, leader election, log replication → Core Protocol
Safety Properties: Election safety, Log Matching, Leader Completeness, State Machine Safety, election restrictions, commitment rules, crash handling → Safety
Membership Changes: Joint consensus, configuration change timeline, edge cases (new servers, leader exclusion, removed server disruptions) → Membership Changes
Log Compaction: Snapshotting, InstallSnapshot RPC, performance considerations → Log Compaction
Client Interaction: Leader discovery, linearizability, idempotency via serial numbers, read-only operations, timing requirements → Client Interaction
Extensions and Production: Pre-Vote, leadership transfer, production systems (etcd, CockroachDB, TiKV), Paxos comparison, formal verification → Extensions and Production