raft-2026-05-03

star 2

Raft consensus algorithm for replicated log management across server clusters. Covers leader election, log replication, joint consensus for membership changes, and snapshot compaction. Use when building fault-tolerant distributed systems, implementing replicated state machines, or configuring Raft-based databases (etcd, CockroachDB, TiKV).

tangledgroup By tangledgroup schedule Updated 6/11/2026

name: raft-2026-05-03 description: Raft consensus algorithm for replicated log management across server clusters. Covers leader election, log replication, joint consensus for membership changes, and snapshot compaction. Use when building fault-tolerant distributed systems, implementing replicated state machines, or configuring Raft-based databases (etcd, CockroachDB, TiKV).

Raft Consensus Algorithm

Overview

Raft is a consensus algorithm for managing a replicated log across a cluster of servers. Designed by Diego Ongaro and John Ousterhout at Stanford University, it produces results equivalent to (multi-)Paxos but decomposes the problem into relatively independent subproblems — leader election, log replication, and safety — making it significantly more understandable. Raft is not Byzantine fault tolerant; it assumes all participants are trustworthy and failures are crash-stop.

A Raft cluster typically has 5 servers (tolerating 2 failures). At any time each server is a leader, follower, or candidate. The leader handles all client requests, replicates log entries to followers via AppendEntries RPCs, and commits entries once replicated on a majority. If the leader fails, followers detect this via election timeouts and trigger a new election.

When to Use

  • Implementing a consensus-based replicated state machine from scratch
  • Building fault-tolerant coordination services (key-value stores, configuration management, leader election)
  • Understanding or debugging Raft-based systems (etcd, CockroachDB, TiKV, Kafka KRaft, ScyllaDB)
  • Designing cluster membership change protocols with zero-downtime reconfiguration
  • Reasoning about distributed safety properties and linearizability
  • Choosing between consensus algorithms (Raft vs Paxos vs Viewstamped Replication)

Core Concepts

Replicated state machines: Each server runs an identical state machine fed by a replicated log. The consensus algorithm ensures all logs contain the same commands in the same order, so all state machines reach identical states.

Terms: Time is divided into numbered terms (logical clock). Each term begins with an election. A winning candidate serves as leader for the rest of the term. Terms detect stale information — servers reject requests with older term numbers and revert to follower on discovering a higher term.

Roles:

  • Leader — handles all client requests, replicates log entries, commits entries
  • Follower — passive; responds to RPCs from leaders and candidates only
  • Candidate — transitional state during elections; votes for self and requests votes

Log entries: Each entry has an index (position), a term (when created), and a command for the state machine. Entries retain their original term number across all logs, enabling consistency checks.

Majority rule: A cluster of N servers tolerates floor(N/2) failures. Decisions require votes from a majority. For N=5, any 3 servers form a majority.

Advanced Topics

Core Protocol: Server states, persistent/volatile state, RequestVote and AppendEntries RPCs, leader election, log replication → Core Protocol

Safety Properties: Election safety, Log Matching, Leader Completeness, State Machine Safety, election restrictions, commitment rules, crash handling → Safety

Membership Changes: Joint consensus, configuration change timeline, edge cases (new servers, leader exclusion, removed server disruptions) → Membership Changes

Log Compaction: Snapshotting, InstallSnapshot RPC, performance considerations → Log Compaction

Client Interaction: Leader discovery, linearizability, idempotency via serial numbers, read-only operations, timing requirements → Client Interaction

Extensions and Production: Pre-Vote, leadership transfer, production systems (etcd, CockroachDB, TiKV), Paxos comparison, formal verification → Extensions and Production

Install via CLI
npx skills add https://github.com/tangledgroup/tangled-skills --skill raft-2026-05-03
Repository Details
star Stars 2
call_split Forks 0
navigation Branch main
article Path SKILL.md
More from Creator
tangledgroup
tangledgroup Explore all skills →