cluster-operations

star 241

Cluster management: distributed tables, ON CLUSTER DDL, node lifecycle, resharding, load balancing, and Keeper migration.

duyet

By duyet schedule Updated 5/9/2026

play_arrow Run Skill in Manus View GitHub

name: cluster-operations description: "Cluster management: distributed tables, ON CLUSTER DDL, node lifecycle, resharding, load balancing, and Keeper migration."

Cluster Operations

Distributed Tables

CREATE TABLE dist ENGINE = Distributed(cluster, db, local_table, sharding_key)
Sharding key: rand() for even distribution, cityHash64(user_id) for user affinity
Reads: query all shards in parallel; Writes: route to correct shard or write locally

ON CLUSTER DDL

ALTER TABLE t ON CLUSTER '{cluster}' ADD COLUMN col Type — propagate schema to all replicas
CREATE TABLE t ON CLUSTER '{cluster}' AS template_db.template_table — clone across shards
distributed_ddl_output_mode: throw (fail on error), null (ignore), none, active
Check status: SELECT * FROM system.distributed_ddl_queue

Load Balancing and Read Routing

load_balancing: random, in_order, first_or_random, nearest_hostname
max_replica_delay_for_distributed_queries — skip lagging replicas
fallback_to_stale_replicas_for_distributed_queries=1 — use stale when all delayed

Adding Nodes

Install ClickHouse on new node
Configure Keeper/ZooKeeper connection
Update cluster config (remote_servers) on all nodes
Create local tables on new node
ReplicatedMergeTree: data syncs automatically; non-replicated: copy or re-insert

Removing Nodes

Stop writes, wait for replication queue to drain
SYSTEM DROP REPLICA for replicated tables
Remove from cluster config, restart remaining nodes

Resharding

No native online resharding — create new distributed table with new sharding scheme
INSERT INTO new_dist SELECT * FROM old_dist or clickhouse-copier for large migrations

Cluster Recovery

SYSTEM RESTART REPLICA ON CLUSTER '{cluster}' — restart replication across all nodes
SYSTEM SYNC REPLICA ON CLUSTER '{cluster}' — force sync from ZooKeeper
SYSTEM FETCH PARTS ON CLUSTER '{cluster}' — pull missing parts from other replicas

Monitoring Clusters

system.clusters — topology; system.distributed_ddl_queue — DDL status; system.replicas — replication
Cross-shard queries: use Distributed table or remote() function

Keeper Migration (ZooKeeper to ClickHouse Keeper)

Deploy ClickHouse Keeper alongside ZooKeeper
Snapshot data: clickhouse-keeper-converter or zk-dump.sh
Configure Keeper with converted snapshot
Update zookeeper config, restart one node at a time
Verify replication recovers, then remove ZooKeeper

Install via CLI

npx skills add https://github.com/duyet/clickhouse-monitoring --skill cluster-operations

Repository Details

star Stars 241

call_split Forks 39

navigation Branch main

article Path SKILL.md

More from Creator

duyet

duyet Explore all skills →