name: query-optimization description: "Advanced query tuning: join algorithms, skip index selection, EXPLAIN interpretation, ProfileEvents profiling, and optimizer settings."
Query Optimization
JOIN Strategies
join_algorithmsetting:hash(default, in-memory),partial_merge(spills to disk for large right table),auto(lets ClickHouse decide)JOIN ... USINGavoids repeated column names vsONfor same-name columns- Filter both sides before joining to reduce intermediate data
GLOBAL JOINbroadcasts the right table to all shards for distributed queries
EXPLAIN Analysis
EXPLAIN PLAN— logical plan, shows projection/pushdown transformationsEXPLAIN PIPELINE— physical execution with parallelism info and port countsEXPLAIN INDEXES— which indexes fire, granules selected vs total- Look for: full table scans, missing index usage, excessive granule reads
Index Usage
- Skip index types with use-cases:
minmax— range queries on numeric/date columnsset(N)— equality on low-cardinality columns, stores N unique values per granulebloom_filter— equality on high-cardinality stringstokenbf_v1— tokenized text search (logs, URLs)
- Check effectiveness via
ProfileEvents['SelectedRows']vs result size
Query Profiling
ProfileEventsmap counters:SelectedRows,MergedRows,FileOpen,SeekCountnormalized_query_hashto group parameterized query variantssystem.query_logcolumns:query_duration_ms,memory_usage,read_bytes
Optimizer Settings
enable_optimizer = 1— activates ClickHouse's new cost-based query optimizer (v22.6+)max_threads— controls query parallelism; higher = faster but more memory; lower for concurrent workloadsprefer_localhost_replica = 1— avoids network round-trip by reading from local replica on distributed queriessystem.query_plan(v23.6+) — persisted query plans for analysis across runs