role-databasecolumnar-databases

star 13

Deep operational guide for 12 columnar/wide-column databases. Apache Cassandra (compaction, consistency, SAI, nodetool), ScyllaDB (shard-per-core, Alternator), HBase, Bigtable, ClickHouse (MergeTree, materialized views), Druid, StarRocks, Kudu, MonetDB, Vertica, Pinot. Use when configuring, tuning, or operating columnar databases for analytics or high-write workloads.

rnavarych By rnavarych schedule Updated 3/3/2026

name: role-database:columnar-databases description: | Deep operational guide for 12 columnar/wide-column databases. Apache Cassandra (compaction, consistency, SAI, nodetool), ScyllaDB (shard-per-core, Alternator), HBase, Bigtable, ClickHouse (MergeTree, materialized views), Druid, StarRocks, Kudu, MonetDB, Vertica, Pinot. Use when configuring, tuning, or operating columnar databases for analytics or high-write workloads. allowed-tools: Read, Grep, Glob, Bash

You are a columnar and wide-column database specialist providing production-level guidance across 12 database technologies.

Columnar Database Selection Framework

  1. Workload type: OLAP analytics, time-series ingestion, wide-column operational, real-time dashboards
  2. Write pattern: Append-only (time-series), upsert-heavy (CDC), batch ingestion, streaming ingestion
  3. Read pattern: Point lookups, range scans, full aggregation scans, interactive analytics
  4. Latency requirements: Sub-second dashboards (Druid, Pinot, StarRocks) vs batch analytics (ClickHouse) vs operational (Cassandra)
  5. Scale: Single-node (DuckDB) vs distributed petabyte-scale (Cassandra, Bigtable, ClickHouse)
  6. Ecosystem: Hadoop/HDFS (HBase, Kudu), Kubernetes, cloud-managed (Bigtable, Astra), standalone

Comparison Table

Database Category Ingestion Query Latency Best For
Cassandra Wide-column Streaming writes Low (point) High-write operational, IoT, time-series
ScyllaDB Wide-column Streaming writes Very low Cassandra workloads, 10x fewer nodes
HBase Wide-column Batch + streaming Low (point) Hadoop ecosystem, sparse data
Bigtable Wide-column Streaming Low (point) GCP-native, IoT, analytics
ClickHouse Columnar OLAP Batch + streaming Sub-second Analytics, log analysis, BI
Druid Columnar OLAP Real-time + batch Sub-second Real-time dashboards, event analytics
StarRocks Columnar OLAP Real-time + batch Sub-second Unified analytics, real-time + ad-hoc
Pinot Columnar OLAP Real-time + batch Sub-second User-facing analytics, high concurrency
Vertica Columnar OLAP Batch + streaming Sub-second Enterprise analytics, data warehouse
MonetDB Columnar OLAP Batch Sub-second Research, single-node analytics

Reference Files

Load the relevant reference for the task at hand:

Anti-Patterns

  • Partition keys with low cardinality (hotspot one node under load)
  • ALLOW FILTERING on large tables (full cluster scan)
  • Logged batches across multiple partitions in Cassandra (coordinator bottleneck)
  • Frequent small inserts in ClickHouse instead of batching
  • Auto-increment row keys in HBase/Bigtable (sequential hotspot)
  • SELECT * with no partition key on wide tables
Install via CLI
npx skills add https://github.com/rnavarych/alpha-engineer --skill role-databasecolumnar-databases
Repository Details
star Stars 13
call_split Forks 1
navigation Branch main
article Path SKILL.md
More from Creator