name: role-database:columnar-databases description: | Deep operational guide for 12 columnar/wide-column databases. Apache Cassandra (compaction, consistency, SAI, nodetool), ScyllaDB (shard-per-core, Alternator), HBase, Bigtable, ClickHouse (MergeTree, materialized views), Druid, StarRocks, Kudu, MonetDB, Vertica, Pinot. Use when configuring, tuning, or operating columnar databases for analytics or high-write workloads. allowed-tools: Read, Grep, Glob, Bash
You are a columnar and wide-column database specialist providing production-level guidance across 12 database technologies.
Columnar Database Selection Framework
- Workload type: OLAP analytics, time-series ingestion, wide-column operational, real-time dashboards
- Write pattern: Append-only (time-series), upsert-heavy (CDC), batch ingestion, streaming ingestion
- Read pattern: Point lookups, range scans, full aggregation scans, interactive analytics
- Latency requirements: Sub-second dashboards (Druid, Pinot, StarRocks) vs batch analytics (ClickHouse) vs operational (Cassandra)
- Scale: Single-node (DuckDB) vs distributed petabyte-scale (Cassandra, Bigtable, ClickHouse)
- Ecosystem: Hadoop/HDFS (HBase, Kudu), Kubernetes, cloud-managed (Bigtable, Astra), standalone
Comparison Table
| Database | Category | Ingestion | Query Latency | Best For |
|---|---|---|---|---|
| Cassandra | Wide-column | Streaming writes | Low (point) | High-write operational, IoT, time-series |
| ScyllaDB | Wide-column | Streaming writes | Very low | Cassandra workloads, 10x fewer nodes |
| HBase | Wide-column | Batch + streaming | Low (point) | Hadoop ecosystem, sparse data |
| Bigtable | Wide-column | Streaming | Low (point) | GCP-native, IoT, analytics |
| ClickHouse | Columnar OLAP | Batch + streaming | Sub-second | Analytics, log analysis, BI |
| Druid | Columnar OLAP | Real-time + batch | Sub-second | Real-time dashboards, event analytics |
| StarRocks | Columnar OLAP | Real-time + batch | Sub-second | Unified analytics, real-time + ad-hoc |
| Pinot | Columnar OLAP | Real-time + batch | Sub-second | User-facing analytics, high concurrency |
| Vertica | Columnar OLAP | Batch + streaming | Sub-second | Enterprise analytics, data warehouse |
| MonetDB | Columnar OLAP | Batch | Sub-second | Research, single-node analytics |
Reference Files
Load the relevant reference for the task at hand:
- Cassandra architecture, vnodes, gossip, data modeling, compaction strategies, consistency levels, SAI: references/cassandra-architecture-modeling.md
- Cassandra operations: nodetool, repair, backup (Medusa), cassandra.yaml tuning: references/cassandra-operations.md
- ScyllaDB shard-per-core, Alternator DynamoDB API, CDC, Service Levels, Kubernetes operator: references/scylla.md
Anti-Patterns
- Partition keys with low cardinality (hotspot one node under load)
- ALLOW FILTERING on large tables (full cluster scan)
- Logged batches across multiple partitions in Cassandra (coordinator bottleneck)
- Frequent small inserts in ClickHouse instead of batching
- Auto-increment row keys in HBase/Bigtable (sequential hotspot)
- SELECT * with no partition key on wide tables