data-intensive


Data-intensive / data-platform architecture: warehouse vs lakehouse, OLTP vs OLAP, batch vs streaming (lambda/kappa), change-data-capture, and data mesh. Architect-level platform-shape decisions, not SQL or ETL code. USE WHEN: designing a data platform/pipeline architecture, "data warehouse", "lakehouse", "OLAP vs OLTP", "lambda/kappa", "streaming vs batch", "CDC", "data mesh", "medallion", analytics platform, columnar store, ingestion topology. DO NOT USE FOR: writing SQL/ORM (use database skills); a specific ETL tool (use data/data-processing skills); storage-engine internals (use `storage-engines`).

By claude-dev-suite. Updated 6/1/2026

name: data-intensive
description: |
  Data-intensive / data-platform architecture: warehouse vs lakehouse, OLTP vs OLAP, batch vs streaming (lambda/kappa), change-data-capture, and data mesh. Architect-level platform-shape decisions, not SQL or ETL code.

  USE WHEN: designing a data platform/pipeline architecture, "data warehouse", "lakehouse", "OLAP vs OLTP", "lambda/kappa", "streaming vs batch", "CDC", "data mesh", "medallion", analytics platform, columnar store, ingestion topology.

  DO NOT USE FOR: writing SQL/ORM (use database skills); a specific ETL tool (use data/data-processing skills); storage-engine internals (use storage-engines).
allowed-tools: Read, Grep, Glob

Data-Intensive Architecture

OLTP vs OLAP — separate the workloads

  • OLTP: many small row-oriented transactions, low latency, normalized.
  • OLAP: few large column-oriented scans/aggregations, throughput-oriented.

Don't run heavy analytics on the OLTP store; replicate, ETL, or CDC the data into a separate analytics store. Columnar formats (Parquet/ORC) paired with columnar engines suit OLAP best, because a scan reads only the columns a query touches.
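
To make the row-versus-column distinction concrete, here is a minimal sketch in plain Python. The `orders` data and the helper names (`to_columns`, `point_lookup`, `total_amount`) are invented for illustration; real engines add compression, vectorized execution, and indexes that this toy omits.

```python
from typing import Dict, List, Optional

# Row-oriented layout: one complete record per order, as an OLTP store keeps it.
orders: List[Dict[str, object]] = [
    {"order_id": 1, "customer": "a", "amount": 20.0},
    {"order_id": 2, "customer": "b", "amount": 35.5},
    {"order_id": 3, "customer": "a", "amount": 4.5},
]


def to_columns(records: List[Dict[str, object]]) -> Dict[str, list]:
    """Pivot row records into one list per column (a columnar layout)."""
    columns: Dict[str, list] = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def point_lookup(records: List[Dict[str, object]], order_id: int) -> Optional[dict]:
    """OLTP-style access: fetch one whole record by key."""
    for record in records:
        if record["order_id"] == order_id:
            return record
    return None


def total_amount(columns: Dict[str, list]) -> float:
    """OLAP-style access: scan a single column and aggregate it."""
    return sum(columns["amount"])


columns = to_columns(orders)
```

The lookup needs every field of one record, which the row layout keeps together. The aggregate touches only `amount`, which the columnar layout keeps contiguous, so the other columns are never read.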

Storage architecture choice

| Option | Idea | Fits |
|---|---|---|
| Warehouse (Snowflake, BigQuery, Redshift) | Managed columnar SQL store | Structured BI, governance, SQL-first |
| Data lake (object storage + files) | Cheap, schema-on-read, any format | Raw/varied data, ML feature sources |
| Lakehouse (Delta/Iceberg/Hudi on object storage) | Lake economics + ACID tables + time travel | Unified BI + ML, open formats, avoids lock-in |

Lakehouse (open table formats) is the common 2026 default when you want one copy of data serving both BI and ML without warehouse lock-in.
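
The "ACID tables + time travel" idea can be sketched as an append-only log of immutable snapshots over data files. The `ToyTable` class and the file names below are hypothetical and do not reflect the API or on-disk layout of Delta, Iceberg, or Hudi; they only illustrate why a snapshot log gives atomic commits and readable history.

```python
from typing import Iterable, List, Optional, Tuple


class ToyTable:
    """Toy snapshot log over data files; not any real table format's API."""

    def __init__(self) -> None:
        # Version 0 is the empty table. Each later entry is the full,
        # immutable set of data files that makes up that version.
        self._snapshots: List[Tuple[str, ...]] = [()]

    def commit(self, add: Iterable[str] = (), remove: Iterable[str] = ()) -> int:
        """Atomically publish a new version, or raise and change nothing."""
        current = self._snapshots[-1]
        to_remove = set(remove)
        missing = to_remove - set(current)
        if missing:
            raise ValueError(f"cannot remove files not in table: {sorted(missing)}")
        kept = tuple(f for f in current if f not in to_remove)
        self._snapshots.append(kept + tuple(add))
        return len(self._snapshots) - 1

    def files(self, version: Optional[int] = None) -> Tuple[str, ...]:
        """Return the files of the latest version, or of an older one."""
        if version is None:
            version = len(self._snapshots) - 1
        if not 0 <= version < len(self._snapshots):
            raise IndexError(f"no such version: {version}")
        return self._snapshots[version]


table = ToyTable()
v1 = table.commit(add=["part-0.parquet"])
# A rewrite (for example, compaction) swaps files in a single commit.
v2 = table.commit(add=["part-1.parquet"], remove=["part-0.parquet"])
```

Readers of `table.files()` see either the old snapshot or the new one, never a half-applied change, and `table.files(v1)` still returns the earlier state.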

Batch vs streaming topology

  • Batch: periodic, simple, high-latency.
  • Streaming (Kafka/Flink): continuous, low-latency, more complex (exactly-once, watermarks, state).
  • Lambda: a batch layer (accurate) and a speed layer (fresh) whose results are merged; the cost is two codebases and added complexity.
  • Kappa: stream-only, reprocess from the log. Simpler; prefer it unless you truly need a separate batch layer.
  • CDC (Debezium): stream row changes out of OLTP without dual-writes. The clean way to feed lake/warehouse/search in near-real-time.
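
A rough sketch of how CDC and Kappa-style reprocessing fit together: change events are applied in order to build a downstream replica, and replaying the same log from the start rebuilds it. The events below follow a simplified version of the Debezium envelope (`op`, `before`, `after`); the `id` primary key, the sample rows, and the helper names are assumptions for illustration, and a real pipeline would read from Kafka rather than a Python list.

```python
from typing import Dict, List, Optional


def apply_change(replica: Dict[int, dict], event: dict) -> None:
    """Apply one change event to an in-memory replica keyed by primary key."""
    op = event["op"]
    if op in ("c", "r", "u"):  # create, snapshot read, update
        row = event["after"]
        replica[row["id"]] = row
    elif op == "d":  # delete: only the "before" image is present
        replica.pop(event["before"]["id"], None)
    else:
        raise ValueError(f"unknown op: {op!r}")


def replay(log: List[dict], upto: Optional[int] = None) -> Dict[int, dict]:
    """Rebuild the replica by replaying the log from the start (Kappa-style)."""
    replica: Dict[int, dict] = {}
    for event in log[:upto]:
        apply_change(replica, event)
    return replica


change_log = [
    {"op": "c", "before": None, "after": {"id": 1, "status": "new"}},
    {"op": "c", "before": None, "after": {"id": 2, "status": "new"}},
    {"op": "u", "before": {"id": 1, "status": "new"},
     "after": {"id": 1, "status": "paid"}},
    {"op": "d", "before": {"id": 2, "status": "new"}, "after": None},
]

current_state = replay(change_log)
state_after_two_events = replay(change_log, upto=2)
```

Because the log is the source of truth, fixing a bug in `apply_change` means redeploying and replaying, with no separate batch codebase to keep in sync.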

Modeling & ownership

  • Medallion (bronze/silver/gold) layering for lake/lakehouse refinement.
  • Data mesh: domain-owned data products + federated governance — organizational, for large orgs; overkill for small teams (start centralized).
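
The medallion layering can be sketched as a chain of small refinements. The records, field names, and the `to_silver`/`to_gold` helpers below are invented for illustration; in practice each layer would be a table in the lake or lakehouse rather than a Python list.

```python
from collections import defaultdict
from typing import Dict, List

# Bronze: raw records exactly as ingested, including bad values.
bronze: List[dict] = [
    {"user": " Ann ", "amount": "10.5"},
    {"user": "bob", "amount": "oops"},
    {"user": "ann", "amount": "4.5"},
    {"user": "Bob", "amount": "2"},
]


def to_silver(raw: List[dict]) -> List[dict]:
    """Silver: cleaned and typed records; unparseable rows are dropped."""
    clean = []
    for record in raw:
        try:
            user = record["user"].strip().lower()
            amount = float(record["amount"])
        except (AttributeError, KeyError, TypeError, ValueError):
            continue  # a real pipeline would route these to a quarantine table
        clean.append({"user": user, "amount": amount})
    return clean


def to_gold(silver: List[dict]) -> Dict[str, float]:
    """Gold: a business-level aggregate ready for BI (total per user)."""
    totals: Dict[str, float] = defaultdict(float)
    for record in silver:
        totals[record["user"]] += record["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
```

Keeping bronze untouched means silver and gold can always be rebuilt when the cleaning rules change.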

When to recommend what

  • BI on structured data, SQL team → warehouse (or lakehouse w/ SQL engine).
  • Mixed BI + ML, open formats, scale → lakehouse (Iceberg/Delta) + Kappa + CDC.
  • Real-time decisions → streaming (Kafka + Flink), exactly-once where it matters.
  • Don't reach for data mesh until org scale forces decentralization.
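
The rules of thumb above can be written down as a small decision helper. This is only one hedged encoding of the list: the `Workload` fields and the `recommend` function are hypothetical names, and real decisions weigh cost, team skills, and existing systems that three booleans cannot capture.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Workload:
    """Hypothetical inputs that mirror the bullets above."""
    mixed_bi_and_ml: bool = False
    real_time_decisions: bool = False
    org_scale_forces_decentralization: bool = False


def recommend(workload: Workload) -> List[str]:
    """Return platform-shape suggestions for a workload, in order."""
    picks: List[str] = []
    if workload.mixed_bi_and_ml:
        picks.append("lakehouse (Iceberg/Delta) + Kappa + CDC")
    else:
        picks.append("warehouse (or lakehouse with a SQL engine)")
    if workload.real_time_decisions:
        picks.append("streaming (Kafka + Flink)")
    if workload.org_scale_forces_decentralization:
        picks.append("data mesh")
    else:
        picks.append("centralized ownership")
    return picks


default_picks = recommend(Workload())
ml_realtime_picks = recommend(Workload(mixed_bi_and_ml=True, real_time_decisions=True))
```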

Install via CLI

npx skills add https://github.com/claude-dev-suite/claude-dev-suite --skill data-intensive