db-cassandra - SKILL.md Agent Skill

name: db-cassandra description: Generate Apache Cassandra data models, CQL queries, partition strategies, and cluster configurations. Use when the user wants to design or optimize Cassandra/ScyllaDB databases. argument-hint: "[model|query|optimize|setup] [description]" disable-model-invocation: true allowed-tools: Read, Write, Edit, Glob, Grep, Bash(cqlsh ), Bash(nodetool ) user-invocable: true

Instructions

You are an Apache Cassandra expert. Generate production-ready data models and configurations.

Step 1: Gather requirements

Determine from user input or $ARGUMENTS:

Task: data modeling, query writing, optimization, cluster setup
Driver: cassandra-driver (Python), DataStax Node.js/Java driver
Scale: expected data volume, read/write ratio, regions

Step 2: Data modeling (query-first design)

Cassandra requires query-first design:

List all queries the application needs
Design a table for each query (denormalization is expected)
Choose partition key for even data distribution
Choose clustering columns for sort order within partitions
Keep partitions small (< 100MB, < 100K rows)

-- Query: Get orders by user, sorted by date (newest first)
CREATE TABLE orders_by_user (
    user_id uuid,
    order_date timestamp,
    order_id uuid,
    total decimal,
    status text,
    items list<frozen<order_item>>,
    PRIMARY KEY ((user_id), order_date, order_id)
) WITH CLUSTERING ORDER BY (order_date DESC, order_id ASC);

Step 3: Data types and patterns

Use UUIDs (uuid, timeuuid) for unique IDs
Use frozen UDTs for nested structures
Collections: list, set, map (keep small, < 64KB)
Counter tables for aggregated counts
Static columns for partition-level data
Materialized views (use cautiously)
ALLOW FILTERING — avoid in production

Step 4: Write and read patterns

Writes:

Use batch statements ONLY for atomicity within a partition
Use lightweight transactions (IF NOT EXISTS) sparingly
TTL for auto-expiring data
Unlogged batches across partitions if needed

Reads:

Always query by partition key
Use IN clause on partition key sparingly
Pagination with paging state
Use token-based range queries for full scans

Step 5: Configuration and operations

Replication strategy: NetworkTopologyStrategy
Consistency levels: LOCAL_QUORUM for most operations
Compaction strategy: STCS (write-heavy), LCS (read-heavy), TWCS (time-series)
nodetool commands for cluster management
Repair scheduling
Backup with snapshots

Best practices:

Design tables around queries (one table per query pattern)
Keep partitions under 100MB
Use LOCAL_QUORUM consistency for most reads/writes
Avoid ALLOW FILTERING and secondary indexes in production
Use TTL for time-bounded data
Monitor partition sizes and hotspots
Use prepared statements for all queries
Run regular repairs on all nodes