aio-starrocks-best-practices - SKILL.md Agent Skill

name: aio-starrocks-best-practices description: | StarRocks table design best practices — partitioning strategy, sort key selection, bucketing decisions, Primary Key table tuning, authentication/authorization, and resource group configuration. Use when designing a StarRocks table, reviewing DDL, choosing partition granularity, sizing tablets (tablet sizing), setting up colocated joins, configuring LDAP or OIDC authentication, tuning Primary Key compaction, adjusting FE memory limits for large partition counts, or isolating workloads with resource group isolation. when_to_use: StarRocks table design, partitioning strategy, sort key selection, bucketing, Primary Key table tuning, designing new tables, optimizing schemas, reviewing DDL, partition strategy, bucket strategy, sort strategy, access control, resource isolation, resource group, StarRocks DDL, StarRocks best practices, colocated join, tablet sizing, LDAP, OIDC, compaction, FE memory limit, resource group isolation, DDL review argument-hint: "Table DDL or design question" effort: low

StarRocks Best Practices

Official best practices from docs.starrocks.io/docs/best_practices/.

Core design philosophy: "Designing for efficiency does more than improve query speed — it decreases costs by reducing storage, CPU, and object storage API costs."

Related Skills

/aio-starrocks-query-tuning — Query performance tuning, EXPLAIN plans, operator metrics, hints
/starrocks — Query syntax, cluster connections, Grafana integration (external skill, not part of this plugin)
/starrocks-expert — General table types, data loading, query optimization (external skill, not part of this plugin)

1. Partitioning

Partitioning enables coarse-grain data pruning via partition elimination AND metadata-only lifecycle operations (TTL, GDPR deletes, tenant isolation).

Partition Key Selection

Time-first — If 80%+ of queries filter by time, lead with date_trunc('day', dt)
Tenant isolation — Include tenant_id when managing data per-tenant
Retention alignment — Include columns you'll purge via DROP PARTITION
Composite keys — Creates #tenants x #days partitions — keep total below ~100K to avoid FE memory strain

Granularity Decision

Granularity	Best For	Advantage	Limitation
Daily	BI/reporting	365 partitions/year; simple TTL	Coarse for sub-day queries
Hourly	IoT/burst workloads	Hot-spot isolation	8,700 partitions/year
Weekly/Monthly	Historical archive	Minimal metadata	Less precise pruning

Sizing Rule

Each partition <= 100GB
<= 20K tablets per partition (across replicas)
Total partitions < 100K (FE memory constraint)
Tablets per BE < 200K

DDL Templates

Single-tenant clickstream:

CREATE TABLE click_stream (
  user_id BIGINT, event_time DATETIME, url STRING, ...)
DUPLICATE KEY(user_id, event_time)
PARTITION BY date_trunc('day', event_time)
DISTRIBUTED BY HASH(user_id) BUCKETS xxx;

Multi-tenant SaaS (recommended for sales-engine pattern):

CREATE TABLE metrics (
  tenant_id INT, dt DATETIME, metric_name STRING, v DOUBLE)
PRIMARY KEY(tenant_id, dt, metric_name)
PARTITION BY date_trunc('DAY', dt)
DISTRIBUTED BY HASH(tenant_id) BUCKETS xxx;

Large-tenant composite (when single tenant > 100GB/partition):

CREATE TABLE activity (
  tenant_id INT, dt DATETIME, id BIGINT, ....)
DUPLICATE KEY(dt, id)
PARTITION BY tenant_id, date_trunc('MONTH', dt)
DISTRIBUTED BY HASH(id) BUCKETS xxx;

Partitioning vs Bucketing

Partitions = lifecycle management tools (TTL, DROP PARTITION, GDPR). Enable query-time partition pruning — skip entire data blocks.
Buckets = parallelism levers. Distribute data within partitions for parallel scan/ingest.

2. Table Clustering (Sort Keys)

"A thoughtful sort-key is the highest-leverage physical-design knob in StarRocks."

Why Sort Keys Matter

Sort keys deliver compounding benefits across write, storage, and read:

I/O elimination — Segment and page pruning via min/max metadata skips irrelevant data blocks
Point lookups — Sparse prefix index enables millisecond queries on leading sort columns
Sorted aggregation — Streaming aggregation (2-3x faster) when GROUP BY aligns with sort key
Compression & caching — Sorted data improves encoding efficiency and CPU cache locality

Sort Key Selection Playbook

Decision hierarchy:

Equality columns first — High-cardinality columns with frequent = / IN filters
Range columns second — Timestamps or numeric ranges for temporal/value windows
Aggregation helpers third — GROUP BY columns that enable sorted aggregation

Configuration Rules

Rule	Guidance
Width	3-5 columns max. Wider keys degrade ingest and exhaust the 36-byte prefix-index limit
Cardinality order	Low-cardinality before high-cardinality enhances compression
String columns	Long strings consume prefix-index bytes, blocking subsequent columns from indexing

Reference Templates

Scenario	Partition	Sort Key	Why
B2C Orders	`date_trunc('day', order_ts)`	`(user_id, order_ts)`	User-first filters; then temporal ranges
IoT Telemetry	`date_trunc('day', ts)`	`(device_id, ts)`	Device time-series dominates
SaaS Multi-Tenant	`tenant_id`	`(dt, event_id)`	Tenant isolation; dashboard time clustering
Dimension Lookup	none	`(dim_id)`	Point lookups only

DDL Example

CREATE TABLE telemetry (
  device_id VARCHAR,
  ts DATETIME,
  value DOUBLE
)
ENGINE=OLAP
PRIMARY KEY(device_id, ts)
PARTITION BY date_trunc('day', ts)
DISTRIBUTED BY HASH(device_id) BUCKETS 16
ORDER BY (device_id, ts);

Anti-Patterns

Placing long string columns at the sort-key head (wastes prefix-index bytes)
Overly wide sort keys (>5 columns)
Misaligning partition and sort keys (defeating logical pruning order)

3. Bucketing (Distribution Strategy)

Quick Decision Framework

Scenario	Choice	Rationale
Stable join/filter keys, high cardinality	Hash	Enables pruning, colocated joins, local aggregation
Write-heavy logs/events; multi-tenant	Random	Prevents skew, uniform throughput, elastic growth
Aggregate/Primary Key tables	Hash	Only option for these table types
Duplicate Key tables needing elasticity	Random	Auto-splits when `bucket_size` set

Hash Bucketing

DISTRIBUTED BY HASH(column1, column2) BUCKETS n
PROPERTIES ("colocate_with" = "group_name")

Key requirements:

Key must be stable, evenly distributed, high-cardinality
Cardinality rule: >= 1000x the number of BE nodes to prevent skew
Tablet sizing: Target 1-10 GB per tablet initially
Tablets > 10GB = compaction efficiency degradation

Query optimizations enabled:

-- Tablet pruning: single tablet accessed
SELECT sum(amount) FROM sales WHERE customer_id = 123;

-- Local aggregation: no shuffle phase
SELECT customer_id, sum(amount) FROM sales GROUP BY customer_id;

-- Colocated join: no network shuffle between BEs
SELECT c.region, sum(s.amount)
FROM sales s JOIN customers c USING (customer_id)
WHERE s.sale_date BETWEEN '2025-01-01' AND '2025-01-31'
GROUP BY c.region;

Colocated join setup (matching key + bucket count required):

CREATE TABLE sales (
  sale_id BIGINT, customer_id INT, sale_date DATE, amount DECIMAL(10,2))
DISTRIBUTED BY HASH(customer_id) BUCKETS 48
PARTITION BY date_trunc('DAY', sale_date)
PROPERTIES ("colocate_with" = "group1");

CREATE TABLE customers (
  customer_id INT, region VARCHAR(32), status TINYINT)
DISTRIBUTED BY HASH(customer_id) BUCKETS 48
PROPERTIES ("colocate_with" = "group1");

Random Bucketing

DISTRIBUTED BY RANDOM
PROPERTIES ("bucket_size" = "1GB")  -- Enables auto-split (v3.2+)

Round-robin row assignment (no hash key)
Auto tablet splitting when partition grows (requires bucket_size)
Growth-only — no shrinking
Limitation: Duplicate Key tables only
Trade-off: No bucket pruning; every query scans all tablets in a partition; no colocated joins

Operational Maintenance

Random: Always set bucket_size (e.g., 1GB) for auto-split
Hash: Monitor tablet size; re-shard before tablets exceed 5-10 GB (ALTER TABLE ... BUCKETS n)
Both: Watch for metadata bloat with excessive tablet counts

Anti-Patterns

Low-cardinality hash keys — Creates hot tablets and imbalanced writes
Undersizing initial buckets — Hampers ingestion parallelism and compaction
Random bucketing for dimensional joins — Eliminates locality optimizations
Ignoring bucket_size in Random mode — Tablets never split; metadata grows unbounded

4. Primary Key Table Tuning

Primary Key Index Types

Type	Recommendation	Notes
Full in-memory	NOT recommended	Excessive memory waste
Local disk persistent	Standard option	Good for shared-nothing
Cloud-native persistent	Recommended for shared-data	Avoids disk capacity constraints, eliminates index rebuilds after rebalancing

Key Design Principles

Focus on uniqueness requirements during import/updates, NOT query acceleration
Minimize column count and size (default max: 128 bytes)
Use ORDER BY clause separately for query optimization via sort keys

Resource Consumption Formula

Storage: (key_size + 8 bytes) x row_count x 50%
Memory: min(l0_max_mem_usage x tablet_count, update_memory_limit_percent x BE_memory)

Memory Management

Monitor: http://be_ip:be_http_port/mem_tracker?type=update

Reduce import memory overhead:

l0_max_mem_usage = <value < 104857600>    # Default 104857600 (100MB)
skip_pk_preload = true
transaction_apply_worker_count = <cpu_cores - n>
transaction_publish_version_worker_count = <cpu_cores - n>

Trade-off: Reduced memory increases I/O; fewer worker threads slow ingestion

Performance Balance

Goal	Configuration
High freshness + low query latency	Increase `compact_threads`; decrease `update_compaction_per_tablet_min_interval_seconds`
Good freshness, limited resources	Raise `lake_ingest_slowdown_threshold` and `lake_compaction_score_upper_bound`
Good latency, limited resources	Reduce write frequency; batch larger data loads

Monitoring

Shared-data: SHOW PROC '/transactions/{db}/running' for compaction slowdown messages
Shared-nothing: Monitor tablet_max_versions threshold before ingestion failures

5. Authentication & Authorization

Three-Layer Access Control

Identity Authentication — "I am who I claim to be" (user verification)
Access Authentication — Group/role-based login eligibility to the cluster
Operation Authorization — Query execution and data access permissions

Authentication Methods

Method	User Storage	Setup	Best For
Native User	In-cluster	Manual creation	Small user bases
Security Integration	External system	Configuration-driven	Large enterprises

Configuration Examples

Native user with external auth (LDAP):

CREATE USER <username> IDENTIFIED WITH authentication_ldap_simple
AS 'uid=tom,ou=company,dc=example,dc=com';

Security integration (LDAP):

CREATE SECURITY INTEGRATION <name> PROPERTIES (
    "type" = "authentication_ldap_simple",
    "authentication_ldap_simple_server_host" = "",
    "authentication_ldap_simple_server_port" = "",
    "authentication_ldap_simple_bind_base_dn" = "",
    "authentication_ldap_simple_user_search_attr" = ""
);

ADMIN SET FRONTEND CONFIG (
    "authentication_chain" = "<security_integration_name>"
);

Group provider + role grants:

CREATE GROUP PROVIDER <name> PROPERTIES (
    "type" = "ldap",
    "ldap_conn_url" = "",
    "ldap_bind_root_dn" = "",
    "ldap_bind_base_dn" = ""
);

GRANT <role> TO EXTERNAL GROUP <group_name>;

Solution Selection

Full external control — Security Integration + Apache Ranger
Minimal setup, native control — Security Integration + Internal RBAC
Legacy — Native users with GRANT statements

Supported protocols: LDAP, OIDC, OAuth 2.0, JWT, native passwords

Critical: User IDs and group names must match across auth, group provider, and authorization systems. Mismatches cause permission failures.

6. Audit Log & Resource Groups

Core principle: Use data-driven resource allocation by analyzing starrocks_audit_db__.starrocks_audit_tbl__ rather than guesswork.

CPU Resource Allocation

Analyze per-user CPU consumption, allocate proportionally:

-- Aggregate cpuCostNs per user, last 30 days
SELECT user, SUM(cpuCostNs) / 1e9 AS cpu_seconds
FROM starrocks_audit_db__.starrocks_audit_tbl__
WHERE state IN ('EOF', 'OK')
  AND timestamp >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY user
ORDER BY cpu_seconds DESC;

Configuration:

exclusive_cpu_cores — Cannot exceed single BE core count; sum across all groups <= BE total
cpu_weight — For soft-isolation groups; determines relative share on remaining cores

Rule of thumb: If a user is 16% of CPU on a 64-core BE, allocate ~11 cores.

Memory Management

-- Peak single-query memory per user
SELECT user, MAX(memCostBytes) / (1024 * 1024) AS peak_mem_mb
FROM starrocks_audit_db__.starrocks_audit_tbl__
WHERE state IN ('EOF', 'OK')
  AND timestamp >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY user;

big_query_mem_limit — Set high to avoid false-positive termination of legitimate large queries
mem_limit — Set high (e.g., 0.9 for 90%)
Per-BE usage ~= total_max_mem_mb / number_of_BEs

Concurrency Control

-- Peak concurrent queries per user per minute
SELECT user, DATE_FORMAT(timestamp, '%Y-%m-%d %H:%i') AS minute_bucket,
       COUNT(*) AS concurrent_queries
FROM starrocks_audit_db__.starrocks_audit_tbl__
WHERE queryType = 'query' AND state IN ('EOF', 'OK')
  AND timestamp >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
GROUP BY user, minute_bucket
ORDER BY concurrent_queries DESC;

concurrency_limit — Set to 1.5x observed peak for headroom
For extreme spikes: enable Query Queues for load smoothing
Minute-level analysis may underestimate per-second spikes

Materialized View Resource Isolation

Prevent async MV refreshes from degrading interactive queries:

CREATE RESOURCE GROUP rg_mv (
    user = 'mv_user',
    query_type IN ('insert', 'select')
)
WITH (
    'cpu_weight' = '32',
    'mem_limit' = '0.9',
    'concurrency_limit' = '10',
    'spill_mem_limit_threshold' = '0.5'
);

-- Assign to MV at creation
CREATE MATERIALIZED VIEW ... PROPERTIES ('resource_group' = 'rg_mv');

-- Or existing MV
ALTER MATERIALIZED VIEW ... SET ("resource_group" = "rg_mv");

Anti-Patterns

Relying on guesswork instead of audit log analysis
Setting exclusive_cpu_cores sum to exceed available BE cores
Using low concurrency_limit without headroom buffer
Applying low mem_limit that terminates legitimate queries
Allowing MV refreshes to share resources with interactive workloads

Decision Matrix — Table Design by Use Case

Decision	Single-Tenant Fact	Multi-Tenant SaaS	IoT/Events	Dimension Lookup
Partition	`date_trunc('day', ts)`	`date_trunc('DAY', dt)`	`date_trunc('day', ts)`	None
Sort Key	`(user_id, ts)`	`(tenant_id, dt)`	`(device_id, ts)`	`(dim_id)`
Distribution	HASH(user_id)	HASH(tenant_id)	RANDOM w/ bucket_size	HASH(dim_id)
Bucket Type	Hash (colocate joins)	Hash (tenant filter pruning)	Random (prevent skew)	Hash
PK Index	Local disk persistent	Cloud-native (elastic)	Cloud-native	Local disk
Achievement	Query on user; streaming agg	Tenant isolation; fast TTL	Write throughput; device clustering	Millisecond lookups