name: databricks-table-properties description: Provides standard TBLPROPERTIES and metadata patterns for Unity Catalog Delta tables across Bronze, Silver, and Gold medallion layers. Ensures governance compliance, performance optimization, and proper metadata tagging for all table creation operations. Covers required TBLPROPERTIES by layer (Bronze, Silver DLT, Gold), mandatory CLUSTER BY AUTO configuration, Change Data Feed (CDF) enablement, auto-optimize settings, table and column comment patterns (LLM-friendly for Bronze/Silver, dual-purpose for Gold), domain values, data classification tags, and validation checklists. Use when creating Delta tables, configuring table properties, enabling CDF, setting up auto-optimize, or ensuring governance metadata consistency. Critical for preventing missing properties, incorrect clustering configurations, and governance compliance issues. metadata: author: prashanth subrahmanyam version: "1.1" domain: infrastructure role: shared used_by_stages: [1, 2, 3, 4] last_verified: "2026-06-02" volatility: medium clients: [ide_cli, genie_code] # deploy via databricks-asset-bundles (the spine); Genie detail via genie-code-environment deploy_verb: "bundle deploy --target dev" deploy_note: "TBLPROPERTIES/CREATE TABLE are deploy-time bundle-resource bodies (RULE_10: run during bundle deploy, retained)" coverage: all_stages upstream_sources: - name: "ai-dev-kit" repo: "databricks-solutions/ai-dev-kit" paths: - "databricks-skills/databricks-unity-catalog/SKILL.md" relationship: "derived" last_synced: "2026-02-19" sync_commit: "97a3637"
Databricks Table Properties Standards
Pattern Recognition
Every table creation (Bronze, Silver, Gold) uses a consistent set of TBLPROPERTIES and metadata. This rule standardizes these patterns to ensure governance compliance.
Required Table Properties by Layer
See assets/templates/table-properties.sql for complete SQL templates.
Bronze Layer Tables (11 required properties + optional retention)
TBLPROPERTIES (
'delta.enableChangeDataFeed' = 'true',
'delta.autoOptimize.optimizeWrite' = 'true',
'delta.autoOptimize.autoCompact' = 'true',
'layer' = 'bronze',
'source_system' = 'RetailChain', # Update based on source
'domain' = '<domain>', # e.g., 'retail', 'sales', 'inventory', 'product'
'entity_type' = '<dimension|fact>',
'contains_pii' = '<true|false>',
'data_classification' = '<confidential|internal>',
'business_owner' = '<Team Name>',
'technical_owner' = 'Data Engineering',
# Optional: Add retention for compliance
'retention_period' = '7_years' # Only if required
)
Silver Layer DLT Tables (15 required properties)
table_properties={
"quality": "silver",
"delta.enableChangeDataFeed": "true",
"delta.enableRowTracking": "true",
"delta.enableDeletionVectors": "true",
"delta.autoOptimize.autoCompact": "true",
"delta.autoOptimize.optimizeWrite": "true",
"delta.tuneFileSizesForRewrites": "true",
"layer": "silver",
"source_table": "<bronze_table_name>",
"domain": "<domain>",
"entity_type": "<dimension|fact|quarantine>",
"contains_pii": "<true|false>",
"data_classification": "<confidential|internal>",
"business_owner": "<Team Name>",
"technical_owner": "Data Engineering"
}
Gold Layer Tables (14 required properties)
TBLPROPERTIES (
'delta.enableChangeDataFeed' = 'true',
'delta.enableRowTracking' = 'true',
'delta.enableDeletionVectors' = 'true',
'delta.autoOptimize.autoCompact' = 'true',
'delta.autoOptimize.optimizeWrite' = 'true',
'layer' = 'gold',
'source_layer' = 'silver',
'domain' = '<domain>',
'entity_type' = '<dimension|fact>',
'contains_pii' = '<true|false>',
'data_classification' = '<confidential|internal>',
'business_owner' = '<Team Name>',
'technical_owner' = 'Data Engineering',
'gold_type' = '<scd2|snapshot|aggregated>'
)
Clustering Configuration
⚠️ MANDATORY: ALWAYS use automatic liquid clustering
NEVER specify clustering columns manually. Always use AUTO.
# For SQL DDL (Bronze, Gold)
CLUSTER BY AUTO
# For DLT Python (Silver)
cluster_by_auto=True
Benefits of AUTO clustering:
- ✅ Delta automatically selects optimal clustering columns
- ✅ Self-tuning based on query patterns
- ✅ No manual column specification needed
- ✅ Works with all data types (including BOOLEAN)
- ✅ Adapts as data and queries evolve
❌ DO NOT DO THIS:
CLUSTER BY (column1, column2) -- ❌ WRONG: Never specify columns
CLUSTER BY (is_current) -- ❌ WRONG: BOOLEAN columns don't support clustering
✅ ALWAYS DO THIS:
CLUSTER BY AUTO -- ✅ CORRECT: Let Delta choose optimal clustering
Table Comments
Modern Pattern (RECOMMENDED for Gold Layer)
For Gold layer tables, use dual-purpose documentation without "LLM:" prefix.
See data_product_accelerator/skills/gold/design-workers/06-table-documentation/SKILL.md for comprehensive Gold layer standards.
Pattern:
[Natural description]. Business: [business context and use cases]. Technical: [implementation details].
Example:
COMMENT 'Gold layer daily sales fact table with pre-aggregated metrics at store-product-day grain. Business: Primary source for sales performance reporting including revenue, units, discounts, returns, and customer loyalty metrics. Aggregated from transaction-level Silver data for fast query performance. Used for dashboards, executive reporting, and sales analysis. Technical: Grain is one row per store-product-date combination. Pre-aggregated measures eliminate need for transaction-level scans, surrogate keys enable fast dimension joins.'
Legacy Pattern (Bronze/Silver)
For Bronze and Silver layers, "LLM:" prefix is acceptable for brevity.
# DLT Example (Silver)
@dlt.table(
name="silver_transactions",
comment="""LLM: Silver layer streaming fact table for point-of-sale transactions with comprehensive
data quality rules, price validation, discount logic verification, and referential integrity checks""",
table_properties={...},
cluster_by_auto=True
)
# SQL DDL Example (Bronze)
COMMENT 'LLM: Bronze layer dimension table containing retail store location details with full UC compliance. Store details to link across other views and ensure accuracy of data linkage.'
Column Comments
Gold Layer (Dual-Purpose Format)
Every column in Gold layer must have comprehensive dual-purpose comments:
Pattern:
[Definition]. Business: [purpose, use cases, business rules]. Technical: [data type, format, calculation, source, constraints].
Examples:
# Surrogate key
store_key STRING NOT NULL
COMMENT 'Surrogate key uniquely identifying each version of a store record. Business: Used for joining fact tables to dimension. Technical: MD5 hash generated from store_id and processed_timestamp to ensure uniqueness across SCD Type 2 versions.'
# Business key
store_number STRING NOT NULL
COMMENT 'Business key identifying the physical store location. Business: The primary identifier used by store operations and field teams. Technical: Natural key from source system, same across all historical versions of this store.'
# Measure
net_revenue DECIMAL(18,2)
COMMENT 'Net revenue after subtracting returns from gross revenue. Business: The actual revenue realized from sales, primary KPI for financial reporting. Technical: gross_revenue - return_amount, represents true daily sales value.'
Bronze/Silver Layers (Simpler Format)
Column comments can be more concise but should still include key context:
store_number STRING NOT NULL
COMMENT 'Store number where the transaction occurred. Links to store dimension.'
transaction_date DATE NOT NULL
COMMENT 'Transaction date from POS system. Used for daily aggregations and trending.'
Domain Values
Standard domains used in this project:
retail- Store and location datasales- Transaction and revenue datainventory- Stock and replenishment dataproduct- Product master datalogistics- Delivery and supply chainrevenue- Financial metrics
Data Classification Values
confidential- Contains PII or sensitive business datainternal- Business data without PIIpublic- Safe for external sharing (rare)
Validation Checklist
When creating any table, ensure:
- Property count: Bronze tables have 11+ TBLPROPERTIES, Silver have 15, Gold have 14
- Cross-reference: Properties match THIS skill's layer section, not just the calling template
-
layerproperty matches the actual layer -
domainis from the standard list -
entity_typeis dimension, fact, or quarantine -
contains_piiaccurately reflects PII presence -
data_classificationaligns with contains_pii -
business_owneris a real team name -
CLUSTER BY AUTOorcluster_by_auto=Trueis set - Table comment starts with "LLM:"
- All columns have detailed comments
Common Mistakes to Avoid
❌ Don't do this:
# Missing critical properties
TBLPROPERTIES (
'layer' = 'bronze'
)
# No clustering
CREATE TABLE my_table (...)
USING DELTA
# Minimal comment
COMMENT 'Store data'
✅ Do this:
# Complete properties
TBLPROPERTIES (
'delta.enableChangeDataFeed' = 'true',
'delta.autoOptimize.optimizeWrite' = 'true',
'delta.autoOptimize.autoCompact' = 'true',
'layer' = 'bronze',
'source_system' = 'RetailChain',
'domain' = 'retail',
'entity_type' = 'dimension',
'contains_pii' = 'true',
'data_classification' = 'confidential',
'business_owner' = 'Retail Operations',
'technical_owner' = 'Data Engineering'
)
CLUSTER BY AUTO
COMMENT 'LLM: Bronze layer dimension table containing retail store location details with full UC compliance. Store details to link across other views and ensure accuracy of data linkage.'
Gotchas
- Templates in worker skills may be incomplete. The
copy_from_source.pytemplate anddata-source-approaches.mdexamples in the Bronze skill contain only performance properties (CDF, auto-optimize). After copying from ANY template, cross-check every table against the required property list for its layer in THIS skill. Anti-pattern: assuming the template has all properties. - DEEP CLONE preserves some properties but not all. DEEP CLONE preserves CDF settings, CLUSTER BY AUTO, column COMMENTs, PK constraints, and row tracking metadata. It does NOT automatically set governance properties (
domain,entity_type,contains_pii,data_classification,business_owner,technical_owner). After every DEEP CLONE, apply the remaining enterprise TBLPROPERTIES withALTER TABLE ... SET TBLPROPERTIES. Seeassets/templates/table-properties.sqlfor the post-clone template. - Skipping requirements collection makes correct properties impossible. If per-table metadata (
entity_type,contains_pii,data_classification) is never collected from the user or schema CSV, the agent cannot set the right values. Always resolve these before writing DDL. - Never use
DEFAULTcolumn clauses in DDL. ADEFAULT <expr>clause (e.g.is_active BOOLEAN NOT NULL DEFAULT true) requires thedelta.feature.allowColumnDefaultstable feature, which is OFF by default — theCREATE TABLEfails. Do NOT enable the feature flag; declare the column withoutDEFAULTand supply the value at INSERT time. Also do not add columns the template/design never specified. Seeunity-catalog-constraints→ "Never UseDEFAULTColumn Clauses in DDL".