name: databricks-metric-view-advisor description: Use this skill when the user wants to create Unity Catalog metric views — whether starting from gold/fact tables, existing AI/BI dashboards, SQL query files, Genie spaces, or KPI spreadsheets. Triggers on intent like "formalize our KPIs," "build a metric/semantic layer," "define measures and dimensions from our tables," "standardize aggregations so other teams can reuse them," or "turn our ad-hoc queries into reusable metrics." Guides an interactive workflow — analyzing source assets, generating YAML definitions, checking for overlap with existing views, and deploying. Do NOT use for querying or altering an already-existing metric view, comparing metric view frameworks, creating regular Unity Catalog tables/schemas, or MLflow/model tracking. compatibility: Requires databricks CLI (>= v1.0.0) metadata: version: "0.1.0" parent: databricks-metric-views
Metric View Advisor
Create Unity Catalog metric views from your existing Databricks assets — gold/fact schemas, AI/BI dashboards, SQL queries, Genie spaces, or KPI files. This advisor guides an interactive workflow that analyzes those sources, synthesizes them into richer, deduplicated suggestions, checks for overlap with views that already exist, and walks deployment end to end. Unlike a single-input "create a metric view" helper, it combines multiple input sources into one coherent set of definitions.
⚠️ REQUIRED — this skill depends on its parent
databricks-metric-views. The advisor is not self-contained: the baseline metric-view YAML specification (top-level fields, dimensions, measures, window measures, joins, filter, materialization) and the baseline pattern library live only in the parentdatabricks-metric-viewsskill. You must have that skill available and load it alongside this one — the advisor's reference files deliberately point back to it instead of duplicating the spec. Without the parent skill, the advisor cannot produce correct definitions.
FIRST: Read the parent databricks-metric-views skill for the baseline YAML spec and patterns. This advisor builds on that foundation and only documents the additional material it needs — the interactive multi-source workflow, input-source handlers, overlap detection, and the advisor-specific YAML/pattern extras (gotchas, composability, semantic metadata, LOD, the SQL-source fallback).
Prerequisites:
- The parent
databricks-metric-viewsskill must be available and loaded (see the REQUIRED note above) — this is mandatory, not optional. - A working Databricks CLI (>= v1.0.0) authenticated to a workspace profile. All CLI/SQL commands this skill needs are documented in references/cli-operations.md — read that file before running any command in the steps below.
How tools are used
All operations run through the Databricks CLI. The mechanics — executing SQL, discovering table schemas, fetching dashboard/Genie definitions, deploying and querying metric views — are documented in references/cli-operations.md. Read that file before running any command in the steps below. In short:
- Run SQL:
databricks experimental aitools tools query "<SQL>" --profile <PROFILE>(long DDL →aitools tools statement submit --file, see cli-operations.md) - Inspect a table:
databricks experimental aitools tools discover-schema <catalog.schema.table> --profile <PROFILE> - Default warehouse:
databricks experimental aitools tools get-default-warehouse --profile <PROFILE> - Fetch a dashboard:
databricks lakeview get <dashboard_id> --profile <PROFILE>(draft definition with datasets); fetch a Genie space / metric-view definition:databricks api get ...(see cli-operations.md)
If the host agent has its own native tools (e.g. a
readAssetById-style dashboard/asset reader), it may use those instead of these commands. That's fine — but verify the result is non-empty. A native reader often returns the published dashboard serialization, which can come back empty (datasets: [],pages: []); an empty result is a fetch-method artifact, not an empty dashboard. When a native fetch returns empty or partial data, fall back to the CLI/REST commands documented in references/cli-operations.md (for dashboards: trylakeview get, thenlakeview get-published, then Input 3).
Workflow Overview
CRITICAL: This is an interactive, step-by-step workflow. After completing each step, you MUST stop and wait for the user's response before proceeding to the next step. Never combine multiple steps into a single response. Never skip ahead. Each step should be its own conversational turn.
Opening message: When the skill starts, greet the user and outline the setup:
"I'll help you create Unity Catalog metric views from your existing Databricks assets.
You can combine multiple input sources — schemas, dashboards, queries, Genie spaces, and KPIs — and I'll synthesize insights from all of them.
I have 5 quick setup questions — I'll ask them one at a time:
- Databricks workspace / profile
- Input sources (pick any combination)
- Source identifiers
- Target location
- Review preference
Let's begin!"
Then immediately ask the first question. Ask questions one at a time. Do NOT show details of upcoming questions — just ask the current one, wait for the response, then move to the next. If the user proactively provides multiple answers in one message, accept them and skip those questions.
Step 1: Gather Setup Info (ask one at a time)
Ask these questions sequentially, waiting for a response after each.
1a. Databricks workspace / CLI profile
Establish the workspace profile and authentication:
- NEVER auto-select a profile. List profiles with
databricks auth profiles, present them all (with workspace URLs) to the user, and let them choose — even if only one exists. - Accept a profile name, or a workspace URL the user types directly.
- Validate auth before continuing with
databricks auth describe --profile <PROFILE>(no token is minted — it just reports the resolved host and auth status). If auth fails or the profile is stale, re-authenticate by runningdatabricks auth login --profile <PROFILE>when starting the advisor session (the host is already stored in the profile — only pass--host <workspace-url>when creating a brand-new profile) before proceeding.
After auth is validated, discover a SQL warehouse for the session (automatic — not a user question):
databricks experimental aitools tools get-default-warehouse --profile <PROFILE>
Store the warehouse id for all SQL execution this session. The query / discover-schema tools auto-pick the default warehouse, so an explicit id is only needed for the statement submit path (pass --warehouse <ID> or set DATABRICKS_WAREHOUSE_ID). Don't prompt the user with a warehouse question — pick the default automatically. But if the user names a specific warehouse (by id or by referring to one in the session), honor that choice instead of the default and use it for all SQL this session.
STOP — wait for the user's response.
1b. Input sources (multi-select)
Ask which input sources they want to use. They can pick any combination — the more sources, the richer the metric view suggestions.
| # | Input Source | What You Need | Requires Schema? |
|---|---|---|---|
| 1 | Gold schema | catalog.schema |
— (is a schema) |
| 2 | AI/BI dashboard | Dashboard ID or URL | No |
| 3 | Queries on gold tables | .sql file path |
Yes — needs catalog.schema |
| 4 | Genie space | Space ID | No |
| 5 | KPIs, Measures & Dimensions | .csv/.yaml file path |
Yes — needs catalog.schema |
"Pick one or more (e.g.,
1, 2or1, 3, 5). I'll combine insights from all selected sources."
Note: If the user selects 3 or 5 without also selecting 1, you still need a catalog.schema — ask for it in Step 1c. If the user selects 1 along with 3 or 5, that same schema is shared.
STOP — wait for the user's response before continuing.
1c. Source identifiers
Based on all input sources selected in 1b, ask for the required identifiers. Collect everything needed in one question — group the asks clearly:
- Gold schema (1) →
catalog.schema - AI/BI dashboard (2) → Dashboard ID or URL
- Queries on gold tables (3) →
.sqlfile path (+catalog.schemaif source 1 was not selected) - Genie space (4) → Space ID
- KPIs, Measures & Dimensions (5) →
.csv/.yamlfile path (+catalog.schemaif source 1 was not selected)
If multiple sources share a catalog.schema (e.g., 1 + 3 + 5), ask for it once.
STOP — wait for the user's response before continuing.
1d. Target catalog.schema
Ask: "Which catalog.schema should the metric views be created in? (Can be the same as or different from the source.)"
After the user responds, validate the target schema exists before continuing — run SHOW SCHEMAS IN <catalog> LIKE '<schema>' (see cli-operations.md for how to execute SQL):
- If the schema exists → proceed to the next question.
- If the schema does not exist → tell the user:
"The schema
<catalog>.<schema>does not exist. Would you like me to create it, or would you prefer to use a different target?"- If the user says create it → run
CREATE SCHEMA IF NOT EXISTS <catalog>.<schema>, confirm success, then proceed. - If the user provides a different target → validate that one instead.
- If the user says create it → run
STOP — wait for the user's response before continuing.
1e. Discover existing metric views (automatic — not a user question)
After the target schema is confirmed, automatically check for existing metric views in the target schema. This prevents duplicate/overlapping views from accumulating across multiple runs of the skill.
- List existing metric views — metric views appear in
information_schema.tableswithtable_type = 'METRIC_VIEW'(they do NOT appear inSHOW VIEWS):
SELECT table_name FROM <target_catalog>.information_schema.tables
WHERE table_schema = '<target_schema>' AND table_type = 'METRIC_VIEW'
If no metric views exist (empty result, or the schema was just created) → note internally "Fresh schema — no existing metric views to check for overlap" and skip to Step 1f.
If metric views exist, fetch each one's definition with
DESCRIBE TABLE EXTENDED <full_name> AS JSON(see cli-operations.md). From each, extract the structural fingerprint: source table (fully qualified), dimensions(name, expr), measures(name, expr), joined tables. Gracefully skip any view where the describe fails (it may be a regular SQL view).Store this inventory internally for use in Step 3's overlap check.
Present an informational summary (do NOT ask a question — just inform and proceed):
Existing metric views in
<target_catalog>.<target_schema>:
# View Source Table Dims Measures 1 order_metrics ...orders 9 9 2 lineitem_analytics ...lineitem 16 15 I'll check for overlap with my suggestions before creating anything new.
Then proceed automatically to Step 1f.
1f. Review preference
| # | Option |
|---|---|
| 1 | Review first — I'll show suggestions and save them to a YAML file for your review/editing before creating anything |
| 2 | Auto-create — I'll generate and deploy metric views automatically (suggestions file still saved for reference) |
STOP — wait for the user's response before continuing. SQL file saving defaults to yes — only mention it if the user asks.
How review preference affects the workflow:
- Review first (1): Step 3 saves
suggestions.yaml, displays suggestions, and waits for the user to approve, edit the file, or provide an alternative file before proceeding to Step 4. Each subsequent step also waits for user confirmation. - Auto-create (2): Step 3 still saves
suggestions.yamlfor reference. Steps 3–4 proceed automatically (no approval needed for suggestions or YAML generation). Step 5 asks about materialization. Step 6 deploys automatically and verifies with test queries. Step 7 (sample queries) and next steps proceed without waiting.
Step 2: Analyze the Inputs
For each selected input source, follow its detailed analysis instructions in references/input-handlers.md. Run all applicable handlers, then merge findings into a single combined analysis.
Reference files — load these when generating definitions:
- Parent
databricks-metric-viewsskill — the baseline YAML spec (top-level fields, dimensions, measures, window measures, joins, filter, materialization) and the baseline pattern library (ratios, filtered measures, TPC-H demo, detailed window measures, ALTER). Read it first. - references/yaml-reference.md — advisor additions to the spec: gotchas table, expanded source options, composability, extra measure/join rules, semantic metadata, LOD, extra materialization detail, and a correct comprehensive example
- references/patterns.md — advisor template patterns (metadata-rich single-table & composability templates, correctly-quoted star/snowflake joins, the SQL-source fallback) plus a pointer to the parent's additional patterns
Analysis steps when multiple sources are selected:
- Run each input handler — Execute the analysis steps for every selected source (e.g., if the user selected Gold schema + Dashboard + KPIs, run Input 1, Input 2, and Input 5 handlers).
- Merge findings — Combine all discovered tables, dimensions, measures, and relationships into one unified picture. See the "Merging Multiple Input Sources" section in references/input-handlers.md for detailed merge rules.
- Cross-validate — Use insights from one source to enrich another:
- Dashboard queries reveal which schema columns are actually used in practice
- KPI definitions provide business names for raw column aggregations
- Genie sample questions reveal how users think about the data
- SQL queries show recurring patterns that should be standardized
- Schema inspection uncovers tables/columns that other sources missed
The combined analysis must produce:
- A list of source tables with their columns, types, row counts, and comments (table-level and column-level)
- Identified fact tables (contain measures/events) vs dimension tables (contain attributes/lookups)
- Identified relationships between tables (foreign keys, join paths, Genie join instructions, dashboard JOIN clauses)
- Candidate dimensions (categorical columns, dates, hierarchies) — noting which input source(s) each came from. Include null-safe handling for nullable columns.
- Candidate measures (numeric columns suitable for aggregation) — noting which input source(s) each came from. Include both atomic measures (SUM, COUNT, AVG) and composed measures (ratios, rates, filtered measures built using
MEASURE()composability). - Candidate global filters — persistent WHERE conditions that should always apply (e.g., exclude historical data before a cutoff date, exclude cancelled/test records). Look for common WHERE clauses in dashboard queries, SQL files, and Genie SQL query instructions.
- Metadata inventory — catalog/schema comments, table/column comments, Genie column descriptions, UC tags, partition/clustering keys, and dashboard parameters. These inform
comment,display_name, andsynonymsfields on every dimension and measure. - Cross-source insights — patterns discovered by combining sources (e.g., "Dashboard uses
SUM(amount)which maps to KPI 'Total Revenue'; Genie users ask about this as 'total sales'").
Present findings to the user in a summary table. If multiple sources contributed to the same dimension or measure, note the provenance (e.g., "Region — from schema column + dashboard filter + Genie sample question").
STOP — wait for the user to acknowledge the analysis before proceeding to suggestions.
Step 3: Suggest Metric Views
Based on your analysis, suggest metric views that would provide value. This step checks for overlap with existing metric views, builds suggestions from all gathered metadata, runs a gap analysis, then saves and presents suggestions.yaml for the user to approve, edit, or extend.
Follow the full procedure in references/step-3-suggest-metric-views.md — it covers the overlap-detection logic (coverage scoring + extend / replace / create-alongside / skip), the per-field suggestion checklist, the gap analysis, the suggestions.yaml format, the timestamped output-folder convention, and how to handle the user's response.
STOP — wait for the user to respond before proceeding. Do NOT generate YAML definitions until the user approves the suggestions or provides an updated file. (Auto-create mode: proceed without waiting, but still resolve any 40–69% overlap by asking.)
Step 4: Create Metric View Definitions
For each approved metric view, generate the full CREATE OR REPLACE VIEW ... WITH METRICS LANGUAGE YAML AS $$ ... $$ definition, save it into the run folder, and present it to the user.
Follow the full procedure in references/step-4-create-definitions.md — it covers the CREATE-statement template, the critical YAML rules (version, composability, snowflake dot-chain, MEASURE() backticking, no format blocks, DATEDIFF), the prefer-joins-then-SQL-source fallback strategy, and the local SQL-file saving convention.
STOP — wait for the user to review the generated definitions before proceeding.
Step 5: Materialization (optional — ask before deploy)
Before deployment, ask the user whether they want to add materialization to any of the metric view definitions. Do NOT auto-decide. Materialization is configured as part of the YAML definition, so it must be decided before deploying.
"Before I deploy, would you like to add materialization to pre-compute aggregations for faster queries?
# Option 1 No materialization — queries run on live data (default, simplest) 2 Yes, add materialization — I'll help you configure pre-computed aggregations Materialization is useful when metric views are queried frequently, source tables are large, or you want sub-second responses. It requires serverless compute and incurs Lakeflow Declarative Pipelines charges."
STOP — wait for the user's response.
If the user chooses 1: Skip to Step 6.
If the user chooses 2: Walk through materialization configuration:
- 5a. Select which metric views to materialize (present a numbered list; accept
1,1, 2, orall). STOP — wait. - 5b. Configure type for each — Aggregated (pick dimension/measure combos), Unaggregated (full data model), or Both. STOP — wait. For Aggregated/Both, suggest the most likely dimension/measure combinations based on what appeared most across input sources, then STOP — wait.
- 5c. Set refresh schedule —
every 1 hour/every 6 hours/every 24 hours/ custom. If table properties revealed arefresh_frequency, note that a faster schedule won't provide fresher data. STOP — wait. - 5d. Update definitions with the
materialization:block (see the Materialization — Additional Detail section in references/yaml-reference.md and the Materialized Metric View pattern in the parentdatabricks-metric-viewsskill), update the saved SQL files, and re-display the final YAML.
Step 6: Deploy
Ask the user if they want to deploy:
# Option 1 Deploy now — I'll create the metric views (includes materialization if configured) 2 Review only — you already have the SQL files; you'll deploy manually later
STOP — wait for the user's response before deploying.
Deploy each metric view by submitting its saved <metric_view_name>.sql file (written in Step 4) with databricks experimental aitools tools statement submit --file <metric_view_name>.sql --warehouse <warehouse_id>, then confirming success with statement get <statement_id> (see cli-operations.md — long DDL goes through the file-based statement path, not the inline query tool, to avoid heredoc/JSON escaping issues). If the user opted out of saving SQL files in Step 1, write the statement to a temporary .sql file first. If the user chose "Replace" for any overlap in Step 3, drop the old view after deploying the new one (DROP VIEW IF EXISTS <old_view>). If they chose "Extend", the view is deployed under the existing name via CREATE OR REPLACE.
After creation, verify each metric view with a test query (one dimension + one measure, LIMIT 5). Report any errors and help fix them:
| Error | Cause | Fix |
|---|---|---|
UNRESOLVED_COLUMN |
Snowflake join missing parent prefix | Full dot-chain: customer.nation.n_name |
PARSE_SYNTAX_ERROR |
Unquoted multi-word MEASURE() name | Add backticks: MEASURE(`Total Revenue`) |
METRIC_VIEW_INVALID_VIEW_DEFINITION |
format block present |
Remove all format blocks |
DATATYPE_MISMATCH |
Date subtraction instead of DATEDIFF | Use DATEDIFF(date1, date2) |
SCHEMA_NOT_FOUND |
Target schema does not exist | CREATE SCHEMA IF NOT EXISTS <catalog>.<schema>, or use a different target |
TABLE_OR_VIEW_NOT_FOUND |
Source/joined table dropped or renamed | Verify with SHOW TABLES IN <catalog>.<schema> LIKE '<table>' and fix the reference |
INSUFFICIENT_PRIVILEGES |
Missing CREATE VIEW or USE SCHEMA on target |
GRANT CREATE TABLE, USE SCHEMA ON SCHEMA <schema> TO <principal> (least privilege), or use a schema the user owns |
If materialization was configured, also tell the user how to trigger a manual refresh (REFRESH MATERIALIZED VIEW <name>), check status (DESCRIBE EXTENDED <name>), verify query rewrite (EXPLAIN EXTENDED <query> — look for __materialization_mat___metric_view), and that refreshes incur Lakeflow Declarative Pipelines charges.
STOP — wait for the user to confirm deployment results are satisfactory before proceeding.
Step 7: Show Sample Queries
CRITICAL — Metric View Query Syntax. Metric views are NOT regular SQL views. Every query MUST use both MEASURE() and GROUP BY together:
SELECT
`Dimension Name`,
MEASURE(`Measure Name`) AS `Measure Name`
FROM catalog.schema.metric_view
GROUP BY ALL
ORDER BY `Dimension Name`;
MEASURE()wrapper — every measure column MUST be wrapped, or you getMETRIC_VIEW_MISSING_MEASURE_FUNCTION.GROUP BY— dimensions MUST appear in aGROUP BY(useGROUP BY ALL), or you getMISSING_GROUP_BY.SELECT *is NOT supported on metric views.
For each created metric view, generate 3-5 sample queries demonstrating: basic aggregation (one dim, two measures); multi-dimension slice; filtered query; time trend (if a date dimension exists); and Top-N (ORDER BY measure DESC LIMIT 10). Backtick-quote names with spaces, use GROUP BY ALL, and alias each MEASURE() call.
Execute each sample query to verify it works and show the results. Save each metric view's queries as <metric_view_name>_sample_queries.sql in the run folder (default: yes, unless the user opted out).
STOP — wait for the user to acknowledge before presenting next steps.
Next Steps (suggestions)
- Grant access:
GRANT SELECT ON VIEW <metric_view> TO <principal>to share with teams - Add to a Genie space: metric views work natively with AI/BI Genie for natural language querying
- Add to AI/BI dashboards: use as datasets for visualizations
- Set up SQL alerts: threshold-based alerts on measures
- BI tools / JDBC: metric views are accessible via the Databricks JDBC driver and BI connectors
- Compose metric views: use an existing metric view as the source for a new one — layered metrics
- Inspect with metadata:
DESCRIBE TABLE EXTENDED <metric_view> AS JSONfor the full definition - Set PK/FK constraints with RELY on underlying tables for optimal join performance
Important Notes
- DBR 17.2+ required for YAML v1.1 metric views
- SELECT * is NOT supported — must explicitly list dimensions and use MEASURE()
- Querying requires BOTH
MEASURE()ANDGROUP BY— using only one causes an error (see Step 7) - MEASURE() cannot use OVER clause — no window function usage on measures
- Window measures (running totals, period-over-period) require
version: 0.1— only suggest these if the user specifically asks - Joins must be many-to-one — in many-to-many cases, only the first match is used
- Joins are LEFT OUTER JOIN — dimension rows without fact matches are excluded; fact rows without dimension matches get NULLs
- Source can be a SQL query — e.g.,
(SELECT * FROM table WHERE active = true)— but joins are NOT supported with SQL query sources - Always add
commentandsynonymsfields — they power Genie's natural language understanding - Prefer fewer, richer metric views over many narrow ones
Limitations
- No Delta Sharing — metric views cannot be shared via Delta Sharing
- No data profiling — data profiling is not supported on metric views
- ALTER VIEW removes UC comments — unless
commentfields are explicitly in the YAML - Materialization is experimental — only
relaxedmode supported; requires serverless compute