name: gen-metrics description: Generate MetricFlow metrics from natural language business descriptions tags: - metrics - metricflow version: "1.2.0" user_invocable: false disable_model_invocation: false allowed_agents: - gen_metrics
Generate Metrics Skill
Guide the user through metric generation using natural language business descriptions.
Phase 0: Discovery — Scan Existing Assets
Before anything else, call list_metrics() to get all metrics already in the knowledge base. Build an existing metric catalog JSON array with each metric's exact name, type, description when available, and subject_path when available. Use this throughout the remaining phases to:
- Skip redundant work — don't recreate metrics that already exist
- Reuse existing measures — reference measures from existing models instead of creating duplicates
- Detect conflicts — warn the user if a proposed metric name collides with an existing one
- Enable derived/ratio metrics — know which metrics can serve as building blocks for more complex definitions
Only inspect and edit semantic model YAML files under the current datasource directory shown in the system prompt, such as subject/semantic_models/<current_datasource>/.... Do not reuse or sync YAML files from sibling datasource directories; those files are outside the active MetricFlow adapter scope.
Phase 1: Understand Intent
Analyze the user's request and confirm the generation scope before proceeding. When ask_user is available, call it to confirm the metric name(s), business meaning, and calculation logic. When ask_user is not available (for example workflow or batch mode), infer from the provided SQL/request and stop only if the scope is materially ambiguous.
Input Mode Detection
- Single mode: User describes one metric or provides one SQL → follow Step 1a–1d below
- Batch mode: User provides multiple SQL queries (pasted directly, or a CSV file path containing
question+sqlcolumns) → follow Step 1-batch below
Single Mode: Step 1a–1d
Step 1a: Inspect the table — Call describe_table(table_name) to understand the columns and types. Optionally call read_query to sample data.
Step 1b: Ask for reference SQL (optional) — When ask_user is available, use it to ask:
"Do you have any existing SQL queries for this table that show the aggregations you care about? You can paste them here, or skip if not available."
When ask_user is not available, skip this question and infer SQL/aggregation context from the user's request, attached files, or discovered query/table evidence. If that is not enough, stop and explain the missing information instead of calling ask_user.
If the user provides SQL, parse it to extract:
- Final business output expressions (e.g.,
SUM(amount) / COUNT(DISTINCT user_id) AS arppu→ candidate metricarppu) - Aggregation functions + columns that the final metric depends on (e.g.,
SUM(amount)→ candidate measuretotal_amount,COUNT(*)→ candidate measurerecord_count) - GROUP BY columns → recommended dimensions
- WHERE conditions → potential metric constraints
If the provided SQL contains no metric-producing output, keep filter-only or detail-query evidence as filters, dimensions, segments, or view evidence instead of generating fake metrics.
If the user skips, proceed to Step 1c using only table structure and the user's description.
Step 1c: Propose metric candidates — Based on the table structure, reference SQL (if provided), and user's request, identify potential metric scenarios. See "Metric type detection rules" below.
Step 1d: Confirm scope — when ask_user is available, call it to confirm and present proposed metrics with multi_select: true (see Step 1-batch-d for format). If ask_user is not available, proceed with the confirmed/inferred scope from the input.
Batch Mode: Step 1-batch
Step 1-batch-a: Parse SQL queries
- The input may contain multiple SQL queries in various forms:
- Direct paste: multiple SQL statements in the prompt
- File path: user provides a path — call
read_fileto load it, then parse by file type:.sql: split by;or blank-line separators to extract individual statements.csv/.tsv: identify the SQL column by header name (common names:sql,query,SQL,statement) or by content heuristic (column values contain SQL keywords likeSELECT,FROM,GROUP BY). The description/question column is any remaining text column. If column roles are ambiguous, callask_userwhen available to confirm which column is SQL; otherwise stop and explain the missing column mapping.- Other formats: call
ask_userwhen available to clarify the file structure before proceeding; otherwise stop and explain the supported file formats or required structure.
- Parse all SQL queries from the input
- Call
describe_tablefor each unique table found in the SQL queries
Step 1-batch-b: Mine metric candidates from SQL ASTs
Call analyze_metric_candidates_from_history with all parsed SQL queries and existing_metric_catalog_json from Phase 0. Use its output to preserve final business metric expressions and their dependencies:
- Preserve final output metrics — SQL aliases and final SELECT expressions are the primary metric candidates.
- Keep base measures as dependencies — base measures support the final metric but do not replace it.
- Deduplicate by business metric — merge repeated aliases/normalized expressions across SQL files while preserving source evidence.
- Separate non-metrics — filter-only/detail SQL belongs in
non_metric_evidence, not metric YAML. - Respect modeling classifications — if
query_classificationismetric_plus_derived_datasourceorderived_datasource_recommendationsis non-empty, do not generate a direct metric fromblocked_direct_metric_candidates; first model the recommendedsql_querydata source or materialized view, then define metrics on that data source. - Choose business-safe names — if a candidate has
requires_name_translation: true, treatnameas a technical fallback only. Also inspect everysource_alias: when the alias appears generated or lacks business meaning, do not use it as the final MetricFlow name. In interactive mode, ask the user to confirm if the business meaning is unclear; in batch/bootstrap mode, infer a clear English snake_case name from the SQL expression, question, and table/column context without stopping. - Preserve SQL literal values — if
literal_mappingsis present, keep the literalvalueexactly as it appears in SQL predicates/CASE/sql_query output. Only MetricFlow object names may be translated or normalized. - Preserve SQL time grain — if
time_grain_evidenceis present, expose an equivalent time dimension in any derived data source. Do not replace a projected date such asCURDATE() AS part_dtorDATE(create_time) AS part_dtwith rawcreate_timeas the primary time dimension. Definetype: TIMEonly for physical DATE/TIME/TIMESTAMP columns or SQL expressions /sql_queryaliases guaranteed to return DATE/TIME/TIMESTAMP values. Numeric surrogate keys such as*_date_sk,*_date_key,*_dt_key, or integer YYYYMMDD keys must be identifiers or categorical dimensions unless converted to a real date. - Preserve post-aggregation constraints — if
post_aggregation_constraintsis present, keep each HAVING/post-aggregation condition as a query constraint, metric usage note, or later derived data source. Do not silently drop it or push it into a base measure. - Cross-reference with Phase 0 — remove any candidate that already exists in the knowledge base.
- Separate derived metrics — treat
derived_metric_candidatesas second-stage metrics over existing metrics. Do not mix them into base semantic model or measure generation. - Ignore passthrough references — entries in
identity_metric_referencesshow existing metrics selected without new business formula; do not generate new metrics for them. - Do not promote support measures — a SELECT projection that only supports another final KPI, such as a denominator, row count, or intermediate aggregation, may be added as a semantic-model measure. Do not also wrap it as a top-level business metric unless the user question or candidate plan identifies it as a final KPI.
- Respect
support_measure_candidates— these are dependency or comparison measures, not direct metrics. You may add them to a semantic model only if a generated metric needs them, but do not publish ametric:block for them.
Step 1-batch-c: Business metric principle
From N SQL queries, propose a focused set of business metrics. Ask yourself for each candidate:
- Is this a final output a business user would recognize as a KPI?
- Are its base measures complete enough to validate and dry-run?
- Should the evidence be a metric, or only a filter/dimension/segment/view definition?
- Is this alias only a supporting count/sum used by another final output? If yes, create or reuse the measure but do not publish a separate metric for it.
- Does the tool say the metric depends on a ranked/windowed CTE or other derived data source? If yes, generate the derived data source first instead of forcing a direct metric.
- Are SQL literals, output time grain, and HAVING/post-aggregation constraints preserved from the tool evidence?
Step 1-batch-d: Confirm with the user when possible
- When
ask_useris available, present the mined business metric candidates as options withmulti_select: true - Pass
questionsas an actual array argument, not a JSON string. Example tool arguments:{ "questions": [ { "title": "Metrics", "question": "I analyzed N SQL queries and identified the following metric candidates. Select which ones to generate:", "options": ["paid_arppu - SUM(paid_amount) / COUNT(DISTINCT user_id)", "gross_margin_rate - (SUM(revenue) - SUM(cost)) / SUM(revenue)"], "multi_select": true } ] } - Clearly show how many SQL queries were analyzed, how many metric candidates were extracted, and which candidates were skipped as non-metric evidence.
- When
ask_useris not available, proceed with the mined metrics only if the input makes the scope unambiguous; otherwise stop and explain what needs to be provided.
Metric type detection rules
- Simple counting + filter: "How many completed orders" → conditional measure in the semantic model +
measure_proxymetric referencing that measure by string - Aggregation + filter: "Total revenue from premium customers" → conditional measure in the semantic model +
measure_proxymetric referencing that measure by string - Ratio: "Order completion rate", "Refund rate", "Revenue share", "Revenue per user" →
ratiotype - Expression: "Gross profit", "Gross margin rate" →
exprtype combining measures - Derived: "ROAS over existing revenue and ad_spend metrics" →
derivedtype combining metrics - Cumulative: "Running total of revenue", "MTD sales", "Year-to-date signups" →
cumulativetype
Detection keywords:
- "running total", "MTD", "YTD", "cumulative", "to-date" → cumulative
- "rate", "ratio", "percentage of", "share of" → ratio
- "per", "divided by", "average ... per" → ratio or expr depending on expression shape
- "list all...", "show me the..." → not a metric, better suited for
gen_sql
IMPORTANT: Do NOT proceed to Phase 2 with materially ambiguous scope. Use ask_user when available; otherwise stop and explain what information is needed.
Phase 2: Ensure Semantic Model Exists
For each table involved in the metric:
2a. Check Existing Model
- Call
check_semantic_object_exists(name="{table_name}", kind="table")to check if a semantic model exists. - If the semantic model exists:
- Use
read_fileto read the existing semantic model YAML - Verify that it contains the measures and dimensions needed for this metric
- If missing measures/dimensions, use
edit_fileto add them, thenvalidate_semantic
- Use
2b. Create Missing Model
If the semantic model is missing, follow the gen-semantic-model workflow when that skill is available. In brief: inspect table structure with describe_table, discover joins with analyze_table_relationships when multiple tables are involved, use analyze_column_usage_patterns for likely measures and dimensions, write the semantic model YAML under the semantic model directory shown in the system prompt, then run validate_semantic and fix issues until it passes before continuing.
2c. Multi-Table / JOIN SQL Modeling
When the metric involves multiple tables (detected from JOIN in SQL or user description), choose the modeling strategy based on SQL complexity:
Strategy A: Identifier-based JOIN (default — use when possible)
Use when: simple equi-JOIN between 2-3 tables via foreign keys, ≤ 2 JOIN hops.
- Each table gets its own
data_sourcewithsql_table - Tables are linked via matching
identifiers(samename, one PRIMARY, one FOREIGN) - Use
analyze_table_relationshipsresults to set up correct identifier linkages - Example:
orders.customer_id(FOREIGN) links tocustomers.customer_id(PRIMARY) — both identifiers sharename: customer - MetricFlow engine automatically resolves the JOIN path at query time
Strategy B: sql_query pre-joined data source (complex cases)
Use when: non-equi JOINs, > 2 hop joins, subqueries, LATERAL/CROSS joins, complex ON conditions, or window functions in the JOIN.
- Create a single
data_sourcewithsql_querycontaining the pre-joined SQL - Flatten the result: measures and dimensions reference the output columns directly
- Example:
data_source: name: order_customer_summary sql_query: > SELECT o.order_id, o.amount, o.order_date, c.name as customer_name, c.segment FROM schema.orders o JOIN schema.customers c ON o.customer_id = c.id measures: - name: total_revenue agg: SUM expr: amount dimensions: - name: customer_name type: CATEGORICAL - name: order_date type: TIME type_params: is_primary: true time_granularity: DAY - Trade-off: dimensions from the pre-joined query are NOT reusable by other data sources (no identifier linkage). Only use this when Strategy A cannot handle the complexity.
Decision rule: Default to Strategy A. Switch to Strategy B only if the JOIN cannot be expressed as simple identifier matching (e.g., composite keys, non-equi conditions, 3+ hop joins, or subquery-based logic).
Phase 3: Generate and Validate
File paths: All write_file / edit_file / read_file calls use paths relative to the filesystem sandbox root. Always use the semantic model directory shown in the system prompt so subsequent reads find the file. For example:
- Semantic model:
subject/semantic_models/<current_datasource>/{table_name}.yml - Metric file:
subject/semantic_models/<current_datasource>/metrics/{table_name}_metrics.yml
Bare filenames are silently normalized by the host, but the prefixed form is preferred for clarity. Absolute paths are also tolerated.
Do not read, edit, or pass metric_file / semantic_model_files paths from another datasource directory such as subject/semantic_models/other_datasource/....
Check existing: Call
check_semantic_object_exists(name="{metric_name}", kind="metric")for each metric confirmed in Phase 1. If it already exists, inform the user and skip it.Write metric YAML: Use
write_fileto save each metric definition tosubject/semantic_models/<current_datasource>/metrics/{table_name}_metrics.yml.- For
measure_proxy, keeptype_params.measureas a string measure name. - For filtered metrics, add a dedicated conditional measure to the semantic model first, then reference that measure from the metric YAML.
- Each generated metric must be an explicit named top-level
metric:YAML document. Do not emit unnamedmetric:blocks or wrap metrics inside another object.
- For
Validate (MUST PASS): Call
validate_semanticto check the metric YAML.- If validation fails, fix errors with
edit_fileand retry until it passes. - Do NOT proceed to Phase 4 until validation passes. No exceptions.
- If validation fails, fix errors with
Dry-run SQL: Call
query_metrics(metrics=["{metric_name}"], dry_run=True)to generate the SQL.- If the source SQL groups by dimensions or a time grain, also dry-run the generated metric set with matching
dimensions/time_granularityfrom that source query. - Use
get_dimensionsto find exact generated dimension names; if a grouped source dimension cannot be queried, fix the semantic model joins/dimensions and retry. - Collect the SQL into a dict:
{"{metric_name}": "SELECT ..."}
- If the source SQL groups by dimensions or a time grain, also dry-run the generated metric set with matching
Phase 4: Batch Sync to Knowledge Base
After all generated metrics have passed validation and dry-run:
- Collect all generated metrics and their dry-run SQLs into
metric_sqls_json - You MUST call
end_metric_generation(metric_file, semantic_model_files, metric_sqls_json)ONCE to sync them to Knowledge Base while you can still fix publish errors - Do not rely on the final JSON host fallback. The host fallback is only a last-resort guard when the tool call was accidentally missed.
- If no metrics were generated, do NOT call
end_metric_generation
Phase 1 confirms the generation scope; validation plus dry-run are the acceptance gate before syncing.
Common Pitfalls (MUST avoid)
Explicit metric files: Write explicit metric YAML files under the semantic model directory's
metrics/subdirectory instead of relying oncreate_metric: true. Runtime-generated metrics are not part of the persisted metric catalog.Metric name must match measure name: For a
measure_proxymetric, the metric name should typically equal the measure name (or be a clear derivative). Thetype_params.measuremust exactly match a measure name from the semantic model. Do NOT invent unrelated names (e.g., measureactivity_count→ metric name should beactivity_count, NOTtotal_activity_countoractivity_count_metric).Filtered metrics: Model reusable filter logic as a conditional measure in the semantic model, such as
expr: "CASE WHEN status = 'completed' THEN 1 ELSE 0 END"withagg: SUM, then writetype_params.measure: completed_order_countin the metric YAML.Check before creating: ALWAYS call
check_semantic_object_exists(name="{metric_name}", kind="metric")before writing a new metric. If the metric already exists, skip it.Verify names after validation: After
validate_semanticsucceeds and the adapter reloads, calllist_metricsto see the exact metric names available. Use these exact names when callingquery_metrics.Every metric needs explicit YAML: Whether it's a simple aggregation, filtered variant, ratio, expr, derived, or cumulative — write a
metric:entry in the metrics YAML file so it can be persisted and discovered later.Derived metrics are second-stage: Generate non-derived metrics first, validate them, refresh the metric catalog with
list_metrics, then generatederived_metric_candidatesonly when every referenced metric exists in the refreshed catalog or was generated earlier in the same batch.Support measures are not always metrics: Add support measures needed for ratios, expressions, filters, and validation, but do not publish each support measure as a separate metric unless it is itself a requested/final business KPI.
Important Rules
- Phase 1: Confirm which metrics to generate before proceeding. Use
ask_userwhen it is available. - Validation MUST pass — always call
validate_semanticand ensure it passes before proceeding to the next phase. If it fails, fix and retry until it passes. - Sync automatically after validation — once validation and dry-run pass, call
end_metric_generationwithout another user confirmation. The final JSONmetric_fileis only a last-resort fallback. - COUNT agg must use
expr: "1"— never useexpr: {column}with COUNT (use COUNT_DISTINCT for that). - For ratio metrics, both numerator and denominator measures must exist in the semantic model.
- For expr metrics, all referenced measures must exist in the semantic model.
- For derived metrics, all referenced metrics must already be defined, the expression must not be a single metric passthrough, and the dependency graph must not contain cycles.
- For cumulative metrics, the measure must exist and a primary time dimension must be defined.
- Use consistent naming: metric names in snake_case, measure names matching the semantic model.
- Every metric data_source needs a primary time dimension when a reliable DATE/TIME/TIMESTAMP column or expression exists. Do not force a primary TIME dimension from numeric surrogate keys; join/convert to a real date first.
- Measure names must be globally unique across all data sources.
- For snapshot/balance data, always add
non_additive_dimensionto prevent incorrect time aggregation. - Keep files scoped — only write semantic model YAML and metric YAML files. Sync metrics through
end_metric_generation; the final JSONmetric_fileis only a last-resort fallback.