bootstrap

star 23

Bootstrap a semantic layer from a database. Use when the user wants to create .view.yml files from their warehouse schema, or when starting a new airlayer project from scratch.

oxy-hq By oxy-hq schedule Updated 6/10/2026

name: bootstrap description: Bootstrap a semantic layer from a database. Use when the user wants to create .view.yml files from their warehouse schema, or when starting a new airlayer project from scratch.

Bootstrap a Semantic Layer

You are bootstrapping a semantic layer for airlayer. This means discovering what's in the user's database and generating .view.yml files that define dimensions, measures, and entities.

Prerequisites

The user needs a config.yml with database connection details. If they don't have one, help them create it. The format is:

databases:
  - name: <name>
    type: <postgres|snowflake|bigquery|duckdb|motherduck|gsheets|mysql|clickhouse|databricks|sqlite>
    # ... connection fields vary by type (see docs/agent-execution.md)

MotherDuck example:

databases:
  - name: cloud
    type: motherduck
    token_var: MOTHERDUCK_TOKEN
    database: my_db

Google Sheets example (each entry under sheets becomes a queryable table):

databases:
  - name: sheets
    type: gsheets
    token_var: GSHEET_TOKEN              # Google OAuth access token
    # key_file: ./service-account.json   # or a service-account JSON key
    sheets:
      orders: https://docs.google.com/spreadsheets/d/SPREADSHEET_ID
      customers:                          # detailed form selects a tab/range
        url: https://docs.google.com/spreadsheets/d/SPREADSHEET_ID
        sheet: Customers
        range: A1:F500

airlayer must be built with executor support: cargo build --features exec (or a specific driver like exec-postgres).

Workflow

Step 1: Introspect the schema

Run schema introspection to discover all tables, columns, and types:

airlayer inspect --schema --config <config.yml>

Optionally filter to a specific schema/dataset:

airlayer inspect --schema <schema_name> --config <config.yml>

This returns JSON with every table and column. Read the output carefully — it's your source of truth for what's available.

Step 2: Ask the user which tables to model

Present the discovered tables to the user and ask which ones they want in the semantic layer. Don't model everything — focus on the tables they care about for analytics.

Step 3: Generate .view.yml files

For each selected table, create a .view.yml file in the project directory. Follow these rules:

Dimensions (attributes to group/filter by):

  • String columns → type: string
  • Date columns → type: date
  • Datetime/timestamp columns → type: datetime
  • Boolean columns → type: boolean
  • Numeric columns used for grouping (IDs, codes) → type: string or type: number

Measures (aggregations):

  • Row count → type: count (no expr needed)
  • Unique counts → type: count_distinct with expr: <column>
  • Sums → type: sum with expr: <column>
  • Averages → type: average with expr: <column>
  • Computed measures → type: sum with expr: "quantity * price" etc.

Entities (join keys):

  • Primary keys → type: primary, key: <column>
  • Foreign keys → type: foreign, key: <column>
  • Name entities after the concept they represent (e.g., customer, order), not the column name

Naming conventions:

  • name: should be snake_case, semantic (e.g., total_revenue not sum_amount)
  • expr: is the raw SQL expression — reference actual column names from the schema
  • description: add for any non-obvious measures or computed fields

Example view:

name: orders
description: "Sales orders with customer and product data"
dialect: postgres
datasource: warehouse
table: public.orders

entities:
  - name: customer
    type: foreign
    key: customer_id

dimensions:
  - name: order_id
    type: string
    expr: order_id

  - name: status
    type: string
    expr: status

  - name: created_at
    type: datetime
    expr: created_at

measures:
  - name: order_count
    type: count

  - name: total_revenue
    type: sum
    expr: amount

Step 4: Profile dimensions

After creating views, profile them to verify the data looks right:

# Profile all dimensions in a view
airlayer inspect --profile <view_name> --config <config.yml>

# Profile a single dimension
airlayer inspect --profile <view_name>.<dimension_name> --config <config.yml>

Review the profile output:

  • String dimensions: Check cardinality and values — are they what you'd expect?
  • Number dimensions: Check min/max/mean — do the ranges make sense?
  • Date dimensions: Check the date range — is it current data?

Step 5: Test with queries

Run a few test queries to validate the semantic layer:

airlayer query --execute --config <config.yml> \
  --dimension <view>.<dim> --measure <view>.<measure>

Check the envelope:

  • status: "success" → the view works
  • sql → does the generated SQL look correct?
  • data → are the values reasonable?

Step 6: Iterate

If something's wrong:

  • Wrong column name in expr → fix the expr, re-run
  • Missing measure → add it to the view, re-run
  • Bad aggregation type → change the measure type
  • Need joins → add entities to both views, airlayer infers JOINs automatically

Important notes

  • The dialect field must match the database type (postgres, snowflake, bigquery, duckdb, motherduck, gsheets, mysql, clickhouse, databricks, redshift, sqlite)
  • MotherDuck and Google Sheets use the duckdb dialect — set dialect: duckdb in views that target them
  • The datasource field must match a database name in config.yml
  • The table field is the actual table name in the database (can be schema-qualified like public.orders)
  • All views in a query must use the same dialect
  • Entity names must match across views for joins to work (e.g., both views declare entity customer)
Install via CLI
npx skills add https://github.com/oxy-hq/airlayer --skill bootstrap
Repository Details
star Stars 23
call_split Forks 5
navigation Branch main
article Path SKILL.md
More from Creator