databricks

name: databricks description: Query Databricks, explore Unity Catalog, and manage AI/BI dashboards. Triggered by "databricks", "check databricks", "query data", "run query", or "check databricks". allowed-tools: - read - write - edit - bash - glob

Query Zapier's Databricks warehouse, explore Unity Catalog, and manage AI/BI dashboards via a local CLI tool.

Config

Key	Value
Skill directory	`~/.config/opencode/skills/databricks`
Host	`https://dbc-37d560c2-40fd.cloud.databricks.com`
Warehouse ID	`24ca15a1207d8b4a`
Token page	`https://dbc-37d560c2-40fd.cloud.databricks.com/settings/user/developer/access-tokens`

Requires Zapier VPN (Viscosity) for all commands.

Credentials & CLI

The CLI tool and .env credentials both live in this skill directory (~/.config/opencode/skills/databricks). Self-contained — no external repo dependency.

If the .env is missing or the token is expired, tell James — do NOT create it or guess the token value.

Setup (one-time)

Before first use, check that deps are installed:

# Install deps if node_modules is missing
pnpm install   # workdir: ~/.config/opencode/skills/databricks

Running Commands

All commands are run from this skill directory. The .env is loaded automatically by dotenv since it's in the working directory.

pnpm run databricks <command>

When using the Bash tool, always set workdir to /Users/jbaldwin/.config/opencode/skills/databricks.

Commands Reference

Authentication

pnpm run databricks whoami

Unity Catalog Exploration

# List catalogs
pnpm run databricks catalogs

# List schemas in a catalog
pnpm run databricks schemas <catalog>
# Example: pnpm run databricks schemas public
# Example: pnpm run databricks schemas production_refined

# List tables in a schema
pnpm run databricks tables <catalog> <schema>
# Example: pnpm run databricks tables public db_zapier
# Example: pnpm run databricks tables production_refined events

# Describe a table (full name: catalog.schema.table)
pnpm run databricks describe <catalog.schema.table>
# Example: pnpm run databricks describe public.fact_zap_usage
# Example: pnpm run databricks describe production_refined.db_zapier.flow_node

SQL Queries

# Run a query (default output: markdown table)
pnpm run databricks query "SELECT COUNT(*) FROM public.dim_account_current"

# Output formats: table (default), json, jsonl, csv
pnpm run databricks query "SELECT * FROM public.dim_plan LIMIT 5" -f json

# Save to file
pnpm run databricks query "SELECT * FROM public.dim_plan" -o results.json -f json

# Read SQL from file
pnpm run databricks query -i query.sql -o results.csv -f csv

# Custom timeout (default 50s)
pnpm run databricks query "SELECT * FROM large_table" -t 120

Async Queries (long-running)

# Submit and get statement ID back immediately
pnpm run databricks query-async "SELECT * FROM huge_table"

# Check status
pnpm run databricks status <statement-id>

# Fetch results when done
pnpm run databricks results <statement-id> -f json

Sample Data

# Quick sample (default 10 rows)
pnpm run databricks sample public.dim_account_current

# Custom row count
pnpm run databricks sample public.fact_zap_usage -n 50

# Different format
pnpm run databricks sample public.dim_plan -n 5 -f json

SQL Warehouses

pnpm run databricks warehouses

AI/BI Dashboards

# List dashboards
pnpm run databricks dashboards

# Get dashboard details (datasets, SQL, charts)
pnpm run databricks dashboard <dashboard-id>

# Create dashboard with a chart
pnpm run databricks dashboard-create \
  -n "Dashboard Name" \
  -s "SELECT date, COUNT(*) as count FROM table GROUP BY date" \
  -c line -x date -y count

# Add chart to existing dashboard
pnpm run databricks dashboard-add-chart <dashboard-id> \
  -n "Chart Name" \
  -s "SELECT month, SUM(revenue) as total FROM sales GROUP BY month" \
  -c bar -x month -y total

# Update SQL for a dataset
pnpm run databricks dashboard-update-sql <dashboard-id> \
  -d main_dataset -s "SELECT new_query FROM ..."

# Publish (make viewable)
pnpm run databricks dashboard-publish <dashboard-id>

# Delete (moves to trash)
pnpm run databricks dashboard-delete <dashboard-id>

Chart types: bar, line, area, scatter, pie, table, counter

Output Formats

Format	Flag	Use case
`table`	`-f table` (default)	Viewing in terminal, markdown tables
`json`	`-f json`	Pretty-printed, good for inspection
`jsonl`	`-f jsonl`	Large exports, one object per line
`csv`	`-f csv`	Spreadsheet-friendly

Workflow

Exploring data James doesn't know the shape of

Start with catalogs to see what's available
Drill into schemas <catalog> for a relevant catalog
List tables <catalog> <schema> to find relevant tables
describe <catalog.schema.table> to see columns and types
sample <table> -n 5 to see actual data
Write and run the actual query

Answering a data question

If James names specific tables, skip to querying
If not, explore the catalog to find relevant tables
Describe tables to understand columns
Sample a few rows to understand data shape
Write the SQL query
Run it, present results clearly
If James wants a dashboard, create one from the query

Building a dashboard

Get or write the SQL query first
Run the query to verify it works
dashboard-create with the query and appropriate chart type
dashboard-publish to make it viewable
Share the URL with James

Large/slow queries

Use query-async to submit without blocking
Poll status <id> until SUCCEEDED
Fetch with results <id> -f json
Or just increase timeout: query "..." -t 300

Key Zapier Catalogs

Catalog	Description
`public`	Main analytics tables (dim/fact tables)
`production_refined`	Refined production data

Common tables to know about:

public.dim_account_current — Account dimension
public.dim_plan — Plan information
public.fact_zap_usage — Zap usage facts

Use describe to find more — don't guess column names.

Troubleshooting

Error	Fix
"DATABRICKS_HOST and DATABRICKS_TOKEN environment variables are required"	`.env` missing from skill directory. James needs to create it.
"Databricks API error (401)"	Token expired. James needs a new one from the token page.
"Databricks API error (403)"	Check VPN is connected. Check token permissions.
Query timeout	Increase timeout with `-t 300` or use `query-async`.
Connection refused / network error	VPN not connected (Viscosity).