---
name: databricks
description: Query Databricks, explore Unity Catalog, and manage AI/BI dashboards. Triggered by "databricks", "check databricks", "query data", or "run query".
allowed-tools:
  - read
  - write
  - edit
  - bash
  - glob
---
# Databricks
Query Zapier's Databricks warehouse, explore Unity Catalog, and manage AI/BI dashboards via a local CLI tool.
## Config
| Key | Value |
|---|---|
| Skill directory | ~/.config/opencode/skills/databricks |
| Host | https://dbc-37d560c2-40fd.cloud.databricks.com |
| Warehouse ID | 24ca15a1207d8b4a |
| Token page | https://dbc-37d560c2-40fd.cloud.databricks.com/settings/user/developer/access-tokens |
Requires Zapier VPN (Viscosity) for all commands.
## Credentials & CLI

The CLI tool and the `.env` credentials both live in this skill directory (`~/.config/opencode/skills/databricks`), so the skill is self-contained with no external repo dependency.

If the `.env` is missing or the token is expired, tell James. Do NOT create it or guess the token value.
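Before running anything, a read-only check can confirm that the `.env` exists and defines both variables the CLI requires (`DATABRICKS_HOST` and `DATABRICKS_TOKEN`, per the Troubleshooting table) without ever echoing the token. This is a minimal sketch: `check_env` is a hypothetical helper, and it assumes the file uses plain `KEY=value` lines, optionally prefixed with `export`. It never writes the file; if something is missing, report it to James.

```bash
# Hypothetical helper: read-only sanity check of the skill's .env.
# Prints only variable names, never values, so the token is not leaked.
check_env() {
  local env_file="${1:-$HOME/.config/opencode/skills/databricks/.env}"
  local var missing=0
  if [ ! -f "$env_file" ]; then
    echo "missing: $env_file (tell James; do not create it)"
    return 1
  fi
  for var in DATABRICKS_HOST DATABRICKS_TOKEN; do
    # Assumes KEY=value lines; requires at least one character after '='.
    if ! grep -Eq "^(export +)?${var}=." "$env_file"; then
      echo "missing or empty: $var"
      missing=1
    fi
  done
  return "$missing"
}
```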
## Setup (one-time)

Before first use, check that deps are installed:

```bash
# Install deps if node_modules is missing
pnpm install # workdir: ~/.config/opencode/skills/databricks
```
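The one-time setup can be made idempotent so it is safe to run at the start of any session. A minimal sketch, assuming `pnpm` is on `PATH`; `ensure_deps` is a hypothetical name, and its directory argument defaults to the skill directory from the Config table.

```bash
# Hypothetical helper: install dependencies only if node_modules is absent.
ensure_deps() {
  local dir="${1:-$HOME/.config/opencode/skills/databricks}"
  if [ -d "$dir/node_modules" ]; then
    echo "deps present"
  else
    # Subshell, so the caller's working directory is unchanged.
    (cd "$dir" && pnpm install)
  fi
}
```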
## Running Commands

All commands are run from this skill directory. The `.env` is loaded automatically by dotenv since it's in the working directory.

```bash
pnpm run databricks <command>
```

When using the Bash tool, always set `workdir` to `/Users/jbaldwin/.config/opencode/skills/databricks`.
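Outside the Bash tool (for example, in a plain terminal), a small wrapper has the same effect as setting `workdir`: it runs the CLI from the skill directory so dotenv finds `.env`. A sketch under assumptions: `dbx` and `DATABRICKS_SKILL_DIR` are hypothetical names, not part of the CLI.

```bash
# Hypothetical wrapper: run the CLI from the skill directory so dotenv
# picks up .env, without changing the caller's working directory.
DATABRICKS_SKILL_DIR="${DATABRICKS_SKILL_DIR:-$HOME/.config/opencode/skills/databricks}"

dbx() {
  (cd "$DATABRICKS_SKILL_DIR" && pnpm run databricks "$@")
}

# Usage: dbx whoami
#        dbx query "SELECT COUNT(*) FROM public.dim_account_current" -f json
```

Because the command runs inside the skill directory, relative `-i` and `-o` paths resolve there rather than in the caller's directory.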
## Commands Reference
### Authentication

```bash
pnpm run databricks whoami
```
### Unity Catalog Exploration

```bash
# List catalogs
pnpm run databricks catalogs

# List schemas in a catalog
pnpm run databricks schemas <catalog>
# Example: pnpm run databricks schemas public
# Example: pnpm run databricks schemas production_refined

# List tables in a schema
pnpm run databricks tables <catalog> <schema>
# Example: pnpm run databricks tables public db_zapier
# Example: pnpm run databricks tables production_refined events

# Describe a table (full name: catalog.schema.table)
pnpm run databricks describe <catalog.schema.table>
# Example: pnpm run databricks describe public.fact_zap_usage
# Example: pnpm run databricks describe production_refined.db_zapier.flow_node
```
### SQL Queries

```bash
# Run a query (default output: markdown table)
pnpm run databricks query "SELECT COUNT(*) FROM public.dim_account_current"

# Output formats: table (default), json, jsonl, csv
pnpm run databricks query "SELECT * FROM public.dim_plan LIMIT 5" -f json

# Save to file
pnpm run databricks query "SELECT * FROM public.dim_plan" -o results.json -f json

# Read SQL from file
pnpm run databricks query -i query.sql -o results.csv -f csv

# Custom timeout (default 50s)
pnpm run databricks query "SELECT * FROM large_table" -t 120
```
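Multi-line SQL that contains quotes is awkward to pass inline. Since `-i` reads SQL from a file, a heredoc can feed it through a temporary file instead. A minimal sketch: `query_stdin` is a hypothetical helper, it assumes the CLI accepts an absolute path for `-i`, and any remaining arguments (such as `-f` or `-o`) are passed through unchanged.

```bash
# Hypothetical helper: read SQL from stdin, hand it to the CLI via -i,
# then delete the temp file and return the CLI's exit status.
query_stdin() {
  local sql_file rc
  sql_file="$(mktemp)" || return 1
  cat > "$sql_file"
  pnpm run databricks query -i "$sql_file" "$@"
  rc=$?
  rm -f "$sql_file"
  return "$rc"
}

# Usage (from the skill directory):
# query_stdin -f csv -o results.csv <<'SQL'
# SELECT COUNT(*) AS n
# FROM public.dim_account_current
# SQL
```

Quoting the heredoc delimiter (`'SQL'`) stops the shell from expanding `$` or backticks inside the query text.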
### Async Queries (long-running)

```bash
# Submit and get statement ID back immediately
pnpm run databricks query-async "SELECT * FROM huge_table"

# Check status
pnpm run databricks status <statement-id>

# Fetch results when done
pnpm run databricks results <statement-id> -f json
```
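The submit, poll, fetch cycle can be scripted as a bounded polling loop. This sketch assumes `status` prints the statement state as text, using the state names from the Databricks SQL Statement Execution API (`PENDING`, `RUNNING`, `SUCCEEDED`, `FAILED`, `CANCELED`, `CLOSED`); check the real output before relying on it. `wait_for_statement` and its interval and attempt defaults are hypothetical.

```bash
# Hypothetical helper: poll a statement until it finishes, then fetch results.
# Args: statement ID, poll interval in seconds (default 5), max polls (default 60).
# Exit status: that of `results` after SUCCEEDED; 1 if the statement failed,
# was canceled or closed (or `status` itself errored); 2 after giving up.
wait_for_statement() {
  local id="$1" interval="${2:-5}" max_polls="${3:-60}" polls=0 out
  while [ "$polls" -lt "$max_polls" ]; do
    polls=$((polls + 1))
    out="$(pnpm run databricks status "$id")" || return 1
    case "$out" in
      *SUCCEEDED*)
        pnpm run databricks results "$id" -f json
        return $? ;;
      *FAILED*|*CANCELED*|*CLOSED*)
        echo "statement $id did not succeed: $out" >&2
        return 1 ;;
    esac
    sleep "$interval"
  done
  echo "gave up on $id after $max_polls polls" >&2
  return 2
}
```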
### Sample Data

```bash
# Quick sample (default 10 rows)
pnpm run databricks sample public.dim_account_current

# Custom row count
pnpm run databricks sample public.fact_zap_usage -n 50

# Different format
pnpm run databricks sample public.dim_plan -n 5 -f json
```
### SQL Warehouses

```bash
pnpm run databricks warehouses
```
### AI/BI Dashboards

```bash
# List dashboards
pnpm run databricks dashboards

# Get dashboard details (datasets, SQL, charts)
pnpm run databricks dashboard <dashboard-id>

# Create dashboard with a chart
pnpm run databricks dashboard-create \
  -n "Dashboard Name" \
  -s "SELECT date, COUNT(*) as count FROM table GROUP BY date" \
  -c line -x date -y count

# Add chart to existing dashboard
pnpm run databricks dashboard-add-chart <dashboard-id> \
  -n "Chart Name" \
  -s "SELECT month, SUM(revenue) as total FROM sales GROUP BY month" \
  -c bar -x month -y total

# Update SQL for a dataset
pnpm run databricks dashboard-update-sql <dashboard-id> \
  -d main_dataset -s "SELECT new_query FROM ..."

# Publish (make viewable)
pnpm run databricks dashboard-publish <dashboard-id>

# Delete (moves to trash)
pnpm run databricks dashboard-delete <dashboard-id>
```

Chart types: `bar`, `line`, `area`, `scatter`, `pie`, `table`, `counter`
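Since only the chart types listed above are accepted, a guard can catch a typo (say, `bars`) before anything is created. A minimal sketch: `is_chart_type` and `create_dashboard_chart` are hypothetical helpers that use only the `dashboard-create` flags shown above; it is an assumption that `-x` and `-y` are meaningful for every chart type (for example `counter`).

```bash
# Hypothetical guard: accept only the documented chart types.
is_chart_type() {
  case "$1" in
    bar|line|area|scatter|pie|table|counter) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical wrapper. Args: name, SQL, chart type, x column, y column.
create_dashboard_chart() {
  if ! is_chart_type "$3"; then
    echo "unsupported chart type: $3" >&2
    return 1
  fi
  pnpm run databricks dashboard-create -n "$1" -s "$2" -c "$3" -x "$4" -y "$5"
}
```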
## Output Formats

| Format | Flag | Use case |
|---|---|---|
| table | `-f table` (default) | Viewing in terminal, markdown tables |
| json | `-f json` | Pretty-printed, good for inspection |
| jsonl | `-f jsonl` | Large exports, one object per line |
| csv | `-f csv` | Spreadsheet-friendly |
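When saving with `-o`, the `-f` value can be derived from the output file's extension so the two never disagree (for example, CSV written into a `.json` file). A sketch: `format_for` and `export_query` are hypothetical helpers, and only the three file-oriented formats are mapped; `table` is left out because it is meant for terminal viewing.

```bash
# Hypothetical helper: map an output file's extension to a -f value.
format_for() {
  case "$1" in
    *.jsonl) echo jsonl ;;
    *.json) echo json ;;
    *.csv) echo csv ;;
    *) return 1 ;;
  esac
}

# Hypothetical wrapper: run a query and save it in the matching format.
export_query() {
  local sql="$1" out="$2" fmt
  if ! fmt="$(format_for "$out")"; then
    echo "cannot infer a format from: $out" >&2
    return 1
  fi
  pnpm run databricks query "$sql" -o "$out" -f "$fmt"
}
```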
## Workflow

### Exploring data James doesn't know the shape of

- Start with `catalogs` to see what's available
- Drill into `schemas <catalog>` for a relevant catalog
- List `tables <catalog> <schema>` to find relevant tables
- `describe <catalog.schema.table>` to see columns and types
- `sample <table> -n 5` to see actual data
- Write and run the actual query
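The describe and sample steps are usually run back to back once a candidate table turns up. A sketch: `explore_table` is a hypothetical helper that joins catalog, schema and table into the full name and passes it to both commands; the default of 5 sample rows mirrors the step above.

```bash
# Hypothetical helper: describe a table, then sample a few rows.
# Args: catalog, schema, table, optional row count (default 5).
explore_table() {
  local full="$1.$2.$3"
  pnpm run databricks describe "$full" || return 1
  pnpm run databricks sample "$full" -n "${4:-5}"
}

# Usage: explore_table production_refined db_zapier flow_node
```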
### Answering a data question
- If James names specific tables, skip to querying
- If not, explore the catalog to find relevant tables
- Describe tables to understand columns
- Sample a few rows to understand data shape
- Write the SQL query
- Run it, present results clearly
- If James wants a dashboard, create one from the query
### Building a dashboard

- Get or write the SQL query first
- Run the query to verify it works
- `dashboard-create` with the query and an appropriate chart type
- `dashboard-publish` to make it viewable
- Share the URL with James
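The first three steps can be chained so a dashboard is only created from SQL that actually runs. A sketch, assuming the CLI exits non-zero when a query fails; `verify_then_create` is a hypothetical helper. Publishing stays a separate step because this sketch assumes nothing about how `dashboard-create` reports the new dashboard ID.

```bash
# Hypothetical helper: run the SQL first; create the dashboard only if it works.
# Args: dashboard name, SQL, chart type, x column, y column.
verify_then_create() {
  if ! pnpm run databricks query "$2" -f json >/dev/null; then
    echo "query failed; fix the SQL before creating a dashboard" >&2
    return 1
  fi
  pnpm run databricks dashboard-create -n "$1" -s "$2" -c "$3" -x "$4" -y "$5"
}
```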
### Large/slow queries

- Use `query-async` to submit without blocking
- Poll `status <id>` until `SUCCEEDED`
- Fetch with `results <id> -f json`
- Or just increase the timeout: `query "..." -t 300`
## Key Zapier Catalogs

| Catalog | Description |
|---|---|
| public | Main analytics tables (dim/fact tables) |
| production_refined | Refined production data |

Common tables to know about:

- `public.dim_account_current`: Account dimension
- `public.dim_plan`: Plan information
- `public.fact_zap_usage`: Zap usage facts

Use `describe` to find more; don't guess column names.
## Troubleshooting
| Error | Fix |
|---|---|
| "DATABRICKS_HOST and DATABRICKS_TOKEN environment variables are required" | .env missing from skill directory. James needs to create it. |
| "Databricks API error (401)" | Token expired. James needs a new one from the token page. |
| "Databricks API error (403)" | Check VPN is connected. Check token permissions. |
| Query timeout | Increase timeout with -t 300 or use query-async. |
| Connection refused / network error | VPN not connected (Viscosity). |
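The table above can also be encoded as a lookup from an error message to its fix. A sketch: `diagnose` is a hypothetical helper; the first three patterns come from the error strings in the table, while `ECONNREFUSED` and `ENOTFOUND` are an assumption about how a Node-based CLI reports network failures.

```bash
# Hypothetical helper: map a CLI error message to the fix from the table.
# ECONNREFUSED/ENOTFOUND are assumed Node.js network error codes.
# Returns 1 for messages it does not recognize.
diagnose() {
  case "$1" in
    *"DATABRICKS_HOST and DATABRICKS_TOKEN"*)
      echo ".env missing from skill directory; James needs to create it" ;;
    *"error (401)"*)
      echo "token expired; James needs a new one from the token page" ;;
    *"error (403)"*)
      echo "check VPN is connected and check token permissions" ;;
    *[Tt]imeout*|*"timed out"*)
      echo "increase timeout with -t 300 or use query-async" ;;
    *ECONNREFUSED*|*ENOTFOUND*|*"Connection refused"*)
      echo "VPN not connected (Viscosity)" ;;
    *)
      echo "unrecognized error; see the Troubleshooting table" >&2
      return 1 ;;
  esac
}
```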