name: duckdb description: "Query and explore a local DuckDB warehouse -- list tables, inspect schemas, run SQL, ingest CSV/JSON/Parquet files." metadata: {"openclaw": {"emoji": "🦆", "requires": {"bins": ["duckdb"]}, "tags": ["database", "duckdb", "sql", "query", "data"]}}
DuckDB
You can query and explore a local DuckDB database using the duckdb CLI.
Use this whenever the user asks about data, wants to run SQL, or needs to load files.
For data transformations, use the dbt skill. For SQL code review, use the sql-reviewer skill.
Database Location
The project warehouse lives at userdata/warehouse.duckdb (relative to the project root). Always use this path unless the user specifies a different database file.
Sample Data
Sample CSV files are available in the project at templates/sampledata/:
templates/sampledata/sample_customers.csvtemplates/sampledata/sample_orders.csvtemplates/sampledata/sample_payments.csvtemplates/sampledata/sample_products.csv
To ingest them:
duckdb userdata/warehouse.duckdb -c "CREATE TABLE customers AS SELECT * FROM read_csv_auto('templates/sampledata/sample_customers.csv');"
Or query them directly without loading:
duckdb -c "SELECT * FROM read_csv_auto('templates/sampledata/sample_customers.csv') LIMIT 5;"
Commands
Run SQL against the warehouse using the duckdb CLI. The general pattern is:
duckdb userdata/warehouse.duckdb -c "<SQL>"
List tables
duckdb userdata/warehouse.duckdb -c "SHOW TABLES;"
Inspect a table schema
duckdb userdata/warehouse.duckdb -c "DESCRIBE <table_name>;"
Run a query
duckdb userdata/warehouse.duckdb -c "SELECT * FROM <table_name> LIMIT 10;"
Row counts
duckdb userdata/warehouse.duckdb -c "SELECT COUNT(*) AS row_count FROM <table_name>;"
Ingest a CSV file
duckdb userdata/warehouse.duckdb -c "CREATE TABLE <name> AS SELECT * FROM read_csv_auto('<path/to/file.csv>');"
Ingest a Parquet file
duckdb userdata/warehouse.duckdb -c "CREATE TABLE <name> AS SELECT * FROM read_parquet('<path/to/file.parquet>');"
Ingest a JSON file
duckdb userdata/warehouse.duckdb -c "CREATE TABLE <name> AS SELECT * FROM read_json_auto('<path/to/file.json>');"
Query a file directly (without loading)
DuckDB can query files in place -- no need to import first:
duckdb -c "SELECT * FROM read_csv_auto('templates/sampledata/sample_customers.csv') LIMIT 5;"
Export query results to CSV
duckdb userdata/warehouse.duckdb -c "COPY (SELECT * FROM <table>) TO '<output.csv>' (HEADER, DELIMITER ',');"
When to Use
- User asks "what data do I have?" ->
SHOW TABLES;thenDESCRIBE <table>; - User asks to see sample data ->
SELECT * FROM <table> LIMIT 5; - User asks to load a file ->
CREATE TABLE ... AS SELECT * FROM read_csv_auto(...)(or parquet/json) - User asks a data question -> write SQL and run it
- User asks about row counts, aggregations, filters -> write and run the appropriate SQL
- User wants to explore a CSV/Parquet/JSON without loading -> query the file directly with
read_csv_auto()/read_parquet()/read_json_auto() - User asks for analysis -> run queries, then present results using charts and tables (see Analysis Workflow below)
Analysis Workflow
When the user asks for analysis, follow this pattern:
- Query the data using DuckDB with
-jsonflag for structured output - Present key metrics in a summary markdown table
- Produce charts using
chartfenced code blocks with JSON specs (see SOUL.md for format) - Show detail tables in markdown format
- Provide insights -- bullet points explaining what the data tells us
Example: Producing a Chart from Query Results
Run the query:
duckdb userdata/warehouse.duckdb -json -c "SELECT category, ROUND(SUM(unit_price * quantity), 2) AS revenue FROM orders JOIN products ON orders.product_sku = products.sku GROUP BY category ORDER BY revenue DESC LIMIT 10;"
Then format the results as a chart block in your response:
```chart
{
"type": "bar",
"title": "Revenue by Category",
"data": [
{"name": "Electronics", "value": 4800},
{"name": "Furniture", "value": 3200}
],
"xKey": "name",
"yKey": "value",
"color": "#6366f1"
}
```
Common Analysis Queries
Customer Lifetime Value:
SELECT c.first_name || ' ' || c.last_name AS customer,
COUNT(DISTINCT o.order_id) AS orders,
ROUND(SUM(o.quantity * p.unit_price), 2) AS total_revenue,
ROUND(AVG(o.quantity * p.unit_price), 2) AS avg_order_value,
MIN(o.order_date) AS first_order,
MAX(o.order_date) AS last_order
FROM customers c
JOIN orders o ON c.id = o.customer_id
JOIN products p ON o.product_sku = p.sku
GROUP BY 1 ORDER BY total_revenue DESC;
Monthly Revenue Trend:
SELECT STRFTIME(order_date, '%Y-%m') AS month,
COUNT(DISTINCT order_id) AS orders,
ROUND(SUM(quantity * unit_price), 2) AS revenue
FROM orders o JOIN products p ON o.product_sku = p.sku
GROUP BY 1 ORDER BY 1;
Product Performance:
SELECT p.product_name, p.category,
SUM(o.quantity) AS units_sold,
ROUND(SUM(o.quantity * p.unit_price), 2) AS revenue,
COUNT(DISTINCT o.customer_id) AS unique_buyers
FROM products p JOIN orders o ON p.sku = o.product_sku
GROUP BY 1, 2 ORDER BY revenue DESC;
Payment Method Distribution:
SELECT pay.payment_method,
COUNT(*) AS transactions,
ROUND(SUM(pay.amount), 2) AS total_amount,
ROUND(AVG(pay.amount), 2) AS avg_amount
FROM payments pay
GROUP BY 1 ORDER BY total_amount DESC;
Output Formatting
- Use
-markdownflag for markdown-formatted table output:duckdb userdata/warehouse.duckdb -markdown -c "..." - Use
-jsonflag for JSON output:duckdb userdata/warehouse.duckdb -json -c "..." - Use
-csvflag for CSV output:duckdb userdata/warehouse.duckdb -csv -c "..." - Default output is column-aligned plain text
- Prefer
-jsonfor analysis -- it's easiest to transform into chart specs
Tips
- Multiple statements: Separate with
;inside a single-cstring - Temp tables: Use
CREATE TEMP TABLEfor intermediate results within a session - Aggregations: DuckDB supports window functions, CTEs,
PIVOT/UNPIVOT,GROUP BY ALL - String matching: Use
ILIKEfor case-insensitive matching - Sampling:
SELECT * FROM <table> USING SAMPLE 100;for random samples - DuckDB supports full SQL (joins, subqueries, CTEs, window functions)
- Exit code 1 means failure -- check stderr for the error message
Notes
- Database file:
userdata/warehouse.duckdb(created automatically on first use) - DuckDB natively reads CSV, Parquet, JSON, and Excel files
- No server needed -- DuckDB is an embedded database
- The database file is safe to delete and rebuild from source files