duckdb

star 2

Guide for working with DuckDB — CLI usage, SQL execution, reading files (CSV, Parquet, JSON, Excel), extensions (httpfs, spatial, json, excel, postgres, etc.), and the Python duckdb package. Use this skill whenever the user is working with DuckDB, whether via CLI, SQL scripts, or Python. Trigger on: duckdb queries, reading local/remote files with duckdb, installing duckdb extensions, duckdb in Python/pandas, exporting data, attaching or querying PostgreSQL from DuckDB, or any question about DuckDB behavior or syntax.

bmsuisse By bmsuisse schedule Updated 6/5/2026

name: duckdb description: > Guide for working with DuckDB — CLI usage, SQL execution, reading files (CSV, Parquet, JSON, Excel), extensions (httpfs, spatial, json, excel, postgres, etc.), and the Python duckdb package. Use this skill whenever the user is working with DuckDB, whether via CLI, SQL scripts, or Python. Trigger on: duckdb queries, reading local/remote files with duckdb, installing duckdb extensions, duckdb in Python/pandas, exporting data, attaching or querying PostgreSQL from DuckDB, or any question about DuckDB behavior or syntax.

DuckDB

Docs · SQL Reference · Python API

CLI

Can be installed via uv add duckdb-cli>=1.5 --group dev

uv run duckdb                        # in-memory session
uv run duckdb mydb.duckdb            # persistent database
uv run duckdb mydb.duckdb -c "SELECT 42"   # run one statement
uv run duckdb mydb.duckdb < script.sql     # run a file

Useful dot-commands inside the shell:

Command Effect
.tables list tables
.schema tablename show DDL
.mode csv / .mode markdown output format
.output file.csv redirect output to file
.timer on show query timing
.exit quit

CLI docs


Reading Files

DuckDB can query files directly — no import step needed.

-- CSV
SELECT * FROM read_csv('data.csv');
SELECT * FROM read_csv('data.csv', header=true, delim=';');
SELECT * FROM 'data.csv';          -- auto-detected shorthand

-- Parquet
SELECT * FROM read_parquet('data.parquet');
SELECT * FROM read_parquet('s3://bucket/path/*.parquet');  -- requires httpfs

-- JSON
SELECT * FROM read_json('data.json');
SELECT * FROM read_json_auto('data.json');   -- infers schema

-- Multiple files / glob
SELECT * FROM read_csv('folder/*.csv');

CSV · Parquet · JSON

Exporting

COPY (SELECT * FROM tbl) TO 'out.parquet' (FORMAT PARQUET);
COPY (SELECT * FROM tbl) TO 'out.csv' (HEADER, DELIMITER ',');

Extensions

Install once per database, then load each session (or use AUTOLOAD):

INSTALL httpfs;   LOAD httpfs;
INSTALL spatial; LOAD spatial;
INSTALL excel;   LOAD excel;
INSTALL json;    LOAD json;      -- usually auto-loaded
Extension Use case
httpfs Read/write S3, GCS, HTTP URLs
spatial Geospatial types and functions (PostGIS-like)
excel Read .xlsx files via read_xlsx()
json JSON functions (often bundled)
iceberg Apache Iceberg table support
delta Delta Lake support
postgres Attach and query a Postgres DB
-- S3 example (needs httpfs)
SET s3_region='eu-central-1';
SELECT * FROM read_parquet('s3://my-bucket/data/*.parquet');

-- Attach Postgres
ATTACH 'host=localhost dbname=mydb user=me' AS pg (TYPE POSTGRES);
SELECT * FROM pg.public.orders LIMIT 10;

Extensions list


PostgreSQL Extension

Lets DuckDB read and write directly from/to a live PostgreSQL instance. Auto-loaded on first use.

ATTACH 'dbname=mydb user=postgres host=127.0.0.1' AS pg (TYPE postgres);
SELECT * FROM pg.public.orders LIMIT 10;
DETACH pg;

See references/postgres.md for full details: connecting, reading/writing, COPY, transactions, secrets, connection pool, and all functions.


Python

pip install duckdb
import duckdb

# In-memory (default)
with duckdb.connect() as con:
    df = con.execute("SELECT * FROM read_csv('data.csv')").df()

# Persistent
with duckdb.connect("mydb.duckdb") as con:
    # Query → DataFrame
    df = con.execute("SELECT * FROM read_csv('data.csv')").df()

    # Register a DataFrame as a table
    import pandas as pd
    df = pd.read_csv("data.csv")
    con.register("df_view", df)
    result = con.execute("SELECT count(*) FROM df_view").fetchall()

    # Parameterized queries
    con.execute("SELECT * FROM tbl WHERE id = ?", [42])

# Query a DataFrame directly (no connection needed)
result = duckdb.sql("SELECT * FROM df WHERE value > 10").df()

Python API docs


Useful SQL Patterns

-- Describe a file schema without loading it
DESCRIBE SELECT * FROM read_parquet('data.parquet');

-- Summarize a table
SUMMARIZE my_table;

-- Create table from file
CREATE TABLE orders AS SELECT * FROM read_csv('orders.csv');

-- Window functions
SELECT *, row_number() OVER (PARTITION BY dept ORDER BY salary DESC) AS rn
FROM employees;

-- PIVOT
PIVOT sales ON region USING sum(amount);

-- Unnest arrays
SELECT unnest([1, 2, 3]) AS val;

SQL functions · Window functions

Install via CLI
npx skills add https://github.com/bmsuisse/skills --skill duckdb
Repository Details
star Stars 2
call_split Forks 0
navigation Branch main
article Path SKILL.md
More from Creator