matlab-use-duckdb

star 604

Generates MATLAB code for DuckDB database operations using Database Toolbox. Use when connecting to DuckDB (in-memory or file-based), querying CSV/Parquet/JSON/Excel files with SQL, creating development databases, preprocessing out-of-memory data, using duckdb(), installing DuckDB extensions, using DuckDB as an analytical engine in MATLAB, converting MATLAB analytics to SQL queries, optimizing data pipelines that use MATLAB file I/O before processing, or replacing file I/O bottlenecks with direct DuckDB reads.

matlab By matlab schedule Updated 6/4/2026

name: matlab-use-duckdb description: "Generates MATLAB code for DuckDB database operations using Database Toolbox. Use when connecting to DuckDB (in-memory or file-based), querying CSV/Parquet/JSON/Excel files with SQL, creating development databases, preprocessing out-of-memory data, using duckdb(), installing DuckDB extensions, using DuckDB as an analytical engine in MATLAB, converting MATLAB analytics to SQL queries, optimizing data pipelines that use MATLAB file I/O before processing, or replacing file I/O bottlenecks with direct DuckDB reads." license: MathWorks BSD-3-Clause metadata: author: MathWorks version: "1.1"

MATLAB Database Toolbox Interface to DuckDB

Use when working with DuckDB databases from MATLAB using Database Toolbox. DuckDB is an embedded analytical database engine that ships with Database Toolbox starting in R2026a. It enables SQL-based analytics on files, out-of-memory data preprocessing, and portable development databases — all without external database server configuration.

When to Use This Skill

  • Connecting to a DuckDB database (in-memory or file-based)
  • Creating a new DuckDB database file for development workflows
  • Querying CSV, Parquet, or JSON files directly with SQL
  • Preprocessing large data that doesn't fit in memory before importing into MATLAB
  • Using DuckDB as an analytical engine for filtering, aggregation, joins, or sorting
  • Installing and using DuckDB extensions
  • Converting existing MATLAB array/table operations into equivalent DuckDB SQL queries
  • Optimizing a data pipeline that reads files into MATLAB memory before processing
  • Replacing MATLAB file I/O bottlenecks (readtable, readmatrix, parquetread, xlsread, csvread) with direct DuckDB file reads
  • User mentions keywords: DuckDB, duckdb, analytical engine, embedded database, parquet, CSV analytics, in-memory database, portable database, development database, out-of-memory preprocessing, optimize data pipeline, convert to SQL, replace readtable, replace parquetread, file bottleneck

When NOT to Use

  • Connecting to MySQL, PostgreSQL, SQLite, or other external databases — use their native interfaces or JDBC/ODBC
  • Data fits in memory and only needs standard MATLAB operations — use readtable/readmatrix directly
  • Object-relational mapping — use ORM (ormread/ormwrite with Mappable classes)
  • MongoDB, Cassandra, or Neo4j — use their dedicated Database Toolbox interfaces

On-Load Protocol

When this skill is loaded into a session where code already exists:

  1. Audit the data pipeline — Identify how data enters the workflow:

    • Is data read via MATLAB I/O (readtable, readmatrix, readcell, readtimetable, parquetread, xlsread, csvread)?
    • Is data then written to DuckDB with sqlwrite before querying?
    • If yes: this is the load-then-query anti-pattern. DuckDB can likely read the source file directly via read_csv, read_parquet, or read_xlsx (excel extension).
  2. Evaluate each data source against the decision framework:

    • Can DuckDB read this file type directly? (CSV, Parquet, JSON, Excel via extension)
    • Can filtering/aggregation be pushed into the SQL read?
    • Is the MATLAB I/O step a performance bottleneck?
  3. Recommend architectural changes — Do not limit review to API correctness. Propose replacing readtable/xlsread + sqlwrite + query chains with fetch(conn, "SELECT ... FROM read_csv/read_parquet/read_xlsx(...)").

The highest-value patterns in this skill are architectural: file-analytics pushdown eliminates entire pipeline stages and can yield 10x+ speedups.

What Is DuckDB and Why Does Database Toolbox Ship It?

DuckDB is an embedded, serverless analytical database engine. Unlike MySQL or PostgreSQL, it requires no server, no configuration, and runs in-process within MATLAB.

Why it ships with Database Toolbox (R2026a+):

  • Zero-config databaseconn = duckdb() gives you a full SQL engine instantly.
  • Analytical engine for files — Query CSV, Parquet, and JSON files directly with SQL without loading them into memory.
  • Out-of-memory preprocessing — Filter, aggregate, join, and sort datasets larger than memory, then bring only results into MATLAB.
  • Portable development databases.duckdb or .db files work on any machine with Database Toolbox. No database setup needed.
  • AI agent advantage — An agent's SQL knowledge directly translates to powerful analytical queries.

DuckDB does NOT replace MATLAB's file I/O (readtable, etc.). It is a performant alternative when data exceeds memory or SQL operations are more natural than MATLAB table operations.

Critical Rules

Connection

  • ALWAYS use duckdb() to connect — not database(), not JDBC, not ODBC.
  • ALWAYS verify with isopen(conn) and close with close(conn).

API Surface

  • All standard functions work: sqlread, fetch, execute, sqlwrite, sqlfind, sqlinnerjoin, sqlouterjoin, commit, rollback.
  • DuckDB does NOT support databasePreparedStatement. Use execute or sqlwrite instead.
  • Use ExcludeDuplicates via databaseImportOptions when reading from database tables (with sqlread). For direct file queries (read_csv/read_parquet via fetch), use SELECT DISTINCT in SQL.

File Queries

  • ALWAYS use fetch (not sqlread) for file queries — they require SQL syntax like SELECT * FROM read_csv('file.csv').
  • ALWAYS use single quotes for file paths inside SQL: read_csv('data.csv').

Decision Framework

Which connection mode should I use?

Goal Connection Why
Analytical queries on files duckdb() No persistence needed; query files directly
Temporary workspace duckdb() Fast, discarded on close
Portable development database duckdb("mydata.duckdb") Creates a .duckdb or .db file; works on any machine
Open existing database duckdb("existing.db") Read/write access to pre-existing .db or .duckdb file
Read-only shared database duckdb("shared.duckdb", ReadOnly=true) Prevents accidental writes

When should I use DuckDB vs. MATLAB file I/O?

Scenario Recommendation
Small data, simple operations readtable / readmatrix
Data exceeds memory, needs filtering/aggregation DuckDB (preprocess in SQL, analyze in MATLAB)
Query across multiple CSV/Parquet files DuckDB with glob patterns
Portable development database DuckDB file-based connection
MATLAB-specific analysis (signal processing, ML) Preprocess in DuckDB, analyze in MATLAB

Common Patterns

Pattern 1: Analytical Engine on Files

conn = duckdb();
result = fetch(conn, "SELECT region, SUM(revenue) as total " + ...
    "FROM read_parquet('sales.parquet') " + ...
    "GROUP BY region ORDER BY total DESC");
close(conn);

Pattern 2: Out-of-Memory Preprocessing

conn = duckdb();
summary = fetch(conn, "SELECT date, AVG(value) as avg_val " + ...
    "FROM read_csv('huge_dataset.csv') " + ...
    "WHERE status = 'valid' " + ...
    "GROUP BY date ORDER BY date");
close(conn);
% summary fits in memory — continue with MATLAB analysis

Pattern 3: Development Database

conn = duckdb("dev.duckdb");
sqlwrite(conn, "experiments", experimentData);
rf = rowfilter("score");
results = sqlread(conn, "experiments", RowFilter=rf.score > 0.8);
close(conn);

Pattern 4: Multi-File Query with Glob

conn = duckdb();
data = fetch(conn, "SELECT * FROM read_parquet('data/year=2024/*.parquet') " + ...
    "WHERE category = 'A'");
close(conn);

Pattern 5: Extensions

conn = duckdb();
execute(conn, "INSTALL httpfs");
execute(conn, "LOAD httpfs");
data = fetch(conn, "SELECT * FROM read_parquet('https://example.com/data.parquet') LIMIT 1000");
close(conn);

For Excel files, use the excel extension with read_xlsx (NOT st_read from spatial):

conn = duckdb();
execute(conn, "INSTALL excel");
execute(conn, "LOAD excel");
data = fetch(conn, "SELECT * FROM read_xlsx('report.xlsx')");
close(conn);

Pattern 6: Persistent Import from File

conn = duckdb("analytics.duckdb");
execute(conn, "CREATE TABLE events AS SELECT * FROM read_parquet('raw_events.parquet')");
% Future sessions: query by table name (no file re-read)
result = sqlread(conn, "events");
close(conn);

For detailed examples, see:

  • File analytics and out-of-memory preprocessing: reference/cards/file-analytics.md
  • Development database workflows: reference/cards/development-database.md
  • DuckDB extensions: reference/cards/extensions.md

Common Mistakes

% WRONG — using database() or JDBC to connect to DuckDB
conn = database("", "", "", "org.duckdb.DuckDBDriver", "jdbc:duckdb:");
% CORRECT
conn = duckdb();

% WRONG — using sqlread for file queries (expects a table name)
data = sqlread(conn, "read_csv('data.csv')");
% CORRECT — use fetch with SQL
data = fetch(conn, "SELECT * FROM read_csv('data.csv')");

% WRONG — double quotes for file paths in SQL
data = fetch(conn, "SELECT * FROM read_csv(""data.csv"")");
% CORRECT — single quotes
data = fetch(conn, "SELECT * FROM read_csv('data.csv')");

% WRONG — loading huge file into MATLAB then filtering
data = readtable("huge.parquet"); filtered = data(data.val > 100, :);
% CORRECT — let DuckDB filter on disk
conn = duckdb();
filtered = fetch(conn, "SELECT * FROM read_parquet('huge.parquet') WHERE val > 100");
close(conn);

% WRONG — using databasePreparedStatement (not supported)
pstmt = databasePreparedStatement(conn, "INSERT INTO t VALUES(?, ?)");
% CORRECT — use sqlwrite
sqlwrite(conn, "t", data);

% WRONG — using st_read from spatial extension for Excel files
data = fetch(conn, "SELECT * FROM st_read('file.xlsx')");
% CORRECT — use excel extension with read_xlsx
execute(conn, "INSTALL excel");
execute(conn, "LOAD excel");
data = fetch(conn, "SELECT * FROM read_xlsx('file.xlsx')");

% WRONG — MATLAB table column named with SQL reserved keyword
data = table(1, "A", 'VariableNames', {'id','group'});
sqlwrite(conn, "t", data);  % Parser error: "group" is reserved
% CORRECT — rename column before writing
data = renamevars(data, 'group', 'experiment_group');
sqlwrite(conn, "t", data);

Checklist

Before finalizing DuckDB code, verify:

  • Pipeline check: No unnecessary MATLAB file I/O (e.g., readtable, xlsread, parquetread) + sqlwrite chains when DuckDB can read files directly
  • Connected with duckdb() or duckdb("file.duckdb") / duckdb("file.db") — not database() or JDBC
  • isopen(conn) checked after connection
  • File queries use fetch with SQL (not sqlread)
  • File paths in SQL use single quotes
  • Out-of-memory data preprocessed in DuckDB before importing to MATLAB
  • No databasePreparedStatement usage (not supported)
  • close(conn) called when done

Troubleshooting

Issue: duckdb function not found

  • Solution: Requires R2026a+ with Database Toolbox. Check with ver('database').

Issue: sqlread errors with file query

  • Solution: Use fetch(conn, "SELECT * FROM read_csv('file.csv')")sqlread expects table names only.

Issue: Permission denied on ReadOnly connection

  • Solution: Reconnect without ReadOnly: conn = duckdb("file.duckdb").

Issue: Out of memory when querying large file

  • Solution: Add WHERE, GROUP BY, LIMIT, or aggregation in SQL to reduce result size before it enters MATLAB.

Issue: File path not found in read_csv/read_parquet

  • Solution: Paths are relative to pwd. Use absolute paths or verify with dir('file.csv').

Issue: Extension install fails

Issue: sqlwrite fails with "syntax error at or near" a column name

  • Solution: Column name is a SQL reserved keyword (group, order, select, table, etc.). Rename with renamevars(data, 'group', 'experiment_group') before writing.

Issue: Type mismatch on sqlwrite

  • Solution: DuckDB supports rich types (ARRAY, LIST, STRUCT, MAP). Use sqlfind(conn, "tableName") to check column types.

Copyright 2026 The MathWorks, Inc.


Install via CLI
npx skills add https://github.com/matlab/matlab-agentic-toolkit --skill matlab-use-duckdb
Repository Details
star Stars 604
call_split Forks 76
navigation Branch main
article Path SKILL.md
More from Creator