name: duckdb-parquet-lab-workflow
description: Use DuckDB to query Parquet files, inspect metadata, join tables, and convert results to pandas for analysis; commonly precedes ydata-eda-profiling for EDA on extracted tables.
DuckDB Parquet Lab Workflow
Purpose
Standardize the pattern of loading Parquet files into DuckDB, inspecting schema, running SQL joins, and converting results to pandas DataFrames.
Usage
- "load Parquet with DuckDB and join tables"
- "describe DuckDB table schema"
- "convert DuckDB query to pandas"
Instructions
- Read Parquet data with `duckdb.query` or `duckdb.sql` using SQL strings.
- Inspect schema using `DESCRIBE SELECT * FROM <table>` and display with `.show()`.
- Use explicit joins with clear `LEFT` or `RIGHT` semantics to preserve row counts.
- Convert results to pandas with `.to_df()` for downstream modeling.
- Use `./templates/duckdb_snippets.md` for the standard SQL patterns.