name: create-table
description: >
  Register a data file as a persistent external table in the DataFusion session.
  Supports Parquet, CSV, JSON, Arrow IPC, and Avro files. Explores the schema
  and writes to the session state file for reuse across skills.
argument-hint: [--name table_name] [--format csv|parquet|json|arrow|avro]
allowed-tools: Bash
You are helping the user register a data file as a persistent table in their DataFusion session.
File path given: $0
Additional arguments: ${1:-}
Follow these steps in order.
Step 1 — Resolve the file path
If $0 is a relative local path, resolve it to an absolute path (leave absolute paths and remote URIs such as s3:// or gs:// unchanged):
RESOLVED_PATH="$(cd "$(dirname "$0")" 2>/dev/null && pwd)/$(basename "$0")"
Check the file exists (for local files):
test -f "$RESOLVED_PATH" || test -d "$RESOLVED_PATH"
- Exists → continue
- Not found → if it looks like an S3/GCS URI, continue anyway. Otherwise ask the user to check the path.
For directories (partitioned data), use the directory path as-is.
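The resolution rules above can be sketched as a single shell function. This is a minimal sketch, not part of the skill itself: the name resolve_path and the list of URI prefixes are assumptions made for illustration.

```bash
# Hypothetical helper (not part of the skill): print an absolute path for a
# local file or directory, and pass remote URIs through unchanged.
resolve_path() {
  rp_input="$1"
  case "$rp_input" in
    s3://*|gs://*|gcs://*|http://*|https://*)
      # Remote URI (prefix list is an assumption): nothing to resolve locally.
      printf '%s\n' "$rp_input"
      return 0
      ;;
  esac
  if [ -d "$rp_input" ]; then
    # Directory (partitioned data): resolve the directory itself.
    (cd "$rp_input" && pwd)
  elif [ -e "$rp_input" ]; then
    # File: resolve the parent directory, then re-attach the file name.
    rp_dir="$(cd "$(dirname "$rp_input")" && pwd)" || return 1
    printf '%s/%s\n' "${rp_dir%/}" "$(basename "$rp_input")"
  else
    # Not found locally: the caller should ask the user to check the path.
    return 1
  fi
}
```

A caller could write RESOLVED_PATH="$(resolve_path "$0")" and treat a non-zero exit status as "not found".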
Step 2 — Check datafusion-cli is installed
command -v datafusion-cli
If not found, delegate to /datafusion-skills:install-datafusion.
Step 3 — Detect format
If --format was specified, use that. Otherwise detect from extension:
| Extension | Format |
|---|---|
| .parquet, .pq | PARQUET |
| .csv, .tsv, .txt | CSV |
| .json, .jsonl, .ndjson | JSON |
| .arrow, .ipc, .feather | ARROW |
| .avro | AVRO |
| directory | PARQUET (default for partitioned data) |
If the extension is unknown, try Parquet first, then CSV.
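One way to express the extension table is a case statement. The sketch below assumes a hypothetical function name, detect_format, and additionally assumes extensions should be matched case-insensitively, which the table does not state.

```bash
# Hypothetical helper: map a path to the STORED AS keyword from the table above.
detect_format() {
  df_path="$1"
  if [ -d "$df_path" ]; then
    # Directories default to Parquet (partitioned data).
    echo PARQUET
    return 0
  fi
  # Assumption: lowercase the name so .CSV and .csv are treated alike.
  df_name="$(basename "$df_path" | tr '[:upper:]' '[:lower:]')"
  case "$df_name" in
    *.parquet|*.pq)          echo PARQUET ;;
    *.csv|*.tsv|*.txt)       echo CSV ;;
    *.json|*.jsonl|*.ndjson) echo JSON ;;
    *.arrow|*.ipc|*.feather) echo ARROW ;;
    *.avro)                  echo AVRO ;;
    *)
      # Unknown extension: the caller falls back to Parquet, then CSV.
      echo UNKNOWN
      return 1
      ;;
  esac
}
```

An explicit --format value should still take precedence; this function only covers the fallback path.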
Step 4 — Derive table name
If --name was specified, use that. Otherwise derive from the filename:
- Remove extension
- Replace hyphens and spaces with underscores
- Lowercase
- Remove non-alphanumeric characters (except underscores)
Example: My-Data File.parquet → my_data_file
Confirm the name with the user.
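The derivation rules can be sketched in shell as follows. The function name derive_table_name is hypothetical, and the sketch assumes only the last extension is removed (so data.tar.gz would become data_tar after the remaining rules are applied).

```bash
# Hypothetical helper: derive a table name from a file name.
derive_table_name() {
  dt_base="$(basename "$1")"
  # Remove the last extension, if any (assumption: only the last one).
  case "$dt_base" in
    ?*.*) dt_base="${dt_base%.*}" ;;
  esac
  # Lowercase, turn hyphens and spaces into underscores, then drop
  # everything that is not a lowercase letter, digit, or underscore.
  printf '%s\n' "$dt_base" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[- ]/_/g' -e 's/[^a-z0-9_]//g'
}

derive_table_name "My-Data File.parquet"   # prints my_data_file
```

The result is only a suggestion; the confirmation step with the user still applies.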
Step 5 — Resolve state directory
STATE_DIR=""
test -f .datafusion-skills/state.sql && STATE_DIR=".datafusion-skills"
PROJECT_ROOT="$(git rev-parse --show-toplevel 2>/dev/null || echo "$PWD")"
PROJECT_ID="$(echo "$PROJECT_ROOT" | tr '/' '-')"
test -f "$HOME/.datafusion-skills/$PROJECT_ID/state.sql" && STATE_DIR="$HOME/.datafusion-skills/$PROJECT_ID"
If no state directory exists, ask the user where to store state (same as other skills), set STATE_DIR to the chosen location, and create it:
- In the project directory (.datafusion-skills/)
- In your home directory (~/.datafusion-skills/<project-id>/)
mkdir -p "$STATE_DIR"
touch "$STATE_DIR/state.sql"
Step 6 — Create the external table and explore
Build the CREATE EXTERNAL TABLE statement:
For Parquet:
CREATE EXTERNAL TABLE IF NOT EXISTS <table_name> STORED AS PARQUET LOCATION '<RESOLVED_PATH>';
For CSV:
CREATE EXTERNAL TABLE IF NOT EXISTS <table_name> STORED AS CSV LOCATION '<RESOLVED_PATH>' OPTIONS ('has_header' 'true');
For JSON:
CREATE EXTERNAL TABLE IF NOT EXISTS <table_name> STORED AS JSON LOCATION '<RESOLVED_PATH>';
For Arrow IPC:
CREATE EXTERNAL TABLE IF NOT EXISTS <table_name> STORED AS ARROW LOCATION '<RESOLVED_PATH>';
For Avro:
CREATE EXTERNAL TABLE IF NOT EXISTS <table_name> STORED AS AVRO LOCATION '<RESOLVED_PATH>';
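The five templates differ only in the format keyword and the CSV options, so they can be generated by one function. This is a sketch under assumptions: build_create_statement is a hypothetical name, and doubling single quotes in the path follows standard SQL string-literal escaping, which the templates above do not cover.

```bash
# Hypothetical helper: print the CREATE EXTERNAL TABLE statement for a
# table name, a format keyword (PARQUET, CSV, JSON, ARROW, AVRO), and a path.
build_create_statement() {
  bc_table="$1"
  bc_format="$2"
  bc_location="$3"
  # Assumption: double any single quote so the path stays a valid SQL literal.
  bc_escaped="$(printf '%s' "$bc_location" | sed "s/'/''/g")"
  case "$bc_format" in
    PARQUET|JSON|ARROW|AVRO) bc_options="" ;;
    CSV) bc_options=" OPTIONS ('has_header' 'true')" ;;
    *)
      echo "unsupported format: $bc_format" >&2
      return 1
      ;;
  esac
  printf "CREATE EXTERNAL TABLE IF NOT EXISTS %s STORED AS %s LOCATION '%s'%s;\n" \
    "$bc_table" "$bc_format" "$bc_escaped" "$bc_options"
}
```

For example, build_create_statement orders CSV /data/orders.csv prints the CSV template with the header option filled in.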
Test it:
datafusion-cli --file "$STATE_DIR/state.sql" -c "
<CREATE_STATEMENT>
DESCRIBE <table_name>;
SELECT COUNT(*) AS row_count FROM <table_name>;
SELECT * FROM <table_name> LIMIT 5;
"
Step 7 — Persist to state file
Check if this table is already in the state file:
grep -qF -e "-- Table: <table_name> (" "$STATE_DIR/state.sql" 2>/dev/null
If not present, append:
cat >> "$STATE_DIR/state.sql" <<'SQL'
-- Table: <table_name> (<FORMAT> from <RESOLVED_PATH>)
<CREATE_STATEMENT>
SQL
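The check and the append can be combined into one idempotent function. This is a minimal sketch with a hypothetical name, persist_table. It assumes the "-- Table: <table_name> (" comment written above is used as the marker, so a table named orders is not confused with orders_2024.

```bash
# Hypothetical helper: append a table definition to the state file only if
# its marker comment is not already present.
persist_table() {
  pt_state_file="$1"
  pt_table="$2"
  pt_format="$3"
  pt_location="$4"
  pt_statement="$5"
  pt_marker="-- Table: $pt_table ("
  # -F matches the marker literally; -e keeps the leading "--" from being
  # parsed as an option.
  if grep -qF -e "$pt_marker" "$pt_state_file" 2>/dev/null; then
    return 0
  fi
  {
    printf '%s%s from %s)\n' "$pt_marker" "$pt_format" "$pt_location"
    printf '%s\n' "$pt_statement"
  } >> "$pt_state_file"
}
```

Calling the function twice with the same table name leaves a single entry in the state file.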
Step 8 — Report
Summarize:
- Table name: <table_name>
- Format: Parquet/CSV/JSON/Arrow/Avro
- Location: the resolved path
- Columns: list with types
- Row count: total rows
- State file: path to state.sql
This table is now available in all /datafusion-skills:query sessions. Try:
/datafusion-skills:query SELECT * FROM <table_name> LIMIT 10