materialized-view

star 13

Create and manage materialized views using DataFusion. Persist SQL query results as Parquet files for fast repeated access. Track source dependencies and refresh when data changes. Powered by datafusion-cli's COPY TO.

datafusion-contrib By datafusion-contrib schedule Updated 3/21/2026

name: materialized-view description: > Create and manage materialized views using DataFusion. Persist SQL query results as Parquet files for fast repeated access. Track source dependencies and refresh when data changes. Powered by datafusion-cli's COPY TO. argument-hint: <"create" SQL|"refresh" name|"status"|"list"> allowed-tools: Bash

You are helping the user create and manage materialized views using Apache DataFusion.

Input: $@

A materialized view in this context is a SQL query whose results are persisted as a Parquet file, registered as an external table, and tracked for refresh. This is a lightweight, file-based approach — no database server needed.

Step 1 — Check datafusion-cli is installed

command -v datafusion-cli

If not found, delegate to /datafusion-skills:install-datafusion and then continue.

Step 2 — Resolve state directory

Look for an existing state file:

STATE_DIR=""
test -f .datafusion-skills/state.sql && STATE_DIR=".datafusion-skills"
PROJECT_ROOT="$(git rev-parse --show-toplevel 2>/dev/null || echo "$PWD")"
PROJECT_ID="$(echo "$PROJECT_ROOT" | tr '/' '-')"
test -f "$HOME/.datafusion-skills/$PROJECT_ID/state.sql" && STATE_DIR="$HOME/.datafusion-skills/$PROJECT_ID"

If no state directory exists, create one. Ask the user:

Where would you like to store DataFusion session state?

  1. In the project directory (.datafusion-skills/) — colocated with the project
  2. In your home directory (~/.datafusion-skills/<project-id>/) — keeps the repo clean

Create the chosen directory and initialize state.sql if it doesn't exist:

mkdir -p "$STATE_DIR"
touch "$STATE_DIR/state.sql"

Also create the materialized views directory and manifest:

mkdir -p "$STATE_DIR/views"
test -f "$STATE_DIR/views/manifest.json" || echo '{"views":{}}' > "$STATE_DIR/views/manifest.json"

Step 3 — Parse the command

Determine what the user wants:

  • create <SQL or natural language> → create a new materialized view
  • refresh <name> → refresh an existing materialized view
  • status → show status of all materialized views
  • list → list all materialized views
  • drop <name> → remove a materialized view

If the input is just SQL or a natural language description, treat it as create.


Command: CREATE

3a — Generate SQL if needed

If the input is natural language, generate SQL. Use the session state to understand available tables:

datafusion-cli --file "$STATE_DIR/state.sql" -c "SHOW TABLES;" 2>/dev/null

3b — Derive a view name

Generate a descriptive snake_case name from the SQL (e.g., daily_trades_by_symbol). Ask the user to confirm or rename.

3c — Validate the query

Test that the SQL runs successfully:

datafusion-cli --file "$STATE_DIR/state.sql" -c "
$SQL LIMIT 1;
"

If it fails, diagnose the error and fix the SQL.

3d — Materialize as Parquet

datafusion-cli --file "$STATE_DIR/state.sql" -c "
COPY ($SQL) TO '$STATE_DIR/views/<view_name>.parquet' STORED AS PARQUET;
"

3e — Register as external table

Append to the state file so the view is available in future sessions:

cat >> "$STATE_DIR/state.sql" <<SQL
-- Materialized view: <view_name>
-- Source SQL: $SQL
CREATE EXTERNAL TABLE IF NOT EXISTS <view_name> STORED AS PARQUET LOCATION '$STATE_DIR/views/<view_name>.parquet';
SQL

3f — Update the manifest

Update $STATE_DIR/views/manifest.json to track the view:

python3 -c "
import json, datetime
manifest_path = '$STATE_DIR/views/manifest.json'
with open(manifest_path) as f:
    manifest = json.load(f)
manifest['views']['<view_name>'] = {
    'sql': '''$SQL''',
    'parquet_path': '$STATE_DIR/views/<view_name>.parquet',
    'created_at': datetime.datetime.now().isoformat(),
    'last_refreshed': datetime.datetime.now().isoformat(),
    'source_tables': []  # extracted from SQL
}
with open(manifest_path, 'w') as f:
    json.dump(manifest, f, indent=2)
"

3g — Verify

datafusion-cli --file "$STATE_DIR/state.sql" -c "
SELECT COUNT(*) AS row_count FROM <view_name>;
DESCRIBE <view_name>;
"

Report success with row count and schema.


Command: REFRESH

Refresh a specific view

Read the view's SQL from the manifest:

python3 -c "
import json
with open('$STATE_DIR/views/manifest.json') as f:
    manifest = json.load(f)
view = manifest['views'].get('<view_name>')
if view:
    print(view['sql'])
else:
    print('ERROR: view not found')
"

Then re-materialize:

datafusion-cli --file "$STATE_DIR/state.sql" -c "
COPY ($SQL) TO '$STATE_DIR/views/<view_name>.parquet' STORED AS PARQUET;
"

Update the last_refreshed timestamp in the manifest.

Report the new row count.


Command: STATUS

Show the status of all materialized views:

python3 -c "
import json, os, datetime
with open('$STATE_DIR/views/manifest.json') as f:
    manifest = json.load(f)
for name, view in manifest['views'].items():
    parquet = view['parquet_path']
    size = os.path.getsize(parquet) if os.path.exists(parquet) else 0
    size_mb = size / (1024 * 1024)
    print(f\"{name}:\")
    print(f\"  SQL: {view['sql'][:80]}...\")
    print(f\"  Last refreshed: {view['last_refreshed']}\")
    print(f\"  File size: {size_mb:.2f} MB\")
    print(f\"  Path: {parquet}\")
    print()
"

Also verify each view is queryable:

datafusion-cli --file "$STATE_DIR/state.sql" -c "SELECT COUNT(*) AS row_count FROM <view_name>;"

Command: LIST

python3 -c "
import json
with open('$STATE_DIR/views/manifest.json') as f:
    manifest = json.load(f)
for name in sorted(manifest['views']):
    view = manifest['views'][name]
    print(f\"  {name} — refreshed {view['last_refreshed'][:10]}\")
"

Command: DROP

Remove a materialized view:

  1. Delete the Parquet file
  2. Remove from manifest.json
  3. Remove the CREATE EXTERNAL TABLE line from state.sql
rm -f "$STATE_DIR/views/<view_name>.parquet"
python3 -c "
import json
with open('$STATE_DIR/views/manifest.json') as f:
    manifest = json.load(f)
manifest['views'].pop('<view_name>', None)
with open('$STATE_DIR/views/manifest.json', 'w') as f:
    json.dump(manifest, f, indent=2)
"

Remove the corresponding lines from state.sql:

sed -i '' '/-- Materialized view: <view_name>/,/^$/d' "$STATE_DIR/state.sql"

Error handling

  • Query fails → check if source tables exist, suggest /datafusion-skills:create-table to register them
  • Disk space → warn if the Parquet file is > 1GB
  • Manifest corrupt → regenerate from existing Parquet files in the views directory

Suggest next steps

After creating a view:

Your materialized view <view_name> is now available as a table. Query it with /datafusion-skills:query SELECT * FROM <view_name> LIMIT 10. To refresh it when source data changes, run /datafusion-skills:materialized-view refresh <view_name>.

Install via CLI
npx skills add https://github.com/datafusion-contrib/datafusion-skills --skill materialized-view
Repository Details
star Stars 13
call_split Forks 0
navigation Branch main
article Path SKILL.md
More from Creator
datafusion-contrib
datafusion-contrib Explore all skills →