name: debug-transformation description: Debug dlthub transformation failures. Use when a transformation fails on a different destination than it was developed on, SQL dialect errors occur after deployment, pipeline recovery is needed after a failed run, or columns are silently dropped from output.
Debug transformation
Diagnose and fix dlthub transformation failures. Two main failure classes: SQL dialect incompatibility (transformation works on dev destination, fails on production) and pipeline state errors (stale packages, schema drift, failed jobs).
When to use this skill
- Transformation works on DuckDB locally but fails on BigQuery, Snowflake, or Postgres after deployment
- Pipeline fails with SQL syntax errors or "unsupported function" messages
- You want to verify SQL portability before deploying (proactive check)
- Pipeline is stuck in a failed/retry loop
- Columns are missing from output after the first run (NULL-only or computed/derived columns)
- You want to inspect what the transformation actually produced after a successful run
1. SQL dialect compatibility
1a. Static SQL compatibility checks
The toolkit includes a bundled dialect checker that reads your transformation file directly — no pipeline connection needed, and no script to write. It parses each @dlt.hub.transformation function's SQL in the dev dialect and transpiles it to the target dialect, catching portability issues before deployment.
Tell the user: "I'm going to run a static SQL compatibility check on your transformation file to catch any SQL dialect issues before you move to prod."
You must pass the dev and prod destination types explicitly via --read and --write. If you already have this from context (e.g. from create-transformation), use it directly. If not, call get_local_pipeline_state via MCP to retrieve the destination type for the pipeline, then ask the user for the prod destination if unknown.
Run the checker yourself and read the output:
uv run python ${CLAUDE_PLUGIN_ROOT}/tools/check_dialect.py transformations/<dataset>_to_cdm.py --read <dev_dialect> --write <prod_dialect>
Expected output looks like:
Dialects: duckdb -> bigquery
Checking 5 transformation(s) from transformations/hubspot_to_cdm.py
[dim_company] OK
[dim_person] WARN: double-quoted identifier "email"; verify destination quoting in bigquery
[fact_activity] ERROR: transpile duckdb->bigquery failed: ...
...
SUMMARY
warnings: 1
errors: 1
Walk the user through the results:
- OK — safe to deploy as-is
- WARN — portability risk worth inspecting, not a guaranteed failure; common cause is double-quoted identifiers that BigQuery/Snowflake handle differently
- ERROR — the SQL will likely fail on the target destination; rewrite to ANSI SQL (see section 1b)
- skipped — SQL is dynamically constructed and can't be statically analysed; inspect those functions manually
This check catches the most common issues, but does not replace inspecting Relation.to_sql() output if deployment still fails after fixes:
If either dlt destination is not covered by SQLGlot, the script stops and prints the available SQLGlot dialects. In that case, inspect dlt's actual Relation.to_sql() output or run a target-destination test pipeline instead. SQLGlot supports 31+ dialects — see https://sqlglot.com/sqlglot.html for dialect names.
1b. Common dialect-specific patterns to fix
Rewrite these to ANSI SQL so SQLGlot can transpile them to any destination:
| Dialect-specific pattern | Portable ANSI alternative |
|---|---|
x::TEXT, x::INT (cast shorthand) |
CAST(x AS VARCHAR), CAST(x AS INTEGER) |
IFNULL(a, b) |
COALESCE(a, b) |
ILIKE |
LOWER(x) LIKE LOWER(y) |
INT64 / STRING / FLOAT64 as type names |
BIGINT / VARCHAR / DOUBLE |
EPOCH_MS(), STRFTIME(), LIST_AGG(), etc. |
No ANSI equivalent — use query_dialect (see below) |
QUALIFY clause |
Wrap in subquery with WHERE on the window result |
1c. When dialect-specific SQL is unavoidable
If a transformation genuinely requires a dialect-specific function with no ANSI equivalent, declare the source dialect with query_dialect so SQLGlot knows how to transpile:
@dlt.hub.transformation
def my_transform(dataset: dlt.Dataset):
yield dataset(
"SELECT STRFTIME(created_at, '%Y-%m') AS month FROM events",
query_dialect="duckdb", # tells dlt this SQL is DuckDB dialect
)
If SQLGlot raises UnsupportedError or logs warnings at unsupported_level, the construct has no mapping to the target dialect and must be rewritten to an ANSI equivalent or handled in application code.
References:
- SQLGlot supported dialects: https://sqlglot.com/sqlglot.html
- SQLGlot unsupported errors: https://sqlglot.com/sqlglot.html#unsupported-errors
- dlt
Relationsource: https://github.com/dlt-hub/dlt/blob/master/dlt/dataset/relation.py#L276
2. Pipeline failure recovery
Use this escalation order. Do not skip steps.
Step 1: Inspect failures
uv run dlthub local pipeline failed-jobs <pipeline_name>
uv run dlthub local pipeline trace <pipeline_name>
Read the error messages before taking any recovery action. Most failures are fixable without touching destination state.
Step 2: Clear stale packages
If a prior run failed mid-load and left pending packages that keep retrying old (broken) SQL:
uv run dlthub local pipeline drop-pending-packages <pipeline_name>
Re-run after clearing. If the underlying SQL was fixed, this is often all that is needed.
Step 3: Reconcile local state
If local pipeline state has drifted from the destination (e.g. after a partial load or schema change):
uv run dlthub local pipeline sync <pipeline_name>
If no recoverable destination state exists, sync may not resolve partial retries — use drop-pending-packages first.
Step 4: Selective drop (last resort)
DuckDB only: if the failure is
"Parser Error: Adding columns with constraints not yet supported",dlthub local pipeline dropwill not fix it — go to section 3's DuckDB workaround instead.
Only if the steps above do not resolve the failure and incorrect schema or tables were already loaded to the destination:
uv run dlthub local pipeline drop <pipeline_name> <resource> # drop a specific resource
uv run dlthub local pipeline drop <pipeline_name> --drop-all # only with explicit user confirmation
Safety rules before dropping:
- Prefer dropping specific resources over
--drop-all - Confirm pipeline name, destination, dataset, and which resources will be dropped before executing
dropremoves destination tables and resets matching local state — this forces a full reload and may remove good data alongside bad- If uncertain which resources are safe to drop, stop and ask the user before executing
- After drop: re-run transformations and validate schema/tables before further loads
References:
- dlthub CLI reference: https://dlthub.com/docs/hub/command-line-interface
3. Missing columns and schema issues
dlthub silently drops columns it cannot type-infer — no error, no warning. Two root causes:
3a. NULL-only columns
When a column is NULL-only on the first run and no columns= hint was provided, dlthub strips the column from the schema. Subsequent runs write data but the column is absent.
3b. Computed / derived columns
When a transformation uses derived expressions — md5(), strftime(), TRY_CAST, CASE WHEN ... END, aggregates with aliases, or function chains — dlthub cannot infer the output type from the SQL alone. Without a columns= hint the column is silently dropped.
Diagnose: compare expected vs actual columns using the MCP get_table_schema tool, or inspect via:
uv run dlthub local pipeline show <pipeline_name>
Identify which computed columns are absent from the output schema. Each missing column needs an explicit columns= hint.
Fix: add columns= hints for every affected column:
@dlt.hub.transformation(
write_disposition="replace",
columns={
"company_sk": {"data_type": "text", "nullable": False},
"joined_at": {"data_type": "timestamp", "nullable": True},
"email_hash": {"data_type": "text", "nullable": True}, # md5() result
"month_bucket": {"data_type": "text", "nullable": True}, # strftime() result
"event_count": {"data_type": "bigint", "nullable": True}, # COUNT() alias
},
)
def dim_company(dataset: dlt.Dataset):
...
data_type values must match the key type contract established during create-transformation (consistently text or bigint for surrogate keys).
Apply columns= hints for:
- Any column from a
LEFT JOIN(lookup may return NULL) - Any cast from string to typed value where the source may be empty
- Any column that was NULL-only in a prior run
- Any computed or derived column:
md5(),strftime(),TRY_CAST,CASE WHEN, aggregate aliases, function chains
Known DuckDB failure: "Parser Error: Adding columns with constraints not yet supported"
This error surfaces when re-running a transformation with new or modified columns= hints against DuckDB — DuckDB's ALTER TABLE cannot add constrained columns. The workaround is to drop the dataset and re-run from scratch:
# DuckDB only — this drops all tables in the dataset
duckdb <path_to_db_file> "DROP SCHEMA <dataset_name> CASCADE;"
Then re-run the transformation script. This is DuckDB-specific behavior and will not occur on cloud destinations (BigQuery, Snowflake, Postgres). Do not use dlthub local pipeline drop for this error — it does not clear the DuckDB schema and will not resolve the constraint conflict.
Reference: https://dlthub.com/docs/hub/transformations.md
4. Validate transformation output
After fixing an issue and re-running, validate the output using the same checks as the happy path — see create-transformation Step 9.