name: migrate-dataset description: Migrate a legacy OWID dataset (no catalogPath) into the ETL pipeline. Use when user wants to migrate, backport, or convert a legacy dataset by ID, or mentions datasets without catalogPath. metadata: internal: true
Migrate Legacy Dataset
Migrate a legacy OWID dataset from MySQL into the ETL pipeline using etl b migrate. This gives the dataset a proper catalogPath and moves it under ETL management.
Inputs
Required:
dataset_id- The MySQL dataset ID (integer)namespace- Target namespace (e.g.,plastic_pollution,trade,food)
Optional:
version- Dataset version date (default:latest). Prefer using the source's retrieval date (YYYY-MM-DD format).short_name- Override the auto-generated short name
Workflow
1. Look up the dataset
Query MySQL to understand the dataset before migrating:
make query SQL="SELECT id, name, isPrivate FROM datasets WHERE id = <dataset_id>"
make query SQL="SELECT id, name, catalogPath FROM variables WHERE datasetId = <dataset_id>"
make query SQL="SELECT id, name, description FROM sources WHERE datasetId = <dataset_id>"
Present to the user:
- Dataset name and privacy status
- Number of variables and whether they have
catalogPath(should be NULL for legacy datasets) - Source info — use the source's
retrievedDateor publication date to suggest aversion
2. Confirm namespace and version with user
Get the list of existing namespaces to pick from:
ls etl/steps/data/garden/
Based on the dataset topic and existing namespaces, suggest an appropriate:
- namespace: pick an existing namespace if one fits, otherwise propose a new one
- version: from the source's retrieved date in YYYY-MM-DD format
Ask the user to confirm or adjust before proceeding.
3. Run the migration
.venv/bin/etl b migrate --dataset-id <dataset_id> --namespace <namespace> --version <version> --run
The --run flag handles the full pipeline automatically:
- Backports data from MySQL to S3 snapshots
- Generates ETL step files (snapshot, garden, grapher)
- Adds entries to
dag/migrated.yml - Runs the snapshot step
- Runs ETL + grapher step (creates new dataset in MySQL with
catalogPath) - Matches indicators between old and new datasets (perfect match, non-interactive)
- Upgrades charts to point at new variables
Important notes
- The indicator upgrader DB table is cumulative — mappings from previous migrations persist. This is harmless but means the upgrade step may show mappings from earlier migrations.
- Only variables used in charts need to match. Unused variables with no chart references can safely be unmatched.
- If variable names changed significantly (human-readable -> underscored), perfect matching at 100% threshold may miss some. Check if unmatched variables appear in any charts.
- Legacy datasets that are private (
isPrivate=1) will use private S3 buckets automatically.
Troubleshooting
- "No such dataset": Verify the dataset ID exists with
make query SQL="SELECT * FROM datasets WHERE id = <id>" - Indicator match finds 0 matches: Variable names likely changed. Consider running
etl indicator-upgrade matchmanually with a lower--auto-threshold(e.g., 80). - Grapher step fails: Check that the staging DB is accessible and
GRAPHER_USER_IDis set.
Example
User: migrate dataset 5291
Agent: [looks up dataset] "Plastic ocean pollution (Meijer et al. 2021)" - 8 variables, 9 charts, source retrieved 2021-04-30
Suggested: namespace=plastic_pollution, version=2021-04-30
User: looks good
Agent: .venv/bin/etl b migrate --dataset-id 5291 --namespace plastic_pollution --version 2021-04-30 --run
-> Dataset 5291 -> 7360, 6/8 variables matched, 8 charts upgraded