name: update-dataset description: End-to-end dataset update workflow with PR creation, snapshot, meadow, garden, and grapher steps. Use when user wants to update a dataset, refresh data, run ETL update, or mentions updating dataset versions. metadata: internal: true
Update Dataset (PR → snapshot → steps → grapher)
Use this skill to run a complete dataset update with Claude Code subagents, keep a live progress checklist, and pause for user approval only when something needs attention.
Paired skill — keep in sync.
/review-data-pris the reviewer-side counterpart of this skill: it verifies the outcomes of the author-side steps defined here. Whenever you add, remove, or change a workflow step in this file, check whetherreview-data-pr/SKILL.mdneeds a matching reviewer-side check (and add it in the same commit if so). The reverse also holds — see the mirror note there.
Inputs
<namespace>/<old_version>/<name>- Get
<new_version>as today's date by runningdate -u +"%Y-%m-%d"
Optional trailing args:
- branch: The working branch name (defaults to current branch)
Assumptions:
- All artifacts are written to
workbench/<short_name>/. - Persist progress to
workbench/<short_name>/progress.mdand update it after each step. - Persist reusable update facts to
workbench/<short_name>/update-context.ymlas they are discovered. This is the canonical context artifact for the PR description, review handoff, anddata-updates-comms.
Progress checklist (maintain, tick live, and persist to progress.md)
- Parse inputs and resolve: channel, namespace, version, short_name, old_version, branch
- Clean workbench directory: delete
workbench/<short_name>unless continuing existing update - Run ETL update workflow via
etl-updatesubagent (help → dry run → approval → real run) - Add yourself to
dataset.ownersin the new garden.meta.yml(don't reorder; preserve existing names and markers) - Catalog
# NOTE:/# TODO:comments carried over from the old step files intonotes_to_check.md - Detect any
sanity_checksfunctions and their log-control flags; append tonotes_to_check.md - Create or reuse draft PR and work branch
- Update snapshot and compare to previous version; capture summary
- Meadow step: run + fix + diff + summarize
- Garden step: run + fix + diff + summarize
- Review
sanity_checksoutput (enable log flag, re-run, scan log, revert flag) — if none found and the garden step does non-trivial logic, recommend adding them; if present but missing value bounds (positive / [0,1] / [0,100] per indicator type), suggest those too (see 5b-bis) - Country harmonization audit: validate
.countries.jsonagainst canonical regions (flag provider regions not yet in the regions dataset →/add-provider-regions), audit.excluded_countries.json, scan garden log for missing/unused/unknown warnings - Region-provider drift: if this dataset's aggregates are in
regions.yml(defined_by: <provider>), check whether the new version changed the provider's region set or country membership; if so, updateregions.ymland re-propagate via/add-provider-regions - Grapher step: run + verify (skip diffs), or explicitly mark N/A
- Re-evaluate each catalogued
# NOTE:/# TODO:against fresh data; delete resolved workarounds + comments together, or record status in PR body - Check metadata: typos, Jinja spacing, style guide compliance
- Verify indicator-metadata coverage,
dataset.update_period_days, snapshot DVCdate_publishedandcitation_fullyear (etl updatecopies both verbatim — bump to the producer's real release date / year, or todate_accessed/ current year if the source doesn't publish one), and that all URLs resolve (HEAD-check) - Commit, push, and update PR description
- Run indicator upgrade on staging and persist report
- Update
update-context.ymlwith published chart count and 1–3 chart views for the public announcement - Render Slack announcement via
data-updates-comms, add to PR description, post@codex reviewas a separate PR comment, and notify user to post it to #data-updates-comms - Draft public-facing "Data update" post for OWID /latest, add to PR description, hand to user for review and publication
- Address Codex review comments (fix valid ones + resolve all threads)
- Run downstream-dependency check (
rg "<namespace>/<old_version>/<short_name>" dag/ -g "*.yml" | grep -v "^dag/archive"); for each consumer outside the dataset's own chain, decide with the user whether to bump in this PR or document under "Downstream dependencies" for a follow-up PR (see "Downstream dependency check" section below for details) - Ask the user whether to remove the old DAG entries; if yes, delete them and their files AND relocate the new entries into the old slot (see "Removing the old version & reordering the DAG") — don't forget this step
- Hand off Wizard QA links to the user (Anomalist + Chart Diff on the staging branch) — this is the final step
Persistence:
- After ticking each item, update
workbench/<short_name>/progress.mdwith the current checklist state and a timestamp.
Checkpoints — when to pause
Default: keep going. Run through the full workflow without stopping unless one of the conditions below is met.
Stop and ask the user when:
- A step fails and the fix is ambiguous (multiple reasonable approaches, or you're unsure of the correct one)
- Data structure changed significantly (columns removed/renamed, large row count drops, schema changes that may affect charts)
- Country harmonization has new unmatched countries that need manual decisions
- The snapshot requires a manual download or credentials you don't have
- Indicator upgrade had imperfect matches (< 100% similarity) that need human review
- Anything that could silently break charts or lose data
Don't stop for:
- Routine assertion count updates (just update them and note in the summary)
- Clean step runs with only row increases
- Expected warnings (SettingWithCopyWarning, known unmapped territories)
- Straightforward filename/version reference updates
When you do stop, present a concise summary of the issue and what options exist.
When the update isn't a drop-in version bump
Some updates carry structural changes that make the standard rename-only flow the wrong tool. Recognise them up front and adjust the workflow.
Triggers — any of these means you're in restructure territory, not a version bump:
short_namechanges (producer rebranded the dataset).- File format/schema changes (wide → long, different file extension with a different column set, new dimensions).
- Policy/indicator set changes substantially (splits, dropped composites, newly added areas).
- Score semantics change (e.g. binary → continuous with subnational coverage).
Workflow adjustments:
- Skip
etl update. The rename-only flow copies the old step files into a new folder — useless when the schema is different. Author the new step chain by hand, using the old version as inspiration but not as a starting copy. - Add the new chain to the DAG before removing the old. Leave both chains active while you build and validate v2; remove the v1 entries only once v2 is on staging and the chart remap is queued or done.
- Decide on naming convention upfront. Ask the user whether to preserve v1 short_names where they map cleanly, or to adopt the source's fresh naming scheme. Fresh naming is cleaner but means the auto-Indicator-Upgrader can't help.
- Hand-curate the v1 → v2 indicator mapping. When short_names change entirely, the auto-upgrader has nothing to match on, but the Indicator Upgrader also matches on
title— so if v2 titles are descriptive (full sentences rather than the bare short_name), you can hand the user a table of v1 title → v2 title pairs and they can drive the chart remap from there. Generate this table from the v1 meta.yml + the v2 grapher catalog. - Defer the Slack and
/latestannouncements until charts have been remapped. Both posts depend oncharts.published_countandcharts.selected_viewsfrom the v2 chain. Drafting them before the remap gives the wrong count (zero) and no representative views. Tell the user to ping you when the chart remap is done, then run steps 8 / 9 / 9b.
For the long-format with dimensions sub-case specifically (e.g. one row per (country, year, <dim1>, <dim2>)), use the modern OWID pattern:
- Meadow + garden:
tb.format(["country", "year", <dim1>, <dim2>, ...], sort_columns=True). - Aggregations:
paths.regions.add_aggregates(tb, index_columns=[...full key...], regions=REGIONS, aggregations={...}). - Grapher: pass long tables through unchanged; the framework auto-expands them into per-cell variables.
- Metadata: variables are keyed by the long-column name, with
<% if <dim> == "X" and <dim2> == "Y" %>...<% endif %>Jinja blocks insidetitle,description_short,display.name. Grep this repo fortb.format(["country", "year"with more than two index entries to find current reference examples. - Jinja coverage: after building the grapher dataset, verify every active
(dim1, dim2)cell renders a non-empty value — read every column from the built grapher dataset and assertmetadata.titleis non-empty. A dimension combination with no matching<% if %>branch ships an untitled indicator.
Workflow orchestration
Initial setup
- Check if
workbench/<short_name>/progress.mdexists to determine if continuing existing update - If starting fresh: delete
workbench/<short_name>directory if it exists - Create fresh
workbench/<short_name>directory for artifacts
- Check if
Run ETL update command (etl-update subagent)
- Inputs:
<namespace>/<old_version>/<short_name>plus any required flags - Pick the URI that matches what's actually changing:
- If the source data is changing — new source files, modified extractor, anything that affects the snapshot output — run from the snapshot URI with
--include-usages. This bumps the whole chain (snapshot → meadow → garden → grapher) to the new version together:etl update snapshot://<ns>/<old_v>/<short>.<ext> --include-usages - If only garden logic / metadata is changing and the source data is unchanged, run from the garden URI. This bumps garden and grapher only; snapshot and meadow stay on the old version.
- If the source data is changing — new source files, modified extractor, anything that affects the snapshot output — run from the snapshot URI with
- Either way, run
etl updateonce. Don't call it separately per channel — that leaves stale version references in the DAG (e.g., new garden pointing to old meadow). - Perform help check, dry run, approval, then real execution; capture summary for later PR notes
- After running, always verify the dag file: grep for the old version and confirm all internal references between the new steps point to the new version (e.g., garden depends on new meadow, not old meadow).
etl updatewrites the new entries in the flat DAG form — convert them to the nested (compact) form now, while you're in the file, rather than leaving it until archiving (otherwise the flat block tends to survive the whole update unnoticed). See the example andload_dag()parse-check under "DAG archiving & reordering" step 4.
- Inputs:
1a-bis) Add yourself to dataset.owners in the new garden .meta.yml
You've just become a contributor to this dataset, so add your canonical OWID name to its owners: list. Don't reorder — keep the existing primary first; append yourself at the end. Skip if you're already there.
Your canonical name must match an entry in the schema enum (schemas/dataset-schema.json). Resolve it from git config user.name via etl.owners.resolve_owner; if that returns None, add a mapping in etl/owners.py and a row in the schema enum before continuing.
Edit the YAML in place, preserving comments and the existing # review / # backport / # fasttrack markers on other entries.
1b) Check for outdated practices (check-outdated-practices skill)
- After
etl updatecreates new step files, run the/check-outdated-practicesskill on every new step file — including helper modules thatetl updatedoesn't generate but you copied by hand (e.g.*_omms.py), since those carry legacy patterns too - The skill reads the extension as the source of truth for the full pattern set (the
geo.add_*aggregation/population helpers are flagged, not justgeo.harmonize_countries) — don't rely on a remembered subset - Fix any findings before proceeding — this avoids propagating legacy patterns into new versions
geo.harmonize_countries→paths.regions.harmonize_namesis mechanical and safe.geo.add_regions_to_table→paths.regions.add_aggregateschanges the aggregation core — prove equivalence with a controlled A/B test, not a diff against the old feather. Build the new garden both ways against the same current catalog (swap the call, rebuild, save output; revert, rebuild, save output) and diff the two. Do NOT conclude "the helper shifts aggregates across all years" from a new-vs-old-feather diff — that conflates the helper with upstream-dataset drift (see step 5). In practice the two helpers are equivalent bar tiny historical edge cases (e.g. one region-year's population residual); if so, modernize.add_aggregatesalso auto-resolves income groups from the DAG, so it's the right tool when you later need WB income-group aggregates.
1c) Catalog # NOTE: / # TODO: comments in the copied step files (don't resolve yet)
- Run
rg -n "#\s*(NOTE|TODO|FIXME|HACK|XXX):" snapshots/<namespace>/<new_version>/ etl/steps/data/{meadow,garden,grapher}/<namespace>/<new_version>/. - Filter out generic boilerplate (e.g.
# NOTE: To learn more about the fields, hover over their names.at the top of.meta.yml). - Save the remaining actionable items to
workbench/<short_name>/notes_to_check.md— one entry per annotation, recording file path, line number, which step it lives in (meadow/garden/grapher), and what the workaround does. - Don't act on them yet. Resolution requires fresh data and happens after each step's run — see step 6a.
1d) Detect sanity-check logic in the copied step files Sanity checks live in two different forms — detect both:
- Function form —
def sanity_check…/sanity_check…(call sites. Often gated by a module-level boolean flag (DEBUG,SHOW_SANITY_CHECK_LOGS,LONG_FORMAT) that defaults toFalseto keep normal runs quiet. Examples:etl/steps/data/garden/wb/.../world_bank_pip.py(SHOW_SANITY_CHECK_LOGS),etl/steps/data/garden/wid/.../world_inequality_database.py(DEBUG+LONG_FORMAT),etl/steps/data/garden/lis/.../luxembourg_income_study.py(no flag; prints unconditionally viatabulate). - Inline comment form —
# Sanity check/# Sanity checks/# sanity checkmarking an inline assertion block that isn't wrapped in a dedicated function. Very common:etl/steps/data/garden/emdat/.../natural_disasters.py,etl/steps/data/garden/emissions/.../national_contributions.py,etl/steps/data/garden/irena/.../renewable_capacity_statistics.py. These usually have no log flag — the block simply runs on every step execution and either passes or raises.
Run a combined sweep:
rg -n -i "def sanity_check|sanity_check\(|#\s*sanity check" \
snapshots/<namespace>/<new_version>/ \
etl/steps/data/{meadow,garden,grapher}/<namespace>/<new_version>/
Append a "Sanity checks" section to workbench/<short_name>/notes_to_check.md listing each hit — for each, record: file path + line number, which form (function vs. inline comment), the name of any log-control flag (function form only), and a one-line description of what's being asserted (read the surrounding 5–10 lines).
Don't act yet — the review happens in step 5b once the garden step has been run on the new data.
Create PR and integrate update via subagent (etl-pr)
- Inputs:
<namespace>/<old_version>/<short_name> - Create or reuse draft PR, set up work branch, and incorporate the ETL update outputs
- Inputs:
Snapshot run & compare (snapshot-runner subagent)
- Inputs:
<namespace>/<new_version>/<short_name>and<old_version>
Hand-maintained snapshots + editorial data edits. Some snapshots have no
url_download— the.pyprompts for--path-to-fileand the docstring says the data was "provided by email" / curated by hand. When the update is a small editorial correction (the user gives you the facts directly, e.g. "country X did Y in year Z"), you can produce the new snapshot yourself: copy the previous version's data file (data/snapshots/<ns>/<old_version>/<file>), change only the specific cells, and assert in a quick script exactly which rows/cells changed (and that all others are byte-identical) before runningetls ... --path-to-file <edited>. Then update the.pydocstring to document the edit and bump the.dvcdate_published/citation_fullyear (ask the user whether it's a new producer release or an OWID-applied edit — see step 6c). Verify the user's stated facts against the existing data first — some may already be encoded from a prior release (in this update, one of the two reported events was already in the live snapshot; only the other was a genuine change). Tell the user what's already present rather than blindly re-adding it.- Inputs:
Meadow step repair/verify (step-fixer subagent, channel=meadow)
- Run, fix, re-run; produce diffs
- Save diffs and summaries
Garden step repair/verify (step-fixer subagent, channel=garden)
- Run, fix, re-run; produce diffs
- Save diffs and summaries
Diff against a freshly-rebuilt old version, not the stale feather on disk. The old version's
data/garden/.../feather was built whenever it last ran — possibly against an earlier snapshot of a shared upstream dataset (population, regions, income_groups). A fresh build of the new version uses the current upstream, so a naive new-vs-old-feather diff shows differences in every population-weighted cell across all years and regions — pure upstream drift that has nothing to do with your change. Before trusting any diff, rebuild the old version on the current catalog and diff against that:.venv/bin/etlr data://meadow/<ns>/<old_version>/<short> --private --force --only .venv/bin/etlr data://garden/<ns>/<old_version>/<short> --private --force --onlyThe apples-to-apples diff should collapse to just your intended change. Mention the drift separately in the PR (Chart Diff on staging will show it, because the live data is also stale relative to current upstream). This bit me twice in one update — don't skip it.
When NaN can appear on one side, don't let
.fillna(False)hide it. In a cell-by-cell diff,(a - b).abs() <= tolevaluates toNaNwhen exactly one side is NaN; a downstream.fillna(False)then silently drops that real one-sided change. Treat "one side NaN, other side not" as a difference explicitly.
5b) Review sanity-checks output (only if step 1d catalogued any) Handling depends on the form catalogued in step 1d.
Function form with a log-control flag (e.g. SHOW_SANITY_CHECK_LOGS, DEBUG):
- Flip the flag to
Trueat the top of the garden step file. - Re-run the garden step, capturing output:
.venv/bin/etlr data://garden/<namespace>/<new_version>/<short_name> --private --force --only \ > workbench/<short_name>/sanity_checks.log 2>&1 - Review the log: scan for
AssertionError,error,warning,dropped, outliers flagged by country/year, unexpected totals. Surface actionable findings in the PR description under a "Sanity-check findings" collapsed section. - Revert the flag to its original value (usually
False) before committing. Verify withgit diffthat the garden file has no unintended changes.
Function form with no flag, or inline # Sanity check(s) comment blocks:
- Read each catalogued block (pull 5–15 lines of context around the hit) to understand what invariant is being tested.
- Important: a sanity check can enforce its finding either by raising (
assert,raise) or by logging (paths.log.warning,.critical, even.fatal). Logging variants do NOT fail the step — so "step 5 passed" is not proof that every invariant held. If the block uses logging, re-run the step and scan stdout/stderr for the relevant keywords; don't trust the exit code alone. - For non-trivial invariants (monotonicity, totals, bounds), also spot-check qualitatively against the fresh garden output via a short
.venv/bin/pythonsnippet. - Record any anomalies under "Sanity-check findings" in the PR description. No log artifact to keep here since the step's own output is the evidence.
In either form: if sanity_checks raise AssertionError on the new data, stop and decide with the user whether the assertion needs a threshold bump, whether upstream data genuinely broke, or whether the invariant is obsolete. If the check only logs, treat a new/expanding set of warnings the same way — they're the signal the sanity check was written to produce.
Watch for silent-delete patterns. Some sanity_checks functions also mutate the table — e.g. world_bank_pip's sanity_checks drops rows that fail invariants and reports the count via the log-control flag. With the flag off the deletions still happen; the reviewer just never learns which rows disappeared. When reading a sanity_checks function, scan for drop, filter, tb = tb[...] — anything that removes rows — and list every deletion in the PR body, not just the warning counts. If the deletion seems newly applicable to upstream fixes (e.g. the row should no longer be anomalous in the new release), that's a candidate for removing the workaround entirely.
5b-bis) Recommend sanity checks when the garden step lacks them (or lacks value bounds) Runs after the garden step is built. Two triggers:
- Step 1d catalogued no sanity checks and the garden step does more than a straight load-and-format (harmonization, column drops/renames, aggregations, derivations) — recommend adding
sanity_check_inputs/sanity_check_outputs. A Codex review will flag this anyway; better to handle it during the update. - The step has checks but you can spot missing value-bound coverage of the kinds below — suggest the additions.
Ground every threshold in the built data before writing it. Run min/max/nunique/value_counts per indicator on the built garden table first — never write plausible-sounding bounds. The classic trap: "% of GDP" indicators look like percentages but legitimately exceed 100 (IMF PFMH: gross debt up to 495%, expenditure 595% in crisis years; UK's post-Napoleonic 260% debt is on a published chart). A blind 0–100 assert fails the build on day one.
Suggested checks by indicator type (propose the applicable ones, not all):
| Indicator type | Check |
|---|---|
| Share / proportion | values in [0, 1] |
| Percentage of a whole | values in [0, 100] |
| "% of GDP" and similar ratios | non-negative for levels (revenue, spending, debt); can exceed 100 — verify against data before capping. Where a ≤100 bound mostly holds, enforce it with a documented exception set (e.g. PFMH expenditure ≤ 100 outside {Equatorial Guinea, Kuwait, Kiribati} — each with a comment explaining why it's legitimate) so a new country crossing the line fails for review |
| Balances, growth rates, interest rates | can legitimately be negative — do NOT impose ≥ 0 |
| Mutually exclusive share categories | components sum to 100% (or 1) per (country, year), within a small rounding tolerance (e.g. abs(sum - 100) < 0.1) — first verify against the data that the source's categories are actually exhaustive (some breakdowns ship without an "other" remainder, or only for a subset of rows) |
| Categorical flags / codes | exact value set (e.g. <= {0, 1}) and non-null |
| Input schema | set-equality on expected columns — catches the next source rename (PFMH: debt → d) with a clear message |
| Coverage | country-count floors (≥ previous version); a drop is usually a parsing/mapping regression — re-audit before bumping the constant |
Implementation conventions: constants at the top, run() first, check functions below run(); sanity_check_inputs(tb) right after loading meadow, sanity_check_outputs(tb) right before paths.create_dataset(...); plain assert with messages that name the offending values. Reference example: imf/2026-06-12/public_finances_modern_history.py.
sanity_check_outputs runs before tb.format(...), so column names are still as the code produced them — format() lowercases/underscores them afterward. If the step builds columns like f"{col}_{status}_pop" where status is "Legal"/"Illegal"/"missing", the pre-format columns are status_Legal_pop (mixed case), not status_legal_pop. Select such columns case-robustly (e.g. [c for c in tb.columns if c.endswith("_pop")] then filter on c.lower()), or you'll get a KeyError that only surfaces at runtime. Keep update-specific facts (e.g. "country X is Legal in year Y") out of the committed checks — verify those via the garden diff at update time so the checks stay valid across future releases.
Negative-test the checks: after the step passes on real data, simulate each failure mode (rename a column, corrupt a flag, push a value out of bounds) and confirm the matching assertion fires — a check that never trips is untested code.
5c) Country harmonization audit
Run after the garden step completes (and after 5b if it ran). Verifies that the country entities reaching the garden output are canonical, and that the mappings/exclusions consumed by paths.regions.harmonize_names(...) are well-formed. Output: workbench/<short_name>/harmonization_audit.md.
Modern API. Garden steps should be calling paths.regions.harmonize_names(tb, country_col=..., countries_file=..., excluded_countries_file=...) — the wrapper in etl/data_helpers/geo.py:1874. If you find a step still using the deprecated geo.harmonize_countries(...) directly, step 1b's /check-outdated-practices should already have flagged it; treat that as a separate cleanup. The audit below is API-agnostic — both call sites end up emitting the same three warning strings. Some garden steps don't use the harmonizer at all and instead assign country inline in Python (no .countries.json involved); for those, the JSON checks below have nothing to look at — the garden-output check in step 5 is what catches non-canonical entities, so always run it.
Source of truth. Canonical names come from two datasets, both consulted by the harmonizer:
data/garden/regions/2023-01-01/regions— countries, continents, and OWID-defined aggregates. The runtime authority ispaths.regions.tb_regions["name"]. This is built frometl/steps/data/garden/regions/2023-01-01/regions.ymlplus a merge withregions.codes.csvand field defaults — don't parse the YAML in isolation or you'll miss the legacy entries and produce false positives.data/garden/wb/<latest>/income_groups— the four World Bank income-group aggregates (High-income countries,Upper-middle-income countries,Lower-middle-income countries,Low-income countries). OWID treats the latest version of this dataset as the official one, so the audit must resolve the version dynamically (don't pin a date — it goes stale when WB publishes a refresh). The names live in theclassificationcolumn of theincome_groups_latesttable.
The audit's "canonical" set is the union of these two. A .countries.json entry looks like "Source name": "Target name" — the audit checks that every target name (the value the source gets harmonized to) appears in either dataset. Anything else is flagged.
Capture a fresh garden log:
.venv/bin/etlr data://garden/<namespace>/<new_version>/<short_name> --private --force --only \ > workbench/<short_name>/harmonization.log 2>&1Scan the log for the three harmonization warnings. These are emitted by
etl/data_helpers/geo.py(excluded list) andlib/datautils/owid/datautils/dataframes.py(mapping warnings) — the wording is stable:rg -n "missing values in mapping\.|unused values in mapping\.|Unknown country names in excluded countries file:" \ workbench/<short_name>/harmonization.logFor each warning, the entity list follows on subsequent lines (because
harmonize_countries()is called withshow_full_warning=Trueby default). Capture them.Validate
.countries.jsontarget names against canonical regions + income groups. Each entry maps a source name (key) to a target / harmonized name (value); this check looks at the values. For each garden step in this update:import json from pathlib import Path from owid.catalog import Dataset # Resolve the canonical regions dataset dynamically (latest built version). # Don't pin a date — when the regions step version advances, a hard-coded path # would validate against a stale catalog and flag valid targets as non-canonical. regions_dirs = sorted(Path("data/garden/regions").glob("*/regions")) if not regions_dirs: raise RuntimeError( "No data/garden/regions/<version>/regions built locally — the audit can't " "run without the canonical regions catalog. Build it first with " "`.venv/bin/etlr data://garden/regions/<latest>/regions --private`." ) tb_regions = Dataset(str(regions_dirs[-1]))["regions"] canonical_regions = set(tb_regions["name"].dropna().astype(str)) # Add OWID's official income-group aggregates to the canonical set, if available. # OWID treats the latest income_groups version as official. This artifact is # often not built locally during a non-income-groups dataset refresh — degrade # gracefully (warn and skip) rather than aborting the audit. ig_dirs = sorted(Path("data/garden/wb").glob("*/income_groups")) if ig_dirs: ds_ig = Dataset(str(ig_dirs[-1])) canonical_income = set(ds_ig["income_groups_latest"]["classification"].dropna().astype(str).unique()) else: print( "[WARN] No data/garden/wb/<version>/income_groups built locally — " "skipping income-group enrichment. The four WB income-group aggregates " "(High/Upper-middle/Lower-middle/Low-income countries) may surface as " "'not in canonical' until you build that dataset." ) canonical_income = set() canonical = canonical_regions | canonical_income mapping = json.loads(Path("etl/steps/data/garden/<namespace>/<new_version>/<short_name>.countries.json").read_text()) not_in_canonical = sorted({v for v in mapping.values() if v and v not in canonical}) print("Targets not in OWID's canonical regions or income groups:", not_in_canonical) # Provider regional groupings the source ships (e.g. "Europe (WB)", "Asia and the Pacific (ILO)") # surface here when OWID hasn't added that provider's regions yet. Flag the "(Provider)"-suffixed ones: import re provider_region_candidates = [v for v in not_in_canonical if re.search(r"\([\w .&-]+\)$", str(v))] print("Look like provider regions not yet in the regions dataset:", provider_region_candidates)A non-empty
not_in_canonicallist means the mapping points at entities that aren't registered in either the regions catalog or the income-groups dataset. This isn't automatically a bug — it's a heads-up. Stop and decide with the user before proceeding — same pattern as the global "Checkpoints — when to pause" section at the top of this skill. Common causes (in order from "fix" to "accept"): typo, retired alias used as canonical, casing/whitespace mismatch, or a custom aggregate the source defines that OWID has no equivalent for. For typos/casing — fix the JSON.If the unmatched targets are a provider's regional grouping that OWID doesn't define yet — the
provider_region_candidatesabove ("(Provider)"-suffixed names like"Europe (WB)","Sub-Saharan Africa (ILO)"), or the source ships aregion/subregioncolumn feeding these — don't just accept them as outside-the-system. The proper fix is to add that provider's regions to the regions dataset with the/add-provider-regionsskill (separate PR): they then become canonical, merge with regions/population infrastructure, and get a{provider}_regionmap indicator, instead of living outside the system. Surface this to the user and offer to run/add-provider-regions. Only for genuinely one-off, non-geographic groupings the regions system shouldn't own (BRICS, G7, G20) — accept and note in the PR description that those live outside the canonical system. For a real new historical region — add an entry toregions.ymlin a separate PR.Audit
.excluded_countries.json. The file is optional; skip if it doesn't exist:excluded_path = Path("etl/steps/data/garden/<namespace>/<new_version>/<short_name>.excluded_countries.json") if excluded_path.exists(): excluded = json.loads(excluded_path.read_text()) suspicious_canonical = sorted(set(excluded) & canonical) # Also surface continents and aggregates separately for review aggregates = set(tb_regions[tb_regions["region_type"].isin(["continent", "aggregate"])]["name"].dropna().astype(str)) suspicious_aggregates = sorted(set(excluded) & aggregates) print("Excluded entries that ARE canonical regions:", suspicious_canonical) print("Excluded entries that are continents/aggregates:", suspicious_aggregates) print("Full excluded list for review:", sorted(excluded))suspicious_canonicalis the actionable signal: each entry is a known country/region that we are dropping. Sometimes this is intentional (e.g. dropping "World" rows because the source double-counts them) — surface, don't auto-fix. Pause and ask the user if the list is non-empty. The full list is dumped so the LLM can also eyeball it for entities that aren't incanonicalbut look like real countries (typos, alternative names) we should be mapping rather than dropping.Audit garden output entities. Always run this check, regardless of whether
.countries.jsonexists or is populated — JSON mappings describe inputs to the harmonizer, but the entities that actually reach Grapher are whatever sits in thecountrycolumn/index of the built garden tables. Inlinecountryassignments (e.g. hardcodedtb["country"] = "England and Wales") and post-harmonization mutations both bypass the JSON check entirely; this is the only step that catches them.from pathlib import Path from owid.catalog import Dataset garden_dir = Path("data/garden/<namespace>/<new_version>/<short_name>") ds_garden = Dataset(str(garden_dir)) entities: set[str] = set() for tname in ds_garden.table_names: tb = ds_garden[tname] # `country` can live in the index (after .format()) or as a regular column. if "country" in tb.index.names: entities.update(tb.index.get_level_values("country").dropna().astype(str).unique()) elif "country" in tb.columns: entities.update(tb["country"].dropna().astype(str).unique()) # tables with no country column are silently skipped (e.g. reference tables) output_not_in_canonical = sorted(entities - canonical) print("Garden output entities not in OWID's canonical regions or income groups:", output_not_in_canonical)Same triage rules as the JSON-targets check (Python check #3): typo / casing / alias / legitimately custom aggregate. A non-empty list means at least one entity that ships to Grapher isn't registered in either the regions catalog or the income-groups dataset. Stop and decide with the user before proceeding. Common fixes: typo or casing → patch the inline assignment (or
.countries.json, whichever is the source) so the value matches the canonical name; alias → switch to the canonical name; legitimate custom aggregate → accept and note in the PR description that the entity lives outside the canonical system.Write findings to
workbench/<short_name>/harmonization_audit.mdwith six sections, populated only when non-empty. Each section must list every flagged entity, not just a count — counts alone aren't actionable, the user (or you) needs to read the actual names to judge whether each is intentional. For long lists (>20 entries) group by pattern when the grouping is obvious (e.g. ILO's" (ILO)"-suffixed regions vs. international orgs vs. derived "World ..." aggregates) so the reviewer can scan categories instead of one flat list. Sections:## Missing in mapping— countries in source data not in.countries.json(from log warning #1) — list each missing source name## Unused mappings—.countries.jsonentries the data never used (warning #2) — list each unused source→target pair## Unknown excluded entries—.excluded_countries.jsonentries not present in source data (warning #3) — list each## Targets not in OWID's canonical regions or income groups— target names from.countries.jsonthat aren't registered in either dataset (Python check #3) — list each target name and the source names that map to it## Excluded entries matching canonical regions— possible over-exclusion (Python check #4) — list each## Garden output entities not in OWID's canonical regions or income groups— distinctcountryvalues found in the built garden tables that aren't in canonical regions or income groups (Python check #5) — list each entity
Surface in PR. If any section was populated, add a collapsed "Harmonization audit" section to the PR description (after the per-step sections, before the Slack announcement) with the same listings, not just a summary. Empty sections can be omitted.
When you report progress to the user during the workflow, never just give a count — always include the list (or grouped categories) so they can judge in one glance.
Checkpoint summary:
- "Targets not in OWID's canonical regions or income groups" or "Garden output entities not in OWID's canonical regions or income groups" or "Missing in mapping" non-empty ⇒ stop, decide with user.
- "Excluded entries matching canonical regions" non-empty ⇒ stop, ask whether each exclusion is intentional.
- "Unused mappings" or "Unknown excluded entries" non-empty ⇒ surface in PR description; not a blocker.
5d) Region-provider drift — if this dataset defines OWID regions, propagate the change
The harmonization audit (5c-3) catches providers not yet in regions.yml. This check is the counterpart: a provider whose aggregates are already in regions.yml can change its regions on a version bump (new/removed/renamed regions, or shifted country membership), and regions.yml won't update itself.
- Does this dataset own regions?
grep "defined_by: <provider>" etl/steps/data/garden/regions/2023-01-01/regions.yml(e.g.ilo_1/ilo_2,maddison,wid,wb,who). No matches ⇒ skip this step. - Re-extract the provider's region→country mapping from the new version — from the same source the regions were originally derived from (a
region/subregioncolumn, region entities, or a table-of-contents tier; see/add-provider-regionsStep 1) — and diff it againstregions.yml(defined_by: <provider>members), set-equality per region, plus new/removed/renamed regions. If the new version no longer carries the composition, the provider's section in the world-region-map-definitions article links to the defining source — use it to re-derive the provider's existing regions (regions newly introduced by this update won't be documented there yet — get those from the source or ask the user). - If it drifted: stop and decide with the user (composition changes move every region-aggregated value downstream), then update
regions.ymland re-propagate via/add-provider-regions— rebuild garden regions, refresh the{provider}_regiongrapher indicators + metadata, regenerate owid-grapherregions.data.ts(runRegionsUpdater), and update the article section. - If unchanged: note it and move on.
- Grapher step run/verify (step-fixer subagent, channel=grapher, add --grapher)
- Skip diff
6a) Re-evaluate # NOTE: / # TODO: items from step 1c against fresh data
Now that meadow, garden, and grapher have run on the new data, go back to workbench/<short_name>/notes_to_check.md and decide each item's fate. For each entry:
- Identify what the workaround does (read the surrounding code).
- Load the affected step's output with
owid.catalog.Dataset(or inspect the raw snapshot) and compare corrected vs. uncorrected values. Cross-check the producer's release notes / changelog if available. - If the upstream issue is fixed → delete the workaround and its
# NOTE:/# TODO:comments in the same commit, then re-run the affected step (use--force --only, add--grapherfor grapher) so downstream artifacts pick up the change. - If the workaround is still needed → leave it and add a one-line status under a PR-description section titled "Not covered in this PR" (e.g. "Sierra Leone ×1000 correction still required — raw value in the 2026 file is still ~1/1000 of plausible"). These are deliberately deferred items the next updater should re-check. Delete the whole section if its last item gets resolved mid-PR.
- If you're uncertain → keep it, flag it in the PR description, and ask the user.
Do this before step 6b (metadata checks) so any re-runs triggered by comment-removal happen before the metadata sweep, not after.
6b) Metadata quality checks — run after all ETL steps are built Run all four checks on the newly built garden and grapher datasets so every issue surfaces together. Each skill writes results to the terminal; fix what comes up before moving on.
- Typos —
/check-metadata-typosscoped to the current step. Run on each of the new.meta.ymlfiles (garden first, then grapher). Accept or skip each suggested fix. - Jinja spacing —
/check-metadata-spacingon the built garden and grapher datasets. Catches template artifacts like doubled spaces or stray newlines that only appear after Jinja rendering. - Style guide —
/check-metadata-styleon the grapher step. Audits user-facing fields (title, subtitle, description_short, display.name, presentation.*) against OWID's Writing and Style Guide. Rules live in.claude/skills/check-metadata-style/STYLE_GUIDE.md, so no Notion access is needed — but if the guide looks out of date, refresh that file from Notion in a separate PR. - Clarity for a general audience — read every user-facing field with non-specialist eyes. The other three skills enforce structure and style; this one judges whether the text is understandable.
Clarity checklist (do manually, no skill yet)
OWID readers are not domain experts. Walk each indicator's user-facing fields and flag anything that requires inside knowledge to parse.
| Field | Clarity check |
|---|---|
title / presentation.title_public |
A non-specialist should know what the indicator measures from the title alone. Expand acronyms unless universally known (skip GDP; expand GWIS, MFI, SDG, IHME). Don't cram units into the title. |
description_short |
One or two short sentences: what the metric is and what it covers. No jargon without a gloss. Active voice. The chart subtitle is short by design — no run-on or stacked clauses. |
description_key |
Each bullet should land a distinct, useful fact. Skip filler ("this dataset is widely used"); prefer substantive caveats (coverage gaps, methodology limits, what counts/doesn't count). |
display.name |
Short legend label. Reads naturally on a chart axis/legend; doesn't restate the title. |
presentation.grapher_config.note |
Concise footnote, ≤1 sentence ideally. |
Flag and rewrite when you find:
- Acronyms or technical terms that aren't expanded the first time they appear
- Sentences that only make sense if you already know the data source
- Quantitative claims with no unit context (e.g. "burned area" without "in hectares" surfacing somewhere in the user-facing text)
- Inconsistent terminology between indicators in the same dataset (e.g. "wildfires" in one, "vegetation fires" in another)
- Domain phrases that have a plain-English equivalent (e.g. "anthropogenic emissions" → "human-caused emissions")
When a phrasing is ambiguous, propose a concrete rewrite — don't just flag it.
If any skill rewrites a .meta.yml, re-run the affected step so the built catalog reflects the edits. Add --grapher when the affected step is on the grapher channel — without it the local catalog is updated but staging stays stale, so the step 7 indicator upgrade sees the old text.
# garden / meadow:
.venv/bin/etlr <channel>/<namespace>/<new_version>/<short_name> --private --force --only
# grapher:
.venv/bin/etlr grapher/<namespace>/<new_version>/<short_name> --grapher --private --force --only
Then re-run the relevant check to confirm zero remaining violations.
6c) Indicator metadata coverage, dataset block, and link verification The other quality checks catch content issues; this step catches missing fields and broken URLs before they reach review.
Snapshot DVC freshness. etl update clones the previous snapshot's .dvc content verbatim except for date_accessed. Always re-check date_published and the year in citation_full / attribution under snapshots/<ns>/<new_version>/*.dvc — they will otherwise silently ship the old version's values. Set date_published to the producer's real release date when discoverable; otherwise copy date_accessed. Bump the year in citation_full and attribution to match.
Citation year vs
date_publishedyear. After setting both, check whether the year insidecitation_full/attributionmatchesdate_published's year. If they differ, confirm it's intentional before shipping: it's legitimate when the producer labels the release by edition rather than publish date (e.g. UN IGME's "2025 report" published2026-03-17→citation_full(2025),date_published2026), but otherwise it's a stale citation. When the gap is deliberate, leave a one-line note for the reviewer so they don't re-flag it.Last-Modifiedheader asdate_publishedsource. When the producer's page states no release date (common on fully JS-rendered sites like the IMF Datamapper), the download URL's HTTPLast-Modifiedheader is a defensible source — it's the server's own timestamp for the file, not an inference. Corroborate it against a release-named filename (e.g.…Dec 2025.xlsx+Last-Modified: Fri, 12 Dec 2025) and note the provenance when reporting to the user.Stale producer description on a JS-rendered page. If the
.dvcmeta.origin.descriptionis producer text that may have changed but the page is an SPA shell (static HTML empty, WebFetch 403s, Wayback archives only the shell), don't burn time probing API endpoints and don't rewrite the producer's text to match the data — the blurb can legitimately lag their own releases (FPP shipped 153 countries while the page said 151). Ask the user to paste the page text from their browser, then diff it against the existing.dvctext and apply only the substantive changes. Clipboard pastes flatten typography (curly quotes → straight, en-dashes → hyphens) — keep the existing typographic punctuation unless the words themselves changed.
Mandatory fields per indicator. For every indicator in the garden .meta.yml, confirm these are set (either on definitions.common or per-indicator):
| Field | Notes |
|---|---|
title |
Per-indicator |
unit |
Common is fine |
short_unit |
Common is fine |
description_short |
Per-indicator |
description_key |
At least one bullet; usually common |
processing_level |
minor or major |
presentation.topic_tags |
At least one tag |
display.numDecimalPlaces |
Common is fine |
display.tolerance |
Common is fine — chart tolerance for missing years |
display.name |
Per-indicator — required for legend labels |
presentation.attribution_short |
Set explicitly — does NOT inherit from the origin's attribution_short (verified: MySQL variables.attributionShort stays NULL if it's only on the origin). Place under definitions.common.presentation for the common case. |
Conditional: if processing_level: major, every indicator with that level MUST also have description_processing.
Not mandatory (skip if you don't need them): presentation.title_public, presentation.title_variant, presentation.attribution.
Dataset block. Garden .meta.yml MUST include update_period_days:
dataset:
update_period_days: <N>
This controls the auto-update cadence. Even when the rest of the dataset: block is empty, never strip update_period_days — leave the block in place with just that field.
Link verification. Run a HEAD request on every URL in the new .dvc and .meta.yml files (all channels — meadow .meta.yml files matter when they exist). Anything non-2xx is a signal, not a guaranteed break — always double-check before acting:
for url in $(rg --no-filename -No "https?://[^\"' ]+" snapshots/<namespace>/<new_version>/ etl/steps/data/{meadow,garden,grapher}/<namespace>/<new_version>/ \
| sed -E 's/[).,;:>]+$//' \
| sort -u); do
printf "%s %s\n" "$(curl -sI -L -o /dev/null -w '%{http_code}' --max-time 15 -A 'Mozilla/5.0' "$url")" "$url"
done
The --no-filename flag prevents rg from prepending path: to each match (otherwise the for-loop tries to curl path:url and every check returns 000). -A 'Mozilla/5.0' sometimes coaxes a real response out of Cloudflare-fronted hosts, but it doesn't always work — see the next note.
curl non-2xx ≠ broken. Cloudflare-fronted sites (notably ourworldindata.org) can return 404 to curl on URLs that work fine in a browser, depending on edge-node routing, IP geolocation, and cached state. Before treating a 4xx as a real failure:
- Re-check with
WebFetch(the built-in tool). It uses a different code path and aMozilla/5.0UA that Cloudflare usually accepts. A200with a coherent page body is authoritative — trust it over curl. - If
WebFetchalso fails, sanity-check the Wayback Machine:https://web.archive.org/web/<year>/<url>. A recent successful snapshot means the URL is reachable on the public internet and your local route is the problem. - Only act on a true failure — both
WebFetchand Wayback unable to reach the URL — and even then flag and ask the user before silently rewriting an external link in metadata. Replacing a working link with a "safer" alternative because of a curl false-positive is worse than leaving the original. Apply the same restraint here as the global "Checkpoints — when to pause" section.
Fix any genuinely-non-2xx hit on url_main, url_download, license.url, or URLs referenced from description / description_key before continuing. The sed strips trailing markdown/punctuation chars (), ., ,, ;, :, >) so URLs inside [text](url) aren't reported as broken because of a stray closing paren.
Verification. After editing, re-run the affected step (with --grapher if grapher) so the catalog reflects the changes. Then confirm presentation.attribution_short actually landed:
from owid.catalog import Dataset
ds = Dataset("data/grapher/<ns>/<v>/<short_name>")
tb = ds["<table>"]
print(tb["<col>"].metadata.presentation.attribution_short) # must NOT be None
Or after the staging upload:
make query SQL="SELECT shortName, attributionShort FROM variables WHERE catalogPath LIKE '%<ns>/<v>/<short_name>%'"
Indicator upgrade (optional, staging only)
- First upload the new grapher dataset to the staging DB (required before the upgrader can detect it):
STAGING=<branch> .venv/bin/etlr data://grapher/<namespace>/<new_version>/<short_name> --grapher --private - Then run the automatic upgrader:
STAGING=<branch> .venv/bin/etl indicator-upgrade auto - CRITICAL: After the upgrader finishes, always verify it actually worked by querying staging:
If the count is 0, the upgrade did not run — re-run it.mysql -h "staging-site-<branch>" -u owid --port 3306 -D owid -e "SELECT COUNT(*) FROM chart_dimensions cd JOIN variables v ON cd.variableId = v.id WHERE v.catalogPath LIKE '%<namespace>/<new_version>%'" - The auto-upgrader only remaps grapher charts — NOT ETL-defined explorers or MDims. Explorers (
export://explorers/...) and multidims (export://multidim/...) reference indicators by catalog path and are rebuilt by running their export steps, which the indicator-upgrader never touches. If the dataset has any (check the DAG:rg "export://(explorers|multidim)/.*/<short_name>" dag/ -g "*.yml"), they'll still point at the old variables on staging until you re-run them:
Verify none still reference the old version (both queries should return empty):STAGING=<branch> .venv/bin/etlr export://explorers/<ns>/latest/<short> export://multidim/<ns>/latest/<short> ... --export --private# explorers mysql -h "staging-site-<branch>" -u owid -P 3306 -D owid -e "SELECT DISTINCT ev.explorerSlug FROM explorer_variables ev JOIN variables v ON ev.variableId=v.id WHERE v.catalogPath LIKE '%<ns>/<old_version>%'" # mdims mysql -h "staging-site-<branch>" -u owid -P 3306 -D owid -e "SELECT DISTINCT mdp.slug FROM multi_dim_x_chart_configs mx JOIN variables v ON mx.variableId=v.id JOIN multi_dim_data_pages mdp ON mdp.id=mx.multiDimId WHERE v.catalogPath LIKE '%<ns>/<old_version>%'" - Also verify narrative charts. Narrative-chart configs can pin a
variableIdin their own patch (not inherited from the parent chart), and that id can date from a version older than the one this update started from — left stale by a previous cycle. The auto-upgrader only carriesold_version → new_versionmappings, so it can never remap those, and thechart_dimensionscount above can't catch them either: narrative-chart variable ids live only insidechart_configs. The upgrader warns about this case ("was NOT remapped: it pins indicators from a version of the upgraded dataset that the mapping does not cover") — watch its output for that warning. But the upgrader only visits narrative charts whose parent chart was affected by the mapping; a stale narrative chart whose parent no longer uses any mapped indicator is never visited and stays silent. So always run this catch-all scan over all narrative-chart configs:
If any are found, remap them with an explicit mapping via the upgrader's own CLI helpers (load by parent chart id;# STAGING=<branch> .venv/bin/python — scan narrative chart configs for variables on ANY old version import json from etl.config import OWID_ENV old_vars = set(OWID_ENV.read_sql( "SELECT v.id FROM variables v JOIN datasets d ON d.id = v.datasetId " "WHERE d.catalogPath LIKE %(p)s AND d.catalogPath NOT LIKE %(new)s", params={"p": "%/<short_name>", "new": "%<new_version>%"})["id"]) df = OWID_ENV.read_sql("SELECT nc.id, nc.name, nc.parentChartId, JSON_EXTRACT(cc.full, '$.dimensions') AS dims " "FROM narrative_charts nc JOIN chart_configs cc ON cc.id = nc.chartConfigId") stale = [(r["id"], r["name"], d["variableId"]) for _, r in df.iterrows() if r["dims"] for d in json.loads(r["dims"]) if d.get("variableId") in old_vars] print(stale) # must be emptycli_upgrade_indicatorswon't reach them because it finds charts viachart_dimensions):
Then re-run the scan and confirm it's empty.from sqlalchemy.orm import Session import etl.grapher.model as gm from apps.indicator_upgrade.upgrade import push_new_narrative_charts_cli from etl.config import OWID_ENV with Session(OWID_ENV.engine) as session: ncs = gm.NarrativeChart.load_narrative_charts_by_parent_chart_ids(session, [<parent_chart_id>]) ncs = [nc for nc in ncs if nc.id in {<stale_nc_ids>}] errors = push_new_narrative_charts_cli(ncs, {<old_var_id>: <new_var_id>})
- First upload the new grapher dataset to the staging DB (required before the upgrader can detect it):
Update context for public announcement
- Maintain
workbench/<short_name>/update-context.ymlas the canonical record of facts discovered during the update. Do not wait until the end if a fact is already known; append/update as each step completes. - At minimum, record:
dataset: namespace: <namespace> old_version: <old_version> new_version: <new_version> short_name: <short_name> title: <public dataset title, if known> producer: <producer, if known> source: release_date: <snapshot origin date_published, if known> next_release: <best-effort, or null> url_main: <source page, if known> citation_full: <citation, if known> coverage: year_min: <garden min year> year_max: <garden max year> countries: <distinct countries/entities> includes_regions: <true/false> sparse_recent_year_note: <note, or null> charts: published_count: <published chart count> size_qualifier: <handful|moderate|large|massive> explorers: <list of published explorer slugs using this data, or []> mdims: <list of MDim slugs using this data, with published flag, or []> selected_views: - title: <chart title> slug: <chart slug> rationale: <why this represents the dataset> update_summary: snapshot_diff: <short summary or artifact path> meadow_diff: <short summary or artifact path> garden_diff: <short summary or artifact path> notable_changes: [] sanity_check_findings: [] resolved_workarounds: [] editorial_context: why_it_matters_snippets: [] caveat_snippets: [] interesting_update_snippets: [] - Query the staging DB for published charts using the new dataset (filter on
c.publishedAt IS NOT NULL). Draft/unlisted charts must not be counted in the announcement:SELECT c.id, cc.slug, cc.full->>'$.title' as title, cc.full->>'$.type' as type, cc.full->>'$.hasMapTab' as hasMapTab FROM charts c JOIN chart_configs cc ON cc.id = c.configId JOIN chart_dimensions cd ON cd.chartId = c.id JOIN variables v ON cd.variableId = v.id WHERE v.catalogPath LIKE '%<namespace>/<new_version>%' AND c.publishedAt IS NOT NULL GROUP BY c.id - Charts are not the only surface — also count the explorers and MDims that use this data. Many datasets feed published OWID explorers and multi-dimensional collections, which the grapher-
chartsquery above misses entirely. Run the export steps first (step 7) so these point at the new variables, then query both:
Chart-based explorers can also attach via-- Explorers (note isPublished — only published ones count for the announcement) SELECT DISTINCT ev.explorerSlug, e.isPublished FROM explorer_variables ev JOIN variables v ON ev.variableId = v.id JOIN explorers e ON e.slug = ev.explorerSlug WHERE v.catalogPath LIKE '%<namespace>/<new_version>%'; -- MDims (note published flag — drafts are published=0) SELECT DISTINCT mdp.slug, mdp.published FROM multi_dim_x_chart_configs mx JOIN variables v ON mx.variableId = v.id JOIN multi_dim_data_pages mdp ON mdp.id = mx.multiDimId WHERE v.catalogPath LIKE '%<namespace>/<new_version>%';explorer_charts.chartId(join throughchart_dimensions) rather thanexplorer_variables— check that table too if the variable-based query comes up empty but the DAG shows an explorer step. Record published explorers/MDims undercharts.explorers/charts.mdimsinupdate-context.yml, and fold them into the "How many charts did this update affect?" answer (e.g. "10 published charts, 3 explorers, plus 3 draft MDims"). Only count published surfaces (isPublished=1/published=1) toward the public announcement; note unpublished ones for QA. - Map the published chart count to
size_qualifier: 1–9 =handful, 10–49 =moderate, 50–199 =large, 200+ =massive. - Pick 1–3
selected_viewsusing these criteria (in order of preference):- Map views — immediately visual, readers can find their own country
- Charts with punchy, standalone headlines — titles that make a clear claim work best for social sharing
- Global trend charts (StackedArea / World) — show the big picture over time
- Skip: population-weighted variants (harder to read quickly), within-regime breakdowns (too niche), country-specific views
- Add snippets for the editorial prompts from source metadata, garden/grapher metadata, resolved sanity-check/workaround notes, and non-routine PR changes. Keep these as snippets/facts, not polished Slack prose.
- Maintain
Slack announcement & PR update
- Run the
data-updates-commsskill withworkbench/<short_name>/update-context.ymlas input.data-updates-commsis the canonical owner of the Slack form wording, copy-paste format, editorial framing, search URL, and any standalone fallback gathering. Do not duplicate that rendering logic here. - Save the rendered draft to
workbench/<short_name>/slack-announcement.md. - If
data-updates-commsreports missing mechanical fields, gather them, updateupdate-context.yml, and re-render rather than inventing values. Ask the user if a missing field requires judgment. - Add the announcement to the PR description as a collapsed
<details>section titled "Slack Announcement", with the file content embedded inside a triple-backtickmarkdownfence. - Post
@codex reviewas a separate PR comment (not in the PR description) to trigger an automated code review. Use:gh pr comment <pr_number> --body "@codex review" - Tell the user, with a markdown link to the saved file so they can click through to open it:
"Slack announcement drafted at [workbench/<short_name>/slack-announcement.md](workbench/<short_name>/slack-announcement.md) and added to the PR description. Please review and post it to #data-updates-comms."Always render the path as a markdown link[…](…), not as inline-code — the chat UI renders it as clickable that way.
- Run the
9b) Data update post (for OWID /latest)
Draft the short reader-facing post that gets published on https://ourworldindata.org/latest. The team drafts these in Google Docs in the shared /Data updates Drive folder (https://drive.google.com/drive/folders/1oL0uLHKI6f2qi1rJA6-qFFRYEBw_-rfm), and OWID's CMS ingests the doc into the published feed.
The skill's job is to produce paste-ready Google Doc content in the exact CMS format the team uses (frontmatter title / excerpt / type / authors / kicker → \[+body\] marker → body prose with inline markdown links → {.cta} block → {.image} block → \[\] end marker). Don't invent your own format — every published post in the Drive folder follows the same shape.
This is separate from the Slack announcement — that one is a 10-field form for the internal channel; this one is a mini-blog-post for OWID readers, and the format is structured for CMS ingestion.
Steps:
- Open
.claude/skills/update-dataset/data-update-template.mdand follow it — the template has the exact paste-ready format plus three worked examples (NVIDIA, H5N1, World Bank PIP) lifted verbatim from the Drive folder. - Use the facts already gathered in
workbench/<short_name>/update-context.yml(step 8) —dataset.title,dataset.producer,source.url_main,source.citation_full,coverage.*,charts.published_count,charts.selected_views, and theeditorial_context.*snippet lists. Also pull fromworkbench/<short_name>/slack-announcement.md(step 9 output) — the editorial framing already drafted there is the closest cousin. If a field needed for the post isn't yet inupdate-context.yml, gather it (snapshot DVC, garden.meta.yml, orurl_mainvia WebFetch) and persist it back to the YAML so the next consumer doesn't re-do the work. - Title shape — a punchy finding/claim, a question, or an action/invitation. Not just the dataset name. See the template's "Field-by-field guidance" for examples and decision logic.
- Body — 100–200 words, first-person, conversational. Sample: ATUS ~105, NVIDIA ~140, robots ~110, OECD Government at a Glance ~155, US data centers ~145, UNU-WIDER ~155, World Bank PIP ~190, ozone ~165, mobile money ~180, fertilizers ~170, H5N1 ~135. The body should give a reader a reason to care and at least one concrete number — not "I refreshed our charts".
- Inline markdown links throughout the body for the producer's page, methodology pages, and related OWID articles.
*italics*for emphasis, sparingly. - CTA URL choice:
- One chart focus ⇒ grapher URL
https://ourworldindata.org/grapher/<slug>. - Multiple charts (default) ⇒ search URL
https://ourworldindata.org/search?datasetProducts=<URL-encoded dataset title>— value is the dataset title, resolved with this priority: (a) thedataset.titlefield in the garden.meta.ymlif it's set there (an override), otherwise (b) themeta.origin.titlefield in the snapshot.dvc. Often includes a parenthetical acronym likeLuxembourg Income Study (LIS)orWorld Bank Poverty and Inequality Platform (PIP). Not the bareproducerfield. - Topic has an existing OWID explorer ⇒
https://ourworldindata.org/explorers/<name>. - Curated topic page exists ⇒ topic URL (e.g.
/sdgs). - Do not use
/collection/custom?charts=…URLs.
- One chart focus ⇒ grapher URL
- CTA text — descriptive: "Explore the updated data in our interactive charts" (default), "Explore all of the updated data in our interactive charts" (broad), "Explore the interactive version of this chart" (single chart), "Explore this data going back to YYYY in our interactive chart" (single chart with date depth).
- Image filename —
YYYY-MM-data-update-<slug>.png(e.g.2026-04-data-update-h5n1-flu.png). The skill doesn't generate the image; the user adds it to the Doc separately. - Save the draft to
workbench/<short_name>/data-update.md. - Add a collapsed
<details>section titled "Data update post (for OWID /latest)" to the PR description, placed after the Slack-announcement section, with the file content embedded inside a triple-backtickmarkdownfence. - Tell the user, with a markdown link to the saved file so they can click through to open it:
"Data update post drafted at [workbench/<short_name>/data-update.md](workbench/<short_name>/data-update.md) in the Google Docs CMS format. Please create a new Google Doc in /Data updates, paste the draft, attach the chart screenshot, and share for review."Always renderworkbench/<short_name>/data-update.mdas a markdown link[…](…)rather than as a bare path or inline-code path — the chat UI renders it as clickable that way.
- Codex review: address comments and resolve threads
- Codex's delivery channel depends on the verdict — poll both. A clean pass arrives as an issue comment ("Didn't find any major issues") from
chatgpt-codex-connector[bot], with zero inline comments and no formal review object. A review with findings arrives as a formal review ("💡 Codex Review") with inline comments, and no issue comment. A watcher polling only one channel waits forever on the other outcome — treat a hit on either as completion. - Wait ~60 seconds after posting
@codex review, then poll both channels:gh api repos/owid/etl/issues/<pr_number>/comments | python3 -m json.tool # clean-pass summary lands here gh api repos/owid/etl/pulls/<pr_number>/comments | python3 -m json.tool # findings land here as inline comments - Codex posts in one of two places — always check both. When it finds issues, it leaves inline review comments (the endpoint above) with resolvable threads. When it finds nothing, it posts a single top-level PR (issue) comment instead — no inline comments, no threads — e.g. "Codex Review: Didn't find any major issues. Keep it up!". So if the inline-comments endpoint is empty, check the issue comments before concluding Codex hasn't run yet:
A "no issues" / 👍 comment fromgh api repos/owid/etl/issues/<pr_number>/comments \ --jq '.[] | select(.user.login | test("codex";"i")) | .body'chatgpt-codex-connector[bot]means the review is done and there's nothing to address — don't keep polling for inline comments that will never come. - Fetch open review thread IDs via GraphQL:
gh api graphql -f query='{ repository(owner:"owid", name:"etl") { pullRequest(number:<pr_number>) { reviewThreads(first:20) { nodes { id isResolved comments(first:1) { nodes { body } } } } } } }' - For each unresolved Codex comment:
- If valid: apply the fix, commit, push, then resolve the thread:
gh api graphql -f query='mutation { resolveReviewThread(input:{threadId:"<thread_id>"}) { thread { id isResolved } } }' - If not valid / not applicable: reply explaining why, then resolve the thread:
gh api repos/owid/etl/pulls/<pr_number>/comments/<comment_id>/replies -f body="<explanation>" gh api graphql -f query='mutation { resolveReviewThread(input:{threadId:"<thread_id>"}) { thread { id isResolved } } }'
- If valid: apply the fix, commit, push, then resolve the thread:
- If neither the inline-comments endpoint nor the issue-comments endpoint shows a Codex post after 60 s, wait another 60 s and retry (up to ~5 min total). Codex can take 5–10 min — a clean review often arrives only as the top-level "no issues" comment.
Committing and pushing
Commit and push incrementally as you go — after each step that produces code changes. Don't wait until the end. Use descriptive commit messages with appropriate emojis (the one auto-prepended by etl pr for the chosen category + 🤖 for AI-written code).
At the end of the workflow, update the PR description with:
- A tracking-issue link as the first line of the Summary — e.g.
Tracks: [owid/owid-issues#NNNN](https://github.com/owid/owid-issues/issues/NNNN). Most data updates have a correspondingowid-issuesticket; try to find it by searching the title or<short_name>first, and ask the user for the issue number if you can't locate one rather than skipping the link silently. - A summary of key changes at the top
- Collapsed
<details>sections only for the pipeline steps that changed in a non-obvious way. Skip any step that's just the boilerplate generated byetl update— don't add a placeholder like "unchanged from boilerplate". The Summary already explains the why; per-step sections are only for the how, when the how isn't obvious from the diff. - A collapsed section for the Slack announcement
Downstream dependency check
After completing the update, check if any other datasets depend on the old version of the updated dataset:
rg "<namespace>/<old_version>/<short_name>" dag/ -g "*.yml" | grep -v "^dag/archive"
Filter out the old dataset's own DAG entries (snapshot → meadow → garden → grapher chain). Any remaining references are downstream dependents that still point to the old version.
If downstream dependents exist:
- Tell the user which datasets depend on the old version and need updating in a follow-up PR
- Add a "Downstream dependencies" section to the PR description (not collapsed — this is important) listing the dependent datasets with a note that they should be updated to point to the new version in a follow-up PR
Removing the old version & reordering the DAG
After the ETL update, etl update appends the new version entries to the bottom of the main DAG file while the old version's entries stay in their original slot. Always ask the user whether to remove the old version — but never skip this checklist item, and when the user agrees, always do the reorder too.
Workflow when the user agrees:
- Delete the old version. Remove its entries (snapshot → meadow → garden → grapher) from the main DAG file (e.g.,
dag/poverty_inequality.yml) and delete its files (etl/steps/...,snapshots/...). The archive dag (dag/archive/*.yml) is not edited by hand —etl archive-dagreconstructs it from git history, recording each removed step with the commit where it was last active (for recovery viagit checkout). - Move the new entries into the old slot so the dataset stays grouped with its neighbours and section comment. The new entries should not remain at the bottom of the main DAG.
- Preserve the original section comment (same indentation as the old block) above the new entries.
- Prefer the nested (compact) DAG format.
etl updateemits the flat form (each step a separate top-level key with a flat dep list); the loader (etl/dag_helpers.py:_parse_dag_yaml) also accepts the nested form, where the chain is declared inline and flattens to the same graph. The nested form is the team's preferred style and is usually what the archived old block already used:
Convert the relocated new entries to nested while reordering, so the active and archived blocks match. Verify it parses withdata://grapher/<ns>/<v>/<short>: - data://garden/<ns>/<v>/<short>: - data://garden/regions/2023-01-01/regions - data://meadow/<ns>/<v>/<short>: - snapshot://<ns>/<v>/<short>.csvpython -c "from etl.dag_helpers import load_dag; load_dag()"(a malformed nesting raises). - Verify:
rg "<namespace>/<old_version>/<short_name>" dag/ -g "*.yml" | grep -v "^dag/archive"returns nothing, andrg "<namespace>/<new_version>/<short_name>" dag/ -g "*.yml"shows the entries only in the main file (under the section comment), not at the bottom. - Run
make checkand commit with🔨🤖 Remove old <name> entries and reorder DAG.
Expect a Codex false-positive on the archive edit. Because this step touches dag/archive/*.yml, Codex often flags it ("avoid updating archived DAG entries" — the AGENTS.md rule against editing archived files). This is expected: archiving is the explicitly-requested workflow step, and the rule's own "unless explicitly asked" exception applies. Reply citing that and resolve the thread — don't revert the archive.
Final QA hand-off — Anomalist + Chart Diff in Wizard
This is the last step, after the DAG archive has been committed. Don't auto-run these — they're human-judgment tools. Hand off the two staging links so the user can review and click through:
Anomalist — flags variables whose new values diverge from the old version beyond statistical thresholds. Catches accidental scale changes, base-year rebases that propagated the wrong way, and silent drops.
http://staging-site-<container_branch>/etl/wizard/anomalistCheck the upgrade detectors' coverage before handing off. Anomalist's
upgrade_missing/upgrade_changedetectors only compare old→new variable pairs from the wizard's variable-mapping table — and the indicator upgrader persists mappings only for charted indicators. If only some of the dataset's indicators are used in charts (the common case), the upgrade detectors silently skip the rest, and a partial mapping suppresses the shortName-inference fallback that would otherwise cover everything. Verify withWizardDB.get_variable_mapping_raw(): if it has fewer pairs than the dataset has indicators, rebuild the full mapping by shortName (old vs. newvariablesrows bydatasetId) and re-run:STAGING=<branch> .venv/bin/etl anomalist --anomaly-types upgrade_missing --anomaly-types upgrade_change \ --dataset-ids <new_dataset_id> --variable-mapping '<full json mapping>' --forceThen spot-check the stored
anomalies.dfReducedrows include indicators beyond the charted ones.Chart Diff — shows side-by-side before/after thumbnails for every chart that uses an upgraded indicator. Catches visual regressions the schema-level checks miss (axis ranges, color steps, legend changes).
http://staging-site-<container_branch>/etl/wizard/chart-diff
Important: derive <container_branch> correctly. The staging hostname is not simply staging-site-<branch>. The container name is produced by get_container_name(branch) in etl/config.py:
- Replace
/,.,_with-in the branch name. - Strip a leading
staging-site-if present. - Truncate to the first 28 characters (Cloudflare DNS limit).
- Strip any trailing
-.
Branches over 28 chars therefore get clipped. Example: data-military-expenditure-2026 (30 chars) → container data-military-expenditure-20 → hostname staging-site-data-military-expenditure-20. The simplest way to get the correct value is to call the helper:
.venv/bin/python -c "from etl.config import get_container_name; print(get_container_name('<branch>'))"
Tell the user something like: "Final QA: please review Anomalist and Chart Diff in the Wizard. If anything looks off, let me know and I'll investigate."
These pages need a fresh staging build, so they're only meaningful after the PR's grapher upload to staging has completed and the staging server has rebuilt.
Guardrails and tips
END_YEAR/ "as of" framing for status/event datasets. When a dataset records events (and derives a status time series) and its latest event year lags the release date, you face a choice: forward-fill the latest status to the release year, or stop the series at the last event year and note the "as of" date in metadata. Prefer the latter — forward-filling invents data points for years with no source information (and shifts anEND_YEAR-style constant ripples through the whole series). Keep the series at the last real year and add the currency note todescription_processingand adescription_keybullet (e.g. "The legal status shown for each country reflects the situation as of."). Confirm the choice with the user; they may change their mind (in this update we forward-filled to the release year, then reverted to the last event year + an "as of" note). - Re-test "manual upload" snapshots — the blocking may be inverse-UA. When a snapshot's docstring says the file is uploaded manually because "the website blocks the download request", verify that claim before carrying it into the new version. Some hosts (e.g. the IMF Datamapper) reject browser-like User-Agents with 403 while letting plain, honestly-identified clients through — the inverse of the usual bot-blocking — and the ETL downloader's default UA (
DEFAULT_USER_AGENTinetl/download_helpers.py) is browser-like, so the original author may have misdiagnosed an automatable source. Test both directions (plainrequestsvs. browser UA) against the direct file URL. If the plain UA works: seturl_downloadin the.dvcand passuser_agent="owid-etl/1.0 (https://ourworldindata.org)"(or similar plain UA) tosnap.create_snapshot(...). Keep the snapshot.pyscript in that case — the script-less.dvc-only path (run_snapshot_dvc_only) callscreate_snapshot()without auser_agentand would 403 — and say so in the docstring so nobody deletes it as "redundant". - DAG consistency: After
etl update, always verify that all new steps indag/main.ymlreference each other with the new version. A common bug is garden depending on old meadow or old snapshot — this silently loads stale data. - Never return empty tables or comment out logic as a workaround — fix the parsing/transformations instead.
- Column name changes: update garden processing code and metadata YAMLs (garden/grapher) to match schema changes.
- Indexing: avoid leaking index columns from
reset_index(); format tables withtb.format(["country", "year"])as appropriate. - Metadata validation errors are guidance — update YAML to add/remove variables as indicated.
- Mixed-type object columns at meadow: when
pd.read_csvproduces anobjectcolumn that mixes strings andNaN(common for sparse text columns like sources/comments/punishments), the feather repacker rejects it. Cast those columns to pandas"string"dtype beforetb.format(...). paths.regionsauto-resolves DAG dependencies:paths.regions.add_population(tb)andpaths.regions.add_aggregates(tb, regions=[...])pick up thepopulationandincome_groupsdatasets directly from the DAG. Don'tpaths.load_dataset("population")and pass it through unless the helper specifically asks for the dataset — the parameter is unused.- WB income-group aggregates: add the four classification names (
High-income countries,Upper-middle-income countries,Lower-middle-income countries,Low-income countries) to yourREGIONSlist and adddata://garden/wb/<latest>/income_groupsto the DAG.paths.regions.add_aggregates(...)auto-resolves the classification. - Detect structural placeholders dynamically: when a source ships "balanced panel" rows that are zero everywhere by design (status combos that exist only for completeness), detect them at runtime (
groupby(...).max() == 0) and assert the count matches the codebook. A coding change in the source then surfaces as a test failure instead of silently shipping noise. - Codebook-vs-data inconsistencies: when the codebook documents one thing but the actual CSV shows another (placeholder claimed but non-zero rows present, etc.), preserve the data as-shipped and flag it in the PR description for the producer to confirm. Don't silently force the data to match the codebook.
- Grapher
.meta.ymlonly when it adds something: the grapher step inherits everything viadefault_metadata=ds_garden.metadata, so drop the grapher.meta.ymlif it only duplicates the garden values. Keep it only for genuine grapher-side overrides. processing_level: majorrequiresdescription_processing: keepprocessing_level: minoras the common default and override tomajoronly on indicators that have adescription_processingfield. Don't blanket-setmajoron the common block and then leave country-level proportions without their own processing note.- Per-indicator description_processing reads better than a generic shared note: when an indicator is derived (combined-categorical buckets, regional aggregates, computed counts), spell out that indicator's derivation. Reusing named definitions for shared boilerplate is fine; just compose them into per-indicator sentences rather than dropping a single generic note across all indicators.
description_keyindefinitions.commonpropagates only to indicators without their own list: if you want a bullet to appear on every indicator, either keep it oncommon.description_keyand don't define per-indicator lists (it inherits), or prepend it explicitly to each per-indicator list (treats it as a "first bullet" pattern).- Phantom-category audit on categorical indicators: after building categorical indicators, sweep every indicator and compare YAML
sort:labels against the unique values that actually appear in the data. Phantom labels (declared insort:or in a category map but never produced) clutter chart legends with empty buckets. Either drop them fromsort:anddescription_key, or remove them from the map if they can never occur given the data shape. Re-run the audit on every data refresh — phantoms can reappear when a category is dropped upstream. NOTE:comments for the next maintainer when behaviour is data-conditional: when something in the code holds only because of the current data shape (e.g. "only 4 indicators have an EoE=0 row", "only Brazil 2025 is a transition-year artefact"), leave a# NOTE:comment near the relevant block asking the next data update to re-audit. Helps future maintainers spot which assumptions might decay before they bite.- Indicator Upgrader CLI for one-shot chart remaps: when v1 → v2 short_names change so much that the auto-upgrader can't match them, drive the remap manually. Write a small script that calls
WizardDB.add_variable_mapping(mapping={old_id: new_id, ...}, dataset_id_old=..., dataset_id_new=..., comments="...")with the explicit pairs, then runfrom apps.indicator_upgrade.upgrade import cli_upgrade_indicators; cli_upgrade_indicators(dry_run=True)to preview affected charts, and(dry_run=False)to apply. Mappings stay in the wizard DB untilWizardDB.delete_variable_mapping()is called, so a slug-collision failure can be recovered by fixing the slug and rerunning the upgrade — only un-upgraded charts get reattempted. The active staging DB is inferred from the current git branch. - Drop-in vs restructure decision point: when the new dataset has a different shape (long vs wide, more policies, changed score semantics, dropped composite measures),
etl update --renameis the wrong starting point — the structure of meadow/garden/grapher needs to follow the new shape, and the rename flow will only produce confusion. Spot this fork early at the snapshot/codebook stage, before runningetl update. Scaffold the new chain via thecreate-etl-stepsskill (wraps the wizard's cookiecutter templates) or launch the wizard UI withetlwizand use its "ETL Steps" page — both produce a consistent meadow/garden/grapher skeleton to fill in. Once scaffolded, read the v1 scripts as a reference for the source-specific logic that's still relevant (column-rename maps, status/category normalisations, country harmonisation map, sanity checks, codebook-driven structural assertions) — don't copy the v1 structure blindly, but port the bits that still apply to the new schema.
When the update is review-heavy and you need iterative back-and-forth with a topic owner over staging, see the report-indicator-changes skill for drafting the message.
Artifacts (expected)
workbench/<short_name>/snapshot-runner.mdworkbench/<short_name>/progress.mdworkbench/<short_name>/notes_to_check.md(one entry per carried-over# NOTE:/# TODO:, plus detectedsanity_checksfunctions and their log-control flags)workbench/<short_name>/sanity_checks.log(only if step 5b ran)workbench/<short_name>/meadow_diff_raw.txtandmeadow_diff.mdworkbench/<short_name>/garden_diff_raw.txtandgarden_diff.mdworkbench/<short_name>/harmonization.logandharmonization_audit.md(from step 5c)workbench/<short_name>/indicator_upgrade.json(if indicator-upgrader was used)workbench/<short_name>/update-context.yml(canonical facts gathered during the update; consumed bydata-updates-comms)workbench/<short_name>/slack-announcement.mdworkbench/<short_name>/data-update.md(public-facing post draft for OWID /latest, from step 9b)
Example usage
- Minimal catalog URI with explicit old version:
update-dataset data://snapshot/irena/2024-11-15/renewable_power_generation_costs 2023-11-15 update-irena-costs
Common issues when data structure changes
- SILENT FAILURES WARNING: Never return empty tables or comment code as workarounds!
- Column name changes: If columns are renamed/split (e.g., single cost → local currency + PPP), update:
- Python code references in the garden step
- Garden metadata YAML (e.g.,
food_prices_for_nutrition.meta.yml) - Grapher metadata YAML (if it exists)
- Index issues: Check for unwanted
indexcolumns fromreset_index()— ensure proper indexing withtb.format(["country", "year"]). - Metadata validation: Use error messages as a guide — they show exactly which variables to add/remove from YAML files.