update-dataset - SKILL.md Agent Skill

name: update-dataset description: End-to-end dataset update workflow with PR creation, snapshot, meadow, garden, and grapher steps. Use when user wants to update a dataset, refresh data, run ETL update, or mentions updating dataset versions. metadata: internal: true

Update Dataset (PR → snapshot → steps → grapher)

Use this skill to run a complete dataset update with Claude Code subagents, keep a live progress checklist, and pause for user approval only when something needs attention.

Paired skill — keep in sync. /review-data-pr is the reviewer-side counterpart of this skill: it verifies the outcomes of the author-side steps defined here. Whenever you add, remove, or change a workflow step in this file, check whether review-data-pr/SKILL.md needs a matching reviewer-side check (and add it in the same commit if so). The reverse also holds — see the mirror note there.

Inputs

<namespace>/<old_version>/<name>
Get <new_version> as today's date by running date -u +"%Y-%m-%d"

Optional trailing args:

branch: The working branch name (defaults to current branch)

Assumptions:

All artifacts are written to workbench/<short_name>/.
Persist progress to workbench/<short_name>/progress.md and update it after each step.
Persist reusable update facts to workbench/<short_name>/update-context.yml as they are discovered. This is the canonical context artifact for the PR description, review handoff, and data-updates-comms.

Progress checklist (maintain, tick live, and persist to progress.md)

Parse inputs and resolve: channel, namespace, version, short_name, old_version, branch
Clean workbench directory: delete workbench/<short_name> unless continuing existing update
Run ETL update workflow via etl-update subagent (help → dry run → approval → real run)
Add yourself to dataset.owners in the new garden .meta.yml (don't reorder; preserve existing names and markers)
Catalog # NOTE: / # TODO: comments carried over from the old step files into notes_to_check.md
Detect any sanity_checks functions and their log-control flags; append to notes_to_check.md
Create or reuse draft PR and work branch
Update snapshot and compare to previous version; capture summary
Meadow step: run + fix + diff + summarize
Garden step: run + fix + diff + summarize
Review sanity_checks output (enable log flag, re-run, scan log, revert flag) — if none found and the garden step does non-trivial logic, recommend adding them; if present but missing value bounds (positive / [0,1] / [0,100] per indicator type), suggest those too (see 5b-bis)
Country harmonization audit: validate .countries.json against canonical regions (flag provider regions not yet in the regions dataset → /add-provider-regions), audit .excluded_countries.json, scan garden log for missing/unused/unknown warnings
Region-provider drift: if this dataset's aggregates are in regions.yml (defined_by: <provider>), check whether the new version changed the provider's region set or country membership; if so, update regions.yml and re-propagate via /add-provider-regions
Grapher step: run + verify (skip diffs), or explicitly mark N/A
Re-evaluate each catalogued # NOTE: / # TODO: against fresh data; delete resolved workarounds + comments together, or record status in PR body
Check metadata: typos, Jinja spacing, style guide compliance
Verify indicator-metadata coverage, dataset.update_period_days, snapshot DVC date_published and citation_full year (etl update copies both verbatim — bump to the producer's real release date / year, or to date_accessed / current year if the source doesn't publish one), and that all URLs resolve (HEAD-check)
Commit, push, and update PR description
Run indicator upgrade on staging and persist report
Update update-context.yml with published chart count and 1–3 chart views for the public announcement
Render Slack announcement via data-updates-comms, add to PR description, post @codex review as a separate PR comment, and notify user to post it to #data-updates-comms
Draft public-facing "Data update" post for OWID /latest, add to PR description, hand to user for review and publication
Address Codex review comments (fix valid ones + resolve all threads)
Run downstream-dependency check (rg "<namespace>/<old_version>/<short_name>" dag/ -g "*.yml" | grep -v "^dag/archive"); for each consumer outside the dataset's own chain, decide with the user whether to bump in this PR or document under "Downstream dependencies" for a follow-up PR (see "Downstream dependency check" section below for details)
Ask the user whether to remove the old DAG entries; if yes, delete them and their files AND relocate the new entries into the old slot (see "Removing the old version & reordering the DAG") — don't forget this step
Hand off Wizard QA links to the user (Anomalist + Chart Diff on the staging branch) — this is the final step

Persistence:

After ticking each item, update workbench/<short_name>/progress.md with the current checklist state and a timestamp.

Checkpoints — when to pause

Default: keep going. Run through the full workflow without stopping unless one of the conditions below is met.

Stop and ask the user when:

A step fails and the fix is ambiguous (multiple reasonable approaches, or you're unsure of the correct one)
Data structure changed significantly (columns removed/renamed, large row count drops, schema changes that may affect charts)
Country harmonization has new unmatched countries that need manual decisions
The snapshot requires a manual download or credentials you don't have
Indicator upgrade had imperfect matches (< 100% similarity) that need human review
Anything that could silently break charts or lose data

Don't stop for:

Routine assertion count updates (just update them and note in the summary)
Clean step runs with only row increases
Expected warnings (SettingWithCopyWarning, known unmapped territories)
Straightforward filename/version reference updates

When you do stop, present a concise summary of the issue and what options exist.

When the update isn't a drop-in version bump

Some updates carry structural changes that make the standard rename-only flow the wrong tool. Recognise them up front and adjust the workflow.

Triggers — any of these means you're in restructure territory, not a version bump:

short_name changes (producer rebranded the dataset).
File format/schema changes (wide → long, different file extension with a different column set, new dimensions).
Policy/indicator set changes substantially (splits, dropped composites, newly added areas).
Score semantics change (e.g. binary → continuous with subnational coverage).

Workflow adjustments:

Skip etl update. The rename-only flow copies the old step files into a new folder — useless when the schema is different. Author the new step chain by hand, using the old version as inspiration but not as a starting copy.
Add the new chain to the DAG before removing the old. Leave both chains active while you build and validate v2; remove the v1 entries only once v2 is on staging and the chart remap is queued or done.
Decide on naming convention upfront. Ask the user whether to preserve v1 short_names where they map cleanly, or to adopt the source's fresh naming scheme. Fresh naming is cleaner but means the auto-Indicator-Upgrader can't help.
Hand-curate the v1 → v2 indicator mapping. When short_names change entirely, the auto-upgrader has nothing to match on, but the Indicator Upgrader also matches on title — so if v2 titles are descriptive (full sentences rather than the bare short_name), you can hand the user a table of v1 title → v2 title pairs and they can drive the chart remap from there. Generate this table from the v1 meta.yml + the v2 grapher catalog.
Defer the Slack and /latest announcements until charts have been remapped. Both posts depend on charts.published_count and charts.selected_views from the v2 chain. Drafting them before the remap gives the wrong count (zero) and no representative views. Tell the user to ping you when the chart remap is done, then run steps 8 / 9 / 9b.

For the long-format with dimensions sub-case specifically (e.g. one row per (country, year, <dim1>, <dim2>)), use the modern OWID pattern:

Meadow + garden: tb.format(["country", "year", <dim1>, <dim2>, ...], sort_columns=True).
Aggregations: paths.regions.add_aggregates(tb, index_columns=[...full key...], regions=REGIONS, aggregations={...}).
Grapher: pass long tables through unchanged; the framework auto-expands them into per-cell variables.
Metadata: variables are keyed by the long-column name, with <% if <dim> == "X" and <dim2> == "Y" %>...<% endif %> Jinja blocks inside title, description_short, display.name. Grep this repo for tb.format(["country", "year" with more than two index entries to find current reference examples.
Jinja coverage: after building the grapher dataset, verify every active (dim1, dim2) cell renders a non-empty value — read every column from the built grapher dataset and assert metadata.title is non-empty. A dimension combination with no matching <% if %> branch ships an untitled indicator.

Workflow orchestration

Initial setup
- Check if workbench/<short_name>/progress.md exists to determine if continuing existing update
- If starting fresh: delete workbench/<short_name> directory if it exists
- Create fresh workbench/<short_name> directory for artifacts
Run ETL update command (etl-update subagent)
- Inputs: <namespace>/<old_version>/<short_name> plus any required flags
- Pick the URI that matches what's actually changing:
  - If the source data is changing — new source files, modified extractor, anything that affects the snapshot output — run from the snapshot URI with --include-usages. This bumps the whole chain (snapshot → meadow → garden → grapher) to the new version together:
```
etl update snapshot://<ns>/<old_v>/<short>.<ext> --include-usages
```
  - If only garden logic / metadata is changing and the source data is unchanged, run from the garden URI. This bumps garden and grapher only; snapshot and meadow stay on the old version.
- Either way, run etl update once. Don't call it separately per channel — that leaves stale version references in the DAG (e.g., new garden pointing to old meadow).
- Perform help check, dry run, approval, then real execution; capture summary for later PR notes
- After running, always verify the dag file: grep for the old version and confirm all internal references between the new steps point to the new version (e.g., garden depends on new meadow, not old meadow).
- etl update writes the new entries in the flat DAG form — convert them to the nested (compact) form now, while you're in the file, rather than leaving it until archiving (otherwise the flat block tends to survive the whole update unnoticed). See the example and load_dag() parse-check under "DAG archiving & reordering" step 4.

1a-bis) Add yourself to dataset.owners in the new garden .meta.yml

You've just become a contributor to this dataset, so add your canonical OWID name to its owners: list. Don't reorder — keep the existing primary first; append yourself at the end. Skip if you're already there.

Your canonical name must match an entry in the schema enum (schemas/dataset-schema.json). Resolve it from git config user.name via etl.owners.resolve_owner; if that returns None, add a mapping in etl/owners.py and a row in the schema enum before continuing.

Edit the YAML in place, preserving comments and the existing # review / # backport / # fasttrack markers on other entries.

1b) Check for outdated practices (check-outdated-practices skill)

After etl update creates new step files, run the /check-outdated-practices skill on every new step file — including helper modules that etl update doesn't generate but you copied by hand (e.g. *_omms.py), since those carry legacy patterns too
The skill reads the extension as the source of truth for the full pattern set (the geo.add_* aggregation/population helpers are flagged, not just geo.harmonize_countries) — don't rely on a remembered subset
Fix any findings before proceeding — this avoids propagating legacy patterns into new versions
geo.harmonize_countries → paths.regions.harmonize_names is mechanical and safe. geo.add_regions_to_table → paths.regions.add_aggregates changes the aggregation core — prove equivalence with a controlled A/B test, not a diff against the old feather. Build the new garden both ways against the same current catalog (swap the call, rebuild, save output; revert, rebuild, save output) and diff the two. Do NOT conclude "the helper shifts aggregates across all years" from a new-vs-old-feather diff — that conflates the helper with upstream-dataset drift (see step 5). In practice the two helpers are equivalent bar tiny historical edge cases (e.g. one region-year's population residual); if so, modernize. add_aggregates also auto-resolves income groups from the DAG, so it's the right tool when you later need WB income-group aggregates.

1c) Catalog # NOTE: / # TODO: comments in the copied step files (don't resolve yet)

Run rg -n "#\s*(NOTE|TODO|FIXME|HACK|XXX):" snapshots/<namespace>/<new_version>/ etl/steps/data/{meadow,garden,grapher}/<namespace>/<new_version>/.
Filter out generic boilerplate (e.g. # NOTE: To learn more about the fields, hover over their names. at the top of .meta.yml).
Save the remaining actionable items to workbench/<short_name>/notes_to_check.md — one entry per annotation, recording file path, line number, which step it lives in (meadow/garden/grapher), and what the workaround does.
Don't act on them yet. Resolution requires fresh data and happens after each step's run — see step 6a.

1d) Detect sanity-check logic in the copied step files Sanity checks live in two different forms — detect both:

Function form — def sanity_check… / sanity_check…( call sites. Often gated by a module-level boolean flag (DEBUG, SHOW_SANITY_CHECK_LOGS, LONG_FORMAT) that defaults to False to keep normal runs quiet. Examples: etl/steps/data/garden/wb/.../world_bank_pip.py (SHOW_SANITY_CHECK_LOGS), etl/steps/data/garden/wid/.../world_inequality_database.py (DEBUG + LONG_FORMAT), etl/steps/data/garden/lis/.../luxembourg_income_study.py (no flag; prints unconditionally via tabulate).
Inline comment form — # Sanity check / # Sanity checks / # sanity check marking an inline assertion block that isn't wrapped in a dedicated function. Very common: etl/steps/data/garden/emdat/.../natural_disasters.py, etl/steps/data/garden/emissions/.../national_contributions.py, etl/steps/data/garden/irena/.../renewable_capacity_statistics.py. These usually have no log flag — the block simply runs on every step execution and either passes or raises.

Run a combined sweep:

rg -n -i "def sanity_check|sanity_check\(|#\s*sanity check" \
    snapshots/<namespace>/<new_version>/ \
    etl/steps/data/{meadow,garden,grapher}/<namespace>/<new_version>/

Append a "Sanity checks" section to workbench/<short_name>/notes_to_check.md listing each hit — for each, record: file path + line number, which form (function vs. inline comment), the name of any log-control flag (function form only), and a one-line description of what's being asserted (read the surrounding 5–10 lines).

Don't act yet — the review happens in step 5b once the garden step has been run on the new data.

Create PR and integrate update via subagent (etl-pr)
- Inputs: <namespace>/<old_version>/<short_name>
- Create or reuse draft PR, set up work branch, and incorporate the ETL update outputs
Snapshot run & compare (snapshot-runner subagent)
- Inputs: <namespace>/<new_version>/<short_name> and <old_version>
Hand-maintained snapshots + editorial data edits. Some snapshots have no url_download — the .py prompts for --path-to-file and the docstring says the data was "provided by email" / curated by hand. When the update is a small editorial correction (the user gives you the facts directly, e.g. "country X did Y in year Z"), you can produce the new snapshot yourself: copy the previous version's data file (data/snapshots/<ns>/<old_version>/<file>), change only the specific cells, and assert in a quick script exactly which rows/cells changed (and that all others are byte-identical) before running etls ... --path-to-file <edited>. Then update the .py docstring to document the edit and bump the .dvc date_published / citation_full year (ask the user whether it's a new producer release or an OWID-applied edit — see step 6c). Verify the user's stated facts against the existing data first — some may already be encoded from a prior release (in this update, one of the two reported events was already in the live snapshot; only the other was a genuine change). Tell the user what's already present rather than blindly re-adding it.
Meadow step repair/verify (step-fixer subagent, channel=meadow)
- Run, fix, re-run; produce diffs
- Save diffs and summaries
Garden step repair/verify (step-fixer subagent, channel=garden)
- Run, fix, re-run; produce diffs
- Save diffs and summaries
Diff against a freshly-rebuilt old version, not the stale feather on disk. The old version's data/garden/.../ feather was built whenever it last ran — possibly against an earlier snapshot of a shared upstream dataset (population, regions, income_groups). A fresh build of the new version uses the current upstream, so a naive new-vs-old-feather diff shows differences in every population-weighted cell across all years and regions — pure upstream drift that has nothing to do with your change. Before trusting any diff, rebuild the old version on the current catalog and diff against that:
```
.venv/bin/etlr data://meadow/<ns>/<old_version>/<short> --private --force --only
.venv/bin/etlr data://garden/<ns>/<old_version>/<short> --private --force --only
```
The apples-to-apples diff should collapse to just your intended change. Mention the drift separately in the PR (Chart Diff on staging will show it, because the live data is also stale relative to current upstream). This bit me twice in one update — don't skip it.

When NaN can appear on one side, don't let .fillna(False) hide it. In a cell-by-cell diff, (a - b).abs() <= tol evaluates to NaN when exactly one side is NaN; a downstream .fillna(False) then silently drops that real one-sided change. Treat "one side NaN, other side not" as a difference explicitly.

5b) Review sanity-checks output (only if step 1d catalogued any) Handling depends on the form catalogued in step 1d.

Function form with a log-control flag (e.g. SHOW_SANITY_CHECK_LOGS, DEBUG):

Flip the flag to True at the top of the garden step file.

Re-run the garden step, capturing output:

.venv/bin/etlr data://garden/<namespace>/<new_version>/<short_name> --private --force --only \
    > workbench/<short_name>/sanity_checks.log 2>&1

Review the log: scan for AssertionError, error, warning, dropped, outliers flagged by country/year, unexpected totals. Surface actionable findings in the PR description under a "Sanity-check findings" collapsed section.
Revert the flag to its original value (usually False) before committing. Verify with git diff that the garden file has no unintended changes.

Function form with no flag, or inline # Sanity check(s) comment blocks:

Read each catalogued block (pull 5–15 lines of context around the hit) to understand what invariant is being tested.
Important: a sanity check can enforce its finding either by raising (assert, raise) or by logging (paths.log.warning, .critical, even .fatal). Logging variants do NOT fail the step — so "step 5 passed" is not proof that every invariant held. If the block uses logging, re-run the step and scan stdout/stderr for the relevant keywords; don't trust the exit code alone.
For non-trivial invariants (monotonicity, totals, bounds), also spot-check qualitatively against the fresh garden output via a short .venv/bin/python snippet.
Record any anomalies under "Sanity-check findings" in the PR description. No log artifact to keep here since the step's own output is the evidence.

In either form: if sanity_checks raise AssertionError on the new data, stop and decide with the user whether the assertion needs a threshold bump, whether upstream data genuinely broke, or whether the invariant is obsolete. If the check only logs, treat a new/expanding set of warnings the same way — they're the signal the sanity check was written to produce.

Watch for silent-delete patterns. Some sanity_checks functions also mutate the table — e.g. world_bank_pip's sanity_checks drops rows that fail invariants and reports the count via the log-control flag. With the flag off the deletions still happen; the reviewer just never learns which rows disappeared. When reading a sanity_checks function, scan for drop, filter, tb = tb[...] — anything that removes rows — and list every deletion in the PR body, not just the warning counts. If the deletion seems newly applicable to upstream fixes (e.g. the row should no longer be anomalous in the new release), that's a candidate for removing the workaround entirely.

5b-bis) Recommend sanity checks when the garden step lacks them (or lacks value bounds) Runs after the garden step is built. Two triggers:

Step 1d catalogued no sanity checks and the garden step does more than a straight load-and-format (harmonization, column drops/renames, aggregations, derivations) — recommend adding sanity_check_inputs / sanity_check_outputs. A Codex review will flag this anyway; better to handle it during the update.
The step has checks but you can spot missing value-bound coverage of the kinds below — suggest the additions.

Ground every threshold in the built data before writing it. Run min/max/nunique/value_counts per indicator on the built garden table first — never write plausible-sounding bounds. The classic trap: "% of GDP" indicators look like percentages but legitimately exceed 100 (IMF PFMH: gross debt up to 495%, expenditure 595% in crisis years; UK's post-Napoleonic 260% debt is on a published chart). A blind 0–100 assert fails the build on day one.

Suggested checks by indicator type (propose the applicable ones, not all):

Indicator type	Check
Share / proportion	values in [0, 1]
Percentage of a whole	values in [0, 100]
"% of GDP" and similar ratios	non-negative for levels (revenue, spending, debt); can exceed 100 — verify against data before capping. Where a ≤100 bound mostly holds, enforce it with a documented exception set (e.g. PFMH expenditure ≤ 100 outside `{Equatorial Guinea, Kuwait, Kiribati}` — each with a comment explaining why it's legitimate) so a new country crossing the line fails for review
Balances, growth rates, interest rates	can legitimately be negative — do NOT impose ≥ 0
Mutually exclusive share categories	components sum to 100% (or 1) per (country, year), within a small rounding tolerance (e.g. `abs(sum - 100) < 0.1`) — first verify against the data that the source's categories are actually exhaustive (some breakdowns ship without an "other" remainder, or only for a subset of rows)
Categorical flags / codes	exact value set (e.g. `<= {0, 1}`) and non-null
Input schema	set-equality on expected columns — catches the next source rename (PFMH: `debt` → `d`) with a clear message
Coverage	country-count floors (≥ previous version); a drop is usually a parsing/mapping regression — re-audit before bumping the constant

Implementation conventions: constants at the top, run() first, check functions below run(); sanity_check_inputs(tb) right after loading meadow, sanity_check_outputs(tb) right before paths.create_dataset(...); plain assert with messages that name the offending values. Reference example: imf/2026-06-12/public_finances_modern_history.py.

sanity_check_outputs runs before tb.format(...), so column names are still as the code produced them — format() lowercases/underscores them afterward. If the step builds columns like f"{col}_{status}_pop" where status is "Legal"/"Illegal"/"missing", the pre-format columns are status_Legal_pop (mixed case), not status_legal_pop. Select such columns case-robustly (e.g. [c for c in tb.columns if c.endswith("_pop")] then filter on c.lower()), or you'll get a KeyError that only surfaces at runtime. Keep update-specific facts (e.g. "country X is Legal in year Y") out of the committed checks — verify those via the garden diff at update time so the checks stay valid across future releases.

Negative-test the checks: after the step passes on real data, simulate each failure mode (rename a column, corrupt a flag, push a value out of bounds) and confirm the matching assertion fires — a check that never trips is untested code.

5c) Country harmonization audit Run after the garden step completes (and after 5b if it ran). Verifies that the country entities reaching the garden output are canonical, and that the mappings/exclusions consumed by paths.regions.harmonize_names(...) are well-formed. Output: workbench/<short_name>/harmonization_audit.md.

Modern API. Garden steps should be calling paths.regions.harmonize_names(tb, country_col=..., countries_file=..., excluded_countries_file=...) — the wrapper in etl/data_helpers/geo.py:1874. If you find a step still using the deprecated geo.harmonize_countries(...) directly, step 1b's /check-outdated-practices should already have flagged it; treat that as a separate cleanup. The audit below is API-agnostic — both call sites end up emitting the same three warning strings. Some garden steps don't use the harmonizer at all and instead assign country inline in Python (no .countries.json involved); for those, the JSON checks below have nothing to look at — the garden-output check in step 5 is what catches non-canonical entities, so always run it.

Source of truth. Canonical names come from two datasets, both consulted by the harmonizer:

data/garden/regions/2023-01-01/regions — countries, continents, and OWID-defined aggregates. The runtime authority is paths.regions.tb_regions["name"]. This is built from etl/steps/data/garden/regions/2023-01-01/regions.yml plus a merge with regions.codes.csv and field defaults — don't parse the YAML in isolation or you'll miss the legacy entries and produce false positives.
data/garden/wb/<latest>/income_groups — the four World Bank income-group aggregates (High-income countries, Upper-middle-income countries, Lower-middle-income countries, Low-income countries). OWID treats the latest version of this dataset as the official one, so the audit must resolve the version dynamically (don't pin a date — it goes stale when WB publishes a refresh). The names live in the classification column of the income_groups_latest table.

The audit's "canonical" set is the union of these two. A .countries.json entry looks like "Source name": "Target name" — the audit checks that every target name (the value the source gets harmonized to) appears in either dataset. Anything else is flagged.

Capture a fresh garden log:

.venv/bin/etlr data://garden/<namespace>/<new_version>/<short_name> --private --force --only \
    > workbench/<short_name>/harmonization.log 2>&1

Scan the log for the three harmonization warnings. These are emitted by etl/data_helpers/geo.py (excluded list) and lib/datautils/owid/datautils/dataframes.py (mapping warnings) — the wording is stable:
```
rg -n "missing values in mapping\.|unused values in mapping\.|Unknown country names in excluded countries file:" \
    workbench/<short_name>/harmonization.log
```
For each warning, the entity list follows on subsequent lines (because harmonize_countries() is called with show_full_warning=True by default). Capture them.

Validate .countries.json target names against canonical regions + income groups. Each entry maps a source name (key) to a target / harmonized name (value); this check looks at the values. For each garden step in this update:

import json
from pathlib import Path
from owid.catalog import Dataset

# Resolve the canonical regions dataset dynamically (latest built version).
# Don't pin a date — when the regions step version advances, a hard-coded path
# would validate against a stale catalog and flag valid targets as non-canonical.
regions_dirs = sorted(Path("data/garden/regions").glob("*/regions"))
if not regions_dirs:
    raise RuntimeError(
        "No data/garden/regions/<version>/regions built locally — the audit can't "
        "run without the canonical regions catalog. Build it first with "
        "`.venv/bin/etlr data://garden/regions/<latest>/regions --private`."
    )
tb_regions = Dataset(str(regions_dirs[-1]))["regions"]
canonical_regions = set(tb_regions["name"].dropna().astype(str))

# Add OWID's official income-group aggregates to the canonical set, if available.
# OWID treats the latest income_groups version as official. This artifact is
# often not built locally during a non-income-groups dataset refresh — degrade
# gracefully (warn and skip) rather than aborting the audit.
ig_dirs = sorted(Path("data/garden/wb").glob("*/income_groups"))
if ig_dirs:
    ds_ig = Dataset(str(ig_dirs[-1]))
    canonical_income = set(ds_ig["income_groups_latest"]["classification"].dropna().astype(str).unique())
else:
    print(
        "[WARN] No data/garden/wb/<version>/income_groups built locally — "
        "skipping income-group enrichment. The four WB income-group aggregates "
        "(High/Upper-middle/Lower-middle/Low-income countries) may surface as "
        "'not in canonical' until you build that dataset."
    )
    canonical_income = set()

canonical = canonical_regions | canonical_income

mapping = json.loads(Path("etl/steps/data/garden/<namespace>/<new_version>/<short_name>.countries.json").read_text())
not_in_canonical = sorted({v for v in mapping.values() if v and v not in canonical})
print("Targets not in OWID's canonical regions or income groups:", not_in_canonical)

# Provider regional groupings the source ships (e.g. "Europe (WB)", "Asia and the Pacific (ILO)")
# surface here when OWID hasn't added that provider's regions yet. Flag the "(Provider)"-suffixed ones:
import re
provider_region_candidates = [v for v in not_in_canonical if re.search(r"\([\w .&-]+\)$", str(v))]
print("Look like provider regions not yet in the regions dataset:", provider_region_candidates)

A non-empty not_in_canonical list means the mapping points at entities that aren't registered in either the regions catalog or the income-groups dataset. This isn't automatically a bug — it's a heads-up. Stop and decide with the user before proceeding — same pattern as the global "Checkpoints — when to pause" section at the top of this skill. Common causes (in order from "fix" to "accept"): typo, retired alias used as canonical, casing/whitespace mismatch, or a custom aggregate the source defines that OWID has no equivalent for. For typos/casing — fix the JSON.

If the unmatched targets are a provider's regional grouping that OWID doesn't define yet — the provider_region_candidates above ("(Provider)"-suffixed names like "Europe (WB)", "Sub-Saharan Africa (ILO)"), or the source ships a region/subregion column feeding these — don't just accept them as outside-the-system. The proper fix is to add that provider's regions to the regions dataset with the /add-provider-regions skill (separate PR): they then become canonical, merge with regions/population infrastructure, and get a {provider}_region map indicator, instead of living outside the system. Surface this to the user and offer to run /add-provider-regions. Only for genuinely one-off, non-geographic groupings the regions system shouldn't own (BRICS, G7, G20) — accept and note in the PR description that those live outside the canonical system. For a real new historical region — add an entry to regions.yml in a separate PR.

Audit .excluded_countries.json. The file is optional; skip if it doesn't exist:

excluded_path = Path("etl/steps/data/garden/<namespace>/<new_version>/<short_name>.excluded_countries.json")
if excluded_path.exists():
    excluded = json.loads(excluded_path.read_text())
    suspicious_canonical = sorted(set(excluded) & canonical)
    # Also surface continents and aggregates separately for review
    aggregates = set(tb_regions[tb_regions["region_type"].isin(["continent", "aggregate"])]["name"].dropna().astype(str))
    suspicious_aggregates = sorted(set(excluded) & aggregates)
    print("Excluded entries that ARE canonical regions:", suspicious_canonical)
    print("Excluded entries that are continents/aggregates:", suspicious_aggregates)
    print("Full excluded list for review:", sorted(excluded))

suspicious_canonical is the actionable signal: each entry is a known country/region that we are dropping. Sometimes this is intentional (e.g. dropping "World" rows because the source double-counts them) — surface, don't auto-fix. Pause and ask the user if the list is non-empty. The full list is dumped so the LLM can also eyeball it for entities that aren't in canonical but look like real countries (typos, alternative names) we should be mapping rather than dropping.

Audit garden output entities. Always run this check, regardless of whether .countries.json exists or is populated — JSON mappings describe inputs to the harmonizer, but the entities that actually reach Grapher are whatever sits in the country column/index of the built garden tables. Inline country assignments (e.g. hardcoded tb["country"] = "England and Wales") and post-harmonization mutations both bypass the JSON check entirely; this is the only step that catches them.
```
from pathlib import Path

from owid.catalog import Dataset

garden_dir = Path("data/garden/<namespace>/<new_version>/<short_name>")
ds_garden = Dataset(str(garden_dir))

entities: set[str] = set()
for tname in ds_garden.table_names:
    tb = ds_garden[tname]
    # `country` can live in the index (after .format()) or as a regular column.
    if "country" in tb.index.names:
        entities.update(tb.index.get_level_values("country").dropna().astype(str).unique())
    elif "country" in tb.columns:
        entities.update(tb["country"].dropna().astype(str).unique())
    # tables with no country column are silently skipped (e.g. reference tables)

output_not_in_canonical = sorted(entities - canonical)
print("Garden output entities not in OWID's canonical regions or income groups:",
      output_not_in_canonical)
```
Same triage rules as the JSON-targets check (Python check #3): typo / casing / alias / legitimately custom aggregate. A non-empty list means at least one entity that ships to Grapher isn't registered in either the regions catalog or the income-groups dataset. Stop and decide with the user before proceeding. Common fixes: typo or casing → patch the inline assignment (or .countries.json, whichever is the source) so the value matches the canonical name; alias → switch to the canonical name; legitimate custom aggregate → accept and note in the PR description that the entity lives outside the canonical system.
Write findings to workbench/<short_name>/harmonization_audit.md with six sections, populated only when non-empty. Each section must list every flagged entity, not just a count — counts alone aren't actionable, the user (or you) needs to read the actual names to judge whether each is intentional. For long lists (>20 entries) group by pattern when the grouping is obvious (e.g. ILO's " (ILO)"-suffixed regions vs. international orgs vs. derived "World ..." aggregates) so the reviewer can scan categories instead of one flat list. Sections:
- ## Missing in mapping — countries in source data not in .countries.json (from log warning #1) — list each missing source name
- ## Unused mappings — .countries.json entries the data never used (warning #2) — list each unused source→target pair
- ## Unknown excluded entries — .excluded_countries.json entries not present in source data (warning #3) — list each
- ## Targets not in OWID's canonical regions or income groups — target names from .countries.json that aren't registered in either dataset (Python check #3) — list each target name and the source names that map to it
- ## Excluded entries matching canonical regions — possible over-exclusion (Python check #4) — list each
- ## Garden output entities not in OWID's canonical regions or income groups — distinct country values found in the built garden tables that aren't in canonical regions or income groups (Python check #5) — list each entity
Surface in PR. If any section was populated, add a collapsed "Harmonization audit" section to the PR description (after the per-step sections, before the Slack announcement) with the same listings, not just a summary. Empty sections can be omitted.

When you report progress to the user during the workflow, never just give a count — always include the list (or grouped categories) so they can judge in one glance.

Checkpoint summary:

"Targets not in OWID's canonical regions or income groups" or "Garden output entities not in OWID's canonical regions or income groups" or "Missing in mapping" non-empty ⇒ stop, decide with user.
"Excluded entries matching canonical regions" non-empty ⇒ stop, ask whether each exclusion is intentional.
"Unused mappings" or "Unknown excluded entries" non-empty ⇒ surface in PR description; not a blocker.

5d) Region-provider drift — if this dataset defines OWID regions, propagate the change The harmonization audit (5c-3) catches providers not yet in regions.yml. This check is the counterpart: a provider whose aggregates are already in regions.yml can change its regions on a version bump (new/removed/renamed regions, or shifted country membership), and regions.yml won't update itself.

Does this dataset own regions? grep "defined_by: <provider>" etl/steps/data/garden/regions/2023-01-01/regions.yml (e.g. ilo_1/ilo_2, maddison, wid, wb, who). No matches ⇒ skip this step.
Re-extract the provider's region→country mapping from the new version — from the same source the regions were originally derived from (a region/subregion column, region entities, or a table-of-contents tier; see /add-provider-regions Step 1) — and diff it against regions.yml (defined_by: <provider> members), set-equality per region, plus new/removed/renamed regions. If the new version no longer carries the composition, the provider's section in the world-region-map-definitions article links to the defining source — use it to re-derive the provider's existing regions (regions newly introduced by this update won't be documented there yet — get those from the source or ask the user).
If it drifted: stop and decide with the user (composition changes move every region-aggregated value downstream), then update regions.yml and re-propagate via /add-provider-regions — rebuild garden regions, refresh the {provider}_region grapher indicators + metadata, regenerate owid-grapher regions.data.ts (runRegionsUpdater), and update the article section.
If unchanged: note it and move on.

Grapher step run/verify (step-fixer subagent, channel=grapher, add --grapher)
- Skip diff

6a) Re-evaluate # NOTE: / # TODO: items from step 1c against fresh data Now that meadow, garden, and grapher have run on the new data, go back to workbench/<short_name>/notes_to_check.md and decide each item's fate. For each entry:

Identify what the workaround does (read the surrounding code).
Load the affected step's output with owid.catalog.Dataset (or inspect the raw snapshot) and compare corrected vs. uncorrected values. Cross-check the producer's release notes / changelog if available.
If the upstream issue is fixed → delete the workaround and its # NOTE: / # TODO: comments in the same commit, then re-run the affected step (use --force --only, add --grapher for grapher) so downstream artifacts pick up the change.
If the workaround is still needed → leave it and add a one-line status under a PR-description section titled "Not covered in this PR" (e.g. "Sierra Leone ×1000 correction still required — raw value in the 2026 file is still ~1/1000 of plausible"). These are deliberately deferred items the next updater should re-check. Delete the whole section if its last item gets resolved mid-PR.
If you're uncertain → keep it, flag it in the PR description, and ask the user.

Do this before step 6b (metadata checks) so any re-runs triggered by comment-removal happen before the metadata sweep, not after.

6b) Metadata quality checks — run after all ETL steps are built Run all four checks on the newly built garden and grapher datasets so every issue surfaces together. Each skill writes results to the terminal; fix what comes up before moving on.

Typos — /check-metadata-typos scoped to the current step. Run on each of the new .meta.yml files (garden first, then grapher). Accept or skip each suggested fix.
Jinja spacing — /check-metadata-spacing on the built garden and grapher datasets. Catches template artifacts like doubled spaces or stray newlines that only appear after Jinja rendering.
Style guide — /check-metadata-style on the grapher step. Audits user-facing fields (title, subtitle, description_short, display.name, presentation.*) against OWID's Writing and Style Guide. Rules live in .claude/skills/check-metadata-style/STYLE_GUIDE.md, so no Notion access is needed — but if the guide looks out of date, refresh that file from Notion in a separate PR.
Clarity for a general audience — read every user-facing field with non-specialist eyes. The other three skills enforce structure and style; this one judges whether the text is understandable.

Clarity checklist (do manually, no skill yet)

OWID readers are not domain experts. Walk each indicator's user-facing fields and flag anything that requires inside knowledge to parse.

Field	Clarity check
`title` / `presentation.title_public`	A non-specialist should know what the indicator measures from the title alone. Expand acronyms unless universally known (skip GDP; expand GWIS, MFI, SDG, IHME). Don't cram units into the title.
`description_short`	One or two short sentences: what the metric is and what it covers. No jargon without a gloss. Active voice. The chart subtitle is short by design — no run-on or stacked clauses.
`description_key`	Each bullet should land a distinct, useful fact. Skip filler ("this dataset is widely used"); prefer substantive caveats (coverage gaps, methodology limits, what counts/doesn't count).
`display.name`	Short legend label. Reads naturally on a chart axis/legend; doesn't restate the title.
`presentation.grapher_config.note`	Concise footnote, ≤1 sentence ideally.

Flag and rewrite when you find:

Acronyms or technical terms that aren't expanded the first time they appear
Sentences that only make sense if you already know the data source
Quantitative claims with no unit context (e.g. "burned area" without "in hectares" surfacing somewhere in the user-facing text)
Inconsistent terminology between indicators in the same dataset (e.g. "wildfires" in one, "vegetation fires" in another)
Domain phrases that have a plain-English equivalent (e.g. "anthropogenic emissions" → "human-caused emissions")

When a phrasing is ambiguous, propose a concrete rewrite — don't just flag it.

If any skill rewrites a .meta.yml, re-run the affected step so the built catalog reflects the edits. Add --grapher when the affected step is on the grapher channel — without it the local catalog is updated but staging stays stale, so the step 7 indicator upgrade sees the old text.

# garden / meadow:
.venv/bin/etlr <channel>/<namespace>/<new_version>/<short_name> --private --force --only
# grapher:
.venv/bin/etlr grapher/<namespace>/<new_version>/<short_name> --grapher --private --force --only

Then re-run the relevant check to confirm zero remaining violations.

6c) Indicator metadata coverage, dataset block, and link verification The other quality checks catch content issues; this step catches missing fields and broken URLs before they reach review.

Snapshot DVC freshness. etl update clones the previous snapshot's .dvc content verbatim except for date_accessed. Always re-check date_published and the year in citation_full / attribution under snapshots/<ns>/<new_version>/*.dvc — they will otherwise silently ship the old version's values. Set date_published to the producer's real release date when discoverable; otherwise copy date_accessed. Bump the year in citation_full and attribution to match.

Citation year vs date_published year. After setting both, check whether the year inside citation_full / attribution matches date_published's year. If they differ, confirm it's intentional before shipping: it's legitimate when the producer labels the release by edition rather than publish date (e.g. UN IGME's "2025 report" published 2026-03-17 → citation_full (2025), date_published 2026), but otherwise it's a stale citation. When the gap is deliberate, leave a one-line note for the reviewer so they don't re-flag it.
Last-Modified header as date_published source. When the producer's page states no release date (common on fully JS-rendered sites like the IMF Datamapper), the download URL's HTTP Last-Modified header is a defensible source — it's the server's own timestamp for the file, not an inference. Corroborate it against a release-named filename (e.g. …Dec 2025.xlsx + Last-Modified: Fri, 12 Dec 2025) and note the provenance when reporting to the user.
Stale producer description on a JS-rendered page. If the .dvc meta.origin.description is producer text that may have changed but the page is an SPA shell (static HTML empty, WebFetch 403s, Wayback archives only the shell), don't burn time probing API endpoints and don't rewrite the producer's text to match the data — the blurb can legitimately lag their own releases (FPP shipped 153 countries while the page said 151). Ask the user to paste the page text from their browser, then diff it against the existing .dvc text and apply only the substantive changes. Clipboard pastes flatten typography (curly quotes → straight, en-dashes → hyphens) — keep the existing typographic punctuation unless the words themselves changed.

Mandatory fields per indicator. For every indicator in the garden .meta.yml, confirm these are set (either on definitions.common or per-indicator):

Field	Notes
`title`	Per-indicator
`unit`	Common is fine
`short_unit`	Common is fine
`description_short`	Per-indicator
`description_key`	At least one bullet; usually common
`processing_level`	`minor` or `major`
`presentation.topic_tags`	At least one tag
`display.numDecimalPlaces`	Common is fine
`display.tolerance`	Common is fine — chart tolerance for missing years
`display.name`	Per-indicator — required for legend labels
`presentation.attribution_short`	Set explicitly — does NOT inherit from the origin's `attribution_short` (verified: MySQL `variables.attributionShort` stays `NULL` if it's only on the origin). Place under `definitions.common.presentation` for the common case.

Conditional: if processing_level: major, every indicator with that level MUST also have description_processing.

Not mandatory (skip if you don't need them): presentation.title_public, presentation.title_variant, presentation.attribution.

Dataset block. Garden .meta.yml MUST include update_period_days:

dataset:
  update_period_days: <N>

This controls the auto-update cadence. Even when the rest of the dataset: block is empty, never strip update_period_days — leave the block in place with just that field.

Link verification. Run a HEAD request on every URL in the new .dvc and .meta.yml files (all channels — meadow .meta.yml files matter when they exist). Anything non-2xx is a signal, not a guaranteed break — always double-check before acting:

for url in $(rg --no-filename -No "https?://[^\"' ]+" snapshots/<namespace>/<new_version>/ etl/steps/data/{meadow,garden,grapher}/<namespace>/<new_version>/ \
    | sed -E 's/[).,;:>]+$//' \
    | sort -u); do
    printf "%s  %s\n" "$(curl -sI -L -o /dev/null -w '%{http_code}' --max-time 15 -A 'Mozilla/5.0' "$url")" "$url"
done

The --no-filename flag prevents rg from prepending path: to each match (otherwise the for-loop tries to curl path:url and every check returns 000). -A 'Mozilla/5.0' sometimes coaxes a real response out of Cloudflare-fronted hosts, but it doesn't always work — see the next note.

curl non-2xx ≠ broken. Cloudflare-fronted sites (notably ourworldindata.org) can return 404 to curl on URLs that work fine in a browser, depending on edge-node routing, IP geolocation, and cached state. Before treating a 4xx as a real failure:

Re-check with WebFetch (the built-in tool). It uses a different code path and a Mozilla/5.0 UA that Cloudflare usually accepts. A 200 with a coherent page body is authoritative — trust it over curl.
If WebFetch also fails, sanity-check the Wayback Machine: https://web.archive.org/web/<year>/<url>. A recent successful snapshot means the URL is reachable on the public internet and your local route is the problem.
Only act on a true failure — both WebFetch and Wayback unable to reach the URL — and even then flag and ask the user before silently rewriting an external link in metadata. Replacing a working link with a "safer" alternative because of a curl false-positive is worse than leaving the original. Apply the same restraint here as the global "Checkpoints — when to pause" section.

Fix any genuinely-non-2xx hit on url_main, url_download, license.url, or URLs referenced from description / description_key before continuing. The sed strips trailing markdown/punctuation chars (), ., ,, ;, :, >) so URLs inside [text](url) aren't reported as broken because of a stray closing paren.

Verification. After editing, re-run the affected step (with --grapher if grapher) so the catalog reflects the changes. Then confirm presentation.attribution_short actually landed:

from owid.catalog import Dataset
ds = Dataset("data/grapher/<ns>/<v>/<short_name>")
tb = ds["<table>"]
print(tb["<col>"].metadata.presentation.attribution_short)  # must NOT be None

Or after the staging upload:

make query SQL="SELECT shortName, attributionShort FROM variables WHERE catalogPath LIKE '%<ns>/<v>/<short_name>%'"

Indicator upgrade (optional, staging only)

First upload the new grapher dataset to the staging DB (required before the upgrader can detect it):

STAGING=<branch> .venv/bin/etlr data://grapher/<namespace>/<new_version>/<short_name> --grapher --private

Then run the automatic upgrader:

STAGING=<branch> .venv/bin/etl indicator-upgrade auto

CRITICAL: After the upgrader finishes, always verify it actually worked by querying staging:

mysql -h "staging-site-<branch>" -u owid --port 3306 -D owid -e "SELECT COUNT(*) FROM chart_dimensions cd JOIN variables v ON cd.variableId = v.id WHERE v.catalogPath LIKE '%<namespace>/<new_version>%'"

If the count is 0, the upgrade did not run — re-run it.

The auto-upgrader only remaps grapher charts — NOT ETL-defined explorers or MDims. Explorers (export://explorers/...) and multidims (export://multidim/...) reference indicators by catalog path and are rebuilt by running their export steps, which the indicator-upgrader never touches. If the dataset has any (check the DAG: rg "export://(explorers|multidim)/.*/<short_name>" dag/ -g "*.yml"), they'll still point at the old variables on staging until you re-run them:

STAGING=<branch> .venv/bin/etlr export://explorers/<ns>/latest/<short> export://multidim/<ns>/latest/<short> ... --export --private

Verify none still reference the old version (both queries should return empty):

# explorers
mysql -h "staging-site-<branch>" -u owid -P 3306 -D owid -e "SELECT DISTINCT ev.explorerSlug FROM explorer_variables ev JOIN variables v ON ev.variableId=v.id WHERE v.catalogPath LIKE '%<ns>/<old_version>%'"
# mdims
mysql -h "staging-site-<branch>" -u owid -P 3306 -D owid -e "SELECT DISTINCT mdp.slug FROM multi_dim_x_chart_configs mx JOIN variables v ON mx.variableId=v.id JOIN multi_dim_data_pages mdp ON mdp.id=mx.multiDimId WHERE v.catalogPath LIKE '%<ns>/<old_version>%'"

Also verify narrative charts. Narrative-chart configs can pin a variableId in their own patch (not inherited from the parent chart), and that id can date from a version older than the one this update started from — left stale by a previous cycle. The auto-upgrader only carries old_version → new_version mappings, so it can never remap those, and the chart_dimensions count above can't catch them either: narrative-chart variable ids live only inside chart_configs. The upgrader warns about this case ("was NOT remapped: it pins indicators from a version of the upgraded dataset that the mapping does not cover") — watch its output for that warning. But the upgrader only visits narrative charts whose parent chart was affected by the mapping; a stale narrative chart whose parent no longer uses any mapped indicator is never visited and stays silent. So always run this catch-all scan over all narrative-chart configs:

# STAGING=<branch> .venv/bin/python — scan narrative chart configs for variables on ANY old version
import json
from etl.config import OWID_ENV
old_vars = set(OWID_ENV.read_sql(
    "SELECT v.id FROM variables v JOIN datasets d ON d.id = v.datasetId "
    "WHERE d.catalogPath LIKE %(p)s AND d.catalogPath NOT LIKE %(new)s",
    params={"p": "%/<short_name>", "new": "%<new_version>%"})["id"])
df = OWID_ENV.read_sql("SELECT nc.id, nc.name, nc.parentChartId, JSON_EXTRACT(cc.full, '$.dimensions') AS dims "
                       "FROM narrative_charts nc JOIN chart_configs cc ON cc.id = nc.chartConfigId")
stale = [(r["id"], r["name"], d["variableId"]) for _, r in df.iterrows() if r["dims"]
         for d in json.loads(r["dims"]) if d.get("variableId") in old_vars]
print(stale)  # must be empty

If any are found, remap them with an explicit mapping via the upgrader's own CLI helpers (load by parent chart id; cli_upgrade_indicators won't reach them because it finds charts via chart_dimensions):

from sqlalchemy.orm import Session
import etl.grapher.model as gm
from apps.indicator_upgrade.upgrade import push_new_narrative_charts_cli
from etl.config import OWID_ENV
with Session(OWID_ENV.engine) as session:
    ncs = gm.NarrativeChart.load_narrative_charts_by_parent_chart_ids(session, [<parent_chart_id>])
ncs = [nc for nc in ncs if nc.id in {<stale_nc_ids>}]
errors = push_new_narrative_charts_cli(ncs, {<old_var_id>: <new_var_id>})

Then re-run the scan and confirm it's empty.

Update context for public announcement

Maintain workbench/<short_name>/update-context.yml as the canonical record of facts discovered during the update. Do not wait until the end if a fact is already known; append/update as each step completes.

At minimum, record:

dataset:
  namespace: <namespace>
  old_version: <old_version>
  new_version: <new_version>
  short_name: <short_name>
  title: <public dataset title, if known>
  producer: <producer, if known>
source:
  release_date: <snapshot origin date_published, if known>
  next_release: <best-effort, or null>
  url_main: <source page, if known>
  citation_full: <citation, if known>
coverage:
  year_min: <garden min year>
  year_max: <garden max year>
  countries: <distinct countries/entities>
  includes_regions: <true/false>
  sparse_recent_year_note: <note, or null>
charts:
  published_count: <published chart count>
  size_qualifier: <handful|moderate|large|massive>
  explorers: <list of published explorer slugs using this data, or []>
  mdims: <list of MDim slugs using this data, with published flag, or []>
  selected_views:
    - title: <chart title>
      slug: <chart slug>
      rationale: <why this represents the dataset>
update_summary:
  snapshot_diff: <short summary or artifact path>
  meadow_diff: <short summary or artifact path>
  garden_diff: <short summary or artifact path>
  notable_changes: []
  sanity_check_findings: []
  resolved_workarounds: []
editorial_context:
  why_it_matters_snippets: []
  caveat_snippets: []
  interesting_update_snippets: []

Query the staging DB for published charts using the new dataset (filter on c.publishedAt IS NOT NULL). Draft/unlisted charts must not be counted in the announcement:

SELECT c.id, cc.slug, cc.full->>'$.title' as title, cc.full->>'$.type' as type, cc.full->>'$.hasMapTab' as hasMapTab
FROM charts c
JOIN chart_configs cc ON cc.id = c.configId
JOIN chart_dimensions cd ON cd.chartId = c.id
JOIN variables v ON cd.variableId = v.id
WHERE v.catalogPath LIKE '%<namespace>/<new_version>%'
  AND c.publishedAt IS NOT NULL
GROUP BY c.id

Charts are not the only surface — also count the explorers and MDims that use this data. Many datasets feed published OWID explorers and multi-dimensional collections, which the grapher-charts query above misses entirely. Run the export steps first (step 7) so these point at the new variables, then query both:
```
-- Explorers (note isPublished — only published ones count for the announcement)
SELECT DISTINCT ev.explorerSlug, e.isPublished
FROM explorer_variables ev
JOIN variables v ON ev.variableId = v.id
JOIN explorers e ON e.slug = ev.explorerSlug
WHERE v.catalogPath LIKE '%<namespace>/<new_version>%';

-- MDims (note published flag — drafts are published=0)
SELECT DISTINCT mdp.slug, mdp.published
FROM multi_dim_x_chart_configs mx
JOIN variables v ON mx.variableId = v.id
JOIN multi_dim_data_pages mdp ON mdp.id = mx.multiDimId
WHERE v.catalogPath LIKE '%<namespace>/<new_version>%';
```
Chart-based explorers can also attach via explorer_charts.chartId (join through chart_dimensions) rather than explorer_variables — check that table too if the variable-based query comes up empty but the DAG shows an explorer step. Record published explorers/MDims under charts.explorers / charts.mdims in update-context.yml, and fold them into the "How many charts did this update affect?" answer (e.g. "10 published charts, 3 explorers, plus 3 draft MDims"). Only count published surfaces (isPublished=1 / published=1) toward the public announcement; note unpublished ones for QA.
Map the published chart count to size_qualifier: 1–9 = handful, 10–49 = moderate, 50–199 = large, 200+ = massive.
Pick 1–3 selected_views using these criteria (in order of preference):
- Map views — immediately visual, readers can find their own country
- Charts with punchy, standalone headlines — titles that make a clear claim work best for social sharing
- Global trend charts (StackedArea / World) — show the big picture over time
- Skip: population-weighted variants (harder to read quickly), within-regime breakdowns (too niche), country-specific views
Add snippets for the editorial prompts from source metadata, garden/grapher metadata, resolved sanity-check/workaround notes, and non-routine PR changes. Keep these as snippets/facts, not polished Slack prose.

Slack announcement & PR update
- Run the data-updates-comms skill with workbench/<short_name>/update-context.yml as input. data-updates-comms is the canonical owner of the Slack form wording, copy-paste format, editorial framing, search URL, and any standalone fallback gathering. Do not duplicate that rendering logic here.
- Save the rendered draft to workbench/<short_name>/slack-announcement.md.
- If data-updates-comms reports missing mechanical fields, gather them, update update-context.yml, and re-render rather than inventing values. Ask the user if a missing field requires judgment.
- Add the announcement to the PR description as a collapsed <details> section titled "Slack Announcement", with the file content embedded inside a triple-backtick markdown fence.
- Post @codex review as a separate PR comment (not in the PR description) to trigger an automated code review. Use:
```
gh pr comment <pr_number> --body "@codex review"
```
- Tell the user, with a markdown link to the saved file so they can click through to open it: "Slack announcement drafted at [workbench/<short_name>/slack-announcement.md](workbench/<short_name>/slack-announcement.md) and added to the PR description. Please review and post it to #data-updates-comms." Always render the path as a markdown link […](…), not as inline-code — the chat UI renders it as clickable that way.

9b) Data update post (for OWID /latest) Draft the short reader-facing post that gets published on https://ourworldindata.org/latest. The team drafts these in Google Docs in the shared /Data updates Drive folder (https://drive.google.com/drive/folders/1oL0uLHKI6f2qi1rJA6-qFFRYEBw_-rfm), and OWID's CMS ingests the doc into the published feed.

The skill's job is to produce paste-ready Google Doc content in the exact CMS format the team uses (frontmatter title / excerpt / type / authors / kicker → \[+body\] marker → body prose with inline markdown links → {.cta} block → {.image} block → \[\] end marker). Don't invent your own format — every published post in the Drive folder follows the same shape.

This is separate from the Slack announcement — that one is a 10-field form for the internal channel; this one is a mini-blog-post for OWID readers, and the format is structured for CMS ingestion.

Steps:

Open .claude/skills/update-dataset/data-update-template.md and follow it — the template has the exact paste-ready format plus three worked examples (NVIDIA, H5N1, World Bank PIP) lifted verbatim from the Drive folder.
Use the facts already gathered in workbench/<short_name>/update-context.yml (step 8) — dataset.title, dataset.producer, source.url_main, source.citation_full, coverage.*, charts.published_count, charts.selected_views, and the editorial_context.* snippet lists. Also pull from workbench/<short_name>/slack-announcement.md (step 9 output) — the editorial framing already drafted there is the closest cousin. If a field needed for the post isn't yet in update-context.yml, gather it (snapshot DVC, garden .meta.yml, or url_main via WebFetch) and persist it back to the YAML so the next consumer doesn't re-do the work.
Title shape — a punchy finding/claim, a question, or an action/invitation. Not just the dataset name. See the template's "Field-by-field guidance" for examples and decision logic.
Body — 100–200 words, first-person, conversational. Sample: ATUS ~105, NVIDIA ~140, robots ~110, OECD Government at a Glance ~155, US data centers ~145, UNU-WIDER ~155, World Bank PIP ~190, ozone ~165, mobile money ~180, fertilizers ~170, H5N1 ~135. The body should give a reader a reason to care and at least one concrete number — not "I refreshed our charts".
Inline markdown links throughout the body for the producer's page, methodology pages, and related OWID articles. *italics* for emphasis, sparingly.
CTA URL choice:
- One chart focus ⇒ grapher URL https://ourworldindata.org/grapher/<slug>.
- Multiple charts (default) ⇒ search URL https://ourworldindata.org/search?datasetProducts=<URL-encoded dataset title> — value is the dataset title, resolved with this priority: (a) the dataset.title field in the garden .meta.yml if it's set there (an override), otherwise (b) the meta.origin.title field in the snapshot .dvc. Often includes a parenthetical acronym like Luxembourg Income Study (LIS) or World Bank Poverty and Inequality Platform (PIP). Not the bare producer field.
- Topic has an existing OWID explorer ⇒ https://ourworldindata.org/explorers/<name>.
- Curated topic page exists ⇒ topic URL (e.g. /sdgs).
- Do not use /collection/custom?charts=… URLs.
CTA text — descriptive: "Explore the updated data in our interactive charts" (default), "Explore all of the updated data in our interactive charts" (broad), "Explore the interactive version of this chart" (single chart), "Explore this data going back to YYYY in our interactive chart" (single chart with date depth).
Image filename — YYYY-MM-data-update-<slug>.png (e.g. 2026-04-data-update-h5n1-flu.png). The skill doesn't generate the image; the user adds it to the Doc separately.
Save the draft to workbench/<short_name>/data-update.md.
Add a collapsed <details> section titled "Data update post (for OWID /latest)" to the PR description, placed after the Slack-announcement section, with the file content embedded inside a triple-backtick markdown fence.
Tell the user, with a markdown link to the saved file so they can click through to open it: "Data update post drafted at [workbench/<short_name>/data-update.md](workbench/<short_name>/data-update.md) in the Google Docs CMS format. Please create a new Google Doc in /Data updates, paste the draft, attach the chart screenshot, and share for review." Always render workbench/<short_name>/data-update.md as a markdown link […](…) rather than as a bare path or inline-code path — the chat UI renders it as clickable that way.

Codex review: address comments and resolve threads

Codex's delivery channel depends on the verdict — poll both. A clean pass arrives as an issue comment ("Didn't find any major issues") from chatgpt-codex-connector[bot], with zero inline comments and no formal review object. A review with findings arrives as a formal review ("💡 Codex Review") with inline comments, and no issue comment. A watcher polling only one channel waits forever on the other outcome — treat a hit on either as completion.

Wait ~60 seconds after posting @codex review, then poll both channels:

gh api repos/owid/etl/issues/<pr_number>/comments | python3 -m json.tool   # clean-pass summary lands here
gh api repos/owid/etl/pulls/<pr_number>/comments | python3 -m json.tool    # findings land here as inline comments

Codex posts in one of two places — always check both. When it finds issues, it leaves inline review comments (the endpoint above) with resolvable threads. When it finds nothing, it posts a single top-level PR (issue) comment instead — no inline comments, no threads — e.g. "Codex Review: Didn't find any major issues. Keep it up!". So if the inline-comments endpoint is empty, check the issue comments before concluding Codex hasn't run yet:
```
gh api repos/owid/etl/issues/<pr_number>/comments \
  --jq '.[] | select(.user.login | test("codex";"i")) | .body'
```
A "no issues" / 👍 comment from chatgpt-codex-connector[bot] means the review is done and there's nothing to address — don't keep polling for inline comments that will never come.

Fetch open review thread IDs via GraphQL:

gh api graphql -f query='{ repository(owner:"owid", name:"etl") { pullRequest(number:<pr_number>) { reviewThreads(first:20) { nodes { id isResolved comments(first:1) { nodes { body } } } } } } }'

For each unresolved Codex comment:

If valid: apply the fix, commit, push, then resolve the thread:

gh api graphql -f query='mutation { resolveReviewThread(input:{threadId:"<thread_id>"}) { thread { id isResolved } } }'

If not valid / not applicable: reply explaining why, then resolve the thread:

gh api repos/owid/etl/pulls/<pr_number>/comments/<comment_id>/replies -f body="<explanation>"
gh api graphql -f query='mutation { resolveReviewThread(input:{threadId:"<thread_id>"}) { thread { id isResolved } } }'

If neither the inline-comments endpoint nor the issue-comments endpoint shows a Codex post after 60 s, wait another 60 s and retry (up to ~5 min total). Codex can take 5–10 min — a clean review often arrives only as the top-level "no issues" comment.

Committing and pushing

Commit and push incrementally as you go — after each step that produces code changes. Don't wait until the end. Use descriptive commit messages with appropriate emojis (the one auto-prepended by etl pr for the chosen category + 🤖 for AI-written code).

At the end of the workflow, update the PR description with:

A tracking-issue link as the first line of the Summary — e.g. Tracks: [owid/owid-issues#NNNN](https://github.com/owid/owid-issues/issues/NNNN). Most data updates have a corresponding owid-issues ticket; try to find it by searching the title or <short_name> first, and ask the user for the issue number if you can't locate one rather than skipping the link silently.
A summary of key changes at the top
Collapsed <details> sections only for the pipeline steps that changed in a non-obvious way. Skip any step that's just the boilerplate generated by etl update — don't add a placeholder like "unchanged from boilerplate". The Summary already explains the why; per-step sections are only for the how, when the how isn't obvious from the diff.
A collapsed section for the Slack announcement

Downstream dependency check

After completing the update, check if any other datasets depend on the old version of the updated dataset:

rg "<namespace>/<old_version>/<short_name>" dag/ -g "*.yml" | grep -v "^dag/archive"

Filter out the old dataset's own DAG entries (snapshot → meadow → garden → grapher chain). Any remaining references are downstream dependents that still point to the old version.

If downstream dependents exist:

Tell the user which datasets depend on the old version and need updating in a follow-up PR
Add a "Downstream dependencies" section to the PR description (not collapsed — this is important) listing the dependent datasets with a note that they should be updated to point to the new version in a follow-up PR

Removing the old version & reordering the DAG

After the ETL update, etl update appends the new version entries to the bottom of the main DAG file while the old version's entries stay in their original slot. Always ask the user whether to remove the old version — but never skip this checklist item, and when the user agrees, always do the reorder too.

Workflow when the user agrees:

Delete the old version. Remove its entries (snapshot → meadow → garden → grapher) from the main DAG file (e.g., dag/poverty_inequality.yml) and delete its files (etl/steps/..., snapshots/...). The archive dag (dag/archive/*.yml) is not edited by hand — etl archive-dag reconstructs it from git history, recording each removed step with the commit where it was last active (for recovery via git checkout).
Move the new entries into the old slot so the dataset stays grouped with its neighbours and section comment. The new entries should not remain at the bottom of the main DAG.
Preserve the original section comment (same indentation as the old block) above the new entries.
Prefer the nested (compact) DAG format. etl update emits the flat form (each step a separate top-level key with a flat dep list); the loader (etl/dag_helpers.py:_parse_dag_yaml) also accepts the nested form, where the chain is declared inline and flattens to the same graph. The nested form is the team's preferred style and is usually what the archived old block already used:
```
data://grapher/<ns>/<v>/<short>:
  - data://garden/<ns>/<v>/<short>:
    - data://garden/regions/2023-01-01/regions
    - data://meadow/<ns>/<v>/<short>:
      - snapshot://<ns>/<v>/<short>.csv
```
Convert the relocated new entries to nested while reordering, so the active and archived blocks match. Verify it parses with python -c "from etl.dag_helpers import load_dag; load_dag()" (a malformed nesting raises).
Verify: rg "<namespace>/<old_version>/<short_name>" dag/ -g "*.yml" | grep -v "^dag/archive" returns nothing, and rg "<namespace>/<new_version>/<short_name>" dag/ -g "*.yml" shows the entries only in the main file (under the section comment), not at the bottom.
Run make check and commit with 🔨🤖 Remove old <name> entries and reorder DAG.

Expect a Codex false-positive on the archive edit. Because this step touches dag/archive/*.yml, Codex often flags it ("avoid updating archived DAG entries" — the AGENTS.md rule against editing archived files). This is expected: archiving is the explicitly-requested workflow step, and the rule's own "unless explicitly asked" exception applies. Reply citing that and resolve the thread — don't revert the archive.

Final QA hand-off — Anomalist + Chart Diff in Wizard

This is the last step, after the DAG archive has been committed. Don't auto-run these — they're human-judgment tools. Hand off the two staging links so the user can review and click through:

Anomalist — flags variables whose new values diverge from the old version beyond statistical thresholds. Catches accidental scale changes, base-year rebases that propagated the wrong way, and silent drops.
```
http://staging-site-<container_branch>/etl/wizard/anomalist
```
Check the upgrade detectors' coverage before handing off. Anomalist's upgrade_missing / upgrade_change detectors only compare old→new variable pairs from the wizard's variable-mapping table — and the indicator upgrader persists mappings only for charted indicators. If only some of the dataset's indicators are used in charts (the common case), the upgrade detectors silently skip the rest, and a partial mapping suppresses the shortName-inference fallback that would otherwise cover everything. Verify with WizardDB.get_variable_mapping_raw(): if it has fewer pairs than the dataset has indicators, rebuild the full mapping by shortName (old vs. new variables rows by datasetId) and re-run:
```
STAGING=<branch> .venv/bin/etl anomalist --anomaly-types upgrade_missing --anomaly-types upgrade_change \
    --dataset-ids <new_dataset_id> --variable-mapping '<full json mapping>' --force
```
Then spot-check the stored anomalies.dfReduced rows include indicators beyond the charted ones.
Chart Diff — shows side-by-side before/after thumbnails for every chart that uses an upgraded indicator. Catches visual regressions the schema-level checks miss (axis ranges, color steps, legend changes).
```
http://staging-site-<container_branch>/etl/wizard/chart-diff
```

Important: derive <container_branch> correctly. The staging hostname is not simply staging-site-<branch>. The container name is produced by get_container_name(branch) in etl/config.py:

Replace /, ., _ with - in the branch name.
Strip a leading staging-site- if present.
Truncate to the first 28 characters (Cloudflare DNS limit).
Strip any trailing -.

Branches over 28 chars therefore get clipped. Example: data-military-expenditure-2026 (30 chars) → container data-military-expenditure-20 → hostname staging-site-data-military-expenditure-20. The simplest way to get the correct value is to call the helper:

.venv/bin/python -c "from etl.config import get_container_name; print(get_container_name('<branch>'))"

Tell the user something like: "Final QA: please review Anomalist and Chart Diff in the Wizard. If anything looks off, let me know and I'll investigate."

These pages need a fresh staging build, so they're only meaningful after the PR's grapher upload to staging has completed and the staging server has rebuilt.

Guardrails and tips

END_YEAR / "as of" framing for status/event datasets. When a dataset records events (and derives a status time series) and its latest event year lags the release date, you face a choice: forward-fill the latest status to the release year, or stop the series at the last event year and note the "as of" date in metadata. Prefer the latter — forward-filling invents data points for years with no source information (and shifts an END_YEAR-style constant ripples through the whole series). Keep the series at the last real year and add the currency note to description_processing and a description_key bullet (e.g. "The legal status shown for each country reflects the situation as of ."). Confirm the choice with the user; they may change their mind (in this update we forward-filled to the release year, then reverted to the last event year + an "as of" note).
Re-test "manual upload" snapshots — the blocking may be inverse-UA. When a snapshot's docstring says the file is uploaded manually because "the website blocks the download request", verify that claim before carrying it into the new version. Some hosts (e.g. the IMF Datamapper) reject browser-like User-Agents with 403 while letting plain, honestly-identified clients through — the inverse of the usual bot-blocking — and the ETL downloader's default UA (DEFAULT_USER_AGENT in etl/download_helpers.py) is browser-like, so the original author may have misdiagnosed an automatable source. Test both directions (plain requests vs. browser UA) against the direct file URL. If the plain UA works: set url_download in the .dvc and pass user_agent="owid-etl/1.0 (https://ourworldindata.org)" (or similar plain UA) to snap.create_snapshot(...). Keep the snapshot .py script in that case — the script-less .dvc-only path (run_snapshot_dvc_only) calls create_snapshot() without a user_agent and would 403 — and say so in the docstring so nobody deletes it as "redundant".
DAG consistency: After etl update, always verify that all new steps in dag/main.yml reference each other with the new version. A common bug is garden depending on old meadow or old snapshot — this silently loads stale data.
Never return empty tables or comment out logic as a workaround — fix the parsing/transformations instead.
Column name changes: update garden processing code and metadata YAMLs (garden/grapher) to match schema changes.
Indexing: avoid leaking index columns from reset_index(); format tables with tb.format(["country", "year"]) as appropriate.
Metadata validation errors are guidance — update YAML to add/remove variables as indicated.
Mixed-type object columns at meadow: when pd.read_csv produces an object column that mixes strings and NaN (common for sparse text columns like sources/comments/punishments), the feather repacker rejects it. Cast those columns to pandas "string" dtype before tb.format(...).
paths.regions auto-resolves DAG dependencies: paths.regions.add_population(tb) and paths.regions.add_aggregates(tb, regions=[...]) pick up the population and income_groups datasets directly from the DAG. Don't paths.load_dataset("population") and pass it through unless the helper specifically asks for the dataset — the parameter is unused.
WB income-group aggregates: add the four classification names (High-income countries, Upper-middle-income countries, Lower-middle-income countries, Low-income countries) to your REGIONS list and add data://garden/wb/<latest>/income_groups to the DAG. paths.regions.add_aggregates(...) auto-resolves the classification.
Detect structural placeholders dynamically: when a source ships "balanced panel" rows that are zero everywhere by design (status combos that exist only for completeness), detect them at runtime (groupby(...).max() == 0) and assert the count matches the codebook. A coding change in the source then surfaces as a test failure instead of silently shipping noise.
Codebook-vs-data inconsistencies: when the codebook documents one thing but the actual CSV shows another (placeholder claimed but non-zero rows present, etc.), preserve the data as-shipped and flag it in the PR description for the producer to confirm. Don't silently force the data to match the codebook.
Grapher .meta.yml only when it adds something: the grapher step inherits everything via default_metadata=ds_garden.metadata, so drop the grapher .meta.yml if it only duplicates the garden values. Keep it only for genuine grapher-side overrides.
processing_level: major requires description_processing: keep processing_level: minor as the common default and override to major only on indicators that have a description_processing field. Don't blanket-set major on the common block and then leave country-level proportions without their own processing note.
Per-indicator description_processing reads better than a generic shared note: when an indicator is derived (combined-categorical buckets, regional aggregates, computed counts), spell out that indicator's derivation. Reusing named definitions for shared boilerplate is fine; just compose them into per-indicator sentences rather than dropping a single generic note across all indicators.
description_key in definitions.common propagates only to indicators without their own list: if you want a bullet to appear on every indicator, either keep it on common.description_key and don't define per-indicator lists (it inherits), or prepend it explicitly to each per-indicator list (treats it as a "first bullet" pattern).
Phantom-category audit on categorical indicators: after building categorical indicators, sweep every indicator and compare YAML sort: labels against the unique values that actually appear in the data. Phantom labels (declared in sort: or in a category map but never produced) clutter chart legends with empty buckets. Either drop them from sort: and description_key, or remove them from the map if they can never occur given the data shape. Re-run the audit on every data refresh — phantoms can reappear when a category is dropped upstream.
NOTE: comments for the next maintainer when behaviour is data-conditional: when something in the code holds only because of the current data shape (e.g. "only 4 indicators have an EoE=0 row", "only Brazil 2025 is a transition-year artefact"), leave a # NOTE: comment near the relevant block asking the next data update to re-audit. Helps future maintainers spot which assumptions might decay before they bite.
Indicator Upgrader CLI for one-shot chart remaps: when v1 → v2 short_names change so much that the auto-upgrader can't match them, drive the remap manually. Write a small script that calls WizardDB.add_variable_mapping(mapping={old_id: new_id, ...}, dataset_id_old=..., dataset_id_new=..., comments="...") with the explicit pairs, then run from apps.indicator_upgrade.upgrade import cli_upgrade_indicators; cli_upgrade_indicators(dry_run=True) to preview affected charts, and (dry_run=False) to apply. Mappings stay in the wizard DB until WizardDB.delete_variable_mapping() is called, so a slug-collision failure can be recovered by fixing the slug and rerunning the upgrade — only un-upgraded charts get reattempted. The active staging DB is inferred from the current git branch.
Drop-in vs restructure decision point: when the new dataset has a different shape (long vs wide, more policies, changed score semantics, dropped composite measures), etl update --rename is the wrong starting point — the structure of meadow/garden/grapher needs to follow the new shape, and the rename flow will only produce confusion. Spot this fork early at the snapshot/codebook stage, before running etl update. Scaffold the new chain via the create-etl-steps skill (wraps the wizard's cookiecutter templates) or launch the wizard UI with etlwiz and use its "ETL Steps" page — both produce a consistent meadow/garden/grapher skeleton to fill in. Once scaffolded, read the v1 scripts as a reference for the source-specific logic that's still relevant (column-rename maps, status/category normalisations, country harmonisation map, sanity checks, codebook-driven structural assertions) — don't copy the v1 structure blindly, but port the bits that still apply to the new schema.

When the update is review-heavy and you need iterative back-and-forth with a topic owner over staging, see the report-indicator-changes skill for drafting the message.

Artifacts (expected)

workbench/<short_name>/snapshot-runner.md
workbench/<short_name>/progress.md
workbench/<short_name>/notes_to_check.md (one entry per carried-over # NOTE: / # TODO:, plus detected sanity_checks functions and their log-control flags)
workbench/<short_name>/sanity_checks.log (only if step 5b ran)
workbench/<short_name>/meadow_diff_raw.txt and meadow_diff.md
workbench/<short_name>/garden_diff_raw.txt and garden_diff.md
workbench/<short_name>/harmonization.log and harmonization_audit.md (from step 5c)
workbench/<short_name>/indicator_upgrade.json (if indicator-upgrader was used)
workbench/<short_name>/update-context.yml (canonical facts gathered during the update; consumed by data-updates-comms)
workbench/<short_name>/slack-announcement.md
workbench/<short_name>/data-update.md (public-facing post draft for OWID /latest, from step 9b)

Example usage

Minimal catalog URI with explicit old version:
- update-dataset data://snapshot/irena/2024-11-15/renewable_power_generation_costs 2023-11-15 update-irena-costs

Common issues when data structure changes

SILENT FAILURES WARNING: Never return empty tables or comment code as workarounds!
Column name changes: If columns are renamed/split (e.g., single cost → local currency + PPP), update:
- Python code references in the garden step
- Garden metadata YAML (e.g., food_prices_for_nutrition.meta.yml)
- Grapher metadata YAML (if it exists)
Index issues: Check for unwanted index columns from reset_index() — ensure proper indexing with tb.format(["country", "year"]).
Metadata validation: Use error messages as a guide — they show exactly which variables to add/remove from YAML files.