name: agami-reconcile
description: "Reconciles known (label, expected_value) numbers from an existing dashboard against agami's answers. Input can be a SCREENSHOT of a Metabase / Power BI / Tableau / Looker dashboard (Claude's vision extracts the pairs), a CSV, or numbers pasted inline — the user doesn't need to know which; they can just ask. For each pair, the skill generates a matching NL question, runs it through the active profile's semantic model, diffs actual vs expected, and surfaces matches in green and mismatches in red with drill-down receipts. The strongest onboarding demo for a skeptical data engineer — either we agree with their numbers (trust earned via evidence) or we surface a real definitional disagreement (trust earned via transparency)."
when_to_use: "Use when the user says 'reconcile against this dashboard', 'do these numbers match?', 'validate against my Tableau export', '/agami-reconcile ', drops a screenshot of a BI dashboard (Metabase/Power BI/Tableau/Looker/spreadsheet) and asks agami to reproduce the numbers, or pastes a CSV / table of known numbers. Requires agami-connect to have been run first (need a semantic model + examples library). A high-leverage validation surface for a skeptical data team — reproduce their dashboard numbers, or surface the definitional gap."
argument-hint: "<screenshot | path-to-csv | pasted numbers>"
agami reconcile
You are running the reconciliation harness. Goal: take labeled numbers from an existing dashboard (Tableau / Looker / Mode / Metabase / Power BI / spreadsheet) — most often a screenshot, sometimes a CSV or pasted list — and prove agami can reproduce each number. When numbers match, that's evidence the semantic model is right. When they don't, the receipt drill-down explains why — typically a definitional disagreement (gross vs net, refunds in vs out, FX rate at booking vs reporting date) — which is exactly the trust signal that makes a DE relax.
This skill orchestrates:
- Extract the
(label, expected_value)pairs from the input — a dashboard screenshot (via vision, confirmed with the user), a CSV, or a pasted list. Number parsing is always deterministic (reconcile.py). - Generate a matching NL question for each label.
- Run each question through the same NL→SQL→execute pipeline as agami-query.
- Diff actual vs expected with a tolerance.
- Present a markdown table with per-row status; for mismatches, render the full receipt as a drill-down so the user can find the definitional disagreement.
Spec for the deterministic helpers: scripts/reconcile.py (CSV parser + number normalization + diff with tolerance).
Conversation style
- Tight loops. This skill is a tool, not a tutorial. One question per turn, max two sentences of prose between phases.
- Surface mismatches loud. A reconcile run with 9/12 matches and 3 mismatches is a SUCCESSFUL run — the mismatches are the value. Lead with what didn't match.
- Don't paste raw SQL in chat. The receipt has it. Same hard rule as agami-query.
Phase 0: Preflight
Same checks as agami-query / agami-connect:
- Plan-mode check per
shared/plan-mode-check.md. This skill needs Bash + Read + Write — refuse if locked in plan mode. DO NOT write a plan file. DO NOT callExitPlanMode. Refusal text: "I can't reconcile in plan mode — each row runs a live query and writes a receipt. Switch to Auto or Edit Automatically mode (Shift+Tab to cycle) and re-invoke me with the CSV path." - Credentials present — read
<artifacts_dir>/local/credentialsfor the active profile. If missing, invoke/agami-connectto set up first; this skill needs a working DB connection. - Model present —
<artifacts_dir>/<profile>/org.yamlmust exist. If not, invoke/agami-connect. This skill needs an introspected model to generate questions against. - Input — accept any of three shapes; the user needn't know which. Detect what they gave:
- A screenshot / image of a dashboard (Metabase, Power BI, Tableau, Looker, a spreadsheet) — the common case. Go to Phase 1's vision branch.
- A CSV — a path in
$ARGUMENTS, or pasted inline (write inline CSV to/tmp/agami-reconcile-<ts>.csv). Go to Phase 1's CSV branch. - Numbers pasted inline as a list/table — treat as inline CSV.
If they gave nothing (or just asked "can you check my dashboard?"), ask once, welcoming all three: "Show me the numbers you want to check against — easiest is a screenshot of your dashboard (Metabase, Power BI, Tableau, a spreadsheet — whatever you have), but a CSV or a pasted list of
label: valueworks too." Don't make them figure out an export format.
- If they gave a file path, validate it exists. If not, surface the error and stop.
Phase 1: Extract the (label, value) pairs
Whatever the input shape, the goal is the same normalized rows JSON. Number parsing is always deterministic — it goes through reconcile.py, never the LLM eyeballing a value (a misread expected number would manufacture a false mismatch on a verification surface).
Vision branch — a dashboard screenshot
- Read the image and extract every labeled number you can see — KPI tiles, table cells, chart value labels — as
(label, raw_value)pairs. Keep the label the user would recognize ("Total Revenue", "Active Users — Apr"), and the value exactly as shown, verbatim ($4.2M,₹2.16Cr,42%,1,234) — don't convert it; the normalizer does that. - Write the pairs as a 2-column CSV (Write tool — never a heredoc/
python3 -c) to/tmp/agami-reconcile-<ts>.csv, then run the SAME normalizer as the CSV branch (below) so value parsing stays deterministic. - Confirm before reconciling — vision can misread. Show the extracted pairs as a small table and ask the user to fix any misread label/number: "I read these N numbers off your screenshot — correct anything I got wrong, then I'll reconcile." This confirm step is mandatory: a wrong expected-value isn't a model bug but it reads like one. If a tile is ambiguous or partly cut off, say so and skip it rather than guess.
CSV branch — a CSV path or inline-pasted CSV
python3 "$AGAMI_PLUGIN_ROOT/scripts/reconcile.py" parse --csv "<csv_path>" \
> /tmp/agami-reconcile-rows-<ts>.json
The helper (used by both branches) handles:
- Header detection (with-or-without first-row column names)
- 2-column or 3+ column inputs (3rd onward are appended to the label as context)
- Currency symbols / magnitude suffixes / accounting parens / percent (
$4.2M,₹2.16Cr,(123.45),42%) - Null sentinels (
n/a,—, blank)
Read the JSON. Each row is {label, expected_value, raw_value}. Discard rows where expected_value is null (unparseable) — surface a one-liner: "Skipped 2 rows where the value couldn't be parsed: 'X', 'Y'."
Surface to the user:
Parsed
<N>rows from<the screenshot / csv_path>. Reconciling now — typically<N> × 5–15sper row depending on query latency.
Phase 2: Generate questions + execute
For each row in the parsed list:
2a — Generate the NL question
Use the LLM to translate label (+ context if present) into the most natural English question whose answer should be expected_value. Examples:
| label | question |
|---|---|
Q3 2025 Revenue |
"What was total revenue in Q3 2025?" |
Active customers (Apr 2026) |
"How many active customers did we have in April 2026?" |
Pipeline value (open opps) |
"What's the total pipeline value across open opportunities?" |
Mean order size last 30 days |
"What's the average order size over the last 30 days?" |
The semantic model + examples library are loaded; let the LLM pick the right subject areas / entities / metrics that resolve the labeled term to a concrete query.
2b — Run via the agami-query pipeline
Invoke the same SQL-generation + execution path agami-query uses (Phases 2 + 3 of that skill — see agami-query/SKILL.md). Capture:
- The generated SQL
- The result (should be a single scalar, or a single row)
- The full chart-template HTML report (so the user can drill in for mismatches)
- The trust receipt (with confidence, signed-off-by, etc.)
If the SQL fails OR the result isn't a single scalar (e.g., the LLM-generated question returned a multi-row table), capture an error: Could not extract a single scalar from the result. These rows show up as error status in the report.
2c — Diff
python3 "$AGAMI_PLUGIN_ROOT/scripts/reconcile.py" diff \
--expected "<expected_value>" \
--actual "<actual_value_from_query>" \
--tolerance 0.01
Default tolerance: ±1%. The user can override with tolerance=N% in their original ask (e.g., "reconcile with 5% tolerance"). Tolerance applies to numeric comparisons; for text values (rare), use exact match.
Capture: match (bool), delta, delta_pct.
2d — Build the row record
Per row:
{
"label": "<from CSV>",
"question": "<LLM-generated NL question>",
"expected": <number>,
"actual": <number or null if errored>,
"delta_pct": <signed fraction or null>,
"match": true | false,
"status": "match" | "mismatch" | "error",
"report_path": "<artifacts_dir>/local/charts/<profile>/<ts>.html", // the full chart report for this query
"error": "<message if status=error, else null>"
}
Append all records to /tmp/agami-reconcile-results-<ts>.jsonl so the user can inspect later.
Phase 3: Present
3a — Summary line first
Reconciled <N> numbers: <M> match (within ±1%), <K> mismatch, <E> error.
3b — Mismatches table (lead with what didn't match)
Render the mismatches as a markdown table BEFORE the matches:
### Mismatches
| Label | Expected | Got | Δ | Drill-down |
|---|---:|---:|---:|---|
| Q3 2025 Revenue | $4,200,000 | $3,890,000 | -7.4% | <artifacts_dir>/local/charts/<profile>/...html |
| Active customers (Apr) | 12,450 | 11,920 | -4.3% | <artifacts_dir>/local/charts/<profile>/...html |
Cell formatting:
- Numbers carry the same currency / magnitude suffix as the input where unambiguous (echo the user's
raw_valueforExpected, formatGotwith the same shape). Δis the signed percent (red wins emphasized in chat by ✗ prefix if Markdown rendering allows; otherwise plain text).Drill-downlinks to the chart-template HTML for that query. Open these to see the full receipt — that's where the definitional disagreement lives.
For each mismatch row, surface a one-line interpretation under the table:
Q3 2025 Revenue — agami reports $3.89M, your dashboard says $4.2M (-7.4%). Open the receipt; the metric
revenuehere is gross of refunds in USD at invoice date. If your dashboard nets refunds, that's the gap.
This is where the trust win lands. The DE doesn't have to chase the disagreement — the receipt + your interpretation does it for them.
3c — Errors block (if any)
### Errors
| Label | What went wrong |
|---|---|
| Pipeline value (open opps) | Could not extract a single scalar — the question returned 47 rows. Try rephrasing or check that the metric exists in the model. |
3d — Matches summary (last, compact)
### Matches (within ±1%)
7 numbers reproduced cleanly: Q3 2025 Orders, MoM growth, Avg order size, Top customer, Customer count by region, Refund rate, Pipeline count.
Don't dump every match's drill-down — they're not interesting. The matches build the case; the mismatches drive the conversation.
3e — Closing prompt
Re-run with `tolerance=5%` to see softer matches, or open any drill-down to find the definitional gap.
End the turn. The user typically:
- Opens a mismatch's drill-down, finds the definitional gap, says "the dashboard is gross-of-refunds; can we update the metric?" — chain into
/agami-save-correctionto update the metric definition. - Asks
tolerance=5%to widen the matches. - Asks for a different CSV.
Hard rules
- No automatic question generation for ambiguous labels. If the label is too short or too vague (e.g.,
Total,Number,Value), surface to the user: "Row 5's label is just 'Total' — too ambiguous to translate to a question. Skipping. Add more context to the CSV (e.g.,Total Revenue Q3instead ofTotal) and re-run." Don't guess. - Receipt is non-optional. Every per-row run MUST produce a chart-template HTML report with the trust receipt — that's what the drill-down link points at, and it's what makes mismatches actionable. If the underlying query path can't produce a receipt (legacy pre-trust-layer model), refuse with: "This profile pre-dates the trust-layer launch. Re-run
/agami-connectto enable receipts, then retry." - Don't write to the semantic model from this skill. Reconcile reads + diffs; it never mutates. If a definitional disagreement surfaces and the user wants to update the metric, route them through
/agami-save-correction. - CSV stays local. Don't upload, don't summarize-and-send. The reconcile run produces local artifacts (
/tmp/agami-reconcile-results-*.jsonl+ the per-query chart HTML) and nothing leaves the machine.
Error handling cheat sheet
| Symptom | Action |
|---|---|
<csv> doesn't exist |
Refuse with one line: "File not found: <path>." |
| CSV has 0 parseable rows | Refuse: "No rows with parseable numeric values. Common cause: the value column has formatting like $1,234.56 (USD) — try simplifying to 1234.56." |
| Every row errors out | Surface a meta-error: "All /agami-connect reintrospect if your schema changed; check the model has the relevant tables." |
| Single mismatch but huge delta (> 100%) | Note in the interpretation: "The delta is large enough to suggest a unit mismatch (cents vs dollars, count vs percentage) rather than a definition gap. Check agami.unit on the relevant field." |
| User pastes inline CSV instead of a path | Accept it. Write to /tmp/agami-reconcile-pasted-<ts>.csv and proceed. |
| Screenshot is blurry / a value is cut off / can't read a tile | Don't guess the number. Extract what's legible, and tell the user which tiles you skipped: "Couldn't read 'Pipeline value' clearly — re-snip it or type that one in." |
| User says "reconcile my dashboard" but attaches nothing | Ask for the screenshot (or CSV / pasted numbers) per Phase 0.4 — don't proceed without the expected numbers. |
Hard rule for screenshots
The screenshot is an image of numbers, and a misread expected value reads exactly like a model bug. So: (1) the value is parsed by reconcile.py, never by eyeballing; (2) the extracted (label, value) table is always confirmed with the user before any query runs (Phase 1 vision branch). The image stays local — same as the CSV (Hard rule #4); it's never uploaded or summarized off-machine.
Roadmap (not in v1)
- Tableau / Looker / Mode export parsing — parse
.twb/.twbx/ JSON exports directly (today a screenshot of any of them already works via the vision branch). - Recurring reconcile runs — wire into
agami testso the golden-test suite includes reconciliation against a pinned dashboard.