name: datagol-data-query
description: Generate and execute ad-hoc SQL against a DataGOL-registered data source, then render the result inline in chat. Triggered by "show me X from my Postgres", "query my database for Y", "how many Z are in my data source", "what's the top N of W", "run a SQL query against ", or any analytical question that targets a registered data source. Loads the source's schema, generates the SQL, posts it to /dataSources/api/v1/internal/queryResults, and shows the rows. Pairs with datagol-data-publish when the user wants the result saved as a workbook instead of (or in addition to) being shown right now.
DataGOL Data Query — Generate SQL, Execute, Render
This skill lets the agent answer data questions by writing and running SQL against any data source the user has registered via datagol-data-connector. It's an agent-runtime skill — the agent itself fetches the schema, generates the SQL, calls the query API, and includes the result in its chat response. It does not scaffold a UI in the generated app; for that, use datagol-data-publish (creates a queryable workbook) or build a CRUD UI on top of the published workbook with datagol-app-development.
Read
datagol-data-connectorfirst. This skill assumes the user has already registered a data source and you have itsdata_source_id. Schema discovery (getDataSourceSchema) lives in the parent.
⛔ STOP — read this template before generating anything
Every single answer this skill produces follows the same shape. No exceptions, no creative reordering.
1. ```sql
<the SQL the agent ran>
<prose summary OR Markdown table OR single number — the actual answer>
<follow-up question, ONE line, only when relevant>: "Want me to chart this?" ← only if the data could benefit from a chart AND the user didn't already ask for one
That's the entire response shape. Three blocks, in this order, every time. **Do not** write a chart file (`*.html`, `*.svg`) in this same turn unless the user's *current* message contains an explicit visual ask (chart / plot / graph / visualize / draw / dashboard / "make it visual"). Saying *"All data in. Now let me build the chart."* and creating an HTML file is **wrong** — it skips the answer (block 2) and assumes a request the user didn't make. Stop after block 3 and wait.
### Anti-patterns (don't do these)
❌ **Skip the data, jump to the chart.** *"Got the rows. Building the chart now."* + write_file `chart.html`. The user can't read raw rows; you owe them block 2 first. And you owe them silence after the question, not a chart they didn't ask for.
❌ **Bury the SQL in the curl tool row.** The chat's `Running` tool row truncates after ~200 chars and is collapsed by default. The user can't review or copy the SQL from there. The fenced ```sql block in your prose is the contract.
❌ **"They'll probably want a chart."** No. If they wanted one, they would have said so. Suggest, don't pre-build.
✅ **Good.** SQL block → 5-row Markdown table or one-line summary → *"Want me to chart this?"* → STOP.
---
## When to use
- "How many films are in my Postgres database?"
- "Show me the top 10 customers by revenue from my warehouse."
- "What's the distribution of payment amounts last month?"
- "Run this query against my data source: `…`"
- "Sample 50 rows from the `actor` table."
When **not** to use:
- The data isn't in a DataGOL-registered SQL source — it's in a workbook. Use `datagol-workbook-operations` instead.
- The user wants to **save** the result, not just see it. Use `datagol-data-publish`.
- The user wants to repeatedly query the same shape from a generated app's UI. Publish first, then build a workbook UI with `datagol-app-development`.
## Response composition (read this before generating anything)
Three rules that override anything below if they conflict. They apply on every single answer this skill produces.
### 1. Always show the executed SQL
Echo the SQL the agent executed in a fenced ```sql``` block in your prose answer — *every time*, with no exceptions. The chat panel may also show a `Running` tool row that *contains* the SQL inside a curl, but that row is collapsed/truncated and isn't a substitute. Users need to be able to read, copy, and edit the query without expanding tool args. Place the SQL block in the prose, before the result.
```sql
SELECT product_name, units_sold, revenue
FROM datawarehouse_plsql.public.sales_summary
ORDER BY units_sold DESC
LIMIT 5
2. Prose / table first, visuals last
When the answer has both a textual component (prose summary, Markdown table, single number) and a visual (chart, HTML, SVG), render the textual answer first and the visual at the end. Order in the rendered chat must be:
- The SQL block (rule 1)
- Prose summary or Markdown table — the answer the user can read without rendering anything
- Then the chart / HTML file (which the chat panel renders inline as an iframe under the
Writing chart.htmltool row)
Practically: write your prose response before calling the file-write tool. The agent's natural pattern of "build first, then announce" produces the wrong order.
3. Don't build a visual unless asked
If the user's question can be answered with a number, a sentence, or a small table — answer that way. Do not auto-create a chart / HTML file. Charts are heavyweight (file write, sandbox HMR, iframe render); they're worth it only when the user explicitly asked for one or when the data really demands one (5+ dimensions, time series, etc.).
When you think a visual would help but the user didn't ask, end your prose answer with one short follow-up — "Want me to chart this?" — and stop. Don't build the chart in the same turn. Wait for confirmation.
Triggers that count as the user asking for a visual:
- "Chart / plot / graph / visualize / draw …"
- "Show me … as a bar / line / pie chart"
- "Make a dashboard of …"
- "I want a visual of …"
Anything else is not a visual request, even if a chart would be neat.
The 4-step flow
1. Identify the target data source → read `Data Connectors` workbook, pick by name or ask user
2. Fetch schema → GET /dataSources/api/v1/dataSources/{id}/schemas
3. Generate SQL grounded in the schema → use the FROM convention, ALWAYS scope to known tables
4. Execute and render → POST /dataSources/api/v1/internal/queryResults
Step 1 — Identify the data source
Ask the user which registered data source to query — by name. If they've used this conversation to register a source already (via datagol-data-connector), you already have the data_source_id; reuse it without asking again.
If the user is in a generated app (Flow B in the parent) and the app has a Data Connectors workbook, you can list candidates from there — but the workbook is a Flow-B-only artefact and doesn't exist for the default chat-only flow. Don't presume it. The conversation context (or just asking the user) is the source of truth.
If a list endpoint becomes available on DataGOL (GET /dataSources/api/v1/dataSources or similar), prefer it; until then, work from what you know in conversation.
Step 2 — Fetch schema
Per the parent skill, getDataSourceSchema(dataSourceId). Cache the response in memory for the rest of the conversation. If the user mentions a table the schema doesn't have, refetch — DBs evolve.
Step 3 — Generate the SQL
Always qualify FROM clauses as <dataSourceName>.<schema>.<table> — that's the DataGOL convention (the execution layer routes via data-source name, not the actual DB name). The dataSourceName comes from the registered name of the data source.
If the data source name contains spaces, wrap it in double quotes:
-- Name with spaces → double-quote the data source name
SELECT * FROM "datawarehouse demo".public.actor LIMIT 100
-- Name without spaces → no quoting needed
SELECT * FROM datawarehouse_plsql.public.actor LIMIT 100
Not dvd_rental.public.actor, not public.actor, not actor. Get this wrong and the query fails with a "table not found" or "cross-database reference" error.
Quoting + identifier rules: stick to ANSI SQL where possible. Postgres quoting is "identifier" (double quotes); MSSQL accepts both [brackets] and "quotes". The datagol-data-query skill should match the dialect of the source (read provider from the connector row). For the v1 Postgres-only flow, double-quote anything case-sensitive:
SELECT "first_name", "last_name" FROM "datawarehouse demo".public.actor
Defensive limits. Always include LIMIT (Postgres / MySQL), TOP (MSSQL), or FETCH FIRST n ROWS ONLY (Snowflake / standard SQL) for any query that could return a lot of rows. Default to LIMIT 1000 for ad-hoc exploration. If the user asks for "all", confirm first — Postgres tables with millions of rows will time out.
Read-only. Only generate SELECT and WITH … SELECT. Never generate INSERT, UPDATE, DELETE, CREATE, DROP, ALTER, TRUNCATE, GRANT, REVOKE. The data-source user might have write permission, but this skill exists to read. If the user asks to modify, refuse and point them at the right path (workbook writes via datagol-workbook-operations).
Step 4 — Execute via /queryResults
curl -X POST "${DATAGOL_BASE_URL}/dataSources/api/v1/internal/queryResults" \
-H "x-auth-token: ${DATAGOL_SERVICE_TOKEN}" \
-H "Content-Type: application/json" \
-d '{
"sqlQuery": "select * from datawarehouse_plsql.public.actor limit 50",
"dataProvider": "JDBC"
}'
Body fields:
sqlQuery(required) — the SQL string. The endpoint takes one statement at a time.dataProvider(required) — always"JDBC"for v1 (covers Postgres, MSSQL, MySQL).
Optional header: the test backend's web UI sends spark_job_group_id: <uuid> for cancel-tracking. The endpoint accepts the call without it; include only if you need to cancel a long-running query mid-flight (out of scope for v1).
Response shape (TBD with the backend; the helper below decodes the common variants):
// src/api/dataQuery.ts
import { dgFetch } from './datagol';
export interface QueryResult {
columns: Array<{ name: string; type?: string }>;
rows: Array<Record<string, unknown>>;
// Some variants return `data` instead of `rows`, or a flat 2D array.
rowCount?: number;
}
export async function runSql(sqlQuery: string): Promise<QueryResult> {
const raw = await dgFetch<any>('/dataSources/api/v1/internal/queryResults', {
method: 'POST',
body: JSON.stringify({ sqlQuery, dataProvider: 'JDBC' }),
});
// Defensive normalization until the contract is locked.
const rows = raw.rows ?? raw.data ?? [];
const columns = raw.columns
?? (Array.isArray(raw.fields) ? raw.fields : [])
?? (rows.length > 0 ? Object.keys(rows[0]).map((name: string) => ({ name })) : []);
return { columns, rows, rowCount: raw.rowCount ?? rows.length };
}
Rendering the result
When the agent has the rows, format them for the chat UI. The SQL block (Response composition rule 1) goes above this content; any chart goes below it.
- ≤ 20 rows × ≤ 5 columns → render as a Markdown table inline. Easiest to scan.
- > 5 columns or many rows → render the first 10 rows as a table, then say "showing 10 of N rows — say 'show all' to dump the full result, or 'publish' to save this as a workbook". Don't dump 1000 rows into chat.
- Single value (count, sum, etc.) → state it in prose: "There are 200 rows in the
actortable." - Errors → quote the upstream
body.messageverbatim if present. The most common errors:relation "X" does not exist— wrong FROM qualifier (missing<dataSourceName>prefix).column "X" does not exist— schema is stale; refetch schema and retry once.permission denied for table X— the registered DB user lacks SELECT; tell the user to grant it.- 401 / 403 — service token missing or wrong; check
datagol-app-auth.
Always echo the SQL the agent generated above the result, in a fenced code block, so the user can review/correct it. Treat the agent as a SQL writing partner — opaque magic erodes trust. The response format must always be:
- The SQL query in a fenced
sqlcode block - Then the result table (or prose for single values)
Never show the result without the query.
Querying workbook warehouse tables (app_connection_read.<table>)
When the selection context is a workbook (not a registered SQL data source), queries route through a Postgres-compatible JDBC layer over the warehouse mirror. This is true even when the workbook is a materialized view whose materializedViewQuery is written in Spark SQL — that Spark code only runs at MV refresh time. Ad-hoc queries are Postgres.
Don't be misled by materializedViewQuery. It tells you how the view was built (Spark: GET_JSON_OBJECT, LATERAL VIEW EXPLODE, FROM_JSON, etc.). It is not a guide to the dialect available to you at query time.
Rules specific to this path
- Every statement must have a real table in
FROM. BareSELECT CURRENT_DATEandSELECT generate_series(...)both fail with HTTP 400. Set-returning functions likegenerate_seriesmust be cross-joined with an existing workbook table:FROM app_connection_read.<table>, generate_series(...) AS g - Postgres syntax only. No Spark (
LATERAL VIEW EXPLODE,sequence(...)with bareINTERVAL 7 DAY), no Trino (UNNEST(sequence(...)),DATE_ADD('day', n, d)). - No
::casts. UseCAST(x AS DATE). - Date arithmetic:
d + INTERVAL '7 day',d - INTERVAL '371 days'. Wrap inCAST(... AS DATE)if you need a DATE back instead of a TIMESTAMP. - Series generation:
generate_series(start_date, stop_date, INTERVAL '7 day')— once cross-joined as in rule 1. - The schema is
app_connection_read.<tableName>where<tableName>is thetableNamefield fromdatagol_get_workbook_schema(e.g.table_c84e4c5c), not the human title.
Error decoding
- HTTP 400
"Unable to parse the query"→ parser rejection. Check (a) missing FROM clause, (b)::casts, (c) function names from the wrong dialect. - HTTP 500
BadSqlGrammarException→ driver rejection after parsing. Almost always dialect: anEXPLODE/sequence/UNNEST/DATE_ADD('day', ...)slipped in. Switch to the Postgres equivalent.
If you hit either error, change one thing at a time (replace :: with CAST, or swap the function) rather than rewriting the whole query — the FROM and the rest of the structure are usually fine.
Hard rules
- Always qualify FROM with
<dataSourceName>.<schema>.<table>. This is the single most common mistake and the one that produces "relation does not exist" errors. Double-quote the data source name if it contains spaces (e.g."datawarehouse demo".public.actor). - Generate
SELECTonly. No DML, no DDL, no permissions changes. Refuse if the user asks for them. - Always cap row counts. Default
LIMIT 1000; confirm with the user before lifting the limit. - Always echo the generated SQL to the user before/with the result — in a fenced
sqlcode block, every single time. Never show results without the query above them. The chat'sRunningtool row is collapsed/truncated and is not a substitute. (See Response-composition rule 1.) - Render prose / table first, charts last. When an answer mixes a textual result and a visual, the prose / Markdown table comes first; the chart / HTML file is written last so it renders below the answer in chat. Don't call file-write tools mid-prose. (See Response-composition rule 2.)
- Don't build a visual unless the user asked. If a chart would help but they didn't request one, end your prose with a single follow-up — "Want me to chart this?" — and stop. Wait for confirmation before writing any chart file. (See Response-composition rule 3.)
- Use
x-auth-token: <SERVICE_TOKEN>for the query call. Never the user's bearer. - Refetch the schema on
column does not existand retry the query once. Stop and ask if it still fails. - Refuse to generate SQL for sources you haven't fetched the schema for. Hallucinated table/column names are common across LLMs; grounding is non-negotiable.
- Don't include the SQL
passwordor any credential in error messages or logs — even on failure, the user's DB password should never reappear in the agent's output. - Don't run multi-statement scripts. The endpoint accepts one statement; semicolons inside the string would either be ignored or cause a parse error depending on the driver. Keep it to one statement per call.
Cross-references
datagol-data-connector— parent. Schema discovery + theData Connectorsworkbook live there. If the schemas endpoint comes back empty or "not ready", the data source has been registered butsyncSchemahasn't been called — see the parent's Sync schema section and triggerPOST /dataSources/api/v1/dataSources/<id>/syncSchemabefore re-querying.datagol-data-publish— sibling. Use when the user wants the query result saved as a workbook instead of just shown.datagol-data-connector/reference/postgres.md— provider reference file under the parent. If the schema response is missingdataSourceName, fall back to the lowercased + sanitized form of the registeredname.datagol-app-auth— service token + env config. Required.datagol-workbook-operations— for reading rows from workbooks (not data sources). Different API surface, different use case.