sql-workflow

star 467

Use this skill before writing any SQL query. Covers: output shape inference (cardinality clues from the question), efficient schema exploration, iterative CTE-based query building, structured verification loop (row count, NULL audit, fan-out check, sample inspection), error recovery protocol, saving output to result.sql and result.csv, turn budget management, and common benchmark traps.

SignalPilot-Labs By SignalPilot-Labs schedule Updated 5/29/2026

name: sql-workflow description: "Use this skill before writing any SQL query. Covers: output shape inference (cardinality clues from the question), efficient schema exploration, iterative CTE-based query building, structured verification loop (row count, NULL audit, fan-out check, sample inspection), error recovery protocol, saving output to result.sql and result.csv, turn budget management, and common benchmark traps." type: skill

SQL Workflow Skill

0. Load Knowledge Base Context FIRST

Project-specific conventions, decisions, and quirks live in the Knowledge Base. Always consult before exploring the schema.

Step 0a — get_knowledge

Call once at the start of every task with a 1-line task_description. Returns the always-loaded baseline (org/project understanding + conventions) plus up to 5 task-relevant decisions/debugging/quirks. Treat the returned ## title blocks as authoritative for naming, joins, and known traps.

Step 0b — search_knowledge(query=...)

Call when you hit something unexpected — column meaning unclear, ambiguous join, surprising row count. Pass a 2–4 word query. It is a pure read — no side effects.

Step 0c — propose_knowledge

Call ONLY after you have completed work and verified a finding. Use it to record:

  • category="decisions" for choices made (auto-accepted).
  • category="debugging" for root-cause traps you hit and resolved (auto-accepted).
  • category="quirks" (scope=connection) for connector/dialect oddities (auto-accepted).
  • Do NOT propose understanding — humans only. Do NOT propose conventions or domain-rules as part of automated runs unless explicitly asked (these queue for human review).
  • Title must be a slug (^[a-z0-9-]+$, ≤120 chars). Body is markdown.
  • On duplicate-title: re-call with overwrite=true only if the prior doc is genuinely outdated.

What NOT to do

  • Do not paste raw KB text back into SQL comments — reference the doc title instead.
  • Do not call propose_knowledge mid-exploration — only after success.

1. Schema Exploration — Do This First

Before writing any SQL, load KB context (Phase 0 above) then understand the data:

  1. Read local schema files first (if schema/ directory exists in workdir):
    • schema/DDL.csv — all CREATE TABLE statements (if it exists)
    • schema/{table_name}.json — column names, types, descriptions, sample values Reading these files costs zero tool calls and gives you table structure + sample data. Only call MCP tools for information not in the local files (e.g., row counts, live data exploration).
  2. Call list_tables to get all schemas and tables — only if no local schema files exist or you need row counts.
  3. Call describe_table on the tables that seem relevant to the question (only if JSON files lack detail)
  4. Call explore_column on categorical columns to see distinct values (for filtering/grouping)
  5. Call find_join_path if you need to join tables and the relationship is unclear

Stop exploring after 3-5 tool calls. Write SQL based on what you've found.

2. Output Shape Inference — Before Writing SQL

Read the task question carefully for cardinality clues:

  • "for each X" → GROUP BY X, one output row per X
  • "top N" / "top 5" → LIMIT N or QUALIFY RANK() <= N
  • "total / sum / average" → single row aggregate
  • "list all" → detail rows, no aggregation
  • "how many" → COUNT, result is 1 row 1 column

Write a comment at the top of your SQL:

-- EXPECTED: <row count estimate> rows because <reason from question>

Critical checks:

  • If the question asks for a single number, the result MUST be 1 row × 1 column
  • If the question says "how many", verify the CSV has exactly 1 row with a COUNT value
  • If "top N" appears in the question, verify the CSV has at most N rows

3. Iterative Query Building — Build Bottom-Up

Do NOT write a 50-line query and run it all at once:

  1. Write the innermost subquery or first CTE first
  2. Run it standalone with query_database — verify row count and sample values
  3. Add the next CTE, verify again
  4. Continue until the full query is built

Example incremental pattern:

-- Step 1: verify source
SELECT COUNT(*) FROM orders WHERE status = 'completed';

-- Step 2: verify join partner cardinality
SELECT COUNT(*), COUNT(DISTINCT customer_id) FROM orders;

-- Step 3: build first CTE, verify
WITH order_totals AS (
  SELECT customer_id, SUM(amount) AS total
  FROM orders
  GROUP BY customer_id
)
SELECT COUNT(*), COUNT(DISTINCT customer_id) FROM order_totals;

-- Step 4: add final aggregation

4. Execution and Structured Verification

mcp__signalpilot__query_database
  connection_name="<task_connection_name>"
  sql="SELECT ..."

After executing, run these checks IN ORDER before saving:

  1. Row count sanity: Does 0 rows make sense? Does 1M rows make sense for a "top 10" question?
  2. Column count: Does the result have the right number of columns for the question?
  3. NULL audit: For each key column — unexpected NULLs indicate wrong JOINs:
    SELECT COUNT(*) - COUNT(col) AS nulls FROM (your_query) t
    
  4. Sample inspection: Look at 5 rows — are values in expected ranges? Do string columns have meaningful values (not join keys)?
  5. Fan-out check: If JOINing, compare COUNT(*) vs COUNT(DISTINCT primary_key):
    SELECT COUNT(*) AS total_rows, COUNT(DISTINCT <pk>) AS unique_keys FROM (your_query) t;
    
    If they differ, you have duplicate rows from a fan-out JOIN.
  6. Re-read the question: Does your output actually answer what was asked?

5. Error Recovery Protocol

  • Syntax error: Use validate_sql before query_database to catch errors without burning a query turn
  • Wrong results: Do NOT just re-run the same query. Diagnose: which JOIN is wrong? Which filter is too aggressive?
  • Zero rows: Binary-search your WHERE conditions — remove them one at a time to find the culprit:
    SELECT COUNT(*) FROM table WHERE cond_1;             -- still same? keep it
    SELECT COUNT(*) FROM table WHERE cond_1 AND cond_2;  -- drops? cond_2 is culprit
    
  • Too many rows: Check for fan-out (duplicate join keys) or missing GROUP BY
  • CTE debugging: Use debug_cte_query to run each CTE independently and find which step breaks

6. Saving Output

Once you have the correct result:

  1. Write final SQL to result.sql:

    Write tool: path="result.sql", content="<your SQL query>"
    
  2. Write the result as CSV to result.csv:

    Write tool: path="result.csv", content="col1,col2,...\nval1,val2,..."
    
    • Always include a header row with column names
    • Use comma as delimiter
    • Quote string values that contain commas or newlines

7. Turn Budget Management

  • First 3 turns: Schema exploration only (schema_overview, describe_table on 2-3 tables, explore_column on key categorical columns). STOP exploring.
  • Turns 4 through (N-3): Write query iteratively — execute and verify each step.
  • Last 3 turns: Finalize result.sql and result.csv. If you have a working query, SAVE IT NOW — do not keep iterating.

If your query works and passes all verification checks, SAVE IMMEDIATELY — do not continue exploring "just in case".

8. Common Benchmark Traps

  • Rounding: Do NOT round unless the question explicitly asks for rounded values. The evaluator uses tolerance-based comparison — full precision is always safer.
  • Column naming: Match the question's phrasing exactly. If the question says "total revenue", name the column total_revenue, not sum_revenue or revenue_total.
  • CSV format: No trailing newline, no BOM, comma delimiter, double-quote strings containing commas.
  • Empty result: If the correct answer is 0 or empty, write a CSV with just the header row (or header + "0").
  • Date/time format in CSV: Use ISO 8601 (YYYY-MM-DD) unless the question specifies otherwise.
  • String case in CSV: Preserve the case from the database — do not uppercase/lowercase unless the question explicitly asks.
  • Fan-out from JOINs: Always check COUNT(*) vs COUNT(DISTINCT key) after every JOIN
  • Wrong NULL handling: Use IS NULL / IS NOT NULL, not = NULL
  • Date format mismatch: Check the actual format stored in the column with explore_column
  • Case sensitivity: Use the correct case-insensitive function for your backend
  • Interpretation errors: Before saving, re-read the original question. Verify:
    • Filter conditions match domain values (check with explore_column if unsure)
    • "Excluding X" means the right thing (NOT IN vs EXCEPT vs WHERE NOT)
    • Metrics match domain definitions (e.g., "scored points" in F1 = points > 0, not just participated)
Install via CLI
npx skills add https://github.com/SignalPilot-Labs/SignalPilot --skill sql-workflow
Repository Details
star Stars 467
call_split Forks 21
navigation Branch main
article Path SKILL.md
More from Creator
SignalPilot-Labs
SignalPilot-Labs Explore all skills →