ktx-analytics - SKILL.md Agent Skill

name: ktx-analytics description: Use when answering a question that needs data from a ktx-connected database - investigating, analyzing, "how many", "show me", "what's the breakdown of", finding records by value, exploring tables, comparing periods, explaining metrics, or any data-analysis request. Triggers even when the user does not say "analytics"; if the answer requires querying a configured ktx connection, this skill applies.

ktx Analytics Workflow

You have access to ktx MCP tools for data discovery, semantic-layer analysis, raw read-only SQL, wiki context, and memory ingest. Follow this workflow.

1. **Discover** - call `discover_data` first to see what exists across wiki pages, semantic-layer sources, metrics, dimensions, raw tables, and columns. Returns refs only. 2. **Inspect top hits in parallel** - for each promising ref: - `kind: 'wiki'` -> `wiki_read` - `kind: 'sl_source'`, `kind: 'sl_measure'`, or `kind: 'sl_dimension'` -> `sl_read_source` - `kind: 'table'` or `kind: 'column'` -> `entity_details` 3. **Resolve business values** - if the user named a value such as "Acme Corp", "enterprise", or "status=shipped", call `dictionary_search` to find which column holds it. 4. **Plan the analysis** - identify the grain, metrics, dimensions, filters, time window, and expected row limits before querying. 5. **Query** - - Prefer `sl_query` when the semantic layer covers the question. - Use `sql_execution` only for questions the semantic layer does not cover. 6. **Validate and explain** - sanity-check totals, filters, null handling, and time zones. State the source tables or semantic-layer objects used. 7. **Capture durable learnings** - call `memory_ingest` whenever a turn produces something worth remembering (business rules, metric definitions, schema gotchas, recurring findings) **or** whenever the user asks you to remember something. Pass markdown in `content` including any source context the memory agent should weigh. Each call is a feedback loop; better notes today mean smarter `discover_data` and `wiki_search` results tomorrow. - Always run `discover_data` before writing SQL. Do not guess table names. - Prefer the semantic layer over raw SQL when both can answer the question; measures are the source of truth. - Read entity details before writing SQL against an unfamiliar table. Do not assume column names. - Treat `sql_execution` as read-only. Writes are rejected by the server. - Validate value mentions with `dictionary_search` instead of guessing case or spelling. Treat a `dictionary_search` miss as non-authoritative. The index is built from profile-sampled values, so a missing value may simply have been outside the sample. Follow up with `sql_execution` against the most plausible columns before concluding the value is absent. - `connectionId` scoping when `connection_list` shows multiple connections: - Always pass it: `entity_details`, `sl_read_source`, `sql_execution`. - Pass it when intent pins a warehouse, otherwise omit for unscoped discovery: `sl_query`, `discover_data`, `dictionary_search`. - `memory_ingest`: pass it for warehouse-specific knowledge (e.g. "in our warehouse"); without it the memory lands as wiki-only and cannot update the semantic layer. - Never pass it: `connection_list`, `wiki_search`, `wiki_read`, `memory_ingest_status`. - If scoping is required but intent is ambiguous, ask which warehouse before calling. - Show compact result tables for small outputs. For broad results, summarize the top findings and mention the applied limit. - Ask a concise clarification only when the metric, date range, entity, or grain is genuinely ambiguous and cannot be inferred from context. **Input:** "How many orders did Acme Corp place last month?"

Workflow:

dictionary_search({ values: ["Acme Corp"] }) finds customers.name.
discover_data({ query: "orders customer monthly" }) finds an orders semantic-layer source.
sl_read_source({ connectionId: "warehouse", sourceName: "orders_facts" }) confirms the source grain, measures, and dimensions.
sl_query({ connectionId: "warehouse", measures: ["order_count"], filters: ["customer_name = 'Acme Corp'"] }) answers through the semantic layer.
memory_ingest({ connectionId: "warehouse", content: "Acme Corp order analysis used orders_facts.order_count filtered by customers.name = 'Acme Corp'. Source: current analysis turn." }) captures the durable finding.

Input: "What columns does the events table have?"

Workflow:

discover_data({ query: "events table" }) returns a table ref.
entity_details({ connectionId: "warehouse", entities: [{ table: "analytics.events" }] }) returns columns, types, and foreign keys.
Answer directly. No query is needed.

Input: "Heads up: ARR is always reported in cents in our warehouse."

Workflow:

If multiple connections exist, call connection_list and identify the warehouse the user means. Ask if ambiguous.
memory_ingest({ connectionId: "warehouse", content: "ARR is reported in cents (not dollars) in this warehouse. Multiply by 0.01 for dollar amounts. Source: user clarification." }) remembers the warehouse-specific rule without running an analysis turn.