hops-superset - SKILL.md Agent Skill

name: hops-superset description: Use when building Superset charts or dashboards inside Hopsworks via the Python SDK. Auto-invoke when the user wants to create Superset charts/dashboards/datasets, visualize a feature group in Superset, or interact with `project.get_superset_api()`. Input an offline-materialized FG; output a published Superset dashboard + URL. allowed-tools: Read, Grep, Glob, Edit, Write, Bash

Hopsworks Superset — Charts, Datasets, and Dashboards

Render Hopsworks feature groups as Superset charts and dashboards via the Python SDK (project.get_superset_api()).

Contract

Input: an offline-materialized feature group (queryable in Trino).
Output: a published Superset dashboard (charts + datasets) and its URL.
Pre-condition: Superset enabled on the cluster, and the FG materialized to the offline (Trino) store.

Smoke-test (cheap pre/post-flight)

Confirm auth + Superset reachability before building anything — cheapest from the CLI, no Python:

hops superset dataset list      # auth + Superset reachable in one shot
# or in Python: api.list_databases()  -> find the Trino DB id (see §2)

The hops superset {dataset,chart,dashboard} {list,info,create,delete} CLI is the quickest path to list/inspect/clean up (delete takes --yes). Re-run hops superset dashboard list after building to verify the result.

Ask the user (only when state is ambiguous)

Which feature group (and version) to chart.
Which columns / metrics to visualize, and which chart types (see the viz_type enum in §3).
Before deleting — api.delete_chart/dataset/dashboard(id) / hops superset ... delete --yes is irreversible; confirm with the user which object to remove, and never delete one you created as a side effect (temp or test ones included) unless they asked.

Feature groups (FGs) are referenced in superset as either: delta.featurestore. or hudi.featurestore. depending on whether they are a delta offline feature group or a hudi offline feature group. For example, the delta FG, transactions, in the jim project is referenced as:

SELECT * FROM delta.jim_featurestore.transactions_1;

Hopsworks exposes Apache Superset as a managed service. This skill covers the Python SDK wrapper for the Superset REST API (project.get_superset_api()), how to surface Hopsworks feature groups as Superset datasets via Trino, and the modern viz_type keys + param schemas this Superset version expects.

The Python client lives in the Hopsworks venv: hopsworks_common/core/superset_api.py (inside /srv/hops/venv/lib/python3.13/site-packages/ on a Hopsworks host).

When to use this skill

Use this skill whenever the user wants to:

Render a feature group (or any Trino table) in Superset
Create / update / delete Superset charts, datasets, or dashboards programmatically
Wire a Hopsworks feature group into an existing Superset dashboard
Build a custom monitoring dashboard over a model's logging feature group (feature drift, prediction distributions, KPI degradation) — the logs an inference pipeline writes are an offline FG, so they chart the same way
Debug Superset chart errors like Item with key "X" is not registered or Empty query?

1. Connect to the Superset API

The Hopsworks SDK returns a pre-authenticated Superset REST client. Always go through it — it handles session cookies and CSRF.

import hopsworks

project = hopsworks.login()
api = project.get_superset_api()

Pre-condition / smoke-test. Superset must be enabled on the cluster and the FG you chart must be materialized to the offline (Trino) store. See the Smoke-test section above to confirm reachability (CLI: hops superset dataset list; Python: api.list_databases() → find the Trino DB id, see §2).

Methods available on api:

Area	Methods
Databases	`list_databases()`
Datasets	`create_dataset`, `get_dataset`, `list_datasets`, `update_dataset`, `delete_dataset`
Charts	`create_chart`, `get_chart`, `list_charts`, `update_chart`, `delete_chart`
Dashboards	`create_dashboard`, `get_dashboard`, `list_dashboards`, `update_dashboard`, `delete_dashboard`

Signatures (from the Hopsworks SDK):

api.create_dataset(database_id: int, table_name: str, schema: str | None = None,
                   sql: str | None = None, description: str | None = None,
                   owners: list[int] | None = None) -> dict
api.update_dataset(dataset_id: int, **kwargs) -> dict
api.delete_dataset(dataset_id: int) -> dict

api.create_chart(slice_name: str, viz_type: str, datasource_id: int, params: str,
                 datasource_type: str = "table", description: str | None = None,
                 dashboards: list[int] | None = None,
                 owners: list[int] | None = None) -> dict
api.update_chart(chart_id: int, **kwargs) -> dict     # e.g. dashboards=[id], params=...
api.delete_chart(chart_id: int) -> dict

api.create_dashboard(dashboard_title: str, published: bool = False,
                     slug: str | None = None, position_json: str | None = None,
                     json_metadata: str | None = None, css: str | None = None,
                     owners: list[int] | None = None) -> dict
api.update_dashboard(dashboard_id: int, **kwargs) -> dict
api.delete_dashboard(dashboard_id: int) -> dict

_request(...) is internal (underscore-prefixed, not @public): it carries no cross-release stability guarantee. Prefer the @public methods above (list_*, create_*, get_*, update_*, delete_*) as the supported surface. Reach for api._request("GET", "/api/v1/...") only for what the public API does not cover yet — paging, or endpoints the SDK doesn't wrap — knowing it may break on upgrade.

Paging helper

list_*() returns only the first page (~25 rows). There is no public paged-list method yet, so paginate via the internal _request (escape hatch — see the note above):

def list_all(api, resource):
    items, page = [], 0
    while True:
        j = api._request("GET", f"/api/v1/{resource}/?q=(page:{page},page_size:100)")
        batch = j.get("result", [])
        items.extend(batch)
        if len(batch) < 100:
            break
        page += 1
    return items

2. Reading feature groups in Superset (via Trino)

Feature groups are not exposed as Superset-native tables. They are queried through the Trino database connection with this naming pattern:

delta.<project>_featurestore.<feature_group>_<version>

Example (project af, feature group customers v1):

SELECT * FROM delta.af_featurestore.customers_1

Rules:

Catalog is delta for all offline feature groups (Delta format).
Schema is <project>_featurestore (project name is lowercase).
Table name is <fg_name>_<version> — the version suffix is required.
The Trino DB connection in Superset defaults its catalog to hive, so you must fully qualify with delta. or the query resolves to the wrong catalog.
Trino database id varies per Hopsworks install — never hardcode it. Always resolve it with api.list_databases() / find_trino_db_id(api) below.

Look up the Trino database id

def find_trino_db_id(api):
    for db in api.list_databases()["result"]:
        # Names vary: "trino", "Trino", "trino-<project>"...
        if "trino" in db.get("database_name", "").lower():
            return db["id"]
    raise RuntimeError("No Trino database connection found in Superset")

Create a Superset dataset for a feature group (virtual dataset)

Use a virtual dataset (a SELECT expression). A physical table reference will not resolve the delta. catalog correctly.

import hopsworks

project = hopsworks.login()
api = project.get_superset_api()

project_name = project.name.lower()                # e.g. "af"
fg_name, fg_version = "customers", 1

trino_db_id = find_trino_db_id(api)                # or a cached int
schema = f"{project_name}_featurestore"
sql = f"SELECT * FROM delta.{schema}.{fg_name}_{fg_version}"

created = api.create_dataset(
    database_id=trino_db_id,
    table_name=fg_name,                            # display name in Superset
    schema=schema,
    sql=sql,
)
dataset_id = created["id"]

# NOTE: Do NOT pass `description=...` to create_dataset. The SDK accepts
# the kwarg but the Superset REST API in this deployment rejects it with
# `400 {"message":{"description":["Unknown field."]}}`. Same for
# create_chart / create_dashboard — keep to the core fields only.

Idempotent create-or-reuse

create_dataset fails if the (schema, table_name) pair already exists. List-then-create:

def ensure_dataset(api, database_id, schema, name, sql):
    for ds in list_all(api, "dataset"):
        if ds.get("table_name") == name and ds.get("schema") == schema:
            return ds["id"]
    return api.create_dataset(
        database_id=database_id, table_name=name, schema=schema, sql=sql,
    )["id"]

Use the same pattern for charts (key on slice_name) and dashboards (key on dashboard_title).

3. Viz types — use the MODERN keys

Legacy viz_type keys are unregistered in current Superset and fail with Item with key "X" is not registered. Hand-rolled chart scripts often copy stale examples — always use the keys below.

The full registered viz_type enum (the keys the server accepts) and the legacy keys that fail are tabulated in references/viz_types.md.

4. Creating charts — param schemas

create_chart takes a JSON-string params blob whose shape is viz-type specific. The reusable metric/filter building blocks (COUNT_METRIC, SUM_AMOUNT, adhoc_filters), a copy-paste params block for each supported viz type, and a replace-by-name idempotency helper are in references/chart_params.md.

5. Creating dashboards

A dashboard needs (a) a position_json layout and (b) an explicit chart→dashboard link via update_chart(dashboards=[id]). The layout alone does not populate the chart's "Dashboards" tab in the UI.

Layout primer

Superset uses a 12-column grid. Row heights are in ~25px units. Every node has id, type, children, parents, and meta. Widths must sum to 12 within each row.

Node types used in position_json:

Type	Purpose	`meta` fields
`ROOT`	Always present, single child `GRID_ID`	—
`GRID`	The 12-col grid, children are rows	—
`HEADER`	Dashboard title header	`text`
`ROW`	Horizontal row, children sum to 12 cols	`background`
`CHART`	A chart cell	`width`, `height`, `chartId`, `sliceName`
`MARKDOWN`	Static markdown cell	`width`, `height`, `code`
`COLUMN`	Vertical column container	`width`, `background`
`TABS` / `TAB`	Tabbed sections	`text` (on TAB)

Layout builder

import json

def build_position_json(chart_ids, chart_slices, title):
    layout = {
        "DASHBOARD_VERSION_KEY": "v2",
        "ROOT_ID": {"type": "ROOT", "id": "ROOT_ID", "children": ["GRID_ID"]},
        "GRID_ID": {
            "type": "GRID", "id": "GRID_ID",
            "children": [],                # filled below
            "parents": ["ROOT_ID"],
        },
        "HEADER_ID": {"id": "HEADER_ID", "type": "HEADER", "meta": {"text": title}},
    }

    def chart(key, width, height):
        nid = f"CHART-{key}"
        layout[nid] = {
            "type": "CHART", "id": nid, "children": [],
            "parents": ["ROOT_ID", "GRID_ID"],
            "meta": {"width": width, "height": height,
                     "chartId": chart_ids[key], "sliceName": chart_slices[key]},
        }
        return nid

    def row(row_id, children):
        layout[row_id] = {
            "type": "ROW", "id": row_id, "children": children,
            "parents": ["ROOT_ID", "GRID_ID"],
            "meta": {"background": "BACKGROUND_TRANSPARENT"},
        }
        for c in children:
            layout[c]["parents"] = ["ROOT_ID", "GRID_ID", row_id]
        layout["GRID_ID"]["children"].append(row_id)

    row("ROW-1", [chart("a", 6, 50), chart("b", 6, 50)])    # two half-width charts
    row("ROW-2", [chart("c", 12, 50)])                      # one full-width chart
    return json.dumps(layout)

Create / update idempotently

def ensure_dashboard(api, title, chart_ids, chart_slices):
    position_json = build_position_json(chart_ids, chart_slices, title)

    dashboard_id = next(
        (d["id"] for d in list_all(api, "dashboard")
         if d.get("dashboard_title") == title),
        None,
    )

    if dashboard_id is None:
        dashboard_id = api.create_dashboard(
            dashboard_title=title, published=True, position_json=position_json,
        )["id"]
    else:
        api.update_dashboard(
            dashboard_id, dashboard_title=title, published=True,
            position_json=position_json,
        )

    # Persist the chart -> dashboard relation explicitly.
    for cid in chart_ids.values():
        api.update_chart(cid, dashboards=[dashboard_id])

    return dashboard_id

Dashboard URL:

https://<hopsworks-host>/hopsworks-api/superset/superset/dashboard/<id>/

6. End-to-end pattern

import json
import hopsworks

# Columns below (state, age, …) are illustrative — swap for real ones from
# `hops fg features <FG_NAME> --version <FG_VERSION>`.
PROJECT_NAME = "<your_project>"        # don't hardcode; set to project.name after login
FG_NAME, FG_VERSION = "customers", 1

COUNT_METRIC = {
    "expressionType": "SQL", "sqlExpression": "COUNT(*)",
    "label": "count", "optionName": "metric_count", "hasCustomLabel": True,
}


def main():
    project = hopsworks.login()
    api = project.get_superset_api()

    trino_db_id = find_trino_db_id(api)
    schema = f"{PROJECT_NAME}_featurestore"
    sql = f"SELECT * FROM delta.{schema}.{FG_NAME}_{FG_VERSION}"

    dataset_id = ensure_dataset(
        api, database_id=trino_db_id, schema=schema, name=FG_NAME, sql=sql,
    )

    chart_ids = {
        "total": replace_chart(
            api, slice_name="Total Customers",
            viz_type="big_number_total", datasource_id=dataset_id,
            params=json.dumps({
                "viz_type": "big_number_total", "metric": COUNT_METRIC,
                "adhoc_filters": [], "y_axis_format": "SMART_NUMBER",
            }),
        ),
        "by_state": replace_chart(
            api, slice_name="Top States",
            viz_type="echarts_timeseries_bar", datasource_id=dataset_id,
            params=json.dumps({
                "viz_type": "echarts_timeseries_bar",
                "x_axis": "state", "x_axis_force_categorical": True,
                "metrics": [COUNT_METRIC], "groupby": [], "adhoc_filters": [],
                "row_limit": 20, "orientation": "vertical", "order_desc": True,
                "timeseries_limit_metric": COUNT_METRIC,
                "x_axis_sort": "count", "x_axis_sort_asc": False,
                "y_axis_format": "SMART_NUMBER",
            }),
        ),
        "age_hist": replace_chart(
            api, slice_name="Age Distribution",
            viz_type="histogram_v2", datasource_id=dataset_id,
            params=json.dumps({
                "viz_type": "histogram_v2", "column": "age",
                "groupby": [], "adhoc_filters": [], "row_limit": 50000,
                "bins": 20, "x_axis_title": "Age", "y_axis_title": "Customers",
            }),
        ),
    }

    chart_slices = {"total": "Total Customers", "by_state": "Top States",
                    "age_hist": "Age Distribution"}
    dashboard_id = ensure_dashboard(
        api, "Customers Overview", chart_ids, chart_slices,
    )
    print(f"Dashboard id: {dashboard_id}")


if __name__ == "__main__":
    main()

7. Debugging cheatsheet

Symptom	Likely cause	Fix
`Item with key "X" is not registered`	Legacy viz_type removed	Swap to modern key from §3
`Empty query?` (histogram)	Used `all_columns_x`	Use `column` (single string)
`Empty query?` (bar)	Used `groupby` instead of `x_axis`	Set `x_axis`, optionally add `x_axis_force_categorical: true`
`Dataset ... already exists`	Non-idempotent create	Use `ensure_dataset` (list-then-create)
`400 {"message":{"description":["Unknown field."]}}`	SDK signature includes `description` but Superset REST rejects it	Drop `description` from `create_dataset`/`create_chart`/`create_dashboard` calls
Chart renders but absent from dashboard	Only `position_json` was set	Also call `update_chart(cid, dashboards=[dashboard_id])`
Trino query "Schema not found"	Missing `delta.` prefix	Use `SELECT * FROM delta.<project>_featurestore.<fg>_<version>`
Trino query "Table not found"	Missing `_<version>` suffix	FG tables are always suffixed with version (`customers_1`, not `customers`)
Bars come out unsorted	`x_axis_sort` doesn't match a metric label	Make `x_axis_sort` equal to the metric's `label` field
Time-series bar shows one bar	X axis is a string but `x_axis_force_categorical` missing	Add `x_axis_force_categorical: true`
Auth error from `api._request`	Session expired across long runs	Re-fetch `project.get_superset_api()`

Quick Reference

Task	Code
Get Superset API	`api = project.get_superset_api()`
List Trino DBs	`api.list_databases()`
FG as Trino table	`delta.<project>_featurestore.<fg>_<version>`
Create virtual dataset	`api.create_dataset(database_id=..., table_name=..., schema=..., sql=...)`
Create chart	`api.create_chart(slice_name=..., viz_type=..., datasource_id=..., params=json.dumps({...}))`
Link chart to dashboard	`api.update_chart(chart_id, dashboards=[dashboard_id])`
Create dashboard	`api.create_dashboard(dashboard_title=..., published=True, position_json=...)`
Delete	`api.delete_chart(id)` / `api.delete_dataset(id)` / `api.delete_dashboard(id)`
Paginate any list	`api._request("GET", f"/api/v1/{resource}/?q=(page:{p},page_size:100)")`

Next Steps

Get a feature group materialized offline to chart: hops-fg.
Inspect / query the underlying Trino table: hops-trino-sql, hops-data-discovery.
A custom interactive app instead of BI dashboards: hops-app.

Monitoring dashboards

Inference pipelines log untransformed features, transformed features, and predictions to a per-model logging feature group. That FG is offline-queryable in Trino like any other, so the same dataset/chart/dashboard machinery here builds feature- and model-monitoring dashboards: chart a feature's distribution over a recent detection window against its training-time reference, or track a KPI over time to spot degradation. Build the drift jobs themselves (univariate / multivariate, deviation-from-mean, NannyML) in the inference pipeline; use this skill to surface their outputs.