ops-telemetry-query

star 194

Internal — for Boundless team members only. Query Boundless broker telemetry tables on Redshift for prod/staging operational data. Use when the user asks about broker health, request evaluations, request completions, proving times, skip rates, telemetry data, or wants to run SQL against the telemetry database on live networks. Also covers historical telemetry through 2026-04-24 stored as Parquet archives in S3 (queried via DuckDB). Do NOT use for debugging local code changes, reviewing PRs, or investigating issues in the codebase itself.

boundless-xyz By boundless-xyz schedule Updated 4/24/2026

name: ops-telemetry-query description: Internal — for Boundless team members only. Query Boundless broker telemetry tables on Redshift for prod/staging operational data. Use when the user asks about broker health, request evaluations, request completions, proving times, skip rates, telemetry data, or wants to run SQL against the telemetry database on live networks. Also covers historical telemetry through 2026-04-24 stored as Parquet archives in S3 (queried via DuckDB). Do NOT use for debugging local code changes, reviewing PRs, or investigating issues in the codebase itself.

Telemetry Query

Query Boundless broker telemetry across two backends:

  • On/after 2026-04-25 — Amazon Redshift (provisioned, Postgres-compatible) via psql.
  • Through 2026-04-24 — S3 Parquet archives (UNLOADed from the old Redshift Serverless at cutover) via DuckDB.

The UNLOAD per env happened at various times during 2026-04-24; treating all of 2026-04-24 UTC as "archive" avoids any risk of missing data that landed in the archive after midnight but before the UNLOAD moment. Parquet schemas are identical to the Redshift typed views, so column names, types, and enum values are the same on both sides. See Routing by Time Window for how to pick a backend.

Prerequisites

Before running any queries, check if REDSHIFT_URL is already exported in the shell.

If not, try to read network_secrets.toml from the repo root. If it exists, extract the telemetry credentials for the selected environment from [environments.<env>.telemetry]:

  • db_url -- the Redshift hostname:port/database?sslmode path
  • readonly_password -- the readonly user password

Construct the connection string: postgres://readonly:<password>@<db_url>

Also read network_address_labels.json (same directory) for translating addresses to human-readable labels in output.

If network_secrets.toml is not present, recommend the user create it -- instructions and credentials are in the Boundless runbook. Alternatively, they can provide credentials directly:

  1. The Redshift endpoint (hostname or full connection string)
  2. The readonly password

They can either:

  • Export them as env vars: export REDSHIFT_ENDPOINT="..." and export REDSHIFT_PASSWORD="..."
  • Provide them directly in the chat

If network_address_labels.json is not present, recommend the user create it -- the canonical address mapping is in the Boundless runbook. The file is plain JSON ({"0xaddr": "label", ...}).

The user may provide a full postgres://... URL or just a hostname. Normalize to a valid connection string:

  • If the URL is missing /telemetry at the end of the path, append it
  • If the URL is missing ?sslmode=require, append it
  • If only a hostname is given, construct: postgres://readonly:<password>@<hostname>:5439/telemetry?sslmode=require

Export the result as REDSHIFT_URL before running queries.

Routing by Time Window

Telemetry was cut over from Redshift Serverless to Redshift provisioned during 2026-04-24 UTC. For routing purposes, the boundary is 2026-04-25T00:00:00Z — all of 2026-04-24 is treated as archive territory. This avoids a subtle gap: the per-env UNLOAD ran at various times on 2026-04-24 (e.g. 15:54 UTC for prod_base), so a midnight cutover would miss archive rows between 2026-04-24 00:00 and each env's UNLOAD moment.

Pick a backend based on the user's time filter (evaluated_at, completed_at, timestamp):

Window Backend
end < 2026-04-25T00:00:00Z Archive only — DuckDB + S3 Parquet (see Archive Backend)
start >= 2026-04-25T00:00:00Z Live only — existing psql + Redshift path
Crosses 2026-04-25T00:00:00Z Two separate queries: run the archive query for everything through 2026-04-24 23:59:59Z and the live query for everything from 2026-04-25 00:00:00Z onwards. Present both result sets to the user and note that the window crossed the cutover. Do not try to auto-UNION.
No time window given Default to live only and call that out in the response, so the user can widen the range if they wanted history.

Archive Backend (DuckDB + S3 Parquet)

Bucket mapping

All buckets live in us-west-2. Layout inside every bucket: s3://<bucket>/<stack>/<view>/NNNN_part_00.parquet, where <view>broker_heartbeats | request_evaluations | request_completions. Each bucket contains one stack prefix — use the glob s3://<bucket>/*/<view>/*.parquet.

env (skill name) chain id bucket
prod_taiko 167000 telemetry-snapshot-prod-167000-04-24-26
prod_base 8453 telemetry-snapshot-prod-8453-04-24-26
prod_base_sepolia 84532 telemetry-snapshot-prod-84532-04-24-26
staging_taiko 167000 telemetry-snapshot-staging-167000-04-24-26
staging_base_sepolia 84532 telemetry-snapshot-staging-84532-04-24-26

Schema

Parquet column names and types match the Redshift typed views exactly (Redshift did the UNLOAD; schemas were preserved). Enum values (outcomes, skip codes, fulfillment types) are defined in crates/boundless-market/src/telemetry.rs — the same source of truth as the live path. received_at is present.

Credentials

Read AWS creds from network_secrets.toml at the repo root:

  • [aws.prod] for any prod_* env
  • [aws.staging] for any staging_* env

Both have s3:GetObject on their respective buckets. Region is us-west-2.

Running a query

Mirror the psql heredoc pattern. Install httpfs once per duckdb invocation, set the creds, then read_parquet the glob. Heredoc keeps zsh from mangling !=.

AWS_ACCESS_KEY_ID=<from network_secrets.toml> \
AWS_SECRET_ACCESS_KEY=<from network_secrets.toml> \
duckdb <<SQL
INSTALL httpfs; LOAD httpfs;
SET s3_region='us-west-2';
SET s3_access_key_id='${AWS_ACCESS_KEY_ID}';
SET s3_secret_access_key='${AWS_SECRET_ACCESS_KEY}';

SELECT skip_code, COUNT(*) AS count
FROM read_parquet('s3://telemetry-snapshot-staging-84532-04-24-26/*/request_evaluations/*.parquet')
WHERE outcome = 'Skipped'
  AND evaluated_at BETWEEN TIMESTAMP '2026-04-18 00:00:00+00'
                       AND TIMESTAMP '2026-04-25 00:00:00+00'
GROUP BY skip_code
ORDER BY count DESC;
SQL

DuckDB vs Redshift SQL differences

When rewriting a Redshift query for the archive path, watch for these:

  • Use NOW() instead of GETDATE().
  • MEDIAN() and PERCENTILE_CONT() have no single-aggregate restriction — multiple percentiles in one SELECT work fine.
  • DISTINCT ON and LATERAL joins are supported.
  • Timestamp literal form: TIMESTAMP '2026-04-25 00:00:00+00' (note the timezone).

Fail-loud rules

  • If the user asks for a pre-cutover window on an env not in the bucket table → stop and say the archive doesn't exist for that env.
  • If DuckDB returns HTTP 403 / AccessDenied → stop and report the bucket name and which [aws.*] profile was used; do not silently retry.
  • If network_secrets.toml is missing the needed [aws.*] block → stop and ask the user.

Known Addresses

If network_address_labels.json exists at the repo root, use it to label addresses in results. The file is plain JSON ({"0xaddr": "label", ...}) so the user can paste directly from the canonical mapping. When displaying addresses from query output, check if the address (case-insensitive) has a known label and show it alongside, e.g. 0xbdA9...5542 (BP1).

Provers we operate are labeled with a BP prefix (e.g. BP1, BP2, BPNightlyAWS). When investigating any issue, always highlight what our BP provers are doing -- did they skip, fail, drop, or fulfill? This should be called out explicitly even when the investigation is not specifically about our provers.

Connecting

Prefer heredocs for all queries (avoids shell escaping issues):

psql "$REDSHIFT_URL" <<'SQL'
SELECT ...
SQL

For simple one-liners, psql -c works but beware of shell quoting (see below):

psql "$REDSHIFT_URL" -c "SELECT 1;"

Shell Quoting

CRITICAL: zsh treats ! as history expansion inside double-quoted strings. This means != in psql -c "..." gets mangled to \!=, causing a Redshift syntax error.

Always use <> instead of != in SQL. This is valid standard SQL and avoids the issue entirely:

-- Good
WHERE outcome <> 'Skipped'

-- Bad (breaks in zsh double quotes)
WHERE outcome != 'Skipped'

Alternatively, use a heredoc (<<'SQL') which is immune to shell expansion.

Available Tables

All tables live in the telemetry schema. There are three views:

View Description
telemetry.broker_heartbeats Hourly broker health snapshots (config, queue sizes, version, uptime)
telemetry.request_evaluations Per-order evaluation decisions (accepted/skipped, skip reasons, cycle counts, timing)
telemetry.request_completions Terminal order outcomes (success/failure, proving durations, cycle counts)

The same three view names exist as directories under each archive bucket (e.g. s3://<bucket>/<stack>/broker_heartbeats/), with matching schemas.

Schema Reference

IMPORTANT: Before writing any queries, you MUST read crates/boundless-market/src/telemetry.rs to get the exact column names and enum values. The view columns map 1:1 to the Rust struct fields (snake_case). Do NOT guess column names.

The key mappings are:

  • BrokerHeartbeat -> telemetry.broker_heartbeats
  • RequestEvaluated -> telemetry.request_evaluations
  • RequestCompleted -> telemetry.request_completions

The Redshift views also include a received_at TIMESTAMPTZ column (when Kinesis ingested the record) that is not part of the Rust structs.

Data Coverage

Telemetry is opt-in. Brokers can operate without sending telemetry, so the data here only represents brokers that have opted in. Do not assume that the brokers visible in telemetry are the only ones active on the network.

Secondary Fulfillment

When a prover locks an order but fails to fulfill it, the order becomes available for secondary fulfillment by any other prover, who earns the slash collateral as reward. In telemetry, secondary fulfillments are identified by fulfillment_type = 'FulfillAfterLockExpire' in both request_evaluations and request_completions. When investigating expired or slashed requests, filter by this fulfillment type to see which brokers evaluated the secondary opportunity and whether they skipped, committed, or failed. Always check whether our BP provers attempted secondary fulfillment.

Query Guidelines

  • Always use telemetry. schema prefix
  • Views are deduplicated via ROW_NUMBER() on order_id -- no need to dedupe in your queries
  • Timestamps are TIMESTAMPTZ -- use standard Postgres date functions
  • broker_address is a VARCHAR(42) Ethereum address (e.g. 0xabc...)
  • outcome values are defined by enums in the Rust source -- read them there, do not assume
  • Use LIMIT on exploratory queries -- tables can be large
  • For time ranges, filter on evaluated_at, completed_at, or timestamp columns
  • When the user's time window crosses 2026-04-25T00:00:00Z, follow Routing by Time Window — run the archive and live queries separately and present both result sets. Do not auto-UNION.

Common Query Patterns

For ready-to-use analytical queries, read example-queries.md.

Quick examples:

-- Recent heartbeats
SELECT broker_address, version, uptime_secs, committed_orders_count, timestamp
FROM telemetry.broker_heartbeats ORDER BY timestamp DESC LIMIT 10;

-- Recent evaluations
SELECT broker_address, order_id, outcome, skip_code, total_cycles, evaluated_at
FROM telemetry.request_evaluations ORDER BY evaluated_at DESC LIMIT 10;

-- Recent completions
SELECT broker_address, order_id, outcome, proving_duration_secs, total_duration_secs, completed_at
FROM telemetry.request_completions ORDER BY completed_at DESC LIMIT 10;

Redshift vs Postgres Differences

Redshift is mostly Postgres-compatible but has some differences:

  • No LATERAL joins
  • No DISTINCT ON
  • UNION ALL with ORDER BY must be wrapped in a subquery (Redshift rejects bare UNION ALL ... ORDER BY)
  • MEDIAN() / PERCENTILE_CONT() cannot be mixed with regular aggregates or with each other on different columns in the same SELECT -- use UNION ALL with one MEDIAN per subquery
  • Use GETDATE() instead of NOW() (though NOW() works in some contexts)
  • SUPER type for JSON -- use JSON_PARSE() and dot notation for access
  • APPROXIMATE COUNT(DISTINCT ...) is available for large datasets via APPROXIMATE COUNT(DISTINCT col)
Install via CLI
npx skills add https://github.com/boundless-xyz/boundless --skill ops-telemetry-query
Repository Details
star Stars 194
call_split Forks 172
navigation Branch main
article Path SKILL.md
More from Creator
boundless-xyz
boundless-xyz Explore all skills →