name: isutools-prometheus
description: Query isutools (isucon-go-tools) metrics via PromQL on a Prometheus server. Use when the user wants to analyze ISUCON performance — slow endpoints, slow SQL queries, cache hit rate, DB connection pool, lock contention, queue depth, object pool usage, or benchmark scores — using Prometheus (/api/v1/query, /api/v1/query_range, Grafana, promtool). Triggers on phrases like "isutools metrics", "isucon-go-tools metrics", "isutools_api_*", "isutools_db_*", "ISUCON のメトリクスをクエリ", "PromQL for isutools".
Querying isutools Metrics via Prometheus
PromQL reference for metrics emitted by isucon-go-tools v2. Assume a Prometheus server is already scraping the application — this skill only covers querying, not setting up the exporter.
All metrics use the isutools namespace. Subsystems: api, db, cache, locker, pool, queue, benchmark.
How to issue queries
- Prometheus HTTP API:
GET /api/v1/query?query=<PromQL>(instant),GET /api/v1/query_range?query=...&start=...&end=...&step=...(range). promtool:promtool query instant http://<prom>:9090 '<PromQL>'/promtool query range ....- Grafana: paste the PromQL into a panel.
When invoking via curl, URL-encode the PromQL expression. Example:
curl -sG http://prometheus:9090/api/v1/query \
--data-urlencode 'query=topk(5, sum by (url) (rate(isutools_api_request_duration_seconds_sum[1m])))'
Metric reference
api — HTTP server
| Metric | Type | Labels |
|---|---|---|
isutools_api_request_total |
Counter | code, method, host, url |
isutools_api_request_duration_seconds |
Histogram | code, method, url |
isutools_api_request_size_bytes |
Histogram | code, method, url |
isutools_api_response_size_bytes |
Histogram | code, method, url |
isutools_api_flow_total |
Counter | source_method, source_path, target_method, target_path |
url is pre-normalized: UUIDs → <uuid>, digit runs → <number>. Use the normalized form when filtering.
db — database/sql wrapper
| Metric | Type | Labels |
|---|---|---|
isutools_db_query_count |
Counter | driver, addr, query |
isutools_db_query_duration_seconds |
Histogram | driver, addr, query |
isutools_db_max_open_connections |
Gauge | driver, addr, connection_id |
isutools_db_connection_pool |
Gauge | driver, addr, connection_id, status (idle/open/in_use) |
isutools_db_wait_count |
Gauge | driver, addr, connection_id |
isutools_db_wait_duration |
Gauge | driver, addr, connection_id |
isutools_db_max_idle_closed |
Gauge | driver, addr, connection_id |
isutools_db_max_lifetime_closed |
Gauge | driver, addr, connection_id |
isutools_db_max_idle_time_closed |
Gauge | driver, addr, connection_id |
query is a normalized SQL string (high cardinality). isutools_db_wait_duration is in nanoseconds.
cache — motoki317/sc and isutools maps/slices
| Metric | Type | Labels |
|---|---|---|
isutools_cache_hit_count |
Gauge | name, stat (hit/grace_hit/miss/replace) |
isutools_cache_load_count |
Gauge | name, status (hit/miss) |
isutools_cache_store_count |
Gauge | name, status (replace/new/remove) |
isutools_cache_index_access |
Histogram | name |
isutools_cache_length |
Gauge | name |
isutools_cache_hit_count is a Gauge that mirrors the cumulative sc.Stats() snapshot — query it directly, don't wrap it in rate().
locker — RWMutex
| Metric | Type | Labels |
|---|---|---|
isutools_locker_index_access |
Histogram | name, type (read/write) |
pool — object pool
| Metric | Type | Labels |
|---|---|---|
isutools_pool_count |
Counter | name, type (alloc/get/put) |
queue — channel-backed queue
| Metric | Type | Labels |
|---|---|---|
isutools_queue_counter |
Gauge | name, status (in/out) |
benchmark
| Metric | Type | Labels |
|---|---|---|
isutools_benchmark_score |
Gauge | — |
isutools_benchmark_duration |
Gauge | — |
PromQL recipes
Slowest endpoints (p95 over the last 1m)
topk(10,
histogram_quantile(0.95,
sum by (method, url, le) (
rate(isutools_api_request_duration_seconds_bucket[1m])
)
)
)
Endpoints consuming the most total time (the score-killers)
topk(10,
sum by (method, url) (
rate(isutools_api_request_duration_seconds_sum[1m])
)
)
Throughput and 5xx error rate per endpoint
sum by (method, url) (rate(isutools_api_request_total[1m]))
sum by (url) (rate(isutools_api_request_total{code=~"5.."}[1m]))
/ ignoring(code) sum by (url) (rate(isutools_api_request_total[1m]))
Endpoint transition flows
topk(20,
sum by (source_path, target_path) (
rate(isutools_api_flow_total[5m])
)
)
Slowest SQL queries (p95)
topk(10,
histogram_quantile(0.95,
sum by (query, le) (
rate(isutools_db_query_duration_seconds_bucket[1m])
)
)
)
SQL queries by total time consumed
topk(10,
sum by (query) (
rate(isutools_db_query_duration_seconds_sum[1m])
)
)
DB connection pool saturation
isutools_db_connection_pool{status="in_use"}
/ on(connection_id) isutools_db_max_open_connections
Approaching 1.0 means the pool is the bottleneck. Cross-check with growth in isutools_db_wait_count and isutools_db_wait_duration:
rate(isutools_db_wait_count[1m])
rate(isutools_db_wait_duration[1m]) / 1e9 # seconds per second of wait
Cache hit rate (sc.Cache)
sum by (name) (isutools_cache_hit_count{stat=~"hit|grace_hit"})
/ sum by (name) (isutools_cache_hit_count{stat=~"hit|grace_hit|miss"})
Map cache hit rate
sum by (name) (isutools_cache_load_count{status="hit"})
/ sum by (name) (isutools_cache_load_count)
Slice index access distribution (p99)
histogram_quantile(0.99,
sum by (name, le) (
rate(isutools_cache_index_access_bucket[1m])
)
)
Lock contention (RWMutex acquisition latency p99)
histogram_quantile(0.99,
sum by (name, type, le) (
rate(isutools_locker_index_access_bucket[1m])
)
)
Object pool reuse ratio
sum by (name) (rate(isutools_pool_count{type="get"}[1m]))
/ sum by (name) (rate(isutools_pool_count{type="alloc"}[1m]))
A high ratio means most gets reuse a pooled object instead of allocating.
Queue depth and throughput
sum by (name) (isutools_queue_counter{status="in"})
- sum by (name) (isutools_queue_counter{status="out"})
sum by (name) (rate(isutools_queue_counter{status="in"}[1m]))
sum by (name) (rate(isutools_queue_counter{status="out"}[1m]))
Benchmark score and duration
isutools_benchmark_score
isutools_benchmark_duration
Workflow when answering an isutools metrics question
- Map the question to a subsystem and metric from the reference above.
- Aggregate first with
sum by (...)before applyinghistogram_quantile—urlandqueryare high-cardinality, and querying buckets without aggregation is both slow and wrong. - Pick the time window deliberately:
[1m]for live load,[5m]–[15m]for trends, range queries for benchmarks. - Report the exact PromQL used so the user can paste it into Grafana /
promtooland verify.
Gotchas
- Histograms expose
_bucket/_sum/_count— alwaysrate(..._bucket[...])andsum by (..., le)beforehistogram_quantile. isutools_cache_hit_countis a Gauge (cumulative snapshot). Usedelta(...[5m])if you need the change over a window; neverrate().isutools_db_wait_durationis in nanoseconds — divide by1e9for seconds.- Counter resets happen on app restart; use
rate()/increase()(which handle resets) rather than raw... - ... offset .... urlandquerylabels are pre-normalized inside the application; queries should use the normalized form (e.g./users/<number>, not/users/42).isutools_api_request_totalcarries ahostlabel that the duration/size histograms do not; don'ton()-join across them onhost.