metrics - SKILL.md Agent Skill

name: metrics description: Query metrics from Prometheus using PromQL for HTTP request rates, latency percentiles, error rates, active connections, and GenAI token usage. allowed-tools: - Bash - curl

Metrics Querying with PromQL

Overview

This skill provides PromQL query templates for querying metrics from Prometheus. All queries use the Prometheus HTTP API at http://localhost:9090/api/v1/query. No authentication is needed for local Prometheus.

Prometheus runs on port 9090 using HTTP (not HTTPS).

Connection Defaults

Variable	Default	Description
`PROMETHEUS_ENDPOINT`	`http://localhost:9090`	Prometheus base URL

Metric Discovery

Different OTel SDK versions and languages emit HTTP metrics under different names. Before querying, discover which metric names are active in your stack:

curl -s "$PROMETHEUS_ENDPOINT/api/v1/label/__name__/values" | python3 -c "
import json, sys
for m in json.load(sys.stdin).get('data', []):
    if any(k in m for k in ['http_server', 'gen_ai', 'db_client']):
        print(m)"

Common HTTP metric name variants:

Metric Name	Unit	Emitted By
`http_server_duration_milliseconds`	milliseconds	Python OTel SDK (older semconv)
`http_server_duration_seconds`	seconds	.NET, Java OTel SDKs
`http_server_request_duration_seconds`	seconds	Stable HTTP semconv (newer SDKs)

Important: Replace the metric name in the queries below with whichever variant is active in your stack. The query patterns (rate, histogram_quantile, etc.) are identical — only the metric name changes. For histogram bucket queries, replace _seconds_bucket with _milliseconds_bucket as appropriate, and adjust latency thresholds accordingly (e.g., le="0.25" for seconds vs le="250" for milliseconds).

HTTP Request Rate by Service

Calculate the per-second HTTP request rate over a 5-minute window, grouped by service:

curl -s "$PROMETHEUS_ENDPOINT/api/v1/query" \
  --data-urlencode 'query=sum(rate(http_server_duration_seconds_count[5m])) by (service_name)'

HTTP Latency Percentiles

p95 Latency by Service

Calculate the 95th percentile HTTP request latency by service:

curl -s "$PROMETHEUS_ENDPOINT/api/v1/query" \
  --data-urlencode 'query=histogram_quantile(0.95, sum(rate(http_server_duration_seconds_bucket[5m])) by (le, service_name))'

p99 Latency by Service

Calculate the 99th percentile HTTP request latency by service:

curl -s "$PROMETHEUS_ENDPOINT/api/v1/query" \
  --data-urlencode 'query=histogram_quantile(0.99, sum(rate(http_server_duration_seconds_bucket[5m])) by (le, service_name))'

Error Rate (5xx Responses)

Calculate the ratio of 5xx error responses to total requests by service:

curl -s "$PROMETHEUS_ENDPOINT/api/v1/query" \
  --data-urlencode 'query=sum(rate(http_server_duration_seconds_count{http_response_status_code=~"5.."}[5m])) by (service_name) / sum(rate(http_server_duration_seconds_count[5m])) by (service_name)'

Note on status code labels: The label name varies by OTel SDK version. Older semconv uses http_status_code; newer stable semconv uses http_response_status_code. Use the Metric Discovery section to check which label is present, or query both:
sum(rate(http_server_duration_seconds_count{http_status_code=~"5.."}[5m])) by (service_name)

Active Connections

Query the current number of active HTTP connections by service:

curl -s "$PROMETHEUS_ENDPOINT/api/v1/query" \
  --data-urlencode 'query=sum(http_server_active_requests) by (service_name)'

Database Operation Latency

DB Operation p95 Latency by Service

Calculate the 95th percentile database operation latency by service:

curl -s "$PROMETHEUS_ENDPOINT/api/v1/query" \
  --data-urlencode 'query=histogram_quantile(0.95, sum(rate(db_client_operation_duration_seconds_bucket[5m])) by (le, service_name))'

GenAI-Specific Metrics

Token Usage by Operation and Model

Query GenAI token usage histograms grouped by operation name and request model:

curl -s "$PROMETHEUS_ENDPOINT/api/v1/query" \
  --data-urlencode 'query=sum(rate(gen_ai_client_token_usage_bucket[5m])) by (le, gen_ai_operation_name, gen_ai_request_model)'

Token usage p95 by operation and model:

curl -s "$PROMETHEUS_ENDPOINT/api/v1/query" \
  --data-urlencode 'query=histogram_quantile(0.95, sum(rate(gen_ai_client_token_usage_bucket[5m])) by (le, gen_ai_operation_name, gen_ai_request_model))'

Operation Duration by Operation and Model

Query GenAI operation duration histograms grouped by operation and model:

curl -s "$PROMETHEUS_ENDPOINT/api/v1/query" \
  --data-urlencode 'query=sum(rate(gen_ai_client_operation_duration_seconds_bucket[5m])) by (le, gen_ai_operation_name, gen_ai_request_model)'

Operation duration p95 by operation and model:

curl -s "$PROMETHEUS_ENDPOINT/api/v1/query" \
  --data-urlencode 'query=histogram_quantile(0.95, sum(rate(gen_ai_client_operation_duration_seconds_bucket[5m])) by (le, gen_ai_operation_name, gen_ai_request_model))'

Available Metric Names and Label Dimensions

Metric	Type	Labels
`http_server_duration_milliseconds`	histogram	`service_name`, `http_response_status_code`
`http_server_duration_seconds`	histogram	`service_name`, `http_response_status_code`
`http_server_request_duration_seconds`	histogram	`service_name`, `http_response_status_code`
`http_server_active_requests`	gauge	`service_name`
`db_client_operation_duration_seconds`	histogram	`service_name`
`gen_ai_client_token_usage`	histogram	`gen_ai.operation.name`, `gen_ai.request.model`
`gen_ai_client_operation_duration_seconds`	histogram	`gen_ai.operation.name`, `gen_ai.request.model`

Note on Prometheus label names: Prometheus replaces dots in label names with underscores. The OTel attribute gen_ai.operation.name becomes the Prometheus label gen_ai_operation_name in PromQL queries. The table above shows the original OTel attribute names for reference.

PPL Alternative for OpenSearch-Ingested Metrics

PPL can also query metrics stored in OpenSearch when metrics are ingested via Data Prepper, as an alternative to PromQL. This is useful for OpenSearch-native workflows where you want to query metrics alongside traces and logs using a single query language. When Data Prepper is configured to ingest metrics into OpenSearch, you can use PPL source= queries against the metrics index just as you would for traces and logs.

References

PPL Language Reference — Official PPL syntax documentation. Fetch this if queries fail due to OpenSearch version differences or new syntax.
Prometheus Querying Basics — PromQL syntax reference.

AWS Managed Service for Prometheus

To query metrics on Amazon Managed Service for Prometheus (AMP), replace the local endpoint and add AWS SigV4 authentication:

curl -s --aws-sigv4 "aws:amz:REGION:aps" \
  --user "$AWS_ACCESS_KEY_ID:$AWS_SECRET_ACCESS_KEY" \
  'https://aps-workspaces.REGION.amazonaws.com/workspaces/WORKSPACE_ID/api/v1/query' \
  --data-urlencode 'query=sum(rate(http_server_duration_seconds_count[5m])) by (service_name)'

Endpoint format: https://aps-workspaces.REGION.amazonaws.com/workspaces/WORKSPACE_ID/api/v1/query
Auth: --aws-sigv4 "aws:amz:REGION:aps" with --user "$AWS_ACCESS_KEY_ID:$AWS_SECRET_ACCESS_KEY"
PromQL query syntax is identical between local Prometheus and Amazon Managed Prometheus; only the endpoint and authentication differ