name: metrics description: Query metrics from Prometheus using PromQL for HTTP request rates, latency percentiles, error rates, active connections, and GenAI token usage. allowed-tools: - Bash - curl
Metrics Querying with PromQL
Overview
This skill provides PromQL query templates for querying metrics from Prometheus. All queries use the Prometheus HTTP API at http://localhost:9090/api/v1/query. No authentication is needed for local Prometheus.
Prometheus runs on port 9090 using HTTP (not HTTPS).
Connection Defaults
| Variable | Default | Description |
|---|---|---|
PROMETHEUS_ENDPOINT |
http://localhost:9090 |
Prometheus base URL |
Metric Discovery
Different OTel SDK versions and languages emit HTTP metrics under different names. Before querying, discover which metric names are active in your stack:
curl -s "$PROMETHEUS_ENDPOINT/api/v1/label/__name__/values" | python3 -c "
import json, sys
for m in json.load(sys.stdin).get('data', []):
if any(k in m for k in ['http_server', 'gen_ai', 'db_client']):
print(m)"
Common HTTP metric name variants:
| Metric Name | Unit | Emitted By |
|---|---|---|
http_server_duration_milliseconds |
milliseconds | Python OTel SDK (older semconv) |
http_server_duration_seconds |
seconds | .NET, Java OTel SDKs |
http_server_request_duration_seconds |
seconds | Stable HTTP semconv (newer SDKs) |
Important: Replace the metric name in the queries below with whichever variant is active in your stack. The query patterns (rate, histogram_quantile, etc.) are identical — only the metric name changes. For histogram bucket queries, replace
_seconds_bucketwith_milliseconds_bucketas appropriate, and adjust latency thresholds accordingly (e.g.,le="0.25"for seconds vsle="250"for milliseconds).
HTTP Request Rate by Service
Calculate the per-second HTTP request rate over a 5-minute window, grouped by service:
curl -s "$PROMETHEUS_ENDPOINT/api/v1/query" \
--data-urlencode 'query=sum(rate(http_server_duration_seconds_count[5m])) by (service_name)'
HTTP Latency Percentiles
p95 Latency by Service
Calculate the 95th percentile HTTP request latency by service:
curl -s "$PROMETHEUS_ENDPOINT/api/v1/query" \
--data-urlencode 'query=histogram_quantile(0.95, sum(rate(http_server_duration_seconds_bucket[5m])) by (le, service_name))'
p99 Latency by Service
Calculate the 99th percentile HTTP request latency by service:
curl -s "$PROMETHEUS_ENDPOINT/api/v1/query" \
--data-urlencode 'query=histogram_quantile(0.99, sum(rate(http_server_duration_seconds_bucket[5m])) by (le, service_name))'
Error Rate (5xx Responses)
Calculate the ratio of 5xx error responses to total requests by service:
curl -s "$PROMETHEUS_ENDPOINT/api/v1/query" \
--data-urlencode 'query=sum(rate(http_server_duration_seconds_count{http_response_status_code=~"5.."}[5m])) by (service_name) / sum(rate(http_server_duration_seconds_count[5m])) by (service_name)'
Note on status code labels: The label name varies by OTel SDK version. Older semconv uses
http_status_code; newer stable semconv useshttp_response_status_code. Use the Metric Discovery section to check which label is present, or query both:sum(rate(http_server_duration_seconds_count{http_status_code=~"5.."}[5m])) by (service_name)
Active Connections
Query the current number of active HTTP connections by service:
curl -s "$PROMETHEUS_ENDPOINT/api/v1/query" \
--data-urlencode 'query=sum(http_server_active_requests) by (service_name)'
Database Operation Latency
DB Operation p95 Latency by Service
Calculate the 95th percentile database operation latency by service:
curl -s "$PROMETHEUS_ENDPOINT/api/v1/query" \
--data-urlencode 'query=histogram_quantile(0.95, sum(rate(db_client_operation_duration_seconds_bucket[5m])) by (le, service_name))'
GenAI-Specific Metrics
Token Usage by Operation and Model
Query GenAI token usage histograms grouped by operation name and request model:
curl -s "$PROMETHEUS_ENDPOINT/api/v1/query" \
--data-urlencode 'query=sum(rate(gen_ai_client_token_usage_bucket[5m])) by (le, gen_ai_operation_name, gen_ai_request_model)'
Token usage p95 by operation and model:
curl -s "$PROMETHEUS_ENDPOINT/api/v1/query" \
--data-urlencode 'query=histogram_quantile(0.95, sum(rate(gen_ai_client_token_usage_bucket[5m])) by (le, gen_ai_operation_name, gen_ai_request_model))'
Operation Duration by Operation and Model
Query GenAI operation duration histograms grouped by operation and model:
curl -s "$PROMETHEUS_ENDPOINT/api/v1/query" \
--data-urlencode 'query=sum(rate(gen_ai_client_operation_duration_seconds_bucket[5m])) by (le, gen_ai_operation_name, gen_ai_request_model)'
Operation duration p95 by operation and model:
curl -s "$PROMETHEUS_ENDPOINT/api/v1/query" \
--data-urlencode 'query=histogram_quantile(0.95, sum(rate(gen_ai_client_operation_duration_seconds_bucket[5m])) by (le, gen_ai_operation_name, gen_ai_request_model))'
Available Metric Names and Label Dimensions
| Metric | Type | Labels |
|---|---|---|
http_server_duration_milliseconds |
histogram | service_name, http_response_status_code |
http_server_duration_seconds |
histogram | service_name, http_response_status_code |
http_server_request_duration_seconds |
histogram | service_name, http_response_status_code |
http_server_active_requests |
gauge | service_name |
db_client_operation_duration_seconds |
histogram | service_name |
gen_ai_client_token_usage |
histogram | gen_ai.operation.name, gen_ai.request.model |
gen_ai_client_operation_duration_seconds |
histogram | gen_ai.operation.name, gen_ai.request.model |
Note on Prometheus label names: Prometheus replaces dots in label names with underscores. The OTel attribute
gen_ai.operation.namebecomes the Prometheus labelgen_ai_operation_namein PromQL queries. The table above shows the original OTel attribute names for reference.
PPL Alternative for OpenSearch-Ingested Metrics
PPL can also query metrics stored in OpenSearch when metrics are ingested via Data Prepper, as an alternative to PromQL. This is useful for OpenSearch-native workflows where you want to query metrics alongside traces and logs using a single query language. When Data Prepper is configured to ingest metrics into OpenSearch, you can use PPL source= queries against the metrics index just as you would for traces and logs.
References
- PPL Language Reference — Official PPL syntax documentation. Fetch this if queries fail due to OpenSearch version differences or new syntax.
- Prometheus Querying Basics — PromQL syntax reference.
AWS Managed Service for Prometheus
To query metrics on Amazon Managed Service for Prometheus (AMP), replace the local endpoint and add AWS SigV4 authentication:
curl -s --aws-sigv4 "aws:amz:REGION:aps" \
--user "$AWS_ACCESS_KEY_ID:$AWS_SECRET_ACCESS_KEY" \
'https://aps-workspaces.REGION.amazonaws.com/workspaces/WORKSPACE_ID/api/v1/query' \
--data-urlencode 'query=sum(rate(http_server_duration_seconds_count[5m])) by (service_name)'
- Endpoint format:
https://aps-workspaces.REGION.amazonaws.com/workspaces/WORKSPACE_ID/api/v1/query - Auth:
--aws-sigv4 "aws:amz:REGION:aps"with--user "$AWS_ACCESS_KEY_ID:$AWS_SECRET_ACCESS_KEY" - PromQL query syntax is identical between local Prometheus and Amazon Managed Prometheus; only the endpoint and authentication differ