metrics-query-cookbook

star 0

Cookbook of ready-to-use PromQL queries, preset catalog, metric name dictionaries, and label references for Ceph storage, network traffic, pod statistics, and MTV migrations. Use when you need specific queries, exact metric names, or label filters.

kubev2v By kubev2v schedule Updated 6/2/2026

name: metrics-query-cookbook description: Cookbook of ready-to-use PromQL queries, preset catalog, metric name dictionaries, and label references for Ceph storage, network traffic, pod statistics, and MTV migrations. Use when you need specific queries, exact metric names, or label filters.

Metrics Query Cookbook

Ready-to-use queries, preset catalog, and metric name/label references for OpenShift clusters with ODF, OVN-Kubernetes, KubeVirt, and Forklift/MTV.

All examples use the kubectl-metrics MCP server tools (metrics_read and metrics_help).

Output format guidance: Use default (markdown) when presenting to user. Use output: "json" only when you need to parse values programmatically. Use selector to filter results by labels post-query.


Preset Catalog

Every preset works as both an instant (default) and range query. Pass start to get a time-series trend.

Cluster & Namespace

Preset Description
cluster_cpu_utilization Cluster CPU utilization percentage
cluster_memory_utilization Cluster memory utilization percentage
cluster_pod_status Pod counts by phase (Running, Pending, Failed, Succeeded, Unknown)
cluster_node_readiness Node readiness status counts
namespace_cpu_usage Top 10 namespaces by CPU usage (cores)
namespace_memory_usage Top 10 namespaces by memory usage (bytes)
namespace_network_rx Top 10 namespaces by network receive rate
namespace_network_tx Top 10 namespaces by network transmit rate
namespace_network_errors Network errors + drops by namespace (top 10)
pod_restarts_top10 Top 10 pods by container restart count

Forklift / MTV Migration

Preset Description
mtv_migration_status Migration counts by status (succeeded/failed/running)
mtv_plan_status Plan-level status counts
mtv_migration_duration Migration duration per plan (seconds)
mtv_avg_migration_duration Average migration duration (seconds)
mtv_data_transferred Total bytes migrated per plan
mtv_net_throughput Migration network throughput
mtv_storage_throughput Migration storage throughput
mtv_migration_pod_rx Migration pod receive rate (bytes/sec, top 20)
mtv_migration_pod_tx Migration pod transmit rate (bytes/sec, top 20)
mtv_forklift_traffic Forklift operator pod network traffic (bytes/sec)
mtv_vmi_migrations_pending KubeVirt VMI migrations in pending phase
mtv_vmi_migrations_running KubeVirt VMI migrations in running phase

Storage Metrics (Ceph / ODF)

Cluster-wide storage health

metrics_read { "command": "query", "flags": { "query": "ceph_health_status", "output": "markdown" } }

Result: 0 = OK, 1 = WARN, 2 = ERR.

Storage capacity

metrics_read { "command": "query", "flags": { "query": "ceph_cluster_total_bytes", "output": "markdown" } }
metrics_read { "command": "query", "flags": { "query": "ceph_cluster_total_used_bytes", "output": "markdown" } }

Pool-level statistics

metrics_read { "command": "query", "flags": { "query": "ceph_pool_percent_used * 100", "output": "markdown" } }

Pool I/O rates

metrics_read { "command": "query", "flags": { "query": "rate(ceph_pool_rd[5m])", "output": "markdown" } }
metrics_read { "command": "query", "flags": { "query": "rate(ceph_pool_wr[5m])", "output": "markdown" } }

OSD operation latency

metrics_read { "command": "query", "flags": { "query": "rate(ceph_osd_op_latency_sum[5m]) / rate(ceph_osd_op_latency_count[5m])", "output": "markdown" } }

Placement group health

metrics_read { "command": "query", "flags": { "query": "ceph_pg_total", "output": "markdown" } }
metrics_read { "command": "query", "flags": { "query": "ceph_pg_degraded", "output": "markdown" } }

Available labels on ceph_* metrics

Label Description Example values
pool_id Ceph pool identifier (pool-level metrics) 1, 2, 3, 4
ceph_daemon OSD daemon name (OSD-level metrics) osd.0, osd.1, osd.2
namespace Storage operator namespace openshift-storage
managedBy Managing resource ocs-storagecluster
job Scrape job rook-ceph-mgr, rook-ceph-exporter

Storage metrics reference

Metric Description
ceph_health_status Overall cluster health (0=OK, 1=WARN, 2=ERR)
ceph_cluster_total_bytes Total cluster capacity
ceph_cluster_total_used_bytes Used cluster capacity
ceph_pool_percent_used Per-pool usage percentage
ceph_pool_stored Bytes stored per pool
ceph_pool_max_avail Available bytes per pool
ceph_pool_rd, ceph_pool_wr Read/write IOPS per pool
ceph_pool_rd_bytes, ceph_pool_wr_bytes Read/write bytes per pool
ceph_osd_op_latency_sum/count OSD operation latency (use as rate ratio)
ceph_pg_total, ceph_pg_active, ceph_pg_degraded Placement group counts
node_filesystem_avail_bytes, node_filesystem_size_bytes Node filesystem capacity

Network Traffic Metrics

Network traffic by namespace

metrics_read { "command": "preset", "flags": { "name": "namespace_network_rx", "output": "markdown" } }
metrics_read { "command": "preset", "flags": { "name": "namespace_network_tx", "output": "markdown" } }

Network traffic by pod in a namespace

Replace TARGET_NAMESPACE with the actual namespace -- ASK the user if not known.

metrics_read {
  "command": "query",
  "flags": { "query": "topk(10, sort_desc(sum by (pod)(rate(container_network_receive_bytes_total{namespace=\"TARGET_NAMESPACE\"}[5m]))))", "output": "markdown" }
}
metrics_read {
  "command": "query",
  "flags": { "query": "topk(10, sort_desc(sum by (pod)(rate(container_network_transmit_bytes_total{namespace=\"TARGET_NAMESPACE\"}[5m]))))", "output": "markdown" }
}

Network errors and drops by namespace

metrics_read { "command": "preset", "flags": { "name": "namespace_network_errors", "output": "markdown" } }

Node-level network throughput

metrics_read {
  "command": "query",
  "flags": { "query": "instance:node_network_receive_bytes_excluding_lo:rate1m + instance:node_network_transmit_bytes_excluding_lo:rate1m", "output": "markdown" }
}

Available labels on network metrics

Label Description Example values
namespace Pod namespace openshift-storage, konveyor-forklift
pod Pod name forklift-controller-6df77f6bf5-jtt7q
interface Network interface (per-pod metrics) eth0
instance Node instance (node-level metrics) 10.0.0.5:9100
node Node name (node-level metrics) worker-0

Network metrics reference

Metric Description
container_network_receive_bytes_total Bytes received per pod/namespace
container_network_transmit_bytes_total Bytes transmitted per pod/namespace
container_network_receive_errors_total Receive errors per pod/namespace
container_network_transmit_errors_total Transmit errors per pod/namespace
container_network_receive_packets_dropped_total Dropped receive packets
container_network_transmit_packets_dropped_total Dropped transmit packets
node_network_receive_bytes_total Bytes received per node/interface
node_network_transmit_bytes_total Bytes transmitted per node/interface
instance:node_network_receive_bytes_excluding_lo:rate1m Pre-computed node receive rate
instance:node_network_transmit_bytes_excluding_lo:rate1m Pre-computed node transmit rate

Pod and Container Statistics

Pod count by namespace

metrics_read { "command": "query", "flags": { "query": "topk(15, count by (namespace)(kube_pod_info))", "output": "markdown" } }

Pod phase summary

metrics_read { "command": "preset", "flags": { "name": "cluster_pod_status", "output": "markdown" } }

Container CPU usage by namespace

metrics_read { "command": "preset", "flags": { "name": "namespace_cpu_usage", "output": "markdown" } }

Container memory usage by namespace

metrics_read { "command": "preset", "flags": { "name": "namespace_memory_usage", "output": "markdown" } }

Container restart counts (instability indicator)

metrics_read { "command": "preset", "flags": { "name": "pod_restarts_top10", "output": "markdown" } }

Pods with high recent restarts (use debug_read for details)

After finding pods with high restarts, use debug_read to get pod details and logs:

debug_read { "command": "list", "flags": { "resource": "pods", "namespace": "<NAMESPACE>", "query": "where status.containerStatuses[0].restartCount > 5", "output": "markdown" } }
debug_read { "command": "logs", "flags": { "name": "<POD_NAME>", "namespace": "<NAMESPACE>", "tail": 100, "query": "where level = 'ERROR'", "output": "markdown" } }

Available labels on pod/container metrics

Label Description Example values
namespace Pod namespace konveyor-forklift, openshift-cnv
pod Pod name forklift-controller-6df77f6bf5-jtt7q
container Container name main, inventory, extract
node Node the pod runs on worker-0, worker-1
phase Pod phase (on status metrics) Running, Pending, Failed, Succeeded
uid Pod UID 793fb1cb-3e58-4eef-b95a-733f237365a3
created_by_kind Owner resource kind (on kube_pod_info) ReplicaSet, DaemonSet, StatefulSet
created_by_name Owner resource name (on kube_pod_info) forklift-controller-6df77f6bf5
host_ip Node IP (on kube_pod_info) 192.168.0.77
pod_ip Pod IP (on kube_pod_info) 10.129.3.3

Pod/container metrics reference

Metric Description
kube_pod_info Pod metadata (node, namespace, IPs, owner)
kube_pod_status_phase Pod phase (Running/Pending/Failed/Succeeded)
kube_pod_container_status_restarts_total Container restart count
kube_pod_container_status_waiting_reason Waiting reason (CrashLoopBackOff, ImagePullBackOff, etc.)
container_cpu_usage_seconds_total Container CPU usage
container_memory_working_set_bytes Container memory usage
namespace:container_cpu_usage:sum Pre-aggregated CPU by namespace
namespace:container_memory_usage_bytes:sum Pre-aggregated memory by namespace

Forklift / MTV Migration Metrics

Available labels on mtv_* metrics

All mtv_* metrics share these labels for filtering and grouping:

Label Description Example values
provider Source provider type vsphere, ovirt, openstack, ova, ec2
mode Migration mode Cold, Warm
target Target cluster Local (host cluster) or remote cluster name
owner User who owns the migration admin@example.com
plan Migration plan UUID 363ce137-dace-4fb4-b815-759c214c9fec
namespace Forklift operator namespace konveyor-forklift, openshift-mtv
status Migration/plan status (on status metrics) Succeeded, Failed, Executing

MTV migration metrics reference

Metric Description
mtv_migrations_status_total Migration counts by status (succeeded/failed/running)
mtv_plans_status Plan-level status counts
mtv_migration_data_transferred_bytes Total bytes migrated per plan
mtv_migration_net_throughput Migration network throughput
mtv_migration_storage_throughput Migration storage throughput
mtv_migration_duration_seconds Migration duration per plan
mtv_plan_alert_status Alerts on migration plans
mtv_workload_migrations_status_total Per-workload migration status (per plan + status)
kubevirt_vmi_migrations_in_pending_phase Live VMI migrations pending
kubevirt_vmi_migrations_in_running_phase Live VMI migrations in progress

Migration status overview

metrics_read { "command": "preset", "flags": { "name": "mtv_migration_status", "output": "markdown" } }

Migration plan status

metrics_read { "command": "preset", "flags": { "name": "mtv_plan_status", "output": "markdown" } }

Migration data transfer and throughput

metrics_read { "command": "preset", "flags": { "name": "mtv_data_transferred", "output": "markdown" } }
metrics_read { "command": "preset", "flags": { "name": "mtv_net_throughput", "output": "markdown" } }
metrics_read { "command": "preset", "flags": { "name": "mtv_storage_throughput", "output": "markdown" } }

Migration duration

metrics_read { "command": "preset", "flags": { "name": "mtv_migration_duration", "output": "markdown" } }
metrics_read { "command": "preset", "flags": { "name": "mtv_avg_migration_duration", "output": "markdown" } }

Migration alerts

metrics_read { "command": "query", "flags": { "query": "mtv_plan_alert_status", "output": "markdown" } }

Narrowing migration metrics with label filters

Use {label="value"} in PromQL or use the selector flag:

metrics_read { "command": "query", "flags": { "query": "mtv_migration_data_transferred_bytes", "selector": "provider=vsphere", "output": "markdown" } }
metrics_read { "command": "query", "flags": { "query": "mtv_migration_data_transferred_bytes{mode=\"Cold\"}", "output": "markdown" } }
metrics_read { "command": "query", "flags": { "query": "mtv_migration_data_transferred_bytes{provider=\"ovirt\", mode=\"Warm\"}", "output": "markdown" } }
metrics_read { "command": "query", "flags": { "query": "mtv_migrations_status_total{status=\"Failed\"}", "output": "markdown" } }
metrics_read { "command": "query", "flags": { "query": "mtv_workload_migrations_status_total{plan=\"PLAN_UUID\", status=\"Failed\"}", "output": "markdown" } }

Grouping migration metrics

metrics_read { "command": "query", "flags": { "query": "sum by (provider)(mtv_migration_data_transferred_bytes)", "output": "markdown" } }
metrics_read { "command": "query", "flags": { "query": "sum by (mode)(mtv_migration_data_transferred_bytes)", "output": "markdown" } }
metrics_read { "command": "query", "flags": { "query": "sum by (provider, mode)(mtv_migration_data_transferred_bytes)", "output": "markdown" } }
metrics_read { "command": "query", "flags": { "query": "sum by (status, provider)(mtv_migrations_status_total)", "output": "markdown" } }
metrics_read { "command": "query", "flags": { "query": "avg by (provider)(mtv_migration_duration_seconds)", "output": "markdown" } }
metrics_read { "command": "query", "flags": { "query": "sum by (plan, status)(mtv_workload_migrations_status_total)", "output": "markdown" } }
metrics_read { "command": "query", "flags": { "query": "sum by (provider, status)(mtv_plans_status)", "output": "markdown" } }

Network traffic of migration pods

During active Forklift migrations, data-transfer pods run in the target namespace. Migration pod names follow the pattern {plan-name}-{vm-id}-{random} (e.g. test-vmware-metrics-vm-43-tws62).

Step 1 -- Discover migration pods:

VMware/general migration pods (carry a plan label):

debug_read { "command": "list", "flags": { "resource": "pods", "namespace": "<NAMESPACE>", "selector": "plan", "output": "markdown" } }

oVirt/OpenStack populator pods (named populate-{uuid}-...):

debug_read { "command": "list", "flags": { "resource": "pods", "namespace": "<NAMESPACE>", "query": "where name ~= '^populate-'", "output": "markdown" } }

Step 2 -- Query network traffic for discovered pods:

Use the pod names from Step 1 to build a regex filter (replace POD1|POD2 with the actual names):

metrics_read {
  "command": "query",
  "flags": { "query": "topk(10, sort_desc(sum by (pod)(rate(container_network_receive_bytes_total{namespace=\"TARGET_NAMESPACE\",pod=~\"POD1|POD2\"}[5m]))))", "output": "markdown" }
}
metrics_read {
  "command": "query",
  "flags": { "query": "topk(10, sort_desc(sum by (pod)(rate(container_network_transmit_bytes_total{namespace=\"TARGET_NAMESPACE\",pod=~\"POD1|POD2\"}[5m]))))", "output": "markdown" }
}

Short-lived pod network metrics

Pods that run under 60 seconds (e.g. oVirt/OpenStack populator pods) may not have container-level network metrics (container_network_*). This is because cadvisor needs 1-2 collection cycles (10-20s) to establish network namespace tracking, and the pod may complete before tracking starts. CPU and memory metrics are unaffected.

Node-level network metrics capture the transfer at the node level. Determine which node ran the pod (spec.nodeName or kube_pod_info), then query RX and TX together:

metrics_read {
  "command": "query_range",
  "flags": {
    "query": [
      "instance:node_network_receive_bytes_excluding_lo:rate1m{instance=~\"NODE_NAME.*\"}",
      "instance:node_network_transmit_bytes_excluding_lo:rate1m{instance=~\"NODE_NAME.*\"}"
    ],
    "name": ["node_rx", "node_tx"],
    "start": "<MIGRATION_START>",
    "end": "<MIGRATION_END>",
    "step": "30s",
    "output": "markdown"
  }
}

Compare against baseline before/after the migration window to isolate transfer traffic.

CPU activity confirms the pod was active during the window:

metrics_read { "command": "query_range", "flags": { "query": "rate(container_cpu_usage_seconds_total{pod=\"<POD>\",namespace=\"<NS>\"}[1m])", "start": "<START>", "end": "<END>", "step": "30s", "output": "markdown" } }

Completed migration historical queries

When querying metrics for a migration that already finished, use the plan's start/completion timestamps as absolute time bounds:

  1. Get timestamps from the plan:
mtv_read { "command": "describe plan", "flags": { "name": "<PLAN>", "namespace": "<NS>", "output": "markdown" } }
  1. Use ISO-8601 start/end in query_range:
metrics_read {
  "command": "query_range",
  "flags": {
    "query": "sum by (pod)(rate(container_network_receive_bytes_total{namespace=\"<NS>\"}[5m]))",
    "start": "2025-06-15T10:00:00Z",
    "end": "2025-06-15T12:30:00Z",
    "step": "60s",
    "output": "markdown"
  }
}

Do not use relative offsets like -1h for completed migrations -- the data may fall outside that window.

Checking migration pod status with debug_read

To investigate migration pod issues alongside metrics:

debug_read { "command": "list", "flags": { "resource": "pods", "namespace": "<NAMESPACE>", "selector": "plan", "output": "markdown" } }
debug_read { "command": "logs", "flags": { "name": "<POD_NAME>", "namespace": "<NAMESPACE>", "tail": 100, "query": "where level = 'ERROR'", "output": "markdown" } }

Network traffic of the Forklift operator itself

metrics_read { "command": "preset", "flags": { "name": "mtv_forklift_traffic", "output": "markdown" } }

KubeVirt VMI migration metrics

These track live VM migrations (vMotion-style), not Forklift cold migrations:

metrics_read { "command": "preset", "flags": { "name": "mtv_vmi_migrations_pending", "output": "markdown" } }
metrics_read { "command": "preset", "flags": { "name": "mtv_vmi_migrations_running", "output": "markdown" } }

Quick Health Dashboard

Run key queries for a cluster overview:

metrics_read { "command": "preset", "flags": { "name": "cluster_cpu_utilization", "output": "markdown" } }
metrics_read { "command": "preset", "flags": { "name": "cluster_memory_utilization", "output": "markdown" } }
metrics_read { "command": "query", "flags": { "query": "ceph_health_status", "output": "markdown" } }
metrics_read { "command": "preset", "flags": { "name": "namespace_network_rx", "output": "markdown" } }
metrics_read { "command": "preset", "flags": { "name": "mtv_migration_status", "output": "markdown" } }
Install via CLI
npx skills add https://github.com/kubev2v/mtv-agent --skill metrics-query-cookbook
Repository Details
star Stars 0
call_split Forks 0
navigation Branch main
article Path SKILL.md
More from Creator