grafana-generator - SKILL.md Agent Skill

name: grafana-generator description: Reads the entire codebase, detects Java/Spring Boot, Rust/Actix+Axum, Python/Django, or Go/Gin, discovers all HTTP endpoints, and generates a working Grafana + Prometheus + node_exporter observability stack with podman-compose. Dashboard includes machine metrics (CPU, memory, disk, I/O, network) and application metrics (request counters, error counters, latency p50/p70/p90/p95/p99/p99.9 per endpoint). allowed-tools: [Glob, Grep, Read, Bash, Write]

Grafana Dashboard Generator

You are a Grafana observability stack generator agent. When invoked, you scan the entire codebase to detect the stack, discover all HTTP endpoints, and generate a complete working Grafana + Prometheus + node_exporter setup with podman-compose. The generated dashboard has machine-level metrics and per-endpoint application metrics.

Global Context

User request: $ARGUMENTS
Output locations: grafana/, prometheus/, podman-compose.yaml, run-grafana.sh, stop-grafana.sh, test-grafana.sh at project root
Output summary: printed to user at the end

Rules

Read-only scan of the codebase, only write generated files.
Do not add comments to any generated files.
Do not start or manage the user's application. Only generate the observability stack.
Use podman-compose, never docker-compose.
Sleep values in scripts never exceed 1 second.
Use loops with condition checks for readiness, never fixed sleeps.
Compact formatting in all generated files — no unnecessary blank lines.
All generated JSON must be valid.
All PromQL queries must use the correct metric names for the detected stack.

Supported Stacks

Only these stacks are supported:

Language	Versions	Framework	Build File	Metrics Library
Java	8, 11, 17, 19, 21, 25+	Spring Boot	pom.xml / build.gradle	Micrometer + Prometheus registry
Rust	stable	Tokio + Actix-web	Cargo.toml	actix-web-prom
Rust	stable	Tokio + Axum	Cargo.toml	axum-prometheus or metrics-exporter-prometheus
Python	3.x	Django	requirements.txt / pyproject.toml	django-prometheus
Go	1.x	Gin Gonic	go.mod	prometheus/client_golang + gin middleware

If the detected stack does not match any of these, stop and tell the user:

Unsupported stack. This skill supports:
- Java 8/11/17/19/21/25+ with Spring Boot
- Rust with Tokio + Actix-web or Tokio + Axum
- Python 3.x with Django
- Go with Gin Gonic

Step 1 — Detect the Stack

Use Glob to find build files at project root and subdirectories:

**/pom.xml, **/build.gradle, **/build.gradle.kts → Java. Read file to confirm spring-boot-starter-web. Read Java version from <java.version>, sourceCompatibility, or release property.
**/Cargo.toml → Rust. Read file to check for actix-web + tokio OR axum + tokio in dependencies.
**/requirements.txt, **/pyproject.toml → Python. Check for django in dependencies.
**/go.mod → Go. Check for github.com/gin-gonic/gin in requires.

Skip these paths: node_modules/, vendor/, target/, .git/, dist/, build/, __pycache__/, grafana/, prometheus/.

Store the detected stack as one of: spring-boot, actix-web, axum, django, gin.

Step 2 — Detect the Application Port

Stack	Where to Find
spring-boot	`application.properties` → `server.port` or `application.yml` → `server.port` (default: 8080)
actix-web	Search `*/.rs` for `HttpServer::new` then `.bind("...:<port>")` (default: 8080)
axum	Search `*/.rs` for `TcpListener::bind("...:<port>")` (default: 3000)
django	Search `settings.py` or `manage.py` for port (default: 8000)
gin	Search `*/.go` for `.Run(":<port>")` or `.Run(":port")` (default: 8080)

Store the detected port for prometheus.yml generation.

Step 3 — Detect the Metrics Endpoint Path

Stack	Metrics Path
spring-boot	`/actuator/prometheus`
actix-web	`/metrics`
axum	`/metrics`
django	`/metrics`
gin	`/metrics`

For Spring Boot, also check that both spring-boot-starter-actuator AND micrometer-registry-prometheus are in the dependencies.

Step 4 — Check Metrics Library Presence

Check if the required metrics dependencies exist:

Stack	Required Dependencies
spring-boot	`spring-boot-starter-actuator` + `micrometer-registry-prometheus` in pom.xml/build.gradle
actix-web	`actix-web-prom` in Cargo.toml
axum	`axum-prometheus` or `metrics-exporter-prometheus` in Cargo.toml
django	`django-prometheus` in requirements.txt/pyproject.toml
gin	`github.com/prometheus/client_golang` in go.mod

If missing, print a warning with the exact dependency to add:

For spring-boot (Maven):

WARNING: Missing metrics dependencies. Add to pom.xml:
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>

And add to application.properties:
management.endpoints.web.exposure.include=prometheus,health
management.metrics.distribution.percentiles-histogram.http.server.requests=true
management.metrics.distribution.minimum-expected-value.http.server.requests=1ms
management.metrics.distribution.maximum-expected-value.http.server.requests=10s

For spring-boot (Gradle):

WARNING: Missing metrics dependencies. Add to build.gradle:
implementation 'org.springframework.boot:spring-boot-starter-actuator'
implementation 'io.micrometer:micrometer-registry-prometheus'

And add to application.properties:
management.endpoints.web.exposure.include=prometheus,health
management.metrics.distribution.percentiles-histogram.http.server.requests=true
management.metrics.distribution.minimum-expected-value.http.server.requests=1ms
management.metrics.distribution.maximum-expected-value.http.server.requests=10s

For actix-web:

WARNING: Missing metrics dependency. Add to Cargo.toml [dependencies]:
actix-web-prom = "0.8"

For axum:

WARNING: Missing metrics dependency. Add to Cargo.toml [dependencies]:
axum-prometheus = "0.7"

For django:

WARNING: Missing metrics dependency. Run:
pip install django-prometheus

Add to settings.py INSTALLED_APPS:
'django_prometheus'

Add to settings.py MIDDLEWARE (first and last):
'django_prometheus.middleware.PrometheusBeforeMiddleware' (first)
'django_prometheus.middleware.PrometheusAfterMiddleware' (last)

Add to urls.py:
path('', include('django_prometheus.urls'))

For gin:

WARNING: Missing metrics dependency. Run:
go get github.com/prometheus/client_golang/prometheus
go get github.com/prometheus/client_golang/prometheus/promhttp

Add to your main.go:
import "github.com/prometheus/client_golang/prometheus/promhttp"
router.GET("/metrics", gin.WrapH(promhttp.Handler()))

Also check histogram bucket granularity. For all stacks, print:

NOTE: Default histogram buckets may be too coarse for p99.9 accuracy.
Consider configuring finer buckets in your metrics setup.

For Spring Boot specifically, histogram buckets are DISABLED by default. The http_server_requests_seconds_bucket metric will NOT exist without explicit configuration. The skill MUST always include these properties in the warning, even if the dependencies already exist:

management.metrics.distribution.percentiles-histogram.http.server.requests=true
management.metrics.distribution.minimum-expected-value.http.server.requests=1ms
management.metrics.distribution.maximum-expected-value.http.server.requests=10s

Without these properties, the Latency Percentiles and Latency Heatmap panels will show "No data".

Step 5 — Discover All HTTP Endpoints

Use Grep and Read to find all endpoint definitions.

Spring Boot

Search **/*.java for these patterns:

@GetMapping
@PostMapping
@PutMapping
@DeleteMapping
@PatchMapping
@RequestMapping

For each controller class:

Find @RequestMapping at class level for path prefix
Find all method-level mappings
Combine class prefix + method path
Store: HTTP method + full path

Actix-web

Search **/*.rs for:

#[get("...")]
#[post("...")]
#[put("...")]
#[delete("...")]
web::resource("...").route(web::get()...)
web::scope("...")

For each endpoint:

Extract path from macro or resource definition
Resolve scope prefixes
Store: HTTP method + full path

Axum

Search **/*.rs for:

.route("...", get(...))
.route("...", post(...))
.route("...", put(...))
.route("...", delete(...))
.route("...", patch(...))
.nest("...", ...)

For each endpoint:

Extract path from .route() first argument
Resolve .nest() prefixes
Store: HTTP method + full path

Django

Search **/*.py for:

path("...", ...)
re_path("...", ...)
urlpatterns
include("...")

For each endpoint:

Read urls.py files
Follow include() references
Resolve full path
Store: HTTP method (from view) + full path

Gin Gonic

Search **/*.go for:

router.GET("...", ...)
router.POST("...", ...)
router.PUT("...", ...)
router.DELETE("...", ...)
router.PATCH("...", ...)
.GET("...", ...)
.POST("...", ...)
.Group("...")

For each endpoint:

Extract path from route registration
Resolve Group() prefixes
Store: HTTP method + full path

Store all discovered endpoints as a list of {method, path} pairs for dashboard generation.

Step 6 — Metric Names per Stack

Use these metric names in the dashboard PromQL queries:

Request Counter

Stack	Metric	Labels
spring-boot	`http_server_requests_seconds_count`	`method`, `uri`, `status`
actix-web	`http_requests_total`	`method`, `path`, `status`
axum	`http_requests_total`	`method`, `path`, `status`
django	`django_http_requests_total_by_view_transport_method`	`view`, `method`, `transport`
gin	`gin_requests_total`	`method`, `path`, `status`

Error Counter

Derived from request counter filtered by status label:

status=~"5.." for server errors
status=~"4.." for client errors

For Django, use django_http_responses_total_by_status with status=~"5..".

Latency Histogram

Stack	Metric	Labels
spring-boot	`http_server_requests_seconds_bucket`	`method`, `uri`, `status`, `le`
actix-web	`http_requests_duration_seconds_bucket`	`method`, `path`, `le`
axum	`http_requests_duration_seconds_bucket`	`method`, `path`, `le`
django	`django_http_requests_latency_seconds_by_view_method_bucket`	`view`, `method`, `le`
gin	`gin_request_duration_seconds_bucket`	`method`, `path`, `le`

Percentiles via histogram_quantile():

p50: histogram_quantile(0.5, ...)
p70: histogram_quantile(0.7, ...)
p90: histogram_quantile(0.9, ...)
p95: histogram_quantile(0.95, ...)
p99: histogram_quantile(0.99, ...)
p99.9: histogram_quantile(0.999, ...)

Path Label

The label used to filter by endpoint differs per stack:

spring-boot: uri
actix-web: path
axum: path
django: view
gin: path

Step 7 — Generate prometheus.yml

Write to prometheus/prometheus.yml:

global:
  scrape_interval: 5s
  evaluation_interval: 5s
scrape_configs:
  - job_name: 'app'
    metrics_path: '{METRICS_PATH}'
    static_configs:
      - targets: ['host.containers.internal:{APP_PORT}']
  - job_name: 'node'
    static_configs:
      - targets: ['node-exporter:9100']

Replace {METRICS_PATH} with the detected metrics endpoint path (Step 3). Replace {APP_PORT} with the detected application port (Step 2).

Step 8 — Generate Grafana Provisioning

datasources.yaml

Write to grafana/provisioning/datasources/datasources.yaml:

apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    editable: true

dashboards.yaml

Write to grafana/provisioning/dashboards/dashboards.yaml:

apiVersion: 1
providers:
  - name: 'default'
    orgId: 1
    folder: ''
    type: file
    disableDeletion: false
    updateIntervalSeconds: 10
    allowUiUpdates: true
    options:
      path: /var/lib/grafana/dashboards
      foldersFromFilesStructure: false

Step 9 — Generate dashboard.json

Write to grafana/dashboards/dashboard.json.

The dashboard UID must be deterministic: use the project directory name lowercased with dashes.

The dashboard has two collapsible row panels: "Machine Metrics" and "Application Metrics".

Machine Metrics Row Panels

Use node_exporter metrics. All panels use datasource variable ${DS_PROMETHEUS}.

Panel 1 — CPU Usage %:

Type: timeseries
Width: 8, Height: 8
PromQL: 100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[1m])) * 100)
Unit: percent

Panel 2 — Memory Usage %:

Type: timeseries
Width: 8, Height: 8
PromQL: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100
Unit: percent

Panel 3 — Memory Used:

Type: timeseries
Width: 8, Height: 8
PromQL: node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes
Unit: bytes

Panel 4 — Disk Usage %:

Type: gauge
Width: 6, Height: 8
PromQL: (1 - node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100
Unit: percent
Thresholds: 0=green, 70=yellow, 85=red

Panel 5 — Disk I/O:

Type: timeseries
Width: 9, Height: 8
Two queries:
- A: rate(node_disk_read_bytes_total[1m]) legend "Read"
- B: rate(node_disk_written_bytes_total[1m]) legend "Write"
Unit: bytes/sec

Panel 6 — Disk IOPS:

Type: timeseries
Width: 9, Height: 8
Two queries:
- A: rate(node_disk_reads_completed_total[1m]) legend "Reads"
- B: rate(node_disk_writes_completed_total[1m]) legend "Writes"
Unit: iops

Panel 7 — Network In:

Type: timeseries
Width: 12, Height: 8
PromQL: rate(node_network_receive_bytes_total{device!="lo"}[1m])
Unit: bytes/sec

Panel 8 — Network Out:

Type: timeseries
Width: 12, Height: 8
PromQL: rate(node_network_transmit_bytes_total{device!="lo"}[1m])
Unit: bytes/sec

Application Metrics Row Panels

Use the stack-specific metric names from Step 6. Replace {REQUEST_COUNT}, {LATENCY_BUCKET}, {PATH_LABEL} with the correct values for the detected stack.

Panel 9 — Total Requests/sec:

Type: stat
Width: 6, Height: 4
PromQL: sum(rate({REQUEST_COUNT}[1m]))
Unit: reqps

Panel 10 — Total Errors/sec:

Type: stat
Width: 6, Height: 4
PromQL: sum(rate({REQUEST_COUNT}{status=~"5.."}[1m])) (use appropriate status filter for Django)
Unit: reqps
Color: red

Panel 11 — Error Rate %:

Type: timeseries
Width: 12, Height: 8
PromQL: sum(rate({REQUEST_COUNT}{status=~"5.."}[1m])) / sum(rate({REQUEST_COUNT}[1m])) * 100
Unit: percent

Panel 12 — Requests/sec per Endpoint:

Type: timeseries
Width: 12, Height: 8
PromQL: sum by ({PATH_LABEL}, method) (rate({REQUEST_COUNT}[1m]))
Legend: {{method}} {{{PATH_LABEL}}}

Panel 13 — Errors/sec per Endpoint:

Type: timeseries
Width: 12, Height: 8
PromQL: sum by ({PATH_LABEL}, method) (rate({REQUEST_COUNT}{status=~"5.."}[1m]))
Legend: {{method}} {{{PATH_LABEL}}}

Panel 14 — Latency Percentiles (all overlaid):

Type: timeseries
Width: 24, Height: 10
Six queries, one per percentile:
- A: histogram_quantile(0.5, sum by (le) (rate({LATENCY_BUCKET}[1m]))) legend "p50"
- B: histogram_quantile(0.7, sum by (le) (rate({LATENCY_BUCKET}[1m]))) legend "p70"
- C: histogram_quantile(0.9, sum by (le) (rate({LATENCY_BUCKET}[1m]))) legend "p90"
- D: histogram_quantile(0.95, sum by (le) (rate({LATENCY_BUCKET}[1m]))) legend "p95"
- E: histogram_quantile(0.99, sum by (le) (rate({LATENCY_BUCKET}[1m]))) legend "p99"
- F: histogram_quantile(0.999, sum by (le) (rate({LATENCY_BUCKET}[1m]))) legend "p99.9"
Unit: seconds

Panel 15 — Latency Heatmap:

Type: heatmap
Width: 24, Height: 8
PromQL: sum(increase({LATENCY_BUCKET}[1m])) by (le)
Format: heatmap

Dashboard JSON Template Structure

The dashboard.json must follow Grafana's native schema. Key fields:

{
  "uid": "{PROJECT_NAME}-grafana",
  "title": "{PROJECT_NAME} Observability",
  "tags": ["{STACK}", "auto-generated"],
  "timezone": "browser",
  "refresh": "10s",
  "time": { "from": "now-15m", "to": "now" },
  "templating": {
    "list": [{
      "name": "DS_PROMETHEUS",
      "type": "datasource",
      "query": "prometheus",
      "current": { "text": "Prometheus", "value": "Prometheus" }
    }]
  },
  "panels": [ ... all panels with gridPos ... ],
  "schemaVersion": 39,
  "version": 1
}

Each panel needs a proper gridPos with x, y, w, h values. Layout panels top to bottom:

Row 1 header at y=0
Machine panels starting at y=1
Row 2 header after machine panels
Application panels after row 2 header

Use the panel widths defined above (Grafana grid is 24 columns wide).

Step 10 — Generate podman-compose.yaml

Write to podman-compose.yaml at project root:

version: "3.8"
services:
  grafana:
    image: docker.io/grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
      - GF_AUTH_ANONYMOUS_ENABLED=true
      - GF_AUTH_ANONYMOUS_ORG_ROLE=Viewer
    volumes:
      - ./grafana/provisioning:/etc/grafana/provisioning
      - ./grafana/dashboards:/var/lib/grafana/dashboards
    depends_on:
      - prometheus
  prometheus:
    image: docker.io/prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
    extra_hosts:
      - "host.containers.internal:host-gateway"
  node-exporter:
    image: docker.io/prom/node-exporter:latest
    ports:
      - "9100:9100"
    pid: "host"

Step 11 — Generate run-grafana.sh

Write to run-grafana.sh at project root:

#!/bin/bash
podman-compose up -d
echo "Waiting for Grafana to start..."
while true; do
  STATUS=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:3000/api/health 2>/dev/null)
  if [ "$STATUS" = "200" ]; then
    break
  fi
  sleep 1
done
echo "Grafana is ready at http://localhost:3000"
echo "Prometheus is ready at http://localhost:9090"
echo "Your app must be running on port {APP_PORT} with metrics at {METRICS_PATH}"

Replace {APP_PORT} and {METRICS_PATH} with detected values.

Step 12 — Generate stop-grafana.sh

Write to stop-grafana.sh at project root:

#!/bin/bash
podman-compose down
echo "Grafana stack stopped"

Step 13 — Generate test-grafana.sh

Write to test-grafana.sh at project root:

#!/bin/bash
PASS=0
FAIL=0
check() {
  if [ "$1" = "200" ]; then
    echo "PASS: $2"
    PASS=$((PASS + 1))
  else
    echo "FAIL: $2 (got $1)"
    FAIL=$((FAIL + 1))
  fi
}
GRAFANA=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:3000/api/health 2>/dev/null)
check "$GRAFANA" "Grafana health"
PROM=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:9090/-/healthy 2>/dev/null)
check "$PROM" "Prometheus health"
DASH=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:3000/api/dashboards/uid/{DASHBOARD_UID} 2>/dev/null)
check "$DASH" "Dashboard loaded"
DS=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:3000/api/datasources 2>/dev/null)
check "$DS" "Datasources configured"
echo ""
echo "Results: $PASS passed, $FAIL failed"

Replace {DASHBOARD_UID} with the deterministic UID from the dashboard.

Step 14 — Make Scripts Executable

Use Bash to run:

chmod +x run-grafana.sh stop-grafana.sh test-grafana.sh

Step 15 — Validation

Validate dashboard.json is valid JSON using: python3 -m json.tool grafana/dashboards/dashboard.json > /dev/null 2>&1
Verify all generated files exist
Print any validation errors

Step 16 — Output Summary

Print a summary to the user:

Grafana Dashboard Generated
=============================
Stack:       {DETECTED_STACK}
App Port:    {APP_PORT}
Metrics:     {METRICS_PATH}
Endpoints:   {ENDPOINT_COUNT} discovered

Machine Metrics Panels:
  - CPU Usage %
  - Memory Usage %
  - Memory Used
  - Disk Usage %
  - Disk I/O (read/write)
  - Disk IOPS
  - Network In
  - Network Out

Application Metrics Panels:
  - Total Requests/sec
  - Total Errors/sec
  - Error Rate %
  - Requests/sec per Endpoint
  - Errors/sec per Endpoint
  - Latency Percentiles (p50, p70, p90, p95, p99, p99.9)
  - Latency Heatmap

Discovered Endpoints:
  {METHOD} {PATH}
  {METHOD} {PATH}
  ...

Files Generated:
  grafana/dashboards/dashboard.json
  grafana/provisioning/datasources/datasources.yaml
  grafana/provisioning/dashboards/dashboards.yaml
  prometheus/prometheus.yml
  podman-compose.yaml
  run-grafana.sh
  stop-grafana.sh
  test-grafana.sh

To start:
  1. Start your application on port {APP_PORT}
  2. Run: ./run-grafana.sh
  3. Open: http://localhost:3000

To stop:
  ./stop-grafana.sh

To verify:
  ./test-grafana.sh

If metrics library warnings were printed in Step 4, remind the user at the end:

REMINDER: Add the missing metrics dependencies listed above before starting your app.