holmesgpt - SKILL.md Agent Skill

name: holmesgpt description: > HolmesGPT in Kubernetes: OpenAI-compatible backend via LiteLLM, Kubernetes and log/metrics toolsets, Gateway API exposure, and in-cluster troubleshooting. license: MIT compatibility: - opencode metadata: author: dotfiles tags: [kubernetes, holmesgpt, ai, litellm, openai-compatible, gateway-api, observability]

HolmesGPT Skill

Cluster Context

HolmesGPT runs in the ai namespace and uses the in-cluster LiteLLM proxy as an OpenAI-compatible backend.

Backend wiring

OPENAI_API_BASE=http://litellm-proxy.ai.svc.cluster.local:4000/v1
OPENAI_API_KEY=sk-hermes-internal (LiteLLM master key)
Models must support function calling / tool calling

First smoke test

kubectl port-forward -n ai svc/holmesgpt-holmes 8080:80
curl -X POST http://localhost:8080/api/chat \
  -H 'Content-Type: application/json' \
  -d '{"ask":"list pods in namespace ai?","model":"holmes-litellm"}'

Or use the local wrapper from infra:

./scripts/holmes-chat "Using Prometheus, how many pods are using more than 500 MiB of RAM right now?"
./scripts/holmes-chat "Using Prometheus, show the top 5 pods by CPU in namespace monitoring over the last 15m."

Prometheus questions

Ask Holmes in plain English and mention Prometheus explicitly:

curl -X POST http://localhost:8080/api/chat \
  -H 'Content-Type: application/json' \
  -d '{"ask":"Using Prometheus, show the top 5 pods by CPU in namespace monitoring over the last 15m.","model":"holmes-litellm"}'

curl -X POST http://localhost:8080/api/chat \
  -H 'Content-Type: application/json' \
  -d '{"ask":"Using Prometheus, check whether Grafana latency or error rate spiked in the last 30m.","model":"holmes-litellm"}'

curl -X POST http://localhost:8080/api/chat \
  -H 'Content-Type: application/json' \
  -d '{"ask":"Using Prometheus, how many pods are using more than 500 MiB of RAM right now? List the namespace, pod name, and memory usage.","model":"holmes-litellm"}'

curl -X POST http://localhost:8080/api/chat \
  -H 'Content-Type: application/json' \
  -d '{"ask":"Using Prometheus, show the top 10 pods by memory usage in namespace monitoring over the last 15m.","model":"holmes-litellm"}'

Telegram prompts

In Telegram, send plain text. Do not prefix with /ask.

Examples:

che como está mi cluster
cuantos pods hay en monitoring
mostrame los pods que usan más de 500 MiB de RAM
revisá si argocd está healthy

Expected toolsets

Kubernetes core
Kubernetes logs
Live metrics
Prometheus stack
Bash (extended allowlist)

MCP follow-up

HolmesGPT can also use MCP servers later (for example GitHub or custom K8s helpers). Not wired yet in this cluster; add it after the base Helm install is stable.

Troubleshooting

Check the Holmes deployment and service:

kubectl get deploy,svc,httproute -n ai | grep holmes

Check pod logs:

kubectl logs -n ai -l app.kubernetes.io/name=holmes --tail=100

Verify LiteLLM access from the pod:

kubectl exec -it -n ai deploy/holmesgpt-holmes -- sh -lc 'curl -sf -H "Authorization: Bearer sk-hermes-internal" http://litellm-proxy.ai.svc.cluster.local:4000/v1/models'

If tool calling fails, pick a different OpenRouter model in LiteLLM.