cluster-resource-health


Check Kubernetes cluster health including pod status, node conditions, resource utilization, and pending alerts across EKS clusters. Use when monitoring infrastructure health, investigating capacity issues, or performing cluster audits.


By cnoe-io. Updated 4/15/2026


name: cluster-resource-health
description: Check Kubernetes cluster health including pod status, node conditions, resource utilization, and pending alerts across EKS clusters. Use when monitoring infrastructure health, investigating capacity issues, or performing cluster audits.

Cluster Resource Health

Query AWS EKS clusters for node health, pod status, resource utilization, and alerts to produce a cluster health dashboard.

Instructions

Phase 1: Cluster Overview (AWS Agent)

List EKS clusters and their status
Check Kubernetes version - current vs. latest, end-of-support date
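
The version check in Phase 1 could be sketched as follows. This assumes the cluster data has the shape of `aws eks describe-cluster` JSON output (`cluster.name`, `cluster.status`, `cluster.version`). The function names are hypothetical, and the latest version and end-of-support date must be supplied by the caller from current EKS release information; nothing here looks them up.

```python
from datetime import date


def minor_versions_behind(current: str, latest: str) -> int:
    """Return how many minor releases `current` trails `latest`."""
    cur_major, cur_minor = (int(part) for part in current.split(".")[:2])
    new_major, new_minor = (int(part) for part in latest.split(".")[:2])
    if cur_major != new_major:
        raise ValueError("major versions differ; compare manually")
    return max(0, new_minor - cur_minor)


def summarize_cluster(described: dict, latest: str,
                      end_of_support: date, today: date) -> dict:
    """Build one overview row from a describe-cluster style response.

    `latest` and `end_of_support` are caller-supplied assumptions.
    """
    cluster = described["cluster"]
    return {
        "name": cluster["name"],
        "status": cluster["status"],
        "version": cluster["version"],
        "minor_versions_behind": minor_versions_behind(cluster["version"], latest),
        "days_until_end_of_support": (end_of_support - today).days,
    }
```

Passing `today` explicitly keeps the function deterministic, which makes it easier to test than calling `date.today()` inside it.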

Phase 2: Node Health

Inspect node conditions - Ready, MemoryPressure, DiskPressure, PIDPressure
Resource utilization per node - CPU, Memory, Pod count
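
A minimal sketch of the node condition check, assuming each node is a dict parsed from `kubectl get nodes -o json` (items with `metadata.name` and `status.conditions`). The helper names are hypothetical. A healthy node reports `Ready` as `"True"` and each pressure condition as `"False"`; a missing condition is treated as `Unknown` and reported.

```python
# Expected status for each condition on a healthy node.
EXPECTED_CONDITIONS = {
    "Ready": "True",
    "MemoryPressure": "False",
    "DiskPressure": "False",
    "PIDPressure": "False",
}


def unhealthy_conditions(node: dict) -> list[str]:
    """Return conditions whose status differs from the healthy value."""
    reported = {
        cond["type"]: cond["status"]
        for cond in node.get("status", {}).get("conditions", [])
    }
    problems = []
    for cond_type, expected in EXPECTED_CONDITIONS.items():
        actual = reported.get(cond_type, "Unknown")
        if actual != expected:
            problems.append(f"{cond_type}={actual}")
    return problems


def node_report(nodes: list[dict]) -> dict[str, list[str]]:
    """Map node name to its unhealthy conditions, omitting healthy nodes."""
    report = {}
    for node in nodes:
        problems = unhealthy_conditions(node)
        if problems:
            report[node["metadata"]["name"]] = problems
    return report
```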

Phase 3: Pod Health

Identify problematic pods - CrashLoopBackOff, ImagePullBackOff, OOMKilled, Pending
Namespace-level summary - pods running, pending, failed per namespace
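
The pod checks above could look roughly like this, assuming pods are dicts parsed from `kubectl get pods -A -o json`. The function names are hypothetical. `CrashLoopBackOff` and `ImagePullBackOff` appear as a container's `state.waiting.reason`, while `OOMKilled` appears as a `terminated.reason` in either `state` or `lastState`. Init containers are ignored in this sketch.

```python
from collections import Counter, defaultdict

WAITING_REASONS = {"CrashLoopBackOff", "ImagePullBackOff"}


def pod_problems(pod: dict) -> list[str]:
    """Return the distinct problem labels found on one pod, in discovery order."""
    status = pod.get("status", {})
    problems = []
    if status.get("phase") == "Pending":
        problems.append("Pending")
    for container in status.get("containerStatuses", []):
        waiting = container.get("state", {}).get("waiting") or {}
        if waiting.get("reason") in WAITING_REASONS:
            problems.append(waiting["reason"])
        for key in ("state", "lastState"):
            terminated = container.get(key, {}).get("terminated") or {}
            if terminated.get("reason") == "OOMKilled":
                problems.append("OOMKilled")
                break
    # dict.fromkeys removes duplicates while preserving order.
    return list(dict.fromkeys(problems))


def namespace_summary(pods: list[dict]) -> dict[str, Counter]:
    """Count pods per phase (Running, Pending, Failed, ...) in each namespace."""
    summary = defaultdict(Counter)
    for pod in pods:
        namespace = pod["metadata"]["namespace"]
        phase = pod.get("status", {}).get("phase", "Unknown")
        summary[namespace][phase] += 1
    return dict(summary)
```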

Phase 4: Resource Capacity Analysis

Cluster-wide utilization - total CPU/Memory requested vs. allocatable
Capacity risks - nodes above the 85% utilization threshold set in Guidelines, namespaces exceeding quotas
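
A sketch of the requested-vs-allocatable calculation, under several assumptions: nodes and pods are dicts parsed from `kubectl get ... -o json`, only the common quantity suffixes are handled (`m` for CPU; `Ki`, `Mi`, `Gi`, `Ti`, `k`, `M`, `G`, `T` for memory), init containers are ignored, and pods in the `Succeeded` or `Failed` phase are excluded because they no longer hold their requests. All function names are hypothetical.

```python
from decimal import Decimal
from typing import Callable

_MEMORY_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def parse_cpu(quantity: str) -> Decimal:
    """Convert a CPU quantity such as "500m" or "2" to cores."""
    if quantity.endswith("m"):
        return Decimal(quantity[:-1]) / 1000
    return Decimal(quantity)


def parse_memory(quantity: str) -> Decimal:
    """Convert a memory quantity such as "128Mi" or "1G" to bytes."""
    for suffix, factor in _MEMORY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return Decimal(quantity[: -len(suffix)]) * factor
    return Decimal(quantity)


def total_requested(pods: list[dict], resource: str,
                    parse: Callable[[str], Decimal]) -> Decimal:
    """Sum container requests for `resource` over non-terminated pods."""
    total = Decimal(0)
    for pod in pods:
        if pod.get("status", {}).get("phase") in ("Succeeded", "Failed"):
            continue
        for container in pod.get("spec", {}).get("containers", []):
            quantity = container.get("resources", {}).get("requests", {}).get(resource)
            if quantity is not None:
                total += parse(quantity)
    return total


def total_allocatable(nodes: list[dict], resource: str,
                      parse: Callable[[str], Decimal]) -> Decimal:
    """Sum `status.allocatable[resource]` over all nodes."""
    return sum((parse(node["status"]["allocatable"][resource]) for node in nodes),
               Decimal(0))


def utilization_pct(requested: Decimal, allocatable: Decimal) -> int:
    """Return requested / allocatable as a whole-number percentage."""
    if allocatable <= 0:
        raise ValueError("allocatable must be positive")
    return round(requested / allocatable * 100)
```

`Decimal` is used instead of `float` so that quantities like `100m` add up exactly across many containers.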

Output Format

```markdown

Cluster Resource Health Report

Cluster Summary

| Cluster | Version | Nodes | Status | Overall Health |
|---------|---------|-------|--------|----------------|
| prod-us-west-2 | 1.29 | 12/12 Ready | Active | HEALTHY |

Resource Utilization

| Resource | Requested | Allocatable | Utilization |
|----------|-----------|-------------|-------------|
| CPU | 38 cores | 48 cores | 79% |
| Memory | 96 Gi | 128 Gi | 75% |
```
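
One way to render the report rows as markdown pipe tables is sketched below. The helper name `markdown_table` is hypothetical, and cell values are not escaped, so this assumes no cell contains a `|` character.

```python
def markdown_table(headers: list[str], rows: list[list[object]]) -> str:
    """Render headers and rows as a markdown pipe table."""
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in rows:
        if len(row) != len(headers):
            raise ValueError("row length does not match header length")
        lines.append("| " + " | ".join(str(cell) for cell in row) + " |")
    return "\n".join(lines)


if __name__ == "__main__":
    print(markdown_table(
        ["Resource", "Requested", "Allocatable", "Utilization"],
        [["CPU", "38 cores", "48 cores", "79%"],
         ["Memory", "96 Gi", "128 Gi", "75%"]],
    ))
```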

Examples

"Check the health of our EKS clusters"
"Are there any failing pods in production?"
"Show me cluster resource utilization"
"Which nodes are under memory pressure?"

Guidelines

Check all clusters unless a specific cluster is requested
Flag any node above 85% resource utilization as a capacity risk
For CrashLoopBackOff pods, suggest checking logs as the immediate action
Flag EKS version end-of-support at least 90 days before the EOL date
Use only read-only kubectl commands; never modify cluster state during a health check
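
The guidelines above could be enforced with small helpers like these. This is a sketch under stated assumptions: the allow-list of read-only verbs and the set of value-taking flags are illustrative and incomplete, all function names are hypothetical, and the guard assumes the command is later executed as an argument list without a shell (it does not detect `;` or `&&` chaining). Anything the guard does not recognize is rejected.

```python
import shlex

CAPACITY_RISK_PCT = 85   # from the guideline on node utilization
EOL_WARNING_DAYS = 90    # from the guideline on end-of-support

# Illustrative allow-list; extend as needed.
READ_ONLY_VERBS = {"get", "describe", "top", "logs", "version",
                   "cluster-info", "api-resources", "explain"}
# Flags that consume the following token as their value.
VALUE_FLAGS = {"-n", "--namespace", "--context", "--kubeconfig"}


def is_read_only(command: str) -> bool:
    """Return True only if the kubectl verb is on the read-only allow-list."""
    tokens = shlex.split(command)
    if not tokens or tokens[0] != "kubectl":
        return False
    remaining = iter(tokens[1:])
    for token in remaining:
        if token in VALUE_FLAGS:
            next(remaining, None)  # skip the flag's value
            continue
        if token.startswith("-"):
            continue
        return token in READ_ONLY_VERBS  # first non-flag token is the verb
    return False


def is_capacity_risk(utilization_pct: float) -> bool:
    """Flag utilization above the 85% guideline threshold."""
    return utilization_pct > CAPACITY_RISK_PCT


def needs_eol_warning(days_until_end_of_support: int) -> bool:
    """Flag clusters within 90 days of (or past) end-of-support."""
    return days_until_end_of_support <= EOL_WARNING_DAYS


def suggest_logs_command(pod: str, namespace: str) -> str:
    """Suggest the immediate action for a CrashLoopBackOff pod."""
    # --previous shows logs from the last terminated container instance.
    return f"kubectl logs {pod} -n {namespace} --previous"
```

Failing closed on unrecognized input is deliberate: a health check that wrongly refuses a safe command is less harmful than one that lets a mutating command through.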

Install via CLI

npx skills add https://github.com/cnoe-io/ai-platform-engineering --skill cluster-resource-health

Repository Details

Stars: 372
Forks: 65
Branch: main
Path: SKILL.md
