name: kubernetes-manifest-audit description: Audit Kubernetes manifests, Helm charts, and Kustomize overlays against CIS Kubernetes Benchmark and NSA/CISA hardening — pod security, resources, probes, RBAC, networking, secrets, availability. Static, live, apply, runtime modes. Use when this capability is needed. metadata: author: anthril
Kubernetes Manifest Audit
ultrathink
Output path directive (canonical — overrides in-body references). All file outputs from this skill MUST be written under
.anthril/audits/kubernetes-manifest-audit/. Runmkdir -p .anthril/audits/kubernetes-manifest-auditbefore the firstWritecall. Primary artefact:.anthril/audits/kubernetes-manifest-audit/<artefact>. Do NOT write to the project root or to bare filenames at cwd. Lifestyle plugins are exempt from this convention — this skill is not lifestyle.
When to use
Run this skill when the user mentions:
- Kubernetes audit, k8s security
- CIS Kubernetes Benchmark
- Helm chart review, Kustomize review
- Pod security standards
- NSA/CISA Kubernetes Hardening Guide
Covers nine categories: pod security (runAsNonRoot, readOnlyRootFilesystem, allowPrivilegeEscalation, dropped capabilities, no host namespaces), resource requests and limits, liveness/readiness/startup probes, image hygiene (digest pinning, pull policy, scoped imagePullSecrets), secrets and config (no plaintext Secrets in Git, external secret operators), networking (NetworkPolicies, Service types, Ingress TLS), RBAC (per-workload ServiceAccounts, no wildcard verbs), availability (PodDisruptionBudgets, replicas, topology spread, anti-affinity), and Helm hygiene (values.schema.json, sensible defaults).
Before You Start
- Determine operating mode.
--livereads from a real cluster viakubectl, runskube-benchandkube-hunterif installed.--applyproduces YAML patches orkubectl patchcommands (cluster changes require an explicit second confirmation).--runtimeruns a scoped chaos experiment against non-prod (chaos-mesh or a simple pod-kill) if configured. - Enumerate manifest groups. Run
scripts/list-manifests.sh. - Sub-agent budget. One agent per chart / Kustomize overlay / manifest directory. Warn above 10.
- Load
.k8s-ignorefor suppressions. - Production-name guard. In
--applyor--runtime, refuse targets whose namespace or context containsprod/productionwithout--i-really-mean-prod.
User Context
$ARGUMENTS
Manifest inventory: !bash "${CLAUDE_PLUGIN_ROOT}/skills/kubernetes-manifest-audit/scripts/list-manifests.sh"
Live-mode tools: !which kubectl 2>/dev/null || echo "kubectl:unavailable" · !which helm 2>/dev/null || echo "helm:unavailable" · !which kube-bench 2>/dev/null || echo "kube-bench:unavailable"
Audit Phases
Phase 1: Discovery & Mode Selection
- Parse inventory. Group manifests into audit units: one per Helm chart, one per Kustomize overlay, one per directory of raw manifests.
- In
--livemode, verify kubectl context is set and non-prod (or--i-really-mean-prodis present). - Confirm scope with the user; warn if >10 groups.
Phase 2: Per-Group Snapshot
For each group, extract every manifest's kind and relevant fields:
- Deployments, StatefulSets, DaemonSets, Jobs, CronJobs — spec.template.spec (containers, securityContext, resources, probes, volumes), replicas, strategy
- Services, Ingresses — type, ports, TLS
- ConfigMaps, Secrets — data keys (never values), sealing status
- RBAC — ServiceAccounts, Roles/ClusterRoles, Bindings
- NetworkPolicies — selectors, ingress/egress rules
- PDBs, HPAs — target workloads and thresholds
- Helm-specific:
Chart.yaml,values.yaml,values.schema.json,templates/
In --live mode, cross-reference with kubectl get output per namespace.
Phase 3: Parallel Sub-Agent Audit
Spawn one Agent(subagent_type=Explore) per group (single assistant message). Each walks categories A–I from reference.md §1.
- A. Pod security —
runAsNonRoot,readOnlyRootFilesystem,allowPrivilegeEscalation: false, dropped capabilities, nohostNetwork/hostPID/hostIPC, seccomp profile - B. Resources — every container has
requests+limitsfor cpu and memory; QoS tier appropriate - C. Probes — liveness, readiness, startup configured; thresholds sensible
- D. Image hygiene — digest-pinned,
imagePullPolicy: IfNotPresent(notAlwaysin prod),imagePullSecretsscoped - E. Secrets & config — no plaintext Secret YAML in Git (SealedSecrets / External Secrets / SOPS acceptable); ConfigMap not misused for secrets
- F. Networking — NetworkPolicy present for each workload namespace; Service type sensible; Ingress TLS
- G. RBAC — per-workload ServiceAccount; no wildcard
verbs: ["*"]orresources: ["*"] - H. Availability — PDB for critical workloads;
replicas > 1for prod; topology spread or anti-affinity; rolling update surge/unavailable bounds - I. Helm hygiene —
values.schema.json, templated fields have defaults, no hardcoded production values invalues.yaml
Sub-agents may read kubectl get <kind> -o yaml in --live mode but MUST NOT run kubectl apply, kubectl delete, kubectl patch, helm install, or helm upgrade.
Phase 4: Merge & Risk Register
Merge sub-agent output. Cross-reference with kube-bench output if available (attach matching CIS IDs to findings). Assign K8S-001… IDs.
Phase 5: Remediation Drafting
Emit commented YAML to k8s-suggested.yaml. Each block shows the target file:line, the evidence, and the fix.
For --live mode, alternatives as kubectl patch commands are included — but commented out, never executed.
Phase 6: Apply Mode (opt-in)
Interactive [a]pply / [s]kip / [A]ll / [q]uit loop. YAML file edits go through Edit. kubectl patch execution requires the literal word DESTROY confirmation and writes both the patch command and the prior state to apply-log.md.
Phase 7: Runtime Testing (opt-in)
When --runtime and a non-prod cluster is confirmed:
- Identify the target Deployment (user-selected; defaults to the most-replicated non-system one).
- Run a scoped chaos experiment: single pod deletion, confirm rolling recovery within its
progressDeadlineSeconds. - Alternative: run
kubectl drainon one node if--chaos-nodeflag is passed. - Record metrics from
kubectl top podspre/post if metrics-server is available. - Attach results to the report as "Runtime resilience test".
Phase 8: Reporting
Write kubernetes-manifest-audit.md + kubernetes-manifest-audit.json + k8s-suggested.yaml (+ cluster-state.json in --live mode and chaos-run.md in --runtime).
Scoring
Weights: A=20, B=15, C=10, D=10, E=15, F=10, G=10, H=5, I=5 (sum 100). See reference.md §3.
| Total | Verdict |
|---|---|
| 90+ | PASS |
| 70–89 | PASS WITH WARNINGS |
| 50–69 | CONDITIONAL |
| <50 | FAIL |
Important Principles
- Default security is insecure. A Deployment with no
securityContextruns as root with full capabilities. This is always at least HIGH. - No requests = best-effort QoS. The first pod to be evicted under memory pressure. Flag every container missing requests.
- Secrets in plaintext YAML belong outside Git. SealedSecrets / External Secrets Operator / SOPS / cluster-managed Secrets are all acceptable alternatives.
- Ingress without TLS is HIGH severity on prod. Often downgraded to MEDIUM on internal-only ingress, but still flagged.
replicas: 1in prod is MEDIUM-HIGH. A single pod is a single point of failure.- Helm's
values.yamlis often production values. Treat it as a manifest — it deploys real things. - Runtime chaos is non-prod only. Never run a chaos experiment against a cluster whose context/namespace contains
prod/productionwithout--i-really-mean-prod. - Australian English. DD/MM/YYYY. Markdown-first.
Edge Cases
- Pure Helm chart (no rendered manifests in Git). Run
helm templateto render, then audit the rendered output. - Operator-managed CRDs. Audit the CR spec; note that operator semantics may enforce additional rules outside this skill's view.
- GitOps repo (Argo CD / Flux). Audit source manifests; in
--livemode, note the sync state but do not edit. - Cluster-scoped resources (ClusterRoles, ClusterRoleBindings). Weight RBAC findings higher; cluster-wide scope amplifies blast radius.
- Mutating admission webhooks in cluster. Rendered manifests may differ from deployed. In
--livemode, cross-check. - DaemonSets often need host namespaces. CNI plugins, log shippers — flag but allow suppression.
- Jobs and CronJobs — probes don't apply; resource requests still do.
- NetworkPolicy absence — if the CNI doesn't enforce NetworkPolicy, skip F findings for that group (note in report).
Source: anthril/official-claude-plugins — distributed by TomeVault.