name: remediate-container-escape-k8s
description: >-
Contain a Kubernetes container-escape signal by planning, applying, or
re-verifying a namespace-scoped deny-all NetworkPolicy for the targeted pod
or workload selector. Consumes an OCSF 1.8 Detection Finding (class 2004)
emitted by detect-container-escape-k8s and resolves the live selector from
the Kubernetes API before emitting a native remediation plan or action
record. Every action is dry-run by default, deny-listed for protected
namespaces, gated behind an incident ID plus approver plus an explicit
cluster allow-list for --apply, and dual-audited (DynamoDB +
KMS-encrypted S3). The low-risk default remains
reversible quarantine; explicit destructive follow-ups are also supported
via --approve-pod-kill and --approve-node-drain, with the node-drain
path requiring a second approver. Use when the user mentions "quarantine a
suspicious Kubernetes pod," "contain container escape in Kubernetes,"
"apply deny-all NetworkPolicy after escape finding," "re-verify K8s
quarantine policy," "kill the compromised pod," "drain the affected node,"
or "collect K8s container-escape forensics."
purpose: Contain a Kubernetes container-escape signal by planning, applying, or re-verifying a namespace-scoped deny-all NetworkPolicy for the targeted pod or workload selector.
capability: write-cloud
persistence: cloud_state
telemetry: stderr_jsonl
privilege_escalation: read_write
license: Apache-2.0
approval_model: human_required
execution_modes: jit, ci, mcp, persistent
side_effects: writes-cloud, writes-storage, writes-audit
input_formats: ocsf, native
output_formats: native
concurrency_safety: operator_coordinated
network_egress: kubernetes.default.svc, s3.amazonaws.com, dynamodb.amazonaws.com
caller_roles: security_engineer, incident_responder, platform_engineer
approver_roles: security_lead, incident_commander, platform_owner
min_approvers: 1
compatibility: >-
Requires Python 3.11+, kubernetes, and boto3. Dry-run and re-verify still
require read access to pods/workloads and NetworkPolicies in the target
namespace so the selector can be resolved and checked. Apply requires create
or replace permission for networking.k8s.io/v1 NetworkPolicies plus audit
write access to DynamoDB, S3, and KMS.
metadata:
homepage: https://github.com/msaad00/cloud-ai-security-skills
source: https://github.com/msaad00/cloud-ai-security-skills/tree/main/skills/remediation/remediate-container-escape-k8s
version: 0.1.0
frameworks:
- MITRE ATT&CK v14
- NIST CSF 2.0
- SOC 2
cloud:
- kubernetes
remediate-container-escape-k8s
What this closes
Pair skill for detect-container-escape-k8s.
This is the first Kubernetes detect → act → audit → re-verify loop in the
repo. A container-escape finding flows in from stdin or a file; this skill
resolves the live pod or workload selector from the cluster; dry-run prints the
exact deny-all NetworkPolicy manifest that would be applied; --apply writes
the policy after an out-of-band approval gate; --reverify proves the
quarantine policy is still present and still shaped as expected.
Attack pattern it responds to
detect-container-escape-k8s emits findings for:
- risky spec patches that enable privileged, hostPID, hostNetwork, or high-risk Linux capabilities (T1611)
- risky hostPath injections to host-sensitive paths like /proc or /var/lib/containerd (T1611)
- ephemeral container creation through pods/ephemeralcontainers or kubectl debug (T1610)
The least-destructive first response is to isolate the affected workload from
the network while preserving the pod for human investigation. This skill does
that with a namespace-scoped deny-all NetworkPolicy matched to the target
pod's labels or the workload selector. Once quarantine lands, the same skill
can also build a deterministic forensic bundle from host-mounted /proc,
runtime logs, and optional CSI VolumeSnapshot references.
Inputs
Reads one or more OCSF 1.8 Detection Finding (class 2004) or repo-native
detection_finding records from stdin or a file argument. Only findings whose
metadata.product.feature.name is detect-container-escape-k8s are processed.
From each finding, the skill extracts:
- observables[name=namespace]
- observables[name=resource.type]
- observables[name=resource.name]
- observables[name=pod.name] when present
- target as a fallback encoded path like deployments/payments/api or pods/payments/api-7d9b/ephemeralcontainers
The skill then resolves the live selector from the cluster:
- pod finding → current pod labels
- workload finding → workload selector or pod-template labels
Dry-run therefore still performs read-only Kubernetes API calls.
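The extraction step above can be sketched as follows. This is a minimal illustration, not the handler's actual code: the function name extract_target is hypothetical, observables are assumed to carry name and value keys, and the fallback path is assumed to follow the <resource_type>/<namespace>/<name> layout implied by the examples.

```python
from typing import Any


def extract_target(finding: dict[str, Any]) -> dict[str, str]:
    """Pull namespace, resource type/name, and pod name from a finding (hypothetical helper)."""
    # Index observables by name; assumes each observable has "name" and "value" keys.
    obs = {
        o.get("name"): str(o.get("value", ""))
        for o in finding.get("observables", [])
        if isinstance(o, dict)
    }
    target = {
        "namespace": obs.get("namespace", ""),
        "resource_type": obs.get("resource.type", ""),
        "resource_name": obs.get("resource.name", ""),
        "pod_name": obs.get("pod.name", ""),
    }
    # Fallback: decode "<resource_type>/<namespace>/<name>[/subresource]" (assumed layout).
    parts = str(finding.get("target", "")).split("/")
    if len(parts) >= 3:
        target["resource_type"] = target["resource_type"] or parts[0]
        target["namespace"] = target["namespace"] or parts[1]
        target["resource_name"] = target["resource_name"] or parts[2]
    return target
```

Observables take precedence here, and the encoded path only fills fields that are still empty. The live selector lookup against the Kubernetes API would follow this step and is not shown.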
Guardrails (enforced in code, not just documented)
1. Source-skill lock
Only detect-container-escape-k8s findings are accepted. Any other producer is
skipped with a stderr warning. This blocks prompt-injection or operator error
from feeding an unrelated finding into a Kubernetes write path.
2. Protected namespace deny-list
The skill refuses to plan or apply quarantine for these protected namespaces:
- kube-system
- kube-public
- istio-system
- anything starting with linkerd or linkerd-
Dry-run returns would-violate-deny-list; apply returns
skipped_deny_list. The protected-namespace check fires before any policy
write or audit write.
3. Dry-run is the default
Without --apply, the skill emits a native remediation_plan record
containing the exact NetworkPolicy manifest, the resolved selector, and the
mutating Kubernetes endpoint it WOULD call. Zero cluster writes occur in this
mode.
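For orientation, a deny-all manifest of the shape described in this document can be built as plain data. This is a sketch under assumptions: build_quarantine_policy is a hypothetical name, and the deterministic policy name is passed in because its derivation is not specified here.

```python
from typing import Any


def build_quarantine_policy(
    namespace: str, policy_name: str, selector: dict[str, str]
) -> dict[str, Any]:
    """Build a namespace-scoped deny-all NetworkPolicy for the given selector."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": policy_name, "namespace": namespace},
        "spec": {
            # Match only the resolved pod or workload labels.
            "podSelector": {"matchLabels": dict(selector)},
            # Declaring both types with no rules denies all ingress and egress.
            "policyTypes": ["Ingress", "Egress"],
            "ingress": [],
            "egress": [],
        },
    }
```

In dry-run mode a dict like this is only printed inside the remediation_plan record; nothing is sent to the cluster.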
4. --apply requires an incident gate and explicit cluster boundary
--apply is refused unless all four of these env vars are set before any write:
- K8S_CONTAINER_ESCAPE_INCIDENT_ID
- K8S_CONTAINER_ESCAPE_APPROVER
- K8S_CLUSTER_NAME
- K8S_CONTAINER_ESCAPE_ALLOWED_CLUSTERS
The gate sits outside the agent loop. An alert or agent suggestion alone is not sufficient to mutate cluster state.
The active cluster name must be listed explicitly in
K8S_CONTAINER_ESCAPE_ALLOWED_CLUSTERS before the skill will quarantine,
delete a pod, or drain a node. This keeps the handler from acting against
whichever kube context ambient credentials happen to resolve to.
Destructive follow-up paths tighten that bar further:
- --approve-pod-kill still requires the incident + approver pair
- --approve-node-drain requires the same pair plus K8S_CONTAINER_ESCAPE_SECOND_APPROVER, and the second approver must differ from the primary approver
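The gate logic can be sketched as a pure function over an environment mapping. The function name check_apply_gate and the error strings are hypothetical, and the allow-list is assumed to be comma-separated, which this document does not confirm.

```python
from collections.abc import Mapping

REQUIRED_APPLY_VARS = (
    "K8S_CONTAINER_ESCAPE_INCIDENT_ID",
    "K8S_CONTAINER_ESCAPE_APPROVER",
    "K8S_CLUSTER_NAME",
    "K8S_CONTAINER_ESCAPE_ALLOWED_CLUSTERS",
)


def check_apply_gate(env: Mapping[str, str], node_drain: bool = False) -> list[str]:
    """Return gate failures; an empty list means --apply may proceed."""
    errors = [f"missing {name}" for name in REQUIRED_APPLY_VARS if not env.get(name, "").strip()]
    cluster = env.get("K8S_CLUSTER_NAME", "").strip()
    # Assumption: the allow-list is a comma-separated string.
    raw = env.get("K8S_CONTAINER_ESCAPE_ALLOWED_CLUSTERS", "")
    allowed = {c.strip() for c in raw.split(",") if c.strip()}
    if cluster and allowed and cluster not in allowed:
        errors.append(f"cluster {cluster} is not in the allow-list")
    if node_drain:
        primary = env.get("K8S_CONTAINER_ESCAPE_APPROVER", "").strip()
        second = env.get("K8S_CONTAINER_ESCAPE_SECOND_APPROVER", "").strip()
        if not second:
            errors.append("missing K8S_CONTAINER_ESCAPE_SECOND_APPROVER")
        elif second == primary:
            errors.append("second approver must differ from the primary approver")
    return errors
```

Passing os.environ as env would mirror the usage shown later in this document; taking a mapping instead of reading the process environment directly keeps the check easy to test.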
5. Dual audit before and after the write
For the quarantine step the skill writes:
- a KMS-encrypted S3 evidence object under s3://<bucket>/container-escape/audit/<yyyy>/<mm>/<dd>/<namespace>/<resource>/<timestamp>-apply_quarantine_network_policy.json
- a DynamoDB audit row carrying the target, incident, approver, status, and evidence URI
The first audit write lands with status: in_progress BEFORE the Kubernetes
write. A second audit row lands with status: success or status: failure
after the API call returns.
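The ordering guarantee (audit first, then the cluster write, then the outcome) can be illustrated with injected callables. Both function names are hypothetical, and the compact UTC timestamp format in the key is an assumption; only the key layout and the three status values come from the text above.

```python
from datetime import datetime, timezone
from typing import Callable


def audit_key(namespace: str, resource: str, now: datetime) -> str:
    """Build the S3 evidence key for the quarantine step."""
    # Assumption: a compact UTC timestamp; the real format is not specified here.
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    return (
        f"container-escape/audit/{now:%Y/%m/%d}/{namespace}/{resource}/"
        f"{stamp}-apply_quarantine_network_policy.json"
    )


def apply_with_audit(
    write_audit: Callable[[str], None], apply_policy: Callable[[], None]
) -> str:
    """Record in_progress before the write, then success or failure after it."""
    write_audit("in_progress")  # must land before any cluster mutation
    try:
        apply_policy()
    except Exception:
        write_audit("failure")
        raise
    write_audit("success")
    return "success"
```

In a real handler write_audit would wrap the S3 and DynamoDB writes and apply_policy would wrap the Kubernetes call. If the first audit write raises, apply_policy is never reached.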
6. --reverify proves the expected post-response state still holds
--reverify is read-only and follows the same action mode you selected:
- default quarantine path: fetches the expected NetworkPolicy by the deterministic policy name and checks that:
  - the policy still exists
  - podSelector.matchLabels still matches the resolved selector
  - policyTypes still contains both Ingress and Egress
  - both ingress and egress are empty arrays
- --approve-pod-kill --reverify: proves the target pod is still absent
- --approve-node-drain --reverify: proves the node remains cordoned and the target pod is still absent
The emitted record is remediation_verification with status: verified or
status: drift.
Output contract
Dry-run emits a native remediation_plan; apply emits
remediation_action; re-verify emits remediation_verification.
{
  "schema_mode": "native",
  "canonical_schema_version": "2026-04",
  "record_type": "remediation_plan",
  "source_skill": "remediate-container-escape-k8s",
  "target": {
    "provider": "Kubernetes",
    "namespace": "payments",
    "resource_type": "deployments",
    "resource_name": "api",
    "pod_name": ""
  },
  "policy_name": "ce-quarantine-api-7bacbd6e11",
  "selector": {
    "app": "api"
  },
  "manifest": {
    "apiVersion": "networking.k8s.io/v1",
    "kind": "NetworkPolicy"
  },
  "actions": [
    {
      "step": "apply_quarantine_network_policy",
      "endpoint": "UPSERT /apis/networking.k8s.io/v1/namespaces/payments/networkpolicies/ce-quarantine-api-7bacbd6e11",
      "status": "planned",
      "detail": "dry-run: would apply quarantine NetworkPolicy"
    }
  ],
  "status": "planned",
  "dry_run": true,
  "finding_uid": "det-k8s-r2-hostpath-injection-9c5a6fa5-7bacbd6e"
}
Apply records add incident_id, approver, and audit metadata. Re-verify
records replace actions with a single endpoint field and a
status: verified or status: drift.
Usage
# Dry-run (default) — resolves selector and prints the exact quarantine manifest
cat finding.ocsf.jsonl | python src/handler.py
# Apply quarantine — requires incident gate and audit destinations
export K8S_CONTAINER_ESCAPE_INCIDENT_ID=inc-2026-04-19-001
export K8S_CONTAINER_ESCAPE_APPROVER=alice@example.com
export K8S_CLUSTER_NAME=prod-eks-us-east-1
export K8S_CONTAINER_ESCAPE_ALLOWED_CLUSTERS=prod-eks-us-east-1
export K8S_REMEDIATION_AUDIT_DYNAMODB_TABLE=k8s-remediation-audit
export K8S_REMEDIATION_AUDIT_BUCKET=sec-k8s-remediation
export KMS_KEY_ARN=arn:aws:kms:us-east-1:123456789012:key/...
cat finding.ocsf.jsonl | python src/handler.py --apply
# Re-verify — read-only check that the quarantine policy still exists
cat finding.ocsf.jsonl | python src/handler.py --reverify
# Explicit destructive pod delete
cat finding.ocsf.jsonl | python src/handler.py --approve-pod-kill
cat finding.ocsf.jsonl | python src/handler.py --apply --approve-pod-kill
# Explicit destructive node drain — requires a second approver
export K8S_CONTAINER_ESCAPE_SECOND_APPROVER=bob@example.com
cat finding.ocsf.jsonl | python src/handler.py --approve-node-drain
cat finding.ocsf.jsonl | python src/handler.py --apply --approve-node-drain
Forensic evidence mode
src/forensic_collector.py is the post-quarantine evidence path. It runs in a
controlled follow-up worker or sidecar with read-only host mounts:
- /host/proc for PID discovery and /proc/<pid> capture
- /host/var/log for container runtime logs
- optional CSI VolumeSnapshot creation for PVC-backed pod volumes
Dry-run is still the default. Without --upload, the collector emits a native
remediation_plan record describing the exact bundle contents it WOULD write.
With --upload, it writes a deterministic tar.gz bundle to the same
KMS-encrypted audit bucket under
container-escape/audit/<incident-id>/<timestamp>-<namespace>-<target>-forensics.tar.gz.
# Dry-run forensic plan
cat finding.ocsf.jsonl | python src/forensic_collector.py \
  --proc-root /host/proc \
  --log-root /host/var/log
# Upload bundle + create VolumeSnapshot refs
export K8S_CONTAINER_ESCAPE_INCIDENT_ID=inc-2026-04-19-001
export K8S_CONTAINER_ESCAPE_APPROVER=alice@example.com
export K8S_REMEDIATION_AUDIT_BUCKET=sec-k8s-remediation
export KMS_KEY_ARN=arn:aws:kms:us-east-1:123456789012:key/...
cat finding.ocsf.jsonl | python src/forensic_collector.py \
  --upload \
  --snapshot-volumes \
  --snapshot-class csi-snapshots
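One way to make a tar.gz bundle deterministic, as the collector description requires, is to sort the members and zero every timestamp and ownership field. The sketch below uses only the standard library. The function name build_bundle and the in-memory files mapping are hypothetical, not the collector's real interface.

```python
import gzip
import io
import tarfile


def build_bundle(files: dict[str, bytes]) -> bytes:
    """Build a byte-for-byte reproducible tar.gz from a name -> content mapping."""
    buf = io.BytesIO()
    # mtime=0 and an empty filename keep the gzip header stable across runs.
    with gzip.GzipFile(filename="", fileobj=buf, mode="wb", mtime=0) as gz:
        with tarfile.open(fileobj=gz, mode="w") as tar:
            for name in sorted(files):  # fixed member order
                data = files[name]
                info = tarfile.TarInfo(name=name)
                info.size = len(data)
                info.mtime = 0  # no wall-clock timestamps
                info.mode = 0o644
                info.uid = info.gid = 0
                info.uname = info.gname = ""
                tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```

Two runs over the same captured content then produce identical bytes, so a bundle hash recorded in the audit trail can be re-checked later.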
Use when
- you need a reversible first-response containment for a suspicious Kubernetes workload after detect-container-escape-k8s
- you want a deny-all NetworkPolicy matched to the live selector, not a raw pod name
- you need an auditable quarantine step that can later be re-verified for drift
- you need a reproducible forensic bundle after quarantine, without killing the target pod first
Do NOT use
- against protected namespaces like kube-system or istio-system
- as a generic "pause traffic" control for planned maintenance
- without setting K8S_CONTAINER_ESCAPE_INCIDENT_ID and K8S_CONTAINER_ESCAPE_APPROVER under --apply
- without setting K8S_CLUSTER_NAME and K8S_CONTAINER_ESCAPE_ALLOWED_CLUSTERS under --apply
- without setting K8S_CONTAINER_ESCAPE_SECOND_APPROVER for --approve-node-drain --apply
Closed-loop verification
The remediation loop is:
1. ingest-k8s-audit-ocsf
2. detect-container-escape-k8s
3. remediate-container-escape-k8s --apply
4. remediate-container-escape-k8s --reverify
If the next re-verify run shows the expected quarantine, pod-delete, or
node-drain state missing or drifted, the skill emits status: drift and the
shared verifier contract produces the paired OCSF drift finding.
Tests
- accepted-producer enforcement
- protected-namespace deny-list in dry-run and apply modes
- dry-run emits a plan with the resolved selector and deny-all manifest
- --apply gate requires incident ID and approver
- --approve-node-drain requires a distinct second approver
- audit write lands before the Kubernetes mutating call
- --reverify distinguishes verified from drifted policy state
- destructive pod-kill and node-drain modes re-verify their own post-action state
- end-to-end dry-run from the frozen container-escape findings golden
- forensic collector builds deterministic bundles from /proc + runtime logs
- forensic collector can plan or create VolumeSnapshot refs and upload a KMS-encrypted bundle to S3