description: Author, inspect, troubleshoot, and review infrastructure across IaC, Kubernetes, cloud resources, containers, CI/CD, and Linux hosts. Use when changing Terraform/OpenTofu, Kubernetes, Helm, Kustomize, Dockerfiles, GitHub Actions, AWS, GCP, Cloud Run, BigQuery, IAM, logs, instances, or service health. NOT for deploy/apply/rollback workflows (see deploying-infra). NOT for shell scripts or generic command pipelines (see writing-shell).
name: operating-infra
Operate Infrastructure
Boundary
- Work from files, plans, logs, and read-only commands before changing anything.
- Do not run apply, delete, destroy, or rollback until identity, exact resources, blast radius, and plan/diff/inventory are shown and the user confirms.
- If the task is deployment, rollout, rollback, or production apply, use deploying-infra.
- If the task is only shell scripts or generic command pipelines, use writing-shell.
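The precondition rule for live mutation can be pictured as a simple gate. This is a minimal sketch only: `MutationRequest`, `missing_preconditions`, and `may_mutate` are hypothetical names introduced for illustration, not part of this skill or any tool.

```python
from dataclasses import dataclass, field


@dataclass
class MutationRequest:
    """Hypothetical record of what must be shown before a live mutation."""
    identity: str = ""             # account/project/profile in use
    resources: list[str] = field(default_factory=list)  # exact resources touched
    blast_radius: str = ""         # what else is affected if this goes wrong
    evidence: str = ""             # plan, diff, or inventory summary
    user_confirmed: bool = False   # explicit confirmation from the user


def missing_preconditions(req: MutationRequest) -> list[str]:
    """Return the names of preconditions that are still unmet."""
    missing = []
    if not req.identity:
        missing.append("identity")
    if not req.resources:
        missing.append("exact resources")
    if not req.blast_radius:
        missing.append("blast radius")
    if not req.evidence:
        missing.append("plan/diff/inventory")
    if not req.user_confirmed:
        missing.append("user confirmation")
    return missing


def may_mutate(req: MutationRequest) -> bool:
    """Allow apply/delete/destroy/rollback only when nothing is missing."""
    return not missing_preconditions(req)
```

Anything returned by `missing_preconditions` is what still has to be shown or asked for before proceeding.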
Role behavior
- Write-capable: make minimal file changes and run safe validation. Stop before live mutation unless the user confirmed exact resources.
- Read-only: apply nothing; return proposed file changes, evidence, and validation commands.
Load references
Load every matching reference:
- Terraform/OpenTofu files, modules, state, or plans → terraform.md
- Kubernetes manifests or kustomization.yaml → kubernetes.md
- Chart.yaml, Helm values, or chart templates → helm.md
- GitHub workflow YAML → github-actions.md
- Dockerfile or container image build/release concerns → dockerfile.md
- AWS CLI, EC2, ECS, Lambda, S3, RDS, IAM, or CloudWatch → aws.md
- GCP CLI, GCS, Compute Engine, IAM, quotas, or Cloud Logging → gcp.md
- Cloud Run services, revisions, traffic, or logs → cloud-run.md
- BigQuery queries, tables, datasets, or cost checks → bigquery.md
- Linux services, hosts, processes, disks, or networks → linux.md
Mixed stacks: load all matching references. Unknown stack: use the workflow below only.
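For the file-based entries in the list above, reference selection could be sketched roughly as below. The function names are hypothetical, and the filename rules (`.tf`/`.tfvars` suffixes, the `.github/workflows/` directory, `Dockerfile.*` variants) are assumptions about typical layouts. Entries that depend on the task rather than a filename (cloud CLIs, Cloud Run, BigQuery, Linux hosts, plain Kubernetes manifests) are not covered.

```python
from pathlib import PurePosixPath


def references_for(path: str) -> list[str]:
    """Map one repo-relative path to the reference files it matches."""
    p = PurePosixPath(path)
    refs = []
    if p.suffix in {".tf", ".tfvars"}:  # assumed Terraform/OpenTofu suffixes
        refs.append("terraform.md")
    if p.name == "kustomization.yaml":
        refs.append("kubernetes.md")
    if p.name == "Chart.yaml":
        refs.append("helm.md")
    # Workflow YAML directly under .github/workflows/
    if p.suffix in {".yml", ".yaml"} and p.parts[-3:-1] == (".github", "workflows"):
        refs.append("github-actions.md")
    if p.name == "Dockerfile" or p.name.startswith("Dockerfile."):
        refs.append("dockerfile.md")
    return refs


def references_for_all(paths: list[str]) -> list[str]:
    """Mixed stacks: collect every matching reference once, in first-seen order."""
    seen: list[str] = []
    for path in paths:
        for ref in references_for(path):
            if ref not in seen:
                seen.append(ref)
    return seen
```

An empty result corresponds to the unknown-stack case, where only the general workflow applies.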
Workflow
- Identify scope: files, resources, environment, account/project, region/zone, and owner.
- Verify cloud identity before cloud work; prefer explicit profile/project/region over defaults.
- Inspect current state with read-only evidence: files, plan/diff, list/describe/status, logs, metrics, and recent events.
- For authoring/design: choose the smallest pattern that preserves ownership, state boundaries, and least privilege.
- For troubleshooting: rank likely causes, gather one safe signal, then propose the next step.
- For validation: run relevant gates when tools exist; state skipped gates and why.
- For destructive, costly, or externally visible work: show exact resources and blast radius, then stop for confirmation or hand off to deploying-infra.
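The validation step in the workflow above (run gates when tools exist, state skipped gates and why) might look like the following sketch. `run_gate` and its `which` parameter are hypothetical names; the parameter exists only so tool lookup can be swapped out.

```python
import shutil
import subprocess
from typing import Callable, Optional


def run_gate(
    name: str,
    argv: list[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> tuple[str, str]:
    """Run one validation gate and report pass, fail, or skipped with a reason."""
    if which(argv[0]) is None:
        # The tool is missing: record the skip and why, rather than failing silently.
        return name, f"skipped: {argv[0]} not installed"
    proc = subprocess.run(argv, capture_output=True, text=True)
    return name, "pass" if proc.returncode == 0 else "fail"


if __name__ == "__main__":
    # Example read-only gates; each is skipped if its tool is not on PATH.
    gates = [
        ("terraform fmt", ["terraform", "fmt", "-check", "-recursive"]),
        ("terraform validate", ["terraform", "validate"]),
        ("actionlint", ["actionlint"]),
    ]
    for gate_name, argv in gates:
        print(run_gate(gate_name, argv))
```

The example commands only read or check files; nothing here applies or mutates resources.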
Validation gates
- Terraform/OpenTofu: format, init without backend when possible, validate, plan, tflint, checkov or trivy config; use plan JSON for policy checks when needed.
- Kubernetes/Kustomize: render first, schema-check with kubeconform, then policy/security-check with kube-linter, kubescape, conftest, or kyverno.
- Helm: lint chart, render templates, use helm diff before upgrade planning, validate rendered YAML.
- Docker/images: lint Dockerfile with hadolint; scan images/config with trivy; use syft, grype, and cosign where SBOM, vulnerability, or signature proof matters.
- GitHub Actions: run actionlint and zizmor; require SHA-pinned actions and least-permission jobs.
- Cloud CLI: verify identity, inventory resources, estimate cost or dry-run when available, and check IAM/quota before mutation.
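To illustrate what "SHA-pinned actions" means in the GitHub Actions gate, here is a rough line-based check. It is a sketch under assumptions, not a replacement for actionlint or zizmor: `unpinned_actions` is a hypothetical helper, it does not parse YAML, and it deliberately ignores local actions and `docker://` references.

```python
import re

# Captures the target of a `uses:` key, with or without a leading list dash.
USES_RE = re.compile(r"^\s*(?:-\s*)?uses:\s*['\"]?([^\s'\"#]+)")
# A full commit SHA is 40 lowercase hex characters.
FULL_SHA_RE = re.compile(r"^[0-9a-f]{40}$")


def unpinned_actions(workflow_text: str) -> list[str]:
    """Return `uses:` targets that are not pinned to a full commit SHA."""
    found = []
    for line in workflow_text.splitlines():
        m = USES_RE.match(line)
        if not m:
            continue
        target = m.group(1)
        # Local actions and docker:// images are out of scope for this check.
        if target.startswith("./") or target.startswith("docker://"):
            continue
        _, _, ref = target.partition("@")
        if not FULL_SHA_RE.match(ref):
            found.append(target)
    return found
```

A tag such as `@v4` is reported because tags can be moved, while a 40-character commit SHA is accepted.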
Output
INFRA RESULT
============
Scope: <files/resources/environment>
Identity: <account/project/profile/region or not applicable>
Status: DONE | NEEDS CONFIRMATION | BLOCKED
Evidence:
- <file:line, plan/log/status summary, or command result>
Changes or proposal:
- <minimal change or proposed next step>
Validation:
- <gate> — pass/fail/skipped
Next:
- <safe next action, confirmation request, or none>
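One way to produce the report layout above consistently is to render it from a small record. This is a sketch only: `InfraResult` and `render` are hypothetical names, and the field names are assumptions that mirror the template's labels.

```python
from dataclasses import dataclass, field

ALLOWED_STATUS = {"DONE", "NEEDS CONFIRMATION", "BLOCKED"}


@dataclass
class InfraResult:
    scope: str
    identity: str  # or "not applicable"
    status: str    # one of ALLOWED_STATUS
    evidence: list[str] = field(default_factory=list)
    changes: list[str] = field(default_factory=list)
    validation: list[tuple[str, str]] = field(default_factory=list)  # (gate, result)
    next_steps: list[str] = field(default_factory=list)


def render(r: InfraResult) -> str:
    """Render the result in the INFRA RESULT layout."""
    if r.status not in ALLOWED_STATUS:
        raise ValueError(f"unknown status: {r.status}")
    lines = [
        "INFRA RESULT",
        "============",
        f"Scope: {r.scope}",
        f"Identity: {r.identity}",
        f"Status: {r.status}",
        "Evidence:",
    ]
    lines += [f"- {e}" for e in r.evidence]
    lines.append("Changes or proposal:")
    lines += [f"- {c}" for c in r.changes]
    lines.append("Validation:")
    # \u2014 is the dash the template uses between gate and result.
    lines += [f"- {gate} \u2014 {result}" for gate, result in r.validation]
    lines.append("Next:")
    lines += [f"- {n}" for n in (r.next_steps or ["none"])]
    return "\n".join(lines)
```

Rejecting unknown statuses keeps the report limited to the three states the template allows.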