name: devops-pulumi
description: >
  Pulumi Infrastructure as Code using real programming languages (TypeScript, Python, Go, C#).
  Covers: Pulumi CLI, stack management, state backends, AWS/Azure/GCP providers, Kubernetes provider, component resources, Automation API, secrets, policy as code, migration from Terraform.
  Do NOT use for: Terraform, CloudFormation, or other IaC tools not using Pulumi.
version: "1.0.0"
author: "j4flmao"
license: "MIT"
compatibility:
  claude-code: true
  cursor: true
  codex: true
  windsurf: true
tags: [devops, pulumi, iac, infrastructure, phase-5]
Pulumi IaC
Purpose
Provision, manage, and version cloud infrastructure using Pulumi's programming language approach, enabling real code constructs (loops, conditionals, functions, classes) for infrastructure definition.
Agent Protocol
Trigger
Exact user phrases: "Pulumi", "IaC", "infrastructure as code", "Pulumi stack", "Pulumi AWS", "Pulumi Kubernetes", "Automation API", "component resource", "Pulumi state", "Pulumi migrate".
Input Context
Before activating, verify:
- Target cloud provider (AWS, Azure, GCP, Kubernetes) and region.
- Programming language preference (TypeScript, Python, Go, C#).
- Current state backend (Pulumi Cloud, S3, Azure Blob, GCS).
- Whether migrating from Terraform or creating new infrastructure.
Output Artifact
Writes to index.ts, __main__.py, main.go, Pulumi.yaml, Pulumi.{stack}.yaml, and component resource classes.
Response Format
Code files with Pulumi SDK imports, resource definitions, and stack configurations.
Completion Criteria
This skill is complete when:
- Pulumi project initialized with `pulumi new`.
- Core infrastructure resources defined with proper typing.
- Stack configurations for all environments (dev, staging, prod).
- State backend configured and validated.
- `pulumi up` previews without errors.
Max Response Length
Direct file write. No response text.
Quick Start
pulumi new aws-typescript → Define VPC, subnets, security groups as classes → Configure dev/prod stacks via Pulumi.{stack}.yaml → pulumi up → Add component resources for reuse → Wire Automation API for self-service.
Decision Tree: Pulumi vs Terraform
- Starting fresh with IaC → Pulumi (if team knows TypeScript/Python/Go) or Terraform (if team knows HCL)
- Need loops, conditionals, functions → Pulumi (real programming language)
- Existing Terraform codebase → Terraform (migration is costly but possible with `pulumi convert --from terraform`, which supersedes the older tf2pulumi tool)
- Self-service platform for dev teams → Pulumi Automation API
- Multi-cloud with same abstractions → Pulumi (component resources abstract cloud specifics)
- Policy enforcement on IaC → Both: Pulumi CrossGuard or Terraform Sentinel/OPA
- Kubernetes-native IaC → Pulumi (Kubernetes provider with CRDs, Helm charts native)
Core Workflow
Step 1: Initialize Project
pulumi new aws-typescript --name my-infra --stack dev
Step 2: Define Infrastructure with TypeScript
import * as aws from "@pulumi/aws";
import * as pulumi from "@pulumi/pulumi";
const config = new pulumi.Config();
const vpcCidr = config.get("vpcCidr") || "10.0.0.0/16";
const environment = config.require("environment");
const vpc = new aws.ec2.Vpc("main", {
cidrBlock: vpcCidr,
enableDnsHostnames: true,
enableDnsSupport: true,
tags: { Name: "main", Environment: environment },
});
const subnets = config.requireObject<string[]>("availabilityZones").map((az, i) =>
new aws.ec2.Subnet(`public-${i}`, {
vpcId: vpc.id,
cidrBlock: `10.0.${i}.0/24`,
availabilityZone: az,
mapPublicIpOnLaunch: true,
tags: { Name: `public-${i}`, Environment: environment },
})
);
Step 3: Stack Configuration
# Pulumi.prod.yaml
config:
  aws:region: us-east-1
  my-infra:vpcCidr: 10.0.0.0/16
  my-infra:instanceType: t3.large
  my-infra:environment: production
  my-infra:availabilityZones:
    - us-east-1a
    - us-east-1b
    - us-east-1c
  my-infra:dbPassword:
    secure: AAABAA...encrypted...
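Values from a stack file like this reach the program loosely typed, so it can help to validate them once before any resource is created. The sketch below assumes the raw values have already been read (for example with `config.get` and `config.requireObject`) and are passed in as a plain object. `InfraSettings`, `CIDR_PATTERN`, and `parseInfraSettings` are hypothetical names, not part of the Pulumi SDK.

```typescript
// Hypothetical shape for the settings used in this guide; not a Pulumi type.
interface InfraSettings {
  environment: string;
  vpcCidr: string;
  availabilityZones: string[];
}

// Format check only: dotted-quad IPv4 CIDR notation such as 10.0.0.0/16.
const CIDR_PATTERN = /^(\d{1,3}\.){3}\d{1,3}\/\d{1,2}$/;

// Validates raw config values and applies the same default CIDR as Step 2.
function parseInfraSettings(raw: Record<string, unknown>): InfraSettings {
  const environment = raw["environment"];
  if (typeof environment !== "string" || environment.length === 0) {
    throw new Error("Missing required config value: environment");
  }

  const vpcCidr = raw["vpcCidr"] ?? "10.0.0.0/16";
  if (typeof vpcCidr !== "string" || !CIDR_PATTERN.test(vpcCidr)) {
    throw new Error(`vpcCidr is not valid CIDR notation: ${String(vpcCidr)}`);
  }

  const zones = raw["availabilityZones"];
  if (!Array.isArray(zones) || zones.length === 0 || !zones.every((z) => typeof z === "string")) {
    throw new Error("availabilityZones must be a non-empty list of strings");
  }

  return { environment, vpcCidr, availabilityZones: zones as string[] };
}
```

Failing early this way keeps a typo in `Pulumi.{stack}.yaml` from surfacing as a provider error halfway through an update.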
Step 4: Component Resources
export interface VpcStackArgs {
cidrBlock: string;
azs: string[];
environment: string;
tags?: Record<string, string>;
}
export class VpcStack extends pulumi.ComponentResource {
public readonly vpc: aws.ec2.Vpc;
public readonly publicSubnets: aws.ec2.Subnet[];
public readonly privateSubnets: aws.ec2.Subnet[];
constructor(name: string, args: VpcStackArgs, opts?: pulumi.ComponentResourceOptions) {
super("my:infra:VpcStack", name, args, opts);
const tags = { ...args.tags, Environment: args.environment };
this.vpc = new aws.ec2.Vpc(`${name}-vpc`, {
cidrBlock: args.cidrBlock,
enableDnsHostnames: true,
enableDnsSupport: true,
tags: { ...tags, Name: `${name}-vpc` },
}, { parent: this });
this.publicSubnets = args.azs.map((az, i) =>
new aws.ec2.Subnet(`${name}-public-${i}`, {
vpcId: this.vpc.id,
cidrBlock: incrementCidr(args.cidrBlock, i),
availabilityZone: az,
mapPublicIpOnLaunch: true,
tags: { ...tags, Name: `${name}-public-${i}`, Type: "public" },
}, { parent: this })
);
this.privateSubnets = args.azs.map((az, i) =>
new aws.ec2.Subnet(`${name}-private-${i}`, {
vpcId: this.vpc.id,
cidrBlock: incrementCidr(args.cidrBlock, i + 100),
availabilityZone: az,
tags: { ...tags, Name: `${name}-private-${i}`, Type: "private" },
}, { parent: this })
);
this.registerOutputs({
vpcId: this.vpc.id,
publicSubnetIds: this.publicSubnets.map(s => s.id),
privateSubnetIds: this.privateSubnets.map(s => s.id),
});
}
}
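The component above calls `incrementCidr`, which is not defined in this guide and is not a Pulumi SDK function. A minimal sketch of one possible implementation follows, assuming the VPC block is a /16 and each subnet is a /24 selected by the third octet; other layouts would need real CIDR arithmetic.

```typescript
// Hypothetical helper (not part of the Pulumi SDK).
// Returns the index-th /24 subnet inside a /16 VPC CIDR block.
function incrementCidr(vpcCidr: string, index: number): string {
  const [address, prefix] = vpcCidr.split("/");
  const octets = (address ?? "").split(".").map(Number);

  if (octets.length !== 4 || octets.some((o) => !Number.isInteger(o) || o < 0 || o > 255)) {
    throw new Error(`Invalid IPv4 CIDR block: ${vpcCidr}`);
  }
  // Assumption of this sketch: only /16 blocks are supported.
  if (prefix !== "16") {
    throw new Error(`This sketch only handles /16 blocks, got /${prefix}`);
  }
  if (!Number.isInteger(index) || index < 0 || index > 255) {
    throw new Error(`Subnet index must be between 0 and 255, got ${index}`);
  }

  // Keep the first two octets of the VPC block and use the index as the third.
  return `${octets[0]}.${octets[1]}.${index}.0/24`;
}
```

With this layout the public subnets occupy third octets 0, 1, 2 and the private subnets start at 100, so the two ranges do not overlap for up to 100 availability zones.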
Step 5: Cross-Cloud Example (Python)
import pulumi
import pulumi_aws as aws
import pulumi_gcp as gcp
import pulumi_azure as azure

config = pulumi.Config()
cloud = config.require("cloud")  # expected values: "aws", "gcp", or "azure"

if cloud == "aws":
    bucket = aws.s3.Bucket("data-lake",
        acl="private",
        versioning=aws.s3.BucketVersioningArgs(enabled=True),
        server_side_encryption_configuration=aws.s3.BucketServerSideEncryptionConfigurationArgs(
            rule=aws.s3.BucketServerSideEncryptionConfigurationRuleArgs(
                apply_server_side_encryption_by_default=aws.s3.BucketServerSideEncryptionConfigurationRuleApplyServerSideEncryptionByDefaultArgs(
                    sse_algorithm="AES256"
                )
            )
        ))
elif cloud == "gcp":
    bucket = gcp.storage.Bucket("data-lake",
        location="US",
        uniform_bucket_level_access=True,
        versioning=gcp.storage.BucketVersioningArgs(enabled=True))
elif cloud == "azure":
    # The classic pulumi_azure provider names this resource storage.Account
    bucket = azure.storage.Account("datalake",
        resource_group_name="rg-data",
        account_tier="Standard",
        account_replication_type="GRS")
else:
    raise ValueError(f"Unsupported cloud: {cloud}")
Step 6: Kubernetes Provider with Helm and CRDs
import * as k8s from "@pulumi/kubernetes";
import * as helm from "@pulumi/kubernetes/helm";
import * as pulumi from "@pulumi/pulumi";
// Stack configuration, used below to read the kubeconfig
const config = new pulumi.Config();
const k8sProvider = new k8s.Provider("k8s", {
kubeconfig: config.require("kubeconfig"),
enableServerSideApply: true,
});
const namespace = new k8s.core.v1.Namespace("app", {}, { provider: k8sProvider });
// Helm chart
const nginx = new helm.v3.Chart("nginx-ingress", {
chart: "ingress-nginx",
version: "4.10.0",
fetchOpts: { repo: "https://kubernetes.github.io/ingress-nginx" },
namespace: namespace.metadata.name,
values: {
controller: {
service: { type: "LoadBalancer" },
resources: {
requests: { cpu: "100m", memory: "256Mi" },
limits: { cpu: "500m", memory: "512Mi" },
},
},
},
}, { provider: k8sProvider });
// CRD custom resource
const certManagerNamespace = new k8s.core.v1.Namespace("cert-manager", {}, { provider: k8sProvider });
const certManager = new k8s.apiextensions.CustomResource("cluster-issuer", {
apiVersion: "cert-manager.io/v1",
kind: "ClusterIssuer",
metadata: { name: "letsencrypt-prod" },
spec: {
acme: {
server: "https://acme-v02.api.letsencrypt.org/directory",
email: "ops@example.com",
privateKeySecretRef: { name: "letsencrypt-prod-key" },
solvers: [{ http01: { ingress: { class: "nginx" } } }],
},
},
}, { provider: k8sProvider });
Step 7: State Backend Comparison
| Backend | Pros | Cons | Best For |
|---|---|---|---|
| Pulumi Cloud | Managed, web UI, RBAC, audit, deployments | Vendor lock-in, cost | Teams, enterprise |
| AWS S3 | Cheap, well-known | No web UI, RBAC, or audit trail; locking is file-based inside the bucket (no DynamoDB table, unlike Terraform) | AWS-native teams |
| Azure Blob | Cheap, Azure-native | Same limits as S3; locking is file-based inside the container | Azure-native teams |
| GCS | Cheap, GCP-native | Same limits as S3; enable object versioning for safety | GCP-native teams |
| Local | No infra needed | No sharing, no locking | Personal projects only |
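# S3 backend config (quote the URL so the shell does not interpret "?")
pulumi login 's3://my-pulumi-state?region=us-east-1'
# Azure Blob backend (the storage account name is read from AZURE_STORAGE_ACCOUNT)
pulumi login azblob://my-pulumi-state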
# GCS backend
pulumi login gs://my-pulumi-state
# Local
pulumi login --local
Step 8: Automation API (Self-Service Platform)
// LocalWorkspace creates and drives stacks programmatically (the Pulumi CLI must still be installed)
import { LocalWorkspace } from "@pulumi/pulumi/automation";
async function createEnv(envName: string, region: string, vpcCidr: string) {
const projectName = "infra-self-service";
const program = async () => {
const aws = require("@pulumi/aws");
const vpc = new aws.ec2.Vpc("vpc", {
cidrBlock: vpcCidr,
enableDnsHostnames: true,
tags: { Name: envName, Environment: envName },
});
return { vpcId: vpc.id };
};
const stack = await LocalWorkspace.createOrSelectStack({
stackName: envName,
projectName,
program,
});
// Install the AWS provider plugin, then configure the stack
await stack.workspace.installPlugin("aws", "v6.0.0");
await stack.setConfig("aws:region", { value: region });
// Deploy
const upResult = await stack.up({ onOutput: console.log });
console.log(`VPC created: ${upResult.outputs.vpcId.value}`);
return upResult;
}
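A self-service endpoint usually receives `envName`, `region`, and `vpcCidr` from other teams, so checking them before calling `createEnv` avoids a failed deployment later. The following is a minimal sketch under stated assumptions: `ALLOWED_REGIONS` is a hypothetical allow-list, the stack name rule (letters, digits, hyphens, underscores, periods) is an assumption to confirm against the Pulumi documentation, and `validateEnvRequest` is not a Pulumi API.

```typescript
// Hypothetical allow-list; a real platform would load this from its own policy.
const ALLOWED_REGIONS = ["us-east-1", "us-west-2", "eu-west-1"];

// Assumed rule: stack names use only letters, digits, hyphens, underscores, and periods.
const STACK_NAME_PATTERN = /^[A-Za-z0-9._-]{1,100}$/;

const IPV4_CIDR_PATTERN = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\/(\d{1,2})$/;

// Returns a list of problems; an empty list means the request can be passed on.
function validateEnvRequest(envName: string, region: string, vpcCidr: string): string[] {
  const problems: string[] = [];

  if (!STACK_NAME_PATTERN.test(envName)) {
    problems.push(`envName "${envName}" is not a usable stack name`);
  }
  if (!ALLOWED_REGIONS.includes(region)) {
    problems.push(`region "${region}" is not on the allow-list`);
  }

  const match = IPV4_CIDR_PATTERN.exec(vpcCidr);
  if (match === null) {
    problems.push(`vpcCidr "${vpcCidr}" is not IPv4 CIDR notation`);
  } else {
    const octets = match.slice(1, 5).map(Number);
    const prefix = Number(match[5]);
    if (octets.some((o) => o > 255)) {
      problems.push(`vpcCidr "${vpcCidr}" has an octet above 255`);
    }
    // AWS VPCs accept prefix lengths from /16 to /28.
    if (prefix < 16 || prefix > 28) {
      problems.push(`vpcCidr prefix /${prefix} is outside the /16 to /28 range`);
    }
  }
  return problems;
}
```

Returning every problem at once, instead of throwing on the first, lets the platform show the requester a complete list in one response.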
Step 9: Secrets Management
# Set secret
pulumi config set dbPassword "s3cret!" --secret
# Encrypted in Pulumi.{stack}.yaml
# Encryption: Pulumi Cloud managed, or bring your own key (AWS KMS, Azure Key Vault, GCP KMS)
# AWS KMS encryption (the ARN form uses three slashes after awskms:)
pulumi stack change-secrets-provider "awskms:///arn:aws:kms:us-east-1:123456789012:key/abc123?region=us-east-1"
// Refer in code (TypeScript); the value stays encrypted in state and masked in CLI output
const dbPassword = config.requireSecret("dbPassword");
Step 10: Migration from Terraform
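# 1. Export Terraform state (useful for looking up the resource IDs to import)
terraform state pull > terraform.tfstate
# 2. Convert the Terraform HCL project to Pulumi; run from the directory containing the .tf files
#    (pulumi convert supersedes tf2pulumi, which converted HCL source, not state files)
pulumi convert --from terraform --language typescript --out ./pulumi-infra
# 3. Import existing resources
pulumi import aws:ec2/vpc:Vpc main vpc-12345
pulumi import aws:ec2/subnet:Subnet public-0 subnet-abcde
# 4. Review the generated Pulumi program and preview
pulumi preview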
Step 11: Policy as Code (CrossGuard)
import { PolicyPack, validateResourceOfType } from "@pulumi/policy";
import * as aws from "@pulumi/aws";
new PolicyPack("aws-best-practices", {
policies: [{
name: "s3-enforce-encryption",
description: "S3 buckets must have encryption enabled",
enforcementLevel: "mandatory",
validateResource: validateResourceOfType(aws.s3.Bucket, (bucket, args, report) => {
if (!bucket.serverSideEncryptionConfiguration) {
report("S3 bucket must have encryption enabled");
}
}),
}, {
name: "tag-required",
description: "All resources must have Environment tag",
enforcementLevel: "mandatory",
validateResource: (args, report) => {
const tags = args.props["tags"] || {};
if (!tags["Environment"]) {
report(`Missing required Environment tag`);
}
},
}, {
name: "ec2-instance-type",
description: "Only allow approved EC2 instance types",
enforcementLevel: "advisory",
validateResource: validateResourceOfType(aws.ec2.Instance, (instance, args, report) => {
const approved = ["t3.medium", "t3.large", "m5.large", "m5.xlarge"];
if (!approved.includes(instance.instanceType)) {
report(`Instance type ${instance.instanceType} not in approved list`);
}
}),
}],
});
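The `tag-required` policy only checks for one tag. If more tags become mandatory, the check can be pulled into a small pure function that is easy to unit test outside the policy pack. This is a sketch, not part of `@pulumi/policy`; `findMissingTags` is a hypothetical helper that takes the same `props` object the policy receives as `args.props`.

```typescript
// Hypothetical helper: returns the required tag keys that are absent or blank.
function findMissingTags(props: Record<string, unknown>, requiredTags: string[]): string[] {
  const tags = props["tags"];
  // Resources without a tags property are treated as having no tags at all.
  const tagMap: Record<string, unknown> =
    typeof tags === "object" && tags !== null ? (tags as Record<string, unknown>) : {};

  return requiredTags.filter((key) => {
    const value = tagMap[key];
    return typeof value !== "string" || value.trim().length === 0;
  });
}
```

Inside `validateResource`, the policy could call `findMissingTags(args.props, ["Environment"])` and report once per missing key.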
Step 12: Pulumi Transforms
import * as pulumi from "@pulumi/pulumi";
// Global transform to add tags to all AWS resources
pulumi.runtime.registerResourceTransform(async (args) => {
if (args.type.startsWith("aws:")) {
args.props["tags"] = {
...args.props["tags"],
managedBy: "pulumi",
project: pulumi.getProject(),
stack: pulumi.getStack(),
};
}
return { props: args.props, opts: args.opts };
});
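The merge logic inside the transform can be kept in a pure function so it is testable without running a Pulumi program. The sketch below mirrors the behaviour above (managed tags win over existing ones) but returns a new object instead of mutating `args.props`. `withManagedTags` is a hypothetical name. Note that some AWS resource types do not accept tags, so a production transform would likely need a narrower filter than the `aws:` prefix.

```typescript
// Hypothetical helper, not a Pulumi API.
// Returns a new props object with managed tags merged in for AWS resource types;
// props for any other provider are returned unchanged.
function withManagedTags(
  resourceType: string,
  props: Record<string, unknown>,
  managedTags: Record<string, string>,
): Record<string, unknown> {
  if (!resourceType.startsWith("aws:")) {
    return props;
  }
  const existing = props["tags"];
  const existingTags = typeof existing === "object" && existing !== null ? existing : {};
  // Managed tags are spread last so they override user-supplied values with the same key.
  return { ...props, tags: { ...existingTags, ...managedTags } };
}
```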
Step 13: Pulumi YAML (For HCL-Friendly Teams)
name: my-infra
runtime: yaml
resources:
  vpc:
    type: aws:ec2/vpc:Vpc
    properties:
      cidrBlock: 10.0.0.0/16
      enableDnsHostnames: true
      tags:
        Name: main
        Environment: ${environment}
  webSubnet:
    type: aws:ec2/subnet:Subnet
    properties:
      vpcId: ${vpc.id}
      cidrBlock: 10.0.1.0/24
      availabilityZone: us-east-1a
variables:
  # Use the stack name directly as the environment label
  environment: ${pulumi.stack}
Step 14: Deploy
pulumi stack select dev
pulumi preview
pulumi up --yes
# Destroy
pulumi destroy --yes
# Stack operations
pulumi stack ls
pulumi stack init prod
pulumi stack rm dev
Step 15: Stack References (Cross-Stack Dependencies)
const infra = new pulumi.StackReference("acme/infrastructure/prod");
const vpcId = infra.getOutput("vpcId");
const subnetIds = infra.getOutput("publicSubnetIds");
Rules & Constraints
- Never hardcode secrets; use `pulumi config set --secret`.
- Always use stack references for cross-stack dependencies.
- Never use `pulumi destroy` without reviewing the preview first.
- Component resources must always call `registerOutputs`.
- Use `pulumi.StackReference` instead of hardcoding stack outputs.
- Always configure S3/Blob/GCS backend for team environments.
- Enable `protect` on critical resources (databases, buckets).
- Use `ignoreChanges` sparingly and document why.
- Always pin provider versions, typically in the language's dependency manifest (package.json, requirements.txt, go.mod).
- Never run `pulumi up` in CI/CD without a reviewed `pulumi preview` or explicit approval.
Production Considerations
- Use separate stack per environment (dev, staging, prod) with config differences.
- Enable `protect: true` on RDS, S3 buckets, and other critical resources to prevent accidental deletion.
- Always review `pulumi preview` output before `pulumi up`.
- Run policy packs in CI/CD pipelines (for example `pulumi preview --policy-pack <path>`) to enforce organizational standards.
- Store state in a shared backend (S3, Blob, GCS) for team collaboration.
- Use Pulumi Cloud for audit logging and deployment history.
- Register global transforms for consistent tagging across all resources.
- Set `retainOnDelete: true` on databases to prevent accidental data loss.
- Use `pulumi up --target` only for targeted updates in emergency scenarios.
- Secrets encryption: prefer cloud KMS (AWS KMS, Azure Key Vault, GCP KMS) for production.
Anti-Patterns
- Storing secrets in stack config files without `--secret`: plaintext in version control.
- One monolithic stack containing all resources: breaks isolation.
- No stack references: hardcoded resource IDs between stacks.
- Not using component resources: duplicated code across projects.
- Using `pulumi destroy` without reviewing the plan.
- Running `pulumi up` in CI/CD without preview.
- Ignoring `pulumi preview` diffs that show unexpected resource recreation.
- Forgetting to call `registerOutputs` in component resources.
- Using local state backend in team environments: no locking.
- Not pinning provider versions: unexpected provider upgrades break state.
- Using `ignoreChanges` to silence drift instead of fixing it upstream.
Troubleshooting
- State conflict: `pulumi stack export` and `pulumi stack import` for manual recovery.
- Resource pending creation: check cloud provider console, then `pulumi refresh`.
- Dependency error: verify stack references are correct, check output names.
- Provider auth failure: verify provider credentials, check `aws:region` config.
- Secrets decryption error: verify secrets provider key is accessible.
- Automation API timeout: increase `pulumi.auto.up` timeout parameter.
- CRD removal failure: remove finalizers from CR before `pulumi destroy`.
- `pulumi preview` showing unexpected diff: run `pulumi refresh` first.
References
- references/automation-api.md — Pulumi Automation API
- references/aws-resources.md — Pulumi AWS Provider
- references/kubernetes-provider.md — Pulumi Kubernetes Provider
- references/programming-models.md — Pulumi Programming Models
- references/pulumi-advanced.md — Pulumi Advanced Topics
- references/pulumi-fundamentals.md — Pulumi Fundamentals
- references/state-backends.md — Pulumi State Backends
Handoff
After completing this skill:
- Next skill: devops-crossplane — control plane abstractions on top of Pulumi-provisioned infrastructure
- Pass context: Stack output references, component resource names, state backend location
Architecture Decision Trees
Pulumi vs Terraform
| Decision | Pulumi | Terraform |
|---|---|---|
| Language | TypeScript, Python, Go, C#, Java | HCL (DSL) |
| State management | Service or self-managed | Backend (S3, GCS, etc.) |
| Testing | Unit test in general-purpose language | terraform plan + OPA/Sentinel |
| Debugging | IDE debugger, logs | TF_LOG, limited |
| Modularity | npm/PyPI packages | Terraform Registry modules |
| Ecosystem | Smaller, growing | Larger, mature |
| Best for | Teams that prefer code over DSL | Teams already invested in HCL |
State Backend Choice
| Aspect | Pulumi Cloud (Managed) | Self-managed (S3, GCS, Azure Blob) |
|---|---|---|
| Auth | Pulumi tokens, OIDC | Cloud provider IAM |
| History | Full deployment history | Limited to state file |
| Encryption | At-rest + in-transit by default | SSE on storage bucket |
| Collaboration | Built-in stack locking | File-based lock stored in the bucket |
| Secrets | Pulumi-managed encryption key by default | Passphrase by default; cloud KMS key optional |
Implementation Patterns
TypeScript: Multi-stack Infrastructure
import * as aws from "@pulumi/aws";
import * as pulumi from "@pulumi/pulumi";
const config = new pulumi.Config();
const environment = config.require("environment");
const vpcCidr = config.get("vpcCidr") || "10.0.0.0/16";
const vpc = new aws.ec2.Vpc("main", {
cidrBlock: vpcCidr,
enableDnsHostnames: true,
enableDnsSupport: true,
tags: { Name: `app-${environment}`, Environment: environment },
});
const publicSubnet = new aws.ec2.Subnet("public", {
vpcId: vpc.id,
cidrBlock: "10.0.1.0/24",
mapPublicIpOnLaunch: true,
availabilityZone: "us-east-1a",
tags: { Name: `public-${environment}` },
});
const cluster = new aws.ecs.Cluster("app", {
name: `app-${environment}`,
tags: { Environment: environment },
});
// A runnable Fargate service also needs a taskDefinition, which this excerpt leaves out.
const service = new aws.ecs.Service("app", {
cluster: cluster.arn,
desiredCount: environment === "production" ? 3 : 1,
launchType: "FARGATE",
networkConfiguration: {
subnets: [publicSubnet.id],
assignPublicIp: true,
},
});
export const vpcId = vpc.id;
export const clusterArn = cluster.arn;
export const serviceName = service.name;
Python: Component Resource
from typing import Optional
from pulumi import ComponentResource, ResourceOptions, Output
import pulumi_aws as aws
import pulumi_random as random

class Database(ComponentResource):
    def __init__(self, name: str, engine: str, version: str, opts: Optional[ResourceOptions] = None):
        super().__init__("acme:platform:Database", name, None, opts)
        self.password = random.RandomPassword(
            f"{name}-password",
            length=24,
            special=True,
            # RDS rejects "/", "@", double quotes, and spaces in master passwords
            override_special="!#$%&*()-_=+[]{}<>:?",
            opts=ResourceOptions(parent=self),
        )
        self.instance = aws.rds.Instance(
            f"{name}-instance",
            engine=engine,
            engine_version=version,
            instance_class="db.t3.medium",
            allocated_storage=100,
            username="dbadmin",  # "admin" is a reserved word for RDS PostgreSQL
            password=self.password.result,
            skip_final_snapshot=True,
            opts=ResourceOptions(parent=self),
        )
        self.register_outputs({
            "endpoint": self.instance.endpoint,
            "port": self.instance.port,
        })

    @property
    def endpoint(self) -> Output[str]:
        return self.instance.endpoint

def main():
    db = Database("app-db", "postgres", "15")
    # Outputs resolve asynchronously, so print inside apply() instead of formatting the Output
    db.endpoint.apply(lambda endpoint: print(f"Database endpoint: {endpoint}"))

if __name__ == "__main__":
    main()
Production Considerations (Pulumi-specific)
- Use stack references to share outputs between stacks (`new pulumi.StackReference("infra/prod")`)
- Enable auto-merge for Pulumi Deployments (Pulumi Cloud) with policy pack enforcement
- Run `pulumi preview` in CI and post the diff as a PR comment for every infrastructure change
- Set retention policies on stack history: keep last 100 deployments, delete older stale stacks
- Use deployment settings in Pulumi Cloud for consistent `pulumi up` execution across environments
- Configure webhooks from Pulumi Cloud to Slack/Teams for deployment notifications
- Pin provider versions in the project's dependency manifest; never use `latest` for production stacks
Configuration-Driven Approach
config:
  defaults:
    timeout: 30s
    retryCount: 3
  overrides:
    production:
      timeout: 60s
      retryCount: 5
    development:
      timeout: 300s
      retryCount: 1
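One way to apply a configuration like this is to start from the defaults and layer the matching environment's overrides on top. The sketch below hard-codes the values from the configuration above. The type and constant names are hypothetical, and duration strings such as `30s` are left unparsed.

```typescript
interface ServiceSettings {
  timeout: string; // duration string such as "30s", kept unparsed in this sketch
  retryCount: number;
}

const BASE_SETTINGS: ServiceSettings = { timeout: "30s", retryCount: 3 };

const ENVIRONMENT_OVERRIDES: Record<string, Partial<ServiceSettings>> = {
  production: { timeout: "60s", retryCount: 5 },
  development: { timeout: "300s", retryCount: 1 },
};

// Environments without an overrides entry fall back to the defaults unchanged.
function resolveSettings(environment: string): ServiceSettings {
  return { ...BASE_SETTINGS, ...(ENVIRONMENT_OVERRIDES[environment] ?? {}) };
}
```

Because overrides are partial, an environment only needs to list the keys that differ from the defaults.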
Production Considerations
Deployment Checklist
- Configuration validated against schema before startup
- Health check endpoints registered and monitored
- Graceful shutdown with draining period (30s timeout)
- Resource limits configured (CPU, memory, file descriptors)
- Log level set appropriate for environment
- Metrics endpoint secured and exposed
- Rate limiting configured per-tier
- TLS certificates valid and auto-renewing
- Database migrations run as separate deployment step
- Feature flags ready for gradual rollout
Monitoring and Alerting
| Metric | Threshold | Severity | Action |
|---|---|---|---|
| Error rate | > 1% over 5min | Critical | Page on-call |
| p99 latency | > 2s over 5min | Warning | Investigate |
| Throughput drop | > 50% over 1min | Critical | Check upstream |
| Queue depth | > 1000 over 1min | Warning | Scale consumers |
| Disk usage | > 85% | Warning | Clean or expand |
| Memory usage | > 90% heap | Critical | Restart or scale |
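The table can also be expressed as data, so the same thresholds drive both documentation and alert evaluation. This is a simplified sketch: the metric key names are assumptions, and the evaluation windows (5min, 1min) are ignored, so each sample is treated as an already-aggregated value.

```typescript
type AlertSeverity = "Critical" | "Warning";

interface AlertRule {
  metric: string; // key expected in the samples object
  threshold: number; // the rule fires when the sample is strictly greater
  severity: AlertSeverity;
  action: string;
}

// Thresholds copied from the table above; metric key names are assumptions.
const ALERT_RULES: AlertRule[] = [
  { metric: "errorRatePercent", threshold: 1, severity: "Critical", action: "Page on-call" },
  { metric: "p99LatencySeconds", threshold: 2, severity: "Warning", action: "Investigate" },
  { metric: "throughputDropPercent", threshold: 50, severity: "Critical", action: "Check upstream" },
  { metric: "queueDepth", threshold: 1000, severity: "Warning", action: "Scale consumers" },
  { metric: "diskUsagePercent", threshold: 85, severity: "Warning", action: "Clean or expand" },
  { metric: "heapUsagePercent", threshold: 90, severity: "Critical", action: "Restart or scale" },
];

// Returns the rules whose threshold is exceeded; metrics without a sample are skipped.
function firingAlerts(samples: Record<string, number>): AlertRule[] {
  return ALERT_RULES.filter((rule) => {
    const value = samples[rule.metric];
    return value !== undefined && value > rule.threshold;
  });
}
```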
Anti-Patterns
| Anti-Pattern | Symptom | Root Cause | Solution |
|---|---|---|---|
| Premature optimization | Complex code for no measured benefit | Guessing instead of profiling | Measure first, optimize based on data |
| Copy-paste reuse | Duplicate code across codebase | Lack of abstraction | Extract shared logic into libraries |
| Gold-plating | Features with no current requirement | Over-engineering | YAGNI — build what's needed now |
| Magical thinking | Assumptions without validation | Skipping error handling | Handle all failure modes explicitly |
Performance Optimization
Caching Strategy
Cache hierarchy: L1 (in-memory local) → L2 (distributed Redis/Memcached) → L3 (CDN/Edge). Cache invalidation: TTL-based (simple, stale), event-based (complex, fresh), write-through (consistent, higher write latency), write-behind (fast writes, eventual consistency).
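As an illustration of the simplest option named above, TTL-based invalidation at the L1 (in-memory) level, here is a minimal sketch. `TtlCache` is a hypothetical class. It has no size bound or background eviction, and the clock is injectable only so expiry can be tested deterministically.

```typescript
interface TtlEntry<V> {
  value: V;
  expiresAt: number; // epoch milliseconds
}

// Minimal in-memory cache with TTL-based invalidation (stale entries are dropped on read).
class TtlCache<V> {
  private readonly entries = new Map<string, TtlEntry<V>>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) {
      return undefined;
    }
    if (this.now() >= entry.expiresAt) {
      // Expired: remove it so the caller falls through to the next cache level.
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }
}
```

A miss here would fall through to the L2 cache or the origin, with the result written back using `set`.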
Resource Pooling
- Database connections: Pool of reusable connections (HikariCP, pgBouncer)
- HTTP connections: Keep-alive + connection pooling for external calls
- Thread pool: Bounded thread pools for async task execution
Profiling Methodology
- Establish baseline with production traffic profile
- Profile CPU with sampling profiler (pprof, perf, async-profiler)
- Profile memory with heap dumps and allocation tracking
- Profile I/O with strace/perf trace for syscall analysis
- Profile latency with distributed tracing (OpenTelemetry)
- Identify bottleneck, formulate hypothesis, implement fix
- Re-profile to verify improvement, repeat
Security Considerations
Threat Modeling (STRIDE)
- Spoofing: Identity validation, authentication
- Tampering: Integrity checks, digital signatures
- Repudiation: Audit logs, non-repudiation
- Information disclosure: Encryption, access control
- Denial of service: Rate limiting, resource quotas
- Elevation of privilege: Principle of least privilege
Supply Chain Security
- Dependency scanning: Snyk, Dependabot, Trivy
- SBOM generation: CycloneDX or SPDX format
- Signed commits: GPG or SSH commit signing
- Artifact verification: Checksum validation, signature verification
Secrets Management
- Secrets never in code — always in secrets manager (Vault, AWS Secrets Manager)
- Rotation policy: Rotate database credentials every 90 days
- Access audit: Log every secrets access, alert on anomalies
- Encryption at rest and in transit for all secrets
- Principle of least privilege: each service gets only its own secrets
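The 90-day rotation policy above can be checked mechanically, for example in a scheduled job that lists secrets along with their last rotation date. A minimal sketch follows; `needsRotation` is a hypothetical helper, and how the last rotation timestamp is obtained depends on the secrets manager in use.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns true when the credential is at least maxAgeDays old at the given time.
function needsRotation(lastRotated: Date, now: Date, maxAgeDays: number = 90): boolean {
  const ageDays = (now.getTime() - lastRotated.getTime()) / MS_PER_DAY;
  return ageDays >= maxAgeDays;
}
```

Passing `now` explicitly, instead of reading the clock inside the function, keeps the check deterministic in tests.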
Rules
- Default-deny security posture — allow only explicitly required access.
- All inputs validated, all outputs encoded, all errors handled.
- Defend in depth — multiple layers of security controls.
- Fail securely — errors default to safe behavior.
- Log security-relevant events for audit and investigation.
- Keep dependencies updated — automate vulnerability scanning.
- Design for observability from day one, not as an afterthought.
- Document all architectural decisions with rationale.
- Review code for security, performance, and correctness before merging.