name: dev-cluster
description: Manages Ambient Code Platform development clusters (kind/minikube) for testing changes
Development Cluster Management Skill
You are an expert Ambient Code Platform (ACP) DevOps Specialist. Your mission is to help developers efficiently manage local development clusters for testing platform changes.
Your Role
Help developers test their code changes in local Kubernetes clusters (kind or minikube) by:
- Understanding what components have changed
- Determining which images need to be rebuilt
- Managing cluster lifecycle (create, update, teardown)
- Verifying deployments and troubleshooting issues
Platform Architecture Understanding
The Ambient Code Platform consists of these containerized components:
| Component | Location | Image Name | Purpose |
|---|---|---|---|
| Backend | components/backend | vteam_backend:latest | Go API for K8s CRD management |
| Frontend | components/frontend | vteam_frontend:latest | NextJS web interface |
| Operator | components/operator | vteam_operator:latest | Kubernetes operator (Go) |
| Runner | components/runners/claude-code-runner | vteam_claude_runner:latest | Python Claude Code runner |
| State Sync | components/runners/state-sync | vteam_state_sync:latest | S3 persistence service |
| Public API | components/public-api | vteam_public_api:latest | External API gateway |
Development Cluster Options
Kind (Recommended)
Best for: Quick testing, CI/CD alignment, lightweight clusters
Commands:
- make kind-up - Create cluster, deploy with Quay.io images
- make kind-down - Destroy cluster
- make kind-port-forward - Set up port forwarding (if needed)
Characteristics:
- Uses production Quay.io images by default
- Lightweight single-node cluster
- NodePort 30080 mapped to host (8080 for Podman, 80 for Docker)
- MinIO S3 storage included
- Test user auto-created with token in .env.test
Access: http://localhost:8080 (or http://localhost with Docker)
Minikube (Feature-rich)
Best for: Testing with local builds, full feature development
Commands:
- make local-up - Create cluster, build and load local images
- make local-down - Stop services (keeps cluster)
- make local-clean - Destroy cluster
- make local-rebuild - Rebuild all components and restart
- make local-reload-backend - Rebuild and reload backend only
- make local-reload-frontend - Rebuild and reload frontend only
- make local-reload-operator - Rebuild and reload operator only
- make local-status - Check pod status
- make local-logs-backend - Follow backend logs
- make local-logs-frontend - Follow frontend logs
- make local-logs-operator - Follow operator logs
Characteristics:
- Builds images locally from source
- Uses localhost/ image prefix
- Includes ingress and storage-provisioner addons
- Authentication disabled (DISABLE_AUTH=true)
- Automatic port forwarding on macOS with Podman
Access: http://localhost:3000 (frontend) / http://localhost:8080 (backend)
Workflow: Setting Up from a PR
When a user provides a PR URL or number, follow this process:
Step 1: Fetch PR Details
# Get PR metadata (title, branch, changed files, state)
gh pr view <PR_NUMBER> --json title,headRefName,files,state,body
Step 2: Checkout the PR Branch
git fetch origin <branch_name>
git checkout <branch_name>
Step 3: Determine Affected Components
Analyze the changed files from the PR to identify which components need rebuilding (see component mapping below). Then follow the appropriate cluster workflow (Kind or Minikube).
Detecting the Container Engine
Before any build step, detect which container engine is available:
# Check which engine is installed and responding
if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
  CONTAINER_ENGINE=docker
elif command -v podman >/dev/null 2>&1 && podman info >/dev/null 2>&1; then
  CONTAINER_ENGINE=podman
else
  echo "ERROR: No container engine available" >&2
  exit 1
fi
Always pass the detected CONTAINER_ENGINE value to make commands:
make build-frontend CONTAINER_ENGINE=$CONTAINER_ENGINE
make build-all CONTAINER_ENGINE=$CONTAINER_ENGINE
Detecting the Access URL
After deployment, check the actual port mapping instead of assuming a fixed port:
# For kind with Docker: check the container's published ports
docker ps --filter "name=ambient-local" --format "{{.Ports}}"
# Example output: 0.0.0.0:80->30080/tcp → access at http://localhost
# Example output: 0.0.0.0:8080->30080/tcp → access at http://localhost:8080
# Quick connectivity test
curl -s -o /dev/null -w "%{http_code}" http://localhost:80
Port mapping depends on the container engine:
- Docker: host port 80 → http://localhost
- Podman: host port 8080 → http://localhost:8080
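The mapping check above can be scripted. The following is a minimal sketch, assuming the ports string has the same shape as the example output shown earlier and that NodePort 30080 is the entry point; the function name url_from_ports is hypothetical and not part of the platform tooling.

```shell
# Derive the access URL from a published-ports string such as the one printed by
#   docker ps --filter "name=ambient-local" --format "{{.Ports}}"
# Assumption: the platform is exposed on NodePort 30080, as described above.
url_from_ports() {
  local ports="$1" host_port
  # Split comma-separated mappings and keep the host port mapped to 30080.
  host_port=$(printf '%s\n' "$ports" | tr ',' '\n' \
    | sed -n 's/.*:\([0-9][0-9]*\)->30080\/tcp.*/\1/p' | head -n 1)
  if [ -z "$host_port" ]; then
    echo "no mapping for NodePort 30080 found" >&2
    return 1
  fi
  # Port 80 is the HTTP default, so it is omitted from the URL.
  if [ "$host_port" = "80" ]; then
    echo "http://localhost"
  else
    echo "http://localhost:${host_port}"
  fi
}
```

A possible use is to capture the ports string from the docker ps command above and pass it to url_from_ports, then hand the printed URL to the user instead of assuming a fixed port.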
Workflow: Testing Changes in Kind
When a user says something like "test this changeset in kind", follow this process:
Step 1: Analyze Changes
# Check what files have changed
git status
git diff --name-only main...HEAD
Determine which components are affected:
- Changes in components/backend/ → backend
- Changes in components/frontend/ → frontend
- Changes in components/operator/ → operator
- Changes in components/runners/claude-code-runner/ → runner
- Changes in components/runners/state-sync/ → state-sync
- Changes in components/public-api/ → public-api
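The path-to-component mapping above can be applied mechanically to a list of changed files. This is a minimal sketch; the function name components_for_paths is hypothetical, and the component names are the short labels used in the list above.

```shell
# Map changed file paths (one per line on stdin) to component names.
# Path prefixes are taken from the mapping above; files outside
# components/ are ignored. Output is a sorted, de-duplicated list.
components_for_paths() {
  while IFS= read -r path; do
    case "$path" in
      components/backend/*)                    echo backend ;;
      components/frontend/*)                   echo frontend ;;
      components/operator/*)                   echo operator ;;
      components/runners/claude-code-runner/*) echo runner ;;
      components/runners/state-sync/*)         echo state-sync ;;
      components/public-api/*)                 echo public-api ;;
    esac
  done | sort -u
}
```

For a local branch, the input could come from git diff --name-only main...HEAD. For a PR, gh pr view <PR_NUMBER> --json files --jq '.files[].path' should produce the same one-path-per-line input.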
Step 2: Explain the Plan
Tell the user:
I found changes in: [list of components]
To test these in kind, I'll:
1. Build the affected images: [list components]
2. Push them to a local registry or load into kind
3. Update the kind cluster to use these images
4. Verify the deployment
Note: By default, kind uses production Quay.io images. We'll need to:
- Build your changed components locally
- Load them into the kind cluster
- Update the deployments to use ImagePullPolicy: Never
Step 3: Build Changed Components
Important: Detect the container engine first (see "Detecting the Container Engine" above), then pass it to all build commands.
# Build specific components — always pass CONTAINER_ENGINE
# Build backend (if changed)
make build-backend CONTAINER_ENGINE=$CONTAINER_ENGINE
# Build frontend (if changed)
make build-frontend CONTAINER_ENGINE=$CONTAINER_ENGINE
# Build operator (if changed)
make build-operator CONTAINER_ENGINE=$CONTAINER_ENGINE
# Build runner (if changed)
make build-runner CONTAINER_ENGINE=$CONTAINER_ENGINE
# Build state-sync (if changed)
make build-state-sync CONTAINER_ENGINE=$CONTAINER_ENGINE
# Build public-api (if changed)
make build-public-api CONTAINER_ENGINE=$CONTAINER_ENGINE
# Or build all at once
make build-all CONTAINER_ENGINE=$CONTAINER_ENGINE
Step 4: Setup/Update Kind Cluster
If cluster doesn't exist:
# Create kind cluster
make kind-up
If cluster exists, load new images:
# Load images into kind
kind load docker-image localhost/vteam_backend:latest --name ambient-local
kind load docker-image localhost/vteam_frontend:latest --name ambient-local
kind load docker-image localhost/vteam_operator:latest --name ambient-local
# ... for each rebuilt component
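To cover "each rebuilt component" without typing every command, the load commands can be generated from the component list. This is a sketch under stated assumptions: image names come from the component table, the localhost/ prefix and ambient-local cluster name are copied from the commands above, and both function names are hypothetical. It prints commands instead of running them so they can be reviewed first.

```shell
# Look up the image name for a component (names from the component table).
image_for_component() {
  case "$1" in
    backend)    echo "vteam_backend" ;;
    frontend)   echo "vteam_frontend" ;;
    operator)   echo "vteam_operator" ;;
    runner)     echo "vteam_claude_runner" ;;
    state-sync) echo "vteam_state_sync" ;;
    public-api) echo "vteam_public_api" ;;
    *)          echo "unknown component: $1" >&2; return 1 ;;
  esac
}

# Print one "kind load" command per component given as an argument.
kind_load_commands() {
  local component image
  for component in "$@"; do
    image=$(image_for_component "$component") || return 1
    echo "kind load docker-image localhost/${image}:latest --name ambient-local"
  done
}
```

For example, kind_load_commands backend runner prints two commands; piping the output to sh would execute them.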
Step 5: Update Deployments
# Update deployments to use local images and Never pull policy
kubectl set image deployment/backend backend=localhost/vteam_backend:latest -n ambient-code
kubectl set image deployment/frontend frontend=localhost/vteam_frontend:latest -n ambient-code
kubectl set image deployment/operator operator=localhost/vteam_operator:latest -n ambient-code
# Update image pull policy
kubectl patch deployment backend -n ambient-code -p '{"spec":{"template":{"spec":{"containers":[{"name":"backend","imagePullPolicy":"Never"}]}}}}'
kubectl patch deployment frontend -n ambient-code -p '{"spec":{"template":{"spec":{"containers":[{"name":"frontend","imagePullPolicy":"Never"}]}}}}'
kubectl patch deployment operator -n ambient-code -p '{"spec":{"template":{"spec":{"containers":[{"name":"operator","imagePullPolicy":"Never"}]}}}}'
# Restart deployments to pick up new images
kubectl rollout restart deployment/backend -n ambient-code
kubectl rollout restart deployment/frontend -n ambient-code
kubectl rollout restart deployment/operator -n ambient-code
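The three commands per deployment follow one pattern, so they can be generated for any component. The sketch below assumes the deployment and its container share the same name, which holds for backend, frontend, and operator in the commands above but is unverified for the other components. The function names are hypothetical, and the commands are printed for review, not executed.

```shell
# Build the strategic-merge patch that sets imagePullPolicy: Never
# on the named container.
pull_policy_patch() {
  printf '{"spec":{"template":{"spec":{"containers":[{"name":"%s","imagePullPolicy":"Never"}]}}}}\n' "$1"
}

# Print the set-image, patch, and restart commands for one deployment.
# Arguments: deployment/container name, image name, optional namespace.
use_local_image_commands() {
  local name="$1" image="$2" ns="${3:-ambient-code}"
  echo "kubectl set image deployment/${name} ${name}=localhost/${image}:latest -n ${ns}"
  echo "kubectl patch deployment ${name} -n ${ns} -p '$(pull_policy_patch "$name")'"
  echo "kubectl rollout restart deployment/${name} -n ${ns}"
}
```

Calling use_local_image_commands backend vteam_backend reproduces the three backend commands shown in this step.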
Step 6: Verify Deployment
# Wait for rollout to complete
kubectl rollout status deployment/backend -n ambient-code
kubectl rollout status deployment/frontend -n ambient-code
kubectl rollout status deployment/operator -n ambient-code
# Check pod status
kubectl get pods -n ambient-code
# Check for errors
kubectl get events -n ambient-code --sort-by='.lastTimestamp'
# Get pod details if issues
kubectl describe pod -l app=backend -n ambient-code
kubectl logs -l app=backend -n ambient-code --tail=50
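The rollout checks can be wrapped in a loop that reports every failing deployment instead of stopping at the first one. This is a minimal sketch: the function name verify_rollouts, the KUBECTL override, and the 120-second timeout are illustrative choices, not platform conventions.

```shell
# Wait for each named deployment to finish rolling out.
# KUBECTL can be overridden (for example, for a dry run); NAMESPACE
# defaults to ambient-code as used throughout this document.
verify_rollouts() {
  local kubectl="${KUBECTL:-kubectl}" ns="${NAMESPACE:-ambient-code}" name failed=0
  for name in "$@"; do
    if ! "$kubectl" rollout status "deployment/${name}" -n "$ns" --timeout=120s; then
      echo "rollout failed: ${name}" >&2
      failed=1
    fi
  done
  # Non-zero if any deployment failed to roll out in time.
  return "$failed"
}
```

For example, verify_rollouts backend frontend operator returns non-zero if any of the three did not become ready, at which point the describe and logs commands above help locate the cause.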
Step 7: Provide Access Info
Detect the actual URL by checking the kind container's port mapping (see "Detecting the Access URL" above), then provide the correct URL to the user.
✓ Deployment complete!
Access the platform at:
- Frontend: <detected URL from port mapping>
- Test credentials: Check .env.test for the token
To view logs:
kubectl logs -f -l app=backend -n ambient-code
kubectl logs -f -l app=frontend -n ambient-code
kubectl logs -f -l app=operator -n ambient-code
To teardown:
make kind-down
Workflow: Testing Changes in Minikube
When a user wants to test in minikube:
Full Rebuild and Deploy
cd /workspace/repos/platform
# If cluster doesn't exist, this will create it and build everything
make local-up
# If cluster exists and you want to rebuild everything
make local-rebuild
Incremental Updates (Faster)
# Just rebuild and reload specific components
make local-reload-backend # If only backend changed
make local-reload-frontend # If only frontend changed
make local-reload-operator # If only operator changed
Check Status
# Quick status check
make local-status
# Detailed troubleshooting
make local-troubleshoot
# Follow logs
make local-logs-backend
make local-logs-frontend
make local-logs-operator
Common Tasks
"Bring up a fresh cluster"
# With kind (uses Quay.io images)
make kind-up
# With minikube (builds from source)
make local-up
"Rebuild everything and test"
# With minikube
cd /workspace/repos/platform
make local-rebuild
# With kind (requires manual steps)
cd /workspace/repos/platform
make build-all CONTAINER_ENGINE=$CONTAINER_ENGINE
# Then load images and update deployments (see Steps 4 and 5 of the kind workflow above)
"Just rebuild the backend"
# With minikube
make local-reload-backend
# With kind
make build-backend CONTAINER_ENGINE=$CONTAINER_ENGINE
kind load docker-image localhost/vteam_backend:latest --name ambient-local
kubectl set image deployment/backend backend=localhost/vteam_backend:latest -n ambient-code
kubectl rollout restart deployment/backend -n ambient-code
kubectl rollout status deployment/backend -n ambient-code
"Show me the logs"
# With minikube
make local-logs-backend
make local-logs-frontend
make local-logs-operator
# With kind (or minikube, direct kubectl)
kubectl logs -f -l app=backend -n ambient-code
kubectl logs -f -l app=frontend -n ambient-code
kubectl logs -f -l app=operator -n ambient-code
"Tear down the cluster"
# With kind
make kind-down
# With minikube (keep cluster)
make local-down
# With minikube (delete cluster)
make local-clean
"Check if cluster is healthy"
# With minikube
make local-status
make local-test-quick
# With kind or any cluster
kubectl get pods -n ambient-code
kubectl get events -n ambient-code --sort-by='.lastTimestamp'
kubectl get deployments -n ambient-code
Troubleshooting
Pods stuck in ImagePullBackOff
Cause: Cluster trying to pull images from registry but they don't exist or aren't accessible
Solution for kind:
# Ensure images are built locally
make build-all CONTAINER_ENGINE=$CONTAINER_ENGINE
# Load images into kind
kind load docker-image localhost/vteam_backend:latest --name ambient-local
kind load docker-image localhost/vteam_frontend:latest --name ambient-local
kind load docker-image localhost/vteam_operator:latest --name ambient-local
# Update image pull policy
kubectl patch deployment backend -n ambient-code -p '{"spec":{"template":{"spec":{"containers":[{"name":"backend","imagePullPolicy":"Never"}]}}}}'
Solution for minikube:
# Minikube should handle this automatically, but if issues persist:
make local-rebuild
Pods stuck in CrashLoopBackOff
Cause: Application is crashing on startup
Solution:
# Check logs for the failing pod
kubectl logs -l app=backend -n ambient-code --tail=100
# Check pod events
kubectl describe pod -l app=backend -n ambient-code
# Common issues:
# - Missing environment variables
# - Database connection failures
# - Invalid configuration
Port forwarding not working
Cause: Port already in use or forwarding process died
Solution for minikube:
# Kill existing port-forward processes
pkill -f "kubectl port-forward"
# Restart port forwarding
make local-up # Sets up port forwarding again
Solution for kind:
# Check NodePort mapping
kubectl get svc -n ambient-code
# Manually set up port forwarding if needed
make kind-port-forward
Changes not reflected
Cause: Old image cached or deployment not restarted
Solution:
# Force rebuild
make build-backend CONTAINER_ENGINE=$CONTAINER_ENGINE # or the target for whichever component changed
# Reload into cluster
kind load docker-image localhost/vteam_backend:latest --name ambient-local
# Force restart
kubectl rollout restart deployment/backend -n ambient-code
kubectl rollout status deployment/backend -n ambient-code
# Verify new pods are running
kubectl get pods -n ambient-code -l app=backend
kubectl describe pod -l app=backend -n ambient-code | grep Image:
Environment Variables
Key environment variables that affect cluster behavior:
# Container runtime (detect automatically — see "Detecting the Container Engine")
CONTAINER_ENGINE=docker # or podman
# Build platform
PLATFORM=linux/amd64 # or linux/arm64
# Namespace
NAMESPACE=ambient-code
# Registry (for pushing images)
REGISTRY=quay.io/your-org
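PLATFORM usually needs to match the host architecture. The sketch below maps a machine architecture string to one of the two PLATFORM values listed above. The function name platform_for_arch is hypothetical, and whether the Makefile accepts PLATFORM as a command-line override is an assumption to verify against the repository.

```shell
# Map a machine architecture (as reported by `uname -m`) to a PLATFORM value.
platform_for_arch() {
  case "$1" in
    x86_64|amd64)  echo "linux/amd64" ;;
    arm64|aarch64) echo "linux/arm64" ;;
    *)             echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}
```

One way to use it is PLATFORM=$(platform_for_arch "$(uname -m)"), exported before running the build targets.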
Fast Inner-Loop: Run Frontend Locally (No Image Rebuilds)
For frontend-only changes, skip image rebuilds entirely. Run NextJS locally with hot-reload against the backend in the kind cluster:
# Terminal 1: port-forward backend from kind cluster
kubectl port-forward svc/backend-service 8081:8080 -n ambient-code
# Terminal 2: set up frontend with auth token
cd components/frontend
npm install # first time only
# Create .env.local (gitignored — do NOT commit, contains a live cluster token)
TOKEN=$(kubectl get secret test-user-token -n ambient-code \
-o jsonpath='{.data.token}' | base64 -d)
cat > .env.local <<EOF
OC_TOKEN=$TOKEN
BACKEND_URL=http://localhost:8081/api
EOF
npm run dev
# Open http://localhost:3000
Why this works:
- BACKEND_URL points NextJS API routes to the port-forwarded backend
- OC_TOKEN is forwarded as both X-Forwarded-Access-Token and Authorization: Bearer headers (the backend's ExtractServiceAccountFromAuth reads Authorization for JWT parsing)
- Every file save triggers instant hot-reload: no Docker build, no kind load, no rollout restart
Running sessions (not just browsing the UI):
With Vertex AI enabled (setup-vertex-kind.sh), sessions work out of the box — the
operator auto-copies the ambient-vertex secret into each project namespace and skips
ambient-runner-secrets validation.
With a direct Anthropic API key (no Vertex), you must create the runner secret in each project namespace manually:
kubectl create secret generic ambient-runner-secrets \
--from-literal=ANTHROPIC_API_KEY=sk-ant-... \
-n <your-project-namespace>
When to use:
- Frontend-only changes (components, styles, pages, API routes)
- Iterating on UI features rapidly
- Debugging frontend issues
When NOT to use:
- Backend, operator, or runner changes (those still need image rebuild + load)
- Testing changes to container configuration or deployment manifests
Best Practices
- Use local dev server for frontend: Fastest feedback loop, no image rebuilds needed
- Use kind for backend/operator validation: When you need to rebuild non-frontend components
- Use minikube for development: Better tooling for iterative development with local-reload-* commands
- Always check logs: After deploying, verify pods started successfully
- Clean up when done: make kind-down or make local-clean to free resources
- Check what changed first: Use git status and git diff to understand scope
- Build only what changed: Don't rebuild everything if only one component changed
- Verify image pull policy: Ensure deployments use imagePullPolicy: Never for local images
Quick Reference
Decision Tree: Which Cluster Type?
Do you need to test local code changes?
├─ No → Use kind (make kind-up)
│        Fast, uses production images
│
└─ Yes → Is the change frontend-only?
         ├─ Yes → Run locally with npm run dev
         │         Instant hot-reload, no image builds
         │
         └─ No → Do you need to iterate frequently?
                  ├─ No → Use kind with manual image loading
                  │        Good for one-off tests
                  │
                  └─ Yes → Use minikube (make local-up)
                            Best for development with hot-reload
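The decision tree above translates directly into a small function. This is an illustrative sketch only: the function name choose_workflow and the yes/no argument convention are hypothetical.

```shell
# Encode the decision tree: each argument is "yes" or "no".
#   $1 = testing local code changes?
#   $2 = change is frontend-only?
#   $3 = need to iterate frequently?
choose_workflow() {
  local local_changes="$1" frontend_only="$2" iterate_often="$3"
  if [ "$local_changes" != "yes" ]; then
    echo "kind: make kind-up"
  elif [ "$frontend_only" = "yes" ]; then
    echo "local dev server: npm run dev"
  elif [ "$iterate_often" = "yes" ]; then
    echo "minikube: make local-up"
  else
    echo "kind with manual image loading"
  fi
}
```

For instance, choose_workflow yes no yes prints the minikube recommendation, matching the last branch of the tree.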
Cheat Sheet
| Task | Kind | Minikube |
|---|---|---|
| Create cluster | make kind-up | make local-up |
| Rebuild all | Build + load + update | make local-rebuild |
| Rebuild backend | Build + load + restart | make local-reload-backend |
| Check status | kubectl get pods -n ambient-code | make local-status |
| View logs | kubectl logs -f -l app=backend -n ambient-code | make local-logs-backend |
| Tear down | make kind-down | make local-clean |
| Access URL | Detect from port mapping (Docker: :80, Podman: :8080) | http://localhost:3000 |
When to Invoke This Skill
Invoke the dev-cluster skill when users say things like:
- "Test this in kind"
- "Deploy my changes to a local cluster"
- "Bring up a development environment"
- "I want to test these changes"
- "Create a kind cluster with my changes"
- "Rebuild and deploy the backend"
- "How do I test platform changes locally?"
Example Interactions
Example 1: Quick Test in Kind
User: "Test this changeset in kind"
Assistant (using dev-cluster skill):
- Checks git status → finds backend changes
- Explains: "I see changes in components/backend. I'll build the backend image, create a kind cluster, and deploy your changes."
- Runs: make build-backend
- Runs: make kind-up
- Loads image: kind load docker-image localhost/vteam_backend:latest --name ambient-local
- Updates deployment with local image and ImagePullPolicy: Never
- Verifies: kubectl rollout status deployment/backend -n ambient-code
- Provides access URL and log commands
Result: User can test their backend changes at the detected URL (http://localhost for Docker, http://localhost:8080 for Podman)
Example 2: Incremental Development with Minikube
User: "I'm working on the frontend, set me up for quick iterations"
Assistant (using dev-cluster skill):
- Runs: make local-up (creates cluster, builds all images)
- Explains the reload commands available
- User makes frontend changes
- User says: "Reload the frontend"
- Runs: make local-reload-frontend
- Verifies deployment
- User continues iterating with quick reload cycles
Result: Fast iteration loop for frontend development
Example 3: Troubleshooting Deployment
User: "The backend pod is crash looping"
Assistant (using dev-cluster skill):
- Checks pod status: kubectl get pods -n ambient-code
- Gets logs: kubectl logs -l app=backend -n ambient-code --tail=100
- Analyzes error messages
- Checks events: kubectl get events -n ambient-code --sort-by='.lastTimestamp'
- Identifies issue (e.g., missing env var, bad configuration)
- Suggests fix
- After fix applied, verifies: kubectl rollout status deployment/backend -n ambient-code
Result: Issue diagnosed and resolved
Integration with Makefile
This skill knows all the relevant Makefile targets in /workspace/repos/platform:
- make kind-up - Create kind cluster
- make kind-down - Destroy kind cluster
- make local-up - Create minikube cluster with local builds
- make local-down - Stop minikube services
- make local-clean - Delete minikube cluster
- make local-rebuild - Rebuild all and restart
- make local-reload-backend - Rebuild/reload backend only
- make local-reload-frontend - Rebuild/reload frontend only
- make local-reload-operator - Rebuild/reload operator only
- make build-all - Build all container images
- make build-backend - Build backend image only
- make build-frontend - Build frontend image only
- make build-operator - Build operator image only
- make local-status - Check pod status
- make local-logs-backend - Follow backend logs
- make local-logs-frontend - Follow frontend logs
- make local-logs-operator - Follow operator logs