cloud-native-engineer - SKILL.md Agent Skill

name: cloud-native-engineer
description: The definitive skill for building and deploying high-performance, distributed systems using Cloud Native standards (Dapr, Redis, Microservices). Use when a project requires professional-grade architecture, cross-service communication, elastic scaling, and sub-second agentic latency. Mandatory for flawless deployments on Kubernetes (Local or Cloud).

This skill transforms Claude into an elite Cloud Native Architect capable of delivering production-ready distributed systems.

Strategic Domain Decomposition: Logic for splitting any monolith into clear microservice boundaries.
Standardized Dapr Infrastructure: Reliable, ready-to-use configurations for Pub/Sub, State, and Jobs.
Flawless K8s Orchestration: Deterministic deployment workflows that avoid DNS, Auth, and Probe failures.
Agentic Performance (Sub-Second): Engineering patterns for ultra-fast AI interactions.

Analyze service boundaries and define the shared communication backbone.

Build secure, lean images before rolling out the backbone.

Follow the exact order of operations; later steps assume the earlier ones have completed successfully.

Use the "Low Freedom" scripts to automate repetitive tasks.

Diagnose "Silent Failures" using the troubleshooting matrix.

Implement the persistent MCP pattern for sub-second chatbot responses.

Network-Parity: Internal calls use K8s service names (http://service:port).
Probe-Resilience: Liveness probes wait long enough (for example via `initialDelaySeconds`) for the sidecar to finish starting.
Cluster-Auth: JWKS_URL points to the internal ingress/service.
Warm-Start: AI Tools are pre-initialized in the application lifespan.
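
The Warm-Start item can be sketched as follows. This is a minimal illustration that assumes an asyncio application; `ToolRegistry`, `warm_up`, and the tool names are hypothetical stand-ins, not part of any library or of this skill's scripts. The point is that the slow setup runs once when the application starts, so request handlers only do a lookup.

```python
import asyncio
from contextlib import asynccontextmanager


class ToolRegistry:
    """Hypothetical holder for tool clients that are expensive to create."""

    def __init__(self):
        self._tools = {}
        self.init_count = 0  # counts how many slow initializations ran

    async def warm_up(self, names):
        for name in names:
            await asyncio.sleep(0)  # stand-in for a slow connection/handshake
            self._tools[name] = f"{name}-client"
            self.init_count += 1

    def get(self, name):
        # Request-time path: a dictionary lookup, no initialization.
        return self._tools[name]

    async def close(self):
        self._tools.clear()


@asynccontextmanager
async def lifespan(registry, tool_names):
    # Startup: initialize every tool once, before serving requests.
    await registry.warm_up(tool_names)
    try:
        yield registry
    finally:
        # Shutdown: release the tool clients.
        await registry.close()


async def handle_request(registry, tool_name):
    return registry.get(tool_name)


async def main():
    registry = ToolRegistry()
    async with lifespan(registry, ["search", "tasks"]) as tools:
        results = [await handle_request(tools, "search") for _ in range(3)]
    return results, registry.init_count
```

Web frameworks that expose a lifespan hook typically accept an async context manager of this shape; the exact signature depends on the framework, so treat the wiring as an assumption to check against its documentation.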

For deterministic, repeatable diagnostics, use the following tools and patterns:

| Tool | Description | Parameters | When to Use |
| --- | --- | --- | --- |
| `kubectl logs` | Retrieve backend/frontend logs. | `<pod> -n <namespace> -c <container> --tail=<N>` | For 500 errors or startup failures. |
| `kubectl exec` | Run commands inside a pod. | `<pod> -n <namespace> -c <container> -- <command>` | For database checks, file verification. |
| `kubectl describe` | Detailed status of a resource. | `pod <name> -n <namespace>` | For `CrashLoopBackOff` or pending pods. |
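
The commands in the table can be assembled programmatically, which keeps the flag order consistent across scripts. The helpers below are a hypothetical sketch (the function names are illustrative, not part of this skill); they only build argument lists and do not contact a cluster.

```python
import shlex


def kubectl_logs(namespace, pod, container, tail=100):
    """Argument list for reading the last `tail` log lines of one container."""
    return ["kubectl", "logs", pod, "-n", namespace, "-c", container, f"--tail={tail}"]


def kubectl_exec(namespace, pod, container, command):
    """Argument list for running `command` (a list of strings) inside a container."""
    return ["kubectl", "exec", pod, "-n", namespace, "-c", container, "--", *command]


def kubectl_describe_pod(namespace, pod):
    """Argument list for showing the detailed status and events of a pod."""
    return ["kubectl", "describe", "pod", pod, "-n", namespace]


def as_shell(argv):
    """Render an argument list as a copy-pasteable shell command."""
    return shlex.join(argv)
```

Passing the list to `subprocess.run(argv, capture_output=True, text=True)` avoids shell quoting problems; `as_shell` is only for printing the command in logs or instructions.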

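Cascade Deletes: Always declare ON DELETE CASCADE on foreign keys referencing parent entities (e.g., task_id in TaskTag). In recent SQLModel releases this is written Field(foreign_key=..., ondelete="CASCADE"). Note that sa_column_kwargs={"ondelete": "CASCADE"} is forwarded to SQLAlchemy's Column, which does not accept ondelete, so the option must be set on the foreign key itself.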
Path Resilience: Use absolute paths (/app/...) inside containers to avoid "No such file" errors.
Dapr-First: Use event_publisher for cross-service events to maintain decoupled architecture.
SQLModel Standards: Use `with Session(engine) as session:` so the session is always closed, and call session.commit() explicitly; the context manager closes the session but does not commit for you.
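
The effect of a cascading foreign key and a context-managed transaction can be seen at the database level. The sketch below deliberately uses the standard-library `sqlite3` module instead of SQLModel so it runs without dependencies; the `task` and `task_tag` tables and their columns are assumptions made for illustration.

```python
import sqlite3


def build_demo_db():
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("CREATE TABLE task (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
        conn.execute(
            "CREATE TABLE task_tag ("
            " id INTEGER PRIMARY KEY,"
            " task_id INTEGER NOT NULL REFERENCES task(id) ON DELETE CASCADE,"
            " tag TEXT NOT NULL)"
        )
        conn.execute("INSERT INTO task (id, title) VALUES (1, 'write docs')")
        conn.executemany(
            "INSERT INTO task_tag (task_id, tag) VALUES (?, ?)",
            [(1, "docs"), (1, "urgent")],
        )
    return conn


def delete_task(conn, task_id):
    """Delete a parent row and return how many child rows still reference it."""
    with conn:
        conn.execute("DELETE FROM task WHERE id = ?", (task_id,))
    row = conn.execute(
        "SELECT COUNT(*) FROM task_tag WHERE task_id = ?", (task_id,)
    ).fetchone()
    return row[0]
```

Deleting the parent row removes its `task_tag` rows in the same transaction, which is the behavior the Cascade Deletes rule is meant to guarantee; without the cascade clause the delete would fail or leave orphaned rows, depending on the database.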

Logs: Check backend logs for tracebacks.
Reproduction: Create a minimal Python script inside the pod to isolate DB/logic errors.
Fix & Verify: Apply the fix, re-run the reproduction script, and verify via frontend/CLI.
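
A reproduction script for the workflow above might look like the following. It is a hedged sketch rather than a prescribed tool: `run_checks` and `require_file` are hypothetical helpers, and the files to check are taken from the command line instead of being hard-coded. Each check runs in isolation and any exception is captured as a traceback, so one failure does not hide the others.

```python
import sys
import traceback
from pathlib import Path


def run_checks(checks):
    """Run each named check; return 'ok' or the captured traceback per name."""
    results = {}
    for name, check in checks.items():
        try:
            check()
            results[name] = "ok"
        except Exception:
            results[name] = traceback.format_exc()
    return results


def require_file(path):
    """Fail if `path` is relative or does not exist inside the container."""
    p = Path(path)
    if not p.is_absolute():
        raise ValueError(f"expected an absolute path, got {path!r}")
    if not p.exists():
        raise FileNotFoundError(path)


if __name__ == "__main__":
    # Usage inside the pod: python repro.py <absolute-path> [<absolute-path> ...]
    # Bind each path as a default argument so every lambda checks its own file.
    checks = {f"file {p}": (lambda p=p: require_file(p)) for p in sys.argv[1:]}
    for name, outcome in run_checks(checks).items():
        print(f"[{name}] {outcome}")
```

Database or business-logic checks can be added to the same dictionary as further callables; re-running the script after a fix gives a quick before/after comparison.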