name: openkruise
compatibility: opencode
completeness: 95
content-types:
- guidance
- examples
- do-dont
- config
description: 'OpenKruise: extended Kubernetes workload management with advanced deployment strategies'
license: MIT
maturity: stable
metadata:
domain: cncf
output-format: manifests
role: reference
scope: infrastructure
triggers: container orchestration, extended, k8s, openkruise, workload, kubernetes
archetypes:
- educational
- strategic
anti_triggers:
- brainstorming
- vague ideation
- non-containerized architecture
response_profile:
  verbosity: medium
  directive_strength: low
  abstraction_level: strategic
version: "1.0.0"
related-skills: cncf-argo, cncf-artifact-hub, cncf-aws-dynamodb, cncf-aws-ec2
OpenKruise in Cloud-Native Engineering
Category: Scheduling & Orchestration
Status: Active
Stars: 2,700
Last Updated: 2026-04-22
Primary Language: Go
Description: Extended Kubernetes workload management with advanced deployment strategies
Purpose and Use Cases
OpenKruise is a CNCF project that extends Kubernetes with advanced workload controllers and deployment strategies.
What Problem Does It Solve?
OpenKruise addresses the challenge of advanced workload management beyond Kubernetes native controllers. It provides enhanced deployment strategies, advanced lifecycle management, and workload orchestration for complex applications.
When to Use This Project
Use OpenKruise when managing complex deployments with rolling updates, stateful applications, or large-scale cluster operations. It is a good fit when you need faster rollouts with less downtime, granular control over update order, and advanced pod management. It is not ideal for simple deployments that the native Kubernetes controllers already handle well.
Key Use Cases
- Advanced Deployment Strategies with CBO
- StatefulSet Enhanced with Partition Management
- Large-Scale Cluster Management
- Pre-Batch Pod Creation for Fast Scaling
- Custom Workload Management with sidecar injection
Architecture Design Patterns
Core Components
- CloneSet Controller: Manages stateless applications with enhanced scaling and updates
- Advanced StatefulSet Controller: Enhances native StatefulSet with advanced partitioning and updates
- Advanced DaemonSet Controller: Provides advanced daemon management with batch updates
- Advanced Workloads Controller: Manages custom workload types with unique scheduling needs
- SidecarSet Controller: Manages sidecar injection and updates independently
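As an illustration of the CloneSet controller listed above, the following is a minimal sketch of a CloneSet manifest. The name `sample-web`, the image, and the resource values are hypothetical placeholders; the `apps.kruise.io/v1alpha1` API group and the `InPlaceIfPossible` update type should be checked against the OpenKruise version installed in your cluster.

```yaml
# Minimal CloneSet sketch (names, image, and sizes are illustrative assumptions)
apiVersion: apps.kruise.io/v1alpha1
kind: CloneSet
metadata:
  name: sample-web
  namespace: default
  labels:
    app: sample-web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: sample-web
  template:
    metadata:
      labels:
        app: sample-web
    spec:
      containers:
        - name: web
          image: nginx:1.25
          ports:
            - containerPort: 80
          readinessProbe:
            httpGet:
              path: /
              port: 80
          resources:
            requests:
              cpu: 100m
              memory: 64Mi
            limits:
              cpu: 250m
              memory: 128Mi
  updateStrategy:
    # Update containers in place when only the image changes; otherwise recreate
    type: InPlaceIfPossible
    maxUnavailable: 1
```

Apart from `updateStrategy`, the shape mirrors a native Deployment, which is what makes migrating stateless workloads relatively mechanical.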
Component Interactions
- Workload → CloneSet: CloneSet manages pod lifecycle with enhanced update strategies
- SidecarSet → Pods: SidecarSet injects sidecars and updates them independently
- BroadcastJob → Nodes: BroadcastJob schedules jobs across all matching nodes
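The BroadcastJob interaction above could look roughly like the sketch below, which runs one pod to completion on every matching Linux node. The job name, image, command, and TTL are hypothetical; confirm the `completionPolicy` fields against the BroadcastJob documentation for your OpenKruise version.

```yaml
# BroadcastJob sketch: one pod per matching node (all values are illustrative)
apiVersion: apps.kruise.io/v1alpha1
kind: BroadcastJob
metadata:
  name: node-check
  namespace: default
spec:
  template:
    spec:
      # Restrict the job to nodes carrying this well-known label
      nodeSelector:
        kubernetes.io/os: linux
      restartPolicy: Never
      containers:
        - name: check
          image: busybox:1.36
          command: ["sh", "-c", "uname -a"]
          resources:
            requests:
              cpu: 50m
              memory: 32Mi
            limits:
              cpu: 100m
              memory: 64Mi
  completionPolicy:
    # Mark the job complete once all pods finish, then clean it up after 5 minutes
    type: Always
    ttlSecondsAfterFinished: 300
```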
Data Flow Patterns
- Pod Creation: Workload creation → Controller creates pods → Health checks → Ready
- Update Orchestration: New version → Partition update → Select pods → Update sequentially
- Sidecar Injection: Pod created → SidecarSet matches → Sidecar injected → Pod ready
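The sidecar injection flow above can be sketched with a SidecarSet such as the following. It assumes pods labeled `app: sample-web` (a hypothetical label) in the `default` namespace; the sidecar name, image, command, and volume are placeholders rather than a recommended logging setup.

```yaml
# SidecarSet sketch: injects a log-tailing sidecar into matching pods at creation
apiVersion: apps.kruise.io/v1alpha1
kind: SidecarSet
metadata:
  name: log-agent          # SidecarSet is cluster-scoped, so no metadata.namespace
spec:
  selector:
    matchLabels:
      app: sample-web      # hypothetical label on the target pods
  namespace: default       # limit injection to pods in this namespace
  updateStrategy:
    type: RollingUpdate
    maxUnavailable: 1
  containers:
    - name: log-agent
      image: busybox:1.36
      command: ["sh", "-c", "tail -F /var/log/app/app.log"]
      volumeMounts:
        - name: app-logs
          mountPath: /var/log/app
      resources:
        requests:
          cpu: 50m
          memory: 32Mi
        limits:
          cpu: 100m
          memory: 64Mi
  volumes:
    - name: app-logs
      emptyDir: {}
```

Because the sidecar is declared separately from the workload, its image can be updated without editing or redeploying the application manifests.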
Design Principles
- Backward Compatible: Extends Kubernetes APIs without breaking changes
- Incremental Rollout: Supports canary and phased deployments
- Graceful Handling: Handles failures and rollback automatically
- Resource Efficient: Optimizes pod creation and update sequences
Integration Approaches
Integration with Other CNCF Projects
- Kubernetes: Core platform with extended controllers
- Helm: Installation and upgrade management
- Prometheus: Metrics collection for workloads
- Istio: Service mesh integration for traffic management
API Patterns
- Custom Resources: CloneSet, Advanced StatefulSet, SidecarSet APIs
- Webhooks: Admission webhooks for validation
- Kubernetes Controllers: Reconciliation loops for desired state
Configuration Patterns
- CRD YAML: Define workload specifications in YAML
- Helm Values: Configure OpenKruise settings
- Admission Webhook Config: Configure webhook behavior
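For the Helm Values pattern above, a values override might look like the sketch below. The key names (`manager.replicas`, `manager.resources`) are assumptions based on the commonly published chart layout, and the file name is hypothetical; verify them against the `values.yaml` of the chart version you install.

```yaml
# values-kruise.yaml (hypothetical file name)
# Key names are assumptions; check the chart's own values.yaml before use.
manager:
  # Run more than one controller replica for availability
  replicas: 2
  resources:
    requests:
      cpu: 100m
      memory: 256Mi
    limits:
      cpu: 200m
      memory: 512Mi
```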
Extension Mechanisms
- Custom Controllers: Build controllers for custom workload types
- Mutation Webhooks: Modify pod specs before creation
- Validation Webhooks: Validate workload configurations
Common Pitfalls and How to Avoid Them
Misconfigurations
- Version Mismatch: Controller and Kubernetes version incompatibility
- How to Avoid: Check compatibility matrix, upgrade together, test in staging
- Resource Conflicts: Multiple controllers managing same pods
- How to Avoid: Use distinct workload types, separate namespaces, label management
Performance Issues
- Performance Overhead: Additional controllers impacting cluster performance
- How to Avoid: Tune controller replicas, optimize requeues, monitor metrics
- StatefulSet Partition: Incorrect partition configuration
- How to Avoid: Understand partition semantics, test upgrade order, document strategy
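To make the partition semantics concrete, here is a sketch of an Advanced StatefulSet that updates only part of its pods. With `replicas: 5` and `partition: 3`, ordered updates apply to pods with ordinal 3 and above (`sample-db-3`, `sample-db-4`), while ordinals 0 to 2 stay on the old revision. The names and image are hypothetical, and a headless Service named `sample-db` is assumed to exist.

```yaml
# Advanced StatefulSet sketch: staged update controlled by partition
apiVersion: apps.kruise.io/v1beta1
kind: StatefulSet
metadata:
  name: sample-db
  namespace: default
spec:
  serviceName: sample-db   # assumed headless Service, not shown here
  replicas: 5
  selector:
    matchLabels:
      app: sample-db
  template:
    metadata:
      labels:
        app: sample-db
    spec:
      containers:
        - name: db
          image: redis:7.2
          ports:
            - containerPort: 6379
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 250m
              memory: 256Mi
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      # Only ordinals >= 3 are updated; lower the value to continue the rollout
      partition: 3
      # Allow up to two pods to be unavailable at once during the update
      maxUnavailable: 2
```

Lowering `partition` step by step (3, then 1, then 0) is one way to document and rehearse the upgrade order in staging.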
Operational Challenges
- Migration Complexity: Migrating existing workloads to OpenKruise
- How to Avoid: Gradual migration, test one workload, document changes
- Cluster Scaling: Controller not scaling with cluster
- How to Avoid: Vertical pod autoscaler, monitor resource usage, scale replicas
Security Pitfalls
- Security Context: Insufficient RBAC for controllers
- How to Avoid: Define least-privilege RBAC, audit permissions, test in namespace
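The controller's own RBAC is normally installed with the chart, but the same least-privilege idea applies to whoever manages OpenKruise workloads. The sketch below grants a hypothetical `deployer` service account rights over CloneSets in a single `staging` namespace only; the names and verb list are assumptions to adapt.

```yaml
# Namespaced Role limited to CloneSets (names and namespace are illustrative)
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: cloneset-editor
  namespace: staging
rules:
  - apiGroups: ["apps.kruise.io"]
    resources: ["clonesets"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
  - apiGroups: ["apps.kruise.io"]
    resources: ["clonesets/status"]
    verbs: ["get"]
---
# Bind the Role to a single service account in the same namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: cloneset-editor-binding
  namespace: staging
subjects:
  - kind: ServiceAccount
    name: deployer
    namespace: staging
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: cloneset-editor
```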
Coding Practices
Idiomatic Configuration
- Declarative Workloads: Define workload specs in YAML manifests
- Update Policies: Configure update strategies and batch sizes
- Health Checks: Define proper readiness and liveness probes
API Usage Patterns
- kubectl apply: Apply custom workload definitions
- kubectl describe: Inspect workload status and update history
- kubectl logs: Check controller logs for issues
Observability Best Practices
- Controller Metrics: Monitor reconciliation duration and error rates
- Pod Status Metrics: Track pod health and update status
- Webhook Performance: Monitor webhook latency and success rates
Testing Strategies
- Unit Tests: Test controller reconciliation logic
- Integration Tests: Test workload updates in Kubernetes
- Stress Tests: Validate performance under load
Development Workflow
- Local Development: Use kind for local testing
- Debug Commands: Use kubectl describe and logs
- Test Environment: Set up dedicated test cluster
- CI/CD Integration: Automate testing with GitHub Actions
- Monitoring Setup: Configure Prometheus and Grafana
- Documentation: Maintain comprehensive docs
Fundamentals
Essential Concepts
- CloneSet: Controller for stateless workload management
- Advanced StatefulSet: Enhanced StatefulSet with advanced features
- SidecarSet: Sidecar injection and management
- BroadcastJob: Job that runs on all matching nodes
- AdvancedDeployment: Enhanced deployment with advanced strategies
- InjectPod: Pod injection mechanism
- Reconciler: Controller reconciliation loop
- UpdateStrategy: Configuration for update behavior
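- Partition: Number of pods kept on the old revision during an update, which bounds how far a rollout proceeds
- MinReadySeconds: Minimum time a new pod must stay ready before it is considered available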
Terminology Glossary
- MaxUnavailable: Maximum pods unavailable during update
- MaxSurge: Maximum pods above desired count
- UpdateOrder: Order of pod updates (NewFirst/OldFirst)
- Partition: Number of pods to skip during update
- InPlaceUpdate: In-place pod updates without recreation
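The glossary terms above map onto fields of a CloneSet spec roughly as in the fragment below. This is a partial spec rather than a complete manifest (selector and template are omitted), and the numbers are illustrative: with 10 replicas and `partition: 8`, eight pods stay on the old revision and two are updated.

```yaml
# Fragment of a CloneSet spec (not a complete manifest; values are illustrative)
spec:
  replicas: 10
  # A new pod must stay ready this long before it counts as available
  minReadySeconds: 10
  updateStrategy:
    # InPlaceUpdate: patch the running pod instead of recreating it when possible
    type: InPlaceIfPossible
    # Partition: keep 8 pods on the old revision, so only 2 are updated
    partition: 8
    # MaxUnavailable: at most 1 pod may be unavailable during the update
    maxUnavailable: 1
    # MaxSurge: at most 1 extra pod may be created above the desired count
    maxSurge: 1
```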
Data Models and Types
- CloneSetSpec: Desired state for CloneSet
- UpdateStrategy: Update configuration
- SidecarSetSpec: Sidecar configuration
- BroadcastJobSpec: Batch job configuration
Lifecycle Management
- Workload Creation: CR created → Controller reconciles → Pods created → Ready
- Update Initiation: Spec changed → Controller detects → Update starts
- Batch Update: Select batch → Update pods → Health checks → Next batch
- Pod Recreation: Old pod deleted → New pod created → Health checks
- Update Completion: All pods updated → Update complete → Status updated
State Management
- Update Phase: Pending, Updating, Completed
- Pod Status: Ready, Updating, Failed
- Controller Phase: Reconciling, Waiting, Completed
- Partition Status: Current partition index
Scaling and Deployment Patterns
Horizontal Scaling
- Horizontal Pod Scaling: Scale pods based on load
- Controller Scaling: Scale controller replicas
- Cluster Scaling: Add nodes for more capacity
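Horizontal pod scaling for an OpenKruise workload can be sketched with a standard HorizontalPodAutoscaler that targets a CloneSet. This assumes a hypothetical CloneSet named `sample-web`, that the CloneSet CRD exposes the scale subresource in your OpenKruise version, and that a metrics source such as metrics-server is installed.

```yaml
# HPA sketch targeting a CloneSet (target name and thresholds are illustrative)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: sample-web
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps.kruise.io/v1alpha1
    kind: CloneSet
    name: sample-web
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          # Scale out when average CPU utilization exceeds 70% of requests
          type: Utilization
          averageUtilization: 70
```

CPU utilization is measured against container requests, so the target workload needs CPU requests set for this to work.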
High Availability
- Controller HA: Multiple controller replicas
- Pod Spread: Spread pods across nodes
- Graceful Degradation: Continue serving during updates
- Update Rollback: Rollback to previous version if issues
Production Deployments
- Installation: Deploy using official Helm chart
- RBAC Setup: Configure appropriate cluster roles
- Webhook Configuration: Set up admission webhooks
- Monitoring Setup: Configure Prometheus metrics
- Security Hardening: Enable network policies and Pod Security Admission (PodSecurityPolicy was removed in Kubernetes 1.25)
- Backup Strategy: Backup CRD definitions
- Resource Quotas: Set namespace limits
- Logging Setup: Configure centralized logging
Upgrade Strategies
- Chart Upgrade: Upgrade Helm chart to new version
- CRD Migration: Update CRD definitions
- Controller Restart: Rolling restart of controllers
- Test Workloads: Verify existing workloads function
Resource Management
- CPU Resources: Set controller CPU requests
- Memory Resources: Configure memory limits
- Storage Resources: Configure etcd storage
- Network Resources: Configure webhook network policies
Additional Resources
- Official Documentation: https://openkruise.io/docs/
- GitHub Repository: https://github.com/openkruise/kruise
- CNCF Project Page: https://www.cncf.io/projects/openkruise/
- Community: Check the official documentation for community channels
- Versioning: Refer to project's release notes for version-specific features
Troubleshooting
Common Issues
Deployment Failures
- Check pod logs for errors
- Verify configuration values
- Ensure network connectivity
Performance Issues
- Monitor resource usage
- Adjust resource limits
- Check for bottlenecks
Configuration Errors
- Validate YAML syntax
- Check required fields
- Verify environment-specific settings
Integration Problems
- Verify API compatibility
- Check dependency versions
- Review integration documentation
Getting Help
- Check official documentation
- Search GitHub issues
- Join community channels
- Review logs and metrics
Content generated automatically. Verify against official documentation before production use.
Examples
Basic Configuration
# Basic configuration example
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{project_name}}-config
  namespace: default
data:
  # Configuration goes here
  config.yaml: |
    # Base configuration
    # Add your settings here
Kubernetes Deployment
# Kubernetes deployment for {{project_name}}
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{project_name}}
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: {{project_name}}
  template:
    metadata:
      labels:
        app: {{project_name}}
    spec:
      containers:
        - name: {{project_name}}
          image: {{project_name}}:latest
          ports:
            - containerPort: 8080
          resources:
            # Requests are illustrative values; size them for your workload
            requests:
              memory: "64Mi"
              cpu: "250m"
            limits:
              memory: "128Mi"
              cpu: "500m"
Kubernetes Service
# Kubernetes service for {{project_name}}
apiVersion: v1
kind: Service
metadata:
  name: {{project_name}}
  namespace: default
spec:
  selector:
    app: {{project_name}}
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
  type: ClusterIP
When to Use
Use this skill when:
- Integrating a CNCF project into Kubernetes infrastructure — You need to configure, deploy, or troubleshoot a cloud-native tool within a cluster
- Designing cloud-native architecture — You are selecting and integrating CNCF tools to solve specific infrastructure challenges
- Resolving operational issues — A CNCF component is misbehaving, underperforming, or needs configuration changes
Core Workflow
1. Assess Requirements: Understand the use case, scale, integration needs, and existing infrastructure. Checkpoint: Document requirements, constraints, and success criteria.
2. Design Architecture: Plan component interactions, data flow, and deployment strategy using cloud-native best practices. Checkpoint: Verify the architecture addresses all requirements and follows CNCF conventions.
3. Implement & Configure: Create manifests, configurations, and deployment scripts. Include resource limits, health checks, and observability hooks. Checkpoint: Validate all YAML against schema and test in a staging environment.
4. Deploy & Monitor: Apply manifests to the cluster, verify component health, and confirm observability is working. Checkpoint: Confirm all pods/services are running, probes passing, and metrics/alerts configured.
Constraints
MUST DO
- Include at least one complete working YAML manifest example
- Note when content is auto-generated vs. manually verified
- Reference relevant CNCF project documentation
MUST NOT DO
- Deploy manifests without testing in a staging environment first
- Use deprecated API versions (e.g., apps/v1beta1)
- Omit resource limits and requests in Kubernetes manifests