openkruise

name: openkruise
compatibility: opencode
completeness: 95
content-types:
- guidance
- examples
- do-dont
- config
description: "OpenKruise: extended Kubernetes workload management with advanced deployment strategies"
license: MIT
maturity: stable
metadata:
  domain: cncf
  output-format: manifests
  role: reference
  scope: infrastructure
  triggers: container orchestration, extended, k8s, openkruise, workload, kubernetes
  archetypes:
  - educational
  - strategic
  anti_triggers:
  - brainstorming
  - vague ideation
  - non-containerized architecture
  response_profile:
    verbosity: medium
    directive_strength: low
    abstraction_level: strategic
  version: "1.0.0"

related-skills: cncf-argo, cncf-artifact-hub, cncf-aws-dynamodb, cncf-aws-ec2

OpenKruise in Cloud-Native Engineering

Category: Scheduling & Orchestration
Status: Active
Stars: 2,700
Last Updated: 2026-04-22
Primary Language: Go
Description: Extended Kubernetes workload management with advanced deployment strategies

Purpose and Use Cases

OpenKruise is a CNCF project that extends Kubernetes with additional workload controllers and advanced deployment strategies.

What Problem Does It Solve?

OpenKruise addresses the challenge of advanced workload management beyond Kubernetes native controllers. It provides enhanced deployment strategies, advanced lifecycle management, and workload orchestration for complex applications.

When to Use This Project

Use OpenKruise when managing complex deployments with rolling updates, stateful applications, or large-scale cluster operations, particularly when you need faster rollouts with less downtime, granular control over update order, and advanced pod management. It is not ideal for simple deployments that the native Kubernetes controllers already handle well.

Key Use Cases

Advanced Deployment Strategies with CBO
StatefulSet Enhanced with Partition Management
Large-Scale Cluster Management
Pre-Batch Pod Creation for Fast Scaling
Custom Workload Management with sidecar injection

Architecture Design Patterns

Core Components

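CloneSet Controller: Manages stateless applications with enhanced scaling and updates, including in-place updates
Advanced StatefulSet Controller: Extends the native StatefulSet with in-place updates, partitioning, and maxUnavailable control
Advanced DaemonSet Controller: Extends the native DaemonSet with partitioned and pausable rolling updates
Other Workload Controllers: BroadcastJob, AdvancedCronJob, and UnitedDeployment cover job-style and multi-domain workloads with their own scheduling needs
SidecarSet Controller: Injects sidecar containers into matching pods and updates them independently of the application containers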
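
As an illustration of the first component, a minimal CloneSet might look like the sketch below. The resource name, labels, and image are hypothetical; the structure mirrors a native Deployment, with the API group changed to apps.kruise.io.

```yaml
# Minimal CloneSet sketch; "sample-web" and the nginx image are placeholders.
apiVersion: apps.kruise.io/v1alpha1
kind: CloneSet
metadata:
  name: sample-web
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: sample-web
  template:
    metadata:
      labels:
        app: sample-web
    spec:
      containers:
      - name: web
        image: nginx:1.25
        ports:
        - containerPort: 80
```

Because the pod template is a regular Kubernetes pod template, an existing Deployment spec can usually be carried over with few changes.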

Component Interactions

Workload → CloneSet: CloneSet manages pod lifecycle with enhanced update strategies
SidecarSet → Pods: SidecarSet injects sidecars and updates them independently
BroadcastJob → Nodes: BroadcastJob schedules jobs across all matching nodes
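
The BroadcastJob interaction can be sketched as follows. This is an illustrative manifest, not a recommended configuration: the job name, node selector, image, and command are assumptions chosen for the example.

```yaml
# BroadcastJob sketch: runs one pod to completion on every matching node.
apiVersion: apps.kruise.io/v1alpha1
kind: BroadcastJob
metadata:
  name: node-check              # hypothetical name
spec:
  template:
    spec:
      nodeSelector:
        kubernetes.io/os: linux # restricts which nodes "match"
      containers:
      - name: check
        image: busybox:1.36
        command: ["sh", "-c", "uname -a"]
      restartPolicy: Never
  completionPolicy:
    type: Always                # the job finishes once all pods complete
    ttlSecondsAfterFinished: 300
```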

Data Flow Patterns

Pod Creation: Workload creation → Controller creates pods → Health checks → Ready
Update Orchestration: New version → Partition update → Select pods → Update sequentially
Sidecar Injection: Pod created → SidecarSet matches → Sidecar injected → Pod ready
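
The sidecar injection flow above can be illustrated with a SidecarSet such as the one below. The names, image, and command are placeholders; pods labelled `app: sample-web` that are created after this object exists would receive the extra container.

```yaml
# SidecarSet sketch; SidecarSet is cluster-scoped, so no namespace is set.
apiVersion: apps.kruise.io/v1alpha1
kind: SidecarSet
metadata:
  name: log-sidecar             # hypothetical name
spec:
  selector:
    matchLabels:
      app: sample-web           # pods to inject into
  updateStrategy:
    type: RollingUpdate
    maxUnavailable: 1           # update sidecars one pod at a time
  containers:
  - name: log-agent
    image: busybox:1.36
    command: ["sh", "-c", "while true; do sleep 3600; done"]
    volumeMounts:
    - name: app-logs
      mountPath: /var/log/app
  volumes:                      # merged into the pod's volumes
  - name: app-logs
    emptyDir: {}
```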

Design Principles

Backward Compatible: Extends Kubernetes APIs without breaking changes
Incremental Rollout: Supports canary and phased deployments
Graceful Handling: Handles failures and rollback automatically
Resource Efficient: Optimizes pod creation and update sequences

Integration Approaches

Integration with Other CNCF Projects

Kubernetes: Core platform with extended controllers
Helm: Installation and upgrade management
Prometheus: Metrics collection for workloads
Istio: Service mesh integration for traffic management

API Patterns

Custom Resources: CloneSet, Advanced StatefulSet, and SidecarSet APIs in the apps.kruise.io group
Webhooks: Admission webhooks for validation and mutation (for example, sidecar injection)
Kubernetes Controllers: Reconciliation loops for desired state

Configuration Patterns

CRD YAML: Define workload specifications in YAML
Helm Values: Configure OpenKruise settings
Admission Webhook Config: Configure webhook behavior

Extension Mechanisms

Custom Controllers: Build controllers for custom workload types
Mutation Webhooks: Modify pod specs before creation
Validation Webhooks: Validate workload configurations

Common Pitfalls and How to Avoid Them

Misconfigurations

Version Mismatch: Controller and Kubernetes version incompatibility
- How to Avoid: Check compatibility matrix, upgrade together, test in staging
Resource Conflicts: Multiple controllers managing same pods
- How to Avoid: Use distinct workload types, separate namespaces, label management

Performance Issues

Performance Overhead: Additional controllers impacting cluster performance
- How to Avoid: Tune controller replicas, optimize requeues, monitor metrics
StatefulSet Partition: Incorrect partition configuration
- How to Avoid: Understand partition semantics, test upgrade order, document strategy
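
To make the partition semantics concrete, here is a sketch of an Advanced StatefulSet with a partition set. All names and the image are hypothetical, and a headless Service named `sample-db` is assumed to exist. With ordered updates, pods whose ordinal is greater than or equal to the partition are updated, so with 5 replicas and `partition: 3` only `sample-db-3` and `sample-db-4` move to the new revision.

```yaml
# Advanced StatefulSet sketch with a partitioned, in-place rolling update.
apiVersion: apps.kruise.io/v1beta1
kind: StatefulSet
metadata:
  name: sample-db
spec:
  serviceName: sample-db        # assumed headless Service
  replicas: 5
  selector:
    matchLabels:
      app: sample-db
  template:
    metadata:
      labels:
        app: sample-db
    spec:
      readinessGates:
      - conditionType: InPlaceUpdateReady   # used by in-place updates
      containers:
      - name: db
        image: redis:7.2
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      podUpdatePolicy: InPlaceIfPossible
      maxUnavailable: 1
      partition: 3              # ordinals 0-2 stay on the old revision
```

Lowering the partition step by step (3, then 1, then 0) is one way to stage the rollout and test the upgrade order before it reaches every pod.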

Operational Challenges

Migration Complexity: Migrating existing workloads to OpenKruise
- How to Avoid: Gradual migration, test one workload, document changes
Cluster Scaling: Controller not scaling with cluster
- How to Avoid: Vertical pod autoscaler, monitor resource usage, scale replicas

Security Pitfalls

Security Context: Insufficient RBAC for controllers
- How to Avoid: Define least-privilege RBAC, audit permissions, test in namespace
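
As one possible least-privilege setup, the sketch below grants a service account the ability to manage CloneSets in a single namespace and nothing else. The namespace `staging` and the service account `deployer` are assumptions; the controller's own permissions are typically installed with the Helm chart and are not shown here.

```yaml
# Namespace-scoped Role for managing CloneSets only.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: cloneset-operator
  namespace: staging
rules:
- apiGroups: ["apps.kruise.io"]
  resources: ["clonesets"]
  verbs: ["get", "list", "watch", "create", "update", "patch"]
- apiGroups: ["apps.kruise.io"]
  resources: ["clonesets/status"]
  verbs: ["get"]
---
# Bind the Role to a hypothetical CI service account.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: cloneset-operator
  namespace: staging
subjects:
- kind: ServiceAccount
  name: deployer
  namespace: staging
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: cloneset-operator
```

Permissions can be checked with `kubectl auth can-i` using the `--as` flag before the account is used in a pipeline.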

Coding Practices

Idiomatic Configuration

Declarative Workloads: Define workload specs in YAML manifests
Update Policies: Configure update strategies and batch sizes
Health Checks: Define proper readiness and liveness probes
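
The three practices above can be combined in a single manifest. The following is a sketch under assumed values: the name, image, batch sizes, and probe timings are illustrative and should be tuned per workload.

```yaml
# CloneSet sketch with an explicit update policy and health checks.
apiVersion: apps.kruise.io/v1alpha1
kind: CloneSet
metadata:
  name: sample-web
spec:
  replicas: 10
  minReadySeconds: 10           # pod must stay ready this long to count as available
  selector:
    matchLabels:
      app: sample-web
  updateStrategy:
    type: InPlaceIfPossible     # fall back to recreate when in-place is not possible
    maxUnavailable: 20%
    maxSurge: 1
    partition: 8                # keep 8 pods on the old revision, update 2
  template:
    metadata:
      labels:
        app: sample-web
    spec:
      containers:
      - name: web
        image: nginx:1.25
        readinessProbe:
          httpGet:
            path: /
            port: 80
          periodSeconds: 5
        livenessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 10
          periodSeconds: 10
```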

API Usage Patterns

kubectl apply: Apply custom workload definitions
kubectl describe: Inspect workload status and update history
kubectl logs: Check controller logs for issues

Observability Best Practices

Controller Metrics: Monitor reconciliation duration and error rates
Pod Status Metrics: Track pod health and update status
Webhook Performance: Monitor webhook latency and success rates
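
A possible starting point for these checks is a Prometheus rules file like the one below. It assumes the controller exposes the standard controller-runtime metrics and that Prometheus already scrapes them; the alert names and thresholds are placeholders, not recommended values.

```yaml
# Prometheus alerting rules sketch based on controller-runtime metric names.
groups:
- name: kruise-controller
  rules:
  - alert: KruiseReconcileErrors
    expr: sum by (controller) (rate(controller_runtime_reconcile_errors_total[5m])) > 0
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "Controller {{ $labels.controller }} is reporting reconcile errors"
  - alert: KruiseReconcileSlow
    expr: |
      histogram_quantile(0.99,
        sum by (controller, le) (rate(controller_runtime_reconcile_time_seconds_bucket[5m]))) > 5
    for: 10m
    labels:
      severity: warning
  - alert: KruiseWebhookSlow
    expr: |
      histogram_quantile(0.99,
        sum by (webhook, le) (rate(controller_runtime_webhook_latency_seconds_bucket[5m]))) > 1
    for: 10m
    labels:
      severity: warning
```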

Testing Strategies

Unit Tests: Test controller reconciliation logic
Integration Tests: Test workload updates in Kubernetes
Stress Tests: Validate performance under load

Development Workflow

Local Development: Use kind for local testing
Debug Commands: Use kubectl describe and logs
Test Environment: Set up dedicated test cluster
CI/CD Integration: Automate testing with GitHub Actions
Monitoring Setup: Configure Prometheus and Grafana
Documentation: Maintain comprehensive docs
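
For the local development step, a small multi-node kind cluster is often enough to exercise node-oriented controllers such as Advanced DaemonSet and BroadcastJob. The cluster name below is an arbitrary choice.

```yaml
# kind cluster config sketch: one control plane and two workers.
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: kruise-dev
nodes:
- role: control-plane
- role: worker
- role: worker
```

Saved as `kind-config.yaml`, it can be used with `kind create cluster --config kind-config.yaml`.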

Fundamentals

Essential Concepts

CloneSet: Controller for stateless workload management
Advanced StatefulSet: Enhanced StatefulSet (kind StatefulSet in the apps.kruise.io API group) with advanced features
SidecarSet: Sidecar injection and management
BroadcastJob: Job that runs on all matching nodes
AdvancedDeployment: Enhanced deployment with advanced strategies
Sidecar injection: Mutating webhook that adds SidecarSet containers to matching pods at creation time
Reconciler: Controller reconciliation loop
UpdateStrategy: Configuration for update behavior
Partition: Limits how far an update proceeds by keeping a set number of pods on the old revision
MinReadySeconds: Minimum time a pod must stay ready before it counts as available

Terminology Glossary

MaxUnavailable: Maximum pods unavailable during update
MaxSurge: Maximum pods above desired count
Update order: Order in which pods are updated; for CloneSet this is controlled through priorityStrategy and scatterStrategy
Partition: Number of pods to skip during update
InPlaceUpdate: In-place pod updates without recreation

Data Models and Types

CloneSetSpec: Desired state for CloneSet
UpdateStrategy: Update configuration
SidecarSetSpec: Sidecar configuration
BroadcastJobSpec: Batch job configuration

Lifecycle Management

Workload Creation: CR created → Controller reconciles → Pods created → Ready
Update Initiation: Spec changed → Controller detects → Update starts
Batch Update: Select batch → Update pods → Health checks → Next batch
Pod Recreation: Old pod deleted → New pod created → Health checks
Update Completion: All pods updated → Update complete → Status updated
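
The batch update flow can also be sketched for node agents using an Advanced DaemonSet. The name, image, and numbers are illustrative assumptions; here the partition is the number of pods that should remain on the old revision while the rest are updated.

```yaml
# Advanced DaemonSet sketch with a partitioned rolling update.
apiVersion: apps.kruise.io/v1alpha1
kind: DaemonSet
metadata:
  name: node-agent              # hypothetical name
spec:
  selector:
    matchLabels:
      app: node-agent
  template:
    metadata:
      labels:
        app: node-agent
    spec:
      containers:
      - name: agent
        image: busybox:1.36
        command: ["sh", "-c", "while true; do sleep 3600; done"]
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      rollingUpdateType: Standard
      maxUnavailable: 1         # one node's pod is replaced at a time
      partition: 10             # keep 10 pods on the old revision for now
```

Reducing `partition` toward 0 releases the next batch once the updated pods pass their health checks.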

State Management

Update Phase: Pending, Updating, Completed
Pod Status: Ready, Updating, Failed
Controller Phase: Reconciling, Waiting, Completed
Partition Status: Current partition index

Scaling and Deployment Patterns

Horizontal Scaling

Horizontal Pod Scaling: Scale pods based on load
Controller Scaling: Scale controller replicas
Cluster Scaling: Add nodes for more capacity

High Availability

Controller HA: Multiple controller replicas
Pod Spread: Spread pods across nodes
Graceful Degradation: Continue serving during updates
Update Rollback: Rollback to previous version if issues
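
Pod spreading relies on standard pod template fields, so it can be expressed directly in a CloneSet. The sketch below uses hypothetical names and a soft constraint; whether a hard constraint (`DoNotSchedule`) is preferable depends on cluster capacity.

```yaml
# CloneSet sketch that spreads replicas across nodes.
apiVersion: apps.kruise.io/v1alpha1
kind: CloneSet
metadata:
  name: sample-web
spec:
  replicas: 6
  selector:
    matchLabels:
      app: sample-web
  template:
    metadata:
      labels:
        app: sample-web
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: ScheduleAnyway   # prefer, but do not require, even spread
        labelSelector:
          matchLabels:
            app: sample-web
      containers:
      - name: web
        image: nginx:1.25
```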

Production Deployments

Installation: Deploy using official Helm chart
RBAC Setup: Configure appropriate cluster roles
Webhook Configuration: Set up admission webhooks
Monitoring Setup: Configure Prometheus metrics
Security Hardening: Enable network policies and Pod Security Admission (PodSecurityPolicy was removed in Kubernetes 1.25)
Backup Strategy: Backup CRD definitions
Resource Quotas: Set namespace limits
Logging Setup: Configure centralized logging

Upgrade Strategies

Chart Upgrade: Upgrade Helm chart to new version
CRD Migration: Update CRD definitions
Controller Restart: Rolling restart of controllers
Test Workloads: Verify existing workloads function

Resource Management

CPU Resources: Set controller CPU requests
Memory Resources: Configure memory limits
Storage Resources: Configure etcd storage
Network Resources: Configure webhook network policies
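
Controller CPU and memory are usually set through Helm values at install or upgrade time. The fragment below is a sketch only: the key names (`manager.replicas`, `manager.resources`) are assumptions about the chart layout and should be checked against the `values.yaml` of the chart version in use, and the numbers are placeholders.

```yaml
# Helm values sketch for the controller manager (key names assumed).
manager:
  replicas: 2                   # more than one replica for controller HA
  resources:
    requests:
      cpu: 100m
      memory: 256Mi
    limits:
      cpu: 500m
      memory: 1Gi
```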

Additional Resources

Official Documentation: https://openkruise.io/docs/
GitHub Repository: https://github.com/openkruise/kruise
CNCF Project Page: cncf.io/projects/cncf-openkruise/
Community: Check the official documentation for community channels
Versioning: Refer to project's release notes for version-specific features

Troubleshooting

Common Issues

Deployment Failures
- Check pod logs for errors
- Verify configuration values
- Ensure network connectivity
Performance Issues
- Monitor resource usage
- Adjust resource limits
- Check for bottlenecks
Configuration Errors
- Validate YAML syntax
- Check required fields
- Verify environment-specific settings
Integration Problems
- Verify API compatibility
- Check dependency versions
- Review integration documentation

Getting Help

Check official documentation
Search GitHub issues
Join community channels
Review logs and metrics

Content generated automatically. Verify against official documentation before production use.

Examples

Basic Configuration

# Basic configuration example
# "sample-app" is a placeholder name; replace it with your own.
apiVersion: v1
kind: ConfigMap
metadata:
  name: sample-app-config
  namespace: default
data:
  # Configuration goes here
  config.yaml: |
    # Base configuration
    # Add your settings here

Kubernetes Deployment

# Kubernetes deployment for a sample application
# "sample-app" and its image are placeholders; replace them with your own.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-app
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sample-app
  template:
    metadata:
      labels:
        app: sample-app
    spec:
      containers:
      - name: sample-app
        # Pin a specific tag rather than relying on :latest
        image: sample-app:1.0.0
        ports:
        - containerPort: 8080
        resources:
          limits:
            memory: "128Mi"
            cpu: "500m"

Kubernetes Service

# Kubernetes service for a sample application
# "sample-app" is a placeholder; the selector must match the pod labels.
apiVersion: v1
kind: Service
metadata:
  name: sample-app
  namespace: default
spec:
  selector:
    app: sample-app
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
  type: ClusterIP

When to Use

Use this skill when:

Integrating a CNCF project into Kubernetes infrastructure — You need to configure, deploy, or troubleshoot a cloud-native tool within a cluster
Designing cloud-native architecture — You are selecting and integrating CNCF tools to solve specific infrastructure challenges
Resolving operational issues — A CNCF component is misbehaving, underperforming, or needs configuration changes

Core Workflow

Assess Requirements — Understand the use case, scale, integration needs, and existing infrastructure. Checkpoint: Document requirements, constraints, and success criteria.
Design Architecture — Plan component interactions, data flow, and deployment strategy using cloud-native best practices. Checkpoint: Verify the architecture addresses all requirements and follows CNCF conventions.
Implement & Configure — Create manifests, configurations, and deployment scripts. Include resource limits, health checks, and observability hooks. Checkpoint: Validate all YAML against schema and test in a staging environment.
Deploy & Monitor — Apply manifests to the cluster, verify component health, and confirm observability is working. Checkpoint: Confirm all pods/services are running, probes passing, and metrics/alerts configured.