openkruise

star 4

"OpenKruise in Extended Kubernetes workload management with advanced deployment" strategies

paulpas By paulpas schedule Updated 6/4/2026

name: openkruise compatibility: opencode completeness: 95 content-types:

  • guidance
  • examples
  • do-dont
  • config description: '"OpenKruise in Extended Kubernetes workload management with advanced deployment" strategies' license: MIT maturity: stable metadata: domain: cncf output-format: manifests role: reference scope: infrastructure triggers: container orchestration, extended, k8s, openkruise, workload, kubernetes archetypes:
    • educational
    • strategic anti_triggers:
    • brainstorming
    • vague ideation
    • non-containerized architecture response_profile: verbosity: medium directive_strength: low abstraction_level: strategic version: "1.0.0"

related-skills: cncf-argo, cncf-artifact-hub, cncf-aws-dynamodb, cncf-aws-ec2

OpenKruise in Cloud-Native Engineering

Category: Scheduling & Orchestration
Status: Active
Stars: 2,700
Last Updated: 2026-04-22
Primary Language: Go
Documentation: Extended Kubernetes workload management with advanced deployment strategies


Purpose and Use Cases

OpenKruise is a core component of the cloud-native ecosystem, serving as deployment strategies

What Problem Does It Solve?

OpenKruise addresses the challenge of advanced workload management beyond Kubernetes native controllers. It provides enhanced deployment strategies, advanced lifecycle management, and workload orchestration for complex applications.

When to Use This Project

Use OpenKruise when managing complex deployments with rolling updates, stateful applications, or large-scale cluster operations. Not ideal for simple deployments or when faster rollouts with less downtime, granular control over update order, and advanced pod management.

Key Use Cases

  • Advanced Deployment Strategies with CBO
  • StatefulSet Enhanced with Partition Management
  • Large-Scale Cluster Management
  • Pre-Batch Pod Creation for Fast Scaling
  • Custom Workload Management with sidecar injection

Architecture Design Patterns

Core Components

  • CloneSet Controller: Manages stateless applications with enhanced scaling and updates
  • StatefulSet Controller: Enhances native StatefulSet with advanced partitioning and updates
  • DaemonSet Controller: Provides advanced daemon management with batch updates
  • Advanced Workloads Controller: Manages custom workload types with unique scheduling needs
  • SidecarSet Controller: Manages sidecar injection and updates independently

Component Interactions

  1. Workload → CloneSet: CloneSet manages pod lifecycle with enhanced update strategies
  2. SidecarSet → Pods: SidecarSet injects sidecars and updates them independently
  3. BroadcastJob → Nodes: BroadcastJob schedules jobs across all matching nodes

Data Flow Patterns

  1. Pod Creation: Workload creation → Controller creates pods → Health checks → Ready
  2. Update Orchestration: New version → Partition update → Select pods → Update sequentially
  3. Sidecar Injection: Pod created → SidecarSet matches → Sidecar injected → Pod ready

Design Principles

  • Backward Compatible: Extends Kubernetes APIs without breaking changes
  • Incremental Rollout: Supports canary and phased deployments
  • Graceful Handling: Handles failures and rollback automatically
  • Resource Efficient: Optimizes pod creation and update sequences

Integration Approaches

Integration with Other CNCF Projects

  • Kubernetes: Core platform with extended controllers
  • Helm: Installation and upgrade management
  • Prometheus: Metrics collection for workloads
  • Istio: Service mesh integration for traffic management

API Patterns

  • Custom Resources: CloneSet, StatefulSetEnhanced, SidecarSet APIs
  • Webhooks: Admission webhooks for validation
  • Kubernetes Controllers: Reconciliation loops for desired state

Configuration Patterns

  • CRD YAML: Define workload specifications in YAML
  • Helm Values: Configure OpenKruise settings
  • Admission Webhook Config: Configure webhook behavior

Extension Mechanisms

  • Custom Controllers: Build controllers for custom workload types
  • Mutation Webhooks: Modify pod specs before creation
  • Validation Webhooks: Validate workload configurations

Common Pitfalls and How to Avoid Them

Misconfigurations

  • Version Mismatch: Controller and Kubernetes version incompatibility
    • How to Avoid: Check compatibility matrix, upgrade together, test in staging
  • Resource Conflicts: Multiple controllers managing same pods
    • How to Avoid: Use distinct workload types, separate namespaces, label management

Performance Issues

  • Performance Overhead: Additional controllers impacting cluster performance
    • How to Avoid: Tune controller replicas, optimize requeues, monitor metrics
  • StatefulSet Partition: Incorrect partition configuration
    • How to Avoid: Understand partition semantics, test upgrade order, document strategy

Operational Challenges

  • Migration Complexity: Migrating existing workloads to OpenKruise
    • How to Avoid: Gradual migration, test one workload, document changes
  • Cluster Scaling: Controller not scaling with cluster
    • How to Avoid: Vertical pod autoscaler, monitor resource usage, scale replicas

Security Pitfalls

  • Security Context: Insufficient RBAC for controllers
    • How to Avoid: Define least-privilege RBAC, audit permissions, test in namespace

Coding Practices

Idiomatic Configuration

  • Declarative Workloads: Define workload specs in YAML manifests
  • Update Policies: Configure update strategies and batch sizes
  • Health Checks: Define proper readiness and liveness probes

API Usage Patterns

  • kubectl apply: Apply custom workload definitions
  • kubectl describe: Inspect workload status and update history
  • kubectl logs: Check controller logs for issues

Observability Best Practices

  • Controller Metrics: Monitor reconciliation duration and error rates
  • Pod Status Metrics: Track pod health and update status
  • Webhook Performance: Monitor webhook latency and success rates

Testing Strategies

  • Unit Tests: Test controller reconciliation logic
  • Integration Tests: Test workload updates in Kubernetes
  • Stress Tests: Validate performance under load

Development Workflow

  • Local Development: Use kind for local testing
  • Debug Commands: Use kubectl describe and logs
  • Test Environment: Set up dedicated test cluster
  • CI/CD Integration: Automate testing with GitHub Actions
  • Monitoring Setup: Configure Prometheus and Grafana
  • Documentation: Maintain comprehensive docs

Fundamentals

Essential Concepts

  • CloneSet: Controller for stateless workload management
  • StatefulSetEnhanced: Enhanced StatefulSet with advanced features
  • SidecarSet: Sidecar injection and management
  • BroadcastJob: Job that runs on all matching nodes
  • AdvancedDeployment: Enhanced deployment with advanced strategies
  • InjectPod: Pod injection mechanism
  • Reconciler: Controller reconciliation loop
  • UpdateStrategy: Configuration for update behavior
  • Partition: Control update batch size
  • ReadySeconds: Wait period before considering pod healthy

Terminology Glossary

  • MaxUnavailable: Maximum pods unavailable during update
  • MaxSurge: Maximum pods above desired count
  • UpdateOrder: Order of pod updates (NewFirst/OldFirst)
  • Partition: Number of pods to skip during update
  • InPlaceUpdate: In-place pod updates without recreation

Data Models and Types

  • CloneSetSpec: Desired state for CloneSet
  • UpdateStrategy: Update configuration
  • SidecarSetSpec: Sidecar configuration
  • BroadcastJobSpec: Batch job configuration

Lifecycle Management

  • Workload Creation: CR created → Controller reconciles → Pods created → Ready
  • Update Initiation: Spec changed → Controller detects → Update starts
  • Batch Update: Select batch → Update pods → Health checks → Next batch
  • Pod Recreation: Old pod deleted → New pod created → Health checks
  • Update Completion: All pods updated → Update complete → Status updated

State Management

  • Update Phase: Pending, Updating, Completed
  • Pod Status: Ready, Updating, Failed
  • Controller Phase: Reconciling, Waiting, Completed
  • Partition Status: Current partition index

Scaling and Deployment Patterns

Horizontal Scaling

  • Horizontal Pod Scaling: Scale pods based on load
  • Controller Scaling: Scale controller replicas
  • Cluster Scaling: Add nodes for more capacity

High Availability

  • Controller HA: Multiple controller replicas
  • Pod Spread: Spread pods across nodes
  • Graceful Degradation: Continue serving during updates
  • Update Rollback: Rollback to previous version if issues

Production Deployments

  • Installation: Deploy using official Helm chart
  • RBAC Setup: Configure appropriate cluster roles
  • Webhook Configuration: Set up admission webhooks
  • Monitoring Setup: Configure Prometheus metrics
  • Security Hardening: Enable network policies and PodSecurityPolicies
  • Backup Strategy: Backup CRD definitions
  • Resource Quotas: Set namespace limits
  • Logging Setup: Configure centralized logging

Upgrade Strategies

  • Chart Upgrade: Upgrade Helm chart to new version
  • CRD Migration: Update CRD definitions
  • Controller Restart: Rolling restart of controllers
  • Test Workloads: Verify existing workloads function

Resource Management

  • CPU Resources: Set controller CPU requests
  • Memory Resources: Configure memory limits
  • Storage Resources: Configure etcd storage
  • Network Resources: Configure webhook network policies

Additional Resources

  • Official Documentation: https://openkruise.io/docs/
  • GitHub Repository: Check the project's official documentation for repository link
  • CNCF Project Page: cncf.io/projects/cncf-openkruise/
  • Community: Check the official documentation for community channels
  • Versioning: Refer to project's release notes for version-specific features

Troubleshooting

Common Issues

  1. Deployment Failures

    • Check pod logs for errors
    • Verify configuration values
    • Ensure network connectivity
  2. Performance Issues

    • Monitor resource usage
    • Adjust resource limits
    • Check for bottlenecks
  3. Configuration Errors

    • Validate YAML syntax
    • Check required fields
    • Verify environment-specific settings
  4. Integration Problems

    • Verify API compatibility
    • Check dependency versions
    • Review integration documentation

Getting Help

  • Check official documentation
  • Search GitHub issues
  • Join community channels
  • Review logs and metrics Content generated automatically. Verify against official documentation before production use.

Examples

Basic Configuration

# Basic configuration example
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{project_name}}-config
  namespace: default
data:
  # Configuration goes here
  config.yaml: |
    # Base configuration
    # Add your settings here

Kubernetes Deployment

# Kubernetes deployment for {{project_name}}
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{project_name}}
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: {{project_name}}
  template:
    metadata:
      labels:
        app: {{project_name}}
    spec:
      containers:
      - name: {{project_name}}
        image: {{project_name}}:latest
        ports:
        - containerPort: 8080
        resources:
          limits:
            memory: "128Mi"
            cpu: "500m"

Kubernetes Service

# Kubernetes service for {{project_name}}
apiVersion: v1
kind: Service
metadata:
  name: {{project_name}}
  namespace: default
spec:
  selector:
    app: {{project_name}}
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
  type: ClusterIP

When to Use

Use this skill when:

  • Integrating a CNCF project into Kubernetes infrastructure — You need to configure, deploy, or troubleshoot a cloud-native tool within a cluster
  • Designing cloud-native architecture — You are selecting and integrating CNCF tools to solve specific infrastructure challenges
  • Resolving operational issues — A CNCF component is misbehaving, underperforming, or needs configuration changes

Core Workflow

  1. Assess Requirements — Understand the use case, scale, integration needs, and existing infrastructure. Checkpoint: Document requirements, constraints, and success criteria.

  2. Design Architecture — Plan component interactions, data flow, and deployment strategy using cloud-native best practices. Checkpoint: Verify the architecture addresses all requirements and follows CNCF conventions.

  3. Implement & Configure — Create manifests, configurations, and deployment scripts. Include resource limits, health checks, and observability hooks. Checkpoint: Validate all YAML against schema and test in a staging environment.

  4. Deploy & Monitor — Apply manifests to the cluster, verify component health, and confirm observability is working. Checkpoint: Confirm all pods/services are running, probes passing, and metrics/alerts configured.


Constraints

MUST DO

  • Include at least one complete working YAML manifest example
  • Note when content is auto-generated vs. manually verified
  • Reference relevant CNCF project documentation

MUST NOT DO

  • Deploy manifests without testing in a staging environment first
  • Use deprecated API versions (e.g., apps/v1beta1)
  • Omit resource limits and requests in Kubernetes manifests
Install via CLI
npx skills add https://github.com/paulpas/agent-skill-router --skill openkruise
Repository Details
star Stars 4
call_split Forks 0
navigation Branch main
article Path SKILL.md
More from Creator