k3s-ops

K3s lightweight Kubernetes cluster deployment and operations skill. Supports automated K3s cluster deployment (Server + Agent nodes), cluster upgrade, certificate management, backup and restore, troubleshooting, and daily maintenance. Trigger scenarios: K3s installation, K3s deployment, K3s upgrade, K3s maintenance, K3s failures, lightweight clusters, edge clusters, cluster initialization, k3s server, k3s agent, cluster setup, node joining.

By clcc2019 · Updated 2/26/2026

name: k3s-ops
description: |
  K3s lightweight Kubernetes cluster deployment and operations skill. Supports automated K3s cluster deployment (Server + Agent nodes), cluster upgrade, certificate management, backup and restore, troubleshooting, and daily maintenance.
  Trigger scenarios: K3s installation, K3s deployment, K3s upgrade, K3s maintenance, K3s failures, lightweight clusters, edge clusters, cluster initialization, k3s server, k3s agent, cluster setup, node joining.
allowed-tools: linux_execute_command linux_get_system_info linux_get_service_status linux_service_control linux_upload_script linux_write_file linux_read_file pods_list nodes_list events_list

K3s Cluster Deployment and Operations Guide

Reference: https://github.com/k3s-io/k3s | https://docs.k3s.io

I. Automated K3s Cluster Deployment

Prerequisites

  • Target host has SSH access configured (via Linux MCP)
  • System requirements: 64-bit Linux (Ubuntu 20.04+, CentOS 7+, or RHEL 8+ recommended)
  • Memory: 512 MB RAM minimum for a Server node, 2 GB or more recommended
  • Network connectivity between all nodes; the Server node must expose TCP port 6443 to the Agent nodes

Step 1: Environment Check

Use linux_get_system_info(host) to verify:

  • OS and kernel version
  • CPU/memory resources meet minimum requirements
  • Network connectivity (ping between nodes)
  • Firewall status

Then run these checks on each host:

# Check system requirements
uname -a
free -h
df -h
# Check if port 6443 is in use
ss -tlnp | grep 6443
# Check firewall
systemctl status firewalld 2>/dev/null || ufw status 2>/dev/null
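
The memory check can also be scripted so that it fails loudly instead of relying on reading `free -h` by eye. The following is a minimal sketch, assuming a Linux `/proc/meminfo`; the function name `mem_at_least` and its optional file argument are illustrative and not part of K3s.

```shell
# mem_at_least MIN_MB [MEMINFO_FILE]
# Succeeds when MemTotal is at least MIN_MB megabytes.
# MEMINFO_FILE defaults to /proc/meminfo; the override exists only for testing.
mem_at_least() {
  min_mb=$1
  meminfo=${2:-/proc/meminfo}
  # MemTotal is reported in kB, e.g. "MemTotal:        2048000 kB"
  total_kb=$(awk '/^MemTotal:/ {print $2}' "$meminfo" 2>/dev/null)
  [ -n "$total_kb" ] && [ "$((total_kb / 1024))" -ge "$min_mb" ]
}

# Warn if the host is below the 512 MB minimum listed in the prerequisites
if ! mem_at_least 512; then
  echo "WARNING: less than 512 MB RAM detected" >&2
fi
```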

Step 2: Deploy K3s Server (Master Node)

Use linux_execute_command or linux_upload_script to run the install script:

# Install K3s Server
curl -sfL https://get.k3s.io | sh -s - server \
  --write-kubeconfig-mode 644 \
  --disable traefik \
  --disable servicelb \
  --tls-san <SERVER_IP_OR_DOMAIN>

Common install options:

| Option | Description | Typical use |
| --- | --- | --- |
| --write-kubeconfig-mode 644 | kubeconfig file permissions | Allow non-root users to read the kubeconfig |
| --disable traefik | Disable the built-in Traefik | Use a custom Ingress Controller |
| --disable servicelb | Disable the built-in ServiceLB | Use MetalLB |
| --tls-san | Extra SAN for the API Server certificate | Domain or external IP |
| --data-dir | Data directory | Custom storage path |
| --cluster-init | Enable embedded etcd | HA mode |
| --flannel-backend=none | Disable Flannel | Use Calico/Cilium |
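
The same options can be kept in the K3s config file instead of on the command line, which makes them easier to reuse when the installer is run again. In that file, each key is the flag name without the leading dashes, and repeatable flags become YAML lists. The sketch below writes the Step 2 flags to a file; the helper name `write_k3s_config`, the example hostname, and the preview path are assumptions for illustration.

```shell
# write_k3s_config SERVER_SAN [DEST]
# Writes a config.yaml equivalent to the Step 2 command-line flags.
# DEST defaults to the standard K3s location; pass another path to preview.
write_k3s_config() {
  san=$1
  dest=${2:-/etc/rancher/k3s/config.yaml}
  mkdir -p "$(dirname "$dest")"
  cat > "$dest" <<EOF
write-kubeconfig-mode: "0644"
disable:
  - traefik
  - servicelb
tls-san:
  - "$san"
EOF
}

# Preview the result in a scratch location rather than under /etc
write_k3s_config k3s.example.internal /tmp/k3s-config-preview.yaml
cat /tmp/k3s-config-preview.yaml
```

With the file in place at /etc/rancher/k3s/config.yaml, the install command reduces to `curl -sfL https://get.k3s.io | sh -s - server`.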

Step 3: Get Node Token

cat /var/lib/rancher/k3s/server/node-token

Step 4: Deploy K3s Agent (Worker Node)

curl -sfL https://get.k3s.io | K3S_URL=https://<SERVER_IP>:6443 \
  K3S_TOKEN=<NODE_TOKEN> sh -s - agent

Step 5: Verify Cluster

k3s kubectl get nodes
k3s kubectl get pods -A
k3s kubectl cluster-info
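
For scripted verification it helps to reduce the node list to just the nodes that need attention. This is a small sketch that filters the output of `kubectl get nodes --no-headers`; the function name `not_ready_nodes` is illustrative.

```shell
# not_ready_nodes: reads `kubectl get nodes --no-headers` output on stdin
# and prints the name of every node whose STATUS does not start with "Ready".
# "Ready,SchedulingDisabled" (a cordoned node) still counts as Ready here.
not_ready_nodes() {
  awk '$2 !~ /^Ready/ {print $1}'
}

# Usage on a Server node:
#   k3s kubectl get nodes --no-headers | not_ready_nodes
```

An empty result means every node reports Ready.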

II. High Availability Deployment

Embedded etcd Mode (3 Server Nodes)

# First Server node
curl -sfL https://get.k3s.io | sh -s - server \
  --cluster-init \
  --tls-san <VIP_OR_LB_IP>

# Get token
cat /var/lib/rancher/k3s/server/node-token

# Second and third Server nodes
curl -sfL https://get.k3s.io | K3S_TOKEN=<TOKEN> sh -s - server \
  --server https://<FIRST_SERVER_IP>:6443 \
  --tls-san <VIP_OR_LB_IP>
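
The reason for three Server nodes is etcd quorum: the cluster stays writable only while a majority of members is up. The arithmetic can be sketched as follows (the function names are illustrative):

```shell
# etcd needs a majority (quorum) of members to accept writes.
# quorum N          -> members required:  floor(N/2) + 1
# fault_tolerance N -> members that may fail: floor((N-1)/2)
quorum() { echo $(( $1 / 2 + 1 )); }
fault_tolerance() { echo $(( ($1 - 1) / 2 )); }

for n in 1 2 3 5; do
  echo "servers=$n quorum=$(quorum "$n") tolerated_failures=$(fault_tolerance "$n")"
done
```

Two Servers tolerate no failures at all, which is why HA setups use an odd number of Servers, starting at three.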

III. Cluster Upgrade

Manual Upgrade

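# Server node (upgrade one by one)
# Re-run the installer with the same flags and environment variables used at
# install time; options that are left out are not carried over.
curl -sfL https://get.k3s.io | INSTALL_K3S_CHANNEL=stable sh -s - server <ORIGINAL_INSTALL_FLAGS>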

# Agent node
curl -sfL https://get.k3s.io | INSTALL_K3S_CHANNEL=stable \
  K3S_URL=https://<SERVER_IP>:6443 K3S_TOKEN=<TOKEN> sh -

Upgrade Notes

  • Upgrade Server nodes first, then Agent nodes
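  • Upgrade one node at a time and verify that it is Ready before moving to the next
  • In HA mode, keep etcd quorum during the upgrade (at least 2 of 3 Server nodes available)
  • Take an etcd snapshot before upgrading (applies to embedded etcd; a single Server on the default SQLite datastore is not covered by k3s etcd-snapshot)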
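
The Server-first ordering can be derived from the node list rather than maintained by hand. The sketch below sorts `kubectl get nodes --no-headers` output into upgrade order. It assumes Server nodes show `control-plane` or `master` in the ROLES column, and the function name `upgrade_order` is illustrative.

```shell
# upgrade_order: reads `kubectl get nodes --no-headers` output on stdin and
# prints node names with Server nodes first, then Agent nodes.
# Assumption: Servers carry "control-plane" or "master" in the ROLES column.
upgrade_order() {
  awk '
    $3 ~ /control-plane|master/ { servers[++s] = $1; next }
    { agents[++a] = $1 }
    END {
      for (i = 1; i <= s; i++) print servers[i]
      for (i = 1; i <= a; i++) print agents[i]
    }'
}

# Usage on a Server node:
#   k3s kubectl get nodes --no-headers | upgrade_order
```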

IV. Backup and Restore

etcd Snapshot

# Manual snapshot
k3s etcd-snapshot save --name pre-upgrade-$(date +%Y%m%d)

# List snapshots
k3s etcd-snapshot ls

# Restore from snapshot (run after stopping k3s)
systemctl stop k3s
k3s server --cluster-reset --cluster-reset-restore-path=/var/lib/rancher/k3s/server/db/snapshots/<snapshot>
systemctl start k3s
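
When restoring, the `<snapshot>` placeholder is usually the most recent file in the snapshot directory. The sketch below picks it by modification time, assuming the default data directory; the function name `latest_snapshot` is illustrative.

```shell
# latest_snapshot [DIR]
# Prints the path of the most recently modified file in the snapshot
# directory, or fails if the directory is empty or missing.
latest_snapshot() {
  dir=${1:-/var/lib/rancher/k3s/server/db/snapshots}
  newest=$(ls -t "$dir" 2>/dev/null | head -n 1)
  [ -n "$newest" ] && echo "$dir/$newest"
}

# Usage (after `systemctl stop k3s`):
#   snap=$(latest_snapshot) && \
#     k3s server --cluster-reset --cluster-reset-restore-path="$snap"
```

If K3s was installed with a custom --data-dir, pass the matching snapshots directory as the argument.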

Automated Snapshot Config

# /etc/rancher/k3s/config.yaml
etcd-snapshot-schedule-cron: "0 */6 * * *"
etcd-snapshot-retention: 5
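
The cron expression fires at minute 0 of every sixth hour (00:00, 06:00, 12:00, 18:00), and a retention of 5 keeps the five most recent scheduled snapshots. A quick way to see how far back that reaches, as a sketch with an illustrative function name:

```shell
# snapshot_span_hours INTERVAL_HOURS RETENTION
# Hours between the newest and the oldest retained snapshot once the
# retention limit has been reached: (RETENTION - 1) * INTERVAL_HOURS.
snapshot_span_hours() { echo $(( ($2 - 1) * $1 )); }

# Values from the config above: every 6 hours, keep 5
snapshot_span_hours 6 5
```

With these settings, the oldest restorable snapshot is about a day old. Raise the retention or shorten the interval if a longer history is needed.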

V. Daily Maintenance

K3s Service Management

systemctl status k3s          # Server status
systemctl status k3s-agent     # Agent status
systemctl restart k3s          # Restart Server
systemctl restart k3s-agent    # Restart Agent
journalctl -u k3s -f           # Server logs
journalctl -u k3s-agent -f     # Agent logs

Certificate Management

# Check certificate expiry
for cert in /var/lib/rancher/k3s/server/tls/*.crt; do
  echo "=== $cert ==="; openssl x509 -in "$cert" -noout -enddate
done

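# K3s renews certificates that are expired or close to expiry (roughly 90 days,
# depending on the K3s version) when the service starts, so a restart is normally enough
systemctl restart k3s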
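
To turn the expiry listing into a check that can run from cron or a monitoring job, `openssl x509 -checkend` can be used: it exits non-zero when the certificate will have expired within the given number of seconds. A minimal sketch, assuming the default TLS directory; the function name `certs_expiring_within` is illustrative.

```shell
# certs_expiring_within DAYS [DIR]
# Prints every .crt file in DIR that expires within DAYS days.
# Files that openssl cannot parse are also printed, since the check fails for them.
certs_expiring_within() {
  days=$1
  dir=${2:-/var/lib/rancher/k3s/server/tls}
  for cert in "$dir"/*.crt; do
    [ -f "$cert" ] || continue
    if ! openssl x509 -in "$cert" -noout -checkend "$((days * 86400))" >/dev/null 2>&1; then
      echo "$cert"
    fi
  done
}

# Usage: list certificates with fewer than 30 days left
#   certs_expiring_within 30
```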

Cluster Cleanup

# Uninstall K3s Server
/usr/local/bin/k3s-uninstall.sh

# Uninstall K3s Agent
/usr/local/bin/k3s-agent-uninstall.sh

VI. Troubleshooting

K3s Server Won't Start

  1. Check service status: systemctl status k3s
  2. View logs: journalctl -u k3s --no-pager -n 200
  3. Common causes:
    • Port in use (6443, 10250)
    • Data directory permission issues
    • etcd data corruption (restore from snapshot)
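
The port conflict check can be narrowed to exactly the ports K3s needs. The sketch below filters `ss -tln` output; the function name `ports_in_use` is illustrative.

```shell
# ports_in_use PORT...
# Reads `ss -tln` output on stdin and prints each listed PORT that already
# has a listener, once per port.
ports_in_use() {
  wanted=" $* "
  awk -v wanted="$wanted" '
    $1 == "LISTEN" {
      n = split($4, parts, ":")   # the port is the text after the last colon
      port = parts[n]
      if (index(wanted, " " port " ") && !seen[port]++) print port
    }'
}

# Usage (with k3s stopped, any output means another process holds the port):
#   ss -tln | ports_in_use 6443 10250
```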

Node NotReady

  1. Check Agent service: systemctl status k3s-agent
  2. Check node status from a Server node: k3s kubectl get nodes
  3. Check kubelet logs: journalctl -u k3s-agent -n 100
  4. Common causes:
    • Server unreachable (network/firewall)
    • Invalid Node Token
    • Certificate expired

Pod Issues

  1. Use K8s MCP: pods_list(namespace="all")
  2. View events: events_list(namespace="all")
  3. View logs: pods_logs(namespace, name)

Network Issues

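# Check Flannel (it runs inside the k3s process, so there is no flannel pod to list)
ip -d link show flannel.1    # VXLAN interface created by the default backend
journalctl -u k3s --no-pager | grep -i flannel | tail -n 20    # use k3s-agent on Agent nodes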
# Check CoreDNS
k3s kubectl get pods -n kube-system | grep coredns
# Check Service CIDR and Pod CIDR
k3s kubectl cluster-info dump | grep -i cidr

VII. Key File Paths

| Path | Description |
| --- | --- |
| /etc/rancher/k3s/k3s.yaml | kubeconfig |
| /etc/rancher/k3s/config.yaml | K3s config file |
| /var/lib/rancher/k3s/ | Data directory |
| /var/lib/rancher/k3s/server/node-token | Node Token |
| /var/lib/rancher/k3s/server/tls/ | TLS certificates |
| /var/lib/rancher/k3s/server/db/ | Embedded DB (SQLite/etcd) |
| /var/log/syslog or journalctl -u k3s | K3s logs |
| /usr/local/bin/k3s | K3s binary |
| /usr/local/bin/kubectl | kubectl symlink |
| /usr/local/bin/crictl | crictl symlink |
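
The kubeconfig at /etc/rancher/k3s/k3s.yaml points at the loopback address, so it has to be rewritten before it can be used from another machine. A minimal sketch, where the function name `remote_kubeconfig` and the output filename are illustrative:

```shell
# remote_kubeconfig SERVER [SRC]
# Prints a copy of the kubeconfig with the loopback API address replaced by
# SERVER. The source file is left untouched.
remote_kubeconfig() {
  server=$1
  src=${2:-/etc/rancher/k3s/k3s.yaml}
  sed "s#https://127\.0\.0\.1:6443#https://${server}:6443#" "$src"
}

# Usage on the Server node:
#   remote_kubeconfig <SERVER_IP_OR_DOMAIN> > k3s-remote.yaml
```

The address used here should match a value passed to --tls-san at install time, otherwise clients will reject the API Server certificate.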

Install via CLI

npx skills add https://github.com/clcc2019/agentic-infra --skill k3s-ops