kaitu-node-ops

Node infrastructure operations via kaitu-center tools. Safety guardrails, dual-architecture identification, standard ops commands, and script execution guidance.

By kaitu-io · Updated 6/9/2026

name: kaitu-node-ops
description: "Unified runbook for Kaitu VPN node operations: operate existing k2s nodes (status/logs/restart/update, user auth, batch fleet ops), provision NEW private/shared nodes from scratch, and build/configure/observe the node-authority traffic metering + quota cutoff sidecar (incl. AWS Lightsail billing rules). Canonical deployment is the single-compose k2s + k2-sidecar stack. Covers safety guardrails, architecture ID, env vars, verification, troubleshooting, and device-log triage."
triggers:
  - node ops
  - server ops
  - node management
  - docker ops
  - k2 node
  - exec on node
  - list nodes
  - node health
  - node restart
  - node update
  - node logs
  - docker compose
  - k2s
  - k2-slave
  - provision node
  - provision private node
  - provisioning intent
  - claim provisioning
  - provision job
  - private node
  - 专属节点
  - dedicated node
  - K2_PRIVATE_CLAIM
  - metering ops
  - traffic metering
  - sidecar deploy
  - sidecar image
  - build sidecar
  - traffic limit
  - traffic quota
  - traffic cutoff
  - quota cutoff
  - set-usage
  - node usage
  - NodeUsage
  - usage reporter
  - K2_NODE_TRAFFIC_LIMIT_GB
  - K2_NODE_BILLING_START_DATE
  - Lightsail billing

Kaitu Node Operations (unified)

One skill for everything you do to a Kaitu VPN node, via the kaitu-center MCP (list_nodes, exec_on_node, ping_node, delete_node, the cloud*/node_operation tools) + SSH.

Canonical deployment = the single-compose k2s + k2-sidecar stack (docker/docker-compose.yml, dir /apps/k2s/). Everything below assumes it. The old k2-slave SNI router is legacy — see §1.

One stack, three jobs — the unifying fact: shared-pool nodes and private (dedicated) nodes run the exact same compose + images. They differ only by a few .env variables. Provisioning a private node = the standard node deploy + a Center work-item wrapper. Metering + cutoff run on every node that has a billing date set. So this is one skill, not three.


Quick index — what do you want to do?

| Goal | Where |
| --- | --- |
| Safety rules (read once) | §0 Guardrails |
| Identify a node's architecture (k2s vs legacy) | §1 |
| Look up / set an .env variable | §2 Env master reference |
| Status / logs / restart / update one node | §3 Standard ops |
| Manage k2s users (auth file) | §3.1 |
| Deploy or update a single node | §4 |
| Roll compose / images / auto-update across the fleet | §5 Batch ops |
| Verify a node after deploy | §6 Verification |
| Fix a broken node (iptables-nft, restart loop) | §7 Troubleshooting |
| AWS Lightsail specifics (reboot, billing) | §8 + references/metering.md |
| Provision a NEW private node from a Center work item | references/provisioning.md |
| Build / push the sidecar image | references/metering.md Part A |
| Per-provider quota knobs (reset day / bundle / already-used) | references/metering.md Part B |
| Configure quota / billing / mid-cycle seed | references/metering.md Part C |
| AWS Lightsail billing rules (max(in,out), calendar month, proration) | references/metering.md |
| Observe metering / operate the cutoff | references/metering.md Parts D–E |
| Triage a user's device logs (DIAG analysis) | references/device-logs.md |

The three references/*.md files are pulled in with the Read tool only when the task needs them — keep the hub lean.


§0 Safety Guardrails

Full SSH root access — these rules prevent accidental damage:

  1. K2_NODE_SECRET / K2_PRIVATE_CLAIM / claim tokens are untouchable — never read, display, modify, transmit, or echo them. Write secrets only via heredoc. Redact when printing .env: sed -E "s/(SECRET|CLAIM|TOKEN)=.*/\1=<redacted>/". MCP stdout redaction is a backstop, not a license.
  2. Never touch another node — only operate the node you were asked about. Never read/modify a different node's secret or config.
  3. Update = pull + up -d, never down. up -d recreates only changed containers; down removes them and interrupts service. Use --remove-orphans to clear stale k2v5/k2v4-slave/k2-oc containers after a rename.
  4. docker restart does NOT re-read .env — it reuses old env. To apply an .env change you must docker compose up -d (or restart via compose). See §4.
  5. Config changes go through .env only — don't hand-edit docker-compose.yml or /etc/kaitu/ (auto-generated by the sidecar; overwritten on restart). Port mapping (40000-40019/udp) is Docker-managed — no manual iptables DNAT.
  6. Never delete /apps/k2s/ — it is the entire node deployment.
  7. Confirm before restart: run docker compose ps first and understand what's affected.
  8. K2_DOMAIN stays empty — the sidecar auto-derives a unique {ipv4-with-dashes}.sslip.io (see §2). No manual assignment, no collision risk.
  9. Pinned image tags only; never move :latest outside a deliberate fleet rollout (it's what unpinned nodes auto-pull → touches every node).
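
Rule 1's redaction can be wrapped in a small helper so that an .env is never printed raw. The following is a minimal sketch: the function name redact_env and the throwaway demo file are illustrative assumptions, not part of the kaitu tooling; only the sed expression comes from the rule above.

```shell
# Print an .env file with every SECRET/CLAIM/TOKEN value replaced (rule 1).
# redact_env is a hypothetical helper name.
redact_env() {
  sed -E 's/(SECRET|CLAIM|TOKEN)=.*/\1=<redacted>/' "$1"
}

# Demo on a throwaway file with dummy values (never a real node .env).
demo_env="$(mktemp)"
cat > "$demo_env" <<'EOF'
K2_NODE_SECRET=dummy-secret-value
K2_PRIVATE_CLAIM=dummy-claim-value
K2_VERSION=v0.0.0-example
EOF
redact_env "$demo_env"   # only K2_VERSION keeps its value
rm -f "$demo_env"
```

On a node the same expression runs against /apps/k2s/.env, as in the §3 table.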

§1 Identify Node Architecture

Run first on every node:

exec_on_node(ip, "docker ps --format '{{.Names}}'")
  • Output has k2s → canonical (k2s tunnel container). Everything in this skill applies directly.
  • Output has k2-slave → legacy SNI router (no ECH, host network). All commands are identical; just substitute the tunnel container name k2-slave for k2s. Migrate it to k2s when convenient (push current compose + up -d --remove-orphans).
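
The decision above can be sketched as a filter over the container-name list. This is an illustrative helper (classify_node is a hypothetical name); it assumes one container name per line, as produced by the docker ps --format command shown above, and treats a node that has both names as canonical.

```shell
# Read container names on stdin (one per line) and print the architecture.
classify_node() {
  names="$(cat)"
  if printf '%s\n' "$names" | grep -qx 'k2s'; then
    echo canonical
  elif printf '%s\n' "$names" | grep -qx 'k2-slave'; then
    echo legacy
  else
    echo unknown
  fi
}

# Demo with literal input; on a node, pipe the docker ps output in instead.
printf 'k2-sidecar\nk2s\n' | classify_node        # prints canonical
printf 'k2-sidecar\nk2-slave\n' | classify_node   # prints legacy
```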

Canonical stack (dir /apps/k2s/, images from public.ecr.aws/d6n9t2r2/):

k2-sidecar (bridge net, k2-internal)
  └── healthy ──→ k2s (bridge net; Docker port map :443 TCP+UDP + 40000-40019 UDP → container 443)
| Container | Role | Network | Image |
| --- | --- | --- | --- |
| k2-sidecar | Registration, config-gen, health report, traffic metering + cutoff | bridge | k2-sidecar:${K2_VERSION} |
| k2s | Tunnel data plane. ECH front door on 443 → in-process QUIC + TCP-WS | bridge | k2s:${K2_VERSION} |
  • Sidecar writes /etc/kaitu/.ready when config-gen completes; k2s waits on the sidecar healthcheck.
  • No iptables management, no NET_ADMIN, no wrapper scripts. k2-oc (OpenConnect) was retired 2026-04-30 — if you see it, push compose + up -d --remove-orphans.

§2 Env Master Reference (/apps/k2s/.env)

One table for every .env var across ops / provisioning / metering. "Set by": who writes it.

| Variable | Purpose | Set by | Notes |
| --- | --- | --- | --- |
| K2_NODE_SECRET | Node auth key + Center Basic-auth (ipv4:secret) for usage reports | provision (openssl rand -hex 32) | SECRET: never read/log. TOFU-recorded by Center at first registration. |
| K2_PRIVATE_CLAIM | One-time claim token → Center sets Class=private, binds owner, activates owner's sub | provision (from claim) | SECRET: one-time, never log. Identity/activation only, NOT metering. Empty on shared nodes. |
| K2_CENTER_URL | Center API base (registration + /slave/usage) | provision | dev/test = https://k2.52j.me. |
| K2_DOMAIN | Tunnel domain | leave empty | Empty → sidecar auto-derives {ipv4-dashes}.sslip.io (unique by IP). Set only for a custom domain (you own its DNS). |
| K2_VERSION | Image tag for both images | ops | Pin a known-good tag, never :latest outside a fleet rollout. If k2s/sidecar need different tags, pin k2s's image: line directly in compose and let K2_VERSION drive the sidecar. |
| K2_NODE_NAME | Human name, {region}.{provider}.wm{NN} (private: pn-<subId>) | provision | Registration meta. |
| K2_NODE_REGION | Region id, e.g. jp-tokyo.aws | provision | Registration meta. |
| K2_NODE_ARCH | Protocol arch tag (default k2v5) | ops | Registration meta. |
| K2_IP_TYPE | residential / non_residential / unknown | provision | Sidecar reports → Center SlaveNode.ip_type (drives residential-IP visibility). Last-writer-wins with ops update_node. |
| K2_NODE_BILLING_START_DATE | Monthly cycle anchor, yyyy-MM-dd (day-of-month extracted) | provision/ops | REQUIRED to meter. Empty → metering OFF, node runs uncapped. Day must match the provider's reset day (Lightsail = 01; BandwagonHost (搬瓦工) = the KiwiVM "Next reset" day; don't assume 01). Derive per provider in references/metering.md Part B. |
| K2_NODE_TRAFFIC_LIMIT_GB | Monthly quota (GiB). Node pauses k2s at used ≥ limit − 500 MiB | provision/ops | 0 = unlimited (safe fallback). |
| K2_NODE_TRAFFIC_USED_GB | Mid-cycle onboarding seed (GiB already used) | provision | Applied once on first boot (no state). Prefer set-usage later; see references/metering.md Part C. |
| K2_CUTOFF_POLL_INTERVAL | Enforcer poll period | ops | Default 5s. |
| K2_JUMP_PORT_MIN/MAX | Hop port range (default 40000/40019) | ops | Docker port map → container 443; 20 ports, high range to dodge GFW scan. |
| K2_LOG_LEVEL | debug/info/warn/error | ops | |

Metering is decoupled from the private claim. Any node — shared or private — meters and self-cuts iff K2_NODE_BILLING_START_DATE is set. K2_PRIVATE_CLAIM only controls identity/activation. The reporter cadence is owned by Center (next_report_interval, 10s floor) — don't tune it from the node.
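
The two rules above (metering is on iff the billing date is set; k2s is paused at used ≥ limit − 500 MiB) can be worked through numerically. This is a sketch under stated assumptions: metering_plan is a hypothetical helper, it handles whole-GiB limits only, and it reads "0 = unlimited" as "meter but never cut off", which the table implies but does not spell out.

```shell
# metering_plan BILLING_DATE LIMIT_GB
# Prints whether a node with these .env values meters, and the pause threshold.
metering_plan() {
  billing_date="$1"
  limit_gb="${2:-0}"
  if [ -z "$billing_date" ]; then
    echo "metering=off"                      # no billing date: node runs uncapped
  elif [ "$limit_gb" -eq 0 ]; then
    echo "metering=on cutoff=none"           # assumed reading of "0 = unlimited"
  else
    # 1 GiB = 1024 MiB; the pause point sits 500 MiB below the limit.
    echo "metering=on cutoff_mib=$(( limit_gb * 1024 - 500 ))"
  fi
}

metering_plan "" 1000            # prints metering=off
metering_plan "2026-01-01" 0     # prints metering=on cutoff=none
metering_plan "2026-01-01" 1024  # prints metering=on cutoff_mib=1048076
```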


§3 Standard Operations

exec_on_node(ip, command). For legacy nodes substitute k2-slave for k2s.

Operation Command
All container status cd /apps/k2s && docker compose ps
Container logs (tail) docker logs --tail 100 <container>
Pull + restart all cd /apps/k2s && docker compose pull && docker compose up -d
Restart one container docker restart <container> (⚠ no .env reload — §4)
View .env (redacted) sed -E "s/(SECRET|CLAIM|TOKEN)=.*/\1=<redacted>/" /apps/k2s/.env
Sidecar health docker inspect --format='{{.State.Health.Status}}' k2-sidecar
Disk / mem / CPU df -h && free -h && top -bn1 | head -5
Network conns ss -s
IPv6 status ip -6 addr show scope global
Hop port mapping docker port k2s (expect 22 maps: 443/tcp + 443/udp + 40000-40019/udp)
BBR status sysctl net.ipv4.tcp_congestion_control (expect bbr)
Container outbound net docker exec k2-sidecar wget -qO- --timeout=5 https://api.ipify.org
Auto-update log tail -50 /apps/k2s/auto-update.log
Timezone timedatectl | grep 'Time zone' (must be Asia/Singapore)
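
Counting the hop-port maps by eye is error-prone, partly because docker port may list the same host port once per address family. The sketch below counts distinct protocol/host-port pairs; count_port_maps is a hypothetical helper, and it assumes docker port prints lines shaped like "443/udp -> 0.0.0.0:40000".

```shell
# Read `docker port k2s` style lines on stdin; print the number of distinct
# protocol/host-port pairs (IPv4 and IPv6 listings of one port count once).
count_port_maps() {
  awk 'NF >= 3 {
         split($1, c, "/")        # c[2] = protocol (tcp or udp)
         n = split($3, h, ":")    # h[n] = host port, the last ":" field
         print c[2] "/" h[n]
       }' | sort -u | wc -l | tr -d ' '
}

# Demo with synthetic lines in the assumed format (no Docker needed).
{
  echo '443/tcp -> 0.0.0.0:443'
  echo '443/tcp -> [::]:443'
  echo '443/udp -> 0.0.0.0:443'
  p=40000
  while [ "$p" -le 40019 ]; do
    echo "443/udp -> 0.0.0.0:$p"
    p=$((p + 1))
  done
} | count_port_maps   # prints 22
```

On a node the input would come from docker port k2s, and any result other than 22 points at a compose or .env port-range mismatch.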

§3.1 k2s user auth

Two modes, first match wins: (1) /apps/k2s/users (bind-mounted), (2) Center remote /slave/device-check-auth (fallback). Empty file = pure remote auth = default. No restart needed (changes apply on next full auth; 1h tickets stay valid). Format = one udid:token per line.

| Operation | Command |
| --- | --- |
| View | cat /apps/k2s/users |
| Add | echo "udid:token" >> /apps/k2s/users |
| Remove | sed -i "/^UDID:/d" /apps/k2s/users |
| Clear (remote only) | truncate -s 0 /apps/k2s/users |
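
The plain append above adds a second line if the same udid is added twice. A duplicate-safe variant might look like the sketch below; users_has and users_add are hypothetical helper names, and the demo uses a temporary file with dummy values rather than /apps/k2s/users.

```shell
# users_has FILE UDID: succeed if a line for UDID already exists.
users_has() {
  awk -F: -v u="$2" '$1 == u { found = 1 } END { exit !found }' "$1" 2>/dev/null
}

# users_add FILE UDID TOKEN: append "udid:token" only when UDID is not present.
users_add() {
  users_has "$1" "$2" && return 0
  printf '%s:%s\n' "$2" "$3" >> "$1"
}

# Demo on a temporary file.
demo_users="$(mktemp)"
users_add "$demo_users" demo-udid demo-token
users_add "$demo_users" demo-udid demo-token   # no-op: already present
cat "$demo_users"                              # prints demo-udid:demo-token once
rm -f "$demo_users"
```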

§4 Deploy / Update a Single Node

# Push the canonical compose (idempotent)
exec_on_node(ip, "sudo tee /apps/k2s/docker-compose.yml > /dev/null", { scriptPath: "docker/docker-compose.yml" })

# Edit .env (heredoc for any secret; never on the command line)
exec_on_node(ip, "sudo tee /apps/k2s/.env > /dev/null <<'ENVEOF'\n<contents>\nENVEOF")

# Apply: pull + up -d (NEVER down)
exec_on_node(ip, "cd /apps/k2s && sudo docker compose pull && sudo docker compose up -d --remove-orphans")
  • docker compose up -d is the only way to apply .env changes (docker restart reuses old env).
  • If sidecar & k2s need different tags: sudo sed -i "s#k2s:\${K2_VERSION:-latest}#k2s:v0.4.6-<sha>#" /apps/k2s/docker-compose.yml.
  • Brand-new node (from nothing)? That's provisioning → references/provisioning.md.
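
For a single non-secret variable, rewriting the whole .env is more than is needed. The sketch below replaces or appends one KEY=VALUE line in place; set_env_var is a hypothetical helper, intended only for simple non-secret values (awk interprets backslash escapes in the value, and secrets should still go through the heredoc form above).

```shell
# set_env_var FILE KEY VALUE: replace the KEY= line, or append it if absent.
set_env_var() {
  file="$1"; key="$2"; value="$3"
  tmp="$(mktemp)" || return 1
  awk -v k="$key" -v v="$value" '
    index($0, k "=") == 1 { print k "=" v; seen = 1; next }
    { print }
    END { if (!seen) print k "=" v }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"   # cat keeps the file owner and mode
  rc=$?
  rm -f "$tmp"
  return "$rc"
}

# Demo on a temporary file with dummy values.
demo_env="$(mktemp)"
printf 'K2_LOG_LEVEL=info\nK2_VERSION=v0.0.0-example\n' > "$demo_env"
set_env_var "$demo_env" K2_LOG_LEVEL debug
set_env_var "$demo_env" K2_CUTOFF_POLL_INTERVAL 5s
cat "$demo_env"
rm -f "$demo_env"
```

As noted in the bullets above, the change only takes effect after docker compose up -d.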

Script execution (mandatory form)

For scripts in docker/scripts/, pipe via SSH stdin — never inline a large script or base64:

exec_on_node(ip, "sudo bash -s", { scriptPath: "docker/scripts/provision-node.sh", timeout: 300 })   # root
exec_on_node(ip, "bash -s", { scriptPath: "docker/scripts/enable-ipv6.sh" })                          # non-root
| Script (docker/scripts/) | Purpose | Warning |
| --- | --- | --- |
| provision-node.sh | Full OS prep: TZ Asia/Singapore + Docker CE + IPv6 + BBR + nftables + daemon.json + UFW-Docker + SSH→1022 + journald + auto-update cron | Destructive (stops containers). Fresh/rebuild only. sudo + confirmation. |
| auto-update.sh | Daily pull/compare/restart + Slack notify | Safe. Cron 04:00 Beijing (needs Asia/Singapore TZ). |
| enable-ipv6.sh / totally-reinstall-docker.sh | Subsets of provision-node.sh | Superseded by provision-node.sh. |
| simple-docker-pull-restart.sh | Pull + restart | Safe routine update. |

§5 Batch Fleet Operations

Local scripts in this skill dir (.claude/skills/kaitu-node-ops/). Need KAITU_CENTER_URL + KAITU_ACCESS_KEY; SSH via KAITU_SSH_USER (default ubuntu) port KAITU_SSH_PORT (default 1022).

| Script | Purpose | Flags |
| --- | --- | --- |
| deploy-compose.sh | SCP docker/docker-compose.yml to all active nodes (MD5-skip, no restart) | --all, --dry-run |
| update-compose.sh | pull + up -d across active nodes, rolling | --sleep=N, --node=IP, --dry-run |
| deploy-auto-update.sh | SCP auto-update.sh + install daily cron (idempotent) | --all, --node=IP, --dry-run |

⚠ DIR-MIGRATION GUARD (active until the whole fleet is on /apps/k2s): the canonical compose now carries name: k2s. Do NOT run deploy-compose.sh / update-compose.sh / deploy-auto-update.sh against the full fleet while any node is still at /apps/kaitu-slave. Pushing the name: k2s compose to an un-migrated node makes its next up -d / nightly auto-update resolve project k2s → empty k2s_* volumes (cert + metering state lost) + container_name collision. During the migration window use --node=<already-migrated-IP> only, or run fleet-wide after the sweep completes. Remove this note once all nodes are on /apps/k2s. (Background: spec 2026-06-23-node-deploy-dir-k2s-migration-design.md.)

Node activity heuristic (no explicit status field): active = tunnelCount > 0 and a real name (hk.aliyun.wm01, not IP-as-name) and SSH on :1022. tunnelCount==0 + IP-name = decommissioned; scripts skip them by default (--all includes them).
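
The heuristic can be written down as a small classifier. This is a sketch only: node_activity is a hypothetical helper, it covers the name and tunnelCount parts of the rule (the SSH-on-:1022 check is left out), and it reports any combination the heuristic does not name as "unclear".

```shell
# node_activity NAME TUNNEL_COUNT
node_activity() {
  name="$1"; tunnels="$2"
  # An "IP-as-name" node is one whose name is a dotted IPv4 address.
  if printf '%s' "$name" | grep -Eq '^[0-9]+(\.[0-9]+){3}$'; then
    ip_name=1
  else
    ip_name=0
  fi
  if [ "$tunnels" -gt 0 ] && [ "$ip_name" -eq 0 ]; then
    echo active
  elif [ "$tunnels" -eq 0 ] && [ "$ip_name" -eq 1 ]; then
    echo decommissioned
  else
    echo unclear
  fi
}

node_activity hk.aliyun.wm01 1   # prints active
node_activity 203.0.113.7 0      # prints decommissioned
node_activity hk.aliyun.wm01 0   # prints unclear
```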

MCP tools: list_nodes(country?,name?), exec_on_node(ip,command,timeout?=60,scriptPath?), ping_node(ip), delete_node(ip). exec_on_node returns status (success/ssh_error/timeout); check exitCode for pass/fail; stdout capped 10k, stderr 2k, both secret-redacted.


§6 Post-Deployment Verification

MANDATORY after deploying to a new/restarted node. In order:

# Check Command Expected
0 Timezone timedatectl | grep 'Time zone' Asia/Singapore (+08, +0800)
1 Containers up cd /apps/k2s && docker compose ps all Up, sidecar (healthy), no k2-oc orphan
2 Sidecar registered docker logs --tail 30 k2-sidecar | grep "Registration completed" tunnels=1
3 Tunnel domain docker logs --tail 30 k2-sidecar | grep "Tunnel registered" {ipv4-dashes}.sslip.io, created=true
4 k2s ready docker logs --tail 20 k2s | grep "server ready" k2s server ready listen=:443
5 Container net docker exec k2-sidecar wget -qO- --timeout=5 https://api.ipify.org node's public IP
6 MCP cross-check list_nodes(name=<node>) one tunnels entry with the sslip.io domain
7 Port mapping docker port k2s 22 maps: 443/tcp + 443/udp + 40000-40019/udp
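
Checks 2 to 4 all reduce to "does the log contain this marker". A sketch of a reusable checker follows; check_markers is a hypothetical helper, the marker strings are the ones from the table, and the sample log lines in the demo are made up for illustration.

```shell
# check_markers MARKER...: read log text on stdin, report each required marker,
# and return non-zero if any marker is missing.
check_markers() {
  log="$(cat)"
  rc=0
  for marker in "$@"; do
    if printf '%s\n' "$log" | grep -qF -- "$marker"; then
      echo "ok: $marker"
    else
      echo "MISSING: $marker"
      rc=1
    fi
  done
  return "$rc"
}

# Demo with made-up log lines; the second marker is reported as missing.
printf 'INFO Registration completed tunnels=1\n' |
  check_markers "Registration completed" "Tunnel registered"
```

On a node the input would be docker logs --tail 30 k2-sidecar 2>&1, since container output may arrive on stderr.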

Metering nodes — also (when K2_NODE_BILLING_START_DATE is set):

Check Command Expected
Meter initialized docker logs k2-sidecar | grep "Traffic monitor initialized" billingDate=… limitGB=… rx=… tx=…
Reporter running docker logs --tail 80 k2-sidecar | grep usage-reporter usage-reporter-start + periodic usage-reporter-cycle-ok cumulative=… quotaTotal=… (cumulative climbs)

If the reporter line is absent: confirm K2_NODE_BILLING_START_DATE (+ K2_NODE_TRAFFIC_LIMIT_GB) + K2_NODE_SECRET are present; otherwise the sidecar logs Metering disabled: no billing date and runs uncapped. Deep metering ops → references/metering.md.


§7 Troubleshooting

iptables-nft incompatibility (Ubuntu 20.04)

  • Symptom: sidecar loops Failed to get IPv4 address: all ipv4 services failed (i/o timeout); host curl works, but network calls made through docker exec fail.
  • Cause: iptables is on the nf_tables backend; Docker NAT needs legacy. MASQUERADE fails silently.
  • Detect: iptables --version → BAD (nf_tables), GOOD (legacy).

Fix:

sudo update-alternatives --set iptables /usr/sbin/iptables-legacy
sudo update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy
sudo systemctl restart docker
cd /apps/k2s && sudo docker compose up -d --remove-orphans

(provision-node.sh step 2 handles this for new nodes.)
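
The detection step can be scripted so it runs before any fix is attempted. A minimal sketch, assuming the version string carries the backend in parentheses as described above; iptables_backend is a hypothetical helper name.

```shell
# iptables_backend VERSION_STRING: print nf_tables, legacy, or unknown.
iptables_backend() {
  case "$1" in
    *"(nf_tables)"*) echo nf_tables ;;   # BAD for Docker NAT on these nodes
    *"(legacy)"*)    echo legacy ;;      # GOOD
    *)               echo unknown ;;
  esac
}

# Demo with literal strings; on a node pass "$(iptables --version)" instead.
iptables_backend "iptables v1.8.4 (nf_tables)"   # prints nf_tables
iptables_backend "iptables v1.8.4 (legacy)"      # prints legacy
```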

Sidecar restart loop (no registration)

docker compose ps shows sidecar restarting; logs repeat Detecting missing network info… → error → restart. Causes: (1) container net broken (above), (2) missing/invalid K2_NODE_SECRET, (3) K2_CENTER_URL unreachable. Diagnose via sidecar logs + the container-net check (§6 #5).


§8 Cloud Provider Notes

  • AWS nodes are Lightsail, not EC2. Use aws lightsail … (get-instances, reboot-instance), not aws ec2. The default AWS CLI profile reaches the Lightsail account.
  • Lightsail data-transfer billing = calendar month + first-month proration + max(in,out). This drives how K2_NODE_BILLING_START_DATE / K2_NODE_TRAFFIC_LIMIT_GB must be set. Full rules + worked examples in references/metering.md.
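
Two of the Lightsail rules are simple enough to check by hand: the billable figure is max(in, out), and the calendar-month cycle means the billing date's day-of-month should be 01 (§2). The sketch below covers only those two; billable_transfer and billing_day are hypothetical helpers, inputs are whole numbers in the same unit, and first-month proration is deliberately not modeled because its formula lives in references/metering.md.

```shell
# billable_transfer IN OUT: Lightsail bills the larger direction.
billable_transfer() {
  if [ "$1" -ge "$2" ]; then echo "$1"; else echo "$2"; fi
}

# billing_day yyyy-MM-dd: day of month, without a leading zero.
billing_day() {
  d="${1##*-}"    # keep the part after the last "-"
  echo "${d#0}"   # drop a single leading zero, if any
}

billable_transfer 120 340   # prints 340
billing_day 2026-03-01      # prints 1
```

A K2_NODE_BILLING_START_DATE whose billing_day is not 1 would be out of step with a Lightsail node's calendar-month cycle.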

Reference files (Read when needed)

  • references/provisioning.md: provision a NEW private node end-to-end from a Center work item (claim intent → create VPS → OS prep → .env → deploy → self-register → verify). It's the §4 deploy + a Center wrapper + claim env.
  • references/metering.md: build/push the sidecar image (Part A), per-provider quota knobs (Part B), configure quota & seed mid-cycle usage incl. proration calc (Part C), AWS Lightsail billing rules, observe the recorder (Part D), operate & test the cutoff (Part E).
  • references/device-logs.md: triage a user's uploaded device logs: MCP query/download tools + the k2 client DIAG-log analysis workflow.
Install via CLI
npx skills add https://github.com/kaitu-io/k2app --skill kaitu-node-ops