---
name: private-node-provisioning
description: End-to-end runbook for an AI agent to provision a dedicated private node (专属节点) VPS (claim a provisioning intent from Center, stand up the VPS, deploy the k2s tunnel stack from a single compose file, and let the node self-register so Center activates the owner's subscription). The sidecar self-meters host-NIC usage to Center.
triggers:
  - private node
  - 专属节点
  - provision private node
  - provisioning intent
  - claim provisioning
  - provision job
  - private node deploy
  - dedicated node
  - K2_PRIVATE_CLAIM
---
Private Node Provisioning (专属节点 开通)
Use this skill when an external Claude Code agent (holding the kaitu-center MCP) provisions a dedicated private node end-to-end: claim a provisioning intent from Center, create a VPS, OS-prep it, deploy the k2s tunnel stack, and verify the node self-registers so Center flips the owner's subscription to active. The node's sidecar self-meters host-NIC usage to Center.
This is a sibling of kaitu-node-ops. Where that skill operates existing shared-pool nodes, this one creates a private node from a Center work item. All the safety guardrails, architecture identification, exec_on_node script-pipe rules, and post-deployment verification from kaitu-node-ops apply verbatim — read it for any operation not covered here. Never read/display/modify another node's K2_NODE_SECRET; use pull + up -d, never down.
Architecture context (read before the loop)
Capability matrix — a private node changes who routes through it, not the tunnel stack:
| Surface | Routes through | Metering |
|---|---|---|
| App (iOS / Android / desktop) | shared pool (k2subs picks) | per-node, no cutoff |
| Customer router / dedicated VPS | private node (single-tenant) | self-metered, owner quota cutoff |
A private node is the exact same k2s + k2-sidecar stack as a shared node (see kaitu-node-ops Step 2), deployed from the same single docker-compose.yml. The only difference is one .env variable — K2_PRIVATE_CLAIM — which the sidecar reads and which switches on two behaviors, both inside the sidecar:
- Claim carriage (activation): the sidecar carries `K2_PRIVATE_CLAIM` on registration. Center flips that node's `Class=private`, binds the owner, and activates the owner's subscription. Activation is not in the agent's hands.
- Host-NIC self-metering (cost gate): when `K2_PRIVATE_CLAIM` is set, the sidecar runs a metering loop. It reads the host NIC byte counters (`/host/proc/net/dev`, already mounted) and `POST`s cumulative bytes to `{centerURL}/slave/usage` (Basic auth `base64(ipv4:secret)`, the same node credentials it registers with). Center records the usage into the owner's quota ledger. Enforcement is Center-side and automatic: at ≥95% of sold quota, Center rejects new device auth (402) and hides the node from `/api/tunnels` and `/api/subs`, so no client lands on an over-quota node. The k2s data-plane is not involved in metering. Shared-pool nodes (no `K2_PRIVATE_CLAIM`) never start the reporter, so their behavior is byte-for-byte identical to today.

Why host-NIC, not k2s app-bytes: the NIC counter is the number the provider actually bills, is provider-agnostic (works where the provider has no traffic API), and keeps the tunnel data-plane decoupled from billing. (This supersedes the retired "Option D" in-k2s reporter; there is no longer a `docker-compose.private.yml` override.)
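To make the metering path concrete, here is a minimal Python sketch of the three pieces described above: reading byte counters from `/proc/net/dev`-style text, building the `base64(ipv4:secret)` Basic-auth header, and the 95% threshold comparison. It is an illustration only, not the sidecar's code. Which interfaces the sidecar counts, whether it sums receive and transmit, and the body of the `/slave/usage` request are not stated in this runbook, so the loopback skip and the rx+tx sum below are assumptions, and no request body is shown.

```python
import base64


def parse_net_dev(text):
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev-style text."""
    counters = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines carry no "iface:" prefix
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 16:
            continue  # not a full counter row
        # Column 0 is receive bytes, column 8 is transmit bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def cumulative_bytes(counters, skip=("lo",)):
    """ASSUMPTION: sum rx+tx over every interface except loopback."""
    return sum(rx + tx for name, (rx, tx) in counters.items() if name not in skip)


def basic_auth_header(ipv4, secret):
    """Build the Basic-auth header value from base64(ipv4:secret)."""
    token = base64.b64encode(f"{ipv4}:{secret}".encode()).decode()
    return f"Basic {token}"


def over_quota(used_bytes, sold_bytes):
    """Mirror of the >=95% rule; the real check runs on Center, not on the node."""
    return used_bytes * 100 >= sold_bytes * 95


SAMPLE = """Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:    1234      10    0    0    0     0          0         0     1234      10    0    0    0     0       0          0
  eth0:    1000       5    0    0    0     0          0         0     2000       7    0    0    0     0       0          0
"""
```

Integer arithmetic is used for the threshold so the comparison stays exact at byte scale.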
Step 0: Preconditions / identity
Before the loop:
- The agent runs the `kaitu-center` MCP with the `cloud` and `cloud.write` permission groups (claim/report and instance creation are `cloud.write`; list/get are `cloud`).
- SSH access to the new VPS uses the cloud account's default keypair. The agent must be able to reach the new instance over SSH (port 22 initially, 1022 after hardening; see Step 4).
  ⚠ Open question (agent-provisioning spec §8): multi-provider SSH-key acquisition is only solved for providers with a default-keypair API (e.g. Lightsail). For residential-IP / other providers the SSH access path is TBD. If the operation's provider has no known key path, call `update_node_operation(status=failed)` and escalate.
- `claimToken` and `K2_NODE_SECRET` are machine secrets: never echo them to logs, the conversation, or any `update_node_operation` call. Write them to `.env` via heredoc only (Step 5).
Step 1: Claim an intent
list_node_operations(action=provision, status=queued)
Returns data.items[] (paginated via page / pageSize). Pick one operation, then atomically lease it:
claim_node_operation(id=<operationId>, holder=<agent-id>, leaseSeconds=600?)
leaseSeconds defaults to 600 if omitted. The response is { data.operation, data.identity } (data.identity is present only for action=provision). The provision spec fields below live on data.operation.params.*. Capture:
| From claim response | Goes to | Notes |
|---|---|---|
| `data.identity.claimToken` | `.env` `K2_PRIVATE_CLAIM` | ONE-TIME: only ever returned by this call, never shown again. Bake into `.env` immediately; never log it. |
| `data.identity.centerUrl` | `.env` `K2_CENTER_URL` | Center base URL (the sidecar uses it for both registration and usage reporting). |
| `data.identity.domain` | `.env` `K2_DOMAIN` | Empty → leave empty, sidecar auto-derives `{ipv4-with-dashes}.sslip.io`. |
| `data.operation.params.region` | `create_cloud_instance` `region` + `.env` `K2_NODE_REGION` | Map to the provider's region identifier (Step 2). |
| `data.operation.params.trafficTotalBytes` | `.env` `K2_NODE_TRAFFIC_LIMIT_GB` | Derive GB = `trafficTotalBytes / (1024^3)`. The sold quota (e.g. 950G on a 1T bundle). |
| `data.operation.params.ipType` | provider / bundle selection (Step 2) and `.env` `K2_IP_TYPE` | `residential` vs `non_residential`. Pass it through verbatim; the sidecar reports it to Center so the node is flagged as a residential-IP (住宅IP) exit. Center normalizes any unexpected value to `unknown`, so use exactly `residential` / `non_residential` / `unknown`. |
| `data.operation.subId` | instance name = `pn-<subId>` + `.env` `K2_NODE_NAME` | Deterministic naming → idempotency root. |
Note (post-decoupling): the deploy task carries only business inputs (`region`, `trafficTotalBytes`, `ipType`). Whoever provisions chooses the concrete `provider` / `bundle` / `image` / `k2Version` that satisfies them (Step 2). Pick a bundle whose included transfer comfortably exceeds the sold `trafficTotalBytes` so provider overage never triggers.
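The mapping from the claim response to the deploy inputs can be sketched in Python as follows. This is a hedged illustration: the function names are hypothetical, the `data` argument is assumed to be the `data` object of the claim response shaped as described above, and flooring the GB value with integer division is an assumption (the runbook only gives `trafficTotalBytes / 1024^3`). The returned dictionary holds the one-time claim token, so it must never be printed or logged.

```python
ALLOWED_IP_TYPES = ("residential", "non_residential", "unknown")


def normalize_ip_type(value):
    """Keep the three documented values; anything else becomes 'unknown'."""
    return value if value in ALLOWED_IP_TYPES else "unknown"


def plan_from_claim(data):
    """Derive the instance name, region, and claim-derived .env values.

    `data` is assumed to be {"identity": {...}, "operation": {...}}.
    The result contains a secret (K2_PRIVATE_CLAIM): do not log it.
    """
    identity = data["identity"]
    operation = data["operation"]
    params = operation["params"]
    name = f"pn-{operation['subId']}"  # deterministic name, the idempotency root
    return {
        "instance_name": name,
        "region": params["region"],  # still needs mapping to a provider region id
        "env": {
            "K2_PRIVATE_CLAIM": identity["claimToken"],
            "K2_CENTER_URL": identity["centerUrl"],
            "K2_DOMAIN": identity.get("domain") or "",  # empty: sidecar derives sslip.io
            "K2_NODE_REGION": params["region"],
            # ASSUMPTION: floor to whole GB
            "K2_NODE_TRAFFIC_LIMIT_GB": str(params["trafficTotalBytes"] // 1024**3),
            "K2_IP_TYPE": normalize_ip_type(params.get("ipType")),
            "K2_NODE_NAME": name,
        },
    }
```

For a sold quota of `950 * 1024**3` bytes this yields `K2_NODE_TRAFFIC_LIMIT_GB=950`.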
If claim returns an error envelope (409-ish: already claimed / not found): do not retry blindly. Re-run list_node_operations(action=provision, status=queued) — someone else took it — and pick another, or exit if the queue is empty.
Idempotent re-entry: before creating, probe list_cloud_instances for an instance already named pn-<subId>. If one exists and is running, reuse it (a prior run was interrupted) — skip Step 3's create and resume at Step 4. This is the guard against orphan VPSes.
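A minimal sketch of that probe, assuming `list_cloud_instances` returns a list of records with `name` and `state` fields (both field names are hypothetical; adapt them to the actual response). The runbook only defines the "exists and running" case, so the "exists but not running" branch below is an assumption: it stops for investigation rather than creating a second instance with the same name.

```python
def reentry_decision(instances, sub_id):
    """Return 'reuse', 'create', or 'investigate' for the instance pn-<subId>."""
    wanted = f"pn-{sub_id}"
    matches = [inst for inst in instances if inst.get("name") == wanted]
    if not matches:
        return "create"  # nothing with this name yet: safe to create
    if any(inst.get("state") == "running" for inst in matches):
        return "reuse"  # a prior run was interrupted: resume at Step 4
    # ASSUMPTION: a same-named instance that is not running needs a human look.
    return "investigate"
```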
Step 2: Discover account / region / plan / image
Map the job's abstract spec fields to concrete provider arguments. You choose the bundle/image (the task does not dictate them) — pick a bundle whose included transfer ≥ the sold trafficTotalBytes with headroom, and pin a known-good k2Version (not :latest):
list_cloud_accounts → pick account_name for the chosen provider
list_cloud_regions → map job.region → create_cloud_instance region
list_cloud_plans → choose a plan/bundle whose transfer ≥ sold quota → create_cloud_instance plan
list_cloud_images → choose an Ubuntu 20/22/24 image → create_cloud_instance image_id (provision-node.sh is Ubuntu-only)
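The bundle-sizing rule can be sketched as a small selection function. The plan record shape (`name`, `transfer_bytes`) and the 5% default headroom are assumptions made for illustration; the runbook only requires that included transfer exceed the sold quota with headroom.

```python
def pick_plan(plans, sold_bytes, headroom_pct=5):
    """Pick the smallest plan whose included transfer covers quota plus headroom.

    `plans` is assumed to be a list of {"name": str, "transfer_bytes": int}.
    Returns None when no plan is large enough.
    """
    needed = sold_bytes * (100 + headroom_pct) // 100
    eligible = [p for p in plans if p["transfer_bytes"] >= needed]
    if not eligible:
        return None
    return min(eligible, key=lambda p: p["transfer_bytes"])
```

With a sold quota of 950 GiB and 5% headroom, a plan with 1 TiB of included transfer is the smallest that qualifies, which matches the "950G on a 1T bundle" example.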
Step 3: Create the VPS
create_cloud_instance(
account_name=<from list_cloud_accounts>,
region=<mapped>,
plan=<chosen bundle>,
image_id=<chosen image>,
name=pn-<subId>
)
As soon as the instance ID and public IPv4 are known, report progress so Center sees the work advancing (pass them inside result):
update_node_operation(id=<operationId>, status=in_progress, result={ instanceId: <...>, ipv4: <publicIPv4> })
For `action=provision`, `update_node_operation` accepts `in_progress` or `failed`. `done` is REJECTED by Center here. The terminal completion (`done`) is set by the node itself at self-registration (Step 7).
Step 4: OS provision
- Wait for SSH reachability. New instances answer on port 22 first; `provision-node.sh` step 13 hardens SSH to port 1022 only, so after provisioning all subsequent `exec_on_node` calls use 1022. Use `ping_node` / a trivial `exec_on_node` to confirm reachability before proceeding.
- Run the full OS prep via the mandatory script-pipe form (reads the local file, pipes over SSH stdin; never inline a large script):
exec_on_node(ip=<ipv4>, "sudo bash -s", { scriptPath: "docker/scripts/provision-node.sh", timeout: 300 })
provision-node.sh is a 16-step, idempotent, root-required script: timezone (Asia/Singapore) → snapd removal → swap → Docker CE → IPv6 → BBR → Docker daemon.json → UFW-Docker → SSH 22→1022 → journald persistence + crash monitor → auto-update cron. It runs before any compose deploy and prepares /apps/kaitu-slave/ as the deploy dir. It is destructive (stops containers) — fine on a fresh node.
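The port switch from 22 to 1022 means a reachability check has to try both. The sketch below is a plain TCP probe in Python, offered only as an illustration of that logic; it is not a replacement for `ping_node` or `exec_on_node`, and the function name and timeout are assumptions.

```python
import socket


def first_open_port(host, ports=(22, 1022), timeout=3.0):
    """Return the first port in `ports` that accepts a TCP connection, else None."""
    for port in ports:
        try:
            # create_connection raises OSError on refusal, timeout, or no route
            with socket.create_connection((host, port), timeout=timeout):
                return port
        except OSError:
            continue
    return None
```

On a fresh instance the result is expected to be 22; after `provision-node.sh` has hardened SSH it should be 1022.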
Step 5: Write /apps/kaitu-slave/.env (heredoc — secrets never on the command line)
Write via exec_on_node with a heredoc so secrets never appear in the process list / shell history:
exec_on_node(ip=<ipv4>, "sudo tee /apps/kaitu-slave/.env > /dev/null <<'ENVEOF'\n<contents>\nENVEOF")
Exact variables and their sources:
| `.env` variable | Value / source | Consumer | Secret? |
|---|---|---|---|
| `K2_NODE_SECRET` | agent-generated `openssl rand -hex 32` | sidecar: registration and the Center Basic-auth (`ipv4:secret`) for usage reporting | YES, never log |
| `K2_PRIVATE_CLAIM` | `identity.claimToken` from Step 1 | sidecar: carried on registration → Center flips `Class=private` + owner + activates sub; also switches on host-NIC self-metering | YES, one-time, never log |
| `K2_CENTER_URL` | `identity.centerUrl` | sidecar (registration + `/slave/usage`) + k2v4-slave | no |
| `K2_DOMAIN` | `identity.domain`, or empty | sidecar (empty → auto `{ipv4-with-dashes}.sslip.io`) | no |
| `K2_VERSION` | chosen pin (not `:latest`) | image tags | no |
| `K2_NODE_TRAFFIC_LIMIT_GB` | `job.trafficTotalBytes / 1024^3` | display/load reporting | no |
| `K2_NODE_NAME` | `pn-<subId>` | registration meta | no |
| `K2_NODE_REGION` | `job.region` | registration meta | no |
| `K2_IP_TYPE` | `job.ipType` (`residential` / `non_residential`; omit → `unknown`) | sidecar: reported on registration → Center records `SlaveNode.ip_type` (drives residential-IP (住宅IP) visibility in `/api/v20260717/tunnels` + admin/MCP). Last-writer-wins with ops `update_node`. | no |
How the private node differs (all in the sidecar, base compose only):
- `K2_PRIVATE_CLAIM` → the sidecar (base compose passes `K2_PRIVATE_CLAIM=${K2_PRIVATE_CLAIM:-}`). The sidecar registers and carries the claim (Center activates), and, because the claim is non-empty, starts the host-NIC usage reporter.
- The sidecar already holds `K2_NODE_SECRET` and `K2_CENTER_URL` (base compose) and auto-detects its public IPv4 at registration, so it needs no extra env to report usage. There is no k2s-side metering and no compose override.
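A hedged Python sketch of assembling the `.env` contents and the heredoc command string for this step. The helper names are hypothetical. `secrets.token_hex(32)` is used as a stand-in with the same output shape as `openssl rand -hex 32` (64 hex characters). Emitting every key, including empty ones, is an assumption. The strings returned here contain secrets and must only be handed to `exec_on_node`, never printed or logged.

```python
import secrets

ENV_KEYS = [
    "K2_NODE_SECRET", "K2_PRIVATE_CLAIM", "K2_CENTER_URL", "K2_DOMAIN",
    "K2_VERSION", "K2_NODE_TRAFFIC_LIMIT_GB", "K2_NODE_NAME",
    "K2_NODE_REGION", "K2_IP_TYPE",
]


def new_node_secret():
    """64 hex characters, the same shape as `openssl rand -hex 32`."""
    return secrets.token_hex(32)


def render_env(values):
    """Render KEY=value lines in a fixed order; missing keys become empty."""
    lines = []
    for key in ENV_KEYS:
        value = values.get(key, "")
        if "\n" in value:
            raise ValueError(f"{key} must be a single line")
        lines.append(f"{key}={value}")
    return "\n".join(lines)


def heredoc_command(contents, path="/apps/kaitu-slave/.env", marker="ENVEOF"):
    """Build the quoted-heredoc command; the quoted marker disables shell expansion."""
    if marker in contents.splitlines():
        raise ValueError("contents would terminate the heredoc early")
    return f"sudo tee {path} > /dev/null <<'{marker}'\n{contents}\n{marker}"
```

The marker check matters because a content line equal to `ENVEOF` would end the heredoc early and leave the rest of the file to be run as shell commands.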
Step 6: Deploy the stack (single compose file)
SCP the canonical base file to /apps/kaitu-slave/, then bring up:
- SCP `docker/docker-compose.yml` to `/apps/kaitu-slave/` (use the `scriptPath` upload form per `kaitu-node-ops`, e.g. `exec_on_node(ip, "sudo tee /apps/kaitu-slave/docker-compose.yml > /dev/null", { scriptPath: "docker/docker-compose.yml" })`).
- Also deploy `users` (empty → pure remote auth), `auto-update.sh`, and `k2s-crash-monitor.sh` per `kaitu-node-ops` Step 5 (the crash-monitor + cron steps of `provision-node.sh` look for these in `/apps/kaitu-slave/`).
- Bring up:
exec_on_node(ip=<ipv4>, "cd /apps/kaitu-slave && docker compose -f docker-compose.yml up -d")
Private vs shared is the .env, not the compose: the same docker-compose.yml deploys both. A node is private iff its .env carries K2_PRIVATE_CLAIM — the sidecar then registers the claim (Center activates) and self-meters. A shared-pool node omits K2_PRIVATE_CLAIM → the sidecar registers normally and never calls /slave/usage. There is no separate override file.
Updates use `pull` + `up -d`, never `down`.
Step 7: Verify
Run the kaitu-node-ops post-deployment checklist (containers Up, sidecar healthy, tunnel domain derived, container outbound network, port mapping) plus these private-node specifics:
| Check | Command | Expected |
|---|---|---|
| k2s healthy + tunnel ready | `docker logs --tail 20 k2s \| grep "server ready"` | `k2s server ready listen=:443` |
| Sidecar registered (carried claim) | `docker logs --tail 30 k2-sidecar \| grep "Registration completed"` | `tunnels=1` |
| Usage reporter started | `docker logs --tail 80 k2-sidecar \| grep "usage-reporter"` | `DIAG: usage-reporter-start` (and periodic `usage-reporter-cycle-ok`) |
| Node visible in Center | `list_nodes(name=pn-<subId>)` | one `tunnels` entry with the sslip.io domain |
| Usage recorded in Center | `get_cloud_instance` for the node (or admin cloud list) | `trafficUsedBytes` advancing after some traffic |
| Operation flipped to done | `list_node_operations(action=provision, status=done)` (or check the operation) | operation is `done`, set by node self-registration, NOT by `update_node_operation` |
If the usage-reporter line is absent, re-check that K2_PRIVATE_CLAIM and K2_NODE_SECRET are present in .env and that the sidecar registered (the reporter only starts when PrivateClaim != ""). The sidecar logs Usage reporter disabled (not a private node) when the claim is empty.
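The log checks above reduce to a few substring tests, sketched here in Python. The marker strings come from this runbook; the surrounding log line layout is not specified, so the functions match on substrings only, and the function names are hypothetical. The last helper computes the domain the sidecar is expected to auto-derive, for comparison against the `list_nodes` result.

```python
def registration_ok(sidecar_log):
    """True if a 'Registration completed' line also reports tunnels=1."""
    return any(
        "Registration completed" in line and "tunnels=1" in line
        for line in sidecar_log.splitlines()
    )


def reporter_state(sidecar_log):
    """Classify the usage reporter as 'started', 'disabled', or 'absent'."""
    if "usage-reporter-start" in sidecar_log:
        return "started"
    if "Usage reporter disabled (not a private node)" in sidecar_log:
        return "disabled"  # the claim was empty: re-check K2_PRIVATE_CLAIM in .env
    return "absent"


def expected_sslip_domain(ipv4):
    """Apply the {ipv4-with-dashes}.sslip.io rule to a dotted IPv4 address."""
    return ipv4.replace(".", "-") + ".sslip.io"
```

A `disabled` result on a node meant to be private points at a missing or empty `K2_PRIVATE_CLAIM`, while `absent` suggests the sidecar has not registered yet.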
Step 8: On failure
Any step failing → mark the operation failed so Center frees / alerts on it:
update_node_operation(id=<operationId>, status=failed, error=<concise reason — NEVER include claimToken or K2_NODE_SECRET>)
- Deploy steps are idempotent: within the lease you may self-retry (re-run `provision-node.sh` / re-`up -d`) before giving up.
- Never report `done` for provision. Center rejects it; only the node's self-registration sets the terminal success. If the node never self-registers, Center's timeout-sweep cron marks the sub failed (the authoritative gate, independent of the agent). An agent crash at any point never wedges the sub permanently.
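Two of the rules above lend themselves to a small client-side guard before calling `update_node_operation`: only `in_progress` and `failed` are valid for provision, and the error text must never carry a secret. The following Python sketch is illustrative; the helper name, the argument dictionary shape, and the `[REDACTED]` placeholder are assumptions, not part of the Center API.

```python
PROVISION_STATUSES = ("in_progress", "failed")


def build_update_args(operation_id, status, error=None, secrets_to_redact=()):
    """Build arguments for update_node_operation, refusing 'done' and scrubbing secrets."""
    if status not in PROVISION_STATUSES:
        # 'done' is set only by the node's own self-registration
        raise ValueError(f"status {status!r} is not allowed for action=provision")
    args = {"id": operation_id, "status": status}
    if error is not None:
        for secret in secrets_to_redact:
            if secret:  # skip empty strings, which would match everywhere
                error = error.replace(secret, "[REDACTED]")
        args["error"] = error
    return args
```

Passing both `claimToken` and `K2_NODE_SECRET` in `secrets_to_redact` keeps an accidentally interpolated value out of the report.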
Step 9: Guardrails (mirror kaitu-node-ops)
- `claimToken` + `K2_NODE_SECRET` are untouchable: never echo them to logs, the conversation, or any `update_node_operation` call. Write them only via heredoc (Step 5); never pass them on the command line.
- Deterministic naming `pn-<subId>` = idempotency root: always probe `list_cloud_instances` before creating; reuse a running match instead of spawning an orphan.
- Re-runs are idempotent: `provision-node.sh`, the `.env` write, and `up -d` are all safe to repeat.
- `pull` + `up -d`, never `down`.
- Don't touch other nodes: this runbook only ever operates on the instance it just created (`pn-<subId>`). Never read/modify the `K2_NODE_SECRET` or config of any existing node.
- `K2_DOMAIN` stays empty unless the job supplies a domain; the sidecar auto-derives a globally-unique `{ipv4-with-dashes}.sslip.io`.
- Pick a big-enough bundle: the cost guardrail is bundle sizing. Choose a VPS bundle whose included transfer exceeds the sold `trafficTotalBytes` so the quota cutoff (enforced by Center from the sidecar's self-metered usage) trips before provider overage billing.