name: deploy
description: >
  Roll out the latest local code to the Clack GCE VM. Runs scripts/gce-update-image.sh in the background, surfaces each phase (build → push → prune → pull → drain → swap → ready) via a Monitor, reports the downtime, and finally checks whether local tool mappings have diverged from the VM; if so, prompts to push them too. Trigger when the user says "deploy", "deploy again", "deploy now", "ship it", "redeploy", or any near variant.
Deploy to GCE
Orchestrates the standard image-update deploy for the Clack VM. Replaces the manual sequence of "kick off bash, arm monitor, ack each phase, extract downtime."
Step 1 — kick off the deploy in the background
Bash(
command: "bash scripts/gce-update-image.sh",
description: "Deploy",
run_in_background: true
)
Note the returned task_id (e.g. bey7dhw8e) AND the output file path
(/private/tmp/.../tasks/<task_id>.output). You need both.
Step 2 — arm a Monitor with the standard phase filter
Monitor(
description: "deploy progress",
timeout_ms: 900000, # 15 min — safely above the script's 5-min readiness wait
persistent: false,
command: "tail -f <OUTPUT_FILE> | grep -E --line-buffered \"✓|✗|ERROR|error:|failed|denied|no space|Pre-pulling|Draining|Bot idle|Drain timeout|Drain check skipped|Stopping old|Waiting for|Bot is ready|downtime|Step [0-9]+/[0-9]+ : FROM|Successfully built|Successfully tagged|^DONE|New image pulled|Total reclaimed\""
)
That filter catches every phase marker plus the failure modes the script itself surfaces.
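To sanity-check the filter before relying on it in a real deploy, it can be run against a handful of sample lines. The sketch below reuses the same alternation as the Monitor command. The helper names `filter_phases` and `sample_log` are hypothetical, and the sample lines are illustrative (three phase markers named in this document plus two made-up noise lines), not captured script output.

```shell
#!/usr/bin/env bash
# Same alternation as the Monitor command, stored once for reuse.
FILTER='✓|✗|ERROR|error:|failed|denied|no space|Pre-pulling|Draining|Bot idle|Drain timeout|Drain check skipped|Stopping old|Waiting for|Bot is ready|downtime|Step [0-9]+/[0-9]+ : FROM|Successfully built|Successfully tagged|^DONE|New image pulled|Total reclaimed'

# filter_phases: read log lines on stdin, print only those the Monitor would surface.
filter_phases() {
  grep -E "$FILTER" || true   # grep exits 1 when nothing matches; treat that as "no events"
}

# Hypothetical sample: three phase markers and two noise lines (image name is made up).
sample_log() {
  printf '%s\n' \
    'Step 1/32 : FROM node:20' \
    ' ---> Running in 3f2a' \
    'Pre-pulling new image (bot still running)' \
    'Copying blob sha256:abc' \
    'Bot idle — proceeding'
}

sample_log | filter_phases
```

Only the three marker lines should pass through; the two noise lines are dropped.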
Step 3 — acknowledge each phase event tersely
The user is watching the live stream; don't restate what they already see. One sentence per event, matching the marker:
| Event substring | Reply |
|---|---|
| `Step 1/32 : FROM` ... `Step 8/32 : FROM` | Build phase. |
| `Successfully built <sha>` | Built. |
| `^DONE` (after build) | Pushed. |
| `Pre-pulling new image (bot still running)` | Pre-pulling. |
| `Total reclaimed space: <X> GB` | <X> GB reclaimed. |
| `New image pulled` | Pulled. |
| `Draining: (N runs, M workers) waiting...` | Draining — N runs, M workers active. |
| `Bot idle — proceeding` | Drained. |
| `Drain timeout — still busy` | Drain timed out; swapping anyway. |
| `Drain check skipped` | No status endpoint; skipped drain. |
| `Stopping old container and starting new one` | Downtime clock running. |
| `Waiting for bot to reach 'Clack is ready'` | Polling. |
Step 4 — when the bash task completes, extract the downtime
The bash task fires its own completion notification. On that notification:
Bash(command: "grep -E 'downtime|Bot is ready' <OUTPUT_FILE> | tail -1")
Report it as **Downtime: 28s.**, substituting the actual number of seconds from the matched line.
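The exact wording of the downtime line is not shown in this document, so the following is only a sketch. It assumes the line carries the duration as `<digits>s` (for example `downtime: 28s`); the function name `format_downtime` and the sample line are hypothetical.

```shell
#!/usr/bin/env bash
# format_downtime: turn the last matching log line into the one-line report.
# ASSUMPTION: the line contains the duration as "<digits>s" (e.g. "downtime: 28s");
# the real format printed by gce-update-image.sh is not shown in this document.
format_downtime() {
  local line="$1" secs
  secs="$(printf '%s\n' "$line" | grep -oE '[0-9]+s' | head -n 1)"
  if [ -n "$secs" ]; then
    printf '**Downtime: %s.**\n' "$secs"
  else
    # No duration found: pass the raw line through rather than guessing.
    printf 'Downtime line not recognized: %s\n' "$line"
  fi
}

format_downtime 'Bot is ready (downtime: 28s)'
```

If the real line uses a different unit or layout, the pattern in `grep -oE` is the only part that would need to change.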
Step 5 — check tool mappings divergence
After downtime is reported, compare every local
data/default_configuration/tool_mapping/*.json against the VM:
# Collect the names of local mapping files whose checksum differs on the VM.
DIVERGED=()
for f in data/default_configuration/tool_mapping/*.json; do
  name=$(basename "$f")
  local_md5=$(md5 -q "$f")  # macOS md5; -q prints only the digest
  # An SSH failure or a missing remote file yields an empty digest, which counts as diverged.
  remote_md5=$(gcloud compute ssh clack --zone=northamerica-northeast1-a --quiet \
    --command="sudo md5sum /mnt/disks/clack-data/data/default_configuration/tool_mapping/$name | cut -d' ' -f1" 2>/dev/null)
  [ "$local_md5" != "$remote_md5" ] && DIVERGED+=("$name")
done
[ ${#DIVERGED[@]} -gt 0 ] && printf 'DIVERGED: %s\n' "${DIVERGED[@]}" || echo "IN SYNC"
- If `IN SYNC` → say nothing further beyond the downtime line.
- If any files diverged → say:

  Local tool mappings differ from the VM: `<file1>`, `<file2>`, ... Push them with `bash scripts/gce-push-config.sh`?

  Wait for the user's confirmation. If they say yes, run:

  Bash(command: "bash scripts/gce-push-config.sh --force 2>&1 | grep -vE 'LIBARCHIVE\\.xattr|known_hosts' | grep -E '✓|Streaming|✗'")

  Deploy context implies overwrite intent, so `--force` is appropriate here (the safety check is for accidental clobbers, not authorized ones).
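The Step 5 loop makes one SSH round-trip per file. One possible variation, sketched here under stated assumptions, is to compare local files against a checksum manifest fetched from the VM in a single call. How the manifest is fetched is left out. The names `local_md5` and `diverged_from_manifest`, and the manifest layout (`md5sum` output with bare file names), are assumptions and not part of the existing scripts.

```shell
#!/usr/bin/env bash
# local_md5: portable checksum (md5sum on Linux, md5 -q on macOS).
local_md5() {
  if command -v md5sum >/dev/null 2>&1; then
    md5sum "$1" | cut -d' ' -f1
  else
    md5 -q "$1"
  fi
}

# diverged_from_manifest DIR MANIFEST
# ASSUMPTION: MANIFEST holds "<md5>  <basename>" lines, i.e. md5sum output
# produced from inside the remote directory, with no spaces in file names.
# Prints the basename of every DIR/*.json whose checksum is missing from,
# or differs in, MANIFEST.
diverged_from_manifest() {
  local dir="$1" manifest="$2" f name remote
  for f in "$dir"/*.json; do
    [ -e "$f" ] || continue            # glob matched nothing
    name="$(basename "$f")"
    remote="$(awk -v n="$name" '$2 == n { print $1 }' "$manifest")"
    [ "$(local_md5 "$f")" = "$remote" ] || printf '%s\n' "$name"
  done
}
```

A call such as `diverged_from_manifest data/default_configuration/tool_mapping remote.md5` (with `remote.md5` as a hypothetical manifest file) prints nothing when everything is in sync, which maps onto the `IN SYNC` case above.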
Step 6 — handle the stale monitor event
After the bash task completes, the Monitor often emits one final notification
a few minutes later: [Monitor timed out — re-arm if needed.]. That's
expected. Acknowledge with Stale monitor. Idle. and stop.
Failure modes (from gce-update-image.sh)
The script exits non-zero on:
- Container crash during swap → script prints a `docker logs --tail 80 clack` command
- 5-min timeout waiting for "Clack is ready" → script prints a `docker logs -f clack` command
- `no space left on device` → boot disk full; the script's `docker image prune -f` before pull is meant to prevent this. If it recurs, check `/mnt/stateful_partition` usage on the VM.
- `denied: Unauthenticated request` on pull → the configure-docker step failed; usually a one-off and resolved by re-running.
In every case the script's stderr includes a copy-pasteable diagnostic command. Forward it to the user verbatim.
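As a rough illustration of how these failure signatures could be told apart programmatically, the sketch below maps a single output line to a short label. The match strings come from the list above; the function name `classify_failure`, the labels, and the sample line are hypothetical.

```shell
#!/usr/bin/env bash
# classify_failure: map one line of script output to a short failure label.
# Match strings come from the failure list; the labels themselves are hypothetical.
classify_failure() {
  case "$1" in
    *"no space left on device"*)         echo "disk-full" ;;
    *"denied: Unauthenticated request"*) echo "registry-auth" ;;
    *"docker logs --tail 80 clack"*)     echo "container-crash" ;;
    *"docker logs -f clack"*)            echo "ready-timeout" ;;
    *)                                   echo "unknown" ;;
  esac
}

# Illustrative input only; not captured from a real deploy.
classify_failure 'write /var/lib/docker/tmp/x: no space left on device'
```

The diagnostic command itself should still be forwarded verbatim; a label like this would only help decide which follow-up advice to attach.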
Drain phase (before swap)
Phase 1.5 waits for the running bot to go idle before the hard docker stop, so
an in-flight Claude run isn't killed mid-answer. It probes the running container's
/status endpoint (busy = active query runs OR executing Changes-Workflow runs)
and proceeds the instant the bot is idle.
- A long drain wait is expected, not a hang: the bot is busy. The gate is bounded (`DRAIN_MAX_WAIT`, default 300s) and then swaps anyway, printing what was still active. Don't kill the deploy just because draining takes a while.
- `Drain check skipped` means the running image predates the `/status` endpoint (e.g. the very first deploy of this feature). Harmless; it proceeds.
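The bounded gate described above can be pictured as a polling loop with a deadline. This is a minimal sketch of that behavior, not the script's actual code: the function name `wait_until_idle`, the probe-as-command interface, the 5-second default interval, and the printed messages are all assumptions. Only the 300s default mirrors `DRAIN_MAX_WAIT`.

```shell
#!/usr/bin/env bash
# wait_until_idle PROBE [MAX_WAIT] [INTERVAL]
# Calls PROBE (a command that exits 0 when the bot is idle) every INTERVAL seconds.
# Returns 0 as soon as PROBE succeeds, 1 once MAX_WAIT seconds have elapsed.
# ASSUMPTION: illustrative only; messages and interval are not taken from the script.
wait_until_idle() {
  local probe="$1" max_wait="${2:-300}" interval="${3:-5}" waited=0
  while ! "$probe"; do
    if [ "$waited" -ge "$max_wait" ]; then
      echo "Drain timeout: still busy after ${waited}s, swapping anyway"
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "Bot idle after ${waited}s"
  return 0
}
```

In a real script the probe would query the container's `/status` endpoint. Either return value leads to the swap, which is why a long drain is a delay rather than a failure.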
Gotchas
- macOS `._*` xattr files show up in tar-pipe diffs but are not real content differences. The `gce-push-config.sh` safety check still flags them; that's a known false positive. Use `--force` to bypass.
- The skill is **image-only**: it does NOT push `config.json`, `mcp.json`, or `default_configuration/`. Those live on the persistent disk and need `gce-push-config.sh`. The tool-mapping check at Step 5 catches the most common case where this matters.
- Don't poll for completion. The Bash background task and the Monitor both notify automatically.