name: synapse-installer description: Build and maintain the Synapse auto-installer (v0.6) — pure-bash setup.sh + supporting helpers + bats tests. Use when the user asks to "work on the installer", "improve setup.sh", "add a pre-flight check", "make installation easier", or anything under installer/ / setup.sh in the repo.
Working on the Synapse auto-installer
The installer is v0.6's deliverable: a one-command flow that takes a
fresh VPS to a running Synapse with TLS, secrets, and a registered
admin user. The phased history is captured in the v0.6 section of
docs/ROADMAP.md; the installer code itself
(setup.sh + installer/) is the source of truth for file layout and
behaviour.
When to use this skill (vs. synapse-feature)
- This skill: anything under
setup.sh,installer/, the newcaddyprofile indocker-compose.yml, or operator-facing CLI likesynapse status/upgrade/backup/doctor/uninstall. synapse-featureskill: adding REST endpoints, dashboard UI, Go packages. Don't use that workflow for bash-only changes.
Bash conventions
The plan picks bash deliberately (over Go) so the installer runs on any Linux VPS without a build step. Follow these or you'll regret it when the script fails on a fresh Debian and you can't tell why:
#!/usr/bin/env bash
set -euo pipefail
# Trap on exit to clean up partial state. Defined before any work
# starts so even an early `set -u` failure on an unset variable
# triggers it.
trap 'on_exit $?' EXIT
readonly INSTALLER_VERSION="0.6.0"
readonly LOG_FILE="/var/log/synapse-install.log"
# All sourced helpers live in installer/lib/, dot-included once at
# the top. Avoid sourcing inside conditional blocks — makes the
# control flow harder to follow.
. "$(dirname "$0")/installer/lib/detect.sh"
. "$(dirname "$0")/installer/lib/port.sh"
Rules:
set -euo pipefailat the top. Every script. Catches typos, unset vars, broken pipes.- Quote every variable.
"$foo"not$foo. Even when you "know" it has no spaces — it'll have spaces eventually. - Use
[[ ]]not[ ]for tests.[[ -n "$foo" ]],[[ "$a" == "$b" ]]. Standard bash, not POSIX-portable, but we're bash-targeted explicitly. localeverything inside functions.local foo="bar". Otherwise variables leak into the caller.- Functions return integers (0 = pass, non-zero = fail). Use stdout
for return values that need to be captured:
port=$(find_free_port 8080). - Echo to stderr for status, stdout for capturable output.
>&2 echo "checking docker..."for the user-visible line;echo "$port"for the value the caller will assign. shellcheckclean. CI runsshellcheck -x setup.sh installer/**/*.sh. No exceptions; if shellcheck is wrong, document why with a# shellcheck disable=SC2034 reason: ...comment.
File layout (v0.6.0)
The plan spells this out in detail. Quick reference:
convex-synapse/
├── setup.sh # the entry point
├── installer/
│ ├── install/ # phase scripts (one per major step)
│ │ ├── preflight.sh
│ │ ├── secrets.sh
│ │ ├── caddy.sh
│ │ ├── compose.sh
│ │ ├── verify.sh
│ │ └── ui.sh
│ ├── templates/ # files we render or append
│ │ ├── env.tmpl
│ │ ├── caddy.fragment
│ │ └── caddy.standalone
│ ├── lib/ # pure-function helpers
│ │ ├── detect.sh # has_docker, has_caddy, …
│ │ └── port.sh # find_free_port, port_in_use
│ └── tests/ # bats tests
│ ├── lib_test.bats
│ ├── preflight_test.bats
│ └── fixtures/
│ ├── debian.Dockerfile
│ ├── ubuntu.Dockerfile
│ └── fedora.Dockerfile
└── docker-compose.yml # gains optional `caddy` profile
Color + UI conventions (installer/install/ui.sh)
The installer is a product, not a script. Every output goes
through the helpers below — never raw echo for user-facing lines.
# Color codes only via these wrappers — never hardcoded ANSI escapes
# scattered through the codebase.
ui::success() { printf '\033[32m✓\033[0m %s\n' "$*"; }
ui::warn() { printf '\033[33m!\033[0m %s\n' "$*"; }
ui::fail() { printf '\033[31m✗\033[0m %s\n' "$*" >&2; }
ui::info() { printf '\033[36mℹ\033[0m %s\n' "$*"; }
# A spinner for long ops (image pulls, docker compose up).
ui::spin() {
local msg="$1"; shift
"$@" &
local pid=$!
local spin='-\|/'
local i=0
while kill -0 "$pid" 2>/dev/null; do
i=$(((i + 1) % 4))
printf '\r%s %s' "${spin:$i:1}" "$msg"
sleep 0.1
done
wait "$pid" || return $?
printf '\r✓ %s\n' "$msg"
}
# Yes/No prompt. Default = N when --non-interactive.
ui::confirm() {
local prompt="$1" default="${2:-N}"
[[ "${SYNAPSE_NON_INTERACTIVE:-}" == "1" ]] && {
[[ "$default" == "Y" ]] && return 0 || return 1
}
local hint
[[ "$default" == "Y" ]] && hint="[Y/n]" || hint="[y/N]"
read -rp "$prompt $hint " ans
ans="${ans:-$default}"
[[ "$ans" =~ ^[Yy]$ ]]
}
Don't reinvent these. Anything that needs to print to the
operator goes through ui::*.
Pre-flight check pattern
Every check in preflight.sh follows the same shape:
# Returns 0 (pass), 1 (warn — recoverable / offer to fix), 2 (fail — abort).
preflight::check_docker() {
if ! command -v docker >/dev/null 2>&1; then
ui::fail "Docker is not installed."
ui::info " Install with: curl -fsSL https://get.docker.com | sh"
if ui::confirm "Install Docker now?"; then
curl -fsSL https://get.docker.com | sh || return 2
ui::success "Docker installed."
return 0
fi
return 2
fi
local version
version=$(docker version --format '{{.Server.Version}}' 2>/dev/null || echo 0)
if [[ "$(printf '%s\n' "20.10" "$version" | sort -V | head -n1)" != "20.10" ]]; then
ui::fail "Docker $version detected. 20.10+ required."
ui::info " Upgrade with: sudo apt-get install -y docker-ce"
return 2
fi
ui::success "Docker $version"
return 0
}
The wrapper aggregates results:
preflight::run_all() {
local fails=0
for check in check_os check_arch check_sudo check_docker check_compose \
check_disk check_ram check_outbound check_dns; do
preflight::"$check" || (( fails += $? ))
done
(( fails == 0 ))
}
Idempotency
The most important contract. Re-running setup.sh on a working install must not break it. Specifically:
.envis generated fromtemplates/env.tmplonly when missing. Existing.envis parsed, validated, and re-used. Secrets are NEVER regenerated (would invalidate every JWT in flight).- The Caddyfile fragment is appended only if it's not already there.
Use
grep -q "^${MARKER_BEGIN}$" /etc/caddy/Caddyfilewith a marker comment so we can tell our block from the operator's. docker compose up -dis naturally idempotent; let Docker do the diff.- Re-running with
--upgradetriggers a different code path (pull- restart with backup); without
--upgrade, the script does a health check and exits "already installed".
- restart with backup); without
Failure handling
on_exit() {
local code=$1
if (( code != 0 )); then
ui::fail "Install failed at step: ${CURRENT_STEP:-unknown}"
ui::info "Full log: $LOG_FILE"
# Restore Caddyfile backup if we made one
[[ -f /etc/caddy/Caddyfile.synapse-backup ]] && {
sudo mv /etc/caddy/Caddyfile.synapse-backup /etc/caddy/Caddyfile
ui::warn "Restored Caddyfile from backup."
}
# Stop any partial compose stack
[[ -d /opt/synapse ]] && (cd /opt/synapse && docker compose down 2>/dev/null) || true
fi
}
Each phase script sets CURRENT_STEP at the top so the trap message
is informative.
Testing approach (bats + Docker)
Bash is hard to unit-test, but doable with bats inside disposable Docker containers. The structure:
installer/tests/
├── fixtures/
│ ├── debian.Dockerfile # FROM debian:12 — bare, no Docker
│ ├── ubuntu.Dockerfile # FROM ubuntu:24.04 — Docker pre-installed
│ └── fedora.Dockerfile # FROM fedora:40 — different package manager
├── lib_test.bats # pure-function tests (port.sh, detect.sh)
├── preflight_test.bats # exercise each check on each fixture
└── e2e_test.bats # full setup.sh against a fixture
Pure-function tests run fast (no Docker):
@test "find_free_port returns the input when free" {
run find_free_port 65000
[ "$status" -eq 0 ]
[ "$output" = "65000" ]
}
@test "find_free_port skips taken ports" {
nc -l 65001 &
PID=$!
run find_free_port 65001
kill "$PID"
[ "$status" -eq 0 ]
[ "$output" = "65002" ]
}
Fixture-based tests run inside the container:
@test "preflight passes on debian:12" {
run docker run --rm -v "$BATS_TEST_DIRNAME/..:/installer" \
synapse-installer-test:debian \
bash /installer/setup.sh --doctor
[ "$status" -eq 0 ]
}
CI runs both. Keep fixtures lean — they should boot in <5s each.
Common gotchas
These are general bash/shell pitfalls. The next section ("Real-world bugs caught on the synapse-test VPS") has the v0.6.0-specific bug list from the chunk-7 → fix-up chain — read both before adding a new chunk.
/bin/shis not bash on Debian. Curl-pipe-shell installers often hit this. Always#!/usr/bin/env bashat the top of every file, and run viabash setup.shnotsh setup.sh.mktempsyntax differs across systems. Usemktemp -d(POSIX) notmktemp -d /tmp/foo.XXXX(BSD vs GNU differ on the suffix).grep -Pis GNU-only. Stick to ERE (grep -E) so macOS dev loops work.readlink -fis GNU-only. On macOS userealpathor pipe throughpython3 -c 'import os; print(os.path.realpath(...))'.sudomay not be installed in container fixtures. Detect withcommand -v sudoand fall back to direct execution when running as root.- The Docker socket isn't mounted in test fixtures. Tests that
hit
docker psneed a different fixture or stubbed binary. set -edoesn't catch failures insideif/while/||. This is fine — those are the bash idioms — but it meansif maybe_fail; then ...will continue past a failure. Be intentional.- Color codes break in CI logs. Wrap them in a
[[ -t 1 ]]check — only emit ANSI when stdout is a TTY.
Real-world bugs caught on the synapse-test VPS
Across v0.6.0 chunk 7, fix-ups #23/#24/#25, and the v0.6.1/v0.6.2/v0.6.3 sessions, real-VPS smokes on a fresh Hetzner CPX22 (Ubuntu 24.04) surfaced 13 distinct bug classes that the bats suite alone could not catch. Each one is now a regression test; read this list before adding a chunk so the lessons don't have to be relearned:
[[ -n "$X" ]] && cmdat the end of a function — when the test is false, the function returns 1 andset -eaborts the whole script. Use explicitif/fifor any top-level conditional. Fixed insetup.sh::phase_banner,secrets.sh::ensure_env, andcaddy.sh::_render.docker compose pullon services withbuild:— synapse and dashboard have no published image, so pull returns "pull access denied" and aborts. Useup -d --build(which builds local services and pulls the rest) instead of pull-then-up.- Missing
jqanddigon a fresh Ubuntu — the installer shells out to both.phase_install_depsin setup.sh apt-get-installs them as part of the flow; preflight checks are insufficient because they don't auto-install. - camelCase API responses, not snake_case — Synapse follows the
Convex Cloud OpenAPI shape.
accessToken,projectId,convexUrl, NOTaccess_token/project_id/convex_url.verify.shextracts with both as fallback. - Convex backend image needs pre-pull — Synapse calls
docker runagainstghcr.io/get-convex/convex-backend:latestdirectly when provisioning the first deployment; without the image already pulled it 500s.phase_compose_uprunsdocker pullaftercompose up. --no-tls+verify::check_cli_credsis incompatible by design — without a domain,SYNAPSE_PUBLIC_URLis empty and CLI URLs fall back to loopback.verify::run --skip-cli-url-checkopts out. setup.sh passes the flag automatically whenNO_TLS=1orDOMAIN=="".SYNAPSE_PUBLIC_URLempty on--no-tlsblanks the dashboard from a remote browser — Next.js bakes the URL at build time, so the JS bundle hard-codedlocalhost:8080.setup.shnow callsdetect::public_ip(api.ipify.org → ifconfig.me) when DOMAIN is empty and useshttp://<ip>:<port>as the public URL.NEXT_PUBLIC_*is a build-time arg, not runtime env — even after passing the rightPUBLIC_SYNAPSE_URLvalue to docker-compose, the dashboard image still hadlocalhost:8080baked in because the Dockerfile usesARG NEXT_PUBLIC_*with a default. docker-compose.yml now passes it asbuild.args, not justenvironment:.SYNAPSE_PUBLIC_URLlived in.envbut never reached the synapse-api container —.envwas used for compose variable expansion only. The synapse service'senvironment:block didn't reference it. Container env was empty;config.PublicURLparsed to ""; rewrite was a no-op. Fixed indocker-compose.yml.trap RETURNfires on every nested function return, not just the trap-setting function.lifecycle::upgradeset up a trap torm -rf $tmp_clone— which fired the momentui::spinreturned from the snapshot phase, deleting the clone target before rsync ran. Fix: wrap the inner logic, cleanup once on the outer wrapper's return, hand the path back viaprintf -v+ a non-shadowing var name. Seelifecycle::upgrade/lifecycle::_upgrade_inner.${SYNAPSE_VERSION}as a docker tag rejects/. The upgrade flow stampedfeat/installer-upgradeinto.envafter a successful upgrade; the next compose build taggedsynapse-dashboard:feat/installer-upgrade— Docker rejected with "invalid reference format". Two-pronged fix: pinsynapse:local/synapse-dashboard:localin docker-compose (decouple tag from version) AND sanitize/→-in the stamp belt-and-suspenders.- Compose volume names defy prediction. The pgdata volume on
one VPS was
synapse_synapse-pgdataeven though the install dir was/opt/synapse-test(compose project name doesn't always track basename —COMPOSE_PROJECT_NAMEenv, parent dir of compose-file, operator overrides all interact). Don't try to predict; iteratedocker volume ls -q | grep 'synapse-pgdata$'and rm everything that matches. pg_isreadyreturns 0 during postgres's first-init shutdown cycle. A fresh pgdata triggers init → user-create → SHUTDOWN → real-restart.pg_isreadypasses during the first boot, then psql connections fail during the shutdown window with "the database system is shutting down". Wait forpsql -tAc 'SELECT 1'to succeed, not pg_isready. (Also:bash -c "pg_dump | gzip > out"doesn't auto-inheritset -o pipefail; either set it inside the bash -c or decompress to a sibling .sql file and use a< fileredirect — no pipe.)
Plus three follow-ups specific to v0.6.3 (first-run wizard cleanup):
- Convex API uses
POST /<resource>/delete, NOT HTTPDELETE.verify::_curl DELETE4xx'd silently because ofcurl -f. teams.creator_user_id ON DELETE RESTRICTblocks any row-level user delete. TRUNCATE … CASCADE is the surgical idiom for "reset metadata to factory state" after the self-test.- The
/setuproute usesprojectIdas the URL segment, notprojectSlug. The wizard tried to redirect with the slug; the project page fetched it as a UUID and 404'd. Match the existingapp/teams/[team]/page.tsxlink convention.
These are the bug classes a bats suite alone CANNOT catch. Real-VPS validation is part of "done" for any change that touches setup.sh, installer/, docker-compose.yml, lifecycle.sh, the auth/wizard dashboard surface, or a backend handler that emits a URL.
Don't add (anti-features from the plan)
- 20-question wizard. Default-everything-except-domain.
- Web installer that runs before the dashboard exists.
- VPS provisioning (Terraform / cloud APIs). Out of scope.
- Multi-host orchestration (K8s / Helm — v1.0+).
- Custom config language. Render
.envfrom a template; that's it. - Auto-running
synapse upgradefrom cron without explicit opt-in. - Telemetry that sends customer data. Anonymous-only, opt-in, source-visible.
Real-VPS smoke test workflow
The operator provisioned a Hetzner CPX22 dedicated to integration
testing. SSH alias synapse-vps (defined in ~/.ssh/config,
backed by ~/.ssh/synapse-test-vps). Credentials in /.vps/
(gitignored). Reset is free — operator clicks one button on the
Hetzner Cloud Console.
For any change that touches setup.sh, installer/, docker-compose.yml,
or a backend handler that emits a URL:
# 1. Push your branch
git push -u origin <branch>
# 2. Tear down the previous test install (preserves nothing)
ssh synapse-vps 'docker compose -f /opt/synapse-test/docker-compose.yml down -v 2>/dev/null
docker rm -f $(docker ps -aq --filter label=synapse.managed=true) 2>/dev/null
rm -rf /tmp/convex-synapse /opt/synapse-test'
# 3. Clone the branch and run setup.sh end-to-end
ssh synapse-vps 'cd /tmp && git clone -b <branch> https://github.com/Iann29/convex-synapse.git
cd convex-synapse && bash setup.sh --no-tls --skip-dns-check --non-interactive --install-dir=/opt/synapse-test'
# 4. Validate from outside (your dev machine, NOT the VPS)
curl -sf http://178.105.62.81:8080/health # synapse healthy?
curl -sf -o /dev/null -w "%{http_code}\n" http://178.105.62.81:6790/register # dashboard renders?
If something needs a clean OS image (kernel state, package cache), ask the operator to reset via the Hetzner console — they offered free resets for exactly this reason.
When you're stuck
- Re-read the
installer/layout table in this skill +CLAUDE.md— most "where do I put X?" questions are answered there. - Check what Coolify did for the equivalent feature: https://github.com/coollabsio/coolify/blob/main/scripts/install.sh
- Run
shellcheckearly and often — it catches half the bugs before they hit a test. - Test on a fixture first, real VPS second. The fixture catches "works on my Linux but not Debian" failures cheaply.
- If a remote browser sees something different from what curl shows, it's a build-time vs runtime config gap (see bug #8 above).
What "done" looks like (per ticket)
A v0.6 ticket is done when:
-
bash setup.sh --doctorpasses on the relevant fixture(s) -
shellcheckclean - bats tests cover the new logic
- CI's installer job is green (lint + bats)
- Real-VPS smoke passes for any setup.sh / compose / handler-URL change (
ssh synapse-vpsworkflow above) - README's Quickstart still reflects reality after each phase
- Commit message body lists the test fixture(s) you ran against