c1-dev-stack-in-squire

Stand up a full c1 dev stack inside a Squire env — process-compose, postgres, envoy, pub-api, pub-auth, be-* services — wired so an external client can drive c1's gRPC surface end to end with TLS + OAuth2 client_credentials. Use when testing a Latchkey or other c1 client against a real (not stubbed) c1 backend, or when reproducing c1 server-side behavior locally. Triggers on: c1 dev env, squire c1 stack, pc/up, dev-util mint-test-client, test against c1, c1 OAuth client_credentials, run c1 integration tests in squire, repro buildkite integration test, TEST_LOCAL_EXEC, api_no_uplift.

By robert-chiniquy. Updated 6/5/2026.

Standing up a c1 dev stack in a Squire env

This is a runbook with all the friction points encoded. Treat it as a script: if you skip steps, the stack flaps and you spend an hour debugging postgres unix sockets.

When to use

  • Driving the Latchkey CLI (or any c1 client) end-to-end against a real c1 pub-api over TLS with a real OAuth-minted Bearer.
  • Reproducing pub-api / be-session / be-innkeeper behavior locally.
  • Producing a self-contained env you can hand to a teammate by SSH-forwarding envoy 2443.

Prerequisites

  • squire CLI authenticated to the gateway (squire login if needed).
  • An entry in /etc/hosts mapping 127.0.0.1 c1dev.c1.ductone.com (one-time; needed because c1's pub-auth resolves the tenant from the Host header and the dev tenant is c1dev on installation domain c1.ductone.com).
  • The default squire image does not ship with c1 cloned, despite what the generic squire-env-management skill claims. We clone it manually.

Step 1 — create the env

squire new c1-dev --no-open
# wait until: squire env | grep c1-dev | awk '{print $4}' == "running"
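
The wait condition in the comment above can be turned into a polling loop. This is a minimal sketch that assumes, as the comment does, that `squire env` prints one env per line with the state in column 4. The function name `wait_for_env` and the `LIST_CMD` override are hypothetical helpers for illustration, not part of squire.

```shell
# Poll `squire env` until the named env reports "running" in column 4.
# LIST_CMD is a hypothetical override, handy for dry runs without squire.
wait_for_env() {
  name="$1"; tries="${2:-60}"; delay="${3:-5}"
  list_cmd="${LIST_CMD:-squire env}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    # Column 4 of the first line that mentions the env name.
    state=$($list_cmd 2>/dev/null | awk -v n="$name" 'index($0, n) {print $4; exit}')
    [ "$state" = "running" ] && return 0
    i=$((i + 1))
    sleep "$delay"
  done
  echo "env $name not running after $tries polls (last state: ${state:-unknown})" >&2
  return 1
}
```

Usage would look like `wait_for_env c1-dev && echo ready`. The match is a substring match on the name, so an env called c1-dev-2 would also match; pick a unique name.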

Avoid --prompt / --open if you're driving the env from your laptop rather than the in-env OpenCode agent.

Step 2 — clone c1 with the envmgr git_token MCP tool

The default-image squire credential helper handles https://github.com/... URLs after the initial clone, but you need a real token to bootstrap. Pull one from the env's MCP gateway at localhost:9877:

squire ssh <env> -- 'set -e
init() {
  curl -sf -i -X POST http://localhost:9877/mcp \
    -H "content-type: application/json" \
    -H "accept: application/json, text/event-stream" \
    -d "{\"jsonrpc\":\"2.0\",\"id\":1,\"method\":\"initialize\",\"params\":{\"protocolVersion\":\"2024-11-05\",\"capabilities\":{},\"clientInfo\":{\"name\":\"probe\",\"version\":\"1\"}}}" \
  | grep -i ^mcp-session-id | tr -d "\r" | cut -d" " -f2
}
SID=$(init)
call() {
  curl -sf -X POST http://localhost:9877/mcp \
    -H "content-type: application/json" \
    -H "accept: application/json, text/event-stream" \
    -H "mcp-session-id: $SID" -d "$1"
}
call "{\"jsonrpc\":\"2.0\",\"method\":\"notifications/initialized\"}" >/dev/null
call "{\"jsonrpc\":\"2.0\",\"id\":5,\"method\":\"tools/call\",\"params\":{\"name\":\"git_token\",\"arguments\":{\"repo\":\"ductone/c1\"}}}" \
  | jq -r ".result.content[0].text" | jq -r ".token"
'

Then clone (depth 50 is plenty; full clone is slow over ~3M files):

TOK=ghs_...
squire ssh <env> -- "git clone --depth 50 https://x-access-token:$TOK@github.com/ductone/c1 /data/squire/src/c1
git -C /data/squire/src/c1 config user.email squire@conductorone.com
git -C /data/squire/src/c1 config user.name 'Squire Agent'
git -C /data/squire/src/c1 remote set-url origin https://github.com/ductone/c1.git"

Reset the remote URL so the squire credential helper handles future git push (don't bake the short-lived token into the remote URL — it expires in ~30 min).
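
To confirm the token did not stay in the remote URL, a small check like the following may help. It is a sketch only: `url_has_credentials` is a hypothetical helper, and it only recognizes the `scheme://user@host` form used in the clone command above.

```shell
# Return 0 if a git remote URL embeds credentials (scheme://user[:password]@host).
url_has_credentials() {
  case "$1" in
    *://*@*) return 0 ;;
    *)       return 1 ;;
  esac
}
```

Inside the env, `url_has_credentials "$(git -C /data/squire/src/c1 remote get-url origin)" && echo "token still in remote URL"` should print nothing once the remote has been reset.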

Step 3 — pre-fix two known config bugs

Both are env-image quirks, not c1 bugs. Fix them before pc/up so services don't burn max_restarts and get marked Skipped.

Postgres unix socket lock

Postgres tries to create its socket lock file at /run/postgresql/.s.PGSQL.5432.lock, a path that is root-owned in the squire image. Patch process-compose.yaml to point the socket directory at /tmp instead:

squire ssh <env> -- "mkdir -p /tmp/pg-socket
sed -i '/-c port=5432/a\\      -c unix_socket_directories=/tmp/pg-socket' \
  /data/squire/src/c1/dev/process-compose/process-compose.yaml"
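
The sed above appends a line every time it runs, so a second run leaves a duplicate flag. A guarded variant, sketched here under the assumption of GNU sed (as in the Linux env) and with a hypothetical function name, skips the patch when it is already present:

```shell
# Apply the unix_socket_directories patch only if it is not already there.
# $1 is the path to process-compose.yaml.
patch_pg_socket() {
  if grep -q 'unix_socket_directories=' "$1"; then
    return 0   # already patched; another sed run would add a duplicate line
  fi
  # GNU sed one-line append: the text after "a\" is inserted after each match.
  sed -i '/-c port=5432/a\      -c unix_socket_directories=/tmp/pg-socket' "$1"
}
```

Run inside the env as `patch_pg_socket /data/squire/src/c1/dev/process-compose/process-compose.yaml`; the `mkdir -p /tmp/pg-socket` from the original command is still needed.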

Innkeeper Zoho client id / secret can't be empty

gen-env.sh writes empty strings for the Zoho Manage Engine OAuth provider, but the runtime config validation requires min_len=3. The other OAuth providers get placeholder abc1234 strings and the Zoho ones don't, so innkeeper crashloops on startup. Run this after make pc/init (part of Step 4) and before bringing the stack up:

squire ssh <env> -- "sed -i \
  's/^INNKEEPER_ZOHOMANAGEENGINEOAUTHPROVIDER_CLIENT_ID=\$/INNKEEPER_ZOHOMANAGEENGINEOAUTHPROVIDER_CLIENT_ID=abc1234/;
   s/^INNKEEPER_ZOHOMANAGEENGINEOAUTHPROVIDER_CLIENT_SECRET=\$/INNKEEPER_ZOHOMANAGEENGINEOAUTHPROVIDER_CLIENT_SECRET=abc1234/' \
  /data/squire/src/c1/.dev/env/be-innkeeper.env"
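
If another provider ever hits the same validation, listing the empty assignments in an env file is a quick way to spot candidates. This is a sketch with a hypothetical helper name; not every empty variable is a problem, only those the runtime validates with a minimum length.

```shell
# Print the names of variables that are assigned an empty value in an env file.
empty_env_vars() {
  grep -E '^[A-Za-z_][A-Za-z0-9_]*=$' "$1" | cut -d= -f1
}
```

For example, `empty_env_vars /data/squire/src/c1/.dev/env/be-innkeeper.env` run inside the env before the sed above would be expected to include the two Zoho variables.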

Step 4 — build + bring up

# Cold build of all 21 binaries — about 11 minutes on small flavor.
squire ssh <env> -- "nohup nix develop /data/squire/src/c1#localdev \
  --command bash -c 'export GOOS=linux GOARCH=arm64 && \
    make -C /data/squire/src/c1 pc/build && \
    make -C /data/squire/src/c1 pc/init' > /tmp/pc-build.log 2>&1 &"

# Once that's done (poll for a sentinel file, e.g. '&& touch /tmp/pc-build.done'
# appended to the build command), run pc/up via a launcher script. TWO traps:
#   1. process-compose.yaml uses $PWD-relative paths ($PWD/.dev/...), so the
#      working directory MUST be the repo root — launched from $HOME, every
#      service resolves /home/squire/.dev and postgres/dynamodb/temporal fail
#      with missing dirs/jars.
#   2. `make pc/up` does NOT pass -t=false, so it dies on a non-interactive
#      SSH with `terminal entry not found: term not set`. Use raw
#      process-compose with -t=false.
# A launcher script satisfies both (write locally + scp if your harness blocks
# inline `cd`):
cat > /tmp/pc-launch.sh <<'EOF'
#!/bin/bash
cd /data/squire/src/c1 || exit 1
export GOOS=linux GOARCH=arm64
exec nix develop /data/squire/src/c1#localdev --command \
  process-compose up -t=false -f /data/squire/src/c1/dev/process-compose/process-compose.yaml
EOF
scp /tmp/pc-launch.sh <env>.squire:/tmp/pc-launch.sh
squire ssh <env> -- "chmod +x /tmp/pc-launch.sh; setsid -f /tmp/pc-launch.sh > /tmp/pc-up.log 2>&1"

If you have to kill and relaunch process-compose, kill the ORPHANS too: killing the supervisor leaves postgres holding postmaster.pid + port 5432 (and valkey its port), so the next instance's copies crashloop to Skipped while everything else flaps. Find PIDs via ss -tlnp on 5432/6379/8080/2443 and kill -9 them before relaunching. (pkill -f "postgres -D\|postgres:" does NOT work — \| is not alternation in pkill.) Alternatively, if an orphaned postgres is healthy and serving, adopt it: stop the flapping pc copies via curl -X PATCH http://localhost:8080/process/stop/postgres (and valkey) and carry on.
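
The orphan hunt described above can be scripted by filtering `ss -tlnp` output. The sketch below assumes the usual `users:(("name",pid=N,fd=M))` column that ss prints for processes you own; `pids_on_ports` is a hypothetical helper name.

```shell
# Read `ss -tlnp` output on stdin; print the unique PIDs listening on the given ports.
pids_on_ports() {
  ports_re=$(printf '%s|' "$@")
  ports_re=${ports_re%|}
  # Match the local address column (":<port>" followed by whitespace),
  # then pull every pid=N out of the matching lines.
  grep -E ":(${ports_re})[[:space:]]" | grep -oE 'pid=[0-9]+' | cut -d= -f2 | sort -u
}
```

Inside the env, `ss -tlnp | pids_on_ports 5432 6379 8080 2443` prints the candidates. Review the list before passing it to kill -9, since it includes anything bound to those ports, not only c1 services.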

Process-compose exposes a REST API on localhost:8080:

squire ssh <env> -- "curl -sf http://localhost:8080/processes | \
  jq -r '.data[] | \"\(.name): \(.status)\"' | sort"

Wait until postgres / valkey / pub-api / pub-auth / be-session / be-vault / be-innkeeper are all Running and ensure: Completed. The whole bringup takes 1-2 minutes.
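
The readiness condition above can be checked mechanically against the `name: status` lines produced by the jq command. A minimal sketch, with hypothetical names `REQUIRED` and `stack_ready`:

```shell
# Services that must be Running, per the list above.
REQUIRED="postgres valkey pub-api pub-auth be-session be-vault be-innkeeper"

# Read "name: status" lines on stdin; succeed only when every required
# service is Running and the one-shot `ensure` job is Completed.
stack_ready() {
  states=$(cat)
  for svc in $REQUIRED; do
    if ! printf '%s\n' "$states" | grep -qx "$svc: Running"; then
      echo "not ready: $svc" >&2
      return 1
    fi
  done
  printf '%s\n' "$states" | grep -qx 'ensure: Completed'
}
```

Pipe the output of the `squire ssh <env> -- "curl ... | jq ..."` command above into `stack_ready` inside a retry loop with a short sleep. The first service that is not Running is named on stderr, which makes a Skipped be-innkeeper easy to spot.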

If be-innkeeper: Skipped

If you see this even after the Zoho fix, innkeeper hit max_restarts=30 during the early postgres-flapping period and process-compose gave up. Start it manually:

squire ssh <env> -- "set -a; . /data/squire/src/c1/.dev/env/be-innkeeper.env; set +a;
nohup /data/squire/src/c1/build/linux_arm64/be-innkeeper/be-innkeeper \
  > /tmp/innkeeper.log 2>&1 &"

Then re-run dev-util ensure to populate CrossTenantSettings (innkeeper's init code creates this row on first start; without it, anything that calls tenants.TenantDomain returns dynamo: no item found):

squire ssh <env> -- "set -a; . /data/squire/src/c1/.dev/env/dev-shell.env; set +a;
/data/squire/src/c1/build/linux_arm64/dev-util/dev-util ensure"

Step 4b — file gateway needs minio (Latchkey uploads)

The sealed-file gateway (PUT /file/:fileToken on pub-api — KeyPackage publish, secret-value upload, protocol artifacts) writes to the tenant object store, which gen-env.sh points at a REAL AWS bucket (API_TENANTOBJECTSTORES3_BUCKET_NAME=dev-c1-tenant-objects-us-west-2) with empty creds. Squire envs have no AWS credentials, so every upload 504s (hang) or 500s. Run minio and point pub-api at it — the config has an Endpoint field (proto field 5):

squire ssh <env> -- 'curl -sSL -o /tmp/minio https://dl.min.io/server/minio/release/linux-arm64/minio && chmod +x /tmp/minio
mkdir -p /tmp/minio-data/dev-c1-tenant-objects-us-west-2
MINIO_ROOT_USER=dummyaccesskey MINIO_ROOT_PASSWORD=dummysecret \
  nohup /tmp/minio server /tmp/minio-data --address 127.0.0.1:34567 > /tmp/minio.log 2>&1 &
sleep 3; curl -sf http://127.0.0.1:34567/minio/health/live && echo MINIO_LIVE'
# NOTE: use nohup-in-the-ssh-command, and VERIFY MINIO_LIVE prints — a
# setsid -f launch in a one-shot ssh has died silently here before.

squire ssh <env> -- "sed -i \
  's|^API_TENANTOBJECTSTORES3_ACCESS_KEY_ID=.*|API_TENANTOBJECTSTORES3_ACCESS_KEY_ID=dummyaccesskey|; \
   s|^API_TENANTOBJECTSTORES3_SECRET_ACCESS_KEY=.*|API_TENANTOBJECTSTORES3_SECRET_ACCESS_KEY=dummysecret|; \
   /^API_TENANTOBJECTSTORES3_OBJECT_PREFIX=/a\\API_TENANTOBJECTSTORES3_ENDPOINT=http://127.0.0.1:34567' \
  /data/squire/src/c1/.dev/env/pub-api.env
curl -sf -X POST http://localhost:8080/process/restart/pub-api"

Step 5 — mint a client_credentials pair

The dev-util mint-test-client cmd (PR #17295 / merged) creates a user in the target tenant, promotes them to SystemOwnerRoleId, and mints a personal OAuth2 client. Without this cmd you'd be doing direct postgres inserts.

squire ssh <env> -- "set -a; . /data/squire/src/c1/.dev/env/dev-shell.env; set +a;
/data/squire/src/c1/build/linux_arm64/dev-util/dev-util mint-test-client \
  --tenant-domain=c1dev --log_level=error" 2>&1 | grep -E '^(client_|user_|tenant_)'

Output is grep-able:

client_id=mellow-flatcar-10265@c1dev.c1.ductone.com/pcc
client_secret=secret-token:conductorone.com:v1:eyJrdHk...
user_id=3D5vAVJPtjmttwCTphpWsZ2uVav
tenant_id=3D5ijhr15puycSTgo0ol87hz4yE
tenant_domain=c1dev
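
Because the output is plain key=value lines, it can be loaded into shell variables for the later steps. The sketch below splits each line on the first "=" only, since the secret may itself contain "=" characters. `load_mint_output` and the variable names it sets are hypothetical.

```shell
# Read mint-test-client output on stdin and set CLIENT_ID, CLIENT_SECRET,
# and TENANT_DOMAIN in the current shell. Splits on the FIRST "=" only.
load_mint_output() {
  while IFS= read -r line; do
    key=${line%%=*}
    val=${line#*=}
    case "$key" in
      client_id)     CLIENT_ID=$val ;;
      client_secret) CLIENT_SECRET=$val ;;
      tenant_domain) TENANT_DOMAIN=$val ;;
    esac
  done
}
```

Feed it with a redirect or here-document (for example `load_mint_output < /tmp/mint.out`) rather than a pipe, so the assignments happen in the current shell instead of a subshell.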

Multi-principal tests: pass --user-email. The user is keyed by email (default test-cli@dev.local), NOT by --display-name — re-running the cmd without --user-email mints a new client for the SAME user, which silently defeats any two-principal flow (share-to-self). For a second principal: --user-email=test-cli-b@dev.local --display-name=test-cli-b.

The client_id encodes the tenant's installation domain (c1.ductone.com in this default config). If your env has a different INNKEEPER_INSTALLATION_DOMAIN (squire envs sometimes get squire-specific ones like envoy--<env-id>.us-west-2.squire.ductone.com), the client_id will look different and the laptop /etc/hosts entry won't apply.
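
Since the client_id carries the tenant host, it can be extracted to check which hostname the laptop needs to resolve. This sketch assumes the `name@host/suffix` shape seen in the sample output above; `host_from_client_id` is a hypothetical helper.

```shell
# Extract the tenant host from a client_id of the form name@host/suffix.
host_from_client_id() {
  rest=${1#*@}                  # drop everything up to and including the first "@"
  printf '%s\n' "${rest%%/*}"   # drop the "/pcc" style suffix
}
```

If the printed host is not c1dev.c1.ductone.com, the /etc/hosts entry and the --c1-url value in the next step need to use that host instead.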

Step 6 — drive a client from your laptop

Three pieces of laptop setup:

# (a) tunnel envoy 2443 — squire's own `tunnel` mangles TLS bytes; use ssh -L
ssh -fN -L 12443:127.0.0.1:2443 <env>.squire

# (b) /etc/hosts (one-time, requires sudo)
echo "127.0.0.1 c1dev.c1.ductone.com" | sudo tee -a /etc/hosts

# (c) pull the dev CA fresh — it's regenerated by certgen on each pc/init
scp <env>.squire:/data/squire/src/c1/.dev/pki/service-ca.crt /tmp/c1-dev-ca.pem

Then drive the client. For Latchkey:

latchkey \
  --c1-url https://c1dev.c1.ductone.com:12443 \
  --tls-trust-cert /tmp/c1-dev-ca.pem \
  --tls-server-name localhost \
  --client-id "mellow-flatcar-10265@c1dev.c1.ductone.com/pcc" \
  --client-secret "secret-token:..." \
  vault list

Why these flags:

  • URL host is the tenant subdomain so pub-auth's tenants.SplitDomain finds the c1dev tenant and pub-api's authn middleware accepts the request.
  • --tls-server-name=localhost because the dev cert SAN is localhost plus internal-service DNS names — it doesn't include c1dev.c1.ductone.com. The override tells tonic + reqwest to validate against the localhost SAN while the URL host stays c1dev.c1.ductone.com for routing.
  • The CLI exchanges client_credentials against https://c1dev.c1.ductone.com:12443/auth/v1/token (pub-auth, not the legacy /auth/token) on startup, then uses the access token as Bearer.

Smoke test (30s) — is this env still healthy?

Run this when picking up a paused / older env, or when something looks off mid-test, before spending 15 min re-bringing-up. Three layers: process-compose is alive, OAuth still mints, gRPC still answers.

ENV=<env-name>           # e.g. lk-mint-client
CLIENT_ID="..."          # cached from mint-test-client
CLIENT_SECRET="..."

# (1) Inside the env — pc states + critical service health.
squire ssh "$ENV" -- '
  # Process states from the process-compose REST API on localhost:8080.
  curl -sf http://localhost:8080/processes \
    | jq -r ".data[] | \"\(.name) \(.status)\"" \
    | grep -E "envoy|pub-api|pub-auth|be-session|be-innkeeper|postgres|valkey" \
    | awk "{ printf \"%-20s %s\n\", \$1, \$2 }"
  echo "---"
  curl -ksf https://localhost:2443/healthz/ready && echo "envoy: OK" || echo "envoy: FAIL"
'

# (2) From the laptop — OAuth round-trip against the SSH-forwarded
#     envoy. Returns the access_token if pub-auth + dev CA + tunnel
#     all work end to end.
curl -sf --cacert /tmp/c1-dev-ca.pem \
  --resolve c1dev.c1.ductone.com:12443:127.0.0.1 \
  -d grant_type=client_credentials \
  -d client_id="$CLIENT_ID" \
  -d client_secret="$CLIENT_SECRET" \
  https://c1dev.c1.ductone.com:12443/auth/v1/token \
  | jq -r '.access_token // .error_description // .error' | head -c 80; echo

# (3) Trivial gRPC roundtrip via the CLI. Empty list = stack is
#     healthy and your principal has Latchkey perms.
latchkey \
  --c1-url https://c1dev.c1.ductone.com:12443 \
  --tls-trust-cert /tmp/c1-dev-ca.pem \
  --tls-server-name localhost \
  --client-id "$CLIENT_ID" \
  --client-secret "$CLIENT_SECRET" \
  --format json-line \
  vault list
# Expected: {"list":[],"next_page_token":""}

Failure mapping:

  • (1) any of envoy/pub-api/pub-auth not in Running: process-compose has flapped. Open pc/attach, restart the failing service, and consult the Verification chain table below for the usual root causes (postgres unix-socket perms, innkeeper Zoho env, etc.).
  • (2) returns error / error_description: pub-auth path is up but rejecting the credentials. Re-mint with dev-util mint-test-client and update CLIENT_ID/CLIENT_SECRET.
  • (2) curl exits non-zero: SSH tunnel is dead or /etc/hosts lost the c1dev.c1.ductone.com mapping. Re-run the laptop setup one-liners above.
  • (3) succeeds with {"list":[]} but you expected vaults: your principal mints but lacks Latchkey perms — re-check the SystemOwner ServiceRoles + tenant Latchkey FF (Verification table).
  • (3) fails with policy_denied (PermissionDenied: ...): same as the previous bullet; you reached pub-api but the role/FF chain is broken.

Use latchkey auth claims (no extra round-trip) to verify the principal/tenant the CLI is scoped to before driving any device-register or per-tenant flow.

Verification chain — what you should see at each step

Symptom | Meaning
--- | ---
transport: error sending request | Stale CA cert. SCP /data/squire/src/c1/.dev/pki/service-ca.crt fresh.
Invalid input domain: 'localhost:12443' | Forgot the /etc/hosts entry; URL host needs to be the tenant subdomain.
dynamo: no item found (mint-test-client) | be-innkeeper never came up; CrossTenantSettings missing. Restart innkeeper + re-run ensure.
not_found (5) from /auth/v1/token | client_id/secret don't match a row in postgres. Re-run mint-test-client.
oauth2 invalid_client (CLI) | Same as above; CLI maps OAuth invalid_client to Unauthenticated.
policy_denied (PermissionDenied: ...) | Auth chain works; the user just lacks permissions for the specific RPC. SystemOwnerRoleId's ServiceRoles list is a hand-rolled allowlist in pkg/builtin_roles/builtin_roles.go::GetSystemOwner, so newer services aren't in it by default (e.g. Latchkey). Add latchkey_v1.LatchkeyServiceOwnerRole (or whichever new service-role) to the slice and rebuild + restart pub-api and be-session (be-session is what builds the passport). The persisted role record in dynamo is overlaid by builtin_roles.ApplyBuiltinAttributes on every read, so just rebuilding the binaries is enough; no DB migration needed.
unauthenticated | Bearer token invalid or expired (default lifetime is 30 min). Re-run with fresh creds.

Squire-env-specific caveats

  • squire tunnel proxies as a websocket and corrupts TLS handshakes in both directions. Always use ssh -L for TLS-fronted services.
  • The default OpenCode model whitelist on a fresh env may be claude-opus-4-7 only. If you spawn an in-env OpenCode agent and set a different model in prompt_async, the call returns silently with ProviderModelNotFoundError and the agent looks frozen. Always cat .config/opencode/opencode.json | jq '.provider.anthropic.whitelist' first.
  • OpenCode + opus-4-7 will sometimes hit the Anthropic API assistant message prefill 400 error mid-session and stop streaming. The partial work it did is salvageable — check git log and git ls-remote origin from the env; if a branch is pushed, drive the rest from outside.
  • Each squire env's INNKEEPER_INSTALLATION_DOMAIN is set per-env. In cloud-routed envs it's a squire subdomain. In non-cloud-tested envs the default is c1.ductone.com. Always check .dev/env/be-innkeeper.env before composing tenant URLs.
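
The last caveat can be folded into a helper that reads the installation domain from the env file before composing a URL. This is a sketch only: it assumes the variable appears as a plain `KEY=value` line and that the tenant URL is `<tenant>.<installation domain>`, which matches the default c1dev.c1.ductone.com case; squire-specific domains may compose differently. `tenant_url` is a hypothetical name.

```shell
# Print https://<tenant>.<installation domain>:<port>, reading the domain
# from INNKEEPER_INSTALLATION_DOMAIN in the given env file.
tenant_url() {
  tenant="$1"; env_file="$2"; port="${3:-12443}"
  domain=$(sed -n 's/^INNKEEPER_INSTALLATION_DOMAIN=//p' "$env_file" | head -n 1)
  domain=${domain#\"}; domain=${domain%\"}   # tolerate a double-quoted value
  if [ -z "$domain" ]; then
    echo "INNKEEPER_INSTALLATION_DOMAIN not set in $env_file" >&2
    return 1
  fi
  printf 'https://%s.%s:%s\n' "$tenant" "$domain" "$port"
}
```

Inside the env, `tenant_url c1dev /data/squire/src/c1/.dev/env/be-innkeeper.env` prints the URL to compare against what you pass as --c1-url.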

Running c1 integration tests in a Squire env (no docker)

This is a different goal from standing up the running stack above. The tests/... integration suites (e.g. tests/api_no_uplift, run in CI as the buildkite go-...-testapinouplift-api-no-uplift shard) don't need the process-compose services — the suite starts the c1 services in-process. They only need the data backends: postgres, dynamodb-local, temporal, valkey, and an S3 endpoint.

CI runs them with TEST_TEST_CONTAINER=true, which spins those up as docker containers (ci/integration.sh). Squire envs have no docker daemon (neither the base image nor the c1 image has one), so the testcontainer path is dead. Use the docker-less path instead: TEST_LOCAL_EXEC=true (pkg/utest/integration.go::newLocalResourceClient), which runs every backend as a native binary on PATH. The c1 nix localdev devshell already provides postgres, temporal, valkey, and java; there are three things it does not set up for you:

  1. DynamoDBLocal.jar: newLocalResourceClient.allocateDynamoDB looks only in /home/dynamodblocal or /usr/local/dynamodblocal (both root-owned). The squire user has passwordless sudo:
    sudo mkdir -p /usr/local/dynamodblocal && sudo chown "$(id -un)" /usr/local/dynamodblocal
    curl -sSL https://d1ni2b6xgvw0s0.cloudfront.net/v2.x/dynamodb_local_latest.tar.gz \
      | tar xz -C /usr/local/dynamodblocal
    
  2. An S3 endpoint on 127.0.0.1:34567. allocateS3 connects to it (creds dummyaccesskey / dummysecret) but never starts it, and nothing in process-compose does either. Run minio (arm64 env):
    curl -sSL -o /tmp/minio https://dl.min.io/server/minio/release/linux-arm64/minio && chmod +x /tmp/minio
    MINIO_ROOT_USER=dummyaccesskey MINIO_ROOT_PASSWORD=dummysecret \
      setsid -f /tmp/minio server /tmp/minio-data --address 127.0.0.1:34567
    
  3. A UTF-8 locale — pgtest's initdb inherits the shell locale, which is unset on a fresh env. With LC_ALL=C the DB comes up SQL_ASCII and every query dies with simple protocol queries must be run with client_encoding=UTF8 (the config sets PreferSimpleProtocol: true). Use C.UTF-8 (present as C.utf8 in locale -a), not C.
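
Before exporting the locale in item 3, it can be worth confirming that the env actually lists one. A small sketch, with the hypothetical helper name `pick_c_utf8`, that filters `locale -a` output:

```shell
# Read `locale -a` output on stdin; print the first C UTF-8 locale name
# (C.utf8 or C.UTF-8, any case) and fail if none is listed.
pick_c_utf8() {
  found=$(grep -iE '^C\.utf-?8$' | head -n 1)
  [ -n "$found" ] && printf '%s\n' "$found"
}
```

Usage inside the env would be `locale -a | pick_c_utf8 || echo "no C.UTF-8 locale available"`.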

Then run the suite (use go -C since the block-cd hook rejects cd; the first compile is slow, the binary caches after):

nix develop /data/squire/src/c1#localdev --command bash -c '
  export TEST_LOCAL_EXEC=true LC_ALL=C.UTF-8 LANG=C.UTF-8 LC_CTYPE=C.UTF-8
  go -C /data/squire/src/c1 test -vet=off -count=1 -v \
    ./tests/api_no_uplift/... -run TestAPINoUplift -timeout 25m
'

The buildkite shard name maps directly to this: TEST_CASE="TestAPINoUplift|api_no_uplift" becomes -run "TestAPINoUplift" ./tests/api_no_uplift/... (split on the last |, see ci/integration.sh). Launch it detached (setsid -f ... > log; touch done) and poll the log, because squire ssh sessions drop on long holds.
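
The split on the last | can be reproduced with shell parameter expansion. This is a sketch of the mapping as described here, not a copy of ci/integration.sh; `test_case_to_go_args` is a hypothetical name.

```shell
# Split a buildkite TEST_CASE value on its LAST "|" into a -run pattern
# and a package directory under ./tests/.
test_case_to_go_args() {
  run=${1%'|'*}     # everything before the last "|"
  pkg=${1##*'|'}    # everything after the last "|"
  printf '%s "%s" ./tests/%s/...\n' "-run" "$run" "$pkg"
}
```

Splitting on the last | matters because the -run pattern itself may contain | as regex alternation, as in a value like `TestA|TestB|some_pkg`.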

Caveat — this does not match CI's postgres. local-exec uses the devshell's native postgres (currently 18.3); buildkite's testcontainer pins ECR postgres:2 (an older major). A test that passes here can still fail in buildkite (and vice-versa) when the behavior is postgres-version-dependent — e.g. partitioned-table schema handling. So a green local-exec run rules out "broken on modern pg / general staleness," but it does not clear a buildkite failure. To reproduce a version-specific failure you need docker + the pinned image (TEST_TEST_CONTAINER=true with TEST_TEST_CONTAINER_POSTGRES_IMAGE set), which means a docker host, not squire.

Cleanup

# stop the env (preserves state — restart with `start_env`)
squire env <env-id>  # selects
# or via MCP from inside another env: stop_env tool

# delete entirely
squire env delete <env-id>

State on EFS persists between stop/start; the dev CA + postgres data + minted clients all survive.

Install via CLI
npx skills add https://github.com/robert-chiniquy/dotfiles --skill c1-dev-stack-in-squire