name: halfstack
description: Diagnose and fix Docker Compose halfstack issues (config mapping, service health, DB/Valkey/etcd inspection, supergraph regeneration)
invoke_method: user
auto_execute: false
enabled: true
tags:
  - dev
  - docker
  - halfstack
  - troubleshooting
Halfstack Troubleshooting & Fix
Diagnose and directly fix issues with the Docker Compose halfstack development environment.
When to Use
- Docker Compose services fail to start or keep restarting
- Config files are missing, stale, or have wrong port/secret values
- Supergraph schema needs regeneration after GQL changes
- Need to inspect DB, Valkey, or etcd state directly
- Halfstack needs to be brought up after a fresh clone or branch switch
Compose File
The runtime compose file is always docker-compose.halfstack.current.yml (project root).
It is generated from docker-compose.halfstack-main.yml (or halfstack-ha.yml for HA mode).
Quick Reference Commands
# Check all halfstack services
docker compose -f docker-compose.halfstack.current.yml ps
# Check a specific service's logs
docker compose -f docker-compose.halfstack.current.yml logs <service-name>
# Restart a specific service
docker compose -f docker-compose.halfstack.current.yml restart <service-name>
# Bring everything up
docker compose -f docker-compose.halfstack.current.yml up -d --wait
Service Names & Profiles
Optional services are gated behind Docker Compose profiles. By default (docker compose up -d)
only the required services start. To include optional ones, pass --profile <name>.
| Service | Image | Purpose | Profile |
|---|---|---|---|
| backendai-half-db | postgres:16.3-alpine | Main database | (required) |
| backendai-half-redis | valkey/valkey:9.1.0-alpine | Cache / pub-sub | (required) |
| backendai-half-etcd | etcd v3.5 | Config store | (required) |
| backendai-half-apollo-router | Hive Gateway | GraphQL federation (manager has 2 GQL servers federated through this) | (required) |
| backendai-half-prometheus | Prometheus | Metrics; the manager queries it for deployment autoscale rule evaluation | (required) |
| backendai-half-otel-collector | OTel Collector | Trace / metric export | telemetry, observability |
| backendai-half-loki | Loki | Log aggregation | telemetry, observability |
| backendai-half-grafana | Grafana | Dashboards | observability |
| backendai-half-tempo | Tempo | Tracing | observability |
| backendai-half-pyroscope | Pyroscope | Profiling | observability |
| backendai-half-db-exporter | postgres-exporter | Postgres metrics | observability |
| backendai-half-redis-exporter | redis_exporter | Valkey metrics | observability |
| backendai-half-minio | MinIO | Object storage | storage |
Profile semantics:
- telemetry: service-level export only (otel-collector + loki). Visualisation (Grafana) and supporting backends (Tempo, Pyroscope, exporters) are typically managed centrally; this profile is a good default for dev installs that just want their logs and traces forwarded.
- observability: superset of telemetry. Brings up the full local stack, including Grafana, Tempo, Pyroscope, and the exporters.
- storage: MinIO only.
Enabling optional profiles
# Required only (default)
docker compose -f docker-compose.halfstack.current.yml up -d --wait
# + telemetry export (OTel collector + Loki forwarding logs/traces to a central monitor)
docker compose -f docker-compose.halfstack.current.yml --profile telemetry up -d --wait
# + full observability stack (Grafana / Tempo / Pyroscope / exporters in addition to telemetry)
docker compose -f docker-compose.halfstack.current.yml --profile observability up -d --wait
# + object storage (MinIO)
docker compose -f docker-compose.halfstack.current.yml --profile storage up -d --wait
# Everything
docker compose -f docker-compose.halfstack.current.yml --profile observability --profile storage up -d --wait
When stopping/removing, profile flags must also be passed for those containers to be torn down:
docker compose -f docker-compose.halfstack.current.yml --profile observability --profile storage down
scripts/delete-dev.sh already passes both profiles so a clean wipe works regardless of what was enabled.
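Because the same profile flags are needed for both `up` and `down`, it can help to keep them in one place. The following is a minimal sketch, not part of the repository: `halfstack_compose` and the `HALFSTACK_PROFILES` / `HALFSTACK_COMPOSE_FILE` variables are hypothetical names introduced here for illustration.

```shell
# Hypothetical wrapper: run docker compose against the halfstack file and
# add one --profile flag per entry in HALFSTACK_PROFILES (space-separated).
HALFSTACK_COMPOSE_FILE="${HALFSTACK_COMPOSE_FILE:-docker-compose.halfstack.current.yml}"

halfstack_compose() {
    # Prepend the profile flags so they appear before the subcommand.
    # The word splitting of HALFSTACK_PROFILES is intentional (sh/bash).
    for hs_profile in ${HALFSTACK_PROFILES:-}; do
        set -- --profile "$hs_profile" "$@"
    done
    docker compose -f "$HALFSTACK_COMPOSE_FILE" "$@"
}
```

With `HALFSTACK_PROFILES="observability storage"` exported, `halfstack_compose up -d --wait` and `halfstack_compose down` would then act on the same set of containers.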
Docker Configs — Files That Must Exist in Project Root
The compose file declares a configs: section. Docker Compose reads these as files.
If a file is missing when docker compose up runs, Docker creates a directory at that path instead.
Once a directory exists where a file should be, even copying the correct file won't help — the directory must be removed first.
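Before removing anything, a read-only check can show which paths have the wrong type. This is a sketch under the assumption that the file and directory names are the ones this document lists for the project root; `check_halfstack_configs` is a hypothetical helper name.

```shell
# Report config paths that are missing or have the wrong type.
# Nothing is modified; the function returns non-zero if any problem is found.
check_halfstack_configs() {
    chk_root="${1:-.}"
    chk_status=0
    # These must be regular files.
    for chk_f in prometheus.yaml otel-collector-config.yaml loki-config.yaml \
                 tempo-config.yaml supergraph.graphql gateway.config.ts; do
        if [ -d "$chk_root/$chk_f" ]; then
            echo "DIR-INSTEAD-OF-FILE $chk_f"; chk_status=1
        elif [ ! -f "$chk_root/$chk_f" ]; then
            echo "MISSING $chk_f"; chk_status=1
        fi
    done
    # These must be directories.
    for chk_d in grafana-dashboards grafana-provisioning; do
        if [ ! -d "$chk_root/$chk_d" ]; then
            echo "NOT-A-DIRECTORY $chk_d"; chk_status=1
        fi
    done
    return "$chk_status"
}
```

Running `check_halfstack_configs` from the project root prints one line per problem and nothing when every path looks right.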
Fix Procedure for Missing Config Files
Step 1: Stop affected services (or all services):
docker compose -f docker-compose.halfstack.current.yml down
Step 2: Check and remove any directories that should be files:
# These MUST be regular files, not directories
for f in prometheus.yaml otel-collector-config.yaml loki-config.yaml \
tempo-config.yaml supergraph.graphql gateway.config.ts; do
[ -d "$f" ] && rm -rf "$f" && echo "Removed directory: $f"
done
# These MUST be directories
for d in grafana-dashboards grafana-provisioning; do
[ -f "$d" ] && rm -f "$d" && echo "Removed file: $d"
done
Step 3: Copy config files from source (same as scripts/install-dev.sh):
# Docker Compose configs (plain copy, no transformation)
cp configs/prometheus/prometheus.yaml ./prometheus.yaml
cp configs/otel/otel-collector-config.yaml ./otel-collector-config.yaml
cp configs/loki/loki-config.yaml ./loki-config.yaml
cp configs/tempo/tempo-config.yaml ./tempo-config.yaml
cp configs/graphql/gateway.config.ts ./gateway.config.ts
# Supergraph — generated, but can be copied from last known-good
cp docs/manager/graphql-reference/supergraph.graphql ./supergraph.graphql
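# Grafana (recursive directory copy; copying the contents avoids nesting when the target directory already exists)
mkdir -p ./grafana-dashboards ./grafana-provisioning
cp -r configs/grafana/dashboards/. ./grafana-dashboards/
cp -r configs/grafana/provisioning/. ./grafana-provisioning/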
Step 4: Ensure volume directories exist:
mkdir -p volumes/postgres-data
mkdir -p volumes/etcd-data
mkdir -p volumes/redis-data
Step 5: Bring services back up:
docker compose -f docker-compose.halfstack.current.yml up -d --wait
Config Source Mapping Reference
| File in project root | Source path | Used by service |
|---|---|---|
| prometheus.yaml | configs/prometheus/prometheus.yaml | backendai-half-prometheus |
| otel-collector-config.yaml | configs/otel/otel-collector-config.yaml | backendai-half-otel-collector |
| loki-config.yaml | configs/loki/loki-config.yaml | backendai-half-loki |
| tempo-config.yaml | configs/tempo/tempo-config.yaml | backendai-half-tempo |
| supergraph.graphql | docs/manager/graphql-reference/supergraph.graphql | backendai-half-apollo-router |
| gateway.config.ts | configs/graphql/gateway.config.ts | backendai-half-apollo-router |
| grafana-dashboards/ | configs/grafana/dashboards/ | backendai-half-grafana (volume mount) |
| grafana-provisioning/ | configs/grafana/provisioning/ | backendai-half-grafana (volume mount) |
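Since the file entries in this mapping are plain copies, a stale root copy can be detected by a byte comparison against its source. The sketch below assumes exactly the file pairs from the table; `find_stale_configs` is a hypothetical name, and the Grafana directories are left out for brevity.

```shell
# Print every root-level config that is missing or differs from its source.
# Each pair is "root-file:source-path", taken from the mapping table.
find_stale_configs() {
    fs_root="${1:-.}"
    for fs_pair in \
        "prometheus.yaml:configs/prometheus/prometheus.yaml" \
        "otel-collector-config.yaml:configs/otel/otel-collector-config.yaml" \
        "loki-config.yaml:configs/loki/loki-config.yaml" \
        "tempo-config.yaml:configs/tempo/tempo-config.yaml" \
        "gateway.config.ts:configs/graphql/gateway.config.ts" \
        "supergraph.graphql:docs/manager/graphql-reference/supergraph.graphql"; do
        fs_dest="$fs_root/${fs_pair%%:*}"
        fs_src="$fs_root/${fs_pair#*:}"
        # cmp -s is silent and exits non-zero on any difference or read error.
        if ! cmp -s "$fs_src" "$fs_dest" 2>/dev/null; then
            echo "STALE-OR-MISSING ${fs_pair%%:*}"
        fi
    done
}
```

A reported `supergraph.graphql` is not necessarily a problem, because that file may have been restored from a last known-good version on purpose.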
Missing or Stale Compose File
If docker-compose.halfstack.current.yml doesn't exist or is outdated:
cp docker-compose.halfstack-main.yml docker-compose.halfstack.current.yml
Then apply port substitutions. Read existing component toml files to determine current ports,
or use defaults from scripts/install-dev.sh:
| Setting | Default | sed pattern |
|---|---|---|
| POSTGRES_PORT | 8101 | s/8100:5432/${POSTGRES_PORT}:5432/ |
| REDIS_PORT | 8111 | s/8110:6379/${REDIS_PORT}:6379/ |
| ETCD_PORT | 8121 | s/8120:2379/${ETCD_PORT}:2379/ |
Note: The source template has 8100/8110/8120 but install-dev.sh defaults are 8101/8111/8121.
Always check existing config files first to determine the correct port.
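The copy and the three substitutions can be combined into one step. This is a minimal sketch that uses only the sed patterns and defaults from the table above; `render_halfstack_compose` is a hypothetical helper, and the port variables should be set from your existing config files when those differ from the defaults.

```shell
# Render the runtime compose file from the template, rewriting host ports.
# POSTGRES_PORT, REDIS_PORT and ETCD_PORT override the install-dev.sh defaults.
render_halfstack_compose() {
    rh_template="${1:-docker-compose.halfstack-main.yml}"
    rh_output="${2:-docker-compose.halfstack.current.yml}"
    rh_pg="${POSTGRES_PORT:-8101}"
    rh_redis="${REDIS_PORT:-8111}"
    rh_etcd="${ETCD_PORT:-8121}"
    # Writing through a redirect avoids the non-portable "sed -i" flag.
    sed -e "s/8100:5432/${rh_pg}:5432/" \
        -e "s/8110:6379/${rh_redis}:6379/" \
        -e "s/8120:2379/${rh_etcd}:2379/" \
        "$rh_template" > "$rh_output"
}
```

Note that this overwrites the output file, so any manual edits to docker-compose.halfstack.current.yml are lost.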
Supergraph / Hive Gateway
The Hive Gateway serves the federated GraphQL schema. Regenerate when:
- GQL schema types or fields change
- New GQL modules are added
- v2 schema is modified
# 1. Generate new schemas and supergraph
./scripts/generate-graphql-schema.sh
# 2. Copy to project root (where compose expects it)
cp docs/manager/graphql-reference/supergraph.graphql ./supergraph.graphql
cp configs/graphql/gateway.config.ts ./gateway.config.ts
# 3. Restart the gateway
docker compose -f docker-compose.halfstack.current.yml restart backendai-half-apollo-router
If manager code is broken and generate-graphql-schema.sh fails,
copy the last known-good supergraph from git:
git show main:docs/manager/graphql-reference/supergraph.graphql > ./supergraph.graphql
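The generate-then-fall-back logic can be sketched as a single function. This is an illustration under the assumption that it runs from the project root; `refresh_supergraph` is a hypothetical name. It writes the git output to a temporary file first, so a failed `git show` does not truncate the supergraph that is already in place.

```shell
# Regenerate the supergraph; if generation fails, restore it from a git ref.
refresh_supergraph() {
    rs_ref="${1:-main}"   # git ref assumed to hold a known-good supergraph
    rs_src="docs/manager/graphql-reference/supergraph.graphql"
    if ./scripts/generate-graphql-schema.sh; then
        cp "$rs_src" ./supergraph.graphql
        return
    fi
    echo "schema generation failed; falling back to $rs_ref:$rs_src" >&2
    if git show "$rs_ref:$rs_src" > ./supergraph.graphql.tmp; then
        mv ./supergraph.graphql.tmp ./supergraph.graphql
    else
        # Leave the existing supergraph untouched and report failure.
        rm -f ./supergraph.graphql.tmp
        return 1
    fi
}
```

The gateway still has to be restarted afterwards for the new schema to take effect.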
Direct Service Inspection
PostgreSQL
PGCONTAINER=$(docker compose -f docker-compose.halfstack.current.yml ps -q backendai-half-db)
# Interactive psql
docker exec -it -e PGPASSWORD=develove $PGCONTAINER psql -U postgres -d backend
# Non-interactive query
docker exec -e PGPASSWORD=develove $PGCONTAINER psql -U postgres -d backend -c "SELECT version();"
# Check databases
docker exec -e PGPASSWORD=develove $PGCONTAINER psql -U postgres -tc "SELECT datname FROM pg_database;"
# Check alembic migration version (manager)
docker exec -e PGPASSWORD=develove $PGCONTAINER psql -U postgres -d backend -c "SELECT * FROM alembic_version;"
# Check alembic migration version (appproxy)
docker exec -e PGPASSWORD=develove $PGCONTAINER psql -U postgres -d appproxy -c "SELECT * FROM alembic_version;"
# List tables
docker exec -e PGPASSWORD=develove $PGCONTAINER psql -U postgres -d backend -c "\dt"
Common fix — appproxy DB missing:
docker exec -e PGPASSWORD=develove $PGCONTAINER psql -U postgres -c "CREATE DATABASE appproxy;"
docker exec -e PGPASSWORD=develove $PGCONTAINER psql -U postgres -c "CREATE ROLE appproxy WITH LOGIN PASSWORD 'develove';"
docker exec -e PGPASSWORD=develove $PGCONTAINER psql -U postgres -d appproxy -c "GRANT ALL ON SCHEMA public TO appproxy;"
./py -m alembic -c alembic-appproxy.ini upgrade head
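`CREATE DATABASE` fails if the database already exists, so a guarded variant may be more convenient when the fix is re-run. The sketch below assumes `PGCONTAINER` is set as in the commands above; `appproxy_db_exists` and `ensure_appproxy_db` are hypothetical helper names, and the role creation and grant steps are intentionally left out.

```shell
# Succeeds only if pg_database already lists the appproxy database.
appproxy_db_exists() {
    # -t (tuples only) and -A (unaligned) make psql print a bare "1" or nothing.
    adb_result=$(docker exec -e PGPASSWORD=develove "$PGCONTAINER" \
        psql -U postgres -tAc "SELECT 1 FROM pg_database WHERE datname = 'appproxy';")
    [ "$adb_result" = "1" ]
}

# Create the appproxy database only when it is missing.
ensure_appproxy_db() {
    if appproxy_db_exists; then
        echo "appproxy database already exists"
    else
        docker exec -e PGPASSWORD=develove "$PGCONTAINER" \
            psql -U postgres -c "CREATE DATABASE appproxy;"
    fi
}
```

The same pattern should work for the role by querying `pg_roles` for `rolname = 'appproxy'` before running `CREATE ROLE`.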
Valkey
REDIS_CONTAINER=$(docker compose -f docker-compose.halfstack.current.yml ps -q backendai-half-redis)
# Ping
docker exec $REDIS_CONTAINER valkey-cli ping
# Info
docker exec $REDIS_CONTAINER valkey-cli info server
docker exec $REDIS_CONTAINER valkey-cli dbsize
# List keys (dev only)
docker exec $REDIS_CONTAINER valkey-cli keys '*'
# Get/check specific key
docker exec $REDIS_CONTAINER valkey-cli get <key>
docker exec $REDIS_CONTAINER valkey-cli type <key>
# Flush all (destructive)
docker exec $REDIS_CONTAINER valkey-cli flushall
etcd
ETCD_CONTAINER=$(docker compose -f docker-compose.halfstack.current.yml ps -q backendai-half-etcd)
# List all keys
docker exec $ETCD_CONTAINER etcdctl get --prefix "" --keys-only
# Get specific key
docker exec $ETCD_CONTAINER etcdctl get <key>
# Common key prefixes
docker exec $ETCD_CONTAINER etcdctl get --prefix "config/redis"
docker exec $ETCD_CONTAINER etcdctl get --prefix "volumes"
# Health check
docker exec $ETCD_CONTAINER etcdctl endpoint health
Or via Backend.AI CLI:
./backend.ai mgr etcd get --prefix ''
./backend.ai mgr etcd get config/redis/addr
./backend.ai mgr etcd put config/redis/addr "127.0.0.1:8111"
MinIO
MINIO_CONTAINER=$(docker compose -f docker-compose.halfstack.current.yml ps -q backendai-half-minio)
# Health check
docker exec $MINIO_CONTAINER curl -sf http://localhost:9000/minio/health/live
# List buckets (set alias first)
docker exec $MINIO_CONTAINER mc alias set local http://localhost:9000 minioadmin minioadmin
docker exec $MINIO_CONTAINER mc ls local/
# Web console: http://127.0.0.1:9001 (minioadmin / minioadmin)
Component Config Files — Port/Secret Consistency
These config files live in the project root and are generated from configs/ templates.
| Config file | Source template | Key transformations |
|---|---|---|
| manager.toml | configs/manager/halfstack.toml | etcd/PG/manager port, ipc-base-path |
| alembic.ini | configs/manager/halfstack.alembic.ini | PG connection string |
| account-manager.toml | configs/account-manager/halfstack.toml | etcd/PG/service port, ipc-base-path |
| alembic-accountmgr.ini | configs/account-manager/halfstack.alembic.ini | PG connection string |
| agent.toml | configs/agent/halfstack.toml | etcd/RPC/watcher port, ipc/var/mount paths, accelerator plugins |
| storage-proxy.toml | configs/storage-proxy/halfstack.toml | etcd port, 2 secrets, volume config, MinIO creds |
| app-proxy-coordinator.toml | configs/app-proxy-coordinator/halfstack.toml | PG/Valkey port, service port, 3 generated secrets |
| alembic-appproxy.ini | configs/app-proxy-coordinator/halfstack.alembic.ini | PG connection string |
| app-proxy-worker.toml | configs/app-proxy-worker/halfstack.toml | Valkey port, service port, same 3 secrets as coordinator |
| webserver.conf | configs/webserver/halfstack.conf | Manager endpoint URL, Valkey addr |
Cross-Config Consistency Rules
- PG port in compose must match manager.toml, alembic.ini, account-manager.toml, app-proxy-coordinator.toml, alembic-appproxy.ini
- Valkey port in compose must match app-proxy-coordinator.toml, app-proxy-worker.toml, webserver.conf
- etcd port in compose must match manager.toml, agent.toml, storage-proxy.toml
- App Proxy secrets: app-proxy-coordinator.toml and app-proxy-worker.toml must share identical api_secret, jwt_secret, permit_hash.secret
- Manager ↔ Storage Proxy: the volume auth secret in etcd (set via dev.etcd.volumes.json) must match storage-proxy.toml's [api.manager] secret
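The three port rules can be approximated with a text search. This is a heuristic sketch, not a parser: it assumes the compose file maps ports as `HOST:CONTAINER` on one line, and it only checks that the host port number appears as a whole word somewhere in each config file. All function names here are hypothetical, and `grep -w` is assumed to be available (GNU, BSD and BusyBox grep provide it).

```shell
# Print the host port mapped to container port $2 in compose file $1.
compose_host_port() {
    sed -n "s/.*[^0-9]\([0-9][0-9]*\):${2}[^0-9]*\$/\1/p" "$1" | head -n 1
}

# $1 = label, $2 = port, remaining arguments = files that should mention it.
check_port_in_files() {
    cp_label="$1"; cp_port="$2"; shift 2
    cp_rc=0
    if [ -z "$cp_port" ]; then
        echo "$cp_label: no host port mapping found in compose file"
        return 1
    fi
    for cp_file in "$@"; do
        if [ ! -f "$cp_file" ]; then
            echo "$cp_label: $cp_file is missing"; cp_rc=1
        elif ! grep -qw -- "$cp_port" "$cp_file"; then
            echo "$cp_label: port $cp_port not found in $cp_file"; cp_rc=1
        fi
    done
    return "$cp_rc"
}

# Apply the PG, Valkey and etcd rules from the list above.
check_halfstack_ports() {
    hp_compose="${1:-docker-compose.halfstack.current.yml}"
    hp_rc=0
    check_port_in_files PG "$(compose_host_port "$hp_compose" 5432)" \
        manager.toml alembic.ini account-manager.toml \
        app-proxy-coordinator.toml alembic-appproxy.ini || hp_rc=1
    check_port_in_files Valkey "$(compose_host_port "$hp_compose" 6379)" \
        app-proxy-coordinator.toml app-proxy-worker.toml webserver.conf || hp_rc=1
    check_port_in_files etcd "$(compose_host_port "$hp_compose" 2379)" \
        manager.toml agent.toml storage-proxy.toml || hp_rc=1
    return "$hp_rc"
}
```

A clean run does not prove the port sits under the right key, so treat any reported line as a pointer to inspect the file rather than a final verdict.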
Regenerating a Component Config
When regenerating, read existing secret values from the current config file and reuse them.
Only generate new secrets (python -c 'import secrets; print(secrets.token_urlsafe(32))') when the config file doesn't exist at all.
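The reuse rule can be sketched as follows. The layout of the secret lines is an assumption: the helper only understands lines of the form `key = "value"`, so a sectioned key such as permit_hash.secret would need separate handling. `reuse_or_generate_secret` and the `PYTHON` override are hypothetical names introduced for this example.

```shell
# $1 = config file, $2 = key name (for example api_secret).
# Prints the existing value, or a fresh secret if the file does not exist.
reuse_or_generate_secret() {
    if [ ! -f "$1" ]; then
        # No config at all: generating a new secret is allowed.
        "${PYTHON:-python3}" -c 'import secrets; print(secrets.token_urlsafe(32))'
        return
    fi
    rg_value=$(sed -n "s/^[[:space:]]*${2}[[:space:]]*=[[:space:]]*\"\([^\"]*\)\".*/\1/p" "$1" | head -n 1)
    if [ -z "$rg_value" ]; then
        # The file exists but the key was not found: refuse to invent a value.
        echo "cannot find $2 in $1" >&2
        return 1
    fi
    printf '%s\n' "$rg_value"
}
```

Refusing to generate when the file exists but the key is not found keeps a parsing miss from silently replacing a secret that other components still rely on.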
Reference scripts/install-dev.sh lines 1016–1142 for the exact sed substitution patterns per component.
Diagnostic Workflow
When halfstack issues are reported, follow this order:
- Check compose file exists: ls -la docker-compose.halfstack.current.yml
- Check service status: docker compose -f docker-compose.halfstack.current.yml ps
- For exited/unhealthy services: read logs with docker compose ... logs <service>
- For config-dependent services (prometheus, otel, loki, tempo, gateway):
  - Verify referenced files exist in project root and are files, not directories
  - If a directory exists where a file should be: stop service → rm -rf <dir> → copy correct file → restart
- For Backend.AI components (manager, agent, etc.): verify .toml/.conf exists and ports match compose
- For DB issues: connect to PostgreSQL directly and check schema/data
- For Valkey/etcd issues: connect directly and inspect state
- Fix the root cause directly; don't just report the problem.
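The first three steps of this workflow are mechanical and can be sketched as one function. `halfstack_triage` is a hypothetical helper; it assumes it is run from the project root and that any services of interest are passed as arguments.

```shell
# Steps 1-3 of the workflow: compose file check, service status, recent logs.
halfstack_triage() {
    ht_compose="${HALFSTACK_COMPOSE_FILE:-docker-compose.halfstack.current.yml}"
    # Step 1: the runtime compose file must exist.
    if [ ! -f "$ht_compose" ]; then
        echo "compose file missing: $ht_compose"
        return 1
    fi
    # Step 2: show every service, including stopped containers.
    docker compose -f "$ht_compose" ps --all
    # Step 3: print the last 50 log lines of each service named as an argument.
    for ht_service in "$@"; do
        docker compose -f "$ht_compose" logs --tail 50 "$ht_service"
    done
}
```

For example, `halfstack_triage backendai-half-db backendai-half-apollo-router` prints the overall status followed by recent logs for those two services.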