name: ai-gateway
description: Debug and troubleshoot ettametta's AI Gateway, the load-balancing reverse proxy for remote GPU worker nodes. Use when node routing fails, health checks stall, provisioning breaks, or inference requests time out.
AI Gateway Debugging
The AI Gateway (port 8133) is a FastAPI reverse proxy that load-balances inference requests across remote GPU worker nodes with model-aware routing.
Quick Diagnostics
# Cluster health + node telemetry
curl http://localhost:8133/health
# Via nginx proxy
curl http://localhost:8000/ai-gateway/health
# Check gateway container
docker compose ps ai-gateway
docker compose logs --tail=50 ai-gateway
# Check registered nodes
docker compose exec ai-gateway python -c "import sqlite3; print(sqlite3.connect('/workspace/gateway_state.db').execute('SELECT * FROM nodes').fetchall())"
Architecture
Core: src/engines/remote_ai_setup/gateway.py
FastAPI app ("AI Cluster Gateway") that acts as a load-balancing reverse proxy.
Node Registry: SQLite-backed (/workspace/gateway_state.db) with tables for jobs and nodes. Nodes seeded from AI_NODES env var (comma-separated URLs).
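The seeding step can be sketched roughly as follows. This is a minimal illustration, not the code in gateway.py: the column layout and the seed_nodes helper are hypothetical, and only the table names (nodes, jobs) and the comma-separated AI_NODES format come from the description above.

```python
import sqlite3


def seed_nodes(conn: sqlite3.Connection, ai_nodes: str) -> list[str]:
    """Create the assumed tables and insert each URL from a comma-separated AI_NODES value."""
    # Hypothetical schema: the real columns in gateway.py may differ.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS nodes (url TEXT PRIMARY KEY, online INTEGER DEFAULT 0)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs (job_id TEXT PRIMARY KEY, node_url TEXT)"
    )
    # Drop empty entries and trailing slashes so duplicates collapse to one row.
    urls = [u.strip().rstrip("/") for u in ai_nodes.split(",") if u.strip()]
    # INSERT OR IGNORE keeps rows that already exist from a previous start.
    conn.executemany("INSERT OR IGNORE INTO nodes (url) VALUES (?)", [(u,) for u in urls])
    conn.commit()
    return [row[0] for row in conn.execute("SELECT url FROM nodes ORDER BY url")]
```

Because the insert is idempotent, running the seed on every start would leave nodes added through /register in place.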
Health Loop: Background async task polls every node's /health every 10 seconds, tracking:
- Online/offline status
- Busy state
- Currently loaded model
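A health loop of this shape could look like the sketch below. It assumes each worker's /health payload exposes busy and model fields, and it takes the HTTP call as an injected probe function (hypothetical) so the example stays free of third-party imports; the actual gateway may structure this differently.

```python
import asyncio
from typing import Awaitable, Callable

# A probe takes a node URL and returns its /health payload, or raises on failure.
Probe = Callable[[str], Awaitable[dict]]


async def poll_once(nodes: dict[str, dict], probe: Probe) -> None:
    """Probe every node concurrently and update its tracked state in place."""
    urls = list(nodes)
    results = await asyncio.gather(*(probe(u) for u in urls), return_exceptions=True)
    for url, result in zip(urls, results):
        if isinstance(result, BaseException):
            # Any failure (timeout, refused connection, bad response) counts as offline.
            nodes[url].update(online=False)
        else:
            nodes[url].update(
                online=True,
                busy=bool(result.get("busy", False)),  # assumed field name
                model=result.get("model"),  # assumed field name
            )


async def health_loop(nodes: dict[str, dict], probe: Probe, interval: float = 10.0) -> None:
    """Run poll_once forever, sleeping `interval` seconds between rounds."""
    while True:
        await poll_once(nodes, probe)
        await asyncio.sleep(interval)
```

Probing concurrently means one unreachable node delays a round by at most its own timeout rather than stalling the checks for every other node.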
Smart Routing (select_best_node):
- Prefer a node that already has the requested model loaded
- Fall back to any idle node
- Fall back to any online node
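The three tiers above can be sketched as a small pure function. The Node dataclass and the exact signature are assumptions made for illustration; only the ordering of the tiers comes from this document.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    url: str
    online: bool = True
    busy: bool = False
    model: Optional[str] = None


def select_best_node(nodes: list[Node], model: Optional[str]) -> Optional[Node]:
    """Return the preferred node for `model`, or None if no node is online."""
    online = [n for n in nodes if n.online]
    # Tier 1: a node that already has the requested model loaded.
    if model is not None:
        for n in online:
            if n.model == model:
                return n
    # Tier 2: any idle node.
    for n in online:
        if not n.busy:
            return n
    # Tier 3: any online node, even a busy one.
    return online[0] if online else None
```

In this sketch a model match wins over idleness, which trades some queueing on a warm node against the cost of loading the model on a cold one.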
Catch-All Proxy (POST /{path:path}):
- Extracts model or model_key from request body
- Infers model from path keywords (hunyuan, animatediff, generate)
- Routes to best available node
- Stores job_id mappings for status/download routing
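The model lookup in the first two steps might be approximated like this. The infer_model helper is hypothetical, and returning the matched keyword itself as the model hint is an assumption; the document does not say how a keyword such as generate maps to a concrete model.

```python
import json
from typing import Optional

# Keywords listed in this document; what each one maps to is an assumption.
PATH_KEYWORDS = ("hunyuan", "animatediff", "generate")


def infer_model(path: str, body: bytes) -> Optional[str]:
    """Return a model hint from the JSON body, else from keywords in the path."""
    try:
        payload = json.loads(body) if body else {}
    except ValueError:
        payload = {}  # non-JSON bodies fall through to path inference
    if isinstance(payload, dict):
        # The body takes priority: check "model" first, then "model_key".
        for key in ("model", "model_key"):
            value = payload.get(key)
            if isinstance(value, str) and value:
                return value
    lowered = path.lower()
    for keyword in PATH_KEYWORDS:
        if keyword in lowered:
            return keyword
    return None
```

A None result would leave routing to the idle and online fallbacks, since there is no model to match against.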
Provisioning (POST /nodes/provision):
- Accepts IP + SSH key (passed via /dev/shm, never persisted)
- Deploys worker via deploy_to_gpu_server.sh
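One way to keep a key in memory-backed storage only for the duration of a deploy is a context manager like the one below. This is a sketch under assumptions: the ephemeral_key name and the gw_key_ prefix are invented for illustration, and how the gateway actually hands the key to deploy_to_gpu_server.sh is not documented here.

```python
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def ephemeral_key(key_text: str, directory: str = "/dev/shm") -> Iterator[str]:
    """Write a private key to an owner-only file in `directory` and delete it on exit."""
    # mkstemp creates the file with mode 0600 on POSIX systems.
    fd, path = tempfile.mkstemp(prefix="gw_key_", dir=directory)
    try:
        with os.fdopen(fd, "w") as handle:
            # SSH clients expect key files to end with a newline.
            handle.write(key_text if key_text.endswith("\n") else key_text + "\n")
        yield path
    finally:
        # Remove the key even if the deploy step raised.
        if os.path.exists(path):
            os.remove(path)
```

The deploy script would run inside the with block and receive the yielded path; once the block exits, nothing remains on tmpfs.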
Key Files
| File | Purpose |
|---|---|
| src/engines/remote_ai_setup/gateway.py | Core gateway: routing, health, provisioning |
| infra/docker/gatekeeper.Dockerfile | Container definition (Python 3.10-slim, fastapi, uvicorn, httpx) |
| infra/docker/nginx.conf | Nginx proxy: /ai-gateway/ → http://ai-gateway:8133 |
| apps/dashboard/src/lib/config.ts | Frontend: AI_GATEWAY_URL = {host}/ai-gateway |
Docker Compose Config
Service: ai-gateway, built from gatekeeper.Dockerfile, port 8133:8133, mounts src/engines/remote_ai_setup into /app, volume gateway_data for /workspace.
Env vars: AI_NODES, INTERNAL_API_TOKEN, AI_CLUSTER_SECRET.
API Endpoints
| Endpoint | Method | Auth | Description |
|---|---|---|---|
| /health | GET | None | Cluster health + node telemetry |
| /status/{job_id} | GET | None | Proxied job status lookup |
| /pulse | POST | X-Worker-Token | Worker heartbeat sink |
| /register | POST | X-Admin-Token | Register a new node |
| /nodes | POST | X-Admin-Token | Register a new node (alias) |
| /nodes/{url} | DELETE | X-Admin-Token | Remove a node |
| /nodes/provision | POST | X-Admin-Token | SSH-deploy a new GPU worker |
| /{path:path} | POST | None | Catch-all proxy with model-aware routing |
Common Issues
All nodes offline
curl -s http://localhost:8133/health | jq '.nodes'
Check if AI_NODES env var is set:
docker compose exec ai-gateway env | grep AI_NODES
Requests routing to busy node
select_best_node can pick a busy node: it prefers a node that already has the model loaded, even if that node is busy, and as a last resort it falls back to any online node. Check node states:
curl -s http://localhost:8133/health | jq '.nodes[] | {url, online, busy, model}'
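For a quicker read than raw jq output, the same payload can be grouped by state. The sketch below assumes .nodes is a list of objects with the url, online, and busy fields used in the jq filter above; summarize_nodes is a hypothetical helper, not part of the gateway.

```python
def summarize_nodes(health: dict) -> dict[str, list[str]]:
    """Group node URLs from a /health payload into offline, busy, and idle."""
    groups: dict[str, list[str]] = {"offline": [], "busy": [], "idle": []}
    for node in health.get("nodes", []):
        if not node.get("online"):
            groups["offline"].append(node["url"])
        elif node.get("busy"):
            groups["busy"].append(node["url"])
        else:
            groups["idle"].append(node["url"])
    return groups
```

An empty idle list alongside a non-empty busy list would be consistent with requests landing on busy nodes through the last-resort fallback.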
Model not found on any node
If no node has the requested model loaded and all are busy, the request fails. Solutions:
- Add more nodes
- Wait for a node to finish
- Pre-load models on nodes
Provisioning fails
SSH key is passed via /dev/shm (memory-backed tmpfs). Check:
- SSH key is valid
- Target IP is reachable from the gateway container
- deploy_to_gpu_server.sh exists and is executable
SQLite state corrupted
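Gateway state lives in /workspace/gateway_state.db on the gateway_data volume, so it survives container restarts. If the file is corrupted, delete it and restart the gateway; this discards all stored node and job records: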
docker compose exec ai-gateway rm /workspace/gateway_state.db
docker compose restart ai-gateway
Job status returns wrong node
The job-to-node mapping is stored in SQLite. If the gateway restarts during a job, the mapping is preserved on the volume. If the node also restarts, the job itself is lost.
Nginx 502 on /ai-gateway/
Check gateway container:
docker compose ps ai-gateway
docker compose exec ai-gateway curl -s http://localhost:8133/health
Worker Heartbeat
Workers send POST /pulse with X-Worker-Token header. Gateway tracks:
- Worker URL
- Online/busy status
- Current model
- Last seen timestamp
If a worker misses 3 heartbeats (~30s), it's marked offline.
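The staleness rule amounts to a single comparison. The sketch below assumes a 10-second heartbeat interval, inferred from "3 heartbeats (~30s)"; the constant and function names are illustrative.

```python
import time
from typing import Optional

HEARTBEAT_INTERVAL = 10.0  # seconds; inferred from "3 heartbeats (~30s)"
MISSED_LIMIT = 3


def is_offline(last_seen: float, now: Optional[float] = None) -> bool:
    """True once more than MISSED_LIMIT heartbeat intervals have passed since last_seen."""
    now = time.monotonic() if now is None else now
    return (now - last_seen) > HEARTBEAT_INTERVAL * MISSED_LIMIT
```

A monotonic clock is used here so that wall-clock adjustments on the gateway host cannot mark healthy workers offline.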
Debugging Checklist
- Gateway up? curl http://localhost:8133/health
- Nodes registered? Check AI_NODES env
- Nodes online? curl -s http://localhost:8133/health | jq '.nodes'
- Nginx proxy working? curl http://localhost:8000/ai-gateway/health
- SQLite intact? docker compose exec ai-gateway ls -la /workspace/gateway_state.db
- Auth tokens set? INTERNAL_API_TOKEN, AI_CLUSTER_SECRET
- Provisioning: SSH key valid, target reachable