name: router-service-recovery
description: Fix common failures of m5-router.service: preset.ini parse errors (orphan lines, invalid flags) and port 8080 conflicts from rogue llama-server processes that block GUI model selection.
Router Service Recovery Skill
When m5-router.service fails to start, follow these steps:
Check for errors in journal
journalctl --user -u m5-router.service --since "5 min ago" --no-pager | grep -E "(fail|error|not recognized)"
Two common error types:
- `failed to parse server config`: an orphan line (bare filename without `key = value`) in the INI
- `option 'X' not recognized in preset 'Y'`: key `X` is not a valid `llama-server` CLI flag
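
The two error types above can be told apart mechanically. The following is a minimal sketch, assuming the journal lines contain the messages exactly as quoted; the function name and the sample lines are hypothetical and only illustrate the matching.

```python
import re

# Patterns built from the two error messages quoted above.
PARSE_ERROR = re.compile(r"failed to parse server config")
BAD_OPTION = re.compile(
    r"option '(?P<key>[^']+)' not recognized in preset '(?P<preset>[^']+)'"
)


def classify_journal_line(line: str) -> str:
    """Return a short diagnosis for one journal line, or '' if neither error matches."""
    match = BAD_OPTION.search(line)
    if match:
        return f"invalid key {match['key']!r} in preset {match['preset']!r}"
    if PARSE_ERROR.search(line):
        return "orphan line (no key = value) somewhere in the INI"
    return ""


# Hypothetical journal excerpts, for illustration only.
sample_lines = [
    "llama-server[123]: option 'backend' not recognized in preset 'qwen'",
    "llama-server[123]: failed to parse server config",
    "systemd[1]: Started m5-router.service",
]
for line in sample_lines:
    print(classify_journal_line(line) or "no known error")
```

The first diagnosis names the key to delete; the second only says that an orphan line exists, so the INI still has to be read to find it.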
Edit `router-preset.ini`
- Path: `~/llm-server/router-preset.ini`
- Every key must be a valid `llama-server` flag. Run `llama-server --help` to verify. Common invalid keys accidentally added: `backend`, `compression`, `t/s`, filenames.
- Remove any orphan lines (bare filenames without `key = value` syntax).
- Ensure model sections have standard keys: `n-gpu-layers`, `cache-type-k`, `cache-type-v`, `flash-attn`, `no-mmap`/`mmap`.
- Do not copy RoPE/sampling params from one model to another. Each architecture has its own values (see "Validate model-specific params" below).
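
The checks in this step can be approximated with a small linter. This is a sketch under stated assumptions: the preset is plain INI with `[section]` headers and `key = value` lines, and `VALID_KEYS` is a hypothetical, incomplete subset; the authoritative list is whatever `llama-server --help` prints.

```python
def find_preset_problems(text: str, valid_keys: set[str]) -> list[str]:
    """List orphan lines and unknown keys in INI-style preset text."""
    problems = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        # Skip blanks, comments, and [section] headers.
        if not line or line.startswith(("#", ";")):
            continue
        if line.startswith("[") and line.endswith("]"):
            continue
        if "=" not in line:
            problems.append(f"line {number}: orphan line {line!r}")
            continue
        key = line.split("=", 1)[0].strip()
        if key not in valid_keys:
            problems.append(f"line {number}: unknown key {key!r}")
    return problems


# Illustrative subset only; build the real set from `llama-server --help`.
VALID_KEYS = {"model", "n-gpu-layers", "cache-type-k", "cache-type-v",
              "flash-attn", "no-mmap", "mmap"}

# Hypothetical preset containing one invalid key and one orphan line.
sample_preset = """[my-model]
model = /models/example.gguf
n-gpu-layers = 99
backend = vulkan
example.gguf
"""
for problem in find_preset_problems(sample_preset, VALID_KEYS):
    print(problem)
```

A check like this only flags candidates; a key reported as unknown may simply be missing from the hand-maintained set.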
Check for port conflicts
lsof -i :8080
If another `llama-server` process is listening, kill it:
kill <PID>

Restart the service (it is a user unit, so use `systemctl --user`, not `sudo`)
systemctl --user restart m5-router.service
systemctl --user status m5-router.service

Verify models are loaded
curl http://localhost:8080/v1/models

Validate model-specific params (RoPE, context, sampling)
When adding a new model, do NOT copy parameters from other sections. Each architecture differs:
- Research the model's official config (HuggingFace model card, `config.json`) for `rope_theta`, context length, etc.
- If `rope_theta` is baked into GGUF metadata, llama-server uses it automatically; no `rope-freq-base` or `rope-scale` is needed.
- Example: Mistral Small 4 uses `rope_theta=1e8` (baked in, 128K native ctx). Leanstral uses `rope_freq_base=8192` + `rope_scale=128`. Copying one to the other is wrong.
- Verify via the `/v1/models` endpoint: check the `args` array for unexpected flags.
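
The last check can be scripted against a saved `/v1/models` response. This sketch assumes the response shape used by the verification snippet later in this document (`data[].status.args`, with flags spelled with leading dashes); the function name and sample response are hypothetical.

```python
ROPE_FLAGS = ("--rope-freq-base", "--rope-scale")


def rope_overrides(models_response: dict) -> dict[str, list[str]]:
    """Map each model id to the RoPE-related flags present in its args."""
    found = {}
    for model in models_response.get("data", []):
        args = model.get("status", {}).get("args", [])
        # Accept both "--flag value" and "--flag=value" spellings.
        hits = [arg.split("=", 1)[0] for arg in args
                if arg.split("=", 1)[0] in ROPE_FLAGS]
        if hits:
            found[model["id"]] = hits
    return found


# Hypothetical response, for illustration only.
sample_response = {"data": [
    {"id": "model-a", "status": {"value": "loaded",
                                 "args": ["--n-gpu-layers", "99"]}},
    {"id": "model-b", "status": {"value": "loaded",
                                 "args": ["--rope-freq-base", "8192",
                                          "--rope-scale", "128"]}},
]}
print(rope_overrides(sample_response))
```

Any model that appears in the result carries explicit RoPE overrides, which is only correct if that model's own config calls for them.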
Pitfalls
- Do not add `t/s` or rate limiting entries to the INI; they are measurements, not settings.
- The preset only accepts CLI flags from `llama-server --help`. Verify each entry.
- Invalid keys like `backend`, `compression`, `t/s` cause an immediate crash with `option 'X' not recognized`.
- Do not copy RoPE/sampling params between models; each architecture has its own values.
- After 2 failed patch attempts, stop and read the full file to ensure correctness.
- Port conflict symptom: if the router starts but the GUI won't show or select models, check for a standalone `llama-server` process on port 8080 (PID from `lsof -i :8080`). Kill it before restarting `m5-router.service`. The router needs exclusive binding to serve multiple models.
- `start-native-router.sh` validator is stricter than llama-server: the script has a hardcoded `KNOWN_KEYS` list that can lag behind llama-server's actual supported flags. If validation fails with "Unknown preset keys" but the key is valid (check `llama-server --help`), add it to `KNOWN_KEYS` in the script. Example: `n-gpu-layers-draft` was missing despite being a valid flag.
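
For scripts that cannot call `lsof`, the port-conflict check can be approximated with a TCP probe. This is a rough sketch, not a replacement for `lsof`: it only reports whether something accepts connections on the port, not which process owns it. The function name is hypothetical.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(1.0)
        # connect_ex returns 0 on success and an errno value otherwise.
        return probe.connect_ex((host, port)) == 0


print("port 8080 in use:", port_in_use(8080))
```

Run it while `m5-router.service` is stopped: a `True` result then points to a stray listener such as a standalone `llama-server`.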
Verification
- Ensure `m5-router.service` is `active (running)`.
- Open WebUI on port 8088 should list all models.
- Check the journal for the `Available models (N)` count and no errors.
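
The journal check can be automated by extracting the count. A minimal sketch, assuming the journal line contains the text `Available models (N)` with N as a plain integer; the function name and sample text are hypothetical.

```python
import re
from typing import Optional


def available_model_count(journal_text: str) -> Optional[int]:
    """Return N from the last 'Available models (N)' line, or None if absent."""
    matches = re.findall(r"Available models \((\d+)\)", journal_text)
    return int(matches[-1]) if matches else None


# Hypothetical journal excerpt, for illustration only.
sample_journal = "router: starting\nrouter: Available models (4)\n"
print(available_model_count(sample_journal))
```

Comparing the returned number with the number of model sections in the preset gives a quick sanity check after each restart.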
Router Mode API Behavior
Important: llama.cpp in router mode does NOT support all OpenAI-compatible endpoints.
Model Loading in Router Mode
The /v1/models/load endpoint does NOT exist in router mode. Models must be handled differently:
- Auto-load mode (`--models-autoload`): Models load on startup. Recommended for benchmarks and scripts.
- Manual mode (`--no-models-autoload`): Models are unloaded by default and must be triggered via a first request to `/v1/chat/completions` with that model. The first request will be slow (model loading); subsequent requests are fast.
Checking Model Status
Poll the /v1/models endpoint to check model load status:
# Use python3 -c rather than a heredoc: a heredoc would replace stdin and discard the piped JSON.
curl -s http://localhost:8080/v1/models | python3 -c '
import json, sys
d = json.load(sys.stdin)
for m in d["data"]:
    if m["id"] == "MODEL_ID":
        print("Status:", m["status"]["value"])
'
Status values: unloaded, loading, loaded, error.
Waiting for Auto-load Completion
For scripts that need to wait for model loading (e.g., benchmarks), poll until status is loaded:
for i in {1..120}; do  # 120 polls, 1 s apart: roughly a 2 minute timeout
  status=$(curl -s http://localhost:8080/v1/models | \
    python3 -c "import json,sys; d=json.load(sys.stdin); \
      print(next((m['status']['value'] for m in d['data'] if m['id']=='MODEL_ID'), 'missing'))")
  if [[ "$status" == "loaded" ]]; then
    echo "Model loaded!"
    break
  fi
  if [[ $i -eq 120 ]]; then
    echo "ERROR: Model failed to load within 120 seconds"
    exit 1
  fi
  sleep 1
done
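
For Python benchmarks, the same wait can be written as a function. This is a sketch, assuming the `/v1/models` response shape shown above; `wait_until_loaded` and `fetch_models` are hypothetical names, and the `fetch` and `sleep` parameters exist only so the loop can be exercised without a running router.

```python
import json
import time
import urllib.request
from typing import Callable

MODELS_URL = "http://localhost:8080/v1/models"


def fetch_models(url: str = MODELS_URL) -> dict:
    """GET the model list from the router and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return json.load(response)


def wait_until_loaded(model_id: str,
                      fetch: Callable[[], dict] = fetch_models,
                      timeout_s: float = 120.0,
                      interval_s: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until model_id reports 'loaded'; return False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        for model in fetch().get("data", []):
            if model["id"] != model_id:
                continue
            status = model["status"]["value"]
            if status == "loaded":
                return True
            if status == "error":
                raise RuntimeError(f"{model_id} reported status 'error'")
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

A benchmark would call `wait_until_loaded("MODEL_ID")` before its first timed request. Unlike the shell loop, this version stops immediately when the status is `error` instead of waiting out the timeout.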
Pitfalls
- Never use `/v1/models/load`; it returns 404 in router mode.
- Don't send `Authorization: Bearer ***` headers with unescaped trailing backslashes. Use `Bearer dummy` for testing or omit the header entirely.
- When using `--models-autoload`, ensure VRAM is sufficient for all models in the preset. The router will fail to start if the models can't be loaded simultaneously.
Adding Custom Chat Templates
When a model's built-in chat template has bugs (e.g. Qwen 3.5 tool calling crashes, thinking bleed, prefix cache invalidation), override it with a custom jinja template.
Obtain the template. Often found in GitHub issue threads or HuggingFace repos. Use the GitHub API to extract it from comments if the HF link is dead:
curl -s "https://api.github.com/repos/OWNER/REPO/issues/NUMBER/comments" | python3 -c "
import json, sys
comments = json.load(sys.stdin)
body = comments[N]['body']  # N = comment index containing the template
start = body.index('\`\`\`\n') + 4
end = body.index('\n\`\`\`', start)
print(body[start:end])
" > ~/llm-server/model-chat-template.jinja

Add to router-preset.ini under the model section:
jinja = true
chat-template-file = /home/cricri/llm-server/model-chat-template.jinja
Both lines are required. `jinja = true` enables the jinja engine; `chat-template-file` points to the override.

Restart the router:
systemctl --user restart m5-router.service

Verify the config was picked up:
curl -s http://localhost:8080/v1/models | python3 -c "
import json, sys
d = json.load(sys.stdin)
for m in d['data']:
    args = ' '.join(m['status']['args'])
    print(f'{m[\"id\"]:25s} jinja={\"--jinja\" in args} template={\"chat-template-file\" in args}')
"

Test the template. Send a chat completion via curl:
# Basic chat (check thinking/reasoning_content)
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"MODEL_ID","messages":[{"role":"user","content":"Hello"}],"max_tokens":100}'

# Tool calling (write payload to file to avoid shell escaping hell)
python3 -c "
import json
payload = {
    'model': 'MODEL_ID',
    'messages': [{'role': 'user', 'content': 'What is the weather in Paris?'}],
    'tools': [{'type': 'function', 'function': {
        'name': 'get_weather',
        'description': 'Get weather',
        'parameters': {'type': 'object',
                       'properties': {'city': {'type': 'string'}},
                       'required': ['city']}}}],
    'max_tokens': 200
}
json.dump(payload, open('/tmp/tool_test.json', 'w'))
"
curl -s http://localhost:8080/v1/chat/completions -H 'Content-Type: application/json' -d @/tmp/tool_test.json
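
The inline extraction in the first step assumes the fence opens with exactly three backticks followed by a newline. A slightly more tolerant variant is sketched below, assuming the template sits in the first fenced block of the comment; it also accepts an info string after the opening fence (such as `jinja`) and CRLF line endings. The function name is hypothetical.

```python
import re
from typing import Optional

FENCE = "`" * 3  # built at runtime so this snippet contains no literal fence


def extract_fenced_block(body: str) -> Optional[str]:
    """Return the contents of the first fenced code block in a markdown string."""
    body = body.replace("\r\n", "\n")  # normalise CRLF line endings
    pattern = re.compile(FENCE + r"[^\n]*\n(.*?)\n" + FENCE, re.DOTALL)
    match = pattern.search(body)
    return match.group(1) if match else None


# Hypothetical comment body, for illustration only.
sample_body = ("Here is the fixed template:\r\n" + FENCE + "jinja\r\n"
               "{{ messages }}\r\nsecond line\r\n" + FENCE + "\r\nHope it helps.")
print(extract_fenced_block(sample_body))
```

Returning `None` when no block is found makes it easy to try the next comment instead of writing an empty template file.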
Pitfalls
- The template file path must be accessible inside the distrobox container. `/home/cricri/` is bind-mounted, so paths like `/home/cricri/llm-server/xxx.jinja` work.
- For complex JSON payloads (tools, multi-turn), always write to a temp file and use `curl -d @file`; shell escaping of nested JSON is error-prone.
- Models auto-load on first request. The first call will be slow (model loading); subsequent calls are fast.
- llama-server does NOT parse XML tool call output back into an OAI `tool_calls` array. The XML appears in `content`. This is expected: the template ensures correct formatting for the model, but the server doesn't do structured parsing.
Switching the Router's Distrobox Container
When moving m5-router.service from one distrobox container to another (e.g. switching from AMDVLK to RADV Vulkan, or upgrading to a new llama.cpp build):
Check if the target container was created via distrobox. If `distrobox enter CONTAINER -- whoami` fails with "unable to find user", the container was created with raw podman and needs to be recreated:
podman stop CONTAINER && podman rm CONTAINER
distrobox create --name CONTAINER --image IMAGE:TAG \
  --additional-flags "--device /dev/dri" \
  --volume /mnt/data2:/mnt/data2 \
  --yes
Key flags: `--device /dev/dri` for GPU access, `--volume /mnt/data2:/mnt/data2` for secondary model storage.

Verify the new container works:
distrobox enter CONTAINER -- llama-server --version
distrobox enter CONTAINER -- ls /usr/share/vulkan/icd.d/

Update `start-native-router.sh` for the target container's Vulkan driver:
- RADV only: `VK_ICD_FILENAMES="/usr/share/vulkan/icd.d/radeon_icd.x86_64.json"`
- Remove ROCm env vars (`HSA_OVERRIDE_GFX_VERSION`, `ROCM_PATH`) if the container is Vulkan-only.

Update `m5-router.service`, replacing all container name references:
ExecStart=/usr/bin/distrobox enter NEW_CONTAINER -- /home/cricri/llm-server/start-native-router.sh
ExecStop=/usr/bin/distrobox enter NEW_CONTAINER -- bash -c "pkill -TERM -f llama-server || true"
ExecStopPost=/usr/bin/distrobox enter NEW_CONTAINER -- bash -c "pkill -9 -f llama-server || true"

Deploy the updated service:
systemctl --user stop m5-router.service
cp ~/llm-server/m5-router.service ~/.config/systemd/user/
systemctl --user daemon-reload
systemctl --user start m5-router.service

Verify: `curl http://localhost:8080/v1/models` should list all models.

Stop the old container:
distrobox stop OLD_CONTAINER --yes
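
Replacing the container name by hand in three `Exec*` lines is easy to get partly wrong. The sketch below performs the substitution on the unit text, assuming every reference has the form `distrobox enter NAME --` as in the lines above; the function name is hypothetical and the script does not write any file.

```python
import re


def retarget_unit(unit_text: str, old: str, new: str) -> str:
    """Replace the container name in every 'distrobox enter <name> --' call."""
    pattern = re.compile(r"(distrobox enter )" + re.escape(old) + r"( --)")
    # A function replacement avoids backslash interpretation in `new`.
    return pattern.sub(lambda m: m.group(1) + new + m.group(2), unit_text)


# Unit excerpt modelled on the lines above, with a placeholder old name.
sample_unit = (
    "ExecStart=/usr/bin/distrobox enter OLD_CONTAINER -- "
    "/home/cricri/llm-server/start-native-router.sh\n"
    'ExecStop=/usr/bin/distrobox enter OLD_CONTAINER -- '
    'bash -c "pkill -TERM -f llama-server || true"\n'
    'ExecStopPost=/usr/bin/distrobox enter OLD_CONTAINER -- '
    'bash -c "pkill -9 -f llama-server || true"\n'
)
print(retarget_unit(sample_unit, "OLD_CONTAINER", "NEW_CONTAINER"))
```

Anchoring the pattern on `distrobox enter` keeps the replacement from touching unrelated text that happens to contain the old name.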
Pitfalls
- Containers created with raw `podman` lack distrobox integration (no user, no init, no /dev bind). Always use `distrobox create` to recreate them.
- `distrobox create` with `--root` requires sudo with a terminal. Omit `--root` for rootless podman (user containers).
- `--pull=false` is not a valid distrobox flag. Omit it to use locally-available images.
- `VK_ICD_FILENAMES` must match what's actually available inside the container. Check with `ls /usr/share/vulkan/icd.d/` and `ls /etc/vulkan/icd.d/` inside the container.
References
- See `~/llm-server/llm-models-combined.md` for model performance.
- Router preset file: `/home/cricri/llm-server/router-preset.ini`
- KNOWN_KEYS validator: `references/known-keys-validator.md`