name: telegram-bot-deployment
description: Best practices for deploying Telegram bots with Docker, webhooks, process management, monitoring, and scaling strategies
Telegram Bot Deployment
This skill covers everything needed to deploy, monitor, and scale a Telegram bot in production. It applies to both Node.js (grammY, Telegraf) and Python (aiogram, python-telegram-bot) bots.
1. Polling vs Webhook Mode
Long Polling
The bot continuously asks the Telegram API for new updates.
Pros:
- No public URL or SSL certificate required
- Works behind NAT, firewalls, and on local machines
- Simpler initial setup
Cons:
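- Latency depends on how you poll: with a long-poll timeout updates arrive almost immediately, while interval-based short polling adds up to one interval of delay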
- Wastes bandwidth when idle
- Only one process can poll at a time (no horizontal scaling)
When to use: Development, small bots, VPS without a domain, bots behind restrictive firewalls.
// grammY - polling
bot.start();
# aiogram - polling
dp.run_polling(bot)
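Both calls above wrap a loop around the Bot API's getUpdates method. The part that matters for deployment is the offset: an update counts as confirmed once getUpdates is called with an offset higher than its update_id. A minimal sketch of that bookkeeping, assuming a hypothetical helper name (nextOffset); the libraries handle this for you:

```javascript
// Hypothetical helper: compute the offset for the next getUpdates call.
// Passing (highest update_id + 1) confirms everything received so far,
// so Telegram does not deliver those updates again.
function nextOffset(updates, currentOffset) {
  if (updates.length === 0) return currentOffset; // nothing new: keep the old offset
  const highest = Math.max(...updates.map((u) => u.update_id));
  return highest + 1;
}
```

Because the offset lives in a single loop, two pollers would confirm each other's updates, which is why polling is limited to one process.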
Webhook
Telegram pushes updates to your HTTPS endpoint.
Pros:
- Near-instant delivery of updates
- No wasted bandwidth
- Multiple workers can handle incoming requests (scalable)
Cons:
- Requires a public HTTPS URL with a valid certificate
- Slightly more complex setup (reverse proxy, SSL)
When to use: Production deployments, bots handling high traffic, bots deployed alongside a web application.
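// grammY - webhook with express
import express from "express";
import { Bot, webhookCallback } from "grammy";
const bot = new Bot(process.env.BOT_TOKEN);
const app = express();
app.use(express.json());
// secretToken makes grammY reject requests whose X-Telegram-Bot-Api-Secret-Token
// header does not match the secret_token passed to setWebhook.
app.use("/bot-webhook", webhookCallback(bot, "express", { secretToken: process.env.WEBHOOK_SECRET }));
app.listen(3000);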
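# aiogram - webhook with aiohttp
import os
from aiohttp import web
from aiogram.webhook.aiohttp_server import SimpleRequestHandler, setup_application
# secret_token rejects requests without the matching X-Telegram-Bot-Api-Secret-Token header
handler = SimpleRequestHandler(dispatcher=dp, bot=bot, secret_token=os.environ["WEBHOOK_SECRET"])
app = web.Application()
handler.register(app, path="/bot-webhook")
setup_application(app, dp, bot=bot)  # run dispatcher startup/shutdown hooks with the app
web.run_app(app, host="0.0.0.0", port=3000)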
Set the webhook URL via the API:
curl -X POST "https://api.telegram.org/bot<TOKEN>/setWebhook" \
-d "url=https://bot.example.com/bot-webhook" \
-d "secret_token=<RANDOM_SECRET>"
Always use secret_token to verify that requests actually come from Telegram.
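Telegram echoes the secret back in the X-Telegram-Bot-Api-Secret-Token header of every webhook request. If your framework integration does not check it for you, the check can be sketched as below. The function name isTelegramRequest is hypothetical, and the sketch assumes Node.js-style lowercased header names:

```javascript
// Hypothetical helper: returns true only if the request carries the expected secret.
function isTelegramRequest(headers, expectedSecret) {
  const received = headers["x-telegram-bot-api-secret-token"];
  if (typeof received !== "string" || received.length !== expectedSecret.length) {
    return false;
  }
  // Compare every character so the time taken does not reveal
  // the position of the first mismatch.
  let diff = 0;
  for (let i = 0; i < received.length; i++) {
    diff |= received.charCodeAt(i) ^ expectedSecret.charCodeAt(i);
  }
  return diff === 0;
}
```

Reject non-matching requests with a 401 or 403 before parsing the body. In a real service, Node's crypto.timingSafeEqual is a better fit than the hand-written loop.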
2. Docker Compose Setup
Single Bot
# docker-compose.yml
version: "3.8"
services:
  bot:
    build:
      context: .
      dockerfile: Dockerfile
    env_file: .env
    restart: unless-stopped
    depends_on:
      db:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "node", "healthcheck.js"]
      interval: 30s
      timeout: 10s
      retries: 3
    logging:
      driver: json-file
      options:
        max-size: "30m"
        max-file: "3"
  db:
    image: postgres:16-alpine
    volumes:
      - pgdata:/var/lib/postgresql/data
    environment:
      POSTGRES_USER: botuser
      POSTGRES_PASSWORD_FILE: /run/secrets/db_password
      POSTGRES_DB: botdb
    secrets:
      - db_password
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U botuser -d botdb"]
      interval: 10s
      timeout: 5s
      retries: 5
  redis:
    image: redis:7-alpine
    command: redis-server --maxmemory 128mb --maxmemory-policy allkeys-lru
    volumes:
      - redisdata:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 3s
      retries: 3
volumes:
  pgdata:
  redisdata:
secrets:
  db_password:
    file: ./secrets/db_password.txt
Multi-Bot
Run several bots in one compose project. Share the database and Redis.
services:
  bot-main:
    build:
      context: ./bots/main
    env_file: ./bots/main/.env
    restart: unless-stopped
    depends_on:
      db:
        condition: service_healthy
  bot-admin:
    build:
      context: ./bots/admin
    env_file: ./bots/admin/.env
    restart: unless-stopped
    depends_on:
      db:
        condition: service_healthy
  db:
    image: postgres:16-alpine
    volumes:
      - pgdata:/var/lib/postgresql/data
    environment:
      POSTGRES_USER: botuser
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_DB: bots
    # Required: the bots wait for "service_healthy", which needs a healthcheck here
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U botuser -d bots"]
      interval: 10s
      timeout: 5s
      retries: 5
  redis:
    image: redis:7-alpine
    volumes:
      - redisdata:/data
volumes:
  pgdata:
  redisdata:
3. Webhook Setup with Nginx Reverse Proxy + SSL
# /etc/nginx/sites-available/bot.example.com
server {
listen 80;
server_name bot.example.com;
return 301 https://$host$request_uri;
}
server {
listen 443 ssl http2;
server_name bot.example.com;
ssl_certificate /etc/letsencrypt/live/bot.example.com/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/bot.example.com/privkey.pem;
ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers HIGH:!aNULL:!MD5;
# Webhook endpoint
location /bot-webhook {
proxy_pass http://127.0.0.1:3000;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
# Telegram sends JSON, increase buffer for large updates
proxy_buffer_size 16k;
proxy_buffers 4 16k;
}
# Block everything else
location / {
return 404;
}
}
Obtain a certificate with Certbot:
sudo certbot --nginx -d bot.example.com
Telegram requires one of these ports for webhooks: 443, 80, 88, or 8443.
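A typo in the webhook URL is easier to catch before calling setWebhook than after. The following is a minimal sketch of a pre-flight check for the scheme and port rules above; the function name checkWebhookUrl and its messages are hypothetical:

```javascript
// Ports Telegram accepts for webhooks.
const ALLOWED_WEBHOOK_PORTS = new Set(["443", "80", "88", "8443"]);

// Hypothetical helper: returns null if the URL looks acceptable, else a reason string.
function checkWebhookUrl(rawUrl) {
  const url = new URL(rawUrl);
  if (url.protocol !== "https:") return "webhook URL must use https";
  // URL.port is an empty string when the scheme's default port (443) is used.
  const port = url.port === "" ? "443" : url.port;
  if (!ALLOWED_WEBHOOK_PORTS.has(port)) {
    return `port ${port} is not accepted for webhooks`;
  }
  return null;
}
```

The port that matters is the public one Telegram connects to. The bot itself can still listen on 3000 behind the reverse proxy.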
4. PM2 Process Management
Use PM2 when deploying directly on a VPS without Docker.
// ecosystem.config.js
module.exports = {
apps: [
{
name: "telegram-bot",
script: "dist/index.js",
instances: 1, // bots using polling MUST use 1 instance
autorestart: true,
max_memory_restart: "300M",
watch: false,
env_production: {
NODE_ENV: "production",
BOT_MODE: "polling", // or "webhook"
},
error_file: "/var/log/telegram-bot/error.log",
out_file: "/var/log/telegram-bot/out.log",
merge_logs: true,
log_date_format: "YYYY-MM-DD HH:mm:ss Z",
kill_timeout: 10000, // 10s graceful shutdown
listen_timeout: 5000,
},
],
};
Commands:
pm2 start ecosystem.config.js --env production
pm2 save
pm2 startup # auto-start on reboot
pm2 logs telegram-bot --lines 50
pm2 monit # live dashboard
If using webhook mode with multiple workers, set instances to the desired
count and use exec_mode: "cluster".
5. Environment Variables and Secrets
.env File
# .env (never commit this file)
BOT_TOKEN=123456:ABC-DEF1234ghIkl-zyx57W2v1u123ew11
BOT_MODE=polling
DATABASE_URL=postgresql://botuser:secret@localhost:5432/botdb
REDIS_URL=redis://localhost:6379
LOG_LEVEL=info
WEBHOOK_DOMAIN=https://bot.example.com
WEBHOOK_PATH=/bot-webhook
WEBHOOK_SECRET=random-secret-string-here
ADMIN_CHAT_ID=123456789
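A missing variable is cheapest to detect at startup. The sketch below lists which variables are absent. Treating BOT_TOKEN, DATABASE_URL, and REDIS_URL as always required, and the WEBHOOK_* values as required only in webhook mode, is an assumption about this example project, and missingEnv is a hypothetical name:

```javascript
// Hypothetical helper: return the names of required variables that are unset or empty.
function missingEnv(env) {
  const required = ["BOT_TOKEN", "DATABASE_URL", "REDIS_URL"];
  if (env.BOT_MODE === "webhook") {
    // Webhook mode additionally needs the public URL parts and the shared secret.
    required.push("WEBHOOK_DOMAIN", "WEBHOOK_PATH", "WEBHOOK_SECRET");
  }
  return required.filter((name) => !env[name]);
}
```

At startup, call missingEnv(process.env) and exit with a clear message if the returned list is not empty.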
Docker Secrets
For Swarm or Compose, use secrets instead of environment variables for sensitive values.
secrets:
  bot_token:
    file: ./secrets/bot_token.txt
services:
  bot:
    secrets:
      - bot_token
    environment:
      BOT_TOKEN_FILE: /run/secrets/bot_token
Read the secret in code:
import { readFileSync } from "fs";
const token = process.env.BOT_TOKEN_FILE
? readFileSync(process.env.BOT_TOKEN_FILE, "utf-8").trim()
: process.env.BOT_TOKEN;
6. Health Checks and Auto-Restart
Simple Health Check Script (polling mode)
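// healthcheck.js
// Exits 0 if something accepts TCP connections on port 3000, otherwise 1.
// A pure polling bot opens no port, so it must start a small HTTP listener
// (and this port must match it) for the check to pass.
import net from "net";
const client = new net.Socket();
client.setTimeout(5000); // fail instead of hanging until Docker's own timeout
client.connect(3000, "127.0.0.1", () => {
  client.end();
  process.exit(0);
});
client.on("timeout", () => process.exit(1));
client.on("error", () => process.exit(1));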
Application-Level Health Endpoint
Even polling bots should expose an HTTP health endpoint for monitoring.
import express from "express";
const health = express();
health.get("/health", (req, res) => {
res.json({
status: "ok",
uptime: process.uptime(),
botInfo: bot.botInfo?.username ?? "unknown",
mode: process.env.BOT_MODE,
});
});
health.listen(3001);
Docker compose health check:
healthcheck:
  test: ["CMD", "wget", "--spider", "-q", "http://localhost:3001/health"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 15s
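An endpoint that always answers "ok" only proves the process is alive. A bot whose connection to Telegram is stuck would still pass. One possible refinement, sketched here under the assumption that the bot records the time of its last successful Bot API call (a hypothetical lastApiSuccessAt value the application would have to maintain), is to report stale state as a 503:

```javascript
// Hypothetical helper: decide the health response from the age of the
// last successful Bot API call. All timestamps are in milliseconds.
function healthStatus(lastApiSuccessAt, now, maxAgeMs = 60000) {
  const ageMs = now - lastApiSuccessAt;
  if (ageMs <= maxAgeMs) {
    return { code: 200, status: "ok", ageMs };
  }
  // Too long since Telegram last answered: let the orchestrator restart us.
  return { code: 503, status: "stale", ageMs };
}
```

In the /health handler this would become res.status(result.code).json(result). The 60 second default is an arbitrary choice and should exceed the long-poll timeout.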
7. Logging Configuration
Node.js -- pino
import pino from "pino";
export const logger = pino({
level: process.env.LOG_LEVEL || "info",
transport:
process.env.NODE_ENV !== "production"
? { target: "pino-pretty" }
: undefined,
redact: ["botToken", "*.botToken"],
});
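// Log every update
bot.use(async (ctx, next) => {
  const start = Date.now();
  await next();
  const ms = Date.now() - start;
  logger.info({
    updateId: ctx.update.update_id,
    // ctx.updateType exists only in Telegraf; reading the update's single
    // non-update_id key works in grammY as well.
    type: Object.keys(ctx.update).find((key) => key !== "update_id"),
    from: ctx.from?.id,
    chat: ctx.chat?.id,
    ms,
  });
});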
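The redact option above only covers named fields. The bot token can also leak inside free-form strings, most often in Bot API URLs that appear in error messages. A minimal sketch of scrubbing such strings before they are logged follows. The pattern is an approximation of the token format (digits, a colon, then a long run of URL-safe characters), and redactToken is a hypothetical name:

```javascript
// Approximate shape of a bot token: "<bot id>:<30+ URL-safe characters>".
const TOKEN_PATTERN = /\d+:[A-Za-z0-9_-]{30,}/g;

// Hypothetical helper: replace anything that looks like a bot token.
function redactToken(text) {
  return text.replace(TOKEN_PATTERN, "[REDACTED]");
}
```

For example, redactToken(String(err.message)) can be applied before passing an error message to the logger.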
Python -- loguru
from loguru import logger
import sys
logger.remove()
logger.add(
sys.stdout,
format="{time:YYYY-MM-DD HH:mm:ss} | {level:<7} | {message}",
level="INFO",
serialize=True,
)
logger.add(
"/var/log/bot/bot.log",
rotation="50 MB",
retention="30 days",
compression="gz",
level="DEBUG",
)
8. Database Deployment
PostgreSQL
services:
  db:
    image: postgres:16-alpine
    volumes:
      - pgdata:/var/lib/postgresql/data
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql
    environment:
      POSTGRES_USER: botuser
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_DB: botdb
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U botuser -d botdb"]
      interval: 10s
      timeout: 5s
      retries: 5
    shm_size: 128mb
Run migrations before starting the bot:
services:
  migrate:
    image: bot:latest
    command: ["npx", "prisma", "migrate", "deploy"]
    depends_on:
      db:
        condition: service_healthy
  bot:
    depends_on:
      migrate:
        condition: service_completed_successfully
SQLite
For lightweight bots, SQLite is sufficient. Mount the database file as a volume.
services:
  bot:
    volumes:
      - ./data:/app/data
    environment:
      DATABASE_URL: file:/app/data/bot.db
Ensure the directory exists and has correct permissions before starting.
9. Redis for Sessions and Queues
services:
  redis:
    image: redis:7-alpine
    command: >
      redis-server
      --maxmemory 128mb
      --maxmemory-policy allkeys-lru
      --appendonly yes
    volumes:
      - redisdata:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 3s
      retries: 3
Use Redis for:
- Session storage: Store conversation state per user/chat.
- Rate limit counters: Track API usage per user.
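- Job queues: Offload heavy work (image processing, external API calls) to background workers using BullMQ (Node.js) or Celery (Python). BullMQ expects maxmemory-policy noeviction, so the allkeys-lru setting above suits caches but can silently drop queued jobs (and sessions) under memory pressure; use a separate Redis instance or change the policy for that data.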
// grammY session with Redis
// @grammyjs/storage-redis expects an ioredis client, not a node-redis createClient() client.
import { session } from "grammy";
import { RedisAdapter } from "@grammyjs/storage-redis";
import Redis from "ioredis";
const redis = new Redis(process.env.REDIS_URL);
bot.use(session({
  initial: () => ({ step: "idle" }),
  storage: new RedisAdapter({ instance: redis }),
}));
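The rate limit counters mentioned in the list above are usually fixed-window counters: one key per user per time window, incremented on every request. The sketch below models that logic with an in-memory Map standing in for Redis (where the same steps would be INCR followed by EXPIRE). The key format, the limit of 20, and the name allowRequest are illustrative assumptions:

```javascript
// Hypothetical helper: fixed-window rate limiting.
// `counters` is a Map standing in for Redis; `now` is a millisecond timestamp.
function allowRequest(counters, userId, now, limit = 20, windowMs = 60000) {
  // All requests in the same window share one counter key.
  const windowStart = Math.floor(now / windowMs) * windowMs;
  const key = `ratelimit:${userId}:${windowStart}`;
  const count = (counters.get(key) ?? 0) + 1;
  counters.set(key, count);
  return count <= limit;
}
```

With Redis, giving each key a TTL of one window keeps old counters from accumulating. The in-memory Map in this sketch never cleans up.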
10. CI/CD with GitHub Actions
# .github/workflows/deploy.yml
name: Deploy Bot
on:
  push:
    branches: [main]
env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - run: npm run lint
      - run: npm test
  build-and-push:
    needs: test
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:
      - uses: actions/checkout@v4
      - uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/build-push-action@v5
        with:
          push: true
          tags: |
            ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:latest
            ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}
  deploy:
    needs: build-and-push
    runs-on: ubuntu-latest
    steps:
      - name: Deploy via SSH
        uses: appleboy/ssh-action@v1
        with:
          host: ${{ secrets.SERVER_HOST }}
          username: ${{ secrets.SERVER_USER }}
          key: ${{ secrets.SSH_PRIVATE_KEY }}
          script: |
            cd /opt/telegram-bot
            docker compose pull
            docker compose up -d --remove-orphans
            docker image prune -f
11. Monitoring
Bot Uptime
Use an external service (UptimeRobot, Healthchecks.io) to ping the health endpoint every 60 seconds. Alert on two consecutive failures.
Message Throughput
Track updates per minute with a Prometheus counter.
import client from "prom-client";
const updatesTotal = new client.Counter({
name: "bot_updates_total",
help: "Total number of Telegram updates processed",
labelNames: ["type"],
});
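bot.use(async (ctx, next) => {
  // Label by update type: the update's single key other than update_id
  // (ctx.updateType exists only in Telegraf, not in grammY).
  const type = Object.keys(ctx.update).find((key) => key !== "update_id") ?? "unknown";
  updatesTotal.inc({ type });
  await next();
});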
Error Rates
Track handler errors and alert if the error rate exceeds a threshold.
const errorsTotal = new client.Counter({
name: "bot_errors_total",
help: "Total number of handler errors",
labelNames: ["handler"],
});
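bot.catch((err) => {
  // err is grammY's BotError: err.error is the thrown value, err.ctx the context of the failing update
  const type = Object.keys(err.ctx.update).find((key) => key !== "update_id") ?? "unknown";
  errorsTotal.inc({ handler: type });
  logger.error({ err: err.error, update: err.ctx.update }, "Bot error");
});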
Grafana Dashboard
Create a dashboard showing:
- Updates per minute (by type)
- Error rate percentage
- Response latency (p50, p95, p99)
- Active users (unique from IDs per hour)
- Memory and CPU usage of the bot container
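To make the dashboard numbers concrete, the sketch below computes an error rate percentage and nearest-rank latency percentiles from raw samples. In a real setup these would come from Prometheus queries over counters and histograms. The helper names are hypothetical and the functions only illustrate what the panels display:

```javascript
// Hypothetical helper: nearest-rank percentile of a list of latencies (ms).
function percentile(samples, p) {
  if (samples.length === 0) return null;
  const sorted = [...samples].sort((a, b) => a - b);
  // Nearest-rank method: the smallest value with at least p% of samples at or below it.
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Hypothetical helper: errors as a percentage of processed updates.
function errorRatePercent(errors, updates) {
  return updates === 0 ? 0 : (errors * 100) / updates;
}
```

For latencies of 10, 20, 30, and 40 ms, p50 is 20 ms and p99 is 40 ms, which shows how a single slow handler dominates the upper percentiles.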
12. Backup Strategies
Database Dumps
#!/bin/bash
# /scripts/backup.sh
set -euo pipefail  # exit non-zero if pg_dump or gzip fails, so cron can report it
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
BACKUP_DIR="/backups/bot"
mkdir -p "$BACKUP_DIR"
# PostgreSQL dump ("bot-db" must match the name of your database container)
docker exec bot-db pg_dump -U botuser botdb | gzip > "${BACKUP_DIR}/db_${TIMESTAMP}.gz"
# Keep last 30 days
find "$BACKUP_DIR" -name "*.gz" -mtime +30 -delete
Schedule with cron: 0 3 * * * /scripts/backup.sh
Session Data
If sessions are in Redis, schedule periodic RDB snapshots.
services:
  redis:
    command: redis-server --save 900 1 --save 300 10 --appendonly yes
    volumes:
      - redisdata:/data
Copy /data/dump.rdb to a backup location daily.
SQLite Backup
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
sqlite3 /app/data/bot.db ".backup /backups/bot_${TIMESTAMP}.db"
13. Scaling
Multiple Workers (Webhook Mode)
In webhook mode, run multiple worker processes behind a load balancer. Each worker handles incoming webhook requests independently.
services:
  bot:
    build: .
    deploy:
      replicas: 3
    environment:
      BOT_MODE: webhook
  nginx:
    image: nginx:alpine
    ports:
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
    depends_on:
      - bot
Nginx upstream configuration:
upstream bot_workers {
    least_conn;
    server bot:3000;
}
server {
    listen 443 ssl;
    # ssl_certificate and ssl_certificate_key are required here, as in the server block in section 3
    location /bot-webhook {
        proxy_pass http://bot_workers;
    }
}
Worker Processes for Heavy Tasks
Offload CPU-intensive or slow tasks to a background queue.
[Telegram] --> [Webhook Handler] --> [Redis Queue] --> [Worker]
| |
v v
(fast reply) (process image,
call external API)
Polling Mode Limitation
Polling mode does NOT support multiple instances. Only one process can call
getUpdates at a time. If you need to scale, switch to webhook mode.
14. Common Issues and Solutions
API Rate Limits
Telegram enforces limits on bot API calls:
- Messages to a single chat: ~1 per second
- Messages to different chats: ~30 per second
- Bulk notifications: ~25-30 messages per second globally
Solution: Use a message queue with rate limiting.
import Bottleneck from "bottleneck";
const limiter = new Bottleneck({
maxConcurrent: 1,
minTime: 35, // ~28 messages per second
});
async function sendMessage(chatId, text) {
return limiter.schedule(() => bot.api.sendMessage(chatId, text));
}
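The limiter above enforces only the global rate. The per-chat limit of about one message per second needs separate bookkeeping. The sketch below plans send times that respect both limits. It is an illustration of the scheduling logic rather than a replacement for Bottleneck, and the name scheduleSends and the default gaps (35 ms global, 1000 ms per chat) are assumptions taken from the numbers above:

```javascript
// Hypothetical helper: assign each message the earliest send time (ms from now)
// that respects a global gap and a per-chat gap. Messages keep their order.
function scheduleSends(messages, globalGapMs = 35, perChatGapMs = 1000) {
  const lastSentPerChat = new Map();
  let nextGlobalSlot = 0;
  return messages.map((message) => {
    const last = lastSentPerChat.get(message.chatId);
    const chatReadyAt = last === undefined ? 0 : last + perChatGapMs;
    const at = Math.max(nextGlobalSlot, chatReadyAt);
    lastSentPerChat.set(message.chatId, at);
    nextGlobalSlot = at + globalGapMs;
    return { ...message, at };
  });
}
```

Because order is preserved, a burst to one chat delays everything queued behind it. Interleaving chats in the input, or keeping one queue per chat, avoids that.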
Flood Wait (429 Error)
When you hit the rate limit, Telegram returns a 429 error with a
retry_after field.
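// Inside a grammY transformer, API errors are not thrown: prev() resolves to the
// raw response ({ ok: false, error_code, parameters }), so inspect it directly.
// The official @grammyjs/auto-retry plugin implements the same idea.
bot.api.config.use(async (prev, method, payload, signal) => {
  const res = await prev(method, payload, signal);
  if (!res.ok && res.error_code === 429) {
    const wait = res.parameters?.retry_after ?? 5;
    logger.warn(`Flood wait: sleeping ${wait}s`);
    await new Promise((r) => setTimeout(r, wait * 1000));
    return prev(method, payload, signal); // retry once
  }
  return res;
});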
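Not every failed call carries retry_after. Server errors and timeouts do not. One common approach, sketched here as an assumption rather than documented Telegram guidance, is to honor retry_after when present and otherwise fall back to capped exponential backoff. The name retryDelayMs and the 1 s base and 60 s cap are illustrative:

```javascript
// Hypothetical helper: how long to wait before retry number `attempt` (0-based).
function retryDelayMs(response, attempt, baseMs = 1000, capMs = 60000) {
  const retryAfter = response?.parameters?.retry_after;
  if (typeof retryAfter === "number") {
    return retryAfter * 1000; // Telegram said exactly how long to wait
  }
  // No hint from the server: double the delay on each attempt, up to the cap.
  return Math.min(capMs, baseMs * 2 ** attempt);
}
```

Pair this with a maximum attempt count so a persistent failure surfaces as an error instead of retrying indefinitely.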
Session Conflicts
If two processes try to poll simultaneously, Telegram will return conflict errors and one process will stop receiving updates.
Fix: Ensure only one polling instance runs at a time. Use a lock in Redis or a single-instance deployment.
// Redis lock to prevent duplicate polling (node-redis v4 syntax; with ioredis use redis.set(key, value, "EX", 60, "NX"))
const lockKey = "bot:polling:lock";
const acquired = await redis.set(lockKey, process.pid, { NX: true, EX: 60 });
if (!acquired) {
logger.fatal("Another instance is already polling. Exiting.");
process.exit(1);
}
// Refresh lock every 30s
setInterval(() => redis.expire(lockKey, 60), 30000);
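One caveat with the refresh above: expire extends the key no matter who currently holds it. If a process stalls for longer than the TTL, another instance can take the lock and the stalled process would then extend a lock it no longer owns. The in-memory model below sketches the intended semantics, with a refresh that checks ownership first. The class name LockTable is hypothetical, and in Redis the check and the extension would need to run atomically, for example in a Lua script:

```javascript
// In-memory model of a TTL lock; timestamps are passed in so the logic is testable.
class LockTable {
  constructor() {
    this.locks = new Map();
  }

  // Equivalent in spirit to SET key owner NX EX ttl.
  acquire(key, owner, ttlMs, now) {
    const held = this.locks.get(key);
    if (held && held.expiresAt > now) return false; // someone else holds it
    this.locks.set(key, { owner, expiresAt: now + ttlMs });
    return true;
  }

  // Extend the lock only if it is unexpired and still owned by the caller.
  refresh(key, owner, ttlMs, now) {
    const held = this.locks.get(key);
    if (!held || held.expiresAt <= now || held.owner !== owner) return false;
    held.expiresAt = now + ttlMs;
    return true;
  }
}
```

A process whose refresh returns false should stop polling immediately, since another instance may already be running.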
Webhook Not Receiving Updates
Troubleshooting steps:
1. Verify webhook is set:
curl "https://api.telegram.org/bot<TOKEN>/getWebhookInfo"
2. Check for pending errors: The last_error_message field in the response above shows the most recent delivery failure.
3. Verify SSL: Telegram only sends webhooks to valid HTTPS endpoints. Self-signed certificates need to be uploaded via setWebhook.
4. Check firewall: Ensure port 443 (or 8443) is open for incoming connections from Telegram IPs (149.154.160.0/20, 91.108.4.0/22).
5. Verify the bot responds with 200: Telegram retries on non-2xx responses and will eventually disable the webhook after too many failures.
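The first two checks can be automated from the getWebhookInfo result. The sketch below inspects the url, last_error_date, last_error_message, and pending_update_count fields of that response. The function name diagnoseWebhook, the one-hour window, and the backlog threshold of 100 are arbitrary assumptions:

```javascript
// Hypothetical helper: turn a getWebhookInfo result into a list of likely problems.
// `nowSeconds` is the current Unix time, matching the unit of last_error_date.
function diagnoseWebhook(info, expectedUrl, nowSeconds) {
  const problems = [];
  if (!info.url) {
    problems.push("no webhook is set");
  } else if (info.url !== expectedUrl) {
    problems.push(`webhook points to ${info.url}`);
  }
  if (info.last_error_date && nowSeconds - info.last_error_date < 3600) {
    problems.push(`recent delivery error: ${info.last_error_message}`);
  }
  if ((info.pending_update_count ?? 0) > 100) {
    problems.push(`${info.pending_update_count} updates are waiting for delivery`);
  }
  return problems;
}
```

An empty list means the registration looks healthy from Telegram's side, which points the investigation at the reverse proxy or the bot process instead.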
Memory Leaks
Long-running bots can accumulate memory over time.
- Set max_memory_restart in PM2 or memory limits in Docker.
- Profile with --inspect and Chrome DevTools.
- Check for event listener leaks and unbounded caches.
# Docker memory limit
services:
  bot:
    deploy:
      resources:
        limits:
          memory: 512M
Quick Start Template
For a new bot deployment, copy this minimal setup and expand as needed.
# Project structure
my-bot/
  src/
    index.ts
  Dockerfile
  docker-compose.yml
  .env
  .env.example
  .dockerignore
  healthcheck.js
  ecosystem.config.js
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
FROM node:20-alpine
WORKDIR /app
RUN addgroup -S bot && adduser -S bot -G bot
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/package.json ./
COPY healthcheck.js ./
USER bot
EXPOSE 3000
CMD ["node", "dist/index.js"]
Start the bot: docker compose up -d
View logs: docker compose logs -f bot
Restart: docker compose restart bot
Update: docker compose pull && docker compose up -d