---
name: integration-webhooks
description: >-
  Cross-platform pattern for handling messaging integration webhooks (Slack,
  Telegram, WhatsApp, email, etc.) on serverless hosts. Use when adding a new
  integration adapter, debugging dropped messages, or wiring long-running agent
  work into a webhook handler.
scope: dev
metadata:
  internal: true
---
Integration Webhooks
Rule
Integration webhooks (Slack, Telegram, WhatsApp, email, Google Docs, etc.) must enqueue work to SQL and return 200 immediately, then process the work in a separate fresh function execution kicked off by a self-fired HTTP POST. A recurring retry job sweeps anything that gets stuck. This pattern works on every serverless host (Netlify, Vercel, Cloudflare Workers, Fly, Render, Node) without relying on platform-specific background-execution features.
Do not run agent loops inside the webhook handler itself. Do not rely on
fire-and-forget Promises after returning from a serverless handler — they get
killed when the function freezes.
Why
Messaging platforms expect a 200 response within a tight window — Slack will retry after 3 seconds, and a retried event triggers duplicate agent runs. At the same time, an agent loop replying to the message can take 30–60+ seconds because it may make multiple LLM calls and tool calls.
Past attempts that don't work cross-host:
- Fire-and-forget `Promise.then(...)` after returning: Lambda/Vercel/CF freeze the execution context the moment the response goes out. The promise is silently killed, the user gets no reply, and there's no error in the logs.
- Netlify Background Functions: Netlify-only, requires a `-background` filename suffix, breaks on every other host.
- Cloudflare `event.waitUntil()`: CF Workers only, not portable.
- Vercel Fluid / `after()`: Vercel-only, gated behind specific runtimes.
- A long-lived in-process queue: fine on a single Node box, but on serverless every cold start gets a fresh queue and any pending work is lost.
The only universal answer: persist the work, then trigger a brand new function execution to do it. SQL is the queue, a self-webhook is the trigger, and a recurring job is the safety net.
The Flow

    ┌──────────┐  1. POST /integrations/:platform/webhook
    │ Platform │────────────────────────────────────────────►┌──────────────────┐
    └──────────┘                                             │ Webhook handler  │
                                                             │ (function exec 1)│
                                                             └──────────────────┘
                                                                      │
                  2. INSERT INTO integration_pending_tasks
                     (status='pending', payload=...)
                                                                      │
                  3. fetch(POST /integrations/_process-task)
                     (fire-and-forget, NO await on body)
                                                                      │
                  4. return 200 to platform ◄─────────────────────────┘

                                                             ┌──────────────────┐
                  5. POST arrives at processor               │ Processor        │
                     (separate fresh function)               │ (function exec 2)│
                                                             └──────────────────┘
                                                                      │
                  6. claimPendingTask(id) → status='processing'
                  7. runAgentLoop(...): full timeout budget here
                  8. adapter.sendResponse(...) back to platform
                  9. markTaskCompleted(id)

    ┌──────────────────────────────────────────────┐
    │ Recurring job (every 60s): safety net        │
    │ Re-fires processor for tasks stuck in        │
    │ 'pending' or 'processing' beyond timeout.    │
    │ Caps retries at 3 then marks 'failed'.       │
    └──────────────────────────────────────────────┘

The webhook handler does as little as possible. The fresh function execution
that handles _process-task gets its own full timeout budget for the agent
loop.
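Steps 2 to 4 of the flow could be sketched roughly as follows. This is a minimal illustration, not the code in `webhook-handler.ts`: the `WebhookDeps` interface, the injected `fetchFn`, and the `{ taskId }` request body are assumptions made so the sketch is self-contained, and signature verification and payload parsing are left out.

```typescript
// Hypothetical shapes for illustration only.
interface PendingTask {
  id: string;
  platform: string;
  payload: string; // JSON-serialized message
  status: "pending" | "processing" | "completed" | "failed";
}

interface WebhookDeps {
  insertPendingTask(task: PendingTask): Promise<void>;
  // Injected so the sketch can run without a network.
  fetchFn(
    url: string,
    init: { method: string; headers: Record<string, string>; body: string },
  ): Promise<unknown>;
  processorUrl: string;
  newId(): string;
}

async function handleWebhook(
  platform: string,
  rawBody: string,
  deps: WebhookDeps,
): Promise<{ status: number }> {
  const task: PendingTask = {
    id: deps.newId(),
    platform,
    payload: rawBody,
    status: "pending",
  };

  // Step 2: persist first, so the retry job can find the task even if
  // the dispatch below never leaves the box.
  await deps.insertPendingTask(task);

  // Step 3: fire the self-webhook without awaiting it. Errors are swallowed
  // on purpose because the retry sweep is the safety net.
  void deps
    .fetchFn(deps.processorUrl, {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify({ taskId: task.id }),
    })
    .catch(() => {});

  // Step 4: acknowledge immediately.
  return { status: 200 };
}
```

The ordering is the point of the sketch: the row exists before the dispatch is attempted, so a failed or frozen dispatch leaves a `pending` row behind instead of a lost message.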
Key Files
| File | Purpose |
|---|---|
| `packages/core/src/integrations/plugin.ts` | Mounts `/_agent-native/integrations/*` routes |
| `packages/core/src/integrations/webhook-handler.ts` | Verifies signature, parses, enqueues task, fires processor |
| `packages/core/src/integrations/pending-tasks-store.ts` | SQL queue: `insertPendingTask`, `claimPendingTask`, `markTaskCompleted`, `markTaskFailed` |
| `packages/core/src/integrations/pending-tasks-retry-job.ts` | Recurring retry sweep (`startPendingTasksRetryJob`, `retryStuckPendingTasks`) |
| `packages/core/src/integrations/types.ts` | `PlatformAdapter`, `IncomingMessage`, `OutgoingMessage` |
| `packages/core/src/integrations/adapters/{slack,telegram,whatsapp,email,google-docs}.ts` | One adapter per platform |
Routes
All under /_agent-native/integrations/:
| Method | Path | Purpose |
|---|---|---|
| POST | `/:platform/webhook` | Platform pings this. Verifies, enqueues, returns 200 quickly. |
| POST | `/_process-task` | Self-webhook target. Claims a task and runs the agent loop. |
| GET | `/status` | All integrations status (settings UI). |
| GET | `/:platform/status` | One platform's status. |
| POST | `/:platform/enable` | Enable an integration. |
| POST | `/:platform/disable` | Disable an integration. |
| POST | `/:platform/setup` | Platform-specific setup (e.g. Telegram webhook registration). |
SQL Schema
The pending-task queue lives in integration_pending_tasks:

    CREATE TABLE IF NOT EXISTS integration_pending_tasks (
      id TEXT PRIMARY KEY,
      platform TEXT NOT NULL,
      external_thread_id TEXT NOT NULL,
      payload TEXT NOT NULL,          -- JSON-serialized IncomingMessage
      owner_email TEXT NOT NULL,
      org_id TEXT,
      status TEXT NOT NULL,           -- pending | processing | completed | failed
      attempts INTEGER NOT NULL DEFAULT 0,
      error_message TEXT,
      created_at INTEGER NOT NULL,
      updated_at INTEGER NOT NULL,
      completed_at INTEGER
    );

    CREATE INDEX IF NOT EXISTS idx_pending_tasks_status_created
      ON integration_pending_tasks(status, created_at);

The store layer creates this lazily on first use via ensureTable() and uses
intType() from db/client.ts so it works on both SQLite and Postgres.
claimPendingTask is the critical concurrency primitive: it atomically flips
pending → processing and increments attempts, returning null if another
worker beat us to it. Both the initial fire-and-forget call and the retry job
funnel through the same processor endpoint, and claimPendingTask is what
prevents the same task from being processed twice.
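The atomic claim can be pictured as a single conditional `UPDATE`. The sketch below is an assumption-laden illustration, not the code in `pending-tasks-store.ts`: the `SqlClient` interface and `?` placeholder style are hypothetical, and it returns a boolean instead of the claimed row or `null` to keep the example short.

```typescript
// Hypothetical minimal client: runs one statement and reports affected rows.
interface SqlClient {
  execute(sql: string, args: unknown[]): Promise<{ rowsAffected: number }>;
}

async function claimPendingTaskSketch(
  db: SqlClient,
  id: string,
  now: number = Date.now(),
): Promise<boolean> {
  // The WHERE clause acts as a compare-and-set: only one caller can still
  // see status = 'pending' when its UPDATE runs, so a concurrent claim
  // matches zero rows and loses cleanly.
  const result = await db.execute(
    `UPDATE integration_pending_tasks
        SET status = 'processing', attempts = attempts + 1, updated_at = ?
      WHERE id = ? AND status = 'pending'`,
    [now, id],
  );
  return result.rowsAffected === 1;
}
```

A caller that gets `false` back should simply stop, because some other execution already owns the task.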
Adding a New Platform Adapter
1. Implement `PlatformAdapter` in `packages/core/src/integrations/adapters/<platform>.ts`:

       // Assumed relative path to integrations/types.ts
       import type { PlatformAdapter } from "../types";

       export function myPlatformAdapter(): PlatformAdapter {
         return {
           platform: "myplatform",
           label: "MyPlatform",
           getRequiredEnvKeys: () => [
             { name: "MYPLATFORM_TOKEN", label: "MyPlatform Bot Token", scope: "global" },
             { name: "MYPLATFORM_SIGNING_SECRET", label: "MyPlatform Signing Secret", scope: "global" },
           ],
           async handleVerification(event) {
             // Platform-specific challenge response, if any
             return { handled: false };
           },
           async verifyWebhook(event) {
             // Verify HMAC/signature with a server-side secret and constant-time comparison.
             // Never leave this as a permissive stub in production.
             // verifyMyPlatformSignature is a platform-specific helper you supply.
             return verifyMyPlatformSignature(event);
           },
           async parseIncomingMessage(event) {
             // Map raw payload → IncomingMessage, or null to ignore
             return null;
           },
           async sendResponse(message, context) {
             // POST back to the platform's API
           },
           formatAgentResponse(text) {
             return { text, platformContext: {} };
           },
           async getStatus(baseUrl) {
             return { platform: "myplatform", label: "MyPlatform", enabled: false, configured: false };
           },
         };
       }

2. Register it in `getDefaultAdapters()` inside `plugin.ts`. The webhook, queue, processor, and retry job are shared infrastructure; you do not write any of that per-adapter.
3. Declare required env keys so the secrets/onboarding UI surfaces them. See the `secrets` and `onboarding` skills.
4. Update the platform's webhook URL to point at `${baseUrl}/_agent-native/integrations/<platform>/webhook`. For platforms with a registration API (Telegram), implement `POST /:platform/setup`.

Never hardcode bot tokens, signing secrets, verification tokens, webhook URLs,
channel/customer identifiers, or copied platform payloads with real private data
inside the adapter, tests, docs, prompts, or fixtures. getRequiredEnvKeys()
declares credential names only. Values come from deployment configuration,
registered secrets, OAuth, or scoped credential stores, and tests should use
obvious fake placeholders.
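For the signature check itself, a constant-time HMAC comparison might look like the sketch below. It assumes a hex-encoded HMAC-SHA256 of the raw request body, which is a common scheme but not necessarily what a given platform uses; the function name and parameters are hypothetical, and the secret is passed in from configuration, never hardcoded.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical helper: returns true only when the signature matches.
function verifyHmacSignature(
  rawBody: string,
  signatureHex: string | undefined,
  secret: string | undefined,
): boolean {
  // Fail closed when the secret or the signature header is missing.
  if (!secret || !signatureHex) return false;

  const expected = createHmac("sha256", secret).update(rawBody, "utf8").digest();
  const received = Buffer.from(signatureHex, "hex");

  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (received.length !== expected.length) return false;
  return timingSafeEqual(received, expected);
}
```

An adapter's `verifyWebhook` would read the raw body and the platform's signature header from the event and pass in a secret such as `process.env.MYPLATFORM_SIGNING_SECRET`. Platforms that sign a timestamp along with the body need that included in the signed string as well.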
The adapter is only responsible for:
- platform-specific verification (signatures, challenges)
- payload → `IncomingMessage` mapping
- agent text → platform format
- delivering the response back to the platform
It does not know about the queue, the processor, retries, or the agent loop. Those are handled by the shared webhook handler.
Slack non-message events
Slack's Events API also sends events that are not agent messages, such as
link_shared for app unfurls. Do not map those into IncomingMessage unless
they should actually run the agent. Handle them as short, provider-specific
webhook work: verify the Slack signature, return 200 OK quickly, and call the
provider API needed for the event (chat.unfurl for link previews). If a single
Slack app must handle both agent chat and app unfurls, put a dispatcher in front
of the one Slack Events Request URL and route message events to the integration
webhook flow while routing link_shared to the app-specific unfurl handler.
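A dispatcher of that kind can be a small pure function over the Slack event envelope. The sketch below is illustrative only: `routeSlackEvent` and the `SlackRoute` type are hypothetical names, treating `app_mention` as agent chat is an assumption, and signature verification is expected to have happened before routing.

```typescript
type SlackRoute =
  | { kind: "challenge"; challenge: string }
  | { kind: "agent" }
  | { kind: "unfurl" }
  | { kind: "ignore" };

// Only the envelope fields the router needs.
interface SlackEnvelope {
  type?: string;
  challenge?: string;
  event?: { type?: string };
}

function routeSlackEvent(body: SlackEnvelope): SlackRoute {
  // Request URL verification handshake: echo the challenge back.
  if (body.type === "url_verification" && typeof body.challenge === "string") {
    return { kind: "challenge", challenge: body.challenge };
  }
  if (body.type !== "event_callback") return { kind: "ignore" };

  switch (body.event?.type) {
    case "message":
    case "app_mention": // assumption: mentions should also reach the agent
      return { kind: "agent" }; // goes through the integration webhook flow
    case "link_shared":
      return { kind: "unfurl" }; // short provider-specific work (chat.unfurl)
    default:
      return { kind: "ignore" };
  }
}
```

Keeping the routing decision separate from the side effects makes it easy to unit-test which events ever reach the queue.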
Long-Running Agent Work
The processor endpoint runs in a fresh function execution with its own full timeout (typically 30–60s on Netlify/Vercel, longer on background-friendly hosts). That budget is dedicated entirely to the agent loop — there is no platform-side timer racing it.
If a single agent run might exceed the function timeout (large multi-step plans, deep delegation chains), the agent should:
- Send an interim acknowledgement back to the platform so the user knows the request landed (`adapter.sendResponse({ text: "Working on it..." })`).
- Persist intermediate state in chat-thread data, application state, or a recurring job so the next invocation can pick up where this one left off.
The retry job will only re-fire tasks stuck in processing for over 5 minutes,
so a normal long-running reply is safe.
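Put together, the processor side (steps 6 to 9 of the flow, plus the interim acknowledgement) might be sketched as below. The `ProcessorDeps` interface and return values are hypothetical, and two choices are assumptions rather than documented behaviour: the acknowledgement is sent on every run, and a thrown error marks the task failed immediately instead of leaving it for the retry sweep.

```typescript
// Hypothetical dependency bundle so the sketch is self-contained.
interface ProcessorDeps {
  claimPendingTask(id: string): Promise<{ id: string; payload: string } | null>;
  runAgentLoop(payload: string): Promise<string>;
  sendResponse(message: { text: string }): Promise<void>;
  markTaskCompleted(id: string): Promise<void>;
  markTaskFailed(id: string, error: string): Promise<void>;
}

async function processTask(
  id: string,
  deps: ProcessorDeps,
): Promise<"skipped" | "completed" | "failed"> {
  const task = await deps.claimPendingTask(id);
  if (!task) return "skipped"; // another execution already owns it

  try {
    // Let the user know the request landed before the slow part starts.
    await deps.sendResponse({ text: "Working on it..." });
    const reply = await deps.runAgentLoop(task.payload);
    await deps.sendResponse({ text: reply });
    await deps.markTaskCompleted(task.id);
    return "completed";
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    await deps.markTaskFailed(task.id, message);
    return "failed";
  }
}
```

A real processor would likely send the acknowledgement only on the first attempt, so a retried task does not produce a second "Working on it..." message.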
Cross-Platform Considerations
- No platform-specific background APIs. No `waitUntil`, no `-background.ts` filenames, no Vercel `after()`. The pattern works identically on every host because it only uses `fetch()` and SQL.
- No assumed runtime. The processor endpoint is a normal H3 handler under `/_agent-native/`. It runs wherever the rest of the framework runs.
- No persistent in-memory state. The dedup map in the webhook handler is best-effort only; the SQL queue is the source of truth. Any cold start loses the dedup map but the queue stays consistent.
- Postgres + SQLite both supported. `claimPendingTask` uses `RETURNING` on Postgres and a re-read on SQLite. No platform-specific SQL.
- Self-webhook URL resolution. The processor URL is built from `WEBHOOK_BASE_URL`, `APP_URL`, or `URL` env vars (with `localhost:3000` as the dev fallback). Templates that change their public URL must keep one of these set.
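The URL resolution in the last point could look something like this. It is a sketch under assumptions: the precedence is taken to follow the order the variables are listed in, the dev fallback is assumed to be plain `http`, and `resolveProcessorUrl` is a hypothetical name.

```typescript
function resolveProcessorUrl(env: Record<string, string | undefined>): string {
  // First non-empty value wins; localhost is the dev fallback.
  const base =
    env.WEBHOOK_BASE_URL || env.APP_URL || env.URL || "http://localhost:3000";

  // Strip trailing slashes so the joined path never contains "//".
  const trimmed = base.replace(/\/+$/, "");
  return `${trimmed}/_agent-native/integrations/_process-task`;
}
```

Called as `resolveProcessorUrl(process.env)`, it yields the target for the self-fired POST. If none of the variables is set in production, the self-webhook would point at localhost and tasks would only move when the retry job runs, which is why one of them must stay set.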
Why Fire-and-Forget on Serverless Is Unreliable
Even though the webhook handler does fetch(processorUrl, ...) without
awaiting the response body, that initial dispatch is not guaranteed to
complete before the function freezes. In practice it usually does — the TCP
connect + write happens quickly — but the recurring retry job is the safety
net for the cases where:
- The serverless platform froze the handler before the outbound `fetch` flushed its bytes.
- The processor function 502'd or cold-started slowly enough to time out.
- The processor itself was killed mid-agent-loop (function timeout, container shutdown, deploy mid-run).
Tasks stuck in pending for >90s or processing for >5min get re-fired up to
3 times. After 3 attempts they're marked failed permanently so we stop
spamming the processor.
Never assume the initial fire-and-forget succeeded. Always rely on the queue + retry job for at-least-once delivery.
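The sweep's decision logic, using the thresholds above, can be expressed as a pure function. This is a sketch, not the code in `pending-tasks-retry-job.ts`: the names are hypothetical, timestamps are assumed to be epoch milliseconds, and age is assumed to be measured from `updated_at`. How a stale `processing` row is made claimable again is not covered here.

```typescript
type TaskStatus = "pending" | "processing" | "completed" | "failed";

interface TaskRow {
  id: string;
  status: TaskStatus;
  attempts: number;
  updated_at: number; // epoch milliseconds (assumed unit)
}

const PENDING_STUCK_MS = 90_000; // pending for more than 90s
const PROCESSING_STUCK_MS = 5 * 60_000; // processing for more than 5 minutes
const MAX_ATTEMPTS = 3;

function planRetrySweep(
  rows: TaskRow[],
  now: number,
): { refire: string[]; fail: string[] } {
  const refire: string[] = [];
  const fail: string[] = [];

  for (const row of rows) {
    const age = now - row.updated_at;
    const stuck =
      (row.status === "pending" && age > PENDING_STUCK_MS) ||
      (row.status === "processing" && age > PROCESSING_STUCK_MS);
    if (!stuck) continue;

    // Out of attempts: park the row instead of hitting the processor again.
    if (row.attempts >= MAX_ATTEMPTS) fail.push(row.id);
    else refire.push(row.id);
  }
  return { refire, fail };
}
```

The recurring job would then POST each `refire` id to the processor endpoint and mark each `fail` id as `failed`.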
Debugging Checklist
- Platform sent the webhook? Check the platform's delivery logs (Slack admin, Telegram `getWebhookInfo`).
- Webhook handler returned 200? If not, the platform retries; look for duplicate task rows. Signature failures return 401.
- Task in the queue? `SELECT * FROM integration_pending_tasks WHERE external_thread_id = '...' ORDER BY created_at DESC LIMIT 5`.
- Status? `pending` means the processor never picked it up; check that `_process-task` is reachable from the box itself (the self-fetch must work over the public URL). `processing` for over 5 minutes means the processor died mid-run; the retry job will pick it up.
- Failed? Check `error_message` and `attempts`. After 3 attempts the row is parked at `failed` and won't be retried.
- Reply not delivered? The processor likely succeeded but `adapter.sendResponse` failed; check the adapter's outbound logs.
Related Skills
- `server-plugins`: How `/_agent-native/` routes get mounted
- `recurring-jobs`: Pattern the retry job follows
- `actions`: When to use an action vs a webhook
- `secrets`: Registering platform tokens
- `onboarding`: Surfacing setup steps for each platform
- `delegate-to-agent`: How the processor invokes the agent loop