el-supervisor

star 0

Always-on meta-agent watchdog for the entire R9 plantel. Monitors every agent, cron, pipeline, API, and data flow. Detects drift, failures, and stuck pipelines. Auto-repairs what it safely can (ghost lead reset, endpoint re-trigger, rate limit backoff). Escalates to Telegram only when human judgment is needed. In evolve mode, observes 7d patterns and proposes concrete code/flow improvements. Objective: Jorge no tiene que estar detrás de nada — el sistema se autogobierna.

geocarp24 By geocarp24 schedule Updated 6/8/2026

name: el-supervisor description: Always-on meta-agent watchdog for the entire R9 plantel. Monitors every agent, cron, pipeline, API, and data flow. Detects drift, failures, and stuck pipelines. Auto-repairs what it safely can (ghost lead reset, endpoint re-trigger, rate limit backoff). Escalates to Telegram only when human judgment is needed. In evolve mode, observes 7d patterns and proposes concrete code/flow improvements. Objective: Jorge no tiene que estar detrás de nada — el sistema se autogobierna. version: 1.0 owner: ALEX tenant: any (config in agents/tenants/.json) cadence: heartbeat every 15 min · deep every 1h · evolve weekly Sat 07:00 CT runtime: node >= 22 + claude CLI

El Supervisor — Autonomous Operations Watchdog

Filosofía (orden directa de Jorge 2026-04-23)

"Debe de construir un agente supervisor para que todo esté funcionando bien, y si hay algún error te pueda notificar para que tú arregles los problemas o que este mismo agente pueda solucionar errores y hacer los reportes necesarios. La idea es de tener un sistema autonomista e inteligente que sepa evolucionar por sí mismo sin que yo tenga que estar detrás de ello."

Traducción operacional:

  1. Monitorear continuamente. No esperar a que Jorge pregunte.
  2. Auto-reparar lo trivial. Solo molestar a Jorge con decisiones de alto impacto.
  3. Aprender de los errores. Si algo falla 3 veces, proponer fix permanente.
  4. Evolucionar el sistema. Weekly evolve mode busca mejoras proactivas.
  5. Reportar con criterio. Critical=Telegram inmediato. Warning=digest. Info=silent log.

Modes

Mode Frequency Scope Tokens
heartbeat every 15 min quick health: APIs alive, log freshness, pipeline sanity counters 1-3K
deep every 1 hour full component health + ghost detection + auto-repair + drift analysis 5-12K
evolve weekly Sat 07:00 CT 7-day pattern recognition, root cause hypotheses, fix proposals, Ops_Insights writes 15-30K
incident manual / auto-triggered immediate forensic dive when heartbeat detects red 5-15K

What it monitors

Layer 1 — Infrastructure

  • Hostinger crons responding (HTTP 200 + recent log entries)
    • fer_first_contact.php → expect fc_sms_sent entries within last 4h in 9am-7pm CST window
    • fer_seguimiento.php → expect seg_sms_sent entry within last 25h (daily cron)
    • fer_stale_cron.php / fer_morning_brief.php → daily
  • API keys alive: Airtable, OpenPhone/Quo, Telegram bot, Anthropic
  • Deploy workflow health (last successful run < 48h)
  • DKIM/DMARC/SPF DNS records valid (monthly drift check in evolve)

Layer 2 — R9 Agents

  • Last run of each agent (Mercader, Posicionador, Escriba, Remitente, Cazador, Cartógrafo, Clasificador, Analista, Espía, Auditor)
  • Cadence adherence — if agent supposed to run weekly but last run >10d → flag
  • Score trajectory — if Mercader score dropped >10 points week-over-week → flag
  • Token usage spike (>2x 7-day avg) → cost alert

Layer 3 — Pipeline sanity (Contacts/Leads funnel)

  • Ghost contacts: Stage=Contacted step ≥1 with Last contact date >= 2 days stale → auto-reset to To Be Contacted
  • Ghost Seguimiento: Stage=Seguimiento, Next follow up date past AND Last contact date >= 5 days → auto-push step
  • Bottleneck detection: Stage="New" > 100 AND no change in 7d → notify Jorge (he is the gating hand)
  • Revenue loss canary: leads scored Hot by Clasificador but not contacted within 1h → alert
  • Fer inbound coverage: every message.received webhook should have a matching claude_ok + sms_sent within 2 min. If orphaned → flag

Layer 4 — Data quality

  • Duplicate detection (same Phone1 across multiple Contacts)
  • Missing required fields on Hot leads (no email, no property address)
  • Airtable HTTP 422 errors > 3/day → schema drift, alert
  • Email bounce rate > 5% → deliverability issue

Layer 5 — Business metrics

  • SMS sent per day baseline (alert if -50% vs 7d avg)
  • Contacts entering pipeline per week
  • Conversion rate Warm → Hot → Deal
  • If Mercader/Posicionador/Cazador scores diverge sharply from trend → investigate

Auto-repair actions (no human needed)

Condition Action Safety rail
Ghost contacts in "Contacted" stuck 2+ days Reset to "To Be Contacted" + clear steps + clear dates Max 25/run, log every reset to Ops_Health.auto_fixes_log
Ghost "Seguimiento" without Next date, Last > 5d Set Next = today → cron will pick up Max 25/run
Cron endpoint stale >4h in-window HTTP POST to endpoint to force a tick Max 1 per endpoint per hour (anti-flood)
Hostinger 504 during manual probe Retry with ignore_user_abort pattern in next tick
OpenPhone returns 429 rate limit Exponential backoff, resume next heartbeat Max 3 consecutive backoffs before alert
Airtable 422 on known field Log + skip record, do NOT loop
Duplicate Contact detected (same Phone1) Merge: keep oldest, link newer data as Notes Requires Notes & Activity table

Alert tiers (Telegram)

  • 🚨 CRITICAL (instant): API key revoked, cron dead for >6h, pipeline frozen, revenue-lost hot lead, security event
  • ⚠️ WARN (digest hourly): single agent failed, token spike, score dropped, email bounce climb
  • 🟡 NOTICE (daily 8am CT digest): auto-repairs applied, minor drift, weekly pattern previews
  • GREEN (silent): everything OK, only logged to Ops_Health

Evolve mode — el corazón del "sistema autoevolutivo"

Weekly Saturdays 07:00 CT, El Supervisor hace:

  1. Lee últimos 7 días de Ops_Health (168 heartbeats + 168 deep checks)
  2. Cuenta top 10 errores recurrentes + top 5 warnings
  3. Para cada patrón, usa Claude para generar:
    • Hipótesis de causa raíz (por qué sucede)
    • Fix concreto (file path + diff + test plan)
    • Impacto estimado (tiempo ahorrado, $ recuperado, leads salvados)
  4. Clasifica:
    • Fix trivial + seguro → aplica automáticamente + log a Ops_Insights con status=auto-applied
    • Fix complejo o riesgoso → escribe a Ops_Insights con status=open + Telegram digest al Jefe
  5. Genera reporte semanal en Markdown → PDF via make-pdf skill → Telegram a Jorge (domingo 8am)

Ejemplos de fix auto-aplicables:

  • Si ghost-reset se ejecutó >10 veces en una semana → el cron está roto, escala via Telegram
  • Si Phone1 422 aparece 5 veces → genera fix de data validation en fer_first_contact.php + crea PR automático

Ejemplos de fix escalados:

  • "Clasificador está scoring 80% Warm y 0% Hot — el prompt puede estar mal calibrado. Sugiero ajustar thresholds (evidencia adjunta)."
  • "Remitente tiene bounce rate 8%. Investigar lista, o agregar double opt-in."

Tablas Airtable

  • Ops_Health [REDACTED_AIRTABLE_TABLE_ID] — 1 row per run, rolling state of health
  • Ops_Insights [REDACTED_AIRTABLE_TABLE_ID] — accumulative knowledge base de mejoras detectadas

Invoking

node agents/supervisor/supervisor.mjs --tenant pinnacle --mode heartbeat
node agents/supervisor/supervisor.mjs --tenant pinnacle --mode deep
node agents/supervisor/supervisor.mjs --tenant pinnacle --mode evolve
node agents/supervisor/supervisor.mjs --tenant pinnacle --mode heartbeat --dry-run

R9 compliance

  • ✅ Always-on (heartbeat every 15 min = most frequent R9 agent)
  • ✅ Dedicated domain (meta-ops / system watchdog)
  • ✅ Tenant-aware (R8) — cada tenant tiene su propio Supervisor
  • ✅ Mobile-first alerts (Telegram formateado para celular)
  • ✅ Billing-ready (tokens_used per run separated per mode)
  • ✅ Audit trail (every check, every auto-fix, every alert → Ops_Health + Ops_Insights)
  • ✅ Self-improving (evolve mode writes fixes, some auto-apply)

Author log

  • 2026-04-23: v1 shipped. 10mo agente del plantel R9. Crítico para la visión "sistema autónomo" ordenada por Jorge.
Install via CLI
npx skills add https://github.com/geocarp24/investoros-web --skill el-supervisor
Repository Details
star Stars 0
call_split Forks 0
navigation Branch main
article Path SKILL.md
More from Creator