short-circuit-activation-discipline - SKILL.md Agent Skill

name: short-circuit-activation-discipline
description: The rule for safely activating any short-circuit (regex / static-map / Haiku gate) that bypasses Sonnet. Before activating, list the entities the LLM extracts today and verify the deterministic path produces them at the SAME fidelity, or plan a hybrid Haiku-extraction step. Intent precision is not extraction precision. Use before activating any Tier 0/E3/E4/E5 gate or any LLM bypass. Triggers: activate short-circuit, bypass Sonnet, static map, regex gate, Tier 0, E3, E4, E5 activation.

Short-Circuit Activation Discipline

Short-circuits save cost by skipping Sonnet — but the LLM is often doing MORE than classifying intent. Before activating any bypass, you must account for every sub-task the LLM currently handles, not just the label it emits.

Source of truth: docs/plans/2026-05-09-recovery-hardening-plan.md + the CLAUDE.md "Short-circuit activation discipline" rule. Pairs with model-routing-for-ambiguity and prompt-eval-gate.

The Rule

For EVERY short-circuit candidate (regex fast-path, static-map reply, Haiku gate), list the entities the LLM extracts today and verify the deterministic path can produce them at the SAME fidelity — OR plan to call Haiku/Sonnet for just-that-extraction as a hybrid step.

Intent-classification precision is not extraction precision. Don't trust a "100% precision" claim without scoping which sub-task it covers.
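
One way to make the rule concrete is to diff the two paths entity by entity. The sketch below is illustrative only: the `Entities` shape, the `fidelityGaps` helper, and the sample values are assumptions, not code from the repository.

```typescript
// Hypothetical entity bag; a real intent would list whatever its executor consumes.
type Entities = Record<string, string | null>;

// Return the entity keys the LLM path fills but the deterministic path
// fails to reproduce (missing, null, or a different value).
function fidelityGaps(llm: Entities, deterministic: Entities): string[] {
  return Object.keys(llm).filter(
    (key) => llm[key] !== null && deterministic[key] !== llm[key],
  );
}

// Illustrative values only, not taken from production logs.
const llmOutput: Entities = {
  intent: "add_reminder",
  body: "buy milk",
  send_at: "2026-05-18T10:00:00",
};
const regexOutput: Entities = {
  intent: "add_reminder",
  body: "buy milk",
  send_at: null, // the deterministic path could not expand the time
};

const gaps = fidelityGaps(llmOutput, regexOutput);
console.log(gaps); // ["send_at"]
```

An empty result means the bypass reproduces the whole tuple; any non-empty result is a gap that needs a hybrid extraction step before activation.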

Canonical Case — E4 add_reminder (2026-05-17)

The regex ^תזכיר[יה]\s+(?:לי\s+)?(.{2,300}) scored 100% precision in a 20-row manual review: every match was a real reminder request.

But Sonnet wasn't only classifying. It was ALSO doing Hebrew natural-language time expansion (מחר בעשר, "tomorrow at ten", → ISO send_at) that parseReminderTime (index.ts) cannot do; it handles only structured forms such as 17:00 or ב-5 ("at 5").

Naively activating the short-circuit would either:

Silently mis-fire times (ghost-reminder trust catastrophe), or
Fall back to Sonnet anyway (defeating the cost saving).

Decision: E4 activation DEFERRED to E5 (Haiku-as-extractor handles intent + time expansion in one cheap call).
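
The gap can be reproduced in a few lines. The regex below is the one quoted above; `parseStructuredTimeSketch` is a hypothetical stand-in that only mimics the described limitation (clock forms such as 17:00 or ב-5) and is not the real parseReminderTime. The sample message is invented for illustration.

```typescript
// Intent regex quoted in the canonical case.
const REMINDER_RE = /^תזכיר[יה]\s+(?:לי\s+)?(.{2,300})/;

// Hypothetical stand-in for a structured-time parser. It understands only
// "HH:MM" and "ב-H" forms and ignores AM/PM disambiguation entirely.
function parseStructuredTimeSketch(
  text: string,
): { hour: number; minute: number } | null {
  const clock = text.match(/(\d{1,2}):(\d{2})/);
  if (clock) return { hour: Number(clock[1]), minute: Number(clock[2]) };
  const atHour = text.match(/ב-(\d{1,2})/);
  if (atHour) return { hour: Number(atHour[1]), minute: 0 };
  return null;
}

// "Remind me tomorrow at ten to buy milk"
const message = "תזכירי לי מחר בעשר לקנות חלב";
const match = message.match(REMINDER_RE);
const intentMatched = match !== null;      // the intent label is correct
const body = match ? match[1] : "";        // "מחר בעשר לקנות חלב"
const time = parseStructuredTimeSketch(body); // null: no structured time found

console.log({ intentMatched, body, time });
```

The intent match succeeds while the time comes back empty, which is exactly the situation where a precision figure for the regex says nothing about the fidelity of the full tuple.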

Why This Matters Most

A short-circuit that saves $0.01 but mis-fires a reminder time costs far more in user trust than it saves in tokens: a wrong time is a phantom-reminder trust failure. The whole tuple (intent + all extracted entities) must be correct, not just the label.

Activation Bars (Gate G1, per-intent, not global)

≤ 0.3% false-negative rate
≤ 1% false-positive rate
≥ 50 firings per intent
ZERO canonical wrong-intent cases (the תזכירי לי מה ...? shape, "remind me what ...?", is pinned in tests/e5_corpus_pinning_test.ts)

E5 covers ~91% of Solo messages, so a 1% FN rate is a trust catastrophe at scale; the 0.3% bar holds the absolute false-negative count flat.
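
The four bars can be expressed as a single per-intent predicate. This is a minimal sketch: the `IntentStats` shape and the rate denominators (FP over firings, FN over cases that should have fired) are assumptions, since the document does not define them.

```typescript
interface IntentStats {
  firings: number;              // times the gate fired for this intent
  falsePositives: number;       // fired, but the reference path disagreed
  falseNegatives: number;       // should have fired, but did not
  canonicalWrongIntent: number; // failures on the pinned canonical cases
}

// True only when every Gate G1 bar is met for this one intent.
function meetsActivationBars(s: IntentStats): boolean {
  if (s.firings < 50) return false; // not enough evidence yet
  const truePositives = s.firings - s.falsePositives;
  const fpRate = s.falsePositives / s.firings;
  // Assumed denominator: all cases where the gate should have fired.
  const fnRate = s.falseNegatives / (truePositives + s.falseNegatives);
  return fnRate <= 0.003 && fpRate <= 0.01 && s.canonicalWrongIntent === 0;
}

const passing = meetsActivationBars({
  firings: 1000,
  falsePositives: 5,
  falseNegatives: 2,
  canonicalWrongIntent: 0,
});
console.log(passing); // true
```

Evaluating the predicate per intent rather than on pooled counts keeps a high-volume, well-behaved intent from masking a low-volume one that fails a bar.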

Checklist Before Activating Any Short-Circuit

List every entity the current LLM path extracts for this intent (intent, body, time/ISO, target, amount, etc.).
Prove the deterministic path produces each entity at the same fidelity — OR insert a Haiku/Sonnet extraction step for the gaps (hybrid).
Measure per-intent FN/FP via offline replay (prompt-eval-gate) over ≥ 50 firings.
Confirm zero canonical wrong-intent cases in the pinned corpus.
Only then flip the activation flag; keep it reversible (readBotFlag 10s TTL).
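
Reversibility in the last step depends on the flag being re-read often enough that flipping it off takes effect quickly. The sketch below shows one possible TTL-cached reader; it is not the real readBotFlag, and `makeFlagReader`, the injected clock, and the flag name `e5_active` are all hypothetical.

```typescript
// Hypothetical TTL cache around a flag lookup; not the real readBotFlag.
function makeFlagReader(
  fetchFlag: (name: string) => boolean, // assumed source of truth, e.g. a settings table
  ttlMs: number,
  now: () => number = Date.now,
): (name: string) => boolean {
  const cache = new Map<string, { value: boolean; expiresAt: number }>();
  return (name) => {
    const hit = cache.get(name);
    if (hit && hit.expiresAt > now()) return hit.value; // still fresh
    const value = fetchFlag(name);
    cache.set(name, { value, expiresAt: now() + ttlMs });
    return value;
  };
}

// Simulated clock and store so the behaviour is easy to follow.
let clock = 0;
let stored = true;
const readFlag = makeFlagReader(() => stored, 10000, () => clock);

const first = readFlag("e5_active"); // fetched: true
stored = false;                      // operator flips the flag off
clock = 5000;
const stale = readFlag("e5_active"); // within TTL: still true
clock = 10001;
const fresh = readFlag("e5_active"); // TTL expired: refetched, false

console.log({ first, stale, fresh });
```

With a 10-second TTL, a rollback reaches every caller within roughly ten seconds without a deploy, at the cost of one lookup per flag per window.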

Existing Gates That Passed This Bar

The Tier 0 gates (pure-ack, bedtime, quick-undo) were safe to activate because their deterministic output fully matches what the LLM would produce for those narrow shapes:

Pure-ack (≤12 chars, no ?, no digits, no Hebrew action verb): the LLM's only contribution is a brief acknowledgment, which the static map replicates exactly.
Bedtime (לילה טוב, "good night", × 2 in window): the LLM emits a standard goodnight reply, and the static reply is identical in effect.
Quick-undo ("תמחקי"/"בטלי", "delete"/"cancel", within 60s): the LLM would call remove_last_action; the deterministic path does the same via getLastBotAction.

add_reminder did not pass because time expansion is a non-trivial extraction gap that the deterministic path cannot close.
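
The pure-ack shape is narrow enough to write down as a predicate, which is what makes it safe. The following is a sketch under assumptions: the verb list is hypothetical (the real gate's list is not shown here), and treating an empty message as a non-ack is a choice made for this example.

```typescript
// Hypothetical verb list; the real gate's list is not shown in this document.
const ACTION_VERBS = ["תזכירי", "תמחקי", "בטלי"];

// True when the message fits the pure-ack shape: short, no question mark,
// no digits, and none of the known action verbs.
function isPureAck(text: string, actionVerbs: string[] = ACTION_VERBS): boolean {
  const trimmed = text.trim();
  if (trimmed.length === 0 || trimmed.length > 12) return false;
  if (trimmed.includes("?")) return false;
  if (/\d/.test(trimmed)) return false;
  return !actionVerbs.some((verb) => trimmed.includes(verb));
}

console.log(isPureAck("תודה"));  // "thanks": true
console.log(isPureAck("מה?"));   // a question: false
console.log(isPureAck("ב-5"));   // contains a digit: false
console.log(isPureAck("תמחקי")); // an action verb: false
```

Nothing in this shape carries an entity to extract, so matching the label is the same as matching the whole tuple.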

Shadow First, Then Activate

Run in shadow mode (fire-and-forget annotation, Sonnet still replies) long enough to measure real-world FN/FP against Sonnet's actual executor output. The offline replay harness (scripts/e5_offline_replay.py) is the reference tool. Only after the bars above are met should you flip the activation flag in bot_settings.
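
In shadow mode each message yields a pair: what the gate would have done and what Sonnet actually did. A minimal sketch of how such a pair might be labelled follows; the `Tuple` shape, the outcome names, and the choice to count an entity mismatch as a false positive are assumptions, not the behavior of scripts/e5_offline_replay.py.

```typescript
type Outcome = "agree" | "false_positive" | "false_negative" | "not_applicable";

interface Tuple {
  intent: string;
  entities: Record<string, string>;
}

// gate:   what the short-circuit would have produced (null if it did not fire).
// sonnet: what Sonnet's executor actually did for the same message.
function classifyShadow(
  targetIntent: string,
  gate: Tuple | null,
  sonnet: Tuple,
): Outcome {
  if (gate === null) {
    // The gate stayed silent; it is a miss only if Sonnet chose the target intent.
    return sonnet.intent === targetIntent ? "false_negative" : "not_applicable";
  }
  const sameIntent = gate.intent === sonnet.intent;
  // Every entity Sonnet extracted must be reproduced exactly.
  const sameEntities = Object.keys(sonnet.entities).every(
    (key) => gate.entities[key] === sonnet.entities[key],
  );
  return sameIntent && sameEntities ? "agree" : "false_positive";
}

const sonnetResult: Tuple = {
  intent: "add_reminder",
  entities: { body: "buy milk", send_at: "2026-05-18T10:00:00" }, // illustrative
};
const gateResult: Tuple = {
  intent: "add_reminder",
  entities: { body: "buy milk" }, // time expansion missing
};

console.log(classifyShadow("add_reminder", gateResult, sonnetResult)); // "false_positive"
```

Counting a correct label with a wrong or missing entity as a false positive keeps the measured rates aligned with the whole-tuple requirement rather than with intent precision alone.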