auto-approve-architecture - SKILL.md Agent Skill

name: auto-approve-architecture
description: Internals of Leap's Claude auto-approve flow and the CLI state machine (CLIStateTracker) - the PermissionRequest hook, the AskUserQuestion exclusion, per-session auto_send_mode isolation and pin-file robustness, up/down arrow handling during dialogs and slash-command pickers, the on-input no-reset rule, and the TUI-menu fallback. Use this when modifying auto-approve behavior, the state tracker, hook handling in leap-hook-process.py, or Claude permission/dialog detection.
user-invocable: false

Auto-Approve Architecture (Claude)

ALWAYS-mode auto-approve has two layers — the primary is hook-based and never renders a dialog; the fallback is the legacy TUI-menu path that types "1\r" into a rendered prompt.

Primary: PermissionRequest hook (hook_script auto_approve). Configured in ClaudeProvider.configure_hooks with matcher ^(?!AskUserQuestion$).* (every tool EXCEPT AskUserQuestion; see "AskUserQuestion exclusion" below). The hook script handler _handle_auto_approve() in leap-hook-process.py reads the session's auto_send_mode from .storage/pinned_sessions.json[tag] (with global fallback to .storage/settings.json). In ALWAYS mode it emits {"hookSpecificOutput": {"hookEventName": "PermissionRequest", "decision": {"behavior": "allow"}}} to stdout and then calls sys.exit(0). SystemExit inherits from BaseException (not Exception), so it propagates past the except Exception block and the trailing print('{}') in __main__ never runs, leaving exactly one JSON object on stdout instead of a second object appended after the decision. In PAUSE mode the handler returns normally, so the trailing print('{}') runs and tells Claude "no decision"; the dialog then renders as usual.
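The exit-before-trailing-print behaviour can be reproduced in isolation. The following is a minimal sketch, not the real leap-hook-process.py: the function names handle_auto_approve and run_hook, and the mode strings "always" / "pause", are assumptions made for illustration. Only the JSON payload and the SystemExit / except Exception interaction come from the description above.

```python
import io
import json
from contextlib import redirect_stdout


def handle_auto_approve(mode: str) -> None:
    """Hypothetical stand-in for _handle_auto_approve()."""
    if mode != "always":
        return  # PAUSE: return normally so the caller prints '{}' ("no decision")
    decision = {
        "hookSpecificOutput": {
            "hookEventName": "PermissionRequest",
            "decision": {"behavior": "allow"},
        }
    }
    print(json.dumps(decision))
    raise SystemExit(0)  # what sys.exit(0) raises under the hood


def run_hook(mode: str) -> str:
    """Mimic the assumed __main__ shape and capture what lands on stdout."""
    buf = io.StringIO()
    with redirect_stdout(buf):
        try:
            try:
                handle_auto_approve(mode)
            except Exception:
                pass  # SystemExit is a BaseException, so it is NOT caught here
            print("{}")  # trailing "no decision" object
        except SystemExit:
            pass  # stand-in for the interpreter actually exiting
    return buf.getvalue()
```

Calling run_hook("always") yields a single JSON line carrying the allow decision, while run_hook("pause") yields only the empty object.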

AskUserQuestion exclusion. AskUserQuestion is the one tool whose entire purpose is to elicit a user choice. If PermissionRequest returns "allow" for it, Claude interprets that as "skip user interaction" — the question dialog is never rendered and the tool returns an empty answer set to the model ("Allowed by PermissionRequest hook" with no selections), corrupting the very flow the user invoked it for. The negative-lookahead matcher excludes the exact tool name AskUserQuestion so its PermissionRequest goes unanswered, Claude renders the dialog, and the user actually picks. Pinned by test_claude_permission_request_matcher_excludes_ask_user_question.
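The effect of the negative lookahead is easy to check with Python's re module. This sketch assumes the matcher is applied as a regular expression against the whole tool name; how Claude evaluates matchers internally is not specified here, and hook_applies is a hypothetical helper.

```python
import re

# The matcher string from ClaudeProvider.configure_hooks, as quoted above.
MATCHER = re.compile(r"^(?!AskUserQuestion$).*")


def hook_applies(tool_name: str) -> bool:
    """True when the PermissionRequest hook would be selected for this tool."""
    return MATCHER.fullmatch(tool_name) is not None
```

Only the exact name is excluded: hook_applies("AskUserQuestion") is False, while "Bash" and even a longer name such as "AskUserQuestionnaire" still match, because the $ inside the lookahead anchors the exclusion to the full name.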

Per-session isolation of auto_send_mode. The hook fallback (per-tag pin → global → 'pause') is the protocol the hook script uses, but in steady state the global fallback is almost never hit — four guarantees in the server + monitor keep per-tag pins authoritative:

Snapshot at LeapServer.__init__ (server.py:134-145). Resolves global → per-tag pin and writes the result back into the pin BEFORE CLIStateTracker initialises, so later changes to the Settings dialog's global default can't retroactively flip this session's hook behavior.
set_auto_send_mode handler writes ONLY the per-tag pin — never the global (server.py:582-602). The original cross-session leak ("I toggled one session and all of them auto-approved") came from a stray save_settings(...) in this handler; per-session toggles must stay per-session. Pinned structurally by test_save_settings_not_imported_by_server and behaviourally by test_handler_does_not_modify_settings_file in test_auto_send_mode_persistence.py.
SessionMixin._merge_sessions preserves auto_send_mode in pin_data by pulling it from the live server's status response (s.get('auto_send_mode')) — without this, the monitor's first auto-pin write for a brand-new session would build pin_data from a stale in-memory cache that lacked the field, blowing away the server's snapshot. Pinned by test_merge_sessions_auto_send.py.
All monitor-side pin writes are per-tag (no save_pinned_sessions(self._pinned_sessions) full-state saves anywhere). Three targeted helpers in monitor/pr_tracking/config.py (update_pinned_session_field(tag, field, value), write_pinned_session_entry(tag, entry), remove_pinned_session_tag(tag)) each do a read-modify-write that touches only the requested tag's entry. The write_pinned_session_entry helper additionally treats auto_send_mode as server-owned: disk's value always wins over the caller's (possibly stale) in-memory copy, so a monitor refresh that ships an old auto_send_mode in pin_data can't clobber a fresh server-side toggle. Pinned by test_set_auto_send_mode_isolation.py and test_pinned_sessions_corruption.py. Residual narrow race: if a different writer mutates a different tag between this helper's read and write (~5–10ms window), that writer's change is lost; full elimination would need fcntl.flock, which is not currently implemented. All three helpers (and the server's symmetric _save_pinned_auto_send_mode) also have a corrupt-disk recovery path: a malformed pin file is treated as empty on read so the next write produces a valid JSON file again, restoring the self-healing behavior the pre-fix save_pinned_sessions had.
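The per-tag write discipline can be sketched as below. This is an illustration under assumptions, not the code in monitor/pr_tracking/config.py: the real helpers take (tag, field, value) and (tag, entry), whereas this sketch adds an explicit path parameter so it is self-contained, and _read_pins / _write_pins are hypothetical names.

```python
import json
from pathlib import Path
from typing import Any, Dict


def _read_pins(path: Path) -> Dict[str, Any]:
    """Load the pin file; a missing or malformed file is treated as empty."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):  # ValueError covers JSON and Unicode decode errors
        return {}
    return data if isinstance(data, dict) else {}


def _write_pins(path: Path, pins: Dict[str, Any]) -> None:
    path.write_text(json.dumps(pins, indent=2), encoding="utf-8")


def update_pinned_session_field(path: Path, tag: str, field: str, value: Any) -> None:
    """Read-modify-write that touches only `tag`'s entry."""
    pins = _read_pins(path)
    entry = pins.get(tag)
    entry = dict(entry) if isinstance(entry, dict) else {}
    entry[field] = value
    pins[tag] = entry
    _write_pins(path, pins)


def write_pinned_session_entry(path: Path, tag: str, entry: Dict[str, Any]) -> None:
    """Replace `tag`'s entry but keep the on-disk auto_send_mode (server-owned)."""
    pins = _read_pins(path)
    on_disk = pins.get(tag)
    # Drop the caller's (possibly stale) copy, then carry over the disk value.
    merged = {k: v for k, v in entry.items() if k != "auto_send_mode"}
    if isinstance(on_disk, dict) and "auto_send_mode" in on_disk:
        merged["auto_send_mode"] = on_disk["auto_send_mode"]
    pins[tag] = merged
    _write_pins(path, pins)
```

Note that nothing here closes the read-to-write race described above; the sketch only shows the per-tag scope, the server-owned field, and the self-healing read.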

Pin-file readers on the __init__ critical path (_load_pinned_auto_send_mode, validate_pinned_session, build_auth_fetch_url, load_settings, load_pinned_sessions) catch (OSError, ValueError) rather than (OSError, JSONDecodeError) — UnicodeDecodeError is a ValueError and the narrower except let a malformed pin file crash session startup. They also isinstance-guard non-dict roots / entries and validate auto_send_mode is one of (PAUSE, ALWAYS) before propagating into CLIStateTracker. Tested in test_pinned_sessions_corruption.py.
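A reader following these rules might look like the sketch below. The function name load_pinned_auto_send_mode, its path parameter, and the lowercase mode strings in VALID_MODES are assumptions for illustration; the exception tuple, the isinstance guards, and the mode validation are the parts taken from the paragraph above.

```python
import json
from pathlib import Path
from typing import Optional

VALID_MODES = ("pause", "always")  # assumed string values for PAUSE / ALWAYS


def load_pinned_auto_send_mode(path: Path, tag: str) -> Optional[str]:
    """Return the pinned mode for `tag`, or None if anything looks wrong."""
    try:
        root = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # json.JSONDecodeError and UnicodeDecodeError are both ValueError
        # subclasses, so one clause covers bad JSON and bad bytes alike.
        return None
    if not isinstance(root, dict):
        return None  # e.g. the file holds a list or a bare string
    entry = root.get(tag)
    if not isinstance(entry, dict):
        return None
    mode = entry.get("auto_send_mode")
    return mode if mode in VALID_MODES else None
```

Returning None lets the caller fall through to its own default instead of propagating an unvalidated value into the state tracker.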

CHURNING state (Claude only). A turn can end (Stop hook fired, idle prompt shown, input accepted) while a background Claude Monitor is still running and will re-invoke the session. CLIState.CHURNING surfaces that distinctly from a plain "done, awaiting you" idle. It is compute-only: get_state() returns CHURNING but the internal self._state stays IDLE, and CHURNING is held out of WAITING_STATES/PROMPT_STATES/signal handling so none of the transition machinery is touched. The refinement is a single tail check in get_state: if result == IDLE and self._background_active: return CHURNING.

Detection (ClaudeProvider.background_work_state, tri-state Optional[bool]). Returns True when a Monitor marker is on screen: either the activity line N monitor still running, matched anywhere in the tail, or the persistent mode line · N monitor ·, whose count is matched on the mode-line row wherever that row is. The mode-line row is identified by its stable markers ← for agents / shift+tab to cycle. The arrow glyph is load-bearing: a bare for agents also occurs in response prose, and a prose row mistaken for the mode line would return a false False and drop the sticky churn flag. Returns False when the idle mode line is rendered with no marker (Monitor finished), and None on an ambiguous screen (blank/partial repaint, mid-turn text). Two subtleties that were live bugs: (a) the tail is anchored to the last non-blank row, not display_lines[-N:], because a tall terminal with a short conversation pads many blank rows below the footer, and a naive bottom-slice grabs those instead of the mode line; (b) the count is read on the mode-line row, not a fixed last row, because a trailing ctrl+v to paste hint can displace the mode line.
Sticky flag (CLIStateTracker.on_output). _background_active is updated only when background_work_state is non-None: set True on a marker, cleared False only on a clean monitor-free idle mode line, left unchanged on None. Recomputing it unconditionally from the live screen would briefly drop it to False on the blank buffer right after a get_state _reset_screen() (or a partial repaint during the quiet wait between Monitor events) — exactly what let the auto-sender dispatch a queued message into a churning session.
churn_queue_mode mirrors auto_send_mode end-to-end (states.py ChurnQueueMode(str, Enum) SEND/WAIT, default WAIT): the __init__ global→pin snapshot, the per-tag-only set_churn_queue_mode socket handler + _save_pinned_churn_queue_mode, the _merge_sessions pin_data field, and write_pinned_session_entry's server-owned strip-and-carry. The auto-sender gate is is_ready_for_state(CHURNING) == (churn_queue_mode == SEND) — so default WAIT holds the queue while churning. The monitor must read churn_queue_mode from the status response into the session dict (session_manager.py, beside auto_send_mode) or the Queue-menu checkmark falls back to WAIT forever and toggling appears stuck while the live server keeps dispatching.
Slack. OutputCapture.on_state_change/write_current_state map CHURNING→IDLE before the allowlist check, so a turn that ends RUNNING→CHURNING still posts its response.

Covered by tests/unit/test_churning_state.py.
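The compute-only state, the sticky flag, and the queue gate fit together roughly as follows. This is a reduced sketch, not the real CLIStateTracker: the class name ChurnTracker, the enum string values, and the "IDLE is always ready" rule in is_ready_for_state are assumptions; the tri-state update rule, the tail check, and the SEND/WAIT gate follow the description above.

```python
from enum import Enum
from typing import Optional


class CLIState(str, Enum):
    IDLE = "idle"
    RUNNING = "running"
    CHURNING = "churning"


class ChurnQueueMode(str, Enum):
    SEND = "send"
    WAIT = "wait"


class ChurnTracker:
    """Hypothetical, minimal slice of the tracker covering only churn logic."""

    def __init__(self) -> None:
        self._state = CLIState.IDLE
        self._background_active = False
        self.churn_queue_mode = ChurnQueueMode.WAIT  # default WAIT

    def on_output(self, background_work_state: Optional[bool]) -> None:
        # Sticky: only a definite True/False updates the flag; None leaves it.
        if background_work_state is not None:
            self._background_active = background_work_state

    def get_state(self) -> CLIState:
        result = self._state
        # Compute-only refinement: self._state itself is never set to CHURNING.
        if result == CLIState.IDLE and self._background_active:
            return CLIState.CHURNING
        return result

    def is_ready_for_state(self, state: CLIState) -> bool:
        if state == CLIState.CHURNING:
            return self.churn_queue_mode == ChurnQueueMode.SEND
        return state == CLIState.IDLE  # assumed: plain idle is always ready
```

An ambiguous screen (None) between two Monitor events therefore leaves the session reported as CHURNING, and with the default WAIT mode the queue stays held.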

↑/↓ during mid-RUNNING dialogs and slash-command pickers. Two distinct classes of in-CLI UI keep the state at RUNNING while ↑/↓ need to reach the CLI:

AskUserQuestion's question dialog fires no Notification hook (it's a built-in tool, not an MCP elicitation), so state stays RUNNING until the 5 s cursor+silence fallback flips it to NEEDS_PERMISSION.
Slash-command pickers (/resume, /mcp, /agents, /config, /effort, /model, /memory, /login, /doctor, /usage, /bug, /permissions, …) fire no hook at all and leave state in RUNNING for the entire time the picker is open.

In both cases the server's input filter (server.py:3506) would normally see ↑/↓ as RUNNING-state arrows and steal them for history recall, leaving the user unable to navigate the picker. The fix is a screen check: the input filter calls CLIStateTracker.screen_has_active_dialog() and passes ↑/↓ through whenever it returns True. Two complementary predicates make up the check:

provider.is_dialog_certain(tail_compact) — strict permission-dialog footer (Entertoselect + Esctocancel in the compact form of the last 5 non-blank rows) or numbered-menu cursor (❯1.). Kept strict because the same predicate gates state transitions where false positives stick state in NEEDS_PERMISSION for 60 s.
not provider.is_idle_prompt_visible(filled_rows) — structural detection of Claude's standard idle input box: a ─ HR row immediately followed by a ❯ input row, within the last _IDLE_TAIL_WINDOW non-blank rows (HR rows must be ≥_MIN_HR_LEN (40) chars and contain only ─/whitespace, so inline ── widgets like the /effort slider axis — which carries a ▲ — are rejected). A closing bottom ─ HR is present on some Claude builds and absent on others (the footer sits directly under the input row, with no second rule), so it is not required — only the top-HR→❯ pairing. When that pairing is gone from the bottom of the screen something is taking it over — a slash-command picker, the trust dialog, a permission dialog that didn't match the strict footer — and ↑/↓ belong to that something, not to history recall. Intentionally structural so new Claude pickers added next month work without us enumerating their footer text. Falls back to True (assume idle visible) when the screen has fewer than _IDLE_DETECT_MIN_ROWS non-blank rows, so transient / boot-time screens preserve the legacy strict-dialog-only behaviour.
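The structural idle-box check can be approximated as below. This is a sketch under assumptions: the values of _IDLE_TAIL_WINDOW and _IDLE_DETECT_MIN_ROWS are invented for the example (only _MIN_HR_LEN = 40 is stated above), and the helper _is_hr is a hypothetical name for the pure-rule test.

```python
from typing import Sequence

_MIN_HR_LEN = 40           # stated above
_IDLE_TAIL_WINDOW = 6      # assumed value, for illustration only
_IDLE_DETECT_MIN_ROWS = 3  # assumed value, for illustration only


def _is_hr(row: str) -> bool:
    """A horizontal rule: long enough and made only of ─ and whitespace."""
    s = row.strip()
    return len(s) >= _MIN_HR_LEN and set(s) <= {"─", " "}


def is_idle_prompt_visible(rows: Sequence[str]) -> bool:
    filled = [r for r in rows if r.strip()]
    if len(filled) < _IDLE_DETECT_MIN_ROWS:
        return True  # sparse / boot-time screen: assume the idle box is there
    tail = filled[-_IDLE_TAIL_WINDOW:]
    # Only the top-HR -> ❯ pairing is required; a closing rule is optional.
    return any(
        _is_hr(above) and below.lstrip().startswith("❯")
        for above, below in zip(tail, tail[1:])
    )
```

A picker that replaces the input box has no long rule directly above a ❯ row, so the function returns False and the arrow keys pass through.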

Non-Claude providers (Codex / Cursor / Gemini) inherit is_idle_prompt_visible defaulting to True, so that leg is a no-op for them — but see the generic detector next, which is what makes their dialogs work.

Generic selection-dialog detection (all providers). screen_has_active_dialog() calls provider.screen_shows_selection_dialog(filled) first, before the dialog_patterns short-circuit — a CLI-agnostic detector (base CLIProvider) that fires on a numbered ›/❯/▶ selection cursor (› 1.) or a footer line carrying confirm/cancel/navigate hints (esc to cancel / enter to confirm / ↑/↓ to navigate) that looks like a footer — ≥2 distinct hints, a footer separator (·/•), or a short (≤40-char) hint-only line — rather than a long prose sentence that merely quotes one hint phrase. The footer check is cursor-glyph independent (it does NOT require a ›/❯), so it catches pickers whose selection marker is something else (Gemini/Cursor, future CLIs). A bare ›/❯ is still insufficient (it appears in idle prompts — Codex's ghost-text hint, Claude's ❯), which is why the cursor leg requires a numbered option. This is what lets ↑/↓ navigate Codex dialogs (empty dialog_patterns → the old short-circuit returned False and arrows were stolen for history recall — the "stuck arrows in a Codex multi-option dialog" report) and hardens Gemini/Cursor non-permission pickers, without each provider enumerating its footers. Because screen_has_active_dialog() is consumed only by the ↑/↓ input filter, a false positive is cheap (the arrow just reaches the CLI's native handling). Claude's is_dialog_certain + structural is_idle_prompt_visible path still runs after, unchanged. Pinned by TestScreenHasActiveDialog in test_state_tracker.py (idle-visible / picker shapes / dialog scrolled-out cases, plus test_codex_selection_dialog_detected_despite_empty_patterns and test_codex_idle_prompt_is_not_a_dialog) and by TestClaudeProvider::test_claude_idle_prompt_* in test_provider_behaviors.py (sandwich + single-HR-box detection, single-rule-in-prose rejection, picker shapes, short-inline-rule rejection, picker-focused-row rejection).
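The two legs of the generic detector (numbered cursor, footer-shaped hint line) could be sketched as follows. The hint list, the regular expression, and the helper name looks_like_footer are assumptions for illustration, and the "short hint-only line" rule is approximated here by length alone; the base CLIProvider implementation may differ.

```python
import re
from typing import Sequence

_HINTS = ("esc to cancel", "enter to confirm", "to navigate")  # assumed subset
_NUMBERED_CURSOR = re.compile(r"^\s*[›❯▶]\s*\d+\.")
_FOOTER_SEPARATORS = ("·", "•")
_SHORT_FOOTER_LEN = 40


def looks_like_footer(line: str) -> bool:
    """A hint line that reads as a footer rather than as prose quoting a hint."""
    low = line.lower()
    hits = [h for h in _HINTS if h in low]
    if not hits:
        return False
    if len(hits) >= 2:
        return True  # two or more distinct hints
    if any(sep in line for sep in _FOOTER_SEPARATORS):
        return True  # a footer separator is present
    return len(line.strip()) <= _SHORT_FOOTER_LEN  # short line (approximation)


def screen_shows_selection_dialog(rows: Sequence[str]) -> bool:
    return any(
        bool(_NUMBERED_CURSOR.match(r)) or looks_like_footer(r) for r in rows
    )
```

A bare › or ❯ prompt row does not match the cursor leg because no numbered option follows the glyph, which keeps idle prompts from being read as dialogs.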

The running→idle cursor+silence flip must also respect is_idle_prompt_visible (don't blank an interactive UI out from under the arrow gate). The screen_has_active_dialog() arrow gate above only protects ↑/↓ while the UI is still in pyte's buffer. The running→idle cursor+silence fallback in get_state (~5 s of output silence with the cursor visible) would _reset_screen() on its way to IDLE, and it only side-stepped that for a dialog whose footer matched the strict is_dialog_certain form (which promotes to NEEDS_PERMISSION instead). That left a gap — confirmed live against real claude 2.1.162 with Leap's real tracker wired to the PTY: any RUNNING-state interactive UI whose footer is not Enter to select + Esc to cancel fell through after ~5 s of user deliberation to RUNNING → IDLE + _reset_screen(). Real cases that miss the strict footer: every slash-command picker (/model → Enter to set as default · s to use this session only · Esc to cancel; /resume → … Esc to cancel; both lack Enter to select, and with the ❯ cursor on a later option the ❯1. numbered-menu fallback misses too), plus alternate/older Claude dialog footers that genuinely exist in the binary (Esc to close on the tabbed multi-question / /agents view, Enter to approve … Esc to cancel, the multi-select Space to toggle, Enter to confirm …). Once the screen was blanked, screen_has_active_dialog() read "no dialog", so ↑/↓ were stolen for history recall (the "arrows get stuck in a picker / multi-choice question after a few seconds" report), and the false IDLE let the auto-sender flush a queued message straight into the open UI. 
The fix: the cursor+silence running→idle block stays RUNNING (no reset) when both not provider.is_idle_prompt_visible(filled) and (provider.has_selection_cursor(filled) or provider.has_interactive_footer(filled)), placed after the transcript_says_running() / interrupt / post-answer-grace guards, so a genuinely silent in-flight tool is still held by the transcript guard and only the final would-be idle flip is intercepted. The box-absent signal alone is too broad — plain response text (a numbered list, a long body ending in > ) also lacks the idle box yet must still idle, so a not is_idle_prompt_visible-only guard wrongly held it RUNNING (caught by test_dialog_false_positives). A real picker/dialog additionally shows either a ❯/› selection cursor on a focused option (has_selection_cursor scans the last _IDLE_TAIL_WINDOW rows) or, for cursor-less UIs like the /agents tabbed view, a nav/dismiss footer on the bottom row (has_interactive_footer matches distinctive markers — to navigate / Esc to close / Esc to cancel / Space to toggle / Enter to confirm — on the last non-blank row only, so prose mentioning Enter to select mid-sentence with a > prompt last doesn't match); plain response text has neither. (When the idle box IS present, is_idle_prompt_visible is True so the guard never reaches the cursor check — the genuinely-idle prompt's own ❯ input row can't trip it. This is exactly why box detection must cover single-HR builds: an earlier two-HR-sandwich requirement returned False on builds that render only a top rule, so after a no-Stop-hook idle — e.g. the /cost slash command, which fires no Stop hook so the state falls to this cursor+silence path — the guard reached the cursor check, matched the prompt's own ❯, and wedged the session in RUNNING forever. The detector now requires only the top-HR→❯ pairing.) 
Naturally scoped: other providers' is_idle_prompt_visible defaults True (and has_selection_cursor / has_interactive_footer default False) → no-op; the < _IDLE_DETECT_MIN_ROWS short-screen shortcut still returns "idle visible" so /clear-style sparse screens idle exactly as before. This generalizes protection beyond the strict footer — AskUserQuestion itself happens to render Esc to cancel in current Claude (so it promotes to NEEDS_PERMISSION and never needed this guard), but the guard covers every non-matching footer (pickers, Esc to close, future pickers) without enumerating them. Note the cursor-hidden edge: a picker that hides its cursor skips the cursor-gated 5 s block and falls to the 60 s safety net (rare for input UIs — Claude pickers keep the cursor visible — and the 60 s net is deliberately left ungated so the hung-silent-tool escape hatch still fires). Pinned by TestInteractiveUiKeepsRunningOnSilence in test_state_tracker.py (picker stays running, arrows stay navigable, genuine idle box still idles, plain text without a cursor idles, footer-only dialog without a cursor stays running) and by TestDialogFalsePositives (numbered list / scrolled-out phrases idle, not held RUNNING).
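The guard's shape, and why the box-absent signal alone is not enough, can be seen in a small sketch. It is illustrative only: holds_running is a hypothetical name, the idle-box and selection-cursor results are passed in as booleans rather than computed, and the marker list is the one quoted above.

```python
from typing import Sequence

_FOOTER_MARKERS = (
    "to navigate",
    "Esc to close",
    "Esc to cancel",
    "Space to toggle",
    "Enter to confirm",
)


def has_interactive_footer(rows: Sequence[str]) -> bool:
    """Match footer markers on the LAST non-blank row only."""
    filled = [r for r in rows if r.strip()]
    return bool(filled) and any(m in filled[-1] for m in _FOOTER_MARKERS)


def holds_running(
    rows: Sequence[str], idle_box_visible: bool, has_selection_cursor: bool
) -> bool:
    """True when the cursor+silence flip to IDLE should be suppressed."""
    if idle_box_visible:
        return False  # genuine idle prompt: never reach the cursor check
    return has_selection_cursor or has_interactive_footer(rows)
```

Plain response text without an idle box has neither a selection cursor nor a bottom-row footer, so it still idles; a cursor-less tabbed view with Esc to close on its last row is held in RUNNING.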

Answering a dialog must NOT _reset_screen() (on_input, gated on from_prompt). A multi-question AskUserQuestion renders as one tabbed dialog; answering one question (Enter from a PROMPT state) advances to the next question via an Ink incremental repaint that never re-emits the (unchanged) footer. If on_input resets the pyte screen on that answer, the footer is wiped and — for the ~5 s until Claude's next full re-render — the live screen has no dialog footer. That single desync drove two bugs, both confirmed against a live session log: (1) the cursor+silence check in get_state reads "no dialog" (is_dialog_certain False) and flips RUNNING → idle, falsely marking the still-pending question as done and letting the auto-sender dispatch a queued message INTO the dialog; (2) the ↑/↓ input filter's screen_has_active_dialog() likewise reads "no dialog" and steals the arrows for history recall (the "arrows dead on the 2nd question, but typing the number works" report). The fix: on_input skips _reset_screen() when the Enter answers a PROMPT state (needs_permission/needs_input) — keeping pyte truthful so the footer survives the incremental repaint and both the promotion (→ needs_permission) and the arrow check stay correct. This mirrors the running→needs_permission promotion path, which already skips the reset for the same reason. IDLE (a fresh prompt) and INTERRUPTED (an interrupt reply) still reset — there the prior screen is stale scrollback with nothing rendered incrementally on top. Pinned by TestDialogAnswerKeepsScreen in test_state_tracker.py (footer-preserved-so-navigable, no-false-idle, and the IDLE/INTERRUPTED still-reset guards).

Holding a hookless dialog at NEEDS_PERMISSION (no Permission↔Idle oscillation). Because AskUserQuestion writes no permission signal, the rendered footer on screen is the only reliable evidence it's still pending — a first-action question is never even written to the transcript while it waits. Two rules keep the promoted state from flickering back to Idle: (1) the cursor+silence running→needs_permission promotion does not _reset_screen() (matching the _handle_idle_output proactive promotion) — resetting desyncs pyte from Ink, which then only partially repaints and never restores the footer, so the waiting→idle dismissal checks would falsely read "dialog gone" and demote; (2) the 60 s stuck-waiting safety timeout keeps the waiting state while has_dialog_indicator still matches the live screen, scoped to PROMPT_STATES so a stuck INTERRUPTED still recovers. Pinned by test_incremental_repaint_after_promotion_keeps_dialog (faithful Ink-style incremental-repaint repro: demotes to idle pre-fix, holds post-fix) plus the test_*_dialog*on_screen cases in TestSafetyTimeouts.

Navigating a cursor-hiding dialog must NOT flip NEEDS_PERMISSION → RUNNING (_try_waiting_to_running_via_cursor). The poll heuristic that promotes a hand-answered prompt to RUNNING keys off "_user_responded + cursor hidden" — on_input sets _user_responded on any keypress in a waiting state, and Ink TUIs hide the cursor while processing. But a multi-option / multi-question AskUserQuestion hides the cursor the entire time it is open, so merely navigating it (Tab between questions, ↑/↓ between options) trips both conditions while the dialog is still pending. The false flip is doubly destructive: it drops the in_prompt ↑/↓ passthrough in the input filter, and its _reset_screen() wipes the live dialog out of pyte, so screen_has_active_dialog() then reads "no dialog" and ↑/↓ are stolen for history recall — the dialog goes un-navigable by arrow (the recurring "arrows stuck in a multi-option question" report; confirmed in a real state_logs capture as NEEDS_PERMISSION→running (user_responded + cursor hidden at poll) immediately after Tab navigation, followed by a permanent "composing" wedge as the stolen arrow's recall refilled the input buffer). The fix mirrors the sibling waiting→idle dismissal check (_try_waiting_to_idle_via_dismissal): read the live screen under the same _screen_lock and only treat the hidden cursor as "moved past the dialog" once has_dialog_indicator(compact) is gone; while the dialog footer is still on screen the keypress was navigation, not an answer, so stay NEEDS_PERMISSION (which keeps in_prompt True so the input filter forwards ↑/↓ to the dialog unconditionally, even in the transient window after a proactive-promotion reset where the screen is momentarily blank). Scoped to PROMPT_STATES so INTERRUPTED (no dialog_patterns footer; its prompt is matched elsewhere) keeps its original cursor-only path and still recovers. 
This is distinct from the Copilot dialogs_hide_cursor flag, which disables the heuristic wholesale for providers whose dialogs always hide the cursor — Claude can't use that blunt switch because its Yes/No permission menu keeps the cursor visible and still needs the heuristic (the screen-gone check is the precise version: it works for both Claude dialog types). Pinned by test_user_responded_cursor_hidden_stays_waiting_while_dialog_shown in TestUserRespondedFlag (contrasts the existing test_user_responded_cursor_hidden_goes_running, where the dialog is gone and the flip is correct).

The waiting→idle dismissal must require POSITIVE idle evidence, not just an absent footer (idle_prompt_certain). A sibling of the bug above, via a different reset path. When a permission/question dialog is on screen and a signal promotes RUNNING→NEEDS_PERMISSION, _apply_signal_transition calls _reset_screen() (it must — the Late-Notification guard relies on that emptiness to detect a stale post-answer signal). Claude does not repaint an unchanged dialog, so pyte is left blank; a later partial repaint (e.g. moving the selection highlight in response to a forwarded ↑/↓) lands a row or two — enough to bump _last_output_time past _waiting_since and defeat the dismissal's existing output-after-waiting guard — but does not restore the footer and shows no idle box. The dismissal then reads has_dialog_indicator as "gone" and falsely demotes the still-open dialog NEEDS_PERMISSION→idle; once in_prompt is False the input filter steals ↑/↓ for history recall (confirmed in a real capture as RUNNING→needs_permission (signal) immediately followed by needs_permission→idle (indicator gone + cursor visible + silence) with the dialog still on the user's screen — the AskUserQuestion was navigable by Tab but dead to arrows). Removing the reset is not an option (it breaks the Late-Notification stale-signal guard, whose "empty screen + empty snapshot" signature depends on it — 3 tests pin that). The fix is in the dismissal instead: demote a PROMPT_STATES dialog to idle only when the provider gives positive idle evidence — provider.idle_prompt_certain(filled) is not False. Claude returns the strict idle-box signature (top-HR → ❯, via _idle_box_rendered, with no < _IDLE_DETECT_MIN_ROWS sparse-screen fallback — that fallback would wrongly pass the partial-repaint fragment); a genuine self-dismiss to idle renders that box, a dialog merely lost from pyte does not. 
Base default is None ("no opinion") so non-Claude providers keep their legacy demote-on-indicator-gone behaviour, and the waiting→idle grace→RUNNING branch (post-answer resume) is untouched (it sits before the gate). Worst case if Claude self-dismisses straight to running (no idle box): the dialog is held in NEEDS_PERMISSION until the Stop-hook idle signal or the 60 s safety timeout — benign (state-label-only, arrows stay protected), and strictly safer than the old false-idle which could flush a queued message into a live turn. Pinned by test_dialog_lost_to_reset_then_partial_repaint_is_not_idled in TestStaleScreenContent; the genuine-dismiss tests (test_needs_permission_self_dismiss_detected_after_new_output, test_waiting_self_dismiss_still_idles_without_an_answer, integration test_cli_dismisses_dialog_without_signal) now feed Claude's real idle box (HR + ❯) as the positive evidence.

Signal-based ↑/↓ backstop in screen_has_active_dialog() (the desync family's belt-and-suspenders). The two fixes above each guard one exit path out of NEEDS_PERMISSION (the cursor-hidden flip, the cursor-visible dismissal), but both decide on screen content that a _reset_screen() can desync. As defense-in-depth at the layer that actually matters (screen_has_active_dialog() is consumed only by the ↑/↓ input filter), it now returns True whenever the signal file says a dialog is pending (_read_signal_state() in PROMPT_STATES), before reading the screen. The signal is the persistent record: the hook wrote it, the flip paths deliberately do not delete it, and it survives the screen reset. So even if some heuristic flips state out of NEEDS_PERMISSION on a desynced screen, ↑/↓ still route to the dialog instead of being stolen for history recall. One such case is the cursor-hidden flip's absence-based has_dialog_indicator check being fooled: neither reported bug hit that signal + cursor-hidden combination, but it is reachable. The signal clears at the next idle (Stop hook), so recall resumes. The cost is exactly the cheap false positive this method's design already accepts: while a (possibly stale) permission/input signal is live, ↑/↓ reach the CLI rather than driving Leap recall. In ALWAYS auto-approve mode no permission signal is written at all, so the backstop is a no-op there. Pinned by test_returns_true_when_signal_pending_despite_desynced_screen in TestScreenHasActiveDialog.

Post-answer resume grace (no false idle into the first-token gap). Answering a mid-turn dialog (Enter from a PROMPT state) moves NEEDS_PERMISSION/NEEDS_INPUT → RUNNING, but Claude then resumes the same turn — AskUserQuestion is excluded from hook auto-approve, so in ALWAYS mode it's the one dialog answered by hand, and its answer is mid-turn, not end-of-turn. The model's first post-answer token can lag several seconds, while the dialog-dismissal render emits a tiny output burst within ~40 ms of the Enter. That burst moves _last_output_time past _running_since, opening the max(_last_output_time, _running_since) rebase gate, so the 5 s cursor+silence running→idle fallback then misfires on the first-token silence and the auto-sender flushes a queued message INTO the live turn (confirmed in a real state_logs capture: Enter…→running → 5.1 s silence → running→idle (cursor visible + output silent 5.1s) → ON_SEND 4 ms later → idle→needs_permission as Claude kept working). All three would-be guards are blind to this exact shape: the running-indicator only matches Compactingconversation; transcript_says_running() returns '' because the only assistant entry is the dialog's tool_use at ts <= _running_since (the answer bumped _running_since past it, tripping the if ts <= since: return '' guard in _classify_transcript_tail); and is_dialog_certain is False once the footer is cleared. The fix: on_input sets _awaiting_resume_after_prompt = from_prompt, and the shared helper _post_answer_grace_holds(silence_ref) gates both heuristic idle paths while that flag is set — capped at the safety-silence timeout (provider.silence_timeout or SAFETY_SILENCE_TIMEOUT=60 s), not an unconditional return, since these blocks run before the 60 s safety net and an early return would starve it (a genuinely hung post-answer turn must still recover). 
The two paths: (1) running→idle cursor+silence stays RUNNING; (2) waiting→idle cursor+silence — reachable because the running→idle block can re-promote RUNNING→NEEDS_PERMISSION off the still-on-screen answered footer (its grace check sits after that promotion), landing here when the footer finally clears — routes to RUNNING instead of idling (it must route, not just suppress: the waiting→idle signal path needs _user_responded, which the answer cleared, so staying NEEDS_PERMISSION could strand a real Stop-hook idle). The flag is cleared on every IDLE (top of get_state) and on on_send. The grace is free for all four providers because each writes an idle signal on turn end (Claude/Codex/Cursor Stop, Gemini AfterAgent) and the signal=idle path idles without consulting the flag — so a real end still idles promptly; only the unreliable heuristics are suppressed. Codex never reaches either branch (cursor_hidden_while_idle=True). INTERRUPTED never arms the flag (from_prompt is PROMPT_STATES-only), so interrupt-reply handling is untouched. Pinned by test_enter_from_waiting_stays_running_through_first_token_gap, test_post_answer_grace_still_idles_via_safety_timeout, test_on_send_clears_post_answer_grace, test_post_answer_grace_clears_on_idle, test_post_answer_stale_footer_repromote_routes_to_running (the waiting→idle secondary path), and test_waiting_self_dismiss_still_idles_without_an_answer (no-regression: flag-unset self-dismiss still idles) in TestSafetyTimeouts.
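The "capped, not unconditional" property of the grace can be illustrated with a pure function. This is a sketch under assumptions: the real _post_answer_grace_holds(silence_ref) is a method that reads tracker state, whereas this version takes the flag, the clock, and the provider timeout as explicit parameters.

```python
from typing import Optional

SAFETY_SILENCE_TIMEOUT = 60.0  # seconds, as stated above


def post_answer_grace_holds(
    awaiting_resume_after_prompt: bool,
    silence_ref: float,
    now: float,
    provider_silence_timeout: Optional[float] = None,
) -> bool:
    """True while the heuristic idle paths should stay suppressed."""
    if not awaiting_resume_after_prompt:
        return False  # flag unset: heuristics behave as before
    cap = provider_silence_timeout or SAFETY_SILENCE_TIMEOUT
    # Hold only until the cap, so a genuinely hung turn still reaches the
    # safety net instead of being starved by an early return.
    return (now - silence_ref) < cap
```

With the flag set, 5.1 s of first-token silence is held in RUNNING, while silence past the cap falls through to the normal idle fallback.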

Composing guard (no false idle while the user types a prompt). The heuristic idle paths also misfire when the user composes the next prompt into a still-RUNNING session (type-ahead during the model's thinking / first-token latency): the cursor is visible, the typed keystrokes aren't echoed while the CLI is busy (so they don't bump _last_output_time), and a pause flips running→idle — a false "finished" notification, and the auto-sender could dispatch a queued message INTO the half-typed prompt. Confirmed in real state_logs: ON_INPUT state=CLIState.RUNNING data=b"f 'NotesCmdContext' " len=768 followed ~2 s later by running→idle (cursor visible + output silent). The fix: get_state(pty_alive, has_pending_input=False) takes a composing flag — the server passes bool(self._terminal_input_buf) or self._queue_capture_mode (unsubmitted text in the input box, or a ^^ queued message being composed) at both call sites. When set, the cursor+silence running→idle flip and the safety-silence timeout stay RUNNING (placed after the transcript / interrupt / post-answer-grace guards, mirroring the grace above, so interrupts and needs_permission promotions still fire). The authoritative paths are not gated — the signal=idle hook path and Codex transcript-completion still idle a genuinely-finished turn even while the user types — so no session can get stuck RUNNING; for hookless idles the gate releases the instant the box empties (_terminal_input_buf clears on Enter/Ctrl+C, shrinks on backspace/Ctrl+U). Because the gate only suppresses the heuristic fallbacks, a stale buffer can at worst delay an idle by one poll, never wedge it. is_ready (the auto-sender's readiness convenience) forwards the same flag so dispatch is gated identically — and the production auto-sender is already gated because it consumes the gated current_state via is_ready_for_state. 
Other consumers see it too: the monitor's idle notification reads the gated cli_state from the status response, and Slack's output_capture.on_state_change uses the gated current_state. Pinned by test_cursor_silence_idle_held_while_composing, test_safety_timeout_idle_held_while_composing, test_hook_idle_not_gated_by_composing, and test_is_ready_false_while_composing in TestSafetyTimeouts.
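
The gating described above can be sketched as a toy transition function. This is a simplified illustration, not the real CLIStateTracker: the State enum, next_state, and both threshold values are hypothetical stand-ins, and only the has_pending_input parameter name comes from the text.

```python
from enum import Enum


class State(Enum):
    RUNNING = "running"
    IDLE = "idle"


def next_state(current, signal, cursor_visible, silent_for,
               silence_window=2.0, safety_timeout=30.0,
               has_pending_input=False):
    """Toy model of the composing guard (hypothetical names and thresholds)."""
    # Authoritative path: a hook-written idle signal is never gated,
    # so a genuinely finished turn idles even while the user types.
    if signal == "idle":
        return State.IDLE
    if current is not State.RUNNING:
        return current
    # Heuristic fallbacks: cursor+silence and the safety-silence timeout.
    cursor_silence = cursor_visible and silent_for >= silence_window
    safety_silence = silent_for >= safety_timeout
    # Composing guard: suppress only the heuristics while text is pending.
    if (cursor_silence or safety_silence) and not has_pending_input:
        return State.IDLE
    return State.RUNNING
```

With has_pending_input=True the heuristic flip is held at RUNNING, while signal="idle" still idles immediately, which is why the gate cannot wedge a session.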

Heuristic-hold caps (a screen-misread can't wedge RUNNING forever) + the labeled idle-box border. The two RUNNING-holds that are screen-content heuristics — the post-answer resume grace and the picker/dialog interactive-UI guard — can mis-read the screen with no reliable user-recoverable release, so each is capped at the safety-silence timeout via the shared _heuristic_hold_cap() (provider.silence_timeout or SAFETY_SILENCE_TIMEOUT); past the cap they fall through to the idle fallback, so an is_idle_prompt_visible false-negative can no longer wedge RUNNING. The composing guard above is deliberately NOT capped — it's released the instant the input box empties (submit / Ctrl+C / clear), so it's a user-recoverable "still typing" hold, not a wedge, and capping it would re-fire the false "finished" notification it exists to prevent (this is why both composing guards — cursor+silence and safety-silence — stay bare if has_pending_input). The most common box false-negative is also fixed at the source: Claude can draw a short text badge into the idle box's top rule (e.g. ────psakdin-case-law-source────, an active-skill / model / plan-mode chip), which the strict pure-─ _is_prompt_box_hr rejected → box undetected → the interactive-UI guard matched the prompt's own ❯ and wedged (confirmed live: nushi.log held RUNNING ~58 min). _is_prompt_box_hr now accepts a rule carrying a short label, gated by an allowlist (_is_hr_label_safe: letters/digits/-_./() only), so a rule embedding graphics — table borders, block-element progress bars ████░░░░, sliders, geometric shapes, percent bars — is still rejected; the failure mode is asymmetric (a mis-classified border falls back to the cap, never a false-idle). Pinned by test_interactive_ui_guard_is_capped_and_recovers and test_claude_idle_prompt_visible_with_label_in_border.
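
A minimal sketch of the labeled-border check, assuming the allowlist is exactly the character set named above. The function bodies, the ASCII-only reading of "letters", and MAX_LABEL_LEN are assumptions for illustration; the real _is_prompt_box_hr and _is_hr_label_safe may differ.

```python
import re

# Assumption: letters/digits plus - _ . / ( ) and nothing else (no spaces).
_SAFE_LABEL = re.compile(r"[A-Za-z0-9\-_./()]+")
MAX_LABEL_LEN = 40  # hypothetical bound for "short"; the real limit is not stated


def is_hr_label_safe(label: str) -> bool:
    """True when the label is short and made only of allowlisted characters."""
    return 0 < len(label) <= MAX_LABEL_LEN and _SAFE_LABEL.fullmatch(label) is not None


def is_prompt_box_hr(line: str) -> bool:
    """Accept a pure ─ rule, or a ─ rule carrying one short allowlisted label."""
    stripped = line.strip()
    if not stripped or stripped[0] != "─" or stripped[-1] != "─":
        return False
    # Everything that is not a rule character is a candidate label segment.
    segments = [seg for seg in stripped.split("─") if seg]
    if not segments:
        return True  # pure rule
    return len(segments) == 1 and is_hr_label_safe(segments[0])
```

Under these assumptions a badge such as ────psakdin-case-law-source──── is accepted, while a rule embedding block elements such as ████░░░░ is rejected because those characters are outside the allowlist.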

Auto-sender dispatch safety (server.py _auto_sender_loop). Beyond gating on the composing-aware current_state, two direct guards protect the dispatch: (1) it never sends while the input buffer is non-empty (_terminal_input_buf / _queue_capture_mode), independent of state, so a half-typed prompt is never clobbered even past a hold cap; (2) a 2-consecutive-ready-poll debounce, so a single-poll false-idle can't flush a queued message mid-turn. A ^^ + Enter force-dispatch (_capture_force_dispatch) bypasses both.
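
The two guards and the force bypass can be sketched as a small gate object. DispatchGate and its method are hypothetical names, and resetting the streak whenever input is pending is an assumption; the real _auto_sender_loop is not shown in this document.

```python
class DispatchGate:
    """Toy dispatch gate: empty-buffer check plus consecutive-ready debounce."""

    def __init__(self, required_ready_polls: int = 2):
        self.required = required_ready_polls
        self._streak = 0  # consecutive polls that reported ready

    def should_dispatch(self, ready: bool, input_pending: bool,
                        force: bool = False) -> bool:
        # Force-dispatch (the ^^ + Enter case) bypasses both guards.
        if force:
            self._streak = 0
            return True
        # Guard 1: never send into a non-empty input buffer.
        # A not-ready poll also breaks the streak (guard 2).
        if input_pending or not ready:
            self._streak = 0
            return False
        # Guard 2: require N consecutive ready polls before sending.
        self._streak += 1
        if self._streak >= self.required:
            self._streak = 0
            return True
        return False
```

A single ready poll followed by a busy poll never dispatches, so a one-poll false idle cannot flush the queue mid-turn.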

Codex INTERRUPTED recovery (safety-waiting-timeout). The timeout keeps the current state when signal_state == current, and that keep is now scoped to PROMPT_STATES. INTERRUPTED writes its own interrupted signal, so the previously unscoped keep blocked the demotion forever for cursor_hidden_while_idle providers (Codex has no cursor+silence INTERRUPTED self-dismissal): the "interrupt sticks in INTERRUPTED" failure mode. A stale INTERRUPTED now recovers after the timeout. Pinned by test_codex_interrupted_recovers_via_safety_timeout.
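
The scoping can be illustrated with a one-line predicate. The string state values, the membership of PROMPT_STATES, and the function name are assumptions made for this sketch.

```python
# Assumption: the prompt states are the two waiting states named in this document.
PROMPT_STATES = frozenset({"needs_permission", "needs_input"})


def keep_on_waiting_timeout(current: str, signal_state: str) -> bool:
    """Keep the current state past the timeout only for matching prompt states."""
    # Scoped keep: INTERRUPTED matches its own signal but is not a prompt
    # state, so it is no longer held and can be demoted after the timeout.
    return signal_state == current and current in PROMPT_STATES
```

Without the `current in PROMPT_STATES` term, an interrupted state with an interrupted signal would satisfy the keep on every poll, which is the sticky behavior described above.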

Critically: the auto_approve state does NOT touch the signal file. It's a pure hook decision; Leap's state machine stays RUNNING throughout, as if no permission had ever been needed.

This hook fires for subagent (Task tool) tool calls too, which the older Notification path could silently miss — Claude's Stop hook does not fire for subagents, so an entire multi-agent turn stayed RUNNING with _last_running_snapshot == [], and the Late Notification guard had no fallback content to verify the dialog against. The PermissionRequest hook sidesteps every TUI race because no dialog is ever rendered.

Fallback: TUI menu auto-approve (_try_auto_approve in server.py). Still wired up for two scenarios:

(1) Older Claude versions that don't support PermissionRequest: they silently ignore the new hook entry, and approval falls back to detecting ❯ 1. Yes on the rendered menu and typing 1\r.
(2) Defense-in-depth race: if PermissionRequest somehow doesn't fire (e.g. a future Claude bug, or an unrecognized matcher edge case), Notification(permission_prompt) still fires, the state tracker transitions to NEEDS_PERMISSION, and _try_auto_approve picks up the dialog.
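
A rough sketch of the fallback decision, assuming the screen is available as a list of text lines. The function name, the string values for state and mode, and the exact regular expression are illustrative; the real _try_auto_approve is not reproduced here.

```python
import re

# Matches the selected first option of a rendered permission menu.
# Unanchored so box-drawing characters before the marker do not matter.
_YES_SELECTED = re.compile(r"❯\s*1\.\s*Yes\b")


def auto_approve_keys(screen_lines, state, auto_send_mode):
    """Return the bytes to type into the PTY, or None when nothing applies."""
    # Only act on a detected permission prompt in ALWAYS mode
    # (string values here are assumptions for the sketch).
    if state != "needs_permission" or auto_send_mode != "always":
        return None
    if any(_YES_SELECTED.search(line) for line in screen_lines):
        return b"1\r"  # choose option 1 and confirm
    return None
```

Returning None when the menu is not on screen keeps the fallback from typing into an unrelated prompt.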

The _try_auto_approve path itself was strengthened: the Late Notification guard at state_tracker.py:get_state formerly rejected RUNNING→prompt signals when no dialog patterns were on screen AND _last_running_snapshot was empty — that's exactly the multi-agent subagent shape. The guard now distinguishes the post-Enter stale signal (empty screen + empty snapshot, the freshly-answered-via-Enter signature) from a fresh subagent signal (screen has accumulated subagent output, snapshot empty because no idle transition during the turn). Only the empty-and-empty pair is treated as stale; anything else lets the signal through.
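
The stale-versus-fresh distinction reduces to a small predicate. This is a sketch under the assumption that the guard sees a dialog-detected flag, the current screen lines, and the last running snapshot; the function and parameter names are hypothetical.

```python
def is_stale_prompt_signal(dialog_on_screen, screen_lines, last_running_snapshot):
    """True only for the post-Enter signature: empty screen and empty snapshot."""
    # A visible dialog pattern always validates the signal.
    if dialog_on_screen:
        return False
    screen_empty = not any(line.strip() for line in screen_lines)
    snapshot_empty = not last_running_snapshot
    # Subagent turns have an empty snapshot but a non-empty screen,
    # so they are let through instead of being rejected.
    return screen_empty and snapshot_empty
```

The old behavior corresponds to treating an empty snapshot alone as stale, which is what rejected the multi-agent case.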

What auto-approve does NOT auto-handle. MCP Elicitation (Notification matcher elicitation_dialog) is not auto-approved — these are free-form input requests where Leap can't guess what to type. They surface to the user via NEEDS_INPUT. Permission-to-USE the elicitation tool is auto-approved (it's a tool call), but the resulting question dialog stays user-facing — that's the right asymmetry.

Other CLIs (Codex, Cursor, Gemini). The subagent bug above is Claude-specific because Claude is the only CLI with subagents. Codex/Cursor have no permission hook at all (the state tracker uses TUI detection); Gemini uses Notification(ToolPermission) but has no subagent concept. None of them get a PermissionRequest hook; test_other_providers_do_not_install_permission_request pins this.