yggui-changelog-demo - SKILL.md Agent Skill

name: yggui-changelog-demo
description: Capture deterministic proof bundles, screenshots, traces, and curated changelog notes for YggUI app changes.

YggUI Changelog Demo

Use this workflow when a YggUI app feature or fix should ship with proof, screenshots, and a curated changelog entry.

Observability note: the terminal attempt ledger, viewport classifier, and app-control terminal-surface helpers now live in crates/yggterm-shell/src/terminal_observe.rs. Do not keep spelunking only in shell.rs when validating or extending proof semantics.

Goals

capture deterministic evidence
produce a reusable proof bundle
draft changelog text from real artifacts
keep documentation and workflow in sync when automation grows

Inputs

a user-visible feature or fix
the relevant macro or app-control path
a target proof bundle id under artifacts/demos/unreleased/

Workflow

Identify the user-visible claim.
Choose or write the deterministic macro path.
Capture:
- screenshots
- optional recording
- app-state snapshot
- event trace / perf evidence
- active_surface_requests when a terminal load/restore claim depends on whether the request is still truthfully in flight
- when terminal open/restore is involved, the exact terminal_open_attempt object, active_terminal_surface, interactive, and terminal_settled_kind
- for terminal geometry bugs, include whether active_terminal_surface.geometry_problem was set
- for input/focus bugs, include dom.active_element, terminal_hosts[].effective_input_focus, terminal_hosts[].helper_textarea_focused, terminal_hosts[].host_has_active_element, and shell.terminal_input_override_active
- for startup input-contract bugs, also include terminal_hosts[].host_stdin_enabled
- for degraded app-control state under live terminal load, include dom.snapshot_mode, dom.degraded_reason, active terminal host geometry, retained replay prompt-follow fields, and viewport-force diagnostics
- for session/view ownership bugs, include session_view_contract_violations and reject the proof unless it is empty
- for session-selection/copy-budget bugs, include generation.copy_generation_start_count, generation.implicit_copy_generation_enabled, and the title/precis/summary in-flight path arrays before and after selection
- for inline rename bugs, include shell.tree_rename_value, dom.tree_rename_input_value, dom.tree_rename_input_focused, dom.tree_rename_input_selection_start, dom.tree_rename_input_selection_end, and any snapshot_mode == "action-fallback" evidence if KDE forced a degraded snapshot
- for titlebar search typing bugs, include shell.search_query, shell.search_focused, dom.active_element.value, dom.titlebar_search_active, and any snapshot_mode == "action-fallback" evidence if KDE forced a degraded snapshot
Create or update the proof bundle:
- manifest.json
- summary.md
- captures/
- trace/
Update CHANGELOG.md with a concise user-facing note.
If new automation or capture powers were required, update:
- docs/demos/ARCHITECTURE.md
- docs/demos/FORMAT.md
- docs/demos/STYLE.md
- this skill file
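
The bundle layout above can be scaffolded before any capture runs. A minimal sketch, assuming the bundle lives under artifacts/demos/unreleased/<bundle id>; the helper name create_bundle_skeleton and the manifest keys "id" and "claim" are hypothetical, and the real manifest schema belongs to docs/demos/FORMAT.md.

```python
import json
from pathlib import Path


def create_bundle_skeleton(root: Path, bundle_id: str, claim: str) -> Path:
    """Create manifest.json, summary.md, captures/ and trace/ if they are missing."""
    bundle = root / "artifacts" / "demos" / "unreleased" / bundle_id
    (bundle / "captures").mkdir(parents=True, exist_ok=True)
    (bundle / "trace").mkdir(parents=True, exist_ok=True)

    manifest = bundle / "manifest.json"
    if not manifest.exists():
        # Hypothetical keys for illustration only; follow docs/demos/FORMAT.md.
        manifest.write_text(json.dumps({"id": bundle_id, "claim": claim}, indent=2) + "\n")

    summary = bundle / "summary.md"
    if not summary.exists():
        summary.write_text(f"{claim}\n")
    return bundle
```

Existing files are left untouched, so rerunning the helper against a partially captured bundle should not overwrite curated text.
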

Standards

Prefer exact screenshots and traces over vague prose.
For terminal restore claims, bind the proof to one attempt id and fail the claim if that attempt latched any failure, even if a later state looks healthy.
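
The attempt-binding rule can be expressed as a small filter. This is only a sketch: the event shape (dicts with "attempt_id" and "failure" keys) is an assumption for illustration and may not match the real attempt ledger.

```python
def attempt_passes(events, attempt_id):
    """Judge one terminal-open attempt; a later healthy attempt cannot rescue it."""
    # Hypothetical event shape: {"attempt_id": ..., "failure": <reason or None>}.
    own = [e for e in events if e.get("attempt_id") == attempt_id]
    if not own:
        return False  # no evidence is bound to this attempt id
    return not any(e.get("failure") for e in own)
```

Events from other attempts are ignored on purpose, so a healthy retry never hides a latched failure on the attempt under proof.
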
For remote careful-restore claims, prove the one-minute boundary and the protected-runtime rule together: a Keep Alive or temporary update-restored runtime that is still running may get a non-destructive resume_recovery/ensure request after the timeout, but the trace must not show force_remote_restart_begin or a duplicate codex resume under the same label unless daemon truth first reports the runtime process gone or the proof explicitly drove a user/harness force restart.
For startup restore work, prove the app did not issue a second reopen of the already-active terminal. One startup mount sequence is correct. A duplicate reopen is a bug, even if a later attempt recovers.
For remote startup restore, the hot path no longer blocks on a separate saved-session existence probe. Expect remote_saved_session_preflight_elided_runtime_launch in the daemon trace, then prove missing-session truth from the runtime launch itself, the attempt ledger, and the overlay excerpt.
For fresh remote full-screen attaches, also capture whether ui/terminal_mount emitted resize_nudge_begin / resize_nudge_end. The nudge is part of the product contract now: it forces a repaint before Yggterm concludes that a live TUI attach is still blank.
Do not treat a visible terminal failure overlay as final proof if shell.terminal_attach_in_flight still contains the active session path. That is an in-flight recovery state, not a finished verdict.
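
A sketch of that in-flight check, assuming the app-state JSON nests fields the way the dotted names suggest (shell.terminal_attach_in_flight as a collection of session paths, active_session_path at the top level). The function name is hypothetical.

```python
def failure_overlay_is_final(state):
    """Return True only when the active session is no longer attaching."""
    shell = state.get("shell") or {}
    in_flight = shell.get("terminal_attach_in_flight") or []
    active = state.get("active_session_path")
    # An overlay shown while the active path is still in flight is a
    # recovery state, not a finished verdict.
    return active not in in_flight
```
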
In Terminal mode, saved preview context is no longer accepted as a terminal-ready settle. Expect terminal_settled_kind == "recovering" until the resume chip clears and the live terminal is visually revealed.
If the terminal host already has staged transcript bytes while the resume chip is still up, treat that as recovering, not overlay_context and not interactive.
Do not call a terminal overlay_context just because the host has meaningful text. overlay_context_visible only applies when the saved-context fallback is still the user-visible truth. Terminal-mode recovery should now stay recovering instead.
Codex model-permission setup selectors are interactive terminal surfaces even when they sit mid-screen with many blank rows below the hidden cursor, no attach-ready visual deadline, or only the lower half of the selector visible to the live health tail, including tails that start inside the auto-reviewer line. A proof should show terminal_settled_kind == "interactive", no remote-attention notification, and terminal_hosts[].host_stdin_enabled == true while the selector text is visible.
Codex "Conversation interrupted - tell the model what to do differently" input surfaces are interactive terminal surfaces when the active remote host is mounted, input-enabled, focused, and the cursor/interrupted line is visible. Do not require a normal › prompt glyph for this state; prove it with terminal_settled_kind == "interactive", no active surface problem, screenshot evidence, and a deterministic non-submitted probe-type echo.
The main viewport should stay available during terminal recovery. Resume progress belongs in notifications/toasts, not as a full-viewport curtain over the host. If a proof screenshot shows the terminal surface replaced by a recovery card, treat that as a UX regression.
For startup timing, prefer the app trace startup/window_spawned event over slower X11 root-tree detection when both are available. Use X11 tree timing only as fallback evidence.
For terminal session-switch bugs, capture both the source and destination terminal surfaces on a second X11 display and verify the destination screenshot text matches the destination active_session_path, not stale text from the previous session.
For KDE/restart lifecycle bugs, include the linux_daemon_sweep trace slice plus any spawned_daemon_child / spawned_daemon_exit or local_spawned_daemon_child / local_spawned_daemon_exit events. The proof should show same-home daemon cleanup only, no cross-home orphan reap, and no lingering temp-home GUI/daemon after the bundle closes. For hot-update daemon herds, the sweep should protect the current preserved PTY owner endpoint and the newest clean preserved-only startup bridge sidecar, then retire older preserved-only sidecars with owned_terminal_session_count == 0; a sidecar must not stay alive solely because another daemon in the same home has recoverable runtime activity, and a daemon owning only runtime keys absent from the current owner registry is a cleanup candidate. A pid named in hot-update-terminal-owners.json is a session-survival root, so exact-key coverage by a newer daemon is not by itself permission to kill that pid. Cleanup client checks must use the candidate daemon's exact client-instance endpoint scope; app-control may scan legacy scopes for handoff discovery, but that broad scan is not a cleanup guard. Startup handoff must reject old owners whose inferred terminal key set includes keys absent from the current owner registry, even when the old owner also has authorized kept sessions.
For small-window chrome or settings-rail UI regressions, resize and scroll through app-control rather than desktop-global automation: server app resize-window --width 520 --height 380, server app panel settings, and server app panel scroll --ratio 1. Pair screenshots with --only-check small_window_chrome, --only-check settings_zoom_input, and, when a terminal is active, --only-check settings_terminal_theme_dropdown so the proof includes visible titlebar bounds, native-free zoom inputs, dropdown scroll-into-view, keyboard filtering, and Enter commit state.
For daemon hot-update, multi-version control, hung-session, or latency incident claims, start the proof with yggterm-headless server monitor --scenario panic-report --expect-path <session-path> --jsonl-out <path>. Include server-list, the matching hot-restart result when lifecycle recovery is used, and a post-restart latency-check --all or wait-session proof so the bundle shows both the incident picture and the recovered server surface. If any target daemon still owns live terminal runtimes, the proof must prefer session survival. A successful handoff should expose daemon_update_state.state == "hot_update_handoff_active", update_priority == "handoff_preserve_sessions", owned_terminal_session_count, preserved_terminal_owner_count > 0, and preserved runtime keys in app-control/server status, while the monitor hot-restart result should show hot_update_handoff == true and fallback_shutdown_skipped == true. A daemon with only preserved-owner entries and owned_terminal_session_count == 0 should be allowed to restart without extending the sidecar chain, but its preserved-owner registry must be retargeted before exit, and startup reconcile should prefer that active/default sidecar over older orphaned PTY owners. If an older daemon owns only runtime keys that are absent from the current preserved-owner registry and persisted live-session state, treat that as a ghost-owned closed session, not a hot-update handoff reason. Preserved-owner entries are not durable session truth: current live-session metadata must authorize them, update-restart state may protect an unkept row only when server status.terminal_session_keys still contains that runtime key, and daemon load/keep/close proofs should show unrepresented entries pruned from hot-update-terminal-owners.json instead of allowing old non-keep-alive sessions to reappear. 
If handoff cannot be prepared safely, app-control should expose daemon_update_state.state == "hot_update_pending" and update_priority == "defer_update_preserve_sessions", while the monitor result should show fallback_shutdown_skipped == true rather than prepare_update_restart plus shutdown. Treat any forced shutdown of a live PTY owner during update handoff as a failed hot-update proof. For KDE duplicate-icon claims, add yggterm-headless server app desktop-identity so the bundle captures pinned launchers, desktop file fields, live client app ids, and update-handoff env.
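
The successful-handoff expectations can be checked mechanically once the app-control state and the monitor hot-restart result are captured. A sketch under assumptions: update_priority and preserved_terminal_owner_count are read from inside daemon_update_state, which may not match the real nesting, and handoff_problems is a hypothetical helper name.

```python
def handoff_problems(app_state, monitor_result):
    """List the successful-handoff expectations that the evidence fails."""
    # Assumed nesting: all three app-side fields live under daemon_update_state.
    update = app_state.get("daemon_update_state") or {}
    problems = []
    if update.get("state") != "hot_update_handoff_active":
        problems.append("daemon_update_state.state")
    if update.get("update_priority") != "handoff_preserve_sessions":
        problems.append("update_priority")
    if (update.get("preserved_terminal_owner_count") or 0) <= 0:
        problems.append("preserved_terminal_owner_count")
    if monitor_result.get("hot_update_handoff") is not True:
        problems.append("hot_update_handoff")
    if monitor_result.get("fallback_shutdown_skipped") is not True:
        problems.append("fallback_shutdown_skipped")
    return problems
```

An empty list means the captured fields match the successful-handoff picture; it does not replace the trace review for forced shutdowns of live PTY owners.
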
For stale multi-version remote runtime incidents, prove that the current client treats a stale daemon with live PTYs as a hot-update owner before stdio attach: include server/remote_runtime hot_update_stale_runtime_owner_begin plus either hot_update_stale_runtime_owner_handoff or an explicit hot_update_stale_runtime_owner_direct_bridge_fallback, server-list or latency-check --all showing both versions, and a final state where the terminal-open attempt either reaches ready on the current daemon or latches a failure with terminal_attach_in_flight and active_surface_requests cleared. A trace that only skips the stale owner and then spawns a duplicate failed resume path is a regression.
For update-restored remote Live Sessions that are no longer really live, capture a fresh remote scan or app state after scan. Unkept temporary update-restore rows must disappear from Live Sessions once the scan reports live_runtime=false, and the trace should include server/remote_machine prune_temporary_stale_live_sessions. Explicit keep-alive rows may remain as recovery targets, but the proof must show live_session_snapshot_debug[].keep_alive == true, clean active_terminal_surface.problem / active_terminal_surface.geometry_problem after settle, and live remote runtime truth. Until all remote hosts are updated, a daemon terminal key shaped as local://<session-id> is compatible live-runtime evidence for the matching remote Codex codex-runtime://<session-id>. Fresh daemon-owned Codex starts can have a synthetic runtime key before Codex creates the actual transcript id; proof must use the snapshot Codex Session + Storage metadata from the PTY process tree's open JSONL fd as the saved-session source of truth while preserving Runtime Session as the terminal I/O key. The settled app-control proof must also show session_view_contract_violations == [], runtime_truth.live_row_count > 0, and a visible Live Sessions row that preserves the codex-runtime://... runtime path rather than rewriting it to local://....
For Live Sessions keep-alive UI changes, prove the kept marker is a fixed left-side status rail and the close affordance remains a separate right-side hit target. Reject proof where live_keep_alive_rect.left varies with the session title width. For remote live sessions, a second row under the remote cwd folder is expected when that row reports live_member = true; live_keep_alive should reflect durability only and may be false on the cwd projection. Local historical transcript rows should still avoid stored-tree duplicates until explicitly opened into a runtime.
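
One way to test the fixed-rail rule is to compare live_keep_alive_rect.left across kept rows with different title widths. A sketch, assuming each row dict carries live_keep_alive_rect with a numeric left; the tolerance is a hypothetical allowance for sub-pixel rounding.

```python
def keep_alive_rail_is_fixed(rows, tolerance_px=0.5):
    """Return True when every kept marker shares the same left edge."""
    lefts = [
        row["live_keep_alive_rect"]["left"]
        for row in rows
        if row.get("live_keep_alive_rect")
    ]
    # Rows without a marker rect are skipped; no markers means nothing drifted.
    return not lefts or max(lefts) - min(lefts) <= tolerance_px
```
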
For title/summary budget regressions, prove selection did not start LLM work by showing generation.copy_generation_start_count unchanged across the open/select action. Cached copy hydration is allowed; title, precis, or summary generation is not allowed unless the user used an explicit regenerate action.
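
The budget check reduces to comparing one counter across two app-state snapshots taken before and after the open/select action. A minimal sketch, assuming generation.copy_generation_start_count is nested as the dotted name suggests; the function name is hypothetical.

```python
def selection_started_generation(before, after):
    """Return True when the copy-generation start counter moved across selection."""
    key = "copy_generation_start_count"
    return after["generation"][key] != before["generation"][key]
```

A missing counter raises KeyError rather than passing silently, which fits the rule that the proof must include the field.
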
For stored Codex transcript regressions, prove two separate moments: the cold-start selected row must stay idle with no active_session_path, no matching terminal_open_attempt, and no matching active_surface_requests; then an explicit server app open <path> without --view must promote the row to Terminal, move the resulting LiveLocal runtime under Live Sessions without leaving a duplicate stored-tree row, keep generation.copy_generation_start_count unchanged, and show sidebar row cursors as normal pointer values rather than idle grab/grabbing.
For expandable sidebar hit-zone regressions, prove both actions from the real DOM row: clicking dom.sidebar_visible_rows[*].label_rect selects the group and shows its scoped Startpage with no active terminal input target, while clicking icon_toggle_rect, group_expander_rect, or row_trailing_toggle_rect toggles only group_expanded. Apply the same proof to cwd folders, machine rows, and Live Sessions.
For inline rename regressions, prove the initial title is selected from 0..len(title), each typed prefix survives app-control observation, Ctrl+A selects inside the input rather than the sidebar, and Enter or click-away clears shell.tree_rename_path. If KDE forces dom_debug_snapshot_timeout, the proof may use the action fallback or shell.tree_rename_value, but a state that loses both DOM and shell rename values while shell.tree_rename_path is set is a failed proof.
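
Two of the rename rules lend themselves to a direct state check. A sketch, assuming the dom.* and shell.* fields are nested objects in the app-state JSON and that the DOM selection offsets count UTF-16 code units, as browser input selections normally do; both helper names are hypothetical.

```python
def initial_rename_selection_ok(state, title):
    """Check that the rename input selects the whole initial title."""
    dom = state.get("dom") or {}
    # Assumption: DOM offsets are UTF-16 code units, so measure the title that way.
    utf16_len = len(title.encode("utf-16-le")) // 2
    return (
        dom.get("tree_rename_input_selection_start") == 0
        and dom.get("tree_rename_input_selection_end") == utf16_len
    )


def rename_state_lost(state):
    """Failed proof: rename path is set but both rename values are gone."""
    shell = state.get("shell") or {}
    dom = state.get("dom") or {}
    return (
        bool(shell.get("tree_rename_path"))
        and shell.get("tree_rename_value") is None
        and dom.get("tree_rename_input_value") is None
    )
```

An empty string counts as a surviving value here, since the user may have deleted the whole title on purpose.
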
For titlebar rename regressions, capture dom.titlebar_title_rect and dom.titlebar_summary_title_rect, then prove clicking either the title chip text or the title/summary modal title enters the same focused inline rename contract as the sidebar context menu.
For titlebar search typing regressions, prove the shell query and focused DOM input value advance together. If the app falls back to snapshot_mode == "action-fallback", it must still include the active search input rect and active element value.
When a proof bundle uses server app screenshot on Linux X11, state whether the branch includes the real-window screenshot path. Older WebKit-only captures could miss embedded xterm content and produce false blank-terminal evidence.
For terminal geometry or overdraw bugs, include terminal_hosts[].host_rect, terminal_hosts[].screen_rect, and terminal_hosts[].viewport_rect alongside the screenshot and attempt ledger.
Include terminal_hosts[].host_content_width, host_content_height, host_padding_left_px, host_padding_right_px, host_padding_top_px, and host_padding_bottom_px when the fix uses xterm gutter compensation or any host-content-box adjustment.
For typing/cursor visibility bugs, also include terminal_hosts[].viewport_y and terminal_hosts[].base_y so the proof shows whether the live cursor fell below the visible viewport.
For retained replay or hot-update prompt-follow bugs, include terminal_hosts[].retained_replay_source, retained_replay_expected, retained_replay_prompt_follow_ready, retained_replay_unsafe_skip_prompt_ready, retained_replay_rejected_visible_text, last_retained_replay_follow_debug, scrollback_expected, scrollback_intent, scroll_controller_visible, scroll_controller_distance_rows, last_viewport_force_debug, viewport_y, and base_y. A degraded terminal-fallback state is acceptable only when those fields are still present and the state command stays within the live app-control latency budget. A prompt-ready unsafe skip only proves the skipped cursor-addressed snapshot did not break interactivity; it is not proof that scrollback was preserved unless probe-scroll also moves real viewport/text. Scroll-controller evidence is a YggUI control-surface signal only; it must not be treated as terminal-render proof.
For xterm input-hitbox or overtyping bugs, also include terminal_hosts[].helpers_rect and terminal_hosts[].helper_textarea_rect. A drifted helper textarea is now a classified geometry failure, not a cosmetic quirk.
For terminal input bugs, also prove focus ownership. The good state is an active xterm-helper-textarea inside the active host plus helper_textarea_focused: true and host_has_active_element: true.
Do not treat stale helper focus on an inactive retained host as foreground truth by itself. A different-session host is an identity mismatch only if it still reports host_stdin_enabled / raw_input_enabled or is otherwise the active session host with focused document truth. Proof bundles should include the stale host fields when this class is under investigation.
For remote terminal input bugs, also capture one deterministic server app terminal send ... --data "__SENTINEL__" proof and the matching terminal text sample after settle. When multiple GUI clients exist, first capture server app clients and then target the proof with --pid <pid> so automation cannot bleed into the wrong desktop window. If local input works but remote input does not, inspect server/remote_stdio_bridge events such as bridge_stdin_raw_mode_enable, bridge_stdin_raw_mode_skip, and bridge_stdin_raw_mode_restore in ~/.yggterm/event-trace.jsonl.
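
When inspecting the event trace for those bridge events, a schema-agnostic scan avoids guessing which key holds the event name. This sketch only assumes the trace is JSON Lines; it walks every string value in each record, and the helper names are hypothetical.

```python
import json

RAW_MODE_EVENTS = (
    "bridge_stdin_raw_mode_enable",
    "bridge_stdin_raw_mode_skip",
    "bridge_stdin_raw_mode_restore",
)


def _strings(value):
    """Yield every string found anywhere inside a decoded JSON value."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from _strings(item)
    elif isinstance(value, list):
        for item in value:
            yield from _strings(item)


def raw_mode_events(lines):
    """Return raw-mode bridge event names in the order they appear."""
    found = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip torn or partial lines
        found.extend(s for s in _strings(record) if s in RAW_MODE_EVENTS)
    return found
```

Pass it an open file handle for ~/.yggterm/event-trace.jsonl, or any iterable of lines cut from the trace slice in the bundle.
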
For remote live-session latency regressions, inspect whether terminal writes are using the hot local runtime bridge before remote-direct fallback. A live remote session with a mounted local runtime should not spawn a fresh remote yggterm server terminal write --stdin command per character; the proof should include scripts/smoke_ui_latency.py --host <host> --pid <pid> --clear-after and the daemon write-strategy regression test.
server app terminal probe-type and probe-scroll now exercise the active xterm host in the main viewport. probe-type first uses xterm core data injection, then falls back to the mounted input path only if needed, with optional --per-char, --ctrl-c, --tab, and --enter. It reports visible_echo_observed plus timings.visible_echo_ms from the xterm buffer/cursor sample, so canvas-rendered terminals cannot pass by returning empty host.innerText. On 2.1.113+ --per-char dispatches characters without per-character settle sleeps in the fallback path; slow visible echo means the app/input path is slow, not the probe loop.
For latency reports, run scripts/smoke_ui_latency.py --host <host> --pid <pid> --clear-after against the live client or a second-display client. Use --read-only-drawing for live Codex sessions where typing is not acceptable; that mode now records idle render/write churn plus current /proc-delta combined GUI/WebKit CPU and should fail when a readable terminal is still burning CPU or repainting continuously. The smoke fails before typing if the active terminal is still in terminal_attach_in_flight, not rendered, not interactive, missing xterm/viewport evidence, input-disabled, scrollback-locked away from the prompt, reporting the cursor outside the visible viewport, or showing leaked internal transport output such as a prompt-line "terminal session not found: local://..." in text_tail/buffer_text_sample. Retained replay must also reject an already-visible xterm buffer with internal attach/SSH transport residue and force sanitized daemon replay instead of treating existing scrollback as healthy; if a later clean write repairs a dirty visible buffer, record transport_leak_reset_count as the recovery evidence. --clear-after clears the prompt before and after short marker samples, so marker runs cannot wrap and create false missing-echo failures. The first post-open terminal token is reported as warmup; steady-state samples enforce the stricter visible-echo budget, drift budget, and terminal scroll budget. The proof should include the readiness gate result, state/rows/search/panel timings, terminal warmup or read-only activity rates, terminal steady p50/p95/max/drift, scroll probe result, scrollback intent after wheel release, combined CPU where available, and the active session path.
For typing-fan regressions, run the same latency smoke with a longer sample count, for example --samples 40, and include process_samples, terminal_render_events_per_sample, terminal_write_flushes_per_sample, and terminal_skipped_perf_events. A pass must show bounded visible-echo drift and bounded client/render churn, not only fast first echo.
For resize redraw regressions, include terminal_hosts[].last_fit_guard, last_skipped_fit, xterm_dimensions, fit_overflow_px, and cursor_bottom_overflow_px. Reject proof where a visible host with usable dimensions still reports last_skipped_fit.cause == "host_not_usable" after resize settle.
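
The rejection rule can be sketched as a predicate over one terminal_hosts[] entry. Treating "usable dimensions" as a host_rect with positive width and height is an assumption, as is the helper name.

```python
def stale_fit_skip(host):
    """Return True when a usable host still reports a host_not_usable skip."""
    skipped = host.get("last_skipped_fit") or {}
    rect = host.get("host_rect") or {}
    # Assumption: a host is usable when its rect has positive width and height.
    usable = rect.get("width", 0) > 0 and rect.get("height", 0) > 0
    return usable and skipped.get("cause") == "host_not_usable"
```

Evaluate it on the state captured after resize settle, not on a mid-resize sample.
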
The default latency-smoke budgets are for live SSH app-control proof: 1200 ms for state/rows/search/panel command round trips, 700 ms for the first terminal warmup visible echo, 500 ms for steady terminal visible echo, and 450 ms for steady terminal visible-echo p95. Tighten the flags for local CI runs.
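
For reviewing recorded samples by hand, the steady-state budgets can be applied as below. This is a sketch, not the smoke script's logic: it reads the 500 ms figure as a per-sample ceiling and uses a nearest-rank p95, both of which are assumptions.

```python
def percentile(samples, pct):
    """Nearest-rank percentile with integer math; pct is a whole number 1..100."""
    ordered = sorted(samples)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # ceil(pct * n / 100)
    return ordered[rank - 1]


def steady_echo_within_budget(samples_ms, max_ms=500, p95_ms=450):
    """Check steady visible-echo samples against the default SSH budgets."""
    if not samples_ms:
        return False  # no samples is not a pass
    return max(samples_ms) <= max_ms and percentile(samples_ms, 95) <= p95_ms
```

For local CI runs, pass tighter max_ms and p95_ms values instead of editing the defaults.
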
On direct installs, run terminal probe actions through the public launcher on 2.1.55+ so the headless path can use the real X11 keyboard probe. On 2.1.52-2.1.54, the launcher/headless path cannot dispatch focus, probe-type, probe-scroll, or probe-select, so use the exact active GUI executable from install-state.json for those actions and state that limitation in the proof.
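
The version boundaries above compare more safely as integer tuples than as strings, since "2.1.113" sorts before "2.1.55" lexically. A sketch with hypothetical return labels; versions below 2.1.52 are not covered by the guidance and are reported as such.

```python
def parse_version(text):
    """Turn "2.1.55" into (2, 1, 55) for numeric comparison."""
    return tuple(int(part) for part in text.split("."))


def probe_dispatch_path(version):
    """Pick which executable should run terminal probe actions."""
    parsed = parse_version(version)
    if parsed >= (2, 1, 55):
        return "public-launcher"
    if (2, 1, 52) <= parsed <= (2, 1, 54):
        # Use the exact GUI executable from install-state.json and note the
        # limitation in the proof.
        return "active-gui-executable"
    return "unspecified"
```
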
For generic terminal input regressions, raise the proof bar further: use a deterministic non-submitted marker such as server app terminal probe-type --mode xterm --data '__YGGTERM_STREAM_PROOF__' on a fresh second-X11 client and require the resulting screenshot plus state to show the marker echoed in the live runtime prompt, no transcript-resume footer, no USER/ASSISTANT preview artifacts, a visible cursor, and a still-interactive terminal.
Use /status only for a defect specifically about Codex slash-command handling or status-panel rendering. During partial /status typing, Codex may keep updating slash-command suggestions; require the typed prefix, focused helper textarea, enabled input, cursor evidence, and the screenshot instead of quiescent app-state samples.
For prompt/cursor regressions, also run the partial-input loop: type /sta without Enter, capture screenshot + state, then scroll and capture again. Reject the fix if the typed partial input is not visible on cursor_line_text, if the cursor row drifts out of the prompt band near the bottom of the viewport, or if focus/input drops during the scroll step.
server app terminal probe-select drives xterm's pointer selection path against mounted rows and reports the selected excerpt/length/contrast, selection_method, selection_layer_rect_count, and a gesture paint stack. A pass requires non-empty term.getSelection() plus xterm selection-layer rectangles. In canvas/no-row diagnostics it may report selection_method = "buffer_fallback_unverified" with visible text length/excerpt, but that is not selection proof; pair it with app-state low-contrast diagnostics and a screenshot before calling readability fixed.
For terminal-selection hit-test regressions, also capture terminal_hosts[].focus_capture_pointer_events and focus_capture_hit_target_enabled. The focus-capture overlay must stay observer-only with pointer-events: none, and a visible context-menu backdrop must not block primary xterm drag/double-click gestures.
For terminal selection-copy hangs, reject any fix whose xterm embed still calls navigator.clipboard.writeText. Proof should include the focused script/unit guard, a live app-control state response after the copy path is exercised, and terminal_clipboard/selection_copy_queued or selection_copy_owner_updated trace evidence. The copy operation must leave the WebKit render loop responsive even if the desktop clipboard stack is slow.
For browser-selection leak regressions on embedded xterm, also capture terminal_hosts[].xterm_root_user_select, rows_user_select, selection_range_count, selection_layer_count, and selection_layer_rect_count. Reject the fix if the mounted host can still accumulate a browser DOM range selection or if the xterm root/rows stop reporting user-select: none.
scripts/smoke_xterm_embed_faults.py is now the top-level fault-model suite for embedded xterm regressions. Use it when the bug spans multiple symptoms such as cursor drift, invisible text, geometry mismatch, focus/input breakage, scroll failure, and theme/readability regressions at once.
For isolated second-display labs, pass --home /tmp/... to the smoke script and prefix any follow-up server app ... commands with the same YGGTERM_HOME=/tmp/... so the proof does not accidentally target your real desktop client.
For fresh local-terminal regressions, keep one detached second-display proof that uses server app terminal new and reject the fix unless the screenshot shows the prompt in the main viewport within a few seconds, the runtime row appears under the first Live Sessions group with a close affordance, the active host reports non-empty text_sample/text_tail or canvas-mode buffer_text_sample/cursor_line_text, fresh terminals are not marked keep-alive until explicitly toggled, blank Enter does not leave the row spinning, and the same row can enter the rotating busy icon during a foreground command and recover back to plain-terminal once the prompt returns.
On 2.1.93+, server app terminal new must return a non-empty session_path for both local and remote terminal creation. Treat a missing path as an app-control regression because latency, /status, and spawn-timeline probes cannot target the created terminal deterministically.
Codex managed-CLI refresh/update checks must stay out of the foreground terminal new --kind codex path. If a proof shows npm install/managed CLI ensure blocking terminal creation, classify it as a launch-latency regression; the expected foreground event is only a fast managed-CLI launch probe followed by PTY creation.
For local startup-restore regressions, also run scripts/smoke_terminal_local_restart.py (or an equivalent second-display proof) and reject the fix unless the same local session survives app restart, reopens without a blank xterm host, and agrees three ways: active_session_path, the DOM-selected sidebar row, and browser.selected_row must all point at the same session. A stale same-session active_surface_requests entry or nonzero open request id must not keep startup restore permanently stuck in terminal_attach_in_flight; after the recovery window, the app should clear the stale bootstrap lease and retry.
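
The three-way agreement is easy to assert once the three paths are extracted from the post-restart state. A minimal sketch; how the DOM-selected row path is pulled out of the snapshot is left to the caller, and the function name is hypothetical.

```python
def startup_restore_agrees(active_session_path, dom_selected_row_path, browser_selected_row_path):
    """Return True only when all three sources name the same, non-empty session."""
    paths = {active_session_path, dom_selected_row_path, browser_selected_row_path}
    return len(paths) == 1 and None not in paths and "" not in paths
```
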
The renderer contract defaults to canvas and treats dom as an explicit opt-out path. For canvas mode, inspect terminal_hosts[].buffer_text_sample and cursor_line_text because .xterm-rows is absent by design; still reject any proof where the screenshot is visually blank, geometry is wrong, or cursor/input evidence is missing. For DOM mode, reject buffered terminal text with xterm_present=true, screen_present=true, rows_present=false, and zero canvas layers; that is dom_renderer_missing_text_layer_with_buffer_text, not a healthy but empty viewport.
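
The DOM-mode rejection can be written as a predicate over one terminal_hosts[] entry. A sketch: xterm_present, screen_present, rows_present, and buffer_text_sample come from the text above, while canvas_layer_count is a hypothetical field name standing in for however the state reports canvas layers.

```python
def dom_renderer_missing_text_layer(host):
    """Detect dom_renderer_missing_text_layer_with_buffer_text on one host."""
    has_buffer_text = bool((host.get("buffer_text_sample") or "").strip())
    return (
        has_buffer_text
        and host.get("xterm_present") is True
        and host.get("screen_present") is True
        and host.get("rows_present") is False
        # Hypothetical field name for the number of canvas layers.
        and host.get("canvas_layer_count", 0) == 0
    )
```

A canvas-mode host with at least one canvas layer does not match, which keeps the by-design absence of .xterm-rows from being flagged.
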
On 2.1.93+ active visible Codex and remote output should render through xterm.js, not the low-power text overlay. Treat terminal_hosts[].low_power_tui_overlay_active == true on those active hosts as suspect unless the proof is explicitly about offscreen/replay behavior. For 2.1.166+ plain local full-screen TUI bursts, the active low-power text surface is allowed only while the alternate-screen TUI is running; proof must show low_power_tui_frame_count advancing with readable low_power_tui_text_sample, Codex/remote sessions not using the overlay, and the overlay cleared after exit. A corrupted low-power sample with repeated incremental words such as BBoBooBoot... is a rendering regression, not a valid readiness signal.
The sidebar proof now has an explicit idle contract for local shells: after probe traffic settles back to a prompt, the selected row must recover from the rotating busy icon to the macOS-command plain-terminal icon, even when the active summary is condensed as pi@host$ >.. It also has a scroll-bounds contract: after launch, search, refresh, or expansion shrink, a sidebar whose rows fit must report sidebar_scroll_top == 0, and visible top rows such as Live Sessions must not be clipped above the sidebar frame.
Cursor visibility now has explicit native-cursor evidence too: terminal_hosts[].cursor_sample_rect, cursor_sample_text, cursor_sample_color, cursor_node_rects, and xterm_cursor_hidden. For light-theme terminal readability fixes, reject the proof unless the screenshot itself shows the cursor, cursor_sample_rect is visible while input is enabled, and xterm_cursor_hidden agrees with what the screenshot shows.
Cursor alignment now has explicit native-cursor evidence too: compare terminal_hosts[].cursor_sample_rect against cursor_expected_rect, and use cursor_node_rects as supporting evidence when xterm exposes additional raw cursor DOM spans. Reject the fix if the visible native cursor drifts away from the expected cursor cell.
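
Cursor drift can be measured directly from the two rects. A sketch, assuming both rects expose numeric left and top values; the 1 px tolerance is a hypothetical allowance, not a documented budget.

```python
def cursor_drift_px(host):
    """Largest axis offset between the sampled and the expected cursor rect."""
    sample = host["cursor_sample_rect"]
    expected = host["cursor_expected_rect"]
    return max(
        abs(sample["left"] - expected["left"]),
        abs(sample["top"] - expected["top"]),
    )


def cursor_is_aligned(host, tolerance_px=1.0):
    """Return True when the native cursor sits on the expected cell."""
    return cursor_drift_px(host) <= tolerance_px
```

The screenshot still decides the verdict; this only flags numeric drift worth a closer look.
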
Codex prompt-band proof is xterm-owned. Reject Yggterm software prompt/cursor overlays, and require the xterm_input_line_decoration_* state to agree with the screenshot: no decoration error, not disposed, and marker line matching the cursor line. In DOM renderer mode also require xterm_input_line_decoration_element_visible == true and xterm_input_line_decoration_render_count > 0; in canvas renderer mode require screenshot/pixel proof because canvas can paint the decoration without an xterm decoration DOM element.
Retained terminal hosts can coexist. Do not assume terminal_hosts[0] is the active terminal. Select the host that matches the active session path and focused input ownership, or use an explicit active-host marker if present.
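
Host selection along those lines might look like the sketch below. It assumes each terminal_hosts[] entry carries a session_path key, which is a hypothetical name; helper_textarea_focused and host_has_active_element are the focus fields named earlier in this file.

```python
def select_active_host(state):
    """Pick the host for the active session, preferring focused input ownership."""
    active = state.get("active_session_path")
    hosts = state.get("terminal_hosts") or []
    # Hypothetical key: the host entry's own session path.
    matches = [h for h in hosts if h.get("session_path") == active]
    focused = [
        h for h in matches
        if h.get("helper_textarea_focused") and h.get("host_has_active_element")
    ]
    if len(focused) == 1:
        return focused[0]
    if len(matches) == 1:
        return matches[0]
    return None  # ambiguous or missing; do not fall back to terminal_hosts[0]
```

Returning None on ambiguity forces the proof to resolve which host is active instead of guessing.
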
When xterm emits a very wide raw .xterm-cursor span, do not fail on width alone. Fail only if that wide span is still visually active via background, border, outline, or box-shadow. The native xterm cursor is now the visible cursor contract.
Do not trust the probe-type response by itself. Always pair it with a follow-up server app state and server app screenshot, then judge the bug from the resulting screenshot plus terminal_hosts[].text_sample, terminal_hosts[].text_tail, and in canvas mode terminal_hosts[].buffer_text_sample/cursor_line_text.
For UI-theme or terminal-theme claims, prefer server app theme light|dark --pid <pid> over click-based toggles during proof capture. The resulting app state now exposes settings.theme, settings.terminal_light_theme_name, settings.terminal_dark_theme_name, settings.effective_terminal_theme_name, and the mounted xterm renderer fields terminal_hosts[].xterm_font_family, xterm_font_weight, xterm_font_weight_bold, xterm_line_height, xterm_theme_background, and xterm_theme_foreground. Also inspect the actual rendered row sample fields terminal_hosts[].rows_sample_font_family, rows_sample_font_weight, rows_sample_font_feature_settings, rows_sample_letter_spacing, rows_sample_line_height, rows_sample_color, rows_sample_class_name, rows_sample_style_attr, dim_sample_*, cursor_sample_*, low_contrast_span_count, low_contrast_min_contrast, and low_contrast_span_samples (with the older rows_* fields as fallback), or run scripts/smoke_terminal_theme_ui.py, so the proof covers the actual rendered xterm rows and not just terminal option values. Reject any proof where the sampled row font family is still a single doubly-quoted literal stack, the cursor styling is transparent, visible low-contrast spans remain, or the mounted screen width drifts far from the host viewport.
For GUI fixes, still keep one proof captured on a second X11 display in the loop, but do not rely on flaky xdotool focus alone when the viewport probe can prove the same input path more deterministically.
For startup restore, the healthy recovery state is a visible toast plus host_stdin_enabled == false until the live terminal actually settles interactive.
For fresh Codex startup, the Update Model Permissions selector is an interactive surface when it shows the Default/Auto-review/Full Access options plus the "Press enter to confirm or esc to go back" hint, even if app-control reports many blank rows below the hidden cursor because the selector sits mid-screen. Proof should show terminal_settled_kind == "interactive", no remote-attention timeout notification, and host_stdin_enabled == true for that mounted host instead of treating the menu as stale retained transcript text.
For fresh local or remote Codex startup, use scripts/smoke_codex_launch_timeline.py or server app terminal new --machine-key <machine> --kind codex. Keep its resource baseline enabled. The smoke now:
- records pre-launch resource_timeline.jsonl, resource-relative phase_trace.jsonl, storage preflight output, and a per-phase resource summary split by live profile, isolated test profile, SSH, Codex, and WebKit buckets
- focuses the owned test window and reclaims terminal focus before capture
- drives app-control through the matched yggterm-headless sibling
- rejects prompt-rendered states that are not actually input-ready

Capture sub-1s, 1s, 3s, 5s, ready, and post-30s state/rows/screenshot triples. When reconciling a screenshot, use the post-screenshot state (screenshot_state_*.json), not the screenshot command response.

Reject the proof if the visible host:
- is blank
- shows local Codex scaffold text
- shows a prompt-only remote Codex surface without the welcome frame
- has a session-specific Remote Terminal Needs Attention notification
- reports readiness after settle with active_terminal_surface.problem set
- reports a rendered prompt with runtime_truth.active_host_input_enabled != true
- reports a rendered prompt that is focus-gated without effective_terminal_input_focus
- reports app-control focus command/state disagreement after considering terminal_hosts[].effective_input_focus and shell.terminal_input_override_active
- leaves the generated session id alive on the app host or remote worker host during cleanup
- reports readiness-gated active_terminal_surface.host_stdin_enabled=true before runtime_truth.active_host_ready=true

active_terminal_surface.raw_input_enabled may remain true during startup only to keep terminal-emulator protocol responses flowing back to the PTY, and xterm focus-in/focus-out bytes should be classified as protocol traffic rather than user input.
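The ordering rule for host_stdin_enabled versus active_host_ready can be checked across the captured timeline. This is a sketch, not the smoke's own logic: it assumes each sample has already been loaded from its post-screenshot state file into a (label, state) pair, and that pairing is an assumption made here.

```python
def premature_stdin_samples(timeline):
    """Return labels of samples where stdin was enabled before the host was ready."""
    bad = []
    for label, state in timeline:
        surface = state.get("active_terminal_surface") or {}
        truth = state.get("runtime_truth") or {}
        stdin_on = surface.get("host_stdin_enabled") is True
        host_ready = truth.get("active_host_ready") is True
        if stdin_on and not host_ready:
            bad.append(label)
    return bad


timeline = [
    ("sub-1s", {"active_terminal_surface": {"host_stdin_enabled": False},
                "runtime_truth": {"active_host_ready": False}}),
    ("1s", {"active_terminal_surface": {"host_stdin_enabled": True},
            "runtime_truth": {"active_host_ready": False}}),
    ("ready", {"active_terminal_surface": {"host_stdin_enabled": True},
               "runtime_truth": {"active_host_ready": True}}),
]
print(premature_stdin_samples(timeline))
```

Any non-empty result names the samples that should cause the proof to be rejected.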
On remote live-user hosts, prefer the default /home/pi/.cache smoke profile/output location over /tmp so low-space temp storage cannot corrupt staged proof runs.
For fresh remote Codex onboarding or permission-setup surfaces, also prove saved-session durability: until both Codex Session and non-empty Storage metadata exist, closing the live runtime must leave no saved remote-session://... row under the machine/cwd tree. If a sidebar row exists with a generated UUID but no storage path, treat it as a phantom-session regression, not a valid saved session.
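A small sketch of the phantom-session check, assuming sidebar rows are available as dictionaries. The session_id and storage_path keys are hypothetical names; substitute whatever the captured state actually exposes.

```python
import uuid


def looks_like_uuid(value):
    """True when the value parses as a UUID string."""
    try:
        uuid.UUID(str(value))
        return True
    except ValueError:
        return False


def phantom_session_rows(rows):
    """Rows that carry a generated UUID but no storage path."""
    return [
        row for row in rows
        if looks_like_uuid(row.get("session_id"))  # hypothetical key
        and not row.get("storage_path")            # hypothetical key
    ]


rows = [
    {"session_id": "123e4567-e89b-12d3-a456-426614174000", "storage_path": ""},
    {"session_id": "123e4567-e89b-12d3-a456-426614174001", "storage_path": "/example/store"},
]
print(len(phantom_session_rows(rows)))
```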
A local Codex welcome card with a focused blank cursor line can be accepted as a blank-prompt surface when host_stdin_enabled=true, the xterm bridge is connected, and the screenshot shows the cursor. Remote Codex title-card-only or prompt-only surfaces must stay rejected unless the remote welcome/status/prompt frame is complete.
For fan/CPU regressions, run scripts/remote_linux_idle_cpu_smoke.py against an isolated release artifact and keep the default post-state cooldown enabled. The smoke should measure app CPU after app-control has settled, and active/background high-volume TUI proof should show a real TUI frame/drop signal before sampling, post-interrupt drain before background-idle sampling, active_write_frame_budget, effective_terminal_write_frame_ms, frame-like render probes/canvas health sampling, chunked alt-screen read cadence, unfocused local stream cadence, per-thread CPU rows, render-counter deltas, hot_host_health_suppressed_count, and the active plain-local low-power TUI overlay state rather than heredoc echo churn or a GUI/WebKit spin loop. On KDE Wayland+Xwayland machines, the default proof should show GDK_BACKEND=wayland, WINIT_UNIX_BACKEND=wayland, linux_desktop_backend_policy.policy == "kde_wayland_native_default", transparent_window_profile_reason == "kde_wayland_transparent_profile", and YGGTERM_XTERM_CANVAS_POLICY=xterm_canvas_enabled_for_wayland; the vendored Dioxus DMA-BUF workaround may set WEBKIT_DISABLE_DMABUF_RENDERER=1 but must not force GDK_BACKEND=x11 after Yggterm selected Wayland. Canvas mounted in an X11 WebKit child is a renderer-policy mismatch, not an acceptable CPU sample.
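The KDE Wayland expectations above can be collected into one check. This sketch assumes the process environment is available as a dictionary and that linux_desktop_backend_policy and transparent_window_profile_reason sit at the top level of the parsed app state; that placement is an assumption.

```python
EXPECTED_ENV = {
    "GDK_BACKEND": "wayland",
    "WINIT_UNIX_BACKEND": "wayland",
    "YGGTERM_XTERM_CANVAS_POLICY": "xterm_canvas_enabled_for_wayland",
}


def kde_wayland_policy_problems(env, state):
    """List every mismatch against the documented KDE Wayland default proof."""
    problems = [
        f"{key}={env.get(key)!r}, expected {value!r}"
        for key, value in EXPECTED_ENV.items()
        if env.get(key) != value
    ]
    policy = (state.get("linux_desktop_backend_policy") or {}).get("policy")
    if policy != "kde_wayland_native_default":
        problems.append(f"backend policy is {policy!r}")
    reason = state.get("transparent_window_profile_reason")
    if reason != "kde_wayland_transparent_profile":
        problems.append(f"transparent profile reason is {reason!r}")
    return problems


env = {
    "GDK_BACKEND": "x11",  # forced back to X11 after Wayland was selected
    "WINIT_UNIX_BACKEND": "wayland",
    "YGGTERM_XTERM_CANVAS_POLICY": "xterm_canvas_enabled_for_wayland",
    "WEBKIT_DISABLE_DMABUF_RENDERER": "1",  # allowed workaround, not checked
}
state = {
    "linux_desktop_backend_policy": {"policy": "kde_wayland_native_default"},
    "transparent_window_profile_reason": "kde_wayland_transparent_profile",
}
print(kde_wayland_policy_problems(env, state))
```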
Stable-channel idle CPU proof must also include dom.css_running_animation_count. Sidebar/tree busy marks may be visible, but they should be static; if css_running_animation_count stays nonzero after settle without an explicit modal/probe animation, treat it as a GUI/WebKit fan-budget failure.
For active Codex Working/status CPU regressions, also capture terminal_hosts[].last_raw_payload_length and terminal_hosts[].last_coalesced_payload_length. Repeated synchronized ?2026h repaint bursts should either arrive bounded from the Rust bridge or report a much smaller coalesced payload before xterm writes; a 64KB+ repaint with no coalesced-size evidence is a failed resource proof.
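The explicit failure condition can be expressed directly. This sketch reads "64KB+" as 65536 bytes or more, which is an interpretation, and it does not try to define what counts as a "much smaller" coalesced payload.

```python
LARGE_REPAINT_BYTES = 64 * 1024  # "64KB+" read as >= 65536 bytes (assumption)


def unbounded_repaint(host):
    """True when a large raw repaint has no coalesced-size evidence at all."""
    raw = host.get("last_raw_payload_length")
    coalesced = host.get("last_coalesced_payload_length")
    return raw is not None and raw >= LARGE_REPAINT_BYTES and coalesced is None


print(unbounded_repaint({"last_raw_payload_length": 70000}))
print(unbounded_repaint({"last_raw_payload_length": 70000,
                         "last_coalesced_payload_length": 4096}))
```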
After app-control backgrounds a proof window, include shell.app_control_backgrounded in the resource evidence. A selected terminal in that state must not keep active_write_frame_budget=true or user host_stdin_enabled=true; if compositor focus truth lags, the app-control background flag is the deterministic contract for the low-power path.
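A sketch of the backgrounded low-power contract, assuming active_write_frame_budget and host_stdin_enabled are read from the selected terminal host entry. Where those two fields live in the state is an assumption.

```python
def backgrounded_low_power_problems(state, selected_host):
    """Problems that apply only once app-control has backgrounded the window."""
    shell = state.get("shell") or {}
    if shell.get("app_control_backgrounded") is not True:
        return []  # contract applies only in the backgrounded state
    problems = []
    if selected_host.get("active_write_frame_budget") is True:
        problems.append("active_write_frame_budget still true while backgrounded")
    if selected_host.get("host_stdin_enabled") is True:
        problems.append("host_stdin_enabled still true while backgrounded")
    return problems


state = {"shell": {"app_control_backgrounded": True}}
host = {"active_write_frame_budget": True, "host_stdin_enabled": False}
print(backgrounded_low_power_problems(state, host))
```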
For canvas-mode idle CPU claims, include terminal_hosts[].visible_canvas_layer_count, hidden_canvas_layer_count, software_canvas_layer_optimization_active, software_canvas_cursor_overlay_present, software_canvas_cursor_overlay_visible, and the Codex prompt-band xterm_input_line_decoration_* diagnostic fields. Idle canvas mode may hide inactive selection/link layers, but the xterm cursor layer stays xterm-owned, live Codex/remote sessions must not use Yggterm software prompt or cursor overlays, and release proof should show xterm_input_line_decoration_present == false unless the run explicitly opts into YGGTERM_ALLOW_XTERM_INPUT_LINE_DECORATION=1 for diagnostic comparison. The Codex prompt band, typed input, cursor, status panel, resize redraw, and Working/status animation must be painted by PTY bytes and xterm cells, not by a Yggterm overlay or a release-visible decoration layer. Reject proof where the terminal host is missing, required canvas policy is inactive, prompt/cursor software overlays are visible, typed-input glyph pixel density is below the smoke threshold, the xterm decoration unexpectedly appears, or unexpected full-viewport layer churn remains. App-control cheap snapshots must expose these counters without canvas pixel reads; a dom_debug_snapshot_timeout from diagnostics is not a valid CPU proof.
If a remote terminal drops back into retry/recovery after a bad intermediate surface, the resume toast should stay visible until the session reaches the real visual reveal again. Do not accept a run where the toast disappears while host_stdin_enabled == false, terminal_settled_kind != "interactive", or the terminal request is still truthfully recovering.
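The toast rule can be sketched as a single predicate. Assumptions: toast visibility is judged from the screenshot and passed in by the reviewer, terminal_settled_kind is read from the top level of the state, and a non-empty active_surface_requests list stands in for "still truthfully recovering".

```python
def toast_dropped_too_early(state, host, toast_visible):
    """True when the resume toast is gone while the session is still recovering."""
    still_recovering = (
        host.get("host_stdin_enabled") is False
        or state.get("terminal_settled_kind") != "interactive"
        or bool(state.get("active_surface_requests"))  # proxy for in-flight recovery
    )
    return still_recovering and not toast_visible


state = {"terminal_settled_kind": "interactive", "active_surface_requests": []}
print(toast_dropped_too_early(state, {"host_stdin_enabled": False}, toast_visible=False))
```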
If a remote resume times out, the attention toast may remain as user-facing error state, but the open-attempt ledger must move to failed and the matching terminal_attach_in_flight, bootstrap lease, and terminal surface request must clear. Reject proof where a no-progress loading toast stays in active_surface_requests indefinitely or drives high idle render counts.
For terminal-resume toast regressions, also verify the inverse case: once there are no visible notifications, the screenshot should not show an empty blurred/white toast shell still hanging under the titlebar. Capture both notifications_count and a screenshot of the same moment.
Treat /bin/bash: line 1: exec: __yggterm_initial_tty_size=...: not found as a remote startup transport regression, not as user shell output. The proof should include the open-attempt ledger staying non-interactive before the fix and a fresh remote startup restore that becomes host_stdin_enabled == true after the command wrapper is corrected.
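A sketch for flagging that wrapper failure in captured terminal text. The pattern accepts any line number and any value after the equals sign; the size value in the sample is made up for illustration.

```python
import re

# Matches the bash "exec: ... not found" line produced by the broken wrapper.
WRAPPER_FAILURE = re.compile(
    r"^/bin/bash: line \d+: exec: __yggterm_initial_tty_size=.*: not found",
    re.MULTILINE,
)


def has_wrapper_failure(text):
    """True when the text contains the startup transport regression line."""
    return WRAPPER_FAILURE.search(text) is not None


sample = (
    "Connecting...\n"
    "/bin/bash: line 1: exec: __yggterm_initial_tty_size=EXAMPLE: not found\n"
)
print(has_wrapper_failure(sample))
```

Run it against terminal_hosts[].text_sample or text_tail so the regression is classified as transport failure rather than user shell output.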
Treat any non-null active_terminal_surface.geometry_problem as a failed terminal proof, even if the surface otherwise looks rendered.
Treat non-null active_terminal_surface.performance_problem as performance evidence, not a readiness failure by itself. It must not be used to justify disabled input unless problem/geometry_problem is also non-null. For Codex activity spinner proof, require the sidebar row busy hint to come from mounted xterm activity while hot frame text remains excluded from title/detail sampling.
Exception for terminal geometry: the stable retained-xterm layout may present screen_rect/helpers_rect about 16px narrower than host_rect while viewport_rect still matches the host. That compensated gap is now accepted and should not, by itself, be treated as a failed proof.
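A sketch for recognizing the compensated gap, assuming the four rects hang off the host entry and each has a width key. The rect shape and the 2px slack around "about 16px" are assumptions.

```python
def is_compensated_gap(host, expected_gap_px=16.0, slack_px=2.0):
    """True when screen/helpers are ~16px narrower and the viewport matches the host."""
    host_width = host["host_rect"]["width"]
    viewport_ok = abs(host["viewport_rect"]["width"] - host_width) <= slack_px
    gaps = [host_width - host[key]["width"] for key in ("screen_rect", "helpers_rect")]
    return viewport_ok and all(abs(gap - expected_gap_px) <= slack_px for gap in gaps)


host = {
    "host_rect": {"width": 1200.0},
    "viewport_rect": {"width": 1200.0},
    "screen_rect": {"width": 1184.0},
    "helpers_rect": {"width": 1184.0},
}
print(is_compensated_gap(host))
```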
For startup latency claims, include whether the daemon emitted daemon/startup_prewarm begin|end|error for the active terminal. Startup restore should now be prewarmed after the control socket binds instead of waiting for the first UI mount to pay the whole cost.
For remote terminal startup restore, also capture whether the initial attach stream included __YGGTERM_ATTACH_READY__. That server marker now means the PTY attach itself is live even when Codex is sitting on low-signal idle/footer chrome.
Once __YGGTERM_ATTACH_READY__ has arrived, a quiet attached terminal is allowed to settle after the reveal grace deadline only when the retained host surface is prompt-ready. Retained non-prompt text from a previous Codex answer is stale evidence: it may remain visible, but it must not clear the resume toast, mark the attempt interactive, or enable input.
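The settle rule reduces to three conditions that must all hold. In this sketch only the marker string comes from the workflow; the function and parameter names are hypothetical, and the caller is assumed to have already decided whether the grace deadline passed and whether the retained surface is prompt-ready.

```python
ATTACH_READY_MARKER = b"__YGGTERM_ATTACH_READY__"


def attach_ready(initial_stream):
    """True when the initial attach stream carried the server marker."""
    return ATTACH_READY_MARKER in initial_stream


def quiet_terminal_may_settle(initial_stream, past_reveal_grace, surface_prompt_ready):
    """A quiet attached terminal settles only when all three conditions hold."""
    return attach_ready(initial_stream) and past_reveal_grace and surface_prompt_ready


stream = b"\x1b[2J__YGGTERM_ATTACH_READY__\r\n"
# Stale transcript text from a previous answer: attached, but not prompt-ready.
print(quiet_terminal_may_settle(stream, past_reveal_grace=True, surface_prompt_ready=False))
```

When this returns False, the resume toast stays, the attempt stays non-interactive, and input stays disabled.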
For loading-truth bugs, capture one state while active_surface_requests still contains the terminal request and one after settle so the bundle shows that the UI did not silently drop the request before attach finished.
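The two-state capture can be verified as a pair. This sketch assumes each entry in active_surface_requests carries an id key, which is a hypothetical name, and that the reviewer knows the id of the terminal request under test.

```python
def loading_truth_problems(in_flight_state, settled_state, request_id):
    """Check that the request is present while loading and gone after settle."""
    def request_ids(state):
        return {req.get("id") for req in state.get("active_surface_requests", [])}

    problems = []
    if request_id not in request_ids(in_flight_state):
        problems.append("request missing from the in-flight capture")
    if request_id in request_ids(settled_state):
        problems.append("request still present after settle")
    return problems


in_flight = {"active_surface_requests": [{"id": "req-1"}]}
settled = {"active_surface_requests": []}
print(loading_truth_problems(in_flight, settled, "req-1"))
```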
Keep changelog language user-visible and concise.
Treat demo assets as release material, not disposable debugging leftovers.
When a result is not live-verified, say so explicitly.