path-2-async - SKILL.md Agent Skill

name: path-2-async
description: Path 2 check-in async-reflection implementation, ONE pattern inside cost-tier Path 2 (orb States 2/3, exactly one voice half on). Covers daily check-ins, single-utterance commands, transcript-only voice input, and any pre-recorded prompt + reply loop. Composition is MP3 prompt (or Sonic REST) → user reply → Soniox (STT) → callLLM → LLM → Sonic API (TTS) → user. Side effects via ActionDispatcher → DataService → Supabase. Auto-invoked when working on morning/evening check-ins, VoiceCheckInOverlay, EditHabitPage / useAdvancedPath (advanced-flow voice input), OnboardingLayout shared mic, useVoiceChat / useVoiceCommand / useVoiceInput, /api/stt, /api/cartesia-tts, /api/process-command (legacy), action-dispatcher, voice-command-system prompt, useVoicePlayer, MP3 voice manifest, or affirmation playback. NOT for onboarding (path-1-vapi) or pure text chat (path-3-direct-llm).
user-invocable: false

Path 2 — Async Reflection (Daily Check-ins)

⚠️ Stale (partial update pending, Phase 3). The legacy frontend NLU engine that this skill describes as "today" has been deleted: useVoiceCommand (localParse), action-dispatcher.ts, dispatcher-config.ts, voice-command-system.ts, prompt-config.ts, and scripts/validate-prompt.ts no longer exist. The home check-in now runs on the CoachChat / callLLM (/api/llm) path. The Migration posture table and the other "today" references below still name those files; treat them as historical until the full rewrite (handoff Task 2). Now retired: the backend /api/process-command and api/_lib/date-parser.ts have been deleted, and onboarding voice runs entirely on the Vapi-tool + Direct-LLM tool path.

Scope. Cost-tier Path 2 is any state with exactly one voice half on (orb State 2 = AI speaks one-way; State 3 = mic-in, text reply). This skill documents ONE pattern inside that tier — the check-in async-reflection loop. The other Path-2 surfaces (one-way TTS, mic-only voice input on CHAT, etc.) share the Direct-LLM implementation in path-3-direct-llm. See voice-architecture/paths.md for the full state→path table.
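The scope rule can be read as a pure function of which voice halves are live. A minimal TypeScript sketch, assuming (from the path names used in this skill) that both halves live means Path 1 (Vapi) and no voice means Path 3 (Direct-LLM text); `VoiceHalves` and `costTierPath` are hypothetical names, not existing code.

```typescript
// Hypothetical helper: classify a UI state by which voice halves are live.
// Assumption: both halves = Path 1 (Vapi realtime), exactly one = Path 2,
// neither = Path 3 (pure text via Direct-LLM).
type CostTierPath = 1 | 2 | 3;

interface VoiceHalves {
  aiSpeaks: boolean; // TTS output is on (orb State 2 has only this half)
  micIn: boolean; // mic/STT input is on (orb State 3 has only this half)
}

function costTierPath({ aiSpeaks, micIn }: VoiceHalves): CostTierPath {
  if (aiSpeaks && micIn) return 1; // bidirectional, interruption-aware: Vapi
  if (aiSpeaks || micIn) return 2; // exactly one half on: this skill's tier
  return 3; // no voice at all: text chat
}

// A check-in is State 2 (prompt) followed by State 3 (reply); neither step
// has both halves live, so the whole check-in stays in Path 2.
const checkInSteps: VoiceHalves[] = [
  { aiSpeaks: true, micIn: false }, // prompt playback
  { aiSpeaks: false, micIn: true }, // user's spoken reply
];
console.log(checkInSteps.map(costTierPath).join(",")); // 2,2
```

Classifying per step rather than per surface is what keeps a turn-based check-in out of Vapi: no single step ever has both halves on.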

Asynchronous voice composition. The user hears a prompt (a pre-recorded MP3 if available, otherwise live Sonic REST) and speaks a reply; the reply is transcribed by Soniox and run through callLLM(), and the LLM's response is spoken back via the Sonic API. A check-in is a turn-based sequence of State 2 (prompt) followed by State 3 (reply). Both halves are never live at once, so a check-in never needs Vapi. Multi-turn, interruption-aware dialogue is Path 1's job.

User → Frontend → MP3 prompt (plays to user)
                → user speaks reply
                → Soniox (STT)
                → callLLM()
                → LLM
                → Sonic API (TTS)
                → User

Side effects: callLLM result → ActionDispatcher → DataService → Supabase → UI updates

Reference files

composition.md — the MP3 → mic → Soniox → callLLM → Sonic loop in detail, plus how to skip pieces (transcript-only, broadcast-only, etc.)
surfaces.md — every Path 2 surface (target + today's overlap with the legacy single-utterance pipeline)
current-cartesia-rest.md — what's wired today (/api/stt, /api/cartesia-tts, /api/process-command GPT-4o-mini NLU, ActionDispatcher) — preserve while reading existing code, do not extend

Migration posture

| Aspect | Today | Target |
|---|---|---|
| Prompt audio | Cartesia Sonic REST live, every time | Pre-recorded MP3 from Supabase Storage when text is fixed; Sonic REST when text is dynamic |
| STT | `/api/stt` (Soniox async REST wrapper) | Same Soniox endpoint, stable |
| Intent / response | `/api/process-command` (GPT-4o-mini NLU, single-action intent) | `callLLM()`: same model family or Anthropic, with `screen_contexts` + `session_log` delta prepended |
| TTS | `/api/cartesia-tts` (Sonic REST wrapper) | Same Sonic REST endpoint, possibly renamed |
| Side effects | `ActionDispatcher` in browser → `DataService` → Supabase | Same shape; may relocate dispatcher server-side as part of `callLLM()` |
| Local fallback | Regex `localParse()` in `useVoiceCommand` | Decision pending: keep for offline, or drop in favor of "tap to retry" UX |
| Confidence calibration | 0.9 silent / 0.7 toast / 0.5 confirm / <0.5 reject | Carry forward as either a callLLM contract or a wrapping dispatcher check |
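In the Target column, `callLLM()` receives `screen_contexts` plus a `session_log` delta ahead of the user's words. A minimal sketch of assembling that input, assuming hypothetical row shapes (`ScreenContextRow`, `SessionLogEntry`) and a role/content message list; the real `callLLM()` signature and table schemas are not documented in this skill.

```typescript
// Hypothetical shapes: the real screen_contexts row and session_log schema
// are not documented here.
interface ScreenContextRow {
  screen: string; // e.g. "morning"
  prompt: string; // per-screen instructions replacing a hardcoded NLU prompt
}

interface SessionLogEntry {
  seq: number; // monotonically increasing per session
  summary: string;
}

interface ChatMessage {
  role: "system" | "user";
  content: string;
}

// Build one Path 2 turn: screen context first, then only the session_log
// entries newer than the last one the model has already seen, then the transcript.
function buildCallLlmMessages(
  ctx: ScreenContextRow,
  log: SessionLogEntry[],
  lastSeenSeq: number,
  transcript: string,
): ChatMessage[] {
  const delta = log.filter((e) => e.seq > lastSeenSeq);
  const messages: ChatMessage[] = [
    { role: "system", content: `[screen:${ctx.screen}]\n${ctx.prompt}` },
  ];
  if (delta.length > 0) {
    messages.push({
      role: "system",
      content: "Session log since last turn:\n" + delta.map((e) => `- ${e.summary}`).join("\n"),
    });
  }
  messages.push({ role: "user", content: transcript });
  return messages;
}

const msgs = buildCallLlmMessages(
  { screen: "morning", prompt: "Ask how the user slept; log a reflection." },
  [
    { seq: 1, summary: "Completed habit: stretch" },
    { seq: 2, summary: "Skipped habit: run" },
  ],
  1, // the model already saw entry 1
  "Slept fine, skipping the run today.",
);
console.log(msgs.length); // 3
```

Sending only the delta keeps each async turn's prompt small while still giving the model the "what did the user just do" context.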

What survives:

The 8 ActionDispatcher operations (CRUD + query + reflect).
Confidence tiers + UX rules per tier.
Voice-command system prompt's intent shape ({ action, entity, params, confidence }).
30 voice-command test scenarios (scripts/validate-prompt.ts).
Cartesia voice (same voice_id as Path 1).
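Because the intent shape survives the migration, whatever produces it (legacy NLU or `callLLM()`) can be checked with one parser. A minimal sketch, assuming the model returns the intent as a JSON string; the concrete action/entity vocabularies are not listed in this skill, so both stay plain strings, and `parseIntent` is a hypothetical name.

```typescript
// Intent shape from the voice-command system prompt.
interface Intent {
  action: string;
  entity: string;
  params: Record<string, unknown>;
  confidence: number; // expected in [0, 1]
}

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Parse raw model output. Returns null instead of throwing so a caller can
// treat malformed output the same way as a low-confidence reject.
function parseIntent(raw: string): Intent | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (!isRecord(data)) return null;
  const { action, entity, params, confidence } = data;
  if (typeof action !== "string" || typeof entity !== "string") return null;
  if (!isRecord(params)) return null;
  if (typeof confidence !== "number" || confidence < 0 || confidence > 1) return null;
  return { action, entity, params, confidence };
}

const good = parseIntent('{"action":"create","entity":"habit","params":{"name":"stretch"},"confidence":0.93}');
const bad = parseIntent("Sure! I created your habit.");
console.log(good?.entity, bad); // habit null
```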

What evolves:

Single-utterance NLU folds into general callLLM (with screen_contexts ctx instead of a hardcoded prompt).
Pre-recorded MP3 + Sonic REST coexist; the "is the text fixed?" decision picks between them per screen.
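The "is the text fixed?" decision can be made per request against the voice manifest. A minimal sketch, assuming a hypothetical `voice-manifest.json` shape that stores, per screen, the exact line that was pre-rendered and its storage URL; the real manifest format and the URL below are placeholders.

```typescript
// Hypothetical voice-manifest.json shape: one pre-rendered line per screen.
interface ManifestEntry {
  text: string; // exact text the MP3 was rendered from
  url: string; // Supabase Storage URL (placeholder in this sketch)
}
type VoiceManifest = Partial<Record<string, ManifestEntry>>;

type PromptAudio =
  | { kind: "mp3"; url: string } // fixed text: near-zero first-byte latency
  | { kind: "sonic-rest"; text: string }; // dynamic text: synthesize live

// Use the MP3 only if one exists for this screen AND it was rendered from
// exactly the text we need now; anything personalized falls back to Sonic REST.
function resolvePromptAudio(manifest: VoiceManifest, screen: string, text: string): PromptAudio {
  const entry = manifest[screen];
  if (entry !== undefined && entry.text === text) {
    return { kind: "mp3", url: entry.url };
  }
  return { kind: "sonic-rest", text };
}

const manifest: VoiceManifest = {
  morning: { text: "Good morning. How did you sleep?", url: "https://example.invalid/voice/morning.mp3" },
};
console.log(resolvePromptAudio(manifest, "morning", "Good morning. How did you sleep?").kind); // mp3
console.log(resolvePromptAudio(manifest, "morning", "Good morning, Sam. How did you sleep?").kind); // sonic-rest
```

Comparing the stored text, not just the screen id, means an edited prompt silently falls back to live synthesis instead of playing a stale recording.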

Surfaces (target state)

| Surface | Composition | Notes |
|---|---|---|
| Morning check-in | MP3 prompt → mic → Soniox → callLLM → Sonic | Replaces today's Line session for `metadata.screen='morning'` |
| Evening check-in | MP3 prompt → mic → Soniox → callLLM → Sonic | Replaces today's Line session for `metadata.screen='evening'` |
| Home voice check-in (single utterance) | (no MP3) → mic → Soniox → callLLM → Sonic | Today's voice-command pipeline lives here |
| Edit/add-habit voice input (advanced flow) | mic → Soniox → (no LLM by default) → no TTS | Transcript-only; drops into form field |
| SPLASH-01 / PREF-01 / MIC-01 / POST-AUTH-01 | MP3 (when generated) → no mic → no LLM → no TTS | One-way broadcast. Today: live Sonic REST one-shot until MP3s exist. |
| Affirmation playback | text → Sonic REST → playback | Dynamic text, one-way |

See surfaces.md for component → file mapping.

Composition rules

Path 2 is a construction kit, not a fixed pipeline. Compose only the pieces a surface needs:

One-way broadcast (splash, affirmation): MP3 or Sonic REST → playback. Skip mic, Soniox, callLLM.
Transcript-only (journal voice input, feedback): mic → Soniox → drop transcript into form. Skip callLLM, Sonic.
Single-utterance command (home check-in voice command): mic → Soniox → callLLM → ActionDispatcher → optional Sonic. Skip MP3.
Full async loop (morning/evening check-ins): MP3 prompt → mic → Soniox → callLLM → Sonic → playback.

The diagram in composition.md shows all four shapes side by side.
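The four shapes differ only in which pieces run, so one runner with per-shape flags can express all of them. A minimal sketch, assuming hypothetical adapter functions (`playPrompt`, `record`, `transcribe`, `callLLM`, `speak`) that in the app would wrap MP3/Sonic playback, the mic, `/api/stt`, callLLM and Sonic TTS; the ActionDispatcher step is left to the caller, which would dispatch on the returned reply.

```typescript
interface Shape {
  prompt: boolean; // play an opening line (MP3 or Sonic REST)
  mic: boolean; // record and transcribe via Soniox
  llm: boolean; // send the transcript to callLLM
  speakReply: boolean; // speak the LLM reply via Sonic
}

const SHAPES: Record<"broadcast" | "transcriptOnly" | "command" | "fullLoop", Shape> = {
  broadcast: { prompt: true, mic: false, llm: false, speakReply: false },
  transcriptOnly: { prompt: false, mic: true, llm: false, speakReply: false },
  command: { prompt: false, mic: true, llm: true, speakReply: true }, // Sonic is optional here
  fullLoop: { prompt: true, mic: true, llm: true, speakReply: true },
};

// Hypothetical adapters, injected so the runner stays testable.
interface Deps {
  playPrompt(text: string): Promise<void>;
  record(): Promise<Uint8Array>;
  transcribe(audio: Uint8Array): Promise<string>;
  callLLM(transcript: string): Promise<string>;
  speak(text: string): Promise<void>;
}

interface RunResult {
  transcript?: string;
  reply?: string;
}

async function runPathTwo(shape: Shape, deps: Deps, promptText = ""): Promise<RunResult> {
  const result: RunResult = {};
  if (shape.prompt) await deps.playPrompt(promptText);
  if (!shape.mic) return result; // one-way broadcast ends here
  const audio = await deps.record();
  result.transcript = await deps.transcribe(audio);
  if (!shape.llm) return result; // transcript-only: caller drops it into a form field
  result.reply = await deps.callLLM(result.transcript);
  if (shape.speakReply) await deps.speak(result.reply);
  return result;
}
```

Each surface then picks a shape (morning/evening → `fullLoop`, advanced-flow habit input → `transcriptOnly`, splash → `broadcast`) rather than wiring its own pipeline.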

Why not Path 1 for check-ins?

The architecture diagram originally placed morning/evening check-ins on Vapi-style realtime; the new design moves them to Path 2:

Check-ins are single-prompt, single-reply in shape — the user doesn't need bidirectional turn-taking.
Vapi session-minutes are billed for the whole open window. A 30-second async loop is cheaper as Soniox + Sonic than as a Vapi session.
Pre-recorded MP3 prompts give near-zero first-byte latency for the fixed opening line.
callLLM + screen_contexts + session_log gives the same "AI knows where the user is and what they did" as Path 1, without the realtime infra.

If a check-in genuinely needs multi-turn dialogue (the user wants to keep talking, the assistant should ask follow-ups), it's no longer Path 2 — it becomes Path 1 (Vapi). But the MVP target is single-prompt single-reply.

Side effects — ActionDispatcher

The ActionDispatcher is the keystone for Path 2's CRUD. It maps a parsed intent ({ action, entity, params, confidence }) to a DataService method call and writes to Supabase. Today it runs in the browser; under callLLM it may relocate server-side.
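The intent → DataService mapping amounts to a route table keyed by action and entity. A minimal sketch, runnable in the browser or on a server; the route keys (`create:habit`, `reflect:checkin`) and the `DataService` methods (`createHabit`, `logReflection`) are hypothetical, since the dispatcher's 8 operations and the real DataService API are not listed here. The confidence gate would run before `dispatch`.

```typescript
interface Intent {
  action: string;
  entity: string;
  params: Record<string, unknown>;
  confidence: number;
}

// Hypothetical DataService subset; each method writes to Supabase in the app.
interface DataService {
  createHabit(params: Record<string, unknown>): Promise<string>;
  logReflection(params: Record<string, unknown>): Promise<string>;
}

type Handler = (svc: DataService, params: Record<string, unknown>) => Promise<string>;

// Route table keyed by "action:entity" (hypothetical keys).
const ROUTES: Partial<Record<string, Handler>> = {
  "create:habit": (svc, p) => svc.createHabit(p),
  "reflect:checkin": (svc, p) => svc.logReflection(p),
};

async function dispatch(intent: Intent, svc: DataService): Promise<string> {
  const key = `${intent.action}:${intent.entity}`;
  const handler = ROUTES[key];
  if (handler === undefined) throw new Error(`No handler for ${key}`);
  return handler(svc, intent.params);
}
```

Injecting `DataService` rather than importing it is what would let the same table move server-side under `callLLM()` without changing its shape.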

Confidence calibration tiers (preserve through migration):

0.9 and above: execute silently
0.7 to below 0.9: execute with toast
0.5 to below 0.7: confirm before executing
Below 0.5: reject
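The tiers map directly onto threshold checks. A minimal sketch that reads each tier as a half-open interval, so every score, including ones like 0.895 that fall between the two-decimal bounds, lands in exactly one tier; `confidenceTier` is a hypothetical name, and whether it lives in the callLLM contract or a dispatcher wrapper is still open.

```typescript
type Tier = "silent" | "toast" | "confirm" | "reject";

// Half-open intervals: [0.9, ∞) silent, [0.7, 0.9) toast, [0.5, 0.7) confirm,
// everything else (including NaN) reject.
function confidenceTier(confidence: number): Tier {
  if (confidence >= 0.9) return "silent";
  if (confidence >= 0.7) return "toast";
  if (confidence >= 0.5) return "confirm";
  return "reject";
}

const scores = [0.95, 0.9, 0.895, 0.7, 0.69, 0.5, 0.49];
console.log(scores.map(confidenceTier).join(", "));
// silent, silent, toast, toast, confirm, confirm, reject
```

Testing from the top down with `>=` means NaN fails every comparison and falls through to reject, which is the safe default for a garbled model response.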

Naming trap — read this

api/stt.ts and api/cartesia-tts.ts sound like they belong to Path 1 (the Cartesia Line agent). They don't. They're plain REST endpoints owned by Path 2. The legacy Line agent does its own STT and TTS internally over its WebSocket — it never hits /api/stt or /api/cartesia-tts.

If you're tracing a request to those endpoints, you're in Path 2.

(After the Vapi migration, these endpoints may be renamed — but the trap is worth flagging in the meantime.)

When you're editing Path 2 code

MP3 generation → asset pipeline + Supabase Storage; voice-manifest.json lookup; useVoicePlayer.
STT changes → api/stt.ts + src/lib/services/stt-service.ts.
TTS changes → api/cartesia-tts.ts + src/lib/services/tts-service.ts.
Intent / response logic (target) → callLLM() + screen_contexts row for the surface.
Intent / response logic (today) → api/process-command.ts + src/lib/prompts/voice-command-system.ts.
CRUD writes → src/lib/services/action-dispatcher.ts + src/lib/config/dispatcher-config.ts.
Confidence tiers → src/lib/config/voice.ts + dispatcher consumers.
Surface wiring → the hook for that surface (see surfaces.md).