path-2-async

Path 2 — the check-in async-reflection implementation, ONE pattern inside cost-tier Path 2 (orb States 2/3, exactly one voice half on). Covers daily check-ins, single-utterance commands, transcript-only voice input, and any pre-recorded prompt + reply loop. Composition is MP3 prompt (or Sonic REST) → user reply → Soniox (STT) → callLLM → LLM → Sonic API (TTS) → user. Side effects via ActionDispatcher → DataService → Supabase. Auto-invoked when working on morning/evening check-ins, VoiceCheckInOverlay, EditHabitPage / useAdvancedPath (advanced-flow voice input), OnboardingLayout shared mic, useVoiceChat / useVoiceCommand / useVoiceInput, /api/stt, /api/cartesia-tts, /api/process-command (legacy), action-dispatcher, voice-command-system prompt, useVoicePlayer, MP3 voice manifest, or affirmation playback. NOT for onboarding (path-1-vapi) or pure text chat (path-3-direct-llm).

By guidedgrowthplatform. Updated 6/2/2026

name: path-2-async
user-invocable: false

Path 2 — Async Reflection (Daily Check-ins)

⚠️ Stale: partial update pending (Phase 3). The legacy frontend NLU engine that this skill describes as "today" has been deleted: useVoiceCommand (localParse), action-dispatcher.ts, dispatcher-config.ts, voice-command-system.ts, prompt-config.ts, and scripts/validate-prompt.ts no longer exist. The home check-in now runs on the CoachChat / callLLM (/api/llm) path. The Migration posture section and the "today" tables below still name those files; treat them as historical until the full rewrite (handoff Task 2). Also retired: the backend /api/process-command and api/_lib/date-parser.ts have been deleted, and onboarding voice runs entirely on the Vapi-tool + Direct-LLM tool path.

Scope. Cost-tier Path 2 is any state with exactly one voice half on (orb State 2 = AI speaks one-way; State 3 = mic-in, text reply). This skill documents ONE pattern inside that tier — the check-in async-reflection loop. The other Path-2 surfaces (one-way TTS, mic-only voice input on CHAT, etc.) share the Direct-LLM implementation in path-3-direct-llm. See voice-architecture/paths.md for the full state→path table.

Asynchronous voice composition. The user hears a prompt (pre-recorded MP3 if available, otherwise live Sonic REST), speaks a reply, the reply is transcribed by Soniox, run through callLLM(), and the LLM's response is spoken back via Sonic API. A check-in is a turn-based string of State 2 (prompt) then State 3 (reply) — never both halves live at once, so never Vapi. Multi-turn interruption-aware dialogue is Path 1's job.

User → Frontend → MP3 prompt (plays to user)
                → user speaks reply
                → Soniox (STT)
                → callLLM()
                → LLM
                → Sonic API (TTS)
                → User

Side effects: callLLM result → ActionDispatcher → DataService → Supabase → UI updates
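
The loop above can be sketched as one async function with each stage injected by the caller. This is a minimal illustration, not the repository's implementation: the `CheckInDeps` interface, the `LLMResult` shape, and every function name here are hypothetical, and running the dispatcher before the spoken reply is an assumption (the source does not state the order).

```typescript
// Hypothetical shapes for illustration only; the real callLLM() contract is not shown in this skill.
interface LoopIntent {
  action: string;
  entity: string;
  params: Record<string, unknown>;
  confidence: number;
}

interface LLMResult {
  reply: string;        // text to speak back to the user
  intent?: LoopIntent;  // present only when the reply implies a side effect
}

// Each stage is injected so the loop can be exercised without audio hardware or network calls.
interface CheckInDeps {
  playPrompt: () => Promise<void>;                    // MP3 prompt, or live Sonic REST
  recordReply: () => Promise<Uint8Array>;             // mic capture
  transcribe: (audio: Uint8Array) => Promise<string>; // Soniox (STT)
  callLLM: (transcript: string) => Promise<LLMResult>;
  dispatch: (intent: LoopIntent) => Promise<void>;    // ActionDispatcher → DataService → Supabase
  speak: (text: string) => Promise<void>;             // Sonic API (TTS)
}

async function runCheckIn(deps: CheckInDeps): Promise<LLMResult> {
  await deps.playPrompt();                 // State 2: AI speaks, mic is off
  const audio = await deps.recordReply();  // State 3: mic is on, AI is silent
  const transcript = await deps.transcribe(audio);
  const result = await deps.callLLM(transcript);
  if (result.intent) {
    await deps.dispatch(result.intent);    // assumed ordering: side effects before the spoken reply
  }
  await deps.speak(result.reply);
  return result;
}
```

Because every stage is awaited in sequence, the prompt and the mic are never live at the same time, which is the property that keeps a check-in out of Path 1.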

Reference files

  • composition.md — the MP3 → mic → Soniox → callLLM → Sonic loop in detail, plus how to skip pieces (transcript-only, broadcast-only, etc.)
  • surfaces.md — every Path 2 surface (target + today's overlap with the legacy single-utterance pipeline)
  • current-cartesia-rest.md — what's wired today (/api/stt, /api/cartesia-tts, /api/process-command GPT-4o-mini NLU, ActionDispatcher) — preserve while reading existing code, do not extend

Migration posture

| Aspect | Today | Target |
| --- | --- | --- |
| Prompt audio | Cartesia Sonic REST live, every time | Pre-recorded MP3 from Supabase Storage when text is fixed; Sonic REST when text is dynamic |
| STT | /api/stt (Soniox async REST wrapper) | Same (Soniox endpoint, stable) |
| Intent / response | /api/process-command (GPT-4o-mini NLU, single-action intent) | callLLM(): same model family or Anthropic, with screen_contexts + session_log delta prepended |
| TTS | /api/cartesia-tts (Sonic REST wrapper) | Same (Sonic REST endpoint, possibly renamed) |
| Side effects | ActionDispatcher in browser → DataService → Supabase | Same shape; may relocate dispatcher server-side as part of callLLM() |
| Local fallback | Regex localParse() in useVoiceCommand | Decision pending: keep for offline, or drop in favor of "tap to retry" UX |
| Confidence calibration | 0.9 silent / 0.7 toast / 0.5 confirm / <0.5 reject | Carry forward as either a callLLM contract or a wrapping dispatcher check |

What survives:

  • The 8 ActionDispatcher operations (CRUD + query + reflect).
  • Confidence tiers + UX rules per tier.
  • Voice-command system prompt's intent shape ({ action, entity, params, confidence }).
  • 30 voice-command test scenarios (scripts/validate-prompt.ts).
  • Cartesia voice (same voice_id as Path 1).

What evolves:

  • Single-utterance NLU folds into general callLLM (with screen_contexts ctx instead of a hardcoded prompt).
  • Pre-recorded MP3 + Sonic REST coexist; the "is the text fixed?" decision picks between them per screen.
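
The "is the text fixed?" decision can be sketched as a small pure function. This is an illustration under assumptions: the `VoiceManifest` shape (screen id mapped to an MP3 URL) and the function name are hypothetical, since the structure of voice-manifest.json is not described here. Falling back to Sonic REST when no MP3 exists mirrors the "pre-recorded MP3 if available, otherwise live Sonic REST" rule.

```typescript
// Hypothetical manifest shape: screen id → URL of a pre-recorded MP3 in Supabase Storage.
type VoiceManifest = Record<string, string>;

type PromptSource =
  | { kind: "mp3"; url: string }          // fixed text with a generated recording
  | { kind: "sonic-rest"; text: string }; // dynamic text, or no recording yet

function choosePromptSource(
  screen: string,
  text: string,
  textIsFixed: boolean,
  manifest: VoiceManifest,
): PromptSource {
  const url: string | undefined = manifest[screen];
  // Use the MP3 only when the text never changes AND a recording has been generated.
  if (textIsFixed && url !== undefined) {
    return { kind: "mp3", url };
  }
  // Otherwise synthesize live via Sonic REST.
  return { kind: "sonic-rest", text };
}
```

Keeping the decision separate from playback means a screen can move from live Sonic REST to MP3 by adding a manifest entry, with no change to the calling hook.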

Surfaces (target state)

| Surface | Composition | Notes |
| --- | --- | --- |
| Morning check-in | MP3 prompt → mic → Soniox → callLLM → Sonic | Replaces today's Line session for metadata.screen='morning' |
| Evening check-in | MP3 prompt → mic → Soniox → callLLM → Sonic | Replaces today's Line session for metadata.screen='evening' |
| Home voice check-in (single utterance) | (no MP3) → mic → Soniox → callLLM → Sonic | Today's voice-command pipeline lives here |
| Edit/add-habit voice input (advanced flow) | mic → Soniox → (no LLM by default) → no TTS | Transcript-only; drops into form field |
| SPLASH-01 / PREF-01 / MIC-01 / POST-AUTH-01 | MP3 (when generated) → no mic → no LLM → no TTS | One-way broadcast. Today: live Sonic REST one-shot until MP3s exist. |
| Affirmation playback | text → Sonic REST → playback | Dynamic text, one-way |

See surfaces.md for component → file mapping.

Composition rules

Path 2 is a construction kit, not a fixed pipeline. Compose only the pieces a surface needs:

  • One-way broadcast (splash, affirmation): MP3 or Sonic REST → playback. Skip mic, Soniox, callLLM.
  • Transcript-only (journal voice input, feedback): mic → Soniox → drop transcript into form. Skip callLLM, Sonic.
  • Single-utterance command (home check-in voice command): mic → Soniox → callLLM → ActionDispatcher → optional Sonic. Skip MP3.
  • Full async loop (morning/evening check-ins): MP3 prompt → mic → Soniox → callLLM → Sonic → playback.

The diagram in composition.md shows all four shapes side by side.
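
As a rough illustration of the construction-kit idea, the four shapes listed above can be written as a lookup from shape to the stages it composes. The type and constant names are hypothetical, and the stage lists simply mirror the four bullets; they are not taken from composition.md.

```typescript
// "prompt" stands for the MP3-or-Sonic-REST audio that plays to the user.
type Stage = "prompt" | "mic" | "soniox" | "callLLM" | "actionDispatcher" | "sonic";

type Shape =
  | "oneWayBroadcast"
  | "transcriptOnly"
  | "singleUtteranceCommand"
  | "fullAsyncLoop";

const STAGES_BY_SHAPE: Record<Shape, readonly Stage[]> = {
  oneWayBroadcast: ["prompt"],                  // skips mic, Soniox, callLLM
  transcriptOnly: ["mic", "soniox"],            // skips callLLM, Sonic
  singleUtteranceCommand: ["mic", "soniox", "callLLM", "actionDispatcher", "sonic"], // Sonic is optional here
  fullAsyncLoop: ["prompt", "mic", "soniox", "callLLM", "sonic"],
};

// True when the given shape composes the given stage.
function usesStage(shape: Shape, stage: Stage): boolean {
  return STAGES_BY_SHAPE[shape].indexOf(stage) !== -1;
}
```

A surface would then ask `usesStage("transcriptOnly", "callLLM")` before wiring a stage, rather than branching on screen names.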

Why not Path 1 for check-ins?

The diagram says morning/evening check-ins were intended for Vapi-style realtime, but the new design moves them to Path 2:

  • Check-ins are single-prompt, single-reply in shape — the user doesn't need bidirectional turn-taking.
  • Vapi session-minutes are billed for the whole open window. A 30-second async loop is cheaper as Soniox + Sonic than as a Vapi session.
  • Pre-recorded MP3 prompts give near-zero first-byte latency for the fixed opening line.
  • callLLM + screen_contexts + session_log gives the same "AI knows where the user is and what they did" as Path 1, without the realtime infra.

If a check-in genuinely needs multi-turn dialogue (the user wants to keep talking, the assistant should ask follow-ups), it's no longer Path 2 — it becomes Path 1 (Vapi). But the MVP target is single-prompt single-reply.

Side effects — ActionDispatcher

The ActionDispatcher is the keystone for Path 2's CRUD. It maps a parsed intent ({ action, entity, params, confidence }) to a DataService method call and writes to Supabase. Today it runs in the browser; under callLLM it may relocate server-side.

Operations: create | complete | delete | query | log | update | suggest | reflect. Entities: habit | metric | journal | summary.
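
The intent shape and the operation and entity lists above can be captured as types with a runtime guard. This is a sketch, not the dispatcher's actual code: `isParsedIntent` is a hypothetical helper, and the assumption that confidence lies in the range 0 to 1 comes only from the tier thresholds in this document.

```typescript
const ACTIONS = ["create", "complete", "delete", "query", "log", "update", "suggest", "reflect"] as const;
const ENTITIES = ["habit", "metric", "journal", "summary"] as const;

type Action = typeof ACTIONS[number];
type Entity = typeof ENTITIES[number];

interface ParsedIntent {
  action: Action;
  entity: Entity;
  params: Record<string, unknown>;
  confidence: number; // assumed to be in [0, 1]
}

// Runtime check for untrusted LLM output before it reaches the dispatcher.
function isParsedIntent(value: unknown): value is ParsedIntent {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const action = v["action"];
  const entity = v["entity"];
  const params = v["params"];
  const confidence = v["confidence"];
  return (
    typeof action === "string" && (ACTIONS as readonly string[]).indexOf(action) !== -1 &&
    typeof entity === "string" && (ENTITIES as readonly string[]).indexOf(entity) !== -1 &&
    typeof params === "object" && params !== null && !Array.isArray(params) &&
    typeof confidence === "number" && confidence >= 0 && confidence <= 1
  );
}
```

Validating at this boundary keeps an unknown action or entity from reaching a DataService call.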

Confidence calibration tiers (preserve through migration):

  • 0.9+ — execute silently
  • 0.7–0.89 — execute with toast
  • 0.5–0.69 — confirm before executing
  • <0.5 — reject
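
The tiers above reduce to a threshold check. The sketch below uses hypothetical names (`ConfidenceTier`, `tierFor`); the real thresholds are said to live in src/lib/config/voice.ts, which is not reproduced here. Values between the listed bands (for example 0.895) fall into the lower tier, which is an assumption about how the gaps are meant to be read.

```typescript
type ConfidenceTier = "silent" | "toast" | "confirm" | "reject";

function tierFor(confidence: number): ConfidenceTier {
  // The negated comparison also rejects NaN, which fails every >= check.
  if (!(confidence >= 0.5)) return "reject";
  if (confidence >= 0.9) return "silent";  // execute silently
  if (confidence >= 0.7) return "toast";   // execute with toast
  return "confirm";                        // confirm before executing
}
```

Whether this check sits inside the callLLM contract or in a wrapping dispatcher, keeping it as a single pure function makes the tier boundaries easy to test.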

Naming trap — read this

api/stt.ts and api/cartesia-tts.ts sound like they belong to Path 1 (the Cartesia Line agent). They don't. They're plain REST endpoints owned by Path 2. The legacy Line agent does its own STT and TTS internally over its WebSocket — it never hits /api/stt or /api/cartesia-tts.

If you're tracing a request to those endpoints, you're in Path 2.

(After the Vapi migration, these endpoints may be renamed — but the trap is worth flagging in the meantime.)

When you're editing Path 2 code

  • MP3 generation → asset pipeline + Supabase Storage; voice-manifest.json lookup; useVoicePlayer.
  • STT changes → api/stt.ts + src/lib/services/stt-service.ts.
  • TTS changes → api/cartesia-tts.ts + src/lib/services/tts-service.ts.
  • Intent / response logic (target) → callLLM() + screen_contexts row for the surface.
  • Intent / response logic (today) → api/process-command.ts + src/lib/prompts/voice-command-system.ts.
  • CRUD writes → src/lib/services/action-dispatcher.ts + src/lib/config/dispatcher-config.ts.
  • Confidence tiers → src/lib/config/voice.ts + dispatcher consumers.
  • Surface wiring → the hook for that surface (see surfaces.md).
Install via CLI
npx skills add https://github.com/guidedgrowthplatform/guided-growth-mvp --skill path-2-async