name: self-diagnostic
description: Deep pipeline diagnostic covering every model, system specs, audio config, and speaker state. Flags missing or broken components. Use for diagnose, debug, or health check.
metadata:
  author: fae
  version: "2.0"
  tags:
    - system
    - diagnostics
    - debug
Self-Diagnostic (Deep Pipeline Report)
You are running a comprehensive self-diagnostic. Work through EVERY section methodically. Report findings clearly in spoken language. Be specific about what IS loaded, what's MISSING, and what's BROKEN.
1. System Hardware
Use bash to gather machine specs:
bash "sysctl -n hw.model hw.memsize machdep.cpu.brand_string 2>/dev/null; echo '---'; system_profiler SPHardwareDataType 2>/dev/null | grep -E 'Model|Chip|Memory|Cores'"
Report:
- Mac model (e.g. Mac Studio M2 Max)
- Chip (M1/M2/M3/M4, Pro/Max/Ultra)
- Total RAM in GB (this determines model selection)
- GPU cores (affects inference speed)
bash "top -l 1 -n 0 | head -12; echo '---'; vm_stat | head -5"
Report: Current memory pressure, CPU load, swap usage. Flag if memory pressure is critical.
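As a rough aid for turning the vm_stat output into reportable numbers, the sketch below parses the page size and page counters and converts pages to GiB. It assumes the usual vm_stat layout (a header stating the page size, then `Name: count.` lines). The helper names `parse_vm_stat` and `pages_to_gib` are hypothetical, and no threshold for "critical" pressure is implied.

```python
import re

def parse_vm_stat(text: str) -> tuple[int, dict[str, int]]:
    """Return (page_size_bytes, counters) parsed from vm_stat output."""
    size_match = re.search(r"page size of (\d+) bytes", text)
    page_size = int(size_match.group(1)) if size_match else 0
    counters = {}
    for line in text.splitlines():
        # Counter lines look like: Pages free:      12345.
        m = re.match(r'^"?([^":]+)"?:\s+(\d+)\.\s*$', line)
        if m:
            counters[m.group(1)] = int(m.group(2))
    return page_size, counters

def pages_to_gib(pages: int, page_size: int) -> float:
    """Convert a page count to GiB."""
    return pages * page_size / 1024 ** 3

# Illustrative input only, not captured from a real machine.
SAMPLE = (
    "Mach Virtual Memory Statistics: (page size of 16384 bytes)\n"
    "Pages free:                                1000.\n"
    "Pages active:                              2000.\n"
)

if __name__ == "__main__":
    size, stats = parse_vm_stat(SAMPLE)
    print(f"free: {pages_to_gib(stats['Pages free'], size):.3f} GiB")
```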
bash "df -h / | tail -1"
Report: Disk space. Flag if below 10 GB free (model downloads need space).
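If parsing df output is awkward, the same check can be sketched with Python's standard library. This is a minimal illustration, not part of the pipeline: the function names are hypothetical, and the 10 GB limit from the text is treated as GiB.

```python
import shutil

GIB = 1024 ** 3  # bytes per GiB

def free_gib(path: str = "/") -> float:
    """Return free space on the volume holding `path`, in GiB."""
    return shutil.disk_usage(path).free / GIB

def is_low_disk(free: float, threshold_gib: float = 10.0) -> bool:
    """True when free space is below the threshold."""
    return free < threshold_gib

if __name__ == "__main__":
    free = free_gib("/")
    flag = "LOW" if is_low_disk(free) else "ok"
    print(f"Free disk space: {free:.1f} GiB ({flag})")
```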
2. Full Model Pipeline — CRITICAL SECTION
This is the most important part. List EVERY model in the pipeline with its load status.
2a. Speech-to-Text (STT)
- Model: Qwen3-ASR-1.7B (MLX 4-bit)
- Purpose: Converts speech audio to text
- Location: Downloaded to ~/Library/Caches/fae/ by MLX on first launch
- Check: Am I understanding speech? If transcription is working, STT is loaded.
- If STT failed: I can only accept text input, not voice.
2b. Large Language Model (LLM) — the brain
- Auto-selection based on RAM:
  - ≥64 GB → Qwen3.5-35B-A3B MoE (128K context)
  - ≥32 GB → Qwen3.5-35B-A3B MoE (32K context)
  - ≥24 GB → Qwen3.5-9B (32K context)
  - ≥16 GB → Qwen3.5-4B (32K context)
  - <16 GB → Qwen3.5-2B OptiQ (32K context)
- Purpose: Conversation, reasoning, tool use
- Check: If I can respond to questions, LLM is loaded. Report which model.
- This is the CRITICAL engine — if it fails, nothing works.
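To cross-check which model should have been selected, the RAM tiers above can be written as a small lookup. This is a sketch of the table as documented here, not the app's actual selection code. `select_llm` and `memsize_to_gb` are hypothetical names, and the input is assumed to be the byte count printed by `sysctl -n hw.memsize`.

```python
def memsize_to_gb(memsize_bytes: int) -> float:
    """Convert the hw.memsize byte count to GB (binary, 1024**3)."""
    return memsize_bytes / 1024 ** 3

def select_llm(ram_gb: float) -> tuple[str, str]:
    """Return (model, context) for the RAM tiers listed in this section."""
    tiers = [
        (64, "Qwen3.5-35B-A3B MoE", "128K"),
        (32, "Qwen3.5-35B-A3B MoE", "32K"),
        (24, "Qwen3.5-9B", "32K"),
        (16, "Qwen3.5-4B", "32K"),
    ]
    # Tiers are ordered from largest to smallest, so the first match wins.
    for min_gb, model, context in tiers:
        if ram_gb >= min_gb:
            return model, context
    return "Qwen3.5-2B OptiQ", "32K"

if __name__ == "__main__":
    ram = memsize_to_gb(68719476736)  # 64 GB expressed in bytes
    print(ram, select_llm(ram))
```

If the model actually reported differs from what this lookup gives for the machine's RAM, that is worth flagging in the summary.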
2c. Text-to-Speech (TTS)
- Model: Kokoro-82M (KokoroSwift/MLX, float32)
- Voice: Pre-computed voice embeddings (fae.bin), 24 kHz output
- Purpose: Converts text responses to spoken audio
- Check: Am I speaking out loud? If yes, TTS is loaded.
- If TTS failed: I can only show text responses.
2d. Speaker Encoder (Voice Identity) — OFTEN BROKEN
- Primary: WeSpeaker ResNet34-LM (Core ML, 256-dim embeddings)
- Legacy fallback: ECAPA-TDNN (Core ML, 1024-dim)
- Emergency fallback: Mel-spectral statistics (640-dim) — DEGRADED, cannot distinguish speakers
- Purpose: Identifies WHO is speaking (owner vs stranger vs Fae echo)
- Model file: wespeaker.mlmodelc in Resources/Models/SpeakerEncoder/
bash "ls -la ~/Library/Application\\ Support/fae/speakers.json 2>/dev/null && python3 -c \"import json; d=json.load(open('$(echo ~/Library/Application\\ Support/fae/speakers.json)')); [print(f'{p[\\\"label\\\"]}: role={p[\\\"role\\\"]}, embeddings={len(p[\\\"embeddings\\\"])}, centroid_dim={len(p[\\\"centroid\\\"])}') for p in d]\" 2>/dev/null || echo 'No speaker profiles found'"
Check and report:
- Which encoder loaded? (WeSpeaker 256-dim = GOOD, mel-spectral 640-dim = BAD)
- Owner profile centroid dimension — does it match the loaded encoder?
- If dimensions mismatch: "Voice identity is broken — need to re-enroll"
- If mel-spectral fallback: "WeSpeaker model missing or failed to load — voice identity degraded, Fae responds to everyone"
- If no owner profile: "No voice enrolled — use the enrollment banner"
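The checks above reduce to comparing two numbers: the embedding dimension of the loaded encoder and the dimension of the owner's stored centroid. The sketch below encodes that logic. The function name and the issue codes are hypothetical labels for illustration. Only the dimensions (256, 1024, 640) come from this section.

```python
from typing import Optional

# Embedding dimensions documented in this section.
ENCODER_BY_DIM = {
    256: "WeSpeaker ResNet34-LM",
    1024: "ECAPA-TDNN (legacy fallback)",
    640: "mel-spectral statistics (degraded)",
}

def diagnose_voice_identity(loaded_dim: int,
                            owner_centroid_dim: Optional[int]) -> list[str]:
    """Return issue codes; an empty list means voice identity looks healthy."""
    issues = []
    if loaded_dim == 640:
        issues.append("mel_spectral_fallback")
    if owner_centroid_dim is None:
        issues.append("no_owner_profile")
    elif owner_centroid_dim != loaded_dim:
        issues.append("dimension_mismatch")
    return issues

if __name__ == "__main__":
    print(ENCODER_BY_DIM[256], diagnose_voice_identity(256, 1024))
```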
2e. Vision Language Model (VLM)
- ≥32 GB RAM: Shares Qwen3.5-35B-A3B (same as LLM)
- 16 GB RAM: Separate Qwen3-VL-4B
- Purpose: Camera observations, screenshot analysis
- Loaded on-demand (first camera/screenshot use)
- Check: Have I successfully analyzed any camera or screenshot images?
2f. Keyword Classifier (Barge-in)
- Model: 1D-CNN (~200K params, MLX float32)
- Purpose: Detects interrupt keywords during playback (5 classes: interrupt/wake/speech/silence/noise)
- Check: Can you interrupt me while I'm speaking? If yes, keyword detector works.
2g. Embedding Engine (Memory Search)
- Model: Hash-384 (MLX)
- Purpose: Semantic memory search (ANN vector similarity)
- Check: Can I recall relevant memories? If yes, embedding engine works.
2h. SileroVAD (Voice Activity Detection)
- Model: Silero VAD (ONNX)
- Purpose: Detects speech onset and offset in audio stream
- Check: Do I respond when you speak? If yes, VAD is working.
2i. Apple SoundAnalysis Classifier
- Model: Apple built-in (303 categories)
- Purpose: Rejects music, TV, and non-speech audio before speaker verification
- Check: Do I ignore TV/music in the background? If yes, classifier is active.
3. Audio Pipeline Configuration
bash "system_profiler SPAudioDataType 2>/dev/null | head -20"
Report:
- Input device (which microphone)
- Sample rate (should be 16 kHz after downsampling)
Check and report:
- Apple Voice Processing: Enabled or disabled? (Release builds = ON for noise suppression + AGC + AEC)
- macOS Voice Isolation: Is system mic mode set to "Voice Isolation"? (User controls via Control Center)
- Software noise gate: Active at 0.008 RMS floor
- Echo suppressor: Time-based (800ms) + text-overlap + fae_self voiceprint
- Voiced detection threshold: Floor at 0.008 (was 0.02 — check version)
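To make the 0.008 RMS floor concrete, here is a minimal sketch of an RMS noise gate. It assumes audio frames are floats normalized to the range [-1, 1] and that a frame exactly at the floor passes. Neither detail is confirmed by this document, and the function names are hypothetical.

```python
import math

NOISE_GATE_RMS_FLOOR = 0.008  # value quoted in the list above

def rms(samples: list[float]) -> float:
    """Root-mean-square level of a frame; 0.0 for an empty frame."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def passes_noise_gate(samples: list[float],
                      floor: float = NOISE_GATE_RMS_FLOOR) -> bool:
    """True when the frame is at or above the floor (assumed inclusive)."""
    return rms(samples) >= floor

if __name__ == "__main__":
    quiet = [0.001] * 160          # well below the floor
    voiced = [0.02, -0.02] * 80    # RMS of about 0.02
    print(passes_noise_gate(quiet), passes_noise_gate(voiced))
```

With a floor this low, a frame at the older 0.02 level passes easily, which is why the threshold version matters when diagnosing missed or spurious speech detection.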
4. Speaker Profile Status
Use voice_identity check_status to review:
- Is a primary user enrolled?
- How many profiles exist? (owner, fae_self, guests)
- Embedding dimensions for each profile
- Consistency score
- Any dimension mismatches with current encoder?
5. Scheduler & Awareness
scheduler_list
Report:
- Are camera presence checks running?
- Are screen activity checks running?
- Is overnight research enabled?
- Is enhanced morning briefing enabled?
- Any tasks in error state?
6. Memory Health
Report from context:
- Approximate record count
- Last digest time
- Entity graph status (persons/orgs/locations)
- Is embedding backfill complete?
7. Security & Permissions
bash "tail -10 ~/Library/Application\\ Support/fae/security-events.jsonl 2>/dev/null || echo 'No security log'"
Report: Recent security events. Flag any errors.
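The schema of security-events.jsonl is not described here, so the sketch below makes only two assumptions: each line is one JSON object, and an event is worth flagging if the line fails to parse or mentions "error" anywhere. The field names in the sample are invented for illustration.

```python
import json

def flag_events(lines: list[str]) -> list[tuple[int, str]]:
    """Return (line_number, reason) for lines that look problematic."""
    flagged = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            flagged.append((number, "unparseable line"))
            continue
        # Schema-agnostic check: look for "error" in the serialized event.
        if "error" in json.dumps(event).lower():
            flagged.append((number, "mentions error"))
    return flagged

if __name__ == "__main__":
    sample = [
        '{"type": "tool_call", "result": "ok"}',         # hypothetical fields
        '{"type": "tool_call", "result": "Error: denied"}',
        "not json",
    ]
    print(flag_events(sample))
```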
Check macOS permissions:
- Microphone access
- Camera access
- Screen Recording
- Accessibility (for computer use tools)
8. Summary Report
Structure your spoken summary as:
"Here's my full diagnostic report:"
- Machine: [model, chip, RAM]
- Models loaded: List each with status (✓ loaded / ⚠ fallback / ✗ missing)
  - STT: [status]
  - LLM: [model name, context size]
  - TTS: [status, voice]
  - Speaker: [encoder type, dimension] — flag if mel-spectral
  - VLM: [status]
  - VAD: [status]
  - Keyword: [status]
  - Embedding: [status]
- Audio: [VP on/off, Voice Isolation on/off, noise gate active]
- Voice identity: [owner enrolled, profile health, dimension match]
- Issues found: List each with severity and fix
- Recommendations: What the user should do
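For a written version of the model list, the three status markers above can be applied consistently with a small formatter. This is only an illustrative sketch. `format_model_line` and the status keys are hypothetical names.

```python
SYMBOLS = {"loaded": "✓", "fallback": "⚠", "missing": "✗"}

def format_model_line(name: str, status: str, detail: str = "") -> str:
    """Render one summary line, e.g. 'STT: ✓ loaded'."""
    symbol = SYMBOLS[status]  # raises KeyError on an unknown status
    suffix = f" ({detail})" if detail else ""
    return f"{name}: {symbol} {status}{suffix}"

if __name__ == "__main__":
    print(format_model_line("STT", "loaded"))
    print(format_model_line("Speaker", "fallback", "mel-spectral 640-dim"))
    print(format_model_line("VLM", "missing"))
```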
Common Fixes to Suggest
| Issue | Fix |
|---|---|
| Mel-spectral fallback | "WeSpeaker model not loading — update to latest version or reinstall" |
| Dimension mismatch | "Re-enroll your voice — tap the enrollment banner" |
| No owner profile | "I need to learn your voice — tap 'Let me get to know you'" |
| Low disk space | "Free up disk space — I need room for model downloads" |
| Voice Isolation off | "Switch to Voice Isolation in Control Center for cleaner audio" |
| VP disabled (dev mode) | "You're in dev mode — Voice Processing is disabled for testing" |
| Stale scheduler tasks | "Some background tasks haven't run — try restarting me" |
End with: "That's the full picture. Let me know if you want me to dig deeper into anything."