name: multi-modal-diagnose
description: Verify multi-modal model state and AutoProcessor availability, and diagnose audio-tool feedback loops in Gemma-3n. Use when the model fails to 'hear' or incorrectly processes auditory sensory context.
Multi-Modal Diagnose
Use this skill when Gemma-3n output indicates it cannot access or reason about audio context.
Baseline Checks
- Confirm `hf_processor` is initialized in `llama_model_manager.py`.
- Check logs for "Tool returned raw audio, re-invoking LLM...".
- Verify `inspect_audio_snippet` is being called via `/llm-status` or debug phase visibility.
Triage Flow
Processor Check:
- Run `/llm-diagnose` and ensure no `ImportError` or `AttributeError` related to `AutoProcessor`.
- Confirm `trust_remote_code=True` was used during load.
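The processor check can be approximated outside the app with a small probe. This is a minimal sketch, not the `/llm-diagnose` implementation: `probe_processor` and `load_gemma_processor` are hypothetical names, and the model id is whatever your deployment loads. The probe wraps any loader callable and reports the two exception types this check looks for.

```python
from typing import Any, Callable, Tuple


def probe_processor(load: Callable[[], Any]) -> Tuple[bool, str]:
    """Call a loader and report whether a processor object came back.

    Only ImportError and AttributeError are caught, since those are the
    failures this check targets; any other exception propagates.
    """
    try:
        processor = load()
    except (ImportError, AttributeError) as exc:
        return False, f"{type(exc).__name__}: {exc}"
    if processor is None:
        return False, "loader returned None"
    return True, type(processor).__name__


def load_gemma_processor(model_id: str) -> Any:
    """Hypothetical loader. transformers is imported lazily so that a
    missing install surfaces as ImportError inside probe_processor."""
    from transformers import AutoProcessor

    return AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
```

A call such as `probe_processor(lambda: load_gemma_processor(model_id))` returns `(True, <class name>)` on success, or `(False, <error text>)` when the import or attribute lookup fails.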
Tool Execution:
- Submit interaction: "Inspect the last 5 seconds of audio."
- Observe debug phase: Should move through Planning -> Execution.
- Confirm tool output in logs: Should show "Audio Snippet (5.0s) Features: {...}" or raw audio broadcast.
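Confirming tool output by eye gets tedious on long logs, so a small scanner may help. The sketch below assumes log lines contain the strings quoted in this skill verbatim; the surrounding log format, and the function name `classify_tool_output`, are assumptions.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Patterns come from the log strings quoted in this skill; the exact
# formatting in a real log file is an assumption.
SNIPPET_RE = re.compile(r"Audio Snippet \((\d+(?:\.\d+)?)s\) Features:")
RAW_AUDIO_MARKER = "Tool returned raw audio, re-invoking LLM"


def classify_tool_output(lines: Iterable[str]) -> List[Tuple[str, Optional[float]]]:
    """Return (kind, seconds) for each tool-output line found.

    kind is "features" for a feature summary, with its snippet duration,
    or "raw" for the raw-audio re-invocation marker, with seconds=None.
    """
    found: List[Tuple[str, Optional[float]]] = []
    for line in lines:
        match = SNIPPET_RE.search(line)
        if match:
            found.append(("features", float(match.group(1))))
        elif RAW_AUDIO_MARKER in line:
            found.append(("raw", None))
    return found
```

An empty result after submitting the interaction suggests the tool never executed or its output was not logged.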
Feedback Loop:
- If `return_raw=True`, ensure the second LLM invocation triggers.
- Verify input sampling rate matches Gemma expectations (usually 16kHz).
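If the sampling rate turns out not to match, a quick resample is one way to test whether the rate is the cause. The following is a rough diagnostic sketch in pure Python, assuming mono float samples; `resample_linear` is a hypothetical helper, and the 16 kHz default reflects the "usually 16kHz" expectation above rather than a confirmed model requirement.

```python
from typing import List, Sequence

TARGET_RATE_HZ = 16_000  # assumed default; confirm against your processor


def resample_linear(samples: Sequence[float], rate_hz: int,
                    target_hz: int = TARGET_RATE_HZ) -> List[float]:
    """Resample mono audio by linear interpolation.

    No anti-aliasing filter is applied, so this suits diagnostics
    rather than production downsampling.
    """
    if rate_hz <= 0 or target_hz <= 0:
        raise ValueError("sample rates must be positive")
    if rate_hz == target_hz or len(samples) == 0:
        return list(samples)
    out_len = max(1, round(len(samples) * target_hz / rate_hz))
    step = rate_hz / target_hz  # input samples advanced per output sample
    last = len(samples) - 1
    out = []
    for i in range(out_len):
        pos = min(i * step, last)  # clamp to the final input sample
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out
```

For real pipelines a filtered resampler from an audio library is the safer choice; this version only helps rule the sampling rate in or out as the source of the problem.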
File Touchpoints
- `llama_model_manager.py`
- `functional_agent.py`
- `utils.py` (`RollingAudioBuffer`)
Fix Patterns
- Ensure `torch.is_floating_point(v)` is used before `bfloat16` conversion.
- Verify `apply_chat_template` includes the follow-up prompt for retrieved audio.
- Check `RollingAudioBuffer` queue for stale or empty chunks.
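The buffer check in the last item can be sketched as follows. The chunk layout here, a `(capture_time, samples)` tuple held in a `deque`, is an assumption made for illustration; the real `RollingAudioBuffer` in `utils.py` may store its chunks differently, and `find_bad_chunks` is a hypothetical helper.

```python
import time
from collections import deque
from typing import Deque, List, Optional, Sequence, Tuple

# Assumed chunk layout: (capture time in seconds, samples).
Chunk = Tuple[float, Sequence[float]]


def find_bad_chunks(queue: Deque[Chunk], max_age_s: float,
                    now: Optional[float] = None) -> List[Tuple[int, str]]:
    """Return (index, reason) for every empty or stale chunk in the queue.

    A chunk is "empty" when it holds no samples and "stale" when it is
    older than max_age_s relative to `now` (default: time.monotonic()).
    """
    now = time.monotonic() if now is None else now
    problems = []
    for index, (captured_at, samples) in enumerate(queue):
        if len(samples) == 0:
            problems.append((index, "empty"))
        elif now - captured_at > max_age_s:
            problems.append((index, "stale"))
    return problems
```

Passing `now` explicitly keeps the check deterministic in tests. In live use, the capture times must come from the same clock as the default (`time.monotonic()`).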