multi-modal-diagnose - SKILL.md Agent Skill

name: multi-modal-diagnose
description: Verify multi-modal model state, AutoProcessor availability, and diagnose audio-tool feedback loops in Gemma-3n. Use when the model fails to 'hear' or incorrectly processes auditory sensory context.

Use this skill when Gemma-3n output indicates it cannot access or reason about audio context.

Confirm hf_processor is initialized in llama_model_manager.py.
Check logs for "Tool returned raw audio, re-invoking LLM...".
Verify, via /llm-status or the debug phase view, that inspect_audio_snippet is actually being called.
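
The log checks above can be sketched as a small scan over log lines. This is a minimal sketch, not part of the skill itself: the function name `scan_log_lines` is hypothetical, and it assumes the marker string and tool name appear verbatim in whatever log output your deployment produces.

```python
from typing import Dict, Iterable

# Marker string and tool name copied from the checklist above.
REINVOKE_MARKER = "Tool returned raw audio, re-invoking LLM..."
TOOL_NAME = "inspect_audio_snippet"


def scan_log_lines(lines: Iterable[str]) -> Dict[str, int]:
    """Count lines containing the re-invocation marker and the tool name.

    Hypothetical helper: it only does substring matching, so it makes no
    assumption about timestamps or log levels.
    """
    counts = {"reinvoke": 0, "tool_mentions": 0}
    for line in lines:
        if REINVOKE_MARKER in line:
            counts["reinvoke"] += 1
        if TOOL_NAME in line:
            counts["tool_mentions"] += 1
    return counts
```

Passing an open log file object (or any iterable of strings) is enough. A `reinvoke` count of zero after an audio request suggests the feedback loop never fired.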

Processor Check:
- Run /llm-diagnose and ensure no ImportError or AttributeError related to AutoProcessor.
- Confirm trust_remote_code=True was used during load.
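
If /llm-diagnose is unavailable, the import half of this check can be approximated directly in Python. The sketch below is an assumption about how one might do it, not what /llm-diagnose does internally. The helper name `check_auto_processor` is hypothetical, and it only tests that `AutoProcessor` resolves from the installed `transformers` package without loading any model.

```python
import importlib


def check_auto_processor() -> str:
    """Return "ok", or a short description of why AutoProcessor is unavailable."""
    try:
        transformers = importlib.import_module("transformers")
    except ImportError as exc:
        return f"ImportError: {exc}"
    try:
        # transformers resolves attributes lazily, so a broken optional
        # dependency can surface here rather than at import time.
        getattr(transformers, "AutoProcessor")
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return "ok"
```

A result of "ok" only shows that the class is importable. Whether the load in llama_model_manager.py passed `trust_remote_code=True` to `AutoProcessor.from_pretrained` still has to be confirmed in that file.
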
Tool Execution:
- Submit interaction: "Inspect the last 5 seconds of audio."
- Observe the debug phase: it should move through Planning -> Execution.
- Confirm the tool output in the logs: it should show "Audio Snippet (5.0s) Features: {...}" or a raw audio broadcast.
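
To confirm that the snippet length in the log matches the requested 5 seconds, the feature line can be parsed with a regular expression. This is a minimal sketch: the pattern is inferred from the single example line above and the real log format may differ, and `snippet_duration` is a hypothetical helper name.

```python
import re
from typing import Optional

# Pattern inferred from the example "Audio Snippet (5.0s) Features: {...}".
# Treat it as an assumption about the log format.
SNIPPET_RE = re.compile(r"Audio Snippet \((\d+(?:\.\d+)?)s\) Features:")


def snippet_duration(line: str) -> Optional[float]:
    """Return the snippet duration in seconds if the line matches, else None."""
    match = SNIPPET_RE.search(line)
    return float(match.group(1)) if match else None
```

A return value of None for every log line after the request suggests the tool either did not run or took the raw audio path instead.
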
Feedback Loop:
- If return_raw=True, ensure the second LLM invocation is triggered; the log line "Tool returned raw audio, re-invoking LLM..." should appear.
- Verify that the input sampling rate matches what Gemma-3n expects (usually 16 kHz).
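
The sampling-rate check can be sketched with the standard library alone, assuming the audio is available as WAV bytes (the skill does not say which container the pipeline uses). The names `wav_sample_rate`, `rate_matches`, and `make_silent_wav` are hypothetical, and the 16 kHz default simply mirrors the "usually 16 kHz" note above.

```python
import io
import wave

# Assumption taken from the checklist above ("usually 16 kHz").
EXPECTED_RATE_HZ = 16_000


def wav_sample_rate(data: bytes) -> int:
    """Read the sampling rate from the header of in-memory WAV bytes."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return wav.getframerate()


def rate_matches(data: bytes, expected_hz: int = EXPECTED_RATE_HZ) -> bool:
    """Return True if the WAV header reports the expected sampling rate."""
    return wav_sample_rate(data) == expected_hz


def make_silent_wav(rate_hz: int, seconds: float = 0.1) -> bytes:
    """Build a short mono 16-bit silent WAV, used only to exercise the check."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00\x00" * int(rate_hz * seconds))
    return buf.getvalue()
```

If `rate_matches` returns False for captured audio, resample before handing it to the processor. The header check itself does not convert anything.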