ai-provider-debug - SKILL.md Agent Skill

name: ai-provider-debug
description: Debug and troubleshoot AI/LLM provider integrations in ettametta. Use when investigating API failures, rate limits, fallback chain issues, model selection problems, or cost anomalies across 17+ providers.

AI Provider Debugging

Quick Diagnostics

curl http://localhost:8000/api/v1/llm/providers
curl http://localhost:8000/api/v1/llm/models
curl -X POST http://localhost:8000/api/v1/llm/reset-circuits

Provider Registry

Provider	SDK	Primary Use
Groq	groq SDK	Default LLM, fast inference
OpenAI	openai SDK	High-reasoning (GPT-4o)
Anthropic	anthropic SDK	Premium reasoning (Claude 3.5)
Google Gemini	google.genai	Cost-effective medium tasks
Ollama	OpenAI-compat	Zero-cost local inference
xAI (Grok)	openai (custom URL)	Grok-2 reasoning
DeepSeek	openai (custom URL)	DeepSeek-Chat/Coder
Mistral	openai (custom URL)	Mistral-Large
Cohere	cohere SDK	Command-R-Plus
Cerebras	openai (custom URL)	Llama-3.3-70b (30 RPM free)
Cloudflare	REST dict	Llama-3.1-70b
Hugging Face	REST dict	Llama-3.3-70B
OpenRouter	openai (custom URL)	Multi-model gateway
NVIDIA NIM	openai (custom URL)	Llama-3.3-70b
SiliconFlow	openai (custom URL)	Qwen2.5-72B
Ollama Cloud	openai (custom URL)	Qwen2.5:72b
Dify	Custom client	Orchestrator/workflow
vLLM	REST (OpenAI-compat)	High-throughput inference

Three Orchestration Layers

1. IntelligenceHub (src/services/llm/intelligence_hub.py)

Complexity routing: low -> ollama, medium -> gemini/groq, high -> openai
Circuit breaker per engine (5 failures -> open, 60s recovery)
Auto-heal: rate-limited engines recover after 10 min; 3+ errors mark an engine as degraded
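
When debugging, it can help to reason about the routing and breaker behavior in isolation. The following is a minimal sketch, assuming the thresholds listed above (5 failures, 60s recovery). The class and function names (`CircuitBreakerSketch`, `pick_engine`) and the injectable clock are hypothetical; the real implementation in src/api/utils/resilience.py and intelligence_hub.py may differ.

```python
import time

# Routing table taken from the notes above; engine names are lowercase labels.
ROUTES = {"low": ["ollama"], "medium": ["gemini", "groq"], "high": ["openai"]}


class CircuitBreakerSketch:
    """Hypothetical per-engine breaker: opens after N failures, retries after a timeout."""

    def __init__(self, failure_threshold=5, recovery_timeout=60.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self._clock = clock          # injectable so the behavior can be tested without waiting
        self._failures = 0
        self._opened_at = None       # None means the circuit is closed

    def allow_request(self):
        if self._opened_at is None:
            return True
        # Circuit is open: allow a trial call only once the recovery window has elapsed.
        return self._clock() - self._opened_at >= self.recovery_timeout

    def record_success(self):
        self._failures = 0
        self._opened_at = None

    def record_failure(self):
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()


def pick_engine(complexity, breakers):
    """Return the first engine for this complexity whose breaker allows a call, else None."""
    for engine in ROUTES[complexity]:
        if breakers[engine].allow_request():
            return engine
    return None
```

If `pick_engine` returns None for every complexity level, all relevant circuits are open, which matches the "all providers failing" symptom handled by the reset-circuits endpoint.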

2. UnifiedLLMService (src/services/llm/service.py)

7 providers; tries the requested provider first, then iterates through the remaining ones
tenacity retries: 3 attempts, exponential backoff 1-10s
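
The try-requested-then-iterate behavior with retries can be sketched as follows. This is a stdlib-only illustration, not the service's code: `generate_with_fallback` and `backoff_delay` are hypothetical names, the doubling schedule only approximates tenacity's exponential wait, and whether retries are applied per provider (as assumed here) or around the whole call is not confirmed by the source.

```python
import time
from typing import Callable, Dict, Tuple


def backoff_delay(attempt: int, minimum: float = 1.0, maximum: float = 10.0) -> float:
    """Delay after failed attempt number `attempt` (1-based): doubles, clamped to 1-10s."""
    return max(minimum, min(maximum, 2.0 ** (attempt - 1)))


def generate_with_fallback(
    prompt: str,
    requested: str,
    providers: Dict[str, Callable[[str], str]],
    attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[str, str]:
    """Try the requested provider first, then every other one; return (provider, text)."""
    order = [requested] + [name for name in providers if name != requested]
    errors = {}
    for name in order:
        call = providers.get(name)
        if call is None:
            errors[name] = "not configured"
            continue
        for attempt in range(1, attempts + 1):
            try:
                return name, call(prompt)
            except Exception as exc:  # broad on purpose: any provider error triggers fallback
                errors[name] = repr(exc)
                if attempt < attempts:
                    sleep(backoff_delay(attempt))
    raise RuntimeError(f"all providers failed: {errors}")
```

The `errors` mapping in the final exception is the useful part for debugging: it shows which provider failed last and why, rather than only the final error.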

3. BaseEttamettaAgent (src/services/base_agent.py)

17 providers, fixed fallback: primary -> ollama -> xai -> deepseek -> cerebras -> groq -> openai -> openrouter -> mistral -> siliconflow -> nvidia -> gemini -> anthropic
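
To predict which provider an agent will reach next, the fixed order above can be written out as a list. A minimal sketch, assuming the primary is simply skipped when it reappears later in the chain; `FALLBACK_ORDER` and `build_chain` are hypothetical names, not identifiers from base_agent.py.

```python
from typing import List

# Fixed fallback order as documented above (everything after "primary").
FALLBACK_ORDER = [
    "ollama", "xai", "deepseek", "cerebras", "groq", "openai",
    "openrouter", "mistral", "siliconflow", "nvidia", "gemini", "anthropic",
]


def build_chain(primary: str) -> List[str]:
    """Primary first, then the fixed order with the primary removed (assumed de-duplication)."""
    return [primary] + [name for name in FALLBACK_ORDER if name != primary]
```

For example, `build_chain("groq")` starts with groq, ollama, xai, so a Groq outage lands on local Ollama before any paid provider.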

Fallback Chains

VLM (Vision): Groq Vision -> Local Moondream2 -> Gemini 1.5 Flash -> heuristic
CrewAI: Groq (llama-3.3-70b-versatile) -> OpenAI (gpt-4o-mini) -> Ollama
Ollama self-failover: primary URL -> localhost:11434

Key Files

File	Purpose
src/services/llm/service.py	UnifiedLLMService (7 providers)
src/services/llm/intelligence_hub.py	IntelligenceHub (complexity routing)
src/services/llm/dify_client.py	DifyClient with retries
src/services/base_agent.py	BaseEttamettaAgent (17 providers)
src/services/openclaw/agent.py	OpenClawAgent (17+ providers)
src/services/video_engine/vlm_service.py	VLMService (vision chain)
src/api/utils/vault.py	3-tier secret resolution
src/api/utils/llm_vault.py	LLM vault (17 providers)
src/api/utils/resilience.py	CircuitBreaker
src/api/routes/llm.py	LLM API endpoints

Common Issues

All providers failing — circuit breakers open

curl -X POST http://localhost:8000/api/v1/llm/reset-circuits

Rate limited — auto-heal not working

IntelligenceHub auto-heals rate-limited engines after 10 min. If an engine stays stuck, reset the circuits and check the logs for HTTP 429 responses.

Placeholder key rejected

IntelligenceHub rejects keys containing "your_", "placeholder", or "CHANGE_ME". Replace them with real keys.
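
A quick way to check a key before blaming the provider is to replicate the placeholder test locally. This is a sketch under assumptions: `is_placeholder_key` is a hypothetical helper, and both the case-insensitive match and the treatment of empty keys as placeholders are guesses about the real check.

```python
from typing import Optional

# Markers listed above; matching is case-insensitive here (an assumption).
PLACEHOLDER_MARKERS = ("your_", "placeholder", "CHANGE_ME")


def is_placeholder_key(key: Optional[str]) -> bool:
    """Return True when the key is missing, blank, or contains a placeholder marker."""
    if key is None or not key.strip():
        return True
    lowered = key.lower()
    return any(marker.lower() in lowered for marker in PLACEHOLDER_MARKERS)
```

Running this over the values in .env catches template keys copied from an example file, such as `your_api_key_here`.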

Ollama not responding

docker compose ps ollama
curl http://localhost:11434/api/tags
docker compose exec ollama ollama list

API key priority

vault.py resolves keys in three tiers, in priority order: UserSetting DB -> SystemSettings DB -> .env
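
When a key seems to be ignored, the usual cause is a value in a higher tier shadowing it. The lookup order can be sketched with plain mappings standing in for the two DB tables and the environment; `resolve_api_key` and its parameters are hypothetical and do not mirror the actual vault.py signatures.

```python
from typing import Mapping, Optional, Tuple


def resolve_api_key(
    name: str,
    user_settings: Mapping[str, str],
    system_settings: Mapping[str, str],
    env: Mapping[str, str],
) -> Optional[Tuple[str, str]]:
    """Return (tier, value) from the first tier holding a non-empty value, else None."""
    tiers = (("user", user_settings), ("system", system_settings), ("env", env))
    for tier, source in tiers:
        value = source.get(name)
        if value:  # empty strings fall through to the next tier (an assumption)
            return tier, value
    return None
```

Returning the tier name alongside the value makes it obvious whether a stale UserSetting entry is overriding a corrected .env value.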

Rate Limits

Tier	Limit
FREE	100/hr
PRO/PREMIUM/BASIC	500/hr
SOVEREIGN/STUDIO	5000/hr