name: a6-plugin-ai-proxy description: >- Skill for configuring the Apache APISIX ai-proxy plugin via the a6 CLI. Covers proxying requests to LLM providers (OpenAI, Azure OpenAI, DeepSeek, Anthropic, Gemini, Vertex AI, and more), authentication per provider, model configuration, streaming, logging, and load balancing with ai-proxy-multi. version: "1.0.0" author: Apache APISIX Contributors license: Apache-2.0 metadata: category: plugin apisix_version: ">=3.9.0" plugin_name: ai-proxy a6_commands: - a6 route create - a6 route update - a6 config sync
a6-plugin-ai-proxy
Overview
The ai-proxy plugin turns APISIX into an AI gateway. It proxies requests in
OpenAI-compatible format to LLM providers, handling authentication, endpoint
routing, and response streaming. Clients send a standard chat-completion
request; the plugin translates and forwards it to the configured provider.
When to Use
- Proxy chat-completion or embedding requests to any supported LLM provider
- Centralize API keys at the gateway instead of distributing to clients
- Add observability (token counts, latency) to LLM calls
- Combine with
ai-prompt-template,ai-prompt-decorator, or content moderation plugins for a full AI gateway pipeline
Supported Providers
| Provider | Value | Default Endpoint |
|---|---|---|
| OpenAI | openai |
https://api.openai.com/v1/chat/completions |
| DeepSeek | deepseek |
https://api.deepseek.com/chat/completions |
| Azure OpenAI | azure-openai |
Custom via override.endpoint |
| Anthropic | anthropic |
https://api.anthropic.com/v1/chat/completions |
| AIMLAPI | aimlapi |
https://api.aimlapi.com/v1/chat/completions |
| OpenRouter | openrouter |
https://openrouter.ai/api/v1/chat/completions |
| Gemini | gemini |
https://generativelanguage.googleapis.com/v1beta/openai/chat/completions |
| Vertex AI | vertex-ai |
https://aiplatform.googleapis.com |
| OpenAI-Compatible | openai-compatible |
Custom via override.endpoint |
Plugin Configuration Reference
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
provider |
string | Yes | — | One of the 9 supported providers |
auth |
object | Yes | — | Authentication config (see below) |
options |
object | No | — | Model and generation parameters |
options.model |
string | No | — | Model name (provider-specific) |
options.temperature |
number | No | — | Sampling temperature |
options.top_p |
number | No | — | Nucleus sampling |
options.max_tokens |
integer | No | — | Maximum tokens to generate |
options.stream |
boolean | No | false |
Enable SSE streaming |
override |
object | No | — | Override default endpoint |
override.endpoint |
string | No | — | Full URL for the provider API |
provider_conf |
object | No | — | Provider-specific config (Vertex AI) |
provider_conf.project_id |
string | No | — | GCP project ID (Vertex AI) |
provider_conf.region |
string | No | — | GCP region (Vertex AI) |
logging |
object | No | — | Logging options |
logging.summaries |
boolean | No | false |
Log model, duration, tokens |
logging.payloads |
boolean | No | false |
Log request/response bodies |
timeout |
integer | No | 30000 |
Request timeout (ms) |
keepalive |
boolean | No | true |
Keep connection alive |
keepalive_timeout |
integer | No | 60000 |
Keepalive timeout (ms) |
keepalive_pool |
integer | No | 30 |
Keepalive pool size |
ssl_verify |
boolean | No | true |
Verify SSL certificate |
Authentication by Provider
OpenAI / DeepSeek / Anthropic / AIMLAPI / OpenRouter
{
"auth": {
"header": {
"Authorization": "Bearer sk-your-api-key"
}
}
}
Azure OpenAI
{
"auth": {
"header": {
"api-key": "your-azure-key"
}
},
"override": {
"endpoint": "https://YOUR-RESOURCE.openai.azure.com/openai/deployments/gpt-4/chat/completions?api-version=2024-02-15-preview"
}
}
Gemini
{
"auth": {
"header": {
"Authorization": "Bearer your-gemini-key"
}
}
}
Vertex AI (GCP Service Account)
{
"auth": {
"gcp": {
"service_account_json": "{ ... }",
"max_ttl": 3600,
"expire_early_secs": 60
}
},
"provider_conf": {
"project_id": "your-project-id",
"region": "us-central1"
}
}
The service_account_json can also be set via the GCP_SERVICE_ACCOUNT
environment variable.
Custom OpenAI-Compatible API
{
"auth": {
"header": {
"Authorization": "Bearer your-token"
}
},
"override": {
"endpoint": "https://your-custom-llm.com/v1/chat/completions"
}
}
Step-by-Step: Route to OpenAI
1. Create a route with ai-proxy
a6 route create -f - <<'EOF'
{
"id": "openai-chat",
"uri": "/v1/chat/completions",
"methods": ["POST"],
"plugins": {
"ai-proxy": {
"provider": "openai",
"auth": {
"header": {
"Authorization": "Bearer sk-your-openai-key"
}
},
"options": {
"model": "gpt-4",
"temperature": 0.7,
"max_tokens": 1024
}
}
}
}
EOF
2. Send a request
curl http://127.0.0.1:9080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is 1+1?"}
]
}'
The gateway adds authentication and forwards to OpenAI. The client never sees the API key.
Common Patterns
Streaming responses
{
"plugins": {
"ai-proxy": {
"provider": "openai",
"auth": {
"header": {
"Authorization": "Bearer sk-your-key"
}
},
"options": {
"model": "gpt-4",
"stream": true
}
}
}
}
The client receives Server-Sent Events (SSE). To get token counts in
streaming mode, the client should include stream_options.include_usage: true
in the request body.
Azure OpenAI
{
"plugins": {
"ai-proxy": {
"provider": "azure-openai",
"auth": {
"header": {
"api-key": "your-azure-key"
}
},
"options": {
"model": "gpt-4"
},
"override": {
"endpoint": "https://myresource.openai.azure.com/openai/deployments/gpt-4/chat/completions?api-version=2024-02-15-preview"
},
"timeout": 60000
}
}
}
Embeddings endpoint
a6 route create -f - <<'EOF'
{
"id": "embeddings",
"uri": "/v1/embeddings",
"methods": ["POST"],
"plugins": {
"ai-proxy": {
"provider": "openai",
"auth": {
"header": {
"Authorization": "Bearer sk-your-key"
}
},
"options": {
"model": "text-embedding-3-small"
},
"override": {
"endpoint": "https://api.openai.com/v1/embeddings"
}
}
}
}
EOF
Enable logging
{
"plugins": {
"ai-proxy": {
"provider": "openai",
"auth": {
"header": {
"Authorization": "Bearer sk-your-key"
}
},
"options": {
"model": "gpt-4"
},
"logging": {
"summaries": true,
"payloads": false
}
}
}
}
Model Routing with Multiple Routes
The plugin does not natively route by model. Use separate routes with vars
matching on request body fields:
# Route requests for gpt-4 to OpenAI
a6 route create -f - <<'EOF'
{
"id": "openai-gpt4",
"uri": "/v1/chat/completions",
"methods": ["POST"],
"vars": [["post_arg.model", "==", "gpt-4"]],
"plugins": {
"ai-proxy": {
"provider": "openai",
"auth": { "header": { "Authorization": "Bearer sk-openai-key" } },
"options": { "model": "gpt-4" }
}
}
}
EOF
# Route requests for deepseek-chat to DeepSeek
a6 route create -f - <<'EOF'
{
"id": "deepseek-chat",
"uri": "/v1/chat/completions",
"methods": ["POST"],
"vars": [["post_arg.model", "==", "deepseek-chat"]],
"plugins": {
"ai-proxy": {
"provider": "deepseek",
"auth": { "header": { "Authorization": "Bearer sk-deepseek-key" } },
"options": { "model": "deepseek-chat" }
}
}
}
EOF
Load Balancing with ai-proxy-multi
For load balancing, failover, and priority-based routing across providers,
use ai-proxy-multi instead:
{
"plugins": {
"ai-proxy-multi": {
"balancer": {
"algorithm": "roundrobin"
},
"fallback_strategy": ["rate_limiting", "http_429", "http_5xx"],
"instances": [
{
"name": "openai-primary",
"provider": "openai",
"priority": 1,
"weight": 8,
"auth": {
"header": { "Authorization": "Bearer sk-openai-key" }
},
"options": { "model": "gpt-4" }
},
{
"name": "deepseek-backup",
"provider": "deepseek",
"priority": 0,
"weight": 2,
"auth": {
"header": { "Authorization": "Bearer sk-deepseek-key" }
},
"options": { "model": "deepseek-chat" }
}
]
}
}
}
Access Log Variables
Configure APISIX to log LLM metrics:
| Variable | Description |
|---|---|
$request_type |
traditional_http, ai_chat, or ai_stream |
$llm_time_to_first_token |
Time to first token (ms) |
$llm_model |
Actual model used by provider |
$request_llm_model |
Model requested by client |
$llm_prompt_tokens |
Prompt token count |
$llm_completion_tokens |
Completion token count |
Config Sync Example
version: "1"
routes:
- id: openai-chat
uri: /v1/chat/completions
methods:
- POST
plugins:
ai-proxy:
provider: openai
auth:
header:
Authorization: Bearer sk-your-openai-key
options:
model: gpt-4
max_tokens: 1024
temperature: 0.7
logging:
summaries: true
Troubleshooting
| Symptom | Cause | Fix |
|---|---|---|
| 502 Bad Gateway | Wrong endpoint or provider value | Verify provider matches your API; check override.endpoint for Azure/custom |
| 401 from upstream | Invalid API key | Check auth.header value; ensure key is active with the provider |
| Timeout errors | Slow LLM response | Increase timeout (default 30000ms); use streaming for long completions |
| No token counts in streaming | Missing stream_options | Client should send stream_options.include_usage: true |
| Azure 404 | Missing api-version in URL | Include ?api-version=YYYY-MM-DD-preview in override.endpoint |
| Vertex AI auth failure | Bad service account JSON | Set via auth.gcp.service_account_json or GCP_SERVICE_ACCOUNT env var |