a6-plugin-ai-content-moderation

star 0

Skill for configuring APISIX AI content moderation plugins via the a6 CLI. Covers both ai-aws-content-moderation (AWS Comprehend, request-only) and ai-aliyun-content-moderation (Aliyun, request + response with streaming), toxicity thresholds, category filtering, and integration with ai-proxy.

api7 By api7 schedule Updated 3/7/2026

name: a6-plugin-ai-content-moderation description: >- Skill for configuring APISIX AI content moderation plugins via the a6 CLI. Covers both ai-aws-content-moderation (AWS Comprehend, request-only) and ai-aliyun-content-moderation (Aliyun, request + response with streaming), toxicity thresholds, category filtering, and integration with ai-proxy. version: "1.0.0" author: Apache APISIX Contributors license: Apache-2.0 metadata: category: plugin apisix_version: ">=3.9.0" plugin_name: ai-aws-content-moderation related_plugins: - ai-aliyun-content-moderation a6_commands: - a6 route create - a6 route update - a6 config sync


a6-plugin-ai-content-moderation

Overview

APISIX provides two content moderation plugins that filter harmful content in LLM requests and responses:

Plugin Provider Request Response Streaming
ai-aws-content-moderation AWS Comprehend
ai-aliyun-content-moderation Aliyun Moderation Plus

Both must be used alongside ai-proxy or ai-proxy-multi.

When to Use

  • Block toxic, hateful, or sexual content before it reaches the LLM
  • Filter harmful LLM responses before they reach clients (Aliyun only)
  • Enforce content policies with configurable thresholds
  • Comply with content safety regulations

Plugin Execution Order

ai-prompt-template           (priority 1071)
ai-prompt-decorator          (priority 1070)
ai-aws-content-moderation    (priority 1050) ← runs BEFORE ai-proxy
ai-proxy                     (priority 1040)
ai-aliyun-content-moderation (priority 1029) ← runs AFTER ai-proxy

The AWS plugin blocks requests before they reach the LLM. The Aliyun plugin runs after ai-proxy sets context and can check both requests and responses.


Plugin 1: ai-aws-content-moderation

Uses the AWS Comprehend detectToxicContent API to score request content.

Configuration Reference

Field Type Required Default Description
comprehend.access_key_id string Yes AWS access key ID
comprehend.secret_access_key string Yes AWS secret access key
comprehend.region string Yes AWS region (e.g. us-east-1)
comprehend.endpoint string No Auto Custom Comprehend endpoint
comprehend.ssl_verify boolean No true Verify SSL certificate
moderation_categories object No Per-category thresholds (0-1)
moderation_threshold number No 0.5 Overall toxicity threshold (0-1)

Moderation Categories

Category Description
PROFANITY Profane language
HATE_SPEECH Hateful content
INSULT Insulting language
HARASSMENT_OR_ABUSE Harassment or abusive content
SEXUAL Sexual content
VIOLENCE_OR_THREAT Violent or threatening content

Each category accepts a score threshold from 0 (strictest, blocks nearly everything) to 1 (most permissive). If moderation_categories is set, each category is checked individually. Otherwise, the moderation_threshold is used as an overall toxicity check.

Step-by-Step: AWS Content Moderation

a6 route create -f - <<'EOF'
{
  "id": "moderated-chat",
  "uri": "/v1/chat/completions",
  "methods": ["POST"],
  "plugins": {
    "ai-aws-content-moderation": {
      "comprehend": {
        "access_key_id": "AKIAIOSFODNN7EXAMPLE",
        "secret_access_key": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
        "region": "us-east-1"
      },
      "moderation_categories": {
        "HATE_SPEECH": 0.3,
        "VIOLENCE_OR_THREAT": 0.2,
        "SEXUAL": 0.5
      }
    },
    "ai-proxy": {
      "provider": "openai",
      "auth": {
        "header": {
          "Authorization": "Bearer sk-your-key"
        }
      },
      "options": {
        "model": "gpt-4"
      }
    }
  }
}
EOF

Toxic requests are rejected with HTTP 400:

request body exceeds HATE_SPEECH threshold

Overall threshold (no per-category filtering)

{
  "plugins": {
    "ai-aws-content-moderation": {
      "comprehend": {
        "access_key_id": "AKIA...",
        "secret_access_key": "secret...",
        "region": "us-east-1"
      },
      "moderation_threshold": 0.7
    }
  }
}

Plugin 2: ai-aliyun-content-moderation

Uses Aliyun Machine-Assisted Moderation Plus. Supports request moderation, response moderation, and real-time streaming moderation.

Configuration Reference

Field Type Required Default Description
endpoint string Yes Aliyun service endpoint URL
region_id string Yes Aliyun region (e.g. cn-shanghai)
access_key_id string Yes Aliyun access key ID
access_key_secret string Yes Aliyun access key secret
check_request boolean No true Enable request moderation
check_response boolean No false Enable response moderation
stream_check_mode string No final_packet realtime or final_packet
stream_check_cache_size integer No 128 Max chars per batch (realtime)
stream_check_interval number No 3 Seconds between batch checks (realtime)
request_check_service string No llm_query_moderation Aliyun service for request checks
request_check_length_limit number No 2000 Max chars per request chunk
response_check_service string No llm_response_moderation Aliyun service for response checks
response_check_length_limit number No 5000 Max chars per response chunk
risk_level_bar string No high Threshold: none, low, medium, high, max
deny_code number No 200 HTTP status code for rejected content
deny_message string No Custom rejection message
timeout integer No 10000 Request timeout (ms)
ssl_verify boolean No true Verify SSL certificate

Risk Level System

Content is blocked when its risk level meets or exceeds the risk_level_bar:

none (0) < low (1) < medium (2) < high (3) < max (4)

Setting risk_level_bar: "high" blocks content rated high or max. Setting risk_level_bar: "low" blocks everything rated low or above.

Streaming Modes

Mode Behavior
final_packet Buffers entire response, checks at end
realtime Checks content in batches during streaming, can interrupt mid-response

Step-by-Step: Aliyun Request + Response Moderation

a6 route create -f - <<'EOF'
{
  "id": "aliyun-moderated-chat",
  "uri": "/v1/chat/completions",
  "methods": ["POST"],
  "plugins": {
    "ai-proxy": {
      "provider": "openai",
      "auth": {
        "header": {
          "Authorization": "Bearer sk-your-key"
        }
      },
      "options": {
        "model": "gpt-4"
      }
    },
    "ai-aliyun-content-moderation": {
      "endpoint": "https://green.cn-shanghai.aliyuncs.com",
      "region_id": "cn-shanghai",
      "access_key_id": "your-aliyun-key-id",
      "access_key_secret": "your-aliyun-key-secret",
      "check_request": true,
      "check_response": true,
      "risk_level_bar": "high",
      "deny_code": 400,
      "deny_message": "Content policy violation"
    }
  }
}
EOF

Realtime streaming moderation

{
  "plugins": {
    "ai-aliyun-content-moderation": {
      "endpoint": "https://green.cn-shanghai.aliyuncs.com",
      "region_id": "cn-shanghai",
      "access_key_id": "key-id",
      "access_key_secret": "key-secret",
      "check_request": true,
      "check_response": true,
      "stream_check_mode": "realtime",
      "stream_check_cache_size": 256,
      "stream_check_interval": 2,
      "risk_level_bar": "medium"
    }
  }
}

Integration Patterns

Pattern A: Request-only filtering (AWS)

Client → [AWS Comprehend blocks toxic] → ai-proxy → LLM → Response → Client
plugins:
  ai-aws-content-moderation:
    comprehend:
      access_key_id: "${AWS_ACCESS_KEY_ID}"
      secret_access_key: "${AWS_SECRET_ACCESS_KEY}"
      region: us-east-1
    moderation_threshold: 0.5
  ai-proxy:
    provider: openai
    auth:
      header:
        Authorization: "Bearer ${OPENAI_API_KEY}"

Pattern B: Request + response filtering (Aliyun)

Client → ai-proxy [sets context] → [Aliyun checks request] → LLM
       → [Aliyun checks response] → Client
plugins:
  ai-proxy:
    provider: openai
    auth:
      header:
        Authorization: "Bearer ${OPENAI_API_KEY}"
  ai-aliyun-content-moderation:
    endpoint: "https://green.cn-shanghai.aliyuncs.com"
    region_id: cn-shanghai
    access_key_id: "${ALIYUN_KEY_ID}"
    access_key_secret: "${ALIYUN_KEY_SECRET}"
    check_request: true
    check_response: true
    risk_level_bar: high

Secret Management

Both plugins support APISIX secret management for credentials:

plugins:
  ai-aws-content-moderation:
    comprehend:
      access_key_id: "$secret://vault/aws_key_id"
      secret_access_key: "$secret://vault/aws_secret_key"
      region: us-east-1

Config Sync Example

version: "1"
routes:
  - id: moderated-chat
    uri: /v1/chat/completions
    methods:
      - POST
    plugins:
      ai-aws-content-moderation:
        comprehend:
          access_key_id: AKIAIOSFODNN7EXAMPLE
          secret_access_key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
          region: us-east-1
        moderation_categories:
          HATE_SPEECH: 0.3
          VIOLENCE_OR_THREAT: 0.2
        moderation_threshold: 0.5
      ai-proxy:
        provider: openai
        auth:
          header:
            Authorization: Bearer sk-your-key
        options:
          model: gpt-4

Troubleshooting

Symptom Cause Fix
"no ai instance picked" Aliyun plugin used without ai-proxy Always configure ai-proxy or ai-proxy-multi on the same route
AWS plugin not blocking Threshold too permissive Lower moderation_threshold or per-category thresholds
Aliyun response moderation inactive check_response defaults to false Explicitly set check_response: true
"Specified signature is not matched" Wrong Aliyun credentials Verify access_key_id and access_key_secret
High latency Double moderation (both plugins) Use one moderation provider per route, not both
Streaming interrupted mid-response Aliyun realtime mode detected violation Expected behavior; adjust risk_level_bar or use final_packet mode
Install via CLI
npx skills add https://github.com/api7/a6 --skill a6-plugin-ai-content-moderation
Repository Details
star Stars 0
call_split Forks 1
navigation Branch main
article Path SKILL.md
More from Creator