name: shenxiang_image_gen
description: Generate images and math visualization animations for shenxiang.school. Provides two modes, (1) image generation using Codex's native gpt-image-2 tool, based on structured JSON from a Dify intent-recognition workflow, and (2) math animation generation using Manim to produce videos of dynamic function graphs, geometry, and K-12 math concepts. Use when a user asks to generate/create/draw an image, OR when they ask to visualize/animate a math function, equation, graph, or geometric concept.
Shenxiang Image & Math Animation Generator
Dual-purpose skill for shenxiang.school:
- Mode A — Image: AI-optimized prompt → native gpt-image-2 generation
- Mode B — Math Animation: AI-parsed intent → Manim code generation → video rendering
Architecture Overview
User request (any language)
│
├── Image request? ──→ Mode A (gpt-image-2)
│
└── Math/animation? ──→ Mode B (Manim video)
Mode A: Image Generation
User request → Dify 2-node workflow (intent only) → JSON contract → Codex gpt-image-2 → Image
Mode B: Math Animation
User request → Dify 2-node workflow (math intent) → JSON contract → Codex generates Manim code → render_manim.py → Video
Routing Decision
Determine mode from the user's request:
| Trigger keywords | Mode |
|---|---|
| 画图, 生成图片, create image, draw, poster, 海报, logo, 头像, product photo | A — Image |
| 函数, 图像, 动画, animate, graph, 可视化, 二次函数, sin, cos, 几何, 数学, plot, 坐标系 | B — Math Animation |
| Ambiguous (e.g. "画一个函数图") | B — Math Animation (math takes priority when "function/graph" detected) |
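The routing rule can be sketched as a keyword check that consults the math list first, so an ambiguous request falls through to Mode B. This is a minimal illustration only: `route_mode`, the whole-word matching for ASCII keywords, and the `None` result for unmatched requests are assumptions, not part of the skill.

```python
import re

# Keyword lists copied from the routing table above.
MATH_KEYWORDS = ["函数", "图像", "动画", "animate", "graph", "可视化", "二次函数",
                 "sin", "cos", "几何", "数学", "plot", "坐标系"]
IMAGE_KEYWORDS = ["画图", "生成图片", "create image", "draw", "poster", "海报",
                  "logo", "头像", "product photo"]


def _matches(keyword, text):
    # ASCII keywords are matched as whole words so "sin" does not hit "single";
    # CJK keywords are matched as plain substrings.
    if keyword.isascii():
        return re.search(r"\b" + re.escape(keyword) + r"\b", text) is not None
    return keyword in text


def route_mode(request):
    """Return "B" for math animation, "A" for image, or None if nothing matches."""
    text = request.lower()
    # Math is checked first, which gives it priority on ambiguous requests.
    if any(_matches(keyword, text) for keyword in MATH_KEYWORDS):
        return "B"
    if any(_matches(keyword, text) for keyword in IMAGE_KEYWORDS):
        return "A"
    return None
```

With this ordering, "画一个函数图" resolves to Mode B because "函数" is found before any image keyword is considered.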
Mode A — Image Generation
When Dify JSON Is Provided
If the incoming message contains a JSON payload matching the image contract, skip optimization and generate directly.
Image JSON Contract
{
  "optimized_prompt": "60-200 word English paragraph",
  "aspect_ratio": "1:1 | 3:2 | 2:3 | 4:3 | 3:4 | 16:9 | 9:16",
  "size_hint": "1024x1024 | 1024x1536 | 1536x1024 | 2048x2048",
  "style_tag": "photo | illustration | 3d | design | anime | mixed",
  "reasoning_short": "≤25 words"
}
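A payload can be checked against this contract with a few set-membership and word-count tests. The sketch below is an independent illustration, not the contents of scripts/validate_dify_json.py; the function name `check_image_contract` and the wording of its messages are assumptions.

```python
ASPECT_RATIOS = {"1:1", "3:2", "2:3", "4:3", "3:4", "16:9", "9:16"}
SIZE_HINTS = {"1024x1024", "1024x1536", "1536x1024", "2048x2048"}
STYLE_TAGS = {"photo", "illustration", "3d", "design", "anime", "mixed"}
REQUIRED_KEYS = ("optimized_prompt", "aspect_ratio", "size_hint",
                 "style_tag", "reasoning_short")


def check_image_contract(payload):
    """Return a list of problems; an empty list means the payload looks valid."""
    problems = [f"{key}: missing or not a string"
                for key in REQUIRED_KEYS
                if not isinstance(payload.get(key), str)]
    if problems:
        return problems  # value checks below need every field to be a string

    prompt_words = len(payload["optimized_prompt"].split())
    if not 60 <= prompt_words <= 200:
        problems.append(f"optimized_prompt: {prompt_words} words, expected 60-200")
    for key, allowed in (("aspect_ratio", ASPECT_RATIOS),
                         ("size_hint", SIZE_HINTS),
                         ("style_tag", STYLE_TAGS)):
        if payload[key] not in allowed:
            problems.append(f"{key}: {payload[key]!r} is not allowed")
    if len(payload["reasoning_short"].split()) > 25:
        problems.append("reasoning_short: more than 25 words")
    return problems
```

Returning a list rather than raising on the first failure lets the caller report every contract violation in one pass.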
Execution Steps
- Validate JSON (run `scripts/validate_dify_json.py` if uncertain).
- Call gpt-image-2: `prompt=optimized_prompt`, `size=size_hint`.
- Return generated image + display `reasoning_short`.
When No JSON Is Provided (Direct Image Request)
Optimize locally then generate:
- Detect language → translate meaning to English; keep on-image text in original language.
- Classify intent (portrait, product, scene, poster, etc.).
- Compose one paragraph (60–180 words): Subject → Action → Environment → Camera → Lighting → Palette → Text → Negatives.
- Pick aspect_ratio and map to size_hint (see `references/json-schema.md`).
- Call gpt-image-2 with optimized prompt and size.
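The composition and sizing steps above can be sketched as two small helpers. The authoritative aspect-ratio mapping lives in references/json-schema.md and is not reproduced here; the orientation-based mapping below (which never selects 2048x2048) is an assumption, as are the names `compose_prompt` and `size_hint_for`.

```python
# Component order taken from the composition step:
# Subject, Action, Environment, Camera, Lighting, Palette, Text, Negatives.
PROMPT_ORDER = ["subject", "action", "environment", "camera",
                "lighting", "palette", "text", "negatives"]


def compose_prompt(parts):
    """Join the supplied components into one paragraph, in the fixed order."""
    return " ".join(parts[name].strip() for name in PROMPT_ORDER if parts.get(name))


def size_hint_for(aspect_ratio):
    """Assumed mapping: choose a size by orientation only."""
    width, height = (int(part) for part in aspect_ratio.split(":"))
    if width == height:
        return "1024x1024"
    return "1536x1024" if width > height else "1024x1536"
```

For example, `size_hint_for("16:9")` picks the landscape size, and `compose_prompt` silently skips any component the request did not mention.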
Safety (Image)
Refuse requests that depict real private persons, minors in unsafe contexts, sexual content, self-harm, weapons, or copyrighted characters. Respond with the explanation: "Request rejected by safety policy."
For detailed image parameters: see references/gpt-image-2-params.md.
For JSON contract details: see references/json-schema.md.
Mode B — Math Animation (Manim)
When Dify Math JSON Is Provided
If the incoming message contains a JSON payload matching the math animation contract, use it directly to generate Manim code.
Math Animation JSON Contract
{
  "topic": "string (math concept)",
  "topic_category": "linear_function | quadratic_function | trigonometric | geometry | calculus | statistics | vectors | sequences | other",
  "functions": [
    {"expression": "2*x + 1", "label": "y = 2x + 1", "color": "BLUE"}
  ],
  "parameters_to_animate": [
    {"name": "k", "start_value": 1, "end_value": 3, "description": "slope change"}
  ],
  "axes_config": {"x_range": [-5, 5, 1], "y_range": [-4, 4, 1]},
  "annotations": [
    {"type": "dot|text|line|arrow", "content": "label", "position": "where"}
  ],
  "animation_style": "dynamic | static | step-by-step",
  "duration_seconds": 10,
  "quality": "low | medium | high",
  "title": "一次函数",
  "reasoning": "≤30 words"
}
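To illustrate how contract fields might map onto Manim code, the sketch below turns a trimmed-down contract into the text of a static scene. It is a hypothetical helper (`contract_to_scene_source` is not part of the skill), it reads only `title`, `axes_config`, and `functions`, and it pastes `expression` into the script verbatim, so the result would still need validation before rendering.

```python
import ast

# Trimmed-down contract containing only the fields this sketch consumes.
MATH_CONTRACT = {
    "functions": [{"expression": "2*x + 1", "label": "y = 2x + 1", "color": "BLUE"}],
    "axes_config": {"x_range": [-5, 5, 1], "y_range": [-4, 4, 1]},
    "title": "一次函数",
}


def contract_to_scene_source(contract, class_name="ContractScene"):
    """Build the source text of a static Manim scene from a contract dict."""
    axes = contract["axes_config"]
    axes_line = (
        f'        axes = Axes(x_range={axes["x_range"]!r}, y_range={axes["y_range"]!r}, '
        'axis_config={"include_numbers": True})'
    )
    lines = [
        "from manim import *",
        "import numpy as np",
        "",
        f"class {class_name}(Scene):",
        "    def construct(self):",
        f'        title = Text({contract["title"]!r}, font="Noto Sans CJK SC", font_size=32).to_edge(UP)',
        "        self.play(Write(title))",
        axes_line,
        "        self.play(Create(axes))",
    ]
    # One plotted graph per entry in "functions".
    for index, fn in enumerate(contract["functions"]):
        lines.append(
            f'        graph_{index} = axes.plot(lambda x: {fn["expression"]}, color={fn["color"]})'
        )
        lines.append(f"        self.play(Create(graph_{index}))")
    lines.append("        self.wait(1)")
    return "\n".join(lines) + "\n"


source = contract_to_scene_source(MATH_CONTRACT)
ast.parse(source)  # confirms the generated text is syntactically valid Python
```

The generated text is only parsed here, not rendered; rendering would go through scripts/render_manim.py.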
When No JSON Is Provided (Direct Math Request)
Generate Manim code directly from the user's description. Follow this procedure:
Step 1: Parse the math intent
- Identify the math topic (linear/quadratic/trig/geometry/etc.)
- Extract any explicit functions mentioned
- Determine what should be animated (parameter changes, comparisons, constructions)
- Decide appropriate axes range
Step 2: Generate Manim Python code
Write a complete, self-contained Manim script following these rules:
- Start with `from manim import *` and `import numpy as np`
- Create ONE Scene subclass with a descriptive name
- Use `Axes` or `NumberPlane` for function visualizations
- Use `ValueTracker` + `always_redraw` for dynamic parameter animations
- Use `Text("中文", font="Noto Sans CJK SC")` for Chinese labels
- Use `MathTex(r"y = kx + b")` for math formulas
- Keep total duration 5–30 seconds
- Add a Chinese title at the top with `Text`
Code structure template:
from manim import *
import numpy as np

class [TopicName]Scene(Scene):
    def construct(self):
        # 1. Title
        title = Text("[中文标题]", font="Noto Sans CJK SC", font_size=32).to_edge(UP)
        self.play(Write(title))

        # 2. Axes
        axes = Axes(x_range=[...], y_range=[...], axis_config={"include_numbers": True})
        self.play(Create(axes))

        # 3. Function graph (static or dynamic)
        # For dynamic: use ValueTracker + always_redraw
        # For static: use axes.plot()

        # 4. Annotations (dots, labels, lines)

        # 5. Animations (parameter changes, transforms)

        # 6. Final wait
        self.wait(1)
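As one possible filled-in instance of the template, the sketch below animates the slope k of y = kx from 1 to 3 using `ValueTracker` and `always_redraw`. The scene is held as a string, as it would be before being piped to scripts/render_manim.py, and is only parsed here rather than rendered. The class name `LinearSlopeScene`, the axis ranges, and the plotted x interval are illustrative assumptions; `MathTex` additionally needs a working LaTeX installation.

```python
import ast

SCENE_SOURCE = '''
from manim import *
import numpy as np

class LinearSlopeScene(Scene):
    def construct(self):
        # 1. Title and formula
        title = Text("一次函数", font="Noto Sans CJK SC", font_size=32).to_edge(UP)
        formula = MathTex(r"y = kx").next_to(title, DOWN)
        self.play(Write(title), Write(formula))

        # 2. Axes
        axes = Axes(x_range=[-5, 5, 1], y_range=[-4, 4, 1],
                    axis_config={"include_numbers": True})
        self.play(Create(axes))

        # 3. Dynamic graph: rebuilt every frame from the tracker's current value
        k = ValueTracker(1)
        graph = always_redraw(
            lambda: axes.plot(lambda x: k.get_value() * x, x_range=[-1, 1], color=BLUE)
        )
        self.add(graph)

        # 4. Animate the slope from 1 to 3, then hold the final frame
        self.play(k.animate.set_value(3), run_time=4)
        self.wait(1)
'''

# Parse only: list the classes defined by the script without importing manim.
tree = ast.parse(SCENE_SOURCE)
scene_names = [node.name for node in tree.body if isinstance(node, ast.ClassDef)]
```

The graph is added with `self.add` rather than animated with `Create`, since a mobject that is redrawn every frame tends not to combine well with a creation animation. Restricting the plot to x in [-1, 1] keeps the line inside the y range even at k = 3.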
Step 3: Validate the code
Before rendering, validate with:
python scripts/render_manim.py --validate-only <<< "$CODE"
This checks:
- Syntax correctness (AST parse)
- No forbidden imports (os, subprocess, socket, etc.)
- Scene subclass exists
- `from manim import *` present
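The four checks can be approximated with Python's `ast` module. This is a sketch of the idea, not the actual logic of scripts/render_manim.py: the function name `validate_scene_source` is hypothetical, the forbidden set contains only the three modules named above (the source's list ends in "etc."), and the Scene test is a heuristic that accepts any base class whose name ends in "Scene".

```python
import ast

FORBIDDEN_MODULES = {"os", "subprocess", "socket"}  # partial list, per the text above


def validate_scene_source(source):
    """Return a list of problems found in a generated Manim script."""
    # Check 1: syntax correctness
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg} (line {exc.lineno})"]

    problems = []
    has_star_import = False
    has_scene = False
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # Check 2: forbidden "import x" statements
            for alias in node.names:
                if alias.name.split(".")[0] in FORBIDDEN_MODULES:
                    problems.append(f"forbidden import: {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            module = node.module or ""
            if module.split(".")[0] in FORBIDDEN_MODULES:
                problems.append(f"forbidden import: {module}")
            # Check 4: "from manim import *" is present
            if module == "manim" and any(alias.name == "*" for alias in node.names):
                has_star_import = True
        elif isinstance(node, ast.ClassDef):
            # Check 3: at least one class derives from a *Scene base
            if any(isinstance(base, ast.Name) and base.id.endswith("Scene")
                   for base in node.bases):
                has_scene = True

    if not has_scene:
        problems.append("no Scene subclass found")
    if not has_star_import:
        problems.append("missing 'from manim import *'")
    return problems
```

An empty list means all four checks passed; a syntax error short-circuits the rest because no tree is available to inspect.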
Step 4: Render to video
python scripts/render_manim.py --scene [ClassName] --quality medium --format mp4 <<< "$CODE"
Or from file:
python scripts/render_manim.py --file scene.py --scene [ClassName] -q medium
Output is JSON:
{"success": true, "video_path": "/tmp/manim_output_xxx/...", "duration_seconds": 8.5}
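A caller might wrap the render step roughly as follows. The flags and the success fields come from the commands and sample output above; the helper names are hypothetical, the shape of a failed result is not documented here (so it is simply surfaced as-is), and the 120-second timeout mirrors the limit stated in the safety section.

```python
import json
import subprocess


def build_render_command(scene, quality="medium", fmt="mp4"):
    """Assemble the stdin-based render command shown above."""
    return ["python", "scripts/render_manim.py",
            "--scene", scene, "--quality", quality, "--format", fmt]


def parse_render_result(stdout):
    """Return (video_path, duration_seconds) or raise if the render failed."""
    result = json.loads(stdout)
    if not result.get("success"):
        # The failure payload is not specified in this document; report it whole.
        raise RuntimeError(f"render failed: {result}")
    return result["video_path"], result["duration_seconds"]


def render_scene(code, scene):
    """Pipe the generated code to the renderer and parse its JSON output."""
    completed = subprocess.run(build_render_command(scene), input=code,
                               capture_output=True, text=True, timeout=120)
    return parse_render_result(completed.stdout)
```

Keeping command construction and result parsing separate from the subprocess call makes both easy to test without a Manim installation.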
Step 5: Deliver
- Return the video file to the user
- Show the generated code (optional, for transparency)
- Display the reasoning/title
Common Math Animation Patterns
For detailed Manim API reference and ready-to-use patterns, see references/manim-params.md. Key patterns:
- Linear function (一次函数): Axes + ValueTracker for k and b + always_redraw graph
- Quadratic function (二次函数): Parabola + vertex dot + axis of symmetry + animate a/b/c
- Trigonometric (三角函数): Extended x_range with PI steps + sin/cos comparison
- Geometry (几何): Polygon + Dot vertices + angle marks + transformations
Safety (Math Animation)
- Only `manim` and `numpy` imports allowed
- No filesystem/network/system access in generated code
- Render timeout: 120 seconds max
- Code length limit: 50,000 characters
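The import allowlist and the length limit lend themselves to a static pre-check like the one below. It is a minimal sketch with a hypothetical name (`safety_violations`), and it covers only those two rules: import inspection does not catch calls such as `open()` or `__import__`, and the render timeout has to be enforced by whatever process runs the renderer.

```python
import ast

ALLOWED_MODULES = {"manim", "numpy"}
MAX_CODE_CHARS = 50_000


def safety_violations(source):
    """Return a list of rule violations for a generated script."""
    if len(source) > MAX_CODE_CHARS:
        return [f"code too long: {len(source)} characters"]

    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]  # relative imports have no module name
        else:
            continue
        for name in names:
            # Compare the top-level package, so "numpy.linalg" is accepted.
            if name.split(".")[0] not in ALLOWED_MODULES:
                violations.append(f"import not allowed: {name}")
    return violations
```

An allowlist is stricter than a list of forbidden modules: anything not explicitly named is rejected, including standard-library modules that look harmless.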
Integration with shenxiang.school
┌─────────────────────────────────────────────────────────────┐
│ shenxiang.school Architecture │
├─────────────────────────────────────────────────────────────┤
│ │
│ Browser → /api/dify-chat → routes to Dify │
│ │
│ Dify Workflow A: Image Intent (2 nodes) │
│ assets/dify-workflow-intent-only.yml │
│ → Returns image JSON contract │
│ │
│ Dify Workflow B: Math Intent (2 nodes) │
│ assets/dify-workflow-math-intent.yml │
│ → Returns math animation JSON contract │
│ │
│ Codex receives JSON and executes: │
│ Image → native gpt-image-2 (no gateway needed) │
│ Math → generate Manim code → render_manim.py → video │
│ │
│ Credits deducted per generation via /api/user/credits │
│ │
└─────────────────────────────────────────────────────────────┘
Key simplifications vs. original architecture
| Original | With this skill |
|---|---|
| 4-node Dify workflow + Code validation node | 2-node Dify (intent only) + Codex handles logic |
| `dify-image-gateway:8001` for image gen | Eliminated: Codex has native gpt-image-2 |
| No math animation capability | Full Manim rendering pipeline added |
| Multiple API hops for one generation | Single Codex execution per request |
Deployment checklist
- Dify: Import both workflow YAMLs from `assets/`
- Docker: Add manim-renderer container (for video rendering)
- Env vars: `DIFY_API_KEY_INTENT` (image) + `DIFY_API_KEY_MATH_INTENT` (math)
- Codex: Install skill to `~/.codex/skills/shenxiang_image_gen/`
- CDN: Configure video output upload to `cdn.shenxiang.school`