glm-5 - SKILL.md Agent Skill

name: glm-5 description: | GLM-5 Z.ai Frontier Model routing and configuration. Use when: a task needs budget-friendly frontier reasoning, MIT-licensed model output, agentic engineering, or cost-effective high-quality inference via OpenRouter. role: Specialist Executor intent: Route budget frontier reasoning tasks to GLM-5's 744B MoE model for cost-effective high-quality inference. kpis: - cost_per_task - hallucination_rate - routing_accuracy status: active priority: high triggers: - glm - glm-5 - zhipu - z.ai - chatglm - chinese model - cogvideo - cogview - agentic engineering - huawei trained execution: sequential — classify task → verify budget fit → call GLM-5 via OpenRouter → validate response → log cost dependencies: - OPENROUTER_API_KEY

GLM-5 — Z.ai Frontier Model

Released 2026-02-11 by Zhipu AI (Z.ai). A 744B parameter Mixture-of-Experts model trained on Huawei Ascend infrastructure. MIT licensed — all outputs are commercially usable with no restrictions.

Model Overview

Property	Value
Parameters	744B (MoE, ~120B active)
Context Window	128K tokens
License	MIT
Training Hardware	Huawei Ascend 910B
Provider	OpenRouter
Model ID	`zhipu/glm-5`

Pricing

Component	Input (per MTok)	Output (per MTok)
Text	$1.00	$3.20

GLM-5 is one of the most cost-effective frontier models available. At $1.00/$3.20 per MTok, it undercuts most competitors while delivering competitive benchmark performance.

Routing Rules

Use GLM-5 as the default budget frontier model — for tasks that need strong reasoning but do not require the absolute best (use Gemini 3.1 Pro or Claude 4.6 for those).
Prefer GLM-5 over GPT-4o for cost-sensitive tasks — similar quality at lower price.
Use for agentic engineering — GLM-5 excels at multi-step tool-use and code generation tasks common in ACHEEVY workflows.
MIT license advantage — when the output must be redistributable or embedded in customer-facing exports, GLM-5's MIT license removes legal friction.
Fallback to GLM-5 when primary models (Gemini 3.1 Pro, Claude 4.6) are rate-limited or unavailable.

When NOT to Use GLM-5

Tasks requiring >128K context window (use Gemini 3.1 Pro with 2M context).
Tasks where absolute top-tier accuracy is critical (use Claude 4.6 Opus or Gemini 3.1 Pro high-thinking).
Real-time streaming with sub-200ms first-token latency requirements.
Image/audio multimodal input (GLM-5 is text-only; use CogView/CogVideo siblings for media generation).

Anti-Patterns

Routing GLM-5 for tasks that exceed its 128K context limit.
Ignoring the MIT license benefit when producing exportable/customer-facing content.
Using GLM-5 for safety-critical decisions without a secondary verification model.
Not logging cost_per_task — the whole point of GLM-5 routing is cost optimization.

Related Models (Zhipu Ecosystem)

Model	Purpose
CogVideo X	Video generation
CogView 4	Image generation
GLM-4	Previous-gen text model