name: moonshot-ai version: "1.0.0" description: Moonshot AI Kimi API - Trillion-parameter MoE model with 256K context, tool calling, and agentic capabilities for chat, coding, and autonomous task execution
Moonshot AI Skill
Moonshot AI provides the Kimi large language model series, featuring the flagship Kimi K2 - a state-of-the-art mixture-of-experts (MoE) model with 1 trillion total parameters. The API offers OpenAI-compatible endpoints with 256K context length, strong tool calling capabilities, and competitive pricing.
Key Value Proposition: Access a trillion-parameter model optimized for agentic tasks, tool use, and coding at significantly lower costs than competitors (up to 100x cheaper than GPT-4 for some tasks), with excellent multilingual support for Chinese and English.
When to Use This Skill
- Integrating Moonshot AI/Kimi models into applications
- Building agentic AI systems with autonomous tool calling
- Processing long documents with 128K-256K context windows
- Developing cost-effective LLM solutions
- Creating multilingual applications (Chinese/English)
- Implementing function calling and tool use patterns
When NOT to Use This Skill
- For OpenAI API specifically (use openai skill)
- For Claude/Anthropic API (use anthropic skill)
- For image generation or multimodal tasks (Kimi is text-focused)
- For models requiring real-time voice interaction
Core Concepts
Architecture Overview
┌─────────────────────────────────────────────────────────────────┐
│ Moonshot AI Platform │
│ platform.moonshot.ai │
└─────────────────────────────────────────────────────────────────┘
│
┌─────────────────────┼─────────────────────┐
│ │ │
▼ ▼ ▼
┌───────────────┐ ┌───────────────┐ ┌───────────────┐
│ Kimi K2 │ │ moonshot-v1 │ │ Tool Use │
│ (Latest) │ │ (Legacy) │ │ │
├───────────────┤ ├───────────────┤ ├───────────────┤
│ • 1T params │ │ • v1-8k │ │ • Functions │
│ • 32B active │ │ • v1-32k │ │ • Web Search │
│ • 128K-256K │ │ • v1-128k │ │ • Code Exec │
│ • MoE arch │ │ │ │ • Custom │
└───────────────┘ └───────────────┘ └───────────────┘
│ │ │
└─────────────────────┼─────────────────────┘
│
▼
┌───────────────────┐
│ API Endpoints │
├───────────────────┤
│ • OpenAI compat │
│ • Anthropic compat│
│ • Streaming │
│ • Tool calling │
└───────────────────┘
Model Specifications
| Model | Parameters | Active | Context | Best For |
|---|---|---|---|---|
| kimi-k2-0905-preview | 1T | 32B | 256K | Latest, agentic tasks |
| kimi-k2-turbo-preview | 1T | 32B | 128K | Fast, general use |
| kimi-k2-thinking | 1T | 32B | 128K | Multi-step reasoning |
| moonshot-v1-8k | - | - | 8K | Short context |
| moonshot-v1-32k | - | - | 32K | Medium context |
| moonshot-v1-128k | - | - | 128K | Long documents |
| kimi-latest | - | - | Auto | Auto-selects tier |
Kimi K2 Technical Details
Architecture: Mixture-of-Experts (MoE)
Total Parameters: 1 Trillion
Activated Parameters: 32 Billion per token
Layers: 61 (including 1 dense layer)
Experts: 384 total, 8 selected per token
Attention: MLA (Multi-head Latent Attention)
Activation: SwiGLU
Vocabulary: 160K tokens
Context: 128K tokens (256K for 0905-preview)
Training Data: 15.5T tokens
Quick Start
Get API Key
- Visit platform.moonshot.ai
- Create an account
- Generate API key from dashboard
Environment Setup
export MOONSHOT_API_KEY="your-api-key-here"
# Optional: Use China endpoint
export MOONSHOT_API_BASE="https://api.moonshot.cn/v1"
Basic Chat Completion
from openai import OpenAI
client = OpenAI(
api_key="your-api-key",
base_url="https://api.moonshot.ai/v1"
)
response = client.chat.completions.create(
model="kimi-k2-0905-preview",
messages=[
{"role": "system", "content": "You are Kimi, an AI assistant created by Moonshot AI."},
{"role": "user", "content": "Explain quantum computing in simple terms."}
],
temperature=0.6, # Recommended
max_tokens=1024
)
print(response.choices[0].message.content)
API Reference
Base URLs
| Region | URL |
|---|---|
| Global | https://api.moonshot.ai/v1 |
| China | https://api.moonshot.cn/v1 |
Authentication
curl https://api.moonshot.ai/v1/chat/completions \
-H "Authorization: Bearer $MOONSHOT_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "kimi-k2-0905-preview",
"messages": [{"role": "user", "content": "Hello!"}]
}'
Chat Completions
Endpoint: POST /v1/chat/completions
Request Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
model |
string | Yes | Model identifier |
messages |
array | Yes | Conversation history |
temperature |
float | No | 0.0-1.0, recommended 0.6 |
max_tokens |
int | No | Maximum response length |
stream |
bool | No | Enable streaming |
top_p |
float | No | Nucleus sampling |
tools |
array | No | Function definitions |
tool_choice |
string | No | auto, none, or specific |
Message Format:
{
"messages": [
{"role": "system", "content": "System prompt"},
{"role": "user", "content": "User message"},
{"role": "assistant", "content": "Previous response"},
{"role": "user", "content": [
{"type": "text", "text": "Multimodal content"}
]}
]
}
Response:
{
"id": "chatcmpl-xxx",
"object": "chat.completion",
"created": 1234567890,
"model": "kimi-k2-0905-preview",
"choices": [{
"index": 0,
"message": {
"role": "assistant",
"content": "Response text"
},
"finish_reason": "stop"
}],
"usage": {
"prompt_tokens": 50,
"completion_tokens": 100,
"total_tokens": 150
}
}
Tool Calling / Function Calling
Kimi K2 has strong native support for tool calling, enabling agentic applications.
Define Tools
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get current weather for a city",
"parameters": {
"type": "object",
"required": ["city"],
"properties": {
"city": {
"type": "string",
"description": "City name"
},
"unit": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "Temperature unit"
}
}
}
}
},
{
"type": "function",
"function": {
"name": "search_web",
"description": "Search the web for information",
"parameters": {
"type": "object",
"required": ["query"],
"properties": {
"query": {
"type": "string",
"description": "Search query"
}
}
}
}
}
]
Make Tool Call Request
response = client.chat.completions.create(
model="kimi-k2-0905-preview",
messages=[
{"role": "user", "content": "What's the weather in Tokyo?"}
],
tools=tools,
tool_choice="auto",
temperature=0.6
)
# Check if model wants to call a tool
message = response.choices[0].message
if message.tool_calls:
for tool_call in message.tool_calls:
print(f"Function: {tool_call.function.name}")
print(f"Arguments: {tool_call.function.arguments}")
Complete Tool Call Loop
import json
def execute_tool(name: str, args: dict) -> str:
"""Execute tool and return result."""
if name == "get_weather":
return json.dumps({"temp": 22, "condition": "sunny"})
elif name == "search_web":
return json.dumps({"results": ["Result 1", "Result 2"]})
return json.dumps({"error": "Unknown tool"})
messages = [{"role": "user", "content": "What's the weather in Tokyo?"}]
while True:
response = client.chat.completions.create(
model="kimi-k2-0905-preview",
messages=messages,
tools=tools,
tool_choice="auto",
temperature=0.6
)
message = response.choices[0].message
messages.append(message)
if not message.tool_calls:
# No more tool calls, done
print(message.content)
break
# Execute each tool call
for tool_call in message.tool_calls:
result = execute_tool(
tool_call.function.name,
json.loads(tool_call.function.arguments)
)
messages.append({
"role": "tool",
"tool_call_id": tool_call.id,
"content": result
})
Streaming
Python Streaming
stream = client.chat.completions.create(
model="kimi-k2-0905-preview",
messages=[{"role": "user", "content": "Write a poem about AI"}],
stream=True,
temperature=0.6
)
for chunk in stream:
if chunk.choices[0].delta.content:
print(chunk.choices[0].delta.content, end="", flush=True)
JavaScript/Node.js
import OpenAI from 'openai';
const client = new OpenAI({
apiKey: process.env.MOONSHOT_API_KEY,
baseURL: 'https://api.moonshot.ai/v1'
});
async function chat() {
const stream = await client.chat.completions.create({
model: 'kimi-k2-0905-preview',
messages: [{ role: 'user', content: 'Hello!' }],
stream: true
});
for await (const chunk of stream) {
process.stdout.write(chunk.choices[0]?.delta?.content || '');
}
}
cURL Streaming
curl https://api.moonshot.ai/v1/chat/completions \
-H "Authorization: Bearer $MOONSHOT_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "kimi-k2-0905-preview",
"messages": [{"role": "user", "content": "Hello!"}],
"stream": true
}'
Pricing
Kimi K2 Models
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| kimi-k2-0905-preview | ~$0.15 | ~$2.50 |
| kimi-k2-turbo-preview | ~$0.15 | ~$2.50 |
moonshot-v1 Models (kimi-latest auto-selects)
| Context Tier | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| 8K | $0.20 | $2.00 |
| 32K | $1.00 | $3.00 |
| 128K | $2.00 | $5.00 |
Built-in Tools
| Tool | Cost per Call |
|---|---|
| $web_search | ~$0.005 |
LiteLLM Integration
Configuration
from litellm import completion
response = completion(
model="moonshot/kimi-k2-0905-preview",
messages=[{"role": "user", "content": "Hello"}]
)
Proxy Config (config.yaml)
model_list:
- model_name: kimi-k2
litellm_params:
model: moonshot/kimi-k2-0905-preview
api_key: os.environ/MOONSHOT_API_KEY
- model_name: kimi-128k
litellm_params:
model: moonshot/moonshot-v1-128k
api_key: os.environ/MOONSHOT_API_KEY
Handled Quirks
LiteLLM automatically handles:
- Temperature capping: Values > 1 are clamped
- Temperature constraint: Sets to 0.3 when temp < 0.3 and n > 1
- Tool choice: Converts "required" by adding context
Anthropic-Compatible API
Moonshot also offers an Anthropic-compatible API endpoint:
from anthropic import Anthropic
client = Anthropic(
api_key="your-moonshot-key",
base_url="https://api.moonshot.ai/v1"
)
# Note: Temperature mapping
# real_temperature = request_temperature * 0.6
response = client.messages.create(
model="kimi-k2-0905-preview",
messages=[{"role": "user", "content": "Hello!"}],
max_tokens=1024,
temperature=1.0 # Will become 0.6 internally
)
Best Practices
Temperature Settings
# Recommended default
temperature = 0.6
# For creative tasks
temperature = 0.8
# For factual/deterministic tasks
temperature = 0.3
System Prompts
# Default system prompt (good starting point)
system_prompt = "You are Kimi, an AI assistant created by Moonshot AI."
# Custom for specific tasks
system_prompt = """You are a coding assistant.
Provide clean, well-documented code with explanations.
Use Python unless otherwise specified."""
Long Context Usage
# For documents up to 256K tokens
response = client.chat.completions.create(
model="kimi-k2-0905-preview", # Supports 256K
messages=[
{"role": "system", "content": "Analyze the following document."},
{"role": "user", "content": very_long_document}
],
temperature=0.3 # Lower for analysis tasks
)
Performance Benchmarks
| Benchmark | Score | Notes |
|---|---|---|
| AIME 2024 | 69.6% | Math reasoning |
| MATH-500 | 97.4% | Mathematics |
| LiveCodeBench | 53.7% | Code generation |
| SWE-bench Verified | 71.6% | Agentic coding |
| MMLU | 89.5% | General knowledge |
| MMLU-Redux | 92.7% | Updated evaluation |
| Tau2 Retail | 70.6% | Tool use |
| AceBench | 76.5% | Agent evaluation |
Troubleshooting
Authentication Errors
Error: 401 Unauthorized
Solutions:
- Verify API key is correct
- Check environment variable is set
- Ensure key hasn't expired
Rate Limiting
Error: 429 Too Many Requests
Solutions:
- Implement exponential backoff
- Reduce request frequency
- Consider upgrading plan
Context Length Exceeded
Error: Context length exceeded
Solutions:
- Use longer context model (kimi-k2-0905-preview for 256K)
- Truncate input text
- Summarize previous messages
Tool Call Issues
Error: Invalid tool definition
Solutions:
- Verify JSON schema is valid
- Check required fields are present
- Ensure parameter types are correct
Resources
Official Documentation
Open Source
Integration Guides
Support
- Email: support@moonshot.cn
Version History
- 1.0.0 (2026-01-12): Initial skill release
- Complete Kimi K2 model documentation
- API reference with all parameters
- Tool calling / function calling guide
- Streaming examples (Python, Node.js, cURL)
- Pricing information
- LiteLLM and Anthropic-compatible API integration
- Performance benchmarks
- Troubleshooting guide