---
name: moonshot-ai
version: "1.0.0"
description: Moonshot AI Kimi API - Trillion-parameter MoE model with 256K context, tool calling, and agentic capabilities for chat, coding, and autonomous task execution
---

# Moonshot AI Skill

Moonshot AI provides the Kimi family of large language models, led by Kimi K2, a state-of-the-art mixture-of-experts (MoE) model with 1 trillion total parameters (32 billion activated per token). The API offers OpenAI-compatible endpoints, context windows of up to 256K tokens (model-dependent), strong tool calling, and competitive pricing.

Key Value Proposition: Access a trillion-parameter model optimized for agentic tasks, tool use, and coding at per-token prices well below those of many frontier models (compare current list prices for your workload), with strong multilingual support for Chinese and English.

## When to Use This Skill

- Integrating Moonshot AI/Kimi models into applications
- Building agentic AI systems with autonomous tool calling
- Processing long documents with 128K-256K context windows
- Developing cost-effective LLM solutions
- Creating multilingual applications (Chinese/English)
- Implementing function calling and tool use patterns

## When NOT to Use This Skill

- For the OpenAI API specifically (use the openai skill)
- For the Claude/Anthropic API (use the anthropic skill)
- For image generation or multimodal tasks (Kimi K2 is text-only)
- For models requiring real-time voice interaction

## Core Concepts

### Architecture Overview

```text
┌─────────────────────────────────────────────────────────────────┐
│                    Moonshot AI Platform                          │
│                  platform.moonshot.ai                            │
└─────────────────────────────────────────────────────────────────┘
                              │
        ┌─────────────────────┼─────────────────────┐
        │                     │                     │
        ▼                     ▼                     ▼
┌───────────────┐    ┌───────────────┐    ┌───────────────┐
│  Kimi K2      │    │ moonshot-v1   │    │   Tool Use    │
│  (Latest)     │    │  (Legacy)     │    │               │
├───────────────┤    ├───────────────┤    ├───────────────┤
│ • 1T params   │    │ • v1-8k       │    │ • Functions   │
│ • 32B active  │    │ • v1-32k      │    │ • Web Search  │
│ • 128K-256K   │    │ • v1-128k     │    │ • Code Exec   │
│ • MoE arch    │    │               │    │ • Custom      │
└───────────────┘    └───────────────┘    └───────────────┘
        │                     │                     │
        └─────────────────────┼─────────────────────┘
                              │
                              ▼
                    ┌───────────────────┐
                    │   API Endpoints   │
                    ├───────────────────┤
                    │ • OpenAI compat   │
                    │ • Anthropic compat│
                    │ • Streaming       │
                    │ • Tool calling    │
                    └───────────────────┘
```

### Model Specifications

| Model | Parameters | Active | Context | Best For |
|-------|------------|--------|---------|----------|
| `kimi-k2-0905-preview` | 1T | 32B | 256K | Latest, agentic tasks |
| `kimi-k2-turbo-preview` | 1T | 32B | 128K | Fast, general use |
| `kimi-k2-thinking` | 1T | 32B | 128K | Multi-step reasoning |
| `moonshot-v1-8k` | - | - | 8K | Short context |
| `moonshot-v1-32k` | - | - | 32K | Medium context |
| `moonshot-v1-128k` | - | - | 128K | Long documents |
| `kimi-latest` | - | - | Auto | Auto-selects tier |
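When context size is the main constraint, the table can be turned into a simple lookup. This is an illustrative sketch only: `pick_model` and `CONTEXT_LIMITS` are hypothetical names, it assumes K = 1,024 tokens and that the prompt plus the requested output must fit in the window, and it ranks models purely by window size rather than by quality or price.

```python
# Hypothetical helper: picks the smallest listed model whose context window
# fits the request. Limits mirror the table above, assuming K = 1,024 tokens.
CONTEXT_LIMITS = [
    ("moonshot-v1-8k", 8 * 1024),
    ("moonshot-v1-32k", 32 * 1024),
    ("moonshot-v1-128k", 128 * 1024),
    ("kimi-k2-0905-preview", 256 * 1024),
]


def pick_model(prompt_tokens: int, max_output_tokens: int) -> str:
    """Return the first model whose window holds prompt + output tokens."""
    needed = prompt_tokens + max_output_tokens
    for model, limit in CONTEXT_LIMITS:
        if needed <= limit:
            return model
    raise ValueError(f"{needed} tokens exceeds the largest listed context window")


print(pick_model(5_000, 1_024))    # fits in the 8K tier
print(pick_model(100_000, 4_096))  # needs the 128K tier
```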

### Kimi K2 Technical Details

- Architecture: Mixture-of-Experts (MoE)
- Total Parameters: 1 trillion
- Activated Parameters: 32 billion per token
- Layers: 61 (including 1 dense layer)
- Experts: 384 routed experts, 8 selected per token, plus 1 shared expert
- Attention: MLA (Multi-head Latent Attention)
- Activation: SwiGLU
- Vocabulary: 160K tokens
- Context: 128K tokens (256K for `kimi-k2-0905-preview`)
- Training Data: 15.5T tokens
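The headline figures imply that only a small slice of the network runs for each token. The quick arithmetic below uses the rounded numbers listed above (1T, 32B, 384 and 8 routed experts).

```python
# Back-of-the-envelope sparsity figures for Kimi K2, using the numbers above.
total_params = 1_000_000_000_000   # 1T total parameters (rounded)
active_params = 32_000_000_000     # 32B activated per token (rounded)
experts_total = 384                # routed experts
experts_per_token = 8              # routed experts selected per token

active_fraction = active_params / total_params
expert_fraction = experts_per_token / experts_total

print(f"Parameters active per token: {active_fraction:.1%}")     # 3.2%
print(f"Routed experts used per token: {expert_fraction:.2%}")   # 2.08%
```

The active-parameter fraction is higher than the routed-expert fraction largely because components such as attention, embeddings, and the dense layer run for every token.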

## Quick Start

### Get API Key

1. Visit platform.moonshot.ai
2. Create an account
3. Generate an API key from the dashboard

### Environment Setup

```bash
export MOONSHOT_API_KEY="your-api-key-here"

# Optional: use the China endpoint
export MOONSHOT_API_BASE="https://api.moonshot.cn/v1"
```

### Basic Chat Completion

```python
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["MOONSHOT_API_KEY"],
    # Falls back to the global endpoint when MOONSHOT_API_BASE is unset
    base_url=os.environ.get("MOONSHOT_API_BASE", "https://api.moonshot.ai/v1"),
)

response = client.chat.completions.create(
    model="kimi-k2-0905-preview",
    messages=[
        {"role": "system", "content": "You are Kimi, an AI assistant created by Moonshot AI."},
        {"role": "user", "content": "Explain quantum computing in simple terms."},
    ],
    temperature=0.6,  # Recommended default for Kimi K2
    max_tokens=1024,
)

print(response.choices[0].message.content)
```

## API Reference

### Base URLs

| Region | URL |
|--------|-----|
| Global | `https://api.moonshot.ai/v1` |
| China | `https://api.moonshot.cn/v1` |

### Authentication

Requests authenticate with the API key as a bearer token in the `Authorization` header:

```bash
curl https://api.moonshot.ai/v1/chat/completions \
  -H "Authorization: Bearer $MOONSHOT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kimi-k2-0905-preview",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
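Where the OpenAI SDK is unavailable, the same request can be assembled with Python's standard library. This sketch mirrors the curl call above; `build_chat_request` is a hypothetical helper that only constructs the request, and sending it is left to `urllib.request.urlopen` (shown commented out).

```python
import json
import os
import urllib.request

BASE_URL = "https://api.moonshot.ai/v1"


def build_chat_request(messages, model="kimi-k2-0905-preview", api_key=None):
    """Build (but do not send) a POST /chat/completions request."""
    key = api_key or os.environ.get("MOONSHOT_API_KEY", "")
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {key}",  # bearer-token auth, as in the curl call
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request([{"role": "user", "content": "Hello!"}], api_key="sk-test")
# To actually send it:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```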

### Chat Completions

Endpoint: `POST /v1/chat/completions`

Request parameters:

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `model` | string | Yes | Model identifier |
| `messages` | array | Yes | Conversation history |
| `temperature` | float | No | 0.0-1.0; 0.6 recommended for Kimi K2 |
| `max_tokens` | int | No | Maximum number of tokens to generate |
| `stream` | bool | No | Stream the response as server-sent events |
| `top_p` | float | No | Nucleus sampling threshold |
| `tools` | array | No | Function definitions |
| `tool_choice` | string | No | `auto`, `none`, or a specific function |
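A small client-side builder can catch out-of-range values before a request goes out. This is a sketch under stated assumptions: `build_payload` is hypothetical, it enforces only the 0.0-1.0 temperature range from the table, and defaulting `tool_choice` to `auto` when tools are present is this sketch's own choice, not documented API behavior.

```python
# Hypothetical helper: assembles a request body from the parameters in the table,
# checking the documented temperature range (0.0-1.0) before anything is sent.
def build_payload(model, messages, temperature=0.6, max_tokens=None,
                  stream=False, tools=None, tool_choice=None):
    if not model:
        raise ValueError("model is required")
    if not messages:
        raise ValueError("messages must be a non-empty list")
    if not 0.0 <= temperature <= 1.0:
        raise ValueError(f"temperature {temperature} is outside 0.0-1.0")

    payload = {"model": model, "messages": messages,
               "temperature": temperature, "stream": stream}
    # Optional fields are only included when set
    if max_tokens is not None:
        payload["max_tokens"] = max_tokens
    if tools:
        payload["tools"] = tools
        payload["tool_choice"] = tool_choice or "auto"  # sketch's default
    return payload


body = build_payload("kimi-k2-0905-preview",
                     [{"role": "user", "content": "Hi"}], max_tokens=256)
```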

Message format:

```json
{
  "messages": [
    {"role": "system", "content": "System prompt"},
    {"role": "user", "content": "User message"},
    {"role": "assistant", "content": "Previous response"},
    {"role": "user", "content": [
      {"type": "text", "text": "Multimodal content"}
    ]}
  ]
}
```

Response:

```json
{
  "id": "chatcmpl-xxx",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "kimi-k2-0905-preview",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Response text"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 50,
    "completion_tokens": 100,
    "total_tokens": 150
  }
}
```
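Reading the response is plain JSON access. The sketch below uses a hypothetical `summarize_response` helper and assumes OpenAI-style `finish_reason` values, where `length` signals that `max_tokens` cut the reply short.

```python
import json

# Example response body (same shape as the JSON above)
raw = """{
  "id": "chatcmpl-xxx",
  "object": "chat.completion",
  "model": "kimi-k2-0905-preview",
  "choices": [{"index": 0,
               "message": {"role": "assistant", "content": "Response text"},
               "finish_reason": "stop"}],
  "usage": {"prompt_tokens": 50, "completion_tokens": 100, "total_tokens": 150}
}"""


def summarize_response(body: dict) -> dict:
    """Pull out the fields most callers need from a chat.completion body."""
    choice = body["choices"][0]
    usage = body.get("usage", {})
    return {
        "text": choice["message"].get("content") or "",
        "finish_reason": choice["finish_reason"],
        # Assumed OpenAI-style semantics: "length" means max_tokens was hit
        "truncated": choice["finish_reason"] == "length",
        "total_tokens": usage.get("total_tokens", 0),
    }


summary = summarize_response(json.loads(raw))
print(summary)
```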

## Tool Calling / Function Calling

Kimi K2 has strong native support for tool calling, enabling agentic applications.

### Define Tools

```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "required": ["city"],
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit"
                    }
                }
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "search_web",
            "description": "Search the web for information",
            "parameters": {
                "type": "object",
                "required": ["query"],
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "Search query"
                    }
                }
            }
        }
    }
]
```
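Each entry above repeats the same envelope, so a small builder keeps new tool definitions consistent. `function_tool` is a hypothetical convenience function, not part of any SDK; it simply produces the same dictionary shape as the entries in `tools`.

```python
# Hypothetical helper that produces the same structure as the entries in `tools`,
# so a new tool needs only a name, a description, and its parameter schema.
def function_tool(name, description, properties, required=()):
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "required": list(required),
                "properties": properties,
            },
        },
    }


search_tool = function_tool(
    "search_web",
    "Search the web for information",
    {"query": {"type": "string", "description": "Search query"}},
    required=["query"],
)
```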

### Make Tool Call Request

```python
response = client.chat.completions.create(
    model="kimi-k2-0905-preview",
    messages=[
        {"role": "user", "content": "What's the weather in Tokyo?"}
    ],
    tools=tools,
    tool_choice="auto",
    temperature=0.6
)

# Check if the model wants to call a tool
message = response.choices[0].message
if message.tool_calls:
    for tool_call in message.tool_calls:
        print(f"Function: {tool_call.function.name}")
        print(f"Arguments: {tool_call.function.arguments}")
```

### Complete Tool Call Loop

```python
import json

def execute_tool(name: str, args: dict) -> str:
    """Execute a tool and return its result as a JSON string (stubbed here)."""
    if name == "get_weather":
        return json.dumps({"temp": 22, "condition": "sunny"})
    elif name == "search_web":
        return json.dumps({"results": ["Result 1", "Result 2"]})
    return json.dumps({"error": f"Unknown tool: {name}"})

messages = [{"role": "user", "content": "What's the weather in Tokyo?"}]
MAX_TURNS = 10  # Guard against the model requesting tools indefinitely

for _ in range(MAX_TURNS):
    response = client.chat.completions.create(
        model="kimi-k2-0905-preview",
        messages=messages,
        tools=tools,
        tool_choice="auto",
        temperature=0.6,
    )

    message = response.choices[0].message
    # Keep the assistant turn (including its tool_calls) in the history
    messages.append(message)

    if not message.tool_calls:
        # No more tool calls: this is the final answer
        print(message.content)
        break

    # Execute each requested tool and reply with the matching tool_call_id
    for tool_call in message.tool_calls:
        result = execute_tool(
            tool_call.function.name,
            json.loads(tool_call.function.arguments),
        )
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": result,
        })
else:
    print("Stopped after MAX_TURNS without a final answer.")
```
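The loop above assumes every call names a known tool and carries valid JSON arguments. A more defensive dispatcher, sketched here with a hypothetical `TOOL_REGISTRY` dict and a stubbed `get_weather`, returns problems to the model as tool output instead of raising, which gives the model a chance to correct its next call.

```python
import json


def get_weather(city: str, unit: str = "celsius") -> dict:
    # Stub standing in for a real weather lookup
    return {"city": city, "temp": 22, "unit": unit, "condition": "sunny"}


TOOL_REGISTRY = {"get_weather": get_weather}


def dispatch(name: str, raw_arguments: str) -> str:
    """Run a tool call and always return a JSON string for the 'tool' message."""
    func = TOOL_REGISTRY.get(name)
    if func is None:
        return json.dumps({"error": f"Unknown tool: {name}"})
    try:
        args = json.loads(raw_arguments or "{}")
    except json.JSONDecodeError as exc:
        return json.dumps({"error": f"Invalid JSON arguments: {exc.msg}"})
    try:
        return json.dumps(func(**args))
    except TypeError as exc:  # missing, unexpected, or non-mapping arguments
        return json.dumps({"error": str(exc)})


print(dispatch("get_weather", '{"city": "Tokyo"}'))
print(dispatch("get_weather", "{not json"))
```

Inside the loop, `dispatch(tool_call.function.name, tool_call.function.arguments)` would replace the `execute_tool(...)` call.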

## Streaming

### Python Streaming

```python
stream = client.chat.completions.create(
    model="kimi-k2-0905-preview",
    messages=[{"role": "user", "content": "Write a poem about AI"}],
    stream=True,
    temperature=0.6,
)

for chunk in stream:
    # Some chunks (e.g. a final usage-only chunk) may carry no choices or no content
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
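To keep the full reply as well as display it, accumulate the deltas. The sketch below uses a hypothetical `collect_stream` helper and, for offline testing only, `SimpleNamespace` objects that mimic the attribute shape of SDK chunks; with a real client you would pass `stream` directly.

```python
from types import SimpleNamespace


def collect_stream(chunks) -> str:
    """Concatenate delta.content from streamed chunks into the full reply."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # e.g. a usage-only chunk
        content = chunk.choices[0].delta.content
        if content:
            parts.append(content)
    return "".join(parts)


def fake_chunk(text):
    # Mimics the attribute shape of an SDK streaming chunk for offline testing
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


chunks = [fake_chunk("Hello"), fake_chunk(None), fake_chunk(", world"),
          SimpleNamespace(choices=[])]
print(collect_stream(chunks))  # Hello, world
```

A stream can be iterated only once, so print and collect in the same loop if you need both.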

### JavaScript/Node.js

```javascript
// ES module syntax: save as .mjs or set "type": "module" in package.json
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.MOONSHOT_API_KEY,
  baseURL: 'https://api.moonshot.ai/v1'
});

async function chat() {
  const stream = await client.chat.completions.create({
    model: 'kimi-k2-0905-preview',
    messages: [{ role: 'user', content: 'Hello!' }],
    stream: true
  });

  // Each chunk carries an incremental delta; empty deltas print nothing
  for await (const chunk of stream) {
    process.stdout.write(chunk.choices[0]?.delta?.content || '');
  }
}

chat().catch(console.error);
```

### cURL Streaming

```bash
# -N disables curl's output buffering so chunks appear as they arrive
curl -N https://api.moonshot.ai/v1/chat/completions \
  -H "Authorization: Bearer $MOONSHOT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kimi-k2-0905-preview",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'
```
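Over raw HTTP, as in the curl call, the streamed response arrives as server-sent events. Assuming OpenAI-style framing (`data: {json}` lines ending with `data: [DONE]`), a minimal parser looks like the sketch below. It is not a full SSE implementation: multi-line `data` fields and `event:`/`id:` lines are ignored.

```python
import json


def parse_sse_lines(lines):
    """Yield content deltas from OpenAI-style 'data:' lines until [DONE]."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        event = json.loads(data)
        for choice in event.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                yield content


sample = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":""}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"lo!"}}]}',
    "data: [DONE]",
]
print("".join(parse_sse_lines(sample)))  # Hello!
```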

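## Pricing

Prices below are approximate and change over time; confirm current rates on platform.moonshot.ai before budgeting.

### Kimi K2 Models

| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|-------|-----------------------|------------------------|
| `kimi-k2-0905-preview` | ~$0.15 | ~$2.50 |
| `kimi-k2-turbo-preview` | ~$0.15 | ~$2.50 |

The ~$0.15 input figure reflects cache-hit pricing; input tokens that miss the cache are billed at a higher rate.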

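### moonshot-v1 Models (`kimi-latest` auto-selects the tier)

| Context Tier | Input (per 1M tokens) | Output (per 1M tokens) |
|--------------|-----------------------|------------------------|
| 8K | $0.20 | $2.00 |
| 32K | $1.00 | $3.00 |
| 128K | $2.00 | $5.00 |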
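Because moonshot-v1 pricing is tiered by context size, a request's cost depends on which tier it lands in. The sketch below uses the table's prices; the tier-selection rule (smallest tier whose window holds prompt plus output, with K = 1,024) is an assumption, as is ignoring any cache discounts. `estimate_cost` is a hypothetical name.

```python
# Prices in USD per 1M tokens, copied from the moonshot-v1 table above.
TIERS = [
    # (context window in tokens, input price, output price)
    (8 * 1024, 0.20, 2.00),
    (32 * 1024, 1.00, 3.00),
    (128 * 1024, 2.00, 5.00),
]


def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the USD cost of one request on the smallest tier that fits."""
    total = prompt_tokens + completion_tokens
    for window, price_in, price_out in TIERS:
        if total <= window:
            return (prompt_tokens * price_in + completion_tokens * price_out) / 1_000_000
    raise ValueError("request exceeds the 128K tier")


print(f"${estimate_cost(2_000, 500):.6f}")     # 8K tier
print(f"${estimate_cost(50_000, 2_000):.6f}")  # 128K tier
```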

### Built-in Tools

| Tool | Cost per Call |
|------|---------------|
| `$web_search` | ~$0.005 |

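## LiteLLM Integration

### Configuration

```python
from litellm import completion

# LiteLLM reads the key from the MOONSHOT_API_KEY environment variable
response = completion(
    model="moonshot/kimi-k2-0905-preview",
    messages=[{"role": "user", "content": "Hello"}],
)

print(response.choices[0].message.content)
```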

### Proxy Config (config.yaml)

```yaml
model_list:
  - model_name: kimi-k2
    litellm_params:
      model: moonshot/kimi-k2-0905-preview
      api_key: os.environ/MOONSHOT_API_KEY

  - model_name: kimi-128k
    litellm_params:
      model: moonshot/moonshot-v1-128k
      api_key: os.environ/MOONSHOT_API_KEY
```

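### Handled Quirks

LiteLLM automatically handles these Moonshot-specific constraints:

- Temperature capping: values above 1 are clamped to 1.
- Temperature floor for multiple completions: when `temperature < 0.3` and `n > 1`, temperature is set to 0.3.
- Tool choice: `tool_choice="required"`, which Moonshot does not support natively, is emulated by adding an instruction to the conversation.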
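The two temperature rules can be reproduced as a small pre-processing step if you call the API directly rather than through LiteLLM. This sketch implements the behavior as described above; it is not LiteLLM's source, and `normalize_temperature` is a hypothetical name.

```python
def normalize_temperature(temperature: float, n: int = 1) -> float:
    """Apply the two temperature rules described above before calling Moonshot."""
    # Rule 1: values above 1.0 are clamped to 1.0
    temperature = min(temperature, 1.0)
    # Rule 2: with several completions (n > 1), low temperatures are raised to 0.3
    if n > 1 and temperature < 0.3:
        temperature = 0.3
    return temperature


print(normalize_temperature(1.5))       # 1.0
print(normalize_temperature(0.1, n=3))  # 0.3
print(normalize_temperature(0.1))       # 0.1
```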

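## Anthropic-Compatible API

Moonshot also offers an Anthropic-compatible endpoint at `https://api.moonshot.ai/anthropic`, usable with the Anthropic SDK:

```python
import os

from anthropic import Anthropic

client = Anthropic(
    api_key=os.environ["MOONSHOT_API_KEY"],
    # The Anthropic-compatible endpoint lives under /anthropic, not /v1;
    # the SDK appends /v1/messages itself.
    base_url="https://api.moonshot.ai/anthropic",
)

# Note: temperature mapping applied by this endpoint
# real_temperature = request_temperature * 0.6
response = client.messages.create(
    model="kimi-k2-0905-preview",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=1024,
    temperature=1.0,  # Becomes 0.6 internally
)

print(response.content[0].text)
```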
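To get a specific effective temperature through the Anthropic-compatible endpoint, invert the mapping. A sketch, assuming the linear `real = request * 0.6` rule stated above and the Anthropic Messages API's 0.0-1.0 request range; `request_temperature_for` is a hypothetical helper. One consequence is that effective temperatures above 0.6 cannot be reached through this endpoint.

```python
MAPPING_FACTOR = 0.6  # real_temperature = request_temperature * 0.6


def request_temperature_for(target: float) -> float:
    """Return the Anthropic-side temperature that yields `target` after mapping."""
    requested = target / MAPPING_FACTOR
    # The Anthropic Messages API accepts temperatures from 0.0 to 1.0
    if not 0.0 <= requested <= 1.0:
        raise ValueError(f"target {target} is unreachable (needs {requested:.2f})")
    return requested


print(request_temperature_for(0.3))
```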

## Best Practices

### Temperature Settings

```python
# Suggested temperatures for Kimi K2 (API range is 0.0-1.0)
TEMPERATURE = {
    "default": 0.6,        # Recommended default
    "creative": 0.8,       # Creative tasks
    "deterministic": 0.3,  # Factual/deterministic tasks
}
```

### System Prompts

```python
# Default system prompt (good starting point)
DEFAULT_SYSTEM_PROMPT = "You are Kimi, an AI assistant created by Moonshot AI."

# Custom prompt for a specific task, e.g. a coding assistant
CODING_SYSTEM_PROMPT = """You are a coding assistant.
Provide clean, well-documented code with explanations.
Use Python unless otherwise specified."""
```

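### Long Context Usage

```python
from pathlib import Path

# Hypothetical input file; the prompt plus the response must fit in 256K tokens
very_long_document = Path("report.txt").read_text(encoding="utf-8")

response = client.chat.completions.create(
    model="kimi-k2-0905-preview",  # Supports up to 256K tokens of context
    messages=[
        {"role": "system", "content": "Analyze the following document."},
        {"role": "user", "content": very_long_document},
    ],
    temperature=0.3,  # Lower temperature for analysis tasks
)
```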

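## Performance Benchmarks

Scores as reported by Moonshot AI for Kimi-K2-Instruct (the original July 2025 release); newer checkpoints may score differently.

| Benchmark | Score | Notes |
|-----------|-------|-------|
| AIME 2024 | 69.6% | Math reasoning |
| MATH-500 | 97.4% | Mathematics |
| LiveCodeBench (v6) | 53.7% | Code generation |
| SWE-bench Verified | 71.6% | Agentic coding, multiple attempts (65.8% single attempt) |
| MMLU | 89.5% | General knowledge |
| MMLU-Redux | 92.7% | Updated evaluation |
| Tau2 Retail | 70.6% | Tool use |
| AceBench | 76.5% | Agent evaluation |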

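## Troubleshooting

### Authentication Errors

Error: `401 Unauthorized`

Solutions:

- Verify the API key is correct
- Check that `MOONSHOT_API_KEY` is set in the environment
- Make sure the key has not expired or been revoked
- Confirm the base URL matches the platform that issued the key (`api.moonshot.ai` for platform.moonshot.ai, `api.moonshot.cn` for platform.moonshot.cn)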

### Rate Limiting

Error: `429 Too Many Requests`

Solutions:

- Implement exponential backoff
- Reduce request frequency
- Consider upgrading your plan
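Exponential backoff can be written as a generic retry wrapper. This sketch uses "full jitter" (a random delay up to an exponentially growing cap); `RateLimitError` here is a hypothetical stand-in for whatever 429 exception your client raises, and the injectable `sleep` parameter exists only so the demo runs instantly.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the 429 error your HTTP client raises."""


def with_backoff(call, max_retries=5, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Retry `call` on RateLimitError with exponential backoff and full jitter."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries:
                raise
            # Delay cap grows 1s, 2s, 4s, ... up to max_delay; jitter spreads out retries
            delay = random.uniform(0, min(max_delay, base_delay * 2 ** attempt))
            sleep(delay)


# Usage with a fake call that is rate-limited twice, then succeeds
attempts = {"n": 0}


def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimitError("429 Too Many Requests")
    return "ok"


result = with_backoff(flaky, sleep=lambda _: None)  # skip real sleeping in the demo
print(result, attempts["n"])
```

With the OpenAI SDK, catch `openai.RateLimitError` instead; the SDK client also retries some failures itself via its `max_retries` option.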

### Context Length Exceeded

Error: Context length exceeded

Solutions:

- Switch to a longer-context model (`kimi-k2-0905-preview` supports 256K)
- Truncate the input text
- Summarize or drop earlier messages
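Truncation can also be applied to the conversation: keep the system prompt and drop the oldest turns until the history fits a budget. A sketch with hypothetical `estimate_tokens` and `trim_history` helpers; the 4-characters-per-token estimate is a rough heuristic, not Kimi's tokenizer, so leave headroom. Production code should also drop whole turns so that a `tool` message is never left without the assistant message holding its `tool_calls`.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic (~4 characters per token); real counts vary by language
    return max(1, len(text) // 4)


def trim_history(messages, budget_tokens):
    """Drop the oldest non-system messages until the estimate fits the budget."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    def total(msgs):
        # Assistant tool-call messages may have content=None
        return sum(estimate_tokens(m.get("content") or "") for m in msgs)

    while rest and total(system + rest) > budget_tokens:
        rest.pop(0)  # oldest turn goes first
    return system + rest


history = [
    {"role": "system", "content": "You are Kimi."},
    {"role": "user", "content": "a" * 400},
    {"role": "assistant", "content": "b" * 400},
    {"role": "user", "content": "Latest question?"},
]
trimmed = trim_history(history, budget_tokens=100)
print([m["role"] for m in trimmed])
```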

### Tool Call Issues

Error: Invalid tool definition

Solutions:

- Verify the JSON schema is valid
- Check that required fields are present
- Ensure parameter types are correct
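The checklist above can be partly automated before a request is sent. `check_tool` is a hypothetical helper that performs basic structural checks on one OpenAI-style tool definition; it is not full JSON Schema validation, which would need a dedicated library.

```python
def check_tool(tool: dict) -> list:
    """Return a list of problems found in one OpenAI-style tool definition."""
    problems = []
    if tool.get("type") != "function":
        problems.append('type must be "function"')
    fn = tool.get("function", {})
    if not fn.get("name"):
        problems.append("function.name is missing")
    params = fn.get("parameters", {})
    if params.get("type") != "object":
        problems.append('parameters.type should be "object"')
    props = params.get("properties", {})
    # Every required field must be declared under properties
    for name in params.get("required", []):
        if name not in props:
            problems.append(f"required field '{name}' is not in properties")
    # Each property needs a type (or at least an enum of allowed values)
    for name, schema in props.items():
        if "type" not in schema and "enum" not in schema:
            problems.append(f"property '{name}' has no type")
    return problems


bad_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "parameters": {"type": "object", "required": ["city"], "properties": {}},
    },
}
print(check_tool(bad_tool))
```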

## Resources

- Official Documentation
- Open Source
- Integration Guides
- LiteLLM Provider

## Support

Email: support@moonshot.cn

## Version History

1.0.0 (2026-01-12): Initial skill release
- Complete Kimi K2 model documentation
- API reference with all parameters
- Tool calling / function calling guide
- Streaming examples (Python, Node.js, cURL)
- Pricing information
- LiteLLM and Anthropic-compatible API integration
- Performance benchmarks
- Troubleshooting guide