mimoskill

name: mimoskill description: Use Xiaomi MiMo V2.5 (the LLM behind mimo2codex) for chat, vision, web search, TTS and ASR — and route around capabilities MiMo doesn't natively support, especially OCR / image recognition / 识图 / 提取图片文字 / extract text from image when the current model can't see images, and image generation / 图像生成 / 生成图片 / draw a picture / 画一张 including Codex Pets `/hatch`. Trigger when the user mentions MiMo, calls into mimo2codex, asks to read text from an image, asks to describe or 识别 an image while using a non-vision model (mimo-v2.5-pro, mimo-v2-flash, …), asks to generate / hatch a Codex pet, asks for image generation while using MiMo as the chat backend, or hits a "no image generation available" / "image_gen tool unavailable" / "this model does not support image input" message inside Codex.

mimoskill — Xiaomi MiMo V2.5 + gap fillers

This skill bundles two things:

Direct MiMo V2.5 access — recipes for hitting https://api.xiaomimimo.com/v1 for chat, vision, web search, TTS, and ASR (works whether or not the mimo2codex proxy is running).
Workarounds for MiMo's gaps — concrete scripts for the few things MiMo doesn't do, particularly image generation (which is what Codex's /hatch pet creation needs).

Hard rules (for Codex agents)

These are non-negotiable when operating inside Codex pointed at this proxy:

Never pip install openai and never import openai. All scripts use only the Python standard library (urllib.request, json, etc.). The OpenAI SDK would fail auth or hit non-existent endpoints.
Never assume image generation is available natively. MiMo has no image-gen endpoint. Use scripts/generate_image.py or scripts/generate_pet.py.
Don't fight the sandbox. If you need a Python dependency, check mimoskill/scripts/ first — most things are already there in stdlib.
Non-vision model + image input → OCR it. When the chat model can't see images, run scripts/ocr.py — never ask the user to switch models.

When to use

Trigger this skill when:

User asks to hit MiMo's API directly (chat / vision / web search / TTS / ASR)
User asks "how do I generate a Codex pet" / "/hatch isn't working" / "image_gen tool not available"
User wants image generation as part of a MiMo-backed workflow
User pastes the Codex error: the image generation tool (image_gen) is not available in this environment or the CLI fallback requires the openai Python package
User wants to OCR / read text from / describe / 识别 / 提取文字 from an image while the active chat model is non-vision (e.g. mimo-v2.5-pro, mimo-v2-flash, deepseek-*, or any third-party text-only model) — use scripts/ocr.py. Works with or without a MiMo key (free pollinations fallback when MIMO_API_KEY is unset).
User sees the proxy's [N image attachment(s) omitted: this model does not support image input …] placeholder in their transcript
Anything in the mimo2codex repo that touches a feature MiMo doesn't support

What MiMo V2.5 does and doesn't do

Quick answer:

Capability	MiMo native	Best model	Notes
Text chat	✅	`mimo-v2.5-pro`	reasoning + tools
Tool / function calling	✅	any	parallel calls supported
Vision (image input)	✅	`mimo-v2.5` or `mimo-v2-omni`	NOT mimo-v2.5-pro
Web search	✅	any	requires Web Search Plugin activated in MiMo console
TTS (speech synth)	✅	`mimo-v2.5-tts`	separate endpoint
ASR (speech recog)	✅	`mimo-v2.5-asr`	separate endpoint
Audio chat	✅	`mimo-v2-omni`	input only
Video understanding	✅	`mimo-v2-omni`	input only
Image generation	❌	—	`scripts/generate_image.py` (general) or `scripts/generate_pet.py` (Codex pets) — see below
OCR / 识图 (when chat model is non-vision)	⚠️ via `mimo-v2.5` or free pollinations	`scripts/ocr.py`	`--engine auto`: mimo if `MIMO_API_KEY` set, else pollinations (no key)
Code interpreter / sandbox	❌	—	not provided

For the full capability matrix and examples, read references/models.md.

Decision tree: what does the user actually want?

Is it OCR / read text from image / describe / 识别 an image
when the active chat model is non-vision?
├── Yes → use scripts/ocr.py (mimo-v2.5 if MIMO_API_KEY set, else free pollinations)
└── No
    │
    Is it chat / vision / search / TTS / ASR with a vision-capable model?
    ├── Yes → use MiMo directly (see "Calling MiMo directly" below) or via mimo2codex if Codex is the client
    └── No, they want image generation
        │
        Is it for a Codex pet (`/hatch`)?
        ├── Yes → see "Generating a Codex pet" below (scripts/generate_pet.py + install_pet.sh)
        └── No  → see "General (non-pet) image generation" below (scripts/generate_image.py)

Calling chat directly (works without any key)

Use scripts/mimo_chat.py for one-shot or streaming chat. Two engines, --engine auto (default) picks mimo if MIMO_API_KEY is set, else pollinations (free, no key) — so the script works without any key for text and vision.

# Zero-setup — uses pollinations fallback when MIMO_API_KEY is unset
python3 mimoskill/scripts/mimo_chat.py "your prompt here"
python3 mimoskill/scripts/mimo_chat.py --image https://example.com/x.png "describe this"

# Best quality + MiMo-specific features (web search, TTS, ASR)
export MIMO_API_KEY=sk-xxxxxxxxxxxxxxxx
python3 mimoskill/scripts/mimo_chat.py "your prompt here"
python3 mimoskill/scripts/mimo_chat.py "今天上海天气?"   # web search auto-enabled on sk-* keys
python3 mimoskill/scripts/mimo_chat.py --stream "tell me a story"

When the mimo engine is active the script handles all MiMo-specific quirks — max_completion_tokens instead of max_tokens, the required text part next to image_url, reasoning_content round-tripping, etc. Web search is auto-enabled on pay-as-you-go (sk-*) keys — the web_search builtin is always included in the tools array and the model decides when to invoke it (tool_choice: "auto"). Token-plan (tp-*) keys skip web search (the endpoint doesn't support it). The pollinations engine doesn't support web search, TTS, or ASR (those are MiMo native features); it auto-switches to OpenAI-compat field names (max_tokens).

For non-trivial integrations, references/models.md and the official MiMo OpenAI-compat doc are the authoritative references.

OCR / image recognition (when the chat model can't see images)

If the user wants to read text from an image or describe / 识别 an image but the current chat model is non-vision (mimo-v2.5-pro, mimo-v2-flash, deepseek-*, or any third-party text-only model), invoke scripts/ocr.py. Three engines, --engine auto (default) picks in this order — mimo if MIMO_API_KEY set, else tesseract if installed and mode=text, else pollinations:

mimo — needs MIMO_API_KEY, uses mimo-v2.5 regardless of the chat model. Best quality. All modes.
tesseract — no key, no network. Fully local OCR. Auto-used if installed and --mode text. Recommended for users behind GFW or offline. One-time install: brew install tesseract tesseract-lang / sudo apt install tesseract-ocr tesseract-ocr-chi-sim / Windows installer at github.com/UB-Mannheim/tesseract/wiki.
pollinations — free public vision endpoint at text.pollinations.ai, no key required. All modes. But may be unreachable from mainland China — if you see "connection failed (pollinations)", suggest tesseract as the offline alternative.

The proxy silently drops image attachments on non-vision models (src/translate/reqToChat.ts:48-72) and leaves a [N image attachment(s) omitted: …] placeholder. When you see that placeholder in the transcript, the right move is to run ocr.py and feed the text back into the conversation. Don't ask the user to switch models.

# Zero-setup — uses pollinations fallback when MIMO_API_KEY is unset
python3 mimoskill/scripts/ocr.py path/to/image.png
python3 mimoskill/scripts/ocr.py --mode describe https://example.com/x.png
python3 mimoskill/scripts/ocr.py --mode structured a.png b.jpg
cat scan.png | python3 mimoskill/scripts/ocr.py --mode markdown

# Best quality — set MiMo key, auto picks mimo
export MIMO_API_KEY=sk-xxxxxxxxxxxxxxxx
python3 mimoskill/scripts/ocr.py path/to/image.png

# Force the free engine even when you have a MiMo key (e.g. to save quota)
python3 mimoskill/scripts/ocr.py --engine pollinations form.png

ocr.py accepts local paths, http(s) URLs, data: URLs, or stdin bytes. Magic-byte sniffs the MIME (PNG / JPEG / GIF / WebP / BMP). Multiple positional args are batched into one upstream call. Non-vision --model values are auto-coerced to mimo-v2.5 with one stderr note (mimo engine only; on pollinations use --pollinations-model).

See references/ocr_workflow.md for full mode reference, exit codes, JSON shape for --mode structured, and the --lang / --prompt knobs.

General (non-pet) image generation

For arbitrary image generation, use scripts/generate_image.py — a thin wrapper over generate_pet.py with the chibi-pet prompt boilerplate removed and an optional --style for common looks. Same providers (auto / pollinations / gpt-image-1 / replicate / local-sd), same env vars, same auto fallback to free Pollinations when you only have a MiMo key.

# free, no key
python3 mimoskill/scripts/generate_image.py \
    --prompt "isometric cyberpunk city at dusk" --out /tmp/out.png

# with a style preset
python3 mimoskill/scripts/generate_image.py --style pixel-art \
    --prompt "a brave knight" --out /tmp/knight.png

# multiple variants -> /tmp/img-1.png /tmp/img-2.png /tmp/img-3.png /tmp/img-4.png
python3 mimoskill/scripts/generate_image.py --n 4 \
    --prompt "watercolor desert sunrise" --out /tmp/img.png

# best quality (needs PET_OPENAI_API_KEY — same env var as the pet flow)
export PET_OPENAI_API_KEY=sk-real-openai-key
python3 mimoskill/scripts/generate_image.py --provider gpt-image-1 \
    --prompt "..." --out /tmp/out.png

--style choices: plain (default, no prefix), pixel-art, photo, 3d-render, line-art, watercolor, sticker. plain sends your prompt verbatim — pick that when the user gave a fully-specified prompt.

For Codex /hatch pets keep using generate_pet.py + install_pet.sh — that flow is unchanged and tuned for the chibi sprite + 3-state bundle Codex wants.

Generating a Codex pet (the `/hatch` alternative)

Why this needs special handling: Codex's built-in /hatch pet generation requires OpenAI's image generation API (gpt-image-1). MiMo doesn't have an image generation endpoint, and mimo2codex can't fake one. So /hatch from inside Codex won't work when Codex is pointed at MiMo.

The workaround: generate the pet image outside of Codex, then drop the result into Codex's pet directory and restart Codex. The script supports several image-gen backends:

auto (default) — picks gpt-image-1 if you have an OpenAI key set, otherwise falls back to pollinations.ai (free, no key, no signup). Works with only a MiMo key.
pollinations — free, no key required
gpt-image-1 — best quality, needs a real OpenAI key (separate from MIMO_API_KEY)
replicate — FLUX/SDXL, ~$0.003/img, needs REPLICATE_API_TOKEN
local-sd — Automatic1111/ComfyUI on 127.0.0.1:7860, free, needs local setup

Quickstart (only MiMo key required)

# 1. No OpenAI key, no pip install — just run with the free fallback
python3 mimoskill/scripts/generate_pet.py \
    --description "a chubby cyberpunk axolotl coding hero" \
    --out ~/Downloads/my-pet.png

# 2. Install into Codex's pet folder
bash mimoskill/scripts/install_pet.sh ~/Downloads/my-pet.png "axolotl-coder"

# 3. Restart Codex completely and select the new pet from the pet menu

If the sandbox blocks the network call

Codex's sandbox may prevent the scripts from reaching external APIs (Pollinations, OpenAI, Replicate, MiMo, etc.). When that happens, do not respond with "please install openai" or try to work around the sandbox. Tell the user to run the command in a regular terminal:

I can't reach the network from inside the sandbox. Please run the following in a regular terminal (outside Codex), then tell me when it's done and I'll continue:
python3 mimoskill/scripts/generate_pet.py --description "..." --out /tmp/pet.png
bash mimoskill/scripts/install_pet.sh /tmp/pet.png "<pet-name>"
No pip install is needed — the script uses only the Python standard library.

The same pattern applies for generate_image.py, ocr.py, and mimo_chat.py.

generate_pet.py will print [provider] auto → pollinations so you know the free path is in use.

Optional: better quality with an OpenAI key

If you do want gpt-image-1 quality (and image-to-image edit via --reference):

export PET_OPENAI_API_KEY=sk-real-openai-key  # NOT mimo2codex-local
python3 mimoskill/scripts/generate_pet.py \
    --reference path/to/source-image.jpg \
    --description "a chubby cyberpunk axolotl coding hero" \
    --out ~/Downloads/my-pet.png

auto will pick gpt-image-1 automatically when this env var is set. This OpenAI key is only used for the image generation call — your chat conversations still go through MiMo via mimo2codex.

Step-by-step walkthrough + prompt design

Read references/pet_workflow.md for:

The exact Codex pet folder location on macOS / Linux / Windows
How to make a static image work (most pets are animated GIFs, but a static PNG fallback works)
How to generate animated states (idle / working / done) — typically requires multiple gpt-image-1 calls with edit / remix prompting
How to mix MiMo + image gen: have MiMo write the prompt, then feed that prompt to gpt-image-1

Use the proven pet prompt formula in assets/pet_prompt_template.md — it's tuned for the chibi / sticker style Codex uses.

Image generation in general

If the user wants image generation for some other reason (not a pet), the same workaround applies: gpt-image-1 is the highest-quality option but requires a real OpenAI key. Free alternatives:

Stable Diffusion locally via Automatic1111 or ComfyUI — heavy setup but no per-call cost
Together AI / Replicate — pay-as-you-go for SDXL / FLUX
Pollinations.ai — free, no key required, lower quality

scripts/generate_pet.py defaults to gpt-image-1 but accepts --provider pollinations for the free path (with reduced quality).

Cost notes

Direct MiMo: pay-as-you-go (sk-xxx) or token plan (tp-xxx). See pricing.
Web Search plugin: separately metered per keyword search. Cap with max_keyword.
gpt-image-1: ~$0.04 per 1024×1024 image (low quality), up to ~$0.17 (HD). One pet usually costs <$0.50 even with retries.
Pollinations.ai: free.

Don't use this skill for

Just running mimo2codex (that's an HTTP proxy; this skill is direct API + workarounds). For mimo2codex itself, see the project README.md / README.zh.md.
Configuring Codex (use mimo2codex print-config or mimo2codex print-cc-switch).
Anything Anthropic / Claude — this is MiMo-specific.