name: minicpm5-finetune description: Pick the right fine-tuning framework for a MiniCPM5-1B base checkpoint and route to a framework-specific cookbook skill. Use when the user wants to SFT / LoRA / DPO / continue-pretrain MiniCPM5 and has not yet committed to a specific framework, or when they say "fine-tune MiniCPM5", "train MiniCPM5", "MiniCPM5 微调", "LoRA MiniCPM5", "继续训练 MiniCPM5".
Fine-tune MiniCPM5-1B — framework router
You're being asked to fine-tune MiniCPM5-1B. Pick exactly one framework skill below and invoke that skill rather than improvising — every framework has at least one MiniCPM5-specific gotcha that the dedicated skill knows about.
1. Required input from the user
| Variable | Example | Notes |
|---|---|---|
BASE_MODEL |
HF id openbmb/MiniCPM5-1B (post-release) or a local path |
see cluster paths below |
DATA |
path to JSONL in messages format | [{"messages": [{"role":"user","content":"..."}, {"role":"assistant","content":"..."}]}] |
OUTPUT_DIR |
where to write checkpoints | mkdir if missing |
| Goal | "LoRA SFT" / "full SFT" / "DPO" / "QLoRA on consumer GPU" / "continue-pretrain at scale" | drives skill choice |
| Hardware | 1× GPU / multi-node | drives skill choice |
Default base model
If BASE_MODEL is not pinned, default to the Hugging Face fp16 release:
openbmb/MiniCPM5-1B
Any local directory containing config.json + model.safetensors + tokenizer.json also works.
2. Decision matrix — pick exactly one
| User says / wants | Best fit | → Skill to invoke |
|---|---|---|
| YAML / WebUI driven SFT, broad community support | LLaMA-Factory | minicpm5-finetune-llamafactory |
| ChatML template + ModelScope-native SFT/DPO/KTO/ORPO | ms-swift | minicpm5-finetune-ms-swift |
| Bare-metal Python, assistant-only loss, minimal abstractions | TRL + PEFT | minicpm5-finetune-trl |
| Single-GPU LoRA / QLoRA, tight VRAM (24 GB or less) | unsloth | minicpm5-finetune-unsloth |
| mmengine config-driven SFT, OpenMMLab stack | xtuner | minicpm5-finetune-xtuner |
Decision shortcuts
- First time fine-tuning MiniCPM5: pick
minicpm5-finetune-llamafactory— most documented, fewest surprises. - Need DPO / KTO / ORPO: pick
minicpm5-finetune-ms-swift(best out-of-the-box) orminicpm5-finetune-trl(most control). - Single 24 GB consumer GPU: pick
minicpm5-finetune-unslothwithload_in_4bit=True. - Target is llama.cpp / Ollama / LM Studio / MiniCPM Desk Pet (GGUF): train with any skill above, then convert the adapter with
minicpm5-finetune-gguf-lora(PEFT adapter → GGUF LoRA).
3. Invocation contract
Each sub-skill expects BASE_MODEL, DATA, OUTPUT_DIR and outputs an LoRA adapter (or full checkpoint) at OUTPUT_DIR/. Read the picked sub-skill in full before running — every framework has at least one MiniCPM5-specific gotcha:
| Framework | Gotcha (skill handles it for you) |
|---|---|
| LLaMA-Factory | template: empty (delegate to model's own jinja, NOT template: llama3) |
| ms-swift | mandatory --model_type llama --template chatml flags |
| TRL | training-only chat template patch for assistant_only_loss=True |
| unsloth | transformers==4.57.3 pin if vLLM is in the same env |
| xtuner | prompt_template=PROMPT_TEMPLATE.qwen_chat (ChatML), use openai_map_fn for messages-format data, start_factor ≥ 1e-2 |
4. Universal sanity check after training
Regardless of framework:
# 1. Verify adapter (or full ckpt) was saved
ls "$OUTPUT_DIR" # adapter_model.safetensors + adapter_config.json (LoRA) OR a full HF directory
# 2. Quick inference check (HF-side, works for both LoRA and full)
python -c "
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
base = AutoModelForCausalLM.from_pretrained('$BASE_MODEL', torch_dtype=torch.bfloat16, device_map='auto').eval()
model = PeftModel.from_pretrained(base, '$OUTPUT_DIR').eval() # skip this line for full SFT
tok = AutoTokenizer.from_pretrained('$BASE_MODEL')
inputs = tok.apply_chat_template([{'role':'user','content':'1+1=?'}], add_generation_prompt=True, enable_thinking=False, return_tensors='pt').to(model.device)
out = model.generate(inputs, max_new_tokens=32, do_sample=False)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
"
Expected: a coherent answer (e.g. "2"). If it's gibberish or empty, training broke (most likely chat template misalignment — ⮕ check the framework's gotcha row above).
5. Don't reinvent: link to the cookbook
Each sub-skill is paired with a one-page cookbook in docs/finetune/. The skill is the machine-readable shortcut; the cookbook is the human-readable reference.
6. Next step — deploying the adapter
The adapter at OUTPUT_DIR/ is a PEFT adapter (adapter_model.safetensors). How you ship it depends on the runtime:
- vLLM / transformers / SGLang (fp16 HF base): load the PEFT adapter directly — no conversion.
- llama.cpp / Ollama / LM Studio / MiniCPM Desk Pet (GGUF base): convert it to a GGUF LoRA first with
minicpm5-finetune-gguf-lora, then--lora adapter.gguf(or upload it in the Desk Pet app).