name: llava-onevision2-consistency
description: Bilingual guide for running and interpreting LLaVA-OneVision2 HF vs Megatron consistency checks across TP and PP settings
compatibility: opencode
metadata:
  domain: model-validation
  framework: llava-onevision2
  repo: llava-onevision2
Purpose / 用途
Use this skill when validating whether a HuggingFace checkpoint and a Megatron/MCore checkpoint are behaviorally consistent in this repository.
在这个仓库里,需要验证 HuggingFace checkpoint 和 Megatron/MCore checkpoint 是否行为一致时,使用这个 skill。
There are two test systems in this repo:
本仓库有两套测试系统:
1. pytest test suite (recommended / 推荐)
- tests/consistency/conftest.py: session fixtures, HF→mcore conversion, Megatron initialization / session 级 fixture、HF→mcore 转换、Megatron 初始化
- tests/consistency/test_model_consistency.py: 6 integration tests / 6 个集成测试
- tests/consistency/test_consistency_utils.py: 10 utility functions + 11 unit tests / 10 个工具函数 + 11 个单元测试
- tests/consistency/run_consistency_tests.sh: shell wrapper with auto-conversion + torchrun / shell 入口,自动转换 + torchrun
2. Legacy monolithic script (reference only / 仅供参考)
- examples/llava_onevision2/check_model_consistency.sh
- examples/llava_onevision2/check_model_consistency.py

Kept for historical reference only; use the pytest suite for new work.
仅作历史参考,新的工作请用 pytest 套件。
Architecture / 架构
Direction: HF → mcore
The pytest suite assumes only the HF checkpoint exists as input. The mcore checkpoint is generated automatically via conversion.
pytest 测试套件假设 只有 HF checkpoint 作为输入。mcore checkpoint 通过转换 自动生成。
HF auto-model (input)
→ convert_4b_hf_to_mcore.sh (auto-run by conftest.py or run_consistency_tests.sh)
→ mcore checkpoint (generated)
→ both models loaded → 6 tests run
Direction: mcore → HF (reverse / deploy / round-trip) / 反向:mcore → HF(部署 / 回环)
The pytest suite does not exercise the reverse path. For the p14m2 variant, two scripts ship for this:
| Script | Use case |
|---|---|
| examples/llava_onevision2/convert/convert_4b_p14m2_mcore_to_hf.sh | Single mcore→HF pass (deploy, inference debug) |
| examples/llava_onevision2/convert/convert_4b_p14m2_mcore_to_release.sh | Re-shard mcore via HF round-trip (change TP/PP without retraining) |
# mcore → HF (auto-detects /release subdir; pass either form)
bash examples/llava_onevision2/convert/convert_4b_p14m2_mcore_to_hf.sh \
/train_tmp/llava_onevision2_4b_p14m2_mcore_tp1pp1 \
/train_tmp/llava_onevision2_4b_p14m2_hf_out \
1 1
# Re-shard: mcore TP=1 PP=1 → mcore TP=2 PP=4 (round-trips through HF)
bash examples/llava_onevision2/convert/convert_4b_p14m2_mcore_to_release.sh \
/src_mcore_tp1pp1 /dst_mcore_tp2pp4 2 4 0,12,12,12
Round-trip correctness (TP=1 PP=1, verified 2026-05-25):
mcore → HF → mcore is bitwise identical to the original mcore checkpoint
(588 non-empty tensors compared, max abs diff = 0.000e+00, 0 shape mismatches,
0 missing keys). This is the strongest correctness guarantee for the reverse
path. Use this whenever changing TP/PP layout without retraining.
pytest 套件 不覆盖 反向路径。p14m2 variant 提供两个脚本(单次 mcore→HF
用于部署/推理 debug,mcore→release 用于通过 HF 中转改 TP/PP 切分)。
回环 mcore→HF→mcore 在 TP=1 PP=1 下与原始 mcore 逐位一致(588 个非空 tensor,
max abs diff = 0.000e+00,0 形状不匹配,0 缺失键,2026-05-25 验证)。
在不重训的前提下改 TP/PP layout 时使用回环。
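The round-trip figures above (tensors compared, max abs diff, shape mismatches, missing keys) can be reproduced with a comparison along these lines. This is a minimal sketch, not the script that produced the reported numbers: `compare_state_dicts` is a hypothetical helper, and it assumes both checkpoints have already been loaded into plain `{name: numpy array}` mappings.

```python
import numpy as np


def compare_state_dicts(original, round_tripped):
    """Summarize differences between two {name: array} mappings (hypothetical helper)."""
    # Keys present on only one side count as missing.
    missing_keys = sorted(set(original) ^ set(round_tripped))
    shape_mismatches = []
    max_abs_diff = 0.0
    compared = 0
    for name in sorted(set(original) & set(round_tripped)):
        a, b = original[name], round_tripped[name]
        if a.shape != b.shape:
            shape_mismatches.append(name)
            continue
        if a.size == 0:
            continue  # empty tensors carry no values to compare
        compared += 1
        diff = np.abs(a.astype(np.float64) - b.astype(np.float64))
        max_abs_diff = max(max_abs_diff, float(diff.max()))
    return {
        "compared": compared,
        "max_abs_diff": max_abs_diff,
        "shape_mismatches": shape_mismatches,
        "missing_keys": missing_keys,
    }
```

A bitwise-identical round trip corresponds to `max_abs_diff == 0.0` with both lists empty.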
Path convention note:
convert_4b_p14m2_mcore_to_hf.sh auto-detects <load>/release, so you can pass either the parent dir or the explicit release path. Sibling scripts (4b, p14m3, p16m3, 8b, 30b) still require an explicit /release.
路径约定说明:
convert_4b_p14m2_mcore_to_hf.sh 会自动检测 <load>/release,父目录或显式 release 路径都可以传。Sibling 脚本仍要求显式 /release。
Test file structure / 测试文件结构
tests/consistency/
├── __init__.py # empty package init / 空包初始化
├── conftest.py # 9 session fixtures (209 lines) / 9 个 session fixture(209 行)
├── test_consistency_utils.py # 10 utilities + 11 unit tests (373 lines, DO NOT MODIFY) / 10 个工具 + 11 个单元测试(373 行,不要改)
├── test_model_consistency.py # 6 integration tests (402 lines) / 6 个集成测试(402 行)
└── run_consistency_tests.sh # shell wrapper (60 lines) / shell 入口(60 行)
Fixtures in conftest.py / conftest.py 中的 fixtures
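The table below lists the session-scoped fixtures exposed by conftest.py, with their purpose and default source:
下表列出 conftest.py 暴露的 session 级 fixture,及其用途和默认来源: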
| Fixture | Scope | Description |
|---|---|---|
| hf_model_path | session | HF auto-model directory (env: HF_MODEL_PATH) |
| converted_mcore_path | session | Auto-converts HF→mcore if MCORE_CHECKPOINT_PATH not set |
| preprocessor_path | session | Processor path (defaults to HF_MODEL_PATH) |
| test_image_path | session | Local test image (default: asset/performance.png) |
| megatron_init | session | Initializes Megatron via sys.argv override |
| hf_config | session | LlavaOnevision2Config.from_pretrained() |
| hf_vision_model | session | LlavaOnevision2Model.from_pretrained().visual on cuda bf16 |
| hf_cond_gen_model | session | LlavaOnevision2ForConditionalGeneration on cuda bf16 |
| mcore_model | session | Megatron get_model() + load_checkpoint() |
| hf_processor | session | AutoProcessor.from_pretrained() |
What the 6 tests check / 6 个测试检查什么
test_weight_consistency (fast)
Compares all mapped weights between HF and mcore vision models:
比较 HF 和 mcore 视觉模型之间所有映射权重:
- Patch embedding (conv weight + bias) / patch embedding(卷积 weight + bias)
- Class embedding / class embedding
- Pre/post layer norms / 前/后 layer norm
- Per-layer (24 layers): QKV weight/bias, projection, MLP fc1/fc2, layer norms / 每层(24 层):QKV weight/bias、projection、MLP fc1/fc2、layer norms
- QKV layout conversion via convert_hf_qkv_to_mcore_layout (interleaved Q/K/V per head) / 通过 convert_hf_qkv_to_mcore_layout 做 QKV 布局转换(每 head 交织 Q/K/V)
- TP-aware gathering via _maybe_gather_tp_weight / 通过 _maybe_gather_tp_weight 做 TP-aware gather
- Threshold: cosine > 0.9999 / 阈值:cosine > 0.9999
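The cosine thresholds used throughout these tests can be illustrated with a small helper. This is a sketch under assumptions, not the utility shipped in test_consistency_utils.py: `cosine_similarity` and `passes_threshold` are hypothetical names, and tensors are represented as NumPy arrays so the example runs without a GPU.

```python
import numpy as np


def cosine_similarity(a, b, eps=1e-12):
    """Cosine similarity of two tensors after flattening (hypothetical helper)."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    if a.shape != b.shape:
        raise ValueError(f"element count mismatch: {a.shape} vs {b.shape}")
    # eps guards against division by zero when either tensor is all zeros.
    denom = max(float(np.linalg.norm(a) * np.linalg.norm(b)), eps)
    return float(np.dot(a, b)) / denom


def passes_threshold(a, b, threshold=0.9999):
    """True when the two tensors are more similar than the given threshold."""
    return cosine_similarity(a, b) > threshold
```

Cosine similarity ignores uniform scaling (`x` and `2 * x` score 1.0), so pairing it with a max-abs-diff check is worth considering when magnitudes matter.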
test_vision_encoder_consistency_336px (fast)
Compares forward_debug outputs at 4 strategic points:
在 4 个关键点比较 forward_debug 输出:
- after_patch_embed: patch embedding output / patch embedding 输出
- rotary_pos_emb: rotary position embedding (aligned via align_rotary_debug_tensors) / 旋转位置编码(通过 align_rotary_debug_tensors 对齐)
- after_pre_layernorm: after pre-layernorm / 经过 pre-layernorm 之后
- before_adapter: final encoder output before adapter / 进入 adapter 之前的最终 encoder 输出
- Threshold: cosine > 0.99 / 阈值:cosine > 0.99
test_mllm_after_merger_336px (fast)
Compares vision + adapter pipeline output:
比较视觉 + adapter pipeline 输出:
- HF: forward_debug['after_merger'] / HF:forward_debug['after_merger']
- mcore: vision_model() → adapter() / mcore:vision_model() → adapter()
- Threshold: cosine > 0.99 / 阈值:cosine > 0.99
test_encoder_layer_wise_consistency (slow)
Layer-by-layer comparison of all 24 encoder layers:
逐层比较所有 24 个 encoder 层:
- layer_{i}_input and layer_{i}_output for each layer / 每层的 layer_{i}_input 和 layer_{i}_output
- input_hidden_states: initial encoder input / 初始 encoder 输入
- final_output: final encoder output / 最终 encoder 输出
- Uses align_encoder_debug_tensors for shape alignment / 用 align_encoder_debug_tensors 做形状对齐
- Threshold: cosine > 0.99 / 阈值:cosine > 0.99
test_llm_output_consistency (slow)
End-to-end LLM logits comparison:
端到端 LLM logits 比较:
- Loads LlavaOnevision2ForConditionalGeneration (HF) and full mcore model / 加载 HF 的 LlavaOnevision2ForConditionalGeneration 和完整 mcore 模型
- Tokenizes prompt with image, runs forward pass on both / 用图像 tokenize prompt,两边都跑 forward
- Compares output logits / 比较输出 logits
- Threshold: cosine > 0.99 / 阈值:cosine > 0.99
test_hf_loading_consistency (slow)
Validates HF model loading methods are equivalent:
验证 HF 模型加载方式等价:
- from_pretrained() vs manual load_file() from safetensors / from_pretrained() 对比从 safetensors 手动 load_file()
- Compares all vision weights (element-wise match within tolerance via np.allclose) / 比较所有 vision 权重(用 np.allclose 在容差内逐元素匹配)
- Compares forward_debug outputs (cosine > 0.9999) / 比较 forward_debug 输出(cosine > 0.9999)
Environment variables / 环境变量
| Variable | Default | Description |
|---|---|---|
| HF_MODEL_PATH | <path/to/hf_checkpoint> | HF checkpoint (the only required input) |
| MCORE_CHECKPOINT_PATH | (auto-generated) | Set to skip conversion |
| PREPROCESSOR_PATH | $HF_MODEL_PATH | Image processor path |
| TEST_IMAGE_PATH | $REPO_ROOT/asset/performance.png | Local test image |
| CONSISTENCY_TEST_TP | 1 | Tensor parallel size |
| CONSISTENCY_TEST_PP | 1 | Pipeline parallel size |
| AIAK_TRAINING_PATH | $REPO_ROOT | AIAK training framework root |
| AIAK_MAGATRON_PATH | $REPO_ROOT/aiak_megatron | AIAK Megatron path |
| MASTER_PORT | 29500 | Distributed master port |
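To make the defaults in the table concrete, the sketch below resolves a subset of these variables the way a fixture might. It is illustrative only: `read_consistency_env` is a hypothetical function, not code from conftest.py, and the derived `world_size` simply reflects that the launcher needs TP * PP processes.

```python
import os


def read_consistency_env(environ=None):
    """Resolve the documented variables into a plain dict (hypothetical helper)."""
    env = os.environ if environ is None else environ
    hf_path = env.get("HF_MODEL_PATH")
    if not hf_path:
        # HF_MODEL_PATH is the only required input.
        raise KeyError("HF_MODEL_PATH is required")
    tp = int(env.get("CONSISTENCY_TEST_TP", "1"))
    pp = int(env.get("CONSISTENCY_TEST_PP", "1"))
    return {
        "hf_model_path": hf_path,
        # None means "convert HF -> mcore first".
        "mcore_checkpoint_path": env.get("MCORE_CHECKPOINT_PATH") or None,
        # The processor path falls back to the HF checkpoint directory.
        "preprocessor_path": env.get("PREPROCESSOR_PATH") or hf_path,
        "tp": tp,
        "pp": pp,
        "master_port": int(env.get("MASTER_PORT", "29500")),
        "world_size": tp * pp,
    }
```

Passing a dict instead of reading `os.environ` directly keeps the function easy to exercise on a host without the container.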
How to run / 怎么跑
All Python must run inside the container llava_megatron_container_ax.
所有 Python 必须在容器 llava_megatron_container_ax 内运行。
Quick: run non-slow tests with auto-conversion / 快速:跑非 slow 测试 + 自动转换
# Inside container, from repo root:
# 在容器内、仓库根目录执行:
bash tests/consistency/run_consistency_tests.sh
Run all tests including slow / 跑全部测试(含 slow)
bash tests/consistency/run_consistency_tests.sh -m ""
Custom TP/PP / 自定义 TP/PP
TP=2 PP=1 MASTER_PORT=29501 bash tests/consistency/run_consistency_tests.sh
Skip conversion (pre-existing mcore checkpoint) / 跳过转换(已有 mcore checkpoint)
MCORE_CHECKPOINT_PATH=/path/to/existing bash tests/consistency/run_consistency_tests.sh
Run only unit tests (no GPU needed, works on host) / 只跑单元测试(不需要 GPU,host 上也能跑)
pytest tests/consistency/test_consistency_utils.py -v
Run specific integration test / 跑指定的集成测试
bash tests/consistency/run_consistency_tests.sh -k test_weight_consistency
What run_consistency_tests.sh does / run_consistency_tests.sh 做了什么
- Validates HF_MODEL_PATH and TEST_IMAGE_PATH exist / 校验 HF_MODEL_PATH 和 TEST_IMAGE_PATH 存在
- If MCORE_CHECKPOINT_PATH is empty, runs convert_4b_hf_to_mcore.sh to generate it / 如果 MCORE_CHECKPOINT_PATH 为空,跑 convert_4b_hf_to_mcore.sh 生成
- Exports all env vars for conftest.py / 为 conftest.py 导出所有环境变量
- Sets PYTHONPATH to include transformers_impl/llavaonevision2, aiak_megatron, repo root / 把 transformers_impl/llavaonevision2、aiak_megatron、仓库根目录加入 PYTHONPATH
- Launches torchrun --nproc_per_node=$((TP*PP)) with pytest / 用 torchrun --nproc_per_node=$((TP*PP)) 启动 pytest
What conftest.py does for Megatron init / conftest.py 如何初始化 Megatron
Since pytest has its own arg parsing, Megatron CLI args can't be passed via command line. The solution:
由于 pytest 有自己的参数解析,Megatron CLI 参数不能通过命令行传递。解决方案:
- Shell script exports env vars (HF_MODEL_PATH, MCORE_CHECKPOINT_PATH, CONSISTENCY_TEST_TP/PP, etc.) / shell 脚本导出环境变量(HF_MODEL_PATH、MCORE_CHECKPOINT_PATH、CONSISTENCY_TEST_TP/PP 等)
- conftest.py reads env vars, temporarily overrides sys.argv with constructed Megatron CLI args / conftest.py 读取环境变量,临时把 sys.argv 替换成构造好的 Megatron CLI 参数
- Calls parse_arguments() + initialize_aiak_megatron() inside the override / 在替换期内调用 parse_arguments() + initialize_aiak_megatron()
- Restores sys.argv afterward / 完事后恢复 sys.argv
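The override-and-restore pattern described above can be sketched with a context manager. This is not the code in conftest.py: `override_argv` and `parse_parallel_sizes` are hypothetical names, and a small `argparse` parser stands in for Megatron's `parse_arguments()` so the example runs anywhere.

```python
import argparse
import sys
from contextlib import contextmanager


@contextmanager
def override_argv(cli_args):
    """Temporarily replace sys.argv, restoring it even if parsing raises."""
    saved = sys.argv
    sys.argv = [saved[0] if saved else "prog", *cli_args]
    try:
        yield
    finally:
        sys.argv = saved


def parse_parallel_sizes():
    """Stand-in parser that reads sys.argv, as an argparse-based CLI would."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--tensor-model-parallel-size", type=int, default=1)
    parser.add_argument("--pipeline-model-parallel-size", type=int, default=1)
    args, _unknown = parser.parse_known_args()
    return args.tensor_model_parallel_size, args.pipeline_model_parallel_size
```

Restoring in a `finally` block matters here: if initialization fails, pytest still sees its own arguments when it reports the error.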
How to interpret failures / 如何解读失败
Priority order for diagnosis / 诊断优先顺序
- test_weight_consistency — If this fails, all other tests are unreliable / 这个挂了,其他测试都不可信
- test_vision_encoder_consistency_336px — Strategic checkpoint comparison / 关键 checkpoint 点比较
- test_mllm_after_merger_336px — Vision + adapter pipeline health / 视觉 + adapter pipeline 健康度
- test_encoder_layer_wise_consistency — May fail due to debug alignment, not real bugs / 可能因 debug 对齐问题失败,未必是真 bug
- test_llm_output_consistency — Full end-to-end, most sensitive to any discrepancy / 完整端到端,对任何偏差最敏感
- test_hf_loading_consistency — HF-only test, independent of mcore / 仅 HF 的测试,与 mcore 无关
Common failure causes / 常见失败原因
| Symptom | Likely Cause | Fix |
|---|---|---|
| weight_consistency fails on QKV | QKV layout conversion bug | Check convert_hf_qkv_to_mcore_layout for num_heads |
| weight_consistency fails on many keys | Wrong model / TP/PP mismatch | Verify HF_MODEL_PATH and conversion TP/PP |
| vision_encoder rotary_pos_emb fails | Debug tensor shape mismatch | Check align_rotary_debug_tensors — HF (1,S,64) vs mcore (S,32) |
| encoder_layer_wise late layers fail | Debug capture timing / layout | Usually not a real model bug if weight + merger pass |
| llm_output shape mismatch | Wrong tokenization or attention mask | Check prompt formatting and attention_mask.logical_not() |
| Megatron init fails | Wrong CLI args | Check _build_megatron_cli_args in conftest.py |
| Conversion fails | Missing AIAK_TRAINING_PATH | Export it before running |
Key weight mapping / 关键权重映射
| HF Key | mcore Key |
|---|---|
| embeddings.patch_embedding | patch_embed.proj |
| embeddings.class_embedding | class_embedding |
| layernorm_pre/post | pre_layernorm/post_layernorm |
| encoder.layers.{i}.layer_norm1 | decoder.layers.{i}.self_attention.linear_qkv.layer_norm |
| encoder.layers.{i}.self_attn.qkv | decoder.layers.{i}.self_attention.linear_qkv |
| encoder.layers.{i}.self_attn.proj | decoder.layers.{i}.self_attention.linear_proj |
| encoder.layers.{i}.layer_norm2 | decoder.layers.{i}.mlp.linear_fc1.layer_norm |
| encoder.layers.{i}.mlp.fc1/fc2 | decoder.layers.{i}.mlp.linear_fc1/fc2 |
QKV weights need layout conversion: HF stores [Q_all, K_all, V_all], mcore stores interleaved [Q_h0, K_h0, V_h0, Q_h1, K_h1, V_h1, ...].
QKV 权重需要布局转换:HF 存储 [Q_all, K_all, V_all],mcore 存储交织的 [Q_h0, K_h0, V_h0, Q_h1, K_h1, V_h1, ...]。
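The layout change described above amounts to a reshape and an axis swap. The sketch below is an illustration, not the repository's `convert_hf_qkv_to_mcore_layout`: the function names are hypothetical, and it assumes Q, K and V each have `num_heads * head_dim` rows stacked along the first dimension.

```python
import numpy as np


def qkv_to_interleaved(weight, num_heads):
    """[Q_all, K_all, V_all] -> [Q_h0, K_h0, V_h0, Q_h1, ...] along dim 0."""
    rows = weight.shape[0]
    if rows % (3 * num_heads) != 0:
        raise ValueError("first dim must be divisible by 3 * num_heads")
    head_dim = rows // (3 * num_heads)
    rest = weight.shape[1:]  # () for a bias, (in_features,) for a weight
    # (3, heads, head_dim, ...) -> (heads, 3, head_dim, ...)
    grouped = weight.reshape(3, num_heads, head_dim, *rest)
    return np.swapaxes(grouped, 0, 1).reshape(rows, *rest)


def interleaved_to_qkv(weight, num_heads):
    """Inverse of qkv_to_interleaved."""
    rows = weight.shape[0]
    head_dim = rows // (3 * num_heads)
    rest = weight.shape[1:]
    grouped = weight.reshape(num_heads, 3, head_dim, *rest)
    return np.swapaxes(grouped, 0, 1).reshape(rows, *rest)
```

With 2 heads of dimension 2, rows numbered 0..11 in HF order come out as 0, 1, 4, 5, 8, 9, 2, 3, 6, 7, 10, 11. Passing the wrong `num_heads` silently produces a valid-looking but scrambled tensor, which is why the failure table points at that parameter first.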
Known repo-local lessons / 当前仓库已知经验
1. Rotary debug representation must be aligned
HF and Megatron expose different rotary_pos_emb debug shapes:
HF 和 Megatron 暴露不同形状的 rotary_pos_emb debug 张量:
- HF: (1, S, 64)
- Megatron: (S, 32)
The align_rotary_debug_tensors function handles this by squeezing batch dim and concatenating mcore's half-dim.
align_rotary_debug_tensors 函数通过去掉 batch 维度并拼接 mcore 的半维度来处理。
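A rough picture of that alignment, using NumPy arrays in place of debug tensors. This is a sketch, not the repository's `align_rotary_debug_tensors`: the function name is hypothetical, and it assumes the Megatron half-width tensor is concatenated with itself to reach the HF width, which should be checked against the real helper.

```python
import numpy as np


def align_rotary_for_comparison(hf_rope, mcore_rope):
    """Bring HF (1, S, 64) and mcore (S, 32) debug tensors to a common (S, 64) shape."""
    # Drop the leading batch dimension on the HF side.
    hf_aligned = np.squeeze(hf_rope, axis=0)
    # Assumption: the full-width tensor is the half-width one repeated twice.
    mcore_aligned = np.concatenate([mcore_rope, mcore_rope], axis=-1)
    if hf_aligned.shape != mcore_aligned.shape:
        raise ValueError(f"still misaligned: {hf_aligned.shape} vs {mcore_aligned.shape}")
    return hf_aligned, mcore_aligned
```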
2. PP-aware testing is necessary
When PP > 1, not every pipeline stage owns vision_model, adapter, or decoder post-process outputs. Tests must skip non-owner stages.
当 PP > 1 时,不是每个 pipeline stage 都拥有 vision_model、adapter 或 decoder 后处理输出。测试必须跳过非 owner stage。
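The ownership rule can be expressed as a small predicate that a test consults before asserting anything. This is a sketch under an explicit assumption: vision_model and adapter are taken to live on the first pipeline stage and decoder post-processing on the last, which is the usual Megatron arrangement but is not confirmed by this document. `stage_owns` is a hypothetical name.

```python
def stage_owns(component, pp_rank, pp_size):
    """Return True if the given pipeline stage is assumed to own the component."""
    if not 0 <= pp_rank < pp_size:
        raise ValueError("pp_rank must be in [0, pp_size)")
    if component in ("vision_model", "adapter"):
        return pp_rank == 0  # assumption: encoder side sits on the first stage
    if component == "decoder_post_process":
        return pp_rank == pp_size - 1  # assumption: logits come from the last stage
    raise ValueError(f"unknown component: {component}")
```

In a real test the rank and size would come from Megatron's parallel state, and a `False` result would lead to `pytest.skip(...)` rather than a failure.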
3. TP-aware weight comparison is necessary
When TP > 1, use _maybe_gather_tp_weight to gather shards before comparison. It gathers along first dim for QKV/FC1, last dim for proj/FC2.
当 TP > 1 时,用 _maybe_gather_tp_weight 在比较前 gather shards。QKV/FC1 沿第一维 gather,proj/FC2 沿最后一维。
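The gather rule above can be simulated without a distributed launch by treating the per-rank shards as a list. This is a sketch, not the repository's `_maybe_gather_tp_weight`: `gather_tp_shards` and the `kind` labels are hypothetical, and a real implementation would collect the shards with a collective such as all-gather instead of receiving them as arguments.

```python
import numpy as np

# Layers split along the output (first) dimension vs. the input (last) dimension.
FIRST_DIM_KINDS = {"qkv", "fc1"}
LAST_DIM_KINDS = {"proj", "fc2"}


def gather_tp_shards(shards, kind):
    """Reassemble a full weight from per-rank shards ordered by TP rank."""
    if kind in FIRST_DIM_KINDS:
        axis = 0
    elif kind in LAST_DIM_KINDS:
        axis = -1
    else:
        raise ValueError(f"unknown weight kind: {kind}")
    return np.concatenate(shards, axis=axis)
```

Gathering along the wrong axis often still yields a tensor with a plausible element count, so comparing shapes before computing cosine similarity is a cheap first check.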
4. HF and mcore use the same pixel value 2x2 memory layout
No pixel value conversion is needed between HF and mcore models.
HF 和 mcore 模型使用相同的 2x2 内存布局,无需转换 pixel values。
5. Encoder-layer-wise failures may be debug-layout issues
If weight_consistency + merger pass but encoder_layer_wise fails in late layers, suspect debug capture semantics rather than real model bugs.
如果 weight_consistency + merger 通过但 encoder_layer_wise 在后面层失败,优先怀疑 debug 捕获语义而非模型真错。
Minimal troubleshooting checklist / 最小排查清单
If the run fails, check in this order:
如果运行失败,按以下顺序排查:
1. Is the container running? docker exec -it llava_megatron_container_ax bash
2. Does HF_MODEL_PATH exist and contain safetensors files?
3. Did the HF→mcore conversion succeed? Check stderr output.
4. Does the container have enough GPUs for TP * PP?
5. Is MASTER_PORT already in use? Try a different port.
6. Did test_weight_consistency fail? → Fix this first before investigating other tests.
7. Is the failure in a @pytest.mark.slow test? → Run fast tests first with default marker filter.

1. 容器是否在运行? docker exec -it llava_megatron_container_ax bash
2. HF_MODEL_PATH 是否存在且包含 safetensors 文件?
3. HF→mcore 转换是否成功?检查 stderr 输出。
4. 容器 GPU 数量是否满足 TP * PP?
5. MASTER_PORT 是否被占用?换一个端口试试。
6. test_weight_consistency 是否失败?→ 先修这个再看其他测试。
7. 失败的是否是 @pytest.mark.slow 测试?→ 先用默认 marker 跑 fast 测试。