name: model-checker
description: "Validate a newly supported optimum-intel model with OpenVINO GenAI. Use when: checking new model support, verifying model export to OpenVINO IR, running GenAI inference test with llm_bench, benchmarking model accuracy with who-what-benchmark."
argument-hint: "model_id and task (e.g. tencent/HY-MT1.5-1.8B text-generation-with-past)"
Model Checker
Validates that a HuggingFace model exported via optimum-intel works correctly with OpenVINO GenAI pipelines and passes accuracy benchmarks.
When to Use
- A new model was added to optimum-intel and needs GenAI validation
- Verify a HuggingFace model exports to OpenVINO IR and runs inference
- Check model accuracy after conversion using who-what-benchmark
Inputs
The user must provide:
- model_id: HuggingFace model identifier (e.g. tencent/HY-MT1.5-1.8B)
- task: optimum-cli export task. Supported values:
  - text-generation-with-past
  - image-text-to-text
  - text-to-image
  - image-to-image
  - feature-extraction
  - text-classification
  - text-to-video
  - automatic-speech-recognition
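The two inputs can be sanity-checked before anything is run. The following is a minimal sketch, not part of check_model.py: the task names are copied from the list above, while `SUPPORTED_TASKS` and `validate_inputs` are hypothetical names introduced only for illustration.

```python
# Task names copied from the "Supported values" list above.
# SUPPORTED_TASKS and validate_inputs are illustrative names, not part of check_model.py.
SUPPORTED_TASKS = frozenset({
    "text-generation-with-past",
    "image-text-to-text",
    "text-to-image",
    "image-to-image",
    "feature-extraction",
    "text-classification",
    "text-to-video",
    "automatic-speech-recognition",
})


def validate_inputs(model_id: str, task: str) -> tuple:
    """Return (model_id, task) unchanged, or raise ValueError if either is unusable."""
    if not model_id:
        raise ValueError("model_id is required")
    if task not in SUPPORTED_TASKS:
        raise ValueError(
            f"unsupported task {task!r}; expected one of {sorted(SUPPORTED_TASKS)}"
        )
    # model_id is returned exactly as given: it must never be rewritten.
    return model_id, task
```

The helper deliberately returns model_id untouched, so validation can never change what the user supplied.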
Prerequisites
Ensure the Python virtual environment is activated before running any commands.
- Locate the virtual environment. Check for common directories at the repository root: .venv/, venv/, env/. Use list_dir to find it. If none is found, ask the user for its location.
- Check if already activated: if which python or where python points inside the virtual environment, it's already activated. If not, proceed to activate it.
- Activate based on the current platform:
  - Linux/macOS: source <venv_path>/bin/activate
  - Windows (cmd): <venv_path>\Scripts\activate.bat
  - Windows (PowerShell): <venv_path>\Scripts\Activate.ps1
- A background terminal doesn't inherit the venv activation. When running a command there, activate the venv and run the command in the same command line.
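The activation check and the "same command" rule can be sketched as two small helpers. This is an illustration under stated assumptions: `interpreter_in_venv` and `with_activation` are hypothetical names, both paths are assumed to be absolute, and the chaining operators (`&&`, `call`, `;`) are my own choice for joining activation and command, not something the skill prescribes.

```python
from pathlib import PurePosixPath, PureWindowsPath


def interpreter_in_venv(python_path: str, venv_path: str, windows: bool = False) -> bool:
    """True if the path printed by `which python` / `where python` lies under venv_path."""
    path_cls = PureWindowsPath if windows else PurePosixPath
    python, venv = path_cls(python_path), path_cls(venv_path)
    # .parents lists every ancestor directory of the interpreter path.
    return venv in python.parents


def with_activation(command: str, venv_path: str, shell: str = "posix") -> str:
    """Prefix `command` with venv activation so both run in one command line."""
    if shell == "posix":
        return f"source {venv_path}/bin/activate && {command}"
    if shell == "cmd":
        return f"call {venv_path}\\Scripts\\activate.bat && {command}"
    if shell == "powershell":
        return f"{venv_path}\\Scripts\\Activate.ps1; {command}"
    raise ValueError(f"unknown shell: {shell!r}")
```

For example, `with_activation("python3 --version", ".venv")` yields a single line that a background terminal can run without relying on earlier activation.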
Procedure
Step 1: Run check_model.py
Run the checker script from the repository root:
python3 .github/skills/model-checker/scripts/check_model.py \
--model-id <model_id> \
--task <export_task> \
--work-dir .model_enabler/model_checker
Run python3 .github/skills/model-checker/scripts/check_model.py --help for the full argument reference, including defaults. The --work-dir directory is where all intermediate files, logs, and outputs are stored. Do not pipe the output or add any logging or redirection; the script handles its own logging.
Skip flags (for re-runs after a fix)
When a previous run already passed some steps (e.g. export succeeded but inference test failed), use skip flags to avoid repeating expensive passed steps:
- --skip-export: reuse existing IR in <work-dir>/model_ir instead of re-exporting (avoids re-downloading weights)
- --skip-llm-bench: skip the llm_bench inference test
- --skip-wwb: skip the who-what-benchmark accuracy check
Do not use skip flags on the first run. Only use them when retrying after a targeted fix.
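As a sketch of how a re-run command could be assembled, the helper below builds the argument list from the flags documented above. The script path, --model-id, --task, --work-dir and the three skip flags come from this document; `build_command` and the step labels "export", "llm_bench" and "wwb" are hypothetical names used only here.

```python
CHECKER = ".github/skills/model-checker/scripts/check_model.py"

# Hypothetical step labels mapped to the skip flags documented above.
STEP_TO_SKIP_FLAG = {
    "export": "--skip-export",
    "llm_bench": "--skip-llm-bench",
    "wwb": "--skip-wwb",
}


def build_command(model_id, task, work_dir=".model_enabler/model_checker", passed_steps=()):
    """Build the check_model.py argv; skip only steps that already passed."""
    argv = [
        "python3", CHECKER,
        "--model-id", model_id,   # passed through exactly as provided
        "--task", task,
        "--work-dir", work_dir,
    ]
    for step, flag in STEP_TO_SKIP_FLAG.items():
        if step in passed_steps:
            argv.append(flag)
    return argv
```

On a first run `passed_steps` stays empty, so no skip flag is emitted. After fixing an inference failure, `build_command(model_id, task, passed_steps=("export",))` adds only --skip-export.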
Step 2: Interpret Results
The script logs progress for each step and exits with code 0 (pass) or non-zero (fail).
Pass criteria:
- Export: exit code 0
- Inference test (llm_bench): exit code 0, metrics line logged
- WWB accuracy (three sub-steps, all must pass):
  - HF ground truth generation: exit code 0
  - Optimum target evaluation: similarity ≥ SIMILARITY_THRESHOLD
  - GenAI target evaluation: similarity ≥ SIMILARITY_THRESHOLD

Note: the WWB step is skipped automatically for automatic-speech-recognition (no WWB support).
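The pass criteria above can be written down as a small decision function. This is a sketch of the logic only, not how check_model.py implements it: `wwb_passed` and `validation_passed` are hypothetical names, and the value of SIMILARITY_THRESHOLD is not stated in this document, so the threshold is taken as a parameter.

```python
def wwb_passed(gt_exit_code, optimum_similarity, genai_similarity, threshold):
    """All three WWB sub-steps must pass; `threshold` stands in for SIMILARITY_THRESHOLD."""
    return (
        gt_exit_code == 0
        and optimum_similarity >= threshold
        and genai_similarity >= threshold
    )


def validation_passed(task, export_exit_code, llm_bench_exit_code, metrics_logged, wwb_ok=None):
    """Combine the per-step results into the overall PASSED / FAILED verdict."""
    if export_exit_code != 0:
        return False
    if llm_bench_exit_code != 0 or not metrics_logged:
        return False
    if task == "automatic-speech-recognition":
        # WWB is skipped for this task, so the first two steps decide the result.
        return True
    return bool(wwb_ok)
```

Keeping the WWB check separate makes it easy to see which of the three sub-steps caused a failure before reading the corresponding log.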
Log files: each tool writes its own dedicated log; paths are printed during execution. When a step fails, read the corresponding log for the full traceback and context before drawing any conclusions.
work-dir: the work directory is inside the current workspace. Prefer tool calls over custom bash commands to access logs and outputs.
Step 3: Report Results
Results format:
- Model: <model_id> (<task>)
- Validation: PASSED / FAILED
- Performance (if passed):
  - 1st token latency, 2nd token latency, throughput
  - Optimum similarity / GenAI similarity (if applicable)
- Logs: paths to export log, llm_bench log, WWB logs
- Failed step analysis (if failed): summary of the failure and relevant log path for details
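One way to keep reports consistent is to render them from structured values. The sketch below follows the results format above; `format_report` and its parameters are hypothetical, and the metric names and log paths passed in are whatever the run actually produced.

```python
def format_report(model_id, task, passed, performance=None, logs=(), failure=None):
    """Render the results in the bullet format described above."""
    lines = [
        f"- Model: {model_id} ({task})",
        f"- Validation: {'PASSED' if passed else 'FAILED'}",
    ]
    if passed and performance:
        # Performance is reported only for a passing run.
        lines.append("- Performance:")
        lines.extend(f"  - {name}: {value}" for name, value in performance.items())
    lines.append("- Logs: " + ", ".join(logs))
    if not passed and failure:
        # Failure summary is reported only for a failing run.
        lines.append(f"- Failed step analysis: {failure}")
    return "\n".join(lines)
```

Values are inserted verbatim, so units and precision should be copied from the logged metrics line rather than reformatted.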
Security
- NEVER install any packages. Assume the environment is pre-configured.
- NEVER invoke optimum-cli, wwb, or llm_bench directly. Always go through check_model.py.
- NEVER modify model_id. Pass it exactly as provided by the user.