name: llm-bench-fail-analyzer
description: "Analyze failed llm_bench execution results for a model. Use when: checking llm_bench log, troubleshooting llm_bench fails."
argument-hint: "model_dir and log_info (e.g. /path/to/tencent_HY-MT1.5-1.8B /path/to/logs/for/llm_bench)"
LLM Bench Fail Analyzer
Analyzes failed llm_bench results for models run with the transformers, optimum-intel, or GenAI backends, and provides fixes or insights for troubleshooting.
When to Use
- llm_bench fails while running the execution pipeline for the model
Inputs
The user must provide:
- model_dir: path to the directory with the model (e.g. /path/to/tencent_HY-MT1.5-1.8B)
- log_info: path to the folder containing the failed llm_bench log files, to a single llm_bench log file, or the execution output
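As a rough illustration of how log_info could be resolved when it is a path, the sketch below accepts either a single file or a folder. The helper name collect_log_files is hypothetical and not part of llm_bench, and the assumption that every regular file under the folder is a log is ours; execution output pasted as text would need separate handling.

```python
from pathlib import Path


def collect_log_files(log_info: str) -> list[Path]:
    """Return the log files that log_info points to (hypothetical helper)."""
    path = Path(log_info)
    if path.is_file():
        # log_info is a single llm_bench log file.
        return [path]
    if path.is_dir():
        # Assumption: every regular file under the folder is a log to inspect.
        return sorted(p for p in path.rglob("*") if p.is_file())
    raise FileNotFoundError(f"log_info is neither a file nor a folder: {log_info}")
```

The function only reads the filesystem, which keeps it in line with the rule that model_dir and the environment must not be modified.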
Code Structure Reference
When analyzing failures and implementing fixes, refer to the following key locations in the codebase:
Main benchmark script:
- tools/llm_bench/benchmark.py - Entry point for benchmarking, command-line argument parsing, and task orchestration
Model execution pipeline implementations:
- tools/llm_bench/task/text_generation.py - Text generation benchmarking for LLMs
- tools/llm_bench/task/image_generation.py - Image generation (text to image, image to image, inpainting) benchmarking
- tools/llm_bench/task/visual_language_generation.py - VLM benchmarking for multimodal models
- tools/llm_bench/task/video_generation.py - Video generation benchmarking
- tools/llm_bench/task/super_resolution_generation.py - Super resolution benchmarking
- tools/llm_bench/task/speech_to_text_generation.py - Speech-to-text (ASR) benchmarking
- tools/llm_bench/task/text_to_speech_generation.py - Text-to-speech (TTS) benchmarking
- tools/llm_bench/task/text_embeddings.py - Text embedding model benchmarking
- tools/llm_bench/task/text_reranker.py - Text reranking model benchmarking
- tools/llm_bench/task/pipeline_utils.py - Common pipeline utilities and base classes for all tasks
Core utilities:
- tools/llm_bench/llm_bench_utils/config_class.py - Configuration classes, model class definitions, attention backend settings
- tools/llm_bench/llm_bench_utils/model_utils.py - Model utility functions: parameter loading, config parsing, precision handling
- tools/llm_bench/llm_bench_utils/ov_utils.py - OpenVINO model creation and management (GenAI, optimum-intel)
- tools/llm_bench/llm_bench_utils/pt_utils.py - PyTorch model creation and torch.compile support
- tools/llm_bench/llm_bench_utils/ov_model_classes.py - Custom OpenVINO model classes (OVMPTModel, OVChatGLMModel, etc.)
- tools/llm_bench/llm_bench_utils/prompt_utils.py - Prompt loading, preprocessing for text/image/video inputs
- tools/llm_bench/llm_bench_utils/parse_json_data.py - JSON data parsing for prompts and configurations
- tools/llm_bench/llm_bench_utils/get_use_case.py - Use case detection and configuration
Model wrappers for performance measurement with transformers/optimum-intel:
- tools/llm_bench/llm_bench_utils/hook_forward.py - Hooks for image generation, RAG and TTS pipelines
- tools/llm_bench/llm_bench_utils/hook_greedy_search.py - Greedy sampling hooks
- tools/llm_bench/llm_bench_utils/hook_beam_search.py - Beam search sampling hooks
- tools/llm_bench/llm_bench_utils/hook_common.py - Determination of the required hook
- tools/llm_bench/llm_bench_utils/hook_forward_whisper.py - ASR forward hook
- tools/llm_bench/llm_bench_utils/llm_hook_sample/*.py - Version-specific greedy hook implementations for different transformers versions (v4_43, v4_45, v4_51, v4_52, v4_55, v4_57, v5, v5_3)
- tools/llm_bench/llm_bench_utils/llm_hook_beam_search/*.py - Version-specific beam search hook implementations for different transformers versions (v4_43, v4_45, v4_51, v4_52, v4_55, v4_57, v5, v5_3)
Output and reporting:
- tools/llm_bench/llm_bench_utils/metrics_print.py - Metrics printing and logging to console
- tools/llm_bench/llm_bench_utils/output_json.py - JSON output generation
- tools/llm_bench/llm_bench_utils/output_csv.py - CSV output generation
- tools/llm_bench/llm_bench_utils/output_file.py - Construct file name and save output to file
- tools/llm_bench/llm_bench_utils/gen_output_data.py - Convert output data to dict format
Use this reference throughout the Procedure below when analyzing logs and identifying where to implement fixes.
Procedure
Step 1: Analyze the logs
If the logs for llm_bench don't contain failures, proceed to Step 2. Otherwise, follow the next steps:
- Read the corresponding log for the full traceback and context.
- Analyze the failure root cause from the log.
- Determine whether the failure relates to llm_bench or to the backend/model.
- If it's an llm_bench bug/limitation, implement the necessary fixes in the llm_bench tool. Use the Code Structure Reference to locate the exact functions to modify. Follow the OpenVINO GenAI coding guidelines from .github/copilot-instructions.md. Ensure the changes don't break existing functionality. Add appropriate error messages and logging. Test the changes by re-running the llm_bench tool with the corresponding command-line parameters.
- If it's a model issue or backend limitation, provide a description in the report.
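The triage above can be sketched as a first-pass heuristic: pull the last Python traceback out of the log and check whether the innermost frame lives under tools/llm_bench/. The names FRAME_RE, last_traceback and classify_failure are hypothetical and not part of llm_bench, and the sketch assumes tracebacks appear in the log in the standard, unprefixed Python format. An llm_bench bug can still surface inside a library frame (for example, a wrong argument passed to the backend), so the result is only a starting point for reading the full traceback.

```python
import re

# Matches frame lines of a standard Python traceback:
#   File "<path>", line <N>, in <function>
FRAME_RE = re.compile(r'File "(?P<path>[^"]+)", line \d+, in \S+')
HEADER = "Traceback (most recent call last):"


def last_traceback(log_text: str) -> list[str]:
    """Return the lines of the last traceback in the log, or [] if there is none."""
    lines = log_text.splitlines()
    starts = [i for i, line in enumerate(lines) if line.startswith(HEADER)]
    if not starts:
        return []
    block = []
    for line in lines[starts[-1]:]:
        block.append(line)
        # Frame lines are indented; the first non-indented line after the
        # header is the exception message, which ends the traceback.
        if len(block) > 1 and line and not line[0].isspace():
            break
    return block


def classify_failure(traceback_lines: list[str]) -> str:
    """Guess whether the exception was raised in llm_bench or in the backend/model."""
    frames = [m.group("path") for line in traceback_lines
              if (m := FRAME_RE.search(line))]
    if not frames:
        return "unknown"
    # The innermost frame (last listed) is where the exception was raised.
    innermost = frames[-1].replace("\\", "/")
    return "llm_bench" if "tools/llm_bench/" in innermost else "backend/model"
```

The last line returned by last_traceback is the exception message, which is a natural candidate for the "Error message" field of the report.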
Step 2: Report Results
Add the results for each llm_bench run to the logs. Results format:
Model Information
- Model dir: <model_dir>
- Task: <task>
- Status: PASSED / FAILED
- Log:
- Issue:
- Error message:
- Possible fix:
- LLM Bench modification: modified / nothing changed
- After fix status: PASSED / FAILED (only if the run initially failed)
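One way to keep report entries consistent is to render them from a small record. The sketch below is illustrative only: RunReport and render_report are hypothetical names, the default values are assumptions, and the after-fix line is emitted only for failed runs, which is our reading of the format above.

```python
from dataclasses import dataclass


@dataclass
class RunReport:
    """One llm_bench run, mirroring the fields of the results format (hypothetical)."""
    model_dir: str
    task: str
    status: str  # "PASSED" or "FAILED"
    log: str = ""
    issue: str = ""
    error_message: str = ""
    possible_fix: str = ""
    modification: str = "nothing changed"  # or "modified"
    after_fix_status: str = ""  # filled in only when the run failed


def render_report(report: RunReport) -> str:
    """Render a report entry as plain text in the results format."""
    lines = [
        "Model Information",
        f"- Model dir: {report.model_dir}",
        f"- Task: {report.task}",
        f"- Status: {report.status}",
        f"- Log: {report.log}",
        f"- Issue: {report.issue}",
        f"- Error message: {report.error_message}",
        f"- Possible fix: {report.possible_fix}",
        f"- LLM Bench modification: {report.modification}",
    ]
    if report.status == "FAILED":
        # Assumption: the after-fix status is reported only for failed runs.
        lines.append(f"- After fix status: {report.after_fix_status}")
    return "\n".join(lines)
```

Calling render_report once per run and appending the result to the log keeps the field order identical across models.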
Security
- NEVER install any packages. Assume the environment is pre-configured.
- NEVER modify model_dir; pass it exactly as provided by the user.