llm-bench-fail-analyzer

star 530

Analyze failed llm_bench execution results for a model. Use when: checking llm_bench log, troubleshooting llm_bench fails.

openvinotoolkit By openvinotoolkit schedule Updated 6/9/2026

name: llm-bench-fail-analyzer description: "Analyze failed llm_bench execution results for a model. Use when: checking llm_bench log, troubleshooting llm_bench fails." argument-hint: "model_dir and log_info (e.g. /path/to/tencent_HY-MT1.5-1.8B /path/to/logs/for/llm_bench)"

LLM Bench Fail Analyzer

Analyzes failed llm bench results for models, which were run with transformers, optimum-intel or GenAI backends; Provides fixes or insights for troubleshooting.

When to Use

  • llm_bench fails during execution pipeline with the model

Inputs

The user must provide:

  • model_dir: path to the directory with the model (e.g. /path/to/tencent_HY-MT1.5-1.8B)
  • log_info: path to the folder containing the failed llm_bench log files or llm_bench log file or execution output

Code Structure Reference

When analyzing failures and implementing fixes, refer to the following key locations in the codebase:

Main benchmark script:

  • tools/llm_bench/benchmark.py - Entry point for benchmarking, command-line argument parsing, and task orchestration

Model execution pipeline implementations:

  • tools/llm_bench/task/text_generation.py - Text generation benchmarking for LLMs
  • tools/llm_bench/task/image_generation.py - Image generation(text to image, image to image, inpainting) benchmarking
  • tools/llm_bench/task/visual_language_generation.py - VLM benchmarking for multimodal models
  • tools/llm_bench/task/video_generation.py - Video generation benchmarking
  • tools/llm_bench/task/super_resolution_generation.py - Super resolution benchmarking
  • tools/llm_bench/task/speech_to_text_generation.py - Speech-to-text (ASR) benchmarking
  • tools/llm_bench/task/text_to_speech_generation.py - Text-to-speech (TTS) benchmarking
  • tools/llm_bench/task/text_embeddings.py - Text embedding model benchmarking
  • tools/llm_bench/task/text_reranker.py - Text reranking model benchmarking
  • tools/llm_bench/task/pipeline_utils.py - Common pipeline utilities and base classes for all tasks

Core utilities:

  • tools/llm_bench/llm_bench_utils/config_class.py - Configuration classes, model class definitions, attention backend settings
  • tools/llm_bench/llm_bench_utils/model_utils.py - Model utility functions: parameter loading, config parsing, precision handling
  • tools/llm_bench/llm_bench_utils/ov_utils.py - OpenVINO model creation and management (GenAI, optimum-intel)
  • tools/llm_bench/llm_bench_utils/pt_utils.py - PyTorch model creation and torch.compile support
  • tools/llm_bench/llm_bench_utils/ov_model_classes.py - Custom OpenVINO model classes (OVMPTModel, OVChatGLMModel, etc.)
  • tools/llm_bench/llm_bench_utils/prompt_utils.py - Prompt loading, preprocessing for text/image/video inputs
  • tools/llm_bench/llm_bench_utils/parse_json_data.py - JSON data parsing for prompts and configurations
  • tools/llm_bench/llm_bench_utils/get_use_case.py - Use case detection and configuration

Model wrappers for performance measurement with transformers/optimum-intel:

  • tools/llm_bench/llm_bench_utils/hook_forward.py - hooks for image generation, RAG and TTS pipelines
  • tools/llm_bench/llm_bench_utils/hook_greedy_search.py - Greedy sampling hooks
  • tools/llm_bench/llm_bench_utils/hook_beam_search.py - Beam search sampling hooks
  • tools/llm_bench/llm_bench_utils/hook_common.py - determination of the required hook
  • tools/llm_bench/llm_bench_utils/hook_forward_whisper.py - ASR forward hook
  • tools/llm_bench/llm_bench_utils/llm_hook_sample/*.py - version-specific greedy hook implementations for different transformers versions (v4_43, v4_45, v4_51, v4_52, v4_55, v4_57, v5, v5_3)
  • tools/llm_bench/llm_bench_utils/llm_hook_beam_search/*.py - version-specific beam search hook implementations for different transformers versions (v4_43, v4_45, v4_51, v4_52, v4_55, v4_57, v5, v5_3)

Output and reporting:

  • tools/llm_bench/llm_bench_utils/metrics_print.py - Metrics printing and logging to console
  • tools/llm_bench/llm_bench_utils/output_json.py - JSON output generation
  • tools/llm_bench/llm_bench_utils/output_csv.py - CSV output generation
  • tools/llm_bench/llm_bench_utils/output_file.py - construct file name and save output to file
  • tools/llm_bench/llm_bench_utils/gen_output_data.py - convert output data to dict format

Use this reference throughout Steps 1-3 when analyzing logs and identifying where to implement fixes.

Procedure

Step 1: Analyze the logs

If the logs for llm_bench don't contain failures, proceed to Step 2. Otherwise, follow the next steps:

  • Read the corresponding log for the full traceback and context.
  • Analyze the failure root cause from the log.
  • Define whether fail relates to llm_bench or backend/model.
  • If it's a llm_bench bug/limitation, implement necessary fixes to llm_bench tool. Use the Code Structure Reference to locate the exact functions to modify. Follow OpenVINO GenAI coding guidelines from .github/copilot-instructions.md. Ensure changes don't break existing functionality. Add appropriate error messages and logging. Test changes by re-running llm_bench tool with corresponding cmd parameters.
  • If it's a model issue or backend limitation, provide description in the report.

Step 2: Report Results

Add the results for each llm_bench run to the logs. Results format:

Model Information

  • Model dir: <model_dir>
  • Task: <task>
  • Status: PASSED / FAILED
  • Log:
  • Issue:
  • Error message:
  • Possible fix:
  • LLM Bench modification: modified / nothing changed
  • After fix status: <PASSED / FAILED if failed>

Security

  • NEVER install any packages. Assume the environment is pre-configured.
  • NEVER modify model_dir — pass it exactly as provided by the user.
Install via CLI
npx skills add https://github.com/openvinotoolkit/openvino.genai --skill llm-bench-fail-analyzer
Repository Details
star Stars 530
call_split Forks 406
navigation Branch main
article Path SKILL.md
More from Creator
openvinotoolkit
openvinotoolkit Explore all skills →