inferencex-report


Automatically fetch InferenceX benchmark data and generate daily performance reports for LLM inference on various hardware (NVIDIA, AMD, etc.). Supports email delivery, data change detection, and 8k1k sequence length performance analysis. Use when needing to track LLM inference performance trends, compare hardware configurations, or monitor benchmark updates.

By ascend-ai-coding. Updated 6/9/2026

name: inferencex-report
description: Automatically fetch InferenceX benchmark data and generate daily performance reports for LLM inference on various hardware (NVIDIA, AMD, etc.). Supports email delivery, data change detection, and 8k1k sequence length performance analysis. Use when needing to track LLM inference performance trends, compare hardware configurations, or monitor benchmark updates.
keywords:
  - inferencex
  - benchmark
  - llm inference
  - performance report
  - nvidia
  - amd
  - ascend
  - deepseek
  - llama
  - qwen
  - daily report
  - inference benchmark


InferenceX Report - LLM Inference Benchmark Tracker

Automatically fetch InferenceX benchmark data and generate daily performance reports for LLM inference on various hardware platforms.

Overview

This skill fetches the latest benchmark data from the InferenceX API, generates a performance report, and emails it. The report contains:

  • Daily benchmark data summary
  • Hardware-Model-Framework performance matrix
  • 8k1k sequence length detailed analysis
  • Data change detection (new combinations, performance changes)

Supported Models

| Model | Records (2026-06-02) |
|---|---|
| DeepSeek-R1-0528 | 1,839 |
| gpt-oss-120b | 617 |
| Llama-3.3-70B-Instruct-FP8 | 681 |
| Qwen-3.5-397B-A17B | 650 |
| Kimi-K2.5 | 268 |
| MiniMax-M2.5 | 840 |
| GLM-5 | 367 |
| DeepSeek-V4-Pro | 607 |
| Total | 5,869 |

Supported Hardware

  • NVIDIA: B200, B300, GB200, GB300, H100, H200
  • AMD: MI300X, MI325X, MI355X

Supported Frameworks

  • vLLM, SGLang, TensorRT-LLM
  • Dynamo, Dynamo-Serve, Dynamo-Disaggregated
  • Atom, Mori-SGLang

Quick Start

Run Report Generation

python3 scripts/inferencex_api_report.py

API Endpoint

# Get latest benchmark data
curl -s "https://inferencex.semianalysis.com/api/v1/benchmarks?model=DeepSeek-R1-0528" | gunzip

# Get specific date
curl -s "https://inferencex.semianalysis.com/api/v1/benchmarks?model=DeepSeek-R1-0528&date=2026-06-02&exact=true" | gunzip
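
The same requests can be made from Python. The sketch below is an illustration, not the code in scripts/inferencex_api_report.py: the helper names `build_url`, `decode_body`, and `fetch_benchmarks` are hypothetical, and it uses `urllib` from the standard library so that it runs without extra packages. Because the Data Source section says only some models are gzip compressed, it decompresses only when the body starts with the gzip magic bytes rather than always piping through gunzip.

```python
import gzip
import json
import urllib.parse
import urllib.request

BASE_URL = "https://inferencex.semianalysis.com/api/v1/benchmarks"


def build_url(model, date=None, exact=False):
    """Build the query URL; `date` and `exact` mirror the curl examples."""
    params = {"model": model}
    if date is not None:
        params["date"] = date
    if exact:
        params["exact"] = "true"
    return BASE_URL + "?" + urllib.parse.urlencode(params)


def decode_body(raw):
    """Parse a response body, gunzipping first only if it looks like gzip."""
    if raw[:2] == b"\x1f\x8b":  # gzip magic bytes
        raw = gzip.decompress(raw)
    return json.loads(raw.decode("utf-8"))


def fetch_benchmarks(model, date=None, exact=False, timeout=30):
    """Fetch and decode benchmark data (performs a network request)."""
    url = build_url(model, date, exact)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return decode_body(resp.read())
```

Calling `fetch_benchmarks("DeepSeek-R1-0528", date="2026-06-02", exact=True)` corresponds to the second curl command. The shape of the returned JSON is not documented here, so inspect it before relying on specific fields.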

Report Contents

1. Data Update Summary

  • New data availability check
  • New hardware-model-framework-precision combinations
  • Performance changes (>5% threshold)
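
A minimal sketch of how such a comparison could work, assuming each day's data has been reduced to a dictionary that maps a (hardware, model, framework, precision) tuple to a throughput value. The function name `detect_changes` and that data layout are assumptions for illustration; the actual script may organize its data differently.

```python
def detect_changes(previous, current, threshold=0.05):
    """Compare two {combo_key: throughput} snapshots.

    Returns (new_combos, changed), where `changed` maps a combination to its
    relative change when the absolute change exceeds `threshold`.
    """
    new_combos = sorted(key for key in current if key not in previous)
    changed = {}
    for key, new_value in current.items():
        old_value = previous.get(key)
        if not old_value:  # missing or zero baseline: no ratio to compute
            continue
        relative = (new_value - old_value) / old_value
        if abs(relative) > threshold:
            changed[key] = relative
    return new_combos, changed
```

With the default threshold, a move from 1000 to 1100 tok/s/GPU (10%) is reported, while a move from 2000 to 2050 (2.5%) is not.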

2. Data Overview

  • Total combinations count
  • Hardware/Model/Framework coverage
  • Best throughput records
  • NVIDIA vs AMD performance comparison
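
The overview figures can be derived from a flat list of records. The following sketch assumes each record is a dictionary with `hardware`, `model`, `framework`, `precision`, and `throughput` keys; those field names and the `summarize` helper are hypothetical. The vendor mapping comes from the Supported Hardware list.

```python
VENDOR_BY_HARDWARE = {
    **dict.fromkeys(["B200", "B300", "GB200", "GB300", "H100", "H200"], "NVIDIA"),
    **dict.fromkeys(["MI300X", "MI325X", "MI355X"], "AMD"),
}


def summarize(records):
    """Count distinct combinations, list coverage, and find the best record per vendor."""
    combos = {
        (r["hardware"], r["model"], r["framework"], r["precision"]) for r in records
    }
    best_by_vendor = {}
    for r in records:
        vendor = VENDOR_BY_HARDWARE.get(r["hardware"], "Other")
        best = best_by_vendor.get(vendor)
        if best is None or r["throughput"] > best["throughput"]:
            best_by_vendor[vendor] = r
    return {
        "total_combinations": len(combos),
        "hardware": sorted({r["hardware"] for r in records}),
        "models": sorted({r["model"] for r in records}),
        "frameworks": sorted({r["framework"] for r in records}),
        "best_by_vendor": best_by_vendor,
    }
```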

3. 8k1k Performance Matrix ⭐

  • Sequence Length: ISL=8192, OSL=1024 (typical RAG scenario)
  • Filter: interactivity > 20 tps
  • Format: Hardware (rows) × Model (columns)
  • Color Coding:
    • 🟢 Green: > 10k tok/s/GPU (high performance)
    • 🟠 Orange: > 5k and ≤ 10k tok/s/GPU (medium performance)
    • ⚪ White: ≤ 5k tok/s/GPU (normal performance)
    • ➖ Gray "-": Not supported or no data
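
The filter, pivot, and color rules above can be sketched as follows. The record field names (`isl`, `osl`, `interactivity`, `throughput`) and both helper names are assumptions. The text does not say how a cell value is chosen when several records match, so this sketch keeps the highest throughput, which is also an assumption.

```python
def color_for(throughput):
    """Map a tok/s/GPU value (or None for no data) to the report's color name."""
    if throughput is None:
        return "gray"
    if throughput > 10_000:
        return "green"
    if throughput > 5_000:
        return "orange"
    return "white"


def build_8k1k_matrix(records, min_interactivity=20.0):
    """Return {hardware: {model: best throughput}} for ISL=8192, OSL=1024."""
    matrix = {}
    for r in records:
        if r["isl"] != 8192 or r["osl"] != 1024:
            continue  # not the 8k1k combination
        if r["interactivity"] <= min_interactivity:
            continue  # below the interactivity filter
        row = matrix.setdefault(r["hardware"], {})
        previous_best = row.get(r["model"], float("-inf"))
        row[r["model"]] = max(previous_best, r["throughput"])
    return matrix
```

A cell with no entry in the matrix is rendered as gray "-" by passing `None` to `color_for`, for example `color_for(matrix.get("H200", {}).get("GLM-5"))`.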

Sequence Length Combinations

| Combo | ISL | OSL | Scenario | Performance |
|---|---|---|---|---|
| 1k1k | 1024 | 1024 | Standard chat | Medium |
| 8k1k | 8192 | 1024 | RAG/Search (long input, short output) | Best |
| 1k8k | 1024 | 8192 | Code gen (short input, long output) | Lower |

Why is 8k1k fastest?

  • Prefill processes all 8192 input tokens in parallel, which keeps GPU compute fully utilized
  • Decode generates only 1024 tokens, so less time is spent in the sequential, one-token-at-a-time autoregressive phase
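
A rough back-of-the-envelope illustration of this argument, under two simplifying assumptions that are not stated in the source: throughput counts both input and output tokens, and a request needs one forward pass for the whole prompt (which also yields the first output token) plus one pass per remaining output token. The function name is hypothetical.

```python
def tokens_per_sequential_pass(isl, osl):
    """Idealized tokens handled per sequential forward pass for one request.

    One prefill pass covers all `isl` prompt tokens and emits the first output
    token; each of the remaining `osl - 1` tokens needs its own decode pass,
    so a request takes `osl` passes in total.
    """
    return (isl + osl) / osl


for name, isl, osl in [("1k1k", 1024, 1024), ("8k1k", 8192, 1024), ("1k8k", 1024, 8192)]:
    print(name, tokens_per_sequential_pass(isl, osl))
```

Under these assumptions 8k1k handles 9 tokens per sequential pass, 1k1k handles 2, and 1k8k about 1.1, which matches the ordering in the table. Real throughput also depends on batching, memory bandwidth, and the serving framework.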

Configuration

Email Settings

Edit scripts/inferencex_api_report.py:

SMTP_SERVER = "smtp.163.com"
SMTP_PORT = 465
SENDER_EMAIL = "your-email@163.com"
SENDER_PASSWORD = "your-auth-code"  # SMTP authorization code, not the account login password
RECIPIENT_EMAIL = "recipient@example.com"
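
For orientation, here is a hedged sketch of how settings like these are typically used with the standard library. Port 465 implies an implicit TLS connection, hence `smtplib.SMTP_SSL`. The function names `build_message` and `send_report` are hypothetical and may not match the script.

```python
import smtplib
from email.message import EmailMessage

SMTP_SERVER = "smtp.163.com"
SMTP_PORT = 465


def build_message(sender, recipient, subject, html_body):
    """Build a multipart email with a plain-text fallback and an HTML body."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content("This report is best viewed in an HTML-capable mail client.")
    msg.add_alternative(html_body, subtype="html")
    return msg


def send_report(msg, password):
    """Send the message over implicit TLS (performs a network connection)."""
    with smtplib.SMTP_SSL(SMTP_SERVER, SMTP_PORT) as server:
        server.login(msg["From"], password)
        server.send_message(msg)
```

To keep the authorization code out of the source file, one option is to read it from an environment variable (for example a hypothetical `INFERENCEX_SMTP_PASSWORD`) and pass it to `send_report`.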

Cron Job (Optional)

# Add to crontab for daily 9:00 AM execution
0 9 * * * cd /path/to/skill && python3 scripts/inferencex_api_report.py

Output Files

data/inferencex/
├── inferencex_summary_YYYY-MM-DD.csv    # Full performance data
├── inferencex_summary_YYYY-MM-DD.json   # Raw data for comparison
└── email_YYYY-MM-DD.html                # Email content backup
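
The naming scheme above can be expressed as a small helper, which is handy for locating a previous day's JSON file for comparison. The helper name `output_paths` is hypothetical; only the directory and file-name patterns come from the listing.

```python
from datetime import date
from pathlib import Path


def output_paths(day, root="data/inferencex"):
    """Return the three output file paths for a given date."""
    stamp = day.isoformat()  # YYYY-MM-DD
    base = Path(root)
    return {
        "csv": base / f"inferencex_summary_{stamp}.csv",
        "json": base / f"inferencex_summary_{stamp}.json",
        "email": base / f"email_{stamp}.html",
    }
```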

Data Source

  • API: https://inferencex.semianalysis.com/api/v1/benchmarks
  • Update Frequency: Real-time
  • Format: JSON (gzip compressed for some models)

Requirements

  • Python 3.8+
  • requests (the only third-party dependency)
  • Everything else uses the Python standard library

License

MIT License - See repository for details.

Author

Created for the Ascend AI Coding community.

Install via CLI
npx skills add https://github.com/ascend-ai-coding/awesome-ascend-skills --skill inferencex-report