name: hf-perftest
description: Benchmark Gradio app performance. Use when the user wants to load test a Gradio app, compare branches, profile a HF Space, or run A/B tests.
argument-hint: [run|run-remote|result-schema] [options]
allowed-tools: Bash(hf-perftest *)
hf-perftest — Gradio Performance Benchmarking
Installation
pip install hf-perftest
Commands
List built-in apps
hf-perftest list-apps
Built-in apps can be used by name (no file path needed): echo_text, file_heavy, image_to_image, streaming_chat, stateful_counter, llm_chat, text_to_image, audio_to_audio, video_to_video.
Local benchmark
hf-perftest run \
--app echo_text \
--tiers 1,10,100 \
--requests-per-user 10 \
--output-dir results
Key options:
--app: Built-in app name or path to a Gradio app file (required)
--tiers: Comma-separated concurrent user counts (default: 1,10,100)
--requests-per-user: Requests (rounds) each user sends per tier (default: 10)
--mode burst|wave: Simultaneous or staggered requests (default: burst)
--concurrency-limit: App concurrency limit (default: 1; "none" for unlimited)
--mixed-traffic: Add background page loads, uploads, and downloads alongside predictions
--num-workers: Number of Gradio workers, set via GRADIO_NUM_WORKERS (default: 1)
--port: App port (default: 7860; auto-increments if occupied)
--api-name: Target API endpoint (auto-detected if omitted)
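The following sketch shows how tiers, rounds, and modes fit together. It is a minimal tiered load generator in plain asyncio and is not hf-perftest's implementation. Several parts are assumptions. `fake_predict` is a hypothetical stand-in for one prediction request. Synchronizing users round by round is a guess at how "rounds" behave. `stagger_s` is an illustrative parameter with no matching CLI flag. In burst mode every user fires at once in each round. In wave mode, user `i` waits `i * stagger_s` seconds before sending.

```python
import asyncio
import random
import time


async def fake_predict() -> float:
    """Hypothetical stand-in for one prediction request; returns latency in ms."""
    start = time.perf_counter()
    await asyncio.sleep(random.uniform(0.001, 0.005))  # simulated server work
    return (time.perf_counter() - start) * 1000


async def run_tier(users: int, rounds: int, mode: str = "burst",
                   stagger_s: float = 0.001) -> list[float]:
    """Run `rounds` rounds with `users` concurrent users; collect all latencies."""
    if mode not in ("burst", "wave"):
        raise ValueError(f"unknown mode: {mode!r}")

    async def one_request(user_id: int) -> float:
        if mode == "wave":
            # Staggered: user i starts i * stagger_s seconds into the round.
            await asyncio.sleep(user_id * stagger_s)
        return await fake_predict()

    latencies: list[float] = []
    for _ in range(rounds):
        # Each round waits for every user's request before the next round starts.
        # In burst mode all of a round's requests are dispatched together.
        latencies.extend(await asyncio.gather(*(one_request(u) for u in range(users))))
    return latencies


def run_benchmark(tiers: list[int], requests_per_user: int,
                  mode: str = "burst") -> dict[int, list[float]]:
    """Run each tier in turn, loosely mirroring --tiers and --requests-per-user."""
    return {t: asyncio.run(run_tier(t, requests_per_user, mode)) for t in tiers}


if __name__ == "__main__":
    results = run_benchmark([1, 10, 100], requests_per_user=10)
    for tier, samples in results.items():
        print(tier, len(samples))  # users x rounds requests per tier
```

With the settings from the local run example (tiers 1,10,100 and 10 requests per user), this sketch sends 10, 100 and 1,000 requests per tier. Latency is timed inside `fake_predict`, so the wave-mode stagger delay is not counted as request latency.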
Remote benchmark on HF Jobs
Single branch:
hf-perftest run-remote run \
--apps echo_text streaming_chat \
--branch main \
--hardware cpu-upgrade \
--tiers 1,10,100
A/B test:
hf-perftest run-remote ab \
--apps echo_text file_heavy \
--base main \
--branch my-optimization \
--hardware cpu-upgrade \
--tiers 1,10,100
Profile a HF Space:
hf-perftest run-remote run \
--apps owner/space-name \
--sidecar prompts.json \
--api-name /generate \
--branch main \
--hardware gpu-l4-1
Additional remote options:
--hardware: HF Jobs flavor (default: cpu-basic)
--sidecar: Sidecar prompt file(s) for Spaces (see Sidecar Prompt Files below)
--timeout: Job timeout (default: 90m)
--dry-run: Preview the job without submitting it
--run-name: Label for the run
Result schema
hf-perftest result-schema
Prints the directory structure of benchmark results.
Sidecar Prompt Files
For apps with non-text inputs, create a .prompts.json sidecar file. It can use either of two formats:
String list (text-only inputs — replaces the first text input):
["A cat sitting on a windowsill", "Sunset over a mountain lake"]
List of lists (full data payloads — sent as-is):
[
["A cat sitting on a windowsill", 1024, 1024, 4, 42, true],
["Sunset over a mountain lake", 1024, 1024, 4, 42, true]
]
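You may want to check a file against the two formats above before passing it to --sidecar. The helper below is a hypothetical sketch and is not part of hf-perftest. It uses only Python's json module. It classifies a parsed sidecar as a string list or a list of lists and rejects anything else. It also requires all payloads to have the same length. That check is an extra safeguard this sketch adds, on the assumption that every payload targets the same endpoint signature.

```python
import json
import sys
from pathlib import Path
from typing import Any


def classify_sidecar(data: Any) -> str:
    """Return "string_list" or "list_of_lists" for a parsed sidecar, else raise."""
    if not isinstance(data, list) or not data:
        raise ValueError("sidecar must be a non-empty JSON array")
    if all(isinstance(item, str) for item in data):
        # Text-only prompts: each one replaces the app's first text input.
        return "string_list"
    if all(isinstance(item, list) for item in data):
        # Full payloads are sent as-is, so this sketch requires one shared
        # length (an assumption, not a documented hf-perftest rule).
        arities = {len(item) for item in data}
        if len(arities) != 1:
            raise ValueError(f"payloads have different lengths: {sorted(arities)}")
        return "list_of_lists"
    raise ValueError("entries must be all strings or all lists, not a mix")


def check_sidecar_file(path: Path) -> str:
    """Parse a .prompts.json file and report which format it uses."""
    return classify_sidecar(json.loads(path.read_text(encoding="utf-8")))


if __name__ == "__main__":
    # Check every file path given on the command line.
    for name in sys.argv[1:]:
        print(name, check_sidecar_file(Path(name)))
```

To use it, save the sketch as a script under any name you like and run it with one or more sidecar paths. It prints the detected format for each file, or raises an error that names the problem.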
Interpreting Results
Results are saved to <output-dir>/<timestamp>/summary.json with per-tier breakdowns:
client_summary: p50/p90/p95/p99 client latency in ms, plus success rate
server_summary: Per-phase server timing (queue_wait, preprocess, fn_call, postprocess, total)
background_traffic: p50/p90/p99 for page loads, uploads, and downloads (only with --mixed-traffic)
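The percentile fields summarize the latency distribution. p50 is the median request, and p99 is the latency that 99% of requests stayed at or below. The sketch below shows how such a summary can be derived from raw samples using the nearest-rank method. This document does not say which percentile method hf-perftest uses. The dictionary keys are illustrative and are not the actual summary.json schema. Computing percentiles over successful requests only is also an assumption of this sketch.

```python
import math


def percentile(sorted_ms: list[float], p: int) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    if not sorted_ms:
        raise ValueError("no successful samples")
    rank = math.ceil(p * len(sorted_ms) / 100)  # 1-based rank
    return sorted_ms[max(rank, 1) - 1]


def client_summary(latencies_ms: list[float], failures: int) -> dict[str, float]:
    """Percentiles over successful requests plus the overall success rate."""
    ok = sorted(latencies_ms)
    summary = {f"p{p}": percentile(ok, p) for p in (50, 90, 95, 99)}
    summary["success_rate"] = len(ok) / (len(ok) + failures)
    return summary


if __name__ == "__main__":
    samples = [float(ms) for ms in range(1, 101)]  # 1..100 ms
    print(client_summary(samples, failures=0))
    # {'p50': 50.0, 'p90': 90.0, 'p95': 95.0, 'p99': 99.0, 'success_rate': 1.0}
```

Under nearest-rank, a tier with fewer than 100 samples has a p99 equal to its slowest request. For example, a single user sending 10 requests yields only 10 samples, so its p99 is the maximum.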
Always validate findings with multiple runs; a single run can be skewed by system variance.
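One way to act on this is to repeat each configuration several times, for example both sides of an A/B test. You then compare the mean p50 across runs against the run-to-run spread. The sketch below assumes you have copied the p50 for one tier out of each run's summary.json. Its threshold of `k` standard deviations is a rough rule of thumb, not a formal significance test, and hf-perftest does not compute it.

```python
import statistics


def run_spread(values_ms: list[float]) -> tuple[float, float]:
    """Mean and sample standard deviation of one metric across repeated runs."""
    if len(values_ms) < 2:
        raise ValueError("need at least two runs to estimate run-to-run variance")
    return statistics.mean(values_ms), statistics.stdev(values_ms)


def compare_runs(base_ms: list[float], branch_ms: list[float], k: float = 2.0) -> dict:
    """Change in mean from base to branch, flagged only if it clears the noise."""
    base_mean, base_sd = run_spread(base_ms)
    branch_mean, branch_sd = run_spread(branch_ms)
    delta = branch_mean - base_mean
    # Heuristic: treat the change as real only if it exceeds k times the
    # larger of the two run-to-run standard deviations.
    noise = k * max(base_sd, branch_sd)
    return {
        "delta_ms": delta,
        "delta_pct": 100 * delta / base_mean,
        "exceeds_noise": abs(delta) > noise,
    }


if __name__ == "__main__":
    # Hypothetical p50 values (ms) from three runs of each branch at one tier.
    base = [100.0, 102.0, 98.0]
    branch = [80.0, 81.0, 79.0]
    print(compare_runs(base, branch))
```

Because these values are latencies, a negative `delta_ms` means the branch is faster. Two or three runs per side give only a coarse estimate of the spread, so treat borderline results with caution.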
Monitoring Remote Jobs
hf jobs logs <job_id>
hf jobs inspect <job_id>