jmh-benchmark-compare

name: jmh-benchmark-compare
description: Parse JMH result text by finding the first header line that starts with Benchmark and contains Mode and Score, build a structured table for all columns/rows, compare overlapping benchmarks across 2+ files, compute Diff Score and Diff %, filter by deviation or regression thresholds, analyze regressions over time from filename/mtime timestamps, and export sortable reports to txt/md/csv/xlsx/html. Use for benchmark run comparisons, regression triage, and directory-wide historical analysis.

Use this skill when benchmark output comparison must be reproducible, sortable, and exportable.

Quick start

Run a two-file comparison:

python3 .codex/skills/jmh-benchmark-compare/scripts/jmh_benchmark_compare.py \
  /path/run-a.txt /path/run-b.txt \
  --export-formats txt,md,csv,xlsx,html \
  --output-dir /tmp \
  --output-base jmh-compare

Sort by diff percent (descending):

python3 .codex/skills/jmh-benchmark-compare/scripts/jmh_benchmark_compare.py \
  run-a.txt run-b.txt \
  --sort-column "Diff % [run-b - run-a]" \
  --sort-desc \
  --export-formats md \
  --output /tmp/jmh-diff.md

Core behavior

Detect the first JMH table header line, i.e. the first line where line.startswith("Benchmark") and "Mode" in line and "Score" in line.
Derive column boundaries from that header.
Parse all following benchmark rows into an internal table.
Match overlapping benchmark keys across files.
Add derived columns: Diff Score [target - baseline], Diff % [target - baseline], Status [...].
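
The detection and parsing steps above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the function names find_header_index and parse_jmh_table are hypothetical, and for brevity it splits cells on whitespace instead of slicing at header-derived column boundaries, so rows with empty cells are skipped rather than parsed.

```python
def find_header_index(lines):
    """Return the index of the first JMH table header line, or None."""
    for i, line in enumerate(lines):
        if line.startswith("Benchmark") and "Mode" in line and "Score" in line:
            return i
    return None


def parse_jmh_table(text):
    """Parse the result table into a list of {column: cell} dicts.

    Simplification (assumption): cells are whitespace-separated, so a row
    with an empty cell has too few tokens and is dropped.
    """
    lines = text.splitlines()
    start = find_header_index(lines)
    if start is None:
        return []
    columns = lines[start].split()
    rows = []
    for line in lines[start + 1:]:
        tokens = line.split()
        if not tokens:
            break  # assumed: a blank line ends the table
        # Re-join the "± 0.123" error cell, which contains a space.
        cells = []
        for token in tokens:
            if cells and cells[-1] == "±":
                cells[-1] = "± " + token
            else:
                cells.append(token)
        if len(cells) == len(columns):
            rows.append(dict(zip(columns, cells)))
    return rows
```

Slicing at fixed column offsets, as the script is described to do, copes with empty cells; the whitespace split is only the shortest way to show the overall flow.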

Default key columns are all columns except Cnt, Score, Error. Override via --id-columns.
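
As a hedged sketch of how key matching and the derived columns might fit together, assume each parsed row is a dict keyed by column name. The names row_key and diff_rows are hypothetical, and the percentage formula (change relative to the baseline score) is an assumption; the source only states that the diff is target minus baseline.

```python
NON_KEY_COLUMNS = {"Cnt", "Score", "Error"}


def row_key(row, id_columns=None):
    """Build the match key; by default every column except Cnt/Score/Error."""
    if id_columns is None:
        id_columns = [c for c in row if c not in NON_KEY_COLUMNS]
    return tuple(row.get(c, "") for c in id_columns)


def diff_rows(baseline_rows, target_rows, id_columns=None):
    """Return (key, diff_score, diff_pct) for benchmarks present in both runs."""
    baseline = {row_key(r, id_columns): float(r["Score"]) for r in baseline_rows}
    results = []
    for row in target_rows:
        key = row_key(row, id_columns)
        if key not in baseline:
            continue  # not an overlapping benchmark
        base, score = baseline[key], float(row["Score"])
        diff = score - base
        # Assumed definition: percentage change relative to the baseline.
        pct = diff / base * 100.0 if base else None
        results.append((key, diff, pct))
    return results
```

Because Cnt is excluded from the key, two runs with different iteration counts still match on the same benchmark.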

Inputs and overlap

Pass any mix of files and directories.
Directory entries are scanned for files that contain a JMH header.
--overlap-mode all keeps only rows present in all files.
--overlap-mode any keeps rows present in at least two files.
Baseline selection: --baseline <index-or-label>.
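
The two overlap modes amount to a set intersection versus a "seen at least twice" count. A minimal sketch, assuming one set of benchmark keys per input file (overlapping_keys is a hypothetical helper, not a function from the script):

```python
from collections import Counter


def overlapping_keys(key_sets, mode="all"):
    """Select the benchmark keys to keep, given one set of keys per file."""
    if not key_sets:
        return set()
    if mode == "all":
        # Keep keys present in every file.
        return set.intersection(*key_sets)
    if mode == "any":
        # Keep keys present in at least two files.
        counts = Counter(key for keys in key_sets for key in keys)
        return {key for key, n in counts.items() if n >= 2}
    raise ValueError(f"unknown overlap mode: {mode!r}")
```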

Filters and regression shortcuts

Hide tiny deltas: --min-deviation-pct 1.0
Show only regressions above threshold: --regressions-over-pct 3.0
Control direction interpretation: --score-direction auto|higher|lower
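
One way to picture how these three options interact is the sketch below. The helper names are hypothetical and the "auto" rule is an assumption: it treats JMH throughput mode (thrpt) as higher-is-better and every other mode as lower-is-better, which may differ from what the script actually does.

```python
def higher_is_better(mode, score_direction="auto"):
    """Decide whether a larger score counts as an improvement."""
    if score_direction == "higher":
        return True
    if score_direction == "lower":
        return False
    # Assumed "auto" rule: thrpt measures operations per unit of time, so
    # higher is better; time-per-operation modes are the reverse.
    return mode == "thrpt"


def regression_pct(diff_pct, mode, score_direction="auto"):
    """Return how far the score moved in the bad direction (0 if it improved)."""
    worse = -diff_pct if higher_is_better(mode, score_direction) else diff_pct
    return max(worse, 0.0)


def keep_row(diff_pct, mode, min_deviation_pct=0.0, regressions_over_pct=None,
             score_direction="auto"):
    """Apply both thresholds to one compared row."""
    if abs(diff_pct) < min_deviation_pct:
        return False  # delta too small to show
    if regressions_over_pct is not None:
        return regression_pct(diff_pct, mode, score_direction) > regressions_over_pct
    return True
```

For example, a throughput benchmark that drops by 5% passes a 3.0 regression threshold, while the same 5% rise is filtered out as an improvement.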

Historical analysis

Analyze trends across many runs:

python3 .codex/skills/jmh-benchmark-compare/scripts/jmh_benchmark_compare.py \
  /path/bench-history \
  --recursive \
  --glob "*.txt" \
  --timestamp-source auto \
  --analyze-over-time \
  --regressions-over-pct 2.5 \
  --export-formats html,csv \
  --output-dir /tmp \
  --output-base jmh-history

Timeline report files are emitted with a -timeline suffix.
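
To illustrate the idea of ordering runs by a filename timestamp with an mtime fallback, here is a rough sketch. The filename pattern and both helper names are hypothetical; the formats the script really accepts are described in references/timestamps-and-discovery.md and may differ.

```python
import os
import re
from datetime import datetime

# Hypothetical pattern: 2024-05-17 or 20240517, optionally followed by a
# time such as T13-45-00 or -134500.
_STAMP = re.compile(
    r"(\d{4})-?(\d{2})-?(\d{2})(?:[T_-]?(\d{2})[-:]?(\d{2})[-:]?(\d{2}))?"
)


def timestamp_from_name(filename):
    """Return a datetime parsed from the filename, or None if none is found."""
    match = _STAMP.search(filename)
    if not match:
        return None
    parts = [int(g) if g else 0 for g in match.groups()]
    try:
        return datetime(*parts)
    except ValueError:
        return None  # digits that do not form a valid date


def timestamp_for(path):
    """Prefer the filename timestamp, falling back to the file's mtime."""
    stamp = timestamp_from_name(os.path.basename(path))
    if stamp is not None:
        return stamp
    return datetime.fromtimestamp(os.path.getmtime(path))
```

With a helper like this, sorted(paths, key=timestamp_for) would put the runs in chronological order before they are compared.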

Exports

txt: aligned plain-text table.
md: valid markdown table.
csv: spreadsheet-friendly CSV.
xlsx: native Excel workbook (single sheet). The misspelling xslx is accepted as an alias.
html: sortable table (click header), built-in CSS + JS, color theme selector.
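
As a small illustration of what a "valid markdown table" export involves, the sketch below renders a header, a separator row, and escaped cells. to_markdown is a hypothetical helper and not the script's exporter.

```python
def to_markdown(columns, rows):
    """Render column names and row cells as a pipe-delimited markdown table."""

    def fmt(cells):
        # Escape literal pipes so they do not split a cell.
        return "| " + " | ".join(str(c).replace("|", "\\|") for c in cells) + " |"

    lines = [fmt(columns), fmt("---" for _ in columns)]
    lines.extend(fmt(row) for row in rows)
    return "\n".join(lines)
```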

To export a single format to an explicit destination, use --output /path/file.ext. To export multiple formats, use --output-dir together with --output-base.

Script

scripts/jmh_benchmark_compare.py

For timestamp parsing behavior and filename examples, see: references/timestamps-and-discovery.md