benchmark

name: benchmark
description: Run performance benchmarks for transform changes. Use when the user asks to benchmark, measure performance, compare speed, or when changes affect apply methods, the functional layer, get_params, or core pipeline code.

Any change touching apply_*, functional.py, get_params, get_params_dependent_on_data, composition.py, or transforms_interface.py must include benchmark results.

Standard Matrix

Benchmark all 9 combinations of image size and channel count:

Size	Channels	Use case
256×256	1	Grayscale classification
256×256	3	RGB classification
256×256	5	Multispectral
512×512	1	Depth maps
512×512	3	Detection/segmentation (YOLO, U-Net)
512×512	5	Multispectral segmentation
1024×1024	1	Medical imaging
1024×1024	3	High-res segmentation
1024×1024	5	Satellite imagery

Skip channel counts the transform explicitly doesn't support. Always include the channel axis: grayscale inputs are (H, W, 1), not (H, W).
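
A grayscale array loaded as (H, W) can be given its trailing channel axis before it enters the benchmark. A minimal sketch; `ensure_channel_axis` is a hypothetical helper name, not part of the library:

```python
import numpy as np


def ensure_channel_axis(img: np.ndarray) -> np.ndarray:
    """Return img as (H, W, C); a 2D (H, W) array becomes (H, W, 1)."""
    if img.ndim == 2:
        return img[..., np.newaxis]  # adds the axis as a view, no copy
    if img.ndim == 3:
        return img
    raise ValueError(f"expected (H, W) or (H, W, C), got shape {img.shape}")


gray = np.zeros((256, 256), dtype=np.uint8)
print(ensure_channel_axis(gray).shape)  # (256, 256, 1)
```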

If the optimization changes dtype conversion or a @uint8_io / @float32_io wrapped function, benchmark the hot dtype and add correctness tests for the other supported dtype. For example, a uint8-only speedup in a @uint8_io function still needs a float32 regression test that verifies wrapper round-tripping.
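
The regression test for the non-hot dtype might look like the sketch below. Both `old_func` and `new_func` here are hypothetical stand-ins (a plain inversion and a uint8 lookup-table variant); they do not use the real @uint8_io / @float32_io wrappers, so substitute the actual functions under test.

```python
import numpy as np


# Hypothetical reference implementation: invert the image.
def old_func(img):
    max_value = 255 if img.dtype == np.uint8 else 1.0
    return (max_value - img).astype(img.dtype)


# Hypothetical optimized implementation: uint8 gets a lookup-table fast path.
def new_func(img):
    if img.dtype == np.uint8:
        lut = np.arange(256, dtype=np.uint8)[::-1]  # lut[v] == 255 - v
        return lut[img]
    return (1.0 - img).astype(img.dtype)  # other dtypes keep the reference path


def check_matches_reference(dtype, seed=0):
    """Assert new_func agrees with old_func and preserves dtype and shape."""
    rng = np.random.default_rng(seed)
    if dtype == np.uint8:
        img = rng.integers(0, 256, (64, 64, 3), dtype=np.uint8)
    else:
        img = rng.random((64, 64, 3), dtype=np.float32)
    expected, actual = old_func(img), new_func(img)
    assert actual.dtype == img.dtype and actual.shape == img.shape
    if dtype == np.uint8:
        np.testing.assert_array_equal(actual, expected)
    else:
        np.testing.assert_allclose(actual, expected, rtol=1e-6, atol=1e-6)


# Cover the hot dtype and the other supported dtype.
for dtype in (np.uint8, np.float32):
    check_matches_reference(dtype)
```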

Template: Isolated Function

import timeit
import numpy as np

SIZES = {"small": (256, 256), "medium": (512, 512), "large": (1024, 1024)}
CHANNELS = [1, 3, 5]
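N = 100  # timed calls per combination

# old_func, new_func, and params are placeholders: substitute the original
# implementation, the optimized one, and the keyword arguments both accept.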

for size_name, (h, w) in SIZES.items():
    for ch in CHANNELS:
        shape = (h, w, ch)
        img = np.random.randint(0, 256, shape, dtype=np.uint8)

        old_t = timeit.timeit(lambda img=img: old_func(img, **params), number=N)
        new_t = timeit.timeit(lambda img=img: new_func(img, **params), number=N)
        print(f"{size_name} {h}x{w}x{ch}: old={old_t:.4f}s new={new_t:.4f}s speedup={old_t/new_t:.2f}x")
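
A single timeit total can be noisy. One optional refinement, not part of the template above, is to repeat the measurement and keep the minimum, which the Python timeit documentation describes as the most useful figure. A minimal sketch; `best_of` is a hypothetical helper name and the inversion lambda is only a stand-in workload:

```python
import timeit

import numpy as np


def best_of(func, number=100, repeat=5):
    """Return the fastest total time in seconds across `repeat` runs of `number` calls."""
    return min(timeit.repeat(func, number=number, repeat=repeat))


img = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
t = best_of(lambda: 255 - img, number=100, repeat=5)
print(f"256x256x3: best of 5 = {t:.4f}s (100 calls)")
```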

Template: Full Pipeline (Compose)

import timeit
import numpy as np
import albumentations as A

SIZES = {"small": (256, 256), "medium": (512, 512), "large": (1024, 1024)}
CHANNELS = [1, 3, 5]

transform = A.Compose([A.YourTransform(p=1.0)])  # YourTransform is a placeholder for the transform under test

for size_name, (h, w) in SIZES.items():
    for ch in CHANNELS:
        shape = (h, w, ch)
        img = np.random.randint(0, 256, shape, dtype=np.uint8)

        t = timeit.timeit(lambda img=img: transform(image=img), number=100)
        print(f"{size_name} {h}x{w}x{ch}: {t:.4f}s (100 calls)")

Workflow

Before: run benchmark on the current main / original code, save output to a JSON file
After: run benchmark on the modified code, save output to a JSON file
Compare: load both JSON files, compute speedup = old_time / new_time for each transform/size combo
Report: include the results in the PR/commit message body

JSON Output Format

Save benchmark results as JSON for automated comparison:

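import json

# all_results holds (transform_name, (h, w), ch, elapsed_seconds) tuples
# collected by the benchmark loop; N is the iteration count used for each.
results = {}
for transform_name, (h, w), ch, elapsed in all_results:
    key = f"{transform_name}_{h}x{w}x{ch}"
    results[key] = {"time": elapsed, "iterations": N}

# Write bench_old.json for the baseline run and bench_new.json for the
# modified code, so the comparison script can load both.
with open("bench_old.json", "w") as f:
    json.dump(results, f, indent=2)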
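
The `all_results` list consumed above could be collected along these lines. This is a sketch under stated assumptions: `invert` and the `TRANSFORMS` mapping are hypothetical stand-ins for the functions actually being benchmarked.

```python
import timeit

import numpy as np

SIZES = {"small": (256, 256), "medium": (512, 512), "large": (1024, 1024)}
CHANNELS = [1, 3, 5]
N = 100


def invert(img):
    """Hypothetical stand-in for the function under test."""
    return 255 - img


TRANSFORMS = {"invert": invert}  # name -> callable taking one image

all_results = []
for transform_name, func in TRANSFORMS.items():
    for h, w in SIZES.values():
        for ch in CHANNELS:
            img = np.random.randint(0, 256, (h, w, ch), dtype=np.uint8)
            elapsed = timeit.timeit(lambda: func(img), number=N)
            # Same tuple layout the JSON snippet unpacks.
            all_results.append((transform_name, (h, w), ch, elapsed))
```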

Comparison Script Pattern

import json

with open("bench_old.json") as f:
    old = json.load(f)
with open("bench_new.json") as f:
    new = json.load(f)

for key in sorted(old):
    if key in new:
        speedup = old[key]["time"] / new[key]["time"]
        indicator = "FASTER" if speedup > 1.05 else "SLOWER" if speedup < 0.95 else "SAME"
        print(f"{key}: {old[key]['time']:.4f}s -> {new[key]['time']:.4f}s  {speedup:.2f}x  {indicator}")
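
To turn the comparison into a pass/fail check, the same speedup ratio can be filtered for regressions. A minimal sketch; `find_regressions` is a hypothetical helper, the threshold mirrors the 0.95 cutoff used above, and the timings are made-up illustrative values rather than measurements:

```python
def find_regressions(old, new, threshold=0.05):
    """Return {key: speedup} for keys whose speedup fell below 1 - threshold."""
    regressions = {}
    for key in sorted(old.keys() & new.keys()):
        speedup = old[key]["time"] / new[key]["time"]
        if speedup < 1.0 - threshold:
            regressions[key] = speedup
    return regressions


def unmatched_keys(old, new):
    """Return keys present in only one of the two result sets."""
    return sorted(old.keys() ^ new.keys())


# Illustrative inputs only.
old = {
    "blur_256x256x3": {"time": 0.050, "iterations": 100},
    "blur_512x512x3": {"time": 0.200, "iterations": 100},
}
new = {
    "blur_256x256x3": {"time": 0.030, "iterations": 100},
    "blur_512x512x3": {"time": 0.250, "iterations": 100},
}

for key, speedup in find_regressions(old, new).items():
    print(f"REGRESSION {key}: {speedup:.2f}x")
```

Reporting unmatched keys separately avoids silently dropping a combination that was benchmarked in only one of the two runs.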

Reporting Format

Benchmark (uint8, 100 iterations):

Function direct:
  256x256x1   — Before: 0.0200s After: 0.0100s Speedup: 2.00x
  256x256x3   — Before: 0.0500s After: 0.0300s Speedup: 1.67x
  ...

Compose single:
  256x256x1   — 0.0120s
  256x256x3   — 0.0340s
  ...

Template: Batch (apply_to_images)

When benchmarking batch optimizations (kernel pre-computation, 4D indexing, pre-allocated loops):

import timeit
import numpy as np
import albumentations as A

BATCH_SIZES = [4, 8, 16]
SIZES = {"small": (256, 256), "medium": (512, 512)}

transform = A.Compose([A.YourTransform(p=1.0)])  # YourTransform is a placeholder for the transform under test

for batch_size in BATCH_SIZES:
    for size_name, (h, w) in SIZES.items():
        # 1 channel benefits from the reshape trick, 3 is the RGB baseline,
        # 5 checks that the speedup holds for multichannel inputs
        for ch in (1, 3, 5):
            images = [np.random.randint(0, 256, (h, w, ch), dtype=np.uint8) for _ in range(batch_size)]
            t = timeit.timeit(lambda images=images: transform(images=images), number=50)
            print(f"batch={batch_size} {size_name} {h}x{w}x{ch}: {t:.4f}s")

Rules

Run on the same machine, back-to-back, same conditions
Use at least 100 iterations for fast functions; use fewer for slow ones, as long as each measurement still totals more than 1s
Test both uint8 and float32 if the change affects dtype handling. If benchmarking only the hot dtype, add correctness tests for the other dtype.
A >5% regression on any combination requires justification or rework
If adding a new transform, benchmark against the equivalent naive numpy implementation
For batch optimizations, compare 1-channel, 3-channel RGB, and 5-channel multichannel inputs to verify speedup holds across channel counts
Keep channel-last shapes throughout: images (H,W,C), image batches (N,H,W,C), volumes (D,H,W,C), volume batches (N,D,H,W,C)
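
The iteration rule above can be automated by timing a few calibration calls and scaling to the target total. A minimal sketch; `pick_iterations`, its `floor` and `ceiling` defaults, and the calibration count are all assumptions chosen for illustration:

```python
import math
import timeit


def pick_iterations(func, target_seconds=1.0, floor=5, ceiling=100_000, calibration_calls=3):
    """Estimate how many calls of func add up to roughly target_seconds."""
    func()  # warm-up call, not timed
    per_call = timeit.timeit(func, number=calibration_calls) / calibration_calls
    if per_call <= 0:  # too fast for the timer to resolve
        return ceiling
    return min(ceiling, max(floor, math.ceil(target_seconds / per_call)))


n = pick_iterations(lambda: sum(range(10_000)))
print(f"using {n} iterations")
```

Any function faster than about 10 ms per call ends up with at least 100 iterations, while slower ones get fewer, bounded below by `floor`.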