name: benchmark-optimization-loop description: "Use when a goal is vague speed ("make it faster", "reduce p95", "cut query time") and you need a bounded, measured loop that promotes only verified, correctness-preserving wins instead of guessed micro-tweaks." license: MIT metadata: author: "Petr Král (pekral.cz)"
Benchmark Optimization Loop (Laravel/PHP)
Turn "make it faster" into a recursive, measured loop. Every promoted change beats a recorded baseline on a real measurement and keeps correctness green. No baseline, no budget, no promotion.
Constraints
- Apply
@rules/sql/optimalize.mdcfor any query change (N+1, eager loading, index usage, batching). - Apply
@rules/code-testing/general.mdc— the correctness gate is the project test suite. - Measure, never guess. Every win is a real readback, not a label or intuition.
- Never trade correctness for speed: a faster variant that changes output is rejected, not promoted.
- Defer all measurement mechanics (how to run, warm-up, repetitions, noise) to
@skills/benchmark/SKILL.md. This skill owns the loop and the gate, not the stopwatch.
Use when
- The goal is unquantified speed: "make it faster", "it's slow", "reduce p95".
- You are eliminating an N+1, tuning a query, choosing a cache strategy, or raising queue throughput.
- Multiple candidate fixes exist and you must pick the one that actually wins.
- A "faster" claim was made without a before/after number.
Do not use for first-time correctness (use @skills/test-driven-development/SKILL.md)
or pure latency architecture without a comparison loop (@skills/latency-critical-systems/SKILL.md).
Required baseline
Before touching any code, write down all five. If any is missing, stop and define it.
- Operation — the exact thing optimized (route, Action, job, query).
- Correctness gate — the tests that must stay green (per
@rules/code-testing/general.mdc). Identical output for identical input. - Metric — pick one primary: wall time, p95 latency, rows/sec, queries/request, cost/run, memory, queue throughput.
- Baseline value — measure the current code via
@skills/benchmark/SKILL.md. Record the number and its variance. - Budget — the stopping line, set up front:
- target (e.g. "p95 < 200 ms" or "≤ 5 queries/request"), AND
- cost cap (time/effort/iterations you will spend), AND
- noise threshold (smallest delta that counts as real, from the baseline variance).
A baseline without variance is not a baseline — you cannot tell a win from noise.
Loop
Run per iteration. The current winner starts as the baseline.
- Measure — get a fresh number for the current winner via
@skills/benchmark/SKILL.md(identical inputs, warm-up, repeated runs). - Identify bottleneck — from data, not hunch. Use
@skills/laravel-telescope/SKILL.mdfor per-request query/timing breakdown,EXPLAIN(@rules/sql/optimalize.mdc) for query plans,@skills/mysql-problem-solver/SKILL.mdfor schema/index issues, Horizon for queue lag. The bottleneck is usually one segment. - One hypothesis → one variant — change exactly one thing. State the hypothesis ("eager-load
with('items')removes the N+1"). One variable per variant or you cannot attribute the delta.- Laravel examples: eager-load to kill an N+1; add a covering index; replace per-row loop with
whereIn/upsert;Cache::remember()a stable read; raisemaxProcesses/chunk size for queue throughput.
- Laravel examples: eager-load to kill an N+1; add a covering index; replace per-row loop with
- Test with identical inputs — same dataset, same warm-up, same repetition count as the baseline. Different inputs invalidate the comparison.
- Reject or promote vs the current winner:
- Reject if the correctness gate fails, the delta is within the noise threshold, or it regresses any tracked metric.
- Promote to new current winner only if it beats the current winner by more than the noise threshold AND keeps every gate green.
- Codify — fold the promoted variant into the code; commit with the exact command and before/after numbers in the message.
- Confirm — re-measure the promoted code from a clean state to prove the win is reproducible, not a fluke.
Always compare against the current accepted winner, not merely the previous run — a lucky run must not become the bar.
Variant table
Keep a running ledger. Never discard rejected rows; they prevent re-testing dead ends.
| # | Hypothesis | Change | Metric (primary) | vs winner | Correctness | Decision | Note |
|---|---|---|---|---|---|---|---|
| 0 | baseline | none | 412 ms p95 | — | green | winner | variance ±18 ms |
| 1 | N+1 on items | with('items') |
96 ms p95 | -316 ms | green | promote | 51→2 queries |
| 2 | cache list 30 s | Cache::remember |
88 ms p95 | -8 ms | green | reject | within noise; adds staleness |
| 3 | covering index | add (user_id,created_at) |
71 ms p95 | -25 ms | green | promote | EXPLAIN: ref, no filesort |
Promotion gate
A variant becomes the new default only when all hold:
- Correctness preserved — full gate green; identical output for identical input (
@rules/code-testing/general.mdc). - Real win — beats the current winner beyond the noise threshold, confirmed by a clean re-measure.
- Reproducible — same result on a repeat run, not a single lucky sample.
- Rollback path — isolated in a commit/branch that can be reverted without collateral; risky changes (destructive migration, customer-facing) need an explicit approval gate.
- Documented — committed with the exact benchmark command, dataset, and before/after numbers.
Stop the recursion when any holds
- The budget target is met (good enough — do not gold-plate).
- The cost cap (iterations/time/effort) is reached.
- The last few variants show diminishing returns — deltas shrinking toward the noise threshold.
- The bottleneck has moved to something out of scope (provider API, network, hardware).
When you stop, the current winner is the result. Record why you stopped.
Done when
- Baseline (operation, gate, metric, value+variance, budget) was recorded before any change.
- Each variant tested one hypothesis with identical inputs and is logged in the variant table.
- Every promotion passed the full promotion gate; rejected variants stayed logged.
- The final winner's improvement is confirmed by a clean re-measurement, not a label.
- The correctness gate is green and a rollback path exists for every promoted change.
- A stopping criterion was hit and the reason is recorded.