benchmark

Stars: 224

Run regression benchmarks, track results, and generate trend reports

By nockawa. Updated 2/28/2026

name: benchmark
description: Run regression benchmarks, track results, and generate trend reports
argument-hint: [--quick] [--report-only] [--list] [--btree-fast] [--btree-medium] [--btree-full]

Benchmark Regression Tracking

Run Typhon regression benchmarks, record results to history, and generate trend reports with regression detection.

Comparison mode: Each run is compared against the immediately previous run, not an averaged baseline, so reports show run-over-run trends.
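As a rough illustration of run-over-run comparison, the sketch below computes the percent change of each benchmark's mean against the previous run. The record shape (a dict mapping benchmark name to mean nanoseconds), the function name, and the sample benchmark names are assumptions made for illustration; they are not taken from report_generator.py.

```python
def percent_changes(previous: dict[str, float], current: dict[str, float]) -> dict[str, float]:
    """Return percent change in mean time per benchmark, current vs. previous run.

    Positive values mean the benchmark got slower. Benchmarks that are missing
    from the previous run, or whose previous mean is zero, are skipped.
    """
    changes = {}
    for name, cur_mean in current.items():
        prev_mean = previous.get(name)
        if not prev_mean:  # None or 0.0: nothing to compare against
            continue
        changes[name] = (cur_mean - prev_mean) / prev_mean * 100.0
    return changes


# Hypothetical benchmark names and means (ns), for illustration only.
prev_run = {"BTree.Insert": 200.0, "BTree.Lookup": 50.0}
curr_run = {"BTree.Insert": 220.0, "BTree.Lookup": 50.0, "BTree.Remove": 80.0}
print(percent_changes(prev_run, curr_run))
```

In this sketch a benchmark that first appears in the current run has no previous value, so it is left out of the comparison rather than reported as a change.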

Noise filtering: Benchmarks are automatically classified as "noisy" (filtered from regressions) when:

  • Mean is below min_measurable_ns (default: 1.0ns) — below BDN's measurement resolution
  • Coefficient of Variation exceeds max_cov_pct (default: 30%) — inherently high-variance benchmarks
  • Absolute delta is below min_delta_ns (default: 0.5ns) — sub-ns shifts on fast micro-benchmarks
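The three rules above can be sketched as a single predicate. This is a minimal sketch, not the project's implementation: it assumes the rules are combined with OR, that the delta is measured against the previous run's mean, and that the coefficient of variation is standard deviation divided by mean. The class and function names are hypothetical; only the three setting names and their defaults come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NoiseConfig:
    # Setting names and defaults as listed above.
    min_measurable_ns: float = 1.0
    max_cov_pct: float = 30.0
    min_delta_ns: float = 0.5


def is_noisy(mean_ns: float, stddev_ns: float, prev_mean_ns: float,
             cfg: NoiseConfig = NoiseConfig()) -> bool:
    """Return True if any of the three noise rules applies (assumed OR)."""
    # Rule 1: mean below the measurable floor (also guards the division below).
    if mean_ns <= 0 or mean_ns < cfg.min_measurable_ns:
        return True
    # Rule 2: coefficient of variation too high.
    cov_pct = stddev_ns / mean_ns * 100.0
    if cov_pct > cfg.max_cov_pct:
        return True
    # Rule 3: absolute shift versus the previous run is too small to matter.
    return abs(mean_ns - prev_mean_ns) < cfg.min_delta_ns


print(is_noisy(0.8, 0.01, 0.7))      # sub-nanosecond mean
print(is_noisy(100.0, 40.0, 90.0))   # 40% coefficient of variation
print(is_noisy(100.0, 5.0, 120.0))   # none of the rules apply
```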

Input

$ARGUMENTS may contain:

  • --quick — Run with reduced warmup/iterations for fast feedback
  • --report-only — Skip benchmark execution, regenerate reports from existing history
  • --list — List all regression-tracked benchmarks with their thresholds
  • --btree-fast — BTree quick profile: core ops + 2 concurrent scenarios (~3 min)
  • --btree-medium — BTree medium profile: all key types + concurrent scaling (~15 min)
  • --btree-full — BTree full profile: everything including tree sizes + enumeration (~50 min)
  • (empty) — Full regression benchmark run + report generation

Workflow

/benchmark --list

List all regression-tracked benchmark classes and methods:

cd test/Typhon.Benchmark && dotnet run -c Release -- --list --allCategories Regression

Then read benchmark/config.json and display the configured thresholds per benchmark.
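A threshold listing might look like the sketch below. The schema of benchmark/config.json is not shown in this document, so every key here (default_threshold_pct, benchmarks, threshold_pct) and both benchmark names are hypothetical placeholders; adapt them to the real file before use.

```python
import json

# Hypothetical config shape; the real benchmark/config.json may differ.
SAMPLE_CONFIG = """
{
  "default_threshold_pct": 10.0,
  "benchmarks": {
    "BTreeBenchmarks.Insert": {"threshold_pct": 15.0},
    "BTreeBenchmarks.Lookup": {}
  }
}
"""


def thresholds_table(config_text: str) -> list[tuple[str, float]]:
    """Return (benchmark, threshold %) pairs, falling back to the default."""
    cfg = json.loads(config_text)
    default = cfg.get("default_threshold_pct", 0.0)
    rows = []
    for name, entry in sorted(cfg.get("benchmarks", {}).items()):
        rows.append((name, entry.get("threshold_pct", default)))
    return rows


for name, pct in thresholds_table(SAMPLE_CONFIG):
    print(f"{name:<30} {pct:>6.1f}%")
```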

/benchmark --report-only

Skip benchmark execution. Regenerate reports from existing history:

python3 benchmark/scripts/report_generator.py --history benchmark/history/results.jsonl --config benchmark/config.json --output-dir benchmark/reports

Read benchmark/reports/latest.md and display a condensed summary.

/benchmark --quick

Same as the default workflow below, but with quick-mode flags appended to the BDN command.

Clean stale artifacts first (same as Step 2 of default workflow), then run in the background (run_in_background: true) and poll with TaskOutput (block: true, timeout: 600000):

cd test/Typhon.Benchmark && dotnet run -c Release -- --allCategories Regression --exporters json --warmupCount 1 --iterationCount 2

Then continue from the report generation step (Step 4 of the default workflow).

/benchmark --btree-fast

BTree quick profile (~3 min). Runs benchmarks tagged BTreeFast: core single-threaded ops, 2 concurrent scenarios (read scaling + write serialization), secondary index small-delta, and 95/5 mixed workload.

Follow the same Steps 1-6 as the default workflow, but use this BDN command in Step 3:

cd test/Typhon.Benchmark && dotnet run -c Release --no-build -- --btree-fast --exporters json

Note: --btree-fast is a custom Program.cs switch that maps to --allCategories BTreeFast.

/benchmark --btree-medium

BTree medium profile (~15 min). Runs all BTreeMedium-tagged benchmarks: all key types (L16/L32/L64/String64), concurrent scaling with more thread counts, secondary index patterns.

Follow the same Steps 1-6 as the default workflow, but use this BDN command in Step 3:

cd test/Typhon.Benchmark && dotnet run -c Release --no-build -- --btree-medium --exporters json

Note: --btree-medium maps to --allCategories BTreeMedium.

/benchmark --btree-full

BTree full profile (~50 min). Runs ALL BTree benchmarks: everything from fast + medium, plus tree depth scaling (100 to 100K entries), enumeration under contention (0-32 writers), and full thread count sweep.

Follow the same Steps 1-6 as the default workflow, but use this BDN command in Step 3:

cd test/Typhon.Benchmark && dotnet run -c Release --no-build -- --btree-full --exporters json

IMPORTANT: This can take up to ~50 minutes. Run in the background and poll with TaskOutput (block: true, timeout: 3600000).

Note: --btree-full maps to --anyCategories BTreeFast BTreeMedium BTreeFull (space-separated — BDN requires separate tokens, not comma-separated).
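The three profile notes describe a simple switch-to-arguments expansion. The real logic lives in Program.cs and is not shown here; the Python sketch below only restates the mapping from the notes, and the function name is hypothetical.

```python
# Mapping as described in the notes above; the actual code is in Program.cs.
PROFILE_TO_BDN_ARGS = {
    "--btree-fast": ["--allCategories", "BTreeFast"],
    "--btree-medium": ["--allCategories", "BTreeMedium"],
    # Separate tokens per category, as the note on --btree-full requires.
    "--btree-full": ["--anyCategories", "BTreeFast", "BTreeMedium", "BTreeFull"],
}


def expand_profile_switches(args: list[str]) -> list[str]:
    """Replace each custom profile switch with its BDN category arguments."""
    expanded = []
    for arg in args:
        # Unknown arguments pass through unchanged.
        expanded.extend(PROFILE_TO_BDN_ARGS.get(arg, [arg]))
    return expanded


print(expand_profile_switches(["--btree-full", "--exporters", "json"]))
```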

/benchmark (default — full run + report)

Step 1: Build in Release

dotnet build -c Release test/Typhon.Benchmark/Typhon.Benchmark.csproj

If build fails, report errors and stop.

Step 2: Clean Stale BDN Artifacts

Remove prior BDN result files to prevent exploratory benchmark data from polluting the regression report:

# Windows
if exist "test\Typhon.Benchmark\BenchmarkDotNet.Artifacts\results" rmdir /s /q "test\Typhon.Benchmark\BenchmarkDotNet.Artifacts\results"
# Unix/macOS
rm -rf test/Typhon.Benchmark/BenchmarkDotNet.Artifacts/results
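If a single command for both platforms is preferred, the same cleanup can be sketched in Python with the standard library. The function name is hypothetical, and the script assumes it is run from the repository root.

```python
import shutil
from pathlib import Path


def clean_bdn_results(repo_root: Path) -> bool:
    """Delete the BDN results directory if present; return True if removed."""
    results = (repo_root / "test" / "Typhon.Benchmark"
               / "BenchmarkDotNet.Artifacts" / "results")
    if results.is_dir():
        shutil.rmtree(results)
        return True
    return False


if __name__ == "__main__":
    # Assumes the current directory is the repository root.
    removed = clean_bdn_results(Path.cwd())
    print("removed stale results" if removed else "nothing to clean")
```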

Step 3: Run Regression Benchmarks

cd test/Typhon.Benchmark && dotnet run -c Release --no-build -- --allCategories Regression --exporters json

IMPORTANT: This step can take up to about 12 minutes, which exceeds the Bash tool's 10-minute maximum timeout. Run the command in the background (run_in_background: true) and poll with TaskOutput (block: true, timeout: 600000). A single poll waits at most 600000 ms (10 minutes), so poll again if the task has not finished. Let the user know benchmarks are running before starting the background task.
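Outside an agent harness, the same start-in-background-then-poll pattern can be approximated with the standard subprocess module. This is an illustrative sketch only: run_and_poll, its parameters, and the stand-in command are hypothetical, and it does not model the TaskOutput tool itself.

```python
import subprocess
import sys


def run_and_poll(cmd, poll_seconds=600.0, max_polls=6):
    """Start cmd in the background and poll until it exits.

    Returns the exit code, or None if the process is still running after
    max_polls polls (in which case it is terminated).
    """
    proc = subprocess.Popen(cmd)
    for _ in range(max_polls):
        try:
            # Block for at most poll_seconds, like one blocking poll.
            return proc.wait(timeout=poll_seconds)
        except subprocess.TimeoutExpired:
            print("benchmarks still running...")
    proc.terminate()
    proc.wait()
    return None


# Stand-in command; a real run would pass the dotnet command from Step 3.
exit_code = run_and_poll([sys.executable, "-c", "print('done')"], poll_seconds=30)
print("exit code:", exit_code)
```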

Step 4: Generate Report

python3 benchmark/scripts/report_generator.py --bdn-results test/Typhon.Benchmark/BenchmarkDotNet.Artifacts/results --history benchmark/history/results.jsonl --config benchmark/config.json --output-dir benchmark/reports

Step 5: Display Summary

Read benchmark/reports/latest.md and display a condensed summary to the user:

  • Total benchmarks run
  • Regressions found (list each with name + % change) — highlight prominently
  • Improvements found (list each with name + % change)
  • Stable benchmark count
  • Link to full report: benchmark/reports/latest.md
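The summary fields above can be derived from per-benchmark percent changes, roughly as sketched below. The single threshold_pct value, the input shape, and both function names are assumptions for illustration; the actual report applies the thresholds from benchmark/config.json and the noise filtering described earlier.

```python
def summarize(changes, threshold_pct=10.0):
    """Bucket percent changes into regressions, improvements, and stable.

    `changes` maps benchmark name to percent change in mean time
    (positive = slower). `threshold_pct` is a hypothetical single threshold.
    """
    regressions = {n: c for n, c in changes.items() if c > threshold_pct}
    improvements = {n: c for n, c in changes.items() if c < -threshold_pct}
    return {
        "total": len(changes),
        "regressions": regressions,
        "improvements": improvements,
        "stable": len(changes) - len(regressions) - len(improvements),
    }


def format_summary(summary):
    """Render the condensed summary as plain text lines."""
    lines = [f"Total benchmarks run: {summary['total']}"]
    for label in ("regressions", "improvements"):
        items = summary[label]
        lines.append(f"{label.capitalize()}: {len(items)}")
        # Largest absolute change first.
        for name, pct in sorted(items.items(), key=lambda kv: -abs(kv[1])):
            lines.append(f"  {name}: {pct:+.1f}%")
    lines.append(f"Stable: {summary['stable']}")
    lines.append("Full report: benchmark/reports/latest.md")
    return "\n".join(lines)


# Hypothetical benchmark names and changes, for illustration only.
print(format_summary(summarize({"Insert": 25.0, "Lookup": -12.0, "Scan": 1.0})))
```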

Step 6: Prompt for History Commit

Ask the user:

Question: "Benchmark results have been appended to history. Commit the updated history?"
Header: "Commit"
Options:

  • Yes, commit history (description: "Commit benchmark/history/results.jsonl with the new run data")
  • No, skip commit (description: "Keep the local changes without committing")

If yes, commit only benchmark/history/results.jsonl:

git add benchmark/history/results.jsonl
git commit -m "benchmark: record regression benchmark results"

Install via CLI

npx skills add https://github.com/nockawa/Typhon --skill benchmark

Repository Details
Forks: 13
Branch: main
Path: SKILL.md