name: benchmark-kvengine
description: Run RecStore KVEngine correctness, YCSB, and storage-only batch lookup benchmark workflows, including read_mode=batch_get_flat for aligning KVEngine limits with PS RDMA batch GET. Use when Codex needs to validate src/test/test_kvengine.cpp, run tools/benchmarks/run_kvengine_compare.py, prompt for thread count, SSD benchmark path, results directory, read mode, and batch keys, then generate a Chinese summary.md.
Benchmark KVEngine
Workflow
Use this skill from a RecStore checkout. Do not run helper scripts from this skill directory; call the project script directly.
- Confirm the current directory is the RecStore repo root, or pass --repo.
- Prompt the user for:
- thread count (default = 16)
- SSD root path for benchmark data (default = /mnt/nvme1n1_recstore/recstore)
- result output directory (default = results/benchmark_kvengine_$(date +%m%d%H%M))
- workloads to run (default = a b c)
- repeat count for YCSB (default = 3, ask if user wants 1)
- distributions to use (default = uniform and zipfian)
- record-count (default=10M)
- read mode (default = get; use batch_get_flat when aligning with PS RDMA GET)
- batch keys (default = 500 for batch_get_flat)
- Run:
- cmake -S . -B build
- cmake --build build --target test_kvengine -j
- ctest -R '^test_kvengine$' -VV
- cmake --build build --target benchmark_kv_engine -j
- tools/benchmarks/run_kvengine_compare.py
- Save logs and CSV/SVG artifacts under the chosen result directory.
- Write summary.md as exactly three report tables, with the benchmark hyperparameters recorded as Chinese prose under Workload 说明 before the first table:
- Workload description
- Run throughput
- Load throughput
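The prompts in this workflow can be gathered in one place before any command is built. A minimal Python sketch, assuming two hypothetical helpers (`default_params` and `merge_answers`, neither part of RecStore); the values are the defaults listed above, and the timestamp mirrors `date +%m%d%H%M`:

```python
from datetime import datetime


def default_params(now=None):
    """Return the workflow defaults (hypothetical helper, not a RecStore API)."""
    now = now or datetime.now()
    stamp = now.strftime("%m%d%H%M")  # same fields as `date +%m%d%H%M`
    return {
        "threads": 16,
        "ssd_root": "/mnt/nvme1n1_recstore/recstore",
        "output_dir": f"results/benchmark_kvengine_{stamp}",
        "workloads": ["a", "b", "c"],
        "repeat": 3,
        "distributions": ["uniform", "zipfian"],
        "record_count": 10_000_000,  # 10M
        "read_mode": "get",
        "batch_keys": 500,  # only meaningful with read_mode=batch_get_flat
    }


def merge_answers(defaults, answers):
    """Overlay explicit user answers; None means the user accepted the default."""
    merged = dict(defaults)
    merged.update({k: v for k, v in answers.items() if v is not None})
    return merged
```

Keeping the defaults and the user's answers separate makes it easy to record in summary.md which values were actually chosen.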
Command Template
Ask the user for threads, ssd_root, and output_dir. Use defaults only when the user accepts them.
cmake -S . -B build
cmake --build build --target test_kvengine -j
ctest -R '^test_kvengine$' -VV
cmake --build build --target benchmark_kv_engine -j
python3 tools/benchmarks/run_kvengine_compare.py \
  --output-dir <output_dir> \
  --workloads a b c \
  --distributions uniform \
  --record-count 10000000 \
  --runtime-seconds 3 \
  --threads <threads> \
  --load-threads <threads> \
  --repeat 1 \
  --value-size 128 \
  --read-mode get
tools/benchmarks/run_kvengine_compare.py currently uses /mnt/nvme1n1_recstore/recstore internally for SSD data. If the user provides a different SSD path, create a temporary symlink or patch the command wrapper only after making that choice explicit.
If the user asks for "3 次平均" (an average over 3 runs), pass --repeat 3; otherwise preserve the requested repeat count in the summary.md heading.
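To keep flags consistent between runs, the command line can be assembled from the confirmed parameters. The sketch below assumes a hypothetical `build_compare_cmd` helper; it uses only flags that appear in this document, has not been checked against the script's full argument parser, and assumes `--distributions` accepts several values the way `--workloads` does:

```python
def build_compare_cmd(params):
    """Assemble argv for run_kvengine_compare.py (hypothetical helper)."""
    cmd = [
        "python3", "tools/benchmarks/run_kvengine_compare.py",
        "--output-dir", params["output_dir"],
        "--workloads", *params["workloads"],
        "--distributions", *params["distributions"],
        "--record-count", str(params["record_count"]),
        "--runtime-seconds", str(params["runtime_seconds"]),
        "--threads", str(params["threads"]),
        "--load-threads", str(params["threads"]),
        "--repeat", str(params["repeat"]),
        "--value-size", str(params["value_size"]),
        "--read-mode", params["read_mode"],
    ]
    # --batch-keys is only passed for the batch lookup mode.
    if params["read_mode"] == "batch_get_flat":
        cmd += ["--batch-keys", str(params["batch_keys"])]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd)` from the repo root; building it as a list avoids shell quoting issues with user-supplied paths.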
For storage-only PS RDMA alignment, use random batch lookup rather than single-key reads. This is the path that produced the reference figures: DRAM_EXTENDIBLE_HASH at around 19.45M keys/s for BatchGetFlat(500 random keys), and DRAM_PET_HASH at around 51.96M keys/s.
python3 tools/benchmarks/run_kvengine_compare.py \
  --output-dir <output_dir> \
  --engines dram_eh_dram dram_pet_dram \
  --workloads workloadc \
  --distributions uniform \
  --record-count 300000 \
  --runtime-seconds 3 \
  --threads 16 \
  --load-threads 16 \
  --repeat 1 \
  --value-size 512 \
  --read-mode batch_get_flat \
  --batch-keys 500
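When reading batch results, it helps to state whether a figure counts keys or batch calls. A small sketch of the conversion, assuming each batch call looks up exactly `batch_keys` keys; whether the script itself reports keys/s or batch calls/s should be checked against its output rather than inferred from this example:

```python
def keys_per_second(calls_per_sec, batch_keys):
    """Convert batch calls/s to keys/s, assuming full batches."""
    return calls_per_sec * batch_keys


def batch_calls_per_second(keys_per_sec, batch_keys):
    """Convert keys/s to batch calls/s, assuming full batches."""
    return keys_per_sec / batch_keys
```

Under that assumption, 19.45M keys/s at 500 keys per batch corresponds to 38,900 batch calls per second, which is the number to keep in mind when someone compares it with single-key ops/s.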
Summary Format
Generate <output_dir>/summary.md from kvengine_workload_summary.csv after YCSB finishes. Keep only these three sections:
- Workload 说明
- Run 吞吐(ops/s,...)
- Load 吞吐(ops/s,...)
Under Workload 说明, before the workload table, record the benchmark hyperparameters in Chinese prose. Include at least: threads, load_threads, record_count, runtime_seconds, repeat, value_size, read_mode, batch_keys when applicable, distributions, workloads, SSD root path, output directory, ssd_io_backend, ssd_queue_depth, and allocator choices that affect the benchmark.
Use M for values >= 1,000,000 and K for values >= 1,000. Include the 三 workload 平均 (three-workload average) column only in the Run table.
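The unit rule and the extra average column can be sketched in Python as follows. `format_throughput` and `run_row` are hypothetical helpers, and the two-decimal precision is an assumption, since the format only fixes the M and K thresholds:

```python
def format_throughput(value):
    """Format ops/s with M for >= 1,000,000 and K for >= 1,000."""
    if value >= 1_000_000:
        return f"{value / 1_000_000:.2f}M"
    if value >= 1_000:
        return f"{value / 1_000:.2f}K"
    return f"{value:.0f}"


def run_row(engine, run_ops_by_workload):
    """Build one Run-table row, ending with the 三 workload 平均 column."""
    values = list(run_ops_by_workload.values())
    mean = sum(values) / len(values)
    cells = [engine] + [format_throughput(v) for v in values]
    cells.append(format_throughput(mean))
    return "| " + " | ".join(cells) + " |"
```

A Load-table row would use the same formatter but omit the trailing average cell.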
Reporting Rules
- Do not claim tests pass unless the script completed successfully.
- If test_kvengine fails, stop before YCSB and report the log path.
- If any YCSB row exits nonzero, still write summary.md, but state failures in the final response and point to summary.csv.
- Keep generated project-facing report text in Chinese.
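These rules amount to a small decision table over exit codes. A sketch in Python, assuming a hypothetical `reporting_plan` helper and row labels of your own choosing (the labels below are illustrative, not the script's actual row names):

```python
def reporting_plan(test_rc, ycsb_exit_codes):
    """Map exit codes to the reporting rules (hypothetical helper).

    ycsb_exit_codes: dict of row label -> exit code; ignored if the test failed.
    """
    if test_rc != 0:
        # test_kvengine failed: stop before YCSB and only report the log path.
        return {"run_ycsb": False, "write_summary": False,
                "claim_pass": False, "failed_rows": []}
    failed = sorted(label for label, rc in ycsb_exit_codes.items() if rc != 0)
    # summary.md is still written when rows fail, but no pass is claimed.
    return {"run_ycsb": True, "write_summary": True,
            "claim_pass": not failed, "failed_rows": failed}
```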
Current Bring-up Notes
- run_kvengine_compare.py renders kvengine_ycsb_run_throughput.svg unconditionally at the end of a normal run. If matplotlib is missing, the command can exit nonzero after summary.csv and kvengine_workload_summary.csv are already written.
- In that case, still generate summary.md from kvengine_workload_summary.csv and report the missing chart dependency separately.
- For DRAM_VALUE_STORE lanes, watch for allocator failures such as ConcurrentSlabMemoryPool OOM. If that happens, record the failing engine row and consider rerunning with explicit --dram-capacity-bytes or a different allocator.
- Judge petkv success by exit code plus YCSB_LOAD_RESULT/YCSB_RESULT, not by incidental PetHash invalid. capacity_ == 0 log lines alone.
- Use read_mode=batch_get_flat --batch-keys 500 when the goal is to compare storage-only KVEngine limits with PS RDMA GET batch_keys=500. Do not compare that number directly with ordinary single-key YCSB get throughput without labeling the operation mismatch.
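Two of these notes translate directly into checks. The sketch below assumes hypothetical helpers and treats a run as successful only when the exit code is zero and both marker strings appear somewhere in the log; the exact layout of the marker lines is not assumed, and requiring both markers is an interpretation of the note above:

```python
import importlib.util


def petkv_run_succeeded(exit_code, log_text):
    """Zero exit code plus both result markers; PetHash warnings are ignored."""
    has_markers = "YCSB_LOAD_RESULT" in log_text and "YCSB_RESULT" in log_text
    return exit_code == 0 and has_markers


def chart_dependency_available():
    """Report whether matplotlib can be imported, without importing it."""
    return importlib.util.find_spec("matplotlib") is not None
```

Checking `chart_dependency_available()` before the run lets the missing chart dependency be reported up front instead of being discovered from a nonzero exit after the CSV files are written.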