name: sim-compare
description: Run cache policy hit rate comparison across multiple cache sizes with charts
argument-hint: " [policies...] [sizes...]"
context: fork
disable-model-invocation: true
allowed-tools: Read, Grep, Glob, Bash
Run a comprehensive cache policy comparison for the given trace.
Input
Trace file: $ARGUMENTS
If no policies or sizes are specified, use sensible defaults based on the trace.
Workflow
Identify the trace format. Check the file extension and contents:
- .gz files: try common formats (lirs, arc, etc.)
- Look at simulator/src/main/resources/reference.conf for format options
- Use format:path syntax (e.g., lirs:trace.gz)
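As a minimal sketch of the "check the contents" step, the helper below prints the first few records of a trace and decompresses .gz files on the fly. The name peek_trace is hypothetical (not part of the simulator), and the sketch assumes a text-based trace; binary formats would need a different inspection tool.

```shell
# Hypothetical helper: show the first N lines (default 5) of a trace file.
# Assumes a text-based trace; .gz files are decompressed to stdout first.
peek_trace() {
  case "$1" in
    *.gz) gzip -cd -- "$1" | head -n "${2:-5}" ;;
    *)    head -n "${2:-5}" -- "$1" ;;
  esac
}
```

For example, `peek_trace trace.gz 3` would show the first three records, which is usually enough to tell a one-key-per-line format from a multi-column one.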
Select policies to compare. Policy names use category.PolicyName format. Note: config categories use hyphens (two-queue, greedy-dual), not underscores. Each policy is paired with configured admission filters (default: Always, TinyLfu, Clairvoyant), creating multiple instances per policy name.
Include at minimum:
- product.Caffeine (the production implementation)
- opt.Clairvoyant (theoretical optimal, upper bound)
- opt.Unbounded (infinite cache, ceiling)
- linked.Lru (baseline)
- sketch.WindowTinyLfu (research W-TinyLFU)
- sketch.HillClimberWindowTinyLfu (adaptive variant)
- Add relevant competitors based on trace characteristics:
  - For recency-heavy: linked.S4Lru, adaptive.Arc
  - For frequency-heavy: linked.Lfu, irr.Lirs
  - For scan-resistant: two-queue.TwoQueue, two-queue.S3Fifo
  - For size-aware traces: greedy-dual.Gdsf, greedy-dual.Camp
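Once the policy list is chosen, it has to be passed as indexed system properties (the -Dcaffeine.simulator.policies.N form used by the run step of this workflow). The sketch below expands a list of names into those flags; policy_flags is a hypothetical helper name, not something the simulator provides.

```shell
# Hypothetical helper: emit one -D flag per policy, indexed from 0.
policy_flags() {
  i=0
  for policy in "$@"; do
    printf '%s.%d=%s\n' '-Dcaffeine.simulator.policies' "$i" "$policy"
    i=$((i + 1))
  done
}
```

The output can be appended to the Gradle command, for example with `$(policy_flags product.Caffeine opt.Clairvoyant linked.Lru)`. This relies on word splitting, so it only suits policy names without spaces.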
Choose cache sizes. Use a geometric progression covering the working set:
- Start small (e.g., 100), end near working set size
- 5-8 sizes: e.g., 100,500,1_000,2_500,5_000,10_000,25_000
- If the trace has few distinct keys, reduce the range
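A strict geometric progression can be generated rather than picked by hand. The sketch below is one way to do it, assuming awk is available. geometric_sizes is a hypothetical helper taking the smallest size, the largest size, and the number of sizes (at least 2). It uses a constant ratio between steps, so its output will differ from the hand-rounded example above.

```shell
# Hypothetical helper: print COUNT sizes from FIRST to LAST with a constant
# ratio between neighbors, rounded to integers and joined by commas.
# COUNT must be at least 2.
geometric_sizes() {
  awk -v first="$1" -v last="$2" -v count="$3" 'BEGIN {
    ratio = exp(log(last / first) / (count - 1))
    for (i = 0; i < count; i++) {
      printf "%s%d", (i ? "," : ""), int(first * ratio ^ i + 0.5)
    }
    printf "\n"
  }'
}
```

The result is already in the comma-separated form that --maximumSize expects, e.g. `--maximumSize="$(geometric_sizes 100 25600 9)"`.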
Run the simulation. Use the Gradle task:
    ./gradlew simulator:simulate -q \
      --maximumSize=100,500,1000,2500,5000,10000 \
      --metric="Hit Rate" \
      --title="Description" \
      --theme=light \
      --outputDir=build/reports/sim
Override the trace and policies via system properties appended to the command:
    -Dcaffeine.simulator.files.paths.0="format:path/to/trace"
    -Dcaffeine.simulator.policies.0=product.Caffeine
    -Dcaffeine.simulator.policies.1=opt.Clairvoyant
    # ... etc
Note: for single-size runs, use ./gradlew simulator:run -q with -Dcaffeine.simulator.maximum-size=N instead of simulator:simulate.
Read and interpret results. The simulate task produces:
- Individual CSV per cache size
- Combined CSV (policies as rows, sizes as columns)
- PNG chart (line graph of metric vs cache size)
Read the CSV output files in the output directory:
- Compare hit rates across policies at each cache size
- Identify the crossover points where one policy overtakes another
- Note the gap between Caffeine and Clairvoyant (theoretical ceiling)
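The gap to the theoretical ceiling can be computed directly from the combined CSV. The sketch below assumes a layout that the source only describes loosely: a header row whose first column is a label and whose remaining columns are cache sizes, followed by one row per policy. The exact header text and row labels in the real output (which may include an admission filter suffix) are assumptions to check against the generated file. hit_rate_gap is a hypothetical helper.

```shell
# Hypothetical helper: per cache size, print the hit rate difference between
# two policy rows of a combined CSV (policies as rows, sizes as columns).
# Usage: hit_rate_gap combined.csv UPPER_POLICY LOWER_POLICY
hit_rate_gap() {
  awk -F, -v upper="$2" -v lower="$3" '
    NR == 1     { for (i = 2; i <= NF; i++) size[i] = $i; cols = NF; next }
    $1 == upper { for (i = 2; i <= NF; i++) hi[i] = $i }
    $1 == lower { for (i = 2; i <= NF; i++) lo[i] = $i }
    END {
      for (i = 2; i <= cols; i++)
        printf "%s: %.2f\n", size[i], hi[i] - lo[i]
    }
  ' "$1"
}
```

Running it on each pair of policies also makes crossover points visible. A sign change in the difference between two adjacent sizes means one policy has overtaken the other.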
Explain findings. For each notable result:
- WHY does policy X beat policy Y on this trace?
- What trace characteristic drives the difference? (frequency bias, recency bias, scan patterns, temporal shifts)
- How close is Caffeine to optimal? Where does it lose?
- Reference the relevant research paper if applicable:
  - TinyLFU paper for admission filter behavior
  - Adaptive paper for hill climber effectiveness
  - See .claude/docs/research-foundations.md for paper-to-code mapping
Report. Present:
- Summary table of hit rates at each cache size
- Key takeaways (2-3 sentences)
- Notable policy behaviors
- Path to generated chart PNG