name: wasm-d8-perf
description: Use this skill to reproduce, profile, or analyze WebAssembly (Wasm) execution performance using the d8 shell.
WebAssembly Performance Analysis in d8
Reproduce, profile, and analyze Wasm performance using the V8 developer shell
(d8).
1. Reproduction and Establishing a Baseline
- Build Configurations:
  - CRITICAL: Always use a release build (e.g., out/x64.release/d8) for benchmarking.
  - Use optdebug (e.g., out/x64.optdebug/d8) for disassembling/debugging (preserves symbols).
- Measurement Metrics:
  - Wall time: Measure execution time (e.g., JS console.time()).
    - LLM Tip (Statistical Rigor & Warmup): Run 3-5 times; take the median/average. Ensure code runs in a loop to trigger Turboshaft tier-up (warmup) before measuring.
  - Resource/CPU: Use /usr/bin/time -v (wall time, memory) and perf stat -d (cycles, instructions, IPC, branch misses).
    - LLM Tip (perf Fallback): If perf fails due to permissions (e.g. kernel.perf_event_paranoid), notify the user to adjust settings or switch to a PMU-enabled environment (gLinux/Cloudtop). Fall back to d8 --prof if needed.
  - GC Overhead: Use --trace-gc to trace garbage collection pauses, especially if WasmGC is used.
- Tier Isolation: Compare performance between execution tiers to understand the optimization effect.
  - Liftoff (Baseline): d8 --liftoff-only script.js
  - Turboshaft (Optimized): d8 --no-liftoff script.js
    - Note: --no-liftoff disables speculative inlining since it skips Liftoff execution and thus doesn't collect feedback.
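The warmup, repeated-run, and tier-comparison advice above can be sketched as a small script.js. This is a minimal sketch, not a definitive harness: the inline module (a single exported add function), the helper names workload, median, and measure, and the run and iteration counts are all illustrative assumptions. Replace the module with the real workload, and note that a single warmup pass may not be enough to trigger tier-up for larger code.

```javascript
// script.js: hypothetical benchmark harness; runs under d8 or Node.js.
// The inline module is a stand-in for the real workload. It exports
// add(i32, i32) -> i32 and nothing else.
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,        // magic + version
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f,  // type: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00,                                // one function of type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00,  // export "add" (func 0)
  0x0a, 0x09, 0x01, 0x07, 0x00,                          // code section, no locals
  0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b,                    // local.get 0/1, i32.add, end
]);
const { add } = new WebAssembly.Instance(new WebAssembly.Module(wasmBytes)).exports;

// One timed unit of work: call the Wasm export in a loop.
function workload(iterations) {
  let acc = 0;
  for (let i = 0; i < iterations; i++) acc = add(acc, i);
  return acc;
}

// Median of a list of numbers (does not modify the input).
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = sorted.length >> 1;
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Run one untimed warmup pass, then time `runs` passes and return the median in ms.
function measure(runs, iterations) {
  workload(iterations); // warmup, not timed
  const samples = [];
  for (let r = 0; r < runs; r++) {
    const start = performance.now();
    workload(iterations);
    samples.push(performance.now() - start);
  }
  return median(samples);
}

console.log(`median of 5 runs: ${measure(5, 1e6).toFixed(3)} ms`);
```

Running the same file once with d8 --liftoff-only script.js and once with d8 --no-liftoff script.js gives two medians to compare. With a function this small, the JS-to-Wasm call overhead likely dominates, so treat the numbers as a check of the harness rather than of the tiers.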
2. Hotspot Identification
Find where time is spent and which functions, loops, blocks, or instructions are hottest, so that the analysis focuses on them:
- Linux Perf: Run ./tools/profiling/linux-perf-d8.py to get a profile with JIT symbols resolved, then analyze with perf report.
- V8 Profiler: Run d8 --prof script.js and analyze v8.log with ./tools/linux-tick-processor.
- Categorize Hotspots: Wasm user code, JS code, V8 C++ Runtime/GC/Compilation, or JS/Wasm boundary (wrappers).
3. Deep Dive Analysis
Compilation Overhead
- Identify Slow to Compile Functions: Use d8 --trace-wasm-compilation-times and d8 --turbo-stats-wasm.
- Inspect Bytecode: Build wami (tools/dev/gm.py x64.release wami) and disassemble: out/x64.release/wami --single-wat=<func_index> <path_to_wasm>
Wasm Binary Quality & Toolchain
Unoptimized Wasm binary patterns can hurt V8 compilation and execution.
- Ask the user if source code (C++/Rust) is available.
- Check for: huge functions, excessive locals, many branches, or missed static optimizations.
- Ensure binary is a release build (compiled with -O3/-Os).
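The "huge functions" and "excessive locals" checks above can be approximated without any toolchain by walking the binary's code section. The sketch below is an illustration under stated assumptions: readU32 and functionStats are hypothetical helper names, the input is assumed to be a well-formed module, and local value types are assumed to be encoded in a single byte, which does not hold for WasmGC typed references.

```javascript
// Read an unsigned LEB128 value from `bytes` starting at `pos`.
// Returns [value, nextPos]. Assumes well-formed input.
function readU32(bytes, pos) {
  let result = 0, shift = 0, byte;
  do {
    byte = bytes[pos++];
    result += (byte & 0x7f) * 2 ** shift; // multiply to avoid signed 32-bit shifts
    shift += 7;
  } while (byte & 0x80);
  return [result, pos];
}

// Report body size (bytes) and declared local count for every function in the
// code section (section id 10). Indices are relative to the code section, so
// imported functions are not counted.
function functionStats(bytes) {
  const stats = [];
  let pos = 8; // skip the 4-byte magic and 4-byte version
  while (pos < bytes.length) {
    const id = bytes[pos++];
    let size;
    [size, pos] = readU32(bytes, pos);
    const end = pos + size;
    if (id === 10) {
      let count;
      [count, pos] = readU32(bytes, pos);
      for (let i = 0; i < count; i++) {
        let bodySize;
        [bodySize, pos] = readU32(bytes, pos);
        const bodyEnd = pos + bodySize;
        // Locals are stored as (count, type) groups; sum the counts.
        let groups, p;
        [groups, p] = readU32(bytes, pos);
        let locals = 0;
        for (let g = 0; g < groups; g++) {
          let n;
          [n, p] = readU32(bytes, p);
          locals += n;
          p++; // skip the value type (assumption: single-byte encoding)
        }
        stats.push({ definedIndex: i, bodySize, locals });
        pos = bodyEnd;
      }
    }
    pos = end; // jump to the next section regardless of its id
  }
  return stats;
}

// Tiny sample module (one exported add function) so the sketch runs as-is.
const sampleModule = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f,
  0x03, 0x02, 0x01, 0x00,
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00,
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b,
]);

// Print the largest function bodies first.
const largest = functionStats(sampleModule)
  .sort((a, b) => b.bodySize - a.bodySize)
  .slice(0, 10);
console.log(JSON.stringify(largest));
```

For a real binary, replace sampleModule with the file contents (for example, in Node.js via fs.readFileSync). What counts as "huge" is a judgment call; the sketch only ranks functions and does not apply a threshold.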
Execution Performance & Optimization (Deep Dives)
Perform deep dives on hot functions:
- Turboshaft IR Traces: Run d8 --trace-turbo --trace-turbo-filter=<func_name>.
  - Context Window: To keep the context window clean, query the trace files programmatically with jq:
    - List phases: jq '.phases[].name' turbo-*.json
    - Count nodes: jq '.phases[] | {name: .name, node_count: (if (.data | type) == "object" and .data.nodes then (.data.nodes | length) else 0 end)}' turbo-*.json
    - Find ops: jq '.phases[] | select(.name == "TurboshaftTypeAssertionsPhase") | select((.data | type) == "object" and .data.nodes) | .data.nodes[] | select(.title | contains("BoundsCheck"))' turbo-*.json
- Assembly Profiling: Run perf annotate -i file.perf.data.jitted --stdio to map hot instructions to bytecode.
  - Instruction Skid: Account for cycle misattribution; inspect surrounding instructions and data dependencies.
- Disassembly & Comments: Run out/x64.optdebug/d8 --print-wasm-code-function-index=<idx> --code-comments.
  - LLM Tip: Redirect disassembly output to a file (> disasm.txt) to avoid flooding the terminal.
- Optimization & Tracing Flags:
  - Inlining: --trace-wasm-inlining, --trace-turbo-inlining
  - Loop Peeling and Unrolling: Enable/disable --wasm-loop-peeling, --wasm-loop-unrolling to see the effect of those optimizations.
  - Other Optimizations: --turboshaft-trace-load-elimination, --turboshaft-trace-peeling, --turboshaft-trace-unrolling
  - Typing: --trace-wasm-typer
- Inlining & Tiering Budgets: Scalability tuning flags to adjust optimizations:
  - Tiering up earlier or later: --wasm-tiering-budget=<value>.
  - Inline more or less aggressively: --wasm-inlining-budget=<value>.
- V8 Flag Discovery: Search for context-specific flags using d8 --help | grep -i <keyword> or by looking up definitions in src/flags/flag-definitions.h and via grep. Refer to the v8-commands skill for general usage.
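When jq is not available, the "Count nodes" and "Find ops" queries from the Turboshaft IR Traces item can be mirrored in a few lines of JavaScript. This is a sketch under assumptions: the trace layout (phases[].name, phases[].data.nodes[].title) is inferred only from the jq queries above, nodes is assumed to be an array, the helper names are hypothetical, and sampleTrace is made-up data that stands in for a parsed turbo-*.json file.

```javascript
// True when a phase carries a node list in the shape the jq queries expect.
function hasNodeList(phase) {
  const data = phase.data;
  return data !== null && typeof data === "object" &&
         !Array.isArray(data) && Array.isArray(data.nodes);
}

// Mirror of the "Count nodes" query: one { name, nodeCount } entry per phase.
function countNodesPerPhase(trace) {
  return trace.phases.map((phase) => ({
    name: phase.name,
    nodeCount: hasNodeList(phase) ? phase.data.nodes.length : 0,
  }));
}

// Mirror of the "Find ops" query: nodes in one phase whose title contains `needle`.
function findOps(trace, phaseName, needle) {
  return trace.phases
    .filter((phase) => phase.name === phaseName)
    .flatMap((phase) => (hasNodeList(phase) ? phase.data.nodes : []))
    .filter((node) => typeof node.title === "string" && node.title.includes(needle));
}

// Made-up trace for illustration only; real files contain many more fields.
const sampleTrace = {
  phases: [
    { name: "ExamplePhase", data: "not an object" },
    {
      name: "TurboshaftTypeAssertionsPhase",
      data: { nodes: [{ title: "BoundsCheck(example)" }, { title: "Load(example)" }] },
    },
  ],
};

console.log(JSON.stringify(countNodesPerPhase(sampleTrace)));
console.log(JSON.stringify(findOps(sampleTrace, "TurboshaftTypeAssertionsPhase", "BoundsCheck")));
```

To use it on a real trace, parse the file with JSON.parse and pass the result instead of sampleTrace. Printing only the per-phase summary keeps the output small in the same way the jq queries do.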
4. Synthesis and Reporting
Provide a clear report summarizing:
- Hottest Functions: Contribution and compilation tier (Liftoff vs Turboshaft).
- Inlining Status: Report Wasm-to-Wasm and JS/Wasm wrapper and body inlining.
- Wasm Input Binary Properties: Release vs debug, missed static optimizations, very large functions, or otherwise noteworthy code patterns.
- Disassembly & Assembly Hotspots: Highlight hot instructions and map them to Wasm bytecode.
- Turboshaft Optimization: Key phases affecting the CFG and any failed optimizations.
- Code Skeleton: Provide a simplified snippet of the hot block/loop at the most useful level (source, bytecode, IR, or machine code).
5. Experimentation & Verification
Only proceed to this step after your initial report and explicit approval by the user.
- Ablation Studies: Toggle optimizations to measure wall-time impact (e.g. --no-wasm-loop-unrolling, --no-turboshaft-wasm-load-elimination, --no-wasm-inlining, --no-turbo-inline-js-wasm-calls).
  - LLM Tip: Present results in a Before vs. After table.
- Wasm Binary/Static Optimization: Use wasm-opt -O3 (or -Os) if available on PATH, or ask the user to rebuild with different flags.
- Hypotheses: Align on potential optimization opportunities with the user before implementing fixes.
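The Before vs. After table for ablation studies could be produced with a small formatter like the one below. It is a sketch only: ablationTable is a hypothetical helper, and the timings in the usage lines are placeholder values rather than measurements.

```javascript
// Format ablation results as a plain-text table. "Before" is the baseline
// wall time with default flags; "After" is the wall time with one flag toggled.
function ablationTable(baselineMs, results) {
  const rows = [["Configuration", "Before (ms)", "After (ms)", "Change"]];
  for (const { flag, ms } of results) {
    const change = ((ms - baselineMs) / baselineMs) * 100;
    const sign = change >= 0 ? "+" : "";
    rows.push([flag, baselineMs.toFixed(1), ms.toFixed(1), `${sign}${change.toFixed(1)}%`]);
  }
  // Pad every column to its widest cell so the table lines up.
  const widths = rows[0].map((_, col) => Math.max(...rows.map((r) => r[col].length)));
  return rows
    .map((r) => r.map((cell, col) => cell.padEnd(widths[col])).join(" | "))
    .join("\n");
}

// Placeholder numbers for illustration; substitute measured medians.
const table = ablationTable(100, [
  { flag: "--no-wasm-inlining", ms: 125 },
  { flag: "--no-wasm-loop-unrolling", ms: 98 },
]);
console.log(table);
```

A positive change means the run got slower with the optimization disabled, which suggests that optimization was helping. Small differences should be checked against run-to-run noise before drawing conclusions.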