name: bal-devnet-ab-test description: A/B test erigon's BAL parallel-execution scheduling on any BAL devnet (bal-devnet-N). Uses launch-devnet to bring up the primary instance, then spins up a second instance with IGNORE_BAL=true on a different port offset so throughput can be compared head-to-head. Use this when the user wants to compare BAL vs non-BAL parallel execution. allowed-tools: Bash, Read, Write, Edit, Glob allowed-prompts:
- tool: Bash prompt: start, stop, and manage two parallel erigon+CL instances for BAL A/B testing
BAL A/B Testing Wrapper
Compare erigon's parallel-execution throughput with vs without BAL (Block Access List) scheduling on a bal-devnet-N network. The flow is:
- Instance A (BAL) — default behaviour. Uses BAL hints from blocks to pre-populate version maps and schedule transactions optimistically.
- Instance B (No-BAL) — same code, but with
IGNORE_BAL=trueset in the environment, forcing the dependency-tracking scheduling path.
Both instances sync the same chain so gas/s, repeat%, abort, and invalid counters can be compared directly at chain tip.
Prerequisite: bring up Instance A
Use the launch-devnet skill with the BAL devnet URL the user provides (e.g. https://bal-devnet-3.ethpandaops.io). That skill discovers chain id, forks, bootnodes, and CL image from the network's config service, generates start/stop/clean scripts, and monitors progress. Do not duplicate that logic here.
When launch-devnet finishes, you'll have:
$WORKDIR/— Instance A working directory (default:~/<devnet-name>/)$WORKDIR/devnet-info.txt— chain id, fork schedule, port offset chosen for Instance A$WORKDIR/inventory.json— bootnodes & peer endpoints (reused by Instance B)$WORKDIR/genesis.json,$WORKDIR/testnet-config/— config artefacts (reused by Instance B)- A running erigon (BAL) + CL pair
Confirm Instance A is past genesis and importing payloads before starting Instance B.
Set up Instance B (No-BAL)
launch-devnet writes Instance A's chosen offset to $WORKDIR/devnet-info.txt as port_offset: <N> (key/value format). Read it instead of guessing, then derive Instance B's offset by adding +300 (leaves room for +200/+300 ephemeral nodes the user might also be running). Verify the candidate ports with lsof (cross-platform — ss is Linux-only) before committing to the offset.
NOBAL_DIR="${WORKDIR}-nobal"
OFFSET_A=$(awk -F': *' '$1=="port_offset"{print $2}' "$WORKDIR/devnet-info.txt")
OFFSET_B=$(( OFFSET_A + 300 ))
# Re-check the +OFFSET_B family (TCP + UDP). Bump by +100 and retry on conflict.
for p in <each B port at +OFFSET_B>; do
lsof -nP -iTCP:$p -sTCP:LISTEN >/dev/null 2>&1 && echo "TCP $p in use"
done
for p in <each B UDP port at +OFFSET_B>; do
lsof -nP -iUDP:$p >/dev/null 2>&1 && echo "UDP $p in use"
done
mkdir -p "$NOBAL_DIR/erigon-data" "$NOBAL_DIR/cl-data"
cp "$WORKDIR/genesis.json" "$NOBAL_DIR/"
cp -r "$WORKDIR/testnet-config" "$NOBAL_DIR/"
cp "$WORKDIR/inventory.json" "$NOBAL_DIR/"
cp "$WORKDIR/devnet-info.txt" "$NOBAL_DIR/devnet-info.txt"
# Update Instance B's metadata to record the new offset and IGNORE_BAL flag
sed -i.bak "s/^port_offset:.*/port_offset: $OFFSET_B/" "$NOBAL_DIR/devnet-info.txt" && rm "$NOBAL_DIR/devnet-info.txt.bak"
echo "ignore_bal: true" >> "$NOBAL_DIR/devnet-info.txt"
./build/bin/erigon init --datadir "$NOBAL_DIR/erigon-data" "$NOBAL_DIR/genesis.json"
Generate Instance B's start scripts the same way launch-devnet does, with these changes:
start-erigon.sh— addexport IGNORE_BAL=trueat the top of the script, before theerigoninvocation. All other env vars stay the same as Instance A. Keep theexec ./build/bin/erigon …ending so the captured PID is the erigon PID (used by Cleanup).- All ports — use the
+OFFSET_Bfamily computed above (or whichever offset survived port-conflict checks). Update both the erigon flags and the CL flags. - CL container name — use
${DEVNET}-nobal-clto avoid colliding with Instance A's container. - CL
--execution-endpoint— point at Instance B's authrpc port, not A's. - CL
--disable-enr-auto-update— required when running two CLs on the same host.
Start in the same order: erigon first, save its PID, poll for jwt.hex and authrpc bind (same loop as launch-devnet Step 8 — do not sleep), then CL.
cd "$NOBAL_DIR" && nohup bash start-erigon.sh > erigon-console.log 2>&1 &
echo $! > "$NOBAL_DIR/erigon.pid"
AUTHRPC_B=<authrpc port at +OFFSET_B>
for i in $(seq 1 60); do
if [ -f "$NOBAL_DIR/erigon-data/jwt.hex" ] \
&& lsof -nP -iTCP:"$AUTHRPC_B" -sTCP:LISTEN >/dev/null 2>&1; then
break
fi
sleep 1
done
[ -f "$NOBAL_DIR/erigon-data/jwt.hex" ] || { echo "Instance B jwt.hex not produced after 60s"; exit 1; }
cd "$NOBAL_DIR" && nohup bash start-cl.sh > cl-console.log 2>&1 &
Compare at chain tip
Wait until both instances are caught up (compare eth_blockNumber against the public RPC). Once both are at-head, the steady-state metrics in the execution log lines are what matter:
| Metric | Source | What it measures |
|---|---|---|
gas/s |
erigon execution log lines | Raw execution throughput |
repeat% |
erigon execution log lines | Speculative re-execution rate (lower = better dependency prediction) |
abort |
erigon execution log lines | Transactions aborted per batch |
invalid |
erigon execution log lines | Transactions invalidated by conflict detection |
blk/s |
erigon execution log lines | Block processing rate |
Expected: BAL instance should have lower repeat% and abort because BAL pre-populates the version map, reducing false conflicts. The gas/s delta is the net throughput impact.
Side-by-side log snapshot:
echo "=== BAL (A) ===" && \
grep -E "parallel (executed|done)" "$WORKDIR/erigon-data/logs/erigon.log" | tail -3
echo
echo "=== No-BAL (B) ===" && \
grep -E "parallel (executed|done)" "${WORKDIR}-nobal/erigon-data/logs/erigon.log" | tail -3
For longer comparisons, sample both logs at fixed intervals and aggregate (mean ± stdev) rather than eyeballing the tail.
Cleanup of Instance B only
NOBAL_DIR="${WORKDIR}-nobal"
docker stop "${DEVNET}-nobal-cl" 2>/dev/null || true
docker rm "${DEVNET}-nobal-cl" 2>/dev/null || true
# Stop Instance B's erigon by PID (saved at start time) — avoids pkill -f regex risks.
if [ -f "$NOBAL_DIR/erigon.pid" ]; then
PID=$(cat "$NOBAL_DIR/erigon.pid")
kill "$PID" 2>/dev/null || true
rm -f "$NOBAL_DIR/erigon.pid"
fi
# Optional — wipe Instance B's working directory.
# Refuse if WORKDIR is empty, NOBAL_DIR doesn't end with `-nobal`, or
# the directory doesn't carry the marker file we wrote at setup time.
# These guards prevent `rm -rf` from acting on an unrelated path if a
# variable is unset or has been overwritten.
if [ -n "$WORKDIR" ] \
&& [[ "$NOBAL_DIR" == *-nobal ]] \
&& [ -f "$NOBAL_DIR/devnet-info.txt" ] \
&& grep -q '^ignore_bal: true' "$NOBAL_DIR/devnet-info.txt"; then
rm -rf "$NOBAL_DIR"
else
echo "refusing to wipe '$NOBAL_DIR' — sanity check failed; clean it up by hand"
fi
Instance A is unaffected; use its own stop.sh / clean.sh to manage it.
Investigating discrepancies
If A and B disagree on block contents or state root (rather than just on throughput), this is not a BAL-vs-no-BAL throughput question — it's a correctness divergence. Stop the comparison and follow the failure-investigation flow in launch-devnet ("Investigating failures — finding the absolute truth"). The two instances are running the same erigon binary with one env-var difference, so a state-root diff between them points squarely at the BAL scheduling path.