name: fix-build-determinism description: Use this when asked to fix a Bazel build reproducibility / determinism issue — a target whose outputs differ between builds (e.g. across machines, users, or checkout locations), typically because something bakes an absolute build path or a timestamp into an artifact.
Fix build determinism issues
A reproducible build produces byte-identical outputs regardless of where it runs (which directory it's checked out in, which output base, which user). The IC publishes reproducible artifacts, so any target that bakes a build-time absolute path, timestamp, or other environment detail into its output is a bug.
The usual culprits are externally-built dependencies — http_archives built
with rules_foreign_cc (autotools/cmake) and Rust crates with build.rs —
because they escape Bazel's normal path/timestamp scrubbing and can embed
$PWD, an install --prefix, __DATE__/__TIME__, a build-script probe
artifact, etc. into their outputs.
All commands run from the repository root (cd "$(git rev-parse --show-toplevel)").
The hunt script
Diagnosis is driven by the upstream hunt reproducibility script. It is not
checked into this repo — get it from its gist, save it
at the repo root as hunt, and chmod +x it. It builds a target twice, each
time in a freshly cloned checkout under an output base nested at a different
depth — so the absolute build path differs between the two runs the same way
it would differ between two machines. It writes each build's
--execution_log_json_file and diffs the actualOutputs (path + content
digest) of every action. Any output whose digest differs between the two runs is
a non-reproducible artifact. A clean run ends with builds 1 - 2: no diff ✓.
usage: ./hunt [--root ROOT] [--startup-options OPTS] [--build-options OPTS] [--runs N] TARGET
--root ROOT dir for the per-run checkouts, output bases and execlogs.
Default: a fresh `mktemp -d`.
--startup-options OPTS extra bazel *startup* options, one space-separated string.
--build-options OPTS extra bazel *build* options, one space-separated string.
--runs N number of builds to compare (default: 2).
TARGET label to build, e.g. //:mkfs.ext4
Always pass --build-options='--config=local'
Run with --build-options='--config=local', which builds without the internal
remote cache — essential, because a cache hit would serve a previously-built
(possibly non-reproducible) artifact and mask the very non-determinism you're
hunting. See the build-without-dfinity-infra skill for what --config=local
does. (--startup-options/--build-options are how you feed bazel any other
flags the build needs.)
Running it
Run hunt inside the dev container so the build environment is the pinned one.
See the run-in-dev-container skill for how to invoke container-run.sh
(including on hosts without podman):
# quick pass/fail (artifacts land in an ephemeral in-container tempdir):
./ci/container/container-run.sh ./hunt --build-options='--config=local' //my:target
# to *diagnose* (step 3 needs the two builds' outputs to survive the container),
# point --root at a bind-mounted path under /ic so the artifacts persist on the host:
./ci/container/container-run.sh ./hunt --root /ic/,hunt --build-options='--config=local' //my:target
Gotchas:
huntdoesgit cloneof the repo, so your fix must be committed (to the current branch) for a hunt run to pick it up. Iterate: commit → hunt → repeat.- With the default tempdir, the checkouts/output bases live in the container's
/tmpand vanish when the container exits — fine for a verdict, but use--rootunder/icwhen you need to inspect artifacts. - Bazel marks its output trees read-only. To clean the hunt root between runs:
chmod -R u+w <root> && rm -rf <root>. - Point
--rootat a path on the same filesystem as the repo for faster (hardlinked) clones.
Procedure
Run
hunt --build-options='--config=local'on the failing target and read the mismatch JSON it prints — a list of{path, digest}for outputs that differed.Look at the first differing target/output. Non-determinism cascades: one non-reproducible artifact (a generated source, a static lib, a tool binary) makes everything that embeds it differ too. Fix the earliest / most upstream differing artifact first; re-running often makes the downstream diffs vanish. Map the output path back to the dependency that produces it (e.g.
external/+_repo_rules+<name>/...→ thehttp_archivenamed<name>;external/.../<crate>-<version>/...→ a crate).Pin down what differs. The execlog only gives digests. Pull the two actual artifacts from the persisted output bases (
$ROOT/output-base-1/...vs$ROOT/output-base-2/...— use--rootso these survive) and compare them directly:cmp "$A" "$B" # confirm they differ diff <(strings -a "$A"|sort -u) <(strings -a "$B"|sort -u) # build paths / dates diff <(readelf -SW "$A") <(readelf -SW "$B") # for ELF: which sectionTypical findings: an embedded absolute build path (
/.../sandbox/.../...), a__DATE__/__TIME__string, a build-script probe artifact, or archive ordering. (Bazel's C toolchain already redacts__DATE__/__TIME__and passes-no-canonical-prefixes, so it's almost always an embedded path.)From the offending string, settle on a one-artifact probe for the bad pattern — a command that's non-empty on a broken build and empty once fixed, so you can judge a single build without re-running the full hunt. Pick the most specific stable marker the leak leaves behind (the
sandboxpath component, the exec-root / output-base prefix,.build_tmpdir, a date, the stray probe filename), e.g.:strings -a "$A" | grep -n sandbox # expect matches now; none once fixedRead the dependency's source to find where that string comes from — a
configure-substituted install path, abuild.rswritingenv!("OUT_DIR")or acanonicalize()d path into generated code, a hardcodedPREFIX/...constant, a stray probe file left inOUT_DIR, etc.Check out the source locally and iterate against it. Get the dependency at the exact version (same
urls/sha256as inMODULE.bazel, or the crate source), then point Bazel at your local copy so you can edit and rebuild without re-uploading a patch each time:bazel build --config=local --override_repository=<repo_name>=/abs/path/to/src //my:target(Find
<repo_name>withbazel query --output=build <target>or from the execlog path.) Edit, rebuild, and run the step-3 probe on the freshly-built artifact as a fast inner-loop check — when it comes back empty the embedded dependency is gone:strings -a bazel-bin/.../<artifact> | grep sandbox || echo cleanThis single-build probe is a quick check only; it does not replace the full reproducibility confirmation in step 8. Keep a pristine copy to
diffagainst for the patch.Report it upstream. A local patch is a workaround — the real fix belongs in the dependency, and an upstream fix lets us eventually drop the patch. File an issue on the dependency's tracker (for a crate or other GitHub project,
https://github.com/<owner>/<repo>/issues/new), including:- a short explanation of the bug and its impact, e.g. "
build.rswrites the absolute$OUT_DIRpath into the generatedfoo.rs, so the crate's rlib differs between builds at different filesystem locations, breaking reproducible/hermetic builds." - a minimal reproducible example if the maintainer will likely need one — the
smallest snippet that emits the offending output, or simply "build at two
different paths and
diffthe artifacts" (the step-3 probe doubles as the symptom).
Keep the resulting issue/PR URL; reference it from the patch header below.
- a short explanation of the bug and its impact, e.g. "
Capture the fix as a patch applied at fetch time (don't fork the dep):
http_archivedependency — add a patch (convention:third_party/<name>_<what>.patch) and reference it from the archive inMODULE.bazel:http_archive( name = "<name>", ... patches = ["//third_party:<name>_<what>.patch"], patch_strip = 1, )Rust crate — add a patch under
bazel/and a crate annotation inbazel/rust.MODULE.bazel:crate.annotation( crate = "<crate-name>", patch_args = ["-p1"], patches = ["@@//bazel:<crate>.patch"], )
The patch is a
git diff(pathsa/…b/…); leading#comment lines describing the fix — and linking the upstream ticket from step 6 — are fine and conventional here. Prefer the smallest patch that removes the environment dependency. If the offending artifact is something nothing downstream consumes (an extra tool the dependency builds/installs), it's also valid to just stop shipping it (e.g. trim it in arules_foreign_ccpostfix_script) rather than make it reproducible.Re-run
huntto confirmbuilds 1 - 2: no diff ✓. If a new (further downstream) difference appears, repeat from step 2 — that's the cascade resolving one layer at a time. Verify the real consumers of the target still build.
Worked examples
| Dependency | Kind | Cause | Fix | Commit |
|---|---|---|---|---|
askama (bazel/askama.patch) |
crate (derive macro) | Path::canonicalize() resolved the sandbox symlink to the repo's real path; the resulting relative path's ..-count depended on the sandbox path depth, so the rlib changed across output-base/nest depths. |
Skip canonicalize() so both paths stay anchored to the sandbox. |
0d9c593299 (#10167) |
rustix (bazel/rustix.patch) |
crate (build.rs) |
The can_compile() probe left a rustix_test_can_compile artifact in OUT_DIR, which rules_rust captures as a cacheable output; it embeds non-deterministic compiler-internal metadata. |
Delete the probe artifact after use (upstream PR 1628). | 9f0476004b (#10353) |
libssh2-sys (bazel/libssh2-sys.patch) |
crate (build.rs) |
build.rs generated a pkgconfig file containing absolute build paths. |
Disable that generation. | 737666659c (#3197) |
e2fsprogs / mke2fs (third_party/e2fsprogs_no_external_config.patch) |
http_archive (rules_foreign_cc) |
configure's --prefix is the per-build $BUILD_TMPDIR, so the compiled-in mke2fs.conf path (ROOT_SYSCONFDIR) landed in the binary's .rodata. |
Empty config_fn (use the built-in default profile) + trim the unused, also-non-reproducible extra tools from the install tree. |
cf7fe5147b |
The askama case is the canonical illustration of why hunt varies the nest
depth: the bug only manifests when the build path's depth changes between runs.