name: fix-flaky-tests description: Use this when asked to fix flaky bazel tests.
This guide explains how to find flaky tests to fix and how to debug them. Flaky tests are bazel tests that run on GitHub workflows that pass after having failed in a previous attempt.
Prerequisites
Make sure you're on an up-to-date
masterbranch to ensure you're using and reading the latest code:git checkout master && git pullRun
gh auth statusto check ifghis authenticated withgithub.comusingGit operations protocol: ssh.If not run:
gh auth login --hostname github.com --git-protocol ssh --skip-ssh-key --webThis prints a one-time device code and a URL. Instruct the user to open the URL in their browser and enter the code.
Do not use the bare
gh auth logincommand, as the interactive prompts are unreliable when run from an AI agent.
Fix a flaky test
If not instructed to fix a test with a specified
labeldetermine which test to fix by picking the most flaky test in the last week which has not yet been fixed. To do this:Run the following command to get the top 100 tests ordered descendingly by how much percent of their total runs they flaked in the last week, showing only tests which flaked 1% or more of their runs:
bazel run //ci/githubstats:query -- top 100 flaky% --ge 1 --weekPick the
labelof the top most test which doesn't have an open PR or git commit in the last week mentioning its<test_name>which is the part of thelabelafter the:.<test_name>might be suffixed with_head_nnsor_colocatewhich are variants of the same test. Strip those suffixes when checking for open PRs or commits to avoid missing matches.To check if there is an open PR mentioning the test, run the following command (replace underscores with spaces because GitHub search doesn't match underscored compound words):
gh pr list --search "$(echo '<test_name>' | tr '_' ' ')" --state openTo check if there is a git commit mentioning the test, run the following command:
git log --oneline --since 'last week' | grep "$(echo '<test_name>' | tr '_' '.')"Continue with the next test if you find an open PR or commit mentioning
<test_name>even if it seems the commit is not about fixing flakiness. It's better to pick a test which has no other work being done on it to avoid conflicts.
Get the last flaky runs of the test named
labelin the last week by running one of the following commands, replacing<label>with the label of the test:For most system-tests, which run via Farm, also download their journald logs from ElasticSearch and their console logs from Farm:
bazel run //ci/githubstats:query -- last --flaky --week --download-ic-logs --download-console-logs <label>For
_localsystem-tests, i.e. targets matching the regex//rs/tests/.*_local$, omit the--download-ic-logsand--download-console-logsoptions:bazel run //ci/githubstats:query -- last --flaky --week <label>These tests use the
Localbackend which doesn't run via Farm and doesn't upload its journald logs to ElasticSearch. Instead the journald and console logs are streamed to the test log itself (theFAILED.log/PASSED.logfiles), so there's nothing to download from ElasticSearch or Farm.
Note the command will print
Downloading logs to: <LOG_DIR>.Read
<LOG_DIR>/README.mdto understand how the logs are organized.Analyze the source code of
labeland the logs in<LOG_DIR>to determine the root causes of the flakiness.Ignore failures containing the error:
Retried too many times: sending a request to Farmas those are due to infrastructure issues unrelated to the test code.
When asked to just figure out the root causes without fixing them, document the root causes in a markdown file and finish without running the following steps.
Once the root causes have been determined, pick the most common or recent one and fix the test by addressing that root cause taking
.claude/CLAUDE.mdinto account.Don't address multiple root causes in the same fix. Prefer making separate PRs for each root cause to make it easier to review and revert if needed.
Verify the test still passes by running:
bazel test --test_output=errors --runs_per_test=3 --jobs=3 <label>This executes 3 runs of the test in parallel to increase the chances of reproducing the flakiness. If it fails, analyze the failure and fix it until it passes reliably.
Make a draft Pull Request with the fix, following these steps:
From the root of the repository, create a new git branch named
ai/deflake-<test_name>-<date>, replacing<test_name>with the name of the test and<date>with the current date inYYYY-MM-DDformat, and commit your fix to that branch.Push the branch to
origin(assuming it'sgit@github.com:dfinity/ic.git) using:git push --set-upstream origin HEADSubmit a draft PR using
ghwith the fix. Name it:fix: deflake <label>. Include the root cause analysis in the PR description and mention the PR was created following the steps in.claude/skills/fix-flaky-tests/SKILL.md.