---
name: rust-testing
description: |-
  Monorepo testing guide: L1/L2/L3 taxonomy, canonical just recipes,
  require_level! gating, nextest filtersets, and fuzzing. Load this
  before writing or reviewing tests in the rusty-biscuit workspace.
hash: 1acc7c1c76b11142-9d61f99b8c9672e3
last_updated: 2026-06-06
---
# Rust Testing — Rusty Biscuit Monorepo
## Decision Tree: "What tier should my test live in?"

Start at the requirement, not the code:

```text
Does the test need a real terminal, browser, or device to verify behaviour?
├── NO → Is it slow (>5 s) or does it hammer an external API?
│   ├── NO  → L1 (default). Name it normally.
│   └── YES → L1 with `slow_` prefix so sanity skips it.
└── YES → Is it a headless browser test?
    ├── YES → Browser tier. Name it `browser_*`.
    └── NO  → Does it need OS keyboard/mouse injection?
        ├── YES → L3. Name it `level3_*`. Requires RUN_LEVEL3=1.
        └── NO  → L2. Name it `level2_*`. Requires a harness (tmux/WezTerm/Chrome).
```
If the only meaningful coverage of a public API requires a real resource,
document the exception in docs/testing-strategy.md; do not force it into
sanity.
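The decision tree can be read as a small pure function, which can be handy when reviewing where a new test belongs. This is a minimal sketch: the `Needs` and `Tier` types and the `classify` function are hypothetical and exist only for illustration; nothing like them is claimed to ship in `test_toolkit`.

```rust
/// Answers to the questions in the decision tree (hypothetical type).
#[derive(Debug, Clone, Copy, Default)]
pub struct Needs {
    pub real_resource: bool,     // real terminal, browser, or device
    pub headless_browser: bool,  // headless Chrome/Chromium
    pub os_input: bool,          // OS keyboard/mouse injection
    pub slow_or_api_heavy: bool, // >5 s, or hammers an external API
}

/// Tiers reachable from the tree (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier {
    L1,
    Slow,
    Browser,
    L2,
    L3,
}

impl Tier {
    /// Name prefix the tier's filterset matches on ("" for plain L1).
    pub fn prefix(self) -> &'static str {
        match self {
            Tier::L1 => "",
            Tier::Slow => "slow_",
            Tier::Browser => "browser_",
            Tier::L2 => "level2_",
            Tier::L3 => "level3_",
        }
    }
}

/// Walk the tree top to bottom and return the tier a test belongs in.
pub fn classify(n: Needs) -> Tier {
    if !n.real_resource {
        if n.slow_or_api_heavy { Tier::Slow } else { Tier::L1 }
    } else if n.headless_browser {
        Tier::Browser
    } else if n.os_input {
        Tier::L3
    } else {
        Tier::L2
    }
}

fn main() {
    let tier = classify(Needs { real_resource: true, ..Needs::default() });
    println!("{:?} -> prefix {:?}", tier, tier.prefix()); // L2 -> prefix "level2_"
}
```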
## Test Levels

| Level | Prefix | Resource | Skip when absent | Hard-fail env |
|---|---|---|---|---|
| L1 | (none) | In-process only | Never | — |
| L2 | `level2_` | Real terminal / PTY | Harness missing | `BISCUIT_TEST_LEVEL_REQUIRED=2` |
| L3 | `level3_` | OS keyboard/mouse | `RUN_LEVEL3` unset | `BISCUIT_TEST_LEVEL_REQUIRED=3` |
| Browser | `browser_` | Chrome/Chromium | Browser missing | `BISCUIT_BROWSER_REQUIRED=1` |
| Real | `real_` | External device/API | Resource missing | Per-package env vars |
| Slow | `slow_` | None (slow L1) | Excluded from sanity | — |
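## Gating Tests

Use `test_toolkit::require_level!` at the top of a test body:

```rust
use test_toolkit::{require_level, Level};
// WezTermHarness comes from biscuit_test_harness; exact import path assumed.
use biscuit_test_harness::WezTermHarness;

#[test]
#[serial_test::serial]
fn level2_renders_in_real_terminal() {
    // Skips if WezTerm is unavailable; panics instead under BISCUIT_TEST_LEVEL_REQUIRED=2.
    require_level!(Level::L2, WezTermHarness::available(), "WezTerm");
    // ... test body
}
```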
For browser tests:

```rust
#[tokio::test]
#[serial_test::serial(browser)]
async fn browser_computed_style_matches() {
    // Skip cleanly when no Chrome/Chromium is available.
    if !biscuit_browser_harness::require_browser() { return; }
    // ... test body
}
```
## Canonical Just Recipes

Every curated package area defines these 12 recipes:

| Recipe | Meaning |
|---|---|
| `sanity` | Fast confidence (≤15 s): `cargo nextest run --lib --bins` with the tier-exclusion filter listed under Nextest Filtersets. |
| `test` | Full L1 suite. |
| `test-l2` | Real-terminal tests. Pre-spawns one shared pane per backend via `biscuit-harness-broker`, exports `BISCUIT_SHARED_*_ID` env vars, runs nextest with `-j 1`, and tears the panes down in a `trap`. Tests call `<Backend>Harness::shared_or_spawn()` to attach to the pre-spawned pane, falling back to per-process spawning when the env var is missing. |
| `test-l3` | OS keyboard/mouse tests. |
| `test-browser` | Headless browser tests. Runs `-j 1` (one Chrome at a time); the tier gets a 5 s `leak-timeout` override for Chrome teardown. |
| `test-real` | External resource tests. |
| `lint` | Clippy + fmt check. |
| `bench` | Criterion benchmarks (no-op if opted out). |
| `coverage` | Per-package LCOV. |
| `doctest` | `cargo test --doc`. |
| `fuzz` | `cargo +nightly fuzz run` (no-op if no targets). |
| `all` | `sanity` → `lint` → `doctest` → `test` → `test-l2` → `test-browser`. |
Delegate to shared recipes in `just/devops.just` (e.g. `@just _test my-crate`).

At the repository root, `just test` delegates to `_test_workspace`. It uses
Cargo metadata as the package source of truth, runs every workspace package,
continues after ordinary failures, and reports the failed packages at the end.
Ctrl+C aborts the remaining packages and preserves exit code 130. Optional
selectors may be exact package names or package-area paths.

Run `just check-test-interrupts` to verify that every package-area test
recipe also preserves Ctrl+C as exit 130.
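The root `just test` contract (keep going after ordinary failures, stop and keep exit 130 on Ctrl+C) can be modelled as a driver loop. This is a hypothetical sketch, not the `_test_workspace` recipe. The `Outcome` type and the `run` callback stand in for invoking one package's tests, and the exit code for ordinary failures (1 here) is an assumption. 130 is the shell convention for a process terminated by SIGINT (128 + 2).

```rust
/// Result of running one package's tests (hypothetical model).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Outcome {
    Passed,
    Failed,
    Interrupted,
}

/// Exit status a shell reports for a SIGINT-terminated process: 128 + 2.
pub const EXIT_INTERRUPTED: i32 = 130;

/// Run every package, continuing past ordinary failures.
/// Returns the failed package names and the overall exit code.
pub fn run_workspace<F>(packages: &[&str], mut run: F) -> (Vec<String>, i32)
where
    F: FnMut(&str) -> Outcome,
{
    let mut failed = Vec::new();
    for &pkg in packages {
        match run(pkg) {
            Outcome::Passed => {}
            Outcome::Failed => failed.push(pkg.to_string()),
            // Ctrl+C: abort the remaining packages and preserve 130.
            Outcome::Interrupted => return (failed, EXIT_INTERRUPTED),
        }
    }
    // 1 for "some packages failed" is an assumed convention.
    let code = if failed.is_empty() { 0 } else { 1 };
    (failed, code)
}

fn main() {
    let (failed, code) = run_workspace(&["a", "b", "c"], |p| {
        if p == "b" { Outcome::Failed } else { Outcome::Passed }
    });
    println!("failed={:?} exit={}", failed, code); // failed=["b"] exit=1
}
```

The failure list is reported only after the loop finishes, which matches the "report failed packages at the end" behaviour described above.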
## Running L2 Tests (read before you run)

`level2_*` tests spawn real terminal windows/panes. Run them only via
`just test-l2`, never directly with `cargo test` or
`cargo nextest run -E 'test(/level2_/)'`. The recipe pre-spawns one shared pane
per backend and runs nextest with `-j 1`. Bypassing it spawns windows in
parallel, races on global GUI state, leaks windows on timeout or panic, and
produces ambiguous osascript/PTY failures that look like code regressions but
are not.

- A wall of single-backend failures (e.g. every `*_in_wezterm`) usually means
  that emulator is absent or unscriptable on this machine, not that the renderer
  broke. Confirm the same test on an available backend (`_in_kitty`,
  `_apple_terminal`).
- The Apple Terminal backend is GUI-automated and especially fragile (focus,
  `do script` window reuse, orphan leaks). Before touching it or debugging a
  `level2_apple_terminal_*` failure, read `apple-terminal-harness-pitfalls.md`.
- Spawning must never steal foreground focus and must never close a window it
  did not create; these are hard harness invariants.
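The `shared_or_spawn()` fallback used by `test-l2` boils down to "attach if the broker exported a pane ID, otherwise spawn". A minimal sketch of that decision, assuming a hypothetical `PaneSource` type and `pane_source` function. The variable name is a parameter because the exact per-backend `BISCUIT_SHARED_*_ID` spelling is not given here, and the lookup is injected so the logic can be exercised without touching the process environment. Treating an empty value as missing is also an assumption.

```rust
/// Where an L2 test gets its terminal pane from (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum PaneSource {
    /// Reuse the pane pre-spawned by the broker under `just test-l2`.
    Shared(String),
    /// No shared pane exported: spawn one for this test process.
    SpawnOwn,
}

/// Decide between attaching and spawning. `lookup` is normally
/// `|k| std::env::var(k).ok()`. An empty or blank value counts as missing
/// (an assumption of this sketch).
pub fn pane_source<F>(var: &str, lookup: F) -> PaneSource
where
    F: Fn(&str) -> Option<String>,
{
    match lookup(var) {
        Some(id) if !id.trim().is_empty() => PaneSource::Shared(id.trim().to_string()),
        _ => PaneSource::SpawnOwn,
    }
}

fn main() {
    // "BISCUIT_SHARED_EXAMPLE_ID" is a made-up name for illustration only.
    let src = pane_source("BISCUIT_SHARED_EXAMPLE_ID", |k| std::env::var(k).ok());
    println!("{:?}", src);
}
```

Injecting the lookup keeps the fallback path testable under plain `just test`, where the broker has not exported anything.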
## Nextest Filtersets

`.config/nextest.toml` does not yet define named filterset aliases (a nextest
feature limitation), so the shared `_sanity`, `_test_l2`, etc. recipes pass the
filter expression directly:

- `sanity`: `-E '!(test(/level2_/) + test(/level3_/) + test(/browser_/) + test(/real_/) + test(/slow_/))'`
- `test-l2`: `-E 'test(/level2_/)'`
- `test-l3`: `-E 'test(/level3_/)'`
- `test-browser`: `-E 'test(/browser_/)'`
- `test-real`: `-E 'test(/real_/)'`
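nextest's `/regex/` matcher is unanchored and runs against the full test name, including the module path. Each tier marker in these expressions therefore matches anywhere in the name, not only as a prefix.

The sketch below models the sanity expression with plain substring checks. That is equivalent here because the markers contain no regex metacharacters. `in_sanity` is a hypothetical helper, not part of the tooling. It shows, for example, that an ordinary test named `is_surreal_date` would silently drop out of sanity because it contains `real_`.

```rust
/// Markers excluded by the sanity filterset:
/// !(test(/level2_/) + test(/level3_/) + test(/browser_/) + test(/real_/) + test(/slow_/))
const TIER_MARKERS: [&str; 5] = ["level2_", "level3_", "browser_", "real_", "slow_"];

/// True if a test with this full nextest name (e.g. `module::test_fn`)
/// would run under `just sanity`. Unanchored regex with no metacharacters
/// behaves like a substring check.
pub fn in_sanity(full_name: &str) -> bool {
    !TIER_MARKERS.iter().any(|m| full_name.contains(*m))
}

fn main() {
    for name in [
        "render::wraps_long_lines",
        "render::level2_renders_in_real_terminal",
        "dates::is_surreal_date",
    ] {
        println!("{name}: sanity={}", in_sanity(name));
    }
}
```

Keeping the tier markers out of ordinary test and module names avoids these silent exclusions.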
## Leaked Process Detection
Two complementary layers catch tests that spawn child processes and fail to reap them:
- nextest `LEAK` (per test, all platforms). `.config/nextest.toml` sets
  `leak-timeout = { period = "100ms", result = "fail" }` on both profiles, so a
  test that exits while a child still holds its stdout/stderr fails the run.
  Clean tests are not slowed; only a leak waits out the window. Drop
  `result = "fail"` to downgrade leaks to a non-fatal warning.
  - Browser-tier override. `test(/browser_/)` raises `leak-timeout` to `5s`.
    Headless Chrome's helper/crashpad processes inherit the test's stdout and
    need longer than 100 ms to reap. Without the grace period they trip spurious
    `LEAK-FAIL`s even though the test exits cleanly. `result = "fail"` is kept
    so a genuinely runaway browser still fails. The tier also runs `-j 1` (see
    the `test-browser` recipe) so only one Chrome tears down at a time;
    `#[serial(browser)]` cannot serialize them under nextest's
    process-per-test model.
- `just test-leaks` (post-run sweep, all platforms). Wraps `just test` in
  `leak-sweep` (`tools/test-toolkit`, `--features leak-sweep`). It diffs the
  process list before and after the whole run and reports survivors whose
  executable or command line is under the repo (exit code `99`). This catches
  detached orphans that closed the test's pipes, which `LEAK` cannot see.
  Attribution is by workspace path, not parent PID (orphan reparenting is
  OS-specific).
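The `leak-sweep` attribution rule (diff the process lists, keep new processes whose executable or command line is under the repo) can be modelled as a pure function. This is a sketch under stated assumptions. The `Proc` snapshot type and `survivors` function are hypothetical, and the real tool's process enumeration and report format are not shown. Comparing whole snapshot rows rather than bare PIDs is a choice of this sketch, so a recycled PID with a different command line still counts as new.

```rust
use std::collections::HashSet;
use std::path::{Path, PathBuf};

/// One row of a process-list snapshot (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Proc {
    pub pid: u32,
    pub exe: Option<PathBuf>,
    pub cmdline: String,
}

/// Exit code the sweep reports when survivors are found.
pub const EXIT_LEAKED: i32 = 99;

/// Processes present after the run but not before, attributed to the
/// workspace by path (executable or command line), never by parent PID.
pub fn survivors<'a>(before: &[Proc], after: &'a [Proc], repo: &Path) -> Vec<&'a Proc> {
    let existing: HashSet<&Proc> = before.iter().collect();
    let repo_str = repo.to_string_lossy().into_owned();
    after
        .iter()
        .filter(|p| !existing.contains(*p))
        .filter(|p| {
            // Component-wise prefix check on the executable path...
            p.exe.as_deref().is_some_and(|e| e.starts_with(repo))
                // ...or a loose substring check on the command line.
                || p.cmdline.contains(repo_str.as_str())
        })
        .collect()
}

fn main() {
    let repo = Path::new("/work/rusty-biscuit"); // hypothetical checkout path
    let shell = Proc { pid: 1, exe: Some(PathBuf::from("/bin/zsh")), cmdline: "zsh".into() };
    let orphan = Proc {
        pid: 42,
        exe: Some(repo.join("target/debug/helper")),
        cmdline: "helper --serve".into(),
    };
    let before = vec![shell.clone()];
    let after = vec![shell, orphan];
    let leaked = survivors(&before, &after, repo);
    let code = if leaked.is_empty() { 0 } else { EXIT_LEAKED };
    println!("{} survivor(s), exit {}", leaked.len(), code); // 1 survivor(s), exit 99
}
```

Matching on paths rather than parentage is what lets this catch orphans that were reparented to init or launchd after their test process exited.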
## Environment Contract

| Variable | Purpose |
|---|---|
| `BISCUIT_TEST_LEVEL=1\|2\|3` | Max level to run; higher tiers skip cleanly. |
| `BISCUIT_TEST_LEVEL_REQUIRED=2\|3` | Missing harness panics instead of skipping. |
| `BISCUIT_BROWSER_REQUIRED=1` | Missing Chrome panics instead of skipping. |
| `RUN_LEVEL3=1` | Opt-in for OS-keyboard-injection tests. |
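Taken together, these variables decide whether a gated test runs, skips, or fails. The sketch below is one reading of the contract, not the `require_level!` implementation. The `Gate`/`GateEnv` types and the `gate` function are hypothetical. Three details are assumptions:

- An unset `BISCUIT_TEST_LEVEL` imposes no cap.
- `BISCUIT_TEST_LEVEL_REQUIRED` applies only to the level it names.
- The checks run in the order shown.

In a real test, `Fail` would correspond to a panic and `Skip` to an early return.

```rust
/// What a gated test should do (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum Gate {
    Run,
    Skip(&'static str),
    Fail(&'static str),
}

/// The environment contract, already parsed (hypothetical type).
#[derive(Debug, Default)]
pub struct GateEnv {
    pub max_level: Option<u8>,      // BISCUIT_TEST_LEVEL
    pub required_level: Option<u8>, // BISCUIT_TEST_LEVEL_REQUIRED
    pub run_level3: bool,           // RUN_LEVEL3=1
}

impl GateEnv {
    /// Read the variables through an injected lookup (normally std::env::var).
    pub fn from_lookup<F: Fn(&str) -> Option<String>>(get: F) -> Self {
        GateEnv {
            max_level: get("BISCUIT_TEST_LEVEL").and_then(|v| v.trim().parse().ok()),
            required_level: get("BISCUIT_TEST_LEVEL_REQUIRED").and_then(|v| v.trim().parse().ok()),
            run_level3: get("RUN_LEVEL3").as_deref() == Some("1"),
        }
    }
}

/// Decide for a test at `level` whose resource is (or is not) `available`.
pub fn gate(level: u8, available: bool, env: &GateEnv) -> Gate {
    if env.max_level.is_some_and(|max| level > max) {
        return Gate::Skip("above BISCUIT_TEST_LEVEL");
    }
    if level == 3 && !env.run_level3 {
        return Gate::Skip("RUN_LEVEL3 not set");
    }
    if !available {
        return if env.required_level == Some(level) {
            Gate::Fail("required harness missing")
        } else {
            Gate::Skip("harness missing")
        };
    }
    Gate::Run
}

fn main() {
    let env = GateEnv::from_lookup(|k| std::env::var(k).ok());
    println!("{:?}", gate(2, false, &env));
}
```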
## Fixtures and Env Guards

Use `test_toolkit::EnvGuard` for process-env setup/teardown and
`#[serial_test::serial]` when mutating global state:

```rust
use rstest::{fixture, rstest};
use test_toolkit::EnvGuard;

// Sets PLAYA_DRY_RUN=1; the returned guard handles teardown.
#[fixture]
fn dry_run() -> EnvGuard {
    EnvGuard::set_safe("PLAYA_DRY_RUN", "1")
}

#[rstest]
#[tokio::test]
#[serial_test::serial] // the process environment is global state
async fn dispatch_with_dry_run(#[from(dry_run)] _g: EnvGuard) {
    // ...
}
```
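The guard pattern behind `EnvGuard` is ordinary RAII: record the previous value, set the new one, and restore it in `Drop`, so cleanup still happens when an assertion panics. Below is a minimal std-only sketch under that assumption. `ScopedEnv` is a hypothetical name, not the `test_toolkit` type, whose internals are not shown here.

The process environment is shared by every thread, which is why such guards pair with `#[serial_test::serial]`. For the same reason, `std::env::set_var` and `remove_var` are `unsafe` functions starting with the Rust 2024 edition.

```rust
use std::env;
use std::ffi::OsString;

/// Sets an env var for the guard's lifetime and restores the previous
/// value (or its absence) on drop. Hypothetical stand-in for EnvGuard.
pub struct ScopedEnv {
    key: String,
    previous: Option<OsString>,
}

impl ScopedEnv {
    #[allow(unused_unsafe)] // `unsafe` is required on edition 2024, redundant on 2021
    pub fn set(key: &str, value: &str) -> Self {
        let previous = env::var_os(key);
        // SAFETY: callers serialize env-mutating tests (#[serial]), so no
        // other thread reads or writes the environment concurrently.
        unsafe { env::set_var(key, value) };
        ScopedEnv { key: key.to_string(), previous }
    }
}

impl Drop for ScopedEnv {
    #[allow(unused_unsafe)]
    fn drop(&mut self) {
        // SAFETY: same serialization assumption as in `set`.
        unsafe {
            match self.previous.take() {
                Some(v) => env::set_var(&self.key, v),
                None => env::remove_var(&self.key),
            }
        }
    }
}

fn main() {
    {
        let _g = ScopedEnv::set("SCOPED_ENV_DEMO", "1");
        assert_eq!(env::var("SCOPED_ENV_DEMO").as_deref(), Ok("1"));
    } // guard dropped here: the variable is removed again
    assert!(env::var_os("SCOPED_ENV_DEMO").is_none());
}
```

Inside a test body, bind the guard with `let _g = ...`, not `let _ = ...`. The `_` pattern drops the value immediately and would undo the setting before the test runs.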
## Browser Tests

Assert on computed styles, not source substrings or screenshots:

```rust
// Inside an async browser_* test that returns a Result, so `?` can propagate.
let mut h = ChromeHarness::new();
h.spawn().await?;
h.render_html(&wrap_fragment("<div class='x'>hi</div>", "#fff")).await?;
let bg = h.computed_style(".x", "background-color").await?;
assert_eq!(bg, "rgb(17, 27, 39)");
```
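Computed colors come back in CSSOM serialized form, not as the hex literal written in the stylesheet. Opaque colors serialize as `rgb(r, g, b)`, and `rgba(...)` is used when alpha is below 1. For example, `#111b27` serializes as `rgb(17, 27, 39)`.

When the expected value comes from a hex design token, a small helper keeps the assertion readable. `hex_to_css_rgb` below is a hypothetical helper, not part of `biscuit_browser_harness`, and it handles only opaque `#rgb`/`#rrggbb` forms.

```rust
/// Convert `#rrggbb` or `#rgb` into the `rgb(r, g, b)` string that
/// getComputedStyle returns for an opaque color. Hypothetical helper.
pub fn hex_to_css_rgb(hex: &str) -> Option<String> {
    let digits = hex.strip_prefix('#')?;
    // Reject anything but ASCII hex digits (from_str_radix alone accepts '+').
    if !digits.chars().all(|c| c.is_ascii_hexdigit()) {
        return None;
    }
    let full: String = match digits.len() {
        3 => digits.chars().flat_map(|c| [c, c]).collect(), // "fff" -> "ffffff"
        6 => digits.to_string(),
        _ => return None,
    };
    let channel = |i: usize| u8::from_str_radix(&full[i..i + 2], 16).ok();
    Some(format!("rgb({}, {}, {})", channel(0)?, channel(2)?, channel(4)?))
}

fn main() {
    println!("{:?}", hex_to_css_rgb("#111b27")); // Some("rgb(17, 27, 39)")
}
```

In a browser test this would read `assert_eq!(bg, hex_to_css_rgb("#111b27").unwrap());`.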
## Fuzzing

Fuzz targets live in `<crate>/fuzz/` and require nightly Rust. Run locally:

```sh
cd biscuit-file/lib/fuzz
cargo +nightly fuzz run pdf_extract -- -runs=1000
```

Fuzz is not part of `sanity`, `test`, or PR gates. It runs nightly in CI.
## Key Crates

| Crate | Purpose |
|---|---|
| `test_toolkit` | `require_level!`, `EnvGuard`, `trace_phase!` |
| `biscuit_test_harness` | Terminal harnesses (WezTerm, Kitty, tmux, Apple Terminal); `SharedHarness` + per-backend `shared_or_spawn()`; `biscuit-harness-broker` binary used by `test-l2`. For backend selection and API, load the biscuit-test-harness skill via the Skill tool. |
| `biscuit_browser_harness` | Headless Chrome harness (`ChromeHarness`, `require_browser`) |
| `criterion` | Benchmarking |
| `rstest` | Fixtures and parameterization |
| `serial_test` | Serialize env/stateful tests |
| `pretty_assertions` | Better diffs |
| `insta` | Snapshot testing |
## Topic Pages

Open the topic file when the task matches:

| Topic | File |
|---|---|
| L2 WezTerm capture gotchas (SGR collapsing, semicolon vs colon form). For backend selection / harness API, load the biscuit-test-harness skill via the Skill tool. | wezterm-harness-pitfalls.md |
| L2 Apple Terminal pitfalls (`do script` reuse, focus stealing, orphan leaks (resolved), plain-text capture) | apple-terminal-harness-pitfalls.md |
| CLI output (channels, color modes, completions, snapshots) | cli-output-testing.md |
| TUI rendering and event/reducer tests | tui-testing.md |
| Browser tests (computed-style assertions) | browser-testing.md |
| Integration tests | integration-tests.md |
| Unit tests | unit-tests.md |
| Snapshots and redaction | snapshots.md, snapshot-redaction.md |
| Doc tests | doc-tests.md |
| Mocking | mocking.md |
| Property testing | property-testing.md |
| Fuzzing | fuzzing.md |
| Performance testing tool choice (Criterion vs Divan) | performance-testing.md |
| Criterion benchmarking (getting started → deep dive → Bencher) | criterion.md |
| Nextest details | nextest.md |
## Resources

- `docs/testing-strategy.md`: human-facing deep dive
- `just/devops.just`: shared `_*` lifecycle recipes
- `.config/nextest.toml`: slow-timeout, leak-timeout, and retry config