rust-testing

Monorepo testing guide: L1/L2/L3 taxonomy, canonical just recipes, `require_level!` gating, nextest filtersets, and fuzzing. Load this before writing or reviewing tests in the rusty-biscuit workspace.

By yankeeinlondon. Updated 6/7/2026

name: rust-testing
description: |-
  Monorepo testing guide: L1/L2/L3 taxonomy, canonical just recipes, require_level! gating, nextest filtersets, and fuzzing. Load this before writing or reviewing tests in the rusty-biscuit workspace.
hash: 1acc7c1c76b11142-9d61f99b8c9672e3
last_updated: 2026-06-06

Rust Testing — Rusty Biscuit Monorepo

Decision Tree: "What tier should my test live in?"

Start at the requirement, not the code:

Does the test need a real terminal, browser, or device to verify behaviour?
├── NO  → Is it slow (>5 s) or does it hammer an external API?
│   ├── NO  → L1 (default). Name it normally.
│   └── YES → L1 with `slow_` prefix so sanity skips it.
├── YES → Is it a headless browser test?
│   ├── YES → Browser tier. Name it `browser_*`.
│   └── NO  → Does it need OS keyboard/mouse injection?
│       ├── YES → L3. Name it `level3_*`. Requires RUN_LEVEL3=1.
│       └── NO  → L2. Name it `level2_*`. Requires a harness (tmux/WezTerm/Chrome).

If the only meaningful coverage of a public API requires a real resource, document the exception in docs/testing-strategy.md; do not force it into sanity.
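
The decision tree can also be read as a small pure function. The sketch below is illustrative only: `Needs`, `Tier`, and `choose_tier` are hypothetical names that do not exist in `test_toolkit`, and the tree's "real device" branch is folded into the same path as terminals because the tree itself does not split it out.

```rust
/// Tiers reachable from the decision tree (hypothetical enum).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Tier {
    L1,
    L1Slow,
    L2,
    L3,
    Browser,
}

/// What a test needs in order to verify behaviour (hypothetical struct).
#[derive(Debug, Clone, Copy, Default)]
pub struct Needs {
    /// A real terminal, browser, or device.
    pub real_resource: bool,
    /// The real resource is a headless browser.
    pub headless_browser: bool,
    /// OS-level keyboard/mouse injection.
    pub os_input_injection: bool,
    /// Slower than about 5 s, or hammers an external API.
    pub slow_or_external_api: bool,
}

/// Walks the tree top to bottom, in the same order as the diagram.
pub fn choose_tier(n: Needs) -> Tier {
    if !n.real_resource {
        return if n.slow_or_external_api { Tier::L1Slow } else { Tier::L1 };
    }
    if n.headless_browser {
        Tier::Browser
    } else if n.os_input_injection {
        Tier::L3
    } else {
        Tier::L2
    }
}
```

Starting from `Needs::default()` and switching on one requirement at a time mirrors the advice to begin with the requirement rather than the code.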

Test Levels

| Level | Prefix | Resource | Skip when absent | Hard-fail env |
| --- | --- | --- | --- | --- |
| L1 | (none) | In-process only | Never | |
| L2 | `level2_` | Real terminal / PTY | Harness missing | `BISCUIT_TEST_LEVEL_REQUIRED=2` |
| L3 | `level3_` | OS keyboard/mouse | `RUN_LEVEL3` unset | `BISCUIT_TEST_LEVEL_REQUIRED=3` |
| Browser | `browser_` | Chrome/Chromium | Browser missing | `BISCUIT_BROWSER_REQUIRED=1` |
| Real | `real_` | External device/API | Resource missing | Per-package env vars |
| Slow | `slow_` | None (slow L1) | Excluded from sanity | |
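
Because the tier is carried entirely by the test name, a name can be mapped back to its tier mechanically. The following is a minimal sketch under that assumption; `TestTier` and `tier_of` are hypothetical helpers, and the sketch matches on a strict function-name prefix, which is narrower than the regex matching the nextest recipes use.

```rust
/// Tiers from the table above (hypothetical enum).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum TestTier {
    L1,
    L2,
    L3,
    Browser,
    Real,
    Slow,
}

/// Classifies a test by the prefix of its function name.
pub fn tier_of(test_name: &str) -> TestTier {
    // Reduce a module path such as `tests::level2_foo` to `level2_foo`.
    let short = test_name.rsplit("::").next().unwrap_or(test_name);

    const PREFIXES: [(&str, TestTier); 5] = [
        ("level2_", TestTier::L2),
        ("level3_", TestTier::L3),
        ("browser_", TestTier::Browser),
        ("real_", TestTier::Real),
        ("slow_", TestTier::Slow),
    ];

    for (prefix, tier) in PREFIXES {
        if short.starts_with(prefix) {
            return tier;
        }
    }
    // No recognised prefix means a plain L1 test.
    TestTier::L1
}
```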

Gating Tests

Use test_toolkit::require_level! at the top of a test body:

use test_toolkit::{require_level, Level};

#[test]
#[serial_test::serial]
fn level2_renders_in_real_terminal() {
    require_level!(Level::L2, WezTermHarness::available(), "WezTerm");
    // ... test body
}

For browser tests:

#[tokio::test]
#[serial_test::serial(browser)]
async fn browser_computed_style_matches() {
    if !biscuit_browser_harness::require_browser() { return; }
    // ... test body
}
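
Both gates follow the same shape: run when the resource is present, return early when it is absent, and panic instead when a hard-fail variable is set. The sketch below shows how such a gate could be built with a plain function and a macro that returns from the enclosing test. It is not the `test_toolkit` implementation; `Gate`, `gate`, `skip_unless!`, and `level2_demo_body` are hypothetical names.

```rust
/// Outcome of a gate check (hypothetical enum).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Gate {
    Run,
    Skip,
}

/// Runs when the resource is available; otherwise skips, or panics
/// when the caller marked the resource as required.
pub fn gate(resource: &str, available: bool, required: bool) -> Gate {
    if available {
        return Gate::Run;
    }
    if required {
        panic!("{resource} is required but not available");
    }
    eprintln!("skipping: {resource} not available");
    Gate::Skip
}

/// Expands to an early `return` from the enclosing function on skip.
macro_rules! skip_unless {
    ($resource:expr, $available:expr, $required:expr) => {
        if gate($resource, $available, $required) == Gate::Skip {
            return;
        }
    };
}

/// Stand-in for a test body; `ran` records whether the body executed.
pub fn level2_demo_body(available: bool, ran: &mut bool) {
    skip_unless!("WezTerm", available, false);
    *ran = true;
}
```

A macro is needed for the early return because a helper function cannot return on behalf of its caller; the boolean form used by the browser gate pushes that `return` into the test body instead.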

Canonical Just Recipes

Every curated package area defines these 12 recipes:

| Recipe | Meaning |
| --- | --- |
| sanity | Fast confidence (≤15 s). Runs `cargo nextest run --lib --bins` with the sanity exclusion filter listed under Nextest Filtersets. |
| test | Full L1 suite. |
| test-l2 | Real-terminal tests. Pre-spawns one shared pane per backend via biscuit-harness-broker, exports `BISCUIT_SHARED_*_ID` env vars, runs nextest with `-j 1`, and tears panes down in a trap. Tests use `<Backend>Harness::shared_or_spawn()` to attach to the pre-spawned pane and fall back to per-process spawning when the env var is missing. |
| test-l3 | OS keyboard/mouse tests. |
| test-browser | Headless browser tests. Runs `-j 1` (one Chrome at a time); the tier gets a 5s leak-timeout override for Chrome teardown. |
| test-real | External resource tests. |
| lint | Clippy + fmt check. |
| bench | Criterion benchmarks (no-op if opted out). |
| coverage | Per-package LCOV. |
| doctest | `cargo test --doc`. |
| fuzz | `cargo +nightly fuzz run` (no-op if no targets). |
| all | sanity → lint → doctest → test → test-l2 → test-browser. |

Delegate to shared recipes in just/devops.just (e.g. @just _test my-crate).

At the repository root, just test delegates to _test_workspace. It uses Cargo metadata as the package source of truth, runs every workspace package, continues after ordinary failures, and reports failed packages at the end. Ctrl+C aborts the remaining packages and preserves exit code 130. Optional selectors may be exact package names or package-area paths.

Run just check-test-interrupts to verify that every package-area test recipe also preserves Ctrl+C as exit 130.
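
The continue-on-failure and abort-on-interrupt behaviour described above can be sketched as a loop over packages. This is an illustration in Rust, not the actual just recipe: `Outcome`, `Summary`, and `run_workspace` are hypothetical, and the exit code of 1 for ordinary failures is an assumption. Exit code 130 is the conventional value for termination by SIGINT (128 + 2).

```rust
/// Result of running one package's tests (hypothetical enum).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Outcome {
    Passed,
    Failed,
    Interrupted,
}

/// Final report for the whole workspace run.
#[derive(Debug, PartialEq, Eq)]
pub struct Summary {
    pub failed: Vec<String>,
    pub exit_code: i32,
}

/// Runs every package, collecting ordinary failures and stopping
/// immediately on an interrupt.
pub fn run_workspace<F>(packages: &[&str], mut run_one: F) -> Summary
where
    F: FnMut(&str) -> Outcome,
{
    let mut failed = Vec::new();
    for &pkg in packages {
        match run_one(pkg) {
            Outcome::Passed => {}
            // Ordinary failure: remember it and keep going.
            Outcome::Failed => failed.push(pkg.to_string()),
            // Ctrl+C: skip the remaining packages and preserve 130.
            Outcome::Interrupted => return Summary { failed, exit_code: 130 },
        }
    }
    // Assumed convention: 0 when everything passed, 1 otherwise.
    let exit_code = if failed.is_empty() { 0 } else { 1 };
    Summary { failed, exit_code }
}
```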

Running L2 Tests (read before you run)

level2_* tests spawn real terminal windows / panes. Run them only via just test-l2, never cargo test / cargo nextest run -E 'test(/level2_/)' directly. The recipe pre-spawns one shared pane per backend and runs nextest -j 1; bypassing it spawns windows in parallel, races on global GUI state, leaks windows on timeout/panic, and produces ambiguous osascript/PTY failures that look like — but are not — code regressions.

  • A wall of single-backend failures (e.g. every *_in_wezterm) usually means that emulator is absent/unscriptable here, not that the renderer broke — confirm the same test on an available backend (_in_kitty, _apple_terminal).
  • The Apple Terminal backend is GUI-automated and especially fragile (focus, do script window reuse, orphan leaks). Before touching it or debugging a level2_apple_terminal_* failure, read apple-terminal-harness-pitfalls.md.
  • Spawning must never steal foreground focus and must never close a window it did not create — these are hard harness invariants.
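
The attach-or-spawn fallback and the "never close a window you did not create" invariant fit together naturally if the harness remembers how it obtained its pane. The sketch below assumes that design; `Pane`, `shared_or_spawn`, and `may_close` here are hypothetical stand-ins rather than the `biscuit_test_harness` API, and the environment variable name is passed in by the caller because the exact per-backend names are not spelled out in this guide.

```rust
/// How this process obtained its pane (hypothetical enum).
#[derive(Debug, PartialEq, Eq)]
pub enum Pane {
    /// Pre-spawned by the recipe; this process only attaches to it.
    Shared(String),
    /// Spawned by this process, which therefore owns its teardown.
    Owned(String),
}

/// Attaches to the pane named by `env_var` when it is set and non-empty,
/// otherwise falls back to spawning a fresh one.
pub fn shared_or_spawn<L, S>(env_var: &str, lookup: L, spawn: S) -> Pane
where
    L: Fn(&str) -> Option<String>,
    S: FnOnce() -> String,
{
    match lookup(env_var) {
        Some(id) if !id.trim().is_empty() => Pane::Shared(id),
        _ => Pane::Owned(spawn()),
    }
}

impl Pane {
    /// Only panes created by this process may be closed on teardown.
    pub fn may_close(&self) -> bool {
        matches!(self, Pane::Owned(_))
    }
}
```

Injecting `lookup` and `spawn` as closures keeps the sketch free of real environment access and real windows, so the fallback logic can be exercised as an ordinary L1 test.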

Nextest Filtersets

The .config/nextest.toml does not yet define named filterset aliases (nextest feature limitation). The shared _sanity, _test_l2, etc. recipes pass the filter expression directly:

  • sanity: -E '!(test(/level2_/) + test(/level3_/) + test(/browser_/) + test(/real_/) + test(/slow_/))'
  • test-l2: -E 'test(/level2_/)'
  • test-l3: -E 'test(/level3_/)'
  • test-browser: -E 'test(/browser_/)'
  • test-real: -E 'test(/real_/)'
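
The sanity expression is the negated union of the per-tier expressions, so it can be derived from the prefix list instead of being maintained by hand. A minimal sketch of that derivation follows; `GATED_PREFIXES`, `tier_filter`, and `sanity_filter` are hypothetical helpers, not part of the shared recipes.

```rust
/// Prefixes that sanity must exclude, in the order used above.
pub const GATED_PREFIXES: [&str; 5] = ["level2_", "level3_", "browser_", "real_", "slow_"];

/// Filter expression selecting one tier, e.g. `test(/level2_/)`.
pub fn tier_filter(prefix: &str) -> String {
    format!("test(/{prefix}/)")
}

/// Filter expression excluding every gated tier.
pub fn sanity_filter() -> String {
    let parts: Vec<String> = GATED_PREFIXES.iter().map(|p| tier_filter(p)).collect();
    // `+` is set union in the nextest filterset DSL; `!` negates the union.
    format!("!({})", parts.join(" + "))
}
```

Generating the expression this way means a newly added prefix only has to be listed once to be excluded from sanity.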

Leaked Process Detection

Two complementary layers catch tests that spawn child processes and fail to reap them:

  1. nextest LEAK (per test, all platforms). .config/nextest.toml sets leak-timeout = { period = "100ms", result = "fail" } on both profiles, so a test that exits while a child still holds its stdout/stderr fails the run. Clean tests are not slowed — only a leak waits the window out. Drop result = "fail" to downgrade leaks to a non-fatal warning.
    • Browser-tier override. test(/browser_/) raises leak-timeout to 5s. Headless Chrome's helper/crashpad processes inherit the test's stdout and need longer than 100ms to reap; without the grace they trip spurious LEAK-FAILs even though the test exits cleanly. result = "fail" is kept so a genuinely runaway browser still fails. The tier also runs -j 1 (see test-browser under Canonical Just Recipes) so only one Chrome tears down at a time; #[serial(browser)] alone cannot serialize them under nextest's process-per-test model.
  2. just test-leaks (post-run sweep, all platforms). Wraps just test in leak-sweep (tools/test-toolkit, --features leak-sweep). It diffs the process list before/after the whole run and reports survivors whose executable or command line is under the repo (exit code 99). Catches detached orphans that closed the test's pipes — which LEAK cannot see. Attribution is by workspace path, not parent PID (orphan reparenting is OS-specific).
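
The post-run sweep in layer 2 amounts to a set difference followed by a path filter. The sketch below illustrates that idea on in-memory snapshots and is not the `leak-sweep` source: `Proc`, `Snapshot`, `survivors`, and `sweep_exit_code` are hypothetical, and returning 0 for a clean run is an assumption (only the 99 comes from the description above).

```rust
use std::collections::BTreeMap;
use std::path::{Path, PathBuf};

/// Minimal view of one running process (hypothetical struct).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Proc {
    pub exe: PathBuf,
    pub cmdline: String,
}

/// Process list keyed by PID.
pub type Snapshot = BTreeMap<u32, Proc>;

/// PIDs that appeared during the run and whose executable or command
/// line points into the repository.
pub fn survivors(before: &Snapshot, after: &Snapshot, repo: &Path) -> Vec<u32> {
    let repo_str = repo.to_string_lossy();
    after
        .iter()
        // Keep only processes that did not exist before the run.
        .filter(|(pid, _)| !before.contains_key(*pid))
        // Attribute by workspace path, not by parent PID.
        .filter(|(_, p)| p.exe.starts_with(repo) || p.cmdline.contains(&*repo_str))
        .map(|(pid, _)| *pid)
        .collect()
}

/// Exit code for the sweep: 99 when anything survived.
pub fn sweep_exit_code(survivors: &[u32]) -> i32 {
    if survivors.is_empty() { 0 } else { 99 }
}
```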

Environment Contract

| Variable | Purpose |
| --- | --- |
| `BISCUIT_TEST_LEVEL=1\|2\|3` | Max level to run; higher tiers skip cleanly. |
| `BISCUIT_TEST_LEVEL_REQUIRED=2\|3` | Missing harness panics instead of skipping. |
| `BISCUIT_BROWSER_REQUIRED=1` | Missing Chrome panics instead of skipping. |
| `RUN_LEVEL3=1` | Opt-in for OS-keyboard-injection tests. |
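
Read together, the level variables define a small decision procedure for each test. The sketch below is one possible reading, not the `test_toolkit` logic: `EnvContract`, `Decision`, `from_lookup`, and `decide` are hypothetical, and two points are assumptions because the table does not settle them. First, a required level is matched exactly against the test's level. Second, an unset `RUN_LEVEL3` skips L3 tests even when level 3 is required.

```rust
/// What should happen to one test (hypothetical enum).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Decision {
    Run,
    Skip,
    Panic,
}

/// Parsed view of the level-related variables (hypothetical struct).
#[derive(Debug, Clone, Copy, Default)]
pub struct EnvContract {
    pub max_level: Option<u8>,      // BISCUIT_TEST_LEVEL
    pub required_level: Option<u8>, // BISCUIT_TEST_LEVEL_REQUIRED
    pub run_level3: bool,           // RUN_LEVEL3=1
}

impl EnvContract {
    /// Builds the contract from any lookup, e.g. `|k| std::env::var(k).ok()`.
    pub fn from_lookup(get: impl Fn(&str) -> Option<String>) -> Self {
        let level = |name: &str| get(name).and_then(|v| v.trim().parse::<u8>().ok());
        EnvContract {
            max_level: level("BISCUIT_TEST_LEVEL"),
            required_level: level("BISCUIT_TEST_LEVEL_REQUIRED"),
            run_level3: get("RUN_LEVEL3").as_deref() == Some("1"),
        }
    }

    pub fn decide(&self, level: u8, harness_available: bool) -> Decision {
        // L1 is in-process only and never skips.
        if level <= 1 {
            return Decision::Run;
        }
        // BISCUIT_TEST_LEVEL caps the highest tier that runs.
        if matches!(self.max_level, Some(max) if level > max) {
            return Decision::Skip;
        }
        // L3 is opt-in (assumption: this wins over the required level).
        if level == 3 && !self.run_level3 {
            return Decision::Skip;
        }
        if harness_available {
            return Decision::Run;
        }
        // Assumption: the required level must equal this test's level.
        if self.required_level == Some(level) {
            Decision::Panic
        } else {
            Decision::Skip
        }
    }
}
```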

Fixtures and Env Guards

Use test_toolkit::EnvGuard for process-env setup/teardown and #[serial_test::serial] when mutating global state:

use test_toolkit::{trace_phase, EnvGuard};
use rstest::{fixture, rstest};

#[fixture]
fn dry_run() -> EnvGuard {
    EnvGuard::set_safe("PLAYA_DRY_RUN", "1")
}

#[rstest]
#[tokio::test]
#[serial_test::serial]
async fn dispatch_with_dry_run(#[from(dry_run)] _g: EnvGuard) {
    // ...
}
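
An env guard of this kind is typically an RAII value: it records the previous value, sets the new one, and restores the original on drop. The sketch below shows that pattern with the standard library only. `ScopedEnv` is a hypothetical stand-in and makes no claim about how `test_toolkit::EnvGuard` is implemented.

```rust
use std::env;
use std::ffi::OsString;

/// Sets an environment variable and restores the prior state on drop.
pub struct ScopedEnv {
    key: String,
    previous: Option<OsString>,
}

impl ScopedEnv {
    #[allow(unused_unsafe)] // `set_var` is only `unsafe` from edition 2024 on
    pub fn set(key: &str, value: &str) -> Self {
        let previous = env::var_os(key);
        // SAFETY: the caller must ensure no other thread reads or writes the
        // process environment concurrently, hence `#[serial_test::serial]`.
        unsafe { env::set_var(key, value) };
        ScopedEnv { key: key.to_string(), previous }
    }
}

impl Drop for ScopedEnv {
    #[allow(unused_unsafe)]
    fn drop(&mut self) {
        // SAFETY: same single-threaded-access requirement as in `set`.
        unsafe {
            match &self.previous {
                Some(old) => env::set_var(&self.key, old),
                None => env::remove_var(&self.key),
            }
        }
    }
}
```

Binding the guard to a named variable such as `_g` keeps it alive for the whole test; binding it to a bare `_` would drop it immediately and undo the setup before the body runs.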

Browser Tests

Assert on computed styles, not source substrings or screenshots:

let mut h = ChromeHarness::new();
h.spawn().await?;
h.render_html(&wrap_fragment("<div class='x'>hi</div>", "#fff")).await?;
let bg = h.computed_style(".x", "background-color").await?;
assert_eq!(bg, "rgb(17, 27, 39)");
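
Browsers report opaque computed colors in the form `rgb(r, g, b)`, so the expected value in an assertion has to be written that way even when the stylesheet uses hex. A small converter makes that translation explicit. The helper below is a hypothetical convenience that is not part of `biscuit_browser_harness`, and it handles only 3- and 6-digit hex without alpha.

```rust
/// Converts `#rgb` or `#rrggbb` into the `rgb(r, g, b)` string that a
/// computed-style lookup reports. Returns `None` for anything else.
pub fn hex_to_computed_rgb(hex: &str) -> Option<String> {
    let digits = hex.strip_prefix('#')?;
    if !digits.chars().all(|c| c.is_ascii_hexdigit()) {
        return None;
    }
    // Expand the 3-digit shorthand (#abc -> #aabbcc).
    let full: String = match digits.len() {
        3 => digits.chars().flat_map(|c| [c, c]).collect(),
        6 => digits.to_string(),
        _ => return None,
    };
    // Every character is ASCII, so byte slicing is safe here.
    let channel = |i: usize| u8::from_str_radix(&full[i..i + 2], 16).ok();
    Some(format!("rgb({}, {}, {})", channel(0)?, channel(2)?, channel(4)?))
}
```

For example, `hex_to_computed_rgb("#111b27")` yields `Some("rgb(17, 27, 39)".to_string())`.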

Fuzzing

Fuzz targets live in <crate>/fuzz/ and require nightly Rust. Run locally:

cd biscuit-file/lib/fuzz
cargo +nightly fuzz run pdf_extract -- -runs=1000

Fuzz is not part of sanity, test, or PR gates. It runs nightly in CI.

Key Crates

| Crate | Purpose |
| --- | --- |
| test_toolkit | `require_level!`, `EnvGuard`, `trace_phase!` |
| biscuit_test_harness | Terminal harnesses (WezTerm, Kitty, tmux, Apple Terminal); `SharedHarness` + per-backend `shared_or_spawn()`; biscuit-harness-broker binary used by test-l2. For backend selection and API, load the biscuit-test-harness skill via the Skill tool. |
| biscuit_browser_harness | Headless Chrome harness (`ChromeHarness`, `require_browser`) |
| criterion | Benchmarking |
| rstest | Fixtures and parameterization |
| serial_test | Serialize env/stateful tests |
| pretty_assertions | Better diffs |
| insta | Snapshot testing |

Topic Pages

Open the topic file when the task matches:

| Topic | File |
| --- | --- |
| L2 WezTerm capture gotchas (SGR collapsing, semicolon vs colon form). For backend selection / harness API, load the biscuit-test-harness skill via the Skill tool. | wezterm-harness-pitfalls.md |
| L2 Apple Terminal pitfalls (do script reuse, focus-steal, resolved: orphan leaks, plain-text capture) | apple-terminal-harness-pitfalls.md |
| CLI output (channels, color modes, completions, snapshots) | cli-output-testing.md |
| TUI rendering and event/reducer tests | tui-testing.md |
| Browser tests (computed-style assertions) | browser-testing.md |
| Integration tests | integration-tests.md |
| Unit tests | unit-tests.md |
| Snapshots and redaction | snapshots.md, snapshot-redaction.md |
| Doc tests | doc-tests.md |
| Mocking | mocking.md |
| Property testing | property-testing.md |
| Fuzzing | fuzzing.md |
| Performance testing tool choice (Criterion vs Divan) | performance-testing.md |
| Criterion benchmarking (getting started → deep dive → Bencher) | criterion.md |
| Nextest details | nextest.md |

Resources

  • docs/testing-strategy.md — human-facing deep dive
  • just/devops.just — shared _* lifecycle recipes
  • .config/nextest.toml — slow-timeout and retry config