toolchain-tests

name: Toolchain tests description: Instructions for authoring, structuring, and running toolchain tests using the file_test infrastructure.

Toolchain tests

Introduction

This skill provides guidelines and patterns for creating and updating tests for the Carbon toolchain, especially file tests in toolchain/*/testdata/ (for example, toolchain/check/testdata/).

Toolchain tests evaluate Carbon source files through Lexing, Parsing, Checking, and optionally Lowering. Output (for example SemIR dumps, Clang errors) is captured and validated using inline CHECK records.

Structure and Authoring

File Layout and Headers

Test files must start with the standard Carbon license, followed by configuration comments. Separate sections with blank comment lines (//).

// Part of the Carbon Language project, under the Apache License v2.0 with LLVM
// Exceptions. See /LICENSE for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
// INCLUDE-FILE: toolchain/testing/testdata/min_prelude/...
//
// AUTOUPDATE

// AUTOUPDATE is mandatory for files using CHECK markers.
// TIP: lines are automatically generated by the autoupdater. You do not need to hand-write them. It is harmless to add them, but the script will handle it.

Minimized Preludes

When writing tests entirely unrelated to the Core package, specify a minimal prelude file using // INCLUDE-FILE. Usually, include toolchain/testing/testdata/min_prelude/ scripts, such as int.carbon or primitives.carbon. This significantly speeds up execution and minimizes STDOUT noise.

Builtin Primitive Testing: Standard operators (such as +, -, /, <, etc.) are not imported or available inside minimized preludes. To write tests with a minimal prelude footprint, call primitive builtins directly (e.g., float.negate, float.div) inside your test code to build expressions.

Split Tests and `[[@TEST_NAME]]`

A single physical file can test multiple scenarios using split constraints:

// --- passing_case.carbon
library "[[@TEST_NAME]]";

// ...

// --- fail_bad_case.carbon
library "[[@TEST_NAME]]";
// ...

Always use library "[[@TEST_NAME]]"; in each split rather than hardcoding the library name. This prevents name conflicts, avoids redefining the default library, and keeps the test code clean and templateable.
Exactly [[@TEST_NAME]] (including the brackets) must be used. The test infrastructure automatically replaces it with the split's filename minus todo_ and fail_ prefixes.
Do not put code that is expected to pass and code that is expected to fail into the same split. Validation relies on non-failing splits producing absolutely no errors and failing splits producing the correct compiler errors independently.

File Prefixing: `fail_` and `todo_`

Expected failures must be differentiated from unexpected failures (and from bugs). Include prefixes to name individual split files or the main test:

fail_...: The test should and does produce compiler errors.
todo_fail_...: The test should produce errors but currently does not.
fail_todo_...: The test does produce errors or crashes, but it shouldn't (or produces the wrong errors or otherwise misbehaves with errors).
todo_...: The test has some incorrect behavior, but doesn't produce errors currently, and shouldn't.

Main File Naming: The main test file (and any split-files) must have a fail_ prefix if they have an associated error. Exception: The main file may omit fail_ if it contains a least one split that has a fail_ prefix.

Both the fail_ and todo_ prefixes are stripped from filename properties like [[@TEST_NAME]].

Constant Evaluation Validation

When testing constant evaluation in semantic checker tests, follow these conventions to ensure diagnostic stability and accuracy:

Literal Spelling Canonicalization: In Semantic IR, real literals (floating-point constants) with identical mathematical values can be assigned distinct internal representation identifiers based on spelling variations in source code. To completely prevent literal spelling mismatches in expected output checks, validation tests must be performed using canonical comparison methods (for example, passing converted values through an Expect(X as f64) function).
Generic Parameters Validation: To bypass compile-time constraints where local runtime variables are rejected as generic function arguments, test generic type conversions at runtime, and validate compile-time conversions by passing static literal values directly into primitive builtin calls.
Exhaustive Edge Case Verification: For complex mathematical algorithms (such as floating-point to integer truncation and rounding), map and execute test constraints covering every code branch, conditional exit, and fallback evaluation path.
Rounding Threshold Boundaries: Test cases that land extremely close to mathematical boundaries (for example, floating-point literals representing a tiny fraction above 1.0, such as $2^{30} \times 2^{-30}$ or $10^{10} \times 10^{-10}$, verifying correct exact truncation down to 1 or 0).
Precise Float Literal Spelling: Spell floating-point literals in test code with exact mathematical precision targeting target thresholds. For example, if testing the smallest fractional increment above 1.0, use the exact hex fractional representation (e.g. 0x1.0000000000001p0) or a highly precise decimal fractional spelling (e.g. 1.0000000000000001) instead of coarse fractions like 1.1 to ensure correct boundary assertions.
Representation Capacity Boundaries: Explicitly target edge cases near representation limits of target types. Test combinations of mantissas and exponents that yield values exactly on, just below, or just above the capacity limits of fixed-size destination types (e.g. signed/unsigned targets like i32 or u32).
Zero-Value Sizing Bounds: Verify boundary inputs of 0 and 0.0 explicitly. Assert that zero inputs are sized and simplified correctly without triggering calculation underflows, division-by-zero errors, or underestimating required bit allocations.

Test Code Comments

No agent thinking: Do not include comments describing your reasoning or "train of thought" (for example, "Wait, but...") inside the test files. Any comments left in tests should be concise and describe what the test itself is validating for human readers.

SemIR Dumps and Minimizing Output

Limit STDOUT checks to the logic under test. Always use //@dump-sem-ir-begin and //@dump-sem-ir-end around the specific declarations/blocks where SemIR output is desired. Only use these markers and not --dump-sem-ir-ranges=if-present or similar extra args—new tests use //@dump-sem-ir... to naturally filter output to the highlighted segments based on the default behavior.

//@dump-sem-ir-begin
fn F(x:? form(ref i32));
//@dump-sem-ir-end

Creating/Updating the Output

AI tools should never hand-write or manually touch // CHECK:STDOUT: or // CHECK:STDERR: comments.

Write your Carbon test code, headers, and // AUTOUPDATE then run the test updater:

./toolchain/autoupdate_testdata.py toolchain/PATH/TO/YOUR/TEST.carbon

Review the updated test outputs (for example, by way of git diff). Ensure logic paths are correctly tested rather than producing massive boilerplate blocks.