name: Toolchain tests
description: Instructions for authoring, structuring, and running toolchain tests using the file_test infrastructure.
Toolchain tests
Introduction
This skill provides guidelines and patterns for creating and updating tests for
the Carbon toolchain, especially file tests in toolchain/*/testdata/ (for
example, toolchain/check/testdata/).
Toolchain tests evaluate Carbon source files through Lexing, Parsing, Checking, and optionally Lowering. Output (for example, SemIR dumps or Clang errors) is captured and validated using inline CHECK records.
Structure and Authoring
File Layout and Headers
Test files must start with the standard Carbon license, followed by
configuration comments. Separate sections with blank comment lines (//).
// Part of the Carbon Language project, under the Apache License v2.0 with LLVM
// Exceptions. See /LICENSE for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
// INCLUDE-FILE: toolchain/testing/testdata/min_prelude/...
//
// AUTOUPDATE
- // AUTOUPDATE is mandatory for files using CHECK markers.
- // TIP: lines are automatically generated by the autoupdater. You do not need
  to hand-write them. It is harmless to add them, but the script will handle it.
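The AUTOUPDATE rule lends itself to a quick local check before running the updater. The Python sketch below is not part of the file_test infrastructure: `missing_autoupdate` is a hypothetical helper, and it assumes that CHECK markers always appear as `// CHECK:` comment lines and that the marker sits on its own `// AUTOUPDATE` line, as in the header shown above.

```python
def missing_autoupdate(test_source: str) -> bool:
    """Return True if the file has CHECK lines but no AUTOUPDATE marker.

    Assumption: both markers are whole-line comments, possibly indented.
    """
    lines = [line.strip() for line in test_source.splitlines()]
    has_check = any(line.startswith("// CHECK:") for line in lines)
    has_autoupdate = any(line == "// AUTOUPDATE" for line in lines)
    return has_check and not has_autoupdate
```

A file with no CHECK lines at all is reported as fine, since the rule only applies to files that use CHECK markers.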
Minimized Preludes
When writing tests entirely unrelated to the Core package, specify a minimal
prelude file using // INCLUDE-FILE. Usually, this is a file from
toolchain/testing/testdata/min_prelude/, such as int.carbon or
primitives.carbon. This significantly speeds up execution and reduces STDOUT
noise.
- Builtin Primitive Testing: Standard operators (such as +, -, /, <, etc.) are
  not imported or available inside minimized preludes. To write tests with a
  minimal prelude footprint, call primitive builtins directly (e.g.,
  float.negate, float.div) inside your test code to build expressions.
Split Tests and [[@TEST_NAME]]
A single physical file can test multiple scenarios using split constraints:
// --- passing_case.carbon
library "[[@TEST_NAME]]";
// ...
// --- fail_bad_case.carbon
library "[[@TEST_NAME]]";
// ...
- Always use library "[[@TEST_NAME]]"; in each split rather than hardcoding the
  library name. This prevents name conflicts, avoids redefining the default
  library, and keeps the test code clean and templateable.
- Exactly [[@TEST_NAME]] (including the brackets) must be used. The test
  infrastructure automatically replaces it with the split's filename minus
  todo_ and fail_ prefixes.
- Do not put code that is expected to pass and code that is expected to fail
  into the same split. Validation relies on non-failing splits producing
  absolutely no errors and failing splits producing the correct compiler errors
  independently.
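To make the substitution rule concrete, here is a minimal Python sketch of how a split's filename could map to the value that replaces [[@TEST_NAME]]. It is an illustration, not the file_test implementation: `library_name_for` and `split_library_names` are hypothetical names, and dropping the file extension is an assumption on top of the prefix stripping described above.

```python
import re

# Matches split markers such as `// --- fail_bad_case.carbon`.
_SPLIT_MARKER = re.compile(r"^// --- (\S+)\s*$")
_PREFIXES = ("fail_", "todo_")


def library_name_for(filename: str) -> str:
    """Drop the extension (assumed) and any leading fail_/todo_ prefixes."""
    stem = filename.rsplit(".", 1)[0]
    stripped = True
    while stripped:  # repeat so that fail_todo_ and todo_fail_ both vanish
        stripped = False
        for prefix in _PREFIXES:
            if stem.startswith(prefix):
                stem = stem[len(prefix):]
                stripped = True
    return stem


def split_library_names(test_source: str) -> dict:
    """Map each split filename to its assumed [[@TEST_NAME]] value."""
    names = {}
    for line in test_source.splitlines():
        match = _SPLIT_MARKER.match(line)
        if match:
            names[match.group(1)] = library_name_for(match.group(1))
    return names
```

For the two splits in the example above, this maps passing_case.carbon to passing_case and fail_bad_case.carbon to bad_case, which is why the two libraries do not collide.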
File Prefixing: fail_ and todo_
Expected failures must be differentiated from unexpected failures (and from bugs). Include prefixes to name individual split files or the main test:
- fail_...: The test should and does produce compiler errors.
- todo_fail_...: The test should produce errors but currently does not.
- fail_todo_...: The test does produce errors or crashes, but it shouldn't (or
  produces the wrong errors or otherwise misbehaves with errors).
- todo_...: The test has some incorrect behavior, but doesn't produce errors
  currently, and shouldn't.
Main File Naming: The main test file (and any split file) must have a
fail_ prefix if it has an associated error. Exception: The main file
may omit fail_ if it contains at least one split that has a fail_ prefix.
Both the fail_ and todo_ prefixes are stripped from filename properties like
[[@TEST_NAME]].
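The four prefixes can be read as a small decision table. The Python sketch below encodes that table for illustration only; `Status` and `classify` are hypothetical names, and the descriptions paraphrase the list above rather than quoting any toolchain source.

```python
from enum import Enum


class Status(Enum):
    PASS = "no errors, as intended"
    FAIL = "errors, as intended"
    TODO_FAIL = "no errors yet, although errors are the intended result"
    FAIL_TODO = "errors or crashes that are wrong or should not occur"
    TODO = "no errors, correctly, but some other behavior is wrong"


def classify(filename: str) -> Status:
    """Classify a test or split filename by its prefix."""
    # Check the two-part prefixes first so they are not mistaken for the
    # shorter fail_ or todo_ prefixes.
    if filename.startswith("todo_fail_"):
        return Status.TODO_FAIL
    if filename.startswith("fail_todo_"):
        return Status.FAIL_TODO
    if filename.startswith("fail_"):
        return Status.FAIL
    if filename.startswith("todo_"):
        return Status.TODO
    return Status.PASS


def currently_errors(filename: str) -> bool:
    """True when the name says the toolchain emits errors today."""
    return classify(filename) in (Status.FAIL, Status.FAIL_TODO)
```

Read this way, the leading prefix always describes what the toolchain does today: a name that starts with fail_ currently produces errors, and a name that starts with todo_ currently does not.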
Constant Evaluation Validation
When testing constant evaluation in semantic checker tests, follow these conventions to ensure diagnostic stability and accuracy:
- Literal Spelling Canonicalization: In Semantic IR, real literals
  (floating-point constants) with identical mathematical values can be
  assigned distinct internal representation identifiers based on spelling
  variations in source code. To completely prevent literal spelling mismatches
  in expected output checks, validation tests must be performed using
  canonical comparison methods (for example, passing converted values through
  an Expect(X as f64) function).
- Generic Parameters Validation: To bypass compile-time constraints where local
  runtime variables are rejected as generic function arguments, test generic
  type conversions at runtime, and validate compile-time conversions by passing
  static literal values directly into primitive builtin calls.
- Exhaustive Edge Case Verification: For complex mathematical algorithms (such as floating-point to integer truncation and rounding), map and execute test constraints covering every code branch, conditional exit, and fallback evaluation path.
- Rounding Threshold Boundaries: Test cases that land extremely close to mathematical boundaries (for example, floating-point literals representing a tiny fraction above 1.0, such as $2^{30} \times 2^{-30}$ or $10^{10} \times 10^{-10}$, verifying correct exact truncation down to 1 or 0).
- Precise Float Literal Spelling: Spell floating-point literals in test
  code with exact mathematical precision for the threshold under test. For
  example, if testing the smallest fractional increment above 1.0, use the
  exact hex fractional representation (e.g. 0x1.0000000000001p0) or a highly
  precise decimal fractional spelling (e.g. 1.0000000000000001) instead of
  coarse fractions like 1.1 to ensure correct boundary assertions.
- Representation Capacity Boundaries: Explicitly target edge cases near
  representation limits of target types. Test combinations of mantissas and
  exponents that yield values exactly on, just below, or just above the
  capacity limits of fixed-size destination types (e.g. signed/unsigned
  targets like i32 or u32).
- Zero-Value Sizing Bounds: Verify boundary inputs of 0 and 0.0 explicitly.
  Assert that zero inputs are sized and simplified correctly without triggering
  calculation underflows, division-by-zero errors, or underestimating required
  bit allocations.
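When picking boundary values for these cases, it can help to compute them exactly before spelling them as Carbon literals. The Python sketch below is one way to do that with exact rationals. The helper names are hypothetical, and the truncate-toward-zero semantics in `fits_after_truncation` is an assumption made for illustration, not a statement of what the toolchain's conversion does.

```python
import math
import sys
from fractions import Fraction


def int_limits(bits: int, signed: bool):
    """Return (min, max) for a fixed-size integer type such as i32 or u32."""
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


def fits_after_truncation(value: Fraction, bits: int, signed: bool) -> bool:
    """Assumed semantics: truncate toward zero, then range-check."""
    lo, hi = int_limits(bits, signed)
    return lo <= math.trunc(value) <= hi


def capacity_candidates(bits: int, signed: bool):
    """Exact values just below, on, and just above the upper limit."""
    _, hi = int_limits(bits, signed)
    half = Fraction(1, 2)
    return [hi - half, Fraction(hi), hi + half, Fraction(hi + 1)]


# The hex spelling from the text is exactly one f64 ULP above 1.0.
ULP_ABOVE_ONE = float.fromhex("0x1.0000000000001p0")
assert ULP_ABOVE_ONE == 1.0 + sys.float_info.epsilon
assert Fraction(ULP_ABOVE_ONE) == 1 + Fraction(1, 2**52)
```

Under the assumed semantics, `capacity_candidates(32, True)` gives three values that still fit in i32 and one (2^31) that does not, and a zero input such as `Fraction(0)` fits both i32 and u32. Each of these is a candidate for its own test case.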
Test Code Comments
- No agent thinking: Do not include comments describing your reasoning or "train of thought" (for example, "Wait, but...") inside the test files. Any comments left in tests should be concise and describe what the test itself is validating for human readers.
SemIR Dumps and Minimizing Output
Limit STDOUT checks to the logic under test. Always use //@dump-sem-ir-begin
and //@dump-sem-ir-end around the specific declarations/blocks where SemIR
output is desired. Use only these markers, not
--dump-sem-ir-ranges=if-present or similar extra args. New tests use
//@dump-sem-ir... to naturally filter output to the highlighted segments based
on the default behavior.
//@dump-sem-ir-begin
fn F(x:? form(ref i32));
//@dump-sem-ir-end
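Because the dump range depends on paired markers, a stray or missing marker is easy to overlook. The Python sketch below flags unpaired markers. `unbalanced_dump_markers` is a hypothetical helper, and it assumes that each marker occupies its own line and that ranges do not nest. Neither assumption is confirmed by this document.

```python
def unbalanced_dump_markers(test_source: str) -> list:
    """Return 1-based line numbers of unmatched //@dump-sem-ir markers."""
    problems = []
    open_line = None  # line number of the begin marker awaiting its end
    for number, line in enumerate(test_source.splitlines(), start=1):
        marker = line.strip()
        if marker == "//@dump-sem-ir-begin":
            if open_line is not None:
                problems.append(number)  # begin inside an open range
            else:
                open_line = number
        elif marker == "//@dump-sem-ir-end":
            if open_line is None:
                problems.append(number)  # end without a begin
            open_line = None
    if open_line is not None:
        problems.append(open_line)  # begin that is never closed
    return problems
```

An empty result means every begin marker has a matching end marker under the stated assumptions.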
Creating/Updating the Output
AI tools should never hand-write or manually touch // CHECK:STDOUT: or
// CHECK:STDERR: comments.
Write your Carbon test code, headers, and // AUTOUPDATE, then run the test
updater:
./toolchain/autoupdate_testdata.py toolchain/PATH/TO/YOUR/TEST.carbon
Review the updated test outputs (for example, with git diff). Ensure
logic paths are correctly tested rather than producing massive boilerplate
blocks.