name: Toolchain development
description: Instructions for checking, building, debugging, and understanding the Carbon toolchain.
Toolchain development
Toolchain structure
- Under toolchain/:
Toolchain architecture
- Documentation: Refer to toolchain/docs for detailed architecture design and patterns.
  - Refer to Toolchain Idioms for a comprehensive list of patterns (for example, ValueStore, formatting .def files, struct reflection) used throughout the implementation.
- Builtin Functions: Refer to the Builtin functions skill (SKILL.md) for guidelines on registering, mapping, constant evaluating, and lowering compiler builtin primitives (for example, "int.convert_float").
- Phases: Lex -> Parse -> Check -> Lower.
- Definitions: Many kinds (tokens, parse nodes, SemIR instructions) are defined in .def files and expanded by way of macros.
- Handlers:
  - Parser: Handle<StateName> in parse/handle_*.cpp.
  - Checker: HandleParseNode in check/handle_*.cpp.
  - Lowering: HandleInst in lower/handle_*.cpp.
- Iteration: Prefer iterative algorithms over recursive ones to prevent stack exhaustion on complex codebases.
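
The .def-plus-macros pattern and the one-handler-per-kind layout described in the list above can be illustrated with a small, self-contained sketch. Everything here is hypothetical: the kind names, the `EXAMPLE_*` macros, and the `Handle*` bodies are invented for illustration, and the list macro stands in for a separate .def file that a real build would pull in with `#include`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string_view>

// Stand-in for the contents of a .def file (hypothetical kind names).
#define EXAMPLE_TOKEN_KINDS(X) \
  X(Identifier)                \
  X(IntLiteral)                \
  X(OpenParen)                 \
  X(CloseParen)

// Expansion 1: one enumerator per kind.
enum class TokenKind {
#define EXAMPLE_ENUMERATOR(Name) Name,
  EXAMPLE_TOKEN_KINDS(EXAMPLE_ENUMERATOR)
#undef EXAMPLE_ENUMERATOR
};

// Expansion 2: count the kinds by expanding each entry to "+1".
#define EXAMPLE_COUNT(Name) +1
constexpr std::size_t NumTokenKinds = 0 EXAMPLE_TOKEN_KINDS(EXAMPLE_COUNT);
#undef EXAMPLE_COUNT

// Expansion 3: a name table that stays in sync with the enum.
constexpr std::array<std::string_view, NumTokenKinds> TokenKindNames = {
#define EXAMPLE_NAME(Name) #Name,
    EXAMPLE_TOKEN_KINDS(EXAMPLE_NAME)
#undef EXAMPLE_NAME
};

// Expansion 4: one trivial handler function per kind.
#define EXAMPLE_HANDLER(Name)                 \
  auto Handle##Name() -> std::string_view {   \
    return "handled " #Name;                  \
  }
EXAMPLE_TOKEN_KINDS(EXAMPLE_HANDLER)
#undef EXAMPLE_HANDLER

// Expansion 5: a dispatch switch with one case per kind.
auto Dispatch(TokenKind kind) -> std::string_view {
  switch (kind) {
#define EXAMPLE_CASE(Name) \
  case TokenKind::Name:    \
    return Handle##Name();
    EXAMPLE_TOKEN_KINDS(EXAMPLE_CASE)
#undef EXAMPLE_CASE
  }
  return "unknown";
}

// Looks up the spelling of a kind in the generated table.
auto NameOf(TokenKind kind) -> std::string_view {
  auto index = static_cast<std::size_t>(kind);
  assert(index < TokenKindNames.size() && "kind out of range");
  return TokenKindNames[index];
}
```

Adding a kind to the list macro updates the enum, the name table, the handler set, and the dispatch switch together, which is the main appeal of this style.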
Essential commands
- Test everything: bazelisk test //...
- Test specific target: bazelisk test //toolchain/testing:file_test
- Test specific file: bazelisk test //toolchain/testing:file_test --test_arg=--file_tests=<path_to_carbon_file>
- Build toolchain: bazelisk build //toolchain/...
Updating test data
Carbon tests often use file_test (for example,
//toolchain/testing/file_test). For detailed guidelines on authoring tests,
including file splits, naming conventions (fail_, todo_), and generating
minimal output with SemIR dumps, please refer to the Toolchain tests skill.
If you change compiler behavior, you likely need to update expected test outputs. Do not manually edit thousands of lines of expected output. Use the script:
./toolchain/autoupdate_testdata.py
# Or for a specific file:
./toolchain/autoupdate_testdata.py toolchain/check/testdata/my_test.carbon
Debugging and diagnostics
- Compiler Diagnostics: Refer to the Diagnostics skill (SKILL.md) for strict rules on declaring, formatting, emitting, testing, and styling compiler diagnostic messages (errors, warnings, notes).
- Printing to stderr: Use llvm::errs() << "debug info\n";.
  - Avoid std::cout (it may interfere with tool output).
- SemIR Stringification:
  - SemIR objects often have a Print method or operator<<.
  - inst.Print(llvm::errs())
- Debugging Crashes:
  - Bazel sandboxing can hide artifacts. Use --sandbox_debug if needed, but often running the binary directly from bazel-bin/ is easier for debugging.
Error handling
- No exceptions: Do not use C++ exceptions.
- ErrorOr<T>: Return ErrorOr<T> for fallible operations.
  - Check with if (auto result = Function(); result) { Use(*result); }
- llvm::Expected<T>: Similar to ErrorOr, used when interfacing with LLVM.
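
To make the exception-free, check-then-dereference pattern concrete, here is a minimal sketch built on a simplified stand-in type. `ExampleErrorOr`, `ExampleError`, `ParseDigits`, and `DoubleOrZero` are all hypothetical names; this is not the toolchain's ErrorOr implementation, and its real interface may differ.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <variant>

// Hypothetical error payload.
struct ExampleError {
  std::string message;
};

// Simplified stand-in for an ErrorOr-like type: holds a value or an error.
template <typename T>
class ExampleErrorOr {
 public:
  ExampleErrorOr(T value) : storage_(std::move(value)) {}
  ExampleErrorOr(ExampleError error) : storage_(std::move(error)) {}

  // True when a value (not an error) is present.
  explicit operator bool() const {
    return std::holds_alternative<T>(storage_);
  }

  // Accesses the value; asserts if this holds an error.
  auto operator*() const -> const T& {
    assert(*this && "no value present");
    return std::get<T>(storage_);
  }

  // Accesses the error; asserts if this holds a value.
  auto error() const -> const ExampleError& {
    assert(!*this && "no error present");
    return std::get<ExampleError>(storage_);
  }

 private:
  std::variant<T, ExampleError> storage_;
};

// A fallible operation that reports failure through its return value
// instead of throwing. Overflow handling is omitted for brevity.
auto ParseDigits(const std::string& text) -> ExampleErrorOr<int> {
  if (text.empty()) {
    return ExampleError{"empty input"};
  }
  int value = 0;
  for (char c : text) {
    if (c < '0' || c > '9') {
      return ExampleError{"unexpected character '" + std::string(1, c) + "'"};
    }
    value = value * 10 + (c - '0');
  }
  return value;
}

// Caller side: test the result in the if-initializer, then dereference.
auto DoubleOrZero(const std::string& text) -> int {
  if (auto result = ParseDigits(text); result) {
    return *result * 2;
  }
  return 0;
}
```

The caller never sees a thrown exception: every failure path is an ordinary return value that must be tested before the payload is read.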
Context-Aware Diagnostics
When declaring and emitting errors, ensure semantic wording matches the exact context:
- Semantic Precision: Do not reference "types" when raising errors for unsized expressions like IntLiteral or FloatLiteral. For example, use RealLiteralTooLargeForUnsizedInt instead of a diagnostic referencing an "integer type".
- Wording Consistency: Before declaring a new diagnostic in kind.def, search for existing diagnostics in the targeted implementation files (for example, other uses of MaxIntWidth) to align message structures and parameter expectations.
Casting (LLVM style)
- Use llvm::cast<T>(obj) (checked, asserts on failure).
- Use llvm::dyn_cast<T>(obj) (returns null on failure).
- Use llvm::isa<T>(obj) (boolean check).
- Avoid dynamic_cast and standard RTTI.
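
The three helpers above rely on a kind tag plus a static `classof` hook rather than compiler RTTI. The sketch below hand-rolls simplified equivalents so that it compiles without LLVM. `ExampleIsa`, `ExampleCast`, `ExampleDynCast`, and the `Shape` hierarchy are hypothetical stand-ins, and the real LLVM templates handle many more cases (references, const, smart pointers).

```cpp
#include <cassert>

// Hypothetical base class carrying a kind tag instead of virtual RTTI.
class Shape {
 public:
  enum class Kind { Circle, Square };
  explicit Shape(Kind kind) : kind_(kind) {}
  auto kind() const -> Kind { return kind_; }

 private:
  Kind kind_;
};

class Circle : public Shape {
 public:
  explicit Circle(int r) : Shape(Kind::Circle), radius(r) {}
  // The hook the cast helpers use to test the dynamic kind.
  static auto classof(const Shape* shape) -> bool {
    return shape->kind() == Kind::Circle;
  }
  int radius;
};

class Square : public Shape {
 public:
  explicit Square(int s) : Shape(Kind::Square), side(s) {}
  static auto classof(const Shape* shape) -> bool {
    return shape->kind() == Kind::Square;
  }
  int side;
};

// Boolean check, in the spirit of llvm::isa.
template <typename To, typename From>
auto ExampleIsa(const From* value) -> bool {
  return To::classof(value);
}

// Checked cast, in the spirit of llvm::cast: asserts on mismatch.
template <typename To, typename From>
auto ExampleCast(From* value) -> To* {
  assert(ExampleIsa<To>(value) && "cast to incompatible type");
  return static_cast<To*>(value);
}

// Conditional cast, in the spirit of llvm::dyn_cast: null on mismatch.
template <typename To, typename From>
auto ExampleDynCast(From* value) -> To* {
  return ExampleIsa<To>(value) ? static_cast<To*>(value) : nullptr;
}

// Typical usage: try one type with the conditional cast, then use the
// checked cast once the remaining kind is known.
auto Extent(Shape* shape) -> int {
  if (auto* circle = ExampleDynCast<Circle>(shape)) {
    return 2 * circle->radius;
  }
  return ExampleCast<Square>(shape)->side;
}
```

Use the checked form when a mismatch would be a programming error, and the conditional form when a mismatch is an expected branch.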
Leverage LLVM APIs
Before implementing custom algorithms for mathematical, logical, or bitwise operations, inspect target LLVM ADT class APIs:
- Builtin APIs: Verify if LLVM classes (such as APInt, APFloat, or APSInt) already offer native equivalents (for example, .pow(), ilogb(), .changeSign(), convertFromAPInt()). Avoid duplicate, naive, or inefficient custom loops.
Data structures
- Prefer APIs in common/ and toolchain/base/ over LLVM ADTs. For example, use Map instead of llvm::DenseMap.
- If no Carbon API exists, prefer LLVM ADTs over standard library ones (for example, llvm::SmallVector, llvm::StringRef).
- StringRef is a view; be careful with lifetimes.
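
The lifetime caveat for views can be sketched with `std::string_view`, used here only as a standard-library analog so the example compiles without LLVM. The function names are hypothetical. A view borrows the caller's buffer, so it must not outlive the string that owns the characters.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Returns a view into the caller's buffer: no copy, but the result is only
// valid while the underlying storage is alive and unmodified.
auto FirstWord(std::string_view text) -> std::string_view {
  // find() returns npos when there is no space, and substr(0, npos)
  // then yields the whole input.
  return text.substr(0, text.find(' '));
}

// Returns an owning copy for results that must outlive the input buffer,
// for example when the input was a temporary string.
auto FirstWordOwned(std::string_view text) -> std::string {
  return std::string(FirstWord(text));
}
```

Taking the argument as a view keeps call sites cheap. Returning an owning string is the safer choice whenever the caller might store the result past the lifetime of the input.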
Common pitfalls
- Legacy explorer references: The explorer prototype has been moved. Ignore references to it in proposals or old docs; focus on toolchain.
- Manually updating test files: Always check if autoupdate_testdata.py can do it for you.
- Using std::string unnecessarily: Prefer llvm::StringRef for arguments.
- Header includes: Use specific include orders (often enforced by clang-format).
- Parse node order: Semantics processes parse nodes in post-order; ensure your parser transitions support this.
- Builtin implementation gaps: If adding a primitive builtin function, make sure you address all phases of the lifecycle: macro definition registration, signature validation, compile-time constant evaluation (interpreter), LLVM IR lowering, and prelude modular implementation bindings (avoiding orphan rules). Refer to the Builtin functions skill (SKILL.md) for details.
- Premature helper abstraction: Avoid extracting tiny helper functions that are called from exactly one place and do not significantly modularize complex code. Prefer inlining directly to keep the implementation compact, readable, and localized.
- Redundant bounds calculations: Avoid repeating calculations of complex boundary estimations (such as lower and upper bound estimations). Refactor the logic to calculate unified values once, preserving compactness.
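
Two points from this document, preferring iteration over recursion and processing nodes in post-order, can be combined in one small sketch. The `ExampleNode` layout and `PostOrder` function are hypothetical and say nothing about how the toolchain actually stores its parse tree. The sketch only shows how an explicit stack replaces recursion while still visiting children before their parent.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical tree node; children are indices into a flat node vector.
struct ExampleNode {
  int id;
  std::vector<std::size_t> children;
};

// Returns node ids in post-order (children before parent) using an explicit
// stack, so traversal depth is limited by heap memory rather than by the
// call stack.
auto PostOrder(const std::vector<ExampleNode>& nodes, std::size_t root)
    -> std::vector<int> {
  std::vector<int> order;
  // Each entry is (node index, index of the next child to visit).
  std::vector<std::pair<std::size_t, std::size_t>> stack;
  stack.push_back({root, 0});
  while (!stack.empty()) {
    auto& [node, next_child] = stack.back();
    const auto& children = nodes[node].children;
    if (next_child < children.size()) {
      std::size_t child = children[next_child];
      ++next_child;
      // push_back may invalidate `node` and `next_child`; neither is used
      // again in this iteration.
      stack.push_back({child, 0});
    } else {
      // All children are done, so the parent can be emitted.
      order.push_back(nodes[node].id);
      stack.pop_back();
    }
  }
  return order;
}
```

For a root 0 with children 1 and 2, where 1 has a single child 3, the function returns the ids in the order 3, 1, 2, 0.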