llamafile - SKILL.md Agent Skill

name: llamafile
description: This skill should be used when the user asks to "build llamafile", "rebuild llamafile", "run llamafile", "run llamafile tests", "debug llamafile", "set up llamafile", "update patches", "fix patch conflict", "update llama.cpp", "pull latest llama.cpp", "sync upstream llama.cpp", "reset submodules", "write a test for llamafile", "how does llamafile work", "llamafile architecture", or needs guidance on the llamafile build system, patch workflow, submodule integration, cosmocc toolchain, or development practices.
version: 0.1.4

Llamafile Development Guide

Llamafile combines llama.cpp, whisper.cpp, and stable-diffusion.cpp with Cosmopolitan Libc to create single-file executables that run LLMs locally across Windows, macOS, Linux, and BSD without installation.

Version Disambiguation

New llamafile (or simply "llamafile"): The code in the main branch, used for releases >=0.10.0.
Old/Classic llamafile: The legacy code, used for releases up to and including 0.9.3 (see commit 7e7d33c).

This guide covers the new llamafile project.
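As a quick illustration of the version split above, the following sketch maps a release number to the code generation it belongs to. The function name llamafile_generation is hypothetical and not part of the repo. It assumes a plain numeric MAJOR.MINOR.PATCH string with no "v" prefix, and it reports "unknown" for anything between 0.9.3 and 0.10.0, since this guide does not describe such releases.

```shell
# Hypothetical helper (not part of the repo): map a release version to the
# code generation described above. Runs in a subshell so IFS and the
# positional parameters of the caller are left untouched.
llamafile_generation() (
  # Split MAJOR.MINOR.PATCH on dots; missing fields default to 0.
  IFS=.
  set -- $1
  major=${1:-0} minor=${2:-0} patch=${3:-0}
  if [ "$major" -gt 0 ] || [ "$minor" -ge 10 ]; then
    echo new        # releases >= 0.10.0
  elif [ "$minor" -lt 9 ] || [ "$patch" -le 3 ]; then
    echo classic    # releases up to 0.9.3
  else
    echo unknown    # e.g. 0.9.4: not covered by this guide
  fi
)
```

For example, llamafile_generation 0.9.3 prints classic and llamafile_generation 0.10.0 prints new.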

Quick Reference

Initial Setup

make setup

Run this command immediately after cloning the repo (or after a reset done with make reset-repo). It initializes git submodules and applies llamafile-specific patches.
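One way to tell whether setup still needs to run is to look at the prefix character that git submodule status prints for each submodule. The sketch below only interprets that standard git output. The function names are hypothetical, and nothing here is claimed to be what make setup does internally. It also cannot tell whether the llamafile patches have been applied, only whether the submodules are initialized.

```shell
# Classify one line of `git submodule status` output by its first character.
submodule_state() {
  case "$1" in
    -*) echo uninitialized ;;    # submodule has not been initialized
    +*) echo commit-mismatch ;;  # checked-out commit differs from the recorded one
    U*) echo conflict ;;         # submodule has merge conflicts
    *)  echo in-sync ;;          # leading space: matches the recorded commit
  esac
}

# Read `git submodule status` output on stdin and succeed if any submodule
# is still uninitialized (hypothetical helper, not part of the repo).
needs_setup() {
  while IFS= read -r line; do
    [ "$(submodule_state "$line")" = uninitialized ] && return 0
  done
  return 1
}
```

A possible use from the repo root: git submodule status | needs_setup && echo "run make setup".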

Building

Run llamafile:build to build all targets.

Testing

Run llamafile:check to run the unit test suite.

Cleaning

Run llamafile:clean to remove all build outputs.

Reset Submodules

After make setup, submodules contain patches and are no longer in a clean state. To reset them, run:

make reset-repo  # Warning: removes all local changes

WARNING: this command removes all local changes. Do not run it without first generating patches from any modifications.

Core Workflows

Building from Scratch

To build llamafile from a fresh clone:

Clone the repository
Run make setup to initialize submodules and apply patches
Build with llamafile:build

Build outputs appear in the o/$(MODE)/ directory.

Modifying Core Code

For changes to llamafile's own code (not submodules):

Edit files in the llamafile/ directory
Rebuild with llamafile:build
Run unit tests with llamafile:check

Modifying Submodule Code

Submodules (llama.cpp, whisper.cpp, stable-diffusion.cpp) require a patch-based workflow:

Make changes directly in the submodule directory
Rebuild with llamafile:build
Run unit tests with llamafile:check

NOTE: never try to edit patches or generate them manually. Patch generation is done only after the rebuild and the tests (including manual ones) are successful. See development.md for the detailed patch workflow.

Running Specific Tests

Tests use the .runs pattern in BUILD.mk files:

o/$(MODE)/llamafile/json_test.runs

To run all tests: llamafile:check
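The .runs path above follows the o/$(MODE)/package/name pattern, so a single-test target can be composed from its parts. The helper below is a hypothetical sketch, not something the repo provides. It assumes that an empty MODE produces a path of the form o//..., and the mode name dbg in the usage note is only an illustration.

```shell
# Hypothetical helper: print the .runs target for one test.
#   $1 = package directory (e.g. llamafile)
#   $2 = test name         (e.g. json_test)
#   $3 = MODE              (optional; falls back to $MODE, then to empty)
runs_target() {
  printf 'o/%s/%s/%s.runs\n' "${3-${MODE-}}" "$1" "$2"
}
```

For instance, runs_target llamafile json_test dbg prints o/dbg/llamafile/json_test.runs. The printed path is meant to be passed as a target to the project's make (the cosmocc one), not to the system make.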

Key Concepts

Cosmopolitan Toolchain

The project uses Cosmopolitan Libc (cosmocc) to create Actually Portable Executables (APE) - single files that run on multiple platforms without modification. Always use the llamafile:build, llamafile:check, and llamafile:clean commands (which use cosmocc's make), not system make.

Patch System

Each submodule has a corresponding patches directory:

llama.cpp.patches/
whisper.cpp.patches/
stable-diffusion.cpp.patches/

Patches include:

Modifications (.patch files): Changes to upstream code
Additions (llamafile-files/): New files for integration (BUILD.mk, utilities)
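To get a feel for this layout, the following sketch summarizes each patches directory under a repo root: how many .patch files it holds and whether a llamafile-files/ directory is present. The function name patch_summary is hypothetical, and the sketch assumes nothing about how the .patch files are named or nested beyond the .patch extension.

```shell
# Sketch: summarize each <submodule>.patches/ directory under a repo root.
# $1 = repo root (defaults to the current directory).
patch_summary() (
  root=${1:-.}
  for sub in llama.cpp whisper.cpp stable-diffusion.cpp; do
    dir="$root/$sub.patches"
    [ -d "$dir" ] || continue
    # Count *.patch files (modifications to upstream code) at any depth.
    n=$(find "$dir" -type f -name '*.patch' | wc -l | tr -d ' ')
    # Note whether a llamafile-files/ directory (added files) is present.
    if [ -d "$dir/llamafile-files" ]; then extra=yes; else extra=no; fi
    echo "$sub: $n patch file(s), llamafile-files: $extra"
  done
)
```

Running patch_summary from the repo root prints one line per patches directory that exists. It only reads the tree and never modifies patches.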

Build System

build/config.mk: Compiler and toolchain configuration
build/rules.mk: Generic build patterns (.c → .o, archives, asset bundling)
BUILD.mk files: Per-package build logic

Outputs: o/$(MODE)/package/file.o

Multi-Architecture Support

Binaries include both x86_64 and aarch64 code paths with runtime CPU feature detection (AVX, AVX2, AVX-512, ARM NEON).

GPU Backend Loaders

Dynamically loaded backends that export the ggml C ABI (CUDA, ROCm, Vulkan) all go through the shared probe core in llamafile/gpu_backend.c. Each backend is just a GpuBackendDesc plus a link thunk. The core runs a fixed sequence: load the DSO, suppress logging, apply the device-count gate (DSOs that report 0 devices are rejected so that AUTO falls back), then register the backend. A SIGSEGV/SIGABRT crash guard wraps the foreign probe call, because driver init can fault across the cosmo/ms_abi boundary (issue #988). Metal stays separate by design (runtime-compiled, no ms_abi split, no device gate). When adding or changing a backend, route it through the core, keep the gate, and add a case to tests/gpu_backend_test.cpp. A more detailed design doc lives separately.

Main Executables

After building, find binaries in o/$(MODE)/:

Binary	Purpose
`llamafile/llamafile`	Main llamafile executable
`third_party/zipalign/zipalign`	Bundle assets into executables
`whisperfile/whisperfile`	Main whisperfile executable
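
A small check like the one below can confirm which of the binaries in the table exist after a build. It is a sketch under stated assumptions: the function name built_binaries is hypothetical, and an empty MODE is assumed to map to o//.

```shell
# Sketch: report which of the binaries listed above exist under o/$(MODE)/.
#   $1 = repo root (defaults to the current directory)
#   $2 = MODE      (optional; empty by default)
built_binaries() (
  root=${1:-.} mode=${2-}
  for rel in llamafile/llamafile third_party/zipalign/zipalign whisperfile/whisperfile; do
    path="$root/o/$mode/$rel"
    # Require a regular file with the executable bit set.
    if [ -f "$path" ] && [ -x "$path" ]; then
      echo "built $rel"
    else
      echo "missing $rel"
    fi
  done
)
```

Calling built_binaries with no arguments from the repo root prints one "built" or "missing" line per binary.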

Troubleshooting

Build Fails After Submodule Update

Run make setup to reapply patches after any submodule changes.

Submodule Has Uncommitted Changes

To reset a single submodule:

cd <submodule> && git reset --hard && git clean -fdx

To reset all submodules:

make reset-repo
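
Because both reset commands discard work, it can help to list what a reset would throw away first. The sketch below uses only standard git commands. The function names are hypothetical. Since submodules already contain the applied patches after make setup, a non-empty list is expected, so the output is meant for checking that every local modification is already captured in a patch, not for blocking the reset.

```shell
# Sketch: list uncommitted changes in a submodule working tree.
#   $1 = path to the submodule
# `git status --porcelain` prints one line per changed or untracked file.
# Ignored files, which `git clean -fdx` also removes, are not listed.
pending_changes() {
  git -C "$1" status --porcelain
}

# Count the entries that pending_changes reports (hypothetical helper).
count_pending() {
  pending_changes "$1" | wc -l | tr -d ' '
}
```

For example, pending_changes llama.cpp shows each modified or untracked file with its two-letter status code.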

Wrong Make Being Used

Use the llamafile:build command (which uses cosmocc's make), not the system make.

Additional Resources

Reference Files

For detailed information, consult:

building.md - Complete build system documentation, toolchain details
architecture.md - Repository structure, component overview
development.md - Development workflow, patch management, submodule integration
testing.md - Test patterns, running and writing tests
update_llamacpp.md - Keeping llamafile updated with upstream llama.cpp

Project Documentation

README.md in repo: Project introduction
docs/ directory: User documentation (quickstart, installation, troubleshooting)
RELEASE.md: Release process
Most executables support --help