name: core-ai
description: Run on-device AI models in iOS, iPadOS, macOS, and visionOS apps with Core AI. Use when bundling .aimodel files, loading an AIModel, running inference over NDArray tensors, or compiling models ahead of time with coreai-build.
Core AI: On-Device AI Models
Overview
Core AI runs AI models on device inside your app. Inference stays private, works
offline, and has no per-inference cost. You start from a .aimodel file (converted
from a model or already in the correct format) that contains one or more named
inference functions, bundle it, load it as an AIModel, and run those functions
over NDArray tensors (image inputs use a pixel-buffer value).
Available on Apple Intelligence devices: iPhone/iPad with A17 Pro or later, Mac
with M1 or later, Apple Vision Pro (M2+). Building requires the Metal Toolchain
in Xcode (not installed by default) — without it, builds that include .aimodel
files fail with a missing Metal compiler error.
Core AI runs your own model files at the tensor level. For Apple's high-level on-device LLM API (chat sessions, tool calling, guided generation), use the Foundation Models framework — a separate framework, not covered here.
Workflow
- Inspect the model in Xcode's model viewer:
  - General: size (parameter count and on-disk storage), numeric precision split into compute (used during inference) and storage (weights on disk), operation distribution, and editable metadata.
  - Functions: each inference function's input/output names, types, and shapes. A ? in a dimension means it is dynamic (supplied or determined at runtime).
- Bundle the model: add the .aimodel file to the Xcode target (it appears in the Compile Sources build phase). Install the Metal Toolchain first.
- Load the model: AIModel(contentsOf:) is asynchronous because Core AI specializes the model for the current device and selects the compute units that deliver the best performance. For large models this can take significant time, so consider ahead-of-time compilation (below).
- Load a function: model.loadFunction(named:) returns the function, or nil when no function with that name exists (it throws on other load failures). Use functionNames when a model has multiple functions. The same inference function is safe to call from concurrent tasks.
- Prepare inputs: match each input's shape and scalar type from function.descriptor, then write data through a mutable view.
- Run and read outputs: function.run(inputs:) returns outputs keyed by name; pull each result with outputs.remove(_:) and read it through a view.
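The workflow notes that one inference function is safe to call from concurrent tasks. A minimal sketch of that fan-out pattern with a Swift task group follows. The helper name runConcurrently and its closure parameter are hypothetical and not part of Core AI; the helper is written generically so it does not assume anything about Core AI's types.

```swift
/// Runs `infer` once per input in parallel child tasks and returns the
/// outputs in the same order as the inputs.
/// Hypothetical helper: in an app, `infer` would wrap function.run(inputs:).
func runConcurrently<Input: Sendable, Output: Sendable>(
    _ inputs: [Input],
    using infer: @escaping @Sendable (Input) async throws -> Output
) async throws -> [Output] {
    try await withThrowingTaskGroup(of: (Int, Output).self) { group in
        // Start one child task per input, tagged with its position.
        for (index, input) in inputs.enumerated() {
            group.addTask {
                let output = try await infer(input)
                return (index, output)
            }
        }
        // Child tasks finish in any order, so place results by index.
        var results = [Output?](repeating: nil, count: inputs.count)
        for try await (index, output) in group {
            results[index] = output
        }
        return results.compactMap { $0 }
    }
}
```

In an app, the closure would call function.run(inputs:) for each input dictionary. Whether Core AI's input and output values are Sendable is not stated here, so the generic constraints may need adjusting.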
Core API
import CoreAI

// urlOfModel, writeInputData(into:), and processOutput(_:) are placeholders
// for app-specific code. The snippet assumes an enclosing async throwing function.

// 1. Specialize for this device and load the model.
let model = try await AIModel(contentsOf: urlOfModel)

// 2. Load the inference function. Returns nil if the name is absent.
guard let function = try model.loadFunction(named: "main") else {
    // Handle a missing function.
    return
}

// 3. Verify the input's shape and scalar type from the descriptor.
let descriptor = function.descriptor
guard let valueDescriptor = descriptor.inputDescriptor(of: "input"),
      case .ndArray(let arrayDescriptor) = valueDescriptor,
      arrayDescriptor.shape == [3, 4],
      arrayDescriptor.scalarType == .float32 else {
    // Handle an unexpected type or shape.
    return
}

// 4. Create the input tensor and write data through a mutable view.
var input = NDArray(shape: [3, 4], scalarType: .float32)
var mutableView = input.mutableView(as: Float.self)
guard let elements = mutableView.contiguousElements else {
    // Handle a non-contiguous memory layout.
    return
}
writeInputData(into: elements)

// 5. Run inference and extract the named output.
var outputs = try await function.run(inputs: ["input": input])
guard let value = outputs.remove("prediction"),
      let prediction = value.ndArray else {
    // Handle a missing or unexpected output.
    return
}
processOutput(prediction.view())
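Because loading can take significant time for large models, it may help to measure the load before deciding whether ahead-of-time compilation is worth the extra build step. The sketch below is a generic timing wrapper using only the Swift standard library. The name measured is hypothetical and not a Core AI API.

```swift
/// Runs `operation` and returns its result together with the elapsed time.
/// Hypothetical helper: wrap the AIModel(contentsOf:) call in the closure.
func measured<T>(
    _ operation: () async throws -> T
) async rethrows -> (value: T, elapsed: Duration) {
    let clock = ContinuousClock()
    let start = clock.now
    let value = try await operation()
    // Duration between the start instant and the current instant.
    return (value, start.duration(to: clock.now))
}
```

A call site might look like `let (model, elapsed) = try await measured { try await AIModel(contentsOf: urlOfModel) }`, followed by logging `elapsed` on the slowest device you support.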
Tensors and values
- NDArray: an n-dimensional tensor. Build it with NDArray(shape:scalarType:). It is read-only by default: use mutableView(as:) → contiguousElements to write, and view() to read. Swift enforces read vs. write access at compile time.
- scalarType: the element type, e.g. .float32. Shape is an [Int] matching the model's expectation; a ? dimension in the viewer is dynamic.
- Images: values marked as images at conversion time use a pixel-buffer value rather than NDArray.
- ValueDescriptor: .ndArray(ArrayDescriptor) vs. image cases. Inspect descriptor.inputDescriptor(of:) / outputDescriptor(of:) at runtime so the app can adapt, without code edits, if a function's signature changes between deployments.
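When a function declares dynamic dimensions, an exact comparison such as shape == [3, 4] no longer applies. The sketch below shows one way to check a concrete shape against an expectation that allows dynamic dimensions. It is plain Swift: representing a dynamic dimension as nil is an assumption made for illustration, and how ArrayDescriptor actually reports dynamic dimensions should be taken from Apple's documentation. Both function names are hypothetical.

```swift
/// Returns true when `actual` satisfies `expected`.
/// A nil entry in `expected` stands for a dynamic (?) dimension.
/// Assumption: this nil encoding is illustrative, not Core AI's representation.
func shapeMatches(_ actual: [Int], expected: [Int?]) -> Bool {
    // Rank must agree before individual dimensions are compared.
    guard actual.count == expected.count else { return false }
    return zip(actual, expected).allSatisfy { dimension, constraint in
        // Dynamic dimensions accept any size; fixed ones must be equal.
        constraint == nil || constraint == dimension
    }
}

/// Number of scalars a tensor of the given shape holds.
func elementCount(of shape: [Int]) -> Int {
    shape.reduce(1, *)
}
```

For example, shapeMatches([1, 128], expected: [1, nil]) accepts any length in the second dimension, while elementCount(of: [3, 4]) gives the number of values to write into the input buffer.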
Ahead-of-Time (AOT) Compilation
On-device specialization can delay first load. Move the most expensive part — model
compilation — to the build machine with the coreai-build CLI. It converts
MyModel.aimodel into one MyModel.<arch>.aimodelc asset per device architecture.
At runtime the app picks the asset for the current architecture and loads it with the
same AIModel API, so loading code does not change.
# 1. Install the Metal Toolchain (also: Xcode > Settings > Components > Get).
xcodebuild -downloadComponent MetalToolchain
# 2. Compile one .aimodelc per architecture.
xcrun coreai-build compile MyModel.aimodel --platform iOS --output compiled/
# Override compute units, deployment version, target arch, and more:
xcrun coreai-build compile MyModel.aimodel --platform macOS \
--preferred-compute gpuAndNeuralEngine --output compiled/
xcrun coreai-build compile --help
// Select the compiled asset for this device, then load normally.
let arch = AIModel.deviceArchitectureName
let assetName = "MyModel.\(arch).aimodelc"
let model = try await AIModel(contentsOf: bundledURL(for: assetName))
Notes:
- coreai-build emits one .<arch>.aimodelc per architecture; the filename prefix comes from the input model.
- AIModel.deviceArchitectureName is the identifier that matches <arch> at runtime.
- Compute units default to best performance. Pass --preferred-compute to override, and use matching load options.
- A compiled asset still requires some on-device specialization: AOT removes the bulk of compilation, not all of it. AOT only targets Apple Intelligence devices.
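The loading snippet above relies on a bundledURL(for:) helper that is not defined here. One possible sketch, using Foundation's Bundle lookup, follows. Both function names are hypothetical, and the sketch assumes the compiled assets sit at the top level of the bundle's resources. Unlike the call site above, it returns an optional so the app can handle a missing asset.

```swift
import Foundation

/// Builds the per-architecture asset name in the "<model>.<arch>.aimodelc" form.
func compiledAssetName(model: String, architecture: String) -> String {
    "\(model).\(architecture).aimodelc"
}

/// Hypothetical stand-in for bundledURL(for:).
/// Splits the name at its last dot so Bundle can look up the resource
/// name and the extension separately. Returns nil if nothing matches.
func bundledURL(for assetName: String, in bundle: Bundle = .main) -> URL? {
    guard let dot = assetName.lastIndex(of: ".") else { return nil }
    let resource = String(assetName[..<dot])
    let fileExtension = String(assetName[assetName.index(after: dot)...])
    return bundle.url(forResource: resource, withExtension: fileExtension)
}
```

At the call site, the architecture argument would come from AIModel.deviceArchitectureName, and a nil result signals that no asset was compiled for the current architecture.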
Checklist
- Metal Toolchain installed (Xcode, or xcodebuild -downloadComponent MetalToolchain).
- .aimodel added to the target and visible in Compile Sources.
- AIModel(contentsOf:) awaited; slow first load handled (or AOT-compiled).
- loadFunction(named:) nil-checked; functionNames inspected for multi-function models.
- Input shape/scalarType verified against function.descriptor.
- Mutable views for writes, read-only views for reads.
- Outputs extracted by name with outputs.remove(_:).
- For large models: per-architecture .aimodelc built via coreai-build, and the correct asset selected at runtime using AIModel.deviceArchitectureName.
Resources
- Apple — Integrating on-device AI models in your app with Core AI
- Apple — Compiling Core AI models ahead of time
- Prefer Apple docs for up-to-date API details; web-search the current Core AI documentation alongside this skill.