on-device-ai

star 224

Build on-device AI features in React Native and Expo apps with React Native ExecuTorch. Use when adding AI to a mobile app without cloud dependencies — chatbots, image classification, object detection, OCR, semantic or instance segmentation, style transfer, image generation, pose estimation, speech-to-text, text-to-speech, voice activity detection, semantic search with embeddings, tokenization, privacy / PII redaction, or vision-language image understanding. Also use when mentioning offline / on-device / privacy AI, reducing cloud cost or latency, or managing ML models. Covers initExecutorch and every hook (useLLM, useClassification, useObjectDetection, useOCR, useSemanticSegmentation, useInstanceSegmentation, useStyleTransfer, useTextToImage, useImageEmbeddings, usePoseEstimation, useSpeechToText, useTextToSpeech, useVAD, useTextEmbeddings, useTokenizer, usePrivacyFilter, useExecutorchModule), tool calling, structured output, VLMs, Expo and bare resource-fetcher adapters, and error handling.

software-mansion-labs By software-mansion-labs schedule Updated 6/9/2026

name: on-device-ai description: Build on-device AI features in React Native and Expo apps with React Native ExecuTorch. Use when adding AI to a mobile app without cloud dependencies — chatbots, image classification, object detection, OCR, semantic or instance segmentation, style transfer, image generation, pose estimation, speech-to-text, text-to-speech, voice activity detection, semantic search with embeddings, tokenization, privacy / PII redaction, or vision-language image understanding. Also use when mentioning offline / on-device / privacy AI, reducing cloud cost or latency, or managing ML models. Covers initExecutorch and every hook (useLLM, useClassification, useObjectDetection, useOCR, useSemanticSegmentation, useInstanceSegmentation, useStyleTransfer, useTextToImage, useImageEmbeddings, usePoseEstimation, useSpeechToText, useTextToSpeech, useVAD, useTextEmbeddings, useTokenizer, usePrivacyFilter, useExecutorchModule), tool calling, structured output, VLMs, Expo and bare resource-fetcher adapters, and error handling.

React Native ExecuTorch

Software Mansion's production patterns for on-device AI in React Native and Expo using React Native ExecuTorch.

Targets the current published API (v0.10.x). Load at most one reference file per question. For hook signatures, model constants, or config options not covered here, webfetch the matching page from docs.swmansion.com/react-native-executorch.

Decision Tree

What does the feature need?
│
├── Generate / chat with text?
│   └── useLLM                                          → see llm.md
│       ├── Plain chat → standard useLLM
│       ├── Image + text input → useLLM with a VLM model (LFM2_VL_*)
│       ├── Tool / function calling → configure with toolsConfig
│       └── Structured JSON output → getStructuredOutputPrompt
│
├── Understand or transform images?
│   ├── What is in this image? → useClassification      → see vision.md
│   ├── Where are the objects? → useObjectDetection     → see vision.md
│   ├── Per-pixel class → useSemanticSegmentation       → see vision.md
│   ├── Per-instance segmentation → useInstanceSegmentation → see vision.md
│   ├── Human pose keypoints → usePoseEstimation        → see vision.md
│   ├── Read text from image → useOCR / useVerticalOCR  → see vision.md
│   ├── Apply artistic style → useStyleTransfer         → see vision.md
│   ├── Generate image from prompt → useTextToImage     → see vision.md
│   └── Embed image as vector → useImageEmbeddings      → see vision.md
│
├── Speech / audio?
│   ├── Transcribe speech → useSpeechToText             → see speech.md
│   ├── Synthesize speech → useTextToSpeech             → see speech.md
│   └── Detect speech segments → useVAD                 → see speech.md
│
├── Text utilities?
│   ├── Embed text as vector → useTextEmbeddings        → see vision.md
│   ├── Count or inspect tokens → useTokenizer          → see setup.md
│   └── Redact PII from text → usePrivacyFilter         → see setup.md
│
├── Full RAG pipeline (retrieval + generation + vector store)?
│   └── react-native-rag (sibling library)              → see setup.md
│
└── Custom `.pte` model not covered by a dedicated hook?
    └── useExecutorchModule                             → see setup.md

Critical Rules

  • Call initExecutorch() at app entry, before any other API. The library does not bundle a network/file layer — you must register a resource-fetcher adapter (ExpoResourceFetcher for Expo, BareResourceFetcher for bare RN). Any hook called before initialization throws ResourceFetcherAdapterNotInitialized.

  • Check isReady before calling forward / generate / transcribe. All hooks load asynchronously. Inference before the model is ready throws ModuleNotLoaded.

  • Interrupt LLM generation before unmounting. Unmounting while isGenerating is true crashes. Call llm.interrupt() and wait for isGenerating === false before navigating away.

  • Use quantized model variants on mobile. Full-precision variants exceed device memory on most phones. Every supported model ships a _QUANTIZED variant — prefer it unless you've measured otherwise.

  • Audio for speech-to-text and VAD must be 16 kHz mono. Mismatched sample rates produce silently garbled transcriptions. Decode with new AudioContext({ sampleRate: 16000 }).

  • Audio from text-to-speech is 24 kHz. Create the playback context with new AudioContext({ sampleRate: 24000 }).

  • The New Architecture (Fabric) is required. Old architecture is unsupported. Expo Go is unsupported — use a custom dev build (npx expo prebuild). iOS release builds need a real device (the simulator lacks the Metal APIs ExecuTorch relies on).

Minimal Setup

// App.tsx (Expo)
import { initExecutorch } from 'react-native-executorch';
import { ExpoResourceFetcher } from 'react-native-executorch-expo-resource-fetcher';

initExecutorch({ resourceFetcher: ExpoResourceFetcher });
// App.tsx (bare React Native)
import { initExecutorch } from 'react-native-executorch';
import { BareResourceFetcher } from 'react-native-executorch-bare-resource-fetcher';

initExecutorch({ resourceFetcher: BareResourceFetcher });

Full setup, Metro config for bundled .pte files, custom adapters, model-loading strategies, and error handling: see setup.md.

Hook Quick Reference

Hook Purpose Reference
useLLM Text generation, chat, tool calling, VLM llm.md
useClassification Image categorisation vision.md
useObjectDetection Bounding-box detection (YOLO26, RF-DETR, SSDLite) vision.md
useSemanticSegmentation Per-pixel class segmentation vision.md
useInstanceSegmentation Per-instance segmentation vision.md
usePoseEstimation COCO 17-keypoint human pose vision.md
useStyleTransfer Artistic image filters vision.md
useTextToImage Stable Diffusion image generation vision.md
useImageEmbeddings CLIP image embeddings vision.md
useOCR Horizontal text OCR vision.md
useVerticalOCR Vertical text OCR (experimental, CJK) vision.md
useTextEmbeddings Sentence embeddings for similarity / RAG vision.md
useSpeechToText Whisper transcription (batch + streaming) speech.md
useTextToSpeech Kokoro TTS (batch + streaming, phoneme input) speech.md
useVAD FSMN voice activity detection speech.md
useTokenizer HuggingFace-compatible tokenization setup.md
usePrivacyFilter On-device PII / privacy redaction setup.md
useExecutorchModule Custom .pte model inference setup.md

Every hook also has a non-React Module counterpart (e.g. LLMModule.fromModelName(...), ClassificationModule.fromModelName(...)) for use outside React components.

Common Pitfalls

Symptom Likely cause Fix
ResourceFetcherAdapterNotInitialized initExecutorch not called Call it at app entry with an adapter
ModuleNotLoaded Inference before model finished loading Gate calls on isReady
MemoryAllocationFailed on launch Model too large for device Switch to _QUANTIZED variant or smaller parameter count
App crashes on screen navigation Unmount during active generation llm.interrupt() and await isGenerating === false
Whisper produces garbled text Wrong sample rate Decode audio at 16 kHz mono
TTS output sounds chipmunked Playback context at wrong rate Create AudioContext({ sampleRate: 24000 })
Build fails on iOS simulator (release) Simulator lacks Metal APIs Build release on real device

Full error code list and recovery patterns: setup.md.

References

File When to read
llm.md useLLM functional + managed modes, tool calling, structured output (JSON Schema / Zod), interrupting, vision-language models, generation config
vision.md Image classification, object detection, semantic + instance segmentation, pose estimation, OCR (horizontal + vertical), style transfer, text-to-image, image + text embeddings
speech.md Speech-to-text (Whisper batch + streaming with timestamps), text-to-speech (Kokoro batch + streaming, phoneme input, voice catalogue), voice activity detection, audio sample-rate requirements
setup.md initExecutorch, Expo / bare resource-fetcher adapters, model loading strategies, Metro config, error codes and recovery, useExecutorchModule for custom .pte models, useTokenizer, usePrivacyFilter, full model catalogue

External Resources

Install via CLI
npx skills add https://github.com/software-mansion-labs/skills --skill on-device-ai
Repository Details
star Stars 224
call_split Forks 9
navigation Branch main
article Path SKILL.md
More from Creator
software-mansion-labs
software-mansion-labs Explore all skills →