name: numba-0-65-1
description: JIT compiler for Python that translates numerical code into optimized machine code using LLVM. Supports @jit, @njit, @vectorize, @guvectorize, and @jitclass decorators with CPU parallelization, SIMD vectorization, and CUDA GPU targets. Use when accelerating Python numerical loops, creating NumPy-compatible ufuncs, parallelizing array operations, or targeting GPU execution from pure Python.
Numba 0.65.1
Overview
Numba is an open-source just-in-time (JIT) compiler that translates a subset of Python and NumPy code into fast machine code. It uses the LLVM compiler infrastructure to generate optimized native code at runtime, achieving performance comparable to C, C++, or Fortran without switching languages or interpreters.
Key capabilities:
- JIT compilation — On-the-fly code generation at import time or runtime via decorators
- Native CPU code — Code tailored to specific CPU capabilities (SSE, AVX, AVX-512)
- GPU acceleration — NVIDIA CUDA support for parallel GPU algorithms from pure Python
- NumPy integration — Deep support for NumPy arrays, ufuncs, and broadcasting
- Parallel execution — Automatic multi-core parallelization with parallel=True and explicit prange loops
- SIMD vectorization — Automatic translation of loops into vector instructions
Numba works best on code that uses NumPy arrays, mathematical operations, and loops. It is not suited for code heavy in pandas, string manipulation, or general-purpose Python features.
What Changed in 0.65.1
Patch release over 0.65.0:
- sys.monitoring disabled on Python 3.14.4+ — CPython 3.14.4 changed interpreter internals that Numba relied on for JIT sys.monitoring integration (#10538). The feature is disabled from 3.14.4 onward; NUMBA_ENABLE_SYS_MONITORING has no effect and emits a UserWarning if set to a non-zero value.
- CI/workflow fixes — Python version canonicalization in build matrices for free-threaded (3.14.3t) support.
When to Use
- Accelerating performance-critical numerical loops in Python
- Creating high-performance NumPy-compatible universal functions (ufuncs)
- Parallelizing array computations across multiple CPU cores
- Writing stencil operations for image processing, PDE solving, and spatial computations
- Compiling Python classes with typed fields for use in JIT-compiled code
- Generating C callbacks for interfacing with native libraries
- Targeting NVIDIA CUDA GPUs from pure Python
- Building ahead-of-time compiled extension modules
Core Concepts
Compilation Modes
Numba operates in two compilation modes:
- Nopython mode — The default and recommended mode. Compiles the function to run entirely without the Python interpreter, producing the fastest code. Use @njit (alias for @jit(nopython=True)) or plain @jit.
- Object mode — Runs the function through the Python interpreter, treating all values as Python objects. Plain @jit no longer falls back to it automatically when nopython compilation fails, so request it explicitly with @jit(forceobj=True). Use only when necessary; it provides minimal speedup and adds overhead.
Type System
Numba uses a fine-grained type system rather than Python's dynamic types. Types are inferred from argument types at call time. Key numeric types include int8–int64, uint8–uint64, float32, float64, complex64, complex128, and boolean. Array types use subscript notation: float64[:] for 1D, float64[:, :] for 2D.
Lazy vs Eager Compilation
- Lazy compilation (default) — Compilation is deferred until the first function call with specific argument types. Separate specializations are generated for different input types.
- Eager compilation — Specify an explicit signature at decoration time to compile immediately: @jit(float64(float64, float64)).
Performance Measurement
Always account for compilation time when benchmarking. The first call includes JIT compilation overhead. Use timeit or call the function once before timing to measure post-compilation performance.
Installation / Setup
Numba is available via conda or pip:
- Conda: conda install numba
- Pip: pip install numba
Optional dependencies for additional functionality:
- scipy — enables numpy.linalg function compilation
- colorama — color highlighting in error messages
- pyyaml — YAML configuration file support (.numba_config.yaml)
- intel-cmplr-lib-rt — Intel SVML high-performance math library (x86_64)
- tbb — Intel TBB threading layer for parallel execution
Supported platforms: x86, x86_64, POWER8/9, ARM (including Apple M1), NVIDIA GPUs. Operating systems: Windows, macOS, Linux. Python versions: 3.10–3.14.
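A quick way to see which of the pure-Python optional dependencies are importable is sketched below. The helper optional_status is hypothetical, and the import name yaml for pyyaml is the only name that differs from its package name. intel-cmplr-lib-rt and tbb ship native libraries rather than importable modules, so they are left out here; running numba -s on the command line reports SVML and threading layer availability.

```python
import importlib.util

# Import name -> feature it enables (from the list of optional dependencies)
OPTIONAL = {
    "scipy": "numpy.linalg function compilation",
    "colorama": "color highlighting in error messages",
    "yaml": "YAML configuration file support (pyyaml)",
}


def optional_status():
    """Return {import_name: True/False} for each optional dependency."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in OPTIONAL
    }


for name, available in optional_status().items():
    marker = "found" if available else "missing"
    print(f"{name:10s} {marker:8s} {OPTIONAL[name]}")
```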
Usage Examples
Basic JIT Compilation
from numba import njit
import numpy as np

@njit
def sum_array(arr):
    total = 0.0
    for i in range(arr.shape[0]):
        for j in range(arr.shape[1]):
            total += arr[i, j]
    return total

x = np.arange(100).reshape(10, 10)
print(sum_array(x))
Parallel Loops with prange
from numba import njit, prange
import numpy as np

@njit(parallel=True)
def parallel_sum(A):
    s = 0.0
    for i in prange(A.shape[0]):
        s += A[i]
    return s
Creating a ufunc with @vectorize
from numba import vectorize, float64
import numpy as np

@vectorize([float64(float64, float64)])
def add(x, y):
    return x + y

# Works like a NumPy ufunc with reduce, accumulate, broadcasting
some_array = np.arange(12, dtype=np.float64).reshape(3, 4)
result = add.reduce(some_array, axis=0)  # column sums, shape (4,)
Stencil Operations
from numba import stencil
import numpy as np

@stencil
def average_neighbors(a):
    return 0.25 * (a[0, 1] + a[1, 0] + a[0, -1] + a[-1, 0])

input_arr = np.arange(100).reshape((10, 10))
output = average_neighbors(input_arr)
JIT-Compiled Class
import numpy as np
from numba import int32, float32
from numba.experimental import jitclass

spec = [
    ('value', int32),
    ('array', float32[:]),
]

@jitclass(spec)
class Bag:
    def __init__(self, value):
        self.value = value
        self.array = np.zeros(value, dtype=np.float32)

    @property
    def size(self):
        return self.array.size

    def increment(self, val):
        for i in range(self.size):
            self.array[i] += val
        return self.array

mybag = Bag(10)
C Callback with @cfunc
from numba import cfunc

@cfunc("float64(float64, float64)")
def add(x, y):
    return x + y

# Access the ctypes callback
print(add.ctypes(4.0, 5.0))  # prints 9.0
Advanced Topics
JIT Compilation Details: @jit decorator options, signatures, lazy vs eager compilation, caching → JIT Compilation
Vectorize and Guvectorize: Creating NumPy ufuncs and generalized ufuncs with multiple targets (cpu, parallel, cuda) → Universal Functions
jitclass: Compiling Python classes with typed fields, type inference from annotations, dunder methods → JIT Classes
Parallel Execution: parallel=True, prange, supported operations, reduction patterns, threading layers (tbb, omp, workqueue) → Parallel Execution
Stencil Operations: Kernel definition, neighborhood specification, border handling, standard indexing → Stencil Operations
C Interoperability: @cfunc callbacks, carray, C structures via CFFI and Record.make_c_struct → C Interoperability
Ahead-of-Time Compilation: numba.pycc module, extension module generation, signature syntax → AOT Compilation
Types and Signatures: Numba type system, numeric types, array layouts, function types, first-class functions → Types and Signatures
Performance Tuning: nopython mode, fastmath, SIMD vectorization, Intel SVML, parallel optimization tips → Performance Tips
Supported Features: Python language support, built-in types, standard library modules, NumPy features, deviations from Python semantics → Supported Features
Environment Variables and Debugging: Configuration via .numba_config.yaml, JIT flags, debugging tools, GDB integration → Configuration and Debugging