name: openmp-6.0 description: "Authoritative knowledge base from the OpenMP API v6.0 specification (Nov 2024) + the Nov-2025 errata (corrections applied inline). CONSULT THIS BEFORE ANSWERING — do not answer OpenMP questions from memory; directive/clause semantics, data-sharing vs data-mapping rules, the flush memory model, schedule/tasking/offload behavior, and the runtime API are subtle and version-sensitive. TRIGGER whenever a question concerns: writing/reading/debugging any OpenMP directive (#pragma omp / !$omp); multithreading or GPU/device offload in C/C++/Fortran via OpenMP; data-sharing clauses (shared/private/firstprivate/lastprivate/reduction) or data-mapping (map/target); parallel/teams/simd/masked, worksharing (for/sections/single/distribute/schedule), tasking (task/taskloop/taskgraph/depend), synchronization (barrier/critical/atomic/ordered/flush), the device model (target/declare target), memory allocators/spaces, variant directives (metadirective/declare variant), the omp_* runtime API, OMP_* environment variables/ICVs, OMPT/OMPD tools, or a v6.0 / errata detail. SKIP only when the user explicitly wants OpenACC, CUDA, or a vendor-compiler-specific (gcc/llvm/nvhpc) behavior rather than the OpenMP standard." allowed-tools: - Read - Grep argument-hint: [topic, directive/clause, omp_ routine, OMP_ env var, or chapter (e.g. ch07)]
OpenMP Application Programming Interface — Version 6.0
Source: OpenMP API v6.0 (Nov 2024) + Errata Nov 2025 (11 corrections applied inline) | Pages: ~964 | Chapters: 19 (grouped from 37 spec chapters + appendices) | _OPENMP: 202411 | Generated: 2026-06-09
How to Use This Skill
- Without arguments — load the execution/memory model + data-environment core below.
- With a topic — ask about
data clauses,map,tasking,target,schedule,atomic,metadirective; I read the relevant chapter. - With a directive/clause — name it (
firstprivate,taskloop,distribute); I find the chapter. - With an
omp_routine /OMP_var — → ch17 / ch04. - With a chapter —
ch07(data),ch12(parallelism),ch14(tasking),ch15(device).
This is the specification (+ errata), not a tutorial — answers cite sections. Errata corrections are marked ⚠ in ch07/08/10/14/17. Pairs with openacc-3.4, fortran-2023-standard, and CLAUDE-gpu.md.
Core Frameworks & Mental Models
Execution model (Ch 1) — three parallelism axes
- threads (
parallel→ team, fork-join, end barrier), tasks (task→ deferrable work units withdependDAG), devices (target→ offload), plus SIMD (simd). Modern code composes all four. - Entity hierarchy: device ⊃ contention group (thread pool) ⊃ team;
teamsmakes a league. No portable synchronization across teams/contention groups (deadlock risk). - User-directed: the implementation checks nothing — you own race/deadlock/dependence correctness.
Two attribute systems (Ch 7) — never conflate
- data-sharing (thread level):
shared/private/firstprivate/lastprivate/reduction/linear. Usedefault(none)to force explicit classification (likeimplicit none). - data-mapping (device level):
map(to/from/tofrom/alloc)moves data to/from a device data environment. A variable can be both privatized and mapped.
Memory model (Ch 16) — visibility is not automatic
- Cross-thread visibility needs synchronizing flushes: a release flush before an acquire flush of the same data establishes happens-before.
barrier,critical,atomic(with order), and task scheduling imply flushes. Without them, threads see stale values.
Worksharing & scheduling (Ch 13)
for/do(thread worksharing),sections,single,distribute(across teams),loop(descriptive/portable).schedule:static(uniform, cache-friendly),dynamic/guided(irregular),runtime(OMP_SCHEDULE).nowaitdrops the end barrier (only when independent).
Device offload (Ch 15) — the GPU model
target teams distribute parallel foris the canonical offload idiom. Hoist data:target enter data map(to:)once + resident kernels +target exit data map(from:)— avoid per-kernel transfers (the dominant perf lever).map(to/from)not blankettofrom.OMP_TARGET_OFFLOAD=MANDATORYto catch silent host fallback.
Tasking (Ch 14)
task depend(in/out/inout: ...)builds a data-flow DAG the runtime schedules — replace manual barriers.taskgraph(6.0) records/replays a stable task graph. Generate tasks from one thread (single/masked); cut recursion withfinal.
Correctness traps (cross-referenced to your memory)
cpu_time/clock()for parallel timing is wrong — useomp_get_wtime+ synchronization (a missing join/barrier fakes huge speedups; cf.feedback_gpu_benchmark_timing).- Consumer NVIDIA GPUs: 1:64 FP64:FP32 → FP32-store/FP64-compute is slower than full FP64 (
reference_consumer_gpu_fp64_trap). - Manual accumulation into a
sharedvar is a race — usereduction.atomiconly for irregular scatter.
v6.0 highlights
Loop-transform constructs + apply (ch11) · taskgraph/replayable (ch14) · free-agent threads · safesync · masked replaces master · expanded allocators/memspaces (ch8) · richer metadirective/declare variant (ch9) · atomic compare CAS (ch16).
Chapter Index
| # | Covers | Key topics |
|---|---|---|
| ch01 | Ch 1 | execution model, contention groups, OMPT/OMPD, compliance |
| ch02 | Ch 2 | normative terms; data-sharing vs mapping; memory model terms |
| ch03 | Ch 3 | ICVs, scopes, precedence |
| ch04 | Ch 4 | OMP_* env vars, affinity, offload policy |
| ch05 | Ch 5 | directive/clause syntax, array sections, iterators, _OPENMP |
| ch06 | Ch 6 | structured block, canonical loop nest, C/C++/Fortran binding |
| ch07 ⚠ | Ch 7 | data-sharing + data-mapping clauses, reductions, defaultmap |
| ch08 ⚠ | Ch 8 | memory spaces, allocators, traits (HBM/NUMA) |
| ch09 | Ch 9 | metadirective, declare variant, dispatch, contexts |
| ch10 ⚠ | Ch 10 | assume, error, requires (USM) |
| ch11 | Ch 11 | tile/unroll/interchange/fuse + apply |
| ch12 | Ch 12 | parallel, teams, simd, masked, num_threads, proc_bind |
| ch13 | Ch 13 | for/sections/single/distribute/scan, schedule, loop |
| ch14 ⚠ | Ch 14 | task, taskloop, taskgraph, depend |
| ch15 | Ch 15-16 | target, map, declare target, interop |
| ch16 | Ch 17-19 | barrier/critical/atomic/ordered/flush, cancel, composition |
| ch17 ⚠ | Ch 20-30 | omp_* runtime API, device memory, locks, timing |
| ch18 | Ch 31-37 | OMPT (profiling), OMPD (debugging) |
| ch19 | App A-D | impl-defined, history, nesting, compound directives |
(⚠ = contains Nov-2025 errata corrections.)
Topic Index
- affinity / proc_bind / places → ch04, ch12
- allocators / memory spaces (HBM/NUMA) → ch08, ch17
- assume / requires / unified_shared_memory → ch10
- atomic / critical / flush / memory model → ch16, ch02
- barrier / nowait → ch13, ch16
- canonical loop / structured block → ch06
- cancellation → ch16, ch01
- data-mapping / map / target data → ch07, ch15
- data-sharing (shared/private/firstprivate/reduction) → ch07
- declare target / declare variant → ch15, ch09
- depend / task DAG → ch14, ch16
- device offload / target / teams → ch15, ch12
- environment variables (OMP_*) → ch04
- errata (Nov 2025) → cheatsheet, ch07/08/10/14/17
- ICVs → ch03
- interop (CUDA stream) → ch15, ch17
- loop construct / loop transforms (tile/unroll) → ch13, ch11
- metadirective / context → ch09
- OMPT / OMPD / tools → ch18
- parallel / teams / masked / simd → ch12
- reduction (incl. scan, task) → ch07, ch13, ch14
- runtime API (omp_*) → ch17
- schedule (static/dynamic/guided) → ch13
- synchronization → ch16
- target offload pattern → ch15
- taskgraph / replayable (6.0) → ch14
- tasking (task/taskloop) → ch14
- timing (omp_get_wtime) → ch17
- worksharing (for/sections/single/distribute) → ch13
Supporting Files
- glossary.md — normative terms + directive/clause vocabulary
- patterns.md — OpenMP idioms (offload, tasking DAG, NUMA, timing)
- cheatsheet.md — errata table + clause decision rules + tells & smells
Scope & Limits
Covers OpenMP API v6.0 (Nov 2024) with the Nov-2025 errata folded in (the 11 corrections are marked ⚠ in their chapters and tabulated in cheatsheet.md). Extracted with pdftotext (docling garbles this spec class — see the fortran-2023-standard note). This is the standard — implementation-defined behavior (default schedule/thread count, device mapping, lock fairness) lives in ch19/Appendix A and your compiler's docs (GCC libgomp, LLVM, NVHPC). Stubs, interface declarations, examples, and grammar are separate OpenMP documents, not in this spec. For OpenACC use openacc-3.4; for Fortran base-language rules fortran-2023-standard; for GPU/HPC practice CLAUDE-gpu.md.