SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: LicenseRef-NvidiaProprietary
NVIDIA CORPORATION, its affiliates and licensors retain all intellectual
property and proprietary rights in and to this material, related
documentation and any modifications thereto. Any use, reproduction,
disclosure or distribution of this material and related documentation
without an express license agreement from NVIDIA CORPORATION or
its affiliates is strictly prohibited.
name: cuda-interop
description: >
  GPU interop patterns for CUDA arrays, timeline semaphores, and Vulkan shared memory.
  Use when user asks about CUDA interop, GPU rendering pipelines, Vulkan interop,
  shared memory, or timeline semaphores.
license: LicenseRef-NvidiaProprietary
version: "0.3.0"
author: NVIDIA ovrtx
tags:
  - ovrtx
  - cuda
  - interop
tools:
  - Read
  - Grep
CUDA Interop
When to Use
Use this skill when the user asks about CUDA interop, GPU rendering pipelines, Vulkan interop, shared memory, or timeline semaphores.
Inputs
Resolve inputs in this order: existing repository files and referenced snippets, explicit user request, then broader agent context.
- Target API surface: Python, C/C++, USD, or a combination.
- Interop path: render output to linear CUDA memory, render output to CUDA array, attribute mapping to CUDA, or Vulkan external memory.
- RenderProduct/RenderVar or attribute target, CUDA device expectations, stream/event handles, and ownership boundaries.
- Tensor/image shape, dtype, synchronization points, and whether the consumer expects linear memory or array/image memory.
- Repository source snippets referenced below. Treat these snippets as the API source of truth.
Prerequisites
- Use an ovrtx checkout that contains the referenced examples and docs tests.
- Read the relevant `> **Source:**` snippet before writing or explaining API usage.
- Confirm whether the requested path needs `Device.CUDA`, `OVRTX_MAP_DEVICE_TYPE_CUDA`, `OVRTX_MAP_DEVICE_TYPE_CUDA_ARRAY`, or Vulkan shared memory.
- Use `reading-render-output` for CPU readback or non-interop render output mapping.
Instructions
- Identify whether the workflow maps render output to CUDA memory, maps output to a CUDA array, writes attributes from CUDA memory, or coordinates CUDA with Vulkan.
- Read the source snippet for the exact map/unmap path before choosing `OVRTX_MAP_DEVICE_TYPE_CUDA`, `OVRTX_MAP_DEVICE_TYPE_CUDA_ARRAY`, or Python `Device.CUDA`.
- Preserve the synchronization contract: wait on ovrtx-provided CUDA events before reading, record CUDA work completion events before unmapping, and use `sync_stream` only for the auto-sync path.
- For Vulkan interop, keep external-memory ownership, timeline semaphore ordering, and double-buffer lifetimes aligned with the full `vulkan-interop` example.
- When changing code, run the CUDA render-output or Vulkan interop example/test that owns the snippet whenever practical.
Output Format
- For explanations, cite the relevant API names, source snippets, and caveats.
- For code changes, summarize the files changed, snippets affected, and validation run.
Scripts
This skill has no scripts.
Limitations
- The referenced snippets remain the source of truth; update or add tested snippets before documenting new API usage.
Overview
ovrtx renders on the GPU and can expose render output as linear CUDA device memory; the C API can also expose it as a CUDA array (zero-copy). Advanced pipelines (e.g., Vulkan display, custom CUDA post-processing) must handle CUDA synchronization correctly to avoid race conditions between ovrtx's internal GPU work and your own.
Python
Map render output to CUDA
> **Source:** `tests/docs/python/test_camera_sensors.py` snippet `doc-map-render-output-cuda`
Map attribute to CUDA for Warp kernel writes
> **Source:** `tests/docs/python/test_attribute_bindings.py` snippet `doc-map-attribute-cuda`
C
OVRTX_MAP_DEVICE_TYPE_CUDA_ARRAY is available through the C API (ovrtx_map_render_var_output). Python render-var mapping uses device=Device.CPU / device=Device.CUDA and does not expose a CUDA-array selector.
Map render output to linear CUDA memory
Use OVRTX_MAP_DEVICE_TYPE_CUDA when your CUDA consumer expects a linear device pointer. Image outputs may require an internal copy into linear memory.
> **Source:** `tests/docs/c/test_camera_sensors.cpp` snippet `doc-map-render-output-cuda-c`
Map render output as CUDA array (zero-copy)
Use OVRTX_MAP_DEVICE_TYPE_CUDA_ARRAY when your CUDA consumer can read a CUarray through texture/surface APIs and you want the zero-copy image path.
> **Source:** `tests/docs/c/test_camera_sensors.cpp` snippet `doc-map-render-output-cuda-array-c`
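The choice between the two C selectors and the single Python option can be summarized as a small decision helper. This is only a sketch: `Consumer` and `pick_map_device_type` are hypothetical names invented for illustration, not part of ovrtx. Only the returned selector names come from this skill.

```python
from enum import Enum


class Consumer(Enum):
    """How the CUDA consumer reads the render output (illustrative only)."""
    LINEAR_POINTER = "linear device pointer"
    TEXTURE_OR_SURFACE = "texture/surface access to a CUarray"


def pick_map_device_type(api: str, consumer: Consumer) -> str:
    """Return the selector name this skill associates with the given path.

    Hypothetical helper: it encodes the guidance above and calls no ovrtx API.
    """
    if api == "python":
        # Python render-var mapping does not expose a CUDA-array selector.
        if consumer is Consumer.TEXTURE_OR_SURFACE:
            raise ValueError("CUDA-array mapping is only available through the C API")
        return "Device.CUDA"
    if api == "c":
        if consumer is Consumer.TEXTURE_OR_SURFACE:
            # Zero-copy image path.
            return "OVRTX_MAP_DEVICE_TYPE_CUDA_ARRAY"
        # Linear memory; image outputs may require an internal copy.
        return "OVRTX_MAP_DEVICE_TYPE_CUDA"
    raise ValueError(f"unknown API surface: {api!r}")
```

For example, `pick_map_device_type("c", Consumer.TEXTURE_OR_SURFACE)` selects the CUDA-array path, while any Python request resolves to `Device.CUDA` or is rejected.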
Wait for render completion before accessing
The output may not be fully written when map returns. Check the wait event:
> **Source:** `examples/c/vulkan-interop/src/main.cpp` snippet `map-rendered-output-cuda-array`
Check `rendered_output.cuda_sync.wait_event` and call `cuStreamWaitEvent` before accessing.
Signal when CUDA work is done, then unmap
> **Source:** `examples/c/vulkan-interop/src/main.cpp` snippet `write-camera-transform`
Record a CUDA event after your kernel, then pass it via `ovrtx_cuda_sync_t.wait_event` on unmap.
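The ordering described in the two sections above (wait before access, signal completion before unmap) can be modeled as a small state checker. This is a pure-Python sketch, not ovrtx code: `MappingSession` and its methods are hypothetical names, and the `auto_sync` flag is an assumption that stands in for passing `sync_stream` on the map call.

```python
class MappingSession:
    """Tracks the step order for one mapped output (illustrative model only)."""

    def __init__(self, auto_sync: bool = False):
        # auto_sync models the sync_stream path, where the wait is inserted for you.
        self.waited = auto_sync
        self.pending_work = False
        self.mapped = True

    def wait_for_render(self) -> None:
        # Models cuStreamWaitEvent on the wait event returned by map.
        self.waited = True

    def access(self) -> None:
        if not self.mapped:
            raise RuntimeError("output accessed after unmap")
        if not self.waited:
            raise RuntimeError("wait on the render event before accessing the output")
        # Any access enqueues CUDA work that must be signaled before unmap.
        self.pending_work = True

    def record_completion(self) -> None:
        # Models recording a CUDA event after the consumer's kernel.
        self.pending_work = False

    def unmap(self) -> None:
        if self.pending_work:
            raise RuntimeError("signal CUDA completion before unmapping")
        self.mapped = False
```

A session that calls `access()` before `wait_for_render()`, or `unmap()` before `record_completion()`, raises `RuntimeError`, which mirrors the two race conditions the contract is meant to prevent.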
Double-buffered async pattern (from vulkan-interop example)
For maximum throughput, use two shared images and ping-pong between them. CUDA writes to one while a consumer (e.g., Vulkan) reads the other.
See examples/c/vulkan-interop/src/main.cpp for the full double-buffered async
rendering pattern with timeline semaphores and CUDA-Vulkan shared images.
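The ping-pong scheduling can be sketched independently of CUDA and Vulkan. The model below is an assumption-laden illustration, not the example's code: `Timeline` is a plain counter standing in for a timeline semaphore, and `run_frames` is a hypothetical single-threaded simulation in which frame `n` signals timeline value `n + 1`.

```python
from dataclasses import dataclass

NUM_BUFFERS = 2  # two shared images, as in the ping-pong description


@dataclass
class Timeline:
    """Stand-in for a timeline semaphore: a strictly increasing counter."""
    value: int = 0

    def signal(self, value: int) -> None:
        if value <= self.value:
            raise ValueError("timeline values must strictly increase")
        self.value = value

    def reached(self, value: int) -> bool:
        return self.value >= value


def run_frames(num_frames: int) -> list:
    produced = Timeline()  # signaled by the producer (CUDA side)
    consumed = Timeline()  # signaled by the consumer (e.g., Vulkan side)
    log = []
    for frame in range(num_frames):
        slot = frame % NUM_BUFFERS
        # A slot may be rewritten only after the consumer has finished the
        # frame that last used it (frame - NUM_BUFFERS).
        needed = frame - NUM_BUFFERS + 1
        if needed > 0 and not consumed.reached(needed):
            raise RuntimeError(f"slot {slot} is still being read")
        log.append(("write", frame, slot))
        produced.signal(frame + 1)
        # The consumer reads the previous frame from the other slot.
        if frame >= 1:
            prev = frame - 1
            if not produced.reached(prev + 1):
                raise RuntimeError(f"frame {prev} has not been produced yet")
            log.append(("read", prev, prev % NUM_BUFFERS))
            consumed.signal(prev + 1)
    return log
```

In the returned log, each write and the read that follows it target different slots, which is the property the double-buffered pattern relies on. The real example runs producer and consumer concurrently; this sketch only checks the ordering rules.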
Map with sync_stream (auto-sync)
If you provide sync_stream on the map call, ovrtx inserts the wait automatically:
> **Source:** `examples/c/vulkan-interop/src/main.cpp` snippet `map-rendered-output-cuda-array`
Set `map_desc.sync_stream = (uintptr_t)cuda_stream` for automatic synchronization.
Key Types / Functions
| Concept | Python | C |
|---|---|---|
| Map to CUDA | `render_var.map(device=Device.CUDA)` | `OVRTX_MAP_DEVICE_TYPE_CUDA` |
| Map to CUDA array | N/A (Python uses CUDA) | `OVRTX_MAP_DEVICE_TYPE_CUDA_ARRAY` |
| Wait for render | `sync_stream=` on `map()` | `cuStreamWaitEvent(stream, wait_event, 0)` |
| Signal done | `var.unmap(stream=...)` | `cuda_sync.wait_event = (uintptr_t)event` |
| Stream sync on map | `sync_stream` parameter | `map_desc.sync_stream` |
C CUDA synchronization uses ovrtx_cuda_sync_t with stream and wait_event
fields. A stream value of 0 means no stream, 1 means the default stream,
and values greater than 1 identify a specific CUDA stream. A wait_event
value of 0 means no event wait.
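The value conventions above can be made explicit with a small decoder. This is a sketch under stated assumptions: `CudaSync` is a Python stand-in that mirrors only the two field names described for `ovrtx_cuda_sync_t`, and `describe_sync` is a hypothetical helper, not an ovrtx function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CudaSync:
    """Stand-in with the two fields described for ovrtx_cuda_sync_t."""
    stream: int = 0
    wait_event: int = 0


def describe_sync(sync: CudaSync) -> str:
    """Translate the integer conventions into a readable description."""
    if sync.stream < 0 or sync.wait_event < 0:
        raise ValueError("handles are unsigned values")
    if sync.stream == 0:
        stream = "no stream"
    elif sync.stream == 1:
        stream = "default stream"
    else:
        # Values greater than 1 identify a specific CUDA stream.
        stream = f"stream handle {sync.stream:#x}"
    if sync.wait_event == 0:
        event = "no event wait"
    else:
        event = f"wait on event {sync.wait_event:#x}"
    return f"{stream}; {event}"
```

A default-constructed `CudaSync()` therefore describes a mapping with no stream and no event wait.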
Troubleshooting
- Consumer owns lifetime (Python). The C render resource stays alive as long as any DLPack consumer (Warp, PyTorch, CuPy, etc.) holds a reference to the tensor. Drop the views when you're done to release the resource. If you need data to outlive the mapping, take a `.copy()`.
- Always wait on `rendered_output.cuda_sync.wait_event` before accessing CUDA array data.
- Always signal completion (via event or stream) when unmapping after CUDA work, or ovrtx may reclaim the buffer while your kernel is still running.
- In C, `OVRTX_MAP_DEVICE_TYPE_CUDA_ARRAY` returns a `CUarray` (opaque pointer in `dl.data`), not linear memory. Use `surf2Dwrite`/`tex2D` to access it.
- In C, `OVRTX_MAP_DEVICE_TYPE_CUDA` returns linear device memory (may incur a copy for image outputs).
- In Python, `event` and `stream` on `unmap()` are mutually exclusive: pass one or the other.
- `write_attribute()`, `write_array_attribute()`, and `binding.write()` also accept `cuda_stream=` and `cuda_event=` for GPU-synchronized writes. When you pass a CUDA Warp/PyTorch/etc. tensor together with `cuda_stream=`, ovrtx forwards the stream to the producer's DLPack sync. The producer bridges its internal stream automatically, so no manual `wp.synchronize_stream` is needed before the call.
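The "consumer owns lifetime" rule in the first bullet can be illustrated with ordinary Python reference counting. This is a model only: `RenderResource` and `TensorView` are hypothetical stand-ins rather than ovrtx or DLPack types, and the sketch assumes that each view holds a strong reference to the underlying resource.

```python
import gc
import weakref


class RenderResource:
    """Stand-in for the render resource behind a mapping (hypothetical)."""


class TensorView:
    """Stand-in for a consumer tensor that keeps the resource alive."""

    def __init__(self, resource: RenderResource):
        self._resource = resource


def lifetime_demo() -> tuple:
    resource = RenderResource()
    probe = weakref.ref(resource)  # observes the resource without owning it
    view_a = TensorView(resource)
    view_b = TensorView(resource)

    del resource  # the mapping handle goes away, the views remain
    alive_with_two_views = probe() is not None

    del view_a
    alive_with_one_view = probe() is not None

    del view_b  # last consumer reference dropped
    gc.collect()  # for interpreters without immediate reference counting
    released = probe() is None
    return alive_with_two_views, alive_with_one_view, released
```

The resource in this model is released only after the last view is dropped, which is why lingering tensor references keep a mapping alive.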
References
- Use the `> **Source:**` directives in this skill to locate tested snippets before reusing API patterns.
- Keep related skills, docs, and snippets synchronized when changing the workflow.