gpu-ci

GPU CI patterns for CUDA compilation caching, manylinux wheels, and multi-arch builds. Use when distribution=pypi-wheel or hardware_targets includes cuda.

By AilvenLiu. Updated 6/5/2026

name: gpu-ci
description: "GPU CI patterns for CUDA compilation caching, manylinux wheels, and multi-arch builds. Use when distribution=pypi-wheel or hardware_targets includes cuda."
version: 1.0.0

/gpu-ci

Guidance-only skill. No .ai/bin/agent-gpu-ci wrapper exists. Applies when .ai/project.yml declares distribution: pypi-wheel or hardware_targets: [cuda].

Covers: sccache compilation caching, auditwheel manylinux validation, multi-CUDA wheel matrices, and GPU test gating patterns.


1. sccache for CUDA compilation caching

sccache wraps nvcc and g++. Set these before cmake:

export SCCACHE_BUCKET=my-build-cache   # S3 bucket
export SCCACHE_REGION=us-west-2
export CUDAHOSTCXX=/usr/bin/g++        # host compiler nvcc uses for host-side code
sccache --start-server

In CMakeLists.txt:

set(CMAKE_CUDA_COMPILER_LAUNCHER sccache)
set(CMAKE_CXX_COMPILER_LAUNCHER  sccache)

On GitHub Actions, use mozilla-actions/sccache-action@v0.0.3. To improve the cache-hit rate, pin compiler versions in the Docker image, avoid timestamp-dependent flags, and keep CUDA Toolkit versions consistent across matrix jobs.
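The advice about consistent toolkit versions can be checked mechanically before a matrix job starts. The following is a minimal sketch, not part of the skill itself: `is_exact_version` and `unpinned_versions` are hypothetical helper names, and the only rule assumed is that a pinned version has exactly three numeric components (major.minor.patch).

```python
import re

# Assumption: "pinned" means a full major.minor.patch string such as 11.8.0.
_EXACT_VERSION = re.compile(r"^\d+\.\d+\.\d+$")


def is_exact_version(version: str) -> bool:
    """Return True when the version names an exact patch release."""
    return bool(_EXACT_VERSION.match(version))


def unpinned_versions(versions):
    """Return the versions that are not pinned to a patch release."""
    return [v for v in versions if not is_exact_version(v)]
```

Calling `unpinned_versions(["11.8.0", "12.1", "12.4.0"])` returns `["12.1"]`, which a CI step could treat as a reason to fail early.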


2. auditwheel — manylinux wheel validation

CUDA driver and runtime libraries MUST be excluded from bundling, because the user's CUDA installation provides them:

auditwheel repair dist/*.whl \
  --exclude libcuda.so.1      \
  --exclude libcudart.so.11.0 \
  --exclude libcudart.so.12   \
  --exclude libcublas.so.11   \
  --exclude libcublas.so.12   \
  --exclude libcublasLt.so.11 \
  --exclude libcublasLt.so.12 \
  --exclude libnvrtc.so.11.2  \
  --exclude libnvrtc.so.12    \
  --exclude libcudnn.so.8     \
  --exclude libnccl.so.2      \
  --plat manylinux2014_x86_64 \
  -w dist/repaired/

Always exclude: libcuda, libcudart, libnvrtc, libcublas*, libcudnn, libnccl. May bundle: custom CUDA kernels compiled as .so.
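One way to confirm that the exclusions took effect is to inspect the repaired wheel, which is an ordinary zip archive. The sketch below is illustrative only: `bundled_cuda_libs` is a hypothetical helper, and the pattern assumes that bundled copies keep the original library stem followed by `-` or `.` (as in `libcudart-<hash>.so.11.0`).

```python
import re
import zipfile
from pathlib import PurePosixPath

# Library stems from the "Always exclude" list above. The trailing [-.]
# keeps custom libraries such as libcuda_kernels.so from matching.
_FORBIDDEN = re.compile(
    r"^(?:libcuda|libcudart|libnvrtc|libcublas\w*|libcudnn\w*|libnccl)[-.]"
)


def bundled_cuda_libs(wheel_path):
    """Return archive members of the wheel that look like bundled CUDA libraries."""
    with zipfile.ZipFile(wheel_path) as wheel:
        names = wheel.namelist()
    return sorted(n for n in names if _FORBIDDEN.match(PurePosixPath(n).name))
```

An empty result suggests that no CUDA library was vendored; a non-empty result lists the members to investigate.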


3. Multi-CUDA wheel build matrix

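Use PEP 440 local version identifiers to distinguish CUDA variants, for example mypackage-0.1.0+cu118-cp310-…whl. Note that PyPI itself rejects uploads whose version carries a local identifier, so these wheels need a separate package index or release page.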

Typical GitHub Actions matrix:

strategy:
  matrix:
    cuda: [cu118, cu121, cu124]
    python: ['3.9', '3.10', '3.11', '3.12']

CUDA version map: cu118 → 11.8.0, cu121 → 12.1.0, cu124 → 12.4.0. Use Jimver/cuda-toolkit@v0.2.11 to install the toolkit. Inject the +cuXYZ suffix before uploading so that wheels for different CUDA variants do not overwrite one another.
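The tag-to-toolkit map and the suffix injection can be sketched in a few lines of Python. This is an assumption-laden illustration rather than the skill's own tooling: `CUDA_TOOLKIT_VERSIONS` and `with_cuda_suffix` are hypothetical names, and the function only computes the version string. The build still has to use that string so the wheel filename and its metadata agree.

```python
# Mapping taken from the version map above.
CUDA_TOOLKIT_VERSIONS = {
    "cu118": "11.8.0",
    "cu121": "12.1.0",
    "cu124": "12.4.0",
}


def with_cuda_suffix(version: str, cuda_tag: str) -> str:
    """Append a CUDA tag to a version as a PEP 440 local version segment."""
    if cuda_tag not in CUDA_TOOLKIT_VERSIONS:
        raise ValueError(f"unknown CUDA tag: {cuda_tag!r}")
    # PEP 440 allows one '+'; further local segments are joined with '.'.
    separator = "." if "+" in version else "+"
    return f"{version}{separator}{cuda_tag}"
```

For instance, `with_cuda_suffix("0.1.0", "cu118")` gives `"0.1.0+cu118"`, and `CUDA_TOOLKIT_VERSIONS["cu118"]` gives the toolkit version to install for that matrix entry.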


4. GPU test gating patterns

pytest markers (conftest.py)

import pytest


def pytest_runtest_setup(item):
    # _get_gpu_type() is a project helper: it returns the GPU name from
    # `nvidia-smi --query-gpu=name`, or None when no GPU is present.
    gpu = _get_gpu_type()
    if item.get_closest_marker('gpu') and gpu is None:
        pytest.skip("GPU not available")
    if item.get_closest_marker('h100') and 'H100' not in (gpu or ''):
        pytest.skip("H100 not available")
    if item.get_closest_marker('a100') and 'A100' not in (gpu or ''):
        pytest.skip("A100 not available")

Usage:

@pytest.mark.gpu
def test_basic_cuda(): ...

@pytest.mark.h100
def test_h100_fp8(): ...

Self-hosted runner labels

jobs:
  test-h100:
    runs-on: [self-hosted, gpu, h100]
  test-a100:
    runs-on: [self-hosted, gpu, a100]

5. Common pitfalls

| Problem | Solution |
| --- | --- |
| auditwheel bundles libcudart | Always pass --exclude libcudart.so.* |
| Inconsistent CUDA patch versions break sccache | Pin exact versions: 11.8.0 not 11.8 |
| Multiple CUDA variants overwrite each other | Inject +cu118 suffix before upload |
| GPU tests fail on CPU-only runners | Use pytest markers + skip logic (section 4) |

6. Constraints respected

  • .ai/constraints/hybrid/python-cpp-build.md — wheel build patterns
  • .ai/constraints/hybrid/system-deps.md — CUDA Toolkit discovery
  • .ai/constraints/cpp/cuda-modern.md — CUDA compilation flags
Install via CLI
npx skills add https://github.com/AilvenLiu/repo_template --skill gpu-ci