Module runtime

XlogDeviceRuntime — per-CUDA-ordinal singleton hosting the device-runtime allocator stack.

Replaces the per-CudaKernelProvider GpuMemoryManager model with a single live runtime per physical GPU. All CudaKernelProviders on a given ordinal share the same runtime once the migration commit lands; until then this type is constructed and used by tests only.

Singleton lifetime: each runtime is a leaked Box, so the returned &'static borrows stay valid for the life of the process. There is no teardown on drop, which is appropriate for a GPU device runtime that should outlive any single executor.

§Initialization race semantics

Earlier revisions used OnceLock::get_or_init(|| leaked_box) after building the runtime outside the lock. That pattern leaked the loser’s runtime (and its CUDA context handle) when two threads raced on the first access for an ordinal.
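The leak described above can be reproduced in miniature. The following is a sketch only, not the module's earlier code: `LeakyRuntime`, `SLOT`, and `racy_get` are hypothetical names, and two sequential calls stand in for two threads that both missed the fast path and built a runtime before reaching `get_or_init`.

```rust
use std::sync::OnceLock;

/// Hypothetical stand-in for a device runtime; `id` marks which build it came from.
pub struct LeakyRuntime {
    pub id: u32,
}

/// Single slot, standing in for one ordinal's entry in the singleton table.
static SLOT: OnceLock<&'static LeakyRuntime> = OnceLock::new();

/// Builds and leaks the runtime *before* consulting the OnceLock.
/// If another caller already initialized the slot, the closure never runs
/// and the freshly leaked allocation is unreachable for the rest of the process.
pub fn racy_get(id: u32) -> &'static LeakyRuntime {
    let built: &'static LeakyRuntime = Box::leak(Box::new(LeakyRuntime { id }));
    SLOT.get_or_init(|| built)
}
```

Calling `racy_get(1)` and then `racy_get(2)` returns the runtime with `id == 1` both times; the second allocation is never stored and never freed. With a real runtime, the stranded value would also hold whatever device resources it acquired during construction.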

This module now uses an explicit per-ordinal Mutex plus OnceLock: callers fast-path on OnceLock::get(), and on a miss take the per-ordinal mutex, double-check the OnceLock, and only the winner inside the mutex builds and stores the runtime. The mutex is held only across the build, so subsequent reads are still lock-free.
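The double-checked scheme described above might look roughly like the sketch below. It is an illustration under assumptions, not the module's implementation: `DemoRuntime`, `MAX_ORDINALS`, `SLOTS`, `INIT_LOCKS`, `BUILD_COUNT`, and `get_or_init` are all hypothetical names, and the build step is reduced to a plain allocation.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Mutex, OnceLock};

/// Hypothetical table size; the real module exposes MAX_DEVICE_ORDINALS.
pub const MAX_ORDINALS: usize = 16;

/// Hypothetical stand-in for the real runtime; it only records its ordinal.
#[derive(Debug)]
pub struct DemoRuntime {
    pub ordinal: usize,
}

/// Counts how often the build step ran (demonstration only).
pub static BUILD_COUNT: AtomicUsize = AtomicUsize::new(0);

// Const items let the array-repeat syntax initialize each static slot.
const EMPTY_SLOT: OnceLock<&'static DemoRuntime> = OnceLock::new();
const NEW_LOCK: Mutex<()> = Mutex::new(());

static SLOTS: [OnceLock<&'static DemoRuntime>; MAX_ORDINALS] = [EMPTY_SLOT; MAX_ORDINALS];
static INIT_LOCKS: [Mutex<()>; MAX_ORDINALS] = [NEW_LOCK; MAX_ORDINALS];

/// Returns the singleton for `ordinal`, or None if the ordinal is out of range.
pub fn get_or_init(ordinal: usize) -> Option<&'static DemoRuntime> {
    let slot = SLOTS.get(ordinal)?;

    // Fast path: lock-free read once the slot is populated.
    if let Some(rt) = slot.get() {
        return Some(*rt);
    }

    // Slow path: serialize initializers for this ordinal only.
    let _guard = INIT_LOCKS[ordinal]
        .lock()
        .unwrap_or_else(|poisoned| poisoned.into_inner());

    // Double-check: another thread may have won while we waited for the mutex.
    if let Some(rt) = slot.get() {
        return Some(*rt);
    }

    // Only the winner reaches this point, so exactly one runtime is built and leaked.
    BUILD_COUNT.fetch_add(1, Ordering::SeqCst);
    let rt: &'static DemoRuntime = Box::leak(Box::new(DemoRuntime { ordinal }));
    // Cannot fail: the slot was empty and we still hold the per-ordinal mutex.
    let _ = slot.set(rt);
    Some(rt)
}
```

Because the build happens inside the mutex and after the second check, a losing thread never constructs a runtime at all, so there is nothing for it to leak. Keeping one mutex per ordinal means first access on one GPU does not block first access on another.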

Structs§

XlogDeviceRuntime
Per-CUDA-ordinal device-runtime singleton.

Constants§

MAX_DEVICE_ORDINALS
Maximum CUDA ordinal supported by the singleton table. CUDA itself caps at 16 visible devices in typical configurations; raise here only when a multi-GPU node demands it.