XlogDeviceRuntime — per-CUDA-ordinal singleton hosting the
device-runtime allocator stack.
Replaces the per-CudaKernelProvider GpuMemoryManager model with
a single live runtime per physical GPU. All CudaKernelProviders
on a given ordinal share the same runtime once the migration
commit lands; until then this type is constructed and used by
tests only.
Singleton lifetime: each runtime is a leaked Box, so the returned
&'static borrows stay valid for the life of the process. Nothing is
torn down on drop, which is appropriate for a GPU device runtime that
should outlive any single executor.
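The leaked-Box lifetime can be illustrated with a minimal sketch. `DemoRuntime`, `leak_runtime`, and `drops_so_far` are hypothetical names used only for illustration and are not part of this module; the real runtime holds allocator state and a CUDA context handle rather than a bare ordinal.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many times a DemoRuntime has been dropped (illustration only).
static DROPS: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for the real per-ordinal runtime.
pub struct DemoRuntime {
    pub ordinal: usize,
}

impl Drop for DemoRuntime {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

// Box::leak gives up ownership of the allocation and returns a reference
// that is valid for the rest of the process. Drop never runs for it.
pub fn leak_runtime(ordinal: usize) -> &'static DemoRuntime {
    Box::leak(Box::new(DemoRuntime { ordinal }))
}

pub fn drops_so_far() -> usize {
    DROPS.load(Ordering::SeqCst)
}
```

Because the reference is `&'static`, it can be moved into a spawned thread without scoped lifetimes, and the drop counter stays at zero for the leaked value.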
§Initialization race semantics
Earlier revisions used OnceLock::get_or_init(|| leaked_box)
after building the runtime outside the lock. That pattern leaked
the loser’s runtime (and its CUDA context handle) when two
threads raced on the first access for an ordinal.
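The race described above can be reproduced in a small sketch. This is not the earlier code itself: `Runtime`, `racy_get`, and `run_race` are hypothetical, a single slot stands in for the per-ordinal table, and a `Barrier` is added purely to force both threads to finish building before either one stores.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Barrier, OnceLock};
use std::thread;

// Hypothetical stand-in for the runtime; `id` records build order.
pub struct Runtime {
    pub id: usize,
}

static BUILDS: AtomicUsize = AtomicUsize::new(0);
static SLOT: OnceLock<&'static Runtime> = OnceLock::new();

// Builds first, then races to store: the pattern the text describes.
fn racy_get(barrier: &Barrier) -> &'static Runtime {
    let id = BUILDS.fetch_add(1, Ordering::SeqCst);
    let built: &'static Runtime = Box::leak(Box::new(Runtime { id }));
    // Illustration-only hook: make sure both builds exist before any store.
    barrier.wait();
    // Only one closure result is kept; the other leaked Box is orphaned.
    *SLOT.get_or_init(|| built)
}

// Returns (number of runtimes built, id seen by thread A, id seen by thread B).
pub fn run_race() -> (usize, usize, usize) {
    let barrier = Barrier::new(2);
    let (a, b) = thread::scope(|s| {
        let ha = s.spawn(|| racy_get(&barrier).id);
        let hb = s.spawn(|| racy_get(&barrier).id);
        (ha.join().unwrap(), hb.join().unwrap())
    });
    (BUILDS.load(Ordering::SeqCst), a, b)
}
```

Both threads end up with the same runtime, yet two were built. The losing runtime is never reachable again, which in the real module would also strand its CUDA context handle.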
This module now uses an explicit per-ordinal Mutex plus
OnceLock: callers fast-path on OnceLock::get(), and on a miss
take the per-ordinal mutex, double-check the OnceLock, and only
the winner inside the mutex builds and stores the runtime. The
mutex is held only across the build, so subsequent reads are still
lock-free.
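The locking scheme above can be sketched as follows. This is a minimal illustration under assumptions, not the module's actual code: `Runtime`, `get_or_init`, `SLOTS`, `INIT_LOCKS`, and the `BUILDS` counter are hypothetical, the value 16 for `MAX_DEVICE_ORDINALS` is assumed, and the build step is reduced to a trivial struct construction.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Mutex, OnceLock};

// Assumed table size for this sketch.
pub const MAX_DEVICE_ORDINALS: usize = 16;

// Hypothetical stand-in for the real per-ordinal runtime.
pub struct Runtime {
    pub ordinal: usize,
}

// Counts builds so the "only the winner builds" property is observable.
pub static BUILDS: AtomicUsize = AtomicUsize::new(0);

// Const items used as array-repeat initializers for the static tables.
const EMPTY_SLOT: OnceLock<&'static Runtime> = OnceLock::new();
const EMPTY_LOCK: Mutex<()> = Mutex::new(());

static SLOTS: [OnceLock<&'static Runtime>; MAX_DEVICE_ORDINALS] =
    [EMPTY_SLOT; MAX_DEVICE_ORDINALS];
static INIT_LOCKS: [Mutex<()>; MAX_DEVICE_ORDINALS] =
    [EMPTY_LOCK; MAX_DEVICE_ORDINALS];

pub fn get_or_init(ordinal: usize) -> Option<&'static Runtime> {
    let slot = SLOTS.get(ordinal)?; // out-of-range ordinal: None

    // Fast path: lock-free read once the slot is populated.
    if let Some(rt) = slot.get() {
        return Some(*rt);
    }

    // Slow path: serialize builders for this ordinal only.
    let _guard = INIT_LOCKS[ordinal]
        .lock()
        .unwrap_or_else(|poisoned| poisoned.into_inner());

    // Double-check: another thread may have won while we waited.
    if let Some(rt) = slot.get() {
        return Some(*rt);
    }

    // Only the winner reaches this point, so nothing is built and discarded.
    BUILDS.fetch_add(1, Ordering::SeqCst);
    let rt: &'static Runtime = Box::leak(Box::new(Runtime { ordinal }));
    let _ = slot.set(rt); // cannot fail: the mutex is held and the slot was empty
    Some(rt)
}
```

Keeping one mutex per ordinal means first access on one GPU does not block first access on another, and the `OnceLock` keeps every later read off the mutex entirely.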
Structs§
- XlogDeviceRuntime - Per-CUDA-ordinal device-runtime singleton.
Constants§
- MAX_DEVICE_ORDINALS - Maximum CUDA ordinal supported by the singleton table. CUDA itself caps at 16 visible devices in typical configurations; raise here only when a multi-GPU node demands it.