DirectCudaResource — cudarc default (non-pooled) allocation
backend.
Each DeviceMemoryResource::allocate call goes through cudarc’s
CudaDeviceInner::alloc::<u8>(bytes). cudarc itself routes that
through CudaStream::alloc against the device’s default stream,
which forwards to cuMemAllocAsync on contexts that support
async-alloc and falls back to a synchronous path otherwise.
There is no xlog-level pooling or suballocation in this layer —
every allocate is one cudarc call, every deallocate drops the
resulting CudaSlice<u8> (which in turn invokes cuMemFreeAsync
or the synchronous fallback that cudarc selected).
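The bookkeeping described above can be sketched without a GPU. This is a minimal model under stated assumptions, not the crate's implementation: `FakeSlice`, `BlockId`, and `NonPooledSketch` are hypothetical names, and a host `Vec<u8>` stands in for cudarc's `CudaSlice<u8>`. It only illustrates the one-backing-allocation-per-`allocate` shape and the drop-on-`deallocate` behavior.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for cudarc's `CudaSlice<u8>`. A host Vec keeps the
/// sketch runnable without a CUDA device; dropping it models the free.
struct FakeSlice(Vec<u8>);

/// Hypothetical opaque handle returned to callers.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct BlockId(u64);

/// Hypothetical non-pooled resource: no suballocation, no reuse of freed
/// blocks, one backing allocation per `allocate` call.
#[derive(Default)]
pub struct NonPooledSketch {
    next_id: u64,
    live: HashMap<BlockId, FakeSlice>,
}

impl NonPooledSketch {
    /// Performs exactly one backing allocation and keeps it alive in the map.
    pub fn allocate(&mut self, bytes: usize) -> BlockId {
        let id = BlockId(self.next_id);
        self.next_id += 1;
        self.live.insert(id, FakeSlice(vec![0u8; bytes]));
        id
    }

    /// Removes the entry; the backing slice is dropped here, which is where
    /// the real backend would reach cudarc's free path. Returns false for an
    /// unknown or already-freed handle.
    pub fn deallocate(&mut self, id: BlockId) -> bool {
        self.live.remove(&id).is_some()
    }

    /// Number of allocations currently held alive.
    pub fn live_blocks(&self) -> usize {
        self.live.len()
    }

    /// Total bytes currently held alive.
    pub fn live_bytes(&self) -> usize {
        self.live.values().map(|s| s.0.len()).sum()
    }
}
```

In this model, freeing a block never makes its bytes available to a later `allocate`; each request gets a fresh backing allocation, which is the property the non-pooled backend is meant to have at the xlog level.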
Earlier revisions described this backend as “raw cuMemAlloc /
cuMemFree”. That was wrong. A genuine raw-driver direct backend
(bypassing cudarc entirely) is a separate work item; until that
exists, this backend is the non-pooled default — not a synchronous
cuMemAlloc/cuMemFree adaptor — and it does not by itself
guarantee that pool suballocation is absent from the underlying
call path on a given host.
Sanitizer status: unproven. A non-pooled backend exists because
pool suballocation hides byte-level out-of-bounds accesses from
Compute Sanitizer, and this backend is meant to avoid that. The cudarc default
path forwards to cuMemAllocAsync, which on async-alloc hosts is
a stream-ordered allocator; whether that is sufficiently
sanitizer-visible is exactly what the manual Compute Sanitizer
acceptance gate is supposed to confirm on a supported host. Do
not describe this backend as “sanitizer-certified” until that
manual gate has produced a captured negative-test pass; until the
gate lands, treat the sanitizer role as “candidate, not certified”.
Stream-ordered semantics: the backend records the caller-supplied
alloc_stream on the returned DeviceBlock but does not
attempt to bind the underlying cudarc allocation to that stream —
cudarc allocates against the device’s default stream regardless.
Stream-ordered allocation/free that honors a caller-supplied
StreamId is AsyncCudaResource’s responsibility (separate
commit).
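The distinction between recording a stream and binding to it can be sketched as follows. This is an illustrative model only: the field layout of `DeviceBlock`, the `StreamId(u32)` representation, `DEFAULT_STREAM`, `Issued`, and `allocate_recording_stream` are all assumptions made for the example, and no cudarc call is made.

```rust
/// Assumed representation of a stream identifier (hypothetical layout).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct StreamId(pub u32);

/// Hypothetical marker for the device's default stream.
pub const DEFAULT_STREAM: StreamId = StreamId(0);

/// Assumed shape of the block handed back to the caller.
#[derive(Debug)]
pub struct DeviceBlock {
    pub bytes: usize,
    /// Recorded as metadata only; it does not influence where the
    /// allocation was issued.
    pub alloc_stream: StreamId,
}

/// Pairs the returned block with the stream the allocation actually ran on,
/// so the mismatch is observable in the sketch.
#[derive(Debug)]
pub struct Issued {
    pub block: DeviceBlock,
    pub issued_on: StreamId,
}

/// Models the described behavior: the caller's stream is stored on the
/// block, while the allocation itself always targets the default stream.
pub fn allocate_recording_stream(bytes: usize, alloc_stream: StreamId) -> Issued {
    // Stand-in for the cudarc allocation, which ignores `alloc_stream`.
    let issued_on = DEFAULT_STREAM;
    Issued {
        block: DeviceBlock { bytes, alloc_stream },
        issued_on,
    }
}
```

A caller passing `StreamId(7)` gets a block whose `alloc_stream` is `StreamId(7)`, yet `issued_on` is still the default stream. Code that relies on ordering against the caller's stream therefore cannot assume it from this backend.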
Structs§
- DirectCudaResource
cudarc default (non-pooled) allocation adaptor. Holds the underlying CudaSlice<u8> allocations alive in an internal map so the runtime returns opaque DeviceBlocks to callers; on deallocate the slice is dropped, which invokes whichever cudarc free path matches the alloc path (cuMemFreeAsync on async-alloc hosts, the synchronous fallback otherwise).