
Module direct


DirectCudaResource — cudarc default (non-pooled) allocation backend.

Each DeviceMemoryResource::allocate call goes through cudarc’s CudaDeviceInner::alloc::<u8>(bytes). cudarc itself routes that through CudaStream::alloc against the device’s default stream, which forwards to cuMemAllocAsync on contexts that support async-alloc and falls back to a synchronous path otherwise. There is no xlog-level pooling or suballocation in this layer — every allocate is one cudarc call, every deallocate drops the resulting CudaSlice<u8> (which in turn invokes cuMemFreeAsync or the synchronous fallback that cudarc selected).

Earlier revisions described this backend as “raw cuMemAlloc / cuMemFree”. That description was wrong. A genuine raw-driver direct backend that bypasses cudarc entirely is a separate work item. Until it exists, this backend is the non-pooled default, not a synchronous cuMemAlloc/cuMemFree adaptor, and it does not by itself guarantee that pool suballocation is absent from the underlying call path on a given host.

Sanitizer status: unproven. A non-pooled backend exists because pool suballocation hides byte-level out-of-bounds accesses from Compute Sanitizer. The cudarc default path forwards to cuMemAllocAsync, which on async-alloc hosts is a stream-ordered allocator; whether allocations made that way are sufficiently sanitizer-visible is exactly what the manual Compute Sanitizer acceptance gate is meant to confirm on a supported host. Do not describe this backend as “sanitizer-certified” until that gate has produced a captured negative-test pass; until then, treat its sanitizer role as “candidate, not certified”.

Stream-ordered semantics: the backend records the caller-supplied alloc_stream on the returned DeviceBlock but does not attempt to bind the underlying cudarc allocation to that stream — cudarc allocates against the device’s default stream regardless. Stream-ordered allocation/free that honors a caller-supplied StreamId is AsyncCudaResource’s responsibility (separate commit).
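The recorded-but-not-honored behaviour described above can be illustrated with a small mock. This is a sketch only: StreamId, DEFAULT_STREAM, DeviceBlock and MockDirectResource here are hypothetical stand-ins written for this example, not the crate's real definitions, and no cudarc call is made.

```rust
/// Hypothetical stand-in for the crate's stream identifier.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct StreamId(pub u64);

/// Hypothetical marker for the device's default stream.
pub const DEFAULT_STREAM: StreamId = StreamId(0);

/// Hypothetical opaque block handed back to the caller.
#[derive(Debug)]
pub struct DeviceBlock {
    pub id: u64,
    pub bytes: usize,
    /// Stream the caller asked for. Recorded as metadata only.
    pub alloc_stream: StreamId,
}

/// Mock backend that logs which stream each allocation really used.
#[derive(Default)]
pub struct MockDirectResource {
    next_id: u64,
    pub backend_streams: Vec<StreamId>,
}

impl MockDirectResource {
    pub fn allocate(&mut self, bytes: usize, alloc_stream: StreamId) -> DeviceBlock {
        // The underlying allocation ignores `alloc_stream` and always
        // goes to the default stream, mirroring the behaviour described above.
        self.backend_streams.push(DEFAULT_STREAM);
        let id = self.next_id;
        self.next_id += 1;
        // The caller-supplied stream is only stored on the block.
        DeviceBlock { id, bytes, alloc_stream }
    }
}
```

In this mock, a caller that passes StreamId(7) gets a block whose alloc_stream is StreamId(7), while the backend log shows the allocation went to the default stream. Code that needs the allocation itself ordered on the caller's stream should not rely on the recorded field.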

Structs§

DirectCudaResource
cudarc default (non-pooled) allocation adaptor. Keeps the underlying CudaSlice<u8> allocations alive in an internal map so that the runtime can return opaque DeviceBlocks to callers; on deallocate the slice is dropped, which invokes whichever cudarc free path matches the alloc path (cuMemFreeAsync on async-alloc hosts, the synchronous fallback otherwise).
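The ownership pattern described for this struct (an internal map that keeps each allocation alive, with the free happening on drop) can be sketched without a GPU. Everything below is an assumption made for illustration: MockSlice stands in for CudaSlice<u8> and only counts drops, and BlockHandle and MockDirectMap are hypothetical names, not the real DeviceBlock or DirectCudaResource.

```rust
use std::cell::Cell;
use std::collections::HashMap;
use std::rc::Rc;

/// Hypothetical stand-in for `CudaSlice<u8>`: the "free" happens in Drop.
pub struct MockSlice {
    bytes: usize,
    frees: Rc<Cell<usize>>,
}

impl Drop for MockSlice {
    fn drop(&mut self) {
        // A real slice would reach the driver's free path here.
        self.frees.set(self.frees.get() + 1);
    }
}

/// Hypothetical opaque handle; it is not Clone, so it can be freed only once.
#[derive(Debug, PartialEq, Eq)]
pub struct BlockHandle {
    id: u64,
    pub bytes: usize,
}

/// Mock resource: one map entry per live allocation, no pooling.
pub struct MockDirectMap {
    next_id: u64,
    live: HashMap<u64, MockSlice>,
    frees: Rc<Cell<usize>>,
}

impl MockDirectMap {
    pub fn new() -> Self {
        Self { next_id: 0, live: HashMap::new(), frees: Rc::new(Cell::new(0)) }
    }

    /// One call creates exactly one backing slice and one map entry.
    pub fn allocate(&mut self, bytes: usize) -> BlockHandle {
        let id = self.next_id;
        self.next_id += 1;
        self.live.insert(id, MockSlice { bytes, frees: Rc::clone(&self.frees) });
        BlockHandle { id, bytes }
    }

    /// Removing the entry drops the slice, which performs the free.
    pub fn deallocate(&mut self, block: BlockHandle) -> bool {
        self.live.remove(&block.id).is_some()
    }

    pub fn live_count(&self) -> usize {
        self.live.len()
    }

    pub fn live_bytes(&self) -> usize {
        self.live.values().map(|s| s.bytes).sum()
    }

    pub fn free_count(&self) -> usize {
        self.frees.get()
    }
}
```

Taking the handle by value in deallocate is a design choice of this sketch: it makes a double free a compile-time error rather than a runtime check. Whether the real adaptor does the same is not stated in the documentation above.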