Trait DeviceMemoryResource

pub trait DeviceMemoryResource: Send + Sync {
    // Required methods
    fn allocate(
        &self,
        bytes: usize,
        stream: StreamId,
        tag: AllocTag,
    ) -> ResourceResult<DeviceBlock>;
    fn deallocate(&self, block: DeviceBlock) -> ResourceResult<()>;
    fn device_ordinal(&self) -> u32;
    fn bytes_outstanding(&self) -> usize;

    // Provided methods
    fn reap_pending(&self) -> ResourceResult<()> { ... }
    fn record_block_use(
        &self,
        block: &DeviceBlock,
        use_stream: StreamId,
    ) -> ResourceResult<()> { ... }
    fn supports_block_use_tracking(&self) -> bool { ... }
    fn prepare_block_use(
        &self,
        block: BlockId,
        use_stream: StreamId,
        access: Access,
    ) -> ResourceResult<()> { ... }
    fn finish_block_use(
        &self,
        block: BlockId,
        use_stream: StreamId,
        access: Access,
    ) -> ResourceResult<()> { ... }
}

Stream-ordered device memory resource. Implementations:

crate::device_runtime::direct::DirectCudaResource — cudarc default (non-pooled) backend; candidate for the sanitizer/cert role, unproven until the manual Compute Sanitizer acceptance gate runs on a supported host.
crate::device_runtime::async_resource::AsyncCudaResource — stream-ordered cuMemAllocAsync/cuMemFreeAsync backend; production default when the context supports async-alloc.
crate::device_runtime::logging::LoggingResource — telemetry decorator over any inner resource.
crate::device_runtime::budget::GlobalDeviceBudget — per-runtime byte-limit decorator over any inner resource.
PoolResource — performance tier; v0.7+ (not implemented).
DebugGuardResource — canary/poison/quarantine; v0.7+ (not implemented).

Implementations must be thread-safe. The runtime composes resources via decoration: each decorator wraps an inner Box<dyn DeviceMemoryResource + Send + Sync> and forwards to it.
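The decoration pattern can be sketched with a cut-down trait. Everything below is an illustrative stand-in, not the crate's code: MiniResource, FakeBackend, ByteLimit and the BudgetExceeded variant are hypothetical names, and the real trait also takes an AllocTag and has more methods.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Simplified stand-ins for the crate's types; the real definitions differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct StreamId(pub u32);

#[derive(Debug, PartialEq, Eq)]
pub struct DeviceBlock {
    pub bytes: usize,
    pub alloc_stream: StreamId,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ResourceError {
    // Hypothetical variant used only by this sketch.
    BudgetExceeded,
}

pub type ResourceResult<T> = Result<T, ResourceError>;

// Cut-down trait: three of the required methods, no AllocTag.
pub trait MiniResource: Send + Sync {
    fn allocate(&self, bytes: usize, stream: StreamId) -> ResourceResult<DeviceBlock>;
    fn deallocate(&self, block: DeviceBlock) -> ResourceResult<()>;
    fn bytes_outstanding(&self) -> usize;
}

// Innermost resource: hands out fake blocks and counts bytes.
pub struct FakeBackend {
    outstanding: AtomicUsize,
}

impl FakeBackend {
    pub fn new() -> Self {
        FakeBackend { outstanding: AtomicUsize::new(0) }
    }
}

impl MiniResource for FakeBackend {
    fn allocate(&self, bytes: usize, stream: StreamId) -> ResourceResult<DeviceBlock> {
        self.outstanding.fetch_add(bytes, Ordering::SeqCst);
        Ok(DeviceBlock { bytes, alloc_stream: stream })
    }
    fn deallocate(&self, block: DeviceBlock) -> ResourceResult<()> {
        self.outstanding.fetch_sub(block.bytes, Ordering::SeqCst);
        Ok(())
    }
    fn bytes_outstanding(&self) -> usize {
        self.outstanding.load(Ordering::SeqCst)
    }
}

// Decorator: enforces a byte limit, then forwards to the wrapped resource.
pub struct ByteLimit {
    inner: Box<dyn MiniResource>,
    limit: usize,
}

impl ByteLimit {
    pub fn new(inner: Box<dyn MiniResource>, limit: usize) -> Self {
        ByteLimit { inner, limit }
    }
}

impl MiniResource for ByteLimit {
    fn allocate(&self, bytes: usize, stream: StreamId) -> ResourceResult<DeviceBlock> {
        // Check-then-allocate is racy under concurrency; a real budget would
        // reserve the bytes atomically. Kept simple for illustration.
        if self.inner.bytes_outstanding() + bytes > self.limit {
            return Err(ResourceError::BudgetExceeded);
        }
        self.inner.allocate(bytes, stream)
    }
    fn deallocate(&self, block: DeviceBlock) -> ResourceResult<()> {
        self.inner.deallocate(block)
    }
    fn bytes_outstanding(&self) -> usize {
        self.inner.bytes_outstanding()
    }
}
```

Because the decorator only holds a boxed trait object, decorators can be stacked in any order, for example a logging layer over a budget layer over a backend.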

Required Methods

fn allocate( &self, bytes: usize, stream: StreamId, tag: AllocTag, ) -> ResourceResult<DeviceBlock>

Allocate a block whose size in bytes is given by bytes, on the resource’s device and ordered on stream. The returned block is in BlockState::Live.

fn deallocate(&self, block: DeviceBlock) -> ResourceResult<()>

Return block to the resource. After this call the block’s state is BlockState::Retired (or BlockState::Quarantined for debug-guard resources). Reuse of the underlying memory is resource-specific but must respect the stream-ordered contract.

block.alloc_stream is authoritative for ordering. If the caller has touched the memory on a different stream, they must have synchronized before calling deallocate.

fn device_ordinal(&self) -> u32

CUDA device ordinal this resource serves. Resources are pinned to a single device.

fn bytes_outstanding(&self) -> usize

Bytes currently outstanding (live + retired-but-not-yet-freed). Used by tests and by the global budget decorator (GlobalDeviceBudget).

Provided Methods

fn reap_pending(&self) -> ResourceResult<()>

Drain any retired-but-not-yet-freed bytes whose underlying CUDA work has completed. For synchronous backends this is a no-op. For stream-ordered async backends this synchronizes the streams that have queued cuMemFreeAsync calls and recomputes bytes_outstanding accordingly.

Callers that need an accurate budget reading after a burst of asynchronous deallocations should call this before reading bytes_outstanding. Calling on a synchronous backend is harmless and free.
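A minimal sketch of that call order, using a fake backend rather than a real CUDA one. The Reapable trait, FakeAsyncBackend and accurate_bytes_outstanding are hypothetical names for illustration, and the Driver variant here is a payload-free stand-in for the crate's error.

```rust
use std::sync::Mutex;

// Stand-in error type; the crate's ResourceError has other variants and payloads.
#[derive(Debug, PartialEq, Eq)]
pub enum ResourceError {
    Driver,
}

// Cut-down trait with only the two methods this sketch needs.
pub trait Reapable {
    fn bytes_outstanding(&self) -> usize;

    // Default is a no-op, matching the behavior described for synchronous backends.
    fn reap_pending(&self) -> Result<(), ResourceError> {
        Ok(())
    }
}

struct Counters {
    live: usize,
    retired_not_freed: usize,
}

// Fake async backend: some bytes are retired but their frees are still queued.
pub struct FakeAsyncBackend {
    counters: Mutex<Counters>,
}

impl FakeAsyncBackend {
    pub fn new(live: usize, retired_not_freed: usize) -> Self {
        FakeAsyncBackend {
            counters: Mutex::new(Counters { live, retired_not_freed }),
        }
    }
}

impl Reapable for FakeAsyncBackend {
    fn bytes_outstanding(&self) -> usize {
        let c = self.counters.lock().unwrap();
        c.live + c.retired_not_freed
    }

    fn reap_pending(&self) -> Result<(), ResourceError> {
        // A real backend would synchronize its streams here; this fake simply
        // pretends every queued free has completed.
        self.counters.lock().unwrap().retired_not_freed = 0;
        Ok(())
    }
}

// Reap first, then read: the pattern recommended for budget readings.
pub fn accurate_bytes_outstanding(r: &dyn Reapable) -> Result<usize, ResourceError> {
    r.reap_pending()?;
    Ok(r.bytes_outstanding())
}
```

Reading bytes_outstanding without reaping would still count the retired bytes, which is why a budget check taken right after a burst of frees can look fuller than it is.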

fn record_block_use( &self, block: &DeviceBlock, use_stream: StreamId, ) -> ResourceResult<()>

Record that work has been (or is being) submitted on use_stream that touches block’s bytes. Resources that participate in cross-stream lifetime tracking (notably the stream-ordered async backend) MUST attach a CUDA event from use_stream to the block; on deallocate(block), the block’s alloc_stream will wait on every recorded event before queueing the underlying free.

The default implementation returns ResourceError::StreamMisuse. This is intentional: a silent no-op default would let a launch builder call record_block_use against a resource that does not actually track cross-stream uses (e.g., crate::device_runtime::direct::DirectCudaResource), observe Ok(()), queue a kernel on a different stream, then drop the block — and quietly hit the cross-stream use-after-free that this API exists to prevent. False safety is worse than no safety. Resources that cannot track cross-stream uses MUST inherit this default; callers (notably the future xlog launch builder) MUST surface the error rather than masking it.

Override status today:

crate::device_runtime::async_resource::AsyncCudaResource overrides with real event tracking.
crate::device_runtime::logging::LoggingResource and crate::device_runtime::budget::GlobalDeviceBudget forward to their inner resource (so the underlying backend’s behavior surfaces unchanged).
crate::device_runtime::direct::DirectCudaResource does NOT override — it correctly returns StreamMisuse and forces callers to either route allocations through AsyncCudaResource or take responsibility for cross-stream synchronization themselves.
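The three override behaviors listed above can be sketched as follows. This is a simplified model under stated assumptions: the Tracking trait, the u64 block id and u32 stream id parameters, and the type names NonTracking, EventTracking and Forwarding are hypothetical; only the failing default and the StreamMisuse error name come from the documentation.

```rust
use std::sync::Mutex;

#[derive(Debug, PartialEq, Eq)]
pub enum ResourceError {
    StreamMisuse,
}

pub type ResourceResult<T> = Result<T, ResourceError>;

pub trait Tracking {
    // Failing default: a resource that does not track must not report success.
    fn record_block_use(&self, _block_id: u64, _use_stream: u32) -> ResourceResult<()> {
        Err(ResourceError::StreamMisuse)
    }
}

// Like a non-tracking backend: inherits the failing default.
pub struct NonTracking;
impl Tracking for NonTracking {}

// Like a tracking backend: overrides and remembers each (block, stream) use.
// A real backend would record a CUDA event instead of pushing to a Vec.
pub struct EventTracking {
    recorded: Mutex<Vec<(u64, u32)>>,
}

impl EventTracking {
    pub fn new() -> Self {
        EventTracking { recorded: Mutex::new(Vec::new()) }
    }
    pub fn recorded(&self) -> Vec<(u64, u32)> {
        self.recorded.lock().unwrap().clone()
    }
}

impl Tracking for EventTracking {
    fn record_block_use(&self, block_id: u64, use_stream: u32) -> ResourceResult<()> {
        self.recorded.lock().unwrap().push((block_id, use_stream));
        Ok(())
    }
}

// Like a decorator: forwards, so the inner backend's answer surfaces unchanged.
pub struct Forwarding {
    inner: Box<dyn Tracking>,
}

impl Forwarding {
    pub fn new(inner: Box<dyn Tracking>) -> Self {
        Forwarding { inner }
    }
}

impl Tracking for Forwarding {
    fn record_block_use(&self, block_id: u64, use_stream: u32) -> ResourceResult<()> {
        self.inner.record_block_use(block_id, use_stream)
    }
}

// Caller side: propagate the error with `?` instead of masking it.
pub fn record_then_launch(r: &dyn Tracking, block_id: u64, use_stream: u32) -> ResourceResult<()> {
    r.record_block_use(block_id, use_stream)?;
    // ... the kernel launch on use_stream would be queued here ...
    Ok(())
}
```

With a no-op default, the NonTracking case would return Ok(()) and the launch would proceed with no dependency recorded, which is the failure mode the error default is meant to rule out.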

Errors

ResourceError::StreamMisuse from the default impl when the resource cannot track cross-stream uses.
ResourceError::UseAfterFree if block is not the block currently live at block.ptr (caller likely handed back a stale DeviceBlock whose generation no longer matches the live entry).
ResourceError::StreamMisuse if use_stream does not resolve in the resource’s stream pool.
ResourceError::Driver for CUDA driver / event creation failures.

Callers that bypass this API and submit cross-stream work directly (raw cuMemcpyDtoHAsync, raw Vec<*mut c_void> kernel launches that the launch builder did not see, etc.) are responsible for their own cross-stream synchronization. The resource cannot infer arbitrary external CUDA work.

fn supports_block_use_tracking(&self) -> bool

Whether this resource (and any inner resources it composes) actually tracks cross-stream uses via record_block_use. Used by the launch recorder’s preflight to fail BEFORE queueing CUDA work, rather than after. The default returns false to match the trait’s default record_block_use behavior; resources that override record_block_use to track events MUST override this to return true. Decorators forward to their inner resource.
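A preflight check of this kind might look like the sketch below. The launch recorder's real interface is not shown in this page, so submit, PreflightError, the queue of kernel names and the three resource types are all hypothetical; only the default of false and the forwarding rule come from the text above.

```rust
pub trait Resource {
    // Default matches the failing default of record_block_use.
    fn supports_block_use_tracking(&self) -> bool {
        false
    }
}

// Non-tracking backend: keeps the default.
pub struct DirectLike;
impl Resource for DirectLike {}

// Tracking backend: overrides to return true.
pub struct AsyncLike;
impl Resource for AsyncLike {
    fn supports_block_use_tracking(&self) -> bool {
        true
    }
}

// Decorator: reports whatever the wrapped resource reports.
pub struct Decorated {
    inner: Box<dyn Resource>,
}

impl Decorated {
    pub fn new(inner: Box<dyn Resource>) -> Self {
        Decorated { inner }
    }
}

impl Resource for Decorated {
    fn supports_block_use_tracking(&self) -> bool {
        self.inner.supports_block_use_tracking()
    }
}

#[derive(Debug, PartialEq, Eq)]
pub struct PreflightError;

// Check capability first; only then append work to the (simulated) queue.
pub fn submit(
    resource: &dyn Resource,
    queue: &mut Vec<&'static str>,
    kernel: &'static str,
) -> Result<(), PreflightError> {
    if !resource.supports_block_use_tracking() {
        return Err(PreflightError);
    }
    queue.push(kernel);
    Ok(())
}
```

The point of the ordering is that a failed preflight leaves the queue untouched, so there is no half-submitted GPU work to unwind.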

fn prepare_block_use( &self, block: BlockId, use_stream: StreamId, access: Access, ) -> ResourceResult<()>

Pre-launch / pre-copy hook: queue any cross-stream waits required for use_stream to safely access block with access semantics. MUST be called BEFORE the GPU work is enqueued on use_stream.

Concretely, on Access::Read the resource must queue use_stream.wait(&last_write) if a write on a different stream is outstanding. On Access::Write / Access::ReadWrite the resource must additionally queue waits on every outstanding read recorded on a different stream — the writer must observe completion of every prior reader. Same-stream events are skipped (CUDA stream order already covers them).
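The wait rules above can be modeled without CUDA as a pure function that returns which events a stream would have to wait on. The Event and BlockDeps types and the waits_for function are hypothetical illustrations; a tracking backend would queue real stream waits instead of returning a list.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Access {
    Read,
    Write,
    ReadWrite,
}

// Stand-in for a CUDA event: the stream it was recorded on plus an id.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Event {
    pub stream: u32,
    pub id: u64,
}

// Per-block dependency state, as described in the text.
#[derive(Default)]
pub struct BlockDeps {
    pub last_write: Option<Event>,
    pub outstanding_reads: Vec<Event>,
}

// Events that `use_stream` must wait on before accessing the block.
pub fn waits_for(deps: &BlockDeps, use_stream: u32, access: Access) -> Vec<Event> {
    let mut waits = Vec::new();

    // Every access waits on an outstanding write from a different stream.
    if let Some(w) = deps.last_write {
        if w.stream != use_stream {
            waits.push(w);
        }
    }

    // Writers additionally wait on every prior reader from a different stream.
    if access != Access::Read {
        waits.extend(
            deps.outstanding_reads
                .iter()
                .copied()
                .filter(|e| e.stream != use_stream),
        );
    }

    // Same-stream events were skipped above: stream order already covers them.
    waits
}
```

For example, with a last write on stream 1 and reads on streams 2 and 3, a write on stream 3 waits on the stream-1 write and the stream-2 read, but not on its own earlier read.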

The default implementation returns ResourceError::StreamMisuse. Same rationale as record_block_use: a silent no-op default would let callers paired against a non-tracking backend believe the dependency edge was queued. Decorators forward; tracking backends override.

Errors

ResourceError::StreamMisuse from the default impl when the resource cannot track cross-stream uses.
ResourceError::UseAfterFree if block is not the id currently live at block.ptr.
ResourceError::Driver for CUDA driver / event-wait failures.

fn finish_block_use( &self, block: BlockId, use_stream: StreamId, access: Access, ) -> ResourceResult<()>

Post-launch / post-copy hook: record an event on use_stream capturing the work just enqueued and update block’s dependency state.

Concretely, on Access::Read the new event is appended to the block’s outstanding-reads list (so future writers and the eventual deallocate can wait on it). On Access::Write / Access::ReadWrite the new event replaces the block’s last-write event and the outstanding-reads list is cleared (any prior reader’s dependency was queued at prepare time and is now subsumed by the new write event).
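The state update described above can be sketched as a small function over a per-block record. As with any model of this kind, Event, BlockDeps and finish are hypothetical names, and a tracking backend would record a real CUDA event on use_stream before updating its state.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Access {
    Read,
    Write,
    ReadWrite,
}

// Stand-in for a CUDA event recorded on a stream.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Event {
    pub stream: u32,
    pub id: u64,
}

#[derive(Default, Debug, PartialEq, Eq)]
pub struct BlockDeps {
    pub last_write: Option<Event>,
    pub outstanding_reads: Vec<Event>,
}

// Update the block's dependency state with the event for the work just enqueued.
pub fn finish(deps: &mut BlockDeps, event: Event, access: Access) {
    match access {
        // Readers accumulate: later writers and deallocate must wait on all of them.
        Access::Read => deps.outstanding_reads.push(event),
        // A write supersedes the previous write and all prior readers, whose
        // dependencies were already queued at prepare time.
        Access::Write | Access::ReadWrite => {
            deps.last_write = Some(event);
            deps.outstanding_reads.clear();
        }
    }
}
```

Clearing the read list on a write keeps the per-block state bounded: anything that waits on the new write event transitively waits on the readers that write itself waited for.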

The default implementation returns ResourceError::StreamMisuse. Same rationale as record_block_use. Decorators forward; tracking backends override.

Implementors

impl DeviceMemoryResource for AsyncCudaResource

impl DeviceMemoryResource for GlobalDeviceBudget

impl DeviceMemoryResource for DirectCudaResource

impl DeviceMemoryResource for LoggingResource
