pub trait DeviceMemoryResource: Send + Sync {
    // Required methods
    fn allocate(
        &self,
        bytes: usize,
        stream: StreamId,
        tag: AllocTag,
    ) -> ResourceResult<DeviceBlock>;
    fn deallocate(&self, block: DeviceBlock) -> ResourceResult<()>;
    fn device_ordinal(&self) -> u32;
    fn bytes_outstanding(&self) -> usize;

    // Provided methods
    fn reap_pending(&self) -> ResourceResult<()> { ... }
    fn record_block_use(
        &self,
        block: &DeviceBlock,
        use_stream: StreamId,
    ) -> ResourceResult<()> { ... }
    fn supports_block_use_tracking(&self) -> bool { ... }
    fn prepare_block_use(
        &self,
        block: BlockId,
        use_stream: StreamId,
        access: Access,
    ) -> ResourceResult<()> { ... }
    fn finish_block_use(
        &self,
        block: BlockId,
        use_stream: StreamId,
        access: Access,
    ) -> ResourceResult<()> { ... }
}
Stream-ordered device memory resource. Implementations:
- crate::device_runtime::direct::DirectCudaResource: cudarc default (non-pooled) backend; candidate for the sanitizer/cert role, unproven until the manual Compute Sanitizer acceptance gate runs on a supported host.
- crate::device_runtime::async_resource::AsyncCudaResource: stream-ordered cuMemAllocAsync/cuMemFreeAsync backend; production default when the context supports async-alloc.
- crate::device_runtime::logging::LoggingResource: telemetry decorator over any inner resource.
- crate::device_runtime::budget::GlobalDeviceBudget: per-runtime byte-limit decorator over any inner resource.
- PoolResource: performance tier; v0.7+ (not implemented).
- DebugGuardResource: canary/poison/quarantine; v0.7+ (not implemented).
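Implementations must be thread-safe. The runtime composes resources
via decoration: each decorator (e.g. LoggingResource, GlobalDeviceBudget) wraps an inner Box<dyn DeviceMemoryResource + Send + Sync>, with a backend such as DirectCudaResource or AsyncCudaResource at the bottom of the stack.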
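A minimal sketch of decoration follows, assuming a trimmed-down trait (allocate without the tag argument, no stream-use hooks) and host-only stand-ins for StreamId, DeviceBlock, ResourceError and ResourceResult. CountingBackend and CallCounter are hypothetical names, and neither touches CUDA. Because Send + Sync are supertraits, Box<dyn DeviceMemoryResource> is already Send + Sync, so the decorator stays thread-safe without restating the bounds.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical, simplified stand-ins for the crate's types (not the real definitions).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct StreamId(pub u32);

#[derive(Debug)]
pub struct DeviceBlock {
    pub bytes: usize,
    pub alloc_stream: StreamId,
}

#[derive(Debug)]
pub enum ResourceError {
    Driver(String),
}

pub type ResourceResult<T> = Result<T, ResourceError>;

// Reduced trait: only the methods needed to show decoration.
pub trait DeviceMemoryResource: Send + Sync {
    fn allocate(&self, bytes: usize, stream: StreamId) -> ResourceResult<DeviceBlock>;
    fn deallocate(&self, block: DeviceBlock) -> ResourceResult<()>;
    fn bytes_outstanding(&self) -> usize;
}

// Hypothetical host-only "backend" that only counts bytes; no CUDA calls.
#[derive(Default)]
pub struct CountingBackend {
    outstanding: AtomicUsize,
}

impl DeviceMemoryResource for CountingBackend {
    fn allocate(&self, bytes: usize, stream: StreamId) -> ResourceResult<DeviceBlock> {
        self.outstanding.fetch_add(bytes, Ordering::SeqCst);
        Ok(DeviceBlock { bytes, alloc_stream: stream })
    }
    fn deallocate(&self, block: DeviceBlock) -> ResourceResult<()> {
        self.outstanding.fetch_sub(block.bytes, Ordering::SeqCst);
        Ok(())
    }
    fn bytes_outstanding(&self) -> usize {
        self.outstanding.load(Ordering::SeqCst)
    }
}

// Hypothetical decorator: counts allocate calls and forwards everything to `inner`.
pub struct CallCounter {
    pub inner: Box<dyn DeviceMemoryResource>,
    pub allocs: AtomicUsize,
}

impl DeviceMemoryResource for CallCounter {
    fn allocate(&self, bytes: usize, stream: StreamId) -> ResourceResult<DeviceBlock> {
        self.allocs.fetch_add(1, Ordering::SeqCst);
        self.inner.allocate(bytes, stream)
    }
    fn deallocate(&self, block: DeviceBlock) -> ResourceResult<()> {
        self.inner.deallocate(block)
    }
    fn bytes_outstanding(&self) -> usize {
        self.inner.bytes_outstanding()
    }
}
```

Decorators stack the same way: a CallCounter could itself be boxed as the inner resource of another decorator.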
Required Methods
fn allocate(&self, bytes: usize, stream: StreamId, tag: AllocTag) -> ResourceResult<DeviceBlock>
Allocate a block of the requested size (the bytes argument) on the
resource's device, ordered on stream. The returned block is in BlockState::Live.
fn deallocate(&self, block: DeviceBlock) -> ResourceResult<()>
Return block to the resource. After this call the block’s
state is BlockState::Retired (or BlockState::Quarantined
for debug-guard resources). Reuse of the underlying memory is
resource-specific but must respect the stream-ordered contract.
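block.alloc_stream is authoritative for ordering. If the
caller has touched the memory on a different stream, the caller must
synchronize that stream before calling deallocate, unless those uses
were recorded through record_block_use (or the prepare_block_use /
finish_block_use hooks) on a resource that tracks cross-stream uses.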
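The lifecycle described for allocate and deallocate can be sketched as a small state transition. This is an illustration under assumptions: BlockState mirrors the documented variants, but rejecting a block that is no longer Live, and reporting it as UseAfterFree, is a hypothetical choice, since the deallocate docs do not name an error for that case.

```rust
// Hypothetical mirror of the documented block states.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BlockState {
    Live,
    Retired,
    Quarantined,
}

// Hypothetical error for returning a block that is no longer live.
#[derive(Debug, PartialEq, Eq)]
pub enum ResourceError {
    UseAfterFree,
}

/// Transition a deallocate would apply. Only a Live block may be returned;
/// a debug-guard resource parks the memory in Quarantined instead of Retired.
pub fn retire(state: BlockState, debug_guard: bool) -> Result<BlockState, ResourceError> {
    match state {
        BlockState::Live if debug_guard => Ok(BlockState::Quarantined),
        BlockState::Live => Ok(BlockState::Retired),
        BlockState::Retired | BlockState::Quarantined => Err(ResourceError::UseAfterFree),
    }
}
```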
fn device_ordinal(&self) -> u32
CUDA device ordinal this resource serves. Resources are pinned to a single device.
fn bytes_outstanding(&self) -> usize
Bytes currently outstanding (live + retired-but-not-yet-freed). Used by tests and by the global budget decorator (GlobalDeviceBudget).
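A budget decorator can compare a request against the inner resource's bytes_outstanding before forwarding allocate. The function below is a hypothetical sketch of that admission test, not GlobalDeviceBudget's actual logic; it uses checked_add so that a very large request cannot wrap around and slip under the limit. Because retired-but-not-yet-freed bytes are counted, a stream-ordered backend can refuse requests until its pending frees are drained.

```rust
/// Hypothetical admission check a budget decorator could run before forwarding
/// allocate. `outstanding` would come from inner.bytes_outstanding().
pub fn within_budget(limit: usize, outstanding: usize, request: usize) -> bool {
    match outstanding.checked_add(request) {
        Some(total) => total <= limit,
        // The sum does not fit in usize, so it cannot be within any usize limit.
        None => false,
    }
}
```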
Provided Methods
fn reap_pending(&self) -> ResourceResult<()>
Drain any retired-but-not-yet-freed bytes whose underlying
CUDA work has completed. For synchronous backends this is a
no-op. For stream-ordered async backends this synchronizes
the streams that have queued cuMemFreeAsync calls and
re-counts bytes_outstanding accordingly.
Callers that need an accurate budget reading after a burst
of asynchronous deallocations should call this before
reading bytes_outstanding. Calling on a synchronous backend
is harmless and free.
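To see why reap_pending matters for budget readings, here is a hypothetical host-only model of a stream-ordered backend's byte accounting (MockAsyncAccounting and its methods are illustrative names, not the crate's API). deallocate only queues the free, so the bytes stay outstanding; reap_pending stands in for synchronizing the streams, after which every queued free has run.

```rust
use std::sync::Mutex;

// Hypothetical counters; `pending_free` is bytes whose cuMemFreeAsync is queued but unconfirmed.
#[derive(Default)]
struct Counters {
    live: usize,
    pending_free: usize,
}

/// Hypothetical host-only model of a stream-ordered backend's accounting.
#[derive(Default)]
pub struct MockAsyncAccounting {
    counters: Mutex<Counters>,
}

impl MockAsyncAccounting {
    pub fn on_allocate(&self, bytes: usize) {
        self.counters.lock().unwrap().live += bytes;
    }

    /// deallocate only queues the free, so the bytes remain outstanding.
    pub fn on_deallocate(&self, bytes: usize) {
        let mut c = self.counters.lock().unwrap();
        c.live -= bytes;
        c.pending_free += bytes;
    }

    /// reap_pending: after (simulated) stream synchronization, every queued free has run.
    pub fn reap_pending(&self) {
        self.counters.lock().unwrap().pending_free = 0;
    }

    /// Live plus retired-but-not-yet-freed, as documented for bytes_outstanding.
    pub fn bytes_outstanding(&self) -> usize {
        let c = self.counters.lock().unwrap();
        c.live + c.pending_free
    }
}
```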
fn record_block_use(&self, block: &DeviceBlock, use_stream: StreamId) -> ResourceResult<()>
Record that work has been (or is being) submitted on
use_stream that touches block’s bytes. Resources that
participate in cross-stream lifetime tracking (notably the
stream-ordered async backend) MUST attach a CUDA event from
use_stream to the block; on deallocate(block), the
block’s alloc_stream will wait on every recorded event
before queueing the underlying free.
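The record-then-free protocol can be modeled without CUDA. In this sketch Event, Op and TrackedBlock are hypothetical: an Event only remembers the stream it was recorded on, and free_plan returns the operations a tracking backend would queue on alloc_stream (a wait on every recorded event, then the free) instead of issuing them.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct StreamId(pub u32);

// Hypothetical stand-in for a CUDA event: only remembers where it was recorded.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Event {
    pub id: u32,
    pub recorded_on: StreamId,
}

/// One operation the backend would queue on the block's alloc_stream.
#[derive(Debug, PartialEq, Eq)]
pub enum Op {
    Wait(Event),
    Free,
}

pub struct TrackedBlock {
    pub alloc_stream: StreamId,
    uses: Vec<Event>,
}

impl TrackedBlock {
    pub fn new(alloc_stream: StreamId) -> Self {
        Self { alloc_stream, uses: Vec::new() }
    }

    /// record_block_use: keep the event captured on the using stream.
    pub fn record_use(&mut self, event: Event) {
        self.uses.push(event);
    }

    /// deallocate: wait on every recorded event, then free, all on alloc_stream.
    /// Consumes the block, matching deallocate taking DeviceBlock by value.
    pub fn free_plan(self) -> Vec<Op> {
        let mut ops: Vec<Op> = self.uses.into_iter().map(Op::Wait).collect();
        ops.push(Op::Free);
        ops
    }
}
```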
The default implementation returns
ResourceError::StreamMisuse. This is intentional: a
silent no-op default would let a launch builder call
record_block_use against a resource that does not
actually track cross-stream uses (e.g.,
crate::device_runtime::direct::DirectCudaResource),
observe Ok(()), queue a kernel on a different stream,
then drop the block — and quietly hit the cross-stream
use-after-free that this API exists to prevent. False
safety is worse than no safety. Resources that cannot
track cross-stream uses MUST inherit this default;
callers (notably the future xlog launch builder) MUST
surface the error rather than masking it.
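The contrast between a fail-closed default and a silent stub is easy to demonstrate. The trait below is a reduced, hypothetical stand-in carrying only record_block_use; NonTracking inherits the default the way DirectCudaResource is described as doing, while SilentStub is the anti-pattern the paragraph above rules out.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct StreamId(pub u32);

// Hypothetical stand-in for a block handle.
pub struct DeviceBlock {
    pub alloc_stream: StreamId,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ResourceError {
    StreamMisuse,
}

pub type ResourceResult<T> = Result<T, ResourceError>;

// Reduced, hypothetical trait with only the use-recording hook.
pub trait UseTracking {
    // Fail closed: refuse rather than pretend the use was recorded.
    fn record_block_use(&self, _block: &DeviceBlock, _use_stream: StreamId) -> ResourceResult<()> {
        Err(ResourceError::StreamMisuse)
    }
}

// Non-tracking backend done right: inherits the default.
pub struct NonTracking;
impl UseTracking for NonTracking {}

// Anti-pattern: a silent stub that claims success while tracking nothing.
pub struct SilentStub;
impl UseTracking for SilentStub {
    fn record_block_use(&self, _block: &DeviceBlock, _use_stream: StreamId) -> ResourceResult<()> {
        Ok(())
    }
}
```

A caller holding a SilentStub sees Ok(()) and has no way to learn that the eventual free is unordered against its kernel; with NonTracking the problem is visible at the call site.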
Override status today:
- crate::device_runtime::async_resource::AsyncCudaResource overrides with real event tracking.
- crate::device_runtime::logging::LoggingResource and crate::device_runtime::budget::GlobalDeviceBudget forward to their inner resource (so the underlying backend's behavior surfaces unchanged).
- crate::device_runtime::direct::DirectCudaResource does NOT override: it correctly returns StreamMisuse and forces callers to either route allocations through AsyncCudaResource or take responsibility for cross-stream synchronization themselves.
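Forwarding in a decorator can be sketched as follows, assuming a reduced trait with only record_block_use and a plain u64 block handle. DirectLike, AsyncLike and LoggingLike are hypothetical stand-ins for the three cases listed; the decorator does its own bookkeeping and then returns whatever the inner resource returns, so a StreamMisuse from a non-tracking backend is not masked.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct StreamId(pub u32);

#[derive(Debug, PartialEq, Eq)]
pub enum ResourceError {
    StreamMisuse,
}

pub type ResourceResult<T> = Result<T, ResourceError>;

// Reduced, hypothetical trait; blocks are plain u64 handles here.
pub trait Tracks: Send + Sync {
    fn record_block_use(&self, _block: u64, _use_stream: StreamId) -> ResourceResult<()> {
        Err(ResourceError::StreamMisuse) // fail-closed default
    }
}

// Like DirectCudaResource: does not override.
pub struct DirectLike;
impl Tracks for DirectLike {}

// Like AsyncCudaResource: overrides (event recording simulated by a counter).
pub struct AsyncLike {
    pub recorded: AtomicUsize,
}
impl Tracks for AsyncLike {
    fn record_block_use(&self, _block: u64, _use_stream: StreamId) -> ResourceResult<()> {
        self.recorded.fetch_add(1, Ordering::SeqCst);
        Ok(())
    }
}

// Like LoggingResource: does its own bookkeeping, then forwards unchanged.
pub struct LoggingLike {
    pub inner: Box<dyn Tracks>,
    pub calls: AtomicUsize,
}
impl Tracks for LoggingLike {
    fn record_block_use(&self, block: u64, use_stream: StreamId) -> ResourceResult<()> {
        self.calls.fetch_add(1, Ordering::SeqCst);
        self.inner.record_block_use(block, use_stream)
    }
}
```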
Errors
- ResourceError::StreamMisuse from the default impl when the resource cannot track cross-stream uses.
- ResourceError::UseAfterFree if block is not the block currently live at block.ptr (the caller likely handed back a stale DeviceBlock whose generation no longer matches the live entry).
- ResourceError::StreamMisuse if use_stream does not resolve in the resource's stream pool.
- ResourceError::Driver for CUDA driver / event creation failures.
Callers that bypass this API and submit cross-stream work
directly (raw cuMemcpyDtoHAsync, raw Vec<*mut c_void>
kernel launches that the launch builder did not see, etc.)
are responsible for their own cross-stream synchronization.
The resource cannot infer arbitrary external CUDA work.
fn supports_block_use_tracking(&self) -> bool
Whether this resource (and any inner resources it
composes) actually tracks cross-stream uses via
record_block_use. Used by the launch recorder’s
preflight to fail BEFORE queueing CUDA work, rather than
after. The default returns false to match the trait’s
default record_block_use behavior; resources that
override record_block_use to track events MUST override
this to return true. Decorators forward to their inner resource.
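A launch-recorder preflight might look like the sketch below, which asks every resource behind a launch's arguments before anything is enqueued. Tracking, the *Like types, preflight, and the choice of StreamMisuse as the preflight error are all assumptions for illustration, not the crate's launch recorder.

```rust
// Hypothetical error used when preflight fails; the real recorder's error may differ.
#[derive(Debug, PartialEq, Eq)]
pub enum ResourceError {
    StreamMisuse,
}

// Reduced, hypothetical trait with only the capability query.
pub trait Tracking: Send + Sync {
    // Default mirrors the fail-closed record_block_use default.
    fn supports_block_use_tracking(&self) -> bool {
        false
    }
}

pub struct DirectLike; // non-tracking backend: inherits `false`
impl Tracking for DirectLike {}

pub struct AsyncLike; // tracking backend: overrides to `true`
impl Tracking for AsyncLike {
    fn supports_block_use_tracking(&self) -> bool {
        true
    }
}

pub struct BudgetLike {
    pub inner: Box<dyn Tracking>,
}
impl Tracking for BudgetLike {
    // Decorators answer for the whole stack by forwarding.
    fn supports_block_use_tracking(&self) -> bool {
        self.inner.supports_block_use_tracking()
    }
}

/// Check every resource behind a launch's arguments before anything is enqueued,
/// so a failure leaves no half-submitted GPU work behind.
pub fn preflight(resources: &[&dyn Tracking]) -> Result<(), ResourceError> {
    if resources.iter().all(|r| r.supports_block_use_tracking()) {
        Ok(())
    } else {
        Err(ResourceError::StreamMisuse)
    }
}
```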
fn prepare_block_use(&self, block: BlockId, use_stream: StreamId, access: Access) -> ResourceResult<()>
Pre-launch / pre-copy hook: queue any cross-stream waits
required for use_stream to safely access block with
access semantics. MUST be called BEFORE the GPU work is
enqueued on use_stream.
Concretely, on Access::Read the resource must queue
use_stream.wait(&last_write) if a write on a different
stream is outstanding. On Access::Write /
Access::ReadWrite the resource must additionally queue
waits on every outstanding read recorded on a different
stream — the writer must observe completion of every prior
reader. Same-stream events are skipped (CUDA stream order
already covers them).
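The wait rules above reduce to a pure function over a block's dependency state. In this sketch Event, BlockDeps and waits_for are hypothetical: an Event only records which stream it came from, and the function returns the events use_stream would wait on instead of issuing cuStreamWaitEvent calls.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct StreamId(pub u32);

// Hypothetical stand-in for a CUDA event recorded on `stream`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Event {
    pub id: u32,
    pub stream: StreamId,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Access {
    Read,
    Write,
    ReadWrite,
}

// Hypothetical per-block dependency state.
#[derive(Debug, Default)]
pub struct BlockDeps {
    pub last_write: Option<Event>,
    pub reads: Vec<Event>,
}

/// Events that `use_stream` must wait on before the new work is enqueued.
pub fn waits_for(deps: &BlockDeps, use_stream: StreamId, access: Access) -> Vec<Event> {
    // Same-stream events are skipped: stream order already covers them.
    let foreign = |e: &&Event| e.stream != use_stream;
    // Every access waits on an outstanding write from another stream.
    let mut waits: Vec<Event> = deps.last_write.iter().filter(foreign).copied().collect();
    if matches!(access, Access::Write | Access::ReadWrite) {
        // A writer must also observe completion of every prior reader.
        waits.extend(deps.reads.iter().filter(foreign).copied());
    }
    waits
}
```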
The default implementation returns
ResourceError::StreamMisuse. Same rationale as
record_block_use: a silent no-op default would let
callers paired against a non-tracking backend believe the
dependency edge was queued. Decorators forward; tracking
backends override.
Errors
- ResourceError::StreamMisuse from the default impl when the resource cannot track cross-stream uses.
- ResourceError::UseAfterFree if block is not the id currently live at block.ptr.
- ResourceError::Driver for CUDA driver / event-wait failures.
fn finish_block_use(&self, block: BlockId, use_stream: StreamId, access: Access) -> ResourceResult<()>
Post-launch / post-copy hook: record an event on
use_stream capturing the work just enqueued and update
block’s dependency state.
Concretely, on Access::Read the new event is appended
to the block’s outstanding-reads list (so future writers
and the eventual deallocate can wait on it). On
Access::Write / Access::ReadWrite the new event
replaces the block’s last-write event and the
outstanding-reads list is cleared (any prior reader’s
dependency was queued at prepare time and is now subsumed
by the new write event).
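The matching state update can be sketched with hypothetical types of the same shape as a prepare-side model (redefined here so the block stands alone). finish and BlockDeps are illustrative names, and new_event is assumed to have just been recorded on use_stream after the work was enqueued.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct StreamId(pub u32);

// Hypothetical stand-in for a CUDA event recorded on `stream`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Event {
    pub id: u32,
    pub stream: StreamId,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Access {
    Read,
    Write,
    ReadWrite,
}

// Hypothetical per-block dependency state.
#[derive(Debug, Default)]
pub struct BlockDeps {
    pub last_write: Option<Event>,
    pub reads: Vec<Event>,
}

/// Update applied after the work is enqueued on use_stream.
pub fn finish(deps: &mut BlockDeps, new_event: Event, access: Access) {
    match access {
        // Readers accumulate so later writers and deallocate can wait on all of them.
        Access::Read => deps.reads.push(new_event),
        // A write supersedes prior readers: prepare already queued waits on them.
        Access::Write | Access::ReadWrite => {
            deps.last_write = Some(new_event);
            deps.reads.clear();
        }
    }
}
```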
The default implementation returns
ResourceError::StreamMisuse. Same rationale as
record_block_use. Decorators forward; tracking backends
override.