xlog_cuda::device_runtime::async_resource

Struct AsyncCudaResource

pub struct AsyncCudaResource { /* private fields */ }

Stream-ordered cudarc-backed allocator.

Implementations

impl AsyncCudaResource

pub fn new(device: Arc<CudaDevice>, device_ordinal: u32, stream_pool: Arc<StreamPool>) -> Self

Construct a resource bound to device, using stream_pool to resolve streams. device_ordinal is the CUDA ordinal, used for logging and for telling devices apart in multi-device setups.

pub fn device(&self) -> &Arc<CudaDevice>

pub fn stream_pool(&self) -> &Arc<StreamPool>

pub fn live_bytes(&self) -> usize

Bytes currently held by live blocks (excludes pending frees). Test/diagnostic accessor — production code should use bytes_outstanding.

pub fn pending_free_bytes(&self) -> usize

Bytes queued for cuMemFreeAsync on streams that this resource has not yet synchronized. Test/diagnostic accessor.

pub fn pending_per_stream_total(&self) -> usize

Sum of the per-stream pending byte tallies. Test/diagnostic accessor used to assert the invariant pending_free_bytes() == pending_per_stream_total(). The invariant must hold at any quiescent moment. If it fails, the bookkeeping under the pending_per_stream mutex has drifted from the global atomic; see deallocate and reap_pending, which update both as a unit.
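
The invariant can be illustrated with a minimal sketch. It assumes a global atomic counter plus a mutex-guarded per-stream map, as the description suggests; PendingBook, StreamKey, queue_free and reap_all are hypothetical names for illustration, not the crate's private fields or methods.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

/// Hypothetical stand-in for the crate's stream identifier.
type StreamKey = u32;

/// Illustrative model of the two pending-free tallies (not the real layout).
#[derive(Default)]
pub struct PendingBook {
    /// Global tally, readable without taking the lock.
    pending_free_bytes: AtomicUsize,
    /// Per-stream tallies, guarded by a mutex.
    pending_per_stream: Mutex<HashMap<StreamKey, usize>>,
}

impl PendingBook {
    /// Model of the deallocate path: queue `bytes` for a deferred free on `stream`.
    pub fn queue_free(&self, stream: StreamKey, bytes: usize) {
        let mut per_stream = self.pending_per_stream.lock().unwrap();
        *per_stream.entry(stream).or_insert(0) += bytes;
        // Updated while the lock is still held, so both tallies move as a unit.
        self.pending_free_bytes.fetch_add(bytes, Ordering::SeqCst);
    }

    /// Model of the reap path: clear every per-stream tally and return the bytes released.
    pub fn reap_all(&self) -> usize {
        let mut per_stream = self.pending_per_stream.lock().unwrap();
        let released: usize = per_stream.drain().map(|(_, bytes)| bytes).sum();
        self.pending_free_bytes.fetch_sub(released, Ordering::SeqCst);
        released
    }

    /// Global tally.
    pub fn pending_free_bytes(&self) -> usize {
        self.pending_free_bytes.load(Ordering::SeqCst)
    }

    /// Sum of the per-stream tallies.
    pub fn pending_per_stream_total(&self) -> usize {
        self.pending_per_stream.lock().unwrap().values().sum()
    }
}
```

Because both tallies change only while the mutex is held, a test that reads them at a quiescent moment should see equal values; a mismatch would point at a code path that updated one without the other.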

pub fn pending_use_event_count(&self, ptr: u64) -> Option<usize>

Number of recorded outstanding-read events plus a last_write event (0 or 1) currently attached to the live block at ptr. Test/diagnostic accessor — used by reproducers to confirm finish_block_use actually attached events before deallocate consumed them. Returns None if ptr is not currently in the live map.
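
A small sketch of how such a count could be derived, assuming each live block keeps a list of read events and at most one last-write event. LiveMap, BlockEvents, note_read and note_write are hypothetical names, and plain u64 values stand in for CUDA event handles; the real fields are private and may be organized differently.

```rust
use std::collections::HashMap;

/// Hypothetical per-block event record.
#[derive(Default)]
pub struct BlockEvents {
    /// One entry per outstanding read (stand-in for a CUDA event handle).
    reads: Vec<u64>,
    /// At most one last-write event.
    last_write: Option<u64>,
}

/// Hypothetical live map keyed by device pointer.
#[derive(Default)]
pub struct LiveMap {
    blocks: HashMap<u64, BlockEvents>,
}

impl LiveMap {
    /// Register a freshly allocated block with no events attached.
    pub fn insert(&mut self, ptr: u64) {
        self.blocks.insert(ptr, BlockEvents::default());
    }

    /// Attach a read event to the block at `ptr`, if it is live.
    pub fn note_read(&mut self, ptr: u64, event: u64) {
        if let Some(block) = self.blocks.get_mut(&ptr) {
            block.reads.push(event);
        }
    }

    /// Replace the block's last-write event, if it is live.
    pub fn note_write(&mut self, ptr: u64, event: u64) {
        if let Some(block) = self.blocks.get_mut(&ptr) {
            block.last_write = Some(event);
        }
    }

    /// Remove the block from the live map (model of deallocate).
    pub fn remove(&mut self, ptr: u64) {
        self.blocks.remove(&ptr);
    }

    /// Reads plus 0 or 1 for the last write; `None` if `ptr` is not live.
    pub fn pending_use_event_count(&self, ptr: u64) -> Option<usize> {
        self.blocks
            .get(&ptr)
            .map(|block| block.reads.len() + usize::from(block.last_write.is_some()))
    }
}
```

In this model two reads followed by two writes yield a count of 3, since a second write replaces the first rather than adding to the total. Whether a write also clears earlier read events in the real resource is not stated in the description, so the sketch leaves reads untouched.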

Trait Implementations

impl DeviceMemoryResource for AsyncCudaResource

fn allocate(&self, bytes: usize, stream: StreamId, tag: AllocTag) -> ResourceResult<DeviceBlock>

Allocate a block of the requested size (bytes) on the resource’s device, ordered on stream. The returned block is in BlockState::Live.

fn deallocate(&self, block: DeviceBlock) -> ResourceResult<()>

Return block to the resource. After this call the block’s state is BlockState::Retired (or BlockState::Quarantined for debug-guard resources). Reuse of the underlying memory is resource-specific but must respect the stream-ordered contract.

fn device_ordinal(&self) -> u32

CUDA device ordinal this resource serves. Resources are pinned to a single device.

fn bytes_outstanding(&self) -> usize

Bytes currently outstanding (live + retired-but-not-yet-freed). Used by tests and by the global budget adaptor.

fn reap_pending(&self) -> ResourceResult<()>

Drain any retired-but-not-yet-freed bytes whose underlying CUDA work has completed. For synchronous backends this is a no-op. For stream-ordered async backends this synchronizes the streams that have queued cuMemFreeAsync calls and re-counts bytes_outstanding accordingly.
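
The relationship between the byte counters can be sketched as a plain ledger. This assumes bytes_outstanding equals live bytes plus pending-free bytes, which is what the descriptions of live_bytes, pending_free_bytes and bytes_outstanding suggest; ByteLedger is a hypothetical type, and it reaps unconditionally instead of synchronizing streams.

```rust
/// Hypothetical single-threaded model of the resource's byte accounting.
#[derive(Default)]
pub struct ByteLedger {
    live: usize,
    pending_free: usize,
}

impl ByteLedger {
    /// A new block becomes live.
    pub fn allocate(&mut self, bytes: usize) {
        self.live += bytes;
    }

    /// The block is retired: its bytes move from live to pending-free.
    pub fn deallocate(&mut self, bytes: usize) {
        self.live = self
            .live
            .checked_sub(bytes)
            .expect("deallocated more bytes than are live");
        self.pending_free += bytes;
    }

    /// Model of reap_pending: once the queued frees have completed,
    /// the pending bytes no longer count as outstanding.
    pub fn reap_pending(&mut self) {
        self.pending_free = 0;
    }

    pub fn live_bytes(&self) -> usize {
        self.live
    }

    pub fn pending_free_bytes(&self) -> usize {
        self.pending_free
    }

    /// Live bytes plus retired-but-not-yet-freed bytes.
    pub fn bytes_outstanding(&self) -> usize {
        self.live + self.pending_free
    }
}
```

Under this model, deallocating a block does not lower bytes_outstanding by itself; only the later reap does, which is why a budget adaptor that watches bytes_outstanding may need to call reap_pending before concluding that memory is exhausted.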

fn supports_block_use_tracking(&self) -> bool

Whether this resource (and any inner resources it composes) actually tracks cross-stream uses via record_block_use. Used by the launch recorder’s preflight to fail BEFORE queueing CUDA work, rather than after. The default returns false to match the trait’s default record_block_use behavior; resources that override record_block_use to track events MUST override this to return true. Decorators forward to inner.
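
The default, override and forwarding rules described above can be sketched with a trimmed-down trait. Resource, SyncResource, TrackingResource, Logged and preflight are hypothetical names used only for illustration; the real trait is DeviceMemoryResource and has more methods.

```rust
/// Hypothetical, trimmed-down version of the resource trait.
pub trait Resource {
    /// Default: the resource does not track cross-stream uses.
    fn supports_block_use_tracking(&self) -> bool {
        false
    }
}

/// A backend that keeps the default (no tracking).
pub struct SyncResource;
impl Resource for SyncResource {}

/// A backend that tracks events, so it must override the default.
pub struct TrackingResource;
impl Resource for TrackingResource {
    fn supports_block_use_tracking(&self) -> bool {
        true
    }
}

/// A decorator: it forwards the question to the resource it wraps.
pub struct Logged<R: Resource> {
    pub inner: R,
}
impl<R: Resource> Resource for Logged<R> {
    fn supports_block_use_tracking(&self) -> bool {
        self.inner.supports_block_use_tracking()
    }
}

/// Model of a preflight check: reject cross-stream work up front
/// when the resource cannot track it, before anything is queued.
pub fn preflight(resource: &dyn Resource, crosses_streams: bool) -> Result<(), String> {
    if crosses_streams && !resource.supports_block_use_tracking() {
        return Err("resource does not track cross-stream block uses".to_string());
    }
    Ok(())
}
```

Forwarding in the decorator matters because a wrapper that silently kept the default would make a tracking backend look like a non-tracking one, and the preflight would then reject work the inner resource could have handled.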

fn record_block_use(&self, block: &DeviceBlock, use_stream: StreamId) -> ResourceResult<()>

Record that work has been (or is being) submitted on use_stream that touches block’s bytes. Resources that participate in cross-stream lifetime tracking (notably the stream-ordered async backend) MUST attach a CUDA event from use_stream to the block; on deallocate(block), the block’s alloc_stream will wait on every recorded event before queueing the underlying free.
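
The ordering contract can be modeled without a GPU by recording which operations deallocate would queue. This is a sketch under stated assumptions: TrackedBlock and QueuedOp are hypothetical, streams are plain u32 values, and each recorded event is represented only by the stream it came from. In a real backend the operations would presumably map to recording an event on use_stream, a stream wait on the alloc stream, and cuMemFreeAsync.

```rust
/// Hypothetical description of work queued on a stream.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum QueuedOp {
    /// `on_stream` waits for an event recorded on `event_from`.
    WaitEvent { on_stream: u32, event_from: u32 },
    /// The stream-ordered free is queued on `on_stream`.
    FreeAsync { on_stream: u32 },
}

/// Hypothetical block that remembers the streams that used it.
pub struct TrackedBlock {
    alloc_stream: u32,
    /// Stream of origin for each recorded use event.
    use_events: Vec<u32>,
}

impl TrackedBlock {
    pub fn new(alloc_stream: u32) -> Self {
        TrackedBlock { alloc_stream, use_events: Vec::new() }
    }

    /// Model of record_block_use: attach an event from `use_stream`.
    pub fn record_use(&mut self, use_stream: u32) {
        self.use_events.push(use_stream);
    }

    /// Model of deallocate: the alloc stream waits on every recorded
    /// event, and only then is the free queued.
    pub fn deallocate(self) -> Vec<QueuedOp> {
        let mut ops: Vec<QueuedOp> = self
            .use_events
            .iter()
            .map(|&event_from| QueuedOp::WaitEvent {
                on_stream: self.alloc_stream,
                event_from,
            })
            .collect();
        ops.push(QueuedOp::FreeAsync { on_stream: self.alloc_stream });
        ops
    }
}
```

The point of the ordering is that the free is stream-ordered on the alloc stream only; without the waits, work still running on another stream could touch memory that has already been handed back.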

fn prepare_block_use(&self, block: BlockId, use_stream: StreamId, access: Access) -> ResourceResult<()>

Pre-launch / pre-copy hook: queue any cross-stream waits required for use_stream to safely access block with access semantics. MUST be called BEFORE the GPU work is enqueued on use_stream.
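
One way to picture which waits such a hook might queue is a conventional read/write hazard model: a read waits on the last write, and a write waits on the last write and on all outstanding reads. This is an assumption, not the documented behavior; BlockHazards, prepare and finish are hypothetical names, Access here is a local stand-in enum, and streams are plain u32 values.

```rust
/// Local stand-in for the crate's access kind.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Access {
    Read,
    Write,
}

/// Hypothetical per-block hazard state.
#[derive(Default)]
pub struct BlockHazards {
    /// Stream that issued the most recent write, if any.
    last_write: Option<u32>,
    /// Streams with reads issued since that write.
    reads: Vec<u32>,
}

impl BlockHazards {
    /// Streams `use_stream` must wait on before the access is safe.
    /// Work on the same stream is already ordered, so it is skipped.
    pub fn prepare(&self, use_stream: u32, access: Access) -> Vec<u32> {
        let mut waits = Vec::new();
        if let Some(writer) = self.last_write {
            if writer != use_stream {
                waits.push(writer);
            }
        }
        if access == Access::Write {
            for &reader in &self.reads {
                if reader != use_stream && !waits.contains(&reader) {
                    waits.push(reader);
                }
            }
        }
        waits
    }

    /// Called after the work is enqueued: remember the access.
    pub fn finish(&mut self, use_stream: u32, access: Access) {
        match access {
            Access::Read => self.reads.push(use_stream),
            Access::Write => {
                self.last_write = Some(use_stream);
                self.reads.clear();
            }
        }
    }
}
```

The intended call order in this model is prepare, then enqueue the GPU work, then finish. Calling prepare after the work is already enqueued would be too late, since the waits have to sit ahead of the work on use_stream.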
