pub struct AsyncCudaResource { /* private fields */ }
Stream-ordered cudarc-backed allocator.
Implementations
impl AsyncCudaResource
pub fn new(
    device: Arc<CudaDevice>,
    device_ordinal: u32,
    stream_pool: Arc<StreamPool>,
) -> Self
Constructs a resource bound to device, using stream_pool for
stream resolution. device_ordinal is the CUDA ordinal, used for
logging and multi-device disambiguation.
pub fn device(&self) -> &Arc<CudaDevice>
pub fn stream_pool(&self) -> &Arc<StreamPool>
pub fn live_bytes(&self) -> usize
Bytes currently held by live blocks (excludes pending frees).
Test/diagnostic accessor — production code should use
bytes_outstanding.
pub fn pending_free_bytes(&self) -> usize
Bytes queued for cuMemFreeAsync whose stream has not yet
been synchronized by us. Test/diagnostic accessor.
pub fn pending_per_stream_total(&self) -> usize
Sum of per-stream pending byte tallies. Test/diagnostic
accessor used to assert the invariant
pending_free_bytes() == pending_per_stream_total(). The
invariant must hold at any quiescent moment; if it fails
the bookkeeping under the pending_per_stream mutex has
drifted from the global atomic — see deallocate and
reap_pending, which update both as a unit.
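The invariant can be illustrated with a minimal sketch. It assumes a toy model rather than the real resource: PendingTally, queue_free, reap_stream, and the u32 StreamId alias are hypothetical names, and the only thing taken from the description above is that the global atomic and the per-stream map are updated together while the mutex is held.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

/// Hypothetical stand-in for the crate's stream identifier.
type StreamId = u32;

/// Toy model of the pending-free bookkeeping (not the real AsyncCudaResource).
struct PendingTally {
    /// Global counter, readable without taking the lock.
    pending_free_bytes: AtomicUsize,
    /// Per-stream byte tallies, guarded by a mutex.
    pending_per_stream: Mutex<HashMap<StreamId, usize>>,
}

impl PendingTally {
    fn new() -> Self {
        Self {
            pending_free_bytes: AtomicUsize::new(0),
            pending_per_stream: Mutex::new(HashMap::new()),
        }
    }

    /// Models the deallocate side: both tallies change while the lock is held.
    fn queue_free(&self, stream: StreamId, bytes: usize) {
        let mut map = self.pending_per_stream.lock().unwrap();
        *map.entry(stream).or_insert(0) += bytes;
        self.pending_free_bytes.fetch_add(bytes, Ordering::AcqRel);
    }

    /// Models the reap side for one synchronized stream; returns bytes reaped.
    fn reap_stream(&self, stream: StreamId) -> usize {
        let mut map = self.pending_per_stream.lock().unwrap();
        let bytes = map.remove(&stream).unwrap_or(0);
        self.pending_free_bytes.fetch_sub(bytes, Ordering::AcqRel);
        bytes
    }

    fn pending_free_bytes(&self) -> usize {
        self.pending_free_bytes.load(Ordering::Acquire)
    }

    fn pending_per_stream_total(&self) -> usize {
        self.pending_per_stream.lock().unwrap().values().sum()
    }
}

fn main() {
    let tally = PendingTally::new();
    tally.queue_free(0, 1024);
    tally.queue_free(1, 512);
    tally.queue_free(0, 256);
    // Quiescent point: the two views must agree.
    assert_eq!(tally.pending_free_bytes(), tally.pending_per_stream_total());

    let reaped = tally.reap_stream(0);
    assert_eq!(reaped, 1280);
    assert_eq!(tally.pending_free_bytes(), tally.pending_per_stream_total());
    println!("pending after reap: {}", tally.pending_free_bytes());
}
```

A test against the real resource would presumably make the same comparison through pending_free_bytes() and pending_per_stream_total() once no other thread is allocating or freeing.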
pub fn pending_use_event_count(&self, ptr: u64) -> Option<usize>
Number of recorded outstanding-read events plus a
last_write event (0 or 1) currently attached to the live
block at ptr. Test/diagnostic accessor — used by
reproducers to confirm finish_block_use actually
attached events before deallocate consumed them. Returns
None if ptr is not currently in the live map.
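The counting rule described above can be sketched as follows. This is an illustration under assumptions, not the crate's internals: EventId, BlockUses, and LiveMap are hypothetical types, and the only behavior modelled is "number of read events plus 0 or 1 for the last write, or None when the pointer is not live".

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a recorded CUDA event handle.
#[derive(Debug, Clone, Copy)]
struct EventId(u64);

/// Toy per-block use state: outstanding reads plus at most one last write.
#[derive(Default)]
struct BlockUses {
    reads: Vec<EventId>,
    last_write: Option<EventId>,
}

/// Toy live map keyed by device pointer.
#[derive(Default)]
struct LiveMap {
    blocks: HashMap<u64, BlockUses>,
}

impl LiveMap {
    /// Reads count one each; the last write contributes 0 or 1.
    fn pending_use_event_count(&self, ptr: u64) -> Option<usize> {
        self.blocks
            .get(&ptr)
            .map(|b| b.reads.len() + usize::from(b.last_write.is_some()))
    }
}

fn main() {
    let mut live = LiveMap::default();
    live.blocks.insert(
        0x1000,
        BlockUses {
            reads: vec![EventId(1), EventId(2)],
            last_write: Some(EventId(3)),
        },
    );

    // Two reads plus one write.
    assert_eq!(live.pending_use_event_count(0x1000), Some(3));
    // Pointer that was never allocated.
    assert_eq!(live.pending_use_event_count(0x2000), None);

    // Removing the block (as a deallocation would) makes the query return None.
    live.blocks.remove(&0x1000);
    assert_eq!(live.pending_use_event_count(0x1000), None);
    println!("event counts behave as described");
}
```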
Trait Implementations
impl DeviceMemoryResource for AsyncCudaResource
fn allocate(
    &self,
    bytes: usize,
    stream: StreamId,
    tag: AllocTag,
) -> ResourceResult<DeviceBlock>
Allocate bytes bytes on the resource’s device, ordered on
stream. The returned block is in BlockState::Live.
fn deallocate(&self, block: DeviceBlock) -> ResourceResult<()>
Return block to the resource. After this call the block’s
state is BlockState::Retired (or BlockState::Quarantined
for debug-guard resources). Reuse of the underlying memory is
resource-specific but must respect the stream-ordered contract.
fn device_ordinal(&self) -> u32
fn bytes_outstanding(&self) -> usize
fn reap_pending(&self) -> ResourceResult<()>
cuMemFreeAsync calls and
re-counts bytes_outstanding accordingly.
fn supports_block_use_tracking(&self) -> bool
Whether this resource implements record_block_use. Used by the launch recorder’s
preflight to fail BEFORE queueing CUDA work, rather than
after. The default returns false to match the trait’s
default record_block_use behavior; resources that
override record_block_use to track events MUST override
this to return true. Decorators forward to inner.
fn record_block_use(
    &self,
    block: &DeviceBlock,
    use_stream: StreamId,
) -> ResourceResult<()>
Record a use on use_stream that touches block’s bytes. Resources that
participate in cross-stream lifetime tracking (notably the
stream-ordered async backend) MUST attach a CUDA event from
use_stream to the block; on deallocate(block), the
block’s alloc_stream will wait on every recorded event
before queueing the underlying free.
fn prepare_block_use(
    &self,
    block: BlockId,
    use_stream: StreamId,
    access: Access,
) -> ResourceResult<()>
Prepare use_stream to safely access block with
access semantics. MUST be called BEFORE the GPU work is
enqueued on use_stream.
fn finish_block_use(
    &self,
    block: BlockId,
    use_stream: StreamId,
    access: Access,
) -> ResourceResult<()>
Record an event on use_stream capturing the work just enqueued and update
block’s dependency state.
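The call order implied by supports_block_use_tracking, prepare_block_use, and finish_block_use can be sketched with a toy trait. Everything here is an assumption made for illustration: UseTracking, NoTracking, Tracking, and launch are hypothetical, the BlockId and StreamId aliases and the Access enum are stand-ins for the crate's types, and a string log takes the place of CUDA events. The sketch only shows a preflight check that fails before any work is queued, followed by prepare, enqueue, finish in that order.

```rust
use std::cell::{Cell, RefCell};

// Hypothetical stand-ins for the crate's own types.
type BlockId = u64;
type StreamId = u32;

#[derive(Debug, Clone, Copy)]
enum Access {
    Read,
    Write,
}

/// Toy trait modelling only the three methods discussed here.
trait UseTracking {
    /// Defaults to false, mirroring the documented default.
    fn supports_block_use_tracking(&self) -> bool {
        false
    }
    fn prepare_block_use(&self, _b: BlockId, _s: StreamId, _a: Access) -> Result<(), String> {
        Ok(())
    }
    fn finish_block_use(&self, _b: BlockId, _s: StreamId, _a: Access) -> Result<(), String> {
        Ok(())
    }
}

/// Resource that keeps every default, so it reports no tracking.
struct NoTracking;
impl UseTracking for NoTracking {}

/// Resource that tracks uses; the log stands in for CUDA event bookkeeping.
#[derive(Default)]
struct Tracking {
    log: RefCell<Vec<String>>,
}

impl UseTracking for Tracking {
    fn supports_block_use_tracking(&self) -> bool {
        true
    }
    fn prepare_block_use(&self, b: BlockId, s: StreamId, a: Access) -> Result<(), String> {
        self.log.borrow_mut().push(format!("prepare block={b} stream={s} {a:?}"));
        Ok(())
    }
    fn finish_block_use(&self, b: BlockId, s: StreamId, a: Access) -> Result<(), String> {
        self.log.borrow_mut().push(format!("finish block={b} stream={s} {a:?}"));
        Ok(())
    }
}

/// Preflight first, then prepare -> enqueue -> finish.
fn launch<R: UseTracking>(
    res: &R,
    b: BlockId,
    s: StreamId,
    a: Access,
    enqueue: impl FnOnce(),
) -> Result<(), String> {
    if !res.supports_block_use_tracking() {
        // Fail before any GPU work would be queued.
        return Err("resource does not track block uses".to_string());
    }
    res.prepare_block_use(b, s, a)?;
    enqueue();
    res.finish_block_use(b, s, a)
}

fn main() {
    let res = Tracking::default();
    launch(&res, 7, 1, Access::Write, || {
        res.log.borrow_mut().push("enqueue kernel".to_string())
    })
    .unwrap();
    for line in res.log.borrow().iter() {
        println!("{line}");
    }

    // With an untracked resource the closure never runs.
    let ran = Cell::new(false);
    let out = launch(&NoTracking, 7, 1, Access::Read, || ran.set(true));
    assert!(out.is_err() && !ran.get());
}
```

The preflight lives in the caller rather than inside prepare_block_use so that a resource relying on the defaults is rejected up front instead of silently accepting untracked work.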