Struct AsyncCudaResource 

pub struct AsyncCudaResource { /* private fields */ }

Stream-ordered cudarc-backed allocator.

Implementations

impl AsyncCudaResource

pub fn new( device: Arc<CudaDevice>, device_ordinal: u32, stream_pool: Arc<StreamPool>, ) -> Self

Construct a resource bound to device, using stream_pool for stream resolution. device_ordinal is the CUDA device ordinal, used for logging and for telling devices apart in multi-device setups.

pub fn device(&self) -> &Arc<CudaDevice>

pub fn stream_pool(&self) -> &Arc<StreamPool>

pub fn live_bytes(&self) -> usize

Bytes currently held by live blocks (excludes pending frees). Test/diagnostic accessor — production code should use bytes_outstanding.

pub fn pending_free_bytes(&self) -> usize

Bytes queued for cuMemFreeAsync on streams that this resource has not yet synchronized. Test/diagnostic accessor.

pub fn pending_per_stream_total(&self) -> usize

Sum of per-stream pending byte tallies. Test/diagnostic accessor used to assert the invariant pending_free_bytes() == pending_per_stream_total(). The invariant must hold at any quiescent moment; if it fails, the bookkeeping under the pending_per_stream mutex has drifted from the global atomic. See deallocate and reap_pending, which update both as a unit.
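
The invariant can be illustrated with a small standalone model. This is only a sketch and not the crate's implementation: PendingBook, queue_free, and reap_all are hypothetical names, StreamId is replaced by a plain integer, and no CUDA calls are made. It shows one way a global atomic and a mutex-guarded per-stream map can be updated as a unit so that the two totals agree whenever no update is in flight.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

/// Hypothetical stand-in for the crate's `StreamId`.
type StreamId = u32;

/// Toy model of the pending-free bookkeeping (assumption, not the real type).
#[derive(Default)]
pub struct PendingBook {
    /// Global tally, readable without taking the lock.
    pending_free_bytes: AtomicUsize,
    /// Per-stream tallies, guarded by a mutex.
    pending_per_stream: Mutex<HashMap<StreamId, usize>>,
}

impl PendingBook {
    /// Models the bookkeeping half of `deallocate`: both counters are
    /// updated while the mutex is held.
    pub fn queue_free(&self, stream: StreamId, bytes: usize) {
        let mut per_stream = self.pending_per_stream.lock().unwrap();
        *per_stream.entry(stream).or_insert(0) += bytes;
        self.pending_free_bytes.fetch_add(bytes, Ordering::SeqCst);
    }

    /// Models the bookkeeping half of `reap_pending`: every per-stream tally
    /// is drained and the same amount is subtracted from the global atomic.
    pub fn reap_all(&self) -> usize {
        let mut per_stream = self.pending_per_stream.lock().unwrap();
        let reclaimed: usize = per_stream.drain().map(|(_, bytes)| bytes).sum();
        self.pending_free_bytes.fetch_sub(reclaimed, Ordering::SeqCst);
        reclaimed
    }

    pub fn pending_free_bytes(&self) -> usize {
        self.pending_free_bytes.load(Ordering::SeqCst)
    }

    pub fn pending_per_stream_total(&self) -> usize {
        self.pending_per_stream.lock().unwrap().values().sum()
    }
}
```

In this model a reader of the atomic can briefly observe a value that lags the map while a writer holds the lock, which is why the equality is only asserted at quiescent moments.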

pub fn pending_use_event_count(&self, ptr: u64) -> Option<usize>

Number of recorded outstanding-read events plus a last_write event (0 or 1) currently attached to the live block at ptr. Test/diagnostic accessor — used by reproducers to confirm finish_block_use actually attached events before deallocate consumed them. Returns None if ptr is not currently in the live map.

Trait Implementations

impl DeviceMemoryResource for AsyncCudaResource

fn allocate( &self, bytes: usize, stream: StreamId, tag: AllocTag, ) -> ResourceResult<DeviceBlock>

Allocate a block of the requested size (the bytes argument, in bytes) on the resource’s device, ordered on stream. The returned block is in BlockState::Live.

fn deallocate(&self, block: DeviceBlock) -> ResourceResult<()>

Return block to the resource. After this call the block’s state is BlockState::Retired (or BlockState::Quarantined for debug-guard resources). Reuse of the underlying memory is resource-specific but must respect the stream-ordered contract.

fn device_ordinal(&self) -> u32

CUDA device ordinal this resource serves. Resources are pinned to a single device.

fn bytes_outstanding(&self) -> usize

Bytes currently outstanding (live + retired-but-not-yet-freed). Used by tests and by the global budget adaptor.

fn reap_pending(&self) -> ResourceResult<()>

Drain any retired-but-not-yet-freed bytes whose underlying CUDA work has completed. For synchronous backends this is a no-op. For stream-ordered async backends this synchronizes the streams that have queued cuMemFreeAsync calls and updates bytes_outstanding accordingly.
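
The relationship between live_bytes, pending_free_bytes, bytes_outstanding, and reap_pending can be sketched with a toy accounting model. Everything here is an assumption made for illustration: ToyResource and Block are hypothetical, the methods take &mut self rather than &self, no streams or CUDA calls are involved, and reaping is modeled as if all queued frees had already completed.

```rust
/// Hypothetical mirror of two of the block states named in the docs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BlockState {
    Live,
    Retired,
}

/// Toy block: only a size and a state.
#[derive(Debug)]
pub struct Block {
    pub bytes: usize,
    pub state: BlockState,
}

/// Toy resource that only does byte accounting (no device memory).
#[derive(Default)]
pub struct ToyResource {
    live: usize,
    pending: usize,
}

impl ToyResource {
    /// A new block starts out live and counts toward `live_bytes`.
    pub fn allocate(&mut self, bytes: usize) -> Block {
        self.live += bytes;
        Block { bytes, state: BlockState::Live }
    }

    /// Deallocation retires the block: its bytes move from the live tally
    /// to the pending-free tally, so `bytes_outstanding` is unchanged.
    pub fn deallocate(&mut self, mut block: Block) -> Block {
        self.live -= block.bytes;
        self.pending += block.bytes;
        block.state = BlockState::Retired;
        block
    }

    /// Models a reap in which every queued free has completed.
    pub fn reap_pending(&mut self) {
        self.pending = 0;
    }

    pub fn live_bytes(&self) -> usize {
        self.live
    }

    pub fn pending_free_bytes(&self) -> usize {
        self.pending
    }

    /// Live plus retired-but-not-yet-freed.
    pub fn bytes_outstanding(&self) -> usize {
        self.live + self.pending
    }
}
```

Under this model, deallocate alone never lowers bytes_outstanding; only a reap does, which would matter to a budget adaptor that reads that figure.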

fn supports_block_use_tracking(&self) -> bool

Whether this resource (and any inner resources it composes) actually tracks cross-stream uses via record_block_use. Used by the launch recorder’s preflight to fail BEFORE queueing CUDA work, rather than after. The default returns false to match the trait’s default record_block_use behavior; resources that override record_block_use to track events MUST override this to return true. Decorators forward to inner.
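
The default-plus-override pattern described above, including decorator forwarding, might look roughly like the following. The trait is reduced to the single method under discussion, and Resource, SyncBackend, TrackingBackend, Logged, and preflight are hypothetical names invented for this sketch rather than items from the crate.

```rust
/// Reduced, hypothetical version of the resource trait.
pub trait Resource {
    /// Defaults to `false`, matching a default no-op `record_block_use`.
    fn supports_block_use_tracking(&self) -> bool {
        false
    }
}

/// A backend that keeps the default (no cross-stream tracking).
pub struct SyncBackend;
impl Resource for SyncBackend {}

/// A backend that tracks events and therefore overrides the default.
pub struct TrackingBackend;
impl Resource for TrackingBackend {
    fn supports_block_use_tracking(&self) -> bool {
        true
    }
}

/// A decorator: it answers by forwarding to the resource it wraps.
pub struct Logged<R: Resource> {
    inner: R,
}

impl<R: Resource> Logged<R> {
    pub fn new(inner: R) -> Self {
        Logged { inner }
    }
}

impl<R: Resource> Resource for Logged<R> {
    fn supports_block_use_tracking(&self) -> bool {
        self.inner.supports_block_use_tracking()
    }
}

/// Sketch of a preflight check that rejects a resource before any work
/// would be queued.
pub fn preflight(resource: &dyn Resource) -> Result<(), String> {
    if resource.supports_block_use_tracking() {
        Ok(())
    } else {
        Err("resource does not track cross-stream block uses".to_string())
    }
}
```

If a decorator forgot to forward, it would silently report false for a tracking backend, so the preflight would fail early rather than allow untracked cross-stream work.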

fn record_block_use( &self, block: &DeviceBlock, use_stream: StreamId, ) -> ResourceResult<()>

Record that work has been (or is being) submitted on use_stream that touches block’s bytes. Resources that participate in cross-stream lifetime tracking (notably the stream-ordered async backend) MUST attach a CUDA event from use_stream to the block; on deallocate(block), the block’s alloc_stream will wait on every recorded event before queueing the underlying free.
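
The ordering contract on deallocate can be pictured as a list of operations queued on the allocation stream. The sketch below is a model only: TrackedBlock, Op, and record_use are hypothetical, events are plain integers standing in for CUDA events already recorded on some use stream, and nothing is submitted to a GPU.

```rust
/// Hypothetical stand-ins for the crate's stream and event handles.
type StreamId = u32;
type EventId = u64;

/// Operations the model would queue on a stream, in order.
#[derive(Debug, PartialEq, Eq)]
pub enum Op {
    WaitEvent { stream: StreamId, event: EventId },
    FreeAsync { stream: StreamId, bytes: usize },
}

/// Toy block that remembers which events were attached to it.
pub struct TrackedBlock {
    pub bytes: usize,
    pub alloc_stream: StreamId,
    events: Vec<EventId>,
}

impl TrackedBlock {
    pub fn new(bytes: usize, alloc_stream: StreamId) -> Self {
        TrackedBlock { bytes, alloc_stream, events: Vec::new() }
    }

    /// Models `record_block_use`: attach an event that was recorded on the
    /// stream using the block.
    pub fn record_use(&mut self, event: EventId) {
        self.events.push(event);
    }

    /// Models `deallocate`: the allocation stream waits on every recorded
    /// event, and only then is the free queued on that same stream.
    pub fn deallocate(self) -> Vec<Op> {
        let mut ops: Vec<Op> = self
            .events
            .iter()
            .map(|&event| Op::WaitEvent { stream: self.alloc_stream, event })
            .collect();
        ops.push(Op::FreeAsync { stream: self.alloc_stream, bytes: self.bytes });
        ops
    }
}
```

Because the free is always the last operation in the list, work on other streams that recorded an event is ordered before the memory can be reused.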

fn prepare_block_use( &self, block: BlockId, use_stream: StreamId, access: Access, ) -> ResourceResult<()>

Pre-launch / pre-copy hook: queue any cross-stream waits required for use_stream to safely access block with access semantics. MUST be called BEFORE the GPU work is enqueued on use_stream.

fn finish_block_use( &self, block: BlockId, use_stream: StreamId, access: Access, ) -> ResourceResult<()>

Post-launch / post-copy hook: record an event on use_stream capturing the work just enqueued and update block’s dependency state.
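
One plausible shape for the dependency state behind the prepare/finish pair is a last-write event plus a list of outstanding-read events, which is also what pending_use_event_count appears to count. The sketch below assumes a conventional readers/writer hazard rule (reads wait on the last write; writes wait on the last write and on all outstanding reads). That rule, the BlockDeps type, and the integer event IDs are assumptions for illustration and are not confirmed by the documentation above.

```rust
/// Hypothetical stand-ins for the crate's stream and event handles.
type StreamId = u32;
type EventId = u64;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Access {
    Read,
    Write,
}

/// Toy per-block dependency state.
#[derive(Default, Debug)]
pub struct BlockDeps {
    last_write: Option<(StreamId, EventId)>,
    reads: Vec<(StreamId, EventId)>,
    next_event: EventId,
}

impl BlockDeps {
    /// Models `prepare_block_use`: returns the events `use_stream` would have
    /// to wait on before the access is enqueued. Events from the same stream
    /// are skipped because stream order already covers them.
    pub fn prepare(&self, use_stream: StreamId, access: Access) -> Vec<EventId> {
        let mut waits = Vec::new();
        if let Some((stream, event)) = self.last_write {
            if stream != use_stream {
                waits.push(event);
            }
        }
        if access == Access::Write {
            waits.extend(
                self.reads
                    .iter()
                    .filter(|(stream, _)| *stream != use_stream)
                    .map(|(_, event)| *event),
            );
        }
        waits
    }

    /// Models `finish_block_use`: record a fresh event for the work just
    /// enqueued and fold it into the dependency state.
    pub fn finish(&mut self, use_stream: StreamId, access: Access) -> EventId {
        let event = self.next_event;
        self.next_event += 1;
        match access {
            Access::Read => self.reads.push((use_stream, event)),
            Access::Write => {
                // The write already waited on the earlier reads in `prepare`,
                // so they no longer need to be tracked individually.
                self.reads.clear();
                self.last_write = Some((use_stream, event));
            }
        }
        event
    }

    /// Outstanding-read events plus 0 or 1 for the last write.
    pub fn pending_use_event_count(&self) -> usize {
        self.reads.len() + usize::from(self.last_write.is_some())
    }
}
```

The model only stays correct if prepare is called before the work is enqueued and finish after it, mirroring the ordering requirement stated for the two hooks.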

Auto Trait Implementations

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<T> Allocation for T
where T: RefUnwindSafe + Send + Sync,