pub struct DirectCudaResource { /* private fields */ }
Default (non-pooled) cudarc allocation adaptor. Keeps the
underlying CudaSlice<u8> allocations alive in an internal map so
that the runtime can return opaque DeviceBlocks to callers. On
deallocate the slice is dropped, which invokes whichever cudarc
free path matches the alloc path (cuMemFreeAsync on async-alloc
hosts, the synchronous fallback otherwise).
Concurrency: Send + Sync. The internal map is protected by a
Mutex. Allocate and deallocate are short-running map operations
plus the underlying CUDA call.
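The map-backed ownership described above can be sketched without a GPU. This is a minimal illustration under stated assumptions, not the crate's implementation: FakeSlice stands in for cudarc's CudaSlice<u8>, BlockHandle stands in for DeviceBlock, and MapBackedResource is a hypothetical name. It omits the stream and tag arguments of the real allocate.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Stand-in for cudarc's `CudaSlice<u8>` (assumption). Dropping it models
/// the device-side free.
struct FakeSlice {
    len: usize,
}

/// Hypothetical opaque handle, modelling what callers see as a `DeviceBlock`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct BlockHandle(u64);

#[derive(Default)]
struct Inner {
    next_id: u64,
    live: HashMap<u64, FakeSlice>,
}

/// Hypothetical map-backed resource. The `Mutex` makes it `Send + Sync`.
#[derive(Default)]
pub struct MapBackedResource {
    inner: Mutex<Inner>,
}

impl MapBackedResource {
    /// Store the allocation in the map and hand back only an opaque id.
    pub fn allocate(&self, bytes: usize) -> BlockHandle {
        let mut inner = self.inner.lock().unwrap();
        let id = inner.next_id;
        inner.next_id += 1;
        inner.live.insert(id, FakeSlice { len: bytes });
        BlockHandle(id)
    }

    /// Remove the entry; dropping the slice is what frees the memory.
    pub fn deallocate(&self, block: BlockHandle) -> Result<(), String> {
        let mut inner = self.inner.lock().unwrap();
        match inner.live.remove(&block.0) {
            Some(slice) => {
                drop(slice);
                Ok(())
            }
            None => Err(format!("unknown block {:?}", block)),
        }
    }

    /// Sum of the sizes of all allocations still held in the map.
    pub fn bytes_outstanding(&self) -> usize {
        let inner = self.inner.lock().unwrap();
        inner.live.values().map(|slice| slice.len).sum()
    }
}
```

Because callers only ever hold the id, the lifetime of the device memory is decided entirely by the map: a second deallocate of the same handle finds no entry and returns an error instead of freeing twice.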
Implementations
impl DirectCudaResource
pub fn new(device: Arc<CudaDevice>, device_ordinal: u32) -> Self
Construct a resource bound to device. device_ordinal is the
CUDA ordinal for logging / multi-device disambiguation.
pub fn device(&self) -> &Arc<CudaDevice>
Borrow the device handle. Tests and downstream resources use this to launch kernels against the same device this resource allocates on.
Trait Implementations
impl DeviceMemoryResource for DirectCudaResource
fn allocate( &self, bytes: usize, stream: StreamId, tag: AllocTag, ) -> ResourceResult<DeviceBlock>
Allocate bytes bytes on the resource’s device, ordered on
stream. The returned block is in BlockState::Live.
fn deallocate(&self, block: DeviceBlock) -> ResourceResult<()>
Return block to the resource. After this call the block’s
state is BlockState::Retired (or BlockState::Quarantined
for debug-guard resources). Reuse of the underlying memory is
resource-specific but must respect the stream-ordered contract.
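The state change on deallocate can be sketched as a small transition function. The variant names come from the text above; the helper state_after_deallocate, its debug_guard flag, and the choice to reject a block that is not Live are assumptions made for illustration, not the crate's API.

```rust
/// Lifecycle states named in the documentation above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BlockState {
    Live,
    Retired,
    Quarantined,
}

/// Hypothetical helper: the state a block moves to when it is deallocated.
/// `debug_guard` models a debug-guard resource, which quarantines blocks
/// instead of retiring them. Rejecting non-live blocks is an assumption.
pub fn state_after_deallocate(
    current: BlockState,
    debug_guard: bool,
) -> Result<BlockState, String> {
    match current {
        BlockState::Live if debug_guard => Ok(BlockState::Quarantined),
        BlockState::Live => Ok(BlockState::Retired),
        other => Err(format!("cannot deallocate a block in state {:?}", other)),
    }
}
```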
fn device_ordinal(&self) -> u32
fn bytes_outstanding(&self) -> usize
fn reap_pending(&self) -> ResourceResult<()>
Reaps pending cuMemFreeAsync calls and
re-counts bytes_outstanding accordingly.
fn record_block_use( &self, block: &DeviceBlock, use_stream: StreamId, ) -> ResourceResult<()>
Record a use on use_stream that touches block’s bytes. Resources that
participate in cross-stream lifetime tracking (notably the
stream-ordered async backend) MUST attach a CUDA event from
use_stream to the block; on deallocate(block), the
block’s alloc_stream will wait on every recorded event
before queueing the underlying free.
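The event contract above can be modelled on the host with a log of operations in place of real CUDA calls. Everything in this sketch is a stand-in: StreamId here is a local newtype, TrackedBlock and Recorder are hypothetical names, and the Op entries only model what an event record, a stream wait, and a stream-ordered free would do.

```rust
/// Local stand-in for the crate's `StreamId` (assumption).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct StreamId(pub u32);

/// Logged operations that model the CUDA calls a real backend would issue.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Op {
    RecordEvent { stream: StreamId, event: u64 },
    WaitEvent { stream: StreamId, event: u64 },
    Free { stream: StreamId },
}

/// Hypothetical block that remembers its allocation stream and use events.
pub struct TrackedBlock {
    alloc_stream: StreamId,
    events: Vec<u64>,
}

impl TrackedBlock {
    pub fn new(alloc_stream: StreamId) -> Self {
        TrackedBlock { alloc_stream, events: Vec::new() }
    }
}

/// Hypothetical recorder that hands out event ids and logs every operation.
#[derive(Default)]
pub struct Recorder {
    next_event: u64,
    pub log: Vec<Op>,
}

impl Recorder {
    /// Attach an event from `use_stream` to the block.
    pub fn record_block_use(&mut self, block: &mut TrackedBlock, use_stream: StreamId) {
        let event = self.next_event;
        self.next_event += 1;
        self.log.push(Op::RecordEvent { stream: use_stream, event });
        block.events.push(event);
    }

    /// Make the allocation stream wait on every recorded event, then free.
    pub fn deallocate(&mut self, block: TrackedBlock) {
        for &event in &block.events {
            self.log.push(Op::WaitEvent { stream: block.alloc_stream, event });
        }
        self.log.push(Op::Free { stream: block.alloc_stream });
    }
}
```

The ordering in the log is the point of the contract: the free is queued only after the allocation stream has been told to wait on every stream that touched the block.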
fn supports_block_use_tracking(&self) -> bool
Whether this resource supports record_block_use. Used by the launch recorder’s
preflight to fail BEFORE queueing CUDA work, rather than
after. The default returns false to match the trait’s
default record_block_use behavior; resources that
override record_block_use to track events MUST override
this to return true. Decorators forward to inner.
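The pairing of the default, the override, and decorator forwarding can be sketched with a cut-down trait. MemoryResource, PlainResource, TrackingResource, Decorator, and preflight are all hypothetical names, the integer arguments replace the real block and stream types, and the assumption that the default record_block_use returns an error is not confirmed by the text above.

```rust
use std::sync::Mutex;

/// Cut-down, hypothetical version of the resource trait.
pub trait MemoryResource {
    /// Assumed default: tracking is unsupported, so recording fails.
    fn record_block_use(&self, _block: u64, _use_stream: u32) -> Result<(), String> {
        Err("block-use tracking is not supported".to_string())
    }

    /// Default matches the default `record_block_use` above.
    fn supports_block_use_tracking(&self) -> bool {
        false
    }
}

/// A resource that keeps both defaults.
pub struct PlainResource;
impl MemoryResource for PlainResource {}

/// A resource that tracks uses, so it overrides both methods together.
#[derive(Default)]
pub struct TrackingResource {
    uses: Mutex<Vec<(u64, u32)>>,
}

impl MemoryResource for TrackingResource {
    fn record_block_use(&self, block: u64, use_stream: u32) -> Result<(), String> {
        self.uses.lock().unwrap().push((block, use_stream));
        Ok(())
    }

    fn supports_block_use_tracking(&self) -> bool {
        true
    }
}

/// A decorator forwards both calls to the resource it wraps.
pub struct Decorator<R>(pub R);

impl<R: MemoryResource> MemoryResource for Decorator<R> {
    fn record_block_use(&self, block: u64, use_stream: u32) -> Result<(), String> {
        self.0.record_block_use(block, use_stream)
    }

    fn supports_block_use_tracking(&self) -> bool {
        self.0.supports_block_use_tracking()
    }
}

/// Hypothetical preflight: reject a cross-stream launch before any work is
/// queued if the resource cannot track block use.
pub fn preflight(resource: &dyn MemoryResource, cross_stream: bool) -> Result<(), String> {
    if cross_stream && !resource.supports_block_use_tracking() {
        return Err("resource cannot track cross-stream block use".to_string());
    }
    Ok(())
}
```

A decorator that kept the default instead of forwarding would hide a tracking-capable inner resource from the preflight check, which is why forwarding is required.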
fn prepare_block_use( &self, block: BlockId, use_stream: StreamId, access: Access, ) -> ResourceResult<()>
Prepare use_stream to safely access block with
access semantics. MUST be called BEFORE the GPU work is
enqueued on use_stream.
fn finish_block_use( &self, block: BlockId, use_stream: StreamId, access: Access, ) -> ResourceResult<()>
Record an event on use_stream capturing the work just enqueued and update
block’s dependency state.
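One way to picture the prepare/finish pairing is a host-side model of read and write dependencies. This is a sketch of a common hazard-tracking scheme, not the crate's implementation: the Access variants Read and Write, the Tracker type, the integer block and stream ids, and the rule that readers wait only on the last writer while writers wait on everything are all assumptions.

```rust
use std::collections::HashMap;

/// Assumed access kinds; the real `Access` type may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Access {
    Read,
    Write,
}

/// Logged operations that model stream waits and event records.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Op {
    Wait { stream: u32, event: u64 },
    Record { stream: u32, event: u64 },
}

/// Per-block dependency state: the last write and the reads since then.
#[derive(Default)]
struct Deps {
    last_write: Option<u64>,
    reads: Vec<u64>,
}

/// Hypothetical tracker keyed by block id.
#[derive(Default)]
pub struct Tracker {
    next_event: u64,
    deps: HashMap<u64, Deps>,
    pub log: Vec<Op>,
}

impl Tracker {
    /// Called BEFORE work is enqueued: wait on conflicting earlier uses.
    pub fn prepare_block_use(&mut self, block: u64, use_stream: u32, access: Access) {
        let deps = self.deps.entry(block).or_default();
        // Every access must see the last write.
        let mut waits: Vec<u64> = deps.last_write.into_iter().collect();
        // A write must also wait for reads issued since that write.
        if access == Access::Write {
            waits.extend(deps.reads.iter().copied());
        }
        for event in waits {
            self.log.push(Op::Wait { stream: use_stream, event });
        }
    }

    /// Called AFTER work is enqueued: record an event and update the state.
    pub fn finish_block_use(&mut self, block: u64, use_stream: u32, access: Access) {
        let event = self.next_event;
        self.next_event += 1;
        self.log.push(Op::Record { stream: use_stream, event });
        let deps = self.deps.entry(block).or_default();
        match access {
            Access::Read => deps.reads.push(event),
            Access::Write => {
                deps.last_write = Some(event);
                deps.reads.clear();
            }
        }
    }
}
```

Calling prepare before the launch and finish after it brackets the enqueued work, so the recorded event covers exactly the work that the next conflicting access has to wait for.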