Struct XlogDeviceRuntime 

pub struct XlogDeviceRuntime { /* private fields */ }

Per-CUDA-ordinal device-runtime singleton.

Owns the device handle, the stream pool, and the resource stack. Calls to allocate and deallocate forward to the resource. The resource is fixed at construction and is currently always DirectCudaResource; a future commit will make AsyncCudaResource the default while keeping the direct backend reachable for sanitizer mode.

Implementations

impl XlogDeviceRuntime

pub fn with_resource(device: Arc<CudaDevice>, device_ordinal: u32, stream_pool: Arc<StreamPool>, resource: Box<dyn DeviceMemoryResource + Send + Sync>) -> Self

Compose an owned runtime around a caller-supplied resource stack. The result is not a singleton: the returned value is not stored in RUNTIMES and does not interact with try_get.

Intended uses:

  • Tests that need to drive a specific backend (e.g., AsyncCudaResource) through the same facade production code uses, instead of constructing the resource directly.
  • Future decorator stacks (LoggingResource, GlobalDeviceBudget, DebugGuardResource) that wrap the base resource before installation.

The device and stream_pool arguments must be consistent with device_ordinal: the pool must be bound to the same device handle, and the device must be the one the resource allocates against. The constructor does not verify this. Callers that compose mismatched parts get undefined runtime-level behavior, although the per-resource device-ordinal check on deallocate still surfaces obvious mistakes as ResourceError::Driver.

The singleton path remains Self::try_get, which today always installs DirectCudaResource, the cudarc default (non-pooled) backend. Swapping the singleton’s default resource is a separate, later change that is gated on GlobalDeviceBudget and LoggingResource landing.
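The composition described above can be sketched with mock types. Everything in this block is a hypothetical stand-in: `MemoryResource` is a much-reduced imitation of `DeviceMemoryResource`, `CountingResource` plays the role of a decorator such as `LoggingResource`, and `MockRuntime` takes no device handle or stream pool. It only illustrates how a boxed resource stack might be wrapped and then handed to an owned, non-singleton facade.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical, simplified stand-in for `DeviceMemoryResource`.
trait MemoryResource {
    fn allocate(&self, bytes: usize) -> Result<u64, String>;
    fn deallocate(&self, id: u64, bytes: usize) -> Result<(), String>;
    fn bytes_outstanding(&self) -> usize;
}

/// Mock base backend: hands out ids and counts bytes; touches no GPU memory.
#[derive(Default)]
struct MockBaseResource {
    next_id: AtomicUsize,
    outstanding: AtomicUsize,
}

impl MemoryResource for MockBaseResource {
    fn allocate(&self, bytes: usize) -> Result<u64, String> {
        self.outstanding.fetch_add(bytes, Ordering::SeqCst);
        Ok(self.next_id.fetch_add(1, Ordering::SeqCst) as u64)
    }
    fn deallocate(&self, _id: u64, bytes: usize) -> Result<(), String> {
        self.outstanding.fetch_sub(bytes, Ordering::SeqCst);
        Ok(())
    }
    fn bytes_outstanding(&self) -> usize {
        self.outstanding.load(Ordering::SeqCst)
    }
}

/// Mock decorator: counts allocate/deallocate calls, then forwards to the wrapped resource.
struct CountingResource {
    inner: Box<dyn MemoryResource + Send + Sync>,
    calls: Arc<AtomicUsize>,
}

impl MemoryResource for CountingResource {
    fn allocate(&self, bytes: usize) -> Result<u64, String> {
        self.calls.fetch_add(1, Ordering::SeqCst);
        self.inner.allocate(bytes)
    }
    fn deallocate(&self, id: u64, bytes: usize) -> Result<(), String> {
        self.calls.fetch_add(1, Ordering::SeqCst);
        self.inner.deallocate(id, bytes)
    }
    fn bytes_outstanding(&self) -> usize {
        self.inner.bytes_outstanding()
    }
}

/// Owned (non-singleton) facade that forwards every call to its resource.
struct MockRuntime {
    device_ordinal: u32,
    resource: Box<dyn MemoryResource + Send + Sync>,
}

impl MockRuntime {
    fn with_resource(device_ordinal: u32, resource: Box<dyn MemoryResource + Send + Sync>) -> Self {
        Self { device_ordinal, resource }
    }
    fn device_ordinal(&self) -> u32 {
        self.device_ordinal
    }
    fn allocate(&self, bytes: usize) -> Result<u64, String> {
        self.resource.allocate(bytes)
    }
    fn deallocate(&self, id: u64, bytes: usize) -> Result<(), String> {
        self.resource.deallocate(id, bytes)
    }
    fn bytes_outstanding(&self) -> usize {
        self.resource.bytes_outstanding()
    }
}

/// Wrap the base resource in the decorator, then install the stack in a runtime.
fn build_decorated_runtime() -> (MockRuntime, Arc<AtomicUsize>) {
    let calls = Arc::new(AtomicUsize::new(0));
    let stack = CountingResource {
        inner: Box::new(MockBaseResource::default()),
        calls: Arc::clone(&calls),
    };
    (MockRuntime::with_resource(0, Box::new(stack)), calls)
}
```

Because the facade only sees a boxed trait object, a test can drive any backend or decorator stack through the same calls that production code would use.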

pub fn try_get(ordinal: u32) -> Result<&'static XlogDeviceRuntime>

Get the singleton for ordinal, initializing it on first access. Subsequent calls return the same &'static.

Errors:

  • XlogError::Kernel if ordinal >= MAX_DEVICE_ORDINALS.
  • XlogError::Kernel if the CUDA device cannot be opened.

Concurrency: at most one thread builds the runtime for a given ordinal. Other concurrent first callers block on the per-ordinal init mutex until the winner publishes via OnceLock::set; they then observe the published runtime through the double-check inside the mutex. Subsequent calls take the lock-free fast path.
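The initialization protocol described above can be sketched with the standard library alone. This is an illustration under assumptions, not the crate's code: `MAX_ORDINALS`, the `Runtime` struct, the `INIT_LOCKS` array, the `BUILDS` counter, and the `String` error type are all hypothetical, and no CUDA device is opened.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Mutex, OnceLock};

/// Hypothetical bound standing in for MAX_DEVICE_ORDINALS.
const MAX_ORDINALS: usize = 4;

/// Hypothetical stand-in for the per-device runtime.
struct Runtime {
    ordinal: u32,
}

// Const items let the array-repeat syntax work for non-Copy element types.
const EMPTY_SLOT: OnceLock<Runtime> = OnceLock::new();
const UNLOCKED: Mutex<()> = Mutex::new(());

/// One publication slot and one init mutex per ordinal.
static RUNTIMES: [OnceLock<Runtime>; MAX_ORDINALS] = [EMPTY_SLOT; MAX_ORDINALS];
static INIT_LOCKS: [Mutex<()>; MAX_ORDINALS] = [UNLOCKED; MAX_ORDINALS];

/// Counts how many times a runtime was actually built (for demonstration only).
static BUILDS: AtomicUsize = AtomicUsize::new(0);

fn try_get(ordinal: u32) -> Result<&'static Runtime, String> {
    let idx = ordinal as usize;
    if idx >= MAX_ORDINALS {
        return Err(format!("ordinal {ordinal} out of range"));
    }
    // Lock-free fast path: the runtime is already published.
    if let Some(rt) = RUNTIMES[idx].get() {
        return Ok(rt);
    }
    // Slow path: serialize first-time initialization for this ordinal.
    let _guard = INIT_LOCKS[idx]
        .lock()
        .map_err(|_| "init mutex poisoned".to_string())?;
    // Double-check inside the mutex: another thread may have won the race.
    if let Some(rt) = RUNTIMES[idx].get() {
        return Ok(rt);
    }
    BUILDS.fetch_add(1, Ordering::SeqCst);
    // We hold the mutex and the slot was empty, so this `set` cannot lose.
    let _ = RUNTIMES[idx].set(Runtime { ordinal });
    Ok(RUNTIMES[idx].get().expect("runtime was just published"))
}
```

Building under the mutex rather than inside `OnceLock::get_or_init` keeps a fallible constructor possible: if opening the device fails, the slot stays empty and a later call can retry.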

pub fn device_ordinal(&self) -> u32

CUDA ordinal this runtime serves.

pub fn device(&self) -> &Arc<CudaDevice>

Borrow the device handle.

pub fn stream_pool(&self) -> &Arc<StreamPool>

Borrow the stream pool.

pub fn allocate(&self, bytes: usize, stream: StreamId, tag: AllocTag) -> ResourceResult<DeviceBlock>

Allocate via the underlying resource. Stream-ordered: the returned DeviceBlock is bound to stream.

pub fn deallocate(&self, block: DeviceBlock) -> ResourceResult<()>

Deallocate via the underlying resource.

pub fn bytes_outstanding(&self) -> usize

Sum of bytes currently outstanding on this device, as reported by the underlying resource. Used by the global-budget adaptor (later commit) and the parallel-stress acceptance test.

pub fn reap_pending(&self) -> ResourceResult<()>

Drain pending async frees on the underlying resource. No-op for synchronous backends. Callers that need an accurate bytes_outstanding reading after a burst of asynchronous deallocations should call this first.
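The interaction between asynchronous frees and bytes_outstanding can be illustrated with a small mock. The types below are hypothetical and the model is deliberately simplified: it assumes every pending free has already completed by the time `reap_pending` runs, whereas a real stream-ordered backend would presumably reap only frees whose device work has finished.

```rust
use std::sync::Mutex;

/// Bookkeeping for a hypothetical asynchronous backend.
#[derive(Default)]
struct State {
    outstanding: usize,
    pending_frees: Vec<usize>,
}

/// Mock resource whose frees are deferred until `reap_pending` is called.
#[derive(Default)]
struct MockAsyncResource {
    state: Mutex<State>,
}

impl MockAsyncResource {
    fn allocate(&self, bytes: usize) {
        self.state.lock().unwrap().outstanding += bytes;
    }

    /// Queue the free; the bytes still count as outstanding until reaped.
    fn deallocate(&self, bytes: usize) {
        self.state.lock().unwrap().pending_frees.push(bytes);
    }

    /// Drain the queue and subtract the freed bytes from the counter.
    fn reap_pending(&self) {
        let mut s = self.state.lock().unwrap();
        let freed: usize = s.pending_frees.drain(..).sum();
        s.outstanding -= freed;
    }

    fn bytes_outstanding(&self) -> usize {
        self.state.lock().unwrap().outstanding
    }
}
```

In this model a reading taken right after `deallocate` still includes the queued bytes, which is why a caller that needs an accurate figure would reap first.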

pub fn record_block_use(&self, block: &DeviceBlock, use_stream: StreamId) -> ResourceResult<()>

Record that work has been (or is being) submitted on use_stream that touches block. Forwards to the underlying resource stack (GlobalDeviceBudget → LoggingResource → AsyncCudaResource), where the stream-ordered backend attaches a CUDA event so that block.alloc_stream waits on it before the queued cuMemFreeAsync runs. This is the production-reachable hook that the future xlog launch builder will call for read / write / read_write buffer args; until that lands, callers that submit raw CUDA work on a stream other than block.alloc_stream should call this directly. See DeviceMemoryResource::record_block_use for the underlying contract.

pub fn supports_block_use_tracking(&self) -> bool

Whether the active resource stack tracks cross-stream uses (i.e., supports record_block_use). The launch recorder’s preflight checks this BEFORE queuing CUDA work, so a misconfigured runtime fails loudly at the boundary rather than after the launch is in flight.
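A preflight of this kind might look like the following sketch. The `Resource` trait, the two mock backends, and `launch_with_preflight` are hypothetical names introduced for illustration; the queue is a plain `Vec` standing in for work submitted to a CUDA stream.

```rust
/// Hypothetical, minimal view of a resource stack.
trait Resource {
    fn supports_block_use_tracking(&self) -> bool;
}

/// Mock backend without cross-stream use tracking.
struct NoTracking;
/// Mock backend with cross-stream use tracking.
struct WithTracking;

impl Resource for NoTracking {
    fn supports_block_use_tracking(&self) -> bool {
        false
    }
}

impl Resource for WithTracking {
    fn supports_block_use_tracking(&self) -> bool {
        true
    }
}

/// Check the capability first; only then enqueue the (mock) work.
fn launch_with_preflight(
    resource: &dyn Resource,
    queue: &mut Vec<&'static str>,
    kernel: &'static str,
) -> Result<(), String> {
    if !resource.supports_block_use_tracking() {
        // Fail at the boundary: nothing has been queued yet.
        return Err(format!("cannot launch {kernel}: block-use tracking unsupported"));
    }
    queue.push(kernel);
    Ok(())
}
```

Checking before the push means a rejected launch leaves the queue untouched, so there is no in-flight work to unwind.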

pub fn prepare_block_use(&self, block: BlockId, use_stream: StreamId, access: Access) -> ResourceResult<()>

Pre-launch hook: queue cross-stream waits required for use_stream to safely access block with access semantics. MUST be called BEFORE the GPU work is enqueued on use_stream. Forwards to the resource stack; see DeviceMemoryResource::prepare_block_use for the underlying contract.

pub fn finish_block_use(&self, block: BlockId, use_stream: StreamId, access: Access) -> ResourceResult<()>

Post-launch hook: record an event on use_stream capturing the work just enqueued and update block’s dependency state. MUST be called AFTER the launch / copy is queued. Forwards to the resource stack; see DeviceMemoryResource::finish_block_use for the underlying contract.
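The ordering contract of the two hooks (prepare, then enqueue, then finish) can be sketched with a mock tracker that logs each step. All types here are hypothetical stand-ins that reuse the names `BlockId`, `StreamId`, and `Access` for readability. The mock remembers only the last stream that used a block and ignores `access`; a real tracker would presumably treat reads and writes differently.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct BlockId(u32);

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct StreamId(u32);

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Access {
    Read,
    Write,
}

/// What the mock "GPU" observed, in order.
#[derive(Debug, PartialEq, Eq)]
enum Step {
    Wait { on: StreamId, by: StreamId },
    Launch(StreamId),
    Record(StreamId),
}

#[derive(Default)]
struct MockTracker {
    last_use: RefCell<HashMap<BlockId, StreamId>>,
    log: RefCell<Vec<Step>>,
}

impl MockTracker {
    /// Pre-launch: queue a wait if the block was last used on another stream.
    fn prepare_block_use(&self, block: BlockId, use_stream: StreamId, _access: Access) -> Result<(), String> {
        if let Some(&prev) = self.last_use.borrow().get(&block) {
            if prev != use_stream {
                self.log.borrow_mut().push(Step::Wait { on: prev, by: use_stream });
            }
        }
        Ok(())
    }

    /// Post-launch: record an event and update the block's dependency state.
    fn finish_block_use(&self, block: BlockId, use_stream: StreamId, _access: Access) -> Result<(), String> {
        self.log.borrow_mut().push(Step::Record(use_stream));
        self.last_use.borrow_mut().insert(block, use_stream);
        Ok(())
    }
}

/// Bracket a (mock) launch with the two hooks in the required order.
fn tracked_launch(tracker: &MockTracker, block: BlockId, stream: StreamId, access: Access) -> Result<(), String> {
    tracker.prepare_block_use(block, stream, access)?;
    tracker.log.borrow_mut().push(Step::Launch(stream));
    tracker.finish_block_use(block, stream, access)
}
```

Keeping the bracket in one helper makes it harder to enqueue work before the waits are queued or to forget the event afterward.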

pub fn prepare_first_use<T: DeviceRepr>(&self, slice: &TrackedCudaSlice<T>, use_stream: StreamId, access: Access) -> ResourceResult<()>

Convenience for helper-internal scratch allocations that will be immediately written / read on use_stream.

Looks up the BlockId from the slice’s runtime block and calls Self::prepare_block_use with access. Use this directly after GpuMemoryManager::alloc when the buffer’s first cross-stream consumer is the same operator (e.g., a hash-table bucket array memset on launch_stream against a buffer freshly allocated on the manager’s default stream).

Returns Err(ResourceError::StreamMisuse) if slice is not runtime-backed — strict callers should ensure their memory manager carries a runtime.
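The lookup-then-forward shape, including the error for a slice that is not runtime-backed, might be sketched as follows. `MockTrackedSlice`, `MockResourceError`, and the free functions are hypothetical; the real method takes a `TrackedCudaSlice<T>` plus stream and access arguments, which are omitted here.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct BlockId(u32);

/// Hypothetical stand-in for the relevant `ResourceError` variant.
#[derive(Debug, PartialEq, Eq)]
enum MockResourceError {
    StreamMisuse,
}

/// Hypothetical slice: `runtime_block` is `None` when no runtime backs it.
struct MockTrackedSlice {
    runtime_block: Option<BlockId>,
}

/// Stand-in for `prepare_block_use`: just remembers which block was prepared.
fn prepare_block_use(prepared: &mut Vec<BlockId>, block: BlockId) -> Result<(), MockResourceError> {
    prepared.push(block);
    Ok(())
}

/// Resolve the slice to its block id, then forward to the block-level hook.
fn prepare_first_use(prepared: &mut Vec<BlockId>, slice: &MockTrackedSlice) -> Result<(), MockResourceError> {
    let block = slice
        .runtime_block
        .ok_or(MockResourceError::StreamMisuse)?;
    prepare_block_use(prepared, block)
}
```

Returning an error instead of silently skipping the hook means an untracked buffer is reported at the call site rather than surfacing later as a cross-stream race.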

pub fn finish_first_use<T: DeviceRepr>(&self, slice: &TrackedCudaSlice<T>, use_stream: StreamId, access: Access) -> ResourceResult<()>

Convenience for helper-internal scratch finish: looks up the BlockId from the slice and forwards to Self::finish_block_use.

Auto Trait Implementations

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<T> Allocation for T
where T: RefUnwindSafe + Send + Sync,