Struct GpuMemoryManager

pub struct GpuMemoryManager { /* private fields */ }

GPU memory manager with budget enforcement

Tracks allocated GPU memory and enforces a memory budget. An allocation that would exceed the budget fails with XlogError::ResourceExhausted.

§v0.6 device-runtime routing (opt-in)

Constructing via GpuMemoryManager::with_runtime attaches an XlogDeviceRuntime that mediates allocations through the v0.6 resource stack (e.g., GlobalDeviceBudget → LoggingResource → AsyncCudaResource). When attached:

GpuMemoryManager::alloc::<T> routes the underlying allocation through the runtime and produces a typed view via cudarc’s upgrade_device_ptr::<T>. The returned TrackedCudaSlice frees through the runtime on drop.
GpuMemoryManager::alloc_raw is the explicit raw-bytes entry point (no typed view), also runtime-routed.

Both budgets apply: the manager’s local MemoryBudget AND any GlobalDeviceBudget stacked above the runtime’s underlying resource.

When the manager is constructed via GpuMemoryManager::new (no runtime attached), alloc::<T> and the rest of the public API behave bit-for-bit identically to their pre-migration behavior: cudarc’s device.alloc::<T>(len) allocates, and cudarc frees on drop. In that configuration alloc_raw returns XlogError::Kernel rather than silently falling back. CudaKernelProvider::new continues to construct the manager via new for now; construction sites that need runtime-routed allocations opt in through with_runtime.
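The opt-in routing described above can be pictured with a small self-contained model. This is only a sketch: Manager, Runtime, AllocPath, and SketchError are hypothetical stand-ins invented for illustration, not the crate's types, and no CUDA call is made. It shows the one asymmetry the text states: typed allocation keeps a legacy path when no runtime is attached, while the raw entry point refuses to run.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the device runtime (not the crate's type).
pub struct Runtime;

/// Which path an allocation request would take in this model.
#[derive(Debug, PartialEq)]
pub enum AllocPath {
    Legacy,
    RuntimeRouted,
}

/// Hypothetical error type; only the variant needed here is modeled.
#[derive(Debug, PartialEq)]
pub enum SketchError {
    Kernel(String),
}

/// Hypothetical manager holding an optional runtime.
pub struct Manager {
    runtime: Option<Arc<Runtime>>,
}

impl Manager {
    /// Models construction without a runtime.
    pub fn new() -> Self {
        Self { runtime: None }
    }

    /// Models the opt-in constructor that attaches a runtime.
    pub fn with_runtime(runtime: Arc<Runtime>) -> Self {
        Self { runtime: Some(runtime) }
    }

    /// Borrow the attached runtime, if any.
    pub fn runtime(&self) -> Option<&Arc<Runtime>> {
        self.runtime.as_ref()
    }

    /// Typed allocation: routed when a runtime exists, legacy otherwise.
    pub fn alloc_path(&self) -> AllocPath {
        match self.runtime {
            Some(_) => AllocPath::RuntimeRouted,
            None => AllocPath::Legacy,
        }
    }

    /// Raw allocation: an error when no runtime exists (no silent fallback).
    pub fn alloc_raw_path(&self) -> Result<AllocPath, SketchError> {
        match self.runtime {
            Some(_) => Ok(AllocPath::RuntimeRouted),
            None => Err(SketchError::Kernel("no device runtime attached".to_string())),
        }
    }
}
```

In this model the choice is fixed at construction time, which matches the description of with_runtime as a construction-site opt-in.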

Implementations§

impl GpuMemoryManager

pub fn new(device: Arc<CudaDevice>, budget: MemoryBudget) -> Self

Create a new GPU memory manager

§Arguments

device - The CUDA device to allocate memory on
budget - Memory budget configuration

pub fn with_runtime( device: Arc<CudaDevice>, budget: MemoryBudget, runtime: Arc<XlogDeviceRuntime>, ) -> Self

Like GpuMemoryManager::new, but additionally attaches a v0.6 XlogDeviceRuntime. The runtime mediates both alloc::<T> and alloc_raw through the v0.6 resource stack. Typed alloc::<T> returns a TrackedCudaSlice<T> whose underlying memory is owned by the runtime: the typed view is created via cudarc’s upgrade_device_ptr::<T>, and the memory is freed through the runtime on drop. The legacy cudarc path is used only when the manager is built via GpuMemoryManager::new (no runtime attached). Provider construction does not yet require the runtime; callers that want runtime-routed allocations opt in here.

pub fn runtime(&self) -> Option<&Arc<XlogDeviceRuntime>>

Borrow the attached device runtime, if any. Returns None when the manager was constructed via GpuMemoryManager::new. This is a test/diagnostic accessor; production call sites that need the runtime own it directly.

pub fn alloc<T: DeviceRepr>( self: &Arc<Self>, len: usize, ) -> Result<TrackedCudaSlice<T>>

Allocate GPU memory for len elements of type T

§Arguments

len - Number of elements to allocate

§Returns

A TrackedCudaSlice<T> that owns the allocated memory

§Errors

XlogError::ResourceExhausted if allocation would exceed budget
XlogError::Kernel if CUDA allocation fails

§v0.6 routing

When the manager has an attached XlogDeviceRuntime (constructed via GpuMemoryManager::with_runtime), the underlying allocation is routed through the runtime’s resource stack and a typed view is created via cudarc’s upgrade_device_ptr::<T> over the runtime’s raw pointer. The returned TrackedCudaSlice frees through the runtime on drop. Without a runtime attached, the legacy cudarc alloc::<T> path is used and drop frees through cudarc, which is bit-for-bit identical to pre-migration behavior.

pub fn check_budget(&self, bytes: u64) -> Result<()>

Check if an allocation of bytes would exceed the budget

§Arguments

bytes - Number of bytes to allocate

§Returns

Ok(()) if allocation is within budget

§Errors

XlogError::ResourceExhausted if allocation would exceed budget
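A budget check of this kind can be sketched with a single atomic counter. The code below is a hypothetical model, not the manager's implementation: BudgetTracker, SketchError, and try_reserve are names invented for illustration, and the source does not say whether the real manager checks and reserves in one atomic step. The sketch separates a pure check (nothing is reserved) from a compare-exchange reservation that cannot overshoot the limit under concurrency.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical stand-in for the crate's error type; only the variant
/// named in the documentation is modeled.
#[derive(Debug, PartialEq)]
pub enum SketchError {
    ResourceExhausted { requested: u64, remaining: u64 },
}

/// Hypothetical byte budget with a live-allocation counter.
pub struct BudgetTracker {
    limit: u64,
    allocated: AtomicU64,
}

impl BudgetTracker {
    pub fn new(limit: u64) -> Self {
        Self { limit, allocated: AtomicU64::new(0) }
    }

    /// Remaining budget in bytes, saturating at zero.
    pub fn remaining_bytes(&self) -> u64 {
        self.limit.saturating_sub(self.allocated.load(Ordering::Acquire))
    }

    /// Pure check: reports whether `bytes` would fit, reserves nothing.
    pub fn check_budget(&self, bytes: u64) -> Result<(), SketchError> {
        let remaining = self.remaining_bytes();
        if bytes > remaining {
            return Err(SketchError::ResourceExhausted { requested: bytes, remaining });
        }
        Ok(())
    }

    /// Check and reserve in one atomic step, so two threads cannot both
    /// pass the check and jointly exceed the limit.
    pub fn try_reserve(&self, bytes: u64) -> Result<(), SketchError> {
        let mut current = self.allocated.load(Ordering::Acquire);
        loop {
            let remaining = self.limit.saturating_sub(current);
            if bytes > remaining {
                return Err(SketchError::ResourceExhausted { requested: bytes, remaining });
            }
            // `bytes <= limit - current`, so the addition cannot overflow.
            match self.allocated.compare_exchange_weak(
                current,
                current + bytes,
                Ordering::AcqRel,
                Ordering::Acquire,
            ) {
                Ok(_) => return Ok(()),
                Err(observed) => current = observed, // lost a race; retry
            }
        }
    }
}
```

A check that succeeds is only advisory in this model: another thread may reserve memory between the check and a later allocation, which is why the reservation path re-validates.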

pub fn allocated_bytes(&self) -> u64

Get the current allocated memory in bytes

pub fn peak_bytes(&self) -> u64

High-water mark of allocated bytes since construction or the last reset_peak. Always ≥ allocated_bytes at the moment it was recorded. Measurement-harness API (S3 peak-memory gate).

pub fn reset_peak(&self)

Reset the peak high-water mark to the current allocated level, so a measurement window starts from live state rather than zero. Measurement-harness API.
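The high-water-mark semantics of peak_bytes and reset_peak can be illustrated with two atomics. This is a sketch under assumptions: PeakTracker and record_alloc are hypothetical names, and the manager's real bookkeeping may differ. The point it demonstrates is that resetting the peak copies the current live level instead of zeroing it.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical tracker for live bytes and their high-water mark.
pub struct PeakTracker {
    allocated: AtomicU64,
    peak: AtomicU64,
}

impl PeakTracker {
    pub fn new() -> Self {
        Self { allocated: AtomicU64::new(0), peak: AtomicU64::new(0) }
    }

    /// Add `bytes` to the live total and raise the peak if it was exceeded.
    pub fn record_alloc(&self, bytes: u64) {
        let now = self.allocated.fetch_add(bytes, Ordering::AcqRel) + bytes;
        self.peak.fetch_max(now, Ordering::AcqRel);
    }

    /// Subtract `bytes` from the live total, saturating at zero.
    pub fn record_free(&self, bytes: u64) {
        let _ = self.allocated.fetch_update(Ordering::AcqRel, Ordering::Acquire, |cur| {
            Some(cur.saturating_sub(bytes))
        });
    }

    pub fn allocated_bytes(&self) -> u64 {
        self.allocated.load(Ordering::Acquire)
    }

    pub fn peak_bytes(&self) -> u64 {
        self.peak.load(Ordering::Acquire)
    }

    /// Start a new measurement window from the current live level.
    pub fn reset_peak(&self) {
        self.peak.store(self.allocated.load(Ordering::Acquire), Ordering::Release);
    }
}
```

Resetting to the live level rather than to zero keeps the peak meaningful when arenas allocated before the measurement window are still resident.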

pub fn alloc_count(&self) -> u64

Number of alloc calls issued so far (device allocation requests). The GPU-resident MC engine snapshots this around the measured region to prove per_operator_host_allocations == 0 (all arenas pre-allocated).

pub fn reset_alloc_count(&self)

Reset the allocation-request counter to zero.
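Snapshotting the counter around a measured region, as described for alloc_count, might look like the following. AllocCounter, note_alloc, and allocs_during are hypothetical names used only for this sketch; the real engine's harness is not shown in this documentation.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical allocation-request counter.
pub struct AllocCounter {
    count: AtomicU64,
}

impl AllocCounter {
    pub fn new() -> Self {
        Self { count: AtomicU64::new(0) }
    }

    /// Called once per allocation request in this model.
    pub fn note_alloc(&self) {
        self.count.fetch_add(1, Ordering::Relaxed);
    }

    pub fn alloc_count(&self) -> u64 {
        self.count.load(Ordering::Relaxed)
    }

    pub fn reset_alloc_count(&self) {
        self.count.store(0, Ordering::Relaxed);
    }
}

/// Run `region` and return its result with the number of allocation
/// requests issued while it ran. The subtraction saturates so that a
/// reset inside the region yields 0 instead of wrapping.
pub fn allocs_during<R>(counter: &AllocCounter, region: impl FnOnce() -> R) -> (R, u64) {
    let before = counter.alloc_count();
    let out = region();
    let after = counter.alloc_count();
    (out, after.saturating_sub(before))
}
```

Taking a before/after difference avoids having to reset the counter, so earlier warm-up allocations do not disturb the measurement.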

pub fn budget(&self) -> &MemoryBudget

Get the memory budget

pub fn device(&self) -> &Arc<CudaDevice>

Get the underlying CUDA device

pub fn record_free(&self, bytes: u64)

Record that memory has been freed

Note: cudarc frees the device memory automatically when a CudaSlice is dropped, but dropping a plain CudaSlice does not update this manager’s counters. Call this method to bring the tracking back in line when memory is freed.

pub fn alloc_raw( self: &Arc<Self>, bytes: usize, tag: AllocTag, ) -> Result<RuntimeAllocBlock>

v0.6 device-runtime entry point: allocate a raw, untyped block of the requested size (the bytes argument) through the attached XlogDeviceRuntime.

Returns a RuntimeAllocBlock that owns the allocation. On drop, the block deallocates through the runtime and updates both the manager’s local allocated counter and the runtime’s bookkeeping.

Both budgets apply: the manager’s local MemoryBudget::device_bytes AND any GlobalDeviceBudget stacked above the runtime’s underlying resource. Either rejecting the request returns an XlogError. On runtime rejection the local reservation is rolled back so subsequent allocations see consistent state.

§Errors

XlogError::Kernel if no runtime is attached.
XlogError::ResourceExhausted if the local budget cannot accommodate the request.
XlogError::Kernel (with the resource error rendered) if the runtime rejects the request — including the runtime’s own OutOfBudget, which is mapped here so callers see a single error surface.
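The reserve-then-roll-back behavior described for alloc_raw can be modeled with two independent budgets. This is a minimal sketch under assumptions: Budget, SketchError, and alloc_raw_sketch are hypothetical, the "global" budget stands in for whatever the runtime's resource stack enforces, and the error message text is invented. It shows the local reservation being undone when the second stage rejects the request, and the mapping of that rejection to a Kernel-style error.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical error type mirroring the two variants named above.
#[derive(Debug, PartialEq)]
pub enum SketchError {
    ResourceExhausted,
    Kernel(String),
}

/// Hypothetical byte budget.
pub struct Budget {
    limit: u64,
    used: AtomicU64,
}

impl Budget {
    pub fn new(limit: u64) -> Self {
        Self { limit, used: AtomicU64::new(0) }
    }

    pub fn used(&self) -> u64 {
        self.used.load(Ordering::Acquire)
    }

    /// Atomically reserve `bytes` if the result stays within the limit.
    fn try_reserve(&self, bytes: u64) -> bool {
        self.used
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |cur| {
                let next = cur.checked_add(bytes)?;
                if next <= self.limit { Some(next) } else { None }
            })
            .is_ok()
    }

    /// Undo a reservation previously made with `try_reserve`.
    fn release(&self, bytes: u64) {
        self.used.fetch_sub(bytes, Ordering::AcqRel);
    }
}

/// Reserve locally first, then ask the second (runtime-side) budget.
/// If the second stage rejects, the local reservation is rolled back.
pub fn alloc_raw_sketch(local: &Budget, global: &Budget, bytes: u64) -> Result<(), SketchError> {
    if !local.try_reserve(bytes) {
        return Err(SketchError::ResourceExhausted);
    }
    if !global.try_reserve(bytes) {
        local.release(bytes); // keep local state consistent for later calls
        return Err(SketchError::Kernel("runtime rejected request: OutOfBudget".to_string()));
    }
    Ok(())
}
```

Without the rollback, a request rejected by the second stage would permanently consume local budget even though no memory was handed out.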

pub fn remaining_bytes(&self) -> u64

Get remaining budget in bytes

pub fn reset_tracking(&self)

Reset allocation tracking

This should be called when GPU memory has been freed but the tracker hasn’t been updated (e.g., when CudaSlice instances are dropped without calling record_free). This is a temporary workaround until proper RAII-based tracking is implemented.
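For contrast with the manual record_free and reset_tracking calls, RAII-based tracking can be sketched as a guard that updates the counter in its Drop implementation. Tracker, TrackedBlock, and track are hypothetical names for this illustration only; the sketch does not allocate device memory and makes no claim about how TrackedCudaSlice or RuntimeAllocBlock are implemented.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Hypothetical tracker shared between the owner and its guards.
pub struct Tracker {
    allocated: AtomicU64,
}

impl Tracker {
    pub fn new() -> Arc<Self> {
        Arc::new(Self { allocated: AtomicU64::new(0) })
    }

    pub fn allocated_bytes(&self) -> u64 {
        self.allocated.load(Ordering::Acquire)
    }

    /// Manual bookkeeping; saturates so a free after a reset cannot wrap.
    pub fn record_free(&self, bytes: u64) {
        let _ = self.allocated.fetch_update(Ordering::AcqRel, Ordering::Acquire, |cur| {
            Some(cur.saturating_sub(bytes))
        });
    }

    /// Workaround-style reset: forget everything that was tracked.
    pub fn reset_tracking(&self) {
        self.allocated.store(0, Ordering::Release);
    }

    /// Account for `bytes` and return a guard that undoes it on drop.
    /// Taking `&Arc<Self>` lets the guard keep the tracker alive.
    pub fn track(self: &Arc<Self>, bytes: u64) -> TrackedBlock {
        self.allocated.fetch_add(bytes, Ordering::AcqRel);
        TrackedBlock { owner: Arc::clone(self), bytes }
    }
}

/// Guard representing one tracked allocation.
pub struct TrackedBlock {
    owner: Arc<Tracker>,
    bytes: u64,
}

impl Drop for TrackedBlock {
    fn drop(&mut self) {
        // The counter is updated automatically; callers never call
        // record_free themselves for guarded allocations.
        self.owner.record_free(self.bytes);
    }
}
```

With a guard like this, the counter stays accurate as long as every allocation is wrapped, so a blanket reset is only needed for allocations made outside the guard.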