pub struct GpuMemoryManager { /* private fields */ }
GPU memory manager with budget enforcement
Tracks allocated GPU memory and enforces a memory budget.
When the budget would be exceeded, returns XlogError::ResourceExhausted.
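The budget check described above can be sketched with an atomic counter. This is a minimal illustration, not the crate's implementation: `BudgetTracker`, `SketchError`, `try_reserve`, and `release` are hypothetical names, and no GPU memory is touched.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical stand-in for the crate's error type.
#[derive(Debug, PartialEq)]
pub enum SketchError {
    ResourceExhausted { requested: u64, remaining: u64 },
}

/// Hypothetical tracker: a byte limit plus a running total.
pub struct BudgetTracker {
    limit: u64,
    allocated: AtomicU64,
}

impl BudgetTracker {
    pub fn new(limit: u64) -> Self {
        Self { limit, allocated: AtomicU64::new(0) }
    }

    /// Reserve `bytes` only if the running total stays within the limit.
    pub fn try_reserve(&self, bytes: u64) -> Result<(), SketchError> {
        let mut current = self.allocated.load(Ordering::Relaxed);
        loop {
            let next = match current.checked_add(bytes) {
                Some(n) if n <= self.limit => n,
                _ => {
                    return Err(SketchError::ResourceExhausted {
                        requested: bytes,
                        remaining: self.limit.saturating_sub(current),
                    })
                }
            };
            // Retry if another thread changed the total in the meantime.
            match self.allocated.compare_exchange_weak(
                current,
                next,
                Ordering::AcqRel,
                Ordering::Relaxed,
            ) {
                Ok(_) => return Ok(()),
                Err(observed) => current = observed,
            }
        }
    }

    /// Return `bytes` to the budget, saturating at zero.
    pub fn release(&self, bytes: u64) {
        let _ = self
            .allocated
            .fetch_update(Ordering::AcqRel, Ordering::Relaxed, |cur| {
                Some(cur.saturating_sub(bytes))
            });
    }

    pub fn allocated_bytes(&self) -> u64 {
        self.allocated.load(Ordering::Acquire)
    }

    pub fn remaining_bytes(&self) -> u64 {
        self.limit.saturating_sub(self.allocated_bytes())
    }
}
```

The compare-exchange loop is one way to keep the check and the reservation atomic, so that two concurrent requests cannot both pass the check and jointly exceed the limit.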
§v0.6 device-runtime routing (opt-in)
Constructing via GpuMemoryManager::with_runtime attaches an
XlogDeviceRuntime that mediates allocations through the v0.6
resource stack (e.g., GlobalDeviceBudget → LoggingResource →
AsyncCudaResource). When attached:
- GpuMemoryManager::alloc::<T> routes the underlying allocation through the runtime and produces a typed view via cudarc’s upgrade_device_ptr::<T>. The returned TrackedCudaSlice frees through the runtime on drop.
- GpuMemoryManager::alloc_raw is the explicit raw-bytes entry point (no typed view), also runtime-routed.
Both budgets apply: the manager’s local MemoryBudget AND any
GlobalDeviceBudget stacked above the runtime’s underlying
resource.
When the manager is constructed via GpuMemoryManager::new
(no runtime attached), alloc::<T> and the rest of the public
API behave bit-for-bit identically to pre-migration: cudarc’s
device.alloc::<T>(len) allocates and cudarc frees on drop.
alloc_raw returns XlogError::Kernel when no runtime is
attached (no silent fallback). CudaKernelProvider::new
continues to construct the manager via new for now;
runtime-routed providers are an opt-in through with_runtime
at construction sites that need it.
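The opt-in dispatch described above can be modeled without a GPU. In the sketch below, `MockManager`, `MockRuntime`, `Path`, and `SketchError` are hypothetical stand-ins. Only the shape of the decision (an optional runtime, a legacy fallback for `alloc`, and no fallback for `alloc_raw`) is taken from the text.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the crate's error type.
#[derive(Debug, PartialEq)]
pub enum SketchError {
    Kernel(String),
}

/// Which path served an allocation request in this mock.
#[derive(Debug, PartialEq)]
pub enum Path {
    Legacy,
    Runtime,
}

/// Hypothetical placeholder for the device runtime.
pub struct MockRuntime;

pub struct MockManager {
    runtime: Option<Arc<MockRuntime>>,
}

impl MockManager {
    /// No runtime attached: the legacy path is used.
    pub fn new() -> Self {
        Self { runtime: None }
    }

    /// Runtime attached: allocations are routed through it.
    pub fn with_runtime(runtime: Arc<MockRuntime>) -> Self {
        Self { runtime: Some(runtime) }
    }

    pub fn runtime(&self) -> Option<&Arc<MockRuntime>> {
        self.runtime.as_ref()
    }

    /// Typed allocation: falls back to the legacy path without a runtime.
    pub fn alloc(&self, _len: usize) -> Result<Path, SketchError> {
        Ok(match &self.runtime {
            Some(_) => Path::Runtime,
            None => Path::Legacy,
        })
    }

    /// Raw allocation: errors instead of silently falling back.
    pub fn alloc_raw(&self, _bytes: usize) -> Result<Path, SketchError> {
        match &self.runtime {
            Some(_) => Ok(Path::Runtime),
            None => Err(SketchError::Kernel(
                "no device runtime attached".to_string(),
            )),
        }
    }
}
```

Storing the runtime as an `Option` lets existing call sites that construct via `new` stay unchanged while new call sites opt in.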
Implementations§
impl GpuMemoryManager
pub fn new(device: Arc<CudaDevice>, budget: MemoryBudget) -> Self
Create a new GPU memory manager
§Arguments
- device - The CUDA device to allocate memory on
- budget - Memory budget configuration
pub fn with_runtime(
    device: Arc<CudaDevice>,
    budget: MemoryBudget,
    runtime: Arc<XlogDeviceRuntime>,
) -> Self
Like GpuMemoryManager::new, but additionally attaches a v0.6
XlogDeviceRuntime. The runtime mediates both
alloc::<T> and alloc_raw
through the v0.6 resource stack: typed alloc::<T> returns a
TrackedCudaSlice<T> whose underlying memory is owned by
the runtime (typed view via cudarc’s upgrade_device_ptr::<T>,
freed through the runtime on drop). The legacy cudarc path is
only used when the manager is built via GpuMemoryManager::new (no runtime
attached). Provider construction does not yet require the
runtime; callers that want runtime-routed allocations opt in
here.
pub fn runtime(&self) -> Option<&Arc<XlogDeviceRuntime>>
Borrow the attached device runtime, if any. None when the
manager was constructed via GpuMemoryManager::new. Test/diagnostic
accessor; production call sites that need the runtime own
it directly.
pub fn alloc<T: DeviceRepr>(
    self: &Arc<Self>,
    len: usize,
) -> Result<TrackedCudaSlice<T>>
Allocate GPU memory for len elements of type T
§Arguments
- len - Number of elements to allocate
§Returns
A tracked CudaSlice<T> containing the allocated memory
§Errors
- XlogError::ResourceExhausted if allocation would exceed budget
- XlogError::Kernel if CUDA allocation fails
§v0.6 routing
When the manager has an attached XlogDeviceRuntime
(constructed via GpuMemoryManager::with_runtime), the underlying allocation
is routed through the runtime’s resource stack and a typed
view is created via cudarc’s upgrade_device_ptr::<T> over
the runtime’s raw pointer. The returned TrackedCudaSlice
frees through the runtime on drop. Without a runtime
attached, the legacy cudarc alloc::<T> path is used and
drop frees through cudarc — bit-for-bit identical to
pre-migration behavior.
pub fn check_budget(&self, bytes: u64) -> Result<()>
pub fn allocated_bytes(&self) -> u64
Get the current allocated memory in bytes
pub fn peak_bytes(&self) -> u64
High-water mark of allocated bytes since construction or the
last reset_peak. Always ≥
allocated_bytes at the moment it was
recorded. Measurement-harness API (S3 peak-memory gate).
pub fn reset_peak(&self)
Reset the peak high-water mark to the current allocated level, so a measurement window starts from live state rather than zero. Measurement-harness API.
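The high-water-mark behavior of peak_bytes and reset_peak can be illustrated as follows. This is a sketch under assumptions: `PeakTracker` and `record_alloc` are hypothetical, and the real manager may order its atomic operations differently.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical tracker holding a live total and a high-water mark.
pub struct PeakTracker {
    allocated: AtomicU64,
    peak: AtomicU64,
}

impl PeakTracker {
    pub fn new() -> Self {
        Self {
            allocated: AtomicU64::new(0),
            peak: AtomicU64::new(0),
        }
    }

    /// Add to the live total and raise the peak if the new total exceeds it.
    pub fn record_alloc(&self, bytes: u64) {
        let now = self.allocated.fetch_add(bytes, Ordering::AcqRel) + bytes;
        self.peak.fetch_max(now, Ordering::AcqRel);
    }

    /// Subtract from the live total; the peak is left untouched.
    pub fn record_free(&self, bytes: u64) {
        let _ = self
            .allocated
            .fetch_update(Ordering::AcqRel, Ordering::Relaxed, |cur| {
                Some(cur.saturating_sub(bytes))
            });
    }

    pub fn allocated_bytes(&self) -> u64 {
        self.allocated.load(Ordering::Acquire)
    }

    pub fn peak_bytes(&self) -> u64 {
        self.peak.load(Ordering::Acquire)
    }

    /// Restart the measurement window from the current live level, not zero.
    pub fn reset_peak(&self) {
        self.peak
            .store(self.allocated.load(Ordering::Acquire), Ordering::Release);
    }
}
```

Resetting to the live level means a measurement window that begins with memory already resident still reports a peak at least as large as that resident amount.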
pub fn alloc_count(&self) -> u64
Number of alloc calls issued so far (device allocation requests).
The GPU-resident MC engine snapshots this around the measured region to
prove per_operator_host_allocations == 0 (all arenas pre-allocated).
pub fn reset_alloc_count(&self)
Reset the allocation-request counter to zero.
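The snapshot pattern mentioned for alloc_count might look like the sketch below. `CountingAllocator` and `allocations_during` are hypothetical, and a host `Vec<u8>` stands in for a device buffer so the example runs without a GPU.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical allocator that counts allocation requests.
pub struct CountingAllocator {
    alloc_count: AtomicU64,
}

impl CountingAllocator {
    pub fn new() -> Self {
        Self { alloc_count: AtomicU64::new(0) }
    }

    /// Count the request, then hand back a zeroed host buffer as a stand-in.
    pub fn alloc(&self, len: usize) -> Vec<u8> {
        self.alloc_count.fetch_add(1, Ordering::Relaxed);
        vec![0u8; len]
    }

    pub fn alloc_count(&self) -> u64 {
        self.alloc_count.load(Ordering::Relaxed)
    }

    pub fn reset_alloc_count(&self) {
        self.alloc_count.store(0, Ordering::Relaxed);
    }
}

/// Run `region` and return how many allocation requests it issued.
pub fn allocations_during<F: FnOnce()>(mgr: &CountingAllocator, region: F) -> u64 {
    let before = mgr.alloc_count();
    region();
    mgr.alloc_count() - before
}
```

A harness would pre-allocate its arenas, wrap the measured region in a helper like `allocations_during`, and assert that the returned delta is zero.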
pub fn device(&self) -> &Arc<CudaDevice>
Get the underlying CUDA device
pub fn record_free(&self, bytes: u64)
Record that memory has been freed
Note: cudarc automatically frees memory when CudaSlice is dropped. This method should be called to update tracking when memory is freed.
pub fn alloc_raw(
    self: &Arc<Self>,
    bytes: usize,
    tag: AllocTag,
) -> Result<RuntimeAllocBlock>
v0.6 device-runtime entry point: allocate bytes raw bytes
through the attached XlogDeviceRuntime.
Returns a RuntimeAllocBlock that owns the allocation. On
drop, the block deallocates through the runtime and updates
both the manager’s local allocated counter and the
runtime’s bookkeeping.
Both budgets apply: the manager’s local
MemoryBudget::device_bytes AND any GlobalDeviceBudget
stacked above the runtime’s underlying resource. Either
rejecting the request returns an XlogError. On runtime
rejection the local reservation is rolled back so subsequent
allocations see consistent state.
§Errors
- XlogError::Kernel if no runtime is attached.
- XlogError::ResourceExhausted if the local budget cannot accommodate the request.
- XlogError::Kernel (with the resource error rendered) if the runtime rejects the request, including the runtime’s own OutOfBudget, which is mapped here so callers see a single error surface.
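The two-budget flow with rollback and error mapping can be sketched as below. Every type here (`MockManager`, `MockRuntime`, `ResourceError`, `SketchError`) is a hypothetical stand-in, and the mock runtime is single-threaded for brevity. Only the ordering (local reservation first, then the runtime, with rollback on runtime rejection) follows the description above.

```rust
use std::fmt;
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical stand-in for the crate's error type.
#[derive(Debug, PartialEq)]
pub enum SketchError {
    ResourceExhausted(String),
    Kernel(String),
}

/// Hypothetical error raised by the runtime's resource stack.
pub enum ResourceError {
    OutOfBudget { requested: u64, available: u64 },
}

impl fmt::Display for ResourceError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ResourceError::OutOfBudget { requested, available } => {
                write!(f, "OutOfBudget: requested {requested}, available {available}")
            }
        }
    }
}

/// Mock runtime with its own (global) budget; not thread-safe by design.
pub struct MockRuntime {
    limit: u64,
    used: AtomicU64,
}

impl MockRuntime {
    pub fn new(limit: u64) -> Self {
        Self { limit, used: AtomicU64::new(0) }
    }

    fn allocate(&self, bytes: u64) -> Result<(), ResourceError> {
        let used = self.used.load(Ordering::Acquire);
        if used.saturating_add(bytes) > self.limit {
            return Err(ResourceError::OutOfBudget {
                requested: bytes,
                available: self.limit - used,
            });
        }
        self.used.store(used + bytes, Ordering::Release);
        Ok(())
    }
}

pub struct MockManager {
    limit: u64,
    allocated: AtomicU64,
    runtime: Option<MockRuntime>,
}

impl MockManager {
    pub fn new(limit: u64, runtime: Option<MockRuntime>) -> Self {
        Self { limit, allocated: AtomicU64::new(0), runtime }
    }

    pub fn allocated_bytes(&self) -> u64 {
        self.allocated.load(Ordering::Acquire)
    }

    pub fn alloc_raw(&self, bytes: u64) -> Result<(), SketchError> {
        let runtime = self
            .runtime
            .as_ref()
            .ok_or_else(|| SketchError::Kernel("no runtime attached".to_string()))?;
        // Step 1: reserve against the local budget.
        let prev = self.allocated.fetch_add(bytes, Ordering::AcqRel);
        if prev.saturating_add(bytes) > self.limit {
            self.allocated.fetch_sub(bytes, Ordering::AcqRel);
            return Err(SketchError::ResourceExhausted(format!(
                "requested {bytes}, local limit {}",
                self.limit
            )));
        }
        // Step 2: ask the runtime; roll back the local reservation on rejection.
        if let Err(e) = runtime.allocate(bytes) {
            self.allocated.fetch_sub(bytes, Ordering::AcqRel);
            return Err(SketchError::Kernel(e.to_string()));
        }
        Ok(())
    }
}
```

Rendering the runtime error into the `Kernel` variant mirrors the single error surface described in the list above. Whether the real code formats the error the same way is not stated in the source.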
pub fn remaining_bytes(&self) -> u64
Get remaining budget in bytes
pub fn reset_tracking(&self)
Reset allocation tracking
This should be called when GPU memory has been freed but the tracker hasn’t been updated (e.g., when CudaSlice instances are dropped without calling record_free). This is a temporary workaround until proper RAII-based tracking is implemented.
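For comparison, RAII-based tracking of the kind this note anticipates could take roughly the following shape. This is a sketch only: `Tracker` and `TrackedBlock` are hypothetical and do not describe how TrackedCudaSlice is implemented.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Hypothetical byte counter shared between the manager and its blocks.
pub struct Tracker {
    allocated: AtomicU64,
}

impl Tracker {
    pub fn new() -> Self {
        Self { allocated: AtomicU64::new(0) }
    }

    pub fn record_alloc(&self, bytes: u64) {
        self.allocated.fetch_add(bytes, Ordering::AcqRel);
    }

    pub fn record_free(&self, bytes: u64) {
        let _ = self
            .allocated
            .fetch_update(Ordering::AcqRel, Ordering::Relaxed, |cur| {
                Some(cur.saturating_sub(bytes))
            });
    }

    pub fn allocated_bytes(&self) -> u64 {
        self.allocated.load(Ordering::Acquire)
    }
}

/// Guard that records its own release when dropped.
pub struct TrackedBlock {
    tracker: Arc<Tracker>,
    bytes: u64,
}

impl TrackedBlock {
    pub fn new(tracker: &Arc<Tracker>, bytes: u64) -> Self {
        tracker.record_alloc(bytes);
        Self { tracker: Arc::clone(tracker), bytes }
    }
}

impl Drop for TrackedBlock {
    fn drop(&mut self) {
        // The counter is updated automatically, so no manual record_free
        // or reset_tracking call is needed for this block.
        self.tracker.record_free(self.bytes);
    }
}
```

With a guard like this, the tracked total cannot drift from the set of live blocks, which is the situation reset_tracking exists to repair.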