Struct LaunchRecorder 

pub struct LaunchRecorder { /* private fields */ }

Records the buffer uses of a single launch or copy on launch_stream. Dropping a recorder without calling commit is a programmer error: the recorder logs the omission (debug builds only) and never panics.

Lifetime model

The recorder snapshots each registered block’s identity (BlockId) at record time and immediately drops the source slice borrow. The recorder type itself carries no lifetime parameter, so callers can interleave rec.read(&buf) calls with later &mut buf kernel-param borrows freely. The runtime’s generation guard catches misuse where the snapshot outlives the underlying allocation.
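
The borrow behaviour described above can be illustrated with a minimal sketch. The types below (a two-field BlockId, TrackedSlice, Recorder) are hypothetical stand-ins rather than the crate's real definitions, and the generation field is an assumption about how a generation guard might identify a stale snapshot.

```rust
// Hypothetical stand-ins; the real BlockId and TrackedCudaSlice live in the crate.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct BlockId {
    index: u32,
    generation: u32, // assumed: changes when the allocation is freed and reused
}

struct TrackedSlice {
    id: BlockId,
    data: Vec<f32>,
}

// No lifetime parameter: the recorder stores copied identities, not references.
#[derive(Default)]
struct Recorder {
    reads: Vec<BlockId>,
}

impl Recorder {
    fn read(&mut self, slice: &TrackedSlice) -> &mut Self {
        // Copy the identity; the shared borrow of `slice` ends when this returns.
        self.reads.push(slice.id);
        self
    }
}

fn record_then_mutate() -> (Vec<BlockId>, f32) {
    let mut buf = TrackedSlice {
        id: BlockId { index: 7, generation: 1 },
        data: vec![0.0; 4],
    };
    let mut rec = Recorder::default();
    rec.read(&buf);
    // This mutable borrow compiles because `rec` holds no reference to `buf`.
    let param: &mut TrackedSlice = &mut buf;
    param.data[0] = 1.0;
    (rec.reads, buf.data[0])
}
```

Because the returned `&mut Self` is tied only to the recorder, the compiler does not extend the borrow of the slice past the call.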

Required call order for non-empty recorders

preflight(&runtime) MUST be called and return Ok(()) BEFORE any CUDA work is enqueued, AND BEFORE commit. Preflight queues the cross-stream waits each recorded access kind requires (read waits on prior writes; write waits on prior writes AND prior reads), so the launch sees a well-fenced view of every input. Commit then records the new event on launch_stream so future ops can wait on it.

Empty recorders (no read/write/… calls) are a no-op and bypass the preflight requirement: there are no waits to queue, no events to record.
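
The ordering contract can be modelled as a small state machine. This is a toy sketch, not the crate's implementation: ToyRecorder and RecorderError are hypothetical, the runtime argument is omitted, and only the StreamMisuse name is borrowed from the commit documentation.

```rust
#[derive(Debug, PartialEq)]
enum RecorderError {
    StreamMisuse, // name taken from the commit docs; the real error type is assumed
}

#[derive(Default)]
struct ToyRecorder {
    recorded: usize,
    preflighted: bool,
}

impl ToyRecorder {
    fn read(&mut self) -> &mut Self {
        self.recorded += 1;
        self
    }

    fn preflight(&mut self) -> Result<(), RecorderError> {
        // The real preflight also validates the recorder and queues waits.
        self.preflighted = true;
        Ok(())
    }

    fn commit(self) -> Result<(), RecorderError> {
        // Non-empty recorders must have been preflighted; empty ones bypass the check.
        if self.recorded > 0 && !self.preflighted {
            return Err(RecorderError::StreamMisuse);
        }
        Ok(())
    }
}

fn launch_in_order() -> Result<(), RecorderError> {
    let mut rec = ToyRecorder::default();
    rec.read();
    rec.preflight()?; // 1. before any CUDA work is enqueued
    // 2. the kernel or copy would be enqueued on launch_stream here
    rec.commit() // 3. after the work is enqueued
}
```

In this model, skipping step 1 on a non-empty recorder makes commit fail, while a recorder with no recorded uses commits successfully without preflight.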

Implementations

impl LaunchRecorder

pub fn new_permissive(launch_stream: StreamId) -> Self

Permissive recorder: silently skips untracked buffers.

pub fn new_strict(launch_stream: StreamId) -> Self

Strict recorder: rejects any untracked buffer. Migrated production launch paths use this.

pub fn launch_stream(&self) -> StreamId

Configured launch stream.

pub fn mode(&self) -> RecorderMode

Configured mode.

pub fn read<T: DeviceRepr>(&mut self, slice: &TrackedCudaSlice<T>) -> &mut Self

Record a runtime-backed crate::memory::TrackedCudaSlice the launch will read.

pub fn write<T: DeviceRepr>(&mut self, slice: &TrackedCudaSlice<T>) -> &mut Self

Record a runtime-backed slice the launch will write. Use this for both pre-existing buffers being overwritten AND for fresh runtime-backed allocations whose lifetime began in the same operator. The recorder snapshots block identity at record time and drops the borrow, so kernel &mut slice borrows after preflight are unaffected.

pub fn read_write<T: DeviceRepr>(&mut self, slice: &TrackedCudaSlice<T>) -> &mut Self

Record a runtime-backed slice the launch will both read and write.

pub fn read_column(&mut self, col: &CudaColumn) -> &mut Self

Record a crate::memory::CudaColumn the launch will read. Owned columns surface their runtime block; external (Dlpack / ArrowDevice) columns are rejected in strict mode and silently skipped in permissive mode.
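
The mode-dependent handling of external columns might look roughly like the sketch below. Mode, Column, and this Recorder are hypothetical simplifications (the real RecorderMode and CudaColumn are not shown on this page), and the rejected flag is an assumed way to model a rejection that accumulates during recording and surfaces at preflight.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Mode {
    Permissive,
    Strict,
}

// Hypothetical column model: owned columns expose a block id, external ones do not.
enum Column {
    Owned(u64),
    External,
}

struct Recorder {
    mode: Mode,
    blocks: Vec<u64>,
    rejected: bool, // assumed: checked later, when preflight runs
}

impl Recorder {
    fn new(mode: Mode) -> Self {
        Recorder { mode, blocks: Vec::new(), rejected: false }
    }

    fn read_column(&mut self, col: &Column) -> &mut Self {
        match (col, self.mode) {
            // Owned columns surface their runtime block in either mode.
            (Column::Owned(id), _) => self.blocks.push(*id),
            // Strict mode remembers the rejection instead of failing here.
            (Column::External, Mode::Strict) => self.rejected = true,
            // Permissive mode skips the column silently.
            (Column::External, Mode::Permissive) => {}
        }
        self
    }
}
```

Deferring the failure keeps the recording methods chainable, since they return `&mut Self` rather than a result.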

pub fn write_column(&mut self, col: &CudaColumn) -> &mut Self

Record a crate::memory::CudaColumn the launch will write.

pub fn recorded_count(&self) -> usize

Number of recorded runtime-backed uses. Diagnostic.

pub fn preflight(&mut self, runtime: &XlogDeviceRuntime) -> ResourceResult<()>

Preflight: validate the recorder is ready to commit against runtime AND queue every cross-stream wait the recorded access kinds require. Stateful — sets a flag that commit checks. MUST be called BEFORE enqueueing the CUDA launch / copy. On failure no CUDA work has been queued yet, the flag remains unset, and the caller can either fix the recorder or abandon the launch.

Verifies (in order):

  • No strict-mode rejection accumulated during recording (an untracked or external buffer in strict mode, or an attempt to record a use after preflight).
  • The active resource stack supports cross-stream tracking (runtime.supports_block_use_tracking()) OR the recorder has zero tracked uses (no events to record).

Then, for each recorded use, preflight calls XlogDeviceRuntime::prepare_block_use, which queues cuStreamWaitEvent calls on launch_stream. A read access waits on any prior write issued on a different stream; a write or read-write access waits on any prior write and on prior reads issued on a different stream. Events on launch_stream itself are skipped because stream order already sequences them.
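
The wait-selection rule can be sketched as a pure function. Event, BlockState, and events_to_wait_on are hypothetical names chosen for illustration; the real prepare_block_use issues the waits on the stream rather than returning a list.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Access {
    Read,
    Write,
    ReadWrite,
}

#[derive(Clone, Copy, Debug, PartialEq)]
struct Event {
    stream: u32,
    id: u64,
}

struct BlockState {
    last_write: Option<Event>,
    outstanding_reads: Vec<Event>,
}

fn events_to_wait_on(state: &BlockState, access: Access, launch_stream: u32) -> Vec<Event> {
    // Every access kind must wait on the prior write, if there is one.
    let mut waits: Vec<Event> = state.last_write.into_iter().collect();
    // Writers additionally wait on the reads that are still outstanding.
    if access != Access::Read {
        waits.extend(state.outstanding_reads.iter().copied());
    }
    // Same-stream events are already ordered, so no wait is queued for them.
    waits.retain(|e| e.stream != launch_stream);
    waits
}
```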

Repeated registrations of the same block in the same recorder are deduplicated to a single prepare call (the strongest access kind wins): read + write of the same block becomes one Access::ReadWrite prepare.
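
One way to picture the deduplication is a merge over (block, access) pairs. The sketch assumes block identities can be modelled as plain u64 keys and that two identical access kinds merge to themselves; only the read + write case is confirmed by the text above.

```rust
use std::collections::BTreeMap;

#[derive(Clone, Copy, Debug, PartialEq)]
enum Access {
    Read,
    Write,
    ReadWrite,
}

// Assumed merge rule: identical kinds stay as they are, any mix becomes ReadWrite.
fn merge(a: Access, b: Access) -> Access {
    if a == b { a } else { Access::ReadWrite }
}

// Collapse repeated registrations so each block gets exactly one prepare call.
fn dedup(uses: &[(u64, Access)]) -> BTreeMap<u64, Access> {
    let mut merged = BTreeMap::new();
    for &(block, access) in uses {
        merged
            .entry(block)
            .and_modify(|cur| *cur = merge(*cur, access))
            .or_insert(access);
    }
    merged
}
```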

pub fn commit(self, runtime: &XlogDeviceRuntime) -> ResourceResult<()>

Commit the recorded uses to the runtime. MUST be called AFTER preflight succeeded AND the CUDA launch has been enqueued on launch_stream.

Non-empty recorders that were not preflighted are rejected with StreamMisuse. This closes the footgun where a caller enqueues CUDA work, calls commit, and only then discovers that the active resource is unsupported, leaving unprotected work in flight. Migrated production launch paths must therefore always preflight BEFORE the CUDA call.

Empty recorders (no recorded uses) bypass the check: nothing to record, no events to fire, no contract to honor.

For each recorded use, calls XlogDeviceRuntime::finish_block_use which records an event on launch_stream and folds it into the block’s dependency state (writers replace last_write and clear outstanding_reads; readers append to outstanding_reads). Repeated registrations of the same block are deduplicated identically to preflight.
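
The dependency-state update can be sketched as follows. The field names last_write and outstanding_reads come from the description above; representing events as u64 handles, and treating a read-write access as a writer, are assumptions made for this illustration.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Access {
    Read,
    Write,
    ReadWrite,
}

#[derive(Debug, Default, PartialEq)]
struct BlockState {
    last_write: Option<u64>,     // event handle of the most recent writer
    outstanding_reads: Vec<u64>, // reads recorded since that write
}

fn fold_event(state: &mut BlockState, access: Access, event: u64) {
    match access {
        // Readers append to the list of outstanding reads.
        Access::Read => state.outstanding_reads.push(event),
        // Writers replace last_write and clear the outstanding reads.
        // Assumption: ReadWrite is folded in the same way as Write.
        Access::Write | Access::ReadWrite => {
            state.last_write = Some(event);
            state.outstanding_reads.clear();
        }
    }
}
```

Clearing the reads on a write is safe in this model because the writer already waited on them during preflight, so later operations only need to wait on the new write event.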

Trait Implementations

impl Drop for LaunchRecorder

fn drop(&mut self)

Executes the destructor for this type.

Blanket Implementations

impl<T> Allocation for T
where T: RefUnwindSafe + Send + Sync,