pub struct LaunchRecorder { /* private fields */ }
Records buffer uses for a single launch / copy on
launch_stream. Dropping a recorder without calling commit is a
programmer error; the recorder logs it (debug builds only) and never panics.
§Lifetime model
The recorder snapshots each registered block’s identity
(BlockId) at record time and immediately drops the source
slice borrow. The recorder type itself carries no lifetime
parameter, so callers can interleave rec.read(&buf) calls
with later &mut buf kernel-param borrows freely. The
runtime’s generation guard catches misuse where the snapshot
outlives the underlying allocation.
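The lifetime model can be illustrated with a small toy sketch. This is not the crate's implementation: `ToyBuf`, `ToySnapshot`, `ToyRecorder`, and the `generation` field are hypothetical stand-ins, and the generation check is only an assumption about how such a guard might work. What it shows is that a recorder which copies identities out of a shared borrow does not hold that borrow, so a later `&mut` borrow of the same buffer compiles.

```rust
/// Hypothetical identity snapshot: plain copied data, no borrow.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ToySnapshot {
    block_id: u64,
    generation: u64,
}

/// Hypothetical stand-in for a tracked device buffer.
struct ToyBuf {
    block_id: u64,
    generation: u64,
    data: Vec<f32>,
}

/// Toy recorder: it owns copies of identities, so it needs no lifetime parameter.
#[derive(Default)]
struct ToyRecorder {
    reads: Vec<ToySnapshot>,
}

impl ToyRecorder {
    /// Copies the identity out of `buf`; the shared borrow ends when this returns.
    fn read(&mut self, buf: &ToyBuf) -> &mut Self {
        self.reads.push(ToySnapshot {
            block_id: buf.block_id,
            generation: buf.generation,
        });
        self
    }

    /// Toy generation guard (assumed): a snapshot is stale once the
    /// buffer's generation no longer matches the recorded one.
    fn is_current(&self, buf: &ToyBuf) -> bool {
        self.reads
            .iter()
            .filter(|s| s.block_id == buf.block_id)
            .all(|s| s.generation == buf.generation)
    }
}

/// Records a read, then mutably borrows the same buffer while the
/// recorder is still alive. Returns the guard result before and after
/// a simulated reallocation.
fn record_then_mutate() -> (bool, bool) {
    let mut buf = ToyBuf { block_id: 7, generation: 1, data: vec![0.0; 4] };
    let mut rec = ToyRecorder::default();
    rec.read(&buf);

    // The recorder holds no borrow of `buf`, so this mutable borrow compiles.
    let param: &mut ToyBuf = &mut buf;
    param.data[0] = 1.0;
    let before = rec.is_current(&buf);

    // Simulate the allocation being released and reused.
    buf.generation += 1;
    let after = rec.is_current(&buf);
    (before, after)
}
```

In this toy, the snapshot stays valid across the mutable borrow and becomes stale only when the generation changes.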
§Required call order for non-empty recorders
preflight(&runtime) MUST be called and return Ok(())
BEFORE any CUDA work is enqueued, AND BEFORE commit.
Preflight queues the cross-stream waits each recorded access
kind requires (read waits on prior writes; write waits on
prior writes AND prior reads), so the launch sees a
well-fenced view of every input. Commit then records the
new event on launch_stream so future ops can wait on it.
Empty recorders (no read/write/… calls) are a no-op
and bypass the preflight requirement: there are no waits
to queue, no events to record.
Implementations§
impl LaunchRecorder
pub fn new_permissive(launch_stream: StreamId) -> Self
Permissive recorder: silently skips untracked buffers.
pub fn new_strict(launch_stream: StreamId) -> Self
Strict recorder: rejects any untracked buffer. Production migrated launch paths use this.
pub fn launch_stream(&self) -> StreamId
Configured launch stream.
pub fn mode(&self) -> RecorderMode
Configured mode.
pub fn read<T: DeviceRepr>(&mut self, slice: &TrackedCudaSlice<T>) -> &mut Self
Record a runtime-backed crate::memory::TrackedCudaSlice
the launch will read.
pub fn write<T: DeviceRepr>(&mut self, slice: &TrackedCudaSlice<T>) -> &mut Self
Record a runtime-backed slice the launch will write.
Use this for both pre-existing buffers being overwritten
AND for fresh runtime-backed allocations whose lifetime
began in the same operator. The recorder snapshots block
identity at record time and drops the borrow, so kernel
&mut slice borrows after preflight are unaffected.
pub fn read_write<T: DeviceRepr>(&mut self, slice: &TrackedCudaSlice<T>) -> &mut Self
Record a runtime-backed slice the launch will both read and write.
pub fn read_column(&mut self, col: &CudaColumn) -> &mut Self
Record a crate::memory::CudaColumn the launch will
read. Owned columns surface their runtime block; external
(Dlpack / ArrowDevice) columns are rejected in strict
mode and silently skipped in permissive mode.
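The difference between the two modes can be sketched with a toy model. All names here (`ToyMode`, `ToyColumn`, `ModeRecorder`, the `rejected` flag) are hypothetical and chosen for illustration; the only behavior taken from the text is that owned columns are recorded, external columns are skipped in permissive mode, and strict mode accumulates a rejection that preflight later reports.

```rust
/// Hypothetical recorder mode.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ToyMode {
    Permissive,
    Strict,
}

/// Hypothetical column kind: owned columns carry a runtime block id,
/// external columns (for example DLPack-backed) have none.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ToyColumn {
    Owned(u64),
    External,
}

/// Toy recorder that tracks recorded block ids and a deferred rejection.
struct ModeRecorder {
    mode: ToyMode,
    recorded: Vec<u64>,
    rejected: bool,
}

impl ModeRecorder {
    fn new(mode: ToyMode) -> Self {
        ModeRecorder { mode, recorded: Vec::new(), rejected: false }
    }

    /// Records an owned column; handles external columns according to mode.
    /// The rejection is only remembered here, mirroring how the real
    /// recorder surfaces it at preflight rather than at record time.
    fn read_column(&mut self, col: ToyColumn) -> &mut Self {
        match (col, self.mode) {
            (ToyColumn::Owned(id), _) => self.recorded.push(id),
            (ToyColumn::External, ToyMode::Strict) => self.rejected = true,
            (ToyColumn::External, ToyMode::Permissive) => {}
        }
        self
    }
}
```

Because `read_column` returns `&mut Self`, calls chain without early error handling; the deferred flag is what lets the builder-style API stay infallible until preflight.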
pub fn write_column(&mut self, col: &CudaColumn) -> &mut Self
Record a crate::memory::CudaColumn the launch will
write.
pub fn recorded_count(&self) -> usize
Number of recorded runtime-backed uses. Diagnostic.
pub fn preflight(&mut self, runtime: &XlogDeviceRuntime) -> ResourceResult<()>
Preflight: validate the recorder is ready to commit
against runtime AND queue every cross-stream wait the
recorded access kinds require. Stateful — sets a flag
that commit checks. MUST be called BEFORE enqueueing
the CUDA launch / copy. On failure no CUDA work has been
queued yet, the flag remains unset, and the caller can
either fix the recorder or abandon the launch.
Verifies (in order):
- No strict-mode rejection accumulated during recording
  (untracked / external buffer in strict mode, or
  post-preflight note attempt).
- The active resource stack supports cross-stream
  tracking (runtime.supports_block_use_tracking()) OR the
  recorder has zero tracked uses (no events to record).
Then for each recorded use, calls
XlogDeviceRuntime::prepare_block_use which queues
cuStreamWaitEvent calls on launch_stream for any
prior write (read access) or any prior write + prior
reads (write / read-write access) on a different stream.
Same-stream events are skipped — already ordered.
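The wait rule described above can be sketched as a pure function. This is a toy model under stated assumptions: `ToyEvent`, `ToyBlockState`, and `events_to_wait_on` are hypothetical, streams are modeled as plain integers, and no CUDA call is made. The field names `last_write` and `outstanding_reads` follow the names used in the commit documentation.

```rust
/// Access kinds; the variant names follow `Access::ReadWrite` from the docs,
/// the other two variants are assumed.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Access {
    Read,
    Write,
    ReadWrite,
}

/// Hypothetical event: which stream recorded it, plus an id for testing.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ToyEvent {
    stream: u32,
    id: u64,
}

/// Hypothetical per-block dependency state.
struct ToyBlockState {
    last_write: Option<ToyEvent>,
    outstanding_reads: Vec<ToyEvent>,
}

/// Returns the events the launch stream would have to wait on.
/// Reads wait on the prior write; writes and read-writes also wait on
/// prior reads. Events from the launch stream itself are skipped
/// because same-stream work is already ordered.
fn events_to_wait_on(
    state: &ToyBlockState,
    access: Access,
    launch_stream: u32,
) -> Vec<ToyEvent> {
    let mut waits = Vec::new();
    if let Some(w) = state.last_write {
        if w.stream != launch_stream {
            waits.push(w);
        }
    }
    if access != Access::Read {
        waits.extend(
            state
                .outstanding_reads
                .iter()
                .copied()
                .filter(|e| e.stream != launch_stream),
        );
    }
    waits
}
```

In the real runtime each returned event would correspond to one cuStreamWaitEvent call on launch_stream; the sketch only computes the set.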
Repeated registrations of the same block in the same
recorder are deduplicated to a single prepare call (the
strongest access kind wins): read + write of the
same block becomes one Access::ReadWrite prepare.
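The deduplication rule can be sketched as follows. This is an illustrative toy, not the recorder's code: `merge` and `dedup_uses` are hypothetical helpers, block ids are plain `u64` values, and preserving first-seen order is an assumption made for readability.

```rust
/// Access kinds; `ReadWrite` follows the docs, the other names are assumed.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Access {
    Read,
    Write,
    ReadWrite,
}

/// Combines two access kinds for the same block.
/// Identical kinds stay as they are; any mix becomes `ReadWrite`.
fn merge(a: Access, b: Access) -> Access {
    if a == b {
        a
    } else {
        Access::ReadWrite
    }
}

/// Collapses repeated registrations of a block into one entry,
/// keeping blocks in the order they were first recorded.
fn dedup_uses(uses: &[(u64, Access)]) -> Vec<(u64, Access)> {
    let mut out: Vec<(u64, Access)> = Vec::new();
    for &(block, access) in uses {
        // Look up the block by index so no borrow of `out` is held
        // while we update or push.
        if let Some(i) = out.iter().position(|&(b, _)| b == block) {
            out[i].1 = merge(out[i].1, access);
        } else {
            out.push((block, access));
        }
    }
    out
}
```

A linear scan is enough for a sketch because a single launch usually touches only a handful of buffers; a map keyed by block id would be the obvious alternative for larger sets.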
pub fn commit(self, runtime: &XlogDeviceRuntime) -> ResourceResult<()>
Commit the recorded uses to the runtime. MUST be called
AFTER preflight succeeded AND the CUDA launch has been
enqueued on launch_stream.
Non-empty recorders that were not preflighted are
rejected with StreamMisuse. This closes the footgun
where a caller could enqueue CUDA work, then call
commit, then discover at commit-time that the active
resource is unsupported — leaving unprotected work in
flight. Production migrated launch paths must therefore
always preflight BEFORE the CUDA call.
Empty recorders (no recorded uses) bypass the check: nothing to record, no events to fire, no contract to honor.
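The preflight / commit contract can be modeled as a small state machine. The sketch below is a toy under stated assumptions: `FlagRecorder` and `ToyError` are hypothetical, preflight always succeeds, and `commit` returns the number of committed uses purely so the behavior is observable (the real method returns ResourceResult<()>).

```rust
/// Hypothetical error type; only the `StreamMisuse` name comes from the docs.
#[derive(Debug, PartialEq, Eq)]
enum ToyError {
    StreamMisuse,
}

/// Toy recorder tracking recorded block ids and the preflight flag.
struct FlagRecorder {
    uses: Vec<u64>,
    preflighted: bool,
}

impl FlagRecorder {
    fn new() -> Self {
        FlagRecorder { uses: Vec::new(), preflighted: false }
    }

    /// Records a use of `block`.
    fn read(&mut self, block: u64) -> &mut Self {
        self.uses.push(block);
        self
    }

    /// Toy preflight: always succeeds and sets the flag commit checks.
    fn preflight(&mut self) -> Result<(), ToyError> {
        self.preflighted = true;
        Ok(())
    }

    /// Consumes the recorder. Empty recorders bypass the check;
    /// non-empty recorders must have been preflighted.
    fn commit(self) -> Result<usize, ToyError> {
        if self.uses.is_empty() {
            return Ok(0);
        }
        if !self.preflighted {
            return Err(ToyError::StreamMisuse);
        }
        Ok(self.uses.len())
    }
}
```

Taking `self` by value in `commit` mirrors the real signature: once committed, the recorder cannot be reused or committed twice.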
For each recorded use, calls
XlogDeviceRuntime::finish_block_use which records an
event on launch_stream and folds it into the block’s
dependency state (writers replace last_write and clear
outstanding_reads; readers append to
outstanding_reads). Repeated registrations of the same
block are deduplicated identically to preflight.
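The dependency-state update can be sketched as below. This is a toy model, not the runtime's code: `ToyBlockState` and `fold_event` are hypothetical, events are plain `u64` ids, and treating `ReadWrite` the same as a writer is an assumption. Only the field names `last_write` and `outstanding_reads` and the replace / clear / append behavior come from the description above.

```rust
/// Access kinds; `ReadWrite` follows the docs, the other names are assumed.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Access {
    Read,
    Write,
    ReadWrite,
}

/// Hypothetical per-block dependency state, with events as plain ids.
#[derive(Debug, Default, PartialEq, Eq)]
struct ToyBlockState {
    last_write: Option<u64>,
    outstanding_reads: Vec<u64>,
}

/// Folds a newly recorded event into the block's state.
/// Writers replace `last_write` and clear `outstanding_reads`;
/// readers append to `outstanding_reads`.
fn fold_event(state: &mut ToyBlockState, access: Access, event: u64) {
    match access {
        Access::Read => state.outstanding_reads.push(event),
        // Assumption: a read-write access is treated as a writer here.
        Access::Write | Access::ReadWrite => {
            state.last_write = Some(event);
            state.outstanding_reads.clear();
        }
    }
}
```

Clearing the reads on a write is safe in this model because the writer already waited on them during preflight, so any later access only needs to wait on the write event.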