Host Control, Device Data
xlog-runtime::Executor manages:
- relation names, relation IDs, generations, and schemas;
- a `RelationStore` backed by `CudaBuffer` values;
- runtime statistics and join-selectivity observations;
- persistent build-side join indexes;
- dispatch counters for optimized routes;
- recursive SCC state for seed, delta, and merge phases.
xlog-cuda::CudaKernelProvider owns the device-facing operations. It loads CUDA
artifacts, allocates tracked slices, launches kernels, and records transfer
telemetry where a path needs a no-host-transfer assertion.
This means the executor is not itself GPU-resident. The relation state and kernel
workspaces are.
RIR Evaluation
The executor evaluates RIR nodes into relation buffers:

| RIR work | GPU behavior |
|---|---|
| Scan | Return the current relation buffer from the store. |
| Filter | Build boolean masks with typed comparison, arithmetic, and boolean kernels, then compact selected rows. |
| Project | Select, reorder, or compute output columns. |
| Join | Dispatch hash join, nested-loop join, WCOJ, Free Join, or a fallback route depending on shape and runtime gates. |
| Groupby | Run recorded aggregate kernels for supported key/value widths. |
| Recursive SCC | Execute semi-naive seed and delta variants until convergence or a configured iteration limit. |
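
The recursive SCC row can be made concrete with a small CPU-side model. The sketch below is illustrative only and is not xlog code: `transitive_closure`, its `max_iterations` parameter, and the set-based relations are assumptions chosen to show the seed, delta, and merge phases of semi-naive evaluation for a single recursive rule.

```python
def transitive_closure(edges, max_iterations=1000):
    """Semi-naive evaluation of:
        path(x, y) :- edge(x, y).
        path(x, z) :- path(x, y), edge(y, z).
    Returns (facts, iterations, converged).
    """
    # Index edges by source so the delta join is a lookup, not a scan.
    by_src = {}
    for src, dst in edges:
        by_src.setdefault(src, set()).add(dst)

    # Seed phase: evaluate the non-recursive rule once.
    total = set(edges)
    delta = set(edges)

    iterations = 0
    while delta and iterations < max_iterations:
        # Delta phase: join only the newly derived facts against edge.
        derived = {(x, z) for (x, y) in delta for z in by_src.get(y, ())}
        # Merge phase: facts not already known become the next delta.
        delta = derived - total
        total |= delta
        iterations += 1

    # An empty delta means a fixpoint was reached before the limit.
    converged = not delta
    return total, iterations, converged
```

Each round joins only the previous round's new facts, which is what distinguishes the delta variant from re-running the full rule. The loop stops when a round derives nothing new or when the iteration limit is hit, and the third return value reports which of the two happened.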
Predicate And Arithmetic Masks
Filters lower into a mask pipeline:
- Arithmetic expression kernels produce temporary columns.
- Typed comparison kernels produce boolean masks.
- Boolean mask kernels combine predicates with `and`, `or`, and `not`.
- Stream compaction writes the filtered output buffer.
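
The four stages can be modeled on the CPU with plain Python lists standing in for device columns. This is a sketch under stated assumptions, not the runtime's kernels: the helper names (`arithmetic`, `compare`, `mask_and`, `mask_not`, `compact`), the column names, and the example predicate are all hypothetical.

```python
import operator

def arithmetic(op, left, right):
    """Elementwise arithmetic over two columns; yields a temporary column."""
    return [op(a, b) for a, b in zip(left, right)]

def compare(op, column, constant):
    """Compare a column against a constant; yields a boolean mask."""
    return [op(value, constant) for value in column]

def mask_and(a, b):
    """Combine two boolean masks with logical and."""
    return [x and y for x, y in zip(a, b)]

def mask_not(a):
    """Negate a boolean mask."""
    return [not x for x in a]

def compact(columns, mask):
    """Stream compaction: keep only the rows whose mask entry is True."""
    return {
        name: [v for v, keep in zip(values, mask) if keep]
        for name, values in columns.items()
    }

def filter_large_orders(columns):
    """Filter: price * qty > 100 and not (region == 3)."""
    total = arithmetic(operator.mul, columns["price"], columns["qty"])  # temp column
    is_large = compare(operator.gt, total, 100)                        # mask 1
    in_region_3 = compare(operator.eq, columns["region"], 3)           # mask 2
    mask = mask_and(is_large, mask_not(in_region_3))                   # combined mask
    return compact(columns, mask)                                      # filtered output

orders = {"price": [10, 20, 30, 40], "qty": [5, 10, 1, 3], "region": [1, 3, 2, 1]}
print(filter_large_orders(orders))  # {'price': [40], 'qty': [3], 'region': [1]}
```

Only the last row survives: the second row also has a product above 100, but its region is 3, so the negated mask removes it before compaction.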
Joins And Recursion
Ordinary joins remain the baseline path. The runtime can also route specific shapes through specialized kernels:
- hash joins for normal binary joins;
- nested-loop joins for small eligible products;
- WCOJ kernels for recognized triangle, 4-cycle, and clique shapes;
- Free Join for broader multiway bodies (on main, not yet in a release as of 0.9.2);
- factorized recursive-delta routing (on main, not yet in a release as of 0.9.2).
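
A toy model of route selection may help show how a runtime gate, a shape test, and a dispatch counter fit together. Everything here is hypothetical: `JoinRouter`, `nested_loop_limit`, `nested_loop_enabled`, and the counter names are illustrative assumptions, and only the hash and nested-loop routes are modeled.

```python
from collections import Counter, defaultdict

def hash_join(left, right, left_key, right_key):
    """Build a hash index on the right side, then probe it with left rows."""
    index = defaultdict(list)
    for row in right:
        index[row[right_key]].append(row)
    return [l + r for l in left for r in index.get(l[left_key], ())]

def nested_loop_join(left, right, left_key, right_key):
    """Compare every left row with every right row."""
    return [l + r for l in left for r in right if l[left_key] == r[right_key]]

class JoinRouter:
    """Picks a join route and counts how often each route was dispatched."""

    def __init__(self, nested_loop_limit=16, nested_loop_enabled=True):
        self.nested_loop_limit = nested_loop_limit      # max product size for the small route
        self.nested_loop_enabled = nested_loop_enabled  # runtime gate (kill switch when False)
        self.counters = Counter()

    def join(self, left, right, left_key, right_key):
        small = len(left) * len(right) <= self.nested_loop_limit
        if self.nested_loop_enabled and small:
            self.counters["nested_loop"] += 1
            return nested_loop_join(left, right, left_key, right_key)
        # Baseline path for everything that is not eligible or is gated off.
        self.counters["hash"] += 1
        return hash_join(left, right, left_key, right_key)
```

Disabling the gate forces the baseline route for the same inputs, so the two results can be compared directly and the counters show which route actually ran.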
Ingestion And Diagnostics
Large graph ingestion and delta diagnostics are adjacent runtime surfaces rather than the core RIR loop:
- `xlog_gpu::biokg::StreamingGraphRelationLoader` streams JSONL, CSV, and N-Triples graph rows into typed edge records with bounded-memory telemetry.
- `DeltaPlannerTelemetry` reports cache reuse, fallback decisions, affected SCCs, recomputed SCCs, and estimated versus measured delta behavior.
- `pyxlog` exposes planner telemetry through diagnostic result payloads.
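
To illustrate what bounded-memory streaming into typed edge records can look like, here is a minimal sketch for JSONL input only. It does not reproduce the `StreamingGraphRelationLoader` API: the `Edge` fields, the `src`/`dst`/`label` keys, the `stream_edges` function, and the telemetry keys are assumptions made for the example.

```python
import json
from typing import NamedTuple

class Edge(NamedTuple):
    src: int
    dst: int
    label: str

def stream_edges(lines, batch_size, telemetry):
    """Yield lists of at most batch_size typed edges parsed from JSONL lines.

    Only one batch is buffered at a time, so memory use is bounded by
    batch_size regardless of how many lines the input contains.
    """
    batch = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        batch.append(Edge(int(record["src"]), int(record["dst"]), str(record["label"])))
        telemetry["rows"] += 1
        telemetry["peak_buffered_rows"] = max(telemetry["peak_buffered_rows"], len(batch))
        if len(batch) == batch_size:
            telemetry["batches"] += 1
            yield batch
            batch = []
    if batch:
        # Flush the final partial batch.
        telemetry["batches"] += 1
        yield batch
```

A caller would pass an open file (or any iterable of lines) and a telemetry dict initialized as `{"rows": 0, "batches": 0, "peak_buffered_rows": 0}`; after the stream is consumed, `peak_buffered_rows` never exceeds `batch_size`.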
What To Verify
When you need to prove that a workload used the intended GPU path, check:
- the route counter for the optimized dispatch;
- kill-switch parity against the fallback route;
- transfer telemetry for no-host-transfer claims;
- CUDA-required validation when the claim depends on actual GPU execution.
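
The first three checks can be bundled into a single helper. The sketch below is a hypothetical harness, not part of xlog or pyxlog: it assumes a caller-supplied `run_query(kill_switch)` function that returns the result rows plus a telemetry dict with a `routes` counter and a `host_transfers` count. CUDA-required validation is left out because it depends on the actual device environment.

```python
from collections import Counter

def verify_route(run_query, route):
    """Run a query with and without the kill switch and compare the outcomes.

    run_query(kill_switch) must return (rows, telemetry), where telemetry is
    a dict with "routes" (a mapping of route name to dispatch count) and
    "host_transfers" (number of host transfers observed on the path).
    """
    fast_rows, fast = run_query(kill_switch=False)
    slow_rows, slow = run_query(kill_switch=True)
    return {
        # The optimized route was actually dispatched.
        "route_taken": fast["routes"].get(route, 0) > 0,
        # The kill switch really disabled it.
        "fallback_when_killed": slow["routes"].get(route, 0) == 0,
        # Both routes produce the same rows, ignoring order.
        "parity": sorted(fast_rows) == sorted(slow_rows),
        # The optimized path stayed on the device.
        "no_host_transfer": fast["host_transfers"] == 0,
    }

def toy_query(kill_switch):
    """Stand-in workload: same rows on both routes, different row order."""
    routes = Counter()
    rows = [(1, 2), (2, 3)]
    if kill_switch:
        routes["hash_join"] += 1
    else:
        routes["wcoj_triangle"] += 1
        rows = list(reversed(rows))
    return rows, {"routes": routes, "host_transfers": 0}

report = verify_route(toy_query, "wcoj_triangle")
```

A claim about the optimized path holds only when every entry in `report` is true; a single false entry points at which check to investigate.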