Module wcoj_dispatch

v0.6.2 minimal env-gated GPU 3-way WCOJ triangle dispatch.

Two public entry points: try_wcoj_triangle_u32_dispatch (env-driven) and try_wcoj_triangle_u32_dispatch_with_gate (boolean-driven, for tests). The slice is intentionally narrow:

  • Env flag only. XLOG_USE_WCOJ_TRIANGLE_U32=1 (or a case-insensitive true) opts in. Anything else (unset, 0, false, empty string, etc.) means the helper returns Ok(None) unconditionally, and the caller takes the existing binary-join path.
  • Recognizes exactly one shape. A rule of the form tri(X, Y, Z) :- e1(X, Y), e2(Y, Z), e3(X, Z) over 2-column WCOJ-eligible relations (U32, Symbol, or U64 keys — see WcojKeyWidth): three positive 2-arity body atoms covering the head’s three distinct variables in head-position order. No negation, no comparison filters, no recursion (head predicate not in body), no reversed-axis atoms (e.g. e1(Y, X)), no constants in atom args. The planner must also return [xlog_logic::hypergraph::RulePlan::MultiwayCandidate].
  • Width uniformity. All three slots must share a key width. A mixed-width triangle (e.g. e1 U32, e2 U64) is rejected at this dispatch level — the binary-join chain handles it.
  • Silent fallback. Any mismatch — gate off, shape mismatch, planner verdict not multiway, missing input buffer, unsupported scalar type, mixed-width slots — returns Ok(None) without an error or log line. The caller is expected to silently route to the existing binary-join path. This keeps the env flag truly opt-in and prevents the helper from accidentally diverting work it can’t handle.
  • Strict GPU pipeline on dispatch. When all checks pass, the helper builds three sorted+deduped layouts and runs the matching WCOJ triangle kernel on the configured launch_stream: wcoj_layout_u32_recorded / wcoj_triangle_u32_recorded for 4-byte keys, the _u64_recorded siblings for 8-byte keys. All [xlog_cuda::launch::LaunchRecorder] discipline carries through unchanged.
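The shape and width checks above can be sketched on a simplified rule model. This is an illustrative sketch only, not the module's code: Atom, atom, KeyWidth, is_triangle_shape, and uniform_width are hypothetical names. The model assumes every atom is positive and every argument is a variable, so negation, filters, and constants cannot be represented. It also assumes the body atoms appear in the order (X, Y), (Y, Z), (X, Z), and it omits the planner verdict entirely.

```rust
/// Hypothetical, simplified atom: a predicate name plus variable names.
pub struct Atom<'a> {
    pub pred: &'a str,
    pub args: Vec<&'a str>,
}

/// Hypothetical constructor, for brevity.
pub fn atom<'a>(pred: &'a str, args: &[&'a str]) -> Atom<'a> {
    Atom { pred, args: args.to_vec() }
}

/// Hypothetical stand-in for the module's key-width notion.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum KeyWidth {
    Bytes4,
    Bytes8,
}

/// True only for tri(X, Y, Z) :- e1(X, Y), e2(Y, Z), e3(X, Z).
pub fn is_triangle_shape(head: &Atom, body: &[Atom]) -> bool {
    // The head must carry exactly three distinct variables.
    let (x, y, z) = match head.args.as_slice() {
        [x, y, z] if x != y && y != z && x != z => (*x, *y, *z),
        _ => return false,
    };
    // Exactly three body atoms.
    let [e1, e2, e3] = body else {
        return false;
    };
    // No recursion: the head predicate must not appear in the body.
    if body.iter().any(|a| a.pred == head.pred) {
        return false;
    }
    // Axes in head-position order; this also enforces arity 2 and
    // rejects reversed atoms such as e1(Y, X).
    e1.args == [x, y] && e2.args == [y, z] && e3.args == [x, z]
}

/// Some(width) when all three slots agree, None for a mixed-width triangle.
pub fn uniform_width(slots: [KeyWidth; 3]) -> Option<KeyWidth> {
    (slots[0] == slots[1] && slots[1] == slots[2]).then_some(slots[0])
}
```

In this sketch a false or None result corresponds to the silent Ok(None) fallback: the caller learns only that the rule was not handled, not why.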

What this slice deliberately does NOT do:

  • No automatic detection at the executor level — callers pass the rule + input buffers explicitly. Executor wiring lives in xlog-runtime.
  • No recursion / SCC mixed execution.
  • No cost model.
  • No mixed-width admission (U32+U64 triangle stays on the binary-join path).
  • No histogram-guided block dispatch.

Constants

ENV_USE_WCOJ_TRIANGLE_U32
Env variable controlling the dispatch gate. Treated as ON when set to "1" or case-insensitive "true"; anything else (unset, "0", "false", empty string, …) means OFF.
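The gate rule described above can be sketched as a small pure function plus an env reader. This is a minimal sketch under stated assumptions: gate_value_is_on and gate_from_env are hypothetical helper names, and the constant's value is assumed to be the variable name given in the module description. Whitespace is not trimmed here, so a value such as " 1" counts as OFF; the documentation does not say how the real implementation treats that case.

```rust
use std::env;

/// Assumed value: the variable name given in the module description.
pub const ENV_USE_WCOJ_TRIANGLE_U32: &str = "XLOG_USE_WCOJ_TRIANGLE_U32";

/// Hypothetical helper: ON for "1" or a case-insensitive "true",
/// OFF for everything else, including an unset variable.
pub fn gate_value_is_on(raw: Option<&str>) -> bool {
    match raw {
        Some(v) => v == "1" || v.eq_ignore_ascii_case("true"),
        None => false,
    }
}

/// Hypothetical helper: reads the gate from the process environment.
pub fn gate_from_env() -> bool {
    // env::var fails for unset or non-Unicode values; both count as OFF.
    gate_value_is_on(env::var(ENV_USE_WCOJ_TRIANGLE_U32).ok().as_deref())
}
```

Keeping the string rule separate from the env read lets the rule be tested without mutating process-wide state.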

Functions

try_wcoj_triangle_u32_dispatch
Env-driven entry. Reads XLOG_USE_WCOJ_TRIANGLE_U32 and delegates to try_wcoj_triangle_u32_dispatch_with_gate.
try_wcoj_triangle_u32_dispatch_with_gate
Test-friendly form that takes the gate as an explicit boolean. Production callers use try_wcoj_triangle_u32_dispatch which reads the env var.
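The caller-side contract (Ok(None) means "not handled, take the binary-join path") might look like the following. The real signatures are not shown in this documentation, so every name and type here is a hypothetical stand-in: Path, DispatchError, try_dispatch_with_gate, and evaluate are illustrative only, and the single eligible flag stands in for all shape, planner, buffer, and width checks.

```rust
/// Which execution path handled the rule (hypothetical, for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Path {
    Wcoj,
    BinaryJoin,
}

/// Hypothetical error type; the real one is not shown in this documentation.
#[derive(Debug)]
pub struct DispatchError(pub String);

/// Gate-taking form: Ok(None) means "not handled, use the fallback".
pub fn try_dispatch_with_gate(
    gate: bool,
    eligible: bool,
) -> Result<Option<Vec<(u32, u32, u32)>>, DispatchError> {
    if !gate || !eligible {
        return Ok(None); // silent fallback: no error, no log line
    }
    // Placeholder for the GPU pipeline; returns one fixed triangle.
    Ok(Some(vec![(1, 2, 3)]))
}

/// Caller side: route to the binary-join path whenever the helper declines.
pub fn evaluate(gate: bool, eligible: bool) -> Result<Path, DispatchError> {
    match try_dispatch_with_gate(gate, eligible)? {
        Some(_triangles) => Ok(Path::Wcoj),
        None => Ok(Path::BinaryJoin),
    }
}
```

Taking the gate as a plain boolean lets tests exercise both branches without setting or clearing the environment variable.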