When WCOJ dispatches
A rule is a candidate for WCOJ only when three things line up. Miss any one and the rule runs on the ordinary binary plan.
The body is positive relational Datalog
Every body atom is a relation scan. Negation, aggregation, `is` expressions, ground facts, and too few positive atoms all disqualify the rule.
The body matches a certified shape
XLOG certifies a fixed set of join geometries: a triangle `e(X,Y), e(Y,Z), e(X,Z)`, a 4-cycle `e(W,X), e(X,Y), e(Y,Z), e(Z,W)`, and K5 / K6 cliques in which all C(K,2) edges are present (10 edges for K5, 15 for K6). Shapes outside this set are not automatically eligible for WCOJ.
Eligible is not the same as dispatched
This is the point most people miss. A rule can clear every check above, be promoted into a `MultiWayJoin` plan node, and still execute on binary joins. Promotion makes a rule eligible; a second decision, made at runtime from cost and statistics, decides whether it is actually dispatched.
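The two promotion checks above can be modeled as a pure function over the rule body. This is a minimal sketch under stated assumptions, not XLOG's promoter: `BodyItem`, `Shape`, and `certified_shape` are hypothetical names, every atom is assumed to be a binary edge atom, and relation names are ignored. Passing this check corresponds to eligibility only, not to dispatch.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical, simplified body item; XLOG's real rule representation differs.
#[derive(Debug)]
enum BodyItem {
    /// A positive binary relation scan, holding its two variable names, e.g. e(X, Y).
    Scan(&'static str, &'static str),
    Negated,
    Aggregate,
    IsExpr,
    GroundFact,
}

#[derive(Debug, PartialEq)]
enum Shape {
    Triangle,
    FourCycle,
    Clique(usize), // K5 or K6
}

/// Returns the certified shape when the body is eligible for promotion, else None.
fn certified_shape(body: &[BodyItem]) -> Option<Shape> {
    // Variable pairs, stored unordered so e(X,Y) and e(Y,X) name the same edge.
    let mut edges: BTreeSet<(&str, &str)> = BTreeSet::new();
    for item in body {
        // Negation, aggregation, `is` expressions, and ground facts all disqualify.
        let BodyItem::Scan(a, b) = item else { return None };
        if a == b {
            return None; // a self-loop is not part of any certified shape
        }
        let pair = if a < b { (*a, *b) } else { (*b, *a) };
        if !edges.insert(pair) {
            return None; // a repeated edge is not part of any certified shape
        }
    }
    // Assumed minimum: the smallest certified shape (the triangle) needs 3 atoms.
    if edges.len() < 3 {
        return None;
    }
    let mut degree: HashMap<&str, usize> = HashMap::new();
    for &(a, b) in &edges {
        *degree.entry(a).or_default() += 1;
        *degree.entry(b).or_default() += 1;
    }
    let k = degree.len();
    let complete = edges.len() == k * (k - 1) / 2; // all C(k,2) pairs present
    match k {
        3 if complete => Some(Shape::Triangle),
        5 | 6 if complete => Some(Shape::Clique(k)),
        // 4 variables, 4 edges, every variable on exactly 2 edges: a single 4-cycle.
        4 if edges.len() == 4 && degree.values().all(|&d| d == 2) => Some(Shape::FourCycle),
        _ => None, // e.g. K4, paths, or longer cycles are outside the certified set
    }
}
```

A K4 body (four variables, six edges) falls through to `None`, which matches the rule that shapes outside the certified set are not automatically eligible.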
Every promoted MultiWayJoin carries an embedded fallback: the post-optimizer binary
plan that would have run without promotion. When the dispatcher declines, the executor
descends into that fallback. Common reasons the dispatcher declines an eligible rule:
- The cost model, given the available statistics, predicts the binary plan wins.
- Statistics are missing or too thin for the cost model to justify WCOJ.
- The order planner would have to start from a bad prefix that materializes a larger intermediate than the binary plan, so it declines rather than dispatching a known loss.
- A kill switch or force-off setting is active (see below).
- Runtime validation fails — missing relation buffers, mixed width classes, a projection that does not match the certified shape.
- For a K-clique, the statistics are incomplete or a hash route is predicted to win. This is a planned hash route, not a promoter miss.
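The decline reasons above can be read as a chain of guards in front of the embedded fallback. The sketch below is illustrative only: `DispatchInputs`, `Route`, and `dispatch` are hypothetical, the order of the checks is an assumption, and force gates and the K-clique hash route are left out. Every declining branch lands on the binary fallback, so an eligible rule still produces rows.

```rust
/// Hypothetical inputs to the runtime dispatch decision; every field name is illustrative.
#[derive(Debug, Clone, Copy)]
struct DispatchInputs {
    kill_switch: bool,      // a force-off or kill switch is active
    runtime_valid: bool,    // buffers present, width classes match, projection fits the shape
    stats_sufficient: bool, // enough cardinality and selectivity evidence
    wcoj_cost: f64,         // cost-model estimate for the WCOJ route
    binary_cost: f64,       // cost-model estimate for the embedded binary fallback
}

#[derive(Debug, PartialEq)]
enum Route {
    Wcoj,
    /// The executor descends into the embedded binary plan; the string records why.
    BinaryFallback(&'static str),
}

fn dispatch(inp: &DispatchInputs) -> Route {
    if inp.kill_switch {
        return Route::BinaryFallback("kill switch or force-off");
    }
    if !inp.runtime_valid {
        return Route::BinaryFallback("runtime validation failed");
    }
    if !inp.stats_sufficient {
        return Route::BinaryFallback("statistics too thin");
    }
    // Assumption: ties go to the binary plan, so WCOJ must show a strict win.
    if inp.binary_cost <= inp.wcoj_cost {
        return Route::BinaryFallback("binary plan predicted to win");
    }
    Route::Wcoj
}
```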
Environment-variable controls
Environment variables are process-global: set them once at process startup, and in tests prefer the per-runtime builders below to avoid cross-test bleed. Unset, empty, `0`, and `false` never force a route on.
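A minimal sketch of that truthiness rule, assuming only `1` and `true` turn a flag on. Treating any other value (such as `yes`) as off and matching `true` case-insensitively are assumptions; `flag_enabled` and `env_flag` are hypothetical helpers, not XLOG functions.

```rust
/// Only `1` or `true` turns a flag on. Unset, empty, `0`, and `false` never do.
/// Assumptions: `true` matches case-insensitively, and any other value is off.
fn flag_enabled(raw: Option<&str>) -> bool {
    match raw {
        Some("1") => true,
        Some(v) => v.eq_ignore_ascii_case("true"),
        None => false,
    }
}

/// Reads a flag from the process environment. The environment is shared by the
/// whole process, so every executor in it sees the same value.
fn env_flag(name: &str) -> bool {
    flag_enabled(std::env::var(name).ok().as_deref())
}
```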
| Variable | Values | Controls | Effect |
|---|---|---|---|
| `XLOG_USE_WCOJ_TRIANGLE_U32` | `1` / `true` | Triangle force gate | Forces recognized triangle dispatch, bypassing the adaptive classifier. |
| `XLOG_DISABLE_WCOJ_TRIANGLE` | `1` / `true` | Triangle kill switch | Pins all triangle WCOJ off; beats every other triangle flag. |
| `XLOG_USE_WCOJ_4CYCLE` | `1` / `true` | 4-cycle force gate | Forces recognized 4-cycle dispatch. |
| `XLOG_USE_WCOJ_4CYCLE_ADAPTIVE` | `1` / `true` | 4-cycle adaptive opt-in | Lets the cost model decide 4-cycle dispatch. Off by default. |
| `XLOG_DISABLE_WCOJ_4CYCLE` | `1` / `true` | 4-cycle kill switch | Beats both the force gate and adaptive mode. |
| `XLOG_WCOJ_COST_MODEL` | `cardinality`, `skew`, `skewclassifier` | Runtime cost model | Selects the dispatch cost model. Unset defaults to `Cardinality`; invalid non-empty values resolve to `SkewClassifier`. |
| `XLOG_WCOJ_BLOCK_WORK_UNIT` | integer `1..8192` | Block-slice work unit | Per-block work granularity. Default `1024`; invalid values warn and fall back to the default. |
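The resolution rules in this table can be sketched as three small functions. `CostModel` is a local stand-in (not XLOG's `CostModelKind`), and all function names are hypothetical. Several details are assumptions rather than documented behavior: case-insensitive matching, an empty `XLOG_WCOJ_COST_MODEL` treated like unset, force winning over adaptive when both 4-cycle flags are set, and no 4-cycle dispatch when neither is set.

```rust
/// Local stand-in for the two runtime cost models; not XLOG's `CostModelKind`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum CostModel {
    Cardinality,
    SkewClassifier,
}

/// XLOG_WCOJ_COST_MODEL: unset (and, by assumption, empty) means Cardinality;
/// `skew`, `skewclassifier`, and any other non-empty value mean SkewClassifier.
fn resolve_cost_model(raw: Option<&str>) -> CostModel {
    match raw.map(|v| v.to_ascii_lowercase()) {
        None => CostModel::Cardinality,
        Some(v) if v.is_empty() || v == "cardinality" => CostModel::Cardinality,
        Some(_) => CostModel::SkewClassifier,
    }
}

const DEFAULT_BLOCK_WORK_UNIT: u32 = 1024;

/// XLOG_WCOJ_BLOCK_WORK_UNIT: integers in 1..=8192 are accepted; anything
/// else warns on stderr and falls back to the default.
fn resolve_block_work_unit(raw: Option<&str>) -> u32 {
    let Some(v) = raw else { return DEFAULT_BLOCK_WORK_UNIT };
    match v.parse::<u32>() {
        Ok(n) if (1..=8192).contains(&n) => n,
        _ => {
            eprintln!("warning: ignoring XLOG_WCOJ_BLOCK_WORK_UNIT={:?}; using {}", v, DEFAULT_BLOCK_WORK_UNIT);
            DEFAULT_BLOCK_WORK_UNIT
        }
    }
}

#[derive(Debug, PartialEq)]
enum FourCycleMode {
    Off,
    Forced,
    Adaptive,
}

/// 4-cycle flags: the kill switch beats both force and adaptive.
/// Force winning over adaptive, and Off when neither is set, are assumptions.
fn resolve_four_cycle(force: bool, adaptive: bool, disable: bool) -> FourCycleMode {
    if disable {
        FourCycleMode::Off
    } else if force {
        FourCycleMode::Forced
    } else if adaptive {
        FourCycleMode::Adaptive
    } else {
        FourCycleMode::Off
    }
}
```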
Triangle adaptive dispatch, which is on by default, and the triangle hard-disable are exposed as runtime-config builders rather than environment resolvers. Use `with_wcoj_triangle_dispatch_adaptive(...)` and `with_wcoj_triangle_dispatch_disabled(...)` for those two controls.
Factorized execution controls (development branch only)
The variables below govern three factorized-execution features: GPU Free Join, factorized recursive deltas, and aggregate-fused WCOJ. These features live on the development branch and are not part of the v0.9.2 release. In a released build the variables have no effect, because the features they gate are not present. See the factorized execution guide for what they do and their current status.
| Variable | Values | Gates (unreleased) | Effect |
|---|---|---|---|
| `XLOG_DISABLE_FREE_JOIN` | `1` / `true` | GPU Free Join | Forces general multiway bodies through the binary fallback instead of the Free Join engine. |
| `XLOG_DISABLE_WCOJ_GROUPBY_FUSION` | `1` / `true` | Aggregate-fused WCOJ | Forces count/sum/min/max-by-root over a triangle body to materialize and then group, instead of using the fused aggregate. |
| `XLOG_DISABLE_FACTORIZED_DELTA` | `1` / `true` | Factorized recursive deltas | Forces every semi-naive delta step through the legacy hash-join-then-diff path. |
| `XLOG_FACTORIZED_DELTA_MAX_DOMAIN` | integer | Factorized recursive deltas | Largest dense domain the bitvector delta route accepts (default 2^14 = 16,384; hard bound 2^16 = 65,536). Above it, the sparse or legacy route runs. |
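A sketch of how the two delta knobs could interact, together with the generic bitvector set difference that a dense-domain semi-naive step relies on. `choose_delta_route` and `bitvector_delta` are hypothetical names; capping a configured maximum at the 2^16 hard bound is an assumption, and the sketch does not separate the sparse and legacy routes because the table does not say which one runs.

```rust
const DEFAULT_MAX_DOMAIN: usize = 1 << 14; // documented default, 2^14
const HARD_MAX_DOMAIN: usize = 1 << 16; // documented hard bound, 2^16

#[derive(Debug, PartialEq)]
enum DeltaRoute {
    Bitvector,
    SparseOrLegacy,
    Legacy,
}

fn choose_delta_route(disabled: bool, configured_max: Option<usize>, domain: usize) -> DeltaRoute {
    if disabled {
        return DeltaRoute::Legacy; // XLOG_DISABLE_FACTORIZED_DELTA is set
    }
    // Assumption: a configured value above the hard bound is capped at it.
    let max = configured_max.unwrap_or(DEFAULT_MAX_DOMAIN).min(HARD_MAX_DOMAIN);
    if domain <= max {
        DeltaRoute::Bitvector
    } else {
        DeltaRoute::SparseOrLegacy
    }
}

/// Generic semi-naive step over a dense domain stored as 64-bit words:
/// delta = derived AND NOT known, then known |= delta.
fn bitvector_delta(known: &mut [u64], derived: &[u64]) -> Vec<u64> {
    known
        .iter_mut()
        .zip(derived)
        .map(|(k, &d)| {
            let fresh = d & !*k; // bits derived this round but not seen before
            *k |= fresh;
            fresh
        })
        .collect()
}
```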
Per-runtime and compile-time builders
When one process needs different WCOJ behavior for different executors, or you want to avoid process-global environment variables in tests, configure a `RuntimeConfig` and attach it to the executor.
RuntimeConfig decides whether a recognized shape dispatches at runtime. CompilerConfig
is separate: it decides whether the compiler emits a non-default variable ordering for
triangle and 4-cycle plans. The default preserves the baseline leader order; enable a
heat-aware ordering only when statistics are meaningful.
wcoj_var_ordering_threshold is a ratio gate: the model rotates the leader only when
candidate_score / default_leader_score is at or below the threshold. The default is
0.5; smaller values demand a clearer win before rotating. Values outside the interval
(0.0, 1.0], as well as NaN and infinities, fall back to the default.
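The threshold rules translate directly into code. This sketch assumes the score is the relation's cardinality and that the default leader is the first relation, which matches the `LeaderCardinality` description but is not taken from XLOG's source; `sanitize_threshold`, `should_rotate`, and `pick_leader` are hypothetical names.

```rust
const DEFAULT_THRESHOLD: f64 = 0.5;

/// Thresholds outside (0.0, 1.0], NaN, and infinities fall back to the default.
fn sanitize_threshold(t: f64) -> f64 {
    if t.is_finite() && t > 0.0 && t <= 1.0 {
        t
    } else {
        DEFAULT_THRESHOLD
    }
}

/// Ratio gate: rotate only when candidate_score / default_leader_score <= threshold.
fn should_rotate(candidate_score: f64, default_leader_score: f64, threshold: f64) -> bool {
    // Assumption: a non-positive default score gives no meaningful ratio, so keep the default.
    if default_leader_score <= 0.0 {
        return false;
    }
    candidate_score / default_leader_score <= sanitize_threshold(threshold)
}

/// LeaderCardinality-style choice: the smallest relation becomes leader only if
/// it passes the gate. Index 0 is assumed to be the default leader.
fn pick_leader(cardinalities: &[u64], threshold: f64) -> usize {
    let Some((best, &best_card)) = cardinalities.iter().enumerate().min_by_key(|&(_, c)| *c) else {
        return 0;
    };
    if should_rotate(best_card as f64, cardinalities[0] as f64, threshold) {
        best
    } else {
        0
    }
}
```

With cardinalities `[1000, 300, 800]` the ratio is 0.3, so the default threshold of 0.5 rotates the leader to index 1; with `[1000, 600, 800]` the ratio is 0.6 and the default leader stays.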
Choosing a cost model
There are two independent cost-model layers. Runtime dispatch picks whether an eligible shape dispatches:
- `CostModelKind::Cardinality` is the default. Use it when relation cardinalities and observed selectivity are populated, or when you just want the production route.
- `CostModelKind::SkewClassifier` is the conservative opt-out. Use it to prove fallback behavior or to bisect a suspected dispatch regression.
Compiler ordering picks the variable order the compiler emits for triangle and 4-cycle plans:
- `WcojVarOrderingKind::Disabled` (the default) keeps the leader order bit-identical to the original slices.
- `LeaderCardinality` picks the smallest relation as leader when the threshold gate shows a clear win; it is the simplest useful model.
- `HeatAware` combines cardinality, relation heat, and observed selectivity. Use it for skewed graphs or repeated workloads where the `StatsManager` has enough evidence to identify hot relations.
| Situation | Runtime model | Compiler ordering |
|---|---|---|
| First run, little statistics | Cardinality (or force only for experiments) | Disabled |
| Stable batch with seeded cardinalities | Cardinality | LeaderCardinality |
| Repeated skewed workload with heat evidence | Cardinality | HeatAware |
| Debugging fallback or bisecting | SkewClassifier or explicit force-off | Disabled |
| K-clique with incomplete statistics | Planned hash is expected | Seed complete statistics first |
Tuning workflow
Tune one knob at a time and record both the dispatch counters and row-set equality; never trust a speedup you have not proven row-equivalent.
- `wcoj_var_ordering_threshold`: lower it (for example `0.25`) when leader rotation is too eager and layout overhead dominates; raise it toward `0.75` only after row parity is stable and profiling shows the default leader is repeatedly expensive.
- `XLOG_WCOJ_BLOCK_WORK_UNIT`: lower it for severe leader-key skew, where a few keys dominate the work; raise it for uniform inputs, where launch overhead dominates. Stay within `1..8192`. This knob must not change row sets.
- Force gates: use them to prove the WCOJ route produces the same rows, not as a permanent substitute for a cost model. Leave them on only for a fixed, benchmarked workload.
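Row-set equality is the check that makes every other tuning step trustworthy. A minimal, order-insensitive comparison, assuming each row is a tuple that implements `Ord`; the function name is hypothetical. Comparing sorted copies treats results as multisets, so a route that emits a duplicate row also counts as a mismatch.

```rust
/// Order-insensitive comparison of two runs of the same rule, for example a
/// forced-on WCOJ run against a forced-off binary run.
fn rows_equivalent<R: Ord + Clone>(a: &[R], b: &[R]) -> bool {
    if a.len() != b.len() {
        return false; // different row counts can never be equivalent
    }
    let mut left = a.to_vec();
    let mut right = b.to_vec();
    left.sort();
    right.sort();
    left == right
}
```

If the two runs differ, treat any measured speedup as unproven until the mismatch is explained.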
Debug checklist
When WCOJ did not run:
- Check the shape: is it a triangle, a 4-cycle, or a K5/K6 clique?
- Check the key types: are they `u32`, `u64`, or `Symbol`?
- Check env and config: did you force off or disable the route?
- Check statistics: does the cost model have enough cardinality and selectivity evidence?
- Check counters: did the route fire but emit zero rows?
- Check fallback parity: does the forced-off output match the expected rows?
- Confirm the input is large enough to amortize layout and launch overhead.
- Try `LeaderCardinality` or `HeatAware`, but only with meaningful statistics.
- Tune `XLOG_WCOJ_BLOCK_WORK_UNIT` in one direction at a time.
- For a 4-cycle, verify that adaptive mode was intentionally enabled.
- For a K-clique, check whether the planner predicted hash but force was used anyway.