XLOG can evaluate certain join shapes with a worst-case-optimal join (WCOJ) instead of a chain of binary hash joins. On the shapes it covers, a WCOJ avoids materializing large intermediate relations, which is where binary plans blow up on skewed graphs. This guide is operational: it answers whether your rule routes through WCOJ, why an eligible rule might still fall back, and which knobs move the decision. For the kernel architecture, see the WCOJ architecture guide.
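To make the intermediate-relation point concrete, here is a small CPU-only sketch, not XLOG's kernel. It compares how many rows a binary plan would materialize for e(X,Y) joined with e(Y,Z) against the triangles that an intersection-style evaluation emits directly. The function names and the hub-shaped test graph are illustrative assumptions, and the edge list is assumed to contain no duplicates.

```rust
use std::collections::{HashMap, HashSet};

/// Builds a skewed graph: `n` nodes point at hub 0, hub 0 points at `n` other
/// nodes, and one extra edge (1, n + 1) closes exactly one triangle.
fn hub_graph(n: u32) -> Vec<(u32, u32)> {
    let mut edges: Vec<(u32, u32)> = (1..=n).map(|i| (i, 0)).collect();
    edges.extend((n + 1..=2 * n).map(|j| (0, j)));
    edges.push((1, n + 1));
    edges
}

/// Rows a binary plan materializes for e(X,Y) JOIN e(Y,Z) before it can
/// filter on e(X,Z): every edge (x, y) pairs with every edge leaving y.
fn binary_intermediate_rows(edges: &[(u32, u32)]) -> usize {
    let mut out_degree: HashMap<u32, usize> = HashMap::new();
    for &(src, _) in edges {
        *out_degree.entry(src).or_insert(0) += 1;
    }
    edges
        .iter()
        .map(|&(_, y)| out_degree.get(&y).copied().unwrap_or(0))
        .sum()
}

/// Enumerates triangles e(X,Y), e(Y,Z), e(X,Z) by intersecting the successor
/// sets of X and Y, so no wedge that fails e(X,Z) is ever stored.
fn triangles_by_intersection(edges: &[(u32, u32)]) -> Vec<(u32, u32, u32)> {
    let mut successors: HashMap<u32, HashSet<u32>> = HashMap::new();
    for &(src, dst) in edges {
        successors.entry(src).or_default().insert(dst);
    }
    let empty = HashSet::new();
    let mut out = Vec::new();
    for &(x, y) in edges {
        let from_x = &successors[&x];
        let from_y = successors.get(&y).unwrap_or(&empty);
        // Z must satisfy both e(Y,Z) and e(X,Z).
        for &z in from_y.intersection(from_x) {
            out.push((x, y, z));
        }
    }
    out.sort_unstable();
    out
}
```

On `hub_graph(100)` the binary plan's intermediate holds 10,000 wedges through the hub, while the intersection route emits the single triangle (1, 0, 101). That gap is the effect the guide describes on skewed graphs.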

When WCOJ dispatches

A rule is a candidate for WCOJ only when three things line up. Miss any one and the rule runs on the ordinary binary plan.
  1. The body is positive relational Datalog. Every body atom is a relation scan. Negation, aggregation, is expressions, ground facts, and too few positive atoms all disqualify the rule.
  2. The body matches a certified shape. XLOG certifies a fixed set of join geometries: a triangle e(X,Y), e(Y,Z), e(X,Z); a 4-cycle e(W,X), e(X,Y), e(Y,Z), e(Z,W); and K5 / K6 cliques in which all C(K,2) edges are present. Shapes outside this set do not route through WCOJ automatically.
  3. The join keys are a supported type. WCOJ join keys must be u32, u64, or Symbol, and a single dispatch cannot mix width classes. (Symbol shares u32's physical layout.)
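The key-type rule can be sketched as a small check. This is an illustration under assumptions, not XLOG's validation code: `KeyType`, `WidthClass`, and `dispatch_width_class` are hypothetical names, and the sketch assumes Symbol falls into the same width class as u32 because the two share a physical layout.

```rust
/// Join-key types the guide lists as supported (hypothetical enum).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum KeyType {
    U32,
    U64,
    Symbol,
}

/// Physical width class of a key (hypothetical enum).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum WidthClass {
    Bits32,
    Bits64,
}

impl KeyType {
    fn width_class(self) -> WidthClass {
        match self {
            // Assumption: Symbol shares u32's layout, so it shares its class.
            KeyType::U32 | KeyType::Symbol => WidthClass::Bits32,
            KeyType::U64 => WidthClass::Bits64,
        }
    }
}

/// Returns the single width class shared by all join keys, or `None` when the
/// list is empty or mixes 32-bit and 64-bit keys.
fn dispatch_width_class(keys: &[KeyType]) -> Option<WidthClass> {
    let first = keys.first()?.width_class();
    keys.iter()
        .all(|key| key.width_class() == first)
        .then_some(first)
}
```

Under these assumptions, a body keyed on u32 and Symbol passes the check, while one that joins a u32 column against a u64 column is rejected and stays on the binary plan.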

Eligible is not the same as dispatched

This is the point most people miss. A rule can clear every check above, be promoted into a MultiWayJoin plan node, and still execute on binary joins. Promotion makes a rule eligible; a second decision — made at runtime from cost and statistics — decides whether it is actually dispatched. Every promoted MultiWayJoin carries an embedded fallback: the post-optimizer binary plan that would have run without promotion. When the dispatcher declines, the executor descends into that fallback. Common reasons the dispatcher declines an eligible rule:
  • The cost model, given the available statistics, predicts the binary plan wins.
  • Statistics are missing or too thin for the cost model to justify WCOJ.
  • The order planner would have to start from a bad prefix that materializes a larger intermediate than the binary plan, so it declines rather than dispatch the loss.
  • A kill switch or force-off setting is active (see below).
  • Runtime validation fails — missing relation buffers, mixed width classes, a projection that does not match the certified shape.
  • For a K-clique, the statistics are incomplete or a hash route is predicted to win. This is a planned hash route, not a promoter miss.
Fallback is not an error and it is never a correctness compromise. The embedded binary plan is always semantically equivalent to the WCOJ route — identical row sets. That equivalence is what lets you run with WCOJ enabled in production: the worst case is a slower plan, never a wrong answer. When you A/B a tuning change, compare a forced-on run against a forced-off run and confirm the row sets match before trusting any speedup.
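The decline reasons above can be read as a short decision procedure. The sketch below is one plausible ordering, not the dispatcher's actual code. The types and field names are hypothetical, and it assumes that a kill switch is checked first, that runtime validation must pass even when a force gate is set, and that a force gate bypasses the cost model.

```rust
/// Route the executor takes for one promoted MultiWayJoin node.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Route {
    Wcoj,
    BinaryFallback,
}

/// Why that route was chosen (hypothetical labels).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Reason {
    KillSwitch,
    ValidationFailed,
    Forced,
    MissingStatistics,
    CostModelPrefersBinary,
    CostModelPrefersWcoj,
}

/// Inputs to the runtime decision (hypothetical struct).
#[derive(Clone, Copy)]
struct DispatchInputs {
    kill_switch: bool,
    validation_ok: bool,
    forced_on: bool,
    /// Estimated (wcoj_cost, binary_cost); `None` when statistics are missing.
    estimated_costs: Option<(f64, f64)>,
}

fn decide(inputs: &DispatchInputs) -> (Route, Reason) {
    if inputs.kill_switch {
        return (Route::BinaryFallback, Reason::KillSwitch);
    }
    if !inputs.validation_ok {
        // Missing buffers, mixed width classes, or a mismatched projection.
        return (Route::BinaryFallback, Reason::ValidationFailed);
    }
    if inputs.forced_on {
        return (Route::Wcoj, Reason::Forced);
    }
    match inputs.estimated_costs {
        None => (Route::BinaryFallback, Reason::MissingStatistics),
        Some((wcoj, binary)) if wcoj < binary => (Route::Wcoj, Reason::CostModelPrefersWcoj),
        Some(_) => (Route::BinaryFallback, Reason::CostModelPrefersBinary),
    }
}
```

Whatever the real ordering is, the property that matters is that every branch ending in `BinaryFallback` runs the embedded plan and returns the same rows.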

Environment-variable controls

Environment variables are process-global — set them once at process startup, and in tests prefer the per-runtime builders below to avoid cross-test bleed. Unset, empty, 0, and false never force a route on.
| Variable | Values | Controls | Effect |
| --- | --- | --- | --- |
| XLOG_USE_WCOJ_TRIANGLE_U32 | 1 / true | Triangle force gate | Forces recognized triangle dispatch, bypassing the adaptive classifier. |
| XLOG_DISABLE_WCOJ_TRIANGLE | 1 / true | Triangle kill switch | Pins all triangle WCOJ off; beats every other triangle flag. |
| XLOG_USE_WCOJ_4CYCLE | 1 / true | 4-cycle force gate | Forces recognized 4-cycle dispatch. |
| XLOG_USE_WCOJ_4CYCLE_ADAPTIVE | 1 / true | 4-cycle adaptive opt-in | Lets the cost model decide 4-cycle dispatch. Off by default. |
| XLOG_DISABLE_WCOJ_4CYCLE | 1 / true | 4-cycle kill switch | Beats both force and adaptive. |
| XLOG_WCOJ_COST_MODEL | cardinality, skew, skewclassifier | Runtime cost model | Selects the dispatch cost model. Invalid non-empty values resolve to SkewClassifier; unset defaults to Cardinality. |
| XLOG_WCOJ_BLOCK_WORK_UNIT | integer 1..8192 | Block-slice work unit | Per-block work granularity. Default 1024; invalid values warn and fall back to the default. |
Triangle adaptive dispatch (default-on) and triangle hard-disable are exposed as runtime-config builders, not environment resolvers. Use with_wcoj_triangle_dispatch_adaptive(...) and with_wcoj_triangle_dispatch_disabled(...) for those two controls.
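The resolution rules in the table can be sketched as plain functions over raw variable values. This is a reading of the table, not XLOG's resolver, and all function and enum names are hypothetical. It assumes values are trimmed and matched case-insensitively, that an empty cost-model value behaves like unset, that the 1..8192 range is inclusive, and that the 4-cycle force gate wins over adaptive when both are set.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum CostModel {
    Cardinality,
    SkewClassifier,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum FourCycleMode {
    Disabled,
    Forced,
    Adaptive,
    Off,
}

const DEFAULT_WORK_UNIT: u32 = 1024;

/// Only `1` and `true` switch a flag on; unset, empty, `0`, and `false` do not.
fn flag_on(raw: Option<&str>) -> bool {
    matches!(raw.map(str::trim), Some(v) if v == "1" || v.eq_ignore_ascii_case("true"))
}

fn cost_model(raw: Option<&str>) -> CostModel {
    match raw.map(|v| v.trim().to_ascii_lowercase()).as_deref() {
        None | Some("") | Some("cardinality") => CostModel::Cardinality,
        Some("skew") | Some("skewclassifier") => CostModel::SkewClassifier,
        // Invalid non-empty values resolve to the conservative model.
        Some(_) => CostModel::SkewClassifier,
    }
}

fn block_work_unit(raw: Option<&str>) -> u32 {
    let Some(text) = raw else {
        return DEFAULT_WORK_UNIT;
    };
    match text.trim().parse::<u32>() {
        Ok(n) if (1..=8192).contains(&n) => n,
        _ => {
            eprintln!("invalid XLOG_WCOJ_BLOCK_WORK_UNIT {:?}; using {}", text, DEFAULT_WORK_UNIT);
            DEFAULT_WORK_UNIT
        }
    }
}

/// `lookup` abstracts the environment so tests never touch process globals.
fn four_cycle_mode(lookup: impl Fn(&str) -> Option<String>) -> FourCycleMode {
    let on = |name: &str| flag_on(lookup(name).as_deref());
    if on("XLOG_DISABLE_WCOJ_4CYCLE") {
        FourCycleMode::Disabled // kill switch beats force and adaptive
    } else if on("XLOG_USE_WCOJ_4CYCLE") {
        FourCycleMode::Forced
    } else if on("XLOG_USE_WCOJ_4CYCLE_ADAPTIVE") {
        FourCycleMode::Adaptive
    } else {
        FourCycleMode::Off
    }
}
```

In a real process the lookup would be `|name| std::env::var(name).ok()`. Passing a closure instead keeps the precedence logic testable without the cross-test bleed that process-global variables cause.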

Factorized execution controls (development branch only)

The variables below govern factorized execution: GPU Free Join, factorized recursive deltas, and aggregate-fused WCOJ. These features live on the development branch and are not part of the v0.9.2 release. In a released build the variables have no effect because the features they gate are not present. See the factorized execution guide for what they do and their current status.
| Variable | Values | Gates (unreleased) | Effect |
| --- | --- | --- | --- |
| XLOG_DISABLE_FREE_JOIN | 1 / true | GPU Free Join | Forces general multiway bodies through the binary fallback instead of the Free Join engine. |
| XLOG_DISABLE_WCOJ_GROUPBY_FUSION | 1 / true | Aggregate-fused WCOJ | Forces count/sum/min/max-by-root over a triangle body to materialize then group, instead of the fused aggregate. |
| XLOG_DISABLE_FACTORIZED_DELTA | 1 / true | Factorized recursive deltas | Forces every semi-naive delta step through the legacy hash-join then diff path. |
| XLOG_FACTORIZED_DELTA_MAX_DOMAIN | integer | Factorized recursive deltas | Largest dense domain the bitvector delta route accepts (default 2^14, hard bound 2^16); above it the sparse or legacy route runs. |

Per-runtime and compile-time builders

When one process needs different WCOJ behavior for different executors — or you want to avoid process-global environment variables in tests — configure RuntimeConfig and attach it to the executor:
use std::sync::Arc;
use xlog_core::{CostModelKind, RuntimeConfig};
use xlog_runtime::Executor;

// Production default: triangle adaptive on, 4-cycle adaptive off, Cardinality model.
let default_runtime = RuntimeConfig::default();

// Force triangle WCOJ for an A/B row-set comparison.
let force_triangle = RuntimeConfig::default().with_wcoj_triangle_dispatch(Some(true));

// Runtime-local triangle hard stop. Beats force and adaptive.
let triangle_off = RuntimeConfig::default().with_wcoj_triangle_dispatch_disabled(Some(true));

// Enable 4-cycle adaptive routing, or pick the runtime cost model explicitly.
let adaptive_4cycle = RuntimeConfig::default().with_wcoj_4cycle_dispatch_adaptive(Some(true));
let conservative = RuntimeConfig::default().with_wcoj_cost_model(Some(CostModelKind::SkewClassifier));

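// Attach a config to an executor. `provider` is the relation provider your
// application already holds in an Arc; it is not constructed in this snippet.
let config = RuntimeConfig::default().with_wcoj_triangle_dispatch(Some(true));
let mut executor = Executor::new_with_config(Arc::clone(&provider), config);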
RuntimeConfig decides whether a recognized shape dispatches at runtime. CompilerConfig is separate: it decides whether the compiler emits a non-default variable ordering for triangle and 4-cycle plans. The default preserves the baseline leader order; enable a heat-aware ordering only when statistics are meaningful:
use xlog_logic::compiler_config::{CompilerConfig, WcojVarOrderingKind};

let config = CompilerConfig {
    wcoj_variable_ordering: WcojVarOrderingKind::HeatAware,
    wcoj_var_ordering_threshold: 0.5,
};

let plan =
    compiler.compile_with_config_and_stats_snapshot(source, &config, Some(&stats_snapshot))?;
wcoj_var_ordering_threshold is a ratio gate: the model rotates the leader only when candidate_score / default_leader_score is at or below the threshold. The default is 0.5; smaller values demand a clearer win before rotating. Values outside the interval (0.0, 1.0], as well as NaN and infinity, are replaced by the default.
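A minimal sketch of the ratio gate follows. It is an illustration under assumptions rather than the compiler's code: the function names are hypothetical, scores are assumed to be f64 values where lower is better (for example an estimated cardinality), and a non-positive default leader score is assumed to mean "do not rotate".

```rust
const DEFAULT_THRESHOLD: f64 = 0.5;

/// Keeps thresholds inside (0.0, 1.0]; anything else, including NaN and
/// infinity, is replaced by the default.
fn effective_threshold(raw: f64) -> f64 {
    if raw.is_finite() && raw > 0.0 && raw <= 1.0 {
        raw
    } else {
        DEFAULT_THRESHOLD
    }
}

/// Rotates the leader only when the candidate's score is at or below
/// `threshold * default_leader_score`.
fn should_rotate(candidate_score: f64, default_leader_score: f64, raw_threshold: f64) -> bool {
    // Assumption: without a positive baseline there is no meaningful ratio.
    if !(default_leader_score > 0.0) {
        return false;
    }
    candidate_score / default_leader_score <= effective_threshold(raw_threshold)
}
```

With a default leader score of 100, a candidate scoring 40 rotates at the default threshold (ratio 0.4) but not at 0.25, which is the "demand a clearer win" behavior described above.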

Choosing a cost model

There are two independent cost-model layers. Runtime dispatch picks whether an eligible shape dispatches:
  • CostModelKind::Cardinality is the default. Use it when relation cardinalities and observed selectivity are populated, or when you just want the production route.
  • CostModelKind::SkewClassifier is the conservative opt-out. Use it to prove fallback behavior or to bisect a suspected dispatch regression.
Compile-time variable ordering picks the join leader:
  • WcojVarOrderingKind::Disabled (default) keeps the leader order bit-identical to the original slices.
  • LeaderCardinality picks the smallest relation as leader when the threshold gate shows a clear win — the simplest useful model.
  • HeatAware combines cardinality, relation heat, and observed selectivity. Use it for skewed graphs or repeated workloads where the StatsManager has enough evidence to identify hot relations.
| Situation | Runtime model | Compiler ordering |
| --- | --- | --- |
| First run, little statistics | Cardinality (or force only for experiments) | Disabled |
| Stable batch with seeded cardinalities | Cardinality | LeaderCardinality |
| Repeated skewed workload with heat evidence | Cardinality | HeatAware |
| Debugging fallback or bisecting | SkewClassifier or explicit force-off | Disabled |
| K-clique with incomplete statistics | Planned hash is expected | Seed complete statistics first |

Tuning workflow

Tune one knob at a time and record both the dispatch counters and row-set equality — never trust a speedup you have not proven row-equivalent.
  • wcoj_var_ordering_threshold — lower it (for example 0.25) when leader rotation is too eager and layout overhead dominates; raise it toward 0.75 only after row parity is stable and profiling shows the default leader is repeatedly expensive.
  • XLOG_WCOJ_BLOCK_WORK_UNIT — lower it for severe leader-key skew where a few keys dominate work; raise it for uniform inputs where launch overhead dominates. Stay within 1..8192. This knob must not change row sets.
  • Force gates — use them to prove the WCOJ route produces the same rows, not as a permanent substitute for a cost model. Leave them on only for a fixed, benchmarked workload.

Debug checklist

When WCOJ did not run:
  1. Check the shape — is it a triangle, 4-cycle, or K5 / K6 clique?
  2. Check the key types — are they u32, u64, or Symbol?
  3. Check env and config — did you force off or disable the route?
  4. Check statistics — does the cost model have enough cardinality and selectivity?
  5. Check counters — did the route fire but emit zero rows?
  6. Check fallback parity — does forced-off output match the expected rows?
When WCOJ ran but was slower:
  1. Confirm the input is large enough to amortize layout and launch overhead.
  2. Try LeaderCardinality or HeatAware, but only with meaningful statistics.
  3. Tune XLOG_WCOJ_BLOCK_WORK_UNIT in one direction at a time.
  4. For 4-cycle, verify adaptive mode was intentionally enabled.
  5. For K-clique, check whether the planner predicted hash but force was used anyway.
When rows differ, stop tuning and treat it as a correctness bug: capture the forced-off fallback rows and the forced WCOJ rows, record the plan shape and projection, and do not resume tuning until parity is restored.
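For the capture step, a small helper that reports which rows appear on only one side makes the parity evidence easy to record. This is a sketch with hypothetical names. It assumes rows have already been collected from the executor as vectors of u64 values and that results follow set semantics, so duplicate rows are ignored.

```rust
use std::collections::BTreeSet;

/// One result row, already decoded to plain integers (an assumption).
type Row = Vec<u64>;

/// Rows that appear on only one side of a forced-off / forced-on comparison.
#[derive(Debug, PartialEq, Eq)]
struct ParityReport {
    only_in_fallback: Vec<Row>,
    only_in_wcoj: Vec<Row>,
}

impl ParityReport {
    /// True when both routes produced the same row set.
    fn is_clean(&self) -> bool {
        self.only_in_fallback.is_empty() && self.only_in_wcoj.is_empty()
    }
}

/// Compares the two result sets and returns the differences in sorted order.
fn compare_row_sets(fallback: &[Row], wcoj: &[Row]) -> ParityReport {
    let fallback_set: BTreeSet<&Row> = fallback.iter().collect();
    let wcoj_set: BTreeSet<&Row> = wcoj.iter().collect();
    ParityReport {
        only_in_fallback: fallback_set
            .difference(&wcoj_set)
            .map(|row| row.to_vec())
            .collect(),
        only_in_wcoj: wcoj_set
            .difference(&fallback_set)
            .map(|row| row.to_vec())
            .collect(),
    }
}
```

A clean report is the precondition for trusting any timing; a non-clean one gives you the exact rows to attach to the bug report alongside the plan shape and projection.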

Factorized execution

The factorized execution guide covers GPU Free Join, factorized recursive deltas, and aggregate-fused WCOJ: the development-branch work that extends WCOJ beyond the certified shapes.