Design Goals
- Learn rules, not weights — discover symbolic Datalog clauses (e.g., reach(X,Y) :- edge(X,Z), edge(Z,Y).) from data
- GPU-resident hot loop — no semantic column downloads in the training step loop
- Sparse by default — candidate-indexed soft-probs instead of materializing N³ tensors
- Transactional promotion — learned rules pass gate checks before entering the knowledge base
- Auditable transfer evidence — learned rules carry fold, held-out-domain, gate, and base-kernel checksum metadata
Core Idea: Tensorized Super-Graph Masking
Traditional ILP systems compile candidate rules into executable programs, which is impossible at millisecond timescales. Instead, XLOG pre-compiles a “super-graph” of all candidate rules and activates them via continuous mask tensors optimized with Gumbel-Softmax. After training, argmax(W) picks the winning rule. Temperature annealing (τ → τ_floor) drives the soft mask toward a one-hot selection.
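The effect of temperature on the soft mask can be illustrated with a small pure-Python sketch. It is not XLOG code: the function names, the toy logits, and the temperatures are assumptions chosen for illustration, and it uses only the standard library rather than a GPU tensor library.

```python
import math
import random


def softmax_with_temperature(logits, tau):
    """Softmax of logits / tau, computed stably by subtracting the max."""
    scaled = [x / tau for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel_softmax(logits, tau, rng):
    """Perturb each logit with Gumbel(0, 1) noise, then apply the tempered softmax."""
    noisy = []
    for x in logits:
        u = max(rng.random(), 1e-12)  # keep u away from 0 so log(u) is finite
        noisy.append(x - math.log(-math.log(u)))
    return softmax_with_temperature(noisy, tau)


W = [2.0, 0.5, -1.0]                 # one logit per candidate rule (toy values)
rng = random.Random(0)

soft_mask = gumbel_softmax(W, tau=1.0, rng=rng)   # stochastic mask, sums to 1
warm = softmax_with_temperature(W, tau=5.0)       # high temperature: nearly flat
cold = softmax_with_temperature(W, tau=0.05)      # low temperature: nearly one-hot
winner = max(range(len(W)), key=W.__getitem__)    # argmax(W) selects candidate 0
```

At tau=5.0 no candidate holds a majority of the mask, while at tau=0.05 almost all of the mass sits on the candidate with the largest logit, which is the behavior the annealing schedule relies on.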
Architecture Overview
Key Entry Points
Python (pyxlog.ilp)
| File | Purpose |
|---|---|
| pyxlog/ilp/trainer.py | train_only() — multi-start training loop |
| pyxlog/ilp/promoter.py | train_and_promote() — training + gate pipeline |
| pyxlog/ilp/neurosymbolic.py | train_neurosymbolic_program() — joint nn/4 and symbolic rule-weight training |
| pyxlog/ilp/inventory.py | build_rule_inventory() — selected/rejected clause inventory with transfer metadata |
| pyxlog/ilp/backend.py | MaskBackend protocol, SparseMaskBackend, DenseMaskBackend |
| pyxlog/ilp/temperature.py | AdaptiveTempController — cosine-annealed τ schedule |
| pyxlog/ilp/entropy.py | Entropy regularization helpers |
| pyxlog/ilp/holdout.py | holdout_f1_and_variance() — LOO (<=20) and k-fold (>20) F1 scoring |
| pyxlog/ilp/types.py | TrainConfig, TrainResult, PromotionResult, LearnedArtifact, etc. |
| pyxlog/ilp/exceptions.py | IlpConfigError, IlpCandidateError, IlpTrainingError |
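The holdout row in the table above switches between leave-one-out and k-fold scoring at 20 examples. A minimal sketch of that split rule follows. The helper names, the default k=5, and the round-robin fold assignment are assumptions for illustration; the source does not describe how holdout_f1_and_variance() builds its folds.

```python
def make_folds(n_examples, k=5, loo_max=20):
    """Return a list of held-out index lists.

    Uses leave-one-out when n_examples <= loo_max, else k round-robin folds.
    """
    if n_examples <= 0:
        raise ValueError("need at least one example")
    if n_examples <= loo_max:
        return [[i] for i in range(n_examples)]
    folds = [[] for _ in range(k)]
    for i in range(n_examples):
        folds[i % k].append(i)   # round-robin keeps fold sizes within 1 of each other
    return folds


def f1_score(true_positives, false_positives, false_negatives):
    """Standard F1 = 2TP / (2TP + FP + FN); 0.0 when the denominator is 0."""
    denom = 2 * true_positives + false_positives + false_negatives
    return 2 * true_positives / denom if denom else 0.0


small = make_folds(10)        # 10 singleton folds (leave-one-out)
large = make_folds(25, k=5)   # 5 folds of 5 examples each
```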
Rust (xlog-runtime, xlog-cuda)
| File | Purpose |
|---|---|
| crates/xlog-runtime/src/ilp_registry.rs | IlpRegistry — mask storage, IlpTaggedResult metadata |
| crates/xlog-runtime/tests/ilp_integration_tests.rs | Rust-side integration tests for mask round-trips |
| crates/xlog-cuda/tests/ilp_kernel_tests.rs | CUDA kernel unit tests (extract_nonzero_indices) |
CUDA Kernels
| File | Purpose |
|---|---|
| kernels/ilp.cu | extract_nonzero_indices() — N³ mask → sparse index extraction |
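As a host-side reference for what the extraction step computes, the following sketch scans a flat N³ mask and returns the coordinates of its nonzero entries. It is a Python illustration only: the row-major layout, the (i, j, k) triple output, and the function signature are assumptions, since the kernel's actual memory layout and output format are not described here.

```python
def extract_nonzero_indices(mask, n):
    """Return (i, j, k) triples for nonzero entries of a flat N*N*N mask.

    Assumes row-major layout: flat = (i * n + j) * n + k.
    """
    if len(mask) != n ** 3:
        raise ValueError("mask length must be n**3")
    triples = []
    for flat, value in enumerate(mask):
        if value != 0.0:
            i, rest = divmod(flat, n * n)
            j, k = divmod(rest, n)
            triples.append((i, j, k))
    return triples


n = 3
mask = [0.0] * (n ** 3)
mask[(0 * n + 1) * n + 2] = 0.7   # entry (0, 1, 2)
mask[(2 * n + 0) * n + 1] = 0.3   # entry (2, 0, 1)
indices = extract_nonzero_indices(mask, n)   # [(0, 1, 2), (2, 0, 1)]
```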
Mask Backends
The MaskBackend protocol abstracts how the learnable tensor W is applied to the XLOG executor. Two implementations are described below.
SparseMaskBackend (default)
- Learnable params: C floats (one per candidate rule)
- Memory: O(C) — typically C < 100
- Preferred hot-loop path calls set_rule_mask_sparse_selected() on the compiled program
- Legacy compatibility path set_rule_mask_sparse() remains available when Rust-side ranking is desired
DenseMaskBackend (alpha-compatible, debug)
- Learnable params: N³ floats (N = schema size)
- Memory: O(N³) — expensive for large schemas
- Enabled via TrainConfig(debug_dense_mask=True) for parity testing
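The gap between O(C) and O(N³) is easy to see with a little arithmetic. The sketch below assumes 32-bit float parameters and a hypothetical schema size of N = 256; neither figure comes from the source.

```python
BYTES_PER_FLOAT32 = 4  # assumption: parameters stored as 32-bit floats


def sparse_param_bytes(num_candidates):
    """One float per candidate rule."""
    return num_candidates * BYTES_PER_FLOAT32


def dense_param_bytes(schema_size):
    """One float per cell of an N x N x N mask."""
    return schema_size ** 3 * BYTES_PER_FLOAT32


sparse = sparse_param_bytes(100)   # 400 bytes
dense = dense_param_bytes(256)     # 67_108_864 bytes, i.e. 64 MiB
ratio = dense // sparse            # 167_772
```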
Training Pipeline
train_only()
- Candidate enumeration — valid_candidates(source, mask_name) returns all syntactically legal body-pair assignments
- Multi-start — up to max_attempts independent restarts with fresh logits
- Step loop (per attempt, up to step_budget_per_attempt):
  - Apply mask via backend
  - Forward pass: program.evaluate_device() (GPU-only, no host reads)
  - BCE loss between predicted and target fact membership
  - Backward pass: loss.backward() through PyTorch autograd
  - Optimizer step on W
  - Temperature anneal: τ_start → τ_floor (cosine schedule)
- Optional deterministic controls (deterministic=True) for reproducible attempt seeding
- Early stopping: when argmax is stable and loss < threshold
- Decode — argmax(W) maps to winning candidate → discovered rule string
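The shape of one attempt's step loop can be sketched on the CPU with the standard library. This is a toy stand-in, not the pyxlog implementation: it omits Gumbel noise and multi-start, replaces evaluate_device() with a mask-weighted mix of hand-written candidate predictions, and uses a hand-derived gradient instead of PyTorch autograd. All names, thresholds, and data below are assumptions.

```python
import math


def softmax(logits, tau):
    scaled = [x / tau for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cosine_tau(step, total_steps, tau_start, tau_floor):
    """Cosine anneal from tau_start (step 0) to tau_floor (last step)."""
    progress = step / max(total_steps - 1, 1)
    return tau_floor + 0.5 * (tau_start - tau_floor) * (1.0 + math.cos(math.pi * progress))


def train_one_attempt(candidates, target, steps=200, lr=0.5, tau_start=1.0,
                      tau_floor=0.1, loss_threshold=0.05, patience=10, eps=1e-7):
    """Run one attempt; returns (winner_index, steps_run, final_loss)."""
    num_c, num_f = len(candidates), len(target)
    W = [0.0] * num_c                      # fresh logits for this attempt
    stable, last_winner, loss = 0, None, float("inf")
    for step in range(steps):
        tau = cosine_tau(step, steps, tau_start, tau_floor)
        p = softmax(W, tau)                # soft mask over candidates
        # Forward: soft fact membership is the mask-weighted mix of candidates.
        y = []
        for j in range(num_f):
            mixed = sum(p[c] * candidates[c][j] for c in range(num_c))
            y.append(min(max(mixed, eps), 1.0 - eps))
        # Binary cross-entropy against the target memberships.
        loss = -sum(t * math.log(v) + (1 - t) * math.log(1 - v)
                    for t, v in zip(target, y)) / num_f
        # Backward: dL/dy, then dL/dp, then through the tempered softmax.
        dy = [(v - t) / (v * (1 - v)) / num_f for t, v in zip(target, y)]
        dp = [sum(dy[j] * candidates[c][j] for j in range(num_f)) for c in range(num_c)]
        mean = sum(p[c] * dp[c] for c in range(num_c))
        grad = [p[c] * (dp[c] - mean) / tau for c in range(num_c)]
        W = [w - lr * g for w, g in zip(W, grad)]      # plain gradient step on W
        winner = max(range(num_c), key=W.__getitem__)
        stable = stable + 1 if winner == last_winner else 0
        last_winner = winner
        if stable >= patience and loss < loss_threshold:
            return winner, step + 1, loss              # early stop
    return last_winner, steps, loss


# Toy data: each row is the 0/1 fact membership one candidate rule would derive.
rules = ["reach(X,Y) :- edge(X,Z), edge(Y,Z).",
         "reach(X,Y) :- edge(X,Z), edge(Z,Y).",
         "reach(X,Y) :- edge(Z,X), edge(Z,Y)."]
candidates = [[1, 0, 1, 0], [1, 1, 0, 0], [0, 1, 1, 1]]
target = [1, 1, 0, 0]

winner, steps_run, final_loss = train_one_attempt(candidates, target)
discovered = rules[winner]                 # decode: argmax(W) -> rule string
```

Only the second candidate reproduces the target exactly, so its logit gains on the others at every step and the decode step returns its rule string.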
train_and_promote()
- Call train_only() — get TrainResult
- If not converged → PromotionStatus.NOT_CONVERGED
- Trial compile — substitute discovered rule into source, compile via Rust
- Promotion gates (all must pass for PROMOTED):
  - Convergence gate — training converged (already checked)
  - Novel-rate gate — fraction of non-example derivations ≤ max_novel_rate
  - Protected-relation gate — no unwanted relation side-effects
  - Holdout F1 gate — F1 on held-out examples ≥ threshold
  - Ambiguity gate — top-M scan (or exhaustive mode) detects no alternative winning candidates
  - Typed-schema gate — optional hard gate requiring relation type metadata (or waiver-driven manual review)
- All pass → PromotionStatus.PROMOTED with committed_source
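The all-gates-must-pass logic above can be sketched as a small pure function. This is illustrative only: the class and field names, the threshold defaults, and the GATE_FAILED status are hypothetical (the source names only PROMOTED and NOT_CONVERGED), and the real pipeline computes these inputs from a trial compile rather than receiving them ready-made.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PROMOTED = "promoted"
    NOT_CONVERGED = "not_converged"
    GATE_FAILED = "gate_failed"      # hypothetical: not named in the source


@dataclass
class GateInputs:
    converged: bool
    novel_rate: float                # fraction of non-example derivations
    touches_protected: bool          # True if a protected relation changed
    holdout_f1: float
    ambiguous: bool                  # True if another candidate also wins
    typed_schema_ok: bool = True     # optional gate; passes when not enforced


def run_gates(inputs, max_novel_rate=0.1, min_f1=0.9):
    """Return (status, failed gate names). Thresholds are placeholder values."""
    if not inputs.converged:
        return Status.NOT_CONVERGED, ["convergence"]
    checks = {
        "novel_rate": inputs.novel_rate <= max_novel_rate,
        "protected_relation": not inputs.touches_protected,
        "holdout_f1": inputs.holdout_f1 >= min_f1,
        "ambiguity": not inputs.ambiguous,
        "typed_schema": inputs.typed_schema_ok,
    }
    failed = [name for name, ok in checks.items() if not ok]
    return (Status.PROMOTED if not failed else Status.GATE_FAILED), failed


good = run_gates(GateInputs(True, 0.02, False, 0.97, False))
noisy = run_gates(GateInputs(True, 0.40, False, 0.97, True))
stalled = run_gates(GateInputs(False, 0.0, False, 0.0, False))
```

Returning the list of failed gate names alongside the status keeps the outcome auditable, in the spirit of the gate outcomes recorded on the rule inventory.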
Higher-Level Neuro-Symbolic Training Surface
The higher-level entry point train_neurosymbolic_program() handles sources that mix neural predicates and
trainable symbolic clauses, written as nn(...), trainable_rule(...), and train(...)
declarations. The result reports neural gradient norms, symbolic gradients,
final symbolic weights, and a RuleInventory suitable for transfer audits.
Existential-join trainable bodies (Stage B)
A trainable_rule body may join a neural predicate to an ordinary relation on an
existential (non-head) variable. The neural predicate is grounded over the
real join domain inside the circuit and OR-aggregated at the head. In the
plasticity example, the variable Event appears only in the body. The engine materializes the join domain
from pre_before_post’s ground facts, emits one neural leaf per joined event, and
the differentiable provenance OR-aggregates the per-event contributions per head
binding, yielding P(plastic(Edge)) = σ(w) · (1 − ∏_{e : pre_before_post(e,Edge)} (1 − p_saliency(e))).
Gradient flows into the neural predicate (all joined events) and the rule guard,
but never into the deterministic join relation. The per-event features arrive
through a domain_inputs={"net": features} channel (row i = the i-th
join-domain constant in sorted order), and examples carry only per-head-binding
targets. Because saliency is learned as a function of the event feature (not an
id lookup), the trained predicate generalizes to unseen events.
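The head probability formula above is a guarded noisy-OR, and a small numeric check makes its behavior concrete. The sketch below evaluates it directly and also computes the partial derivative with respect to one event probability, ∂P/∂p_e = σ(w) · ∏_{e' ≠ e} (1 − p_e'), which follows from differentiating the product. Function names and input values are illustrative assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def head_probability(w, event_probs):
    """sigma(w) * (1 - prod_e (1 - p_e)) over the events joined to one head binding."""
    none_fire = 1.0
    for p in event_probs:
        none_fire *= (1.0 - p)
    return sigmoid(w) * (1.0 - none_fire)


def grad_wrt_event(w, event_probs, index):
    """dP/dp_index = sigma(w) * product over the other events of (1 - p_e)."""
    others = 1.0
    for i, p in enumerate(event_probs):
        if i != index:
            others *= (1.0 - p)
    return sigmoid(w) * others


p_head = head_probability(0.0, [0.5, 0.5])   # 0.5 * (1 - 0.25) = 0.375
g0 = grad_wrt_event(0.0, [0.5, 0.5], 0)      # 0.5 * 0.5 = 0.25
```

Every joined event has a nonzero derivative as long as the other events are not saturated at 1, which matches the statement that gradient flows into the neural predicate for all joined events.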
Constraints: the join domain must be ground facts (a derived relation is rejected,
since its extension is not materialized); head-binding ids must be 0..N-1
row-aligned with targets; a single join network is supported; and the exact
d-DNNF compiler builds one circuit over all head-binding queries, so the planted
graph must stay within the compiler’s fixed buffer (empirically ~6–7 events). A
worked example lives in examples/plasticity_incircuit/ with a CUDA-gated
recovery test in python/tests/test_plasticity_incircuit.py. Head-variable
(“hard filter”) joins remain supported as pre-filters; only the existential-join
case is new.
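The two alignment constraints (feature rows follow sorted join-domain constants, and head-binding ids are 0..N-1 aligned with targets) are easy to get wrong when preparing data. A minimal pre-flight check might look like the sketch below. The helper names are hypothetical and not part of pyxlog, and plain Python string sorting is assumed for the constants; the source does not say which ordering the engine uses.

```python
def build_domain_rows(features_by_constant):
    """Order per-event feature vectors by sorted join-domain constant."""
    constants = sorted(features_by_constant)
    return constants, [features_by_constant[c] for c in constants]


def check_head_alignment(head_ids, targets):
    """Require head-binding ids to be exactly 0..N-1, one per target row."""
    n = len(targets)
    if sorted(head_ids) != list(range(n)):
        raise ValueError(f"head ids must be exactly 0..{n - 1}")


constants, rows = build_domain_rows(
    {"e2": [0.9, 0.1], "e0": [0.2, 0.4], "e1": [0.5, 0.5]}
)
# rows[i] now corresponds to the i-th constant in sorted order.
check_head_alignment(head_ids=[0, 1, 2], targets=[1.0, 0.0, 1.0])
```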
train_and_promote(...) also accepts training_fold, held_out_domains,
base_kernel_checksum_before, and base_kernel_checksum_after. These fields are
recorded on PromotionResult.rule_inventory, along with selected and rejected
candidate clauses and gate outcomes.
Artifact Persistence
LearnedArtifact captures the full training result for reproducibility. The artifact
format is tagged beta-v1. Fields: discovered rule, logits, candidate map, config, telemetry,
precision/recall, metadata (timestamp, schema version, candidate map hash).
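One way such an artifact could be persisted with an integrity check on the candidate map is sketched below. This is not the LearnedArtifact implementation: the class, its reduced field set, the JSON encoding, the use of SHA-256, and the treatment of beta-v1 as a format tag are all assumptions for illustration.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


def candidate_map_hash(candidate_map):
    """SHA-256 over canonical JSON (sorted keys), so key order does not matter."""
    canonical = json.dumps(candidate_map, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass
class ArtifactSketch:
    discovered_rule: str
    logits: list
    candidate_map: dict               # candidate index (as str) -> rule text
    format_tag: str = "beta-v1"       # assumption: beta-v1 is a format/version tag
    metadata: dict = field(default_factory=dict)

    def to_json(self):
        payload = asdict(self)        # deep copy, so self.metadata is not modified
        payload["metadata"]["candidate_map_hash"] = candidate_map_hash(self.candidate_map)
        return json.dumps(payload, sort_keys=True)

    @classmethod
    def from_json(cls, text):
        payload = json.loads(text)
        stored = payload["metadata"].pop("candidate_map_hash")
        if stored != candidate_map_hash(payload["candidate_map"]):
            raise ValueError("candidate map hash mismatch")
        return cls(**payload)


artifact = ArtifactSketch(
    discovered_rule="reach(X,Y) :- edge(X,Z), edge(Z,Y).",
    logits=[0.1, 2.3],
    candidate_map={"0": "rule_a", "1": "rule_b"},
)
restored = ArtifactSketch.from_json(artifact.to_json())
```

Hashing the canonical form of the candidate map lets a loader detect that stored logits no longer line up with the candidate list they were trained against.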
GPU Contract
The training step loop obeys XLOG’s GPU-resident contract:
- evaluate_device() — no host reads for semantic results
- batch_fact_membership_device() — returns a CUDA bool mask via DLPack with zero semantic-loop device-to-host transfer
- batch_tagged_credit_device() — returns CSR-style CUDA credit data via DLPack with zero semantic-loop device-to-host transfer
- batch_fact_membership() / batch_tagged_credit() remain available when host materialization is desired
- AtomicU64 device-to-host counter on CudaKernelProvider — hard gate raises if download_column_* is observed during step loop
- host_transfer_stats() / reset_host_transfer_stats() expose broader host transfer accounting for profiling
- Legacy set_rule_mask_sparse() still performs a control-plane soft-probability download; the selected-candidate sparse path avoids it
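The counter-based hard gate can be pictured as a snapshot-and-compare guard around the step loop. The sketch below is a pure-Python analogue with hypothetical names; the real counter is an AtomicU64 on the Rust side, and how it is exposed to Python is not described here.

```python
from contextlib import contextmanager


class TransferCounter:
    """Stand-in for a device-to-host download counter."""

    def __init__(self):
        self.downloads = 0

    def record_download(self):
        self.downloads += 1


@contextmanager
def no_host_downloads(counter):
    """Raise if the counter advanced while the guarded block ran."""
    before = counter.downloads
    yield
    observed = counter.downloads - before
    if observed:
        raise RuntimeError(f"{observed} host download(s) observed in step loop")


counter = TransferCounter()

with no_host_downloads(counter):
    pass                              # a clean step: nothing downloaded

try:
    with no_host_downloads(counter):
        counter.record_download()     # simulates a stray column download
    violation = None
except RuntimeError as exc:
    violation = str(exc)
```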
Testing
- 86+ static test functions across ILP Python test files (expanded by parametrized GA/beta gates)
- Reliability gate: 20 consecutive train_only() runs must all converge (20/20 pass)
- GA reliability gate: default 50-seed statistical run (test_ilp_ga_reliability.py)
- GA performance/transfer tests: forward_p95_us + host transfer accounting (test_ilp_performance.py)
- Dense/sparse parity: every sparse-path test has a debug_dense_mask=True variant
- Rust-side: ilp_integration_tests.rs, ilp_kernel_tests.rs
- CUDA certification: extract_nonzero_indices covered by kernel test suite
See Also
- Python Bindings — ILP Training API — user-facing API reference
- GPU Execution — mask DAG evaluation, stream compaction
- Probabilistic Engines — XGCF circuits, provenance (shared infrastructure)
- Arrow, DLPack, and cuDF Interop — DLPack details