This documentation is part of the "Projects with Books" initiative at zenOSmosis.
The source code for this project is available on GitHub.
Program Compilation
Relevant source files
- llkv-executor/src/translation/expression.rs
- llkv-executor/src/translation/schema.rs
- llkv-expr/src/expr.rs
- llkv-table/src/planner/mod.rs
- llkv-table/src/planner/program.rs
- llkv-table/src/scalar_eval.rs
Purpose and Scope
This page documents the predicate compilation system in LLKV, which transforms typed predicate expressions (Expr<FieldId>) into stack-based bytecode programs for efficient evaluation during table scans. The compilation process produces two types of programs: EvalProgram for predicate evaluation and DomainProgram for domain analysis (determining which row IDs might match predicates).
For information about the expression AST structure itself, see Expression AST. For how these compiled programs are evaluated during execution, see Filter Evaluation. For how expressions are translated from string column names to field IDs, see Expression Translation.
Compilation Pipeline Overview
The compilation process transforms a normalized predicate expression into executable bytecode through the ProgramCompiler in llkv-table/src/planner/program.rs:257-284 The compiler produces a ProgramSet containing both evaluation and domain analysis programs.
Sources: llkv-table/src/planner/program.rs:257-284 llkv-table/src/planner/mod.rs:612-637
graph TB
Input["Expr<FieldId>\n(Raw predicate)"]
Normalize["normalize_predicate()\n(Apply De Morgan's laws,\nflatten And/Or)"]
Compiler["ProgramCompiler::new()\nProgramCompiler::compile()"]
ProgramSet["ProgramSet"]
EvalProgram["EvalProgram\n(Stack-based bytecode\nfor evaluation)"]
DomainRegistry["DomainRegistry\n(Domain programs\nfor optimization)"]
Input --> Normalize
Normalize --> Compiler
Compiler --> ProgramSet
ProgramSet --> EvalProgram
ProgramSet --> DomainRegistry
EvalOps["EvalOp instructions:\nPushPredicate, And, Or,\nNot, FusedAnd"]
DomainOps["DomainOp instructions:\nPushFieldAll, Union,\nIntersect"]
EvalProgram --> EvalOps
DomainRegistry --> DomainOps
Compilation Entry Point
The TablePlanner::plan_scan method invokes the compiler when preparing a scan operation. The process begins with predicate normalization followed by compilation into both evaluation and domain programs.
Sources: llkv-table/src/planner/mod.rs:625-628 llkv-table/src/planner/program.rs:266-283
graph LR
TablePlanner["TablePlanner::plan_scan"]
NormFilter["normalize_predicate(filter_expr)"]
CreateCompiler["ProgramCompiler::new(Arc<Expr>)"]
Compile["compiler.compile()"]
ProgramSet["ProgramSet<'expr>"]
TablePlanner --> NormFilter
NormFilter --> CreateCompiler
CreateCompiler --> Compile
Compile --> ProgramSet
ProgramSet --> EvalProgram["EvalProgram\n(ops: Vec<EvalOp>)"]
ProgramSet --> DomainRegistry["DomainRegistry\n(programs, index)"]
Predicate Normalization
Before compilation, predicates are normalized using normalize_predicate() to simplify the expression tree. Normalization applies two key transformations:
Transformation Rules
| Input Pattern | Output Pattern | Description |
|---|---|---|
And(And(a, b), c) | And(a, b, c) | Flatten nested AND operations |
Or(Or(a, b), c) | Or(a, b, c) | Flatten nested OR operations |
Not(And(a, b)) | Or(Not(a), Not(b)) | De Morgan's law for AND |
Not(Or(a, b)) | And(Not(a), Not(b)) | De Morgan's law for OR |
Not(Not(a)) | a | Double negation elimination |
Not(Literal(true)) | Literal(false) | Literal inversion |
Not(IsNull{expr, negated}) | IsNull{expr, !negated} | IsNull negation flip |
Normalization Algorithm
The normalize_expr() function recursively transforms the expression tree using pattern matching:
Sources: llkv-table/src/planner/program.rs:286-343
graph TD
Start["normalize_expr(expr)"]
CheckAnd{"expr is And?"}
CheckOr{"expr is Or?"}
CheckNot{"expr is Not?"}
Other["Return expr unchanged"]
FlattenAnd["Flatten nested And nodes\ninto single And"]
FlattenOr["Flatten nested Or nodes\ninto single Or"]
ApplyDeMorgan["normalize_negated(inner)\nApply De Morgan's laws"]
Start --> CheckAnd
CheckAnd -->|Yes| FlattenAnd
CheckAnd -->|No| CheckOr
CheckOr -->|Yes| FlattenOr
CheckOr -->|No| CheckNot
CheckNot -->|Yes| ApplyDeMorgan
CheckNot -->|No| Other
FlattenAnd --> Return["Return normalized expr"]
FlattenOr --> Return
ApplyDeMorgan --> Return
Other --> Return
EvalProgram Compilation
The compile_eval() function generates a sequence of EvalOp instructions using iterative traversal with an explicit work stack. This avoids stack overflow on deeply nested expressions and produces postorder bytecode.
EvalOp Instruction Set
The EvalOp enum defines the instruction types for predicate evaluation:
| Instruction | Description | Stack Effect |
|---|---|---|
PushPredicate(OwnedFilter) | Push single predicate result | → bool |
PushCompare{left, op, right} | Evaluate comparison expression | → bool |
PushInList{expr, list, negated} | Evaluate IN list membership | → bool |
PushIsNull{expr, negated} | Evaluate IS NULL test | → bool |
PushLiteral(bool) | Push constant boolean value | → bool |
FusedAnd{field_id, filters} | Optimized AND for same field | → bool |
And{child_count} | Pop N bools, push AND result | bool×N → bool |
Or{child_count} | Pop N bools, push OR result | bool×N → bool |
Not{domain} | Pop bool, push NOT result | bool → bool |
graph TB
subgraph "Compilation Phases"
Input["Expression Tree"]
Phase1["Enter Phase\n(Pre-order traversal)"]
Phase2["Exit Phase\n(Post-order emission)"]
Output["Vec<EvalOp>"]
end
Input --> Phase1
Phase1 --> Phase2
Phase2 --> Output
subgraph "Frame Types"
EnterFrame["EvalVisit::Enter(expr)\nPush children in reverse order"]
ExitFrame["EvalVisit::Exit(expr)\nEmit instruction"]
FusedFrame["EvalVisit::EmitFused\nEmit FusedAnd optimization"]
end
Phase1 --> EnterFrame
EnterFrame --> ExitFrame
EnterFrame --> FusedFrame
ExitFrame --> Phase2
FusedFrame --> Phase2
Compilation Process
The compiler uses a two-pass approach with EvalVisit frames to track traversal state:
Sources: llkv-table/src/planner/program.rs:407-516
Predicate Fusion Optimization
When the compiler encounters an And node where all children are predicates (Expr::Pred) on the same FieldId, it emits a single FusedAnd instruction instead of multiple individual predicates. This optimization is detected by gather_fused():
Sources: llkv-table/src/planner/program.rs:518-542
graph LR
AndNode["And(children)"]
Check["gather_fused(children)"]
Decision{"All children\nare Pred(field_id)\nwith same field_id?"}
Fused["Emit FusedAnd{\nfield_id,\nfilters\n}"]
Normal["Emit individual\nPushPredicate\nfor each child\n+ And instruction"]
AndNode --> Check
Check --> Decision
Decision -->|Yes| Fused
Decision -->|No| Normal
DomainProgram Compilation
The compile_domain() function generates DomainOp instructions for domain analysis. Domain programs determine which row IDs might satisfy a predicate without evaluating the full expression, enabling storage-layer optimizations.
DomainOp Instruction Set
| Instruction | Description | Stack Effect |
|---|---|---|
PushFieldAll(FieldId) | All rows where field exists | → RowSet |
PushCompareDomain{left, right, op, fields} | Domain of rows for comparison | → RowSet |
PushInListDomain{expr, list, fields, negated} | Domain of rows for IN list | → RowSet |
PushIsNullDomain{expr, fields, negated} | Domain of rows for IS NULL | → RowSet |
PushLiteralFalse | Empty row set | → RowSet |
PushAllRows | All rows in table | → RowSet |
Union{child_count} | Pop N sets, push union | RowSet×N → RowSet |
Intersect{child_count} | Pop N sets, push intersection | RowSet×N → RowSet |
Domain Analysis Algorithm
Domain compilation uses iterative traversal with DomainVisit frames, similar to eval compilation but with different semantics:
graph TB
Start["compile_domain(expr)"]
Stack["Work stack:\nVec<DomainVisit>"]
Ops["Output:\nVec<DomainOp>"]
EnterFrame["DomainVisit::Enter(node)\nPush children + Exit frame"]
ExitFrame["DomainVisit::Exit(node)\nEmit DomainOp"]
Start --> Stack
Stack --> Process{"Pop frame"}
Process -->|Enter| EnterFrame
Process -->|Exit| ExitFrame
EnterFrame --> Stack
ExitFrame --> Emit["Emit instruction to ops"]
Emit --> Stack
Process -->|Empty| Done["Return DomainProgram{ops}"]
Domain Semantics
The domain of an expression represents the set of row IDs where the expression could potentially be true (or false for Not):
| Expression Type | Domain Semantics |
|---|---|
Pred(filter) | All rows where filter.field_id exists |
Compare{left, right, op} | Union of domains of all fields in left and right |
InList{expr, list} | Union of domains of all fields in expr and list items |
IsNull{expr} | Union of domains of all fields in expr |
Literal(true) | All rows in table |
Literal(false) | Empty set |
And(children) | Intersection of children domains |
Or(children) | Union of children domains |
Not(inner) | Same as inner domain (NOT doesn't change domain) |
Sources: llkv-table/src/planner/program.rs:544-631
graph TB
Input["ScalarExpr<FieldId>"]
Stack["Work stack:\nVec<&ScalarExpr>"]
Seen["FxHashSet<FieldId>\n(deduplication)"]
Output["Vec<FieldId>\n(sorted)"]
Input --> Stack
Stack --> Process{"Pop expr"}
Process -->|Column fid| AddField["Insert fid into seen"]
Process -->|Literal| Skip["Skip (no fields)"]
Process -->|Binary| PushChildren["Push left, right\nto stack"]
Process -->|Compare| PushChildren
Process -->|Aggregate| PushAggExpr["Push aggregate expr\nto stack"]
Process -->|Other| PushNested["Push nested exprs\nto stack"]
AddField --> Stack
Skip --> Stack
PushChildren --> Stack
PushAggExpr --> Stack
PushNested --> Stack
Process -->|Empty| Collect["Collect seen into Vec\nSort unstable"]
Collect --> Output
Field Collection for Domain Analysis
The collect_fields() function extracts all FieldId references from scalar expressions using iterative traversal. This determines which columns' row sets need to be considered during domain evaluation.
Sources: llkv-table/src/planner/program.rs:633-709
Data Structures
ProgramSet
The top-level container returned by compilation, holding all compiled artifacts:
ProgramSet<'expr> {
eval: EvalProgram, // Bytecode for predicate evaluation
domains: DomainRegistry, // Domain programs for optimization
_root_expr: Arc<Expr<'expr, FieldId>> // Original expression (lifetime anchor)
}
Sources: llkv-table/src/planner/program.rs:23-29
DomainRegistry
Manages the collection of compiled domain programs with deduplication via ExprKey:
DomainRegistry {
programs: Vec<DomainProgram>, // All compiled domain programs
index: FxHashMap<ExprKey, DomainProgramId>, // Expression → program ID map
root: Option<DomainProgramId> // ID of root domain program
}
The registry uses ExprKey (a pointer-based key) to detect duplicate subexpressions and reuse compiled domain programs.
Sources: llkv-table/src/planner/program.rs:12-20 llkv-table/src/planner/program.rs:196-219
OwnedFilter and OwnedOperator
To support owned bytecode programs with no lifetime constraints, the compiler converts borrowed Filter<'a, FieldId> and Operator<'a> types into owned variants:
| Borrowed Type | Owned Type | Purpose |
|---|---|---|
Filter<'a, FieldId> | OwnedFilter | Stores field_id + owned operator |
Operator<'a> | OwnedOperator | Owns pattern strings and literal vectors |
This conversion happens during compile_eval() when creating PushPredicate and FusedAnd instructions.
Sources: llkv-table/src/planner/program.rs:69-191
Integration with Table Scanning
The compiled programs are used during table scan execution in two ways:
- EvalProgram : Evaluated per-row or per-batch during scan to determine which rows match the predicate
- DomainProgram : Used for storage-layer optimizations to skip scanning columns or chunks that cannot match
Usage in PlannedScan
The TablePlanner::plan_scan() method creates a PlannedScan struct that bundles the compiled programs with scan metadata:
PlannedScan<'expr, P> {
projections: Vec<ScanProjection>,
filter_expr: Arc<Expr<'expr, FieldId>>,
options: ScanStreamOptions<P>,
plan_graph: PlanGraph, // For query plan visualization
programs: ProgramSet<'expr> // Compiled evaluation and domain programs
}
Sources: llkv-table/src/planner/mod.rs:500-509 llkv-table/src/planner/mod.rs:630-636
Example Compilation
Consider the predicate: (age > 18 AND name LIKE 'A%') OR status = 'active'
After normalization, this remains unchanged (no nested And/Or to flatten). The compiler produces:
EvalProgram Instructions
1. PushPredicate(Filter { field_id: age, op: GreaterThan(18) })
2. PushPredicate(Filter { field_id: name, op: StartsWith("A", true) })
3. And { child_count: 2 }
4. PushPredicate(Filter { field_id: status, op: Equals("active") })
5. Or { child_count: 2 }
DomainProgram Instructions
1. PushFieldAll(age) // Domain for first predicate
2. PushFieldAll(name) // Domain for second predicate
3. Intersect { child_count: 2 } // AND combines via intersection
4. PushFieldAll(status) // Domain for third predicate
5. Union { child_count: 2 } // OR combines via union
The domain program indicates that rows must have (age AND name) OR status to potentially match.
Sources: llkv-table/src/planner/program.rs:416-516 llkv-table/src/planner/program.rs:550-631
graph LR
Recursive["Recursive approach\n(Stack overflow risk)"]
Iterative["Iterative approach\n(Explicit work stack)"]
Recursive -->|Replace with| Iterative
Iterative --> WorkStack["Vec<Frame>\n(Heap-allocated)"]
Iterative --> ResultStack["Vec<Result>\n(Post-order accumulation)"]
Stack Overflow Prevention
Both compile_eval() and compile_domain() use explicit work stacks instead of recursion to handle deeply nested expressions (50k+ nodes) without stack overflow. This follows the iterative traversal pattern described in the codebase:
The pattern uses Enter/Exit frames to simulate recursive descent and ascent, accumulating results in a separate stack during the Exit phase.
Sources: llkv-table/src/planner/program.rs:407-516 llkv-table/src/planner/program.rs:544-631