Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

GitHub

This documentation is part of the "Projects with Books" initiative at zenOSmosis.

The source code for this project is available on GitHub.

Program Compilation

Relevant source files

Purpose and Scope

This page documents the predicate compilation system in LLKV, which transforms typed predicate expressions (Expr<FieldId>) into stack-based bytecode programs for efficient evaluation during table scans. The compilation process produces two types of programs: EvalProgram for predicate evaluation and DomainProgram for domain analysis (determining which row IDs might match predicates).

For information about the expression AST structure itself, see Expression AST. For how these compiled programs are evaluated during execution, see Filter Evaluation. For how expressions are translated from string column names to field IDs, see Expression Translation.


Compilation Pipeline Overview

The compilation process transforms a normalized predicate expression into executable bytecode through the ProgramCompiler in llkv-table/src/planner/program.rs:257-284 The compiler produces a ProgramSet containing both evaluation and domain analysis programs.

Sources: llkv-table/src/planner/program.rs:257-284 llkv-table/src/planner/mod.rs:612-637

graph TB
    Input["Expr&lt;FieldId&gt;\n(Raw predicate)"]
Normalize["normalize_predicate()\n(Apply De Morgan's laws,\nflatten And/Or)"]
Compiler["ProgramCompiler::new()\nProgramCompiler::compile()"]
ProgramSet["ProgramSet"]
EvalProgram["EvalProgram\n(Stack-based bytecode\nfor evaluation)"]
DomainRegistry["DomainRegistry\n(Domain programs\nfor optimization)"]
Input --> Normalize
 
   Normalize --> Compiler
 
   Compiler --> ProgramSet
 
   ProgramSet --> EvalProgram
 
   ProgramSet --> DomainRegistry
    
    EvalOps["EvalOp instructions:\nPushPredicate, And, Or,\nNot, FusedAnd"]
DomainOps["DomainOp instructions:\nPushFieldAll, Union,\nIntersect"]
EvalProgram --> EvalOps
 
   DomainRegistry --> DomainOps

Compilation Entry Point

The TablePlanner::plan_scan method invokes the compiler when preparing a scan operation. The process begins with predicate normalization followed by compilation into both evaluation and domain programs.

Sources: llkv-table/src/planner/mod.rs:625-628 llkv-table/src/planner/program.rs:266-283

graph LR
    TablePlanner["TablePlanner::plan_scan"]
NormFilter["normalize_predicate(filter_expr)"]
CreateCompiler["ProgramCompiler::new(Arc&lt;Expr&gt;)"]
Compile["compiler.compile()"]
ProgramSet["ProgramSet&lt;'expr&gt;"]
TablePlanner --> NormFilter
 
   NormFilter --> CreateCompiler
 
   CreateCompiler --> Compile
 
   Compile --> ProgramSet
    
 
   ProgramSet --> EvalProgram["EvalProgram\n(ops: Vec&lt;EvalOp&gt;)"]
ProgramSet --> DomainRegistry["DomainRegistry\n(programs, index)"]

Predicate Normalization

Before compilation, predicates are normalized using normalize_predicate() to simplify the expression tree. Normalization applies two key transformations:

Transformation Rules

Input PatternOutput PatternDescription
And(And(a, b), c)And(a, b, c)Flatten nested AND operations
Or(Or(a, b), c)Or(a, b, c)Flatten nested OR operations
Not(And(a, b))Or(Not(a), Not(b))De Morgan's law for AND
Not(Or(a, b))And(Not(a), Not(b))De Morgan's law for OR
Not(Not(a))aDouble negation elimination
Not(Literal(true))Literal(false)Literal inversion
Not(IsNull{expr, negated})IsNull{expr, !negated}IsNull negation flip

Normalization Algorithm

The normalize_expr() function recursively transforms the expression tree using pattern matching:

Sources: llkv-table/src/planner/program.rs:286-343

graph TD
    Start["normalize_expr(expr)"]
CheckAnd{"expr is And?"}
CheckOr{"expr is Or?"}
CheckNot{"expr is Not?"}
Other["Return expr unchanged"]
FlattenAnd["Flatten nested And nodes\ninto single And"]
FlattenOr["Flatten nested Or nodes\ninto single Or"]
ApplyDeMorgan["normalize_negated(inner)\nApply De Morgan's laws"]
Start --> CheckAnd
 
   CheckAnd -->|Yes| FlattenAnd
 
   CheckAnd -->|No| CheckOr
 
   CheckOr -->|Yes| FlattenOr
 
   CheckOr -->|No| CheckNot
 
   CheckNot -->|Yes| ApplyDeMorgan
 
   CheckNot -->|No| Other
    
 
   FlattenAnd --> Return["Return normalized expr"]
FlattenOr --> Return
 
   ApplyDeMorgan --> Return
 
   Other --> Return

EvalProgram Compilation

The compile_eval() function generates a sequence of EvalOp instructions using iterative traversal with an explicit work stack. This avoids stack overflow on deeply nested expressions and produces postorder bytecode.

EvalOp Instruction Set

The EvalOp enum defines the instruction types for predicate evaluation:

InstructionDescriptionStack Effect
PushPredicate(OwnedFilter)Push single predicate result→ bool
PushCompare{left, op, right}Evaluate comparison expression→ bool
PushInList{expr, list, negated}Evaluate IN list membership→ bool
PushIsNull{expr, negated}Evaluate IS NULL test→ bool
PushLiteral(bool)Push constant boolean value→ bool
FusedAnd{field_id, filters}Optimized AND for same field→ bool
And{child_count}Pop N bools, push AND resultbool×N → bool
Or{child_count}Pop N bools, push OR resultbool×N → bool
Not{domain}Pop bool, push NOT resultbool → bool
graph TB
    subgraph "Compilation Phases"
        Input["Expression Tree"]
Phase1["Enter Phase\n(Pre-order traversal)"]
Phase2["Exit Phase\n(Post-order emission)"]
Output["Vec&lt;EvalOp&gt;"]
end
    
 
   Input --> Phase1
 
   Phase1 --> Phase2
 
   Phase2 --> Output
    
    subgraph "Frame Types"
        EnterFrame["EvalVisit::Enter(expr)\nPush children in reverse order"]
ExitFrame["EvalVisit::Exit(expr)\nEmit instruction"]
FusedFrame["EvalVisit::EmitFused\nEmit FusedAnd optimization"]
end
    
 
   Phase1 --> EnterFrame
 
   EnterFrame --> ExitFrame
 
   EnterFrame --> FusedFrame
 
   ExitFrame --> Phase2
 
   FusedFrame --> Phase2

Compilation Process

The compiler uses a two-pass approach with EvalVisit frames to track traversal state:

Sources: llkv-table/src/planner/program.rs:407-516

Predicate Fusion Optimization

When the compiler encounters an And node where all children are predicates (Expr::Pred) on the same FieldId, it emits a single FusedAnd instruction instead of multiple individual predicates. This optimization is detected by gather_fused():

Sources: llkv-table/src/planner/program.rs:518-542

graph LR
    AndNode["And(children)"]
Check["gather_fused(children)"]
Decision{"All children\nare Pred(field_id)\nwith same field_id?"}
Fused["Emit FusedAnd{\nfield_id,\nfilters\n}"]
Normal["Emit individual\nPushPredicate\nfor each child\n+ And instruction"]
AndNode --> Check
 
   Check --> Decision
 
   Decision -->|Yes| Fused
 
   Decision -->|No| Normal

DomainProgram Compilation

The compile_domain() function generates DomainOp instructions for domain analysis. Domain programs determine which row IDs might satisfy a predicate without evaluating the full expression, enabling storage-layer optimizations.

DomainOp Instruction Set

InstructionDescriptionStack Effect
PushFieldAll(FieldId)All rows where field exists→ RowSet
PushCompareDomain{left, right, op, fields}Domain of rows for comparison→ RowSet
PushInListDomain{expr, list, fields, negated}Domain of rows for IN list→ RowSet
PushIsNullDomain{expr, fields, negated}Domain of rows for IS NULL→ RowSet
PushLiteralFalseEmpty row set→ RowSet
PushAllRowsAll rows in table→ RowSet
Union{child_count}Pop N sets, push unionRowSet×N → RowSet
Intersect{child_count}Pop N sets, push intersectionRowSet×N → RowSet

Domain Analysis Algorithm

Domain compilation uses iterative traversal with DomainVisit frames, similar to eval compilation but with different semantics:

graph TB
    Start["compile_domain(expr)"]
Stack["Work stack:\nVec&lt;DomainVisit&gt;"]
Ops["Output:\nVec&lt;DomainOp&gt;"]
EnterFrame["DomainVisit::Enter(node)\nPush children + Exit frame"]
ExitFrame["DomainVisit::Exit(node)\nEmit DomainOp"]
Start --> Stack
 
   Stack --> Process{"Pop frame"}
Process -->|Enter| EnterFrame
 
   Process -->|Exit| ExitFrame
    
 
   EnterFrame --> Stack
 
   ExitFrame --> Emit["Emit instruction to ops"]
Emit --> Stack
    
 
   Process -->|Empty| Done["Return DomainProgram{ops}"]

Domain Semantics

The domain of an expression represents the set of row IDs where the expression could potentially be true (or false for Not):

Expression TypeDomain Semantics
Pred(filter)All rows where filter.field_id exists
Compare{left, right, op}Union of domains of all fields in left and right
InList{expr, list}Union of domains of all fields in expr and list items
IsNull{expr}Union of domains of all fields in expr
Literal(true)All rows in table
Literal(false)Empty set
And(children)Intersection of children domains
Or(children)Union of children domains
Not(inner)Same as inner domain (NOT doesn't change domain)

Sources: llkv-table/src/planner/program.rs:544-631


graph TB
    Input["ScalarExpr&lt;FieldId&gt;"]
Stack["Work stack:\nVec&lt;&amp;ScalarExpr&gt;"]
Seen["FxHashSet&lt;FieldId&gt;\n(deduplication)"]
Output["Vec&lt;FieldId&gt;\n(sorted)"]
Input --> Stack
 
   Stack --> Process{"Pop expr"}
Process -->|Column fid| AddField["Insert fid into seen"]
Process -->|Literal| Skip["Skip (no fields)"]
Process -->|Binary| PushChildren["Push left, right\nto stack"]
Process -->|Compare| PushChildren
 
   Process -->|Aggregate| PushAggExpr["Push aggregate expr\nto stack"]
Process -->|Other| PushNested["Push nested exprs\nto stack"]
AddField --> Stack
 
   Skip --> Stack
 
   PushChildren --> Stack
 
   PushAggExpr --> Stack
 
   PushNested --> Stack
    
 
   Process -->|Empty| Collect["Collect seen into Vec\nSort unstable"]
Collect --> Output

Field Collection for Domain Analysis

The collect_fields() function extracts all FieldId references from scalar expressions using iterative traversal. This determines which columns' row sets need to be considered during domain evaluation.

Sources: llkv-table/src/planner/program.rs:633-709


Data Structures

ProgramSet

The top-level container returned by compilation, holding all compiled artifacts:

ProgramSet<'expr> {
    eval: EvalProgram,              // Bytecode for predicate evaluation
    domains: DomainRegistry,        // Domain programs for optimization
    _root_expr: Arc<Expr<'expr, FieldId>>  // Original expression (lifetime anchor)
}

Sources: llkv-table/src/planner/program.rs:23-29

DomainRegistry

Manages the collection of compiled domain programs with deduplication via ExprKey:

DomainRegistry {
    programs: Vec<DomainProgram>,           // All compiled domain programs
    index: FxHashMap<ExprKey, DomainProgramId>,  // Expression → program ID map
    root: Option<DomainProgramId>           // ID of root domain program
}

The registry uses ExprKey (a pointer-based key) to detect duplicate subexpressions and reuse compiled domain programs.

Sources: llkv-table/src/planner/program.rs:12-20 llkv-table/src/planner/program.rs:196-219

OwnedFilter and OwnedOperator

To support owned bytecode programs with no lifetime constraints, the compiler converts borrowed Filter<'a, FieldId> and Operator<'a> types into owned variants:

Borrowed TypeOwned TypePurpose
Filter<'a, FieldId>OwnedFilterStores field_id + owned operator
Operator<'a>OwnedOperatorOwns pattern strings and literal vectors

This conversion happens during compile_eval() when creating PushPredicate and FusedAnd instructions.

Sources: llkv-table/src/planner/program.rs:69-191


Integration with Table Scanning

The compiled programs are used during table scan execution in two ways:

  1. EvalProgram : Evaluated per-row or per-batch during scan to determine which rows match the predicate
  2. DomainProgram : Used for storage-layer optimizations to skip scanning columns or chunks that cannot match

Usage in PlannedScan

The TablePlanner::plan_scan() method creates a PlannedScan struct that bundles the compiled programs with scan metadata:

PlannedScan<'expr, P> {
    projections: Vec<ScanProjection>,
    filter_expr: Arc<Expr<'expr, FieldId>>,
    options: ScanStreamOptions<P>,
    plan_graph: PlanGraph,          // For query plan visualization
    programs: ProgramSet<'expr>     // Compiled evaluation and domain programs
}

Sources: llkv-table/src/planner/mod.rs:500-509 llkv-table/src/planner/mod.rs:630-636


Example Compilation

Consider the predicate: (age > 18 AND name LIKE 'A%') OR status = 'active'

After normalization, this remains unchanged (no nested And/Or to flatten). The compiler produces:

EvalProgram Instructions

1. PushPredicate(Filter { field_id: age, op: GreaterThan(18) })
2. PushPredicate(Filter { field_id: name, op: StartsWith("A", true) })
3. And { child_count: 2 }
4. PushPredicate(Filter { field_id: status, op: Equals("active") })
5. Or { child_count: 2 }

DomainProgram Instructions

1. PushFieldAll(age)      // Domain for first predicate
2. PushFieldAll(name)     // Domain for second predicate
3. Intersect { child_count: 2 }  // AND combines via intersection
4. PushFieldAll(status)   // Domain for third predicate
5. Union { child_count: 2 }      // OR combines via union

The domain program indicates that rows must have (age AND name) OR status to potentially match.

Sources: llkv-table/src/planner/program.rs:416-516 llkv-table/src/planner/program.rs:550-631


graph LR
    Recursive["Recursive approach\n(Stack overflow risk)"]
Iterative["Iterative approach\n(Explicit work stack)"]
Recursive -->|Replace with| Iterative
    
 
   Iterative --> WorkStack["Vec&lt;Frame&gt;\n(Heap-allocated)"]
Iterative --> ResultStack["Vec&lt;Result&gt;\n(Post-order accumulation)"]

Stack Overflow Prevention

Both compile_eval() and compile_domain() use explicit work stacks instead of recursion to handle deeply nested expressions (50k+ nodes) without stack overflow. This follows the iterative traversal pattern described in the codebase:

The pattern uses Enter/Exit frames to simulate recursive descent and ascent, accumulating results in a separate stack during the Exit phase.

Sources: llkv-table/src/planner/program.rs:407-516 llkv-table/src/planner/program.rs:544-631