This documentation is part of the "Projects with Books" initiative at zenOSmosis.
The source code for this project is available on GitHub.
Expression System
Relevant source files
- llkv-executor/Cargo.toml
- llkv-executor/src/lib.rs
- llkv-executor/src/translation/expression.rs
- llkv-executor/src/translation/schema.rs
- llkv-expr/src/expr.rs
- llkv-plan/src/plans.rs
- llkv-table/src/planner/program.rs
The expression system provides the intermediate representation for predicates, projections, and computations in LLKV. It defines a strongly-typed Abstract Syntax Tree (AST) that decouples query logic from concrete storage formats, enabling optimization and efficient evaluation against Apache Arrow data structures.
This page covers the overall architecture of the expression system. For details on specific components:
- Expression AST structure and types: see 5.1
- Column name resolution and translation: see 5.2
- Bytecode compilation and normalization: see 5.3
- Scalar evaluation engine: see 5.4
- Aggregate function evaluation: see 5.5
Purpose and Scope
The expression system serves three primary functions:
-
Representation : Defines generic AST types (
Expr<F>,ScalarExpr<F>) parameterized over field identifiers, supporting both string-based column names (during planning) and integer field IDs (during execution) -
Translation : Resolves column names to storage field identifiers by consulting the catalog, enabling type-safe access to table columns
-
Compilation : Transforms normalized expressions into stack-based bytecode (
EvalProgram,DomainProgram) for efficient vectorized evaluation
The system is located primarily in llkv-expr/, with translation logic in llkv-executor/src/translation/ and compilation in llkv-table/src/planner/program.rs.
Sources: llkv-expr/src/expr.rs:1-749 llkv-executor/src/translation/expression.rs:1-424 llkv-table/src/planner/program.rs:1-710
Expression Type Hierarchy
LLKV uses two primary expression types, both generic over the field identifier type F:
Expression Types
| Type | Purpose | Variants | Example |
|---|---|---|---|
Expr<'a, F> | Boolean predicates for filtering rows | And, Or, Not, Pred, Compare, InList, IsNull, Literal, Exists | WHERE age > 18 AND status = 'active' |
ScalarExpr<F> | Arithmetic/scalar computations returning values | Column, Literal, Binary, Aggregate, Cast, Case, Coalesce, GetField, Compare, Not, IsNull, Random, ScalarSubquery | SELECT price * 1.1 AS adjusted_price |
Filter<'a, F> | Single-field predicate | Field ID + Operator | age > 18 |
Operator<'a> | Comparison operator against literals | Equals, Range, GreaterThan, LessThan, In, StartsWith, EndsWith, Contains, IsNull, IsNotNull | IN (1, 2, 3) |
Type Parameterization Flow
Sources: llkv-expr/src/expr.rs:14-333 llkv-plan/src/plans.rs:28-67 llkv-executor/src/translation/expression.rs:18-174
Expression Lifecycle
Expressions flow through multiple stages from SQL text to execution against storage:
Stage 1: Planning
The SQL layer (llkv-sql) parses SQL statements and constructs plan structures containing expressions. At this stage, column references use string names from the SQL text:
- Predicates : Stored as
Expr<'static, String>inSelectFilter - Projections : Stored as
ScalarExpr<String>inSelectProjection::Computed - Assignments : Stored as
ScalarExpr<String>inUpdatePlan::assignments
Sources: llkv-plan/src/plans.rs:28-34 llkv-sql/src/translator/mod.rs (inferred from architecture)
Stage 2: Translation
The executor translates string-based expressions to field-ID-based expressions by consulting the table schema:
- Column Resolution :
translate_predicateandtranslate_scalarwalk the expression tree - Schema Lookup : Each column name is resolved to a
FieldIdusingExecutorSchema::resolve - Type Inference : For computed projections,
infer_computed_data_typedetermines the Arrow data type - Special Columns : System columns like
"rowid"map to special field IDs (e.g.,ROW_ID_FIELD_ID)
Translation is implemented via iterative traversal to avoid stack overflow on deeply nested expressions (50k+ nodes).
Sources: llkv-executor/src/translation/expression.rs:18-174 llkv-executor/src/translation/expression.rs:176-387 llkv-executor/src/translation/schema.rs:53-123
Stage 3: Compilation
The table layer compiles Expr<FieldId> into bytecode for efficient evaluation:
-
Normalization :
normalize_predicateapplies De Morgan's laws and flattens nested AND/OR nodes -
Compilation :
ProgramCompiler::compilegenerates two programs:EvalProgram: Stack-based bytecode for predicate evaluationDomainProgram: Bytecode for tracking which fields affect row visibility
-
Fusion Optimization : Multiple predicates on the same field are fused into a single
FusedAndoperation
graph TB
Input["Expr<FieldId>\nNOT(age > 18 AND status = 'active')"]
Norm["normalize_predicate\nApply De Morgan's law"]
Normal["Expr<FieldId>\nOR([\n NOT(age > 18),\n NOT(status = 'active')\n])"]
Compile["ProgramCompiler::compile"]
subgraph "Output Programs"
Eval["EvalProgram\nops: [\n PushCompare(age, Gt, 18),\n Not,\n PushCompare(status, Eq, 'active'),\n Not,\n Or(2)\n]"]
Domain["DomainProgram\nops: [\n PushCompareDomain(...),\n PushCompareDomain(...),\n Union(2)\n]"]
end
Input --> Norm
Norm --> Normal
Normal --> Compile
Compile --> Eval
Compile --> Domain
Sources: llkv-table/src/planner/program.rs:286-318 llkv-table/src/planner/program.rs:416-516 llkv-table/src/planner/program.rs:550-631
Stage 4: Evaluation
Compiled programs are executed against Arrow RecordBatch data:
- EvalProgram : Evaluates predicates row-by-row using a value stack, producing boolean results
- DomainProgram : Identifies which row IDs could possibly match (used for optimization)
- ScalarExpr : Evaluated via
NumericKernelsfor vectorized arithmetic operations
The evaluation engine handles Arrow's columnar format efficiently through zero-copy operations and SIMD-friendly algorithms.
Sources: llkv-table/src/planner/evaluator.rs (inferred from architecture), llkv-executor/src/lib.rs:254-296
Key Components
ProgramCompiler
Compiles normalized expressions into bytecode:
Key Optimizations :
- Predicate Fusion :
gather_fuseddetects multiple predicates on the same field and emitsFusedAndoperations - Domain Caching : Domain programs are memoized by expression identity to avoid recompilation
- Stack-Based Evaluation : Operations push/pop from a value stack, avoiding recursive evaluation overhead
Sources: llkv-table/src/planner/program.rs:257-284 llkv-table/src/planner/program.rs:416-516 llkv-table/src/planner/program.rs:518-542
Bytecode Operations
EvalOp Variants
| Operation | Purpose | Stack Effect |
|---|---|---|
PushPredicate(filter) | Evaluate single-field predicate | Push boolean |
PushCompare{left, op, right} | Evaluate comparison between scalar expressions | Push boolean |
PushInList{expr, list, negated} | Evaluate IN/NOT IN list membership | Push boolean |
PushIsNull{expr, negated} | Evaluate IS NULL / IS NOT NULL | Push boolean |
PushLiteral(bool) | Push constant boolean | Push boolean |
FusedAnd{field_id, filters} | Evaluate multiple predicates on same field (optimized) | Push boolean |
And{child_count} | Pop N booleans, push AND result | Pop N, push 1 |
Or{child_count} | Pop N booleans, push OR result | Pop N, push 1 |
Not{domain} | Pop boolean, negate, push result (uses domain for optimization) | Pop 1, push 1 |
DomainOp Variants
| Operation | Purpose | Stack Effect |
|---|---|---|
PushFieldAll(field_id) | All rows where field exists | Push RowSet |
PushCompareDomain{left, right, op, fields} | Rows where comparison could be true | Push RowSet |
PushInListDomain{expr, list, fields, negated} | Rows where IN list could be true | Push RowSet |
PushIsNullDomain{expr, fields, negated} | Rows where NULL test could be true | Push RowSet |
PushLiteralFalse | Empty row set | Push RowSet |
PushAllRows | All rows | Push RowSet |
Union{child_count} | Pop N row sets, push union | Pop N, push 1 |
Intersect{child_count} | Pop N row sets, push intersection | Pop N, push 1 |
Sources: llkv-table/src/planner/program.rs:36-67 llkv-table/src/planner/program.rs:221-254
Expression Translation
Translation resolves column names to field IDs through schema lookup:
Special Handling :
- Rowid Column :
"rowid"(case-insensitive) maps toROW_ID_FIELD_IDconstant - Flexible Matching : Supports qualified names (
table.column) and unqualified names (column) - Error Handling : Unknown columns produce descriptive error messages with the column name
Sources: llkv-executor/src/translation/expression.rs:390-407 llkv-executor/src/translation/expression.rs:410-423
Type Inference
The executor infers Arrow data types for computed projections to construct the output schema:
Type Inference Rules
| Expression Pattern | Inferred Type |
|---|---|
Literal(Integer) | DataType::Int64 |
Literal(Float) | DataType::Float64 |
Literal(Decimal(v)) | DataType::Decimal128(v.precision(), v.scale()) |
Literal(String) | DataType::Utf8 |
Literal(Date32) | DataType::Date32 |
Literal(Boolean) | DataType::Boolean |
Literal(Null) | DataType::Null |
Column(field_id) | Lookup from schema, normalized to Int64/Float64 |
Binary{...} | Float64 if any operand is float, else Int64 |
Compare{...} | DataType::Int64 (boolean as integer) |
Aggregate(...) | DataType::Int64 (most aggregates) |
Cast{data_type, ...} | data_type (explicit cast) |
Random | DataType::Float64 |
Numeric Type Normalization : Small integers (Int8, Int16, Int32, Boolean) normalize to Int64, while all floating-point types normalize to Float64. This simplifies arithmetic evaluation.
Sources: llkv-executor/src/translation/schema.rs:53-123 llkv-executor/src/translation/schema.rs:125-147 llkv-executor/src/translation/schema.rs:149-243
Subquery Support
Expressions support two types of subqueries:
EXISTS Predicates
Used in WHERE clauses to test for row existence:
- Structure :
Expr::Exists(SubqueryExpr{id, negated}) - Planning : Stored in
SelectFilter::subquerieswith correlated column bindings - Evaluation : Executor binds outer row values to correlated columns, executes subquery plan, returns true if any rows match
Scalar Subqueries
Used in projections to return a single value:
- Structure :
ScalarExpr::ScalarSubquery(ScalarSubqueryExpr{id}) - Planning : Stored in
SelectPlan::scalar_subquerieswith correlated column bindings - Evaluation : Executor binds outer row values, executes subquery, extracts single value
- Error Handling : Returns error if subquery returns multiple rows or columns
Sources: llkv-expr/src/expr.rs:42-63 llkv-plan/src/plans.rs:36-56 llkv-executor/src/lib.rs:774-839
Normalization
Expression normalization applies logical transformations before compilation:
De Morgan's Laws
NOT is pushed down through AND/OR using De Morgan's laws:
NOT(A AND B)→NOT(A) OR NOT(B)NOT(A OR B)→NOT(A) AND NOT(B)NOT(NOT(A))→A
Flattening
Nested AND/OR nodes are flattened:
AND(A, AND(B, C))→AND(A, B, C)OR(A, OR(B, C))→OR(A, B, C)
Special Cases
NOT(Literal(true))→Literal(false)NOT(IsNull{expr, false})→IsNull{expr, true}
Sources: llkv-table/src/planner/program.rs:286-343
Expression Operators
Binary Operators (BinaryOp)
| Operator | Semantics |
|---|---|
Add | Addition (a + b) |
Subtract | Subtraction (a - b) |
Multiply | Multiplication (a * b) |
Divide | Division (a / b) |
Modulo | Modulus (a % b) |
And | Bitwise AND (a & b) |
Or | Bitwise OR (`a |
BitwiseShiftLeft | Left shift (a << b) |
BitwiseShiftRight | Right shift (a >> b) |
Comparison Operators (CompareOp)
| Operator | Semantics |
|---|---|
Eq | Equality (a = b) |
NotEq | Inequality (a != b) |
Lt | Less than (a < b) |
LtEq | Less than or equal (a <= b) |
Gt | Greater than (a > b) |
GtEq | Greater than or equal (a >= b) |
Comparisons in ScalarExpr::Compare return 1 for true, 0 for false, NULL for NULL propagation.
Sources: llkv-expr/src/expr.rs:270-293
Memory Management
Expression Lifetimes
The 'a lifetime parameter in Expr<'a, F> allows borrowed operators to avoid allocations:
Operator::In(&'a [Literal]): Borrows slice from call siteOperator::StartsWith{pattern: &'a str, ...}: Borrows pattern stringFilter<'a, F>: Contains borrowedOperator<'a>
Owned Variants : EvalProgram and DomainProgram use OwnedOperator and OwnedFilter for storage, converting borrowed operators to owned values.
Zero-Copy Pattern
During evaluation, predicates borrow from the compiled program rather than cloning operators, enabling zero-copy predicate evaluation against Arrow arrays.
Sources: llkv-expr/src/expr.rs:295-333 llkv-table/src/planner/program.rs:69-143