Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

GitHub

This documentation is part of the "Projects with Books" initiative at zenOSmosis.

The source code for this project is available on GitHub.

Expression System

Relevant source files

The expression system provides the intermediate representation for predicates, projections, and computations in LLKV. It defines a strongly-typed Abstract Syntax Tree (AST) that decouples query logic from concrete storage formats, enabling optimization and efficient evaluation against Apache Arrow data structures.

This page covers the overall architecture of the expression system. For details on specific components:

  • Expression AST structure and types: see 5.1
  • Column name resolution and translation: see 5.2
  • Bytecode compilation and normalization: see 5.3
  • Scalar evaluation engine: see 5.4
  • Aggregate function evaluation: see 5.5

Purpose and Scope

The expression system serves three primary functions:

  1. Representation : Defines generic AST types (Expr<F>, ScalarExpr<F>) parameterized over field identifiers, supporting both string-based column names (during planning) and integer field IDs (during execution)

  2. Translation : Resolves column names to storage field identifiers by consulting the catalog, enabling type-safe access to table columns

  3. Compilation : Transforms normalized expressions into stack-based bytecode (EvalProgram, DomainProgram) for efficient vectorized evaluation

The system is located primarily in llkv-expr/, with translation logic in llkv-executor/src/translation/ and compilation in llkv-table/src/planner/program.rs.

Sources: llkv-expr/src/expr.rs:1-749 llkv-executor/src/translation/expression.rs:1-424 llkv-table/src/planner/program.rs:1-710

Expression Type Hierarchy

LLKV uses two primary expression types, both generic over the field identifier type F:

Expression Types

TypePurposeVariantsExample
Expr<'a, F>Boolean predicates for filtering rowsAnd, Or, Not, Pred, Compare, InList, IsNull, Literal, ExistsWHERE age > 18 AND status = 'active'
ScalarExpr<F>Arithmetic/scalar computations returning valuesColumn, Literal, Binary, Aggregate, Cast, Case, Coalesce, GetField, Compare, Not, IsNull, Random, ScalarSubquerySELECT price * 1.1 AS adjusted_price
Filter<'a, F>Single-field predicateField ID + Operatorage > 18
Operator<'a>Comparison operator against literalsEquals, Range, GreaterThan, LessThan, In, StartsWith, EndsWith, Contains, IsNull, IsNotNullIN (1, 2, 3)

Type Parameterization Flow

Sources: llkv-expr/src/expr.rs:14-333 llkv-plan/src/plans.rs:28-67 llkv-executor/src/translation/expression.rs:18-174

Expression Lifecycle

Expressions flow through multiple stages from SQL text to execution against storage:

Stage 1: Planning

The SQL layer (llkv-sql) parses SQL statements and constructs plan structures containing expressions. At this stage, column references use string names from the SQL text:

  • Predicates : Stored as Expr<'static, String> in SelectFilter
  • Projections : Stored as ScalarExpr<String> in SelectProjection::Computed
  • Assignments : Stored as ScalarExpr<String> in UpdatePlan::assignments

Sources: llkv-plan/src/plans.rs:28-34 llkv-sql/src/translator/mod.rs (inferred from architecture)

Stage 2: Translation

The executor translates string-based expressions to field-ID-based expressions by consulting the table schema:

  1. Column Resolution : translate_predicate and translate_scalar walk the expression tree
  2. Schema Lookup : Each column name is resolved to a FieldId using ExecutorSchema::resolve
  3. Type Inference : For computed projections, infer_computed_data_type determines the Arrow data type
  4. Special Columns : System columns like "rowid" map to special field IDs (e.g., ROW_ID_FIELD_ID)

Translation is implemented via iterative traversal to avoid stack overflow on deeply nested expressions (50k+ nodes).

Sources: llkv-executor/src/translation/expression.rs:18-174 llkv-executor/src/translation/expression.rs:176-387 llkv-executor/src/translation/schema.rs:53-123

Stage 3: Compilation

The table layer compiles Expr<FieldId> into bytecode for efficient evaluation:

  1. Normalization : normalize_predicate applies De Morgan's laws and flattens nested AND/OR nodes

  2. Compilation : ProgramCompiler::compile generates two programs:

    • EvalProgram: Stack-based bytecode for predicate evaluation
    • DomainProgram: Bytecode for tracking which fields affect row visibility
  3. Fusion Optimization : Multiple predicates on the same field are fused into a single FusedAnd operation

graph TB
    Input["Expr&lt;FieldId&gt;\nNOT(age &gt; 18 AND status = 'active')"]
Norm["normalize_predicate\nApply De Morgan's law"]
Normal["Expr&lt;FieldId&gt;\nOR([\n NOT(age &gt; 18),\n NOT(status = 'active')\n])"]
Compile["ProgramCompiler::compile"]
subgraph "Output Programs"
        Eval["EvalProgram\nops: [\n PushCompare(age, Gt, 18),\n Not,\n PushCompare(status, Eq, 'active'),\n Not,\n Or(2)\n]"]
Domain["DomainProgram\nops: [\n PushCompareDomain(...),\n PushCompareDomain(...),\n Union(2)\n]"]
end
    
 
   Input --> Norm
 
   Norm --> Normal
 
   Normal --> Compile
 
   Compile --> Eval
 
   Compile --> Domain

Sources: llkv-table/src/planner/program.rs:286-318 llkv-table/src/planner/program.rs:416-516 llkv-table/src/planner/program.rs:550-631

Stage 4: Evaluation

Compiled programs are executed against Arrow RecordBatch data:

  1. EvalProgram : Evaluates predicates row-by-row using a value stack, producing boolean results
  2. DomainProgram : Identifies which row IDs could possibly match (used for optimization)
  3. ScalarExpr : Evaluated via NumericKernels for vectorized arithmetic operations

The evaluation engine handles Arrow's columnar format efficiently through zero-copy operations and SIMD-friendly algorithms.

Sources: llkv-table/src/planner/evaluator.rs (inferred from architecture), llkv-executor/src/lib.rs:254-296

Key Components

ProgramCompiler

Compiles normalized expressions into bytecode:

Key Optimizations :

  • Predicate Fusion : gather_fused detects multiple predicates on the same field and emits FusedAnd operations
  • Domain Caching : Domain programs are memoized by expression identity to avoid recompilation
  • Stack-Based Evaluation : Operations push/pop from a value stack, avoiding recursive evaluation overhead

Sources: llkv-table/src/planner/program.rs:257-284 llkv-table/src/planner/program.rs:416-516 llkv-table/src/planner/program.rs:518-542

Bytecode Operations

EvalOp Variants

OperationPurposeStack Effect
PushPredicate(filter)Evaluate single-field predicatePush boolean
PushCompare{left, op, right}Evaluate comparison between scalar expressionsPush boolean
PushInList{expr, list, negated}Evaluate IN/NOT IN list membershipPush boolean
PushIsNull{expr, negated}Evaluate IS NULL / IS NOT NULLPush boolean
PushLiteral(bool)Push constant booleanPush boolean
FusedAnd{field_id, filters}Evaluate multiple predicates on same field (optimized)Push boolean
And{child_count}Pop N booleans, push AND resultPop N, push 1
Or{child_count}Pop N booleans, push OR resultPop N, push 1
Not{domain}Pop boolean, negate, push result (uses domain for optimization)Pop 1, push 1

DomainOp Variants

OperationPurposeStack Effect
PushFieldAll(field_id)All rows where field existsPush RowSet
PushCompareDomain{left, right, op, fields}Rows where comparison could be truePush RowSet
PushInListDomain{expr, list, fields, negated}Rows where IN list could be truePush RowSet
PushIsNullDomain{expr, fields, negated}Rows where NULL test could be truePush RowSet
PushLiteralFalseEmpty row setPush RowSet
PushAllRowsAll rowsPush RowSet
Union{child_count}Pop N row sets, push unionPop N, push 1
Intersect{child_count}Pop N row sets, push intersectionPop N, push 1

Sources: llkv-table/src/planner/program.rs:36-67 llkv-table/src/planner/program.rs:221-254

Expression Translation

Translation resolves column names to field IDs through schema lookup:

Special Handling :

  • Rowid Column : "rowid" (case-insensitive) maps to ROW_ID_FIELD_ID constant
  • Flexible Matching : Supports qualified names (table.column) and unqualified names (column)
  • Error Handling : Unknown columns produce descriptive error messages with the column name

Sources: llkv-executor/src/translation/expression.rs:390-407 llkv-executor/src/translation/expression.rs:410-423

Type Inference

The executor infers Arrow data types for computed projections to construct the output schema:

Type Inference Rules

Expression PatternInferred Type
Literal(Integer)DataType::Int64
Literal(Float)DataType::Float64
Literal(Decimal(v))DataType::Decimal128(v.precision(), v.scale())
Literal(String)DataType::Utf8
Literal(Date32)DataType::Date32
Literal(Boolean)DataType::Boolean
Literal(Null)DataType::Null
Column(field_id)Lookup from schema, normalized to Int64/Float64
Binary{...}Float64 if any operand is float, else Int64
Compare{...}DataType::Int64 (boolean as integer)
Aggregate(...)DataType::Int64 (most aggregates)
Cast{data_type, ...}data_type (explicit cast)
RandomDataType::Float64

Numeric Type Normalization : Small integers (Int8, Int16, Int32, Boolean) normalize to Int64, while all floating-point types normalize to Float64. This simplifies arithmetic evaluation.

Sources: llkv-executor/src/translation/schema.rs:53-123 llkv-executor/src/translation/schema.rs:125-147 llkv-executor/src/translation/schema.rs:149-243

Subquery Support

Expressions support two types of subqueries:

EXISTS Predicates

Used in WHERE clauses to test for row existence:

  • Structure : Expr::Exists(SubqueryExpr{id, negated})
  • Planning : Stored in SelectFilter::subqueries with correlated column bindings
  • Evaluation : Executor binds outer row values to correlated columns, executes subquery plan, returns true if any rows match

Scalar Subqueries

Used in projections to return a single value:

  • Structure : ScalarExpr::ScalarSubquery(ScalarSubqueryExpr{id})
  • Planning : Stored in SelectPlan::scalar_subqueries with correlated column bindings
  • Evaluation : Executor binds outer row values, executes subquery, extracts single value
  • Error Handling : Returns error if subquery returns multiple rows or columns

Sources: llkv-expr/src/expr.rs:42-63 llkv-plan/src/plans.rs:36-56 llkv-executor/src/lib.rs:774-839

Normalization

Expression normalization applies logical transformations before compilation:

De Morgan's Laws

NOT is pushed down through AND/OR using De Morgan's laws:

  • NOT(A AND B)NOT(A) OR NOT(B)
  • NOT(A OR B)NOT(A) AND NOT(B)
  • NOT(NOT(A))A

Flattening

Nested AND/OR nodes are flattened:

  • AND(A, AND(B, C))AND(A, B, C)
  • OR(A, OR(B, C))OR(A, B, C)

Special Cases

  • NOT(Literal(true))Literal(false)
  • NOT(IsNull{expr, false})IsNull{expr, true}

Sources: llkv-table/src/planner/program.rs:286-343

Expression Operators

Binary Operators (BinaryOp)

OperatorSemantics
AddAddition (a + b)
SubtractSubtraction (a - b)
MultiplyMultiplication (a * b)
DivideDivision (a / b)
ModuloModulus (a % b)
AndBitwise AND (a & b)
OrBitwise OR (`a
BitwiseShiftLeftLeft shift (a << b)
BitwiseShiftRightRight shift (a >> b)

Comparison Operators (CompareOp)

OperatorSemantics
EqEquality (a = b)
NotEqInequality (a != b)
LtLess than (a < b)
LtEqLess than or equal (a <= b)
GtGreater than (a > b)
GtEqGreater than or equal (a >= b)

Comparisons in ScalarExpr::Compare return 1 for true, 0 for false, NULL for NULL propagation.

Sources: llkv-expr/src/expr.rs:270-293

Memory Management

Expression Lifetimes

The 'a lifetime parameter in Expr<'a, F> allows borrowed operators to avoid allocations:

  • Operator::In(&'a [Literal]): Borrows slice from call site
  • Operator::StartsWith{pattern: &'a str, ...}: Borrows pattern string
  • Filter<'a, F>: Contains borrowed Operator<'a>

Owned Variants : EvalProgram and DomainProgram use OwnedOperator and OwnedFilter for storage, converting borrowed operators to owned values.

Zero-Copy Pattern

During evaluation, predicates borrow from the compiled program rather than cloning operators, enabling zero-copy predicate evaluation against Arrow arrays.

Sources: llkv-expr/src/expr.rs:295-333 llkv-table/src/planner/program.rs:69-143