This documentation is part of the "Projects with Books" initiative at zenOSmosis.
The source code for this project is available on GitHub.
Scalar Evaluation and NumericKernels
Loading…
Scalar Evaluation and NumericKernels
Relevant source files
- llkv-csv/src/writer.rs
- llkv-expr/src/expr.rs
- llkv-plan/src/plans.rs
- llkv-table/examples/direct_comparison.rs
- llkv-table/examples/performance_benchmark.rs
- llkv-table/examples/test_streaming.rs
- llkv-table/src/table.rs
Purpose and Scope
This page documents the scalar expression evaluation engine in LLKV, covering how ScalarExpr instances are computed against columnar data to produce results for projections, filters, and computed columns. The evaluation system operates on Apache Arrow arrays and leverages vectorization for performance.
For information about the expression AST structure and variants, see Expression AST. For how expressions are translated from SQL column names to field identifiers, see Expression Translation. For the bytecode compilation pipeline, see Program Compilation. For aggregate function evaluation, see Aggregation System.
ScalarExpr Variants
The ScalarExpr<F> enum represents all scalar computations that can be performed in LLKV. Each variant corresponds to a specific type of operation that produces a single value per row.
Sources: llkv-expr/src/expr.rs:126-182
graph TB
ScalarExpr["ScalarExpr<F>"]
ScalarExpr --> Column["Column(F)\nDirect column reference"]
ScalarExpr --> Literal["Literal(Literal)\nConstant value"]
ScalarExpr --> Binary["Binary\nArithmetic operations"]
ScalarExpr --> Not["Not(Box<ScalarExpr>)\nLogical negation"]
ScalarExpr --> IsNull["IsNull\nNULL test"]
ScalarExpr --> Aggregate["Aggregate(AggregateCall)\nAggregate functions"]
ScalarExpr --> GetField["GetField\nStruct field access"]
ScalarExpr --> Cast["Cast\nType conversion"]
ScalarExpr --> Compare["Compare\nComparison ops"]
ScalarExpr --> Coalesce["Coalesce\nFirst non-null"]
ScalarExpr --> ScalarSubquery["ScalarSubquery\nSubquery result"]
ScalarExpr --> Case["Case\nCASE expression"]
ScalarExpr --> Random["Random\nRandom number"]
Binary --> BinaryOp["BinaryOp:\nAdd, Subtract, Multiply,\nDivide, Modulo,\nAnd, Or,\nBitwiseShiftLeft,\nBitwiseShiftRight"]
Compare --> CompareOp["CompareOp:\nEq, NotEq,\nLt, LtEq,\nGt, GtEq"]
The generic type parameter F represents the field identifier type, which is typically String (for SQL column names) or FieldId (for translated physical column identifiers).
Binary Operations
Binary operations perform arithmetic or logical computations between two scalar expressions. The BinaryOp enum defines the supported operators:
| Operator | Symbol | Description | Example |
|---|---|---|---|
Add | + | Addition | col1 + col2 |
Subtract | - | Subtraction | col1 - 100 |
Multiply | * | Multiplication | price * quantity |
Divide | / | Division | total / count |
Modulo | % | Remainder | id % 10 |
And | AND | Logical AND | flag1 AND flag2 |
Or | OR | Logical OR | status1 OR status2 |
BitwiseShiftLeft | << | Left bit shift | mask << 2 |
BitwiseShiftRight | >> | Right bit shift | flags >> 4 |
Sources: llkv-expr/src/expr.rs:310-338
Binary expressions are constructed recursively, allowing complex nested computations:
Comparison Operations
Comparison operations produce boolean results (represented as 1/0 integers) by comparing two scalar expressions. The CompareOp enum defines six comparison operators:
| Operator | Symbol | Description |
|---|---|---|
Eq | = | Equal to |
NotEq | != | Not equal to |
Lt | < | Less than |
LtEq | <= | Less than or equal |
Gt | > | Greater than |
GtEq | >= | Greater than or equal |
Sources: llkv-expr/src/expr.rs:341-363
Comparisons are represented as a ScalarExpr::Compare variant that contains the left operand, operator, and right operand. When evaluated, they produce a boolean value that follows SQL three-valued logic (true, false, NULL).
Evaluation Pipeline
The evaluation of scalar expressions follows a multi-stage pipeline from SQL text to computed Arrow arrays:
Sources: llkv-expr/src/expr.rs:1-819 llkv-table/src/table.rs:29-30
graph LR
SQL["SQL Expression\n'col1 + col2 * 10'"]
Parse["SQL Parser\nsqlparser-rs"]
Translate["Expression Translation\nString → FieldId"]
Compile["Program Compilation\nScalarExpr → EvalProgram"]
Execute["Evaluation Engine\nVectorized Execution"]
Result["Arrow Array\nComputed Results"]
SQL --> Parse
Parse --> Translate
Translate --> Compile
Compile --> Execute
Execute --> Result
Compile -.uses.-> ProgramCompiler["ProgramCompiler\nllkv-compute"]
Execute -.operates on.-> ArrowArrays["Arrow Arrays\nColumnar Data"]
The key stages are:
- Parsing : SQL expressions are parsed into AST nodes by
sqlparser-rs - Translation : Column names (strings) are resolved to
FieldIdidentifiers - Compilation :
ScalarExpris compiled into bytecode byProgramCompiler - Execution : The bytecode is evaluated against Arrow columnar data
Type System and Casting
LLKV uses Arrow’s type system for scalar evaluation. The Cast variant of ScalarExpr performs explicit type conversions:
Sources: llkv-expr/src/expr.rs:154-157
The data_type field is an Arrow DataType that specifies the target type for the conversion. Type casting follows Arrow’s casting rules and handles conversions between:
- Numeric types (integers, floats, decimals)
- String types (UTF-8)
- Temporal types (Date32, timestamps)
- Boolean types
- Struct types
Invalid casts produce NULL values following SQL semantics.
Null Handling and Propagation
Scalar expressions implement SQL three-valued logic where NULL values propagate through most operations. The IsNull variant provides explicit NULL testing:
Sources: llkv-expr/src/expr.rs:139-142
When negated is false, this evaluates to 1 (true) if the expression is NULL, 0 (false) otherwise. When negated is true, it performs the inverse test (IS NOT NULL).
The Coalesce variant provides NULL-coalescing behavior, returning the first non-NULL value from a list of expressions:
Sources: llkv-expr/src/expr.rs165
This is used to implement SQL’s COALESCE(expr1, expr2, ...) function.
CASE Expressions
The Case variant implements SQL CASE expressions with optional operand and ELSE branches:
Sources: llkv-expr/src/expr.rs:169-176
This represents both simple and searched CASE expressions:
- Simple CASE : When
operandisSome, each branch’s WHEN expression is compared to the operand - Searched CASE : When
operandisNone, each branch’s WHEN expression is evaluated as a boolean
The branches vector contains (WHEN, THEN) pairs evaluated in order. If no branch matches, else_expr is returned, or NULL if else_expr is None.
graph TB
subgraph "Row-by-Row Evaluation (Avoided)"
Row1["Row 1:\ncol1=10, col2=5\n→ 10 + 5 = 15"]
Row2["Row 2:\ncol1=20, col2=3\n→ 20 + 3 = 23"]
Row3["Row 3:\ncol1=15, col2=7\n→ 15 + 7 = 22"]
end
subgraph "Vectorized Evaluation (Used)"
Col1["Int64Array\n[10, 20, 15]"]
Col2["Int64Array\n[5, 3, 7]"]
Result["Int64Array\n[15, 23, 22]"]
Col1 --> VectorAdd["Vectorized Add\nSIMD Operations"]
Col2 --> VectorAdd
VectorAdd --> Result
end
style Row1 fill:#f9f9f9
style Row2 fill:#f9f9f9
style Row3 fill:#f9f9f9
Vectorization Strategy
Expression evaluation in LLKV leverages Apache Arrow’s columnar format for vectorized execution. Rather than evaluating expressions row-by-row, operations process entire arrays at once.
Sources: llkv-table/src/table.rs:1-681
Key benefits of vectorization:
- SIMD Instructions : Modern CPUs can process multiple values simultaneously
- Reduced Overhead : Eliminates per-row interpretation overhead
- Cache Efficiency : Columnar layout improves CPU cache utilization
- Arrow Compute Kernels : Leverages highly optimized Arrow implementations
graph TB
Expr["ScalarExpr::Binary\nop=Add"]
Dispatch["Type Dispatch"]
Expr --> Dispatch
Dispatch --> Int64["Int64Array\nAdd kernel"]
Dispatch --> Int32["Int32Array\nAdd kernel"]
Dispatch --> Float64["Float64Array\nAdd kernel"]
Dispatch --> Decimal128["Decimal128Array\nAdd kernel"]
Int64 --> Result1["Int64Array result"]
Int32 --> Result2["Int32Array result"]
Float64 --> Result3["Float64Array result"]
Decimal128 --> Result4["Decimal128Array result"]
Numeric Type Dispatch
LLKV handles multiple numeric types through Arrow’s type system. The evaluation engine uses Arrow’s primitive type traits to dispatch operations:
Sources: llkv-table/src/table.rs:17-20
The dispatch mechanism uses Arrow’s type system to select the appropriate kernel at evaluation time. The macros llkv_for_each_arrow_numeric, llkv_for_each_arrow_boolean, and llkv_for_each_arrow_string provide type-safe iteration over all supported Arrow types.
Aggregate Functions in Scalar Context
The Aggregate variant of ScalarExpr allows aggregate functions to appear in scalar contexts, such as COUNT(*) + 1:
Sources: llkv-expr/src/expr.rs145
The AggregateCall enum includes:
CountStar: Count all rowsCount { expr, distinct }: Count non-NULL valuesSum { expr, distinct }: Sum of valuesTotal { expr, distinct }: Sum with NULL-safe 0 defaultAvg { expr, distinct }: Average of valuesMin(expr): Minimum valueMax(expr): Maximum valueCountNulls(expr): Count NULL valuesGroupConcat { expr, distinct, separator }: String concatenation
Sources: llkv-expr/src/expr.rs:184-215
Each aggregate operates on a ScalarExpr, allowing nested computations like SUM(col1 + col2) or AVG(-price).
Random Number Generation
The Random variant produces pseudo-random floating-point values:
Sources: llkv-expr/src/expr.rs181
Following PostgreSQL and DuckDB semantics, each evaluation produces a new random value in the range [0.0, 1.0). The generator is seeded automatically and does not expose seed control at the SQL level.\n\n## Struct Field Access\n\nThe GetField variant extracts fields from struct expressions](https://github.com/jzombie/rust-llkv/blob/89777726/0.0, 1.0). The generator is seeded automatically and does not expose seed control at the SQL level.\n\n## Struct Field Access\n\nThe GetField variant extracts fields from struct expressions#LNaN-LNaN)
This enables navigation of nested data structures. For example, accessing user.address.city would be represented as:
The evaluation engine resolves field names against Arrow struct schemas at runtime.
Performance Considerations
The scalar evaluation engine includes several optimizations:
Expression Constant Folding
Constant subexpressions are evaluated once during compilation rather than per-row. For example, col1 + (10 * 20) is simplified to col1 + 200 before evaluation.
Predicate Pushdown
When scalar expressions appear in WHERE clauses, they may be pushed down to the storage layer for early filtering. The PredicateFusionCache in llkv-compute caches compiled predicates to avoid recompilation.
Sources: llkv-table/src/table.rs29
Type Specialization
Arrow kernels are specialized for each numeric type, avoiding generic dispatch overhead in tight loops. This ensures that Int64 + Int64 uses dedicated integer addition instructions rather than polymorphic dispatch.
SIMD Acceleration
The underlying storage layer (simd-r-drive) provides SIMD-optimized operations for bulk data movement and filtering, which complements the vectorized evaluation strategy.
Sources: llkv-storage/pager/MemPager llkv-table/src/table.rs21
sequenceDiagram
participant SQL as SQL Engine
participant Planner as TablePlanner
participant Scanner as ScanRowStream
participant Compute as Compute Layer
participant Store as ColumnStore
SQL->>Planner: Execute SELECT with expressions
Planner->>Planner: Compile ScalarExpr to bytecode
Planner->>Scanner: Create scan with projections
loop For each batch
Scanner->>Store: Gather column arrays
Store-->>Scanner: Arrow arrays
Scanner->>Compute: Evaluate expressions
Compute->>Compute: Vectorized operations
Compute-->>Scanner: Computed arrays
Scanner-->>SQL: RecordBatch
end
Integration with Scan Pipeline
Scalar expressions are evaluated during table scans through the ScanProjection system:
Sources: llkv-table/src/table.rs:447-488 llkv-scan/execute/execute_scan
The scan pipeline:
- Gathers base column arrays from the
ColumnStore - Passes arrays to the compute layer for expression evaluation
- Assembles computed results into
RecordBatchinstances - Streams batches to the caller
This design minimizes memory allocation by processing data in fixed-size batches (typically 1024 rows) rather than materializing entire result sets.
Expression Compilation Flow
The complete compilation flow from SQL to executed results:
Sources: llkv-expr/src/expr.rs:1-819 llkv-plan/src/plans.rs:1-1500 llkv-table/src/table.rs:1-681
This pipeline ensures that expressions are validated, optimized, and compiled before execution begins, minimizing runtime overhead.
Dismiss
Refresh this wiki
Enter email to refresh