This documentation is part of the "Projects with Books" initiative at zenOSmosis.
The source code for this project is available on GitHub.
Scalar Evaluation and NumericKernels
Relevant source files
- llkv-executor/src/translation/expression.rs
- llkv-executor/src/translation/schema.rs
- llkv-expr/src/expr.rs
- llkv-table/src/planner/mod.rs
- llkv-table/src/planner/program.rs
- llkv-table/src/scalar_eval.rs
Purpose and Scope
This page documents the scalar expression evaluation engine used during table scans to compute expressions like col1 + col2 * 3, CAST(col AS Float64), and CASE statements. The NumericKernels utility centralizes numeric computation logic, providing both row-by-row and vectorized batch evaluation strategies. For the abstract expression AST that gets evaluated, see Expression AST. For how expressions are compiled into bytecode programs for predicate evaluation, see Program Compilation.
Overview
The scalar evaluation system provides a unified numeric computation layer that operates over Arrow arrays during table scans. When a query contains computed projections like SELECT col1 + col2 AS sum FROM table, the executor needs to efficiently evaluate these expressions across potentially millions of rows. The NumericKernels struct and associated types provide:
- Type abstraction : Wraps Arrow's Int64Array, Float64Array, and Decimal128Array into a unified NumericArray interface
- Evaluation strategies : Supports both row-by-row evaluation (for complex expressions) and vectorized batch evaluation (for simple arithmetic)
- Optimization : Applies algebraic simplification to detect affine transformations and constant folding opportunities
- Type coercion : Handles implicit casting between integer, float, and decimal types following SQLite-style semantics
Sources : llkv-table/src/scalar_eval.rs:1-22
graph TB
subgraph "Input Layer"
ARROW_INT["Int64Array\n(Arrow)"]
ARROW_FLOAT["Float64Array\n(Arrow)"]
ARROW_DEC["Decimal128Array\n(Arrow)"]
end
subgraph "Abstraction Layer"
NUM_ARRAY["NumericArray\nkind: NumericKind\nlen: usize"]
NUM_VALUE["NumericValue\nInteger(i64)\nFloat(f64)\nDecimal(DecimalValue)"]
end
subgraph "Evaluation Engine"
KERNELS["NumericKernels\nevaluate_value()\nevaluate_batch()\nsimplify()"]
end
subgraph "Output Layer"
RESULT_ARRAY["ArrayRef\n(Arrow)"]
end
ARROW_INT --> NUM_ARRAY
ARROW_FLOAT --> NUM_ARRAY
ARROW_DEC --> NUM_ARRAY
NUM_ARRAY --> NUM_VALUE
NUM_VALUE --> KERNELS
KERNELS --> RESULT_ARRAY
style KERNELS fill:#e1f5ff
Core Data Types
NumericKind
An enum distinguishing the underlying numeric representation: integer, float, or decimal. It preserves type information through evaluation so that casting decisions can depend on the operands' original types.
Sources : llkv-table/src/scalar_eval.rs:26-32
NumericValue
A tagged union representing a single numeric value while preserving its original type. Provides conversion methods to target types:
| Variant | Description | Conversion Methods |
|---|---|---|
| Integer(i64) | Signed 64-bit integer | as_f64(), as_i64() |
| Float(f64) | 64-bit floating point | as_f64() |
| Decimal(DecimalValue) | Fixed-precision decimal | as_f64() |
All variants support .kind() to retrieve the original NumericKind.
Sources : llkv-table/src/scalar_eval.rs:34-69
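The two types above can be pictured with a minimal sketch. The variant names follow the table, but the layout of DecimalValue (an unscaled integer plus a scale) and the `Option` return type of `as_i64` are assumptions made for illustration, not the crate's actual definitions.

```rust
/// Which numeric representation a value or array uses.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NumericKind {
    Integer,
    Float,
    Decimal,
}

/// Hypothetical stand-in for the crate's decimal type:
/// an unscaled integer plus the number of fractional digits.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct DecimalValue {
    pub unscaled: i128,
    pub scale: u32,
}

/// A single numeric value that remembers its original type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum NumericValue {
    Integer(i64),
    Float(f64),
    Decimal(DecimalValue),
}

impl NumericValue {
    /// The discriminant doubles as the type tag.
    pub fn kind(&self) -> NumericKind {
        match self {
            NumericValue::Integer(_) => NumericKind::Integer,
            NumericValue::Float(_) => NumericKind::Float,
            NumericValue::Decimal(_) => NumericKind::Decimal,
        }
    }

    /// Every variant can be viewed as a float (possibly losing precision).
    pub fn as_f64(&self) -> f64 {
        match self {
            NumericValue::Integer(v) => *v as f64,
            NumericValue::Float(v) => *v,
            NumericValue::Decimal(d) => d.unscaled as f64 / 10f64.powi(d.scale as i32),
        }
    }

    /// Assumption: only the integer variant converts, others yield None.
    pub fn as_i64(&self) -> Option<i64> {
        match self {
            NumericValue::Integer(v) => Some(*v),
            _ => None,
        }
    }
}
```

Keeping the tag on the value means a later cast or promotion step can inspect `kind()` instead of guessing from the bit pattern.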
NumericArray
Wraps Arrow array types with a unified interface for numeric access. Internally stores optional Arc<Int64Array>, Arc<Float64Array>, or Arc<Decimal128Array> based on the kind field:
graph LR
subgraph "NumericArray"
KIND["kind: NumericKind"]
LEN["len: usize"]
INT_DATA["int_data: Option<Arc<Int64Array>>"]
FLOAT_DATA["float_data: Option<Arc<Float64Array>>"]
DECIMAL_DATA["decimal_data: Option<Arc<Decimal128Array>>"]
end
KIND -.determines.-> INT_DATA
KIND -.determines.-> FLOAT_DATA
KIND -.determines.-> DECIMAL_DATA
Key Methods :
- try_from_arrow(array: &ArrayRef): Constructs from any Arrow array, applying type casting as needed
- value(idx: usize): Extracts Option<NumericValue> at the given index
- promote_to_float(): Converts to Float64 representation for mixed-type arithmetic
- to_array_ref(): Exports back to Arrow ArrayRef
Sources : llkv-table/src/scalar_eval.rs:83-383
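To make the wrapper's shape concrete, here is a simplified sketch that replaces the Arrow arrays with plain `Vec<Option<_>>` columns so that it compiles without external crates. The field and method names mirror the description above. The `from_ints` constructor, the omission of the decimal case, and the bodies are illustrative assumptions.

```rust
use std::sync::Arc;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NumericKind {
    Integer,
    Float,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum NumericValue {
    Integer(i64),
    Float(f64),
}

// Stand-ins for Arrow's Int64Array / Float64Array (assumption).
type IntColumn = Vec<Option<i64>>;
type FloatColumn = Vec<Option<f64>>;

#[derive(Debug, Clone)]
pub struct NumericArray {
    kind: NumericKind,
    len: usize,
    int_data: Option<Arc<IntColumn>>,
    float_data: Option<Arc<FloatColumn>>,
}

impl NumericArray {
    /// Hypothetical constructor used only by this sketch.
    pub fn from_ints(values: IntColumn) -> Self {
        Self {
            kind: NumericKind::Integer,
            len: values.len(),
            int_data: Some(Arc::new(values)),
            float_data: None,
        }
    }

    pub fn kind(&self) -> NumericKind {
        self.kind
    }

    pub fn len(&self) -> usize {
        self.len
    }

    /// Returns None for NULL slots and for out-of-range indexes.
    pub fn value(&self, idx: usize) -> Option<NumericValue> {
        match self.kind {
            NumericKind::Integer => self
                .int_data
                .as_ref()?
                .get(idx)
                .copied()
                .flatten()
                .map(NumericValue::Integer),
            NumericKind::Float => self
                .float_data
                .as_ref()?
                .get(idx)
                .copied()
                .flatten()
                .map(NumericValue::Float),
        }
    }

    /// Float arrays are shared as-is; integer arrays are converted once.
    pub fn promote_to_float(&self) -> Self {
        match self.kind {
            NumericKind::Float => self.clone(),
            NumericKind::Integer => {
                let floats: FloatColumn = self
                    .int_data
                    .iter()
                    .flat_map(|col| col.iter())
                    .map(|v| v.map(|i| i as f64))
                    .collect();
                Self {
                    kind: NumericKind::Float,
                    len: self.len,
                    int_data: None,
                    float_data: Some(Arc::new(floats)),
                }
            }
        }
    }
}
```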
NumericKernels API
The NumericKernels struct provides static methods for expression evaluation and optimization. It serves as the primary entry point for scalar computation during table scans.
Field Collection
Recursively traverses a scalar expression to identify all referenced column fields. Used by the table planner to determine which columns must be fetched from storage.
Sources : llkv-table/src/scalar_eval.rs:455-526
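The traversal can be sketched as a plain recursive walk. The `ScalarExpr` enum below is a reduced, hypothetical version of the real AST, and `FieldId` is assumed to be a small integer type.

```rust
use std::collections::BTreeSet;

// Assumption: field identifiers are plain integers in this sketch.
type FieldId = u32;

// Reduced, hypothetical expression tree.
enum ScalarExpr {
    Column(FieldId),
    Literal(f64),
    Binary { left: Box<ScalarExpr>, right: Box<ScalarExpr> },
    Cast(Box<ScalarExpr>),
}

/// Adds every column referenced anywhere in `expr` to `acc`.
fn collect_fields(expr: &ScalarExpr, acc: &mut BTreeSet<FieldId>) {
    match expr {
        ScalarExpr::Column(fid) => {
            acc.insert(*fid);
        }
        ScalarExpr::Literal(_) => {}
        ScalarExpr::Binary { left, right } => {
            collect_fields(left, acc);
            collect_fields(right, acc);
        }
        ScalarExpr::Cast(inner) => collect_fields(inner, acc),
    }
}
```

Using a set means a column referenced several times, as in `col1 + CAST(col1 AS Float64)`, is fetched only once.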
Array Preparation
Converts a set of Arrow arrays into the NumericArray representation, applying type coercion as needed. The needed_fields parameter filters to only the columns referenced by the expression being evaluated. Returns a FxHashMap<FieldId, NumericArray> for fast lookup during evaluation.
Sources : llkv-table/src/scalar_eval.rs:528-547
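A rough sketch of the filtering step follows. It uses the standard `HashMap` in place of FxHashMap and nullable float vectors in place of NumericArray. The function name follows the text, while the parameter order and types are assumptions.

```rust
use std::collections::{HashMap, HashSet};

type FieldId = u32;
// Stand-in for NumericArray (assumption).
type Column = Vec<Option<f64>>;

/// Keeps only the columns that the expression actually references,
/// keyed by field id for constant-time lookup during evaluation.
fn prepare_numeric_arrays(
    field_ids: &[FieldId],
    arrays: &[Column],
    needed_fields: &HashSet<FieldId>,
) -> HashMap<FieldId, Column> {
    field_ids
        .iter()
        .zip(arrays)
        .filter(|(fid, _)| needed_fields.contains(*fid))
        .map(|(fid, col)| (*fid, col.clone()))
        .collect()
}
```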
Value-by-Value Evaluation
Evaluates a scalar expression for a single row at index idx. Supports:
- Binary arithmetic (+, -, *, /, %)
- Comparisons (=, <, >, etc.)
- Logical operators (NOT, IS NULL)
- Type casts (CAST(... AS Float64))
- Control flow (CASE, COALESCE)
- Random number generation (RANDOM())
Returns None when the result is SQL NULL, so that NULLs propagate through enclosing expressions.
Sources : llkv-table/src/scalar_eval.rs:549-673
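A per-row evaluator of this kind might look like the sketch below, which covers only column references, literals, and four arithmetic operators over float columns. The expression type is hypothetical, and returning NULL on division by zero is an assumption based on the SQLite-style semantics mentioned earlier.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum BinaryOp {
    Add,
    Subtract,
    Multiply,
    Divide,
}

// Reduced, hypothetical expression tree; columns are addressed by position.
enum ScalarExpr {
    Column(usize),
    Literal(f64),
    Binary { left: Box<ScalarExpr>, op: BinaryOp, right: Box<ScalarExpr> },
}

/// Evaluates `expr` for row `idx`; None stands for SQL NULL.
fn evaluate_value(expr: &ScalarExpr, idx: usize, columns: &[Vec<Option<f64>>]) -> Option<f64> {
    match expr {
        ScalarExpr::Column(c) => columns.get(*c)?.get(idx).copied().flatten(),
        ScalarExpr::Literal(v) => Some(*v),
        ScalarExpr::Binary { left, op, right } => {
            // `?` short-circuits, so a NULL operand makes the result NULL.
            let l = evaluate_value(left, idx, columns)?;
            let r = evaluate_value(right, idx, columns)?;
            match op {
                BinaryOp::Add => Some(l + r),
                BinaryOp::Subtract => Some(l - r),
                BinaryOp::Multiply => Some(l * r),
                // Assumption: division by zero yields NULL.
                BinaryOp::Divide => {
                    if r == 0.0 {
                        None
                    } else {
                        Some(l / r)
                    }
                }
            }
        }
    }
}
```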
Batch Evaluation
Evaluates an expression across all rows in a batch, returning an ArrayRef. The implementation attempts vectorized evaluation for simple expressions (single column, literals, affine transformations) and falls back to row-by-row evaluation for complex cases.
Sources : llkv-table/src/scalar_eval.rs:676-712
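The fast-path-then-fallback structure can be sketched as follows. The fast path here only recognizes bare columns and literals, and all types are simplified stand-ins. Which expressions the real `try_evaluate_vectorized` accepts is not shown in this page.

```rust
// Stand-in for an Arrow array (assumption).
type Column = Vec<Option<f64>>;

// Reduced, hypothetical expression tree.
enum ScalarExpr {
    Column(usize),
    Literal(f64),
    Add(Box<ScalarExpr>, Box<ScalarExpr>),
}

/// Per-row evaluation, used as the general fallback.
fn evaluate_value(expr: &ScalarExpr, idx: usize, columns: &[Column]) -> Option<f64> {
    match expr {
        ScalarExpr::Column(c) => columns.get(*c)?.get(idx).copied().flatten(),
        ScalarExpr::Literal(v) => Some(*v),
        ScalarExpr::Add(l, r) => {
            Some(evaluate_value(l, idx, columns)? + evaluate_value(r, idx, columns)?)
        }
    }
}

/// Fast path: returns Some only for shapes that need no per-row dispatch.
fn try_evaluate_vectorized(expr: &ScalarExpr, len: usize, columns: &[Column]) -> Option<Column> {
    match expr {
        ScalarExpr::Column(c) => columns.get(*c).cloned(),
        ScalarExpr::Literal(v) => Some(vec![Some(*v); len]),
        _ => None,
    }
}

/// Tries the fast path first, then loops over rows.
fn evaluate_batch(expr: &ScalarExpr, len: usize, columns: &[Column]) -> Column {
    try_evaluate_vectorized(expr, len, columns).unwrap_or_else(|| {
        (0..len)
            .map(|idx| evaluate_value(expr, idx, columns))
            .collect()
    })
}
```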
graph TB
EXPR["ScalarExpr<FieldId>"]
SIMPLIFY["simplify()\nDetect affine patterns"]
VECTORIZE["try_evaluate_vectorized()\nCheck for fast path"]
FAST["Vectorized Evaluation\nDirect Arrow compute"]
SLOW["Row-by-Row Loop\nevaluate_value()\nper row"]
RESULT["ArrayRef"]
EXPR --> SIMPLIFY
SIMPLIFY --> VECTORIZE
VECTORIZE -->|Success| FAST
VECTORIZE -->|Fallback| SLOW
FAST --> RESULT
SLOW --> RESULT
Vectorization and Optimization
VectorizedExpr
Internal representation for expressions that can be evaluated without per-row dispatch.
The try_evaluate_vectorized method attempts to decompose complex expressions into VectorizedExpr nodes, enabling efficient vectorized computation for binary operations between arrays and scalars.
Sources : llkv-table/src/scalar_eval.rs:385-414
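The variants of VectorizedExpr are not reproduced in this page. As an illustration only, the sketch below assumes two operand shapes (a whole array or a broadcast scalar) and shows how a binary addition can be computed for each combination in one pass.

```rust
// Hypothetical operand shapes; the real enum may differ.
#[derive(Debug, Clone, PartialEq)]
enum VectorizedExpr {
    Array(Vec<Option<f64>>),
    Scalar(Option<f64>),
}

/// Adds two operands, broadcasting scalars across arrays.
/// Array/array inputs are paired element-wise (extra elements are dropped).
fn add(left: &VectorizedExpr, right: &VectorizedExpr) -> VectorizedExpr {
    use VectorizedExpr::{Array, Scalar};
    // NULL-propagating addition of two optional values.
    let add_opt = |a: Option<f64>, b: Option<f64>| a.zip(b).map(|(x, y)| x + y);
    match (left, right) {
        (Scalar(a), Scalar(b)) => Scalar(add_opt(*a, *b)),
        // Addition is commutative, so both orders share one arm.
        (Array(a), Scalar(b)) | (Scalar(b), Array(a)) => {
            Array(a.iter().map(|x| add_opt(*x, *b)).collect())
        }
        (Array(a), Array(b)) => {
            Array(a.iter().zip(b).map(|(x, y)| add_opt(*x, *y)).collect())
        }
    }
}
```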
graph LR
INPUT["col * 3 + 5"]
DETECT["Detect Affine Pattern"]
AFFINE["AffineExpr\nfield: col\nscale: 3.0\noffset: 5.0"]
FAST_EVAL["Single Pass Evaluation\nemit_no_nulls()"]
INPUT --> DETECT
DETECT --> AFFINE
AFFINE --> FAST_EVAL
Affine Expression Detection
The simplify method detects affine transformations of the form scale * field + offset.
When an affine pattern is detected, the executor can apply the transformation in a single pass without intermediate allocations. The try_extract_affine_expr method recursively analyzes binary arithmetic trees to identify this pattern.
Sources : llkv-table/src/scalar_eval.rs:1138-1261
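One way to picture the recursive analysis is to classify every subtree as either a constant or an affine function of a single field, and to combine the classifications bottom-up. The sketch below does this for addition and multiplication only. The `Expr` and `Linear` types are hypothetical, and the AffineExpr fields follow the diagram above.

```rust
type FieldId = u32;

// Reduced, hypothetical expression tree.
enum Expr {
    Column(FieldId),
    Literal(f64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

/// Represents `scale * field + offset`.
#[derive(Debug, Clone, Copy, PartialEq)]
struct AffineExpr {
    field: FieldId,
    scale: f64,
    offset: f64,
}

// Helper classification used only by this sketch.
#[derive(Clone, Copy)]
enum Linear {
    Constant(f64),
    Affine(AffineExpr),
}

fn analyze(expr: &Expr) -> Option<Linear> {
    use Linear::{Affine, Constant};
    Some(match expr {
        Expr::Column(f) => Affine(AffineExpr { field: *f, scale: 1.0, offset: 0.0 }),
        Expr::Literal(v) => Constant(*v),
        Expr::Add(l, r) => match (analyze(l)?, analyze(r)?) {
            (Constant(a), Constant(b)) => Constant(a + b),
            (Affine(e), Constant(c)) | (Constant(c), Affine(e)) => {
                Affine(AffineExpr { offset: e.offset + c, ..e })
            }
            // Two column-dependent operands are out of scope for this sketch.
            (Affine(_), Affine(_)) => return None,
        },
        Expr::Mul(l, r) => match (analyze(l)?, analyze(r)?) {
            (Constant(a), Constant(b)) => Constant(a * b),
            (Affine(e), Constant(c)) | (Constant(c), Affine(e)) => {
                Affine(AffineExpr { scale: e.scale * c, offset: e.offset * c, ..e })
            }
            // A product of two column terms is not affine.
            (Affine(_), Affine(_)) => return None,
        },
    })
}

/// Returns the affine form if the whole tree depends linearly on one field.
fn try_extract_affine_expr(expr: &Expr) -> Option<AffineExpr> {
    match analyze(expr)? {
        Linear::Affine(e) => Some(e),
        Linear::Constant(_) => None,
    }
}
```

For `col * 3 + 5` this yields scale 3.0 and offset 5.0, matching the diagram.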
Constant Folding
The simplification pass performs constant folding for expressions like 2 + 3 or 10.0 / 2.0, replacing them with Literal(5) or Literal(5.0). This eliminates redundant computation during execution.
Sources : llkv-table/src/scalar_eval.rs:997-1137
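Constant folding is typically a bottom-up rewrite: simplify the children first, then replace an operator whose operands are both literals. The sketch below uses a hypothetical float-only `Expr` type, so it does not reproduce the integer/float literal distinction described above, and it leaves division by a literal zero unfolded.

```rust
// Reduced, hypothetical expression tree (float literals only).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(u32),
    Literal(f64),
    Add(Box<Expr>, Box<Expr>),
    Div(Box<Expr>, Box<Expr>),
}

/// Folds literal-only subtrees into a single literal.
fn simplify(expr: Expr) -> Expr {
    match expr {
        Expr::Add(l, r) => match (simplify(*l), simplify(*r)) {
            (Expr::Literal(a), Expr::Literal(b)) => Expr::Literal(a + b),
            (l, r) => Expr::Add(Box::new(l), Box::new(r)),
        },
        Expr::Div(l, r) => match (simplify(*l), simplify(*r)) {
            // Division by zero is left for the evaluator to handle.
            (Expr::Literal(a), Expr::Literal(b)) if b != 0.0 => Expr::Literal(a / b),
            (l, r) => Expr::Div(Box::new(l), Box::new(r)),
        },
        other => other,
    }
}
```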
Type Coercion and Casting
Implicit Coercion
When evaluating binary operations on mixed types, the system applies implicit promotion rules:
| Left Type | Right Type | Result Type | Behavior |
|---|---|---|---|
| Integer | Integer | Integer | No conversion |
| Integer | Float | Float | Promote left to Float64 |
| Float | Integer | Float | Promote right to Float64 |
| Decimal | Any | Float | Convert both to Float64 |
The infer_result_kind method determines the target type before evaluation, and to_aligned_array_ref applies the necessary promotions.
Sources : llkv-table/src/scalar_eval.rs:1398-1447
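The promotion table reduces to a single rule: the result stays integer only when both operands are integers. The sketch below encodes that rule. The function name is borrowed from the text, while its signature is an assumption, as is the treatment of combinations the table does not list (Float with Float, and Decimal on the right-hand side).

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum NumericKind {
    Integer,
    Float,
    Decimal,
}

/// Result kind of a binary arithmetic operation on two operand kinds.
fn infer_result_kind(left: NumericKind, right: NumericKind) -> NumericKind {
    match (left, right) {
        (NumericKind::Integer, NumericKind::Integer) => NumericKind::Integer,
        // Any float or decimal operand promotes the result to Float
        // (unlisted combinations are assumed to follow the same rule).
        _ => NumericKind::Float,
    }
}
```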
Explicit Casting
The CAST expression variant supports explicit type conversion.
Casting is handled during evaluation by:
- Evaluating the inner expression to NumericValue
- Converting to the target NumericKind via cast_numeric_value_to_kind
- Constructing the result array with the target Arrow DataType
Special handling exists for DataType::Date32 casts, which use the llkv-plan date utilities.
Sources : llkv-table/src/scalar_eval.rs:1449-1472 llkv-table/src/scalar_eval.rs:611-624
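The conversion step in the middle of that list can be sketched for the integer and float kinds. The function name comes from the text. The signature, the truncation toward zero, and the choice to map non-finite floats to NULL are assumptions for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum NumericKind {
    Integer,
    Float,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum NumericValue {
    Integer(i64),
    Float(f64),
}

/// Converts one value to the requested kind; None stands for NULL.
fn cast_numeric_value_to_kind(value: NumericValue, target: NumericKind) -> Option<NumericValue> {
    match (value, target) {
        (NumericValue::Integer(v), NumericKind::Integer) => Some(NumericValue::Integer(v)),
        (NumericValue::Integer(v), NumericKind::Float) => Some(NumericValue::Float(v as f64)),
        (NumericValue::Float(v), NumericKind::Float) => Some(NumericValue::Float(v)),
        (NumericValue::Float(v), NumericKind::Integer) => {
            // Assumption: NaN and infinities have no integer form.
            if v.is_finite() {
                // `as` truncates toward zero and saturates at the i64 bounds.
                Some(NumericValue::Integer(v as i64))
            } else {
                None
            }
        }
    }
}

/// Applies the cast to a whole column, preserving existing NULLs.
fn cast_column(values: &[Option<NumericValue>], target: NumericKind) -> Vec<Option<NumericValue>> {
    values
        .iter()
        .map(|v| v.and_then(|x| cast_numeric_value_to_kind(x, target)))
        .collect()
}
```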
Integration with Table Scans
The numeric evaluation engine is invoked by the table executor when processing computed projections. The integration flow:
sequenceDiagram
participant Planner as TablePlanner
participant Executor as TableExecutor
participant Kernels as NumericKernels
participant Store as ColumnStore
Planner->>Planner: Analyze projections
Planner->>Kernels: collect_fields(expr)
Kernels-->>Planner: Set<FieldId>
Planner->>Planner: Build unique_lfids list
Executor->>Store: Gather columns for row batch
Store-->>Executor: Vec<ArrayRef>
Executor->>Kernels: prepare_numeric_arrays(lfids, arrays, fields)
Kernels-->>Executor: NumericArrayMap
Executor->>Kernels: evaluate_batch_simplified(expr, len, arrays)
Kernels->>Kernels: try_evaluate_vectorized()
alt Vectorized
Kernels->>Kernels: compute_binary_array_array()
else Fallback
Kernels->>Kernels: Loop: evaluate_value(expr, idx)
end
Kernels-->>Executor: ArrayRef (result column)
Executor->>Executor: Append to RecordBatch
Projection Evaluation Context
The ProjectionEval enum distinguishes between direct column references and computed expressions.
For Computed variants, the planner:
- Calls NumericKernels::simplify() to optimize the expression
- Invokes NumericKernels::collect_fields() to determine dependencies
- Stores the simplified expression for evaluation
During execution, RowStreamBuilder materializes computed columns by calling evaluate_batch_simplified for each expression.
Sources : llkv-table/src/planner/mod.rs:494-498 llkv-table/src/planner/mod.rs:1073-1107
Passthrough Optimization
The planner detects when a computed expression is simply a column reference (after simplification) via NumericKernels::passthrough_column(). In this case, the column is fetched directly from storage without re-evaluation.
This avoids redundant computation for queries like SELECT col + 0 AS x.
Sources : llkv-table/src/planner/mod.rs:1110-1116 llkv-table/src/scalar_eval.rs:874-907
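The interplay between simplification, the passthrough check, and the two projection kinds can be sketched as below. The `Expr` type, the variant names of `ProjectionEval`, and the `plan_projection` helper are hypothetical. Only `simplify` and `passthrough_column` are names taken from the text, and their bodies here are illustrative.

```rust
type FieldId = u32;

// Reduced, hypothetical expression tree.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(FieldId),
    Literal(f64),
    Add(Box<Expr>, Box<Expr>),
}

// Hypothetical shape: a direct column read or a computed expression.
#[derive(Debug, Clone, PartialEq)]
enum ProjectionEval {
    Column(FieldId),
    Computed(Expr),
}

/// Removes additions of a literal zero, e.g. `col + 0` becomes `col`.
fn simplify(expr: Expr) -> Expr {
    match expr {
        Expr::Add(l, r) => match (simplify(*l), simplify(*r)) {
            (e, Expr::Literal(z)) if z == 0.0 => e,
            (Expr::Literal(z), e) if z == 0.0 => e,
            (l, r) => Expr::Add(Box::new(l), Box::new(r)),
        },
        other => other,
    }
}

/// Some(field) when the expression is nothing more than a column reference.
fn passthrough_column(expr: &Expr) -> Option<FieldId> {
    match expr {
        Expr::Column(fid) => Some(*fid),
        _ => None,
    }
}

/// Simplifies first, then picks the cheapest projection kind.
fn plan_projection(expr: Expr) -> ProjectionEval {
    let simplified = simplify(expr);
    match passthrough_column(&simplified) {
        Some(fid) => ProjectionEval::Column(fid),
        None => ProjectionEval::Computed(simplified),
    }
}
```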
Data Type Inference
The evaluation engine must determine result types for expressions before evaluation to construct properly-typed Arrow arrays. The infer_computed_data_type function in llkv-executor delegates to numeric kernel logic:
| Expression Type | Inferred Data Type | Rule |
|---|---|---|
| Literal(Integer) | Int64 | Direct mapping |
| Literal(Float) | Float64 | Direct mapping |
| Binary { ... } | Int64 or Float64 | Based on operand types |
| Compare { ... } | Int64 | Boolean as 0/1 integer |
| Cast { data_type, ... } | data_type | Explicit type |
| Random | Float64 | Always float |
The expression_uses_float helper recursively checks if any operand is floating-point, promoting the result type accordingly.
Sources : llkv-executor/src/translation/schema.rs:53-123
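The rules in the table can be expressed as a small recursive function. In the sketch below, the expression type is a reduced, hypothetical one in which columns carry their data type directly, and treating a comparison as non-float (because its result is a 0/1 integer) is an assumption about how the helper behaves.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DataType {
    Int64,
    Float64,
}

// Reduced, hypothetical expression tree.
enum ScalarExpr {
    Column(DataType),
    IntegerLiteral(i64),
    FloatLiteral(f64),
    Binary { left: Box<ScalarExpr>, right: Box<ScalarExpr> },
    Compare { left: Box<ScalarExpr>, right: Box<ScalarExpr> },
    Cast { expr: Box<ScalarExpr>, data_type: DataType },
    Random,
}

/// True when the expression's value is floating-point.
fn expression_uses_float(expr: &ScalarExpr) -> bool {
    match expr {
        ScalarExpr::Column(dt) => *dt == DataType::Float64,
        ScalarExpr::IntegerLiteral(_) => false,
        ScalarExpr::FloatLiteral(_) | ScalarExpr::Random => true,
        ScalarExpr::Binary { left, right } => {
            expression_uses_float(left) || expression_uses_float(right)
        }
        // Comparisons produce 0/1 integers regardless of operand types.
        ScalarExpr::Compare { .. } => false,
        ScalarExpr::Cast { data_type, .. } => *data_type == DataType::Float64,
    }
}

/// Output type of a computed projection, following the table above.
fn infer_computed_data_type(expr: &ScalarExpr) -> DataType {
    match expr {
        ScalarExpr::Compare { .. } => DataType::Int64,
        ScalarExpr::Cast { data_type, .. } => *data_type,
        other if expression_uses_float(other) => DataType::Float64,
        _ => DataType::Int64,
    }
}
```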
Performance Characteristics
Row-by-Row Evaluation
Used for:
- Expressions with control flow (CASE, COALESCE)
- Expressions containing CAST to non-numeric types
- Expressions with interval arithmetic (date operations)
Cost : O(n) row dispatch overhead, branch mispredictions on conditionals
Vectorized Evaluation
Used for:
- Simple arithmetic (col1 + col2, col * 3)
- Single column references
- Constant literals
Cost : O(n) with SIMD-friendly memory access patterns, no per-row dispatch
graph LR
INPUT["Int64Array\n[1,2,3,4,5]"]
AFFINE["scale=2.0\noffset=10.0"]
CALLBACK["emit_no_nulls(\nlen, |i| 2.0*values[i]+10.0\n)"]
OUTPUT["Float64Array\n[12,14,16,18,20]"]
INPUT --> AFFINE
AFFINE --> CALLBACK
CALLBACK --> OUTPUT
Affine Evaluation
Special case for scale * field + offset expressions. The executor generates values directly into the output buffer using emit_no_nulls or emit_with_nulls callbacks, avoiding intermediate allocations.
Sources : llkv-table/src/planner/mod.rs:253-357
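The callback style described above can be illustrated with a small sketch. The names `emit_no_nulls` and `emit_with_nulls` come from the text, but their signatures here are assumptions, and plain vectors stand in for Arrow buffers.

```rust
/// Fills an output buffer of `len` values by calling `f` once per index.
fn emit_no_nulls(len: usize, f: impl Fn(usize) -> f64) -> Vec<f64> {
    let mut out = Vec::with_capacity(len);
    out.extend((0..len).map(f));
    out
}

/// Same idea for nullable input: `f` returns None for NULL rows.
fn emit_with_nulls(len: usize, f: impl Fn(usize) -> Option<f64>) -> Vec<Option<f64>> {
    let mut out = Vec::with_capacity(len);
    out.extend((0..len).map(f));
    out
}

/// Applies `scale * value + offset` in a single pass over the input.
fn apply_affine(values: &[i64], scale: f64, offset: f64) -> Vec<f64> {
    emit_no_nulls(values.len(), |i| scale * values[i] as f64 + offset)
}

/// Nullable variant: NULL inputs stay NULL in the output.
fn apply_affine_nullable(values: &[Option<i64>], scale: f64, offset: f64) -> Vec<Option<f64>> {
    emit_with_nulls(values.len(), |i| values[i].map(|v| scale * v as f64 + offset))
}
```

With scale 2.0 and offset 10.0, the input `[1, 2, 3, 4, 5]` from the diagram maps to `[12.0, 14.0, 16.0, 18.0, 20.0]`.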
Key Implementation Details
NULL Handling
NULL values propagate through arithmetic operations according to SQL semantics:
- NULL + 5 → NULL
- NULL IS NULL → 1 (true)
- COALESCE(NULL, 5) → 5
Each NumericValue is wrapped in an Option, with None representing SQL NULL. Binary operations return None if either operand is None.
Sources : llkv-table/src/scalar_eval.rs:564-571
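The three rules map directly onto `Option` combinators. The helper functions below are hypothetical and operate on plain `i64` values rather than NumericValue. Using `checked_add`, which turns integer overflow into NULL, is a choice made for this sketch only.

```rust
/// NULL + x is NULL: `?` returns None as soon as an operand is missing.
fn add(a: Option<i64>, b: Option<i64>) -> Option<i64> {
    a?.checked_add(b?)
}

/// IS NULL never returns NULL itself; it yields 1 or 0.
fn is_null(a: Option<i64>) -> i64 {
    i64::from(a.is_none())
}

/// COALESCE returns the first non-NULL argument, or NULL if there is none.
fn coalesce(args: &[Option<i64>]) -> Option<i64> {
    args.iter().find_map(|v| *v)
}
```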
Type Safety
The system maintains type safety through:
- Tagged unions : NumericValue preserves original type via the discriminant
- Explicit promotion : promote_to_float() is called only when type mixing requires it
- Result type inference : The planner determines output types before evaluation
This prevents silent precision loss and enables query optimizations based on type information.
Sources : llkv-table/src/scalar_eval.rs:295-342
Memory Efficiency
The NumericArray struct uses Arc<T> for backing arrays, enabling zero-copy sharing when:
- Returning a column directly without computation
- Slicing arrays for sorted run evaluation
- Sharing arrays across multiple expressions referencing the same column
The to_array_ref() method clones the Arc, not the underlying data.
Sources : llkv-table/src/scalar_eval.rs:275-293
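The zero-copy claim rests on standard `Arc` behavior: cloning the handle bumps a reference count and leaves the buffer in place. A minimal sketch, with a `Vec<i64>` standing in for the Arrow array and a single-field struct standing in for NumericArray:

```rust
use std::sync::Arc;

// Simplified stand-in: one shared integer buffer.
struct NumericArray {
    data: Arc<Vec<i64>>,
}

impl NumericArray {
    /// Hands out another handle to the same buffer; no elements are copied.
    fn to_array_ref(&self) -> Arc<Vec<i64>> {
        Arc::clone(&self.data)
    }
}
```

`Arc::ptr_eq` on two handles returned this way reports that they point at the same allocation.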