This documentation is part of the "Projects with Books" initiative at zenOSmosis.

The source code for this project is available on GitHub.

Scalar Evaluation and NumericKernels

Purpose and Scope

This page documents the scalar expression evaluation engine used during table scans to compute expressions like col1 + col2 * 3, CAST(col AS Float64), and CASE statements. The NumericKernels utility centralizes numeric computation logic, providing both row-by-row and vectorized batch evaluation strategies. For the abstract expression AST that gets evaluated, see Expression AST. For how expressions are compiled into bytecode programs for predicate evaluation, see Program Compilation.


Overview

The scalar evaluation system provides a unified numeric computation layer that operates over Arrow arrays during table scans. When a query contains computed projections like SELECT col1 + col2 AS sum FROM table, the executor needs to efficiently evaluate these expressions across potentially millions of rows. The NumericKernels struct and associated types provide:

  1. Type abstraction : Wraps Arrow's Int64Array, Float64Array, and Decimal128Array into a unified NumericArray interface
  2. Evaluation strategies : Supports both row-by-row evaluation (for complex expressions) and vectorized batch evaluation (for simple arithmetic)
  3. Optimization : Applies algebraic simplification to detect affine transformations and constant folding opportunities
  4. Type coercion : Handles implicit casting between integer, float, and decimal types following SQLite-style semantics

Sources : llkv-table/src/scalar_eval.rs:1-22

graph TB
    subgraph "Input Layer"
        ARROW_INT["Int64Array\n(Arrow)"]
        ARROW_FLOAT["Float64Array\n(Arrow)"]
        ARROW_DEC["Decimal128Array\n(Arrow)"]
    end

    subgraph "Abstraction Layer"
        NUM_ARRAY["NumericArray\nkind: NumericKind\nlen: usize"]
        NUM_VALUE["NumericValue\nInteger(i64)\nFloat(f64)\nDecimal(DecimalValue)"]
    end

    subgraph "Evaluation Engine"
        KERNELS["NumericKernels\nevaluate_value()\nevaluate_batch()\nsimplify()"]
    end

    subgraph "Output Layer"
        RESULT_ARRAY["ArrayRef\n(Arrow)"]
    end

    ARROW_INT --> NUM_ARRAY
    ARROW_FLOAT --> NUM_ARRAY
    ARROW_DEC --> NUM_ARRAY
    NUM_ARRAY --> NUM_VALUE
    NUM_VALUE --> KERNELS
    KERNELS --> RESULT_ARRAY

    style KERNELS fill:#e1f5ff

Core Data Types

NumericKind

NumericKind is an enum that identifies the underlying numeric representation. Carrying it through evaluation preserves type information, so the engine can decide when a cast or promotion is actually needed.

Sources : llkv-table/src/scalar_eval.rs:26-32

NumericValue

A tagged union representing a single numeric value while preserving its original type. Provides conversion methods to target types:

| Variant | Description | Conversion Methods |
|---|---|---|
| Integer(i64) | Signed 64-bit integer | as_f64(), as_i64() |
| Float(f64) | 64-bit floating point | as_f64() |
| Decimal(DecimalValue) | Fixed-precision decimal | as_f64() |

All variants support .kind() to retrieve the original NumericKind.
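The table above can be pictured with a minimal sketch. The variant names of NumericKind, the layout of DecimalValue (an unscaled integer plus a scale), and the Option return type of as_i64() are assumptions made for illustration; the actual definitions are in llkv-table/src/scalar_eval.rs.

```rust
// Illustrative sketch only; not the crate's actual definitions.

/// Which numeric representation a value carries (variant names assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NumericKind {
    Integer,
    Float,
    Decimal,
}

/// Hypothetical stand-in for the crate's decimal type:
/// the represented value is `raw / 10^scale`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct DecimalValue {
    pub raw: i128,
    pub scale: u32,
}

/// A single numeric value tagged with its original type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum NumericValue {
    Integer(i64),
    Float(f64),
    Decimal(DecimalValue),
}

impl NumericValue {
    /// Recovers the original kind from the enum discriminant.
    pub fn kind(&self) -> NumericKind {
        match self {
            NumericValue::Integer(_) => NumericKind::Integer,
            NumericValue::Float(_) => NumericKind::Float,
            NumericValue::Decimal(_) => NumericKind::Decimal,
        }
    }

    /// Every variant can be viewed as an f64 (possibly losing precision).
    pub fn as_f64(&self) -> f64 {
        match self {
            NumericValue::Integer(v) => *v as f64,
            NumericValue::Float(v) => *v,
            NumericValue::Decimal(d) => d.raw as f64 / 10f64.powi(d.scale as i32),
        }
    }

    /// Assumed here to succeed only for the Integer variant.
    pub fn as_i64(&self) -> Option<i64> {
        match self {
            NumericValue::Integer(v) => Some(*v),
            _ => None,
        }
    }
}
```

Keeping the kind in the discriminant means a caller can ask for a float view of any value while still knowing whether the source was an integer.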

Sources : llkv-table/src/scalar_eval.rs:34-69

NumericArray

Wraps Arrow array types with a unified interface for numeric access. Internally stores optional Arc<Int64Array>, Arc<Float64Array>, or Arc<Decimal128Array> based on the kind field:

graph LR
    subgraph "NumericArray"
        KIND["kind: NumericKind"]
        LEN["len: usize"]
        INT_DATA["int_data: Option&lt;Arc&lt;Int64Array&gt;&gt;"]
        FLOAT_DATA["float_data: Option&lt;Arc&lt;Float64Array&gt;&gt;"]
        DECIMAL_DATA["decimal_data: Option&lt;Arc&lt;Decimal128Array&gt;&gt;"]
    end

    KIND -.determines.-> INT_DATA
    KIND -.determines.-> FLOAT_DATA
    KIND -.determines.-> DECIMAL_DATA

Key Methods :

  • try_from_arrow(array: &ArrayRef): Constructs from any Arrow array, applying type casting as needed
  • value(idx: usize): Extracts Option<NumericValue> at the given index
  • promote_to_float(): Converts to Float64 representation for mixed-type arithmetic
  • to_array_ref(): Exports back to Arrow ArrayRef
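As a rough sketch of how value() and promote_to_float() might behave, the following uses Vec<Option<_>> in place of Arrow's Int64Array and Float64Array, and omits the decimal case. The from_ints constructor and the method bodies are assumptions for illustration, not the crate's code.

```rust
use std::sync::Arc;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum NumericKind {
    Integer,
    Float,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum NumericValue {
    Integer(i64),
    Float(f64),
}

/// Vec<Option<_>> stands in for the Arrow arrays (an assumption of this sketch).
#[derive(Debug, Clone)]
pub struct NumericArray {
    kind: NumericKind,
    len: usize,
    int_data: Option<Arc<Vec<Option<i64>>>>,
    float_data: Option<Arc<Vec<Option<f64>>>>,
}

impl NumericArray {
    /// Hypothetical constructor; the real entry point is try_from_arrow.
    pub fn from_ints(values: Vec<Option<i64>>) -> Self {
        Self {
            kind: NumericKind::Integer,
            len: values.len(),
            int_data: Some(Arc::new(values)),
            float_data: None,
        }
    }

    pub fn kind(&self) -> NumericKind {
        self.kind
    }

    pub fn len(&self) -> usize {
        self.len
    }

    /// Returns None for a NULL slot or an out-of-range index.
    pub fn value(&self, idx: usize) -> Option<NumericValue> {
        match self.kind {
            NumericKind::Integer => self
                .int_data
                .as_ref()?
                .get(idx)
                .copied()
                .flatten()
                .map(NumericValue::Integer),
            NumericKind::Float => self
                .float_data
                .as_ref()?
                .get(idx)
                .copied()
                .flatten()
                .map(NumericValue::Float),
        }
    }

    /// Integer data is converted element by element; float data is reused as-is.
    pub fn promote_to_float(&self) -> Self {
        match (&self.kind, &self.int_data) {
            (NumericKind::Integer, Some(ints)) => {
                let floats: Vec<Option<f64>> =
                    ints.iter().map(|v| v.map(|x| x as f64)).collect();
                Self {
                    kind: NumericKind::Float,
                    len: self.len,
                    int_data: None,
                    float_data: Some(Arc::new(floats)),
                }
            }
            _ => self.clone(),
        }
    }
}
```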

Sources : llkv-table/src/scalar_eval.rs:83-383


NumericKernels API

The NumericKernels struct provides static methods for expression evaluation and optimization. It serves as the primary entry point for scalar computation during table scans.

Field Collection

collect_fields recursively traverses a scalar expression to identify every column field it references. The table planner uses the result to determine which columns must be fetched from storage.
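The traversal can be sketched as a plain recursive walk. The ScalarExpr enum below is a reduced, hypothetical version of the real AST, FieldId is assumed to be an integer alias, and referenced_fields is a helper introduced only for this example.

```rust
use std::collections::BTreeSet;

// Assumption: FieldId is modelled as a plain integer for this sketch.
type FieldId = u32;

/// Reduced, hypothetical expression tree (the real AST has more variants).
#[derive(Debug, Clone)]
enum ScalarExpr {
    Column(FieldId),
    Literal(f64),
    Binary { left: Box<ScalarExpr>, right: Box<ScalarExpr> },
    Cast(Box<ScalarExpr>),
}

/// Visits every node and records each referenced column once.
fn collect_fields(expr: &ScalarExpr, acc: &mut BTreeSet<FieldId>) {
    match expr {
        ScalarExpr::Column(fid) => {
            acc.insert(*fid);
        }
        ScalarExpr::Literal(_) => {}
        ScalarExpr::Binary { left, right } => {
            collect_fields(left, acc);
            collect_fields(right, acc);
        }
        ScalarExpr::Cast(inner) => collect_fields(inner, acc),
    }
}

/// Hypothetical convenience wrapper returning the fields in sorted order.
fn referenced_fields(expr: &ScalarExpr) -> Vec<FieldId> {
    let mut acc = BTreeSet::new();
    collect_fields(expr, &mut acc);
    acc.into_iter().collect()
}
```

Using a set as the accumulator means a column referenced several times, as in col2 + CAST(col1) + col2, is still fetched only once.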

Sources : llkv-table/src/scalar_eval.rs:455-526

Array Preparation

prepare_numeric_arrays converts a set of Arrow arrays into the NumericArray representation, applying type coercion as needed. The needed_fields parameter restricts the conversion to the columns referenced by the expression being evaluated. The method returns a FxHashMap<FieldId, NumericArray> for fast lookup during evaluation.

Sources : llkv-table/src/scalar_eval.rs:528-547

Value-by-Value Evaluation

evaluate_value evaluates a scalar expression for a single row at index idx. It supports:

  • Binary arithmetic (+, -, *, /, %)
  • Comparisons (=, <, >, etc.)
  • Logical operators (NOT, IS NULL)
  • Type casts (CAST(... AS Float64))
  • Control flow (CASE, COALESCE)
  • Random number generation (RANDOM())

Returns None for NULL propagation.
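A minimal sketch of per-row evaluation with NULL propagation follows. It covers only columns, literals, and binary arithmetic over f64; the expression enum, the Columns alias, and the division-by-zero behavior are assumptions for illustration.

```rust
use std::collections::HashMap;

// Assumption: FieldId is a plain integer and columns are Vec<Option<f64>>.
type FieldId = u32;
type Columns = HashMap<FieldId, Vec<Option<f64>>>;

#[derive(Debug, Clone, Copy)]
enum BinaryOp {
    Add,
    Subtract,
    Multiply,
    Divide,
}

/// Reduced, hypothetical expression tree.
#[derive(Debug, Clone)]
enum ScalarExpr {
    Column(FieldId),
    Literal(f64),
    Binary { left: Box<ScalarExpr>, op: BinaryOp, right: Box<ScalarExpr> },
}

/// Evaluates `expr` for row `idx`; None stands for SQL NULL.
fn evaluate_value(expr: &ScalarExpr, idx: usize, columns: &Columns) -> Option<f64> {
    match expr {
        // A missing column, an out-of-range row, or a NULL slot all yield None.
        ScalarExpr::Column(fid) => columns.get(fid)?.get(idx).copied().flatten(),
        ScalarExpr::Literal(v) => Some(*v),
        ScalarExpr::Binary { left, op, right } => {
            // `?` returns None as soon as either operand is NULL.
            let l = evaluate_value(left, idx, columns)?;
            let r = evaluate_value(right, idx, columns)?;
            match op {
                BinaryOp::Add => Some(l + r),
                BinaryOp::Subtract => Some(l - r),
                BinaryOp::Multiply => Some(l * r),
                // Assumption of this sketch: division by zero yields NULL.
                BinaryOp::Divide => {
                    if r == 0.0 {
                        None
                    } else {
                        Some(l / r)
                    }
                }
            }
        }
    }
}
```

With col1 = [1, NULL] and col2 = [2, 5], evaluating col1 + col2 * 3 gives 7 for row 0 and NULL for row 1.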

Sources : llkv-table/src/scalar_eval.rs:549-673

Batch Evaluation

evaluate_batch evaluates an expression across all rows in a batch and returns an ArrayRef. The implementation first attempts vectorized evaluation for simple expressions (single column, literals, affine transformations) and falls back to row-by-row evaluation for complex cases.
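The fast-path-with-fallback shape can be sketched as below. The Expr enum is a toy single-column tree, IfPositive is a hypothetical stand-in for any construct without a fast path (such as CASE), and Vec<Option<f64>> replaces Arrow arrays.

```rust
/// Toy single-column expression tree (hypothetical).
#[derive(Debug, Clone)]
enum Expr {
    Column,
    Literal(f64),
    Add(Box<Expr>, Box<Expr>),
    /// Stand-in for control flow: if cond > 0 then a else b.
    IfPositive(Box<Expr>, Box<Expr>, Box<Expr>),
}

/// Per-row evaluation; handles every variant.
fn eval_row(expr: &Expr, x: Option<f64>) -> Option<f64> {
    match expr {
        Expr::Column => x,
        Expr::Literal(v) => Some(*v),
        Expr::Add(l, r) => Some(eval_row(l, x)? + eval_row(r, x)?),
        // A NULL condition falls through to the else branch, as in SQL CASE.
        Expr::IfPositive(c, t, e) => match eval_row(c, x) {
            Some(v) if v > 0.0 => eval_row(t, x),
            _ => eval_row(e, x),
        },
    }
}

/// Whole-column evaluation; returns None when any node lacks a fast path.
fn try_evaluate_vectorized(expr: &Expr, col: &[Option<f64>]) -> Option<Vec<Option<f64>>> {
    match expr {
        Expr::Column => Some(col.to_vec()),
        Expr::Literal(v) => Some(vec![Some(*v); col.len()]),
        Expr::Add(l, r) => {
            let lhs = try_evaluate_vectorized(l, col)?;
            let rhs = try_evaluate_vectorized(r, col)?;
            Some(
                lhs.iter()
                    .zip(&rhs)
                    .map(|(a, b)| match (a, b) {
                        (Some(x), Some(y)) => Some(x + y),
                        _ => None,
                    })
                    .collect(),
            )
        }
        Expr::IfPositive(..) => None,
    }
}

/// Tries the vectorized path first, then falls back to the per-row loop.
fn evaluate_batch(expr: &Expr, col: &[Option<f64>]) -> Vec<Option<f64>> {
    try_evaluate_vectorized(expr, col)
        .unwrap_or_else(|| col.iter().map(|x| eval_row(expr, *x)).collect())
}
```

Because the vectorized attempt returns None for the whole tree when any node is unsupported, the fallback never has to merge partial results.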

Sources : llkv-table/src/scalar_eval.rs:676-712

graph TB
    EXPR["ScalarExpr&lt;FieldId&gt;"]
    SIMPLIFY["simplify()\nDetect affine patterns"]
    VECTORIZE["try_evaluate_vectorized()\nCheck for fast path"]
    FAST["Vectorized Evaluation\nDirect Arrow compute"]
    SLOW["Row-by-Row Loop\nevaluate_value()\nper row"]
    RESULT["ArrayRef"]

    EXPR --> SIMPLIFY
    SIMPLIFY --> VECTORIZE
    VECTORIZE -->|Success| FAST
    VECTORIZE -->|Fallback| SLOW
    FAST --> RESULT
    SLOW --> RESULT

Vectorization and Optimization

VectorizedExpr

VectorizedExpr is the internal representation for expressions that can be evaluated without per-row dispatch.

The try_evaluate_vectorized method attempts to decompose complex expressions into VectorizedExpr nodes, enabling efficient vectorized computation for binary operations between arrays and scalars.

Sources : llkv-table/src/scalar_eval.rs:385-414

Affine Expression Detection

The simplify method detects affine transformations of the form scale * field + offset:

graph LR
    INPUT["col * 3 + 5"]
    DETECT["Detect Affine Pattern"]
    AFFINE["AffineExpr\nfield: col\nscale: 3.0\noffset: 5.0"]
    FAST_EVAL["Single Pass Evaluation\nemit_no_nulls()"]

    INPUT --> DETECT
    DETECT --> AFFINE
    AFFINE --> FAST_EVAL

When an affine pattern is detected, the executor can apply the transformation in a single pass without intermediate allocations. The try_extract_affine_expr method recursively analyzes binary arithmetic trees to identify this pattern.
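One way to picture the recursive analysis is to carry a partial (field, scale, offset) triple up the tree. The sketch below handles only addition and multiplication; the Partial helper, the reduced ScalarExpr, and the function bodies are assumptions, and only the AffineExpr field names are taken from the diagram.

```rust
// Assumption: FieldId is a plain integer for this sketch.
type FieldId = u32;

/// Reduced, hypothetical expression tree.
#[derive(Debug, Clone)]
enum ScalarExpr {
    Column(FieldId),
    Literal(f64),
    Add(Box<ScalarExpr>, Box<ScalarExpr>),
    Mul(Box<ScalarExpr>, Box<ScalarExpr>),
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct AffineExpr {
    field: FieldId,
    scale: f64,
    offset: f64,
}

/// Hypothetical intermediate state: `scale * field + offset`,
/// where `field == None` means a pure constant equal to `offset`.
#[derive(Clone, Copy)]
struct Partial {
    field: Option<FieldId>,
    scale: f64,
    offset: f64,
}

fn analyze(expr: &ScalarExpr) -> Option<Partial> {
    match expr {
        ScalarExpr::Column(f) => Some(Partial { field: Some(*f), scale: 1.0, offset: 0.0 }),
        ScalarExpr::Literal(v) => Some(Partial { field: None, scale: 0.0, offset: *v }),
        ScalarExpr::Add(l, r) => {
            let (a, b) = (analyze(l)?, analyze(r)?);
            let field = match (a.field, b.field) {
                // Two different columns cannot be expressed as one affine form.
                (Some(x), Some(y)) if x != y => return None,
                (x, y) => x.or(y),
            };
            Some(Partial { field, scale: a.scale + b.scale, offset: a.offset + b.offset })
        }
        ScalarExpr::Mul(l, r) => {
            let (a, b) = (analyze(l)?, analyze(r)?);
            match (a.field, b.field) {
                // column * column is quadratic, not affine.
                (Some(_), Some(_)) => None,
                // constant * (scale * field + offset)
                (None, _) => Some(Partial {
                    field: b.field,
                    scale: a.offset * b.scale,
                    offset: a.offset * b.offset,
                }),
                // (scale * field + offset) * constant
                (_, None) => Some(Partial {
                    field: a.field,
                    scale: a.scale * b.offset,
                    offset: a.offset * b.offset,
                }),
            }
        }
    }
}

/// Succeeds only when the expression references exactly one column.
fn try_extract_affine_expr(expr: &ScalarExpr) -> Option<AffineExpr> {
    let p = analyze(expr)?;
    Some(AffineExpr { field: p.field?, scale: p.scale, offset: p.offset })
}
```

For col * 3 + 5 this yields scale 3.0 and offset 5.0, matching the diagram; an expression such as col1 * col2 yields None and would take another evaluation path.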

Sources : llkv-table/src/scalar_eval.rs:1138-1261

Constant Folding

The simplification pass performs constant folding for expressions like 2 + 3 or 10.0 / 2.0, replacing them with Literal(5) or Literal(5.0). This eliminates redundant computation during execution.
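Constant folding can be sketched as a bottom-up rewrite. The enums below are reduced and hypothetical; leaving overflow and division by zero unfolded, and truncating integer division, are assumptions of this example rather than documented behavior.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Literal {
    Integer(i64),
    Float(f64),
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum BinaryOp {
    Add,
    Subtract,
    Multiply,
    Divide,
}

/// Reduced, hypothetical expression tree.
#[derive(Debug, Clone, PartialEq)]
enum ScalarExpr {
    Column(u32),
    Literal(Literal),
    Binary { left: Box<ScalarExpr>, op: BinaryOp, right: Box<ScalarExpr> },
}

fn as_f64(l: &Literal) -> f64 {
    match l {
        Literal::Integer(v) => *v as f64,
        Literal::Float(v) => *v,
    }
}

/// Folds two literals; None means "leave the expression as it is".
fn fold_literals(l: &Literal, op: BinaryOp, r: &Literal) -> Option<Literal> {
    match (l, r) {
        (Literal::Integer(a), Literal::Integer(b)) => {
            // Checked arithmetic: overflow or division by zero is not folded.
            let v = match op {
                BinaryOp::Add => a.checked_add(*b)?,
                BinaryOp::Subtract => a.checked_sub(*b)?,
                BinaryOp::Multiply => a.checked_mul(*b)?,
                BinaryOp::Divide => a.checked_div(*b)?,
            };
            Some(Literal::Integer(v))
        }
        _ => {
            let (a, b) = (as_f64(l), as_f64(r));
            let v = match op {
                BinaryOp::Add => a + b,
                BinaryOp::Subtract => a - b,
                BinaryOp::Multiply => a * b,
                BinaryOp::Divide if b != 0.0 => a / b,
                BinaryOp::Divide => return None,
            };
            Some(Literal::Float(v))
        }
    }
}

/// Simplifies children first, then folds when both sides became literals.
fn simplify(expr: &ScalarExpr) -> ScalarExpr {
    match expr {
        ScalarExpr::Binary { left, op, right } => {
            let (l, r) = (simplify(left), simplify(right));
            if let (ScalarExpr::Literal(a), ScalarExpr::Literal(b)) = (&l, &r) {
                if let Some(folded) = fold_literals(a, *op, b) {
                    return ScalarExpr::Literal(folded);
                }
            }
            ScalarExpr::Binary { left: Box::new(l), op: *op, right: Box::new(r) }
        }
        other => other.clone(),
    }
}
```

Folding bottom-up lets col + (2 + 3) become col + 5 even though the outer expression as a whole is not constant.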

Sources : llkv-table/src/scalar_eval.rs:997-1137


Type Coercion and Casting

Implicit Coercion

When evaluating binary operations on mixed types, the system applies implicit promotion rules:

| Left Type | Right Type | Result Type | Behavior |
|---|---|---|---|
| Integer | Integer | Integer | No conversion |
| Integer | Float | Float | Promote left to Float64 |
| Float | Integer | Float | Promote right to Float64 |
| Decimal | Any | Float | Convert both to Float64 |

The infer_result_kind method determines the target type before evaluation, and to_aligned_array_ref applies the necessary promotions.
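The promotion table reduces to a small match. The signature below is an assumption, and two cases the table does not list (Float with Float, and a Decimal on the right-hand side) are treated as Float here purely by assumption.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum NumericKind {
    Integer,
    Float,
    Decimal,
}

/// Sketch of the promotion rule: only Integer op Integer stays Integer.
/// Float/Float and Any/Decimal are not in the table; Float is assumed.
fn infer_result_kind(left: NumericKind, right: NumericKind) -> NumericKind {
    match (left, right) {
        (NumericKind::Integer, NumericKind::Integer) => NumericKind::Integer,
        _ => NumericKind::Float,
    }
}
```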

Sources : llkv-table/src/scalar_eval.rs:1398-1447

Explicit Casting

The CAST expression variant supports explicit type conversion.

Casting is handled during evaluation by:

  1. Evaluating the inner expression to NumericValue
  2. Converting to the target NumericKind via cast_numeric_value_to_kind
  3. Constructing the result array with the target Arrow DataType

Special handling exists for DataType::Date32 casts, which use the llkv-plan date utilities.
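The second step above might look roughly like the following for the integer and float kinds. The signature, the truncate-toward-zero rule, and the choice to return NULL for NaN, infinite, or out-of-range floats are all assumptions of this sketch; the source does not state how cast_numeric_value_to_kind treats those cases.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum NumericKind {
    Integer,
    Float,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum NumericValue {
    Integer(i64),
    Float(f64),
}

/// Converts a value to the target kind; None stands for SQL NULL.
fn cast_numeric_value_to_kind(value: NumericValue, target: NumericKind) -> Option<NumericValue> {
    match (value, target) {
        (NumericValue::Integer(v), NumericKind::Integer) => Some(NumericValue::Integer(v)),
        (NumericValue::Integer(v), NumericKind::Float) => Some(NumericValue::Float(v as f64)),
        (NumericValue::Float(v), NumericKind::Float) => Some(NumericValue::Float(v)),
        (NumericValue::Float(v), NumericKind::Integer) => {
            // Assumption: truncate toward zero; anything not representable becomes NULL.
            // i64::MAX as f64 rounds up to 2^63, so the upper bound is exclusive.
            if v.is_finite() && v >= i64::MIN as f64 && v < i64::MAX as f64 {
                Some(NumericValue::Integer(v.trunc() as i64))
            } else {
                None
            }
        }
    }
}

/// Applies the cast to a whole column, keeping NULL slots as NULL.
fn cast_column(values: &[Option<NumericValue>], target: NumericKind) -> Vec<Option<NumericValue>> {
    values
        .iter()
        .map(|v| v.and_then(|x| cast_numeric_value_to_kind(x, target)))
        .collect()
}
```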

Sources : llkv-table/src/scalar_eval.rs:1449-1472 llkv-table/src/scalar_eval.rs:611-624


Integration with Table Scans

The numeric evaluation engine is invoked by the table executor when processing computed projections. The integration flow:

sequenceDiagram
    participant Planner as TablePlanner
    participant Executor as TableExecutor
    participant Kernels as NumericKernels
    participant Store as ColumnStore

    Planner->>Planner: Analyze projections
    Planner->>Kernels: collect_fields(expr)
    Kernels-->>Planner: Set&lt;FieldId&gt;
    Planner->>Planner: Build unique_lfids list

    Executor->>Store: Gather columns for row batch
    Store-->>Executor: Vec&lt;ArrayRef&gt;

    Executor->>Kernels: prepare_numeric_arrays(lfids, arrays, fields)
    Kernels-->>Executor: NumericArrayMap

    Executor->>Kernels: evaluate_batch_simplified(expr, len, arrays)
    Kernels->>Kernels: try_evaluate_vectorized()
    alt Vectorized
        Kernels->>Kernels: compute_binary_array_array()
    else Fallback
        Kernels->>Kernels: Loop: evaluate_value(expr, idx)
    end
    Kernels-->>Executor: ArrayRef (result column)

    Executor->>Executor: Append to RecordBatch

Projection Evaluation Context

The ProjectionEval enum distinguishes between direct column references and computed expressions.

For Computed variants, the planner:

  1. Calls NumericKernels::simplify() to optimize the expression
  2. Invokes NumericKernels::collect_fields() to determine dependencies
  3. Stores the simplified expression for evaluation

During execution, RowStreamBuilder materializes computed columns by calling evaluate_batch_simplified for each expression.

Sources : llkv-table/src/planner/mod.rs:494-498 llkv-table/src/planner/mod.rs:1073-1107

Passthrough Optimization

The planner detects when a computed expression is simply a column reference (after simplification) via NumericKernels::passthrough_column(). In this case, the column is fetched directly from storage without re-evaluation.

This avoids redundant computation for queries like SELECT col + 0 AS x.
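A sketch of the idea, under simplifying assumptions: the reduced ScalarExpr below only knows addition and multiplication, simplify removes additive zeros and multiplicative ones, and passthrough_column runs the simplification itself (in the source, simplification happens earlier in the planner).

```rust
// Assumption: FieldId is a plain integer for this sketch.
type FieldId = u32;

/// Reduced, hypothetical expression tree.
#[derive(Debug, Clone, PartialEq)]
enum ScalarExpr {
    Column(FieldId),
    Literal(f64),
    Add(Box<ScalarExpr>, Box<ScalarExpr>),
    Mul(Box<ScalarExpr>, Box<ScalarExpr>),
}

fn is_literal(e: &ScalarExpr, value: f64) -> bool {
    matches!(e, ScalarExpr::Literal(v) if *v == value)
}

/// Removes `+ 0` and `* 1` identities, bottom-up.
fn simplify(expr: &ScalarExpr) -> ScalarExpr {
    match expr {
        ScalarExpr::Add(l, r) => {
            let (l, r) = (simplify(l), simplify(r));
            if is_literal(&r, 0.0) {
                l
            } else if is_literal(&l, 0.0) {
                r
            } else {
                ScalarExpr::Add(Box::new(l), Box::new(r))
            }
        }
        ScalarExpr::Mul(l, r) => {
            let (l, r) = (simplify(l), simplify(r));
            if is_literal(&r, 1.0) {
                l
            } else if is_literal(&l, 1.0) {
                r
            } else {
                ScalarExpr::Mul(Box::new(l), Box::new(r))
            }
        }
        other => other.clone(),
    }
}

/// Returns the column when the simplified expression is a bare reference.
fn passthrough_column(expr: &ScalarExpr) -> Option<FieldId> {
    match simplify(expr) {
        ScalarExpr::Column(fid) => Some(fid),
        _ => None,
    }
}
```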

Sources : llkv-table/src/planner/mod.rs:1110-1116 llkv-table/src/scalar_eval.rs:874-907


Data Type Inference

The evaluation engine must determine result types for expressions before evaluation to construct properly-typed Arrow arrays. The infer_computed_data_type function in llkv-executor delegates to numeric kernel logic:

| Expression Type | Inferred Data Type | Rule |
|---|---|---|
| Literal(Integer) | Int64 | Direct mapping |
| Literal(Float) | Float64 | Direct mapping |
| Binary { ... } | Int64 or Float64 | Based on operand types |
| Compare { ... } | Int64 | Boolean as 0/1 integer |
| Cast { data_type, ... } | data_type | Explicit type |
| Random | Float64 | Always float |

The expression_uses_float helper recursively checks if any operand is floating-point, promoting the result type accordingly.
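The rules in the table can be sketched as two small recursive functions. The DataType and ScalarExpr enums are reduced stand-ins, and storing the resolved type directly on Column is an assumption that replaces the schema lookup the real code would perform.

```rust
/// Stand-in for Arrow's DataType, limited to the two types in the table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DataType {
    Int64,
    Float64,
}

#[derive(Debug, Clone)]
enum Literal {
    Integer(i64),
    Float(f64),
}

/// Reduced, hypothetical expression tree.
#[derive(Debug, Clone)]
enum ScalarExpr {
    /// Assumption: the column's resolved type is stored inline.
    Column(DataType),
    Literal(Literal),
    Binary { left: Box<ScalarExpr>, right: Box<ScalarExpr> },
    Compare { left: Box<ScalarExpr>, right: Box<ScalarExpr> },
    Cast { expr: Box<ScalarExpr>, data_type: DataType },
    Random,
}

/// True when evaluating the expression produces a floating-point value.
fn expression_uses_float(expr: &ScalarExpr) -> bool {
    match expr {
        ScalarExpr::Column(dt) => *dt == DataType::Float64,
        ScalarExpr::Literal(Literal::Float(_)) => true,
        ScalarExpr::Literal(Literal::Integer(_)) => false,
        ScalarExpr::Binary { left, right } => {
            expression_uses_float(left) || expression_uses_float(right)
        }
        // Comparisons produce 0/1 integers regardless of operand types.
        ScalarExpr::Compare { .. } => false,
        ScalarExpr::Cast { data_type, .. } => *data_type == DataType::Float64,
        ScalarExpr::Random => true,
    }
}

fn infer_computed_data_type(expr: &ScalarExpr) -> DataType {
    match expr {
        ScalarExpr::Cast { data_type, .. } => *data_type,
        ScalarExpr::Compare { .. } => DataType::Int64,
        _ if expression_uses_float(expr) => DataType::Float64,
        _ => DataType::Int64,
    }
}
```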

Sources : llkv-executor/src/translation/schema.rs:53-123


Performance Characteristics

Row-by-Row Evaluation

Used for:

  • Expressions with control flow (CASE, COALESCE)
  • Expressions containing CAST to non-numeric types
  • Expressions with interval arithmetic (date operations)

Cost : O(n) row dispatch overhead, branch mispredictions on conditionals

Vectorized Evaluation

Used for:

  • Simple arithmetic (col1 + col2, col * 3)
  • Single column references
  • Constant literals

Cost : O(n) with SIMD-friendly memory access patterns, no per-row dispatch

Affine Evaluation

Special case for scale * field + offset expressions. The executor generates values directly into the output buffer using emit_no_nulls or emit_with_nulls callbacks, avoiding intermediate allocations.

graph LR
    INPUT["Int64Array\n[1,2,3,4,5]"]
    AFFINE["scale=2.0\noffset=10.0"]
    CALLBACK["emit_no_nulls(\nlen, /i/ 2.0*values[i]+10.0\n)"]
    OUTPUT["Float64Array\n[12,14,16,18,20]"]

    INPUT --> AFFINE
    AFFINE --> CALLBACK
    CALLBACK --> OUTPUT
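The callback style can be sketched as follows. The signatures of emit_no_nulls and emit_with_nulls are assumptions (the source names them but does not show them), and Vec replaces the Arrow output buffer; eval_affine and eval_affine_nullable are helpers introduced for this example.

```rust
/// Fills one output buffer by calling `f` for each row index (assumed signature).
fn emit_no_nulls(len: usize, f: impl Fn(usize) -> f64) -> Vec<f64> {
    let mut out = Vec::with_capacity(len);
    for i in 0..len {
        out.push(f(i));
    }
    out
}

/// Same idea when the input may contain NULLs (assumed signature).
fn emit_with_nulls(len: usize, f: impl Fn(usize) -> Option<f64>) -> Vec<Option<f64>> {
    (0..len).map(f).collect()
}

/// Applies scale * value + offset in a single pass over a non-null column.
fn eval_affine(values: &[i64], scale: f64, offset: f64) -> Vec<f64> {
    emit_no_nulls(values.len(), |i| scale * values[i] as f64 + offset)
}

/// Nullable variant: NULL inputs stay NULL.
fn eval_affine_nullable(values: &[Option<i64>], scale: f64, offset: f64) -> Vec<Option<f64>> {
    emit_with_nulls(values.len(), |i| values[i].map(|v| scale * v as f64 + offset))
}
```

With the inputs [1, 2, 3, 4, 5], scale 2.0, and offset 10.0, eval_affine produces [12, 14, 16, 18, 20]. The only allocation is the output buffer itself.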

Sources : llkv-table/src/planner/mod.rs:253-357


Key Implementation Details

NULL Handling

NULL values propagate through arithmetic operations according to SQL semantics:

  • NULL + 5 → NULL
  • NULL IS NULL → 1 (true)
  • COALESCE(NULL, 5) → 5

Each NumericValue is wrapped in an Option, so evaluation works with Option<NumericValue> and None represents SQL NULL. Binary operations return None if either operand is None.
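The three rules listed above map directly onto Option. This sketch uses Option<i64> in place of Option<NumericValue>, and the function names are hypothetical.

```rust
/// NULL + x and x + NULL are both NULL: `?` short-circuits on None.
fn add(l: Option<i64>, r: Option<i64>) -> Option<i64> {
    Some(l? + r?)
}

/// IS NULL never returns NULL itself; it yields 1 (true) or 0 (false).
fn is_null(v: Option<i64>) -> Option<i64> {
    Some(if v.is_none() { 1 } else { 0 })
}

/// COALESCE returns the first non-NULL argument, or NULL if there is none.
fn coalesce(args: &[Option<i64>]) -> Option<i64> {
    args.iter().copied().flatten().next()
}
```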

Sources : llkv-table/src/scalar_eval.rs:564-571

Type Safety

The system maintains type safety through:

  1. Tagged unions : NumericValue preserves original type via the discriminant
  2. Explicit promotion : promote_to_float() is called only when type mixing requires it
  3. Result type inference : The planner determines output types before evaluation

This prevents silent precision loss and enables query optimizations based on type information.

Sources : llkv-table/src/scalar_eval.rs:295-342

Memory Efficiency

The NumericArray struct uses Arc<T> for backing arrays, enabling zero-copy sharing when:

  • Returning a column directly without computation
  • Slicing arrays for sorted run evaluation
  • Sharing arrays across multiple expressions referencing the same column

The to_array_ref() method clones the Arc, not the underlying data.
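The effect of cloning the Arc rather than the data can be checked with Arc::ptr_eq and Arc::strong_count. In this sketch a Vec<i64> stands in for the Arrow array, and new, shares_buffer_with, and handle_count are hypothetical helpers added for the demonstration.

```rust
use std::sync::Arc;

/// Vec<i64> stands in for the Arrow backing array (an assumption of this sketch).
struct NumericArray {
    data: Arc<Vec<i64>>,
}

impl NumericArray {
    fn new(values: Vec<i64>) -> Self {
        Self { data: Arc::new(values) }
    }

    /// Clones the reference-counted handle; the buffer itself is not copied.
    fn to_array_ref(&self) -> Arc<Vec<i64>> {
        Arc::clone(&self.data)
    }

    /// True when `other` points at the same allocation as this array.
    fn shares_buffer_with(&self, other: &Arc<Vec<i64>>) -> bool {
        Arc::ptr_eq(&self.data, other)
    }

    /// Number of live handles to the backing buffer.
    fn handle_count(&self) -> usize {
        Arc::strong_count(&self.data)
    }
}
```

After one call to to_array_ref(), the handle count rises from 1 to 2 while both handles still point at the same allocation.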

Sources : llkv-table/src/scalar_eval.rs:275-293