This documentation is part of the "Projects with Books" initiative at zenOSmosis.
The source code for this project is available on GitHub.
Scan Execution and Optimization
Relevant source files
- llkv-column-map/Cargo.toml
- llkv-column-map/src/gather.rs
- llkv-column-map/src/store/projection.rs
- llkv-csv/src/writer.rs
- llkv-sql/tests/pager_io_tests.rs
- llkv-table/examples/direct_comparison.rs
- llkv-table/examples/performance_benchmark.rs
- llkv-table/examples/test_streaming.rs
- llkv-table/src/table.rs
Purpose and Scope
This document details the table scan execution flow in LLKV, covering how queries retrieve data from columnar storage through optimized fast paths and streaming strategies. The scan subsystem bridges query planning and physical storage, coordinating row ID collection, data gathering, and result materialization while minimizing memory allocations.
For query planning that produces scan operations, see TablePlanner and TableExecutor. For predicate evaluation during filtering, see Filter Evaluation. For the underlying column storage mechanisms, see Column Storage and ColumnStore.
Scan Entry Points
The Table struct provides two primary scan entry points that execute projections with filtering:
| Method | Signature | Purpose |
|---|---|---|
| scan_stream | &self, projections: I, filter_expr: &Expr, options: ScanStreamOptions, on_batch: F | Accepts flexible projection inputs, converts to ScanProjection |
| scan_stream_with_exprs | &self, projections: &[ScanProjection], filter_expr: &Expr, options: ScanStreamOptions, on_batch: F | Direct scan with pre-built ScanProjection values |
Both methods stream results as Arrow RecordBatches to a callback function, avoiding full materialization in memory. The ScanStreamOptions struct controls scan behavior:
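The struct definition is not reproduced on this page; as a rough sketch, assuming the fields implied by the fast-path criteria below (include_nulls and row_id_filter are referenced later on this page, while the ordering field is an assumption):

```rust
// Hypothetical placeholder types for illustration; only `include_nulls` and
// `row_id_filter` are named elsewhere on this page.
pub struct ScanOrderSpec; // e.g. order-by column and direction
pub struct RowIdFilter;   // e.g. an extra row-ID predicate or bitmap

/// Sketch of ScanStreamOptions; the real field set lives in llkv-table.
pub struct ScanStreamOptions {
    pub include_nulls: bool,                // emit rows whose projected values are NULL
    pub order: Option<ScanOrderSpec>,       // ORDER BY specification, if any
    pub row_id_filter: Option<RowIdFilter>, // additional row-ID filtering
}
```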
The on_batch callback receives each batch as it is produced, allowing streaming processing without memory accumulation.
Sources: llkv-table/src/table.rs:447-488 llkv-scan/src/lib.rs
Two-Phase Scan Execution Model
Two-Phase Architecture
LLKV scans execute in two distinct phases to minimize data movement:
Phase 1: Row ID Collection - Evaluates filter predicates against stored data to produce a set of matching row IDs. This phase:
- Visits column chunks using the visitor pattern
- Applies predicates via metadata-based pruning (min/max values)
- Collects matching row IDs into a Treemap (bitmap) or Vec<RowId>
- Avoids loading actual column values during filtering
Phase 2: Data Gathering - Retrieves column values for the matching row IDs identified in Phase 1. This phase:
- Loads only chunks containing matched rows
- Assembles Arrow arrays via type-specific gathering functions
- Streams results in fixed-size batches (default 4096 rows)
- Supports two execution paths: fast streaming or full materialization
This separation ensures predicates are evaluated without loading unnecessary column data, significantly reducing I/O and memory when filters are selective.
Sources: llkv-scan/src/execute.rs llkv-table/src/table.rs:479-488 llkv-scan/src/row_stream.rs
Fast Path Optimizations
The scan executor detects specific patterns that enable zero-copy streaming optimizations:
Fast Path Criteria
graph TD
ScanRequest["Scan Request"]
CheckCols{"Single\nProjection?"}
CheckFilter{"Unbounded\nFilter?"}
CheckNulls{"include_nulls\n= false?"}
CheckOrder{"No ORDER BY?"}
CheckRowFilter{"No row_id_filter?"}
FastPath["✓ Fast Path\nDirect ScanBuilder streaming"]
SlowPath["✗ Materialization Path\nMultiGatherContext gathering"]
ScanRequest --> CheckCols
CheckCols -->|Yes| CheckFilter
CheckCols -->|No| SlowPath
CheckFilter -->|Yes| CheckNulls
CheckFilter -->|No| SlowPath
CheckNulls -->|Yes| CheckOrder
CheckNulls -->|No| SlowPath
CheckOrder -->|Yes| CheckRowFilter
CheckOrder -->|No| SlowPath
CheckRowFilter -->|Yes| FastPath
CheckRowFilter -->|No| SlowPath
The fast path activates when all conditions are met:
| Condition | Why Required |
|---|---|
| Single projection | Multi-column requires coordination across field plans |
| Unbounded filter | Complex filters require row-by-row evaluation |
| include_nulls = false | Null handling requires inspection of each row |
| No ORDER BY | Sorting requires full materialization |
| No row_id_filter | Custom filters require additional filtering logic |
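Expressed as a predicate, the gate looks roughly like the sketch below; ScanRequest and its fields are illustrative stand-ins rather than the executor's actual types:

```rust
/// Illustrative fast-path gate mirroring the table above; the real check
/// lives in llkv-scan and uses different types.
struct ScanRequest {
    projection_count: usize,
    filter_is_unbounded: bool,
    include_nulls: bool,
    has_order_by: bool,
    has_row_id_filter: bool,
}

fn can_use_fast_path(req: &ScanRequest) -> bool {
    req.projection_count == 1      // single projection
        && req.filter_is_unbounded // no bounded/complex predicate
        && !req.include_nulls      // no per-row null inspection
        && !req.has_order_by       // no sort, so no full materialization
        && !req.has_row_id_filter  // no extra row-ID predicate
}
```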
Performance Characteristics
When the fast path activates, scans execute directly via ScanBuilder against the ColumnStore, achieving throughputs exceeding 100M rows/second for simple numeric columns. The materialization path adds 2-5x overhead due to:
- Context preparation and cache management
- Chunk coordinate computation across multiple fields
- Arrow array assembly and concatenation
Test examples demonstrate the performance difference:
- Single-column unbounded scan: ~1-2ms for 1M rows
- Multi-column scan: ~5-10ms for 1M rows
- Complex filters: ~10-20ms for 1M rows
Sources: llkv-scan/src/execute.rs llkv-table/examples/test_streaming.rs:83-110 llkv-table/examples/performance_benchmark.rs:126-151
Row ID Collection Infrastructure
RowIdScanCollector
The RowIdScanCollector accumulates matching row IDs into a Treemap bitmap during Phase 1 filtering.
It implements multiple visitor traits to handle different scan modes:
- PrimitiveVisitor - Ignores value chunks (row IDs not yet available)
- PrimitiveSortedVisitor - Ignores sorted runs
- PrimitiveWithRowIdsVisitor - Collects row IDs from chunks with row ID arrays
- PrimitiveSortedWithRowIdsVisitor - Collects row IDs from sorted runs
The collector extracts row IDs from UInt64Array instances passed alongside value arrays, adding them to the bitmap for efficient deduplication and range queries.
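A minimal sketch of that accumulation step, assuming the roaring crate's RoaringTreemap as the Treemap type and omitting the visitor-trait plumbing:

```rust
use arrow::array::UInt64Array;
use roaring::RoaringTreemap;

/// Simplified stand-in for RowIdScanCollector: accumulate the row IDs that
/// accompany each visited chunk into a 64-bit roaring bitmap.
struct RowIdCollector {
    row_ids: RoaringTreemap,
}

impl RowIdCollector {
    fn new() -> Self {
        Self { row_ids: RoaringTreemap::new() }
    }

    /// Called once per chunk or sorted run that carries a row-ID array.
    fn collect_chunk(&mut self, rids: &UInt64Array) {
        for &rid in rids.values().iter() {
            self.row_ids.insert(rid); // the bitmap deduplicates automatically
        }
    }
}
```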
Sources: llkv-table/src/table.rs:805-858
RowIdChunkEmitter
For streaming scenarios, RowIdChunkEmitter emits row IDs in fixed-size chunks to a callback rather than accumulating them.
This approach:
- Minimizes memory usage by processing row IDs in batches
- Enables early termination on errors
- Supports reverse iteration for descending scans
- Optimizes for cases where row IDs arrive in contiguous chunks
The emitter implements the same visitor traits as the collector but invokes the callback when the buffer reaches chunk_size, then clears the buffer for the next batch.
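A simplified sketch of that behavior (names and the error type are illustrative, not the RowIdChunkEmitter API):

```rust
/// Simplified stand-in for RowIdChunkEmitter: buffer row IDs and hand them to
/// the callback in fixed-size chunks instead of accumulating them all.
struct ChunkEmitter<F>
where
    F: FnMut(&[u64]) -> Result<(), String>,
{
    buf: Vec<u64>,
    chunk_size: usize,
    on_chunk: F,
}

impl<F> ChunkEmitter<F>
where
    F: FnMut(&[u64]) -> Result<(), String>,
{
    fn push(&mut self, rid: u64) -> Result<(), String> {
        self.buf.push(rid);
        if self.buf.len() >= self.chunk_size {
            (self.on_chunk)(&self.buf)?; // error allows early termination
            self.buf.clear();            // reuse the buffer for the next batch
        }
        Ok(())
    }

    /// Flush any remaining row IDs at the end of the scan.
    fn finish(&mut self) -> Result<(), String> {
        if !self.buf.is_empty() {
            (self.on_chunk)(&self.buf)?;
            self.buf.clear();
        }
        Ok(())
    }
}
```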
Sources: llkv-table/src/table.rs:860-1017
Visitor Pattern Integration
Both collectors integrate with the column-map scan infrastructure via visitor traits. During a scan:
- ScanBuilder iterates over column chunks
- For each chunk, it detects whether row IDs are available
- It invokes the appropriate visitor method based on chunk characteristics:
  - _chunk_with_rids for unsorted chunks with row IDs
  - _run_with_rids for sorted runs with row IDs
- The collector/emitter extracts row IDs and accumulates/emits them
This pattern decouples row ID collection from the storage format, allowing specialized handling for sorted vs. unsorted data.
Sources: llkv-column-map/src/scan/mod.rs llkv-table/src/table.rs:832-858
classDiagram
class MultiGatherContext {
    +field_infos: FieldInfos
    +plans: FieldPlans
    +chunk_cache: FxHashMap~PhysicalKey, ArrayRef~
    +row_index: FxHashMap~u64, usize~
    +row_scratch: Vec~Option~
    +builders: Vec~ColumnOutputBuilder~
    +epoch: u64
    +new() MultiGatherContext
    +update_field_infos_and_plans()
    +matches_field_ids() bool
    +schema_for_nullability() Schema
    +chunk_span_for_row() Option
}
class FieldPlan {
    +dtype: DataType
    +value_metas: Vec~ChunkMetadata~
    +row_metas: Vec~ChunkMetadata~
    +candidate_indices: Vec~usize~
    +avg_value_bytes_per_row: f64
}
class ColumnOutputBuilder {
    <<enum>>
    Utf8
    Binary
    Boolean
    Decimal128
    Primitive
    Passthrough
}
class GatherContextPool {
    +inner: Mutex~FxHashMap~
    +max_per_key: usize
    +acquire() GatherContextGuard
}
MultiGatherContext --> FieldPlan: contains
MultiGatherContext --> ColumnOutputBuilder: contains
GatherContextPool --> MultiGatherContext: pools
Data Gathering Infrastructure
MultiGatherContext
The MultiGatherContext coordinates multi-column data gathering during Phase 2:
Context Fields
| Field | Purpose |
|---|---|
| field_infos | Maps LogicalFieldId to DataType for each projected column |
| plans | Contains FieldPlan with chunk metadata for each column |
| chunk_cache | Caches decoded Arrow arrays to avoid redundant deserialization |
| row_index | Maps row IDs to output indices during gathering |
| row_scratch | Scratch buffer mapping output positions to chunk coordinates |
| builders | Type-specific Arrow array builders for each output column |
| epoch | Tracks storage version for cache invalidation |
Gathering Flow
- Context preparation: Load column descriptors and chunk metadata for all fields
- Candidate selection: Identify chunks containing requested row IDs based on min/max values
- Chunk loading: Batch-fetch identified chunks from storage
- Row mapping: Build index from row ID to output position
- Value extraction: For each output position, locate value in cached chunk and append to builder
- Array finalization: Convert builders to Arrow arrays and assemble a RecordBatch
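The candidate-selection step amounts to a range-overlap test against per-chunk row-ID metadata. A sketch under that assumption, where ChunkMeta is a hypothetical stand-in for the real chunk metadata:

```rust
/// Hypothetical per-chunk metadata; the real ChunkMetadata in llkv-column-map
/// carries more fields than the row-ID bounds shown here.
struct ChunkMeta {
    min_row_id: u64,
    max_row_id: u64,
}

/// Keep only chunks whose row-ID range overlaps the (sorted) requested IDs.
fn candidate_chunks(metas: &[ChunkMeta], sorted_row_ids: &[u64]) -> Vec<usize> {
    let (Some(&first), Some(&last)) = (sorted_row_ids.first(), sorted_row_ids.last()) else {
        return Vec::new(); // no requested rows, no candidates
    };
    metas
        .iter()
        .enumerate()
        .filter(|(_, m)| m.max_row_id >= first && m.min_row_id <= last)
        .map(|(i, _)| i)
        .collect()
}
```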
Sources: llkv-column-map/src/store/projection.rs:460-649
GatherContextPool
The GatherContextPool reuses MultiGatherContext instances across scans to amortize setup costs:
Pooling Strategy
- Contexts are keyed by the list of projected LogicalFieldId values
- Up to max_per_key contexts are retained per projection set (default: 4)
- Contexts are reset (chunk cache cleared) when returned to the pool
- Epoch tracking invalidates cached chunk arrays when storage changes
When a scan requests a context:
- Pool checks for matching contexts by field ID list
- If found, pops a context from the pool
- If not found or pool empty, creates a new context
- The context is wrapped in GatherContextGuard for automatic return to the pool
This pattern is critical for high-frequency scans where context setup (descriptor loading, metadata parsing) dominates execution time. Pooling reduces per-scan overhead from ~10ms to <1ms for repeated projections.
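A minimal sketch of this strategy (simplified stand-in types; the real pool also tracks an epoch and hands contexts back through a GatherContextGuard that returns them on drop):

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Hypothetical stand-ins: a logical field ID and a gather context that can
// be reset (chunk cache cleared) between scans.
type LogicalFieldId = u64;

#[derive(Default)]
struct GatherContext { /* chunk cache, field plans, builders, ... */ }

impl GatherContext {
    fn reset(&mut self) { /* clear the chunk cache, keep allocations */ }
}

/// Sketch of the pooling strategy: contexts are keyed by the projected field
/// IDs and at most `max_per_key` are retained per key.
struct ContextPool {
    inner: Mutex<HashMap<Vec<LogicalFieldId>, Vec<GatherContext>>>,
    max_per_key: usize,
}

impl ContextPool {
    fn acquire(&self, fields: &[LogicalFieldId]) -> GatherContext {
        let mut map = self.inner.lock().unwrap();
        map.get_mut(fields)
            .and_then(|v| v.pop()) // reuse a prepared context if one is pooled
            .unwrap_or_default()   // otherwise build a fresh one
    }

    fn release(&self, fields: Vec<LogicalFieldId>, mut ctx: GatherContext) {
        ctx.reset(); // drop cached chunks before pooling
        let mut map = self.inner.lock().unwrap();
        let slot = map.entry(fields).or_default();
        if slot.len() < self.max_per_key {
            slot.push(ctx); // retain up to max_per_key contexts per key
        }
    }
}
```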
Sources: llkv-column-map/src/store/projection.rs:651-720
Type-Specific Gathering Functions
LLKV provides specialized gathering functions for each Arrow type to optimize performance:
| Function | Arrow Type | Optimization |
|---|---|---|
| gather_rows_from_chunks_string | GenericStringArray<O> | Direct slice reuse for contiguous sorted rows |
| gather_rows_from_chunks_binary | GenericBinaryArray<O> | Same as string with binary validation |
| gather_rows_from_chunks_bool | BooleanArray | Bitmap-based gathering |
| gather_rows_from_chunks_decimal128 | Decimal128Array | Precision-preserving assembly |
| gather_rows_from_chunks_struct | StructArray | Recursive gathering per child field |
| gather_rows_from_chunks | PrimitiveArray<T> | Fast path for single-chunk contiguous access |
Each function implements a common pattern:
Fast Path Detection
All gathering functions check for a fast path where:
- Only one chunk contains the requested row IDs
- Row IDs are sorted ascending
- Row IDs form a contiguous range in the chunk
When detected, the function returns a slice of the cached chunk array via Arrow’s zero-copy slice() method, avoiding builder allocation and value copying entirely.
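A sketch of that check, assuming the chunk's starting row ID is known from its metadata (the per-type gathering functions in llkv-column-map/src/gather.rs implement the real versions):

```rust
use arrow::array::{Array, ArrayRef};

/// Sketch of the zero-copy fast path: if the row IDs form one ascending,
/// contiguous run inside a single cached chunk, return an Arrow slice of that
/// chunk instead of copying values into a builder.
/// `chunk_first_row_id` is assumed to be the row ID of the chunk's first row.
fn try_contiguous_slice(
    chunk: &ArrayRef,
    chunk_first_row_id: u64,
    row_ids: &[u64],
) -> Option<ArrayRef> {
    let (&first, &last) = (row_ids.first()?, row_ids.last()?);
    // Ascending and contiguous: each ID is exactly one more than the previous.
    if !row_ids.windows(2).all(|w| w[1] == w[0] + 1) {
        return None;
    }
    let offset = first.checked_sub(chunk_first_row_id)? as usize;
    let len = (last - first + 1) as usize;
    if offset + len > chunk.len() {
        return None; // the run does not fall entirely inside this chunk
    }
    Some(chunk.slice(offset, len)) // Arrow zero-copy slice, no value copying
}
```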
Sources: llkv-column-map/src/gather.rs:283-403 llkv-column-map/src/gather.rs:405-526 llkv-column-map/src/gather.rs:529-638
flowchart LR
RowIds["Row ID Source\nBitmap or Vec"]
Chunk1["Batch 1\n0..4096"]
Chunk2["Batch 2\n4096..8192"]
ChunkN["Batch N\n(N-1)*4096..M"]
Gather1["gather_rows()"]
Gather2["gather_rows()"]
GatherN["gather_rows()"]
Batch1["RecordBatch 1"]
Batch2["RecordBatch 2"]
BatchN["RecordBatch N"]
Callback["on_batch\ncallback"]
RowIds -->|split into chunks| Chunk1
RowIds --> Chunk2
RowIds --> ChunkN
Chunk1 --> Gather1
Chunk2 --> Gather2
ChunkN --> GatherN
Gather1 --> Batch1
Gather2 --> Batch2
GatherN --> BatchN
Batch1 --> Callback
Batch2 --> Callback
BatchN --> Callback
Streaming Strategies
Batch-Based Streaming
LLKV streams scan results in fixed-size batches to balance memory usage and callback overhead:
Batch Size Configuration
The default batch size is 4096 rows, defined in llkv-table/src/constants.rs:
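(The streaming steps below refer to this constant as STREAM_BATCH_ROWS; the declaration here is a paraphrase rather than a copy of the source file.)

```rust
/// Number of rows per streamed RecordBatch, per the value stated on this page.
pub const STREAM_BATCH_ROWS: usize = 4096;
```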
This value balances:
- Memory footprint: Smaller batches reduce peak memory but increase callback overhead
- Allocation efficiency: Builders pre-allocate for batch size, minimizing reallocation
- Cache locality: Larger batches improve CPU cache utilization during gathering
Streaming Execution
The RowStreamBuilder manages batch-based streaming:
- Accept a RowIdSource (bitmap or vector)
- Split row IDs into windows of STREAM_BATCH_ROWS
- For each window:
  - Call gather_rows_with_reusable_context() with the window's row IDs
  - Invoke the on_batch callback with the resulting RecordBatch
  - Reuse the MultiGatherContext for the next window
This approach ensures constant memory usage regardless of result set size, as only one batch exists in memory at a time.
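A condensed sketch of that loop, with the gather closure standing in for gather_rows_with_reusable_context() (whose actual signature differs):

```rust
use arrow::record_batch::RecordBatch;

const STREAM_BATCH_ROWS: usize = 4096;

/// Illustrative streaming loop: gather and emit one window at a time so only
/// a single RecordBatch is alive at any moment.
fn stream_rows<G, F>(row_ids: &[u64], mut gather: G, mut on_batch: F) -> Result<(), String>
where
    G: FnMut(&[u64]) -> Result<RecordBatch, String>, // window -> RecordBatch
    F: FnMut(RecordBatch) -> Result<(), String>,     // user-provided callback
{
    for window in row_ids.chunks(STREAM_BATCH_ROWS) {
        let batch = gather(window)?; // reuses the pooled gather context
        on_batch(batch)?;            // batch is dropped before the next window
    }
    Ok(())
}
```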
Sources: llkv-table/src/constants.rs llkv-scan/src/row_stream.rs llkv-table/src/table.rs:588-645
Memory Efficiency
The streaming architecture minimizes memory allocations through several mechanisms:
Builder Reuse - Arrow array builders are pre-allocated to the batch size and reused across batches. Builders track their capacity and avoid reallocation when the next batch has the same or smaller size.
Context Pooling - The GatherContextPool retains prepared contexts, eliminating the need to reload descriptors and rebuild field plans for repeated projections.
Chunk Caching - Within a context, decoded Arrow arrays are cached by physical key. Chunks that span multiple batches are deserialized once and reused, reducing I/O and deserialization overhead.
Scratch Buffer Reuse - The row_scratch buffer that maps output indices to chunk coordinates is allocated once per context and cleared (not deallocated) between batches.
Visitor State Management - Row ID collectors and emitters maintain minimal state, typically just a bitmap or small buffer, avoiding large intermediate structures.
These optimizations enable LLKV to sustain multi-million row/second scan rates with sub-100MB memory footprints even for billion-row tables.
Sources: llkv-column-map/src/store/projection.rs:562-568 llkv-column-map/src/store/projection.rs:709-720
graph TB
subgraph "llkv-table layer"
ScanStream["Table::scan_stream()"]
end
subgraph "llkv-scan layer"
ExecuteScan["execute_scan()"]
CheckFast{"Can Use\nFast Path?"}
FastLogic["build_fast_path_stream()"]
SlowLogic["collect_row_ids_for_table()\nthen build_row_stream()"]
FastStream["Direct ScanBuilder\nstreaming"]
SlowStream["RowStreamBuilder\nwith MultiGatherContext"]
end
subgraph "llkv-column-map layer"
ScanBuilder["ScanBuilder::run()"]
ColumnStore["ColumnStore::gather_rows()"]
end
Callback["User-provided\non_batch callback"]
ScanStream --> ExecuteScan
ExecuteScan --> CheckFast
CheckFast -->|Yes| FastLogic
CheckFast -->|No| SlowLogic
FastLogic --> FastStream
SlowLogic --> SlowStream
FastStream --> ScanBuilder
SlowStream --> ColumnStore
ScanBuilder --> Callback
ColumnStore --> Callback
Integration with llkv-scan
The llkv-scan crate provides the execute_scan function that orchestrates the full scan execution: it checks fast-path eligibility, collects row IDs when the materialization path is required, and streams result batches to the caller's on_batch callback.
ScanStorage Trait
The ScanStorage trait abstracts over storage implementations, allowing execute_scan to work with both Table and direct ColumnStore access:
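The trait itself is not reproduced on this page. Purely as an illustration of the shape such an abstraction might take (the method names and signatures below are hypothetical, not the crate's API):

```rust
use arrow::record_batch::RecordBatch;

// Placeholder types so the sketch stands alone; both are hypothetical.
struct FilterExpr;
type FieldId = u64;

/// Hypothetical shape of a scan-storage abstraction. The actual ScanStorage
/// trait in llkv-scan/src/storage.rs defines its own (different) methods.
trait ScanStorageLike {
    type Error;

    /// Phase 1: evaluate the filter and return matching row IDs.
    fn collect_row_ids(&self, filter: &FilterExpr) -> Result<Vec<u64>, Self::Error>;

    /// Phase 2: gather the projected columns for one window of row IDs.
    fn gather(
        &self,
        row_ids: &[u64],
        projections: &[FieldId],
    ) -> Result<RecordBatch, Self::Error>;
}
```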
This abstraction enables unit testing of scan logic without requiring full table setup and allows specialized storage implementations to optimize row ID collection.
Sources: llkv-scan/src/execute.rs llkv-scan/src/storage.rs llkv-table/src/table.rs:479-488
Performance Monitoring
Benchmarking Scan Performance
The repository includes examples that measure scan performance across different scenarios:
| Example | Purpose | Key Measurement |
|---|---|---|
| test_streaming.rs | Verify streaming optimization activates | Throughput (rows/sec) |
| direct_comparison.rs | Compare ColumnStore vs Table layer overhead | Execution time ratio |
| performance_benchmark.rs | Profile different scan configurations | Latency distribution |
Representative Results (1M row table, Int64 column):
| Scenario | Throughput | Notes |
|---|---|---|
| Single column, unbounded | 100-200M rows/sec | Fast path active |
| Multi-column | 20-50M rows/sec | Materialization path |
| Bounded filter | 10-30M rows/sec | Predicate evaluation overhead |
| With nulls | 15-40M rows/sec | Null inspection required |
These benchmarks guide optimization efforts and validate that changes maintain performance characteristics.
Sources: llkv-table/examples/test_streaming.rs llkv-table/examples/direct_comparison.rs:276-399 llkv-table/examples/performance_benchmark.rs:82-211
Pager I/O Instrumentation
The InstrumentedPager wrapper tracks storage operations during scans:
Tracked Metrics
- physical_gets: Total number of key-value reads
- get_batches: Number of batch read operations
- physical_puts: Total number of key-value writes
- put_batches: Number of batch write operations
For a SELECT COUNT(*) on a 3-row table, typical I/O patterns show:
- ~36 physical gets for descriptor and chunk loads
- ~23 batch operations due to pipelined chunk fetching
This instrumentation identifies I/O hotspots and validates that scan optimizations reduce storage access.
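As a minimal sketch of the bookkeeping (counters only; the real InstrumentedPager wraps an inner pager and updates these on each delegated call):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Counter block matching the metrics listed above.
#[derive(Default)]
struct PagerCounters {
    physical_gets: AtomicU64, // individual key-value reads
    get_batches: AtomicU64,   // batched read calls
    physical_puts: AtomicU64, // individual key-value writes
    put_batches: AtomicU64,   // batched write calls
}

impl PagerCounters {
    /// Record one batched read covering `keys` physical gets.
    fn record_get_batch(&self, keys: usize) {
        self.get_batches.fetch_add(1, Ordering::Relaxed);
        self.physical_gets.fetch_add(keys as u64, Ordering::Relaxed);
    }

    /// Record one batched write covering `keys` physical puts.
    fn record_put_batch(&self, keys: usize) {
        self.put_batches.fetch_add(1, Ordering::Relaxed);
        self.physical_puts.fetch_add(keys as u64, Ordering::Relaxed);
    }
}
```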
Sources: llkv-sql/tests/pager_io_tests.rs:17-73 llkv-storage/src/pager/instrumented.rs