

Scan Execution and Optimization


Purpose and Scope

This page documents the table scan execution engine in the llkv-table crate, which implements the low-level scanning and streaming of data from the column store to higher layers. It covers planning, optimization paths, predicate compilation, and expression evaluation strategies that enable efficient data retrieval.

For information about higher-level query planning and the translation of SQL plans to table operations, see TablePlanner and TableExecutor. For details on how filters are evaluated against individual rows, see Filter Evaluation.

Architecture Overview

The scan execution system is split into two primary components:

| Component | Responsibility |
| --- | --- |
| TablePlanner | Analyzes scan requests, builds plan graphs, compiles predicates into bytecode programs |
| TableExecutor | Executes planned scans using optimization paths, coordinates streaming results |

Scan Execution Flow

Sources: llkv-table/src/planner/mod.rs:591-637

Planning Phase

Plan Construction

The TablePlanner::plan_scan method orchestrates plan construction by:

  1. Validating projections are non-empty
  2. Normalizing the filter predicate
  3. Building a plan graph for visualization and analysis
  4. Compiling predicates into executable programs

```mermaid
graph LR
    Input["scan_stream_with_exprs\n(projections, filter, options)"]
    Validate["Validate projections"]
    Normalize["normalize_predicate\n(flatten And/Or,\napply De Morgan's)"]
    Graph["build_plan_graph\n(TableScan → Filter →\nProject → Output)"]
    Compile["ProgramCompiler\n(EvalProgram +\nDomainProgram)"]
    Output["PlannedScan"]

    Input --> Validate
    Validate --> Normalize
    Validate --> Graph
    Normalize --> Compile
    Compile --> Output
    Graph --> Output
```

The plan graph encodes the logical operator tree for diagnostic purposes, with nodes representing TableScan, Filter, Project, and Output operators.
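
In code, the orchestration reduces to a short sequence. Below is a minimal sketch with simplified stand-in types; the real plan_scan in llkv-table/src/planner/mod.rs has a richer signature, and every type and helper here is hypothetical:

```rust
// All types here are simplified stand-ins for the planner's real types.
struct Projection;
struct Expr;
struct PlanGraph;
struct ProgramSet;
struct PlannedScan {
    graph: PlanGraph,
    programs: ProgramSet,
}

#[derive(Debug)]
enum PlanError {
    EmptyProjections,
}

// Sketch of the four plan_scan steps: validate, normalize, graph, compile.
fn plan_scan(projections: &[Projection], filter: Expr) -> Result<PlannedScan, PlanError> {
    // 1. Validate projections are non-empty.
    if projections.is_empty() {
        return Err(PlanError::EmptyProjections);
    }
    // 2. Normalize the filter (flatten And/Or, apply De Morgan's laws).
    let normalized = normalize_predicate(filter);
    // 3. Build the diagnostic plan graph (TableScan → Filter → Project → Output).
    let graph = build_plan_graph(projections, &normalized);
    // 4. Compile the predicate into EvalProgram + DomainProgram.
    let programs = compile_programs(&normalized);
    Ok(PlannedScan { graph, programs })
}

// Stubs standing in for the real planner routines.
fn normalize_predicate(filter: Expr) -> Expr { filter }
fn build_plan_graph(_p: &[Projection], _f: &Expr) -> PlanGraph { PlanGraph }
fn compile_programs(_f: &Expr) -> ProgramSet { ProgramSet }
```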

Sources: llkv-table/src/planner/mod.rs:612-637 llkv-table/src/planner/mod.rs:639-725

Predicate Compilation

Predicates are compiled into two bytecode programs:

| Program Type | Purpose |
| --- | --- |
| EvalProgram | Stack-based bytecode for evaluating filter conditions |
| DomainProgram | Tracks which row IDs satisfy the predicate during scanning |

The ProgramCompiler analyzes the normalized predicate tree and emits instructions for efficient evaluation. Predicate fusion merges multiple predicates on the same field when beneficial.
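
To make the "stack-based bytecode" shape concrete, here is a hedged toy sketch; the instruction set is invented for illustration and is not the real EvalProgram encoding:

```rust
// Hypothetical instruction set illustrating a stack-based predicate bytecode.
enum Instr {
    PushField(u32), // push a field's value for the current row
    PushConst(i64), // push a literal
    Gt,             // pop two values, push (a > b)
    And,            // pop two booleans, push (a && b)
}

// Evaluate a compiled program against one row of integer fields.
fn eval(program: &[Instr], row: &[i64]) -> bool {
    let mut stack: Vec<i64> = Vec::new();
    for instr in program {
        match instr {
            Instr::PushField(id) => stack.push(row[*id as usize]),
            Instr::PushConst(v) => stack.push(*v),
            Instr::Gt => {
                let b = stack.pop().unwrap();
                let a = stack.pop().unwrap();
                stack.push((a > b) as i64);
            }
            Instr::And => {
                let b = stack.pop().unwrap();
                let a = stack.pop().unwrap();
                stack.push(((a != 0) && (b != 0)) as i64);
            }
        }
    }
    stack.pop() == Some(1)
}

fn main() {
    // (field0 > 10) AND (field1 > 5)
    let program = [
        Instr::PushField(0), Instr::PushConst(10), Instr::Gt,
        Instr::PushField(1), Instr::PushConst(5), Instr::Gt,
        Instr::And,
    ];
    assert!(eval(&program, &[42, 7]));
    assert!(!eval(&program, &[42, 3]));
}
```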

```mermaid
graph TB
    Filter["Expr<FieldId>\n(normalized predicate)"]
    Fusion["PredicateFusionCache\n(analyze per-field stats)"]
    Compiler["ProgramCompiler::compile"]
    EvalProg["EvalProgram\n(stack-based bytecode)"]
    DomainProg["DomainProgram\n(row ID tracking)"]
    ProgramSet["ProgramSet\n(contains both programs)"]

    Filter --> Fusion
    Filter --> Compiler
    Fusion --> Compiler
    Compiler --> EvalProg
    Compiler --> DomainProg
    EvalProg --> ProgramSet
    DomainProg --> ProgramSet
```

Sources: llkv-table/src/planner/mod.rs:625-629 llkv-table/src/planner/program.rs

Execution Phase and Optimization Paths

The TableExecutor::execute method attempts multiple optimization paths before falling back to the general scan:

Fast Path: Single Column Direct Scan

When the scan requests a single column with a simple predicate, the executor uses try_single_column_direct_scan to stream data directly from the column store without materializing row IDs.

Conditions for single column fast path:

  • Exactly one column projection
  • No computed projections
  • Simple predicate structure (optional)
  • Compatible data types

This path bypasses row ID collection and gather operations, streaming column chunks directly to the caller.
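
A hedged sketch of the gating check; the struct fields are hypothetical stand-ins for whatever state the executor actually inspects:

```rust
// Hypothetical request shape; the real executor inspects its PlannedScan.
struct ScanRequest {
    column_projections: usize,   // number of plain column projections
    computed_projections: usize, // number of computed expressions
    predicate_is_simple: bool,   // e.g. a single comparison on the scanned column
    dtype_supported: bool,       // column store can stream this type directly
}

// Mirrors the conditions listed above: one plain column, no computed
// projections, and a type the store can stream without gathering.
fn qualifies_for_single_column_direct_scan(req: &ScanRequest) -> bool {
    req.column_projections == 1
        && req.computed_projections == 0
        && req.predicate_is_simple
        && req.dtype_supported
}
```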

Sources: llkv-table/src/planner/mod.rs:1021-1031 llkv-table/src/planner/mod.rs:1157-1343

Fast Path: Full Table Scan Streaming

For queries without ordering requirements, try_stream_full_table_scan uses incremental row ID streaming to avoid buffering all row IDs in memory:

```mermaid
graph TB
    Start["try_stream_full_table_scan"]
    CheckOrder["Check options.order\n(must be None)"]
    StreamRIDs["stream_table_row_ids\n(chunk_size batches)"]
    Shadow["Try shadow column\nscan (fast)"]
    Fallback["Multi-column scan\n(fallback)"]
    ProcessChunk["Process chunk:\n1. Apply row_id_filter\n2. Build RowStream\n3. Emit batches"]
    Batch["RecordBatch\n(via on_batch)"]

    Start --> CheckOrder
    CheckOrder -->|order is Some| Return["Return Fallback"]
    CheckOrder -->|order is None| StreamRIDs
    StreamRIDs --> Shadow
    Shadow -->|Success| ProcessChunk
    Shadow -->|Not found| Fallback
    Fallback --> ProcessChunk
    ProcessChunk --> Batch
    Batch -->|Next chunk| StreamRIDs
```
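
The loop structure can be sketched as follows; the helper closures are hypothetical stand-ins for stream_table_row_ids and the RowStream machinery:

```rust
// Hedged sketch of the incremental streaming loop.
const CHUNK_SIZE: usize = 1024;

fn stream_full_table_scan(
    next_row_id_chunk: &mut dyn FnMut(usize) -> Option<Vec<u64>>, // chunked row-id source
    row_id_filter: &dyn Fn(u64) -> bool,                          // compiled predicate
    on_batch: &mut dyn FnMut(Vec<u64>),                           // batch sink
) {
    // Row IDs arrive in CHUNK_SIZE batches, so the full set is never buffered.
    while let Some(chunk) = next_row_id_chunk(CHUNK_SIZE) {
        let passing: Vec<u64> = chunk.into_iter().filter(|rid| row_id_filter(*rid)).collect();
        if !passing.is_empty() {
            // The real path gathers columns for these rows via RowStream
            // and emits an Arrow RecordBatch; here we just forward the IDs.
            on_batch(passing);
        }
    }
}
```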

Sources: llkv-table/src/planner/mod.rs:904-999 llkv-table/src/planner/mod.rs:859-902

Row ID Collection Optimization

Row ID collection uses a two-tier strategy:

  1. Fast path (shadow column): Scan the dedicated row_id shadow column, which contains all row IDs for the table
  2. Fallback (multi-column scan): Scan user columns and deduplicate row IDs when the shadow column is unavailable

The shadow column optimization is significantly faster because:

  • Single column scan instead of multiple
  • No deduplication required
  • Direct row ID format
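
A minimal sketch of the two-tier strategy, assuming hypothetical scan helpers:

```rust
use std::collections::BTreeSet;

// `scan_shadow_row_ids` and `scan_column_row_ids` are hypothetical
// stand-ins for the column store's scan entry points.
fn collect_row_ids(
    scan_shadow_row_ids: impl Fn() -> Option<Vec<u64>>,
    scan_column_row_ids: impl Fn(u32) -> Vec<u64>,
    user_columns: &[u32],
) -> Vec<u64> {
    // Tier 1: the row_id shadow column already holds every row ID exactly
    // once, in row-ID format, so no deduplication is needed.
    if let Some(ids) = scan_shadow_row_ids() {
        return ids;
    }
    // Tier 2: fall back to scanning user columns and deduplicating,
    // since a row appears once per column it has a value in.
    let mut seen = BTreeSet::new();
    for &col in user_columns {
        seen.extend(scan_column_row_ids(col));
    }
    seen.into_iter().collect()
}
```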

Sources: llkv-table/src/planner/mod.rs:748-857

General Scan Execution

When fast paths don't apply, the executor uses the general scan path:

  1. Projection analysis: Classify projections as column references or computed expressions
  2. Field collection: Build unique field lists and numeric field maps
  3. Row ID collection: Gather all relevant row IDs (using the optimizations above)
  4. Row ID filtering: Apply predicate programs to filter row IDs
  5. Gather and stream: Use RowStreamBuilder to materialize columns and emit batches

General Scan Pipeline

```mermaid
graph TB
    Execute["TableExecutor::execute"]
    Analyze["Analyze projections:\n- Column refs\n- Computed exprs\n- Build unique_lfids"]
    CollectRIDs["table_row_ids\n(with caching)"]
    FilterRIDs["Filter row IDs:\n- collect_row_ids_for_rowid_filter\n- Apply EvalProgram\n- Apply DomainProgram"]
    Order["Apply ordering\n(if options.order present)"]
    Gather["RowStreamBuilder:\n- prepare_gather_context\n- stream chunks\n- evaluate computed exprs"]
    Emit["Emit RecordBatch"]

    Execute --> Analyze
    Analyze --> CollectRIDs
    CollectRIDs --> FilterRIDs
    FilterRIDs --> Order
    Order --> Gather
    Gather --> Emit
```
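
Condensed into code, the pipeline might look like this hedged outline; every type and helper is a stand-in for TableExecutor internals:

```rust
// Simplified stand-ins; the real executor works with Arrow schemas and batches.
struct PlannedScan;
struct Batch;

fn execute_general_scan(plan: &PlannedScan, on_batch: &mut dyn FnMut(Batch)) {
    // 1. Projection analysis: split column refs from computed expressions.
    let (column_refs, computed) = analyze_projections(plan);
    // 2. Field collection: the unique field list drives which columns to gather.
    let unique_fields = collect_unique_fields(&column_refs, &computed);
    // 3. Row ID collection: shadow column first, multi-column fallback.
    let row_ids = collect_row_ids(plan);
    // 4. Row ID filtering: run the compiled predicate programs over candidates.
    let row_ids = filter_row_ids(plan, row_ids);
    // 5. Gather and stream: materialize columns batch by batch.
    for batch in gather_batches(&unique_fields, &row_ids) {
        on_batch(batch);
    }
}

// Stubs so the outline compiles.
fn analyze_projections(_: &PlannedScan) -> (Vec<u32>, Vec<u32>) { (vec![], vec![]) }
fn collect_unique_fields(a: &[u32], b: &[u32]) -> Vec<u32> {
    let mut v: Vec<u32> = a.iter().chain(b).copied().collect();
    v.sort_unstable();
    v.dedup();
    v
}
fn collect_row_ids(_: &PlannedScan) -> Vec<u64> { vec![] }
fn filter_row_ids(_: &PlannedScan, ids: Vec<u64>) -> Vec<u64> { ids }
fn gather_batches(_: &[u32], _: &[u64]) -> Vec<Batch> { vec![] }
```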

Sources: llkv-table/src/planner/mod.rs:1009-1343 llkv-table/src/planner/mod.rs:1345-1710

Predicate Optimization

Normalization

The normalize_predicate function applies logical transformations to simplify filter expressions:

  • De Morgan's laws: NOT (a AND b) → (NOT a) OR (NOT b)
  • Flatten nested operators: AND[AND[a,b],c] → AND[a,b,c]
  • Constant folding: AND[true, x] → x
  • Double negation elimination: NOT (NOT x) → x

These transformations expose optimization opportunities and simplify compilation.
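
A hedged sketch of these rules on a toy predicate enum (the real normalize_predicate operates on Expr<FieldId>):

```rust
#[derive(Clone, Debug, PartialEq)]
enum Pred {
    Leaf(&'static str),
    Not(Box<Pred>),
    And(Vec<Pred>),
    Or(Vec<Pred>),
    Lit(bool),
}

fn normalize(p: Pred) -> Pred {
    match p {
        Pred::Not(inner) => match normalize(*inner) {
            // Double negation elimination: NOT (NOT x) → x.
            Pred::Not(x) => *x,
            // De Morgan's: NOT (a AND b) → (NOT a) OR (NOT b).
            Pred::And(xs) => normalize(Pred::Or(
                xs.into_iter().map(|x| Pred::Not(Box::new(x))).collect(),
            )),
            // De Morgan's: NOT (a OR b) → (NOT a) AND (NOT b).
            Pred::Or(xs) => normalize(Pred::And(
                xs.into_iter().map(|x| Pred::Not(Box::new(x))).collect(),
            )),
            other => Pred::Not(Box::new(other)),
        },
        Pred::And(xs) => {
            let mut out = Vec::new();
            for x in xs.into_iter().map(normalize) {
                match x {
                    // Flattening: AND[AND[a,b],c] → AND[a,b,c].
                    Pred::And(inner) => out.extend(inner),
                    // Constant folding: AND[true, x] → x.
                    Pred::Lit(true) => {}
                    other => out.push(other),
                }
            }
            if out.len() == 1 { out.pop().unwrap() } else { Pred::And(out) }
        }
        Pred::Or(xs) => Pred::Or(xs.into_iter().map(normalize).collect()),
        leaf => leaf,
    }
}

fn main() {
    // NOT (a AND b) → (NOT a) OR (NOT b)
    let e = Pred::Not(Box::new(Pred::And(vec![Pred::Leaf("a"), Pred::Leaf("b")])));
    println!("{:?}", normalize(e));
}
```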

Sources: llkv-table/src/planner/program.rs

Predicate Fusion

The PredicateFusionCache analyzes predicates to determine when multiple conditions on the same field should be fused:

| Data Type | Fusion Criteria |
| --- | --- |
| String types | contains count ≥ 1 AND total predicates ≥ 2 |
| Other types | Total predicates ≥ 2 |

```mermaid
graph TB
    Expr["Filter Expression"]
    Cache["PredicateFusionCache"]
    Traverse["Traverse expression tree"]
    Stats["Per-field stats:\n- total predicates\n- contains predicates"]
    Decision["should_fuse(field, dtype)"]
    Fuse["Fuse predicates into\nsingle evaluation"]
    Separate["Keep predicates separate"]

    Expr --> Cache
    Cache --> Traverse
    Traverse --> Stats
    Stats --> Decision
    Decision -->|Meets criteria| Fuse
    Decision -->|Below threshold| Separate
```

Fusion enables single-pass evaluation rather than multiple column scans for the same field.
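
The decision table above translates into a small predicate, sketched here with simplified stand-ins for PredicateFusionCache's per-field statistics:

```rust
// Hedged sketch; field stats and the should_fuse signature are simplified.
#[derive(Default)]
struct FieldStats {
    total_predicates: usize,
    contains_predicates: usize, // string `contains` checks on this field
}

enum DType { Utf8, Int64 /* ... */ }

fn should_fuse(stats: &FieldStats, dtype: &DType) -> bool {
    match dtype {
        // String fields fuse only when at least one `contains` appears
        // alongside another predicate on the same field.
        DType::Utf8 => stats.contains_predicates >= 1 && stats.total_predicates >= 2,
        // Other types fuse whenever the field has two or more predicates.
        _ => stats.total_predicates >= 2,
    }
}
```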

Sources: llkv-table/src/planner/mod.rs:517-570

Expression Evaluation

Numeric Kernels

The NumericKernels system in llkv-table/src/scalar_eval.rs provides vectorized evaluation for scalar expressions:

| Kernel Operation | Description |
| --- | --- |
| collect_fields | Extract all field references from expression |
| prepare_numeric_arrays | Cast columns to unified numeric representation |
| evaluate_value | Row-by-row scalar evaluation |
| evaluate_batch | Vectorized batch evaluation |
| simplify | Detect affine expressions for optimization |

Numeric Type Hierarchy

Sources: llkv-table/src/scalar_eval.rs:1-90 llkv-table/src/scalar_eval.rs:451-712

Vectorized Evaluation

When possible, expressions are evaluated using vectorized paths:

  1. Column access: Direct array reference (zero-copy)
  2. Literals: Broadcast scalar to array length
  3. Binary operations: Arrow compute kernels for array-array or array-scalar operations
  4. Affine expressions: Specialized scale * field + offset fast path

The try_evaluate_vectorized method attempts vectorization before falling back to row-by-row evaluation.
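
A hedged sketch of the dispatch idea on plain slices; the real path operates on Arrow arrays and compute kernels:

```rust
// Toy expression shapes; plain slices stand in for Arrow arrays.
enum Expr {
    Column,        // direct column reference (zero-copy in Arrow)
    Literal(f64),  // broadcast a scalar to array length
    AddScalar(f64),// column + scalar, one vectorized pass
    Complex,       // anything the vectorized path doesn't handle
}

// Returns Some(result) when a vectorized path exists, None otherwise;
// the caller then falls back to row-by-row evaluation (evaluate_value).
fn try_evaluate_vectorized(expr: &Expr, column: &[f64]) -> Option<Vec<f64>> {
    match expr {
        Expr::Column => Some(column.to_vec()),
        Expr::Literal(v) => Some(vec![*v; column.len()]),
        Expr::AddScalar(v) => Some(column.iter().map(|x| x + v).collect()),
        Expr::Complex => None,
    }
}
```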

Sources: llkv-table/src/scalar_eval.rs:714-762

Affine Expression Optimization

Expressions matching the pattern scale * field + offset are detected and optimized:

```mermaid
graph LR
    Expr["ScalarExpr"]
    Detect["NumericKernels::simplify"]
    Check["is_affine_column_expr"]
    Affine["AffineExpr:\nfield, scale, offset"]
    Direct["Direct column reference"]
    Complex["Complex expression"]

    Expr --> Detect
    Detect --> Check
    Check -->|Matches pattern| Affine
    Check -->|Single column| Direct
    Check -->|Other| Complex
```

Affine expressions enable:

  • Single column scan with arithmetic applied
  • Reduced memory allocation
  • Better cache locality
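
A minimal sketch of the fast path once detection has produced an AffineExpr; the struct shape follows the diagram above, and the evaluation code is illustrative:

```rust
// scale * field + offset, applied in a single pass over one column.
struct AffineExpr {
    scale: f64,
    offset: f64,
}

impl AffineExpr {
    // One allocation, one scan, good cache locality: each input element
    // is read once and written once.
    fn evaluate(&self, column: &[f64]) -> Vec<f64> {
        column.iter().map(|x| self.scale * x + self.offset).collect()
    }
}

fn main() {
    // 2 * field + 1 over a small column.
    let expr = AffineExpr { scale: 2.0, offset: 1.0 };
    assert_eq!(expr.evaluate(&[1.0, 2.0, 3.0]), vec![3.0, 5.0, 7.0]);
}
```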

Sources: llkv-table/src/scalar_eval.rs:1038-1174 llkv-table/src/planner/mod.rs:1711-1872

Streaming Architecture

Row Stream Builder

The RowStreamBuilder constructs streaming result iterators with configurable batch sizes:

```mermaid
graph TB
    Builder["RowStreamBuilder::new"]
    Config["Configuration:\n- store\n- table_id\n- schema\n- unique_lfids\n- projection_evals\n- row_ids\n- batch_size"]
    GatherCtx["prepare_gather_context\n(optional reuse)"]
    Build["build()"]
    Stream["RowStream"]
    NextChunk["next_chunk()"]
    Gather["Gather columns\nfor batch_size rows"]
    Evaluate["Evaluate computed\nprojections"]
    Batch["StreamChunk\n(arrays + schema)"]

    Builder --> Config
    Config --> GatherCtx
    GatherCtx --> Build
    Build --> Stream
    Stream --> NextChunk
    NextChunk --> Gather
    Gather --> Evaluate
    Evaluate --> Batch
    Batch -->|More rows| NextChunk
```

The stream uses STREAM_BATCH_ROWS (default 1024) as the chunk size for incremental result production.
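
A hedged sketch of the consumption pattern; the struct fields and the next_chunk signature here are assumptions rather than llkv-table's actual API:

```rust
const STREAM_BATCH_ROWS: usize = 1024; // default chunk size

struct StreamChunk {
    row_count: usize,
}

struct RowStream {
    remaining: usize,
    batch_size: usize,
}

impl RowStream {
    // Produce up to batch_size rows per call, None when exhausted.
    fn next_chunk(&mut self) -> Option<StreamChunk> {
        if self.remaining == 0 {
            return None;
        }
        let n = self.remaining.min(self.batch_size);
        self.remaining -= n;
        Some(StreamChunk { row_count: n })
    }
}

fn main() {
    let mut stream = RowStream { remaining: 2500, batch_size: STREAM_BATCH_ROWS };
    while let Some(chunk) = stream.next_chunk() {
        // 1024, 1024, then 452 rows: results arrive incrementally.
        println!("chunk of {} rows", chunk.row_count);
    }
}
```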

Sources: llkv-table/src/stream.rs llkv-table/src/constants.rs:1-7

Gather Context Reuse

MultiGatherContext enables amortization of setup costs across multiple scans:

  • Caches physical key lookups
  • Reuses internal buffers
  • Reduces allocations in streaming scenarios

The context is optional but improves performance for repeated scans of the same columns.
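
A minimal sketch of the reuse pattern, with hypothetical internals (only the MultiGatherContext name comes from the source):

```rust
use std::collections::HashMap;

// Cache key lookups and buffers once, then hand the context to each scan.
struct MultiGatherContext {
    physical_keys: HashMap<u32, u64>, // field id → cached physical key
    scratch: Vec<u8>,                 // reusable gather buffer
}

impl MultiGatherContext {
    fn new() -> Self {
        Self { physical_keys: HashMap::new(), scratch: Vec::new() }
    }

    // Resolve a key once per field; later scans hit the cache.
    fn physical_key(&mut self, field: u32) -> u64 {
        *self.physical_keys.entry(field).or_insert_with(|| u64::from(field) /* lookup */)
    }
}

fn scan_with_context(ctx: &mut MultiGatherContext, fields: &[u32]) {
    for &f in fields {
        let _key = ctx.physical_key(f);
        ctx.scratch.clear(); // reuse the allocation instead of reallocating
        // ... gather into ctx.scratch ...
    }
}

fn main() {
    let mut ctx = MultiGatherContext::new();
    // Repeated scans of the same columns amortize setup through ctx.
    scan_with_context(&mut ctx, &[1, 2]);
    scan_with_context(&mut ctx, &[1, 2]);
}
```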

Sources: llkv-column-map/src/store/scan.rs

Performance Characteristics

| Scan Type | Row ID Collection | Column Access | Memory Usage |
| --- | --- | --- | --- |
| Single column direct | None (streams directly) | Direct column chunks | O(chunk_size) |
| Full table streaming | Shadow column (fast) | Incremental gather | O(batch_size × columns) |
| Filtered scan | Shadow or multi-column | Full gather | O(row_count × columns) |
| Ordered scan | Shadow or multi-column | Full gather + sort | O(row_count × columns) |

The executor prioritizes fast paths that minimize memory usage and avoid full table materialization when possible.

Sources: llkv-table/src/planner/mod.rs:748-999 llkv-table/README.md:1-57