This documentation is part of the "Projects with Books" initiative at zenOSmosis.
The source code for this project is available on GitHub.
Scan Execution and Optimization
Relevant source files
- llkv-aggregate/README.md
- llkv-column-map/README.md
- llkv-csv/README.md
- llkv-expr/README.md
- llkv-join/README.md
- llkv-runtime/README.md
- llkv-storage/README.md
- llkv-table/README.md
- llkv-table/src/planner/mod.rs
- llkv-table/src/scalar_eval.rs
Purpose and Scope
This page documents the table scan execution engine in the llkv-table crate, which implements the low-level scanning and streaming of data from the column store to higher layers. It covers planning, optimization paths, predicate compilation, and expression evaluation strategies that enable efficient data retrieval.
For information about higher-level query planning and the translation of SQL plans to table operations, see TablePlanner and TableExecutor. For details on how filters are evaluated against individual rows, see Filter Evaluation.
Architecture Overview
The scan execution system is split into two primary components:
| Component | Responsibility |
|---|---|
| TablePlanner | Analyzes scan requests, builds plan graphs, compiles predicates into bytecode programs |
| TableExecutor | Executes planned scans using optimization paths, coordinates streaming results |
Scan Execution Flow
Sources: llkv-table/src/planner/mod.rs:591-637
Planning Phase
Plan Construction
The TablePlanner::plan_scan method orchestrates plan construction by:
- Validating that projections are non-empty
- Normalizing the filter predicate
- Building a plan graph for visualization and analysis
- Compiling predicates into executable programs
```mermaid
graph LR
Input["scan_stream_with_exprs\n(projections, filter, options)"]
Validate["Validate projections"]
Normalize["normalize_predicate\n(flatten And/Or,\napply De Morgan's)"]
Graph["build_plan_graph\n(TableScan → Filter →\nProject → Output)"]
Compile["ProgramCompiler\n(EvalProgram +\nDomainProgram)"]
Output["PlannedScan"]
Input --> Validate
Validate --> Normalize
Validate --> Graph
Normalize --> Compile
Compile --> Output
Graph --> Output
```
The plan graph encodes the logical operator tree for diagnostic purposes, with nodes representing TableScan, Filter, Project, and Output operators.
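The sequence can be sketched as follows; the types and helper names below are simplified stand-ins for the planner's internals, not the crate's actual signatures.
```rust
// Minimal sketch of plan_scan's steps; all names here are hypothetical
// simplifications, not the crate's real API.
struct PlanGraph(Vec<&'static str>);
struct ProgramSet; // stands in for the compiled EvalProgram + DomainProgram

struct PlannedScan {
    graph: PlanGraph,
    programs: ProgramSet,
}

fn plan_scan(projections: &[&str], filter: &str) -> Result<PlannedScan, String> {
    // 1. Validate that projections are non-empty.
    if projections.is_empty() {
        return Err("scan requires at least one projection".into());
    }
    // 2. Normalize the predicate (flatten And/Or, apply De Morgan's laws);
    //    see "Predicate Optimization" below for the actual rewrites.
    let _normalized = filter.trim();
    // 3. Build the diagnostic operator tree.
    let graph = PlanGraph(vec!["TableScan", "Filter", "Project", "Output"]);
    // 4. Compile the normalized predicate into bytecode programs.
    let programs = ProgramSet;
    Ok(PlannedScan { graph, programs })
}
```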
Sources: llkv-table/src/planner/mod.rs:612-637 llkv-table/src/planner/mod.rs:639-725
Predicate Compilation
Predicates are compiled into two bytecode programs:
| Program Type | Purpose |
|---|---|
| EvalProgram | Stack-based bytecode for evaluating filter conditions |
| DomainProgram | Tracks which row IDs satisfy the predicate during scanning |
The ProgramCompiler analyzes the normalized predicate tree and emits instructions for efficient evaluation. Predicate fusion merges multiple predicates on the same field when beneficial.
```mermaid
graph TB
Filter["Expr<FieldId>\n(normalized predicate)"]
Fusion["PredicateFusionCache\n(analyze per-field stats)"]
Compiler["ProgramCompiler::compile"]
EvalProg["EvalProgram\n(stack-based bytecode)"]
DomainProg["DomainProgram\n(row ID tracking)"]
ProgramSet["ProgramSet\n(contains both programs)"]
Filter --> Fusion
Filter --> Compiler
Fusion --> Compiler
Compiler --> EvalProg
Compiler --> DomainProg
EvalProg --> ProgramSet
DomainProg --> ProgramSet
```
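A stack-based predicate program can be pictured with the following sketch; the instruction set is invented for illustration and is far simpler than the real EvalProgram bytecode.
```rust
// Hypothetical stack-based predicate program, in the spirit of EvalProgram.
enum Inst {
    PushFieldGt(usize, i64), // push (row[field] > constant)
    And,                     // pop two booleans, push their conjunction
    Or,                      // pop two booleans, push their disjunction
}

fn eval(program: &[Inst], row: &[i64]) -> bool {
    let mut stack = Vec::new();
    for inst in program {
        match *inst {
            Inst::PushFieldGt(f, c) => stack.push(row[f] > c),
            Inst::And => {
                let (b, a) = (stack.pop().unwrap(), stack.pop().unwrap());
                stack.push(a && b);
            }
            Inst::Or => {
                let (b, a) = (stack.pop().unwrap(), stack.pop().unwrap());
                stack.push(a || b);
            }
        }
    }
    stack.pop().unwrap_or(true) // an empty program accepts every row
}

fn main() {
    // (row[0] > 10) AND (row[1] > 0)
    let prog = [Inst::PushFieldGt(0, 10), Inst::PushFieldGt(1, 0), Inst::And];
    assert!(eval(&prog, &[42, 7]));
    assert!(!eval(&prog, &[5, 7]));
}
```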
Sources: llkv-table/src/planner/mod.rs:625-629 llkv-table/src/planner/program.rs
Execution Phase and Optimization Paths
The TableExecutor::execute method attempts multiple optimization paths before falling back to the general scan:
Fast Path: Single Column Direct Scan
When the scan requests a single column with a simple predicate, the executor uses try_single_column_direct_scan to stream data directly from the column store without materializing row IDs.
Conditions for single column fast path:
- Exactly one column projection
- No computed projections
- Simple predicate structure (optional)
- Compatible data types
This path bypasses row ID collection and gather operations, streaming column chunks directly to the caller.
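The gating check can be sketched as below, assuming a hypothetical Projection type; the real predicate-shape and data-type compatibility checks are elided.
```rust
// Hypothetical stand-in for the planner's projection type.
struct Projection { is_computed: bool }

fn single_column_fast_path_applies(projections: &[Projection]) -> bool {
    // Exactly one projection, and it must be a plain column reference;
    // the real check also inspects predicate shape and data types.
    projections.len() == 1 && !projections[0].is_computed
}
```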
Sources: llkv-table/src/planner/mod.rs:1021-1031 llkv-table/src/planner/mod.rs:1157-1343
Fast Path: Full Table Scan Streaming
For queries without ordering requirements, try_stream_full_table_scan uses incremental row ID streaming to avoid buffering all row IDs in memory:
```mermaid
graph TB
Start["try_stream_full_table_scan"]
CheckOrder["Check options.order\n(must be None)"]
StreamRIDs["stream_table_row_ids\n(chunk_size batches)"]
Shadow["Try shadow column\nscan (fast)"]
Fallback["Multi-column scan\n(fallback)"]
ProcessChunk["Process chunk:\n1. Apply row_id_filter\n2. Build RowStream\n3. Emit batches"]
Batch["RecordBatch\n(via on_batch)"]
Start --> CheckOrder
CheckOrder -->|order is Some| Return["Return Fallback"]
CheckOrder -->|order is None| StreamRIDs
StreamRIDs --> Shadow
Shadow -->|Success| ProcessChunk
Shadow -->|Not found| Fallback
Fallback --> ProcessChunk
ProcessChunk --> Batch
Batch -->|Next chunk| StreamRIDs
```
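The chunked streaming loop can be sketched as follows, assuming a hypothetical chunked row-ID source; the real implementation gathers columns and emits RecordBatches per chunk.
```rust
// Sketch of incremental row-ID streaming: pull row IDs in fixed-size
// chunks instead of buffering them all in memory. The callback-based
// source is a hypothetical stand-in for stream_table_row_ids.
const CHUNK_SIZE: usize = 1024;

fn stream_full_table_scan(
    mut next_row_id_chunk: impl FnMut(usize) -> Option<Vec<u64>>,
    row_id_filter: impl Fn(u64) -> bool,
    mut on_batch: impl FnMut(&[u64]),
) {
    while let Some(chunk) = next_row_id_chunk(CHUNK_SIZE) {
        // 1. Apply the row ID filter to the chunk.
        let kept: Vec<u64> = chunk.into_iter().filter(|&id| row_id_filter(id)).collect();
        // 2./3. Real code builds a RowStream and emits a RecordBatch here.
        if !kept.is_empty() {
            on_batch(&kept);
        }
    }
}
```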
Sources: llkv-table/src/planner/mod.rs:904-999 llkv-table/src/planner/mod.rs:859-902
Row ID Collection Optimization
Row ID collection uses a two-tier strategy:
- Fast path (shadow column): Scan the dedicated row_id shadow column, which contains all row IDs for the table
- Fallback (multi-column scan): Scan user columns and deduplicate row IDs when the shadow column is unavailable
The shadow column optimization is significantly faster because:
- Single column scan instead of multiple
- No deduplication required
- Direct row ID format
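A sketch of the two-tier strategy, with hypothetical storage accessors in place of the real column store API:
```rust
use std::collections::BTreeSet;

// shadow_column: the dedicated row_id column, when present.
// user_columns: row IDs present in each user column.
fn collect_row_ids(
    shadow_column: Option<&[u64]>,
    user_columns: &[Vec<u64>],
) -> Vec<u64> {
    if let Some(ids) = shadow_column {
        // Fast path: one scan, already deduplicated, already in row-ID form.
        return ids.to_vec();
    }
    // Fallback: union row IDs across user columns and deduplicate.
    let mut set = BTreeSet::new();
    for col in user_columns {
        set.extend(col.iter().copied());
    }
    set.into_iter().collect()
}
```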
Sources: llkv-table/src/planner/mod.rs:748-857
General Scan Execution
When fast paths don't apply, the executor uses the general scan path:
- Projection analysis: Classify projections as column references or computed expressions
- Field collection: Build unique field lists and numeric field maps
- Row ID collection: Gather all relevant row IDs (using the optimizations above)
- Row ID filtering: Apply predicate programs to filter row IDs
- Gather and stream: Use RowStreamBuilder to materialize columns and emit batches
General Scan Pipeline
```mermaid
graph TB
Execute["TableExecutor::execute"]
Analyze["Analyze projections:\n- Column refs\n- Computed exprs\n- Build unique_lfids"]
CollectRIDs["table_row_ids\n(with caching)"]
FilterRIDs["Filter row IDs:\n- collect_row_ids_for_rowid_filter\n- Apply EvalProgram\n- Apply DomainProgram"]
Order["Apply ordering\n(if options.order present)"]
Gather["RowStreamBuilder:\n- prepare_gather_context\n- stream chunks\n- evaluate computed exprs"]
Emit["Emit RecordBatch"]
Execute --> Analyze
Analyze --> CollectRIDs
CollectRIDs --> FilterRIDs
FilterRIDs --> Order
Order --> Gather
Gather --> Emit
```
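The overall path selection can be sketched as follows; the helper signatures are hypothetical, but the fallback order mirrors the fast paths described above.
```rust
// Sketch of the executor's path selection. Each fast path declines by
// returning None; only then does the general scan run.
enum ScanResult { Done }

fn execute(plan: &str) -> ScanResult {
    if let Some(done) = try_single_column_direct_scan(plan) {
        return done; // single projection, simple predicate
    }
    if let Some(done) = try_stream_full_table_scan(plan) {
        return done; // no ordering requirement
    }
    general_scan(plan) // collect, filter, order, gather, stream
}

// Stubs standing in for the real planner methods.
fn try_single_column_direct_scan(_plan: &str) -> Option<ScanResult> { None }
fn try_stream_full_table_scan(_plan: &str) -> Option<ScanResult> { None }
fn general_scan(_plan: &str) -> ScanResult { ScanResult::Done }
```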
Sources: llkv-table/src/planner/mod.rs:1009-1343 llkv-table/src/planner/mod.rs:1345-1710
Predicate Optimization
Normalization
The normalize_predicate function applies logical transformations to simplify filter expressions:
- De Morgan's laws: NOT (a AND b) → (NOT a) OR (NOT b)
- Flatten nested operators: AND[AND[a,b],c] → AND[a,b,c]
- Constant folding: AND[true, x] → x
- Double negation elimination: NOT (NOT x) → x
These transformations expose optimization opportunities and simplify compilation.
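A minimal sketch of these rewrites over a hypothetical expression enum (the crate's real Expr<FieldId> is richer):
```rust
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Lit(bool),
    Pred(u32), // leaf predicate on a field, details elided
    And(Vec<Expr>),
    Or(Vec<Expr>),
    Not(Box<Expr>),
}

fn normalize(e: Expr) -> Expr {
    match e {
        Expr::Not(inner) => match normalize(*inner) {
            Expr::Not(x) => *x, // NOT (NOT x) -> x
            // De Morgan's laws: push NOT below And/Or.
            Expr::And(xs) => Expr::Or(
                xs.into_iter().map(|x| normalize(Expr::Not(Box::new(x)))).collect()),
            Expr::Or(xs) => Expr::And(
                xs.into_iter().map(|x| normalize(Expr::Not(Box::new(x)))).collect()),
            Expr::Lit(b) => Expr::Lit(!b),
            other => Expr::Not(Box::new(other)),
        },
        Expr::And(xs) => {
            let mut flat = Vec::new();
            for x in xs.into_iter().map(normalize) {
                match x {
                    Expr::And(inner) => flat.extend(inner), // AND[AND[a,b],c] -> AND[a,b,c]
                    Expr::Lit(true) => {}                   // AND[true, x] -> x
                    other => flat.push(other),
                }
            }
            if flat.len() == 1 { flat.pop().unwrap() } else { Expr::And(flat) }
        }
        // Or is flattened symmetrically; elided for brevity.
        other => other,
    }
}

fn main() {
    // NOT (a AND b) -> (NOT a) OR (NOT b)
    let e = Expr::Not(Box::new(Expr::And(vec![Expr::Pred(1), Expr::Pred(2)])));
    assert_eq!(
        normalize(e),
        Expr::Or(vec![
            Expr::Not(Box::new(Expr::Pred(1))),
            Expr::Not(Box::new(Expr::Pred(2))),
        ])
    );
}
```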
Sources: llkv-table/src/planner/program.rs
Predicate Fusion
The PredicateFusionCache analyzes predicates to determine when multiple conditions on the same field should be fused:
| Data Type | Fusion Criteria |
|---|---|
| String types | At least one contains predicate and at least two total predicates on the field |
| Other types | At least two total predicates on the field |
```mermaid
graph TB
Expr["Filter Expression"]
Cache["PredicateFusionCache"]
Traverse["Traverse expression tree"]
Stats["Per-field stats:\n- total predicates\n- contains predicates"]
Decision["should_fuse(field, dtype)"]
Fuse["Fuse predicates into\nsingle evaluation"]
Separate["Keep predicates separate"]
Expr --> Cache
Cache --> Traverse
Traverse --> Stats
Stats --> Decision
Decision -->|Meets criteria| Fuse
Decision -->|Below threshold| Separate
```
Fusion enables single-pass evaluation rather than multiple column scans for the same field.
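The decision rule from the table can be sketched directly; the stats structure below is a simplified stand-in for the cache's per-field bookkeeping.
```rust
// Per-field predicate statistics, simplified for illustration.
struct FieldStats {
    total: usize,    // total predicates on this field
    contains: usize, // contains-style predicates on this field
}

fn should_fuse(stats: &FieldStats, is_string_type: bool) -> bool {
    if is_string_type {
        // String columns fuse only when a contains predicate is present.
        stats.contains >= 1 && stats.total >= 2
    } else {
        stats.total >= 2
    }
}
```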
Sources: llkv-table/src/planner/mod.rs:517-570
Expression Evaluation
Numeric Kernels
The NumericKernels system in llkv-table/src/scalar_eval.rs provides vectorized evaluation for scalar expressions:
| Kernel Operation | Description |
|---|---|
| collect_fields | Extract all field references from an expression |
| prepare_numeric_arrays | Cast columns to a unified numeric representation |
| evaluate_value | Row-by-row scalar evaluation |
| evaluate_batch | Vectorized batch evaluation |
| simplify | Detect affine expressions for optimization |
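The unified-representation step can be pictured with a sketch; the real kernels operate on Arrow arrays rather than plain vectors, so the column type here is illustrative.
```rust
// Hypothetical column representation; the real code uses Arrow arrays.
enum Column {
    Int64(Vec<i64>),
    Float64(Vec<f64>),
}

// Cast any supported column to f64 so a single kernel can evaluate the
// expression, in the spirit of prepare_numeric_arrays.
fn to_f64(col: &Column) -> Vec<f64> {
    match col {
        Column::Int64(v) => v.iter().map(|&x| x as f64).collect(),
        Column::Float64(v) => v.clone(),
    }
}
```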
Numeric Type Hierarchy
Sources: llkv-table/src/scalar_eval.rs:1-90 llkv-table/src/scalar_eval.rs:451-712
Vectorized Evaluation
When possible, expressions are evaluated using vectorized paths:
- Column access: Direct array reference (zero-copy)
- Literals: Broadcast scalar to array length
- Binary operations: Arrow compute kernels for array-array or array-scalar operations
- Affine expressions: Specialized scale * field + offset fast path
The try_evaluate_vectorized method attempts vectorization before falling back to row-by-row evaluation.
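The contrast between the two paths can be sketched as follows; the real vectorized path dispatches to Arrow compute kernels rather than iterating a Vec, so these functions are illustrative only.
```rust
// Vectorized shape: the whole column in one array-scalar operation.
fn evaluate_vectorized(col: &[f64], scalar: f64) -> Vec<f64> {
    col.iter().map(|v| v + scalar).collect()
}

// Fallback shape: a dynamic evaluator invoked once per row.
fn evaluate_row_by_row(col: &[f64], eval_one: &dyn Fn(f64) -> f64) -> Vec<f64> {
    col.iter().map(|&v| eval_one(v)).collect()
}

fn main() {
    let col = [1.0, 2.0, 3.0];
    assert_eq!(
        evaluate_vectorized(&col, 2.0),
        evaluate_row_by_row(&col, &|v: f64| v + 2.0)
    );
}
```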
Sources: llkv-table/src/scalar_eval.rs:714-762
Affine Expression Optimization
Expressions matching the pattern scale * field + offset are detected and optimized:
```mermaid
graph LR
Expr["ScalarExpr"]
Detect["NumericKernels::simplify"]
Check["is_affine_column_expr"]
Affine["AffineExpr:\nfield, scale, offset"]
Direct["Direct column reference"]
Complex["Complex expression"]
Expr --> Detect
Detect --> Check
Check -->|Matches pattern| Affine
Check -->|Single column| Direct
Check -->|Other| Complex
```
Affine expressions enable:
- Single column scan with arithmetic applied
- Reduced memory allocation
- Better cache locality
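A sketch of the fast path, using the field/scale/offset shape described above (the struct itself is illustrative; the field reference is resolved to the source column before this step):
```rust
// Illustrative affine expression: scale * field + offset.
struct AffineExpr { scale: f64, offset: f64 }

// A detected affine expression evaluates in a single pass over the
// source column, with one output allocation.
fn eval_affine(expr: &AffineExpr, column: &[f64]) -> Vec<f64> {
    column.iter().map(|v| expr.scale * v + expr.offset).collect()
}
```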
Sources: llkv-table/src/scalar_eval.rs:1038-1174 llkv-table/src/planner/mod.rs:1711-1872
Streaming Architecture
Row Stream Builder
The RowStreamBuilder constructs streaming result iterators with configurable batch sizes:
```mermaid
graph TB
Builder["RowStreamBuilder::new"]
Config["Configuration:\n- store\n- table_id\n- schema\n- unique_lfids\n- projection_evals\n- row_ids\n- batch_size"]
GatherCtx["prepare_gather_context\n(optional reuse)"]
Build["build()"]
Stream["RowStream"]
NextChunk["next_chunk()"]
Gather["Gather columns\nfor batch_size rows"]
Evaluate["Evaluate computed\nprojections"]
Batch["StreamChunk\n(arrays + schema)"]
Builder --> Config
Config --> GatherCtx
GatherCtx --> Build
Build --> Stream
Stream --> NextChunk
NextChunk --> Gather
Gather --> Evaluate
Evaluate --> Batch
Batch -->|More rows| NextChunk
```
The stream uses STREAM_BATCH_ROWS (default 1024) as the chunk size for incremental result production.
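A sketch of the chunked emission loop using the documented default; the gather and computed-projection steps are elided.
```rust
// Default chunk size for incremental result production.
const STREAM_BATCH_ROWS: usize = 1024;

fn stream_in_batches(row_ids: &[u64], mut on_batch: impl FnMut(&[u64])) {
    for chunk in row_ids.chunks(STREAM_BATCH_ROWS) {
        // Real code gathers columns for the chunk, evaluates computed
        // projections, and builds a StreamChunk here.
        on_batch(chunk);
    }
}
```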
Sources: llkv-table/src/stream.rs llkv-table/src/constants.rs:1-7
Gather Context Reuse
MultiGatherContext enables amortization of setup costs across multiple scans:
- Caches physical key lookups
- Reuses internal buffers
- Reduces allocations in streaming scenarios
The context is optional but improves performance for repeated scans of the same columns.
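A sketch of the reuse pattern, with a hypothetical context structure standing in for MultiGatherContext:
```rust
use std::collections::HashMap;

// Hypothetical gather context: caches key lookups and reuses buffers
// across scans, in the spirit of MultiGatherContext.
#[derive(Default)]
struct GatherContext {
    key_cache: HashMap<u32, u64>, // cached physical key lookups
    buffer: Vec<u64>,             // reusable scratch buffer
}

fn scan_with_context(ctx: &mut GatherContext, fields: &[u32]) {
    ctx.buffer.clear(); // reuse the allocation instead of reallocating
    for &f in fields {
        // The key computation here is a placeholder for the real lookup.
        let key = *ctx.key_cache.entry(f).or_insert_with(|| u64::from(f) + 1_000);
        ctx.buffer.push(key);
    }
    // ... gather columns using ctx.buffer ...
}
```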
Sources: llkv-column-map/src/store/scan.rs
Performance Characteristics
| Scan Type | Row ID Collection | Column Access | Memory Usage |
|---|---|---|---|
| Single column direct | None (streams directly) | Direct column chunks | O(chunk_size) |
| Full table streaming | Shadow column (fast) | Incremental gather | O(batch_size × columns) |
| Filtered scan | Shadow or multi-column | Full gather | O(row_count × columns) |
| Ordered scan | Shadow or multi-column | Full gather + sort | O(row_count × columns) |
The executor prioritizes fast paths that minimize memory usage and avoid full table materialization when possible.
Sources: llkv-table/src/planner/mod.rs:748-999 llkv-table/README.md:1-57