Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

GitHub

This documentation is part of the "Projects with Books" initiative at zenOSmosis.

The source code for this project is available on GitHub.

Query Execution

Relevant source files

Purpose and Scope

Query execution is the process of converting logical query plans into physical result sets by coordinating table scans, expression evaluation, aggregation, joins, and result streaming. This page documents the execution engine's architecture, core components, and high-level execution flow.

For details on table-level planning and execution, see TablePlanner and TableExecutor. For scan optimization strategies, see Scan Execution and Optimization. For predicate evaluation mechanics, see Filter Evaluation.


System Architecture

Query execution spans two primary crates:

CrateResponsibilityKey Types
llkv-executorOrchestrates multi-table queries, aggregates, and result formattingQueryExecutor, SelectExecution
llkv-tableExecutes table scans with predicates and projectionsTablePlanner, TableExecutor

The executor operates on logical plans produced by llkv-plan and delegates to llkv-table for single-table operations, llkv-join for join algorithms, and llkv-aggregate for aggregate computations.

Execution Architecture

graph TB
    subgraph "Plan Layer"
        PLAN["SelectPlan\n(llkv-plan)"]
end
    
    subgraph "Execution Orchestration (llkv-executor)"
        QE["QueryExecutor<P>"]
EXEC["SelectExecution<P>"]
STRAT["Strategy Selection:\nprojection, aggregate,\njoin, compound"]
end
    
    subgraph "Table Execution (llkv-table)"
        TP["TablePlanner"]
TE["TableExecutor"]
PS["PlannedScan"]
end
    
    subgraph "Specialized Operations"
        AGG["llkv-aggregate\nAccumulator"]
JOIN["llkv-join\nhash_join, cross_join"]
EVAL["NumericKernels\nscalar evaluation"]
end
    
    subgraph "Storage"
        TABLE["Table<P>"]
STORE["ColumnStore"]
end
    
 
   PLAN --> QE
 
   QE --> STRAT
 
   STRAT -->|single table| TP
 
   STRAT -->|aggregates| AGG
 
   STRAT -->|joins| JOIN
    
 
   TP --> PS
 
   PS --> TE
 
   TE --> TABLE
 
   TABLE --> STORE
    
 
   STRAT --> EVAL
 
   EVAL --> TABLE
    
 
   QE --> EXEC
 
   TE --> EXEC
 
   AGG --> EXEC
 
   JOIN --> EXEC

Sources: llkv-executor/src/lib.rs:507-521 llkv-table/src/planner/mod.rs:580-726


Core Components

QueryExecutor

QueryExecutor<P> is the top-level execution coordinator in llkv-executor. It consumes SelectPlan structures and produces SelectExecution result containers.

Key Responsibilities:

  • Strategy selection based on plan characteristics (single table, joins, aggregates, compound operations)
  • Multi-table query orchestration (cross products, hash joins)
  • Aggregate computation coordination
  • Subquery evaluation (correlated EXISTS, scalar subqueries)
  • Result streaming and batching

Entry points:

Sources: llkv-executor/src/lib.rs:507-521

SelectExecution

SelectExecution<P> encapsulates query results and provides streaming access via batched iteration. Results may be materialized upfront or generated lazily depending on the execution strategy.

Streaming Interface:

  • stream<F>(on_batch: F) - Process results batch-by-batch
  • into_rows() - Materialize all rows into memory (for sorting, deduplication)
  • schema() - Access result schema

Sources: llkv-executor/src/lib.rs:2500-2700 (approximate location based on file structure)

TablePlanner and TableExecutor

The table-level execution layer handles single-table scans with predicates and projections. TablePlanner analyzes the request and produces a PlannedScan, which TableExecutor then executes.

Planning Process:

  1. Validate projections against schema
  2. Normalize filter predicates (apply De Morgan's laws, flatten boolean operators)
  3. Compile predicates into EvalProgram and DomainProgram bytecode
  4. Build PlanGraph metadata for tracing

Execution Process:

  1. Try optimized fast paths (single column scans, full table scans)
  2. Fall back to general execution with expression evaluation
  3. Stream results in batches

These components are detailed in TablePlanner and TableExecutor.

Sources: llkv-table/src/planner/mod.rs:580-637 llkv-table/src/planner/mod.rs:728-1007


Execution Flow

Top-Level SELECT Execution Sequence

Sources: llkv-executor/src/lib.rs:523-569 llkv-table/src/planner/mod.rs:595-607 llkv-table/src/planner/mod.rs:1009-1400


graph TD
 
   START["SelectPlan"] --> COMPOUND{compound?}
COMPOUND -->|yes| EXEC_COMPOUND["execute_compound_select\nUNION/EXCEPT/INTERSECT"]
COMPOUND -->|no| FROM{tables.is_empty?}
FROM -->|yes| EXEC_CONST["execute_select_without_table\nEvaluate constant expressions"]
FROM -->|no| MULTI{tables.len > 1?}
MULTI -->|yes| EXEC_CROSS["execute_cross_product\nor hash_join optimization"]
MULTI -->|no| GROUPBY{group_by.is_empty?}
GROUPBY -->|no| EXEC_GROUP["execute_group_by_single_table\nGroup rows + compute aggregates"]
GROUPBY -->|yes| AGG{aggregates.is_empty?}
AGG -->|no| EXEC_AGG["execute_aggregates\nCollect all rows + compute"]
AGG -->|yes| COMPUTED{has_computed_aggregates?}
COMPUTED -->|yes| EXEC_COMP_AGG["execute_computed_aggregates\nEmbedded agg in expressions"]
COMPUTED -->|no| EXEC_PROJ["execute_projection\nStream scan with projections"]
EXEC_COMPOUND --> RESULT["SelectExecution"]
EXEC_CONST --> RESULT
 
   EXEC_CROSS --> RESULT
 
   EXEC_GROUP --> RESULT
 
   EXEC_AGG --> RESULT
 
   EXEC_COMP_AGG --> RESULT
 
   EXEC_PROJ --> RESULT

Execution Strategies

QueryExecutor selects an execution strategy based on plan characteristics:

Strategy Decision Tree

Sources: llkv-executor/src/lib.rs:527-569

Strategy Implementations

StrategyMethodWhen AppliedKey Operations
Constant Evaluationexecute_select_without_tableNo FROM clauseEvaluate literals, struct constructors
Simple Projectionexecute_projectionSingle table, no aggregatesStream scan with filter + projections
Aggregationexecute_aggregatesHas aggregates, no GROUP BYCollect all rows, compute aggregates, emit single row
Grouped Aggregationexecute_group_by_single_tableHas GROUP BYHash rows by key, compute per-group aggregates
Computed Aggregatesexecute_computed_aggregatesAggregates embedded in computed projectionsExtract aggregate expressions, evaluate separately
Cross Productexecute_cross_productMultiple tablesCartesian product or hash join optimization
Compoundexecute_compound_selectUNION/EXCEPT/INTERSECTExecute components, apply set operations

Sources: llkv-executor/src/lib.rs:926-975 llkv-executor/src/lib.rs:1700-2100 llkv-executor/src/lib.rs:2200-2400 llkv-executor/src/lib.rs:1057-1400 llkv-executor/src/lib.rs:590-701


Streaming Execution Model

LLKV executes queries in a streaming fashion to avoid materializing large intermediate results. Results flow through the system as RecordBatch chunks (typically 4096 rows).

Streaming Characteristics:

Execution TypeStreaming BehaviorMemory Characteristics
ProjectionFull streamingO(batch_size) memory
FilterFull streamingO(batch_size) memory
AggregatesRequires full materializationO(input_rows) memory
GROUP BYRequires full materializationO(group_count) memory
ORDER BYRequires full materializationO(input_rows) memory
DISTINCTRequires full materializationO(distinct_rows) memory
LIMITEarly terminationO(limit × batch_size) memory

Streaming Projection Example Flow:

Sources: llkv-table/src/planner/mod.rs:1009-1400 llkv-table/src/constants.rs:1-10 (defines STREAM_BATCH_ROWS = 4096)

Materialization Points

Certain operations require collecting all rows before producing output:

  1. Sorting - Must see all rows to determine order llkv-executor/src/lib.rs:2800-2900
  2. Deduplication (DISTINCT) - Must track all seen rows llkv-executor/src/lib.rs:2950-3050
  3. Aggregation - Must accumulate state across all rows llkv-executor/src/lib.rs:1700-1900
  4. Set Operations - Must materialize both sides for comparison llkv-executor/src/lib.rs:590-701

These operations call into_rows() on SelectExecution to materialize results as Vec<CanonicalRow>.

Sources: llkv-executor/src/lib.rs:2600-2700


Expression Evaluation

Query execution evaluates two types of expressions:

Predicate Evaluation (Filtering)

Predicates are compiled to bytecode and evaluated during table scans:

  1. Normalization - Apply De Morgan's laws, flatten AND/OR llkv-table/src/planner/program.rs:50-150
  2. Compilation - Convert to EvalProgram (stack-based) and DomainProgram (row tracking) llkv-table/src/planner/program.rs:200-400
  3. Vectorized Evaluation - Process chunks of rows efficiently llkv-table/src/planner/mod.rs:1100-1300

See Filter Evaluation for detailed mechanics.

Scalar Expression Evaluation (Projections)

Computed projections are evaluated row-by-row or vectorized when possible:

  1. Translation - Convert ScalarExpr<String> to ScalarExpr<FieldId> llkv-executor/src/translation/scalar.rs:1-200
  2. Type Inference - Determine output data type llkv-executor/src/translation/schema.rs:50-150
  3. Evaluation - Use NumericKernels for numeric operations llkv-table/src/scalar_eval.rs:450-685

Vectorized vs Row-by-Row:

Sources: llkv-table/src/scalar_eval.rs:675-712 llkv-table/src/scalar_eval.rs:549-673


Integration with Runtime

The execution layer coordinates with llkv-runtime for transaction and catalog management:

Runtime Integration Points:

OperationRuntime ResponsibilityExecutor Responsibility
Table LookupCatalogManager::table()ExecutorTableProvider::get_table()
MVCC FilteringProvide RowIdFilter with snapshotApply filter during scan
Transaction StateTrack transaction ID, commit watermarkInclude created_by/deleted_by in scans
Schema ResolutionMaintain system catalogTranslate column names to FieldId

The ExecutorTableProvider trait abstracts runtime integration, allowing executor to remain runtime-agnostic.

Sources: llkv-executor/src/types.rs:100-200 llkv-runtime/src/catalog/mod.rs:50-150


Performance Characteristics

Execution performance depends on query characteristics and chosen strategy:

Query PatternTypical PerformanceOptimization Opportunities
SELECT * FROM t~1M rows/secFast path: shadow column scan llkv-table/src/planner/mod.rs:765-821
SELECT col FROM t WHERE pred~500K rows/secPredicate fusion llkv-table/src/planner/mod.rs:518-570
Single-table aggregatesFull table scanColumn-only projections for aggregate inputs
Hash join (2 tables)O(n + m) with O(n) memorySmaller table as build side llkv-executor/src/lib.rs:1500-1700
Cross product (n tables)O(∏ row_counts)Avoid if possible; rewrite to joins

Sources: llkv-table/src/planner/mod.rs:738-856 llkv-executor/src/lib.rs:1082-1400