This documentation is part of the "Projects with Books" initiative at zenOSmosis.
The source code for this project is available on GitHub.
TablePlanner and TableExecutor
Relevant source files
- llkv-csv/src/writer.rs
- llkv-table/examples/direct_comparison.rs
- llkv-table/examples/performance_benchmark.rs
- llkv-table/examples/test_streaming.rs
- llkv-table/src/lib.rs
- llkv-table/src/table.rs
Purpose and Scope
This page documents the table-level query execution interface provided by the Table struct and its optimization strategies. The table layer acts as a bridge between high-level query plans (covered in Query Planning) and low-level columnar storage operations (covered in Column Storage and ColumnStore). It orchestrates scan execution, manages field ID namespacing, handles MVCC visibility, and selects execution strategies based on query characteristics.
For details about the lower-level scan execution mechanics and streaming strategies, see Scan Execution and Optimization. For filter expression evaluation, see Filter Evaluation.
Architecture Overview
The Table struct serves as the primary interface for executing queries against table data. Rather than implementing query execution logic directly, it orchestrates lower-level components and applies table-specific concerns like schema management, MVCC column injection, and field ID translation.
```mermaid
graph TB
    subgraph "Table Layer"
        API["Table API\nscan_stream, filter_row_ids"]
        SCHEMA["Schema Management\nField ID → LogicalFieldId"]
        MVCC["MVCC Integration\ncreated_by, deleted_by"]
        OPTIONS["Execution Options\nScanStreamOptions"]
    end
    subgraph "Scan Execution Layer"
        EXECUTE["execute_scan()\nllkv-scan"]
        STREAM["RowStreamBuilder\nBatch streaming"]
        FILTER["Filter Evaluation\nPredicate matching"]
    end
    subgraph "Storage Layer"
        STORE["ColumnStore\nPhysical storage"]
        GATHER["Column Gathering\nRecordBatch assembly"]
    end
    API --> SCHEMA
    API --> MVCC
    API --> OPTIONS
    SCHEMA --> EXECUTE
    MVCC --> EXECUTE
    OPTIONS --> EXECUTE
    EXECUTE --> STREAM
    STREAM --> FILTER
    FILTER --> GATHER
    GATHER --> STORE
    STORE -.results.-> GATHER
    GATHER -.batches.-> STREAM
    STREAM -.batches.-> API
```
Table Layer Responsibilities
Sources: llkv-table/src/table.rs:60-69 llkv-table/src/table.rs:447-488
Table Struct and Core Methods
The `Table<P>` struct wraps a `ColumnStore` with table-specific metadata and caching:

| Component | Type | Purpose |
|---|---|---|
| `store` | `Arc<ColumnStore<P>>` | Physical columnar storage |
| `table_id` | `TableId` | Unique table identifier |
| `mvcc_cache` | `RwLock<Option<MvccColumnCache>>` | Cached MVCC column presence |
Primary execution methods:
- `scan_stream`: Main scan interface accepting flexible projection types
- `scan_stream_with_exprs`: Direct execution with resolved `ScanProjection` expressions
- `filter_row_ids`: Collect row IDs matching a filter predicate
- `stream_columns`: Stream raw column data without expression evaluation
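In call-shape terms, a minimal sketch (argument types are inferred from the descriptions above; see llkv-table/src/table.rs for the exact signatures):

```rust
// Stream projected batches to a callback (shape assumed, not verbatim):
table.scan_stream(projections, &filter_expr, options, |batch| {
    println!("batch with {} rows", batch.num_rows());
})?;

// Or collect just the row IDs matching a predicate:
let matching_ids = table.filter_row_ids(&filter_expr)?;
```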
Sources: llkv-table/src/table.rs:60-69 llkv-table/src/table.rs:447-462 llkv-table/src/table.rs:469-488 llkv-table/src/table.rs:490-496 llkv-table/src/table.rs:589-645
Scan Execution Flow
High-Level Execution Pipeline
Sources: llkv-table/src/table.rs:447-488 llkv-table/examples/performance_benchmark.rs:1-211
Scan Configuration: ScanStreamOptions
`ScanStreamOptions` controls execution behavior:

| Field | Type | Purpose |
|---|---|---|
| `include_nulls` | `bool` | Include rows where all projected columns are null |
| `order` | `Option<ScanOrderSpec>` | Ordering specification (ASC/DESC) |
| `row_id_filter` | `Option<Arc<dyn RowIdFilter>>` | Pre-filtered row ID set (e.g., MVCC visibility) |
| `include_row_ids` | `bool` | Include `row_id` column in output |
| `ranges` | `Option<Vec<(Bound<u64>, Bound<u64>)>>` | Row ID range restrictions |
| `driving_column` | `Option<LogicalFieldId>` | Column to drive scan ordering |
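Filled in for a plain forward scan, a sketch using exactly the fields above (whether the struct offers a `Default` impl is not covered here, so every field is spelled out):

```rust
let options = ScanStreamOptions {
    include_nulls: false,   // drop rows whose projected columns are all null
    order: None,            // no ordering specification
    row_id_filter: None,    // no pre-filtered row ID set
    include_row_ids: false, // leave the row_id column out of the output
    ranges: None,           // no row ID range restriction
    driving_column: None,   // let the scan choose its driving column
};
```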
Sources: llkv-table/src/table.rs:43-46
Projection Types: ScanProjection
Projections specify what data to retrieve:
- `Column(ColumnProjectionInfo)`: Direct column access with an optional alias
  - `logical_field_id`: Column to read
  - `data_type`: Expected Arrow data type
  - `output_name`: Column name in the result
- `Expression(ProjectionEval)`: Computed expressions over columns
  - Arithmetic operations, functions, literals
  - Evaluated during result assembly

Conversion: `Projection` (from `llkv-column-map`) can be converted to `ScanProjection` via the `Into` trait, as sketched below.
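For the conversion, a one-line sketch (only the `Into` impl is documented above; `Projection::new` stands in for whatever constructor the crate actually exposes):

```rust
// Hypothetical constructor; the Into conversion itself comes from the text above.
let scan_proj: ScanProjection =
    Projection::new(LogicalFieldId::for_user(table_id, field_id)).into();
```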
Sources: llkv-table/src/table.rs:40-46
Optimization Strategies
Strategy Selection Logic
Sources: llkv-table/examples/performance_benchmark.rs:26-79 llkv-table/examples/test_streaming.rs:26-174
Fast Path: Direct Streaming
When the fast-path conditions are met, scan execution bypasses row ID collection and materializes chunks directly from `ScanBuilder`.
Eligibility criteria:
- Exactly one column projection
- Unbounded range filter (`Bound::Unbounded` on both ends)
- No null inclusion (`include_nulls = false`)
- No ordering requirements (`order = None`)
- No row ID filter (`row_id_filter = None`)
Performance characteristics:
- 10-100x faster than standard path for large scans
- Zero-copy chunk streaming when possible
- Minimal memory overhead (streaming chunks, not full materialization)
Example usage:
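A minimal sketch, assuming `scan_stream` takes projections, a filter expression, options, and a per-batch callback (the real exercise of this path is in llkv-table/examples/test_streaming.rs):

```rust
// One projection, unbounded filter, no nulls/order/MVCC filter: all five
// eligibility criteria above hold, so chunks stream directly from ScanBuilder.
let mut total_rows = 0usize;
table.scan_stream(
    vec![projection],   // exactly one column projection
    &unbounded_filter,  // hypothetical filter with Bound::Unbounded on both ends
    options,            // include_nulls = false, order = None, row_id_filter = None
    |batch| total_rows += batch.num_rows(),
)?;
```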
Sources: llkv-table/examples/test_streaming.rs:65-110 llkv-table/examples/performance_benchmark.rs:145-151
Standard Path: Row ID Collection and Gathering
When fast path conditions are not met, execution uses a two-phase approach:
Phase 1: Row ID Collection
- Evaluate the filter expression against the table
- Build a bitmap (`Treemap`) of matching row IDs
- Filter by MVCC visibility if `row_id_filter` is provided
- Apply range restrictions if `ranges` is specified
Phase 2: Column Gathering
- Split row IDs into fixed-size windows
- For each window:
  - Gather all projected columns
  - Evaluate computed expressions
  - Assemble into a `RecordBatch`
  - Stream the batch to the callback
Batch size: Controlled by the `STREAM_BATCH_ROWS` constant (the default varies by workload).
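A self-contained model of the windowed second phase (not the crate's code; the window size here is an assumption):

```rust
// Phase 1 produced `row_ids`; phase 2 walks it in fixed-size windows,
// gathering columns and emitting one RecordBatch per window.
const STREAM_BATCH_ROWS: usize = 4096; // assumed; the real default varies

fn stream_windows(row_ids: &[u64], mut emit_batch: impl FnMut(&[u64])) {
    for window in row_ids.chunks(STREAM_BATCH_ROWS) {
        // Gather projected columns for `window`, evaluate expressions,
        // assemble a RecordBatch, then hand it to the callback.
        emit_batch(window);
    }
}

fn main() {
    let ids: Vec<u64> = (0..10_000).collect();
    let mut batches = 0;
    stream_windows(&ids, |_| batches += 1);
    assert_eq!(batches, 3); // windows of 4096, 4096, and 1808 rows
}
```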
Sources: llkv-table/src/table.rs:490-496 llkv-table/src/table.rs:589-645
Performance Comparison: Layer Overhead
ColumnStore Direct Access vs Table Layer
Measured overhead (1M-row single-column scan):
- Direct `ColumnStore`: ~5-10 ms baseline
- Table layer (fast path): ~10-20 ms (1.5-2x overhead)
- Table layer (standard path): ~50-100 ms (5-10x overhead)
The fast path optimization significantly reduces the gap by bypassing row ID collection and expression evaluation when not needed.
Sources: llkv-table/examples/direct_comparison.rs:276-399
Row ID Filtering and Collection
RowIdScanCollector
Internal visitor for collecting row IDs that match filter predicates.
Implementation strategy:
- Implements `PrimitiveWithRowIdsVisitor` and `PrimitiveSortedWithRowIdsVisitor`
- Accumulates row IDs into a `Treemap` (compressed bitmap)
- Ignores actual column values (focus on row IDs only)
- Handles both chunked and sorted-run formats

Key methods:
- `extend_from_array`: Add row IDs from a chunk
- `extend_from_slice`: Add a slice of row IDs
- `into_inner`: Extract the final bitmap
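In spirit, a self-contained model (the real type implements the visitor traits above and stores a compressed `Treemap` rather than a `Vec`):

```rust
/// Model of the collector: visitor callbacks hand over row-ID chunks;
/// only the IDs are retained, the column values are ignored.
#[derive(Default)]
struct RowIdCollectorModel {
    ids: Vec<u64>, // stands in for the compressed Treemap bitmap
}

impl RowIdCollectorModel {
    fn extend_from_slice(&mut self, row_ids: &[u64]) {
        self.ids.extend_from_slice(row_ids);
    }
    fn into_inner(self) -> Vec<u64> {
        self.ids
    }
}
```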
Sources: llkv-table/src/table.rs:805-858
RowIdChunkEmitter
Streaming alternative that emits row IDs in fixed-size chunks without materializing the full set.
Use case: Memory-efficient processing when the full row ID set is not needed.
Features:
- Configurable chunk size
- Optional reverse ordering for sorted runs
- Error propagation via an `error` field
- Zero allocation when chunks align with storage chunks

Methods:
- `extend_from_array`: Process a chunk of row IDs
- `extend_sorted_run`: Handle a sorted run with optional reversal
- `flush`: Emit the current buffer
- `finish`: Final flush and error check
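A self-contained model of the buffering behavior (the real type also handles sorted-run reversal and defers errors in its `error` field):

```rust
/// Model of the emitter: buffer row IDs and hand them to a callback in
/// fixed-size chunks instead of materializing the whole set.
struct RowIdChunkEmitterModel<F: FnMut(&[u64])> {
    buf: Vec<u64>,
    chunk_size: usize,
    emit: F,
}

impl<F: FnMut(&[u64])> RowIdChunkEmitterModel<F> {
    fn extend_from_array(&mut self, ids: &[u64]) {
        for &id in ids {
            self.buf.push(id);
            if self.buf.len() == self.chunk_size {
                self.flush();
            }
        }
    }
    fn flush(&mut self) {
        if !self.buf.is_empty() {
            (self.emit)(&self.buf);
            self.buf.clear();
        }
    }
    fn finish(mut self) {
        self.flush(); // emit the final partial chunk
    }
}
```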
Sources: llkv-table/src/table.rs:860-1017
Integration with MVCC and Transactions
The table layer handles MVCC visibility transparently:
MVCC Column Injection
When appending data, the `Table::append` method automatically injects MVCC columns if they are not present:
Columns added:
- `created_by` (UInt64): Transaction ID that created the row (defaults to 1 for auto-commit)
- `deleted_by` (UInt64): Transaction ID that deleted the row (defaults to 0, meaning not deleted)

Field ID assignment:
- MVCC columns get reserved logical field IDs via `LogicalFieldId::for_mvcc_created_by` and `LogicalFieldId::for_mvcc_deleted_by`
- These are separate from user field IDs to avoid collisions

Caching: The `MvccColumnCache` stores whether the MVCC columns exist, to avoid repeated schema inspections.
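A self-contained model of the injection step, with the defaults from the bullets above (the real code builds Arrow UInt64 arrays inside `Table::append`):

```rust
// Append-side injection: add created_by/deleted_by columns when absent.
fn inject_mvcc_columns(columns: &mut Vec<(String, Vec<u64>)>, num_rows: usize) {
    let has = |cols: &[(String, Vec<u64>)], name: &str| {
        cols.iter().any(|(n, _)| n.as_str() == name)
    };
    if !has(columns, "created_by") {
        // Defaults to transaction ID 1 (auto-commit).
        columns.push(("created_by".into(), vec![1; num_rows]));
    }
    if !has(columns, "deleted_by") {
        // Defaults to 0, meaning the row has not been deleted.
        columns.push(("deleted_by".into(), vec![0; num_rows]));
    }
}
```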
Sources: llkv-table/src/table.rs:50-56 llkv-table/src/table.rs:231-438
Transaction Visibility Filtering
Query execution can filter by transaction visibility using the `row_id_filter` option.
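A sketch of wiring a visibility filter into the scan options (the field comes from the `ScanStreamOptions` table earlier on this page; the filter value itself is a hypothetical `RowIdFilter` implementation):

```rust
use std::sync::Arc;

// `txn_visibility` is a hypothetical RowIdFilter implementation that keeps
// rows whose created_by transaction is visible and deleted_by is unset.
let options = ScanStreamOptions {
    row_id_filter: Some(Arc::new(txn_visibility)),
    ..base_options // remaining fields as in the earlier all-None sketch
};
```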
The filter is applied during the row ID collection phase, ensuring only visible rows are included in the results.
Sources: llkv-table/src/table.rs:43-46
Field ID Translation and Namespacing
The table layer translates user-visible field IDs to logical field IDs that include the table ID:
Translation Process
```mermaid
graph LR
    USER["User Field ID\n(FieldId: u32)"]
    META["Field Metadata\n'field_id' key"]
    TABLE["Table ID\n(TableId: u32)"]
    LOGICAL["LogicalFieldId\n(u64)"]
    USER --> COMPOSE
    TABLE --> COMPOSE
    COMPOSE["LogicalFieldId::for_user()"] --> LOGICAL
    META -.annotates.-> USER
```
Why namespacing?
- Multiple tables can have the same user field IDs
- `ColumnStore` operates on `LogicalFieldId` to avoid collisions
- The table layer encapsulates this translation

Key functions:
- `LogicalFieldId::for_user(table_id, field_id)`: Create a user data field
- `LogicalFieldId::for_mvcc_created_by(table_id)`: Create the MVCC `created_by` field
- `LogicalFieldId::for_mvcc_deleted_by(table_id)`: Create the MVCC `deleted_by` field
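To see why the composition avoids collisions, a self-contained model (the real crate defines its own bit layout plus reserved MVCC namespaces; this packing is an assumption for illustration only):

```rust
/// Model of the namespacing idea: pack a 32-bit table ID and a 32-bit
/// user field ID into one u64 logical identifier.
fn for_user_model(table_id: u32, field_id: u32) -> u64 {
    ((table_id as u64) << 32) | field_id as u64
}

fn main() {
    // The same user field ID in two tables yields distinct logical IDs.
    assert_ne!(for_user_model(1, 7), for_user_model(2, 7));
    // The user field ID remains recoverable from the low bits.
    assert_eq!(for_user_model(1, 7) as u32, 7);
}
```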
Sources: llkv-table/src/table.rs:231-438 llkv-table/src/table.rs:589-645
Schema Management
Schema Construction
The Table::schema() method builds an Arrow schema from catalog metadata:
Process:
1. Query `ColumnStore` for all logical fields belonging to this table
2. Sort fields by field ID for consistent ordering
3. Retrieve column names from `SysCatalog`
4. Build Arrow `Field`s with data type and metadata
5. Add the `row_id` field first, then the user fields

Schema metadata: Each field includes `field_id` metadata for round-trip compatibility.
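A sketch of those steps against the arrow crate (nullability choices here are assumptions; the real code reads names and types from `SysCatalog`):

```rust
use std::collections::HashMap;
use arrow::datatypes::{DataType, Field, Schema};

// cols: (user field ID, column name, Arrow data type) per catalog entry.
fn build_schema(mut cols: Vec<(u32, String, DataType)>) -> Schema {
    // Sort by field ID for consistent ordering.
    cols.sort_by_key(|(fid, _, _)| *fid);
    // row_id comes first, then user fields carrying field_id metadata.
    let mut fields = vec![Field::new("row_id", DataType::UInt64, false)];
    for (fid, name, data_type) in cols {
        let meta = HashMap::from([("field_id".to_string(), fid.to_string())]);
        fields.push(Field::new(name, data_type, true).with_metadata(meta));
    }
    Schema::new(fields)
}
```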
Sources: llkv-table/src/table.rs:519-549
Schema as RecordBatch
For display purposes, schema_recordbatch() formats the schema as a table:
| Column | Type | Contents |
|---|---|---|
| `name` | `Utf8` | Column name |
| `field_id` | `UInt32` | User field ID |
| `data_type` | `Utf8` | Arrow data type string |
Sources: llkv-table/src/table.rs:554-586
Example: CSV Export Integration
The CSV export writer demonstrates table-level query execution in practice:
Export Pipeline
Key steps:
- Convert `CsvExportColumn` to `ScanProjection` with field ID translation
- Resolve aliases from the catalog or use defaults
- Create an unbounded filter for a full table scan
- Stream batches directly to the Arrow CSV writer
- No intermediate materialization
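The sink side of that pipeline, sketched against the arrow CSV writer (the `scan_stream` call shape is the same assumption used earlier on this page):

```rust
use std::fs::File;
use arrow::csv::Writer;

let file = File::create("export.csv")?;
let mut writer = Writer::new(file); // writes a header row by default

// Each streamed RecordBatch goes straight to the CSV writer; nothing is
// materialized in between.
table.scan_stream(projections, &unbounded_filter, options, |batch| {
    writer.write(&batch).expect("CSV write failed");
})?;
```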
Sources: llkv-csv/src/writer.rs:139-268
Summary: Optimization Decision Matrix
| Query Characteristic | Fast Path | Standard Path |
|---|---|---|
| Single column projection | ✓ | ✓ |
| Multiple columns | ✗ | ✓ |
| Unbounded filter | ✓ | ✓ |
| Bounded/complex filter | ✗ | ✓ |
| Include nulls | ✗ | ✓ |
| Exclude nulls | ✓ | ✓ |
| No ordering | ✓ | ✓ |
| Ordered results | ✗ | ✓ |
| No MVCC filter | ✓ | ✓ |
| MVCC filtering | ✗ | ✓ |
| No range restrictions | ✓ | ✓ |
| Range restrictions | ✗ | ✓ |
Performance impact:
- Fast path: 1.5-2x slower than direct `ColumnStore` access
- Standard path: 5-10x slower than direct `ColumnStore` access
- Both paths remain significantly faster than alternatives due to columnar storage and zero-copy optimizations
Sources: llkv-table/examples/performance_benchmark.rs:1-211 llkv-table/examples/direct_comparison.rs:276-399