This documentation is part of the "Projects with Books" initiative at zenOSmosis.
The source code for this project is available on GitHub.
Table Abstraction
Relevant source files
- llkv-csv/src/writer.rs
- llkv-table/examples/direct_comparison.rs
- llkv-table/examples/performance_benchmark.rs
- llkv-table/examples/test_streaming.rs
- llkv-table/src/lib.rs
- llkv-table/src/table.rs
The Table struct provides a schema-aware, transactional interface over the columnar storage layer. It sits between the SQL execution layer and the physical storage, managing field ID namespacing, MVCC column injection, schema validation, and catalog integration. The Table abstraction transforms low-level columnar operations into relational table semantics.
For information about the underlying columnar storage, see Column Storage and ColumnStore. For query execution that uses tables, see Query Execution. For metadata management, see System Catalog and SysCatalog.
Purpose and Responsibilities
The Table abstraction serves as the primary interface for relational data operations in LLKV. Its key responsibilities include:
| Responsibility | Description |
|---|---|
| Field ID Namespacing | Maps user FieldId values to table-scoped LogicalFieldId identifiers |
| MVCC Integration | Automatically injects and manages created_by and deleted_by columns |
| Schema Management | Provides Arrow schema generation and validation |
| Catalog Integration | Coordinates with SysCatalog for metadata persistence |
| Scan Operations | Implements streaming scans with projection, filtering, and ordering |
| Index Management | Registers and manages persisted sort indexes |
Sources: llkv-table/src/lib.rs:1-27 llkv-table/src/table.rs:58-69
Table Structure and Creation
Table Struct
The Table struct is a lightweight wrapper that combines a shared ColumnStore reference with a TableId. It also holds an MvccColumnCache, an optimization that avoids repeated schema introspection during appends.
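A minimal sketch of that shape (the authoritative definition is in llkv-table/src/table.rs; the cache field name and its wrapper type are assumptions):

```rust
use std::sync::{Arc, Mutex};

// Illustrative sketch only; see llkv-table/src/table.rs for the real definition.
// MvccColumnCache is described under "MVCC Column Management" below.
pub struct Table<P: Pager> {
    /// Shared handle to the underlying columnar storage layer.
    store: Arc<ColumnStore<P>>,
    /// Namespace for this table's LogicalFieldIds.
    table_id: TableId,
    /// Lazily populated created_by/deleted_by schema flags (name is an assumption).
    mvcc_cache: Mutex<Option<MvccColumnCache>>,
}
```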
Sources: llkv-table/src/table.rs:58-69 llkv-table/src/table.rs:50-56
Table Creation
Tables are created through the CatalogManager which coordinates metadata persistence and storage initialization:
sequenceDiagram
participant User
participant CatalogManager
participant SysCatalog
participant ColumnStore
participant Pager
User->>CatalogManager: create_table_from_columns(name, columns)
CatalogManager->>CatalogManager: Allocate TableId
CatalogManager->>SysCatalog: Register TableMeta
CatalogManager->>SysCatalog: Register ColMeta for each column
CatalogManager->>ColumnStore: Ensure physical storage
ColumnStore->>Pager: Initialize descriptors if needed
CatalogManager->>User: Return Table~P~
The creation process ensures metadata consistency before returning a usable Table handle.
Sources: llkv-table/src/table.rs:77-103
Field ID Namespace Mapping
Logical vs User Field IDs
LLKV uses a two-tier field ID system to prevent collisions between tables. The mapping occurs during append operations, where each field's metadata is rewritten:
| Component | Type | Example |
|---|---|---|
| User Field ID | FieldId (u32) | 10 |
| Table ID | TableId (u32) | 1 |
| Logical Field ID | LogicalFieldId (u64) | 0x0000_0001_0000_000A |
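A worked example of the packing, consistent with the row above (a hypothetical helper; the real constructor lives in the llkv codebase):

```rust
/// Hypothetical helper: place the TableId in the upper 32 bits and the user
/// FieldId in the lower 32 bits of a 64-bit LogicalFieldId.
fn logical_field_id(table_id: u32, field_id: u32) -> u64 {
    ((table_id as u64) << 32) | field_id as u64
}

fn main() {
    // Table 1, user field 10 => 0x0000_0001_0000_000A, as in the table above.
    assert_eq!(logical_field_id(1, 10), 0x0000_0001_0000_000A);
}
```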
Sources: llkv-table/src/table.rs:304-315
Data Operations
Append Operation
The append method is the primary write operation, handling field ID conversion, MVCC column injection, and catalog updates:
flowchart TD
START["append(batch)"]
CACHE["Get/Initialize\nMvccColumnCache"]
PROCESS["Process each field"]
subgraph "Per Field Processing"
SYSCHECK{"System column?\n(row_id, MVCC)"}
MVCC_ASSIGN["Assign MVCC\nLogicalFieldId"]
USER_CONVERT["Convert user FieldId\nto LogicalFieldId"]
META_UPDATE["Update catalog\nif needed"]
end
INJECT_CREATED{"created_by\npresent?"}
INJECT_DELETED{"deleted_by\npresent?"}
BUILD_CREATED["Build created_by column\nwith TXN_ID_AUTO_COMMIT"]
BUILD_DELETED["Build deleted_by column\nwith TXN_ID_NONE"]
RECONSTRUCT["Reconstruct RecordBatch\nwith LogicalFieldIds"]
STORE["store.append(namespaced_batch)"]
END["Return Result"]
START --> CACHE
CACHE --> PROCESS
PROCESS --> SYSCHECK
SYSCHECK -->|Yes| MVCC_ASSIGN
SYSCHECK -->|No| USER_CONVERT
USER_CONVERT --> META_UPDATE
MVCC_ASSIGN --> INJECT_CREATED
META_UPDATE --> INJECT_CREATED
INJECT_CREATED -->|No| BUILD_CREATED
INJECT_CREATED -->|Yes| INJECT_DELETED
BUILD_CREATED --> INJECT_DELETED
INJECT_DELETED -->|No| BUILD_DELETED
INJECT_DELETED -->|Yes| RECONSTRUCT
BUILD_DELETED --> RECONSTRUCT
RECONSTRUCT --> STORE
STORE --> END
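A sketch of the per-field conversion step (the USER_CONVERT box above), assuming the field_id-in-metadata convention and the 32/32-bit packing shown earlier; the function name and error handling are illustrative:

```rust
use arrow::datatypes::Field;

// Sketch: parse the user field_id from the Arrow field's metadata and rewrite
// it as a table-scoped LogicalFieldId before the batch reaches the ColumnStore.
fn namespace_field(field: &Field, table_id: u32) -> Result<Field, Box<dyn std::error::Error>> {
    let user_fid: u32 = field
        .metadata()
        .get("field_id")
        .ok_or("user column missing field_id metadata")?
        .parse()?;
    let lfid: u64 = ((table_id as u64) << 32) | user_fid as u64;
    let mut metadata = field.metadata().clone();
    metadata.insert("field_id".to_string(), lfid.to_string());
    Ok(field.clone().with_metadata(metadata))
}
```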
MVCC Column Injection
For non-transactional appends (e.g., CSV import), the system automatically injects MVCC columns:
graph LR
subgraph "Input Batch"
IR["row_id: UInt64"]
IC1["col_a: Int64\nfield_id=10"]
IC2["col_b: Utf8\nfield_id=11"]
end
subgraph "Output Batch"
OR["row_id: UInt64"]
OC1["col_a: Int64\nlfid=0x0000_0001_0000_000A"]
OC2["col_b: Utf8\nlfid=0x0000_0001_0000_000B"]
OCB["created_by: UInt64\nlfid=MVCC_CREATED\nvalue=1"]
ODB["deleted_by: UInt64\nlfid=MVCC_DELETED\nvalue=0"]
end
IR --> OR
IC1 --> OC1
IC2 --> OC2
OC2 --> OCB
OCB --> ODB
The constant values used are:
- `TXN_ID_AUTO_COMMIT = 1` for `created_by` (indicates system-committed, always visible)
- `TXN_ID_NONE = 0` for `deleted_by` (indicates not deleted)
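A sketch of the injected arrays for a batch, using those constants (array construction only; the LogicalFieldId assignment is elided):

```rust
use std::sync::Arc;
use arrow::array::UInt64Array;

const TXN_ID_AUTO_COMMIT: u64 = 1; // created_by: committed by the system
const TXN_ID_NONE: u64 = 0;        // deleted_by: not deleted

// Build the two MVCC columns injected into a non-transactional append.
fn mvcc_columns(num_rows: usize) -> (Arc<UInt64Array>, Arc<UInt64Array>) {
    let created_by = Arc::new(UInt64Array::from(vec![TXN_ID_AUTO_COMMIT; num_rows]));
    let deleted_by = Arc::new(UInt64Array::from(vec![TXN_ID_NONE; num_rows]));
    (created_by, deleted_by)
}
```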
Sources: llkv-table/src/table.rs:231-438 llkv-table/src/table.rs:347-399
Scan Operations
The Table provides multiple scan methods that build on the lower-level ColumnStore scan infrastructure:
Scan Flow with Projections
graph TB
subgraph "Public API"
SS["scan_stream\n(projections, filter, options, callback)"]
SSE["scan_stream_with_exprs\n(projections, filter, options, callback)"]
FR["filter_row_ids\n(filter_expr)"]
SC["stream_columns\n(logical_fields, row_ids, policy)"]
end
subgraph "Internal Execution"
ES["llkv_scan::execute_scan"]
RSB["RowStreamBuilder"]
CRS["collect_row_ids_for_table"]
end
subgraph "Storage Layer"
CS["ColumnStore::ScanBuilder"]
GR["gather_rows"]
FM["filter_matches"]
end
SS --> SSE
SSE --> ES
FR --> CRS
SC --> RSB
ES --> FM
ES --> GR
RSB --> GR
CRS --> FM
FM --> CS
GR --> CS
Sources: llkv-table/src/table.rs:447-488 llkv-table/src/table.rs:490-496
Schema Management
Schema Generation
The schema() method constructs an Arrow Schema from catalog metadata. Each field in the generated schema includes field_id metadata:
| Schema Component | Source | Example |
|---|---|---|
| Field name | ColMeta.name or generated | "customer_id" or "col_10" |
| Data type | ColumnStore.data_type(lfid) | DataType::Int64 |
| Field ID metadata | field_id key in metadata map | "10" |
| Nullability | Always true for user columns | true |
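A sketch of one generated field, following the table above (the example values are those shown; the real construction lives in schema()):

```rust
use std::collections::HashMap;
use arrow::datatypes::{DataType, Field};

// Build the example field from the table above: name from ColMeta, type from
// the ColumnStore, user field_id carried as string metadata, always nullable.
fn example_schema_field() -> Field {
    let mut metadata = HashMap::new();
    metadata.insert("field_id".to_string(), "10".to_string());
    Field::new("customer_id", DataType::Int64, true).with_metadata(metadata)
}
```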
Sources: llkv-table/src/table.rs:519-549
Schema RecordBatch
For interactive display, the schema_recordbatch() method returns a tabular representation:
graph TD
SCHEMA["schema()"]
subgraph "Extract Components"
NAMES["Collect field names"]
FIDS["Extract field_id metadata"]
TYPES["Format data types"]
end
subgraph "Build RecordBatch"
NAME_ARR["StringArray(names)"]
FID_ARR["UInt32Array(field_ids)"]
TYPE_ARR["StringArray(data_types)"]
RB_SCHEMA["Schema(name, field_id, data_type)"]
BATCH["RecordBatch::try_new"]
end
SCHEMA --> NAMES
SCHEMA --> FIDS
SCHEMA --> TYPES
NAMES --> NAME_ARR
FIDS --> FID_ARR
TYPES --> TYPE_ARR
NAME_ARR --> BATCH
FID_ARR --> BATCH
TYPE_ARR --> BATCH
RB_SCHEMA --> BATCH
Sources: llkv-table/src/table.rs:554-586
MVCC Column Management
Column Cache Optimization
The MvccColumnCache avoids repeated string comparisons during append operations:
stateDiagram-v2
[*] --> Uninitialized : Table created
state Uninitialized {
[*] --> CheckCache : append() called
CheckCache --> Initialize : Cache is None
Initialize --> Store : Scan schema for created_by, deleted_by
Store --> [*] : Cache populated
}
Uninitialized --> Cached : First append
state Cached {
[*] --> ReadCache : Subsequent appends
ReadCache --> UseCached : Return cached values
UseCached --> [*]
}
The cache stores two boolean flags:
| Field | Type | Meaning |
|---|---|---|
| has_created_by | bool | Schema includes created_by column |
| has_deleted_by | bool | Schema includes deleted_by column |
This optimization is critical for bulk insert performance as it eliminates O(columns) string comparisons per batch.
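A minimal sketch of the cached flags (field names follow the table above; the derive list is an assumption):

```rust
// Sketch of the cache described above; see llkv-table/src/table.rs:50-56 for
// the authoritative definition.
#[derive(Clone, Copy)]
struct MvccColumnCache {
    /// The appended schema already contains a created_by column.
    has_created_by: bool,
    /// The appended schema already contains a deleted_by column.
    has_deleted_by: bool,
}
```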
Sources: llkv-table/src/table.rs:50-56 llkv-table/src/table.rs:175-205
MVCC Column LogicalFieldIds
MVCC columns use specially reserved LogicalFieldId values. These reserved patterns ensure MVCC columns are stored separately from user data in the ColumnStore.
Sources: llkv-table/src/table.rs:255-259 llkv-table/src/table.rs:360-366 llkv-table/src/table.rs:383-389
Index Management
The Table provides operations to register and manage persisted sort indexes. Registered indexes are maintained by the ColumnStore and used to optimize range scans and ordered queries.
Sources: llkv-table/src/table.rs:145-173
Row ID Collection and Filtering
RowIdScanCollector
The RowIdScanCollector implements visitor traits to collect row IDs during filtered scans:
classDiagram
class RowIdScanCollector {
-Treemap row_ids
+extend_from_array(row_ids)
+extend_from_slice(row_ids, start, len)
+into_inner() Treemap
}
class PrimitiveWithRowIdsVisitor {
<<trait>>
+i64_chunk_with_rids(values, row_ids)
+utf8_chunk_with_rids(values, row_ids)
...
}
class PrimitiveSortedWithRowIdsVisitor {
<<trait>>
+i64_run_with_rids(values, row_ids, start, len)
+null_run(row_ids, start, len)
...
}
RowIdScanCollector ..|> PrimitiveWithRowIdsVisitor
RowIdScanCollector ..|> PrimitiveSortedWithRowIdsVisitor
The collector ignores actual data values and only accumulates row IDs into a Treemap (roaring bitmap).
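A minimal sketch of the collector pattern, assuming Treemap is the 64-bit roaring bitmap from the `roaring` crate and showing only the slice-based hook:

```rust
use roaring::RoaringTreemap;

// Visitor state: data values are ignored, row IDs are unioned into a bitmap.
#[derive(Default)]
struct RowIdScanCollector {
    row_ids: RoaringTreemap,
}

impl RowIdScanCollector {
    /// Called for sorted runs: add `len` row IDs starting at `start`.
    fn extend_from_slice(&mut self, row_ids: &[u64], start: usize, len: usize) {
        self.row_ids
            .extend(row_ids[start..start + len].iter().copied());
    }

    /// Hand the accumulated bitmap back to the caller.
    fn into_inner(self) -> RoaringTreemap {
        self.row_ids
    }
}
```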
Sources: llkv-table/src/table.rs:805-858
RowIdChunkEmitter
For streaming scenarios, RowIdChunkEmitter buffers and emits row IDs in fixed-size chunks:
sequenceDiagram
participant Scan
participant Emitter
participant Callback
participant Buffer
loop For each chunk from ColumnStore
Scan->>Emitter: chunk_with_rids(values, row_ids)
Emitter->>Buffer: Accumulate row_ids
alt Buffer reaches chunk_size
Emitter->>Callback: on_chunk(&row_ids[..chunk_size])
Callback-->>Emitter: Result
Emitter->>Buffer: Clear buffer
end
end
Note over Emitter: Scan complete
Emitter->>Callback: on_chunk(&remaining_row_ids)
Callback-->>Emitter: Result
The emitter includes an optimization: when the buffer is empty, it passes slices directly to the callback without copying.
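A sketch of that buffering strategy, including the empty-buffer fast path; the struct layout, callback type, and method names are illustrative:

```rust
// Sketch of the chunked-emission strategy, assuming a caller-supplied callback.
struct RowIdChunkEmitter<F: FnMut(&[u64])> {
    buffer: Vec<u64>,
    chunk_size: usize,
    on_chunk: F,
}

impl<F: FnMut(&[u64])> RowIdChunkEmitter<F> {
    fn new(chunk_size: usize, on_chunk: F) -> Self {
        Self { buffer: Vec::new(), chunk_size, on_chunk }
    }

    /// Feed row IDs from one scan chunk.
    fn push_slice(&mut self, mut row_ids: &[u64]) {
        // Fast path: with an empty buffer, full chunks are emitted directly
        // from the input slice without copying.
        if self.buffer.is_empty() {
            while row_ids.len() >= self.chunk_size {
                (self.on_chunk)(&row_ids[..self.chunk_size]);
                row_ids = &row_ids[self.chunk_size..];
            }
        }
        // Slow path: buffer the remainder and drain any full chunks.
        self.buffer.extend_from_slice(row_ids);
        while self.buffer.len() >= self.chunk_size {
            let rest = self.buffer.split_off(self.chunk_size);
            (self.on_chunk)(&self.buffer);
            self.buffer = rest;
        }
    }

    /// Flush whatever is left once the scan completes.
    fn finish(mut self) {
        if !self.buffer.is_empty() {
            (self.on_chunk)(&self.buffer);
        }
    }
}
```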
Sources: llkv-table/src/table.rs:860-993
ScanStorage Trait Implementation
The Table implements the ScanStorage trait to integrate with the llkv-scan execution layer:
classDiagram
class ScanStorage~P~ {
<<trait>>
+table_id() TableId
+field_data_type(lfid) Result~DataType~
+total_rows() Result~u64~
+gather_column(lfid, row_ids, policy) Result~ArrayRef~
+filter_matches(lfid, predicate) Result~Treemap~
...
}
class Table~P~ {
-store: Arc~ColumnStore~P~~
-table_id: TableId
+catalog() SysCatalog
}
Table~P~ ..|> ScanStorage~P~ : implements
Key delegations:
| ScanStorage Method | Table Implementation | Delegation Target |
|---|---|---|
| table_id() | Direct field access | self.table_id |
| field_data_type(lfid) | Forward to store | self.store.data_type(lfid) |
| gather_column(...) | Forward to store | self.store.gather_rows(...) |
| filter_matches(...) | Build predicate and scan | ColumnStore::ScanBuilder |
| stream_row_ids_in_chunks(...) | Use RowIdChunkEmitter | Custom visitor |
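A sketch of the delegation pattern for the two simplest methods above (trait shape follows the diagram; trait bounds and the crate-local Result alias are elided assumptions):

```rust
// Illustrative delegation only; the full impl is at llkv-table/src/table.rs:1019-1232.
impl<P: Pager> ScanStorage<P> for Table<P> {
    fn table_id(&self) -> TableId {
        // Direct field access: the table already knows its own ID.
        self.table_id
    }

    fn field_data_type(&self, lfid: LogicalFieldId) -> Result<DataType> {
        // Forward straight to the ColumnStore, which owns type information.
        self.store.data_type(lfid)
    }
}
```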
Sources: llkv-table/src/table.rs:1019-1232
Integration with Lower Layers
Relationship to ColumnStore
The Table never directly interacts with the Pager; all storage operations go through the ColumnStore.
Sources: llkv-table/src/table.rs:64-69 llkv-table/src/table.rs:647-649
Usage Examples
Creating and Populating a Table
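A sketch of the pattern in llkv-table/examples/test_streaming.rs, assuming a `table` handle already created via CatalogManager::create_table_from_columns; the append call shape, column names, and values are illustrative:

```rust
use std::{collections::HashMap, sync::Arc};
use arrow::array::{ArrayRef, Int64Array, UInt64Array};
use arrow::datatypes::{DataType, Field, Schema};
use arrow::record_batch::RecordBatch;

// User columns carry their FieldId as string metadata; row_id is mandatory.
let mut meta = HashMap::new();
meta.insert("field_id".to_string(), "10".to_string());
let schema = Arc::new(Schema::new(vec![
    Field::new("row_id", DataType::UInt64, false),
    Field::new("amount", DataType::Int64, true).with_metadata(meta),
]));

let batch = RecordBatch::try_new(schema, vec![
    Arc::new(UInt64Array::from(vec![0u64, 1, 2])) as ArrayRef,
    Arc::new(Int64Array::from(vec![100, 250, 75])),
])?;

// MVCC columns are injected automatically for this non-transactional append.
table.append(&batch)?;
```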
Sources: llkv-table/examples/test_streaming.rs:35-60 llkv-csv/src/writer.rs:104-137
Scanning with Projections and Filters
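A sketch of a streaming scan following the scan_stream(projections, filter, options, callback) shape shown earlier; the construction of `projections`, `filter`, and `options`, and the closure signature, depend on the llkv-table and llkv-expr APIs and are assumptions here:

```rust
// Stream matching rows in chunks; the callback fires once per RecordBatch.
let mut total_rows = 0usize;
table.scan_stream(projections, filter, options, |batch| {
    total_rows += batch.num_rows();
})?;
println!("scanned {total_rows} rows");
```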
Sources: llkv-table/examples/test_streaming.rs:64-110 llkv-table/examples/performance_benchmark.rs:128-151
Performance Characteristics
The Table layer adds minimal overhead to ColumnStore operations:
| Operation | Added Work | Approximate Cost |
|---|---|---|
| Append (cached MVCC) | Field ID conversion + metadata map construction | ~1-2% per field |
| Append (uncached MVCC) | Additional schema scan | ~5%, first append only |
| Single-column scan | Filter compilation + streaming setup | ~1.1x vs direct ColumnStore |
| Multi-column scan | RecordBatch construction per batch | ~1.5-2x vs direct ColumnStore |
| Schema introspection | Catalog lookups + field sorting | O(fields) |
The design prioritizes zero-copy operations and streaming to minimize memory overhead.
Sources: llkv-table/examples/direct_comparison.rs:277-399 llkv-table/examples/performance_benchmark.rs:82-211