Table Abstraction

The Table struct provides a schema-aware, transactional interface over the columnar storage layer. It sits between the SQL execution layer and the physical storage, managing field ID namespacing, MVCC column injection, schema validation, and catalog integration. The Table abstraction transforms low-level columnar operations into relational table semantics.

For information about the underlying columnar storage, see Column Storage and ColumnStore. For query execution that uses tables, see Query Execution. For metadata management, see System Catalog and SysCatalog.


Purpose and Responsibilities

The Table abstraction serves as the primary interface for relational data operations in LLKV. Its key responsibilities include:

| Responsibility | Description |
| --- | --- |
| Field ID Namespacing | Maps user FieldId values to table-scoped LogicalFieldId identifiers |
| MVCC Integration | Automatically injects and manages created_by and deleted_by columns |
| Schema Management | Provides Arrow schema generation and validation |
| Catalog Integration | Coordinates with SysCatalog for metadata persistence |
| Scan Operations | Implements streaming scans with projection, filtering, and ordering |
| Index Management | Registers and manages persisted sort indexes |
Sources: llkv-table/src/lib.rs:1-27 llkv-table/src/table.rs:58-69


Table Structure and Creation

Table Struct

The Table struct is a lightweight wrapper that combines a ColumnStore reference with a TableId. The MVCC cache is an optimization to avoid repeated schema introspection during appends.
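
A minimal sketch of the shape this implies, with placeholder types standing in for llkv's real definitions (field names here are illustrative, not copied from table.rs):

```rust
use std::sync::{Arc, OnceLock};

// Placeholder types standing in for llkv's real definitions.
type TableId = u32;
struct ColumnStore<P> { _pager: P }

// Cached flags describing whether the schema already carries MVCC columns
// (see "MVCC Column Management" below).
#[derive(Clone, Copy)]
struct MvccColumnCache {
    has_created_by: bool,
    has_deleted_by: bool,
}

// The wrapper itself: a shared ColumnStore handle plus the table's id.
struct Table<P> {
    store: Arc<ColumnStore<P>>,
    table_id: TableId,
    mvcc_cache: OnceLock<MvccColumnCache>,
}
```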

Sources: llkv-table/src/table.rs:58-69 llkv-table/src/table.rs:50-56

Table Creation

Tables are created through the CatalogManager, which coordinates metadata persistence and storage initialization:

sequenceDiagram
    participant User
    participant CatalogManager
    participant SysCatalog
    participant ColumnStore
    participant Pager
    
    User->>CatalogManager: create_table_from_columns(name, columns)
    CatalogManager->>CatalogManager: Allocate TableId
    CatalogManager->>SysCatalog: Register TableMeta
    CatalogManager->>SysCatalog: Register ColMeta for each column
    CatalogManager->>ColumnStore: Ensure physical storage
    ColumnStore->>Pager: Initialize descriptors if needed
    CatalogManager->>User: Return Table~P~

The creation process ensures metadata consistency before returning a usable Table handle.

Sources: llkv-table/src/table.rs:77-103


Field ID Namespace Mapping

Logical vs User Field IDs

LLKV uses a two-tier field ID system to prevent collisions between tables: user-facing FieldId values are scoped to a single table, while storage addresses columns by globally unique LogicalFieldId values.

The mapping occurs during append operations, where each field's metadata is rewritten:

| Component | Type | Example |
| --- | --- | --- |
| User Field ID | FieldId (u32) | 10 |
| Table ID | TableId (u32) | 1 |
| Logical Field ID | LogicalFieldId (u64) | 0x0000_0001_0000_000A |
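
The example values suggest a straightforward bit-packing: the TableId occupies the high 32 bits and the user FieldId the low 32 bits. A sketch under that assumption (the helper name to_logical is hypothetical):

```rust
type TableId = u32;
type FieldId = u32;
type LogicalFieldId = u64;

// Hypothetical helper: pack the owning table id into the high 32 bits
// and the user field id into the low 32 bits.
fn to_logical(table_id: TableId, field_id: FieldId) -> LogicalFieldId {
    ((table_id as u64) << 32) | field_id as u64
}

fn main() {
    // Table 1, user field 10 -> 0x0000_0001_0000_000A, matching the row above.
    assert_eq!(to_logical(1, 10), 0x0000_0001_0000_000A);
}
```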

Sources: llkv-table/src/table.rs:304-315


Data Operations

Append Operation

The append method is the primary write operation, handling field ID conversion, MVCC column injection, and catalog updates:

flowchart TD
    START["append(batch)"]
    CACHE["Get/Initialize\nMvccColumnCache"]
    PROCESS["Process each field"]
    subgraph "Per Field Processing"
        SYSCHECK{"System column?\n(row_id, MVCC)"}
        MVCC_ASSIGN["Assign MVCC\nLogicalFieldId"]
        USER_CONVERT["Convert user FieldId\nto LogicalFieldId"]
        META_UPDATE["Update catalog\nif needed"]
    end
    INJECT_CREATED{"created_by\npresent?"}
    INJECT_DELETED{"deleted_by\npresent?"}
    BUILD_CREATED["Build created_by column\nwith TXN_ID_AUTO_COMMIT"]
    BUILD_DELETED["Build deleted_by column\nwith TXN_ID_NONE"]
    RECONSTRUCT["Reconstruct RecordBatch\nwith LogicalFieldIds"]
    STORE["store.append(namespaced_batch)"]
    END["Return Result"]

    START --> CACHE
    CACHE --> PROCESS
    PROCESS --> SYSCHECK
    SYSCHECK -->|Yes| MVCC_ASSIGN
    SYSCHECK -->|No| USER_CONVERT
    USER_CONVERT --> META_UPDATE
    MVCC_ASSIGN --> INJECT_CREATED
    META_UPDATE --> INJECT_CREATED
    INJECT_CREATED -->|No| BUILD_CREATED
    INJECT_CREATED -->|Yes| INJECT_DELETED
    BUILD_CREATED --> INJECT_DELETED
    INJECT_DELETED -->|No| BUILD_DELETED
    INJECT_DELETED -->|Yes| RECONSTRUCT
    BUILD_DELETED --> RECONSTRUCT
    RECONSTRUCT --> STORE
    STORE --> END

MVCC Column Injection

For non-transactional appends (e.g., CSV import), the system automatically injects MVCC columns:

graph LR
    subgraph "Input Batch"
        IR["row_id: UInt64"]
        IC1["col_a: Int64\nfield_id=10"]
        IC2["col_b: Utf8\nfield_id=11"]
    end

    subgraph "Output Batch"
        OR["row_id: UInt64"]
        OC1["col_a: Int64\nlfid=0x0000_0001_0000_000A"]
        OC2["col_b: Utf8\nlfid=0x0000_0001_0000_000B"]
        OCB["created_by: UInt64\nlfid=MVCC_CREATED\nvalue=1"]
        ODB["deleted_by: UInt64\nlfid=MVCC_DELETED\nvalue=0"]
    end

    IR --> OR
    IC1 --> OC1
    IC2 --> OC2
    OC2 --> OCB
    OCB --> ODB

The constant values used are:

  • TXN_ID_AUTO_COMMIT = 1 for created_by (indicates system-committed, always visible)
  • TXN_ID_NONE = 0 for deleted_by (indicates not deleted)
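
A minimal sketch of building the injected columns with the arrow crate; the constants follow the list above, while the helper name mvcc_columns is illustrative:

```rust
use std::sync::Arc;
use arrow_array::{ArrayRef, UInt64Array};

// Constants named in the list above.
const TXN_ID_AUTO_COMMIT: u64 = 1;
const TXN_ID_NONE: u64 = 0;

// Build the two MVCC columns for a batch of `num_rows` rows, as the
// append path does for non-transactional writes.
fn mvcc_columns(num_rows: usize) -> (ArrayRef, ArrayRef) {
    let created_by: ArrayRef =
        Arc::new(UInt64Array::from(vec![TXN_ID_AUTO_COMMIT; num_rows]));
    let deleted_by: ArrayRef =
        Arc::new(UInt64Array::from(vec![TXN_ID_NONE; num_rows]));
    (created_by, deleted_by)
}
```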

Sources: llkv-table/src/table.rs:231-438 llkv-table/src/table.rs:347-399

Scan Operations

The Table provides multiple scan methods that build on the lower-level ColumnStore scan infrastructure:

Scan Flow with Projections

graph TB
    subgraph "Public API"
        SS["scan_stream\n(projections, filter, options, callback)"]
        SSE["scan_stream_with_exprs\n(projections, filter, options, callback)"]
        FR["filter_row_ids\n(filter_expr)"]
        SC["stream_columns\n(logical_fields, row_ids, policy)"]
    end

    subgraph "Internal Execution"
        ES["llkv_scan::execute_scan"]
        RSB["RowStreamBuilder"]
        CRS["collect_row_ids_for_table"]
    end

    subgraph "Storage Layer"
        CS["ColumnStore::ScanBuilder"]
        GR["gather_rows"]
        FM["filter_matches"]
    end

    SS --> SSE
    SSE --> ES
    FR --> CRS
    SC --> RSB
    ES --> FM
    ES --> GR
    RSB --> GR
    CRS --> FM
    FM --> CS
    GR --> CS

Sources: llkv-table/src/table.rs:447-488 llkv-table/src/table.rs:490-496


Schema Management

Schema Generation

The schema() method constructs an Arrow Schema from catalog metadata. Each field in the generated schema includes field_id metadata:

| Schema Component | Source | Example |
| --- | --- | --- |
| Field name | ColMeta.name or generated | "customer_id" or "col_10" |
| Data type | ColumnStore.data_type(lfid) | DataType::Int64 |
| Field ID metadata | field_id key in metadata map | "10" |
| Nullability | Always true for user columns | true |
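
For instance, a generated field might be assembled like this (a sketch using the arrow_schema crate; the concrete builder code in table.rs may differ):

```rust
use std::collections::HashMap;
use arrow_schema::{DataType, Field};

fn main() {
    // Name from ColMeta (or a generated "col_<id>"), type from the store,
    // and the user field id recorded under the "field_id" metadata key.
    let field = Field::new("customer_id", DataType::Int64, true).with_metadata(
        HashMap::from([("field_id".to_string(), "10".to_string())]),
    );
    assert_eq!(field.metadata()["field_id"], "10");
}
```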

Sources: llkv-table/src/table.rs:519-549

Schema RecordBatch

For interactive display, the schema_recordbatch() method returns a tabular representation:

graph TD
    SCHEMA["schema()"]

    subgraph "Extract Components"
        NAMES["Collect field names"]
        FIDS["Extract field_id metadata"]
        TYPES["Format data types"]
    end

    subgraph "Build RecordBatch"
        NAME_ARR["StringArray(names)"]
        FID_ARR["UInt32Array(field_ids)"]
        TYPE_ARR["StringArray(data_types)"]
        RB_SCHEMA["Schema(name, field_id, data_type)"]
        BATCH["RecordBatch::try_new"]
    end

    SCHEMA --> NAMES
    SCHEMA --> FIDS
    SCHEMA --> TYPES
    NAMES --> NAME_ARR
    FIDS --> FID_ARR
    TYPES --> TYPE_ARR
    NAME_ARR --> BATCH
    FID_ARR --> BATCH
    TYPE_ARR --> BATCH
    RB_SCHEMA --> BATCH

Sources: llkv-table/src/table.rs:554-586


MVCC Column Management

Column Cache Optimization

The MvccColumnCache avoids repeated string comparisons during append operations:

stateDiagram-v2
    [*] --> Uninitialized : Table created

    state Uninitialized {
        [*] --> CheckCache : append() called
        CheckCache --> Initialize : Cache is None
        Initialize --> Store : Scan schema for created_by, deleted_by
        Store --> [*] : Cache populated
    }

    Uninitialized --> Cached : First append

    state Cached {
        [*] --> ReadCache : Subsequent appends
        ReadCache --> UseCached : Return cached values
        UseCached --> [*]
    }

The cache stores two boolean flags:

| Field | Type | Meaning |
| --- | --- | --- |
| has_created_by | bool | Schema includes created_by column |
| has_deleted_by | bool | Schema includes deleted_by column |

This optimization is critical for bulk insert performance, as it eliminates O(columns) string comparisons per batch; a sketch of the one-time detection pass appears below.

Sources: llkv-table/src/table.rs:50-56 llkv-table/src/table.rs:175-205
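
A minimal sketch of that detection pass, assuming plain name matching over an arrow_schema::Schema (the real code may key off field metadata instead):

```rust
use arrow_schema::Schema;

// One pass over the field names; afterwards appends consult two cached
// booleans instead of re-scanning the schema.
fn detect_mvcc_columns(schema: &Schema) -> (bool, bool) {
    let mut has_created_by = false;
    let mut has_deleted_by = false;
    for field in schema.fields() {
        match field.name().as_str() {
            "created_by" => has_created_by = true,
            "deleted_by" => has_deleted_by = true,
            _ => {}
        }
    }
    (has_created_by, has_deleted_by)
}
```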

MVCC Column LogicalFieldIds

MVCC columns use specially reserved LogicalFieldId values outside the user field namespace.

These reserved patterns ensure MVCC columns are stored separately from user data in the ColumnStore.

Sources: llkv-table/src/table.rs:255-259 llkv-table/src/table.rs:360-366 llkv-table/src/table.rs:383-389


Index Management

The Table provides operations to register and manage persisted sort indexes.

Registered indexes are maintained by the ColumnStore and used to optimize range scans and ordered queries.

Sources: llkv-table/src/table.rs:145-173


Row ID Collection and Filtering

RowIdScanCollector

The RowIdScanCollector implements visitor traits to collect row IDs during filtered scans:

classDiagram
    class RowIdScanCollector {
        -Treemap row_ids
        +extend_from_array(row_ids)
        +extend_from_slice(row_ids, start, len)
        +into_inner() Treemap
    }

    class PrimitiveWithRowIdsVisitor {
        <<trait>>
        +i64_chunk_with_rids(values, row_ids)
        +utf8_chunk_with_rids(values, row_ids)
        ...
    }

    class PrimitiveSortedWithRowIdsVisitor {
        <<trait>>
        +i64_run_with_rids(values, row_ids, start, len)
        +null_run(row_ids, start, len)
        ...
    }

    RowIdScanCollector ..|> PrimitiveWithRowIdsVisitor
    RowIdScanCollector ..|> PrimitiveSortedWithRowIdsVisitor

The collector ignores actual data values and only accumulates row IDs into a Treemap (roaring bitmap).

Sources: llkv-table/src/table.rs:805-858
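
A minimal sketch of the collector's core behavior, with the roaring crate's RoaringTreemap standing in for llkv's Treemap:

```rust
use roaring::RoaringTreemap;

// Values are ignored entirely; only row ids are accumulated into a
// 64-bit roaring bitmap.
struct RowIdCollector {
    row_ids: RoaringTreemap,
}

impl RowIdCollector {
    // Mirrors extend_from_slice(row_ids, start, len) from the diagram;
    // the real visitor also handles Arrow arrays and sorted runs.
    fn extend_from_slice(&mut self, rids: &[u64], start: usize, len: usize) {
        for &rid in &rids[start..start + len] {
            self.row_ids.insert(rid);
        }
    }

    fn into_inner(self) -> RoaringTreemap {
        self.row_ids
    }
}
```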

RowIdChunkEmitter

For streaming scenarios, RowIdChunkEmitter buffers and emits row IDs in fixed-size chunks:

sequenceDiagram
    participant Scan
    participant Emitter
    participant Callback
    participant Buffer

    loop For each chunk from ColumnStore
        Scan->>Emitter: chunk_with_rids(values, row_ids)
        Emitter->>Buffer: Accumulate row_ids

        alt Buffer reaches chunk_size
            Emitter->>Callback: on_chunk(&row_ids[..chunk_size])
            Callback-->>Emitter: Result
            Emitter->>Buffer: Clear buffer
        end
    end

    Note over Emitter: Scan complete
    Emitter->>Callback: on_chunk(&remaining_row_ids)
    Callback-->>Emitter: Result

The emitter includes an optimization: when the buffer is empty, it passes incoming slices through directly without copying.

Sources: llkv-table/src/table.rs:860-993
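
A minimal sketch of the buffering pattern, including the empty-buffer fast path; names and details are illustrative, not copied from table.rs:

```rust
// Buffers row ids and invokes a callback once per fixed-size chunk.
// `chunk_size` is assumed to be non-zero.
struct ChunkEmitter<F: FnMut(&[u64])> {
    buf: Vec<u64>,
    chunk_size: usize,
    on_chunk: F,
}

impl<F: FnMut(&[u64])> ChunkEmitter<F> {
    fn push_slice(&mut self, row_ids: &[u64]) {
        let mut rest = row_ids;
        // Fast path: while the buffer is empty, full chunks can be
        // forwarded directly from the input slice without copying.
        while self.buf.is_empty() && rest.len() >= self.chunk_size {
            let (chunk, tail) = rest.split_at(self.chunk_size);
            (self.on_chunk)(chunk);
            rest = tail;
        }
        self.buf.extend_from_slice(rest);
        // Drain any complete chunks accumulated in the buffer.
        while self.buf.len() >= self.chunk_size {
            let chunk: Vec<u64> = self.buf.drain(..self.chunk_size).collect();
            (self.on_chunk)(&chunk);
        }
    }

    // Flush whatever remains once the scan completes.
    fn finish(mut self) {
        if !self.buf.is_empty() {
            (self.on_chunk)(&self.buf);
        }
    }
}
```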


ScanStorage Trait Implementation

The Table implements the ScanStorage trait to integrate with the llkv-scan execution layer:

classDiagram
    class ScanStorage~P~ {
        <<trait>>
        +table_id() TableId
        +field_data_type(lfid) Result~DataType~
        +total_rows() Result~u64~
        +gather_column(lfid, row_ids, policy) Result~ArrayRef~
        +filter_matches(lfid, predicate) Result~Treemap~
        ...
    }

    class Table~P~ {
        -store: Arc~ColumnStore~P~~
        -table_id: TableId
        +catalog() SysCatalog
    }

    Table~P~ ..|> ScanStorage~P~ : implements

Key delegations:

| ScanStorage Method | Table Implementation | Delegation Target |
| --- | --- | --- |
| table_id() | Direct field access | self.table_id |
| field_data_type(lfid) | Forward to store | self.store.data_type(lfid) |
| gather_column(...) | Forward to store | self.store.gather_rows(...) |
| filter_matches(...) | Build predicate and scan | ColumnStore::ScanBuilder |
| stream_row_ids_in_chunks(...) | Use RowIdChunkEmitter | Custom visitor |

Sources: llkv-table/src/table.rs:1019-1232


Integration with Lower Layers

Relationship to ColumnStore

The Table never directly interacts with the Pager; all storage operations go through the ColumnStore.

Sources: llkv-table/src/table.rs:64-69 llkv-table/src/table.rs:647-649


Usage Examples

Creating and Populating a Table
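
A hedged sketch of the flow, assembled from the entry points named on this page (create_table_from_columns, append); CatalogManager::new, the in-scope pager and columns values, and the error handling are assumptions:

```rust
use std::{collections::HashMap, sync::Arc};
use arrow_array::{Int64Array, RecordBatch, UInt64Array};
use arrow_schema::{DataType, Field, Schema};

// `pager` and `columns` are assumed to be in scope; CatalogManager::new
// is a hypothetical constructor.
let manager = CatalogManager::new(pager)?;
let table = manager.create_table_from_columns("orders", columns)?;

// User fields carry field_id metadata so append can map them to
// table-scoped LogicalFieldIds (see Schema Generation above).
let amount = Field::new("amount", DataType::Int64, true)
    .with_metadata(HashMap::from([("field_id".into(), "10".into())]));
let schema = Arc::new(Schema::new(vec![
    Field::new("row_id", DataType::UInt64, false),
    amount,
]));
let batch = RecordBatch::try_new(
    schema,
    vec![
        Arc::new(UInt64Array::from(vec![0u64, 1, 2])),
        Arc::new(Int64Array::from(vec![100i64, 200, 300])),
    ],
)?;

// created_by / deleted_by are injected automatically for this
// non-transactional append (TXN_ID_AUTO_COMMIT / TXN_ID_NONE).
table.append(&batch)?;
```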

Sources: llkv-table/examples/test_streaming.rs:35-60 llkv-csv/src/writer.rs:104-137

Scanning with Projections and Filters
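
Another hedged sketch: scan_stream and its (projections, filter, options, callback) shape come from the Scan Operations section, while the concrete projection, filter, and options types are assumptions (ScanStreamOptions is a hypothetical name):

```rust
// `projections` and `filter_expr` are assumed to be built elsewhere.
let mut total_rows = 0usize;
table.scan_stream(
    &projections,
    &filter_expr,
    ScanStreamOptions::default(), // hypothetical options type
    |batch| {
        // Each streamed Arrow batch arrives already projected and filtered.
        total_rows += batch.num_rows();
    },
)?;
println!("matched {total_rows} rows");
```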

Sources: llkv-table/examples/test_streaming.rs:64-110 llkv-table/examples/performance_benchmark.rs:128-151


Performance Characteristics

The Table layer adds minimal overhead to ColumnStore operations:

| Operation | Overhead | Typical Cost |
| --- | --- | --- |
| Append (cached MVCC) | Field ID conversion + metadata map construction | ~1-2% per field |
| Append (uncached MVCC) | Additional schema scan | ~5%, first append only |
| Single-column scan | Filter compilation + streaming setup | ~1.1x vs direct ColumnStore |
| Multi-column scan | RecordBatch construction per batch | ~1.5-2x vs direct ColumnStore |
| Schema introspection | Catalog lookups + field sorting | O(fields) |

The design prioritizes zero-copy operations and streaming to minimize memory overhead.

Sources: llkv-table/examples/direct_comparison.rs:277-399 llkv-table/examples/performance_benchmark.rs:82-211
