
This documentation is part of the "Projects with Books" initiative at zenOSmosis.

The source code for this project is available on GitHub.

Pager Interface and SIMD Optimization


Purpose and Scope

This document describes the storage abstraction layer that provides persistent key-value storage for the columnar data model. The Pager trait defines a generic interface for batch-oriented storage operations, while the SIMD R-Drive library provides a memory-mapped, SIMD-optimized concrete implementation.

For column-level storage operations built on top of the pager, see Column Storage and ColumnStore. For table-level abstractions, see Table Abstraction.


System Architecture

The storage system follows a layered design where the Pager provides a simple key-value abstraction that the ColumnStore builds upon. All persistence operations flow through batch APIs to minimize I/O overhead and enable atomic commits.

```mermaid
graph TB
    subgraph "Column Storage Layer"
        ColumnStore["ColumnStore<P: Pager>"]
        ColumnCatalog["ColumnCatalog"]
        ColumnDescriptor["ColumnDescriptor"]
        ChunkMetadata["ChunkMetadata"]
    end

    subgraph "Pager Abstraction Layer"
        PagerTrait["Pager Trait"]
        BatchGet["BatchGet"]
        BatchPut["BatchPut"]
        GetResult["GetResult"]
        PhysicalKey["PhysicalKey (u64)"]
    end

    subgraph "SIMD R-Drive Implementation"
        SimdRDrive["simd-r-drive 0.15.5"]
        EntryHandle["EntryHandle"]
        MemoryMap["Memory-Mapped Files"]
        SimdOps["SIMD Operations"]
    end

    ColumnStore --> PagerTrait
    ColumnStore --> BatchGet
    ColumnStore --> BatchPut
    ColumnCatalog --> PagerTrait
    ColumnDescriptor --> PagerTrait
    ChunkMetadata --> PhysicalKey

    PagerTrait --> SimdRDrive
    BatchGet --> PhysicalKey
    BatchPut --> PhysicalKey
    GetResult --> EntryHandle

    SimdRDrive --> EntryHandle
    SimdRDrive --> MemoryMap
    SimdRDrive --> SimdOps
    EntryHandle --> MemoryMap
```

Storage Layering

Sources: llkv-column-map/src/store/core.rs:1-89 Cargo.toml:85-86


The Pager Trait

The Pager trait defines the storage interface used throughout the system. It abstracts over key-value stores with batch-oriented operations and explicit memory management.

Core Operations

| Operation | Purpose | Batch Size |
| --- | --- | --- |
| `batch_get(&[BatchGet])` | Retrieve multiple keys atomically | 1-N keys |
| `batch_put(&[BatchPut])` | Write multiple keys atomically | 1-N keys |
| `alloc()` | Allocate a single physical key | 1 key |
| `alloc_many(&[usize])` | Allocate multiple physical keys | N keys |
| `free(PhysicalKey)` | Deallocate a single key | 1 key |
| `free_many(&[PhysicalKey])` | Deallocate multiple keys | N keys |

Type Parameters

The Pager trait is generic over the blob type returned from reads.

The SIMD R-Drive implementation uses EntryHandle as the blob type, which provides zero-copy access to memory-mapped regions without allocating new buffers.
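The trait's exact definition is not reproduced on this page, so the following is an illustrative reconstruction of its shape from the operations table above. The names and signatures are assumptions, not the actual llkv-storage source, and a toy in-memory pager stands in for the SIMD R-Drive backend:

```rust
use std::collections::HashMap;

// Illustrative reconstruction of the Pager trait's shape, inferred from the
// operations table above. The real trait lives in llkv-storage; names and
// signatures here are assumptions.
pub type PhysicalKey = u64;

pub enum BatchGet {
    Raw { key: PhysicalKey },
}

pub enum GetResult<B> {
    Raw { key: PhysicalKey, bytes: B },
    Missing { key: PhysicalKey },
}

pub enum BatchPut {
    Raw { key: PhysicalKey, bytes: Vec<u8> },
}

pub trait Pager {
    // The associated blob type lets an implementation return zero-copy views
    // (e.g. an EntryHandle over a memory map) instead of owned buffers.
    type Blob: AsRef<[u8]>;

    fn batch_get(&self, gets: &[BatchGet]) -> Vec<GetResult<Self::Blob>>;
    fn batch_put(&mut self, puts: Vec<BatchPut>);
    fn alloc(&mut self) -> PhysicalKey;
}

// Minimal in-memory implementation for illustration only.
pub struct MemPager {
    next: PhysicalKey,
    data: HashMap<PhysicalKey, Vec<u8>>,
}

impl MemPager {
    pub fn new() -> Self {
        // Key 0 is reserved for the catalog root, so allocation starts at 1.
        MemPager { next: 1, data: HashMap::new() }
    }
}

impl Pager for MemPager {
    type Blob = Vec<u8>; // owned bytes stand in for a zero-copy handle

    fn batch_get(&self, gets: &[BatchGet]) -> Vec<GetResult<Self::Blob>> {
        gets.iter()
            .map(|get| match get {
                BatchGet::Raw { key } => match self.data.get(key) {
                    Some(bytes) => GetResult::Raw { key: *key, bytes: bytes.clone() },
                    None => GetResult::Missing { key: *key },
                },
            })
            .collect()
    }

    fn batch_put(&mut self, puts: Vec<BatchPut>) {
        for put in puts {
            match put {
                BatchPut::Raw { key, bytes } => {
                    self.data.insert(key, bytes);
                }
            }
        }
    }

    fn alloc(&mut self) -> PhysicalKey {
        let key = self.next;
        self.next += 1;
        key
    }
}
```

The associated Blob type is what allows the SIMD R-Drive backend to return zero-copy handles while a test pager can return plain owned buffers.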

Sources: llkv-column-map/src/store/core.rs:60-68 llkv-column-map/src/store/core.rs:89-124

Physical Keys

Physical keys are 64-bit unsigned integers (u64) that uniquely identify storage locations. The system partitions the key space for different purposes:

  • Catalog Root : Key 0 stores the ColumnCatalog
  • Column Descriptors : User-allocated keys for column metadata
  • Data Chunks : User-allocated keys for serialized Arrow arrays
  • Index Structures : User-allocated keys for permutations and bitmaps
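Since only the catalog root sits at a fixed location while all other keys come from the allocator, the partitioning reduces to a single well-known constant. A minimal sketch (the constant name is illustrative, not taken from llkv-storage):

```rust
// Hedged sketch: only the catalog root is a fixed, well-known key; every
// other key is handed out by the pager's allocator at runtime.
type PhysicalKey = u64;

/// Key 0 always holds the serialized ColumnCatalog (name is illustrative).
const CATALOG_ROOT_KEY: PhysicalKey = 0;

fn is_catalog_root(key: PhysicalKey) -> bool {
    key == CATALOG_ROOT_KEY
}
```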

Sources: llkv-storage/src/constants.rs (implied), llkv-column-map/src/store/core.rs:100-124


```mermaid
sequenceDiagram
    participant CS as ColumnStore
    participant Pager as Pager
    participant SIMD as SIMD R-Drive
    participant MM as Memory Map

    CS->>Pager: batch_get([BatchGet::Raw{key: 42}, ...])
    Pager->>SIMD: batch read request
    SIMD->>MM: locate pages for keys
    MM->>SIMD: zero-copy EntryHandles
    SIMD->>Pager: Vec<GetResult>
    Pager->>CS: Results with EntryHandles
    CS->>CS: deserialize Arrow arrays
```

Batch Operations

All storage I/O uses batch operations to minimize system call overhead and enable atomic multi-key transactions.

BatchGet Flow

Sources: llkv-column-map/src/store/core.rs:100-110 llkv-column-map/src/store/core.rs:414-425

The BatchGet enum specifies requests:

| Variant | Purpose |
| --- | --- |
| `Raw { key: PhysicalKey }` | Retrieve raw bytes for a key |

The GetResult enum returns outcomes:

| Variant | Meaning |
| --- | --- |
| `Raw { key: PhysicalKey, bytes: Blob }` | Successfully retrieved bytes |
| (others implied by usage) | Key not found, or other statuses |

BatchPut Flow

Sources: llkv-column-map/src/store/core.rs:217-222 llkv-column-map/src/store/core.rs:1117-1125

The BatchPut enum specifies writes:

| Variant | Purpose |
| --- | --- |
| `Raw { key: PhysicalKey, bytes: Vec<u8> }` | Write raw bytes to a key |

Why Batching Matters

Batching provides three critical benefits:

  1. Atomicity : All keys in a batch are committed together or not at all
  2. Performance : Amortizes system call overhead across many keys
  3. Prefetching : Enables SIMD R-Drive to optimize I/O patterns

The ColumnStore::append method demonstrates this pattern: it stages hundreds of puts (data chunks, descriptors, catalog updates) and commits them atomically at the end.
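A minimal sketch of the stage-then-commit pattern, with hypothetical names and an in-memory map standing in for the pager (not llkv's actual append implementation):

```rust
use std::collections::HashMap;

// Hypothetical sketch: all writes are accumulated in memory and applied in
// a single batch_put-style call, so nothing is visible until the commit.
type PhysicalKey = u64;

pub struct Store {
    pub pages: HashMap<PhysicalKey, Vec<u8>>,
}

impl Store {
    pub fn new() -> Self {
        Store { pages: HashMap::new() }
    }

    // Stand-in for Pager::batch_put: applies every staged write at once.
    fn batch_put(&mut self, puts: Vec<(PhysicalKey, Vec<u8>)>) {
        for (key, bytes) in puts {
            self.pages.insert(key, bytes);
        }
    }

    pub fn append_columns(&mut self, chunks: &[(PhysicalKey, &[u8])]) {
        // Stage phase: serialize chunks (and, in the real store, descriptor
        // and catalog updates) into one list of pending puts.
        let mut staged: Vec<(PhysicalKey, Vec<u8>)> = Vec::new();
        for (key, bytes) in chunks {
            staged.push((*key, bytes.to_vec()));
        }

        // Commit phase: one batch write at the very end.
        self.batch_put(staged);
    }
}
```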

Sources: llkv-column-map/src/store/core.rs:1117-1125


Physical Key Allocation

The pager manages a free-list of physical keys. Allocations are persistent—once allocated, a key remains valid until explicitly freed.

```mermaid
graph TD
    subgraph "Allocation Operations"
        A1["alloc() → PhysicalKey"]
        A2["alloc_many(n) → Vec&lt;PhysicalKey&gt;"]
    end

    subgraph "Deallocation Operations"
        F1["free(key)"]
        F2["free_many(&[key])"]
    end

    subgraph "Usage Examples"
        Desc["Column Descriptor\n1 key per column"]
        Chunks["Data Chunks\nN keys per append"]
        Indexes["Indexes\n1 key per index"]
    end

    A1 --> Desc
    A2 --> Chunks
    A1 --> Indexes

    Desc -.-> F1
    Chunks -.-> F2
```

Allocation Patterns

Sources: llkv-column-map/src/store/core.rs:250-263 llkv-column-map/src/store/core.rs:989-1006

Allocation Examples

The ColumnStore::append method allocates keys in bulk.

The ColumnStore::remove_column method frees all keys associated with a column.
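A hedged sketch of a free-list allocator illustrating the alloc_many/free_many contract described above; the real allocator is internal to simd-r-drive, and these names and internals are illustrative only:

```rust
// Hypothetical free-list allocator: freed keys are recycled before new ones
// are minted, and a bulk request costs one round-trip instead of N.
type PhysicalKey = u64;

pub struct KeyAllocator {
    next: PhysicalKey,
    free_list: Vec<PhysicalKey>,
}

impl KeyAllocator {
    pub fn new() -> Self {
        // Key 0 is reserved for the catalog root.
        KeyAllocator { next: 1, free_list: Vec::new() }
    }

    // One call yields N keys, reusing freed keys first.
    pub fn alloc_many(&mut self, n: usize) -> Vec<PhysicalKey> {
        let mut keys = Vec::with_capacity(n);
        while keys.len() < n {
            match self.free_list.pop() {
                Some(recycled) => keys.push(recycled),
                None => {
                    keys.push(self.next);
                    self.next += 1;
                }
            }
        }
        keys
    }

    // Keys stay valid until returned here, then become reusable.
    pub fn free_many(&mut self, keys: &[PhysicalKey]) {
        self.free_list.extend_from_slice(keys);
    }
}
```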

Sources: llkv-column-map/src/store/core.rs:990-1005 llkv-column-map/src/store/core.rs:563-587


```mermaid
graph TB
    subgraph "User Space"
        Pager["Pager API"]
        EntryHandle["EntryHandle\n(zero-copy view)"]
    end

    subgraph "SIMD R-Drive"
        PageTable["Page Table\n(key → page mapping)"]
        FreeList["Free List\n(available keys)"]
        Allocator["Allocator\n(SIMD-optimized)"]
    end

    subgraph "Operating System"
        MMap["mmap()\nsystem call"]
        FileBackend["Backing File"]
    end

    Pager --> EntryHandle
    Pager --> PageTable
    Pager --> FreeList
    PageTable --> Allocator
    FreeList --> Allocator

    Allocator --> MMap
    MMap --> FileBackend
    EntryHandle -.references.-> MMap
```

SIMD R-Drive Implementation

The simd-r-drive crate (version 0.15.5) provides a memory-mapped key-value store optimized with SIMD instructions.

Memory-Mapped Architecture

Sources: Cargo.toml:85-86 llkv-column-map/src/store/core.rs:22-25

EntryHandle: Zero-Copy Blobs

The EntryHandle type implements AsRef<[u8]> and provides direct access to memory-mapped regions without copying data.

When deserializing Arrow arrays, the system reads directly from the mapped memory.
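A minimal sketch of what such a zero-copy handle can look like, using an Arc<Vec<u8>> to stand in for the memory-mapped file; the real EntryHandle comes from simd-r-drive and its internals differ:

```rust
use std::sync::Arc;

// Hypothetical EntryHandle-like type: a cheap view into a shared buffer
// standing in for a memory-mapped region. Field names are illustrative.
pub struct EntryHandle {
    mmap: Arc<Vec<u8>>, // shared "mapped file"
    offset: usize,
    len: usize,
}

impl EntryHandle {
    pub fn new(mmap: Arc<Vec<u8>>, offset: usize, len: usize) -> Self {
        EntryHandle { mmap, offset, len }
    }
}

impl AsRef<[u8]> for EntryHandle {
    fn as_ref(&self) -> &[u8] {
        // No copy: just a borrowed slice of the shared region.
        &self.mmap[self.offset..self.offset + self.len]
    }
}

// A deserializer can accept any AsRef<[u8]> and read the bytes in place,
// whether they come from a memory map or an owned buffer.
pub fn read_magic(blob: &impl AsRef<[u8]>) -> Option<[u8; 4]> {
    blob.as_ref().get(..4)?.try_into().ok()
}
```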

Sources: llkv-column-map/src/store/core.rs:60-89 llkv-column-map/src/store/core.rs:196-210

SIMD Optimization Benefits

The SIMD R-Drive uses vectorized instructions for:

  1. Page Scanning : Parallel search for keys in page tables
  2. Free List Management : Vectorized bitmap operations for allocation
  3. Data Movement : SIMD memcpy for large blobs

While the Rust code in this repository doesn’t directly show SIMD instructions, the external simd-r-drive dependency provides these optimizations transparently through the Pager trait.

Sources: Cargo.toml:85 Cargo.lock:671-678


Integration with ColumnStore

The ColumnStore uses the pager for all persistent state, following consistent patterns across operations.

```mermaid
graph LR
    Catalog["ColumnCatalog\n(at key 0)"]
    Catalog -->|LogicalFieldId → pk_data| DataDesc["Data Descriptor\n(at pk_data)"]
    Catalog -->|LogicalFieldId → pk_rid| RidDesc["RowID Descriptor\n(at pk_rid)"]
    DataDesc -->|linked list| Page1["Descriptor Page\nChunkMetadata[]"]
    Page1 -->|next_page_pk| Page2["Descriptor Page\nChunkMetadata[]"]
    RidDesc -->|linked list| Page3["Descriptor Page\nChunkMetadata[]"]
```

Storage Pattern: Column Descriptors

Each column has two descriptors (one for data, one for row IDs) stored at fixed physical keys tracked in the catalog.
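A hypothetical sketch of this catalog-to-descriptor mapping; the field and type names are illustrative, not llkv's actual definitions:

```rust
use std::collections::HashMap;

// Illustrative sketch: the catalog maps each logical field to the fixed
// physical keys of its data and row-id descriptors.
type PhysicalKey = u64;
type LogicalFieldId = u32;

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct DescriptorKeys {
    pub pk_data: PhysicalKey, // data descriptor location
    pub pk_rid: PhysicalKey,  // row-id descriptor location
}

pub struct ColumnCatalog {
    pub columns: HashMap<LogicalFieldId, DescriptorKeys>,
}

impl ColumnCatalog {
    // Look up both descriptor keys for a column in one step.
    pub fn descriptor_keys(&self, field: LogicalFieldId) -> Option<DescriptorKeys> {
        self.columns.get(&field).copied()
    }
}
```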

Sources: llkv-column-map/src/store/core.rs:100-124 llkv-column-map/src/store/core.rs:234-344

```mermaid
graph TB
    ChunkMeta["ChunkMetadata"]
    ChunkMeta -->|chunk_pk| DataBlob["Serialized Arrow Array\n(at chunk_pk)"]
    ChunkMeta -->|value_order_perm_pk| PermBlob["Sort Permutation\n(at perm_pk)"]
    DataBlob -.->|deserialized to| ArrayRef["ArrayRef\n(in-memory)"]
```

Storage Pattern: Data Chunks

Each chunk of Arrow array data is stored at a physical key referenced by ChunkMetadata.

Sources: llkv-column-map/src/store/core.rs:988-1006 llkv-column-map/src/store/descriptor.rs (implied)

```mermaid
sequenceDiagram
    participant App as Application
    participant CS as ColumnStore
    participant Pager as Pager

    App->>CS: append(RecordBatch)

    Note over CS: Stage Phase
    CS->>CS: serialize data chunks
    CS->>CS: serialize row-id chunks
    CS->>CS: update descriptors
    CS->>CS: update catalog
    CS->>CS: accumulate BatchPut[]

    Note over CS: Commit Phase
    CS->>Pager: batch_put(all_puts)
    Pager->>Pager: atomic commit

    alt Success
        Pager-->>CS: Ok(())
        CS->>CS: increment append_epoch
        CS-->>App: Ok(())
    else Failure
        Pager-->>CS: Err(...)
        CS-->>App: Err(...)
        Note over CS: No changes visible
    end
```

Transaction Boundary Example

The append operation demonstrates how the pager enables atomic multi-key transactions.

Sources: llkv-column-map/src/store/core.rs:787-1126

The key insight: all puts are staged in memory (Vec<BatchPut>) until a single atomic commit at line 1121. This ensures that partial failures leave the store in a consistent state.


Performance Characteristics

Batch Size Recommendations

The ColumnStore follows these heuristics:

| Operation | Typical Batch Size | Justification |
| --- | --- | --- |
| Catalog Load | 1 key | Single root key at startup |
| Descriptor Load | 2 keys | Data + RowID descriptors |
| Chunk Append | 10-100 keys | Multiple columns × chunks per column |
| Chunk Rewrite | 1-10 keys | Targeted LWW updates |
| Full Scan | 100+ keys | Prefetch all chunks for a column |

Sources: llkv-column-map/src/store/core.rs:100-110 llkv-column-map/src/store/core.rs:1117-1125

Memory Mapping Benefits

Memory-mapped storage provides:

  1. Zero-Copy Deserialization : Arrow arrays reference mapped memory directly
  2. OS-Level Caching : The kernel manages page cache automatically
  3. Lazy Loading : Pages are only faulted in when accessed

The EntryHandle type enables these patterns by avoiding buffer copies during reads.

Sources: llkv-column-map/src/store/core.rs:60-85 llkv-column-map/src/store/scan/filter.rs:18

Allocation Strategies

The system uses bulk allocation to reduce pager round-trips.

Sources: llkv-column-map/src/store/core.rs:989-1006


Error Handling and Recovery

Allocation Failures

When allocation fails mid-operation, the system attempts to roll back.
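A hedged sketch of this rollback pattern with a simulated allocation failure; the names and the failure mechanism are illustrative, not llkv's actual error handling:

```rust
// Hypothetical sketch: keys acquired before a mid-operation failure are
// returned to the free list so the store stays consistent.
type PhysicalKey = u64;

pub struct FakePager {
    next: PhysicalKey,
    pub freed: Vec<PhysicalKey>,
    fail_after: usize, // simulate an allocation failure for testing
}

impl FakePager {
    pub fn new(fail_after: usize) -> Self {
        FakePager { next: 1, freed: Vec::new(), fail_after }
    }

    fn alloc(&mut self) -> Result<PhysicalKey, String> {
        if self.fail_after == 0 {
            return Err("allocation failed".to_string());
        }
        self.fail_after -= 1;
        let key = self.next;
        self.next += 1;
        Ok(key)
    }

    fn free_many(&mut self, keys: &[PhysicalKey]) {
        self.freed.extend_from_slice(keys);
    }
}

// Allocate n keys, or roll back every key acquired so far on failure.
pub fn alloc_chunks(pager: &mut FakePager, n: usize) -> Result<Vec<PhysicalKey>, String> {
    let mut acquired = Vec::new();
    for _ in 0..n {
        match pager.alloc() {
            Ok(key) => acquired.push(key),
            Err(e) => {
                // Roll back: release everything acquired so far.
                pager.free_many(&acquired);
                return Err(e);
            }
        }
    }
    Ok(acquired)
}
```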

Sources: llkv-column-map/src/store/core.rs:563-587

Batch Operation Atomicity

The pager guarantees that either all keys in a batch_put are written or none are. This prevents partial writes from corrupting the storage state. The ColumnStore relies on this guarantee to implement transactional append and update operations.

Sources: llkv-column-map/src/store/core.rs:1117-1125


Summary

The Pager interface provides a minimal, batch-oriented abstraction over persistent key-value storage. The SIMD R-Drive implementation uses memory-mapped files and vectorized operations to optimize common patterns like bulk reads and zero-copy deserialization. By staging all writes and committing atomically, the ColumnStore builds transactional semantics on top of the simple key-value model.

Key Takeaways:

  • All I/O flows through batch operations (batch_get, batch_put)
  • Physical keys (u64) uniquely identify all persistent objects
  • EntryHandle enables zero-copy access to memory-mapped data
  • Bulk allocation (alloc_many) minimizes pager round-trips
  • Atomic batch commits enable transactional storage operations

Sources: llkv-column-map/src/store/core.rs:60-1126 Cargo.toml:85-86 llkv-storage (trait definitions implied by usage)
