This documentation is part of the "Projects with Books" initiative at zenOSmosis.
The source code for this project is available on GitHub.
Pager Interface and SIMD Optimization
Purpose and Scope
This document describes the storage abstraction layer that provides persistent key-value storage for the columnar data model. The Pager trait defines a generic interface for batch-oriented storage operations, while the SIMD R-Drive library provides a memory-mapped, SIMD-optimized concrete implementation.
For column-level storage operations built on top of the pager, see Column Storage and ColumnStore. For table-level abstractions, see Table Abstraction.
System Architecture
The storage system follows a layered design where the Pager provides a simple key-value abstraction that the ColumnStore builds upon. All persistence operations flow through batch APIs to minimize I/O overhead and enable atomic commits.
```mermaid
graph TB
subgraph "Column Storage Layer"
ColumnStore["ColumnStore<P: Pager>"]
ColumnCatalog["ColumnCatalog"]
ColumnDescriptor["ColumnDescriptor"]
ChunkMetadata["ChunkMetadata"]
end
subgraph "Pager Abstraction Layer"
PagerTrait["Pager Trait"]
BatchGet["BatchGet"]
BatchPut["BatchPut"]
GetResult["GetResult"]
PhysicalKey["PhysicalKey (u64)"]
end
subgraph "SIMD R-Drive Implementation"
SimdRDrive["simd-r-drive 0.15.5"]
EntryHandle["EntryHandle"]
MemoryMap["Memory-Mapped Files"]
SimdOps["SIMD Operations"]
end
ColumnStore --> PagerTrait
ColumnStore --> BatchGet
ColumnStore --> BatchPut
ColumnCatalog --> PagerTrait
ColumnDescriptor --> PagerTrait
ChunkMetadata --> PhysicalKey
PagerTrait --> SimdRDrive
BatchGet --> PhysicalKey
BatchPut --> PhysicalKey
GetResult --> EntryHandle
SimdRDrive --> EntryHandle
SimdRDrive --> MemoryMap
SimdRDrive --> SimdOps
EntryHandle --> MemoryMap
```
Storage Layering
Sources: llkv-column-map/src/store/core.rs:1-89 Cargo.toml:85-86
The Pager Trait
The Pager trait defines the storage interface used throughout the system. It abstracts over key-value stores with batch-oriented operations and explicit memory management.
Core Operations
| Operation | Purpose | Batch Size |
|---|---|---|
| `batch_get(&[BatchGet])` | Retrieve multiple keys atomically | 1-N keys |
| `batch_put(&[BatchPut])` | Write multiple keys atomically | 1-N keys |
| `alloc()` | Allocate a single physical key | 1 key |
| `alloc_many(n)` | Allocate multiple physical keys | N keys |
| `free(PhysicalKey)` | Deallocate a single key | 1 key |
| `free_many(&[PhysicalKey])` | Deallocate multiple keys | N keys |
Type Parameters
The Pager trait is generic over the blob type returned from reads:
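A minimal sketch of the trait shape, reconstructed from the operations listed above; the exact signatures, the error type, and the `Missing` variant name are assumptions rather than the crate's actual definitions:

```rust
pub type PhysicalKey = u64;

pub enum BatchGet {
    Raw { key: PhysicalKey },
}

pub enum BatchPut {
    Raw { key: PhysicalKey, bytes: Vec<u8> },
}

pub enum GetResult<B> {
    Raw { key: PhysicalKey, bytes: B },
    Missing { key: PhysicalKey }, // implied by usage; the name is an assumption
}

pub trait Pager {
    /// Blob type returned by reads, bounded so callers can borrow raw bytes.
    type Blob: AsRef<[u8]>;
    type Error;

    fn batch_get(&self, gets: &[BatchGet]) -> Result<Vec<GetResult<Self::Blob>>, Self::Error>;
    fn batch_put(&self, puts: &[BatchPut]) -> Result<(), Self::Error>;
    fn alloc(&self) -> Result<PhysicalKey, Self::Error>;
    fn alloc_many(&self, n: usize) -> Result<Vec<PhysicalKey>, Self::Error>;
    fn free(&self, key: PhysicalKey) -> Result<(), Self::Error>;
    fn free_many(&self, keys: &[PhysicalKey]) -> Result<(), Self::Error>;
}
```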
The SIMD R-Drive implementation uses EntryHandle as the blob type, which provides zero-copy access to memory-mapped regions without allocating new buffers.
Sources: llkv-column-map/src/store/core.rs:60-68 llkv-column-map/src/store/core.rs:89-124
Physical Keys
Physical keys are 64-bit unsigned integers (u64) that uniquely identify storage locations. The system partitions the key space for different purposes:
- **Catalog Root**: Key `0` stores the `ColumnCatalog`
- **Column Descriptors**: User-allocated keys for column metadata
- **Data Chunks**: User-allocated keys for serialized Arrow arrays
- **Index Structures**: User-allocated keys for permutations and bitmaps
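A small illustration of this partitioning, assuming a reserved root constant (the name `CATALOG_ROOT_PKEY` is hypothetical; the real definition is implied to live in llkv-storage/src/constants.rs):

```rust
// Hypothetical constant name; PhysicalKey is the u64 alias sketched above.
pub const CATALOG_ROOT_PKEY: u64 = 0; // key 0: ColumnCatalog root

// Every other key (descriptors, chunks, index blobs) is handed out by the
// pager's allocator and recorded in the catalog or in descriptor pages.
```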
Sources: llkv-storage/src/constants.rs (implied), llkv-column-map/src/store/core.rs:100-124
Batch Operations
All storage I/O uses batch operations to minimize system call overhead and enable atomic multi-key transactions.
BatchGet Flow
```mermaid
sequenceDiagram
participant CS as ColumnStore
participant Pager as Pager
participant SIMD as SIMD R-Drive
participant MM as Memory Map
CS->>Pager: batch_get([BatchGet::Raw{key: 42}, ...])
Pager->>SIMD: batch read request
SIMD->>MM: locate pages for keys
MM->>SIMD: zero-copy EntryHandles
SIMD->>Pager: Vec<GetResult>
Pager->>CS: Results with EntryHandles
CS->>CS: deserialize Arrow arrays
```
Sources: llkv-column-map/src/store/core.rs:100-110 llkv-column-map/src/store/core.rs:414-425
The BatchGet enum specifies requests:

| Variant | Purpose |
|---|---|
| `Raw { key: PhysicalKey }` | Retrieve raw bytes for a key |

The GetResult enum returns outcomes:

| Variant | Meaning |
|---|---|
| `Raw { key: PhysicalKey, bytes: Blob }` | Successfully retrieved bytes |
| (others implied by usage) | Key not found, or other statuses |
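Building on the trait sketch above, a hedged usage example that issues a one-key read and unpacks the outcome:

```rust
fn load_blob<P: Pager>(pager: &P, key: PhysicalKey) -> Result<Option<P::Blob>, P::Error> {
    // One-element batch; real callers typically batch many keys at once.
    let results = pager.batch_get(&[BatchGet::Raw { key }])?;
    Ok(results.into_iter().find_map(|r| match r {
        GetResult::Raw { bytes, .. } => Some(bytes),
        GetResult::Missing { .. } => None, // key not found
    }))
}
```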
BatchPut Flow
Sources: llkv-column-map/src/store/core.rs:217-222 llkv-column-map/src/store/core.rs:1117-1125
The BatchPut enum specifies writes:

| Variant | Purpose |
|---|---|
| `Raw { key: PhysicalKey, bytes: Vec<u8> }` | Write raw bytes to a key |
Why Batching Matters
Batching provides three critical benefits:
- **Atomicity**: All keys in a batch are committed together or not at all
- **Performance**: Amortizes system call overhead across many keys
- **Prefetching**: Enables SIMD R-Drive to optimize I/O patterns
The ColumnStore::append method demonstrates this pattern—it stages hundreds of puts (data chunks, descriptors, catalog updates) and commits them atomically at the end:
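A condensed sketch of that stage-then-commit shape, using the trait sketch above (simplified; not the actual implementation):

```rust
fn append_blobs<P: Pager>(pager: &P, blobs: Vec<Vec<u8>>) -> Result<(), P::Error> {
    // Stage: allocate keys and build the put list entirely in memory.
    let keys = pager.alloc_many(blobs.len())?;
    let puts: Vec<BatchPut> = keys
        .into_iter()
        .zip(blobs)
        .map(|(key, bytes)| BatchPut::Raw { key, bytes })
        .collect();

    // Commit: one atomic batch_put makes everything visible together.
    pager.batch_put(&puts)
}
```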
Sources: llkv-column-map/src/store/core.rs:1117-1125
Physical Key Allocation
The pager manages a free-list of physical keys. Allocations are persistent—once allocated, a key remains valid until explicitly freed.
Allocation Patterns
```mermaid
graph TD
subgraph "Allocation Operations"
A1["alloc() → PhysicalKey"]
A2["alloc_many(n) → Vec<PhysicalKey>"]
end
subgraph "Deallocation Operations"
F1["free(key)"]
F2["free_many(&[key])"]
end
subgraph "Usage Examples"
Desc["Column Descriptor\n1 key per column"]
Chunks["Data Chunks\nN keys per append"]
Indexes["Indexes\n1 key per index"]
end
A1 --> Desc
A2 --> Chunks
A1 --> Indexes
Desc -.-> F1
Chunks -.-> F2
```
Sources: llkv-column-map/src/store/core.rs:250-263 llkv-column-map/src/store/core.rs:989-1006
Allocation Examples
The ColumnStore::append method allocates keys in bulk:
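Illustratively, counting every staged chunk up front lets one `alloc_many` call cover all columns (stand-in types; not the real method):

```rust
fn keys_for_append<P: Pager>(
    pager: &P,
    chunks_per_column: &[usize], // staged chunk counts, one entry per column
) -> Result<Vec<PhysicalKey>, P::Error> {
    let total: usize = chunks_per_column.iter().sum();
    pager.alloc_many(total) // one round-trip instead of `total` alloc() calls
}
```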
The ColumnStore::remove_column method frees all keys associated with a column:
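A matching tear-down sketch; the parameter grouping is illustrative, and the point is that a single `free_many` call covers chunk blobs, permutation blobs, and the descriptor pages themselves:

```rust
fn remove_column_storage<P: Pager>(
    pager: &P,
    chunk_pks: Vec<PhysicalKey>,            // chunk_pk from each ChunkMetadata
    perm_pks: Vec<PhysicalKey>,             // any sort-permutation blobs
    descriptor_page_pks: Vec<PhysicalKey>,  // the descriptor pages themselves
) -> Result<(), P::Error> {
    let mut keys = chunk_pks;
    keys.extend(perm_pks);
    keys.extend(descriptor_page_pks);
    pager.free_many(&keys) // one batched deallocation
}
```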
Sources: llkv-column-map/src/store/core.rs:990-1005 llkv-column-map/src/store/core.rs:563-587
SIMD R-Drive Implementation
The simd-r-drive crate (version 0.15.5) provides a memory-mapped key-value store optimized with SIMD instructions.
Memory-Mapped Architecture
```mermaid
graph TB
subgraph "User Space"
Pager["Pager API"]
EntryHandle["EntryHandle\n(zero-copy view)"]
end
subgraph "SIMD R-Drive"
PageTable["Page Table\n(key → page mapping)"]
FreeList["Free List\n(available keys)"]
Allocator["Allocator\n(SIMD-optimized)"]
end
subgraph "Operating System"
MMap["mmap()\nsystem call"]
FileBackend["Backing File"]
end
Pager --> EntryHandle
Pager --> PageTable
Pager --> FreeList
PageTable --> Allocator
FreeList --> Allocator
Allocator --> MMap
MMap --> FileBackend
EntryHandle -.references.-> MMap
```
Sources: Cargo.toml:85-86 llkv-column-map/src/store/core.rs:22-25
EntryHandle: Zero-Copy Blobs
The EntryHandle type implements AsRef<[u8]> and provides direct access to memory-mapped regions without copying data:
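Since the blob type is only required to implement AsRef<[u8]>, callers can borrow the mapped bytes without copying, as in this trivial sketch:

```rust
fn first_bytes<B: AsRef<[u8]>>(blob: &B, n: usize) -> &[u8] {
    // Borrows straight from the memory-mapped region; no allocation occurs.
    let bytes = blob.as_ref();
    &bytes[..n.min(bytes.len())]
}
```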
When deserializing Arrow arrays, the system reads directly from the mapped memory:
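A hedged sketch of reading an Arrow batch straight out of a mapped blob, assuming Arrow IPC stream framing (the repository's actual chunk format is not shown here, and how much of the decode is truly zero-copy depends on the IPC reader):

```rust
use std::io::Cursor;

use arrow::ipc::reader::StreamReader;
use arrow::record_batch::RecordBatch;

fn read_chunk(blob: &impl AsRef<[u8]>) -> arrow::error::Result<Option<RecordBatch>> {
    // Cursor borrows the mapped bytes; the reader parses them in place.
    let mut reader = StreamReader::try_new(Cursor::new(blob.as_ref()), None)?;
    reader.next().transpose()
}
```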
Sources: llkv-column-map/src/store/core.rs:60-89 llkv-column-map/src/store/core.rs:196-210
SIMD Optimization Benefits
The SIMD R-Drive uses vectorized instructions for:
- **Page Scanning**: Parallel search for keys in page tables
- **Free List Management**: Vectorized bitmap operations for allocation
- **Data Movement**: SIMD memcpy for large blobs
While the Rust code in this repository doesn’t directly show SIMD instructions, the external simd-r-drive dependency provides these optimizations transparently through the Pager trait.
Sources: Cargo.toml:85 Cargo.lock:671-678
Integration with ColumnStore
The ColumnStore uses the pager for all persistent state, following consistent patterns across operations.
Storage Pattern: Column Descriptors
Each column has two descriptors (one for data, one for row IDs) stored at fixed physical keys tracked in the catalog:
```mermaid
graph LR
Catalog["ColumnCatalog\n(at key 0)"]
Catalog -->|LogicalFieldId → pk_data| DataDesc["Data Descriptor\n(at pk_data)"]
Catalog -->|LogicalFieldId → pk_rid| RidDesc["RowID Descriptor\n(at pk_rid)"]
DataDesc -->|linked list| Page1["Descriptor Page\nChunkMetadata[]"]
Page1 -->|next_page_pk| Page2["Descriptor Page\nChunkMetadata[]"]
RidDesc -->|linked list| Page3["Descriptor Page\nChunkMetadata[]"]
```
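An illustrative catalog-entry shape following the diagram's labels (the field names and the `LogicalFieldId` alias are assumptions):

```rust
pub type PhysicalKey = u64;
pub type LogicalFieldId = u64; // assumption: the real type may be a newtype

/// One catalog row: a logical field maps to the fixed keys of its two
/// descriptors (data and row-id), per the diagram above.
pub struct CatalogEntry {
    pub field_id: LogicalFieldId,
    pub pk_data: PhysicalKey,
    pub pk_rid: PhysicalKey,
}
```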
Sources: llkv-column-map/src/store/core.rs:100-124 llkv-column-map/src/store/core.rs:234-344
Storage Pattern: Data Chunks
Each chunk of Arrow array data is stored at a physical key referenced by ChunkMetadata:
```mermaid
graph TB
ChunkMeta["ChunkMetadata"]
ChunkMeta -->|chunk_pk| DataBlob["Serialized Arrow Array\n(at chunk_pk)"]
ChunkMeta -->|value_order_perm_pk| PermBlob["Sort Permutation\n(at perm_pk)"]
DataBlob -.->|deserialized to| ArrayRef["ArrayRef\n(in-memory)"]
```
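An illustrative ChunkMetadata shape reconstructed from the diagram's labels (the real struct carries additional fields, e.g. row counts and statistics):

```rust
pub type PhysicalKey = u64;

pub struct ChunkMetadata {
    /// Physical key of the serialized Arrow array for this chunk.
    pub chunk_pk: PhysicalKey,
    /// Optional physical key of the chunk's value-order sort permutation.
    pub value_order_perm_pk: Option<PhysicalKey>,
}
```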
Sources: llkv-column-map/src/store/core.rs:988-1006 llkv-column-map/src/store/descriptor.rs (implied)
Transaction Boundary Example
The append operation demonstrates how the pager enables atomic multi-key transactions:
```mermaid
sequenceDiagram
participant App as Application
participant CS as ColumnStore
participant Pager as Pager
App->>CS: append(RecordBatch)
Note over CS: Stage Phase
CS->>CS: serialize data chunks
CS->>CS: serialize row-id chunks
CS->>CS: update descriptors
CS->>CS: update catalog
CS->>CS: accumulate BatchPut[]
Note over CS: Commit Phase
CS->>Pager: batch_put(all_puts)
Pager->>Pager: atomic commit
alt Success
Pager-->>CS: Ok(())
CS->>CS: increment append_epoch
CS-->>App: Ok(())
else Failure
Pager-->>CS: Err(...)
CS-->>App: Err(...)
Note over CS: No changes visible
end
```
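A condensed sketch of the commit phase shown in the diagram (the `append_epoch` bookkeeping follows the Success branch; names and structure are simplifications):

```rust
fn commit_append<P: Pager>(
    pager: &P,
    staged_puts: Vec<BatchPut>,
    append_epoch: &mut u64,
) -> Result<(), P::Error> {
    // Nothing is visible until this single call succeeds.
    pager.batch_put(&staged_puts)?;
    *append_epoch += 1; // Success branch: newly appended data becomes visible
    Ok(())
    // On Err, the staged puts are simply dropped: no changes are visible.
}
```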
Sources: llkv-column-map/src/store/core.rs:787-1126
The key insight: all puts are staged in memory (`Vec<BatchPut>`) until a single atomic commit (llkv-column-map/src/store/core.rs:1121). This ensures that partial failures leave the store in a consistent state.
Performance Characteristics
Batch Size Recommendations
The ColumnStore follows these heuristics:
| Operation | Typical Batch Size | Justification |
|---|---|---|
| Catalog Load | 1 key | Single root key at startup |
| Descriptor Load | 2 keys | Data + RowID descriptors |
| Chunk Append | 10-100 keys | Multiple columns × chunks per column |
| Chunk Rewrite | 1-10 keys | Targeted LWW updates |
| Full Scan | 100+ keys | Prefetch all chunks for a column |
Sources: llkv-column-map/src/store/core.rs:100-110 llkv-column-map/src/store/core.rs:1117-1125
Memory Mapping Benefits
Memory-mapped storage provides:
- **Zero-Copy Deserialization**: Arrow arrays reference mapped memory directly
- **OS-Level Caching**: The kernel manages the page cache automatically
- **Lazy Loading**: Pages are only faulted in when accessed
The EntryHandle type enables these patterns by avoiding buffer copies during reads.
Sources: llkv-column-map/src/store/core.rs:60-85 llkv-column-map/src/store/scan/filter.rs:18
Allocation Strategies
The system uses bulk allocation to reduce pager round-trips:
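The round-trip difference, sketched (illustrative only):

```rust
fn allocate_bulk<P: Pager>(pager: &P, n: usize) -> Result<Vec<PhysicalKey>, P::Error> {
    // Preferred: one pager round-trip for n keys.
    pager.alloc_many(n)
    // Avoided: n separate round-trips.
    // (0..n).map(|_| pager.alloc()).collect::<Result<Vec<_>, _>>()
}
```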
Sources: llkv-column-map/src/store/core.rs:989-1006
Error Handling and Recovery
Allocation Failures
When allocation fails mid-operation, the system attempts to roll back:
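A sketch of the roll-back shape, with the staging work abstracted behind a hypothetical closure:

```rust
fn guarded_operation<P: Pager>(
    pager: &P,
    needed: usize,
    stage_and_commit: impl FnOnce(&[PhysicalKey]) -> Result<(), P::Error>, // hypothetical
) -> Result<(), P::Error> {
    let keys = pager.alloc_many(needed)?;
    if let Err(e) = stage_and_commit(&keys) {
        // Roll back: return the freshly allocated keys so they are not leaked.
        pager.free_many(&keys)?;
        return Err(e);
    }
    Ok(())
}
```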
Sources: llkv-column-map/src/store/core.rs:563-587
Batch Operation Atomicity
The pager guarantees that either all keys in a batch_put are written or none are. This prevents partial writes from corrupting the storage state. The ColumnStore relies on this guarantee to implement transactional append and update operations.
Sources: llkv-column-map/src/store/core.rs:1117-1125
Summary
The Pager interface provides a minimal, batch-oriented abstraction over persistent key-value storage. The SIMD R-Drive implementation uses memory-mapped files and vectorized operations to optimize common patterns like bulk reads and zero-copy deserialization. By staging all writes and committing atomically, the ColumnStore builds transactional semantics on top of the simple key-value model.
Key Takeaways:
- All I/O flows through batch operations (`batch_get`, `batch_put`)
- Physical keys (`u64`) uniquely identify all persistent objects
- `EntryHandle` enables zero-copy access to memory-mapped data
- Bulk allocation (`alloc_many`) minimizes pager round-trips
- Atomic batch commits enable transactional storage operations
Sources: llkv-column-map/src/store/core.rs:60-1126 Cargo.toml:85-86 llkv-storage (trait definitions implied by usage)