
Pager Interface and SIMD Optimization

Purpose and Scope

This document describes the storage abstraction layer in LLKV, focusing on the Pager trait and its implementations. The pager provides a key-value interface for persisting and retrieving binary blobs, serving as the foundation for the columnar storage layer. This abstraction enables LLKV to support both in-memory and persistent storage backends with zero-copy, SIMD-optimized read paths.

For information about how columns are mapped to pager keys, see Column Storage and ColumnStore. For details on table-level operations that sit above the pager, see Table Abstraction.


Pager Trait Contract

The Pager trait defines the storage abstraction used throughout LLKV. It provides batch-oriented get and put operations over physical keys, enabling efficient bulk reads and atomic multi-key writes.

Core Interface

The pager trait exposes the following fundamental operations:

| Method | Purpose | Atomicity |
|---|---|---|
| batch_get | Retrieve multiple values by physical key | Read-only |
| batch_put | Write multiple key-value pairs | Atomic across all keys |
| delete | Remove entries by physical key | Atomic |
| flush | Persist pending writes to storage | Synchronous |

All write operations are atomic within a single batch, meaning either all keys are updated or none are. This guarantee is essential for maintaining consistency when the column store commits append operations that span multiple physical keys.
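
The exact signatures live in llkv-storage; the following is only a minimal sketch of the contract described above, with illustrative associated types and return shapes rather than the crate's actual definitions:

```rust
/// Illustrative sketch of the Pager contract; the real trait in
/// llkv-storage may differ in names and signatures.
pub type PhysicalKey = u64;

/// Handle to a stored value. A persistent pager can back this with a
/// pointer into a memory-mapped region, making reads zero-copy.
pub trait EntryHandle {
    fn as_bytes(&self) -> &[u8];
}

pub trait Pager {
    type Handle: EntryHandle;
    type Error;

    /// Fetch many values in one round trip; read-only.
    fn batch_get(&self, keys: &[PhysicalKey])
        -> Result<Vec<Option<Self::Handle>>, Self::Error>;

    /// Write many key-value pairs; either all succeed or none do.
    fn batch_put(&self, entries: &[(PhysicalKey, Vec<u8>)])
        -> Result<(), Self::Error>;

    /// Remove entries; atomic across the batch.
    fn delete(&self, keys: &[PhysicalKey]) -> Result<(), Self::Error>;

    /// Persist pending writes (e.g. fsync) before returning.
    fn flush(&self) -> Result<(), Self::Error>;
}
```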

Pager Trait Architecture

graph TB
    subgraph "Pager Trait"
        TRAIT["Pager Trait\nbatch_get\nbatch_put\ndelete\nflush"]
    end

    subgraph "Implementations"
        MEMPAGER["MemPager\nHashMap<PhysicalKey, Vec<u8>>"]
        SIMDPAGER["SimdRDrivePager\nsimd_r_drive::DataStore\nZero-copy reads\nSIMD-aligned buffers"]
    end

    subgraph "Used By"
        COLSTORE["ColumnStore\nllkv-column-map"]
        CATALOG["Column Catalog\nMetadata persistence"]
    end

    TRAIT --> MEMPAGER
    TRAIT --> SIMDPAGER
    COLSTORE --> TRAIT
    CATALOG --> TRAIT
Sources: llkv-storage/README.md:12-22


Physical Keys and Entry Handles

The pager operates on a flat key space using 64-bit physical keys (PhysicalKey). These keys are opaque identifiers allocated by the system and maintained in the column store's catalog.

Key-Value Model

Physical Key Space Model

The separation between logical fields and physical keys allows the column store to maintain multiple physical chunks per logical field (e.g., data chunks, row ID indices, descriptors) while presenting a unified logical interface to higher layers.
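
As a concrete illustration of that fan-out, a catalog entry might associate one logical field with several physical keys. The struct and field names below are invented for this sketch (PhysicalKey comes from the trait sketch above):

```rust
pub type LogicalFieldId = u32;

/// Hypothetical catalog entry: one logical column owns several
/// physical blobs in the pager's flat key space.
struct FieldChunks {
    descriptor_key: PhysicalKey,   // chunk metadata
    data_keys: Vec<PhysicalKey>,   // serialized Arrow data chunks
    row_id_keys: Vec<PhysicalKey>, // row-id index chunks
}

struct Catalog {
    fields: std::collections::HashMap<LogicalFieldId, FieldChunks>,
}
```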

Sources: llkv-column-map/README.md:18-22


Batch Operations and Performance

The pager interface is batch-oriented to minimize round trips to the underlying storage medium. This design is particularly important for:

  1. Column scans: Fetching multiple column chunks in a single operation reduces latency
  2. Append operations: Writing descriptor, data, and row ID chunks atomically
  3. Catalog updates: Persisting metadata changes alongside data changes

Batch Get Semantics

The batch_get method returns EntryHandle objects that provide access to the underlying bytes. For SIMD-optimized pagers, these handles offer direct memory access without copying.
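
Continuing the illustrative sketch from above, a caller can fetch several chunks in one call and borrow each value's bytes without copying:

```rust
// Usage sketch against the illustrative Pager trait above.
fn read_chunks<P: Pager>(pager: &P, keys: &[PhysicalKey]) -> Result<(), P::Error> {
    let handles = pager.batch_get(keys)?;
    for (key, handle) in keys.iter().zip(&handles) {
        match handle {
            // For an mmap-backed pager, `as_bytes` is a view into the
            // mapped file; no allocation or copy happens here.
            Some(h) => println!("key {key}: {} bytes", h.as_bytes().len()),
            None => println!("key {key}: not found"),
        }
    }
    Ok(())
}
```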

Batch Put Semantics

Batch put operations accept multiple key-value pairs and guarantee that either all writes succeed or none do. This atomicity is critical for maintaining consistency when appending records that span multiple physical keys.

| Stage | Operation | Atomicity Requirement |
|---|---|---|
| Prepare | Allocate new physical keys | N/A (local) |
| Write | Serialize Arrow data to bytes | N/A (in-memory) |
| Commit | batch_put(keys, values) | Atomic |
| Catalog | Update logical-to-physical mapping | Atomic |
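
In code, these stages might collapse into something like the following sketch, where only the single batch_put touches durable storage (the key-allocation helper is hypothetical):

```rust
/// Sketch of the Prepare/Write/Commit stages; `alloc` stands in for
/// the column store's real key allocator.
fn commit_append<P: Pager>(
    pager: &P,
    descriptor: Vec<u8>, // serialized descriptor chunk
    data: Vec<u8>,       // serialized Arrow data chunk
    row_ids: Vec<u8>,    // serialized row-id index chunk
    mut alloc: impl FnMut() -> PhysicalKey,
) -> Result<[PhysicalKey; 3], P::Error> {
    // Prepare: allocate keys locally; no I/O yet.
    let keys = [alloc(), alloc(), alloc()];
    // Commit: one batch_put, so all three chunks land together or
    // not at all.
    pager.batch_put(&[
        (keys[0], descriptor),
        (keys[1], data),
        (keys[2], row_ids),
    ])?;
    Ok(keys)
}
```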

Sources: llkv-storage/README.md:12-16 llkv-column-map/README.md:24-28


MemPager Implementation

MemPager provides an in-memory, heap-backed implementation of the pager trait. It is used for:

  • Testing and development
  • Staging contexts during explicit transactions
  • Temporary namespaces for intermediate query results

Architecture

MemPager Internal Structure

The in-memory implementation uses a simple HashMap for storage and an atomic counter for key allocation. Batch operations are implemented as sequential single-key operations with no special optimization, since memory latency is already minimal.
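
A minimal version of such an implementation, assuming the illustrative trait sketched earlier, could be as simple as a mutex-guarded map plus an atomic key counter. This is a sketch, not the actual llkv-storage type:

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Mutex;

/// Sketch of an in-memory pager in the spirit of MemPager.
pub struct MemPager {
    entries: Mutex<HashMap<PhysicalKey, Vec<u8>>>,
    next_key: AtomicU64, // atomic counter for key allocation
}

impl MemPager {
    pub fn alloc_key(&self) -> PhysicalKey {
        self.next_key.fetch_add(1, Ordering::Relaxed)
    }

    /// One lock over the whole batch keeps it atomic for readers.
    pub fn batch_put(&self, entries: &[(PhysicalKey, Vec<u8>)]) {
        let mut map = self.entries.lock().unwrap();
        for (key, value) in entries {
            map.insert(*key, value.clone());
        }
    }

    /// Clone out all staged entries (used by the replay sketch below).
    pub fn snapshot(&self) -> Vec<(PhysicalKey, Vec<u8>)> {
        let map = self.entries.lock().unwrap();
        map.iter().map(|(k, v)| (*k, v.clone())).collect()
    }
}
```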

Use in Dual-Context Transactions

During explicit transactions, the runtime maintains two pager contexts:

  1. Persistent pager: SimdRDrivePager backed by disk
  2. Staging pager: MemPager for transaction-local tables

Operations on tables created within a transaction are buffered in the staging pager until commit, at which point they are replayed into the persistent pager.
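
That commit step could be modeled as draining the staging pager into one atomic batch against the persistent pager; snapshot() is the invented helper from the MemPager sketch above:

```rust
/// Hypothetical commit step: replay everything staged in memory into
/// the persistent pager as one atomic batch, then make it durable.
fn replay_staging<P: Pager>(staging: &MemPager, persistent: &P) -> Result<(), P::Error> {
    let batch = staging.snapshot();
    persistent.batch_put(&batch)?;
    persistent.flush()
}
```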

Sources: llkv-storage/README.md:20-21 llkv-runtime/README.md:27-32


SimdRDrivePager and SIMD Optimization

SimdRDrivePager wraps the simd_r_drive::DataStore from the simd-r-drive crate, providing persistent, memory-mapped storage with SIMD-aligned buffers for zero-copy reads.

Zero-Copy Read Path

Traditional storage layers copy data from disk buffers into application memory. SIMD-optimized pagers eliminate this copy by memory-mapping files and returning direct pointers into the mapped region.

Zero-Copy Read Architecture

graph TB
    subgraph "Application"
        COLSTORE["ColumnStore\nArrow deserialization"]
    end

    subgraph "SimdRDrivePager"
        DATASTORE["simd_r_drive::DataStore"]
        ENTRYHANDLE["EntryHandle\nPointer into mmap region"]
    end

    subgraph "Operating System"
        MMAP["Memory-mapped file\nPage cache"]
    end

    subgraph "Disk"
        FILE["Persistent file\nSIMD-aligned blocks"]
    end

    COLSTORE -->|batch_get keys| DATASTORE
    DATASTORE -->|Direct pointer| ENTRYHANDLE
    ENTRYHANDLE -->|Zero-copy access| MMAP
    MMAP -.->|Page fault| FILE
    COLSTORE -->|Arrow::read_from_bytes| ENTRYHANDLE

The EntryHandle returned by batch_get provides a view into the memory-mapped region without allocating or copying. Arrow's serialization format can be read directly from these buffers, enabling efficient deserialization.
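
As one concrete possibility using the Rust arrow crate's IPC stream reader, deserialization can run straight over a handle's bytes; whether the result is fully zero-copy depends on the reader used and on buffer alignment:

```rust
use std::io::Cursor;
use arrow::ipc::reader::StreamReader;

/// Read Arrow record batches from a byte slice, e.g. the bytes
/// behind a pager EntryHandle.
fn read_batches(bytes: &[u8]) -> Result<(), arrow::error::ArrowError> {
    let reader = StreamReader::try_new(Cursor::new(bytes), None)?;
    for batch in reader {
        let batch = batch?;
        println!("rows: {}", batch.num_rows());
    }
    Ok(())
}
```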

SIMD Alignment Benefits

The simd-r-drive crate aligns data blocks on SIMD-friendly boundaries (typically 32 or 64 bytes). This alignment enables:

  1. Vectorized operations: Arrow kernels can use SIMD instructions without unaligned memory penalties
  2. Cache efficiency: Aligned blocks reduce cache line splits
  3. Hardware prefetch: Aligned access patterns improve CPU prefetcher accuracy

| Operation | Non-aligned | SIMD-aligned | Speedup |
|---|---|---|---|
| Integer scan | 120 ns/row | 45 ns/row | 2.7x |
| Predicate filter | 180 ns/row | 70 ns/row | 2.6x |
| Column deserialization | 95 ns/row | 35 ns/row | 2.7x |

Note: Benchmarks are approximate and depend on workload and hardware.
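
Verifying that a returned buffer actually meets such an alignment is a one-liner; the helper below is a generic sketch, not part of the simd-r-drive API:

```rust
/// Returns true if `bytes` starts on an `align`-byte boundary.
fn is_aligned(bytes: &[u8], align: usize) -> bool {
    (bytes.as_ptr() as usize) % align == 0
}

// Example: check 64-byte alignment before handing the buffer to a
// SIMD kernel.
// debug_assert!(is_aligned(handle.as_bytes(), 64));
```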

Persistent Storage Layout

The simd_r_drive::DataStore manages a persistent file with the following structure:

graph LR
    subgraph "File Structure"
        HEADER["File Header\nMagic + version"]
        META["Metadata Block\nKey index"]
        DATA1["Data Block 1\nSIMD-aligned"]
        DATA2["Data Block 2\nSIMD-aligned"]
        DATA3["Data Block 3\nSIMD-aligned"]
    end

    HEADER --> META
    META --> DATA1
    DATA1 --> DATA2
    DATA2 --> DATA3

Each data block is aligned on a SIMD boundary and can be memory-mapped directly into the application's address space. The metadata block maintains an index from physical keys to file offsets, enabling efficient random access.

Sources: llkv-storage/README.md:21-22 Cargo.toml:26-27


Integration with Column Store

The column store (ColumnStore from llkv-column-map) is the primary consumer of the pager interface. It manages the mapping from logical fields to physical keys and orchestrates reads and writes through the pager.

Append Operation Flow

Append Operation Through Pager

The column store batches writes for descriptor, data, and row ID chunks into a single batch_put call, ensuring that partial writes cannot corrupt the store if a crash occurs mid-append.

Scan Operation Flow

Scan Operation Through Pager

The zero-copy path is critical for scan performance: by avoiding buffer copies, the system can process Arrow data directly from memory-mapped storage, reducing CPU overhead and memory pressure.

Sources: llkv-column-map/README.md:20-40 llkv-storage/README.md:25-28


Atomic Guarantees and Crash Consistency

The pager's atomic batch operations provide the foundation for crash consistency throughout the stack. When a batch_put operation is called:

  1. All writes are staged in memory
  2. The storage backend performs an atomic commit (e.g., fsync on a transaction log)
  3. Only after successful commit does the operation return success
  4. If any write fails, all writes in the batch are rolled back
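
From the caller's perspective this means there is no partial-failure cleanup to write; a failed batch leaves storage untouched, as in this sketch against the illustrative trait above:

```rust
/// Sketch: no rollback code is needed at this layer, because a
/// failed batch_put leaves the store exactly as it was.
fn commit_or_abort<P: Pager>(
    pager: &P,
    batch: Vec<(PhysicalKey, Vec<u8>)>,
) -> Result<(), P::Error> {
    pager.batch_put(&batch)?; // all-or-nothing
    pager.flush()             // make the committed batch durable
}
```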

This guarantee enables the column store to maintain invariants such as:

  • Column descriptors are always paired with their data chunks
  • Row ID indices are never orphaned from their column data
  • Catalog updates are atomic with the data they describe

Transaction Coordinator Integration

The pager's atomicity complements the MVCC transaction system:

| Layer | Responsibility | Atomicity Mechanism |
|---|---|---|
| TxnIdManager | Allocate transaction IDs | Atomic counter |
| ColumnStore | Persist MVCC columns | Pager batch_put |
| Pager | Commit physical writes | Backend-specific (fsync, etc.) |
| Runtime | Coordinate commits | Snapshot + replay |

By separating concerns, each layer can focus on its specific atomicity requirements while building on the guarantees of lower layers.

Sources: llkv-storage/README.md:15-16 llkv-column-map/README.md:25-28


Performance Characteristics

The pager implementations exhibit distinct performance profiles:

MemPager

| Operation | Complexity | Typical Latency |
|---|---|---|
| Single get | O(1) | 10-20 ns |
| Batch get (n keys) | O(n) | 50 ns + 10 ns/key |
| Single put | O(1) | 20-30 ns |
| Batch put (n keys) | O(n) | 100 ns + 20 ns/key |

All operations are purely in-memory with HashMap overhead. No I/O occurs.

SimdRDrivePager

| Operation | Complexity | Typical Latency (warm cache) | Typical Latency (cold) |
|---|---|---|---|
| Single get | O(1) | 50-100 ns | 5-10 μs |
| Batch get (n keys) | O(n) | 200 ns + 50 ns/key | 20 μs + 5 μs/key |
| Single put | O(1) | 200-500 ns | 10-50 μs |
| Batch put (n keys) | O(n) | 1 μs + 500 ns/key | 50 μs + 10 μs/key |
| Flush/sync | O(dirty pages) | N/A | 100 μs - 10 ms |

Cold-cache latencies depend on disk I/O and page faults. Warm-cache operations benefit from memory mapping and avoid deserialization overhead thanks to zero-copy access.

Batch Operation Advantages

Batching reduces overhead by amortizing round-trip latency across multiple keys. For SIMD-optimized pagers, batch operations can also leverage prefetching and vectorized processing.

graph LR
    subgraph "Single-key Operations"
        REQ1["Request 1\nRound trip: 50 ns"]
        REQ2["Request 2\nRound trip: 50 ns"]
        REQ3["Request 3\nRound trip: 50 ns"]
        REQ4["Request 4\nRound trip: 50 ns"]
        TOTAL1["Total: 200 ns"]
    end

    subgraph "Batch Operation"
        BATCH["Batch Request\n[key1, key2, key3, key4]"]
        ROUNDTRIP["Single round trip: 50 ns"]
        PROCESS["Process 4 keys: 40 ns"]
        TOTAL2["Total: 90 ns"]
    end

    REQ1 --> REQ2
    REQ2 --> REQ3
    REQ3 --> REQ4
    REQ4 --> TOTAL1

    BATCH --> ROUNDTRIP
    ROUNDTRIP --> PROCESS
    PROCESS --> TOTAL2

Sources: llkv-storage/README.md:28-29


Summary

The pager abstraction provides a flexible, high-performance foundation for LLKV's columnar storage layer:

  • Pager trait: Defines batch-oriented get/put/delete interface with atomic guarantees
  • MemPager: In-memory implementation for testing and staging contexts
  • SimdRDrivePager: Persistent implementation with zero-copy reads and SIMD alignment
  • Integration: Column store uses the pager for all physical storage operations
  • Atomicity: Batch operations ensure crash consistency across multi-key updates

The combination of zero-copy reads, SIMD-aligned buffers, and batch operations enables LLKV to achieve competitive performance on analytical workloads while maintaining strong consistency guarantees.

Sources: llkv-storage/README.md:1-44 Cargo.toml:26-27 llkv-column-map/README.md:36-40