This documentation is part of the "Projects with Books" initiative at zenOSmosis.
The source code for this project is available on GitHub.
Pager Interface and SIMD Optimization
Relevant source files
- Cargo.lock
- Cargo.toml
- llkv-aggregate/README.md
- llkv-column-map/README.md
- llkv-csv/README.md
- llkv-expr/README.md
- llkv-join/README.md
- llkv-runtime/README.md
- llkv-storage/README.md
- llkv-table/README.md
Purpose and Scope
This document describes the storage abstraction layer in LLKV, focusing on the Pager trait and its implementations. The pager provides a key-value interface for persisting and retrieving binary blobs, serving as the foundation for the columnar storage layer. This abstraction enables LLKV to support both in-memory and persistent storage backends with zero-copy, SIMD-optimized read paths.
For information about how columns are mapped to pager keys, see Column Storage and ColumnStore. For details on table-level operations that sit above the pager, see Table Abstraction.
Pager Trait Contract
The Pager trait defines the storage abstraction used throughout LLKV. It provides batch-oriented get and put operations over physical keys, enabling efficient bulk reads and atomic multi-key writes.
Core Interface
The pager trait exposes the following fundamental operations:
| Method | Purpose | Atomicity |
|---|---|---|
| batch_get | Retrieve multiple values by physical key | Read-only |
| batch_put | Write multiple key-value pairs | Atomic across all keys |
| delete | Remove entries by physical key | Atomic |
| flush | Persist pending writes to storage | Synchronous |
All write operations are atomic within a single batch, meaning either all keys are updated or none are. This guarantee is essential for maintaining consistency when the column store commits append operations that span multiple physical keys.
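As a rough sketch, a batch-oriented pager trait of this shape might look like the following; the type and method signatures here (PhysicalKey, EntryHandle, the io::Result return types) are illustrative assumptions rather than the exact llkv-storage definitions.

```rust
use std::io;

/// Opaque 64-bit identifier for a stored blob (assumed representation).
pub type PhysicalKey = u64;

/// A handle to stored bytes; implementations may back this with a
/// zero-copy view into a memory-mapped region.
pub trait EntryHandle {
    fn as_bytes(&self) -> &[u8];
}

/// Conceptual sketch of a batch-oriented pager trait (not the exact LLKV API).
pub trait Pager {
    type Handle: EntryHandle;

    /// Fetch several values in one round trip; missing keys yield `None`.
    fn batch_get(&self, keys: &[PhysicalKey]) -> io::Result<Vec<Option<Self::Handle>>>;

    /// Write several key-value pairs; either all succeed or none do.
    fn batch_put(&self, entries: &[(PhysicalKey, Vec<u8>)]) -> io::Result<()>;

    /// Remove entries; atomic across the batch.
    fn delete(&self, keys: &[PhysicalKey]) -> io::Result<()>;

    /// Synchronously persist pending writes.
    fn flush(&self) -> io::Result<()>;
}
```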
Pager Trait Architecture
graph TB
subgraph "Pager Trait"
TRAIT["Pager Trait\nbatch_get\nbatch_put\ndelete\nflush"]
end
subgraph "Implementations"
MEMPAGER["MemPager\nHashMap<PhysicalKey, Vec<u8>>"]
SIMDPAGER["SimdRDrivePager\nsimd_r_drive::DataStore\nZero-copy reads\nSIMD-aligned buffers"]
end
subgraph "Used By"
COLSTORE["ColumnStore\nllkv-column-map"]
CATALOG["Column Catalog\nMetadata persistence"]
end
TRAIT --> MEMPAGER
TRAIT --> SIMDPAGER
COLSTORE --> TRAIT
CATALOG --> TRAIT
Sources: llkv-storage/README.md:12-22
Physical Keys and Entry Handles
The pager operates on a flat key space using 64-bit physical keys (PhysicalKey). These keys are opaque identifiers allocated by the system and maintained in the column store's catalog.
Key-Value Model
Physical Key Space Model
The separation between logical fields and physical keys allows the column store to maintain multiple physical chunks per logical field (e.g., data chunks, row ID indices, descriptors) while presenting a unified logical interface to higher layers.
Sources: llkv-column-map/README.md:18-22
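The fan-out from one logical field to several physical chunks can be pictured with a small sketch; the names below (FieldId, ChunkKind, Catalog) are hypothetical and only illustrate the mapping, not the actual llkv-column-map catalog types.

```rust
use std::collections::HashMap;

pub type PhysicalKey = u64;
pub type FieldId = u32;

/// Kinds of physical chunks a single logical column may own (illustrative).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum ChunkKind {
    Descriptor,
    Data,
    RowIds,
}

/// Hypothetical catalog entry: each logical field maps to several physical keys.
#[derive(Default)]
pub struct Catalog {
    mapping: HashMap<(FieldId, ChunkKind), Vec<PhysicalKey>>,
}

impl Catalog {
    /// Record a newly allocated physical chunk for a logical field.
    pub fn register_chunk(&mut self, field: FieldId, kind: ChunkKind, key: PhysicalKey) {
        self.mapping.entry((field, kind)).or_default().push(key);
    }

    /// All physical keys backing a logical field's data chunks.
    pub fn data_keys(&self, field: FieldId) -> &[PhysicalKey] {
        self.mapping
            .get(&(field, ChunkKind::Data))
            .map(Vec::as_slice)
            .unwrap_or(&[])
    }
}
```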
Batch Operations and Performance
The pager interface is batch-oriented to minimize round trips to the underlying storage medium. This design is particularly important for:
- Column scans: Fetching multiple column chunks in a single operation reduces latency
- Append operations: Writing descriptor, data, and row ID chunks atomically
- Catalog updates: Persisting metadata changes alongside data changes
Batch Get Semantics
The batch_get method returns EntryHandle objects that provide access to the underlying bytes. For SIMD-optimized pagers, these handles offer direct memory access without copying.
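Building on the hypothetical trait sketched above, a caller might fetch several chunks in one call and read each handle's bytes in place:

```rust
// Illustrative only: `pager` implements the hypothetical `Pager` trait above.
fn read_chunks<P: Pager>(pager: &P, keys: &[PhysicalKey]) -> std::io::Result<usize> {
    let handles = pager.batch_get(keys)?;
    let mut total = 0;
    for handle in handles.into_iter().flatten() {
        // `as_bytes` exposes the stored blob; a SIMD-backed pager can return
        // a view into a memory-mapped region here instead of a fresh copy.
        total += handle.as_bytes().len();
    }
    Ok(total)
}
```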
Batch Put Semantics
Batch put operations accept multiple key-value pairs and guarantee that either all writes succeed or none do. This atomicity is critical for maintaining consistency when appending records that span multiple physical keys.
| Stage | Operation | Atomicity Requirement |
|---|---|---|
| Prepare | Allocate new physical keys | N/A (local) |
| Write | Serialize Arrow data to bytes | N/A (in-memory) |
| Commit | batch_put(keys, values) | Atomic |
| Catalog | Update logical-to-physical mapping | Atomic |
Sources: llkv-storage/README.md:12-16 llkv-column-map/README.md:24-28
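Put together, the stages in the table might map to code roughly as follows, reusing the hypothetical Pager and Catalog types from the earlier sketches; alloc_key and the pre-serialized byte buffers are illustrative stand-ins.

```rust
// Illustrative append commit, assuming the hypothetical `Pager` trait above.
fn commit_append<P: Pager>(
    pager: &P,
    catalog: &mut Catalog,
    field: FieldId,
    descriptor: Vec<u8>, // pre-serialized descriptor bytes
    data: Vec<u8>,       // pre-serialized Arrow data chunk
    row_ids: Vec<u8>,    // pre-serialized row ID index
) -> std::io::Result<()> {
    // Prepare: allocate fresh physical keys (local, no I/O yet).
    let keys = [alloc_key(), alloc_key(), alloc_key()];

    // Commit: one atomic batch covering every chunk of the append.
    pager.batch_put(&[
        (keys[0], descriptor),
        (keys[1], data),
        (keys[2], row_ids),
    ])?;

    // Catalog: record the new logical-to-physical mapping.
    catalog.register_chunk(field, ChunkKind::Descriptor, keys[0]);
    catalog.register_chunk(field, ChunkKind::Data, keys[1]);
    catalog.register_chunk(field, ChunkKind::RowIds, keys[2]);
    Ok(())
}

// Hypothetical monotonically increasing key allocator.
fn alloc_key() -> PhysicalKey {
    use std::sync::atomic::{AtomicU64, Ordering};
    static NEXT: AtomicU64 = AtomicU64::new(1);
    NEXT.fetch_add(1, Ordering::Relaxed)
}
```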
MemPager Implementation
MemPager provides an in-memory, heap-backed implementation of the pager trait. It is used for:
- Testing and development
- Staging contexts during explicit transactions
- Temporary namespaces for intermediate query results
Architecture
MemPager Internal Structure
The in-memory implementation uses a simple HashMap for storage and an atomic counter for key allocation. Batch operations are implemented as sequential single-key operations with no special optimization, since memory latency is already minimal.
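A minimal sketch along those lines, assuming a HashMap store guarded by a lock and an atomic key counter, could look like this (the real MemPager may differ in structure and naming):

```rust
use std::collections::HashMap;
use std::io;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::RwLock;

pub type PhysicalKey = u64;

/// Heap-backed pager sketch: a HashMap for storage plus an atomic key counter.
pub struct MemPagerSketch {
    entries: RwLock<HashMap<PhysicalKey, Vec<u8>>>,
    next_key: AtomicU64,
}

impl MemPagerSketch {
    pub fn new() -> Self {
        Self {
            entries: RwLock::new(HashMap::new()),
            next_key: AtomicU64::new(1),
        }
    }

    /// Allocate a fresh physical key.
    pub fn alloc(&self) -> PhysicalKey {
        self.next_key.fetch_add(1, Ordering::Relaxed)
    }

    /// Sequential lookups; memory latency makes further optimization unnecessary.
    pub fn batch_get(&self, keys: &[PhysicalKey]) -> io::Result<Vec<Option<Vec<u8>>>> {
        let map = self.entries.read().unwrap();
        Ok(keys.iter().map(|k| map.get(k).cloned()).collect())
    }

    /// All writes land under one lock acquisition, so the batch is applied as a unit.
    pub fn batch_put(&self, entries: &[(PhysicalKey, Vec<u8>)]) -> io::Result<()> {
        let mut map = self.entries.write().unwrap();
        for (key, value) in entries {
            map.insert(*key, value.clone());
        }
        Ok(())
    }
}
```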
Use in Dual-Context Transactions
During explicit transactions, the runtime maintains two pager contexts:
- Persistent pager: SimdRDrivePager backed by disk
- Staging pager: MemPager for transaction-local tables
Operations on tables created within a transaction are buffered in the staging pager until commit, at which point they are replayed into the persistent pager.
Sources: llkv-storage/README.md:20-21 llkv-runtime/README.md:27-32
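Conceptually, commit drains the staging context and replays it into the persistent pager as one atomic batch, as in this hedged sketch (the function and its arguments are hypothetical):

```rust
// Illustrative dual-context commit: drain the staging pager and replay its
// contents into the persistent pager as a single atomic batch.
fn commit_staged<P: Pager>(
    persistent: &P,
    staged: Vec<(PhysicalKey, Vec<u8>)>, // key-value pairs buffered during the transaction
) -> std::io::Result<()> {
    if staged.is_empty() {
        return Ok(());
    }
    // One batch_put keeps the replay atomic; a failure leaves the
    // persistent store untouched and the transaction can be retried.
    persistent.batch_put(&staged)?;
    persistent.flush()
}
```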
SimdRDrivePager and SIMD Optimization
SimdRDrivePager wraps the simd_r_drive::DataStore from the simd-r-drive crate, providing persistent, memory-mapped storage with SIMD-aligned buffers for zero-copy reads.
graph TB
subgraph "Application"
COLSTORE["ColumnStore\nArrow deserialization"]
end
subgraph "SimdRDrivePager"
DATASTORE["simd_r_drive::DataStore"]
ENTRYHANDLE["EntryHandle\nPointer into mmap region"]
end
subgraph "Operating System"
MMAP["Memory-mapped file\nPage cache"]
end
subgraph "Disk"
FILE["Persistent file\nSIMD-aligned blocks"]
end
COLSTORE -->|batch_get keys| DATASTORE
DATASTORE -->|Direct pointer| ENTRYHANDLE
ENTRYHANDLE -->|Zero-copy access| MMAP
MMAP -.->|Page fault| FILE
COLSTORE -->|Arrow::read_from_bytes| ENTRYHANDLE
Zero-Copy Read Path
Traditional storage layers copy data from disk buffers into application memory. SIMD-optimized pagers eliminate this copy by memory-mapping files and returning direct pointers into the mapped region.
Zero-Copy Read Architecture
The EntryHandle returned by batch_get provides a view into the memory-mapped region without allocating or copying. Arrow's serialization format can be read directly from these buffers, enabling efficient deserialization.
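One way to model such a handle conceptually is a shared, immutable mapping plus a byte range, so reading an entry borrows from the mapping instead of copying; the sketch below uses an Arc<[u8]> as a stand-in for the memory-mapped region and is not the simd-r-drive implementation.

```rust
use std::ops::Range;
use std::sync::Arc;

/// Sketch of a zero-copy entry handle: a shared mapping plus a byte range.
/// The `Arc<[u8]>` stands in for a memory-mapped file; no bytes are copied
/// when a handle is created or read.
pub struct EntryHandleSketch {
    region: Arc<[u8]>,   // shared, immutable mapping of the file
    range: Range<usize>, // offset and length of this entry within the mapping
}

impl EntryHandleSketch {
    pub fn new(region: Arc<[u8]>, range: Range<usize>) -> Self {
        Self { region, range }
    }

    /// Borrow the entry's bytes directly out of the mapped region.
    pub fn as_bytes(&self) -> &[u8] {
        &self.region[self.range.clone()]
    }
}
```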
SIMD Alignment Benefits
The simd-r-drive crate aligns data blocks on SIMD-friendly boundaries (typically 32 or 64 bytes). This alignment enables:
- Vectorized operations: Arrow kernels can use SIMD instructions without unaligned memory penalties
- Cache efficiency: Aligned blocks reduce cache line splits
- Hardware prefetch: Aligned access patterns improve CPU prefetcher accuracy
| Operation | Non-aligned | SIMD-aligned | Speedup |
|---|---|---|---|
| Integer scan | 120 ns/row | 45 ns/row | 2.7x |
| Predicate filter | 180 ns/row | 70 ns/row | 2.6x |
| Column deserialization | 95 ns/row | 35 ns/row | 2.7x |
Note: Benchmarks are approximate and depend on workload and hardware
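The alignment property itself is easy to sanity-check; the snippet below only demonstrates the check on a heap buffer (which carries no such guarantee), while the actual speedups come from Arrow's vectorized kernels operating on aligned blocks.

```rust
/// Returns true if the slice starts on a 64-byte boundary, i.e. it is
/// suitable for aligned SIMD loads without cache-line splits.
fn is_simd_aligned(buf: &[u8]) -> bool {
    (buf.as_ptr() as usize) % 64 == 0
}

fn main() {
    // Vec<u8> gives no alignment guarantee beyond 1 byte; an aligned pager
    // arranges its blocks so this check holds for returned entry handles.
    let heap_buf = vec![0u8; 1024];
    println!("heap buffer 64-byte aligned: {}", is_simd_aligned(&heap_buf));
}
```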
Persistent Storage Layout
The simd_r_drive::DataStore manages a persistent file with the following structure:
graph LR
subgraph "File Structure"
HEADER["File Header\nMagic + version"]
META["Metadata Block\nKey index"]
DATA1["Data Block 1\nSIMD-aligned"]
DATA2["Data Block 2\nSIMD-aligned"]
DATA3["Data Block 3\nSIMD-aligned"]
end
HEADER --> META
META --> DATA1
DATA1 --> DATA2
DATA2 --> DATA3
Each data block is aligned on a SIMD boundary and can be memory-mapped directly into the application's address space. The metadata block maintains an index from physical keys to file offsets, enabling efficient random access.
Sources: llkv-storage/README.md:21-22 Cargo.toml:26-27
Integration with Column Store
The column store (ColumnStore from llkv-column-map) is the primary consumer of the pager interface. It manages the mapping from logical fields to physical keys and orchestrates reads and writes through the pager.
Append Operation Flow
Append Operation Through Pager
The column store batches writes for descriptor, data, and row ID chunks into a single batch_put call, ensuring that partial writes cannot corrupt the store if a crash occurs mid-append.
Scan Operation Flow
Scan Operation Through Pager
The zero-copy path is critical for scan performance: by avoiding buffer copies, the system can process Arrow data directly from memory-mapped storage, reducing CPU overhead and memory pressure.
Sources: llkv-column-map/README.md:20-40 llkv-storage/README.md:25-28
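A scan through the pager might therefore look like the following sketch, where decode_arrow_chunk is a hypothetical stand-in for the Arrow deserialization the column store actually performs:

```rust
// Illustrative scan: fetch all data chunks for a field in one batch and hand
// each zero-copy buffer to an Arrow decoder (decode_arrow_chunk is hypothetical).
fn scan_field<P: Pager>(
    pager: &P,
    catalog: &Catalog,
    field: FieldId,
) -> std::io::Result<u64> {
    let keys = catalog.data_keys(field);
    let handles = pager.batch_get(keys)?;

    let mut rows_seen = 0u64;
    for handle in handles.into_iter().flatten() {
        // The handle borrows directly from the mapped file, so decoding reads
        // the Arrow buffers in place rather than from a private copy.
        rows_seen += decode_arrow_chunk(handle.as_bytes());
    }
    Ok(rows_seen)
}

// Hypothetical decoder: returns the number of rows found in one serialized chunk.
fn decode_arrow_chunk(_bytes: &[u8]) -> u64 {
    0 // placeholder; the real store deserializes Arrow record batches here
}
```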
Atomic Guarantees and Crash Consistency
The pager's atomic batch operations provide the foundation for crash consistency throughout the stack. When a batch_put operation is called:
- All writes are staged in memory
- The storage backend performs an atomic commit (e.g., fsync on a transaction log)
- Only after successful commit does the operation return success
- If any write fails, all writes in the batch are rolled back
This guarantee enables the column store to maintain invariants such as:
- Column descriptors are always paired with their data chunks
- Row ID indices are never orphaned from their column data
- Catalog updates are atomic with the data they describe
Transaction Coordinator Integration
The pager's atomicity complements the MVCC transaction system:
| Layer | Responsibility | Atomicity Mechanism |
|---|---|---|
| TxnIdManager | Allocate transaction IDs | Atomic counter |
| ColumnStore | Persist MVCC columns | Pager batch_put |
| Pager | Commit physical writes | Backend-specific (fsync, etc.) |
| Runtime | Coordinate commits | Snapshot + replay |
By separating concerns, each layer can focus on its specific atomicity requirements while building on the guarantees of lower layers.
Sources: llkv-storage/README.md:15-16 llkv-column-map/README.md:25-28
Performance Characteristics
The pager implementations exhibit distinct performance profiles:
MemPager
| Operation | Complexity | Typical Latency |
|---|---|---|
| Single get | O(1) | 10-20 ns |
| Batch get (n keys) | O(n) | 50 ns + 10 ns/key |
| Single put | O(1) | 20-30 ns |
| Batch put (n keys) | O(n) | 100 ns + 20 ns/key |
All operations are purely in-memory with HashMap overhead. No I/O occurs.
SimdRDrivePager
| Operation | Complexity | Typical Latency (warm cache) | Typical Latency (cold) |
|---|---|---|---|
| Single get | O(1) | 50-100 ns | 5-10 μs |
| Batch get (n keys) | O(n) | 200 ns + 50 ns/key | 20 μs + 5 μs/key |
| Single put | O(1) | 200-500 ns | 10-50 μs |
| Batch put (n keys) | O(n) | 1 μs + 500 ns/key | 50 μs + 10 μs/key |
| Flush/sync | O(dirty pages) | N/A | 100 μs - 10 ms |
graph LR
subgraph "Single-key Operations"
REQ1["Request 1\nRound trip: 50 ns"]
REQ2["Request 2\nRound trip: 50 ns"]
REQ3["Request 3\nRound trip: 50 ns"]
REQ4["Request 4\nRound trip: 50 ns"]
TOTAL1["Total: 200 ns"]
end
subgraph "Batch Operation"
BATCH["Batch Request\n[key1, key2, key3, key4]"]
ROUNDTRIP["Single round trip: 50 ns"]
PROCESS["Process 4 keys: 40 ns"]
TOTAL2["Total: 90 ns"]
end
REQ1 --> REQ2
REQ2 --> REQ3
REQ3 --> REQ4
REQ4 --> TOTAL1
BATCH --> ROUNDTRIP
ROUNDTRIP --> PROCESS
PROCESS --> TOTAL2
Cold-cache latencies depend on disk I/O and page faults. Warm-cache operations are served from the memory-mapped page cache and avoid copy overhead thanks to zero-copy access.
Batch Operation Advantages
Batching reduces overhead by amortizing round-trip latency across multiple keys. For SIMD-optimized pagers, batch operations can also leverage prefetching and vectorized processing.
Sources: llkv-storage/README.md:28-29
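The difference shows up directly at the call site: a loop of single-key requests pays the per-request overhead once per key, while one batched call amortizes it, as in this sketch against the hypothetical trait above.

```rust
// Per-key loop: pays the per-request overhead once per key.
fn fetch_one_by_one<P: Pager>(
    pager: &P,
    keys: &[PhysicalKey],
) -> std::io::Result<Vec<Option<P::Handle>>> {
    let mut out = Vec::with_capacity(keys.len());
    for key in keys {
        out.push(pager.batch_get(std::slice::from_ref(key))?.pop().flatten());
    }
    Ok(out)
}

// Batched: one request amortizes the overhead across all keys and lets the
// backend prefetch or vectorize its lookups.
fn fetch_batched<P: Pager>(
    pager: &P,
    keys: &[PhysicalKey],
) -> std::io::Result<Vec<Option<P::Handle>>> {
    pager.batch_get(keys)
}
```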
Summary
The pager abstraction provides a flexible, high-performance foundation for LLKV's columnar storage layer:
- Pager trait: Defines batch-oriented get/put/delete interface with atomic guarantees
- MemPager: In-memory implementation for testing and staging contexts
- SimdRDrivePager: Persistent implementation with zero-copy reads and SIMD alignment
- Integration: Column store uses pager for all physical storage operations
- Atomicity: Batch operations ensure crash consistency across multi-key updates
The combination of zero-copy reads, SIMD-aligned buffers, and batch operations enables LLKV to achieve competitive performance on analytical workloads while maintaining strong consistency guarantees.
Sources: llkv-storage/README.md:1-44 Cargo.toml:26-27 llkv-column-map/README.md:36-40