

Architecture


This document describes the overall system architecture of LLKV, explaining its layered design, core abstractions, and how components interact to provide SQL functionality over key-value storage. For details on individual crates and their dependencies, see Workspace and Crates. For the end-to-end query execution flow, see SQL Query Processing Pipeline. For Arrow integration specifics, see Data Formats and Arrow Integration.

Layered Design

LLKV is organized into six architectural layers, each with focused responsibilities. Higher layers depend only on lower layers, and all layers communicate through Apache Arrow RecordBatch structures as the universal data interchange format.

Sources: Cargo.toml:1-89 README.md:44-53 llkv-sql/README.md:1-10 llkv-runtime/README.md:1-10 llkv-executor/README.md:1-10 llkv-table/README.md:1-18 llkv-storage/README.md:1-17

```mermaid
graph TB
    subgraph L1["Layer 1: User Interface"]
        SQL["SQL Queries"]
        REPL["CLI REPL"]
        DEMO["Demo Applications"]
        BENCH["TPC-H Benchmarks"]
    end

    subgraph L2["Layer 2: SQL Processing"]
        SQLENG["SqlEngine\nllkv-sql"]
        PLAN["Query Plans\nllkv-plan"]
        EXPR["Expression AST\nllkv-expr"]
    end

    subgraph L3["Layer 3: Runtime & Orchestration"]
        RUNTIME["RuntimeContext\nllkv-runtime"]
        TXNMGR["TxnIdManager\nllkv-transaction"]
        CATALOG["CatalogManager\nllkv-runtime"]
    end

    subgraph L4["Layer 4: Query Execution"]
        EXECUTOR["TableExecutor\nllkv-executor"]
        AGG["Accumulators\nllkv-aggregate"]
        JOIN["HashJoinExecutor\nllkv-join"]
    end

    subgraph L5["Layer 5: Data Management"]
        TABLE["Table\nllkv-table"]
        COLMAP["ColumnStore\nllkv-column-map"]
        SYSCAT["SysCatalog\nllkv-table"]
    end

    subgraph L6["Layer 6: Storage"]
        PAGER["Pager trait\nllkv-storage"]
        MEMPAGER["MemPager"]
        SIMDPAGER["SimdRDrivePager"]
    end

    SQL --> SQLENG
    REPL --> SQLENG
    DEMO --> SQLENG
    BENCH --> SQLENG

    SQLENG --> PLAN
    SQLENG --> EXPR
    PLAN --> RUNTIME

    RUNTIME --> TXNMGR
    RUNTIME --> CATALOG
    RUNTIME --> EXECUTOR
    RUNTIME --> TABLE

    EXECUTOR --> AGG
    EXECUTOR --> JOIN
    EXECUTOR --> TABLE

    TABLE --> COLMAP
    TABLE --> SYSCAT
    COLMAP --> PAGER
    SYSCAT --> COLMAP

    PAGER --> MEMPAGER
    PAGER --> SIMDPAGER
```

Core Architectural Principles

Arrow-Native Data Flow

All data flowing between components is represented as Apache Arrow RecordBatch structures. This enables:

  • Zero-copy operations: Arrow buffers can be passed between layers without serialization
  • SIMD-friendly processing: Columnar layout supports vectorized operations
  • Consistent memory model: All layers use the same in-memory representation

The RecordBatch abstraction appears at every boundary: SQL parsing produces plans that operate on batches, the executor streams batches, tables persist batches, and the column store chunks batches for storage.

Sources: README.md:10-12 README.md:22-23 llkv-table/README.md:10-11 llkv-column-map/README.md:10-14
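
As a concrete illustration, the sketch below builds a small RecordBatch with the `arrow` crate directly. It is not LLKV code, but every layer boundary described above exchanges batches of this shape.

```rust
use std::sync::Arc;

use arrow::array::{ArrayRef, Int64Array, StringArray};
use arrow::datatypes::{DataType, Field, Schema};
use arrow::error::ArrowError;
use arrow::record_batch::RecordBatch;

fn sample_batch() -> Result<RecordBatch, ArrowError> {
    // Schema: one non-nullable Int64 column and one nullable Utf8 column.
    let schema = Arc::new(Schema::new(vec![
        Field::new("id", DataType::Int64, false),
        Field::new("name", DataType::Utf8, true),
    ]));

    // One Arrow array per field; all arrays share the same row count.
    let ids = Int64Array::from(vec![1_i64, 2, 3]);
    let names = StringArray::from(vec![Some("a"), None, Some("c")]);

    let columns: Vec<ArrayRef> = vec![Arc::new(ids) as ArrayRef, Arc::new(names) as ArrayRef];
    RecordBatch::try_new(schema, columns)
}
```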

Storage Abstraction Through Pager Trait

The Pager trait in llkv-storage provides a pluggable storage backend interface:

| Pager Type | Use Case | Key Properties |
| --- | --- | --- |
| MemPager | Tests, temporary namespaces, staging contexts | Heap-backed, fast |
| SimdRDrivePager | Persistent storage | Zero-copy reads, SIMD-aligned, memory-mapped |

Both implementations satisfy the same batch get/put contract, allowing higher layers to remain storage-agnostic. The runtime uses dual-pager contexts: persistent storage for committed tables and in-memory staging for uncommitted transaction objects.

Sources: llkv-storage/README.md:12-22 llkv-runtime/README.md:26-32
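
The following is a minimal sketch of that batch get/put contract. The trait and type names are hypothetical, not the actual llkv-storage definitions; the sketch only shows how a heap-backed implementation and a persistent one can sit behind the same interface.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the storage key and value types.
type PhysicalKey = u64;

trait Pager {
    /// Fetch the values for a batch of keys in one call.
    fn batch_get(&self, keys: &[PhysicalKey]) -> Vec<Option<Vec<u8>>>;
    /// Persist a batch of key/value pairs; a real backend would make this atomic.
    fn batch_put(&mut self, entries: Vec<(PhysicalKey, Vec<u8>)>);
}

/// Heap-backed pager, analogous in spirit to MemPager.
#[derive(Default)]
struct InMemoryPager {
    map: HashMap<PhysicalKey, Vec<u8>>,
}

impl Pager for InMemoryPager {
    fn batch_get(&self, keys: &[PhysicalKey]) -> Vec<Option<Vec<u8>>> {
        keys.iter().map(|k| self.map.get(k).cloned()).collect()
    }

    fn batch_put(&mut self, entries: Vec<(PhysicalKey, Vec<u8>)>) {
        // Toy version: insert directly instead of staging an atomic commit.
        for (k, v) in entries {
            self.map.insert(k, v);
        }
    }
}
```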

MVCC Integration

Multi-version concurrency control (MVCC) is implemented as system metadata columns injected at the table layer:

  • row_id: Monotonic row identifier
  • created_by: Transaction ID that created this row version
  • deleted_by: Transaction ID that deleted this row (or NULL if active)

These columns are stored alongside user data in ColumnStore, enabling snapshot isolation without separate version chains. The TxnIdManager in llkv-transaction allocates monotonic transaction IDs and tracks commit watermarks. The runtime enforces visibility rules during scans by filtering based on snapshot transaction IDs.

Sources: llkv-table/README.md:13-17 llkv-runtime/README.md:19-25 llkv-column-map/README.md:27-28
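
A hedged sketch of that visibility rule follows, with illustrative field and function names; the real runtime also consults the commit watermark from TxnIdManager, which is omitted here for brevity.

```rust
type TxnId = u64;

// Hypothetical row-version view over the MVCC metadata columns described above.
struct RowVersion {
    created_by: TxnId,
    deleted_by: Option<TxnId>, // None while the row version is still active
}

/// A row version is visible to a snapshot when it was created at or before the
/// snapshot's transaction id and not deleted at or before it.
fn is_visible(row: &RowVersion, snapshot_txn: TxnId) -> bool {
    row.created_by <= snapshot_txn
        && row.deleted_by.map_or(true, |deleted| deleted > snapshot_txn)
}
```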

Component Interaction Patterns

Query Execution Flow

Sources: README.md:56-62 llkv-sql/README.md:15-20 llkv-runtime/README.md:12-17 llkv-executor/README.md:12-17 llkv-table/README.md:19-25 llkv-column-map/README.md:24-28

Dual-Context Transaction Management

The runtime maintains two execution contexts during explicit transactions. The persistent context operates on committed tables directly, while the staging context buffers newly created tables in memory. On commit, staged operations are replayed into the persistent context after the TxnIdManager confirms no conflicts and advances the commit watermark. On rollback, the staging context is dropped and all uncommitted work is discarded.

Sources: llkv-runtime/README.md:26-32 llkv-runtime/README.md:12-17
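
The sketch below captures the dual-context idea with hypothetical types. The real RuntimeContext stages tables and RecordBatches behind a MemPager rather than strings in a map, but the commit-and-replay versus drop-on-rollback shape is what the sketch is meant to show.

```rust
use std::collections::HashMap;

#[derive(Default)]
struct TableSet {
    tables: HashMap<String, Vec<String>>, // table name -> buffered rows (stand-in for batches)
}

struct DualContext {
    persistent: TableSet,
    staging: Option<TableSet>, // Some(_) while an explicit transaction is open
}

impl DualContext {
    fn begin(&mut self) {
        self.staging = Some(TableSet::default());
    }

    fn insert(&mut self, table: &str, row: String) {
        // Writes land in the staging context while a transaction is open,
        // otherwise directly in the persistent context (auto-commit).
        let target = match self.staging.as_mut() {
            Some(staged) => staged,
            None => &mut self.persistent,
        };
        target.tables.entry(table.to_string()).or_default().push(row);
    }

    fn commit(&mut self) {
        // Replay staged operations into the persistent context.
        if let Some(staged) = self.staging.take() {
            for (name, rows) in staged.tables {
                self.persistent.tables.entry(name).or_default().extend(rows);
            }
        }
    }

    fn rollback(&mut self) {
        // Dropping the staging context discards all uncommitted work.
        self.staging = None;
    }
}
```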

Column Storage and Logical Field Mapping

The ColumnStore maintains a mapping from LogicalFieldId (namespace + table ID + field ID) to physical storage keys. Each logical field has a descriptor chunk (metadata about the column), data chunks (Arrow-serialized column arrays), and row ID chunks (per-chunk row identifiers for filtering). This three-level mapping isolates user data from system metadata while allowing efficient scans and appends.

Sources: llkv-column-map/README.md:18-23 llkv-table/README.md:13-17 llkv-column-map/README.md:10-17
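
As an illustration of the mapping, the sketch below uses hypothetical types (the concrete llkv-column-map key encoding is not documented here): a logical field resolves to a descriptor key plus lists of data-chunk and row-id-chunk keys.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct LogicalFieldId {
    namespace: u16, // separates user columns from system/MVCC columns
    table_id: u32,
    field_id: u32,
}

type PhysicalKey = u64;

/// Per-column entry points into the pager: metadata, data chunks, row-id chunks.
struct ColumnKeys {
    descriptor: PhysicalKey,
    data_chunks: Vec<PhysicalKey>,
    row_id_chunks: Vec<PhysicalKey>,
}

/// The store resolves a logical field to its physical keys before scanning or appending.
struct ColumnIndex {
    columns: HashMap<LogicalFieldId, ColumnKeys>,
}

impl ColumnIndex {
    fn keys_for(&self, field: LogicalFieldId) -> Option<&ColumnKeys> {
        self.columns.get(&field)
    }
}
```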

Key Abstractions

SqlEngine

Entry point for SQL execution. Located in llkv-sql, it:

  • Preprocesses SQL for dialect compatibility (DuckDB, SQLite quirks)
  • Parses with sqlparser crate
  • Batches compatible INSERT statements
  • Delegates execution to RuntimeContext
  • Returns ExecutionResult enums

Sources: llkv-sql/README.md:1-20 README.md:56-59

RuntimeContext

Orchestration layer in llkv-runtime that:

  • Executes all statement types (DDL, DML, queries)
  • Manages transaction snapshots and MVCC injection
  • Coordinates between table layer and executor
  • Maintains catalog manager for schema metadata
  • Implements dual-context staging for transactions

Sources: llkv-runtime/README.md:12-25 llkv-runtime/README.md:34-40

Table and ColumnStore

Table in llkv-table provides schema-aware APIs:

  • Schema validation on CREATE TABLE and append
  • MVCC column injection (row_id, created_by, deleted_by)
  • Streaming scan API with predicate pushdown
  • Integration with system catalog (table 0)

ColumnStore in llkv-column-map handles physical storage:

  • Arrow-serialized column chunks
  • Logical-to-physical key mapping
  • Append pipeline with row-id sorting and last-writer-wins semantics
  • Atomic multi-key commits through pager

Sources: llkv-table/README.md:12-25 llkv-column-map/README.md:12-28

TableExecutor

Execution engine in llkv-executor that:

  • Streams RecordBatch results from table scans
  • Evaluates projections, filters, and scalar expressions
  • Coordinates with llkv-aggregate for aggregation
  • Coordinates with llkv-join for join operations
  • Applies MVCC visibility filters during scans

Sources: llkv-executor/README.md:1-17 README.md:60-61

Pager Trait

Storage abstraction in llkv-storage that:

  • Exposes batch get/put over (PhysicalKey, EntryHandle) pairs
  • Supports atomic multi-key updates
  • Enables zero-copy reads when backed by memory-mapped storage
  • Implementations: MemPager (heap), SimdRDrivePager (persistent)

Sources: llkv-storage/README.md:12-22 README.md:11-12

Crate Organization

The workspace crates are organized by layer:

| Layer | Crates | Responsibilities |
| --- | --- | --- |
| SQL Processing | llkv-sql, llkv-plan, llkv-expr | Parse SQL, build typed plans, represent expressions |
| Runtime | llkv-runtime, llkv-transaction | Orchestrate execution, manage MVCC and sessions |
| Execution | llkv-executor, llkv-aggregate, llkv-join | Stream results, compute aggregates, evaluate joins |
| Data Management | llkv-table, llkv-column-map | Schema-aware tables, columnar storage |
| Storage | llkv-storage | Pager trait and implementations |
| Supporting | llkv-result, llkv-csv, llkv-test-utils | Result types, CSV ingestion, test utilities |
| Testing | llkv-slt-tester, llkv-tpch | SQL Logic Tests, TPC-H benchmarks |
| Entry Points | llkv | Main library and CLI |

For detailed dependency graphs and crate responsibilities, see Workspace and Crates.

Sources: Cargo.toml:67-87 README.md:44-53

Execution Model

Synchronous with Work-Stealing

LLKV defaults to synchronous execution using Rayon for parallelism:

  • Query execution is synchronous, not async
  • Rayon work-stealing parallelizes scans and projections
  • Crossbeam channels coordinate between threads
  • Embeds cleanly inside Tokio when needed (e.g., SLT test runner)

This design minimizes scheduler overhead for individual queries while maintaining high throughput for concurrent workloads.

Sources: README.md:38-41 llkv-column-map/README.md:32-34
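
For a sense of this execution style, the self-contained snippet below (not LLKV code) shows the kind of Rayon parallel-iterator pattern the text refers to: per-chunk work is distributed across worker threads by work stealing.

```rust
use rayon::prelude::*;

/// Sum all values above a threshold across column chunks, processing chunks in parallel.
fn sum_matching(chunks: &[Vec<i64>], threshold: i64) -> i64 {
    chunks
        .par_iter() // each chunk may be picked up by a different worker thread
        .map(|chunk| chunk.iter().filter(|v| **v > threshold).sum::<i64>())
        .sum()
}
```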

Streaming Results

Queries produce results incrementally:

  • TableExecutor yields fixed-size RecordBatches
  • No full result set materialization
  • Callers process batches via callback or iterator
  • Join and aggregate operators buffer only necessary state

Sources: llkv-table/README.md:24-25 llkv-executor/README.md:14-17 llkv-join/README.md:19-22
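
A minimal sketch of the callback-style streaming contract follows, with hypothetical names standing in for the executor API; memory stays bounded because only one batch is materialized at a time.

```rust
// Stand-in for an Arrow RecordBatch.
struct Batch {
    rows: Vec<u64>,
}

/// Yield fixed-size batches to the caller instead of building the full result set.
fn scan_stream(source: &[u64], batch_size: usize, mut on_batch: impl FnMut(Batch)) {
    for window in source.chunks(batch_size) {
        on_batch(Batch { rows: window.to_vec() });
    }
}

fn count_rows(source: &[u64]) -> usize {
    let mut total = 0;
    // Only one batch is alive at a time; memory is bounded by batch_size.
    scan_stream(source, 1024, |batch| total += batch.rows.len());
    total
}
```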

Data Lifecycle

Write Path

  1. User submits INSERT or UPDATE through SqlEngine
  2. RuntimeContext validates schema and injects MVCC columns
  3. Table::append validates RecordBatch schema
  4. ColumnStore::append sorts by row_id, rewrites conflicts
  5. Pager::batch_put commits Arrow-serialized chunks atomically
  6. Transaction manager advances commit watermark
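
Step 4's sort-and-rewrite behavior can be pictured with the toy sketch below. The names are hypothetical and the real ColumnStore operates on Arrow chunks rather than a map, but it shows the last-writer-wins merge keyed by row_id.

```rust
use std::collections::BTreeMap;

type RowId = u64;

#[derive(Default)]
struct ColumnData {
    values: BTreeMap<RowId, i64>, // kept ordered by row_id
}

impl ColumnData {
    fn append(&mut self, mut incoming: Vec<(RowId, i64)>) {
        // Sort the incoming batch by row_id before merging.
        incoming.sort_by_key(|(row_id, _)| *row_id);
        for (row_id, value) in incoming {
            // Last writer wins: an existing entry for the same row is rewritten.
            self.values.insert(row_id, value);
        }
    }
}
```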

Read Path

  1. User submits SELECT through SqlEngine
  2. RuntimeContext acquires transaction snapshot
  3. TableExecutor creates scan with projection and filter
  4. Table::scan_stream initiates ColumnStream
  5. ColumnStore fetches chunks via Pager::batch_get (zero-copy)
  6. MVCC filtering applied using snapshot visibility rules
  7. Executor evaluates expressions and streams RecordBatches to caller

Sources: README.md:56-62 llkv-column-map/README.md:24-28 llkv-table/README.md:19-25