This documentation is part of the "Projects with Books" initiative at zenOSmosis.
The source code for this project is available on GitHub.
Architecture
Relevant source files
- Cargo.lock
- Cargo.toml
- README.md
- demos/llkv-sql-pong-demo/src/main.rs
- llkv-aggregate/README.md
- llkv-column-map/README.md
- llkv-csv/README.md
- llkv-expr/README.md
- llkv-join/README.md
- llkv-runtime/README.md
- llkv-sql/src/tpch.rs
- llkv-storage/README.md
- llkv-table/README.md
- llkv-tpch/.gitignore
- llkv-tpch/Cargo.toml
- llkv-tpch/DRAFT-PRE-FINAL.md
This document describes the overall system architecture of LLKV, explaining its layered design, core abstractions, and how components interact to provide SQL functionality over key-value storage. For details on individual crates and their dependencies, see Workspace and Crates. For the end-to-end query execution flow, see SQL Query Processing Pipeline. For Arrow integration specifics, see Data Formats and Arrow Integration.
Layered Design
LLKV is organized into six architectural layers, each with focused responsibilities. Higher layers depend only on lower layers, and all layers communicate through Apache Arrow RecordBatch structures as the universal data interchange format.
Sources: Cargo.toml:1-89 README.md:44-53 llkv-sql/README.md:1-10 llkv-runtime/README.md:1-10 llkv-executor/README.md:1-10 llkv-table/README.md:1-18 llkv-storage/README.md:1-17
graph TB
subgraph L1["Layer 1: User Interface"]
SQL["SQL Queries"]
REPL["CLI REPL"]
DEMO["Demo Applications"]
BENCH["TPC-H Benchmarks"]
end
subgraph L2["Layer 2: SQL Processing"]
SQLENG["SqlEngine\nllkv-sql"]
PLAN["Query Plans\nllkv-plan"]
EXPR["Expression AST\nllkv-expr"]
end
subgraph L3["Layer 3: Runtime & Orchestration"]
RUNTIME["RuntimeContext\nllkv-runtime"]
TXNMGR["TxnIdManager\nllkv-transaction"]
CATALOG["CatalogManager\nllkv-runtime"]
end
subgraph L4["Layer 4: Query Execution"]
EXECUTOR["TableExecutor\nllkv-executor"]
AGG["Accumulators\nllkv-aggregate"]
JOIN["HashJoinExecutor\nllkv-join"]
end
subgraph L5["Layer 5: Data Management"]
TABLE["Table\nllkv-table"]
COLMAP["ColumnStore\nllkv-column-map"]
SYSCAT["SysCatalog\nllkv-table"]
end
subgraph L6["Layer 6: Storage"]
PAGER["Pager trait\nllkv-storage"]
MEMPAGER["MemPager"]
SIMDPAGER["SimdRDrivePager"]
end
SQL --> SQLENG
REPL --> SQLENG
DEMO --> SQLENG
BENCH --> SQLENG
SQLENG --> PLAN
SQLENG --> EXPR
PLAN --> RUNTIME
RUNTIME --> TXNMGR
RUNTIME --> CATALOG
RUNTIME --> EXECUTOR
RUNTIME --> TABLE
EXECUTOR --> AGG
EXECUTOR --> JOIN
EXECUTOR --> TABLE
TABLE --> COLMAP
TABLE --> SYSCAT
COLMAP --> PAGER
SYSCAT --> COLMAP
PAGER --> MEMPAGER
PAGER --> SIMDPAGER
Core Architectural Principles
Arrow-Native Data Flow
All data flowing between components is represented as Apache Arrow RecordBatch structures. This enables:
- Zero-copy operations: Arrow buffers can be passed between layers without serialization
- SIMD-friendly processing: Columnar layout supports vectorized operations
- Consistent memory model: All layers use the same in-memory representation
The RecordBatch abstraction appears at every boundary: SQL parsing produces plans that operate on batches, the executor streams batches, tables persist batches, and the column store chunks batches for storage.
Sources: README.md:10-12 README.md:22-23 llkv-table/README.md:10-11 llkv-column-map/README.md:10-14
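As a concrete illustration of the interchange format (generic Arrow usage, not LLKV-specific code), the snippet below builds a small RecordBatch with the arrow crate; this is the structure that crosses every layer boundary described above.

```rust
use std::sync::Arc;

use arrow::array::{ArrayRef, Int64Array, StringArray};
use arrow::datatypes::{DataType, Field, Schema};
use arrow::error::ArrowError;
use arrow::record_batch::RecordBatch;

fn example_batch() -> Result<RecordBatch, ArrowError> {
    // Columnar layout: one Arrow array per column, shared via Arc.
    let ids: ArrayRef = Arc::new(Int64Array::from(vec![1, 2, 3]));
    let names: ArrayRef = Arc::new(StringArray::from(vec!["a", "b", "c"]));

    // The schema travels with the batch, so every layer sees the same types.
    let schema = Arc::new(Schema::new(vec![
        Field::new("id", DataType::Int64, false),
        Field::new("name", DataType::Utf8, false),
    ]));

    RecordBatch::try_new(schema, vec![ids, names])
}
```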
Storage Abstraction Through Pager Trait
The Pager trait in llkv-storage provides a pluggable storage backend interface:
| Pager Type | Use Case | Key Properties |
|---|---|---|
| `MemPager` | Tests, temporary namespaces, staging contexts | Heap-backed, fast |
| `SimdRDrivePager` | Persistent storage | Zero-copy reads, SIMD-aligned, memory-mapped |
Both implementations satisfy the same batch get/put contract, allowing higher layers to remain storage-agnostic. The runtime uses dual-pager contexts: persistent storage for committed tables and in-memory staging for uncommitted transaction objects.
Sources: llkv-storage/README.md:12-22 llkv-runtime/README.md:26-32
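To make the contract concrete, here is a minimal sketch of a batch get/put pager interface with a heap-backed implementation. The trait shape and the `PhysicalKey`/`EntryHandle` aliases are illustrative assumptions; the real definitions in llkv-storage may differ.

```rust
use std::collections::HashMap;

// Illustrative aliases; llkv-storage defines its own key and handle types.
pub type PhysicalKey = u64;
pub type EntryHandle = Vec<u8>; // stand-in for a (possibly zero-copy) buffer handle

pub trait Pager {
    /// Fetch several keys in one call; missing keys come back as `None`.
    fn batch_get(&self, keys: &[PhysicalKey]) -> Vec<Option<EntryHandle>>;

    /// Persist several entries as one atomic unit: all or nothing.
    fn batch_put(&mut self, entries: Vec<(PhysicalKey, EntryHandle)>);
}

/// Heap-backed pager in the spirit of `MemPager`.
pub struct InMemoryPager {
    map: HashMap<PhysicalKey, EntryHandle>,
}

impl Pager for InMemoryPager {
    fn batch_get(&self, keys: &[PhysicalKey]) -> Vec<Option<EntryHandle>> {
        keys.iter().map(|k| self.map.get(k).cloned()).collect()
    }

    fn batch_put(&mut self, entries: Vec<(PhysicalKey, EntryHandle)>) {
        for (key, value) in entries {
            self.map.insert(key, value);
        }
    }
}
```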
MVCC Integration
Multi-version concurrency control (MVCC) is implemented as system metadata columns injected at the table layer:
- `row_id`: Monotonic row identifier
- `created_by`: Transaction ID that created this row version
- `deleted_by`: Transaction ID that deleted this row (or `NULL` if active)
These columns are stored alongside user data in ColumnStore, enabling snapshot isolation without separate version chains. The TxnIdManager in llkv-transaction allocates monotonic transaction IDs and tracks commit watermarks. The runtime enforces visibility rules during scans by filtering based on snapshot transaction IDs.
Sources: llkv-table/README.md:13-17 llkv-runtime/README.md:19-25 llkv-column-map/README.md:27-28
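The visibility rule can be stated compactly. The sketch below is a simplified model of the snapshot check described above; the field names follow the metadata columns, but the types and the exact watermark handling in llkv-runtime and llkv-transaction may differ.

```rust
type TxnId = u64;

/// Simplified view of a row version's MVCC metadata columns.
struct RowVersion {
    created_by: TxnId,
    deleted_by: Option<TxnId>, // None while the row version is still live
}

/// A row version is visible to a snapshot when its creator is at or below the
/// snapshot, and any deletion happened after the snapshot (or not at all).
fn is_visible(row: &RowVersion, snapshot: TxnId) -> bool {
    let created_before_snapshot = row.created_by <= snapshot;
    let deleted_before_snapshot = matches!(row.deleted_by, Some(d) if d <= snapshot);
    created_before_snapshot && !deleted_before_snapshot
}
```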
Component Interaction Patterns
Query Execution Flow
Sources: README.md:56-62 llkv-sql/README.md:15-20 llkv-runtime/README.md:12-17 llkv-executor/README.md:12-17 llkv-table/README.md:19-25 llkv-column-map/README.md:24-28
Dual-Context Transaction Management
The runtime maintains two execution contexts during explicit transactions. The persistent context operates on committed tables directly, while the staging context buffers newly created tables in memory. On commit, staged operations are replayed into the persistent context after the TxnIdManager confirms no conflicts and advances the commit watermark. On rollback, the staging context is dropped and all uncommitted work is discarded.
Sources: llkv-runtime/README.md:26-32 llkv-runtime/README.md:12-17
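The commit/rollback mechanics can be pictured as a small state machine. The sketch below is a conceptual model only; llkv-runtime's real contexts hold pagers, catalogs, and staged tables rather than a plain operation log.

```rust
/// Placeholder for a buffered statement or table mutation.
struct StagedOp(String);

struct DualContext {
    committed: Vec<StagedOp>, // stands in for the persistent context
    staged: Vec<StagedOp>,    // stands in for the in-memory staging context
}

impl DualContext {
    /// Inside an explicit transaction, work accumulates only in staging.
    fn execute(&mut self, op: StagedOp) {
        self.staged.push(op);
    }

    /// After conflict checks pass and the watermark advances, staged work is
    /// replayed into the persistent context.
    fn commit(&mut self) {
        self.committed.append(&mut self.staged);
    }

    /// Rollback simply drops the staging context, discarding uncommitted work.
    fn rollback(&mut self) {
        self.staged.clear();
    }
}
```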
Column Storage and Logical Field Mapping
The ColumnStore maintains a mapping from LogicalFieldId (namespace + table ID + field ID) to physical storage keys. Each logical field has a descriptor chunk (metadata about the column), data chunks (Arrow-serialized column arrays), and row ID chunks (per-chunk row identifiers for filtering). This three-level mapping isolates user data from system metadata while allowing efficient scans and appends.
Sources: llkv-column-map/README.md:18-23 llkv-table/README.md:13-17 llkv-column-map/README.md:10-17
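The mapping can be summarized with a few illustrative types; these are assumptions about the shape of the structures, not llkv-column-map's actual definitions.

```rust
use std::collections::HashMap;

/// Namespace + table + field uniquely identify a logical column.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct LogicalFieldId {
    namespace: u16, // e.g. user data vs. system/MVCC metadata
    table_id: u32,
    field_id: u32,
}

type PhysicalKey = u64;

/// Each logical field resolves to three kinds of physical chunks.
struct FieldChunks {
    descriptor: PhysicalKey,   // column metadata
    data: Vec<PhysicalKey>,    // Arrow-serialized column arrays
    row_ids: Vec<PhysicalKey>, // per-chunk row identifiers used for filtering
}

/// The store's catalog of logical-to-physical mappings.
type ColumnIndex = HashMap<LogicalFieldId, FieldChunks>;
```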
Key Abstractions
SqlEngine
Entry point for SQL execution. Located in llkv-sql, it:
- Preprocesses SQL for dialect compatibility (DuckDB, SQLite quirks)
- Parses with the `sqlparser` crate
- Batches compatible `INSERT` statements
- Delegates execution to `RuntimeContext`
- Returns `ExecutionResult` enums (sketched below)
Sources: llkv-sql/README.md:1-20 README.md:56-59
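To show what "returns `ExecutionResult` enums" means in practice, here is a hypothetical stand-in for that enum and a caller-side match; the real llkv-sql type has different variants and fields.

```rust
use arrow::record_batch::RecordBatch;

/// Hypothetical stand-in for llkv-sql's result type.
enum ExecutionResult {
    /// DDL/DML statements report how many rows they touched.
    RowsAffected(usize),
    /// Queries return Arrow batches.
    Rows(Vec<RecordBatch>),
}

fn summarize(result: &ExecutionResult) -> String {
    match result {
        ExecutionResult::RowsAffected(n) => format!("{n} row(s) affected"),
        ExecutionResult::Rows(batches) => {
            let rows: usize = batches.iter().map(|b| b.num_rows()).sum();
            format!("{rows} row(s) returned across {} batch(es)", batches.len())
        }
    }
}
```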
RuntimeContext
Orchestration layer in llkv-runtime that:
- Executes all statement types (DDL, DML, queries)
- Manages transaction snapshots and MVCC column injection (see the sketch below)
- Coordinates between table layer and executor
- Maintains catalog manager for schema metadata
- Implements dual-context staging for transactions
Sources: llkv-runtime/README.md:12-25 llkv-runtime/README.md:34-40
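As a rough illustration of the MVCC injection step noted above, the sketch below appends `row_id`, `created_by`, and `deleted_by` columns to a user batch. The column names follow this documentation; the integer widths, nullability, and the way llkv-runtime actually assigns row IDs are assumptions.

```rust
use std::sync::Arc;

use arrow::array::{ArrayRef, UInt64Array};
use arrow::datatypes::{DataType, Field, Schema};
use arrow::error::ArrowError;
use arrow::record_batch::RecordBatch;

fn inject_mvcc(
    batch: &RecordBatch,
    first_row_id: u64,
    txn_id: u64,
) -> Result<RecordBatch, ArrowError> {
    let n = batch.num_rows() as u64;

    // System metadata columns, one value per incoming row.
    let row_id: ArrayRef = Arc::new(UInt64Array::from_iter_values(
        first_row_id..first_row_id + n,
    ));
    let created_by: ArrayRef = Arc::new(UInt64Array::from(vec![txn_id; n as usize]));
    let deleted_by: ArrayRef = Arc::new(UInt64Array::from(vec![None::<u64>; n as usize]));

    // Extend the user schema with the MVCC fields.
    let mut fields: Vec<Field> = batch
        .schema()
        .fields()
        .iter()
        .map(|f| f.as_ref().clone())
        .collect();
    fields.push(Field::new("row_id", DataType::UInt64, false));
    fields.push(Field::new("created_by", DataType::UInt64, false));
    fields.push(Field::new("deleted_by", DataType::UInt64, true));

    let mut columns = batch.columns().to_vec();
    columns.extend([row_id, created_by, deleted_by]);
    RecordBatch::try_new(Arc::new(Schema::new(fields)), columns)
}
```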
Table and ColumnStore
Table in llkv-table provides schema-aware APIs:
- Schema validation on `CREATE TABLE` and append
- MVCC column injection (`row_id`, `created_by`, `deleted_by`)
- Streaming scan API with predicate pushdown
- Integration with system catalog (table 0)
ColumnStore in llkv-column-map handles physical storage:
- Arrow-serialized column chunks
- Logical-to-physical key mapping
- Append pipeline with row-id sorting and last-writer-wins semantics (illustrated below)
- Atomic multi-key commits through pager
Sources: llkv-table/README.md:12-25 llkv-column-map/README.md:12-28
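The row-id sorting and last-writer-wins behaviour can be shown with a toy merge over a sorted map; the real append pipeline works on Arrow chunks and persisted descriptors, so treat this purely as a model.

```rust
use std::collections::BTreeMap;

type RowId = u64;

/// Toy model: merge an incoming batch of (row_id, value) pairs into existing
/// column data, sorting by row id and letting the last write for a row win.
fn merge_append(existing: &mut BTreeMap<RowId, i64>, mut incoming: Vec<(RowId, i64)>) {
    // Stable sort keeps later writes for the same row id after earlier ones.
    incoming.sort_by_key(|(row_id, _)| *row_id);
    for (row_id, value) in incoming {
        // `insert` overwrites any prior value: last-writer-wins.
        existing.insert(row_id, value);
    }
}
```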
TableExecutor
Execution engine in llkv-executor that:
- Streams `RecordBatch` results from table scans
- Evaluates projections, filters, and scalar expressions
- Coordinates with `llkv-aggregate` for aggregation
- Coordinates with `llkv-join` for join operations
- Applies MVCC visibility filters during scans
Sources: llkv-executor/README.md:1-17 README.md:60-61
Pager Trait
Storage abstraction in llkv-storage that:
- Exposes batch `get`/`put` over `(PhysicalKey, EntryHandle)` pairs
- Supports atomic multi-key updates
- Enables zero-copy reads when backed by memory-mapped storage
- Implementations: `MemPager` (heap), `SimdRDrivePager` (persistent)
Sources: llkv-storage/README.md:12-22 README.md:11-12
Crate Organization
The workspace crates are organized by layer:
| Layer | Crates | Responsibilities |
|---|---|---|
| SQL Processing | llkv-sql, llkv-plan, llkv-expr | Parse SQL, build typed plans, represent expressions |
| Runtime | llkv-runtime, llkv-transaction | Orchestrate execution, manage MVCC and sessions |
| Execution | llkv-executor, llkv-aggregate, llkv-join | Stream results, compute aggregates, evaluate joins |
| Data Management | llkv-table, llkv-column-map | Schema-aware tables, columnar storage |
| Storage | llkv-storage | Pager trait and implementations |
| Supporting | llkv-result, llkv-csv, llkv-test-utils | Result types, CSV ingestion, test utilities |
| Testing | llkv-slt-tester, llkv-tpch | SQL Logic Tests, TPC-H benchmarks |
| Entry Points | llkv | Main library and CLI |
For detailed dependency graphs and crate responsibilities, see Workspace and Crates.
Sources: Cargo.toml:67-87 README.md:44-53
Execution Model
Synchronous with Work-Stealing
LLKV defaults to synchronous execution using Rayon for parallelism:
- Query execution is synchronous, not async
- Rayon work-stealing parallelizes scans and projections
- Crossbeam channels coordinate between threads
- Embeds cleanly inside Tokio when needed (e.g., SLT test runner)
This design minimizes scheduler overhead for individual queries while maintaining high throughput for concurrent workloads.
Sources: README.md:38-41 llkv-column-map/README.md:32-34
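The sketch below illustrates the Rayon-plus-crossbeam pattern described above on plain vectors; it is not taken from the LLKV codebase.

```rust
use crossbeam_channel::unbounded;
use rayon::prelude::*;

/// Filter a set of "chunks" in parallel and drain the results synchronously.
fn parallel_scan(chunks: Vec<Vec<i64>>) -> Vec<i64> {
    let (tx, rx) = unbounded();

    // Rayon's work-stealing pool processes chunks concurrently; each worker
    // gets its own clone of the channel sender.
    chunks.into_par_iter().for_each_with(tx, |tx, chunk| {
        let hits: Vec<i64> = chunk.into_iter().filter(|v| *v > 0).collect();
        tx.send(hits).expect("receiver is alive");
    });

    // All senders are dropped when the parallel loop finishes, so this
    // synchronous drain terminates without any async runtime.
    rx.into_iter().flatten().collect()
}
```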
Streaming Results
Queries produce results incrementally:
- `TableExecutor` yields fixed-size `RecordBatch`es
- No full result set materialization
- Callers process batches via callback or iterator (see the sketch below)
- Join and aggregate operators buffer only necessary state
Sources: llkv-table/README.md:24-25 llkv-executor/README.md:14-17 llkv-join/README.md:19-22
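A minimal sketch of the callback-style contract, assuming nothing about the executor's real signatures: batches are handed to the caller one at a time instead of being collected.

```rust
use arrow::record_batch::RecordBatch;

/// Drive a batch-producing source and hand each batch to the caller as soon
/// as it is ready; no full result set is ever materialized here.
fn stream_batches<I, F>(source: I, mut on_batch: F)
where
    I: IntoIterator<Item = RecordBatch>,
    F: FnMut(&RecordBatch),
{
    for batch in source {
        on_batch(&batch);
    }
}

/// Example caller: count rows without buffering the result set.
fn count_rows<I: IntoIterator<Item = RecordBatch>>(source: I) -> usize {
    let mut total = 0;
    stream_batches(source, |batch| total += batch.num_rows());
    total
}
```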
Data Lifecycle
Write Path
1. User submits `INSERT` or `UPDATE` through `SqlEngine`
2. `RuntimeContext` validates schema and injects MVCC columns
3. `Table::append` validates the `RecordBatch` schema
4. `ColumnStore::append` sorts by `row_id`, rewrites conflicts
5. `Pager::batch_put` commits Arrow-serialized chunks atomically
6. Transaction manager advances the commit watermark
Read Path
1. User submits `SELECT` through `SqlEngine`
2. `RuntimeContext` acquires a transaction snapshot
3. `TableExecutor` creates a scan with projection and filter
4. `Table::scan_stream` initiates a `ColumnStream`
5. `ColumnStore` fetches chunks via `Pager::batch_get` (zero-copy)
6. MVCC filtering is applied using snapshot visibility rules
7. Executor evaluates expressions and streams `RecordBatch`es to the caller
Sources: README.md:56-62 llkv-column-map/README.md:24-28 llkv-table/README.md:19-25