This documentation is part of the "Projects with Books" initiative at zenOSmosis.
The source code for this project is available on GitHub.
Overview
Relevant source files
- Cargo.lock
- Cargo.toml
- README.md
- demos/llkv-sql-pong-demo/src/main.rs
- llkv-aggregate/README.md
- llkv-column-map/README.md
- llkv-csv/README.md
- llkv-expr/README.md
- llkv-join/README.md
- llkv-runtime/README.md
- llkv-sql/src/tpch.rs
- llkv-storage/README.md
- llkv-table/README.md
- llkv-tpch/.gitignore
- llkv-tpch/Cargo.toml
- llkv-tpch/DRAFT-PRE-FINAL.md
This document introduces the LLKV database system, its architectural principles, and the relationships between its constituent crates. It provides a high-level map of how SQL queries flow through the system from parsing to storage, and explains the role of Apache Arrow as the universal data interchange format.
For details on individual subsystems, see:
- Workspace organization and crate dependencies: Workspace and Crates
- SQL query processing pipeline: SQL Query Processing Pipeline
- Arrow integration details: Data Formats and Arrow Integration
What is LLKV
LLKV is an experimental SQL database implemented as a Rust workspace of 15 crates. It layers SQL processing, streaming query execution, and MVCC transaction management on top of pluggable key-value storage backends. The system uses Apache Arrow RecordBatch as its primary data representation at every layer, enabling zero-copy operations and SIMD-friendly columnar processing.
The architecture separates concerns into six distinct layers:
- SQL Interface
- Query Planning
- Runtime and Orchestration
- Query Execution
- Table and Metadata Management
- Storage and I/O
Each layer communicates through well-defined interfaces centered on Arrow data structures.
Sources: README.md:1-107 Cargo.toml:1-89
Core Design Principles
LLKV's design reflects several intentional trade-offs:
| Principle | Implementation | Rationale |
|---|---|---|
| Arrow-Native | RecordBatch is the universal data format across all layers | Enables zero-copy operations, SIMD vectorization, and interoperability with the Arrow ecosystem |
| Synchronous Execution | Work-stealing via Rayon instead of async runtime | Reduces scheduler overhead for individual queries while remaining embeddable in Tokio contexts |
| Layered Modularity | 15 independent crates with clear boundaries | Allows independent evolution and testing of subsystems |
| MVCC Throughout | System metadata columns (row_id, created_by, deleted_by) injected at storage layer | Provides snapshot isolation without write locks |
| Storage Abstraction | Pager trait with multiple implementations | Supports both in-memory and persistent backends with zero-copy reads |
| Compiled Predicates | Expressions compile to stack-based bytecode | Enables efficient vectorized evaluation without interpretation overhead |
Sources: README.md:36-42 llkv-storage/README.md:12-22 llkv-expr/README.md:66-72
Workspace Structure
The LLKV workspace consists of 15 crates organized by layer:
| Layer | Crate | Primary Responsibility |
|---|---|---|
| SQL Interface | llkv-sql | SQL parsing, dialect normalization, INSERT buffering |
| Query Planning | llkv-plan | Typed query plan structures (SelectPlan, InsertPlan, etc.) |
| Query Planning | llkv-expr | Expression AST (Expr, ScalarExpr) |
| Runtime | llkv-runtime | Session management, MVCC orchestration, plan execution |
| Runtime | llkv-transaction | Transaction ID allocation, snapshot management |
| Execution | llkv-executor | Streaming query evaluation |
| Execution | llkv-aggregate | Aggregate function implementation (SUM, COUNT, AVG, etc.) |
| Execution | llkv-join | Join algorithms (hash join with specialized fast paths) |
| Table/Metadata | llkv-table | Schema-aware table abstraction, system catalog |
| Table/Metadata | llkv-column-map | Column-oriented storage, logical-to-physical key mapping |
| Storage | llkv-storage | Pager trait, MemPager, SimdRDrivePager |
| Utilities | llkv-csv | CSV ingestion helper |
| Utilities | llkv-result | Result type definitions |
| Utilities | llkv-test-utils | Testing utilities |
| Utilities | llkv-slt-tester | SQL Logic Test harness |
Sources: Cargo.toml:9-26 Cargo.toml:67-87 README.md:44-53
Component Architecture and Data Flow
The following diagram shows the major components and how Arrow RecordBatch flows through the system:
Sources: README.md:44-72 Cargo.toml:67-87
```mermaid
graph TB
    User["User / Application"]
    subgraph "llkv-sql Crate"
        SqlEngine["SqlEngine"]
        Preprocessor["SQL Preprocessor"]
        Parser["sqlparser"]
        InsertBuffer["InsertBuffer"]
    end
    subgraph "llkv-plan Crate"
        SelectPlan["SelectPlan"]
        InsertPlan["InsertPlan"]
        CreateTablePlan["CreateTablePlan"]
        OtherPlans["Other Plan Types"]
    end
    subgraph "llkv-expr Crate"
        Expr["Expr<F>"]
        ScalarExpr["ScalarExpr<F>"]
    end
    subgraph "llkv-runtime Crate"
        RuntimeContext["RuntimeContext"]
        SessionHandle["SessionHandle"]
        TxnSnapshot["TransactionSnapshot"]
    end
    subgraph "llkv-executor Crate"
        TableExecutor["TableExecutor"]
        StreamingOps["Streaming Operators"]
    end
    subgraph "llkv-table Crate"
        Table["Table"]
        SysCatalog["SysCatalog (Table 0)"]
        FieldId["FieldId Resolution"]
    end
    subgraph "llkv-column-map Crate"
        ColumnStore["ColumnStore"]
        LogicalFieldId["LogicalFieldId"]
        PhysicalKey["PhysicalKey Mapping"]
    end
    subgraph "llkv-storage Crate"
        Pager["Pager Trait"]
        MemPager["MemPager"]
        SimdPager["SimdRDrivePager"]
    end
    ArrowBatch["Arrow RecordBatch\n(Universal Format)"]
    User -->|SQL String| SqlEngine
    SqlEngine --> Preprocessor
    Preprocessor --> Parser
    Parser -->|AST| SelectPlan
    Parser -->|AST| InsertPlan
    Parser -->|AST| CreateTablePlan
    SelectPlan --> Expr
    InsertPlan --> ScalarExpr
    SelectPlan --> RuntimeContext
    InsertPlan --> RuntimeContext
    CreateTablePlan --> RuntimeContext
    RuntimeContext --> SessionHandle
    RuntimeContext --> TxnSnapshot
    RuntimeContext --> TableExecutor
    RuntimeContext --> Table
    TableExecutor --> StreamingOps
    StreamingOps --> Table
    Table --> SysCatalog
    Table --> FieldId
    Table --> ColumnStore
    ColumnStore --> LogicalFieldId
    ColumnStore --> PhysicalKey
    ColumnStore --> Pager
    Pager --> MemPager
    Pager --> SimdPager
    Table -.->|Produces/Consumes| ArrowBatch
    StreamingOps -.->|Produces/Consumes| ArrowBatch
    ColumnStore -.->|Serializes/Deserializes| ArrowBatch
    SqlEngine -.->|Returns| ArrowBatch
```
End-to-End Query Execution
This diagram traces a SELECT query from SQL text to results, showing the concrete code entities involved:
Sources: README.md:56-63 llkv-sql/README.md:1-107 llkv-runtime/README.md:33-41 llkv-table/README.md:10-25
```mermaid
sequenceDiagram
    participant App as "Application"
    participant SqlEngine as "SqlEngine::execute()"
    participant Preprocessor as "preprocess_sql()"
    participant Parser as "sqlparser::Parser"
    participant Planner as "build_select_plan()"
    participant Runtime as "RuntimeContext::execute_plan()"
    participant Executor as "TableExecutor::execute()"
    participant Table as "Table::scan_stream()"
    participant ColStore as "ColumnStore::gather_columns()"
    participant Pager as "Pager::batch_get()"
    App->>SqlEngine: SELECT * FROM users WHERE age > 18
    Note over SqlEngine,Preprocessor: Dialect normalization
    SqlEngine->>Preprocessor: Normalize SQLite/DuckDB syntax
    SqlEngine->>Parser: Parse normalized SQL
    Parser-->>SqlEngine: Statement AST
    SqlEngine->>Planner: Translate AST to SelectPlan
    Note over Planner: Build SelectPlan with\nExpr<String> predicates
    Planner-->>SqlEngine: SelectPlan
    SqlEngine->>Runtime: execute_plan(SelectPlan)
    Note over Runtime: Acquire TransactionSnapshot\nResolve field names to FieldId
    Runtime->>Executor: execute(SelectPlan, context)
    Note over Executor: Compile Expr<FieldId>\ninto EvalProgram
    Executor->>Table: scan_stream(fields, predicate)
    Note over Table: Apply MVCC filtering\nPush down predicates
    Table->>ColStore: gather_columns(LogicalFieldId[])
    Note over ColStore: Map LogicalFieldId\nto PhysicalKey
    ColStore->>Pager: batch_get(PhysicalKey[])
    Pager-->>ColStore: EntryHandle[] (zero-copy)
    Note over ColStore: Deserialize Arrow buffers\nApply row_id filtering
    ColStore-->>Table: RecordBatch
    Table-->>Executor: RecordBatch
    Note over Executor: Apply projections\nEvaluate expressions
    Executor-->>Runtime: RecordBatch stream
    Runtime-->>SqlEngine: Vec<RecordBatch>
    SqlEngine-->>App: Query results
```
Key Features
MVCC Transaction Management
LLKV implements multi-version concurrency control with snapshot isolation:
- Every table includes three system columns: `row_id` (monotonic), `created_by` (transaction ID), and `deleted_by` (transaction ID or NULL)
- `TxnIdManager` in `llkv-transaction` allocates monotonic transaction IDs and tracks commit watermarks
- `TransactionSnapshot` captures a consistent view of the database at transaction start
- Auto-commit statements use `TXN_ID_AUTO_COMMIT = 1`
- Explicit transactions maintain both persistent and staging contexts for isolation
Sources: README.md:64-72 llkv-runtime/README.md:20-32 llkv-table/README.md:32-35
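The visibility rule these system columns enable can be sketched as follows. This is an illustrative model only: `RowVersion`, `is_visible`, and the `watermark` parameter are invented for the example and are not LLKV's actual types.

```rust
/// Toy model of snapshot-isolation visibility over the three system columns
/// described above. (Illustrative sketch, not LLKV's implementation.)
type TxnId = u64;

struct RowVersion {
    row_id: u64,
    created_by: TxnId,
    deleted_by: Option<TxnId>,
}

/// A snapshot sees a row if it was created at or below the snapshot's commit
/// watermark and was not deleted by a transaction visible to that snapshot.
fn is_visible(row: &RowVersion, watermark: TxnId) -> bool {
    row.created_by <= watermark
        && row.deleted_by.map_or(true, |d| d > watermark)
}

fn main() {
    let rows = [
        RowVersion { row_id: 1, created_by: 2, deleted_by: None },    // visible
        RowVersion { row_id: 2, created_by: 2, deleted_by: Some(3) }, // deleted before snapshot
        RowVersion { row_id: 3, created_by: 9, deleted_by: None },    // created after snapshot
    ];
    let visible: Vec<u64> = rows
        .iter()
        .filter(|r| is_visible(r, 5))
        .map(|r| r.row_id)
        .collect();
    assert_eq!(visible, vec![1]);
}
```

Because deleted rows are marked via `deleted_by` rather than removed, readers never block writers: each snapshot simply filters by its watermark.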
Zero-Copy Storage Pipeline
The storage layer supports zero-copy reads when backed by SimdRDrivePager:
- `ColumnStore` maps `LogicalFieldId` to `PhysicalKey`
- `Pager::batch_get()` returns `EntryHandle` wrappers around memory-mapped regions
- Arrow arrays are deserialized directly from the mapped memory without intermediate copies
- SIMD-aligned buffers enable vectorized predicate evaluation
Sources: llkv-column-map/README.md:19-41 llkv-storage/README.md:12-28 README.md:12-13
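The handle-based read path can be illustrated with a toy pager whose `batch_get` shares the stored allocation instead of copying it. `EntryHandle`, `MemPager`, and `PhysicalKey` here are simplified stand-ins for the types named above, not LLKV's real definitions:

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Simplified stand-ins for the Pager / EntryHandle idea (illustrative only).
type PhysicalKey = u64;

struct EntryHandle(Arc<Vec<u8>>);

impl EntryHandle {
    fn as_bytes(&self) -> &[u8] {
        &self.0
    }
}

struct MemPager {
    entries: HashMap<PhysicalKey, Arc<Vec<u8>>>,
}

impl MemPager {
    /// Hand out handles that alias the stored allocation: no byte copies.
    fn batch_get(&self, keys: &[PhysicalKey]) -> Vec<Option<EntryHandle>> {
        keys.iter()
            .map(|k| self.entries.get(k).map(|buf| EntryHandle(Arc::clone(buf))))
            .collect()
    }
}

fn main() {
    let payload = Arc::new(b"arrow buffer bytes".to_vec());
    let pager = MemPager {
        entries: HashMap::from([(42, Arc::clone(&payload))]),
    };

    let handles = pager.batch_get(&[42]);
    let handle = handles[0].as_ref().unwrap();

    // The handle points at the same memory as the stored buffer.
    assert!(std::ptr::eq(handle.as_bytes().as_ptr(), payload.as_ptr()));
}
```

With a memory-mapped backend the same pattern applies, except the shared allocation is a mapped file region rather than a heap buffer.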
Compiled Expression Evaluation
Predicates and scalar expressions compile to stack-based bytecode:
- `Expr<FieldId>` structures in `llkv-expr` represent logical predicates
- `ProgramCompiler` in `llkv-table` translates expressions into `EvalProgram` bytecode
- `DomainProgram` tracks which row IDs satisfy predicates
- Bytecode evaluation uses stack-based execution for efficient vectorized operations
Sources: llkv-expr/README.md:1-88 llkv-table/README.md:10-18 README.md:46-53
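The core idea of compiling a predicate into a stack-based program can be sketched as follows; the opcode set and `eval` loop are invented for this example and are far simpler than LLKV's `EvalProgram`:

```rust
/// Toy stack machine: a predicate like `col0 > 10 AND col1 < 5` compiles to a
/// flat opcode sequence evaluated column-at-a-time over all rows, producing a
/// boolean selection mask. (Illustrative sketch only.)
#[derive(Clone, Copy)]
enum Op {
    GtConst { col: usize, k: i64 }, // push mask: column[i] > k
    LtConst { col: usize, k: i64 }, // push mask: column[i] < k
    And,                            // pop two masks, push their conjunction
}

fn eval(program: &[Op], cols: &[Vec<i64>]) -> Vec<bool> {
    let rows = cols[0].len();
    let mut stack: Vec<Vec<bool>> = Vec::new();
    for op in program {
        match *op {
            Op::GtConst { col, k } => stack.push(cols[col].iter().map(|v| *v > k).collect()),
            Op::LtConst { col, k } => stack.push(cols[col].iter().map(|v| *v < k).collect()),
            Op::And => {
                let b = stack.pop().unwrap();
                let mut a = stack.pop().unwrap();
                for i in 0..rows {
                    a[i] = a[i] && b[i];
                }
                stack.push(a);
            }
        }
    }
    stack.pop().unwrap()
}

fn main() {
    // Predicate: col0 > 10 AND col1 < 5
    let program = [
        Op::GtConst { col: 0, k: 10 },
        Op::LtConst { col: 1, k: 5 },
        Op::And,
    ];
    let cols = vec![vec![3, 20, 15], vec![1, 9, 2]];
    assert_eq!(eval(&program, &cols), vec![false, false, true]);
}
```

Evaluating whole columns per opcode, rather than interpreting the expression tree once per row, is what makes this style of execution amenable to SIMD.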
SQL Logic Test Infrastructure
LLKV includes comprehensive SQL correctness testing:
- `llkv-slt-tester` wraps the `sqllogictest` framework
- `LlkvSltRunner` discovers `.slt` files and executes test suites
- Supports remote test fetching via `.slturl` pointer files
- Environment variable `LLKV_SLT_STATS=1` enables detailed query statistics
- CI runs the full suite on Linux, macOS, and Windows
Sources: README.md:75-77 llkv-slt-tester/README.md:1-57
Getting Started
The main entry point is the `llkv` crate, which re-exports the SQL interface.
For persistent storage, use SimdRDrivePager instead of MemPager. For transaction control beyond auto-commit, obtain a SessionHandle via SqlEngine::session().
Sources: README.md:14-33 demos/llkv-sql-pong-demo/src/main.rs:386-393
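A minimal usage sketch, assuming the entry points named in this document (`SqlEngine`, `MemPager`). The constructor names and exact signatures below are assumptions for illustration; consult the `llkv` crate documentation for the current API:

```rust
// Sketch only: SqlEngine and MemPager are named in this document, but the
// constructor and return types here are assumed, not verified against the crate.
use llkv::{MemPager, SqlEngine};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // In-memory backend; swap in SimdRDrivePager for persistent storage.
    let engine = SqlEngine::new(MemPager::default());

    engine.execute("CREATE TABLE users (id INT, age INT)")?;
    engine.execute("INSERT INTO users VALUES (1, 42)")?;

    // Results come back as Arrow RecordBatches, per the pipeline above.
    let batches = engine.execute("SELECT * FROM users WHERE age > 18")?;
    println!("{} result batch(es)", batches.len());
    Ok(())
}
```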
Relationship to Related Projects
LLKV shares architectural concepts with Apache DataFusion but differs in several key areas:
| Aspect | LLKV | DataFusion |
|---|---|---|
| Execution Model | Synchronous with Rayon work-stealing | Async with Tokio runtime |
| Storage Backend | Custom key-value via Pager trait | Parquet, CSV, object stores |
| SQL Parser | sqlparser crate (same) | sqlparser crate |
| Data Format | Arrow RecordBatch (same) | Arrow RecordBatch |
| Maturity | Alpha / Experimental | Production-ready |
| Transaction Support | MVCC snapshot isolation | Read-only (no writes) |
LLKV deliberately avoids the DataFusion task scheduler to explore trade-offs in a synchronous execution model, while maintaining compatibility with the same SQL parser and Arrow memory layout.
Sources: README.md:36-42 README.md:8-13