This documentation is part of the "Projects with Books" initiative at zenOSmosis.

The source code for this project is available on GitHub.

Overview

This document introduces the LLKV database system, its architectural principles, and the relationships between its constituent crates. It provides a high-level map of how SQL queries flow through the system from parsing to storage, and explains the role of Apache Arrow as the universal data interchange format.

What is LLKV

LLKV is an experimental SQL database implemented as a Rust workspace of 15 crates. It layers SQL processing, streaming query execution, and MVCC transaction management on top of pluggable key-value storage backends. The system uses Apache Arrow RecordBatch as its primary data representation at every layer, enabling zero-copy operations and SIMD-friendly columnar processing.

The architecture separates concerns into six distinct layers:

  1. SQL Interface
  2. Query Planning
  3. Runtime and Orchestration
  4. Query Execution
  5. Table and Metadata Management
  6. Storage and I/O

Each layer communicates through well-defined interfaces centered on Arrow data structures.

Sources: README.md:1-107 Cargo.toml:1-89

Core Design Principles

LLKV's design reflects several intentional trade-offs:

| Principle | Implementation | Rationale |
| --- | --- | --- |
| Arrow-Native | RecordBatch is the universal data format across all layers | Enables zero-copy operations, SIMD vectorization, and interoperability with the Arrow ecosystem |
| Synchronous Execution | Work-stealing via Rayon instead of async runtime | Reduces scheduler overhead for individual queries while remaining embeddable in Tokio contexts |
| Layered Modularity | 15 independent crates with clear boundaries | Allows independent evolution and testing of subsystems |
| MVCC Throughout | System metadata columns (row_id, created_by, deleted_by) injected at storage layer | Provides snapshot isolation without write locks |
| Storage Abstraction | Pager trait with multiple implementations | Supports both in-memory and persistent backends with zero-copy reads |
| Compiled Predicates | Expressions compile to stack-based bytecode | Enables efficient vectorized evaluation without interpretation overhead |

Sources: README.md:36-42 llkv-storage/README.md:12-22 llkv-expr/README.md:66-72

Workspace Structure

The LLKV workspace consists of 15 crates organized by layer:

| Layer | Crate | Primary Responsibility |
| --- | --- | --- |
| SQL Interface | llkv-sql | SQL parsing, dialect normalization, INSERT buffering |
| Query Planning | llkv-plan | Typed query plan structures (SelectPlan, InsertPlan, etc.) |
| Query Planning | llkv-expr | Expression AST (Expr, ScalarExpr) |
| Runtime | llkv-runtime | Session management, MVCC orchestration, plan execution |
| Runtime | llkv-transaction | Transaction ID allocation, snapshot management |
| Execution | llkv-executor | Streaming query evaluation |
| Execution | llkv-aggregate | Aggregate function implementation (SUM, COUNT, AVG, etc.) |
| Execution | llkv-join | Join algorithms (hash join with specialized fast paths) |
| Table/Metadata | llkv-table | Schema-aware table abstraction, system catalog |
| Table/Metadata | llkv-column-map | Column-oriented storage, logical-to-physical key mapping |
| Storage | llkv-storage | Pager trait, MemPager, SimdRDrivePager |
| Utilities | llkv-csv | CSV ingestion helper |
| Utilities | llkv-result | Result type definitions |
| Utilities | llkv-test-utils | Testing utilities |
| Utilities | llkv-slt-tester | SQL Logic Test harness |

Sources: Cargo.toml:9-26 Cargo.toml:67-87 README.md:44-53

Component Architecture and Data Flow

The following diagram shows the major components and how Arrow RecordBatch flows through the system:

Sources: README.md:44-72 Cargo.toml:67-87

graph TB
    User["User / Application"]

    subgraph "llkv-sql Crate"
        SqlEngine["SqlEngine"]
        Preprocessor["SQL Preprocessor"]
        Parser["sqlparser"]
        InsertBuffer["InsertBuffer"]
    end

    subgraph "llkv-plan Crate"
        SelectPlan["SelectPlan"]
        InsertPlan["InsertPlan"]
        CreateTablePlan["CreateTablePlan"]
        OtherPlans["Other Plan Types"]
    end

    subgraph "llkv-expr Crate"
        Expr["Expr<F>"]
        ScalarExpr["ScalarExpr<F>"]
    end

    subgraph "llkv-runtime Crate"
        RuntimeContext["RuntimeContext"]
        SessionHandle["SessionHandle"]
        TxnSnapshot["TransactionSnapshot"]
    end

    subgraph "llkv-executor Crate"
        TableExecutor["TableExecutor"]
        StreamingOps["Streaming Operators"]
    end

    subgraph "llkv-table Crate"
        Table["Table"]
        SysCatalog["SysCatalog (Table 0)"]
        FieldId["FieldId Resolution"]
    end

    subgraph "llkv-column-map Crate"
        ColumnStore["ColumnStore"]
        LogicalFieldId["LogicalFieldId"]
        PhysicalKey["PhysicalKey Mapping"]
    end

    subgraph "llkv-storage Crate"
        Pager["Pager Trait"]
        MemPager["MemPager"]
        SimdPager["SimdRDrivePager"]
    end

    ArrowBatch["Arrow RecordBatch\n(Universal Format)"]

    User -->|SQL String| SqlEngine
    SqlEngine --> Preprocessor
    Preprocessor --> Parser
    Parser -->|AST| SelectPlan
    Parser -->|AST| InsertPlan
    Parser -->|AST| CreateTablePlan

    SelectPlan --> Expr
    InsertPlan --> ScalarExpr

    SelectPlan --> RuntimeContext
    InsertPlan --> RuntimeContext
    CreateTablePlan --> RuntimeContext

    RuntimeContext --> SessionHandle
    RuntimeContext --> TxnSnapshot
    RuntimeContext --> TableExecutor
    RuntimeContext --> Table

    TableExecutor --> StreamingOps
    StreamingOps --> Table

    Table --> SysCatalog
    Table --> FieldId
    Table --> ColumnStore

    ColumnStore --> LogicalFieldId
    ColumnStore --> PhysicalKey
    ColumnStore --> Pager

    Pager --> MemPager
    Pager --> SimdPager

    Table -.->|Produces/Consumes| ArrowBatch
    StreamingOps -.->|Produces/Consumes| ArrowBatch
    ColumnStore -.->|Serializes/Deserializes| ArrowBatch
    SqlEngine -.->|Returns| ArrowBatch

End-to-End Query Execution

This diagram traces a SELECT query from SQL text to results, showing the concrete code entities involved:

Sources: README.md:56-63 llkv-sql/README.md:1-107 llkv-runtime/README.md:33-41 llkv-table/README.md:10-25

sequenceDiagram
    participant App as "Application"
    participant SqlEngine as "SqlEngine::execute()"
    participant Preprocessor as "preprocess_sql()"
    participant Parser as "sqlparser::Parser"
    participant Planner as "build_select_plan()"
    participant Runtime as "RuntimeContext::execute_plan()"
    participant Executor as "TableExecutor::execute()"
    participant Table as "Table::scan_stream()"
    participant ColStore as "ColumnStore::gather_columns()"
    participant Pager as "Pager::batch_get()"
    
    App->>SqlEngine: SELECT * FROM users WHERE age > 18
    
    Note over SqlEngine,Preprocessor: Dialect normalization
    SqlEngine->>Preprocessor: Normalize SQLite/DuckDB syntax
    
    SqlEngine->>Parser: Parse normalized SQL
    Parser-->>SqlEngine: Statement AST
    
    SqlEngine->>Planner: Translate AST to SelectPlan
    Note over Planner: Build SelectPlan with\nExpr<String> predicates
    Planner-->>SqlEngine: SelectPlan
    
    SqlEngine->>Runtime: execute_plan(SelectPlan)
    
    Note over Runtime: Acquire TransactionSnapshot\nResolve field names to FieldId
    
    Runtime->>Executor: execute(SelectPlan, context)
    
    Note over Executor: Compile Expr<FieldId>\ninto EvalProgram
    
    Executor->>Table: scan_stream(fields, predicate)
    
    Note over Table: Apply MVCC filtering\nPush down predicates
    
    Table->>ColStore: gather_columns(LogicalFieldId[])
    
    Note over ColStore: Map LogicalFieldId\nto PhysicalKey
    
    ColStore->>Pager: batch_get(PhysicalKey[])
    Pager-->>ColStore: EntryHandle[] (zero-copy)
    
    Note over ColStore: Deserialize Arrow buffers\nApply row_id filtering
    
    ColStore-->>Table: RecordBatch
    Table-->>Executor: RecordBatch
    
    Note over Executor: Apply projections\nEvaluate expressions
    
    Executor-->>Runtime: RecordBatch stream
    Runtime-->>SqlEngine: Vec<RecordBatch>
    SqlEngine-->>App: Query results
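
The diagram above shows predicates starting as Expr<String> in the plan and becoming Expr<FieldId> before execution. The sketch below illustrates that idea with a predicate type that is generic over how a column is referenced. It is a simplified model, not LLKV's actual definitions: the `Expr` variants, the `FieldId` alias, and the `resolve` function are all assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a resolved column identifier.
type FieldId = u32;

/// Simplified predicate tree, generic over how a column is referenced.
#[derive(Debug, Clone, PartialEq)]
enum Expr<F> {
    /// `column > literal`
    Gt(F, i64),
    /// Both sub-predicates must hold.
    And(Box<Expr<F>>, Box<Expr<F>>),
}

/// Rewrite every column name into its FieldId, failing on unknown names.
fn resolve(
    expr: Expr<String>,
    schema: &HashMap<String, FieldId>,
) -> Result<Expr<FieldId>, String> {
    match expr {
        Expr::Gt(name, lit) => schema
            .get(&name)
            .map(|id| Expr::Gt(*id, lit))
            .ok_or_else(|| format!("unknown column: {name}")),
        Expr::And(l, r) => Ok(Expr::And(
            Box::new(resolve(*l, schema)?),
            Box::new(resolve(*r, schema)?),
        )),
    }
}

fn main() {
    // Assumed schema: the column `age` has FieldId 2.
    let schema = HashMap::from([("age".to_string(), 2)]);
    let planned = Expr::Gt("age".to_string(), 18);
    let resolved = resolve(planned, &schema);
    println!("{resolved:?}");
}
```

Keeping the tree generic means the planner can work with names alone, while anything that touches storage only ever sees identifiers that have already been checked against a schema.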

Key Features

MVCC Transaction Management

LLKV implements multi-version concurrency control with snapshot isolation:

  • Every table includes three system columns: row_id (monotonic), created_by (transaction ID), and deleted_by (transaction ID or NULL)
  • TxnIdManager in llkv-transaction allocates monotonic transaction IDs and tracks commit watermarks
  • TransactionSnapshot captures a consistent view of the database at transaction start
  • Auto-commit statements use TXN_ID_AUTO_COMMIT = 1
  • Explicit transactions maintain both persistent and staging contexts for isolation

Sources: README.md:64-72 llkv-runtime/README.md:20-32 llkv-table/README.md:32-35
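
To make the role of the system columns concrete, here is a minimal sketch of a snapshot visibility check. It is not taken from the LLKV source: the `RowVersion` and `Snapshot` types, the `TxnId` alias, and the rule that every transaction at or below a single watermark counts as committed are simplifying assumptions.

```rust
/// Hypothetical transaction ID type; the real crate may use a different width.
type TxnId = u64;

/// The three system columns described above, for a single row version.
#[derive(Debug, Clone, Copy)]
struct RowVersion {
    row_id: u64,
    created_by: TxnId,
    deleted_by: Option<TxnId>, // None models SQL NULL (not deleted)
}

/// Simplified snapshot: every transaction with an ID at or below the
/// watermark is treated as committed; the reader also sees its own writes.
struct Snapshot {
    txn_id: TxnId,
    committed_watermark: TxnId,
}

impl Snapshot {
    /// Whether writes made by `writer` are visible to this snapshot.
    fn sees(&self, writer: TxnId) -> bool {
        writer == self.txn_id || writer <= self.committed_watermark
    }

    /// A row version is visible if its creator is visible and no visible
    /// transaction has deleted it.
    fn is_visible(&self, row: &RowVersion) -> bool {
        self.sees(row.created_by) && !row.deleted_by.map_or(false, |d| self.sees(d))
    }
}

/// Collect the row IDs of all versions visible to the snapshot.
fn visible_row_ids(snapshot: &Snapshot, rows: &[RowVersion]) -> Vec<u64> {
    rows.iter()
        .filter(|r| snapshot.is_visible(r))
        .map(|r| r.row_id)
        .collect()
}

fn main() {
    let rows = [
        RowVersion { row_id: 1, created_by: 3, deleted_by: None },
        RowVersion { row_id: 2, created_by: 3, deleted_by: Some(5) },
        RowVersion { row_id: 3, created_by: 9, deleted_by: None },
    ];
    let snapshot = Snapshot { txn_id: 7, committed_watermark: 6 };
    println!("{:?}", visible_row_ids(&snapshot, &rows));
}
```

Because visibility is decided by comparing IDs at read time, readers in this model never need to block writers: a delete only sets `deleted_by`, and older snapshots continue to see the row.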

Zero-Copy Storage Pipeline

The storage layer supports zero-copy reads when backed by SimdRDrivePager:

  1. ColumnStore maps LogicalFieldId to PhysicalKey
  2. Pager::batch_get() returns EntryHandle wrappers around memory-mapped regions
  3. Arrow arrays are deserialized directly from the mapped memory without intermediate copies
  4. SIMD-aligned buffers enable vectorized predicate evaluation

Sources: llkv-column-map/README.md:19-41 llkv-storage/README.md:12-28 README.md:12-13
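
The steps above can be sketched with the standard library alone. In this illustration an `Arc<[u8]>` stands in for a memory-mapped region, so that handing out a handle shares the buffer rather than copying it. The trait signature and the `ToyMemPager` and `ToyColumnStore` types are hypothetical; LLKV's real Pager trait and EntryHandle type may look quite different.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical key types; the names mirror the text, the definitions are assumed.
type LogicalFieldId = u32;
type PhysicalKey = u64;

/// Stand-in for EntryHandle: a cheaply clonable view of stored bytes.
/// Cloning the Arc shares the buffer instead of copying it.
type EntryHandle = Arc<[u8]>;

/// Minimal pager interface, assumed for illustration.
trait Pager {
    fn batch_get(&self, keys: &[PhysicalKey]) -> Vec<Option<EntryHandle>>;
}

/// In-memory pager backed by a hash map.
#[derive(Default)]
struct ToyMemPager {
    blobs: HashMap<PhysicalKey, EntryHandle>,
}

impl Pager for ToyMemPager {
    fn batch_get(&self, keys: &[PhysicalKey]) -> Vec<Option<EntryHandle>> {
        keys.iter().map(|k| self.blobs.get(k).cloned()).collect()
    }
}

/// Maps logical fields to physical keys, then fetches them in one batch.
struct ToyColumnStore<P: Pager> {
    catalog: HashMap<LogicalFieldId, PhysicalKey>,
    pager: P,
}

impl<P: Pager> ToyColumnStore<P> {
    /// Returns None if any field is unmapped or any key is missing.
    fn gather(&self, fields: &[LogicalFieldId]) -> Option<Vec<EntryHandle>> {
        let keys: Option<Vec<PhysicalKey>> =
            fields.iter().map(|f| self.catalog.get(f).copied()).collect();
        self.pager.batch_get(&keys?).into_iter().collect()
    }
}

fn main() {
    let mut pager = ToyMemPager::default();
    pager.blobs.insert(100, Arc::from(vec![1u8, 2, 3]));
    let store = ToyColumnStore { catalog: HashMap::from([(7, 100)]), pager };

    let first = store.gather(&[7]).expect("field 7 is mapped");
    let second = store.gather(&[7]).expect("field 7 is mapped");
    // Both handles point at the same allocation: no bytes were copied.
    assert!(Arc::ptr_eq(&first[0], &second[0]));
}
```

Batching the lookups behind one `batch_get` call keeps the store independent of the backend, which is what lets an in-memory pager and a persistent one be swapped without touching the layers above.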

Compiled Expression Evaluation

Predicates and scalar expressions compile to stack-based bytecode:

  • Expr<FieldId> structures in llkv-expr represent logical predicates
  • ProgramCompiler in llkv-table translates expressions into EvalProgram bytecode
  • DomainProgram tracks which row IDs satisfy predicates
  • Bytecode evaluation uses stack-based execution for efficient vectorized operations

Sources: llkv-expr/README.md:1-88 llkv-table/README.md:10-18 README.md:46-53
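
As a rough illustration of stack-based predicate evaluation, the sketch below compiles a small predicate tree into postfix instructions and runs them over whole columns, with each stack slot holding a boolean mask of rows. The `Expr`, `Op`, `compile`, and `eval` names are hypothetical and much simpler than LLKV's ProgramCompiler, EvalProgram, and DomainProgram.

```rust
/// Hypothetical resolved column identifier (index into the batch).
type FieldId = usize;

/// Simplified predicate tree.
enum Expr {
    Gt(FieldId, i64),
    Lt(FieldId, i64),
    And(Box<Expr>, Box<Expr>),
    Or(Box<Expr>, Box<Expr>),
}

/// Postfix instructions for a stack machine.
#[derive(Debug, PartialEq)]
enum Op {
    PushGt(FieldId, i64),
    PushLt(FieldId, i64),
    And,
    Or,
}

/// Post-order walk: operands are emitted before their operator.
fn compile(expr: &Expr, out: &mut Vec<Op>) {
    match expr {
        Expr::Gt(f, v) => out.push(Op::PushGt(*f, *v)),
        Expr::Lt(f, v) => out.push(Op::PushLt(*f, *v)),
        Expr::And(l, r) => {
            compile(l, out);
            compile(r, out);
            out.push(Op::And);
        }
        Expr::Or(l, r) => {
            compile(l, out);
            compile(r, out);
            out.push(Op::Or);
        }
    }
}

/// Run the program over whole columns; each stack slot is a row mask.
fn eval(program: &[Op], columns: &[Vec<i64>]) -> Vec<bool> {
    let mut stack: Vec<Vec<bool>> = Vec::new();
    for op in program {
        match op {
            Op::PushGt(f, v) => stack.push(columns[*f].iter().map(|x| x > v).collect()),
            Op::PushLt(f, v) => stack.push(columns[*f].iter().map(|x| x < v).collect()),
            Op::And | Op::Or => {
                let rhs = stack.pop().expect("stack underflow");
                let lhs = stack.pop().expect("stack underflow");
                let combined = lhs
                    .iter()
                    .zip(&rhs)
                    .map(|(a, b)| if matches!(op, Op::And) { *a && *b } else { *a || *b })
                    .collect();
                stack.push(combined);
            }
        }
    }
    stack.pop().expect("program must leave one mask")
}

fn main() {
    // (age > 18 AND age < 65) OR age > 100, with `age` as column 0.
    let expr = Expr::Or(
        Box::new(Expr::And(Box::new(Expr::Gt(0, 18)), Box::new(Expr::Lt(0, 65)))),
        Box::new(Expr::Gt(0, 100)),
    );
    let mut program = Vec::new();
    compile(&expr, &mut program);
    let mask = eval(&program, &[vec![12, 30, 70, 120]]);
    println!("{program:?} -> {mask:?}");
}
```

Flattening the tree once means the evaluation loop is a linear pass over instructions, and each instruction processes an entire column, which is the shape that lends itself to vectorized execution.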

SQL Logic Test Infrastructure

LLKV includes comprehensive SQL correctness testing:

  • llkv-slt-tester wraps the sqllogictest framework
  • LlkvSltRunner discovers .slt files and executes test suites
  • Supports remote test fetching via .slturl pointer files
  • Environment variable LLKV_SLT_STATS=1 enables detailed query statistics
  • CI runs the full suite on Linux, macOS, and Windows

Sources: README.md:75-77 llkv-slt-tester/README.md:1-57

Getting Started

The main entry point is the llkv crate, which re-exports the SQL interface.

For persistent storage, use SimdRDrivePager instead of MemPager. For transaction control beyond auto-commit, obtain a SessionHandle via SqlEngine::session().

Sources: README.md:14-33 demos/llkv-sql-pong-demo/src/main.rs:386-393

LLKV shares architectural concepts with Apache DataFusion but differs in several key areas:

| Aspect | LLKV | DataFusion |
| --- | --- | --- |
| Execution Model | Synchronous with Rayon work-stealing | Async with Tokio runtime |
| Storage Backend | Custom key-value via Pager trait | Parquet, CSV, object stores |
| SQL Parser | sqlparser crate (same) | sqlparser crate |
| Data Format | Arrow RecordBatch (same) | Arrow RecordBatch |
| Maturity | Alpha / Experimental | Production-ready |
| Transaction Support | MVCC snapshot isolation | Read-only (no writes) |

LLKV deliberately avoids the DataFusion task scheduler to explore trade-offs in a synchronous execution model, while maintaining compatibility with the same SQL parser and Arrow memory layout.

Sources: README.md:36-42 README.md:8-13