This documentation is part of the "Projects with Books" initiative at zenOSmosis.

The source code for this project is available on GitHub.

Overview

Purpose and Scope

LLKV is an embedded SQL database system implemented in Rust that combines Apache Arrow’s columnar memory format with a key-value storage backend. This document provides a high-level introduction to the system architecture, component organization, and core data flows.

For detailed information about specific subsystems, see:

Architecture and component organization: Architecture
SQL query processing: SQL Interface and Query Planning
Storage implementation: Storage Layer
Metadata management: Catalog and Metadata Management

System Architecture

LLKV is organized as a Rust workspace containing 15 specialized crates that form a layered architecture. The system processes SQL queries through several stages (parsing, planning, and execution) before persisting data in a memory-mapped key-value store.

Layered Architecture

Diagram: End-to-End System Layering

graph TB
    subgraph "User Interface"
        SqlEngine["SqlEngine\n(llkv-sql)"]
        PreparedStatement["PreparedStatement"]
    end

    subgraph "Query Processing"
        Parser["SQL Parser\n(sqlparser-rs)"]
        Planner["Query Planner\n(llkv-plan)"]
        ExprSystem["Expression System\n(llkv-expr)"]
    end

    subgraph "Execution"
        QueryExecutor["QueryExecutor\n(llkv-executor)"]
        RuntimeEngine["RuntimeEngine\n(llkv-runtime)"]
        Aggregate["AggregateAccumulator\n(llkv-aggregate)"]
        Join["Hash Join\n(llkv-join)"]
    end

    subgraph "Data Management"
        Table["Table\n(llkv-table)"]
        SysCatalog["SysCatalog\nTable ID = 0"]
        Scanner["Scanner\n(llkv-scan)"]
        TxManager["TransactionManager\n(llkv-transaction)"]
    end

    subgraph "Storage"
        ColumnStore["ColumnStore\n(llkv-column-map)"]
        ArrowBatches["RecordBatch\nArrow Arrays"]
        Pager["Pager Trait\n(llkv-storage)"]
    end

    subgraph "Persistence"
        SimdRDrive["simd-r-drive\nMemory-Mapped K-V"]
        EntryHandle["EntryHandle\nByte Blobs"]
    end

    SqlEngine --> Parser
    Parser --> Planner
    Planner --> ExprSystem
    Planner --> QueryExecutor
    QueryExecutor --> RuntimeEngine
    QueryExecutor --> Aggregate
    QueryExecutor --> Join
    RuntimeEngine --> Table
    RuntimeEngine --> TxManager
    Table --> SysCatalog
    Table --> Scanner
    Scanner --> ColumnStore
    ColumnStore --> ArrowBatches
    ColumnStore --> Pager
    Pager --> SimdRDrive
    SimdRDrive --> EntryHandle

    style SqlEngine fill:#e8e8e8
    style ColumnStore fill:#e8e8e8
    style SimdRDrive fill:#e8e8e8

The system flows from SQL text at the top through progressive layers of abstraction down to persistent storage. Each layer is implemented as one or more dedicated crates with well-defined responsibilities.

Sources: Cargo.toml:1-109 llkv-sql/src/sql_engine.rs:1-100 llkv-executor/src/lib.rs:1-100

Workspace Structure

The LLKV workspace is divided into 15 crates, each handling a specific concern. The following diagram maps crate names to their primary responsibilities:

Diagram: Crate Dependency Structure

graph LR
    subgraph "Foundation"
        types["llkv-types\nShared types\nLogicalFieldId"]
        result["llkv-result\nError handling\nError enum"]
        storage["llkv-storage\nPager trait\nMemPager"]
    end

    subgraph "Expression & Planning"
        expr["llkv-expr\nExpression AST\nScalarExpr, Expr"]
        plan["llkv-plan\nQuery plans\nSelectPlan, InsertPlan"]
    end

    subgraph "Execution"
        executor["llkv-executor\nQuery execution\nQueryExecutor"]
        compute["llkv-compute\nCompute kernels\nNumericKernels"]
        aggregate["llkv-aggregate\nAggregation\nAggregateAccumulator"]
        join["llkv-join\nJoin ops\nhash_join"]
        scan["llkv-scan\nTable scans\nScanner"]
    end

    subgraph "Data Management"
        table["llkv-table\nTable abstraction\nTable, SysCatalog"]
        colmap["llkv-column-map\nColumn store\nColumnStore"]
        transaction["llkv-transaction\nMVCC\nTransactionManager"]
    end

    subgraph "User Interface"
        sql["llkv-sql\nSQL engine\nSqlEngine"]
        runtime["llkv-runtime\nRuntime engine\nRuntimeEngine"]
    end

    subgraph "Utilities"
        csv["llkv-csv\nCSV import/export"]
        threading["llkv-threading\nThread pool"]
        testutils["llkv-test-utils\nTest helpers"]
        slttester["llkv-slt-tester\nSQLite test harness"]
    end

    types -.-> expr
    types -.-> storage
    types -.-> table
    result -.-> storage
    result -.-> table
    storage -.-> colmap
    expr -.-> plan
    expr -.-> compute
    plan -.-> executor
    colmap -.-> table
    table -.-> executor
    executor -.-> runtime
    runtime -.-> sql

This diagram shows the primary dependency flow between crates. Foundation crates (llkv-types, llkv-result, llkv-storage) provide shared infrastructure. Middle layers handle query planning and execution. Top layers expose the SQL interface.

Sources: Cargo.toml:2-26 Cargo.toml:37-96

Key Components

SQL Interface Layer

The SqlEngine struct in llkv-sql is the primary entry point for executing SQL statements. It handles statement parsing, preprocessing, and orchestrates execution through the runtime layer.

Diagram: SQL Interface Entry Points

graph TD
    User["Application Code"]
    SqlEngine["SqlEngine::new(pager)"]
    Execute["SqlEngine::execute(sql)"]
    Sql["SqlEngine::sql(sql)"]
    Prepare["SqlEngine::prepare(sql)"]

    User --> SqlEngine
    SqlEngine --> Execute
    SqlEngine --> Sql
    SqlEngine --> Prepare

    Execute --> Parse["parse_sql_with_recursion_limit"]
    Parse --> Preprocess["preprocess_sql_input"]
    Preprocess --> BuildPlan["build_*_plan methods"]
    BuildPlan --> RuntimeExec["RuntimeEngine::execute_statement"]
    Sql --> Execute

    Prepare --> PreparedStatement["PreparedStatement"]

The SqlEngine provides three primary methods: execute() for mixed statements, sql() for SELECT queries returning RecordBatch results, and prepare() for parameterized statements.

Sources: llkv-sql/src/sql_engine.rs:440-486 llkv-sql/src/sql_engine.rs:1045-1134 llkv-sql/src/sql_engine.rs:1560-1612

Query Planning

The llkv-plan crate transforms the parsed SQL AST into executable plans. Key plan types include:

Plan Type	Purpose	Key Fields
`SelectPlan`	Query execution	`projections`, `tables`, `filter`, `group_by`, `order_by`
`InsertPlan`	Data insertion	`table`, `columns`, `source`, `on_conflict`
`UpdatePlan`	Row updates	`table`, `assignments`, `filter`
`DeletePlan`	Row deletion	`table`, `filter`
`CreateTablePlan`	Schema definition	`table`, `columns`, `constraints`

Sources: llkv-executor/src/lib.rs:31-35 (references to the llkv-plan crate)
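To make the table concrete, here is a minimal sketch assuming hypothetical, simplified types: the `Plan` enum, its variants, and `execute_statement` below are illustrative stand-ins, not the actual llkv-plan or llkv-runtime API. Filters and projections are plain strings rather than typed expressions. The point it illustrates is that a plan is plain data, which the runtime can dispatch on by variant.

```rust
/// Hypothetical, simplified stand-ins for the plan types in the table above.
/// Filters and projections are plain strings; real plans carry typed expressions.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Select { projections: Vec<String>, tables: Vec<String>, filter: Option<String> },
    Insert { table: String, columns: Vec<String>, source: Vec<Vec<i64>> },
    Update { table: String, assignments: Vec<(String, i64)>, filter: Option<String> },
    Delete { table: String, filter: Option<String> },
    CreateTable { table: String, columns: Vec<String> },
}

/// Hypothetical dispatcher: match on the plan variant and route it to a handler.
/// Each "handler" here only returns a summary of the work it would perform.
fn execute_statement(plan: &Plan) -> String {
    match plan {
        Plan::Select { projections, tables, filter } => format!(
            "scan {} -> project [{}] where {}",
            tables.join(","),
            projections.join(","),
            filter.as_deref().unwrap_or("<none>")
        ),
        Plan::Insert { table, columns, source } => {
            format!("insert {} row(s) into {}({})", source.len(), table, columns.join(","))
        }
        Plan::Update { table, assignments, filter } => format!(
            "update {} column(s) in {} where {}",
            assignments.len(),
            table,
            filter.as_deref().unwrap_or("<none>")
        ),
        Plan::Delete { table, filter } => match filter {
            Some(f) => format!("delete from {} where {}", table, f),
            None => format!("delete all rows from {}", table),
        },
        Plan::CreateTable { table, columns } => {
            format!("create table {} with {} column(s)", table, columns.len())
        }
    }
}

fn main() {
    // Roughly: INSERT INTO users (id, age) VALUES (1, 30), (2, 41)
    let plan = Plan::Insert {
        table: "users".to_string(),
        columns: vec!["id".to_string(), "age".to_string()],
        source: vec![vec![1, 30], vec![2, 41]],
    };
    println!("{}", execute_statement(&plan));
}
```

Keeping plans as data decouples the front end from execution: anything that can build a `Plan` value could reuse the same dispatcher.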

Expression System

The llkv-expr crate defines two core expression types:

Expr<F>: Boolean predicate expressions for filtering (used in WHERE clauses)
ScalarExpr<F>: Scalar value expressions for projections and computations

Both are generic over a field identifier type F, which allows expressions written against string column names to be translated into expressions over numeric FieldId identifiers.

Sources: llkv-executor/src/lib.rs:23-26
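The benefit of making expressions generic over F can be shown with a minimal sketch. It assumes heavily simplified `Expr<F>` and `ScalarExpr<F>` enums with only a few variants; the variant names, the `try_map` helper, the `resolve` function, and the `FieldId = u32` alias are all hypothetical and not the llkv-expr API. The same tree shape is reused while only the field identifiers change from `String` to a numeric ID.

```rust
use std::collections::HashMap;

/// Hypothetical numeric field identifier standing in for llkv's FieldId.
type FieldId = u32;

/// Simplified scalar expressions, generic over the field identifier type `F`.
#[derive(Debug, Clone, PartialEq)]
enum ScalarExpr<F> {
    Column(F),
    Literal(i64),
    Add(Box<ScalarExpr<F>>, Box<ScalarExpr<F>>),
}

/// Simplified boolean predicates (WHERE-clause style), also generic over `F`.
#[derive(Debug, Clone, PartialEq)]
enum Expr<F> {
    Gt(ScalarExpr<F>, ScalarExpr<F>),
    Eq(ScalarExpr<F>, ScalarExpr<F>),
    And(Box<Expr<F>>, Box<Expr<F>>),
}

impl<F> ScalarExpr<F> {
    /// Rebuild the expression with every field reference passed through `f`.
    fn try_map<G, E, M: Fn(F) -> Result<G, E>>(self, f: &M) -> Result<ScalarExpr<G>, E> {
        Ok(match self {
            ScalarExpr::Column(c) => ScalarExpr::Column(f(c)?),
            ScalarExpr::Literal(v) => ScalarExpr::Literal(v),
            ScalarExpr::Add(a, b) => {
                ScalarExpr::Add(Box::new((*a).try_map(f)?), Box::new((*b).try_map(f)?))
            }
        })
    }
}

impl<F> Expr<F> {
    /// Same traversal for predicates: only the field identifiers change.
    fn try_map<G, E, M: Fn(F) -> Result<G, E>>(self, f: &M) -> Result<Expr<G>, E> {
        Ok(match self {
            Expr::Gt(l, r) => Expr::Gt(l.try_map(f)?, r.try_map(f)?),
            Expr::Eq(l, r) => Expr::Eq(l.try_map(f)?, r.try_map(f)?),
            Expr::And(a, b) => Expr::And(Box::new((*a).try_map(f)?), Box::new((*b).try_map(f)?)),
        })
    }
}

/// Translate column names into numeric field IDs using a schema lookup.
fn resolve(expr: Expr<String>, schema: &HashMap<String, FieldId>) -> Result<Expr<FieldId>, String> {
    expr.try_map(&|name: String| {
        schema.get(&name).copied().ok_or_else(|| format!("unknown column: {name}"))
    })
}

fn main() {
    let schema: HashMap<String, FieldId> =
        [("id".to_string(), 0), ("age".to_string(), 1)].into_iter().collect();
    // Roughly: WHERE age > 30 AND id = 7
    let parsed = Expr::And(
        Box::new(Expr::Gt(ScalarExpr::Column("age".to_string()), ScalarExpr::Literal(30))),
        Box::new(Expr::Eq(ScalarExpr::Column("id".to_string()), ScalarExpr::Literal(7))),
    );
    println!("{:?}", resolve(parsed, &schema));
}
```

Name resolution happens once, up front; unknown columns surface as an error before any data is touched.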

Storage Architecture

LLKV stores data in a columnar format using Apache Arrow, persisted through a key-value storage backend:

Diagram: Storage Architecture Layers

graph TB
    subgraph "Logical Layer"
        RecordBatch["RecordBatch\nArrow columnar format"]
        Schema["Schema\nColumn definitions"]
    end

    subgraph "Column Store Layer"
        ColumnStore["ColumnStore"]
        ColumnDescriptor["ColumnDescriptor\nLinked list of chunks"]
        ChunkMetadata["ChunkMetadata\nmin, max, size, nulls"]
    end

    subgraph "Physical Layer"
        Pager["Pager trait\nbatch_get, batch_put"]
        MemPager["MemPager"]
        SimdRDrive["simd-r-drive\nMemory-mapped storage"]
    end

    subgraph "Persistence"
        EntryHandle["EntryHandle\nByte blob references"]
        PhysicalKeys["Physical keys (u64)"]
    end

    RecordBatch --> ColumnStore
    Schema --> ColumnStore
    ColumnStore --> ColumnDescriptor
    ColumnDescriptor --> ChunkMetadata
    ColumnDescriptor --> Pager
    Pager --> MemPager
    Pager --> SimdRDrive
    MemPager --> EntryHandle
    SimdRDrive --> EntryHandle
    EntryHandle --> PhysicalKeys

Arrow RecordBatches are decomposed into individual column chunks, each serialized and stored via the Pager trait. The simd-r-drive backend provides memory-mapped, SIMD-optimized key-value operations.

Sources: Cargo.lock:126-143 Cargo.lock:671-687 llkv-executor/src/lib.rs:20 (references to llkv-column-map)
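The pager layering can be sketched with a small in-memory backend. This assumes a hypothetical, simplified `Pager` trait that keeps only the `batch_get`/`batch_put` shape named in the diagram; the real llkv-storage trait has its own signatures and error handling. `InMemoryPager`, `store_chunks`, and `PhysicalKey` are illustrative names, and `Arc<[u8]>` stands in for a cheaply cloneable blob handle such as EntryHandle.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Physical key addressing a stored blob (u64, as in the diagram above).
type PhysicalKey = u64;

/// Hypothetical, simplified pager interface: batched reads and writes of blobs.
trait Pager {
    fn batch_put(&self, entries: Vec<(PhysicalKey, Vec<u8>)>);
    fn batch_get(&self, keys: &[PhysicalKey]) -> Vec<Option<Arc<[u8]>>>;
}

/// In-memory backend in the spirit of MemPager: blobs live in a HashMap.
#[derive(Default)]
struct InMemoryPager {
    blobs: RwLock<HashMap<PhysicalKey, Arc<[u8]>>>,
}

impl Pager for InMemoryPager {
    fn batch_put(&self, entries: Vec<(PhysicalKey, Vec<u8>)>) {
        let mut blobs = self.blobs.write().unwrap();
        for (key, bytes) in entries {
            blobs.insert(key, Arc::from(bytes));
        }
    }

    fn batch_get(&self, keys: &[PhysicalKey]) -> Vec<Option<Arc<[u8]>>> {
        let blobs = self.blobs.read().unwrap();
        // Clone the Arc handles, not the bytes; missing keys yield None.
        keys.iter().map(|k| blobs.get(k).cloned()).collect()
    }
}

/// A column-store-like layer written against any Pager implementation:
/// it assigns consecutive physical keys to serialized chunks and writes them in one batch.
fn store_chunks<P: Pager>(pager: &P, first_key: PhysicalKey, chunks: &[Vec<u8>]) -> Vec<PhysicalKey> {
    let keys: Vec<PhysicalKey> = (first_key..first_key + chunks.len() as u64).collect();
    pager.batch_put(keys.iter().copied().zip(chunks.iter().cloned()).collect());
    keys
}

fn main() {
    let pager = InMemoryPager::default();
    let keys = store_chunks(&pager, 100, &[vec![1, 2, 3], vec![4, 5]]);
    let fetched = pager.batch_get(&[keys[1], 999]);
    println!("keys = {keys:?}, fetched = {fetched:?}");
}
```

Because `store_chunks` is generic over `P: Pager`, swapping the in-memory backend for a memory-mapped one would not change the column-store code, which is the separation the diagram describes.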

Query Execution Flow

The following diagram traces a SELECT query through the execution pipeline:

Diagram: SELECT Query Execution Sequence

Query execution proceeds in two phases: (1) filter evaluation to collect matching row IDs, and (2) column gathering to assemble the final RecordBatch. Metadata-based chunk pruning optimizes filter evaluation by skipping chunks that cannot contain matching rows.

Sources: llkv-sql/src/sql_engine.rs:1596-1612 llkv-executor/src/lib.rs:519-563
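The two-phase flow with metadata-based pruning can be sketched as follows. This is a minimal illustration under stated assumptions, not LLKV's executor: `Chunk`, `filter_between`, and `gather` are hypothetical names, chunks hold plain `i64` values instead of Arrow arrays, row IDs are a `Vec<u64>` rather than a bitmap, and a `HashMap` stands in for fetching the second column's chunks.

```rust
use std::collections::HashMap;

/// Hypothetical column chunk: row IDs, values, and min/max metadata.
struct Chunk {
    row_ids: Vec<u64>,
    values: Vec<i64>,
    min: i64,
    max: i64,
}

impl Chunk {
    fn new(row_ids: Vec<u64>, values: Vec<i64>) -> Self {
        let min = *values.iter().min().expect("chunk must be non-empty");
        let max = *values.iter().max().expect("chunk must be non-empty");
        Chunk { row_ids, values, min, max }
    }
}

/// Phase 1: evaluate `lo <= value <= hi`, skipping chunks whose [min, max]
/// range cannot overlap the predicate. Returns matching row IDs and the
/// number of chunks whose values were actually examined.
fn filter_between(chunks: &[Chunk], lo: i64, hi: i64) -> (Vec<u64>, usize) {
    let mut matches = Vec::new();
    let mut scanned = 0;
    for chunk in chunks {
        if chunk.max < lo || chunk.min > hi {
            continue; // pruned using metadata alone
        }
        scanned += 1;
        for (&row_id, &value) in chunk.row_ids.iter().zip(&chunk.values) {
            if (lo..=hi).contains(&value) {
                matches.push(row_id);
            }
        }
    }
    (matches, scanned)
}

/// Phase 2: gather a second column's values for the matching row IDs.
fn gather(column: &HashMap<u64, String>, row_ids: &[u64]) -> Vec<String> {
    row_ids.iter().filter_map(|id| column.get(id).cloned()).collect()
}

fn main() {
    // Roughly: SELECT name FROM t WHERE age BETWEEN 30 AND 40
    let age_chunks = vec![
        Chunk::new(vec![0, 1, 2], vec![18, 22, 25]), // max 25 < 30: pruned
        Chunk::new(vec![3, 4, 5], vec![31, 45, 38]), // overlaps: scanned
        Chunk::new(vec![6, 7], vec![50, 61]),        // min 50 > 40: pruned
    ];
    let names: HashMap<u64, String> = (0..8).map(|i| (i, format!("user{i}"))).collect();
    let (row_ids, scanned) = filter_between(&age_chunks, 30, 40);
    println!("scanned {scanned} of {} chunks -> {:?}", age_chunks.len(), gather(&names, &row_ids));
}
```

Only one of the three chunks is decoded in phase 1, and phase 2 touches only the rows that survived the filter.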

Data Model

Tables and Schemas

Every table in LLKV has:

A unique numeric table_id
A Schema defining column names, types, and nullability
Optional constraints (primary key, foreign keys, unique, check)
Optional indexes (single-column and multi-column)

System Catalog

Table ID 0 is reserved for the SysCatalog, a special table that stores metadata about all other tables, columns, indexes, triggers, and constraints. The catalog is self-describing: it uses the same columnar storage as user tables.

Diagram: System Catalog Structure

graph TB
    SysCatalog["SysCatalog (Table 0)"]
    TableMeta["TableMeta records"]
    ColMeta["ColMeta records"]
    IndexMeta["Index metadata"]
    ConstraintMeta["Constraint records"]
    TriggerMeta["Trigger definitions"]

    SysCatalog --> TableMeta
    SysCatalog --> ColMeta
    SysCatalog --> IndexMeta
    SysCatalog --> ConstraintMeta
    SysCatalog --> TriggerMeta

    UserTable1["User Table 1"]
    UserTable2["User Table 2"]
    TableMeta -.describes.-> UserTable1
    TableMeta -.describes.-> UserTable2
    ColMeta -.describes.-> UserTable1
    ColMeta -.describes.-> UserTable2

All DDL operations (CREATE TABLE, ALTER TABLE, etc.) modify the system catalog. At startup, the catalog is read to reconstruct the complete database schema.

Sources: llkv-sql/src/sql_engine.rs (references to SysCatalog)
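The DDL-writes / startup-reads cycle can be sketched as a toy catalog. Everything here is a hypothetical simplification: `CatalogRow`, `SysCatalog::create_table`, and `reconstruct_schema` are illustrative names, the rows carry far fewer fields than the real TableMeta/ColMeta records, and a `Vec` stands in for the columnar storage the real catalog uses. Only the reserved ID 0 comes from the text above.

```rust
use std::collections::BTreeMap;

/// Table ID reserved for the system catalog, as described above.
const CATALOG_TABLE_ID: u64 = 0;

/// Hypothetical, simplified catalog rows.
#[derive(Debug, Clone, PartialEq)]
enum CatalogRow {
    TableMeta { table_id: u64, name: String },
    ColMeta { table_id: u64, column: String, data_type: String },
}

/// Toy catalog: an append-only list of metadata rows plus an ID allocator.
struct SysCatalog {
    rows: Vec<CatalogRow>,
    next_table_id: u64,
}

impl SysCatalog {
    fn new() -> Self {
        // User tables start after the reserved catalog ID.
        SysCatalog { rows: Vec::new(), next_table_id: CATALOG_TABLE_ID + 1 }
    }

    /// DDL path: CREATE TABLE records one TableMeta row and one ColMeta row per column.
    fn create_table(&mut self, name: &str, columns: &[(&str, &str)]) -> u64 {
        let table_id = self.next_table_id;
        self.next_table_id += 1;
        self.rows.push(CatalogRow::TableMeta { table_id, name: name.to_string() });
        for (column, data_type) in columns {
            self.rows.push(CatalogRow::ColMeta {
                table_id,
                column: column.to_string(),
                data_type: data_type.to_string(),
            });
        }
        table_id
    }
}

/// Startup path: rebuild "table name -> [(column, type)]" from catalog rows alone.
fn reconstruct_schema(rows: &[CatalogRow]) -> BTreeMap<String, Vec<(String, String)>> {
    let mut names = BTreeMap::new();
    let mut schema = BTreeMap::new();
    for row in rows {
        if let CatalogRow::TableMeta { table_id, name } = row {
            names.insert(*table_id, name.clone());
            schema.insert(name.clone(), Vec::new());
        }
    }
    for row in rows {
        if let CatalogRow::ColMeta { table_id, column, data_type } = row {
            if let Some(cols) = names.get(table_id).and_then(|n| schema.get_mut(n)) {
                cols.push((column.clone(), data_type.clone()));
            }
        }
    }
    schema
}

fn main() {
    let mut catalog = SysCatalog::new();
    catalog.create_table("users", &[("id", "INTEGER"), ("name", "TEXT")]);
    catalog.create_table("orders", &[("id", "INTEGER")]);
    // Simulate a restart: only the persisted catalog rows survive.
    let schema = reconstruct_schema(&catalog.rows);
    println!("{schema:?}");
}
```

Because the schema is derived entirely from catalog rows, persisting those rows is enough to recover every table definition after a restart.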

Columnar Storage

Data is stored column-wise using Apache Arrow’s in-memory format, with each column divided into chunks. Each chunk contains:

Serialized Arrow array data
Row ID bitmap (which rows are present)
Metadata (min/max values, null count, size)

This organization enables:

Efficient predicate pushdown (skip chunks via min/max)
Vectorized operations on decompressed data
Compaction (merging small chunks)

Sources: llkv-executor/src/lib.rs (references to ColumnStore and RecordBatch)
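How a column might be split into chunks with per-chunk metadata can be sketched as below. This is an illustration under assumptions, not the llkv-column-map implementation: `ChunkMeta`, `ColumnChunk`, and `chunk_column` are hypothetical names, values are `Option<i64>` instead of serialized Arrow arrays, chunks have a fixed length, and a `Vec<u64>` of row IDs stands in for the row ID bitmap.

```rust
/// Hypothetical chunk metadata mirroring the fields listed above.
#[derive(Debug, PartialEq)]
struct ChunkMeta {
    min: Option<i64>, // None if every value in the chunk is null
    max: Option<i64>,
    null_count: usize,
    row_count: usize,
}

/// One chunk: the row IDs present, the (nullable) values, and metadata.
#[derive(Debug)]
struct ColumnChunk {
    row_ids: Vec<u64>,
    values: Vec<Option<i64>>,
    meta: ChunkMeta,
}

/// Split a nullable column into fixed-size chunks and compute per-chunk metadata.
fn chunk_column(values: &[Option<i64>], first_row_id: u64, chunk_len: usize) -> Vec<ColumnChunk> {
    assert!(chunk_len > 0, "chunk_len must be positive");
    values
        .chunks(chunk_len)
        .enumerate()
        .map(|(i, slice)| {
            let start = first_row_id + (i * chunk_len) as u64;
            // Iterate only the non-null values when computing min/max.
            let non_null = slice.iter().flatten();
            let meta = ChunkMeta {
                min: non_null.clone().min().copied(),
                max: non_null.max().copied(),
                null_count: slice.iter().filter(|v| v.is_none()).count(),
                row_count: slice.len(),
            };
            ColumnChunk {
                row_ids: (start..start + slice.len() as u64).collect(),
                values: slice.to_vec(),
                meta,
            }
        })
        .collect()
}

fn main() {
    let ages = [Some(5), None, Some(9), Some(2), None];
    for chunk in chunk_column(&ages, 0, 2) {
        println!("rows {:?}: values {:?}, {:?}", chunk.row_ids, chunk.values, chunk.meta);
    }
}
```

The metadata is computed once at write time, so a later predicate such as `age > 10` can rule out every chunk here by reading `meta.max` alone.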

Transaction Support

LLKV implements Multi-Version Concurrency Control (MVCC) using hidden columns:

Column	Type	Purpose
`__created_by`	`u64`	Transaction ID that created this row version
`__deleted_by`	`u64`	Transaction ID that deleted this row version (or `u64::MAX` if active)

The TransactionManager in llkv-transaction coordinates transaction boundaries and assigns transaction IDs. Queries automatically filter rows based on the current transaction’s visibility rules.

Sources: Cargo.toml:24 (llkv-transaction crate)
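A visibility check over the two hidden columns can be sketched as follows. This is a minimal snapshot-isolation-style rule under stated assumptions: the `Snapshot` type, its `committed` set, and the `sees`/`is_visible` methods are hypothetical, and LLKV's actual visibility rules in llkv-transaction may differ in detail. Only the `__created_by`/`__deleted_by` columns and the `u64::MAX` sentinel come from the table above.

```rust
use std::collections::HashSet;

/// Sentinel stored in `__deleted_by` for rows that have not been deleted.
const NOT_DELETED: u64 = u64::MAX;

/// The two hidden MVCC columns for one row version.
#[derive(Debug, Clone, Copy)]
struct RowVersion {
    created_by: u64, // __created_by
    deleted_by: u64, // __deleted_by
}

/// Hypothetical snapshot: the reader's own transaction ID plus the set of
/// transaction IDs that had committed when the snapshot was taken.
struct Snapshot {
    txn_id: u64,
    committed: HashSet<u64>,
}

impl Snapshot {
    /// Effects of `writer` are visible to its own transaction or if it had committed.
    fn sees(&self, writer: u64) -> bool {
        writer == self.txn_id || self.committed.contains(&writer)
    }

    /// Visible if the creating transaction is visible and no visible
    /// transaction has deleted the row.
    fn is_visible(&self, row: RowVersion) -> bool {
        self.sees(row.created_by) && (row.deleted_by == NOT_DELETED || !self.sees(row.deleted_by))
    }
}

fn main() {
    // Transactions 1 and 2 committed before transaction 4 started; 3 is still open.
    let snapshot = Snapshot { txn_id: 4, committed: HashSet::from([1, 2]) };
    let rows = [
        RowVersion { created_by: 1, deleted_by: NOT_DELETED }, // live committed row
        RowVersion { created_by: 1, deleted_by: 2 },           // deleted by a committed txn
        RowVersion { created_by: 3, deleted_by: NOT_DELETED }, // uncommitted insert
        RowVersion { created_by: 2, deleted_by: 3 },           // uncommitted delete
        RowVersion { created_by: 4, deleted_by: NOT_DELETED }, // own insert
    ];
    let visible: Vec<bool> = rows.iter().map(|r| snapshot.is_visible(*r)).collect();
    println!("{visible:?}"); // [true, false, false, true, true]
}
```

Since deletes only stamp `__deleted_by` instead of removing data, older snapshots keep seeing a row until the deleting transaction becomes visible to them.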

External Dependencies

LLKV relies on several external crates for core functionality:

Dependency	Version	Purpose
`arrow`	57.1.0	Columnar data format, compute kernels
`sqlparser`	0.59.0	SQL parsing (supports multiple dialects)
`simd-r-drive`	0.15.5-alpha	Memory-mapped key-value storage with SIMD optimization
`rayon`	1.10.0	Parallel processing (used in joins, aggregations)
`croaring`	2.5.1	Bitmap indexes for row ID sets

Sources: Cargo.toml:40-49 Cargo.lock:126-143 Cargo.lock:671-687

Usage Example

Sources: llkv-sql/src/sql_engine.rs:443-485

Summary

LLKV is a layered SQL database system that marries Apache Arrow’s columnar format with key-value storage. The architecture separates concerns across 15 crates, enabling modular development and testing. Queries flow from SQL text through parsing, planning, and execution stages before accessing columnar data persisted in a memory-mapped store. The system supports transactions, indexes, constraints, and SQL features including joins, aggregations, and subqueries.

For deeper exploration of specific subsystems, consult the following sections:

Architecture - Detailed crate organization and dependencies
SQL Interface - SQL preprocessing and dialect handling
Query Execution - Execution strategies and optimizations
Storage Layer - Column store implementation details
Catalog and Metadata Management - Schema management and type system

Sources: Cargo.toml:1-109 llkv-sql/src/sql_engine.rs:1-100 llkv-executor/src/lib.rs:1-100
