This documentation is part of the "Projects with Books" initiative at zenOSmosis.
The source code for this project is available on GitHub.
Architecture
Relevant source files
- .github/workflows/build.docs.yml
- Cargo.lock
- Cargo.toml
- demos/llkv-sql-pong-demo/assets/llkv-sql-pong-screenshot.png
- dev-docs/doc-preview.md
- llkv-executor/Cargo.toml
- llkv-join/Cargo.toml
- llkv-plan/Cargo.toml
- llkv-sql/Cargo.toml
- llkv-table/Cargo.toml
- llkv-test-utils/Cargo.toml
This page describes the overall architectural design of LLKV, including the layered system structure, key design decisions, and how major components interact. For detailed information about individual crates and their responsibilities, see Workspace and Crates. For the end-to-end query execution flow, see SQL Query Processing Pipeline. For details on Arrow integration and data representation, see Data Formats and Arrow Integration.
Architectural Overview
LLKV is a columnar SQL database that stores Apache Arrow data structures directly in a key-value persistence layer. The architecture consists of six distinct layers, each implemented as one or more Rust crates. The system translates SQL statements into query plans, executes those plans against columnar table storage, and persists data using memory-mapped key-value stores.
The core architectural innovation is the llkv-column-map layer, which bridges Apache Arrow’s in-memory columnar format with the simd-r-drive key-value storage backend. This design enables zero-copy operations on columnar data while maintaining ACID properties through the underlying storage engine.
Sources: Cargo.toml:1-109 high-level overview diagrams
System Layers
The following diagram shows the six architectural layers with their implementing crates and key data structures:
graph TB
subgraph "Layer 1: User Interface"
SQL["llkv-sql\nSqlEngine"]
DEMO["llkv-sql-pong-demo"]
TPCH["llkv-tpch"]
CSV["llkv-csv"]
end
subgraph "Layer 2: Query Processing"
PARSER["sqlparser-rs\nParser, Statement"]
PLANNER["llkv-plan\nSelectPlan, InsertPlan"]
EXPR["llkv-expr\nExpr, ScalarExpr"]
end
subgraph "Layer 3: Execution"
EXECUTOR["llkv-executor\nTableExecutor"]
RUNTIME["llkv-runtime\nDatabaseRuntime"]
AGGREGATE["llkv-aggregate\nAccumulator"]
JOIN["llkv-join\nhash_join"]
COMPUTE["llkv-compute\nNumericKernels"]
end
subgraph "Layer 4: Data Management"
TABLE["llkv-table\nTable, SysCatalog"]
TRANSACTION["llkv-transaction\nTransaction"]
SCAN["llkv-scan\nScanOp"]
end
subgraph "Layer 5: Storage - Arrow Native"
COLMAP["llkv-column-map\nColumnStore, ColumnDescriptor"]
STORAGE["llkv-storage\nPager trait"]
ARROW["arrow-array\nRecordBatch, ArrayRef"]
end
subgraph "Layer 6: Persistence - Key-Value"
PAGER["Pager implementations\nbatch_get, batch_put"]
SIMD["simd-r-drive\nRDrive, EntryHandle"]
end
SQL --> PARSER
PARSER --> PLANNER
PLANNER --> EXPR
EXPR --> EXECUTOR
EXECUTOR --> AGGREGATE
EXECUTOR --> JOIN
EXECUTOR --> COMPUTE
EXECUTOR --> RUNTIME
RUNTIME --> TABLE
RUNTIME --> TRANSACTION
TABLE --> SCAN
SCAN --> COLMAP
TABLE --> COLMAP
COLMAP --> ARROW
COLMAP --> STORAGE
STORAGE --> PAGER
PAGER --> SIMD
Each layer has well-defined responsibilities. Layer 1 provides user-facing interfaces. Layer 2 translates SQL into executable plans. Layer 3 executes those plans using specialized operators. Layer 4 manages logical tables and transactions. Layer 5 implements columnar storage using Arrow data structures. Layer 6 provides persistent key-value storage with memory-mapping.
Sources: Cargo.toml:2-26 llkv-sql/Cargo.toml:1-45 llkv-executor/Cargo.toml:1-48 llkv-table/Cargo.toml:1-72
Key Architectural Decisions
Arrow-Native Columnar Storage
The system uses Apache Arrow as its native in-memory data format. All data is represented as RecordBatch instances containing ArrayRef columns. This design decision enables:
- Zero-copy interoperability with Arrow-based analytics tools
- Vectorized computation using Arrow kernels
- Efficient memory layouts for SIMD operations
- Type-safe column operations through Arrow’s schema system
The arrow dependency (version 57.1.0) provides the foundation for all data operations.
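As a hedged illustration of this data model, the sketch below builds a small RecordBatch using the arrow-array and arrow-schema crates. The column names and the helper function are invented for this example, and LLKV's actual batches also carry system columns (row IDs, MVCC metadata) that are not shown here.

```rust
use std::sync::Arc;

use arrow_array::{ArrayRef, Int64Array, RecordBatch, StringArray};
use arrow_schema::{DataType, Field, Schema};

/// Build a small two-column RecordBatch of the kind that flows between layers.
/// Illustrative only; LLKV attaches additional system columns not shown here.
fn example_batch() -> RecordBatch {
    let schema = Arc::new(Schema::new(vec![
        Field::new("id", DataType::Int64, false),
        Field::new("name", DataType::Utf8, true),
    ]));
    let ids: ArrayRef = Arc::new(Int64Array::from(vec![1_i64, 2, 3]));
    let names: ArrayRef = Arc::new(StringArray::from(vec![Some("a"), None, Some("c")]));
    RecordBatch::try_new(schema, vec![ids, names]).expect("columns match schema")
}
```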
Key-Value Persistence Backend
Rather than implementing a custom storage engine, LLKV persists data through the Pager trait abstraction defined in llkv-storage. The primary implementation uses simd-r-drive (version 0.15.5-alpha), a memory-mapped key-value store with SIMD-optimized operations.
This design separates logical data management from physical storage concerns. The Pager trait defines operations:
- alloc() - allocate new storage keys
- batch_get() - retrieve multiple values
- batch_put() - atomically write multiple key-value pairs
- free() - release storage keys
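The sketch below shows one plausible shape for such an abstraction. The key type, error type, and exact method signatures are assumptions made for illustration; the authoritative Pager trait is defined in llkv-storage.

```rust
/// Hypothetical physical key type; the real trait may use a different type.
pub type PhysicalKey = u64;

/// A hedged sketch of a pager abstraction over a key-value store.
pub trait Pager {
    /// Blob handle returned by reads (e.g. a zero-copy, memory-mapped region).
    type Blob: AsRef<[u8]>;
    type Error;

    /// Allocate `count` fresh physical keys for new chunks.
    fn alloc(&self, count: usize) -> Result<Vec<PhysicalKey>, Self::Error>;

    /// Fetch the blobs stored under the given keys, if present.
    fn batch_get(&self, keys: &[PhysicalKey]) -> Result<Vec<Option<Self::Blob>>, Self::Error>;

    /// Atomically persist a set of key/value pairs.
    fn batch_put(&self, puts: &[(PhysicalKey, Vec<u8>)]) -> Result<(), Self::Error>;

    /// Release keys so their storage can be reclaimed.
    fn free(&self, keys: &[PhysicalKey]) -> Result<(), Self::Error>;
}
```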
Modular Crate Organization
The workspace contains 15+ specialized crates, each focused on a specific concern. This modularity enables:
- Independent testing and benchmarking per crate
- Clear dependency boundaries
- Parallel development across subsystems
- Selective feature compilation
Core crates follow a naming convention: llkv-{subsystem}. Foundational crates like llkv-types and llkv-result provide shared types and error handling used throughout the system.
MVCC Transaction Isolation
The system implements Multi-Version Concurrency Control through llkv-transaction. Each table row includes system columns created_by and deleted_by that track transaction visibility. The Transaction struct manages transaction state and visibility rules.
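A minimal sketch of how created_by / deleted_by metadata can drive visibility, assuming monotonically increasing transaction IDs and a snapshot that only sees transactions at or before it; the actual rules in llkv-transaction may be more involved.

```rust
type TxnId = u64;

/// Per-row MVCC metadata, mirroring the system columns described above.
struct RowVersion {
    created_by: TxnId,
    deleted_by: Option<TxnId>,
}

/// A row is visible if it was created at or before the snapshot and has not
/// been deleted, or was deleted only by a later transaction.
fn is_visible(row: &RowVersion, snapshot: TxnId) -> bool {
    row.created_by <= snapshot && row.deleted_by.map_or(true, |d| d > snapshot)
}
```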
Sources: Cargo.toml:37-96 llkv-storage/Cargo.toml:1-48 llkv-table/Cargo.toml:20-41
Component Organization
The workspace crates map onto the architectural roles shown above, with dependencies flowing upward through the layers: lower-level crates like llkv-types and llkv-storage have no dependencies on higher layers, while the llkv-sql crate sits at the top, orchestrating all subsystems.
Sources: Cargo.toml:2-26 Cargo.toml:55-74
Data Flow Architecture
Data flows through the system in two primary patterns: write operations (INSERT, UPDATE, DELETE) and read operations (SELECT).
Write Path
Write operations follow a path from SQL parsing through plan creation, runtime execution, table operations, column store append, chunking/deduplication, serialization, and finally persistence via the Pager trait.
flowchart LR
SQL["SQL Statement"] --> PARSE["sqlparser\nparse()"]
PARSE --> PLAN["llkv-plan\nInsertPlan"]
PLAN --> RUNTIME["DatabaseRuntime\nexecute_insert_plan()"]
RUNTIME --> TABLE["Table\nappend()"]
TABLE --> COLMAP["ColumnStore\nappend()"]
COLMAP --> CHUNK["Chunking\nLWW deduplication"]
CHUNK --> SERIAL["Serialize\nArrow arrays"]
SERIAL --> PAGER["Pager\nbatch_put()"]
PAGER --> KV["simd-r-drive\nEntryHandle"]
The ColumnStore::append() method implements Last-Write-Wins semantics for upserts by detecting duplicate row IDs and replacing older values.
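A hedged sketch of that Last-Write-Wins behavior, reduced to plain Rust collections; the real append path operates on Arrow arrays and chunk metadata, and also deduplicates against chunks already on disk, rather than using a HashMap.

```rust
use std::collections::HashMap;

/// Keep only the last value written for each row id (illustrative only).
fn lww_dedup(incoming: Vec<(u64, String)>) -> Vec<(u64, String)> {
    let mut latest: HashMap<u64, String> = HashMap::new();
    for (row_id, value) in incoming {
        latest.insert(row_id, value); // a later write for the same row id wins
    }
    let mut rows: Vec<_> = latest.into_iter().collect();
    rows.sort_by_key(|(row_id, _)| *row_id); // keep row ids in sorted order
    rows
}
```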
Read Path
Read operations use a two-phase approach: first collecting matching row IDs via predicate evaluation, then gathering column data for those rows. This minimizes data movement by filtering before gathering.
flowchart LR
SQL["SQL Statement"] --> PARSE["sqlparser\nparse()"]
PARSE --> PLAN["llkv-plan\nSelectPlan"]
PLAN --> EXECUTOR["TableExecutor\nexecute_select()"]
EXECUTOR --> FILTER["Phase 1:\nfilter_row_ids()"]
FILTER --> GATHER["Phase 2:\ngather_rows()"]
GATHER --> COLMAP["ColumnStore\ngather()"]
COLMAP --> PAGER["Pager\nbatch_get()"]
PAGER --> DESER["Deserialize\nArrow arrays"]
DESER --> BATCH["RecordBatch\nassembly"]
BATCH --> RESULT["Query Results"]
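The sketch below mirrors that two-phase shape on a simplified in-memory column. The struct, the i64 value type, and the function names are assumptions taken from the diagram labels rather than the actual llkv-executor or llkv-scan API.

```rust
/// Simplified stand-in for a decoded column: sorted row ids plus values.
struct Column {
    row_ids: Vec<u64>,
    values: Vec<i64>,
}

/// Phase 1: evaluate a predicate and collect the matching row ids.
fn filter_row_ids(col: &Column, pred: impl Fn(i64) -> bool) -> Vec<u64> {
    col.row_ids
        .iter()
        .zip(&col.values)
        .filter(|(_, v)| pred(**v))
        .map(|(rid, _)| *rid)
        .collect()
}

/// Phase 2: gather values only for the row ids that survived filtering.
fn gather_rows(col: &Column, wanted: &[u64]) -> Vec<i64> {
    wanted
        .iter()
        .filter_map(|rid| col.row_ids.binary_search(rid).ok().map(|i| col.values[i]))
        .collect()
}
```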
Sources: llkv-sql/Cargo.toml:20-38 llkv-executor/Cargo.toml:20-42 llkv-table/Cargo.toml:20-41
Storage Architecture: Arrow to Key-Value Bridge
The llkv-column-map crate implements the critical bridge between Arrow’s columnar format and key-value storage:
graph TB
subgraph "Logical Layer"
FIELD["LogicalFieldId\n(table_id, field_name)"]
ROWID["RowId\n(u64)"]
end
subgraph "Column Organization"
CATALOG["ColumnCatalog\nfield_id → ColumnDescriptor"]
DESC["ColumnDescriptor\nlinked list of chunks"]
META["ChunkMetadata\n(min, max, size, null_count)"]
end
subgraph "Physical Storage"
CHUNK["Data Chunks\nserialized Arrow arrays"]
RIDARRAY["RowId Arrays\nsorted u64 arrays"]
PKEY["Physical Keys\n(chunk_pk, rid_pk)"]
end
subgraph "Key-Value Layer"
ENTRY["EntryHandle\nbyte blobs"]
MMAP["Memory-Mapped Files"]
end
FIELD --> CATALOG
CATALOG --> DESC
DESC --> META
META --> CHUNK
CHUNK --> PKEY
DESC --> RIDARRAY
RIDARRAY --> PKEY
PKEY --> ENTRY
ENTRY --> MMAP
ROWID -.used for.-> RIDARRAY
The ColumnCatalog maps logical field identifiers to physical storage. Each column is represented by a ColumnDescriptor that maintains a linked list of data chunks. Each chunk contains:
- A serialized Arrow array (the actual column data)
- A corresponding sorted array of row IDs
- Metadata including min/max values for predicate pushdown
- Physical keys (chunk_pk, rid_pk) pointing to storage
The ChunkMetadata enables chunk pruning during scans: chunks whose min/max ranges don’t overlap with query predicates can be skipped entirely.
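A hedged sketch of that pruning check, assuming i64 statistics and a simple range predicate; the actual ChunkMetadata fields and predicate representation in llkv-column-map differ.

```rust
/// Illustrative per-chunk statistics; real metadata is type-aware and richer.
struct ChunkStats {
    min: i64,
    max: i64,
}

/// A chunk can only contain matches for `lo <= value <= hi` if its
/// [min, max] range overlaps that interval; otherwise the scan skips it.
fn chunk_may_match(stats: &ChunkStats, lo: i64, hi: i64) -> bool {
    stats.max >= lo && stats.min <= hi
}
```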
Data chunks are stored as serialized byte blobs accessed through EntryHandle instances from simd-r-drive. The storage layer uses memory-mapped files for efficient I/O.
Sources: llkv-column-map/Cargo.toml:1-65 llkv-storage/Cargo.toml:1-48 Cargo.toml:85-86
System Catalog
The system catalog (table ID 0) stores metadata about all tables, columns, indexes, and constraints. It is itself stored in the same ColumnStore as user data, creating a self-describing bootstrapped system.
The SysCatalog struct provides typed access to catalog tables. The CatalogManager coordinates table lifecycle operations (CREATE, ALTER, DROP) by manipulating catalog entries.
Table ID ranges partition the namespace:
- ID 0: System catalog
- IDs 1-999: User tables
- IDs 1000+: Information schema views
- IDs 10000+: Temporary tables
This design allows the catalog to leverage the same storage, transaction, and query infrastructure as user data.
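As a hedged illustration of that partitioning, the mapping below classifies a table ID into one of the ranges listed above; the enum and the exact range boundaries are inferred from the list, not taken from the catalog code.

```rust
/// Namespaces implied by the table ID ranges above (illustrative only).
enum TableNamespace {
    SystemCatalog,
    User,
    InformationSchema,
    Temporary,
}

fn classify(table_id: u64) -> TableNamespace {
    match table_id {
        0 => TableNamespace::SystemCatalog,
        1..=999 => TableNamespace::User,
        1000..=9999 => TableNamespace::InformationSchema,
        _ => TableNamespace::Temporary,
    }
}
```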
Sources: llkv-table/Cargo.toml:20-41