Data Formats and Arrow Integration
Relevant source files
- Cargo.lock
- Cargo.toml
- llkv-aggregate/README.md
- llkv-column-map/README.md
- llkv-column-map/src/store/projection.rs
- llkv-csv/README.md
- llkv-expr/Cargo.toml
- llkv-expr/README.md
- llkv-expr/src/lib.rs
- llkv-expr/src/literal.rs
- llkv-join/README.md
- llkv-plan/Cargo.toml
- llkv-plan/src/lib.rs
- llkv-runtime/README.md
- llkv-storage/README.md
- llkv-storage/src/serialization.rs
- llkv-table/Cargo.toml
- llkv-table/README.md
Purpose and Scope
This page documents how Apache Arrow columnar data structures serve as the universal data interchange format throughout LLKV. It covers the supported Arrow data types, the custom zero-copy serialization format used for persistence, and how RecordBatch flows between layers.
For information about how expressions evaluate over Arrow data, see Expression System. For details on storage pager abstractions, see Pager Interface and SIMD Optimization.
Arrow as the Universal Data Format
LLKV is Arrow-native: every layer of the system produces, consumes, and operates on arrow::record_batch::RecordBatch structures. This design choice enables:
- Zero-copy data access across layer boundaries
- SIMD-friendly vectorized operations via contiguous columnar buffers
- Unified type system from SQL parsing through storage
- Efficient interoperability with external Arrow-compatible tools
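For orientation, the following minimal sketch builds the kind of RecordBatch that flows between these layers, using only the standard arrow crate (none of this is LLKV-specific API):

```rust
use std::sync::Arc;

use arrow::array::{ArrayRef, Int64Array, StringArray};
use arrow::datatypes::{DataType, Field, Schema};
use arrow::record_batch::RecordBatch;

fn main() -> Result<(), arrow::error::ArrowError> {
    // One Field per column; the same Arrow types flow from SQL to storage.
    let schema = Arc::new(Schema::new(vec![
        Field::new("id", DataType::Int64, false),
        Field::new("name", DataType::Utf8, false),
    ]));

    // One contiguous columnar array per field.
    let columns: Vec<ArrayRef> = vec![
        Arc::new(Int64Array::from(vec![1, 2, 3])),
        Arc::new(StringArray::from(vec!["a", "b", "c"])),
    ];

    let batch = RecordBatch::try_new(schema, columns)?;
    assert_eq!(batch.num_rows(), 3);
    Ok(())
}
```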
The following table maps system layers to their Arrow usage:
| Layer | Arrow Usage |
|---|---|
| llkv-sql | Parses SQL and returns RecordBatch query results to callers |
| llkv-plan | Constructs plans that reference Arrow DataType and Field structures |
| llkv-executor | Streams RecordBatch results during SELECT evaluation |
| llkv-table | Validates incoming batches against table schemas and persists columns |
| llkv-column-map | Chunks RecordBatch columns into serialized Arrow arrays for pager storage |
| llkv-storage | Serializes/deserializes Arrow arrays with a custom zero-copy format |
Sources: Cargo.toml:32 llkv-table/README.md:10-11 llkv-column-map/README.md:10 llkv-storage/README.md:10
RecordBatch Flow Diagram
Key Data Flow Observations:
- INSERT Path: RecordBatch → schema validation → MVCC column injection → column chunking → serialization → pager write
- SELECT Path: Pager read → deserialization → column gather → RecordBatch construction → streaming to executor
- Zero-Copy: EntryHandle wraps memory-mapped regions that Arrow arrays reference directly without copying
Sources: llkv-column-map/src/store/projection.rs:240-446 llkv-storage/src/serialization.rs:226-254 llkv-table/README.md:19-25
Supported Arrow Data Types
LLKV supports the following Arrow primitive and complex types:
Primitive Types
| Arrow DataType | Storage Size | SQL Type Mapping |
|---|---|---|
| UInt8 | 1 byte | TINYINT UNSIGNED |
| UInt16 | 2 bytes | SMALLINT UNSIGNED |
| UInt32 | 4 bytes | INT UNSIGNED |
| UInt64 | 8 bytes | BIGINT UNSIGNED |
| Int8 | 1 byte | TINYINT |
| Int16 | 2 bytes | SMALLINT |
| Int32 | 4 bytes | INT |
| Int64 | 8 bytes | BIGINT |
| Float32 | 4 bytes | REAL, FLOAT |
| Float64 | 8 bytes | DOUBLE PRECISION |
| Boolean | 1 bit (packed) | BOOLEAN |
| Date32 | 4 bytes | DATE |
| Date64 | 8 bytes | TIMESTAMP |
| Decimal128(p, s) | 16 bytes | DECIMAL(p, s) |
Variable-Length Types
| Arrow DataType | Storage Layout | SQL Type Mapping |
|---|---|---|
| Utf8 | i32 offsets + UTF-8 bytes | VARCHAR, TEXT |
| LargeUtf8 | i64 offsets + UTF-8 bytes | TEXT (large) |
| Binary | i32 offsets + raw bytes | VARBINARY, BLOB |
| LargeBinary | i64 offsets + raw bytes | BLOB (large) |
Complex Types
| Arrow DataType | Description | Use Cases |
|---|---|---|
| Struct(fields) | Nested record with named fields | Composite values, JSON-like data |
| FixedSizeList(T, n) | Fixed-length array of type T | Vector embeddings, coordinate tuples |
Null Handling:
The current serialization format does not yet support null bitmaps; attempting to serialize an array with null_count() > 0 returns an error. Null support is planned for future releases.
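A minimal sketch of that guard, with an illustrative error type rather than the actual llkv-storage error:

```rust
use arrow::array::Array;

/// Illustrative guard only; the real check (and error type) live in
/// llkv-storage's serializer.
fn check_no_nulls(array: &dyn Array) -> Result<(), String> {
    if array.null_count() > 0 {
        return Err("serialization does not yet support null bitmaps".to_string());
    }
    Ok(())
}
```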
Sources: llkv-storage/src/serialization.rs:144-165 llkv-storage/src/serialization.rs:199-224 llkv-expr/src/literal.rs:78-94
Custom Serialization Format
Why Not Arrow IPC?
LLKV uses a custom minimal serialization format instead of Arrow's standard IPC (Inter-Process Communication) format for several reasons:
Trade-offs:
```mermaid
graph LR
subgraph "Arrow IPC Format"
IPC_SCHEMA["Schema object\nframing metadata"]
IPC_PADDING["Padding alignment\n8/64 byte boundaries"]
IPC_BUFFERS["Buffer pointers\n+ offsets"]
IPC_SIZE["Larger file size\n~20-40% overhead"]
end
subgraph "LLKV Custom Format"
CUSTOM_HEADER["24-byte header\nfixed size"]
CUSTOM_PAYLOAD["Raw buffer bytes\ncontiguous"]
CUSTOM_ZERO["Zero-copy rebuild\ndirect mmap"]
CUSTOM_SIZE["Minimal size\nno framing"]
end
IPC_SCHEMA --> IPC_SIZE
IPC_PADDING --> IPC_SIZE
IPC_BUFFERS --> IPC_SIZE
CUSTOM_HEADER --> CUSTOM_SIZE
CUSTOM_PAYLOAD --> CUSTOM_ZERO
```
| Aspect | Arrow IPC | LLKV Custom Format |
|---|---|---|
| File size | Larger (metadata + padding) | Minimal (24-byte header + payload) |
| Deserialization | Allocates and copies buffers | Zero-copy via EntryHandle |
| Flexibility | Supports all Arrow features | Limited to non-null arrays |
| Scan performance | Moderate (copy overhead) | Fast (direct SIMD access) |
| Null support | Full bitmap support | Not yet implemented |
Design Goals:
- Minimal headers: 24-byte fixed header, no schema objects per array
- Predictable payloads: contiguous buffers for mmap-friendly access
- True zero-copy: reconstruct ArrayData referencing the original buffer directly
- Stable on-disk codes: type tags are compile-time pinned to prevent corruption
Sources: llkv-storage/src/serialization.rs:1-28 llkv-storage/src/serialization.rs:41-135
Serialization Format Details
Header Structure
Every serialized array begins with a 24-byte header:
```
Offset  Size  Field
------  ----  -----
0-3     4     Magic bytes: b"ARR0"
4       1     Layout code (Primitive=0, FslFloat32=1, Varlen=2, Struct=3)
5       1     Type code (PrimType enum value)
6       1     Precision (for Decimal128) or padding
7       1     Scale (for Decimal128) or padding
8-15    8     Array length (u64, element count)
16-19   4     extra_a (layout-specific u32)
20-23   4     extra_b (layout-specific u32)
24+     var   Payload (layout-specific buffer bytes)
```
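As an illustration, the header can be modeled as a #[repr(C)] struct; the field names below are assumptions, only the offsets and sizes come from the table above:

```rust
/// Illustrative model of the 24-byte header; field names are assumptions,
/// only the offsets/sizes come from the documented layout.
#[repr(C)]
struct ArrayHeader {
    magic: [u8; 4], // b"ARR0"
    layout: u8,     // Primitive=0, FslFloat32=1, Varlen=2, Struct=3
    type_code: u8,  // PrimType enum value
    precision: u8,  // Decimal128 precision, else padding
    scale: u8,      // Decimal128 scale, else padding
    len: u64,       // element count
    extra_a: u32,   // layout-specific
    extra_b: u32,   // layout-specific
}

// With repr(C) the fields pack to exactly the documented 24 bytes.
const _: () = assert!(std::mem::size_of::<ArrayHeader>() == 24);
```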
Layout Variants
Primitive Layout
For fixed-width types (Int32, Float64, etc.):
- extra_a: Length of values buffer in bytes
- extra_b: Unused (0)
- Payload: Raw values buffer
Varlen Layout
For variable-length types (Utf8, Binary, etc.):
- extra_a: Length of offsets buffer in bytes
- extra_b: Length of values buffer in bytes
- Payload: Offsets buffer followed by values buffer
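For instance, the two header fields for a Utf8 array could be derived as in this sketch (the helper name is hypothetical; the real serializer works differently in detail):

```rust
use arrow::array::StringArray;

/// Illustrative only: compute the extra_a / extra_b header fields for a
/// Utf8 array per the varlen layout above.
fn varlen_header_fields(array: &StringArray) -> (u32, u32) {
    // i32 offsets buffer length in bytes (extra_a); note there are
    // len + 1 offsets for len strings.
    let offsets_len = std::mem::size_of_val(array.value_offsets()) as u32;
    // UTF-8 values buffer length in bytes (extra_b).
    let values_len = array.value_data().len() as u32;
    (offsets_len, values_len)
}
```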
FixedSizeList Layout
Special optimization for vector embeddings:
- extra_a: List size (elements per list)
- extra_b: Total child buffer length in bytes
- Payload: Contiguous child Float32 buffer
This enables direct SIMD access to embedding vectors without indirection.
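As a sketch of what that enables: once deserialized into a FixedSizeListArray, the child values form one contiguous f32 slice, so a scan like the dot product below can auto-vectorize without per-row indirection (the function name and setup are illustrative, not LLKV code):

```rust
use arrow::array::{Array, FixedSizeListArray, Float32Array};

/// Dot product of a query vector against every embedding in the array.
/// The child values() buffer is one contiguous f32 slice, so this inner
/// loop runs over flat memory with no per-row pointer chasing.
fn dot_products(lists: &FixedSizeListArray, query: &[f32]) -> Vec<f32> {
    let dim = lists.value_length() as usize;
    assert_eq!(dim, query.len());
    let child = lists
        .values()
        .as_any()
        .downcast_ref::<Float32Array>()
        .expect("Float32 child");
    child
        .values()
        .chunks_exact(dim)
        .map(|row| row.iter().zip(query).map(|(a, b)| a * b).sum())
        .collect()
}
```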
Struct Layout
For nested composite types:
- extra_a: Unused (0)
- extra_b: IPC payload length in bytes
- Payload: Arrow IPC-serialized struct array
Struct types fall back to Arrow IPC format because their complex nested structure doesn't benefit from the custom layout.
Sources: llkv-storage/src/serialization.rs:44-135 llkv-storage/src/serialization.rs:256-378
```mermaid
graph TB
PAGER["Pager::batch_get"]
HANDLE["EntryHandle\nmemory-mapped region"]
BUFFER["Arrow Buffer\nslice of EntryHandle"]
ARRAYDATA["ArrayData\nreferences Buffer"]
ARRAY["Concrete Array\nInt32Array, etc."]
PAGER --> HANDLE
HANDLE --> BUFFER
BUFFER --> ARRAYDATA
ARRAYDATA --> ARRAY
style HANDLE fill:#f9f9f9
style BUFFER fill:#f9f9f9
```
Zero-Copy Deserialization
EntryHandle Integration
The EntryHandle type from simd-r-drive-entry-handle provides a zero-copy wrapper around memory-mapped buffers:
Key Operations:
- Pager read: Returns GetResult::Raw { key, bytes: EntryHandle }
- Buffer slice: EntryHandle::as_arrow_buffer() creates an Arrow Buffer view
- ArrayData build: ArrayData::builder().add_buffer(buffer).build()
- Array cast: make_array(data) produces typed arrays
The entire chain avoids copying data — Arrow arrays directly reference the memory-mapped region.
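In arrow-rs terms, the buffer-to-array portion of that chain looks roughly like this sketch, with a plain Buffer standing in for the EntryHandle-backed one:

```rust
use arrow::array::{make_array, ArrayData, ArrayRef};
use arrow::buffer::Buffer;
use arrow::datatypes::DataType;

/// Rebuild an Int32 array over an existing buffer (in LLKV, one backed by
/// an mmap'd EntryHandle region). No element bytes are copied: ArrayData
/// holds a reference to the buffer, and make_array wraps it as a typed array.
fn rebuild_int32(values: Buffer, len: usize) -> ArrayRef {
    let data = ArrayData::builder(DataType::Int32)
        .len(len)
        .add_buffer(values)
        .build()
        .expect("buffer must hold len * 4 bytes");
    make_array(data)
}
```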
Alignment Requirements:
Decimal128 requires 16-byte alignment. If the EntryHandle buffer is not properly aligned, the deserializer falls back to copying the data into an aligned buffer.
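A sketch of that fallback, using a hypothetical helper name (the real logic lives in llkv-storage/src/serialization.rs):

```rust
use arrow::buffer::Buffer;
use std::mem::align_of;

/// Hypothetical helper: keep the zero-copy buffer when the mmap'd region
/// happens to be aligned for i128 access; otherwise pay a one-time copy
/// into an Arrow-allocated buffer (which is 64-byte aligned).
fn aligned_for_decimal128(zero_copy: Buffer) -> Buffer {
    if zero_copy.as_ptr() as usize % align_of::<i128>() == 0 {
        zero_copy // fast path: read i128 values straight from the mmap
    } else {
        Buffer::from_slice_ref(zero_copy.as_slice()) // aligned copy fallback
    }
}
```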
Sources: llkv-storage/src/serialization.rs:429-559 llkv-column-map/src/store/projection.rs:619-629
```mermaid
graph TB
ROWIDS["row_ids: &[u64]\nrequested rows"]
FIELDIDS["field_ids: &[LogicalFieldId]\nrequested columns"]
PLANS["FieldPlan\nper-column metadata"]
CHUNKS["Chunk selection\ncandidate_indices"]
BATCH_GET["Pager::batch_get\nfetch chunks"]
CACHE["Chunk cache\nArrayRef map"]
GATHER["gather_rows_from_chunks\nper-type specialization"]
ARRAYS["Vec<ArrayRef>\none per field"]
SCHEMA["Arrow Schema\nField metadata"]
RECORDBATCH["RecordBatch::try_new"]
ROWIDS --> PLANS
FIELDIDS --> PLANS
PLANS --> CHUNKS
CHUNKS --> BATCH_GET
BATCH_GET --> CACHE
CACHE --> GATHER
GATHER --> ARRAYS
ARRAYS --> RECORDBATCH
SCHEMA --> RECORDBATCH
```
RecordBatch Construction and Projection
Gather Operations
The ColumnStore::gather_rows family of methods reconstructs RecordBatch from chunked columns:
Projection Flow:
- Prepare context: Load column descriptors, determine chunk candidates
- Batch fetch: Request all needed chunks from the pager in one call
- Type-specific gather: Dispatch to specialized routines based on DataType
- Null policy: Apply GatherNullPolicy (ErrorOnMissing, IncludeNulls, DropNulls)
- Schema construction: Build Schema with correct field names and nullability
- RecordBatch assembly: RecordBatch::try_new(schema, arrays)
Type Dispatch Table:
| DataType | Gather Function |
|---|---|
| Utf8 | gather_rows_from_chunks_string::<i32> |
| LargeUtf8 | gather_rows_from_chunks_string::<i64> |
| Binary | gather_rows_from_chunks_binary::<i32> |
| LargeBinary | gather_rows_from_chunks_binary::<i64> |
| Boolean | gather_rows_from_chunks_bool |
| Struct(_) | gather_rows_from_chunks_struct |
| Decimal128(_, _) | gather_rows_from_chunks_decimal128 |
| Primitives | gather_rows_from_chunks::<ArrowTy> (generic) |
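The dispatch itself is a plain match over DataType that routes to monomorphized routines; the sketch below only mirrors the table's shape, with the real arguments (chunk caches, row ids, null policy) and return values elided:

```rust
use arrow::datatypes::DataType;

// Shape of the dispatch only: the real routines in projection.rs take
// chunk caches, row ids, and a GatherNullPolicy, and return ArrayRefs.
fn gather_path_name(dtype: &DataType) -> &'static str {
    match dtype {
        DataType::Utf8 => "gather_rows_from_chunks_string::<i32>",
        DataType::LargeUtf8 => "gather_rows_from_chunks_string::<i64>",
        DataType::Binary => "gather_rows_from_chunks_binary::<i32>",
        DataType::LargeBinary => "gather_rows_from_chunks_binary::<i64>",
        DataType::Boolean => "gather_rows_from_chunks_bool",
        DataType::Struct(_) => "gather_rows_from_chunks_struct",
        DataType::Decimal128(_, _) => "gather_rows_from_chunks_decimal128",
        _ => "gather_rows_from_chunks::<ArrowTy> (generic primitive path)",
    }
}
```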
Sources: llkv-column-map/src/store/projection.rs:245-446 llkv-column-map/src/store/projection.rs:636-726
```mermaid
graph TB
BATCH["RecordBatch\nuser data"]
TABLE_SCHEMA["Stored Schema\nfrom catalog"]
VALIDATE["Schema validation"]
FIELD_CHECK["Field count\nname\ntype match"]
MVCC["Inject MVCC columns\nrow_id, created_by,\ndeleted_by"]
EXTENDED["Extended RecordBatch"]
COLMAP["ColumnStore::append"]
BATCH --> VALIDATE
TABLE_SCHEMA --> VALIDATE
VALIDATE --> FIELD_CHECK
FIELD_CHECK -->|Pass| MVCC
FIELD_CHECK -->|Fail| ERROR["Error::SchemaMismatch"]
MVCC --> EXTENDED
EXTENDED --> COLMAP
```
Schema Validation
Table-Level Schema Enforcement
The Table layer validates incoming RecordBatch schemas against the stored table schema before appending:
Validation Rules:
- Field count: Batch must have exactly the same number of columns as the table schema
- Field names: Column names must match (case-sensitive)
- Field types: DataType must match exactly (no implicit coercion)
- Nullability: Currently not strictly enforced (planned improvement)
MVCC Column Injection:
After validation, the table appends three system columns:
- row_id (UInt64): Unique row identifier
- created_by (UInt64): Transaction ID that created the row
- deleted_by (UInt64): Transaction ID that deleted the row (0 if active)
These columns are stored in separate logical namespaces but physically alongside user data.
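A hedged sketch of the injection step, using standard arrow APIs; the column names come from the list above, while the function shape and transaction-id plumbing are assumptions:

```rust
use std::sync::Arc;

use arrow::array::{ArrayRef, UInt64Array};
use arrow::datatypes::{DataType, Field, Schema};
use arrow::record_batch::RecordBatch;

/// Illustrative only: append row_id / created_by / deleted_by columns to a
/// validated user batch. The real table layer stores them in separate
/// logical namespaces.
fn inject_mvcc(
    batch: &RecordBatch,
    first_row_id: u64,
    txn_id: u64,
) -> Result<RecordBatch, arrow::error::ArrowError> {
    let n = batch.num_rows();
    let row_id: ArrayRef = Arc::new(UInt64Array::from_iter_values(
        first_row_id..first_row_id + n as u64,
    ));
    let created_by: ArrayRef = Arc::new(UInt64Array::from(vec![txn_id; n]));
    let deleted_by: ArrayRef = Arc::new(UInt64Array::from(vec![0u64; n])); // 0 = active

    let mut fields: Vec<Field> = batch
        .schema()
        .fields()
        .iter()
        .map(|f| f.as_ref().clone())
        .collect();
    fields.push(Field::new("row_id", DataType::UInt64, false));
    fields.push(Field::new("created_by", DataType::UInt64, false));
    fields.push(Field::new("deleted_by", DataType::UInt64, false));

    let mut columns = batch.columns().to_vec();
    columns.extend([row_id, created_by, deleted_by]);
    RecordBatch::try_new(Arc::new(Schema::new(fields)), columns)
}
```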
Sources: llkv-table/README.md:14-17 llkv-column-map/README.md:26-28
```mermaid
graph LR
SQL_LITERAL["SQL Literal\n123, 'text', etc."]
LITERAL["Literal enum\nInteger, String, etc."]
SCHEMA["Table Schema\nArrow DataType"]
NATIVE["Native Value\ni32, String, etc."]
ARRAY["Arrow Array\nInt32Array, etc."]
SQL_LITERAL --> LITERAL
LITERAL --> SCHEMA
SCHEMA --> NATIVE
NATIVE --> ARRAY
```
Type Mapping from SQL to Arrow
Literal Conversion
The llkv-expr crate defines a Literal enum that captures untyped SQL values before schema resolution:
Supported Literal Types:
| Literal Variant | Arrow Target Types |
|---|---|
| Integer(i128) | Any integer or float type (with range checks) |
| Float(f64) | Float32, Float64 |
| Decimal(DecimalValue) | Decimal128(p, s) |
| String(String) | Utf8, LargeUtf8 |
| Boolean(bool) | Boolean |
| Date32(i32) | Date32 |
| Struct(fields) | Struct(...) |
| Interval(IntervalValue) | Not directly stored; used for date arithmetic |
Conversion Mechanism:
The FromLiteral trait provides type-aware conversion; each implementation performs range checking and type validation before producing a native value.
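A sketch of the pattern under those assumptions (the actual trait and impls live in llkv-expr/src/literal.rs and will differ in detail):

```rust
/// Hypothetical mirror of the conversion pattern: an untyped SQL literal
/// is resolved into a native value once the target Arrow type is known.
#[derive(Debug)]
enum Literal {
    Integer(i128),
    Float(f64),
    String(String),
    Boolean(bool),
}

trait FromLiteral: Sized {
    fn from_literal(lit: &Literal) -> Result<Self, String>;
}

impl FromLiteral for i32 {
    fn from_literal(lit: &Literal) -> Result<Self, String> {
        match lit {
            // Range check: an i128 SQL literal must fit the i32 column.
            Literal::Integer(v) => i32::try_from(*v)
                .map_err(|_| format!("integer literal {v} out of range for INT")),
            other => Err(format!("cannot convert {other:?} to INT")),
        }
    }
}
```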
Sources: llkv-expr/src/literal.rs:78-94 llkv-expr/src/literal.rs:156-219 llkv-expr/src/literal.rs:395-419
Performance Characteristics
Zero-Copy Benefits
The combination of Arrow's columnar layout and the custom serialization format delivers measurable performance benefits:
| Operation | Traditional DB | LLKV Arrow-Native |
|---|---|---|
| Column scan | Row-by-row decode | Vectorized SIMD over mmap |
| Type dispatch | Virtual function calls | Monomorphized at compile time |
| Buffer management | Multiple allocations | Single mmap region |
| Predicate evaluation | Interpreted per row | Compiled bytecode over vectors |
Chunking Strategy
The ColumnStore organizes data into chunks sized for cache locality and pager efficiency:
- Target chunk size: Configurable, typically 64KB-256KB per column
- Row alignment: All columns in a table share the same row boundaries per chunk
- Append optimization: Incoming batches are chunked and sorted by row_id before persistence
This design minimizes pager I/O and maximizes CPU cache hit rates during scans.
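RecordBatch::slice makes row-aligned chunking cheap, since slices are zero-copy views over the parent's buffers; the sketch below assumes a fixed row-count target, whereas the real ColumnStore targets a byte budget per column:

```rust
use arrow::record_batch::RecordBatch;

/// Illustrative chunker: split a batch into row-aligned slices. Every
/// column in each chunk covers the same row range, and slices share the
/// parent's buffers rather than copying them.
fn chunk_rows(batch: &RecordBatch, rows_per_chunk: usize) -> Vec<RecordBatch> {
    assert!(rows_per_chunk > 0);
    (0..batch.num_rows())
        .step_by(rows_per_chunk)
        .map(|start| {
            let len = rows_per_chunk.min(batch.num_rows() - start);
            batch.slice(start, len)
        })
        .collect()
}
```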
Sources: llkv-column-map/README.md:24-28 llkv-storage/README.md:15-17
Integration with External Tools
Arrow Compatibility
Because LLKV uses standard Arrow data structures at its boundaries, it can integrate with the broader Arrow ecosystem:
- Export: Query results can be serialized to Arrow IPC files for external processing (see the sketch after this list)
- Import: Arrow IPC files can be read and ingested via Table::append
- Parquet: Future work could add direct Parquet read/write using Arrow's parquet crate
- DataFusion: LLKV's table scan APIs could potentially integrate as a DataFusion TableProvider
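For example, the export path can use the arrow crate's stock IPC writer (standard arrow-rs API, not an LLKV wrapper):

```rust
use std::fs::File;

use arrow::ipc::writer::FileWriter;
use arrow::record_batch::RecordBatch;

/// Write result batches to an Arrow IPC file that any Arrow-compatible
/// tool (pyarrow, DuckDB, ...) can read back.
fn export_ipc(path: &str, batches: &[RecordBatch]) -> Result<(), Box<dyn std::error::Error>> {
    let schema = batches.first().ok_or("no batches to export")?.schema();
    let mut writer = FileWriter::try_new(File::create(path)?, schema.as_ref())?;
    for batch in batches {
        writer.write(batch)?;
    }
    writer.finish()?; // footer is required for a valid IPC file
    Ok(())
}
```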
Current Limitations
- Null support: The serialization format doesn't yet handle null bitmaps
- Nested types: Only Struct and FixedSizeList<Float32> are fully supported
- Dictionary encoding: Not yet implemented (planned)
- Compression: No built-in compression (relies on storage-layer features)
Sources: llkv-storage/src/serialization.rs:257-260 llkv-column-map/README.md:10-11
Summary
LLKV's Arrow-native architecture provides:
- Universal interchange format via RecordBatch across all layers
- Zero-copy operations through EntryHandle and memory-mapped buffers
- Custom serialization optimized for mmap and SIMD access patterns
- Type safety from SQL literals through to persisted columns
- SIMD-friendly layout for efficient vectorized query evaluation
The trade-off of using a custom format instead of Arrow IPC is reduced flexibility (no nulls yet, fewer complex types) in exchange for smaller files, faster scans, and true zero-copy deserialization.
For details on how Arrow arrays are evaluated during query execution, see Scalar Evaluation and NumericKernels. For information on how MVCC metadata is stored alongside Arrow columns, see Column Storage and ColumnStore.