Data Formats and Arrow Integration

This page explains how Apache Arrow serves as the foundational columnar data format throughout LLKV, covering the types supported, serialization strategies, and integration points across the system. Arrow provides the in-memory representation for all data operations, from SQL query results to storage layer batches.

For information about how Arrow data flows through query execution, see Query Execution. For details on how the storage layer persists Arrow arrays, see Column Storage and ColumnStore.


RecordBatch as the Universal Data Container

The arrow::record_batch::RecordBatch type is the primary data exchange unit across all LLKV layers. A RecordBatch represents a columnar slice of data with a fixed schema, where each column is an ArrayRef (type-erased Arrow array).

Sources: llkv-table/src/table.rs:6-9 llkv-table/examples/test_streaming.rs:1-9

graph TB
    RecordBatch["RecordBatch"]
    Schema["Schema\n(arrow::datatypes::Schema)"]
    Columns["Vec<ArrayRef>\n(arrow::array::ArrayRef)"]
    RecordBatch --> Schema
    RecordBatch --> Columns
    Schema --> Fields["Vec<Field>\nColumn Names + Types"]
    Columns --> Array1["Column 0: ArrayRef"]
    Columns --> Array2["Column 1: ArrayRef"]
    Columns --> Array3["Column N: ArrayRef"]
    Array1 --> UInt64Array["UInt64Array\n(row_id column)"]
    Array2 --> Int64Array["Int64Array\n(user data)"]
    Array3 --> StringArray["StringArray\n(text data)"]

RecordBatch Construction Pattern

All data inserted into tables must arrive as a RecordBatch with specific metadata:

| Required Element | Description | Metadata Key |
|---|---|---|
| row_id column | UInt64Array with unique row identifiers | None (system column) |
| Field ID metadata | Maps each user column to a FieldId | "field_id" |
| Arrow data type | Native Arrow type (Int64, Utf8, etc.) | N/A |
| Column name | Human-readable identifier | Field name |

Sources: llkv-table/src/table.rs:231-438 llkv-table/examples/test_streaming.rs:18-58
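
The snippet below is a minimal sketch of this construction pattern using the arrow crate directly. The column name, field id value, and row contents are illustrative placeholders, not values taken from LLKV's sources; only the "field_id" metadata key and the mandatory row_id column follow the convention described above.

```rust
use std::{collections::HashMap, sync::Arc};

use arrow::array::{ArrayRef, Int64Array, UInt64Array};
use arrow::datatypes::{DataType, Field, Schema};
use arrow::record_batch::RecordBatch;

/// Sketch: a batch with the mandatory `row_id` column plus one user
/// column whose Field carries the "field_id" metadata entry.
fn example_batch() -> Result<RecordBatch, arrow::error::ArrowError> {
    let row_id = Field::new("row_id", DataType::UInt64, false);

    // Hypothetical user column "score" mapped to field id 42.
    let mut meta = HashMap::new();
    meta.insert("field_id".to_string(), "42".to_string());
    let user_col = Field::new("score", DataType::Int64, true).with_metadata(meta);

    let schema = Arc::new(Schema::new(vec![row_id, user_col]));
    RecordBatch::try_new(
        schema,
        vec![
            Arc::new(UInt64Array::from(vec![1u64, 2, 3])) as ArrayRef,
            Arc::new(Int64Array::from(vec![10i64, 20, 30])) as ArrayRef,
        ],
    )
}
```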


Schema and Field Metadata

The arrow::datatypes::Schema describes the structure of a RecordBatch. LLKV extends Arrow’s schema system with custom metadata to track column identity and table ownership.

Field Metadata Convention

Each user column’s Field carries a field_id metadata entry that encodes either:

  • User columns: A raw FieldId (0-2047) that gets namespaced to a LogicalFieldId
  • MVCC columns: A pre-computed LogicalFieldId for created_by / deleted_by

Sources: llkv-table/src/table.rs:243-320 llkv-table/constants.rs (FIELD_ID_META_KEY)

graph TB
    Field["arrow::datatypes::Field"]
    FieldName["name: String"]
    DataType["data_type: DataType"]
    Nullable["nullable: bool"]
    Metadata["metadata: HashMap<String, String>"]
    Field --> FieldName
    Field --> DataType
    Field --> Nullable
    Field --> Metadata
    Metadata --> FieldIdKey["'field_id' → '42'\n(user column)"]
    Metadata --> LFidKey["'field_id' → '8589934634'\n(MVCC column)"]
    FieldIdKey -.converted to.-> UserLFid["LogicalFieldId::for_user(table_id, 42)"]
    LFidKey -.direct use.-> SystemLFid["LogicalFieldId::for_mvcc_created_by(table_id)"]
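
A short sketch of how the metadata entry can be read back from a Field using only Arrow APIs. The "field_id" key follows the convention above; the namespacing into a LogicalFieldId is performed inside LLKV and is only indicated by a comment here.

```rust
use arrow::datatypes::Field;

/// Sketch: extract the raw field id stored under the "field_id"
/// metadata key. Namespacing the value into a LogicalFieldId
/// (e.g. LogicalFieldId::for_user(table_id, field_id)) happens in
/// LLKV and is not reproduced here.
fn raw_field_id(field: &Field) -> Option<u64> {
    field
        .metadata()
        .get("field_id")
        .and_then(|v| v.parse::<u64>().ok())
}
```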

Schema Reconstruction from Catalog

The Table::schema() method reconstructs an Arrow Schema by querying the system catalog for column names and combining them with type information from the column store:

Sources: llkv-table/src/table.rs:519-549

sequenceDiagram
    participant Caller
    participant Table
    participant ColumnStore
    participant SysCatalog
    
    Caller->>Table: schema()
    Table->>ColumnStore: user_field_ids_for_table(table_id)
    ColumnStore-->>Table: Vec<LogicalFieldId>
    Table->>SysCatalog: get_cols_meta(field_ids)
    SysCatalog-->>Table: Vec<ColMeta>
    Table->>ColumnStore: data_type(lfid) for each
    ColumnStore-->>Table: DataType
    Table->>Table: Build Field with metadata
    Table-->>Caller: Arc<Schema>
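
A rough sketch of the final assembly step, assuming the catalog has already yielded column names and the column store has yielded Arrow data types. The (field_id, name, data_type) triple is a placeholder shape, not an LLKV type.

```rust
use std::{collections::HashMap, sync::Arc};

use arrow::datatypes::{DataType, Field, Schema};

/// Sketch: combine catalog-provided names with store-provided types
/// into an Arrow Schema, re-attaching the "field_id" metadata.
fn rebuild_schema(columns: &[(u64, String, DataType)]) -> Arc<Schema> {
    let fields: Vec<Field> = columns
        .iter()
        .map(|(fid, name, dt)| {
            let mut meta = HashMap::new();
            meta.insert("field_id".to_string(), fid.to_string());
            Field::new(name.as_str(), dt.clone(), true).with_metadata(meta)
        })
        .collect();
    Arc::new(Schema::new(fields))
}
```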

Supported Arrow Data Types

LLKV’s column store supports the following Arrow data types for user columns:

| Arrow Type | Rust Type Mapping | Storage Encoding | Notes |
|---|---|---|---|
| UInt64 | u64 | Native | Row IDs, MVCC transaction IDs |
| UInt32 | u32 | Native | - |
| UInt16 | u16 | Native | - |
| UInt8 | u8 | Native | - |
| Int64 | i64 | Native | - |
| Int32 | i32 | Native | - |
| Int16 | i16 | Native | - |
| Int8 | i8 | Native | - |
| Float64 | f64 | IEEE 754 | - |
| Float32 | f32 | IEEE 754 | - |
| Date32 | i32 (days since epoch) | Native | - |
| Date64 | i64 (ms since epoch) | Native | - |
| Decimal128 | i128 | Native + precision/scale metadata | Fixed-point decimal |
| Utf8 | String (i32 offsets) | Length-prefixed UTF-8 | Variable-length strings |
| LargeUtf8 | String (i64 offsets) | Length-prefixed UTF-8 | Large strings |
| Binary | Vec<u8> (i32 offsets) | Length-prefixed bytes | Variable-length binary |
| LargeBinary | Vec<u8> (i64 offsets) | Length-prefixed bytes | Large binary |
| Boolean | bool | Bit-packed | - |
| Struct | Nested fields | Passthrough (no specialized gather) | Limited support |

Sources: llkv-column-map/src/store/projection.rs:135-185 llkv-column-map/src/gather.rs:10-18

Type Dispatch Strategy

The column store uses Rust enums to dispatch operations per data type, ensuring type-safe handling without runtime reflection:

Sources: llkv-column-map/src/store/projection.rs:99-236 llkv-column-map/src/store/projection.rs:237-395

graph TB
    ColumnOutputBuilder["ColumnOutputBuilder\n(enum per type)"]
ColumnOutputBuilder --> Utf8["Utf8 { builder, len_capacity, value_capacity }"]
ColumnOutputBuilder --> LargeUtf8["LargeUtf8 { builder, len_capacity, value_capacity }"]
ColumnOutputBuilder --> Binary["Binary { builder, len_capacity, value_capacity }"]
ColumnOutputBuilder --> LargeBinary["LargeBinary { builder, len_capacity, value_capacity }"]
ColumnOutputBuilder --> Boolean["Boolean { builder, len_capacity }"]
ColumnOutputBuilder --> Decimal128["Decimal128 { builder, precision, scale }"]
ColumnOutputBuilder --> Primitive["Primitive(PrimitiveBuilderKind)"]
ColumnOutputBuilder --> Passthrough["Passthrough (Struct arrays)"]
Primitive --> UInt64Builder["UInt64 { builder, len_capacity }"]
Primitive --> Int64Builder["Int64 { builder, len_capacity }"]
Primitive --> Float64Builder["Float64 { builder, len_capacity }"]
Primitive --> OtherPrimitives["... (10 more primitive types)"]
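
The enum below is a heavily reduced illustration of this dispatch style, not LLKV's actual ColumnOutputBuilder: two variants instead of 18+, but the same idea of matching once per batch on a concrete builder rather than using dynamic reflection.

```rust
use std::sync::Arc;

use arrow::array::{ArrayRef, Int64Builder, StringBuilder};

/// Sketch: a tiny per-type builder enum in the spirit of
/// ColumnOutputBuilder.
enum OutputBuilder {
    Int64(Int64Builder),
    Utf8(StringBuilder),
}

impl OutputBuilder {
    fn append_null(&mut self) {
        match self {
            OutputBuilder::Int64(b) => b.append_null(),
            OutputBuilder::Utf8(b) => b.append_null(),
        }
    }

    fn finish(&mut self) -> ArrayRef {
        match self {
            OutputBuilder::Int64(b) => Arc::new(b.finish()) as ArrayRef,
            OutputBuilder::Utf8(b) => Arc::new(b.finish()) as ArrayRef,
        }
    }
}
```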

Array Builders and Construction

Arrow provides builder types (arrow::array::*Builder) for incrementally constructing arrays. LLKV’s projection layer wraps these builders with capacity management to minimize allocations during multi-batch operations.

Capacity Pre-Allocation Strategy

The ColumnOutputBuilder tracks allocated capacity separately from Arrow’s builder to avoid repeated re-allocations:

Sources: llkv-column-map/src/store/projection.rs:189-234 llkv-column-map/src/store/projection.rs:398-457

graph LR
    RequestedLen["Requested Len"]
    ValueBytesHint["Value Bytes Hint"]
    RequestedLen --> Check["len_capacity < len?"]
    ValueBytesHint --> Check2["value_capacity < bytes?"]
    Check -->|Yes| Reallocate["Recreate Builder\nwith_capacity(len, bytes)"]
    Check2 -->|Yes| Reallocate
    Check -->|No| Reuse["Reuse Existing Builder"]
    Check2 -->|No| Reuse
    Reallocate --> UpdateCapacity["Update cached capacities"]
    UpdateCapacity --> Append["append_value() or append_null()"]
    Reuse --> Append

String and Binary Builder Pattern

Variable-length types require two capacity dimensions:

  • Length capacity: Number of values
  • Value capacity: Total bytes across all values

Sources: llkv-column-map/src/store/projection.rs:398-411 llkv-column-map/src/store/projection.rs:413-426
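
A sketch of the two-dimensional capacity check for a string column. The struct and method names are illustrative; only the use of StringBuilder::with_capacity(item_capacity, data_capacity) comes from Arrow's builder API.

```rust
use arrow::array::StringBuilder;

/// Sketch: recreate the builder only when either the value count or
/// the total byte budget outgrows what was previously reserved.
struct Utf8Output {
    builder: StringBuilder,
    len_capacity: usize,
    value_capacity: usize,
}

impl Utf8Output {
    fn reserve(&mut self, len: usize, value_bytes: usize) {
        if self.len_capacity < len || self.value_capacity < value_bytes {
            // with_capacity(item_capacity, data_capacity) pre-sizes both
            // the offsets buffer and the value buffer.
            self.builder = StringBuilder::with_capacity(len, value_bytes);
            self.len_capacity = len;
            self.value_capacity = value_bytes;
        }
    }
}
```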


graph TB
    subgraph "In-Memory Layer"
        RecordBatch1["RecordBatch\n(user input)"]
        ArrayRef1["ArrayRef\n(Int64Array, StringArray, etc.)"]
    end

    subgraph "Serialization Layer"
        Serialize["serialize_array()\n(Arrow IPC encoding)"]
        Deserialize["deserialize_array()\n(Arrow IPC decoding)"]
    end

    subgraph "Storage Layer"
        ChunkBlob["EntryHandle\n(byte blob)"]
        Pager["Pager::batch_put()"]
        PhysicalKey["PhysicalKey\n(u64 identifier)"]
    end

    RecordBatch1 --> ArrayRef1
    ArrayRef1 --> Serialize
    Serialize --> ChunkBlob
    ChunkBlob --> Pager
    Pager --> PhysicalKey
    PhysicalKey -.read.-> Pager
    Pager -.read.-> ChunkBlob
    ChunkBlob --> Deserialize
    Deserialize --> ArrayRef1

Serialization and Storage Integration

Arrow arrays are serialized to byte blobs using Arrow’s IPC format before persistence in the key-value store. The column store maintains a separation between logical Arrow representation and physical storage:

Sources: llkv-column-map/src/serialization.rs (deserialize_array function), llkv-column-map/src/store/projection.rs:17
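
A sketch of round-tripping a single array through Arrow's IPC stream format in memory. LLKV's serialize_array / deserialize_array helpers live in llkv-column-map/src/serialization.rs; the code below only illustrates the underlying Arrow APIs, and the single-column wrapper schema is an assumption for the example.

```rust
use std::sync::Arc;

use arrow::array::ArrayRef;
use arrow::datatypes::{Field, Schema};
use arrow::error::ArrowError;
use arrow::ipc::reader::StreamReader;
use arrow::ipc::writer::StreamWriter;
use arrow::record_batch::RecordBatch;

/// Sketch: wrap an array in a one-column batch, encode it with the
/// IPC stream format, then decode it back into an ArrayRef.
fn ipc_round_trip(array: ArrayRef) -> Result<ArrayRef, ArrowError> {
    let field = Field::new("col", array.data_type().clone(), true);
    let schema = Arc::new(Schema::new(vec![field]));
    let batch = RecordBatch::try_new(schema.clone(), vec![array])?;

    // Encode into an in-memory byte buffer.
    let mut buf = Vec::new();
    {
        let mut writer = StreamWriter::try_new(&mut buf, &schema)?;
        writer.write(&batch)?;
        writer.finish()?;
    }

    // Decode the buffer back into a RecordBatch and take the column.
    let mut reader = StreamReader::try_new(buf.as_slice(), None)?;
    let decoded = reader.next().expect("one batch")?;
    Ok(decoded.column(0).clone())
}
```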

Chunk Organization

Large columns are split into multiple chunks (default 65,536 rows per chunk). Each chunk stores:

  • Value array: Serialized Arrow array with actual data
  • Row ID array: Separate UInt64Array mapping row IDs to array indices

Sources: llkv-column-map/src/store/descriptor.rs (ChunkMetadata, ColumnDescriptor), llkv-column-map/src/store/mod.rs (DEFAULT_CHUNK_TARGET_ROWS)
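
A sketch of the chunking split using RecordBatch::slice. The 65,536-row target mirrors DEFAULT_CHUNK_TARGET_ROWS; the pairing of each chunk with its row-id array is handled inside the column store and is not shown here.

```rust
use arrow::record_batch::RecordBatch;

/// Sketch: split a batch into chunks of at most `target_rows` rows.
/// slice() is zero-copy; it only adjusts offsets into shared buffers.
fn chunk_batch(batch: &RecordBatch, target_rows: usize) -> Vec<RecordBatch> {
    let mut chunks = Vec::new();
    let mut offset = 0;
    while offset < batch.num_rows() {
        let len = target_rows.min(batch.num_rows() - offset);
        chunks.push(batch.slice(offset, len));
        offset += len;
    }
    chunks
}
```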


graph TB
    GatherNullPolicy["GatherNullPolicy"]
GatherNullPolicy --> ErrorOnMissing["ErrorOnMissing\nMissing row → Error"]
GatherNullPolicy --> IncludeNulls["IncludeNulls\nMissing row → NULL in output"]
GatherNullPolicy --> DropNulls["DropNulls\nMissing row → omit from output"]
ErrorOnMissing -.used by.-> InnerJoin["Inner Joins\nStrict row matches"]
IncludeNulls -.used by.-> OuterJoin["Outer Joins\nNULL padding"]
DropNulls -.used by.-> Projection["Filtered Projections\nRemove incomplete rows"]

Null Handling

Arrow’s nullable columns use a separate validity bitmap. LLKV’s GatherNullPolicy enum controls how nulls propagate during row gathering:

Sources: llkv-column-map/src/store/projection.rs:40-48 llkv-table/src/table.rs:589-645

Null Bitmap Semantics

Arrow arrays maintain a separate NullBuffer (bit-packed). When gathering rows, the system must:

  1. Check if the source array position is valid (array.is_valid(idx))
  2. Append null to the builder if invalid, or the value if valid

Sources: llkv-column-map/src/gather.rs:369-402
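
A sketch of the per-row validity check for a single Int64 column. The function and its inputs are illustrative; copying nulls through the builder corresponds roughly to the IncludeNulls policy described above.

```rust
use arrow::array::{Array, Int64Array, Int64Builder};

/// Sketch: gather `indices` out of `source`, preserving nulls instead
/// of dropping them or raising an error.
fn gather_int64(source: &Int64Array, indices: &[usize]) -> Int64Array {
    let mut builder = Int64Builder::with_capacity(indices.len());
    for &idx in indices {
        if source.is_valid(idx) {
            builder.append_value(source.value(idx));
        } else {
            builder.append_null();
        }
    }
    builder.finish()
}
```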


graph LR
    subgraph "CSV Export"
        ScanStream["Table::scan_stream()"]
        RecordBatch2["RecordBatch stream"]
        ArrowCSV["arrow::csv::Writer"]
        CSVFile["CSV File"]
        ScanStream --> RecordBatch2
        RecordBatch2 --> ArrowCSV
        ArrowCSV --> CSVFile
    end

    subgraph "CSV Import"
        CSVInput["CSV File"]
        Parser["csv-core parser"]
        Builders["Array Builders"]
        RecordBatch3["RecordBatch"]
        TableAppend["Table::append()"]
        CSVInput --> Parser
        Parser --> Builders
        Builders --> RecordBatch3
        RecordBatch3 --> TableAppend
    end

Arrow Integration Points

CSV Import and Export

The llkv-csv crate uses Arrow’s native CSV writer (arrow::csv::WriterBuilder) for exports and custom parsing for imports:

Sources: llkv-csv/src/writer.rs:196-267 llkv-csv/src/reader.rs
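
A sketch of the export side using Arrow's CSV writer (requires the arrow crate's csv feature). The file path and batch source are placeholders, and the builder is left at its defaults rather than configured the way llkv-csv configures it.

```rust
use std::fs::File;

use arrow::csv::WriterBuilder;
use arrow::record_batch::RecordBatch;

/// Sketch: stream RecordBatches into a CSV file.
fn write_csv(path: &str, batches: &[RecordBatch]) -> Result<(), Box<dyn std::error::Error>> {
    let file = File::create(path)?;
    let mut writer = WriterBuilder::new().build(file);
    for batch in batches {
        writer.write(batch)?;
    }
    Ok(())
}
```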

SQL Query Results

All SQL query results are returned as RecordBatch instances, enabling zero-copy integration with Arrow-based tools:

| Statement Type | Output Format | Arrow Schema Source |
|---|---|---|
| SELECT | Stream of RecordBatch | Query projection list |
| INSERT | RecordBatch (row count) | Fixed schema: { affected_rows: UInt64 } |
| UPDATE | RecordBatch (row count) | Fixed schema: { affected_rows: UInt64 } |
| DELETE | RecordBatch (row count) | Fixed schema: { affected_rows: UInt64 } |
| CREATE TABLE | RecordBatch (success) | Empty schema or metadata |

Sources: llkv-runtime/src/lib.rs (RuntimeStatementResult enum), llkv-sql/src/lib.rs (SqlEngine::execute)

graph TB
    AggregateExpr["Aggregate Expression\n(SUM, AVG, MIN, MAX)"]
AggregateExpr --> ArrowKernel["arrow::compute kernels\n(sum, min, max)"]
AggregateExpr --> CustomAccumulator["Custom Accumulator\n(COUNT DISTINCT, etc.)"]
ArrowKernel --> PrimitiveArray["PrimitiveArray&lt;T&gt;"]
CustomAccumulator --> AnyArray["ArrayRef (any type)"]
PrimitiveArray -.vectorized ops.-> SIMDPath["SIMD-accelerated compute"]
AnyArray -.type-erased dispatch.-> SlowPath["Generic accumulation"]

Arrow-Native Aggregations

Aggregate functions operate directly on Arrow arrays using arrow::compute kernels where possible:

Sources: llkv-aggregate/src/lib.rs (aggregate accumulators), llkv-compute/src/kernels.rs (NumericKernels)
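
A sketch of the kernel path for a primitive column; arrow::compute::sum, min, and max are the vectorized kernels referenced above, and each returns None when every value is null.

```rust
use arrow::array::Int64Array;
use arrow::compute;

/// Sketch: aggregate a primitive column with Arrow compute kernels.
fn aggregate(values: &Int64Array) -> (Option<i64>, Option<i64>, Option<i64>) {
    let sum = compute::sum(values);
    let min = compute::min(values);
    let max = compute::max(values);
    (sum, min, max)
}
```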


Data Type Conversion Table

The following table maps SQL types to Arrow types as materialized in the system:

| SQL Type | Arrow Type | Notes |
|---|---|---|
| INTEGER, INT | Int64 | 64-bit signed |
| BIGINT | Int64 | Same as INTEGER |
| SMALLINT | Int32 | Promoted to Int32 internally |
| TINYINT | Int32 | Promoted to Int32 internally |
| FLOAT, REAL | Float32 | Single precision |
| DOUBLE | Float64 | Double precision |
| TEXT, VARCHAR | Utf8 | Variable-length, i32 offsets |
| BLOB | Binary | Variable-length bytes |
| BOOLEAN, BOOL | Boolean | Bit-packed |
| DATE | Date32 | Days since Unix epoch |
| TIMESTAMP | Date64 | Milliseconds since Unix epoch |
| DECIMAL(p, s) | Decimal128(p, s) | Fixed-point, stores precision/scale |

Sources: llkv-plan/src/types.rs (SQL type mapping), llkv-types/src/arrow.rs (DataType conversion)
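
A sketch of this mapping as a plain match on upper-cased type names, following the table above. LLKV's real mapping lives in llkv-plan/src/types.rs and may differ in detail; decimal precision and scale are omitted here for brevity.

```rust
use arrow::datatypes::DataType;

/// Sketch: map an upper-cased SQL type name to its Arrow DataType.
fn sql_to_arrow(sql_type: &str) -> Option<DataType> {
    match sql_type {
        "INTEGER" | "INT" | "BIGINT" => Some(DataType::Int64),
        "SMALLINT" | "TINYINT" => Some(DataType::Int32),
        "FLOAT" | "REAL" => Some(DataType::Float32),
        "DOUBLE" => Some(DataType::Float64),
        "TEXT" | "VARCHAR" => Some(DataType::Utf8),
        "BLOB" => Some(DataType::Binary),
        "BOOLEAN" | "BOOL" => Some(DataType::Boolean),
        "DATE" => Some(DataType::Date32),
        "TIMESTAMP" => Some(DataType::Date64),
        _ => None,
    }
}
```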


RecordBatch Flow Summary

Sources: llkv-table/src/table.rs:231-438 llkv-column-map/src/store/mod.rs (append logic)


Performance Considerations

Vectorization Benefits

Arrow’s columnar layout enables SIMD operations in compute kernels:

  • Primitive aggregations: arrow::compute::sum, min, max use SIMD
  • Filtering: Bit-packed boolean arrays enable fast predicate evaluation
  • Null bitmap checks: Batch validity checks avoid per-row branches
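
For example, a predicate evaluated into a bit-packed BooleanArray can be applied with the filter kernel; the arrays below are illustrative, and in practice the predicate would come from a comparison kernel rather than being built by hand.

```rust
use arrow::array::{BooleanArray, Int64Array};
use arrow::compute::filter;
use arrow::error::ArrowError;

/// Sketch: keep only the rows whose predicate bit is set.
fn apply_predicate(values: &Int64Array, keep: &BooleanArray) -> Result<Int64Array, ArrowError> {
    let filtered = filter(values, keep)?;
    Ok(filtered
        .as_any()
        .downcast_ref::<Int64Array>()
        .expect("filter preserves the input type")
        .clone())
}
```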

Memory Efficiency

Columnar storage provides superior compression and cache locality:

  • Compression ratio: Homogeneous columns compress better than row-based formats
  • Cache efficiency: Scanning a single column loads contiguous memory
  • Null optimization: Sparse nulls use minimal space (1 bit per value)

graph TB
    GatherContextPool["GatherContextPool"]
    GatherContextPool --> AcquireContext["acquire(field_ids)"]
    AcquireContext --> CheckCache["Check HashMap\nfor cached context"]
    CheckCache -->|Hit| ReuseContext["Reuse MultiGatherContext"]
    CheckCache -->|Miss| CreateNew["Create new context"]
    ReuseContext --> Execute["Execute gather operation"]
    CreateNew --> Execute
    Execute --> ReleaseContext["Release context back to pool"]
    ReleaseContext --> StoreInPool["Store in HashMap\n(max 4 per field set)"]

Builder Reuse Strategy

The MultiGatherContext pools ColumnOutputBuilder instances to avoid repeated allocations during streaming scans:

Sources: llkv-column-map/src/store/projection.rs:651-691
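
A heavily simplified sketch of a pool keyed by the projected field ids. The real MultiGatherContext caches more state (builders, row-id lookups) and the type names below are placeholders; only the acquire/release shape and the cap of four pooled entries per key mirror the diagram above.

```rust
use std::collections::HashMap;

/// Placeholder for the per-gather state that gets reused.
struct GatherContext;

const MAX_POOLED: usize = 4;

/// Sketch: reuse contexts per field-id set, keeping at most
/// MAX_POOLED of them alive for each key.
struct ContextPool {
    pools: HashMap<Vec<u64>, Vec<GatherContext>>,
}

impl ContextPool {
    fn acquire(&mut self, field_ids: &[u64]) -> GatherContext {
        self.pools
            .get_mut(field_ids)
            .and_then(|v| v.pop())
            .unwrap_or(GatherContext)
    }

    fn release(&mut self, field_ids: Vec<u64>, ctx: GatherContext) {
        let slot = self.pools.entry(field_ids).or_default();
        if slot.len() < MAX_POOLED {
            slot.push(ctx);
        }
    }
}
```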


Key Takeaways:

  • RecordBatch is the universal data container for all operations
  • Field metadata tracks column identity via field_id keys
  • Type dispatch uses Rust enums to handle 18+ Arrow data types efficiently
  • Serialization uses Arrow IPC format for persistence
  • Null handling respects Arrow’s validity bitmaps with configurable policies
  • Integration is native: CSV, SQL results, and storage all speak Arrow
