Data Formats and Arrow Integration

Purpose and Scope

This page documents how Apache Arrow columnar data structures serve as the universal data interchange format throughout LLKV. It covers the supported Arrow data types, the custom zero-copy serialization format used for persistence, and how RecordBatch flows between layers.

For information about how expressions evaluate over Arrow data, see Expression System. For details on storage pager abstractions, see Pager Interface and SIMD Optimization.


Arrow as the Universal Data Format

LLKV is Arrow-native: every layer of the system produces, consumes, and operates on arrow::record_batch::RecordBatch structures. This design choice enables:

  • Zero-copy data access across layer boundaries
  • SIMD-friendly vectorized operations via contiguous columnar buffers
  • Unified type system from SQL parsing through storage
  • Efficient interoperability with external Arrow-compatible tools

The following table maps system layers to their Arrow usage:

Layer            Arrow Usage
-----            -----------
llkv-sql         Parses SQL and returns RecordBatch query results to callers
llkv-plan        Constructs plans that reference Arrow DataType and Field structures
llkv-executor    Streams RecordBatch results during SELECT evaluation
llkv-table       Validates incoming batches against table schemas and persists columns
llkv-column-map  Chunks RecordBatch columns into serialized Arrow arrays for pager storage
llkv-storage     Serializes/deserializes Arrow arrays with a custom zero-copy format
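
As a concrete reference point, the following minimal sketch builds the kind of RecordBatch that flows between these layers, using only the public arrow crate (no LLKV-specific APIs):

use std::sync::Arc;

use arrow::array::{ArrayRef, Int32Array, StringArray};
use arrow::datatypes::{DataType, Field, Schema};
use arrow::record_batch::RecordBatch;

fn example_batch() -> RecordBatch {
    // Schema: two non-nullable columns, matching LLKV's strict typing.
    let schema = Arc::new(Schema::new(vec![
        Field::new("id", DataType::Int32, false),
        Field::new("name", DataType::Utf8, false),
    ]));

    // Columnar buffers: one contiguous array per column.
    let ids: ArrayRef = Arc::new(Int32Array::from(vec![1, 2, 3]));
    let names: ArrayRef = Arc::new(StringArray::from(vec!["a", "b", "c"]));

    RecordBatch::try_new(schema, vec![ids, names]).expect("schema and columns agree")
}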

Sources: Cargo.toml:32 llkv-table/README.md:10-11 llkv-column-map/README.md:10 llkv-storage/README.md:10


RecordBatch Flow Diagram

Key Data Flow Observations:

  1. INSERT Path: RecordBatch → schema validation → MVCC column injection → column chunking → serialization → pager write
  2. SELECT Path: Pager read → deserialization → column gather → RecordBatch construction → streaming to executor
  3. Zero-Copy: EntryHandle wraps memory-mapped regions that Arrow arrays reference directly without copying

Sources: llkv-column-map/src/store/projection.rs:240-446 llkv-storage/src/serialization.rs:226-254 llkv-table/README.md:19-25


Supported Arrow Data Types

LLKV supports the following Arrow primitive and complex types:

Primitive Types

Arrow DataType    Storage Size    SQL Type Mapping
--------------    ------------    ----------------
UInt8             1 byte          TINYINT UNSIGNED
UInt16            2 bytes         SMALLINT UNSIGNED
UInt32            4 bytes         INT UNSIGNED
UInt64            8 bytes         BIGINT UNSIGNED
Int8              1 byte          TINYINT
Int16             2 bytes         SMALLINT
Int32             4 bytes         INT
Int64             8 bytes         BIGINT
Float32           4 bytes         REAL, FLOAT
Float64           8 bytes         DOUBLE PRECISION
Boolean           1 bit (packed)  BOOLEAN
Date32            4 bytes         DATE
Date64            8 bytes         TIMESTAMP
Decimal128(p, s)  16 bytes        DECIMAL(p, s)

Variable-Length Types

Arrow DataType  Storage Layout             SQL Type Mapping
--------------  --------------             ----------------
Utf8            i32 offsets + UTF-8 bytes  VARCHAR, TEXT
LargeUtf8       i64 offsets + UTF-8 bytes  TEXT (large)
Binary          i32 offsets + raw bytes    VARBINARY, BLOB
LargeBinary     i64 offsets + raw bytes    BLOB (large)

Complex Types

Arrow DataType       Description                      Use Cases
--------------       -----------                      ---------
Struct(fields)       Nested record with named fields  Composite values, JSON-like data
FixedSizeList(T, n)  Fixed-length array of type T     Vector embeddings, coordinate tuples

Null Handling:
The current serialization format does not yet support null bitmaps; attempting to serialize an array with null_count() > 0 returns an error. Null support is planned for future releases.
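
A hedged sketch of that guard, using only the public arrow crate (the real serializer's error type in llkv-storage differs):

use arrow::array::Array;

/// Reject arrays that carry a validity (null) bitmap, mirroring the
/// documented restriction; illustrative error type only.
fn ensure_no_nulls(array: &dyn Array) -> Result<(), String> {
    if array.null_count() > 0 {
        return Err("serialization of arrays with nulls is not yet supported".into());
    }
    Ok(())
}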

Sources: llkv-storage/src/serialization.rs:144-165 llkv-storage/src/serialization.rs:199-224 llkv-expr/src/literal.rs:78-94


Custom Serialization Format

Why Not Arrow IPC?

LLKV uses a custom minimal serialization format instead of Arrow's standard IPC (Inter-Process Communication) format for several reasons:

Trade-offs:

graph LR
    subgraph "Arrow IPC Format"
        IPC_SCHEMA["Schema object\nframing metadata"]
        IPC_PADDING["Padding alignment\n8/64 byte boundaries"]
        IPC_BUFFERS["Buffer pointers\n+ offsets"]
        IPC_SIZE["Larger file size\n~20-40% overhead"]
    end

    subgraph "LLKV Custom Format"
        CUSTOM_HEADER["24-byte header\nfixed size"]
        CUSTOM_PAYLOAD["Raw buffer bytes\ncontiguous"]
        CUSTOM_ZERO["Zero-copy rebuild\ndirect mmap"]
        CUSTOM_SIZE["Minimal size\nno framing"]
    end

    IPC_SCHEMA --> IPC_SIZE
    IPC_PADDING --> IPC_SIZE
    IPC_BUFFERS --> IPC_SIZE

    CUSTOM_HEADER --> CUSTOM_SIZE
    CUSTOM_PAYLOAD --> CUSTOM_ZERO

Aspect            Arrow IPC                     LLKV Custom Format
------            ---------                     ------------------
File size         Larger (metadata + padding)   Minimal (24-byte header + payload)
Deserialization   Allocates and copies buffers  Zero-copy via EntryHandle
Flexibility       Supports all Arrow features   Limited to non-null arrays
Scan performance  Moderate (copy overhead)      Fast (direct SIMD access)
Null support      Full bitmap support           Not yet implemented

Design Goals:

  1. Minimal headers: 24-byte fixed header, no schema objects per array
  2. Predictable payloads: contiguous buffers for mmap-friendly access
  3. True zero-copy: reconstruct ArrayData referencing the original buffer directly
  4. Stable on-disk codes: type tags are compile-time pinned to prevent corruption

Sources: llkv-storage/src/serialization.rs:1-28 llkv-storage/src/serialization.rs:41-135


Serialization Format Details

Header Structure

Every serialized array begins with a 24-byte header:

Offset  Size  Field
------  ----  -----
0-3     4     Magic bytes: b"ARR0"
4       1     Layout code (Primitive=0, FslFloat32=1, Varlen=2, Struct=3)
5       1     Type code (PrimType enum value)
6       1     Precision (for Decimal128) or padding
7       1     Scale (for Decimal128) or padding
8-15    8     Array length (u64, element count)
16-19   4     extra_a (layout-specific u32)
20-23   4     extra_b (layout-specific u32)
24+     var   Payload (layout-specific buffer bytes)
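
As an illustration, this header maps naturally onto a #[repr(C)] Rust struct; the field names below are hypothetical, not LLKV's actual definitions:

/// Hypothetical in-memory view of the 24-byte header; names are illustrative.
#[repr(C)]
struct ArrayHeader {
    magic: [u8; 4], // b"ARR0"
    layout: u8,     // Primitive=0, FslFloat32=1, Varlen=2, Struct=3
    type_code: u8,  // PrimType enum value
    precision: u8,  // Decimal128 precision, else padding
    scale: u8,      // Decimal128 scale, else padding
    len: u64,       // element count
    extra_a: u32,   // layout-specific (see variants below)
    extra_b: u32,   // layout-specific (see variants below)
}

// The layout above packs to exactly 24 bytes with natural alignment.
const _: () = assert!(std::mem::size_of::<ArrayHeader>() == 24);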

Layout Variants

Primitive Layout

For fixed-width types (Int32, Float64, etc.):

  • extra_a: Length of values buffer in bytes
  • extra_b: Unused (0)
  • Payload: Raw values buffer

Varlen Layout

For variable-length types (Utf8, Binary, etc.):

  • extra_a: Length of offsets buffer in bytes
  • extra_b: Length of values buffer in bytes
  • Payload: Offsets buffer followed by values buffer
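
For example, serializing the Utf8 values ["hi", "world"] produces an offsets buffer of [0, 2, 7] (three i32 values, so extra_a = 12) followed by the 7 bytes "hiworld" (extra_b = 7).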

FixedSizeList Layout

Special optimization for vector embeddings:

  • extra_a: List size (elements per list)
  • extra_b: Total child buffer length in bytes
  • Payload: Contiguous child Float32 buffer

This enables direct SIMD access to embedding vectors without indirection.

Struct Layout

For nested composite types:

  • extra_a: Unused (0)
  • extra_b: IPC payload length in bytes
  • Payload: Arrow IPC-serialized struct array

Struct types fall back to Arrow IPC format because their complex nested structure doesn't benefit from the custom layout.

Sources: llkv-storage/src/serialization.rs:44-135 llkv-storage/src/serialization.rs:256-378


Zero-Copy Deserialization

EntryHandle Integration

The EntryHandle type from simd-r-drive-entry-handle provides a zero-copy wrapper around memory-mapped buffers:

graph TB
    PAGER["Pager::batch_get"]
    HANDLE["EntryHandle\nmemory-mapped region"]
    BUFFER["Arrow Buffer\nslice of EntryHandle"]
    ARRAYDATA["ArrayData\nreferences Buffer"]
    ARRAY["Concrete Array\nInt32Array, etc."]

    PAGER --> HANDLE
    HANDLE --> BUFFER
    BUFFER --> ARRAYDATA
    ARRAYDATA --> ARRAY

    style HANDLE fill:#f9f9f9
    style BUFFER fill:#f9f9f9

Key Operations:

  1. Pager read: Returns GetResult::Raw { key, bytes: EntryHandle }
  2. Buffer slice: EntryHandle::as_arrow_buffer() creates an Arrow Buffer view
  3. ArrayData build: ArrayData::builder().add_buffer(buffer).build()
  4. Array cast: make_array(data) produces typed arrays

The entire chain avoids copying data — Arrow arrays directly reference the memory-mapped region.
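
The sketch below reproduces this chain with public arrow-rs APIs. For simplicity it takes an owned Vec<u8> (which Buffer wraps without copying); in LLKV the Buffer instead references the EntryHandle's memory-mapped region:

use arrow::array::{make_array, ArrayData, ArrayRef};
use arrow::buffer::Buffer;
use arrow::datatypes::DataType;

/// Rebuild an Int32 array from raw little-endian payload bytes.
fn rebuild_int32(bytes: Vec<u8>, len: usize) -> ArrayRef {
    // 1. Wrap the payload bytes in an Arrow Buffer (no copy for owned Vecs;
    //    LLKV wraps the mmap-backed EntryHandle region here instead).
    let buffer = Buffer::from(bytes);
    // 2. Describe the buffer as Int32 ArrayData.
    let data = ArrayData::builder(DataType::Int32)
        .len(len)
        .add_buffer(buffer)
        .build()
        .expect("buffer length matches element count");
    // 3. make_array produces the concrete typed array (an Int32Array here).
    make_array(data)
}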

Alignment Requirements:

Decimal128 requires 16-byte alignment. If the EntryHandle buffer is not properly aligned, the deserializer copies it to an aligned buffer before constructing the array.
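
A hedged sketch of that fallback, assuming only public arrow-rs APIs (the real deserializer's signatures differ):

use arrow::buffer::{Buffer, MutableBuffer};

/// Return a 16-byte-aligned buffer for Decimal128 payloads, copying only
/// when the mapped bytes are misaligned. Buffer::from(&[u8]) copies here
/// for brevity; LLKV wraps the EntryHandle region directly when aligned.
fn decimal128_buffer(bytes: &[u8]) -> Buffer {
    if bytes.as_ptr().align_offset(16) == 0 {
        Buffer::from(bytes)
    } else {
        // arrow-rs allocates MutableBuffer storage with 64-byte alignment,
        // which satisfies Decimal128's 16-byte requirement.
        let mut aligned = MutableBuffer::new(bytes.len());
        aligned.extend_from_slice(bytes);
        aligned.into()
    }
}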

Sources: llkv-storage/src/serialization.rs:429-559 llkv-column-map/src/store/projection.rs:619-629


RecordBatch Construction and Projection

Gather Operations

The ColumnStore::gather_rows family of methods reconstructs RecordBatch from chunked columns:

graph TB
    ROWIDS["row_ids: &[u64]\nrequested rows"]
    FIELDIDS["field_ids: &[LogicalFieldId]\nrequested columns"]
    PLANS["FieldPlan\nper-column metadata"]
    CHUNKS["Chunk selection\ncandidate_indices"]
    BATCH_GET["Pager::batch_get\nfetch chunks"]
    CACHE["Chunk cache\nArrayRef map"]
    GATHER["gather_rows_from_chunks\nper-type specialization"]
    ARRAYS["Vec<ArrayRef>\none per field"]
    SCHEMA["Arrow Schema\nField metadata"]
    RECORDBATCH["RecordBatch::try_new"]

    ROWIDS --> PLANS
    FIELDIDS --> PLANS
    PLANS --> CHUNKS
    CHUNKS --> BATCH_GET
    BATCH_GET --> CACHE
    CACHE --> GATHER
    GATHER --> ARRAYS
    ARRAYS --> RECORDBATCH
    SCHEMA --> RECORDBATCH

Projection Flow:

  1. Prepare context: Load column descriptors, determine chunk candidates
  2. Batch fetch: Request all needed chunks from the pager in one call
  3. Type-specific gather: Dispatch to specialized routines based on DataType
  4. Null policy: Apply GatherNullPolicy (ErrorOnMissing, IncludeNulls, DropNulls)
  5. Schema construction: Build Schema with correct field names and nullability
  6. RecordBatch assembly: RecordBatch::try_new(schema, arrays)

Type Dispatch Table:

DataType          Gather Function
--------          ---------------
Utf8              gather_rows_from_chunks_string::<i32>
LargeUtf8         gather_rows_from_chunks_string::<i64>
Binary            gather_rows_from_chunks_binary::<i32>
LargeBinary       gather_rows_from_chunks_binary::<i64>
Boolean           gather_rows_from_chunks_bool
Struct(_)         gather_rows_from_chunks_struct
Decimal128(_, _)  gather_rows_from_chunks_decimal128
Primitives        gather_rows_from_chunks::<ArrowTy> (generic)
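
Schematically, this dispatch is a match over DataType. The arms below name the routines from the table, but bodies and signatures are elided; the real functions take chunk buffers and row indices and return gathered arrays:

use arrow::datatypes::DataType;

// Schematic only: mirrors the dispatch table above.
fn dispatch_gather(dtype: &DataType) {
    match dtype {
        DataType::Utf8 => { /* gather_rows_from_chunks_string::<i32> */ }
        DataType::LargeUtf8 => { /* gather_rows_from_chunks_string::<i64> */ }
        DataType::Binary => { /* gather_rows_from_chunks_binary::<i32> */ }
        DataType::LargeBinary => { /* gather_rows_from_chunks_binary::<i64> */ }
        DataType::Boolean => { /* gather_rows_from_chunks_bool */ }
        DataType::Struct(_) => { /* gather_rows_from_chunks_struct */ }
        DataType::Decimal128(_, _) => { /* gather_rows_from_chunks_decimal128 */ }
        _ => { /* gather_rows_from_chunks::<ArrowTy> (generic primitives) */ }
    }
}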

Sources: llkv-column-map/src/store/projection.rs:245-446 llkv-column-map/src/store/projection.rs:636-726


Schema Validation

Table-Level Schema Enforcement

The Table layer validates incoming RecordBatch schemas against the stored table schema before appending:

graph TB
    BATCH["RecordBatch\nuser data"]
    TABLE_SCHEMA["Stored Schema\nfrom catalog"]
    VALIDATE["Schema validation"]
    FIELD_CHECK["Field count\nname\ntype match"]
    MVCC["Inject MVCC columns\nrow_id, created_by,\ndeleted_by"]
    EXTENDED["Extended RecordBatch"]
    COLMAP["ColumnStore::append"]
    ERROR["Error::SchemaMismatch"]

    BATCH --> VALIDATE
    TABLE_SCHEMA --> VALIDATE
    VALIDATE --> FIELD_CHECK
    FIELD_CHECK -->|Pass| MVCC
    FIELD_CHECK -->|Fail| ERROR
    MVCC --> EXTENDED
    EXTENDED --> COLMAP

Validation Rules:

  1. Field count: The batch must have exactly the same number of columns as the table schema
  2. Field names: Column names must match (case-sensitive)
  3. Field types: DataType must match exactly (no implicit coercion)
  4. Nullability: Currently not strictly enforced (planned improvement)

MVCC Column Injection:

After validation, the table appends three system columns:

  • row_id (UInt64): Unique row identifier
  • created_by (UInt64): Transaction ID that created the row
  • deleted_by (UInt64): Transaction ID that deleted the row (0 if active)

These columns are stored in separate logical namespaces but physically alongside user data.
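
A hedged sketch of the injection step using only public arrow-rs APIs; LLKV's actual helpers, row-id assignment, and logical namespacing are not shown:

use std::sync::Arc;

use arrow::array::{ArrayRef, UInt64Array};
use arrow::datatypes::{DataType, Field, Schema};
use arrow::record_batch::RecordBatch;

/// Append the three MVCC system columns to a validated user batch.
fn inject_mvcc(batch: &RecordBatch, txn_id: u64, first_row_id: u64) -> RecordBatch {
    let n = batch.num_rows();
    let row_id: ArrayRef =
        Arc::new(UInt64Array::from_iter_values(first_row_id..first_row_id + n as u64));
    let created_by: ArrayRef = Arc::new(UInt64Array::from(vec![txn_id; n]));
    let deleted_by: ArrayRef = Arc::new(UInt64Array::from(vec![0u64; n])); // 0 = active

    // Extend the schema with the three system fields.
    let mut fields: Vec<Field> =
        batch.schema().fields().iter().map(|f| f.as_ref().clone()).collect();
    fields.push(Field::new("row_id", DataType::UInt64, false));
    fields.push(Field::new("created_by", DataType::UInt64, false));
    fields.push(Field::new("deleted_by", DataType::UInt64, false));

    let mut columns = batch.columns().to_vec();
    columns.extend([row_id, created_by, deleted_by]);
    RecordBatch::try_new(Arc::new(Schema::new(fields)), columns)
        .expect("schema and column lengths agree")
}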

Sources: llkv-table/README.md:14-17 llkv-column-map/README.md:26-28


Type Mapping from SQL to Arrow

Literal Conversion

The llkv-expr crate defines a Literal enum that captures untyped SQL values before schema resolution:

graph LR
    SQL_LITERAL["SQL Literal\n123, 'text', etc."]
    LITERAL["Literal enum\nInteger, String, etc."]
    SCHEMA["Table Schema\nArrow DataType"]
    NATIVE["Native Value\ni32, String, etc."]
    ARRAY["Arrow Array\nInt32Array, etc."]

    SQL_LITERAL --> LITERAL
    LITERAL --> SCHEMA
    SCHEMA --> NATIVE
    NATIVE --> ARRAY

Supported Literal Types:

Literal Variant          Arrow Target Types
---------------          ------------------
Integer(i128)            Any integer or float type (with range checks)
Float(f64)               Float32, Float64
Decimal(DecimalValue)    Decimal128(p, s)
String(String)           Utf8, LargeUtf8
Boolean(bool)            Boolean
Date32(i32)              Date32
Struct(fields)           Struct(...)
Interval(IntervalValue)  Not directly stored; used for date arithmetic

Conversion Mechanism:

The FromLiteral trait provides type-aware conversion; each implementation performs range checking and type validation for its target type.
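
A hedged sketch of the shape such a conversion might take; the real trait and error types in llkv-expr differ:

// Illustrative only: the variant set is reduced, and the actual trait in
// llkv-expr has a different signature and error type.
enum Literal {
    Integer(i128),
    Float(f64),
    // ... remaining variants elided
}

trait FromLiteral: Sized {
    fn from_literal(lit: &Literal) -> Result<Self, String>;
}

impl FromLiteral for i32 {
    fn from_literal(lit: &Literal) -> Result<Self, String> {
        match lit {
            // Range check: the i128 literal must fit the target type.
            Literal::Integer(v) => {
                i32::try_from(*v).map_err(|_| "integer out of range for INT".to_string())
            }
            // Type check: no implicit coercion from other literal kinds.
            _ => Err("type mismatch: expected integer literal".to_string()),
        }
    }
}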

Sources: llkv-expr/src/literal.rs:78-94 llkv-expr/src/literal.rs:156-219 llkv-expr/src/literal.rs:395-419


Performance Characteristics

Zero-Copy Benefits

The combination of Arrow's columnar layout and the custom serialization format delivers measurable performance benefits:

Operation             Traditional DB          LLKV Arrow-Native
---------             --------------          -----------------
Column scan           Row-by-row decode       Vectorized SIMD over mmap
Type dispatch         Virtual function calls  Monomorphized at compile time
Buffer management     Multiple allocations    Single mmap region
Predicate evaluation  Interpreted per row     Compiled bytecode over vectors

Chunking Strategy

The ColumnStore organizes data into chunks sized for cache locality and pager efficiency:

  • Target chunk size: Configurable, typically 64KB-256KB per column
  • Row alignment: All columns in a table share the same row boundaries per chunk
  • Append optimization: Incoming batches are chunked and sorted by row_id before persistence

This design minimizes pager I/O and maximizes CPU cache hit rates during scans.

Sources: llkv-column-map/README.md:24-28 llkv-storage/README.md:15-17


Integration with External Tools

Arrow Compatibility

Because LLKV uses standard Arrow data structures at its boundaries, it can integrate with the broader Arrow ecosystem:

  • Export: Query results can be serialized to Arrow IPC files for external processing
  • Import: Arrow IPC files can be read and ingested via Table::append
  • Parquet: Future work could add direct Parquet read/write using Arrow's parquet crate
  • DataFusion: LLKV's table scan APIs could potentially integrate as a DataFusion TableProvider
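
For example, exporting query-result batches with arrow-rs's stock IPC writer needs no LLKV-specific code (minimal sketch; assumes at least one batch):

use std::fs::File;

use arrow::error::Result;
use arrow::ipc::writer::FileWriter;
use arrow::record_batch::RecordBatch;

/// Write query-result batches to an Arrow IPC file for external tools.
fn export_ipc(path: &str, batches: &[RecordBatch]) -> Result<()> {
    let file = File::create(path)?;
    let mut writer = FileWriter::try_new(file, batches[0].schema().as_ref())?;
    for batch in batches {
        writer.write(batch)?;
    }
    writer.finish()
}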

Current Limitations

  1. Null support: The serialization format doesn't yet handle null bitmaps
  2. Nested types: Only Struct and FixedSizeList<Float32> are fully supported
  3. Dictionary encoding: Not yet implemented (planned)
  4. Compression: No built-in compression (relies on storage-layer features)

Sources: llkv-storage/src/serialization.rs:257-260 llkv-column-map/README.md:10-11


Summary

LLKV's Arrow-native architecture provides:

  • Universal interchange format via RecordBatch across all layers
  • Zero-copy operations through EntryHandle and memory-mapped buffers
  • Custom serialization optimized for mmap and SIMD access patterns
  • Type safety from SQL literals through to persisted columns
  • SIMD-friendly layout for efficient vectorized query evaluation

The trade-off of using a custom format instead of Arrow IPC is reduced flexibility (no nulls yet, fewer complex types) in exchange for smaller files, faster scans, and true zero-copy deserialization.

For details on how Arrow arrays are evaluated during query execution, see Scalar Evaluation and NumericKernels. For information on how MVCC metadata is stored alongside Arrow columns, see Column Storage and ColumnStore.