
This documentation is part of the "Projects with Books" initiative at zenOSmosis.

The source code for this project is available on GitHub.

Custom Types and Type Registry


This page documents LLKV's type system, including SQL type mapping, custom type representations, and type inference mechanisms. The system uses Apache Arrow's DataType as its canonical type representation, with custom types like Decimal128, Date32, and Interval mapped to Arrow-compatible formats.

For information about expression evaluation and scalar operations, see Scalar Evaluation and NumericKernels. For aggregate function type handling, see Aggregation System.


Type System Architecture

LLKV's type system operates in three layers: SQL types (user-facing), intermediate literal types (planning), and Arrow DataTypes (execution). All data flowing through the system ultimately uses Arrow's columnar format.

Type Flow Architecture

graph TB
    subgraph "SQL Layer"
        SQLTYPE["SQL Types\nINT, TEXT, DECIMAL, DATE"]
    end

    subgraph "Planning Layer"
        SQLVALUE["SqlValue\nInteger, Float, Decimal,\nString, Date32, Interval, Struct"]
        LITERAL["Literal\nType-erased values"]
        PLANVALUE["PlanValue\nPlan-time literals"]
    end

    subgraph "Execution Layer"
        DATATYPE["Arrow DataType\nInt64, Float64, Utf8,\nDecimal128, Date32, Interval"]
        SCHEMA["ExecutorSchema\nColumn metadata + types"]
        INFERENCE["Type Inference\ninfer_computed_data_type"]
    end

    subgraph "Storage Layer"
        RECORDBATCH["RecordBatch\nTyped columnar data"]
        ARRAYS["Typed Arrays\nInt64Array, StringArray, etc."]
    end

    SQLTYPE --> SQLVALUE
    SQLVALUE --> PLANVALUE
    SQLVALUE --> LITERAL
    PLANVALUE --> DATATYPE
    LITERAL --> DATATYPE
    DATATYPE --> SCHEMA
    SCHEMA --> INFERENCE
    INFERENCE --> DATATYPE
    DATATYPE --> RECORDBATCH
    RECORDBATCH --> ARRAYS

    style DATATYPE fill:#f9f9f9

Sources: llkv-sql/src/sql_value.rs:16-27 llkv-sql/src/lib.rs:22-29 llkv-executor/src/translation/schema.rs:53-123


SQL to Arrow Type Mapping

SQL types are mapped to Arrow DataTypes during parsing and planning. The mapping is defined implicitly through the parsing logic in SqlValue and the type inference system.

| SQL Type | Arrow DataType | Notes |
|---|---|---|
| INT, INTEGER, BIGINT | Int64 | All integer types normalized to Int64 |
| FLOAT, DOUBLE, REAL | Float64 | All floating-point types normalized to Float64 |
| DECIMAL(p,s) | Decimal128(p,s) | Fixed-point decimal with precision and scale |
| TEXT, VARCHAR | Utf8 | Variable-length UTF-8 strings |
| DATE | Date32 | Days since Unix epoch |
| INTERVAL | Interval(MonthDayNano) | Calendar-aware interval type |
| BOOLEAN | Boolean | True/false values |
| Dictionary literals | Struct | Key-value maps represented as structs |
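
The sketch below illustrates how this table could be expressed as a lookup on SQL type names. The function name sql_type_to_arrow and its matching strategy are illustrative only; LLKV derives the mapping implicitly inside SqlValue parsing and type inference rather than through a single function.

```rust
use arrow::datatypes::{DataType, IntervalUnit};

/// Illustrative mapping from SQL type names to Arrow DataTypes.
fn sql_type_to_arrow(sql_type: &str) -> Option<DataType> {
    match sql_type.to_ascii_uppercase().as_str() {
        "INT" | "INTEGER" | "BIGINT" => Some(DataType::Int64),
        "FLOAT" | "DOUBLE" | "REAL" => Some(DataType::Float64),
        "TEXT" | "VARCHAR" => Some(DataType::Utf8),
        "DATE" => Some(DataType::Date32),
        "INTERVAL" => Some(DataType::Interval(IntervalUnit::MonthDayNano)),
        "BOOLEAN" => Some(DataType::Boolean),
        // DECIMAL(p, s) carries parameters and maps to
        // DataType::Decimal128(precision, scale); parsing them is omitted here.
        _ => None,
    }
}

fn main() {
    assert_eq!(sql_type_to_arrow("bigint"), Some(DataType::Int64));
    assert_eq!(sql_type_to_arrow("DATE"), Some(DataType::Date32));
}
```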

SQL to Arrow Type Conversion Flow

Sources: llkv-sql/src/sql_value.rs:178-214 llkv-sql/src/sql_value.rs:216-236 llkv-sql/src/lib.rs:22-29


Custom Type Representations

LLKV defines custom types for values that require special handling beyond basic Arrow types. These types bridge SQL semantics and Arrow's columnar format.

DecimalValue

Fixed-point decimal numbers with exact precision. Stored as i128 with a scale factor.
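
A minimal sketch of this representation; the field names follow the diagram below, but the conversion helper is illustrative rather than LLKV's actual API:

```rust
/// Fixed-point decimal as an i128 raw value plus a scale.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct DecimalValue {
    raw_value: i128,
    scale: i8,
}

impl DecimalValue {
    /// Actual value = raw_value / 10^scale.
    fn to_f64(self) -> f64 {
        self.raw_value as f64 / 10f64.powi(self.scale as i32)
    }
}

fn main() {
    // SQL literal 123.45 -> raw_value = 12345, scale = 2.
    let d = DecimalValue { raw_value: 12345, scale: 2 };
    assert_eq!(d.to_f64(), 123.45);
    // Stored in Arrow as a Decimal128Array with precision=5, scale=2.
}
```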

DecimalValue Representation

graph TB
    subgraph "DecimalValue Structure"
        DEC["DecimalValue\nraw_value: i128\nscale: i8"]
    end

    subgraph "SQL Input"
        SQLDEC["SQL: 123.45"]
    end

    subgraph "Internal Representation"
        RAW["raw_value = 12345\nscale = 2"]
        CALC["Actual value = 12345 / 10^2 = 123.45"]
    end

    subgraph "Arrow Storage"
        ARR["Decimal128Array\nprecision=5, scale=2"]
    end

    SQLDEC --> DEC
    DEC --> RAW
    RAW --> CALC
    DEC --> ARR

Sources: llkv-sql/src/sql_value.rs:187-207 llkv-aggregate/src/lib.rs:314-324

IntervalValue

Calendar-aware time intervals with month, day, and nanosecond components.
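
A sketch of the three-component shape with a hypothetical addition helper; Arrow stores the same (months, days, nanoseconds) triple in its Interval(MonthDayNano) layout:

```rust
/// Calendar-aware interval: months, days, and nanoseconds kept separately
/// so that month arithmetic stays calendar-correct.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
struct IntervalValue {
    months: i32,
    days: i32,
    nanos: i64,
}

impl IntervalValue {
    /// Component-wise addition: '1 MONTH' + '15 DAYS' = (1, 15, 0).
    fn add(self, other: IntervalValue) -> IntervalValue {
        IntervalValue {
            months: self.months + other.months,
            days: self.days + other.days,
            nanos: self.nanos + other.nanos,
        }
    }
}

fn main() {
    let one_month = IntervalValue { months: 1, ..Default::default() };
    let fifteen_days = IntervalValue { days: 15, ..Default::default() };
    assert_eq!(
        one_month.add(fifteen_days),
        IntervalValue { months: 1, days: 15, nanos: 0 }
    );
}
```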

IntervalValue Operations

Sources: llkv-sql/src/sql_value.rs:238-283 llkv-expr/src/literal.rs

Date32

Days since Unix epoch (1970-01-01), stored as i32.
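
A short example using the arrow crate's Date32Array to show the days-since-epoch encoding (the value 19723 corresponds to 2024-01-01):

```rust
use arrow::array::{Array, Date32Array};
use arrow::datatypes::DataType;

fn main() {
    // Date32 stores days since 1970-01-01 as an i32.
    // 19_723 days after the epoch is 2024-01-01 (1_704_067_200 s / 86_400 s per day).
    let dates = Date32Array::from(vec![Some(0), Some(19_723), None]);

    assert_eq!(dates.data_type(), &DataType::Date32);
    assert_eq!(dates.value(1), 19_723);
    assert!(dates.is_null(2));
}
```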

Date32 Type Handling

Sources: llkv-sql/src/sql_value.rs:76-87 llkv-sql/src/sql_value.rs:169-174

Struct Types

Dictionary literals in SQL are represented as struct types with named fields.
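
The sketch below shows the Arrow type shape a dictionary literal such as {'id': 1, 'name': 'ada'} would map to, assuming a recent arrow release where DataType::Struct takes a Fields collection; field nullability and the construction details in LLKV may differ:

```rust
use arrow::datatypes::{DataType, Field, Fields};

/// A dictionary literal maps to a struct type with one named field per key.
fn dict_literal_type() -> DataType {
    let fields = Fields::from(vec![
        Field::new("id", DataType::Int64, true),
        Field::new("name", DataType::Utf8, true),
    ]);
    DataType::Struct(fields)
}

fn main() {
    // Struct([Field { name: "id", data_type: Int64, .. }, Field { name: "name", .. }])
    println!("{:?}", dict_literal_type());
}
```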

Struct Type Representation

Sources: llkv-sql/src/sql_value.rs:124-135 llkv-sql/src/sql_value.rs:227-234


Type Inference for Computed Expressions

The type inference system determines Arrow DataTypes for computed expressions at planning time. This enables schema generation before execution.

Type Inference Flow

graph TB
    subgraph "Expression Input"
        EXPR["ScalarExpr<FieldId>\ncol1 + col2 * 3"]
    end

    subgraph "Type Inference"
        INFER["infer_computed_data_type"]
        CHECK["expression_uses_float"]
        NORM["normalized_numeric_type"]
    end

    subgraph "Type Resolution"
        COL1["Column col1: Int64"]
        COL2["Column col2: Float64"]
        RESULT["Result: Float64\n(one operand is float)"]
    end

    subgraph "Schema Output"
        FIELD["Field(alias, Float64, nullable=true)"]
    end

    EXPR --> INFER
    INFER --> CHECK
    CHECK --> COL1
    CHECK --> COL2
    CHECK --> RESULT
    INFER --> NORM
    NORM --> RESULT
    RESULT --> FIELD

Sources: llkv-executor/src/translation/schema.rs:53-123 llkv-executor/src/translation/schema.rs:149-243

Type Inference Rules

The inference system applies the following rules:

| Expression Type | Inferred Type | Logic |
|---|---|---|
| ScalarExpr::Literal(Integer) | Int64 | Direct mapping |
| ScalarExpr::Literal(Float) | Float64 | Direct mapping |
| ScalarExpr::Literal(Decimal(p,s)) | Decimal128(p,s) | Preserves precision/scale |
| ScalarExpr::Column(field_id) | Column's type | Lookup in schema |
| ScalarExpr::Binary{left, op, right} | Float64 if any operand is float, else Int64 | Type promotion |
| ScalarExpr::Compare{...} | Int64 | Boolean as integer (0/1) |
| ScalarExpr::Cast{data_type, ...} | data_type | Explicit cast target |
| ScalarExpr::Random | Float64 | Floating-point random values |

Sources: llkv-executor/src/translation/schema.rs:56-122
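
The following sketch condenses these rules into a single match; the enum variants are simplified stand-ins for llkv-expr's ScalarExpr<FieldId> and Literal, and the real infer_computed_data_type also consults the schema and handles additional cases:

```rust
use arrow::datatypes::DataType;

/// Simplified stand-ins for the planner's expression and literal types.
enum Literal { Integer(i64), Float(f64), Decimal { precision: u8, scale: i8 } }
enum ScalarExpr {
    Literal(Literal),
    Column { data_type: DataType },
    Binary { left: Box<ScalarExpr>, right: Box<ScalarExpr> },
    Compare { left: Box<ScalarExpr>, right: Box<ScalarExpr> },
    Cast { expr: Box<ScalarExpr>, data_type: DataType },
    Random,
}

/// Sketch of the rules in the table above.
fn infer_type(expr: &ScalarExpr) -> DataType {
    match expr {
        ScalarExpr::Literal(Literal::Integer(_)) => DataType::Int64,
        ScalarExpr::Literal(Literal::Float(_)) => DataType::Float64,
        ScalarExpr::Literal(Literal::Decimal { precision, scale }) => {
            DataType::Decimal128(*precision, *scale)
        }
        ScalarExpr::Column { data_type } => data_type.clone(),
        ScalarExpr::Binary { left, right } => {
            // Type promotion: Float64 if either side infers to Float64.
            let uses_float = matches!(infer_type(left), DataType::Float64)
                || matches!(infer_type(right), DataType::Float64);
            if uses_float { DataType::Float64 } else { DataType::Int64 }
        }
        ScalarExpr::Compare { .. } => DataType::Int64, // boolean as 0/1
        ScalarExpr::Cast { data_type, .. } => data_type.clone(),
        ScalarExpr::Random => DataType::Float64,
    }
}

fn main() {
    // col1 (Int64) + col2 (Float64) infers Float64.
    let expr = ScalarExpr::Binary {
        left: Box::new(ScalarExpr::Column { data_type: DataType::Int64 }),
        right: Box::new(ScalarExpr::Column { data_type: DataType::Float64 }),
    };
    assert_eq!(infer_type(&expr), DataType::Float64);
}
```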

Numeric Type Normalization

All numeric types are normalized to either Int64 or Float64 for arithmetic operations:

Numeric Type Normalization

graph TB
    subgraph "Input Types"
        INT8["Int8/Int16/Int32/Int64"]
        UINT["UInt8/UInt16/UInt32/UInt64"]
        FLOAT["Float32/Float64"]
        DEC["Decimal128(p,s)"]
        BOOL["Boolean"]
    end

    subgraph "Normalization"
        NORM["normalized_numeric_type"]
    end

    subgraph "Output Types"
        OUT_INT["Int64"]
        OUT_FLOAT["Float64"]
    end

    INT8 --> NORM
    BOOL --> NORM
    NORM --> OUT_INT

    UINT --> NORM
    FLOAT --> NORM
    NORM --> OUT_FLOAT

    DEC --> NORM
    NORM --> |scale=0 && fits in i64| OUT_INT
    NORM --> |otherwise| OUT_FLOAT

Sources: llkv-executor/src/translation/schema.rs:125-147
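
A sketch of these normalization rules, following the groupings in the diagram; the real normalized_numeric_type in llkv-executor may differ in detail, particularly in how it decides when a Decimal128 stays integral:

```rust
use arrow::datatypes::DataType;

/// Normalize a numeric input type to Int64 or Float64 for arithmetic.
fn normalize_numeric(dt: &DataType) -> DataType {
    match dt {
        DataType::Int8 | DataType::Int16 | DataType::Int32 | DataType::Int64
        | DataType::Boolean => DataType::Int64,
        DataType::UInt8 | DataType::UInt16 | DataType::UInt32 | DataType::UInt64
        | DataType::Float32 | DataType::Float64 => DataType::Float64,
        // Decimal: integral values that fit in i64 normalize to Int64,
        // everything else falls back to Float64.
        DataType::Decimal128(precision, scale) => {
            if *scale == 0 && *precision <= 18 {
                DataType::Int64
            } else {
                DataType::Float64
            }
        }
        other => other.clone(),
    }
}

fn main() {
    assert_eq!(normalize_numeric(&DataType::Int16), DataType::Int64);
    assert_eq!(normalize_numeric(&DataType::Decimal128(10, 2)), DataType::Float64);
}
```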


Type Resolution During Expression Translation

Expression translation converts string-based column references to typed FieldId references, resolving types through the schema.

Expression Translation and Type Resolution

graph TB
    subgraph "String-based Expression"
        EXPRSTR["Expr<String>\nColumn('age') > Literal(18)"]
    end

    subgraph "Translation"
        TRANS["translate_predicate"]
        SCALAR["translate_scalar"]
        RESOLVE["resolve_field_id"]
    end

    subgraph "Schema Lookup"
        SCHEMA["ExecutorSchema"]
        LOOKUP["schema.resolve('age')"]
        COLUMN["ExecutorColumn\nname='age'\nfield_id=5\ndata_type=Int64"]
    end

    subgraph "FieldId-based Expression"
        EXPRFID["Expr<FieldId>\nColumn(5) > Literal(18)"]
    end

    EXPRSTR --> TRANS
    TRANS --> SCALAR
    SCALAR --> RESOLVE
    RESOLVE --> LOOKUP
    LOOKUP --> COLUMN
    COLUMN --> EXPRFID

Sources: llkv-executor/src/translation/expression.rs:18-174 llkv-executor/src/translation/expression.rs:390-407
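
A reduced sketch of the resolution step, with simplified stand-ins for ExecutorSchema and ExecutorColumn; the real types carry more metadata and the error handling differs:

```rust
use arrow::datatypes::DataType;

type FieldId = u32;

/// Simplified stand-ins for the executor's schema types.
struct ExecutorColumn { name: String, field_id: FieldId, data_type: DataType }
struct ExecutorSchema { columns: Vec<ExecutorColumn> }

impl ExecutorSchema {
    /// Sketch of schema.resolve(name): look a column up by name.
    fn resolve(&self, name: &str) -> Option<&ExecutorColumn> {
        self.columns.iter().find(|c| c.name == name)
    }
}

fn main() {
    let schema = ExecutorSchema {
        columns: vec![ExecutorColumn {
            name: "age".into(),
            field_id: 5,
            data_type: DataType::Int64,
        }],
    };

    // Column("age") > Literal(18) becomes Column(5) > Literal(18), with the
    // column's type (Int64) taken from the schema.
    let col = schema.resolve("age").expect("unknown column");
    assert_eq!(col.field_id, 5);
    assert_eq!(col.data_type, DataType::Int64);
}
```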

Type Preservation During Translation

The translation process preserves type information from the original expression:

| Expression Component | Type Preservation |
|---|---|
| Column(name) | Replaced with Column(field_id), type from schema |
| Literal(value) | Clone literal, type embedded in Literal enum |
| Binary{left, op, right} | Recursively translate operands, type inferred later |
| Cast{expr, data_type} | Preserve data_type during translation |
| Aggregate(call) | Translate inner expression, aggregate type determined by function |

Sources: llkv-executor/src/translation/expression.rs:176-387


Type Handling in Aggregates

Aggregate functions have type-specific accumulator implementations. The type determines overflow behavior, precision, and result format.

Aggregate Type-Specific Accumulators

graph TB
    subgraph "Aggregate Specification"
        SPEC["AggregateKind::Sum\nfield_id=5\ndata_type=Int64\ndistinct=false"]
    end

    subgraph "Accumulator Creation"
        CREATE["new_with_projection_index"]
        MATCH["Match on (data_type, distinct)"]
    end

    subgraph "Type-Specific Accumulators"
        INT64["SumInt64\nvalue: Option<i64>\nhas_values: bool"]
        FLOAT64["SumFloat64\nvalue: f64\nsaw_value: bool"]
        DEC128["SumDecimal128\nsum: i128\nprecision: u8\nscale: i8"]
    end

    subgraph "Update Logic"
        UPDATE_INT["Checked addition\nError on overflow"]
        UPDATE_FLOAT["Floating addition\nNo overflow check"]
        UPDATE_DEC["Checked i128 addition\nError on overflow"]
    end

    SPEC --> CREATE
    CREATE --> MATCH
    MATCH --> |Int64, false| INT64
    MATCH --> |Float64, false| FLOAT64
    MATCH --> |Decimal128 p,s, false| DEC128

    INT64 --> UPDATE_INT
    FLOAT64 --> UPDATE_FLOAT
    DEC128 --> UPDATE_DEC

Sources: llkv-aggregate/src/lib.rs:461-542 llkv-aggregate/src/lib.rs:799-859
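
A reduced sketch of the Int64 SUM accumulator shown above; the real SumInt64 in llkv-aggregate consumes Arrow arrays batch by batch, but the checked-overflow and NULL-skipping behavior is the same idea:

```rust
/// Sketch of an Int64 SUM accumulator with the fields from the diagram.
#[derive(Default)]
struct SumInt64 {
    value: Option<i64>,
    has_values: bool,
}

impl SumInt64 {
    fn update(&mut self, v: Option<i64>) -> Result<(), String> {
        if let Some(v) = v {
            let current = self.value.unwrap_or(0);
            self.value = Some(
                current
                    .checked_add(v)
                    .ok_or_else(|| "integer overflow in SUM".to_string())?,
            );
            self.has_values = true;
        }
        Ok(()) // NULLs are skipped
    }

    /// SUM over only NULLs (or no rows) yields NULL.
    fn finish(&self) -> Option<i64> {
        if self.has_values { self.value } else { None }
    }
}

fn main() {
    let mut sum = SumInt64::default();
    for v in [Some(1), None, Some(41)] {
        sum.update(v).unwrap();
    }
    assert_eq!(sum.finish(), Some(42));
}
```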

Aggregate Type Matrix

Different aggregates support different type combinations:

| Aggregate | Int64 | Float64 | Decimal128 | Utf8 | Boolean | Date32 |
|---|---|---|---|---|---|---|
| COUNT(*) | N/A | N/A | N/A | N/A | N/A | N/A |
| COUNT(col) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| SUM | ✓ | ✓ | ✓ | ✓ (coerce) | - | - |
| AVG | ✓ | ✓ | ✓ | ✓ (coerce) | - | - |
| MIN/MAX | ✓ | ✓ | ✓ | ✓ (coerce) | - | - |
| TOTAL | ✓ | ✓ | ✓ | ✓ (coerce) | - | - |
| GROUP_CONCAT | ✓ | ✓ | ✓ | ✓ | - | - |

Notes:

  • ✓ = Native support with type-specific accumulator
  • ✓ (coerce) = Support via SQLite-style numeric coercion
  • - = Not supported

Sources: llkv-aggregate/src/lib.rs:22-68 llkv-aggregate/src/lib.rs:385-447

Distinct Value Tracking

For DISTINCT aggregates, the system tracks seen values using type-specific keys:

Distinct Value Tracking

graph LR
    subgraph "DistinctKey Variants"
        INT["Int(i64)"]
        FLOAT["Float(u64)\nf64::to_bits()"]
        STR["Str(String)"]
        BOOL["Bool(bool)"]
        DATE["Date(i32)"]
        DEC["Decimal(i128)"]
    end

    subgraph "Accumulator"
        SEEN["FxHashSet<DistinctKey>"]
        INSERT["seen.insert(key)"]
        CHECK["Returns true if new"]
    end

    subgraph "Aggregation"
        ADD["Add to sum only if new"]
        COUNT["Count distinct values"]
    end

    INT --> SEEN
    FLOAT --> SEEN
    STR --> SEEN
    BOOL --> SEEN
    DATE --> SEEN
    DEC --> SEEN

    SEEN --> INSERT
    INSERT --> CHECK
    CHECK --> ADD
    CHECK --> COUNT

Sources: llkv-aggregate/src/lib.rs:249-333 llkv-aggregate/src/lib.rs:825-858
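
A sketch of the DistinctKey idea: LLKV uses an FxHashSet as shown in the diagram, while a std HashSet stands in here, and floats are keyed by their bit pattern so they can be hashed and compared exactly:

```rust
use std::collections::HashSet;

/// Type-specific keys so DISTINCT aggregates can dedupe values.
#[derive(Hash, PartialEq, Eq)]
enum DistinctKey {
    Int(i64),
    Float(u64), // f64::to_bits()
    Str(String),
    Bool(bool),
    Date(i32),
    Decimal(i128),
}

fn main() {
    let mut seen: HashSet<DistinctKey> = HashSet::new();

    // insert() returns true only for values not seen before, so the
    // accumulator adds to its sum / count only when it returns true.
    assert!(seen.insert(DistinctKey::Float(1.5f64.to_bits())));
    assert!(!seen.insert(DistinctKey::Float(1.5f64.to_bits())));
    assert!(seen.insert(DistinctKey::Int(1)));
    assert_eq!(seen.len(), 2);
}
```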


Type Coercion and Casting

The system supports both implicit coercion (for numeric operations) and explicit casting (via CAST expressions).

Numeric Coercion in Aggregates

String and boolean values are coerced to numeric types in aggregate functions following SQLite semantics:

Numeric Coercion in Aggregates

graph TB
    subgraph "Input Values"
        STR["String '123.45'"]
        BOOL["Boolean true"]
        NULL["NULL"]
    end

    subgraph "Coercion Function"
        COERCE["array_value_to_numeric"]
        PARSE["Parse as f64"]
        FALLBACK["Use 0.0 if parse fails"]
    end

    subgraph "Coerced Values"
        NUM1["123.45"]
        NUM2["1.0"]
        NUM3["0.0 (NULL skipped)"]
    end

    STR --> COERCE
    BOOL --> COERCE
    NULL --> COERCE

    COERCE --> PARSE
    PARSE --> |Success| NUM1
    PARSE --> |Failure| FALLBACK
    FALLBACK --> NUM1

    COERCE --> |Boolean: 1.0/0.0| NUM2
    COERCE --> |NULL: skip row| NUM3

Sources: llkv-aggregate/src/lib.rs:385-447 llkv-aggregate/src/lib.rs:860-877
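
A sketch of the coercion behavior for string, boolean, and NULL inputs; the function name mirrors array_value_to_numeric, but the real implementation handles many more Arrow array types:

```rust
use arrow::array::{Array, BooleanArray, StringArray};

/// SQLite-style numeric coercion for aggregate inputs: strings parse as f64
/// (0.0 on failure), booleans become 1.0/0.0, and NULLs are skipped.
fn coerce_to_numeric(array: &dyn Array, row: usize) -> Option<f64> {
    if array.is_null(row) {
        return None; // NULL: skip the row entirely
    }
    if let Some(strings) = array.as_any().downcast_ref::<StringArray>() {
        return Some(strings.value(row).trim().parse::<f64>().unwrap_or(0.0));
    }
    if let Some(bools) = array.as_any().downcast_ref::<BooleanArray>() {
        return Some(if bools.value(row) { 1.0 } else { 0.0 });
    }
    None // other array types would be handled here
}

fn main() {
    let strings = StringArray::from(vec![Some("123.45"), Some("abc"), None]);
    assert_eq!(coerce_to_numeric(&strings, 0), Some(123.45));
    assert_eq!(coerce_to_numeric(&strings, 1), Some(0.0)); // parse failure -> 0.0
    assert_eq!(coerce_to_numeric(&strings, 2), None);      // NULL skipped
}
```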

Explicit Type Casting

The CAST expression provides explicit type conversion:

Explicit Type Casting

Sources: llkv-executor/src/translation/schema.rs:95 llkv-expr/src/expr.rs:114-118
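
A minimal illustration of explicit casting using Arrow's cast kernel; LLKV records the target type in ScalarExpr::Cast and its execution path differs, so this only demonstrates the string-to-integer conversion itself:

```rust
use std::sync::Arc;

use arrow::array::{Array, ArrayRef, Int64Array, StringArray};
use arrow::compute::cast;
use arrow::datatypes::DataType;

fn main() {
    // CAST('1' AS BIGINT): cast a Utf8 array to Int64.
    let strings: ArrayRef = Arc::new(StringArray::from(vec!["1", "2", "3"]));
    let casted = cast(&strings, &DataType::Int64).expect("cast failed");

    let ints = casted.as_any().downcast_ref::<Int64Array>().unwrap();
    assert_eq!(ints.value(0), 1);
    assert_eq!(ints.value(2), 3);
}
```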


Type System Integration Points

The type system integrates with multiple layers of the architecture:

| Layer | Integration Point | Purpose |
|---|---|---|
| SQL Parsing | SqlValue::try_from_expr | Parse SQL literals into typed values |
| Planning | PlanValue conversion | Convert literals to plan representation |
| Schema Inference | infer_computed_data_type | Determine result types for expressions |
| Expression Translation | translate_scalar | Resolve column types from schema |
| Program Compilation | OwnedOperator | Store typed operators in bytecode |
| Execution | RecordBatch schema | Validate types match expected schema |
| Aggregation | Accumulator creation | Create type-specific aggregators |
| Storage | Arrow serialization | Persist typed data in columnar format |

Sources: llkv-sql/src/sql_value.rs:30-122 llkv-executor/src/translation/schema.rs:15-51 llkv-table/src/planner/program.rs:69-101


Summary

LLKV's type system is built on Apache Arrow's DataType as the canonical type representation, with custom types for SQL-specific semantics:

  • SQL types are mapped to Arrow types during parsing through SqlValue
  • Custom types (Decimal, Interval, Date32, Struct) provide SQL-compatible semantics
  • Type inference determines result types for computed expressions at planning time
  • Type resolution converts string column references to typed FieldId references
  • Aggregate functions use type-specific accumulators with appropriate overflow handling
  • Type coercion follows SQLite semantics for numeric operations

The type system operates transparently across all layers, ensuring type safety from SQL parsing through storage while maintaining compatibility with Arrow's columnar format.

Sources: llkv-sql/src/lib.rs:1-51 llkv-executor/src/translation/schema.rs:1-271 llkv-aggregate/src/lib.rs:1-83