This documentation is part of the "Projects with Books" initiative at zenOSmosis.
The source code for this project is available on GitHub.
Custom Types and Type Registry
Relevant source files
- llkv-aggregate/src/lib.rs
- llkv-executor/src/translation/expression.rs
- llkv-executor/src/translation/schema.rs
- llkv-expr/src/expr.rs
- llkv-sql/src/lib.rs
- llkv-sql/src/sql_value.rs
- llkv-table/src/planner/program.rs
This page documents LLKV's type system, including SQL type mapping, custom type representations, and type inference mechanisms. The system uses Apache Arrow's DataType as its canonical type representation, with custom types like Decimal128, Date32, and Interval mapped to Arrow-compatible formats.
For information about expression evaluation and scalar operations, see Scalar Evaluation and NumericKernels. For aggregate function type handling, see Aggregation System.
Type System Architecture
LLKV's type system operates in three layers: SQL types (user-facing), intermediate literal types (planning), and Arrow DataTypes (execution). All data flowing through the system ultimately uses Arrow's columnar format.
Type Flow Architecture
```mermaid
graph TB
subgraph "SQL Layer"
SQLTYPE["SQL Types\nINT, TEXT, DECIMAL, DATE"]
end
subgraph "Planning Layer"
SQLVALUE["SqlValue\nInteger, Float, Decimal,\nString, Date32, Interval, Struct"]
LITERAL["Literal\nType-erased values"]
PLANVALUE["PlanValue\nPlan-time literals"]
end
subgraph "Execution Layer"
DATATYPE["Arrow DataType\nInt64, Float64, Utf8,\nDecimal128, Date32, Interval"]
SCHEMA["ExecutorSchema\nColumn metadata + types"]
INFERENCE["Type Inference\ninfer_computed_data_type"]
end
subgraph "Storage Layer"
RECORDBATCH["RecordBatch\nTyped columnar data"]
ARRAYS["Typed Arrays\nInt64Array, StringArray, etc."]
end
SQLTYPE --> SQLVALUE
SQLVALUE --> PLANVALUE
SQLVALUE --> LITERAL
PLANVALUE --> DATATYPE
LITERAL --> DATATYPE
DATATYPE --> SCHEMA
SCHEMA --> INFERENCE
INFERENCE --> DATATYPE
DATATYPE --> RECORDBATCH
RECORDBATCH --> ARRAYS
style DATATYPE fill:#f9f9f9
```
Sources: llkv-sql/src/sql_value.rs:16-27 llkv-sql/src/lib.rs:22-29 llkv-executor/src/translation/schema.rs:53-123
SQL to Arrow Type Mapping
SQL types are mapped to Arrow DataTypes during parsing and planning. The mapping is defined implicitly through the parsing logic in SqlValue and the type inference system.
| SQL Type | Arrow DataType | Notes |
|---|---|---|
| INT, INTEGER, BIGINT | Int64 | All integer types normalized to Int64 |
| FLOAT, DOUBLE, REAL | Float64 | All floating-point types normalized to Float64 |
| DECIMAL(p,s) | Decimal128(p,s) | Fixed-point decimal with precision and scale |
| TEXT, VARCHAR | Utf8 | Variable-length UTF-8 strings |
| DATE | Date32 | Days since Unix epoch |
| INTERVAL | Interval(MonthDayNano) | Calendar-aware interval type |
| BOOLEAN | Boolean | True/false values |
| Dictionary literals | Struct | Key-value maps represented as structs |
SQL to Arrow Type Conversion Flow
Sources: llkv-sql/src/sql_value.rs:178-214 llkv-sql/src/sql_value.rs:216-236 llkv-sql/src/lib.rs:22-29
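The mapping table above can be sketched as a simple lookup. The helper below is hypothetical — LLKV derives the mapping inside SqlValue parsing and type inference rather than a single function — but it illustrates the normalization rules:

```rust
// Hypothetical sketch of the SQL-to-Arrow type mapping table;
// the real mapping is spread across SqlValue parsing in llkv-sql.
#[derive(Debug, PartialEq)]
enum ArrowType {
    Int64,
    Float64,
    Utf8,
    Date32,
    IntervalMonthDayNano,
    Boolean,
}

fn map_sql_type(sql: &str) -> Option<ArrowType> {
    match sql.to_ascii_uppercase().as_str() {
        "INT" | "INTEGER" | "BIGINT" => Some(ArrowType::Int64),
        "FLOAT" | "DOUBLE" | "REAL" => Some(ArrowType::Float64),
        "TEXT" | "VARCHAR" => Some(ArrowType::Utf8),
        "DATE" => Some(ArrowType::Date32),
        "INTERVAL" => Some(ArrowType::IntervalMonthDayNano),
        "BOOLEAN" => Some(ArrowType::Boolean),
        // DECIMAL(p,s) needs parsed precision/scale; omitted here
        _ => None,
    }
}

fn main() {
    assert_eq!(map_sql_type("bigint"), Some(ArrowType::Int64));
    assert_eq!(map_sql_type("VARCHAR"), Some(ArrowType::Utf8));
}
```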
Custom Type Representations
LLKV defines custom types for values that require special handling beyond basic Arrow types. These types bridge SQL semantics and Arrow's columnar format.
DecimalValue
Fixed-point decimal numbers with exact precision. Stored as i128 with a scale factor.
DecimalValue Representation
```mermaid
graph TB
subgraph "DecimalValue Structure"
DEC["DecimalValue\nraw_value: i128\nscale: i8"]
end
subgraph "SQL Input"
SQLDEC["SQL: 123.45"]
end
subgraph "Internal Representation"
RAW["raw_value = 12345\nscale = 2"]
CALC["Actual value = 12345 / 10^2 = 123.45"]
end
subgraph "Arrow Storage"
ARR["Decimal128Array\nprecision=5, scale=2"]
end
SQLDEC --> DEC
DEC --> RAW
RAW --> CALC
DEC --> ARR
```
Sources: llkv-sql/src/sql_value.rs:187-207 llkv-aggregate/src/lib.rs:314-324
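The raw_value/scale layout can be sketched as follows. This is a minimal hypothetical type mirroring the structure in the diagram, not LLKV's exact implementation:

```rust
// Minimal sketch of a fixed-point decimal: digits stored as i128,
// with a scale giving the number of fractional digits.
#[derive(Debug, Clone, Copy, PartialEq)]
struct DecimalValue {
    raw_value: i128, // all digits, without the decimal point
    scale: i8,       // number of fractional digits
}

impl DecimalValue {
    /// Parse a literal like "123.45" into raw_value=12345, scale=2.
    fn parse(text: &str) -> Option<Self> {
        let (int_part, frac_part) = match text.split_once('.') {
            Some((i, f)) => (i, f),
            None => (text, ""),
        };
        let digits = format!("{int_part}{frac_part}");
        let raw_value: i128 = digits.parse().ok()?;
        Some(Self { raw_value, scale: frac_part.len() as i8 })
    }

    /// Approximate numeric value: raw_value / 10^scale.
    fn to_f64(self) -> f64 {
        self.raw_value as f64 / 10f64.powi(self.scale as i32)
    }
}

fn main() {
    let d = DecimalValue::parse("123.45").unwrap();
    assert_eq!(d.raw_value, 12345);
    assert_eq!(d.scale, 2);
    assert!((d.to_f64() - 123.45).abs() < 1e-9);
}
```

Keeping the digits in an i128 preserves exact precision; the scale is only applied when converting out, which is why Decimal128 arithmetic can be checked for overflow like integer arithmetic.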
IntervalValue
Calendar-aware time intervals with month, day, and nanosecond components.
IntervalValue Operations
Sources: llkv-sql/src/sql_value.rs:238-283 llkv-expr/src/literal.rs
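A MonthDayNano interval keeps its three components separate, which is what makes it calendar-aware: "1 month" stays one month regardless of how many days that month has. A hypothetical sketch of the layout and component-wise addition:

```rust
// Hypothetical sketch mirroring Arrow's Interval(MonthDayNano)
// layout; months and days are kept separate from nanoseconds so
// calendar arithmetic stays exact.
#[derive(Debug, Clone, Copy, PartialEq)]
struct IntervalValue {
    months: i32,
    days: i32,
    nanos: i64,
}

impl IntervalValue {
    // Component-wise addition; no component is collapsed into another.
    fn add(self, other: Self) -> Self {
        Self {
            months: self.months + other.months,
            days: self.days + other.days,
            nanos: self.nanos + other.nanos,
        }
    }
}

fn main() {
    // '1 month 2 days' + '5 days 1 microsecond'
    let a = IntervalValue { months: 1, days: 2, nanos: 0 };
    let b = IntervalValue { months: 0, days: 5, nanos: 1_000 };
    assert_eq!(a.add(b), IntervalValue { months: 1, days: 7, nanos: 1_000 });
}
```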
Date32
Days since Unix epoch (1970-01-01), stored as i32.
Date32 Type Handling
Sources: llkv-sql/src/sql_value.rs:76-87 llkv-sql/src/sql_value.rs:169-174
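Converting a calendar date to its Date32 value is a pure function of year, month, and day. The helper below uses the standard civil-date formula; it is a hypothetical illustration, not LLKV's actual date parser:

```rust
// Sketch: Date32 stores days since 1970-01-01 as an i32.
// Uses the well-known civil-date-to-epoch-days formula.
fn days_from_civil(y: i64, m: u32, d: u32) -> i32 {
    let y = if m <= 2 { y - 1 } else { y };
    let era = (if y >= 0 { y } else { y - 399 }) / 400;
    let yoe = y - era * 400;                      // year of era [0, 399]
    let mp = ((m + 9) % 12) as i64;               // month, March = 0
    let doy = (153 * mp + 2) / 5 + d as i64 - 1;  // day of year [0, 365]
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    (era * 146097 + doe - 719468) as i32
}

fn main() {
    assert_eq!(days_from_civil(1970, 1, 1), 0);     // Unix epoch
    assert_eq!(days_from_civil(2000, 1, 1), 10957);
    assert_eq!(days_from_civil(1969, 12, 31), -1);  // dates before epoch are negative
}
```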
Struct Types
Dictionary literals in SQL are represented as struct types with named fields.
Struct Type Representation
Sources: llkv-sql/src/sql_value.rs:124-135 llkv-sql/src/sql_value.rs:227-234
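Conceptually, a dictionary literal becomes an ordered list of named, typed fields. A hypothetical sketch of that shape (LLKV builds Arrow Struct fields from it):

```rust
// Sketch: a dictionary literal like {'x': 1, 'y': 2.5} maps to a
// struct value with named fields. Types here are hypothetical.
#[derive(Debug, PartialEq)]
enum SqlValue {
    Integer(i64),
    Float(f64),
}

#[derive(Debug, PartialEq)]
struct StructValue {
    fields: Vec<(String, SqlValue)>,
}

fn struct_field_names(s: &StructValue) -> Vec<&str> {
    s.fields.iter().map(|(name, _)| name.as_str()).collect()
}

fn main() {
    // SQL: {'x': 1, 'y': 2.5}
    let s = StructValue {
        fields: vec![
            ("x".into(), SqlValue::Integer(1)),
            ("y".into(), SqlValue::Float(2.5)),
        ],
    };
    assert_eq!(struct_field_names(&s), vec!["x", "y"]);
}
```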
Type Inference for Computed Expressions
The type inference system determines Arrow DataTypes for computed expressions at planning time. This enables schema generation before execution.
Type Inference Flow
```mermaid
graph TB
subgraph "Expression Input"
EXPR["ScalarExpr<FieldId>\ncol1 + col2 * 3"]
end
subgraph "Type Inference"
INFER["infer_computed_data_type"]
CHECK["expression_uses_float"]
NORM["normalized_numeric_type"]
end
subgraph "Type Resolution"
COL1["Column col1: Int64"]
COL2["Column col2: Float64"]
RESULT["Result: Float64\n(one operand is float)"]
end
subgraph "Schema Output"
FIELD["Field(alias, Float64, nullable=true)"]
end
EXPR --> INFER
INFER --> CHECK
CHECK --> COL1
CHECK --> COL2
CHECK --> RESULT
INFER --> NORM
NORM --> RESULT
RESULT --> FIELD
```
Sources: llkv-executor/src/translation/schema.rs:53-123 llkv-executor/src/translation/schema.rs:149-243
Type Inference Rules
The inference system applies the following rules:
| Expression Type | Inferred Type | Logic |
|---|---|---|
| ScalarExpr::Literal(Integer) | Int64 | Direct mapping |
| ScalarExpr::Literal(Float) | Float64 | Direct mapping |
| ScalarExpr::Literal(Decimal(p,s)) | Decimal128(p,s) | Preserves precision/scale |
| ScalarExpr::Column(field_id) | Column's type | Lookup in schema |
| ScalarExpr::Binary{left, op, right} | Float64 if any operand is float, else Int64 | Type promotion |
| ScalarExpr::Compare{...} | Int64 | Boolean as integer (0/1) |
| ScalarExpr::Cast{data_type, ...} | data_type | Explicit cast target |
| ScalarExpr::Random | Float64 | Floating-point random values |
Sources: llkv-executor/src/translation/schema.rs:56-122
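The rules in the table can be captured in a miniature recursive match. The types below are hypothetical stand-ins; the real infer_computed_data_type also handles compares, aggregates, and more literal kinds:

```rust
// Miniature of the inference rules above: literals map directly,
// columns look up the schema, and binary ops promote to Float64
// if either operand is float.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Float64,
}

enum ScalarExpr {
    LitInt(i64),
    LitFloat(f64),
    Column(usize),
    Binary(Box<ScalarExpr>, Box<ScalarExpr>),
    Cast(Box<ScalarExpr>, DataType),
}

fn infer(expr: &ScalarExpr, schema: &[DataType]) -> DataType {
    match expr {
        ScalarExpr::LitInt(_) => DataType::Int64,
        ScalarExpr::LitFloat(_) => DataType::Float64,
        ScalarExpr::Column(i) => schema[*i].clone(),
        // Promotion rule: Float64 if either operand is float.
        ScalarExpr::Binary(l, r) => {
            if infer(l, schema) == DataType::Float64
                || infer(r, schema) == DataType::Float64
            {
                DataType::Float64
            } else {
                DataType::Int64
            }
        }
        // Explicit cast: the target type wins.
        ScalarExpr::Cast(_, ty) => ty.clone(),
    }
}

fn main() {
    let schema = [DataType::Int64, DataType::Float64];
    let sum = ScalarExpr::Binary(
        Box::new(ScalarExpr::Column(0)),
        Box::new(ScalarExpr::Column(1)),
    );
    assert_eq!(infer(&sum, &schema), DataType::Float64); // int + float -> float
}
```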
Numeric Type Normalization
All numeric types are normalized to either Int64 or Float64 for arithmetic operations:
```mermaid
graph TB
subgraph "Input Types"
INT8["Int8/Int16/Int32/Int64"]
UINT["UInt8/UInt16/UInt32/UInt64"]
FLOAT["Float32/Float64"]
DEC["Decimal128(p,s)"]
BOOL["Boolean"]
end
subgraph "Normalization"
NORM["normalized_numeric_type"]
end
subgraph "Output Types"
OUT_INT["Int64"]
OUT_FLOAT["Float64"]
end
INT8 --> NORM
BOOL --> NORM
NORM --> OUT_INT
UINT --> NORM
FLOAT --> NORM
NORM --> OUT_FLOAT
DEC --> NORM
NORM --> |"scale=0 && fits in i64"| OUT_INT
NORM --> |otherwise| OUT_FLOAT
```
Sources: llkv-executor/src/translation/schema.rs:125-147
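A sketch of the normalization rule (hypothetical names; the real normalized_numeric_type in llkv-executor covers more cases): integers and booleans collapse to Int64, floats to Float64, and Decimal128 stays integral only when its scale is 0 and its range fits in an i64:

```rust
// Sketch of numeric normalization to Int64 / Float64.
#[derive(Debug, PartialEq)]
#[allow(dead_code)]
enum DataType {
    Int8, Int16, Int32, Int64,
    UInt8, UInt16, UInt32, UInt64,
    Float32, Float64,
    Boolean,
    Decimal128(u8, i8),
}

fn normalized_numeric_type(ty: &DataType) -> DataType {
    match ty {
        DataType::Float32 | DataType::Float64 => DataType::Float64,
        // scale 0 and precision <= 18 guarantees the value fits in i64
        DataType::Decimal128(p, 0) if *p <= 18 => DataType::Int64,
        DataType::Decimal128(_, _) => DataType::Float64,
        // all integer widths and Boolean normalize to Int64
        _ => DataType::Int64,
    }
}

fn main() {
    assert_eq!(normalized_numeric_type(&DataType::UInt16), DataType::Int64);
    assert_eq!(normalized_numeric_type(&DataType::Float32), DataType::Float64);
    assert_eq!(normalized_numeric_type(&DataType::Decimal128(10, 0)), DataType::Int64);
    assert_eq!(normalized_numeric_type(&DataType::Decimal128(10, 2)), DataType::Float64);
}
```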
Type Resolution During Expression Translation
Expression translation converts string-based column references to typed FieldId references, resolving types through the schema.
Expression Translation and Type Resolution
```mermaid
graph TB
subgraph "String-based Expression"
EXPRSTR["Expr<String>\nColumn('age') > Literal(18)"]
end
subgraph "Translation"
TRANS["translate_predicate"]
SCALAR["translate_scalar"]
RESOLVE["resolve_field_id"]
end
subgraph "Schema Lookup"
SCHEMA["ExecutorSchema"]
LOOKUP["schema.resolve('age')"]
COLUMN["ExecutorColumn\nname='age'\nfield_id=5\ndata_type=Int64"]
end
subgraph "FieldId-based Expression"
EXPRFID["Expr<FieldId>\nColumn(5) > Literal(18)"]
end
EXPRSTR --> TRANS
TRANS --> SCALAR
SCALAR --> RESOLVE
RESOLVE --> LOOKUP
LOOKUP --> COLUMN
COLUMN --> EXPRFID
```
Sources: llkv-executor/src/translation/expression.rs:18-174 llkv-executor/src/translation/expression.rs:390-407
Type Preservation During Translation
The translation process preserves type information from the original expression:
| Expression Component | Type Preservation |
|---|---|
| Column(name) | Replaced with Column(field_id), type from schema |
| Literal(value) | Clone literal, type embedded in Literal enum |
| Binary{left, op, right} | Recursively translate operands, type inferred later |
| Cast{expr, data_type} | Preserve data_type during translation |
| Aggregate(call) | Translate inner expression, aggregate type determined by function |
Sources: llkv-executor/src/translation/expression.rs:176-387
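The translation flow can be sketched as a recursive walk that swaps the column-name type parameter for FieldId. Names below mirror the diagram but are hypothetical, not the exact llkv-executor API:

```rust
// Sketch of resolving string column references to FieldId references
// via a schema lookup, preserving literals and structure.
type FieldId = u32;

struct ExecutorColumn {
    name: String,
    field_id: FieldId,
}

struct ExecutorSchema {
    columns: Vec<ExecutorColumn>,
}

impl ExecutorSchema {
    fn resolve(&self, name: &str) -> Option<FieldId> {
        self.columns.iter().find(|c| c.name == name).map(|c| c.field_id)
    }
}

enum Expr<C> {
    Column(C),
    LitInt(i64),
    Gt(Box<Expr<C>>, Box<Expr<C>>),
}

fn translate(expr: Expr<String>, schema: &ExecutorSchema) -> Option<Expr<FieldId>> {
    Some(match expr {
        Expr::Column(name) => Expr::Column(schema.resolve(&name)?),
        Expr::LitInt(v) => Expr::LitInt(v), // literal cloned as-is
        Expr::Gt(l, r) => Expr::Gt(
            Box::new(translate(*l, schema)?),
            Box::new(translate(*r, schema)?),
        ),
    })
}

fn main() {
    let schema = ExecutorSchema {
        columns: vec![ExecutorColumn { name: "age".into(), field_id: 5 }],
    };
    // Column("age") > 18  ->  Column(5) > 18
    let expr = Expr::Gt(
        Box::new(Expr::Column("age".to_string())),
        Box::new(Expr::LitInt(18)),
    );
    match translate(expr, &schema) {
        Some(Expr::Gt(l, _)) => match *l {
            Expr::Column(id) => assert_eq!(id, 5),
            _ => panic!("expected resolved column"),
        },
        _ => panic!("translation failed"),
    }
}
```

Translation returning None for an unknown column is what surfaces "column not found" errors at plan time rather than during execution.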
Type Handling in Aggregates
Aggregate functions have type-specific accumulator implementations. The type determines overflow behavior, precision, and result format.
Aggregate Type-Specific Accumulators
```mermaid
graph TB
subgraph "Aggregate Specification"
SPEC["AggregateKind::Sum\nfield_id=5\ndata_type=Int64\ndistinct=false"]
end
subgraph "Accumulator Creation"
CREATE["new_with_projection_index"]
MATCH["Match on (data_type, distinct)"]
end
subgraph "Type-Specific Accumulators"
INT64["SumInt64\nvalue: Option<i64>\nhas_values: bool"]
FLOAT64["SumFloat64\nvalue: f64\nsaw_value: bool"]
DEC128["SumDecimal128\nsum: i128\nprecision: u8\nscale: i8"]
end
subgraph "Update Logic"
UPDATE_INT["Checked addition\nError on overflow"]
UPDATE_FLOAT["Floating addition\nNo overflow check"]
UPDATE_DEC["Checked i128 addition\nError on overflow"]
end
SPEC --> CREATE
CREATE --> MATCH
MATCH --> |"Int64, false"| INT64
MATCH --> |"Float64, false"| FLOAT64
MATCH --> |"Decimal128(p,s), false"| DEC128
INT64 --> UPDATE_INT
FLOAT64 --> UPDATE_FLOAT
DEC128 --> UPDATE_DEC
```
Sources: llkv-aggregate/src/lib.rs:461-542 llkv-aggregate/src/lib.rs:799-859
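The checked-addition behavior described for the Int64 accumulator can be sketched as follows. This is a simplified hypothetical struct (the diagram's has_values flag is folded into the Option here):

```rust
// Sketch of a type-specific SUM accumulator with checked overflow.
#[derive(Default)]
struct SumInt64 {
    value: Option<i64>, // None until the first non-null row is seen
}

impl SumInt64 {
    fn update(&mut self, v: i64) -> Result<(), &'static str> {
        let cur = self.value.unwrap_or(0);
        // Checked addition: error on overflow instead of wrapping.
        match cur.checked_add(v) {
            Some(next) => {
                self.value = Some(next);
                Ok(())
            }
            None => Err("integer overflow in SUM"),
        }
    }

    fn finish(&self) -> Option<i64> {
        self.value // SQL SUM over zero rows is NULL, not 0
    }
}

fn main() {
    let mut acc = SumInt64::default();
    acc.update(2).unwrap();
    acc.update(3).unwrap();
    assert_eq!(acc.finish(), Some(5));

    let mut overflow = SumInt64::default();
    overflow.update(i64::MAX).unwrap();
    assert!(overflow.update(1).is_err());
}
```

A Float64 accumulator would skip the overflow check entirely, which is exactly the behavioral difference the "Update Logic" column of the diagram captures.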
Aggregate Type Matrix
Different aggregates support different type combinations:
| Aggregate | Int64 | Float64 | Decimal128 | Utf8 | Boolean | Date32 |
|---|---|---|---|---|---|---|
| COUNT(*) | N/A | N/A | N/A | N/A | N/A | N/A |
| COUNT(col) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| SUM | ✓ | ✓ | ✓ | ✓ (coerce) | - | - |
| AVG | ✓ | ✓ | ✓ | ✓ (coerce) | - | - |
| MIN/MAX | ✓ | ✓ | ✓ | ✓ (coerce) | - | - |
| TOTAL | ✓ | ✓ | ✓ | ✓ (coerce) | - | - |
| GROUP_CONCAT | ✓ | ✓ | - | ✓ | ✓ | - |
Notes:
- ✓ = Native support with type-specific accumulator
- ✓ (coerce) = Support via SQLite-style numeric coercion
- "-" = Not supported
Sources: llkv-aggregate/src/lib.rs:22-68 llkv-aggregate/src/lib.rs:385-447
Distinct Value Tracking
For DISTINCT aggregates, the system tracks seen values using type-specific keys:
```mermaid
graph LR
subgraph "DistinctKey Variants"
INT["Int(i64)"]
FLOAT["Float(u64)\nf64::to_bits()"]
STR["Str(String)"]
BOOL["Bool(bool)"]
DATE["Date(i32)"]
DEC["Decimal(i128)"]
end
subgraph "Accumulator"
SEEN["FxHashSet<DistinctKey>"]
INSERT["seen.insert(key)"]
CHECK["Returns true if new"]
end
subgraph "Aggregation"
ADD["Add to sum only if new"]
COUNT["Count distinct values"]
end
INT --> SEEN
FLOAT --> SEEN
STR --> SEEN
BOOL --> SEEN
DATE --> SEEN
DEC --> SEEN
SEEN --> INSERT
INSERT --> CHECK
CHECK --> ADD
CHECK --> COUNT
```
Sources: llkv-aggregate/src/lib.rs:249-333 llkv-aggregate/src/lib.rs:825-858
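The key trick is converting floats to their bit pattern so the key type can implement Eq and Hash (f64 itself cannot, because of NaN). A hypothetical sketch using std's HashSet in place of FxHashSet:

```rust
// Sketch of distinct tracking: each value becomes a hashable key
// (floats via f64::to_bits) and contributes to the sum only the
// first time it appears.
use std::collections::HashSet;

#[derive(Hash, PartialEq, Eq)]
#[allow(dead_code)]
enum DistinctKey {
    Int(i64),
    Float(u64), // f64::to_bits() makes the key Eq + Hash
    Str(String),
}

struct DistinctSum {
    seen: HashSet<DistinctKey>,
    sum: f64,
}

impl DistinctSum {
    fn new() -> Self {
        Self { seen: HashSet::new(), sum: 0.0 }
    }

    fn update(&mut self, v: f64) {
        // insert() returns true only for values not seen before
        if self.seen.insert(DistinctKey::Float(v.to_bits())) {
            self.sum += v;
        }
    }
}

fn main() {
    let mut acc = DistinctSum::new();
    acc.update(1.5);
    acc.update(1.5); // duplicate, ignored
    acc.update(2.5);
    assert!((acc.sum - 4.0).abs() < 1e-9);
}
```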
Type Coercion and Casting
The system supports both implicit coercion (for numeric operations) and explicit casting (via CAST expressions).
Numeric Coercion in Aggregates
String and boolean values are coerced to numeric types in aggregate functions following SQLite semantics:
```mermaid
graph TB
subgraph "Input Values"
STR["String '123.45'"]
BOOL["Boolean true"]
NULL["NULL"]
end
subgraph "Coercion Function"
COERCE["array_value_to_numeric"]
PARSE["Parse as f64"]
FALLBACK["Use 0.0 if parse fails"]
end
subgraph "Coerced Values"
NUM1["123.45"]
NUM2["1.0"]
NUM3["0.0 (NULL skipped)"]
end
STR --> COERCE
BOOL --> COERCE
NULL --> COERCE
COERCE --> PARSE
PARSE --> |Success| NUM1
PARSE --> |Failure| FALLBACK
COERCE --> |"Boolean: 1.0/0.0"| NUM2
COERCE --> |"NULL: skip row"| NUM3
```
Sources: llkv-aggregate/src/lib.rs:385-447 llkv-aggregate/src/lib.rs:860-877
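The coercion rule fits in a single match. This is a hypothetical scalar-level sketch; the real array_value_to_numeric operates over Arrow arrays:

```rust
// Sketch of SQLite-style numeric coercion for aggregates: strings
// parse as f64 (0.0 if parsing fails), booleans become 1.0/0.0,
// and NULLs are skipped rather than coerced.
enum Value {
    Float(f64),
    Str(String),
    Bool(bool),
    Null,
}

fn coerce_numeric(v: &Value) -> Option<f64> {
    match v {
        Value::Float(f) => Some(*f),
        Value::Str(s) => Some(s.parse::<f64>().unwrap_or(0.0)),
        Value::Bool(b) => Some(if *b { 1.0 } else { 0.0 }),
        Value::Null => None, // caller skips the row entirely
    }
}

fn main() {
    assert_eq!(coerce_numeric(&Value::Str("123.45".into())), Some(123.45));
    assert_eq!(coerce_numeric(&Value::Str("abc".into())), Some(0.0)); // parse failure -> 0.0
    assert_eq!(coerce_numeric(&Value::Bool(true)), Some(1.0));
    assert_eq!(coerce_numeric(&Value::Null), None); // NULL skipped, not zeroed
}
```

The distinction between "unparseable string becomes 0.0" and "NULL is skipped" matters for AVG: the former contributes to the row count, the latter does not.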
Explicit Type Casting
The CAST expression provides explicit type conversion:
Explicit Type Casting
Sources: llkv-executor/src/translation/schema.rs:95 llkv-expr/src/expr.rs:114-118
Type System Integration Points
The type system integrates with multiple layers of the architecture:
| Layer | Integration Point | Purpose |
|---|---|---|
| SQL Parsing | SqlValue::try_from_expr | Parse SQL literals into typed values |
| Planning | PlanValue conversion | Convert literals to plan representation |
| Schema Inference | infer_computed_data_type | Determine result types for expressions |
| Expression Translation | translate_scalar | Resolve column types from schema |
| Program Compilation | OwnedOperator | Store typed operators in bytecode |
| Execution | RecordBatch schema | Validate types match expected schema |
| Aggregation | Accumulator creation | Create type-specific aggregators |
| Storage | Arrow serialization | Persist typed data in columnar format |
Sources: llkv-sql/src/sql_value.rs:30-122 llkv-executor/src/translation/schema.rs:15-51 llkv-table/src/planner/program.rs:69-101
Summary
LLKV's type system is built on Apache Arrow's DataType as the canonical type representation, with custom types for SQL-specific semantics:
- SQL types are mapped to Arrow types during parsing through SqlValue
- Custom types (Decimal, Interval, Date32, Struct) provide SQL-compatible semantics
- Type inference determines result types for computed expressions at planning time
- Type resolution converts string column references to typed FieldId references
- Aggregate functions use type-specific accumulators with appropriate overflow handling
- Type coercion follows SQLite semantics for numeric operations
The type system operates transparently across all layers, ensuring type safety from SQL parsing through storage while maintaining compatibility with Arrow's columnar format.
Sources: llkv-sql/src/lib.rs:1-51 llkv-executor/src/translation/schema.rs:1-271 llkv-aggregate/src/lib.rs:1-83