This documentation is part of the "Projects with Books" initiative at zenOSmosis.
The source code for this project is available on GitHub.
Expression Translation
Relevant source files
- llkv-executor/src/translation/expression.rs
- llkv-executor/src/translation/schema.rs
- llkv-expr/src/expr.rs
- llkv-table/src/planner/program.rs
Purpose and Scope
Expression Translation is the process of converting expressions from using string-based column names to using internal field identifiers. This transformation bridges the gap between the SQL interface layer (which references columns by name) and the execution layer (which references columns by numeric FieldId). This page covers the translation phase that occurs after query planning but before expression compilation.
For information about the expression AST structure, see Expression AST. For information about how translated expressions are compiled into bytecode, see Program Compilation.
Translation Phase in Query Pipeline
The expression translation phase sits between SQL planning and execution, converting symbolic column references to physical field identifiers.
Translation Phase Detail
graph LR
SQL["SQL WHERE\nname = 'Alice'"]
AST["sqlparser AST"]
EXPRSTR["Expr<String>\nColumn('name')"]
CATALOG["Schema Lookup"]
EXPRFID["Expr<FieldId>\nColumn(42)"]
COMPILE["Bytecode\nCompilation"]
SQL --> AST
AST --> EXPRSTR
EXPRSTR --> CATALOG
CATALOG --> EXPRFID
EXPRFID --> COMPILE
style EXPRSTR fill:#fff5e1
style EXPRFID fill:#ffe1e1
Sources: Based on Diagram 5 from system architecture overview
The translation process resolves all column name strings to their corresponding FieldId values by consulting the table schema. This enables the downstream execution engine to work with stable numeric identifiers rather than string lookups.
Generic Expression Types
Both Expr and ScalarExpr are generic over the field identifier type, parameterized as F.
Type Parameter Instantiations
graph TB
subgraph "Generic Types"
EXPR["Expr<'a, F>"]
SCALAR["ScalarExpr<F>"]
FILTER["Filter<'a, F>\n{field_id: F, op: Operator}"]
end
subgraph "String-Based (Planning)"
EXPRSTR["Expr<'static, String>"]
SCALARSTR["ScalarExpr<String>"]
FILTERSTR["Filter<'static, String>"]
end
subgraph "FieldId-Based (Execution)"
EXPRFID["Expr<'static, FieldId>"]
SCALARFID["ScalarExpr<FieldId>"]
FILTERFID["Filter<'static, FieldId>"]
end
EXPR -.instantiated as.-> EXPRSTR
EXPR -.instantiated as.-> EXPRFID
SCALAR -.instantiated as.-> SCALARSTR
SCALAR -.instantiated as.-> SCALARFID
FILTER -.instantiated as.-> FILTERSTR
FILTER -.instantiated as.-> FILTERFID
EXPRSTR ==translate_predicate==> EXPRFID
SCALARSTR ==translate_scalar==> SCALARFID
Sources: llkv-expr/src/expr.rs:14-43 llkv-executor/src/translation/expression.rs:18-27
| Type | Planning Phase | Execution Phase |
|---|---|---|
| Predicate Expression | Expr<'static, String> | Expr<'static, FieldId> |
| Scalar Expression | ScalarExpr<String> | ScalarExpr<FieldId> |
| Filter | Filter<'static, String> | Filter<'static, FieldId> |
| Field Reference | String column name | FieldId numeric identifier |
The generic parameter F allows the same AST structure to be used at different stages of query processing, with type safety enforcing that planning-phase expressions cannot be mixed with execution-phase expressions.
Sources: llkv-expr/src/expr.rs:14-333
Translation Functions
The llkv-executor crate provides two primary translation functions that recursively transform expression trees.
translate_predicate
Translates boolean predicate expressions (Expr<String> → Expr<FieldId>).
Predicate Translation Flow
Sources: llkv-executor/src/translation/expression.rs:18-174
Function Signature:
The function accepts:
expr: The predicate expression with string column referencesschema: The table schema for column name resolutionunknown_column: Error constructor for unresolved column names
Sources: llkv-executor/src/translation/expression.rs:18-27
translate_scalar
Translates scalar value expressions (ScalarExpr<String> → ScalarExpr<FieldId>).
Function Signature:
Sources: llkv-executor/src/translation/expression.rs:176-185
Variant-Specific Translation
The translation process handles each expression variant differently:
| Variant | Translation Approach |
|---|---|
Expr::Pred(Filter) | Resolve filter's field_id from String to FieldId |
Expr::And(Vec) / Expr::Or(Vec) | Recursively translate all child expressions |
Expr::Not(Box) | Recursively translate inner expression |
Expr::Compare | Translate both left and right scalar expressions |
Expr::InList | Translate target expression and all list items |
Expr::IsNull | Translate inner expression |
Expr::Literal(bool) | No translation needed, pass through |
Expr::Exists(SubqueryExpr) | Pass through subquery ID unchanged |
ScalarExpr::Column(String) | Resolve string to FieldId |
ScalarExpr::Literal | No translation needed, pass through |
ScalarExpr::Binary | Recursively translate left and right operands |
ScalarExpr::Aggregate | Recursively translate aggregate expression argument |
ScalarExpr::GetField | Recursively translate base expression |
ScalarExpr::Cast | Recursively translate inner expression |
ScalarExpr::Case | Translate operand, all branch conditions/results, and else branch |
ScalarExpr::Coalesce | Recursively translate all items |
Sources: llkv-executor/src/translation/expression.rs:86-143 llkv-executor/src/translation/expression.rs:197-386
graph TD
START["Column Name String"]
ROWID_CHECK{"Is 'rowid'\n(case-insensitive)?"}
ROWID["Return\nROW_ID_FIELD_ID"]
LOOKUP["schema.resolve(name)"]
FOUND{"Found in\nschema?"}
RETURN_FID["Return\ncolumn.field_id"]
ERROR["Invoke\nunknown_column\ncallback"]
START --> ROWID_CHECK
ROWID_CHECK -->|Yes| ROWID
ROWID_CHECK -->|No| LOOKUP
LOOKUP --> FOUND
FOUND -->|Yes| RETURN_FID
FOUND -->|No| ERROR
Field Resolution
The core of translation is resolving string column names to numeric field identifiers.
Field Resolution Logic
Sources: llkv-executor/src/translation/expression.rs:390-407
Special Column: rowid
The system provides a special pseudo-column named rowid that references the internal row identifier:
The rowid column is:
- Case-insensitive (accepts "ROWID", "rowid", "RowId", etc.)
- Available in all tables automatically
- Mapped to constant
ROW_ID_FIELD_IDfromllkv_table::ROW_ID_FIELD_ID - Corresponds to
ROW_ID_COLUMN_NAMEconstant fromllkv_column_map::ROW_ID_COLUMN_NAME
Sources: llkv-executor/src/translation/expression.rs:399-401 llkv-executor/src/translation/expression.rs2 llkv-executor/src/translation/expression.rs5
Schema Lookup
For non-special columns, resolution uses the ExecutorSchema::resolve method:
The schema lookup:
- Searches for a column with the given name
- Returns the
ExecutorColumnif found - Extracts the
field_idfrom the column metadata - Invokes the error callback if not found
Sources: llkv-executor/src/translation/expression.rs:403-406
graph TB
START["Initial Expression"]
PUSH_ENTER["Push Enter Frame"]
POP["Pop Frame"]
FRAME_TYPE{"Frame Type?"}
ENTER_NODE["Enter Node"]
NODE_TYPE{"Node Type?"}
AND_OR["And/Or"]
NOT["Not"]
LEAF["Leaf (Pred,\nCompare, etc.)"]
PUSH_EXIT["Push Exit Frame"]
PUSH_CHILDREN["Push Child\nEnter Frames\n(reversed)"]
PUSH_EXIT_NOT["Push Exit Frame"]
PUSH_INNER["Push Inner\nEnter Frame"]
TRANSLATE_LEAF["Translate Leaf\nPush to result_stack"]
EXIT_NODE["Exit Node"]
POP_RESULTS["Pop child results\nfrom result_stack"]
BUILD_NODE["Build translated\nparent node"]
PUSH_RESULT["Push to result_stack"]
DONE{"Stack\nempty?"}
RETURN["Return final result"]
START --> PUSH_ENTER
PUSH_ENTER --> POP
POP --> FRAME_TYPE
FRAME_TYPE -->|Enter| ENTER_NODE
FRAME_TYPE -->|Exit| EXIT_NODE
FRAME_TYPE -->|Leaf| TRANSLATE_LEAF
ENTER_NODE --> NODE_TYPE
NODE_TYPE --> AND_OR
NODE_TYPE --> NOT
NODE_TYPE --> LEAF
AND_OR --> PUSH_EXIT
PUSH_EXIT --> PUSH_CHILDREN
PUSH_CHILDREN --> POP
NOT --> PUSH_EXIT_NOT
PUSH_EXIT_NOT --> PUSH_INNER
PUSH_INNER --> POP
LEAF --> TRANSLATE_LEAF
TRANSLATE_LEAF --> POP
EXIT_NODE --> POP_RESULTS
POP_RESULTS --> BUILD_NODE
BUILD_NODE --> PUSH_RESULT
PUSH_RESULT --> POP
POP --> DONE
DONE -->|No| FRAME_TYPE
DONE -->|Yes| RETURN
Traversal Strategy
Translation uses an iterative traversal approach to avoid stack overflow on deeply nested expressions.
Iterative Traversal Algorithm
Sources: llkv-executor/src/translation/expression.rs:39-174
Frame-Based Pattern
The translation uses a frame-based traversal pattern with two stacks:
Work Stack (owned_stack): Contains frames representing work to be done
OwnedFrame::Enter(expr): Visit a node and potentially expand itOwnedFrame::Exit(context): Collect child results and build parent nodeOwnedFrame::Leaf(translated): Push a fully translated leaf node
Result Stack (result_stack): Contains translated expressions ready to be consumed by parent nodes
Sources: llkv-executor/src/translation/expression.rs:48-63
Traversal Example
For the expression And([Pred(name_col), Pred(age_col)]):
| Step | Work Stack | Result Stack | Action |
|---|---|---|---|
| 1 | [Enter(And)] | [] | Start |
| 2 | [Exit(And(2)), Enter(age), Enter(name)] | [] | Expand And |
| 3 | [Exit(And(2)), Enter(age), Leaf(name→42)] | [] | Translate name |
| 4 | [Exit(And(2)), Enter(age)] | [Pred(42)] | Push name result |
| 5 | [Exit(And(2)), Leaf(age→43)] | [Pred(42)] | Translate age |
| 6 | [Exit(And(2))] | [Pred(42), Pred(43)] | Push age result |
| 7 | [] | [And([Pred(42), Pred(43)])] | Build And, push result |
| 8 | Done | [And([...])] | Return final expression |
This approach handles deeply nested expressions (50k+ nodes) without recursion-induced stack overflow.
Sources: llkv-executor/src/translation/expression.rs:62-174
Why Iterative Traversal?
The codebase comments explain:
This avoids stack overflow on deeply nested expressions (50k+ nodes) by using explicit work_stack and result_stack instead of recursion.
The frame-based pattern is documented in the llkv-plan::traversal module and reused here for expression translation.
Sources: llkv-executor/src/translation/expression.rs:39-46
Error Handling
Translation failures produce descriptive errors through callback functions.
Error Callbacks
Both translation functions accept error constructor callbacks:
| Parameter | Purpose | Example Usage |
|---|---|---|
unknown_column: F | Construct error for unknown column names | ` |
unknown_aggregate: G | Construct error for unknown aggregate functions | Currently unused but reserved for future validation |
The callback pattern allows callers to customize error messages and error types based on their context.
Sources: llkv-executor/src/translation/expression.rs:21-27 llkv-executor/src/translation/expression.rs:189-195
Common Error Scenarios
Translation Error Flow
When schema.resolve(name) returns None, the system invokes the error callback which typically produces an Error::InvalidArgumentError with a message like:
"Binder Error: does not have a column named 'xyz'"
Sources: llkv-executor/src/translation/expression.rs:418-422
Result Stack Underflow
The iterative traversal validates stack consistency:
Stack underflow indicates a bug in the traversal logic rather than invalid user input, so it produces an Error::Internal.
Sources: llkv-executor/src/translation/expression.rs:160-164 llkv-executor/src/translation/expression.rs:171-173
graph TD
EXPR["ScalarExpr<FieldId>"]
TYPE_CHECK{"Expression\nType?"}
LITERAL["Literal"]
INFER_LIT["Infer from\nLiteral type"]
COLUMN["Column(fid)"]
LOOKUP["schema.column_by_field_id(fid)"]
NORMALIZE["normalized_numeric_type"]
BINARY["Binary"]
CHECK_FLOAT["expression_uses_float"]
FLOAT_RESULT["DataType::Float64"]
INT_RESULT["DataType::Int64"]
AGGREGATE["Aggregate"]
AGG_RESULT["DataType::Int64"]
CAST["Cast"]
CAST_TYPE["Use specified\ndata_type"]
RESULT["Arrow DataType"]
EXPR --> TYPE_CHECK
TYPE_CHECK --> LITERAL
TYPE_CHECK --> COLUMN
TYPE_CHECK --> BINARY
TYPE_CHECK --> AGGREGATE
TYPE_CHECK --> CAST
LITERAL --> INFER_LIT
INFER_LIT --> RESULT
COLUMN --> LOOKUP
LOOKUP --> NORMALIZE
NORMALIZE --> RESULT
BINARY --> CHECK_FLOAT
CHECK_FLOAT -->|Uses Float| FLOAT_RESULT
CHECK_FLOAT -->|Integer only| INT_RESULT
FLOAT_RESULT --> RESULT
INT_RESULT --> RESULT
AGGREGATE --> AGG_RESULT
AGG_RESULT --> RESULT
CAST --> CAST_TYPE
CAST_TYPE --> RESULT
Type Inference Integration
After translation, expressions with FieldId references can be used for schema-based type inference.
The infer_computed_data_type function in llkv-executor/src/translation/schema.rs inspects translated expressions to determine their Arrow data types:
Type Inference for Translated Expressions
Sources: llkv-executor/src/translation/schema.rs:53-123
Type Inference Rules
| Expression | Inferred Type | Notes |
|---|---|---|
Literal::Integer | Int64 | 64-bit signed integer |
Literal::Float | Float64 | 64-bit floating point |
Literal::Decimal | Decimal128(p,s) | Precision and scale from value |
Literal::Boolean | Boolean | Boolean flag |
Literal::String | Utf8 | UTF-8 string |
Literal::Null | Null | Null type marker |
Column(fid) | Schema lookup | normalized_numeric_type(column.data_type) |
Binary | Float64 or Int64 | Float if any operand is float |
Compare | Int64 | Comparisons produce boolean (0/1) as Int64 |
Aggregate | Int64 | Most aggregates return Int64 |
Cast | Specified type | Uses explicit data_type parameter |
The normalized_numeric_type function maps small integer types (Int8, Int16, Int32) to Int64 and unsigned/float types to Float64 for consistent expression evaluation.
Sources: llkv-executor/src/translation/schema.rs:125-147
Translation in Context
The translation phase fits into the broader query execution pipeline:
Translation Phase in Query Pipeline
Sources: Based on Diagram 2 and Diagram 5 from system architecture overview
The translation layer serves as the bridge between the human-readable SQL layer (with column names) and the machine-optimized execution layer (with numeric field IDs). This separation allows:
- Planning flexibility : Query plans can reference columns symbolically without knowing physical storage details
- Schema evolution : Field IDs remain stable even if column names change
- Type safety : The type system prevents mixing planning-phase and execution-phase expressions
- Optimization : Numeric field IDs enable efficient lookups in columnar storage
Sources: llkv-expr/src/expr.rs:1-359 llkv-executor/src/translation/expression.rs:1-424 llkv-executor/src/translation/schema.rs:1-338