Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

GitHub

This documentation is part of the "Projects with Books" initiative at zenOSmosis.

The source code for this project is available on GitHub.

Expression Translation

Relevant source files

Purpose and Scope

Expression Translation is the process of converting expressions from using string-based column names to using internal field identifiers. This transformation bridges the gap between the SQL interface layer (which references columns by name) and the execution layer (which references columns by numeric FieldId). This page covers the translation phase that occurs after query planning but before expression compilation.

For information about the expression AST structure, see Expression AST. For information about how translated expressions are compiled into bytecode, see Program Compilation.

Translation Phase in Query Pipeline

The expression translation phase sits between SQL planning and execution, converting symbolic column references to physical field identifiers.

Translation Phase Detail

graph LR
    SQL["SQL WHERE\nname = 'Alice'"]
AST["sqlparser AST"]
EXPRSTR["Expr<String>\nColumn('name')"]
CATALOG["Schema Lookup"]
EXPRFID["Expr<FieldId>\nColumn(42)"]
COMPILE["Bytecode\nCompilation"]
SQL --> AST
 
   AST --> EXPRSTR
 
   EXPRSTR --> CATALOG
 
   CATALOG --> EXPRFID
 
   EXPRFID --> COMPILE
    
    style EXPRSTR fill:#fff5e1
    style EXPRFID fill:#ffe1e1

Sources: Based on Diagram 5 from system architecture overview

The translation process resolves all column name strings to their corresponding FieldId values by consulting the table schema. This enables the downstream execution engine to work with stable numeric identifiers rather than string lookups.

Generic Expression Types

Both Expr and ScalarExpr are generic over the field identifier type, parameterized as F.

Type Parameter Instantiations

graph TB
    subgraph "Generic Types"
        EXPR["Expr<'a, F>"]
SCALAR["ScalarExpr<F>"]
FILTER["Filter<'a, F>\n{field_id: F, op: Operator}"]
end
    
    subgraph "String-Based (Planning)"
        EXPRSTR["Expr<'static, String>"]
SCALARSTR["ScalarExpr<String>"]
FILTERSTR["Filter<'static, String>"]
end
    
    subgraph "FieldId-Based (Execution)"
        EXPRFID["Expr<'static, FieldId>"]
SCALARFID["ScalarExpr<FieldId>"]
FILTERFID["Filter<'static, FieldId>"]
end
    
    EXPR -.instantiated as.-> EXPRSTR
    EXPR -.instantiated as.-> EXPRFID
    SCALAR -.instantiated as.-> SCALARSTR
    SCALAR -.instantiated as.-> SCALARFID
    FILTER -.instantiated as.-> FILTERSTR
    FILTER -.instantiated as.-> FILTERFID
    
    EXPRSTR ==translate_predicate==> EXPRFID
    SCALARSTR ==translate_scalar==> SCALARFID

Sources: llkv-expr/src/expr.rs:14-43 llkv-executor/src/translation/expression.rs:18-27

TypePlanning PhaseExecution Phase
Predicate ExpressionExpr<'static, String>Expr<'static, FieldId>
Scalar ExpressionScalarExpr<String>ScalarExpr<FieldId>
FilterFilter<'static, String>Filter<'static, FieldId>
Field ReferenceString column nameFieldId numeric identifier

The generic parameter F allows the same AST structure to be used at different stages of query processing, with type safety enforcing that planning-phase expressions cannot be mixed with execution-phase expressions.

Sources: llkv-expr/src/expr.rs:14-333

Translation Functions

The llkv-executor crate provides two primary translation functions that recursively transform expression trees.

translate_predicate

Translates boolean predicate expressions (Expr<String>Expr<FieldId>).

Predicate Translation Flow

Sources: llkv-executor/src/translation/expression.rs:18-174

Function Signature:

The function accepts:

  • expr: The predicate expression with string column references
  • schema: The table schema for column name resolution
  • unknown_column: Error constructor for unresolved column names

Sources: llkv-executor/src/translation/expression.rs:18-27

translate_scalar

Translates scalar value expressions (ScalarExpr<String>ScalarExpr<FieldId>).

Function Signature:

Sources: llkv-executor/src/translation/expression.rs:176-185

Variant-Specific Translation

The translation process handles each expression variant differently:

VariantTranslation Approach
Expr::Pred(Filter)Resolve filter's field_id from String to FieldId
Expr::And(Vec) / Expr::Or(Vec)Recursively translate all child expressions
Expr::Not(Box)Recursively translate inner expression
Expr::CompareTranslate both left and right scalar expressions
Expr::InListTranslate target expression and all list items
Expr::IsNullTranslate inner expression
Expr::Literal(bool)No translation needed, pass through
Expr::Exists(SubqueryExpr)Pass through subquery ID unchanged
ScalarExpr::Column(String)Resolve string to FieldId
ScalarExpr::LiteralNo translation needed, pass through
ScalarExpr::BinaryRecursively translate left and right operands
ScalarExpr::AggregateRecursively translate aggregate expression argument
ScalarExpr::GetFieldRecursively translate base expression
ScalarExpr::CastRecursively translate inner expression
ScalarExpr::CaseTranslate operand, all branch conditions/results, and else branch
ScalarExpr::CoalesceRecursively translate all items

Sources: llkv-executor/src/translation/expression.rs:86-143 llkv-executor/src/translation/expression.rs:197-386

graph TD
    START["Column Name String"]
ROWID_CHECK{"Is 'rowid'\n(case-insensitive)?"}
ROWID["Return\nROW_ID_FIELD_ID"]
LOOKUP["schema.resolve(name)"]
FOUND{"Found in\nschema?"}
RETURN_FID["Return\ncolumn.field_id"]
ERROR["Invoke\nunknown_column\ncallback"]
START --> ROWID_CHECK
 
   ROWID_CHECK -->|Yes| ROWID
 
   ROWID_CHECK -->|No| LOOKUP
 
   LOOKUP --> FOUND
 
   FOUND -->|Yes| RETURN_FID
 
   FOUND -->|No| ERROR

Field Resolution

The core of translation is resolving string column names to numeric field identifiers.

Field Resolution Logic

Sources: llkv-executor/src/translation/expression.rs:390-407

Special Column: rowid

The system provides a special pseudo-column named rowid that references the internal row identifier:

The rowid column is:

  • Case-insensitive (accepts "ROWID", "rowid", "RowId", etc.)
  • Available in all tables automatically
  • Mapped to constant ROW_ID_FIELD_ID from llkv_table::ROW_ID_FIELD_ID
  • Corresponds to ROW_ID_COLUMN_NAME constant from llkv_column_map::ROW_ID_COLUMN_NAME

Sources: llkv-executor/src/translation/expression.rs:399-401 llkv-executor/src/translation/expression.rs2 llkv-executor/src/translation/expression.rs5

Schema Lookup

For non-special columns, resolution uses the ExecutorSchema::resolve method:

The schema lookup:

  1. Searches for a column with the given name
  2. Returns the ExecutorColumn if found
  3. Extracts the field_id from the column metadata
  4. Invokes the error callback if not found

Sources: llkv-executor/src/translation/expression.rs:403-406

graph TB
    START["Initial Expression"]
PUSH_ENTER["Push Enter Frame"]
POP["Pop Frame"]
FRAME_TYPE{"Frame Type?"}
ENTER_NODE["Enter Node"]
NODE_TYPE{"Node Type?"}
AND_OR["And/Or"]
NOT["Not"]
LEAF["Leaf (Pred,\nCompare, etc.)"]
PUSH_EXIT["Push Exit Frame"]
PUSH_CHILDREN["Push Child\nEnter Frames\n(reversed)"]
PUSH_EXIT_NOT["Push Exit Frame"]
PUSH_INNER["Push Inner\nEnter Frame"]
TRANSLATE_LEAF["Translate Leaf\nPush to result_stack"]
EXIT_NODE["Exit Node"]
POP_RESULTS["Pop child results\nfrom result_stack"]
BUILD_NODE["Build translated\nparent node"]
PUSH_RESULT["Push to result_stack"]
DONE{"Stack\nempty?"}
RETURN["Return final result"]
START --> PUSH_ENTER
 
   PUSH_ENTER --> POP
 
   POP --> FRAME_TYPE
    
 
   FRAME_TYPE -->|Enter| ENTER_NODE
 
   FRAME_TYPE -->|Exit| EXIT_NODE
 
   FRAME_TYPE -->|Leaf| TRANSLATE_LEAF
    
 
   ENTER_NODE --> NODE_TYPE
 
   NODE_TYPE --> AND_OR
 
   NODE_TYPE --> NOT
 
   NODE_TYPE --> LEAF
    
 
   AND_OR --> PUSH_EXIT
 
   PUSH_EXIT --> PUSH_CHILDREN
 
   PUSH_CHILDREN --> POP
    
 
   NOT --> PUSH_EXIT_NOT
 
   PUSH_EXIT_NOT --> PUSH_INNER
 
   PUSH_INNER --> POP
    
 
   LEAF --> TRANSLATE_LEAF
 
   TRANSLATE_LEAF --> POP
    
 
   EXIT_NODE --> POP_RESULTS
 
   POP_RESULTS --> BUILD_NODE
 
   BUILD_NODE --> PUSH_RESULT
 
   PUSH_RESULT --> POP
    
 
   POP --> DONE
 
   DONE -->|No| FRAME_TYPE
 
   DONE -->|Yes| RETURN

Traversal Strategy

Translation uses an iterative traversal approach to avoid stack overflow on deeply nested expressions.

Iterative Traversal Algorithm

Sources: llkv-executor/src/translation/expression.rs:39-174

Frame-Based Pattern

The translation uses a frame-based traversal pattern with two stacks:

Work Stack (owned_stack): Contains frames representing work to be done

  • OwnedFrame::Enter(expr): Visit a node and potentially expand it
  • OwnedFrame::Exit(context): Collect child results and build parent node
  • OwnedFrame::Leaf(translated): Push a fully translated leaf node

Result Stack (result_stack): Contains translated expressions ready to be consumed by parent nodes

Sources: llkv-executor/src/translation/expression.rs:48-63

Traversal Example

For the expression And([Pred(name_col), Pred(age_col)]):

StepWork StackResult StackAction
1[Enter(And)][]Start
2[Exit(And(2)), Enter(age), Enter(name)][]Expand And
3[Exit(And(2)), Enter(age), Leaf(name→42)][]Translate name
4[Exit(And(2)), Enter(age)][Pred(42)]Push name result
5[Exit(And(2)), Leaf(age→43)][Pred(42)]Translate age
6[Exit(And(2))][Pred(42), Pred(43)]Push age result
7[][And([Pred(42), Pred(43)])]Build And, push result
8Done[And([...])]Return final expression

This approach handles deeply nested expressions (50k+ nodes) without recursion-induced stack overflow.

Sources: llkv-executor/src/translation/expression.rs:62-174

Why Iterative Traversal?

The codebase comments explain:

This avoids stack overflow on deeply nested expressions (50k+ nodes) by using explicit work_stack and result_stack instead of recursion.

The frame-based pattern is documented in the llkv-plan::traversal module and reused here for expression translation.

Sources: llkv-executor/src/translation/expression.rs:39-46

Error Handling

Translation failures produce descriptive errors through callback functions.

Error Callbacks

Both translation functions accept error constructor callbacks:

ParameterPurposeExample Usage
unknown_column: FConstruct error for unknown column names`
unknown_aggregate: GConstruct error for unknown aggregate functionsCurrently unused but reserved for future validation

The callback pattern allows callers to customize error messages and error types based on their context.

Sources: llkv-executor/src/translation/expression.rs:21-27 llkv-executor/src/translation/expression.rs:189-195

Common Error Scenarios

Translation Error Flow

When schema.resolve(name) returns None, the system invokes the error callback which typically produces an Error::InvalidArgumentError with a message like:

"Binder Error: does not have a column named 'xyz'"

Sources: llkv-executor/src/translation/expression.rs:418-422

Result Stack Underflow

The iterative traversal validates stack consistency:

Stack underflow indicates a bug in the traversal logic rather than invalid user input, so it produces an Error::Internal.

Sources: llkv-executor/src/translation/expression.rs:160-164 llkv-executor/src/translation/expression.rs:171-173

graph TD
    EXPR["ScalarExpr<FieldId>"]
TYPE_CHECK{"Expression\nType?"}
LITERAL["Literal"]
INFER_LIT["Infer from\nLiteral type"]
COLUMN["Column(fid)"]
LOOKUP["schema.column_by_field_id(fid)"]
NORMALIZE["normalized_numeric_type"]
BINARY["Binary"]
CHECK_FLOAT["expression_uses_float"]
FLOAT_RESULT["DataType::Float64"]
INT_RESULT["DataType::Int64"]
AGGREGATE["Aggregate"]
AGG_RESULT["DataType::Int64"]
CAST["Cast"]
CAST_TYPE["Use specified\ndata_type"]
RESULT["Arrow DataType"]
EXPR --> TYPE_CHECK
    
 
   TYPE_CHECK --> LITERAL
 
   TYPE_CHECK --> COLUMN
 
   TYPE_CHECK --> BINARY
 
   TYPE_CHECK --> AGGREGATE
 
   TYPE_CHECK --> CAST
    
 
   LITERAL --> INFER_LIT
 
   INFER_LIT --> RESULT
    
 
   COLUMN --> LOOKUP
 
   LOOKUP --> NORMALIZE
 
   NORMALIZE --> RESULT
    
 
   BINARY --> CHECK_FLOAT
 
   CHECK_FLOAT -->|Uses Float| FLOAT_RESULT
 
   CHECK_FLOAT -->|Integer only| INT_RESULT
 
   FLOAT_RESULT --> RESULT
 
   INT_RESULT --> RESULT
    
 
   AGGREGATE --> AGG_RESULT
 
   AGG_RESULT --> RESULT
    
 
   CAST --> CAST_TYPE
 
   CAST_TYPE --> RESULT

Type Inference Integration

After translation, expressions with FieldId references can be used for schema-based type inference.

The infer_computed_data_type function in llkv-executor/src/translation/schema.rs inspects translated expressions to determine their Arrow data types:

Type Inference for Translated Expressions

Sources: llkv-executor/src/translation/schema.rs:53-123

Type Inference Rules

ExpressionInferred TypeNotes
Literal::IntegerInt6464-bit signed integer
Literal::FloatFloat6464-bit floating point
Literal::DecimalDecimal128(p,s)Precision and scale from value
Literal::BooleanBooleanBoolean flag
Literal::StringUtf8UTF-8 string
Literal::NullNullNull type marker
Column(fid)Schema lookupnormalized_numeric_type(column.data_type)
BinaryFloat64 or Int64Float if any operand is float
CompareInt64Comparisons produce boolean (0/1) as Int64
AggregateInt64Most aggregates return Int64
CastSpecified typeUses explicit data_type parameter

The normalized_numeric_type function maps small integer types (Int8, Int16, Int32) to Int64 and unsigned/float types to Float64 for consistent expression evaluation.

Sources: llkv-executor/src/translation/schema.rs:125-147

Translation in Context

The translation phase fits into the broader query execution pipeline:

Translation Phase in Query Pipeline

Sources: Based on Diagram 2 and Diagram 5 from system architecture overview

The translation layer serves as the bridge between the human-readable SQL layer (with column names) and the machine-optimized execution layer (with numeric field IDs). This separation allows:

  1. Planning flexibility : Query plans can reference columns symbolically without knowing physical storage details
  2. Schema evolution : Field IDs remain stable even if column names change
  3. Type safety : The type system prevents mixing planning-phase and execution-phase expressions
  4. Optimization : Numeric field IDs enable efficient lookups in columnar storage

Sources: llkv-expr/src/expr.rs:1-359 llkv-executor/src/translation/expression.rs:1-424 llkv-executor/src/translation/schema.rs:1-338