This documentation is part of the "Projects with Books" initiative at zenOSmosis.
The source code for this project is available on GitHub.
Expression Translation
Purpose and Scope
Expression Translation is the process of converting expressions that reference columns by name (as strings) into expressions that reference columns by numeric field identifiers (FieldId). This translation bridges the gap between the SQL parsing/planning layer—which operates on human-readable column names—and the execution layer, which requires efficient numeric field identifiers for accessing columnar storage.
This page documents the translation mechanisms, key functions, and integration points. For information about the expression AST types themselves, see Expression AST. For details on how translated expressions are compiled into executable programs, see Program Compilation.
Sources: llkv-expr/src/expr.rs:1-819 llkv-executor/src/lib.rs:87-97
The Parameterized Expression Type System
The LLKV expression system uses generic type parameters to support multiple identifier types throughout the query processing pipeline. All expression types are parameterized over a field identifier type F:
| Expression Type | Description | Parameter |
|---|---|---|
| `Expr<'a, F>` | Boolean predicate expression | Field identifier type `F` |
| `ScalarExpr<F>` | Arithmetic/scalar expression | Field identifier type `F` |
| `Filter<'a, F>` | Single-field predicate | Field identifier type `F` |
The parameterization allows the same expression structures to be used with different identifier representations:
- During Planning: `Expr<'static, String>` and `ScalarExpr<String>` use column names as parsed from SQL
- During Execution: `Expr<'static, FieldId>` and `ScalarExpr<FieldId>` use numeric field identifiers for efficient storage access
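To make the parameterization concrete, here is a minimal sketch of one expression type instantiated with both identifier types. The three-variant enum is a simplified stand-in for the real `ScalarExpr<F>`, `FieldId` is assumed to be a `u32` alias, and the helper names (`count_columns`, `planned`, `resolved`) are hypothetical.

```rust
// Assumption: FieldId is a plain numeric alias; the real definition lives in the LLKV crates.
type FieldId = u32;

// Simplified scalar expression, generic over the field identifier type F.
#[derive(Debug, Clone, PartialEq)]
enum ScalarExpr<F> {
    Column(F),
    Literal(i64),
    Add(Box<ScalarExpr<F>>, Box<ScalarExpr<F>>),
}

// Code that never inspects identifiers works unchanged for any F.
fn count_columns<F>(expr: &ScalarExpr<F>) -> usize {
    match expr {
        ScalarExpr::Column(_) => 1,
        ScalarExpr::Literal(_) => 0,
        ScalarExpr::Add(l, r) => count_columns(l) + count_columns(r),
    }
}

// Planning-time form of `price + 1`: the column is referenced by name.
fn planned() -> ScalarExpr<String> {
    ScalarExpr::Add(
        Box::new(ScalarExpr::Column("price".to_string())),
        Box::new(ScalarExpr::Literal(1)),
    )
}

// Execution-time form of the same expression: the column is referenced by id.
// The id 7 is an arbitrary illustrative value.
fn resolved() -> ScalarExpr<FieldId> {
    ScalarExpr::Add(
        Box::new(ScalarExpr::Column(7)),
        Box::new(ScalarExpr::Literal(1)),
    )
}
```

Both values share one tree shape and differ only in the payload of `Column`, which is the property translation relies on.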
graph TD
subgraph "SQL Parsing Layer"
SQL["SQL Query Text"]
PARSER["sqlparser"]
AST["SQL AST"]
end
subgraph "Planning Layer"
PLANNER["Query Planner"]
EXPR_STRING["Expr<String>\nScalarExpr<String>"]
PLAN["SelectPlan"]
end
subgraph "Translation Layer"
TRANSLATOR["translate_scalar\ntranslate_predicate"]
SCHEMA["Schema / Catalog"]
RESOLVER["IdentifierResolver"]
end
subgraph "Execution Layer"
EXPR_FIELDID["Expr<FieldId>\nScalarExpr<FieldId>"]
EVALUATOR["Expression Evaluator"]
STORAGE["Column Store"]
end
SQL --> PARSER
PARSER --> AST
AST --> PLANNER
PLANNER --> EXPR_STRING
EXPR_STRING --> PLAN
PLAN --> TRANSLATOR
SCHEMA --> TRANSLATOR
RESOLVER --> TRANSLATOR
TRANSLATOR --> EXPR_FIELDID
EXPR_FIELDID --> EVALUATOR
EVALUATOR --> STORAGE
Diagram: Expression Translation Flow from SQL to Execution
This design separates concerns: the planner manipulates human-readable names without needing catalog knowledge, while the executor works with resolved numeric identifiers that map directly to physical storage locations.
Sources: llkv-expr/src/expr.rs:14-182 llkv-executor/src/lib.rs:87-97
Core Translation Functions
The translation layer exposes a set of functions for converting string-based expressions to field ID-based expressions. These functions are defined in the llkv-plan crate’s translation module and re-exported by llkv-executor for convenience.
Primary Translation Functions
| Function | Purpose | Signature Pattern |
|---|---|---|
| `translate_scalar` | Translate scalar expressions | `(expr: &ScalarExpr<String>, schema, error_fn) -> Result<ScalarExpr<FieldId>>` |
| `translate_scalar_with` | Translate with custom resolver | `(expr: &ScalarExpr<String>, resolver, error_fn) -> Result<ScalarExpr<FieldId>>` |
| `translate_predicate` | Translate filter predicates | `(expr: &Expr<String>, schema, error_fn) -> Result<Expr<FieldId>>` |
| `translate_predicate_with` | Translate predicate with resolver | `(expr: &Expr<String>, resolver, error_fn) -> Result<Expr<FieldId>>` |
| `resolve_field_id_from_schema` | Resolve single column name | `(name: &str, schema) -> Result<FieldId>` |
The _with variants accept an IdentifierResolver reference for more complex scenarios (multi-table queries, subqueries, etc.), while the simpler variants accept a schema directly and construct a resolver internally.
Usage Pattern
Translation functions follow a consistent pattern: they take a string-based expression, schema or resolver information, and an error handler closure. The error handler is invoked when a column name cannot be resolved, which lets callers customize error messages.
The error closure receives the unresolved column name and returns an appropriate error type. This pattern appears throughout the executor wherever expressions from plans are translated.
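The call shape described above can be sketched as follows. This is not the project's implementation: the `Schema` alias (a plain name-to-id map), the reduced `ScalarExpr` enum, and the exact generic bounds are assumptions chosen to keep the example self-contained.

```rust
use std::collections::HashMap;

// Assumption: FieldId is a numeric alias and the schema is a simple name -> id map.
type FieldId = u32;
type Schema = HashMap<String, FieldId>;

#[derive(Debug, Clone, PartialEq)]
enum ScalarExpr<F> {
    Column(F),
    Literal(i64),
    Binary(Box<ScalarExpr<F>>, char, Box<ScalarExpr<F>>),
}

// Takes the expression, the schema, and a closure that turns an unresolved
// column name into the caller's own error type E.
fn translate_scalar<E, H>(
    expr: &ScalarExpr<String>,
    schema: &Schema,
    unknown_column: &H,
) -> Result<ScalarExpr<FieldId>, E>
where
    H: Fn(&str) -> E,
{
    match expr {
        // The only variant that needs a lookup.
        ScalarExpr::Column(name) => schema
            .get(name)
            .copied()
            .map(ScalarExpr::Column)
            .ok_or_else(|| unknown_column(name.as_str())),
        // Literals carry no field references and are copied as-is.
        ScalarExpr::Literal(v) => Ok(ScalarExpr::Literal(*v)),
        // Operands are translated recursively; the operator is preserved.
        ScalarExpr::Binary(l, op, r) => Ok(ScalarExpr::Binary(
            Box::new(translate_scalar(l, schema, unknown_column)?),
            *op,
            Box::new(translate_scalar(r, schema, unknown_column)?),
        )),
    }
}
```

A caller would pass something like `&|name: &str| format!("unknown column '{}' in filter", name)` as the third argument, so the message reflects where the expression came from.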
Sources: llkv-executor/src/lib.rs:87-97 llkv-executor/src/lib.rs:485-489 llkv-executor/src/lib.rs:1054-1059
Schema-Based Resolution
Column name resolution relies on Arrow schema information to map string identifiers to numeric field IDs. The resolution process handles case-insensitive matching and validates that referenced columns actually exist in the schema.
Resolution Workflow
graph LR
subgraph "Input"
EXPR_STR["ScalarExpr<String>"]
COLUMN_NAME["Column Name: 'user_id'"]
end
subgraph "Resolution Context"
SCHEMA["Arrow Schema"]
FIELDS["Field Definitions"]
METADATA["Field Metadata"]
end
subgraph "Resolution Process"
NORMALIZE["Normalize Name\n(case-insensitive)"]
LOOKUP["Lookup in Schema"]
EXTRACT_ID["Extract FieldId"]
end
subgraph "Output"
EXPR_FIELD["ScalarExpr<FieldId>"]
FIELD_ID["FieldId: 42"]
end
COLUMN_NAME --> NORMALIZE
SCHEMA --> LOOKUP
NORMALIZE --> LOOKUP
LOOKUP --> EXTRACT_ID
EXTRACT_ID --> FIELD_ID
EXPR_STR --> NORMALIZE
EXTRACT_ID --> EXPR_FIELD
Diagram: Column Name to FieldId Resolution
The resolve_field_id_from_schema function performs the core resolution logic. It searches the schema’s field definitions for a matching column name and extracts the associated field ID from the field’s metadata.
Schema Structure
Arrow schemas used during translation contain:
- Field Definitions: Name, data type, nullability
- Field Metadata: Key-value pairs including the numeric field ID
- Nested Field Support: For struct types, schemas may contain nested field hierarchies
The translation process must handle qualified names (e.g., table.column), nested field access (e.g., user.address.city), and alias resolution when applicable.
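The lookup performed by `resolve_field_id_from_schema` can be approximated with standard-library types. In this sketch the `Field` struct stands in for an Arrow field, the metadata key `"field_id"` is a hypothetical name (the key the project actually uses may differ), and qualified or nested names are out of scope.

```rust
use std::collections::HashMap;

type FieldId = u32;

// Simplified stand-in for an Arrow field: a name plus string key-value metadata.
struct Field {
    name: String,
    metadata: HashMap<String, String>,
}

// Hypothetical metadata key used only for this example.
const FIELD_ID_KEY: &str = "field_id";

// Convenience constructor for the example schema.
fn field(name: &str, id: FieldId) -> Field {
    Field {
        name: name.to_string(),
        metadata: HashMap::from([(FIELD_ID_KEY.to_string(), id.to_string())]),
    }
}

// Find the field by case-insensitive name, then parse its id out of the metadata.
fn resolve_field_id_from_schema(name: &str, schema: &[Field]) -> Result<FieldId, String> {
    let found = schema
        .iter()
        .find(|f| f.name.eq_ignore_ascii_case(name))
        .ok_or_else(|| format!("unknown column '{}'", name))?;
    found
        .metadata
        .get(FIELD_ID_KEY)
        .and_then(|raw| raw.parse::<FieldId>().ok())
        .ok_or_else(|| format!("column '{}' has no usable field id", name))
}
```

Keeping the id in field metadata means the schema alone is enough to translate an expression, with no separate catalog lookup in the simple single-table case.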
Sources: llkv-executor/src/lib.rs:87-97
Field Path Resolution for Nested Fields
When expressions reference nested fields within struct types, the translation process must resolve not just the top-level column but the entire field path. This is handled through the IdentifierResolver and ColumnResolution types provided by llkv-table/catalog.
ColumnResolution Structure
graph TD
subgraph "Input Expression"
NESTED["GetField Expression"]
BASE["base: user"]
FIELD["field_name: 'address'"]
SUBFIELD["field_name: 'city'"]
end
subgraph "Resolver"
RESOLVER["IdentifierResolver"]
CONTEXT["IdentifierContext"]
end
subgraph "Resolution Result"
COL_RES["ColumnResolution"]
COL_NAME["column(): 'user'"]
FIELD_PATH["field_path(): ['address', 'city']"]
FIELD_ID["Resolved FieldId"]
end
NESTED --> RESOLVER
CONTEXT --> RESOLVER
RESOLVER --> COL_RES
COL_RES --> COL_NAME
COL_RES --> FIELD_PATH
COL_RES --> FIELD_ID
Diagram: Nested Field Resolution
The ColumnResolution type encapsulates the resolution result, providing:
- The base column name
- The field path for nested access (empty for top-level columns)
- The resolved field ID for storage access
This information is used during correlated subquery tracking and when translating GetField expressions in the scalar expression tree.
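As an illustration of the three pieces of information listed above, the following sketch models a resolution result and a naive path splitter. The struct layout, the `resolve_path` helper, and the `is_nested` method are hypothetical; only the `column()` and `field_path()` accessor names are taken from the diagram. Table qualification is deliberately ignored here.

```rust
type FieldId = u32;

// Hypothetical shape of a resolution result; the real type comes from the catalog.
#[derive(Debug, Clone, PartialEq)]
struct ColumnResolution {
    column: String,
    field_path: Vec<String>,
    field_id: FieldId,
}

impl ColumnResolution {
    // Base column name.
    fn column(&self) -> &str {
        &self.column
    }
    // Nested path below the base column; empty for top-level columns.
    fn field_path(&self) -> &[String] {
        &self.field_path
    }
    // Field id of the base column.
    fn field_id(&self) -> FieldId {
        self.field_id
    }
    fn is_nested(&self) -> bool {
        !self.field_path.is_empty()
    }
}

// Split a dotted reference into base column and nested path, then look up
// the base column's id in a list of (name, id) pairs.
fn resolve_path(reference: &str, columns: &[(&str, FieldId)]) -> Option<ColumnResolution> {
    let mut parts = reference.split('.');
    let base = parts.next()?;
    let field_id = columns
        .iter()
        .find(|(name, _)| name.eq_ignore_ascii_case(base))
        .map(|(_, id)| *id)?;
    Some(ColumnResolution {
        column: base.to_string(),
        field_path: parts.map(str::to_string).collect(),
        field_id,
    })
}
```

With columns `[("user", 7)]`, the reference `user.address.city` resolves to column `user`, path `["address", "city"]`, and field id 7.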
Sources: llkv-sql/src/sql_engine.rs:37-38 llkv-sql/src/sql_engine.rs:420-427
Translation in Multi-Table Contexts
When translating expressions for queries involving multiple tables (joins, cross products, subqueries), the translation process must disambiguate column references that may appear in multiple tables. This is handled by the IdentifierResolver which maintains context about available tables and their schemas.
IdentifierContext and Resolution
The IdentifierContext type (from llkv-table/catalog) represents the set of tables and columns available in a given scope. During translation:
- Outer Scope Tracking: For subqueries, outer table contexts are tracked separately
- Column Disambiguation: Qualified names (e.g., `table.column`) are resolved against the appropriate table
- Ambiguity Detection: Unqualified references to columns that exist in multiple tables produce errors
The translate_predicate_with and translate_scalar_with functions accept an IdentifierResolver reference that encapsulates this context.
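The disambiguation rules above can be sketched with a small scope type. `Scope`, `ResolveError`, and `demo_scope` are hypothetical stand-ins and do not mirror the real `IdentifierResolver` or `IdentifierContext` APIs; outer-scope tracking and nested field paths are left out.

```rust
type FieldId = u32;

// Hypothetical stand-in for the resolution context: the tables visible in
// one scope, each with its (column name, field id) pairs.
struct Scope {
    tables: Vec<(String, Vec<(String, FieldId)>)>,
}

#[derive(Debug, PartialEq)]
enum ResolveError {
    Unknown(String),
    Ambiguous(String),
}

impl Scope {
    // Resolve "column" or "table.column" to (table name, field id).
    fn resolve(&self, reference: &str) -> Result<(String, FieldId), ResolveError> {
        let (table, column) = match reference.split_once('.') {
            Some((t, c)) => (Some(t), c),
            None => (None, reference),
        };
        let mut matches = Vec::new();
        for (table_name, columns) in &self.tables {
            // A qualifier restricts the search to one table; otherwise search all.
            if table.map_or(true, |t| t.eq_ignore_ascii_case(table_name)) {
                for (column_name, id) in columns {
                    if column_name.eq_ignore_ascii_case(column) {
                        matches.push((table_name.clone(), *id));
                    }
                }
            }
        }
        match matches.len() {
            0 => Err(ResolveError::Unknown(reference.to_string())),
            1 => Ok(matches.remove(0)),
            _ => Err(ResolveError::Ambiguous(reference.to_string())),
        }
    }
}

// Two tables that both expose an `id` column.
fn demo_scope() -> Scope {
    Scope {
        tables: vec![
            (
                "orders".to_string(),
                vec![("id".to_string(), 1), ("total".to_string(), 2)],
            ),
            (
                "users".to_string(),
                vec![("id".to_string(), 10), ("name".to_string(), 11)],
            ),
        ],
    }
}
```

In `demo_scope()`, `users.id` and `total` resolve cleanly, while the bare `id` is reported as ambiguous because both tables define it.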
Sources: llkv-sql/src/sql_engine.rs:37-38
Error Handling and Diagnostics
Translation failures occur when column names cannot be resolved. The error handling strategy uses caller-provided closures to generate context-specific error messages.
Error Patterns
| Scenario | Error Message Pattern |
|---|---|
| Unknown column in aggregate | "unknown column '{name}' in aggregate expression" |
| Unknown column in WHERE clause | "unknown column '{name}' in filter" |
| Unknown column in cross product | "column '{name}' not found in cross product result" |
| Ambiguous column reference | "column '{name}' is ambiguous" |
The error closure pattern allows the caller to include query-specific context in error messages. This is particularly important for debugging complex queries where the same expression type might be used in multiple contexts.
Resolution Failure Example
When translate_scalar encounters a ScalarExpr::Column(name) variant and the name cannot be found in the schema, it invokes the error closure with that name and returns the resulting error to the caller.
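The failure path can be illustrated with two call sites that share one resolver but supply different closures. `ExecError`, `resolve`, and the two wrapper functions are hypothetical names; the message texts follow the patterns in the table above.

```rust
// Hypothetical executor-side error type.
#[derive(Debug, PartialEq)]
enum ExecError {
    InvalidArgument(String),
}

// Minimal resolver: returns the column's position among the known names,
// or whatever error the caller's closure builds from the unresolved name.
fn resolve<E>(name: &str, known: &[&str], on_unknown: impl Fn(&str) -> E) -> Result<usize, E> {
    known
        .iter()
        .position(|k| k.eq_ignore_ascii_case(name))
        .ok_or_else(|| on_unknown(name))
}

// Call site for WHERE clauses.
fn resolve_in_filter(name: &str, known: &[&str]) -> Result<usize, ExecError> {
    resolve(name, known, |n| {
        ExecError::InvalidArgument(format!("unknown column '{}' in filter", n))
    })
}

// Call site for aggregate arguments.
fn resolve_in_aggregate(name: &str, known: &[&str]) -> Result<usize, ExecError> {
    resolve(name, known, |n| {
        ExecError::InvalidArgument(format!("unknown column '{}' in aggregate expression", n))
    })
}
```

The resolver itself stays free of query context; each call site decides how the failure is worded.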
Sources: llkv-executor/src/lib.rs:485-489 llkv-executor/src/lib.rs:1054-1059
Integration with Query Execution Pipeline
Expression translation occurs at the boundary between planning and execution. Plans produced by the SQL layer contain string-based expressions, which are translated as execution structures are built.
Translation Points in Execution
graph TB
subgraph "Planning Phase"
SQL["SQL Statement"]
PARSE["Parse & Build Plan"]
PLAN["SelectPlan\nUpdatePlan\netc."]
EXPR_STR["Expressions with\nString identifiers"]
end
subgraph "Execution Preparation"
GET_TABLE["Get Table Handle"]
SCHEMA_FETCH["Fetch Schema"]
TRANSLATE["Translation Functions"]
EXPR_FIELD["Expressions with\nFieldId identifiers"]
end
subgraph "Execution Phase"
BUILD_SCAN["Build ScanProjection"]
COMPILE["Compile to EvalProgram"]
EVALUATE["Evaluate Against Batches"]
RESULTS["RecordBatch Results"]
end
SQL --> PARSE
PARSE --> PLAN
PLAN --> EXPR_STR
PLAN --> GET_TABLE
GET_TABLE --> SCHEMA_FETCH
SCHEMA_FETCH --> TRANSLATE
EXPR_STR --> TRANSLATE
TRANSLATE --> EXPR_FIELD
EXPR_FIELD --> BUILD_SCAN
BUILD_SCAN --> COMPILE
COMPILE --> EVALUATE
EVALUATE --> RESULTS
Diagram: Translation in the Execution Pipeline
Key Translation Points
- Filter Translation: When building scan plans, `WHERE` clause expressions are translated before being passed to the scan optimizer
- Projection Translation: Computed columns in `SELECT` projections are translated before evaluation
- Aggregate Translation: Aggregate function arguments are translated to resolve column references
- Join Condition Translation: `ON` clause expressions for joins are translated in the context of both joined tables
The executor’s ensure_computed_projection function demonstrates this integration. It translates a string-based expression, infers its result data type, and registers it as a computed projection for the scan.
This function encapsulates the full translation workflow: resolve column names, infer types, and prepare the translated expression for execution.
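The three steps (resolve, infer, register) can be sketched as below. This is an illustrative reconstruction, not the executor's code: the `Schema` alias, the two-type `DataType` enum, the toy promotion rule, and the `ComputedProjection` struct are all assumptions.

```rust
use std::collections::HashMap;

type FieldId = u32;

#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int64,
    Float64,
}

#[derive(Debug, Clone, PartialEq)]
enum ScalarExpr<F> {
    Column(F),
    IntLiteral(i64),
    Add(Box<ScalarExpr<F>>, Box<ScalarExpr<F>>),
}

// Hypothetical schema: column name -> (field id, data type).
type Schema = HashMap<String, (FieldId, DataType)>;

#[derive(Debug, PartialEq)]
struct ComputedProjection {
    expr: ScalarExpr<FieldId>,
    data_type: DataType,
}

// Step 1: resolve column names to field ids.
fn translate(expr: &ScalarExpr<String>, schema: &Schema) -> Result<ScalarExpr<FieldId>, String> {
    Ok(match expr {
        ScalarExpr::Column(name) => {
            let (id, _) = schema
                .get(name)
                .ok_or_else(|| format!("unknown column '{}' in projection", name))?;
            ScalarExpr::Column(*id)
        }
        ScalarExpr::IntLiteral(v) => ScalarExpr::IntLiteral(*v),
        ScalarExpr::Add(l, r) => ScalarExpr::Add(
            Box::new(translate(l, schema)?),
            Box::new(translate(r, schema)?),
        ),
    })
}

// Step 2: infer the result type. Toy rule: any Float64 operand makes the sum Float64.
fn infer_type(expr: &ScalarExpr<FieldId>, types: &HashMap<FieldId, DataType>) -> DataType {
    match expr {
        // Every id produced by `translate` came from the schema, so the lookup succeeds.
        ScalarExpr::Column(id) => types[id],
        ScalarExpr::IntLiteral(_) => DataType::Int64,
        ScalarExpr::Add(l, r) => {
            if infer_type(l, types) == DataType::Float64
                || infer_type(r, types) == DataType::Float64
            {
                DataType::Float64
            } else {
                DataType::Int64
            }
        }
    }
}

// Step 3: register the translated expression and return its projection index.
fn ensure_computed_projection(
    expr: &ScalarExpr<String>,
    schema: &Schema,
    projections: &mut Vec<ComputedProjection>,
) -> Result<usize, String> {
    let translated = translate(expr, schema)?;
    let types: HashMap<FieldId, DataType> = schema.values().copied().collect();
    let data_type = infer_type(&translated, &types);
    projections.push(ComputedProjection { expr: translated, data_type });
    Ok(projections.len() - 1)
}
```

Because translation runs first and returns early on failure, an unknown column leaves the projection list untouched.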
Sources: llkv-executor/src/lib.rs:470-501 llkv-executor/src/lib.rs:87-97
Translation of Complex Expression Types
The translation process must handle all variants of the expression AST, recursively translating nested expressions while preserving structure and semantics.
Recursive Translation Table
| Expression Variant | Translation Strategy |
|---|---|
| `ScalarExpr::Column` | Resolve string to FieldId via schema |
| `ScalarExpr::Literal` | No translation needed (no field references) |
| `ScalarExpr::Binary` | Recursively translate left and right operands |
| `ScalarExpr::Aggregate` | Translate the aggregate’s argument expression |
| `ScalarExpr::GetField` | Translate base expression, preserve field name |
| `ScalarExpr::Cast` | Translate inner expression, preserve target type |
| `ScalarExpr::Compare` | Recursively translate both comparison operands |
| `ScalarExpr::Coalesce` | Translate each expression in the list |
| `ScalarExpr::Case` | Translate operand and all WHEN/THEN/ELSE branches |
| `ScalarExpr::ScalarSubquery` | No translation (contains SubqueryId reference) |
| `ScalarExpr::Random` | No translation (no field references) |
For predicate expressions (Expr<F>):
| Predicate Variant | Translation Strategy |
|---|---|
| `Expr::And` / `Expr::Or` | Recursively translate all sub-expressions |
| `Expr::Not` | Recursively translate inner expression |
| `Expr::Pred(Filter)` | Translate filter’s field ID, preserve operator |
| `Expr::Compare` | Translate left and right scalar expressions |
| `Expr::InList` | Translate target expression and list elements |
| `Expr::IsNull` | Translate the operand expression |
| `Expr::Literal` | No translation (constant boolean value) |
| `Expr::Exists` | No translation (contains SubqueryId reference) |
The translation process maintains the expression tree structure while substituting field identifiers, ensuring that evaluation semantics remain unchanged.
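A structure-preserving recursive walk over a predicate tree might look like the sketch below. The `Expr` and `ScalarExpr` enums cover only a subset of the variants in the tables above, their field layouts are simplified assumptions, and the error is a plain `String` rather than a caller-supplied closure.

```rust
use std::collections::HashMap;

type FieldId = u32;

#[derive(Debug, Clone, PartialEq)]
enum ScalarExpr<F> {
    Column(F),
    Literal(i64),
}

// Reduced predicate tree; the real Expr has more variants and a lifetime parameter.
#[derive(Debug, Clone, PartialEq)]
enum Expr<F> {
    And(Vec<Expr<F>>),
    Or(Vec<Expr<F>>),
    Not(Box<Expr<F>>),
    Compare { left: ScalarExpr<F>, op: &'static str, right: ScalarExpr<F> },
    IsNull(ScalarExpr<F>),
    Literal(bool),
}

fn translate_scalar(
    expr: &ScalarExpr<String>,
    ids: &HashMap<String, FieldId>,
) -> Result<ScalarExpr<FieldId>, String> {
    match expr {
        ScalarExpr::Column(name) => ids
            .get(name)
            .map(|id| ScalarExpr::Column(*id))
            .ok_or_else(|| format!("unknown column '{}' in filter", name)),
        ScalarExpr::Literal(v) => Ok(ScalarExpr::Literal(*v)),
    }
}

// Each variant is rebuilt with the same shape; only identifiers change.
fn translate_predicate(
    expr: &Expr<String>,
    ids: &HashMap<String, FieldId>,
) -> Result<Expr<FieldId>, String> {
    Ok(match expr {
        Expr::And(children) => Expr::And(translate_all(children, ids)?),
        Expr::Or(children) => Expr::Or(translate_all(children, ids)?),
        Expr::Not(inner) => Expr::Not(Box::new(translate_predicate(inner, ids)?)),
        Expr::Compare { left, op, right } => Expr::Compare {
            left: translate_scalar(left, ids)?,
            op: *op,
            right: translate_scalar(right, ids)?,
        },
        Expr::IsNull(operand) => Expr::IsNull(translate_scalar(operand, ids)?),
        // Constants contain no field references.
        Expr::Literal(b) => Expr::Literal(*b),
    })
}

// Translate every child, stopping at the first unresolved column.
fn translate_all(
    children: &[Expr<String>],
    ids: &HashMap<String, FieldId>,
) -> Result<Vec<Expr<FieldId>>, String> {
    children.iter().map(|c| translate_predicate(c, ids)).collect()
}
```

Collecting into `Result<Vec<_>, _>` short-circuits, so a single bad column anywhere in the tree aborts the whole translation.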
Sources: llkv-expr/src/expr.rs:125-182 llkv-expr/src/expr.rs:14-66
Performance Considerations
Expression translation is performed once during query execution setup, not per-row or per-batch. The translated expressions are then compiled into evaluation programs (see Program Compilation) which are reused across all batches in the query result.
Translation Caching
The executor maintains caches to avoid redundant translation work:
- Computed Projection Cache: Stores translated expressions keyed by their string representation to avoid re-translating identical expressions in the same query
- Column Projection Cache: Maps field IDs to projection indices to reuse existing projections when multiple expressions reference the same column
This caching strategy is evident in functions like ensure_computed_projection, which checks the cache before performing translation.
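The check-then-translate flow can be sketched with a map keyed by the expression's text. Everything here is a simplified assumption: `ProjectionCache` is a hypothetical type, and an "expression" is reduced to a sum of column names so that its string form is easy to build.

```rust
use std::collections::HashMap;

type FieldId = u32;

// Hypothetical per-query cache of computed projections.
struct ProjectionCache {
    // Expression text -> index into `projections`.
    by_text: HashMap<String, usize>,
    // Translated expressions, here just the field ids being summed.
    projections: Vec<Vec<FieldId>>,
    // Counts how often translation actually ran.
    translations_performed: usize,
}

impl ProjectionCache {
    fn new() -> Self {
        ProjectionCache {
            by_text: HashMap::new(),
            projections: Vec::new(),
            translations_performed: 0,
        }
    }

    // Return the projection index for `columns[0] + columns[1] + ...`,
    // translating only when the same text has not been seen before.
    fn ensure_computed_projection(
        &mut self,
        columns: &[&str],
        schema: &HashMap<String, FieldId>,
    ) -> Result<usize, String> {
        let key = columns.join(" + ");
        if let Some(&index) = self.by_text.get(&key) {
            return Ok(index); // cache hit: no translation work
        }
        let translated = columns
            .iter()
            .map(|name| {
                schema
                    .get(*name)
                    .copied()
                    .ok_or_else(|| format!("unknown column '{}'", name))
            })
            .collect::<Result<Vec<_>, _>>()?;
        self.translations_performed += 1;
        self.projections.push(translated);
        let index = self.projections.len() - 1;
        self.by_text.insert(key, index);
        Ok(index)
    }
}
```

Requesting the same expression twice returns the same index and leaves `translations_performed` unchanged on the second call.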
Sources: llkv-executor/src/lib.rs:470-501