This documentation is part of the "Projects with Books" initiative at zenOSmosis.
The source code for this project is available on GitHub.
Expression AST
Relevant source files
- llkv-executor/src/translation/expression.rs
- llkv-executor/src/translation/schema.rs
- llkv-expr/src/expr.rs
- llkv-table/src/planner/program.rs
Purpose and Scope
This document describes the expression Abstract Syntax Tree (AST) defined in the llkv-expr crate. The expression AST provides type-aware, Arrow-native data structures for representing boolean predicates and scalar expressions throughout the LLKV system. These AST nodes are decoupled from SQL parsing and can be parameterized by field identifier types, enabling reuse across multiple processing stages.
For information about how expressions are translated between field identifier types, see Expression Translation. For details on how expressions are compiled into executable bytecode, see Program Compilation. For scalar evaluation mechanics, see Scalar Evaluation and NumericKernels.
Sources: llkv-expr/src/expr.rs:1-8
Expression Type Hierarchy
The llkv-expr crate defines two primary expression enums:
Expr<'a, F>- Boolean/predicate expressions that evaluate to true or falseScalarExpr<F>- Arithmetic/scalar expressions that produce typed values
graph TB
subgraph "Expression Type System"
EXPR["Expr<'a, F>\nBoolean Predicates"]
SCALAR["ScalarExpr<F>\nScalar Values"]
EXPR --> LOGICAL["Logical Operators\nAnd, Or, Not"]
EXPR --> PRED["Pred(Filter)\nField Predicates"]
EXPR --> COMPARE["Compare\nScalar Comparisons"]
EXPR --> INLIST["InList\nSet Membership"]
EXPR --> ISNULL["IsNull\nNull Checks"]
EXPR --> LITERAL["Literal(bool)\nConstant Boolean"]
EXPR --> EXISTS["Exists\nSubquery Predicates"]
SCALAR --> COLUMN["Column(F)\nField Reference"]
SCALAR --> SLITERAL["Literal\nConstant Value"]
SCALAR --> BINARY["Binary\nArithmetic Ops"]
SCALAR --> SNOT["Not\nLogical Negation"]
SCALAR --> SISNULL["IsNull\nNull Test"]
SCALAR --> AGG["Aggregate\nAggregate Functions"]
SCALAR --> GETFIELD["GetField\nStruct Access"]
SCALAR --> CAST["Cast\nType Conversion"]
SCALAR --> SCOMPARE["Compare\nBoolean Result"]
SCALAR --> COALESCE["Coalesce\nFirst Non-Null"]
SCALAR --> SUBQ["ScalarSubquery\nSubquery Value"]
SCALAR --> CASE["Case\nConditional Logic"]
SCALAR --> RANDOM["Random\nRandom Number"]
COMPARE -.uses.-> SCALAR
INLIST -.uses.-> SCALAR
ISNULL -.uses.-> SCALAR
end
style EXPR fill:#e8f5e9
style SCALAR fill:#e1f5ff
Both types are generic over a field identifier parameter F, which allows the same AST structure to be used with different field representations (typically String column names during planning, or FieldId numeric identifiers during execution).
Sources: llkv-expr/src/expr.rs:14-143
Expr<'a, F> - Boolean Expressions
The Expr<'a, F> enum represents boolean predicate expressions that evaluate to true or false. These are primarily used in WHERE clauses, JOIN conditions, and HAVING clauses.
Expr Variants
| Variant | Description | Use Case |
|---|---|---|
And(Vec<Expr>) | Logical AND of multiple predicates | Combining multiple filter conditions |
Or(Vec<Expr>) | Logical OR of multiple predicates | Alternative filter conditions |
Not(Box<Expr>) | Logical negation | Inverting a predicate |
Pred(Filter<'a, F>) | Single-field predicate | Column-level filtering (e.g., age > 18) |
Compare { left, op, right } | Comparison between scalar expressions | Cross-column comparisons (e.g., col1 + col2 > 10) |
InList { expr, list, negated } | Set membership test | IN/NOT IN clauses |
IsNull { expr, negated } | Null check on expression | IS NULL/IS NOT NULL on complex expressions |
Literal(bool) | Constant boolean value | Always-true/always-false conditions |
Exists(SubqueryExpr) | Correlated subquery predicate | EXISTS/NOT EXISTS clauses |
Sources: llkv-expr/src/expr.rs:14-43
Expr Construction Helpers
The Expr type provides builder methods for common patterns:
These helpers simplify construction of common predicate patterns during query planning.
Sources: llkv-expr/src/expr.rs:65-84
Filter and Operator Types
The Pred variant wraps a Filter<'a, F> struct, which represents a single predicate against a field:
The Operator<'a> enum defines comparison operations over untyped Literal values:
| Operator | Description | Example SQL |
|---|---|---|
Equals(Literal) | Exact equality | col = 5 |
Range { lower, upper } | Range with bounds | col BETWEEN 10 AND 20 |
GreaterThan(Literal) | Greater-than comparison | col > 10 |
GreaterThanOrEquals(Literal) | Greater-or-equal | col >= 10 |
LessThan(Literal) | Less-than comparison | col < 10 |
LessThanOrEquals(Literal) | Less-or-equal | col <= 10 |
In(&'a [Literal]) | Set membership (borrowed slice) | col IN (1, 2, 3) |
StartsWith { pattern, case_sensitive } | String prefix match | col LIKE 'abc%' |
EndsWith { pattern, case_sensitive } | String suffix match | col LIKE '%xyz' |
Contains { pattern, case_sensitive } | String substring match | col LIKE '%abc%' |
IsNull | Null check | col IS NULL |
IsNotNull | Non-null check | col IS NOT NULL |
The Operator type uses borrowed slices for In and borrowed string slices for pattern matching to avoid allocations in common cases.
Sources: llkv-expr/src/expr.rs:295-358
ScalarExpr - Scalar Expressions
The ScalarExpr<F> enum represents arithmetic and scalar expressions that produce typed values. These are used in SELECT projections, computed columns, ORDER BY clauses, and as operands in comparisons.
graph TB
subgraph "ScalarExpr Evaluation Categories"
SIMPLE["Simple Values"]
ARITH["Arithmetic"]
LOGIC["Logical"]
STRUCT["Structured"]
CONTROL["Control Flow"]
SPECIAL["Special Functions"]
SIMPLE --> COLUMN["Column(F)\nField reference"]
SIMPLE --> LITERAL["Literal\nConstant value"]
ARITH --> BINARY["Binary\nleft op right"]
ARITH --> CAST["Cast\nType conversion"]
LOGIC --> NOT["Not\nLogical negation"]
LOGIC --> ISNULL["IsNull\nNull test"]
LOGIC --> COMPARE["Compare\nComparison"]
STRUCT --> GETFIELD["GetField\nStruct field access"]
STRUCT --> AGGREGATE["Aggregate\nAggregate function"]
CONTROL --> CASE["Case\nCASE expression"]
CONTROL --> COALESCE["Coalesce\nFirst non-null"]
SPECIAL --> SUBQUERY["ScalarSubquery\nScalar subquery"]
SPECIAL --> RANDOM["Random\nRandom number"]
end
ScalarExpr Variants
Sources: llkv-expr/src/expr.rs:86-143
Arithmetic and Binary Operations
The Binary variant supports arithmetic and logical operations:
BinaryOp Variants:
| Operator | Numeric | Bitwise | Logical |
|---|---|---|---|
Add | ✓ | ||
Subtract | ✓ | ||
Multiply | ✓ | ||
Divide | ✓ | ||
Modulo | ✓ | ||
And | ✓ | ||
Or | ✓ | ||
BitwiseShiftLeft | ✓ | ||
BitwiseShiftRight | ✓ |
Sources: llkv-expr/src/expr.rs:270-282
Comparison Operations
The Compare variant in ScalarExpr produces a boolean (1/0) result:
CompareOp Variants:
| Operator | SQL Equivalent |
|---|---|
Eq | = |
NotEq | != or <> |
Lt | < |
LtEq | <= |
Gt | > |
GtEq | >= |
Sources: llkv-expr/src/expr.rs:284-293 llkv-expr/src/expr.rs:119-124
AggregateCall Variants
The Aggregate(AggregateCall<F>) variant wraps aggregate function calls:
Each aggregate (except CountStar) operates on a ScalarExpr<F> rather than just a column, enabling complex expressions like AVG(col1 + col2) or SUM(-col1).
Sources: llkv-expr/src/expr.rs:145-176
Struct Field Access
The GetField variant extracts fields from struct-typed expressions:
This represents dot-notation access like user.address.city, which is nested as:
GetField {
base: GetField {
base: Column(user),
field_name: "address"
},
field_name: "city"
}
Sources: llkv-expr/src/expr.rs:107-113
CASE Expressions
The Case variant implements SQL CASE expressions:
- Simple CASE:
operandisSome, branches test equality - Searched CASE:
operandisNone, branches evaluate conditions - ELSE clause: Handled by
else_exprfield
Sources: llkv-expr/src/expr.rs:129-137
ScalarExpr Helper Methods
Builder methods simplify construction:
| Method | Purpose |
|---|---|
column(field: F) | Create column reference |
literal<L>(lit: L) | Create literal value |
binary(left, op, right) | Create binary operation |
logical_not(expr) | Create logical NOT |
is_null(expr, negated) | Create null test |
aggregate(call) | Create aggregate function |
get_field(base, name) | Create struct field access |
cast(expr, data_type) | Create type cast |
compare(left, op, right) | Create comparison |
coalesce(exprs) | Create COALESCE |
scalar_subquery(id) | Create scalar subquery |
case(operand, branches, else_expr) | Create CASE expression |
random() | Create random number generator |
Sources: llkv-expr/src/expr.rs:178-268
Subquery Expression Types
The expression AST includes dedicated types for correlated and scalar subqueries:
SubqueryExpr
Used in Expr::Exists for boolean subquery predicates:
The id references a subquery definition stored separately (typically in the enclosing plan), and negated indicates NOT EXISTS.
Sources: llkv-expr/src/expr.rs:49-56
ScalarSubqueryExpr
Used in ScalarExpr::ScalarSubquery for value-producing subqueries:
This represents subqueries that return a single scalar value, used in expressions like SELECT (SELECT MAX(price) FROM orders) + 10.
Sources: llkv-expr/src/expr.rs:58-63
SubqueryId
Both subquery types use SubqueryId as an opaque identifier:
This ID is resolved during execution by looking up the subquery definition in the parent plan's metadata.
Sources: llkv-expr/src/expr.rs:45-47
graph LR
subgraph "Expression Translation Pipeline"
SQL["SQL Text"]
PLAN["Query Plan\nExpr<String>\nScalarExpr<String>"]
TRANS["Translation\nresolve_field_id()"]
EXEC["Execution\nExpr<FieldId>\nScalarExpr<FieldId>"]
EVAL["Evaluation\nRecordBatch Results"]
SQL --> PLAN
PLAN --> TRANS
TRANS --> EXEC
EXEC --> EVAL
end
Generic Field Parameter
The expression AST is parameterized by field identifier type F, enabling reuse across different processing stages:
Common instantiations:
Expr<'a, String>- Used during query planning with column namesExpr<'a, FieldId>- Used during execution with numeric field IDsScalarExpr<String>- Planning-time scalar expressionsScalarExpr<FieldId>- Execution-time scalar expressions
The translation from String to FieldId occurs in llkv-executor/src/translation/expression.rs using catalog lookups to resolve column names to their internal numeric identifiers.
Sources: llkv-executor/src/translation/expression.rs:18-174
Literal Values
Both expression types use the Literal enum from llkv-expr to represent untyped constant values:
| Literal Variant | Arrow Type | Example |
|---|---|---|
Integer(i64) | Int64 | 42 |
Float(f64) | Float64 | 3.14 |
Decimal(DecimalValue) | Decimal128 | 123.45 |
Boolean(bool) | Boolean | true |
String(String) | Utf8 | "hello" |
Date32(i32) | Date32 | DATE '2024-01-01' |
Null | Null | NULL |
Struct(...) | Struct | {a: 1, b: "x"} |
Interval(...) | Interval | INTERVAL '1 month' |
Literal values are type-agnostic at the AST level. Type coercion and validation occur during execution when column types are known.
Sources: llkv-expr/src/expr.rs10 llkv-executor/src/translation/schema.rs:53-123
graph TB
subgraph "Type Inference Rules"
SCALAR["ScalarExpr<FieldId>"]
SCALAR --> LITERAL["Literal → Literal type"]
SCALAR --> COLUMN["Column → Schema lookup"]
SCALAR --> BINARY["Binary → Int64 or Float64"]
SCALAR --> NOT["Not → Int64 (boolean)"]
SCALAR --> ISNULL["IsNull → Int64 (boolean)"]
SCALAR --> COMPARE["Compare → Int64 (boolean)"]
SCALAR --> AGG["Aggregate → Int64"]
SCALAR --> GETFIELD["GetField → Struct field type"]
SCALAR --> CAST["Cast → Target type"]
SCALAR --> CASE["Case → Int64 or Float64"]
SCALAR --> COALESCE["Coalesce → Int64 or Float64"]
SCALAR --> RANDOM["Random → Float64"]
SCALAR --> SUBQUERY["ScalarSubquery → Utf8 (TODO)"]
BINARY --> FLOATCHECK["Contains Float64?"]
FLOATCHECK -->|Yes| FLOAT64["Float64"]
FLOATCHECK -->|No| INT64["Int64"]
CASE --> CASECHECK["Branches use Float64?"]
CASECHECK -->|Yes| CASEFLOAT["Float64"]
CASECHECK -->|No| CASEINT["Int64"]
end
Expression Type Inference
During execution planning, the system infers output types for ScalarExpr based on operand types:
The inference logic is implemented in infer_computed_data_type() and expression_uses_float(), which recursively analyze expression trees to determine output types.
Sources: llkv-executor/src/translation/schema.rs:53-271
Expression Normalization
Before compilation, predicates undergo normalization to flatten nested AND/OR structures and apply De Morgan's laws:
Normalization Rules:
- Flatten AND:
And([And([a, b]), c])→And([a, b, c]) - Flatten OR:
Or([Or([a, b]), c])→Or([a, b, c]) - De Morgan's AND:
Not(And([a, b]))→Or([Not(a), Not(b)]) - De Morgan's OR:
Not(Or([a, b]))→And([Not(a), Not(b)]) - Double Negation:
Not(Not(expr))→expr - Literal Negation:
Not(Literal(true))→Literal(false) - IsNull Negation:
Not(IsNull { expr, negated })→IsNull { expr, negated: !negated }
This normalization simplifies subsequent optimization and compilation steps.
Sources: llkv-table/src/planner/program.rs:286-343
Expression Compilation
Normalized expressions are compiled into two bytecode programs:
- EvalProgram - Stack-based evaluation of predicates
- DomainProgram - Set-based domain analysis for row filtering
The compilation process:
- Detects fusable predicates (multiple conditions on same field)
- Builds domain programs for NOT operations
- Produces postorder bytecode for stack-based evaluation
Sources: llkv-table/src/planner/program.rs:256-631
Expression AST Usage Flow
Key stages:
- SQL Parsing - External sqlparser produces SQL AST
- Plan Building - llkv-plan converts SQL AST to
Expr<String> - Translation - llkv-executor resolves column names to
FieldId - Normalization - llkv-table flattens and optimizes structure
- Compilation - llkv-table produces bytecode programs
- Execution - llkv-table evaluates against RecordBatch data
Sources: llkv-expr/src/expr.rs:1-8 llkv-executor/src/translation/expression.rs:18-174 llkv-table/src/planner/program.rs:256-631