This documentation is part of the "Projects with Books" initiative at zenOSmosis.
The source code for this project is available on GitHub.
Expression AST
Loading…
Expression AST
Relevant source files
Purpose and Scope
The Expression AST defines the abstract syntax tree for all logical and arithmetic expressions in LLKV. This module provides the core data structures used to represent predicates, scalar computations, aggregations, and subqueries throughout the query processing pipeline. The AST is type-agnostic and decoupled from Arrow’s concrete scalar types, allowing expressions to be constructed, transformed, and optimized before being bound to specific table schemas.
For information about translating expressions from string column names to field identifiers, see Expression Translation. For information about compiling expressions into executable bytecode, see Program Compilation. For information about evaluating expressions against data, see Scalar Evaluation and NumericKernels.
Sources: llkv-expr/src/expr.rs:1-14
Expression System Architecture
The expression system consists of two primary AST types that serve distinct purposes in query processing:
Expr<'a, F> represents boolean-valued expressions used in WHERE clauses, HAVING clauses, and join conditions. These expressions combine logical operators (AND, OR, NOT) with predicates that test column values against literals or ranges.
graph TB
subgraph "Boolean Predicates"
Expr["Expr<'a, F>\nBoolean logic"]
Filter["Filter<'a, F>\nField predicate"]
Operator["Operator<'a>\nComparison operators"]
end
subgraph "Value Expressions"
ScalarExpr["ScalarExpr<F>\nArithmetic expressions"]
BinaryOp["BinaryOp\n+, -, *, /, %"]
CompareOp["CompareOp\n=, !=, <, >"]
AggregateCall["AggregateCall<F>\nCOUNT, SUM, AVG"]
end
subgraph "Supporting Types"
Literal["Literal\nType-agnostic values"]
SubqueryExpr["SubqueryExpr\nEXISTS predicate"]
ScalarSubqueryExpr["ScalarSubqueryExpr\nScalar subquery"]
end
Expr --> Filter
Filter --> Operator
Expr --> ScalarExpr
ScalarExpr --> BinaryOp
ScalarExpr --> CompareOp
ScalarExpr --> AggregateCall
ScalarExpr --> Literal
Expr --> SubqueryExpr
ScalarExpr --> ScalarSubqueryExpr
Operator --> Literal
ScalarExpr<F> represents value-producing expressions used in SELECT projections, computed columns, and UPDATE assignments. These expressions support arithmetic operations, function calls, type casts, and complex nested computations.
Both types are parameterized by a generic field identifier type F, enabling the same AST structures to be used with different column reference schemes (e.g., String names during planning, FieldId integers during execution).
Sources: llkv-expr/src/expr.rs:14-43 llkv-expr/src/expr.rs:125-182
Boolean Predicate Expressions
The Expr<'a, F> enum represents logical expressions that evaluate to boolean values. It forms the foundation for filtering operations throughout the query engine.
graph TD
Expr["Expr<'a, F>"]
And["And(Vec<Expr>)\nLogical conjunction"]
Or["Or(Vec<Expr>)\nLogical disjunction"]
Not["Not(Box<Expr>)\nLogical negation"]
Pred["Pred(Filter)\nField predicate"]
Compare["Compare\nScalar comparison"]
InList["InList\nSet membership"]
IsNull["IsNull\nNULL test"]
Literal["Literal(bool)\nConstant boolean"]
Exists["Exists(SubqueryExpr)\nCorrelated EXISTS"]
Expr --> And
Expr --> Or
Expr --> Not
Expr --> Pred
Expr --> Compare
Expr --> InList
Expr --> IsNull
Expr --> Literal
Expr --> Exists
Expr Variants
| Variant | Purpose | Example SQL |
|---|---|---|
And(Vec<Expr>) | Logical conjunction of sub-expressions | col1 = 5 AND col2 > 10 |
Or(Vec<Expr>) | Logical disjunction of sub-expressions | status = 'active' OR status = 'pending' |
Not(Box<Expr>) | Logical negation | NOT (age < 18) |
Pred(Filter) | Single-field predicate with operator | price < 100 |
Compare | Comparison between two scalar expressions | col1 + col2 > col3 * 2 |
InList | Set membership test | status IN ('active', 'pending') |
IsNull | NULL test for complex expressions | (col1 + col2) IS NULL |
Literal(bool) | Constant true/false value | WHERE true |
Exists(SubqueryExpr) | Correlated subquery existence test | EXISTS (SELECT 1 FROM t2 WHERE ...) |
The And and Or variants accept vectors of expressions, allowing efficient representation of multi-way logical operations without deep nesting. The Pred variant wraps a Filter<'a, F> structure for simple single-column predicates, which can be efficiently pushed down to storage layer scanning operations.
Sources: llkv-expr/src/expr.rs:14-43 llkv-expr/src/expr.rs:67-123
Filter and Operator Types
The Filter<'a, F> structure encapsulates a single predicate against a field, combining a field identifier with an operator:
The Operator<'a> enum defines comparison and pattern-matching operations over untyped Literal values:
| Operator Variant | Description | Example |
|---|---|---|
Equals(Literal) | Exact equality test | status = 'active' |
Range { lower, upper } | Bounded range test | age BETWEEN 18 AND 65 |
GreaterThan(Literal) | Greater than comparison | price > 100.0 |
GreaterThanOrEquals(Literal) | Greater than or equal | quantity >= 10 |
LessThan(Literal) | Less than comparison | age < 18 |
LessThanOrEquals(Literal) | Less than or equal | score <= 100 |
In(&'a [Literal]) | Set membership | status IN ('active', 'pending') |
StartsWith { pattern, case_sensitive } | Prefix match | name LIKE 'John%' |
EndsWith { pattern, case_sensitive } | Suffix match | email LIKE '%@example.com' |
Contains { pattern, case_sensitive } | Substring match | description LIKE '%keyword%' |
IsNull | NULL test | email IS NULL |
IsNotNull | Non-NULL test | email IS NOT NULL |
The In operator accepts a borrowed slice of literals to avoid allocations for small, static IN lists. The pattern-matching operators (StartsWith, EndsWith, Contains) support both case-sensitive and case-insensitive matching.
Sources: llkv-expr/src/expr.rs:365-428
graph TD
ScalarExpr["ScalarExpr<F>"]
Column["Column(F)\nField reference"]
Literal["Literal\nConstant value"]
Binary["Binary\nArithmetic operation"]
Not["Not\nLogical negation"]
IsNull["IsNull\nNULL test"]
Aggregate["Aggregate\nAggregate function"]
GetField["GetField\nStruct field access"]
Cast["Cast\nType conversion"]
Compare["Compare\nBoolean comparison"]
Coalesce["Coalesce\nFirst non-NULL"]
ScalarSubquery["ScalarSubquery\nSubquery result"]
Case["Case\nConditional expression"]
Random["Random\nRandom number"]
ScalarExpr --> Column
ScalarExpr --> Literal
ScalarExpr --> Binary
ScalarExpr --> Not
ScalarExpr --> IsNull
ScalarExpr --> Aggregate
ScalarExpr --> GetField
ScalarExpr --> Cast
ScalarExpr --> Compare
ScalarExpr --> Coalesce
ScalarExpr --> ScalarSubquery
ScalarExpr --> Case
ScalarExpr --> Random
Scalar Value Expressions
The ScalarExpr<F> enum represents expressions that produce scalar values. These are used in SELECT projections, computed columns, ORDER BY clauses, and anywhere a value (rather than a boolean) is needed.
ScalarExpr Variants
| Variant | Purpose | Example SQL |
|---|---|---|
Column(F) | Reference to a table column | price |
Literal(Literal) | Constant value | 42, 'hello', 3.14 |
Binary { left, op, right } | Arithmetic or logical binary operation | price * quantity |
Not(Box<ScalarExpr>) | Logical negation returning 1/0 | NOT active |
IsNull { expr, negated } | NULL test returning 1/0 | col1 IS NULL |
Aggregate(AggregateCall) | Aggregate function call | COUNT(*) + 1 |
GetField { base, field_name } | Struct field extraction | user.address.city |
Cast { expr, data_type } | Explicit type cast | CAST(price AS INTEGER) |
Compare { left, op, right } | Comparison producing boolean result | price > 100 |
Coalesce(Vec<ScalarExpr>) | First non-NULL expression | COALESCE(nickname, username) |
ScalarSubquery(ScalarSubqueryExpr) | Scalar subquery result | (SELECT MAX(price) FROM ...) |
Case { operand, branches, else_expr } | Conditional expression | CASE WHEN x > 0 THEN 1 ELSE -1 END |
Random | Random number generator | RANDOM() |
The Binary variant supports arithmetic operators (+, -, *, /, %) as well as logical operators (AND, OR) and bitwise shift operators (<<, >>). The Compare variant produces boolean results (represented as 1/0 integers) from comparisons like col1 > col2.
Sources: llkv-expr/src/expr.rs:125-307
Binary and Comparison Operators
The BinaryOp enum defines arithmetic and logical binary operators:
The CompareOp enum defines comparison operators for scalar expressions:
These operators enable complex arithmetic expressions like (price * quantity * (1 - discount)) > 1000 and nested logical operations like (col1 + col2) > (col3 * 2) AND status = 'active'.
Sources: llkv-expr/src/expr.rs:309-363
Aggregate Function Calls
The AggregateCall<F> enum represents aggregate function invocations within scalar expressions. Unlike simple column aggregates, these variants operate on full expressions, enabling aggregations like AVG(col1 + col2) or SUM(-price).
graph LR
AggregateCall["AggregateCall<F>"]
CountStar["CountStar\nCOUNT(*)"]
Count["Count { expr, distinct }\nCOUNT(expr)"]
Sum["Sum { expr, distinct }\nSUM(expr)"]
Total["Total { expr, distinct }\nTOTAL(expr)"]
Avg["Avg { expr, distinct }\nAVG(expr)"]
Min["Min(expr)\nMIN(expr)"]
Max["Max(expr)\nMAX(expr)"]
CountNulls["CountNulls(expr)\nCOUNT_NULLS(expr)"]
GroupConcat["GroupConcat { expr, distinct, separator }\nGROUP_CONCAT(expr)"]
AggregateCall --> CountStar
AggregateCall --> Count
AggregateCall --> Sum
AggregateCall --> Total
AggregateCall --> Avg
AggregateCall --> Min
AggregateCall --> Max
AggregateCall --> CountNulls
AggregateCall --> GroupConcat
AggregateCall Variants
| Variant | SQL Equivalent | Distinct Support | Description |
|---|---|---|---|
CountStar | COUNT(*) | No | Count all rows including NULLs |
Count { expr, distinct } | COUNT(expr) | Yes | Count non-NULL expression values |
Sum { expr, distinct } | SUM(expr) | Yes | Sum of expression values |
Total { expr, distinct } | TOTAL(expr) | Yes | Sum returning 0 for empty set (SQLite semantics) |
Avg { expr, distinct } | AVG(expr) | Yes | Average of expression values |
Min(expr) | MIN(expr) | No | Minimum expression value |
Max(expr) | MAX(expr) | No | Maximum expression value |
CountNulls(expr) | N/A | No | Count NULL values in expression |
GroupConcat { expr, distinct, separator } | GROUP_CONCAT(expr) | Yes | Concatenate values with separator |
All variants except CountStar accept a Box<ScalarExpr<F>>, allowing aggregates to operate on computed expressions. For example, SUM(price * quantity) is represented as:
AggregateCall::Sum {
expr: Box::new(ScalarExpr::Binary {
left: Box::new(ScalarExpr::Column("price")),
op: BinaryOp::Multiply,
right: Box::new(ScalarExpr::Column("quantity")),
}),
distinct: false,
}
Sources: llkv-expr/src/expr.rs:184-215 llkv-expr/src/expr.rs:217-307
Subquery Integration
The expression AST supports two forms of subquery integration: boolean EXISTS predicates and scalar subqueries that produce single values.
Boolean EXISTS Subqueries
The SubqueryExpr structure represents a correlated EXISTS predicate within a boolean expression:
The id field references a subquery definition stored separately in the query plan (see FilterSubquery in llkv-plan/src/plans.rs:36-45), while negated indicates whether the SQL used NOT EXISTS. The separation of subquery metadata from the expression tree allows the same subquery to be referenced multiple times without duplication.
Scalar Subqueries
The ScalarSubqueryExpr structure represents a subquery that returns a single scalar value:
graph TB
SelectPlan["SelectPlan"]
ExprFilter["filter: Option<SelectFilter>"]
SelectFilter["SelectFilter"]
Predicate["predicate: Expr<'static, String>"]
FilterSubqueries["subqueries: Vec<FilterSubquery>"]
ScalarSubqueries["scalar_subqueries: Vec<ScalarSubquery>"]
Projections["projections: Vec<SelectProjection>"]
ComputedProj["Computed { expr, alias }"]
ScalarExpr["expr: ScalarExpr<String>"]
SelectPlan --> ExprFilter
ExprFilter --> SelectFilter
SelectFilter --> Predicate
SelectFilter --> FilterSubqueries
SelectPlan --> ScalarSubqueries
SelectPlan --> Projections
Projections --> ComputedProj
ComputedProj --> ScalarExpr
Predicate -.references.-> FilterSubqueries
ScalarExpr -.references.-> ScalarSubqueries
Scalar subqueries appear in value-producing contexts like SELECT (SELECT MAX(price) FROM items) AS max_price or WHERE quantity > (SELECT AVG(quantity) FROM inventory). The data_type field captures the Arrow data type of the subquery’s output column, enabling type checking during expression compilation.
The query planner populates the subqueries field in SelectFilter and the scalar_subqueries field in SelectPlan with complete subquery definitions, while expressions reference them by SubqueryId. This architecture enables efficient subquery execution and correlation tracking during query evaluation.
Sources: llkv-expr/src/expr.rs:45-66 llkv-plan/src/plans.rs:27-67
Literal Values
Expressions reference the Literal type from the llkv-expr crate’s literal module to represent constant values. The literal system is type-agnostic, deferring concrete type resolution until expressions are evaluated against actual table schemas.
The Literal enum (defined in llkv-expr/src/literal.rs) supports:
- Null : SQL NULL value
- Int128(i128) : Integer literals (wide representation to handle all integer sizes)
- Float64(f64) : Floating-point literals
- Decimal128(DecimalValue) : Fixed-precision decimal literals
- String(String) : Text literals
- Boolean(bool) : True/false literals
- Date32(i32) : Date literals (days since Unix epoch)
- Struct(Vec <(String, Literal)>): Structured data literals
- Interval(IntervalValue) : Time interval literals
The use of Int128 for integer literals allows the expression system to represent values that may exceed i64 range during intermediate computations, with overflow checks deferred to evaluation time. The Decimal128 variant uses the DecimalValue type from llkv-types to maintain precision and scale metadata.
Literals can be constructed through From trait implementations, enabling ergonomic expression building:
Sources: llkv-expr/src/expr.rs10
Type Parameterization and Field References
Both Expr<'a, F> and ScalarExpr<F> are parameterized by a generic type F that represents field identifiers. This design enables the same AST structures to be used at different stages of query processing with different column reference schemes:
graph LR
subgraph "Planning Stage"
ExprString["Expr<'static, String>\nColumn names"]
ScalarString["ScalarExpr<String>\nColumn names"]
end
subgraph "Translation Stage"
Translator["Expression Translator"]
end
subgraph "Execution Stage"
ExprFieldId["Expr<'static, FieldId>\nNumeric IDs"]
ScalarFieldId["ScalarExpr<FieldId>\nNumeric IDs"]
end
ExprString --> Translator
ScalarString --> Translator
Translator --> ExprFieldId
Translator --> ScalarFieldId
Common Type Instantiations
| Stage | Field Type | Use Case | Example Types |
|---|---|---|---|
| Planning | String | SQL parsing and plan construction | Expr<'static, String> |
| Execution | FieldId | Table scanning and evaluation | Expr<'static, FieldId> |
| Testing | u32 or &str | Lightweight tests without full schema | Expr<'static, u32> |
The lifetime parameter 'a in Expr<'a, F> represents the lifetime of borrowed data in Operator::In(&'a [Literal]), allowing filter expressions to reference static IN lists without heap allocation. Most expressions use 'static lifetime, indicating no borrowed data.
Field Reference Translation
The expression translation system (see Expression Translation) converts expressions from string-based column names to numeric FieldId identifiers by walking the AST and resolving names against table schemas. This transformation is represented by the generic parameter substitution:
Expr<'static, String> → Expr<'static, FieldId>
ScalarExpr<String> → ScalarExpr<FieldId>
The generic design allows the same expression evaluation logic to work with both naming schemes without code duplication, while maintaining type safety at compile time.
Sources: llkv-expr/src/expr.rs:14-21 llkv-expr/src/expr.rs:125-134 llkv-expr/src/expr.rs:365-370
Expression Construction Helpers
Both Expr and ScalarExpr provide builder methods for ergonomic AST construction:
Expr Helpers
ScalarExpr Helpers
These helpers simplify expression construction in query planners and translators by providing clearer semantics than direct enum variant construction.
Sources: llkv-expr/src/expr.rs:67-86 llkv-expr/src/expr.rs:217-307
Expression Analysis Methods
The Expr type provides utility methods for analyzing expression structure and optimizing query execution:
The is_trivially_true() method identifies expressions that cannot filter out any rows, such as:
- Unbounded range filters:
Operator::Range { lower: Unbounded, upper: Unbounded } - Literal true values:
Expr::Literal(true)
Scan executors use this method to skip predicate evaluation entirely when the filter is guaranteed to match all rows, avoiding unnecessary computation.
Sources: llkv-expr/src/expr.rs:87-123
graph TB
subgraph "Query Plans"
SelectPlan["SelectPlan"]
InsertPlan["InsertPlan"]
UpdatePlan["UpdatePlan"]
DeletePlan["DeletePlan"]
end
subgraph "Expression Usage"
FilterUsage["filter: Option<SelectFilter>\nWHERE clauses"]
HavingUsage["having: Option<Expr>\nHAVING clauses"]
ProjUsage["projections with ScalarExpr\nSELECT list"]
UpdateExpr["assignments with ScalarExpr\nSET clauses"]
JoinOn["on_condition in JoinMetadata\nJOIN ON clauses"]
end
SelectPlan --> FilterUsage
SelectPlan --> HavingUsage
SelectPlan --> ProjUsage
SelectPlan --> JoinOn
UpdatePlan --> UpdateExpr
UpdatePlan --> FilterUsage
DeletePlan --> FilterUsage
Integration with Query Plans
Expressions integrate deeply with the query planning system defined in llkv-plan:
Expression Usage in Plans
| Plan Type | Expression Field | Expression Type | Purpose |
|---|---|---|---|
SelectPlan | filter | SelectFilter with Expr<'static, String> | WHERE clause filtering |
SelectPlan | having | Expr<'static, String> | HAVING clause filtering |
SelectPlan | projections | SelectProjection::Computed with ScalarExpr<String> | Computed columns |
SelectPlan | joins[].on_condition | Expr<'static, String> | JOIN ON conditions |
UpdatePlan | filter | Expr<'static, String> | WHERE clause for updates |
UpdatePlan | assignments[].value | AssignmentValue::Expression with ScalarExpr<String> | SET clause expressions |
DeletePlan | filter | Expr<'static, String> | WHERE clause for deletes |
All plan-level expressions use String as the field identifier type since plans are constructed during SQL parsing before schema resolution. The query executor translates these to FieldId-based expressions before evaluation.
Sources: llkv-plan/src/plans.rs:27-34 llkv-plan/src/plans.rs:660-692 llkv-plan/src/plans.rs:794-828
Dismiss
Refresh this wiki
Enter email to refresh