This documentation is part of the "Projects with Books" initiative at zenOSmosis.

The source code for this project is available on GitHub.

Expression AST

Loading…

Expression AST

Relevant source files

Purpose and Scope

The Expression AST defines the abstract syntax tree for all logical and arithmetic expressions in LLKV. This module provides the core data structures used to represent predicates, scalar computations, aggregations, and subqueries throughout the query processing pipeline. The AST is type-agnostic and decoupled from Arrow’s concrete scalar types, allowing expressions to be constructed, transformed, and optimized before being bound to specific table schemas.

For information about translating expressions from string column names to field identifiers, see Expression Translation. For information about compiling expressions into executable bytecode, see Program Compilation. For information about evaluating expressions against data, see Scalar Evaluation and NumericKernels.

Sources: llkv-expr/src/expr.rs:1-14

Expression System Architecture

The expression system consists of two primary AST types that serve distinct purposes in query processing:

Expr<'a, F> represents boolean-valued expressions used in WHERE clauses, HAVING clauses, and join conditions. These expressions combine logical operators (AND, OR, NOT) with predicates that test column values against literals or ranges.

graph TB
    subgraph "Boolean Predicates"
        Expr["Expr&lt;'a, F&gt;\nBoolean logic"]
Filter["Filter&lt;'a, F&gt;\nField predicate"]
Operator["Operator&lt;'a&gt;\nComparison operators"]
end
    
    subgraph "Value Expressions"
        ScalarExpr["ScalarExpr&lt;F&gt;\nArithmetic expressions"]
BinaryOp["BinaryOp\n+, -, *, /, %"]
CompareOp["CompareOp\n=, !=, &lt;, &gt;"]
AggregateCall["AggregateCall&lt;F&gt;\nCOUNT, SUM, AVG"]
end
    
    subgraph "Supporting Types"
        Literal["Literal\nType-agnostic values"]
SubqueryExpr["SubqueryExpr\nEXISTS predicate"]
ScalarSubqueryExpr["ScalarSubqueryExpr\nScalar subquery"]
end
    
 
   Expr --> Filter
 
   Filter --> Operator
 
   Expr --> ScalarExpr
 
   ScalarExpr --> BinaryOp
 
   ScalarExpr --> CompareOp
 
   ScalarExpr --> AggregateCall
 
   ScalarExpr --> Literal
 
   Expr --> SubqueryExpr
 
   ScalarExpr --> ScalarSubqueryExpr
 
   Operator --> Literal

ScalarExpr<F> represents value-producing expressions used in SELECT projections, computed columns, and UPDATE assignments. These expressions support arithmetic operations, function calls, type casts, and complex nested computations.

Both types are parameterized by a generic field identifier type F, enabling the same AST structures to be used with different column reference schemes (e.g., String names during planning, FieldId integers during execution).

Sources: llkv-expr/src/expr.rs:14-43 llkv-expr/src/expr.rs:125-182

Boolean Predicate Expressions

The Expr<'a, F> enum represents logical expressions that evaluate to boolean values. It forms the foundation for filtering operations throughout the query engine.

graph TD
    Expr["Expr&lt;'a, F&gt;"]
And["And(Vec&lt;Expr&gt;)\nLogical conjunction"]
Or["Or(Vec&lt;Expr&gt;)\nLogical disjunction"]
Not["Not(Box&lt;Expr&gt;)\nLogical negation"]
Pred["Pred(Filter)\nField predicate"]
Compare["Compare\nScalar comparison"]
InList["InList\nSet membership"]
IsNull["IsNull\nNULL test"]
Literal["Literal(bool)\nConstant boolean"]
Exists["Exists(SubqueryExpr)\nCorrelated EXISTS"]
Expr --> And
 
   Expr --> Or
 
   Expr --> Not
 
   Expr --> Pred
 
   Expr --> Compare
 
   Expr --> InList
 
   Expr --> IsNull
 
   Expr --> Literal
 
   Expr --> Exists

Expr Variants

Variant	Purpose	Example SQL
`And(Vec<Expr>)`	Logical conjunction of sub-expressions	`col1 = 5 AND col2 > 10`
`Or(Vec<Expr>)`	Logical disjunction of sub-expressions	`status = 'active' OR status = 'pending'`
`Not(Box<Expr>)`	Logical negation	`NOT (age < 18)`
`Pred(Filter)`	Single-field predicate with operator	`price < 100`
`Compare`	Comparison between two scalar expressions	`col1 + col2 > col3 * 2`
`InList`	Set membership test	`status IN ('active', 'pending')`
`IsNull`	NULL test for complex expressions	`(col1 + col2) IS NULL`
`Literal(bool)`	Constant true/false value	`WHERE true`
`Exists(SubqueryExpr)`	Correlated subquery existence test	`EXISTS (SELECT 1 FROM t2 WHERE ...)`

The And and Or variants accept vectors of expressions, allowing efficient representation of multi-way logical operations without deep nesting. The Pred variant wraps a Filter<'a, F> structure for simple single-column predicates, which can be efficiently pushed down to storage layer scanning operations.

Sources: llkv-expr/src/expr.rs:14-43 llkv-expr/src/expr.rs:67-123

Filter and Operator Types

The Filter<'a, F> structure encapsulates a single predicate against a field, combining a field identifier with an operator:

The Operator<'a> enum defines comparison and pattern-matching operations over untyped Literal values:

Operator Variant	Description	Example
`Equals(Literal)`	Exact equality test	`status = 'active'`
`Range { lower, upper }`	Bounded range test	`age BETWEEN 18 AND 65`
`GreaterThan(Literal)`	Greater than comparison	`price > 100.0`
`GreaterThanOrEquals(Literal)`	Greater than or equal	`quantity >= 10`
`LessThan(Literal)`	Less than comparison	`age < 18`
`LessThanOrEquals(Literal)`	Less than or equal	`score <= 100`
`In(&'a [Literal])`	Set membership	`status IN ('active', 'pending')`
`StartsWith { pattern, case_sensitive }`	Prefix match	`name LIKE 'John%'`
`EndsWith { pattern, case_sensitive }`	Suffix match	`email LIKE '%@example.com'`
`Contains { pattern, case_sensitive }`	Substring match	`description LIKE '%keyword%'`
`IsNull`	NULL test	`email IS NULL`
`IsNotNull`	Non-NULL test	`email IS NOT NULL`

The In operator accepts a borrowed slice of literals to avoid allocations for small, static IN lists. The pattern-matching operators (StartsWith, EndsWith, Contains) support both case-sensitive and case-insensitive matching.

Sources: llkv-expr/src/expr.rs:365-428

graph TD
    ScalarExpr["ScalarExpr&lt;F&gt;"]
Column["Column(F)\nField reference"]
Literal["Literal\nConstant value"]
Binary["Binary\nArithmetic operation"]
Not["Not\nLogical negation"]
IsNull["IsNull\nNULL test"]
Aggregate["Aggregate\nAggregate function"]
GetField["GetField\nStruct field access"]
Cast["Cast\nType conversion"]
Compare["Compare\nBoolean comparison"]
Coalesce["Coalesce\nFirst non-NULL"]
ScalarSubquery["ScalarSubquery\nSubquery result"]
Case["Case\nConditional expression"]
Random["Random\nRandom number"]
ScalarExpr --> Column
 
   ScalarExpr --> Literal
 
   ScalarExpr --> Binary
 
   ScalarExpr --> Not
 
   ScalarExpr --> IsNull
 
   ScalarExpr --> Aggregate
 
   ScalarExpr --> GetField
 
   ScalarExpr --> Cast
 
   ScalarExpr --> Compare
 
   ScalarExpr --> Coalesce
 
   ScalarExpr --> ScalarSubquery
 
   ScalarExpr --> Case
 
   ScalarExpr --> Random

Scalar Value Expressions

The ScalarExpr<F> enum represents expressions that produce scalar values. These are used in SELECT projections, computed columns, ORDER BY clauses, and anywhere a value (rather than a boolean) is needed.

ScalarExpr Variants

Variant	Purpose	Example SQL
`Column(F)`	Reference to a table column	`price`
`Literal(Literal)`	Constant value	`42`, `'hello'`, `3.14`
`Binary { left, op, right }`	Arithmetic or logical binary operation	`price * quantity`
`Not(Box<ScalarExpr>)`	Logical negation returning 1/0	`NOT active`
`IsNull { expr, negated }`	NULL test returning 1/0	`col1 IS NULL`
`Aggregate(AggregateCall)`	Aggregate function call	`COUNT(*) + 1`
`GetField { base, field_name }`	Struct field extraction	`user.address.city`
`Cast { expr, data_type }`	Explicit type cast	`CAST(price AS INTEGER)`
`Compare { left, op, right }`	Comparison producing boolean result	`price > 100`
`Coalesce(Vec<ScalarExpr>)`	First non-NULL expression	`COALESCE(nickname, username)`
`ScalarSubquery(ScalarSubqueryExpr)`	Scalar subquery result	`(SELECT MAX(price) FROM ...)`
`Case { operand, branches, else_expr }`	Conditional expression	`CASE WHEN x > 0 THEN 1 ELSE -1 END`
`Random`	Random number generator	`RANDOM()`

The Binary variant supports arithmetic operators (+, -, *, /, %) as well as logical operators (AND, OR) and bitwise shift operators (<<, >>). The Compare variant produces boolean results (represented as 1/0 integers) from comparisons like col1 > col2.

Sources: llkv-expr/src/expr.rs:125-307

Binary and Comparison Operators

The BinaryOp enum defines arithmetic and logical binary operators:

The CompareOp enum defines comparison operators for scalar expressions:

These operators enable complex arithmetic expressions like (price * quantity * (1 - discount)) > 1000 and nested logical operations like (col1 + col2) > (col3 * 2) AND status = 'active'.

Sources: llkv-expr/src/expr.rs:309-363

Aggregate Function Calls

The AggregateCall<F> enum represents aggregate function invocations within scalar expressions. Unlike simple column aggregates, these variants operate on full expressions, enabling aggregations like AVG(col1 + col2) or SUM(-price).

graph LR
    AggregateCall["AggregateCall&lt;F&gt;"]
CountStar["CountStar\nCOUNT(*)"]
Count["Count { expr, distinct }\nCOUNT(expr)"]
Sum["Sum { expr, distinct }\nSUM(expr)"]
Total["Total { expr, distinct }\nTOTAL(expr)"]
Avg["Avg { expr, distinct }\nAVG(expr)"]
Min["Min(expr)\nMIN(expr)"]
Max["Max(expr)\nMAX(expr)"]
CountNulls["CountNulls(expr)\nCOUNT_NULLS(expr)"]
GroupConcat["GroupConcat { expr, distinct, separator }\nGROUP_CONCAT(expr)"]
AggregateCall --> CountStar
 
   AggregateCall --> Count
 
   AggregateCall --> Sum
 
   AggregateCall --> Total
 
   AggregateCall --> Avg
 
   AggregateCall --> Min
 
   AggregateCall --> Max
 
   AggregateCall --> CountNulls
 
   AggregateCall --> GroupConcat

AggregateCall Variants

Variant	SQL Equivalent	Distinct Support	Description
`CountStar`	`COUNT(*)`	No	Count all rows including NULLs
`Count { expr, distinct }`	`COUNT(expr)`	Yes	Count non-NULL expression values
`Sum { expr, distinct }`	`SUM(expr)`	Yes	Sum of expression values
`Total { expr, distinct }`	`TOTAL(expr)`	Yes	Sum returning 0 for empty set (SQLite semantics)
`Avg { expr, distinct }`	`AVG(expr)`	Yes	Average of expression values
`Min(expr)`	`MIN(expr)`	No	Minimum expression value
`Max(expr)`	`MAX(expr)`	No	Maximum expression value
`CountNulls(expr)`	N/A	No	Count NULL values in expression
`GroupConcat { expr, distinct, separator }`	`GROUP_CONCAT(expr)`	Yes	Concatenate values with separator

All variants except CountStar accept a Box<ScalarExpr<F>>, allowing aggregates to operate on computed expressions. For example, SUM(price * quantity) is represented as:

AggregateCall::Sum {
    expr: Box::new(ScalarExpr::Binary {
        left: Box::new(ScalarExpr::Column("price")),
        op: BinaryOp::Multiply,
        right: Box::new(ScalarExpr::Column("quantity")),
    }),
    distinct: false,
}

Sources: llkv-expr/src/expr.rs:184-215 llkv-expr/src/expr.rs:217-307

Subquery Integration

The expression AST supports two forms of subquery integration: boolean EXISTS predicates and scalar subqueries that produce single values.

Boolean EXISTS Subqueries

The SubqueryExpr structure represents a correlated EXISTS predicate within a boolean expression:

The id field references a subquery definition stored separately in the query plan (see FilterSubquery in llkv-plan/src/plans.rs:36-45), while negated indicates whether the SQL used NOT EXISTS. The separation of subquery metadata from the expression tree allows the same subquery to be referenced multiple times without duplication.

Scalar Subqueries

The ScalarSubqueryExpr structure represents a subquery that returns a single scalar value:

graph TB
    SelectPlan["SelectPlan"]
ExprFilter["filter: Option&lt;SelectFilter&gt;"]
SelectFilter["SelectFilter"]
Predicate["predicate: Expr&lt;'static, String&gt;"]
FilterSubqueries["subqueries: Vec&lt;FilterSubquery&gt;"]
ScalarSubqueries["scalar_subqueries: Vec&lt;ScalarSubquery&gt;"]
Projections["projections: Vec&lt;SelectProjection&gt;"]
ComputedProj["Computed { expr, alias }"]
ScalarExpr["expr: ScalarExpr&lt;String&gt;"]
SelectPlan --> ExprFilter
 
   ExprFilter --> SelectFilter
 
   SelectFilter --> Predicate
 
   SelectFilter --> FilterSubqueries
    
 
   SelectPlan --> ScalarSubqueries
 
   SelectPlan --> Projections
 
   Projections --> ComputedProj
 
   ComputedProj --> ScalarExpr
    
    Predicate -.references.-> FilterSubqueries
    ScalarExpr -.references.-> ScalarSubqueries

Scalar subqueries appear in value-producing contexts like SELECT (SELECT MAX(price) FROM items) AS max_price or WHERE quantity > (SELECT AVG(quantity) FROM inventory). The data_type field captures the Arrow data type of the subquery’s output column, enabling type checking during expression compilation.

The query planner populates the subqueries field in SelectFilter and the scalar_subqueries field in SelectPlan with complete subquery definitions, while expressions reference them by SubqueryId. This architecture enables efficient subquery execution and correlation tracking during query evaluation.

Sources: llkv-expr/src/expr.rs:45-66 llkv-plan/src/plans.rs:27-67

Literal Values

Expressions reference the Literal type from the llkv-expr crate’s literal module to represent constant values. The literal system is type-agnostic, deferring concrete type resolution until expressions are evaluated against actual table schemas.

The Literal enum (defined in llkv-expr/src/literal.rs) supports:

Null : SQL NULL value
Int128(i128) : Integer literals (wide representation to handle all integer sizes)
Float64(f64) : Floating-point literals
Decimal128(DecimalValue) : Fixed-precision decimal literals
String(String) : Text literals
Boolean(bool) : True/false literals
Date32(i32) : Date literals (days since Unix epoch)
Struct(Vec <(String, Literal)>): Structured data literals
Interval(IntervalValue) : Time interval literals

The use of Int128 for integer literals allows the expression system to represent values that may exceed i64 range during intermediate computations, with overflow checks deferred to evaluation time. The Decimal128 variant uses the DecimalValue type from llkv-types to maintain precision and scale metadata.

Literals can be constructed through From trait implementations, enabling ergonomic expression building:

Sources: llkv-expr/src/expr.rs10

Type Parameterization and Field References

Both Expr<'a, F> and ScalarExpr<F> are parameterized by a generic type F that represents field identifiers. This design enables the same AST structures to be used at different stages of query processing with different column reference schemes:

graph LR
    subgraph "Planning Stage"
        ExprString["Expr&lt;'static, String&gt;\nColumn names"]
ScalarString["ScalarExpr&lt;String&gt;\nColumn names"]
end
    
    subgraph "Translation Stage"
        Translator["Expression Translator"]
end
    
    subgraph "Execution Stage"
        ExprFieldId["Expr&lt;'static, FieldId&gt;\nNumeric IDs"]
ScalarFieldId["ScalarExpr&lt;FieldId&gt;\nNumeric IDs"]
end
    
 
   ExprString --> Translator
 
   ScalarString --> Translator
 
   Translator --> ExprFieldId
 
   Translator --> ScalarFieldId

Common Type Instantiations

Stage	Field Type	Use Case	Example Types
Planning	`String`	SQL parsing and plan construction	`Expr<'static, String>`
Execution	`FieldId`	Table scanning and evaluation	`Expr<'static, FieldId>`
Testing	`u32` or `&str`	Lightweight tests without full schema	`Expr<'static, u32>`

The lifetime parameter 'a in Expr<'a, F> represents the lifetime of borrowed data in Operator::In(&'a [Literal]), allowing filter expressions to reference static IN lists without heap allocation. Most expressions use 'static lifetime, indicating no borrowed data.

Field Reference Translation

The expression translation system (see Expression Translation) converts expressions from string-based column names to numeric FieldId identifiers by walking the AST and resolving names against table schemas. This transformation is represented by the generic parameter substitution:

Expr<'static, String>  →  Expr<'static, FieldId>
ScalarExpr<String>     →  ScalarExpr<FieldId>

The generic design allows the same expression evaluation logic to work with both naming schemes without code duplication, while maintaining type safety at compile time.

Sources: llkv-expr/src/expr.rs:14-21 llkv-expr/src/expr.rs:125-134 llkv-expr/src/expr.rs:365-370

Expression Construction Helpers

Both Expr and ScalarExpr provide builder methods for ergonomic AST construction:

Expr Helpers

ScalarExpr Helpers

These helpers simplify expression construction in query planners and translators by providing clearer semantics than direct enum variant construction.

Sources: llkv-expr/src/expr.rs:67-86 llkv-expr/src/expr.rs:217-307

Expression Analysis Methods

The Expr type provides utility methods for analyzing expression structure and optimizing query execution:

The is_trivially_true() method identifies expressions that cannot filter out any rows, such as:

Unbounded range filters: Operator::Range { lower: Unbounded, upper: Unbounded }
Literal true values: Expr::Literal(true)

Scan executors use this method to skip predicate evaluation entirely when the filter is guaranteed to match all rows, avoiding unnecessary computation.

Sources: llkv-expr/src/expr.rs:87-123

graph TB
    subgraph "Query Plans"
        SelectPlan["SelectPlan"]
InsertPlan["InsertPlan"]
UpdatePlan["UpdatePlan"]
DeletePlan["DeletePlan"]
end
    
    subgraph "Expression Usage"
        FilterUsage["filter: Option&lt;SelectFilter&gt;\nWHERE clauses"]
HavingUsage["having: Option&lt;Expr&gt;\nHAVING clauses"]
ProjUsage["projections with ScalarExpr\nSELECT list"]
UpdateExpr["assignments with ScalarExpr\nSET clauses"]
JoinOn["on_condition in JoinMetadata\nJOIN ON clauses"]
end
    
 
   SelectPlan --> FilterUsage
 
   SelectPlan --> HavingUsage
 
   SelectPlan --> ProjUsage
 
   SelectPlan --> JoinOn
    
 
   UpdatePlan --> UpdateExpr
 
   UpdatePlan --> FilterUsage
    
 
   DeletePlan --> FilterUsage

Integration with Query Plans

Expressions integrate deeply with the query planning system defined in llkv-plan:

Expression Usage in Plans

Plan Type	Expression Field	Expression Type	Purpose
`SelectPlan`	`filter`	`SelectFilter` with `Expr<'static, String>`	WHERE clause filtering
`SelectPlan`	`having`	`Expr<'static, String>`	HAVING clause filtering
`SelectPlan`	`projections`	`SelectProjection::Computed` with `ScalarExpr<String>`	Computed columns
`SelectPlan`	`joins[].on_condition`	`Expr<'static, String>`	JOIN ON conditions
`UpdatePlan`	`filter`	`Expr<'static, String>`	WHERE clause for updates
`UpdatePlan`	`assignments[].value`	`AssignmentValue::Expression` with `ScalarExpr<String>`	SET clause expressions
`DeletePlan`	`filter`	`Expr<'static, String>`	WHERE clause for deletes

All plan-level expressions use String as the field identifier type since plans are constructed during SQL parsing before schema resolution. The query executor translates these to FieldId-based expressions before evaluation.

Sources: llkv-plan/src/plans.rs:27-34 llkv-plan/src/plans.rs:660-692 llkv-plan/src/plans.rs:794-828

Dismiss

Refresh this wiki

Enter email to refresh

Keyboard shortcuts

rust-llkv Documentation