Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

GitHub

This documentation is part of the "Projects with Books" initiative at zenOSmosis.

The source code for this project is available on GitHub.

Expression AST

Loading…

Expression AST

Relevant source files

Purpose and Scope

The Expression AST defines the abstract syntax tree for all logical and arithmetic expressions in LLKV. This module provides the core data structures used to represent predicates, scalar computations, aggregations, and subqueries throughout the query processing pipeline. The AST is type-agnostic and decoupled from Arrow’s concrete scalar types, allowing expressions to be constructed, transformed, and optimized before being bound to specific table schemas.

For information about translating expressions from string column names to field identifiers, see Expression Translation. For information about compiling expressions into executable bytecode, see Program Compilation. For information about evaluating expressions against data, see Scalar Evaluation and NumericKernels.

Sources: llkv-expr/src/expr.rs:1-14

Expression System Architecture

The expression system consists of two primary AST types that serve distinct purposes in query processing:

Expr<'a, F> represents boolean-valued expressions used in WHERE clauses, HAVING clauses, and join conditions. These expressions combine logical operators (AND, OR, NOT) with predicates that test column values against literals or ranges.

graph TB
    subgraph "Boolean Predicates"
        Expr["Expr&lt;'a, F&gt;\nBoolean logic"]
Filter["Filter&lt;'a, F&gt;\nField predicate"]
Operator["Operator&lt;'a&gt;\nComparison operators"]
end
    
    subgraph "Value Expressions"
        ScalarExpr["ScalarExpr&lt;F&gt;\nArithmetic expressions"]
BinaryOp["BinaryOp\n+, -, *, /, %"]
CompareOp["CompareOp\n=, !=, &lt;, &gt;"]
AggregateCall["AggregateCall&lt;F&gt;\nCOUNT, SUM, AVG"]
end
    
    subgraph "Supporting Types"
        Literal["Literal\nType-agnostic values"]
SubqueryExpr["SubqueryExpr\nEXISTS predicate"]
ScalarSubqueryExpr["ScalarSubqueryExpr\nScalar subquery"]
end
    
 
   Expr --> Filter
 
   Filter --> Operator
 
   Expr --> ScalarExpr
 
   ScalarExpr --> BinaryOp
 
   ScalarExpr --> CompareOp
 
   ScalarExpr --> AggregateCall
 
   ScalarExpr --> Literal
 
   Expr --> SubqueryExpr
 
   ScalarExpr --> ScalarSubqueryExpr
 
   Operator --> Literal

ScalarExpr<F> represents value-producing expressions used in SELECT projections, computed columns, and UPDATE assignments. These expressions support arithmetic operations, function calls, type casts, and complex nested computations.

Both types are parameterized by a generic field identifier type F, enabling the same AST structures to be used with different column reference schemes (e.g., String names during planning, FieldId integers during execution).

Sources: llkv-expr/src/expr.rs:14-43 llkv-expr/src/expr.rs:125-182

Boolean Predicate Expressions

The Expr<'a, F> enum represents logical expressions that evaluate to boolean values. It forms the foundation for filtering operations throughout the query engine.

graph TD
    Expr["Expr&lt;'a, F&gt;"]
And["And(Vec&lt;Expr&gt;)\nLogical conjunction"]
Or["Or(Vec&lt;Expr&gt;)\nLogical disjunction"]
Not["Not(Box&lt;Expr&gt;)\nLogical negation"]
Pred["Pred(Filter)\nField predicate"]
Compare["Compare\nScalar comparison"]
InList["InList\nSet membership"]
IsNull["IsNull\nNULL test"]
Literal["Literal(bool)\nConstant boolean"]
Exists["Exists(SubqueryExpr)\nCorrelated EXISTS"]
Expr --> And
 
   Expr --> Or
 
   Expr --> Not
 
   Expr --> Pred
 
   Expr --> Compare
 
   Expr --> InList
 
   Expr --> IsNull
 
   Expr --> Literal
 
   Expr --> Exists

Expr Variants

VariantPurposeExample SQL
And(Vec<Expr>)Logical conjunction of sub-expressionscol1 = 5 AND col2 > 10
Or(Vec<Expr>)Logical disjunction of sub-expressionsstatus = 'active' OR status = 'pending'
Not(Box<Expr>)Logical negationNOT (age < 18)
Pred(Filter)Single-field predicate with operatorprice < 100
CompareComparison between two scalar expressionscol1 + col2 > col3 * 2
InListSet membership teststatus IN ('active', 'pending')
IsNullNULL test for complex expressions(col1 + col2) IS NULL
Literal(bool)Constant true/false valueWHERE true
Exists(SubqueryExpr)Correlated subquery existence testEXISTS (SELECT 1 FROM t2 WHERE ...)

The And and Or variants accept vectors of expressions, allowing efficient representation of multi-way logical operations without deep nesting. The Pred variant wraps a Filter<'a, F> structure for simple single-column predicates, which can be efficiently pushed down to storage layer scanning operations.

Sources: llkv-expr/src/expr.rs:14-43 llkv-expr/src/expr.rs:67-123

Filter and Operator Types

The Filter<'a, F> structure encapsulates a single predicate against a field, combining a field identifier with an operator:

The Operator<'a> enum defines comparison and pattern-matching operations over untyped Literal values:

Operator VariantDescriptionExample
Equals(Literal)Exact equality teststatus = 'active'
Range { lower, upper }Bounded range testage BETWEEN 18 AND 65
GreaterThan(Literal)Greater than comparisonprice > 100.0
GreaterThanOrEquals(Literal)Greater than or equalquantity >= 10
LessThan(Literal)Less than comparisonage < 18
LessThanOrEquals(Literal)Less than or equalscore <= 100
In(&'a [Literal])Set membershipstatus IN ('active', 'pending')
StartsWith { pattern, case_sensitive }Prefix matchname LIKE 'John%'
EndsWith { pattern, case_sensitive }Suffix matchemail LIKE '%@example.com'
Contains { pattern, case_sensitive }Substring matchdescription LIKE '%keyword%'
IsNullNULL testemail IS NULL
IsNotNullNon-NULL testemail IS NOT NULL

The In operator accepts a borrowed slice of literals to avoid allocations for small, static IN lists. The pattern-matching operators (StartsWith, EndsWith, Contains) support both case-sensitive and case-insensitive matching.

Sources: llkv-expr/src/expr.rs:365-428

graph TD
    ScalarExpr["ScalarExpr&lt;F&gt;"]
Column["Column(F)\nField reference"]
Literal["Literal\nConstant value"]
Binary["Binary\nArithmetic operation"]
Not["Not\nLogical negation"]
IsNull["IsNull\nNULL test"]
Aggregate["Aggregate\nAggregate function"]
GetField["GetField\nStruct field access"]
Cast["Cast\nType conversion"]
Compare["Compare\nBoolean comparison"]
Coalesce["Coalesce\nFirst non-NULL"]
ScalarSubquery["ScalarSubquery\nSubquery result"]
Case["Case\nConditional expression"]
Random["Random\nRandom number"]
ScalarExpr --> Column
 
   ScalarExpr --> Literal
 
   ScalarExpr --> Binary
 
   ScalarExpr --> Not
 
   ScalarExpr --> IsNull
 
   ScalarExpr --> Aggregate
 
   ScalarExpr --> GetField
 
   ScalarExpr --> Cast
 
   ScalarExpr --> Compare
 
   ScalarExpr --> Coalesce
 
   ScalarExpr --> ScalarSubquery
 
   ScalarExpr --> Case
 
   ScalarExpr --> Random

Scalar Value Expressions

The ScalarExpr<F> enum represents expressions that produce scalar values. These are used in SELECT projections, computed columns, ORDER BY clauses, and anywhere a value (rather than a boolean) is needed.

ScalarExpr Variants

VariantPurposeExample SQL
Column(F)Reference to a table columnprice
Literal(Literal)Constant value42, 'hello', 3.14
Binary { left, op, right }Arithmetic or logical binary operationprice * quantity
Not(Box<ScalarExpr>)Logical negation returning 1/0NOT active
IsNull { expr, negated }NULL test returning 1/0col1 IS NULL
Aggregate(AggregateCall)Aggregate function callCOUNT(*) + 1
GetField { base, field_name }Struct field extractionuser.address.city
Cast { expr, data_type }Explicit type castCAST(price AS INTEGER)
Compare { left, op, right }Comparison producing boolean resultprice > 100
Coalesce(Vec<ScalarExpr>)First non-NULL expressionCOALESCE(nickname, username)
ScalarSubquery(ScalarSubqueryExpr)Scalar subquery result(SELECT MAX(price) FROM ...)
Case { operand, branches, else_expr }Conditional expressionCASE WHEN x > 0 THEN 1 ELSE -1 END
RandomRandom number generatorRANDOM()

The Binary variant supports arithmetic operators (+, -, *, /, %) as well as logical operators (AND, OR) and bitwise shift operators (<<, >>). The Compare variant produces boolean results (represented as 1/0 integers) from comparisons like col1 > col2.

Sources: llkv-expr/src/expr.rs:125-307

Binary and Comparison Operators

The BinaryOp enum defines arithmetic and logical binary operators:

The CompareOp enum defines comparison operators for scalar expressions:

These operators enable complex arithmetic expressions like (price * quantity * (1 - discount)) > 1000 and nested logical operations like (col1 + col2) > (col3 * 2) AND status = 'active'.

Sources: llkv-expr/src/expr.rs:309-363

Aggregate Function Calls

The AggregateCall<F> enum represents aggregate function invocations within scalar expressions. Unlike simple column aggregates, these variants operate on full expressions, enabling aggregations like AVG(col1 + col2) or SUM(-price).

graph LR
    AggregateCall["AggregateCall&lt;F&gt;"]
CountStar["CountStar\nCOUNT(*)"]
Count["Count { expr, distinct }\nCOUNT(expr)"]
Sum["Sum { expr, distinct }\nSUM(expr)"]
Total["Total { expr, distinct }\nTOTAL(expr)"]
Avg["Avg { expr, distinct }\nAVG(expr)"]
Min["Min(expr)\nMIN(expr)"]
Max["Max(expr)\nMAX(expr)"]
CountNulls["CountNulls(expr)\nCOUNT_NULLS(expr)"]
GroupConcat["GroupConcat { expr, distinct, separator }\nGROUP_CONCAT(expr)"]
AggregateCall --> CountStar
 
   AggregateCall --> Count
 
   AggregateCall --> Sum
 
   AggregateCall --> Total
 
   AggregateCall --> Avg
 
   AggregateCall --> Min
 
   AggregateCall --> Max
 
   AggregateCall --> CountNulls
 
   AggregateCall --> GroupConcat

AggregateCall Variants

VariantSQL EquivalentDistinct SupportDescription
CountStarCOUNT(*)NoCount all rows including NULLs
Count { expr, distinct }COUNT(expr)YesCount non-NULL expression values
Sum { expr, distinct }SUM(expr)YesSum of expression values
Total { expr, distinct }TOTAL(expr)YesSum returning 0 for empty set (SQLite semantics)
Avg { expr, distinct }AVG(expr)YesAverage of expression values
Min(expr)MIN(expr)NoMinimum expression value
Max(expr)MAX(expr)NoMaximum expression value
CountNulls(expr)N/ANoCount NULL values in expression
GroupConcat { expr, distinct, separator }GROUP_CONCAT(expr)YesConcatenate values with separator

All variants except CountStar accept a Box<ScalarExpr<F>>, allowing aggregates to operate on computed expressions. For example, SUM(price * quantity) is represented as:

AggregateCall::Sum {
    expr: Box::new(ScalarExpr::Binary {
        left: Box::new(ScalarExpr::Column("price")),
        op: BinaryOp::Multiply,
        right: Box::new(ScalarExpr::Column("quantity")),
    }),
    distinct: false,
}

Sources: llkv-expr/src/expr.rs:184-215 llkv-expr/src/expr.rs:217-307

Subquery Integration

The expression AST supports two forms of subquery integration: boolean EXISTS predicates and scalar subqueries that produce single values.

Boolean EXISTS Subqueries

The SubqueryExpr structure represents a correlated EXISTS predicate within a boolean expression:

The id field references a subquery definition stored separately in the query plan (see FilterSubquery in llkv-plan/src/plans.rs:36-45), while negated indicates whether the SQL used NOT EXISTS. The separation of subquery metadata from the expression tree allows the same subquery to be referenced multiple times without duplication.

Scalar Subqueries

The ScalarSubqueryExpr structure represents a subquery that returns a single scalar value:

graph TB
    SelectPlan["SelectPlan"]
ExprFilter["filter: Option&lt;SelectFilter&gt;"]
SelectFilter["SelectFilter"]
Predicate["predicate: Expr&lt;'static, String&gt;"]
FilterSubqueries["subqueries: Vec&lt;FilterSubquery&gt;"]
ScalarSubqueries["scalar_subqueries: Vec&lt;ScalarSubquery&gt;"]
Projections["projections: Vec&lt;SelectProjection&gt;"]
ComputedProj["Computed { expr, alias }"]
ScalarExpr["expr: ScalarExpr&lt;String&gt;"]
SelectPlan --> ExprFilter
 
   ExprFilter --> SelectFilter
 
   SelectFilter --> Predicate
 
   SelectFilter --> FilterSubqueries
    
 
   SelectPlan --> ScalarSubqueries
 
   SelectPlan --> Projections
 
   Projections --> ComputedProj
 
   ComputedProj --> ScalarExpr
    
    Predicate -.references.-> FilterSubqueries
    ScalarExpr -.references.-> ScalarSubqueries

Scalar subqueries appear in value-producing contexts like SELECT (SELECT MAX(price) FROM items) AS max_price or WHERE quantity > (SELECT AVG(quantity) FROM inventory). The data_type field captures the Arrow data type of the subquery’s output column, enabling type checking during expression compilation.

The query planner populates the subqueries field in SelectFilter and the scalar_subqueries field in SelectPlan with complete subquery definitions, while expressions reference them by SubqueryId. This architecture enables efficient subquery execution and correlation tracking during query evaluation.

Sources: llkv-expr/src/expr.rs:45-66 llkv-plan/src/plans.rs:27-67

Literal Values

Expressions reference the Literal type from the llkv-expr crate’s literal module to represent constant values. The literal system is type-agnostic, deferring concrete type resolution until expressions are evaluated against actual table schemas.

The Literal enum (defined in llkv-expr/src/literal.rs) supports:

  • Null : SQL NULL value
  • Int128(i128) : Integer literals (wide representation to handle all integer sizes)
  • Float64(f64) : Floating-point literals
  • Decimal128(DecimalValue) : Fixed-precision decimal literals
  • String(String) : Text literals
  • Boolean(bool) : True/false literals
  • Date32(i32) : Date literals (days since Unix epoch)
  • Struct(Vec <(String, Literal)>): Structured data literals
  • Interval(IntervalValue) : Time interval literals

The use of Int128 for integer literals allows the expression system to represent values that may exceed i64 range during intermediate computations, with overflow checks deferred to evaluation time. The Decimal128 variant uses the DecimalValue type from llkv-types to maintain precision and scale metadata.

Literals can be constructed through From trait implementations, enabling ergonomic expression building:

Sources: llkv-expr/src/expr.rs10

Type Parameterization and Field References

Both Expr<'a, F> and ScalarExpr<F> are parameterized by a generic type F that represents field identifiers. This design enables the same AST structures to be used at different stages of query processing with different column reference schemes:

graph LR
    subgraph "Planning Stage"
        ExprString["Expr&lt;'static, String&gt;\nColumn names"]
ScalarString["ScalarExpr&lt;String&gt;\nColumn names"]
end
    
    subgraph "Translation Stage"
        Translator["Expression Translator"]
end
    
    subgraph "Execution Stage"
        ExprFieldId["Expr&lt;'static, FieldId&gt;\nNumeric IDs"]
ScalarFieldId["ScalarExpr&lt;FieldId&gt;\nNumeric IDs"]
end
    
 
   ExprString --> Translator
 
   ScalarString --> Translator
 
   Translator --> ExprFieldId
 
   Translator --> ScalarFieldId

Common Type Instantiations

StageField TypeUse CaseExample Types
PlanningStringSQL parsing and plan constructionExpr<'static, String>
ExecutionFieldIdTable scanning and evaluationExpr<'static, FieldId>
Testingu32 or &strLightweight tests without full schemaExpr<'static, u32>

The lifetime parameter 'a in Expr<'a, F> represents the lifetime of borrowed data in Operator::In(&'a [Literal]), allowing filter expressions to reference static IN lists without heap allocation. Most expressions use 'static lifetime, indicating no borrowed data.

Field Reference Translation

The expression translation system (see Expression Translation) converts expressions from string-based column names to numeric FieldId identifiers by walking the AST and resolving names against table schemas. This transformation is represented by the generic parameter substitution:

Expr<'static, String>  →  Expr<'static, FieldId>
ScalarExpr<String>     →  ScalarExpr<FieldId>

The generic design allows the same expression evaluation logic to work with both naming schemes without code duplication, while maintaining type safety at compile time.

Sources: llkv-expr/src/expr.rs:14-21 llkv-expr/src/expr.rs:125-134 llkv-expr/src/expr.rs:365-370

Expression Construction Helpers

Both Expr and ScalarExpr provide builder methods for ergonomic AST construction:

Expr Helpers

ScalarExpr Helpers

These helpers simplify expression construction in query planners and translators by providing clearer semantics than direct enum variant construction.

Sources: llkv-expr/src/expr.rs:67-86 llkv-expr/src/expr.rs:217-307

Expression Analysis Methods

The Expr type provides utility methods for analyzing expression structure and optimizing query execution:

The is_trivially_true() method identifies expressions that cannot filter out any rows, such as:

  • Unbounded range filters: Operator::Range { lower: Unbounded, upper: Unbounded }
  • Literal true values: Expr::Literal(true)

Scan executors use this method to skip predicate evaluation entirely when the filter is guaranteed to match all rows, avoiding unnecessary computation.

Sources: llkv-expr/src/expr.rs:87-123

graph TB
    subgraph "Query Plans"
        SelectPlan["SelectPlan"]
InsertPlan["InsertPlan"]
UpdatePlan["UpdatePlan"]
DeletePlan["DeletePlan"]
end
    
    subgraph "Expression Usage"
        FilterUsage["filter: Option&lt;SelectFilter&gt;\nWHERE clauses"]
HavingUsage["having: Option&lt;Expr&gt;\nHAVING clauses"]
ProjUsage["projections with ScalarExpr\nSELECT list"]
UpdateExpr["assignments with ScalarExpr\nSET clauses"]
JoinOn["on_condition in JoinMetadata\nJOIN ON clauses"]
end
    
 
   SelectPlan --> FilterUsage
 
   SelectPlan --> HavingUsage
 
   SelectPlan --> ProjUsage
 
   SelectPlan --> JoinOn
    
 
   UpdatePlan --> UpdateExpr
 
   UpdatePlan --> FilterUsage
    
 
   DeletePlan --> FilterUsage

Integration with Query Plans

Expressions integrate deeply with the query planning system defined in llkv-plan:

Expression Usage in Plans

Plan TypeExpression FieldExpression TypePurpose
SelectPlanfilterSelectFilter with Expr<'static, String>WHERE clause filtering
SelectPlanhavingExpr<'static, String>HAVING clause filtering
SelectPlanprojectionsSelectProjection::Computed with ScalarExpr<String>Computed columns
SelectPlanjoins[].on_conditionExpr<'static, String>JOIN ON conditions
UpdatePlanfilterExpr<'static, String>WHERE clause for updates
UpdatePlanassignments[].valueAssignmentValue::Expression with ScalarExpr<String>SET clause expressions
DeletePlanfilterExpr<'static, String>WHERE clause for deletes

All plan-level expressions use String as the field identifier type since plans are constructed during SQL parsing before schema resolution. The query executor translates these to FieldId-based expressions before evaluation.

Sources: llkv-plan/src/plans.rs:27-34 llkv-plan/src/plans.rs:660-692 llkv-plan/src/plans.rs:794-828

Dismiss

Refresh this wiki

Enter email to refresh