This documentation is part of the "Projects with Books" initiative at zenOSmosis.

The source code for this project is available on GitHub.

Expression System

Loading…

Expression System

Relevant source files

The Expression System provides the foundational Abstract Syntax Tree (AST) and type system for representing predicates, scalar computations, and aggregate functions throughout LLKV’s query processing pipeline. This system decouples expression semantics from concrete Arrow data types through a generic design that allows expressions to be built, translated, optimized, and evaluated independently.

For information about query planning structures that contain these expressions, see Query Planning. For details on how expressions are executed and evaluated, see Query Execution.

Purpose and Scope

This page documents the expression representation layer defined primarily in llkv-expr, covering:

Boolean predicate expressions (Expr) for WHERE and HAVING clauses
Scalar arithmetic expressions (ScalarExpr) for projections and computed columns
Operator types and literal value representations
Subquery integration points
Generic field identifier patterns that enable expression translation from column names to internal field IDs

graph TB
    subgraph "Boolean Expression Layer"
        Expr["Expr&lt;'a, F&gt;\nBoolean Predicates"]
ExprAnd["And(Vec&lt;Expr&gt;)"]
ExprOr["Or(Vec&lt;Expr&gt;)"]
ExprNot["Not(Box&lt;Expr&gt;)"]
ExprPred["Pred(Filter)"]
ExprCompare["Compare{left, op, right}"]
ExprInList["InList{expr, list, negated}"]
ExprIsNull["IsNull{expr, negated}"]
ExprLiteral["Literal(bool)"]
ExprExists["Exists(SubqueryExpr)"]
end
    
    subgraph "Scalar Expression Layer"
        ScalarExpr["ScalarExpr&lt;F&gt;\nArithmetic/Values"]
SEColumn["Column(F)"]
SELiteral["Literal(Literal)"]
SEBinary["Binary{left, op, right}"]
SENot["Not(Box&lt;ScalarExpr&gt;)"]
SEIsNull["IsNull{expr, negated}"]
SEAggregate["Aggregate(AggregateCall)"]
SEGetField["GetField{base, field_name}"]
SECast["Cast{expr, data_type}"]
SECompare["Compare{left, op, right}"]
SECoalesce["Coalesce(Vec&lt;ScalarExpr&gt;)"]
SEScalarSubquery["ScalarSubquery(ScalarSubqueryExpr)"]
SECase["Case{operand, branches, else_expr}"]
SERandom["Random"]
end
    
    subgraph "Filter Layer"
        Filter["Filter&lt;'a, F&gt;"]
FilterField["field_id: F"]
FilterOp["op: Operator&lt;'a&gt;"]
end
    
    subgraph "Operator Types"
        Operator["Operator&lt;'a&gt;"]
OpEquals["Equals(Literal)"]
OpRange["Range{lower, upper}"]
OpGT["GreaterThan(Literal)"]
OpLT["LessThan(Literal)"]
OpIn["In(&amp;'a [Literal])"]
OpStartsWith["StartsWith{pattern, case_sensitive}"]
OpEndsWith["EndsWith{pattern, case_sensitive}"]
OpContains["Contains{pattern, case_sensitive}"]
OpIsNull["IsNull"]
OpIsNotNull["IsNotNull"]
end
    
 
   Expr --> ExprAnd
 
   Expr --> ExprOr
 
   Expr --> ExprNot
 
   Expr --> ExprPred
 
   Expr --> ExprCompare
 
   Expr --> ExprInList
 
   Expr --> ExprIsNull
 
   Expr --> ExprLiteral
 
   Expr --> ExprExists
    
 
   ExprPred --> Filter
 
   ExprCompare --> ScalarExpr
 
   ExprInList --> ScalarExpr
 
   ExprIsNull --> ScalarExpr
    
 
   Filter --> FilterField
 
   Filter --> FilterOp
 
   FilterOp --> Operator
    
 
   Operator --> OpEquals
 
   Operator --> OpRange
 
   Operator --> OpGT
 
   Operator --> OpLT
 
   Operator --> OpIn
 
   Operator --> OpStartsWith
 
   Operator --> OpEndsWith
 
   Operator --> OpContains
 
   Operator --> OpIsNull
 
   Operator --> OpIsNotNull
    
 
   ScalarExpr --> SEColumn
 
   ScalarExpr --> SELiteral
 
   ScalarExpr --> SEBinary
 
   ScalarExpr --> SENot
 
   ScalarExpr --> SEIsNull
 
   ScalarExpr --> SEAggregate
 
   ScalarExpr --> SEGetField
 
   ScalarExpr --> SECast
 
   ScalarExpr --> SECompare
 
   ScalarExpr --> SECoalesce
 
   ScalarExpr --> SEScalarSubquery
 
   ScalarExpr --> SECase
 
   ScalarExpr --> SERandom

The Expression System serves as the intermediate representation between SQL parsing and physical execution, allowing optimizations and transformations to occur independently of storage details.

Expression Type Hierarchy

Sources: llkv-expr/src/expr.rs:14-182

Boolean Expressions: Expr<’a, F>

The Expr<'a, F> enum represents logical boolean expressions used in WHERE clauses, HAVING clauses, and JOIN conditions. The generic type parameter F represents field identifiers, allowing the same expression structure to work with string column names during planning and numeric field IDs during execution.

Core Expression Variants

Variant	Purpose	Example SQL
`And(Vec<Expr>)`	Logical conjunction	`WHERE a > 5 AND b < 10`
`Or(Vec<Expr>)`	Logical disjunction	`WHERE status = 'active' OR status = 'pending'`
`Not(Box<Expr>)`	Logical negation	`WHERE NOT (price > 100)`
`Pred(Filter)`	Single field predicate	`WHERE age >= 18`
`Compare{left, op, right}`	Scalar comparison	`WHERE col1 + col2 > 100`
`InList{expr, list, negated}`	Set membership	`WHERE status IN ('active', 'pending')`
`IsNull{expr, negated}`	Null testing	`WHERE (col1 + col2) IS NULL`
`Literal(bool)`	Constant boolean	Used for empty IN lists or optimizations
`Exists(SubqueryExpr)`	Correlated subquery	`WHERE EXISTS (SELECT ...)`

The Expr type provides helper methods for constructing common patterns:

Sources: llkv-expr/src/expr.rs:14-123

Scalar Expressions: ScalarExpr

The ScalarExpr<F> enum represents expressions that compute scalar values, used in SELECT projections, computed columns, and as operands in boolean expressions. These expressions can reference columns, perform arithmetic, call functions, and handle complex nested computations.

Scalar Expression Variants

Variant	Purpose	Example SQL
`Column(F)`	Column reference	`SELECT price FROM products`
`Literal(Literal)`	Constant value	`SELECT 42`
`Binary{left, op, right}`	Arithmetic operation	`SELECT price * quantity`
`Not(Box<ScalarExpr>)`	Logical NOT	`SELECT NOT active`
`IsNull{expr, negated}`	NULL test	`SELECT col IS NULL`
`Aggregate(AggregateCall)`	Aggregate function	`SELECT COUNT(*) + 1`
`GetField{base, field_name}`	Struct field access	`SELECT user.address.city`
`Cast{expr, data_type}`	Type conversion	`SELECT CAST(price AS INTEGER)`
`Compare{left, op, right}`	Comparison returning 0/1	`SELECT (price > 100)`
`Coalesce(Vec<ScalarExpr>)`	First non-NULL	`SELECT COALESCE(price, 0)`
`ScalarSubquery(ScalarSubqueryExpr)`	Scalar subquery	`SELECT (SELECT MAX(price) FROM items)`
`Case{operand, branches, else_expr}`	Conditional	`SELECT CASE WHEN ... THEN ... END`
`Random`	Random number <FileRef file-url=“https://github.com/jzombie/rust-llkv/blob/89777726/0.0, 1.0)	`SELECT RANDOM()`

Operators and Filters

Binary Arithmetic Operators

The BinaryOp enum defines arithmetic and logical operations:

Operator	SQL Symbol	Description
`Add`	`+`	Addition
`Subtract`	`-`	Subtraction
`Multiply`	`*`	Multiplication
`Divide`	`/`	Division
`Modulo`	`%`	Modulo
`And`	`AND`	Logical AND
`Or`	`OR`	Logical OR
`BitwiseShiftLeft`	`<<`	Left shift
`BitwiseShiftRight`	`>>`	Right shift

Sources: llkv-expr/src/expr.rs:309-338

Comparison Operators

The CompareOp enum defines relational comparisons:

Operator	SQL Symbol	Description
`Eq`	`=`	Equality
`NotEq`	`!=`	Inequality
`Lt`	`<`	Less than
`LtEq`	`<=`	Less than or equal
`Gt`	`>`	Greater than
`GtEq`	`>=`	Greater than or equal

Sources: llkv-expr/src/expr.rs:340-363

Filter Operators

The Filter<'a, F> struct combines a field identifier with an Operator<'a> to represent single-field predicates. The Operator enum supports specialized operations optimized for columnar storage:

Operator	Purpose	Optimization
`Equals(Literal)`	Exact match	Hash-based lookup
`Range{lower, upper}`	Range query	Min/max chunk pruning
`GreaterThan(Literal)`	`>` comparison	Min/max chunk pruning
`GreaterThanOrEquals(Literal)`	`>=` comparison	Min/max chunk pruning
`LessThan(Literal)`	`<` comparison	Min/max chunk pruning
`LessThanOrEquals(Literal)`	`<=` comparison	Min/max chunk pruning
`In(&'a [Literal])`	Set membership	Borrowed slice for efficiency
`StartsWith{pattern, case_sensitive}`	Prefix match	String-optimized
`EndsWith{pattern, case_sensitive}`	Suffix match	String-optimized
`Contains{pattern, case_sensitive}`	Substring match	String-optimized
`IsNull`	NULL test	Null bitmap scan
`IsNotNull`	NOT NULL test	Null bitmap scan

The Operator::Range variant uses Rust’s std::ops::Bound enum to represent inclusive/exclusive bounds, enabling efficient representation of expressions like WHERE age BETWEEN 18 AND 65 or WHERE timestamp >= '2024-01-01'.

Sources: llkv-expr/src/expr.rs:365-428

Literal Value System

The Expression System uses Literal (defined in llkv-expr) to represent untyped constant values in expressions. These are separate from Arrow’s scalar types and provide a lightweight representation that can be coerced to concrete types during execution based on column schemas.

The PlanValue enum (defined in llkv-plan) serves a similar purpose at the plan level, with the conversion function plan_value_from_literal() bridging between the two representations.

Sources: llkv-expr/src/expr.rs:1-10 llkv-plan/src/plans.rs:73-161

graph TB
    subgraph "AggregateCall Variants"
        AggregateCall["AggregateCall&lt;F&gt;"]
CountStar["CountStar"]
Count["Count{expr, distinct}"]
Sum["Sum{expr, distinct}"]
Total["Total{expr, distinct}"]
Avg["Avg{expr, distinct}"]
Min["Min(Box&lt;ScalarExpr&gt;)"]
Max["Max(Box&lt;ScalarExpr&gt;)"]
CountNulls["CountNulls(Box&lt;ScalarExpr&gt;)"]
GroupConcat["GroupConcat{expr, distinct, separator}"]
end
    
 
   AggregateCall --> CountStar
 
   AggregateCall --> Count
 
   AggregateCall --> Sum
 
   AggregateCall --> Total
 
   AggregateCall --> Avg
 
   AggregateCall --> Min
 
   AggregateCall --> Max
 
   AggregateCall --> CountNulls
 
   AggregateCall --> GroupConcat
    
 
   Count --> ScalarExprArg["Box&lt;ScalarExpr&lt;F&gt;&gt;"]
Sum --> ScalarExprArg
 
   Total --> ScalarExprArg
 
   Avg --> ScalarExprArg
 
   GroupConcat --> ScalarExprArg

Aggregate Functions

The AggregateCall<F> enum represents aggregate function calls within scalar expressions. Unlike simple column aggregates, these operate on arbitrary expressions:

Aggregate	Purpose	Example SQL
`CountStar`	Count all rows	`SELECT COUNT(*)`
`Count{expr, distinct}`	Count non-null expression values	`SELECT COUNT(DISTINCT user_id)`
`Sum{expr, distinct}`	Sum of expression values	`SELECT SUM(price * quantity)`
`Total{expr, distinct}`	Sum treating NULL as 0	`SELECT TOTAL(amount)`
`Avg{expr, distinct}`	Average of expression values	`SELECT AVG(col1 + col2)`
`Min(expr)`	Minimum value	`SELECT MIN(-price)`
`Max(expr)`	Maximum value	`SELECT MAX(col1 * col2)`
`CountNulls(expr)`	Count NULL values	`SELECT COUNT_NULLS(optional_field)`
`GroupConcat{expr, distinct, separator}`	Concatenate strings	`SELECT GROUP_CONCAT(name, ', ')`

The key distinction is that each aggregate (except CountStar) accepts a Box<ScalarExpr<F>> rather than just a column name, enabling complex expressions like AVG(col1 + col2) or SUM(-price).

Sources: llkv-expr/src/expr.rs:184-215

Subquery Integration

The Expression System provides integration points for correlated and scalar subqueries through specialized types:

SubqueryExpr

Used in boolean Expr::Exists predicates to represent EXISTS or NOT EXISTS subqueries:

Sources: llkv-expr/src/expr.rs:45-56

ScalarSubqueryExpr

Used in ScalarExpr::ScalarSubquery to represent scalar subqueries that return a single value:

Sources: llkv-expr/src/expr.rs:58-65

SubqueryId

A lightweight wrapper around u32 that uniquely identifies a subquery within a query plan:

The actual subquery definitions are stored in the parent plan structure (e.g., SelectPlan::scalar_subqueries or SelectFilter::subqueries), with expressions referencing them by ID. This design allows the same subquery to be referenced multiple times without duplication.

Sources: llkv-expr/src/expr.rs:45-65 llkv-plan/src/plans.rs:27-67

Generic Field Identifier Pattern

The generic type parameter F in Expr<'a, F>, ScalarExpr<F>, and Filter<'a, F> enables the same expression structure to work at different stages of query processing:

Planning Stage : F = String - Expressions use human-readable column names from SQL
Translation Stage : F = FieldId - Expressions use table-local field identifiers
Execution Stage : F = LogicalFieldId - Expressions use namespace-qualified field identifiers for MVCC columns

This design allows expression trees to be built during SQL parsing, translated to internal identifiers during planning, and evaluated efficiently during execution without changing the fundamental expression structure. The translation process is documented in section Expression Translation.

Sources: llkv-expr/src/expr.rs:14-370

Expression Construction Helpers

Both Expr and ScalarExpr provide builder-style helper methods to simplify expression construction in code:

Boolean Expression Helpers

Sources: llkv-expr/src/expr.rs:67-123

Scalar Expression Helpers

Sources: llkv-expr/src/expr.rs:217-307

Integration with Query Plans

The Expression System integrates with query plans through several key structures defined in llkv-plan:

SelectFilter

Wraps a boolean expression with associated correlated subqueries:

FilterSubquery

Describes a correlated subquery referenced by an EXISTS predicate:

ScalarSubquery

Describes a scalar subquery referenced in a projection expression:

CorrelatedColumn

Maps placeholder column names used in subquery expressions to actual outer query columns:

Plans attach these structures to represent the full expression context, allowing executors to evaluate subqueries with proper outer row bindings.

Sources: llkv-plan/src/plans.rs:27-67