Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

GitHub

This documentation is part of the "Projects with Books" initiative at zenOSmosis.

The source code for this project is available on GitHub.

Expression System

Loading…

Expression System

Relevant source files

The Expression System provides the foundational Abstract Syntax Tree (AST) and type system for representing predicates, scalar computations, and aggregate functions throughout LLKV’s query processing pipeline. This system decouples expression semantics from concrete Arrow data types through a generic design that allows expressions to be built, translated, optimized, and evaluated independently.

For information about query planning structures that contain these expressions, see Query Planning. For details on how expressions are executed and evaluated, see Query Execution.

Purpose and Scope

This page documents the expression representation layer defined primarily in llkv-expr, covering:

  • Boolean predicate expressions (Expr) for WHERE and HAVING clauses
  • Scalar arithmetic expressions (ScalarExpr) for projections and computed columns
  • Operator types and literal value representations
  • Subquery integration points
  • Generic field identifier patterns that enable expression translation from column names to internal field IDs
graph TB
    subgraph "Boolean Expression Layer"
        Expr["Expr<'a, F>\nBoolean Predicates"]
ExprAnd["And(Vec<Expr>)"]
ExprOr["Or(Vec<Expr>)"]
ExprNot["Not(Box<Expr>)"]
ExprPred["Pred(Filter)"]
ExprCompare["Compare{left, op, right}"]
ExprInList["InList{expr, list, negated}"]
ExprIsNull["IsNull{expr, negated}"]
ExprLiteral["Literal(bool)"]
ExprExists["Exists(SubqueryExpr)"]
end
    
    subgraph "Scalar Expression Layer"
        ScalarExpr["ScalarExpr<F>\nArithmetic/Values"]
SEColumn["Column(F)"]
SELiteral["Literal(Literal)"]
SEBinary["Binary{left, op, right}"]
SENot["Not(Box<ScalarExpr>)"]
SEIsNull["IsNull{expr, negated}"]
SEAggregate["Aggregate(AggregateCall)"]
SEGetField["GetField{base, field_name}"]
SECast["Cast{expr, data_type}"]
SECompare["Compare{left, op, right}"]
SECoalesce["Coalesce(Vec<ScalarExpr>)"]
SEScalarSubquery["ScalarSubquery(ScalarSubqueryExpr)"]
SECase["Case{operand, branches, else_expr}"]
SERandom["Random"]
end
    
    subgraph "Filter Layer"
        Filter["Filter<'a, F>"]
FilterField["field_id: F"]
FilterOp["op: Operator<'a>"]
end
    
    subgraph "Operator Types"
        Operator["Operator<'a>"]
OpEquals["Equals(Literal)"]
OpRange["Range{lower, upper}"]
OpGT["GreaterThan(Literal)"]
OpLT["LessThan(Literal)"]
OpIn["In(&'a [Literal])"]
OpStartsWith["StartsWith{pattern, case_sensitive}"]
OpEndsWith["EndsWith{pattern, case_sensitive}"]
OpContains["Contains{pattern, case_sensitive}"]
OpIsNull["IsNull"]
OpIsNotNull["IsNotNull"]
end
    
 
   Expr --> ExprAnd
 
   Expr --> ExprOr
 
   Expr --> ExprNot
 
   Expr --> ExprPred
 
   Expr --> ExprCompare
 
   Expr --> ExprInList
 
   Expr --> ExprIsNull
 
   Expr --> ExprLiteral
 
   Expr --> ExprExists
    
 
   ExprPred --> Filter
 
   ExprCompare --> ScalarExpr
 
   ExprInList --> ScalarExpr
 
   ExprIsNull --> ScalarExpr
    
 
   Filter --> FilterField
 
   Filter --> FilterOp
 
   FilterOp --> Operator
    
 
   Operator --> OpEquals
 
   Operator --> OpRange
 
   Operator --> OpGT
 
   Operator --> OpLT
 
   Operator --> OpIn
 
   Operator --> OpStartsWith
 
   Operator --> OpEndsWith
 
   Operator --> OpContains
 
   Operator --> OpIsNull
 
   Operator --> OpIsNotNull
    
 
   ScalarExpr --> SEColumn
 
   ScalarExpr --> SELiteral
 
   ScalarExpr --> SEBinary
 
   ScalarExpr --> SENot
 
   ScalarExpr --> SEIsNull
 
   ScalarExpr --> SEAggregate
 
   ScalarExpr --> SEGetField
 
   ScalarExpr --> SECast
 
   ScalarExpr --> SECompare
 
   ScalarExpr --> SECoalesce
 
   ScalarExpr --> SEScalarSubquery
 
   ScalarExpr --> SECase
 
   ScalarExpr --> SERandom

The Expression System serves as the intermediate representation between SQL parsing and physical execution, allowing optimizations and transformations to occur independently of storage details.

Expression Type Hierarchy

Sources: llkv-expr/src/expr.rs:14-182

Boolean Expressions: Expr<’a, F>

The Expr<'a, F> enum represents logical boolean expressions used in WHERE clauses, HAVING clauses, and JOIN conditions. The generic type parameter F represents field identifiers, allowing the same expression structure to work with string column names during planning and numeric field IDs during execution.

Core Expression Variants

VariantPurposeExample SQL
And(Vec<Expr>)Logical conjunctionWHERE a > 5 AND b < 10
Or(Vec<Expr>)Logical disjunctionWHERE status = 'active' OR status = 'pending'
Not(Box<Expr>)Logical negationWHERE NOT (price > 100)
Pred(Filter)Single field predicateWHERE age >= 18
Compare{left, op, right}Scalar comparisonWHERE col1 + col2 > 100
InList{expr, list, negated}Set membershipWHERE status IN ('active', 'pending')
IsNull{expr, negated}Null testingWHERE (col1 + col2) IS NULL
Literal(bool)Constant booleanUsed for empty IN lists or optimizations
Exists(SubqueryExpr)Correlated subqueryWHERE EXISTS (SELECT ...)

The Expr type provides helper methods for constructing common patterns:

Sources: llkv-expr/src/expr.rs:14-123

Scalar Expressions: ScalarExpr

The ScalarExpr<F> enum represents expressions that compute scalar values, used in SELECT projections, computed columns, and as operands in boolean expressions. These expressions can reference columns, perform arithmetic, call functions, and handle complex nested computations.

Scalar Expression Variants

VariantPurposeExample SQL
Column(F)Column referenceSELECT price FROM products
Literal(Literal)Constant valueSELECT 42
Binary{left, op, right}Arithmetic operationSELECT price * quantity
Not(Box<ScalarExpr>)Logical NOTSELECT NOT active
IsNull{expr, negated}NULL testSELECT col IS NULL
Aggregate(AggregateCall)Aggregate functionSELECT COUNT(*) + 1
GetField{base, field_name}Struct field accessSELECT user.address.city
Cast{expr, data_type}Type conversionSELECT CAST(price AS INTEGER)
Compare{left, op, right}Comparison returning 0/1SELECT (price > 100)
Coalesce(Vec<ScalarExpr>)First non-NULLSELECT COALESCE(price, 0)
ScalarSubquery(ScalarSubqueryExpr)Scalar subquerySELECT (SELECT MAX(price) FROM items)
Case{operand, branches, else_expr}ConditionalSELECT CASE WHEN ... THEN ... END
RandomRandom number <FileRef file-url=“https://github.com/jzombie/rust-llkv/blob/89777726/0.0, 1.0)SELECT RANDOM()

Operators and Filters

Binary Arithmetic Operators

The BinaryOp enum defines arithmetic and logical operations:

OperatorSQL SymbolDescription
Add+Addition
Subtract-Subtraction
Multiply*Multiplication
Divide/Division
Modulo%Modulo
AndANDLogical AND
OrORLogical OR
BitwiseShiftLeft<<Left shift
BitwiseShiftRight>>Right shift

Sources: llkv-expr/src/expr.rs:309-338

Comparison Operators

The CompareOp enum defines relational comparisons:

OperatorSQL SymbolDescription
Eq=Equality
NotEq!=Inequality
Lt<Less than
LtEq<=Less than or equal
Gt>Greater than
GtEq>=Greater than or equal

Sources: llkv-expr/src/expr.rs:340-363

Filter Operators

The Filter<'a, F> struct combines a field identifier with an Operator<'a> to represent single-field predicates. The Operator enum supports specialized operations optimized for columnar storage:

OperatorPurposeOptimization
Equals(Literal)Exact matchHash-based lookup
Range{lower, upper}Range queryMin/max chunk pruning
GreaterThan(Literal)> comparisonMin/max chunk pruning
GreaterThanOrEquals(Literal)>= comparisonMin/max chunk pruning
LessThan(Literal)< comparisonMin/max chunk pruning
LessThanOrEquals(Literal)<= comparisonMin/max chunk pruning
In(&'a [Literal])Set membershipBorrowed slice for efficiency
StartsWith{pattern, case_sensitive}Prefix matchString-optimized
EndsWith{pattern, case_sensitive}Suffix matchString-optimized
Contains{pattern, case_sensitive}Substring matchString-optimized
IsNullNULL testNull bitmap scan
IsNotNullNOT NULL testNull bitmap scan

The Operator::Range variant uses Rust’s std::ops::Bound enum to represent inclusive/exclusive bounds, enabling efficient representation of expressions like WHERE age BETWEEN 18 AND 65 or WHERE timestamp >= '2024-01-01'.

Sources: llkv-expr/src/expr.rs:365-428

Literal Value System

The Expression System uses Literal (defined in llkv-expr) to represent untyped constant values in expressions. These are separate from Arrow’s scalar types and provide a lightweight representation that can be coerced to concrete types during execution based on column schemas.

The PlanValue enum (defined in llkv-plan) serves a similar purpose at the plan level, with the conversion function plan_value_from_literal() bridging between the two representations.

Sources: llkv-expr/src/expr.rs:1-10 llkv-plan/src/plans.rs:73-161

graph TB
    subgraph "AggregateCall Variants"
        AggregateCall["AggregateCall&lt;F&gt;"]
CountStar["CountStar"]
Count["Count{expr, distinct}"]
Sum["Sum{expr, distinct}"]
Total["Total{expr, distinct}"]
Avg["Avg{expr, distinct}"]
Min["Min(Box&lt;ScalarExpr&gt;)"]
Max["Max(Box&lt;ScalarExpr&gt;)"]
CountNulls["CountNulls(Box&lt;ScalarExpr&gt;)"]
GroupConcat["GroupConcat{expr, distinct, separator}"]
end
    
 
   AggregateCall --> CountStar
 
   AggregateCall --> Count
 
   AggregateCall --> Sum
 
   AggregateCall --> Total
 
   AggregateCall --> Avg
 
   AggregateCall --> Min
 
   AggregateCall --> Max
 
   AggregateCall --> CountNulls
 
   AggregateCall --> GroupConcat
    
 
   Count --> ScalarExprArg["Box&lt;ScalarExpr&lt;F&gt;&gt;"]
Sum --> ScalarExprArg
 
   Total --> ScalarExprArg
 
   Avg --> ScalarExprArg
 
   GroupConcat --> ScalarExprArg

Aggregate Functions

The AggregateCall<F> enum represents aggregate function calls within scalar expressions. Unlike simple column aggregates, these operate on arbitrary expressions:

AggregatePurposeExample SQL
CountStarCount all rowsSELECT COUNT(*)
Count{expr, distinct}Count non-null expression valuesSELECT COUNT(DISTINCT user_id)
Sum{expr, distinct}Sum of expression valuesSELECT SUM(price * quantity)
Total{expr, distinct}Sum treating NULL as 0SELECT TOTAL(amount)
Avg{expr, distinct}Average of expression valuesSELECT AVG(col1 + col2)
Min(expr)Minimum valueSELECT MIN(-price)
Max(expr)Maximum valueSELECT MAX(col1 * col2)
CountNulls(expr)Count NULL valuesSELECT COUNT_NULLS(optional_field)
GroupConcat{expr, distinct, separator}Concatenate stringsSELECT GROUP_CONCAT(name, ', ')

The key distinction is that each aggregate (except CountStar) accepts a Box<ScalarExpr<F>> rather than just a column name, enabling complex expressions like AVG(col1 + col2) or SUM(-price).

Sources: llkv-expr/src/expr.rs:184-215

Subquery Integration

The Expression System provides integration points for correlated and scalar subqueries through specialized types:

SubqueryExpr

Used in boolean Expr::Exists predicates to represent EXISTS or NOT EXISTS subqueries:

Sources: llkv-expr/src/expr.rs:45-56

ScalarSubqueryExpr

Used in ScalarExpr::ScalarSubquery to represent scalar subqueries that return a single value:

Sources: llkv-expr/src/expr.rs:58-65

SubqueryId

A lightweight wrapper around u32 that uniquely identifies a subquery within a query plan:

The actual subquery definitions are stored in the parent plan structure (e.g., SelectPlan::scalar_subqueries or SelectFilter::subqueries), with expressions referencing them by ID. This design allows the same subquery to be referenced multiple times without duplication.

Sources: llkv-expr/src/expr.rs:45-65 llkv-plan/src/plans.rs:27-67

Generic Field Identifier Pattern

The generic type parameter F in Expr<'a, F>, ScalarExpr<F>, and Filter<'a, F> enables the same expression structure to work at different stages of query processing:

  1. Planning Stage : F = String - Expressions use human-readable column names from SQL
  2. Translation Stage : F = FieldId - Expressions use table-local field identifiers
  3. Execution Stage : F = LogicalFieldId - Expressions use namespace-qualified field identifiers for MVCC columns

This design allows expression trees to be built during SQL parsing, translated to internal identifiers during planning, and evaluated efficiently during execution without changing the fundamental expression structure. The translation process is documented in section Expression Translation.

Sources: llkv-expr/src/expr.rs:14-370

Expression Construction Helpers

Both Expr and ScalarExpr provide builder-style helper methods to simplify expression construction in code:

Boolean Expression Helpers

Sources: llkv-expr/src/expr.rs:67-123

Scalar Expression Helpers

Sources: llkv-expr/src/expr.rs:217-307

Integration with Query Plans

The Expression System integrates with query plans through several key structures defined in llkv-plan:

SelectFilter

Wraps a boolean expression with associated correlated subqueries:

FilterSubquery

Describes a correlated subquery referenced by an EXISTS predicate:

ScalarSubquery

Describes a scalar subquery referenced in a projection expression:

CorrelatedColumn

Maps placeholder column names used in subquery expressions to actual outer query columns:

Plans attach these structures to represent the full expression context, allowing executors to evaluate subqueries with proper outer row bindings.

Sources: llkv-plan/src/plans.rs:27-67

Expression System Data Flow

Sources: llkv-expr/src/expr.rs llkv-plan/src/plans.rs:27-1032

Key Design Principles

Type-Safe Generic Design

The generic field identifier pattern (F in Expr<'a, F>) provides compile-time type safety while allowing the same expression structure to work at different pipeline stages. This eliminates the need for multiple parallel expression type hierarchies.

Deferred Type Resolution

Literal values remain untyped until evaluation time, when they are coerced based on the target column’s Arrow DataType. This allows expressions like WHERE age > 18 to work correctly whether age is Int8, Int32, or Int64.

Zero-Copy Operator Patterns

The Operator::In(&'a [Literal]) variant borrows a slice of literals rather than owning a Vec, enabling stack-allocated IN lists without heap allocation for common small cases.

Expression Rewriting Support

The is_trivially_true() and is_full_range_for() helper methods enable query optimizers to identify and eliminate redundant predicates without deep tree traversals.

Subquery Indirection

Subqueries are represented by SubqueryId references rather than inline nested plans, allowing the same subquery to be referenced multiple times without duplication and enabling separate optimization of inner and outer queries.

Sources: llkv-expr/src/expr.rs:1-819

Dismiss

Refresh this wiki

Enter email to refresh