This documentation is part of the "Projects with Books" initiative at zenOSmosis.
The source code for this project is available on GitHub.
Expression System
Loading…
Expression System
Relevant source files
- .github/workflows/build.docs.yml
- demos/llkv-sql-pong-demo/assets/llkv-sql-pong-screenshot.png
- dev-docs/doc-preview.md
- llkv-expr/src/expr.rs
- llkv-plan/Cargo.toml
- llkv-plan/src/plans.rs
- llkv-sql/Cargo.toml
- llkv-test-utils/Cargo.toml
The Expression System provides the foundational Abstract Syntax Tree (AST) and type system for representing predicates, scalar computations, and aggregate functions throughout LLKV’s query processing pipeline. This system decouples expression semantics from concrete Arrow data types through a generic design that allows expressions to be built, translated, optimized, and evaluated independently.
For information about query planning structures that contain these expressions, see Query Planning. For details on how expressions are executed and evaluated, see Query Execution.
Purpose and Scope
This page documents the expression representation layer defined primarily in llkv-expr, covering:
- Boolean predicate expressions (
Expr) for WHERE and HAVING clauses - Scalar arithmetic expressions (
ScalarExpr) for projections and computed columns - Operator types and literal value representations
- Subquery integration points
- Generic field identifier patterns that enable expression translation from column names to internal field IDs
graph TB
subgraph "Boolean Expression Layer"
Expr["Expr<'a, F>\nBoolean Predicates"]
ExprAnd["And(Vec<Expr>)"]
ExprOr["Or(Vec<Expr>)"]
ExprNot["Not(Box<Expr>)"]
ExprPred["Pred(Filter)"]
ExprCompare["Compare{left, op, right}"]
ExprInList["InList{expr, list, negated}"]
ExprIsNull["IsNull{expr, negated}"]
ExprLiteral["Literal(bool)"]
ExprExists["Exists(SubqueryExpr)"]
end
subgraph "Scalar Expression Layer"
ScalarExpr["ScalarExpr<F>\nArithmetic/Values"]
SEColumn["Column(F)"]
SELiteral["Literal(Literal)"]
SEBinary["Binary{left, op, right}"]
SENot["Not(Box<ScalarExpr>)"]
SEIsNull["IsNull{expr, negated}"]
SEAggregate["Aggregate(AggregateCall)"]
SEGetField["GetField{base, field_name}"]
SECast["Cast{expr, data_type}"]
SECompare["Compare{left, op, right}"]
SECoalesce["Coalesce(Vec<ScalarExpr>)"]
SEScalarSubquery["ScalarSubquery(ScalarSubqueryExpr)"]
SECase["Case{operand, branches, else_expr}"]
SERandom["Random"]
end
subgraph "Filter Layer"
Filter["Filter<'a, F>"]
FilterField["field_id: F"]
FilterOp["op: Operator<'a>"]
end
subgraph "Operator Types"
Operator["Operator<'a>"]
OpEquals["Equals(Literal)"]
OpRange["Range{lower, upper}"]
OpGT["GreaterThan(Literal)"]
OpLT["LessThan(Literal)"]
OpIn["In(&'a [Literal])"]
OpStartsWith["StartsWith{pattern, case_sensitive}"]
OpEndsWith["EndsWith{pattern, case_sensitive}"]
OpContains["Contains{pattern, case_sensitive}"]
OpIsNull["IsNull"]
OpIsNotNull["IsNotNull"]
end
Expr --> ExprAnd
Expr --> ExprOr
Expr --> ExprNot
Expr --> ExprPred
Expr --> ExprCompare
Expr --> ExprInList
Expr --> ExprIsNull
Expr --> ExprLiteral
Expr --> ExprExists
ExprPred --> Filter
ExprCompare --> ScalarExpr
ExprInList --> ScalarExpr
ExprIsNull --> ScalarExpr
Filter --> FilterField
Filter --> FilterOp
FilterOp --> Operator
Operator --> OpEquals
Operator --> OpRange
Operator --> OpGT
Operator --> OpLT
Operator --> OpIn
Operator --> OpStartsWith
Operator --> OpEndsWith
Operator --> OpContains
Operator --> OpIsNull
Operator --> OpIsNotNull
ScalarExpr --> SEColumn
ScalarExpr --> SELiteral
ScalarExpr --> SEBinary
ScalarExpr --> SENot
ScalarExpr --> SEIsNull
ScalarExpr --> SEAggregate
ScalarExpr --> SEGetField
ScalarExpr --> SECast
ScalarExpr --> SECompare
ScalarExpr --> SECoalesce
ScalarExpr --> SEScalarSubquery
ScalarExpr --> SECase
ScalarExpr --> SERandom
The Expression System serves as the intermediate representation between SQL parsing and physical execution, allowing optimizations and transformations to occur independently of storage details.
Expression Type Hierarchy
Sources: llkv-expr/src/expr.rs:14-182
Boolean Expressions: Expr<’a, F>
The Expr<'a, F> enum represents logical boolean expressions used in WHERE clauses, HAVING clauses, and JOIN conditions. The generic type parameter F represents field identifiers, allowing the same expression structure to work with string column names during planning and numeric field IDs during execution.
Core Expression Variants
| Variant | Purpose | Example SQL |
|---|---|---|
And(Vec<Expr>) | Logical conjunction | WHERE a > 5 AND b < 10 |
Or(Vec<Expr>) | Logical disjunction | WHERE status = 'active' OR status = 'pending' |
Not(Box<Expr>) | Logical negation | WHERE NOT (price > 100) |
Pred(Filter) | Single field predicate | WHERE age >= 18 |
Compare{left, op, right} | Scalar comparison | WHERE col1 + col2 > 100 |
InList{expr, list, negated} | Set membership | WHERE status IN ('active', 'pending') |
IsNull{expr, negated} | Null testing | WHERE (col1 + col2) IS NULL |
Literal(bool) | Constant boolean | Used for empty IN lists or optimizations |
Exists(SubqueryExpr) | Correlated subquery | WHERE EXISTS (SELECT ...) |
The Expr type provides helper methods for constructing common patterns:
Sources: llkv-expr/src/expr.rs:14-123
Scalar Expressions: ScalarExpr
The ScalarExpr<F> enum represents expressions that compute scalar values, used in SELECT projections, computed columns, and as operands in boolean expressions. These expressions can reference columns, perform arithmetic, call functions, and handle complex nested computations.
Scalar Expression Variants
| Variant | Purpose | Example SQL |
|---|---|---|
Column(F) | Column reference | SELECT price FROM products |
Literal(Literal) | Constant value | SELECT 42 |
Binary{left, op, right} | Arithmetic operation | SELECT price * quantity |
Not(Box<ScalarExpr>) | Logical NOT | SELECT NOT active |
IsNull{expr, negated} | NULL test | SELECT col IS NULL |
Aggregate(AggregateCall) | Aggregate function | SELECT COUNT(*) + 1 |
GetField{base, field_name} | Struct field access | SELECT user.address.city |
Cast{expr, data_type} | Type conversion | SELECT CAST(price AS INTEGER) |
Compare{left, op, right} | Comparison returning 0/1 | SELECT (price > 100) |
Coalesce(Vec<ScalarExpr>) | First non-NULL | SELECT COALESCE(price, 0) |
ScalarSubquery(ScalarSubqueryExpr) | Scalar subquery | SELECT (SELECT MAX(price) FROM items) |
Case{operand, branches, else_expr} | Conditional | SELECT CASE WHEN ... THEN ... END |
Random | Random number <FileRef file-url=“https://github.com/jzombie/rust-llkv/blob/89777726/0.0, 1.0) | SELECT RANDOM() |
Operators and Filters
Binary Arithmetic Operators
The BinaryOp enum defines arithmetic and logical operations:
| Operator | SQL Symbol | Description |
|---|---|---|
Add | + | Addition |
Subtract | - | Subtraction |
Multiply | * | Multiplication |
Divide | / | Division |
Modulo | % | Modulo |
And | AND | Logical AND |
Or | OR | Logical OR |
BitwiseShiftLeft | << | Left shift |
BitwiseShiftRight | >> | Right shift |
Sources: llkv-expr/src/expr.rs:309-338
Comparison Operators
The CompareOp enum defines relational comparisons:
| Operator | SQL Symbol | Description |
|---|---|---|
Eq | = | Equality |
NotEq | != | Inequality |
Lt | < | Less than |
LtEq | <= | Less than or equal |
Gt | > | Greater than |
GtEq | >= | Greater than or equal |
Sources: llkv-expr/src/expr.rs:340-363
Filter Operators
The Filter<'a, F> struct combines a field identifier with an Operator<'a> to represent single-field predicates. The Operator enum supports specialized operations optimized for columnar storage:
| Operator | Purpose | Optimization |
|---|---|---|
Equals(Literal) | Exact match | Hash-based lookup |
Range{lower, upper} | Range query | Min/max chunk pruning |
GreaterThan(Literal) | > comparison | Min/max chunk pruning |
GreaterThanOrEquals(Literal) | >= comparison | Min/max chunk pruning |
LessThan(Literal) | < comparison | Min/max chunk pruning |
LessThanOrEquals(Literal) | <= comparison | Min/max chunk pruning |
In(&'a [Literal]) | Set membership | Borrowed slice for efficiency |
StartsWith{pattern, case_sensitive} | Prefix match | String-optimized |
EndsWith{pattern, case_sensitive} | Suffix match | String-optimized |
Contains{pattern, case_sensitive} | Substring match | String-optimized |
IsNull | NULL test | Null bitmap scan |
IsNotNull | NOT NULL test | Null bitmap scan |
The Operator::Range variant uses Rust’s std::ops::Bound enum to represent inclusive/exclusive bounds, enabling efficient representation of expressions like WHERE age BETWEEN 18 AND 65 or WHERE timestamp >= '2024-01-01'.
Sources: llkv-expr/src/expr.rs:365-428
Literal Value System
The Expression System uses Literal (defined in llkv-expr) to represent untyped constant values in expressions. These are separate from Arrow’s scalar types and provide a lightweight representation that can be coerced to concrete types during execution based on column schemas.
The PlanValue enum (defined in llkv-plan) serves a similar purpose at the plan level, with the conversion function plan_value_from_literal() bridging between the two representations.
Sources: llkv-expr/src/expr.rs:1-10 llkv-plan/src/plans.rs:73-161
graph TB
subgraph "AggregateCall Variants"
AggregateCall["AggregateCall<F>"]
CountStar["CountStar"]
Count["Count{expr, distinct}"]
Sum["Sum{expr, distinct}"]
Total["Total{expr, distinct}"]
Avg["Avg{expr, distinct}"]
Min["Min(Box<ScalarExpr>)"]
Max["Max(Box<ScalarExpr>)"]
CountNulls["CountNulls(Box<ScalarExpr>)"]
GroupConcat["GroupConcat{expr, distinct, separator}"]
end
AggregateCall --> CountStar
AggregateCall --> Count
AggregateCall --> Sum
AggregateCall --> Total
AggregateCall --> Avg
AggregateCall --> Min
AggregateCall --> Max
AggregateCall --> CountNulls
AggregateCall --> GroupConcat
Count --> ScalarExprArg["Box<ScalarExpr<F>>"]
Sum --> ScalarExprArg
Total --> ScalarExprArg
Avg --> ScalarExprArg
GroupConcat --> ScalarExprArg
Aggregate Functions
The AggregateCall<F> enum represents aggregate function calls within scalar expressions. Unlike simple column aggregates, these operate on arbitrary expressions:
| Aggregate | Purpose | Example SQL |
|---|---|---|
CountStar | Count all rows | SELECT COUNT(*) |
Count{expr, distinct} | Count non-null expression values | SELECT COUNT(DISTINCT user_id) |
Sum{expr, distinct} | Sum of expression values | SELECT SUM(price * quantity) |
Total{expr, distinct} | Sum treating NULL as 0 | SELECT TOTAL(amount) |
Avg{expr, distinct} | Average of expression values | SELECT AVG(col1 + col2) |
Min(expr) | Minimum value | SELECT MIN(-price) |
Max(expr) | Maximum value | SELECT MAX(col1 * col2) |
CountNulls(expr) | Count NULL values | SELECT COUNT_NULLS(optional_field) |
GroupConcat{expr, distinct, separator} | Concatenate strings | SELECT GROUP_CONCAT(name, ', ') |
The key distinction is that each aggregate (except CountStar) accepts a Box<ScalarExpr<F>> rather than just a column name, enabling complex expressions like AVG(col1 + col2) or SUM(-price).
Sources: llkv-expr/src/expr.rs:184-215
Subquery Integration
The Expression System provides integration points for correlated and scalar subqueries through specialized types:
SubqueryExpr
Used in boolean Expr::Exists predicates to represent EXISTS or NOT EXISTS subqueries:
Sources: llkv-expr/src/expr.rs:45-56
ScalarSubqueryExpr
Used in ScalarExpr::ScalarSubquery to represent scalar subqueries that return a single value:
Sources: llkv-expr/src/expr.rs:58-65
SubqueryId
A lightweight wrapper around u32 that uniquely identifies a subquery within a query plan:
The actual subquery definitions are stored in the parent plan structure (e.g., SelectPlan::scalar_subqueries or SelectFilter::subqueries), with expressions referencing them by ID. This design allows the same subquery to be referenced multiple times without duplication.
Sources: llkv-expr/src/expr.rs:45-65 llkv-plan/src/plans.rs:27-67
Generic Field Identifier Pattern
The generic type parameter F in Expr<'a, F>, ScalarExpr<F>, and Filter<'a, F> enables the same expression structure to work at different stages of query processing:
- Planning Stage :
F = String- Expressions use human-readable column names from SQL - Translation Stage :
F = FieldId- Expressions use table-local field identifiers - Execution Stage :
F = LogicalFieldId- Expressions use namespace-qualified field identifiers for MVCC columns
This design allows expression trees to be built during SQL parsing, translated to internal identifiers during planning, and evaluated efficiently during execution without changing the fundamental expression structure. The translation process is documented in section Expression Translation.
Sources: llkv-expr/src/expr.rs:14-370
Expression Construction Helpers
Both Expr and ScalarExpr provide builder-style helper methods to simplify expression construction in code:
Boolean Expression Helpers
Sources: llkv-expr/src/expr.rs:67-123
Scalar Expression Helpers
Sources: llkv-expr/src/expr.rs:217-307
Integration with Query Plans
The Expression System integrates with query plans through several key structures defined in llkv-plan:
SelectFilter
Wraps a boolean expression with associated correlated subqueries:
FilterSubquery
Describes a correlated subquery referenced by an EXISTS predicate:
ScalarSubquery
Describes a scalar subquery referenced in a projection expression:
CorrelatedColumn
Maps placeholder column names used in subquery expressions to actual outer query columns:
Plans attach these structures to represent the full expression context, allowing executors to evaluate subqueries with proper outer row bindings.
Sources: llkv-plan/src/plans.rs:27-67
Expression System Data Flow
Sources: llkv-expr/src/expr.rs llkv-plan/src/plans.rs:27-1032
Key Design Principles
Type-Safe Generic Design
The generic field identifier pattern (F in Expr<'a, F>) provides compile-time type safety while allowing the same expression structure to work at different pipeline stages. This eliminates the need for multiple parallel expression type hierarchies.
Deferred Type Resolution
Literal values remain untyped until evaluation time, when they are coerced based on the target column’s Arrow DataType. This allows expressions like WHERE age > 18 to work correctly whether age is Int8, Int32, or Int64.
Zero-Copy Operator Patterns
The Operator::In(&'a [Literal]) variant borrows a slice of literals rather than owning a Vec, enabling stack-allocated IN lists without heap allocation for common small cases.
Expression Rewriting Support
The is_trivially_true() and is_full_range_for() helper methods enable query optimizers to identify and eliminate redundant predicates without deep tree traversals.
Subquery Indirection
Subqueries are represented by SubqueryId references rather than inline nested plans, allowing the same subquery to be referenced multiple times without duplication and enabling separate optimization of inner and outer queries.
Sources: llkv-expr/src/expr.rs:1-819
Dismiss
Refresh this wiki
Enter email to refresh