Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

GitHub

This documentation is part of the "Projects with Books" initiative at zenOSmosis.

The source code for this project is available on GitHub.

Expression AST

Relevant source files

Purpose and Scope

This document describes the expression Abstract Syntax Tree (AST) defined in the llkv-expr crate. The expression AST provides type-aware, Arrow-native data structures for representing boolean predicates and scalar expressions throughout the LLKV system. These AST nodes are decoupled from SQL parsing and can be parameterized by field identifier types, enabling reuse across multiple processing stages.

For information about how expressions are translated between field identifier types, see Expression Translation. For details on how expressions are compiled into executable bytecode, see Program Compilation. For scalar evaluation mechanics, see Scalar Evaluation and NumericKernels.

Sources: llkv-expr/src/expr.rs:1-8

Expression Type Hierarchy

The llkv-expr crate defines two primary expression enums:

  • Expr<'a, F> - Boolean/predicate expressions that evaluate to true or false
  • ScalarExpr<F> - Arithmetic/scalar expressions that produce typed values
graph TB
    subgraph "Expression Type System"
        EXPR["Expr&lt;'a, F&gt;\nBoolean Predicates"]
SCALAR["ScalarExpr&lt;F&gt;\nScalar Values"]
EXPR --> LOGICAL["Logical Operators\nAnd, Or, Not"]
EXPR --> PRED["Pred(Filter)\nField Predicates"]
EXPR --> COMPARE["Compare\nScalar Comparisons"]
EXPR --> INLIST["InList\nSet Membership"]
EXPR --> ISNULL["IsNull\nNull Checks"]
EXPR --> LITERAL["Literal(bool)\nConstant Boolean"]
EXPR --> EXISTS["Exists\nSubquery Predicates"]
SCALAR --> COLUMN["Column(F)\nField Reference"]
SCALAR --> SLITERAL["Literal\nConstant Value"]
SCALAR --> BINARY["Binary\nArithmetic Ops"]
SCALAR --> SNOT["Not\nLogical Negation"]
SCALAR --> SISNULL["IsNull\nNull Test"]
SCALAR --> AGG["Aggregate\nAggregate Functions"]
SCALAR --> GETFIELD["GetField\nStruct Access"]
SCALAR --> CAST["Cast\nType Conversion"]
SCALAR --> SCOMPARE["Compare\nBoolean Result"]
SCALAR --> COALESCE["Coalesce\nFirst Non-Null"]
SCALAR --> SUBQ["ScalarSubquery\nSubquery Value"]
SCALAR --> CASE["Case\nConditional Logic"]
SCALAR --> RANDOM["Random\nRandom Number"]
COMPARE -.uses.-> SCALAR
        INLIST -.uses.-> SCALAR
        ISNULL -.uses.-> SCALAR
    end
    
    style EXPR fill:#e8f5e9
    style SCALAR fill:#e1f5ff

Both types are generic over a field identifier parameter F, which allows the same AST structure to be used with different field representations (typically String column names during planning, or FieldId numeric identifiers during execution).

Sources: llkv-expr/src/expr.rs:14-143

Expr<'a, F> - Boolean Expressions

The Expr<'a, F> enum represents boolean predicate expressions that evaluate to true or false. These are primarily used in WHERE clauses, JOIN conditions, and HAVING clauses.

Expr Variants

VariantDescriptionUse Case
And(Vec<Expr>)Logical AND of multiple predicatesCombining multiple filter conditions
Or(Vec<Expr>)Logical OR of multiple predicatesAlternative filter conditions
Not(Box<Expr>)Logical negationInverting a predicate
Pred(Filter<'a, F>)Single-field predicateColumn-level filtering (e.g., age > 18)
Compare { left, op, right }Comparison between scalar expressionsCross-column comparisons (e.g., col1 + col2 > 10)
InList { expr, list, negated }Set membership testIN/NOT IN clauses
IsNull { expr, negated }Null check on expressionIS NULL/IS NOT NULL on complex expressions
Literal(bool)Constant boolean valueAlways-true/always-false conditions
Exists(SubqueryExpr)Correlated subquery predicateEXISTS/NOT EXISTS clauses

Sources: llkv-expr/src/expr.rs:14-43

Expr Construction Helpers

The Expr type provides builder methods for common patterns:

These helpers simplify construction of common predicate patterns during query planning.

Sources: llkv-expr/src/expr.rs:65-84

Filter and Operator Types

The Pred variant wraps a Filter<'a, F> struct, which represents a single predicate against a field:

The Operator<'a> enum defines comparison operations over untyped Literal values:

OperatorDescriptionExample SQL
Equals(Literal)Exact equalitycol = 5
Range { lower, upper }Range with boundscol BETWEEN 10 AND 20
GreaterThan(Literal)Greater-than comparisoncol > 10
GreaterThanOrEquals(Literal)Greater-or-equalcol >= 10
LessThan(Literal)Less-than comparisoncol < 10
LessThanOrEquals(Literal)Less-or-equalcol <= 10
In(&'a [Literal])Set membership (borrowed slice)col IN (1, 2, 3)
StartsWith { pattern, case_sensitive }String prefix matchcol LIKE 'abc%'
EndsWith { pattern, case_sensitive }String suffix matchcol LIKE '%xyz'
Contains { pattern, case_sensitive }String substring matchcol LIKE '%abc%'
IsNullNull checkcol IS NULL
IsNotNullNon-null checkcol IS NOT NULL

The Operator type uses borrowed slices for In and borrowed string slices for pattern matching to avoid allocations in common cases.

Sources: llkv-expr/src/expr.rs:295-358

ScalarExpr - Scalar Expressions

The ScalarExpr<F> enum represents arithmetic and scalar expressions that produce typed values. These are used in SELECT projections, computed columns, ORDER BY clauses, and as operands in comparisons.

graph TB
    subgraph "ScalarExpr Evaluation Categories"
        SIMPLE["Simple Values"]
ARITH["Arithmetic"]
LOGIC["Logical"]
STRUCT["Structured"]
CONTROL["Control Flow"]
SPECIAL["Special Functions"]
SIMPLE --> COLUMN["Column(F)\nField reference"]
SIMPLE --> LITERAL["Literal\nConstant value"]
ARITH --> BINARY["Binary\nleft op right"]
ARITH --> CAST["Cast\nType conversion"]
LOGIC --> NOT["Not\nLogical negation"]
LOGIC --> ISNULL["IsNull\nNull test"]
LOGIC --> COMPARE["Compare\nComparison"]
STRUCT --> GETFIELD["GetField\nStruct field access"]
STRUCT --> AGGREGATE["Aggregate\nAggregate function"]
CONTROL --> CASE["Case\nCASE expression"]
CONTROL --> COALESCE["Coalesce\nFirst non-null"]
SPECIAL --> SUBQUERY["ScalarSubquery\nScalar subquery"]
SPECIAL --> RANDOM["Random\nRandom number"]
end

ScalarExpr Variants

Sources: llkv-expr/src/expr.rs:86-143

Arithmetic and Binary Operations

The Binary variant supports arithmetic and logical operations:

BinaryOp Variants:

OperatorNumericBitwiseLogical
Add
Subtract
Multiply
Divide
Modulo
And
Or
BitwiseShiftLeft
BitwiseShiftRight

Sources: llkv-expr/src/expr.rs:270-282

Comparison Operations

The Compare variant in ScalarExpr produces a boolean (1/0) result:

CompareOp Variants:

OperatorSQL Equivalent
Eq=
NotEq!= or <>
Lt<
LtEq<=
Gt>
GtEq>=

Sources: llkv-expr/src/expr.rs:284-293 llkv-expr/src/expr.rs:119-124

AggregateCall Variants

The Aggregate(AggregateCall<F>) variant wraps aggregate function calls:

Each aggregate (except CountStar) operates on a ScalarExpr<F> rather than just a column, enabling complex expressions like AVG(col1 + col2) or SUM(-col1).

Sources: llkv-expr/src/expr.rs:145-176

Struct Field Access

The GetField variant extracts fields from struct-typed expressions:

This represents dot-notation access like user.address.city, which is nested as:

GetField {
    base: GetField {
        base: Column(user),
        field_name: "address"
    },
    field_name: "city"
}

Sources: llkv-expr/src/expr.rs:107-113

CASE Expressions

The Case variant implements SQL CASE expressions:

  • Simple CASE: operand is Some, branches test equality
  • Searched CASE: operand is None, branches evaluate conditions
  • ELSE clause: Handled by else_expr field

Sources: llkv-expr/src/expr.rs:129-137

ScalarExpr Helper Methods

Builder methods simplify construction:

MethodPurpose
column(field: F)Create column reference
literal<L>(lit: L)Create literal value
binary(left, op, right)Create binary operation
logical_not(expr)Create logical NOT
is_null(expr, negated)Create null test
aggregate(call)Create aggregate function
get_field(base, name)Create struct field access
cast(expr, data_type)Create type cast
compare(left, op, right)Create comparison
coalesce(exprs)Create COALESCE
scalar_subquery(id)Create scalar subquery
case(operand, branches, else_expr)Create CASE expression
random()Create random number generator

Sources: llkv-expr/src/expr.rs:178-268

Subquery Expression Types

The expression AST includes dedicated types for correlated and scalar subqueries:

SubqueryExpr

Used in Expr::Exists for boolean subquery predicates:

The id references a subquery definition stored separately (typically in the enclosing plan), and negated indicates NOT EXISTS.

Sources: llkv-expr/src/expr.rs:49-56

ScalarSubqueryExpr

Used in ScalarExpr::ScalarSubquery for value-producing subqueries:

This represents subqueries that return a single scalar value, used in expressions like SELECT (SELECT MAX(price) FROM orders) + 10.

Sources: llkv-expr/src/expr.rs:58-63

SubqueryId

Both subquery types use SubqueryId as an opaque identifier:

This ID is resolved during execution by looking up the subquery definition in the parent plan's metadata.

Sources: llkv-expr/src/expr.rs:45-47

graph LR
    subgraph "Expression Translation Pipeline"
        SQL["SQL Text"]
PLAN["Query Plan\nExpr&lt;String&gt;\nScalarExpr&lt;String&gt;"]
TRANS["Translation\nresolve_field_id()"]
EXEC["Execution\nExpr&lt;FieldId&gt;\nScalarExpr&lt;FieldId&gt;"]
EVAL["Evaluation\nRecordBatch Results"]
SQL --> PLAN
 
       PLAN --> TRANS
 
       TRANS --> EXEC
 
       EXEC --> EVAL
    end

Generic Field Parameter

The expression AST is parameterized by field identifier type F, enabling reuse across different processing stages:

Common instantiations:

  • Expr<'a, String> - Used during query planning with column names
  • Expr<'a, FieldId> - Used during execution with numeric field IDs
  • ScalarExpr<String> - Planning-time scalar expressions
  • ScalarExpr<FieldId> - Execution-time scalar expressions

The translation from String to FieldId occurs in llkv-executor/src/translation/expression.rs using catalog lookups to resolve column names to their internal numeric identifiers.

Sources: llkv-executor/src/translation/expression.rs:18-174

Literal Values

Both expression types use the Literal enum from llkv-expr to represent untyped constant values:

Literal VariantArrow TypeExample
Integer(i64)Int6442
Float(f64)Float643.14
Decimal(DecimalValue)Decimal128123.45
Boolean(bool)Booleantrue
String(String)Utf8"hello"
Date32(i32)Date32DATE '2024-01-01'
NullNullNULL
Struct(...)Struct{a: 1, b: "x"}
Interval(...)IntervalINTERVAL '1 month'

Literal values are type-agnostic at the AST level. Type coercion and validation occur during execution when column types are known.

Sources: llkv-expr/src/expr.rs10 llkv-executor/src/translation/schema.rs:53-123

graph TB
    subgraph "Type Inference Rules"
        SCALAR["ScalarExpr&lt;FieldId&gt;"]
SCALAR --> LITERAL["Literal → Literal type"]
SCALAR --> COLUMN["Column → Schema lookup"]
SCALAR --> BINARY["Binary → Int64 or Float64"]
SCALAR --> NOT["Not → Int64 (boolean)"]
SCALAR --> ISNULL["IsNull → Int64 (boolean)"]
SCALAR --> COMPARE["Compare → Int64 (boolean)"]
SCALAR --> AGG["Aggregate → Int64"]
SCALAR --> GETFIELD["GetField → Struct field type"]
SCALAR --> CAST["Cast → Target type"]
SCALAR --> CASE["Case → Int64 or Float64"]
SCALAR --> COALESCE["Coalesce → Int64 or Float64"]
SCALAR --> RANDOM["Random → Float64"]
SCALAR --> SUBQUERY["ScalarSubquery → Utf8 (TODO)"]
BINARY --> FLOATCHECK["Contains Float64?"]
FLOATCHECK -->|Yes| FLOAT64["Float64"]
FLOATCHECK -->|No| INT64["Int64"]
CASE --> CASECHECK["Branches use Float64?"]
CASECHECK -->|Yes| CASEFLOAT["Float64"]
CASECHECK -->|No| CASEINT["Int64"]
end

Expression Type Inference

During execution planning, the system infers output types for ScalarExpr based on operand types:

The inference logic is implemented in infer_computed_data_type() and expression_uses_float(), which recursively analyze expression trees to determine output types.

Sources: llkv-executor/src/translation/schema.rs:53-271

Expression Normalization

Before compilation, predicates undergo normalization to flatten nested AND/OR structures and apply De Morgan's laws:

Normalization Rules:

  1. Flatten AND: And([And([a, b]), c])And([a, b, c])
  2. Flatten OR: Or([Or([a, b]), c])Or([a, b, c])
  3. De Morgan's AND: Not(And([a, b]))Or([Not(a), Not(b)])
  4. De Morgan's OR: Not(Or([a, b]))And([Not(a), Not(b)])
  5. Double Negation: Not(Not(expr))expr
  6. Literal Negation: Not(Literal(true))Literal(false)
  7. IsNull Negation: Not(IsNull { expr, negated })IsNull { expr, negated: !negated }

This normalization simplifies subsequent optimization and compilation steps.

Sources: llkv-table/src/planner/program.rs:286-343

Expression Compilation

Normalized expressions are compiled into two bytecode programs:

  1. EvalProgram - Stack-based evaluation of predicates
  2. DomainProgram - Set-based domain analysis for row filtering

The compilation process:

  • Detects fusable predicates (multiple conditions on same field)
  • Builds domain programs for NOT operations
  • Produces postorder bytecode for stack-based evaluation

Sources: llkv-table/src/planner/program.rs:256-631

Expression AST Usage Flow

Key stages:

  1. SQL Parsing - External sqlparser produces SQL AST
  2. Plan Building - llkv-plan converts SQL AST to Expr<String>
  3. Translation - llkv-executor resolves column names to FieldId
  4. Normalization - llkv-table flattens and optimizes structure
  5. Compilation - llkv-table produces bytecode programs
  6. Execution - llkv-table evaluates against RecordBatch data

Sources: llkv-expr/src/expr.rs:1-8 llkv-executor/src/translation/expression.rs:18-174 llkv-table/src/planner/program.rs:256-631