This documentation is part of the "Projects with Books" initiative at zenOSmosis.
The source code for this project is available on GitHub.
Subquery and Correlation Handling
Loading…
Subquery and Correlation Handling
Relevant source files
- llkv-expr/src/expr.rs
- llkv-plan/src/lib.rs
- llkv-plan/src/plans.rs
- llkv-table/src/resolvers/identifier.rs
Purpose and Scope
This document explains how LLKV handles correlated and uncorrelated subqueries in SQL queries. It covers the expression AST structures for representing subqueries, the plan-level metadata for tracking correlation relationships, and the placeholder system used during query planning.
For information about query plan structures more broadly, see Plan Structures. For expression evaluation and compilation, see Expression System.
Overview
LLKV supports two categories of subqueries:
- Predicate Subqueries - Used in WHERE clauses with
EXISTSorNOT EXISTS - Scalar Subqueries - Used in SELECT projections or expressions, returning a single value
Both types can be either correlated (referencing columns from outer queries) or uncorrelated (self-contained). Correlated subqueries require special handling to capture outer column references and inject them as parameters during evaluation.
The system uses a multi-stage approach:
- Planning Phase : Assign unique
SubqueryIdvalues and build metadata - Translation Phase : Replace correlated column references with placeholders
- Execution Phase : Evaluate subqueries per outer row, binding placeholder values
Sources: llkv-expr/src/expr.rs:45-65 llkv-plan/src/plans.rs:27-67
Subquery Identification System
SubqueryId
Each subquery within a query plan receives a unique SubqueryId identifier. This allows the executor to distinguish multiple subqueries and manage their evaluation contexts separately.
SubqueryId Assignment Flow
graph TB
Query["Main Query\n(SELECT Plan)"]
Filter["WHERE Clause\n(SelectFilter)"]
Proj["SELECT List\n(Projections)"]
Sub1["EXISTS Subquery\nSubqueryId(0)"]
Sub2["Scalar Subquery\nSubqueryId(1)"]
Sub3["EXISTS Subquery\nSubqueryId(2)"]
Query --> Filter
Query --> Proj
Filter --> Sub1
Filter --> Sub3
Proj --> Sub2
Sub1 -.references.-> Meta1["FilterSubquery\nid=0\nplan + correlations"]
Sub2 -.references.-> Meta2["ScalarSubquery\nid=1\nplan + correlations"]
Sub3 -.references.-> Meta3["FilterSubquery\nid=2\nplan + correlations"]
The planner assigns sequential IDs during query translation. Each subquery expression (Expr::Exists or ScalarExpr::ScalarSubquery) holds its assigned ID, while the full subquery plan and correlation metadata are stored separately in the parent SelectPlan.
| Structure | Location | Purpose |
|---|---|---|
SubqueryId | Expression AST | References a subquery definition |
FilterSubquery | SelectPlan.filter.subqueries | Metadata for EXISTS/NOT EXISTS |
ScalarSubquery | SelectPlan.scalar_subqueries | Metadata for scalar subqueries |
Sources: llkv-expr/src/expr.rs:46-65 llkv-plan/src/plans.rs:36-56
Expression AST Structures
SubqueryExpr for Predicates
The SubqueryExpr structure appears in boolean expressions as Expr::Exists variants. It supports negation for NOT EXISTS semantics.
Example WHERE clause : WHERE EXISTS (SELECT 1 FROM orders WHERE orders.customer_id = customers.id)
The predicate tree contains Expr::Exists(SubqueryExpr { id: SubqueryId(0), negated: false }), while SelectPlan.filter.subqueries[0] holds the actual subquery plan and correlation data.
Sources: llkv-expr/src/expr.rs:49-56
ScalarSubqueryExpr for Projections
Scalar subqueries return a single value per outer row and appear in ScalarExpr trees. They carry the expected data type for validation.
Example projection : SELECT name, (SELECT COUNT(*) FROM orders WHERE orders.customer_id = customers.id) AS order_count
The expression is ScalarExpr::ScalarSubquery(ScalarSubqueryExpr { id: SubqueryId(0), data_type: Int64 }), with full plan details in SelectPlan.scalar_subqueries[0].
Sources: llkv-expr/src/expr.rs:58-65 llkv-plan/src/plans.rs:47-56
Plan-Level Metadata Structures
classDiagram
class FilterSubquery {+SubqueryId id\n+Box~SelectPlan~ plan\n+Vec~CorrelatedColumn~ correlated_columns}
class CorrelatedColumn {+String placeholder\n+String column\n+Vec~String~ field_path}
class SelectFilter {+Expr predicate\n+Vec~FilterSubquery~ subqueries}
SelectFilter --> FilterSubquery : contains
FilterSubquery --> CorrelatedColumn : tracks correlations
FilterSubquery --> SelectPlan : nested plan
FilterSubquery
This structure captures all metadata needed to evaluate an EXISTS predicate during query execution.
Field Descriptions :
| Field | Type | Purpose |
|---|---|---|
id | SubqueryId | Unique identifier referenced in predicate AST |
plan | Box<SelectPlan> | Complete logical plan for subquery execution |
correlated_columns | Vec<CorrelatedColumn> | Outer columns referenced inside subquery |
Sources: llkv-plan/src/plans.rs:36-45
ScalarSubquery
Parallel structure for scalar subqueries used in projections:
The plan must produce exactly one row with one column. If the subquery returns zero rows, the executor produces NULL. Multiple rows trigger a runtime error.
Sources: llkv-plan/src/plans.rs:47-56
CorrelatedColumn
Describes a single correlated column reference captured from the outer query:
Example :
- Outer Query :
SELECT * FROM customers WHERE ... - Subquery :
SELECT 1 FROM orders WHERE orders.customer_id = customers.id - CorrelatedColumn :
{ placeholder: "$corr_0", column: "id", field_path: [] }
During subquery evaluation, the executor binds $corr_0 to the current outer row’s id value.
Sources: llkv-plan/src/plans.rs:58-67
Correlation Tracking System
sequenceDiagram
participant Planner
participant Tracker as SubqueryCorrelatedTracker
participant SubqueryPlan
Planner->>Tracker: new()
Planner->>Tracker: track_column("customers.id")
Tracker-->>Planner: placeholder = "$corr_0"
Planner->>SubqueryPlan: Replace "customers.id" with "$corr_0"
Planner->>Tracker: finalize()
Tracker-->>Planner: Vec<CorrelatedColumn>
Placeholder Naming Convention
The planner module exports SUBQUERY_CORRELATED_PLACEHOLDER_PREFIX which defines the prefix for generated placeholder names. The helper function subquery_correlated_placeholder(index) produces names like $corr_0, $corr_1, etc.
Tracking Workflow :
- Create
SubqueryCorrelatedTrackerwhen entering subquery translation - For each outer column reference, call
track_column(canonical_name)→ returns placeholder - Replace column reference in subquery AST with placeholder
- Call
finalize()to extractVec<CorrelatedColumn>for plan metadata
Sources: llkv-plan/src/lib.rs:43-46
SubqueryCorrelatedColumnTracker
This type (referenced in exports) manages the mapping between outer columns and generated placeholders. It ensures each unique outer column gets one placeholder regardless of how many times it’s referenced in the subquery.
Deduplication Example :
The tracker creates only one CorrelatedColumn entry with a single placeholder $corr_0 that both references use.
Sources: llkv-plan/src/lib.rs:44-45
Integration with Plan Structures
SelectPlan Storage
The SelectPlan structure holds subquery metadata in dedicated fields:
Storage Pattern :
| Subquery Type | Storage Location | ID Reference Location |
|---|---|---|
| EXISTS in WHERE | filter.subqueries | filter.predicate → Expr::Exists |
| Scalar in SELECT | scalar_subqueries | projections → ScalarExpr::ScalarSubquery |
| Scalar in WHERE | scalar_subqueries | filter.predicate → comparison expressions |
Sources: llkv-plan/src/plans.rs:800-829
Builder Methods
The SelectPlan provides fluent builder methods for attaching subquery metadata:
The planner typically calls these after translating the main query and all nested subqueries.
Sources: llkv-plan/src/plans.rs:895-909
Identifier Resolution During Correlation
graph TB
OuterQuery["Outer Query\ntable=customers\nalias=c"]
SubqueryScope["Subquery Scope\ntable=orders\nalias=o"]
Identifier["Column Reference:\n'c.id'"]
Resolver["IdentifierResolver"]
OuterQuery --> Context1["IdentifierContext\ntable_id=1\nalias=Some('c')"]
SubqueryScope --> Context2["IdentifierContext\ntable_id=2\nalias=Some('o')"]
Identifier --> Resolver
Resolver --> Decision{"Which scope?"}
Decision -->|Outer| Correlated["Mark as correlated\nGenerate placeholder"]
Decision -->|Inner| Local["Resolve locally"]
IdentifierContext for Outer Scopes
When translating a correlated subquery, the planner maintains an IdentifierContext that tracks which columns belong to outer query scopes vs. the subquery’s own tables.
Resolution Process :
- Check if identifier starts with outer table alias → mark correlated
- Check if identifier matches outer table columns (when no alias) → mark correlated
- Otherwise, resolve within subquery’s own table scope
Sources: llkv-table/src/resolvers/identifier.rs:8-66
ColumnResolution Structure
The resolver produces ColumnResolution objects that distinguish simple column references from nested field access:
For correlated columns, this resolution data flows into CorrelatedColumn.field_path, enabling struct-typed correlation like outer_table.struct_column.nested_field.
Sources: llkv-table/src/resolvers/identifier.rs:68-100
Execution Flow
Evaluation Steps :
- Outer Row Context : For each row from outer query, create evaluation context
- Placeholder Binding : Extract values for correlated columns from outer row
- Subquery Execution : Run inner query with bound placeholder values
- Result Integration : EXISTS → boolean, Scalar → value, integrate into outer query
The executor must evaluate correlated subqueries once per outer row, making them potentially expensive. Uncorrelated subqueries can be evaluated once and cached.
Sources: llkv-plan/src/plans.rs:27-67
Example: Correlated EXISTS Subquery
Consider this SQL query:
Planning Phase Output
SelectPlan Structure :
SelectPlan {
tables: [TableRef { table: "customers", alias: Some("c") }],
projections: [
Column { name: "name", alias: None },
Column { name: "id", alias: None }
],
filter: Some(SelectFilter {
predicate: Expr::Exists(SubqueryExpr {
id: SubqueryId(0),
negated: false
}),
subqueries: [
FilterSubquery {
id: SubqueryId(0),
plan: Box::new(SelectPlan {
tables: [TableRef { table: "orders", alias: Some("o") }],
filter: Some(SelectFilter {
predicate: Expr::And([
Compare {
left: Column("customer_id"),
op: Eq,
right: Column("$corr_0")
},
Compare {
left: Column("status"),
op: Eq,
right: Literal("pending")
}
]),
subqueries: []
}),
projections: [Computed { expr: Literal(1), alias: "1" }]
}),
correlated_columns: [
CorrelatedColumn {
placeholder: "$corr_0",
column: "id",
field_path: []
}
]
}
]
}),
// ... other fields ...
}
Translation Details
| Original Reference | After Translation | Correlation Entry |
|---|---|---|
c.id in subquery WHERE | $corr_0 | { placeholder: "$corr_0", column: "id", field_path: [] } |
The subquery’s filter now compares o.customer_id against the placeholder $corr_0 instead of directly referencing the outer column.
Sources: llkv-plan/src/plans.rs:27-67 llkv-expr/src/expr.rs:49-56
Key Design Decisions
Why Separate FilterSubquery and ScalarSubquery?
Different evaluation semantics:
- EXISTS : Returns boolean, short-circuits on first match
- Scalar : Must verify exactly one row returned, extracts value
Having distinct types allows the executor to apply appropriate validation and optimization strategies.
Why Store SubqueryId in Expression AST?
Decouples expression evaluation from subquery execution context. The expression tree remains lightweight and can be cloned/transformed without carrying full subquery plans. The executor looks up metadata by ID when needed.
Why Use String Placeholders?
String placeholders like $corr_0 integrate naturally with the existing column resolution system. The executor can treat them as special “virtual columns” that get their values from outer row context rather than table scans.
Sources: llkv-expr/src/expr.rs:45-65 llkv-plan/src/plans.rs:36-67
Dismiss
Refresh this wiki
Enter email to refresh