Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

GitHub

This documentation is part of the "Projects with Books" initiative at zenOSmosis.

The source code for this project is available on GitHub.

Subquery and Correlation Handling

Loading…

Subquery and Correlation Handling

Relevant source files

Purpose and Scope

This document explains how LLKV handles correlated and uncorrelated subqueries in SQL queries. It covers the expression AST structures for representing subqueries, the plan-level metadata for tracking correlation relationships, and the placeholder system used during query planning.

For information about query plan structures more broadly, see Plan Structures. For expression evaluation and compilation, see Expression System.


Overview

LLKV supports two categories of subqueries:

  1. Predicate Subqueries - Used in WHERE clauses with EXISTS or NOT EXISTS
  2. Scalar Subqueries - Used in SELECT projections or expressions, returning a single value

Both types can be either correlated (referencing columns from outer queries) or uncorrelated (self-contained). Correlated subqueries require special handling to capture outer column references and inject them as parameters during evaluation.

The system uses a multi-stage approach:

  • Planning Phase : Assign unique SubqueryId values and build metadata
  • Translation Phase : Replace correlated column references with placeholders
  • Execution Phase : Evaluate subqueries per outer row, binding placeholder values

Sources: llkv-expr/src/expr.rs:45-65 llkv-plan/src/plans.rs:27-67


Subquery Identification System

SubqueryId

Each subquery within a query plan receives a unique SubqueryId identifier. This allows the executor to distinguish multiple subqueries and manage their evaluation contexts separately.

SubqueryId Assignment Flow

graph TB
    Query["Main Query\n(SELECT Plan)"]
Filter["WHERE Clause\n(SelectFilter)"]
Proj["SELECT List\n(Projections)"]
Sub1["EXISTS Subquery\nSubqueryId(0)"]
Sub2["Scalar Subquery\nSubqueryId(1)"]
Sub3["EXISTS Subquery\nSubqueryId(2)"]
Query --> Filter
 
   Query --> Proj
 
   Filter --> Sub1
 
   Filter --> Sub3
 
   Proj --> Sub2
    
    Sub1 -.references.-> Meta1["FilterSubquery\nid=0\nplan + correlations"]
Sub2 -.references.-> Meta2["ScalarSubquery\nid=1\nplan + correlations"]
Sub3 -.references.-> Meta3["FilterSubquery\nid=2\nplan + correlations"]

The planner assigns sequential IDs during query translation. Each subquery expression (Expr::Exists or ScalarExpr::ScalarSubquery) holds its assigned ID, while the full subquery plan and correlation metadata are stored separately in the parent SelectPlan.

StructureLocationPurpose
SubqueryIdExpression ASTReferences a subquery definition
FilterSubquerySelectPlan.filter.subqueriesMetadata for EXISTS/NOT EXISTS
ScalarSubquerySelectPlan.scalar_subqueriesMetadata for scalar subqueries

Sources: llkv-expr/src/expr.rs:46-65 llkv-plan/src/plans.rs:36-56


Expression AST Structures

SubqueryExpr for Predicates

The SubqueryExpr structure appears in boolean expressions as Expr::Exists variants. It supports negation for NOT EXISTS semantics.

Example WHERE clause : WHERE EXISTS (SELECT 1 FROM orders WHERE orders.customer_id = customers.id)

The predicate tree contains Expr::Exists(SubqueryExpr { id: SubqueryId(0), negated: false }), while SelectPlan.filter.subqueries[0] holds the actual subquery plan and correlation data.

Sources: llkv-expr/src/expr.rs:49-56

ScalarSubqueryExpr for Projections

Scalar subqueries return a single value per outer row and appear in ScalarExpr trees. They carry the expected data type for validation.

Example projection : SELECT name, (SELECT COUNT(*) FROM orders WHERE orders.customer_id = customers.id) AS order_count

The expression is ScalarExpr::ScalarSubquery(ScalarSubqueryExpr { id: SubqueryId(0), data_type: Int64 }), with full plan details in SelectPlan.scalar_subqueries[0].

Sources: llkv-expr/src/expr.rs:58-65 llkv-plan/src/plans.rs:47-56


Plan-Level Metadata Structures

classDiagram
    class FilterSubquery {+SubqueryId id\n+Box~SelectPlan~ plan\n+Vec~CorrelatedColumn~ correlated_columns}
    
    class CorrelatedColumn {+String placeholder\n+String column\n+Vec~String~ field_path}
    
    class SelectFilter {+Expr predicate\n+Vec~FilterSubquery~ subqueries}
    
    SelectFilter --> FilterSubquery : contains
    FilterSubquery --> CorrelatedColumn : tracks correlations
    FilterSubquery --> SelectPlan : nested plan

FilterSubquery

This structure captures all metadata needed to evaluate an EXISTS predicate during query execution.

Field Descriptions :

FieldTypePurpose
idSubqueryIdUnique identifier referenced in predicate AST
planBox<SelectPlan>Complete logical plan for subquery execution
correlated_columnsVec<CorrelatedColumn>Outer columns referenced inside subquery

Sources: llkv-plan/src/plans.rs:36-45

ScalarSubquery

Parallel structure for scalar subqueries used in projections:

The plan must produce exactly one row with one column. If the subquery returns zero rows, the executor produces NULL. Multiple rows trigger a runtime error.

Sources: llkv-plan/src/plans.rs:47-56

CorrelatedColumn

Describes a single correlated column reference captured from the outer query:

Example :

  • Outer Query : SELECT * FROM customers WHERE ...
  • Subquery : SELECT 1 FROM orders WHERE orders.customer_id = customers.id
  • CorrelatedColumn : { placeholder: "$corr_0", column: "id", field_path: [] }

During subquery evaluation, the executor binds $corr_0 to the current outer row’s id value.

Sources: llkv-plan/src/plans.rs:58-67


Correlation Tracking System

sequenceDiagram
    participant Planner
    participant Tracker as SubqueryCorrelatedTracker
    participant SubqueryPlan
    
    Planner->>Tracker: new()
    Planner->>Tracker: track_column("customers.id")
    Tracker-->>Planner: placeholder = "$corr_0"
    
    Planner->>SubqueryPlan: Replace "customers.id" with "$corr_0"
    
    Planner->>Tracker: finalize()
    Tracker-->>Planner: Vec<CorrelatedColumn>

Placeholder Naming Convention

The planner module exports SUBQUERY_CORRELATED_PLACEHOLDER_PREFIX which defines the prefix for generated placeholder names. The helper function subquery_correlated_placeholder(index) produces names like $corr_0, $corr_1, etc.

Tracking Workflow :

  1. Create SubqueryCorrelatedTracker when entering subquery translation
  2. For each outer column reference, call track_column(canonical_name) → returns placeholder
  3. Replace column reference in subquery AST with placeholder
  4. Call finalize() to extract Vec<CorrelatedColumn> for plan metadata

Sources: llkv-plan/src/lib.rs:43-46

SubqueryCorrelatedColumnTracker

This type (referenced in exports) manages the mapping between outer columns and generated placeholders. It ensures each unique outer column gets one placeholder regardless of how many times it’s referenced in the subquery.

Deduplication Example :

The tracker creates only one CorrelatedColumn entry with a single placeholder $corr_0 that both references use.

Sources: llkv-plan/src/lib.rs:44-45


Integration with Plan Structures

SelectPlan Storage

The SelectPlan structure holds subquery metadata in dedicated fields:

Storage Pattern :

Subquery TypeStorage LocationID Reference Location
EXISTS in WHEREfilter.subqueriesfilter.predicateExpr::Exists
Scalar in SELECTscalar_subqueriesprojectionsScalarExpr::ScalarSubquery
Scalar in WHEREscalar_subqueriesfilter.predicate → comparison expressions

Sources: llkv-plan/src/plans.rs:800-829

Builder Methods

The SelectPlan provides fluent builder methods for attaching subquery metadata:

The planner typically calls these after translating the main query and all nested subqueries.

Sources: llkv-plan/src/plans.rs:895-909


Identifier Resolution During Correlation

graph TB
    OuterQuery["Outer Query\ntable=customers\nalias=c"]
SubqueryScope["Subquery Scope\ntable=orders\nalias=o"]
Identifier["Column Reference:\n'c.id'"]
Resolver["IdentifierResolver"]
OuterQuery --> Context1["IdentifierContext\ntable_id=1\nalias=Some('c')"]
SubqueryScope --> Context2["IdentifierContext\ntable_id=2\nalias=Some('o')"]
Identifier --> Resolver
 
   Resolver --> Decision{"Which scope?"}
Decision -->|Outer| Correlated["Mark as correlated\nGenerate placeholder"]
Decision -->|Inner| Local["Resolve locally"]

IdentifierContext for Outer Scopes

When translating a correlated subquery, the planner maintains an IdentifierContext that tracks which columns belong to outer query scopes vs. the subquery’s own tables.

Resolution Process :

  1. Check if identifier starts with outer table alias → mark correlated
  2. Check if identifier matches outer table columns (when no alias) → mark correlated
  3. Otherwise, resolve within subquery’s own table scope

Sources: llkv-table/src/resolvers/identifier.rs:8-66

ColumnResolution Structure

The resolver produces ColumnResolution objects that distinguish simple column references from nested field access:

For correlated columns, this resolution data flows into CorrelatedColumn.field_path, enabling struct-typed correlation like outer_table.struct_column.nested_field.

Sources: llkv-table/src/resolvers/identifier.rs:68-100


Execution Flow

Evaluation Steps :

  1. Outer Row Context : For each row from outer query, create evaluation context
  2. Placeholder Binding : Extract values for correlated columns from outer row
  3. Subquery Execution : Run inner query with bound placeholder values
  4. Result Integration : EXISTS → boolean, Scalar → value, integrate into outer query

The executor must evaluate correlated subqueries once per outer row, making them potentially expensive. Uncorrelated subqueries can be evaluated once and cached.

Sources: llkv-plan/src/plans.rs:27-67


Example: Correlated EXISTS Subquery

Consider this SQL query:

Planning Phase Output

SelectPlan Structure :

SelectPlan {
    tables: [TableRef { table: "customers", alias: Some("c") }],
    projections: [
        Column { name: "name", alias: None },
        Column { name: "id", alias: None }
    ],
    filter: Some(SelectFilter {
        predicate: Expr::Exists(SubqueryExpr {
            id: SubqueryId(0),
            negated: false
        }),
        subqueries: [
            FilterSubquery {
                id: SubqueryId(0),
                plan: Box::new(SelectPlan {
                    tables: [TableRef { table: "orders", alias: Some("o") }],
                    filter: Some(SelectFilter {
                        predicate: Expr::And([
                            Compare { 
                                left: Column("customer_id"), 
                                op: Eq, 
                                right: Column("$corr_0") 
                            },
                            Compare { 
                                left: Column("status"), 
                                op: Eq, 
                                right: Literal("pending") 
                            }
                        ]),
                        subqueries: []
                    }),
                    projections: [Computed { expr: Literal(1), alias: "1" }]
                }),
                correlated_columns: [
                    CorrelatedColumn {
                        placeholder: "$corr_0",
                        column: "id",
                        field_path: []
                    }
                ]
            }
        ]
    }),
    // ... other fields ...
}

Translation Details

Original ReferenceAfter TranslationCorrelation Entry
c.id in subquery WHERE$corr_0{ placeholder: "$corr_0", column: "id", field_path: [] }

The subquery’s filter now compares o.customer_id against the placeholder $corr_0 instead of directly referencing the outer column.

Sources: llkv-plan/src/plans.rs:27-67 llkv-expr/src/expr.rs:49-56


Key Design Decisions

Why Separate FilterSubquery and ScalarSubquery?

Different evaluation semantics:

  • EXISTS : Returns boolean, short-circuits on first match
  • Scalar : Must verify exactly one row returned, extracts value

Having distinct types allows the executor to apply appropriate validation and optimization strategies.

Why Store SubqueryId in Expression AST?

Decouples expression evaluation from subquery execution context. The expression tree remains lightweight and can be cloned/transformed without carrying full subquery plans. The executor looks up metadata by ID when needed.

Why Use String Placeholders?

String placeholders like $corr_0 integrate naturally with the existing column resolution system. The executor can treat them as special “virtual columns” that get their values from outer row context rather than table scans.

Sources: llkv-expr/src/expr.rs:45-65 llkv-plan/src/plans.rs:36-67

Dismiss

Refresh this wiki

Enter email to refresh