Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

GitHub

This documentation is part of the "Projects with Books" initiative at zenOSmosis.

The source code for this project is available on GitHub.

Query Planning

Loading…

Query Planning

Relevant source files

Purpose and Scope

The query planning layer converts parsed SQL abstract syntax trees (AST) into structured logical plans that execution engines can process. This document covers the planning architecture, plan types, translation process, and validation steps. For information about the SQL parsing that precedes planning, see SQL Interface. For details about the plan structures themselves, see Plan Structures. For subquery correlation handling, see Subquery and Correlation Handling. For expression compilation and evaluation, see Expression System. For plan execution, see Query Execution.

Sources: llkv-plan/src/lib.rs:1-49 llkv-plan/src/plans.rs:1-6

Architecture Overview

The llkv-plan crate sits between the SQL parsing layer and the execution layer. It receives SQL ASTs from sqlparser-rs and produces logical plan structures consumed by llkv-executor and llkv-runtime. The crate is organized into several focused modules:

ModulePurpose
plansCore plan structures (SelectPlan, InsertPlan, etc.)
plannerPlan building logic
translationSQL AST to plan conversion
validationSchema and naming constraint enforcement
canonicalCanonical value conversion
conversionHelper conversions for plan types
physicalPhysical execution plan structures
table_scanTable scan plan optimization
subquery_correlationCorrelated subquery tracking
traversalGeneric AST traversal utilities

Sources: llkv-plan/src/lib.rs:1-49 llkv-plan/Cargo.toml:1-43

Planning Process Flow

The planning process begins when the SQL layer invokes translation functions that walk the parsed AST and construct typed plan structures. The process involves multiple phases:

Identifier Resolution : During translation, column references like "users.name" or "u.address.city" must be resolved to canonical column names. The IdentifierResolver in llkv-table handles this by consulting the TableCatalog and applying alias rules.

sequenceDiagram
    participant Parser as sqlparser
    participant Translator as translation module
    participant Validator as validation module
    participant Builder as Plan builders
    participant Catalog as TableCatalog
    participant Output as Logical Plans
    
    Parser->>Translator: Statement AST
    
    Note over Translator: Identify statement type\n(SELECT, INSERT, DDL, etc.)
    
    Translator->>Validator: Check schema references
    Validator->>Catalog: Resolve table names
    Catalog-->>Validator: TableMetadataView
    Validator-->>Translator: Validated identifiers
    
    Translator->>Builder: Construct plan with validated data
    
    alt SELECT statement
        Builder->>Builder: Build SelectPlan\n- Parse projections\n- Translate WHERE\n- Extract aggregates\n- Build join metadata
    else INSERT statement
        Builder->>Builder: Build InsertPlan\n- Parse column list\n- Convert values\n- Set conflict action
    else DDL statement
        Builder->>Builder: Build DDL Plan\n- Extract schema info\n- Validate constraints
    end
    
    Builder->>Validator: Final validation pass
    Validator-->>Builder: Approved
    
    Builder->>Output: Typed logical plan

Expression Translation : Predicate and scalar expressions are translated from SQL AST nodes into the llkv-expr expression types (Expr, ScalarExpr). This translation involves converting SQL operators to expression operators, resolving function calls, and tracking subquery placeholders.

Subquery Discovery : The translation layer identifies correlated subqueries in WHERE and SELECT clauses, assigns them unique SubqueryId identifiers, and builds FilterSubquery or ScalarSubquery metadata structures that track which outer columns are referenced.

Sources: llkv-plan/src/plans.rs:1-100 llkv-table/src/resolvers/identifier.rs:1-260

Plan Type Categories

Plans are organized by SQL statement category. Each category has distinct plan structures tailored to its execution requirements:

DDL Plans

Data Definition Language plans modify schema and structure:

DML Plans

Data Manipulation Language plans modify table contents:

Query Plans

SELECT statements produce the most complex plans with nested substructures:

Sources: llkv-plan/src/plans.rs:164-1620

Plan Construction Patterns

Plan structures expose builder-style APIs for incremental construction:

Builder Methods

Most plan types provide fluent builder methods:

CreateTablePlan::new("users")
    .with_if_not_exists(true)
    .with_columns(vec![
        PlanColumnSpec::new("id", DataType::Int64, false)
            .with_primary_key(true),
        PlanColumnSpec::new("name", DataType::Utf8, true),
    ])
    .with_namespace(Some("main".to_string()))

SelectPlan Construction

SelectPlan is built incrementally as the translator walks the SQL AST:

SelectPlan::new("users")
    .with_projections(projections)
    .with_filter(Some(SelectFilter {
        predicate: expr,
        subqueries: vec![],
    }))
    .with_aggregates(aggregates)
    .with_order_by(order_by)
    .with_group_by(group_by_columns)

The with_tables constructor supports multi-table queries:

SelectPlan::with_tables(vec![
    TableRef::new("main", "orders"),
    TableRef::new("main", "customers"),
])
.with_joins(vec![
    JoinMetadata {
        left_table_index: 0,
        join_type: JoinPlan::Inner,
        on_condition: Some(join_expr),
    },
])

Join Metadata

JoinMetadata replaces the older parallel join_types and join_filters vectors by bundling join information into a single structure per join operation. Each entry describes how tables[i] connects to tables[i+1]:

FieldTypePurpose
left_table_indexusizeIndex into SelectPlan.tables
join_typeJoinPlanINNER, LEFT, RIGHT, or FULL
on_conditionOption<Expr>ON clause predicate

Sources: llkv-plan/src/plans.rs:176-952 llkv-plan/src/plans.rs:756-792

Subquery Plan Structures

Subqueries appear in two contexts and use distinct plan structures:

Filter Subqueries

Used in WHERE clauses with EXISTS predicates:

The SubqueryId links the predicate’s Expr::Exists node to the corresponding FilterSubquery in the subqueries vector.

Scalar Subqueries

Used in SELECT projections to return a single value:

Correlated Column Tracking

When a subquery references outer query columns, the planner injects placeholder column names and tracks the mapping:

CorrelatedColumn {
    placeholder: "__subquery_corr_0_users_id",  // Injected into subquery
    column: "users.id",                          // Actual outer column
    field_path: vec![],                          // Empty for simple columns
}

For struct field references like users.address.city:

CorrelatedColumn {
    placeholder: "__subquery_corr_0_users_address",
    column: "users.address",
    field_path: vec!["city".to_string()],
}

Sources: llkv-plan/src/plans.rs:23-68 llkv-expr/src/expr.rs:42-65

graph TD
    LOGICAL["Logical Plans\n(SelectPlan, InsertPlan, etc.)"]
LOGICAL --> PHYS_BUILD["Physical plan builder\nllkv-plan::physical"]
PHYS_BUILD --> PHYS["PhysicalPlan"]
PHYS --> SCAN["TableScanPlan\nAccess path selection"]
PHYS --> JOIN_EXEC["Join algorithm choice\n(hash join, nested loop)"]
PHYS --> AGG_EXEC["Aggregation strategy"]
PHYS --> SORT_EXEC["Sort algorithm"]
SCAN --> FULL["Full table scan"]
SCAN --> INDEX["Index scan"]
SCAN --> FILTER_SCAN["Filtered scan with pushdown"]
PHYS --> EXEC["llkv-executor\nQuery execution"]

Physical Planning

While logical plans represent what operations to perform, physical plans specify how to execute them with specific algorithms and data access patterns:

The table_scan module provides build_table_scan_plan which analyzes predicates to determine the optimal scan strategy:

TableScanProjectionSpec {
    columns: Vec<String>,           // Columns to retrieve
    filter: Option<Expr>,           // Pushed-down filter
    order_by: Vec<OrderByPlan>,     // Sort requirements
    limit: Option<usize>,           // Row limit
}

This physical plan metadata guides the executor in choosing between:

  • Full scans : No filter, read all rows
  • Filter scans : Predicate evaluation during scan
  • Index scans : Use indexes when available and beneficial

Sources: llkv-plan/src/physical.rs:1-50 (inferred), llkv-plan/src/table_scan.rs:1-50 (inferred)

Translation and Validation

Translation Process

The translation modules convert SQL AST nodes to plan structures while preserving semantic meaning:

Projection Translation :

  • SELECT *SelectProjection::AllColumns
  • SELECT * EXCEPT (x)SelectProjection::AllColumnsExcept
  • SELECT colSelectProjection::Column { name, alias }
  • SELECT expr AS nameSelectProjection::Computed { expr, alias }

Predicate Translation : SQL predicates are converted to Expr trees:

  • WHERE a = 1 AND b < 2Expr::And([Filter(a=1), Filter(b<2)])
  • WHERE x IN (1,2,3)Expr::Pred(Filter { op: Operator::In(...) })
  • WHERE EXISTS (SELECT ...)Expr::Exists(SubqueryExpr { id, negated })

Aggregate Translation : Aggregate functions map to AggregateExpr variants:

  • COUNT(*)AggregateExpr::CountStar
  • SUM(col)AggregateExpr::Column { function: SumInt64, ... }
  • AVG(DISTINCT col)AggregateExpr::Column { distinct: true, ... }

Validation Helpers

The validation module enforces constraints during plan construction:

Validation TypePurpose
Schema validationVerify table and column existence
Type compatibilityCheck data type conversions
Constraint validationEnforce PRIMARY KEY, UNIQUE, CHECK, FK rules
Identifier resolutionResolve ambiguous column references
Aggregate contextEnsure aggregates only in valid contexts

The validation module provides helper functions that translators invoke at critical points:

  • Before plan construction : Validate referenced tables exist
  • During expression building : Resolve column identifiers
  • After plan assembly : Final consistency checks

Canonical Value Conversion

The canonical module converts PlanValue instances to CanonicalScalar for internal processing:

PlanValue::Integer(42) → CanonicalScalar::Int64(42)
PlanValue::String("abc") → CanonicalScalar::String("abc")
PlanValue::Decimal(dec) → CanonicalScalar::Decimal(dec)

This normalization ensures consistent value representation across plan construction and execution.

Sources: llkv-plan/src/translation.rs:1-50 (inferred), llkv-plan/src/validation.rs:1-50 (inferred), llkv-plan/src/canonical.rs:1-50 (inferred), llkv-plan/src/plans.rs:126-161

Plan Metadata and Auxiliary Types

Plans contain rich metadata to guide execution:

PlanValue

Represents literal values in plans:

ColumnAssignment

UPDATE plans use ColumnAssignment to specify modifications:

ColumnAssignment {
    column: "age",
    value: AssignmentValue::Literal(PlanValue::Integer(25)),
}

ColumnAssignment {
    column: "updated_at",
    value: AssignmentValue::Expression(
        ScalarExpr::Column("current_timestamp")
    ),
}

Constraint Specifications

CREATE TABLE plans include constraint metadata:

PlanColumnSpec {
    name: "id",
    data_type: DataType::Int64,
    nullable: false,
    primary_key: true,
    unique: true,
    check_expr: None,
}

ForeignKeySpec {
    name: Some("fk_orders_customer"),
    columns: vec!["customer_id"],
    referenced_table: "customers",
    referenced_columns: vec!["id"],
    on_delete: ForeignKeyAction::Restrict,
    on_update: ForeignKeyAction::NoAction,
}

MultiColumnUniqueSpec {
    name: Some("unique_email_username"),
    columns: vec!["email", "username"],
}

Sources: llkv-plan/src/plans.rs:73-161 llkv-plan/src/plans.rs:167-427 llkv-plan/src/plans.rs:660-682 llkv-plan/src/plans.rs:499-546

Traversal and Program Compilation

AST Traversal

The traversal module provides generic postorder traversal for deeply nested ASTs:

traverse_postorder(
    root_node,
    |node| { /* pre-visit logic */ },
    |node| { /* post-visit logic */ }
)

This pattern supports:

  • Subquery discovery during SELECT translation
  • Expression normalization
  • Dead code elimination in predicates

Program Compilation

Plans containing expressions are compiled into evaluation programs for efficient execution. The ProgramCompiler converts Expr trees into bytecode-like instruction sequences:

Expr → EvalProgram → Execution

The compiled programs optimize:

  • Predicate evaluation order
  • Short-circuit boolean logic
  • Literal folding
  • Domain program generation for chunk pruning

Sources: llkv-plan/src/traversal.rs:1-50 (inferred), llkv-plan/src/lib.rs:38-42

sequenceDiagram
    participant Planner as llkv-plan
    participant Runtime as llkv-runtime
    participant Executor as llkv-executor
    participant Table as llkv-table
    
    Planner->>Runtime: Logical Plan
    Runtime->>Runtime: Acquire transaction context
    
    alt SELECT Plan
        Runtime->>Executor: execute_select(plan, txn)
        Executor->>Executor: Build execution pipeline
        Executor->>Table: scan with filter
        Table-->>Executor: RecordBatch stream
        Executor->>Executor: Apply aggregations, joins, sorts
        Executor-->>Runtime: Final RecordBatch
    else INSERT Plan
        Runtime->>Runtime: Convert InsertSource to batches
        Runtime->>Table: append(batches)
        Table-->>Runtime: Row count
    else UPDATE/DELETE Plan
        Runtime->>Executor: execute_update/delete(plan, txn)
        Executor->>Table: filter + modify
        Table-->>Runtime: Row count
    else DDL Plan
        Runtime->>Table: Modify catalog/schema
        Table-->>Runtime: Success
    end
    
    Runtime->>Planner: Execution result

Integration with Execution

Plans flow from the planning layer to execution through well-defined interfaces:

The execution layer consumes plan metadata to:

  1. Determine table access order
  2. Select join algorithms
  3. Push down predicates to storage
  4. Apply aggregations and sorts
  5. Stream results incrementally

Sources: llkv-plan/src/plans.rs:1-1620 llkv-executor/src/lib.rs:1-50 (inferred), llkv-runtime/src/lib.rs:1-50 (inferred)

Dismiss

Refresh this wiki

Enter email to refresh