This documentation is part of the "Projects with Books" initiative at zenOSmosis.
The source code for this project is available on GitHub.
Query Planning
Loading…
Query Planning
Relevant source files
- .github/workflows/build.docs.yml
- demos/llkv-sql-pong-demo/assets/llkv-sql-pong-screenshot.png
- dev-docs/doc-preview.md
- llkv-expr/src/expr.rs
- llkv-plan/Cargo.toml
- llkv-plan/src/lib.rs
- llkv-plan/src/plans.rs
- llkv-sql/Cargo.toml
- llkv-table/src/resolvers/identifier.rs
- llkv-test-utils/Cargo.toml
Purpose and Scope
The query planning layer converts parsed SQL abstract syntax trees (AST) into structured logical plans that execution engines can process. This document covers the planning architecture, plan types, translation process, and validation steps. For information about the SQL parsing that precedes planning, see SQL Interface. For details about the plan structures themselves, see Plan Structures. For subquery correlation handling, see Subquery and Correlation Handling. For expression compilation and evaluation, see Expression System. For plan execution, see Query Execution.
Sources: llkv-plan/src/lib.rs:1-49 llkv-plan/src/plans.rs:1-6
Architecture Overview
The llkv-plan crate sits between the SQL parsing layer and the execution layer. It receives SQL ASTs from sqlparser-rs and produces logical plan structures consumed by llkv-executor and llkv-runtime. The crate is organized into several focused modules:
| Module | Purpose |
|---|---|
plans | Core plan structures (SelectPlan, InsertPlan, etc.) |
planner | Plan building logic |
translation | SQL AST to plan conversion |
validation | Schema and naming constraint enforcement |
canonical | Canonical value conversion |
conversion | Helper conversions for plan types |
physical | Physical execution plan structures |
table_scan | Table scan plan optimization |
subquery_correlation | Correlated subquery tracking |
traversal | Generic AST traversal utilities |
Sources: llkv-plan/src/lib.rs:1-49 llkv-plan/Cargo.toml:1-43
Planning Process Flow
The planning process begins when the SQL layer invokes translation functions that walk the parsed AST and construct typed plan structures. The process involves multiple phases:
Identifier Resolution : During translation, column references like "users.name" or "u.address.city" must be resolved to canonical column names. The IdentifierResolver in llkv-table handles this by consulting the TableCatalog and applying alias rules.
sequenceDiagram
participant Parser as sqlparser
participant Translator as translation module
participant Validator as validation module
participant Builder as Plan builders
participant Catalog as TableCatalog
participant Output as Logical Plans
Parser->>Translator: Statement AST
Note over Translator: Identify statement type\n(SELECT, INSERT, DDL, etc.)
Translator->>Validator: Check schema references
Validator->>Catalog: Resolve table names
Catalog-->>Validator: TableMetadataView
Validator-->>Translator: Validated identifiers
Translator->>Builder: Construct plan with validated data
alt SELECT statement
Builder->>Builder: Build SelectPlan\n- Parse projections\n- Translate WHERE\n- Extract aggregates\n- Build join metadata
else INSERT statement
Builder->>Builder: Build InsertPlan\n- Parse column list\n- Convert values\n- Set conflict action
else DDL statement
Builder->>Builder: Build DDL Plan\n- Extract schema info\n- Validate constraints
end
Builder->>Validator: Final validation pass
Validator-->>Builder: Approved
Builder->>Output: Typed logical plan
Expression Translation : Predicate and scalar expressions are translated from SQL AST nodes into the llkv-expr expression types (Expr, ScalarExpr). This translation involves converting SQL operators to expression operators, resolving function calls, and tracking subquery placeholders.
Subquery Discovery : The translation layer identifies correlated subqueries in WHERE and SELECT clauses, assigns them unique SubqueryId identifiers, and builds FilterSubquery or ScalarSubquery metadata structures that track which outer columns are referenced.
Sources: llkv-plan/src/plans.rs:1-100 llkv-table/src/resolvers/identifier.rs:1-260
Plan Type Categories
Plans are organized by SQL statement category. Each category has distinct plan structures tailored to its execution requirements:
DDL Plans
Data Definition Language plans modify schema and structure:
DML Plans
Data Manipulation Language plans modify table contents:
Query Plans
SELECT statements produce the most complex plans with nested substructures:
Sources: llkv-plan/src/plans.rs:164-1620
Plan Construction Patterns
Plan structures expose builder-style APIs for incremental construction:
Builder Methods
Most plan types provide fluent builder methods:
CreateTablePlan::new("users")
.with_if_not_exists(true)
.with_columns(vec![
PlanColumnSpec::new("id", DataType::Int64, false)
.with_primary_key(true),
PlanColumnSpec::new("name", DataType::Utf8, true),
])
.with_namespace(Some("main".to_string()))
SelectPlan Construction
SelectPlan is built incrementally as the translator walks the SQL AST:
SelectPlan::new("users")
.with_projections(projections)
.with_filter(Some(SelectFilter {
predicate: expr,
subqueries: vec![],
}))
.with_aggregates(aggregates)
.with_order_by(order_by)
.with_group_by(group_by_columns)
The with_tables constructor supports multi-table queries:
SelectPlan::with_tables(vec![
TableRef::new("main", "orders"),
TableRef::new("main", "customers"),
])
.with_joins(vec![
JoinMetadata {
left_table_index: 0,
join_type: JoinPlan::Inner,
on_condition: Some(join_expr),
},
])
Join Metadata
JoinMetadata replaces the older parallel join_types and join_filters vectors by bundling join information into a single structure per join operation. Each entry describes how tables[i] connects to tables[i+1]:
| Field | Type | Purpose |
|---|---|---|
left_table_index | usize | Index into SelectPlan.tables |
join_type | JoinPlan | INNER, LEFT, RIGHT, or FULL |
on_condition | Option<Expr> | ON clause predicate |
Sources: llkv-plan/src/plans.rs:176-952 llkv-plan/src/plans.rs:756-792
Subquery Plan Structures
Subqueries appear in two contexts and use distinct plan structures:
Filter Subqueries
Used in WHERE clauses with EXISTS predicates:
The SubqueryId links the predicate’s Expr::Exists node to the corresponding FilterSubquery in the subqueries vector.
Scalar Subqueries
Used in SELECT projections to return a single value:
Correlated Column Tracking
When a subquery references outer query columns, the planner injects placeholder column names and tracks the mapping:
CorrelatedColumn {
placeholder: "__subquery_corr_0_users_id", // Injected into subquery
column: "users.id", // Actual outer column
field_path: vec![], // Empty for simple columns
}
For struct field references like users.address.city:
CorrelatedColumn {
placeholder: "__subquery_corr_0_users_address",
column: "users.address",
field_path: vec!["city".to_string()],
}
Sources: llkv-plan/src/plans.rs:23-68 llkv-expr/src/expr.rs:42-65
graph TD
LOGICAL["Logical Plans\n(SelectPlan, InsertPlan, etc.)"]
LOGICAL --> PHYS_BUILD["Physical plan builder\nllkv-plan::physical"]
PHYS_BUILD --> PHYS["PhysicalPlan"]
PHYS --> SCAN["TableScanPlan\nAccess path selection"]
PHYS --> JOIN_EXEC["Join algorithm choice\n(hash join, nested loop)"]
PHYS --> AGG_EXEC["Aggregation strategy"]
PHYS --> SORT_EXEC["Sort algorithm"]
SCAN --> FULL["Full table scan"]
SCAN --> INDEX["Index scan"]
SCAN --> FILTER_SCAN["Filtered scan with pushdown"]
PHYS --> EXEC["llkv-executor\nQuery execution"]
Physical Planning
While logical plans represent what operations to perform, physical plans specify how to execute them with specific algorithms and data access patterns:
The table_scan module provides build_table_scan_plan which analyzes predicates to determine the optimal scan strategy:
TableScanProjectionSpec {
columns: Vec<String>, // Columns to retrieve
filter: Option<Expr>, // Pushed-down filter
order_by: Vec<OrderByPlan>, // Sort requirements
limit: Option<usize>, // Row limit
}
This physical plan metadata guides the executor in choosing between:
- Full scans : No filter, read all rows
- Filter scans : Predicate evaluation during scan
- Index scans : Use indexes when available and beneficial
Sources: llkv-plan/src/physical.rs:1-50 (inferred), llkv-plan/src/table_scan.rs:1-50 (inferred)
Translation and Validation
Translation Process
The translation modules convert SQL AST nodes to plan structures while preserving semantic meaning:
Projection Translation :
SELECT *→SelectProjection::AllColumnsSELECT * EXCEPT (x)→SelectProjection::AllColumnsExceptSELECT col→SelectProjection::Column { name, alias }SELECT expr AS name→SelectProjection::Computed { expr, alias }
Predicate Translation : SQL predicates are converted to Expr trees:
WHERE a = 1 AND b < 2→Expr::And([Filter(a=1), Filter(b<2)])WHERE x IN (1,2,3)→Expr::Pred(Filter { op: Operator::In(...) })WHERE EXISTS (SELECT ...)→Expr::Exists(SubqueryExpr { id, negated })
Aggregate Translation : Aggregate functions map to AggregateExpr variants:
COUNT(*)→AggregateExpr::CountStarSUM(col)→AggregateExpr::Column { function: SumInt64, ... }AVG(DISTINCT col)→AggregateExpr::Column { distinct: true, ... }
Validation Helpers
The validation module enforces constraints during plan construction:
| Validation Type | Purpose |
|---|---|
| Schema validation | Verify table and column existence |
| Type compatibility | Check data type conversions |
| Constraint validation | Enforce PRIMARY KEY, UNIQUE, CHECK, FK rules |
| Identifier resolution | Resolve ambiguous column references |
| Aggregate context | Ensure aggregates only in valid contexts |
The validation module provides helper functions that translators invoke at critical points:
- Before plan construction : Validate referenced tables exist
- During expression building : Resolve column identifiers
- After plan assembly : Final consistency checks
Canonical Value Conversion
The canonical module converts PlanValue instances to CanonicalScalar for internal processing:
PlanValue::Integer(42) → CanonicalScalar::Int64(42)
PlanValue::String("abc") → CanonicalScalar::String("abc")
PlanValue::Decimal(dec) → CanonicalScalar::Decimal(dec)
This normalization ensures consistent value representation across plan construction and execution.
Sources: llkv-plan/src/translation.rs:1-50 (inferred), llkv-plan/src/validation.rs:1-50 (inferred), llkv-plan/src/canonical.rs:1-50 (inferred), llkv-plan/src/plans.rs:126-161
Plan Metadata and Auxiliary Types
Plans contain rich metadata to guide execution:
PlanValue
Represents literal values in plans:
ColumnAssignment
UPDATE plans use ColumnAssignment to specify modifications:
ColumnAssignment {
column: "age",
value: AssignmentValue::Literal(PlanValue::Integer(25)),
}
ColumnAssignment {
column: "updated_at",
value: AssignmentValue::Expression(
ScalarExpr::Column("current_timestamp")
),
}
Constraint Specifications
CREATE TABLE plans include constraint metadata:
PlanColumnSpec {
name: "id",
data_type: DataType::Int64,
nullable: false,
primary_key: true,
unique: true,
check_expr: None,
}
ForeignKeySpec {
name: Some("fk_orders_customer"),
columns: vec!["customer_id"],
referenced_table: "customers",
referenced_columns: vec!["id"],
on_delete: ForeignKeyAction::Restrict,
on_update: ForeignKeyAction::NoAction,
}
MultiColumnUniqueSpec {
name: Some("unique_email_username"),
columns: vec!["email", "username"],
}
Sources: llkv-plan/src/plans.rs:73-161 llkv-plan/src/plans.rs:167-427 llkv-plan/src/plans.rs:660-682 llkv-plan/src/plans.rs:499-546
Traversal and Program Compilation
AST Traversal
The traversal module provides generic postorder traversal for deeply nested ASTs:
traverse_postorder(
root_node,
|node| { /* pre-visit logic */ },
|node| { /* post-visit logic */ }
)
This pattern supports:
- Subquery discovery during SELECT translation
- Expression normalization
- Dead code elimination in predicates
Program Compilation
Plans containing expressions are compiled into evaluation programs for efficient execution. The ProgramCompiler converts Expr trees into bytecode-like instruction sequences:
Expr → EvalProgram → Execution
The compiled programs optimize:
- Predicate evaluation order
- Short-circuit boolean logic
- Literal folding
- Domain program generation for chunk pruning
Sources: llkv-plan/src/traversal.rs:1-50 (inferred), llkv-plan/src/lib.rs:38-42
sequenceDiagram
participant Planner as llkv-plan
participant Runtime as llkv-runtime
participant Executor as llkv-executor
participant Table as llkv-table
Planner->>Runtime: Logical Plan
Runtime->>Runtime: Acquire transaction context
alt SELECT Plan
Runtime->>Executor: execute_select(plan, txn)
Executor->>Executor: Build execution pipeline
Executor->>Table: scan with filter
Table-->>Executor: RecordBatch stream
Executor->>Executor: Apply aggregations, joins, sorts
Executor-->>Runtime: Final RecordBatch
else INSERT Plan
Runtime->>Runtime: Convert InsertSource to batches
Runtime->>Table: append(batches)
Table-->>Runtime: Row count
else UPDATE/DELETE Plan
Runtime->>Executor: execute_update/delete(plan, txn)
Executor->>Table: filter + modify
Table-->>Runtime: Row count
else DDL Plan
Runtime->>Table: Modify catalog/schema
Table-->>Runtime: Success
end
Runtime->>Planner: Execution result
Integration with Execution
Plans flow from the planning layer to execution through well-defined interfaces:
The execution layer consumes plan metadata to:
- Determine table access order
- Select join algorithms
- Push down predicates to storage
- Apply aggregations and sorts
- Stream results incrementally
Sources: llkv-plan/src/plans.rs:1-1620 llkv-executor/src/lib.rs:1-50 (inferred), llkv-runtime/src/lib.rs:1-50 (inferred)
Dismiss
Refresh this wiki
Enter email to refresh