This documentation is part of the "Projects with Books" initiative at zenOSmosis.
The source code for this project is available on GitHub.
SQL Query Processing Pipeline
Relevant source files
- README.md
- demos/llkv-sql-pong-demo/src/main.rs
- llkv-executor/README.md
- llkv-plan/README.md
- llkv-sql/README.md
- llkv-sql/src/sql_engine.rs
- llkv-sql/src/tpch.rs
- llkv-tpch/.gitignore
- llkv-tpch/Cargo.toml
- llkv-tpch/DRAFT-PRE-FINAL.md
Purpose and Scope
This document describes the end-to-end SQL query processing pipeline in LLKV, from raw SQL text input to final results. It covers the five major stages: SQL preprocessing, parsing, plan translation, execution coordination, and result formatting.
For information about the specific plan structures created during translation, see Plan Structures. For details on how plans are executed to produce results, see Query Execution. For the user-facing SqlEngine API, see SqlEngine API.
Overview
The SQL query processing pipeline transforms user-provided SQL text into Arrow RecordBatch results through a series of well-defined stages. The primary entry point is SqlEngine::execute(), which orchestrates the entire flow while maintaining transaction boundaries and handling cross-statement optimizations like INSERT buffering.
Sources: llkv-sql/src/sql_engine.rs:933-998
graph TB
SQL["Raw SQL Text"]
PREPROCESS["Stage 1: SQL Preprocessing\nDialect Normalization"]
PARSE["Stage 2: Parsing\nsqlparser → AST"]
TRANSLATE["Stage 3: Plan Translation\nAST → PlanStatement"]
EXECUTE["Stage 4: Plan Execution\nRuntimeEngine"]
RESULTS["Stage 5: Result Formatting\nRuntimeStatementResult"]
SQL --> PREPROCESS
PREPROCESS --> PARSE
PARSE --> TRANSLATE
TRANSLATE --> EXECUTE
EXECUTE --> RESULTS
PREPROCESS --> |preprocess_sql_input| PREPROCESS_IMPL["• preprocess_tpch_connect_syntax()\n• preprocess_create_type_syntax()\n• preprocess_exclude_syntax()\n• preprocess_trailing_commas_in_values()\n• preprocess_empty_in_lists()\n• preprocess_index_hints()\n• preprocess_reindex_syntax()\n• preprocess_bare_table_in_clauses()"]
PARSE --> |parse_sql_with_recursion_limit| PARSER["sqlparser::Parser\nPARSER_RECURSION_LIMIT = 200"]
TRANSLATE --> |translate_statement| PLANNER["• SelectPlan\n• InsertPlan\n• UpdatePlan\n• DeletePlan\n• CreateTablePlan\n• DDL Plans"]
EXECUTE --> |engine.execute_statement| RUNTIME["RuntimeEngine\n• Transaction Management\n• MVCC Snapshots\n• Catalog Operations"]
style PREPROCESS fill:#f9f9f9
style PARSE fill:#f9f9f9
style TRANSLATE fill:#f9f9f9
style EXECUTE fill:#f9f9f9
style RESULTS fill:#f9f9f9
Stage 1: SQL Preprocessing
Before parsing, the SQL text undergoes a series of preprocessing transformations to normalize dialect-specific syntax. This allows LLKV to accept SQL written for SQLite, DuckDB, and other dialects while presenting a consistent AST to the planner.
Preprocessing Transformations
| Preprocessor Method | Purpose | Example Transformation |
|---|---|---|
preprocess_tpch_connect_syntax() | Strip TPC-H CONNECT TO directives | CONNECT TO tpch; → `` |
preprocess_create_type_syntax() | Convert CREATE TYPE to CREATE DOMAIN | CREATE TYPE name AS INT → CREATE DOMAIN name AS INT |
preprocess_exclude_syntax() | Quote qualified names in EXCLUDE clauses | EXCLUDE (t.col) → EXCLUDE ("t.col") |
preprocess_trailing_commas_in_values() | Remove trailing commas in VALUES | VALUES (1, 2,) → VALUES (1, 2) |
preprocess_empty_in_lists() | Expand empty IN predicates to constant booleans | x IN () → (x = NULL AND 0 = 1) |
preprocess_index_hints() | Strip SQLite INDEXED BY hints | FROM t INDEXED BY idx → FROM t |
preprocess_reindex_syntax() | Convert REINDEX to VACUUM REINDEX | REINDEX idx → VACUUM REINDEX idx |
preprocess_bare_table_in_clauses() | Expand IN tablename to subquery | x IN t → x IN (SELECT * FROM t) |
Sources: llkv-sql/src/sql_engine.rs:623-873 llkv-sql/src/tpch.rs:1-17
Fallback Trigger Preprocessing
If parsing fails and the SQL contains CREATE TRIGGER, the engine applies an additional preprocess_sqlite_trigger_shorthand() transformation and retries. This handles SQLite's optional BEFORE/AFTER timing and FOR EACH ROW clauses by injecting defaults that sqlparser expects.
Sources: llkv-sql/src/sql_engine.rs:941-957 llkv-sql/src/sql_engine.rs:771-842
Stage 2: Parsing
Parsing is delegated to the sqlparser crate, which produces a Vec<Statement> AST. LLKV configures the parser with:
- Dialect:
GenericDialectto accept a wide range of SQL syntax - Recursion Limit:
PARSER_RECURSION_LIMIT = 200(raised from sqlparser's default of 50 to handle deeply nested queries in test suites)
The parse_sql_with_recursion_limit() helper function wraps sqlparser's API to apply this custom limit.
Sources: llkv-sql/src/sql_engine.rs:317-324 llkv-sql/src/sql_engine.rs:939-957
Stage 3: Plan Translation
Each parsed Statement is translated into a strongly-typed PlanStatement that the runtime can execute. This translation happens through statement-specific methods in SqlEngine.
graph TB
AST["sqlparser::ast::Statement"]
SELECT["Statement::Query"]
INSERT["Statement::Insert"]
UPDATE["Statement::Update"]
DELETE["Statement::Delete"]
CREATE["Statement::CreateTable"]
DROP["Statement::Drop"]
TRANSACTION["Statement::StartTransaction\nStatement::Commit\nStatement::Rollback"]
ALTER["Statement::AlterTable"]
OTHER["Other DDL/DML"]
SELECT_PLAN["translate_query()\n→ SelectPlan"]
INSERT_PLAN["buffer_insert()\n→ InsertPlan or buffered"]
UPDATE_PLAN["translate_update()\n→ UpdatePlan"]
DELETE_PLAN["translate_delete()\n→ DeletePlan"]
CREATE_PLAN["translate_create_table()\n→ CreateTablePlan"]
DROP_PLAN["translate_drop()\n→ PlanStatement::Drop*"]
TXN_RUNTIME["Direct runtime delegation\nflush INSERT buffer first"]
ALTER_PLAN["translate_alter_table()\n→ PlanStatement::Alter*"]
OTHER_PLAN["translate_* methods\n→ PlanStatement::*"]
AST --> SELECT
AST --> INSERT
AST --> UPDATE
AST --> DELETE
AST --> CREATE
AST --> DROP
AST --> TRANSACTION
AST --> ALTER
AST --> OTHER
SELECT --> SELECT_PLAN
INSERT --> INSERT_PLAN
UPDATE --> UPDATE_PLAN
DELETE --> DELETE_PLAN
CREATE --> CREATE_PLAN
DROP --> DROP_PLAN
TRANSACTION --> TXN_RUNTIME
ALTER --> ALTER_PLAN
OTHER --> OTHER_PLAN
SELECT_PLAN --> RUNTIME["RuntimeEngine::execute_statement()"]
INSERT_PLAN --> BUFFER_CHECK{"Buffering\nenabled?"}
BUFFER_CHECK -->|Yes| BUFFER["InsertBuffer\naccumulates rows"]
BUFFER_CHECK -->|No| RUNTIME
UPDATE_PLAN --> RUNTIME
DELETE_PLAN --> RUNTIME
CREATE_PLAN --> RUNTIME
DROP_PLAN --> RUNTIME
TXN_RUNTIME --> RUNTIME
ALTER_PLAN --> RUNTIME
OTHER_PLAN --> RUNTIME
BUFFER --> FLUSH_CHECK{"Flush\nneeded?"}
FLUSH_CHECK -->|Yes| FLUSH["Flush buffered rows"]
FLUSH_CHECK -->|No| CONTINUE["Continue buffering"]
FLUSH --> RUNTIME
Statement Routing
Sources: llkv-sql/src/sql_engine.rs:960-998
Translation Process
The translation process involves:
- Column Resolution: Identifier strings are resolved to
FieldIdreferences using the runtime's catalog - Expression Translation: SQL expressions are converted to
Expr<String>, then resolved toExpr<FieldId> - Subquery Handling: Correlated subqueries are tracked with placeholder generation
- Parameter Binding: SQL placeholders (
?,$1,:name) are mapped to parameter indices
Sources: llkv-sql/src/sql_engine.rs:1000-5000 (various translate_* methods)
sequenceDiagram
participant SqlEngine
participant Catalog as "RuntimeContext\nCatalog"
participant ExprTranslator
participant PlanBuilder
SqlEngine->>Catalog: resolve_table("users")
Catalog-->>SqlEngine: TableId(namespace=0, table=1)
SqlEngine->>Catalog: resolve_column("id", TableId)
Catalog-->>SqlEngine: ColumnResolution(FieldId)
SqlEngine->>ExprTranslator: translate_expr(sqlparser::Expr)
ExprTranslator->>ExprTranslator: Build Expr<String>
ExprTranslator->>Catalog: resolve_identifiers()
Catalog-->>ExprTranslator: Expr<FieldId>
ExprTranslator-->>SqlEngine: Expr<FieldId>
SqlEngine->>PlanBuilder: Create SelectPlan
Note over PlanBuilder: Attach projections,\nfilters, sorts, limits
PlanBuilder-->>SqlEngine: PlanStatement::Select(SelectPlan)
Stage 4: Plan Execution
Once a PlanStatement is constructed, it is passed to RuntimeEngine::execute_statement() for execution. The runtime coordinates:
- Transaction Management: Ensures each statement executes within a transaction snapshot
- MVCC Enforcement: Filters rows based on visibility rules
- Catalog Operations: Updates system catalog for DDL statements
- Executor Invocation: Delegates
SelectPlanexecution tollkv-executor
Execution Routing by Statement Type
Sources: llkv-sql/src/sql_engine.rs:587-609 llkv-runtime/ (RuntimeEngine implementation)
Stage 5: Result Formatting
The runtime returns a RuntimeStatementResult enum that represents the outcome of statement execution. SqlEngine surfaces this directly to callers via the execute() method, or converts it to Vec<RecordBatch> for the sql() convenience method.
Result Types
| Statement Type | Result Variant | Contents |
|---|---|---|
SELECT | RuntimeStatementResult::Select | Vec<RecordBatch> of query results |
INSERT | RuntimeStatementResult::Insert | rows_inserted: u64 |
UPDATE | RuntimeStatementResult::Update | rows_updated: u64 |
DELETE | RuntimeStatementResult::Delete | rows_deleted: u64 |
CREATE TABLE | RuntimeStatementResult::CreateTable | table_name: String |
CREATE INDEX | RuntimeStatementResult::CreateIndex | index_name: String |
DROP TABLE | RuntimeStatementResult::DropTable | table_name: String |
| Transaction control | RuntimeStatementResult::Transaction | Transaction state |
Sources: llkv-runtime/ (RuntimeStatementResult definition), llkv-sql/src/sql_engine.rs:933-998
Prepared Statements and Parameters
LLKV supports parameterized queries through a prepared statement mechanism that handles three parameter syntaxes:
- Positional (numbered):
?,?1,?2,$1,$2 - Named:
:name,:id - Auto-incremented: Sequential
?placeholders
Parameter Processing Pipeline
Sources: llkv-sql/src/sql_engine.rs:71-206 llkv-sql/src/sql_engine.rs:278-297
Parameter Substitution
During plan execution with parameters, the engine performs a second pass to replace sentinel strings (__llkv_param__N__) with the actual Literal values provided by the caller. This two-phase approach allows the same PlanStatement to be reused across multiple executions with different parameter values.
Sources: llkv-sql/src/sql_engine.rs:194-206
INSERT Buffering
SqlEngine includes an optional INSERT buffering optimization that batches consecutive INSERT ... VALUES statements targeting the same table. This is disabled by default but can be enabled with set_insert_buffering(true) for bulk ingestion workloads.
stateDiagram-v2
[*] --> NoBuffer : buffering disabled
NoBuffer --> NoBuffer : INSERT → immediate execute
[*] --> BufferEmpty : buffering enabled
BufferEmpty --> BufferActive : INSERT(table, cols, rows)
BufferActive --> BufferActive : INSERT(same table/cols) accumulate rows
BufferActive --> Flush : Different table
BufferActive --> Flush : Different columns
BufferActive --> Flush : Different conflict action
BufferActive --> Flush : Row count ≥ MAX_BUFFERED_INSERT_ROWS
BufferActive --> Flush : Non-INSERT statement
BufferActive --> Flush : Transaction boundary
Flush --> RuntimeExecution : Create InsertPlan from accumulated rows
RuntimeExecution --> BufferEmpty : Reset buffer
NoBuffer --> [*]
BufferEmpty --> [*]
Buffering Logic
Sources: llkv-sql/src/sql_engine.rs:410-495 llkv-sql/src/sql_engine.rs:887-905
Buffer Flush Triggers
| Trigger Condition | Rationale |
|---|---|
Row count ≥ MAX_BUFFERED_INSERT_ROWS (8192) | Limit memory usage |
| Target table changes | Cannot batch cross-table INSERTs |
| Column list changes | Schema mismatch |
| Conflict action changes | ON CONFLICT semantics differ |
| Non-INSERT statement encountered | Preserve statement ordering |
Transaction boundary (BEGIN, COMMIT, ROLLBACK) | Ensure transactional consistency |
Explicit flush_pending_inserts() call | Manual control |
| Statement expectation hint (testing) | Test harness needs per-statement results |
Sources: llkv-sql/src/sql_engine.rs:410-495
Error Handling and Table Mapping
The pipeline includes special error handling for table-not-found scenarios. When the runtime returns Error::NotFound or catalog-related errors, SqlEngine::execute_plan_statement() rewrites them to user-friendly messages like "Catalog Error: Table 'users' does not exist".
This mapping is skipped for CREATE VIEW and DROP VIEW statements where the "table" name refers to the view being created/dropped rather than a referenced table.
Sources: llkv-sql/src/sql_engine.rs:558-609