Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

GitHub

This documentation is part of the "Projects with Books" initiative at zenOSmosis.

The source code for this project is available on GitHub.

SQL Query Processing Pipeline

Relevant source files

Purpose and Scope

This document describes the end-to-end SQL query processing pipeline in LLKV, from raw SQL text input to final results. It covers the five major stages: SQL preprocessing, parsing, plan translation, execution coordination, and result formatting.

For information about the specific plan structures created during translation, see Plan Structures. For details on how plans are executed to produce results, see Query Execution. For the user-facing SqlEngine API, see SqlEngine API.

Overview

The SQL query processing pipeline transforms user-provided SQL text into Arrow RecordBatch results through a series of well-defined stages. The primary entry point is SqlEngine::execute(), which orchestrates the entire flow while maintaining transaction boundaries and handling cross-statement optimizations like INSERT buffering.

Sources: llkv-sql/src/sql_engine.rs:933-998

graph TB
    SQL["Raw SQL Text"]
PREPROCESS["Stage 1: SQL Preprocessing\nDialect Normalization"]
PARSE["Stage 2: Parsing\nsqlparser → AST"]
TRANSLATE["Stage 3: Plan Translation\nAST → PlanStatement"]
EXECUTE["Stage 4: Plan Execution\nRuntimeEngine"]
RESULTS["Stage 5: Result Formatting\nRuntimeStatementResult"]
SQL --> PREPROCESS
 
   PREPROCESS --> PARSE
 
   PARSE --> TRANSLATE
 
   TRANSLATE --> EXECUTE
 
   EXECUTE --> RESULTS
    
 
   PREPROCESS --> |preprocess_sql_input| PREPROCESS_IMPL["• preprocess_tpch_connect_syntax()\n• preprocess_create_type_syntax()\n• preprocess_exclude_syntax()\n• preprocess_trailing_commas_in_values()\n• preprocess_empty_in_lists()\n• preprocess_index_hints()\n• preprocess_reindex_syntax()\n• preprocess_bare_table_in_clauses()"]
PARSE --> |parse_sql_with_recursion_limit| PARSER["sqlparser::Parser\nPARSER_RECURSION_LIMIT = 200"]
TRANSLATE --> |translate_statement| PLANNER["• SelectPlan\n• InsertPlan\n• UpdatePlan\n• DeletePlan\n• CreateTablePlan\n• DDL Plans"]
EXECUTE --> |engine.execute_statement| RUNTIME["RuntimeEngine\n• Transaction Management\n• MVCC Snapshots\n• Catalog Operations"]
style PREPROCESS fill:#f9f9f9
    style PARSE fill:#f9f9f9
    style TRANSLATE fill:#f9f9f9
    style EXECUTE fill:#f9f9f9
    style RESULTS fill:#f9f9f9

Stage 1: SQL Preprocessing

Before parsing, the SQL text undergoes a series of preprocessing transformations to normalize dialect-specific syntax. This allows LLKV to accept SQL written for SQLite, DuckDB, and other dialects while presenting a consistent AST to the planner.

Preprocessing Transformations

Preprocessor MethodPurposeExample Transformation
preprocess_tpch_connect_syntax()Strip TPC-H CONNECT TO directivesCONNECT TO tpch; → ``
preprocess_create_type_syntax()Convert CREATE TYPE to CREATE DOMAINCREATE TYPE name AS INTCREATE DOMAIN name AS INT
preprocess_exclude_syntax()Quote qualified names in EXCLUDE clausesEXCLUDE (t.col)EXCLUDE ("t.col")
preprocess_trailing_commas_in_values()Remove trailing commas in VALUESVALUES (1, 2,)VALUES (1, 2)
preprocess_empty_in_lists()Expand empty IN predicates to constant booleansx IN ()(x = NULL AND 0 = 1)
preprocess_index_hints()Strip SQLite INDEXED BY hintsFROM t INDEXED BY idxFROM t
preprocess_reindex_syntax()Convert REINDEX to VACUUM REINDEXREINDEX idxVACUUM REINDEX idx
preprocess_bare_table_in_clauses()Expand IN tablename to subqueryx IN tx IN (SELECT * FROM t)

Sources: llkv-sql/src/sql_engine.rs:623-873 llkv-sql/src/tpch.rs:1-17

Fallback Trigger Preprocessing

If parsing fails and the SQL contains CREATE TRIGGER, the engine applies an additional preprocess_sqlite_trigger_shorthand() transformation and retries. This handles SQLite's optional BEFORE/AFTER timing and FOR EACH ROW clauses by injecting defaults that sqlparser expects.

Sources: llkv-sql/src/sql_engine.rs:941-957 llkv-sql/src/sql_engine.rs:771-842

Stage 2: Parsing

Parsing is delegated to the sqlparser crate, which produces a Vec<Statement> AST. LLKV configures the parser with:

  • Dialect: GenericDialect to accept a wide range of SQL syntax
  • Recursion Limit: PARSER_RECURSION_LIMIT = 200 (raised from sqlparser's default of 50 to handle deeply nested queries in test suites)

The parse_sql_with_recursion_limit() helper function wraps sqlparser's API to apply this custom limit.

Sources: llkv-sql/src/sql_engine.rs:317-324 llkv-sql/src/sql_engine.rs:939-957

Stage 3: Plan Translation

Each parsed Statement is translated into a strongly-typed PlanStatement that the runtime can execute. This translation happens through statement-specific methods in SqlEngine.

graph TB
    AST["sqlparser::ast::Statement"]
SELECT["Statement::Query"]
INSERT["Statement::Insert"]
UPDATE["Statement::Update"]
DELETE["Statement::Delete"]
CREATE["Statement::CreateTable"]
DROP["Statement::Drop"]
TRANSACTION["Statement::StartTransaction\nStatement::Commit\nStatement::Rollback"]
ALTER["Statement::AlterTable"]
OTHER["Other DDL/DML"]
SELECT_PLAN["translate_query()\n→ SelectPlan"]
INSERT_PLAN["buffer_insert()\n→ InsertPlan or buffered"]
UPDATE_PLAN["translate_update()\n→ UpdatePlan"]
DELETE_PLAN["translate_delete()\n→ DeletePlan"]
CREATE_PLAN["translate_create_table()\n→ CreateTablePlan"]
DROP_PLAN["translate_drop()\n→ PlanStatement::Drop*"]
TXN_RUNTIME["Direct runtime delegation\nflush INSERT buffer first"]
ALTER_PLAN["translate_alter_table()\n→ PlanStatement::Alter*"]
OTHER_PLAN["translate_* methods\n→ PlanStatement::*"]
AST --> SELECT
 
   AST --> INSERT
 
   AST --> UPDATE
 
   AST --> DELETE
 
   AST --> CREATE
 
   AST --> DROP
 
   AST --> TRANSACTION
 
   AST --> ALTER
 
   AST --> OTHER
    
 
   SELECT --> SELECT_PLAN
 
   INSERT --> INSERT_PLAN
 
   UPDATE --> UPDATE_PLAN
 
   DELETE --> DELETE_PLAN
 
   CREATE --> CREATE_PLAN
 
   DROP --> DROP_PLAN
 
   TRANSACTION --> TXN_RUNTIME
 
   ALTER --> ALTER_PLAN
 
   OTHER --> OTHER_PLAN
    
 
   SELECT_PLAN --> RUNTIME["RuntimeEngine::execute_statement()"]
INSERT_PLAN --> BUFFER_CHECK{"Buffering\nenabled?"}
BUFFER_CHECK -->|Yes| BUFFER["InsertBuffer\naccumulates rows"]
BUFFER_CHECK -->|No| RUNTIME
 
   UPDATE_PLAN --> RUNTIME
 
   DELETE_PLAN --> RUNTIME
 
   CREATE_PLAN --> RUNTIME
 
   DROP_PLAN --> RUNTIME
 
   TXN_RUNTIME --> RUNTIME
 
   ALTER_PLAN --> RUNTIME
 
   OTHER_PLAN --> RUNTIME
    
 
   BUFFER --> FLUSH_CHECK{"Flush\nneeded?"}
FLUSH_CHECK -->|Yes| FLUSH["Flush buffered rows"]
FLUSH_CHECK -->|No| CONTINUE["Continue buffering"]
FLUSH --> RUNTIME

Statement Routing

Sources: llkv-sql/src/sql_engine.rs:960-998

Translation Process

The translation process involves:

  1. Column Resolution: Identifier strings are resolved to FieldId references using the runtime's catalog
  2. Expression Translation: SQL expressions are converted to Expr<String>, then resolved to Expr<FieldId>
  3. Subquery Handling: Correlated subqueries are tracked with placeholder generation
  4. Parameter Binding: SQL placeholders (?, $1, :name) are mapped to parameter indices

Sources: llkv-sql/src/sql_engine.rs:1000-5000 (various translate_* methods)

sequenceDiagram
    participant SqlEngine
    participant Catalog as "RuntimeContext\nCatalog"
    participant ExprTranslator
    participant PlanBuilder
    
    SqlEngine->>Catalog: resolve_table("users")
    Catalog-->>SqlEngine: TableId(namespace=0, table=1)
    
    SqlEngine->>Catalog: resolve_column("id", TableId)
    Catalog-->>SqlEngine: ColumnResolution(FieldId)
    
    SqlEngine->>ExprTranslator: translate_expr(sqlparser::Expr)
    ExprTranslator->>ExprTranslator: Build Expr<String>
    ExprTranslator->>Catalog: resolve_identifiers()
    Catalog-->>ExprTranslator: Expr<FieldId>
    ExprTranslator-->>SqlEngine: Expr<FieldId>
    
    SqlEngine->>PlanBuilder: Create SelectPlan
    Note over PlanBuilder: Attach projections,\nfilters, sorts, limits
    PlanBuilder-->>SqlEngine: PlanStatement::Select(SelectPlan)

Stage 4: Plan Execution

Once a PlanStatement is constructed, it is passed to RuntimeEngine::execute_statement() for execution. The runtime coordinates:

  • Transaction Management: Ensures each statement executes within a transaction snapshot
  • MVCC Enforcement: Filters rows based on visibility rules
  • Catalog Operations: Updates system catalog for DDL statements
  • Executor Invocation: Delegates SelectPlan execution to llkv-executor

Execution Routing by Statement Type

Sources: llkv-sql/src/sql_engine.rs:587-609 llkv-runtime/ (RuntimeEngine implementation)

Stage 5: Result Formatting

The runtime returns a RuntimeStatementResult enum that represents the outcome of statement execution. SqlEngine surfaces this directly to callers via the execute() method, or converts it to Vec<RecordBatch> for the sql() convenience method.

Result Types

Statement TypeResult VariantContents
SELECTRuntimeStatementResult::SelectVec<RecordBatch> of query results
INSERTRuntimeStatementResult::Insertrows_inserted: u64
UPDATERuntimeStatementResult::Updaterows_updated: u64
DELETERuntimeStatementResult::Deleterows_deleted: u64
CREATE TABLERuntimeStatementResult::CreateTabletable_name: String
CREATE INDEXRuntimeStatementResult::CreateIndexindex_name: String
DROP TABLERuntimeStatementResult::DropTabletable_name: String
Transaction controlRuntimeStatementResult::TransactionTransaction state

Sources: llkv-runtime/ (RuntimeStatementResult definition), llkv-sql/src/sql_engine.rs:933-998

Prepared Statements and Parameters

LLKV supports parameterized queries through a prepared statement mechanism that handles three parameter syntaxes:

  • Positional (numbered): ?, ?1, ?2, $1, $2
  • Named: :name, :id
  • Auto-incremented: Sequential ? placeholders

Parameter Processing Pipeline

Sources: llkv-sql/src/sql_engine.rs:71-206 llkv-sql/src/sql_engine.rs:278-297

Parameter Substitution

During plan execution with parameters, the engine performs a second pass to replace sentinel strings (__llkv_param__N__) with the actual Literal values provided by the caller. This two-phase approach allows the same PlanStatement to be reused across multiple executions with different parameter values.

Sources: llkv-sql/src/sql_engine.rs:194-206

INSERT Buffering

SqlEngine includes an optional INSERT buffering optimization that batches consecutive INSERT ... VALUES statements targeting the same table. This is disabled by default but can be enabled with set_insert_buffering(true) for bulk ingestion workloads.

stateDiagram-v2
    [*] --> NoBuffer : buffering disabled
    NoBuffer --> NoBuffer : INSERT → immediate execute
    
    [*] --> BufferEmpty : buffering enabled
    BufferEmpty --> BufferActive : INSERT(table, cols, rows)
    
    BufferActive --> BufferActive : INSERT(same table/cols) accumulate rows
    BufferActive --> Flush : Different table
    BufferActive --> Flush : Different columns
    BufferActive --> Flush : Different conflict action
    BufferActive --> Flush : Row count ≥ MAX_BUFFERED_INSERT_ROWS
    BufferActive --> Flush : Non-INSERT statement
    BufferActive --> Flush : Transaction boundary
    
    Flush --> RuntimeExecution : Create InsertPlan from accumulated rows
    RuntimeExecution --> BufferEmpty : Reset buffer
    
    NoBuffer --> [*]
    BufferEmpty --> [*]

Buffering Logic

Sources: llkv-sql/src/sql_engine.rs:410-495 llkv-sql/src/sql_engine.rs:887-905

Buffer Flush Triggers

Trigger ConditionRationale
Row count ≥ MAX_BUFFERED_INSERT_ROWS (8192)Limit memory usage
Target table changesCannot batch cross-table INSERTs
Column list changesSchema mismatch
Conflict action changesON CONFLICT semantics differ
Non-INSERT statement encounteredPreserve statement ordering
Transaction boundary (BEGIN, COMMIT, ROLLBACK)Ensure transactional consistency
Explicit flush_pending_inserts() callManual control
Statement expectation hint (testing)Test harness needs per-statement results

Sources: llkv-sql/src/sql_engine.rs:410-495

Error Handling and Table Mapping

The pipeline includes special error handling for table-not-found scenarios. When the runtime returns Error::NotFound or catalog-related errors, SqlEngine::execute_plan_statement() rewrites them to user-friendly messages like "Catalog Error: Table 'users' does not exist".

This mapping is skipped for CREATE VIEW and DROP VIEW statements where the "table" name refers to the view being created/dropped rather than a referenced table.

Sources: llkv-sql/src/sql_engine.rs:558-609