This documentation is part of the "Projects with Books" initiative at zenOSmosis.
The source code for this project is available on GitHub.
SQL Interface
Loading…
SQL Interface
Relevant source files
- llkv-executor/src/lib.rs
- llkv-sql/src/lib.rs
- llkv-sql/src/sql_engine.rs
- llkv-tpch/Cargo.toml
- llkv-tpch/DRAFT.md
- llkv-tpch/src/lib.rs
- llkv-tpch/src/main.rs
- llkv-tpch/src/queries.rs
The SQL Interface provides the primary user-facing API for interacting with LLKV databases through SQL statements. This layer is responsible for parsing SQL text, preprocessing dialect-specific syntax, translating Abstract Syntax Trees (AST) into execution plans, and delegating to the runtime engine for actual execution.
For information about query planning and execution, see Query Planning and Query Execution. For details on the underlying runtime that executes plans, see Architecture.
Purpose and Scope
The SQL Interface layer (llkv-sql crate) serves as the entry point for SQL-based database operations. It bridges the gap between SQL text written by users and the Arrow-native columnar storage engine, handling:
- SQL Parsing : Converting SQL strings into Abstract Syntax Trees using
sqlparser-rs - Dialect Normalization : Preprocessing SQL to handle syntax variations from SQLite, DuckDB, and PostgreSQL
- Statement Translation : Converting ASTs into typed execution plans (
PlanStatementstructures) - Execution Coordination : Delegating plans to the
RuntimeEngineand formatting results - Transaction Management : Coordinating multi-statement transactions with MVCC support
- Performance Optimization : Batching INSERT statements to reduce planning overhead
The SQL Interface does not execute queries directly—it delegates all data operations to the runtime and executor layers. Its primary responsibility is accurate SQL-to-plan translation while preserving user intent across different SQL dialects.
Sources : llkv-sql/src/lib.rs:1-52 llkv-sql/src/sql_engine.rs:1-60
Architecture Overview
The SQL Interface operates as a stateful wrapper around the RuntimeEngine, maintaining session state, prepared statement caches, and insert buffering state. The core workflow proceeds through several stages:
SQL Interface Processing Pipeline
graph TB
User["User Application"]
SqlEngine["SqlEngine"]
Preprocess["SQL Preprocessing"]
Parser["sqlparser-rs\nGenericDialect"]
Translator["Statement Translation"]
Runtime["RuntimeEngine"]
Results["RecordBatch / RowCount"]
User -->|execute sql| SqlEngine
SqlEngine -->|1. Preprocess| Preprocess
Preprocess -->|Normalized SQL| Parser
Parser -->|AST Statement| Translator
Translator -->|PlanStatement| Runtime
Runtime -->|RuntimeStatementResult| SqlEngine
SqlEngine -->|Vec<RecordBatch>| Results
Results -->|Query Results| User
subgraph "llkv-sql Crate"
SqlEngine
Preprocess
Translator
end
subgraph "External Dependencies"
Parser
end
subgraph "llkv-runtime Crate"
Runtime
end
The SqlEngine maintains several internal subsystems:
- Statement Cache : Thread-safe prepared statement storage (
RwLock<FxHashMap<String, Arc<PreparedPlan>>>) - Insert Buffer : Cross-statement INSERT batching for bulk ingest (
RefCell<Option<InsertBuffer>>) - Session State : Transaction context and configuration flags
- Information Schema Cache : Lazy metadata refresh tracking
Sources : llkv-sql/src/sql_engine.rs:572-621 [Diagram 1 from overview](https://github.com/jzombie/rust-llkv/blob/89777726/Diagram 1 from overview)
SqlEngine Structure
The SqlEngine struct encapsulates all SQL processing state and exposes methods for executing SQL statements:
SqlEngine Class Structure
classDiagram
class SqlEngine {-engine: RuntimeEngine\n-default_nulls_first: AtomicBool\n-insert_buffer: RefCell~Option~InsertBuffer~~\n-insert_buffering_enabled: AtomicBool\n-information_schema_ready: AtomicBool\n-statement_cache: RwLock~FxHashMap~String PreparedPlan~~\n+new(pager) SqlEngine\n+with_context(context, nulls_first) SqlEngine\n+execute(sql) Result~Vec~RuntimeStatementResult~~\n+sql(sql) Result~Vec~RecordBatch~~\n+prepare(sql) Result~PreparedStatement~\n+execute_prepared(stmt, params) Result~Vec~RuntimeStatementResult~~\n+session() RuntimeSession\n+runtime_context() Arc~RuntimeContext~}
class RuntimeEngine {+execute_statement(plan) Result~RuntimeStatementResult~\n+context() Arc~RuntimeContext~}
class PreparedStatement {-inner: Arc~PreparedPlan~\n+parameter_count() usize}
class InsertBuffer {-table_name: String\n-columns: Vec~String~\n-rows: Vec~Vec~PlanValue~~\n-on_conflict: InsertConflictAction\n+can_accept(table, cols, conflict) bool\n+should_flush() bool}
SqlEngine --> RuntimeEngine : delegates to
SqlEngine --> PreparedStatement : creates
SqlEngine --> InsertBuffer : maintains
Sources : llkv-sql/src/sql_engine.rs:572-621 llkv-sql/src/sql_engine.rs:354-373 llkv-sql/src/sql_engine.rs:487-547
Statement Execution Flow
A typical SQL execution proceeds through the following phases:
Statement Execution Sequence
Sources : llkv-sql/src/sql_engine.rs:1533-1633 llkv-sql/src/sql_engine.rs:2134-2285
SQL Preprocessing System
Before parsing, the SQL Interface applies multiple preprocessing passes to normalize syntax variations across different SQL dialects:
| Preprocessor | Purpose | Example Transformation |
|---|---|---|
| TPCH Connect Syntax | Strip TPC-H CONNECT TO directives | CONNECT TO db; → removed |
| CREATE TYPE Syntax | Convert DuckDB type aliases | CREATE TYPE t AS INT → CREATE DOMAIN t AS INT |
| EXCLUDE Syntax | Quote qualified names in EXCLUDE | EXCLUDE (schema.col) → EXCLUDE ("schema.col") |
| Trailing Commas | Remove DuckDB-style trailing commas | VALUES (1,) → VALUES (1) |
| Empty IN Lists | Convert degenerate IN expressions | col IN () → (col = NULL AND 0 = 1) |
| Index Hints | Strip SQLite index hints | FROM t INDEXED BY idx → FROM t |
| REINDEX Syntax | Convert SQLite REINDEX | REINDEX idx → VACUUM REINDEX idx |
| Trigger Shorthand | Expand SQLite trigger defaults | CREATE TRIGGER tr ... → adds AFTER, FOR EACH ROW |
| Bare Table IN | Expand SQLite table subqueries | col IN tablename → col IN (SELECT * FROM tablename) |
Each preprocessor runs as a regex-based rewrite pass before the SQL text reaches sqlparser-rs. This allows LLKV to accept SQL from multiple dialects while using a single parser.
Sources : llkv-sql/src/sql_engine.rs:759-1006 llkv-sql/src/sql_engine.rs:771-793
Prepared Statements and Parameters
The SQL Interface supports prepared statements with parameterized queries. Parameters use placeholder syntax compatible with multiple dialects:
Parameter Processing Flow
graph LR
subgraph "Parameter Placeholder Formats"
Q1["? \n(anonymous)"]
Q2["?N \n(numbered)"]
Dollar["$N \n(PostgreSQL)"]
Named[":name \n(named)"]
end
subgraph "Parameter Processing"
Register["register_placeholder(raw)"]
State["ParameterState"]
Index["assigned: HashMap<String, usize>"]
end
subgraph "Execution"
Prepare["SqlEngine::prepare(sql)"]
PrepStmt["PreparedStatement"]
Execute["execute_prepared(stmt, params)"]
Bind["Bind params to plan"]
end
Q1 --> Register
Q2 --> Register
Dollar --> Register
Named --> Register
Register --> State
State --> Index
Index --> Prepare
Prepare --> PrepStmt
PrepStmt --> Execute
Execute --> Bind
Parameter placeholders are registered during parsing and replaced with internal sentinel values (__llkv_param__N__). When executing a prepared statement, the sentinel values are substituted with actual parameter values before plan execution.
Sources : llkv-sql/src/sql_engine.rs:78-282 llkv-sql/src/sql_engine.rs:354-373 llkv-sql/src/sql_engine.rs:1707-1773
INSERT Buffering Optimization
To improve bulk insert performance, the SQL Interface can buffer multiple consecutive INSERT ... VALUES statements and execute them as a single batched operation. This dramatically reduces planning overhead for large data loads:
INSERT Buffer State Machine
stateDiagram-v2
[*] --> Empty : engine created
Empty --> Buffering : INSERT with VALUES
Buffering --> Buffering : compatible INSERT
Buffering --> Flushing : incompatible statement
Buffering --> Flushing : buffer threshold reached
Flushing --> Empty : execute batched insert
Flushing --> Buffering : new INSERT after flush
Empty --> [*]
note right of Buffering
Accumulate rows while:
- Same table
- Same columns
- Same conflict action
- Below MAX_BUFFERED_INSERT_ROWS
end note
note right of Flushing
Execute single INSERT with
all accumulated rows
end note
Buffering is disabled by default to preserve per-statement semantics for unit tests. Long-running workloads can enable buffering via set_insert_buffering(true) to achieve significant performance gains on bulk ingests.
Sources : llkv-sql/src/sql_engine.rs:487-547 llkv-sql/src/sql_engine.rs:2134-2285
graph TD
AST["sqlparser::ast::Statement"]
subgraph "DDL Statements"
CreateTable["CreateTable"]
AlterTable["AlterTable"]
DropTable["DropTable"]
CreateView["CreateView"]
CreateIndex["CreateIndex"]
end
subgraph "DML Statements"
Query["Query (SELECT)"]
Insert["Insert"]
Update["Update"]
Delete["Delete"]
end
subgraph "Transaction Control"
Begin["BEGIN"]
Commit["COMMIT"]
Rollback["ROLLBACK"]
end
subgraph "PlanStatement Variants"
SelectPlan["SelectPlan"]
InsertPlan["InsertPlan"]
UpdatePlan["UpdatePlan"]
DeletePlan["DeletePlan"]
CreateTablePlan["CreateTablePlan"]
AlterTablePlan["AlterTablePlan"]
BeginTxn["BeginTransaction"]
CommitTxn["CommitTransaction"]
end
AST --> CreateTable
AST --> AlterTable
AST --> DropTable
AST --> Query
AST --> Insert
AST --> Update
AST --> Delete
AST --> Begin
AST --> Commit
CreateTable --> CreateTablePlan
AlterTable --> AlterTablePlan
Query --> SelectPlan
Insert --> InsertPlan
Update --> UpdatePlan
Delete --> DeletePlan
Begin --> BeginTxn
Commit --> CommitTxn
Statement Translation Process
Once SQL is parsed into an AST, the SQL Interface translates each Statement variant into a corresponding PlanStatement:
AST to Plan Translation
Translation involves:
- Identifier Resolution : Converting string column names to
FieldIdvalues viaIdentifierResolver - Expression Translation : Building typed expression trees from AST nodes
- Type Inference : Determining result types for computed columns
- Validation : Checking for unknown tables, duplicate columns, type mismatches
Sources : llkv-sql/src/sql_engine.rs:2287-3500 llkv-executor/src/lib.rs:89-93
Integration with Runtime and Executor
The SQL Interface delegates all actual execution to lower layers:
Layer Integration Diagram
The SQL Interface owns the RuntimeEngine instance and delegates PlanStatement execution to it. The runtime coordinates with the executor for query execution and the table layer for DDL operations.
Sources : llkv-sql/src/sql_engine.rs:706-745 [llkv-runtime/src/lib.rs (implied)](https://github.com/jzombie/rust-llkv/blob/89777726/llkv-runtime/src/lib.rs (implied)) [Diagram 1 from overview](https://github.com/jzombie/rust-llkv/blob/89777726/Diagram 1 from overview)
Example Usage
Basic usage of the SqlEngine:
Sources : llkv-sql/src/sql_engine.rs:443-485 llkv-sql/src/lib.rs:1-52
Key Design Principles
The SQL Interface embodies several architectural decisions:
- Dialect Tolerance : Accept SQL from multiple databases through preprocessing rather than forking the parser
- Lazy Validation : Validate table existence and column types during plan translation, not parsing
- Stateful Optimization : Maintain cross-statement state (insert buffer, prepared statement cache) for performance
- Delegate Execution : Never directly manipulate storage; always route through runtime abstractions
- Arrow Native : Return query results as Arrow
RecordBatchstructures for zero-copy integration - Session Isolation : Each
SqlEngineinstance owns its transaction state and constraint enforcement mode
These principles allow the SQL Interface to serve as a stable, predictable API surface while the underlying execution engine evolves independently.
Sources : llkv-sql/src/sql_engine.rs:1-60 llkv-sql/src/lib.rs:1-52 [Diagram 2 from overview](https://github.com/jzombie/rust-llkv/blob/89777726/Diagram 2 from overview)
Dismiss
Refresh this wiki
Enter email to refresh