This documentation is part of the "Projects with Books" initiative at zenOSmosis.
The source code for this project is available on GitHub.
SQL Interface
Relevant source files
- README.md
- demos/llkv-sql-pong-demo/src/main.rs
- llkv-aggregate/src/lib.rs
- llkv-sql/src/lib.rs
- llkv-sql/src/sql_engine.rs
- llkv-sql/src/sql_value.rs
- llkv-sql/src/tpch.rs
- llkv-tpch/.gitignore
- llkv-tpch/Cargo.toml
- llkv-tpch/DRAFT-PRE-FINAL.md
The SQL Interface layer provides the primary user-facing entry point for executing SQL statements against LLKV. It consists of the llkv-sql crate, which wraps the underlying runtime and provides SQL parsing, preprocessing, statement caching, and result formatting.
This document covers the SqlEngine struct and its methods, SQL preprocessing and dialect normalization, and the INSERT buffering optimization system. For information about query planning after SQL parsing, see Query Planning. For runtime execution, see the Architecture section at #2.
Sources: llkv-sql/src/lib.rs:1-51 README.md:47-48
Core Components
The SQL Interface layer is centered around three main subsystems:
| Component | Purpose | Key Types |
|---|---|---|
SqlEngine | Main execution interface | SqlEngine, RuntimeEngine, RuntimeSession |
| Preprocessing | SQL normalization and dialect handling | Various regex-based transformers |
| INSERT Buffering | Batch optimization for literal inserts | InsertBuffer, PreparedInsert |
Sources: llkv-sql/src/sql_engine.rs:365-556
SqlEngine Structure
The SqlEngine wraps a RuntimeEngine instance and adds SQL-specific functionality including statement caching, INSERT buffering, and configurable behavior flags. The insert_buffer field holds accumulated literal INSERT payloads when buffering is enabled.
Sources: llkv-sql/src/sql_engine.rs:496-509 llkv-sql/src/sql_engine.rs:421-471
SQL Statement Processing Flow
The statement processing flow consists of:
- Preprocessing: SQL text undergoes dialect normalization via regex-based transformations
- Parsing:
sqlparserwith increased recursion limit (200 vs default 50) produces AST - Planning: AST nodes are translated to typed
PlanStatementstructures - Buffering (INSERT only): Literal INSERT statements may be accumulated in
InsertBuffer - Execution: Plans are passed to
RuntimeEnginefor execution - Result collection:
RuntimeStatementResultinstances are collected and returned
Sources: llkv-sql/src/sql_engine.rs:933-991 llkv-sql/src/sql_engine.rs:318-324
Public API Methods
Core Execution Methods
The SqlEngine exposes two primary execution methods:
| Method | Signature | Purpose | Returns |
|---|---|---|---|
execute | fn execute(&self, sql: &str) | Execute one or more SQL statements | SqlResult<Vec<RuntimeStatementResult>> |
sql | fn sql(&self, query: &str) | Execute a single SELECT and return batches | SqlResult<Vec<RecordBatch>> |
The execute method handles arbitrary SQL (DDL, DML, queries) and returns statement results. The sql method is a convenience wrapper that enforces single-SELECT semantics and extracts Arrow batches from the result stream.
Sources: llkv-sql/src/sql_engine.rs:921-991 llkv-sql/src/sql_engine.rs:1009-1052
Prepared Statements
Prepared statements support three placeholder syntaxes:
- Positional:
?(auto-numbered),?1,$1(explicit index) - Named:
:param_name
Placeholders are tracked via thread-local ParameterState during parsing, converted to sentinel strings like __llkv_param__1__, and stored in a PreparedPlan with parameter count metadata. The statement_cache field provides a statement-level cache keyed by SQL text.
Sources: llkv-sql/src/sql_engine.rs:77-206 llkv-sql/src/sql_engine.rs:278-297
Configuration Methods
| Method | Purpose |
|---|---|
new<Pg>(pager: Arc<Pg>) | Construct engine with given pager (buffering disabled) |
with_context(context, default_nulls_first) | Construct from existing RuntimeContext |
set_insert_buffering(enabled: bool) | Toggle INSERT batching mode |
The set_insert_buffering method controls cross-statement INSERT accumulation. When disabled (default), each INSERT executes immediately. When enabled, compatible INSERTs targeting the same table are batched together up to MAX_BUFFERED_INSERT_ROWS (8192 rows).
Sources: llkv-sql/src/sql_engine.rs:615-621 llkv-sql/src/sql_engine.rs:879-905 llkv-sql/src/sql_engine.rs:410-414
SQL Preprocessing System
The preprocessing layer normalizes SQL dialects before parsing to handle incompatibilities between SQLite, DuckDB, and sqlparser expectations.
graph TB
RAW["Raw SQL String"]
TPCH["preprocess_tpch_connect_syntax\n(strip CONNECT TO statements)"]
TYPE["preprocess_create_type_syntax\n(CREATE TYPE → CREATE DOMAIN)"]
EXCLUDE["preprocess_exclude_syntax\n(quote qualified names in EXCLUDE)"]
COMMA["preprocess_trailing_commas_in_values\n(remove trailing commas)"]
EMPTY["preprocess_empty_in_lists\n(expr IN () → constant)"]
INDEX["preprocess_index_hints\n(strip INDEXED BY / NOT INDEXED)"]
REINDEX["preprocess_reindex_syntax\n(REINDEX → VACUUM REINDEX)"]
BARE["preprocess_bare_table_in_clauses\n(IN table → IN (SELECT * FROM))"]
TRIGGER["preprocess_sqlite_trigger_shorthand\n(add AFTER / FOR EACH ROW)"]
PARSER["sqlparser::Parser"]
RAW --> TPCH
TPCH --> TYPE
TYPE --> EXCLUDE
EXCLUDE --> COMMA
COMMA --> EMPTY
EMPTY --> INDEX
INDEX --> REINDEX
REINDEX --> BARE
BARE --> PARSER
PARSER -.parse error.-> TRIGGER
TRIGGER --> PARSER
Each preprocessing function is implemented as a regex-based transformer:
| Function | Pattern | Purpose | Lines |
|---|---|---|---|
preprocess_tpch_connect_syntax | CONNECT TO database; | Strip TPC-H multi-database directives | 6:28-630 |
preprocess_create_type_syntax | CREATE TYPE → CREATE DOMAIN | Translate DuckDB type alias syntax | 6:39-657 |
preprocess_exclude_syntax | EXCLUDE(a.b.c) → EXCLUDE("a.b.c") | Quote qualified names in EXCLUDE | 6:59-676 |
preprocess_trailing_commas_in_values | VALUES (v,) → VALUES (v) | Remove DuckDB-style trailing commas | 6:78-689 |
preprocess_empty_in_lists | expr IN () → (expr = NULL AND 0 = 1) | Convert empty IN to constant false | 6:91-720 |
preprocess_index_hints | INDEXED BY idx / NOT INDEXED | Strip SQLite index hints | 7:22-739 |
preprocess_reindex_syntax | REINDEX idx → VACUUM REINDEX idx | Convert to sqlparser-compatible form | 7:41-757 |
preprocess_bare_table_in_clauses | IN table → IN (SELECT * FROM table) | Expand SQLite shorthand | 8:44-873 |
preprocess_sqlite_trigger_shorthand | Missing AFTER / FOR EACH ROW | Add required trigger components | 7:71-842 |
The trigger preprocessor is only invoked on parse errors containing CREATE TRIGGER, as it requires more complex regex patterns to inject missing timing and row-level clauses.
Sources: llkv-sql/src/sql_engine.rs:623-873
Regex Pattern Details
Static OnceLock<Regex> instances cache compiled patterns across invocations:
For example, the empty IN list handler uses:
(?i)(\([^)]*\)|x'[0-9a-fA-F]*'|'(?:[^']|'')*'|[a-zA-Z_][a-zA-Z0-9_]*(?:\.[a-zA-Z_][a-zA-Z0-9_]*)*|\d+(?:\.\d+)?)\s+(NOT\s+)?IN\s*\(\s*\)
This matches expressions (parenthesized, hex literals, strings, identifiers, numbers) followed by [NOT] IN () and replaces with boolean expressions that preserve evaluation side effects while producing constant results.
Sources: llkv-sql/src/sql_engine.rs:691-720
Parameter Placeholder System
The parameter system uses thread-local state to track placeholders during statement preparation:
- Scope Creation:
ParameterScope::new()initializes thread-localParameterState - Registration: Each placeholder calls
register_placeholder(raw)which:- For
?: auto-increments index - For
?Nor$N: uses explicit numeric index - For
:name: assigns next available index and stores mapping
- For
- Sentinel Generation:
placeholder_marker(index)creates__llkv_param__N__string - Parsing: Sentinel strings are parsed as string literals in the SQL AST
- Binding:
execute_preparedreplaces sentinels withSqlParamValueinstances
The ParameterState struct tracks:
assigned: FxHashMap<String, usize>- named parameter to index mappingnext_auto: usize- next index for?placeholdersmax_index: usize- highest parameter index seen
Sources: llkv-sql/src/sql_engine.rs:77-206 llkv-sql/src/sql_engine.rs:1120-1235
graph TB
subgraph "INSERT Processing Decision"
INSERT["Statement::Insert"]
CLASSIFY["classify_insert"]
VALUES["PreparedInsert::Values"]
IMMEDIATE["PreparedInsert::Immediate"]
end
subgraph "Buffering Logic"
ENABLED{"Buffering\nEnabled?"}
COMPAT{"Can Buffer\nAccept?"}
THRESHOLD{">= MAX_BUFFERED_INSERT_ROWS\n(8192)?"}
BUFFER["InsertBuffer::push_statement"]
FLUSH["flush_buffered_insert"]
EXECUTE["execute_plan_statement"]
end
INSERT --> CLASSIFY
CLASSIFY --> VALUES
CLASSIFY --> IMMEDIATE
VALUES --> ENABLED
ENABLED -->|No| EXECUTE
ENABLED -->|Yes| COMPAT
COMPAT -->|No| FLUSH
COMPAT -->|Yes| BUFFER
FLUSH --> BUFFER
BUFFER --> THRESHOLD
THRESHOLD -->|Yes| FLUSH
THRESHOLD -->|No| RETURN["Return placeholder result"]
IMMEDIATE --> EXECUTE
INSERT Buffering System
The INSERT buffering system batches compatible literal INSERT statements to reduce planning overhead for bulk ingest workloads.
Buffer Structure
The InsertBuffer struct accumulates rows across multiple INSERT statements:
Key fields:
table_name,columns,on_conflict: compatibility key for bufferingrows: accumulated literal values from all buffered statementsstatement_row_counts: per-statement row counts to emit individual resultstotal_rows: sum ofstatement_row_countsfor threshold checking
Sources: llkv-sql/src/sql_engine.rs:421-471
Buffering Conditions
An INSERT can be buffered if:
- The
InsertSourceisValues(literal rows) or a constantSELECT - Buffering is enabled via
insert_buffering_enabledflag - Either no buffer exists or
InsertBuffer::can_acceptreturns true:table_namematches exactlycolumnsmatch exactly (same names, same order)on_conflictaction matches
When the buffer reaches MAX_BUFFERED_INSERT_ROWS (8192), it is flushed automatically. Flush also occurs on:
- Transaction boundaries (BEGIN, COMMIT, ROLLBACK)
- Incompatible INSERT statement
- Engine drop
- Explicit
set_insert_buffering(false)call
Sources: llkv-sql/src/sql_engine.rs:452-470 llkv-sql/src/sql_engine.rs:2028-2146 llkv-sql/src/sql_engine.rs:410-414
Buffer Flush Process
The flush process:
- Extracts
InsertBufferfromRefCell<Option<InsertBuffer>> - Constructs single
InsertPlanwith all accumulated rows - Executes via
execute_statement - Receives single
RuntimeStatementResult::Insertwith total rows inserted - Splits result into per-statement results using
statement_row_countsvector - Returns vector of results matching original statement order
This allows bulk execution while preserving per-statement result semantics.
Sources: llkv-sql/src/sql_engine.rs:2028-2146
Value Handling
The SqlValue enum represents literal values during SQL processing:
The SqlValue::try_from_expr function handles:
- Unary operators (negation for numeric types, intervals)
- CAST expressions (particularly to DATE)
- Nested expressions
- Dictionary/struct literals
- Binary operations (addition, subtraction, bitshift for constant folding)
- Typed strings (
DATE '2024-01-01')
Interval arithmetic is performed at constant-folding time:
Date32 + Interval→Date32Interval + Date32→Date32Date32 - Interval→Date32Date32 - Date32→IntervalInterval +/- Interval→Interval
Sources: llkv-sql/src/sql_value.rs:16-320
Error Handling
The SQL layer maps table-related errors to catalog-specific error messages:
| Error Type | Mapping | Method |
|---|---|---|
Error::NotFound | Catalog Error: Table 'X' does not exist | table_not_found_error |
Error::InvalidArgumentError (contains "unknown table") | Same as above | map_table_error |
| Transaction conflicts | another transaction has dropped this table | String constant |
The execute_plan_statement method applies error mapping except for CREATE VIEW and DROP VIEW statements, where the "table" name refers to the view being created/dropped rather than a referenced table.
Sources: llkv-sql/src/sql_engine.rs:558-609 llkv-sql/src/sql_engine.rs511
Thread Safety and Cloning
The SqlEngine::clone implementation creates a new session:
This ensures each cloned engine has an independent:
RuntimeSession(transaction state, temporary namespace)insert_buffer(no shared buffering across sessions)statement_cache(independent prepared statement cache)
The warning message indicates this is typically not intended usage, as most applications should use a single shared SqlEngine instance across threads (enabled by interior mutability via RefCell and atomic types).
Sources: llkv-sql/src/sql_engine.rs:522-540