Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

GitHub

This documentation is part of the "Projects with Books" initiative at zenOSmosis.

The source code for this project is available on GitHub.

SQL Interface

Relevant source files

The SQL Interface layer provides the primary user-facing entry point for executing SQL statements against LLKV. It consists of the llkv-sql crate, which wraps the underlying runtime and provides SQL parsing, preprocessing, statement caching, and result formatting.

This document covers the SqlEngine struct and its methods, SQL preprocessing and dialect normalization, and the INSERT buffering optimization system. For information about query planning after SQL parsing, see Query Planning. For runtime execution, see the Architecture section at #2.

Sources: llkv-sql/src/lib.rs:1-51 README.md:47-48

Core Components

The SQL Interface layer is centered around three main subsystems:

ComponentPurposeKey Types
SqlEngineMain execution interfaceSqlEngine, RuntimeEngine, RuntimeSession
PreprocessingSQL normalization and dialect handlingVarious regex-based transformers
INSERT BufferingBatch optimization for literal insertsInsertBuffer, PreparedInsert

Sources: llkv-sql/src/sql_engine.rs:365-556

SqlEngine Structure

The SqlEngine wraps a RuntimeEngine instance and adds SQL-specific functionality including statement caching, INSERT buffering, and configurable behavior flags. The insert_buffer field holds accumulated literal INSERT payloads when buffering is enabled.

Sources: llkv-sql/src/sql_engine.rs:496-509 llkv-sql/src/sql_engine.rs:421-471

SQL Statement Processing Flow

The statement processing flow consists of:

  1. Preprocessing: SQL text undergoes dialect normalization via regex-based transformations
  2. Parsing: sqlparser with increased recursion limit (200 vs default 50) produces AST
  3. Planning: AST nodes are translated to typed PlanStatement structures
  4. Buffering (INSERT only): Literal INSERT statements may be accumulated in InsertBuffer
  5. Execution: Plans are passed to RuntimeEngine for execution
  6. Result collection: RuntimeStatementResult instances are collected and returned

Sources: llkv-sql/src/sql_engine.rs:933-991 llkv-sql/src/sql_engine.rs:318-324

Public API Methods

Core Execution Methods

The SqlEngine exposes two primary execution methods:

MethodSignaturePurposeReturns
executefn execute(&self, sql: &str)Execute one or more SQL statementsSqlResult<Vec<RuntimeStatementResult>>
sqlfn sql(&self, query: &str)Execute a single SELECT and return batchesSqlResult<Vec<RecordBatch>>

The execute method handles arbitrary SQL (DDL, DML, queries) and returns statement results. The sql method is a convenience wrapper that enforces single-SELECT semantics and extracts Arrow batches from the result stream.

Sources: llkv-sql/src/sql_engine.rs:921-991 llkv-sql/src/sql_engine.rs:1009-1052

Prepared Statements

Prepared statements support three placeholder syntaxes:

  • Positional: ? (auto-numbered), ?1, $1 (explicit index)
  • Named: :param_name

Placeholders are tracked via thread-local ParameterState during parsing, converted to sentinel strings like __llkv_param__1__, and stored in a PreparedPlan with parameter count metadata. The statement_cache field provides a statement-level cache keyed by SQL text.

Sources: llkv-sql/src/sql_engine.rs:77-206 llkv-sql/src/sql_engine.rs:278-297

Configuration Methods

MethodPurpose
new<Pg>(pager: Arc<Pg>)Construct engine with given pager (buffering disabled)
with_context(context, default_nulls_first)Construct from existing RuntimeContext
set_insert_buffering(enabled: bool)Toggle INSERT batching mode

The set_insert_buffering method controls cross-statement INSERT accumulation. When disabled (default), each INSERT executes immediately. When enabled, compatible INSERTs targeting the same table are batched together up to MAX_BUFFERED_INSERT_ROWS (8192 rows).

Sources: llkv-sql/src/sql_engine.rs:615-621 llkv-sql/src/sql_engine.rs:879-905 llkv-sql/src/sql_engine.rs:410-414

SQL Preprocessing System

The preprocessing layer normalizes SQL dialects before parsing to handle incompatibilities between SQLite, DuckDB, and sqlparser expectations.

graph TB
    RAW["Raw SQL String"]
TPCH["preprocess_tpch_connect_syntax\n(strip CONNECT TO statements)"]
TYPE["preprocess_create_type_syntax\n(CREATE TYPE → CREATE DOMAIN)"]
EXCLUDE["preprocess_exclude_syntax\n(quote qualified names in EXCLUDE)"]
COMMA["preprocess_trailing_commas_in_values\n(remove trailing commas)"]
EMPTY["preprocess_empty_in_lists\n(expr IN () → constant)"]
INDEX["preprocess_index_hints\n(strip INDEXED BY / NOT INDEXED)"]
REINDEX["preprocess_reindex_syntax\n(REINDEX → VACUUM REINDEX)"]
BARE["preprocess_bare_table_in_clauses\n(IN table → IN (SELECT * FROM))"]
TRIGGER["preprocess_sqlite_trigger_shorthand\n(add AFTER / FOR EACH ROW)"]
PARSER["sqlparser::Parser"]
RAW --> TPCH
 
   TPCH --> TYPE
 
   TYPE --> EXCLUDE
 
   EXCLUDE --> COMMA
 
   COMMA --> EMPTY
 
   EMPTY --> INDEX
 
   INDEX --> REINDEX
 
   REINDEX --> BARE
 
   BARE --> PARSER
    
    PARSER -.parse error.-> TRIGGER
 
   TRIGGER --> PARSER

Each preprocessing function is implemented as a regex-based transformer:

FunctionPatternPurposeLines
preprocess_tpch_connect_syntaxCONNECT TO database;Strip TPC-H multi-database directives6:28-630
preprocess_create_type_syntaxCREATE TYPECREATE DOMAINTranslate DuckDB type alias syntax6:39-657
preprocess_exclude_syntaxEXCLUDE(a.b.c)EXCLUDE("a.b.c")Quote qualified names in EXCLUDE6:59-676
preprocess_trailing_commas_in_valuesVALUES (v,)VALUES (v)Remove DuckDB-style trailing commas6:78-689
preprocess_empty_in_listsexpr IN ()(expr = NULL AND 0 = 1)Convert empty IN to constant false6:91-720
preprocess_index_hintsINDEXED BY idx / NOT INDEXEDStrip SQLite index hints7:22-739
preprocess_reindex_syntaxREINDEX idxVACUUM REINDEX idxConvert to sqlparser-compatible form7:41-757
preprocess_bare_table_in_clausesIN tableIN (SELECT * FROM table)Expand SQLite shorthand8:44-873
preprocess_sqlite_trigger_shorthandMissing AFTER / FOR EACH ROWAdd required trigger components7:71-842

The trigger preprocessor is only invoked on parse errors containing CREATE TRIGGER, as it requires more complex regex patterns to inject missing timing and row-level clauses.

Sources: llkv-sql/src/sql_engine.rs:623-873

Regex Pattern Details

Static OnceLock<Regex> instances cache compiled patterns across invocations:

For example, the empty IN list handler uses:

(?i)(\([^)]*\)|x'[0-9a-fA-F]*'|'(?:[^']|'')*'|[a-zA-Z_][a-zA-Z0-9_]*(?:\.[a-zA-Z_][a-zA-Z0-9_]*)*|\d+(?:\.\d+)?)\s+(NOT\s+)?IN\s*\(\s*\)

This matches expressions (parenthesized, hex literals, strings, identifiers, numbers) followed by [NOT] IN () and replaces with boolean expressions that preserve evaluation side effects while producing constant results.

Sources: llkv-sql/src/sql_engine.rs:691-720

Parameter Placeholder System

The parameter system uses thread-local state to track placeholders during statement preparation:

  1. Scope Creation: ParameterScope::new() initializes thread-local ParameterState
  2. Registration: Each placeholder calls register_placeholder(raw) which:
    • For ?: auto-increments index
    • For ?N or $N: uses explicit numeric index
    • For :name: assigns next available index and stores mapping
  3. Sentinel Generation: placeholder_marker(index) creates __llkv_param__N__ string
  4. Parsing: Sentinel strings are parsed as string literals in the SQL AST
  5. Binding: execute_prepared replaces sentinels with SqlParamValue instances

The ParameterState struct tracks:

  • assigned: FxHashMap<String, usize> - named parameter to index mapping
  • next_auto: usize - next index for ? placeholders
  • max_index: usize - highest parameter index seen

Sources: llkv-sql/src/sql_engine.rs:77-206 llkv-sql/src/sql_engine.rs:1120-1235

graph TB
    subgraph "INSERT Processing Decision"
        INSERT["Statement::Insert"]
CLASSIFY["classify_insert"]
VALUES["PreparedInsert::Values"]
IMMEDIATE["PreparedInsert::Immediate"]
end
    
    subgraph "Buffering Logic"
        ENABLED{"Buffering\nEnabled?"}
COMPAT{"Can Buffer\nAccept?"}
THRESHOLD{"&gt;= MAX_BUFFERED_INSERT_ROWS\n(8192)?"}
BUFFER["InsertBuffer::push_statement"]
FLUSH["flush_buffered_insert"]
EXECUTE["execute_plan_statement"]
end
    
 
   INSERT --> CLASSIFY
 
   CLASSIFY --> VALUES
 
   CLASSIFY --> IMMEDIATE
    
 
   VALUES --> ENABLED
 
   ENABLED -->|No| EXECUTE
 
   ENABLED -->|Yes| COMPAT
    
 
   COMPAT -->|No| FLUSH
 
   COMPAT -->|Yes| BUFFER
 
   FLUSH --> BUFFER
    
 
   BUFFER --> THRESHOLD
 
   THRESHOLD -->|Yes| FLUSH
 
   THRESHOLD -->|No| RETURN["Return placeholder result"]
IMMEDIATE --> EXECUTE

INSERT Buffering System

The INSERT buffering system batches compatible literal INSERT statements to reduce planning overhead for bulk ingest workloads.

Buffer Structure

The InsertBuffer struct accumulates rows across multiple INSERT statements:

Key fields:

  • table_name, columns, on_conflict: compatibility key for buffering
  • rows: accumulated literal values from all buffered statements
  • statement_row_counts: per-statement row counts to emit individual results
  • total_rows: sum of statement_row_counts for threshold checking

Sources: llkv-sql/src/sql_engine.rs:421-471

Buffering Conditions

An INSERT can be buffered if:

  1. The InsertSource is Values (literal rows) or a constant SELECT
  2. Buffering is enabled via insert_buffering_enabled flag
  3. Either no buffer exists or InsertBuffer::can_accept returns true:
    • table_name matches exactly
    • columns match exactly (same names, same order)
    • on_conflict action matches

When the buffer reaches MAX_BUFFERED_INSERT_ROWS (8192), it is flushed automatically. Flush also occurs on:

  • Transaction boundaries (BEGIN, COMMIT, ROLLBACK)
  • Incompatible INSERT statement
  • Engine drop
  • Explicit set_insert_buffering(false) call

Sources: llkv-sql/src/sql_engine.rs:452-470 llkv-sql/src/sql_engine.rs:2028-2146 llkv-sql/src/sql_engine.rs:410-414

Buffer Flush Process

The flush process:

  1. Extracts InsertBuffer from RefCell<Option<InsertBuffer>>
  2. Constructs single InsertPlan with all accumulated rows
  3. Executes via execute_statement
  4. Receives single RuntimeStatementResult::Insert with total rows inserted
  5. Splits result into per-statement results using statement_row_counts vector
  6. Returns vector of results matching original statement order

This allows bulk execution while preserving per-statement result semantics.

Sources: llkv-sql/src/sql_engine.rs:2028-2146

Value Handling

The SqlValue enum represents literal values during SQL processing:

The SqlValue::try_from_expr function handles:

  • Unary operators (negation for numeric types, intervals)
  • CAST expressions (particularly to DATE)
  • Nested expressions
  • Dictionary/struct literals
  • Binary operations (addition, subtraction, bitshift for constant folding)
  • Typed strings (DATE '2024-01-01')

Interval arithmetic is performed at constant-folding time:

  • Date32 + IntervalDate32
  • Interval + Date32Date32
  • Date32 - IntervalDate32
  • Date32 - Date32Interval
  • Interval +/- IntervalInterval

Sources: llkv-sql/src/sql_value.rs:16-320

Error Handling

The SQL layer maps table-related errors to catalog-specific error messages:

Error TypeMappingMethod
Error::NotFoundCatalog Error: Table 'X' does not existtable_not_found_error
Error::InvalidArgumentError (contains "unknown table")Same as abovemap_table_error
Transaction conflictsanother transaction has dropped this tableString constant

The execute_plan_statement method applies error mapping except for CREATE VIEW and DROP VIEW statements, where the "table" name refers to the view being created/dropped rather than a referenced table.

Sources: llkv-sql/src/sql_engine.rs:558-609 llkv-sql/src/sql_engine.rs511

Thread Safety and Cloning

The SqlEngine::clone implementation creates a new session:

This ensures each cloned engine has an independent:

  • RuntimeSession (transaction state, temporary namespace)
  • insert_buffer (no shared buffering across sessions)
  • statement_cache (independent prepared statement cache)

The warning message indicates this is typically not intended usage, as most applications should use a single shared SqlEngine instance across threads (enabled by interior mutability via RefCell and atomic types).

Sources: llkv-sql/src/sql_engine.rs:522-540