Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

GitHub

This documentation is part of the "Projects with Books" initiative at zenOSmosis.

The source code for this project is available on GitHub.

SQL Interface

Loading…

SQL Interface

Relevant source files

The SQL Interface provides the primary user-facing API for interacting with LLKV databases through SQL statements. This layer is responsible for parsing SQL text, preprocessing dialect-specific syntax, translating Abstract Syntax Trees (AST) into execution plans, and delegating to the runtime engine for actual execution.

For information about query planning and execution, see Query Planning and Query Execution. For details on the underlying runtime that executes plans, see Architecture.


Purpose and Scope

The SQL Interface layer (llkv-sql crate) serves as the entry point for SQL-based database operations. It bridges the gap between SQL text written by users and the Arrow-native columnar storage engine, handling:

  • SQL Parsing : Converting SQL strings into Abstract Syntax Trees using sqlparser-rs
  • Dialect Normalization : Preprocessing SQL to handle syntax variations from SQLite, DuckDB, and PostgreSQL
  • Statement Translation : Converting ASTs into typed execution plans (PlanStatement structures)
  • Execution Coordination : Delegating plans to the RuntimeEngine and formatting results
  • Transaction Management : Coordinating multi-statement transactions with MVCC support
  • Performance Optimization : Batching INSERT statements to reduce planning overhead

The SQL Interface does not execute queries directly—it delegates all data operations to the runtime and executor layers. Its primary responsibility is accurate SQL-to-plan translation while preserving user intent across different SQL dialects.

Sources : llkv-sql/src/lib.rs:1-52 llkv-sql/src/sql_engine.rs:1-60


Architecture Overview

The SQL Interface operates as a stateful wrapper around the RuntimeEngine, maintaining session state, prepared statement caches, and insert buffering state. The core workflow proceeds through several stages:

SQL Interface Processing Pipeline

graph TB
    User["User Application"]
SqlEngine["SqlEngine"]
Preprocess["SQL Preprocessing"]
Parser["sqlparser-rs\nGenericDialect"]
Translator["Statement Translation"]
Runtime["RuntimeEngine"]
Results["RecordBatch / RowCount"]
User -->|execute sql| SqlEngine
 
   SqlEngine -->|1. Preprocess| Preprocess
 
   Preprocess -->|Normalized SQL| Parser
 
   Parser -->|AST Statement| Translator
 
   Translator -->|PlanStatement| Runtime
 
   Runtime -->|RuntimeStatementResult| SqlEngine
 
   SqlEngine -->|Vec<RecordBatch>| Results
 
   Results -->|Query Results| User
    
    subgraph "llkv-sql Crate"
        SqlEngine
        Preprocess
        Translator
    end
    
    subgraph "External Dependencies"
        Parser
    end
    
    subgraph "llkv-runtime Crate"
        Runtime
    end

The SqlEngine maintains several internal subsystems:

  1. Statement Cache : Thread-safe prepared statement storage (RwLock<FxHashMap<String, Arc<PreparedPlan>>>)
  2. Insert Buffer : Cross-statement INSERT batching for bulk ingest (RefCell<Option<InsertBuffer>>)
  3. Session State : Transaction context and configuration flags
  4. Information Schema Cache : Lazy metadata refresh tracking

Sources : llkv-sql/src/sql_engine.rs:572-621 [Diagram 1 from overview](https://github.com/jzombie/rust-llkv/blob/89777726/Diagram 1 from overview)


SqlEngine Structure

The SqlEngine struct encapsulates all SQL processing state and exposes methods for executing SQL statements:

SqlEngine Class Structure

classDiagram
    class SqlEngine {-engine: RuntimeEngine\n-default_nulls_first: AtomicBool\n-insert_buffer: RefCell~Option~InsertBuffer~~\n-insert_buffering_enabled: AtomicBool\n-information_schema_ready: AtomicBool\n-statement_cache: RwLock~FxHashMap~String PreparedPlan~~\n+new(pager) SqlEngine\n+with_context(context, nulls_first) SqlEngine\n+execute(sql) Result~Vec~RuntimeStatementResult~~\n+sql(sql) Result~Vec~RecordBatch~~\n+prepare(sql) Result~PreparedStatement~\n+execute_prepared(stmt, params) Result~Vec~RuntimeStatementResult~~\n+session() RuntimeSession\n+runtime_context() Arc~RuntimeContext~}
    
    class RuntimeEngine {+execute_statement(plan) Result~RuntimeStatementResult~\n+context() Arc~RuntimeContext~}
    
    class PreparedStatement {-inner: Arc~PreparedPlan~\n+parameter_count() usize}
    
    class InsertBuffer {-table_name: String\n-columns: Vec~String~\n-rows: Vec~Vec~PlanValue~~\n-on_conflict: InsertConflictAction\n+can_accept(table, cols, conflict) bool\n+should_flush() bool}
    
    SqlEngine --> RuntimeEngine : delegates to
    SqlEngine --> PreparedStatement : creates
    SqlEngine --> InsertBuffer : maintains

Sources : llkv-sql/src/sql_engine.rs:572-621 llkv-sql/src/sql_engine.rs:354-373 llkv-sql/src/sql_engine.rs:487-547


Statement Execution Flow

A typical SQL execution proceeds through the following phases:

Statement Execution Sequence

Sources : llkv-sql/src/sql_engine.rs:1533-1633 llkv-sql/src/sql_engine.rs:2134-2285


SQL Preprocessing System

Before parsing, the SQL Interface applies multiple preprocessing passes to normalize syntax variations across different SQL dialects:

PreprocessorPurposeExample Transformation
TPCH Connect SyntaxStrip TPC-H CONNECT TO directivesCONNECT TO db; → removed
CREATE TYPE SyntaxConvert DuckDB type aliasesCREATE TYPE t AS INTCREATE DOMAIN t AS INT
EXCLUDE SyntaxQuote qualified names in EXCLUDEEXCLUDE (schema.col)EXCLUDE ("schema.col")
Trailing CommasRemove DuckDB-style trailing commasVALUES (1,)VALUES (1)
Empty IN ListsConvert degenerate IN expressionscol IN ()(col = NULL AND 0 = 1)
Index HintsStrip SQLite index hintsFROM t INDEXED BY idxFROM t
REINDEX SyntaxConvert SQLite REINDEXREINDEX idxVACUUM REINDEX idx
Trigger ShorthandExpand SQLite trigger defaultsCREATE TRIGGER tr ... → adds AFTER, FOR EACH ROW
Bare Table INExpand SQLite table subqueriescol IN tablenamecol IN (SELECT * FROM tablename)

Each preprocessor runs as a regex-based rewrite pass before the SQL text reaches sqlparser-rs. This allows LLKV to accept SQL from multiple dialects while using a single parser.

Sources : llkv-sql/src/sql_engine.rs:759-1006 llkv-sql/src/sql_engine.rs:771-793


Prepared Statements and Parameters

The SQL Interface supports prepared statements with parameterized queries. Parameters use placeholder syntax compatible with multiple dialects:

Parameter Processing Flow

graph LR
    subgraph "Parameter Placeholder Formats"
        Q1["? \n(anonymous)"]
Q2["?N \n(numbered)"]
Dollar["$N \n(PostgreSQL)"]
Named[":name \n(named)"]
end
    
    subgraph "Parameter Processing"
        Register["register_placeholder(raw)"]
State["ParameterState"]
Index["assigned: HashMap<String, usize>"]
end
    
    subgraph "Execution"
        Prepare["SqlEngine::prepare(sql)"]
PrepStmt["PreparedStatement"]
Execute["execute_prepared(stmt, params)"]
Bind["Bind params to plan"]
end
    
 
   Q1 --> Register
 
   Q2 --> Register
 
   Dollar --> Register
 
   Named --> Register
 
   Register --> State
 
   State --> Index
 
   Index --> Prepare
 
   Prepare --> PrepStmt
 
   PrepStmt --> Execute
 
   Execute --> Bind

Parameter placeholders are registered during parsing and replaced with internal sentinel values (__llkv_param__N__). When executing a prepared statement, the sentinel values are substituted with actual parameter values before plan execution.

Sources : llkv-sql/src/sql_engine.rs:78-282 llkv-sql/src/sql_engine.rs:354-373 llkv-sql/src/sql_engine.rs:1707-1773


INSERT Buffering Optimization

To improve bulk insert performance, the SQL Interface can buffer multiple consecutive INSERT ... VALUES statements and execute them as a single batched operation. This dramatically reduces planning overhead for large data loads:

INSERT Buffer State Machine

stateDiagram-v2
    [*] --> Empty : engine created
    Empty --> Buffering : INSERT with VALUES
    Buffering --> Buffering : compatible INSERT
    Buffering --> Flushing : incompatible statement
    Buffering --> Flushing : buffer threshold reached
    Flushing --> Empty : execute batched insert
    Flushing --> Buffering : new INSERT after flush
    Empty --> [*]
    
    note right of Buffering
        Accumulate rows while:
        - Same table
        - Same columns
        - Same conflict action
        - Below MAX_BUFFERED_INSERT_ROWS
    end note
    
    note right of Flushing
        Execute single INSERT with
        all accumulated rows
    end note

Buffering is disabled by default to preserve per-statement semantics for unit tests. Long-running workloads can enable buffering via set_insert_buffering(true) to achieve significant performance gains on bulk ingests.

Sources : llkv-sql/src/sql_engine.rs:487-547 llkv-sql/src/sql_engine.rs:2134-2285


graph TD
    AST["sqlparser::ast::Statement"]
subgraph "DDL Statements"
        CreateTable["CreateTable"]
AlterTable["AlterTable"]
DropTable["DropTable"]
CreateView["CreateView"]
CreateIndex["CreateIndex"]
end
    
    subgraph "DML Statements"
        Query["Query (SELECT)"]
Insert["Insert"]
Update["Update"]
Delete["Delete"]
end
    
    subgraph "Transaction Control"
        Begin["BEGIN"]
Commit["COMMIT"]
Rollback["ROLLBACK"]
end
    
    subgraph "PlanStatement Variants"
        SelectPlan["SelectPlan"]
InsertPlan["InsertPlan"]
UpdatePlan["UpdatePlan"]
DeletePlan["DeletePlan"]
CreateTablePlan["CreateTablePlan"]
AlterTablePlan["AlterTablePlan"]
BeginTxn["BeginTransaction"]
CommitTxn["CommitTransaction"]
end
    
 
   AST --> CreateTable
 
   AST --> AlterTable
 
   AST --> DropTable
 
   AST --> Query
 
   AST --> Insert
 
   AST --> Update
 
   AST --> Delete
 
   AST --> Begin
 
   AST --> Commit
    
 
   CreateTable --> CreateTablePlan
 
   AlterTable --> AlterTablePlan
 
   Query --> SelectPlan
 
   Insert --> InsertPlan
 
   Update --> UpdatePlan
 
   Delete --> DeletePlan
 
   Begin --> BeginTxn
 
   Commit --> CommitTxn

Statement Translation Process

Once SQL is parsed into an AST, the SQL Interface translates each Statement variant into a corresponding PlanStatement:

AST to Plan Translation

Translation involves:

  1. Identifier Resolution : Converting string column names to FieldId values via IdentifierResolver
  2. Expression Translation : Building typed expression trees from AST nodes
  3. Type Inference : Determining result types for computed columns
  4. Validation : Checking for unknown tables, duplicate columns, type mismatches

Sources : llkv-sql/src/sql_engine.rs:2287-3500 llkv-executor/src/lib.rs:89-93


Integration with Runtime and Executor

The SQL Interface delegates all actual execution to lower layers:

Layer Integration Diagram

The SQL Interface owns the RuntimeEngine instance and delegates PlanStatement execution to it. The runtime coordinates with the executor for query execution and the table layer for DDL operations.

Sources : llkv-sql/src/sql_engine.rs:706-745 [llkv-runtime/src/lib.rs (implied)](https://github.com/jzombie/rust-llkv/blob/89777726/llkv-runtime/src/lib.rs (implied)) [Diagram 1 from overview](https://github.com/jzombie/rust-llkv/blob/89777726/Diagram 1 from overview)


Example Usage

Basic usage of the SqlEngine:

Sources : llkv-sql/src/sql_engine.rs:443-485 llkv-sql/src/lib.rs:1-52


Key Design Principles

The SQL Interface embodies several architectural decisions:

  1. Dialect Tolerance : Accept SQL from multiple databases through preprocessing rather than forking the parser
  2. Lazy Validation : Validate table existence and column types during plan translation, not parsing
  3. Stateful Optimization : Maintain cross-statement state (insert buffer, prepared statement cache) for performance
  4. Delegate Execution : Never directly manipulate storage; always route through runtime abstractions
  5. Arrow Native : Return query results as Arrow RecordBatch structures for zero-copy integration
  6. Session Isolation : Each SqlEngine instance owns its transaction state and constraint enforcement mode

These principles allow the SQL Interface to serve as a stable, predictable API surface while the underlying execution engine evolves independently.

Sources : llkv-sql/src/sql_engine.rs:1-60 llkv-sql/src/lib.rs:1-52 [Diagram 2 from overview](https://github.com/jzombie/rust-llkv/blob/89777726/Diagram 2 from overview)

Dismiss

Refresh this wiki

Enter email to refresh