This documentation is part of the "Projects with Books" initiative at zenOSmosis.

The source code for this project is available on GitHub.

System Catalog and SysCatalog

Purpose and Scope

This document describes the system catalog infrastructure that stores and manages table and column metadata for LLKV. The system catalog treats metadata as first-class data, persisting it in table 0 using the same Arrow-based storage mechanisms that handle user data. This ensures crash consistency, enables transactional DDL operations, and simplifies the overall architecture by eliminating separate metadata storage layers.

For information about the higher-level catalog management API that orchestrates table lifecycle operations, see CatalogManager API. For details on custom type definitions and the type registry, see Custom Types and Type Registry.

System Catalog as Table 0

LLKV stores all table and column metadata in a special table with ID 0, known as the system catalog. This design leverages the existing storage infrastructure rather than introducing a separate metadata store.

Key Properties

| Property | Description |
| --- | --- |
| Table ID | Always 0, reserved at system initialization |
| Storage Format | Arrow RecordBatch with predefined schema |
| MVCC Semantics | Full transaction support with snapshot isolation |
| Persistence | Uses the same ColumnStore and Pager as user tables |
| Crash Safety | Metadata mutations are atomic through the append pipeline |

The system catalog contains two types of metadata records:

  1. Table Metadata (TableMeta): Defines table schemas, IDs, and names
  2. Column Metadata (ColMeta): Describes individual columns within tables

graph TB
    subgraph "Metadata Storage Model"
        UserTables["User Tables\n(ID ≥ 1)"]
        SysCatalog["System Catalog\n(Table 0)"]
        TableMeta["TableMeta Records\n• table_id\n• table_name\n• schema"]
        ColMeta["ColMeta Records\n• table_id\n• col_name\n• col_id\n• data_type"]
    end

    subgraph "Storage Layer"
        ColumnStore["ColumnStore"]
        Pager["Pager (MemPager/SimdRDrivePager)"]
    end

    UserTables -->|described by| SysCatalog
    SysCatalog --> TableMeta
    SysCatalog --> ColMeta

    SysCatalog -->|persisted via| ColumnStore
    UserTables -->|persisted via| ColumnStore
    ColumnStore --> Pager

    style SysCatalog fill:#f9f9f9

Sources: llkv-table/README.md:28-29 llkv-column-map/README.md:10-16

Metadata Schema

The system catalog stores metadata using a predefined Arrow schema with the following structure:

TableMeta Schema

| Field Name | Arrow Type | Description |
| --- | --- | --- |
| table_id | UInt32 | Unique identifier for the table |
| table_name | Utf8 | Human-readable table name |
| schema | Binary | Serialized Arrow schema definition |
| row_id | UInt64 | MVCC row identifier (auto-injected) |
| created_by | UInt64 | Transaction ID that created this record |
| deleted_by | UInt64 | Transaction ID that deleted this record (NULL if active) |

ColMeta Schema

| Field Name | Arrow Type | Description |
| --- | --- | --- |
| table_id | UInt32 | References the parent table |
| col_id | UInt32 | Column identifier within the table |
| col_name | Utf8 | Column name |
| data_type | Utf8 | Arrow data type descriptor |
| row_id | UInt64 | MVCC row identifier (auto-injected) |
| created_by | UInt64 | Transaction ID that created this record |
| deleted_by | UInt64 | Transaction ID that deleted this record (NULL if active) |
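
As a rough illustration of the two schemas, the records could be mirrored in plain Rust structs like the ones below. This is a sketch only: the struct layout, the `MvccColumns` grouping, and the `CATALOG_TABLE_ID` constant are assumptions made for this example and are not taken from the LLKV source. Field names and widths follow the tables, with `Option<u64>` standing in for a nullable `deleted_by`.

```rust
/// Reserved table ID of the system catalog, as stated in the text.
/// The constant name itself is hypothetical.
pub const CATALOG_TABLE_ID: u32 = 0;

/// MVCC bookkeeping columns that both record types carry.
/// Grouping them into one struct is a choice made for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct MvccColumns {
    pub row_id: u64,
    pub created_by: u64,
    /// `None` models a NULL `deleted_by`, meaning the record is active.
    pub deleted_by: Option<u64>,
}

impl MvccColumns {
    /// A record is active while no transaction has marked it deleted.
    pub fn is_active(&self) -> bool {
        self.deleted_by.is_none()
    }
}

/// One row of table metadata.
#[derive(Debug, Clone, PartialEq)]
pub struct TableMeta {
    pub table_id: u32,
    pub table_name: String,
    /// Serialized Arrow schema, kept as opaque bytes here.
    pub schema: Vec<u8>,
    pub mvcc: MvccColumns,
}

/// One row of column metadata.
#[derive(Debug, Clone, PartialEq)]
pub struct ColMeta {
    pub table_id: u32,
    pub col_id: u32,
    pub col_name: String,
    /// Arrow data type descriptor; the exact string format is an assumption.
    pub data_type: String,
    pub mvcc: MvccColumns,
}
```

A `ColMeta` row refers to its parent through `table_id`, so listing the columns of a table amounts to filtering on that field.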

Sources: llkv-table/README.md:13-17 Diagram 4 from high-level architecture

SysCatalog Implementation

The SysCatalog struct serves as the programmatic interface to the system catalog, providing methods to read and write metadata while abstracting the underlying Arrow storage details.

Core Components

graph LR
    subgraph "SysCatalog Interface"
        SysCatalog["SysCatalog"]
        CreateTable["create_table()"]
        GetTable["get_table_meta()"]
        ListTables["list_tables()"]
        DropTable["drop_table()"]
        CreateCol["create_column()"]
        GetCol["get_column_meta()"]
        ListCols["list_columns()"]
    end

    subgraph "Storage Backend"
        Table0["Table (ID=0)"]
        ColumnStore["ColumnStore"]
    end

    SysCatalog --> CreateTable
    SysCatalog --> GetTable
    SysCatalog --> ListTables
    SysCatalog --> DropTable
    SysCatalog --> CreateCol
    SysCatalog --> GetCol
    SysCatalog --> ListCols

    CreateTable --> Table0
    GetTable --> Table0
    ListTables --> Table0
    DropTable --> Table0
    CreateCol --> Table0
    GetCol --> Table0
    ListCols --> Table0

    Table0 --> ColumnStore

Sources: llkv-table/README.md:28-29 llkv-runtime/README.md:39

Metadata Query Process

When the runtime queries the catalog (e.g., during SELECT planning), it follows this flow:

sequenceDiagram
    participant Runtime as RuntimeContext
    participant Catalog as SysCatalog
    participant Table0 as Table (ID=0)
    participant Store as ColumnStore

    Runtime->>Catalog: get_table_meta("users")

    Catalog->>Table0: scan_stream()\nWHERE table_name = 'users'
    Table0->>Store: ColumnStream with predicate
    Store->>Store: Apply MVCC filtering
    Store-->>Table0: RecordBatch

    Table0-->>Catalog: RecordBatch

    Note over Catalog: Deserialize TableMeta\nfrom Arrow batch

    Catalog-->>Runtime: TableMeta struct

    Runtime->>Catalog: list_columns(table_id)
    Catalog->>Table0: scan_stream()\nWHERE table_id = X
    Table0->>Store: ColumnStream with predicate
    Store-->>Table0: RecordBatch
    Table0-->>Catalog: RecordBatch

    Note over Catalog: Deserialize ColMeta\nrecords

    Catalog-->>Runtime: Vec<ColMeta>

Sources: llkv-table/README.md:23-25 llkv-runtime/README.md:36-40
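
The two lookups in this flow can be approximated with ordinary filtered scans. The sketch below replaces table 0 with in-memory vectors and leaves out Arrow batches and MVCC filtering entirely. `CatalogRows` and the trimmed-down record structs are hypothetical stand-ins; only the method names `get_table_meta` and `list_columns` come from the diagram, and their real signatures may differ.

```rust
/// Trimmed-down records; the full field lists are in the schema tables.
#[derive(Debug, Clone, PartialEq)]
pub struct TableMeta {
    pub table_id: u32,
    pub table_name: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ColMeta {
    pub table_id: u32,
    pub col_id: u32,
    pub col_name: String,
}

/// Hypothetical stand-in for table 0: plain vectors instead of Arrow batches.
#[derive(Default)]
pub struct CatalogRows {
    pub tables: Vec<TableMeta>,
    pub columns: Vec<ColMeta>,
}

impl CatalogRows {
    /// Models the scan with predicate `table_name = name`.
    pub fn get_table_meta(&self, name: &str) -> Option<&TableMeta> {
        self.tables.iter().find(|t| t.table_name == name)
    }

    /// Models the scan with predicate `table_id = X`.
    /// Results are sorted by `col_id` so the order is deterministic.
    pub fn list_columns(&self, table_id: u32) -> Vec<&ColMeta> {
        let mut cols: Vec<&ColMeta> = self
            .columns
            .iter()
            .filter(|c| c.table_id == table_id)
            .collect();
        cols.sort_by_key(|c| c.col_id);
        cols
    }
}
```

A caller would first resolve the name to a `TableMeta`, then pass its `table_id` to `list_columns`, which matches the two round trips shown in the sequence.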

Metadata Operations

DDL operations (CREATE TABLE, DROP TABLE, ALTER TABLE) modify the system catalog through the same transactional append pipeline used for INSERT statements.

CREATE TABLE Flow

graph TD
    ParseSQL["Parse SQL:\nCREATE TABLE users (...)"]
    CreatePlan["CreateTablePlan"]
    RuntimeExec["Runtime.execute_create_table()"]
    ValidateSchema["Validate Schema"]
    AllocTableID["Allocate table_id"]
    BuildTableMeta["Build TableMeta RecordBatch"]
    BuildColMeta["Build ColMeta RecordBatch"]
    AppendTable["Table(0).append(TableMeta)"]
    AppendCols["Table(0).append(ColMeta)"]
    ColumnStore["ColumnStore.append()"]
    CommitPager["Pager.batch_put()"]

    ParseSQL --> CreatePlan
    CreatePlan --> RuntimeExec
    RuntimeExec --> ValidateSchema
    ValidateSchema --> AllocTableID

    AllocTableID --> BuildTableMeta
    AllocTableID --> BuildColMeta

    BuildTableMeta --> AppendTable
    BuildColMeta --> AppendCols

    AppendTable --> ColumnStore
    AppendCols --> ColumnStore

    ColumnStore --> CommitPager

    style AppendTable fill:#f9f9f9
    style AppendCols fill:#f9f9f9

Key Implementation Details:

  1. Schema Validation: The runtime validates the Arrow schema before allocating resources
  2. Table ID Allocation: Monotonically increasing IDs are assigned via CatalogManager
  3. Atomic Append: The TableMeta record and all ColMeta records are appended in a single transaction
  4. MVCC Tagging: The created_by column is set to the current transaction ID
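
The four points above can be sketched as one function. This is a minimal model under stated assumptions, not LLKV's implementation: `Catalog`, `CatalogRow`, `DdlError`, and the validation rules (non-empty schema, unique column names) are invented for the example, and a single `Vec::extend` stands in for the transactional append.

```rust
use std::collections::HashSet;

/// Hypothetical catalog rows; a real catalog stores Arrow RecordBatches.
#[derive(Debug, Clone, PartialEq)]
pub enum CatalogRow {
    Table { table_id: u32, name: String, created_by: u64 },
    Column { table_id: u32, col_id: u32, name: String, data_type: String, created_by: u64 },
}

/// Hypothetical validation errors for this sketch.
#[derive(Debug, PartialEq)]
pub enum DdlError {
    EmptySchema,
    DuplicateColumn(String),
}

pub struct Catalog {
    next_table_id: u32, // ID 0 is reserved for the catalog itself
    rows: Vec<CatalogRow>,
}

impl Catalog {
    pub fn new() -> Self {
        Catalog { next_table_id: 1, rows: Vec::new() }
    }

    pub fn rows(&self) -> &[CatalogRow] {
        &self.rows
    }

    /// `columns` is a list of (name, data type descriptor) pairs.
    pub fn create_table(
        &mut self,
        name: &str,
        columns: &[(&str, &str)],
        txn_id: u64,
    ) -> Result<u32, DdlError> {
        // 1. Validate before allocating anything.
        if columns.is_empty() {
            return Err(DdlError::EmptySchema);
        }
        let mut seen = HashSet::new();
        for (col, _) in columns {
            if !seen.insert(*col) {
                return Err(DdlError::DuplicateColumn(col.to_string()));
            }
        }

        // 2. Allocate a monotonically increasing table ID.
        let table_id = self.next_table_id;
        self.next_table_id += 1;

        // 3 and 4. Build every row first, tagged with the creating
        // transaction, then append the whole batch in one step.
        let mut batch = vec![CatalogRow::Table {
            table_id,
            name: name.to_string(),
            created_by: txn_id,
        }];
        for (i, (col, ty)) in columns.iter().enumerate() {
            batch.push(CatalogRow::Column {
                table_id,
                col_id: i as u32 + 1,
                name: col.to_string(),
                data_type: ty.to_string(),
                created_by: txn_id,
            });
        }
        self.rows.extend(batch);
        Ok(table_id)
    }
}
```

Because validation runs before allocation, a rejected statement neither consumes a table ID nor leaves partial rows behind.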

Sources: llkv-runtime/README.md:36-40 llkv-table/README.md:22-24

DROP TABLE Flow

Dropping a table uses MVCC soft-delete semantics rather than physical deletion:

graph TD
    DropPlan["DropTablePlan"]
    RuntimeExec["Runtime.execute_drop_table()"]
    LookupMeta["SysCatalog.get_table_meta()"]
    CheckExists["Verify table exists"]
    BuildDeleteMeta["Build RecordBatch:\n• table_id\n• deleted_by = current_txn"]
    AppendDelete["Table(0).append(delete_batch)"]
    ColumnStore["ColumnStore.append()"]

    DropPlan --> RuntimeExec
    RuntimeExec --> LookupMeta
    LookupMeta --> CheckExists

    CheckExists --> BuildDeleteMeta
    BuildDeleteMeta --> AppendDelete
    AppendDelete --> ColumnStore

    style BuildDeleteMeta fill:#f9f9f9

The deleted_by column is updated to mark the metadata as deleted. MVCC visibility rules ensure that:

  • Transactions with snapshots before the deletion still see the table
  • Transactions starting after the deletion do not see the table
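
The two visibility rules above reduce to a small predicate. The sketch below assumes that transaction IDs are totally ordered and that a snapshot can be identified by the newest transaction ID it is allowed to see. It ignores in-flight and aborted transactions, which a real MVCC implementation has to handle. `RowVersion`, `is_visible`, and `soft_delete` are hypothetical names.

```rust
/// The MVCC columns of one catalog record.
#[derive(Debug, Clone, Copy)]
pub struct RowVersion {
    pub created_by: u64,
    /// `None` models NULL, meaning the record has not been deleted.
    pub deleted_by: Option<u64>,
}

/// A record is visible to `snapshot` if it was created at or before the
/// snapshot and was not deleted at or before the snapshot.
pub fn is_visible(row: &RowVersion, snapshot: u64) -> bool {
    let created = row.created_by <= snapshot;
    let not_deleted = match row.deleted_by {
        None => true,
        Some(deleter) => deleter > snapshot,
    };
    created && not_deleted
}

/// Soft delete: mark the record instead of removing it.
/// A record that is already marked keeps its original deleter.
pub fn soft_delete(row: &mut RowVersion, txn_id: u64) {
    if row.deleted_by.is_none() {
        row.deleted_by = Some(txn_id);
    }
}
```

For a table created by transaction 5 and dropped by transaction 10, a snapshot at 7 still sees it, while snapshots at 10 or later do not.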

Sources: llkv-table/README.md:32-34 Diagram 4 from high-level architecture

Bootstrap Process

When LLKV initializes, the system catalog must bootstrap itself before any user operations can proceed.

Initialization Sequence

sequenceDiagram
    participant Main as main() or SqlEngine::new()
    participant Runtime as RuntimeContext::new()
    participant CatMgr as CatalogManager::new()
    participant Table as Table::open_or_create()
    participant Store as ColumnStore::open()
    participant Pager as Pager (MemPager/SimdRDrivePager)

    Main->>Runtime: new(pager)
    Runtime->>CatMgr: new(pager)

    CatMgr->>Store: open(pager, root_key)
    Store->>Pager: batch_get([root_key])

    alt Catalog Exists
        Pager-->>Store: Catalog data
        Store-->>CatMgr: ColumnStore (loaded)
        Note over CatMgr: Deserialize catalog entries
    else First Run
        Pager-->>Store: NULL
        Store-->>CatMgr: ColumnStore (empty)
        CatMgr->>Table: open_or_create(table_id=0)
        Note over CatMgr: Create system catalog schema
        Table->>Store: Initialize table 0
        Store->>Pager: batch_put(catalog_schema)
    end

    CatMgr-->>Runtime: CatalogManager (initialized)
    Runtime-->>Main: RuntimeContext (ready)

Bootstrap Steps:

  1. Pager Initialization: The storage backend is opened (in-memory or persistent)
  2. Catalog Discovery: The ColumnStore attempts to load the catalog from the pager root key
  3. Schema Creation: If no catalog exists, table 0 is created with the predefined schema
  4. Ready State: The runtime can now service DDL and DML operations
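
The load-or-create decision in steps 2 and 3 can be sketched with a toy key-value pager. Everything here is hypothetical: `MemPagerSketch` is a `HashMap` wrapper and not the real `MemPager`, `CATALOG_ROOT_KEY` is an invented key, and the stored bytes are a placeholder for the serialized catalog schema.

```rust
use std::collections::HashMap;

/// Hypothetical root key under which the catalog is stored.
pub const CATALOG_ROOT_KEY: u64 = 0;

/// Toy pager for this sketch: a map from physical key to bytes.
#[derive(Default)]
pub struct MemPagerSketch {
    blobs: HashMap<u64, Vec<u8>>,
}

impl MemPagerSketch {
    pub fn get(&self, key: u64) -> Option<&Vec<u8>> {
        self.blobs.get(&key)
    }

    pub fn put(&mut self, key: u64, value: Vec<u8>) {
        self.blobs.insert(key, value);
    }
}

/// Outcome of the bootstrap, so callers can tell the two paths apart.
#[derive(Debug, PartialEq)]
pub enum Bootstrap {
    Loaded,
    Created,
}

/// Catalog discovery followed by schema creation on first run.
pub fn open_or_create_catalog(pager: &mut MemPagerSketch) -> Bootstrap {
    // Catalog discovery: is there anything at the root key?
    if pager.get(CATALOG_ROOT_KEY).is_some() {
        return Bootstrap::Loaded;
    }
    // First run: persist the predefined schema for table 0.
    // The payload is a placeholder, not a real Arrow schema.
    pager.put(CATALOG_ROOT_KEY, b"catalog-schema-placeholder".to_vec());
    Bootstrap::Created
}
```

Calling the function twice against the same pager takes the creation path once and the load path afterwards, so the bootstrap is idempotent in this model.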

Sources: llkv-runtime/README.md:26-31 llkv-storage/README.md:12-16

Integration with Runtime

The RuntimeContext uses the system catalog for all schema-dependent operations:

Schema Resolution Flow

graph TB
    subgraph "SQL Query Processing"
        ParsedSQL["Parsed SQL AST"]
        SelectPlan["SelectPlan<String>"]
        ResolvedPlan["SelectPlan<FieldId>"]
    end

    subgraph "RuntimeContext"
        CatalogLookup["Catalog Lookup"]
        FieldResolution["Field Name → FieldId\nResolution"]
        SchemaValidation["Schema Validation"]
    end

    subgraph "System Catalog"
        SysCatalog["SysCatalog"]
        TableMetaCache["In-Memory Metadata Cache"]
    end

    ParsedSQL --> SelectPlan
    SelectPlan --> CatalogLookup

    CatalogLookup --> SysCatalog
    SysCatalog --> TableMetaCache

    TableMetaCache --> FieldResolution
    FieldResolution --> SchemaValidation
    SchemaValidation --> ResolvedPlan

Usage Examples

| Operation | Catalog Interaction |
| --- | --- |
| SELECT | Resolve table names → table IDs, resolve column names → field IDs |
| INSERT | Validate schema compatibility, check for required columns |
| JOIN | Resolve schemas for both tables, validate join key compatibility |
| CREATE INDEX | (Future) Persist index metadata as new catalog record type |
| ALTER TABLE | Update existing metadata records with new schema definitions |
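
For the SELECT case, name resolution turns a plan that refers to columns by name into one that refers to them by field ID. The sketch below uses a heavily simplified, hypothetical `SelectPlan<F>` with only a table name and a projection list. The real plan type in LLKV carries far more than this, and the `FieldId` alias and `resolve_plan` function are assumptions for illustration.

```rust
use std::collections::HashMap;

/// Assumed representation of a field identifier.
pub type FieldId = u32;

/// Simplified plan, generic over how columns are referenced:
/// `String` before resolution, `FieldId` after.
#[derive(Debug, PartialEq)]
pub struct SelectPlan<F> {
    pub table: String,
    pub projections: Vec<F>,
}

/// Resolve every projected column name through the catalog's
/// name-to-field map. Any unknown name fails the whole plan.
pub fn resolve_plan(
    plan: SelectPlan<String>,
    fields: &HashMap<String, FieldId>,
) -> Result<SelectPlan<FieldId>, String> {
    let projections = plan
        .projections
        .iter()
        .map(|name| {
            fields
                .get(name)
                .copied()
                .ok_or_else(|| format!("unknown column: {name}"))
        })
        .collect::<Result<Vec<_>, _>>()?;
    Ok(SelectPlan { table: plan.table, projections })
}
```

Making the plan generic over the column reference type lets the compiler distinguish unresolved plans from resolved ones, so later stages cannot accidentally receive raw names.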

Sources: llkv-runtime/README.md:36-40 llkv-expr/README.md:50-54

Dual-Context Catalog Access

During explicit transactions, the runtime maintains two catalog views:

Catalog Visibility Rules

  1. Persistent Context: Sees only metadata committed before the transaction's snapshot
  2. Staging Context: Sees tables created within the current transaction
  3. On Commit: Staged metadata is replayed into the persistent context
  4. On Rollback: Staged metadata is discarded

This dual-view approach ensures that:

  • DDL operations remain transactional
  • Uncommitted schema changes don't leak to other sessions
  • Catalog queries are snapshot-isolated like DML operations
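
A minimal model of the two views, assuming each context is just a map from table name to table ID. `DualCatalog` and its methods are hypothetical, and the model covers only table creation within one transaction, without snapshots or concurrent sessions.

```rust
use std::collections::BTreeMap;

/// Two catalog views held during an explicit transaction.
#[derive(Default)]
pub struct DualCatalog {
    /// Committed metadata, visible to every session.
    persistent: BTreeMap<String, u32>,
    /// Metadata created inside the current transaction only.
    staging: BTreeMap<String, u32>,
}

impl DualCatalog {
    /// DDL inside the transaction goes to the staging context.
    pub fn create_in_txn(&mut self, name: &str, table_id: u32) {
        self.staging.insert(name.to_string(), table_id);
    }

    /// The transaction sees its own staged tables plus committed ones.
    pub fn lookup_in_txn(&self, name: &str) -> Option<u32> {
        self.staging
            .get(name)
            .or_else(|| self.persistent.get(name))
            .copied()
    }

    /// Other sessions see committed metadata only.
    pub fn lookup_committed(&self, name: &str) -> Option<u32> {
        self.persistent.get(name).copied()
    }

    /// On commit, staged metadata is replayed into the persistent context.
    pub fn commit(&mut self) {
        let staged = std::mem::take(&mut self.staging);
        self.persistent.extend(staged);
    }

    /// On rollback, staged metadata is discarded.
    pub fn rollback(&mut self) {
        self.staging.clear();
    }
}
```

Until `commit` runs, `lookup_committed` cannot observe a staged table, which is the property that keeps uncommitted schema changes from leaking.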

Sources: llkv-runtime/README.md:26-31 llkv-table/README.md:32-34

Metadata Caching

The CatalogManager maintains an in-memory cache of frequently accessed metadata to avoid repeated scans of table 0:

| Cache Structure | Purpose | Invalidation Strategy |
| --- | --- | --- |
| Table Name → ID Map | Fast table resolution during planning | Invalidated on CREATE/DROP TABLE |
| Table ID → Schema Map | Quick schema validation during INSERT | Invalidated on ALTER TABLE |
| Column Name → FieldId Map | Field resolution for expressions | Rebuilt on schema changes |

The cache is session-local and does not require cross-session synchronization in the current single-process model.
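
The first cache in the table (table name → ID) could look roughly like this. It is a sketch under assumptions: `MetadataCache`, its `misses` counter, and the closure that stands in for a scan of table 0 are all invented for the example, and the source itself notes that the caching description is inferred.

```rust
use std::collections::HashMap;

/// Session-local cache for table name lookups.
#[derive(Default)]
pub struct MetadataCache {
    name_to_id: HashMap<String, u32>,
    /// Counts lookups that had to fall back to scanning table 0.
    pub misses: usize,
}

impl MetadataCache {
    /// Return the cached ID, or run `scan_table0` once and remember
    /// the result. Names that do not exist are not cached.
    pub fn resolve<F>(&mut self, name: &str, scan_table0: F) -> Option<u32>
    where
        F: FnOnce(&str) -> Option<u32>,
    {
        if let Some(id) = self.name_to_id.get(name) {
            return Some(*id);
        }
        self.misses += 1;
        let id = scan_table0(name)?;
        self.name_to_id.insert(name.to_string(), id);
        Some(id)
    }

    /// Called on CREATE TABLE or DROP TABLE for the affected name.
    pub fn invalidate_table(&mut self, name: &str) {
        self.name_to_id.remove(name);
    }
}
```

Repeated lookups of the same name hit the map and skip the scan, and invalidating the entry forces the next lookup back to table 0.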

Sources: Inferred from llkv-runtime/README.md:12-17

Summary

The LLKV system catalog demonstrates the principle of treating metadata as data by storing all table and column definitions in table 0 using the same Arrow-based storage infrastructure that handles user tables. This design:

  • Simplifies Architecture: Eliminates the need for separate metadata storage systems
  • Ensures Consistency: Metadata mutations use MVCC transactions like all other data
  • Enables Crash Recovery: The pager's atomicity guarantees extend to schema changes
  • Supports Transactional DDL: Schema modifications can be rolled back or committed atomically

The SysCatalog interface abstracts the underlying Arrow storage, providing a type-safe API for the runtime to query and modify metadata. The bootstrap process ensures the system catalog exists before any user operations proceed, and the dual-context model enables proper transaction isolation for DDL operations.

Sources: llkv-table/README.md:28-29 llkv-runtime/README.md:36-40 llkv-column-map/README.md:10-16 Diagram 4 from high-level architecture