GitHub

This documentation is part of the "Projects with Books" initiative at zenOSmosis.

The source code for this project is available on GitHub.

Catalog and Metadata Management

Purpose and Scope

This document describes LLKV's metadata management infrastructure, including how table schemas, column definitions, and type information are persisted and accessed throughout the system. The catalog serves as the authoritative source for all schema information and coordinates with the storage layer to ensure crash consistency for metadata changes.

For details on specific catalog APIs, see CatalogManager API. For information on how metadata is physically stored, see System Catalog and SysCatalog. For type alias management, see Custom Types and Type Registry.

System Catalog Architecture

LLKV implements a self-hosting catalog where metadata is stored as regular data within the system. The system catalog, referred to as SysCatalog, is physically stored as table 0 and uses the same Arrow-based columnar storage infrastructure as user tables. This design provides several advantages:

  • Crash consistency : Metadata changes use the same transactional append path as data, ensuring atomic schema modifications.
  • MVCC for metadata : Schema changes are versioned alongside data using the same created_by and deleted_by columns.
  • Unified storage : No special-case persistence logic is required for metadata versus data.
  • Bootstrap simplicity : The catalog table itself can be opened using minimal hardcoded schema information.

Sources : llkv-table/README.md:27-29 llkv-runtime/README.md:37-40

```mermaid
graph TB
    subgraph "SQL Layer"
        SQLENG["SqlEngine"]
    end

    subgraph "Runtime Layer"
        RUNTIME["RuntimeEngine"]
        RTCONTEXT["RuntimeContext"]
    end

    subgraph "Catalog Layer"
        CATMGR["CatalogManager"]
        SYSCAT["SysCatalog\n(Table 0)"]
        TYPEREG["TypeRegistry"]
        RESOLVER["IdentifierResolver"]
    end

    subgraph "Table Layer"
        TABLE["Table"]
        TABLEMETA["TableMeta"]
        COLMETA["ColMeta"]
    end

    subgraph "Storage Layer"
        COLSTORE["ColumnStore"]
        PAGER["Pager"]
    end

    SQLENG --> RUNTIME
    RUNTIME --> RTCONTEXT
    RTCONTEXT --> CATMGR
    CATMGR --> SYSCAT
    CATMGR --> TYPEREG
    CATMGR --> RESOLVER
    SYSCAT --> TABLE
    TABLE --> COLSTORE
    COLSTORE --> PAGER
    TABLEMETA -.stored in.-> SYSCAT
    COLMETA -.stored in.-> SYSCAT
    RESOLVER -.queries.-> CATMGR
    RUNTIME -.queries.-> RESOLVER
```

Metadata Storage Model

The catalog stores two primary metadata types as Arrow RecordBatches within table 0:

TableMeta Structure

TableMeta records describe each table's schema and properties:

  • table_id : Unique identifier (u32)
  • namespace_id : Namespace the table belongs to (u32)
  • table_name : User-visible name (String)
  • schema : Serialized Arrow Schema describing columns and types
  • row_count : Approximate row count for query planning
  • created_at : Timestamp of table creation

ColMeta Structure

ColMeta records describe individual columns within tables:

  • table_id : Parent table reference (u32)
  • field_id : Column identifier within the table (u32)
  • field_name : Column name (String)
  • data_type : Arrow DataType serialization
  • nullable : Whether NULL values are permitted (bool)
  • metadata : Key-value pairs for extended properties

```mermaid
graph LR
    subgraph "Logical Metadata Model"
        USERTABLE["User Table\nemployees"]
        TABLEMETA["TableMeta\ntable_id=5\nname='employees'\nschema=..."]
        COLMETA1["ColMeta\ntable_id=5\nfield_id=0\nname='id'\ntype=Int32"]
        COLMETA2["ColMeta\ntable_id=5\nfield_id=1\nname='name'\ntype=Utf8"]
    end

    subgraph "Physical Storage"
        SYSCATTABLE["SysCatalog Table 0"]
        RECORDBATCH["RecordBatch\nwith MVCC columns"]
        COLUMNCHUNKS["Column Chunks\nin ColumnStore"]
    end

    USERTABLE --> TABLEMETA
    USERTABLE --> COLMETA1
    USERTABLE --> COLMETA2
    TABLEMETA --> RECORDBATCH
    COLMETA1 --> RECORDBATCH
    COLMETA2 --> RECORDBATCH
    RECORDBATCH --> SYSCATTABLE
    SYSCATTABLE --> COLUMNCHUNKS
```

Both metadata types include MVCC columns (row_id, created_by, deleted_by) to support transactional schema changes and time-travel queries over metadata history.
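The two record shapes can be sketched as plain Rust structs. Field names follow the lists above, with the serialized Arrow schema and type fields represented as strings for illustration; the actual definitions in llkv-table may differ:

```rust
// Hypothetical sketch of the two catalog record types; field names follow
// this page, but the real LLKV definitions may differ.
#[derive(Debug, Clone)]
struct TableMeta {
    table_id: u32,
    namespace_id: u32,
    table_name: String,
    schema_json: String, // serialized Arrow Schema
    row_count: u64,      // approximate, for query planning
    created_at: u64,
}

#[derive(Debug, Clone)]
struct ColMeta {
    table_id: u32, // parent table reference
    field_id: u32,
    field_name: String,
    data_type: String, // serialized Arrow DataType
    nullable: bool,
    metadata: Vec<(String, String)>, // extended key-value properties
}
```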

Sources : llkv-table/README.md:27-29 llkv-column-map/README.md:13-16

Catalog Manager

The CatalogManager provides the high-level API for catalog operations and coordinates between the SQL layer, runtime, and storage. Key responsibilities include:

  • Table lifecycle : Create, drop, rename, and truncate operations
  • Schema queries : Resolve table names to table IDs and field names to field IDs
  • Type management : Register and resolve custom type aliases
  • Namespace isolation : Maintain separate table namespaces for user data and temporary objects
  • Identifier resolution : Translate qualified names (schema.table.column) into physical identifiers

```mermaid
graph TB
    subgraph "Catalog Manager Responsibilities"
        LIFECYCLE["Table Lifecycle\ncreate/drop/rename"]
        SCHEMAQUERY["Schema Queries\nname→id resolution"]
        TYPEMGMT["Type Management\ncustom types/aliases"]
        NAMESPACES["Namespace Isolation\nuser vs temporary"]
    end

    subgraph "Core Components"
        CATMGR["CatalogManager"]
        CACHE["In-Memory Cache\nTableMeta/ColMeta"]
        TYPEREG["TypeRegistry"]
    end

    subgraph "Persistence"
        SYSCAT["SysCatalog"]
        APPENDPATH["Arrow Append Path"]
    end

    LIFECYCLE --> CATMGR
    SCHEMAQUERY --> CATMGR
    TYPEMGMT --> CATMGR
    NAMESPACES --> CATMGR
    CATMGR --> CACHE
    CATMGR --> TYPEREG
    CATMGR --> SYSCAT
    SYSCAT --> APPENDPATH
```

The manager maintains an in-memory cache of metadata loaded from table 0 on startup and synchronizes changes back through the standard table append path.
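A minimal sketch of that cache, assuming a simple name-to-id map populated from table 0 at startup (the `CatalogCache` name and shape are illustrative, not the actual CatalogManager internals):

```rust
use std::collections::HashMap;

// Illustrative in-memory catalog cache: table name -> table id.
// Loaded from table 0 on startup; updated on DDL through the append path.
struct CatalogCache {
    name_to_id: HashMap<String, u32>,
}

impl CatalogCache {
    fn new() -> Self {
        Self { name_to_id: HashMap::new() }
    }

    // Record a committed table so later lookups avoid a table-0 scan.
    fn register(&mut self, name: &str, id: u32) {
        self.name_to_id.insert(name.to_string(), id);
    }

    // Resolve a user-visible table name to its numeric table id.
    fn resolve(&self, name: &str) -> Option<u32> {
        self.name_to_id.get(name).copied()
    }
}
```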

Sources : llkv-runtime/README.md:37-40

Identifier Resolution

LLKV uses a multi-stage identifier resolution process to translate SQL names into physical storage keys:

Resolution Pipeline

  1. String names (Expr<String>): SQL parser produces expressions with bare column names
  2. Qualified resolution (IdentifierResolver): Resolve names to specific tables considering scope and aliases
  3. Field IDs (Expr<FieldId>): Convert to numeric field identifiers for execution
  4. Logical field IDs (LogicalFieldId): Add namespace and table context for storage lookup
  5. Physical keys (PhysicalKey): Map to actual pager keys for column chunks
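The staged types in this pipeline can be sketched as newtypes, with stage 2 shown as a lookup against the columns in scope (concrete layouts here are assumptions, not LLKV's actual definitions):

```rust
// Hedged sketch of the staged identifier types in the pipeline above;
// the concrete layouts are assumptions, not LLKV's actual definitions.
#[derive(Debug, PartialEq)]
struct FieldId(u32);

#[allow(dead_code)]
struct LogicalFieldId {
    namespace: u16,
    table: u32,
    field: u32,
}

#[allow(dead_code)]
struct PhysicalKey(u64);

// Stage 2: resolve a bare column name against the columns in scope.
fn resolve_name(name: &str, columns: &[(&str, u32)]) -> Option<FieldId> {
    columns.iter().find(|(n, _)| *n == name).map(|(_, id)| FieldId(*id))
}
```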

Sources : llkv-table/README.md:36-40 llkv-sql/src/sql_engine.rs:36

```mermaid
graph LR
    SQL["SQL String\n'SELECT name\nFROM users'"]
    EXPRSTR["Expr<String>\nfield='name'"]
    RESOLUTION["IdentifierResolver\ncontext + scope"]
    EXPRFID["Expr<FieldId>\ntable_id=5\nfield_id=1"]
    LOGICALFID["LogicalFieldId\nnamespace=0\ntable=5\nfield=1"]
    PHYSKEY["PhysicalKey\nkey=0x1234"]

    SQL --> EXPRSTR
    EXPRSTR --> RESOLUTION
    RESOLUTION --> EXPRFID
    EXPRFID --> LOGICALFID
    LOGICALFID --> PHYSKEY
```

Identifier Context

The IdentifierContext structure tracks available tables and columns within a query scope:

  • Tracks visible tables and their aliases
  • Maintains column availability for each table
  • Handles nested contexts for subqueries
  • Supports correlated column references across scope boundaries

The IdentifierResolver consults the catalog manager to build these contexts during query planning.
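A sketch of how such nested scopes might chain, with a child context deferring to its parent for correlated references (the structure and names here are assumptions, not the actual IdentifierContext API):

```rust
// Hypothetical nested identifier scope: a subquery's context holds its own
// visible columns and falls back to the enclosing scope for correlated refs.
struct IdentifierContext<'a> {
    columns: Vec<(String, u32)>, // (name, field_id) visible in this scope
    parent: Option<&'a IdentifierContext<'a>>,
}

impl<'a> IdentifierContext<'a> {
    // Look up a column in this scope, then recursively in enclosing scopes.
    fn lookup(&self, name: &str) -> Option<u32> {
        self.columns
            .iter()
            .find(|(n, _)| n == name)
            .map(|(_, id)| *id)
            .or_else(|| self.parent.and_then(|p| p.lookup(name)))
    }
}
```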

Sources : llkv-sql/src/sql_engine.rs:36

Catalog Operations

CREATE TABLE Flow

When a CREATE TABLE statement executes, the following sequence occurs:

  1. The runtime allocates a fresh table_id within the target namespace
  2. TableMeta and ColMeta rows describing the new table are constructed, stamped with the creating transaction in their created_by column
  3. The rows are appended to table 0 (SysCatalog) through the standard Arrow append path, making the schema change atomic
  4. The CatalogManager updates its in-memory cache; under MVCC, the new table becomes visible to other transactions only after commit

Sources : llkv-runtime/README.md:13-18 llkv-table/README.md:27-29

DROP TABLE Flow

Table deletion is implemented as a soft delete using MVCC:

  1. Mark the TableMeta row as deleted by setting deleted_by to the current transaction ID
  2. Mark all associated ColMeta rows as deleted
  3. The table's data remains physically present but invisible to queries observing later snapshots
  4. Background garbage collection can eventually reclaim space from dropped tables

This approach ensures that in-flight transactions using earlier snapshots can still access the table definition.
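The visibility rule behind this soft delete can be sketched as a single predicate over the MVCC columns (a simplification: real snapshot checks also consult commit watermarks):

```rust
// A catalog row, reduced to its MVCC columns.
struct MetaRow {
    created_by: u64,         // transaction that wrote the row
    deleted_by: Option<u64>, // transaction that dropped it, if any
}

// A row is visible to a snapshot iff it was created at or before the
// snapshot and any delete happened strictly after it.
fn visible(row: &MetaRow, snapshot_txn: u64) -> bool {
    row.created_by <= snapshot_txn
        && row.deleted_by.map_or(true, |d| d > snapshot_txn)
}
```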

Sources : llkv-table/README.md:32-34

Type Registry

The TypeRegistry manages custom type aliases created with CREATE DOMAIN (or CREATE TYPE in DuckDB dialect):

Type Alias Storage

  • Type definitions are stored alongside other metadata in the catalog
  • Aliases map user-defined names to base Arrow DataType instances
  • Type resolution occurs during expression planning and column definition
  • Nested type references are recursively resolved

Type Resolution Process

When a column is defined with a custom type:

  1. Parser produces type name as string
  2. TypeRegistry resolves name to base DataType
  3. Column is stored with resolved base type
  4. Type alias is preserved in ColMeta metadata for introspection
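Steps 1-2 amount to following alias chains in the registry until a base type is reached. A sketch, assuming the registry is a flat name-to-name map and alias chains are acyclic (the real TypeRegistry resolves to Arrow DataType values rather than strings):

```rust
use std::collections::HashMap;

// Follow alias chains until the name is no longer an alias; the remaining
// name is treated as a base type. Assumes acyclic alias definitions.
fn resolve_type(registry: &HashMap<String, String>, name: &str) -> String {
    match registry.get(name) {
        Some(next) => resolve_type(registry, next), // nested alias: recurse
        None => name.to_string(),                   // base type reached
    }
}
```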

Sources : llkv-sql/src/sql_engine.rs:639-657

Namespace Management

LLKV supports multiple namespaces to isolate different categories of tables:

| Namespace ID | Purpose | Lifetime | Storage |
|---|---|---|---|
| 0 (default) | User tables | Persistent | Main pager |
| 1 (temporary) | Temporary tables, staging | Transaction scope | MemPager |
| 2+ (custom) | Reserved for future use | Varies | Configurable |

```mermaid
graph TB
    subgraph "Persistent Namespace (0)"
        USERTBL1["users table"]
        USERTBL2["orders table"]
        SYSCAT["SysCatalog\n(table 0)"]
    end

    subgraph "Temporary Namespace (1)"
        TEMPTBL1["#temp_results"]
        TEMPTBL2["#staging_data"]
    end

    subgraph "Storage Backends"
        MAINPAGER["BoxedPager\n(persistent)"]
        MEMPAGER["MemPager\n(in-memory)"]
    end

    USERTBL1 --> MAINPAGER
    USERTBL2 --> MAINPAGER
    SYSCAT --> MAINPAGER
    TEMPTBL1 --> MEMPAGER
    TEMPTBL2 --> MEMPAGER
```

The TEMPORARY_NAMESPACE_ID constant identifies ephemeral tables created within transactions that should not persist beyond commit or rollback.
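Routing by namespace can be sketched as below; the constant values are assumptions read off the table above, so check llkv-runtime for the authoritative definitions:

```rust
// Namespace constants as implied by the table above (values are assumptions
// drawn from this page, not verified against llkv-runtime).
const DEFAULT_NAMESPACE_ID: u16 = 0;
const TEMPORARY_NAMESPACE_ID: u16 = 1;

// Route a table to a storage backend by namespace: temporary tables go to
// the in-memory MemPager, everything else to the persistent pager.
fn backend_for(namespace_id: u16) -> &'static str {
    if namespace_id == TEMPORARY_NAMESPACE_ID {
        "MemPager"
    } else {
        "BoxedPager"
    }
}
```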

Sources : llkv-runtime/README.md:26-32 llkv-sql/src/sql_engine.rs:26

Catalog Bootstrap

The system catalog faces a bootstrapping challenge: table 0 stores metadata for all tables, including itself. LLKV solves this with a two-phase initialization:

Phase 1: Hardcoded Schema

On first startup, the ColumnStore initializes with an empty catalog. When the runtime creates table 0, it uses a hardcoded schema definition for SysCatalog that includes the minimal fields needed to store TableMeta and ColMeta:

  • table_id (UInt32)
  • table_name (Utf8)
  • field_id (UInt32)
  • field_name (Utf8)
  • data_type (Utf8, serialized)
  • Standard MVCC columns

Phase 2: Self-Description

Once table 0 exists, the runtime appends metadata describing table 0 itself into table 0. Subsequent startups load the catalog by scanning table 0 using the hardcoded schema, then validate that the self-description matches.

This bootstrap approach ensures that:

  • No external metadata files are required
  • Catalog schema can evolve through standard migration paths
  • The system remains self-contained within a single pager instance

Sources : llkv-column-map/README.md:36-40

Integration with Storage Layer

The catalog leverages the same storage infrastructure as user data:

Column Store Interaction

  • LogicalFieldId encodes (namespace_id, table_id, field_id) to uniquely identify columns across all tables
  • The ColumnStore maintains a mapping from LogicalFieldId to PhysicalKey
  • Catalog queries fetch metadata by scanning table 0 using standard ColumnStream APIs
  • Metadata mutations append RecordBatches through ColumnStore::append, ensuring ACID properties
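One plausible bit-packing for LogicalFieldId, assuming 16 bits of namespace, 32 bits of table, and 16 bits of field; the widths are illustrative, and the real encoding in llkv-column-map may differ:

```rust
// Hypothetical 64-bit layout: [namespace:16 | table:32 | field:16].
fn encode(namespace_id: u16, table_id: u32, field_id: u16) -> u64 {
    ((namespace_id as u64) << 48) | ((table_id as u64) << 16) | field_id as u64
}

// Inverse of encode: recover the three components from the packed key.
fn decode(key: u64) -> (u16, u32, u16) {
    (
        (key >> 48) as u16,
        ((key >> 16) & 0xFFFF_FFFF) as u32,
        (key & 0xFFFF) as u16,
    )
}
```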

MVCC for Metadata

Schema changes are transactional:

  • CREATE TABLE within a transaction remains invisible to other transactions until commit
  • DROP TABLE marks metadata as deleted without immediate physical removal
  • Concurrent transactions see consistent snapshots of the schema based on their transaction IDs
  • Schema conflicts (e.g., duplicate table names) are detected during commit watermark advancement

Sources : llkv-column-map/README.md:19-29 llkv-table/README.md:32-34

Catalog Consistency

Several mechanisms ensure catalog consistency across failures and concurrent access:

Atomic Metadata Updates

All catalog changes (create, drop, alter) execute as atomic append operations. The ColumnStore::append method ensures either all metadata rows are written or none are, preventing partial schema states.

Conflict Detection

On transaction commit, the runtime validates that:

  • No conflicting table names exist in the target namespace
  • Referenced tables for foreign keys still exist
  • Column types remain compatible with constraints

If conflicts are detected, the commit fails and the transaction rolls back, discarding staged metadata.
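The duplicate-name part of this check can be sketched as an intersection between committed names in the namespace and the transaction's staged creations (function name and shape are illustrative):

```rust
use std::collections::HashSet;

// Commit-time name conflict check: a staged CREATE TABLE fails if the name
// already exists among committed tables in the target namespace.
fn detect_conflicts(committed: &HashSet<String>, staged: &[String]) -> Vec<String> {
    staged
        .iter()
        .filter(|name| committed.contains(*name))
        .cloned()
        .collect()
}
```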

Recovery After Crash

Since metadata uses the same MVCC append path as data:

  • Uncommitted metadata changes (transactions that never committed) remain invisible
  • The catalog reflects the last successfully committed snapshot
  • No separate recovery log or checkpoint is required for metadata

Sources : llkv-runtime/README.md:20-24

Performance Considerations

Metadata Caching

The CatalogManager caches frequently accessed metadata in memory:

  • Table name → table ID mappings
  • Table ID → schema mappings
  • Field name → field ID mappings per table
  • Custom type definitions

Cache invalidation occurs on:

  • Explicit DDL operations (CREATE, DROP, ALTER)
  • Transaction commit with staged schema changes
  • Cross-session schema modifications (future: requires catalog versioning)

Scan Optimization

Metadata scans leverage the same optimizations as user data:

  • Predicate pushdown to filter by table_id or field_id
  • Projection to fetch only required columns
  • MVCC filtering to skip deleted entries

For common operations like "lookup table by name", the catalog manager maintains auxiliary indexes in memory to avoid full scans.

Sources : llkv-table/README.md:23-24