This documentation is part of the "Projects with Books" initiative at zenOSmosis.

The source code for this project is available on GitHub.

CatalogManager API

Purpose and Scope

This page documents the CatalogManager API, which provides the primary interface for table lifecycle management in LLKV. The CatalogManager coordinates table creation, modification, deletion, and schema management operations. It serves as the bridge between high-level DDL operations and the low-level storage of metadata in the system catalog.

For details on how metadata is physically stored, see System Catalog and SysCatalog. For information about custom type definitions, see Custom Types and Type Registry.

Overview

The CatalogManager is the central coordinator for all catalog operations in LLKV. It manages:

  • Table ID allocation: Assigns unique identifiers to new tables
  • Schema registration: Validates and stores table schemas with Arrow integration
  • Field ID mapping: Assigns logical field IDs to columns and maintains name resolution
  • Index registration: Tracks single-column and multi-column index metadata
  • Metadata snapshots: Provides consistent views of catalog state
  • DDL coordination: Orchestrates CREATE/ALTER/DROP operations

graph TB
    subgraph "High-Level Operations"
        DDL["DDL Statements\n(CREATE/ALTER/DROP)"]
        QUERY["Query Planning\n(Name Resolution)"]
        EXEC["Query Execution\n(Schema Access)"]
    end

    subgraph "CatalogManager Layer"
        CM["CatalogManager"]
        SNAPSHOT["TableCatalogSnapshot"]
        RESOLVER["FieldResolver"]
        RESULT["CreateTableResult"]
    end

    subgraph "Metadata Structures"
        TABLEMETA["TableMeta"]
        COLMETA["ColMeta"]
        INDEXMETA["Index Descriptors"]
        SCHEMA["Arrow Schema"]
    end

    subgraph "Persistence Layer"
        SYSCAT["SysCatalog\n(Table 0)"]
        STORE["ColumnStore"]
    end

    DDL --> CM
    QUERY --> CM
    EXEC --> CM

    CM --> SNAPSHOT
    CM --> RESOLVER
    CM --> RESULT

    CM --> TABLEMETA
    CM --> COLMETA
    CM --> INDEXMETA
    CM --> SCHEMA

    TABLEMETA --> SYSCAT
    COLMETA --> SYSCAT
    INDEXMETA --> SYSCAT

    SYSCAT --> STORE

The CatalogManager maintains an in-memory cache of catalog metadata for performance while delegating persistence to SysCatalog (table 0).

Sources: llkv-table/src/lib.rs:1-98

Core Types

CatalogManager

The CatalogManager struct is the primary API for catalog operations. Its implementation lives in the catalog module, and the type is re-exported from the crate's top-level interface.

Key Responsibilities:

  • Maintains in-memory catalog cache
  • Allocates table and field IDs
  • Validates schema changes
  • Coordinates with SysCatalog for persistence
  • Provides snapshot isolation for metadata reads

CreateTableResult

Returned by table creation operations, this structure contains:

  • The newly assigned TableId
  • The created Table instance
  • Initial field ID mappings
  • Registration confirmation

TableCatalogSnapshot

A consistent, immutable view of catalog metadata at a specific point in time. Used to ensure that query planning and execution see a stable view of table schemas even as concurrent DDL operations occur.

Properties:

  • Immutable after creation
  • Contains table and column metadata
  • Includes field ID mappings
  • May include index registrations

FieldResolver

Provides mapping from string column names to FieldId identifiers. This is critical for translating SQL column references into the internal field ID system used by the storage layer.

Functionality:

  • Resolves qualified names (e.g., table.column)
  • Handles field aliases
  • Supports case-insensitive lookups (depending on configuration)
  • Validates field existence

Sources: llkv-table/src/lib.rs:54-55 llkv-table/src/lib.rs:79-80

Table Lifecycle Operations

CREATE TABLE

The CatalogManager handles table creation through a multi-step process:

  1. Validation: Checks table name uniqueness and schema validity
  2. ID Allocation: Assigns a new TableId from the available range
  3. Field ID Assignment: Maps each column to a unique FieldId
  4. Schema Storage: Registers the Arrow schema with type information
  5. Metadata Persistence: Writes TableMeta and ColMeta to the system catalog
  6. Table Instantiation: Creates a Table instance backed by ColumnStore
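
The steps above can be illustrated with a minimal in-memory sketch. This is not the LLKV implementation: MiniCatalog, TableEntry, CatalogError, and the u32 aliases for TableId and FieldId are hypothetical stand-ins, and steps 4 through 6 are collapsed into a single map insert.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the real LLKV types; names and widths are assumptions.
type TableId = u32;
type FieldId = u32;

#[derive(Debug, PartialEq)]
enum CatalogError {
    DuplicateTable(String),
    EmptySchema,
}

#[derive(Debug)]
struct TableEntry {
    table_id: TableId,
    // Column name -> field ID, assigned at creation time.
    fields: HashMap<String, FieldId>,
}

struct MiniCatalog {
    tables: HashMap<String, TableEntry>,
    next_table_id: TableId,
}

impl MiniCatalog {
    fn new() -> Self {
        // Table 0 is reserved for the system catalog, so allocation starts at 1.
        Self { tables: HashMap::new(), next_table_id: 1 }
    }

    fn create_table(&mut self, name: &str, columns: &[&str]) -> Result<TableId, CatalogError> {
        // Step 1: validation (name uniqueness, non-empty schema).
        if self.tables.contains_key(name) {
            return Err(CatalogError::DuplicateTable(name.to_string()));
        }
        if columns.is_empty() {
            return Err(CatalogError::EmptySchema);
        }
        // Step 2: table ID allocation.
        let table_id = self.next_table_id;
        self.next_table_id += 1;
        // Step 3: sequential field ID assignment, starting at 1.
        let fields = columns
            .iter()
            .enumerate()
            .map(|(i, c)| (c.to_string(), i as FieldId + 1))
            .collect();
        // Steps 4-6 (schema storage, persistence, instantiation) are collapsed
        // into one in-memory insert in this sketch.
        self.tables.insert(name.to_string(), TableEntry { table_id, fields });
        Ok(table_id)
    }
}
```

Because validation runs before any ID is allocated, a rejected request does not consume a table ID in this sketch.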

Table ID Ranges:

Range       Purpose              Constant
0           System Catalog       CATALOG_TABLE_ID
1-999       User Tables          -
1000-9999   Information Schema   INFORMATION_SCHEMA_TABLE_ID_START
10000+      Temporary Tables     TEMPORARY_TABLE_ID_START

The CatalogManager ensures IDs are allocated from the appropriate range based on table type.
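
As a rough illustration of the range table, the sketch below classifies an ID by range. The constant names come from the table; their numeric values are read off the Range column, and the u32 alias, the TableKind enum, and classify_table_id are assumptions made for this example.

```rust
// Assumption: TableId is modeled as u32 here; the real alias may differ.
type TableId = u32;

// Values taken from the range table above.
const CATALOG_TABLE_ID: TableId = 0;
const INFORMATION_SCHEMA_TABLE_ID_START: TableId = 1000;
const TEMPORARY_TABLE_ID_START: TableId = 10_000;

// Hypothetical classification type, not part of the documented API.
#[derive(Debug, Clone, Copy, PartialEq)]
enum TableKind {
    SystemCatalog,
    User,
    InformationSchema,
    Temporary,
}

/// Maps a table ID to the kind of table its range is reserved for.
fn classify_table_id(id: TableId) -> TableKind {
    if id == CATALOG_TABLE_ID {
        TableKind::SystemCatalog
    } else if id < INFORMATION_SCHEMA_TABLE_ID_START {
        TableKind::User
    } else if id < TEMPORARY_TABLE_ID_START {
        TableKind::InformationSchema
    } else {
        TableKind::Temporary
    }
}
```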

DROP TABLE

Table deletion involves:

  1. Dependency Checking: Validates that no foreign keys reference the table
  2. Metadata Removal: Deletes entries from the system catalog
  3. Storage Cleanup: Marks column store data for cleanup (may be deferred)
  4. Cache Invalidation: Removes the table from the in-memory cache

The operation is typically transactional: either all steps succeed, or the table and its metadata remain unchanged.
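
One way to get this all-or-nothing behavior is to run every check before mutating anything. The sketch below assumes a toy catalog in which foreign keys are (referencing, referenced) table ID pairs; MiniCatalog, DropError, and the field names are hypothetical and do not mirror the LLKV types.

```rust
use std::collections::{HashMap, HashSet};

// Assumption: TableId modeled as u32 for this sketch.
type TableId = u32;

#[derive(Debug, PartialEq)]
enum DropError {
    NoSuchTable(TableId),
    // Carries the ID of a table that still references the target.
    ReferencedBy(TableId),
}

struct MiniCatalog {
    // Table ID -> table name (stands in for catalog metadata and cache).
    tables: HashMap<TableId, String>,
    // (referencing table, referenced table) pairs standing in for foreign keys.
    foreign_keys: HashSet<(TableId, TableId)>,
    // Tables whose storage awaits deferred cleanup.
    pending_cleanup: Vec<TableId>,
}

impl MiniCatalog {
    fn drop_table(&mut self, id: TableId) -> Result<(), DropError> {
        if !self.tables.contains_key(&id) {
            return Err(DropError::NoSuchTable(id));
        }
        // Step 1: dependency check. Nothing has been mutated yet, so an
        // early return leaves the catalog exactly as it was.
        let blocker = self
            .foreign_keys
            .iter()
            .find(|(child, parent)| *parent == id && *child != id);
        if let Some(&(child, _)) = blocker {
            return Err(DropError::ReferencedBy(child));
        }
        // Steps 2 and 4: remove metadata (the map doubles as the cache here).
        self.tables.remove(&id);
        self.foreign_keys.retain(|(child, _)| *child != id);
        // Step 3: defer storage cleanup.
        self.pending_cleanup.push(id);
        Ok(())
    }
}
```

Dropping a referencing table first removes its foreign keys, after which the referenced table can be dropped.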

ALTER TABLE Operations

The CatalogManager coordinates schema modifications, though validation is delegated to specialized functions. Operations include:

  • ADD COLUMN: Assigns a new FieldId, updates the schema
  • DROP COLUMN: Validates no dependencies, marks the column deleted
  • ALTER COLUMN TYPE: Validates type compatibility, updates metadata
  • RENAME COLUMN: Updates name mappings

sequenceDiagram
    participant Client
    participant CM as CatalogManager
    participant Validator
    participant SysCat as SysCatalog
    participant Store as ColumnStore

    Client->>CM: create_table(name, schema)
    CM->>CM: Allocate TableId
    CM->>CM: Assign FieldIds
    CM->>Validator: Validate schema
    Validator-->>CM: OK
    CM->>SysCat: Write TableMeta
    CM->>SysCat: Write ColMeta records
    SysCat-->>CM: Persisted
    CM->>Store: Initialize ColumnStore
    Store-->>CM: Table handle
    CM->>CM: Update cache
    CM-->>Client: CreateTableResult

The validate_alter_table_operation function (referenced in exports) performs constraint checks before modifications are committed.

Sources: llkv-table/src/lib.rs:22-27 llkv-table/src/lib.rs:76-78

Schema and Field Management

Field ID Assignment

Every column in LLKV is assigned a unique FieldId at creation time. This numeric identifier:

  • Persists across schema changes: Remains stable even if the column is renamed
  • Enables versioning: Different table versions can reference the same field ID
  • Optimizes storage: Physical storage keys use field IDs, not string names
  • Supports MVCC: System columns like created_by have reserved field IDs

The CatalogManager maintains a monotonic counter per table to allocate field IDs sequentially.
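
A per-table monotonic counter can be sketched as follows. TableFields and its methods are hypothetical names for illustration; the point is that a rename keeps the FieldId and a dropped column's ID is never handed out again.

```rust
use std::collections::HashMap;

// Assumption: FieldId modeled as u32 for this sketch.
type FieldId = u32;

/// Hypothetical per-table field registry with a monotonic ID counter.
struct TableFields {
    next_field_id: FieldId,
    by_name: HashMap<String, FieldId>,
}

impl TableFields {
    fn new() -> Self {
        Self { next_field_id: 1, by_name: HashMap::new() }
    }

    /// Allocates the next ID; returns None if the name is already taken.
    fn add_column(&mut self, name: &str) -> Option<FieldId> {
        if self.by_name.contains_key(name) {
            return None;
        }
        let id = self.next_field_id;
        self.next_field_id += 1; // The counter only moves forward.
        self.by_name.insert(name.to_string(), id);
        Some(id)
    }

    /// Re-keys the column under a new name while keeping its field ID.
    fn rename_column(&mut self, old: &str, new: &str) -> Option<FieldId> {
        if self.by_name.contains_key(new) {
            return None;
        }
        let id = self.by_name.remove(old)?;
        self.by_name.insert(new.to_string(), id);
        Some(id)
    }

    /// Removes the name mapping; the ID is retired, not recycled.
    fn drop_column(&mut self, name: &str) -> Option<FieldId> {
        self.by_name.remove(name)
    }
}
```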

Arrow Schema Integration

The CatalogManager integrates tightly with Apache Arrow schemas:

  • Validates Arrow DataType compatibility
  • Maps Arrow fields to FieldId assignments
  • Stores schema metadata in serialized form
  • Reconstructs Arrow Schema from stored metadata

This allows LLKV to leverage Arrow’s type system while maintaining its own field ID system for storage efficiency.

FieldResolver API

The FieldResolver is obtained from a TableCatalogSnapshot and provides:

fn resolve(&self, column_name: &str) -> Result<FieldId>
fn resolve_qualified(&self, table_name: &str, column_name: &str) -> Result<FieldId>
fn get_field_name(&self, field_id: FieldId) -> Option<&str>
fn get_field_type(&self, field_id: FieldId) -> Option<&DataType>

This bidirectional mapping supports both query translation (name → ID) and result formatting (ID → name).
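
A bidirectional mapping of this kind can be approximated with two hash maps. The sketch below is an assumption-laden illustration, not the real FieldResolver: MiniResolver and from_columns are hypothetical, the error type is a plain String, and case-insensitive lookup is hard-wired by lowercasing keys.

```rust
use std::collections::HashMap;

// Assumption: FieldId modeled as u32 for this sketch.
type FieldId = u32;

/// Hypothetical resolver holding both lookup directions.
struct MiniResolver {
    name_to_id: HashMap<String, FieldId>,
    id_to_name: HashMap<FieldId, String>,
}

impl MiniResolver {
    fn from_columns(columns: &[(&str, FieldId)]) -> Self {
        let mut name_to_id = HashMap::new();
        let mut id_to_name = HashMap::new();
        for &(name, id) in columns {
            // Lowercased key for case-insensitive lookup; the original
            // spelling is kept for result formatting.
            name_to_id.insert(name.to_lowercase(), id);
            id_to_name.insert(id, name.to_string());
        }
        Self { name_to_id, id_to_name }
    }

    /// Name -> ID, used during query translation.
    fn resolve(&self, column_name: &str) -> Result<FieldId, String> {
        self.name_to_id
            .get(&column_name.to_lowercase())
            .copied()
            .ok_or_else(|| format!("unknown column: {column_name}"))
    }

    /// ID -> name, used during result formatting.
    fn get_field_name(&self, field_id: FieldId) -> Option<&str> {
        self.id_to_name.get(&field_id).map(String::as_str)
    }
}
```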

Sources: llkv-table/src/lib.rs:3-21 llkv-table/src/lib.rs:54

Index Registration

Single-Column Indexes

The SingleColumnIndexDescriptor and SingleColumnIndexRegistration types manage metadata for indexes on individual columns:

SingleColumnIndexDescriptor:

  • Field ID being indexed
  • Index type (e.g., BTree, Hash)
  • Index-specific parameters
  • Creation timestamp

SingleColumnIndexRegistration:

  • Links table to index descriptor
  • Tracks index state (building, ready, failed)
  • Stores index metadata in system catalog

The CatalogManager maintains a registry of active indexes and coordinates their creation and maintenance.
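
The building, ready, and failed states mentioned above suggest a small state machine. The sketch below is one possible shape under stated assumptions: IndexState, IndexRegistration, and finish are hypothetical names and do not reflect the fields of SingleColumnIndexRegistration.

```rust
// Assumption: FieldId modeled as u32 for this sketch.
type FieldId = u32;

#[derive(Debug, Clone, Copy, PartialEq)]
enum IndexState {
    Building,
    Ready,
    Failed,
}

/// Hypothetical registration record tracking one index's lifecycle.
#[derive(Debug)]
struct IndexRegistration {
    field_id: FieldId,
    state: IndexState,
}

impl IndexRegistration {
    /// New registrations start in the Building state.
    fn new(field_id: FieldId) -> Self {
        Self { field_id, state: IndexState::Building }
    }

    /// Completes a build. Only Building may transition; any other state is
    /// returned unchanged as an error.
    fn finish(&mut self, succeeded: bool) -> Result<IndexState, IndexState> {
        let current = self.state;
        match current {
            IndexState::Building => {
                self.state = if succeeded { IndexState::Ready } else { IndexState::Failed };
                Ok(self.state)
            }
            other => Err(other),
        }
    }

    /// A planner would only consider indexes that are Ready.
    fn is_usable(&self) -> bool {
        self.state == IndexState::Ready
    }
}
```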

Multi-Column Indexes

For composite indexes spanning multiple columns, the MultiColumnUniqueRegistration type (referenced in exports) provides similar functionality with support for:

  • Multiple field IDs in index key
  • Column ordering
  • Uniqueness constraints
  • Compound key generation

Sources: llkv-table/src/lib.rs:55 llkv-table/src/lib.rs:73

Metadata Snapshots

Snapshot Creation

A TableCatalogSnapshot provides a consistent view of catalog state. Snapshots are created:

  • On Demand: When planning a query
  • Periodically: For long-running operations
  • At Transaction Start: For transaction isolation

The snapshot is immutable and won’t reflect concurrent DDL changes, ensuring query planning sees a stable schema.

Snapshot Contents

A snapshot typically includes:

  • All TableMeta records (table definitions)
  • All ColMeta records (column definitions)
  • Field ID mappings for all tables
  • Index registrations (optional)
  • Custom type definitions (optional)
  • Constraint metadata (optional)

Cache Invalidation

When the CatalogManager modifies metadata:

  1. Updates the system catalog (table 0)
  2. Increments the epoch/version counter
  3. Marks snapshots taken at earlier versions as stale
  4. Updates the in-memory cache

Existing snapshots remain readable but represent a previous version. New snapshots reflect the changes.
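
The versioning behavior can be sketched with copy-on-write snapshots behind an Arc. This is an illustration only: Snapshot, MiniCatalog, and register_table are hypothetical, and the real catalog may track versions differently.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Assumption: TableId modeled as u32 for this sketch.
type TableId = u32;

/// Immutable view of the catalog at one version.
#[derive(Debug)]
struct Snapshot {
    version: u64,
    tables: HashMap<String, TableId>,
}

struct MiniCatalog {
    current: Arc<Snapshot>,
}

impl MiniCatalog {
    fn new() -> Self {
        Self { current: Arc::new(Snapshot { version: 0, tables: HashMap::new() }) }
    }

    /// Cheap: hands out another reference to the current immutable state.
    fn snapshot(&self) -> Arc<Snapshot> {
        Arc::clone(&self.current)
    }

    /// Copy-on-write: builds a new snapshot with a bumped version and swaps
    /// it in. Snapshots handed out earlier are left untouched.
    fn register_table(&mut self, name: &str, id: TableId) {
        let mut tables = self.current.tables.clone();
        tables.insert(name.to_string(), id);
        self.current = Arc::new(Snapshot { version: self.current.version + 1, tables });
    }
}
```

Comparing the version of a held snapshot with that of a fresh one is enough to detect that cached plans may be stale.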

Sources: llkv-table/src/lib.rs:54

Integration with SysCatalog

The CatalogManager uses SysCatalog (documented in System Catalog and SysCatalog) as its persistence layer:

Write Operations

  • CREATE TABLE: Writes TableMeta and ColMeta records
  • ALTER TABLE: Updates existing metadata records
  • DROP TABLE: Marks metadata as deleted
  • Index Registration: Writes index descriptor records

Read Operations

  • Snapshot Creation: Reads all metadata records
  • Table Lookup: Queries TableMeta by name or ID
  • Field Resolution: Retrieves ColMeta for a table
  • Index Discovery: Loads index descriptors

sequenceDiagram
    participant Runtime
    participant CM as CatalogManager
    participant Cache as "In-Memory Cache"
    participant SC as SysCatalog
    participant Store as ColumnStore

    Runtime->>CM: Initialize
    CM->>SC: Bootstrap table 0
    SC->>Store: Initialize ColumnStore(0)
    Store-->>SC: Ready
    SC-->>CM: SysCatalog ready

    CM->>SC: Read all TableMeta
    SC->>Store: scan(TableMeta)
    Store-->>SC: RecordBatch
    SC-->>CM: Vec<TableMeta>

    CM->>SC: Read all ColMeta
    SC->>Store: scan(ColMeta)
    Store-->>SC: RecordBatch
    SC-->>CM: Vec<ColMeta>

    CM->>Cache: Populate
    Cache-->>CM: Loaded

    CM-->>Runtime: CatalogManager ready

Bootstrapping

On system startup:

  1. SysCatalog initializes (creates table 0 if needed)
  2. CatalogManager reads all metadata from table 0
  3. In-memory cache is populated
  4. System is ready for operations

Sources: llkv-table/src/lib.rs:30-31 llkv-table/src/lib.rs:81-85

Usage Patterns

Creating a Table

// Typical usage pattern (conceptual)
let result = catalog_manager.create_table(
    table_name,
    schema,         // Arrow Schema
    table_id_hint   // Optional TableId preference
)?;

let table: Table = result.table;
let table_id: TableId = result.table_id;
let field_ids: HashMap<String, FieldId> = result.field_mappings;

Resolving Column Names

// Get a snapshot for consistent reads
let snapshot = catalog_manager.snapshot();

// Resolve column references
let resolver = snapshot.field_resolver(table_id)?;
let field_id = resolver.resolve("column_name")?;

// Use field_id in storage operations

Registering an Index

// Register a single-column index
let descriptor = SingleColumnIndexDescriptor::new(
    field_id,
    IndexType::BTree,
    options
);

catalog_manager.register_index(
    table_id,
    descriptor
)?;

Checking Metadata Changes

// Capture snapshot version
let snapshot_v1 = catalog_manager.snapshot();
let version_1 = snapshot_v1.version();

// ... DDL operations occur ...

// Create new snapshot and check for changes
let snapshot_v2 = catalog_manager.snapshot();
let version_2 = snapshot_v2.version();

if version_1 != version_2 {
    // Metadata has changed, invalidate plans
}

Sources: llkv-table/src/lib.rs:54-89

Thread Safety and Concurrency

The CatalogManager typically uses interior mutability (e.g., RwLock or Mutex) to allow:

  • Concurrent Reads: Multiple threads can read snapshots simultaneously
  • Exclusive Writes: DDL operations acquire exclusive locks
  • Snapshot Isolation: Snapshots remain valid even during concurrent DDL

This design allows high read concurrency while ensuring DDL operations are serialized and atomic.
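
A minimal sketch of this pattern with the standard library, assuming an RwLock around an Arc-wrapped map. SharedCatalog, create_table, and count_tables_concurrently are hypothetical names, and the actual locking strategy in LLKV may differ.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;

// Assumption: TableId modeled as u32 for this sketch.
type TableId = u32;

/// Hypothetical shared catalog: readers clone an Arc under a read lock,
/// writers swap in a new map under the write lock.
#[derive(Default)]
struct SharedCatalog {
    inner: RwLock<Arc<HashMap<String, TableId>>>,
}

impl SharedCatalog {
    /// Concurrent reads: the read lock is held only long enough to clone the Arc.
    fn snapshot(&self) -> Arc<HashMap<String, TableId>> {
        let guard = self.inner.read().unwrap();
        Arc::clone(&*guard)
    }

    /// Exclusive writes: DDL is serialized by the write lock.
    fn create_table(&self, name: &str, id: TableId) {
        let mut guard = self.inner.write().unwrap();
        let mut next: HashMap<String, TableId> = (**guard).clone();
        next.insert(name.to_string(), id);
        // Older snapshots keep pointing at the previous map.
        *guard = Arc::new(next);
    }
}

/// Spawns reader threads that each take a snapshot and report its table count.
fn count_tables_concurrently(catalog: Arc<SharedCatalog>, readers: usize) -> Vec<usize> {
    let handles: Vec<_> = (0..readers)
        .map(|_| {
            let catalog = Arc::clone(&catalog);
            thread::spawn(move || catalog.snapshot().len())
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

The trade-off in this sketch is that each write copies the whole map, which is reasonable when DDL is rare relative to reads.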

The CatalogManager coordinates with several related modules:

  • catalog module: Contains the implementation (not shown in provided files)
  • sys_catalog module: Persistence layer for metadata [llkv-table/src/sys_catalog.rs]
  • metadata module: Extended metadata management [llkv-table/src/metadata.rs]
  • ddl module: DDL-specific helpers [llkv-table/src/ddl.rs]
  • resolvers module: Name resolution utilities [llkv-table/src/resolvers.rs]
  • constraints module: Constraint validation [llkv-table/src/constraints.rs]

Sources: llkv-table/src/lib.rs:34-46 llkv-table/src/lib.rs:68-69 llkv-table/src/lib.rs:74 llkv-table/src/lib.rs:79
