This documentation is part of the "Projects with Books" initiative at zenOSmosis.
The source code for this project is available on GitHub.
CatalogManager API
Purpose and Scope
This page documents the CatalogManager API, which provides the primary interface for table lifecycle management in LLKV. The CatalogManager coordinates table creation, modification, deletion, and schema management operations. It serves as the bridge between high-level DDL operations and the low-level storage of metadata in the system catalog.
For details on how metadata is physically stored, see System Catalog and SysCatalog. For information about custom type definitions, see Custom Types and Type Registry.
Overview
The CatalogManager is the central coordinator for all catalog operations in LLKV. It manages:
- Table ID allocation : Assigns unique identifiers to new tables
- Schema registration : Validates and stores table schemas with Arrow integration
- Field ID mapping : Assigns logical field IDs to columns and maintains resolution
- Index registration : Tracks single-column and multi-column index metadata
- Metadata snapshots : Provides consistent views of catalog state
- DDL coordination : Orchestrates CREATE/ALTER/DROP operations
```mermaid
graph TB
subgraph "High-Level Operations"
DDL["DDL Statements\n(CREATE/ALTER/DROP)"]
QUERY["Query Planning\n(Name Resolution)"]
EXEC["Query Execution\n(Schema Access)"]
end
subgraph "CatalogManager Layer"
CM["CatalogManager"]
SNAPSHOT["TableCatalogSnapshot"]
RESOLVER["FieldResolver"]
RESULT["CreateTableResult"]
end
subgraph "Metadata Structures"
TABLEMETA["TableMeta"]
COLMETA["ColMeta"]
INDEXMETA["Index Descriptors"]
SCHEMA["Arrow Schema"]
end
subgraph "Persistence Layer"
SYSCAT["SysCatalog\n(Table 0)"]
STORE["ColumnStore"]
end
DDL --> CM
QUERY --> CM
EXEC --> CM
CM --> SNAPSHOT
CM --> RESOLVER
CM --> RESULT
CM --> TABLEMETA
CM --> COLMETA
CM --> INDEXMETA
CM --> SCHEMA
TABLEMETA --> SYSCAT
COLMETA --> SYSCAT
INDEXMETA --> SYSCAT
SYSCAT --> STORE
```
The CatalogManager maintains an in-memory cache of catalog metadata for performance while delegating persistence to SysCatalog (table 0).
Sources: llkv-table/src/lib.rs:1-98
Core Types
CatalogManager
The CatalogManager struct is the primary API for catalog operations. It is implemented in the catalog module and re-exported from the crate's public interface.
Key Responsibilities:
- Maintains in-memory catalog cache
- Allocates table and field IDs
- Validates schema changes
- Coordinates with SysCatalog for persistence
- Provides snapshot isolation for metadata reads
CreateTableResult
Returned by table creation operations, this structure contains:
- The newly assigned TableId
- The created Table instance
- Initial field ID mappings
- Registration confirmation
TableCatalogSnapshot
A consistent, immutable view of catalog metadata at a specific point in time. Used to ensure that query planning and execution see a stable view of table schemas even as concurrent DDL operations occur.
Properties:
- Immutable after creation
- Contains table and column metadata
- Includes field ID mappings
- May include index registrations
FieldResolver
Provides mapping from string column names to FieldId identifiers. This is critical for translating SQL column references into the internal field ID system used by the storage layer.
Functionality:
- Resolves qualified names (e.g., table.column)
- Handles field aliases
- Supports case-insensitive lookups (depending on configuration)
- Validates field existence
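The qualified-name, case-folding, and existence checks listed above can be sketched with a single-table resolver. This is a minimal illustration, assuming a u32 FieldId and ASCII case folding; QualifiedResolver and its methods are hypothetical and are not the real FieldResolver API.

```rust
use std::collections::HashMap;

// Stand-in alias; LLKV's real FieldId type may differ.
type FieldId = u32;

/// Hypothetical single-table resolver (not the real FieldResolver API).
struct QualifiedResolver {
    table_name: String,
    case_insensitive: bool,
    by_name: HashMap<String, FieldId>,
}

impl QualifiedResolver {
    fn new(table_name: &str, columns: &[(&str, FieldId)], case_insensitive: bool) -> Self {
        // Normalize names once at construction so lookups are a single hash probe.
        let norm = |s: &str| if case_insensitive { s.to_ascii_lowercase() } else { s.to_string() };
        let by_name = columns.iter().map(|&(n, id)| (norm(n), id)).collect();
        QualifiedResolver { table_name: norm(table_name), case_insensitive, by_name }
    }

    fn normalize(&self, s: &str) -> String {
        if self.case_insensitive { s.to_ascii_lowercase() } else { s.to_string() }
    }

    /// Accepts `column` or `table.column`; a qualifier must name this table.
    fn resolve(&self, reference: &str) -> Result<FieldId, String> {
        let column = match reference.split_once('.') {
            Some((table, column)) if self.normalize(table) == self.table_name => column,
            Some((table, _)) => return Err(format!("unknown table qualifier: {table}")),
            None => reference,
        };
        // Existence check: unknown columns are reported rather than ignored.
        self.by_name
            .get(&self.normalize(column))
            .copied()
            .ok_or_else(|| format!("unknown column: {column}"))
    }
}

fn main() {
    let r = QualifiedResolver::new("users", &[("id", 1), ("Name", 2)], true);
    println!("{:?}", r.resolve("USERS.name")); // Ok(2)
}
```

Whether the real resolver folds case is configuration-dependent according to the list above; the flag here only models that choice.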
Sources: llkv-table/src/lib.rs:54-55 llkv-table/src/lib.rs:79-80
Table Lifecycle Operations
CREATE TABLE
The CatalogManager handles table creation through a multi-step process:
- Validation : Checks table name uniqueness and schema validity
- ID Allocation : Assigns a new TableId from the available range
- Field ID Assignment : Maps each column to a unique FieldId
- Schema Storage : Registers Arrow schema with type information
- Metadata Persistence : Writes TableMeta and ColMeta to system catalog
- Table Instantiation : Creates Table instance backed by ColumnStore
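The creation steps above can be modeled with a toy in-memory catalog. This is a sketch under stated assumptions: ToyCatalog is hypothetical, column names stand in for an Arrow schema, a Vec<String> stands in for SysCatalog writes, and this CreateTableResult omits the Table handle the real one carries.

```rust
use std::collections::{HashMap, HashSet};

type TableId = u32; // stand-in alias
type FieldId = u32; // stand-in alias

/// Simplified result; the real CreateTableResult also carries a Table instance.
#[derive(Debug)]
struct CreateTableResult {
    table_id: TableId,
    field_mappings: HashMap<String, FieldId>,
}

/// Hypothetical catalog used only to illustrate the step ordering.
#[derive(Default)]
struct ToyCatalog {
    next_table_id: TableId,
    tables: HashMap<String, TableId>, // in-memory cache: name -> id
    persisted: Vec<String>,           // stands in for SysCatalog records
}

impl ToyCatalog {
    fn create_table(&mut self, name: &str, columns: &[&str]) -> Result<CreateTableResult, String> {
        // 1. Validation runs before any state changes.
        if self.tables.contains_key(name) {
            return Err(format!("table {name} already exists"));
        }
        if columns.is_empty() {
            return Err("schema must have at least one column".into());
        }
        let mut seen = HashSet::new();
        if let Some(dup) = columns.iter().find(|c| !seen.insert(**c)) {
            return Err(format!("duplicate column {dup}"));
        }
        // 2. ID allocation (0 is the system catalog, so user IDs start at 1).
        self.next_table_id += 1;
        let table_id = self.next_table_id;
        // 3. Field ID assignment: one ID per column, in declaration order.
        let field_mappings: HashMap<String, FieldId> = columns
            .iter()
            .enumerate()
            .map(|(i, c)| (c.to_string(), i as FieldId + 1))
            .collect();
        // 4-5. Persist table and column metadata (simulated).
        self.persisted.push(format!("TableMeta({table_id}, {name})"));
        for (col, fid) in &field_mappings {
            self.persisted.push(format!("ColMeta({table_id}, {fid}, {col})"));
        }
        // 6. Update the cache only after persistence.
        self.tables.insert(name.to_string(), table_id);
        Ok(CreateTableResult { table_id, field_mappings })
    }
}

fn main() {
    let mut catalog = ToyCatalog::default();
    let result = catalog.create_table("users", &["id", "name"]).expect("create users");
    println!("table {} -> {:?}", result.table_id, result.field_mappings);
}
```

Running every validation before the first mutation means a rejected CREATE leaves no partial metadata behind.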
Table ID Ranges:
| Range | Purpose | Constant |
|---|---|---|
| 0 | System Catalog | CATALOG_TABLE_ID |
| 1-999 | User Tables | - |
| 1000-9999 | Information Schema | INFORMATION_SCHEMA_TABLE_ID_START |
| 10000+ | Temporary Tables | TEMPORARY_TABLE_ID_START |
The CatalogManager ensures IDs are allocated from the appropriate range based on table type.
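Range-based allocation can be sketched as one cursor per range. The boundary values below are copied from the table above; the actual values of CATALOG_TABLE_ID, INFORMATION_SCHEMA_TABLE_ID_START and TEMPORARY_TABLE_ID_START in llkv-table are assumed, and IdAllocator, TableKind and classify are hypothetical.

```rust
// Stand-in alias; LLKV's real TableId type may differ.
type TableId = u32;

// Boundaries copied from the table above; the real constant values are assumed.
const CATALOG_TABLE_ID: TableId = 0;
const INFORMATION_SCHEMA_TABLE_ID_START: TableId = 1_000;
const TEMPORARY_TABLE_ID_START: TableId = 10_000;

#[derive(Clone, Copy, Debug, PartialEq)]
enum TableKind {
    User,
    InformationSchema,
    Temporary,
}

/// Hypothetical allocator keeping one cursor per ID range.
struct IdAllocator {
    next_user: TableId,
    next_info: TableId,
    next_temp: TableId,
}

impl IdAllocator {
    fn new() -> Self {
        IdAllocator {
            next_user: CATALOG_TABLE_ID + 1, // ID 0 is reserved for the system catalog
            next_info: INFORMATION_SCHEMA_TABLE_ID_START,
            next_temp: TEMPORARY_TABLE_ID_START,
        }
    }

    /// Hands out the next free ID in the range that matches `kind`.
    fn allocate(&mut self, kind: TableKind) -> Result<TableId, String> {
        let (cursor, end) = match kind {
            TableKind::User => (&mut self.next_user, INFORMATION_SCHEMA_TABLE_ID_START),
            TableKind::InformationSchema => (&mut self.next_info, TEMPORARY_TABLE_ID_START),
            TableKind::Temporary => (&mut self.next_temp, TableId::MAX),
        };
        if *cursor >= end {
            return Err(format!("{kind:?} table ID range exhausted"));
        }
        let id = *cursor;
        *cursor += 1;
        Ok(id)
    }
}

/// Maps an ID back to its range; `None` means the system catalog itself.
fn classify(id: TableId) -> Option<TableKind> {
    if id == CATALOG_TABLE_ID {
        None
    } else if id < INFORMATION_SCHEMA_TABLE_ID_START {
        Some(TableKind::User)
    } else if id < TEMPORARY_TABLE_ID_START {
        Some(TableKind::InformationSchema)
    } else {
        Some(TableKind::Temporary)
    }
}

fn main() {
    let mut ids = IdAllocator::new();
    let user = ids.allocate(TableKind::User).unwrap();
    let temp = ids.allocate(TableKind::Temporary).unwrap();
    println!("{user} {temp}"); // 1 10000
    assert_eq!(classify(temp), Some(TableKind::Temporary));
}
```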
DROP TABLE
Table deletion involves:
- Dependency Checking : Validates no foreign keys reference the table
- Metadata Removal : Deletes entries from system catalog
- Storage Cleanup : Marks column store data for cleanup (may be deferred)
- Cache Invalidation : Removes table from in-memory cache
The operation is typically transactional: either all steps succeed or the table remains in place.
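The DROP sequence above can be sketched so that every check runs before any mutation, which is what makes a failed drop leave the catalog untouched. DropCatalog, its foreign-key edge set and the deferred-cleanup queue are hypothetical simplifications, not LLKV types.

```rust
use std::collections::{HashMap, HashSet};

type TableId = u32; // stand-in alias

/// Hypothetical catalog state relevant to DROP TABLE.
#[derive(Default)]
struct DropCatalog {
    tables: HashMap<String, TableId>,
    /// (child table, parent table) foreign-key edges.
    foreign_keys: HashSet<(TableId, TableId)>,
    /// Tables whose column-store data awaits deferred cleanup.
    pending_cleanup: Vec<TableId>,
}

impl DropCatalog {
    fn drop_table(&mut self, name: &str) -> Result<(), String> {
        let id = *self.tables.get(name).ok_or_else(|| format!("no such table: {name}"))?;
        // 1. Dependency check: refuse if another table's foreign key points here.
        if let Some((child, _)) = self.foreign_keys.iter().find(|(c, p)| *p == id && *c != id) {
            return Err(format!("table {name} is referenced by table {child}"));
        }
        // All checks passed and the steps below cannot fail, so the drop is all-or-nothing.
        // 2. Metadata removal, including foreign keys owned by this table.
        self.foreign_keys.retain(|(c, _)| *c != id);
        // 3. Storage cleanup is only queued here (deferred).
        self.pending_cleanup.push(id);
        // 4. Cache invalidation.
        self.tables.remove(name);
        Ok(())
    }
}

fn main() {
    let mut cat = DropCatalog::default();
    cat.tables.insert("parent".into(), 1);
    cat.tables.insert("child".into(), 2);
    cat.foreign_keys.insert((2, 1)); // child -> parent
    assert!(cat.drop_table("parent").is_err()); // still referenced
    cat.drop_table("child").unwrap();
    cat.drop_table("parent").unwrap();
    println!("{:?}", cat.pending_cleanup); // [2, 1]
}
```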
ALTER TABLE Operations
The CatalogManager coordinates schema modifications, though validation is delegated to specialized functions. Operations include:
- ADD COLUMN : Assigns new FieldId, updates schema
- DROP COLUMN : Validates no dependencies, marks column deleted
- ALTER COLUMN TYPE : Validates type compatibility, updates metadata
- RENAME COLUMN : Updates name mappings
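The four ALTER operations can be sketched against a per-table column list. This sketch assumes that dropped columns keep their retired FieldId and that only Int32 to Int64 widening counts as a compatible type change. TableSchema, DType and the compatibility rule are illustrative, not LLKV's validation logic.

```rust
type FieldId = u32; // stand-in alias

#[derive(Clone, Copy, Debug, PartialEq)]
enum DType {
    Int32,
    Int64,
    Utf8,
}

struct Column {
    field_id: FieldId,
    name: String,
    dtype: DType,
    dropped: bool, // DROP COLUMN marks rather than erases
}

/// Hypothetical per-table schema state.
struct TableSchema {
    columns: Vec<Column>,
    next_field_id: FieldId,
}

impl TableSchema {
    fn live_column(&mut self, name: &str) -> Result<&mut Column, String> {
        self.columns
            .iter_mut()
            .find(|c| !c.dropped && c.name == name)
            .ok_or_else(|| format!("no such column: {name}"))
    }

    fn add_column(&mut self, name: &str, dtype: DType) -> Result<FieldId, String> {
        if self.live_column(name).is_ok() {
            return Err(format!("column {name} already exists"));
        }
        let field_id = self.next_field_id; // new IDs only; retired IDs are never reused
        self.next_field_id += 1;
        self.columns.push(Column { field_id, name: name.to_string(), dtype, dropped: false });
        Ok(field_id)
    }

    fn drop_column(&mut self, name: &str) -> Result<(), String> {
        self.live_column(name)?.dropped = true;
        Ok(())
    }

    fn rename_column(&mut self, old: &str, new: &str) -> Result<(), String> {
        if self.live_column(new).is_ok() {
            return Err(format!("column {new} already exists"));
        }
        self.live_column(old)?.name = new.to_string(); // FieldId is untouched
        Ok(())
    }

    fn alter_type(&mut self, name: &str, to: DType) -> Result<(), String> {
        let col = self.live_column(name)?;
        // Assumed rule: identical types or Int32 -> Int64 widening only.
        let compatible = col.dtype == to || matches!((col.dtype, to), (DType::Int32, DType::Int64));
        if !compatible {
            return Err(format!("cannot convert {:?} to {:?}", col.dtype, to));
        }
        col.dtype = to;
        Ok(())
    }
}

fn main() {
    let mut schema = TableSchema { columns: Vec::new(), next_field_id: 1 };
    let id = schema.add_column("a", DType::Int32).unwrap();
    schema.rename_column("a", "b").unwrap();
    schema.alter_type("b", DType::Int64).unwrap();
    assert_eq!(schema.live_column("b").unwrap().field_id, id); // rename kept the FieldId
}
```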
```mermaid
sequenceDiagram
participant Client
participant CM as CatalogManager
participant Validator
participant SysCat as SysCatalog
participant Store as ColumnStore
Client->>CM: create_table(name, schema)
CM->>CM: Allocate TableId
CM->>CM: Assign FieldIds
CM->>Validator: Validate schema
Validator-->>CM: OK
CM->>SysCat: Write TableMeta
CM->>SysCat: Write ColMeta records
SysCat-->>CM: Persisted
CM->>Store: Initialize ColumnStore
Store-->>CM: Table handle
CM->>CM: Update cache
CM-->>Client: CreateTableResult
```
The validate_alter_table_operation function (referenced in exports) performs constraint checks before modifications are committed.
Sources: llkv-table/src/lib.rs:22-27 llkv-table/src/lib.rs:76-78
Schema and Field Management
Field ID Assignment
Every column in LLKV is assigned a unique FieldId at creation time. This numeric identifier:
- Persists across schema changes : Remains stable even if column is renamed
- Enables versioning : Different table versions can reference same field ID
- Optimizes storage : Physical storage keys use field IDs, not string names
- Supports MVCC : System columns like created_by have reserved field IDs
The CatalogManager maintains a monotonic counter per table to allocate field IDs sequentially.
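A per-table monotonic counter that skips reserved system IDs can be sketched as follows. Only created_by is named in the text above; the other system column names and all reserved ID values here are hypothetical, as are FieldIdAllocator and its helpers.

```rust
use std::collections::HashMap;

type TableId = u32; // stand-in alias
type FieldId = u32; // stand-in alias

// Hypothetical reserved IDs for system/MVCC columns; LLKV's actual values may differ.
const SYSTEM_FIELDS: [(&str, FieldId); 3] = [("row_id", 0), ("created_by", 1), ("deleted_by", 2)];
const FIRST_USER_FIELD_ID: FieldId = 3;

/// One monotonically increasing counter per table.
#[derive(Default)]
struct FieldIdAllocator {
    next: HashMap<TableId, FieldId>,
}

impl FieldIdAllocator {
    fn allocate(&mut self, table: TableId) -> FieldId {
        // First user column of a table starts right after the reserved block.
        let slot = self.next.entry(table).or_insert(FIRST_USER_FIELD_ID);
        let id = *slot;
        *slot += 1;
        id
    }
}

fn is_system_field(id: FieldId) -> bool {
    id < FIRST_USER_FIELD_ID
}

fn system_field_id(name: &str) -> Option<FieldId> {
    SYSTEM_FIELDS.iter().find(|(n, _)| *n == name).map(|(_, id)| *id)
}

fn main() {
    let mut ids = FieldIdAllocator::default();
    let first = ids.allocate(1);
    let second = ids.allocate(1);
    println!("{first} {second}"); // 3 4
}
```

Because the counter never moves backwards, a dropped column's ID is never handed out again, which keeps storage keys unambiguous.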
Arrow Schema Integration
The CatalogManager integrates tightly with Apache Arrow schemas:
- Validates Arrow DataType compatibility
- Maps Arrow fields to FieldId assignments
- Stores schema metadata in serialized form
- Reconstructs Arrow Schema from stored metadata
This allows LLKV to leverage Arrow’s type system while maintaining its own field ID system for storage efficiency.
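The store-and-reconstruct round trip can be sketched without the arrow crate. Assumptions: the DataType enum below is a tiny stand-in for Arrow's, and the `field_id|type|nullable|name` line format is invented for illustration; LLKV's real serialization format is not described on this page.

```rust
type FieldId = u32; // stand-in alias

/// Simplified stand-in for Arrow's DataType (only a few variants).
#[derive(Clone, Copy, Debug, PartialEq)]
enum DataType {
    Int64,
    Float64,
    Utf8,
    Boolean,
}

#[derive(Clone, Debug, PartialEq)]
struct StoredField {
    field_id: FieldId,
    name: String,
    data_type: DataType,
    nullable: bool,
}

fn type_tag(t: DataType) -> &'static str {
    match t {
        DataType::Int64 => "int64",
        DataType::Float64 => "float64",
        DataType::Utf8 => "utf8",
        DataType::Boolean => "bool",
    }
}

fn parse_type(s: &str) -> Option<DataType> {
    match s {
        "int64" => Some(DataType::Int64),
        "float64" => Some(DataType::Float64),
        "utf8" => Some(DataType::Utf8),
        "bool" => Some(DataType::Boolean),
        _ => None,
    }
}

/// Hypothetical line format; the name goes last so it may contain '|'.
fn encode(f: &StoredField) -> String {
    format!("{}|{}|{}|{}", f.field_id, type_tag(f.data_type), f.nullable, f.name)
}

fn decode(line: &str) -> Option<StoredField> {
    let mut parts = line.splitn(4, '|');
    let field_id: FieldId = parts.next()?.parse().ok()?;
    let data_type = parse_type(parts.next()?)?;
    let nullable: bool = parts.next()?.parse().ok()?;
    let name = parts.next()?.to_string();
    Some(StoredField { field_id, name, data_type, nullable })
}

// One field per line; names containing '\n' are not supported by this sketch.
fn encode_schema(fields: &[StoredField]) -> String {
    fields.iter().map(encode).collect::<Vec<_>>().join("\n")
}

fn decode_schema(s: &str) -> Option<Vec<StoredField>> {
    s.lines().map(decode).collect()
}

fn main() {
    let fields = vec![
        StoredField { field_id: 3, name: "id".into(), data_type: DataType::Int64, nullable: false },
        StoredField { field_id: 4, name: "name".into(), data_type: DataType::Utf8, nullable: true },
    ];
    let encoded = encode_schema(&fields);
    println!("{encoded}");
    assert_eq!(decode_schema(&encoded), Some(fields));
}
```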
FieldResolver API
The FieldResolver is obtained from a TableCatalogSnapshot and provides:
```
resolve(column_name: &str) -> Result<FieldId>
resolve_qualified(table_name: &str, column_name: &str) -> Result<FieldId>
get_field_name(field_id: FieldId) -> Option<&str>
get_field_type(field_id: FieldId) -> Option<&DataType>
```
This bidirectional mapping supports both query translation (name → ID) and result formatting (ID → name).
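The bidirectional lookup can be sketched with two hash maps kept in sync. The method names mirror the signatures listed above, but BiResolver, its String error type and the two-variant DataType enum are simplifying assumptions rather than the real FieldResolver.

```rust
use std::collections::HashMap;

type FieldId = u32; // stand-in alias

/// Tiny stand-in for Arrow's DataType.
#[derive(Clone, Copy, Debug, PartialEq)]
enum DataType {
    Int64,
    Utf8,
}

/// Hypothetical resolver holding both directions of the mapping.
struct BiResolver {
    by_name: HashMap<String, FieldId>,           // query translation: name -> ID
    by_id: HashMap<FieldId, (String, DataType)>, // result formatting: ID -> name, type
}

impl BiResolver {
    fn new(cols: &[(FieldId, &str, DataType)]) -> Self {
        let mut by_name = HashMap::new();
        let mut by_id = HashMap::new();
        for &(id, name, ty) in cols {
            by_name.insert(name.to_string(), id);
            by_id.insert(id, (name.to_string(), ty));
        }
        BiResolver { by_name, by_id }
    }

    fn resolve(&self, column_name: &str) -> Result<FieldId, String> {
        self.by_name
            .get(column_name)
            .copied()
            .ok_or_else(|| format!("unknown column: {column_name}"))
    }

    fn get_field_name(&self, field_id: FieldId) -> Option<&str> {
        self.by_id.get(&field_id).map(|(n, _)| n.as_str())
    }

    fn get_field_type(&self, field_id: FieldId) -> Option<&DataType> {
        self.by_id.get(&field_id).map(|(_, t)| t)
    }
}

fn main() {
    let r = BiResolver::new(&[(1, "id", DataType::Int64), (2, "name", DataType::Utf8)]);
    let id = r.resolve("name").unwrap();
    println!("{id} -> {:?} {:?}", r.get_field_name(id), r.get_field_type(id)); // 2 -> Some("name") Some(Utf8)
}
```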
Sources: llkv-table/src/lib.rs:3-21 llkv-table/src/lib.rs:54
Index Registration
Single-Column Indexes
The SingleColumnIndexDescriptor and SingleColumnIndexRegistration types manage metadata for indexes on individual columns:
SingleColumnIndexDescriptor:
- Field ID being indexed
- Index type (e.g., BTree, Hash)
- Index-specific parameters
- Creation timestamp
SingleColumnIndexRegistration:
- Links table to index descriptor
- Tracks index state (building, ready, failed)
- Stores index metadata in system catalog
The CatalogManager maintains a registry of active indexes and coordinates their creation and maintenance.
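A registry that tracks the building, ready and failed states named above can be sketched as a small state machine keyed by (table, field). IndexRegistry, Registration, IndexKind and IndexState are hypothetical names, not the real SingleColumnIndexRegistration type.

```rust
use std::collections::HashMap;

type TableId = u32; // stand-in alias
type FieldId = u32; // stand-in alias

#[derive(Clone, Copy, Debug, PartialEq)]
enum IndexKind {
    BTree,
    Hash,
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum IndexState {
    Building,
    Ready,
    Failed,
}

#[derive(Debug)]
struct Registration {
    field_id: FieldId,
    kind: IndexKind,
    state: IndexState,
}

/// Hypothetical registry: at most one index per (table, field).
#[derive(Default)]
struct IndexRegistry {
    entries: HashMap<(TableId, FieldId), Registration>,
}

impl IndexRegistry {
    fn register(&mut self, table: TableId, field_id: FieldId, kind: IndexKind) -> Result<(), String> {
        if self.entries.contains_key(&(table, field_id)) {
            return Err(format!("field {field_id} of table {table} is already indexed"));
        }
        let reg = Registration { field_id, kind, state: IndexState::Building };
        self.entries.insert((table, field_id), reg);
        Ok(())
    }

    /// Building -> Ready on success, Building -> Failed otherwise.
    fn finish_build(&mut self, table: TableId, field_id: FieldId, success: bool) -> Result<(), String> {
        let reg = self.entries.get_mut(&(table, field_id)).ok_or("index not registered")?;
        if reg.state != IndexState::Building {
            return Err("index is not building".into());
        }
        reg.state = if success { IndexState::Ready } else { IndexState::Failed };
        Ok(())
    }

    /// Only Ready indexes are offered to the planner.
    fn usable(&self, table: TableId) -> Vec<FieldId> {
        let mut ids: Vec<FieldId> = self
            .entries
            .iter()
            .filter(|((t, _), r)| *t == table && r.state == IndexState::Ready)
            .map(|(_, r)| r.field_id)
            .collect();
        ids.sort_unstable();
        ids
    }
}

fn main() {
    let mut registry = IndexRegistry::default();
    registry.register(1, 10, IndexKind::BTree).unwrap();
    registry.finish_build(1, 10, true).unwrap();
    println!("{:?}", registry.usable(1)); // [10]
}
```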
Multi-Column Indexes
For composite indexes spanning multiple columns, the MultiColumnUniqueRegistration type (referenced in exports) provides similar functionality with support for:
- Multiple field IDs in index key
- Column ordering
- Uniqueness constraints
- Compound key generation
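Ordered compound keys and a uniqueness check can be sketched as below. Assumptions: rows are maps from FieldId to string values, and a key containing a missing value (NULL) never conflicts, following the common SQL convention; whether LLKV treats NULLs this way is not stated here. CompositeUnique is hypothetical and is not MultiColumnUniqueRegistration.

```rust
use std::collections::{HashMap, HashSet};

type FieldId = u32; // stand-in alias

/// Hypothetical composite unique constraint over ordered field IDs.
struct CompositeUnique {
    field_ids: Vec<FieldId>,            // key column order is significant
    seen: HashSet<Vec<Option<String>>>, // compound keys accepted so far
}

impl CompositeUnique {
    fn new(field_ids: Vec<FieldId>) -> Self {
        CompositeUnique { field_ids, seen: HashSet::new() }
    }

    /// Compound key = the row's values projected onto the indexed fields, in order.
    fn key(&self, row: &HashMap<FieldId, String>) -> Vec<Option<String>> {
        self.field_ids.iter().map(|f| row.get(f).cloned()).collect()
    }

    fn check_and_insert(&mut self, row: &HashMap<FieldId, String>) -> Result<(), String> {
        let key = self.key(row);
        // Assumed SQL-style semantics: a key with any NULL (missing value) never conflicts.
        if key.iter().any(Option::is_none) {
            return Ok(());
        }
        if self.seen.insert(key) {
            Ok(())
        } else {
            Err("duplicate compound key".to_string())
        }
    }
}

fn main() {
    let mut unique = CompositeUnique::new(vec![2, 1]);
    let row = HashMap::from([(1, "x".to_string()), (2, "y".to_string())]);
    assert!(unique.check_and_insert(&row).is_ok());
    assert!(unique.check_and_insert(&row).is_err()); // same (y, x) key again
}
```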
Sources: llkv-table/src/lib.rs:55 llkv-table/src/lib.rs:73
Metadata Snapshots
Snapshot Creation
A TableCatalogSnapshot provides a consistent view of catalog state. Snapshots are created:
- On Demand : When planning a query
- Periodically : For long-running operations
- At Transaction Start : For transaction isolation
The snapshot is immutable and won’t reflect concurrent DDL changes, ensuring query planning sees a stable schema.
Snapshot Contents
A snapshot typically includes:
- All TableMeta records (table definitions)
- All ColMeta records (column definitions)
- Field ID mappings for all tables
- Index registrations (optional)
- Custom type definitions (optional)
- Constraint metadata (optional)
Cache Invalidation
When the CatalogManager modifies metadata:
- Updates system catalog (table 0)
- Increments epoch/version counter
- Marks previously issued snapshots as stale (they are not modified)
- Updates in-memory cache
Existing snapshots remain valid but represent a previous version. New snapshots will reflect the changes.
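The version bump and stale-but-valid behavior can be sketched with copy-on-write snapshots behind an Arc. VersionedCatalog, Snapshot and is_stale are hypothetical; the persistence write to table 0 that precedes the bump is omitted.

```rust
use std::collections::HashMap;
use std::sync::Arc;

type TableId = u32; // stand-in alias

/// Immutable view handed to planners; cloning the Arc is cheap.
#[derive(Debug)]
struct Snapshot {
    version: u64,
    tables: HashMap<String, TableId>,
}

/// Hypothetical cache that publishes a new snapshot per DDL change.
struct VersionedCatalog {
    current: Arc<Snapshot>,
}

impl VersionedCatalog {
    fn new() -> Self {
        VersionedCatalog { current: Arc::new(Snapshot { version: 0, tables: HashMap::new() }) }
    }

    fn snapshot(&self) -> Arc<Snapshot> {
        Arc::clone(&self.current)
    }

    /// Copy-on-write: build the next state, bump the version, swap it in.
    /// Snapshots already handed out keep pointing at the old state.
    fn apply(&mut self, change: impl FnOnce(&mut HashMap<String, TableId>)) {
        let mut tables = self.current.tables.clone();
        change(&mut tables);
        self.current = Arc::new(Snapshot { version: self.current.version + 1, tables });
    }

    /// A snapshot is stale (but still readable) once a newer version exists.
    fn is_stale(&self, snapshot: &Snapshot) -> bool {
        snapshot.version != self.current.version
    }
}

fn main() {
    let mut catalog = VersionedCatalog::new();
    let before = catalog.snapshot();
    catalog.apply(|tables| {
        tables.insert("users".to_string(), 1);
    });
    let after = catalog.snapshot();
    println!("{} {}", before.version, after.version); // 0 1
    assert!(catalog.is_stale(&before) && !catalog.is_stale(&after));
}
```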
Sources: llkv-table/src/lib.rs:54
Integration with SysCatalog
The CatalogManager uses SysCatalog (documented in System Catalog and SysCatalog) as its persistence layer:
Write Operations
- CREATE TABLE : Writes TableMeta and ColMeta records
- ALTER TABLE : Updates existing metadata records
- DROP TABLE : Marks metadata as deleted
- Index Registration : Writes index descriptor records
Read Operations
- Snapshot Creation : Reads all metadata records
- Table Lookup : Queries TableMeta by name or ID
- Field Resolution : Retrieves ColMeta for a table
- Index Discovery : Loads index descriptors
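The write and read paths above, including DROP TABLE marking metadata as deleted rather than erasing it, can be sketched with tombstone records. MetaStore and Record are hypothetical; a BTreeMap stands in for the column store behind table 0.

```rust
use std::collections::BTreeMap;

type TableId = u32; // stand-in alias

#[derive(Clone, Debug, PartialEq)]
enum Record {
    TableMeta { name: String },
    Deleted, // tombstone written by DROP TABLE
}

/// Hypothetical metadata store keyed by table ID.
#[derive(Default)]
struct MetaStore {
    rows: BTreeMap<TableId, Record>,
}

impl MetaStore {
    fn put_table(&mut self, id: TableId, name: &str) {
        self.rows.insert(id, Record::TableMeta { name: name.to_string() });
    }

    /// Returns false if the table is unknown or already deleted.
    fn mark_deleted(&mut self, id: TableId) -> bool {
        let live = matches!(self.rows.get(&id), Some(Record::TableMeta { .. }));
        if live {
            self.rows.insert(id, Record::Deleted);
        }
        live
    }

    fn lookup_by_id(&self, id: TableId) -> Option<&str> {
        match self.rows.get(&id) {
            Some(Record::TableMeta { name }) => Some(name.as_str()),
            _ => None,
        }
    }

    fn lookup_by_name(&self, name: &str) -> Option<TableId> {
        self.rows.iter().find_map(|(id, r)| match r {
            Record::TableMeta { name: n } if n == name => Some(*id),
            _ => None,
        })
    }

    /// Full scan used by snapshot creation; tombstones are skipped.
    fn scan_live(&self) -> Vec<(TableId, &str)> {
        self.rows
            .iter()
            .filter_map(|(id, r)| match r {
                Record::TableMeta { name } => Some((*id, name.as_str())),
                Record::Deleted => None,
            })
            .collect()
    }
}

fn main() {
    let mut store = MetaStore::default();
    store.put_table(1, "users");
    store.put_table(2, "orders");
    store.mark_deleted(1); // DROP TABLE users
    println!("{:?}", store.scan_live()); // [(2, "orders")]
}
```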
```mermaid
sequenceDiagram
participant Runtime
participant CM as CatalogManager
participant Cache as "In-Memory Cache"
participant SC as SysCatalog
participant Store as ColumnStore
Runtime->>CM: Initialize
CM->>SC: Bootstrap table 0
SC->>Store: Initialize ColumnStore(0)
Store-->>SC: Ready
SC-->>CM: SysCatalog ready
CM->>SC: Read all TableMeta
SC->>Store: scan(TableMeta)
Store-->>SC: RecordBatch
SC-->>CM: Vec<TableMeta>
CM->>SC: Read all ColMeta
SC->>Store: scan(ColMeta)
Store-->>SC: RecordBatch
SC-->>CM: Vec<ColMeta>
CM->>Cache: Populate
Cache-->>CM: Loaded
CM-->>Runtime: CatalogManager ready
```
Bootstrapping
On system startup:
- SysCatalog initializes (creates table 0 if needed)
- CatalogManager reads all metadata from table 0
- In-memory cache is populated
- System is ready for operations
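The cache-population step of bootstrapping can be sketched as grouping scanned ColMeta rows under their TableMeta. The TableMeta and ColMeta structs here are simplified stand-ins with only the fields this sketch needs, slices stand in for the RecordBatch scans of table 0, and rejecting orphaned column records is a choice made by this sketch, not documented LLKV behavior.

```rust
use std::collections::HashMap;

type TableId = u32; // stand-in alias
type FieldId = u32; // stand-in alias

/// Simplified stand-ins for the persisted metadata records.
struct TableMeta {
    table_id: TableId,
    name: String,
}

struct ColMeta {
    table_id: TableId,
    field_id: FieldId,
    name: String,
}

#[derive(Debug)]
struct CachedTable {
    name: String,
    columns: Vec<(FieldId, String)>,
}

/// Builds the in-memory cache from full scans of table 0 (simulated as slices).
fn bootstrap(tables: &[TableMeta], cols: &[ColMeta]) -> Result<HashMap<TableId, CachedTable>, String> {
    let mut cache: HashMap<TableId, CachedTable> = tables
        .iter()
        .map(|t| (t.table_id, CachedTable { name: t.name.clone(), columns: Vec::new() }))
        .collect();
    for c in cols {
        // Sketch's choice: a column record without its table is treated as corruption.
        let entry = cache
            .get_mut(&c.table_id)
            .ok_or_else(|| format!("ColMeta references unknown table {}", c.table_id))?;
        entry.columns.push((c.field_id, c.name.clone()));
    }
    // Keep columns in field-ID order regardless of scan order.
    for t in cache.values_mut() {
        t.columns.sort_by_key(|(fid, _)| *fid);
    }
    Ok(cache)
}

fn main() {
    let tables = vec![TableMeta { table_id: 1, name: "users".into() }];
    let cols = vec![
        ColMeta { table_id: 1, field_id: 2, name: "name".into() },
        ColMeta { table_id: 1, field_id: 1, name: "id".into() },
    ];
    let cache = bootstrap(&tables, &cols).expect("consistent catalog");
    println!("{:?}", cache[&1].columns); // [(1, "id"), (2, "name")]
}
```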
Sources: llkv-table/src/lib.rs:30-31 llkv-table/src/lib.rs:81-85
Usage Patterns
Creating a Table
```rust
// Typical usage pattern (conceptual)
let result = catalog_manager.create_table(
    table_name,
    schema,        // Arrow Schema
    table_id_hint, // Optional TableId preference
)?;
let table: Table = result.table;
let table_id: TableId = result.table_id;
let field_ids: HashMap<String, FieldId> = result.field_mappings;
```
Resolving Column Names
```rust
// Get a snapshot for consistent reads
let snapshot = catalog_manager.snapshot();
// Resolve column references
let resolver = snapshot.field_resolver(table_id)?;
let field_id = resolver.resolve("column_name")?;
// Use field_id in storage operations
```
Registering an Index
```rust
// Register a single-column index
let descriptor = SingleColumnIndexDescriptor::new(
    field_id,
    IndexType::BTree,
    options,
);
catalog_manager.register_index(
    table_id,
    descriptor,
)?;
```
Checking Metadata Changes
```rust
// Capture snapshot version
let snapshot_v1 = catalog_manager.snapshot();
let version_1 = snapshot_v1.version();
// ... DDL operations occur ...
// Create new snapshot and check for changes
let snapshot_v2 = catalog_manager.snapshot();
let version_2 = snapshot_v2.version();
if version_1 != version_2 {
    // Metadata has changed, invalidate plans
}
```
Sources: llkv-table/src/lib.rs:54-89
Thread Safety and Concurrency
The CatalogManager typically uses interior mutability (e.g., RwLock or Mutex) to allow:
- Concurrent Reads : Multiple threads can read snapshots simultaneously
- Exclusive Writes : DDL operations acquire exclusive locks
- Snapshot Isolation : Snapshots remain valid even during concurrent DDL
This design allows high read concurrency while ensuring DDL operations are serialized and atomic.
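One common way to get these properties is an RwLock guarding an Arc'd map: readers hold the read lock only long enough to clone the Arc, and DDL takes the write lock, copies, modifies and swaps. This is a sketch of that pattern under the assumption that the catalog state fits in one map; SharedCatalog is hypothetical and not LLKV's actual locking scheme.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;

type TableId = u32; // stand-in alias

/// Hypothetical shared catalog handle; clones share the same state.
#[derive(Clone, Default)]
struct SharedCatalog {
    inner: Arc<RwLock<Arc<HashMap<String, TableId>>>>,
}

impl SharedCatalog {
    /// Concurrent reads: the read lock is held only while cloning the Arc.
    fn snapshot(&self) -> Arc<HashMap<String, TableId>> {
        Arc::clone(&*self.inner.read().unwrap())
    }

    /// Exclusive writes: DDL is serialized by the write lock.
    fn create_table(&self, name: &str, id: TableId) -> bool {
        let mut guard = self.inner.write().unwrap();
        if guard.contains_key(name) {
            return false;
        }
        let mut next = (**guard).clone();
        next.insert(name.to_string(), id);
        *guard = Arc::new(next); // existing snapshots keep the old map
        true
    }
}

fn main() {
    let catalog = SharedCatalog::default();
    let before = catalog.snapshot();
    let handles: Vec<_> = (0..4u32)
        .map(|i| {
            let c = catalog.clone();
            thread::spawn(move || c.create_table(&format!("t{i}"), i + 1))
        })
        .collect();
    for h in handles {
        assert!(h.join().unwrap());
    }
    assert!(before.is_empty()); // snapshot isolation: the old view is unchanged
    assert_eq!(catalog.snapshot().len(), 4);
}
```

Copying the whole map on every DDL is cheap when DDL is rare relative to reads, which matches the read-heavy workload described above.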
Related Modules
The CatalogManager coordinates with several related modules:
- catalog module: Contains the implementation (source not covered on this page)
- sys_catalog module: Persistence layer for metadata [llkv-table/src/sys_catalog.rs]
- metadata module: Extended metadata management [llkv-table/src/metadata.rs]
- ddl module: DDL-specific helpers [llkv-table/src/ddl.rs]
- resolvers module: Name resolution utilities [llkv-table/src/resolvers.rs]
- constraints module: Constraint validation [llkv-table/src/constraints.rs]
Sources: llkv-table/src/lib.rs:34-46 llkv-table/src/lib.rs:68-69 llkv-table/src/lib.rs:74 llkv-table/src/lib.rs:79