This documentation is part of the "Projects with Books" initiative at zenOSmosis.

The source code for this project is available on GitHub.

Catalog and Metadata Management


Purpose and Scope

This document explains the catalog and metadata infrastructure that tracks all tables, columns, indexes, constraints, and custom types in the LLKV system. The catalog provides schema information and manages the lifecycle of database objects.

For details on the table creation and management API, see CatalogManager API. For implementation details of how metadata is stored, see System Catalog and SysCatalog. For custom type definitions and aliases, see Custom Types and Type Registry.

Overview

The LLKV catalog is a self-describing system: all metadata about tables, columns, indexes, and constraints is stored as structured records in Table 0, which is itself a table managed by the same ColumnStore that manages user data. This bootstrapped design means the catalog uses the same storage primitives as regular tables.

The catalog provides:

  • Schema tracking: Table and column metadata with names and types
  • Index registry: Persisted sort indexes and multi-column indexes
  • Constraint metadata: Primary keys, foreign keys, unique constraints, and check constraints
  • Trigger definitions: Event triggers with timing and execution metadata
  • Custom type registry: User-defined type aliases

Self-Describing Architecture

Sources: llkv-table/src/lib.rs:1-98 llkv-table/src/table.rs:499-511

Metadata Types

The catalog stores several types of metadata records, each representing a different aspect of database schema and configuration.

Core Metadata Records

| Metadata Type | Description | Key Fields |
| --- | --- | --- |
| TableMeta | Table definitions | table_id, name, schema |
| ColMeta | Column definitions | col_id, name, flags, default |
| SingleColumnIndexEntryMeta | Single-column index registry | field_id, index_kind |
| MultiColumnIndexEntryMeta | Multi-column index registry | field_ids, index_kind |
| TriggerEntryMeta | Trigger definitions | trigger_id, timing, event |
| CustomTypeMeta | User-defined types | type_name, base_type |
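The Rust sketch below shows one way these record kinds could be modeled as a single tagged type, with one variant per row of the table. The field names follow the "Key Fields" column. The concrete Rust types, the single `Sort` index kind, and the `CatalogRecord` enum itself are assumptions made for illustration. They are not the llkv definitions.

```rust
/// Hypothetical index kind; the page does not enumerate the real variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum IndexKind {
    Sort,
}

#[derive(Debug, Clone, PartialEq)]
pub struct TableMeta {
    pub table_id: u16,          // assumed width
    pub name: Option<String>,
    pub schema: Option<String>, // assumed: name of the owning schema/namespace
}

#[derive(Debug, Clone, PartialEq)]
pub struct ColMeta {
    pub col_id: u32,
    pub name: Option<String>,
    pub flags: u32,
    pub default: Option<String>, // assumed: default expression kept as SQL text
}

/// One catalog row; each variant mirrors a row of the table above.
#[derive(Debug, Clone, PartialEq)]
pub enum CatalogRecord {
    Table(TableMeta),
    Column(ColMeta),
    SingleColumnIndex { field_id: u32, index_kind: IndexKind },
    MultiColumnIndex { field_ids: Vec<u32>, index_kind: IndexKind },
    Trigger { trigger_id: u32, timing: String, event: String },
    CustomType { type_name: String, base_type: String },
}

impl CatalogRecord {
    /// The metadata type name used in the table above.
    pub fn type_name(&self) -> &'static str {
        match self {
            CatalogRecord::Table(_) => "TableMeta",
            CatalogRecord::Column(_) => "ColMeta",
            CatalogRecord::SingleColumnIndex { .. } => "SingleColumnIndexEntryMeta",
            CatalogRecord::MultiColumnIndex { .. } => "MultiColumnIndexEntryMeta",
            CatalogRecord::Trigger { .. } => "TriggerEntryMeta",
            CatalogRecord::CustomType { .. } => "CustomTypeMeta",
        }
    }
}
```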

Constraint Metadata

Constraints are stored as specialized metadata records that describe the referential-integrity and data-validation rules the constraint layer enforces:

| Constraint Type | Metadata Structure | Contents |
| --- | --- | --- |
| Primary Key | PrimaryKeyConstraint | Columns forming the primary key |
| Foreign Key | ForeignKeyConstraint | Parent/child table references and actions |
| Unique | UniqueConstraint | Columns covered by the unique constraint |
| Check | CheckConstraint | Validation expression |

Sources: llkv-table/src/lib.rs:81-86 llkv-table/src/lib.rs:56-67
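The following is a minimal sketch of the four constraint records as one Rust enum. It includes two helpers that a constraint checker might use.

Several details are assumptions rather than facts from the source:

  • Columns are identified by u32 field IDs.
  • The set of referential actions is illustrative; the page only says that foreign keys carry "actions".
  • `covered_fields` and `implies_uniqueness` are invented helper names.

```rust
/// Assumed referential actions; the source only says foreign keys carry "actions".
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ReferentialAction {
    NoAction,
    Restrict,
    Cascade,
    SetNull,
}

/// Hypothetical constraint records, keyed by field IDs of the constrained table.
#[derive(Debug, Clone, PartialEq)]
pub enum Constraint {
    PrimaryKey { field_ids: Vec<u32> },
    ForeignKey {
        child_field_ids: Vec<u32>,
        parent_table_id: u16,
        parent_field_ids: Vec<u32>,
        on_delete: ReferentialAction,
        on_update: ReferentialAction,
    },
    Unique { field_ids: Vec<u32> },
    Check { expr: String },
}

impl Constraint {
    /// Field IDs of the owning (child) table that the constraint covers.
    pub fn covered_fields(&self) -> &[u32] {
        match self {
            Constraint::PrimaryKey { field_ids } | Constraint::Unique { field_ids } => {
                field_ids.as_slice()
            }
            Constraint::ForeignKey { child_field_ids, .. } => child_field_ids.as_slice(),
            // A check constraint is an expression, not a column list.
            Constraint::Check { .. } => &[],
        }
    }

    /// Primary keys and unique constraints both forbid duplicate values.
    pub fn implies_uniqueness(&self) -> bool {
        matches!(self, Constraint::PrimaryKey { .. } | Constraint::Unique { .. })
    }
}
```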

Table ID Ranges

Table IDs are partitioned into reserved ranges to distinguish system tables from user tables and temporary objects.

graph LR
    subgraph "Table ID Space"
        CATALOG["0\nSystem Catalog"]
        USER["1-999\nUser Tables"]
        INFOSCHEMA["1000+\nInformation Schema"]
        TEMP["10000+\nTemporary Tables"]
    end

    CATALOG -.special case.-> CATALOG
    USER -.normal tables.-> USER
    INFOSCHEMA -.system views.-> INFOSCHEMA
    TEMP -.session-local.-> TEMP
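Read literally, the diagram's ranges "1000+" and "10000+" overlap. The sketch below classifies an ID by checking the boundaries in ascending order. It assumes the information-schema range ends where the temporary range begins.

The following are illustrative assumptions, not llkv constants:

  • the u16 `TableId` width
  • the constant names `INFO_SCHEMA_START` and `TEMP_START`
  • the `TableIdRange` enum

```rust
/// Assumed width; the page does not state the concrete type of a table ID.
pub type TableId = u16;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TableIdRange {
    SystemCatalog,
    User,
    InformationSchema,
    Temporary,
}

// Range starts taken from the diagram; ending the information-schema range
// where the temporary range begins is an assumption.
pub const INFO_SCHEMA_START: TableId = 1_000;
pub const TEMP_START: TableId = 10_000;

/// Maps a table ID to the range it falls in, checking boundaries in order.
pub fn classify_table_id(id: TableId) -> TableIdRange {
    if id == 0 {
        TableIdRange::SystemCatalog
    } else if id < INFO_SCHEMA_START {
        TableIdRange::User
    } else if id < TEMP_START {
        TableIdRange::InformationSchema
    } else {
        TableIdRange::Temporary
    }
}
```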

Reserved Table ID Constants

The is_reserved_table_id() function checks whether a table ID is in the reserved range (Table 0). User code cannot directly instantiate Table objects for reserved IDs.

Sources: llkv-table/src/lib.rs:75-78 llkv-table/src/table.rs:110-126
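The source names is_reserved_table_id() but does not show its body. The minimal sketch below assumes that the reserved range is exactly Table 0, as stated above, and shows how a constructor might refuse reserved IDs.

The following are hypothetical stand-ins, not the llkv API:

  • `CATALOG_TABLE_ID`
  • `TableError`
  • the u16 `TableId`
  • this `Table::new` signature

```rust
/// Assumed width and constant name.
pub type TableId = u16;
pub const CATALOG_TABLE_ID: TableId = 0;

/// True for IDs user code may not open directly (here: only the catalog).
pub fn is_reserved_table_id(id: TableId) -> bool {
    id == CATALOG_TABLE_ID
}

#[derive(Debug, PartialEq)]
pub enum TableError {
    ReservedTableId(TableId),
}

pub struct Table {
    table_id: TableId,
}

impl Table {
    /// Hypothetical constructor: refuses reserved IDs so the catalog is
    /// reachable only through the catalog API, never as a plain Table.
    pub fn new(table_id: TableId) -> Result<Self, TableError> {
        if is_reserved_table_id(table_id) {
            return Err(TableError::ReservedTableId(table_id));
        }
        Ok(Table { table_id })
    }

    pub fn table_id(&self) -> TableId {
        self.table_id
    }
}
```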

Storage Architecture

Catalog as a Table

The system catalog is physically stored as Table 0 in the ColumnStore. Each metadata type (table definitions, column definitions, indexes, etc.) is stored as a column in this special table, with each row representing one metadata record.

graph TB
    subgraph "Logical View"
        CATALOG["System Catalog API\n(SysCatalog)"]
    end

    subgraph "Physical Storage"
        TABLE0["Table 0"]
        COLS["Columns:\n- table_meta\n- col_meta\n- index_meta\n- trigger_meta\n- constraint_meta"]
    end

    subgraph "ColumnStore Layer"
        DESCRIPTORS["Column Descriptors"]
        CHUNKS["Data Chunks\n(Serialized Arrow)"]
    end

    subgraph "Pager Layer"
        KVSTORE["Key-Value Store"]
    end

    CATALOG --> TABLE0
    TABLE0 --> COLS
    COLS --> DESCRIPTORS
    DESCRIPTORS --> CHUNKS
    CHUNKS --> KVSTORE
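To make the bootstrapped layout concrete, here is a toy model with two parts:

  • One ordered map keyed by (table, field, row) stands in for the ColumnStore and pager.
  • A SysCatalog-like view writes table metadata into table 0 of that same map.

ToyStore and TABLE_META_FIELD are invented for illustration. Storing a bare UTF-8 name instead of a serialized TableMeta record is also a simplification. Only the idea that catalog rows and user rows share one store comes from the text above.

```rust
use std::collections::BTreeMap;

type TableId = u16;
type FieldId = u32;
type RowId = u64;

/// The catalog lives in table 0 of the same store as user data.
const CATALOG_TABLE_ID: TableId = 0;
/// Hypothetical field ID of the catalog's table_meta column.
const TABLE_META_FIELD: FieldId = 1;

/// Toy stand-in for ColumnStore + pager: one ordered map holding the cells of
/// every table, catalog and user tables alike.
#[derive(Default)]
pub struct ToyStore {
    cells: BTreeMap<(TableId, FieldId, RowId), Vec<u8>>,
}

impl ToyStore {
    pub fn put(&mut self, table: TableId, field: FieldId, row: RowId, value: Vec<u8>) {
        self.cells.insert((table, field, row), value);
    }

    pub fn get(&self, table: TableId, field: FieldId, row: RowId) -> Option<&[u8]> {
        self.cells.get(&(table, field, row)).map(|v| v.as_slice())
    }
}

/// Typed view over table 0, in the spirit of SysCatalog.
pub struct SysCatalog<'a> {
    store: &'a mut ToyStore,
}

impl<'a> SysCatalog<'a> {
    pub fn new(store: &'a mut ToyStore) -> Self {
        Self { store }
    }

    /// One table_meta row per user table, using the table ID as the row ID.
    pub fn put_table_name(&mut self, table_id: TableId, name: &str) {
        let row = RowId::from(table_id);
        self.store.put(CATALOG_TABLE_ID, TABLE_META_FIELD, row, name.as_bytes().to_vec());
    }

    pub fn table_name(&self, table_id: TableId) -> Option<String> {
        self.store
            .get(CATALOG_TABLE_ID, TABLE_META_FIELD, RowId::from(table_id))
            .map(|bytes| String::from_utf8_lossy(bytes).into_owned())
    }
}
```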

Accessing the Catalog

The Table::catalog() method provides access to the system catalog without exposing the underlying table structure. The SysCatalog type wraps the ColumnStore and provides typed methods for reading and writing metadata.

The get_table_meta() and get_cols_meta() convenience methods on Table delegate to the catalog.

Sources: llkv-table/src/table.rs:499-511
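The delegation pattern can be sketched as follows, assuming a HashMap-backed catalog in place of the real ColumnStore-backed SysCatalog.

The following are illustrative assumptions:

  • the put_* setup methods
  • the `Arc` sharing
  • the `Vec<Option<ColMeta>>` return shape, which has one entry per requested field ID

```rust
use std::collections::HashMap;
use std::sync::Arc;

#[derive(Debug, Clone, PartialEq)]
pub struct TableMeta {
    pub table_id: u16,
    pub name: Option<String>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ColMeta {
    pub col_id: u32,
    pub name: Option<String>,
}

/// HashMap-backed stand-in; the real SysCatalog reads table 0 of the ColumnStore.
#[derive(Default)]
pub struct SysCatalog {
    tables: HashMap<u16, TableMeta>,
    cols: HashMap<(u16, u32), ColMeta>,
}

impl SysCatalog {
    pub fn put_table_meta(&mut self, meta: TableMeta) {
        self.tables.insert(meta.table_id, meta);
    }

    pub fn put_col_meta(&mut self, table_id: u16, meta: ColMeta) {
        self.cols.insert((table_id, meta.col_id), meta);
    }

    pub fn get_table_meta(&self, table_id: u16) -> Option<TableMeta> {
        self.tables.get(&table_id).cloned()
    }

    /// One entry per requested field ID; None where nothing is recorded.
    pub fn get_cols_meta(&self, table_id: u16, field_ids: &[u32]) -> Vec<Option<ColMeta>> {
        field_ids.iter().map(|fid| self.cols.get(&(table_id, *fid)).cloned()).collect()
    }
}

pub struct Table {
    table_id: u16,
    catalog: Arc<SysCatalog>,
}

impl Table {
    pub fn new(table_id: u16, catalog: Arc<SysCatalog>) -> Self {
        Self { table_id, catalog }
    }

    /// Hands out the catalog view without exposing table 0 itself.
    pub fn catalog(&self) -> &SysCatalog {
        &self.catalog
    }

    /// Convenience wrappers that just fill in this table's ID.
    pub fn get_table_meta(&self) -> Option<TableMeta> {
        self.catalog().get_table_meta(self.table_id)
    }

    pub fn get_cols_meta(&self, field_ids: &[u32]) -> Vec<Option<ColMeta>> {
        self.catalog().get_cols_meta(self.table_id, field_ids)
    }
}
```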

Metadata Operations

Table Creation

Table creation is coordinated by the CatalogManager, which handles metadata persistence, catalog registration, and storage initialization. The Table type provides factory methods that delegate to the catalog manager.

This factory pattern ensures that table creation is properly coordinated across three layers:

  1. MetadataManager: Assigns table IDs and tracks metadata
  2. TableCatalog: Maintains name-to-ID mappings
  3. ColumnStore: Initializes physical storage

Sources: llkv-table/src/table.rs:80-103
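The three-step coordination can be sketched in Rust as shown below. The three components are drastically simplified stand-ins.

The following are assumptions for illustration:

  • The method names and the `CreateTableError` type are invented.
  • The duplicate-name check runs before an ID is allocated.
  • IDs start at 1 because 0 is the catalog.
  • Names are compared exactly, with no case folding.

```rust
use std::collections::HashMap;

type TableId = u16;
type FieldId = u32;

/// Assigns table IDs; starts at 1 because ID 0 is the system catalog.
struct MetadataManager {
    next_table_id: TableId,
}

impl MetadataManager {
    fn allocate_table_id(&mut self) -> TableId {
        let id = self.next_table_id;
        self.next_table_id = id.checked_add(1).expect("table ID space exhausted");
        id
    }
}

/// Name-to-ID mapping.
#[derive(Default)]
struct TableCatalog {
    by_name: HashMap<String, TableId>,
}

/// Storage stand-in: records which columns were initialized for each table.
#[derive(Default)]
struct ColumnStore {
    columns: HashMap<TableId, Vec<FieldId>>,
}

#[derive(Debug, PartialEq)]
pub enum CreateTableError {
    AlreadyExists(String),
}

pub struct CatalogManager {
    metadata: MetadataManager,
    catalog: TableCatalog,
    store: ColumnStore,
}

impl CatalogManager {
    pub fn new() -> Self {
        Self {
            metadata: MetadataManager { next_table_id: 1 },
            catalog: TableCatalog::default(),
            store: ColumnStore::default(),
        }
    }

    pub fn create_table(&mut self, name: &str, field_ids: &[FieldId]) -> Result<TableId, CreateTableError> {
        // Reject duplicates first so a failed create does not consume an ID.
        if self.catalog.by_name.contains_key(name) {
            return Err(CreateTableError::AlreadyExists(name.to_string()));
        }
        // 1. MetadataManager assigns the table ID.
        let table_id = self.metadata.allocate_table_id();
        // 2. TableCatalog records the name-to-ID mapping.
        self.catalog.by_name.insert(name.to_string(), table_id);
        // 3. ColumnStore initializes physical storage for the columns.
        self.store.columns.insert(table_id, field_ids.to_vec());
        Ok(table_id)
    }

    pub fn table_id(&self, name: &str) -> Option<TableId> {
        self.catalog.by_name.get(name).copied()
    }

    pub fn column_count(&self, table_id: TableId) -> Option<usize> {
        self.store.columns.get(&table_id).map(|cols| cols.len())
    }
}
```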

Defensive Metadata Persistence

When appending data, the Table::append() method defensively persists column names to the catalog if they're missing. This keeps metadata consistent even when a column is known to the catalog only by its field_id, with no name recorded.

This defensive approach handles cases like CSV import where column names are known but may not have been explicitly registered in the catalog.

Sources: llkv-table/src/table.rs:327-344
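The defensive check can be sketched as a small catalog method that runs before a batch is written. It records a name only when the catalog has none for that field ID, so existing names are never overwritten.

The following are hypothetical simplifications of an Arrow field carrying field_id metadata and of ColMeta:

  • `BatchField`
  • `ColumnNameCatalog`
  • `ensure_column_names`

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// A column of an incoming batch: its name plus its field_id metadata.
pub struct BatchField {
    pub field_id: u32,
    pub name: String,
}

/// Name-only stand-in for the catalog's column metadata.
#[derive(Default)]
pub struct ColumnNameCatalog {
    names: HashMap<u32, String>,
}

impl ColumnNameCatalog {
    pub fn name(&self, field_id: u32) -> Option<&str> {
        self.names.get(&field_id).map(String::as_str)
    }

    /// Persists names for fields the catalog has not named yet.
    /// Returns the field IDs whose names were newly recorded.
    pub fn ensure_column_names(&mut self, fields: &[BatchField]) -> Vec<u32> {
        let mut persisted = Vec::new();
        for f in fields {
            if f.name.is_empty() {
                continue; // nothing useful to record
            }
            // Only fill gaps; an existing catalog name wins.
            if let Entry::Vacant(slot) = self.names.entry(f.field_id) {
                slot.insert(f.name.clone());
                persisted.push(f.field_id);
            }
        }
        persisted
    }
}
```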

sequenceDiagram
    participant Client
    participant Table
    participant ColumnStore
    participant Catalog
    
    Client->>Table: schema()
    Table->>ColumnStore: user_field_ids_for_table(table_id)
    ColumnStore-->>Table: logical_fields: [LogicalFieldId]
    
    Table->>Catalog: get_cols_meta(field_ids)
    Catalog-->>Table: metas: [ColMeta]
    
    loop "For each field"
        Table->>ColumnStore: data_type(lfid)
        ColumnStore-->>Table: DataType
        Table->>Table: Build Field with metadata
    end
    
    Table-->>Client: Arc<Schema>

Schema Resolution

The Table::schema() method constructs an Arrow Schema by querying the catalog for column metadata and combining it with physical data type information from the ColumnStore.

The resulting schema includes:

  • row_id field (always first)
  • User-defined columns with names from the catalog
  • field_id stored in field metadata for each column

Sources: llkv-table/src/table.rs:519-549
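To avoid depending on the arrow crates, this std-only sketch uses small stand-ins for Arrow's DataType and Field to show how the schema is assembled. It places row_id first, then one field per user column, combining the catalog name with the ColumnStore's physical type and tagging each column with its field ID.

The following are assumptions:

  • the UInt64 type for row_id
  • the `"field_id"` metadata key
  • the `col_{id}` fallback name for columns the catalog has not named

```rust
use std::collections::HashMap;

/// Minimal stand-ins for Arrow's DataType and Field (std-only sketch).
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    UInt64,
    Int64,
    Utf8,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub data_type: DataType,
    pub metadata: HashMap<String, String>,
}

pub const ROW_ID_FIELD: &str = "row_id";
/// Assumed metadata key under which each column's field ID is stored.
pub const FIELD_ID_KEY: &str = "field_id";

/// `catalog_names[i]` comes from the catalog (ColMeta) and `physical_types[i]`
/// from the ColumnStore, both describing `field_ids[i]`.
pub fn build_schema(
    field_ids: &[u32],
    catalog_names: &[Option<String>],
    physical_types: &[DataType],
) -> Vec<Field> {
    assert_eq!(field_ids.len(), catalog_names.len());
    assert_eq!(field_ids.len(), physical_types.len());

    // row_id always comes first.
    let mut fields = vec![Field {
        name: ROW_ID_FIELD.to_string(),
        data_type: DataType::UInt64,
        metadata: HashMap::new(),
    }];

    for ((fid, name), dt) in field_ids.iter().zip(catalog_names).zip(physical_types) {
        let mut metadata = HashMap::new();
        metadata.insert(FIELD_ID_KEY.to_string(), fid.to_string());
        fields.push(Field {
            // Fallback for columns the catalog has no name for (assumption).
            name: name.clone().unwrap_or_else(|| format!("col_{fid}")),
            data_type: dt.clone(),
            metadata,
        });
    }
    fields
}
```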

Index Registration

The catalog tracks persisted sort indexes for columns, allowing efficient range scans and ordered reads.

Registering Indexes

Listing Indexes

Index metadata is stored in the catalog and used by the query planner to optimize scan operations.

Sources: llkv-table/src/table.rs:145-173
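A registry that supports both registering and listing indexes might look like the sketch below. It keeps single-column sort indexes keyed by field ID and multi-column indexes as ordered field-ID lists.

The following are hypothetical:

  • `IndexRegistry`
  • its method names
  • the single `Sort` kind
  • treating [1, 2] and [2, 1] as distinct multi-column indexes

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum IndexKind {
    Sort,
}

#[derive(Default)]
pub struct IndexRegistry {
    single: BTreeMap<u32, IndexKind>,
    multi: BTreeSet<Vec<u32>>, // column order matters for multi-column indexes
}

impl IndexRegistry {
    /// Returns true if the sort index was newly registered.
    pub fn register_sort_index(&mut self, field_id: u32) -> bool {
        self.single.insert(field_id, IndexKind::Sort).is_none()
    }

    /// Returns true if this exact column list was not registered before.
    pub fn register_multi_column_index(&mut self, field_ids: &[u32]) -> bool {
        self.multi.insert(field_ids.to_vec())
    }

    /// Lists single-column indexes in field-ID order.
    pub fn list_column_indexes(&self) -> Vec<(u32, IndexKind)> {
        self.single.iter().map(|(fid, kind)| (*fid, *kind)).collect()
    }

    /// The kind of question a planner could ask before choosing an ordered scan.
    pub fn has_sort_index(&self, field_id: u32) -> bool {
        self.single.get(&field_id) == Some(&IndexKind::Sort)
    }
}
```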

Integration with CSV Import/Export

CSV import and export operations rely on the catalog to resolve column names and field IDs. The CsvWriter queries the catalog when building projections to ensure that columns are properly aliased.

This integration ensures that exported CSV files have human-readable column headers even when the underlying storage uses numeric field IDs.

Sources: llkv-csv/src/writer.rs:320-368
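The header resolution step can be sketched as a lookup from projected field IDs to catalog names, followed by writing one header row.

The following are assumptions and do not reflect the actual CsvWriter code:

  • the `col_{id}` fallback for unnamed columns
  • the function names
  • RFC 4180-style quoting of headers that contain commas, quotes, or line breaks

```rust
use std::collections::HashMap;

/// Resolves each projected field ID to a header, preferring the catalog name.
pub fn resolve_headers(field_ids: &[u32], catalog_names: &HashMap<u32, String>) -> Vec<String> {
    field_ids
        .iter()
        .map(|fid| catalog_names.get(fid).cloned().unwrap_or_else(|| format!("col_{fid}")))
        .collect()
}

/// Quotes a value when it contains a comma, quote, or line break,
/// doubling any embedded quotes (RFC 4180 style).
fn quote_csv(value: &str) -> String {
    if value.contains(|c: char| matches!(c, ',' | '"' | '\n' | '\r')) {
        format!("\"{}\"", value.replace('"', "\"\""))
    } else {
        value.to_string()
    }
}

/// Joins resolved headers into a single CSV header line.
pub fn header_line(headers: &[String]) -> String {
    headers.iter().map(|h| quote_csv(h)).collect::<Vec<_>>().join(",")
}
```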

graph TB
    subgraph "SQL Layer"
        SQLENGINE["SqlEngine"]
        PLANNER["Query Planner"]
    end

    subgraph "Catalog Layer"
        CATALOG["CatalogManager"]
        METADATA["MetadataManager"]
        SYSCAT["SysCatalog"]
    end

    subgraph "Table Layer"
        TABLE["Table"]
        CONSTRAINTS["ConstraintService"]
    end

    subgraph "Storage Layer"
        COLSTORE["ColumnStore"]
        PAGER["Pager"]
    end

    SQLENGINE -->|CREATE TABLE| CATALOG
    SQLENGINE -->|ALTER TABLE| CATALOG
    SQLENGINE -->|DROP TABLE| CATALOG

    CATALOG --> METADATA
    CATALOG --> SYSCAT
    CATALOG --> TABLE

    TABLE --> SYSCAT
    TABLE --> CONSTRAINTS
    TABLE --> COLSTORE

    SYSCAT --> COLSTORE
    COLSTORE --> PAGER

    PLANNER -.resolves schema.-> CATALOG
    CONSTRAINTS -.validates.-> SYSCAT

Relationship to Other Systems

The catalog sits at the center of the LLKV architecture, connecting several subsystems:

  • SQL Layer: Issues DDL commands that modify the catalog
  • Query Planner: Resolves table and column names via the catalog
  • Table Layer: Queries metadata during data operations
  • Constraint Layer: Uses the catalog to track and enforce constraints
  • Storage Layer: Physically persists catalog records as Table 0

Sources: llkv-table/src/lib.rs:34-98
