This documentation is part of the "Projects with Books" initiative at zenOSmosis.
The source code for this project is available on GitHub.
Catalog and Metadata Management
Relevant source files
- llkv-csv/src/writer.rs
- llkv-table/examples/direct_comparison.rs
- llkv-table/examples/performance_benchmark.rs
- llkv-table/examples/test_streaming.rs
- llkv-table/src/lib.rs
- llkv-table/src/table.rs
Purpose and Scope
This document explains the catalog and metadata infrastructure that tracks all tables, columns, indexes, constraints, and custom types in the LLKV system. The catalog provides schema information and manages the lifecycle of database objects.
For details on the table creation and management API, see CatalogManager API. For implementation details of how metadata is stored, see System Catalog and SysCatalog. For custom type definitions and aliases, see Custom Types and Type Registry.
Overview
The LLKV catalog is a self-describing system: all metadata about tables, columns, indexes, and constraints is stored as structured records in Table 0, which is itself a table managed by the same ColumnStore that manages user data. Because of this bootstrapped design, the catalog uses the same storage primitives as regular tables.
The catalog provides:
- Schema tracking: Table and column metadata with names and types
- Index registry: Persisted sort indexes and multi-column indexes
- Constraint metadata: Primary keys, foreign keys, unique constraints, and check constraints
- Trigger definitions: Event triggers with timing and execution metadata
- Custom type registry: User-defined type aliases
Self-Describing Architecture
Sources: llkv-table/src/lib.rs:1-98 llkv-table/src/table.rs:499-511
Metadata Types
The catalog stores several types of metadata records, each representing a different aspect of database schema and configuration.
Core Metadata Records
| Metadata Type | Description | Key Fields |
|---|---|---|
| TableMeta | Table definitions | table_id, name, schema |
| ColMeta | Column definitions | col_id, name, flags, default |
| SingleColumnIndexEntryMeta | Single-column index registry | field_id, index_kind |
| MultiColumnIndexEntryMeta | Multi-column index registry | field_ids, index_kind |
| TriggerEntryMeta | Trigger definitions | trigger_id, timing, event |
| CustomTypeMeta | User-defined types | type_name, base_type |
Constraint Metadata
Constraints are stored as specialized metadata records that enforce referential integrity and data validation:
| Constraint Type | Metadata Structure |
|---|---|
| Primary Key | PrimaryKeyConstraint - columns forming the primary key |
| Foreign Key | ForeignKeyConstraint - parent/child table references and actions |
| Unique | UniqueConstraint - columns with unique constraint |
| Check | CheckConstraint - validation expression |
Sources: llkv-table/src/lib.rs:81-86 llkv-table/src/lib.rs:56-67
Table ID Ranges
Table IDs are partitioned into reserved ranges to distinguish system tables from user tables and temporary objects.
graph LR
subgraph "Table ID Space"
CATALOG["0\nSystem Catalog"]
USER["1-999\nUser Tables"]
INFOSCHEMA["1000+\nInformation Schema"]
TEMP["10000+\nTemporary Tables"]
end
CATALOG -.special case.-> CATALOG
USER -.normal tables.-> USER
INFOSCHEMA -.system views.-> INFOSCHEMA
TEMP -.session-local.-> TEMP
Reserved Table ID Constants
The is_reserved_table_id() function checks whether a table ID is in the reserved range (Table 0). User code cannot directly instantiate Table objects for reserved IDs.
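The ranges in the diagram can be expressed as a small classifier. The following is a minimal sketch, not the crate's code: the `TableId` width, the constant names, `TableKind`, `classify_table_id`, and `is_reserved_sketch` are all assumptions made for illustration, and the boundaries are read directly from the diagram.

```rust
/// Hypothetical alias; the real `TableId` width is not stated in this document.
type TableId = u16;

// Boundaries copied from the diagram; the constant names are illustrative.
const CATALOG_TABLE_ID: TableId = 0;
const INFORMATION_SCHEMA_START: TableId = 1000;
const TEMPORARY_START: TableId = 10_000;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TableKind {
    SystemCatalog,
    User,
    InformationSchema,
    Temporary,
}

/// Maps a table ID to the range it falls in.
fn classify_table_id(id: TableId) -> TableKind {
    match id {
        CATALOG_TABLE_ID => TableKind::SystemCatalog,
        id if id >= TEMPORARY_START => TableKind::Temporary,
        id if id >= INFORMATION_SCHEMA_START => TableKind::InformationSchema,
        _ => TableKind::User,
    }
}

/// Mirrors the description of `is_reserved_table_id()`: only Table 0 is reserved.
fn is_reserved_sketch(id: TableId) -> bool {
    id == CATALOG_TABLE_ID
}
```

Under these assumptions, `classify_table_id(999)` yields `TableKind::User` and `classify_table_id(1000)` yields `TableKind::InformationSchema`.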
Sources: llkv-table/src/lib.rs:75-78 llkv-table/src/table.rs:110-126
Storage Architecture
Catalog as a Table
The system catalog is physically stored as Table 0 in the ColumnStore. Each metadata type (table definitions, column definitions, indexes, etc.) is stored as a column in this special table, with each row representing one metadata record.
graph TB
subgraph "Logical View"
CATALOG["System Catalog API\n(SysCatalog)"]
end
subgraph "Physical Storage"
TABLE0["Table 0"]
COLS["Columns:\n- table_meta\n- col_meta\n- index_meta\n- trigger_meta\n- constraint_meta"]
end
subgraph "ColumnStore Layer"
DESCRIPTORS["Column Descriptors"]
CHUNKS["Data Chunks\n(Serialized Arrow)"]
end
subgraph "Pager Layer"
KVSTORE["Key-Value Store"]
end
CATALOG --> TABLE0
TABLE0 --> COLS
COLS --> DESCRIPTORS
DESCRIPTORS --> CHUNKS
CHUNKS --> KVSTORE
Accessing the Catalog
The Table::catalog() method provides access to the system catalog without exposing the underlying table structure. The SysCatalog type wraps the ColumnStore and provides typed methods for reading and writing metadata.
The get_table_meta() and get_cols_meta() convenience methods delegate to the catalog.
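The delegation pattern can be illustrated with a toy model. This sketch assumes an in-memory `SysCatalog` backed by hash maps rather than Table 0, and the struct fields, the `put_*` helpers, and all signatures are hypothetical; only the method names `catalog()`, `get_table_meta()`, and `get_cols_meta()` come from the text above.

```rust
use std::collections::HashMap;

type TableId = u16; // assumed width
type FieldId = u32; // assumed width

#[derive(Debug, Clone, PartialEq)]
struct TableMeta {
    table_id: TableId,
    name: Option<String>,
}

#[derive(Debug, Clone, PartialEq)]
struct ColMeta {
    col_id: FieldId,
    name: Option<String>,
}

/// Toy stand-in for the catalog: in-memory maps instead of Table 0.
#[derive(Default)]
struct SysCatalog {
    tables: HashMap<TableId, TableMeta>,
    cols: HashMap<(TableId, FieldId), ColMeta>,
}

impl SysCatalog {
    fn put_table_meta(&mut self, meta: TableMeta) {
        self.tables.insert(meta.table_id, meta);
    }

    fn put_col_meta(&mut self, table_id: TableId, meta: ColMeta) {
        self.cols.insert((table_id, meta.col_id), meta);
    }

    fn get_table_meta(&self, table_id: TableId) -> Option<TableMeta> {
        self.tables.get(&table_id).cloned()
    }

    /// Returns one entry per requested column, `None` where nothing is stored.
    fn get_cols_meta(&self, table_id: TableId, col_ids: &[FieldId]) -> Vec<Option<ColMeta>> {
        col_ids
            .iter()
            .map(|id| self.cols.get(&(table_id, *id)).cloned())
            .collect()
    }
}

struct Table {
    table_id: TableId,
    catalog: SysCatalog,
}

impl Table {
    /// Exposes the catalog without exposing how it is stored.
    fn catalog(&self) -> &SysCatalog {
        &self.catalog
    }

    /// Convenience wrapper: fills in this table's ID and delegates.
    fn get_table_meta(&self) -> Option<TableMeta> {
        self.catalog().get_table_meta(self.table_id)
    }

    /// Convenience wrapper: fills in this table's ID and delegates.
    fn get_cols_meta(&self, col_ids: &[FieldId]) -> Vec<Option<ColMeta>> {
        self.catalog().get_cols_meta(self.table_id, col_ids)
    }
}
```

The point of the wrappers is that callers holding a `Table` never pass a table ID themselves, so they cannot accidentally read another table's metadata.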
Sources: llkv-table/src/table.rs:499-511
Metadata Operations
Table Creation
Table creation is coordinated by the CatalogManager, which handles metadata persistence, catalog registration, and storage initialization. The Table type provides factory methods that delegate to the catalog manager.
This factory pattern ensures that table creation is properly coordinated across three layers:
- MetadataManager: Assigns table IDs and tracks metadata
- TableCatalog: Maintains name-to-ID mappings
- ColumnStore: Initializes physical storage
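The three-layer coordination above can be sketched with toy types. Everything here is an assumption for illustration: the structs share names with the components listed, but their fields, the `create_table` signature, the `String` error type, and the choice to start user IDs at 1 are hypothetical and do not reflect the real API.

```rust
use std::collections::{HashMap, HashSet};

type TableId = u16; // assumed width

/// Toy metadata layer: hands out table IDs. ID 0 is left for the catalog.
struct MetadataManager {
    next_id: TableId,
}

impl MetadataManager {
    fn new() -> Self {
        Self { next_id: 1 }
    }

    fn allocate_table_id(&mut self) -> TableId {
        let id = self.next_id;
        self.next_id += 1;
        id
    }
}

/// Toy name registry: name-to-ID mappings.
#[derive(Default)]
struct TableCatalog {
    ids_by_name: HashMap<String, TableId>,
}

impl TableCatalog {
    fn contains(&self, name: &str) -> bool {
        self.ids_by_name.contains_key(name)
    }

    fn register(&mut self, name: &str, id: TableId) {
        self.ids_by_name.insert(name.to_string(), id);
    }

    fn table_id(&self, name: &str) -> Option<TableId> {
        self.ids_by_name.get(name).copied()
    }
}

/// Toy storage layer: records which tables have storage initialized.
#[derive(Default)]
struct ColumnStore {
    initialized: HashSet<TableId>,
}

impl ColumnStore {
    fn init_table(&mut self, id: TableId) {
        self.initialized.insert(id);
    }

    fn has_table(&self, id: TableId) -> bool {
        self.initialized.contains(&id)
    }
}

struct CatalogManager {
    metadata: MetadataManager,
    catalog: TableCatalog,
    store: ColumnStore,
}

impl CatalogManager {
    fn new() -> Self {
        Self {
            metadata: MetadataManager::new(),
            catalog: TableCatalog::default(),
            store: ColumnStore::default(),
        }
    }

    /// Rejects duplicate names first, so no ID is consumed on failure.
    fn create_table(&mut self, name: &str) -> Result<TableId, String> {
        if self.catalog.contains(name) {
            return Err(format!("table '{name}' already exists"));
        }
        let id = self.metadata.allocate_table_id();
        self.catalog.register(name, id);
        self.store.init_table(id);
        Ok(id)
    }
}
```

Routing creation through one coordinator keeps the three layers in step: a table that has an ID always has a name mapping and initialized storage.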
Sources: llkv-table/src/table.rs:80-103
Defensive Metadata Persistence
When appending data, the Table::append() method defensively persists column names to the catalog if they’re missing. This ensures metadata consistency even when batches arrive with only field_id metadata and no column names.
This defensive approach handles cases like CSV import where column names are known but may not have been explicitly registered in the catalog.
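A minimal sketch of the "persist only if missing" idea follows. It assumes the batch carries a name for each field and that the catalog can be modeled as a map from field ID to column name; `BatchField`, `ColumnNameCatalog`, and `persist_missing_names` are hypothetical names, not part of `Table::append()`.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

type FieldId = u32; // assumed width

/// Hypothetical view of one field in an incoming batch.
struct BatchField {
    field_id: FieldId,
    name: String,
}

/// Toy catalog holding only column names.
#[derive(Default)]
struct ColumnNameCatalog {
    names: HashMap<FieldId, String>,
}

/// Writes a name for every field the catalog does not know yet.
/// Names that are already registered are never overwritten.
/// Returns how many names were written.
fn persist_missing_names(catalog: &mut ColumnNameCatalog, fields: &[BatchField]) -> usize {
    let mut written = 0;
    for field in fields {
        if let Entry::Vacant(slot) = catalog.names.entry(field.field_id) {
            slot.insert(field.name.clone());
            written += 1;
        }
    }
    written
}
```

Because existing entries are left alone, running this on every append is idempotent: a second batch with the same fields writes nothing.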
Sources: llkv-table/src/table.rs:327-344
sequenceDiagram
participant Client
participant Table
participant ColumnStore
participant Catalog
Client->>Table: schema()
Table->>ColumnStore: user_field_ids_for_table(table_id)
ColumnStore-->>Table: logical_fields: [LogicalFieldId]
Table->>Catalog: get_cols_meta(field_ids)
Catalog-->>Table: metas: [ColMeta]
loop "For each field"
Table->>ColumnStore: data_type(lfid)
ColumnStore-->>Table: DataType
Table->>Table: Build Field with metadata
end
Table-->>Client: Arc<Schema>
Schema Resolution
The Table::schema() method constructs an Arrow Schema by querying the catalog for column metadata and combining it with physical data type information from the ColumnStore.
The resulting schema includes:
- row_id field (always first)
- User-defined columns with names from the catalog
- field_id stored in field metadata for each column
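The resolution steps can be sketched without Arrow, using plain structs in place of Arrow's `Field` and `DataType`. This is an illustration under assumptions: the `UInt64` type for `row_id`, the `"field_id"` metadata key, and the `col_<id>` fallback for unnamed columns are guesses, and `resolve_schema` is a hypothetical function.

```rust
use std::collections::HashMap;

type FieldId = u32; // assumed width

/// Simplified stand-in for Arrow's DataType.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    UInt64,
    Int64,
    Utf8,
}

/// Simplified stand-in for Arrow's Field.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    metadata: HashMap<String, String>,
}

/// Combines names (from the catalog) with types (from storage).
/// Returns `None` if storage has no type for a requested field.
fn resolve_schema(
    field_ids: &[FieldId],
    names: &HashMap<FieldId, String>,
    types: &HashMap<FieldId, DataType>,
) -> Option<Vec<Field>> {
    let mut fields = Vec::with_capacity(field_ids.len() + 1);

    // row_id always comes first; its type here is an assumption.
    fields.push(Field {
        name: "row_id".to_string(),
        data_type: DataType::UInt64,
        metadata: HashMap::new(),
    });

    for fid in field_ids {
        let data_type = types.get(fid)?.clone();
        // Fallback naming is an assumption for columns without catalog names.
        let name = names
            .get(fid)
            .cloned()
            .unwrap_or_else(|| format!("col_{fid}"));
        let mut metadata = HashMap::new();
        metadata.insert("field_id".to_string(), fid.to_string());
        fields.push(Field { name, data_type, metadata });
    }
    Some(fields)
}
```

Keeping the field ID in per-field metadata lets a consumer map a named column back to its storage identifier without a second catalog lookup.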
Sources: llkv-table/src/table.rs:519-549
Index Registration
The catalog tracks persisted sort indexes for columns, allowing efficient range scans and ordered reads.
Registering Indexes
Listing Indexes
Index metadata is stored in the catalog and used by the query planner to optimize scan operations.
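As a rough illustration of registering and listing indexes, the sketch below models the registry as a map from field ID to a set of index kinds. `IndexRegistry`, its methods, and the single-variant `IndexKind` enum are hypothetical; the real catalog persists this information in Table 0 rather than in memory.

```rust
use std::collections::{BTreeMap, BTreeSet};

type FieldId = u32; // assumed width

/// Only the sort index mentioned in the text is modeled here.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum IndexKind {
    Sort,
}

/// Toy index registry keyed by field ID.
#[derive(Default)]
struct IndexRegistry {
    by_field: BTreeMap<FieldId, BTreeSet<IndexKind>>,
}

impl IndexRegistry {
    /// Registers an index; returns false if it was already registered.
    fn register(&mut self, field_id: FieldId, kind: IndexKind) -> bool {
        self.by_field.entry(field_id).or_default().insert(kind)
    }

    /// Lists the index kinds registered for a field (empty if none).
    fn list(&self, field_id: FieldId) -> Vec<IndexKind> {
        self.by_field
            .get(&field_id)
            .map(|kinds| kinds.iter().copied().collect())
            .unwrap_or_default()
    }
}
```

A planner-like caller would check `list(field_id)` for `IndexKind::Sort` before choosing an ordered scan over a full scan.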
Sources: llkv-table/src/table.rs:145-173
Integration with CSV Import/Export
CSV import and export operations rely on the catalog to resolve column names and field IDs. The CsvWriter queries the catalog when building projections to ensure that columns are properly aliased.
This integration ensures that exported CSV files have human-readable column headers even when the underlying storage uses numeric field IDs.
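The aliasing step can be sketched as follows. This is not the `CsvWriter` implementation: `Projection`, `build_projections`, `header_line`, and the `col_<id>` fallback are assumptions for illustration, and the header builder does no CSV quoting or escaping.

```rust
use std::collections::HashMap;

type FieldId = u32; // assumed width

/// Hypothetical projection: a stored field plus the alias to export it under.
#[derive(Debug, PartialEq)]
struct Projection {
    field_id: FieldId,
    alias: String,
}

/// Aliases each field with its catalog name, falling back to `col_<id>`.
fn build_projections(
    field_ids: &[FieldId],
    catalog_names: &HashMap<FieldId, String>,
) -> Vec<Projection> {
    field_ids
        .iter()
        .map(|fid| Projection {
            field_id: *fid,
            alias: catalog_names
                .get(fid)
                .cloned()
                .unwrap_or_else(|| format!("col_{fid}")),
        })
        .collect()
}

/// Joins the aliases into a header row (no quoting or escaping).
fn header_line(projections: &[Projection]) -> String {
    projections
        .iter()
        .map(|p| p.alias.as_str())
        .collect::<Vec<_>>()
        .join(",")
}
```

With catalog names for fields 1 and 2 only, projecting fields 1, 2, and 7 produces the header `id,email,col_7` when those names are `id` and `email`.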
Sources: llkv-csv/src/writer.rs:320-368
graph TB
subgraph "SQL Layer"
SQLENGINE["SqlEngine"]
PLANNER["Query Planner"]
end
subgraph "Catalog Layer"
CATALOG["CatalogManager"]
METADATA["MetadataManager"]
SYSCAT["SysCatalog"]
end
subgraph "Table Layer"
TABLE["Table"]
CONSTRAINTS["ConstraintService"]
end
subgraph "Storage Layer"
COLSTORE["ColumnStore"]
PAGER["Pager"]
end
SQLENGINE -->|CREATE TABLE| CATALOG
SQLENGINE -->|ALTER TABLE| CATALOG
SQLENGINE -->|DROP TABLE| CATALOG
CATALOG --> METADATA
CATALOG --> SYSCAT
CATALOG --> TABLE
TABLE --> SYSCAT
TABLE --> CONSTRAINTS
TABLE --> COLSTORE
SYSCAT --> COLSTORE
COLSTORE --> PAGER
PLANNER -.resolves schema.-> CATALOG
CONSTRAINTS -.validates.-> SYSCAT
Relationship to Other Systems
The catalog sits at the center of the LLKV architecture, connecting several subsystems:
- SQL Layer: Issues DDL commands that modify the catalog
- Query Planner: Resolves table and column names via the catalog
- Table Layer: Queries metadata during data operations
- Constraint Layer: Uses the catalog to track and enforce constraints
- Storage Layer: Physically persists catalog records as Table 0
Sources: llkv-table/src/lib.rs:34-98