This documentation is part of the "Projects with Books" initiative at zenOSmosis.
The source code for this project is available on GitHub.
System Catalog and SysCatalog
Relevant source files
- Cargo.lock
- Cargo.toml
- llkv-aggregate/README.md
- llkv-column-map/README.md
- llkv-csv/README.md
- llkv-expr/README.md
- llkv-join/README.md
- llkv-runtime/README.md
- llkv-storage/README.md
- llkv-table/README.md
Purpose and Scope
This document describes the system catalog infrastructure that stores and manages table and column metadata for LLKV. The system catalog treats metadata as first-class data, persisting it in table 0 using the same Arrow-based storage mechanisms that handle user data. This ensures crash consistency, enables transactional DDL operations, and simplifies the overall architecture by eliminating separate metadata storage layers.
For information about the higher-level catalog management API that orchestrates table lifecycle operations, see CatalogManager API. For details on custom type definitions and the type registry, see Custom Types and Type Registry.
System Catalog as Table 0
LLKV stores all table and column metadata in a special table with ID 0, known as the system catalog. This design leverages the existing storage infrastructure rather than introducing a separate metadata store.
Key Properties
| Property | Description |
|---|---|
| Table ID | Always 0, reserved at system initialization |
| Storage Format | Arrow RecordBatch with predefined schema |
| MVCC Semantics | Full transaction support with snapshot isolation |
| Persistence | Uses the same ColumnStore and Pager as user tables |
| Crash Safety | Metadata mutations are atomic through the append pipeline |
The system catalog contains two types of metadata records:
- Table Metadata (TableMeta): Defines table schemas, IDs, and names
- Column Metadata (ColMeta): Describes individual columns within tables
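As a rough sketch, the two record types might look like the following Rust structs. The field names mirror the schema tables in this document; the actual definitions in llkv-table may differ.

```rust
// Illustrative sketch only: field names follow the catalog schema described
// in this document, not necessarily the real llkv-table definitions.

/// Metadata describing one table, stored as a row in table 0.
#[derive(Debug, Clone)]
pub struct TableMeta {
    pub table_id: u32,
    pub table_name: String,
    /// Serialized Arrow schema (opaque bytes here).
    pub schema: Vec<u8>,
}

/// Metadata describing one column of a table, also stored in table 0.
#[derive(Debug, Clone)]
pub struct ColMeta {
    pub table_id: u32,
    pub col_id: u32,
    pub col_name: String,
    /// Arrow data type descriptor, e.g. "Utf8" or "UInt64".
    pub data_type: String,
}

fn main() {
    let users = TableMeta { table_id: 1, table_name: "users".into(), schema: vec![] };
    let id_col = ColMeta { table_id: 1, col_id: 0, col_name: "id".into(), data_type: "UInt64".into() };
    println!("{} has column {}", users.table_name, id_col.col_name);
}
```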
graph TB
subgraph "Metadata Storage Model"
UserTables["User Tables\n(ID ≥ 1)"]
SysCatalog["System Catalog\n(Table 0)"]
TableMeta["TableMeta Records\n• table_id\n• table_name\n• schema"]
ColMeta["ColMeta Records\n• table_id\n• col_name\n• col_id\n• data_type"]
end
subgraph "Storage Layer"
ColumnStore["ColumnStore"]
Pager["Pager (MemPager/SimdRDrivePager)"]
end
UserTables -->|described by| SysCatalog
SysCatalog --> TableMeta
SysCatalog --> ColMeta
SysCatalog -->|persisted via| ColumnStore
UserTables -->|persisted via| ColumnStore
ColumnStore --> Pager
style SysCatalog fill:#f9f9f9
Sources: llkv-table/README.md:28-29 llkv-column-map/README.md:10-16
Metadata Schema
The system catalog stores metadata using a predefined Arrow schema with the following structure:
TableMeta Schema
| Field Name | Arrow Type | Description |
|---|---|---|
| table_id | UInt32 | Unique identifier for the table |
| table_name | Utf8 | Human-readable table name |
| schema | Binary | Serialized Arrow schema definition |
| row_id | UInt64 | MVCC row identifier (auto-injected) |
| created_by | UInt64 | Transaction ID that created this record |
| deleted_by | UInt64 | Transaction ID that deleted this record (NULL if active) |
ColMeta Schema
| Field Name | Arrow Type | Description |
|---|---|---|
| table_id | UInt32 | References the parent table |
| col_id | UInt32 | Column identifier within the table |
| col_name | Utf8 | Column name |
| data_type | Utf8 | Arrow data type descriptor |
| row_id | UInt64 | MVCC row identifier (auto-injected) |
| created_by | UInt64 | Transaction ID that created this record |
| deleted_by | UInt64 | Transaction ID that deleted this record (NULL if active) |
Sources: llkv-table/README.md:13-17 Diagram 4 from high-level architecture
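The created_by and deleted_by columns drive snapshot visibility. A minimal sketch of the visibility predicate, assuming the simple rule "created at or before the snapshot, and not yet deleted as of the snapshot" (llkv's real rules also involve commit status, which is not modeled here):

```rust
// Sketch of MVCC visibility for catalog rows, based on the created_by /
// deleted_by columns described above. Illustrative only.

/// MVCC bookkeeping shared by TableMeta and ColMeta rows.
struct MvccRow {
    created_by: u64,
    /// None while the row is active; Some(txn) once soft-deleted.
    deleted_by: Option<u64>,
}

/// A row is visible to a snapshot if it was created at or before the
/// snapshot and not deleted as of the snapshot.
fn visible(row: &MvccRow, snapshot_txn: u64) -> bool {
    row.created_by <= snapshot_txn
        && row.deleted_by.map_or(true, |d| d > snapshot_txn)
}

fn main() {
    let row = MvccRow { created_by: 5, deleted_by: Some(9) };
    assert!(visible(&row, 7));  // created before the snapshot, not yet deleted
    assert!(!visible(&row, 3)); // snapshot predates creation
    assert!(!visible(&row, 9)); // deleted as of txn 9
    println!("visibility checks passed");
}
```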
SysCatalog Implementation
The SysCatalog struct serves as the programmatic interface to the system catalog, providing methods to read and write metadata while abstracting the underlying Arrow storage details.
graph LR
subgraph "SysCatalog Interface"
SysCatalog["SysCatalog"]
CreateTable["create_table()"]
GetTable["get_table_meta()"]
ListTables["list_tables()"]
DropTable["drop_table()"]
CreateCol["create_column()"]
GetCol["get_column_meta()"]
ListCols["list_columns()"]
end
subgraph "Storage Backend"
Table0["Table (ID=0)"]
ColumnStore["ColumnStore"]
end
SysCatalog --> CreateTable
SysCatalog --> GetTable
SysCatalog --> ListTables
SysCatalog --> DropTable
SysCatalog --> CreateCol
SysCatalog --> GetCol
SysCatalog --> ListCols
CreateTable --> Table0
GetTable --> Table0
ListTables --> Table0
DropTable --> Table0
CreateCol --> Table0
GetCol --> Table0
ListCols --> Table0
Table0 --> ColumnStore
Core Components
Sources: llkv-table/README.md:28-29 llkv-runtime/README.md:39
Metadata Query Process
When the runtime queries the catalog (e.g., during SELECT planning), it follows this flow:
sequenceDiagram
participant Runtime as RuntimeContext
participant Catalog as SysCatalog
participant Table0 as Table (ID=0)
participant Store as ColumnStore
Runtime->>Catalog: get_table_meta("users")
Catalog->>Table0: scan_stream()\nWHERE table_name = 'users'
Table0->>Store: ColumnStream with predicate
Store->>Store: Apply MVCC filtering
Store-->>Table0: RecordBatch
Table0-->>Catalog: RecordBatch
Note over Catalog: Deserialize TableMeta\nfrom Arrow batch
Catalog-->>Runtime: TableMeta struct
Runtime->>Catalog: list_columns(table_id)
Catalog->>Table0: scan_stream()\nWHERE table_id = X
Table0->>Store: ColumnStream with predicate
Store-->>Table0: RecordBatch
Table0-->>Catalog: RecordBatch
Note over Catalog: Deserialize ColMeta\nrecords
Catalog-->>Runtime: Vec<ColMeta>
Sources: llkv-table/README.md:23-25 llkv-runtime/README.md:36-40
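The lookup can be modeled in a few lines: scan the catalog rows for a matching table_name, applying MVCC filtering first. In llkv this is a predicate pushed into scan_stream(); the types and function below are illustrative stand-ins, not the real API.

```rust
// Hypothetical model of a catalog lookup: MVCC filtering followed by a
// name-equality predicate over table 0's rows.

struct CatalogRow {
    table_id: u32,
    table_name: String,
    created_by: u64,
    deleted_by: Option<u64>,
}

fn get_table_id(rows: &[CatalogRow], name: &str, snapshot: u64) -> Option<u32> {
    rows.iter()
        // MVCC filtering: keep only rows visible to this snapshot.
        .filter(|r| r.created_by <= snapshot
            && r.deleted_by.map_or(true, |d| d > snapshot))
        // Predicate: table_name = <name>.
        .find(|r| r.table_name == name)
        .map(|r| r.table_id)
}

fn main() {
    let rows = vec![
        CatalogRow { table_id: 1, table_name: "users".into(), created_by: 2, deleted_by: None },
        CatalogRow { table_id: 2, table_name: "orders".into(), created_by: 9, deleted_by: None },
    ];
    assert_eq!(get_table_id(&rows, "users", 5), Some(1));
    assert_eq!(get_table_id(&rows, "orders", 5), None); // created after the snapshot
    println!("lookup ok");
}
```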
Metadata Operations
DDL operations (CREATE TABLE, DROP TABLE, ALTER TABLE) modify the system catalog through the same transactional append pipeline used for INSERT statements.
CREATE TABLE Flow
graph TD
ParseSQL["Parse SQL:\nCREATE TABLE users (...)"]
CreatePlan["CreateTablePlan"]
RuntimeExec["Runtime.execute_create_table()"]
ValidateSchema["Validate Schema"]
AllocTableID["Allocate table_id"]
BuildTableMeta["Build TableMeta RecordBatch"]
BuildColMeta["Build ColMeta RecordBatch"]
AppendTable["Table(0).append(TableMeta)"]
AppendCols["Table(0).append(ColMeta)"]
ColumnStore["ColumnStore.append()"]
CommitPager["Pager.batch_put()"]
ParseSQL --> CreatePlan
CreatePlan --> RuntimeExec
RuntimeExec --> ValidateSchema
ValidateSchema --> AllocTableID
AllocTableID --> BuildTableMeta
AllocTableID --> BuildColMeta
BuildTableMeta --> AppendTable
BuildColMeta --> AppendCols
AppendTable --> ColumnStore
AppendCols --> ColumnStore
ColumnStore --> CommitPager
style AppendTable fill:#f9f9f9
style AppendCols fill:#f9f9f9
Key Implementation Details:
- Schema Validation: The runtime validates the Arrow schema before allocating resources
- Table ID Allocation: Monotonically increasing IDs are assigned via CatalogManager
- Atomic Append: Both TableMeta and all ColMeta records are appended in a single transaction
- MVCC Tagging: The created_by column is set to the current transaction ID
Sources: llkv-runtime/README.md:36-40 llkv-table/README.md:22-24
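The steps above can be sketched as follows. Vec pushes stand in for the Arrow RecordBatch append pipeline, and all names are illustrative, not the llkv API:

```rust
// Sketch of the CREATE TABLE flow: allocate a monotonically increasing
// table_id, then record the TableMeta and all ColMeta rows together
// (a single atomic append in the real pipeline).

struct TableMeta { table_id: u32, table_name: String }
struct ColMeta { table_id: u32, col_id: u32, col_name: String }

struct Catalog {
    next_table_id: u32,
    tables: Vec<TableMeta>,
    columns: Vec<ColMeta>,
}

impl Catalog {
    fn new() -> Self {
        // table_id 0 is reserved for the system catalog itself.
        Catalog { next_table_id: 1, tables: Vec::new(), columns: Vec::new() }
    }

    /// Create a table: allocate an id, then record TableMeta and all
    /// ColMeta rows for it.
    fn create_table(&mut self, name: &str, cols: &[&str]) -> u32 {
        let table_id = self.next_table_id;
        self.next_table_id += 1;
        self.tables.push(TableMeta { table_id, table_name: name.to_string() });
        for (i, c) in cols.iter().enumerate() {
            self.columns.push(ColMeta {
                table_id,
                col_id: i as u32,
                col_name: c.to_string(),
            });
        }
        table_id
    }
}

fn main() {
    let mut cat = Catalog::new();
    let id = cat.create_table("users", &["id", "name"]);
    println!("created table {} with {} columns", id, cat.columns.len());
}
```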
DROP TABLE Flow
Dropping a table uses MVCC soft-delete semantics rather than physical deletion:
graph TD
DropPlan["DropTablePlan"]
RuntimeExec["Runtime.execute_drop_table()"]
LookupMeta["SysCatalog.get_table_meta()"]
CheckExists["Verify table exists"]
BuildDeleteMeta["Build RecordBatch:\n• table_id\n• deleted_by = current_txn"]
AppendDelete["Table(0).append(delete_batch)"]
ColumnStore["ColumnStore.append()"]
DropPlan --> RuntimeExec
RuntimeExec --> LookupMeta
LookupMeta --> CheckExists
CheckExists --> BuildDeleteMeta
BuildDeleteMeta --> AppendDelete
AppendDelete --> ColumnStore
style BuildDeleteMeta fill:#f9f9f9
The deleted_by column is set to mark the metadata record as deleted. MVCC visibility rules ensure that:
- Transactions whose snapshots were taken before the deletion still see the table
- Transactions starting after the deletion do not see the table
Sources: llkv-table/README.md:32-34 Diagram 4 from high-level architecture
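A minimal model of this soft-delete, under the same assumed visibility rule used elsewhere in this document (names are illustrative, not the llkv API):

```rust
// Sketch of DROP TABLE as a soft delete: stamp deleted_by with the dropping
// transaction's id rather than removing the metadata row.

struct TableMeta {
    table_id: u32,
    table_name: String,
    created_by: u64,
    deleted_by: Option<u64>,
}

/// Mark the named table's metadata as deleted by `txn`.
/// Returns false if no active table with that name exists.
fn drop_table(rows: &mut [TableMeta], name: &str, txn: u64) -> bool {
    for row in rows.iter_mut() {
        if row.table_name == name && row.deleted_by.is_none() {
            row.deleted_by = Some(txn);
            return true;
        }
    }
    false
}

/// Visibility under a snapshot, as described in the text.
fn visible(row: &TableMeta, snapshot: u64) -> bool {
    row.created_by <= snapshot && row.deleted_by.map_or(true, |d| d > snapshot)
}

fn main() {
    let mut rows = vec![TableMeta {
        table_id: 1, table_name: "users".into(), created_by: 3, deleted_by: None,
    }];
    assert!(drop_table(&mut rows, "users", 8));
    assert!(visible(&rows[0], 5));  // snapshot taken before the drop still sees it
    assert!(!visible(&rows[0], 8)); // snapshot at/after the drop does not
    println!("drop semantics ok");
}
```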
Bootstrap Process
When LLKV initializes, the system catalog must bootstrap itself before any user operations can proceed.
Initialization Sequence
sequenceDiagram
participant Main as main() or SqlEngine::new()
participant Runtime as RuntimeContext::new()
participant CatMgr as CatalogManager::new()
participant Table as Table::open_or_create()
participant Store as ColumnStore::open()
participant Pager as Pager (MemPager/SimdRDrivePager)
Main->>Runtime: new(pager)
Runtime->>CatMgr: new(pager)
CatMgr->>Store: open(pager, root_key)
Store->>Pager: batch_get([root_key])
alt Catalog Exists
Pager-->>Store: Catalog data
Store-->>CatMgr: ColumnStore (loaded)
Note over CatMgr: Deserialize catalog entries
else First Run
Pager-->>Store: NULL
Store-->>CatMgr: ColumnStore (empty)
CatMgr->>Table: open_or_create(table_id=0)
Note over CatMgr: Create system catalog schema
Table->>Store: Initialize table 0
Store->>Pager: batch_put(catalog_schema)
end
CatMgr-->>Runtime: CatalogManager (initialized)
Runtime-->>Main: RuntimeContext (ready)
Bootstrap Steps:
- Pager Initialization: The storage backend is opened (in-memory or persistent)
- Catalog Discovery: The ColumnStore attempts to load the catalog from the pager root key
- Schema Creation: If no catalog exists, table 0 is created with the predefined schema
- Ready State: The runtime can now service DDL and DML operations
Sources: llkv-runtime/README.md:26-31 llkv-storage/README.md:12-16
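The bootstrap decision reduces to a load-or-initialize check against the pager's root key. The sketch below models it with a HashMap standing in for the pager; the key name and payload are hypothetical:

```rust
// Minimal model of bootstrap: look up the catalog root key in the pager;
// if absent, initialize table 0 with the predefined catalog schema.
// The Pager struct and key name are stand-ins, not llkv's.
use std::collections::HashMap;

const CATALOG_ROOT_KEY: &str = "catalog_root"; // hypothetical key name

struct Pager {
    pages: HashMap<String, Vec<u8>>,
}

/// Returns true if the catalog had to be freshly created (first run).
fn open_or_create_catalog(pager: &mut Pager) -> bool {
    if pager.pages.contains_key(CATALOG_ROOT_KEY) {
        // Catalog exists: deserialize its entries (elided here).
        false
    } else {
        // First run: persist table 0's predefined schema.
        pager.pages.insert(CATALOG_ROOT_KEY.to_string(), b"catalog schema v1".to_vec());
        true
    }
}

fn main() {
    let mut pager = Pager { pages: HashMap::new() };
    assert!(open_or_create_catalog(&mut pager));  // first run bootstraps table 0
    assert!(!open_or_create_catalog(&mut pager)); // second open loads it
    println!("bootstrap ok");
}
```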
Integration with Runtime
The RuntimeContext uses the system catalog for all schema-dependent operations:
Schema Resolution Flow
graph TB
subgraph "SQL Query Processing"
ParsedSQL["Parsed SQL AST"]
SelectPlan["SelectPlan<String>"]
ResolvedPlan["SelectPlan<FieldId>"]
end
subgraph "RuntimeContext"
CatalogLookup["Catalog Lookup"]
FieldResolution["Field Name → FieldId\nResolution"]
SchemaValidation["Schema Validation"]
end
subgraph "System Catalog"
SysCatalog["SysCatalog"]
TableMetaCache["In-Memory Metadata Cache"]
end
ParsedSQL --> SelectPlan
SelectPlan --> CatalogLookup
CatalogLookup --> SysCatalog
SysCatalog --> TableMetaCache
TableMetaCache --> FieldResolution
FieldResolution --> SchemaValidation
SchemaValidation --> ResolvedPlan
Usage Examples
| Operation | Catalog Interaction |
|---|---|
| SELECT | Resolve table names → table IDs, resolve column names → field IDs |
| INSERT | Validate schema compatibility, check for required columns |
| JOIN | Resolve schemas for both tables, validate join key compatibility |
| CREATE INDEX | (Future) Persist index metadata as new catalog record type |
| ALTER TABLE | Update existing metadata records with new schema definitions |
Sources: llkv-runtime/README.md:36-40 llkv-expr/README.md:50-54
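The name-to-FieldId step can be sketched as a map lookup that rewrites string column references into numeric ids, failing fast on unknown names. FieldId and the map shape are illustrative assumptions, not the llkv types:

```rust
// Hypothetical sketch of field resolution: a plan over column names becomes
// a plan over numeric field ids by consulting the catalog's column map.
use std::collections::HashMap;

type FieldId = u32;

/// Resolve each referenced column name to a FieldId, erroring on unknowns.
fn resolve_columns(
    columns: &HashMap<String, FieldId>,
    referenced: &[&str],
) -> Result<Vec<FieldId>, String> {
    referenced.iter()
        .map(|name| {
            columns.get(*name).copied()
                .ok_or_else(|| format!("unknown column: {name}"))
        })
        .collect()
}

fn main() {
    let mut cols = HashMap::new();
    cols.insert("id".to_string(), 0);
    cols.insert("name".to_string(), 1);
    assert_eq!(resolve_columns(&cols, &["name", "id"]), Ok(vec![1, 0]));
    assert!(resolve_columns(&cols, &["missing"]).is_err());
    println!("resolution ok");
}
```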
Dual-Context Catalog Access
During explicit transactions, the runtime maintains two catalog views:
Catalog Visibility Rules
- Persistent Context: Sees only metadata committed before the transaction's snapshot
- Staging Context: Sees tables created within the current transaction
- On Commit: Staged metadata is replayed into the persistent context
- On Rollback: Staged metadata is discarded
This dual-view approach ensures that:
- DDL operations remain transactional
- Uncommitted schema changes don't leak to other sessions
- Catalog queries are snapshot-isolated like DML operations
Sources: llkv-runtime/README.md:26-31 llkv-table/README.md:32-34
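The commit/rollback rules above can be modeled with two maps, a staging view layered over the persistent catalog. This is a conceptual sketch of the dual-context behavior, not llkv's implementation:

```rust
// Sketch of dual-context catalog access: lookups consult the staging view
// first; commit replays staged entries into the persistent map; rollback
// discards them.
use std::collections::HashMap;

struct DualCatalog {
    persistent: HashMap<String, u32>, // committed: table name -> id
    staging: HashMap<String, u32>,    // created in the open transaction
}

impl DualCatalog {
    fn lookup(&self, name: &str) -> Option<u32> {
        // The current transaction sees staged tables first, then committed ones.
        self.staging.get(name).or_else(|| self.persistent.get(name)).copied()
    }
    fn commit(&mut self) {
        // Replay staged metadata into the persistent context.
        self.persistent.extend(self.staging.drain());
    }
    fn rollback(&mut self) {
        self.staging.clear();
    }
}

fn main() {
    let mut cat = DualCatalog { persistent: HashMap::new(), staging: HashMap::new() };
    cat.staging.insert("users".into(), 1);
    assert_eq!(cat.lookup("users"), Some(1)); // visible inside the transaction
    cat.rollback();
    assert_eq!(cat.lookup("users"), None);    // discarded on rollback
    println!("dual-context ok");
}
```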
Metadata Caching
The CatalogManager maintains an in-memory cache of frequently accessed metadata to avoid repeated scans of table 0:
| Cache Structure | Purpose | Invalidation Strategy |
|---|---|---|
| Table Name → ID Map | Fast table resolution during planning | Invalidated on CREATE/DROP TABLE |
| Table ID → Schema Map | Quick schema validation during INSERT | Invalidated on ALTER TABLE |
| Column Name → FieldId Map | Field resolution for expressions | Rebuilt on schema changes |
The cache is session-local and does not require cross-session synchronization in the current single-process model.
Sources: Inferred from llkv-runtime/README.md:12-17
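Since the caching behavior is itself inferred, the following is only a plausible sketch of a session-local cache whose name-to-id map is dropped on DDL and lazily rebuilt from a table 0 scan (modeled here as a closure):

```rust
// Hypothetical sketch of metadata caching: DDL invalidates the cached
// name -> id map; the next lookup rebuilds it from a catalog scan.
use std::collections::HashMap;

struct CatalogCache {
    name_to_id: Option<HashMap<String, u32>>, // None means invalidated
    scans: u32, // how many times table 0 had to be rescanned
}

impl CatalogCache {
    fn lookup(&mut self, name: &str, scan: impl Fn() -> HashMap<String, u32>) -> Option<u32> {
        if self.name_to_id.is_none() {
            self.name_to_id = Some(scan()); // rebuild from table 0
            self.scans += 1;
        }
        self.name_to_id.as_ref().unwrap().get(name).copied()
    }

    /// Called on CREATE TABLE / DROP TABLE.
    fn invalidate(&mut self) {
        self.name_to_id = None;
    }
}

fn main() {
    let scan = || HashMap::from([("users".to_string(), 1u32)]);
    let mut cache = CatalogCache { name_to_id: None, scans: 0 };
    let _ = cache.lookup("users", scan);
    let _ = cache.lookup("users", scan);
    assert_eq!(cache.scans, 1); // second lookup hit the cache
    cache.invalidate();
    let _ = cache.lookup("users", scan);
    assert_eq!(cache.scans, 2); // rebuilt after DDL
    println!("cache ok");
}
```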
Summary
The LLKV system catalog demonstrates the principle of treating metadata as data by storing all table and column definitions in table 0 using the same Arrow-based storage infrastructure that handles user tables. This design:
- Simplifies Architecture: Eliminates the need for separate metadata storage systems
- Ensures Consistency: Metadata mutations use MVCC transactions like all other data
- Enables Crash Recovery: The pager's atomicity guarantees extend to schema changes
- Supports Transactional DDL: Schema modifications can be rolled back or committed atomically
The SysCatalog interface abstracts the underlying Arrow storage, providing a type-safe API for the runtime to query and modify metadata. The bootstrap process ensures the system catalog exists before any user operations proceed, and the dual-context model enables proper transaction isolation for DDL operations.
Sources: llkv-table/README.md:28-29 llkv-runtime/README.md:36-40 llkv-column-map/README.md:10-16 Diagram 4 from high-level architecture