This documentation is part of the "Projects with Books" initiative at zenOSmosis.
The source code for this project is available on GitHub.
Custom Types and Type Registry
Purpose and Scope
This document describes LLKV’s custom type system, which enables users to define and manage type aliases that extend Apache Arrow’s native type system. Custom types are persisted in the system catalog and provide a mechanism for creating domain-specific type names that map to underlying Arrow DataType definitions.
For information about the system catalog infrastructure that stores custom type metadata, see System Catalog and SysCatalog. For details on how tables use these types in their schemas, see Table Abstraction.
Type System Architecture
LLKV’s type system is built on Apache Arrow’s columnar type system but adds a layer of indirection through custom type definitions. This allows users to create semantic type names (e.g., email_address, currency_amount) that map to specific Arrow types with additional constraints or metadata.
Type Resolution Flow
graph TB
subgraph "SQL Layer"
DDL["CREATE TYPE Statement"]
COLDEF["Column Definition\nwith Custom Type"]
end
subgraph "Type Registry"
SYSCAT["SysCatalog\nTable 0"]
TYPEMETA["CustomTypeMeta\nRecords"]
RESOLVER["Type Resolver"]
end
subgraph "Arrow Type System"
ARROW["Arrow DataType"]
SCHEMA["Arrow Schema"]
end
subgraph "Column Storage"
COLSTORE["ColumnStore"]
DESCRIPTOR["ColumnDescriptor"]
end
DDL --> TYPEMETA
TYPEMETA --> SYSCAT
COLDEF --> RESOLVER
RESOLVER --> TYPEMETA
RESOLVER --> ARROW
ARROW --> SCHEMA
SCHEMA --> COLSTORE
COLSTORE --> DESCRIPTOR
style TYPEMETA fill:#f9f9f9
style SYSCAT fill:#f9f9f9
style RESOLVER fill:#f9f9f9
- User defines custom types via SQL DDL
- Type metadata is stored in the system catalog
- Column definitions reference custom types by name
- Type resolver translates names to Arrow DataTypes
- Physical storage uses Arrow’s native columnar format
Sources: llkv-table/src/lib.rs:82 llkv-table/src/lib.rs:81-85
CustomTypeMeta Structure
CustomTypeMeta is the fundamental metadata structure that describes a custom type definition. It is stored as a record in the system catalog (Table 0) alongside other metadata like TableMeta and ColMeta.
CustomTypeMeta Fields
classDiagram
class CustomTypeMeta {+type_id: TypeId\n+type_name: String\n+base_type: ArrowDataType\n+nullable: bool\n+metadata: HashMap~String,String~\n+created_at: Timestamp\n+created_by: TransactionId\n+deleted_by: Option~TransactionId~}
class SysCatalog {+register_custom_type()\n+get_custom_type()\n+list_custom_types()\n+drop_custom_type()}
class ArrowDataType {<<enumeration>>\nInt64\nUtf8\nDecimal128\nDate32\nTimestamp\nStruct\nList}
class ColumnDescriptor {+field_id: FieldId\n+data_type: DataType}
CustomTypeMeta --> ArrowDataType : maps_to
SysCatalog --> CustomTypeMeta : stores
ColumnDescriptor --> ArrowDataType : uses
| Field | Type | Description |
|---|---|---|
| `type_id` | TypeId | Unique identifier for the custom type |
| `type_name` | String | User-defined name (e.g., “email_address”) |
| `base_type` | ArrowDataType | Underlying Arrow type definition |
| `nullable` | bool | Whether NULL values are permitted |
| `metadata` | HashMap<String, String> | Additional type-specific metadata |
| `created_at` | Timestamp | Type creation timestamp |
| `created_by` | TransactionId | Transaction that created the type |
| `deleted_by` | Option<TransactionId> | MVCC deletion marker |
Sources: llkv-table/src/lib.rs:82
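As a rough illustration, the fields in the table above could be expressed as a Rust struct like the one below. This is a sketch under stated assumptions, not the definition from `llkv-table`: `BaseType` is a hypothetical stand-in for Arrow's `DataType`, the integer aliases for `TypeId` and `TransactionId` are guesses, and the constructor and `is_live` helper are illustrative only.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for Arrow's `DataType`; only a few variants are modeled.
#[derive(Debug, Clone, PartialEq)]
pub enum BaseType {
    Int64,
    Utf8,
    Decimal128 { precision: u8, scale: i8 },
    Date32,
}

// Assumed to be plain integer aliases; the real definitions may differ.
pub type TypeId = u64;
pub type TransactionId = u64;

/// Mirrors the fields listed in the table above.
#[derive(Debug, Clone, PartialEq)]
pub struct CustomTypeMeta {
    pub type_id: TypeId,
    pub type_name: String,
    pub base_type: BaseType,
    pub nullable: bool,
    pub metadata: HashMap<String, String>,
    /// Assumed representation: microseconds since the Unix epoch.
    pub created_at: i64,
    pub created_by: TransactionId,
    pub deleted_by: Option<TransactionId>,
}

impl CustomTypeMeta {
    /// Builds a live (not yet deleted) definition with empty metadata.
    pub fn new(
        type_id: TypeId,
        type_name: &str,
        base_type: BaseType,
        nullable: bool,
        created_at: i64,
        created_by: TransactionId,
    ) -> Self {
        Self {
            type_id,
            type_name: type_name.to_string(),
            base_type,
            nullable,
            metadata: HashMap::new(),
            created_at,
            created_by,
            deleted_by: None,
        }
    }

    /// A definition counts as live until a deleting transaction is recorded.
    pub fn is_live(&self) -> bool {
        self.deleted_by.is_none()
    }
}
```

Keeping `deleted_by` as an `Option` makes the MVCC deletion marker explicit: `None` means no transaction has dropped the type.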
Type Registration and Lifecycle
Custom types are managed through the SysCatalog interface, which provides operations for the complete type lifecycle: registration, retrieval, modification, and deletion.
sequenceDiagram
participant User
participant SqlEngine
participant CatalogManager
participant SysCatalog
participant Table0 as Table 0
User->>SqlEngine: CREATE TYPE email_address AS VARCHAR(255)
SqlEngine->>SqlEngine: Parse DDL statement
SqlEngine->>CatalogManager: register_custom_type(name, base_type)
CatalogManager->>CatalogManager: Validate type name uniqueness
CatalogManager->>CatalogManager: Assign new TypeId
CatalogManager->>SysCatalog: Insert CustomTypeMeta record
SysCatalog->>SysCatalog: Build RecordBatch with metadata
SysCatalog->>SysCatalog: Add MVCC columns (created_by)
SysCatalog->>Table0: append(batch)
Table0->>Table0: Write to ColumnStore
Table0-->>SysCatalog: Success
SysCatalog-->>CatalogManager: TypeId
CatalogManager-->>SqlEngine: Result
SqlEngine-->>User: Type created successfully
Type Registration Flow
Registration Steps
- DDL statement parsed by SQL layer
- `CatalogManager` validates type name uniqueness
- New `TypeId` allocated
- `CustomTypeMeta` record constructed
- Metadata written to system catalog (Table 0)
- MVCC columns (`created_by`, `deleted_by`) added automatically
- Type becomes available for schema definitions
Sources: llkv-table/src/lib.rs:54 llkv-table/src/lib.rs:81-85
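The registration steps above can be sketched with a small in-memory registry. This is a simplified model rather than the actual `CatalogManager` code: `TypeRegistry`, `TypeRecord`, `RegisterError`, and the `txn_id` parameter are hypothetical, the base type is kept as a string to avoid an Arrow dependency, and a `HashMap` stands in for the write to Table 0.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum RegisterError {
    DuplicateName(String),
}

/// Simplified catalog record; the base type is a string to keep the sketch dependency-free.
#[derive(Debug, Clone, PartialEq)]
pub struct TypeRecord {
    pub type_id: u64,
    pub type_name: String,
    pub base_type: String,
    pub created_by: u64,
    pub deleted_by: Option<u64>,
}

/// In-memory stand-in for the catalog; the real records live in Table 0.
#[derive(Default)]
pub struct TypeRegistry {
    next_type_id: u64,
    by_name: HashMap<String, TypeRecord>,
}

impl TypeRegistry {
    pub fn register_custom_type(
        &mut self,
        name: &str,
        base_type: &str,
        txn_id: u64,
    ) -> Result<u64, RegisterError> {
        // Validate type name uniqueness before allocating anything.
        if self.by_name.contains_key(name) {
            return Err(RegisterError::DuplicateName(name.to_string()));
        }
        // Allocate a new identifier (a simple counter in this sketch).
        self.next_type_id += 1;
        let type_id = self.next_type_id;
        // Construct the record, including the MVCC columns.
        let record = TypeRecord {
            type_id,
            type_name: name.to_string(),
            base_type: base_type.to_string(),
            created_by: txn_id,
            deleted_by: None,
        };
        // "Write" the record; from here on the type can be looked up by name.
        self.by_name.insert(name.to_string(), record);
        Ok(type_id)
    }

    pub fn get_custom_type(&self, name: &str) -> Option<&TypeRecord> {
        self.by_name.get(name)
    }
}
```

Checking uniqueness before allocating the identifier means a rejected registration does not consume a `TypeId` in this model.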
Type Lifecycle Operations
Type States
- Registered : Type defined but not yet used in any table schemas
- InUse : One or more columns reference this type
- Modified : Type definition updated (if ALTER TYPE is supported)
- Deprecated : Type soft-deleted via MVCC (deleted_by set)
- Deleted : Type permanently removed from catalog
Sources: llkv-table/src/lib.rs:81-85
Type Resolution and Schema Integration
When creating tables or altering schemas, the type resolver translates custom type names to Arrow DataType instances. This resolution happens during DDL execution and schema validation.
Resolution Process
graph LR
subgraph "Table Definition"
COL1["Column: 'email'\nType: 'email_address'"]
COL2["Column: 'age'\nType: 'INT'"]
end
subgraph "Type Resolution"
RESOLVER["Type Resolver"]
CACHE["Type Cache"]
end
subgraph "System Catalog"
CUSTOM["CustomTypeMeta\nemail_address → Utf8(255)"]
BUILTIN["Built-in Types\nINT → Int32"]
end
subgraph "Arrow Schema"
FIELD1["Field: 'email'\nDataType: Utf8"]
FIELD2["Field: 'age'\nDataType: Int32"]
end
COL1 --> RESOLVER
COL2 --> RESOLVER
RESOLVER --> CACHE
CACHE --> CUSTOM
RESOLVER --> BUILTIN
CUSTOM --> FIELD1
BUILTIN --> FIELD2
FIELD1 --> SCHEMA["Arrow Schema"]
FIELD2 --> SCHEMA
- Column definition specifies type by name
- Type resolver checks cache for previous resolution
- Cache miss triggers lookup in `SysCatalog`
- Custom type metadata retrieved from Table 0
- Base Arrow `DataType` extracted
- Type constraints/metadata applied
- Resolved type cached for subsequent use
- Arrow `Field` constructed with final type
Sources: llkv-table/src/lib.rs:43 llkv-table/src/lib.rs:54
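A minimal sketch of the cache-then-catalog lookup described above might look like the following. The names here are assumptions for illustration: `TypeResolver` and its fields are hypothetical, `BaseType` stands in for Arrow's `DataType`, a `HashMap` stands in for Table 0, and checking built-in names before the cache is one possible ordering rather than a documented one.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for Arrow's `DataType`.
#[derive(Debug, Clone, PartialEq)]
pub enum BaseType {
    Int32,
    Utf8,
}

pub struct TypeResolver {
    /// Stand-in for custom type records stored in Table 0.
    catalog: HashMap<String, BaseType>,
    /// Previously resolved custom type names.
    cache: HashMap<String, BaseType>,
    /// Counts catalog lookups so the effect of caching is observable.
    pub catalog_lookups: usize,
}

impl TypeResolver {
    pub fn new(catalog: HashMap<String, BaseType>) -> Self {
        Self { catalog, cache: HashMap::new(), catalog_lookups: 0 }
    }

    /// Built-in SQL names map directly to a base type (only INT is modeled here).
    fn builtin(name: &str) -> Option<BaseType> {
        match name.to_ascii_uppercase().as_str() {
            "INT" => Some(BaseType::Int32),
            _ => None,
        }
    }

    pub fn resolve_type_name(&mut self, name: &str) -> Option<BaseType> {
        if let Some(base) = Self::builtin(name) {
            return Some(base);
        }
        // Cache hit: no catalog access needed.
        if let Some(base) = self.cache.get(name) {
            return Some(base.clone());
        }
        // Cache miss: consult the catalog and remember a successful result.
        self.catalog_lookups += 1;
        let resolved = self.catalog.get(name)?.clone();
        self.cache.insert(name.to_string(), resolved.clone());
        Some(resolved)
    }

    /// Drops all cached resolutions, for example after a type is modified or dropped.
    pub fn invalidate_type_cache(&mut self) {
        self.cache.clear();
    }
}
```

Unknown names are not cached in this sketch, so a type registered later becomes resolvable without an explicit invalidation.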
DDL Operations for Custom Types
CREATE TYPE Statement
Processing Steps
| Step | Action | Component |
|---|---|---|
| 1 | Parse SQL | SqlEngine → sqlparser |
| 2 | Extract type definition | SQL preprocessing layer |
| 3 | Validate base type | CatalogManager |
| 4 | Check name uniqueness | SysCatalog query |
| 5 | Allocate TypeId | CatalogManager |
| 6 | Construct metadata | CustomTypeMeta builder |
| 7 | Write to catalog | SysCatalog::append() |
| 8 | Update cache | Type resolver cache |
Sources: llkv-table/src/lib.rs:54 llkv-table/src/lib.rs:68
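To make the first two steps in the table concrete, the sketch below extracts the type name and base type from a statement such as `CREATE TYPE email_address AS VARCHAR(255)`. It deliberately swaps in naive whitespace splitting instead of sqlparser, so it is not how the SQL layer works; `CreateTypeStmt` and `parse_create_type` are hypothetical names.

```rust
/// The two pieces of information a CREATE TYPE statement carries in this sketch.
#[derive(Debug, PartialEq)]
pub struct CreateTypeStmt {
    pub type_name: String,
    pub base_type: String,
}

/// Naive parser for `CREATE TYPE <name> AS <base type>`; returns `None` for anything else.
pub fn parse_create_type(sql: &str) -> Option<CreateTypeStmt> {
    // Ignore surrounding whitespace and an optional trailing semicolon.
    let trimmed = sql.trim().trim_end_matches(';').trim();
    let tokens: Vec<&str> = trimmed.split_whitespace().collect();
    if tokens.len() < 5 {
        return None;
    }
    // Keywords are matched case-insensitively.
    let keywords_ok = tokens[0].eq_ignore_ascii_case("CREATE")
        && tokens[1].eq_ignore_ascii_case("TYPE")
        && tokens[3].eq_ignore_ascii_case("AS");
    if !keywords_ok {
        return None;
    }
    Some(CreateTypeStmt {
        type_name: tokens[2].to_string(),
        // Everything after AS is treated as the base type, e.g. "VARCHAR(255)".
        base_type: tokens[4..].join(" "),
    })
}
```

The extracted `base_type` string would still need validation against the supported base types, which corresponds to step 3 in the table.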
DROP TYPE Statement
Deletion Process
MVCC Soft Delete
- Type records are not physically removed
- `deleted_by` field set to transaction ID
- Historical queries can still see old type definitions
- New schemas cannot reference deleted types
- Cache invalidation ensures immediate visibility
Sources: llkv-table/src/lib.rs:81-85
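The soft-delete behavior can be illustrated with a small record type. The visibility rule used here is a simplified assumption (a record is visible to a snapshot if it was created at or before that transaction and not deleted at or before it); LLKV's actual MVCC rules may differ, and `TypeRecord`, `drop_type`, and `is_visible_to` are hypothetical names.

```rust
/// Minimal record carrying only the MVCC columns relevant to DROP TYPE.
#[derive(Debug, Clone)]
pub struct TypeRecord {
    pub type_name: String,
    pub created_by: u64,
    pub deleted_by: Option<u64>,
}

impl TypeRecord {
    /// Soft delete: record the deleting transaction instead of removing the row.
    /// A record that is already deleted keeps its original marker.
    pub fn drop_type(&mut self, txn_id: u64) {
        if self.deleted_by.is_none() {
            self.deleted_by = Some(txn_id);
        }
    }

    /// Simplified snapshot visibility check (an assumption, see the lead-in).
    pub fn is_visible_to(&self, snapshot_txn: u64) -> bool {
        self.created_by <= snapshot_txn
            && self.deleted_by.map_or(true, |deleted| deleted > snapshot_txn)
    }
}
```

Under this rule, a snapshot taken before the dropping transaction still sees the type, which matches the point above that historical queries can see old definitions.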
Storage in System Catalog
Custom type metadata is stored in the system catalog (Table 0) alongside other metadata types like TableMeta, ColMeta, and constraint information.
System Catalog Schema for CustomTypeMeta
| Column Name | Type | Description |
|---|---|---|
| `row_id` | RowId | Unique row identifier |
| `metadata_type` | Utf8 | Discriminator: “CustomType” |
| `type_id` | UInt64 | Custom type identifier |
| `type_name` | Utf8 | User-defined type name |
| `base_type_json` | Utf8 | Serialized Arrow DataType |
| `nullable` | Boolean | Nullability flag |
| `metadata_json` | Utf8 | Additional metadata as JSON |
| `created_at` | Timestamp | Creation timestamp |
| `created_by` | UInt64 | Creating transaction ID |
| `deleted_by` | UInt64 (nullable) | Deleting transaction ID (MVCC) |
Storage Characteristics
- Custom types stored in same table as other metadata
- `metadata_type` column distinguishes record types
- Arrow JSON serialization for base type persistence
- Metadata JSON for extensible properties
- MVCC columns enable temporal queries
- Indexed by `type_name` for fast lookup
Sources: llkv-table/src/lib.rs:81-85
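The sketch below shows how a discriminator column and a name index could work together when several kinds of metadata share one table. It is an illustration only: `CatalogRow` is a reduced, hypothetical row shape, the `"Table"` discriminator value is invented for contrast, and the JSON payload is a placeholder rather than the exact encoding LLKV writes.

```rust
use std::collections::HashMap;

/// Reduced view of a catalog row; columns that do not apply to a record kind are `None`.
#[derive(Debug, Clone, PartialEq)]
pub struct CatalogRow {
    pub row_id: u64,
    /// Discriminator such as "CustomType".
    pub metadata_type: String,
    pub type_name: Option<String>,
    /// Serialized base type; the exact JSON shape is an assumption.
    pub base_type_json: Option<String>,
    pub deleted_by: Option<u64>,
}

/// Builds a `type_name -> row position` index over live custom type rows only.
pub fn build_type_name_index(rows: &[CatalogRow]) -> HashMap<String, usize> {
    rows.iter()
        .enumerate()
        .filter_map(|(pos, row)| {
            // Skip other record kinds and soft-deleted definitions.
            if row.metadata_type != "CustomType" || row.deleted_by.is_some() {
                return None;
            }
            row.type_name.clone().map(|name| (name, pos))
        })
        .collect()
}
```

Filtering on the discriminator first keeps rows of other metadata kinds out of the index even though they live in the same table.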
Type Catalog Query Examples
Retrieving Custom Type Definition
Query Optimization
- Type name indexed for fast lookups
- MVCC predicate filters deleted types
- Result caching minimizes catalog queries
- Batch operations for multiple type resolutions
Sources: llkv-table/src/lib.rs:54 llkv-table/src/lib.rs:81-85
graph TB
subgraph "User Operations"
CREATE["CREATE TYPE"]
DROP["DROP TYPE"]
ALTER["ALTER TYPE"]
QUERY["Type Resolution"]
end
subgraph "CatalogManager"
API["CatalogManager API"]
VALIDATION["Type Validation"]
DEPENDENCY["Dependency Tracking"]
CACHE_MGR["Cache Manager"]
end
subgraph "SysCatalog"
SYSCAT["SysCatalog"]
TABLE0["Table 0"]
end
subgraph "Type System"
RESOLVER["Type Resolver"]
ARROW["Arrow DataType"]
end
CREATE --> API
DROP --> API
ALTER --> API
QUERY --> API
API --> VALIDATION
API --> DEPENDENCY
API --> CACHE_MGR
VALIDATION --> SYSCAT
DEPENDENCY --> SYSCAT
CACHE_MGR --> RESOLVER
SYSCAT --> TABLE0
RESOLVER --> ARROW
style API fill:#f9f9f9
style SYSCAT fill:#f9f9f9
Integration with CatalogManager
The CatalogManager provides high-level operations for custom type management, coordinating between the SQL layer, type resolver, and system catalog.
CatalogManager Responsibilities
| Function | Description |
|---|---|
| `register_custom_type()` | Create new type definition |
| `get_custom_type()` | Retrieve type by name or ID |
| `list_custom_types()` | Query all non-deleted types |
| `drop_custom_type()` | Mark type as deleted (MVCC) |
| `resolve_type_name()` | Translate name to Arrow type |
| `check_type_dependencies()` | Find columns using type |
| `invalidate_type_cache()` | Clear cached type definitions |
Sources: llkv-table/src/lib.rs:54 llkv-table/src/lib.rs:68
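As one example of these responsibilities, a dependency check could scan column definitions for references to a type name. The sketch below assumes a flat list of column references; `ColumnRef` is a hypothetical type, and the signature of `check_type_dependencies` here is a guess rather than the real `CatalogManager` method.

```rust
/// Hypothetical description of a column and the type name it was declared with.
#[derive(Debug, Clone, PartialEq)]
pub struct ColumnRef {
    pub table: String,
    pub column: String,
    pub type_name: String,
}

/// Returns every column whose declared type matches `type_name`.
pub fn check_type_dependencies<'a>(
    columns: &'a [ColumnRef],
    type_name: &str,
) -> Vec<&'a ColumnRef> {
    columns.iter().filter(|col| col.type_name == type_name).collect()
}
```

A caller could use a non-empty result to warn before dropping a type that is still referenced; whether LLKV enforces such a rule is not stated here.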
Type System Best Practices
Naming Conventions
Recommended Type Names
- Use descriptive, domain-specific names: `customer_id`, `email_address`, `currency_amount`
- Avoid generic names that conflict with SQL keywords: `text`, `number`, `date`
- Use snake_case for consistency with column names
- Include units or constraints in name: `price_usd`, `duration_seconds`
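These conventions could be enforced with a small check at registration time. The function below is a hypothetical helper, not part of LLKV: the reserved list contains only the three generic names mentioned above, and the snake_case rule (lowercase ASCII letters, digits, and underscores, starting with a letter) is one reasonable interpretation.

```rust
/// Generic names called out above; a real list would likely be longer.
const RESERVED: &[&str] = &["text", "number", "date"];

#[derive(Debug, PartialEq)]
pub enum NameError {
    Empty,
    NotSnakeCase,
    ReservedWord,
}

/// Checks a proposed custom type name against the naming conventions.
pub fn validate_type_name(name: &str) -> Result<(), NameError> {
    if name.is_empty() {
        return Err(NameError::Empty);
    }
    // snake_case here means: starts with a lowercase letter, then lowercase
    // letters, digits, or underscores.
    let first_ok = name.starts_with(|c: char| c.is_ascii_lowercase());
    let rest_ok = name
        .chars()
        .all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '_');
    if !first_ok || !rest_ok {
        return Err(NameError::NotSnakeCase);
    }
    if RESERVED.contains(&name) {
        return Err(NameError::ReservedWord);
    }
    Ok(())
}
```

Returning a distinct error per rule lets the caller report which convention a rejected name violated.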
Type Reusability
When to Create Custom Types
- Domain concepts appearing in multiple tables
- Types with specific constraints (precision, length)
- Semantic meaning beyond base type
- Types requiring validation or transformation logic
When to Use Base Types
- One-off column definitions
- Standard SQL types without constraints
- Internal implementation columns
Performance Considerations
Cache Behavior
- Type resolution results are cached per session
- First resolution incurs catalog lookup cost
- Subsequent resolutions served from memory
- Cache invalidation on type modifications
Query Impact
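- Custom types add one indirection layer, resolved during DDL execution and schema validation
- Physical storage uses base Arrow types
- No per-row runtime penalty, because query execution never consults the custom type name
- Query plans operate on resolved types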
Sources: llkv-table/src/lib.rs:54 llkv-table/src/lib.rs:81-85