This documentation is part of the "Projects with Books" initiative at zenOSmosis.

The source code for this project is available on GitHub.

Custom Types and Type Registry

Purpose and Scope

This document describes LLKV’s custom type system, which enables users to define and manage type aliases that extend Apache Arrow’s native type system. Custom types are persisted in the system catalog and provide a mechanism for creating domain-specific type names that map to underlying Arrow DataType definitions.

For information about the system catalog infrastructure that stores custom type metadata, see System Catalog and SysCatalog. For details on how tables use these types in their schemas, see Table Abstraction.

Type System Architecture

LLKV’s type system is built on Apache Arrow’s columnar type system but adds a layer of indirection through custom type definitions. This allows users to create semantic type names (e.g., email_address, currency_amount) that map to specific Arrow types with additional constraints or metadata.

Type Resolution Flow

graph TB
    subgraph "SQL Layer"
        DDL["CREATE TYPE Statement"]
        COLDEF["Column Definition\nwith Custom Type"]
    end

    subgraph "Type Registry"
        SYSCAT["SysCatalog\nTable 0"]
        TYPEMETA["CustomTypeMeta\nRecords"]
        RESOLVER["Type Resolver"]
    end

    subgraph "Arrow Type System"
        ARROW["Arrow DataType"]
        SCHEMA["Arrow Schema"]
    end

    subgraph "Column Storage"
        COLSTORE["ColumnStore"]
        DESCRIPTOR["ColumnDescriptor"]
    end

    DDL --> TYPEMETA
    TYPEMETA --> SYSCAT
    COLDEF --> RESOLVER
    RESOLVER --> TYPEMETA
    RESOLVER --> ARROW
    ARROW --> SCHEMA
    SCHEMA --> COLSTORE
    COLSTORE --> DESCRIPTOR

    style TYPEMETA fill:#f9f9f9
    style SYSCAT fill:#f9f9f9
    style RESOLVER fill:#f9f9f9

  • User defines custom types via SQL DDL
  • Type metadata is stored in the system catalog
  • Column definitions reference custom types by name
  • Type resolver translates names to Arrow DataTypes
  • Physical storage uses Arrow’s native columnar format

Sources: llkv-table/src/lib.rs:82, llkv-table/src/lib.rs:81-85

CustomTypeMeta Structure

CustomTypeMeta is the fundamental metadata structure that describes a custom type definition. It is stored as a record in the system catalog (Table 0) alongside other metadata like TableMeta and ColMeta.

CustomTypeMeta Fields

classDiagram
    class CustomTypeMeta {
        +type_id: TypeId
        +type_name: String
        +base_type: ArrowDataType
        +nullable: bool
        +metadata: HashMap~String,String~
        +created_at: Timestamp
        +created_by: TransactionId
        +deleted_by: Option~TransactionId~
    }

    class SysCatalog {
        +register_custom_type()
        +get_custom_type()
        +list_custom_types()
        +drop_custom_type()
    }

    class ArrowDataType {
        <<enumeration>>
        Int64
        Utf8
        Decimal128
        Date32
        Timestamp
        Struct
        List
    }

    class ColumnDescriptor {
        +field_id: FieldId
        +data_type: DataType
    }

    CustomTypeMeta --> ArrowDataType : maps_to
    SysCatalog --> CustomTypeMeta : stores
    ColumnDescriptor --> ArrowDataType : uses

| Field | Type | Description |
|---|---|---|
| type_id | TypeId | Unique identifier for the custom type |
| type_name | String | User-defined name (e.g., “email_address”) |
| base_type | ArrowDataType | Underlying Arrow type definition |
| nullable | bool | Whether NULL values are permitted |
| metadata | HashMap<String, String> | Additional type-specific metadata |
| created_at | Timestamp | Type creation timestamp |
| created_by | TransactionId | Transaction that created the type |
| deleted_by | Option<TransactionId> | MVCC deletion marker |
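The field table can be read as a plain Rust struct. The sketch below is illustrative only: the TypeId, TransactionId, and Timestamp aliases, the reduced DataType enum (a stand-in for Arrow's arrow_schema::DataType so the example needs no dependencies), the nullable-by-default choice, and the new/is_deleted helpers are all assumptions, not LLKV's actual definitions.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for arrow_schema::DataType, reduced to a few variants
// so the sketch compiles without the arrow crate.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int64,
    Utf8,
    Decimal128(u8, i8),
    Date32,
}

// Assumed aliases; LLKV's real definitions may differ.
pub type TypeId = u64;
pub type TransactionId = u64;
pub type Timestamp = i64; // assumed: microseconds since the Unix epoch

// Field-for-field mirror of the table above (a sketch, not LLKV's source).
#[derive(Debug, Clone)]
pub struct CustomTypeMeta {
    pub type_id: TypeId,
    pub type_name: String,
    pub base_type: DataType,
    pub nullable: bool,
    pub metadata: HashMap<String, String>,
    pub created_at: Timestamp,
    pub created_by: TransactionId,
    pub deleted_by: Option<TransactionId>,
}

impl CustomTypeMeta {
    // Hypothetical constructor: a live definition (no deleted_by), nullable by
    // default, with an empty metadata map.
    pub fn new(
        type_id: TypeId,
        type_name: &str,
        base_type: DataType,
        created_by: TransactionId,
        created_at: Timestamp,
    ) -> Self {
        Self {
            type_id,
            type_name: type_name.to_string(),
            base_type,
            nullable: true,
            metadata: HashMap::new(),
            created_at,
            created_by,
            deleted_by: None,
        }
    }

    // MVCC soft-delete marker: true once some transaction has dropped the type.
    pub fn is_deleted(&self) -> bool {
        self.deleted_by.is_some()
    }
}
```

For example, `CustomTypeMeta::new(1, "currency_amount", DataType::Decimal128(18, 2), 7, 0)` describes a live currency type; setting `deleted_by` later marks it as soft-deleted without touching the other fields.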

Sources: llkv-table/src/lib.rs:82

Type Registration and Lifecycle

Custom types are managed through the SysCatalog interface, which provides operations for the type lifecycle: registration, retrieval, listing, and deletion, plus modification if ALTER TYPE is supported.

sequenceDiagram
    participant User
    participant SqlEngine
    participant CatalogManager
    participant SysCatalog
    participant Table0 as Table 0
    
    User->>SqlEngine: CREATE TYPE email_address AS VARCHAR(255)
    SqlEngine->>SqlEngine: Parse DDL statement
    SqlEngine->>CatalogManager: register_custom_type(name, base_type)
    
    CatalogManager->>CatalogManager: Validate type name uniqueness
    CatalogManager->>CatalogManager: Assign new TypeId
    
    CatalogManager->>SysCatalog: Insert CustomTypeMeta record
    SysCatalog->>SysCatalog: Build RecordBatch with metadata
    SysCatalog->>SysCatalog: Add MVCC columns (created_by)
    
    SysCatalog->>Table0: append(batch)
    Table0->>Table0: Write to ColumnStore
    Table0-->>SysCatalog: Success
    
    SysCatalog-->>CatalogManager: TypeId
    CatalogManager-->>SqlEngine: Result
    SqlEngine-->>User: Type created successfully

Type Registration Flow

Registration Steps

  1. DDL statement parsed by SQL layer
  2. CatalogManager validates type name uniqueness
  3. New TypeId allocated
  4. CustomTypeMeta record constructed
  5. Metadata written to system catalog (Table 0)
  6. MVCC columns (created_by, deleted_by) added automatically
  7. Type becomes available for schema definitions
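The registration steps can be sketched as a single method. This is a minimal in-memory model, assuming a Vec in place of Table 0 and a counter for TypeId allocation; CatalogManager, CatalogError, record_count, and the method signature here are hypothetical and only follow the numbered steps.

```rust
// Hypothetical stand-in for arrow_schema::DataType.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int64,
    Utf8,
}

// Only the fields this flow touches; see the CustomTypeMeta field table.
#[derive(Debug, Clone)]
pub struct CustomTypeMeta {
    pub type_id: u64,
    pub type_name: String,
    pub base_type: DataType,
    pub created_by: u64,         // MVCC: creating transaction
    pub deleted_by: Option<u64>, // MVCC: None until dropped
}

#[derive(Debug, PartialEq)]
pub enum CatalogError {
    DuplicateType(String),
}

// Hypothetical manager: a Vec stands in for Table 0, a counter for TypeIds.
#[derive(Default)]
pub struct CatalogManager {
    table0: Vec<CustomTypeMeta>,
    next_type_id: u64,
}

impl CatalogManager {
    pub fn register_custom_type(
        &mut self,
        name: &str,
        base_type: DataType,
        txn: u64,
    ) -> Result<u64, CatalogError> {
        // Step 2: reject the name if a live (not soft-deleted) type already uses it.
        // Exact, case-sensitive matching is an assumption of this sketch.
        if self
            .table0
            .iter()
            .any(|m| m.type_name == name && m.deleted_by.is_none())
        {
            return Err(CatalogError::DuplicateType(name.to_string()));
        }
        // Step 3: allocate a new TypeId.
        self.next_type_id += 1;
        let type_id = self.next_type_id;
        // Steps 4-6: build the record with its MVCC columns and append it.
        self.table0.push(CustomTypeMeta {
            type_id,
            type_name: name.to_string(),
            base_type,
            created_by: txn,
            deleted_by: None,
        });
        // Step 7: the name is now available to later schema definitions.
        Ok(type_id)
    }

    // Number of CustomTypeMeta records currently stored.
    pub fn record_count(&self) -> usize {
        self.table0.len()
    }
}
```

Because the uniqueness check ignores soft-deleted rows, this sketch would allow a dropped name to be registered again under a new TypeId; whether LLKV permits that is not stated in the text.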

Sources: llkv-table/src/lib.rs:54, llkv-table/src/lib.rs:81-85

Type Lifecycle Operations

Type States

  • Registered: Type defined but not yet used in any table schema
  • InUse: One or more columns reference this type
  • Modified: Type definition updated (if ALTER TYPE is supported)
  • Deprecated: Type soft-deleted via MVCC (deleted_by set)
  • Deleted: Type permanently removed from the catalog
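One way to make these states concrete is to derive them from a catalog row and the column definitions that reference the type. The sketch below is hypothetical: TypeRecord and classify are illustrative names, Modified is left out because ALTER TYPE support is uncertain, and a missing row is treated as Deleted.

```rust
// Lifecycle states from the list above. `Modified` is omitted because it
// depends on ALTER TYPE support, which the text leaves open.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TypeState {
    Registered,
    InUse,
    Deprecated,
    Deleted,
}

// Hypothetical minimal view of a CustomTypeMeta row.
pub struct TypeRecord<'a> {
    pub type_name: &'a str,
    pub deleted_by: Option<u64>,
}

// Derives a type's state from its catalog row (None means the row is no
// longer in Table 0) and the type names declared by existing columns.
pub fn classify(record: Option<&TypeRecord>, column_types: &[&str]) -> TypeState {
    match record {
        None => TypeState::Deleted,
        // A set deleted_by wins over usage: the type is soft-deleted.
        Some(rec) if rec.deleted_by.is_some() => TypeState::Deprecated,
        Some(rec) if column_types.iter().any(|&t| t == rec.type_name) => TypeState::InUse,
        Some(_) => TypeState::Registered,
    }
}
```

The match arms encode a precedence (deleted before in-use before registered) that is an assumption of this sketch; the list itself does not say how overlapping states combine.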

Sources: llkv-table/src/lib.rs:81-85

Type Resolution and Schema Integration

When creating tables or altering schemas, the type resolver translates custom type names to Arrow DataType instances. This resolution happens during DDL execution and schema validation.

Resolution Process

graph LR
    subgraph "Table Definition"
        COL1["Column: 'email'\nType: 'email_address'"]
        COL2["Column: 'age'\nType: 'INT'"]
    end

    subgraph "Type Resolution"
        RESOLVER["Type Resolver"]
        CACHE["Type Cache"]
    end

    subgraph "System Catalog"
        CUSTOM["CustomTypeMeta\nemail_address → Utf8"]
        BUILTIN["Built-in Types\nINT → Int32"]
    end

    subgraph "Arrow Schema"
        FIELD1["Field: 'email'\nDataType: Utf8"]
        FIELD2["Field: 'age'\nDataType: Int32"]
    end

    COL1 --> RESOLVER
    COL2 --> RESOLVER
    RESOLVER --> CACHE
    CACHE --> CUSTOM
    RESOLVER --> BUILTIN
    CUSTOM --> FIELD1
    BUILTIN --> FIELD2
    FIELD1 --> SCHEMA["Arrow Schema"]
    FIELD2 --> SCHEMA

  1. Column definition specifies type by name
  2. Type resolver checks cache for previous resolution
  3. Cache miss triggers lookup in SysCatalog
  4. Custom type metadata retrieved from Table 0
  5. Base Arrow DataType extracted
  6. Type constraints/metadata applied
  7. Resolved type cached for subsequent use
  8. Arrow Field constructed with final type
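The resolution steps map onto a small resolver with a cache. In this sketch, Catalog stands in for SysCatalog, builtin() encodes an assumed SQL-to-Arrow mapping, and DataType is a reduced stand-in for Arrow's enum; constraint handling (step 6) is not modeled, and none of these names are taken from LLKV's code.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for arrow_schema::DataType.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int32,
    Int64,
    Utf8,
    Date32,
}

// Stand-in for SysCatalog: custom definitions plus a counter of Table 0 reads.
pub struct Catalog {
    custom: HashMap<String, DataType>,
    pub lookups: usize,
}

impl Catalog {
    pub fn new(defs: &[(&str, DataType)]) -> Self {
        let custom = defs
            .iter()
            .map(|(name, dt)| (name.to_string(), dt.clone()))
            .collect();
        Catalog { custom, lookups: 0 }
    }

    // Step 4: one simulated read of CustomTypeMeta from Table 0.
    fn get_custom_type(&mut self, name: &str) -> Option<DataType> {
        self.lookups += 1;
        self.custom.get(name).cloned()
    }
}

// Built-in SQL names resolve without touching the catalog (assumed mapping).
fn builtin(name: &str) -> Option<DataType> {
    match name.to_ascii_uppercase().as_str() {
        "INT" | "INTEGER" => Some(DataType::Int32),
        "BIGINT" => Some(DataType::Int64),
        "TEXT" | "VARCHAR" => Some(DataType::Utf8),
        "DATE" => Some(DataType::Date32),
        _ => None,
    }
}

// Hypothetical resolver holding the resolution cache.
#[derive(Default)]
pub struct TypeResolver {
    cache: HashMap<String, DataType>,
}

impl TypeResolver {
    pub fn resolve(&mut self, name: &str, catalog: &mut Catalog) -> Option<DataType> {
        if let Some(dt) = builtin(name) {
            return Some(dt);
        }
        // Step 2: a cache hit avoids the catalog entirely.
        if let Some(dt) = self.cache.get(name) {
            return Some(dt.clone());
        }
        // Steps 3-5: cache miss triggers a catalog lookup; unknown names yield None.
        let dt = catalog.get_custom_type(name)?;
        // Step 7: remember the resolution for later columns.
        self.cache.insert(name.to_string(), dt.clone());
        Some(dt)
    }
}
```

Built-in names are checked before the cache, mirroring the diagram's direct Resolver-to-Built-in edge; as a consequence, a custom type named like a built-in would be shadowed in this sketch. Failed lookups are not cached, so an unknown name costs a catalog read each time.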

Sources: llkv-table/src/lib.rs:43, llkv-table/src/lib.rs:54

DDL Operations for Custom Types

CREATE TYPE Statement

Processing Steps

| Step | Action | Component |
|---|---|---|
| 1 | Parse SQL | SqlEngine (sqlparser) |
| 2 | Extract type definition | SQL preprocessing layer |
| 3 | Validate base type | CatalogManager |
| 4 | Check name uniqueness | SysCatalog query |
| 5 | Allocate TypeId | CatalogManager |
| 6 | Construct metadata | CustomTypeMeta builder |
| 7 | Write to catalog | SysCatalog::append() |
| 8 | Update cache | Type resolver cache |
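Steps 1 through 3 (parse, extract, validate the base type) can be illustrated with a deliberately tiny parser. The table names sqlparser for step 1; the hand-rolled parse_create_type below is a dependency-free stand-in, and its set of supported base types, the VARCHAR(n) handling, and the max_length field are assumptions for illustration.

```rust
// Hypothetical stand-in for arrow_schema::DataType.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int32,
    Int64,
    Utf8,
    Date32,
}

#[derive(Debug, PartialEq)]
pub struct CreateType {
    pub name: String,
    pub base_type: DataType,
    // Arrow's Utf8 has no length parameter, so VARCHAR(n) keeps n separately
    // (e.g. for the metadata map); an assumption of this sketch.
    pub max_length: Option<u32>,
}

// Step 3: validate the base type and map it to an Arrow-style type.
fn parse_base_type(sql: &str) -> Result<(DataType, Option<u32>), String> {
    let upper = sql.trim().to_ascii_uppercase();
    if let Some(rest) = upper.strip_prefix("VARCHAR") {
        let rest = rest.trim();
        if rest.is_empty() {
            return Ok((DataType::Utf8, None));
        }
        let inner = rest
            .strip_prefix('(')
            .and_then(|r| r.strip_suffix(')'))
            .ok_or_else(|| format!("malformed VARCHAR: {sql}"))?;
        let n: u32 = inner
            .trim()
            .parse()
            .map_err(|_| format!("bad VARCHAR length: {inner}"))?;
        return Ok((DataType::Utf8, Some(n)));
    }
    match upper.as_str() {
        "INT" | "INTEGER" => Ok((DataType::Int32, None)),
        "BIGINT" => Ok((DataType::Int64, None)),
        "TEXT" => Ok((DataType::Utf8, None)),
        "DATE" => Ok((DataType::Date32, None)),
        other => Err(format!("unsupported base type: {other}")),
    }
}

// Steps 1-2: split `CREATE TYPE <name> AS <base_type>` into its parts.
pub fn parse_create_type(stmt: &str) -> Result<CreateType, String> {
    let tokens: Vec<&str> = stmt
        .trim()
        .trim_end_matches(';')
        .split_whitespace()
        .collect();
    if tokens.len() < 5
        || !tokens[0].eq_ignore_ascii_case("CREATE")
        || !tokens[1].eq_ignore_ascii_case("TYPE")
        || !tokens[3].eq_ignore_ascii_case("AS")
    {
        return Err("expected: CREATE TYPE <name> AS <base_type>".to_string());
    }
    let (base_type, max_length) = parse_base_type(&tokens[4..].join(" "))?;
    Ok(CreateType { name: tokens[2].to_string(), base_type, max_length })
}
```

With this sketch, `CREATE TYPE email_address AS VARCHAR(255);` yields a Utf8 base type with a max_length of 255, while an unknown base such as BLOB is rejected at step 3 before any catalog write.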

Sources: llkv-table/src/lib.rs:54, llkv-table/src/lib.rs:68

DROP TYPE Statement

Deletion Process

MVCC Soft Delete

  • Type records are not physically removed
  • deleted_by field set to the dropping transaction's ID
  • Historical queries can still see old type definitions
  • New schemas cannot reference deleted types
  • Cache invalidation makes the drop visible to subsequent type resolutions immediately
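The soft-delete behavior can be sketched with a visibility check over the MVCC columns. The rule used here (created at or before the snapshot, and not deleted at or before it) is a common simplification that assumes monotonically increasing transaction IDs and ignores commit status; TypeRow, visible, drop_custom_type, and lookup are hypothetical names.

```rust
// Hypothetical minimal CustomTypeMeta row with its MVCC columns.
#[derive(Debug, Clone)]
pub struct TypeRow {
    pub type_name: String,
    pub created_by: u64,
    pub deleted_by: Option<u64>,
}

// Assumed snapshot rule: transaction IDs increase monotonically, and a row is
// visible to `snapshot` if it was created at or before it and not yet deleted
// as of it. Real MVCC also tracks commit/abort status.
pub fn visible(row: &TypeRow, snapshot: u64) -> bool {
    row.created_by <= snapshot && row.deleted_by.map_or(true, |d| d > snapshot)
}

// Soft delete: stamp deleted_by on the live row instead of removing it.
pub fn drop_custom_type(rows: &mut [TypeRow], name: &str, txn: u64) -> bool {
    match rows
        .iter_mut()
        .find(|r| r.type_name == name && r.deleted_by.is_none())
    {
        Some(row) => {
            row.deleted_by = Some(txn);
            true
        }
        None => false,
    }
}

// Lookup as seen by a given snapshot; older snapshots still see the type.
pub fn lookup<'a>(rows: &'a [TypeRow], name: &str, snapshot: u64) -> Option<&'a TypeRow> {
    rows.iter().find(|r| r.type_name == name && visible(r, snapshot))
}
```

After a drop at transaction 10, a snapshot taken at transaction 7 still resolves the type while snapshots at 10 or later do not, and the row count in the catalog is unchanged.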

Sources: llkv-table/src/lib.rs:81-85

Storage in System Catalog

Custom type metadata is stored in the system catalog (Table 0) alongside other metadata types like TableMeta, ColMeta, and constraint information.

System Catalog Schema for CustomTypeMeta

| Column Name | Type | Description |
|---|---|---|
| row_id | RowId | Unique row identifier |
| metadata_type | Utf8 | Discriminator: “CustomType” |
| type_id | UInt64 | Custom type identifier |
| type_name | Utf8 | User-defined type name |
| base_type_json | Utf8 | Serialized Arrow DataType |
| nullable | Boolean | Nullability flag |
| metadata_json | Utf8 | Additional metadata as JSON |
| created_at | Timestamp | Creation timestamp |
| created_by | UInt64 | Creating transaction ID |
| deleted_by | UInt64 (nullable) | Deleting transaction ID (MVCC) |

Storage Characteristics

  • Custom types stored in same table as other metadata
  • metadata_type column distinguishes record types
  • Arrow JSON serialization for base type persistence
  • Metadata JSON for extensible properties
  • MVCC columns enable temporal queries
  • Indexed by type_name for fast lookup
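Because custom types share Table 0 with other metadata, a name index has to filter on the metadata_type discriminator. The sketch below builds a type_name index over live CustomType rows; CatalogRow and build_type_index are hypothetical, and the row is reduced to a few of the columns listed in the schema table.

```rust
use std::collections::HashMap;

// One row of Table 0, reduced to the columns used here (hypothetical layout).
#[derive(Debug, Clone)]
pub struct CatalogRow {
    pub row_id: u64,
    pub metadata_type: &'static str, // e.g. "TableMeta", "ColMeta", "CustomType"
    pub type_name: Option<String>,   // populated only for CustomType rows
    pub base_type_json: Option<String>,
    pub deleted_by: Option<u64>,
}

// Builds a type_name -> row_id index over live CustomType rows only,
// skipping other metadata kinds via the discriminator column.
pub fn build_type_index(rows: &[CatalogRow]) -> HashMap<String, u64> {
    rows.iter()
        .filter(|r| r.metadata_type == "CustomType" && r.deleted_by.is_none())
        .filter_map(|r| r.type_name.clone().map(|n| (n, r.row_id)))
        .collect()
}
```

Indexing only live rows suits new schema definitions; reads at an older snapshot would still need an MVCC-aware scan that includes soft-deleted rows.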

Sources: llkv-table/src/lib.rs:81-85

Type Catalog Query Examples

Retrieving Custom Type Definition

Query Optimization

  • Type name indexed for fast lookups
  • MVCC predicate filters deleted types
  • Result caching minimizes catalog queries
  • Batch operations for multiple type resolutions
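Batch resolution amortizes catalog access: one scan answers many names, duplicate names cost nothing extra, and the MVCC predicate drops soft-deleted definitions. resolve_batch, TypeRow, and the reduced DataType are hypothetical, and the linear scan stands in for whatever indexed access LLKV actually uses.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-in for arrow_schema::DataType.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int64,
    Utf8,
}

// Hypothetical reduced CustomTypeMeta row.
pub struct TypeRow {
    pub type_name: String,
    pub base_type: DataType,
    pub deleted_by: Option<u64>,
}

// Resolves many names with a single pass over the catalog rows instead of one
// lookup per name; duplicates in `names` collapse in the HashSet.
pub fn resolve_batch(rows: &[TypeRow], names: &[&str]) -> HashMap<String, DataType> {
    let wanted: HashSet<&str> = names.iter().copied().collect();
    let mut out = HashMap::new();
    for row in rows {
        // MVCC predicate: ignore soft-deleted definitions.
        if row.deleted_by.is_none() && wanted.contains(row.type_name.as_str()) {
            out.insert(row.type_name.clone(), row.base_type.clone());
        }
    }
    out
}
```

Names that are missing or soft-deleted are simply absent from the result, leaving the caller to report them as unknown types.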

Sources: llkv-table/src/lib.rs:54, llkv-table/src/lib.rs:81-85

graph TB
    subgraph "User Operations"
        CREATE["CREATE TYPE"]
        DROP["DROP TYPE"]
        ALTER["ALTER TYPE"]
        QUERY["Type Resolution"]
    end

    subgraph "CatalogManager"
        API["CatalogManager API"]
        VALIDATION["Type Validation"]
        DEPENDENCY["Dependency Tracking"]
        CACHE_MGR["Cache Manager"]
    end

    subgraph "SysCatalog"
        SYSCAT["SysCatalog"]
        TABLE0["Table 0"]
    end

    subgraph "Type System"
        RESOLVER["Type Resolver"]
        ARROW["Arrow DataType"]
    end

    CREATE --> API
    DROP --> API
    ALTER --> API
    QUERY --> API

    API --> VALIDATION
    API --> DEPENDENCY
    API --> CACHE_MGR

    VALIDATION --> SYSCAT
    DEPENDENCY --> SYSCAT
    CACHE_MGR --> RESOLVER

    SYSCAT --> TABLE0
    RESOLVER --> ARROW

    style API fill:#f9f9f9
    style SYSCAT fill:#f9f9f9

Integration with CatalogManager

The CatalogManager provides high-level operations for custom type management, coordinating between the SQL layer, type resolver, and system catalog.

CatalogManager Responsibilities

| Function | Description |
|---|---|
| register_custom_type() | Create new type definition |
| get_custom_type() | Retrieve type by name or ID |
| list_custom_types() | Query all non-deleted types |
| drop_custom_type() | Mark type as deleted (MVCC) |
| resolve_type_name() | Translate name to Arrow type |
| check_type_dependencies() | Find columns using type |
| invalidate_type_cache() | Clear cached type definitions |
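Two of these responsibilities, check_type_dependencies() and invalidate_type_cache(), can be sketched as free functions. ColumnRef is a hypothetical view of column metadata and the cache is modeled as a plain HashMap; the text does not say whether a DROP TYPE is refused while dependencies exist, so this sketch only reports them.

```rust
use std::collections::HashMap;

// Hypothetical column record as it might be read from ColMeta rows.
pub struct ColumnRef {
    pub table: String,
    pub column: String,
    pub declared_type: String,
}

// check_type_dependencies(): every (table, column) declared with `type_name`.
pub fn check_type_dependencies(columns: &[ColumnRef], type_name: &str) -> Vec<(String, String)> {
    columns
        .iter()
        .filter(|c| c.declared_type == type_name)
        .map(|c| (c.table.clone(), c.column.clone()))
        .collect()
}

// invalidate_type_cache(): forget one cached resolution so the next lookup
// reads Table 0 again. Returns whether an entry was actually removed.
pub fn invalidate_type_cache<T>(cache: &mut HashMap<String, T>, type_name: &str) -> bool {
    cache.remove(type_name).is_some()
}
```

A drop path built on these would typically check dependencies first and invalidate the cache only after the deleted_by stamp is written, so no resolver keeps serving the dropped definition.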

Sources: llkv-table/src/lib.rs:54, llkv-table/src/lib.rs:68

Type System Best Practices

Naming Conventions

Recommended Type Names

  • Use descriptive, domain-specific names: customer_id, email_address, currency_amount
  • Avoid generic names that collide with SQL keywords or built-in type names, such as text, number, and date
  • Use snake_case for consistency with column names
  • Include units or constraints in name: price_usd, duration_seconds

Type Reusability

When to Create Custom Types

  • Domain concepts appearing in multiple tables
  • Types with specific constraints (precision, length)
  • Semantic meaning beyond base type
  • Types requiring validation or transformation logic

When to Use Base Types

  • One-off column definitions
  • Standard SQL types without constraints
  • Internal implementation columns

Performance Considerations

Cache Behavior

  • Type resolution results are cached per session
  • First resolution incurs catalog lookup cost
  • Subsequent resolutions served from memory
  • Cache invalidation on type modifications

Query Impact

  • Custom types add one level of indirection, resolved during DDL execution and schema validation
  • Physical storage uses the base Arrow types
  • Query execution therefore pays no per-row runtime penalty
  • Query plans operate on resolved types

Sources: llkv-table/src/lib.rs:54, llkv-table/src/lib.rs:81-85
