compositional-acset-comparison

star 25

Compositional algorithm and data analysis via algebraic databases

plurigrid By plurigrid schedule Updated 2/16/2026

name: compositional-acset-comparison description: Compositional algorithm and data analysis via algebraic databases version: 1.0.0

Compositional ACSet Comparison Skill

Trit: 0 (ERGODIC - Coordinator) Color: #26D826 (Green) Domain: Compositional algorithm/data analysis via algebraic databases

Homoiconic Insight

In self-hosted Lisps, the boundary between data structures and algorithms dissolves:

  • Code is data, data is code (homoiconicity)
  • Evaluation time is phase-scoped (RED/BLUE/GREEN gadgets)
  • Entanglement avoided by leaving phases open until explicitly closed
  • Compositional structure preserved across algorithm ↔ data boundary

Overview

Compare data structures and their properties (density/sparsity, dynamic/static, versioning strategies) using the richness afforded by ACSets. Uses Gay.jl-aided superrandom walks for deterministic exploration of comparison dimensions.

Canonical Triads

schema-validation (-1) ⊗ compositional-acset-comparison (0) ⊗ gay-mcp (+1) = 0 ✓  [Property Analysis]
three-match (-1) ⊗ compositional-acset-comparison (0) ⊗ koopman-generator (+1) = 0 ✓  [Dynamic Traversal]
temporal-coalgebra (-1) ⊗ compositional-acset-comparison (0) ⊗ oapply-colimit (+1) = 0 ✓  [Versioning]
polyglot-spi (-1) ⊗ compositional-acset-comparison (0) ⊗ gay-mcp (+1) = 0 ✓  [Homoiconic Interop]

Golden Thread Walk Dimensions

Each dimension is explored via φ-angle (137.508°) golden spiral for maximal dispersion:

Step Dimension Hex Color Hue
1 Storage Hierarchy #EE2B2B
2 Density/Sparsity #2BEE64 137.51°
3 Dynamic/Static #9D2BEE 275.02°
4 Versioning Strategy #EED52B 52.52°
5 Traversal Patterns #2BCDEE 190.03°
6 Index Structures #EE2B94 327.54°
7 Compression #5BEE2B 105.05°
8 Query Model #332BEE 242.55°
9 Embedding Support #EE6C2B 20.06°
10 Interoperability #2BEEA5 157.57°
11 Concurrency #DE2BEE 295.08°
12 Memory Model #C5EE2B 72.59°

Comparison Matrix: DuckDB vs LanceDB

Dimension 1: Storage Hierarchy (#EE2B2B)

DuckDB                          LanceDB
──────                          ───────
Table                           Database
  └─RowGroup (122K rows)          └─Table
      └─Column                        └─Manifest (version)
          └─Segment                       └─Fragment
              └─Block                         └─Column
                                                  └─VectorColumn

ACSet Morphism Depth:

  • DuckDB: 4 levels (Table→RowGroup→Column→Segment)
  • LanceDB: 5 levels (Database→Table→Manifest→Fragment→Column)

Dimension 2: Density/Sparsity (#2BEE64)

Property DuckDB LanceDB
Default Dense columnar Dense Arrow arrays
Sparse Support Via NULL bitmask Via Arrow validity bitmask
Vector Sparsity N/A Sparse via IVF partitioning
Storage Efficiency ALP, ZSTD compression Lance columnar format
ACSet Rep DenseFinColumn DenseFinColumn with VectorColumn extension

Density Formula:

density(acset, obj) = nparts(acset, obj) / theoretical_max(acset, obj)
# DuckDB Segment: ~2048 rows per vector batch
# LanceDB Fragment: variable, optimized for vector search

Dimension 3: Dynamic/Static (#9D2BEE)

Property DuckDB LanceDB
Schema Evolution ALTER TABLE Manifest versioning
Row Updates In-place (TRANSIENT→PERSISTENT) Append + compaction
Index Updates Dynamic B-Tree/ART Rebuild IVF partitions
ACSet Mutation set_subpart!, rem_part! Append-only, version chains

State Machine:

DuckDB Segment: TRANSIENT ⟷ PERSISTENT (bidirectional)
LanceDB Manifest: V1 → V2 → V3 → ... (append-only chain)

Dimension 4: Versioning Strategy (#EED52B) ⭐ Lance SDK 1.0.0

Critical Update (December 15, 2025): Lance SDK adopts SemVer 1.0.0

Component Versioning Strategy
Lance SDK SemVer 1.0.0 MAJOR.MINOR.PATCH
Lance File Format 2.1 Binary compatibility, independent
Lance Table Format Feature flags Full backward compat, no linear versions
Lance Namespace Spec Per-operation Iceberg REST Catalog style

Key Insight: Breaking SDK changes will NOT invalidate existing Lance data.

# ACSet representation of versioning strategies
@present SchVersioning(FreeSchema) begin
  SDKVersion::Ob      # SemVer (1.0.0)
  FileFormat::Ob      # Binary compat (2.1)
  TableFormat::Ob     # Feature flags
  NamespaceSpec::Ob   # Per-operation
  
  # Morphisms: SDK ≠ Format
  sdk_file::Hom(SDKVersion, FileFormat)      # Many-to-one
  file_table::Hom(FileFormat, TableFormat)   # Independent
  table_ns::Hom(TableFormat, NamespaceSpec)  # Independent
end

DuckDB Versioning:

  • Temporal tables via VERSION AT
  • Extension versioning separate from core

Dimension 5: Traversal Patterns (#2BCDEE)

Pattern DuckDB LanceDB
Sequential Scan RowGroup→Column→Segment Fragment→Column
Index Scan ART/B-Tree navigation IVF partition probe
Vector Search N/A (extension) Centroid→Partition→Rows
Time Travel FOR SYSTEM_TIME AS OF checkout(version)

ACSet Incident Queries:

# DuckDB: Find all segments in a column
incident(duckdb_acset, col_id, :column)

# LanceDB: Find all centroids for an index
incident(lancedb_acset, idx_id, :partition_index) |>
  flatmap(p -> incident(lancedb_acset, p, :centroid_partition))

Dimension 6: Index Structures (#EE2B94)

Index Type DuckDB LanceDB
Primary None (heap) None (Lance format)
Secondary ART (Radix Tree) Scalar indexes
Vector Extension (vss) IVF_PQ, IVF_HNSW_SQ, IVF_HNSW_PQ
Full-Text Extension (fts) N/A

ACSet Index Representation:

# LanceDB vector index hierarchy
VectorIndex → Partition → Centroid
    ↓
index_column → VectorColumn → Column

Dimension 7: Compression (#5BEE2B)

Algorithm DuckDB LanceDB
Numeric ALP (Adaptive Lossless) Arrow encoding
String Dictionary, FSST Dictionary
General ZSTD, LZ4 ZSTD
Vector N/A PQ (Product Quantization)

Dimension 8: Query Model (#332BEE)

Aspect DuckDB LanceDB
Language SQL Python/Rust API + SQL filter
Optimization Volcano/push-based Vector-first + filter
Execution Vectorized (2048 batch) Arrow RecordBatch
Parallelism Morsel-driven Partition-parallel

Dimension 9: Embedding Support (#EE6C2B)

Feature DuckDB LanceDB
Native No Yes (FixedSizeList)
Generation UDF/Extension EmbeddingFunction registry
Storage ARRAY type VectorColumn
Search Extension (vss) Native (IVF, HNSW)

Dimension 10: Interoperability (#2BEEA5)

Format DuckDB LanceDB
Arrow Full support Native (Lance = Arrow extension)
Parquet Read/Write Read (convert to Lance)
CSV/JSON Read/Write Via Arrow
ACSets Via Tables.jl Via Arrow → Tables.jl

Cross-Language (from ACSets Intertypes):

# Generate interoperable types
generate_module(DuckDBACSet, [PydanticTarget, JacksonTarget])
generate_module(LanceDBACSet, [PydanticTarget, JacksonTarget])

Dimension 11: Concurrency (#DE2BEE)

Aspect DuckDB LanceDB
Model MVCC Optimistic (manifest-based)
Writers Single (or WAL) Single (append)
Readers Unlimited concurrent Unlimited concurrent
Isolation Snapshot Version snapshot

Dimension 12: Memory Model (#C5EE2B)

Aspect DuckDB LanceDB
Buffer Pool BufferManager Memory-mapped Arrow
Eviction LRU OS page cache
Allocation Unified allocator Arrow allocator
Out-of-Core Automatic spill Lazy loading

Interleaved 3-Stream Comparison

Using GF(3) conservation for balanced parallel analysis:

Stream 1 (Blue, -1): Validation/Constraints
  #31945E → #B3DA86 → #8810F2 → #2F5194 → #2452AA → #245FB4

Stream 2 (Green, 0): Coordination/Transport
  #6D59D2 → #9E2981 → #72E24F → #31C5B4 → #C04DDD → #1C8EEE

Stream 3 (Red, +1): Generation/Composition
  #E22FA7 → #E812C8 → #6F68E6 → #25D840 → #DA387F → #A82358

Crystal Family Analogy

Data structures map to crystal symmetry:

Crystal Family Symmetry DuckDB Analog LanceDB Analog
Cubic (#9E94DD) Order 48 RowGroup uniformity Fragment uniformity
Hexagonal (#65F475) Order 24 Column types Vector dimensions
Tetragonal (#E764F1) Order 16 Segment blocking Partition structure
Orthorhombic (#2ADC56) Order 8 Type system Index types
Monoclinic (#CD7B61) Order 4 Compression Quantization
Triclinic (#E4338F) Order 2 Raw storage Raw Arrow

Hierarchical Control Palette

Powers PCT cascade for harmonious comparison:

Level 5 (Program): "Compare DuckDB vs LanceDB"
    ↓ sets reference for
Level 4 (Transition): Dimension sequence [30° steps]
    ↓ sets reference for
Level 3 (Configuration): Property relationships
    ↓ sets reference for
Level 2 (Sensation): Individual metrics
    ↓ sets reference for
Level 1 (Intensity): Numeric values

Colors: #B322C0 → #D5268C → #DC3946 → #DF884A → #E0D551 → #A3E04E

XY Model Phenomenology

At τ=0.5 (ordered phase, τ < τ_c=0.893):

  • Smooth field, defects bound in pairs
  • High valence, disentangled
  • Antivortex at (4,3): #C33567

Interpretation: Both DuckDB and LanceDB are in "ordered phase" - mature, production-ready systems with well-defined structures.

Usage

using ACSets, Catlab

# Load both schemas
include("DuckDBACSet.jl")
include("LanceDBACSet.jl")

# Compare morphism structures
compare_schemas(SchDuckDB, SchLanceDB)

# Analyze density
density_analysis = map([SchDuckDB, SchLanceDB]) do sch
  Dict(ob => sparsity_metric(sch, ob) for ob in obs(sch))
end

# Traverse with Gay.jl colors
for (i, dimension) in enumerate(DIMENSIONS)
  color = gay_color_at(1000000, i)
  analyze_dimension(dimension, color)
end

Skill Files

File Purpose Gay.jl Seed
DuckDBACSet.jl Schema for DuckDB storage layer 1000000
LanceDBACSet.jl Schema for LanceDB vector store 1000000
IrreversibleMorphisms.jl Analysis of lossy morphisms 2000000
SideBySideComparison.jl Visual comparison tables 3000000
ComparisonUtils.jl 12-dimension comparison utilities 1000000
GhristCoverage.jl Persistent homology coverage analysis 4000000
ColoringFunctor.jl Schema coloring + GF(3) verification 4000000
GeometricMorphism.jl Presheaf topos translation analysis 4000000

Ghrist Persistent Homology Integration

Based on de Silva & Ghrist "Coverage in Sensor Networks via Persistent Homology":

AM Radio Coverage Analogy:

  • Radio stations = Schema objects (Table, Column, etc.)
  • Coverage radius = Morphism composability range
  • Signal overlap = Translatable concepts between schemas
  • Dead zones = Irreversible information loss

Betti Numbers for Schemas:

  • β₀: Connected components (isolated subsystems)
  • β₁: Coverage holes (information flow gaps)
  • β₂: Enclosed voids (unreachable regions)

Persistent Holes (never die):

  • 🔴 parent_manifest: Temporal irreversibility (version chain)
  • 🔴 source_column: Semantic irreversibility (embedding loss)

Geometric Morphism Analysis

For presheaf topoi PSh(SchDuckDB) and PSh(SchLanceDB):

Essential Image (lossless translation):

  • Table ↔ Table ✓
  • Column ↔ Column ✓

Partial Coverage (lossy translation):

  • RowGroup ~ Fragment
  • VectorColumn → Column (loses vector semantics)

Dead Zones (no translation):

  • Segment → ??? (DuckDB-only)
  • Manifest ← ??? (LanceDB-only)
  • VectorIndex ← ??? (LanceDB-only)

References

Scientific Skill Interleaving

This skill connects to the K-Dense-AI/claude-scientific-skills ecosystem:

Annotated Data

  • anndata [○] via bicomodule
    • Hub for annotated matrices

Bibliography References

  • general: 734 citations in bib.duckdb

Cat# Integration

This skill maps to Cat# = Comod(P) as a bicomodule in the equipment structure:

Trit: 0 (ERGODIC)
Home: Prof
Poly Op: ⊗
Kan Role: Adj
Color: #26D826

GF(3) Naturality

The skill participates in triads satisfying:

(-1) + (0) + (+1) ≡ 0 (mod 3)

This ensures compositional coherence in the Cat# equipment structure.

Install via CLI
npx skills add https://github.com/plurigrid/asi --skill compositional-acset-comparison
Repository Details
star Stars 25
call_split Forks 7
navigation Branch main
article Path SKILL.md
More from Creator