---
name: uni-db
description: >-
  Comprehensive developer reference for uni-db, an embedded multi-model graph database with OpenCypher querying, Locy logic programming, vector/hybrid search, and Pydantic OGM. ALWAYS use this skill when: writing code that imports uni_db or uni_pydantic, writing Cypher queries for uni-db, writing Locy rules, configuring vector or hybrid search, defining uni-db schemas, using graph algorithms, or answering questions about uni-db architecture, APIs, or features. Trigger on mentions of: uni-db, uni, Cypher queries in graph DB context, Locy rules, vector search with graph data, Pydantic OGM for graphs, graph algorithms, ALONG/FOLD/BEST BY/DERIVE/ASSUME/ABDUCE, embedded graph database.
---
uni-db Developer Skill
What is uni-db?
uni-db is an embedded, serverless multi-model graph database (graph + vector + document + columnar) that runs inside your process with no server required. It supports OpenCypher queries with extensions for vector search, full-text search, DDL, and time travel. Locy is its Datalog-inspired logic programming language for recursive rules, probabilistic reasoning, abductive inference, and inline neural predicates with calibration (CREATE MODEL / CALIBRATE / VALIDATE / NeuralProvenance in EXPLAIN). APIs are available in Python (sync/async via PyO3) and Rust (async/blocking), with a Pydantic OGM layer. Built-in capabilities include 8 vector index algorithms (Flat, IVF-Flat/SQ/PQ/RQ, HNSW-Flat/SQ/PQ) with scalar, product, and RaBitQ quantization, 4 scalar index types (BTree, Hash, Bitmap, LabelList), BM25 full-text search, hybrid search with RRF fusion, and 36+ graph algorithms.
Architecture: Three Scopes
uni-db uses three scoping levels that separate lifecycle, reads, and writes:
Uni / AsyncUni (database handle)
+-- Factory: open(), session(), schema()
+-- Admin: flush(), snapshots, indexes, compaction
+-- NO direct query or mutation
Session / AsyncSession (read scope)
+-- Parameters: set(), get()
+-- Reads: query(), locy()
+-- Analysis: explain(), profile()
+-- Factory: tx() -> Transaction
Transaction / AsyncTransaction (write scope)
+-- Reads: query() (sees uncommitted writes)
+-- Writes: execute(), bulk loading
+-- Locy: locy() (DERIVE auto-applies mutations)
+-- Lifecycle: commit(), rollback()
Pattern: db = Uni.open(path) -> session = db.session() -> tx = session.tx() -> tx.execute(...) -> tx.commit()
Sync/async duality: every type has an async counterpart (Uni / AsyncUni, Session / AsyncSession, Transaction / AsyncTransaction). Shared types (results, data classes, exceptions) are the same for both.
Single-writer, multi-reader: only one Transaction can be open at a time per Uni instance. Multiple Sessions can read concurrently with snapshot isolation.
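A toy model makes the single-writer, multi-reader rule concrete. This sketch is plain Python, not uni-db code, and `WriteScope` and its methods are hypothetical. Writers share one non-blocking lock, while readers only pin the last committed version. The sketch fails fast when a second writer arrives. This document does not say whether uni-db blocks or returns an error in that case, and the sketch should not be read as answering that question.

```python
import threading


class WriteScope:
    """Toy single-writer, multi-reader handle (hypothetical, not uni_db)."""

    def __init__(self) -> None:
        self._writer_lock = threading.Lock()
        self._version = 0  # last committed version visible to readers

    def snapshot(self) -> int:
        # Readers never take the lock; they just pin the committed version.
        return self._version

    def begin_write(self) -> None:
        # Non-blocking acquire: a second concurrent writer fails fast here.
        if not self._writer_lock.acquire(blocking=False):
            raise RuntimeError("another transaction is already open")

    def commit(self) -> int:
        self._version += 1
        self._writer_lock.release()
        return self._version


db = WriteScope()
db.begin_write()
seen_during_write = db.snapshot()  # a reader still sees the pre-write version
try:
    db.begin_write()  # a second writer while the first is still open
    second_writer_admitted = True
except RuntimeError:
    second_writer_admitted = False
committed_version = db.commit()
```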
Facade accessors on Uni / AsyncUni:
| Accessor | Returns | Purpose |
|---|---|---|
| `db.rules()` | `RuleRegistry` | Locy rule management (register, list, remove) |
| `db.compaction()` | `Compaction` / `AsyncCompaction` | Storage compaction |
| `db.indexes()` | `Indexes` / `AsyncIndexes` | Index management (list, rebuild) |
| `db.xervo()` | `Xervo` / `AsyncXervo` | ML model runtime (embed, generate, prefetch) |
Key builder terminal methods:
| Builder | Created by | Terminal |
|---|---|---|
| `SessionQueryBuilder` | `session.query_with(cypher)` | `.fetch_all()`, `.fetch_one()`, `.cursor()` |
| `SessionLocyBuilder` | `session.locy_with(program)` | `.run()` |
| `TxExecuteBuilder` | `tx.execute_with(cypher)` | `.run()`, `.profile()` |
| `TxQueryBuilder` | `tx.query_with(cypher)` | `.fetch_all()`, `.fetch_one()`, `.execute()` |
| `SchemaBuilder` | `db.schema()` | `.apply()` |
| `BulkWriterBuilder` | `tx.bulk_writer()` | `.build()` |
| `TransactionBuilder` | `session.tx_with()` | `.start()` |
Quick Start: Python
from uni_db import Uni, DataType
# Open or create a database
db = Uni.open("./my_db")
# Define schema
db.schema() \
.label("Person") \
.property("name", DataType.STRING()) \
.property("age", DataType.INT64()) \
.apply()
# Write via transaction
session = db.session()
with session.tx() as tx:
    tx.execute("CREATE (:Person {name: 'Alice', age: 30})")
    tx.commit()
# Read via session
result = session.query("MATCH (p:Person) RETURN p.name, p.age")
for row in result:
    print(f"{row['p.name']}: {row['p.age']}")
db.shutdown()
Quick Start: Rust
use uni_db::{Uni, DataType, Value};
#[tokio::main]
async fn main() -> uni_db::Result<()> {
    // Open (or create) a database
    let db = Uni::open("./my_db").build().await?;
    // Define schema
    db.schema()
        .label("Person")
        .property("name", DataType::String)
        .property("age", DataType::Int64)
        .apply().await?;
    // Write via transaction
    let session = db.session();
    let tx = session.tx().await?;
    tx.execute("CREATE (:Person {name: 'Alice', age: 30})").await?;
    tx.commit().await?;
    // Read via session
    let result = session.query("MATCH (p:Person) RETURN p.name, p.age").await?;
    for row in result.rows() {
        println!("{}: {}", row.get::<String>("p.name")?, row.get::<i64>("p.age")?);
    }
    db.shutdown().await
}
Database factory methods (same in Python and Rust):
| Method | Behavior |
|---|---|
| `Uni.open(path)` | Open existing or create new database at `path` |
| `Uni.create(path)` | Create new database; error if `path` exists |
| `Uni.open_existing(path)` | Open existing; error if `path` does not exist |
| `Uni.temporary()` | Temp directory, auto-cleaned on drop |
| `Uni.in_memory()` | Purely in-memory, no persistence |
| `Uni.builder()` | Advanced configuration via `UniBuilder` |
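The three path-based factories differ only in how they behave when the path is present or missing. The following is a minimal plain-Python sketch of that contract. It assumes a database is simply a directory on local disk. `resolve_open_mode` is a hypothetical helper, not part of uni_db.

```python
import tempfile
from pathlib import Path


def resolve_open_mode(path: Path, mode: str) -> str:
    """Model of the open / create / open_existing contract (hypothetical helper).

    Returns "created" or "opened"; raises when the mode's precondition fails.
    """
    exists = path.exists()
    if mode == "open":  # open existing or create new
        return "opened" if exists else "created"
    if mode == "create":  # must not exist yet
        if exists:
            raise FileExistsError(f"{path} already exists")
        return "created"
    if mode == "open_existing":  # must already exist
        if not exists:
            raise FileNotFoundError(f"{path} does not exist")
        return "opened"
    raise ValueError(f"unknown mode: {mode}")


with tempfile.TemporaryDirectory() as tmp:
    db_path = Path(tmp) / "my_db"
    first = resolve_open_mode(db_path, "open")  # path missing
    db_path.mkdir()  # simulate the database now existing on disk
    second = resolve_open_mode(db_path, "open")
    try:
        resolve_open_mode(db_path, "create")
        create_rejected = False
    except FileExistsError:
        create_rejected = True
    try:
        resolve_open_mode(Path(tmp) / "absent", "open_existing")
        missing_rejected = False
    except FileNotFoundError:
        missing_rejected = True
```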
Storage backends: local filesystem, S3 (s3://bucket/path), GCS (gs://bucket/path), Azure (az://account/container/path).
Data Types Quick Reference
| Uni Type | Python Factory | Rust Enum | Cypher DDL |
|---|---|---|---|
| String | `DataType.STRING()` | `DataType::String` | `STRING` |
| Int32 | `DataType.INT32()` | `DataType::Int32` | `INT32` |
| Int64 | `DataType.INT64()` | `DataType::Int64` | `INT64` |
| Float32 | `DataType.FLOAT32()` | `DataType::Float32` | `FLOAT32` |
| Float64 | `DataType.FLOAT64()` | `DataType::Float64` | `FLOAT64` |
| Bool | `DataType.BOOL()` | `DataType::Bool` | `BOOL` |
| Timestamp | `DataType.TIMESTAMP()` | `DataType::Timestamp` | `TIMESTAMP` |
| Date | `DataType.DATE()` | `DataType::Date` | `DATE` |
| DateTime | `DataType.DATETIME()` | `DataType::DateTime` | `DATETIME` |
| Duration | `DataType.DURATION()` | `DataType::Duration` | `DURATION` |
| Btic | `DataType.BTIC()` | `DataType::Btic` | `BTIC` |
| Vector(N) | `DataType.vector(N)` | `DataType::Vector { dimensions: N }` | `VECTOR(N)` |
| List(T) | `DataType.list(inner)` | `DataType::List(Box<T>)` | `LIST(T)` |
| Map(K,V) | `DataType.map(k, v)` | `DataType::Map(Box<K>, Box<V>)` | `MAP(K, V)` |
| JSON | `DataType.JSON()` | `DataType::CypherValue` | `JSON` |
| Bytes | `DataType.BYTES()` | `DataType::Bytes` | `BYTES` |
| CRDT types | `DataType.crdt(CrdtType.G_COUNTER())` | `DataType::Crdt(CrdtKind::GCounter)` | `CRDT(GCOUNTER)` |
CRDT types: GCounter, GSet, ORSet, LWWRegister, LWWMap, Rga, VectorClock, VCRegister.
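CRDT columns merge concurrent updates without coordination. The simplest type is a G-Counter, and a textbook version in plain Python shows the idea behind it. This is not uni-db's `CrdtKind::GCounter` implementation, and the class name and methods are assumptions made for this example.

```python
class GCounter:
    """Textbook grow-only counter CRDT (illustration, not uni-db's implementation)."""

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.counts: dict[str, int] = {}  # per-replica increment totals

    def increment(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def merge(self, other: "GCounter") -> None:
        # Element-wise max is commutative, associative and idempotent,
        # so replicas converge regardless of merge order or repetition.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)

    @property
    def value(self) -> int:
        return sum(self.counts.values())


a, b = GCounter("a"), GCounter("b")
a.increment(3)
b.increment(2)
a.merge(b)
b.merge(a)
a.merge(b)  # merging again changes nothing
```

Merge takes the per-replica maximum, so repeating a merge or applying merges in a different order gives the same result. That property is what lets replicas converge.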
Essential Patterns
CRUD
-- Create node
CREATE (n:Person {name: 'Alice', age: 30})
-- Create node with ext_id (for MERGE/lookup)
CREATE (n:Person {ext_id: 'user-123', name: 'Alice'})
-- Create edge
MATCH (a:Person {name: 'Alice'}), (b:Person {name: 'Bob'})
CREATE (a)-[:KNOWS {since: 2023}]->(b)
-- Read
MATCH (p:Person) WHERE p.age > 25 RETURN p.name, p.age
-- Update
MATCH (n:Person {name: 'Alice'}) SET n.age = 31, n.updated = datetime()
-- Delete (must have no edges)
MATCH (n:Person {name: 'Alice'}) DELETE n
-- Detach delete (removes edges first)
MATCH (n:Person {name: 'Alice'}) DETACH DELETE n
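A tiny adjacency store is enough to model the `DELETE` versus `DETACH DELETE` rule. `ToyGraph` below is a hypothetical Python class. It mirrors only the documented behavior: a plain delete refuses a node that still has edges, and a detach delete drops the incident edges first. It says nothing about how uni-db stores data.

```python
class ToyGraph:
    """Tiny model of DELETE vs DETACH DELETE (hypothetical, not uni_db)."""

    def __init__(self) -> None:
        self.nodes: set[str] = set()
        self.edges: set[tuple[str, str, str]] = set()  # (src, rel_type, dst)

    def delete(self, node: str) -> None:
        # Plain DELETE: refuse while any edge still touches the node.
        if any(node in (src, dst) for src, _, dst in self.edges):
            raise ValueError(f"cannot DELETE {node!r}: node still has edges")
        self.nodes.discard(node)

    def detach_delete(self, node: str) -> None:
        # DETACH DELETE: drop incident edges first, then the node itself.
        self.edges = {e for e in self.edges if node not in (e[0], e[2])}
        self.nodes.discard(node)


g = ToyGraph()
g.nodes |= {"Alice", "Bob"}
g.edges.add(("Alice", "KNOWS", "Bob"))
try:
    g.delete("Alice")
    plain_delete_ok = True
except ValueError:
    plain_delete_ok = False
g.detach_delete("Alice")
```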
Parameters
# Python — inline params
result = session.query(
"MATCH (n:Person {name: $name}) RETURN n",
params={"name": "Alice"}
)
# Python — session-level params
session.set("min_age", 25)
result = session.query("MATCH (p:Person) WHERE p.age > $min_age RETURN p")
# Python — builder pattern
result = session.query_with("MATCH (n:Person) WHERE n.age > $age RETURN n") \
.param("age", 25) \
.timeout(5.0) \
.fetch_all()
Transactions
# Context manager — auto-rollback on exception
with session.tx() as tx:
    tx.execute("CREATE (:Person {name: 'Alice', age: 30})")
    tx.execute("CREATE (:Person {name: 'Bob', age: 25})")
    result = tx.commit()
print(f"Committed {result.mutations_committed} mutations at version {result.version}")
# Async equivalent
async with await session.tx() as tx:
    await tx.execute("CREATE (:Person {name: 'Alice'})")
    await tx.commit()
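The context-manager pattern works by holding writes until `commit()` and discarding them if the block exits early. `ToyTx` is a hypothetical stand-in, not uni_db's `Transaction`. It also discards uncommitted writes on a clean exit, but that behavior is an assumption about uni-db. Commit explicitly, as the examples above do.

```python
class ToyTx:
    """Toy transaction: buffers writes, applies them only on commit (hypothetical)."""

    def __init__(self, store: list) -> None:
        self._store = store
        self._pending: list = []
        self._done = False

    def execute(self, stmt: str) -> None:
        self._pending.append(stmt)  # buffered, not yet visible in the store

    def commit(self) -> int:
        self._store.extend(self._pending)
        n = len(self._pending)
        self._pending.clear()
        self._done = True
        return n

    def rollback(self) -> None:
        self._pending.clear()
        self._done = True

    def __enter__(self) -> "ToyTx":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Anything not committed is discarded; returning False re-raises exceptions.
        if not self._done:
            self.rollback()
        return False


store: list = []
with ToyTx(store) as tx:
    tx.execute("CREATE (:Person {name: 'Alice'})")
    committed = tx.commit()
try:
    with ToyTx(store) as tx:
        tx.execute("CREATE (:Person {name: 'Bob'})")
        raise RuntimeError("simulated failure before commit")
except RuntimeError:
    pass
```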
Bulk Loading
with session.tx() as tx:
    with tx.bulk_writer().batch_size(5000).build() as writer:
        vids = writer.insert_vertices("Person", [
            {"name": "Alice", "age": 30},
            {"name": "Bob", "age": 25},
        ])
        writer.insert_edges("KNOWS", [(vids[0], vids[1], {"since": 2024})])
        stats = writer.commit()
    tx.commit()
print(f"Inserted {stats.vertices_inserted} vertices, {stats.edges_inserted} edges")
Schema Definition
db.schema() \
.label("Person") \
.property("name", DataType.STRING()) \
.property("age", DataType.INT64()) \
.vector("embedding", 384) \
.index("name", "btree") \
.label("Company") \
.property("name", DataType.STRING()) \
.edge_type("WORKS_AT", ["Person"], ["Company"]) \
.property("since", DataType.DATE()) \
.apply()
Session Parameters
session = db.session()
session.set("company", "Acme")
result = session.query("MATCH (c:Company {name: $company}) RETURN c")
Vector Search
-- Basic vector search (top-K scan)
CALL uni.vector.query('Document', 'embedding', $query_vector, 10)
YIELD node, score
RETURN node.title, score ORDER BY score DESC
-- ~= operator (shorthand for vector top-K scan, desugars to uni.vector.query)
MATCH (d:Doc) WHERE d.embedding ~= $query_vector RETURN d.title LIMIT 10
-- Inline per-row similarity scoring (no CALL/YIELD needed)
MATCH (d:Doc)
RETURN d.title, similar_to(d.embedding, $query_vector) AS score
ORDER BY score DESC
-- Hybrid search: vector + FTS with RRF fusion (correct way)
MATCH (d:Doc)
RETURN d.title,
similar_to([d.embedding, d.content], [$query_vector, $query_text]) AS score
ORDER BY score DESC
-- Hybrid search via procedure
CALL uni.search('Document', {vector: 'embedding', fts: 'content'},
'graph databases', null, 10)
YIELD node, score, vector_score, fts_score
RETURN node.title, score
Note: `=~` is regex match and `~=` is vector similarity; they are unrelated operators.
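Reciprocal Rank Fusion (RRF) merges the vector ranking and the full-text ranking. Each document scores `1 / (k + rank)` in every list it appears in, and those scores are summed. Below is a standalone sketch. `k = 60` is the constant commonly used in the RRF literature. It is an assumption here, because this document does not state which constant uni-db uses.

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


vector_hits = ["doc_a", "doc_b", "doc_c"]
fts_hits = ["doc_c", "doc_a", "doc_d"]
fused = rrf_fuse([vector_hits, fts_hits])
```

`doc_a` wins because it ranks high in both lists, even though `doc_c` tops the full-text list.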
Locy Quick Example
result = session.locy("""
CREATE RULE reachable AS
MATCH (a:Person)-[:KNOWS]->(b:Person)
WHERE a IS reachable OR a.name = 'Alice'
YIELD KEY b
QUERY reachable
""")
for row in result["reachable"]:
    print(row)
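The `reachable` rule is recursive. Each round joins newly derived `reachable` facts with `KNOWS` edges, and evaluation stops once a round adds nothing new (a fixpoint). The sketch below mirrors those semantics using semi-naive evaluation in plain Python. It is not Locy's evaluator. Treating `a.name = 'Alice'` as a single seed node is a simplifying assumption.

```python
def reachable_from(edges: set[tuple[str, str]], seed: str) -> set[str]:
    """Semi-naive fixpoint for: b is reachable if a -[:KNOWS]-> b and
    (a is reachable or a is the seed). Illustration only."""
    # Base case: direct successors of the seed node.
    delta = {b for a, b in edges if a == seed}
    result = set(delta)
    while delta:
        # Semi-naive step: only join facts derived in the previous round.
        new = {b for a, b in edges if a in delta} - result
        result |= new
        delta = new
    return result


knows = {("Alice", "Bob"), ("Bob", "Carol"), ("Carol", "Alice"), ("Dave", "Eve")}
reach = reachable_from(knows, "Alice")
```

Alice herself appears in the result only because the edge `Carol -> Alice` closes a cycle, which is consistent with `YIELD KEY b`.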
Cypher Cheat Sheet
1. Match by property:
MATCH (n:Person {name: 'Alice'}) RETURN n
2. Match by ext_id:
MATCH (n:Person {ext_id: 'user-123'}) RETURN n
3. Create node with properties:
CREATE (n:Person {ext_id: 'user-456', name: 'Bob', age: 25}) RETURN n
4. Create edge:
MATCH (a:Person {name: 'Alice'}), (b:Person {name: 'Bob'})
CREATE (a)-[:KNOWS {since: 2023}]->(b)
5. MERGE (upsert) -- requires ext_id:
MERGE (n:Person {ext_id: 'user-123'})
ON CREATE SET n.name = 'Alice', n.created = datetime()
ON MATCH SET n.last_seen = datetime()
RETURN n
6. Variable-length path:
MATCH (a:Person)-[:KNOWS*1..3]->(b:Person)
WHERE a.name = 'Alice'
RETURN DISTINCT b.name
7. Aggregation with WITH:
MATCH (p:Person)-[:WORKS_AT]->(c:Company)
WITH c.name AS company, count(p) AS employees
WHERE employees > 10
RETURN company, employees ORDER BY employees DESC
8. UNWIND list:
UNWIND $names AS name
MATCH (n:Person {name: name})
RETURN n
9. OPTIONAL MATCH:
MATCH (p:Person)
OPTIONAL MATCH (p)-[r:MANAGES]->(m:Person)
RETURN p.name, collect(m.name) AS manages
10. RETURN with ORDER BY, LIMIT, SKIP:
MATCH (p:Person)
RETURN p.name, p.age
ORDER BY p.age DESC
SKIP 10 LIMIT 20
11. System-managed timestamps (created_at / updated_at):
Every vertex and edge automatically carries a creation and a modification timestamp. Read them with the `created_at()` and `updated_at()` Cypher functions, which return a DateTime (UTC, nanosecond precision). They are read-only and need no schema declaration.
MATCH (n:Person) WHERE created_at(n) > datetime("2026-05-01")
RETURN n, updated_at(n) AS last_modified
MATCH (a)-[r:KNOWS]->(b) RETURN r, created_at(r), updated_at(r)
`updated_at` is bumped by any write that touches the row, including an idempotent MERGE or a SET to the same value.
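A bounded variable-length pattern such as `[:KNOWS*1..3]` in item 6 enumerates every path whose length falls in the range. As in Cypher, a single path may not reuse the same relationship. Here is a plain-Python sketch of that search using a hypothetical `var_length_targets` helper. It does not describe uni-db's executor.

```python
from collections import defaultdict


def var_length_targets(edges, start, min_hops, max_hops):
    """DISTINCT end nodes of paths (start)-[*min_hops..max_hops]->(b).

    Like Cypher, one path may not traverse the same relationship twice.
    Hypothetical helper for illustration only.
    """
    outgoing = defaultdict(list)  # node -> [(edge_id, neighbour)]
    for edge_id, (src, dst) in enumerate(edges):
        outgoing[src].append((edge_id, dst))

    targets = set()

    def extend(node, depth, used):
        if depth >= min_hops:
            targets.add(node)
        if depth == max_hops:  # the upper bound stops further expansion
            return
        for edge_id, nxt in outgoing[node]:
            if edge_id not in used:  # relationship uniqueness within a path
                extend(nxt, depth + 1, used | {edge_id})

    extend(start, 0, frozenset())
    return targets


knows = [("Alice", "Bob"), ("Bob", "Carol"), ("Carol", "Dave"),
         ("Dave", "Erin"), ("Alice", "Carol")]
one_to_three = var_length_targets(knows, "Alice", 1, 3)
one_to_two = var_length_targets(knows, "Alice", 1, 2)
```

Each extra hop multiplies the number of candidate paths by the branching factor. That is why an upper bound such as `*..5` matters.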
Critical Gotchas
1. ext_id is REQUIRED for MERGE -- Without `ext_id`, `MERGE` always creates new nodes because there is no stable identity to match against. Always include `ext_id` in the MERGE pattern.
2. CREATE creates NEW nodes per expression -- `CREATE (a:Node), (b:Node)` creates two separate nodes. To reference the same node later in a pattern, use variable binding: `CREATE (a:Node {name: 'X'}), (a)-[:REL]->(b:Other)`.
3. Single-writer -- Only one Transaction can be open at a time per Uni instance. Multiple Sessions can read concurrently.
4. flush() for durability -- Writes are buffered in L0. Call `db.flush()` or rely on auto-flush (threshold: 10k mutations or 5s interval) to persist them to storage.
5. VID vs UniId vs ext_id -- VID is internal (u64 auto-increment), UniId is a content hash (SHA3-256), and ext_id is a user-supplied string. Use `ext_id` for MERGE and user-facing lookups.
6. Schema-first for columnar performance -- Define labels and properties via `db.schema()` before bulk loading. Without a schema, properties go to JSONB overflow and lose columnar benefits. Set `strict_schema: true` in `UniConfig` to reject writes with undeclared labels/edge types.
7. DETACH DELETE required when node has edges -- `DELETE` alone fails if the node has any edges. Use `DETACH DELETE` to remove the edges first.
8. Vector index metric must match embedding model -- Use `cosine` for normalized embeddings (most models) and `l2` for raw/unnormalized embeddings. A mismatched metric produces poor search results.
9. Locy rules are NOT standard Datalog -- Locy has `ALONG`, `FOLD`, `BEST BY`, `PROB`, `DERIVE`, `ASSUME`, and `ABDUCE`, which do not exist in standard Datalog. `IS`/`IS NOT` references invoke other rules.
10. Unbounded variable-length paths -- `[*]` without an upper bound causes exponential expansion. Always set an upper bound: `[*..5]`.
11. Always use $param parameters -- String concatenation in Cypher creates an injection risk and prevents plan caching.
12. Cartesian products from disconnected patterns -- `MATCH (a:Person), (b:Company)` creates a cross product of all persons and companies. Connect the patterns or use `WITH` to pipeline results.
13. BulkWriter for initial data loading -- Always use `tx.bulk_writer()` for loading more than a few thousand records. It bypasses the WAL and defers index rebuilds for 10-100x faster throughput.
14. Context managers for transactions -- Always use `with session.tx() as tx:` (or `async with`) to guarantee auto-rollback on exceptions. Forgetting to commit or roll back leaks the write lock.
15. Locy DERIVE in a transaction -- When `locy()` is called on a Transaction, DERIVE commands automatically apply mutations to the transaction. On a Session (read-only), DERIVE returns a `DerivedFactSet` that must be explicitly applied via `tx.apply(derived)`.
16. Index your MERGE keys for batched upserts -- A single-node, single-label `MERGE (n:Label {key: ...})` with a literal `{...}` key map takes a batched fast path (one L0 snapshot per statement, no per-row query planning). This is what makes `UNWIND $rows AS e MERGE (n:Label {key: e.key}) ...` fast. For the per-row match to be an index point lookup rather than a filtered label scan, put a scalar index on the key (`db.schema().label("Label").index("key", IndexType::Scalar(ScalarType::Hash))` or `CREATE INDEX`). Without one, the lookup is a full label scan: fine in memory, but O(N × label_size) for large on-disk labels. Multi-node/edge MERGE (e.g. `MERGE (a)-[:R]->(b)`) and non-literal property maps (`MERGE (n:Label $props)`) fall back to the slower per-row general path. Prefer a single-node MERGE for the node, then batched `CREATE`/`MATCH` for edges.
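A toy store that counts how many rows each lookup touches shows why the key index matters for batched MERGE. `ToyLabel` is hypothetical. It models only the match-or-create decision and does not represent uni-db's fast path. A Python dict stands in for a hash index.

```python
class ToyLabel:
    """Nodes of one label, upserted by a key property (hypothetical, not uni_db)."""

    def __init__(self, key: str, indexed: bool) -> None:
        self.key = key
        self.rows: list[dict] = []
        self.index = {} if indexed else None  # dict stands in for a hash index
        self.rows_scanned = 0  # lookup work spent scanning the label

    def _find(self, value):
        if self.index is not None:
            return self.index.get(value)  # point lookup, no scan
        for row in self.rows:  # full label scan
            self.rows_scanned += 1
            if row[self.key] == value:
                return row
        return None

    def merge(self, props: dict) -> str:
        row = self._find(props[self.key])
        if row is not None:
            row.update(props)  # matched: update the existing node
            return "matched"
        row = dict(props)  # not found: create a new node
        self.rows.append(row)
        if self.index is not None:
            self.index[row[self.key]] = row
        return "created"


batch = [{"ext_id": f"user-{i}", "seen": i} for i in range(100)]
scan = ToyLabel("ext_id", indexed=False)
indexed = ToyLabel("ext_id", indexed=True)
for store in (scan, indexed):
    for _ in range(2):  # run the same UNWIND-style batch twice
        outcomes = [store.merge(row) for row in batch]
```

Upserting the same 100 keys twice touches 10,000 rows without the index, a cost that grows quadratically with batch size, and none with the index. Both stores end up with the same 100 nodes.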
Tuning for Ingest-Heavy Workloads
Per-read latency in uni-db is sensitive to how many frozen overlay segments have accumulated since the last CSR compaction. Each L0 → L1 flush adds one frozen segment, and every subsequent read consults the main CSR, each frozen segment, and the active overlay. The defaults are tuned for mixed read/write traffic. Ingest-heavy benchmarks (embedding pipelines, bulk imports interleaved with reads) can produce segments faster than compaction at the default threshold folds them back.
Symptoms: read latency that is flat early in the run and then steps up to several milliseconds and stays there. See issue #55 for the reference investigation.
Levers (UniConfig):
let config = UniConfig {
    // Suppress the 5-second timer-based flush; rely only on the count
    // threshold (default 10,000 mutations). Useful for benchmark runs
    // where you don't need fine-grained durability checkpoints.
    auto_flush_interval: None,
    // OR: keep the timer but require more mutations before it fires.
    // Default 1 — single inserts wake the timer. Raise to coalesce
    // small bursts. Higher values reduce flush frequency, which can
    // hurt read latency if not paired with aggressive compaction below.
    auto_flush_min_mutations: 100,
    // Bigger flushes amortize the per-flush cost; fewer frozen segments.
    auto_flush_threshold: 50_000,
    compaction: CompactionConfig {
        // Compact frozen segments back into Main CSR sooner. Default 2
        // (lowered from 4 in the issue #55 fix). Set to 1 to compact
        // after every flush — minimal read latency, max compaction CPU.
        frozen_segments_compact_threshold: 1,
        ..CompactionConfig::default()
    },
    ..UniConfig::default()
};
Diagnostic: if a point query such as `MATCH (a)-[r:LINK]->(b) WHERE id(a)=$nid` on a node with a small, constant out-degree gets noticeably slower over a long write-heavy run, you are in this regime.
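A toy model gives a rough sense of how flush frequency and `frozen_segments_compact_threshold` interact. The function below is hypothetical and rests on two assumptions. First, each flush adds exactly one frozen segment. Second, compaction runs immediately once the threshold is reached. In practice compaction runs in the background and can fall behind, which is the situation described above.

```python
def peak_layers_per_read(flushes: int, compact_threshold: int) -> int:
    """Peak storage layers one read consults (toy model, not uni-db code).

    Assumes each flush adds one frozen segment and compaction folds all
    frozen segments into the main CSR as soon as the threshold is reached.
    """
    frozen = 0
    peak = 0
    for _ in range(flushes):
        frozen += 1  # an L0 -> L1 flush freezes one more segment
        # A read landing now checks main CSR + frozen segments + active overlay.
        peak = max(peak, 1 + frozen + 1)
        if frozen >= compact_threshold:
            frozen = 0  # compaction merges frozen segments back into main CSR
    return peak


for threshold in (1, 2, 4):
    print(threshold, peak_layers_per_read(flushes=100, compact_threshold=threshold))
```

Under these assumptions, a threshold of 1 caps a read at three layers (main CSR, one frozen segment, active overlay). If compaction never keeps up, the count grows by one layer with every flush.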
Allocator: use mimalloc for mutation-heavy workloads
For workloads that run many small mutations (per-statement Cypher CREATE/MERGE, concurrent writers, mutation streams), the default glibc allocator becomes the dominant bottleneck. At 24-session concurrency, a profile showed ~50% of CPU time in __memset (zeroing fresh heap pages), the kernel's clear_page_erms, and glibc arena locks.
Python users: every PyO3 wheel (uni-db, uni-db-cuda, uni-db-metal, uni-db-onnx, etc.) ships with mimalloc built in as the Rust-side global allocator. No configuration needed.
Rust library consumers: opt in via the mimalloc feature flag:
[dependencies]
uni-db = { version = "...", features = ["mimalloc"] }
// in your binary's main.rs:
#[global_allocator]
static GLOBAL: uni_db::MiMalloc = uni_db::MiMalloc;
CLI users: the uni binary already uses mimalloc by default.
Measured on the concurrent_mutations benchmark (N sessions × 100 CREATE statements each):
| N sessions | glibc | mimalloc | speedup |
|---|---|---|---|
| 1 | 139 ms | 45 ms | 3.08× |
| 24 | 984 ms | 394 ms | 2.50× |
The 3× win with a single session shows that glibc is costly even without contention, because of the allocation churn from per-statement parsing, planning, and DataFusion execution. mimalloc's thread-local arenas avoid that page-fault traffic. In the Python wheels, CPython's own PyMem_* allocator is untouched: only Rust-side allocations go through mimalloc.
When to recommend it: any code that runs tx.execute("CREATE ...") in a loop, multi-session writers, ingest pipelines, or benchmarks. Skip if the consumer has a strong reason to pick a different allocator (custom allocators for embedded use, etc.).
When to Load References
When the SKILL.md overview is insufficient for the user's task, load the appropriate reference file for detailed API signatures, examples, and patterns.
| User's task | Load reference |
|---|---|
| Writing/debugging Cypher queries, WHERE clauses, pattern matching | references/cypher.md |
| Python API usage, async patterns, builders, result types | references/python-api.md |
| Rust API usage, builders, error handling, blocking API | references/rust-api.md |
| Pydantic models, OGM, QueryBuilder, relationships | references/pydantic-ogm.md |
| Vector search, FTS, hybrid search, similar_to, embeddings | references/vector-hybrid-search.md |
| Locy rules, recursive logic, ALONG/FOLD/DERIVE/ASSUME/ABDUCE | references/locy.md |
| Locy neural predicates: CREATE MODEL/FEATURES/CALIBRATE/VALIDATE/NeuralProvenance EXPLAIN | references/neural-predicates.md |
| Schema design, data types, indexes, identity (ext_id/VID) | references/schema-indexing.md |
| BTIC temporal intervals, Allen algebra, certainty/granularity | references/btic.md |
| Xervo ML runtime, providers, model catalog, auto-embedding | references/xervo.md |
| Graph algorithms (PageRank, WCC, shortest path, etc.) | references/graph-algorithms.md |
Load multiple references when a task spans domains. Examples:
- RAG pipeline: `references/vector-hybrid-search.md` + `references/python-api.md`
- Locy with vector similarity: `references/locy.md` + `references/vector-hybrid-search.md`
- Schema + bulk loading: `references/schema-indexing.md` + `references/python-api.md`
- Rust graph algorithms: `references/graph-algorithms.md` + `references/rust-api.md`
- BTIC temporal queries in Python: `references/btic.md` + `references/python-api.md`
- RAG with Xervo embeddings: `references/xervo.md` + `references/vector-hybrid-search.md`