name: neo4j-cypher-patterns
description: Expert guide to Neo4j Cypher queries and SKUEL's graph patterns. Use when writing Cypher queries, optimizing graph traversals, understanding relationship types, analyzing query performance, or when the user mentions Cypher, Neo4j, graph queries, or asks about relationships between entities.
allowed-tools: Read, Grep, Glob
Neo4j Cypher Patterns for SKUEL
Quick Start
SKUEL uses Neo4j as its graph database with an Entity Type Architecture. All domains flow toward LifePath (the destination).
Entity Labels (Neo4j Node Labels)
All domain entities use multi-label architecture: every entity gets :Entity (universal base) plus a domain-specific label. Match on the domain label for fast indexed queries, or :Entity for cross-domain queries.
| Domain | Label | UID Format | Example |
|---|---|---|---|
| Activity (6): user-owned | | | |
| Tasks | Task | task_{slug}_{random} | task_fix-bug_abc123 |
| Goals | Goal | goal_{slug}_{random} | goal_launch-product_def456 |
| Habits | Habit | habit_{slug}_{random} | habit_daily-run_xyz789 |
| Events | Event | event_{slug}_{random} | event_team-standup_ghi012 |
| Choices | Choice | choice_{slug}_{random} | choice_accept-offer_jkl345 |
| Principles | Principle | principle_{slug}_{random} | principle_small-steps_mno678 |
| Curriculum (4): shared content | | | |
| Knowledge Units | Ku | ku_{slug}_{random} | ku_python-basics_abc123 |
| Path Steps | PathStep | ps:{random} | ps:intro-to-python |
| Learning Paths | LearningPath | lp:{random} | lp:become-python-developer |
| Exercises | Exercise | varies | |
| Ontology: shared taxonomy | | | |
| Knowledge Domains | KnowledgeDomain | kd.{domain_name} | kd.self_awareness |
| Curated Content: shared content | | | |
| Resources | Resource | (no fixed format) | |
| User-authored content + Reports (3): ADR-054 | | | |
| User Entries | UserEntry | ue_{slug}_{random} | ue_my-essay_abc123 |
| Activity Reports | ActivityReport | ar_{random} | |
| Entry Reports | EntryReport | sr_{random} | |
| Destination | | | |
| Life Path | LifePath | lp_{random} | lp_abc123 |
| Other | | | |
| Users | User | user_{name} | user_mike |
| Finance | Expense | expense_{random} | expense_abc123 |
| Groups | Group | group_{slug}_{random} | |
Core Relationships (Most Common)
// Ownership - Universal OWNS relationship (all Activity Domains)
(user:User)-[:OWNS]->(task:Task)
(user:User)-[:OWNS]->(goal:Goal)
(user:User)-[:OWNS]->(habit:Habit)
// Knowledge application
(task:Task)-[:APPLIES_KNOWLEDGE]->(ku:Ku)
(goal:Goal)-[:REQUIRES_KNOWLEDGE]->(ku:Ku)
(habit:Habit)-[:REINFORCES_KNOWLEDGE]->(ku:Ku)
// Goal hierarchy
(task:Task)-[:FULFILLS_GOAL]->(goal:Goal)
(habit:Habit)-[:SUPPORTS_GOAL]->(goal:Goal)
(goal:Goal)-[:SUBGOAL_OF]->(parent:Goal)
// Knowledge structure
(ku:Ku)-[:REQUIRES_KNOWLEDGE]->(prereq:Ku)
(ku:Ku)-[:ENABLES_KNOWLEDGE]->(enabled:Ku)
(ku:Ku)-[:RELATED_TO]->(related:Ku)
// MOC organization — any Ku can organize other Kus (emergent identity)
(moc:Ku)-[:ORGANIZES {order: 1}]->(child:Ku)
// Resource citations — curriculum cites reference material
(ps:PathStep)-[:CITES_RESOURCE]->(r:Resource)
(ku:Ku)-[:CITES_RESOURCE]->(r:Resource)
// Domain taxonomy (World Layer)
(ku:Ku)-[:IN_DOMAIN]->(d:KnowledgeDomain)
// Principles guidance
(goal:Goal)-[:GUIDED_BY_PRINCIPLE]->(principle:Principle)
(choice:Choice)-[:ALIGNED_WITH_PRINCIPLE]->(principle:Principle)
// Life path (everything flows toward the life path)
// Designation flips entity_type on the LP node — it does NOT add a :LifePath
// label, so match by property, never by label.
(user:User)-[:ULTIMATE_PATH]->(lp:Entity {entity_type: 'life_path'})
(entity:Entity)-[:SERVES_LIFE_PATH]->(lp:Entity {entity_type: 'life_path'})
Query Patterns
Pattern 1: Get User's Entities
// Get all active tasks for a user via universal OWNS relationship
MATCH (u:User {uid: $user_uid})-[:OWNS]->(t:Task)
WHERE t.status IN ['pending', 'in_progress']
RETURN t
ORDER BY t.priority DESC, t.due_date ASC
Pattern 2: Entity with Graph Context
// Get task with its full neighborhood
MATCH (t:Task {uid: $uid})
OPTIONAL MATCH (t)-[:APPLIES_KNOWLEDGE]->(ku:Ku)
OPTIONAL MATCH (t)-[:FULFILLS_GOAL]->(g:Goal)
OPTIONAL MATCH (t)-[:DEPENDS_ON]->(dep:Task)
RETURN t,
collect(DISTINCT ku) as applied_knowledge,
collect(DISTINCT g) as goals,
collect(DISTINCT dep) as dependencies
Pattern 3: Relationship Traversal
// Find all knowledge required for a goal (including transitive)
MATCH (g:Goal {uid: $goal_uid})
MATCH path = (g)-[:REQUIRES_KNOWLEDGE*1..3]->(ku:Ku)
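// RETURN DISTINCT would put `path` out of scope for ORDER BY,
// so aggregate to the shortest depth per knowledge unit instead
RETURN ku, min(length(path)) AS depth
ORDER BY depth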
Pattern 4: Graph-Aware Search
// Search tasks with relationship filter
MATCH (t:Task)
WHERE t.title CONTAINS $query OR t.description CONTAINS $query
// Plain MATCH already drops tasks that apply no knowledge,
// so no OPTIONAL MATCH plus size() filter is needed
MATCH (t)-[:APPLIES_KNOWLEDGE]->(ku:Ku)
RETURN t, collect(ku) AS knowledge
Pattern 5: User Learning Progress
// Get user's mastery state for knowledge units
MATCH (u:User {uid: $user_uid})-[r:MASTERED|IN_PROGRESS|VIEWED]->(ku:Curriculum)
RETURN ku.uid,
type(r) as status,
r.mastery_score as score,
r.mastered_at as mastered_at
Query Builders (SKUEL Infrastructure)
SKUEL has two query builders for domain services (SKUEL001: no APOC in domain services):
| Builder | Location | Use Case |
|---|---|---|
| UnifiedQueryBuilder | adapters/persistence/neo4j/query/ | Generic CRUD (used by backends) |
| CypherGenerator | adapters/persistence/neo4j/query/cypher/ | Pure Cypher, semantic traversal |
SKUEL001 linter rule: APOC is scoped to apoc.meta.* (schema introspection only). Domain services use pure Cypher — never APOC in core/services/.
Three-Layer Architecture
Layer 1: UniversalNeo4jBackend (Generic CRUD)
├── Uses UnifiedQueryBuilder for generic operations
└── Powers ALL 20 entity types with CRUD, search, relationships
Layer 2: Domain Backends (Domain-Specific Cypher)
├── 27 typed subclasses in backends/ (9 cluster files — import directly from the cluster file)
├── 13 standalone backends (CrossDomainBackend, UserBackend, UserProgressBackend, SessionBackend, InsightBackend, LifePathBackend, ZpdBackend, ZpdSnapshotBackend, VectorSearchBackend, IngestionBackend, JupyterSyncBackend, EmbeddingsBackend, KnowledgeDomainBackend)
├── Domain-specific relationship Cypher (ORGANIZES, SHARES_WITH, FULFILLS_EXERCISE, etc.)
└── Rule: If a Cypher query uses domain-specific relationships, it belongs here
Layer 3: Services (Business Logic + Cross-Domain Aggregation)
├── Domain services delegate to backend methods, NOT execute_query()
├── Two service-layer Cypher exceptions (both use QueryExecutor directly):
│ ├── user_context_queries.py — MEGA-QUERY (full user state snapshot)
│ └── CrossDomainQueryService — 9 targeted cross-domain reads (returns frozen typed dataclasses)
└── Orchestration, events, validation — no other inline Cypher
Filter Operators
All query builders support these operators:
| Operator | Usage | Cypher Output |
|---|---|---|
| eq (default) | priority='high' | n.priority = 'high' |
| gt | due_date__gt=date | n.due_date > $date |
| lt | hours__lt=5.0 | n.hours < 5.0 |
| gte | due_date__gte=date | n.due_date >= $date |
| lte | score__lte=8 | n.score <= 8 |
| contains | title__contains='urgent' | n.title CONTAINS 'urgent' |
| in | priority__in=['high', 'urgent'] | n.priority IN ['high', 'urgent'] |
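The operator table can be read as a small translation step from `field__op=value` keyword arguments to a WHERE clause. The sketch below is illustrative only: `build_where` and `OPERATORS` are hypothetical names, not SKUEL's builder API, and unlike some rows of the table it passes every value as a parameter rather than inlining it.

```python
import re

# Hypothetical operator map mirroring the table above.
OPERATORS = {
    "eq": "=",
    "gt": ">",
    "lt": "<",
    "gte": ">=",
    "lte": "<=",
    "contains": "CONTAINS",
    "in": "IN",
}
_IDENTIFIER = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")


def build_where(alias="n", **filters):
    """Turn field__op=value filters into (where_clause, params)."""
    clauses, params = [], {}
    for key, value in filters.items():
        field, sep, op = key.partition("__")
        op = op if sep else "eq"  # a bare field name means equality
        if op not in OPERATORS:
            raise ValueError(f"unknown operator: {op}")
        if not _IDENTIFIER.fullmatch(field):
            raise ValueError(f"unsafe field name: {field}")
        param = f"{field}_{op}"
        clauses.append(f"{alias}.{field} {OPERATORS[op]} ${param}")
        params[param] = value
    return " AND ".join(clauses), params


where, params = build_where(priority="high", hours__lt=5.0)
print(where)
print(params)
```

Field names are interpolated into the query text, so the sketch checks them against an identifier pattern first; values never enter the query string.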
Intent-Based Traversal
All 9 domains (6 Activity + Ku/Ps/Lp) read graph context through mechanism B: the shared
_CoreIntelligenceMixin.get_with_context → UnifiedRelationshipService.get_with_context. The edge
vocabulary is registry-sourced from DomainConfig.cross_domain_relationship_types (the single
source of truth) — there is no per-domain get_suggested_query_intent() (deleted) and no per-domain
{Domain}RelationshipService subclass.
Both graph readers (query_with_intent and get_cross_domain_context) now run ONE
incident-edge-attributed producer (build_domain_context_with_paths); the old flat
build_context_query_for_intent is deleted (PR #243). For a non-registry caller,
QueryIntent / a domain's default_context_intent selects the edge slice from
_INTENT_EDGE_SETS (in cross_domain_backend). Those slices:
| Intent | Focus Relationships |
|---|---|
| HIERARCHICAL | HAS_CHILD, PARENT_OF, CHILD_OF |
| PREREQUISITE | REQUIRES_KNOWLEDGE, PREREQUISITE_FOR, ENABLES |
| PRACTICE | PRACTICES, REINFORCES, APPLIES_KNOWLEDGE |
| GOAL_ACHIEVEMENT | FULFILLS_GOAL, SUPPORTS_GOAL, SUBGOAL_OF, GUIDED_BY_PRINCIPLE, CONTRIBUTES_TO_GOAL |
| else (EXPLORATORY/SPECIFIC/AGGREGATION/RELATIONSHIP) | generic traversal, no edge filter |
See: docs/roadmap/intent-traversal-registry-convergence.md (authoritative).
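To make the intent-to-edge-slice selection concrete, here is a minimal sketch of how an intent could map to a Cypher relationship pattern. `Intent`, `INTENT_EDGES`, and `relationship_pattern` are hypothetical stand-ins copied from the table above; the real `_INTENT_EDGE_SETS` and `QueryIntent` may be shaped differently.

```python
from enum import Enum


class Intent(Enum):
    HIERARCHICAL = "hierarchical"
    PREREQUISITE = "prerequisite"
    PRACTICE = "practice"
    GOAL_ACHIEVEMENT = "goal_achievement"
    EXPLORATORY = "exploratory"


# Hypothetical stand-in for _INTENT_EDGE_SETS, copied from the table above.
INTENT_EDGES = {
    Intent.HIERARCHICAL: ("HAS_CHILD", "PARENT_OF", "CHILD_OF"),
    Intent.PREREQUISITE: ("REQUIRES_KNOWLEDGE", "PREREQUISITE_FOR", "ENABLES"),
    Intent.PRACTICE: ("PRACTICES", "REINFORCES", "APPLIES_KNOWLEDGE"),
    Intent.GOAL_ACHIEVEMENT: (
        "FULFILLS_GOAL",
        "SUPPORTS_GOAL",
        "SUBGOAL_OF",
        "GUIDED_BY_PRINCIPLE",
        "CONTRIBUTES_TO_GOAL",
    ),
}


def relationship_pattern(intent):
    """Return the Cypher relationship pattern for an intent's edge slice."""
    edges = INTENT_EDGES.get(intent)
    if not edges:
        return "-[r]-"  # no edge filter: generic traversal
    return f"-[r:{'|'.join(edges)}]-"


print(relationship_pattern(Intent.PRACTICE))
print(relationship_pattern(Intent.EXPLORATORY))
```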
Index Architecture (Bootstrap)
Neo4j indexes are created automatically at startup via Neo4jSchemaManager in services_bootstrap/compose.py:
| Index Type | Method | When Created | Purpose |
|---|---|---|---|
| Domain indexes | sync_domain_indexes() | Always | UID, user_uid, status, date, composite (48 indexes) |
| Full-text indexes | sync_fulltext_indexes() | Always | Lucene keyword search across 15 domains (Cypher-first foundation) |
| Auth indexes | sync_auth_indexes() | Always | Rate limiting, session lookup, email uniqueness |
| Vector indexes | sync_vector_indexes() | FULL tier only | 1024-dim cosine similarity on Entity + ContentChunk |
Full-text indexes are the Cypher-first search foundation — always available, no embeddings needed:
// Full-text search (Lucene-based, relevance-ranked)
CALL db.index.fulltext.queryNodes('task_fulltext_idx', 'urgent deadline')
YIELD node, score
RETURN node.uid, node.title, score
// Vector search (FULL tier only, 1024-dim BAAI/bge-large-en-v1.5)
CALL db.index.vector.queryNodes('entity_embedding_idx', 10, $embedding)
YIELD node, score
RETURN node.uid, node.title, score
All DDL is idempotent (IF NOT EXISTS) — safe on every startup.
Best Practices
1. Always Use Parameters
// GOOD - parameterized
MATCH (t:Task {uid: $uid})
// BAD - string interpolation (Cypher injection risk)
MATCH (t:Task {uid: '${uid}'})
Exception: labels, property names, and relationship types cannot be parameterized in Neo4j. SKUEL validates all interpolated values at the infrastructure boundary:
# Shared guards in _helpers.py (used by all 5 query builder modules)
from adapters.persistence.neo4j.query.cypher._helpers import validate_label, validate_identifier
validate_label(label) # raises ValueError if not a known NeoLabel value
validate_identifier(field) # raises ValueError if not a safe identifier (^[a-zA-Z_][a-zA-Z0-9_]*$)
# Relationship types — also validated in _build_direction_pattern() (single choke point for mixin Cypher)
# Uses validate_relationship_type() from core/utils/validation_helpers.py
# Accepts RelationshipName enum values OR safe identifiers
# Field names — validated in _search_mixin.py, _user_entity_mixin.py, and all query builders
from core.utils.validation_helpers import validate_field_name
validate_field_name(name) # regex check, max 64 chars
Coverage: All 5 query builder modules (crud_queries.py, domain_queries.py, relationship_queries.py, semantic_queries.py, intelligence_queries.py) validate labels, field names, relationship types, and property keys before f-string interpolation. _build_direction_pattern() is the single choke point for mixin-level relationship Cypher (get_related_entities, get_related_uids, count_related). traverse() and find_path() validate pipe-separated patterns.
The same pattern applies to DDL (vector indexes, schema creation) — validate label, field_name, and similarity before building the query string. See adapters/persistence/neo4j/neo4j_schema_manager.py for the pattern.
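As a rough illustration of validate-then-interpolate for DDL, the sketch below checks each interpolated part before building a vector index statement. `check_identifier` and `vector_index_ddl` are hypothetical helpers, not the functions in neo4j_schema_manager.py. The identifier pattern and 64-character cap are taken from the guards described above, and the statement shape assumes Neo4j 5 vector index syntax, so verify it against your server version.

```python
import re

_IDENTIFIER = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")
_SIMILARITIES = {"cosine", "euclidean"}


def check_identifier(value):
    """Reject anything that is not a plain identifier of at most 64 chars."""
    if len(value) > 64 or not _IDENTIFIER.fullmatch(value):
        raise ValueError(f"unsafe identifier: {value!r}")
    return value


def vector_index_ddl(index_name, label, field_name, dimensions, similarity):
    """Build an idempotent CREATE VECTOR INDEX statement from validated parts."""
    check_identifier(index_name)
    check_identifier(label)
    check_identifier(field_name)
    if similarity not in _SIMILARITIES:
        raise ValueError(f"unknown similarity: {similarity!r}")
    dimensions = int(dimensions)  # raises ValueError on non-numeric input
    return (
        f"CREATE VECTOR INDEX {index_name} IF NOT EXISTS "
        f"FOR (n:{label}) ON (n.{field_name}) "
        f"OPTIONS {{indexConfig: {{`vector.dimensions`: {dimensions}, "
        f"`vector.similarity_function`: '{similarity}'}}}}"
    )


print(vector_index_ddl("entity_embedding_idx", "Entity", "embedding", 1024, "cosine"))
```

Because every interpolated value is either a checked identifier, a whitelisted keyword, or an integer, a label such as `Entity) DETACH DELETE (n` is rejected before any query text is built.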
2. Use OPTIONAL MATCH for Nullable Relationships
// GOOD - returns task even without knowledge
MATCH (t:Task {uid: $uid})
OPTIONAL MATCH (t)-[:APPLIES_KNOWLEDGE]->(ku:Curriculum)
// RISKY - returns nothing if no knowledge relationship
MATCH (t:Task {uid: $uid})-[:APPLIES_KNOWLEDGE]->(ku:Curriculum)
3. Use COLLECT to Prevent Cartesian Products
// GOOD - one row per task
MATCH (t:Task {uid: $uid})
OPTIONAL MATCH (t)-[:APPLIES_KNOWLEDGE]->(ku:Curriculum)
OPTIONAL MATCH (t)-[:FULFILLS_GOAL]->(g:Goal)
RETURN t, collect(DISTINCT ku) as knowledge, collect(DISTINCT g) as goals
// BAD - cartesian product of knowledge × goals
MATCH (t:Task {uid: $uid})
OPTIONAL MATCH (t)-[:APPLIES_KNOWLEDGE]->(ku:Curriculum)
OPTIONAL MATCH (t)-[:FULFILLS_GOAL]->(g:Goal)
RETURN t, ku, g
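The row-count difference is easy to see with plain Python standing in for the query engine. This is only an analogy under made-up data (three knowledge units, two goals), not a Neo4j call: two independent OPTIONAL MATCH clauses multiply rows, and the aggregation then folds them back into one row per task.

```python
from itertools import product

knowledge = ["ku_a", "ku_b", "ku_c"]  # 3 APPLIES_KNOWLEDGE matches
goals = ["goal_x", "goal_y"]  # 2 FULFILLS_GOAL matches

# RETURN t, ku, g: one row per (ku, goal) combination.
flat_rows = [("task_1", ku, g) for ku, g in product(knowledge, goals)]

# RETURN t, collect(DISTINCT ku), collect(DISTINCT g): the same rows are
# aggregated per task, and DISTINCT removes the repeats the product created.
collected_row = (
    "task_1",
    sorted({ku for _, ku, _ in flat_rows}),
    sorted({g for _, _, g in flat_rows}),
)

print(len(flat_rows))
print(collected_row)
```

The sketch also shows why DISTINCT matters inside collect(): the intermediate rows are still multiplied, so each knowledge unit would otherwise appear once per goal.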
4. Use RelationshipName Enum (SKUEL013)
from core.models.relationship_names import RelationshipName
# GOOD - type-safe, IDE autocomplete
query = f"MATCH (a)-[:{RelationshipName.REQUIRES_KNOWLEDGE.value}]->(b)"
# GOOD - multi-line with Neo4j property maps (escape braces!)
query = f"""
MATCH (parent:Entity {{uid: $uid}})-[:{RelationshipName.HAS_SUBTASK.value}]->(child)
RETURN child
"""
# BAD - typo-prone, no compile-time check
query = "MATCH (a)-[:REQURES_KNOWLEDGE]->(b)" # typo!
5. Check Ownership for Multi-Tenant Security
// GOOD - ownership verified via universal OWNS relationship
MATCH (u:User {uid: $user_uid})-[:OWNS]->(t:Task {uid: $task_uid})
RETURN t
// BAD - no ownership check (security risk)
MATCH (t:Task {uid: $task_uid})
RETURN t
Note: The OWNS relationship is the universal ownership pattern. Domain-specific variants (HAS_TASK, HAS_GOAL, etc.) exist in RelationshipName but OWNS is what the backends use.
6. Per-Query Server-Side Timeout (TimedDriver)
Every query through the shared driver carries a server-side per-tx ceiling. Default 120s (env NEO4J_TRANSACTION_TIMEOUT); a runaway is aborted by the Neo4j server, not by the client hanging. Bulk ingestion is already wrapped to 600s; MEGA-QUERY and analytics inherit the default. Startup DDL (Neo4jSchemaManager) is intentionally untimed (raw driver).
If a specific query legitimately needs longer, wrap the call site:
from adapters.persistence.neo4j.timed_driver import (
    neo4j_query_timeout,
    unbounded_neo4j_query_timeout,
)

# Bound the enclosed block to 300s instead of the default 120s:
with neo4j_query_timeout(300.0):
    async with self._driver.session() as session:
        result = await session.run(long_running_aggregation, params)

# Escape hatch for one-off admin maintenance through the wrapped driver:
with unbounded_neo4j_query_timeout():
    ...
Rule: Don't wrap by default — the 120s ceiling exists to catch unintended runaways (a Cartesian explosion, a typo'd MATCH with no anchor). Only wrap when you know the query is legitimately long-running. The with block MUST enclose the full await chain — the override is a ContextVar read at call time, so awaited work outside the block is unbounded by it.
See: docs/patterns/NEO4J_QUERY_TIMEOUT.md for the override mechanism, when-to-wrap table, and the ContextVar + asyncio.create_task caveat.
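The "read at call time" behaviour can be illustrated with a small ContextVar sketch. This is not the TimedDriver implementation: `query_timeout_sketch`, `effective_timeout`, and `run_query` are hypothetical names, and only the 120s default comes from the text above.

```python
from contextlib import contextmanager
from contextvars import ContextVar

DEFAULT_TIMEOUT_S = 120.0
_timeout_override = ContextVar("timeout_override", default=None)


@contextmanager
def query_timeout_sketch(seconds):
    """Override the timeout for calls made inside the with block."""
    token = _timeout_override.set(seconds)
    try:
        yield
    finally:
        _timeout_override.reset(token)


def effective_timeout():
    """What a wrapped driver would read at the moment a query is issued."""
    override = _timeout_override.get()
    return DEFAULT_TIMEOUT_S if override is None else override


def run_query(cypher):
    # Stand-in for session.run(): the timeout is resolved here, at call time.
    return cypher, effective_timeout()


with query_timeout_sketch(300.0):
    inside = run_query("MATCH (n) RETURN count(n)")
outside = run_query("MATCH (n) RETURN count(n)")

print(inside[1], outside[1])
```

A call issued after the block exits sees the default again, which is why the with block has to enclose the whole await chain.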
7. Schema-Change Monitoring (opt-in)
SchemaChangeDetector (core/services/schema_change_detector.py) fingerprints the live Neo4j schema (labels, indexes, constraints, relationship types) and, on drift, invalidates the adapter's lazily-built query-optimization caches (_index_aware_builder, _enhanced_templates) via the auto-registered AdaptiveOptimizationHandler.
It is exposed as an on-demand capability on the adapter — Neo4jAdapter.check_schema_changes(), initialize_schema_monitoring(), stop_schema_monitoring() — and is wired into the composition root as an opt-in background poll:
# In .env — both default off / 900s
NEO4J_SCHEMA_MONITORING=true # start the background poll at startup
NEO4J_SCHEMA_MONITORING_INTERVAL=900 # poll interval (seconds); must be ≥ 1
- Off by default. Gated by config.database.schema_monitoring_enabled, not by INTELLIGENCE_TIER: it's plain graph infrastructure (no API calls), so it can run in either tier. Keeping it off by default preserves the CORE-tier "no background workers" guarantee.
- Where it's wired. services_bootstrap/compose.py calls initialize_schema_monitoring() right after the startup DDL sync (so it baselines against the freshly-synced schema); shutdown_skuel calls stop_schema_monitoring(). The detector owns its own asyncio poll task, which lives on the single loop shared by bootstrap and server.serve().
- Non-fatal. A failed start warns and continues; monitoring is an optimization, never a correctness gate.
- Interval is validated at the env boundary (DatabaseConfig.from_env rejects values < 1): a non-positive interval is truthy and would make asyncio.sleep(<=0) busy-spin Neo4j introspection.
Rule: Don't enable it where the schema is static after startup DDL (the common case) — it catches nothing at runtime and adds periodic introspection load. Enable it only where schema genuinely drifts mid-session.
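One way to picture the fingerprinting step is an order-insensitive hash over the four schema sets. The sketch below is an assumption about the general technique, not SchemaChangeDetector's code; `schema_fingerprint` and the sample index names are hypothetical.

```python
import hashlib
import json


def schema_fingerprint(labels, indexes, constraints, relationship_types):
    """Hash the schema so that ordering differences do not register as drift."""
    canonical = json.dumps(
        {
            "labels": sorted(labels),
            "indexes": sorted(indexes),
            "constraints": sorted(constraints),
            "relationship_types": sorted(relationship_types),
        },
        sort_keys=True,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


baseline = schema_fingerprint(["Task", "Goal"], ["task_uid_idx"], [], ["OWNS"])
reordered = schema_fingerprint(["Goal", "Task"], ["task_uid_idx"], [], ["OWNS"])
drifted = schema_fingerprint(
    ["Goal", "Task"], ["task_uid_idx", "goal_uid_idx"], [], ["OWNS"]
)

print(baseline == reordered, baseline == drifted)
```

Comparing the current fingerprint with the baseline on each poll is then enough to decide whether cached query-optimization state should be invalidated.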
8. Coerce string-stored temporals in comparisons
Date/datetime fields are stored as ISO strings (DTO .isoformat()), so comparing them directly to date()/datetime() yields null and silently drops rows. Wrap the stored side: datetime(n.created_at) >= datetime($w). datetime() is universally safe (parses date and datetime strings, no-op on natives); date() errors on a datetime string → use date(datetime(field)). The writer decides the type: DTO .isoformat() → string (coerce); Cypher = datetime() → native (leave). See PATTERNS.md Pattern 10 + Key Rules #17–18.
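The coercion rule can be captured in a small predicate builder so call sites do not hand-write it. This is a hypothetical helper (`temporal_predicate` is not a SKUEL function); it emits the `datetime(...)` and `date(datetime(...))` wrappers described above and validates the interpolated names.

```python
import re

_IDENTIFIER = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")
_COMPARATORS = {"<", "<=", "=", ">=", ">"}


def temporal_predicate(alias, field, comparator, param, as_date=False):
    """Build a comparison that coerces a string-stored temporal field."""
    for name in (alias, field, param):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    if comparator not in _COMPARATORS:
        raise ValueError(f"unsupported comparator: {comparator!r}")
    stored = f"datetime({alias}.{field})"
    supplied = f"datetime(${param})"
    if as_date:
        # date() errors on a datetime string, so go through datetime() first.
        stored, supplied = f"date({stored})", f"date({supplied})"
    return f"{stored} {comparator} {supplied}"


print(temporal_predicate("n", "created_at", ">=", "w"))
print(temporal_predicate("n", "due_date", "<=", "d", as_date=True))
```

Fields written natively by Cypher would not need the wrapper, so a helper like this belongs only on paths where the writer stored an ISO string.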
9. Relationship reads/writes go through real, config-keyed methods
UnifiedRelationshipService has no __getattr__ — calling a method it doesn't define is an AttributeError, and get_related_uids(method_key, uid) takes an exact DomainRelationshipConfig method-key that fails closed on a typo. Don't invent get_<x>_<y> methods or guess keys; never trust a mocked relationship service (it resolves any attribute). See /docs/patterns/UNIFIED_RELATIONSHIP_SERVICE.md § Phantom methods & keys.
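The warning about mocked services follows from how unittest.mock behaves: a bare MagicMock answers any attribute, while a spec'd mock fails like the real object. `RelationshipServiceSketch` and `get_task_goals` below are hypothetical names used only to show the difference.

```python
from unittest.mock import MagicMock, create_autospec


class RelationshipServiceSketch:
    """Hypothetical stand-in with one real method and no __getattr__."""

    def get_related_uids(self, method_key, uid):
        return []


loose = MagicMock()
# A bare mock invents the attribute, so the phantom call appears to work.
phantom_ok = loose.get_task_goals("task_1") is not None

strict = create_autospec(RelationshipServiceSketch, instance=True)
try:
    strict.get_task_goals("task_1")
    phantom_rejected = False
except AttributeError:
    # The spec'd mock only exposes attributes the real class defines.
    phantom_rejected = True

print(phantom_ok, phantom_rejected)
```

Using a spec'd mock (or the real service against a test database) keeps a test from passing on a method that does not exist.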
Additional Resources
- reference.md - Complete relationship type catalog (80+ types)
- examples.md - Full query examples for each domain
- docs/patterns/NEO4J_QUERY_TIMEOUT.md - Per-query server-side timeout (TimedDriver, override mechanism)
- ADR-064 - Why the chokepoint is a driver wrapper, not 124 call-site edits
Related Skills
- skuel-search-architecture - Unified search using Cypher patterns
- python - Python services executing Cypher queries
Deep Dive Resources
Architecture:
- Query Architecture - Graph database architecture
- RELATIONSHIPS_ARCHITECTURE.md - Lateral relationship types, service API, Cypher patterns
- ADR-037 - Lateral relationships visualization
Patterns:
- query_architecture.md - Query architecture patterns
Code:
- /core/models/relationship_names.py - RelationshipName enum (source of truth for all 80+ relationship types)
Foundation
This skill has no prerequisites. It is a foundational pattern.
See Also
- /docs/patterns/query_architecture.md - Query architecture documentation
- /docs/patterns/query_architecture.md - Database architecture
- /core/models/relationship_names.py - RelationshipName enum (source of truth)