neo4j-reference - SKILL.md Agent Skill

name: neo4j-reference
description: Neo4j documentation covering Cypher queries, Python driver v6, vector indexes, APOC, constraints, query tuning, and GraphRAG patterns. Use when writing Cypher, working with the graph repository, or optimizing database operations.
allowed-tools: Read, Grep, Glob

Neo4j Onboarding Guide: Research & Best Practices

This document is the primary onboarding guide for developers joining the LLMitM v2 project. It opens with a high-level map of the official Neo4j documentation. It then summarizes a curated set of deep-dive research reports on our architecture, design patterns, and implementation choices, and explains why each one is required reading.

Neo4j Documentation Map

This section is a hierarchical map of the Neo4j documentation, compiled from our research. Use it as a top-level exploration guide: each entry corresponds to a section of the official documentation for that feature.

Cypher Manual (Entry Point)
- Core Concepts
- Clauses
- Advanced Patterns & Expressions
- Functions
  - Complete Function Reference
  - Vector Functions
- Performance & Administration
Create Applications (Entry Point)
GenAI (Entry Point)
Tools & Ecosystem (Entry Point)
- Neo4j Workspace (Bloom & Explore)
- Neo4j Browser
- Neo4j Data Importer
- Neo4j Desktop
- Libraries & Protocols
  - APOC Library
  - Model Context Protocol (MCP) Server
    - Exposed Tools (read-cypher, get-schema)
    - Configuration

Curated Research Reports

1. Cypher Language Deep Dive

Cypher Clauses and Query Patterns

This report covers all major Cypher clauses, from basic reads and writes to advanced transformations: pattern matching, filtering, data manipulation, and query chaining. These clauses are the core vocabulary of every graph operation, so the report is essential reading for understanding how the agent interacts with the graph.
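The following sketch makes that vocabulary concrete. It is a single read query that chains MATCH, WHERE, WITH, RETURN, ORDER BY, and LIMIT, wrapped in a managed transaction function of the kind the Python driver runs. The ActionGraph/Step labels, the HAS_STEP relationship, and the status property are hypothetical placeholders, not our actual schema.

```python
# Hypothetical schema: (:ActionGraph {id, status})-[:HAS_STEP]->(:Step)
LIST_GRAPHS_BY_STATUS = """
MATCH (g:ActionGraph)-[:HAS_STEP]->(s:Step)  // pattern matching
WHERE g.status = $status                     // filtering
WITH g, count(s) AS step_count               // chaining: aggregate, then keep going
WHERE step_count >= $min_steps               // filter on the aggregate
RETURN g.id AS graph_id, step_count
ORDER BY step_count DESC
LIMIT $limit
"""


def list_graphs_by_status(tx, status="active", min_steps=1, limit=10):
    """Managed transaction function, e.g. session.execute_read(list_graphs_by_status)."""
    # Values travel as query parameters, never via string formatting.
    result = tx.run(LIST_GRAPHS_BY_STATUS, status=status, min_steps=min_steps, limit=limit)
    return [(record["graph_id"], record["step_count"]) for record in result]
```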

Cypher Subqueries and Advanced Patterns

Building on the basics, this document explores the advanced patterns that complex agentic reasoning requires. It details the four types of subqueries (CALL, EXISTS, COUNT, COLLECT) and how they enable conditional logic and inline aggregation. It also covers Quantified Path Patterns (QPPs) for traversing variable-length chains, which is critical for walking our ActionGraph step sequences.
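The query below is a rough illustration. It walks a step chain with a quantified path pattern and uses an EXISTS subquery to stop at the last step. It assumes a hypothetical schema in which an ActionGraph points to its first Step via FIRST_STEP and steps are linked by NEXT. QPP syntax requires Neo4j 5.9 or later.

```python
# Hypothetical schema: (:ActionGraph {id})-[:FIRST_STEP]->(:Step)-[:NEXT]->(:Step)-[:NEXT]->...
WALK_ACTION_GRAPH = """
MATCH p = (g:ActionGraph {id: $graph_id})-[:FIRST_STEP]->(first:Step)
          ((:Step)-[:NEXT]->(:Step))*              // QPP: zero or more NEXT hops
          (last:Step)
WHERE NOT EXISTS { (last)-[:NEXT]->(:Step) }       // EXISTS subquery: stop at the chain's end
RETURN [n IN nodes(p) WHERE n:Step | n.name] AS steps
LIMIT 1
"""


def walk_action_graph(tx, graph_id):
    """Return step names in chain order, or [] if the graph does not exist."""
    for record in tx.run(WALK_ACTION_GRAPH, graph_id=graph_id):
        return record["steps"]
    return []
```

QPPs use trail semantics, so no relationship is traversed twice. As a result, the unbounded * still terminates if a chain accidentally contains a cycle.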

Cypher Functions and Expressions

This report catalogs Cypher's library of over 100 built-in functions, including list, string, mathematical, and vector functions. It also covers CASE expressions and list and pattern comprehensions for inline data transformation. Developers need these tools to write efficient, expressive queries for the GraphRepository.
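The sketch below combines string functions, a CASE expression, a pattern comprehension, and a list comprehension in a single RETURN. The name, success_count, failure_count, and tags properties, the HAS_STEP relationship, and the 'cve-' tag convention are all hypothetical.

```python
# Hypothetical properties: ActionGraph.name, success_count, failure_count, tags
DESCRIBE_GRAPH = """
MATCH (g:ActionGraph {id: $graph_id})
RETURN g.id AS graph_id,
       toLower(trim(g.name)) AS normalized_name,            // string functions
       CASE                                                 // CASE expression
         WHEN coalesce(g.failure_count, 0) = 0 THEN 'healthy'
         WHEN g.failure_count > coalesce(g.success_count, 0) THEN 'failing'
         ELSE 'degraded'
       END AS health,
       [(g)-[:HAS_STEP]->(s:Step) | s.name] AS step_names,  // pattern comprehension
       [t IN coalesce(g.tags, []) WHERE t STARTS WITH 'cve-' | toUpper(t)] AS cve_tags  // list comprehension
"""

FIELDS = ("graph_id", "normalized_name", "health", "step_names", "cve_tags")


def describe_graph(tx, graph_id):
    """Return the computed fields as a dict, or None if no such graph exists."""
    for record in tx.run(DESCRIBE_GRAPH, graph_id=graph_id):
        return {field: record[field] for field in FIELDS}
    return None
```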

Cypher Indexes, Constraints, and Tuning

This document covers the critical administrative aspects of managing a Neo4j database. It details the index types (range, text, vector) and constraint types (uniqueness, existence, key) that underpin query performance and data integrity. It also explains how to use EXPLAIN and PROFILE to debug and optimize queries, which is a mandatory skill for maintaining a healthy production system.
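Below is a minimal sketch of idempotent schema setup plus a PROFILE helper. Property existence and key constraints require Enterprise Edition. The sketch therefore sticks to a uniqueness constraint and range/text indexes, which also work on Community Edition. The constraint and index names, and the labels and properties they cover, are hypothetical.

```python
# Hypothetical names; IF NOT EXISTS makes re-running these statements a no-op.
SCHEMA_STATEMENTS = [
    # Property uniqueness constraint (backed by a range index on g.id).
    "CREATE CONSTRAINT action_graph_id IF NOT EXISTS "
    "FOR (g:ActionGraph) REQUIRE g.id IS UNIQUE",
    # Range index (the default index type) for equality and range predicates.
    "CREATE INDEX step_status IF NOT EXISTS FOR (s:Step) ON (s.status)",
    # Text index for CONTAINS / ENDS WITH predicates on string properties.
    "CREATE TEXT INDEX step_name_text IF NOT EXISTS FOR (s:Step) ON (s.name)",
]


def apply_schema(session):
    """Run each schema command in its own auto-commit transaction."""
    for statement in SCHEMA_STATEMENTS:
        session.run(statement).consume()


def profile_query(session, query, **params):
    """Execute a query under PROFILE and return the profiled plan from the summary.

    Prefix with "EXPLAIN " instead to get summary.plan without executing the query.
    """
    summary = session.run("PROFILE " + query, **params).consume()
    return summary.profile
```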

2. GenAI & Vector Search

Vector Indexes Deep Dive

This is one of the most critical documents for understanding our agent's long-term memory and fuzzy lookup capabilities. It is a deep dive into Neo4j's HNSW-based vector indexes, covering how to create and configure them and the two ways to query them (the db.index.vector.queryNodes procedure and the SEARCH clause). This material is essential for working on the orchestrator's context assembly and fingerprint matching components.
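The sketch below covers index creation and the procedure-based query path only; the SEARCH clause is omitted. The Fingerprint label, the embedding property, the fingerprint_embeddings index name, and the 1536 dimensions are assumptions. The dimension count must match whatever embedding model actually produces the vectors. Index names and labels cannot be passed as query parameters, so the DDL builder validates them before interpolating.

```python
import re

_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def vector_index_ddl(name, label, prop, dimensions, similarity="cosine"):
    """Build CREATE VECTOR INDEX DDL; identifiers are validated, not parameterized."""
    for ident in (name, label, prop):
        if not _IDENTIFIER.fullmatch(ident):
            raise ValueError(f"unsafe identifier: {ident!r}")
    if similarity not in ("cosine", "euclidean"):
        raise ValueError(f"unsupported similarity function: {similarity!r}")
    if not isinstance(dimensions, int) or dimensions <= 0:
        raise ValueError("dimensions must be a positive integer")
    return (
        f"CREATE VECTOR INDEX {name} IF NOT EXISTS "
        f"FOR (n:{label}) ON n.{prop} "
        f"OPTIONS {{indexConfig: {{`vector.dimensions`: {dimensions}, "
        f"`vector.similarity_function`: '{similarity}'}}}}"
    )


# Procedure-based k-nearest-neighbour query; here the index name CAN be a parameter.
NEAREST = """
CALL db.index.vector.queryNodes($index_name, $k, $embedding)
YIELD node, score
RETURN node.id AS id, score
"""


def nearest_fingerprints(tx, embedding, k=5, index_name="fingerprint_embeddings"):
    """Return (id, score) pairs for the k most similar Fingerprint nodes."""
    result = tx.run(NEAREST, index_name=index_name, k=k, embedding=embedding)
    return [(record["id"], record["score"]) for record in result]
```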

GenAI Plugin: Embeddings and Text Generation

This report details the GenAI plugin, which generates embeddings and LLM text completions directly within Cypher queries. Our self-repair classification logic uses this capability as a fallback for ambiguous errors. Developers should understand how to call these functions and procedures and how to configure the supported LLM providers.
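The following sketch covers the embedding side only. It assumes the GenAI plugin is installed and OpenAI is the configured provider. genai.vector.encode(resource, provider, configuration) computes the vector inside the query, and SET stores it. The Fingerprint label and its description and embedding properties are hypothetical. The API token is sent as a query parameter rather than interpolated into the query text.

```python
# Assumes the GenAI plugin and the 'OpenAI' provider with a {token: ...} configuration.
EMBED_FINGERPRINT = """
MATCH (f:Fingerprint {id: $fingerprint_id})
WITH f, genai.vector.encode(f.description, 'OpenAI', {token: $token}) AS vector
SET f.embedding = vector
RETURN f.id AS id, size(vector) AS dimensions
"""


def embed_fingerprint(tx, fingerprint_id, token):
    """Embed one Fingerprint's description in-database; return the vector size, or None."""
    for record in tx.run(EMBED_FINGERPRINT, fingerprint_id=fingerprint_id, token=token):
        return record["dimensions"]
    return None
```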

GraphRAG Retrievers

This document explores the neo4j-graphrag-python package, focusing on its retriever classes. It highlights the VectorCypherRetriever, which runs a vector search and then a custom Cypher retrieval query over each hit. We use it to implement the four-layer retrieval pipeline (vector search -> graph traversal -> business logic -> rerank) that is central to our context assembly strategy. This pattern is the foundation of how our agent gathers relevant knowledge from the graph before invoking the LLM.
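Below is a sketch of how such a retriever might be wired up. The VectorCypherRetriever runs the vector search first, then passes each hit to the retrieval query as node, with its similarity as score. The traversal, filter, and ordering below stand in for the graph-traversal, business-logic, and rerank layers. The HAS_FINGERPRINT relationship, the status filter, and the index name are hypothetical. Sorting by score is only a placeholder for a real reranker.

```python
# Hypothetical schema: (:ActionGraph {id, status})-[:HAS_FINGERPRINT]->(:Fingerprint {id})
# The retriever prepends the vector search, so `node` and `score` are already bound here.
RETRIEVAL_QUERY = """
MATCH (g:ActionGraph)-[:HAS_FINGERPRINT]->(node)   // graph traversal from the vector hit
WHERE g.status = 'active'                          // business-logic filter
RETURN g.id AS graph_id, node.id AS fingerprint_id, score
ORDER BY score DESC                                // placeholder for a real reranker
"""


def build_retriever(driver, embedder, index_name="fingerprint_embeddings", retriever_cls=None):
    """Create a VectorCypherRetriever; retriever_cls is injectable for testing."""
    if retriever_cls is None:
        from neo4j_graphrag.retrievers import VectorCypherRetriever as retriever_cls
    return retriever_cls(
        driver=driver,
        index_name=index_name,
        retrieval_query=RETRIEVAL_QUERY,
        embedder=embedder,
    )
```

With a real driver and embedder, retriever.search(query_text=..., top_k=5) returns a RetrieverResult whose items hold the formatted rows. The retriever_cls parameter exists only so the factory can be exercised without the package installed.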

GraphRAG Knowledge Graph Builder and Pipeline

While not used in the core runtime of LLMitM v2, this report is important for understanding how we can build and enrich our knowledge graph from unstructured data sources (e.g., threat intelligence reports). It covers the SimpleKGPipeline for entity/relationship extraction, which is a key part of our data ingestion and graph maintenance strategy.

GenAI Blog Articles and Best Practices

This report synthesizes key insights and architectural patterns from Neo4j's official GenAI blog. It covers topics like Text2Cypher best practices, the RAG vs. Fine-Tuning debate, and real-world case studies. This provides valuable context and reinforces the design decisions made in our project.

GenAI Ecosystem Overview

This document provides a high-level overview of the entire Neo4j GenAI ecosystem, including partner integrations and related Neo4j Labs projects. It helps a developer understand where our project fits within the broader landscape of graph-based AI. It also introduces the neo4j-agent-memory library, which informs our own memory and state management design.

3. Application Development & Integration

Python Driver v6 Comprehensive Guide

This is the definitive guide to the official Neo4j Python driver, which is the exclusive interface between our Python application and the database. It covers everything from basic connection and querying to advanced transaction management and performance tuning. Every developer must be proficient with the patterns in this document, as they are used throughout the GraphRepository.
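Here is a minimal sketch of the two core query APIs inside a repository class. The first is session.execute_write with a managed transaction function, which is retried on transient errors. The second is driver.execute_query for one-shot queries. The GraphRepository shape, the query text, and the schema are illustrative assumptions, not the project's actual repository.

```python
# Hypothetical schema: (:ActionGraph {id})-[:HAS_STEP]->(:Step {name})
CREATE_STEP = """
MERGE (g:ActionGraph {id: $graph_id})
CREATE (g)-[:HAS_STEP]->(s:Step {name: $name})
RETURN elementId(s) AS step_id
"""

COUNT_STEPS = """
MATCH (:ActionGraph {id: $graph_id})-[:HAS_STEP]->(s:Step)
RETURN count(s) AS n
"""


def _create_step(tx, graph_id, name):
    # Managed transaction function: the driver may retry it on transient errors,
    # so it should do nothing except database work.
    records = list(tx.run(CREATE_STEP, graph_id=graph_id, name=name))
    return records[0]["step_id"]


class GraphRepository:
    """Thin repository over a shared neo4j Driver (hypothetical shape)."""

    def __init__(self, driver, database="neo4j"):
        self._driver = driver
        self._database = database

    def create_step(self, graph_id, name):
        # Short-lived session plus a managed write transaction.
        with self._driver.session(database=self._database) as session:
            return session.execute_write(_create_step, graph_id, name)

    def count_steps(self, graph_id):
        # execute_query manages the session, transaction, and retries itself.
        # It routes to the writer by default; pass routing_=RoutingControl.READ for reads.
        records, _summary, _keys = self._driver.execute_query(
            COUNT_STEPS, graph_id=graph_id, database_=self._database
        )
        return records[0]["n"]
```

In application code, create the driver once with neo4j.GraphDatabase.driver(uri, auth=(user, password)), call driver.verify_connectivity() at startup, and close it on shutdown. Drivers are thread-safe and pool connections. Sessions are cheap and should not be shared across threads.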

Python Driver Advanced Patterns

This report builds on the comprehensive guide, focusing on advanced topics like causal consistency with bookmarks, concurrency management, and the @unit_of_work decorator. These patterns are critical for ensuring data integrity and building a scalable, resilient system. Understanding these concepts is mandatory for anyone modifying the core orchestrator or repository logic.
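The sketch below shows both patterns, using hypothetical query text and metadata. @unit_of_work attaches a server-side timeout and transaction metadata to a transaction function. last_bookmarks() carries causal consistency from a write session into a separate read session. Transactions within one session are already chained causally, so explicit bookmarks only matter across sessions. (driver.execute_query also shares a bookmark manager by default.) The ImportError fallback is there only so the sketch can be imported without the driver installed.

```python
try:
    from neo4j import unit_of_work
except ImportError:  # no-op stand-in so the sketch imports without the driver installed
    def unit_of_work(metadata=None, timeout=None):
        def decorator(func):
            return func
        return decorator


@unit_of_work(timeout=5.0, metadata={"component": "graph-repository"})
def mark_step_repaired(tx, step_id):
    # The server aborts this transaction if it runs longer than 5 seconds;
    # the metadata is visible in SHOW TRANSACTIONS output and the query log.
    tx.run(
        "MATCH (s:Step) WHERE elementId(s) = $step_id SET s.status = 'repaired'",
        step_id=step_id,
    )


def count_repaired(tx):
    records = list(tx.run("MATCH (s:Step {status: 'repaired'}) RETURN count(s) AS n"))
    return records[0]["n"]


def repair_then_read(driver, step_id, database="neo4j"):
    """Write in one session, then read in another session that waits for that write."""
    with driver.session(database=database) as write_session:
        write_session.execute_write(mark_step_repaired, step_id)
        bookmarks = write_session.last_bookmarks()
    # Passing the bookmarks makes the read causally consistent with the write,
    # even if it is routed to a different cluster member.
    with driver.session(database=database, bookmarks=bookmarks) as read_session:
        return read_session.execute_read(count_repaired)
```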

GraphQL Library and Change Data Capture (CDC)

This document covers two key integration technologies. The GraphQL Library section explains how to auto-generate a flexible GraphQL API for our graph, which external monitoring and data exploration tools use. The Change Data Capture (CDC) section explains how to build a reactive architecture in which the agent responds to changes in the graph in real time, a key part of our future roadmap for proactive self-repair.
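The following is a minimal polling sketch for the CDC side. It assumes Enterprise Edition or Aura with CDC enabled on the database; CDC is not available on Community Edition. db.cdc.current() returns the latest change identifier, and db.cdc.query(from, selectors) returns the changes after a given identifier. How the agent reacts to each event is left out, and the helper names are hypothetical.

```python
CURRENT_CHANGE_ID = "CALL db.cdc.current() YIELD id RETURN id"

CHANGES_SINCE = """
CALL db.cdc.query($from_id, $selectors)
YIELD id, txId, seq, event
RETURN id, txId, seq, event
"""


def current_change_id(driver, database="neo4j"):
    """Starting point for polling: the identifier of the most recent change."""
    records, _, _ = driver.execute_query(CURRENT_CHANGE_ID, database_=database)
    return records[0]["id"]


def poll_changes(driver, from_id, selectors=None, database="neo4j"):
    """Return (events, next_id); an empty selector list matches every change."""
    records, _, _ = driver.execute_query(
        CHANGES_SINCE, from_id=from_id, selectors=selectors or [], database_=database
    )
    events = [record["event"] for record in records]
    # Persist next_id between polls so no change is processed twice.
    next_id = records[-1]["id"] if records else from_id
    return events, next_id
```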

4. Tools & Ecosystem

APOC Library

This report gives an overview of the APOC (Awesome Procedures on Cypher) library, which extends Cypher with over 130 procedures and functions. It covers data import/export, pathfinding algorithms, and periodic execution for batch operations. We use apoc.periodic.iterate for large-scale data migrations and updates.
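Below is a sketch of a batched migration with apoc.periodic.iterate. The first statement streams the rows, and the second runs once per row. Each batch of batchSize rows commits in its own transaction, which keeps memory bounded. The legacy_name to name rename is a hypothetical example, not an actual migration.

```python
# Hypothetical migration: move Step.legacy_name into Step.name in batches.
BATCH_MIGRATION = """
CALL apoc.periodic.iterate(
  'MATCH (s:Step) WHERE s.legacy_name IS NOT NULL RETURN s',
  'SET s.name = s.legacy_name REMOVE s.legacy_name',
  {batchSize: $batch_size, parallel: false}
)
YIELD batches, total, errorMessages
RETURN batches, total, errorMessages
"""


def run_migration(driver, batch_size=1000, database="neo4j"):
    """Run the batched migration and fail loudly if any batch reported errors."""
    records, _, _ = driver.execute_query(
        BATCH_MIGRATION, batch_size=batch_size, database_=database
    )
    row = records[0]
    # errorMessages maps each distinct error message to its occurrence count.
    if row["errorMessages"]:
        raise RuntimeError(f"migration errors: {row['errorMessages']}")
    return row["batches"], row["total"]
```

parallel: false avoids lock contention when batches could touch the same nodes. It is the safer default for migrations.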

MCP Server Deep Dive

This document details the Model Context Protocol (MCP) server, a standardized bridge for LLMs to interact with Neo4j. While our production system uses the deterministic GraphRepository, the MCP server is a valuable tool for debugging and interactive exploration. A developer should understand how to configure and use it to query the graph conversationally during development.

5. Operations

Backup, Restore & Snapshot Strategies

This report covers all the backup and restore methods for Neo4j Community Edition in our Docker setup: binary dump/load for fast snapshots, APOC Cypher export for git-tracked, diffable snapshots, and online reset. It documents the dual-strategy approach (binary + Cypher), the Makefile targets (make snapshot, make restore, make reset), and key gotchas (volume naming, healthchecks, schema separation).
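The following sketch builds the commands behind the binary and Cypher snapshot paths, assuming Neo4j 5.x neo4j-admin syntax. Both dump and load require the database to be stopped. The APOC export also needs apoc.export.file.enabled=true. The helper names are hypothetical, and the sketch does not show how these map onto the project's Makefile targets and Docker volumes.

```python
def dump_command(database="neo4j", to_path="/backups"):
    # Writes <to_path>/<database>.dump; the database must be stopped first
    # (on Community Edition this means stopping the server).
    return ["neo4j-admin", "database", "dump", database, f"--to-path={to_path}"]


def load_command(database="neo4j", from_path="/backups", overwrite=True):
    # Reads <from_path>/<database>.dump; also requires the database to be stopped.
    cmd = ["neo4j-admin", "database", "load", database, f"--from-path={from_path}"]
    if overwrite:
        cmd.append("--overwrite-destination=true")
    return cmd


# Diffable snapshot via APOC, run through the driver with a file parameter.
# The path is resolved relative to the server's import directory.
CYPHER_EXPORT = "CALL apoc.export.cypher.all($file, {format: 'cypher-shell'})"
```

Run the returned argument lists with subprocess.run(cmd, check=True) wherever neo4j-admin can reach the stopped database's data directory. Send CYPHER_EXPORT through the driver with a file parameter such as 'snapshot.cypher'.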