neo4j-gds-skill


Neo4j Graph Data Science (GDS) embedded plugin via Python client or Cypher — covers GraphDataScience, gds.v2 plugin endpoints, gds.version, native projection, Cypher projection, graph catalog operations, stream/stats/mutate/write modes, memory estimation, PageRank, Louvain, WCC, FastRP, KNN, Node Similarity, ML pipelines, and cleanup. Use for Aura Pro, self-managed, local, or offline Neo4j DBMS with the GDS plugin installed. Does NOT cover Aura Graph Analytics GDS Sessions, AuraGraphDataScience, GdsSessions, gds.graph.project.remote, or AuraDB Cypher API projection/session management — use neo4j-aura-graph-analytics-skill. Does NOT handle Cypher authoring — use neo4j-cypher-skill. Does NOT cover driver setup — use neo4j-driver-python-skill or other driver skill.

By neo4j-contrib. Updated 6/9/2026

name: neo4j-gds-skill
version: 1.0.1
allowed-tools: Bash WebFetch

When to Use

  • Running GDS algorithms against embedded GDS plugin through Python client (graphdatascience)
  • Running GDS algorithms through CALL gds.* Cypher procedures
  • Aura Pro, self-managed Neo4j, local Neo4j, or offline DBMS with GDS plugin installed
  • Projecting named in-memory graphs, running centrality/community/similarity/path/embedding algorithms
  • Chaining algorithms via mutate mode; building FastRP → KNN pipelines
  • Writing node embeddings for Neo4j vector indexes / structural similarity search
  • Memory estimation before large graph operations

When NOT to Use

  • Aura Graph Analytics Sessions / AGA / GdsSessions / AuraGraphDataScience → neo4j-aura-graph-analytics-skill
  • AuraDB Cypher API with { memory: ... } or { sessionId: ... } → neo4j-aura-graph-analytics-skill
  • Cypher query authoring → neo4j-cypher-skill
  • Driver/connection setup → neo4j-driver-python-skill
  • GraphRAG retrieval → neo4j-graphrag-skill
  • Creating/querying vector indexes over written embeddings → neo4j-vector-index-skill

| Context | Use |
| --- | --- |
| Aura Pro with GDS plugin | This skill |
| Self-managed/local/offline Neo4j with GDS plugin | This skill |
| AuraDB serverless analytics session | neo4j-aura-graph-analytics-skill |
| Self-managed Neo4j attached to AGA session | neo4j-aura-graph-analytics-skill |
| Non-Neo4j data source | neo4j-aura-graph-analytics-skill |

Pre-flight

Use only with embedded GDS plugin.

from graphdatascience import GraphDataScience

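# Aura instance with the GDS plugin:
gds = GraphDataScience("neo4j+s://xxx.databases.neo4j.io", auth=("neo4j", "pw"), aura_ds=True)
# Or self-managed / local (use one of the two connections, not both):
gds = GraphDataScience("bolt://localhost:7687", auth=("neo4j", "password"))
print(gds.server_version())

Cypher equivalent:

RETURN gds.version() AS gds_version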

If Unknown function 'gds.version' → GDS plugin unavailable. AuraDB serverless analytics → neo4j-aura-graph-analytics-skill. Self-managed/local → install or enable GDS plugin.

pip install graphdatascience              # Python client
pip install graphdatascience[rust_ext]    # 3–10× faster serialization

Compatibility: graphdatascience v1.22 — GDS >= 2.6 and < 2.28 / < 2026.6, Python >= 3.10 and < 3.15, Neo4j Driver >= 4.4.12 and < 7.0.
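
The ranges above can be checked mechanically before running anything. A minimal sketch, assuming the server version arrives as a dotted string such as the one returned by RETURN gds.version(). The helper names are hypothetical, and treating any major version >= 2000 as a calendar-style release is an assumption, not something the compatibility line states.

```python
def parse_version(text: str) -> tuple[int, int]:
    """Return (major, minor) from a dotted version string such as '2.13.4'."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def gds_supported(server_version: str) -> bool:
    """Check a GDS server version against the client v1.22 ranges quoted above."""
    version = parse_version(server_version)
    if version[0] >= 2000:  # assumption: calendar-style release such as 2025.x
        return version < (2026, 6)
    return (2, 6) <= version < (2, 28)


for candidate in ("2.5.9", "2.13.4", "2.28.0", "2025.3.0", "2026.6.0"):
    print(candidate, gds_supported(candidate))
```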

V2 rules:

  • Prefer gds.v2.* when endpoint exists.
  • Use snake_case endpoints and parameters: page_rank, fast_rp, mutate_property, write_property.
  • Use typed result attributes: result.write_millis, not result["writeMillis"].
  • Fall back to v1 if the v2 endpoint is missing or incompatible, and label the fallback as such in the code.
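
The prefer-v2, fall-back-to-v1 rule can be sketched as a small lookup. resolve_endpoint is a hypothetical helper, and the stand-in client objects exist only so the sketch runs without a database. It assumes a missing endpoint shows up as a missing attribute; the real client may resolve attributes lazily, in which case a missing endpoint only surfaces when it is called.

```python
from types import SimpleNamespace


def resolve_endpoint(gds, v2_name: str, v1_name: str):
    """Return (endpoint, api_label), preferring gds.v2.<v2_name> over gds.<v1_name>."""
    v2_root = getattr(gds, "v2", None)
    endpoint = getattr(v2_root, v2_name, None)
    if endpoint is not None:
        return endpoint, "v2"
    return getattr(gds, v1_name), "v1 fallback"


# Stand-in clients (not the real GraphDataScience class).
new_client = SimpleNamespace(v2=SimpleNamespace(page_rank="v2-endpoint"), pageRank="v1-endpoint")
old_client = SimpleNamespace(pageRank="v1-endpoint")

print(resolve_endpoint(new_client, "page_rank", "pageRank"))
print(resolve_endpoint(old_client, "page_rank", "pageRank"))
```

Returning the label alongside the endpoint makes it easy to log which API generation was actually used.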

Graph Catalog Operations

Native Projection

CALL gds.graph.project(
  'myGraph',
  ['Person', 'City'],
  { KNOWS: { orientation: 'UNDIRECTED' }, LIVES_IN: {} }
)
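YIELD graphName, nodeCount, relationshipCount

Python client:

G, result = gds.v2.graph.project("myGraph", "Person", "KNOWS")
print(result.node_count, result.relationship_count)

# Alternative with properties and per-type configuration. Graph names must be
# unique, so drop the projection above (or pick another name) before running this.
G, result = gds.v2.graph.project(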
    "myGraph",
    {"Person": {"properties": ["age", "score"]}, "City": {}},
    {"KNOWS": {"orientation": "UNDIRECTED"}, "LIVES_IN": {"properties": ["since"]}}
)

Native projection: plugin/simple Python-client workflow only. AGA Sessions → neo4j-aura-graph-analytics-skill. V1 fallback: gds.graph.project(...).

Cypher Projection (use for new Cypher workflows, filters, transforms)

G, result = gds.graph.cypher.project(
    """
    MATCH (source:Person)-[r:KNOWS]->(target:Person)
    WHERE source.active = true
    RETURN gds.graph.project($graph_name, source, target,
        { sourceNodeProperties: source { .score }, relationshipType: 'KNOWS' })
    """,
    database="neo4j", graph_name="activeGraph"
)

gds.graph.cypher.project must end with one RETURN gds.graph.project(...) clause. If validation fails: use gds.run_cypher(...), then gds.graph.get("graphName"). Use v1 gds.graph.cypher.project(...) if v2 graph projection cannot express required filter/transform.

For AGA Sessions, use neo4j-aura-graph-analytics-skill; plugin Cypher projection does not apply there.

Undirected Projection

Native projection: set orientation: 'UNDIRECTED' per relationship type. Plugin Cypher projection: set undirectedRelationshipTypes: ['*'] in fifth gds.graph.project(...) config argument.

Leiden in GDS requires undirected relationships, so project the graph as undirected before running it. Louvain accepts directed and undirected graphs; project undirected relationships when community structure is naturally symmetric.
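
A sketch of a plugin Cypher projection that marks every relationship type as undirected through the fifth gds.graph.project(...) argument. undirected_projection_query is a hypothetical helper. Labels and relationship types cannot be passed as Cypher parameters, so they are interpolated into the string and must come from trusted input.

```python
def undirected_projection_query(label: str, rel_type: str) -> str:
    """Build a Cypher projection whose fifth argument marks all types undirected."""
    return (
        f"MATCH (source:{label})-[r:{rel_type}]->(target:{label})\n"
        "RETURN gds.graph.project(\n"
        "  $graph_name, source, target,\n"
        f"  {{ relationshipType: '{rel_type}' }},\n"
        "  { undirectedRelationshipTypes: ['*'] }\n"
        ")"
    )


query = undirected_projection_query("Person", "KNOWS")
print(query)
```

The resulting string can be passed to gds.graph.cypher.project(query, database="neo4j", graph_name="undirectedGraph"), following the call shape shown earlier in this section.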

Inspect and Drop

G.node_count()              # 12_043
G.relationship_count()      # 87_211
G.node_properties()         # projected + mutated properties by label
G.relationship_properties() # projected + mutated properties by type
G.size_in_bytes()
gds.v2.graph.drop(G)        # frees JVM heap

G = gds.v2.graph.get("myGraph")       # re-attach to existing projection

gds.v2.graph.list()

Memory Estimation — run before large projections and algorithms

CALL gds.graph.project.estimate(['Person'], 'KNOWS')
YIELD requiredMemory, bytesMin, bytesMax, nodeCount, relationshipCount

Python client (algorithm estimates need a projected graph object):

G, project_result = gds.v2.graph.project("myGraph", "Person", "KNOWS")
print(project_result.node_count)

# Algorithm estimation:
est = gds.v2.page_rank.estimate(G, damping_factor=0.85)
print(est.required_memory)

Projection estimate fallback: use v1 gds.graph.project.estimate(...) if v2 estimate endpoint unavailable.
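
One way to act on an estimate is to compare its upper bound with the configured heap. A minimal sketch: fits_in_heap is a hypothetical helper, bytes_max stands for the bytesMax value yielded above, and the 20% headroom is an arbitrary assumption rather than a GDS recommendation.

```python
def fits_in_heap(bytes_max: int, heap_bytes: int, headroom: float = 0.2) -> bool:
    """True if the worst-case estimate fits while leaving `headroom` of the heap free."""
    if not 0.0 <= headroom < 1.0:
        raise ValueError("headroom must be in [0, 1)")
    return bytes_max <= heap_bytes * (1.0 - headroom)


GIB = 1024 ** 3
print(fits_in_heap(bytes_max=3 * GIB, heap_bytes=8 * GIB))
print(fits_in_heap(bytes_max=7 * GIB, heap_bytes=8 * GIB))
```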


Execution Modes

| Mode | Side effect | Returns | Use when |
| --- | --- | --- | --- |
| stream | None | Row per node/pair | Inspect results; top-N |
| stats | None | Single aggregate row | Summary/convergence check |
| mutate | Adds node property or relationship type/property to in-memory graph only | Stats row | Chain algorithms |
| write | Persists node property or relationship to Neo4j DB | Stats row | Final step: make queryable |

Pattern: stream to verify → mutate to chain → write to persist.

mutate_property must not exist in the in-memory graph. Relationship algorithms such as KNN also require mutate_relationship_type. After write, re-project to use written properties in subsequent GDS calls (in-memory graph does not see DB writes).
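
The collision rule can be checked before calling a mutate endpoint. A sketch only: check_mutate_property is a hypothetical helper, and the dictionary shape (label to list of property names) is an assumption about what G.node_properties() can be converted to.

```python
def check_mutate_property(node_properties: dict[str, list[str]], mutate_property: str) -> None:
    """Raise if `mutate_property` already exists on any label of the in-memory graph."""
    clashes = sorted(label for label, props in node_properties.items() if mutate_property in props)
    if clashes:
        raise ValueError(f"'{mutate_property}' already exists on labels: {', '.join(clashes)}")


# Assumed shape: label -> projected and mutated properties.
projected = {"Person": ["age", "pagerank"], "City": []}

check_mutate_property(projected, "embedding")  # no clash, returns silently
try:
    check_mutate_property(projected, "pagerank")
except ValueError as err:
    print(err)
```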


gds.util.asNode() — Enrich Stream Results

stream mode yields nodeId, the internal Neo4j node ID (an integer). gds.util.asNode(nodeId) resolves it back to the database node so you can read its properties.

// Single property
CALL gds.pageRank.stream('myGraph', {})
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC LIMIT 10

// Multiple properties — convert once with WITH
CALL gds.pageRank.stream('myGraph', {})
YIELD nodeId, score
WITH gds.util.asNode(nodeId) AS node, score
RETURN node.name AS name, node.born AS born, score
ORDER BY score DESC LIMIT 10

Not needed for write, mutate, or stats modes — those don't return per-node data.


Core Algorithms

PageRank (centrality)

CALL gds.pageRank.stream('myGraph', { dampingFactor: 0.85, maxIterations: 20 })
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score ORDER BY score DESC LIMIT 10
// score: relative influence — not absolute. Compare within same run only.
// didConverge: true means score stabilized; if false, increase maxIterations.

CALL gds.pageRank.write('myGraph', { writeProperty: 'pagerank', dampingFactor: 0.85 })
YIELD nodePropertiesWritten, ranIterations, didConverge

Python client:

pr_df = gds.v2.page_rank.stream(G, damping_factor=0.85)
mutate_result = gds.v2.page_rank.mutate(G, mutate_property="pagerank", damping_factor=0.85)
write_result = gds.v2.page_rank.write(G, write_property="pagerank", damping_factor=0.85)
print(write_result.write_millis)
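
To make dampingFactor, maxIterations, and didConverge concrete, here is a plain power-iteration sketch. It is an illustration, not the GDS implementation: the initial score of 1.0, the max-delta tolerance check, and the handling of nodes without outgoing links (their score is simply not redistributed) are assumptions of this sketch.

```python
def page_rank(out_links, damping_factor=0.85, max_iterations=20, tolerance=1e-7):
    """Return (scores, ran_iterations, did_converge) for a dict node -> list of targets."""
    nodes = sorted(set(out_links) | {t for targets in out_links.values() for t in targets})
    scores = {node: 1.0 for node in nodes}
    for iteration in range(1, max_iterations + 1):
        incoming = {node: 0.0 for node in nodes}
        for source, targets in out_links.items():
            for target in targets:
                # Each source splits its current score evenly across its out-links.
                incoming[target] += scores[source] / len(targets)
        updated = {n: (1.0 - damping_factor) + damping_factor * incoming[n] for n in nodes}
        delta = max(abs(updated[n] - scores[n]) for n in nodes)
        scores = updated
        if delta < tolerance:
            return scores, iteration, True
    return scores, max_iterations, False


star = {"a": ["hub"], "b": ["hub"], "c": ["hub"], "hub": []}
scores, ran_iterations, did_converge = page_rank(star)
print(scores, ran_iterations, did_converge)
print(page_rank(star, max_iterations=1)[2])  # too few iterations to converge
```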

Louvain (community detection)

CALL gds.louvain.stream('myGraph', { relationshipWeightProperty: 'weight' })
YIELD nodeId, communityId

CALL gds.louvain.write('myGraph', { writeProperty: 'community' })
YIELD communityCount, modularity

Python client:

louvain_df = gds.v2.louvain.stream(G)
write_result = gds.v2.louvain.write(G, write_property="community")
print(write_result.community_count)

Leiden refines Louvain to avoid poorly connected communities; use it when community quality matters more than raw speed. Leiden in GDS requires undirected relationships, so project the graph as undirected before running it. modularity in the stats result ranges from -0.5 to 1.0. As a rule of thumb, values above 0.3 often indicate meaningful community structure, and values above 0.7 indicate strong structure.
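
For intuition about the modularity value, a sketch that computes it by hand for an undirected, unweighted graph. This is the textbook definition (fraction of internal edges minus the expected fraction), written as a hypothetical helper; it is not how GDS computes the reported value internally.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity of an undirected, unweighted graph for a node -> community mapping."""
    m = len(edges)
    internal = defaultdict(int)  # edges with both ends in the same community
    degree = defaultdict(int)    # summed node degree per community
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles joined by a single bridge edge (2-3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
merged = {node: "all" for node in split}

print(round(modularity(edges, split), 4))
print(modularity(edges, merged))
```

Splitting the graph at the bridge scores 5/14 (about 0.357), while putting every node in one community scores 0.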

WCC — Weakly Connected Components

Run WCC first to understand graph structure; partition disconnected graphs before expensive algorithms.

CALL gds.wcc.stream('myGraph', { minComponentSize: 10 })
YIELD nodeId, componentId

CALL gds.wcc.write('myGraph', { writeProperty: 'componentId' })
YIELD nodePropertiesWritten, componentCount

Python client:

wcc_df = gds.v2.wcc.stream(G)
write_result = gds.v2.wcc.write(G, write_property="componentId")
print(write_result.node_properties_written)
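
What WCC computes can be sketched with a union-find over the edge list, ignoring direction. This is an illustration with hypothetical names; using the smallest node ID as the component ID and filtering by min_component_size after the fact are choices of this sketch, not claims about GDS internals.

```python
from collections import Counter


def weakly_connected_components(nodes, edges, min_component_size=1):
    """Return node -> component id, ignoring edge direction (union-find sketch)."""
    parent = {node: node for node in nodes}

    def find(node):
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for source, target in edges:
        root_s, root_t = find(source), find(target)
        if root_s != root_t:
            parent[max(root_s, root_t)] = min(root_s, root_t)

    component = {node: find(node) for node in nodes}
    sizes = Counter(component.values())
    return {n: c for n, c in component.items() if sizes[c] >= min_component_size}


nodes = [0, 1, 2, 3, 4, 5]
edges = [(0, 1), (2, 1), (3, 4)]
print(weakly_connected_components(nodes, edges))
print(weakly_connected_components(nodes, edges, min_component_size=3))
```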

Betweenness Centrality

gds.v2.betweenness_centrality.stream(G)          # identifies bottleneck/bridge nodes
gds.v2.betweenness_centrality.write(G, write_property="betweenness")

Node Similarity

Jaccard similarity from common neighbors — no node properties required.

gds.v2.node_similarity.stream(G, similarity_cutoff=0.1, top_k=10)
gds.v2.node_similarity.write(G, write_relationship_type="SIMILAR", write_property="score",
                             similarity_cutoff=0.1, top_k=10)
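
The Jaccard score behind Node Similarity can be sketched in plain Python. node_similarity here is a hypothetical stand-in that compares every pair of nodes by their neighbor sets; pairs with no shared neighbor are skipped, which is an assumption of this sketch.

```python
from itertools import combinations


def node_similarity(neighbors, similarity_cutoff=0.0, top_k=10):
    """Jaccard similarity between nodes based on their neighbor sets."""
    results = {node: [] for node in neighbors}
    for a, b in combinations(sorted(neighbors), 2):
        union = neighbors[a] | neighbors[b]
        if not union:
            continue
        score = len(neighbors[a] & neighbors[b]) / len(union)
        if score > 0 and score >= similarity_cutoff:
            results[a].append((b, score))
            results[b].append((a, score))
    # Keep the top_k highest-scoring partners per node.
    return {n: sorted(pairs, key=lambda p: (-p[1], p[0]))[:top_k] for n, pairs in results.items()}


likes = {
    "alice": {"guitar", "piano", "drums"},
    "bob": {"guitar", "piano"},
    "carol": {"violin"},
}
similar = node_similarity(likes, similarity_cutoff=0.1, top_k=1)
print(similar)
```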

FastRP (node embeddings)

Fast and scalable; suited to production ML pipelines. Set randomSeed for reproducibility.

CALL gds.fastRP.mutate('myGraph', {
  embeddingDimension: 256,
  iterationWeights: [0.0, 1.0, 1.0],
  featureProperties: ['score'],
  propertyRatio: 0.5,
  normalizationStrength: -0.5,
  randomSeed: 42,
  mutateProperty: 'embedding'
})
YIELD nodePropertiesWritten

Python client:

gds.v2.fast_rp.mutate(G, embedding_dimension=256, iteration_weights=[0.0, 1.0, 1.0],
                      random_seed=42, mutate_property="embedding")
write_result = gds.v2.fast_rp.write(G, embedding_dimension=256, write_property="embedding",
                                    random_seed=42)
print(write_result.write_millis)

For ANN search over structural embeddings, after write, create a Neo4j vector index over the written property. Use neo4j-vector-index-skill.

KNN — K-Nearest Neighbors

Finds k most similar nodes per node based on node properties (typically embeddings).

CALL gds.knn.stream('myGraph', {
  nodeProperties: ['embedding'], topK: 10,
  sampleRate: 0.5, similarityCutoff: 0.7
})
YIELD node1, node2, similarity

CALL gds.knn.write('myGraph', {
  nodeProperties: ['embedding'], topK: 10,
  writeRelationshipType: 'SIMILAR', writeProperty: 'score'
})
YIELD relationshipsWritten

Python client:

knn_df = gds.v2.knn.stream(G, node_properties=["embedding"], top_k=10)
gds.v2.knn.write(G, node_properties=["embedding"], top_k=10,
                 write_relationship_type="SIMILAR", write_property="score")
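
The effect of topK and similarityCutoff can be seen in an exact, brute-force version. This sketch swaps in an O(n^2) comparison for illustration only, whereas GDS KNN is an approximate algorithm (which is why it has a sampleRate). Using cosine similarity as the metric is an assumption of this sketch.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn(embeddings, top_k=10, similarity_cutoff=0.0):
    """Exact top_k neighbors per node by cosine similarity (brute force, O(n^2))."""
    result = {}
    for node, vector in embeddings.items():
        scored = [(other, cosine(vector, emb)) for other, emb in embeddings.items() if other != node]
        kept = [pair for pair in scored if pair[1] >= similarity_cutoff]
        result[node] = sorted(kept, key=lambda p: -p[1])[:top_k]
    return result


embeddings = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
print(knn(embeddings, top_k=1))
print(knn(embeddings, top_k=1, similarity_cutoff=0.7))
```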

FastRP → KNN Pipeline (recommendation)

# 1. Project
G, _ = gds.v2.graph.project("myGraph", "Product",
    {"BOUGHT_TOGETHER": {"orientation": "UNDIRECTED"}})

# 2. Estimate memory
print(gds.v2.fast_rp.estimate(G, embedding_dimension=128).required_memory)

# 3. Embed
gds.v2.fast_rp.mutate(G, embedding_dimension=128, random_seed=42, mutate_property="emb")

# 4. Similarity
gds.v2.knn.write(G, node_properties=["emb"], top_k=10,
                 write_relationship_type="SIMILAR", write_property="score")

# 5. Cleanup
gds.v2.graph.drop(G)
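
If any step fails, the projection above stays in the catalog and keeps holding heap. One way to guard against that is try/finally. A sketch: run_recommendation_pipeline is a hypothetical wrapper around the same calls, and the stand-in client below only records calls so the example runs without a database.

```python
from types import SimpleNamespace


def run_recommendation_pipeline(gds, graph_name="myGraph"):
    """Project, embed, write similarities, and always drop the projection."""
    G, _ = gds.v2.graph.project(
        graph_name, "Product", {"BOUGHT_TOGETHER": {"orientation": "UNDIRECTED"}}
    )
    try:
        gds.v2.fast_rp.mutate(G, embedding_dimension=128, random_seed=42, mutate_property="emb")
        return gds.v2.knn.write(
            G, node_properties=["emb"], top_k=10,
            write_relationship_type="SIMILAR", write_property="score",
        )
    finally:
        gds.v2.graph.drop(G)  # runs even if an algorithm step raises


# Stand-in client (not the real GraphDataScience class) that records calls.
calls = []


def failing_write(G, **kwargs):
    raise RuntimeError("simulated KNN failure")


fake_gds = SimpleNamespace(v2=SimpleNamespace(
    graph=SimpleNamespace(
        project=lambda name, nodes, rels: (calls.append("project") or name, None),
        drop=lambda G: calls.append("drop"),
    ),
    fast_rp=SimpleNamespace(mutate=lambda G, **kw: calls.append("mutate")),
    knn=SimpleNamespace(write=failing_write),
))

try:
    run_recommendation_pipeline(fake_gds)
except RuntimeError as err:
    print(err, calls)
```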

Algorithm Selection

| Goal | Algorithm |
| --- | --- |
| Influence via network links | PageRank / ArticleRank |
| Bottleneck / bridge nodes | Betweenness Centrality |
| Direct connections | Degree Centrality |
| Community (general, fast) | Louvain |
| Community (higher quality) | Leiden |
| Is graph connected? | WCC (run first) |
| Similarity from embeddings | KNN |
| Similarity from neighbors | Node Similarity |
| Shortest path (positive weights) | Dijkstra / A* |
| k alternative paths | Yen's |
| Fast scalable embeddings | FastRP |
| Feature-rich nodes | GraphSAGE (gds.beta.graphSage) |

Full algorithm catalog → references/algorithms.md


Common Errors

| Error | Cause | Fix |
| --- | --- | --- |
| Unknown function 'gds.version' | Embedded GDS plugin unavailable | AGA → neo4j-aura-graph-analytics-skill; self-managed/local → install plugin |
| Insufficient heap memory / OOM | Graph too large for available JVM heap | Run gds.graph.project.estimate; increase server.memory.heap.max_size (dbms.memory.heap.max_size on Neo4j 4.x) |
| Procedure not found: gds.leiden | Older or incompatible GDS | Check CALL gds.list() for available procedures; upgrade GDS or use Louvain |
| Node property 'X' not found after mutate | Property not projected or wrong graph name | Verify G.node_properties() includes the property; check mutate_property spelling |
| Graph 'myGraph' already exists | Leftover projection from failed run | CALL gds.graph.drop('myGraph') or gds.v2.graph.drop(G) |
| mutate_property already exists | Re-running algorithm on same projection | Drop and re-project, or use different mutate_property name |
| No algorithm results | Source/target node not in projection | Verify node labels/rel types match projection; check G.node_count() |
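
For the leftover-projection case, the drop can be made idempotent by passing false as the second argument (failIfMissing) of gds.graph.drop. A small sketch: drop_if_exists_query is a hypothetical helper that only builds the query and its parameters.

```python
def drop_if_exists_query(graph_name: str) -> tuple[str, dict]:
    """Return a Cypher query and parameters that drop a projection if it exists."""
    query = "CALL gds.graph.drop($graph_name, false) YIELD graphName RETURN graphName"
    return query, {"graph_name": graph_name}


query, params = drop_if_exists_query("myGraph")
print(query)
print(params)
```

The pair can then be handed to gds.run_cypher(query, params) before re-projecting.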

Full Workflow

  1. Create gds with GraphDataScience(...).
  2. Verify plugin: gds.server_version() or RETURN gds.version().
  3. Estimate memory: gds.graph.project.estimate(...) and algorithm .estimate(...).
  4. Project named graph with gds.v2.graph.project(...).
  5. Run gds.v2.*.stream first; switch to mutate; use write only when satisfied.
  6. Drop graph with gds.v2.graph.drop(G).
  7. Use v1 only for endpoints missing in v2, such as plugin Cypher projection.

Built-in test datasets: gds.v2.graph.datasets.load_cora(), gds.v2.graph.datasets.load_karate_club(), gds.v2.graph.datasets.load_imdb()


MCP Tool Mapping

| Operation | MCP tool |
| --- | --- |
| RETURN gds.version() | read-cypher |
| gds.pageRank.stream(...) | read-cypher |
| gds.pageRank.write(...) | write-cypher |
| gds.graph.drop(...) | write-cypher |
| List available procedures | read-cypher → CALL gds.list() |

Before any write-cypher: show exact Cypher, expected nodes/relationships affected, and ask for confirmation. For algorithm write mode, estimate or run stats first when available.
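
The mapping above can be sketched as a classifier on the procedure name. mcp_tool_for is a hypothetical helper. The table only covers stream, write, drop, gds.version, and gds.list; treating stats and estimate as read-only and everything else (including mutate and project) as write-cypher is a conservative assumption of this sketch.

```python
READ_ONLY_SUFFIXES = ("stream", "stats", "estimate")


def mcp_tool_for(procedure: str) -> str:
    """Map a gds.* procedure or function name to 'read-cypher' or 'write-cypher'."""
    name = procedure.strip().lower()
    if name in ("gds.version", "gds.list"):
        return "read-cypher"
    if name.rsplit(".", 1)[-1] in READ_ONLY_SUFFIXES:
        return "read-cypher"
    return "write-cypher"  # conservative default: ask for confirmation first


for proc in ("gds.version", "gds.pageRank.stream", "gds.pageRank.write", "gds.graph.drop"):
    print(proc, "->", mcp_tool_for(proc))
```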


References


Checklist

  • Embedded GDS plugin confirmed with gds.version() or gds.server_version()
  • Graph/algorithm memory estimated before large work
  • Python examples prefer gds.v2.*, snake_case params, typed result attributes
  • v1 APIs used only as explicit fallback
  • Projection uses native or plugin Cypher projection; no gds.graph.project.remote(...)
  • Named graph dropped after use (gds.v2.graph.drop(G) or v1 fallback)
  • Execution mode chosen: stream (inspect) → mutate (chain) → write (persist)
  • write_property/mutate_property checked for collision with existing properties
  • randomSeed set for reproducible embeddings
  • WCC run first on graphs that may be disconnected
Install via CLI
npx skills add https://github.com/neo4j-contrib/neo4j-skills --skill neo4j-gds-skill