graphify

Knowledge graph development for LLM applications. Graph storage selection, graph algorithms, extraction from documents/code/websites, entity extraction, Graph RAG, and visualization. Use when building knowledge graphs, graph databases, or LLM graph applications.

By v1truv1us. Updated 6/2/2026

name: graphify description: Knowledge graph development for LLM applications. Graph storage selection, graph algorithms, extraction from documents/code/websites, entity extraction, Graph RAG, and visualization. Use when building knowledge graphs, graph databases, or LLM graph applications. metadata: category: user-invoked disable-model-invocation: true

Graphify

Build knowledge graphs for LLM applications. Knowledge graphs improve AI responses by providing structured context with relationships, enable retrieval-augmented generation with graph traversal, and support agentic workflows with graph-defined tools.

When to Use

  • Building knowledge bases with relationships (not just chunks)
  • Implementing Graph RAG for better recall
  • Extracting structure from unstructured documents
  • Building agent memories with relationships
  • Creating recommendation systems
  • Analyzing code dependencies

1. Graph Storage Selection

Choose storage based on query patterns and scale requirements.

In-Memory Graph

Use for: prototyping, small graphs (<10K nodes), single-machine apps

// Example: GraphLib or native Map/Set
const graph = new Map<string, Set<string>>();
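A minimal sketch of the native Map/Set option, assuming a directed graph keyed by string node IDs. The helper names `addEdge` and `neighbors` are illustrative and do not come from any library.

```typescript
// Directed adjacency list: node ID -> set of neighbor IDs.
type Graph = Map<string, Set<string>>;

// Hypothetical helper: adds a directed edge, creating both nodes on demand.
function addEdge(graph: Graph, source: string, target: string): void {
  if (!graph.has(source)) graph.set(source, new Set());
  if (!graph.has(target)) graph.set(target, new Set());
  graph.get(source)!.add(target);
}

// Hypothetical helper: returns outgoing neighbors, or an empty list for unknown nodes.
function neighbors(graph: Graph, node: string): string[] {
  return Array.from(graph.get(node) ?? new Set<string>());
}

const demo: Graph = new Map();
addEdge(demo, "alice", "bob");
addEdge(demo, "alice", "carol");
addEdge(demo, "alice", "bob"); // duplicate edge, ignored by the Set
```

Registering the target as a node even when it has no outgoing edges keeps `graph.keys()` a complete node list, which simplifies most graph algorithms.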

PostgreSQL with Extensions

Use when: already using PostgreSQL, need ACID compliance, moderate scale

  • Graph extensions such as Apache AGE add openCypher queries; recursive CTEs (WITH RECURSIVE) cover simple traversals without any extension
  • Works with existing Postgres infrastructure

Neo4j

Use when: relationship queries are complex, the team is proficient in Cypher, or a managed service is needed

  • Best for: traversals, path finding, graph algorithms
  • Avoid if: simple queries dominate

Redis

Use when: caching, real-time, ephemeral graphs

  • Best for: session graphs, rate limiting, recent activity

AWS Neptune

Use when: a managed service is required, you need Gremlin or SPARQL, or you are already in the AWS ecosystem

  • Serverless option available
  • Integration with AWS services

Decision Matrix:

| Scenario | Recommended |
| --- | --- |
| Prototyping | In-memory |
| Already on Postgres | PostgreSQL |
| Complex traversals | Neo4j |
| Caching/real-time | Redis |
| Managed | AWS Neptune |
| Knowledge base | Neo4j or PostgreSQL |

2. Graph Algorithms

Select an algorithm based on the question you are answering.

Traversal (BFS/DFS)

Use for: exploration, finding any path, connectivity

  • BFS: Shortest unweighted path, level-by-level
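  • DFS: Deep exploration, cycle detection

// BFS for the shortest path in an unweighted graph.
// graph is a Map from node to an iterable of neighbors.
// Returns the path as an array of nodes, or null if goal is unreachable.
function bfs(graph, start, goal) {
  const queue = [[start]];
  const visited = new Set([start]);
  while (queue.length > 0) {
    const path = queue.shift(); // O(n) dequeue; fine for small graphs
    const node = path[path.length - 1];
    if (node === goal) return path;
    for (const neighbor of graph.get(node) || []) {
      if (!visited.has(neighbor)) {
        visited.add(neighbor);
        queue.push([...path, neighbor]);
      }
    }
  }
  return null; // no path found
}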

Shortest Path (Dijkstra, A*)

Use for: weighted routing, travel time, cost optimization
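Once edges carry weights, BFS no longer finds the cheapest path. The following is a minimal Dijkstra sketch, assuming non-negative weights. The `WeightedGraph` type and the linear scan for the closest node are illustrative choices; a priority queue would replace the scan on large graphs.

```typescript
// Weighted adjacency list: node -> (neighbor -> non-negative edge weight).
type WeightedGraph = Map<string, Map<string, number>>;

function dijkstra(
  graph: WeightedGraph,
  start: string,
  goal: string
): { path: string[]; cost: number } | null {
  const dist = new Map<string, number>([[start, 0]]);
  const prev = new Map<string, string>();
  const settled = new Set<string>();

  while (true) {
    // Pick the closest node that has not been settled yet.
    let current: string | null = null;
    let best = Infinity;
    for (const [node, d] of dist) {
      if (!settled.has(node) && d < best) {
        best = d;
        current = node;
      }
    }
    if (current === null) return null; // goal is unreachable
    if (current === goal) break;
    settled.add(current);

    // Relax every outgoing edge of the settled node.
    for (const [next, weight] of graph.get(current) ?? new Map<string, number>()) {
      const candidate = best + weight;
      if (candidate < (dist.get(next) ?? Infinity)) {
        dist.set(next, candidate);
        prev.set(next, current);
      }
    }
  }

  // Walk predecessors back from goal to start.
  const path: string[] = [];
  let step: string | undefined = goal;
  while (step !== undefined) {
    path.unshift(step);
    step = prev.get(step);
  }
  return { path, cost: dist.get(goal)! };
}

const roads: WeightedGraph = new Map([
  ["A", new Map([["B", 1], ["C", 5]])],
  ["B", new Map([["C", 2]])],
]);
const route = dijkstra(roads, "A", "C"); // path A, B, C with cost 3
```

A* follows the same structure but orders candidates by distance so far plus a heuristic estimate of the remaining distance.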

Centrality Measures

Use for: identifying important nodes

  • PageRank: Importance via links/votes
  • Betweenness: Bridge identification
  • Degree: Direct influence
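As an illustration of the PageRank bullet, here is a sketch of power-iteration PageRank. It assumes a directed adjacency list in which every node, including nodes with no outgoing links, appears as a key. The damping factor of 0.85 and the fixed iteration count are conventional defaults, not values taken from this document.

```typescript
type Graph = Map<string, Set<string>>;

// Power-iteration PageRank. Assumes every edge target is also a key of `graph`.
function pageRank(graph: Graph, damping = 0.85, iterations = 50): Map<string, number> {
  const nodes = Array.from(graph.keys());
  const n = nodes.length;
  let rank = new Map(nodes.map((id): [string, number] => [id, 1 / n]));

  for (let i = 0; i < iterations; i++) {
    // Rank held by nodes without outgoing links is spread evenly over all nodes.
    let dangling = 0;
    for (const id of nodes) {
      if (graph.get(id)!.size === 0) dangling += rank.get(id)!;
    }
    const base = (1 - damping) / n + (damping * dangling) / n;
    const next = new Map(nodes.map((id): [string, number] => [id, base]));

    // Each node passes an equal share of its rank to every node it links to.
    for (const id of nodes) {
      const out = graph.get(id)!;
      for (const target of out) {
        next.set(target, next.get(target)! + (damping * rank.get(id)!) / out.size);
      }
    }
    rank = next;
  }
  return rank;
}

const web: Graph = new Map([
  ["a", new Set(["c"])],
  ["b", new Set(["c"])],
  ["c", new Set(["a"])],
]);
const scores = pageRank(web); // c ranks highest, b lowest
```

Degree centrality needs no iteration: it is the size of each node's neighbor set, plus the incoming count if the graph is directed.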

Community Detection

Use for: clustering, segmentation

  • Louvain: Large-scale community detection
  • Label Propagation: Fast clustering
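Louvain works by greedily moving nodes between communities to increase modularity. The sketch below does not implement Louvain itself; it only computes the modularity score of a given partition, which is a useful check on any clustering result. It assumes an undirected, unweighted edge list and a community label for every node; the names are illustrative.

```typescript
type UndirectedEdge = [string, string];

// Modularity Q = sum over communities of (internalEdges / m) - (degreeSum / 2m)^2,
// where m is the total number of edges.
function modularity(edges: UndirectedEdge[], community: Map<string, string>): number {
  const m = edges.length;
  const internal = new Map<string, number>(); // edges with both ends in the community
  const degree = new Map<string, number>(); // sum of node degrees per community

  for (const [u, v] of edges) {
    const cu = community.get(u)!;
    const cv = community.get(v)!;
    degree.set(cu, (degree.get(cu) ?? 0) + 1);
    degree.set(cv, (degree.get(cv) ?? 0) + 1);
    if (cu === cv) internal.set(cu, (internal.get(cu) ?? 0) + 1);
  }

  let q = 0;
  for (const [c, d] of degree) {
    const fraction = d / (2 * m);
    q += (internal.get(c) ?? 0) / m - fraction * fraction;
  }
  return q;
}

// Two triangles joined by a single bridge edge (c, d).
const triangles: UndirectedEdge[] = [
  ["a", "b"], ["b", "c"], ["a", "c"],
  ["d", "e"], ["e", "f"], ["d", "f"],
  ["c", "d"],
];
const split = new Map([
  ["a", "left"], ["b", "left"], ["c", "left"],
  ["d", "right"], ["e", "right"], ["f", "right"],
]);
const q = modularity(triangles, split); // 5/14, about 0.357
```

Putting all six nodes in one community gives a modularity of 0, so the two-triangle split scores as the better partition.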

When to Use Each:

| Question | Algorithm |
| --- | --- |
| How do I get from A to B? | BFS/Dijkstra |
| What's the best order? | Topological sort |
| What's most important? | PageRank |
| Who are the bridges? | Betweenness |
| What groups exist? | Louvain |
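Topological sort appears in the table but has no section of its own, so here is a short sketch using Kahn's algorithm. It assumes a directed adjacency list and returns null when the graph contains a cycle, since no valid order exists in that case.

```typescript
type Graph = Map<string, Set<string>>;

// Kahn's algorithm: every edge source appears before its target in the result.
function topologicalSort(graph: Graph): string[] | null {
  // Count incoming edges for every node, including nodes that only appear as targets.
  const inDegree = new Map<string, number>();
  for (const [node, targets] of graph) {
    if (!inDegree.has(node)) inDegree.set(node, 0);
    for (const target of targets) {
      inDegree.set(target, (inDegree.get(target) ?? 0) + 1);
    }
  }

  const ready = Array.from(inDegree).filter(([, d]) => d === 0).map(([node]) => node);
  const order: string[] = [];
  while (ready.length > 0) {
    const node = ready.shift()!;
    order.push(node);
    for (const target of graph.get(node) ?? new Set<string>()) {
      const remaining = inDegree.get(target)! - 1;
      inDegree.set(target, remaining);
      if (remaining === 0) ready.push(target);
    }
  }
  // Nodes left with incoming edges are part of a cycle.
  return order.length === inDegree.size ? order : null;
}

const imports: Graph = new Map([
  ["app", new Set(["lib", "utils"])],
  ["lib", new Set(["utils"])],
  ["utils", new Set<string>()],
]);
const buildOrder = topologicalSort(imports); // ["app", "lib", "utils"]
```

With edges that point from a module to what it imports, reversing the result gives an order in which dependencies come first.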

3. Graph Extraction Sources

Extract graphs from different data sources.

From Documents (PDF, Markdown)

Process: chunk → extract entities → extract relationships

// Build the entity-extraction prompt for one text chunk
const prompt = `Extract entities from: ${chunk}
Entities as JSON: { "entities": [{"id": "...", "type": "...", "name": "..."}] }`;
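The chunking step that precedes the prompt can be sketched as fixed-size character windows with overlap, so that an entity mentioned at a chunk boundary still appears whole in one chunk. The function name and the character-based sizing are assumptions; token-based or heading-aware splitting is often a better fit in practice.

```typescript
// Splits text into windows of `size` characters, each sharing `overlap`
// characters with the previous window.
function chunkText(text: string, size: number, overlap: number): string[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("expected size > 0 and 0 <= overlap < size");
  }
  const chunks: string[] = [];
  for (let offset = 0; offset < text.length; offset += size - overlap) {
    chunks.push(text.slice(offset, offset + size));
    if (offset + size >= text.length) break; // last window reached the end
  }
  return chunks;
}

const pieces = chunkText("abcdefghij", 4, 1); // ["abcd", "defg", "ghij"]
```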

From Code (AST Parsing)

Extract: imports, function calls, class relationships

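// Dependency graph from imports: one edge per (file, imported module) pair.
// files is the list of parsed source files, each with a path and its imports.
const importEdges = files.flatMap(file =>
  file.imports.map(target => ({
    source: file.path,
    target,
    type: 'imports'
  }))
);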

From Websites

Link graphs from HTML parsing

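// Extract links. html is a parsed DOM document; querySelectorAll returns a
// NodeList, which has no map(), so convert it with Array.from.
const links = Array.from(html.querySelectorAll('a[href]'), a => ({
  source: pageUrl,
  target: a.href,
  type: 'links_to'
}));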

From SQL

Schema graphs: tables, columns, foreign keys

// Extract schema relationships
foreignKeys.map(fk => ({
  source: fk.fromTable,
  target: fk.toTable,
  type: 'references',
  via: fk.column
}));

From JSON/YAML

Configuration graphs

// Dependencies from package.json
deps.map(d => ({ source: 'package', target: d.name, type: 'depends_on' }));

4. LLM Graph Construction

Build graphs using LLMs for entity and relationship extraction.

Entity Extraction Prompt

Extract all entities from the following text.
For each entity, provide: id, type, name, description.

Text: {text}

Output as JSON array:
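Model output for a prompt like this is not guaranteed to be clean JSON, so it helps to parse it defensively. The sketch below assumes the response contains a single JSON array, possibly surrounded by extra prose, and drops items that lack any of the four requested string fields. `ExtractedEntity` and `parseEntities` are hypothetical names.

```typescript
interface ExtractedEntity {
  id: string;
  type: string;
  name: string;
  description: string;
}

function parseEntities(raw: string): ExtractedEntity[] {
  // Keep only the outermost [...] span in case the model added surrounding text.
  const first = raw.indexOf("[");
  const last = raw.lastIndexOf("]");
  if (first === -1 || last <= first) return [];

  let parsed: unknown;
  try {
    parsed = JSON.parse(raw.slice(first, last + 1));
  } catch {
    return []; // malformed JSON: treat as no entities
  }
  if (!Array.isArray(parsed)) return [];

  // Accept only objects where all four requested fields are strings.
  const isEntity = (item: unknown): item is ExtractedEntity =>
    typeof item === "object" &&
    item !== null &&
    ["id", "type", "name", "description"].every(
      key => typeof (item as Record<string, unknown>)[key] === "string"
    );
  return parsed.filter(isEntity);
}

const reply =
  'Sure! [{"id":"e1","type":"Person","name":"Ada","description":"Mathematician"},{"id":"e2"}] Done.';
const extracted = parseEntities(reply); // keeps e1, drops the incomplete e2
```

Returning an empty list on failure keeps a batch pipeline running; logging the rejected responses is worth adding so that prompt problems stay visible.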

Relationship Extraction Prompt

Extract relationships between these entities.
For each relationship: source, target, type, confidence (0-1).

Entities: {entities}

Relationships:

Relationship Confidence

  • Ask the LLM for a confidence score on every extracted relationship
  • Filter by threshold (e.g., confidence > 0.7)
  • Update scores incrementally as new documents confirm or contradict an edge
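The threshold filter can be sketched as follows. The `Relation` shape mirrors the fields requested by the relationship prompt. Keeping the highest-confidence copy when the same edge is extracted more than once is an added assumption, not something this document specifies.

```typescript
interface Relation {
  source: string;
  target: string;
  type: string;
  confidence: number; // 0 to 1, as reported by the LLM
}

// Keeps relations with confidence above the threshold; for repeated edges,
// keeps the copy with the highest confidence.
function filterRelations(relations: Relation[], threshold = 0.7): Relation[] {
  const best = new Map<string, Relation>();
  for (const relation of relations) {
    if (relation.confidence <= threshold) continue;
    const key = `${relation.source}|${relation.type}|${relation.target}`;
    const existing = best.get(key);
    if (!existing || relation.confidence > existing.confidence) {
      best.set(key, relation);
    }
  }
  return Array.from(best.values());
}

const kept = filterRelations([
  { source: "ada", target: "engine", type: "wrote_about", confidence: 0.9 },
  { source: "ada", target: "engine", type: "wrote_about", confidence: 0.75 },
  { source: "ada", target: "london", type: "visited", confidence: 0.4 },
]); // one relation remains, with confidence 0.9
```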

Semantic Search with Embeddings

// Embed entities for semantic search.
// embed() and vectorSearch() stand in for your embedding model and vector index.
entities.forEach(entity => {
  entity.embedding = embed(entity.name + ' ' + entity.description);
});

// Query: find similar entities
const similar = vectorSearch(queryEmbedding, entities, { topK: 10 });
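For small graphs, the search helper can be a brute-force cosine similarity scan. This is one possible sketch; it takes `topK` in an options object, and the `EmbeddedEntity` shape is an assumption. A vector index would replace the linear scan at larger scale.

```typescript
interface EmbeddedEntity {
  id: string;
  embedding: number[];
}

// Cosine similarity of two equal-length vectors; 0 if either has zero length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Brute-force search: score every entity, sort descending, keep the top K.
function vectorSearch(
  query: number[],
  entities: EmbeddedEntity[],
  options: { topK: number }
): EmbeddedEntity[] {
  return entities
    .map(entity => ({ entity, score: cosineSimilarity(query, entity.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, options.topK)
    .map(scored => scored.entity);
}

const hits = vectorSearch(
  [1, 0],
  [
    { id: "x", embedding: [1, 0] },
    { id: "y", embedding: [0, 1] },
    { id: "z", embedding: [0.9, 0.1] },
  ],
  { topK: 2 }
); // x first, then z
```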

Incremental Graph Building

  1. Process new document
  2. Extract entities (match existing → link, new → add)
  3. Extract relationships (add/update)
  4. Update embeddings
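Steps 2 and 3 can be sketched as a merge function. The matching rule used here, same type plus case-insensitive name, is a deliberately simple assumption; real pipelines often add alias tables or embedding similarity. All type and function names are illustrative, and the embedding update in step 4 is left out.

```typescript
interface KgEntity { id: string; name: string; type: string; }
interface KgEdge { source: string; target: string; type: string; }
interface KnowledgeGraph {
  entities: Map<string, KgEntity>; // keyed by match key
  edges: Map<string, KgEdge>; // keyed by source|type|target
}

// Assumed matching rule: same type and same name, ignoring case and outer whitespace.
const entityKey = (entity: KgEntity): string =>
  `${entity.type}:${entity.name.trim().toLowerCase()}`;

function mergeDocument(graph: KnowledgeGraph, entities: KgEntity[], edges: KgEdge[]): void {
  // Map each incoming ID to the ID the graph already uses for that entity.
  const canonical = new Map<string, string>();
  for (const entity of entities) {
    const key = entityKey(entity);
    const existing = graph.entities.get(key);
    if (existing) {
      canonical.set(entity.id, existing.id); // match existing: link
    } else {
      graph.entities.set(key, entity); // new: add
      canonical.set(entity.id, entity.id);
    }
  }
  // Rewrite edge endpoints to canonical IDs; the map key makes repeats an update.
  for (const edge of edges) {
    const source = canonical.get(edge.source) ?? edge.source;
    const target = canonical.get(edge.target) ?? edge.target;
    graph.edges.set(`${source}|${edge.type}|${target}`, { source, target, type: edge.type });
  }
}

const kg: KnowledgeGraph = { entities: new Map(), edges: new Map() };
mergeDocument(
  kg,
  [
    { id: "d1-ada", name: "Ada Lovelace", type: "Person" },
    { id: "d1-engine", name: "Analytical Engine", type: "Machine" },
  ],
  [{ source: "d1-ada", target: "d1-engine", type: "wrote_about" }]
);
mergeDocument(
  kg,
  [
    { id: "d2-ada", name: "ada lovelace", type: "Person" },
    { id: "d2-babbage", name: "Charles Babbage", type: "Person" },
  ],
  [{ source: "d2-ada", target: "d2-babbage", type: "collaborated_with" }]
);
// kg now holds 3 entities and 2 edges; the second edge starts at d1-ada.
```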

5. LLM Graph Integration

Use graphs with LLMs for improved retrieval.

Graph RAG Pattern

Context from knowledge graph:
{graph_context}

Question: {question}

Based on the graph context above, answer:

Graph retrieval steps:

  1. Convert question to graph query
  2. Traverse relevant subgraphs
  3. Include relationship context in prompt
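Steps 2 and 3 can be sketched as follows, assuming step 1 has already mapped the question to a set of seed entity names. The `Triple` shape, the hop limit, and the arrow notation for context lines are illustrative choices. The prompt wording follows the Graph RAG pattern above.

```typescript
interface Triple { source: string; type: string; target: string; }

// Collects triples within `hops` steps of the seed entities,
// following edges in both directions.
function subgraph(triples: Triple[], seeds: string[], hops: number): Triple[] {
  const reached = new Set(seeds);
  const selected = new Set<Triple>();
  for (let hop = 0; hop < hops; hop++) {
    const frontier = new Set(reached); // snapshot so each pass adds exactly one hop
    for (const triple of triples) {
      if (frontier.has(triple.source) || frontier.has(triple.target)) {
        selected.add(triple);
        reached.add(triple.source);
        reached.add(triple.target);
      }
    }
  }
  return Array.from(selected);
}

// Renders the selected triples into the Graph RAG prompt.
function buildPrompt(question: string, context: Triple[]): string {
  const lines = context.map(t => `(${t.source}) -[${t.type}]-> (${t.target})`);
  return [
    "Context from knowledge graph:",
    ...lines,
    "",
    `Question: ${question}`,
    "",
    "Based on the graph context above, answer:",
  ].join("\n");
}

const facts: Triple[] = [
  { source: "Ada", type: "wrote_about", target: "Analytical Engine" },
  { source: "Babbage", type: "designed", target: "Analytical Engine" },
  { source: "Babbage", type: "born_in", target: "London" },
];
const ragPrompt = buildPrompt("Who designed the machine Ada wrote about?", subgraph(facts, ["Ada"], 2));
```

With two hops from Ada, the context includes the first two triples but not the London triple, which is three hops away. The hop limit is the main control over prompt size.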

Graph Tools for Agents

Define tools from graph structure:

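// Graph-defined tools: one tool per distinct node type
// (mapping over every node would produce duplicate tool names)
const nodeTypes = [...new Set(graph.nodes.map(node => node.type))];
const tools = nodeTypes.map(type => ({
  name: `query_${type}`,
  description: `Query ${type} entities`,
  parameters: { /* ... */ }
}));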

Subagent Orchestration via Graph

// Route through graph
function orchestrate(query, graph) {
  const relevant = graph.query(query);
  const agent = selectAgent(relevant.type);
  return agent.execute(query, relevant.context);
}

Hybrid RAG: Vector + Graph

| Approach | Best For |
| --- | --- |
| Vector only | Similarity search |
| Graph only | Relationship queries |
| Hybrid | Both similarity + relationships |

Run both retrievals, then merge and deduplicate the results before building the prompt.
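This document does not say how to combine the two result lists. One common option is reciprocal rank fusion (RRF), sketched here as an assumption rather than as the skill's prescribed method. Each list contributes 1 / (k + rank) for every item it contains, so items that both retrievers rank highly rise to the top. The constant k = 60 is the conventional default.

```typescript
// Reciprocal rank fusion over any number of ranked ID lists (best first).
function fuseRankings(rankings: string[][], k = 60): string[] {
  const score = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      // index is 0-based, so the rank is index + 1.
      score.set(id, (score.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return Array.from(score.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

const vectorHits = ["e1", "e2", "e3"]; // from similarity search
const graphHits = ["e3", "e1", "e4"]; // from graph traversal
const fused = fuseRankings([vectorHits, graphHits]); // ["e1", "e3", "e2", "e4"]
```

RRF uses only rank positions, so it avoids having to normalize similarity scores and graph scores onto a common scale.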


6. Graph Visualization

Choose visualization based on context.

Mermaid

For documentation, README files:

graph TD
    A[User] --> B[Login]
    B --> C[Dashboard]
    C --> D[Query Graph]
    D --> E[Results]
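Diagrams like this can be generated from graph edges instead of written by hand. The sketch below assumes edges carry a relationship type and emits a Mermaid flowchart. It assigns synthetic node IDs (`n0`, `n1`, and so on) because entity names may contain spaces, and it replaces double quotes only, so labels with other special characters may need further escaping.

```typescript
interface LabeledEdge { source: string; target: string; type: string; }

function toMermaid(edges: LabeledEdge[]): string {
  // Assign a stable synthetic ID to each distinct node name.
  const ids = new Map<string, string>();
  const idFor = (label: string): string => {
    if (!ids.has(label)) ids.set(label, `n${ids.size}`);
    return ids.get(label)!;
  };
  // Double quotes would end the quoted label early, so swap them for single quotes.
  const clean = (text: string): string => text.replace(/"/g, "'");

  const lines = edges.map(edge => {
    const from = `${idFor(edge.source)}["${clean(edge.source)}"]`;
    const to = `${idFor(edge.target)}["${clean(edge.target)}"]`;
    return `    ${from} -->|${clean(edge.type)}| ${to}`;
  });
  return ["graph TD", ...lines].join("\n");
}

const diagram = toMermaid([
  { source: "User", target: "Login", type: "submits" },
  { source: "Login", target: "Dashboard", type: "redirects_to" },
]);
```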

D3.js

For interactive web applications:

// D3 force-directed graph
const simulation = d3.forceSimulation(nodes)
  .force('link', d3.forceLink(links).id(d => d.id))
  .force('charge', d3.forceManyBody())
  .force('center', d3.forceCenter(width / 2, height / 2));

Graphviz (DOT)

For static diagrams:

digraph {
  User -> Login -> Dashboard
  Dashboard -> Query
  Query -> Graph
}

Selection Guide:

| Context | Recommended |
| --- | --- |
| Documentation | Mermaid |
| Web app | D3.js |
| Static analysis | Graphviz |
| CLI output | ASCII |

Process Summary

Step 1: Choose Storage

Start simple, upgrade as needed

Step 2: Extract Graph

  • From documents → chunk + LLM extraction
  • From code → AST parsing
  • From existing data → schema extraction

Step 3: Build Incrementally

  • Process documents
  • Deduplicate entities
  • Add relationships
  • Update embeddings

Step 4: Integrate with LLM

  • Graph RAG for retrieval
  • Graph tools for agents

Step 5: Visualize

As needed for debugging/documentation


Common Mistakes

| Mistake | Reality |
| --- | --- |
| "Start with Neo4j" | Start in-memory, upgrade when needed |
| "Extract everything" | Focus on useful relationships |
| "Graph replaces vector" | Use hybrid approach |
| "One-time build" | Graphs need maintenance |

Verification

  • Graph storage selected based on query patterns
  • Algorithm chosen for actual questions
  • Extraction working from primary sources
  • Entity/relationship extraction prompts defined
  • Graph RAG pattern implemented
  • Visualization working for debugging

Anti-Rationalization Table

| Excuse | Counter |
| --- | --- |
| "Start with Neo4j" | Start in-memory, upgrade when needed. Premature infrastructure adds operational cost. |
| "Extract everything" | Focus on useful relationships. Over-extraction creates noise and slows queries. |
| "Graph replaces vector" | Use hybrid approach. Graph and vector complement each other, they do not compete. |
| "One-time build is enough" | Graphs need maintenance. Stale graphs produce stale answers. |
| "I don't need confidence scores on relationships" | Without confidence, you cannot filter low-quality edges. Scores enable quality control. |

Install via CLI

npx skills add https://github.com/v1truv1us/ai-eng-system --skill graphify