---
name: graphify
description: Knowledge graph development for LLM applications. Graph storage selection, graph algorithms, extraction from documents/code/websites, entity extraction, Graph RAG, and visualization. Use when building knowledge graphs, graph databases, or LLM graph applications.
metadata:
  category: user-invoked
disable-model-invocation: true
---
Graphify
Build knowledge graphs for LLM applications. Knowledge graphs improve AI responses by providing structured context with relationships, enable retrieval-augmented generation with graph traversal, and support agentic workflows with graph-defined tools.
When to Use
- Building knowledge bases with relationships (not just chunks)
- Implementing Graph RAG for better recall
- Extracting structure from unstructured documents
- Building agent memories with relationships
- Creating recommendation systems
- Analyzing code dependencies
1. Graph Storage Selection
Choose storage based on query patterns and scale requirements.
In-Memory Graph
Use for: prototyping, small graphs (<10K nodes), single-machine apps
```typescript
// Example: GraphLib or native Map/Set
// Adjacency list: node id -> set of neighbor ids
const graph = new Map<string, Set<string>>();
```
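Below is a minimal sketch of working with that Map/Set adjacency list. The helper names `addEdge` and `addUndirectedEdge` are hypothetical and no library is assumed. Edges are stored per direction, so an undirected edge is simply stored twice.

```typescript
// Directed adjacency list: node id -> set of neighbor ids
type Graph = Map<string, Set<string>>;

// Add a directed edge from -> to, creating both nodes if they do not exist yet
function addEdge(graph: Graph, from: string, to: string): void {
  if (!graph.has(from)) graph.set(from, new Set());
  if (!graph.has(to)) graph.set(to, new Set());
  graph.get(from)!.add(to);
}

// Undirected graphs store each edge in both directions
function addUndirectedEdge(graph: Graph, a: string, b: string): void {
  addEdge(graph, a, b);
  addEdge(graph, b, a);
}

const graph: Graph = new Map();
addEdge(graph, 'alice', 'bob');
addUndirectedEdge(graph, 'bob', 'carol');
```

Because a Set is used, adding the same edge twice has no effect. This also makes re-running an extraction idempotent.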
PostgreSQL with Extensions
Use when: already using PostgreSQL, need ACID compliance, moderate scale
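- Apache AGE: openCypher graph queries through a Postgres extension
- Recursive CTEs (WITH RECURSIVE) handle simple traversals without any extension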
- Works with existing Postgres infrastructure
Neo4j
Use when: queries involve complex relationships, your team already knows Cypher, or you want a managed service (Neo4j AuraDB)
- Best for: traversals, path finding, graph algorithms
- Avoid if: simple lookups dominate and multi-hop traversals are rare
Redis
Use when: caching, real-time, ephemeral graphs
- Best for: session graphs, rate limiting, recent activity
AWS Neptune
Use when: you want a managed service, need Gremlin, openCypher, or SPARQL, or already run on AWS
- Serverless option available
- Integration with AWS services
Decision Matrix:
| Scenario | Recommended |
|---|---|
| Prototyping | In-memory |
| Already on Postgres | PostgreSQL |
| Complex traversals | Neo4j |
| Caching/real-time | Redis |
| Managed AWS | Neptune |
| Knowledge base | Neo4j or PostgreSQL |
2. Graph Algorithms
Select algorithm based on the question you're answering.
Traversal (BFS/DFS)
Use for: exploration, finding any path, connectivity
- BFS: Shortest unweighted path, level-by-level
- DFS: Deep exploration, cycle detection
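```typescript
// BFS for shortest unweighted path: returns the nodes from start to goal, or null if unreachable
function bfs(graph: Map<string, Set<string>>, start: string, goal: string): string[] | null {
  const queue: string[][] = [[start]];
  const visited = new Set<string>([start]);
  while (queue.length > 0) {
    const path = queue.shift()!; // FIFO: oldest (shortest) path first
    const node = path[path.length - 1];
    if (node === goal) return path;
    for (const neighbor of graph.get(node) ?? []) {
      if (!visited.has(neighbor)) {
        visited.add(neighbor); // mark on enqueue so each node is queued once
        queue.push([...path, neighbor]);
      }
    }
  }
  return null; // goal is not reachable from start
}
```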
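The list above mentions DFS for cycle detection. Here is a sketch for directed graphs that uses the usual three-state marking: a node that is still on the current DFS path ("gray") and is reached again indicates a back edge, and therefore a cycle. The function name `hasCycle` is hypothetical. Because it recurses, very deep graphs may need an explicit-stack version.

```typescript
// Cycle detection in a directed graph with recursive DFS.
// Missing from `state` = unvisited, 'gray' = on the current DFS path, 'black' = fully explored
function hasCycle(graph: Map<string, Set<string>>): boolean {
  const state = new Map<string, 'gray' | 'black'>();

  const visit = (node: string): boolean => {
    state.set(node, 'gray');
    for (const next of graph.get(node) ?? []) {
      const s = state.get(next);
      if (s === 'gray') return true; // back edge to the current path: cycle
      if (s === undefined && visit(next)) return true;
    }
    state.set(node, 'black');
    return false;
  };

  // Start a DFS from every node not yet explored (the graph may be disconnected)
  for (const node of graph.keys()) {
    if (!state.has(node) && visit(node)) return true;
  }
  return false;
}
```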
Shortest Path (Dijkstra, A*)
Use for: weighted routing, travel time, cost optimization
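Below is a sketch of Dijkstra's algorithm for non-negative edge weights. To stay short, it picks the closest unsettled node with a linear scan (O(V² + E)) rather than a priority queue, which is fine for small graphs. The `WeightedGraph` shape is an assumption. A* has the same structure, but it orders nodes by distance plus a heuristic estimate to the goal.

```typescript
// Weighted directed graph: node -> list of [neighbor, non-negative weight]
type WeightedGraph = Map<string, Array<[string, number]>>;

// Shortest distance from start to every reachable node
function dijkstra(graph: WeightedGraph, start: string): Map<string, number> {
  const dist = new Map<string, number>([[start, 0]]);
  const settled = new Set<string>();

  while (true) {
    // Pick the unsettled node with the smallest known distance
    let current: string | undefined;
    for (const [node, d] of dist) {
      if (!settled.has(node) && (current === undefined || d < dist.get(current)!)) current = node;
    }
    if (current === undefined) break; // every reachable node is settled
    settled.add(current);

    // Relax outgoing edges: keep the shorter of the known and the new distance
    for (const [next, weight] of graph.get(current) ?? []) {
      const candidate = dist.get(current)! + weight;
      if (candidate < (dist.get(next) ?? Infinity)) dist.set(next, candidate);
    }
  }
  return dist;
}
```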
Centrality Measures
Use for: identifying important nodes
- PageRank: Importance via links/votes
- Betweenness: Bridge identification
- Degree: Direct influence
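PageRank can be sketched as power iteration over the adjacency list. The damping factor of 0.85 is the conventional default, and 50 iterations is an assumption rather than a tuned value. Nodes with no out-links ("dangling" nodes) spread their rank evenly across all nodes, which keeps the scores summing to 1. Degree centrality, by contrast, is simply the size of each node's neighbor set.

```typescript
// PageRank by power iteration over a directed adjacency list
function pageRank(graph: Map<string, Set<string>>, damping = 0.85, iterations = 50): Map<string, number> {
  // Include nodes that only appear as edge targets
  const nodes = new Set<string>(graph.keys());
  for (const targets of graph.values()) for (const t of targets) nodes.add(t);
  const n = nodes.size;

  let rank = new Map<string, number>();
  for (const node of nodes) rank.set(node, 1 / n); // start from a uniform distribution

  for (let i = 0; i < iterations; i++) {
    const next = new Map<string, number>();
    for (const node of nodes) next.set(node, (1 - damping) / n); // random-jump share

    let danglingMass = 0;
    for (const node of nodes) {
      const out = graph.get(node);
      const r = rank.get(node)!;
      if (!out || out.size === 0) {
        danglingMass += r; // no out-links: redistribute later
      } else {
        for (const t of out) next.set(t, next.get(t)! + (damping * r) / out.size);
      }
    }
    for (const node of nodes) next.set(node, next.get(node)! + (damping * danglingMass) / n);
    rank = next;
  }
  return rank;
}
```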
Community Detection
Use for: clustering, segmentation
- Louvain: Large-scale community detection
- Label Propagation: Fast clustering
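Louvain is too long to sketch here, but label propagation fits in a few lines. Every node starts in its own community, and on each pass it adopts the label that is most common among its neighbors. The classic algorithm breaks ties randomly. This sketch breaks ties by the smallest label so that results are reproducible, and caps the number of rounds (20 is an arbitrary choice). It assumes an undirected graph stored in both directions.

```typescript
// Label propagation community detection (asynchronous updates, deterministic tie-breaking)
function labelPropagation(graph: Map<string, Set<string>>, maxRounds = 20): Map<string, string> {
  const label = new Map<string, string>();
  for (const node of graph.keys()) label.set(node, node); // one community per node initially

  for (let round = 0; round < maxRounds; round++) {
    let changed = false;
    for (const [node, neighbors] of graph) {
      if (neighbors.size === 0) continue; // isolated nodes keep their own label
      // Count neighbor labels
      const counts = new Map<string, number>();
      for (const nb of neighbors) {
        const l = label.get(nb) ?? nb;
        counts.set(l, (counts.get(l) ?? 0) + 1);
      }
      // Most frequent label; ties go to the lexicographically smallest
      let best = '';
      let bestCount = -1;
      for (const [l, c] of counts) {
        if (c > bestCount || (c === bestCount && l < best)) {
          best = l;
          bestCount = c;
        }
      }
      if (best !== label.get(node)) {
        label.set(node, best);
        changed = true;
      }
    }
    if (!changed) break; // converged: no node switched community this round
  }
  return label;
}
```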
When to Use Each:
| Question | Algorithm |
|---|---|
| How do I get from A to B? | BFS/Dijkstra |
| What's the best order? | Topological sort |
| What's most important? | PageRank |
| Who are the bridges? | Betweenness |
| What groups exist? | Louvain |
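The table names topological sort for ordering questions such as build steps or task prerequisites. One common way to implement it is Kahn's algorithm, sketched below. It repeatedly emits nodes that have no remaining incoming edges, and it returns null when a cycle makes any valid order impossible. The function name is hypothetical.

```typescript
// Kahn's algorithm: edges point from prerequisite to dependent
function topologicalSort(graph: Map<string, Set<string>>): string[] | null {
  // Count incoming edges for every node, including target-only nodes
  const inDegree = new Map<string, number>();
  for (const [node, targets] of graph) {
    if (!inDegree.has(node)) inDegree.set(node, 0);
    for (const t of targets) inDegree.set(t, (inDegree.get(t) ?? 0) + 1);
  }

  const ready = [...inDegree].filter(([, d]) => d === 0).map(([node]) => node);
  const order: string[] = [];
  while (ready.length > 0) {
    const node = ready.shift()!;
    order.push(node);
    // Removing node frees any dependent whose last prerequisite it was
    for (const t of graph.get(node) ?? []) {
      const d = inDegree.get(t)! - 1;
      inDegree.set(t, d);
      if (d === 0) ready.push(t);
    }
  }
  // Nodes left unprocessed are part of a cycle
  return order.length === inDegree.size ? order : null;
}
```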
3. Graph Extraction Sources
Extract graphs from different data sources.
From Documents (PDF, Markdown)
Process: chunk → extract entities → extract relationships
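```typescript
// Build the entity-extraction prompt for one text chunk
// (template literals interpolate with ${...}; a bare {chunk} would be sent literally)
function entityPrompt(chunk: string): string {
  return `Extract entities from: ${chunk}
Entities as JSON: { "entities": [{"id": "...", "type": "...", "name": "..."}] }`;
}
```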
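For the chunking step, here is a minimal character-based sketch with overlapping windows. With overlap, a mention shorter than the overlap that straddles a boundary still appears whole in at least one chunk. The sizes are illustrative assumptions. Production pipelines often split on tokens, sentences, or headings instead.

```typescript
// Split text into fixed-size windows; consecutive windows share `overlap` characters
function chunkText(text: string, size = 1000, overlap = 200): string[] {
  if (overlap >= size) throw new Error('overlap must be smaller than size');
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // this window already reached the end
  }
  return chunks;
}
```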
From Code (AST Parsing)
Extract: imports, function calls, class relationships
```javascript
// Dependency graph from imports
imports.map(file => ({
  source: file.path,
  targets: file.imports,
  type: 'imports'
}));
```
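A dependency graph is most useful for impact analysis, that is, answering "what could break if this file changes?". The sketch below assumes edge records shaped like the mapping above (the `ImportEdge` interface is hypothetical). It inverts the edges and walks them breadth-first to collect every direct and transitive importer.

```typescript
// One record per file, as produced by the mapping above
interface ImportEdge { source: string; targets: string[]; type: 'imports'; }

// Every file that imports `changed`, directly or transitively
function dependentsOf(edges: ImportEdge[], changed: string): Set<string> {
  // Invert edges: imported file -> files that import it
  const importedBy = new Map<string, string[]>();
  for (const { source, targets } of edges) {
    for (const t of targets) {
      if (!importedBy.has(t)) importedBy.set(t, []);
      importedBy.get(t)!.push(source);
    }
  }
  // BFS over the inverted graph
  const result = new Set<string>();
  const queue = [changed];
  while (queue.length > 0) {
    const file = queue.shift()!;
    for (const importer of importedBy.get(file) ?? []) {
      if (!result.has(importer)) {
        result.add(importer);
        queue.push(importer);
      }
    }
  }
  return result;
}
```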
From Websites
Link graphs from HTML parsing
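```javascript
// Extract links from a parsed DOM document (browser, or a DOM library such as jsdom).
// querySelectorAll returns a NodeList, which has no .map, so convert it to an array first.
const links = Array.from(html.querySelectorAll('a[href]'))
  .map(a => ({ source: pageUrl, target: a.href, type: 'links_to' }));
```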
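Raw href values need normalizing before they become nodes. Otherwise relative paths, #fragments, and duplicates produce several nodes for the same page. The sketch below uses the standard WHATWG `URL` constructor to resolve each href against the page URL, drops fragments and non-HTTP schemes, and de-duplicates. The function name `linkEdges` is hypothetical.

```typescript
// Raw hrefs -> absolute, de-duplicated http(s) link edges
function linkEdges(pageUrl: string, hrefs: string[]) {
  const targets = new Set<string>();
  for (const href of hrefs) {
    let url: URL;
    try {
      url = new URL(href, pageUrl); // resolves relative hrefs against the page
    } catch {
      continue; // not a valid URL
    }
    if (url.protocol !== 'http:' && url.protocol !== 'https:') continue; // skip mailto:, javascript:, ...
    url.hash = ''; // /about#team and /about are the same page
    targets.add(url.href);
  }
  return [...targets].map(target => ({ source: pageUrl, target, type: 'links_to' as const }));
}
```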
From SQL
Schema graphs: tables, columns, foreign keys
```javascript
// Extract schema relationships
foreignKeys.map(fk => ({
  source: fk.fromTable,
  target: fk.toTable,
  type: 'references',
  via: fk.column
}));
```
From JSON/YAML
Configuration graphs
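```javascript
// Dependencies from a parsed package.json ("dependencies" maps package name -> version range)
const deps = Object.entries(pkg.dependencies ?? {})
  .map(([name, range]) => ({ source: pkg.name, target: name, type: 'depends_on', range }));
```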
4. LLM Graph Construction
Build graphs using LLMs for entity and relationship extraction.
Entity Extraction Prompt
```text
Extract all entities from the following text.
For each entity, provide: id, type, name, description.
Text: {text}
Output as JSON array:
```
Relationship Extraction Prompt
```text
Extract relationships between these entities.
For each relationship: source, target, type, confidence (0-1).
Entities: {entities}
Relationships:
```
Relationship Confidence
- Use LLM to provide confidence scores
- Filter by threshold (e.g., confidence > 0.7)
- Keep scores on the edges so they can be revised as new documents add evidence
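The sketch below shows confidence filtering applied to raw model output. It assumes the model replies with a bare JSON array shaped like the relationship prompt above, and the function and interface names are hypothetical. Malformed JSON is treated as "no relationships". Edges that point at unknown entity ids or fall at or below the threshold are dropped.

```typescript
interface Relationship { source: string; target: string; type: string; confidence: number; }

// Parse the model reply and keep well-formed, known-entity edges above the threshold
function filterRelationships(raw: string, entityIds: Set<string>, threshold = 0.7): Relationship[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return []; // model returned non-JSON
  }
  if (!Array.isArray(parsed)) return [];
  return parsed.filter((r): r is Relationship =>
    typeof r === 'object' && r !== null &&
    typeof r.source === 'string' && typeof r.target === 'string' &&
    typeof r.type === 'string' && typeof r.confidence === 'number' &&
    entityIds.has(r.source) && entityIds.has(r.target) && // no dangling edges
    r.confidence > threshold
  );
}
```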
Semantic Search with Embeddings
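```javascript
// Embed entities for semantic search; embed() and vectorSearch() stand in for
// your embedding model and vector index
entities.forEach(entity => {
  entity.embedding = embed(entity.name + ' ' + entity.description);
});
// Query: find the 10 entities most similar to the query embedding
const similar = vectorSearch(queryEmbedding, entities, { topK: 10 });
```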
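Without a vector index, `vectorSearch` can be approximated by brute-force cosine similarity, as sketched below. This is a sketch only: the names are hypothetical, and scanning every entity is practical only up to a moderate number of entities. Past that point a dedicated vector index is the usual choice.

```typescript
interface EmbeddedEntity { id: string; embedding: number[]; }

// Cosine similarity of two equal-length vectors (0 if either is all zeros)
function cosine(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Score every entity against the query, highest similarity first, keep k
function topK(query: number[], entities: EmbeddedEntity[], k = 10) {
  return entities
    .map(e => ({ id: e.id, score: cosine(query, e.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```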
Incremental Graph Building
- Process new document
- Extract entities (match existing → link, new → add)
- Extract relationships (add/update)
- Update embeddings
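The "match existing → link, new → add" step can be sketched as upserts. In this hypothetical in-memory store, entities are matched on a normalized type-plus-name key, and that matching rule is an assumption: real pipelines often add alias tables or embedding similarity. A repeated edge keeps its highest confidence. Re-embedding changed entities would follow the same upsert pattern and is omitted here.

```typescript
interface Entity { id: string; name: string; type: string; }
interface Edge { source: string; target: string; type: string; confidence: number; }

class KnowledgeGraph {
  entities = new Map<string, Entity>(); // key: "type:normalized name"
  edges = new Map<string, Edge>();      // key: "source|type|target"

  private entityKey(type: string, name: string): string {
    return `${type}:${name.trim().toLowerCase()}`;
  }

  // Return the id of a matching entity, or add a new one
  upsertEntity(name: string, type: string): string {
    const key = this.entityKey(type, name);
    const existing = this.entities.get(key);
    if (existing) return existing.id;
    const id = `e${this.entities.size + 1}`;
    this.entities.set(key, { id, name, type });
    return id;
  }

  // Add a new edge, or keep the higher-confidence version of an existing one
  upsertEdge(edge: Edge): void {
    const key = `${edge.source}|${edge.type}|${edge.target}`;
    const existing = this.edges.get(key);
    if (!existing || edge.confidence > existing.confidence) this.edges.set(key, edge);
  }
}
```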
5. LLM Graph Integration
Use graphs with LLMs for improved retrieval.
Graph RAG Pattern
```text
Context from knowledge graph:
{graph_context}
Question: {question}
Based on the graph context above, answer:
```
Graph retrieval steps:
- Convert question to graph query
- Traverse relevant subgraphs
- Include relationship context in prompt
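The traverse and include steps can be sketched together. Starting from seed entities (for example, the entities matched in the question), the function collects every triple within a fixed number of hops, following edges in either direction. It then renders the triples as text lines for the {graph_context} slot. The triple format, the two-hop default, and the function name are all assumptions.

```typescript
interface Triple { source: string; type: string; target: string; }

// Collect triples within `hops` steps of the seeds and render them for the prompt
function graphContext(triples: Triple[], seeds: string[], hops = 2): string {
  let frontier = new Set(seeds);
  const reached = new Set(seeds);
  const kept = new Set<Triple>();
  for (let h = 0; h < hops && frontier.size > 0; h++) {
    const next = new Set<string>();
    for (const t of triples) {
      if (!frontier.has(t.source) && !frontier.has(t.target)) continue;
      kept.add(t);
      // Nodes seen for the first time form the next hop's frontier
      for (const node of [t.source, t.target]) {
        if (!reached.has(node)) {
          reached.add(node);
          next.add(node);
        }
      }
    }
    frontier = next;
  }
  return [...kept].map(t => `(${t.source}) -[${t.type}]-> (${t.target})`).join('\n');
}
```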
Graph Tools for Agents
Define tools from graph structure:
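```javascript
// Graph-defined tools: one query tool per distinct node type, not one per node
const nodeTypes = [...new Set(graph.nodes.map(node => node.type))];
const tools = nodeTypes.map(type => ({
  name: `query_${type}`,
  description: `Query ${type} entities`,
  // JSON Schema for the tool input; the single "name" filter is illustrative
  parameters: {
    type: 'object',
    properties: { name: { type: 'string', description: `Name of the ${type} to look up` } },
    required: ['name']
  }
}));
```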
Subagent Orchestration via Graph
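```javascript
// Route through graph; graph.query and selectAgent are placeholders
// for your own graph client and agent registry
function orchestrate(query, graph) {
  const relevant = graph.query(query);           // find the matching subgraph
  const agent = selectAgent(relevant.type);      // agent responsible for that node type
  return agent.execute(query, relevant.context); // hand over query plus graph context
}
```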
Hybrid RAG: Vector + Graph
| Approach | Best For |
|---|---|
| Vector only | Similarity search |
| Graph only | Relationship queries |
| Hybrid | Both similarity + relationships |
Run both retrievers, then merge their ranked results (for example, with reciprocal rank fusion) before building the prompt.
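Reciprocal rank fusion (RRF) is one simple way to combine the two result lists. It gives each item the sum of 1 / (k + rank) across all lists, so an item ranked well by both retrievers rises to the top, and raw vector and graph scores never need to share a scale. k = 60 is the constant commonly used with RRF. This sketch assumes both retrievers return ranked lists of entity ids.

```typescript
// Reciprocal rank fusion over ranked id lists (rank starts at 1)
function reciprocalRankFusion(lists: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  // Highest fused score first
  return [...scores.entries()].sort((a, b) => b[1] - a[1]).map(([id]) => id);
}
```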
6. Graph Visualization
Choose visualization based on context.
Mermaid
For documentation, README files:
```mermaid
graph TD
  A[User] --> B[Login]
  B --> C[Dashboard]
  C --> D[Query Graph]
  D --> E[Results]
```
D3.js
For interactive web applications:
```javascript
import * as d3 from 'd3';

// D3 force-directed graph; nodes, links, width and height come from your app
const simulation = d3.forceSimulation(nodes)
  .force('link', d3.forceLink(links).id(d => d.id))
  .force('charge', d3.forceManyBody())
  .force('center', d3.forceCenter(width / 2, height / 2));
```
Graphviz (DOT)
For static diagrams:
```dot
digraph {
  User -> Login -> Dashboard
  Dashboard -> Query
  Query -> Graph
}
```
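For documentation, Mermaid text can be generated straight from an extracted edge list. In this sketch the names are hypothetical. Node ids are reduced to letters, digits, and underscores to keep them simple, while the bracketed label keeps the original name. Names that sanitize to the same id would collide, and labels containing brackets would need quoting.

```typescript
// Edge list -> Mermaid flowchart text; optional edge types become arrow labels
function toMermaid(edges: Array<{ source: string; target: string; type?: string }>): string {
  const id = (name: string) => name.replace(/[^A-Za-z0-9_]/g, '_');
  const lines = ['graph TD'];
  for (const e of edges) {
    const arrow = e.type ? `-->|${e.type}|` : '-->';
    lines.push(`  ${id(e.source)}[${e.source}] ${arrow} ${id(e.target)}[${e.target}]`);
  }
  return lines.join('\n');
}
```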
Selection Guide:
| Context | Recommended |
|---|---|
| Documentation | Mermaid |
| Web app | D3.js |
| Static analysis | Graphviz |
| CLI output | ASCII |
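For CLI output, a plain-ASCII tree printed from a root node is often enough. This sketch walks the graph depth-first and prints each child under its parent. A node reached a second time, through a shared child or a cycle, is marked "(seen)" and not expanded again. The function name and markers are assumptions.

```typescript
// Render a directed graph as an indented ASCII tree starting at root
function asciiTree(graph: Map<string, string[]>, root: string): string {
  const lines: string[] = [root];
  const seen = new Set([root]);
  const walk = (node: string, prefix: string): void => {
    const children = graph.get(node) ?? [];
    children.forEach((child, i) => {
      const last = i === children.length - 1;
      const branch = last ? '`-- ' : '|-- ';
      const repeat = seen.has(child);
      lines.push(prefix + branch + child + (repeat ? ' (seen)' : ''));
      if (!repeat) {
        seen.add(child);
        walk(child, prefix + (last ? '    ' : '|   ')); // keep the parent's vertical bar open
      }
    });
  };
  walk(root, '');
  return lines.join('\n');
}
```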
Process Summary
Step 1: Choose Storage
Start simple, upgrade as needed
Step 2: Extract Graph
- From documents → chunk + LLM extraction
- From code → AST parsing
- From existing data → schema extraction
Step 3: Build Incrementally
- Process documents
- Deduplicate entities
- Add relationships
- Update embeddings
Step 4: Integrate with LLM
- Graph RAG for retrieval
- Graph tools for agents
Step 5: Visualize
As needed for debugging/documentation
Common Mistakes
| Mistake | Reality |
|---|---|
| "Start with Neo4j" | Start in-memory, upgrade when needed |
| "Extract everything" | Focus on useful relationships |
| "Graph replaces vector" | Use hybrid approach |
| "One-time build" | Graphs need maintenance |
Verification
- Graph storage selected based on query patterns
- Algorithm chosen for actual questions
- Extraction working from primary sources
- Entity/relationship extraction prompts defined
- Graph RAG pattern implemented
- Visualization working for debugging
Anti-Rationalization Table
| Excuse | Counter |
|---|---|
| "Start with Neo4j" | Start in-memory, upgrade when needed. Premature infrastructure adds operational cost. |
| "Extract everything" | Focus on useful relationships. Over-extraction creates noise and slows queries. |
| "Graph replaces vector" | Use hybrid approach. Graph and vector retrieval complement each other; they do not compete. |
| "One-time build is enough" | Graphs need maintenance. Stale graphs produce stale answers. |
| "I don't need confidence scores on relationships" | Without confidence, you cannot filter low-quality edges. Scores enable quality control. |